Research on Indexing and Query Technology Based on Steel Logistics Data

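Abstract: As steel logistics undergoes digital transformation, the volume of steel logistics data is growing rapidly. Traditional relational databases can no longer meet the storage and query requirements of such massive data. Distributed NoSQL (Not Only Structured Query Language) databases are easy to scale, read and write quickly, and cost little. This paper therefore uses distributed cloud storage and NoSQL technology to store and index massive steel logistics data, improving both storage capacity and query performance. First, Spark is used to associate and fuse data from different sources, and the historical and real-time data produced by the freight platform are stored and managed in tiers. Next, spatiotemporal and attribute indexes are built for the three main types of queries in steel transportation, enabling efficient queries over multi-source logistics data. Finally, experiments on real steel logistics data show that the proposed scheme outperforms the index-based query methods of traditional relational databases in data writing, storage and querying. It can effectively support the storage and querying of massive logistics data.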

Keywords: steel logistics data; distributed database; spatiotemporal index; query

CLC number: TP311  Document code: A  DOI: 10.3969/j.issn.1000-5641.2022.05.016

Indexing and query technology for steel logistic data

ZOU Tao, QIAN Rongtao, MAO Jiali

(School of Data Science and Engineering, East China Normal University, Shanghai 200062, China)

Abstract: With the digital transformation of iron and steel logistics, the scale of steel logistic data has expanded rapidly, and traditional relational databases can no longer meet its storage and query needs. Distributed not only structured query language (NoSQL) databases are easy to scale, read and write quickly, and cost little. This study therefore uses distributed cloud storage and NoSQL technologies to store and index massive steel logistic data, improving the storage capacity and query performance for logistic data. First, Spark is used to associate and fuse data from different sources, and the historical and real-time data generated by the freight platform are then stored and managed hierarchically. Next, spatiotemporal and attribute indexes are built for the three main types of queries involved in steel transportation, enabling efficient queries over multi-source logistic data. Finally, experimental results on real steel logistic data show that the proposed scheme outperforms the index-based query methods of traditional relational databases in data writing, storage and querying. It can effectively support the storage and querying of massive logistic data.

Keywords: steel logistic data; HBase; spatio-temporal index; query
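The abstract says spatiotemporal and attribute indexes are built on a NoSQL store but does not describe their layout. One common way to do this on a store that keeps rows sorted by key (as HBase does) is to encode each index directly into the row key. The sketch below is a hedged illustration of that general technique only. It assumes a key made of an hourly time bucket followed by a Z-order (Morton) grid-cell code. The bucket length, grid resolution, key layout and every function name are hypothetical and are not the authors' design.

```python
from __future__ import annotations

import struct

# Hypothetical parameters; the paper's actual key layout is not described here.
TIME_BUCKET_SECONDS = 3600  # assumed: one time bucket per hour
GRID_BITS = 16              # assumed: 2^16 x 2^16 grid over the whole globe


def cell_of(lon: float, lat: float, bits: int = GRID_BITS) -> tuple[int, int]:
    """Map a WGS84 longitude/latitude to integer grid-cell indices."""
    n = 1 << bits
    x = min(int((lon + 180.0) / 360.0 * n), n - 1)
    y = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return x, y


def z_order(x: int, y: int, bits: int = GRID_BITS) -> int:
    """Morton (Z-order) code: x bits go to even positions, y bits to odd ones."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def spatiotemporal_key(ts: int, lon: float, lat: float, vehicle_id: str) -> bytes:
    """Row key = time bucket | Z-order cell | vehicle id | exact timestamp.

    Fixed-width big-endian integers compare byte-wise in numeric order, so
    keys sort first by time bucket and then by spatial cell. A query for one
    bucket and region can then use key-range scans instead of a full scan.
    """
    bucket = ts // TIME_BUCKET_SECONDS
    z = z_order(*cell_of(lon, lat))
    return struct.pack(">IQ", bucket, z) + vehicle_id.encode("utf-8") + struct.pack(">Q", ts)


def attribute_key(waybill_id: str, ts: int) -> bytes:
    """Attribute-index key: waybill id, a 0x00 separator, then the timestamp."""
    return waybill_id.encode("utf-8") + b"\x00" + struct.pack(">Q", ts)
```

Consider two fixes from the same vehicle at the same location in consecutive hours. Their keys sort in time order. Fixes from the same hour in nearby cells tend to share a key prefix. The Z-order curve is only one possible choice here; other space-filling curves or geohash strings serve the same purpose.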

0 Introduction

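In the big-data era, steel logistics has also undergone digital transformation. A growing number of online freight platforms have emerged, producing large volumes of multi-source logistics data. These data form the foundation of such platforms. They support key tasks such as driver profiling, vehicle-cargo matching, vehicle dispatching and transport monitoring. Research on indexing and querying logistics data is therefore highly important.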

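However, managing logistics data with traditional relational databases raises a series of challenges. First, freight data come in many types and from complex sources. Consider Jingchuang Zhihui Logistics Technology Co., Ltd. (京創智匯物流科技有限公司; hereafter Jingchuang), a logistics company belonging to a steel group in Shandong Province. The data generated every day on its online freight platform include the following:

- order data from purchasing and sales scenarios;
- inventory data from warehousing scenarios;
- waybill and trajectory data from transport scenarios.

When Jingchuang manages these data in a relational database, it stores the data of each scenario in separate tables and builds indexes only on individual columns. This setup cannot directly satisfy the complex queries that logistics data require, for two reasons. (1) Join queries across tables are among the most common query types. Jingchuang performs them by joining multiple tables into a wide table, which causes a large amount of redundant querying and computation and thus degrades query performance. (2) Spatiotemporal data, typified by trajectory data, are updated quickly and grow dynamically. Traditional relational databases are limited in storage capacity and platform scalability, so they cannot cope well with storing and querying massive spatiotemporal data. For example, as the trajectory data in a single table grow, relying solely on a B-tree index incurs a large amount of redundant computation, so spatiotemporal query performance degrades as the data volume increases. Using one of the R-tree variants as the index instead runs into a different problem: frequent insertions cause repeated R-tree node splits, which makes index updates too costly. In summary, fusing steel logistics data and designing an appropriate storage scheme together with efficient indexing and query schemes is a practical problem that urgently needs to be solved.

Table 1 lists the volumes of freight logistics data on this platform. Trajectory data have the largest total volume and also the fastest growth. As the business expands and demand rises for higher-quality trajectories (shorter sampling intervals), trajectory data will grow considerably faster still. Jingchuang's current management approach cannot handle these rapidly growing trajectory data well.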
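The two problems described above can be illustrated with a small, hedged sketch. The first is the redundant computation of a single-column B-tree on spatiotemporal queries. The second is the insertion cost of R-tree splits. In the sketch, `TimeOnlyIndex` stands in for a B-tree on a timestamp column, modeled as a sorted list searched with `bisect`. `GridTimeIndex` is a hypothetical fixed spatial grid combined with time buckets. Neither is the authors' method, and the cell size and bucket length are assumptions. The grid version shows one way to avoid node splits: an insert only appends a point to one bucket, so the structure never reorganizes.

```python
from __future__ import annotations

import bisect
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float, int]  # (x, y, t): planar position and timestamp


class TimeOnlyIndex:
    """Ordered index on t only, standing in for a B-tree on a timestamp column."""

    def __init__(self) -> None:
        self.ts: List[int] = []
        self.pts: List[Point] = []

    def insert(self, p: Point) -> None:
        i = bisect.bisect_right(self.ts, p[2])  # keep both lists sorted by t
        self.ts.insert(i, p[2])
        self.pts.insert(i, p)

    def query(self, x0, y0, x1, y1, t0, t1) -> Tuple[List[Point], int]:
        lo = bisect.bisect_left(self.ts, t0)
        hi = bisect.bisect_right(self.ts, t1)
        candidates = self.pts[lo:hi]  # every point in the time range, anywhere in space
        hits = [p for p in candidates if x0 <= p[0] <= x1 and y0 <= p[1] <= y1]
        return hits, len(candidates)


class GridTimeIndex:
    """Fixed spatial grid x time buckets; inserts append to one bucket, never split nodes."""

    def __init__(self, cell: float, bucket: int) -> None:
        self.cell = cell
        self.bucket = bucket
        self.cells: Dict[Tuple[int, int, int], List[Point]] = defaultdict(list)

    def _key(self, x: float, y: float, t: int) -> Tuple[int, int, int]:
        return int(x // self.cell), int(y // self.cell), t // self.bucket

    def insert(self, p: Point) -> None:
        self.cells[self._key(*p)].append(p)

    def query(self, x0, y0, x1, y1, t0, t1) -> Tuple[List[Point], int]:
        cx0, cy0, b0 = self._key(x0, y0, t0)
        cx1, cy1, b1 = self._key(x1, y1, t1)
        candidates: List[Point] = []
        # Visit only the cells and time buckets that overlap the query box.
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for b in range(b0, b1 + 1):
                    candidates.extend(self.cells.get((cx, cy, b), ()))
        hits = [p for p in candidates
                if x0 <= p[0] <= x1 and y0 <= p[1] <= y1 and t0 <= p[2] <= t1]
        return hits, len(candidates)
```

As a usage example, load 2,000 points: a 20 by 20 grid of positions at timestamps 0 to 4. Then query the box [0, 4.5] x [0, 4.5] over t in [0, 4]. Both indexes return the same 125 points. `TimeOnlyIndex` examines all 2,000 candidates, while `GridTimeIndex` (cell size 5.0, bucket length 5) examines only 125. A fixed grid trades this insert simplicity for sensitivity to skewed data, since dense cells hold many points. That trade-off is a design choice, not something the text above prescribes.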
