An efficient storage system towards high throughput of concurrent graph processing jobs
Article
Article Title | An efficient storage system towards high throughput of concurrent graph processing jobs |
---|---|
Article Category | Article |
Authors | Zhao, Jin, Jiang, Xinyu, Zhang, Yu, Zhu, Xiaofei, Jin, Hai, Liu, Haikun, Yang, Yun, Zhang, Ji, Wang, Biao and Yu, Ting |
Journal Title | Scientia Sinica Informationis |
Journal Citation | 52 (1), pp. 111-128 |
Number of Pages | 18 |
Year | 2022 |
Publisher | Science China Press Co., Ltd. |
Place of Publication | China |
ISSN | 1674-7267, 2095-9486 |
Web Address (URL) | https://www.sciengine.com/SSI/doi/10.1360/SSI-2021-0020?&trans=true |
Abstract | With the rapidly growing demand for graph analytics, a large number of iterative graph processing jobs are often executed concurrently on the same platform. However, existing graph processing systems are mainly designed to perform a single graph processing job efficiently. Therefore, when concurrent jobs are executed on the same underlying graph, the system suffers from high data access cost. To improve the throughput of concurrent jobs, existing out-of-core concurrent graph processing schemes reduce the data storage and access cost by sharing graph data among these jobs. However, due to the power-law property of real-world graphs and the differences between the concurrent jobs, existing schemes still suffer from a large amount of unnecessary I/O traffic: even if most of the vertices in a static graph partition are inactive or shared by only a few jobs, the partition is still loaded into memory in its entirety for the processing of the concurrent jobs. To solve these problems, we propose an efficient storage system, called GraphDP, to achieve high throughput of concurrent graph processing jobs. It can be integrated into existing out-of-core graph processing systems to reduce the storage and data access overhead. Specifically, GraphDP uses a novel dynamic I/O scheduling strategy, which enables the graph data to be loaded in an optimal I/O mode and effectively reduces the redundant data loaded into the memory and cache. Meanwhile, GraphDP preferentially stores frequently accessed graph data in memory via an efficient caching mechanism, which further reduces the data access overhead. To demonstrate the effectiveness of GraphDP, we integrate it into three state-of-the-art out-of-core graph processing systems: GridGraph, GraphChi and X-Stream. Experimental results show that GraphDP improves the throughput of GridGraph, GraphChi and X-Stream by 1.57–2.19 times, 1.86–2.37 times and 1.62–2.21 times, respectively. |
Public Notes | Files associated with this item cannot be displayed due to copyright restrictions. |
Byline Affiliations | Huazhong University of Science and Technology, China; University of Southern Queensland; Zhejiang Lab, China |
https://research.usq.edu.au/item/z02zv/an-efficient-storage-system-towards-high-throughput-of-concurrent-graph-processing-jobs