TY - GEN
T1 - Optimizing Graph Partition by Optimal Vertex-Cut: A Holistic Approach
AU - Qu, Wenwen
AU - Zhang, Weixi
AU - Cheng, Ji
AU - Zhang, Chaorui
AU - Han, Wei
AU - Bai, Bo
AU - Zhang, Chen Jason
AU - He, Liang
AU - Wang, Xiaoling
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023/7
Y1 - 2023/7
N2 - Graph partitioning is crucial in distributed graph-parallel computing systems, and it is challenging for graph partitioning to optimize the communication cost and load balancing together. Existing state-of-the-art works, such as Powerlyra and TopoX, optimize the load balancing by randomly distributing the edges of high-degree vertices, which inevitably brings a high communication cost that is unbounded. This paper proposes a graph partition model that can minimize communication cost while maximizing load balancing. More specifically, we model the graph partition as the combinatorial design problem. Our proposed model can provide high-quality partition that guarantees that the computing load can be evenly distributed to each worker and minimizes the communication cost with a near-optimal theoretical boundary.Based on the proposed model, we extend the hybrid-cut partitioning algorithm for the power-law graph and propose HCPD, a hybrid-cut partitioning algorithm based on combinatorial design. HCPD uses the proposed model to optimize the load balancing and communication cost simultaneously for high-degree vertices, and assigns the high-degree vertices and their low-degree neighbors to the same workers by label propagation to reduce the overall communication cost. In this way, we partition the low-degree and high-degree vertices holistically and further improve the partition quality, unlike Powerlyra and TopoX, which deal with the two parts independently. Our experiments show that HCPD outperforms Powerlyra on PageRank task by up to 2× faster on real-world power-law graphs with billions of edges.
AB - Graph partitioning is crucial in distributed graph-parallel computing systems, and it is challenging for graph partitioning to optimize the communication cost and load balancing together. Existing state-of-the-art works, such as Powerlyra and TopoX, optimize the load balancing by randomly distributing the edges of high-degree vertices, which inevitably brings a high communication cost that is unbounded. This paper proposes a graph partition model that can minimize communication cost while maximizing load balancing. More specifically, we model the graph partition as the combinatorial design problem. Our proposed model can provide high-quality partition that guarantees that the computing load can be evenly distributed to each worker and minimizes the communication cost with a near-optimal theoretical boundary.Based on the proposed model, we extend the hybrid-cut partitioning algorithm for the power-law graph and propose HCPD, a hybrid-cut partitioning algorithm based on combinatorial design. HCPD uses the proposed model to optimize the load balancing and communication cost simultaneously for high-degree vertices, and assigns the high-degree vertices and their low-degree neighbors to the same workers by label propagation to reduce the overall communication cost. In this way, we partition the low-degree and high-degree vertices holistically and further improve the partition quality, unlike Powerlyra and TopoX, which deal with the two parts independently. Our experiments show that HCPD outperforms Powerlyra on PageRank task by up to 2× faster on real-world power-law graphs with billions of edges.
KW - combinatorial design
KW - graph
KW - hybrid cut
KW - partitioning
UR - http://www.scopus.com/inward/record.url?scp=85167690015&partnerID=8YFLogxK
U2 - 10.1109/ICDE55515.2023.00083
DO - 10.1109/ICDE55515.2023.00083
M3 - Conference article published in proceeding or book
AN - SCOPUS:85167690015
T3 - Proceedings - International Conference on Data Engineering
SP - 1019
EP - 1031
BT - Proceedings - 2023 IEEE 39th International Conference on Data Engineering, ICDE 2023
PB - IEEE Computer Society
T2 - 39th IEEE International Conference on Data Engineering, ICDE 2023
Y2 - 3 April 2023 through 7 April 2023
ER -