TY - GEN
T1 - Scalable and Effective Bipartite Network Embedding
AU - Yang, Renchi
AU - Shi, Jieming
AU - Huang, Keke
AU - Xiao, Xiaokui
N1 - Funding Information:
We thank the anonymous reviewers for their valuable feedback. This work was supported by the the Ministry of Education Singapore (Number MOE2018-T2-2-091). Jieming Shi is supported by Hong Kong RGC ECS (No. 25201221) and PolyU Start-up Fund (P0033898). Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies and organizations.
Publisher Copyright:
© 2022 ACM.
PY - 2022/6/10
Y1 - 2022/6/10
N2 - Given a bipartite graph G consisting of inter-set weighted edges connecting the nodes in two disjoint sets U and V, bipartite network embedding (BNE) maps each node ui in U and vj in V to compact embedding vectors that capture the hidden topological features surrounding the nodes, to facilitate downstream tasks. Effective BNE should preserve not only the direct connections between nodes but also the multi-hop relationships formed alternately by the two types of nodes in G, which can incur prohibitive overheads, especially on massive bipartite graphs with millions of nodes and billions of edges. Existing solutions are hardly scalable to massive bipartite graphs, and often produce low-quality results. This paper proposes GEBE, a generic BNE framework achieving state-of-the-art performance on massive bipartite graphs, via four main algorithmic designs. First, we present two generic measures to capture the multi-hop similarity/proximity between homogeneous/heterogeneous nodes respectively, and the measures can be instantiated with three popular probability distributions, including Poisson, Geometric, and Uniform distributions. Second, GEBE formulates a novel and unified BNE objective to preserve the two measures of all possible node pairs. Third, GEBE includes several efficiency designs to get high-quality embeddings on massive graphs. Finally, we observe that GEBE achieves the best performance when instantiating MHS and MHP using a Poisson distribution, and thus, we further develop GEBEp based on Poisson-instantiated MHS and MHP, with non-trivial efficiency optimizations. Extensive experiments, comparing 15 competitors on 10 real datasets, demonstrate that our solutions, especially GEBEp, obtain superior result utility than all competitors for top-N recommendation and link prediction, while being up to orders of magnitude faster.
AB - Given a bipartite graph G consisting of inter-set weighted edges connecting the nodes in two disjoint sets U and V, bipartite network embedding (BNE) maps each node ui in U and vj in V to compact embedding vectors that capture the hidden topological features surrounding the nodes, to facilitate downstream tasks. Effective BNE should preserve not only the direct connections between nodes but also the multi-hop relationships formed alternately by the two types of nodes in G, which can incur prohibitive overheads, especially on massive bipartite graphs with millions of nodes and billions of edges. Existing solutions are hardly scalable to massive bipartite graphs, and often produce low-quality results. This paper proposes GEBE, a generic BNE framework achieving state-of-the-art performance on massive bipartite graphs, via four main algorithmic designs. First, we present two generic measures to capture the multi-hop similarity/proximity between homogeneous/heterogeneous nodes respectively, and the measures can be instantiated with three popular probability distributions, including Poisson, Geometric, and Uniform distributions. Second, GEBE formulates a novel and unified BNE objective to preserve the two measures of all possible node pairs. Third, GEBE includes several efficiency designs to get high-quality embeddings on massive graphs. Finally, we observe that GEBE achieves the best performance when instantiating MHS and MHP using a Poisson distribution, and thus, we further develop GEBEp based on Poisson-instantiated MHS and MHP, with non-trivial efficiency optimizations. Extensive experiments, comparing 15 competitors on 10 real datasets, demonstrate that our solutions, especially GEBEp, obtain superior result utility than all competitors for top-N recommendation and link prediction, while being up to orders of magnitude faster.
KW - bipartite graphs
KW - network embedding
KW - Poisson distribution
UR - http://www.scopus.com/inward/record.url?scp=85132783145&partnerID=8YFLogxK
U2 - 10.1145/3514221.3517838
DO - 10.1145/3514221.3517838
M3 - Conference article published in proceeding or book
AN - SCOPUS:85132783145
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 1977
EP - 1991
BT - SIGMOD 2022 - Proceedings of the 2022 International Conference on Management of Data
PB - Association for Computing Machinery
T2 - 2022 ACM SIGMOD International Conference on the Management of Data, SIGMOD 2022
Y2 - 12 June 2022 through 17 June 2022
ER -