TY - GEN
T1 - Generalization Error Bounds for Graph Embedding Using Negative Sampling: Linear vs Hyperbolic
AU - Suzuki, Atsushi
AU - Nitanda, Atsushi
AU - Wang, Jing
AU - Xu, Linchuan
AU - Yamanishi, Kenji
AU - Cavazza, Marc
N1 - Funding Information:
This work was partially supported by JST KAKENHI 191400000190, 19K20337, and JST-AIP JPMJCR19U4, Daiwa Foundation Award (Ref: 13849/14682), and JST-PRESTO.
Publisher Copyright:
© 2021 Neural information processing systems foundation. All rights reserved.
PY - 2021/9
Y1 - 2021/9
N2 - Graph embedding, which represents real-world entities in a mathematical space, has enabled numerous applications such as analyzing natural languages, social networks, biochemical networks, and knowledge bases. It has been experimentally shown that graph embedding in hyperbolic space can represent hierarchical tree-like data more effectively than embedding in linear space, owing to hyperbolic space's exponential growth property. However, since the theoretical comparison has been limited to ideal noiseless settings, the potential for the hyperbolic space's property to worsen the generalization error for practical data has not been analyzed. In this paper, we provide a generalization error bound applicable for graph embedding both in linear and hyperbolic spaces under various negative sampling settings that appear in graph embedding. Our bound states that error is polynomial and exponential with respect to the embedding space's radius in linear and hyperbolic spaces, respectively, which implies that hyperbolic space's exponential growth property worsens the error. Using our bound, we clarify the data size condition on which graph embedding in hyperbolic space can represent a tree better than in Euclidean space by discussing the bias-variance trade-off. Our bound also shows that imbalanced data distribution, which often appears in graph embedding, can worsen the error.
AB - Graph embedding, which represents real-world entities in a mathematical space, has enabled numerous applications such as analyzing natural languages, social networks, biochemical networks, and knowledge bases. It has been experimentally shown that graph embedding in hyperbolic space can represent hierarchical tree-like data more effectively than embedding in linear space, owing to hyperbolic space's exponential growth property. However, since the theoretical comparison has been limited to ideal noiseless settings, the potential for the hyperbolic space's property to worsen the generalization error for practical data has not been analyzed. In this paper, we provide a generalization error bound applicable for graph embedding both in linear and hyperbolic spaces under various negative sampling settings that appear in graph embedding. Our bound states that error is polynomial and exponential with respect to the embedding space's radius in linear and hyperbolic spaces, respectively, which implies that hyperbolic space's exponential growth property worsens the error. Using our bound, we clarify the data size condition on which graph embedding in hyperbolic space can represent a tree better than in Euclidean space by discussing the bias-variance trade-off. Our bound also shows that imbalanced data distribution, which often appears in graph embedding, can worsen the error.
UR - http://www.scopus.com/inward/record.url?scp=85131817845&partnerID=8YFLogxK
M3 - Conference article published in proceeding or book
AN - SCOPUS:85131817845
T3 - Advances in Neural Information Processing Systems
SP - 1243
EP - 1255
BT - Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
A2 - Ranzato, Marc'Aurelio
A2 - Beygelzimer, Alina
A2 - Dauphin, Yann
A2 - Liang, Percy S.
A2 - Wortman Vaughan, Jenn
PB - Neural information processing systems foundation
T2 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
Y2 - 6 December 2021 through 14 December 2021
ER -