A general embedding framework for heterogeneous information learning in large-scale networks

Xiao Huang, Jundong Li, Na Zou, Xia Hu

Research output: Journal article publication › Journal article › Academic research › peer-review

11 Citations (Scopus)

Abstract

Network analysis has been widely applied in many real-world tasks, such as gene analysis and targeted marketing. To extract effective features for these analysis tasks, network embedding automatically learns a low-dimensional vector representation for each node such that meaningful topological proximity is well preserved. While embedding algorithms on pure topological structure have attracted considerable attention, in practice nodes are often accompanied by abundant additional meaningful information, such as node attributes, second-order proximity, and link directionality. A general framework for incorporating this heterogeneous information into network embedding could help learn better vector representations. However, jointly embedding the geometrical structure and a distinct type of information remains challenging because of their heterogeneity. In addition, real-world networks often contain a large number of nodes, which places demands on the scalability of embedding algorithms. To bridge the gap, in this article we propose a general embedding framework named Heterogeneous Information Learning in Large-scale networks (HILL) to accelerate the joint learning. It enables the simultaneous assessment of node proximities to be performed in a distributed manner by decomposing the complex modeling and optimization into many simple and independent sub-problems. We validate the significant correlation between the heterogeneous information and the topological structure, and illustrate the generalizability of HILL by applying it to attributed network embedding and second-order proximity learning. A variation is also proposed for modeling link directionality. Experimental results on real-world networks demonstrate the effectiveness and efficiency of HILL.
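The abstract states that HILL decomposes the joint modeling into many simple, independent sub-problems. As an illustration of that general idea only, the sketch below uses a hypothetical objective that is not HILL's actual formulation or solver: a squared-loss factorization of an n×n node-attribute affinity matrix `S`, plus a squared-distance smoothness penalty weighted by the adjacency matrix `W`, minimized by alternating exact block updates. With one factor fixed, each node's embedding row has a closed-form solution that depends only on that node's row of `S` and `W`, so the per-node loop could be spread across workers. The names `per_node_update`, `objective`, `embed`, and the parameter `lam` are assumptions introduced for this example.

```python
import numpy as np


def per_node_update(S, W, Z, lam):
    """Hypothetical per-node step: minimize over H, with Z fixed,
    ||S - H Z^T||_F^2 + lam * sum_ij W_ij ||h_i - z_j||^2.

    Row i solves (Z^T Z + lam * deg_i * I) h_i = Z^T s_i + lam * sum_j W_ij z_j.
    It uses only row i of S and W (plus the shared, fixed Z), so the
    n sub-problems are independent and could run on separate workers.
    """
    n, d = S.shape[0], Z.shape[1]
    ZtZ = Z.T @ Z                      # shared by every sub-problem
    H = np.empty((n, d))
    for i in range(n):                 # embarrassingly parallel loop
        deg = W[i].sum()
        A = ZtZ + lam * deg * np.eye(d)
        b = Z.T @ S[i] + lam * (W[i] @ Z)
        # lstsq also copes with a singular A (e.g. an isolated node)
        H[i] = np.linalg.lstsq(A, b, rcond=None)[0]
    return H


def objective(S, W, H, Z, lam):
    """Value of the hypothetical joint objective for factors H and Z."""
    recon = np.linalg.norm(S - H @ Z.T, "fro") ** 2
    sq_dist = ((H[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return recon + lam * (W * sq_dist).sum()


def embed(S, W, d=2, lam=1.0, iters=20, seed=0):
    """Alternate exact block updates of H and Z (requires iters >= 1)."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((S.shape[0], d))
    for _ in range(iters):
        H = per_node_update(S, W, Z, lam)
        # By symmetry of the objective, the Z step is the same per-node
        # solve applied to the transposed affinity and weight matrices.
        Z = per_node_update(S.T, W.T, H, lam)
    return H, Z


# Toy usage: cosine affinities of random attributes on a 6-node ring graph.
rng = np.random.default_rng(1)
X = rng.random((6, 4))
Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
S = Xn @ Xn.T
W = np.zeros((6, 6))
for i in range(6):
    W[i, (i + 1) % 6] = W[(i + 1) % 6, i] = 1.0

H, Z = embed(S, W, d=2, lam=0.5)
print(H.shape)
```

Because each block update is an exact minimization, the objective cannot increase from one sweep to the next; the distributed speed-up comes from the fact that only the small shared matrix `Z^T Z` has to be broadcast to every worker.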

Original language: English
Article number: a70
Journal: ACM Transactions on Knowledge Discovery from Data
Volume: 12
Issue number: 6
Publication status: Published - Aug 2018
Externally published: Yes

Keywords

  • Data mining
  • Distributed processing
  • Heterogeneity
  • Network embedding

ASJC Scopus subject areas

  • Computer Science (all)
