The (un)supervised NMF methods for discovering overlapping communities as well as hubs and outliers in networks

Xiao Wang, Xiaochun Cao, Di Jin, Yixin Cao, Dongxiao He

Research output: Journal article publication › Journal article › Academic research › peer-review

7 Citations (Scopus)

Abstract

Because community detection is crucial to the study of large-scale networks, many researchers have devoted themselves to detecting communities in various networks. It is now widely agreed that communities usually overlap with each other. In some communities, there exist members that play a special role as hubs (also known as leaders), whose importance merits special attention. Moreover, it has also been observed that some members of the network do not belong to any community in a convincing way and are hence recognized as outliers. Failure to detect and exclude outliers will distort, sometimes significantly, the detected communities. In short, it is preferable for a community detection method to detect all three structures together. This becomes even more interesting, and also more challenging, under the unsupervised assumption, that is, when the number K of communities is not known in advance. Our approach is to define a novel generative model and formalize the detection of overlapping communities, hubs, and outliers as an optimization problem on it. When K is given, we propose a normalized symmetric nonnegative matrix factorization algorithm based on Kullback-Leibler (KL) divergence to learn the parameters of the model. Otherwise, by combining KL divergence with a prior model on the parameters, we introduce another parameter learning method, based on Bayesian symmetric nonnegative matrix factorization, that learns the parameters of the model while determining K. We thus present a community detection method that is arguably the most general in this sense: it detects all three structures together without prior knowledge of the number of communities. Finally, we test the proposed method on various real-world networks. The experimental results, compared with those of several state-of-the-art algorithms, indicate that it outperforms them in terms of both clustering accuracy and community quality.
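To make the idea of symmetric NMF under KL divergence concrete, the following is a minimal sketch of a generic variant, not the normalized algorithm or the Bayesian extension described in the abstract. It assumes a symmetric adjacency matrix A and a given K, approximates A by H Hᵀ with H ≥ 0, and uses a damped multiplicative update that is a common heuristic for the symmetric case. The function names, the damping factor `beta`, and the toy graph are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def kl_divergence(A, H, eps=1e-10):
    """Generalized KL divergence between A and its approximation H @ H.T."""
    A_hat = H @ H.T + eps
    mask = A > 0  # entries with A == 0 contribute only the A_hat term
    return np.sum(A[mask] * np.log(A[mask] / A_hat[mask])) - A.sum() + A_hat.sum()

def symmetric_nmf_kl(A, K, n_iter=500, beta=0.5, eps=1e-10, seed=0):
    """Approximate symmetric nonnegative A by H @ H.T with H >= 0.

    Assumption: a damped multiplicative update (damping factor beta),
    used here as a heuristic; it is not the paper's update rule.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    H = rng.random((n, K)) + eps              # strictly positive start
    for _ in range(n_iter):
        A_hat = H @ H.T + eps                 # current reconstruction
        numer = (A / A_hat) @ H               # data-dependent gradient term
        denom = H.sum(axis=0, keepdims=True) + eps  # column sums, shape (1, K)
        H *= (1.0 - beta) + beta * numer / denom    # damped multiplicative step
    return H

# Toy graph (hypothetical): two 4-node cliques joined by one edge.
A = np.zeros((8, 8))
A[:4, :4] = 1.0
A[4:, 4:] = 1.0
np.fill_diagonal(A, 0.0)
A[3, 4] = A[4, 3] = 1.0

H = symmetric_nmf_kl(A, K=2)
print(np.round(H, 2))
print("KL divergence:", kl_divergence(A, H))
```

In such a sketch, each row of H can be read as a node's nonnegative membership weights over the K communities, so a node with substantial weight in more than one column would be a candidate overlapping member. Identifying hubs and outliers, and choosing K automatically, would need the additional modelling the abstract describes.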
Original language: English
Pages (from-to): 22-34
Number of pages: 13
Journal: Physica A: Statistical Mechanics and its Applications
Volume: 446
Publication status: Published - 15 Mar 2016

Keywords

  • (Bayesian) NMF
  • Hubs
  • Outliers
  • Overlapping community

ASJC Scopus subject areas

  • Statistics and Probability
  • Condensed Matter Physics
