RPC: Representative possible world based consistent clustering algorithm for uncertain data

Han Liu, Xiaotong Zhang, Xianchao Zhang, Qimai Li, Xiao Ming Wu

Research output: Journal article publication, Journal article, Academic research, peer-review

Abstract

Clustering uncertain data is an essential task in data mining and machine learning. Possible world based algorithms seem promising for clustering uncertain data. However, existing possible world based algorithms have two issues: (1) They rely on all the possible worlds and treat them equally, although some marginal possible worlds may have negative effects. (2) They do not make good use of the consistency among possible worlds, since they conduct clustering or construct the affinity matrix on each possible world independently. In this paper, we propose a representative possible world based consistent clustering (RPC) algorithm for uncertain data. First, by introducing a representative loss and using Jensen–Shannon divergence as the distribution measure, we design a heuristic strategy for selecting representative possible worlds, thus avoiding the negative effects caused by marginal possible worlds. Second, we integrate a consistency learning procedure into spectral clustering to handle the representative possible worlds jointly, thus exploiting their consistency to achieve better performance. Experimental results show that our proposed algorithm outperforms existing algorithms in effectiveness and performs competitively in efficiency.
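The abstract names Jensen–Shannon divergence as the distribution measure but does not define the representative loss or the selection heuristic. As a minimal illustration of the measure alone, the sketch below computes the Jensen–Shannon divergence between two discrete distributions. The function names `kl_divergence` and `js_divergence`, the use of NumPy, and the base-2 logarithm are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def kl_divergence(p, q):
    """KL(p || q) in bits; terms where p == 0 contribute nothing."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    Illustrative helper (hypothetical, not the paper's code). Inputs are
    non-negative weight vectors of equal length; they are normalised to
    sum to 1 before comparison.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)  # mixture; m > 0 wherever p > 0 or q > 0
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    same = js_divergence([0.2, 0.8], [0.2, 0.8])
    disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])
    print(same, disjoint)
```

With a base-2 logarithm the value is symmetric in its arguments and lies between 0 (identical distributions) and 1 (distributions with disjoint support), which is what makes it convenient for comparing distributions. How RPC combines this measure with the representative loss is not stated in the abstract.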

Original language: English
Pages (from-to): 128-137
Number of pages: 10
Journal: Computer Communications
Volume: 176
DOIs
Publication status: Published - 1 Aug 2021

Keywords

  • Clustering
  • Consistency learning
  • Possible world
  • Uncertain data

ASJC Scopus subject areas

  • Computer Networks and Communications
