TY - JOUR
T1 - RPC: Representative possible world based consistent clustering algorithm for uncertain data
AU - Liu, Han
AU - Zhang, Xiaotong
AU - Zhang, Xianchao
AU - Li, Qimai
AU - Wu, Xiao Ming
N1 - Funding Information:
The authors are grateful to the editor in chief, the associate editor and the reviewers for their valuable comments and suggestions. This work was supported by National Natural Science Foundation of China (No. 61876028), the grants 1-ZVJJ and G-YBXV funded by the Hong Kong Polytechnic University, Hong Kong, and the Fundamental Research Funds for the Central Universities, China (No. DUT20RC(3)040).
Publisher Copyright:
© 2021 Elsevier B.V.
PY - 2021/8/1
Y1 - 2021/8/1
N2 - Clustering uncertain data is an essential task in data mining and machine learning. Possible world based algorithms seem promising for clustering uncertain data. However, there are two issues in existing possible world based algorithms: (1) They rely on all the possible worlds and treat them equally, but some marginal possible worlds may cause negative effects. (2) They do not well utilize the consistency among possible worlds, since they conduct clustering or construct the affinity matrix on each possible world independently. In this paper, we propose a representative possible world based consistent clustering (RPC) algorithm for uncertain data. First, by introducing representative loss and using Jensen–Shannon divergence as the distribution measure, we design a heuristic strategy for the selection of representative possible worlds, thus avoiding the negative effects caused by marginal possible worlds. Second, we integrate a consistency learning procedure into spectral clustering to deal with the representative possible worlds synergistically, thus utilizing the consistency to achieve better performance. Experimental results show that our proposed algorithm outperforms existing algorithms in effectiveness and performs competitively in efficiency.
KW - Clustering
KW - Consistency learning
KW - Possible world
KW - Uncertain data
UR - http://www.scopus.com/inward/record.url?scp=85107549826&partnerID=8YFLogxK
U2 - 10.1016/j.comcom.2021.06.002
DO - 10.1016/j.comcom.2021.06.002
M3 - Journal article
AN - SCOPUS:85107549826
SN - 0140-3664
VL - 176
SP - 128
EP - 137
JO - Computer Communications
JF - Computer Communications
ER -