Probabilistic cluster structure ensemble

Zhiwen Yu, Le Li, Hau San Wong, Jia You, Guoqiang Han, Yunjun Gao, Guoxian Yu

Research output: Journal article publication > Journal article > Academic research > peer-review

27 Citations (Scopus)

Abstract

Cluster structure ensemble focuses on integrating multiple cluster structures extracted from different datasets into a unified cluster structure, instead of aligning the individual labels from the clustering solutions derived from multiple homogeneous datasets in the cluster ensemble framework. In this article, we design a novel probabilistic cluster structure ensemble framework, referred to as the Gaussian mixture model based cluster structure ensemble framework (GMMSE), to identify the most representative cluster structure from the dataset. Specifically, GMMSE first applies the bagging approach to produce a set of variant datasets. Then, a set of Gaussian mixture models is used to capture the underlying cluster structures of the datasets. GMMSE applies K-means to initialize the parameters of each Gaussian mixture model, and adopts the Expectation Maximization (EM) approach to estimate the parameter values of the model. Next, the components of the Gaussian mixture models are viewed as new data samples, which are used to construct the representative matrix capturing the relationships among components. The similarity between two components, each corresponding to a Gaussian distribution, is measured by the Bhattacharyya distance function. Afterwards, GMMSE constructs a graph based on the new data samples and the representative matrix, and searches for the most representative cluster structure. Finally, we design four criteria to assign the data samples to their corresponding clusters based on the unified cluster structure. The experimental results show that (i) GMMSE works well on synthetic datasets and real datasets in the UCI machine learning repository, and (ii) GMMSE outperforms most of the previous cluster ensemble approaches.
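
The component-comparison step described in the abstract can be illustrated with a minimal sketch. It computes the Bhattacharyya distance between Gaussian components and collects the pairwise distances into a matrix. This is an illustration only, not the authors' implementation: it assumes diagonal covariances to stay dependency-free, the function names `bhattacharyya_diag` and `pairwise_distances` are hypothetical, and the abstract does not specify how GMMSE converts distances into its representative matrix.

```python
import math


def bhattacharyya_diag(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two Gaussians.

    Assumption (not from the paper): both covariances are diagonal,
    given as per-dimension variances, so the determinant and inverse
    reduce to per-dimension sums.
    """
    dist = 0.0
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        v = 0.5 * (v1 + v2)  # averaged variance along this dimension
        # Term penalizing separation of the means.
        dist += 0.125 * (m1 - m2) ** 2 / v
        # Term penalizing mismatch between the two variances.
        dist += 0.5 * math.log(v / math.sqrt(v1 * v2))
    return dist


def pairwise_distances(components):
    """Symmetric matrix of distances among (mean, variance) components."""
    n = len(components)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = bhattacharyya_diag(*components[i], *components[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


# Hypothetical components, e.g. pooled from mixture models fitted on
# different bootstrap samples: (mean vector, per-dimension variances).
components = [
    ([0.0, 0.0], [1.0, 1.0]),
    ([2.0, 0.0], [1.0, 1.0]),
    ([0.0, 0.0], [4.0, 1.0]),
]
distance_matrix = pairwise_distances(components)
```

Identical components have distance zero, and the distance grows as the means move apart or the variances diverge. A graph partitioning method such as normalized cut (listed among the keywords) would then operate on a similarity derived from such a matrix.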
Original language: English
Pages (from-to): 16-34
Number of pages: 19
Journal: Information Sciences
Volume: 267
DOIs
Publication status: Published - 20 May 2014

Keywords

  • Cluster ensemble
  • Gaussian mixture model
  • Normalized cut
  • Structure ensemble

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Theoretical Computer Science
  • Computer Science Applications
  • Information Systems and Management
  • Artificial Intelligence
