A comparison of categorical attribute data clustering methods

Ville Hautamäki, Antti Pöllänen, Tomi Kinnunen, Kong Aik Lee, Haizhou Li, Pasi Fränti

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › Peer-review

11 Citations (Scopus)

Abstract

Clustering data in Euclidean space has a long tradition, and considerable attention has been paid to analyzing several different cost functions. Unfortunately, these results rarely generalize to the clustering of categorical attribute data. Instead, the simple heuristic k-modes is the most commonly used method despite its modest performance. In this study, we model clusters by their empirical distributions and use expected entropy as the objective function. A novel clustering algorithm based on local search for this objective function is designed and compared against six existing algorithms on well-known data sets. The proposed method provides better clustering quality than the other iterative methods at the cost of higher time complexity.
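The objective described in the abstract can be illustrated with a short Python sketch. It assumes that a cluster's entropy is the sum of the entropies of its per-attribute empirical value distributions (attributes treated as independent) and that expected entropy weights each cluster by its share of the data; the paper's exact definitions may differ. The names `cluster_entropy`, `expected_entropy` and `local_search` are hypothetical, and the search shown is a plain single-point reassignment scheme, not necessarily the authors' algorithm.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple

Record = Sequence[str]  # one categorical record, e.g. ("red", "small")


def cluster_entropy(records: List[Record]) -> float:
    """Assumption: sum over attributes of the entropy (nats) of the
    cluster's empirical value distribution, attributes treated as independent."""
    if not records:
        return 0.0
    n = len(records)
    total = 0.0
    for d in range(len(records[0])):
        counts = Counter(r[d] for r in records)
        total -= sum((c / n) * math.log(c / n) for c in counts.values())
    return total


def expected_entropy(data: List[Record], labels: List[int], k: int) -> float:
    """Size-weighted average of cluster entropies: sum_j (n_j / N) * H(C_j)."""
    n = len(data)
    clusters = [[x for x, lab in zip(data, labels) if lab == j] for j in range(k)]
    return sum(len(c) / n * cluster_entropy(c) for c in clusters)


def local_search(data: List[Record], k: int, max_rounds: int = 50,
                 seed: int = 0) -> Tuple[List[int], float]:
    """Hypothetical local search: start from a random partition, then move
    single records to whichever cluster lowers expected entropy the most,
    until a full pass over the data yields no improvement."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in data]
    best = expected_entropy(data, labels, k)
    for _ in range(max_rounds):
        improved = False
        for i in range(len(data)):
            keep = labels[i]  # best cluster found so far for record i
            for j in range(k):
                if j == keep:
                    continue
                labels[i] = j
                cost = expected_entropy(data, labels, k)
                if cost < best - 1e-12:
                    best, keep, improved = cost, j, True
            labels[i] = keep  # commit the best move (or restore the original)
        if not improved:
            break
    return labels, best
```

For example, `local_search([("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")], k=2)` returns a labeling and its expected entropy. Recomputing the whole objective for every candidate move keeps the sketch short but is slow; a practical implementation would update per-cluster value counts incrementally, which is one reason entropy-based local search costs more time than k-modes.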

Original language: English
Title of host publication: Structural, Syntactic, and Statistical Pattern Recognition - Joint IAPR International Workshop, S+SSPR 2014, Proceedings
Publisher: Springer Verlag
Pages: 53-62
Number of pages: 10
ISBN (Print): 9783662444146
Publication status: Published - 2014
Externally published: Yes
Event: Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition, S+SSPR 2014 - Joensuu, Finland
Duration: 20 Aug 2014 to 22 Aug 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8621 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition, S+SSPR 2014
Country/Territory: Finland
City: Joensuu
Period: 20/08/14 to 22/08/14

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
