Discovering clusters in databases containing mixed continuous and discrete-valued attributes

Chun Chung Chan, Lewis L H Chung

Research output: Journal article publication > Conference article > Academic research > peer-review

3 Citations (Scopus)

Abstract

Clustering is concerned with the unsupervised discovery of natural groupings of records in a database. Recently, considerable effort has gone into the development of effective clustering algorithms for spatial data mining. These algorithms are mainly developed to deal with continuous-valued attributes. They typically use a distance measure defined in Euclidean space to determine whether two records should be placed in the same cluster or in disjoint clusters. Since such a distance measure cannot be validly defined in non-Euclidean space, these algorithms cannot handle databases whose records are also characterized by discrete, qualitative, or categorical values. Because many real-world databases contain a mixture of continuous and discrete-valued attributes, a clustering algorithm is needed that can handle data mining tasks involving both. In this paper, we present such a clustering algorithm. The algorithm is divided into two phases: a discretization phase and a genetic algorithm-based clustering phase. The first phase discretizes continuous attributes into discrete attributes using an information-theoretic measure that minimizes the loss of information during the process. The discretized attributes and the original discrete attributes are then used in the second phase, in which clustering is carried out using a genetic algorithm (GA). By representing a specific grouping of records in a chromosome and using a weight of evidence measure as the fitness measure to determine whether such a grouping is meaningful, we present an effective GA for data clustering. To evaluate the effectiveness of the proposed technique, we tested it on real data. The experimental results show that the approach is promising.
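The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the information-theoretic discretization criterion or the weight of evidence measure, so the sketch substitutes equal-width binning for the first phase and a mutual-information score for the fitness function. All function names and parameters (`equal_width_bins`, `mutual_information`, `fitness`, `ga_cluster`, `pop_size`, `mutation_rate`, and so on) are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter

def equal_width_bins(values, k):
    """Phase 1 stand-in: map each continuous value to a bin index in 0..k-1.
    (Assumption: equal-width binning, not the paper's information-theoretic method.)"""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / k
    return [min(int((v - lo) / width), k - 1) for v in values]

def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def fitness(labels, columns):
    """Stand-in fitness (assumption, not the weight of evidence measure):
    total mutual information between cluster labels and each attribute."""
    return sum(mutual_information(labels, col) for col in columns)

def ga_cluster(columns, k, pop_size=30, generations=60,
               mutation_rate=0.05, seed=0):
    """Phase 2 sketch: each chromosome assigns one cluster label per record."""
    rng = random.Random(seed)
    n = len(columns[0])  # number of records; assumed to be at least 2
    pop = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the fitter half unchanged (elitism), refill with offspring.
        pop.sort(key=lambda ch: fitness(ch, columns), reverse=True)
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)       # one-point crossover
            child = a[:cut] + b[cut:]
            child = [rng.randrange(k) if rng.random() < mutation_rate else g
                     for g in child]        # per-gene mutation
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda ch: fitness(ch, columns))

# Usage: one continuous attribute (discretized first) and one categorical one.
heights = [1.1, 1.2, 1.0, 5.1, 5.3, 4.9]
colors = ["red", "red", "red", "blue", "blue", "blue"]
columns = [equal_width_bins(heights, 2), colors]
labels = ga_cluster(columns, k=2)
```

Encoding a grouping as one label per record keeps crossover and mutation trivial, at the cost that two chromosomes with permuted labels describe the same grouping. A practical implementation would likely need to account for that.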
Original language: English
Pages (from-to): 22-30
Number of pages: 9
Journal: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 3695
Publication status: Published - 1 Jan 1999
Event: Proceedings of the 1999 Data Mining and Knowledge Discovery: Theory, Tools, and Technology - Orlando, FL, United States
Duration: 5 Apr 1999 to 6 Apr 1999

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Condensed Matter Physics
