Abstract
In this paper, we present a fuzzy clustering technique for relational databases, intended for data mining tasks. Clustering for data mining applications can be performed more effectively if the technique is able to handle both the continuous- and discrete-valued data commonly found in real-life relational databases. However, many fuzzy clustering techniques, such as fuzzy c-means, were developed only for continuous-valued data because their distance measures are defined in Euclidean space. When records are also characterized by discrete-valued attributes, these techniques cannot be applied. Moreover, how to deal with fuzzy input data in addition to mixed continuous and discrete data has not been clearly discussed. Instead of using a distance measure to define similarity between records, we propose a technique based on a genetic algorithm (GA). By representing a specific grouping of records in a chromosome and using an objective measure as the fitness measure to determine whether such a grouping is meaningful and interesting, our technique is able to handle continuous, discrete, and even fuzzy input data. Unlike many existing clustering techniques, which produce only a grouping with no interpretation, our proposed algorithm is able to generate a set of rules describing the interestingness of the discovered clusters. This feature, in turn, makes the discovered result easier to understand.
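As a rough illustration of the chromosome-and-fitness idea described above, the following is a minimal sketch and not the paper's algorithm. The abstract does not specify the encoding, the objective measure, or the genetic operators, so everything here is an assumption: the chromosome is taken to be one cluster label per record, the hypothetical `fitness` function scores a grouping by mean attribute purity, continuous attributes are assumed to be already binned into labels, and fuzzy input data and rule generation are not covered.

```python
import random
from collections import Counter


def fitness(chromosome, records, k):
    """Assumed objective (not the paper's): mean attribute purity.

    For each cluster and attribute, count the records that share the
    cluster's most common value. Returns a score in (0, 1]; 1.0 means
    every cluster is uniform on every attribute.
    """
    n_attrs = len(records[0])
    matched = 0
    for cluster in range(k):
        members = [r for r, label in zip(records, chromosome) if label == cluster]
        for a in range(n_attrs):
            counts = Counter(m[a] for m in members)
            if counts:  # empty clusters contribute nothing
                matched += counts.most_common(1)[0][1]
    return matched / (len(records) * n_attrs)


def evolve(records, k=2, pop_size=20, generations=50, mutation_rate=0.1, seed=0):
    """Search for a grouping with a simple elitist GA (illustrative only)."""
    rng = random.Random(seed)
    n = len(records)
    # Each chromosome assigns one cluster label to every record.
    population = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda c: fitness(c, records, k), reverse=True)
        survivors = population[: pop_size // 2]  # keep the fitter half unchanged
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: occasionally reassign a record to a random cluster.
            child = [rng.randrange(k) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = survivors + children
    return max(population, key=lambda c: fitness(c, records, k))


# Toy records mixing a discrete attribute with a binned continuous one.
records = [("red", "low"), ("red", "low"), ("red", "mid"),
           ("blue", "high"), ("blue", "high"), ("blue", "mid")]
best = evolve(records, k=2)
```

Because the fitness is computed from value counts rather than distances, the same loop applies to discrete attributes without requiring a Euclidean embedding, which is the property the abstract emphasizes. A purity score is only one possible stand-in for the objective measure.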
Original language | English |
---|---|
Pages (from-to) | 11-21 |
Number of pages | 11 |
Journal | Proceedings of SPIE - The International Society for Optical Engineering |
Volume | 4057 |
Publication status | Published - 1 Jan 2000 |
Event | Data Mining and Knowledge Discovery: Theory, Tools, and Technology II - Orlando, FL, United States; Duration: 24 Apr 2000 → 25 Apr 2000 |
ASJC Scopus subject areas
- Electrical and Electronic Engineering
- Condensed Matter Physics