Abstract
In business and industry today, large databases with mixed data types (continuous and categorical) are very common, and there is a great need to discover patterns in them for knowledge interpretation and understanding. In the past, for classification, this problem has been solved as a discrete-data problem by first discretizing the continuous data based on the class-attribute interdependence relationship. However, no proper solution yet exists when class information is unavailable. As a result, important pattern post-processing tasks such as pattern clustering and summarization cannot be applied to mixed-mode data. This paper presents a new method for solving this problem. It is based on two essential concepts. (1) Although class information is absent, in a correlated dataset the attribute with the strongest interdependence with the others in the group can be used to drive the discretization of the continuous data. (2) For a large database, correlated attribute groups must first be obtained by attribute clustering before (1) can be applied. Based on (1) and (2), pattern discovery methods are developed for mixed-mode data. Extensive experiments using synthetic and real-world data were conducted to validate the usefulness and effectiveness of the proposed method.
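To make concept (1) concrete, here is a minimal sketch, not the authors' algorithm. It assumes a correlated attribute group has already been found (the attribute-clustering step (2) is not shown), measures interdependence with empirical mutual information (one of the listed keywords), picks the categorical attribute with the largest total mutual information to the rest of the group as the "driver", and then discretizes a continuous attribute with a simple greedy cut-point search that maximizes mutual information with that driver. The greedy search and all function names (`mutual_information`, `strongest_attribute`, `discretize`, `driven_cuts`) are hypothetical illustrations chosen for this example.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written in terms of raw counts
        mi += (c / n) * log(c * n / (px[x] * py[y]))
    return mi


def strongest_attribute(columns):
    """Return the attribute whose summed MI with the other attributes is largest.

    `columns` maps attribute name -> list of discrete values (hypothetical layout).
    """
    def total(name):
        return sum(mutual_information(columns[name], columns[other])
                   for other in columns if other != name)
    return max(columns, key=total)


def discretize(values, cuts):
    """Map each continuous value to the index of its interval, given cut points."""
    return [sum(v > c for c in cuts) for v in values]


def driven_cuts(values, driver, max_intervals=3):
    """Greedily add cut points (midpoints between adjacent distinct values) that
    most increase MI between the discretized values and the driver attribute.
    This greedy heuristic is an illustrative assumption, not the paper's method."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    cuts = []
    best_mi = 0.0
    while len(cuts) < max_intervals - 1:
        best = None
        for c in candidates:
            if c in cuts:
                continue
            mi = mutual_information(discretize(values, sorted(cuts + [c])), driver)
            if mi > best_mi + 1e-12:  # keep only strict improvements
                best_mi, best = mi, c
        if best is None:
            break  # no remaining cut increases MI with the driver
        cuts = sorted(cuts + [best])
    return cuts


if __name__ == "__main__":
    group = {"a": ["x", "x", "y", "y"],
             "b": ["p", "p", "q", "q"],
             "c": ["m", "n", "m", "n"]}
    driver_name = strongest_attribute(group)
    income = [1.0, 1.2, 5.0, 5.3]  # hypothetical continuous attribute
    cuts = driven_cuts(income, group[driver_name])
    print(driver_name, cuts, discretize(income, cuts))
```

In this toy group, attributes `a` and `b` determine each other while `c` is independent of both, so `a` (the first attribute with the maximal total) is chosen as the driver, and a single cut between 1.2 and 5.0 already captures all of its information. A second cut is rejected because it would not raise the mutual information any further.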
Original language | English |
---|---|
Title of host publication | CIKM'10 - Proceedings of the 19th International Conference on Information and Knowledge Management and Co-located Workshops |
Pages | 859-868 |
Number of pages | 10 |
DOIs | |
Publication status | Published - 1 Dec 2010 |
Event | 19th International Conference on Information and Knowledge Management and Co-located Workshops, CIKM'10 - Toronto, ON, Canada. Duration: 26 Oct 2010 → 30 Oct 2010 |
Conference
Conference | 19th International Conference on Information and Knowledge Management and Co-located Workshops, CIKM'10 |
---|---|
Country/Territory | Canada |
City | Toronto, ON |
Period | 26/10/10 → 30/10/10 |
Keywords
- Attribute clustering
- Data mining
- Mixed mode data
- Mutual information
- Pattern discovery
- Unsupervised discretization
ASJC Scopus subject areas
- General Decision Sciences
- General Business, Management and Accounting