Abstract
When classifying tumors using gene expression data, mining tasks commonly make use of only a single data set. However, classification models based on patterns extracted from a single data set are often not representative of an entire population, and heterogeneous samples subsequently applied to these models may not fit them, leading to degraded performance. In short, there is no guarantee that mining results based on a single gene expression data set will be reliable or robust (Miller et al., 2002). This problem can be addressed with classification algorithms capable of handling multiple, heterogeneous gene expression data sets. Besides improving mining performance, such algorithms would make mining results less sensitive to variations across microarray platforms and to the experimental conditions embedded in heterogeneous gene expression data sets.
Original language | English |
---|---|
Title of host publication | Encyclopedia of data warehousing and mining |
Publisher | Idea Group Publishing |
Pages | 550-554 |
Number of pages | 5 |
ISBN (Print) | 1591405572, 9781591405573 |
Publication status | Published - 2005 |