Classification of heterogeneous gene expression data

B.Y.M. Fung, Vincent To Yee Ng

Research output: Journal article publication > Journal article > Academic research


Recent advances in DNA microarray analysis are intensively applied to disease classification, especially cancer classification. Most recently proposed gene expression classifiers can successfully classify testing samples obtained from the same microarray experiment as the training samples, under the assumption that the symmetric errors are constant among training and testing samples. However, classification performance degrades with heterogeneous testing samples obtained from different microarray experiments. In this paper, we propose "impact factors" (IFs) to measure the variations between individual classes in training samples and heterogeneous testing samples, and integrate the IFs into classifiers for the classification of heterogeneous samples. Two publicly available lung adenocarcinoma gene expression data sets are used in our experiments to demonstrate the effectiveness of the IFs. The results show that integrating the IFs into the Golub and Slonim (GS) and k-nearest neighbors (kNN) classifiers further improves their classification accuracy on heterogeneous samples. Moreover, the classification accuracy of the integrated GS classifier is around 90%.
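
For readers unfamiliar with the GS classifier named in the abstract, the following is a minimal sketch of the baseline Golub and Slonim weighted-voting scheme for two classes. It does not implement the impact factors, which the abstract does not define, and it is not the authors' code. The function names `train_gs` and `predict_gs`, the toy expression values, and the use of every gene (rather than a selected informative subset) are assumptions made for illustration.

```python
from statistics import mean, stdev


def train_gs(samples, labels):
    """Compute a (weight, boundary) pair per gene for classes 0 and 1.

    weight   = signal-to-noise ratio (mu0 - mu1) / (sd0 + sd1)
    boundary = midpoint of the two class means
    Assumes each class has at least two samples and nonzero spread.
    """
    n_genes = len(samples[0])
    model = []
    for g in range(n_genes):
        class0 = [s[g] for s, y in zip(samples, labels) if y == 0]
        class1 = [s[g] for s, y in zip(samples, labels) if y == 1]
        mu0, mu1 = mean(class0), mean(class1)
        weight = (mu0 - mu1) / (stdev(class0) + stdev(class1))
        boundary = (mu0 + mu1) / 2
        model.append((weight, boundary))
    return model


def predict_gs(model, sample):
    """Sum the per-gene votes; a positive total favours class 0."""
    total = sum(w * (x - b) for (w, b), x in zip(model, sample))
    return 0 if total > 0 else 1


# Hypothetical toy data: four training samples, two genes each.
train = [[5.0, 1.0], [6.0, 1.5], [1.0, 4.0], [2.0, 5.0]]
labels = [0, 0, 1, 1]
model = train_gs(train, labels)
```

Calling `predict_gs(model, [5.5, 1.2])` on this toy model assigns the sample to class 0, because both genes vote on the class 0 side of their boundaries.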
Original language: English
Pages (from-to): 69-78
Number of pages: 10
Journal: ACM SIGKDD Explorations newsletter
Issue number: 2
Publication status: Published - 2003


  • Classification
  • Feature selection
  • Gene expression data
  • Significance analysis of microarrays

