Representative distance: A new similarity measure for class discovery from gene expression data

Zhiwen Yu, Jia You, Le Li, Hau San Wong, Guoqiang Han

Research output: Journal article publication > Journal article > Academic research > peer-review

14 Citations (Scopus)


Similarity measurement is one of the most important stages in the process of cancer discovery from gene expression data. Traditional distance functions, such as the Euclidean distance, the correlation coefficient measure, the cosine distance, and so on, are selected to quantify the similarity between two cancer samples. However, these measures do not take into account the properties of cancer samples and do not consider the relationships among the genes in gene expression data. In order to explore the properties of cancer samples and the relationships among genes, we design a new similarity measure called representative distance (RD) to identify cancer samples in gene expression data. Specifically, RD does not compute the distance between two cancer samples using all the genes, but only calculates the similarity using representative genes selected by the affinity propagation algorithm. Then, a similarity matrix is constructed based on the representative distance. Finally, the spectral clustering algorithm is adopted to partition the similarity matrix and discover the biologically meaningful samples. To our knowledge, this is the first time that the representative distance has been applied to class discovery for gene expression data. Experiments on real cancer datasets indicate that our similarity measure can i) outperform most of the traditional distance measures, and ii) identify cancer samples correctly in most of the datasets.
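The core idea in the abstract, measuring the distance between two samples over a subset of representative genes rather than over all genes, can be illustrated with a minimal sketch. The abstract does not give the exact formula for RD, so everything concrete here is an assumption made for illustration: the Euclidean distance restricted to the representative genes, the Gaussian kernel used to turn distances into similarities, the function names `representative_distance` and `similarity_matrix`, the `sigma` parameter, and the toy data. The representative gene indices `rep_idx` are taken as given; in the described method they would come from affinity propagation, and the resulting matrix would then be passed to spectral clustering, neither of which is shown.

```python
import math


def representative_distance(x, y, rep_idx):
    """Euclidean distance between samples x and y over the genes in rep_idx only.

    Assumption: the paper's exact RD formula is not stated in the abstract;
    a restricted Euclidean distance is used here purely as an illustration.
    """
    return math.sqrt(sum((x[g] - y[g]) ** 2 for g in rep_idx))


def similarity_matrix(samples, rep_idx, sigma=1.0):
    """Symmetric sample-by-sample similarity matrix built from the distance above.

    Assumption: a Gaussian kernel exp(-d^2 / (2 * sigma^2)) converts distance
    to similarity; sigma is a hypothetical bandwidth parameter.
    """
    n = len(samples)
    sim = [[1.0] * n for _ in range(n)]  # each sample is fully similar to itself
    for i in range(n):
        for j in range(i + 1, n):
            d = representative_distance(samples[i], samples[j], rep_idx)
            sim[i][j] = sim[j][i] = math.exp(-(d * d) / (2.0 * sigma * sigma))
    return sim


# Toy data: 3 samples x 4 genes (made-up values, not from the paper).
samples = [
    [1.0, 5.0, 1.1, 9.0],
    [1.2, 0.0, 1.0, 9.1],
    [4.0, 5.1, 4.2, 2.0],
]

# Hypothetical representative genes, standing in for the exemplars that
# affinity propagation would select.
rep_idx = [0, 3]

S = similarity_matrix(samples, rep_idx, sigma=2.0)
```

In this toy example the first two samples differ strongly on gene 1, which is not a representative gene, so that difference does not affect their similarity; they end up far more similar to each other than either is to the third sample.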
Original language: English
Article number: 6261551
Pages (from-to): 341-351
Number of pages: 11
Journal: IEEE Transactions on Nanobioscience
Issue number: 4
Publication status: Published - 17 Dec 2012


  • Cancer discovery
  • distance
  • microarray
  • similarity measure

ASJC Scopus subject areas

  • Biotechnology
  • Bioengineering
  • Medicine (miscellaneous)
  • Biomedical Engineering
  • Pharmaceutical Science
  • Computer Science Applications
  • Electrical and Electronic Engineering


