A novel approach for discovering overlapping clusters in gene expression data

Patrick C.H. Ma, Chun Chung Chan

Research output: Journal article publication › Journal article › Academic research › peer-review

12 Citations (Scopus)

Abstract

Many existing clustering algorithms have been used to identify coexpressed genes in gene expression data. These algorithms mainly partition the data, in the sense that each gene is allowed to belong to only one cluster. Because proteins typically interact with different groups of proteins to serve different biological roles, the genes that produce these proteins are expected to coexpress with more than one group of genes. In other words, some genes are expected to belong to more than one cluster. This poses a challenge for gene expression data clustering, as overlapping clusters must be discovered in a noisy environment. For this task, we propose an effective information-theoretic approach that consists of an initial clustering phase followed by a reclustering phase. The proposed approach has been tested with both simulated and real expression data. Experimental results show that it can improve the performance of existing clustering algorithms and can effectively uncover interesting patterns in noisy gene expression data, on the basis of which overlapping clusters can be discovered.
Original language: English
Article number: 4785521
Pages (from-to): 1803-1809
Number of pages: 7
Journal: IEEE Transactions on Biomedical Engineering
Volume: 56
Issue number: 7
Publication status: Published - 1 Jul 2009

Keywords

  • Bioinformatics
  • Data mining
  • Gene expression data clustering
  • Information theory

ASJC Scopus subject areas

  • Biomedical Engineering
