Mining gene expression patterns for the discovery of overlapping clusters

Patrick C.H. Ma, Chun Chung Chan

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

Abstract

Many clustering algorithms have been used to identify co-expressed genes in gene expression data. Because proteins typically interact with different groups of proteins to serve different biological roles when responding to different external stimuli, the genes that produce these proteins are expected to co-express with more than one group of genes and therefore to belong to more than one cluster. This poses a challenge to existing clustering algorithms, as overlapping clusters must be discovered in a noisy environment. For this reason, we propose in this paper an effective clustering approach that consists of an initial clustering phase and a second re-clustering phase. The proposed approach has several desirable features. It makes use of both local and global information inherent in gene expression data to discover overlapping clusters, computing both a local pairwise similarity measure between gene expression profiles and a global probabilistic measure of the interestingness of hidden patterns. When performing re-clustering, it is able to distinguish between relevant and irrelevant expression data. In addition, it makes explicit the patterns discovered in each cluster for easy interpretation. For performance evaluation, the proposed approach has been tested with both simulated and real expression data sets. Experimental results show that it effectively uncovers interesting patterns in noisy gene expression data; based on these patterns, overlapping clusters can be discovered, and the expression levels at which each cluster of genes co-expresses under different conditions can be better understood.
Original language: English
Title of host publication: Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics - 6th European Conference, EvoBIO 2008, Proceedings
Pages: 117-128
Number of pages: 12
Publication status: Published - 21 Jul 2008
Event: 6th European Conference on Evolutionary Computation, Machine Learning, and Data Mining in Bioinformatics, EvoBIO 2008 - Naples, Italy
Duration: 26 Mar 2008 → 28 Mar 2008

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4973 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 6th European Conference on Evolutionary Computation, Machine Learning, and Data Mining in Bioinformatics, EvoBIO 2008
Country/Territory: Italy
City: Naples
Period: 26/03/08 → 28/03/08

Keywords

  • Bioinformatics
  • Clustering
  • Data mining
  • Gene expression data analysis

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
