A graph-based algorithm for mining multi-level patterns in genomic data

Winnie W M Lam, Chun Chung Chan, David K Y Chiu, Andrew K C Wong

Research output: Journal article publicationJournal articleAcademic researchpeer-review

2 Citations (Scopus)

Abstract

Comparative genomics is concerned with the study of genome structure and function of different species. It can provide useful information for the derivation of evolutionary and functional relationships between genomes. Previous work on genome comparison focuses mainly on comparing the entire genomes for visualization without further analysis. As many interesting patterns may exist between genomes and may lead to the discovering of functional gene segments (groups of genes), we propose an algorithm called Multi-Level Genome Comparison Algorithm (MGC) that can be used to facilitate the analysis of genomes at multi-levels during the comparison process to discover sequential and regional consistency in gene segments. Different genomes may have common sub-sequences that differ from each other due to mutations, lateral gene transfers, gene rearrangements, etc., and these sub-sequences are usually not easily identified. Not all the genes can have a perfect one-to-one matching with each other. It is quite possible for one-to-many or many-to-many ambiguous relationships to exist between them. To perform the tasks effectively, MGC takes such ambiguity into consideration during genome comparison by representing genomes in a graph and then make use of a graph mining algorithm called the Multi-Level Attributed Graph Mining Algorithm (MAGMA) to build a hierarchical multi-level graph structure to facilitate genome comparison. To determine the effectiveness of these proposed algorithms, experiments were performed using intra- and inter-species of Microbial genomes. The results show that the proposed algorithms are able to discover multiple level matching patterns that show the similarities and dissimilarities among different genomes, in addition to confirming the specific role of the genes in the genomes.
Original languageEnglish
Pages (from-to)789-807
Number of pages19
JournalJournal of Bioinformatics and Computational Biology
Volume8
Issue number5
DOIs
Publication statusPublished - 1 Oct 2010

Keywords

  • gene segment
  • Genome comparison
  • graph mining algorithm
  • multi-level

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computer Science Applications

Cite this