Abstract
Graph data mining algorithms are increasingly applied to biological graph datasets. However, while existing graph mining algorithms can identify frequently occurring sub-graphs, these sub-graphs do not necessarily represent useful patterns. In this paper, we propose a novel graph mining algorithm, MIGDAC (Mining Graph DAta for Classification), that applies graph theory and an interestingness measure to discover interesting sub-graphs that both characterize a class and distinguish it easily from other classes. Applying MIGDAC to the discovery of specific patterns of chemical compounds, we first represent each chemical compound as a graph and transform it into a set of hierarchical graphs. This representation not only carries more information than traditional formats but also simplifies the complex graph structures. We then apply MIGDAC to extract a set of class-specific patterns, defined in terms of an interestingness threshold and measured with residue analysis. The next step is to use weight of evidence to estimate whether each identified class-specific pattern positively or negatively characterizes a class of drugs. Experiments on a drug dataset from the KEGG ligand database show that MIGDAC, using the hierarchical graph representation, greatly improves on the accuracy of traditional frequent graph mining algorithms.
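The abstract names residue analysis and weight of evidence but does not give their formulas. The following is a minimal sketch, assuming the commonly used adjusted-residual statistic and a log-likelihood-ratio form of weight of evidence; the function names, the count-based inputs, and the 1.96 threshold are illustrative assumptions, not details taken from the paper.

```python
import math

# Assumed inputs (hypothetical, for illustration only):
#   o         - number of compounds in the class that contain the sub-graph
#   row_total - number of compounds (any class) that contain the sub-graph
#   col_total - number of compounds in the class
#   n         - total number of compounds


def adjusted_residual(o, row_total, col_total, n):
    """Adjusted residual of an observed count against independence."""
    e = row_total * col_total / n                  # expected count
    z = (o - e) / math.sqrt(e)                     # standardized residual
    v = (1 - row_total / n) * (1 - col_total / n)  # variance estimate of z
    return z / math.sqrt(v)


def weight_of_evidence(o, row_total, col_total, n):
    """log of P(pattern | class) / P(pattern | not class)."""
    p_in = o / col_total
    p_out = (row_total - o) / (n - col_total)
    if p_out == 0:
        return math.inf    # pattern never occurs outside the class
    if p_in == 0:
        return -math.inf   # pattern never occurs inside the class
    return math.log(p_in / p_out)


# Toy counts, not data from the paper.
d = adjusted_residual(30, 40, 50, 100)
w = weight_of_evidence(30, 40, 50, 100)
is_interesting = abs(d) > 1.96  # assumed 95% significance threshold
```

Under these assumptions, a positive weight means the sub-graph supports membership in the class and a negative weight counts against it, which matches the positive or negative characterization described in the abstract.
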
Original language | English |
---|---|
Title of host publication | Proceedings - IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2008 |
Pages | 321-324 |
Number of pages | 4 |
Publication status | Published - 1 Dec 2008 |
Event | 2008 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2008 - Philadelphia, PA, United States Duration: 3 Nov 2008 → 5 Nov 2008 |
Conference
Conference | 2008 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2008 |
---|---|
Country/Territory | United States |
City | Philadelphia, PA |
Period | 3 Nov 2008 → 5 Nov 2008 |
ASJC Scopus subject areas
- Molecular Biology
- Information Systems
- Biomedical Engineering