Abstract
With an exponentially growing number of graphs from disparate repositories, there is a strong need to analyze a graph database containing an extensive collection of small- or medium-sized data graphs (eg chemical compounds). Although subgraph enumeration and subgraph mining have been proposed to bring insights into a graph database by a set of subgraph structures, they often end up with similar or homogenous topologies, which is undesirable in many graph applications. To address this limitation, we propose the <italic>Top-k Edge-Diversified Patterns Discovery problem</italic> to retrieve a set of subgraphs that cover the maximum number of edges in a database. To efficiently process such query, we present a generic and extensible framework called <inline-formula><tex-math notation="LaTeX">$\textsc {Ted}^+$</tex-math></inline-formula> which achieves a guaranteed approximation ratio to the optimal result. Three optimization strategies are further developed to improve the performance, and a lightweight version called <sc>TedLite</sc> is designed for even larger graph databases. Experimental studies on real-world datasets demonstrate the superiority of <inline-formula><tex-math notation="LaTeX">$\textsc {Ted}^+$</tex-math></inline-formula> to traditional techniques.
Original language | English |
---|---|
Article number | 10241997 |
Pages (from-to) | 1-14 |
Number of pages | 14 |
Journal | IEEE Transactions on Knowledge and Data Engineering |
DOIs | |
Publication status | Accepted/In press - 6 Sept 2023 |
Keywords
- Computer science
- Data mining
- Databases
- Edge-Diversified Patterns
- Graph Database
- Indexing
- Optimization
- Subgraph Enumeration
- Subgraph Mining
- Topology
- Visualization
ASJC Scopus subject areas
- Information Systems
- Computer Science Applications
- Computational Theory and Mathematics