Optimizing nonindexed join processing in flash storage-based systems

Yu Li, Sai Tung On, Jianliang Xu, Byron Choi, Haibo Hu

Research output: Journal article publication › Journal article › Academic research › peer-review

7 Citations (Scopus)

Abstract

Flash memory-based disks (or simply flash disks) have been widely used in today's computer systems. With their continuously increasing capacity and dropping price, it is envisioned that some database systems will operate on flash disks in the near future. However, the I/O characteristics of flash disks differ from those of magnetic hard disks. Motivated by this, we study join processing, the core of query processing in row-based database systems, on flash storage media. More specifically, we propose a new framework, called DigestJoin, to optimize nonindexed join processing by reducing the intermediate result size and exploiting the fast random reads of flash disks. DigestJoin consists of two phases: 1) projecting the join attributes, followed by a join on the projected attributes, and 2) fetching the full tuples that satisfy the join to produce the final join results. Although the problem of tuple/page fetching with the minimum I/O cost (in the second phase) is intractable, we propose three heuristic page-fetching strategies for flash disks. We have implemented DigestJoin and conducted extensive experiments on a real flash disk. Our evaluation results on TPC-H data sets show that DigestJoin clearly outperforms the traditional sort-merge join and hash join under a wide range of system configurations.
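The two-phase idea described above can be illustrated with a minimal in-memory sketch. It assumes a hypothetical storage layout in which a table is a list of pages and each page is a list of rows; the names `project`, `join_digests`, `PageReader`, and `fetch` are illustrative, not taken from the paper. Phase 1 keeps only (join key, tuple id) pairs and joins them (an in-memory hash join is assumed here), and phase 2 fetches full tuples, counting page reads as a stand-in for random reads on a flash disk. The fetch order used (group by left page, read each distinct right page once per group) is a simple illustrative heuristic, not one of the three strategies proposed in the paper.

```python
from collections import defaultdict
from typing import Any, List, Tuple

# Hypothetical storage layout: a table is a list of pages, a page is a list of rows.
Row = dict
Page = List[Row]
Table = List[Page]
TupleId = Tuple[int, int]  # (page number, slot within the page)


def project(table: Table, attr: str) -> List[Tuple[Any, TupleId]]:
    """Phase 1a: keep only (join key, tuple id) pairs, a compact 'digest' of the table."""
    return [(row[attr], (p, s))
            for p, page in enumerate(table)
            for s, row in enumerate(page)]


def join_digests(left, right) -> List[Tuple[TupleId, TupleId]]:
    """Phase 1b: equi-join the projected digests (an in-memory hash join here)."""
    buckets = defaultdict(list)
    for key, tid in left:
        buckets[key].append(tid)
    return [(ltid, rtid) for key, rtid in right for ltid in buckets.get(key, [])]


class PageReader:
    """Counts page reads as a stand-in for random reads on a flash disk."""

    def __init__(self, table: Table):
        self.table = table
        self.reads = 0

    def read(self, page_no: int) -> Page:
        self.reads += 1
        return self.table[page_no]


def fetch(pairs, left_reader: PageReader, right_reader: PageReader):
    """Phase 2: fetch full tuples for each matching tuple-id pair.

    Illustrative heuristic only: group pairs by left page so each left page is
    read once, and read each distinct right page once per left-page group.
    """
    by_left_page = defaultdict(list)
    for (lp, ls), rtid in pairs:
        by_left_page[lp].append((ls, rtid))
    results = []
    for lp in sorted(by_left_page):
        lpage = left_reader.read(lp)
        entries = by_left_page[lp]
        right_pages = {rp: right_reader.read(rp)
                       for rp in sorted({rtid[0] for _, rtid in entries})}
        for ls, (rp, rs) in entries:
            results.append((lpage[ls], right_pages[rp][rs]))
    return results


# Usage on a toy two-table example.
orders = [[{"okey": 1, "cust": 10}, {"okey": 2, "cust": 20}],
          [{"okey": 3, "cust": 10}]]
customers = [[{"cust": 10, "name": "A"}],
             [{"cust": 20, "name": "B"}, {"cust": 30, "name": "C"}]]

pairs = join_digests(project(orders, "cust"), project(customers, "cust"))
left_reader, right_reader = PageReader(orders), PageReader(customers)
results = fetch(pairs, left_reader, right_reader)
for o, c in results:
    print(o["okey"], c["name"])
print("left page reads:", left_reader.reads, "right page reads:", right_reader.reads)
```

On this toy input the sketch prints the matches `1 A`, `2 B`, `3 A` and reports 2 left-page reads and 3 right-page reads, whereas fetching both pages for each of the 3 pairs independently would issue 6 reads. A real implementation would also have to handle projected digests that do not fit in memory, which this sketch ignores.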

Original language: English
Article number: 6189311
Pages (from-to): 1417-1431
Number of pages: 15
Journal: IEEE Transactions on Computers
Volume: 62
Issue number: 7
Publication status: Published - 5 Jun 2013
Externally published: Yes

Keywords

  • flash memory
  • joins
  • Query processing
  • relational databases

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Software
  • Hardware and Architecture
  • Computational Theory and Mathematics
