GHive: Accelerating Analytical Query Processing in Apache Hive via CPU-GPU Heterogeneous Computing

Haotian Liu, Bo Tang, Jiashu Zhang, Yangshen Deng, Xiao Yan, Xinying Zheng, Qiaomu Shen, Dan Zeng, Zunyao Mao, Chaozu Zhang, Zhengxin You, Zhihao Wang, Runzhe Jiang, Fang Wang, Man Lung Yiu, Huan Li, Mingji Han, Qian Li, Zhenghai Luo

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

9 Citations (Scopus)

Abstract

As a popular distributed data warehouse system, Apache Hive has been widely used for big data analytics in many organizations. Meanwhile, exploiting the massive parallelism of GPUs to accelerate online analytical processing (OLAP) has been extensively explored in the database community. In this paper, we present GHive, which enhances CPU-based Hive via CPU-GPU heterogeneous computing. GHive is designed for business intelligence applications and provides the same API as Hive for compatibility. To run SQL queries jointly on both CPU and GPU, GHive comes with three key techniques: (i) a novel data model, gTable, which is column-based and enables efficient data movement between CPU memory and GPU memory; (ii) a GPU-based operator library, Panda, which provides a complete set of SQL operators with extensively optimized GPU implementations; (iii) a hardware-aware MapReduce job placement scheme, which places each job judiciously on either the GPU or the CPU via a cost-based approach. In the experiments, we observe that GHive outperforms Hive in both query processing speed and operating expense on the Star Schema Benchmark (SSB).

Original language: English
Title of host publication: SoCC 2022 - Proceedings of the 13th Symposium on Cloud Computing
Publisher: Association for Computing Machinery, Inc
Pages: 158-172
Number of pages: 15
ISBN (Electronic): 9781450394147
Publication status: Published - 7 Nov 2022
Event: 13th Annual ACM Symposium on Cloud Computing, SoCC 2022 - San Francisco, United States
Duration: 7 Nov 2022 to 11 Nov 2022

Publication series

Name: SoCC 2022 - Proceedings of the 13th Symposium on Cloud Computing

Conference

Conference: 13th Annual ACM Symposium on Cloud Computing, SoCC 2022
Country/Territory: United States
City: San Francisco
Period: 7/11/22 to 11/11/22

ASJC Scopus subject areas

  • Artificial Intelligence
  • Information Systems
  • Software
  • Computational Theory and Mathematics
  • Computer Science Applications
