LRC: Dependency-aware cache management for data analytics clusters

Yinghao Yu, Wei Wang, Jun Zhang, Khaled Ben Letaief

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

43 Citations (Scopus)

Abstract

Memory caches are being aggressively used in today's data-parallel systems such as Spark, Tez, and Piccolo. However, prevalent systems employ rather simple cache management policies - notably the Least Recently Used (LRU) policy - that are oblivious to the application semantics of data dependency, expressed as a directed acyclic graph (DAG). Without this knowledge, memory caching can at best be performed by 'guessing' the future data access patterns based on historical information (e.g., the access recency and/or frequency), which frequently results in inefficient, erroneous caching with a low hit ratio and a long response time. In this paper, we propose a novel cache replacement policy, Least Reference Count (LRC), which exploits the application-specific DAG information to optimize the cache management. LRC evicts the cached data blocks whose reference count is the smallest. The reference count is defined, for each data block, as the number of dependent child blocks that have not been computed yet. We demonstrate the efficacy of LRC through both empirical analysis and cluster deployments against popular benchmarking workloads. Our Spark implementation shows that, compared with LRU, LRC speeds up typical applications by 60%.
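The eviction rule described in the abstract can be illustrated with a small sketch. This is not the authors' Spark implementation; it is a toy, single-process model that assumes the DAG is known up front as a mapping from each block to its dependent child blocks. The class name `LRCCache` and the methods `put` and `mark_computed` are hypothetical names chosen for this illustration, and breaking ties by block id is likewise an assumption.

```python
from collections import defaultdict


class LRCCache:
    """Toy Least Reference Count cache (illustrative sketch, not the paper's code)."""

    def __init__(self, capacity, children):
        # children: dict mapping a block id to the ids of blocks that depend on it.
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.cached = set()
        self.computed = set()
        # Reference count = number of dependent child blocks not yet computed.
        self.ref_count = {b: len(set(kids)) for b, kids in children.items()}
        # Reverse index so a finished child can decrement its parents' counts.
        self.parents = defaultdict(set)
        for block, kids in children.items():
            for kid in kids:
                self.parents[kid].add(block)

    def mark_computed(self, block):
        """Record that `block` has been computed; its parents lose one reference."""
        if block in self.computed:
            return
        self.computed.add(block)
        for parent in self.parents[block]:
            self.ref_count[parent] -= 1

    def put(self, block):
        """Cache `block`; return the evicted block id, or None if nothing was evicted."""
        if block in self.cached:
            return None
        evicted = None
        if len(self.cached) >= self.capacity:
            # Evict the cached block with the smallest reference count.
            # Ties are broken by block id (an assumption of this sketch).
            evicted = min(self.cached, key=lambda b: (self.ref_count.get(b, 0), b))
            self.cached.remove(evicted)
        self.cached.add(block)
        return evicted


# Hypothetical DAG: C depends on A; D depends on A and B.
dag = {"A": ["C", "D"], "B": ["D"], "C": [], "D": []}
cache = LRCCache(capacity=2, children=dag)
cache.put("A")
cache.put("B")
first_evicted = cache.put("C")   # B has 1 pending child, A has 2
cache.mark_computed("C")         # A now has 1 pending child (D)
second_evicted = cache.put("D")  # C has 0 pending children, A has 1
```

In this example the block with fewer uncomputed children is evicted each time, so `A` stays cached until `D` is produced. A recency-based policy with no intervening accesses would instead have evicted `A` first, since it was inserted earliest.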

Original language: English
Title of host publication: INFOCOM 2017 - IEEE Conference on Computer Communications
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781509053360
Publication status: Published - 1 May 2017
Externally published: Yes
Event: 2017 IEEE Conference on Computer Communications, INFOCOM 2017 - Atlanta, United States
Duration: 1 May 2017 to 4 May 2017

Publication series

Name: Proceedings - IEEE INFOCOM
ISSN (Print): 0743-166X

Conference

Conference: 2017 IEEE Conference on Computer Communications, INFOCOM 2017
Country/Territory: United States
City: Atlanta
Period: 1/05/17 to 4/05/17

ASJC Scopus subject areas

  • General Computer Science
  • Electrical and Electronic Engineering
