TY - CONF
T1 - LRC: Dependency-aware cache management for data analytics clusters
T2 - 2017 IEEE Conference on Computer Communications, INFOCOM 2017
AU - Yu, Yinghao
AU - Wang, Wei
AU - Zhang, Jun
AU - Ben Letaief, Khaled
PY - 2017/5/1
Y1 - 2017/5/1
AB - Memory caches are aggressively used in today's data-parallel systems such as Spark, Tez, and Piccolo. However, prevalent systems employ rather simple cache management policies - notably the Least Recently Used (LRU) policy - that are oblivious to the application semantics of data dependency, expressed as a directed acyclic graph (DAG). Without this knowledge, memory caching can at best be performed by 'guessing' future data access patterns from historical information (e.g., access recency and/or frequency), which frequently results in inefficient, erroneous caching with a low hit ratio and long response times. In this paper, we propose a novel cache replacement policy, Least Reference Count (LRC), which exploits the application-specific DAG information to optimize cache management. LRC evicts the cached data blocks whose reference count is the smallest, where the reference count of a data block is defined as the number of its dependent child blocks that have not yet been computed. We demonstrate the efficacy of LRC through both empirical analysis and cluster deployments against popular benchmarking workloads. Our Spark implementation shows that, compared with LRU, LRC speeds up typical applications by 60%.
UR - http://www.scopus.com/inward/record.url?scp=85034073995&partnerID=8YFLogxK
DO - 10.1109/INFOCOM.2017.8057007
M3 - Conference article published in proceedings or book
AN - SCOPUS:85034073995
T3 - Proceedings - IEEE INFOCOM
BT - INFOCOM 2017 - IEEE Conference on Computer Communications
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 1 May 2017 through 4 May 2017
ER -