TY - GEN
T1 - Swallow
T2 - 32nd IEEE International Parallel and Distributed Processing Symposium, IPDPS 2018
AU - Zhou, Qihua
AU - Li, Peng
AU - Wang, Kun
AU - Zeng, Deze
AU - Guo, Song
AU - Guo, Minyi
PY - 2018/8/3
Y1 - 2018/8/3
N2 - Big data analytics in datacenters often involves scheduling data-parallel jobs, which are bottlenecked by the limited bandwidth of datacenter networks. To alleviate the shortage of bandwidth, some existing work has proposed traffic compression to reduce the amount of data transmitted over the network. However, the proposed traffic compression works in a coarse-grained manner at the job level, leaving a large optimization space unexplored for further performance improvement. In this paper, we propose a flow-level traffic compression and scheduling system, called Swallow, to accelerate data-intensive applications. Specifically, we target coflows, an elegant abstraction of the parallel flows generated by big data jobs. With the objective of minimizing coflow completion time (CCT), we propose a heuristic algorithm called Fastest-Volume-Disposal-First (FVDF) and implement Swallow based on Spark. The results of both trace-driven simulations and real experiments show the superiority of our system over existing algorithms. Swallow can reduce CCT and job completion time (JCT) by up to 1.47× and 1.66× on average, respectively, over the SEBF in Varys, one of the most efficient coflow scheduling algorithms so far. Moreover, with coflow compression, Swallow reduces data traffic by up to 48.41% on average.
AB - Big data analytics in datacenters often involves scheduling data-parallel jobs, which are bottlenecked by the limited bandwidth of datacenter networks. To alleviate the shortage of bandwidth, some existing work has proposed traffic compression to reduce the amount of data transmitted over the network. However, the proposed traffic compression works in a coarse-grained manner at the job level, leaving a large optimization space unexplored for further performance improvement. In this paper, we propose a flow-level traffic compression and scheduling system, called Swallow, to accelerate data-intensive applications. Specifically, we target coflows, an elegant abstraction of the parallel flows generated by big data jobs. With the objective of minimizing coflow completion time (CCT), we propose a heuristic algorithm called Fastest-Volume-Disposal-First (FVDF) and implement Swallow based on Spark. The results of both trace-driven simulations and real experiments show the superiority of our system over existing algorithms. Swallow can reduce CCT and job completion time (JCT) by up to 1.47× and 1.66× on average, respectively, over the SEBF in Varys, one of the most efficient coflow scheduling algorithms so far. Moreover, with coflow compression, Swallow reduces data traffic by up to 48.41% on average.
KW - Big Data
KW - Coflow Scheduling
KW - Datacenter Networks
KW - Traffic Compression
UR - http://www.scopus.com/inward/record.url?scp=85052237886&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2018.00060
DO - 10.1109/IPDPS.2018.00060
M3 - Conference article published in proceeding or book
AN - SCOPUS:85052237886
SN - 9781538643686
T3 - Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium, IPDPS 2018
SP - 505
EP - 514
BT - Proceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium, IPDPS 2018
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 21 May 2018 through 25 May 2018
ER -