Swallow: Joint online scheduling and coflow compression in datacenter networks

Qihua Zhou, Peng Li, Kun Wang, Deze Zeng, Song Guo, Minyi Guo

Research output: Chapter in book / Conference proceedingConference article published in proceeding or bookAcademic researchpeer-review

7 Citations (Scopus)

Abstract

Big data analytics in datacenters often involves scheduling of data-parallel job, which are bottlenecked by limited bandwidth of datacenter networks. To alleviate the shortage of bandwidth, some existing work has proposed traffic compression to reduce the amount of data transmitted over the network. However, their proposed traffic compression works in a coarse-grained manner at job level, leaving a large optimization space unexplored for further performance improvement. In this paper, we propose a flow-level traffic compression and scheduling system, called Swallow, to accelerate data-intensive applications. Specifically, we target on coflows, which is an elegant abstraction of parallel flows generated by big data jobs. With the objective of minimizing coflow completion time (CCT), we propose a heuristic algorithm called Fastest-Volume-Disposal-First (FVDV) and implement Swallow based on Spark. The results of both trace-driven simulations and real experiments show the superiority of our system, over existing algorithms. Swallow can reduce CCT and job completion time (JCT) by up to 1.47 × and 1.66 × on average, respectively, over the SEBF in Varys, one of the most efficient coflow scheduling algorithms so far. Moreover, with coflow compression, Swallow reduces data traffic by up to 48.41% on average.

Original languageEnglish
Title of host publicationProceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium, IPDPS 2018
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages505-514
Number of pages10
ISBN (Print)9781538643686
DOIs
Publication statusPublished - 3 Aug 2018
Externally publishedYes
Event32nd IEEE International Parallel and Distributed Processing Symposium, IPDPS 2018 - Vancouver, Canada
Duration: 21 May 201825 May 2018

Publication series

NameProceedings - 2018 IEEE 32nd International Parallel and Distributed Processing Symposium, IPDPS 2018

Conference

Conference32nd IEEE International Parallel and Distributed Processing Symposium, IPDPS 2018
CountryCanada
CityVancouver
Period21/05/1825/05/18

Keywords

  • Big Data
  • Coflow Scheduling
  • Datacenter Networks
  • Traffic Compression

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems and Management

Cite this