CoMan: Managing Bandwidth Across Computing Frameworks in Multiplexed Datacenters

Wenxin Li, Deke Guo, Alex X. Liu, Keqiu Li, Heng Qi, Song Guo, Ali Munir, Xiaoyi Tao

Research output: Journal article publication, Journal article, Academic research, peer-review

5 Citations (Scopus)

Abstract

Inefficient bandwidth sharing in a datacenter network between different application frameworks, e.g., MapReduce and Spark, can lead to inelastic and skewed usage of link bandwidth and increased completion times for the applications. Existing work, however, either focuses solely on managing computation and storage resources or controls only the sending/receiving rates at hosts. In this paper, we present CoMan, a solution that provides global in-network bandwidth management in multiplexed datacenters, with two goals: improving bandwidth utilization and reducing application completion time. CoMan first designs a novel abstraction of virtual link groups (VLGs) to establish a shared bandwidth resource pool. Based on this pool, CoMan implements a three-level bandwidth allocation model, which enables elastic bandwidth sharing among computing frameworks and guarantees network performance for the applications. CoMan further improves bandwidth utilization by devising a VLG dependency graph and solving an optimization problem to guide path selection using a 3/2-approximation algorithm. We conduct comprehensive trace-driven simulations as well as small-scale testbed experiments to evaluate the performance of CoMan. Extensive simulation results show that CoMan improves bandwidth utilization and speeds up application completion time by up to 2.83× and 6.68×, respectively, compared to the ECMP + ElasticSwitch solution. Our implementation also verifies that CoMan can realistically speed up application completion times by 2.32× on average.

Original language: English
Pages (from-to): 1013-1029
Number of pages: 17
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 29
Issue number: 5
Publication status: Published - 1 May 2018

Keywords

  • bandwidth management
  • data-parallel computing frameworks
  • multiplexed datacenter

ASJC Scopus subject areas

  • Signal Processing
  • Hardware and Architecture
  • Computational Theory and Mathematics
