A General Communication Cost Optimization Framework for Big Data Stream Processing in Geo-Distributed Data Centers

Lin Gu, Deze Zeng, Song Guo, Yong Xiang, Jiankun Hu

Research output: Journal article publication (Journal article, Academic research, peer-review)

77 Citations (Scopus)


With the explosion of big data, processing large numbers of continuous data streams, i.e., big data stream processing (BDSP), has become a crucial requirement for many scientific and industrial applications in recent years. By offering a pool of computation, communication and storage resources, public clouds such as Amazon's EC2 are undoubtedly the most efficient platforms to meet the ever-growing needs of BDSP. Public cloud service providers usually operate a number of geo-distributed datacenters across the globe, and different datacenter pairs incur different inter-datacenter network costs charged by Internet Service Providers (ISPs). Meanwhile, inter-datacenter traffic in BDSP constitutes a large portion of a cloud provider's traffic demand over the Internet and incurs substantial communication cost, which may even become the dominant operational expenditure factor. Because datacenter resources are provided in a virtualized way, the virtual machines (VMs) for stream processing tasks can be freely deployed onto any datacenter, provided that the Service Level Agreement (SLA, e.g., quality-of-information) is obeyed. This creates both an opportunity and a challenge: exploiting the diversity of inter-datacenter network costs to jointly optimize VM placement and load balancing so that network cost is minimized while the SLA is guaranteed. In this paper, we first propose a general modeling framework that describes all representative inter-task relationship semantics in BDSP. Based on this framework, we formulate the communication cost minimization problem for BDSP as a mixed-integer linear programming (MILP) problem and prove it to be NP-hard. We further propose a computation-efficient solution based on MILP. The high efficiency of our proposal is validated by extensive simulation-based studies.
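The core cost idea in the abstract can be illustrated with a toy model. Each stream task is placed in some datacenter. Each pair of communicating tasks then pays its data rate times the per-unit network cost between their two datacenters. The sketch below is a minimal illustration under our own simplifying assumptions, not the authors' MILP formulation or their solution method:

* each task is a single VM placed whole in one datacenter, so there is no load balancing or flow splitting;
* capacity is a plain VM count per datacenter;
* SLA constraints are ignored.

The function names (`placement_cost`, `best_placement`) and the example data are hypothetical. The search is exhaustive, so it only suits tiny instances.

```python
from itertools import product


def placement_cost(placement, flows, unit_cost):
    """Sum of rate * per-unit cost over every communicating task pair.

    Hypothetical toy model: traffic between tasks in the same datacenter
    costs unit_cost[dc][dc], typically 0.
    """
    total = 0.0
    for (src, dst), rate in flows.items():
        total += rate * unit_cost[placement[src]][placement[dst]]
    return total


def best_placement(tasks, datacenters, flows, unit_cost, capacity):
    """Exhaustively try every task-to-datacenter assignment (tiny inputs only).

    Simplifying assumptions (not from the paper): one VM per task,
    no flow splitting across VMs, and no SLA constraint.
    """
    best, best_cost = None, float("inf")
    for assignment in product(datacenters, repeat=len(tasks)):
        # Count VMs per datacenter and skip assignments that exceed capacity.
        load = {dc: 0 for dc in datacenters}
        for dc in assignment:
            load[dc] += 1
        if any(load[dc] > capacity[dc] for dc in datacenters):
            continue
        placement = dict(zip(tasks, assignment))
        cost = placement_cost(placement, flows, unit_cost)
        if cost < best_cost:  # keep the first placement found at the minimum cost
            best, best_cost = placement, cost
    return best, best_cost


# Hypothetical toy instance: a four-stage pipeline across two datacenters.
tasks = ["source", "filter", "join", "sink"]
datacenters = ["A", "B"]
flows = {("source", "filter"): 10.0, ("filter", "join"): 4.0, ("join", "sink"): 1.0}
unit_cost = {"A": {"A": 0.0, "B": 2.0}, "B": {"A": 2.0, "B": 0.0}}
capacity = {"A": 2, "B": 2}

best, cost = best_placement(tasks, datacenters, flows, unit_cost, capacity)
print(best, cost)  # {'source': 'A', 'filter': 'A', 'join': 'B', 'sink': 'B'} 8.0
```

In this instance, the capacity limit of two VMs per datacenter rules out cutting only the cheapest edge (join to sink). The optimum therefore cuts the filter-to-join flow. The search space grows as |datacenters|^|tasks|, which is why realistic instances call for an MILP formulation or an efficient approximate method rather than enumeration.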
Original language: English
Article number: 7070679
Pages (from-to): 19-29
Number of pages: 11
Journal: IEEE Transactions on Computers
Issue number: 1
Publication status: Published - 1 Jan 2016
Externally published: Yes


Keywords

  • big data
  • geo-distributed data centers
  • network cost minimization
  • stream processing
  • VM placement

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Software
  • Hardware and Architecture
  • Computational Theory and Mathematics


