ran-GJS: Orchestrating data analytics for heterogeneous geo-distributed edges

Yibo Jin, Sheng Zhang, Zhuzhong Qian, Xiaoliang Wang, Song Guo, Sanglu Lu

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

1 Citation (Scopus)

Abstract

Many organizations and companies have deployed not only datacenters but also a large number of geo-distributed heterogeneous edges to provide fast data analytics services. Since transmitting large volumes of data across the WAN can be costly, existing works mainly focus on pre-processing data in place to avoid transmission. However, the heterogeneity of edges in both local computing capacity and network bandwidth limits the efficient use of scarce resources, which may result in long task completion times. To cope with dynamic demands on scarce resources, we take the heterogeneity of both the computing capacity and the network bandwidth of geo-distributed edges into consideration when assigning data analytics tasks and their associated data between the central datacenter and the edges, so that the overall latency is reduced. We formulate the geo-distributed data-task joint scheduling problem (GJS), show its NP-hardness, and propose a near-optimal randomized scheduling algorithm (ran-GJS). Using martingale analysis, we prove that the result of ran-GJS is concentrated around its optimum value with high probability, i.e., $1 - O(e^{-t^2})$, where $t$ is the concentration bound. The experimental results obtained from both extensive simulations and a Yarn-based prototype show that ran-GJS significantly speeds up geo-distributed analytics, with a gain in average completion time of at least 28% over state-of-the-art baseline algorithms.

Original language: English
Title of host publication: Proceedings of the 47th International Conference on Parallel Processing, ICPP 2018
Publisher: Association for Computing Machinery
ISBN (Print): 9781450365109
Publication status: Published - 13 Aug 2018
Event: 47th International Conference on Parallel Processing, ICPP 2018 - Eugene, United States
Duration: 14 Aug 2018 to 16 Aug 2018

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 47th International Conference on Parallel Processing, ICPP 2018
Country: United States
City: Eugene
Period: 14/08/18 to 16/08/18

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications
