Abstract
Data centers are widely used for big data analytics, which often involves data-parallel jobs such as queries and web services. Meanwhile, cluster frameworks for data-intensive applications in data center networks (DCNs) are developing rapidly. To improve the performance of these frameworks, much effort has been devoted to scheduling strategies and resource allocation algorithms. With the deployment of geo-distributed data centers and data-intensive applications, optimization in DCNs has regained widespread attention in both industry and academia. Many solutions, such as coflow-aware scheduling and speculative execution, have been proposed to meet various requirements. We therefore present a solid starting point and a comprehensive overview of this area to help readers quickly understand state-of-the-art technologies and research progress. We observe that algorithms in cluster frameworks are implemented under different guidelines and can be classified by scheduling granularity, controller management, and prior-knowledge requirements. In addition, we discuss mechanisms for addressing crucial challenges in DCNs, including providing low latency and minimizing job completion time. Moreover, we analyze the desirable properties of fault tolerance and scalability to illuminate the design principles of distributed systems. We hope that this paper will shed light on this promising area and serve as a guide for further research.
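To make the coflow idea concrete: a coflow groups the flows of one job stage (for example, a shuffle), and the stage cannot proceed until its last flow finishes, so the coflow completion time (CCT) is the finish time of its slowest flow. The sketch below is a hypothetical toy model, not the survey's method. It assumes a single shared link of unit capacity that sends one flow at a time, and compares shortest-flow-first (which ignores coflow membership) with a simple coflow-aware order that serves whole coflows smallest-total-size first. All function names are illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical toy model (an assumption, not from the paper): every flow crosses
# ONE shared link of unit capacity, and flows are sent one at a time.
Flow = Tuple[str, float]  # (coflow_id, size)


def completion_times(order: List[Flow]) -> Dict[str, float]:
    """Return each coflow's completion time (CCT): the finish time of its last flow."""
    clock = 0.0
    cct: Dict[str, float] = {}
    for coflow_id, size in order:
        clock += size  # unit capacity: sending `size` units takes `size` time
        cct[coflow_id] = max(cct.get(coflow_id, 0.0), clock)
    return cct


def flow_level_order(coflows: Dict[str, List[float]]) -> List[Flow]:
    """Shortest-flow-first, ignoring which coflow each flow belongs to."""
    flows = [(cid, s) for cid, sizes in coflows.items() for s in sizes]
    return sorted(flows, key=lambda f: f[1])


def coflow_aware_order(coflows: Dict[str, List[float]]) -> List[Flow]:
    """Serve whole coflows, smallest total size first (a single-link heuristic)."""
    ranked = sorted(coflows, key=lambda cid: sum(coflows[cid]))
    return [(cid, s) for cid in ranked for s in sorted(coflows[cid])]


def average(cct: Dict[str, float]) -> float:
    """Average CCT across all coflows."""
    return sum(cct.values()) / len(cct)


if __name__ == "__main__":
    coflows = {"A": [1.0, 10.0], "B": [2.0, 2.0]}
    print(average(completion_times(flow_level_order(coflows))))    # 10.0
    print(average(completion_times(coflow_aware_order(coflows))))  # 9.5
```

In this example, shortest-flow-first interleaves the two coflows, so B finishes at time 5; serving B as a unit lets it finish at time 4 without delaying A, which lowers the average CCT. Real DCN schedulers must also account for bottlenecks across many ports and for flows sharing bandwidth, which this single-link sketch deliberately omits.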
Field | Value |
---|---|
Original language | English |
Article number | 8416689 |
Pages (from-to) | 3560-3580 |
Number of pages | 21 |
Journal | IEEE Communications Surveys and Tutorials |
Volume | 20 |
Issue number | 4 |
DOIs | |
Publication status | Published - 1 Oct 2018 |
Keywords
- Big data
- Cluster frameworks
- Coflow
- Data center networks
- Data-parallel jobs
- Distributed systems
- Resource allocation
- Scheduling
ASJC Scopus subject areas
- Electrical and Electronic Engineering