Abstract
The Parameter Server (PS) architecture is widely used to train models on large amounts of data across multiple machines in parallel. A critical problem in a parameter server is how to schedule multiple training jobs effectively so as to minimize job completion time. Existing work has proposed methods for setting the number of concurrent workers, but these methods do not adequately consider the topology of GPU placement, which affects communication efficiency. This paper proposes a novel resource-to-time model based on the number of workers and the topology of GPU placement. Based on this model, we propose an algorithm called TOPO-PS that specifically addresses the topology problem in parameter servers. The algorithm derives its placement strategy using a graph mapping algorithm. Comparison with various algorithms demonstrates the superiority of our algorithm: TOPO-PS shortens job completion time by up to 53.48% of that of FIFO and 88.77% of that of OASIS.
Original language | English |
---|---|
Publication status | Published - Dec 2019 |
Externally published | Yes |
Event | 2019 IEEE Global Communications Conference, GLOBECOM 2019 - Waikoloa, United States. Duration: 9 Dec 2019 → 13 Dec 2019 |
Conference
Conference | 2019 IEEE Global Communications Conference, GLOBECOM 2019 |
---|---|
Country/Territory | United States |
City | Waikoloa |
Period | 9/12/19 → 13/12/19 |
Keywords
- Cloud Computing
- Machine Learning
- Parameter Server
- Scheduling Algorithms
ASJC Scopus subject areas
- Computer Networks and Communications
- Hardware and Architecture
- Information Systems
- Signal Processing
- Information Systems and Management
- Safety, Risk, Reliability and Quality
- Media Technology
- Health Informatics