Error-Compensated Sparsification for Communication-Efficient Decentralized Training in Edge Environment

Haozhao Wang, Song Guo, Zhihao Qu, Ruixuan Li, Ziming Liu

Research output: Journal article › Academic research › peer-review

18 Citations (Scopus)

Abstract

Communication has long been a major bottleneck in large-scale decentralized training systems, since participating nodes iteratively exchange large amounts of intermediate data with their neighbors. Although compression techniques such as sparsification can significantly reduce the communication overhead in each iteration, the errors introduced by compression accumulate, severely degrading the convergence rate. Recently, error compensation for sparsification has been proposed in centralized training to tolerate the accumulated compression errors. However, the analogous technique for decentralized training, and the corresponding convergence theory, are still unknown. To fill this gap, we design a method named ECSD-SGD that significantly accelerates decentralized training via error-compensated sparsification. The novelty lies in identifying the component of the information exchanged in each iteration (i.e., the sparsified model update) and applying targeted error compensation to that component. Our thorough theoretical analysis shows that ECSD-SGD supports arbitrary sparsification ratios and achieves the same convergence rate as non-sparsified decentralized training methods. We also conduct extensive experiments on multiple deep learning models to validate our theoretical findings. The results show that ECSD-SGD outperforms all state-of-the-art sparsified methods in terms of both convergence speed and final generalization accuracy.
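To illustrate the core idea of error-compensated sparsification described above, the following is a minimal sketch of the standard error-feedback mechanism applied to a top-k sparsified model update. This is an illustrative reconstruction of the general technique, not the paper's exact ECSD-SGD algorithm; the function names and the NumPy-based setup are assumptions for demonstration.

```python
import numpy as np

def topk_sparsify(x, k):
    """Keep the k largest-magnitude entries of x; zero out the rest."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

def compensated_update(update, residual, k):
    """One error-feedback step (illustrative, not the exact ECSD-SGD rule):
    add the residual carried over from previous iterations, sparsify the
    corrected update for transmission, and store what was dropped so it
    can be re-injected later instead of being lost."""
    corrected = update + residual          # compensate accumulated error
    sparse = topk_sparsify(corrected, k)   # what a node would transmit
    new_residual = corrected - sparse      # compression error to carry forward
    return sparse, new_residual
```

Because `sparse + new_residual` always equals the compensated update, no information is permanently discarded; dropped coordinates accumulate in the residual until they grow large enough to be transmitted, which is why error compensation tolerates aggressive sparsification ratios.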

Original language: English
Article number: 9442310
Pages (from-to): 14-25
Number of pages: 12
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 33
Issue number: 1
DOIs
Publication status: Published - 1 Jan 2022

Keywords

  • communication compression
  • decentralized training
  • Distributed machine learning
  • error compensation

ASJC Scopus subject areas

  • Signal Processing
  • Hardware and Architecture
  • Computational Theory and Mathematics
