TY - JOUR
T1 - Gradient Scheduling with Global Momentum for Asynchronous Federated Learning in Edge Environment
AU - Wang, Haozhao
AU - Li, Ruixuan
AU - Li, Chengjie
AU - Zhou, Pan
AU - Li, Yuhua
AU - Xu, Wenchao
AU - Guo, Song
PY - 2022/3/25
Y1 - 2022/3/25
AB - Federated Learning has attracted widespread attention in recent years because it allows massive numbers of edge nodes to collaboratively train machine learning models without sharing their private datasets. However, these edge nodes are usually heterogeneous in computational capability and statistically different in data distribution (i.e., non-IID), leading to significant performance degradation. Although existing asynchronous training methods can address the heterogeneity issue, they cannot prevent the non-IID problem from reducing the convergence rate. In this paper, we propose a novel paradigm that schedules gradients with partially averaged gradients and applies global momentum (GSGM) for asynchronous training over non-IID datasets in edge environments. Our key idea is to apply global momentum and partial averaging to the biased gradients computed on edge nodes after scheduling, making the training process stable. Empirical results demonstrate that GSGM adapts well to different degrees of non-IID data and brings 20% gains in training stability for popular optimization algorithms, with improved accuracy on the Fashion-MNIST and CIFAR-10 datasets.
M3 - Journal article
SN - 2327-4662
SP - 1
EP - 13
JO - IEEE Internet of Things Journal
JF - IEEE Internet of Things Journal
ER -