Fully Decoupled Neural Network Learning Using Delayed Gradients

Huiping Zhuang, Yi Wang, Qinglai Liu, Zhiping Lin

Research output: Journal article (peer-reviewed)

8 Citations (Scopus)


Training neural networks with backpropagation (BP) requires a sequential passing of activations and gradients. This sequential dependency is known as the locking problem (i.e., the forward, backward, and update lockings) among modules (each module contains a stack of layers) inherent to BP. In this brief, we propose a fully decoupled training scheme using delayed gradients (FDG) to break all these lockings. The FDG splits a neural network into multiple modules and trains them independently and asynchronously using different workers (e.g., GPUs). We also introduce a gradient shrinking process to reduce the stale-gradient effect caused by the delayed gradients. Our theoretical proofs show that the FDG converges to critical points under certain conditions. Experiments are conducted by training deep convolutional neural networks to perform classification tasks on several benchmark data sets. These experiments show that our approach achieves comparable or better results than state-of-the-art methods in terms of generalization and acceleration. We also show that the FDG is able to train various networks, including extremely deep ones (e.g., ResNet-1202), in a decoupled fashion.
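The gradient-shrinking idea from the abstract can be illustrated with a minimal sketch: a module applies a gradient that is several steps stale, scaled down before the update to damp the staleness. The exponential shrinking rule and the factor used here are assumptions for illustration, not the exact FDG formulation from the paper.

```python
# Minimal sketch of a delayed-gradient update with gradient shrinking.
# Assumptions (not from the paper): the stale gradient is damped by
# factor ** delay, and the update is plain SGD.

def shrink(grad, delay, factor=0.5):
    """Scale a stale gradient by factor**delay to reduce its influence."""
    return [g * (factor ** delay) for g in grad]

def sgd_step(weights, grad, lr=0.1):
    """One vanilla SGD update on a flat list of parameters."""
    return [w - lr * g for w, g in zip(weights, grad)]

# A module receives a gradient computed `delay` steps ago.
weights = [1.0, -2.0]
stale_grad = [0.4, -0.2]
delay = 3

weights = sgd_step(weights, shrink(stale_grad, delay))
```

In a full decoupled pipeline, each module would run this loop on its own worker, consuming whatever delayed gradient has most recently arrived from the downstream module.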

Original language: English
Article number: 9399673
Pages (from-to): 6013-6020
Number of pages: 8
Journal: IEEE Transactions on Neural Networks and Learning Systems
Issue number: 10
Publication status: Published - 9 Apr 2021
Externally published: Yes


Keywords

  • Decoupled learning
  • delayed gradients
  • gradient shrinking (GS)
  • neural network lockings

ASJC Scopus subject areas

  • Software
  • Computer Science Applications
  • Computer Networks and Communications
  • Artificial Intelligence


