TY - GEN
T1 - A Fully Distributed Training for Class Incremental Learning in Multihead Networks
AU - Dai, Mingjun
AU - Kong, Yonghao
AU - Zhong, Junpei
AU - Zhang, Shengli
AU - Wang, Hui
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023/8/29
Y1 - 2023/8/29
N2 - Due to its good elastic scalability, the multi-head network is favored in incremental learning (IL). During the IL process, the model size of a multi-head network grows continually with the increasing number of branches, which makes it difficult to store and train within a single node. To this end, within the model parallelism framework, a distributed training architecture together with its prerequisite is proposed. Based on the assumption that the prerequisite is satisfied, a distributed training algorithm is proposed. In addition, to avoid the dilemma that the prevalent cross-entropy (CE) loss function does not fit the distributed setting, a fully distributed cross-entropy (D-CE) loss function is proposed, which avoids information exchange among nodes. A corresponding training procedure based on D-CE (D-CE-Train) is proposed. This method avoids the model-size expansion problem of centralized training. It employs a distributed implementation to speed up training and reduces the interaction between nodes that may otherwise significantly slow down training. A series of experiments verifies the effectiveness of the proposed method.
AB - Due to its good elastic scalability, the multi-head network is favored in incremental learning (IL). During the IL process, the model size of a multi-head network grows continually with the increasing number of branches, which makes it difficult to store and train within a single node. To this end, within the model parallelism framework, a distributed training architecture together with its prerequisite is proposed. Based on the assumption that the prerequisite is satisfied, a distributed training algorithm is proposed. In addition, to avoid the dilemma that the prevalent cross-entropy (CE) loss function does not fit the distributed setting, a fully distributed cross-entropy (D-CE) loss function is proposed, which avoids information exchange among nodes. A corresponding training procedure based on D-CE (D-CE-Train) is proposed. This method avoids the model-size expansion problem of centralized training. It employs a distributed implementation to speed up training and reduces the interaction between nodes that may otherwise significantly slow down training. A series of experiments verifies the effectiveness of the proposed method.
KW - class incremental learning
KW - cross-entropy
KW - distributed implementation
KW - multi-head network
UR - https://www.scopus.com/pages/publications/85171629072
U2 - 10.1109/INFOCOMWKSHPS57453.2023.10225999
DO - 10.1109/INFOCOMWKSHPS57453.2023.10225999
M3 - Conference article published in proceeding or book
AN - SCOPUS:85171629072
T3 - IEEE INFOCOM 2023 - Conference on Computer Communications Workshops, INFOCOM WKSHPS 2023
BT - IEEE INFOCOM 2023 - Conference on Computer Communications Workshops, INFOCOM WKSHPS 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 IEEE INFOCOM Conference on Computer Communications Workshops, INFOCOM WKSHPS 2023
Y2 - 20 May 2023
ER -