TY - GEN
T1 - Octo: INT8 training with loss-aware compensation and backward quantization for tiny on-device learning
AU - Zhou, Qihua
AU - Guo, Song
AU - Qu, Zhihao
AU - Guo, Jingcai
AU - Xu, Zhenda
AU - Zhang, Jiewei
AU - Guo, Tao
AU - Luo, Boyuan
AU - Zhou, Jingren
N1 - Funding Information:
This research was supported by funding from the Hong Kong RGC Research Impact Fund (RIF) under Project Nos. R5060-19 and R5034-18, the General Research Fund (GRF) under Project Nos. 152221/19E and 15220320/20E, the Collaborative Research Fund (CRF) under Project No. C5026-18G, the National Natural Science Foundation of China (61872310), the Shenzhen Science and Technology Innovation Commission (R2020A045), and the Fundamental Research Funds for the Central Universities (B210202079).
Publisher Copyright:
© 2021 USENIX Annual Technical Conference. All rights reserved.
PY - 2021/7
Y1 - 2021/7
N2 - On-device learning is an emerging technique to pave the last mile of enabling edge intelligence, eliminating the limitations of conventional in-cloud computing, where substantial computational capacity and memory are required. A high-performance on-device learning system requires breaking the constraints of limited resources and alleviating computational overhead. In this paper, we show that employing 8-bit fixed-point (INT8) quantization in both the forward and backward passes over a deep model is a promising way to enable tiny on-device learning in practice. The key to an efficient quantization-aware training method is to exploit hardware-level acceleration while preserving the training quality in each layer. However, off-the-shelf quantization methods cannot handle the on-device learning paradigm of fixed-point processing. To overcome these challenges, we propose a novel INT8 training method, which optimizes the computation of the forward and backward passes via the delicately designed Loss-aware Compensation (LAC) and Parameterized Range Clipping (PRC), respectively. Specifically, we build a new network component, the compensation layer, to automatically counteract the quantization error of tensor arithmetic. We implement our method in Octo, a lightweight cross-platform system for tiny on-device learning. Evaluation on commercial AI chips shows that Octo achieves higher training efficiency than state-of-the-art quantization training methods, while delivering adequate processing speedup and memory reduction over full-precision training.
AB - On-device learning is an emerging technique to pave the last mile of enabling edge intelligence, eliminating the limitations of conventional in-cloud computing, where substantial computational capacity and memory are required. A high-performance on-device learning system requires breaking the constraints of limited resources and alleviating computational overhead. In this paper, we show that employing 8-bit fixed-point (INT8) quantization in both the forward and backward passes over a deep model is a promising way to enable tiny on-device learning in practice. The key to an efficient quantization-aware training method is to exploit hardware-level acceleration while preserving the training quality in each layer. However, off-the-shelf quantization methods cannot handle the on-device learning paradigm of fixed-point processing. To overcome these challenges, we propose a novel INT8 training method, which optimizes the computation of the forward and backward passes via the delicately designed Loss-aware Compensation (LAC) and Parameterized Range Clipping (PRC), respectively. Specifically, we build a new network component, the compensation layer, to automatically counteract the quantization error of tensor arithmetic. We implement our method in Octo, a lightweight cross-platform system for tiny on-device learning. Evaluation on commercial AI chips shows that Octo achieves higher training efficiency than state-of-the-art quantization training methods, while delivering adequate processing speedup and memory reduction over full-precision training.
UR - http://www.scopus.com/inward/record.url?scp=85111777610&partnerID=8YFLogxK
M3 - Conference article published in proceeding or book
AN - SCOPUS:85111777610
T3 - 2021 USENIX Annual Technical Conference
SP - 365
EP - 380
BT - 2021 USENIX Annual Technical Conference
PB - USENIX Association
T2 - 2021 USENIX Annual Technical Conference, ATC 2021
Y2 - 14 July 2021 through 16 July 2021
ER -