One-Teacher and Multiple-Student Knowledge Distillation on Sentiment Classification

Xiaoqin Chang, Sophia Yat Mei Lee, Suyang Zhu, Shoushan Li, Guodong Zhou

Research output: Chapter in book / conference proceeding › Conference article published in proceeding or book › Academic research › Peer-reviewed

Abstract

Knowledge distillation is an effective method to transfer knowledge from a large pre-trained teacher model to a compact student model. However, in previous studies, the distilled student models are still large and remain impractical for highly speed-sensitive systems (e.g., an IR system). In this study, we aim to distill a deep pre-trained model into an extremely compact shallow model such as a CNN. Specifically, we propose a novel one-teacher and multiple-student knowledge distillation approach that distills a deep pre-trained teacher model into multiple shallow student models combined with ensemble learning. Moreover, we leverage large-scale unlabeled data to improve the performance of the students. Empirical studies on three sentiment classification tasks demonstrate that our approach achieves better results with far fewer parameters (0.9%-18%) and extremely high speedup ratios (100X-1000X).
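A minimal, hypothetical sketch of the idea described in the abstract, written in PyTorch: one pre-trained teacher produces soft targets, several small CNN students are trained to match them (with an optional cross-entropy term when labels are available, so unlabeled batches can still be used), and the students' logits are averaged as an ensemble. The model sizes, temperature, and loss weighting below are illustrative assumptions, not the paper's exact configuration.

```python
# Hypothetical sketch of one-teacher / multiple-student distillation with an
# ensembled prediction. Hyperparameters and model shapes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNNStudent(nn.Module):
    """A small CNN text classifier used as one student."""
    def __init__(self, vocab_size, emb_dim=128, num_classes=2,
                 kernel_sizes=(3, 4, 5), num_filters=64):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, num_filters, k) for k in kernel_sizes])
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):
        x = self.embedding(token_ids).transpose(1, 2)   # (batch, emb, seq)
        feats = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(feats, dim=1))         # logits (batch, classes)

def distillation_step(teacher_logits, students, token_ids,
                      labels=None, T=2.0, alpha=0.5):
    """One step: each student matches the teacher's softened distribution;
    labelled batches add a cross-entropy term; the ensemble averages logits."""
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    student_logits = [s(token_ids) for s in students]
    loss = 0.0
    for logits in student_logits:
        kd = F.kl_div(F.log_softmax(logits / T, dim=-1), soft_targets,
                      reduction="batchmean") * (T ** 2)
        if labels is not None:                           # labeled data
            loss = loss + alpha * F.cross_entropy(logits, labels) + (1 - alpha) * kd
        else:                                            # unlabeled data
            loss = loss + kd
    ensemble_logits = torch.stack(student_logits).mean(dim=0)
    return loss / len(students), ensemble_logits
```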
Original language: English
Title of host publication: Proceedings of the 29th International Conference on Computational Linguistics
Editors: Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
Publisher: Association for Computational Linguistics (ACL)
Pages: 7042–7052
ISSN (Print): 2951-2093
Publication status: Published - Oct 2022
Event: The 29th International Conference on Computational Linguistics - Gyeongju, Korea, Republic of
Duration: 12 Oct 2022 – 17 Oct 2022
http://coling2022.org/

Conference

Conference: The 29th International Conference on Computational Linguistics
Abbreviated title: COLING2022
Country/Territory: Korea, Republic of
City: Gyeongju
Period: 12/10/22 – 17/10/22
Internet address: http://coling2022.org/
