One-Teacher and Multiple-Student Knowledge Distillation on Sentiment Classification

Xiaoqin Chang, Sophia Yat Mei Lee, Suyang Zhu, Shoushan Li, Guodong Zhou

Research output: Journal article publication, Conference article, Academic research, peer-review

4 Citations (Scopus)

Abstract

Knowledge distillation is an effective method for transferring knowledge from a large pre-trained teacher model to a compact student model. However, in previous studies, the distilled student models are still large and remain impractical in highly speed-sensitive systems (e.g., an IR system). In this study, we aim to distill a deep pre-trained model into an extremely compact shallow model such as a CNN. Specifically, we propose a novel one-teacher and multiple-student knowledge distillation approach that distills a deep pre-trained teacher model into multiple shallow student models with ensemble learning. Moreover, we leverage large-scale unlabeled data to improve the performance of the students. Empirical studies on three sentiment classification tasks demonstrate that our approach achieves better results with far fewer parameters (0.9%-18%) and extremely high speedup ratios (100X-1000X).
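The abstract does not give the training objective or the ensembling rule, so the following is only a generic sketch of the two ideas it names: a student trained to match a teacher's softened output distribution, and several students combined at prediction time. The temperature value, the KL-divergence loss, the probability-averaging ensemble, and all function names are assumptions made for illustration, not details taken from the paper. In such a setup, the teacher logits could come from labeled or unlabeled text, since no gold label is needed to compute the loss.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions (assumed loss)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def ensemble_predict(student_logits_list):
    """Average the students' class probabilities (assumed rule); return (label, probs)."""
    probs = [softmax(logits) for logits in student_logits_list]
    n_classes = len(probs[0])
    avg = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    label = max(range(n_classes), key=avg.__getitem__)
    return label, avg


# Hypothetical logits for one sentence over two sentiment classes.
teacher = [2.5, -1.0]
students = [[2.0, 0.0], [0.0, 1.0], [3.0, 0.0]]

losses = [distillation_loss(teacher, s) for s in students]
label, avg_probs = ensemble_predict(students)
```

Here the second student disagrees with the teacher and incurs the largest loss, while the averaged ensemble still predicts class 0. Averaging is only one possible way to combine students; voting or a learned weighting would fit the same interface.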

Original language: English
Pages (from-to): 7042-7052
Number of pages: 11
Journal: Proceedings - International Conference on Computational Linguistics, COLING
Volume: 29
Issue number: 1
Publication status: Published - Oct 2022
Event: 29th International Conference on Computational Linguistics, COLING 2022 - Gyeongju, Korea, Republic of
Duration: 12 Oct 2022 to 17 Oct 2022

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Theoretical Computer Science
