Distill and Replay for Continual Language Learning

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

28 Citations (Scopus)

Abstract

Accumulating knowledge to tackle new tasks without necessarily forgetting the old ones is a hallmark of human-like intelligence. But the current dominant paradigm in machine learning is still to train a model that works well on a static dataset. When learning tasks in a stream where the data distribution may shift, fitting new tasks often leads to forgetting of the previous ones. We propose a simple yet effective framework that continually learns natural language understanding tasks with one model. Our framework distills knowledge and replays experience from previous tasks when fitting a new task, hence its name DnR (distill and replay). The framework is based on language models and can be readily built on different language model architectures. Experimental results demonstrate that DnR outperforms previous state-of-the-art models in continually learning tasks of the same type but from different domains, as well as tasks of radically different types. With the distillation method, we further show that it is possible for DnR to incrementally compress the model size while still outperforming most of the baselines. We hope that DnR can promote the practical application of continual language learning and contribute to building human-level language intelligence that is minimally hampered by catastrophic forgetting.
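
The paper's own implementation is not reproduced here; as a rough illustration of the distill-and-replay idea the abstract describes, the sketch below combines a cross-entropy loss on the new task with a replay loss and a distillation (KL) loss computed on examples from earlier tasks. All names (`old_model`, `replay_batch`, `alpha`, `temperature`) are hypothetical placeholders, and the loss weighting is an assumption, not the authors' exact formulation.

```python
# Minimal, illustrative distill-and-replay training step (PyTorch).
# NOT the authors' released code; hyperparameters and names are assumed.
import torch
import torch.nn.functional as F

def dnr_step(model, old_model, optimizer, new_batch, replay_batch,
             alpha=0.5, temperature=2.0):
    """One step: fit the new task while (a) replaying stored examples
    from earlier tasks and (b) distilling the frozen previous model's
    predictions on those replayed examples."""
    model.train()
    optimizer.zero_grad()

    # Standard cross-entropy on the new task's batch.
    x_new, y_new = new_batch
    loss_new = F.cross_entropy(model(x_new), y_new)

    # Replay: cross-entropy on examples sampled from earlier tasks.
    x_old, y_old = replay_batch
    logits_old = model(x_old)
    loss_replay = F.cross_entropy(logits_old, y_old)

    # Distillation: match the previous model's temperature-softened outputs.
    with torch.no_grad():
        teacher_logits = old_model(x_old) / temperature
    loss_distill = F.kl_div(
        F.log_softmax(logits_old / temperature, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2  # rescale gradients after temperature softening

    loss = loss_new + alpha * (loss_replay + loss_distill)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this reading, the replay term guards against forgetting by revisiting old data directly, while the distillation term keeps the new model's behaviour close to the previous model's; the same distillation machinery is what would allow training a smaller student, which is how one might realise the incremental model compression mentioned in the abstract.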

Original language: English
Title of host publication: COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference
Editors: Donia Scott, Nuria Bel, Chengqing Zong
Publisher: Association for Computational Linguistics (ACL)
Pages: 3569-3579
Number of pages: 11
ISBN (Electronic): 9781952148279
DOIs
Publication status: Published - Dec 2020
Externally published: Yes
Event: 28th International Conference on Computational Linguistics, COLING 2020 - Virtual, Online, Spain
Duration: 8 Dec 2020 - 13 Dec 2020

Publication series

Name: COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference

Conference

Conference: 28th International Conference on Computational Linguistics, COLING 2020
Country/Territory: Spain
City: Virtual, Online
Period: 8/12/20 - 13/12/20

ASJC Scopus subject areas

  • Computer Science Applications
  • Computational Theory and Mathematics
  • Theoretical Computer Science
