M6: Multi-Modality-to-Multi-Modality Multitask Mega-transformer for Unified Pretraining

Junyang Lin, Rui Men, An Yang, Chang Zhou, Yichang Zhang, Peng Wang, Jingren Zhou, Jie Tang, Hongxia Yang

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

34 Citations (Scopus)

Abstract

Multimodal pretraining has demonstrated success on downstream tasks of cross-modal representation learning. However, it has so far been limited to English data, and there is still a lack of a large-scale dataset for multimodal pretraining in Chinese. In this work, we propose the largest dataset for pretraining in Chinese, consisting of over 1.9TB of images and 292GB of text. The dataset covers a wide range of domains, including encyclopedias, question answering, and forum discussions. In addition, we propose a method called M6, short for Multi-Modality-to-Multi-Modality Multitask Mega-transformer, for unified pretraining on data of both single and multiple modalities. The model is pretrained with our proposed tasks, including text-to-text transfer, image-to-text transfer, and multi-modality-to-text transfer. These tasks endow the model with strong capabilities in both understanding and generation. We scale the model to 10 billion parameters, building the largest pretrained model in Chinese. Experimental results show that our proposed M6 outperforms the baselines on a number of downstream tasks involving both single and multiple modalities, and that the 10B-parameter pretrained model demonstrates strong potential in zero-shot settings.
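
The abstract describes unified pretraining in which image and text inputs share a single transformer and each proposed task reduces to text prediction. Below is a minimal PyTorch sketch of that idea: projected image patch features and text token embeddings are concatenated into one sequence, and a masked-token objective trains the text side. All module names, dimensions, the patch-feature input, and the masking scheme are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultimodalToTextModel(nn.Module):
    """One transformer over a concatenated image+text sequence (sketch)."""

    def __init__(self, vocab_size=32000, d_model=512, n_heads=8,
                 n_layers=6, patch_dim=768, max_len=512):
        super().__init__()
        self.text_emb = nn.Embedding(vocab_size, d_model)
        self.patch_proj = nn.Linear(patch_dim, d_model)  # image patch features -> model dim
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, patch_feats, text_ids):
        # patch_feats: (B, P, patch_dim) image patch features; P may be 0
        # for text-only batches. text_ids: (B, T) token ids.
        img = self.patch_proj(patch_feats)            # (B, P, d_model)
        txt = self.text_emb(text_ids)                 # (B, T, d_model)
        x = torch.cat([img, txt], dim=1)              # single multimodal sequence
        pos = torch.arange(x.size(1), device=x.device)
        x = x + self.pos_emb(pos)
        h = self.encoder(x)
        # Predict text positions only: the "-to-text" half of each task.
        return self.lm_head(h[:, img.size(1):])

# Toy masked-token pretraining step (hypothetical data and mask rate).
MASK_ID = 0
model = MultimodalToTextModel()
patches = torch.randn(2, 16, 768)           # 16 patch features per image
tokens = torch.randint(1, 32000, (2, 20))   # target text token ids
mask = torch.rand(tokens.shape) < 0.15      # mask ~15% of text positions
logits = model(patches, tokens.masked_fill(mask, MASK_ID))
loss = F.cross_entropy(logits[mask], tokens[mask])
loss.backward()
```

Under this framing, text-to-text, image-to-text, and multi-modality-to-text batches differ only in which inputs are present (zero patches, no text context, or both), while the training loop stays the same.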

Original language: English
Title of host publication: KDD 2021 - Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 3251-3261
Number of pages: 11
ISBN (Electronic): 9781450383325
DOIs
Publication status: Published - 14 Aug 2021
Externally published: Yes
Event: 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2021 - Virtual, Online, Singapore
Duration: 14 Aug 2021 – 18 Aug 2021

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Conference

Conference: 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2021
Country/Territory: Singapore
City: Virtual, Online
Period: 14/08/21 – 18/08/21

Keywords

  • cross-modal understanding and generation
  • large-scale pretraining
  • multi-modal pretraining

ASJC Scopus subject areas

  • Software
  • Information Systems
