
OAG-BERT: Towards a Unified Backbone Language Model for Academic Knowledge Services

  • Xiao Liu
  • Da Yin
  • Jingnan Zheng
  • Xingjian Zhang
  • Peng Zhang
  • Hongxia Yang
  • Yuxiao Dong
  • Jie Tang

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

Abstract

Academic knowledge services have substantially facilitated the development of human science and technology, providing a wealth of useful research tools. However, many applications depend heavily on ad-hoc models and expensive human labeling to understand professional content, which hinders real-world deployment. To create a unified backbone language model for various knowledge-intensive challenges in academic knowledge mining, we pre-train an academic language model, OAG-BERT, on the Open Academic Graph (OAG), the world's largest public academic graph, so that it integrates massive heterogeneous entity knowledge beyond scientific corpora. We develop novel pre-training strategies along with zero-shot inference techniques. OAG-BERT's superior performance on 9 knowledge-intensive academic tasks (including 2 demo applications) demonstrates its suitability as a foundation for academic knowledge services. Its zero-shot capability also offers great potential to reduce the need for costly annotations. OAG-BERT has been deployed in multiple real-world applications, such as reviewer recommendation for the NSFC (National Natural Science Foundation of China) and paper tagging in the AMiner system. All code and pre-trained models are available via CogDL.

Original language: English
Title of host publication: KDD 2022 - Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 3418-3428
Number of pages: 11
ISBN (Electronic): 9781450393850
Publication status: Published - 14 Aug 2022
Externally published: Yes
Event: 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2022 - Washington, United States
Duration: 14 Aug 2022 to 18 Aug 2022

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Conference

Conference: 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2022
Country/Territory: United States
City: Washington
Period: 14/08/22 to 18/08/22

Keywords

  • heterogeneous knowledge graph
  • language model
  • pre-training

ASJC Scopus subject areas

  • Software
  • Information Systems
