TY - GEN
T1 - OAG-BERT
T2 - 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2022
AU - Liu, Xiao
AU - Yin, Da
AU - Zheng, Jingnan
AU - Zhang, Xingjian
AU - Zhang, Peng
AU - Yang, Hongxia
AU - Dong, Yuxiao
AU - Tang, Jie
N1 - Publisher Copyright:
© 2022 ACM.
PY - 2022/8/14
Y1 - 2022/8/14
N2 - Academic Knowledge Services have substantially facilitated the development of human science and technology, providing a wealth of useful research tools. However, many applications depend heavily on ad-hoc models and expensive human labeling to understand professional content, hindering real-world deployment. To create a unified backbone language model for various knowledge-intensive academic knowledge mining challenges, based on the world's largest public academic graph, the Open Academic Graph (OAG), we pre-train an academic language model, namely OAG-BERT, to integrate massive heterogeneous entity knowledge beyond scientific corpora. We develop novel pre-training strategies along with zero-shot inference techniques. OAG-BERT's superior performance on 9 knowledge-intensive academic tasks (including 2 demo applications) demonstrates its qualification to serve as a foundation for academic knowledge services. Its zero-shot capability also offers great potential to mitigate the need for costly annotations. OAG-BERT has been deployed to multiple real-world applications, such as reviewer recommendation for the NSFC (National Natural Science Foundation of China) and paper tagging in the AMiner system. All code and pre-trained models are available via CogDL.
AB - Academic Knowledge Services have substantially facilitated the development of human science and technology, providing a wealth of useful research tools. However, many applications depend heavily on ad-hoc models and expensive human labeling to understand professional content, hindering real-world deployment. To create a unified backbone language model for various knowledge-intensive academic knowledge mining challenges, based on the world's largest public academic graph, the Open Academic Graph (OAG), we pre-train an academic language model, namely OAG-BERT, to integrate massive heterogeneous entity knowledge beyond scientific corpora. We develop novel pre-training strategies along with zero-shot inference techniques. OAG-BERT's superior performance on 9 knowledge-intensive academic tasks (including 2 demo applications) demonstrates its qualification to serve as a foundation for academic knowledge services. Its zero-shot capability also offers great potential to mitigate the need for costly annotations. OAG-BERT has been deployed to multiple real-world applications, such as reviewer recommendation for the NSFC (National Natural Science Foundation of China) and paper tagging in the AMiner system. All code and pre-trained models are available via CogDL.
KW - heterogeneous knowledge graph
KW - language model
KW - pre-training
UR - https://www.scopus.com/pages/publications/85137142056
U2 - 10.1145/3534678.3539210
DO - 10.1145/3534678.3539210
M3 - Conference contribution
AN - SCOPUS:85137142056
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 3418
EP - 3428
BT - KDD 2022 - Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
Y2 - 14 August 2022 through 18 August 2022
ER -