TY - GEN
T1 - Active Ensemble Learning for Knowledge Graph Error Detection
AU - Dong, Junnan
AU - Zhang, Qinggang
AU - Huang, Xiao
AU - Tan, Qiaoyu
AU - Zha, Daochen
AU - Zhao, Zihao
N1 - Funding Information:
The authors gratefully acknowledge receipt of the following financial support for the research, authorship, and/or publication of this article. This work was supported in full by the Hong Kong Polytechnic University, Start-up Fund (project number: P0033934).
Publisher Copyright:
© 2023 ACM.
PY - 2023/2/27
Y1 - 2023/2/27
AB - Knowledge graphs (KGs) can effectively integrate a large number of real-world assertions and improve the performance of various applications, such as recommendation and search. KG error detection has been intensively studied, since real-world KGs inevitably contain erroneous triples. While existing studies focus on developing a novel algorithm dedicated to one or a few data characteristics, we explore advancing KG error detection by assembling a set of state-of-the-art (SOTA) KG error detectors. However, it is nontrivial to develop a practical ensemble learning framework for KG error detection. Existing ensemble learning models rely heavily on labels, while labeled errors in KGs are expensive to acquire. Also, KG error detection itself is challenging, since triples contain rich semantic information and might be false for various reasons. To this end, we propose to leverage active learning to minimize human effort. Our proposed framework, KAEL, can effectively assemble a set of off-the-shelf error detection algorithms by actively using a limited number of manual annotations. It adaptively updates the ensemble learning policy in each iteration based on active queries, i.e., the answers from experts. Once the annotation budget is exhausted, KAEL uses the trained policy to identify the remaining suspicious triples. Experiments on real-world KGs demonstrate that we can achieve significant improvement when applying KAEL to assemble SOTA error detectors. KAEL also significantly outperforms SOTA ensemble learning baselines.
KW - ensemble learning
KW - knowledge graph refinement
KW - knowledge graphs
UR - https://www.scopus.com/pages/publications/85145450793
U2 - 10.1145/3539597.3570368
DO - 10.1145/3539597.3570368
M3 - Conference article published in proceeding or book
AN - SCOPUS:85145450793
T3 - WSDM 2023 - Proceedings of the 16th ACM International Conference on Web Search and Data Mining
SP - 877
EP - 885
BT - WSDM 2023 - Proceedings of the 16th ACM International Conference on Web Search and Data Mining
PB - Association for Computing Machinery, Inc
T2 - 16th ACM International Conference on Web Search and Data Mining, WSDM 2023
Y2 - 27 February 2023 through 3 March 2023
ER -