TY - GEN
T1 - Blocker and Matcher Can Mutually Benefit: A Co-Learning Framework for Low-Resource Entity Resolution
AU - Wu, Shiwen
AU - Wu, Qiyu
AU - Dong, Honghua
AU - Hua, Wen
AU - Zhou, Xiaofang
N1 - Publisher Copyright: © 2023, VLDB Endowment. All rights reserved.
PY - 2023/11
Y1 - 2023/11
N2 - Entity resolution (ER) approaches typically consist of a blocker and a matcher. They share the same goal and cooperate in different roles: the blocker first quickly removes obvious non-matches, and the matcher subsequently determines whether the remaining pairs refer to the same real-world entity. Despite the state-of-the-art performance achieved by deep learning methods in ER, these techniques often rely on a large amount of labeled data for training, which can be challenging or costly to obtain. Thus, there is a need to develop effective ER systems under low-resource settings. In this work, we propose an end-to-end iterative Co-learning framework for ER, aimed at jointly training the blocker and the matcher by leveraging their cooperative relationship. In particular, we let the blocker and the matcher share their learned knowledge with each other via iteratively updated pseudo labels, which broaden the supervision signals. To mitigate the impact of noise in pseudo labels, we develop optimization techniques from three aspects: label generation, label selection and model training. Through extensive experiments on benchmark datasets, we demonstrate that our proposed framework outperforms baselines by an average of 9.13-51.55%. Furthermore, our analysis confirms that our framework achieves mutual benefits between the blocker and the matcher.
UR - http://www.scopus.com/inward/record.url?scp=85183595053&partnerID=8YFLogxK
U2 - 10.14778/3632093.3632096
DO - 10.14778/3632093.3632096
M3 - Conference article published in proceeding or book
AN - SCOPUS:85183595053
VL - 17
T3 - Proceedings of the VLDB Endowment
SP - 292
EP - 304
BT - Proceedings of the VLDB Endowment. 2023 Nov 50th International Conference on Very Large Data Bases, VLDB 2024
T2 - 50th International Conference on Very Large Data Bases, VLDB 2024
Y2 - 25 August 2024 through 29 August 2024
ER -