Blocker and Matcher Can Mutually Benefit: A Co-Learning Framework for Low-Resource Entity Resolution

Shiwen Wu, Qiyu Wu, Honghua Dong, Wen Hua, Xiaofang Zhou

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

1 Citation (Scopus)

Abstract

Entity resolution (ER) approaches typically consist of a blocker and a matcher. They share the same goal and cooperate in different roles: the blocker first quickly removes obvious non-matches, and the matcher subsequently determines whether the remaining pairs refer to the same real-world entity. Despite the state-of-the-art performance achieved by deep learning methods in ER, these techniques often rely on a large amount of labeled data for training, which can be challenging or costly to obtain. Thus, there is a need to develop effective ER systems under low-resource settings. In this work, we propose an end-to-end iterative Co-learning framework for ER, aimed at jointly training the blocker and the matcher by leveraging their cooperative relationship. In particular, we let the blocker and the matcher share their learned knowledge with each other via iteratively updated pseudo labels, which broaden the supervision signals. To mitigate the impact of noise in pseudo labels, we develop optimization techniques from three aspects: label generation, label selection and model training. Through extensive experiments on benchmark datasets, we demonstrate that our proposed framework outperforms baselines by an average of 9.13-51.55%. Furthermore, our analysis confirms that our framework achieves mutual benefits between the blocker and the matcher.
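The pseudo-label exchange described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: two threshold classifiers stand in for the neural blocker and matcher, the confidence margin is a simple stand-in for the paper's label selection step, and every name here (`co_learn`, `fit_threshold`, `confident_labels`, `predict`, `SEED`, `UNLABELED`) is hypothetical.

```python
from difflib import SequenceMatcher

def jaccard(a, b):
    """Cheap token-overlap score; plays the role of the blocker here (assumption)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def char_sim(a, b):
    """Character-level similarity; plays the role of the matcher here (assumption)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def fit_threshold(score_fn, labeled):
    """'Train' a model: midpoint between mean match and mean non-match score."""
    pos = [score_fn(a, b) for (a, b), y in labeled if y == 1]
    neg = [score_fn(a, b) for (a, b), y in labeled if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def confident_labels(score_fn, threshold, pairs, margin):
    """Pseudo-label only pairs whose score is at least `margin` from the threshold."""
    labels = []
    for a, b in pairs:
        score = score_fn(a, b)
        if abs(score - threshold) >= margin:
            labels.append(((a, b), int(score > threshold)))
    return labels

def co_learn(seed, unlabeled, rounds=3, margin=0.2):
    """Alternate training and pseudo-label exchange between the two models."""
    blocker_train, matcher_train = list(seed), list(seed)
    for _ in range(rounds):
        t_block = fit_threshold(jaccard, blocker_train)
        t_match = fit_threshold(char_sim, matcher_train)
        # Each model hands its confident predictions to the other one.
        from_blocker = confident_labels(jaccard, t_block, unlabeled, margin)
        from_matcher = confident_labels(char_sim, t_match, unlabeled, margin)
        # The seed labels are always kept, so both classes stay represented.
        matcher_train = list(seed) + from_blocker
        blocker_train = list(seed) + from_matcher
    # Final fit on the last round's training sets.
    return fit_threshold(jaccard, blocker_train), fit_threshold(char_sim, matcher_train)

def predict(a, b, t_block, t_match):
    """Blocker filters obvious non-matches first; the matcher decides the rest."""
    if jaccard(a, b) < t_block:
        return False
    return char_sim(a, b) >= t_match

# Tiny made-up data: one labeled match, one labeled non-match, a few unlabeled pairs.
SEED = [
    (("apple iphone 12 64gb", "iphone 12 apple 64 gb"), 1),
    (("apple iphone 12 64gb", "samsung galaxy s21"), 0),
]
UNLABELED = [
    ("sony wh-1000xm4 headphones", "sony wh-1000xm4 headphones black"),
    ("dell xps 13 laptop", "canon eos r6 camera"),
    ("logitech mx master 3", "logitech mx master 3 mouse"),
]
```

Calling `co_learn(SEED, UNLABELED)` returns the two fitted thresholds, which `predict` then applies in blocker-then-matcher order. Keeping the seed labels in every round is a deliberate choice in this sketch so that noisy pseudo labels never fully replace the original supervision.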

Original language: English
Title of host publication: Proceedings of the VLDB Endowment. 2023 Nov 50th International Conference on Very Large Data Bases, VLDB 2024
Pages: 292-304
Number of pages: 13
Volume: 17
Edition: 3
Publication status: Published - Nov 2023
Event: 50th International Conference on Very Large Data Bases, VLDB 2024 - Guangzhou, China
Duration: 25 Aug 2024 - 29 Aug 2024

Publication series

Name: Proceedings of the VLDB Endowment
Publisher: Very Large Data Base Endowment Inc.
ISSN (Print): 2150-8097

Conference

Conference: 50th International Conference on Very Large Data Bases, VLDB 2024
Country/Territory: China
City: Guangzhou
Period: 25/08/24 - 29/08/24

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • General Computer Science
