Abstract
Distant supervision is an efficient way to generate large-scale training data for relation extraction without human effort. However, it brings challenges that have hindered the progress of relation extractors: (1) the automatically annotated training labels contain substantial noise, which hurts the performance of the extractor; (2) the annotations are made at the bag level (a cluster of sentences) rather than the sentence level (a single sentence), which is too coarse to train an accurate extractor; (3) heterogeneous sentences make it hard for a denoising model to capture the underlying commonality among valid relational expressions. To address these issues, we build a novel sentence representation and design a reinforcement learning approach that selects the expressive sentence for each relation mentioned in a bag. More specifically, we introduce an entity-free sentence pattern incorporating attentive type information. Furthermore, we propose multiple interactions between the entity-specific and entity-free representations to generate complementary sentence features (addressing challenge 3). We then design a fine-grained reward function and model the sentence selection process as an auction, in which the different relations of a bag compete for possession of a specific sentence based on its expressiveness. In this way, our model adapts itself dynamically and eventually establishes an accurate one-to-one mapping from each relation label to its chosen expressive sentence, which serves as a training instance for the extractor (addressing challenges 1 and 2). Experimental results on two public datasets demonstrate the superiority of our model over current state-of-the-art methods for distantly supervised relation extraction.
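The abstract frames sentence selection as an auction in which the relation labels of one bag compete for sentences, ending in a one-to-one mapping from relations to sentences. As a rough illustration only, the sketch below applies the classic Bertsekas forward auction algorithm for the assignment problem, assuming a per-relation expressiveness score for every sentence is already available. The function name `auction_assign`, the `scores` layout, and the `eps` parameter are hypothetical; this is not the authors' method, which learns selection with reinforcement learning and a fine-grained reward rather than solving a fixed assignment problem.

```python
from collections import deque
from typing import Dict, List


def auction_assign(scores: Dict[str, List[float]], eps: float = 1e-3) -> Dict[str, int]:
    """Give each relation (bidder) a distinct sentence index (object).

    Hypothetical input: scores[r][s] is an assumed expressiveness score of
    sentence s for relation r. All rows must have the same length, and there
    must be at least as many sentences as relations.
    """
    relations = list(scores)
    n_sent = len(next(iter(scores.values()))) if scores else 0
    if any(len(row) != n_sent for row in scores.values()):
        raise ValueError("every relation needs a score for every sentence")
    if len(relations) > n_sent:
        raise ValueError("need at least as many sentences as relations")

    prices = [0.0] * n_sent          # current "price" of each sentence
    owner: Dict[int, str] = {}       # sentence index -> relation holding it
    unassigned = deque(relations)    # relations that still have to bid

    while unassigned:
        rel = unassigned.popleft()
        # Net value of each sentence for this relation at current prices.
        values = [scores[rel][s] - prices[s] for s in range(n_sent)]
        best = max(range(n_sent), key=values.__getitem__)
        second = max((values[s] for s in range(n_sent) if s != best),
                     default=values[best])
        # Raise the price by the margin over the runner-up, plus eps.
        prices[best] += values[best] - second + eps
        # The previous holder is outbid and must bid again.
        if best in owner:
            unassigned.append(owner[best])
        owner[best] = rel

    return {rel: s for s, rel in owner.items()}


# Toy bag: two relation labels competing for three sentences.
bag_scores = {
    "founder_of": [0.9, 0.8, 0.1],
    "ceo_of": [0.85, 0.2, 0.3],
}
print(auction_assign(bag_scores, eps=0.01))
```

In the toy bag, both relations first prefer sentence 0; `ceo_of` outbids `founder_of`, which then settles on sentence 1, so each label ends up with its own sentence. Starting from zero prices with at least as many sentences as relations, the forward auction's total score is within n·ε of the best possible assignment (n = number of relations), so a small `eps` trades extra bidding rounds for closeness to the optimum.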
| Original language | English |
|---|---|
| Pages (from-to) | 1134-1148 |
| Number of pages | 15 |
| Journal | IEEE Transactions on Knowledge and Data Engineering |
| Volume | 35 |
| Issue number | 2 |
| Publication status | Published - 1 Feb 2023 |
| Externally published | Yes |
Keywords
- distant supervision
- heterogeneous sentences
- multi-instance multi-label
- reinforcement learning
- relation extraction
ASJC Scopus subject areas
- Information Systems
- Artificial Intelligence