Chemical-induced disease relation extraction with various linguistic features

Jinghang Gu, Longhua Qian, Guodong Zhou

Research output: Journal article publication › Journal article › Academic research › peer-review

49 Citations (Scopus)

Abstract

Understanding the relations between chemicals and diseases is crucial in various biomedical tasks such as new drug discoveries and new therapy developments. Manually mining these relations from the biomedical literature is costly and time-consuming, and the results of such a procedure are often difficult to keep up to date. To address these issues, the BioCreative-V community proposed a challenging task of automatic extraction of chemical-induced disease (CID) relations in order to benefit biocuration. This article describes our work on the CID relation extraction task in BioCreative-V. We built a machine learning-based system that used simple yet effective linguistic features to extract relations with maximum entropy models. In addition to these features, the hypernym relations between entity concepts derived from the Medical Subject Headings (MeSH) controlled vocabulary were employed during both the training and testing stages to obtain more accurate classification models and better extraction performance, respectively. We reduced relation extraction between entities in documents to relation extraction between entity mentions. In our system, pairs of chemical and disease mentions at both intra- and inter-sentence levels were first constructed as relation instances for training and testing; then two classification models, one per level, were trained on the training examples and applied to the testing examples. Finally, we merged the classification results from the mention level to the document level to obtain the final relations between chemicals and diseases. Using gold-standard entity annotations, our system achieved promising F-scores of 60.4% on the development dataset and 58.3% on the test dataset.
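The mention-pairing and merging steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Mention` record, the function names, the placeholder concept IDs, and the hard-coded predictions (standing in for the output of the two maximum entropy models) are all hypothetical. The merging rule used here, keeping a document-level pair if any of its mention pairs is predicted positive, is an assumption, since the abstract does not state the rule.

```python
from collections import namedtuple
from itertools import product

# Hypothetical mention record: entity type, concept ID, sentence index.
Mention = namedtuple("Mention", ["kind", "concept_id", "sent_idx"])


def build_instances(mentions):
    """Pair every chemical mention with every disease mention in a document.

    Returns (intra, inter): pairs whose mentions share a sentence, and
    pairs whose mentions lie in different sentences.
    """
    chemicals = [m for m in mentions if m.kind == "Chemical"]
    diseases = [m for m in mentions if m.kind == "Disease"]
    intra, inter = [], []
    for chem, dis in product(chemicals, diseases):
        target = intra if chem.sent_idx == dis.sent_idx else inter
        target.append((chem, dis))
    return intra, inter


def merge_to_document_level(instances, predictions):
    """Collapse mention-level predictions into document-level relations.

    Assumption: a (chemical, disease) concept pair is kept if at least
    one of its mention pairs was predicted positive.
    """
    return {
        (chem.concept_id, dis.concept_id)
        for (chem, dis), is_positive in zip(instances, predictions)
        if is_positive
    }


# Toy document with placeholder concept IDs (not real MeSH identifiers).
mentions = [
    Mention("Chemical", "CHEM_1", 0),
    Mention("Disease", "DIS_1", 0),
    Mention("Disease", "DIS_2", 2),
    Mention("Chemical", "CHEM_1", 2),
]
intra, inter = build_instances(mentions)

# Hard-coded stand-ins for the intra- and inter-sentence classifier outputs.
intra_pred = [True, False]
inter_pred = [True, False]

relations = merge_to_document_level(intra + inter, intra_pred + inter_pred)
print(sorted(relations))
```

In a fuller version, each list of instances would be turned into feature vectors and scored by its own classifier before merging; the hypernym-based use of MeSH mentioned in the abstract is left out of this sketch.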

Original language: English
Article number: baw042
Journal: Database
Volume: 2016
Publication status: Published - 2016
Externally published: Yes

ASJC Scopus subject areas

  • General Medicine
