MedDG: An Entity-Centric Medical Consultation Dataset for Entity-Aware Medical Dialogue Generation

Wenge Liu, Jianheng Tang, Yi Cheng, Wenjie Li, Yefeng Zheng, Xiaodan Liang

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

9 Citations (Scopus)

Abstract

Medical dialogue systems interact with patients to collect symptoms and provide treatment advice. In this task, medical entities (e.g., diseases, symptoms, and medicines) are the most central part of the dialogues. However, existing datasets either do not provide entity annotation or are too small in scale. In this paper, we present MedDG, an entity-centric medical dialogue dataset in which medical entities are annotated with the help of domain experts. It consists of 17,864 Chinese dialogues, 385,951 utterances, and 217,205 entities, at least one order of magnitude larger than existing entity-annotated datasets. Based on MedDG, we conduct preliminary research on entity-aware medical dialogue generation by implementing several benchmark models. Extensive experiments show that entity-aware adaptations of the generation models consistently enhance response quality, but considerable room for improvement remains for future research. The code and the dataset are released at https://github.com/lwgkzl/MedDG.

Original language: English
Title of host publication: Natural Language Processing and Chinese Computing - 11th CCF International Conference, NLPCC 2022, Proceedings
Editors: Wei Lu, Shujian Huang, Yu Hong, Xiabing Zhou
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 447-459
Number of pages: 13
ISBN (Print): 9783031171192
Publication status: Published - 2022
Event: 11th CCF International Conference on Natural Language Processing and Chinese Computing, NLPCC 2022 - Guilin, China
Duration: 24 Sept 2022 - 25 Sept 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13551 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 11th CCF International Conference on Natural Language Processing and Chinese Computing, NLPCC 2022
Country/Territory: China
City: Guilin
Period: 24/09/22 - 25/09/22

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
