TY - GEN
T1 - MedDG
T2 - 11th CCF International Conference on Natural Language Processing and Chinese Computing, NLPCC 2022
AU - Liu, Wenge
AU - Tang, Jianheng
AU - Cheng, Yi
AU - Li, Wenjie
AU - Zheng, Yefeng
AU - Liang, Xiaodan
N1 - Publisher Copyright: © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - Medical dialogue systems interact with patients to collect symptoms and provide treatment advice. In this task, medical entities (e.g., diseases, symptoms, and medicines) are the most central part of the dialogues. However, existing datasets either do not provide entity annotation or are too small in scale. In this paper, we present MedDG, an entity-centric medical dialogue dataset, where medical entities are annotated with the help of domain experts. It consists of 17,864 Chinese dialogues, 385,951 utterances, and 217,205 entities, at least one order of magnitude larger than existing entity-annotated datasets. Based on MedDG, we conduct preliminary research on entity-aware medical dialogue generation by implementing several benchmark models. Extensive experiments show that entity-aware adaptations of the generation models consistently enhance response quality, but substantial room for improvement remains for future research. The code and dataset are released at https://github.com/lwgkzl/MedDG.
UR - https://www.scopus.com/pages/publications/85140483225
U2 - 10.1007/978-3-031-17120-8_35
DO - 10.1007/978-3-031-17120-8_35
M3 - Conference article published in proceeding or book
AN - SCOPUS:85140483225
SN - 9783031171192
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 447
EP - 459
BT - Natural Language Processing and Chinese Computing - 11th CCF International Conference, NLPCC 2022, Proceedings
A2 - Lu, Wei
A2 - Huang, Shujian
A2 - Hong, Yu
A2 - Zhou, Xiabing
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 24 September 2022 through 25 September 2022
ER -