TY - GEN
T1 - Information-oriented evaluation metric for dialogue response generation systems
AU - Liu, Peiqi
AU - Zhong, Sheng Hua
AU - Ming, Zhong
AU - Liu, Yan
PY - 2018/12/13
Y1 - 2018/12/13
N2 - Dialogue response generation systems are a hot topic in natural language processing, but they still have a long way to go before they can generate human-like dialogues. A good evaluation method will help narrow the gap between machine and human in dialogue generation. Unfortunately, current evaluation methods cannot measure whether a dialogue response generation system is able to produce high-quality, knowledge-related, and informative dialogues. Aiming to identify and measure the existence of information in dialogues, we propose a novel automatic evaluation metric. Drawing on the knowledge representation methods used in knowledge bases, we define heuristic rules to extract information triples from dialogue pairs, and we design an information matching method to measure the probability that information exists in a dialogue. In experiments, the proposed metric demonstrates its effectiveness in dialogue selection and model evaluation on the Reddit dataset (English) and the Weibo dataset (Chinese).
AB - Dialogue response generation systems are a hot topic in natural language processing, but they still have a long way to go before they can generate human-like dialogues. A good evaluation method will help narrow the gap between machine and human in dialogue generation. Unfortunately, current evaluation methods cannot measure whether a dialogue response generation system is able to produce high-quality, knowledge-related, and informative dialogues. Aiming to identify and measure the existence of information in dialogues, we propose a novel automatic evaluation metric. Drawing on the knowledge representation methods used in knowledge bases, we define heuristic rules to extract information triples from dialogue pairs, and we design an information matching method to measure the probability that information exists in a dialogue. In experiments, the proposed metric demonstrates its effectiveness in dialogue selection and model evaluation on the Reddit dataset (English) and the Weibo dataset (Chinese).
KW - Dialogue response generation system
KW - Information-oriented evaluation metric
KW - Knowledge base
UR - http://www.scopus.com/inward/record.url?scp=85060802531&partnerID=8YFLogxK
U2 - 10.1109/ICTAI.2018.00122
DO - 10.1109/ICTAI.2018.00122
M3 - Conference article published in proceeding or book
AN - SCOPUS:85060802531
T3 - Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI
SP - 780
EP - 785
BT - Proceedings - 2018 IEEE 30th International Conference on Tools with Artificial Intelligence, ICTAI 2018
PB - IEEE Computer Society
T2 - 30th International Conference on Tools with Artificial Intelligence, ICTAI 2018
Y2 - 5 November 2018 through 7 November 2018
ER -