TY - GEN
T1 - Empirical Analysis of Beam Search Curse and Search Errors with Model Errors in Neural Machine Translation
AU - He, Jianfei
AU - Sun, Shichao
AU - Jia, Xiaohua
AU - Li, Wenjie
N1 - Publisher Copyright:
© 2023 The authors. This article is licensed under a Creative Commons 4.0 licence, no derivative works, attribution, CC-BY-ND.
PY - 2023
Y1 - 2023
AB - Beam search is the most popular decoding method for Neural Machine Translation (NMT) and remains a strong baseline compared with newly proposed sampling-based methods. To better understand beam search, we investigate its two well-recognized issues, the beam search curse and search errors, not only on the test data as a whole but also at the sentence level. We find that fewer than 30% of the sentences in the WMT17 En–De and De–En test sets experience these issues. Meanwhile, we observe a related phenomenon: for the majority of sentences, the gold references receive lower probabilities than the predictions from beam search. We also test with different levels of model error, including a special test using training samples and models without regularization; in this test, the model predicts tokens on the training data with 95% accuracy. We find that these phenomena persist even for such a highly accurate model. These findings suggest that improving beam search by seeking higher probabilities and further reducing search errors in decoding is not promising. The sentence-level relationship between quality and probability in our results provides useful information for finding new ways to improve NMT.
UR - http://www.scopus.com/inward/record.url?scp=85184828700&partnerID=8YFLogxK
M3 - Conference article published in proceedings or book
AN - SCOPUS:85184828700
T3 - Proceedings of the 24th Annual Conference of the European Association for Machine Translation, EAMT 2023
SP - 91
EP - 101
BT - Proceedings of the 24th Annual Conference of the European Association for Machine Translation, EAMT 2023
A2 - Nurminen, Mary
A2 - Brenner, Judith
A2 - Koponen, Maarit
A2 - Latomaa, Sirkku
A2 - Mikhailov, Mikhail
A2 - Schierl, Frederike
A2 - Ranasinghe, Tharindu
A2 - Vanmassenhove, Eva
A2 - Vidal, Sergi Alvarez
A2 - Aranberri, Nora
A2 - Nunziatini, Mara
A2 - Escartin, Carla Parra
A2 - Forcada, Mikel
A2 - Popovic, Maja
A2 - Scarton, Carolina
A2 - Moniz, Helena
PB - European Association for Machine Translation
T2 - 24th Annual Conference of the European Association for Machine Translation, EAMT 2023
Y2 - 12 June 2023 through 15 June 2023
ER -