TY - GEN
T1 - On the analysis and evaluation of prosody conversion techniques
AU - Sisman, Berrak
AU - Lee, Grandee
AU - Li, Haizhou
AU - Tan, Kay Chen
N1 - Funding Information:
This research is supported by the Ministry of Education, Singapore, AcRF Tier 1 NUS Start-up Grant FY2016. Berrak Sisman is also funded by a SINGA Scholarship under the A*STAR Graduate Academy.
Publisher Copyright:
© 2017 IEEE.
PY - 2018/2/21
Y1 - 2018/2/21
N2 - Voice conversion is the process of modifying the characteristics of a source speaker's voice, such as spectrum and/or prosody, so that it sounds as if it were spoken by another speaker. In this paper, we study the evaluation of prosody transformation, in particular the evaluation of fundamental frequency (F0) conversion. F0 is an essential prosodic feature that should be handled in a comprehensive voice conversion framework. So far, converted prosody features have been evaluated mainly with the Pearson correlation coefficient and the root mean square error (RMSE). However, these metrics do not explicitly measure the F0 alignment between the source and target signals. We believe that an evaluation measure that accounts for the time alignment of F0 is needed to provide a new perspective. Therefore, in this paper, we study a new technique to assess the accuracy of prosody transformation. In experiments with different prosody transformation techniques, the proposed evaluation approach produces results consistent with those of the baseline evaluation metrics.
AB - Voice conversion is the process of modifying the characteristics of a source speaker's voice, such as spectrum and/or prosody, so that it sounds as if it were spoken by another speaker. In this paper, we study the evaluation of prosody transformation, in particular the evaluation of fundamental frequency (F0) conversion. F0 is an essential prosodic feature that should be handled in a comprehensive voice conversion framework. So far, converted prosody features have been evaluated mainly with the Pearson correlation coefficient and the root mean square error (RMSE). However, these metrics do not explicitly measure the F0 alignment between the source and target signals. We believe that an evaluation measure that accounts for the time alignment of F0 is needed to provide a new perspective. Therefore, in this paper, we study a new technique to assess the accuracy of prosody transformation. In experiments with different prosody transformation techniques, the proposed evaluation approach produces results consistent with those of the baseline evaluation metrics.
KW - Prosody evaluation
KW - prosody transformation
KW - voice conversion
UR - http://www.scopus.com/inward/record.url?scp=85046644126&partnerID=8YFLogxK
U2 - 10.1109/IALP.2017.8300542
DO - 10.1109/IALP.2017.8300542
M3 - Conference article published in proceeding or book
AN - SCOPUS:85046644126
T3 - Proceedings of the 2017 International Conference on Asian Language Processing, IALP 2017
SP - 44
EP - 47
BT - Proceedings of the 2017 International Conference on Asian Language Processing, IALP 2017
A2 - Tong, Rong
A2 - Dong, Minghui
A2 - Lu, Yanfeng
A2 - Zhang, Yue
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 21st International Conference on Asian Language Processing, IALP 2017
Y2 - 5 December 2017 through 7 December 2017
ER -