TY - JOUR
T1 - ASVspoof 2019: a large-scale public database of synthetized, converted and replayed speech
AU - Wang, Xin
AU - Yamagishi, Junichi
AU - Todisco, Massimiliano
AU - Delgado, Héctor
AU - Nautsch, Andreas
AU - Evans, Nicholas
AU - Sahidullah, Md
AU - Vestman, Ville
AU - Kinnunen, Tomi
AU - Lee, Kong Aik
AU - Juvela, Lauri
AU - Alku, Paavo
AU - Peng, Yu Huai
AU - Hwang, Hsin Te
AU - Tsao, Yu
AU - Wang, Hsin Min
AU - Maguer, Sébastien Le
AU - Becker, Markus
AU - Henderson, Fergus
AU - Clark, Rob
AU - Zhang, Yu
AU - Wang, Quan
AU - Jia, Ye
AU - Onuma, Kai
AU - Mushika, Koji
AU - Kaneda, Takashi
AU - Jiang, Yuan
AU - Liu, Li Juan
AU - Wu, Yi Chiao
AU - Huang, Wen Chin
AU - Toda, Tomoki
AU - Tanaka, Kou
AU - Kameoka, Hirokazu
AU - Steiner, Ingmar
AU - Matrouf, Driss
AU - Bonastre, Jean François
AU - Govender, Avashna
AU - Ronanki, Srikanth
AU - Zhang, Jing Xuan
AU - Ling, Zhen Hua
N1 - Publisher Copyright: © 2020
PY - 2020/11
Y1 - 2020/11
N2 - Automatic speaker verification (ASV) is one of the most natural and convenient means of biometric person recognition. Unfortunately, just like all other biometric systems, ASV is vulnerable to spoofing, also referred to as “presentation attacks.” These vulnerabilities are generally unacceptable and call for spoofing countermeasures or “presentation attack detection” systems. In addition to impersonation, ASV systems are vulnerable to replay, speech synthesis, and voice conversion attacks. The ASVspoof challenge initiative was created to foster research on anti-spoofing and to provide common platforms for the assessment and comparison of spoofing countermeasures. The first edition, ASVspoof 2015, focused upon the study of countermeasures for detecting text-to-speech synthesis (TTS) and voice conversion (VC) attacks. The second edition, ASVspoof 2017, focused instead upon replay spoofing attacks and countermeasures. The ASVspoof 2019 edition is the first to consider all three spoofing attack types within a single challenge. While they originate from the same source database and the same underlying protocol, they are explored in two specific use case scenarios. Spoofing attacks within a logical access (LA) scenario are generated with the latest speech synthesis and voice conversion technologies, including state-of-the-art neural acoustic and waveform model techniques. Replay spoofing attacks within a physical access (PA) scenario are generated through carefully controlled simulations that support much more revealing analysis than was previously possible. Also new to the 2019 edition is the use of the tandem detection cost function metric, which reflects the impact of spoofing and countermeasures on the reliability of a fixed ASV system. This paper describes the database design, protocol, spoofing attack implementations, and baseline ASV and countermeasure results. It also describes a human assessment of spoofed data in the logical access scenario. It was demonstrated that the spoofing data in the ASVspoof 2019 database have varied degrees of perceived quality and similarity to the target speakers, including spoofed data that cannot be differentiated from bona fide utterances even by human subjects. It is expected that the ASVspoof 2019 database, with its varied coverage of different types of spoofing data, could further foster research on anti-spoofing.
KW - Anti-spoofing
KW - ASVspoof challenge
KW - Automatic speaker verification
KW - Biometrics
KW - Countermeasure
KW - Media forensics
KW - Presentation attack
KW - Presentation attack detection
KW - Replay
KW - Text-to-speech synthesis
KW - Voice conversion
UR - https://www.scopus.com/pages/publications/85085554705
U2 - 10.1016/j.csl.2020.101114
DO - 10.1016/j.csl.2020.101114
M3 - Journal article
AN - SCOPUS:85085554705
SN - 0885-2308
VL - 64
JO - Computer Speech and Language
JF - Computer Speech and Language
M1 - 101114
ER -