TY - GEN
T1 - VoShield: Voice Liveness Detection with Sound Field Dynamics
AU - Yang, Qiang
AU - Cui, Kaiyan
AU - Zheng, Yuanqing
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023/8
Y1 - 2023/8
N2 - Voice assistants are widely integrated into smart devices, enabling users to complete daily tasks and even critical operations such as online transactions with voice commands. If an attacker replays a secretly recorded voice command through a loudspeaker to compromise a user's voice assistant, the attack can cause serious consequences such as information leakage and property loss. Unfortunately, most voice liveness detection approaches against replay attacks rely on detecting lip motions or subtle physiological features in speech, and therefore work only within a very short range. In this paper, we propose VoShield to check whether a voice command comes from a genuine user or a loudspeaker imposter. VoShield measures sound field dynamics, a feature that changes rapidly as the human mouth opens and closes. In contrast, the feature remains rather stable for loudspeakers because of their fixed size. This feature enables VoShield to substantially extend the working distance and remain robust to user location. In addition, sound field dynamics are extracted from the difference between multiple microphone channels, making the feature robust to voice volume. To evaluate VoShield, we conducted comprehensive experiments with various settings in different working scenarios. The results show that VoShield achieves a detection accuracy of 98.2% and an Equal Error Rate of 2.0%, making it a promising complement to current voice authentication systems for smart devices.
KW - Liveness Detection
KW - Microphone Array
KW - Replay Attack
KW - Voice Assistant
UR - http://www.scopus.com/inward/record.url?scp=85160104717&partnerID=8YFLogxK
U2 - 10.1109/INFOCOM53939.2023.10229038
DO - 10.1109/INFOCOM53939.2023.10229038
M3 - Conference article published in proceeding or book
AN - SCOPUS:85160104717
T3 - Proceedings - IEEE INFOCOM
SP - 1
EP - 10
BT - INFOCOM 2023 - IEEE Conference on Computer Communications
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 42nd IEEE International Conference on Computer Communications, INFOCOM 2023
Y2 - 17 May 2023 through 20 May 2023
ER -