Neural Acoustic-Phonetic Approach for Speaker Verification With Phonetic Attention Mask

Tianchi Liu, Rohan Kumar Das, Kong Aik Lee, Haizhou Li

Research output: Journal article publication › Journal article › Academic research › peer-review

11 Citations (Scopus)

Abstract

The traditional acoustic-phonetic approach uses both spectral and phonetic information when comparing speakers' voices. Although phonetic units are not equally informative, the phonetic context of speech plays an important role in speaker verification (SV). In this paper, we propose a neural acoustic-phonetic approach that learns to dynamically assign differentiated weights to spectral features for SV. These weights form a phonetic attention mask (PAM). The neural acoustic-phonetic framework consists of two training pipelines, one for SV and another for speech recognition. Through the PAM, we leverage phonetic information for SV. We evaluate the proposed framework on Part III of the RSR2015 database, which consists of random digit strings. The proposed framework with PAM consistently outperforms the baseline, with equal error rate reductions of 13.45% and 10.20% for female and male data, respectively.
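
As a rough illustration of the masking idea only, the sketch below weights each spectral feature value by a mask value between 0 and 1. It is not the authors' implementation: the abstract does not specify how the PAM is computed or applied, so the sigmoid squashing, the element-wise multiplication, and the names `apply_phonetic_attention_mask` and `phonetic_scores` are all assumptions made for this example.

```python
import math


def sigmoid(x: float) -> float:
    """Squash a real-valued score into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def apply_phonetic_attention_mask(features, phonetic_scores):
    """Weight spectral features by a mask derived from phonetic scores.

    features:        list of frames, each a list of spectral values.
    phonetic_scores: same shape as `features`; hypothetical raw scores
                     standing in for the output of a phonetic branch.

    ASSUMPTION: the mask is sigmoid(score) and is applied element-wise.
    The abstract states only that differentiated weights are assigned to
    spectral features, not how they are produced.
    """
    if len(features) != len(phonetic_scores):
        raise ValueError("features and phonetic_scores differ in frame count")

    masked = []
    for frame, scores in zip(features, phonetic_scores):
        if len(frame) != len(scores):
            raise ValueError("frame and score dimensions differ")
        # Each feature value is scaled by its own weight in (0, 1).
        masked.append([x * sigmoid(s) for x, s in zip(frame, scores)])
    return masked


if __name__ == "__main__":
    frames = [[2.0, 4.0], [1.0, 3.0]]   # toy spectral features
    scores = [[0.0, 0.0], [5.0, -5.0]]  # toy phonetic scores
    print(apply_phonetic_attention_mask(frames, scores))
```

In this toy setup a score of zero halves the feature value, while strongly positive or negative scores pass or suppress it almost entirely, which is one simple way weights could be "differentiated" across features.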

Original language: English
Article number: 9681187
Pages (from-to): 782-786
Number of pages: 5
Journal: IEEE Signal Processing Letters
Volume: 29
Publication status: Published - Jan 2022
Externally published: Yes

Keywords

  • attention
  • masking
  • phonetic information
  • prompted digit recognition
  • speaker verification
  • text-dependent

ASJC Scopus subject areas

  • Signal Processing
  • Applied Mathematics
  • Electrical and Electronic Engineering
