Domain mismatch is a common problem in speaker verification (SV) and often causes performance degradation. For systems that rely on a Gaussian PLDA backend to suppress channel variability, performance is further limited when no Gaussianity constraint is imposed on the learned embeddings. This paper proposes an information-maximized variational domain adversarial neural network (InfoVDANN) that incorporates an InfoVAE into domain adversarial training (DAT) to reduce domain mismatch while meeting the Gaussianity requirement of the PLDA backend. Specifically, DAT is applied to produce speaker-discriminative and domain-invariant features, while the InfoVAE performs variational regularization on the embedded features so that they follow a Gaussian distribution. Another benefit of the InfoVAE is that it avoids the posterior collapse seen in VAEs by preserving the mutual information between the embedded features and the training set, so that extra speaker information is retained in the features. Experiments on both SRE16 and SRE18-CMN2 show that the InfoVDANN outperforms the recent VDANN, which suggests that increasing the mutual information between the embedded features and the input features enables the InfoVDANN to extract speaker information that could not otherwise be extracted.
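The variational regularization described in the abstract can be illustrated with a minimal sketch of an InfoVAE-style penalty on the embeddings. It is not the paper's implementation. It assumes the usual InfoVAE form, in which a per-sample KL term weighted by (1 - alpha) is combined with a divergence between the aggregate posterior and the Gaussian prior weighted by (alpha + lambda - 1), and it assumes an RBF-kernel maximum mean discrepancy (MMD) as that divergence. The function names, the kernel bandwidth, and the default values of `alpha` and `lam` are hypothetical. The speaker classification, domain adversarial, and reconstruction losses are omitted.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for one embedding."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two vectors; bandwidth is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between two sample sets."""
    def mean_kernel(a, b):
        total = sum(rbf_kernel(p, q, bandwidth) for p in a for q in b)
        return total / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def infovae_regularizer(mus, log_vars, zs, prior_zs, alpha=0.5, lam=10.0):
    """Hypothetical InfoVAE-style penalty on a batch of embeddings.

    mus, log_vars: per-utterance Gaussian posterior parameters
    zs:            embeddings sampled from those posteriors
    prior_zs:      samples drawn from the N(0, I) prior
    alpha, lam:    illustrative weights, not values taken from the paper
    """
    kl = sum(kl_to_standard_normal(m, lv)
             for m, lv in zip(mus, log_vars)) / len(mus)
    return (1.0 - alpha) * kl + (alpha + lam - 1.0) * mmd(zs, prior_zs)
```

In this sketch, setting `alpha` closer to 1 down-weights the per-sample KL term, which is the term associated with posterior collapse, while the MMD term still pulls the overall distribution of embeddings toward the Gaussian prior expected by the PLDA backend.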
|Number of pages||12|
|Journal||IEEE/ACM Transactions on Audio Speech and Language Processing|
|Publication status||Published - Jun 2020|
- Speaker recognition
- Adversarial learning
- Mutual information
- Variational autoencoder