Dual Parameter-Efficient Fine-Tuning for Speaker Representation Via Speaker Prompt Tuning and Adapters

Zhe Li, Man-Wai Mak, Helen Mei-Ling Meng

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

4 Citations (Scopus)

Abstract

Fine-tuning a pre-trained Transformer model (PTM) for speech applications in a parameter-efficient manner offers the dual benefits of reducing memory consumption and leveraging the rich feature representations learned from massive unlabeled datasets. However, existing parameter-efficient fine-tuning approaches adapt either the classification head or the whole PTM. The former is unsuitable when the PTM is used as a feature extractor, and the latter does not exploit the different degrees of feature abstraction at different Transformer layers. We propose two solutions to address these limitations. First, we apply speaker prompt tuning to update the task-specific embeddings of a PTM. The tuning enhances speaker feature relevance in the speaker embeddings through the cross-attention between prompt and speaker features. Second, we insert adapter blocks into the Transformer encoders and their outputs. This novel arrangement enables the fine-tuned PTM to determine the most suitable layers from which to extract relevant information for the downstream task. Extensive speaker verification experiments on VoxCeleb and CU-MARVEL demonstrate higher parameter efficiency and better model adaptability of the proposed methods than the existing ones.
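The abstract describes two parameter-efficient components: learnable speaker prompts refined through cross-attention with speaker features, and bottleneck adapter blocks inserted into the Transformer encoders and their outputs. Below is a minimal PyTorch-style sketch of these two ideas; the module names, dimensions, bottleneck size, and the point at which the adapter is attached are illustrative assumptions, not the authors' exact implementation.

```python
# Illustrative sketch (not the authors' code): a learnable speaker-prompt pool
# refined by cross-attention over frame-level features, and a residual
# bottleneck adapter that can be inserted after a Transformer encoder layer.
# All names, dimensions, and design choices are assumptions for illustration.
import torch
import torch.nn as nn


class SpeakerPromptPool(nn.Module):
    """Learnable prompt embeddings refined by cross-attention with speaker features."""

    def __init__(self, num_prompts: int = 16, dim: int = 768, num_heads: int = 8):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (batch, time, dim) hidden states from the frozen PTM.
        batch = frame_feats.size(0)
        queries = self.prompts.unsqueeze(0).expand(batch, -1, -1)
        # Prompts query the speaker features, emphasising speaker-relevant content.
        refined, _ = self.cross_attn(queries, frame_feats, frame_feats)
        return refined  # (batch, num_prompts, dim)


class Adapter(nn.Module):
    """Bottleneck adapter with a residual connection, keeping the PTM path intact."""

    def __init__(self, dim: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))


# Usage sketch: the pre-trained backbone stays frozen; only prompts, adapters,
# and the downstream speaker head would be trained.
if __name__ == "__main__":
    dim, num_layers = 768, 12
    frame_feats = torch.randn(4, 200, dim)            # stand-in for PTM layer outputs
    adapters = nn.ModuleList(Adapter(dim) for _ in range(num_layers))
    prompt_pool = SpeakerPromptPool(dim=dim)

    adapted = adapters[-1](frame_feats)               # adapter applied to one encoder output
    prompt_emb = prompt_pool(adapted)                 # (4, 16, dim) prompt-refined features
    print(prompt_emb.shape)
```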
Original language: English
Title of host publication: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings
Pages: 10751-10755
Number of pages: 5
ISBN (Electronic): 9798350344851
DOIs
Publication status: Published - Apr 2024

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Keywords

  • Speaker verification
  • Transformer adapter
  • parameter-efficient tuning
  • pre-trained Transformer
  • prompt tuning

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
