Deep Spectro-temporal Artifacts for Detecting Synthesized Speech

Xiaohui Liu, Meng Liu, Lin Zhang, Linjuan Zhang, Kai Li, Nan Li, Kong Aik Lee, Longbiao Wang, Jianwu Dang

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

2 Citations (Scopus)

Abstract

The Audio Deep Synthesis Detection (ADD) Challenge was held to advance the detection of generated human-like speech. Based on our submitted system, this paper provides an overall assessment of track 1 (Low-quality Fake Audio Detection) and track 2 (Partially Fake Audio Detection). Spectro-temporal artifacts were detected using raw temporal signals, spectral features, and deep embedding features. For track 1, our system combined low-quality data augmentation, domain adaptation via fine-tuning, and the fusion of complementary feature information. Furthermore, we analyzed the clustering characteristics of subsystems built on different features through visualization and explained the effectiveness of our proposed greedy fusion strategy. For track 2, frame transitions and smoothing were detected using a self-supervised learning structure to capture the manipulations of partially fake (PF) attacks in the time domain. We ranked 4th in track 1 and 5th in track 2.
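
The abstract names a greedy fusion strategy but does not describe its details. As a rough illustration only, the sketch below shows one common form of greedy score-level fusion: forward selection of subsystems, averaging their scores, and keeping a subsystem only if it lowers the equal error rate (EER) on development data. The function names `eer` and `greedy_fusion`, the averaging rule, and the stopping criterion are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def eer(scores, labels):
    """Approximate equal error rate.

    Assumption: labels use 1 = genuine, 0 = fake, and a higher score
    means "more likely genuine". The EER is taken at the threshold
    where false acceptance and false rejection are closest.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    best_gap, best_eer = np.inf, 1.0
    for t in np.unique(scores):
        far = np.mean(scores[labels == 0] >= t)  # fakes accepted
        frr = np.mean(scores[labels == 1] < t)   # genuine rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


def greedy_fusion(system_scores, labels):
    """Greedy forward selection over subsystems (hypothetical helper).

    system_scores maps a subsystem name to its development-set scores.
    At each step the subsystem whose inclusion gives the lowest EER of
    the averaged scores is added; selection stops when no candidate
    improves the EER.
    """
    selected, best = [], np.inf
    remaining = list(system_scores)
    while remaining:
        trial = {}
        for name in remaining:
            members = selected + [name]
            fused = np.mean([system_scores[n] for n in members], axis=0)
            trial[name] = eer(fused, labels)
        candidate = min(trial, key=trial.get)
        if trial[candidate] >= best:
            break  # no remaining subsystem helps
        selected.append(candidate)
        best = trial[candidate]
        remaining.remove(candidate)
    return selected, best


if __name__ == "__main__":
    labels = np.array([1, 1, 0, 0])
    dev_scores = {
        "raw_waveform": np.array([0.9, 0.4, 0.5, 0.1]),
        "spectral": np.array([0.4, 0.9, 0.1, 0.5]),
    }
    print(greedy_fusion(dev_scores, labels))
```

In this toy data each subsystem misranks a different trial, so the averaged scores separate the classes even though neither subsystem does alone. That is the kind of complementarity fusion is meant to exploit.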

Original language: English
Title of host publication: DDAM 2022 - Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia
Publisher: Association for Computing Machinery, Inc
Pages: 69-75
Number of pages: 7
ISBN (Electronic): 9781450394963
Publication status: Published - 14 Oct 2022
Externally published: Yes
Event: 1st International Workshop on Deepfake Detection for Audio Multimedia, DDAM 2022 - Lisboa, Portugal
Duration: 14 Oct 2022 → …

Publication series

Name: DDAM 2022 - Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia

Conference

Conference: 1st International Workshop on Deepfake Detection for Audio Multimedia, DDAM 2022
Country/Territory: Portugal
City: Lisboa
Period: 14/10/22 → …

Keywords

  • Audio Deep Synthesis Detection
  • Domain Adaptation
  • Frame transition
  • Greedy Fusion
  • Self-Supervised Learning
  • Spectro-temporal

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Human-Computer Interaction
  • Software
