Maximal Figure-of-Merit Embedding for Multi-Label Audio Classification

Ivan Kukanov, Ville Hautamaki, Kong Aik Lee

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

6 Citations (Scopus)

Abstract

This work tackles domestic audio tagging, also called environmental sound classification, where a single audio recording can contain one or more acoustic events and the recognizer must output all of the corresponding tags. The baseline model for this task is a convolutional recurrent neural network (CRNN) with sigmoid output nodes, optimized with the binary cross-entropy objective. Traditional error metrics, such as classification error, are not suitable for this type of multi-label task. In this work, we show that the maximal figure-of-merit (MFoM) framework helps to separate the multi-label classes in terms of equal error rate (EER). We embed MFoM into the deep learning objective function and obtain a relative improvement of more than 9% over the baseline model trained with binary cross-entropy.
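The abstract reports results in terms of equal error rate: for each tag, the error at the decision threshold where the false-negative (miss) rate equals the false-positive (false-alarm) rate. The Python sketch below computes a per-class EER from sigmoid output scores and averages it over tags. It is an illustrative assumption, not the authors' evaluation code: the function names `class_eer` and `mean_eer` are hypothetical, the EER is approximated at the observed threshold where the two error rates are closest, and averaging over classes is assumed as the summary metric. It does not implement the MFoM objective itself.

```python
import numpy as np


def class_eer(scores, labels):
    """Approximate equal error rate for one tag (labels are 0/1).

    Hypothetical helper: picks the observed threshold where the
    false-negative and false-positive rates are closest and returns
    their mean (no interpolation between thresholds).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need both positive and negative examples")

    # Candidate thresholds: every observed score, plus one above all scores.
    thresholds = np.concatenate([np.unique(scores), [np.inf]])
    best_gap, best_eer = np.inf, None
    for t in thresholds:
        fnr = np.mean(pos < t)   # positives rejected at threshold t
        fpr = np.mean(neg >= t)  # negatives accepted at threshold t
        gap = abs(fnr - fpr)
        if gap < best_gap:
            best_gap, best_eer = gap, (fnr + fpr) / 2.0
    return best_eer


def mean_eer(score_matrix, label_matrix):
    """Average per-tag EER over the K columns of (N, K) score/label arrays.

    Assumption: class-averaged EER as the summary metric.
    """
    score_matrix = np.asarray(score_matrix, dtype=float)
    label_matrix = np.asarray(label_matrix, dtype=int)
    per_class = [
        class_eer(score_matrix[:, k], label_matrix[:, k])
        for k in range(score_matrix.shape[1])
    ]
    return float(np.mean(per_class))
```

A lower EER is better: 0.0 means that, for a tag, every positive recording scores above every negative one, so some threshold produces neither misses nor false alarms.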

Original language: English
Title of host publication: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 136-140
Number of pages: 5
ISBN (Print): 9781538646588
DOIs
Publication status: Published - 10 Sept 2018
Externally published: Yes
Event: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Calgary, Canada
Duration: 15 Apr 2018 to 20 Apr 2018

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2018-April
ISSN (Print): 1520-6149

Conference

Conference: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018
Country/Territory: Canada
City: Calgary
Period: 15/04/18 to 20/04/18

Keywords

  • Audio tagging
  • Deep learning
  • Equal error rate
  • Multi-label classification

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

