TY - GEN
T1 - Maximal Figure-of-Merit Embedding for Multi-Label Audio Classification
AU - Kukanov, Ivan
AU - Hautamaki, Ville
AU - Lee, Kong Aik
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/9/10
Y1 - 2018/9/10
N2 - This work tackles the problem of domestic audio tagging, or environmental sound classification, where a single audio recording can contain one or more acoustic events and a recognizer should output all of the corresponding tags. A baseline model for this task is a convolutional recurrent neural network (CRNN) with sigmoid output nodes, optimized using the binary cross-entropy objective. Traditional error metrics, such as classification error, are not suitable for this type of task. In this work, we show that the maximal figure-of-merit (MFoM) framework helps to separate the multi-label classes in terms of equal error rate (EER). We embed MFoM into the deep learning objective function and gain more than 9% relative improvement over the baseline model trained with binary cross-entropy.
AB - This work tackles the problem of domestic audio tagging, or environmental sound classification, where a single audio recording can contain one or more acoustic events and a recognizer should output all of the corresponding tags. A baseline model for this task is a convolutional recurrent neural network (CRNN) with sigmoid output nodes, optimized using the binary cross-entropy objective. Traditional error metrics, such as classification error, are not suitable for this type of task. In this work, we show that the maximal figure-of-merit (MFoM) framework helps to separate the multi-label classes in terms of equal error rate (EER). We embed MFoM into the deep learning objective function and gain more than 9% relative improvement over the baseline model trained with binary cross-entropy.
KW - Audio tagging
KW - Deep learning
KW - Equal error rate
KW - Multi-label classification
UR - http://www.scopus.com/inward/record.url?scp=85054262135&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2018.8461396
DO - 10.1109/ICASSP.2018.8461396
M3 - Conference article published in proceeding or book
AN - SCOPUS:85054262135
SN - 9781538646588
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 136
EP - 140
BT - 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018
Y2 - 15 April 2018 through 20 April 2018
ER -