TY - GEN
T1 - Comparison of Supervector and Majority Voting in Acoustic Scene Identification
AU - Jiang, Yuechi
AU - Leung, Frank H.F.
PY - 2018/11/19
Y1 - 2018/11/19
N2 - Acoustic scene identification aims to identify the acoustic environment from the acoustic signal. Usually one first divides an acoustic signal into multiple short-time frames and then calculates frame-level features. A natural question is then how to make use of these frame-level features for identification purposes. In this paper, we compare two feature aggregation methods. One method is Majority Voting (MV), which treats each frame-level feature as an independent feature vector and then performs identification using majority voting strategies. In this way, an acoustic signal is represented by multiple feature vectors. The other method is Supervector, which maps the frame-level features to a single feature vector. In this way, an acoustic signal is represented by one feature vector. In particular, we consider three types of Supervector: Gaussian Supervector, Factor Analysis Supervector, and i-vector. We then compare Supervector with MV in an acoustic scene identification task. Different classifiers are employed, including Gaussian Mixture Model (GMM), Support Vector Machine (SVM), Multilayer Perceptron (MLP), and Deep Neural Network (DNN). Experimental results indicate that these two feature aggregation methods give very similar performance; nonetheless, each has its own advantages and disadvantages.
AB - Acoustic scene identification aims to identify the acoustic environment from the acoustic signal. Usually one first divides an acoustic signal into multiple short-time frames and then calculates frame-level features. A natural question is then how to make use of these frame-level features for identification purposes. In this paper, we compare two feature aggregation methods. One method is Majority Voting (MV), which treats each frame-level feature as an independent feature vector and then performs identification using majority voting strategies. In this way, an acoustic signal is represented by multiple feature vectors. The other method is Supervector, which maps the frame-level features to a single feature vector. In this way, an acoustic signal is represented by one feature vector. In particular, we consider three types of Supervector: Gaussian Supervector, Factor Analysis Supervector, and i-vector. We then compare Supervector with MV in an acoustic scene identification task. Different classifiers are employed, including Gaussian Mixture Model (GMM), Support Vector Machine (SVM), Multilayer Perceptron (MLP), and Deep Neural Network (DNN). Experimental results indicate that these two feature aggregation methods give very similar performance; nonetheless, each has its own advantages and disadvantages.
KW - acoustic scene identification
KW - factor analysis supervector
KW - Gaussian supervector
KW - i-vector
KW - majority voting
UR - http://www.scopus.com/inward/record.url?scp=85062793059&partnerID=8YFLogxK
U2 - 10.1109/ICDSP.2018.8631624
DO - 10.1109/ICDSP.2018.8631624
M3 - Conference article published in proceeding or book
AN - SCOPUS:85062793059
T3 - International Conference on Digital Signal Processing, DSP
BT - 2018 IEEE 23rd International Conference on Digital Signal Processing, DSP 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 23rd IEEE International Conference on Digital Signal Processing, DSP 2018
Y2 - 19 November 2018 through 21 November 2018
ER -