TY - JOUR
T1 - A spiking neural network framework for robust sound classification
AU - Wu, Jibin
AU - Chua, Yansong
AU - Zhang, Malu
AU - Li, Haizhou
AU - Tan, Kay Chen
N1 - Funding Information:
This research is supported by Programmatic Grant No. A1687b0033 from the Singapore Government's Research, Innovation and Enterprise 2020 plan (Advanced Manufacturing and Engineering domain).
Publisher Copyright:
Copyright © 2018 Wu, Chua, Zhang, Li and Tan.
PY - 2018/11/19
Y1 - 2018/11/19
N2 - Environmental sounds form part of our daily life. With the advancement of deep learning models and the abundance of training data, the performance of automatic sound classification (ASC) systems has improved significantly in recent years. However, the high computational cost, and hence high power consumption, remains a major hurdle for large-scale deployment of ASC systems on mobile and wearable devices. Motivated by the observation that humans are highly effective and consume little power whilst analyzing complex audio scenes, we propose a biologically plausible ASC framework, namely SOM-SNN. This framework uses an unsupervised self-organizing map (SOM) to represent the frequency content embedded within the acoustic signals, followed by an event-based spiking neural network (SNN) for spatiotemporal spiking pattern classification. We report experimental results on the RWCP environmental sound and TIDIGITS spoken digits datasets, which demonstrate competitive classification accuracies compared with other deep learning and SNN-based models. The SOM-SNN framework is also shown to be highly robust to corrupting noise after multi-condition training, whereby the model is trained with noise-corrupted sound samples. Moreover, we discover the early decision-making capability of the proposed framework: an accurate classification can be made with only a partial presentation of the input.
AB - Environmental sounds form part of our daily life. With the advancement of deep learning models and the abundance of training data, the performance of automatic sound classification (ASC) systems has improved significantly in recent years. However, the high computational cost, and hence high power consumption, remains a major hurdle for large-scale deployment of ASC systems on mobile and wearable devices. Motivated by the observation that humans are highly effective and consume little power whilst analyzing complex audio scenes, we propose a biologically plausible ASC framework, namely SOM-SNN. This framework uses an unsupervised self-organizing map (SOM) to represent the frequency content embedded within the acoustic signals, followed by an event-based spiking neural network (SNN) for spatiotemporal spiking pattern classification. We report experimental results on the RWCP environmental sound and TIDIGITS spoken digits datasets, which demonstrate competitive classification accuracies compared with other deep learning and SNN-based models. The SOM-SNN framework is also shown to be highly robust to corrupting noise after multi-condition training, whereby the model is trained with noise-corrupted sound samples. Moreover, we discover the early decision-making capability of the proposed framework: an accurate classification can be made with only a partial presentation of the input.
KW - Automatic sound classification
KW - Maximum-margin tempotron classifier
KW - Noise robust multi-condition training
KW - Self-organizing map
KW - Spiking neural network
UR - http://www.scopus.com/inward/record.url?scp=85057188973&partnerID=8YFLogxK
U2 - 10.3389/fnins.2018.00836
DO - 10.3389/fnins.2018.00836
M3 - Journal article
AN - SCOPUS:85057188973
SN - 1662-4548
VL - 12
SP - 1
EP - 17
JO - Frontiers in Neuroscience
JF - Frontiers in Neuroscience
IS - NOV
M1 - 836
ER -