TY - GEN
T1 - Deep representation-decoupling neural networks for monaural music mixture separation
AU - Li, Zhuo
AU - Wang, Hongwei
AU - Zhao, Miao
AU - Li, Wenjie
AU - Guo, Minyi
N1 - Publisher Copyright: Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2018
Y1 - 2018
N2 - Monaural source separation (MSS) aims to extract and reconstruct different sources from a single-channel mixture, which could facilitate a variety of applications such as chord recognition, pitch estimation, and automatic transcription. In this paper, we study the problem of separating vocals and instruments from monaural music mixtures. Existing works on monaural source separation either use linear and shallow models (e.g., non-negative matrix factorization) or do not explicitly address the coupling and tangling of multiple sources in the original input signals; hence, they do not perform satisfactorily in real-world scenarios. To overcome these limitations, we propose a novel end-to-end framework for monaural music mixture separation called Deep Representation-Decoupling Neural Networks (DRDNN). DRDNN takes advantage of both traditional signal processing methods and popular deep learning models. For each input music mixture, DRDNN converts it to a two-dimensional time-frequency spectrogram using the short-time Fourier transform (STFT), followed by stacked convolutional neural network (CNN) layers and long short-term memory (LSTM) layers to extract more condensed features. Afterwards, DRDNN uses a decoupling component, which consists of a group of multi-layer perceptrons (MLPs), to further decouple the features into separate sources. The decoupling component in DRDNN produces purified single-source signals for subsequent full-size restoration and can significantly improve the performance of the final separation. Through extensive experiments on a real-world dataset, we show that DRDNN outperforms state-of-the-art baselines in the task of monaural music mixture separation and reconstruction.
UR - http://www.scopus.com/inward/record.url?scp=85060436018&partnerID=8YFLogxK
M3 - Conference article published in proceeding or book
AN - SCOPUS:85060436018
T3 - 32nd AAAI Conference on Artificial Intelligence, AAAI 2018
SP - 93
EP - 100
BT - 32nd AAAI Conference on Artificial Intelligence, AAAI 2018
PB - AAAI Press
T2 - 32nd AAAI Conference on Artificial Intelligence, AAAI 2018
Y2 - 2 February 2018 through 7 February 2018
ER -