Single-Channel Signal Separation and Deconvolution
with Generative Adversarial Networks
Abstract
Single-channel signal separation and deconvolution aims to separate and deconvolve individual sources from a single-channel mixture. It is a challenging problem in which no prior knowledge of the mixing filters is available, so both the individual sources and the mixing filters need to be estimated. In addition, a mixture may contain non-stationary noise that is unseen in the training set. We propose a synthesizing-decomposition (SD) approach to solve the single-channel separation and deconvolution problem. In synthesizing, a generative model for sources is built using a generative adversarial network (GAN). In decomposition, both mixing filters and sources are optimized to minimize the reconstruction error of the mixture. The proposed SD approach achieves a peak signal-to-noise ratio (PSNR) of 18.9 dB and 15.4 dB in image inpainting and completion, outperforming a baseline convolutional neural network (CNN), which achieves 15.3 dB and 12.2 dB, respectively. It also achieves a PSNR of 13.2 dB in source separation together with deconvolution, outperforming a convolutive nonnegative matrix factorization (NMF) baseline of 10.1 dB.
Qiuqiang Kong, Yong Xu, Wenwu Wang, Philip J.B. Jackson and Mark D. Plumbley
University of Surrey, Guildford, UK
Tencent AI lab, Bellevue, USA
{q.kong, w.wang, p.jackson, m.plumbley}@surrey.ac.uk, lucayongxu@tencent.com
1 Introduction
Single-channel signal separation and deconvolution aims to separate and deconvolve individual sources from a single-channel mixture. The problem is challenging for three reasons. First, only a single-channel mixture is available, so the problem is underdetermined. Second, there is no prior knowledge of the mixing filters: both the individual sources and the mixing filters are unknown and need to be estimated. Third, there is no prior knowledge of the noise, which can be non-stationary and unseen in the training data. Single-channel signal separation and deconvolution has many applications, including image, speech and audio denoising [?], inpainting [?], and deconvolution and separation [?; ?]. For example, an audio sensor usually receives signals from multiple sources convolved with channel distortion.
Much previous work addresses source separation [?; ?] or deconvolution [?; ?] independently, but not both together. We categorize previous source separation and deconvolution methods into decomposition-based and regression-based approaches. Decomposition-based methods usually learn a set of bases for the sources and use these bases to decompose a mixture. For example, nonnegative matrix factorization (NMF) [?; ?; ?] assumes that a source can be represented by a linear combination of a set of bases, and NMF has been used for source representation and separation [?; ?]. In contrast, regression-based approaches learn a mapping from a mixture to an individual source. Such mappings can be modeled by neural networks, for example fully connected neural networks [?] and convolutional neural networks (CNNs) [?; ?]. In [?], a stacked denoising autoencoder (DAE) is proposed to recover sources from a mixture, and in [?], CNNs are used for deconvolution.
However, many decomposition methods such as NMF and independent component analysis (ICA) are shallow models, typically a linear combination of bases. Compared with neural networks, these shallow models lack the capacity to represent a broad range of sources [?]. On the other hand, regression-based approaches such as deep neural networks can model complicated mappings but require both mixtures and target sources for training. Regression-based methods may not generalize well when the mixing filters and noise in the test data follow a different distribution from the training data, which results in poor separation when the mixing filters and noise are unseen in training [?]. Recently, generative adversarial networks (GANs) have been proposed for source separation [?; ?; ?]. However, these methods assume that the mixing filters in the single-channel signal separation problem are known.
This paper proposes a novel synthesizing-decomposition (SD) approach to the single-channel source separation and deconvolution problem. In contrast to conventional regression approaches, the SD approach uses generative adversarial networks (GANs) to solve this problem in a generative way. The SD approach estimates both the sources and the convolutive mixing filters, whereas conventional regression methods do not estimate the mixing filters. We formulate single-channel signal separation and deconvolution as Bayesian maximum a posteriori (MAP) estimation, which leads to a constrained nonconvex optimization problem. In synthesizing, a generative model for the sources is built using a GAN. In decomposition, both the sources and the mixing filters are obtained by minimizing the reconstruction error of the mixture. To tackle the nonconvex optimization problem, we repeat the decomposition with different initializations, which significantly improves the performance of underdetermined single-channel signal separation and deconvolution. As a first study, we carry out underdetermined single-channel signal separation and deconvolution experiments on the MNIST dataset to show the effectiveness of the proposed SD approach with GANs.
This paper is organized as follows: Section 2 formulates the underdetermined single-channel signal separation and deconvolution problem. Section 3 proposes the synthesizing-decomposition (SD) approach for this problem. Section 4 presents experimental results. Section 5 concludes and outlines future work.
2 Single-Channel Signal Separation and Deconvolution
In underdetermined single-channel signal separation and deconvolution, a single-channel mixture $x \in \mathbb{R}^{L}$ is composed of $K$ individual sources $s_k \in \mathbb{S}$, $k = 1, \ldots, K$, convolved with unknown filters $a_k$ and followed by unknown additive noise $\sigma$. The space $\mathbb{S}$ can be the Euclidean space $\mathbb{R}^{L}$, where $K$ and $L$ denote the number and the dimension of the sources, respectively:
$$x = \sum_{k=1}^{K} a_k * s_k + \sigma \tag{1}$$
The symbol $*$ represents the convolution operation:
$$(a_k * s_k)(t) = \sum_{\tau} a_k(\tau)\, s_k(t - \tau) \tag{2}$$
For the simple case of source separation without deconvolution, the filter in (2) simplifies to a scaled Dirac delta, $a_k(\tau) = c_k\,\delta(\tau)$ with a constant gain $c_k$, so that $a_k * s_k = c_k s_k$. The general single-channel signal separation and deconvolution problem concerns both separating and deconvolving individual sources from a single-channel mixture when the mixing filters $a_k$ and the noise $\sigma$ in (1) are unknown. In the rest of this paper, we abbreviate $\{s_k\}_{k=1}^{K}$ and $\{a_k\}_{k=1}^{K}$ as $s$ and $a$, respectively.
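To make the mixing model in (1) and (2) concrete for 2D images such as the MNIST digits used in Section 4, the following is a minimal sketch. It assumes a centred kernel, zero padding and an output cropped to the source size; the text does not specify boundary handling, and the function names `conv2d_same` and `mix` are hypothetical.

```python
import random
from typing import List

Image = List[List[float]]

def conv2d_same(s: Image, a: Image) -> Image:
    """2D version of eq. (2): (a * s)(i, j) = sum_{u,v} a(u, v) s(i - u, j - v).

    The kernel is centred, pixels outside s are treated as zero, and the
    output is cropped to the size of s (an assumption, not from the paper).
    """
    h, w = len(s), len(s[0])
    kh, kw = len(a), len(a[0])
    ch, cw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    # Flipped index: true convolution, not cross-correlation.
                    si, sj = i - u + ch, j - v + cw
                    if 0 <= si < h and 0 <= sj < w:
                        acc += a[u][v] * s[si][sj]
            out[i][j] = acc
    return out

def mix(sources: List[Image], filters: List[Image], noise_std: float = 0.0,
        seed: int = 0) -> Image:
    """Eq. (1): x = sum_k a_k * s_k + sigma, with sigma i.i.d. Gaussian here."""
    rng = random.Random(seed)
    h, w = len(sources[0]), len(sources[0][0])
    x = [[0.0] * w for _ in range(h)]
    for s_k, a_k in zip(sources, filters):
        y = conv2d_same(s_k, a_k)
        for i in range(h):
            for j in range(w):
                x[i][j] += y[i][j]
    if noise_std > 0.0:
        for i in range(h):
            for j in range(w):
                x[i][j] += rng.gauss(0.0, noise_std)
    return x
```

A 1×1 kernel `[[c]]` reproduces the separation-only case $a_k * s_k = c_k s_k$, while a larger kernel also blurs each source before summation.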
In regression-based approaches [?; ?], a mapping $f$ from a mixture $x$ to the $k$-th source $s_k$ is modeled by a deep neural network and learned to separate that source: $f: x \mapsto s_k$. In separation, the estimated source is obtained by forwarding a mixture through the model: $\hat{s}_k = f(x)$. However, regression-based approaches have several problems:
Problem 1.
In regression-based supervised learning, the training and test data should have the same distribution; otherwise the trained model will be biased [?]. However, in single-channel signal separation and deconvolution, no prior knowledge of the test noise is available, so a model trained with the training noise may not generalize well to sources with unseen non-stationary noise.
Problem 2.
In single-channel signal separation and deconvolution, both the sources and the mixing filters are unknown and need to be estimated.
Problem 3.
Previous regression-based and decomposition-based approaches do not constrain the distribution of the separated sources $\hat{s}_k$ to match the distribution $p(s)$ of real sources. Ideally, the separated sources should be regularized to lie in regions where $p(s)$ is large.
Decomposition approaches such as NMF can be trained on individual sources instead of on mixtures, which mitigates Problem 1. Recently, GANs [?; ?; ?] have been applied to source separation to address Problem 3 by constraining the separated sources to lie in the space of natural sources. However, those methods assume that the mixing filters are constants, so they solve only the separation problem and not the deconvolution problem in (1).
3 Proposed Synthesizing-Decomposition (SD) Approach
3.1 Maximum a Posteriori (MAP) Estimation
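In this section, we first formulate the single-channel signal separation and deconvolution problem in (1) as a Bayesian parameter estimation problem. We denote by $\theta = \{s, a\}$ the set of parameters to be estimated, comprising the sources and the mixing filters. The estimate $\hat{\theta}$ can be obtained by maximum a posteriori (MAP) estimation:
$$\hat{\theta} = \arg\max_{\theta}\; p(x \mid \theta)\, p(\theta) \tag{3}$$
The first term in (3) is the likelihood. The reconstructed signal can be written as $\hat{x} = \sum_{k=1}^{K} a_k * s_k$. Assuming the noise $\sigma$ is a white Gaussian process with variance $\gamma^2$, the likelihood of the observed signal $x$ given the reconstructed signal $\hat{x}$ can be written as:
$$p(x \mid \theta) = \mathcal{N}\big(x \mid \hat{x}, \gamma^2 I\big) \tag{4}$$
where $\mathcal{N}(\cdot \mid \mu, \Sigma)$ is the probability density of a Gaussian distribution with mean $\mu$ and covariance $\Sigma$. The second term in (3) is the prior probability $p(\theta)$. Assuming the sources and filters are independent of each other, we can write $p(\theta)$ as:
$$p(\theta) = \prod_{k=1}^{K} p(s_k)\, p(a_k) \tag{5}$$
We assume $p(s_k)$ to have a compact support $\mathbb{S}$. Substituting (4) and (5) into (3), the sources and filters can be estimated by solving the following optimization problem:
$$\{\hat{s}, \hat{a}\} = \arg\max_{s, a}\; \mathcal{N}\Big(x \,\Big|\, \sum_{k=1}^{K} a_k * s_k, \gamma^2 I\Big) \prod_{k=1}^{K} p(s_k)\, p(a_k) \quad \text{s.t. } s_k \in \mathbb{S},\; k = 1, \ldots, K \tag{6}$$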

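Algorithm 1: Training of the GAN.
for each training iteration do
  Sample a minibatch of $m$ noise samples $\{z^{(1)}, \ldots, z^{(m)}\}$ from a Gaussian distribution $p_z(z)$.
  Sample a minibatch of $m$ examples $\{s^{(1)}, \ldots, s^{(m)}\}$ from real data.
  Update the discriminator by ascending its stochastic gradient:
    $\nabla_{\theta_D} \frac{1}{m} \sum_{i=1}^{m} \big[\log D(s^{(i)}) + \log\big(1 - D(G(z^{(i)}))\big)\big]$
  Sample a minibatch of $m$ noise samples $\{z^{(1)}, \ldots, z^{(m)}\}$ from a Gaussian distribution $p_z(z)$.
  Update the generator by descending its stochastic gradient:
    $\nabla_{\theta_G} \frac{1}{m} \sum_{i=1}^{m} \log\big(1 - D(G(z^{(i)}))\big)$
end for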
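Both updates in Algorithm 1 differentiate a minibatch estimate of the value function $V(D, G)$ in (7): the discriminator ascends it and the generator descends its second term. The sketch below only evaluates that estimate; `D` and `G` are arbitrary Python callables standing in for the networks, and the function name `gan_value` is hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence

def gan_value(D: Callable[[Sequence[float]], float],
              G: Callable[[List[float]], Sequence[float]],
              real_batch: List[Sequence[float]],
              m: int, seed_dim: int, rng: random.Random) -> float:
    """Minibatch estimate of V(D, G) = E[log D(s)] + E[log(1 - D(G(z)))], eq. (7)."""
    # First expectation: average log D(s) over real examples s ~ p_data.
    real_term = sum(math.log(D(s)) for s in real_batch) / len(real_batch)
    # Second expectation: m seeds z ~ N(0, I) of dimension seed_dim.
    fake_term = 0.0
    for _ in range(m):
        z = [rng.gauss(0.0, 1.0) for _ in range(seed_dim)]
        fake_term += math.log(1.0 - D(G(z)))
    return real_term + fake_term / m

if __name__ == "__main__":
    rng = random.Random(0)
    G = lambda z: [math.tanh(v) for v in z]   # hypothetical toy generator
    D_half = lambda s: 0.5                    # maximally confused discriminator
    real = [[1.0, 0.0], [0.0, 1.0]]
    print(gan_value(D_half, G, real, m=8, seed_dim=2, rng=rng))  # about -1.386
```

A discriminator that outputs 0.5 everywhere gives $V = -\log 4$, which is the value of the game when $p_g = p_{\mathrm{data}}$; a discriminator that separates real from generated samples gives a larger value.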
3.2 Optimization with SD Approach
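Optimizing (6) is difficult because of the constraint $s_k \in \mathbb{S}$: the source prior $p(s)$ is unknown, so $\mathbb{S}$ cannot be written in closed form. Our solution is to convert (6) into an unconstrained optimization problem. In the proposed SD approach, we first build a generative model for $s_k$ with a GAN [?; ?]. A GAN consists of a generator $G$ and a discriminator $D$. The generator $G$ is a mapping from a simple distribution $p_z$, such as a Gaussian distribution, to the real distribution of sources. We call $p_z$ the seed distribution and samples $z \sim p_z$ seeds. The generator is trained to generate samples $G(z)$ that fool the discriminator $D$, and the discriminator is trained to discriminate fake sources from real sources. In other words, $G$ and $D$ play the following two-player minimax game with value function $V(D, G)$ [?]:
$$\min_{G} \max_{D} V(D, G) = \mathbb{E}_{s \sim p_{\mathrm{data}}(s)}\big[\log D(s)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big] \tag{7}$$
where $p_{\mathrm{data}}$ is the probability density of real data. The training of the GAN is shown in Algorithm 1: the generator and the discriminator are updated alternately. If both $G$ and $D$ have enough capacity, the distribution $p_g$ of generated sources converges to $p_{\mathrm{data}}$ [?]. Once the GAN has been trained successfully, $G(z) \in \mathbb{S}$ for all $z$. To solve (6), we substitute $s_k = G(z_k)$ and optimize over the seeds $z_k$ instead of $s_k$, which eliminates the constraint $s_k \in \mathbb{S}$. The variables to be optimized are now the seeds $z = \{z_k\}_{k=1}^{K}$ and the mixing filters $a$. However, a GAN does not provide the probability density of $G(z)$, so the optimization of (6) is intractable. To solve this problem, we approximate $p(s_k)$ with:
$$p(s_k) \approx \begin{cases} c, & s_k \in \mathbb{S} \\ 0, & s_k \notin \mathbb{S} \end{cases} \tag{8}$$
Equation (8) assumes that the probability density outside $\mathbb{S}$ is zero. The value of the constant $c$ need not be known, because it does not change the maximizer of (6):
$$\{\hat{z}, \hat{a}\} = \arg\max_{z, a}\; \mathcal{N}\Big(x \,\Big|\, \sum_{k=1}^{K} a_k * G(z_k), \gamma^2 I\Big) \prod_{k=1}^{K} p(a_k) \tag{9}$$
where $\gamma^2$ is the noise variance of the Gaussian likelihood in (4). We assume the coefficients of each $a_k$ to be independent Gaussian variables with distribution $\mathcal{N}(0, \eta^2)$. Taking the negative logarithm of (9), the optimization can be written as:
$$\{\hat{z}, \hat{a}\} = \arg\min_{z, a}\; \Big\| x - \sum_{k=1}^{K} a_k * G(z_k) \Big\|_2^2 + \lambda \sum_{k=1}^{K} \| a_k \|_2^2 \tag{10}$$
where $\lambda$ is a regularization coefficient that weights the penalty on the mixing filters.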
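The step from (9) to (10) can be written out as a short derivation. It assumes an isotropic noise variance $\gamma^2$ in the likelihood and an isotropic prior variance $\eta^2$ for the filter coefficients; both symbols are labels chosen here for illustration. Since the logarithm is monotonic, maximizing (9) is equivalent to minimizing its negative logarithm:

```latex
\begin{aligned}
-\log\Big[\mathcal{N}\big(x \mid \hat{x}, \gamma^2 I\big)\prod_{k=1}^{K}\mathcal{N}\big(a_k \mid 0, \eta^2 I\big)\Big]
&= \frac{1}{2\gamma^2}\Big\| x - \sum_{k=1}^{K} a_k * G(z_k) \Big\|_2^2
 + \frac{1}{2\eta^2}\sum_{k=1}^{K}\| a_k \|_2^2 + \text{const},
\qquad \hat{x} = \sum_{k=1}^{K} a_k * G(z_k).
\end{aligned}
```

Multiplying by $2\gamma^2$ and dropping the constant, neither of which changes the minimizer, gives exactly the objective in (10) with $\lambda = \gamma^2 / \eta^2$. Under these assumptions, a small $\lambda$ such as the 0.001 used in Section 4.1 corresponds to a broad prior on the filters relative to the noise level.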
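Table 1: Special cases of problem (1) for different types of noise $\sigma$ and mixing filters $a_k$.
Task | Noise $\sigma$ | Mixing filters $a_k$
Denoising | Gaussian | $K = 1$, $a_1$ is a constant
Inpainting, Completion | Unknown | $K = 1$, $a_1$ is a constant
Deconvolution | - | $K = 1$, $a_1$ is a tensor
Separation | - | $K > 1$, $a_k$ are constants
Separation + deconvolution | - | $K > 1$, $a_k$ are tensors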
3.3 Optimization
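To solve (10), we apply a gradient-based iterative approach. We denote $\theta = \{z, a\}$, where the seeds $z$ and the mixing filters $a$ are to be optimized, and write $J(\theta)$ for the objective in (10). We first randomly initialize $\theta$; the gradients of $J$ are then calculated by:
$$\frac{\partial J}{\partial a_k} = -2\,\big(r \star G(z_k)\big) + 2\lambda a_k, \qquad \frac{\partial J}{\partial z_k} = -2 \left(\frac{\partial G(z_k)}{\partial z_k}\right)^{\!\top} \big(r \star a_k\big) \tag{11}$$
where $r = x - \sum_{j=1}^{K} a_j * G(z_j)$ is the residual and $(r \star u)(\tau) = \sum_{t} r(t)\, u(t - \tau)$ denotes cross-correlation, the adjoint of the convolution in (2). The parameters are optimized using Algorithm 2. Because $G$ is a nonlinear mapping, (10) is nonconvex in $z$, and gradient-based methods may reach a local minimum depending on the initialization of the seeds. To mitigate this problem, we repeat Algorithm 2 several times with different random initializations and choose the solution with the smallest reconstruction error.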
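The decomposition with random restarts can be illustrated with a small, self-contained sketch. It is not the authors' Algorithm 2: it assumes 1D signals, a hypothetical toy generator $G(z) = \tanh(Wz)$ in place of the trained DCGAN, central finite differences in place of the analytic gradients (11), and plain gradient descent in place of Adam. All function names are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]

def make_toy_generator(W: List[Vector]) -> Callable[[Vector], Vector]:
    """Hypothetical stand-in for a trained generator: G(z) = tanh(W z)."""
    def G(z: Vector) -> Vector:
        return [math.tanh(sum(w * zj for w, zj in zip(row, z))) for row in W]
    return G

def conv1d(a: Vector, s: Vector) -> Vector:
    """Eq. (2) in 1D, truncated to len(s): (a * s)(t) = sum_tau a(tau) s(t - tau)."""
    return [sum(a[tau] * s[t - tau] for tau in range(len(a)) if t - tau >= 0)
            for t in range(len(s))]

def unpack(theta: Vector, K: int, d: int, F: int) -> Tuple[List[Vector], List[Vector]]:
    """theta packs [z_1..z_K (d values each), a_1..a_K (F taps each)]."""
    zs = [theta[k * d:(k + 1) * d] for k in range(K)]
    filters = [theta[K * d + k * F:K * d + (k + 1) * F] for k in range(K)]
    return zs, filters

def reconstruct(theta, G, K, d, F, length) -> Vector:
    """x_hat = sum_k a_k * G(z_k)."""
    zs, filters = unpack(theta, K, d, F)
    x_hat = [0.0] * length
    for z_k, a_k in zip(zs, filters):
        for t, v in enumerate(conv1d(a_k, G(z_k))):
            x_hat[t] += v
    return x_hat

def objective(theta, x, G, K, d, F, lam) -> Tuple[float, float]:
    """Eq. (10); returns (J, reconstruction error ||x - x_hat||^2)."""
    x_hat = reconstruct(theta, G, K, d, F, len(x))
    err = sum((xi - xh) ** 2 for xi, xh in zip(x, x_hat))
    _, filters = unpack(theta, K, d, F)
    return err + lam * sum(v * v for a_k in filters for v in a_k), err

def numerical_grad(f, theta: Vector, h: float = 1e-5) -> Vector:
    """Central finite differences: a slow substitute for autograd."""
    grad = []
    for i in range(len(theta)):
        up, down = theta[:], theta[:]
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad

def decompose(x, G, K, d, F, rng, lam=1e-3, lr=0.01, steps=400):
    """One run: random initialization, then gradient descent on eq. (10)."""
    theta = [rng.gauss(0.0, 1.0) for _ in range(K * (d + F))]
    J = lambda th: objective(th, x, G, K, d, F, lam)[0]
    for _ in range(steps):
        theta = [p - lr * g for p, g in zip(theta, numerical_grad(J, theta))]
    return theta, objective(theta, x, G, K, d, F, lam)[1]

def decompose_with_restarts(x, G, K, d, F, restarts=8, seed=0, **kw):
    """Repeat the decomposition and keep the smallest reconstruction error."""
    rng = random.Random(seed)
    runs = [decompose(x, G, K, d, F, rng, **kw) for _ in range(restarts)]
    best_theta, best_err = min(runs, key=lambda run: run[1])
    return best_theta, best_err, [err for _, err in runs]
```

The restart loop selects by reconstruction error rather than by the regularized objective, following the text; with a real generator the finite-difference gradient would be replaced by backpropagation through $G$.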
4 Experiments
In this section, we apply the proposed SD method to underdetermined single-channel image separation and deconvolution. As a first study of this challenging problem, we carry out experiments on the MNIST 10-digit dataset [?] to show the effectiveness of the proposed SD method. Depending on the type of the unknown mixing filters $a_k$ and the unknown interference noise $\sigma$, problem (1) can be categorized as image denoising, inpainting, completion, deconvolution or separation, as shown in Table 1. The symbol '-' represents any type of noise. Previous works usually focus on one of these problems, such as denoising [?], inpainting [?], deconvolution [?] or separation [?]. In this paper, we solve all of these problems with the proposed SD method. The PyTorch implementation of this paper is released at https://github.com/qiuqiangkong/gan_separation_deconvolution.
4.1 Model Configuration
In the proposed SD approach, we model the synthesizing procedure with a deep convolutional generative adversarial network (DCGAN) [?], which stabilizes GAN training and generates high-quality images [?]. A DCGAN consists of a generator $G$ and a discriminator $D$. The input to $G$ is a seed $z$ sampled from a Gaussian distribution $\mathcal{N}(0, I)$ of dimension 100, following [?]. The generator has 4 transposed convolutional layers with 512, 256, 128 and 1 feature maps, respectively. Following [?], batch normalization [?] and ReLU nonlinearity are applied after each transposed convolutional layer. The output of $G$ is an image of the same size as the images in the training data. The discriminator $D$ takes a fake or a real image as input. It consists of 4 convolutional layers with a sigmoid output representing the probability that its input comes from real data rather than from the generator. Following [?], we use the Adam optimizer [?] with a learning rate of 0.0002, $\beta_1 = 0.5$ and $\beta_2 = 0.999$ to train the generator. In decomposition, we freeze the trained generator $G$. We approximate $p(a_k)$ with a Gaussian distribution, which works well in our experiments, and set $\lambda$ to 0.001 to regularize the mixing filters being searched. The seeds $z_k$ and the filters $a_k$ are randomly initialized and optimized with the Adam optimizer with a learning rate of 0.01, $\beta_1 = 0.9$ and $\beta_2 = 0.999$ (Algorithm 2).
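The settings above can be collected in one place, together with the Adam update rule (Kingma and Ba, 2015) that the decomposition applies to the seeds and filters. This is a minimal pure-Python sketch: the `SDConfig` field names are hypothetical, and `eps = 1e-8` is the default from the Adam paper rather than a value stated in this section.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class SDConfig:
    """Settings stated in Section 4.1; the field names are hypothetical."""
    seed_dim: int = 100                                    # dimension of z ~ N(0, I)
    generator_feature_maps: Tuple[int, ...] = (512, 256, 128, 1)
    gan_lr: float = 0.0002                                 # Adam for GAN training
    gan_betas: Tuple[float, float] = (0.5, 0.999)
    decomp_lr: float = 0.01                                # Adam for seeds and filters
    decomp_betas: Tuple[float, float] = (0.9, 0.999)
    filter_reg: float = 0.001                              # lambda in eq. (10)

class Adam:
    """Adam update rule (Kingma and Ba, 2015) on a flat list of parameters."""

    def __init__(self, lr: float, betas: Tuple[float, float], eps: float = 1e-8):
        self.lr = lr
        self.b1, self.b2 = betas
        self.eps = eps
        self.t = 0
        self.m: Optional[List[float]] = None  # first-moment estimate
        self.v: Optional[List[float]] = None  # second-moment estimate

    def step(self, params: List[float], grads: List[float]) -> List[float]:
        if self.m is None or self.v is None:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.b1 * self.m[i] + (1.0 - self.b1) * g
            self.v[i] = self.b2 * self.v[i] + (1.0 - self.b2) * g * g
            m_hat = self.m[i] / (1.0 - self.b1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1.0 - self.b2 ** self.t)
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated

cfg = SDConfig()
opt = Adam(cfg.decomp_lr, cfg.decomp_betas)
params = opt.step([1.0, -2.0], [3.0, -0.5])  # each moves by about lr against its gradient
```

Because of the bias correction, the first Adam step moves every parameter by roughly the learning rate regardless of the gradient's magnitude, which is one reason a fixed learning rate of 0.01 can be used for seeds and filters of very different scales.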
For comparison with regression-based approaches, we use a CNN [?] consisting of 4 layers with batch normalization [?] and ReLU nonlinearity. The number of layers and parameters is set to be the same as in the discriminator of the DCGAN. The CNN is trained to regress from an individual source with noise, $s_k + \sigma$, to the clean source $s_k$. For comparison with decomposition-based approaches, we train a dictionary for each of the 10 digits using NMF [?] with the Euclidean distance. Each dictionary consists of 20 bases, which performs well in our experiments. In decomposition, the trained dictionaries are concatenated into a dictionary of 200 bases, which is then used to decompose the mixtures.
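The NMF baseline's decomposition stage can be sketched as follows. The paper does not state which NMF solver it uses, so this sketch assumes the Lee and Seung multiplicative update for the Euclidean cost, with the per-source dictionaries already learned and held fixed; each column of `V` is one vectorized image, and the function names are hypothetical.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]

def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]

def frob_err(V: Matrix, W: Matrix, H: Matrix) -> float:
    """Squared Euclidean (Frobenius) distance ||V - W H||^2."""
    WH = matmul(W, H)
    return sum((v - u) ** 2 for rv, ru in zip(V, WH) for v, u in zip(rv, ru))

def nmf_activations(V: Matrix, W: Matrix, iters: int = 200, eps: float = 1e-12,
                    seed: int = 0) -> Matrix:
    """Fit H >= 0 so that V ~ W H with the dictionary W held fixed.

    Multiplicative update for the Euclidean cost, H <- H * (W^T V) / (W^T W H),
    which keeps H non-negative.
    """
    rng = random.Random(seed)
    n_bases, n_cols = len(W[0]), len(V[0])
    H = [[rng.random() + 0.1 for _ in range(n_cols)] for _ in range(n_bases)]
    Wt = transpose(W)
    WtV, WtW = matmul(Wt, V), matmul(Wt, W)
    for _ in range(iters):
        WtWH = matmul(WtW, H)
        H = [[H[i][j] * WtV[i][j] / (WtWH[i][j] + eps) for j in range(n_cols)]
             for i in range(n_bases)]
    return H

def separate(V: Matrix, dictionaries: List[Matrix], **kw) -> Tuple[List[Matrix], Matrix, Matrix]:
    """Concatenate per-source dictionaries, decompose V, rebuild each source as W_k H_k."""
    W = [sum((D[r] for D in dictionaries), []) for r in range(len(V))]
    H = nmf_activations(V, W, **kw)
    estimates, start = [], 0
    for D in dictionaries:
        n = len(D[0])
        estimates.append(matmul(D, H[start:start + n]))
        start += n
    return estimates, W, H
```

With ten digit dictionaries of 20 bases each, `W` would have 200 columns, matching the concatenated dictionary described above; nothing in this procedure constrains $W_k H_k$ to look like a natural digit, which is the weakness noted as Problem 3.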
4.2 Evaluation
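Following [?; ?; ?], we use the peak signal-to-noise ratio (PSNR) to evaluate single-channel signal separation and deconvolution quality. A higher PSNR indicates better reconstruction quality. PSNR is defined as:
$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{\mathrm{MAX}_I^2}{\mathrm{MSE}}\right) \tag{12}$$
where $\mathrm{MAX}_I$ is the maximum value of a noise-free image and MSE is the mean squared error between two images $I$ and $\hat{I}$ of size $m \times n$:
$$\mathrm{MSE} = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \big(I(i, j) - \hat{I}(i, j)\big)^2 \tag{13}$$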
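Equations (12) and (13) translate directly into code. This minimal sketch assumes images stored as nested lists with pixel values scaled to [0, 1], so `max_i` defaults to 1.0; for 8-bit images it would be 255.

```python
import math
from typing import List

Image = List[List[float]]

def mse(I: Image, I_hat: Image) -> float:
    """Eq. (13): mean squared error over an m x n image."""
    m, n = len(I), len(I[0])
    return sum((I[i][j] - I_hat[i][j]) ** 2 for i in range(m) for j in range(n)) / (m * n)

def psnr(I: Image, I_hat: Image, max_i: float = 1.0) -> float:
    """Eq. (12): 10 log10(MAX_I^2 / MSE) in dB; infinite for identical images."""
    err = mse(I, I_hat)
    if err == 0.0:
        return math.inf
    return 10.0 * math.log10(max_i ** 2 / err)
```

For example, an error of 0.1 at every pixel of a [0, 1] image gives MSE = 0.01 and a PSNR of 20 dB; each further halving of the per-pixel error adds about 6 dB.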
Table 2: PSNR of denoising, inpainting and completion.
Method | Denoising | Inpainting | Completion
CNN | 26.0 dB | 15.3 dB | 12.2 dB
NMF | 17.4 dB | 13.4 dB | 12.9 dB
Convolutive NMF | 18.3 dB | 13.4 dB | 13.0 dB
SD with 1 init. | 23.1 dB | 15.2 dB | 13.6 dB
SD with 8 init. | 25.1 dB | 18.2 dB | 15.4 dB
SD with 32 init. | 25.1 dB | 18.9 dB | 15.4 dB
4.3 Denoising, Inpainting and Completion
Denoising, inpainting and completion are special cases of the single-channel signal separation and deconvolution problem where $K = 1$, $a_1$ is an unknown constant and $\sigma$ is unknown noise, such as Gaussian noise, non-stationary noise or image corruption. The first and second rows of Fig. 1 show the clean and noisy images. The third to fifth rows show the images denoised with CNN, NMF and the proposed SD approach. In the first column, the test noise and the training noise have the same distribution, so CNN performs well. However, CNN-based denoising does not generalize well to unseen noise, such as the non-stationary noise and image corruption shown in the second and third columns of Fig. 1. NMF performs better than CNN under unseen noise but sometimes produces unnatural separation results, owing to Problem 3 stated in Section 2. The SD approach performs well in all of image denoising, inpainting and completion. Table 2 shows the PSNR of the CNN, NMF, convolutive NMF and SD approaches. SD achieves a PSNR of 25.1 dB in image denoising, which is comparable to the 26.0 dB of CNN, while NMF and convolutive NMF achieve 17.4 dB and 18.3 dB, respectively. In image inpainting, SD achieves a PSNR of 18.9 dB, outperforming NMF and CNN at 13.4 dB and 15.3 dB, respectively. This result shows that SD generalizes better to unseen noise than NMF and CNN. In image completion, SD achieves a PSNR of 15.4 dB, outperforming CNN (12.2 dB), NMF (12.9 dB) and convolutive NMF (13.0 dB). Table 2 also shows the performance of SD with respect to the number of initializations of the decomposition. With 8 or 32 initializations, the PSNR is 1.8 dB to 3.7 dB higher than with a single initialization. This is likely because the optimization problem in (10) is nonconvex, and Algorithm 2, being a gradient-based method, may converge to a local minimum. Repeating Algorithm 2 several times with different initializations and choosing the solution with the smallest reconstruction error therefore improves performance.
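Table 3: PSNR of deconvolution (deconv.), separation (sep.), and separation with deconvolution (sep. + deconv.).
Method | Deconv. | Sep. | Sep. + deconv.
NMF | 15.3 dB | 9.4 dB | 8.7 dB
Convolutive NMF | 18.3 dB | 14.2 dB | 10.1 dB
SD with 1 init. | 17.3 dB | 13.7 dB | 9.3 dB
SD with 8 init. | 21.9 dB | 16.8 dB | 11.5 dB
SD with 32 init. | 23.2 dB | 18.5 dB | 13.2 dB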
4.4 Separation and Deconvolution
We evaluate single-channel signal separation and deconvolution with the mixing filters $a_k$ as unknown tensors, which is a very challenging task because both the mixing filters and the individual sources need to be estimated. Fig. 2 shows a mixture obtained by convolving clean sources with mixing filters and summing the results. In our experiment we set $K$ = … and each mixing filter has a size of …. In practical applications, the size of the mixing filters depends on the task. Fig. 2 shows that NMF-based separation often leads to unnatural images, whereas the SD approach separates images with high quality and estimates both the sources and the mixing filters. The estimated sources and mixing filters in Fig. 2 are consistent with the ground-truth sources and mixing filters. The first column of Table 3 shows the results of image deconvolution without separation, where $K = 1$ and $a_1$ is an unknown tensor. SD achieves a PSNR of 23.2 dB, outperforming NMF and convolutive NMF at 15.3 dB and 18.3 dB, respectively. The second column of Table 3 shows the results of image separation, where $a_k$ are unknown constants and $K > 1$. SD achieves a PSNR of 18.5 dB, outperforming NMF and convolutive NMF at 9.4 dB and 14.2 dB, respectively. The third column of Table 3 shows the results of joint source separation and deconvolution, where $a_k$ are unknown tensors and $K > 1$. SD achieves a PSNR of 13.2 dB, outperforming NMF and convolutive NMF at 8.7 dB and 10.1 dB, respectively. In all three tasks, SD with 32 initializations achieves a higher PSNR than with 8 initializations, which in turn is higher than with 1 initialization, showing the effectiveness of repeating Algorithm 2 to solve the nonconvex optimization problem in (10).
5 Conclusion
In this paper, we propose a synthesizing-decomposition (SD) approach to the single-channel signal separation and deconvolution problem. In synthesizing, a generative model for source signals is trained using a generative adversarial network (GAN). In decomposition, both the sources and the filters are optimized to minimize the reconstruction error. Instead of optimizing the sources directly, we optimize over the seeds of the GAN. The proposed SD approach achieves a PSNR of 18.9 dB and 15.4 dB in image inpainting and completion, outperforming the regression-based CNN and the decomposition-based NMF. The SD approach achieves a PSNR of 13.2 dB in image source separation with deconvolution, outperforming NMF at 8.7 dB. Repeating the decomposition in SD several times significantly improves PSNR. In future work, we will apply the SD approach to more source separation and deconvolution problems.
Acknowledgements
This research was supported by EPSRC grant EP/N014111/1 “Making Sense of Sounds” and a Research Scholarship from the China Scholarship Council (CSC) No. 201406150082.
References
[Campisi and Egiazarian, 2017] P. Campisi and K. Egiazarian. Blind image deconvolution: theory and applications. CRC Press, 2017.
[Cichocki et al., 2006] A. Cichocki, R. Zdunek, and S. Amari. New algorithms for nonnegative matrix factorization in applications to blind source separation. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2006.
[Cichocki et al., 2009] A. Cichocki, R. Zdunek, A. H. Phan, and S. Amari. Nonnegative matrix and tensor factorizations: applications to exploratory multiway data analysis and blind source separation. John Wiley & Sons, 2009.
[Fan et al., 2018] Z. Fan, Y. Lai, and J. Jang. SVSGAN: Singing voice separation via generative adversarial network. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018.
[Goodfellow et al., 2014] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems (NIPS), pages 2672–2680, 2014.
[Grais et al., 2014] E. M. Grais, M. Sen, and H. Erdogan. Deep neural networks for single channel source separation. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3734–3738, 2014.
[Ioffe and Szegedy, 2015] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (ICML), 2015.
[Jain and Seung, 2009] V. Jain and S. Seung. Natural image denoising with convolutional networks. In Advances in Neural Information Processing Systems (NIPS), pages 769–776, 2009.
[Kingma and Ba, 2015] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR), 2015.
[Kitamura et al., 2013] D. Kitamura, H. Saruwatari, K. Shikano, K. Kondo, and Y. Takahashi. Music signal separation by supervised nonnegative matrix factorization with basis deformation. In International Conference on Digital Signal Processing (DSP), 2013.
[LeCun et al., 1998] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[Lee and Seung, 1999] D. D. Lee and H. S. Seung. Learning the parts of objects by nonnegative matrix factorization. Nature, 1999.
[Levin et al., 2009] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman. Understanding and evaluating blind deconvolution algorithms. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
[Mijovic et al., 2010] B. Mijovic, M. De Vos, I. Gligorijevic, J. Taelman, and S. Van Huffel. Source separation from single-channel recordings by combining empirical-mode decomposition and independent component analysis. IEEE Transactions on Biomedical Engineering, 2010.
[Radford et al., 2015] A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
[Stoller et al., 2017] D. Stoller, S. Ewert, and S. Dixon. Adversarial semi-supervised audio source separation applied to singing voice extraction. arXiv preprint arXiv:1711.00048, 2017.
[Subakan and Smaragdis, 2017] C. Subakan and P. Smaragdis. Generative adversarial source separation. arXiv preprint arXiv:1710.10779, 2017.
[Xie et al., 2012] J. Xie, L. Xu, and E. Chen. Image denoising and inpainting with deep neural networks. In Advances in Neural Information Processing Systems (NIPS), pages 341–349, 2012.
[Xu et al., 2014] L. Xu, J. Ren, C. Liu, and J. Jia. Deep convolutional neural network for image deconvolution. In Advances in Neural Information Processing Systems (NIPS), pages 1790–1798, 2014.
[Yeh et al., 2016] R. Yeh, C. Chen, T. Y. Lim, M. Hasegawa-Johnson, and M. N. Do. Semantic image inpainting with perceptual and contextual losses. arXiv preprint arXiv:1607.07539, 2016.
[Yosinski et al., 2014] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson. How transferable are features in deep neural networks? In Advances in Neural Information Processing Systems (NIPS), pages 3320–3328, 2014.
[Zhang et al., 2017] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Transactions on Image Processing, 26(7):3142–3155, 2017.