Single-Channel Signal Separation and Deconvolution
with Generative Adversarial Networks

Qiuqiang Kong, Yong Xu, Wenwu Wang, Philip J. B. Jackson and Mark D. Plumbley
University of Surrey, Guildford, UK
Tencent AI lab, Bellevue, USA
{q.kong, w.wang, p.jackson, m.plumbley}@surrey.ac.uk, lucayongxu@tencent.com
Abstract

Single-channel signal separation and deconvolution aims to separate and deconvolve individual sources from a single-channel mixture. It is a challenging problem in which no prior knowledge of the mixing filters is available: both the individual sources and the mixing filters need to be estimated. In addition, a mixture may contain non-stationary noise which is unseen in the training set. We propose a synthesizing-decomposition (S-D) approach to solve the single-channel separation and deconvolution problem. In synthesizing, a generative model for sources is built using a generative adversarial network (GAN). In decomposition, both mixing filters and sources are optimized to minimize the reconstruction error of the mixture. The proposed S-D approach achieves a peak signal-to-noise ratio (PSNR) of 18.9 dB and 15.4 dB in image inpainting and completion, outperforming a baseline convolutional neural network with PSNR of 15.3 dB and 12.2 dB, respectively, and achieves a PSNR of 13.2 dB in source separation together with deconvolution, outperforming a convolutive non-negative matrix factorization (NMF) baseline of 10.1 dB.


1 Introduction

Single-channel signal separation and deconvolution aims to separate and deconvolve sources from a single-channel mixture. One challenging aspect of this problem is that only a single-channel mixture is available, so the problem is underdetermined. Second, there is no prior knowledge of the mixing filters: both the individual sources and the mixing filters are unknown and need to be estimated. Third, there is no prior knowledge of the noise, which can be non-stationary and may not have been seen in the training data. These difficulties make single-channel signal separation and deconvolution a very challenging problem. It has many applications in image, speech and audio denoising [?], inpainting [?], deconvolution and separation [??]. For example, an audio sensor usually receives signals from multiple sources convolved with channel distortion.

Much previous work focuses on source separation [??] or deconvolution [??] independently, but not together. We categorize previous source separation and deconvolution methods into decomposition based approaches and regression based approaches. Decomposition methods usually learn a set of bases for sources and use these bases to decompose a mixture. Decomposition methods, including non-negative matrix factorization (NMF) [???], assume that a source can be represented by a linear combination of a set of bases. NMF has been used in source representation and separation [??]. In contrast to the decomposition based approaches, regression based approaches learn a mapping from a mixture to an individual source. Such mappings can be modeled by neural networks, for example, fully connected neural networks [?] and convolutional neural networks (CNNs) [??]. In [?], a stacked denoising auto-encoder (DAE) is proposed to recover sources from a mixture. CNNs are used for source deconvolution in [?].

However, many decomposition methods such as NMF and independent component analysis (ICA) are shallow models, which typically represent a source as a linear combination of bases. These shallow models do not have enough capacity to represent a broad range of sources compared with neural networks [?]. On the other hand, regression based approaches such as deep neural networks are able to model complicated mappings, but require both mixtures and target sources for training. Regression based methods may not generalize well if the mixing filters and noise in the testing data have a different distribution from the training data, which results in poor separation when the mixing filters and noise are unseen in the training data [?]. Recently, generative adversarial networks (GANs) have been proposed for solving the source separation problem [???]. So far these methods assume that the mixing filters in the single-channel signal separation problem are known.

This paper proposes a novel synthesizing-decomposition (S-D) approach to solve the single-channel source separation and deconvolution problem. Compared to conventional regression approaches, the S-D approach applies generative adversarial networks (GANs) to solve this problem in a generative way. The S-D approach can estimate both the sources and the convolutive mixing filters, while conventional regression methods do not estimate convolutive mixing filters. In addition, we formulate the single-channel signal separation and deconvolution problem as a Bayesian maximum a posteriori (MAP) estimation, which is a constrained non-convex optimization problem. In synthesizing, a generative model is built for sources using a GAN. In decomposition, both sources and mixing filters are obtained by minimizing the reconstruction error of a mixture. To tackle the non-convex optimization problem, we repeat the decomposition with different initializations, which significantly improves the underdetermined single-channel signal separation and deconvolution performance. We carry out experiments on the MNIST dataset as a starting point for this challenging problem and show the effectiveness of the proposed S-D approach with GANs.

This paper is organized as follows: Section 2 formulates the underdetermined single-channel signal separation and deconvolution problem. Section 3 proposes the synthesising-decomposition (S-D) approach for this problem. Section 4 shows experimental results. Section 5 concludes and forecasts future work.

2 Single-Channel Signal Separation and Deconvolution

In underdetermined single-channel signal separation and deconvolution, a single-channel mixture $y$ is composed of $K$ individual sources $s_k$ convolved with unknown filters $h_k$, followed by unknown additive noise $n$. The sources lie in a Euclidean space $\mathbb{R}^D$, where $K$ and $D$ denote the number and the dimension of the sources, respectively:

$$y = \sum_{k=1}^{K} h_k * s_k + n. \quad (1)$$

The symbol $*$ represents the convolution operation:

$$(h_k * s_k)[t] = \sum_{\tau} h_k[\tau]\, s_k[t - \tau]. \quad (2)$$

For the simple case of source separation without deconvolution, $h_k$ in (2) simplifies to $h_k = c_k \delta$, where $\delta$ is the Dirac delta function and $c_k$ is a scalar gain. The general single-channel signal separation and deconvolution problem concerns both separating and deconvolving individual sources from a single-channel mixture, while the mixing filters $h_k$ and the noise $n$ in (1) are unknown. In the rest of the paper, we write $\{s_k\}_{k=1}^{K}$ and $\{h_k\}_{k=1}^{K}$ as $\{s_k\}$ and $\{h_k\}$, respectively.
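The mixture model in (1) and the convolution in (2) can be sketched numerically. Below is a minimal 1-D NumPy example; the paper's experiments use 2-D images, and the sizes K, D and F here are illustrative, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

K, D, F = 2, 64, 5   # number of sources, source length, filter length (illustrative)
sources = rng.standard_normal((K, D))          # s_k: unknown individual sources
filters = rng.standard_normal((K, F))          # h_k: unknown mixing filters
noise = 0.1 * rng.standard_normal(D + F - 1)   # n: unknown additive noise

# Eq. (1): y = sum_k h_k * s_k + n, where '*' is the convolution of Eq. (2).
y = sum(np.convolve(filters[k], sources[k]) for k in range(K)) + noise

print(y.shape)   # (68,): a 'full' convolution has D + F - 1 samples
```

Only the mixture `y` would be observed in practice; the sources, filters and noise above are what the methods below must recover.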

In the regression based approaches [??], a mapping $f_k$ from a mixture $y$ to a source signal $s_k$ is modeled by a deep neural network and learned to separate the $k$-th source. In separation, the separated sources are obtained by forwarding a mixture through the model: $\hat{s}_k = f_k(y)$. However, there are several problems associated with the regression based approaches, as follows:

Problem 1.

In regression based supervised learning, the training data and testing data should have the same distribution, otherwise the trained model will be biased [?]. However, in single-channel signal separation and deconvolution, no prior knowledge of test noise is available. The model trained with training noise may not generalize well to sources with unseen non-stationary noise.

Problem 2.

In single-channel signal separation and deconvolution, both the sources and mixing filters are unknown and need to be estimated.

Problem 3.

Previous regression and decomposition based approaches do not constrain the distribution of the separated sources $\hat{s}_k$ to be the same as the distribution $p(s)$ of real sources. Ideally, the separated sources should be regularized towards regions where $p(s)$ has larger values.

Decomposition approaches such as NMF can be trained on individual sources instead of on mixtures, so that Problem 1 can be mitigated. Recently, GANs [???] have been applied to source separation to solve Problem 3, constraining the separated sources to lie in the natural source space. However, those methods are based on the assumption that the mixing filters are constants, so that they solve only the separation problem but not the deconvolution problem in (1).

3 Proposed Synthesising-Decomposition (S-D) Approach

3.1 Maximum a Posteriori (MAP) Estimation

In this section, we first formulate the single-channel signal separation and deconvolution problem in (1) as a Bayesian parameter estimation problem. We denote by $\theta = \{s_k, h_k\}_{k=1}^{K}$ the set of parameters to be estimated, including the sources and the mixing filters. The estimate $\hat{\theta}$ can be obtained by maximum a posteriori (MAP) estimation:

$$\hat{\theta} = \arg\max_{\theta}\, p(y \mid \theta)\, p(\theta). \quad (3)$$

The first term in (3) is a likelihood function. The reconstructed signal can be written as $\hat{y} = \sum_{k=1}^{K} h_k * s_k$. Assuming the noise $n$ is a Gaussian process, the likelihood of the observed signal $y$ given the estimated signal $\hat{y}$ can be written as:

$$p(y \mid \theta) = \mathcal{N}(y;\, \hat{y},\, \sigma^2 I), \quad (4)$$

where $\mathcal{N}(\cdot\,; \mu, \Sigma)$ is the probability density of a Gaussian distribution. The second term in (3) is the prior probability of $\theta$. Assuming the sources and filters are independent of each other, we can write $p(\theta)$ as:

$$p(\theta) = \prod_{k=1}^{K} p(s_k) \prod_{k=1}^{K} p(h_k). \quad (5)$$

We assume $p(s)$ to have a compact support $S$. Substituting equations (4) and (5) into equation (3), the estimates of the sources and filters can be obtained by solving the following optimization problem:

$$\hat{\theta} = \arg\min_{\theta}\, \Big\| y - \sum_{k=1}^{K} h_k * s_k \Big\|_2^2 - \lambda \sum_{k=1}^{K} \big( \log p(s_k) + \log p(h_k) \big), \quad \text{s.t. } s_k \in S, \quad (6)$$

where $\lambda = 2\sigma^2$.
1: Inputs: Real data $\{x^{(i)}\}$.
2: Outputs: Parameters of the discriminator $D$ and the generator $G$ of a GAN.
3: for number of iterations do
  • Sample a minibatch of $m$ noise samples $\{z^{(1)}, \dots, z^{(m)}\}$ from a Gaussian distribution $\mathcal{N}(0, I)$.
  • Sample a minibatch of $m$ examples $\{x^{(1)}, \dots, x^{(m)}\}$ from the real data.
  • Update the discriminator by ascending its stochastic gradient:
    $\nabla_{\theta_D} \frac{1}{m} \sum_{i=1}^{m} \big[ \log D(x^{(i)}) + \log(1 - D(G(z^{(i)}))) \big]$.
  • Sample a minibatch of $m$ noise samples $\{z^{(1)}, \dots, z^{(m)}\}$ from $\mathcal{N}(0, I)$.
  • Update the generator by descending its stochastic gradient:
    $\nabla_{\theta_G} \frac{1}{m} \sum_{i=1}^{m} \log(1 - D(G(z^{(i)})))$.
4: end for
Algorithm 1: Training of a GAN [?].
1: Inputs: A mixture $y$. Generator $G$ trained using Algorithm 1.
2: Outputs: Separated and deconvolved sources $\{G(z_k)\}_{k=1}^{K}$ and mixing filters $\{h_k\}_{k=1}^{K}$.
3: Sample seeds $\{z_k\}$ and mixing filters $\{h_k\}$ from a Gaussian distribution $\mathcal{N}(0, I)$.
4: for number of iterations do
  • Calculate the reconstructed signal $\hat{y} = \sum_{k=1}^{K} h_k * G(z_k)$.
  • Calculate the gradient $\partial L / \partial \theta'$ from equation (11), where $\theta' = \{z_k, h_k\}_{k=1}^{K}$.
  • Update $\theta'$ using the Adam optimizer [?].
5: end for
Algorithm 2: Decomposition of a mixture. Hyperparameter $K$: number of individual sources.

3.2 Optimization with S-D Approach

To optimize (6) is difficult because of the constraint $s_k \in S$. The source prior $p(s)$ is unknown, so the constraint cannot be written in a closed form. Our solution is to convert (6) to an unconstrained optimization problem. In the proposed S-D approach, we first build a generative model for $p(s)$ with a GAN [??]. A GAN consists of a generator $G$ and a discriminator $D$. The generator is a mapping from a simple distribution, such as a Gaussian distribution, to the real distribution of sources. We call this distribution the seed distribution and call its samples $z$ seeds. The generator is trained to generate samples $G(z)$ to fool the discriminator $D$. The discriminator is trained to discriminate fake sources from real sources. In other words, the generator and the discriminator play the following two-player minimax game with value function $V(D, G)$ [?]:

$$\min_{G} \max_{D} V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))], \quad (7)$$

where $p_{\text{data}}$ is the real data probability density and $p_z$ is the seed distribution. The training of the GAN is shown in Algorithm 1. The generator and discriminator are trained iteratively. If both $G$ and $D$ have enough capacity, then the generated source distribution will converge to $p_{\text{data}}$ [?]. Once the GAN is successfully trained, $G(z) \in S$ for all $z$. To solve the optimization problem in (6), we substitute $s_k = G(z_k)$ and optimize over $z_k$ instead of $s_k$, so that the constraint is eliminated. The variables to be optimized are now the seeds $\{z_k\}$ and the mixing filters $\{h_k\}$. In addition, a GAN does not predict the probability density of $s$, so the optimization of equation (6) is intractable. To solve this problem, we approximate $p(s)$ with:

$$p(s) \approx \begin{cases} c, & s \in S \\ 0, & \text{otherwise.} \end{cases} \quad (8)$$

Equation (8) assumes the probability density outside $S$ is zero. It is not required to know the value of $c$, as it is eliminated when optimizing (6):

$$\hat{\theta} = \arg\min_{\theta}\, \Big\| y - \sum_{k=1}^{K} h_k * s_k \Big\|_2^2 - \lambda \sum_{k=1}^{K} \log p(h_k), \quad \text{s.t. } s_k \in S. \quad (9)$$

We assume the coefficients in $h_k$ to be Gaussian, $h_k \sim \mathcal{N}(0, \sigma_h^2 I)$. Taking the logarithm of the prior in (9) and substituting $s_k = G(z_k)$, the optimization can be written as:

$$\{\hat{z}_k, \hat{h}_k\} = \arg\min_{\{z_k\}, \{h_k\}}\, \Big\| y - \sum_{k=1}^{K} h_k * G(z_k) \Big\|_2^2 + \lambda \sum_{k=1}^{K} \| h_k \|_2^2, \quad (10)$$

where constant factors have been absorbed into $\lambda$, so that $\lambda \sum_{k} \|h_k\|_2^2$ acts as a regularization term for (10).
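The objective in (10) can be sketched as follows. This is a minimal NumPy illustration in which a fixed random linear map W stands in for the trained generator G (the paper's actual generator is a DCGAN), and the sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Fixed random linear map standing in for a trained generator G
# (the real G is a deep network; this is only for illustration).
W = rng.standard_normal((64, 100)) / 10.0

def G(z):
    return W @ z

def loss(y, zs, hs, lam=0.001):
    """Eq. (10): ||y - sum_k h_k * G(z_k)||^2 + lam * sum_k ||h_k||^2."""
    y_hat = sum(np.convolve(h, G(z)) for z, h in zip(zs, hs))
    recon = float(np.sum((y - y_hat) ** 2))
    reg = lam * sum(float(np.sum(h ** 2)) for h in hs)
    return recon + reg

K, F = 2, 5                    # illustrative sizes, not the paper's settings
zs = [rng.standard_normal(100) for _ in range(K)]
hs = [rng.standard_normal(F) for _ in range(K)]
y = sum(np.convolve(h, G(z)) for z, h in zip(zs, hs))  # noise-free mixture

# At the generating seeds and filters the reconstruction error is zero,
# so only the filter regularizer lam * sum_k ||h_k||^2 remains.
print(loss(y, zs, hs))
```

In the decomposition this loss is minimized jointly over the seeds and the filters.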

Problem | Noise $n$ | Mixing filters $h_k$
Denoising | Gaussian | $h_k = c_k \delta$, $c_k$ is a constant
Inpainting, completion | Unknown | $h_k = c_k \delta$, $c_k$ is a constant
Deconvolution | - | $h_k$ is a tensor
Separation | - | $h_k = c_k \delta$, $c_k$ are constants
Separation + deconvolution | - | $h_k$ are tensors
Table 1: Categories of the single-channel signal separation and deconvolution problem with different noise and mixing filters.

3.3 Optimization

To solve (10), we apply a gradient based iterative approach. We denote $\theta' = \{z_k, h_k\}_{k=1}^{K}$, where the seeds $z_k$ and the filters $h_k$ need to be optimized. First we randomly initialize $\theta'$, then the gradient of the objective $L$ in (10) is calculated by:

$$\frac{\partial L}{\partial \theta'} = \frac{\partial}{\partial \theta'} \Big( \Big\| y - \sum_{k=1}^{K} h_k * G(z_k) \Big\|_2^2 + \lambda \sum_{k=1}^{K} \| h_k \|_2^2 \Big). \quad (11)$$

The parameters are optimized using Algorithm 2. Because $G$ is a non-linear mapping, (10) is a non-convex function of $\theta'$. Gradient based methods might therefore reach a local minimum, depending on the initialization of the seeds. To mitigate this problem, we repeat Algorithm 2 $R$ times with different initializations and choose the solution with the smallest reconstruction error.
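The restart strategy can be sketched as follows. This toy NumPy example uses a linear stand-in for G and plain gradient descent instead of Adam, with a single source and no mixing filter, to show only the repeat-and-pick-best logic of Algorithm 2:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy linear stand-in for the trained generator: G(z) = W z
# (the real G is a DCGAN; a linear map keeps the gradient analytic).
W = rng.standard_normal((64, 100)) / 10.0
y = W @ rng.standard_normal(100)    # a clean "mixture" generated by G itself

def decompose(y, seed, steps=500, lr=0.2):
    """One run of the (simplified) decomposition: gradient descent on the seed z."""
    z = np.random.default_rng(seed).standard_normal(100)
    for _ in range(steps):
        grad = 2.0 * W.T @ (W @ z - y)   # d/dz ||y - G(z)||^2 for linear G
        z -= lr * grad
    return z, float(np.sum((y - W @ z) ** 2))

# Repeat with R different initializations and keep the lowest reconstruction error.
R = 8
runs = [decompose(y, seed) for seed in range(R)]
z_best, err_best = min(runs, key=lambda r: r[1])
print(f"best reconstruction error over {R} restarts: {err_best:.2e}")
```

With a non-linear generator the per-run losses differ much more across initializations, which is where keeping the best of several runs pays off.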

4 Experiments

In this section, we apply the proposed S-D method to solve the underdetermined image single-channel signal separation and deconvolution problem. We carry out experiments on the MNIST 10-digit dataset [?] as a starting point for this challenging problem and show the effectiveness of the proposed S-D method. With different types of unknown mixing filters $h_k$ and unknown interference noise $n$, the problem in (1) can be categorized as image denoising, inpainting, completion, deconvolution and separation, as shown in Table 1. The symbol '-' represents any type of noise. Previous works usually focus on one of these problems, such as denoising [?], inpainting [?], deconvolution [?] or separation [?]. In this paper we solve these problems together with the proposed S-D method. The PyTorch implementation of this paper is released at https://github.com/qiuqiangkong/gan_separation_deconvolution.

4.1 Model Configuration

In the proposed S-D approach, we model the synthesizing procedure with a deep convolutional generative adversarial network (DCGAN) [?], which can stabilize the training of a GAN and can generate high quality images, as shown in [?]. A DCGAN consists of a generator $G$ and a discriminator $D$. The input to $G$ is a seed $z$ sampled from a Gaussian distribution. Following [?], the seed has a dimension of 100. The generator has 4 transposed convolutional layers with 512, 256, 128 and 1 feature maps, respectively. Following [?], batch normalization [?] and ReLU non-linearity are applied after each transposed convolutional layer. The output of $G$ is an image of the same size as the images in the training data. The discriminator takes a fake or a real image as input. The discriminator consists of 4 convolutional layers, with a sigmoid output representing the probability that the input to $D$ comes from real data instead of generated data. Following [?], we use the Adam [?] optimizer with a learning rate of 0.0002, a $\beta_1$ of 0.5 and a $\beta_2$ of 0.999 to train the generator. In decomposition, we freeze the trained generator $G$. We approximate the prior of the mixing filters with a Gaussian distribution, which works well in our experiment. We set $\lambda$ to 0.001 to regularize the mixing filters to be searched. The seeds $z_k$ and filters $h_k$ are randomly initialized and optimized with the Adam optimizer with a learning rate of 0.01, a $\beta_1$ of 0.9 and a $\beta_2$ of 0.999 (Algorithm 2).

For comparison with regression based approaches, we apply a CNN [?] which consists of 4 layers with batch normalization [?] and ReLU non-linearity. The number of layers and parameters is set to be the same as for the discriminator in the DCGAN. The CNN is trained to regress from a noisy individual source to the clean individual source. For comparison with decomposition based approaches, we train a dictionary for each of the 10 digits using NMF [?] with Euclidean distance. Each dictionary consists of 20 bases, which performs well in our experiment. In decomposition, the trained dictionaries are concatenated to form a dictionary of 200 bases, which is then used to decompose the mixtures.
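As a reference, NMF with Euclidean distance can be fitted with the classic Lee-Seung multiplicative updates. The sketch below uses random nonnegative data in place of MNIST digits; R = 20 matches the number of bases per digit dictionary in the experiment, while the other sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy nonnegative data standing in for flattened digit images:
# V is D pixels x N samples (MNIST itself is not loaded here).
D, N, R = 64, 30, 20        # R = 20 bases, as in each digit dictionary
V = rng.random((D, N))

# NMF with Euclidean distance via the Lee-Seung multiplicative updates:
# find W, H >= 0 such that V is approximately W @ H.
W = rng.random((D, R))
H = rng.random((R, N))
eps = 1e-12                 # avoids division by zero
for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)

err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {err:.3f}")
```

The multiplicative form of the updates keeps `W` and `H` nonnegative throughout, which is the property that makes the learned bases usable for decomposing mixtures.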

Figure 1: Image denoising, inpainting and completion with CNN, NMF and S-D approaches.

4.2 Evaluation

Following [???], we use the peak signal-to-noise ratio (PSNR) to evaluate single-channel signal separation and deconvolution quality. A higher PSNR indicates a better reconstruction quality. PSNR is defined as:

$$\text{PSNR} = 10 \log_{10} \Big( \frac{\text{MAX}^2}{\text{MSE}} \Big), \quad (12)$$

where MAX is the maximum value of a noise-free image. MSE represents the mean squared error between two images $I$ and $\hat{I}$ of size $W \times H$:

$$\text{MSE} = \frac{1}{WH} \sum_{i=1}^{W} \sum_{j=1}^{H} \big( I(i, j) - \hat{I}(i, j) \big)^2. \quad (13)$$
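Equations (12) and (13) translate directly into code. A minimal NumPy sketch, taking MAX as 1.0 for images scaled to [0, 1]:

```python
import numpy as np

def psnr(clean, estimate, max_val=1.0):
    """Eq. (12): PSNR = 10 log10(MAX^2 / MSE), with MSE as in Eq. (13)."""
    mse = np.mean((clean - estimate) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

clean = np.zeros((28, 28))   # MNIST-sized image
noisy = clean + 0.1          # constant error of 0.1 -> MSE = 0.01
print(psnr(clean, noisy))    # 10 * log10(1 / 0.01) = 20.0 dB
```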
Method | denoising | inpainting | completion
CNN | 26.0 dB | 15.3 dB | 12.2 dB
NMF | 17.4 dB | 13.4 dB | 12.9 dB
convolutive NMF | 18.3 dB | 13.4 dB | 13.0 dB
S-D with 1 init. | 23.1 dB | 15.2 dB | 13.6 dB
S-D with 8 init. | 25.1 dB | 18.2 dB | 15.4 dB
S-D with 32 init. | 25.1 dB | 18.9 dB | 15.4 dB
Table 2: PSNR of image denoising, inpainting and completion with different approaches.

4.3 Denoising, Inpainting and Completion

Denoising, inpainting and completion are special cases of the single-channel signal separation and deconvolution problem, where $h_k = c_k \delta$ with $c_k$ an unknown constant and $n$ is unknown noise such as Gaussian noise, non-stationary noise or corruption of an image. The first and second rows of Fig. 1 show the clean and noisy images. The third to fifth rows show the denoised images with the CNN, NMF and the proposed S-D approach. In the first column, the testing noise and training noise have the same distribution, so the CNN performs well. However, CNN based denoising does not generalize well to unseen noise such as the non-stationary noise or image corruption shown in the second and third columns of Fig. 1. NMF performs better than the CNN under unseen noise but sometimes produces unnatural separation results, due to Problem 3 stated in Section 2. The S-D approach performs well on all of image denoising, inpainting and completion. Table 2 shows the PSNR of the CNN, NMF, convolutive NMF and S-D approaches. The S-D approach achieves a PSNR of 25.1 dB in image denoising, which is comparable to the CNN. NMF and convolutive NMF achieve similar PSNRs of 17.4 dB and 18.3 dB, respectively. In image inpainting, S-D achieves a PSNR of 18.9 dB, outperforming the NMF and CNN methods at 13.4 dB and 15.3 dB, respectively. This result shows that source separation with S-D generalizes better to unseen noise than NMF and CNN. In image completion, the S-D approach achieves a PSNR of 15.4 dB, outperforming the CNN at 12.2 dB and convolutive NMF at 13.0 dB. Table 2 also shows the performance of the decomposition in the S-D approach with respect to the number of initializations. With 8 or 32 initializations the performance is about 2 dB better than with only 1 initialization. This may result from the fact that the optimization problem in (10) is non-convex: Algorithm 2 is a gradient based method which may end in a local minimum. Repeating Algorithm 2 several times with different initializations and choosing the solution with the least reconstruction error gives better performance.

Method | deconv. | sep. | sep. + deconv.
NMF | 15.3 dB | 9.4 dB | 8.7 dB
convolutive NMF | 18.3 dB | 14.2 dB | 10.1 dB
S-D with 1 init. | 17.3 dB | 13.7 dB | 9.3 dB
S-D with 8 init. | 21.9 dB | 16.8 dB | 11.5 dB
S-D with 32 init. | 23.2 dB | 18.5 dB | 13.2 dB
Table 3: PSNR of image separation and deconvolution with different approaches.

Figure 2: Image separation and deconvolution with NMF and S-D approach.

4.4 Separation and Deconvolution

We evaluate single-channel signal separation and deconvolution with the mixing filters being unknown tensors, which is a very challenging task. In this case both the mixing tensors and the individual sources need to be estimated. Fig. 2 shows a mixture obtained by convolving clean sources with mixing filters followed by summation. In our experiment we fix the number of sources $K$ and the size of each mixing filter; in actual application scenarios, the size of the mixing filter depends on the task. Fig. 2 shows that NMF based separation often leads to unnatural images. The S-D based approach can separate images with high quality, and both the sources and the mixing filters can be estimated. Fig. 2 shows that both the estimated sources and the estimated mixing filters agree with the ground truth sources and mixing filters. The first column of Table 3 shows the results of image deconvolution without separation, where $K = 1$ and $h_1$ is an unknown tensor. S-D achieves a PSNR of 23.2 dB and performs better than NMF and convolutive NMF at 15.3 dB and 18.3 dB, respectively. The second column of Table 3 shows the results of image separation, where the $h_k$ are unknown constants. S-D achieves a PSNR of 18.5 dB and performs better than NMF and convolutive NMF at 9.4 dB and 14.2 dB, respectively. The third column of Table 3 shows both source separation and deconvolution, where the $h_k$ are unknown tensors. S-D achieves a PSNR of 13.2 dB and outperforms NMF and convolutive NMF at 8.7 dB and 10.1 dB, respectively. S-D with 32 initializations has a higher PSNR than with 8 initializations, which in turn is higher than with 1 initialization, showing the effectiveness of repeating Algorithm 2 several times to solve the non-convex optimization problem in (10).

5 Conclusion

In this paper, we propose a synthesizing-decomposition (S-D) approach to solve the single-channel signal separation and deconvolution problem. In synthesizing, a generative model for source signals is trained using a generative adversarial network (GAN). In decomposition, both sources and filters are optimized to minimize the reconstruction error. Instead of optimizing sources directly, we optimize over the seeds of a GAN. The proposed S-D approach achieves a PSNR of 18.9 dB and 15.4 dB in image inpainting and completion, outperforming the regression based CNN and the decomposition based NMF. The S-D approach achieves a PSNR of 13.2 dB in image source separation with deconvolution, outperforming NMF at 8.7 dB. Repeating the decomposition in S-D several times can significantly improve the PSNR. In future work, we will apply the S-D approach to more source separation and deconvolution problems.

Acknowledgements

This research was supported by EPSRC grant EP/N014111/1 “Making Sense of Sounds” and a Research Scholarship from the China Scholarship Council (CSC) No. 201406150082.

References

  • [Campisi and Egiazarian, 2017] P. Campisi and K. Egiazarian. Blind image deconvolution: Theory and applications. CRC Press, 2017.
  • [Cichocki et al., 2006] A. Cichocki, R. Zdunek, and S. Amari. New algorithms for non-negative matrix factorization in applications to blind source separation. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2006.
  • [Cichocki et al., 2009] A. Cichocki, R. Zdunek, A. H. Phan, and S. Amari. Nonnegative matrix and tensor factorizations: applications to exploratory multi-way data analysis and blind source separation. John Wiley & Sons, 2009.
  • [Fan et al., 2018] Z. Fan, Y. Lai, and J. Jang. SVSGAN: Singing voice separation via generative adversarial network. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018.
  • [Goodfellow et al., 2014] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems (NIPS), pages 2672–2680, 2014.
  • [Grais et al., 2014] E. M. Grais, M. Sen, and H. Erdogan. Deep neural networks for single channel source separation. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3734–3738, 2014.
  • [Ioffe and Szegedy, 2015] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (ICML), 2015.
  • [Jain and Seung, 2009] V. Jain and S. Seung. Natural image denoising with convolutional networks. In Advances in Neural Information Processing Systems (NIPS), pages 769–776, 2009.
  • [Kingma and Ba, 2015] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR), 2015.
  • [Kitamura et al., 2013] D. Kitamura, H. Saruwatari, K. Shikano, K. Kondo, and Y. Takahashi. Music signal separation by supervised nonnegative matrix factorization with basis deformation. In International Conference on Digital Signal Processing (DSP), 2013.
  • [LeCun et al., 1998] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
  • [Lee and Seung, 1999] D. D. Lee and H. S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 1999.
  • [Levin et al., 2009] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman. Understanding and evaluating blind deconvolution algorithms. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
  • [Mijovic et al., 2010] B. Mijovic, M. De Vos, I. Gligorijevic, J. Taelman, and S. Van Huffel. Source separation from single-channel recordings by combining empirical-mode decomposition and independent component analysis. IEEE Transactions on Biomedical Engineering, 2010.
  • [Radford et al., 2015] A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
  • [Stoller et al., 2017] D. Stoller, S. Ewert, and S. Dixon. Adversarial semi-supervised audio source separation applied to singing voice extraction. arXiv preprint arXiv:1711.00048, 2017.
  • [Subakan and Smaragdis, 2017] C. Subakan and P. Smaragdis. Generative adversarial source separation. arXiv preprint arXiv:1710.10779, 2017.
  • [Xie et al., 2012] J. Xie, L. Xu, and E. Chen. Image denoising and inpainting with deep neural networks. In Advances in Neural Information Processing Systems (NIPS), pages 341–349, 2012.
  • [Xu et al., 2014] L. Xu, J. Ren, C. Liu, and J. Jia. Deep convolutional neural network for image deconvolution. In Advances in Neural Information Processing Systems (NIPS), pages 1790–1798, 2014.
  • [Yeh et al., 2016] R. Yeh, C. Chen, T. Y. Lim, M. Hasegawa-Johnson, and M. N. Do. Semantic image inpainting with perceptual and contextual losses. arXiv preprint arXiv:1607.07539, 2016.
  • [Yosinski et al., 2014] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson. How transferable are features in deep neural networks? In Advances in Neural Information Processing Systems (NIPS), pages 3320–3328, 2014.
  • [Zhang et al., 2017] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Transactions on Image Processing, 26(7):3142–3155, 2017.