ADIS  A robust pursuit algorithm for probabilistic, constrained and nonsquare blind source separation with application to fMRI
Abstract
In this article, we develop an algorithm for probabilistic and constrained projection pursuit. Our algorithm called ADIS (automated decomposition into sources) accepts arbitrary nonlinear contrast functions and constraints from the user and performs nonsquare blind source separation (BSS). In the first stage, we estimate the latent dimensionality using a combination of bootstrap and cross validation techniques. In the second stage, we apply our stateoftheart optimization algorithm to perform BSS. We validate the latent dimensionality estimation procedure via simulations on sources with different kurtosis excess properties. Our optimization algorithm is benchmarked via standard benchmarks from GAMS performance library. We develop two different algorithmic frameworks for improving the quality of local solution for BSS. Our algorithm also outputs extensive convergence diagnostics that validate the convergence to an optimal solution for each extracted component. The quality of extracted sources from ADIS is compared to other well known algorithms such as Fixed Point ICA (FPICA), efficient Fast ICA (EFICA), Joint Approximate Diagonalization (JADE) and others using the ICALAB toolbox for algorithm comparison. In several cases, ADIS outperforms these algorithms. Finally we apply our algorithm to a standard functional MRI dataset as a case study.
1 Introduction
The Generalized Linear Model (GLM) is a popular tool for analyzing functional MRI (fMRI) data. GLM analysis proceeds on a voxel by voxel basis using the same design matrix. One of the difficulties associated with GLM analysis is the construction of an appropriate design matrix. Unmodeled regressors that modulate the fMRI signal in addition to the EVs but are not a part of the design matrix will invalidate the analysis and the associated inferences. Further, these unmodeled regressors might be different in different brain regions and a voxel by voxel analysis with a common design matrix may not be appropriate. These considerations imply that model based analyses make very strong assumptions which are very likely violated in a real fMRI dataset.
Model free analysis on the other hand does not need any postulations as to the shape of the expected response. One such technique that has become popular in recent years, particularly for application to fMRI is Independent Component Analysis (ICA) [9]. See [25] for a survey on ICA. Popular software packages such as FSL ([35]) come with ICA software for doing nonsquare ICA via automatic latent dimensionality estimation also known as Probabilistic Independent Component Analysis (PICA) [2]. Essentially these techniques consists of a data reduction step using PCA followed by the application of standard ICA algorithms. In the signal processing community, many well established algorithms exist for doing ICA, the most popular ones being efficient FastICA [27], Fixed Point ICA (FPICA) [26], Joint Approximate Diagonalization of Cumulant Matrices (JADE) [6], Extended Robust ICA (ERICA) [13] and unbiased ICA (UNICA) [14]. In this article, we will question the validity of solution produced by current ICA codes. Are these solutions really optimal? Can we afford to pay the price for using nonoptimal solutions?
One of the challenges involved in applying ICA to real data is the confidence in the quality of estimated solution. This problem has been recognized before. For example the software package ICASSO [21] uses bootstrapping simulations to run an ICA algorithm multiple times and then clusters the estimated sources to assess reliability. ICASSO takes into account the ”algorithmic” variability and the variability in the original data induced due to sampling, but since it assumes square, noise free mixing it ignores the estimation errors induced due to noisy source mixing. The presence of nonsquare mixing in real data also introduces additional variability due to the unknown latent dimensionality of the sources.
While a bootstrapping strategy can always be used to test the ”sensitivity” of estimated BSS solution from any algorithm, it is critical to have a reliable and verifiable optimization solver to solve the nonconvex BSS problem in the first place.
Currently existing ICA codes perform optimization using techniques that have formulas for updating the unknown variables using a Newton step, gradient descent, natural gradient [3], [1] [18] or similar strategies. In addition they also potentially have a number of heuristic rules for updating the various ”learning” parameters in these algorithms. No convergence diagnostics are used to check the optimality of estimated solution. To the best of our knowledge, such optimization codes are not benchmarked using standard optimization test cases. Such adhoc strategies along with nonverified optimality may severely affect the quality of solution produced by these algorithms and potentially impact the practical conclusions drawn from incorrect results. The popular FastICA algorithm [27] uses an approximate Newton iteration where the approximate Hessian simplification reduces it to a gradient descent algorithm with a fixed step size. However there is no reason to use these approximations. One can use state of the art optimization software to compute near exact step sizes with locally varying Hessian approximations. In fact, it has been shown [39] that these exact step size search significantly increases the estimation efficiency and robustness to initialization in comparison to the fixed update rules of FastICA. The optimization core in our algorithm ADIS efficiently handles local nonconvexity as well as allows for infeasible steps, i.e., steps that violate the constraints leading to a more fuller exploration of parameter space and increasing the likelihood of converging to a global optimum.
Another issue is the modification of the default contrast function in ICA (e.g. Negentropy) to do other types of source extraction. It is also desirable to be able to add additional equality and/or inequality constraints to BSS estimation depending on the application at hand. These issues cannot be addressed using current ICA software. In this paper, we develop our algorithm ADIS and validate its various components. Finally, we compare ADIS to currently existing ICA codes on many standard benchmark datasets from ICALAB.
Our algorithm ADIS (section 2, 3):

Uses a bootstrap simulation/crossvalidation based approach for latent dimensionality estimation (in case of nonsquare BSS) (section 6)

Enables the user to use arbitrary contrast functions and nonlinear constraints for BSS

Produces ”good quality” local solutions using a special multistage framework for BSS (section 3)

Produces extensive convergence diagnostics for each extracted component to validate the ”optimality” of the extracted source (section 4)
We perform validation of each component of ADIS as follows:

Validation of the latent dimensionality estimation procedure using simulations on sources with different statistical properties (section 6)

Validation of our optimization core using standard benchmarks from the GAMS performance library (http://www.gamsworld.org/performance, [15]) (section 15)

Finally we apply ADIS to real fMRI data as a case study (section 8)
2 Probabilistic Projection Pursuit
Projection Pursuit is a standard statistical technique for data analysis [17],[16], [22]. In this article we generalize projection pursuit in a probabilistic framework similar to the one proposed by Beckmann et. al. ([2]). We consider data generation at points via a noisy mixing process as follows:
(1) 
where , and , . We assume that to achieve a compact representation of the observed data.
The problem is to estimate automatically , , and given observations . This problem is also called a blind source separation (BSS) problem since , and are all unknown. The inclusion of noise term makes the problem into a probabilistic one.
First consider the case when for all . Section 5.1 shows how to handle the case .
If be a vector of ones. Then
(2) 
Let
(3) 
Then we can write
(4) 
where Since the scaling of and is arbitrary we assume without loss of generality that
(5) 
No other assumptions are made about the joint source density other than the ones in 5. From equations 4 and 5:
(6) 
(7) 
Using the law of large numbers (LLN) we can approximate the covariance matrix as:
(8) 
In 8, with the singular values and a matrix containing the corresponding singular vectors of the covariance matrix. The estimate of is easily obtained from 6.
(9) 
Without knowing the true source densities it is not possible to estimate the maximum likelihood estimates of and . However one can find estimates that try to satisfy the second order condition 7 as closely as possible by minimizing the Frobenius norm:
It is easily shown that if is a submatrix of containing the first singular vectors corresponding to the largest singular values and is the submatrix of then the solution to 2 is given by:
(10) 
and
(11) 
where is an arbitrary orthogonal matrix (). Given and , the least squares estimate of are given by:
(12) 
where
(13) 
3 Problems solved in Projection Pursuit
In Projection Pursuit (PP), we parameterize the orthogonal matrix as
(14) 
where . The th PP projection is defined as for each point :
(15) 
In vector form we can write the th projection as:
(16) 
where is a vector and is a matrix. We define an objective function and optimize for the projection vectors. Mathematically we solve the optimization problem:
Here is function accounting for supplementary information that we want to include in the optimization problem. is all supplementary information of interest to the optimization problem. For example, in spatial problems could be a spatial location of points and could be a function accounting for spatial smoothness (such as a Markov random field). Many such problem specific functions can be proposed based on user objectives. There could also be additional user defined equality and inequality constraints, for example:
(17) 
(18) 
Thus in general, we get a constrained projection pursuit problem. Constraints 17 and 18 can be arbitrary nonlinear constraints not necessarily parameterized by .
3.1 Separability
The optimization problem in 3 must involve a joint optimization of the vectors in general. In many important practical cases (such as for example when is the joint negentropy index) the objective function has a separable structure [26] such that
(19) 
for some function , then the optimization can proceed sequentially where at each stage we solve
(20) 
At each stage is a unit vector orthogonal to the previously calculated vectors i.e
(21) 
3.2 Multistage Optimization stragegy
In this section we describe a 2 or 3 stage strategy which we have found via experience to converge to good ”local” solutions. It is well known that if an optimization problem that has multiple optima then the ”local” solution found by an algorithm is strongly dependent on how the algorithm is initialized. For applications such as BSS, even though an algorithm finds a ”local” solution as indicated by convergence diagnostics, it may not be a ”global” solution. In order to increase our chances of finding a solution that is global, we propose a random sampling strategy for initialilzation of an optimization algorithm first.
3.2.1 Stage 0: Search for good seed points
For concreteness, suppose are the optimal solutions found previously and suppose we are tying to find . Let be a matrix that is the orthogonal complement of as determined by say GramSchmidt orthogonalization. Then
(22) 

Generate vectors in whose elements are drawn from a uniform distribution on .

Standardize each vector to have unit norm to get the set of vectors where each .
For each we define seed points as
(23) 
It is easy to see that and . Thus satisfies the constraints in 21. We then compute the objective function at each of these points and choose points that give the highest objective function values as candidate seed points for the next step.
3.2.2 Stage 1: Computing local optimum at R best points from Stage 1
In this stage, we compute the local solutions of the optimization problem 20 starting from and choose the best local solution that has the highest function value.
(24) 
ADIS uses and as the defaults.
3.2.3 Stage 2: Joint optimization with initialization from Stage 2
After Stage 1 has been applied from we have an estimate of the solution vectors . In this stage we solve the joint optimization problem 19 subject to the single joint constraint:
(25) 
where if and otherwise. We initialize the algorithm with the solution from Stage 1. The algorithm converges to a local joint solution only in a few iterations.
4 Optimization Algorithm
For flexible and powerful BSS algorithms, a primary requirement is a fast and robust optimization algorithm that can handle nonlinear user defined constraints as well as handle nonconvexity in the objective function or the constraints. Furthermore, extensive convergence diagnostics should be a standard output of the optimization process to ensure convergence to a local solution. The optimization core in ADIS (coded in MATLAB, www.mathworks.com
) uses a modified augmented lagrangian algorithm (inspired by the implementation in LANCELOT package [10], [12]) to solve equality constrained problems generated in constrained nonconvex BSS problems. Inequality constraints are handled by first transforming them to equality constraints via slack variables and solving the resulting bound constrained optimization problem. Some features of interest are as follows:

A Trust region based approach [30] is used to generate search directions at each step (for both equality constrained and inequality constrained problems).

For equality constraints only, the subproblems above are solved using a conjugate gradient approach (NewtonCG Steihaug) [36] that is fast and accurate even for large problems and can handle both positive definite and indefinite hessian approximations. If both equality and inequality constraints are present then we solve the trust region problem with a nonlinear gradient projection technique [5] followed by subspace optimization using NewtonCGSteihaug. Our algorithm allows for infeasible iterates i.e., those that do not satisfy the problem constraints during optimization. This allows for a fuller exploration of parameter space and increases the likelihood of converging to a global optimum.

A symmetric rank 1 (SR1) quasiNewton approximation to the hessian [11] is used which is known to generate good hessian approximations for both convex and nonconvex problems. As suggested in [33] we do the update also on the rejected steps to gather curvature information about the function. We provide options for BFGS [4] especially for convex problems and an option for preconditioning the CG iterations. We also implement limited memory variants of SR1 and BFGS for large problems.

Our algorithm accepts vectorized constraints so that multiple constraints can be programmed simultaneously. Only gradient information is required. Hessian information is optional but not required. Optionally, it is easy to interface our code with INTLAB software package [34] for automatic differentiation in which case the user only codes the function and constraints and the gradients/hessians are generated automatically.
We tested the performance of our algorithm using standard optimization benchmarks from the GAMS performance benchmark problems (http://www.gamsworld.org/performance, [15]). The appendix shows some sample benchmarks as well as provides more technical details of the algorithm.
4.1 Convergence Diagnostics
It is critical to verify that the optimization algorithm has found a local solution by checking convergence diagnostics. These diagnostics help us determine if the KarushKuhnTucker (KKT) [33] necessary conditions for optimality have been satisfied or not. Profile plots are plots of a diagnostic measure versus iteration number. We propose the following checks for all BSS algorithms:

Convergence to a point satisfying necessary conditions for optimality can be accessed by looking at profile plots for

Objective function

Optimality error (such as ”gradient of the lagrangian” for equality constraints or ”KKT optimality checks” for general constraints)

Feasibility error (checking constraint satisfaction)

Lagrange multipliers

Other parameters in the algorithm (such as a barrier or penalty parameter)
The user should at the very least check these diagnostic plots to make sure convergence is attained. If possible the second order sufficient conditions for optimality should also be verified at the solution point using Hessian information.


The algorithm used for optimization should flag an error and stop running in case convergence is not attained at any intermediate stage. This prevents the user from getting access to incorrect results.
These convergence diagnostics are a standard feature of ADIS. Any solution returned by ADIS is guaranteed to be optimal.
5 Statistics on estimated sources
In this section we develop equations that enable us to apply ADIS to a real dataset and make inferences from extracted sources.
Once are estimated we can compute their variance using the GLM estimate
(26) 
where is the estimated variance at point
(27) 
We can create maps of contrasts of interest using the above equations. We also estimate the relative variance (RV) contribution at point using the component as follows:
If then
(28) 
Inspection of these voxelwise variance explained maps is very useful in searching through the estimated sources for application based relevance.
5.1 Correcting for Autocorrelation
Extending to the case of autocorrelated noise is straightforward. When then we proceed in an iterative fashion as follows:

Set and estimate , and compute the pointwise residual
(29) 
Compute autocorrelation in and prewhiten the data using the estimated correlation matrix to get . Run the PP algorithm on until does not change from one iteration to the next in an average sense. Various prewhitening schemes such as models or nonparametric approaches can be used. ADIS uses the nonparametric approach proposed in [38] to do prewhitening. By default ADIS will not do iterative prewhitening unless explicitly specified by the user.
6 Latent dimensionality estimation
Sophisticated bayesian strategies exist for estimating the latent dimensionality in the case of Gaussian sources [29]. In this section we develop a latent dimensionality estimation procedure that works very well both with Gaussian and nonGaussian sources using a bootstrap/crossvalidation procedure.
The latent dimensionality is estimated in two steps. We first estimate a lower bound on the latent dimensionality (6.1) followed by a cross validation analysis to refine the lower bound (6.2). In subsection 6.3 we validate this approach via extensive numerical simulations.
6.1 Stage 1  Estimating the lower bound
Let
(30) 
and suppose are the eigenvalues of .

Randomly permute each column of the matrix to get the matrix .

Compute the eigenvalues of such that

Choose to be the largest value of such that .
First we destroy the systematic correlations between the columns of via random permutations of each column. Then we estimate the singular values of the permuted covariance matrix and compare these with the singular values of the unpermuted covariance matrix. Only those singular values that exceed the ones from random permutation are deemed significant and the lower bound on latent dimensionality is estimated to be the cardinality of those singular values.
If are permutation matrices then we can write:
(31) 
Suppose
(32) 
is the singular value decomposition of A with
Let be the true latent dimensionality. Then for large it can be shown (and verified by simulation) that:
(33) 
and
(34) 
Thus the nonzero eigenvalues of satisfy , the noise variance. Hence the largest index such that is a lower bound for the latent dimensionality .
6.2 Stage 2  Cross Validation
Suppose the true latent dimensionality is assumed to be . Then for large the eigenvalues will follow equation (33) i.e, the eigenvalues {} should be well approximated by a constant . We estimate the quality of this model using leave one out cross validation [20]. Let be the mean of the eigenvalues from to excluding the index .
(35) 
The leave one out cross validation error assuming the true latent dimensionality to be at point is given by:
(36) 
The mean cross validation error and its variance can be estimated from these pointwise values as follows:
(37) 
(38) 
When is smaller than then both and will be large and when is greater than then both and will be small. Since the eigenvalues are expected to remain constant beyond the change in will be small beyond . We define the following index for a given value of quantifying the change in cross validation error from to .
(39) 
If is the true latent dimensionality then will tend to have a maximum at since this is the point which will show the largest change in cross validation error in going from to . We thus propose the estimate
(40) 
where is a lower bound on calculated from stage 1.
The estimation of uses a smaller number of points when gets very close to . Thus the estimate becomes unstable when is within a few time points of . In order to robustify our estimate against this instability we define the cumulative maximum index function which calculates the index of maximum of from .
(41) 
Then we count the number of times that a maximum is detected at
(42) 
and define the estimate of dimensionality to be
(43) 
6.3 Validation of the approach
To test this latent dimensionality algorithm we generate data as per equation (1). Details of the simulation are as follows:

The true mixing matrix was chosen to be a matrix where the elements were drawn from a uniform distribution in (0,1). This random matrix was then scaled so that its minimum singular value .

The sources in (1) were generated from three different types of distributions based on their kurtosis ”excess” values . The chosen distributions were Gaussian (), Uniform () and Gamma ().

The ratio of noise standard deviation in (1), to the minimum singular value of , was varied between .

The ratio of the true latent dimensionality to the dimensionality of the observed data was varied between 0.1,0.2,…,0.5.

This process was repeated 20 times for each combination of source type, ratio and ratio.

For each individual simulation we estimated the latent dimensionality based on the 2 stage estimation strategy described above.
We chose and as fixed parameters of the simulation. Simulations show that this 2 stage estimation is almost unbiased for both Gaussian and nonGaussian embedded sources for all values of the ratios and . Results are shown in figure 1(a)  1(c).
7 Benchmarking: Comparison with other BSS algorithms
Choosing the negative entropy index as our objective function and without imposing any additional constraints, our algorithm attempts to estimate sources that are independent. To test and compare our algorithm with others, we used an approximation to negative entropy as proposed in [26] (see appendix for details).
ICALAB [8], [7] (available from http://www.bsp.brain.riken.go.jp/ICALAB/) is a Matlab package for comparing algorithms for BSS. We used ICALAB to compare the performance of our algorithm with other standard BSS algorithms such as FJADE [6], FPICA [26], [24], EFICA [27], [28] , ERICA [13] and UNICA [14] which use higher order statistics to separate sources.
The quality of source extraction is measured using the Source to Interferences Ratio (SIR) [37] (of the estimated mixing matrix) which measures the ratio of the energy of the estimated source projected onto the true source to the energy of the estimated source projected onto the other sources. Higher values of (SIR) indicate better performance. Please see the appendix for details. ICALAB also comes with standard benchmarking datasets (http://www.bsp.brain.riken.jp/ICALAB/ICALABSignalProc/).
A Monte Carlo analysis was performed using ICALAB by generating uniformly distributed random matrices and creating a mixed source dataset for a given set of sources .
(44) 
To get a baseline measure of performance for each algorithm, we use square mixing without additional noise for the simulation. Each algorithm was then run on this mixed data set. This process was repeated times for each dataset using a new mixing matrix every time. In ICALAB, most of the algorithms are given default parameters that are tuned optimum values for typical data. As suggested in ICALAB, we use these default algorithmic parameters for benchmarking purposes.
ADIS is able to perform nonsquare BSS in the presence of noise. However, since the other algorithms in our benchmarking test have not been designed to do this, we think its unfair to compare nonsquare ability of ADIS with other algorithms.
The 13 benchmarking datasets and their short descriptions are given in the appendix. In order to evaluate the effect of different types of mixing matrices, we ran additional Monte Carlo simulations when was chosen to be one of the following (a) Random sparse (b) Random bipolar (c) Symmetric random (d) Ill conditioned random (e) Hilbert (f) Toeplitz (g) Hankel (h) Orthogonal (i) Nonnegative symmetric (j) Bipolar symmetric (k) Skew symmetric.
Algorithm, [dB] and [dB]  
Dataset  FJADE  FPICA  EFICA  ERICA  UNICA  ADIS  Stage 1  
nband5  M  
S  
10halo  M  
S  
GnBand  M  
S  
acspeech16  M  
S  
ABio5  M  
S  
ACsparse10  M  
S  
25SpeakersHALO 
M  
S  
VSparserand10 
M  
S  
sincpos10 
M  
S  
X5smooth 
M  
S  
Speech20 
M  
S  
X10randsparse 
M  
S  
64soundstd 
M  x  
S  x  

The results are shown in figures 2(a)  4(d) and table 1. The key performance feature is the SIR index [37] (see appendix for definition), the higher the value the better. Another important feature is the variability of SIR over 100 mixtures each generated using a different mixing matrix but containing the same underlying sources. Ideally an algorithm should be robust enough to converge to the same solution regardless of variation in the mixing matrix. The standard deviation of SIR captures this variability, the lower the variability of SIR the better. Additional results showing simulation results with different types of mixing matrices are shown in 7(a)  9(d).
ADIS perfomed better than all other algorithms in almost all cases both in terms of the mean SIR index as well as the standard deviation of the mean SIR, which measures the robustness and convergence to the same solution. ADIS also produces extensive convergence diagnostics a sample of which is shown in figure 6(a). These diagnostics guarantee convergence and improve confidence in the estimated sources.
An illustration of performance improvement using ADIS (Stage 1 + 2) over Stage 1 is shown fir the ACsparse10 dataset in figure 5(b). The corresponding convergence diagnostics are shown in figure 6(b). ADIS Stage 2 performs better than ADIS Stage 1 but we did not observe as dramatic an improvement as seen for ACsparse10. Other ICA algorithms were outperformed using only ADIS Stage 1.
All experiments were performed on a computer with an Intel Xeon (TM) processor (3.4 GHz) and 4GB of RAM. The runtimes of ADIS (Stage 1) were comparable to those of FPICA, ERICA and UNICA. We found EFICA and JADE to be faster than other algorithms in general.
8 Case Study using real fMRI data
8.1 fMRI case study
To demonstrate how ADIS performs on a real dataset, we used the ”FSL Evaluation and Example Data Suite” (FEEDS) from FMRIB Image Analysis Group, Oxford University. The URL for this data suite is: http://www.fmrib.ox.ac.uk/fsl/feeds/doc/index.html One of the datasets in the example suite contains an audio visual experiment with two explanatory variables, the visual stimulus (30s off, 30s on) and an auditory stimulus (45s off, 45s on). Analysis was carried out using FEAT (FMRI Expert Analysis Tool) Version 5.4, part of FSL (FMRIB’s Software Library).
www.fmrib.ox.ac.uk/fsl
The following prestatistics processing was applied; motion correction using MCFLIRT [Jenkinson 2002]; nonbrain removal using BET [Smith 2002]; spatial smoothing using a Gaussian kernel of FWHM 5mm; meanbased intensity normalisation of all volumes by the same factor; highpass temporal filtering (Gaussianweighted LSF straight line fitting, with sigma=50.0s).
ADIS estimated a latent dimensionality of . The source components activating the auditory and visual cortex were identified by inspecting the estimated voxelwise variance explained map. The results are shown in figures 1112.
9 Conclusion
We implemented ADIS, an algorithm for probabilistic, constrained and nonsquare projection pursuit. We validated all aspects of ADIS including the latent dimensionality estimation procedure and its optimization core. When compared to other algorithms using standard benchmarking datasets using ICALAB, we find our algorithm outperforms other standard algorithms such as FastICA, FPICA, JADE, ERICA and UNICA in terms of both robustness and separation quality. Our algorithm also guarantees ”optimality” for each blind source via extensive convergence diagnostics and enables the user to use arbitrary contrast function and constraints for BSS. We hope it will be useful as a general BSS tool for the signal processing and fMRI community.
10 Appendix
11 Negentropy Index
Given a random variable , the negative entropy a measure of nonGaussianity. It is easy to show that imposition of independence on sources in BSS is equivalent to maximization of negentropy.
Robust approximations to negative entropy were developed in [23]. If is a nonquadratic, nonlinear function and is a Gaussian random variable of the same variance as then the negentropy measure is given as [26]
(45) 
In this paper, we used the following function [23] for :
(46) 
where is the hyperbolic cosine function
(47) 
12 Sources to Interferences Ratio (SIR)
The SIR ratio is defined in [37]. We give here a brief summary of the key equations from that paper. Let and let be the orthogonal projector onto the subspace spanned by . If are the true sources and if are the corresponding estimated values then define:
(48) 
(49) 
The purity of source separation is measured using the SIR performance index defined as follows:
(50) 
13 Benchmarking datasets
The 13 benchmarking datasets and their short descriptions are as follows:
(http://www.bsp.brain.riken.jp/ICALAB/ICALABSignalProc/) :

nband5  contains 5 narrow band sources. This is a rather ”easy” benchmark for second order separation algorithms but apprently presents challenges for higher order algorithms.

10halo  contains 10 speech signals that are highly correlated (all 10 speakers say the same sentence).

GnBand  contains 5 fourth order colored sources with a distribution close to Gaussian. This is a rather ”difficult” benchmark.

acspeech16  contains 16 typical speech signals which have a temporal structure but are not precisely independent

ABio5  contains 5 typical biological sources

ACsparse10  contains 10 sparse (smooth bellshape) sources that are approximately independent. The SOS blind source separation algorithms fail to separate such sources.

25SpeakersHALO  25 highly correlated speech signals

Vsparserand10  very sparse random signals

ACsincpos10  positive sparse signals

X5smooth  smooth signals

Speech20  20 speech/music sources

X10randsparse  random sparse signals

64soundsstd  a variety of sound sources (64 in total)
14 Details on Optimization Algorithm
Our optimization algorithm solves the general problem:
(51)  
(52)  
(53) 
where .
We convert the inequality constraints into equality constraints via slack variables as follows:
(54)  
(55) 
Thus the optimization problem becomes:
(56)  
(57)  
(58)  
(59) 
This problem is now an equality constrained problem where the inequalities have been replaced by the bound constraints on the slack variables. Thus it suffices to consider equality constrained problems with bounds on independent variables as follows:
(60)  
(61)  
(62) 
where .
Our code uses a trust region based augmented lagrangian approach to solve these bound constrained problems following closely the LANCELOT software package [12], [10]. The augmented lagrangian function for the above problem is defined as:
(63) 
At each outer iteration , given current values of and we solve the subproblem:
(64)  
(65) 
If is the projection operator defined as
(66) 
then the KarushKuhnTucker (KKT) optimality condition for 64 is given as [10]:
(67) 
The outer iteration code is given in Framework 1. Note that the penalty parameter is updated based on a feasibility monitoring strategy that allows for a decrease in if sufficient accuracy is not achieved in solving the subproblem 64.
At each inner iteration we form a quadratic approximation to the augmented lagrangian and approximately solve the inequality constrained quadratic subproblem:
(68)  
(69)  
(70) 
The inner iteration code uses nonlinear gradient projection [5] followed by NewtonCGSteihaug conjugate gradient iterations [36]. QuasiNewton updates are performed using either SR1 [11] (recommended for nonconvex functions) or BFGS [4] (recommended for convex functions). For very large problems, we switch to the limited memory variants [32] of these quasiNewton approximations. The algorithm details are given in Framework 2. The trust region update code is based on a standard progress monitoring strategy [33] and is given in Framework 3.
(71)  
(72)  
(73) 
(74)  
(75)  
(76) 