A new algorithm for complex non orthogonal joint diagonalization based on Shear and Givens rotations
Abstract
This paper introduces a new algorithm to approximate non orthogonal joint diagonalization (NOJD) of a set of complex matrices. This algorithm is based on the Frobenius norm formulation of the JD problem and takes advantage from combining Givens and Shear rotations to attempt the approximate joint diagonalization (JD). It represents a non trivial generalization of the JDi (Joint Diagonalization) algorithm (Souloumiac 2009) to the complex case. The JDi is first slightly modified then generalized to the CJDi (i.e. Complex JDi) using complex to real matrix transformation. Also, since several methods exist already in the literature, we propose herein a brief overview of existing NOJD algorithms then we provide an extensive comparative study to illustrate the effectiveness and stability of the CJDi w.r.t. various system parameters and application contexts.
Index Terms— Non orthogonal joint diagonalization, Performance comparison of NOJD algorithm, Givens and Shear rotations.
A new algorithm for complex non orthogonal joint diagonalization based on Shear and Givens rotations
Mesloub Ammar Karim AbedMeraim Adel Belouchrani 
Ecole Militaire Polytechnique, BP 17 Bordj El Bahri, Algiers, Algeria 
Polytech’ Orléans/PRISME Lab., Univ. Orléans 12 rue de Blois, 45067 Orléans, France 
Ecole Nationale Polytechnique, 10 Avenue Hassen Badi, 16200 Algiers, Algeria 
mesloub.a@gmail.com, karim.abedmeraim@univorleans.fr, adel.belouchrani@enp.edu.dz 
1 Introduction
Joint diagonalization problem and its related algorithms are found in various applications, especially in blind source separation (BSS) and independent component analysis (ICA). In such problems, it is desired to diagonalize simultaneously a set of square matrices. These matrices can be covariance matrices estimated on different time windows [1], intercorrelation matrices with time shifts [2], fourth or higher order cumulant slice matrices [3][4] or spatial timefrequency matrices [5][6].
Mathematically, the joint diagonalization problem can be stated as follows: Given a set of square matrices , find a matrix V such that the transformed matrices are as diagonal as possible.
In the context of BSS, are complex matrices sharing the same structure defined by where
are diagonal matrices and A is an unknown mixing matrix. The problem consists of finding the diagonalizing matrix V that left inverts A and transforms into diagonal matrices.
Various algorithms have been developed to solve the JD problem. These algorithms can be classified in two classes, orthogonal joint diagonalization (OJD) and non orthogonal JD (NOJD). The first class imposes V to be orthogonal by transforming the JD problem into an OJD problem using the whitening step [2][7]. This step can introduce errors which might reduce the diagonalization performance [8][9]. The most popular algorithm for OJD is JADE [10] which is a Jacobilike algorithm based on Givens rotations.
The NOJD class treats the problem without any whitening step. Among the first NOJD algorithms, one can cite the Iterative Decorrelation Algorithm (IDA) developed for complex valued NOJD in [11][12] and the ACDC (Alternating ColumnsDiagonal Centers) given in [13]. The latter suffers from slow linear convergence performance. Many other algorithms have been developed by considering specific criteria or constraints in order to avoid trivial and degenerate solutions [14][15]. These algorithms can be listed as follow: QDiag [16] (Quadratic Diagonalization algorithm) developed by Vollgraf and Obermayer where the JD criterion is rearranged as a quadratic cost function; FAJD [17] (Fast Approximative Joint Diagonalization) developed by Li and Zhang where the diagonalizing matrix is estimated column by column; UWEDGE [14] (UnWeighted Exhaustive joint Diagonalization with Gauss itErations) developed by Tichavsky and Yeredor where numerical optimization is used to get the JD solution; JUST [18] (Joint Unitary and Shear Transformations) developed by Iferroudjene, AbedMeraim and Belouchrani where the algebraic joint diagonalization is considered; CVFFDiag [19][20] (Complex Valued Fast Frobenius Diagonalization) developed by Xu, Feng and Zheng where first order of Taylor expansion is used to minimize the JD criterion; ALS [21] (Alternating Least Squares) developed by Trainini and Moreau where the mixing and diagonal matrices are estimated alternatively by using least squares criterion and LUCJD [22][23] (LU decomposition for Complex Joint Diagonalization) developed by Wang, Gong and Lin where the diagonalizing matrix is estimated by LU decomposition.
In this paper, we generalize the JDi algorithm developed by Souloumiac in [24] for real joint diagonalization by using Shear and Givens rotations in the complex case^{1}^{1}1A first attempt to generalize the JDi has been given in [25]. Unfortunately, the latter algorithm has been found to diverge in most simulation contexts considered in section 6, and hence, it has been omitted in our comparative study.. We transform the considered complex matrices to real symmetric ones allowing us to apply the JDi algorithm. At the convergence, the diagonalizing (mixing) matrix is retrieved from the real diagonalizing one by taking into account the particular structure of the latter (see subsection 4.4 for more details). The main drawback of this algorithm’s version is that it does not take into consideration the particular structure of the real valued diagonalizing matrix along the iterations which results in a slight performance loss. To avoid this drawback, we propose an improved version which uses explicitly the complex matrices using special structure of Shear and Givens rotations to increase both the convergence rate and the estimation accuracy, while reducing the overall computational cost. Another contribution of this paper is a comparative study of different non orthogonal joint diagonalization algorithms with respect to their robustness in severe JD conditions: i.e. large dimensional matrices, noisy matrices, ill conditioned matrices and large valued MOU (modulus of uniqueness [24][26][27]).
The rest of the paper is organized as follows. In section 2, the problem formulation, mathematical notations and paper’s main objectives are stated. Section 3 introduces a brief overview of major NOJD algorithms and existing JD criteria. Section 4 presents the basic generalization of JDi algorithm to the complex case while section 5 is dedicated to the proposed method’s developments. In particular, we present in this section a complex implementation of our method with a computational cost comparison with existing NOJD algorithms. Simulation based performance assessment for exact and approximate joint diagonalizable matrices are provided in section 6. Section 7 is dedicated to the concluding remarks.
2 Problem formulation
Consider a set of square matrices, sharing the following decomposition:
(1) 
where are diagonal matrices and A is the unknown square complex nondefective matrix known as a mixing matrix. denotes the transpose conjugate of A.
The problem of joint diagonalization consists of estimating matrices A and , given the observed matrices , . Equivalently, the JD problem consists of finding the transformation V such that matrices are diagonal.
Note that JD decomposition given in equation (1) is not unique. Indeed, if is a solution, then for any permutation matrix and invertible diagonal matrix , , is also a solution. Fortunately, in most practical applications these indeterminacies are inherent and do not affect the final result of the considered problem.
In practice, matrices are given by some sample averaged statistics that are corrupted by estimation errors due to noise and finite sample size effects. Thus, they are only ”approximately” jointly diagonalizable matrices and can be rewritten as:
(2) 
where are perturbation (noise) matrices.
3 Review of major NOJD algorithms
In this section, we present a brief overview of NOJD algorithms. First, the different JD criteria are presented before giving a brief description for each considered algorithm.
3.1 Joint diagonalization criteria
In this subsection, we present different criteria considered for JD problem. The first one is given in [13] and expressed as follow:
(3) 
where refers to the Frobenius norm, are some positive weights and are the searched mixing matrix and diagonal matrices, respectively. This cost function is called in [28], the Direct LeastSquares (DLS) criterion as it takes into account the mixing matrix rather than the diagonalizing one.
Unlike the previous JD criterion, the second one is called the Indirect Least Squares (ILS) criterion and takes into account the diagonalizing matrix V. The latter is expressed as [28]:
(4) 
This criterion is widely used in numerous algorithms [2][14][26]. However, the minimization of (4) might lead to undesired solutions e.g. trivial solution or degenerate solutions where . Consequently, the algorithms based on the minimization of (4) introduce a constraint to avoid these undesirable solutions. In [2], the estimated mixing matrix (resp. the diagonalizing matrix) have to be unitary so that undesired solutions are avoided. In our developed algorithm, the diagonalizing matrix is estimated as a product of Givens and Shear rotations where undesired solutions are excluded implicitly. In [22], the diagonalizing matrix V is estimated in the form of LU (or LQ) factorization where L and U are lower and upper triangular matrices with ones at the diagonals and Q is a unitary matrix. These two factorizations impose a unit valued determinant for the diagonalizing matrix. Previous factorizations (Givens , Givens and shear, LU and LQ factorizations) represent the different constraints used to avoid undesired solutions.
In [17], the undesired solutions are excluded by considering the penalization term so that the JD criterion becomes:
(5) 
Another criterion has been introduced in [14] taking into account two matrices (V,A) which are the diagonalizing matrix and its residual mixing one, respectively. It is expressed as:
(6) 
The previous criterion fuses the direct and indirect forms by relaxing the dependency between A and V and it is known as least squares criterion. In [1], another criterion is developed for positive definite target matrices as follow:
(7) 
This criterion can not be applied for non positive target matrices, thus some real life applications can not be treated by minimizing (7). In [23], a scale invariant criterion in V is introduced as:
(8) 
Note that for any diagonal matrix D. This criterion is used only for real JD.
3.2 NOJD Algorithms
We describe herein the basic principles of each of major NOJD algorithms considered in our comparative study given in section 6.
ACDC [13]:
This algorithm is developed by Yeredor in 2002. It proceeds by minimizing criterion given in (3) by alternating two steps: The first one is the AC (Alternating Columns) step and the second one is the DC (Diagonal Centers) step. For AC step, only one column in the mixing matrix is updated by minimizing the cited criterion while the other parameters are kept fixed. For the DC step, the diagonal matrices entries are estimated by keeping the mixing matrix fixed.
Note that, the DC phase is followed by several AC phases in order to guarantee the algorithm’s convergence.
FAJD [17]:
This algorithm is developed by Li and Zhang in 2007. It estimates the diagonalizing matrix by minimizing the modified indirect least squares criterion given in (5). At each iteration, the algorithm updates one column of the diagonalizing matrix while keeping the others fixed. This process is repeated until reaching the convergence state. Note that the value assigned to in [17] is one.
QDiag [16]:
This algorithm is developed by Vollgraf and Obermayer in 2006.
It minimizes the indirect least squares criterion given in (4). At each iteration, the algorithm updates one column of the diagonalizing matrix while the others are kept fixed. This step is repeated until reaching the convergence state. Note that there is no update step for target matrices and the condition to avoid undesired solutions is implicitly included by normalizing the diagonalizing matrix columns.
UWEDGE [14]:
This algorithm is developed by Tichavsky and Yeredor in 2008. It minimizes the criterion given in (6) and computes in alternative way the residual mixing and diagonalizing matrices. At first, the diagonalizing matrix V is initialized as ^{2}^{2}2This initial value of V is known as the whitening matrix in BSS context and assumes that is positive definite. Otherwise, other initializations can be considered.. This value is introduced in the considered criterion to find the mixing matrix A. The minimization w.r.t. A is achieved by using numerical Gauss iterations. Once an estimate of the mixing matrix is obtained, the diagonalizing matrix is updated as ( is the iteration index). The previous process is repeated until the convergence is reached.
JUST [18]:
This algorithm is developed by Iferroudjene, AbedMeraim and in Belouchrani 2009. It is applied to target matrices sharing the algebraic joint diagonalization structure . Hence in our context, given the target matrices sharing the decomposition described in (1). The latter are transformed to another set of new target matrices sharing the algebraic joint diagonalization structure by right multiplying them by the inverted first target matrix. Once the new set of target matrices is obtained, JUST algorithm estimates the diagonalizing matrix by successive Shear and Givens rotations minimizing criterion^{3}^{3}3The considered in JUST can be expressed as in (4) by replacing by ..
CV FFDiag [19]:
This algorithm’s idea is given in [11] and it is formulated as an algorithm for real NOJD in [20]. Xu, Feng and Zheng generalized the latter to the complex NOJD in [19]. It minimizes criterion and estimates the diagonalizing matrix V in an iterative scheme using the following form where is a matrix having null diagonal elements. The latter is estimated in each iteration by optimizing the first order Taylor expansion of .
LUCJD [22]:
This algorithm is developed by Wang, Gong and Lin in 2012. It considers criterion as the CVFFDiag. It decomposes the mixing matrix in its LU form where L and U are lower and upper triangular matrices with diagonal entries equal to one. This algorithm is developed in [23] for real NOJD and generalized to complex case in [22]. Matrices L and U are optimized in alternating way by minimizing . Note that the entries of L and U are updated one by one (keeping the other entries fixed).
ALS [21]:
This algorithm is developed by Trainini and Moreau in 2011. It minimizes criterion as the ACDC and relaxes the relationship between and A. The algorithm is developed by considering three steps, the first one estimates diagonal matrices by keeping A and fixed. The second one uses the obtained and fixed to compute the mixing matrix A and the last step uses the obtained and A from the first and second steps, respectively, to estimate . These steps are realized for each iteration and repeated until the convergence state is reached.
Note that other algorithms exist in the literature, developed for special cases, but are not considered in our study. For example, in [1] the developed algorithm is applied only for positive definite matrices. In [28], the developed algorithm is a direct method (not iterative) which makes it more sensitive to difficult JD problem (the algorithm is not efficient when the number of matrices is less than the matrix dimension).
4 Basic generalization of JDi algorithm
We introduce herein the basic generalization of JDi algorithm, given in [24], from real to complex case. First, the basic idea is to transform hermitian matrices obtained in (9) to real symmetric ones given by (4.1) to which, we apply the JDi algorithm. Then in section 5, by modifying the first approach, we develop the CJDi algorithm which uses the hermitian matrices directly.
4.1 Complex to real matrix transformation
The first idea of our approach consists of transforming the original problem of complex matrix joint diagonalization into JD of real symmetric matrices which allows us to apply JDi algorithm. Hence, we transform the complex matrices into hermitian matrices according to:
(9) 
where and refer to the real part and imaginary part of a complex entity, respectively. Now, the hermitian matrices are transformed into real matrices according to:
Based on (9) and (4.1), one can easily see that matrices share the appropriate JD structure, i.e.
(11) 
where . This property allows us to apply JDi algorithm to achieve the desired joint diagonalization^{4}^{4}4Note that in (11), the diagonal entries of appear twice leading to an extra indeterminacy that should be taken into consideration when solving the complex JD problem (see lemma 1 in subsection 4.4)..
Like in the JDi method, the real diagonalizing matrix associated to the complex one V, is decomposed as a product of generalized rotation matrices according to:
(12) 
where represents the sweeps (iterations) number and is the generalized rotation matrix given by:
(13) 
and being the elementary Givens and Shear rotation matrices which are equal to the identity matrix except for their , , , and entries given by:
(14) 
(15) 
where and are the Givens angle and the Shear parameter, respectively. Based on these elementary transformations, we express next the transformed matrices as well as the JD criterion given in (4).
4.2 Matrix transformations
As shown in subsection 4.1, the set of complex matrices is transformed into a set of real symmetric ones, , to which all Givens and Shear rotations are applied. We denote by the updated matrices when using the elementary rotations, i.e.:
(16) 
Note that only the and rows and columns of are transformed so that entries are twice affected by the latter transformation. These entries can be expressed as:
(17) 
where
(18) 
Souloumiac in [24] introduces a simplified JD criterion which is the sum of squares of entries. This JD criterion denoted can be expressed by using (17) as:
(19) 
where
(20) 
and
(21) 
The results in [24] show that by minimizing this simplified criterion, joint diagonalization can be achieved in few iterations (see [24] for more details).
4.3 Direct generalization of JDi
In JDi algorithm, JD criterion given in (19) is minimized under the hyperbolic normalization as follows:
(22) 
with .
The solution of (22) is the eigenvector associated to the median generalized eigenvalue of denoted as .
Then, the optimal parameters can be expressed as :
(23) 
A normalization is introduced in JDi algorithm. It ensures that the estimated mixing matrix has columns of equal norm and determinant equal to one. The normalizing matrix can be expressed as:
(24) 
where are the columns of the estimated mixing matrix .
We propose here to modify the JDi algorithm by replacing the previous normalization step by the following eigenvector normalization:
(25) 
The modified JDi algorithm is summarized in Table 1, where and are a fixed threshold and maximum sweep number respectively, chosen to stop the algorithm.
Require : , a fixed threshold 
and a maximum sweep number . 
Initialization: . 
while and (#sweeps ) 
for all 
Build R as in (20). 
Compute the solution v of (22). 
Normalize vector v as in (25). 
if then . 
Compute and as in (23). 
Update matrices as in (16) 
. 
end for 
end while 
At this stage, we only applied Modified JDi algorithm to the set of transformed real symmetric matrices . Once the algorithm converges, we get (equivalently ) up to some inherent indeterminacies. Now, the question is how to get the complex diagonalizing matrix and get rid of the undesired indeterminacies. This question is considered in the next subsection.
4.4 Complex diagonalizing matrix retrieval
As mentioned in section 2, the JD problem has inherent indeterminacies in the sense that matrix V is estimated up to permutation and diagonal matrices. However, the specific structure of our real valued matrices given in (4.1) leads to extra indeterminacies according to the following lemma:
Lemma 1
Define vectors , as
Under the condition that the dimensional vectors are pairwise linearly independent, the JD problem’s solution of is such that, there exists a permutation matrix P satisfying:
with , being the column vector of and is a orthogonal matrix i.e. for a given scalar factor .
Proof:
This result can be deduced directly from Theorem 3 in [12].
To retrieve the original complex matrix A from the estimated matrix , one needs to find the permutation that associates correctly the column of to its one. For that, since matrix is orthogonal, one can represent it as:
(26) 
and hence
(27) 
where is the column vector of A and with () and , are given in (26).
From (27), one can observe that two columns of :
can be associated if they satisfy the relation :
(28) 
In practice, we solve equation (28) in the least squares sense to take into account the estimation errors.
Once, this pairing process is achieved, the column of matrix A is estimated (up to a scalar complex valued factor) from the first column of matrix as .
Similarly, if P is the permutation pairing correctly the columns of , then is the one pairing correctly the rows of : i.e. .
5 CJDi algorithm
In this section, we give, first, the algorithm’s development based on real symmetric matrices given in equation (4.1). Then, direct complex implementation is developed.
5.1 Algorithm’s development based on real matrices
Note that the real symmetric matrices given in (4.1) have a special structure which can be used to simplify and improve the previous generalization of JDi algorithm. More precisely, we look for transformations that preserve the considered matrix structure along the iterations which allows us to skip the step of complex diagonalizing (resp. mixing) matrix retrieval. Indeed, in the basic generalization, the introduction of the elementary rotation matrix causes the loss of matrix structure. For example, when rotation indices are then the updated entries and are not anymore equal to and , respectively. Hence, to preserve the considered matrix structure, one needs to introduce a second elementary rotation which is . The following lemma provides the solution to the previous problem.
Lemma 2
1) If the real symmetric matrices are updated by the elementary rotation matrix , , then the second elementary rotation which preserves the matrix structure in (4.1) is , i.e. the generalized rotation matrix with same angle and Shear parameters.
2) If the real symmetric matrices are updated by the elementary rotation matrix , , then the second elementary rotation which preserves matrix structure in (4.1) is where the sign of the Shear parameter is inverted.
Proof: The proof of this lemma is given in appendix A.
The rotation parameters are now optimized in such a way we minimize the simplified JD criterion given in (19) for the transformed matrices:
(29) 
and
(30) 
Interestingly, the estimation of the optimal parameters is the same as the one we obtained before when using the matrix transformation in (16).
Lemma 3
The JD criterion calculated with the matrix transform (16) for indices (,) (resp. the matrix transform (16) for indices (,)) is equal, up to constant factors, to the one calculated with the matrix transform (29) (resp. the matrix transform (30)), i.e.
(31) 
(32) 
where and are scalar constants independent from the angle and shear parameters. Consequently, the optimal parameters obtained by minimizing (31) with indices (, ) (resp. (32) with indices (, ) ) are the same as the one obtained by minimizing (19) with indices (, ) (resp. (19) with indices (, )).
Proof: The proof is given in appendix B.
Compared to the previous basic generalization of JDi algorithm, the developed algorithm called CJDi preserves our matrix structure and decreases the number of iterations per sweep. In the first generalization, the number of iterations (index pairs) per sweep is while in CJDi, this number decreases to . Also, CJDi takes into account some extra information about the matrix structure which leads to a slight performance improvement.
5.2 Complex implementation
The matrix rotations in (29) and (30) can be rewritten in the complex form by reversing the function in (4.1), i.e. using the real to complex transformation . Considering this transformation, one can express the previous rotations as:
(33)  
(34)  
where are elementary complex Shear and Givens rotations which are equal to the identity matrix except for , , and entries given by:
(35) 
(38) 
The CJDi algorithm is summarized in Table 3.
Remark:
Using the previous equations (37) and (38), one can express the entries as:
(39) 
5.3 Computational cost
We provide here an evaluation of the computational cost of our algorithm expressed in terms of real flops number (i.e. real multiplication plus real addition) per iteration (sweep).
In our evaluation, we took into account the matrices symmetry and the particular structure of the transformation matrices in (33) and (34) which entries are either real or pure imaginary numbers. Tacking this into consideration, the matrices product in (37) or (38) cost for each matrix , flops (instead of for a brute force implementation).
This numerical cost evaluation is performed similarly for the other considered NOJD algorithms and summarized in Table 4.
Algorithm  Number of real flops per sweep 

CJDi  + 
ACDC  + + 
FAJD  + 
QDiag  + 
UWEDGE  + 
JUST  + 
CVFFDiag  + 
LUCJD  + 
ALS  + + 
6 Comparative performance analysis
The aim of this section is to compare CJDi with the different existing NOJD algorithms, cited in section 3, for different scenarios. More precisely, we have chosen to evaluate and compare the algorithms sensitiveness to different factors that may affect the JD quality. These different criteria of comparison used in our study, are described in the next subsection.
6.1 Factors affecting NOJD problem
In ’good JD conditions’ most algorithms perform well, with slight differences in terms of convergence rate or JD quality. However, in adverse JD conditions, many of these algorithms lose their effectiveness or otherwise diverge. In this study, adverse conditions are accessible through the following factors:

Mixing matrix condition number: The JD quality depends on the good or ill conditioning of mixing matrix A (denoted Cond(A)). The comparative study reveals the algorithm’s robustness w.r.t. Cond(A).

Diagonal matrices condition number: In BSS problem, the power range of source signals affects the conditioning of diagonal matrices and hence the source separation quality. By comparing the algorithms sensitiveness w.r.t. this factor, we reveal their potential performance if used for separating sources with high dynamical range.

Matrix dimensions: The JD problem is more difficult for large dimensional matrices () and hence we compare the algorithms performance in the following two cases: (small dimension) and (large dimension).

Noise effect: In practice, matrices are given by some sample averaged statistics and hence are affected by finite sample size and noise effects. In that case, the exact JD (EJD) becomes approximate JD (AJD) and the algorithms performance is lower bounded by the noise level as shown by our comparative study. The algorithms behaviour and robustness w.r.t. noise effect is investigated in our study.
6.2 Simulation experiment set up
First, we have chosen the classical performance index () to measure the JD quality. It can be expressed as [30]:
(42) 
where is the () entry of global matrix , A being the generated mixing matrix and the estimated diagonalizing matrix. The closer the is to zero, the better is the JD quality. Contrary to other existing criteria, the is common to all algorithms and allows us to compare them properly.
Figure 1 illustrates the organization of our simulation scenarios where simulations are divided in two sets. The first one is dedicated to the exact joint diagonalization (EJD) case whilst the second one assess the algorithms performance in the AJD case by investigating the noise effect as given by equation (2). Also, for each experiment, we have considered two scenarios namely the small dimensional case () and the large dimensional case ().
Finally, for each of these cases, we run four simulation experiments: (i) a reference simulation of relatively good conditions where MOU (resp. MOU for large dimension), Cond(A) (resp. Cond(A) for large dimension) and Cond() (resp. Cond() for large dimension); (ii) a simulation experiment with MOU; (iii) a simulation experiment with Cond(A) and (iv) a simulation experiment with Cond().
The other simulation parameters are as follows: mixing matrix entries are generated as independent and normally distributed variables of unit variance and zero mean. Similarly, the diagonal entries of are independent and normally distributed variables of unit variance and zero mean except in the context where Cond() in which case has a standard deviation of or in the context where MOU in which case . being a random variable of small amplitude generated to tune the value of MOU. Target matrices are computed as in (1) for EJD case. The number of matrices is set to . For AJD case, these matrices are corrupted by additive noise as given in (2). The perturbation level is measured by the ratio between the norm of exact term and the norm of disturbance term (i.e. a dual of signal to noise ratio) expressed in dB as:
(43) 
The error matrix is generated as:
(44) 
where is a random pertu