Riemannian Newton’s method
for joint diagonalization on the Stiefel manifold
with application to ICA
Abstract
Joint approximate diagonalization of noncommuting symmetric matrices is an important problem in independent component analysis. It is known that this problem can be formulated as an optimization problem on the Stiefel manifold, which Riemannian optimization techniques can then solve. Among the available techniques, this article develops Riemannian Newton's method for the joint diagonalization problem, which has the quadratic convergence property. In particular, it is shown that the resulting Newton's equation can be effectively solved by means of the Kronecker product and the vec and veck operators, which reduce the dimension of the equation. Numerical experiments are performed to show that the proposed method improves the accuracy of an approximate solution of the problem. The proposed method is also applied to independent component analysis.
Keywords: Joint diagonalization; Riemannian optimization; Newton's method; Stiefel manifold; Independent component analysis; Image separation
1 Introduction
The joint diagonalization (JD) problem for real symmetric matrices $A_1, A_2, \ldots, A_N$ is often considered on the orthogonal group $O(n)$. The problem is to find an orthogonal matrix $U$ that minimizes the sum of the squared off-diagonal elements, or equivalently, maximizes the sum of the squared diagonal elements, of $U^{\top} A_k U$ [18]. For more information regarding finding nonorthogonal matrices, see [20]. A solution to the JD problem is valuable for independent component analysis (ICA) and the blind source separation problem [2, 5, 6, 8, 17].
Several approaches have been proposed in the context of Jacobi methods [3, 5, 6] and Riemannian optimization [17, 19]. In [17], the JD problem is considered on the Stiefel manifold $\mathrm{St}(p,n)$ with $p \le n$. That is, the required matrix is a rectangular orthonormal matrix $U \in \mathbb{R}^{n \times p}$. The orthogonal group is a special case of the Stiefel manifold because $O(n) = \mathrm{St}(n,n)$.
Riemannian optimization refers to optimization on Riemannian manifolds. Unconstrained optimization methods in Euclidean space, such as the steepest descent, conjugate gradient, and Newton's methods, have been generalized to methods on a Riemannian manifold [1, 9, 12, 16]. In [17], the Riemannian trust-region method is applied to the JD problem on the Stiefel manifold $\mathrm{St}(p,n)$. With $U$ varying on $\mathrm{St}(p,n)$ with $p < n$, minimizing the sum of the squared off-diagonal elements of $U^{\top} A_k U$ is no longer equivalent to maximizing the sum of the squared diagonal elements. According to [17], the JD problem on the Stiefel manifold maximizes the sum of the squared diagonal elements of $U^{\top} A_k U$ with respect to $U \in \mathrm{St}(p,n)$.
This article deals with Newton's method for the JD problem on the Stiefel manifold. The Hessian of the objective function is fundamental for deriving Newton's equation, the key equation in Newton's method. We have intensively examined the Hessian to efficiently solve Newton's equation. In particular, we have used the Kronecker product and the vec and veck operators to reduce the dimension of Newton's equation and to transform the equation into the form "$Ax = b$".
This paper is organized as follows. We introduce the JD problem on the Stiefel manifold in Section 2, following on from [17]. We also present a brief review of computing the gradient and the Hessian of the objective function. In Section 3, we consider Newton's equation, which is directly obtained by substituting the gradient and Hessian formulas in Section 2 into $\operatorname{Hess} f(U)[\xi] = -\operatorname{grad} f(U)$. To derive the representation matrix formula of the Hessian of the objective function, we use the Kronecker product and the vec and veck operators. This results in a smaller equation that is easier to solve. It should be noted that the dimension of the resultant equation is equal to the dimension of the Stiefel manifold in question. This means that the equation can be efficiently solved. Section 4 provides information about the two types of numerical experiments we use to evaluate our method. The first experiment is an application to ICA. We demonstrate that the proposed method improves the accuracy when compared with the solution generated by a Jacobi-like method, which is an existing method for the JD problem. The other experiment uses a larger problem to show that sequences generated by the proposed Newton's method converge quadratically. Section 5 contains our concluding remarks. In these sections, the Stiefel manifold is endowed with the metric induced from the natural inner product in the ambient Euclidean space. In contrast to this, we derive another formula for the representation matrix of the Hessian in Appendix A, in which the Stiefel manifold is endowed with the canonical metric.
2 Joint diagonalization problem on the Stiefel manifold
2.1 Joint diagonalization problem
Let $A_1, A_2, \ldots, A_N$ be $n \times n$ real symmetric matrices. We consider the following JD problem on the Stiefel manifold $\mathrm{St}(p,n)$ [17]:
Problem 2.1.
(2.1) minimize $\ f(U) := -\sum_{k=1}^{N} \left\| \operatorname{diag}\left( U^{\top} A_k U \right) \right\|_F^2$
(2.2) subject to $\ U \in \mathrm{St}(p,n) := \left\{ U \in \mathbb{R}^{n \times p} : U^{\top} U = I_p \right\}$,
where $p \le n$, $\| \cdot \|_F$ denotes the Frobenius norm, and $\operatorname{diag}(\cdot)$ denotes the diagonal part of the matrix.
We wish to apply Newton's method to Problem 2.1, so the Hessian of $f$ is fundamental. To derive and analyze the Hessian and other requisites, we first review the geometry of $\mathrm{St}(p,n)$.
2.2 The geometry of the Stiefel manifold
The tangent space of $\mathrm{St}(p,n)$ at $U \in \mathrm{St}(p,n)$ is
(2.3) $T_U \mathrm{St}(p,n) = \left\{ \xi \in \mathbb{R}^{n \times p} : U^{\top} \xi + \xi^{\top} U = 0 \right\}.$
In later sections, we will make full use of the equivalent form [9]
(2.4) $T_U \mathrm{St}(p,n) = \left\{ U \Omega + U_{\perp} K : \Omega \in \mathrm{Skew}(p),\ K \in \mathbb{R}^{(n-p) \times p} \right\}$
rather than Eq. (2.3), where $U_{\perp} \in \mathbb{R}^{n \times (n-p)}$ is an arbitrary matrix that satisfies $U_{\perp}^{\top} U_{\perp} = I_{n-p}$ and $U^{\top} U_{\perp} = 0$, and $\mathrm{Skew}(p)$ denotes the set of all $p \times p$ skew-symmetric matrices. We here note that
(2.5) $U U^{\top} + U_{\perp} U_{\perp}^{\top} = I_n,$
which is an important relation for rewriting Newton's equation into a system of linear equations.
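The form (2.4) and the relation (2.5) are easy to verify numerically. The following NumPy sketch is our own illustration (the random data and the SVD-based construction of $U_{\perp}$ are convenient choices, not part of the text): it checks that $\xi = U\Omega + U_{\perp}K$ satisfies the defining condition of Eq. (2.3).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 2

# A random point U on St(p, n) via QR, and U_perp spanning its orthogonal complement.
U, _ = np.linalg.qr(rng.standard_normal((n, p)))
# The last n - p left singular vectors of U form an orthonormal basis of (range U)^perp.
U_perp = np.linalg.svd(U, full_matrices=True)[0][:, p:]

A = rng.standard_normal((p, p))
Omega = A - A.T                      # skew-symmetric p x p
K = rng.standard_normal((n - p, p))  # arbitrary (n - p) x p

xi = U @ Omega + U_perp @ K
# xi satisfies the defining condition U^T xi + xi^T U = 0 of Eq. (2.3).
print(np.allclose(U.T @ xi + xi.T @ U, 0))                    # True
# And U U^T + U_perp U_perp^T = I_n, Eq. (2.5).
print(np.allclose(U @ U.T + U_perp @ U_perp.T, np.eye(n)))    # True
```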
Since $\mathrm{St}(p,n)$ is a submanifold of the matrix Euclidean space $\mathbb{R}^{n \times p}$, it can be endowed with the Riemannian metric
(2.6) $\langle \xi, \eta \rangle_U := \operatorname{tr}\left( \xi^{\top} \eta \right), \qquad \xi, \eta \in T_U \mathrm{St}(p,n),$
which is induced from the natural inner product in $\mathbb{R}^{n \times p}$. We view $\mathrm{St}(p,n)$ as a Riemannian submanifold of $\mathbb{R}^{n \times p}$ with the above metric. Under this metric, the orthogonal projection $P_U$ at $U$ onto $T_U \mathrm{St}(p,n)$ is expressed as
(2.7) $P_U(X) = X - U \operatorname{sym}\left( U^{\top} X \right), \qquad X \in \mathbb{R}^{n \times p},$
where $\operatorname{sym}(A) := \left( A + A^{\top} \right)/2$ denotes the symmetric part of the matrix $A$.
In optimization algorithms on a Euclidean space, a line search is performed after computing the search direction. In Riemannian optimization, the concept of a straight line is replaced with a curve (not necessarily a geodesic) on a general Riemannian manifold. A retraction on the manifold in question is needed to implement Riemannian optimization algorithms [1]. It defines an appropriate curve for searching for the next iterate on the manifold. We will use the QR retraction on the Stiefel manifold [1], defined as
(2.8) $R_U(\xi) = \operatorname{qf}(U + \xi), \qquad \xi \in T_U \mathrm{St}(p,n),$
where $\operatorname{qf}(\cdot)$ denotes the Q factor of the QR decomposition of the matrix. In other words, if a full-rank matrix $W \in \mathbb{R}^{n \times p}$ is uniquely decomposed into $W = QR$, where $Q \in \mathrm{St}(p,n)$ and $R$ is a $p \times p$ upper triangular matrix with strictly positive diagonal entries, then $\operatorname{qf}(W) = Q$.
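A minimal NumPy sketch of the projection (2.7) and the QR retraction (2.8) may be helpful (our own illustration, not the paper's implementation). Note that `numpy.linalg.qr` does not enforce a positive diagonal of R, so a sign correction is applied to make qf well defined:

```python
import numpy as np

def sym(X):
    # Symmetric part of a square matrix.
    return (X + X.T) / 2

def proj(U, X):
    # Orthogonal projection of Eq. (2.7): P_U(X) = X - U sym(U^T X).
    return X - U @ sym(U.T @ X)

def qf(W):
    # Q factor of the unique QR decomposition with diag(R) > 0.
    Q, R = np.linalg.qr(W)
    s = np.sign(np.diag(R))
    s[s == 0] = 1
    return Q * s        # flip column signs so that diag(R) becomes positive

def retract(U, xi):
    # QR retraction of Eq. (2.8): R_U(xi) = qf(U + xi).
    return qf(U + xi)

rng = np.random.default_rng(1)
n, p = 6, 3
U, _ = np.linalg.qr(rng.standard_normal((n, p)))
X = rng.standard_normal((n, p))

xi = proj(U, X)
print(np.allclose(U.T @ xi + xi.T @ U, 0))   # xi is tangent: True
V = retract(U, 0.1 * xi)
print(np.allclose(V.T @ V, np.eye(p)))       # V stays on St(p, n): True
```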
2.3 The gradient and the Hessian of the objective function
We now return to Problem 2.1. We need the gradient, $\operatorname{grad} f$, and Hessian, $\operatorname{Hess} f$, of the objective function, $f$, on $\mathrm{St}(p,n)$ to describe Newton's equation for the problem. Newton's equation is defined at each $U \in \mathrm{St}(p,n)$ as $\operatorname{Hess} f(U)[\xi] = -\operatorname{grad} f(U)$, where $\xi \in T_U \mathrm{St}(p,n)$ is an unknown tangent vector. Expressions for the gradient and Hessian can also be found in [17]. In this subsection, we briefly describe how to compute them.
Let $\bar{f}$ be the function on $\mathbb{R}^{n \times p}$ defined in the same way as the right-hand side of (2.1). We note that $f$ is the restriction of $\bar{f}$ to $\mathrm{St}(p,n)$. Since we view the Stiefel manifold, $\mathrm{St}(p,n)$, as a submanifold of the Euclidean space $\mathbb{R}^{n \times p}$, the gradient at $U \in \mathrm{St}(p,n)$ can be expressed as
(2.9) $\operatorname{grad} f(U) = P_U\left( \nabla \bar{f}(U) \right),$
where $\nabla \bar{f}$ is the Euclidean gradient of $\bar{f}$ on $\mathbb{R}^{n \times p}$. Furthermore, the Hessian at $U$ acts on $\xi \in T_U \mathrm{St}(p,n)$ as
(2.10) $\operatorname{Hess} f(U)[\xi] = P_U\left( \mathrm{D} \operatorname{grad} f(U)[\xi] \right),$
where $\mathrm{D}$ denotes the directional derivative in $\mathbb{R}^{n \times p}$ and $\operatorname{grad} f$ is smoothly extended to $\mathbb{R}^{n \times p}$. Therefore, we only have to compute $\nabla \bar{f}$ and its directional derivatives.
We use the relation
(2.11) $\left\| \operatorname{diag}(X) \right\|_F^2 = \operatorname{tr}\left( X \operatorname{diag}(X) \right), \qquad X \in \mathbb{R}^{p \times p},$
to compute the Fréchet derivative $\mathrm{D} \bar{f}(U)$ with $\xi \in \mathbb{R}^{n \times p}$ as
(2.12) $\mathrm{D} \bar{f}(U)[\xi] = -4 \sum_{k=1}^{N} \operatorname{tr}\left( \xi^{\top} A_k U \operatorname{diag}\left( U^{\top} A_k U \right) \right),$
so that
(2.13) $\nabla \bar{f}(U) = -4 \sum_{k=1}^{N} A_k U \operatorname{diag}\left( U^{\top} A_k U \right).$
The gradient of $f$ on $\mathrm{St}(p,n)$ can be obtained by Eqs. (2.9) and (2.13) as
(2.14) $\operatorname{grad} f(U) = \nabla \bar{f}(U) - U \operatorname{sym}\left( U^{\top} \nabla \bar{f}(U) \right) = -4 \sum_{k=1}^{N} \left( A_k U \operatorname{diag}\left( U^{\top} A_k U \right) - U \operatorname{sym}\left( U^{\top} A_k U \operatorname{diag}\left( U^{\top} A_k U \right) \right) \right).$
We can also express the derivative of $\operatorname{grad} f$, regarding Eq. (2.14) as a vector field on the whole of $\mathbb{R}^{n \times p}$, as
(2.15) $\mathrm{D} \operatorname{grad} f(U)[\xi] = \mathrm{D} \nabla \bar{f}(U)[\xi] - \xi \operatorname{sym}\left( U^{\top} \nabla \bar{f}(U) \right) - U \operatorname{sym}\left( \xi^{\top} \nabla \bar{f}(U) + U^{\top} \mathrm{D} \nabla \bar{f}(U)[\xi] \right),$
where $\mathrm{D} \nabla \bar{f}(U)[\xi]$ is easily obtained from Eq. (2.13) as
(2.16) $\mathrm{D} \nabla \bar{f}(U)[\xi] = -4 \sum_{k=1}^{N} \left( A_k \xi \operatorname{diag}\left( U^{\top} A_k U \right) + A_k U \operatorname{diag}\left( \xi^{\top} A_k U + U^{\top} A_k \xi \right) \right).$
We here note that $P_U(U S) = 0$ for any symmetric matrix $S \in \mathbb{R}^{p \times p}$. It follows from Eqs. (2.10) and (2.15) that
(2.17) $\operatorname{Hess} f(U)[\xi] = P_U\left( \mathrm{D} \operatorname{grad} f(U)[\xi] \right) = P_U\left( \mathrm{D} \nabla \bar{f}(U)[\xi] - \xi \operatorname{sym}\left( U^{\top} \nabla \bar{f}(U) \right) \right),$
where we have used Eq. (2.15) and $P_U(U S) = 0$ in the second equality.
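The closed-form derivatives above can be checked against central finite differences. The sketch below is our own sanity check (random symmetric test matrices, not data from the experiments); it verifies Eqs. (2.13) and (2.16) along an arbitrary direction:

```python
import numpy as np

def ddiag(X):
    # diag(.) of the text: keep the diagonal of X, zero elsewhere.
    return np.diag(np.diag(X))

def fbar(U, As):
    # The objective of Eq. (2.1), extended to all of R^{n x p}.
    return -sum(np.sum(np.diag(U.T @ A @ U) ** 2) for A in As)

def egrad(U, As):
    # Euclidean gradient, Eq. (2.13).
    return -4 * sum(A @ U @ ddiag(U.T @ A @ U) for A in As)

def degrad(U, xi, As):
    # Directional derivative of the Euclidean gradient, Eq. (2.16).
    return -4 * sum(A @ xi @ ddiag(U.T @ A @ U)
                    + A @ U @ ddiag(xi.T @ A @ U + U.T @ A @ xi) for A in As)

rng = np.random.default_rng(2)
n, p, N = 5, 2, 3
As = []
for _ in range(N):
    B = rng.standard_normal((n, n))
    As.append(B + B.T)                     # random symmetric A_k
U, _ = np.linalg.qr(rng.standard_normal((n, p)))
Z = rng.standard_normal((n, p))            # arbitrary direction
t = 1e-6

# Central differences agree with the closed forms up to O(t^2) + round-off.
fd_grad = (fbar(U + t * Z, As) - fbar(U - t * Z, As)) / (2 * t)
print(abs(fd_grad - np.sum(egrad(U, As) * Z)) < 1e-5)        # True
fd_dgrad = (egrad(U + t * Z, As) - egrad(U - t * Z, As)) / (2 * t)
print(np.max(np.abs(fd_dgrad - degrad(U, Z, As))) < 1e-5)    # True
```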
3 Newton’s method for the joint diagonalization problem on the Stiefel manifold
3.1 The Hessian of the objective function as a linear transformation on $T_U \mathrm{St}(p,n)$
In Newton's method for minimizing an objective function, $f$, on a general Riemannian manifold, the search direction $\xi$ at the current point $U$ is computed as a solution to Newton's equation, $\operatorname{Hess} f(U)[\xi] = -\operatorname{grad} f(U)$. Since we have already obtained the matrix expressions of $\operatorname{grad} f(U)$ and $\operatorname{Hess} f(U)[\xi]$, Newton's equation for Problem 2.1 at $U \in \mathrm{St}(p,n)$ is written as
(3.1) $P_U\left( \mathrm{D} \nabla \bar{f}(U)[\xi] - \xi \operatorname{sym}\left( U^{\top} \nabla \bar{f}(U) \right) \right) = -\operatorname{grad} f(U).$
This must be solved for $\xi$, with $U$ given. Because $\xi$ is in $T_U \mathrm{St}(p,n)$, it must satisfy $U^{\top} \xi + \xi^{\top} U = 0$. Equation (3.1) appears too difficult to solve directly. This is because Eq. (3.1) is complicated, and $\xi$ is an $n \times p$ matrix whose $np$ entries contain only $np - p(p+1)/2$ independent variables.
To overcome these difficulties, we wish to obtain the representation matrix of $\operatorname{Hess} f(U)$ as a linear transformation on $T_U \mathrm{St}(p,n)$, for an arbitrarily fixed $U \in \mathrm{St}(p,n)$. To this end, we identify $T_U \mathrm{St}(p,n)$ with $\mathbb{R}^{np - p(p+1)/2}$ and view $\xi$ as an $\left( np - p(p+1)/2 \right)$-dimensional vector. This can be done using the form in Eq. (2.4). We arbitrarily fix a $U_{\perp}$ that satisfies $U_{\perp}^{\top} U_{\perp} = I_{n-p}$ and $U^{\top} U_{\perp} = 0$. Such a $U_{\perp}$ can be computed by applying the Gram-Schmidt orthonormalization process to linearly independent column vectors of the matrix $I_n - U U^{\top}$ (see Algorithm 3.1). Then, $\xi \in T_U \mathrm{St}(p,n)$ can be expressed as
(3.2) $\xi = U \Omega + U_{\perp} K, \qquad \Omega \in \mathrm{Skew}(p),\ K \in \mathbb{R}^{(n-p) \times p}.$
The gradient $\operatorname{grad} f(U) \in T_U \mathrm{St}(p,n)$ can also be written as
(3.3) $\operatorname{grad} f(U) = U \Omega_g + U_{\perp} K_g,$
and we can write out $\Omega_g$ and $K_g$ using $U$, $U_{\perp}$, and $\nabla \bar{f}(U)$.
Proposition
3.2 Kronecker product and the vec and veck operators
The vec operator and the Kronecker product are useful for rewriting a matrix equation. They can be used to transform the unknown matrix into an unknown column vector [13, 15]. The vec operator acts on a matrix $X = \left( x_{ij} \right) \in \mathbb{R}^{m \times n}$ as
(3.8) $\operatorname{vec}(X) := \left( x_{11}, \ldots, x_{m1}, x_{12}, \ldots, x_{m2}, \ldots, x_{1n}, \ldots, x_{mn} \right)^{\top}.$
That is, $\operatorname{vec}(X)$ is an $mn$-dimensional column vector obtained by stacking the columns of $X$ one underneath the other. The Kronecker product of $A = \left( a_{ij} \right) \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{p \times q}$ (denoted by $A \otimes B$) is an $mp \times nq$ matrix defined as
(3.9) $A \otimes B := \begin{pmatrix} a_{11} B & \cdots & a_{1n} B \\ \vdots & \ddots & \vdots \\ a_{m1} B & \cdots & a_{mn} B \end{pmatrix}.$
The following useful properties of these operators are known.

For $A \in \mathbb{R}^{m \times n}$, $X \in \mathbb{R}^{n \times p}$, and $B \in \mathbb{R}^{p \times q}$,
(3.10) $\operatorname{vec}(A X B) = \left( B^{\top} \otimes A \right) \operatorname{vec}(X).$
There exists an $mn \times mn$ permutation matrix $K_{mn}$ such that
(3.11) $\operatorname{vec}\left( X^{\top} \right) = K_{mn} \operatorname{vec}(X), \qquad X \in \mathbb{R}^{m \times n},$ where $K_{mn}$ is given by
(3.12) $K_{mn} = \sum_{i=1}^{m} \sum_{j=1}^{n} E_{ij} \otimes E_{ij}^{\top},$ and $E_{ij}$ denotes the $m \times n$ matrix that has $(i,j)$ component equal to $1$, and the others equal to $0$.
Furthermore, we can easily derive the following properties.

For $X \in \mathbb{R}^{n \times n}$,
(3.13) $\operatorname{diag}(X) = \operatorname{diag}\left( X^{\top} \right).$
There exists an $n^2 \times n^2$ diagonal matrix $D_n$ such that
(3.14) $\operatorname{vec}\left( \operatorname{diag}(X) \right) = D_n \operatorname{vec}(X), \qquad X \in \mathbb{R}^{n \times n},$ where $D_n = \operatorname{diag}\left( \operatorname{vec}\left( I_n \right) \right)$.
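These identities are straightforward to confirm numerically. The sketch below is our own illustration: it checks the vec-Kronecker rule (3.10), builds the commutation matrix of (3.11)-(3.12) entrywise, and tests the diagonal matrix of (3.14) under the assumption $D_n = \operatorname{diag}(\operatorname{vec}(I_n))$:

```python
import numpy as np

def vec(X):
    # Stack the columns of X (column-major order), Eq. (3.8).
    return X.reshape(-1, order="F")

def commutation(m, n):
    # Commutation matrix K_{mn} of Eq. (3.12): sum of E_ij kron E_ij^T.
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            E = np.zeros((m, n))
            E[i, j] = 1.0
            K += np.kron(E, E.T)
    return K

rng = np.random.default_rng(3)
m, n_, p, q = 3, 4, 4, 2
A = rng.standard_normal((m, n_))
X = rng.standard_normal((n_, p))
B = rng.standard_normal((p, q))

# Property (3.10): vec(A X B) = (B^T kron A) vec(X).
print(np.allclose(vec(A @ X @ B), np.kron(B.T, A) @ vec(X)))      # True

# Property (3.11): K_{mn} vec(Y) = vec(Y^T).
Y = rng.standard_normal((m, n_))
print(np.allclose(commutation(m, n_) @ vec(Y), vec(Y.T)))         # True

# Property (3.14) with the assumed D_n = diag(vec(I_n)).
n2 = 3
X2 = rng.standard_normal((n2, n2))
D = np.diag(vec(np.eye(n2)))
print(np.allclose(D @ vec(X2), vec(np.diag(np.diag(X2)))))        # True
```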
For $K$ in Eq. (3.2), $\operatorname{vec}(K)$ is an appropriate vector expression of $K$, because all the elements of $K$ are independent variables. On the other hand, for $\Omega$ in Eq. (3.2), $\operatorname{vec}(\Omega)$ contains zeros stemming from the diagonal elements of $\Omega$. These should be removed. In addition, $\operatorname{vec}(\Omega)$ contains duplicates of each independent variable, because the upper triangular part (excluding the diagonal) of $\Omega$ is the negative of the lower triangular part. Therefore, we use the veck operator [11]. The veck operator acts on a $p \times p$ skew-symmetric matrix $\Omega = \left( \omega_{ij} \right)$ as
(3.15) $\operatorname{veck}(\Omega) := \left( \omega_{21}, \ldots, \omega_{p1}, \omega_{32}, \ldots, \omega_{p2}, \ldots, \omega_{p,p-1} \right)^{\top}.$
That is, $\operatorname{veck}(\Omega)$ is a $p(p-1)/2$-dimensional column vector obtained by stacking the columns of the strictly lower triangular part of $\Omega$ one underneath the other. There exists a matrix $B_p \in \mathbb{R}^{p^2 \times p(p-1)/2}$ that only depends on $p$ (the size of $\Omega$) such that
(3.16) $\operatorname{vec}(\Omega) = B_p \operatorname{veck}(\Omega),$
where $B_p$ is given by
(3.17) $B_p = \left( \operatorname{vec}\left( E_{21} - E_{12} \right), \ldots, \operatorname{vec}\left( E_{p1} - E_{1p} \right), \operatorname{vec}\left( E_{32} - E_{23} \right), \ldots, \operatorname{vec}\left( E_{p,p-1} - E_{p-1,p} \right) \right).$
We here note that Eq. (3.16) is valid only if $\Omega$ is skew-symmetric. Because each column of $B_p$ has a $1$ and a $-1$, and because each row of $B_p$ has at most one nonzero element, we have $B_p^{\top} B_p = 2 I_{p(p-1)/2}$. It follows that
(3.18) $\operatorname{veck}(\Omega) = \frac{1}{2} B_p^{\top} \operatorname{vec}(\Omega).$
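The veck operator and the matrix $B_p$ can be realized in a few lines; the sketch below is our own illustration (the function names and the column ordering of `duplication_skew` are our choices), verifying Eqs. (3.16)-(3.18):

```python
import numpy as np

def vec(X):
    # Column-major stacking, Eq. (3.8).
    return X.reshape(-1, order="F")

def veck(Omega):
    # Eq. (3.15): stack the strictly lower triangular part, column by column.
    p = Omega.shape[0]
    return np.concatenate([Omega[j + 1:, j] for j in range(p - 1)])

def duplication_skew(p):
    # B_p of Eq. (3.17): columns are vec(E_ij - E_ji) for i > j, so that
    # vec(Omega) = B_p veck(Omega) for skew-symmetric Omega, Eq. (3.16).
    cols = []
    for j in range(p - 1):
        for i in range(j + 1, p):
            E = np.zeros((p, p))
            E[i, j], E[j, i] = 1.0, -1.0
            cols.append(vec(E))
    return np.stack(cols, axis=1)

rng = np.random.default_rng(4)
p = 4
A = rng.standard_normal((p, p))
Omega = A - A.T                 # random skew-symmetric matrix
B = duplication_skew(p)

print(np.allclose(B @ veck(Omega), vec(Omega)))              # Eq. (3.16): True
print(np.allclose(B.T @ B, 2 * np.eye(p * (p - 1) // 2)))    # B^T B = 2I: True
print(np.allclose(veck(Omega), 0.5 * B.T @ vec(Omega)))      # Eq. (3.18): True
```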
3.3 Representation matrix of the Hessian and Newton’s equation
We have obtained all the requisites in the previous subsections. We regard the Hessian at $U$ as a linear transformation on $\mathbb{R}^{np - p(p+1)/2}$ that transforms an $\left( np - p(p+1)/2 \right)$-dimensional vector, $\left( \operatorname{veck}(\Omega)^{\top}, \operatorname{vec}(K)^{\top} \right)^{\top}$, into $\left( \operatorname{veck}(\Omega')^{\top}, \operatorname{vec}(K')^{\top} \right)^{\top}$, where $\xi = U \Omega + U_{\perp} K$ and $\operatorname{Hess} f(U)[\xi] = U \Omega' + U_{\perp} K'$. Our goal is to obtain the representation matrix of this linear transformation.
Proposition
Proof
Thus, Newton's equation, $\operatorname{Hess} f(U)[\xi] = -\operatorname{grad} f(U)$, can be solved by the following method. We first note that Newton's equation is equivalent to
(3.28) $\Omega' = -\Omega_g, \qquad K' = -K_g,$
where $\operatorname{Hess} f(U)[\xi] = U \Omega' + U_{\perp} K'$ and $\operatorname{grad} f(U) = U \Omega_g + U_{\perp} K_g$ as in Eq. (3.3). Applying the veck operator to the first equation of Eq. (3.28), and the vec operator to the second, yields
(3.29) $G \begin{pmatrix} \operatorname{veck}(\Omega) \\ \operatorname{vec}(K) \end{pmatrix} = - \begin{pmatrix} \operatorname{veck}\left( \Omega_g \right) \\ \operatorname{vec}\left( K_g \right) \end{pmatrix},$
where $G$ denotes the representation matrix of the Hessian, with $\xi = U \Omega + U_{\perp} K$. If $G$ is invertible, we can solve Eq. (3.29) as
(3.30) $\begin{pmatrix} \operatorname{veck}(\Omega) \\ \operatorname{vec}(K) \end{pmatrix} = - G^{-1} \begin{pmatrix} \operatorname{veck}\left( \Omega_g \right) \\ \operatorname{vec}\left( K_g \right) \end{pmatrix}.$
After we have obtained $\operatorname{veck}(\Omega)$ and $\operatorname{vec}(K)$, we can easily reshape them into $\Omega$ and $K$. Therefore, we can calculate the solution $\xi = U \Omega + U_{\perp} K$ of Newton's equation (3.1).
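The overall procedure can be sketched end to end. The following NumPy code is our own illustration under stated assumptions: instead of the closed-form representation matrix, it builds the matrix column by column by applying the Hessian (2.17) to a tangent basis derived from Eq. (3.2), solves the resulting system with a least-squares solver, and retracts; the jointly diagonalizable test matrices $A_k = V D_k V^{\top}$ are our own construction, not the paper's experiments.

```python
import numpy as np

def sym(X): return (X + X.T) / 2
def ddiag(X): return np.diag(np.diag(X))
def vec(X): return X.reshape(-1, order="F")

def egrad(U, As):
    # Euclidean gradient, Eq. (2.13).
    return -4 * sum(A @ U @ ddiag(U.T @ A @ U) for A in As)

def degrad(U, xi, As):
    # Directional derivative of the Euclidean gradient, Eq. (2.16).
    return -4 * sum(A @ xi @ ddiag(U.T @ A @ U)
                    + A @ U @ ddiag(xi.T @ A @ U + U.T @ A @ xi) for A in As)

def proj(U, X):
    # Orthogonal projection, Eq. (2.7).
    return X - U @ sym(U.T @ X)

def hess(U, xi, As):
    # Riemannian Hessian, Eq. (2.17).
    return proj(U, degrad(U, xi, As) - xi @ sym(U.T @ egrad(U, As)))

def qf(W):
    # Q factor with positive diagonal of R, cf. Eq. (2.8).
    Q, R = np.linalg.qr(W)
    s = np.sign(np.diag(R)); s[s == 0] = 1
    return Q * s

def tangent_basis(U):
    # A basis of T_U St(p, n) built from the form (3.2).
    n, p = U.shape
    U_perp = np.linalg.svd(U, full_matrices=True)[0][:, p:]
    basis = []
    for j in range(p):
        for i in range(j + 1, p):
            Om = np.zeros((p, p)); Om[i, j], Om[j, i] = 1.0, -1.0
            basis.append(U @ Om)
    for k in range(n - p):
        for l in range(p):
            K = np.zeros((n - p, p)); K[k, l] = 1.0
            basis.append(U_perp @ K)
    return basis

def newton_direction(U, As):
    # Solve Newton's equation (3.1) in the coordinates of the tangent basis.
    basis = tangent_basis(U)
    M = np.column_stack([vec(hess(U, E, As)) for E in basis])
    rhs = -vec(proj(U, egrad(U, As)))
    c = np.linalg.lstsq(M, rhs, rcond=None)[0]
    return sum(ci * E for ci, E in zip(c, basis))

rng = np.random.default_rng(5)
n, p, N = 5, 2, 3
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
# Jointly diagonalizable A_k sharing the orthogonal diagonalizer V.
As = [V @ np.diag(rng.standard_normal(n)) @ V.T for _ in range(N)]

U0 = qf(V[:, :p] + 0.01 * rng.standard_normal((n, p)))
xi0 = newton_direction(U0, As)
rhs0 = -vec(proj(U0, egrad(U0, As)))
print(np.allclose(U0.T @ xi0 + xi0.T @ U0, 0))    # xi0 is tangent: True
print(np.allclose(vec(hess(U0, xi0, As)), rhs0))  # Eq. (3.1) holds: True

# Newton iteration with QR retraction; the gradient norm drops rapidly.
U, norms = U0, []
for _ in range(6):
    norms.append(np.linalg.norm(proj(U, egrad(U, As))))
    U = qf(U + newton_direction(U, As))
norms.append(np.linalg.norm(proj(U, egrad(U, As))))
print(norms)
```

Building the matrix by applying the Hessian to each basis vector costs one Hessian evaluation per dimension; the closed-form representation matrix of Section 3.3 avoids this brute-force construction.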
3.4 Newton’s method
When implementing Newton's method, we must compute the matrix in Eq. (3.20). If the block matrices of this matrix have some relationships, we may reduce the computational cost of computing it.
The Hessian, $\operatorname{Hess} f(U)$, is symmetric with respect to the metric (2.6). However, the representation matrix is not symmetric. This is because, for $\xi = U \Omega_1 + U_{\perp} K_1$ and $\eta = U \Omega_2 + U_{\perp} K_2$ with $\Omega_1, \Omega_2 \in \mathrm{Skew}(p)$ and $K_1, K_2 \in \mathbb{R}^{(n-p) \times p}$, we have
(3.31) $\langle \xi, \eta \rangle_U = \operatorname{tr}\left( \Omega_1^{\top} \Omega_2 \right) + \operatorname{tr}\left( K_1^{\top} K_2 \right) = 2 \operatorname{veck}\left( \Omega_1 \right)^{\top} \operatorname{veck}\left( \Omega_2 \right) + \operatorname{vec}\left( K_1 \right)^{\top} \operatorname{vec}\left( K_2 \right),$
so that the independent coordinates of $\Omega_1$ and $\Omega_2$ are counted twice. If we endowed $\mathrm{St}(p,n)$ with the canonical metric [9], the representation matrix would be symmetric (see Appendix A for more details).
Although the representation matrix with the induced metric is not symmetric, it does satisfy the following.