Perturbation Analysis of An Eigenvector-Dependent Nonlinear Eigenvalue Problem With ApplicationsThis work was supported by NSFC No. 11671023, 11421101, 11671337, 11771188.

# Perturbation Analysis of An Eigenvector-Dependent Nonlinear Eigenvalue Problem With Applications††thanks: This work was supported by NSFC No. 11671023, 11421101, 11671337, 11771188.

Yunfeng Cai LMAM & DSEC, School of Mathematical Sciences, Peking Univ., Beijng, P.R. China,100871 (yfcai@math.pku.edu.cn).    Zhigang Jia School of Mathematics and Statistics, Jiangsu Normal University, Xuzhou, P.R. China, 221116 (zhgjia@jsnu.edu.cn).    Zheng-Jian Bai School of Mathematical Sciences and Fujian Provincial Key Laboratory on Mathematical Modeling & High Performance Scientific Computing, Xiamen University, Xiamen, P.R. China, 361005 (zjbai@xmu.edu.cn). The research of this author was partially supported by the Natural Science Foundation of Fujian Province of China (No. 2016J01035) and the Fundamental Research Funds for the Central Universities.
###### Abstract

The eigenvector-dependent nonlinear eigenvalue problem (NEPv) , where the columns of are orthonormal, , is Hermitian, and , arises in many important applications, such as the discretized Kohn-Sham equation in electronic structure calculations and the trace ratio problem in linear discriminant analysis. In this paper, we perform a perturbation analysis for the NEPv, which gives upper bounds for the distance between the solution to the original NEPv and the solution to the perturbed NEPv. A condition number for the NEPv is introduced, which reveals the factors that affect the sensitivity of the solution. Furthermore, two computable error bounds are given for the NEPv, which can be used to measure the quality of an approximate solution. The theoretical results are validated by numerical experiments for the Kohn-Sham equation and the trace ratio optimization.

Keywords. nonlinear eigenvalue problem, perturbation analysis, Kohn-Sham equation, trace ratio optimization

AMS subject classifications. 65F15, 65F30, 15A18, 47J10

## 1 Introduction

In this paper, we study the perturbation theory of the following eigenvector-dependent nonlinear eigenvalue problem (NEPv)

 A(P)V=VΛ, (1)

where has orthonormal column vectors, , is a continuous Hermitian matrix-valued function of , and is Hermitian, the eigenvalues of are also eigenvalues of . Usually, in practical applications, , and the eigenvalues of are the smallest or largest eigenvalues of . In this paper, we restrict our discussions to the case of the smallest eigenvalues. Furthermore, we consider in the following form

 A(P)=A0+A1(P)+A2(P), (2)

where , and are all Hermitian, is a constant matrix, is a homogeneous linear function of , and is a nonlinear function of .

Notice that if is a solution (1), then so is for any unitary matrix . Therefore, two solutions , are essentially the same if , where and are the subspaces spanned by the column vectors of and , respectively. Throughout the rest of this paper, when we say that is a solution to (1), we mean that the class solves (1).

Perhaps, the most well-known NEPv of the form (1) is the discretized Kohn-Sham (KS) equation arising from density function theory in electronic structure calculations (see [3, 11, 14] and references therein). NEPv (1) also arises from the trace ratio optimization in the linear discriminant analysis for dimension reduction [12, 20, 21], and the Gross-Pitaevskii equation for modeling particles in the state of matter called the Bose-Einstein condensate [1, 5, 6]. We believe that more potential applications will emerge.

The most widely used method for solving NEPv (1) is the so-called self-consistent field (SCF) iteration [11, 14]. Starting with orthonormal , at the th SCF iteration, one computes an orthonormal eigenvector matrix associated with the smallest eigenvalues of , and then is used as the approximation in the next iteration. Convergence analysis of SCF iteration for the KS equation is studied in [9, 10, 19], for the trace ratio problem in [21]. Quite recently, in [2], an existence and uniqueness condition of the solutions to NEPv (1) is given, and the convergence of the SCF iteration is also studied.

In practical applications, is usually obtained from the discretization of operators or constructed from empirical data, thus, contaminated by errors and noises. As a result, the NEPv (1) to be solved is in fact a perturbed NEPv. So, it is natural to ask whether we can trust the approximate solution obtained by solving the perturbed NEPv via certain numerical methods, say the SCF iteration. To be specific, let the perturbed NEPv be of the form

 ˜A(˜P)˜V=˜V˜Λ, (3)

where has orthonormal column vectors, , , and

 ˜A(˜P)=˜A0+˜A1(˜P)+˜A2(˜P) (4)

is a continuous Hermitian matrix-valued function of , is a constant Hermitian matrix, and are perturbed functions of and , respectively, and , are still Hermitian. Assume that the original NEPv (1) has a solution . Then we need to answer the following two fundamental questions:
Q1. Under what conditions the perturbed NEPv (3) has a solution nearby ?
Q2. What’s the distance between and ?

Let and be two -dimensional subspaces of . Let the columns of form an orthonormal basis for and the columns of form an orthonormal basis for . We use to measure the distance between and , where

 Θ(X,Y)=diag(θ1(X,Y),…,θk(X,Y)). (5)

Here, ’s denote the canonical angles between and [15, p. 43], which can be defined as

 0≤θj(X,Y):=arccosσj≤π2for 1≤j≤k, (6)

where ’s are the singular values of .

In this paper, we will focus on Q1 and Q2. The results are established via two approaches. One is based on the well-known theorem in the perturbation theory of Hermitian matrices [4] and Brouwer’s fixed-point theorem [7]; The other is inspired by J.-G. Sun’s technique (e.g., [8, 16, 17, 18]) – finding the radius of the perturbation by constructing an equation of the radius via the fixed-point theorem. Two perturbation bounds can be obtained from these two approaches, and each of them has its own merits. Based on the perturbation bounds, a condition number for the NEPv (1) is introduced, which quantitatively reveals the factors that affect the sensitivity of the solution. As corollaries, two computable error bounds are provided to measure the quality of the computed solution. Theoretical results are validated by numerical experiments for the KS equation and the trace ratio optimization.

The rest of this paper is organized as follows. In section 2, we use two approaches to answer Q1 and Q2, followed by some discussions on the condition number and error bounds for NEPv (1). In section 3, we apply our theoretical results to the KS equation and the trace ratio optimization problem, respectively. Finally, we give our concluding remarks in section 4.

## 2 Main results

In this section we provide two approaches to answer Q1 and Q2. A condition number and error bounds for NEPv will also be discussed. Before we proceed, we introduce the following notation, which will be used throughout the rest of this paper.

stands for the set of all matrices with complex entries. The superscripts “” and “” take the transpose and the complex conjugate transpose of a matrix or vector, respectively. The symbol denotes the 2-norm of a matrix or vector. Unless otherwise specified, we denote by for the eigenvalues of a Hermitian matrix and they are always arranged in nondecreasing order: . Define

 Vk :={V∈Cn×k∣∣VHV=Ik}, (7a) Pk :={P∈Cn×n∣∣P=VVH,V∈Vk}. (7b)

Let , be the solutions to (1) and (3), respectively. For any , define

 Vξ :={V∈Cn×k∣∣VHV=Ik,∥sinΘ(R(V),R(V∗))∥2≤ξ}, (8) Pξ :={P∈Cn×n∣∣P=VVH,V∈Vξ}. (9)

Denote , , , and also

 δ0 =∥˜A0−A0∥2, (10a) δ1 =supP∈Pξ∥˜A1(P)−A1(P)∥2, d1 =supP≠P∗,P∈Pξ∥A1(P)−A1(P∗)∥2∥P−P∗∥2, (10b) δ2 =supP∈Pξ∥˜A2(P)−A2(P)∥2, d2 =supP≠P∗,P∈Pξ∥A2(P)−A2(P∗)∥2∥P−P∗∥2, (10c) δ =δ0+δ1+δ2, d =d1+d2. (10d)

Note here that can be used to measure the magnitude of the perturbation, and is a “local Lipschitz constant” such that

 ∥A(P)−A(P∗)∥2≤d∥P−P∗∥2 (11)

for all . Thus, we may use to measure the sensitivity of within .

### 2.1 Approach one

In this subsection, we use the famous Weyl Theorem [15, p.203], Davis-Kahan theorem [4], and Brouwer’s fixed-point theorem [7] to answer questions Q1 and Q2.

###### Theorem 2.1

Let be a solution to (1), , and

 g=λk+1(A(P∗))−λk(A(P∗))>0. (12)

If

 δ<12 g−d, (13)

then the perturbed NEPv (3) has a solution with

 ξ∗=2δg−d−δ+√(g−d−δ)2−4dδ. (14)

Proof: Using (13), we know that given by (14) is a positive constant. Then it is easy to see that is a nonempty bounded closed convex set in . For any , letting , we define for , where is an eigenvector of corresponding with for and . If we can show that

• (which implies that the mapping is well-defined in the sense that is unique);

• is a continuous mapping within ;

• ,

then by Brouwer’s fixed-point theorem [7], has a fixed point in . Let be the fixed point, where . Then is a solution to the perturbed NEPv (3). Hence the conclusion follows immediately. Next, we show , and in order.

Proof of First, using (13) and (14), we have

 ξ∗ <2δg−d−δ+√(d+δ)2−4dδ =2δg−d−δ+|d−δ| =⎧⎪⎨⎪⎩2δg−2δ,if d≥δ,2δg−2d,otherwise <1. (15)

Second, direct calculations give rise to

 ∥˜A(˜P)−A(P∗)∥2 ≤∥˜A0−A0∥2+∥˜A1(˜P)−A1(P∗)∥2+∥˜A2(˜P)−A2(P∗)∥2 ≤δ0+∥˜A1(˜P)−A1(˜P)∥2+∥A1(˜P)−A1(P∗)∥2 +∥˜A2(˜P)−A2(˜P)∥2+∥A2(˜P)−A2(P∗)∥2 ≤δ+d∥˜P−P∗∥2 (16a) ≤δ+dξ∗, (16b)

where (16a) uses (10), (16b) uses and .

Third, by the famous Weyl Theorem [15, p.203], we have

 |λj(˜A(˜P))−λj(A(P∗))|≤∥˜A(˜P)−A(P∗)∥2, for j=1,2,…,n. (17)

Then it follows that

 λk+1(˜A(˜P))−λk(˜A(˜P)) =g+[λk+1(˜A(˜P))−λk+1(A(P∗))]+[λk(A(P∗))−λk(˜A(˜P))] ≥g−2∥˜A(˜P)−A(P∗)∥2 (18a) ≥g−2δ−2dξ∗ (18b) >0, (18c)

where (18a) uses (17), (18b) uses (16), (18c) uses (15) and (13).

Proof of We verify that is a continuous mapping within by showing that for any , , as , where and .

Let , , and

 ˜R=˜A(˜P1)˜V2ϕ−˜V2ϕdiag(λ1(˜A(˜P2)),…,λk(˜A(˜P2))).

Then

 ˜R=[˜A(˜P1)−˜A(˜P2)]˜V2ϕ,

and hence

 ∥˜R∥2=∥[˜A(˜P1)−˜A(˜P2)]˜V2ϕ∥2≤∥˜A(˜P1)−˜A(˜P2)∥2.

Using (15)–(17), we have

 λk+1(˜A(˜P2))−λk(˜A(˜P1)) =g+[λk+1(˜A(˜P2))−λk+1(A(P∗))]−[λk(˜A(˜P1))−λk(A(P∗))] ≥g−2(δ+dξ∗)≥g−2(δ+d)>0. (19)

By Davis-Kahan theorem [4], we have

 ∥sinΘ(R(˜V1ϕ),R(˜V2ϕ))∥2≤∥˜R∥2λk+1(˜A(˜P2))−λk(˜A(˜P1)). (20)

Letting , we know that since is continuous. Then it follows from (19) and (20) that

 ∥ϕ(˜P1)−ϕ(˜P2)∥2=∥sinΘ(R(˜V1ϕ),R(˜V2ϕ))∥2≤∥˜R∥2g−2(δ+d)→0.

Therefore, .

Proof of Define

 R=˜A(˜P)V∗−V∗Λ∗,

where . Then

 R=[˜A(˜P)−A(P∗)]V∗. (21)

Using (16) and (17), we have

 λk+1(A(P∗))−λk(˜A(˜P)) =λk+1(A(P∗))−λk(A(P∗))+λk(A(P∗))−λk(˜A(˜P)) ≥g−δ−dξ∗>0. (22)

Then it follows that

 ∥P∗−ϕ(˜P))∥2=∥sinΘ(R(V∗),R(˜V))∥2 ≤∥R∥2λk+1(A(P∗))−λk(˜A(˜P)) (23a) ≤∥˜A(˜P)−A(P∗)∥2g−δ−dξ∗ (23b) ≤δ+dξ∗g−δ−dξ∗ (23c) =ξ∗, (23d)

where (23a) uses Davis-Kahan theorem [4], (23b) uses (21) and (22), (23c) uses (16), (23d) uses (14). Therefore, . This completes the proof.

###### Remark 2.2

The above approach is inspired by [22] and is also used in [2], where the existence and uniqueness of the solution to (1) and the convergence of the SCF iteration are studied.

### 2.2 Approach two

In this subsection, we use another approach to answer questions Q1 and Q2, which is inspired by J.-G. Sun’s technique, see e.g., [8, 16, 17, 18].

###### Theorem 2.3

Let be a solution to (1), , be given by (12), and

 h=max1≤j≤k[λk+j(A(P∗))−λj(A(P∗))],ζ=√g√g+√2h. (24)

Assume that is sufficiently small such that

 f(η)≡gη−dη√1+η2−(1+η2)δ=0 (25)

has positive roots, and its smallest positive root, denoted by , is smaller than . Then the perturbed NEPv (3) has a solution with

 τ∗=η∗√1+η2∗. (26)

Proof: Let be a unitary matrix such that

 [V∗,Vc]HA(P∗)[V∗,Vc]=[Λ∗00Λc], (27)

where

 Λ∗=diag(λ1(A(P∗)),…,λk(A(P∗))),Λc=diag(λk+1(A(P∗)),…,λn(A(P∗))).

Then that the perturbed NEPv (3) has a solution is equivalent to that there exists a unitary matrix such that

 [˜V∗,˜Vc]H˜A(˜P∗)[˜V∗,˜Vc]=[˜Λ∗00˜Λc], (28)

where is Hermitian and its eigenvalues are the smallest eigenvalues of .

Without loss of generality111Note that and thus . By the CS decomposition [15, Chapter 1, Theorem 5.1], we know that there exist unitary matrices and such that . Rewrite . It still holds (28). Then (29) follows immediately by setting ., we let

 (29)

where is a parameter matrix, and are arbitrary unitary matrices. Substituting (29) into (28), we get

 (Ik+ZHZ)−12[Ik,ZH]D[IkZ](Ik+ZHZ)−12=Q∗˜Λ∗QH∗, (30a) (30b) [−Z,In−k]D[IkZ]=0, (30c)

where

 D=[V∗,Vc]H˜A(˜P∗)[V∗,Vc]. (31)

Then the perturbed NEPv (3) has a solution is equivalent to

• there exists such that (30c) holds;

• .

Next, we first prove (a) then (b).

Proof of It follows from (27), (30c) and (31) that

 0 =[−Z,In−k][V∗,Vc]H˜A(˜P∗)[V∗,Vc][IkZ] =ΛcZ−ZΛ∗+(−ZVH∗+VHc)[˜A(˜P∗)−A(P∗)](V∗+VcZ) =L(Z)+Φ(Z),

where

 L(Z) =ΛcZ−ZΛ∗, Φ(Z) =(−ZVH∗+VHc)[˜A(˜P∗)−A(P∗)](V∗+VcZ). (32)

Note that since defined in (12) is positive, is an invertible linear operator with

 ∥L−1∥−12=minλ∈λ(Λ∗),~λ∈λ(Λc)|λ−~λ|=λk+1(A(P∗))−λk(A(P∗))=g>0. (33)

Therefore, we may define a mapping as

 μ(Z)≡−L−1(Φ(Z)). (34)

By (29), we have

 ∥˜P∗−P∗∥2 =∥˜V∗˜VH∗−V∗VH∗∥2 =∥∥∥[V∗,Vc][IkZ](Ik+ZHZ)−1[Ik,ZH][V∗,Vc]H−V∗VH∗∥∥∥2 =∥Z∥2√1+∥Z∥22. (35)

Then it follows from (32), (16) and (35) that

 ∥L−1Φ(Z)∥2 ≤1g(1+∥Z∥22)(δ+d∥˜P∗−P∗∥2) =1g((1+∥Z∥22)δ+d∥Z∥2√1+∥Z∥22). (36)

Denote

 Bη∗={Z|∥Z∥2≤η∗}.

Note that is a nonempty bounded closed convex set, defined in (34) is a continuous mapping, and for any , by (36) and (25), it holds

 ∥μ(Z)∥2≤1g((1+η2∗)δ+dη∗√1+η2∗)=η∗,

i.e., maps into itself. So by Brouwer’s fixed-point theorem [7], has a fixed point in . In other words, (30c) has a solution . This completes the proof of (a).

Proof of If

 minQH∗Q∗=Ik∥Q∗˜Λ∗QH∗−Λ∗∥2+minQHcQc=In−k∥Qc˜ΛcQHc−Λc∥2

then by Weyl Theorem [15], we have . Consequently,

 λ1(˜Λc)−λk(˜Λ∗)=g+[λ1(˜Λc)−λ1(Λc)]−[λk(˜Λ∗)−λk(Λ∗)]>g−g=0.

Therefore, we only need to show (37), under the assumption .

We get by (16), (26), and (31) that

 D =[V∗,Vc]HA(P∗)[V∗,Vc]+[V∗,Vc]H[˜A(˜P∗)−A(P∗)][V∗,Vc] =[Λ∗00Λc]+ΔD, (38)

where satisfies

 ∥ΔD∥2=∥˜A(˜P∗)−A(P∗)∥≤δ+d∥˜P∗−P∗∥2≤δ+dτ∗. (39)

Let the singular value decomposition (SVD) of be , where has orthonormal columns, , , , and is unitary. Let for , , . Then using (30a), (38), (39), we have

 minQH∗Q∗=Ik∥Q∗˜Λ∗QH∗−Λ∗∥2 =minQH∗Q∗=Ik∥∥ ∥ ∥∥Q∗VZ[ˆC,ˆS,0]D⎡⎢ ⎢⎣ˆCˆS0⎤⎥ ⎥⎦VHZQH∗−Λ∗∥∥ ∥ ∥∥2 ≤∥∥ ∥ ∥∥[ˆC,ˆS,0]D⎡⎢ ⎢⎣ˆCˆS0⎤⎥ ⎥⎦−Λ∗∥∥ ∥ ∥∥2 ≤∥ΔD∥2+∥∥∥ˆCΛ∗ˆC+[ˆS,0]Λc[ˆS0]−Λ∗∥∥∥2 ≤δ+dτ∗+hsin2θ1 ≤δ+dτ∗+hτ2∗. (40)

Similarly,

 minQHcQc=In−k∥Qc˜ΛcQHc−Λc∥2≤δ+dτ∗+hτ2∗. (41)

Direct calculations give rise to

 2[δ+dτ∗+hτ2∗]−g =2(δ+dη∗√1+η2∗)+2hη2∗1+η2∗−g =2g