Numerical Optimization of Eigenvalues of Hermitian Matrix Functions
This work concerns the global minimization of a prescribed eigenvalue or a weighted sum of
prescribed eigenvalues of a Hermitian matrix-valued function depending on
its parameters analytically in a box. We describe how the analytical properties
of eigenvalue functions can be put into use to derive piece-wise quadratic functions
that underestimate the eigenvalue functions. These piece-wise quadratic under-estimators
lead us to a global minimization algorithm, originally due to Breiman and Cutler.
We prove the global convergence of the algorithm, and show that it can be effectively
used for the minimization of extreme eigenvalues, e.g., the largest eigenvalue or
the sum of the largest specified number of eigenvalues. This is particularly
facilitated by the analytical formulas for the first derivatives of eigenvalues,
as well as analytical lower bounds on the second derivatives that can be
deduced for extreme eigenvalue functions. The applications that we have in mind
also include the -norm of a linear dynamical system, numerical
radius, distance to uncontrollability and various other non-convex eigenvalue
optimization problems, for which, generically, the eigenvalue function involved is simple
at all points.
Hermitian eigenvalues, analytic, global optimization, perturbation of
eigenvalues, quadratic programming
AMS subject classifications. 65F15, 90C26
The main object of this work is a matrix-valued function that is analytic and Hermitian at all . Here, we consider the numerical global minimization of a prescribed eigenvalue of over , where denotes a box. From an application point of view, a prescribed eigenvalue typically refers to the th largest eigenvalue, i.e., , or a weighted sum of largest eigenvalues, i.e., for given real numbers . However, it may as well refer to a particular eigenvalue with respect to a different criterion as long as the (piece-wise) analyticity properties discussed below and in Section LABEL:sec:Her_eig_pert are satisfied.
The literature from various engineering fields and applied sciences is rich with eigenvalue optimization problems that fits into the setting of the previous paragraph. There are problems arising in structural design and vibroacoustics, for which the minimization of the largest eigenvalue or maximization of the smallest eigenvalue of a matrix-valued function is essential, e.g., the problem of designing the strongest column which originated from Euler in the 18th century . In control theory, various quantities regarding dynamical systems can be posed as eigenvalue optimization problems. For instance, the distance from a linear dynamical system to a nearest unstable system , and the -norm of a linear dynamical system have non-convex eigenvalue optimization characterizations . In graph theory, relaxations of some NP-hard graph partitioning problems give rise to optimization problems in which the sum of the largest eigenvalues is to be minimized .
In this paper, we offer a generic algorithm based on the analytical properties of eigenvalues of an analytic and Hermitian matrix-valued function, that is applicable for any eigenvalue optimization problem whenever lower bounds on the second derivatives of the eigenvalue function can be calculated analytically or numerically. All of the existing global eigenvalue optimization algorithms in the non-convex setting are designed for specific problems, e.g., [3, 5, 6, 7, 15, 17, 19, 20, 21, 22, 32], while widely adopted techniques such as interior point methods  - when it is possible to pose an eigenvalue optimization problem as a semi-definite program - or a bundle method  are effective in the convex setting. We foresee non-convex eigenvalue optimization problems that depend on a few parameters as the typical setting for the use of the algorithm here.
For the optimization of non-convex eigenvalue functions, it appears essential to benefit from the global properties of eigenvalue functions, such as their global Lipschitzness or global bounds on their derivatives. Such global properties lead us to approximate globally with under-estimating functions, which we call support functions. Furthermore, the derivatives of the eigenvalue functions can be evaluated effectively at no cost once the eigenvalue function is evaluated (due to analytic expressions for the derivatives of eigenvalues in terms of eigenvectors as discussed in Section LABEL:subsec:first_der). Therefore, the incorporation of the derivatives into the support functions yields quadratic support functions on which our algorithm relies. The quadratic support functions for eigenvalue functions are derived exploiting the analytical properties of eigenvalues and presume the availability of a lower bound on the second derivatives of the eigenvalue function that is obtained either analytically or numerically.
Example: Consider the minimization of the largest eigenvalue of
where are given symmetric matrices. It can be deduced from the expressions in Section LABEL:subsec:sec_der that for all such that is simple. Furthermore, due to expressions in Section LABEL:subsec:first_der, at all such , we have where is a unit eigenvector associated with . Consequently, it turns out that, about any where is simple, there is a support function
satisfying for all ; see Section LABEL:subsec:support_extreme for the details.
Support functions have earlier been explored by the global optimization community. The Piyavskii-Shubert algorithm [40, 45] is derivative-free, and constructs conic support functions based on Lipschitz continuity with a known global Lipschitz constant. It converges sub-linearly in practice. Sophisticated variants that make use of several Lipschitz constants simultaneously appeared in the literature [24, 43]. The idea of using derivatives in the context of global optimization yields powerful algorithms. Breimann and Cutler  developed an algorithm that utilizes quadratic support functions depending on the derivatives. Some variants of the Breimann-Cutler algorithm are also suggested for functions with Lipschitz-continuous derivatives; for instance [18, 26, 27] benefit from multiple Lipschitz constants for the derivatives,  estimates Lipschitz constants for the derivatives locally, while  modifies the support functions of the Breimann-Cutler algorithm in the univariate case so that the subproblems become smooth; however, all these variants in the multivariate case end up working on a mesh as a downside. The quadratic support functions that we derive for coincide with the quadratic support functions on which the Breimann-Cutler algorithm is built on. Consequently, our approach is a variant of the algorithm due to Breimann and Cutler .
At every iteration of the algorithm, a global minimizer of a piece-wise quadratic model defined as the maximum of a set of quadratic support functions is determined. A new quadratic support function is constructed around this global minimizer, and the piece-wise quadratic model is refined with the addition of this new support function. In practice, we observe a linear rate of convergence to a global minimizer.
The algorithm appears applicable especially to extremal eigenvalue functions of the form
where are given real numbers such that . This is facilitated by the simple quadratic support functions derived in Section LABEL:subsec:support_extreme, and expressions for the lower bound on the second derivatives derived in Section LABEL:sec:extreme_low_bound. The algorithm is also applicable if the eigenvalue function is simple over all , which holds for various eigenvalue optimization problems of interest.
Outline: We start in the next section with a list of eigenvalue optimization problems to which our proposed algorithm fits well. In Section LABEL:sec:Her_eig_pert, the basic results concerning the analyticity and derivatives of the eigenvalues of a Hermitian matrix-valued function that depends analytically on are reviewed. In Section LABEL:sec:quad_support, for a general eigenvalue function, the piece-wise quadratic support functions that are defined as the minimum of quadratic functions are derived. In Section LABEL:sec:squad_support, it is shown that these piece-wise quadratic support functions simplify to smooth quadratic support functions for the extremal eigenvalue functions, as well as for the eigenvalue functions that are simple for all . Global lower bounds on the second derivatives of an extremal eigenvalue function are deduced in Section LABEL:sec:extreme_low_bound. The algorithm based on the quadratic support functions is presented in Section LABEL:sec:algorithm. We establish the global convergence of the proposed algorithm in Section LABEL:sec:1d_anal. Finally, comprehensive numerical experiments are provided in Section LABEL:sec:eig_opt. The examples indicate the superiority of the algorithm over the Lipschitz continuity based algorithms, e.g., [24, 40, 45], as well as the level-set based approaches devised for particular non-convex eigenvalue optimization problems, e.g., [20, 32]. The reader who prefers to avoid technicalities at first could glance at the algorithm in Section LABEL:sec:algorithm, then go through Sections LABEL:sec:Her_eig_pert-LABEL:sec:extreme_low_bound for the theoretical foundation.
2.1 Quantities Related to Dynamical Systems
The numerical radius of is the modulus of the outer-most point in its field of values , and is defined by
This quantity gives information about the powers of , e.g.,, and is used in the literature to analyze the convergence of iterative methods for the solution of linear systems [1, 11]. An eigenvalue optimization characterization is given by :
The -norm is one of the most widely used norms in practice for the descriptor system
where and are the input and output functions, respectively, and , with are the system matrices. The -norm of the transfer function for this system is defined as
Here and elsewhere, represents the th largest singular value, and denotes the pseudoinverse of . Also above, with zero initial conditions for the descriptor system, the transfer function reveals the linear relation between the input and output, as with and denoting the Laplace transformations of and , respectively. Note that the -norm above is ill-posed (i.e., the associated operator is unbounded) if the pencil has an eigenvalue on the imaginary axis or to the right of the imaginary axis. Therefore, when the -norm is well-posed, the matrix-valued function is analytic at all . A relevant quantity is the (continuous) distance to instability from a matrix ; the eigenvalue optimization characterization for the -norm with and reduces to that for the distance to instability  from with respect to the -norm.
Paige  suggested the distance to uncontrollability, for a given and with , defined by
as a robust measure of controllability. Here, the controllability of a linear control system of the form means that the function can be driven into any state at a particular time by some input , and could be equivalently characterized as Therefore, the eigenvalue optimization characterization for the distance to uncontrollability takes the form :
2.2 Minimizing the Largest or Maximizing the Smallest Eigenvalues
In the 18th century, Euler considered the design of the strongest column with a given volume with respect to the radii of the cross-sections [30, 37]. The problem can be formulated as finding the parameters, representing the radii of cross-sections, maximizing the smallest eigenvalue of a fourth order differential operator. The analytical solution of the problem has been considered in several studies in 1970s and in 1980s [2, 33, 35], which were motivated by the earlier work of Keller and Tadjbakhsh . Later, the problem is treated numerically  by means of the finite-element discretization, giving rise to the problem
The treatment in  yields . In this affine setting, the minimization of the largest eigenvalue is a convex optimization problem (immediate from Theorem LABEL:thm:extreme_low_bound below) and received considerable attention [14, 16, 36].
In the general setting, when the dependence of the matrix function on the parameters is not affine, the problem in (LABEL:eq:minimize_largest) is non-convex. Such non-convex problems are significant (though they are not studied much excluding a few studies such as  that offer only local analysis) in robust control theory for instance to ensure robust stability. The dual form that concerns the maximization of the smallest eigenvalue is of interest in vibroacoustics.
2.3 Minimizing the Sum of the Largest Eigenvalues
In graph theory, relaxations of the NP-hard partitioning problems lead to eigenvalue optimization problems that require the minimization of the sum of the largest eigenvalues. For instance, given a weighted graph with vertices and nonnegative integers summing up to , consider finding a partitioning of the graph such that the th partition contains exactly vertices for and the sum of the weights of the edges within each partition is maximized. The relaxation of this problem suggested in  is of the form
Once again, in general, the minimization of the sum of the largest eigenvalues is not a convex optimization problem, and there are a few studies in the literature that attempted to analyze the problem locally for instance around the points where the eigenvalues coalesce .
3 Background on Perturbation Theory of Eigenvalues
In this section, we first briefly summarize the analyticity results, mostly borrowed from [41, Chapter 1], related to the eigenvalues of matrix-valued functions. Then, expressions  are provided for the derivatives of Hermitian eigenvalues in terms of eigenvectors and the derivatives of matrix-valued functions. Finally, we elaborate on the analyticity of singular value problems as special Hermitian eigenvalue problems.
3.1 Analyticity of Eigenvalues
3.1.1 Univariate Matrix Functions
For a univariate matrix-valued function that depends on analytically, which may or may not be Hermitian, the characteristic polynomial is of the form
where are analytic functions of . It follows from the Puiseux’ theorem (see, e.g., [48, Chapter 2]) that each root such that has a Puiseux series of the form
for all small , where is the multiplicity of the root .
Now suppose is Hermitian for all , and let be the smallest integer such that . Then, we have
which implies that is real, since and are real numbers for each . Furthermore,
is real, which implies that is real, or equivalently that is integer. This observation reveals that the first nonzero term in the Puiseux series of is an integer power of . The same argument applied to the derivatives of and the associated Puiseux series indicates that only integer powers of can appear in the Puiseux series (LABEL:eq:Puiseux), that is the Puiseux series reduces to a power series. This establishes that is an analytic function of . Indeed, it can also be deduced that, associated with , there is an orthonormal set of eigenvectors, where each of varies analytically with respect to (see  for details).
Theorem 3.1 (Rellich)
Let be a Hermitian matrix-valued function that depends on analytically.
The roots of the characteristic polynomial of can be arranged so that each root for is an analytic function of .
There exists an eigenvector associated with for that satisfies the following:
for , and
is an analytic function of .
3.1.2 Multivariate Matrix Functions
The eigenvalues of a multivariate matrix-valued function that depends on analytically do not have a power series representation in general even when is Hermitian. As an example, consider
On the other hand, it follows from Theorem LABEL:thm:Rellich that, there are underlying eigenvalue functions , of , each of which is analytic along every line in , when is Hermitian. This analyticity property along lines in implies the existence of the first partial derivatives of everywhere. Expressions for the first partial derivatives will be derived in the next subsection, indicating their continuity. As a consequence of the continuity of the first partial derivatives, each must be differentiable.
Let be a Hermitian matrix-valued function that depends on analytically. Then, the roots of the characteristic polynomial of can be arranged so that each root is (i) analytic on every line in , and (ii) differentiable on .
3.2 Derivatives of Eigenvalues
3.2.1 First Derivatives of Eigenvalues
Consider a univariate Hermitian matrix-valued function that depends on analytically. An analytic eigenvalue and the associated eigenvector as described in Theorem LABEL:thm:Rellich satisfy
Taking the derivatives of both sides, we obtain
Multiplying both sides by and using the identities as well as , we get
3.2.2 Second Derivatives of Eigenvalues
By differentiating both sides of (LABEL:eq:eigval_1der), it is possible to deduce the formula (the details are omitted for brevity)
for the second derivatives assuming that the (algebraic) multiplicity of is one.
If, on the other hand, the eigenvalues repeat at a given , specifically when the (algebraic) multiplicity of is greater than one, the formula (LABEL:eq:eigval_2der) generalizes as
Here, denotes the set of indices of the analytic eigenvalues (specified in Theorem LABEL:thm:Rellich) that are identical to at all .
3.2.3 Derivatives of Eigenvalues for Multivariate Hermitian Matrix Functions
Let be Hermitian and analytic. It follows from (LABEL:eq:eigval_1der) that
Since and are analytic with respect to for , this implies the continuity, also the analyticity with respect to , of each partial derivative , and hence the existence of , everywhere. If the multiplicity of is one, differentiating both sides of (LABEL:eq:eigval_part_der) with respect to would yield the following expressions for the second partial derivatives.
Expressions similar to (LABEL:eq:eigval_2der_rep) can be obtained for the second partial derivatives when has multiplicity greater than one.
3.3 Analyticity of Singular Values
Some of the applications (see Section LABEL:sec:app_dyn_sys) concern the optimization of the th largest singular value of an analytic matrix-valued function. The singular value problems are special Hermitian eigenvalue problems. In particular, denoting the th largest singular value of an analytic matrix-valued function (not necessarily Hermitian) by , the set of eigenvalues of the Hermitian matrix-valued function
is . In the univariate case is the th largest of the analytic eigenvalues, , of . The multivariate -dimensional case is similar, with the exception that each eigenvalue is differentiable and analytic along every line in . Let us focus on the univariate case throughout the rest of this section. Extensions to the multi-variate case are similar to the previous sections. Suppose , with , , is the analytic eigenvector function as specified in Theorem LABEL:thm:Rellich of associated with , that is
The above equation implies
In other words, , are analytic, and consist of a pair of consistent left and right singular vectors associated with . To summarize, in the univariate case, can be considered as a signed analytic singular value of , and there is a consistent pair of analytic left and right singular vector functions, and , respectively.
Next, in the univariate case, we derive expressions for the first derivative of , in terms of the corresponding left and right singular vectors. It follows from the singular value equations (LABEL:eq:sing_val_defn) above that (if , this equality follows from analyticity). Now, the application of the expression (LABEL:eq:eigval_1der) yields
In terms of the unit left and right singular vectors and , respectively, associated with , we obtain
Notation: Throughout the rest of the text, we denote the eigenvalues of that are analytic in the univariate case (stated in Theorem LABEL:thm:Rellich), and differentiable and analytic along every line in the multivariate case (stated in Theorem LABEL:thm:anal_mul_var) with . On the other hand, or denotes the th largest eigenvalue, and or denotes the th largest singular value of .
4 Piece-wise Quadratic Support Functions
Let be eigenvalue functions of a Hermitian matrix-valued function that are analytic along every line in and differentiable on , and let be the box defined by
Consider the closed and connected subsets of , with as small as possible, such that , and for each and , and such that in the interior of none of the eigenvalue functions intersect each other. Define as follows:
where is analytic, and is a vector of indices such that
and for all in order to ensure the continuity of on . The extremal eigenvalue function fits into the framework.
We derive a piece-wise quadratic support function about a given point bounding from below for all , and such that . Let us focus on the direction , the univariate function , and the analytic univariate functions for . Also, let us denote the isolated points in the interval , where two distinct functions among intersect each other by . At these points, may not be differentiable. We have
where and . Due to the existence of the second partial derivatives of (since the expression (LABEL:eq:eigval_part_der) implies the analyticity of the first partial derivatives with respect to each parameter disjointly), there exists a constant that satisfies
Furthermore, for all . Thus, applying the mean value theorem to the analytic functions for and since , we obtain
By substituting the last inequality in (LABEL:eq:quad_der_mult_int), integrating the right-hand side of (LABEL:eq:quad_der_mult_int), and using (since is differentiable), we arrive at the following:
Suppose is an analytic and Hermitian matrix-valued function, the eigenvalue function is defined as in (LABEL:eq:eigval_defn) in terms of the eigenvalues of that are differentiable and analytic on every line in , and is a lower bound as in (LABEL:eq:lb_sec_der). Then the following inequality holds for all :
5 Simplified Piece-wise Quadratic Support Functions
5.1 Support Functions under Generic Simplicity
In various instances, the eigenvalue functions do not intersect each other at any generically. In such cases, for some , we have for all , therefore is analytic in the univariate case and analytic along every line in the multivariate case. For instance, the singular values of the matrix function involved in the definition of the -norm do not coalesce at any on a dense subset of the set of quadruples . Similar remarks apply to all of the specific eigenvalue optimization problems in Section LABEL:sec:app_dyn_sys.
Under the generic simplicity assumption, the piece-wise quadratic support function (LABEL:eq:quad_model_multi) simplifies to
Here, is a lower bound on for all . In many cases, it may be possible to obtain a rough lower bound numerically by means of the expressions for the second derivatives in Sections LABEL:subsec:sec_der and LABEL:subsec:multi_der, and exploiting the Lipschitz continuity of the eigenvalue and other eigenvalues.
5.2 Support Functions for Extremal Eigenvalues
Consider the extremal eigenvalue function
for given real numbers . A special case when are integers is discussed in Section LABEL:sec:app_min_sum_largest. When , this reduces to the maximal eigenvalue function in Section LABEL:sec:app_min_largest.
For simplicity, let us suppose that is differentiable at , about which we derive a support function below. This is generically the case. In the unlikely case of two eigenvalues coalescing at , the non-differentiability is isolated at this point. Therefore, is differentiable at all nearby points. For a fixed , as in the previous section, define for . Denote the points where either one of is not simple by in increasing order. These are the points where is possibly not analytic. At any , by we refer to the analytic function satisfying for all sufficiently close to . Similarly, refers to the analytic function satisfying for all sufficiently close to . Furthermore, and represent the right-hand and left-hand derivatives of , respectively.
The following relation holds for all : (i) for all sufficiently close to ; (ii) consequently .
Proof. The functions and are of the form
for some indices and for all sufficiently close to . In (LABEL:eq:leftright_pieces), the latter equality follows from for all sufficiently close to by definition. The former equality is due to for all sufficiently close to implying is a weighted sum of of the analytic eigenvalues with weights . We rephrase the inequality as where , where .
Note that for each since are the largest eigenvalues. In particular, their sum cannot be less than the sum of . Therefore,
Part (ii) is immediate from part (i) due to , where the inequality follows from an application of the Taylor’s theorem to and