Lower Bounds for Ground States of Condensed Matter Systems
Abstract
Standard variational methods tend to obtain upper bounds on the ground state energy of quantum manybody systems. Here we study a complementary method that determines lower bounds on the ground state energy in a systematic fashion, scales polynomially in the system size and gives direct access to correlation functions. This is achieved by relaxing the positivity constraint on the density matrix and replacing it by positivity constraints on moment matrices, thus yielding a semidefinite programme. Further, the number of free parameters in the optimization problem can be reduced dramatically under the assumption of translational invariance. A novel numerical approach, principally a combination of a projected gradient algorithm with Dykstra’s algorithm, for solving the optimization problem in a memoryefficient manner is presented and a proof of convergence for this iterative method is given. Numerical experiments that determine lower bounds on the ground state energies for the Ising and Heisenberg Hamiltonians confirm that the approach can be applied to large systems, especially under the assumption of translational invariance.
I Background and Motivation
The determination of the ground state properties of strongly interacting quantum systems is of considerable interest. As only a very few models are exactly solvable, numerical approximation methods are of significant importance. A key challenge in this respect is the ‘curse of dimensionality’, namely the exponential growth of the Hilbert space dimension with the number of sites. Ingenious methods such as the density matrix renormalization group (DMRG) approach White92 (); Schollwoeck11 () can overcome this problem as they amount to formulating a variational problem over a wellchosen class of quantum states that can be parametrized efficiently, i.e. with a polynomial number of parameters RommerOstlund1995 (). As the variation is over a subclass of all possible quantum states, variational methods of this type provide upper bounds on the ground state energy. Assessing the quality of the so obtained upper bounds on the ground state energy is then difficult. Hence there is considerable interest in the development of efficient methods for the determination of lower bounds on the ground state energy. One straightforward approach in this direction consists in the determination of Anderson bounds AndersonLower (), i.e. the decomposition of a Hamiltonian and the subsequent determination of the ground state energy of each whose sum then yields a lower bound on the ground state energy of . While useful, this approach does not scale well with the system size as it remains exponential in the size of the support of the . The application of DMRG to the may overcome the scaling issue but amounts to finding an ‘upper bound to a lower bound’ and does not resolve the principal issue with DMRG.
As early as , however, it was suggested to avoid the use of wave functions altogether and rather consider reduced density matrices together with conditions that these arise from a valid quantum state husimi40 () – the socalled Nrepresentability problem Coleman (); Garrod (). Varying over the reduced density matrices that are compatible with a valid physical state then allows for the determination of the ground state energy. The exact ground state energy can be obtained if all conditions for the Nrepresentability problem are known. If only partial conditions are specified, then one minimizes over too large a set of reduced density matrices and one obtains a lower bound on the ground state energy. The analytical treatment of this problem is highly challenging and at the time numerical solutions were hard to come by. Further progress was made by Mazziotti mazziotti01 (); mazziotti07 (), who pointed out that the problem can be formulated as a semidefinite programme and use can be made of the guaranteed and certified convergence of the primaldual interiorpoint algorithm for semidefinite programming boyd09 (); vandenberghe96 (). A general mathematical framework for this type of optimization problems and applications was given in Pironio10 (). Unfortunately, the primaldual interiorpoint algorithm and related methods do come with a considerable overhead in computation time and, crucially, memory that is making it hard to scale to very large systems. Lately, new algorithms to address the optimization problems of finding lower bounds on ground state energies have been developed mazziotti04 (); mazziotti11 () and considerable improvements towards the interiorpoint algorithms have been demonstrated.
In this work, we are exploring further the idea of relaxations of the ground state energy finding problem that allow us to determine lower bounds on the ground state energy in an efficient manner. To this end we formulate the problem, introduce a number of simplifications including the explicit treatment of translation invariance and symmetries and point out additional potential routes for numerical simplifications. In addition, we present a novel algorithm, principally a combination of a projected gradient algorithm with Dykstra’s algorithm Dattorro (); deutsch01 (); Baumgratz (), for solving the resulting optimization problem in a memoryefficient manner and prove its convergence. Then, in order to demonstrate the capabilities of this approach, we present numerical experiments that determine lower bounds on the ground state energies for two onedimensional (1D) lattice models, namely the Ising and Heisenberg Hamiltonian, which confirm that the approach can be applied for large systems especially when implementing translational invariance. Note that although we restrict our numerical analysis to 1D lattice models it has been demonstrated that the variational determination of ground state energies considering only constraints on reduced density matrices applies to higher spatial dimensions mazziotti10 () and to nonperiodic systems such as molecules in quantum chemistry mazziotti04 () as well. We complete this work with a discussion and outlook.
Ii Basics
We begin by explaining the general ideas underlying the method for the determination of lower bounds on ground state energies following mazziotti01 () and then specialize to the case of fermionic systems, for which later sections will provide numerical examples. Analogous ideas can also be implemented for bosonic systems or general spinsystems.
ii.1 Generalities
The determination of the ground state energy of a manybody system composed of particles and described by the Hamilton operator can be formulated as the minimization problem
(1) 
This approach is pushed to its limits rather rapidly as it uses the positivity constraint on , which grows dramatically with the system size, e.g. for particles with spin or spinless fermions is a matrix. Hence, this problem cannot be solved efficiently due to the exponentially increasing parameter space. One technique to deal with this difficulty is to parametrize a specific family of candidates for the ground state by a set of parameters . One then minimizes the expectation value of the Hamiltonian with respect to this family of states. As this family of states does not encompass the entire state space, this method will find a state which is optimal in and which yields an upper bound on the true ground state energy. This is the route taken by a variety of variational methods including, for example, DMRG White92 (); Schollwoeck11 (); RommerOstlund1995 ().
To overcome the scaling issues and to provide lower rather than upper bounds on the ground state energy, we need to relax the positivity constraint , replacing it by a weaker set of constraints which scale only polynomially in the system’s size mazziotti01 (); Pironio10 (). This enlarges the class of states over which one varies the energy and hence yields lower bounds on the ground state energy. To this end we use the fact that
(2) 
for all operators . Demanding only
(3) 
for some operator does not, in general, imply . For some subset of operators , we can then define and the requirement that for all choices of is then equivalent to the positivity of the matrix with components , imposing a particular Nrepresentability condition. Hence, replacing in the original optimization problem eq. (1) the condition with the weaker condition and omitting the normalization , we find that
(4) 
The present formulation still contains the density operator explicitly in the computation of . Judicious choice of the set alleviates this shortcoming. Indeed, writing , where are the coefficients of the operators in the Hamiltonian , one ensures that the set contains all operators such that for all the expectation value is in . Note, furthermore, that the entries of will generally not be independent but suffer some linear equality constraints, for example due to (anti)commutation relations between operators. Denoting by the set of all Hermitian matrices that satisfy and any other linear constraints, the energy minimization problem can be written as mazziotti01 (); mazziotti07 ()
(5) 
which is evidently a semidefinite programme boyd09 () and efficiently solvable as long as the set contains a number of operators that scale polynomially in system size. The decomposition of into , and hence the choice of the set , is not unique and the optimal choice is not obvious. From a practical point of view however, the systematic choice presented in the remainder of the paper leading to the socalled ppositivity conditions mazziotti01 (); mazziotti07 () does perhaps provide the most straightforward formulation.
ii.2 Bosons and Fermions
For concreteness, let us now consider the class of bosonic or fermionic operators defined by
(6) 
where are arbitrary complex variables. The Nrepresentability conditions arising from this class of operators are called 1positivity and 2positivity conditions, which is a terminology used to reveal the powers of the creation and annihilation operators and in . In general, allowing polynomials of order p in the secondquantized operators leads to restrictions called ppositivity conditions mazziotti01 (); mazziotti07 (). For the particular choice of operators of the order of 2, requiring the positivity of is equivalent to
(7) 
where the matrices comprising the second moments are defined as follows:
(8) 
and for the third and fourth moments, we find the and matrices
(9)  
In addition to the positivity constraint on the Hermitian matrix , where denotes the vector space of all Hermitian matrices with appropriate dimension, there are linear constraints relating its entries. In fact, the fermionic anticommutator relations
(10) 
or the bosonic commutator relations
(11) 
force some entries of to be equal, to differ by sign or to be a linear function of others.
Now, let be the set of all positive semidefinite Hermitian matrices of the form eq. (7) and satisfying the additional requirement that the entries of obey the constraints given by the commutation or anticommutation relations. Further, let be a function expressing the expectation value of a Hamiltonian with up to 4mode terms, i.e. . Then, the original optimization problem eq. (1) can be replaced by
(12) 
where the relaxed optimization problem scales polynomially in the system size . Note that it is straightforward to extend this formulation to higher moments, e.g. 3positivity conditions. In the following, we are going to present scenarios and strategies where the number of free variables can be decreased considerably.
ii.3 Consequences of Superselection Rules and Particle Number Conservation for Fermions
For the remainder we will consider fermionic systems. In that case, superselection rules require that the Hamiltonian commutes with the parity operator given by . As a consequence, only Hamiltonians comprising terms with an even number of creation and annihilation operators are physically allowed. This in turn ensures that only expectation values of products of an even number of creation and annihilation operators are nonvanishing. Hence, we find that we can restrict to take the form
(13) 
and to satisfy the constraints dictated by the fermionic anticommutator relations. See appendix B for a list of constraints on the entries for a formulation for up to fourth moments in the fermionic case.
If the Hamiltonian conserves the particle number, a further simplification ensues. In this case, each term in the Hamiltonian will consist of an equal number of annihilation and creation operators. Hence, the Hamiltonian is invariant under the transformation and and we find that expectation values vanish unless they concern operators with the same number of annihilation and creation operators so that can be assumed to take the form
(14) 
Note that the elements of the matrices are coefficients of the oneelectron reduced density matrix (1RDM), while the matrices are representations of the twoelectron reduced density matrix (2RDM) mazziotti07 (). Restricting matrices of the form eq. (14) to be positive semidefinite is equivalent with the positivity of the individual matrices and . This constraint on the matrices containing the second moments defines the 1positivity conditions, while the positivity of the matrices containing the fourth moments generates the 2positivity conditions mazziotti01 (); mazziotti07 (). For concreteness we list the matrices comprising the constraints on a system composed of two spinless fermions and the additional assumption of particle number conservation in appendix B.
ii.4 Exploiting Translational Invariance
So far we considered optimization problems for up to 2positivity conditions which reduced the number of free variables to scale as . Apart from superselection rules and the particle number conservation, no specific assumption on the structure of the underlying Hamiltonian has been exploited such that in principle this procedure is applicable to a wide range of different systems, e.g. quantum chemistry calculations mazziotti04 () or higher spacial dimensions mazziotti10 (). Since our intention is to apply this method to large lattice models, we now discuss a further simplification of the system to decrease the number of parameters, namely the translational invariance. Translation invariance with periodic boundary conditions of the physical system reduces the number of free variables considerably. In fact for the second moments, i.e. for the matrices and , this can be exploited immediately as these matrices then obey , etc. Consequently, for translational invariant systems with periodic boundary conditions, the matrices comprising the second moments become circulant matrices horn91 () and can therefore be diagonalized by a discrete Fourier transform
(15) 
for . Reducing the number of variables for the fourth moments is not so transparent as for the second moments, but manageable. To do so, note that the ordering of the entries of the matrices and is not fixed but can be chosen arbitrarily. Remember that the indices of these matrices are of the form for . Rearranging the twodigit indices labeling the rows and columns in the following way:
(16) 
the matrices comprising the fourth moments decompose in a block structure where each block can be diagonalized by the unitary matrix given above. For example, we find for the matrix
(17) 
where with given by eq. (15) and
(18) 
with diagonal matrices , . Hence, for translational invariant systems the matrices contained in set will have the following form:
(19) 
where the matrices denoting the second moments in the upper left block are diagonal matrices and the matrices considering the fourth moments have a block diagonal structure as described above. These considerations reduce the complexity of the constraints considerably and hence make the numerical study feasible for large systems.
ii.5 Varying System Size for Higher Moments
As the treatment of the fourthorder correlations is computationally costly, it may in some cases be advantageous to consider fourthorder constraints on a smaller lattice, while the computationally cheaper secondorder constraints are treated on the full lattice. In particular, while and are matrices, and are matrices leading to the unfavourable scaling. To overcome this issue, it might therefore be beneficial to restrict higher moments to smaller subsystems; that is, for the second moments one would consider system size , while for fourth moments one would consider where . Reducing the support of the matrices and , on the one hand, will lead to disproportionate savings in memory and computation time and, on the other hand, will certainly lead to a decrease of the lower bounds.
Iii The Projected Gradient Algorithm
So far we have described how to relax the original optimization problem eq. (1) and discussed how to include symmetries to reduce the number of free variables. Now we are going to describe an iterative algorithm for addressing this minimization problem. Recall the minimization problem we have to face, that is,
(20) 
where again the function expresses and the set consists of all positive semidefinite Hermitian matrices fulfilling the constraints dictated by the (anti)commutator relations and potential other symmetries of the system. Note that the function is indeed linear in the elements of matrix and hence could be written as
(21) 
where is a Hermitian matrix and . Furthermore, the set is the intersection of the convex cone consisting of positive semidefinite matrices and the affine set defined by the linear constraints on the entries. Hence is convex, too boyd09 (). It is well known that this optimization problem is in fact a semidefinite programme and as such can be solved efficiently. Standard primaldual interior point methods for semidefinite programming come with a considerable overhead in memory and time which makes their application to large systems challenging. Recently, two algorithms addressing the problem of ground state energy estimation were developed which scale better than the conventional methods mazziotti04 (); mazziotti11 (). Here, we continue the research for efficient algorithms and formulate an alternative numerical method whose efficacy we demonstrate in later sections.
To address the issues described above, we suggest to solve the minimization problem by a projected gradient algorithm calamai87 (). Since this algorithm relies heavily on projections onto convex sets, we need to define the latter. Let be a vector space and be a convex set. The projection of onto is defined by
(22) 
where denotes the Frobenius or HilbertSchmidt norm. As the name suggests, the projection determines the nearest point in the convex set with respect to a given norm. With this, we claim that the solution of problem eq. (20) could be found by iterating
(23) 
where the algorithm is initialized by a Hermitian matrix and is an arbitrary real and positive number. Furthermore, denotes the projection onto set , i.e. the intersection of the positive semidefinite Hermitian matrices and the set given by the linear constraints. The scheme of this algorithm is visualized in figure 1.
We find that the following lemma holds.
Lemma 1
Let be a linear function of the form where and is a constant. Assume that the function is lower bounded on the closed convex set , i.e. there is an such that for all . Let be the sequence determined by the iterations of the algorithm eq. (23). Then where for all .
A detailed proof of this lemma is presented in appendix A.
So far we have shifted the problem of minimizing the function with respect to the convex set to the problem of computing the projection onto . As pointed out above, this set is the intersection of two convex sets. This complicates the computation of the projection because in principle one has to solve the optimization problem eq. (22) in each step of the algorithm. The most obvious method at hand to overcome this difficulty is to use Dykstra’s algorithm deutsch01 (); birgin05 (). Dykstra’s algorithm is an elaborate technique to compute the projection of onto the intersection of several convex sets by means of modified iterative projections onto the individual sets. To apply this procedure to the computation onto we need to determine the projections onto the set of positive semidefinite Hermitian matrices and onto the affine set given by the constraints imposed by the canonical (anti)commutation relations and other symmetries. The former can be implemented straightforwardly. For the latter we were able to determine formulae for the submatrices in eq. (13) and thus gain a method to compute efficiently. Note that since is convex, is unique for all deutsch01 (). Further, the proposed algorithm finds a lower bound on the ground state energy by only considering boundary points of the feasible set and hence is slightly related to the algorithm proposed in mazziotti11 ().
There are two parameters defining the accuracy of the estimated lower bound. In principle the algorithm (23) converges for perfect projections to a matrix minimizing the function in the limit , where is the iteration number. As for all iterative algorithms we need to introduce stopping criteria, on the one hand, for the projected gradient algorithm and, on the other hand, for Dykstra’s algorithm. For the latter we use the robust stopping criterion introduced in birgin05 () and determined by a small number . For the former one can exploit the fact that the dual problem of the optimization problem eq. (20), which we identify as the primal problem, can be solved using the projected gradient algorithm, too. Note that the primal problem can be written as
(24) 
where we identify the matrices and with the vectors and such that . The Hermitian matrices comprise the linear constraints dictated by the fermionic anticommutator relations such that the matrix is an element of the affine set for all . The dual problem of eq. (24) is then for the dual variable vandenberghe96 ()
(25) 
Note that the constraints of the dual problem force the solution to lie both in the positive semidefinite cone and in the affine set given by the linear equations. The intersection of these two convex sets is again convex and the dual problem becomes
(26) 
where denotes the intersection of the positive semidefinite cone with the set given by the linear constraints . Finally, lemma 1 shows that the projected gradient algorithm can be applied to solve the dual problem as well.
The benefit of the formulation as a primaldual problem is that the solution of the dual problem is a lower bound on the solution of the primal problem. Hence, solving the primal and dual problems simultaneously using the projected gradient algorithm provides us with an ingenious certificate for the accuracy of the solution. Consequently, one way to introduce a stopping criterion for the projected gradient algorithm is to initiate a small quantity and stop the computations when
(27) 
where is basically the dual function up to a constant. Note that eq. (27) is equivalent to the duality gap of the primal and dual problem and that for strictly feasible primal problems, i.e. for optimization problems of the form eq. (24) for which there exists an such that , the duality gap is zero if evaluated at optimal points vandenberghe96 (). For the optimization problem eq. (20), one can easily find a point in the feasible set which is strictly feasible. Consequently, we have , where and are solutions of the primal and dual problems, respectively, and hence eq. (27) suits perfectly as a stopping criterion. Due to the numerical cost to solve the primal and dual problems simultaneously, we decide on using a different stopping criterion for the projected gradient algorithm. We stop when the improvement per iteration step, i.e.
(28) 
falls below a predetermined small number . We implemented both routines for systems considering only second moments and numerical experiments suggest that for a sufficiently small number the latter stopping criterion yields results that are as good as in the primaldual formalism.
Another degree of freedom in the formulation of the algorithm is the step length . Numerical examples propose that for the first step in the direction of the negative gradient should be chosen to be large. For the remaining steps the computational cost could be reduced by choosing a small but appropriate value of remarkalgorithm ().
Iv Applications
In the remainder of this work, we provide a number of numerical examples that demonstrate that the proposed method can be applied to rather large systems and is capable of extracting accurate information concerning the ground state energy as well as the correlation functions. The purpose of what follows is not to improve lower bounds on unknown ground state energies, but rather to analyse the behaviour of the proposed algorithm for finite but large lattice models.
iv.1 Second Moments and the Ising Model
We begin our exposition of the numerical application of the algorithm with a simple, exactly solvable model for which the relaxation of the energy minimization problem has been analysed in great detail exploiting different numerical methods in Mazziotti09 (). The Hamiltonian we are considering as a first application of the projected gradient algorithm is given by
(29)  
where the operators describe spinless fermions. The ground state of this model coincides with that of the Ising model in a transverse magnetic field with periodic boundary conditions
(30) 
when and is even. This can be seen from the connection of eq. (30) with eq. (29) via a JordanWigner transformation Takahashi (); ReuterThesis (). To analyse a translational invariant system with periodic boundary conditions, we simply change the signs of the second and the fourth terms in the Hamiltonian eq. (29).
Now we require only constraints that are due to second moments, i.e. we require that
(31) 
and that the entries fulfil the constrains dictated by the anticommutator relations. Solving the relaxed minimization problem eq. (20) by applying the projected gradient algorithm and comparison of the soobtained lower bound to the exact ground state energy Takahashi (); ReuterThesis ()
(32) 
yields the results presented in table 1. Note that the ground state energy for the system given by the Hamiltonian eq. (29) is computed by the formula (32) with parameter , while one chooses for the energy of a translational invariant system with periodic boundary conditions.
N  1000  5000  10000  50000  100000  500000 

In fact, it can be shown directly that the ground state energy of the Ising model is exactly reproduced with the present method as the Ising model can be diagonalized by an orthogonal mode transformation mapping second moments to second moments, this making the transformed optimization problem straightforward to solve Cai (). Thus the so obtained lower bounds are, in this case, superior to other approaches such as Anderson lower bounds. In particular, solving the optimization problem for the nontranslational invariant Hamiltonian eq. (29) for even and negative yields the exact ground state energies for the translational invariant Ising Hamiltonian eq. (30). Even for this Hamiltonian one is able to solve the optimization problem for large particle numbers since one is only considering the 1positivity conditions which scale linearly in the system size.
iv.2 Fourth Moments and the Heisenberg model
While the Ising model is quasifree it is of considerable interest to study interacting models, such as the Heisenberg Hamiltonian, that contain, for example, fourthorder fermionic operators (see Mazziotti06 () for the study of another interacting fermionic system where a different numerical approach is employed). In the following, we consider the performance of the proposed algorithm in this situation. The Heisenberg Hamiltonian
(33) 
can be transformed to spinless fermions by the JordanWigner transformation. In this picture, the Hamiltonian reads
(34)  
Now we are in a position to compare the exact ground state energies of the Heisenberg model with the lower bounds obtained by applying the suggested algorithm. Indeed, for the Heisenberg Hamiltonian with parameter the lower bounds on the ground state energy lie very close to the actual energies. Table 2 presents some numerical results.
Lower bound  Ratio  

6  4.9872  4.9974  0.9979 
8  6.7499  6.7739  0.9964 
10  8.5161  8.5557  0.9954 
12  10.2842  10.3412  0.9945 
14  12.0534  12.1291  0.9938 
16  13.8235  13.9189  0.9931 
18  15.5940  15.7099  0.9926 
20  17.3649  17.5021  0.9922 
The range of applicability of the projected gradient algorithm can be extended considerably by analysing a system that is translationally invariant. To this end, we make the Hamiltonian in eq. (34) translationally invariant with periodic boundary conditions. The results for the ground state energy of this Hamiltonian are presented in the lefthand side of figure 2. As a reference, we depict the exact ground state energies for this model for up to . Additionally, the righthand side of figure 2 presents the nearestneighbour and nextnearestneighbour correlations determined by the gradient projection algorithm.
To illustrate the rate of convergence of the algorithm for fourth moments, we present the improvement per iteration of the function for the translational invariant JordanWigner transformed Heisenberg Hamiltonian describing sites in figure 3. It is noticeable that the algorithm converges to a good estimation in only a few steps and that the most timeconsuming factor is given by the projection onto the set .
Note that in all calculations we initialized the algorithm with the zero matrix since we wanted to test the algorithm without imposing prior knowledge. Generally, for translational invariant systems it is advantageous to complete the circulant submatrices obtained from a run with site number to circulant matrices describing sites and use these matrices as an initial matrix for .
Furthermore, since the matrices comprising the fourth moments in the translational invariant case decompose into blocks of diagonal matrices, it is possible to parallelize the projection onto the positive semidefinite matrices. In fact, this step in the computation is the most expensive since one has to diagonalize the matrices in each step. Parallelizing this computation might speed up the calculations. So far, no effort has been made to do so since our intention is rather to demonstrate that this technique yields good results.
V Conclusions
In this work, we have presented a method for the determination of lower bounds on the ground state energy of quantum manybody systems. To this end, we followed a large body of work to relax the general ground state energy finding problem to obtain a semidefinite programme that involves a number of variables that scale polynomially with the system size. To overcome unfavourable scaling in time and memory of standard primaldual interior point solvers, we combined a projected gradient algorithm with Dykstra’s algorithm to obtain access to larger system sizes and proved its convergence. For quasifree systems the method yields the exact ground state energies, while for general interacting models it provides very good lower bounds and direct access to correlation functions between sites. Different properties of the underlying system such as higher spatial dimensions mazziotti10 (), bosonic degrees of freedom mazziotti04_2 (), symmetries mazziotti05 () and translational invariance can be included straightforwardly and the relative weight of higher order correlation functions can be adjusted to optimize for computational performance.
We hope that further investigations will help to optimize this approach and thus elucidate its power and limitations.
Vi Acknowledgements
We acknowledge support from the Bundesministerium für Bildung und Forschung (FK 01BQ1012), the EU Integrating Project QESSENCE and the Alexander von Humboldt Foundation. Interesting discussions with K. Audenaert, J. Cai, M. Cramer, D.A. Mazziotti and M. Navascues are acknowledged.
Appendix A Proof of Convergence
The analysis of lemma 1 requires some basic properties of the projection operators defined in eq. (22). The following results can be found, for example, in deutsch01 (), theorems and .
Lemma 2
Let be the projection onto the convex set .

For all and all we have .

For all we have .
The following proof is motivated by calamai87 ().
Proof of lemma 1
Before we start with the main proof, we need to establish the relation
(35) 
for and where the sequence is defined via
(36) 
i.e. the projected gradient algorithm. To verify relation (35), note that with the first inequality of lemma 2, we find
(37) 
since . Hence we have
(38) 
and relation (35) is established. Note that with this expression the following holds for all :
(39) 
In a first step, we show that the sequence is monotonically decreasing. We find that
(40) 
where , which holds for all . Hence the sequence is monotonically decreasing. Since we assume that the function is lower bounded on , we then conclude that this sequence converges for all , i.e. for . It remains to be shown that is indeed the minimum of the function in .
In a second step, we show that
(41) 
Since is linear in , we have
(42) 
and hence
(43) 
Since the sequence converges, i.e. , we find that
(44) 
Inequality (35) then yields
(45) 
and hence
(46) 
Thus we have
(47) 
In a third step, we show that the elements are bounded. Let be such that for all . Then
since . Note that with lemma 2 and the fact that for all , one can show that the last two terms are greater than or equal to zero. Therefore
(49) 
and consequently
(50) 
which leads to
(51) 
We conclude that the elements are bounded. As a consequence we know that, since is closed, there is at least one converging subsequence for , where is an accumulation point of the sequence .
In a last step, we show that for any accumulation point it holds that for all . With lemma 2, we find for all
(52) 
Further, we have
(53) 
where we exploited the CauchySchwarz inequality. Finally, we find that
(54) 
holds for all . Now choose a converging subsequence such that . We already know that due to equation (44). Further, equation (47) claims that for . Additionally, since the sequence is bounded, we know that for , where for all ,
(55) 
where we exploited that the sequence is upper bounded by , see equation (51) , and where is a constant which may depend on arbitrary . Hence we find for all and all accumulation points