An Efficient Inexact Symmetric Gauss-Seidel Based Majorized ADMM for High-Dimensional Convex Composite Conic Programming

The research of the first author was supported by the China Scholarship Council while visiting the National University of Singapore and by the National Natural Science Foundation of China (Grant No. 11271117). The research of the second and third authors was supported in part by the Ministry of Education, Singapore, Academic Research Fund (Grant No. R-146-000-194-112).
In this paper, we propose an inexact multi-block ADMM-type first-order method for solving a class of high-dimensional convex composite conic optimization problems to moderate accuracy. The design of this method combines an inexact 2-block majorized semi-proximal ADMM and recent advances in the inexact symmetric Gauss-Seidel (sGS) technique for solving a multi-block convex composite quadratic programming problem whose objective contains a nonsmooth term involving only the first block-variable. One distinctive feature of our proposed method (the sGS-imsPADMM) is that it needs only one cycle of an inexact sGS method, instead of an unknown number of cycles, to solve each of the subproblems involved. With some simple and implementable error tolerance criteria, the cost of solving the subproblems can be greatly reduced, and many steps in the forward sweep of each sGS cycle can often be skipped, which further contributes to the efficiency of the proposed method. Global convergence as well as the iteration complexity in the non-ergodic sense is established. Preliminary numerical experiments on some high-dimensional linear and convex quadratic SDP problems with a large number of linear equality and inequality constraints are also provided. The results show that for the vast majority of the tested problems, the sGS-imsPADMM is 2 to 3 times faster than the directly extended multi-block ADMM with the aggressive step-length of 1.618, which is currently the benchmark among first-order methods for solving multi-block linear and quadratic SDP problems, even though its convergence is not guaranteed.
Keywords: Convex Conic Programming · Convex Quadratic Semidefinite Programming · Symmetric Gauss-Seidel · Alternating Direction Method of Multipliers · Majorization
MSC: 90C25 · 90C22 · 90C06 · 65K05
The objective of this paper is to design an efficient first-order method for solving the following high-dimensional convex composite quadratic conic programming problem to moderate accuracy:
where is a closed proper convex function, is a self-adjoint positive semidefinite linear operator, is a linear map, and are given data, is a closed convex cone, and and are two real finite dimensional Euclidean spaces, each equipped with an inner product and its induced norm . Here, the phrase “high-dimensional” means that the linear operators and/or are too large to be stored explicitly or to be factorized via the Cholesky decomposition. By introducing a slack variable , one can equivalently recast (1.1) as
Then, solving the dual of problem (1.2) is equivalent to solving
where is any subspace containing , is the dual cone of defined by , and is the Fenchel conjugate of the convex function . Particular examples of (1.1) include convex quadratic semidefinite programming (QSDP), convex quadratic programming (QP), nuclear semi-norm penalized least squares, and robust PCA (principal component analysis) problems. One may refer to lithesis (); li14 () and the references therein for a brief introduction to these examples.
Let and be given nonnegative integers, , , and , , be finite dimensional real Euclidean spaces each endowed with an inner product and its induced norm . Define and . Problem (1.3) falls within the following general convex composite programming model:
where and are two closed proper convex functions, and are continuously differentiable convex functions whose gradients are Lipschitz continuous. The linear mappings and are defined such that their adjoints are given by for , and for , where and are the adjoints of the linear maps and respectively. For notational convenience, in the subsequent discussions we define the functions and by and . For problem (1.3), one can express it in the generic form (1.4) by setting
There are various numerical methods available for solving problem (1.4). Among them, perhaps the first choice is the augmented Lagrangian method (ALM) pioneered by Hestenes hestenes69 (), Powell powell () and Rockafellar rockafellar (), if one does not attempt to exploit the composite structure in (1.4) to gain computational efficiency. Let be a given penalty parameter. The augmented Lagrangian function of problem (1.4) is defined as follows: for any ,
Given an initial point , the ALM consists of the following iterations:
where is the step-length. A key attractive property of the ALM and its inexact variants, including the inexact proximal point algorithm (PPA) obeying summable error tolerance criteria proposed by Rockafellar rockafellar (); rockafellar76 () and the inexact PPAs proposed by Solodov and Svaiter solodov99 (); solodov992 (); solodov00 () using relative error criteria, is their fast local linear convergence when the penalty parameter exceeds a certain threshold. However, it is generally difficult and expensive to solve the inner subproblems in these methods exactly or to high accuracy, especially in high-dimensional settings, due to the coupled quadratic term interacting with two nonsmooth functions in the augmented Lagrangian function. By exploiting the composite structure of (1.4), one may use the block coordinate descent (BCD) method to solve the subproblems in (1.5) inexactly. However, such a strategy can also be expensive, as the number of BCD-type cycles needed to solve each subproblem to the required accuracy can be large. In addition, it is not computationally economical to use the ALM during the early stage of solving problem (1.4), before the fast local linear convergence of the ALM has kicked in.
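To make the ALM iterations in (1.5) concrete, the following toy sketch (our own illustration, not the implementation discussed in this paper) applies the ALM with penalty parameter sigma and dual step-length tau to the scalar problem of minimizing (1/2)x^2 subject to x = 1, whose KKT solution is (x*, y*) = (1, -1); the inner subproblem here is solved exactly in closed form.

```python
# Augmented Lagrangian method (ALM) on a toy problem:
#   minimize (1/2) x^2  subject to  x = 1.
# sigma is the penalty parameter and tau the dual step-length.

def alm(sigma=1.0, tau=1.0, iters=60):
    y = 0.0  # Lagrange multiplier
    x = 0.0
    for _ in range(iters):
        # x-update: argmin_x (1/2)x^2 + y*(x - 1) + (sigma/2)*(x - 1)^2
        x = (sigma - y) / (1.0 + sigma)
        # multiplier update
        y = y + tau * sigma * (x - 1.0)
    return x, y

x, y = alm()  # converges linearly to the KKT pair (1, -1)
```

The inner minimization is trivial here; for the composite problem (1.4) it is exactly this subproblem that becomes expensive and must be solved inexactly.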
A natural alternative to the ALM, for solving linearly constrained 2-block convex optimization problems such as (1.4), is the alternating direction method of multipliers (ADMM) fortin83 (); Gabay:1976ff (); glo80book (); glo75 (), which solves for the two blocks of variables alternately in a Gauss-Seidel fashion (one may refer to eck12 () for a recent tutorial on the ADMM). Computationally, such a strategy can be beneficial because solving for one block while fixing the other in (1.5) is potentially easier than solving for both simultaneously. Just as in the case of the ALM and PPAs, one may also have to solve the ADMM subproblems approximately. Indeed, for this purpose, Eckstein and Bertsekas eckstein1992 () proposed the first inexact version of the ADMM based on the PPA theory, and Eckstein eckstein1994 () introduced a proximal ADMM (PADMM) to make the subproblems easier to solve. The inexact version of Eckstein’s PADMM and its extensions can be found in he02 (); li-inexact (); ng11 (), to name a few. These ADMM-type methods are very competitive for solving certain 2-block separable problems and have frequently been used to generate a good initial point to warm-start the ALM. However, in many other cases, such as the high-dimensional multi-block convex composite conic programming problem (1.1) and its dual (1.3), it can be very expensive to solve the ADMM subproblems (each a composite problem with smooth and nonsmooth terms in two or more blocks of variables) to high accuracy. Also, by using BCD-type methods to solve these subproblems, one still faces the same drawback as in solving the ALM subproblems, namely, an unknown number of inner BCD iteration cycles.
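A minimal sketch of the 2-block scheme (ours, with assumed parameter choices) is the classical ADMM applied to the scalar problem min (1/2)(x - a)^2 + |z| subject to x - z = 0, whose solution is the soft-thresholded value x* = z* = sign(a) * max(|a| - 1, 0); both subproblems here happen to have closed-form solutions.

```python
def soft_threshold(v, t):
    """Proximal mapping of t*|.| (soft-thresholding)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def admm(a, sigma=1.0, tau=1.0, iters=100):
    """2-block ADMM for: min (1/2)(x-a)^2 + |z|  s.t.  x - z = 0."""
    z, y = 0.0, 0.0
    x = 0.0
    for _ in range(iters):
        # x-update: argmin_x (1/2)(x-a)^2 + y*x + (sigma/2)*(x-z)^2
        x = (a - y + sigma * z) / (1.0 + sigma)
        # z-update: argmin_z |z| - y*z + (sigma/2)*(x-z)^2
        z = soft_threshold(x + y / sigma, 1.0 / sigma)
        # multiplier update with step-length tau
        y = y + tau * sigma * (x - z)
    return x, z

x, z = admm(3.0)  # both converge to the solution 2.0
```

In the setting of this paper, neither subproblem is available in closed form, which is precisely why inexact solves become essential.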
One strategy which may be adopted to alleviate the computational burden in solving the subproblems is to divide the variables in (1.4) into three or more blocks (depending on its composite structure), and to solve the resulting problems by a multi-block ADMM-type method (by directly extending the 2-block ADMM or PADMM to the multi-block setting). However, such a directly extended method may be non-convergent as was shown in chen14 (), even if the functions and are separable with respect to these blocks of variables, despite ample numerical results showing that it is often practically efficient and effective sun13 (). Thus, different strategies are called for to deal with the numerical difficulty just mentioned.
Our primary objective in this paper is to construct a multi-block ADMM-type method for solving high-dimensional multi-block convex composite optimization problems to moderate accuracy, with the essential flexibility that the inner subproblems are allowed to be solved only approximately. We should emphasize that this flexibility is essential because it gives us the freedom to solve the large-scale linear systems of equations (which typically arise in high-dimensional problems) approximately by an iterative solver such as the conjugate gradient method. Without such flexibility, one would be forced to modify the corresponding subproblem by adding an appropriately chosen “large” semi-proximal term so as to obtain a closed-form solution for the modified subproblem. But such a modification can sometimes significantly slow down the outer iterations, as we shall see later in our numerical experiments.
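For instance, the linear systems arising in the subproblems can be solved only approximately by the conjugate gradient (CG) method with a prescribed residual tolerance. The pure-Python sketch below (our own illustration on a tiny symmetric positive definite system) stops as soon as the residual norm falls below tol, which is exactly the kind of inexact solve the proposed method is designed to accommodate:

```python
def cg(matvec, b, tol=1e-8, maxit=200):
    """Conjugate gradient for A x = b (A symmetric positive definite),
    stopped when ||b - A x|| <= tol, i.e. an *inexact* solve."""
    n = len(b)
    x = [0.0] * n
    r = list(b)              # residual b - A x for x = 0
    p = list(r)
    rs = sum(ri * ri for ri in r)
    for _ in range(maxit):
        if rs ** 0.5 <= tol:
            break
        Ap = matvec(p)
        alpha = rs / sum(pi * qi for pi, qi in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# a small SPD test system (diagonally dominant, hence positive definite)
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
mv = lambda v: [sum(aij * vj for aij, vj in zip(row, v)) for row in A]
b = [1.0, 2.0, 3.0]
x = cg(mv, b, tol=1e-10)
```

In practice one would of course use an optimized sparse implementation; the point is only that loosening tol trades subproblem accuracy for speed.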
In this paper, we achieve our goal by proposing an inexact symmetric Gauss-Seidel (sGS) based majorized semi-proximal ADMM (named sGS-imsPADMM for easy reference) for solving (1.4), in which each subproblem needs only one cycle of an inexact sGS iteration instead of an unknown number of cycles. Our method is motivated by the works limajorize () and lisgs () in that it is developed via a novel integration of the majorization technique used in limajorize () with the inexact symmetric Gauss-Seidel iteration technique proposed in lisgs () for solving a convex minimization problem whose objective is the sum of a multi-block quadratic function and a nonsmooth function involving only the first block. However, non-trivially, we also design checkable inexact minimization criteria for solving the sPADMM subproblems while still being able to establish the convergence of the inexact method. Our convergence analysis relies on the key observation that the results in sun13 () and lisgs () are obtained by establishing the equivalence of their proposed algorithms to particular cases of the 2-block sPADMM in fazel13 () with some specially constructed semi-proximal terms. Owing to the inexact minimization criteria, the cost of solving the subproblems in our proposed algorithm can be greatly reduced. For example, one can now solve a very large linear system of equations via an iterative solver to an appropriate accuracy, instead of to the very high accuracy required by a method with no inexactness flexibility.
Moreover, by using the majorization technique, the two smooth functions and in (1.4) are allowed to be non-quadratic. Thus, the proposed method is capable of dealing with even more problems beyond the scope of convex quadratic conic programming. The success of our proposed inexact sGS-based ADMM-type method would thus also meet the pressing demand for an efficient algorithm to find a good initial point to warm-start the augmented Lagrangian method so as to quickly enjoy its fast local linear convergence.
To summarize, the main contribution of this paper is that, by taking advantage of the inexact sGS technique in lisgs (), we develop a simple, implementable and efficient inexact first-order algorithm, the sGS-imsPADMM, for solving high-dimensional multi-block convex composite conic optimization problems to moderate accuracy. We have also established the global convergence as well as the non-ergodic iteration complexity of our proposed method. Preliminary numerical experiments on a class of high-dimensional linear and convex quadratic SDP problems with a large number of linear equality and inequality constraints are also provided. The results show that, on average, the sGS-imsPADMM is 2 to 3 times faster than the directly extended multi-block ADMM even with the aggressive step-length of 1.618, which is currently the benchmark among first-order methods for solving multi-block linear and quadratic SDPs, though its convergence is not guaranteed.
The remaining parts of this paper are organized as follows. In Sec. 2, we present some preliminary results from convex analysis. In Sec. 3, we propose an inexact two-block majorized sPADMM (imsPADMM), which lays the foundation for the later algorithmic developments. In Sec. 4, we give a quick review of the inexact sGS technique developed in lisgs () and propose our sGS-imsPADMM algorithm for the multi-block composite optimization problem (1.4), which constitutes the main result of this paper. Moreover, we establish the relationship between the sGS-imsPADMM and the imsPADMM to substantially simplify the convergence analysis. In Sec. 5, we establish the global convergence of the imsPADMM, from which the convergence of the sGS-imsPADMM is also derived. In Sec. 6, we study the non-ergodic iteration complexity of the proposed algorithm. In Sec. 7, we present our numerical results, as well as some efficient computational techniques designed in our implementation. We conclude the paper in Sec. 8.
Let and be two arbitrary finite dimensional real Euclidean spaces each endowed with an inner product and its induced norm . For any linear map , we use to denote its adjoint and to denote its induced norm. For a self-adjoint positive semidefinite linear operator , there exists a unique self-adjoint positive semidefinite linear operator, denoted as , such that . For any , define and . Moreover, for any set , we define and denote the relative interior of by . For any , we have
Let be an arbitrary closed proper convex function. We use to denote its effective domain and to denote its subdifferential mapping. The proximal mapping of associated with is defined by , . It holds lemarechal () that
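As a simple illustration (ours, not from the paper): for the absolute-value function the proximal mapping is soft-thresholding, its Fenchel conjugate is the indicator of [-1, 1] whose proximal mapping is the projection onto that interval, and the Moreau decomposition prox_f(x) + prox_{f*}(x) = x ties the two together:

```python
def prox_abs(x, t=1.0):
    """Proximal mapping of t*|.|: soft-thresholding."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def proj_interval(x, lo=-1.0, hi=1.0):
    """Proximal mapping of the indicator of [lo, hi]: projection."""
    return min(max(x, lo), hi)

# Moreau decomposition: prox_f(x) + prox_{f*}(x) = x for f = |.|
for x in (-2.5, -0.3, 0.0, 0.7, 4.0):
    assert abs(prox_abs(x) + proj_interval(x) - x) < 1e-12
```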
We say that the Slater constraint qualification (CQ) holds for problem (1.4) if it holds that
When the Slater CQ holds, we know from (rocbook, , Corollaries 28.2.2 & 28.3.1) that is a solution to (1.4) if and only if there is a Lagrangian multiplier such that is a solution to the following Karush-Kuhn-Tucker (KKT) system
If satisfies (2.3), from (rocbook, , Corollary 30.5.1) we know that is an optimal solution to problem (1.4) and is an optimal solution to the dual of this problem. To simplify the notation, we denote and . The solution set of the KKT system (2.3) for problem (1.4) is denoted by .
3 An Inexact Majorized sPADMM
Since the two convex functions and in problem (1.4) are assumed to be continuously differentiable with Lipschitz continuous gradients, there exist two self-adjoint positive semidefinite linear operators and such that for and ,
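For reference, the standard quadratic majorization used in this setting takes the following form (the symbol names here are our own choice, standing in for the operators asserted in the text):

```latex
% Quadratic majorization of a smooth convex function f whose
% gradient is Lipschitz continuous: for all x, x',
%   f(x) \le \hat{f}(x; x'), where
\hat{f}(x; x') := f(x') + \langle \nabla f(x'),\, x - x' \rangle
                + \tfrac{1}{2}\, \| x - x' \|_{\widehat{\Sigma}_f}^{2},
```

with a self-adjoint positive semidefinite operator in place of the squared-norm weight; the majorized augmented Lagrangian below is then obtained by replacing the smooth functions in (1.5) with such majorants.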
Let . The majorized augmented Lagrangian function of problem (1.4) is defined by, for any and ,
Let and be two self-adjoint positive semidefinite linear operators such that
Suppose that is a sequence in . For convenience, we define the two functions and by
Now, we are ready to present our inexact majorized sPADMM for solving problem (1.4) and some relevant results.
Algorithm imsPADMM: An inexact majorized semi-proximal ADMM for solving problem (1.4).
Let be the step-length and be a summable sequence of nonnegative numbers. Choose the linear operators and such that and in (3.2). Let be the initial point. For , perform the following steps:
Step 1. Compute and such that and (3.3)
Step 2. Compute and such that and (3.4)
Step 3. Compute .
Noting that and , we can write Also, we have . By using (2.2) we can get
Thus, by using the Cauchy-Schwarz inequality, we can readily obtain that . Similarly, we can obtain that
This, together with and , implies that
and this completes the proof. ∎
4 An imsPADMM with Symmetric Gauss-Seidel Iteration
We first present some results on the one cycle inexact symmetric Gauss-Seidel (sGS) iteration technique introduced in lisgs (). Let be a given integer and with each being a finite dimensional real Euclidean space. For any we write . Let be a given self-adjoint positive semidefinite linear operator with the following block decomposition:
where are self-adjoint positive definite linear operators, , are linear maps. We also define . Note that and is positive definite. To simplify later discussions, for any , we denote , , . We also define the self-adjoint positive semidefinite linear operator by
Let be a given closed proper convex function and be a given vector. Consider the quadratic function defined by , . Let , be given error tolerance vectors with . Define
Suppose that is a given vector. We want to compute
We have the following result, established by Li, Sun and Toh in lisgs () to generalize and reformulate, in the inexact setting, their Schur complement based decomposition method in li14 (); it provides an equivalent, implementable procedure for computing . This result is essential for our subsequent algorithmic developments.
Proposition 4.1 (sGS decomposition)
We should note that the above sGS decomposition theorem is valid only when the (possibly nonsmooth) function depends solely on the first block variable , and it is not applicable if there is an additional nonsmooth convex function involving another block of variables. In the above proposition, one should interpret in (4.5) and in (4.6) as approximate solutions to the minimization problems without the terms involving and . Once these approximate solutions have been computed, they generate the error vectors and . With these known error vectors, and are in fact the exact solutions to the minimization problems in (4.5) and (4.6). It is important to emphasize that when solving the subproblems in the forward GS sweep in (4.6) for , we may try to estimate by using , in which case the corresponding error vector would be given by . To avoid solving the -th problem in (4.6), one may accept such an approximate solution if the corresponding error vector satisfies an admissible condition such as for some constant , say .
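In the purely quadratic case (no nonsmooth term), the sGS decomposition states that one backward sweep followed by one forward sweep, started from the current point x, yields exactly the minimizer of the quadratic plus the extra proximal term weighted by T = U D^{-1} U*, i.e. the sweep output x+ satisfies (Q + T) x+ = r + T x. The following toy check (our own construction, two scalar blocks) verifies this identity numerically:

```python
# Two scalar blocks: Q = [[d1, u], [u, d2]], D = diag(d1, d2),
# U = strictly upper block part, so T = U D^{-1} U^T = [[u^2/d2, 0], [0, 0]].
d1, d2, u = 4.0, 5.0, 2.0
r1, r2 = 1.0, 3.0
x1, x2 = 0.7, -0.3          # current iterate

# backward GS sweep: update the second block using the current first block
x2_half = (r2 - u * x1) / d2
# forward GS sweep: update the first block, then the second block again
x1_new = (r1 - u * x2_half) / d1
x2_new = (r2 - u * x1_new) / d2

# verify (Q + T) x_new = r + T x
t11 = u * u / d2
lhs1 = (d1 + t11) * x1_new + u * x2_new
lhs2 = u * x1_new + d2 * x2_new
assert abs(lhs1 - (r1 + t11 * x1)) < 1e-12
assert abs(lhs2 - r2) < 1e-12
```

This is why one sGS cycle per subproblem suffices: the cycle is not an approximation of a proximal step but is one, with a specially structured semi-proximal term.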
We now show how to apply the sGS iteration technique in Proposition 4.1 to the imsPADMM proposed in Sec. 3. We should note that in the imsPADMM, the main issue is how to choose and , and how to compute and . For later discussions, we use the following decompositions
for and , respectively, which are consistent with the decompositions of and .
First, we choose two self-adjoint positive semidefinite linear operators and for the purpose of making the minimization subproblems involving and easier to solve. We need and to satisfy the conditions that as well as . With appropriately chosen and , we can assume that the well-defined optimization problems
can be solved to arbitrary accuracy for any given and .
Next, for , we choose a linear operator such that and similarly, for , we choose a linear operator such that .
Now, we define the linear operators
Moreover, define and analogously as in (4.1) for and , respectively, and
Then, and . Moreover, we define the following linear operators:
where and are defined as in (4.2). Define the two constants
Based on the above discussions, we are ready to present the sGS-imsPADMM algorithm for solving problem (1.4).
Algorithm sGS-imsPADMM: An inexact sGS based majorized semi-proximal ADMM for solving problem (1.4).
Let be the step-length and be a summable sequence of nonnegative numbers. Let be the initial point. For , perform the following steps:
Step 1a. (Backward GS sweep) Compute for ,
Step 1b. (Forward GS sweep) Compute for ,