Generalized Symmetric ADMM for Separable Convex Optimization^{†}
^{†}The work was supported by the National Science Foundation of China under grants 11671318 and 11571178, the Science Foundation of Fujian Province of China under grant 2016J01028, and the National Science Foundation of the U.S.A. under grant 1522654.
Abstract
The Alternating Direction Method of Multipliers (ADMM) has been proved to be effective for solving separable convex optimization problems subject to linear constraints. In this paper, we propose a Generalized Symmetric ADMM (GSADMM), which updates the Lagrange multiplier twice with suitable stepsizes, to solve multi-block separable convex programming. This GSADMM partitions the data into two group variables so that one group consists of $p$ block variables while the other consists of $q$ block variables, where $p$ and $q$ are two integers. The two grouped variables are updated in a Gauss-Seidel scheme, while the variables within each group are updated in a Jacobi scheme, which makes the method very attractive for a big-data setting. By adding proper proximal terms to the subproblems, we specify the domain of the stepsizes that guarantees GSADMM to be globally convergent with a worst-case $\mathcal{O}(1/t)$ ergodic convergence rate. It turns out that our convergence domain of the stepsizes is significantly larger than other convergence domains in the literature. Hence, GSADMM is more flexible and attractive in choosing and using larger stepsizes for the dual variable. Besides, two special cases of GSADMM, which allow zero proximal terms, are also discussed and analyzed. Compared with several state-of-the-art methods, preliminary numerical experiments on solving a sparse matrix minimization problem arising in statistical learning show that our proposed method is effective and promising.
Keywords:
Separable convex programming · Multiple blocks · Parameter convergence domain · Alternating direction method of multipliers · Global convergence · Complexity · Statistical learning

MSC: 65C60 · 65E05 · 68W40 · 90C06
1 Introduction
We consider the following grouped multi-block separable convex programming problem

(1) $\min\ \sum_{i=1}^{p}\theta_i(x_i)+\sum_{j=1}^{q}\psi_j(y_j)\quad\text{s.t.}\quad \sum_{i=1}^{p}A_ix_i+\sum_{j=1}^{q}B_jy_j=c,\quad x_i\in\mathcal{X}_i,\ y_j\in\mathcal{Y}_j,$

where $\theta_i:\mathbb{R}^{m_i}\to\mathbb{R}$ and $\psi_j:\mathbb{R}^{d_j}\to\mathbb{R}$ are closed and proper convex functions (possibly nonsmooth); $A_i\in\mathbb{R}^{n\times m_i}$, $B_j\in\mathbb{R}^{n\times d_j}$ and $c\in\mathbb{R}^{n}$ are given matrices and vectors, respectively; $\mathcal{X}_i\subseteq\mathbb{R}^{m_i}$ and $\mathcal{Y}_j\subseteq\mathbb{R}^{d_j}$ are closed convex sets; and $p$ and $q$ are two integers. Throughout this paper, we assume that the solution set of the problem (1) is nonempty and all the matrices $A_i$, $i=1,\ldots,p$, and $B_j$, $j=1,\ldots,q$, have full column rank. And in the following, we denote $x=(x_1,\ldots,x_p)$, $y=(y_1,\ldots,y_q)$, $\mathcal{A}=[A_1,\ldots,A_p]$, $\mathcal{B}=[B_1,\ldots,B_q]$, and $w=(x,y,\lambda)$.
In the last few years, the problem (1) has been extensively investigated due to its wide applications in different fields, such as the sparse inverse covariance estimation problem RothmanBickel2008 () in finance and statistics, the model updating problem DongYuTian2015 () in the design of vibration structural dynamic systems and bridges, the low-rank and sparse representation problem LiuLiBai2017 () in image processing, and so forth. One standard way to solve the problem (1) is the classical Augmented Lagrangian Method (ALM) Hestenes1969 (), which minimizes the following augmented Lagrangian function
$$\mathcal{L}_\beta(x,y,\lambda)=\mathcal{L}(x,y,\lambda)+\frac{\beta}{2}\Big\|\sum_{i=1}^{p}A_ix_i+\sum_{j=1}^{q}B_jy_j-c\Big\|^2,$$
where $\beta>0$ is a penalty parameter for the equality constraint and

(2) $\mathcal{L}(x,y,\lambda)=\sum_{i=1}^{p}\theta_i(x_i)+\sum_{j=1}^{q}\psi_j(y_j)-\lambda^{\mathsf T}\Big(\sum_{i=1}^{p}A_ix_i+\sum_{j=1}^{q}B_jy_j-c\Big)$

is the Lagrangian function of the problem (1) with the Lagrange multiplier $\lambda\in\mathbb{R}^n$. Then, the ALM procedure for solving (1) can be described as follows:
$$\begin{cases}(x^{k+1},y^{k+1})=\arg\min\limits_{x_i\in\mathcal{X}_i,\,y_j\in\mathcal{Y}_j}\mathcal{L}_\beta(x,y,\lambda^k),\\[2pt] \lambda^{k+1}=\lambda^k-\beta\Big(\sum_{i=1}^{p}A_ix_i^{k+1}+\sum_{j=1}^{q}B_jy_j^{k+1}-c\Big).\end{cases}$$
However, ALM does not make full use of the separable structure of the objective function of (1) and hence could not take advantage of the special properties of the component objective functions $\theta_i$ and $\psi_j$ in (1). As a result, in many recent real applications involving big data, solving the subproblems of ALM becomes very expensive.
One effective approach to overcome such difficulty is the Alternating Direction Method of Multipliers (ADMM), which was originally proposed in GlowinskiMarrocco1975 () and can be regarded as a splitting version of ALM. At each iteration, ADMM first sequentially optimizes over each block variable while fixing all the other block variables, and then updates the Lagrange multiplier. A natural extension of ADMM for solving the multi-block problem (1) takes the following iterations:

(3) $\begin{cases}x_i^{k+1}=\arg\min\limits_{x_i\in\mathcal{X}_i}\mathcal{L}_\beta(x_1^{k+1},\ldots,x_{i-1}^{k+1},x_i,x_{i+1}^{k},\ldots,x_p^{k},y^k,\lambda^k),& i=1,\ldots,p,\\ y_j^{k+1}=\arg\min\limits_{y_j\in\mathcal{Y}_j}\mathcal{L}_\beta(x^{k+1},y_1^{k+1},\ldots,y_{j-1}^{k+1},y_j,y_{j+1}^{k},\ldots,y_q^{k},\lambda^k),& j=1,\ldots,q,\\ \lambda^{k+1}=\lambda^k-\beta\Big(\sum_{i=1}^{p}A_ix_i^{k+1}+\sum_{j=1}^{q}B_jy_j^{k+1}-c\Big).\end{cases}$
Obviously, the scheme (3) is a serial algorithm which uses the newest information of the variables at each iteration. Although the above scheme was proved to be convergent for the two-block, i.e., $p+q=2$, separable convex minimization (see HeYuan2012 ()), as shown in Chen2016 (), the direct extension of ADMM (3) for the multi-block case, i.e., $p+q\geq 3$, without proper modifications is not necessarily convergent. Another natural extension of ADMM is to use the Jacobian fashion, where all the block variables are updated simultaneously at each iteration, that is,

(4) $\begin{cases}x_i^{k+1}=\arg\min\limits_{x_i\in\mathcal{X}_i}\mathcal{L}_\beta(x_1^{k},\ldots,x_{i-1}^{k},x_i,x_{i+1}^{k},\ldots,x_p^{k},y^k,\lambda^k),& i=1,\ldots,p,\\ y_j^{k+1}=\arg\min\limits_{y_j\in\mathcal{Y}_j}\mathcal{L}_\beta(x^{k},y_1^{k},\ldots,y_{j-1}^{k},y_j,y_{j+1}^{k},\ldots,y_q^{k},\lambda^k),& j=1,\ldots,q,\\ \lambda^{k+1}=\lambda^k-\beta\Big(\sum_{i=1}^{p}A_ix_i^{k+1}+\sum_{j=1}^{q}B_jy_j^{k+1}-c\Big).\end{cases}$
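To make the alternating structure concrete, the following toy sketch (our own construction, not taken from the paper) runs the classical Gauss-Seidel ADMM on a two-block problem, where convergence is guaranteed; the problem data $a$, $b$, $c$ and the closed-form subproblem solutions are illustrative assumptions.

```python
# Toy two-block instance:  min 0.5*(x-a)^2 + 0.5*(y-b)^2  s.t.  x + y = c.
# Both subproblems of the augmented Lagrangian
#   0.5*(x-a)^2 + 0.5*(y-b)^2 - lam*(x+y-c) + 0.5*beta*(x+y-c)^2
# have closed-form minimizers, so the ADMM sweep is a few scalar updates.

def admm_two_block(a, b, c, beta=1.0, iters=200):
    x = y = lam = 0.0
    for _ in range(iters):
        # x-step: first-order condition (x - a) - lam + beta*(x + y - c) = 0
        x = (a + lam - beta * (y - c)) / (1.0 + beta)
        # y-step: uses the newest x (Gauss-Seidel order)
        y = (b + lam - beta * (x - c)) / (1.0 + beta)
        # dual update with unit stepsize
        lam = lam - beta * (x + y - c)
    return x, y, lam
```

For instance, `admm_two_block(1.0, 2.0, 5.0)` converges to the unique KKT point $(x^*,y^*,\lambda^*)=(2,3,1)$ of this toy instance.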
As shown in HeHouYuan2015 (), however, the Jacobian scheme (4) is not necessarily convergent either. To ensure convergence, He et al. HeTaoYuan2015 () proposed a novel ADMM-type splitting method which, by adding certain proximal terms, allows some of the subproblems to be solved in parallel, i.e., in a Jacobian fashion. In HeTaoYuan2015 (), some sparse and low-rank models and image inpainting problems were tested to verify the efficiency of their method.
More recently, a Symmetric ADMM (SADMM) was proposed by He et al. HeMaYuan2016 () for solving the two-block (i.e., $p=q=1$) separable convex minimization, where the algorithm performs the following updating scheme:

(5) $\begin{cases}x^{k+1}=\arg\min\limits_{x\in\mathcal{X}}\mathcal{L}_\beta(x,y^k,\lambda^k),\\ \lambda^{k+\frac12}=\lambda^k-\tau\beta\big(Ax^{k+1}+By^{k}-c\big),\\ y^{k+1}=\arg\min\limits_{y\in\mathcal{Y}}\mathcal{L}_\beta(x^{k+1},y,\lambda^{k+\frac12}),\\ \lambda^{k+1}=\lambda^{k+\frac12}-s\beta\big(Ax^{k+1}+By^{k+1}-c\big),\end{cases}$

and the stepsizes $(\tau,s)$ were restricted into the domain

(6) $\mathcal{D}=\Big\{(\tau,s)\ \Big|\ \tau\in(-1,1),\ s\in\big(0,\tfrac{1+\sqrt{5}}{2}\big),\ \tau+s>0,\ |\tau|<1+s-s^2\Big\}$

in order to ensure its global convergence.
in order to ensure its global convergence. The main improvement of HeMaYuan2016 () is that the scheme (5) largely extends the domain of the stepsizes of other ADMMtype methods HeLiuWangYuan2014 (). What’s more, the numerical performance of SADMM on solving the widely used basis pursuit model and the totalvariational image debarring model significantly outperforms the original ADMM in both the CPU time and the number of iterations. Besides, Gu, et al.GuJiang2015 () also studied a semiproximalbased strictly contractive PeacemanRachford splitting method, that is (5) with two additional proximal penalty terms for the and update. But their method has a nonsymmetric convergence domain of the stepsize and still focuses on the twoblock case problem, which limits its applications for solving largescale problems with multiple block variables.
Mainly motivated by the work of HeTaoYuan2015 (); HeMaYuan2016 (); GuJiang2015 (), we would like to generalize SADMM with a wider convergence domain of the stepsizes to tackle the multi-block separable convex programming model (1), which appears frequently in recent applications involving big data Chandrasekaran2012 (); Ma2017 (). Our algorithm framework can be described as follows:
(7) $\begin{cases}x_i^{k+1}=\arg\min\limits_{x_i\in\mathcal{X}_i}\Big\{\mathcal{L}_\beta(x_1^{k},\ldots,x_i,\ldots,x_p^{k},y^k,\lambda^k)+\frac{\mu\beta}{2}\|A_i(x_i-x_i^k)\|^2\Big\},& i=1,\ldots,p,\\ \lambda^{k+\frac12}=\lambda^k-\tau\beta\Big(\sum_{i=1}^{p}A_ix_i^{k+1}+\sum_{j=1}^{q}B_jy_j^{k}-c\Big),\\ y_j^{k+1}=\arg\min\limits_{y_j\in\mathcal{Y}_j}\Big\{\mathcal{L}_\beta(x^{k+1},y_1^{k},\ldots,y_j,\ldots,y_q^{k},\lambda^{k+\frac12})+\frac{\nu\beta}{2}\|B_j(y_j-y_j^k)\|^2\Big\},& j=1,\ldots,q,\\ \lambda^{k+1}=\lambda^{k+\frac12}-s\beta\Big(\sum_{i=1}^{p}A_ix_i^{k+1}+\sum_{j=1}^{q}B_jy_j^{k+1}-c\Big).\end{cases}$

In the above Generalized Symmetric ADMM (GSADMM), $\tau$ and $s$ are two stepsize parameters satisfying

(8) $(\tau,s)\in\mathcal{K}=\big\{(\tau,s)\ \big|\ \tau+s>0,\ \tau\leq 1,\ -\tau^2-s^2-\tau s+\tau+s+1>0\big\},$

and $\mu$ and $\nu$ are two proximal parameters^{1}^{1}1 Note that these two parameters are strictly positive in (7). In Section 4, however, we analyze two special cases of GSADMM allowing either $\mu$ or $\nu$ to be zero. for the regularization terms $\frac{\mu\beta}{2}\|A_i(x_i-x_i^k)\|^2$ and $\frac{\nu\beta}{2}\|B_j(y_j-y_j^k)\|^2$. He and Yuan Heyuan2015 () also investigated the above GSADMM (7) but with a restricted choice of the stepsizes, which does not exploit the advantages of using the flexible stepsizes given in (8) to improve its convergence.
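The grouped Gauss-Seidel/Jacobi structure of (7) can be sketched on a scalar toy instance (our own construction with $p=q=2$, $A_i=B_j=1$; all parameter values below are illustrative assumptions, not recommendations from the paper):

```python
# GSADMM sketch on  min sum_i 0.5*(x_i-a_i)^2 + sum_j 0.5*(y_j-b_j)^2
#                   s.t. x_1 + x_2 + y_1 + y_2 = c.
# Within each group the blocks are updated in Jacobi fashion (all from the
# previous iterate), with scalar proximal terms (mu*beta/2)*(x_i - x_i^k)^2.

def gsadmm(a, b, c, beta=1.0, tau=0.5, s=0.5, mu=2.0, nu=2.0, iters=20000):
    x, y, lam = [0.0, 0.0], [0.0, 0.0], 0.0
    for _ in range(iters):
        # x-group: Jacobi updates (each x_i sees only iterate-k values)
        x = [(a[i] + lam - beta * (sum(x) - x[i] + sum(y) - c)
              + mu * beta * x[i]) / (1.0 + beta + mu * beta)
             for i in range(2)]
        # first dual update with stepsize tau (y still at iterate k)
        lam -= tau * beta * (sum(x) + sum(y) - c)
        # y-group: Jacobi updates, using the new x and intermediate multiplier
        y = [(b[j] + lam - beta * (sum(x) + sum(y) - y[j] - c)
              + nu * beta * y[j]) / (1.0 + beta + nu * beta)
             for j in range(2)]
        # second dual update with stepsize s
        lam -= s * beta * (sum(x) + sum(y) - c)
    return x, y, lam
```

Since each list comprehension reads only the previous iterate of its own group, the two inner loops could be executed in parallel, which is the point emphasized below for big-data settings.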
Major contributions of this paper can be summarized as the following four aspects:

Firstly, the new GSADMM could deal with the multi-block separable convex programming problem (1), while the original SADMM in HeMaYuan2016 () only works for the two-block case and may not be convenient for solving large-scale problems. In addition, the convergence domain for the stepsizes in (8), shown in Fig. 1, is significantly larger than the domain given in (6) and the convergence domains in GuJiang2015 (); Heyuan2015 (). For example, the stepsize $\tau$ can take the value $1$ whenever $s\in(0,1)$, which is excluded by (6). Moreover, the above domain in (8) is later enlarged to a symmetric domain defined in (73), shown in Fig. 2. Numerical experiments in Sec. 5.2.1 also validate that using more flexible and relatively larger stepsizes can often improve the convergence speed of GSADMM. On the other hand, we can see that when $\tau=0$, the stepsize $s$ can be chosen in the interval $\big(0,\frac{1+\sqrt{5}}{2}\big)$, which was firstly suggested by Fortin and Glowinski in FortinGlowinski1983 (); Glowinski1984 ().

Secondly, the global convergence of GSADMM as well as its worst-case $\mathcal{O}(1/t)$ ergodic convergence rate are established. What's more, the total $p+q$ block variables are partitioned into two grouped variables. While a Gauss-Seidel fashion is taken between the two grouped variables, the block variables within each group are updated in a Jacobi scheme. Hence, parallel computing can be implemented for updating the variables within each group, which could be critical in some scenarios for problems involving big data.

Thirdly, we discuss two special cases of GSADMM, namely, (7) with $p=1$ and $\mu=0$ or with $q=1$ and $\nu=0$. These two special cases of GSADMM were not discussed in Heyuan2015 () and in fact, to the best of our knowledge, they have not been studied in the literature. We show that the convergence domain of the stepsizes for these two cases is still the one defined in (8), which is larger than the domain (6).

Finally, numerical experiments are performed on solving a sparse matrix optimization problem arising from statistical learning. We investigate the effects of the stepsizes and the penalty parameter $\beta$ on the performance of GSADMM. Our numerical experiments demonstrate that, by properly choosing the parameters, GSADMM could perform significantly better than other recent popular methods developed in BaiLiLi2017 (); HeTaoYuan2015 (); HeXuYuan2016 (); WangSong2017 ().
The paper is organized as follows. In Section 2, some preliminaries are given to reformulate the problem (1) into a variational inequality and to interpret the GSADMM (7) as a prediction-correction procedure. Section 3 investigates some properties of two particular symmetric matrices arising in the analysis and provides a lower bound that is essential for the convergence proof. Then, we establish the global convergence of GSADMM and show its convergence rate in an ergodic sense. In Section 4, we discuss two special cases of GSADMM, in which either of the proximal parameters $\mu$ or $\nu$ is allowed to be zero. Some preliminary numerical experiments are reported in Section 5. We finally make some conclusions in Section 6.
1.1 Notation
Let $\mathbb{R}$, $\mathbb{R}^n$ and $\mathbb{R}^{m\times n}$ denote the set of real numbers, the set of $n$-dimensional real column vectors and the set of $m\times n$ real matrices, respectively. For any $x,y\in\mathbb{R}^n$, $\langle x,y\rangle=x^{\mathsf T}y$ denotes their inner product and $\|x\|=\sqrt{x^{\mathsf T}x}$ denotes the Euclidean norm of $x$, where the superscript $\mathsf T$ is the transpose. Given a symmetric matrix $G$, we define $\|x\|_G^2=x^{\mathsf T}Gx$. Note that with this convention, $\|x\|_G^2$ is not necessarily nonnegative unless $G$ is a positive definite matrix ($G\succ 0$). For convenience, we use $I$ and $0$ to stand respectively for the identity matrix and the zero matrix with proper dimensions throughout the context.
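A quick numerical illustration (our own, with hypothetical matrices) of why $\|x\|_G^2=x^{\mathsf T}Gx$ can be negative for an indefinite symmetric $G$:

```python
# Evaluate x^T G x for a symmetric matrix G given as nested lists.
def g_norm_sq(G, x):
    n = len(x)
    return sum(x[i] * G[i][j] * x[j] for i in range(n) for j in range(n))

G_pd  = [[2.0, 0.0], [0.0, 1.0]]   # positive definite: x^T G x > 0 for x != 0
G_ind = [[1.0, 0.0], [0.0, -1.0]]  # indefinite: the "norm" can go negative
x = [1.0, 2.0]
print(g_norm_sq(G_pd, x))   # 6.0
print(g_norm_sq(G_ind, x))  # -3.0
```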
2 Preliminaries
In this section, we first use a variational inequality to characterize the solution set of the problem (1). Then, we show that GSADMM (7) can be treated as a prediction-correction procedure involving a prediction step and a correction step.
2.1 Variational reformulation of (1)
We begin with the following standard lemma whose proof can be found in HeMaYuan2016 (); He2016 ().
Lemma 1
Let $f$ and $g$ be two convex functions defined on a closed convex set $\mathcal{X}$, and let $g$ be differentiable. Suppose that the solution set $\mathcal{X}^*=\arg\min\{f(x)+g(x)\mid x\in\mathcal{X}\}$ is nonempty. Then, we have $x^*\in\mathcal{X}^*$ if and only if $x^*\in\mathcal{X}$ and
$$f(x)-f(x^*)+(x-x^*)^{\mathsf T}\nabla g(x^*)\geq 0,\quad\forall\, x\in\mathcal{X}.$$
It is well-known in optimization that a triple point $w^*=(x^*,y^*,\lambda^*)$ is called a saddle point of the Lagrangian function (2) if it satisfies
$$\mathcal{L}(x^*,y^*,\lambda)\leq\mathcal{L}(x^*,y^*,\lambda^*)\leq\mathcal{L}(x,y,\lambda^*),\quad\forall\, x_i\in\mathcal{X}_i,\ y_j\in\mathcal{Y}_j,\ \lambda\in\mathbb{R}^n,$$
which can be also characterized as
$$\begin{cases}x_i^*=\arg\min\limits_{x_i\in\mathcal{X}_i}\mathcal{L}(x_1^*,\ldots,x_i,\ldots,x_p^*,y^*,\lambda^*),& i=1,\ldots,p,\\ y_j^*=\arg\min\limits_{y_j\in\mathcal{Y}_j}\mathcal{L}(x^*,y_1^*,\ldots,y_j,\ldots,y_q^*,\lambda^*),& j=1,\ldots,q,\\ \lambda^*=\arg\max\limits_{\lambda\in\mathbb{R}^n}\mathcal{L}(x^*,y^*,\lambda).\end{cases}$$
Then, by Lemma 1, the above saddle-point equations can be equivalently expressed as

(9) $\begin{cases}\theta_i(x_i)-\theta_i(x_i^*)+(x_i-x_i^*)^{\mathsf T}(-A_i^{\mathsf T}\lambda^*)\geq 0,\ \forall\, x_i\in\mathcal{X}_i,& i=1,\ldots,p,\\ \psi_j(y_j)-\psi_j(y_j^*)+(y_j-y_j^*)^{\mathsf T}(-B_j^{\mathsf T}\lambda^*)\geq 0,\ \forall\, y_j\in\mathcal{Y}_j,& j=1,\ldots,q,\\ \sum_{i=1}^{p}A_ix_i^*+\sum_{j=1}^{q}B_jy_j^*-c=0.\end{cases}$

Rewriting (9) in a more compact variational inequality (VI) form, we have

(10) $w^*\in\Omega,\quad \vartheta(u)-\vartheta(u^*)+(w-w^*)^{\mathsf T}F(w^*)\geq 0,\quad\forall\, w\in\Omega,$

where $u=(x,y)$, $w=(x,y,\lambda)$, $\vartheta(u)=\sum_{i=1}^{p}\theta_i(x_i)+\sum_{j=1}^{q}\psi_j(y_j)$, $\Omega=\mathcal{X}_1\times\cdots\times\mathcal{X}_p\times\mathcal{Y}_1\times\cdots\times\mathcal{Y}_q\times\mathbb{R}^n$,
and
$$F(w)=\begin{pmatrix}-\mathcal{A}^{\mathsf T}\lambda\\ -\mathcal{B}^{\mathsf T}\lambda\\ \sum_{i=1}^{p}A_ix_i+\sum_{j=1}^{q}B_jy_j-c\end{pmatrix}.$$
Noticing that the affine mapping $F(\cdot)$ is skew-symmetric, we immediately get

(11) $(w_1-w_2)^{\mathsf T}\big(F(w_1)-F(w_2)\big)=0,\quad\forall\, w_1,w_2.$

Hence, (10) can be also rewritten as

(12) $w^*\in\Omega,\quad \vartheta(u)-\vartheta(u^*)+(w-w^*)^{\mathsf T}F(w)\geq 0,\quad\forall\, w\in\Omega.$
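The identity (11) is a one-line consequence of skew-symmetry; a short verification sketch (using our own block labeling of the affine mapping, which may differ cosmetically from the paper's notation):

```latex
% F(w) = Mw + e with a skew-symmetric coefficient matrix:
%   M = [ 0   0  -A^T ;
%         0   0  -B^T ;
%         A   B   0   ],   so that  M^T = -M.
% Since z^T M z = 0 for every z whenever M^T = -M, for all w_1, w_2:
\begin{align*}
(w_1 - w_2)^\top \bigl( F(w_1) - F(w_2) \bigr)
  = (w_1 - w_2)^\top M (w_1 - w_2) = 0 .
\end{align*}
```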
Because of the nonempty assumption on the solution set of (1), the solution set $\Omega^*$ of the variational inequality (12) is also nonempty and convex; see, e.g., Theorem 2.3.5 of FacchineiPang2003 () for more details. The following theorem, given by Theorem 2.1 of HeYuan2012 (), provides a concrete way to characterize the set $\Omega^*$.
Theorem 2.1
The solution set $\Omega^*$ of the variational inequality (12) is convex and can be expressed as
$$\Omega^*=\bigcap_{w\in\Omega}\Big\{\tilde w\in\Omega\ \Big|\ \vartheta(u)-\vartheta(\tilde u)+(w-\tilde w)^{\mathsf T}F(w)\geq 0\Big\}.$$
2.2 A predictioncorrection interpretation of GSADMM
Following a similar approach in HeMaYuan2016 (), we next interpret GSADMM as a predictioncorrection procedure. First, let
(13) 
(14) 
and
(15) 
Then, by using the above notations, we derive the following basic lemma.
Lemma 2
Proof
Omitting some constants, it is easy to verify that the $x_i$-subproblem of GSADMM can be written as
Hence, by Lemma 1, we have $x_i^{k+1}\in\mathcal{X}_i$ and
for any $x_i\in\mathcal{X}_i$. So, by the definitions in (13) and (14), we get
(20) 
for any $x_i\in\mathcal{X}_i$. By the way of generating $\lambda^{k+\frac12}$ in (7) and the definition in (14), the following relation holds
(21) 
Similarly, the $y_j$-subproblem of GSADMM can be written as
Hence, by Lemma 1, we have $y_j^{k+1}\in\mathcal{Y}_j$ and
for any $y_j\in\mathcal{Y}_j$. This inequality, by using (21) and the definitions in (13) and (14), can be rewritten as
(22) 
for any . Besides, the equality (14) can be rewritten as
which is equivalent to
(23) 
Lemma 3
For the sequences $\{w^k\}$ and $\{\tilde w^k\}$ generated by GSADMM, the following equality holds
(24) 
where
(25) 
Proof
3 Convergence analysis of GSADMM
Comparing (12) with (16), the key to proving the convergence of GSADMM is to verify that the extra term in (16) converges to zero.
In this section, we first investigate some properties of the sequence generated by GSADMM. Then, we provide a lower bound of the key quadratic term appearing in the analysis. Based on these properties, the global convergence and the worst-case convergence rate of GSADMM are established in the end.
3.1 Properties of the generated sequence
It follows from (11) and (16) that
(26) 
Suppose $\tau+s>0$. Then, the matrix defined in (25) is nonsingular. So, by (24) and a direct calculation, the right-hand term of (26) can be rewritten as
(27) 
where
(28) 
with defined in (18) and
(29) 
The following lemma shows that the matrix defined in (28) is positive definite for proper choices of the parameters.
Lemma 4
The matrix defined in (28) is symmetric positive definite if
(30) 
Proof
By the block structure of the matrix in (28), we only need to show that its diagonal blocks are positive definite if the parameters satisfy (30). Note that
(31) 
where
(32) 
If the parameters satisfy (30), the matrix defined in (32) is positive definite. Then, it follows from (31) that the corresponding diagonal block is positive definite if, in addition, all $A_i$, $i=1,\ldots,p$, have full column rank.
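The full-column-rank argument used here rests on the standard fact that $A^{\mathsf T}A$ is symmetric positive definite whenever $A$ has full column rank. A small numerical check (our own hypothetical matrix), using Sylvester's leading-principal-minor criterion for the $2\times 2$ case:

```python
# Compute A^T A for a matrix A given as a list of rows.
def at_a(A):
    m, n = len(A), len(A[0])
    return [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
            for i in range(n)]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3x2 matrix with full column rank
G = at_a(A)                               # yields [[2, 1], [1, 2]]
# Sylvester's criterion: both leading principal minors must be positive.
assert G[0][0] > 0 and det2(G) > 0
```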
Now, note that the remaining diagonal block can be decomposed as
(33) 
where