Projection Algorithms for Finite Sum Constrained Optimization
Abstract.
Parallel and cyclic projection algorithms are proposed for minimizing the sum of a finite family of convex functions over the intersection of a finite family of closed convex subsets of a Hilbert space. These algorithms are of predictor-corrector type, with each main iteration consisting of an inner cycle of subgradient descent steps followed by a projection step. We prove, in the finite-dimensional setting, that these methods converge to an optimal solution of the composite minimization problem under investigation, assuming that the subgradients of the component functions are bounded at the iterates and that the stepsizes are chosen appropriately. We also discuss generalizations and limitations of the proposed algorithms and our techniques.
Key words and phrases:
convex feasibility, composite minimization, projection algorithm.

2010 Mathematics Subject Classification:
90C25, 90C52, 65K10, 47J25.

1. Introduction
We are concerned with a composite minimization problem: the objective function is decomposed into the sum of a finite family of convex functions, and the set of constraints is the intersection of finitely many closed convex subsets of a real Hilbert space $\mathcal{H}$. Precisely, the minimization problem under investigation in this paper is of the form

(1.1) $\min\ f(x) := \sum_{i=1}^{m} f_i(x) \quad \text{subject to} \quad x \in C := \bigcap_{j=1}^{r} C_j,$

where $m$ and $r$ are positive integers, each set $C_j$ is a nonempty closed convex subset of $\mathcal{H}$, and each component function $f_i$ is convex. We always assume the feasible set $C \neq \emptyset$.
Large-scale optimization problems of the form (1.1) naturally arise in modern applications, in particular in network design [16, 12] and machine learning [14, 26, 15]. When the constraint of (1.1) is defined explicitly by a system of inequalities, penalty and augmented Lagrangian techniques, as well as proximal and bundle methods, can be applied to this problem. However, when projections onto the constraint sets are readily available, the treatment of constraints via projection techniques may be preferable as computationally robust and memory efficient. One approach that allows us to apply projection methods to (1.1) is to replace the optimization problem (1.1) with a sequence of convex feasibility problems (CFPs), as is done in [13]. Our development is more direct: we build on the ideas of [11] to prove the convergence of subgradient projection techniques that utilize projections onto individual constraint sets. We note that despite a large body of work dedicated to solving convex feasibility problems via projection methods (see [23, 5, 17, 4, 8] for recent advancements and [3, 7] for textbook expositions) and a vast literature on optimization methods that utilize a single projection onto the constraint set (for recent works see, e.g., [18, 25, 19]), little has been done on combining optimization and projection steps on several sets, beyond the aforementioned paper by De Pierro and Helou Neto [11]. Our aim is to make a substantial contribution towards bridging this gap. Recent progress on forcing the convergence of Douglas–Rachford type methods to the smallest-norm feasible point [1] also indicates that it may be possible to extend our approach to a larger class of projection techniques.
The convex feasibility problem (CFP) [2, 9] is formulated as

(1.2) find $x \in \bigcap_{j=1}^{r} C_j.$
Thus, the composite minimization problem (1.1) can alternatively be rephrased as finding a solution to the convex feasibility problem (1.2) which also minimizes the composite function $f$ defined in (1.1). Consequently, two points should be taken into consideration in algorithmic approaches to (1.1):

the descent property of the values of the objective function $f$, and

the (approximate) feasibility of the iterates generated by the algorithm.
To illustrate these points we consider the special case where $m = r = 1$ and the function $f$ is smooth. In this case, (1.1) reduces to the constrained convex minimization problem:

(1.3) $\min_{x \in C} f(x).$
The gradient-projection algorithm (GPA) can solve (1.3): GPA generates a sequence $(x_n)$ by the recursion

(1.4) $x_{n+1} = P_C\big(x_n - \gamma_n \nabla f(x_n)\big), \quad n \ge 0,$
where the initial guess $x_0 \in \mathcal{H}$ is chosen arbitrarily and $\gamma_n > 0$ is the stepsize. Assume:

The gradient of $f$, $\nabla f$, is Lipschitz continuous (for some $L > 0$):

$\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\| \quad \text{for all } x, y \in \mathcal{H};$

The sequence of stepsizes, $(\gamma_n)$, satisfies the condition:

$0 < \liminf_{n \to \infty} \gamma_n \le \limsup_{n \to \infty} \gamma_n < \frac{2}{L}.$
It is then easy to verify that both points (a) and (b) hold (actually, (b) holds trivially); moreover, the sequence $(x_n)$ generated by GPA (1.4) converges weakly [22, 28] to a solution of (1.3) (if any exists).
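As a minimal numerical sketch of the GPA recursion above (the quadratic objective, the box constraint, and all helper names are our own illustrative choices, not from the original analysis):

```python
# Hypothetical illustration of the gradient-projection recursion (1.4):
#   x_{n+1} = P_C(x_n - gamma * grad f(x_n)).
# We take f(x) = ||x - b||^2 with b = (2, -0.5), so grad f is Lipschitz with
# L = 2, and C = [0, 1]^2, whose projection is a coordinate-wise clip.

def project_box(x, lo=0.0, hi=1.0):
    """Nearest-point projection onto the box [lo, hi]^2."""
    return [min(max(xi, lo), hi) for xi in x]

def grad_f(x, b=(2.0, -0.5)):
    """Gradient of f(x) = ||x - b||^2."""
    return [2.0 * (xi - bi) for xi, bi in zip(x, b)]

def gpa(x0, gamma=0.4, n_iter=200):  # any fixed gamma in (0, 2/L) works here
    x = list(x0)
    for _ in range(n_iter):
        g = grad_f(x)
        x = project_box([xi - gamma * gi for xi, gi in zip(x, g)])
    return x

x_sol = gpa([0.0, 0.0])
```

For this separable example the constrained minimizer is simply the projection of the unconstrained minimizer (2, -0.5) onto the box, namely (1, 0), which the iteration reaches for any fixed stepsize in (0, 2/L).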
Observe that the splitting of the objective function $f$ into the sum of (simpler) component functions, and of the set of constraints into the intersection of (simpler) convex subsets, aims at providing more efficient algorithmic approaches to (1.1) by utilizing the simpler structures of the component functions (for instance, their proximal mappings may be computable [10]) and of the sets (for instance, the projections may possess closed formulae). This means that when we study algorithms for the composite optimization problem (1.1), we should use individual component functions and individual subsets at each iteration, rather than the full sum of the component functions or the full intersection of the sets.
The purpose of this paper is to analyse the convergence of parallel and cyclic projection algorithms for solving the optimization problem (1.1), significantly expanding the results of De Pierro and Helou Neto in [11] who focussed on the sequential projections version of the method. We provide a unified analysis of all three methods in the finitedimensional setting.
The projection algorithms studied in this paper start with an arbitrary point $x_0 \in \mathcal{H}$ and produce the iterates $x_n$ ($n \in \mathbb{N}$), alternating between subgradient and projection steps.
The generic form of our projection algorithm is as follows.

(Projection algorithm) $x_{n,0} = x_n; \quad x_{n,i} = x_{n,i-1} - \gamma_n g_{n,i}, \;\; g_{n,i} \in \partial f_i(x_{n,i-1}), \;\; i = 1, \dots, m; \quad x_{n+1} = T(x_{n,m}).$

Here by $\partial f_i(x)$ we denote the Moreau–Rockafellar subdifferential of the convex function $f_i$ at a point $x$, and $T$ is a (modification of the) projection operator, distinguishing the three methods; we give $T$ explicitly for each method below.
The sequential projection algorithm was introduced by De Pierro and Helou Neto in [11]; in this case the projection step is a full cycle of projections onto the sets whose intersection comprises the feasibility region. Explicitly, we have

(Sequential projections) $x_{n+1} = P_{C_r} P_{C_{r-1}} \cdots P_{C_1}(x_{n,m}).$
In the finite-dimensional case, De Pierro and Helou Neto discussed the convergence properties of the above algorithm (note that we generalize the original method slightly, replacing gradients with subgradients; this does not affect the convergence analysis, which relies on the convexity of the component objective functions rather than their differentiability). Moreover, they raised several open questions regarding projection algorithms for solving (1.1), one of which is whether the sequential projections in their algorithm can be replaced with parallel projections. We answer this question in the affirmative, not only for the parallel, but also for the cyclic version of the algorithm.
Our main result is the following direct generalization of [11, Theorem 1].
Theorem 1.1.
Let , suppose that the sets are closed and convex, and let . Assume that the real-valued convex functions , …, are defined on some convex subsets , …, of such that , , (for a choice of the cyclic, sequential or parallel projection algorithm) and there exist constants , …, such that
Moreover, assume that the sequence (obtained via the chosen method) is bounded and
Then the sequence converges to the optimal value , and every cluster point of is an optimal solution of (1.1), given that the solution set is nonempty.
Note that our assumptions are standard in the analysis of numerical methods, and can be replaced by more constructive or convenient conditions, with some loss of generality.
The proof of our main result (Theorem 1.1) relies on the key property of asymptotic feasibility (which ensures that the distance from the iterates to the feasible set tends to zero, so that every cluster point of the iterative sequence is feasible). We prove asymptotic feasibility for the methods of parallel and cyclic projections in Section 3, and present the complete proof of Theorem 1.1 in Section 4. Note that even though we follow the general framework of De Pierro and Helou Neto, our proofs of asymptotic feasibility for cyclic and parallel projections are based on entirely different ideas.
We begin our discussion by introducing some notation and other preliminary information and results in Section 2; after presenting the proofs of the main results in Sections 3 and 4, we provide a discussion of some generalizations, including the infinite-dimensional setting, and of some practical improvements and modifications of the methods.
2. Notation and Preliminaries
The fundamental tool of our argument in this paper is the concept of projections. Let $\mathcal{H}$ be a real Hilbert space with inner product $\langle \cdot, \cdot \rangle$ and norm $\|\cdot\|$, respectively, and let $C$ be a nonempty closed convex subset of $\mathcal{H}$. The (nearest point) projection from $\mathcal{H}$ onto $C$, denoted by $P_C$, is defined by

(2.1) $P_C(x) := \operatorname*{argmin}_{y \in C} \|x - y\|, \quad x \in \mathcal{H}.$
The following wellknown properties are pertinent to our argument in Section 3.
Proposition 2.1.
Let $\mathcal{H}$ be a real Hilbert space, and for any closed convex set $C \subseteq \mathcal{H}$ let $P_C$ be the projection operator defined by (2.1). Then the following properties hold.

(i) $\langle x - P_C(x), z - P_C(x) \rangle \le 0$ for all $x \in \mathcal{H}$ and $z \in C$.

(ii) $\|P_C(x) - P_C(y)\|^2 \le \langle P_C(x) - P_C(y), x - y \rangle$ for all $x, y \in \mathcal{H}$; in particular, $P_C$ is nonexpansive, namely,

$\|P_C(x) - P_C(y)\| \le \|x - y\| \quad \text{for all } x, y \in \mathcal{H}.$

(iii) $\|x - P_C(x)\|^2 + \|z - P_C(x)\|^2 \le \|x - z\|^2$ for all $x \in \mathcal{H}$ and $z \in C$.
We also define the distance function from a point $x$ to a set $S$ as

$d(x, S) := \inf_{y \in S} \|x - y\|.$

Observe that for a closed convex set $C$ we have $d(x, C) = \|x - P_C(x)\|$.
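To make the projection and distance definitions concrete (a toy example of our own, not from the original text), the closed-form projection onto a halfspace can be checked numerically against the identity between the distance function and the projection residual:

```python
# Projection onto the halfspace C = {x : <a, x> <= b} has the closed form
#   P_C(x) = x - max(0, (<a, x> - b) / ||a||^2) * a,
# a standard formula; the particular a, b, x below are arbitrary choices.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def project_halfspace(x, a, b):
    excess = max(0.0, (dot(a, x) - b) / dot(a, a))
    return [xi - excess * ai for xi, ai in zip(x, a)]

a, b = [1.0, 1.0], 1.0
x = [2.0, 2.0]                      # infeasible: <a, x> = 4 > 1
px = project_halfspace(x, a, b)     # nearest point of C, here (0.5, 0.5)

# d(x, C) = ||x - P_C(x)||; for a halfspace this equals (<a, x> - b) / ||a||.
dist = sum((xi - pi) ** 2 for xi, pi in zip(x, px)) ** 0.5
```

Here the residual norm equals the analytic distance (4 - 1) / sqrt(2) from x to the bounding hyperplane, as expected for a convex set.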
As mentioned earlier, the CFP (1.2) can be solved by the projection onto convex sets method (POCS), whose convergence is well understood in the general context of real Hilbert spaces. We recall the well-known convergence results of two major POCS algorithms [2, 9, 21, 27].
Theorem 2.2.
Beginning with an arbitrarily chosen initial guess $x_0 \in \mathcal{H}$, we iterate either one of the following two projection algorithms:

Sequential (cyclic) projections: $x_{n+1} = P_{C_{[n]}}(x_n)$, where $[n] := (n \bmod r) + 1$;

Parallel projections: $x_{n+1} = \sum_{j=1}^{r} w_j P_{C_j}(x_n)$, with $w_j > 0$ for all $j$ and $\sum_{j=1}^{r} w_j = 1$.

Then $(x_n)$ converges weakly to a solution of the CFP (1.2), given that its solution set is nonempty.
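The two POCS iterations of Theorem 2.2 can be sketched numerically; the two halfspaces, the equal weights, and the iteration counts below are our own illustrative choices:

```python
# Sketch of the two POCS iterations on two halfspaces (our own toy sets):
#   C_1 = {x : x_1 <= 0} and C_2 = {x : x_2 <= 0}.
# Sequential POCS projects onto one set per step, cycling through the sets;
# parallel POCS averages the projections onto all sets with weights 1/2.

def proj1(x):                      # projection onto C_1: clip x_1 at 0
    return [min(x[0], 0.0), x[1]]

def proj2(x):                      # projection onto C_2: clip x_2 at 0
    return [x[0], min(x[1], 0.0)]

def pocs_sequential(x, n_iter=50):
    projections = [proj1, proj2]
    for n in range(n_iter):
        x = projections[n % 2](x)
    return x

def pocs_parallel(x, n_iter=200):
    for _ in range(n_iter):
        p1, p2 = proj1(x), proj2(x)
        x = [0.5 * (u + v) for u, v in zip(p1, p2)]
    return x

xs = pocs_sequential([3.0, 4.0])   # reaches (0, 0) after one full cycle
xp = pocs_parallel([3.0, 4.0])     # approaches (0, 0) geometrically
```

Both runs end (numerically) at the feasible point 0, consistent with the convergence claim; note the sequential variant lands on the intersection exactly here, while the parallel variant only approaches it in the limit.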
Another key notion in our discussion is that of a convex function and the Moreau–Rockafellar subdifferential [6]. Let $D$ be a convex subset of $\mathcal{H}$, and let $f: D \to \mathbb{R}$ be a convex function. A subgradient of $f$ at $x \in D$ is a vector $g \in \mathcal{H}$ such that

$f(y) \ge f(x) + \langle g, y - x \rangle \quad \text{for all } y \in D.$

The set of all subgradients of $f$ at $x$ is called the subdifferential and is denoted by $\partial f(x)$.
Let

$S^* := \operatorname*{argmin}_{x \in C} f(x) \quad \text{and} \quad f^* := \min_{x \in C} f(x)$

be the set of optimal solutions and the optimal value of the composite minimization problem (1.1), respectively. We shall always assume from now on that $S^* \neq \emptyset$.
Two problems are pertinent:

(a) whether the sequence $(x_n)$ (weakly) converges to an optimal solution;

(b) whether the sequence of values $(f(x_n))$ converges to the optimal value $f^*$.

If the answer to (a) is affirmative, then the answer to (b) is also positive.
The assumptions of Theorem 1.1 play a key role in establishing the aforementioned properties. We state and discuss these assumptions here explicitly for the clarity of exposition.
First, we make a standard assumption on the divergence of the series of diminishing stepsizes used in the subgradient cycle of our projection algorithm: we require that

(2.2) $\lim_{n \to \infty} \gamma_n = 0 \quad \text{and} \quad \sum_{n=0}^{\infty} \gamma_n = +\infty.$

The first condition ensures that the steps we make are indeed descent steps, and that the gradient step does not derail our progress with the convergence of projection steps to the feasible set. The second condition ensures that there is no artificial restriction on how far the sequence of iterates can depart from the initial point.
The second key assumption is a uniform Lipschitz-type bound on the components of the objective function. Explicitly, we use the following assumption on the subgradients of our functions:

(2.3) there exist constants $M_1, \dots, M_m > 0$ such that $\|g\| \le M_i$ for every subgradient $g \in \partial f_i$ computed at the iterates, $i = 1, \dots, m$,

and we also let $M := \max_{1 \le i \le m} M_i$. Observe that this condition is satisfied naturally when these (real-valued) functions are defined on the whole finite-dimensional space and the sequence of iterates is bounded. It is also well known (see [2, Proposition 7.8]) that the condition of a function having bounded gradients (subdifferentials) on bounded sets is equivalent to the function being bounded on bounded sets in the finite-dimensional setting.
3. Asymptotic Feasibility of Parallel and Cyclic Projections
We are ready to prove two major technical results that concern the asymptotic feasibility of parallel and cyclic projections (Lemmas 3.1 and 3.6 respectively). Note that the relevant statement for the sequential projections was shown in [11].
3.1. Asymptotic Feasibility for Parallel Projections
Recall that the parallel projection algorithm (PPA) utilizes a convex combination of the projections onto the sets $C_1, \dots, C_r$ in its projection step:

(PPA) $x_{n,0} = x_n; \quad x_{n,i} = x_{n,i-1} - \gamma_n g_{n,i}, \;\; g_{n,i} \in \partial f_i(x_{n,i-1}), \;\; i = 1, \dots, m; \quad x_{n+1} = \sum_{j=1}^{r} w_j P_{C_j}(x_{n,m}),$

where $w_j > 0$ for all $j$ and $\sum_{j=1}^{r} w_j = 1$.
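As a hedged, self-contained sketch of one reading of the parallel method (an inner cycle of subgradient steps over the component functions, then an equally weighted average of projections onto the constraint sets; the toy problem, weights, and stepsizes are entirely our own choices):

```python
# Toy instance (ours): minimize f_1(x) + f_2(x) = |x_1| + |x_2| over
# C_1 ∩ C_2 with C_j = {x : x_j >= 1}, whose unique optimal solution is (1, 1).

def sign(t):
    return (t > 0) - (t < 0)

def proj(x, j):
    """Projection onto C_j = {x : x_j >= 1}: raise coordinate j to 1 if needed."""
    y = list(x)
    y[j] = max(y[j], 1.0)
    return y

def subgrad_step(x, i, gamma):
    """One subgradient step for f_i(x) = |x_i|; sign(x_i) is a subgradient."""
    y = list(x)
    y[i] -= gamma * float(sign(y[i]))
    return y

def ppa(x0, n_iter=2000):
    x = list(x0)
    for n in range(n_iter):
        gamma = 1.0 / (n + 2)           # diminishing, non-summable stepsizes
        for i in range(2):              # inner subgradient cycle
            x = subgrad_step(x, i, gamma)
        p1, p2 = proj(x, 0), proj(x, 1)
        x = [0.5 * (u + v) for u, v in zip(p1, p2)]   # averaged projection step
    return x

x_out = ppa([5.0, -3.0])
```

With stepsizes gamma_n = 1/(n+2) the iterates settle within O(gamma_n) of the optimum (1, 1), illustrating both asymptotic feasibility and convergence of the objective values for this example.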
Our goal is to prove the following result. We begin with several technical claims used in the proof, which is deferred to the end of this subsection.
Lemma 3.1.
Assume that (2.2) and (2.3) hold, and that the sequence $(x_n)$ generated by the method of parallel projections is bounded. Then $(x_n)$ is asymptotically feasible, that is, $\lim_{n \to \infty} d(x_n, C) = 0$.
The following technical result is used in the subsequent proofs.
Lemma 3.2.
Let be a sequence generated by the parallel projections algorithm and assume that the Lipschitz condition (2.3) is satisfied. Then

, where .

for .


.
Proof.
(i) We have
(ii) For , we have
(iii) This is a straightforward consequence of (ii).
(iv) This is easily derived from (iii), (i) and the fact that a distance function of a convex set is Lipschitz continuous with Lipschitz constant one:
∎
Lemma 3.3.
Assume , , the condition (2.3) is satisfied, and is bounded. Then for any , there exists such that
(3.1) 
whenever is such that . Consequently, .
Proof.
Suppose not; then for some , we have a subsequence of such that and
(3.2) 
for all . It then follows from Lemma 3.2(iv) that
(3.3) 
Since is a bounded sequence in a finite-dimensional space, we may assume that . We then get
(3.4) 
This implies that for every ; hence, . This contradicts the fact that . ∎
We are now ready to prove Lemma 3.1.
Proof of Lemma 3.1.
By Lemma 3.3 we have , hence, we can take such that and for all . Let . Consider two cases.
Case 1: . In this case, we have by Lemma 3.2(iii)
(3.5) 
Case 2: . Using Lemma 3.3, we obtain .
Note that Lemmas 3.3 and 3.1 can be generalized to the infinite-dimensional setting. We discuss this in more detail in Section 5.1.
Remark 3.4.
We include a version of Lemma 3.2 for the sequential projection algorithm (SPA), which generates a sequence via the following iteration process:

(SPA) $x_{n,0} = x_n; \quad x_{n,i} = x_{n,i-1} - \gamma_n g_{n,i}, \;\; g_{n,i} \in \partial f_i(x_{n,i-1}), \;\; i = 1, \dots, m; \quad x_{n+1} = P_{C_r} P_{C_{r-1}} \cdots P_{C_1}(x_{n,m}).$
Lemma 3.5.
Let be generated by (SPA) and assume that the Lipschitz condition (2.3) is satisfied. Then

, where .

for .

.

.
Here and we use the convention . Note that as .
The proof of Lemma 3.5 follows the same lines as the proof of Lemma 3.2. For instance, part (ii) can be proved by consecutively applying property (iii) of projections in Proposition 2.1 (it is also proved in [11]). Part (iv) can trivially be derived from (iii) by using the fact that distance functions are 1-Lipschitz.
3.2. Asymptotic Feasibility for Cyclic Projections
Recall that the cyclic projection algorithm (CPA) alternates the full cycle of subgradient steps with an individual projection onto one of the sets $C_1, \dots, C_r$, as follows:

(CPA) $x_{n,0} = x_n; \quad x_{n,i} = x_{n,i-1} - \gamma_n g_{n,i}, \;\; g_{n,i} \in \partial f_i(x_{n,i-1}), \;\; i = 1, \dots, m; \quad x_{n+1} = P_{C_{[n]}}(x_{n,m}), \quad [n] := (n \bmod r) + 1.$
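A hedged sketch of one reading of the cyclic method (a full inner cycle of subgradient steps, then a single projection onto one constraint set, the chosen set cycling across main iterations; the toy problem and stepsizes are our own choices):

```python
# Toy instance (ours): minimize |x_1| + |x_2| over C_1 ∩ C_2 with
# C_j = {x : x_j >= 1}; the optimal solution is (1, 1).

def sign(t):
    return (t > 0) - (t < 0)

def proj(x, j):
    """Projection onto C_j = {x : x_j >= 1}."""
    y = list(x)
    y[j] = max(y[j], 1.0)
    return y

def cpa(x0, n_iter=2000):
    x = list(x0)
    for n in range(n_iter):
        gamma = 1.0 / (n + 2)            # diminishing, non-summable stepsizes
        for i in range(2):               # inner subgradient cycle over f_1, f_2
            x[i] -= gamma * float(sign(x[i]))  # sign(x_i) is a subgradient of |x_i|
        x = proj(x, n % 2)               # one projection per main iteration
    return x

x_out = cpa([5.0, -3.0])
```

Unlike the parallel variant, at most one constraint is enforced per main iteration, so the iterates are only asymptotically feasible: the coordinate not projected in the current iteration lags behind by roughly one stepsize.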
Our goal is to prove the following asymptotic feasibility result that mirrors Lemma 3.1.
Lemma 3.6.
Assume that (2.2) and (2.3) hold, and that the sequence $(x_n)$ generated by the method of cyclic projections is bounded. Then $(x_n)$ is asymptotically feasible, that is, $\lim_{n \to \infty} d(x_n, C) = 0$.
To prove this lemma, we need several technical claims. First, for any and define the exact cyclic projection
(3.7) 
We next show that such cyclic projections bring the iterates closer to the feasible set in a uniform sense.
Proposition 3.7.
Let be a nonempty compact convex subset of such that and . For each define a function ,
(3.8) 
The function is continuous and for all .
Proof.
We assume throughout that the compact convex set and the index are fixed and use the notation . We first show that for . For any closed convex set we have by Proposition 2.1(iii)
hence, for our setting
It is evident then that if , we have
where does not depend on . Therefore, taking the infimum over , we have for every
(3.9) 
and so
(3.10) 
Now let
Observe that explicitly
(3.11) 
The set is compact because it is the intersection of a compact set with a closed set , and is nonempty for every because . The function is continuous in , and since each of the sets is compact and nonempty, the supremum in (3.11) is attained, and we have
(3.12) 
Hence, for every there exists such that and
If , then . If , we have and from (3.10)
We next focus on showing that is continuous. Since
the function is nondecreasing, and to prove its continuity it is sufficient to show
(3.13) 
If , since is nondecreasing, we have , so for all and the first relation in (3.13) holds trivially. Consider the case . From (3.12) we know that there exists such that and . Let (so that ). Since is convex, we have . Let
Since by our assumption , we have . Now take any . We have
and hence is strictly increasing in for . From this together with the continuity of the distance function we deduce that for every there exists a sufficiently large such that
At the same time, by the continuity of for every there exists such that
This means that for every we can find and such that
and therefore we have the desired
Proposition 3.8.
Proof.
Using the nonexpansivity of the projection operator (Proposition 2.1 (ii)) we have
where . Since , we can always find a sufficiently large number to ensure the last term is smaller than for all . ∎
The next proposition brings us closer to the proof of Lemma 3.6.
Proposition 3.9.
Assume that is bounded, and condition (2.3) is satisfied. Then
Proof.
Assume that the claim is not true. Then for some starting point the sequence is bounded, but
Let be a subsequence of such that
Without loss of generality we may assume that and that , so that each is obtained after projecting onto .