1 Introduction

Exact penalty decomposition method for zero-norm minimization based on MPEC formulation 1

Shujun Bi2,  Xiaolan Liu3  and  Shaohua Pan4

November 10, 2011

(First revised July 15, 2012)

(Second revised March 20, 2013)

(Final revision March 20, 2014)

Abstract. We reformulate the zero-norm minimization problem as an equivalent mathematical program with equilibrium constraints and establish that its penalty problem, induced by adding the complementarity constraint to the objective, is exact. Then, by the special structure of the exact penalty problem, we propose a decomposition method that can seek a global optimal solution of the zero-norm minimization problem under the null space condition in [23] by solving a finite number of weighted -norm minimization problems. To handle the weighted -norm subproblems, we develop a partial proximal point algorithm where the subproblems may be solved approximately with the limited memory BFGS (L-BFGS) or the semismooth Newton-CG. Finally, we apply the exact penalty decomposition method with the weighted -norm subproblems solved by combining the L-BFGS with the semismooth Newton-CG to several types of sparse optimization problems, and compare its performance with that of the penalty decomposition method [25], the iterative support detection method [38] and the state-of-the-art code [39]. Numerical comparisons indicate that the proposed method is very efficient in terms of the recoverability and the required computing time.

Key words: zero-norm minimization; MPECs; exact penalty; decomposition method.

## 1 Introduction

Let be the real vector space of dimension endowed with the Euclidean inner product and induced norm . We consider the following zero-norm minimization problem

 minx∈IRn{∥x∥0: ∥Ax−b∥≤δ}, (1)

where and are given data, is a given constant, and

 ∥x∥0:=n∑i=1card(xi)  with  card(xi)={1if xi≠0,0otherwise.

Throughout this paper, we denote by the feasible set of problem (1) and assume that it is nonempty. This implies that (1) has a nonempty set of globally optimal solutions.

The problem (1) has very wide applications in sparse reconstruction of signals and images (see, e.g., [14, 6, 10, 11]), sparse model selection (see, e.g., [36, 15]) and error correction [8]. For example, when considering the recovery of signals from noisy data, one may solve (1) with some given . However, due to the discontinuity and nonconvexity of zero-norm, it is difficult to find a globally optimal solution of (1). In addition, each feasible solution of (1) is locally optimal, but the number of its globally optimal solutions is finite when , which brings in more difficulty to its solving. A common way is to obtain a favorable locally optimal solution by solving a convex surrogate problem such as the -norm minimization or -norm regularized problem. In the past two decades, this convex relaxation technique became very popular due to the important results obtained in [13, 14, 36, 12]. Among others, the results of [13, 14] quantify the ability of -norm minimization problem to recover sparse reflectivity functions. For brief historical accounts on the use of -norm minimization in statistics and signal processing, please see [29, 37]. Motivated by this, many algorithms have been proposed for the -norm minimization or -norm regularized problems (see, e.g., [9, 17, 24]).

Observing the key difference between the -norm and the -norm, Candès et al. [8] recently proposed a reweighted -norm minimization method (the idea of this method is due to Fazel [16] where she first applied it for the matrix rank minimization). This method is solving a sequence of convex relaxations of the following nonconvex surrogate

 minx∈IRn{n∑i=1ln(|xi|+ε): ∥Ax−b∥≤δ}. (2)

This class of surrogate problems are further studied in [32, 41]. In addition, noting that the -norm tends to as , many researchers seek a locally optimal solution of the problem (1) by solving the nonconvex approximation problem

 minx∈IRn{∥x∥pp: ∥Ax−b∥≤δ}

or its regularized formulation (see, e.g., [18, 10]). Extensive computational studies in [8, 18, 10, 32] demonstrate that the reweighted -norm minimization method and the -norm nonconvex approximation method can find sparser solutions than the -norm convex relaxation method. We see that all the methods mentioned above are developed by the surrogates or the approximation of zero-norm minimization problem.

In this paper, we reformulate (1) as an equivalent MPEC (mathematical program with equilibrium constraints) by the variational characterization of zero-norm, and then establish that its penalty problem, induced by adding the complementarity constraint to the objective, is exact, i.e., the set of globally optimal solutions of the penalty problem coincides with that of (1) when the penalty parameter is over some threshold. Though the exact penalty problem itself is also difficult to solve, we exploit its special structure to propose a decomposition method that is actually a reweighted -norm minimization method. This method, consisting of a finite number of weighted -norm minimization, is shown to yield a favorable locally optimal solution, and moreover a globally optimal solution of (1) under the null space condition in [23]. For the weighted -norm minimization problems, there are many softwares suitable for solving them such as the alternating direction method software YALL1 [42], and we here propose a partial proximal point method where the proximal point subproblems may be solved approximately with the L-BFGS or the semismooth Newton-CG method (see Section 4).

We test the performance of the exact penalty decomposition method with the subproblems solved by combining the L-BFGS and the semismooth Newton-CG for the problems with several types of sparsity, and compare its performance with that of the penalty decomposition method (QPDM) [25], the iterative support detection method (ISDM) [38] and the state-of-the-art code [39]. Numerical comparisons show that the proposed method has a very good robustness, can find the sparsest solution with desired feasibility for the Sparco collection, has comparable recoverability with from fewer observations for most of randomly generated problems which is higher than that of and , and requires less computing time than .

Notice that the ISDM proposed in [38] is also a reweighted -norm minimization method in which, the weight vector involved in each weighted minimization problem is chosen as the support of some index set. Our exact penalty decomposition method shares this feature with ISDM, but the index sets to determine the weight vectors are automatically yielded by relaxing the exact penalty problem of the MPEC equivalent to the zero-norm problem (1), instead of using some heuristic strategy. In particular, our theoretical analysis for the exact recovery is based on the null space condition in [23] which is weaker than the truncated null space condition in [38]. Also, the weighted -norm subproblems involved in our method are solved by combining the L-BFGS with the semismooth Newton-CG method, while such problems in [38] are solved by applying YALL1 [42] directly. Numerical comparisons show that the hybrid of the L-BFGS with the semismooth Newton-CG method is effective for handling the weighted -norm subproblems. The penalty decomposition method proposed by Lu and Zhang [25] aims to deal with the zero-norm minimization problem (1), but their method is based on a quadratic penalty for the equivalent augmented formulation of (1), and numerical comparisons show that such penalty decomposition method has a very worse recoverability than which is designed for solving the -minimization problem.

Unless otherwise stated, in the sequel, we denote by a column vector of all s whose dimension is known from the context. For any , denotes the sign vector of , is the vector of components of being arranged in the nonincreasing order, and denotes the subvector of components whose indices belong to . For any matrix , we write as the null space of . Given a point and a constant , we denote by the -open neighborhood centered at .

## 2 Equivalent MPEC formulation

In this section, we provide the variational characterization of zero-norm and reformulate (1) as a MPEC problem. For any given , it is not hard to verify that

 ∥x∥0=minv∈IRn{⟨e,e−v⟩: ⟨v,|x|⟩=0, 0≤v≤e}. (3)

This implies that the zero-norm minimization problem (1) is equivalent to

 minx,v∈IRn{⟨e,e−v⟩: ∥Ax−b∥≤δ, ⟨v,|x|⟩=0, 0≤v≤e}, (4)

which is a mathematical programming problem with the complementarity constraint:

 ⟨v,|x|⟩=0, v≥0, |x|≥0.

Notice that the minimization problem on the right hand side of (3) has a unique optimal solution , although it is only a convex programming problem. Such a variational characterization of zero-norm was given in [1] and [21, Section 2.5], but there it is not used to develop any algorithms for the zero-norm problems. In the next section, we will develop an algorithm for (1) based on an exact penalty formulation of (4).

Though (4) is given by expanding the original problem (1), the following proposition shows that such expansion does not increase the number of locally optimal solutions.

###### Proposition 2.1

For problems (1) and (4), the following statements hold.

(a)

Each locally optimal solution of (4) has the form .

(b)

is locally optimal to (1) if and only if is locally optimal to (4).

Proof.  (a) Let be an arbitrary locally optimal solution of (4). Then there is an open neighborhood such that for all , where is the feasible set of (4). Consider (3) associated to , i.e.,

 minv∈IRn{⟨e,e−v⟩: ⟨v,|x∗|⟩=0, 0≤v≤e}. (5)

Let be an arbitrary feasible solution of problem (5) and satisfy . Then, it is easy to see that , and so This shows that is a locally optimal solution of (5). Since (5) is a convex optimization problem, is also a globally optimal solution. However, is the unique optimal solution of (5). This implies that .

(b) Assume that is a locally optimal solution of (1). Then, there exists an open neighborhood such that for all . Let be an arbitrary feasible point of (4) such that . Then, since is a feasible point of the problem on the right hand side of (3), we have

 ⟨e,e−v⟩≥∥x∥0≥∥x∗∥0=⟨e,e−sign(|x∗|)⟩.

This shows that is a locally optimal solution of (4).

Conversely, assume that is a locally optimal solution of (4). Then, for any sufficiently small , it clearly holds that for all . This means that for any , we have . Hence, is a locally optimal solution of (1). The two sides complete the proof of part (b).

## 3 Exact penalty decomposition method

In the last section we established the equivalence of (1) and (4). In this section we show that solving (4), and then solving (1), is equivalent to solving a single penalty problem

 minx,v∈IRn{⟨e,e−v⟩+ρ⟨v,|x|⟩: ∥Ax−b∥≤δ, 0≤v≤e} (6)

where is the penalty parameter, and then develop a decomposition method for (4) based on this penalty problem. It is worthwhile to point out that there are many papers studying exact penalty for bilevel linear programs or general MPECs (see, e.g., [4, 5, 28, 26]), but these references do not imply the exactness of penalty problem (6).

For convenience, we denote by and the feasible set and the optimal solution set of problem (4), respectively; and for any given , denote by and the feasible set and the optimal solution set of problem (6), respectively.

###### Lemma 3.1

For any given , the problem (6) has a nonempty optimal solution set.

Proof.  Notice that the objective function of problem (6) has a lower bound in , to say . Therefore, there must exist a sequence such that for each ,

 ⟨e,e−vk⟩+ρ⟨|xk|,vk⟩≤α∗+1k. (7)

Since the sequence is bounded, if the sequence is also bounded, then by letting be an arbitrary limit point of , we have and

 ⟨e,e−¯¯¯v⟩+ρ⟨|¯¯¯x|,¯¯¯v⟩≤α∗,

which implies that is a globally optimal solution of (6). We next consider the case where the sequence is unbounded. Define the disjoint index sets and by

 I:={i∈{1,…,n} | {xki} is unbounded}  and  ¯¯¯I:={1,…,n}∖I.

Since is bounded, we without loss of generality assume that it converges to . From equation (7), it then follows that . Note that the sequence satisfies with . Since the sequences and are bounded, we may assume that they converge to and , respectively. Then, from the closedness of the set , there exists an such that with . Letting , we have . Moreover, the following inequalities hold:

 ⟨e,e−¯¯¯v⟩+ρ⟨|¯¯¯x|,¯¯¯v⟩=⟨e,e−¯¯¯v⟩+ρ∑i∈¯¯I|¯¯¯xi|¯¯¯vi=limk→∞[⟨e,e−vk⟩+ρ∑i∈¯¯I⟨|xki|,vki⟩] ≤limk→∞[⟨e,e−vk⟩+ρ∑i∈I⟨|xki|,vki⟩+ρ∑i∈¯¯I⟨|xki|,vki⟩]≤α∗,

where the first equality is using , and the last equality is due to (7). This shows that is a global optimal solution of (6). Thus, we prove that for any given , the problem (6) has a nonempty set of globally optimal solutions.

To show that the solution of problem (4) is equivalent to that of a single penalty problem (6), i.e., to prove that there exists such that the set of global optimal solutions of (4) coincides with that of (6) with , we need the following lemma. Since this lemma can be easily proved by contradiction, we here do not present its proof.

###### Lemma 3.2

Given and . If then there exists such that for all with , we have , where means the th component of which is the vector of components of being arranged in the nonincreasing order .

Now we are in a position to establish that (6) is an exact penalty problem of (4).

###### Theorem 3.1

There exists a constant such that coincides with for all .

Proof.  Let . We only need to consider the case where (if , the conclusion clearly holds for all ). By Lemma 3.2, there exists such that for all satisfying , it holds that . This in turn means that for all and with any .

Let be an arbitrary point in . We prove that for all , and then is the one that we need. Let be an arbitrary constant with . Since , from Proposition 2.1(a) and the equivalence between (4) and (1), it follows that and . Note that for any since . Hence, for any , the following inequalities hold:

 ⟨e,e−v⟩+ρ⟨v,|x|⟩ = n∑i=1(1−vi+ρvi|x|i)≥∑|x|i>1/ρ(1−vi+ρvi|x|i) ≥ r=⟨e,e−¯¯¯v⟩+ρ⟨¯¯¯v,|¯¯¯x|⟩

where the first inequality is using for all , and the second inequality is since and . Since is an arbitrary point in and , the last inequality implies that .

Next we show that if for , then . For convenience, let

 I−:={i∈{1,…,n} | |¯¯¯x|i≤ρ−1}  and  I+:={i∈{1,…,n} | |¯¯¯x|i>ρ−1}.

Note that for all , and for all . Also, the latter together with implies . Then, for any we have

 ⟨e,e−v⟩ = ⟨e,e−v⟩+ρ⟨v,|x|⟩≥⟨e,e−¯¯¯v⟩+ρ⟨¯¯¯v,|¯¯¯x|⟩ = n∑i=1(1−¯¯¯vi+ρ¯¯¯vi|¯¯¯x|i)≥∑i∈I+(1−¯¯¯vi+ρ¯¯¯vi|¯¯¯x|i)≥r,

where the first inequality is using . Let be such that and . Such exists by the definition of . Then , and from the last inequality Thus,

 r=⟨e,e−¯¯¯v⟩+ρ⟨¯¯¯v,|¯¯¯x|⟩=∑i∈I−(1−¯¯¯vi+ρ¯¯¯vi|¯¯¯x|i)+∑i∈I+(1−¯¯¯vi+ρ¯¯¯vi|¯¯¯x|i).

Since for , for and , from the last equation it is not difficult to deduce that

 1−¯¯¯vi+ρ¯¯¯vi|¯¯¯x|i=0  for  i∈I−,  1−¯¯¯vi+ρ¯¯¯vi|¯¯¯x|i=1  for  i∈I+,  and  |I+|=r. (8)

Since each is nonnegative, the first equality in (8) implies that

 ¯¯¯vi=1  and  |¯¯¯x|i=0  for  i∈I−;

while the second equality in (8) implies that for . Combining the last equation with , we readily obtain and . This, together with , shows that . Thus, we complete the proof.

Theorem 3.1 shows that to solve the MPEC problem (4), it suffices to solve (6) with a suitable large . Since the threshold is unknown in advance, we need to solve a finite number of penalty problems with increasing . Although problem (6) for a given has a nonconvex objective function, its separable structure leads to an explicit solution with respect to the variable if the variable is fixed. This motivates us to propose the following decomposition method for (4), and consequently for (1).

###### Algorithm 3.1

(Exact penalty decomposition method for problem (4))

(S.0)

Given a tolerance and a ratio . Choose an initial penalty parameter   and a starting point . Set .

(S.1)

Solve the following weighted -norm minimization problem

 xk+1∈argminx∈IRn{⟨vk,|x|⟩: ∥Ax−b∥≤δ}. (9)
(S.2)

If , then set ; and otherwise set .

(S.3)

If , then stop; and otherwise go to (S.4).

(S.4)

Let and , and then go to Step (S.1).

###### Remark 3.1

Note that the vector in (S.2) is an optimal solution of the problem

 min0≤v≤e{⟨e,e−v⟩+ρk⟨|xk+1|,v⟩}. (10)

So, Algorithm 3.1 is solving the nonconvex penalty problem (6) in an alternating way.

The following lemma shows that the weighted -norm subproblem (9) has a solution.

###### Lemma 3.3

For each fixed , the subproblem (9) has an optimal solution.

Proof.  Note that the feasible set of (9) is , which is nonempty by the given assumption in the introduction, and its objective function is bounded below in the feasible set. Let be the infinum of the objective function of (9) on the feasible set. Then there exists a feasible sequence such that as . Let . Then, noting that , we have that the sequence is bounded. Without loss of generality, we assume that . Let . Noting that , we may assume that converges to . Since the set is closed and for each , where , there exists such that , i.e., . Let . Then, is a feasible solution to (9) with So, is an optimal solution of (9).

For Algorithm 3.1, we can establish the following finite termination result.

###### Theorem 3.2

Algorithm 3.1 will terminate after at most iterations.

Proof.  By Lemma 3.3, for each the subproblem (9) has a solution . From Step (S.2) of Algorithm 3.1, we know that for those with . Then,

 ⟨vk+1,|xk+1|⟩=∑{i: vk+1i=1}|xk+1i|≤nρk.

This means that, when , Algorithm 3.1 must terminate. Note that . Therefore, Algorithm 3.1 will terminate when , i.e., .

We next focus on the theoretical results of Algorithm 3.1 for the case where . To this end, let be an optimal solution of the zero-norm problem (1) and write

 I∗={i | x∗i≠0}  and  ¯¯¯I∗={1,…,n}∖I∗.

In addition, we also need the following null space condition for a given vector :

 ⟨vI∗,|yI∗|⟩<⟨v¯¯I∗,|y¯¯I∗|⟩  for any 0≠y∈Null(A). (11)
###### Theorem 3.3

Assume that and satisfies the condition (11) for some nonnegative integer . Then, . If, in addition, also satisfies the condition (11), then the vector for all satisfy the null space condition (11) and for all . Consequently, if satisfies the condition (11), then for all .

Proof.  We first prove the first part. Suppose that . Let . Clearly, . Since satisfies the condition (11), we have that

 ⟨vk¯¯I∗,|yk+1¯¯I∗|⟩>⟨vkI∗,|yk+1I∗|⟩. (12)

On the other hand, from step (S.1), it follows that

 ⟨vk,|x∗|⟩≥⟨vk,|xk+1|⟩=⟨vk,|x∗+yk+1|⟩≥⟨vk,|x∗|⟩−⟨vkI∗,|yk+1I∗|⟩+⟨vk¯¯I∗,|yk+1¯¯I∗|⟩.

This implies that . Thus, we obtain a contradiction to (12). Consequently, . Since also satisfies the null space condition (11), using the same arguments yields that . We next show by induction that for all satisfy the condition (11) and for all . To this end, we define

 Iν:={i | x∗i>1ρk+ν}  and  ¯¯¯Iν:={i | 0

for any given nonnegative integer . Clearly, . Also, by noting that by (S.4) and , we have that . We first show that the result holds for . Since , we have and by step (S.2). Since , from step (S.2) it follows that and . Now we obtain that

 ⟨vk+2I∗,|yI∗|⟩ =⟨vk+2I1,|yI1|⟩+⟨vk+2¯¯I1,|y¯¯I1|⟩=⟨vk+2¯¯I1,|y¯¯I1|⟩≤⟨vk+1¯¯I0,|y¯¯I0|⟩ =⟨vk+1I0,|yI0|⟩+⟨vk+1¯¯I0,|y¯¯I0|⟩=⟨vk+1I∗,|yI∗|⟩ <⟨vk+1¯¯I∗,|y¯¯I∗|⟩=⟨vk+2¯¯I∗,|y¯¯I∗|⟩ (13)

for any , where the first equality is due to , the second equality is using , the first inequality is due to , and , the second inequality is using the assumption that satisfies the null space condition (11), and the last equality is due to and . The inequality (3) shows that satisfies the null space condition (11), and using the same arguments as for the first part yields that . Now assuming that the result holds for , we show that it holds for . Indeed, using the same arguments as above, we obtain that

 ⟨vk+l+1I∗,|yI∗|⟩ =⟨vk+l+1Il,|yIl|⟩+⟨vk+l+1¯¯Il,|y¯¯Il|⟩=⟨vk+l+1¯¯¯Il,|y¯¯Il|⟩≤⟨vk+l¯¯Il−1,|y¯¯Il−1|⟩ =⟨vk+lIl−1,|yIl−1|⟩+⟨vk+l¯¯Il−1,|y¯¯Il−1|⟩=⟨vk+lI∗,|yI∗|⟩ <⟨vk+l¯¯I∗,|y¯¯I∗|⟩=⟨vk+l+1¯¯I∗,|y¯¯I∗|⟩ (14)

for any . This shows that