# Equivalent Lipschitz surrogates for zero-norm and rank optimization problems

This work is supported by the National Natural Science Foundation of China under project No. 11571120 and No. 11701186, and the Natural Science Foundation of Guangdong Province under project No. 2015A030313214 and No. 2017A030310418.

Yulan Liu (Ylliu@gdut.edu.cn; School of Mathematics, GuangDong University of Technology, Guangzhou), Shujun Bi (corresponding author, bishj@scut.edu.cn; School of Mathematics, South China University of Technology, Guangzhou), and Shaohua Pan (shhpan@scut.edu.cn; School of Mathematics, South China University of Technology, Guangzhou).

###### Abstract

This paper proposes a mechanism to produce equivalent Lipschitz surrogates for zero-norm and rank optimization problems by means of the global exact penalty for their equivalent mathematical programs with an equilibrium constraint (MPECs). Specifically, we reformulate these combinatorial problems as equivalent MPECs via the variational characterization of the zero-norm and rank function, show that the penalized problems obtained by moving the equilibrium constraint into the objective are global exact penalties, and obtain the equivalent Lipschitz surrogates by eliminating the dual variable in the global exact penalty. These surrogates, which include the popular SCAD function in statistics, are also differences of two convex functions (D.C.) if the function and constraint set involved in the zero-norm and rank optimization problems are convex. We illustrate an application by designing a multi-stage convex relaxation approach to the rank plus zero-norm regularized problem.

Keywords: zero-norm; rank; global exact penalty; equivalent Lipschitz surrogates

Mathematics Subject Classification(2010). 90C27, 90C33, 49M20

## 1 Introduction

This paper is concerned with zero-norm and rank optimization problems, which seek a sparse and/or low-rank solution and have a host of applications in a variety of fields such as statistics [40, 14, 30], signal and image processing [11, 12], machine learning [7, 37], control and system identification [15], finance [35], and so on. Due to the combinatorial nature of the zero-norm and rank function, these problems are generally NP-hard. One popular way to deal with them is the convex relaxation technique, which typically yields a desirable locally optimal, or at least feasible, solution via a single or a sequence of numerically tractable convex optimization problems.

The $\ell_1$-norm minimization, as a convex relaxation of the zero-norm minimization, became popular due to the important results in [11, 12, 40]. Among others, the results of [11, 12] quantify the ability of the $\ell_1$-norm minimization problem to recover sparse reflectivity functions. For brief historical accounts of the use of the $\ell_1$-norm minimization in statistics and signal processing, please see [27, 41]. Later, Fazel [15] showed that the nuclear norm is the convex envelope of the rank function over the unit ball of the spectral norm and initiated the research on the nuclear norm convex relaxation. In the past ten years, this method has received much attention from many fields such as information, computer science, statistics, optimization, and so on (see, e.g., [7, 37, 33, 30, 42]), and it was shown that a single nuclear norm minimization can recover a low-rank matrix in the noiseless setting if the sampling operator has a certain restricted isometry property [37], or can yield a solution satisfying a certain error bound in the noisy setting [8, 30].

The $\ell_1$-norm or nuclear norm convex relaxation problem, as a convex surrogate for zero-norm or rank optimization problems, has been demonstrated to be successful in encouraging a sparse or low-rank solution, but its efficiency is challenged in some circumstances. For example, Salakhutdinov and Srebro [39] showed that when certain rows and/or columns are sampled with high probability, the nuclear norm minimization may fail in the sense that the number of observations required for recovery is much larger than that under uniform sampling, and Negahban and Wainwright [31] also pointed out the influence of such heavy sampling schemes on the recovery error bound. In particular, when seeking a sparse (or low-rank) solution from a set whose structure conflicts with the role of the $\ell_1$-norm (or nuclear norm) in promoting sparsity (or low rank), say, the simplex set [24], the correlation matrix set and the density matrix set [28], the $\ell_1$-norm (or nuclear norm) minimization will fail to yield a sparse (or low-rank) solution. The root of this dilemma is the significant difference between the convex $\ell_1$-norm (respectively, nuclear norm) and the nonconvex zero-norm (respectively, rank function).

To enhance the solution quality of the $\ell_1$-norm and nuclear norm convex surrogates, some researchers have turned their attention to nonconvex surrogates of the zero-norm and rank function. Two popular nonconvex surrogates for the zero-norm (respectively, rank function) are the $\ell_p$ ($0<p<1$) norm and the logarithm function (respectively, the Schatten-$p$ function and the logarithmic determinant function). Based on these nonconvex surrogates, some sequential convex relaxation algorithms were developed (see, e.g., [6, 16, 29, 25]) and confirmed to have better performance in yielding sparse and low-rank solutions. In addition, the folded concave penalty functions such as the SCAD function [14] and the MCP function [48] are also a class of popular nonconvex surrogates for the zero-norm, which were proposed in statistics to correct the bias of the $\ell_1$-norm convex surrogate, and some adaptive algorithms were developed by using these surrogates (see [49]).

The existing nonconvex surrogates for the zero-norm and rank function are all heuristically constructed, and it remains unclear whether these nonconvex surrogates have the same global optimal solution set as zero-norm and rank optimization problems do. The main contribution of this work is to propose a mechanism to produce equivalent Lipschitz surrogates, in the sense that they have the same global optimal solution set as zero-norm and rank optimization problems do, with the help of the global exact penalty for their equivalent MPECs. Owing to these favorable properties, one may expect to develop more effective convex relaxation algorithms with this class of nonconvex surrogates.

Specifically, we reformulate zero-norm and rank optimization problems as equivalent MPECs by the variational characterization of the zero-norm and rank function, and show that the penalized problems, yielded by moving the equilibrium constraint into the objective, are uniformly partial calm over the global optimal solution set under a mild condition. The uniform partial calmness over the global optimal solution set, extending the partial calmness at a solution point studied by Ye et al. [44, 45], is proved to coincide with the global exact penalization of the corresponding penalized problem (see Section 2). By eliminating the dual variable in the global exact penalty, we achieve the equivalent Lipschitz surrogates. Interestingly, these surrogates are also D.C. if the function and constraint set involved in zero-norm and rank optimization problems are convex, and the SCAD function can be produced by the mechanism (see Example 5 in Appendix B). Finally, we illustrate an application of these equivalent surrogates in the design of a multi-stage convex relaxation approach to the rank plus zero-norm regularized problem.

It is worthwhile to point out that there are few works on exact penalty for MPECs, or even for optimization problems, involving non-polyhedral conic constraints, although much research has been done on exact penalty for classical nonlinear programming problems and MPECs involving polyhedral conic constraints (see, e.g., [26, 44, 45]). In this work, we establish the global exact penalty for equivalent MPECs of group zero-norm and rank optimization problems, thereby providing a mechanism to produce equivalent Lipschitz surrogates for these combinatorial problems. Such a global exact penalty was separately obtained in [3, 4] for the zero-norm minimization problem and the rank regularized minimization problem. Here, we present a simple unified proof via the uniform partial calmness for a large class of zero-norm and rank optimization problems, including group zero-norm optimization problems, rank plus zero-norm optimization problems, and the simultaneous rank and zero-norm minimization problem. In addition, we emphasize that although the penalized constraint in our MPECs is D.C. (see Sections 3-6), the exact penalty results developed for general DC programming in [22] are not applicable to it.

Recently, Chen et al. [9] studied exact penalization for problems with a class of nonconvex and non-Lipschitz objective functions, which arise from nonconvex surrogates for zero-norm minimization problems. They focused on the existence of exact penalty parameters regarding local minimizers, stationary points and $\epsilon$-minimizers. In contrast, here we are interested in the existence of an exact penalty parameter regarding the global optimal solutions of the equivalent MPECs of zero-norm and rank optimization problems.

Notations. Let be the space of all real matrices, endowed with the trace inner product and its induced Frobenius norm . Let be the set consisting of all matrices whose columns are mutually orthonormal to each other, and write . Let be a finite dimensional real vector space equipped with the inner product and its induced norm . Denote by the closed unit ball of centered at the origin, and by the closed ball of centered at of radius . When the space is known from the context, we delete the subscript from . Let and be the vector and matrix of all ones whose dimension are known from the context. For , is the vector obtained by arranging the entries of in a nonincreasing order; for , means the vector obtained by arranging the entries of in a nonincreasing order; and denotes the th entry of . For , means the singular value vector of with entries arranged in a non-increasing order, and are the nuclear norm and the spectral norm of , respectively, and means the entry -norm of . Define and . For a given with the SVD as , and are the matrix consisting of the first columns of and , respectively, and and are the matrix consisting of the last columns and columns of and , respectively.

Let be the family of proper lower semi-continuous (lsc) functions with which are convex in and satisfy the following conditions

$$1 > t^* := \mathop{\arg\min}_{0\le t\le 1}\phi(t),\quad \phi(t^*)=0\quad\text{and}\quad \phi(1)=1. \tag{1}$$

For each , let be the associated closed proper convex function

$$\psi(t):=\begin{cases}\phi(t) & \text{if } t\in[0,1],\\ +\infty & \text{otherwise};\end{cases} \tag{2}$$

and denote by the conjugate of , i.e., . Since , it is easy to check that and is nondecreasing in . Unless otherwise stated, appearing in the subsequent sections is the constant associated to in Lemma 1 of Appendix A. For the examples of , the reader may refer to Appendix B.
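As a hedged illustration of the conjugate $\psi^*$, the sketch below assumes the simplest function satisfying (1), namely $\phi(t)=t$ (so $t^*=0$, $\phi(t^*)=0$ and $\phi(1)=1$); this choice is made here for illustration and is not claimed to be one of the examples of Appendix B. The code approximates $\psi^*(s)=\sup_{t\in[0,1]}\{st-\phi(t)\}$ on a grid and compares it with the closed form $\max(s-1,0)$, which holds for this particular $\phi$ only. The names `phi`, `conjugate_psi`, `conjugate_closed_form` and the grid size are hypothetical.

```python
def phi(t):
    """Illustrative choice phi(t) = t on [0, 1] (an assumption, see lead-in)."""
    return t


def conjugate_psi(s, grid_size=1000):
    """Approximate psi*(s) = sup_{t in [0,1]} (s*t - phi(t)) on a uniform grid."""
    best = float("-inf")
    for k in range(grid_size + 1):
        t = k / grid_size
        best = max(best, s * t - phi(t))
    return best


def conjugate_closed_form(s):
    """Closed form of psi* valid only for the assumed phi(t) = t."""
    return max(s - 1.0, 0.0)


if __name__ == "__main__":
    for s in (-1.0, 0.0, 0.5, 1.0, 2.5):
        print(s, conjugate_psi(s), conjugate_closed_form(s))
```

For this $\phi$ the supremum is attained at an endpoint of $[0,1]$, so the grid value and the closed form agree; the printed values are also nondecreasing in $s$, in line with the monotonicity of $\psi^*$ noted above.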

## 2 Uniform partial calmness of optimization problems

Let be a proper lsc function, be a continuous function, and be a nonempty closed set of . This section focuses on the uniform partial calmness of

$$({\rm MP})\qquad \min_{z\in\mathbb{Z}}\big\{\theta(z):\ h(z)=0,\ z\in\Delta\big\}.$$

Let and denote the feasible set and the global optimal solution set of , respectively, and write the optimal value of as . We assume that . To introduce the concept of partial calmness, we consider the perturbed problem of :

$$({\rm MP}_{\epsilon})\qquad \min_{z\in\mathbb{Z}}\big\{\theta(z):\ h(z)=\epsilon,\ z\in\Delta\big\}.$$

For any given , we denote by the feasible set of associated to .

###### Definition 2.1

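(see [44, Definition 3.1] or [45, Definition 2.1]) The problem is said to be partially calm at a solution point if there exist and such that for all and all , one has

$$\theta(z)-\theta(z^*)+\mu|h(z)|\ge 0,$$

where $z^*$ denotes the solution point and $\mu$ is the second of the two constants.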

The calmness of a mathematical programming problem at a solution point was originally introduced by Clarke [10], which was later extended to the partial calmness at a solution point by Ye and Zhu [44, 45]. Next we strengthen the partial calmness of at a solution point as the partial calmness over its global optimal solution set .

###### Definition 2.2

The problem is said to be partially calm over its global optimal solution set if it is partially calm at each ; and it is said to be uniformly partial calm over if there exists such that for any ,

$$\theta(z)-v^*({\rm MP})+\mu|h(z)|\ge 0.$$

It is worthwhile to emphasize that the partial calmness over along with the boundedness of does not imply the uniform partial calmness over . In addition, the partial calmness depends on the structure of a problem. Equivalent problems may not share the partial calmness simultaneously; for example, for the following equivalent form of

$$\min_{z\in\mathbb{Z}}\big\{\theta(z):\ {\rm dist}(z,\mathcal{F})=0\big\}, \tag{3}$$

it is easy to verify that the local Lipschitz continuity of relative to is enough for the partial calmness of (3) over , but it may not guarantee that of over . Define

$$\Gamma(\epsilon):=\big\{z\in\Delta\ |\ h(z)=\epsilon\big\}\quad\text{for } \epsilon\in\mathbb{R}. \tag{4}$$

The following lemma states that under a suitable condition for , the partial calmness of over is implied by the calmness of the multifunction at for each . The proof is similar to that of [46, Lemma 3.1], and we include it for completeness.

###### Lemma 2.1

Suppose that is locally Lipschitzian relative to . If the multifunction is calm at for any , then the problem is partially calm over .

Proof: Let be an arbitrary point from . Since is locally Lipschitzian relative to and , there exist and such that for any ,

$$|\theta(z')-\theta(z'')|\le L_{\theta}\|z'-z''\|. \tag{5}$$

In addition, since the multifunction is calm at for , by invoking [13, Exercise 3H.4], there exist constants and such that for all ,

$$\Gamma(\omega)\cap\mathbb{B}(z^*,\delta')\subseteq\Gamma(0)+\nu|\omega|\mathbb{B}_{\mathbb{Z}}.$$

Set . Let be an arbitrary point from and be an arbitrary point from . Clearly, . Applying the last inclusion with , we obtain that . From the closedness of , there exists a point such that Notice that . Together with (5) and , we have

$$\theta(z^*)\le\theta(\widehat{z})=\theta(z)-\theta(z)+\theta(\widehat{z})\le\theta(z)+L_{\theta}\|z-\widehat{z}\|\le\theta(z)+L_{\theta}\nu|h(z)|,$$

where the first inequality is by the feasibility of and the optimality of to . The last inequality and the arbitrariness of in imply the desired conclusion.

Next we shall establish the relation between the uniform partial calmness of over and the global exact penalization of the following penalized problem:

$$({\rm EPMP}_{\mu})\qquad \min_{z\in\mathbb{Z}}\big\{\theta(z)+\mu|h(z)|:\ z\in\Delta\big\}.$$

In [45, Proposition 2.2], Ye et al. showed that under the continuity of , the partial calmness of at a local minimum is equivalent to the local exact penalization of . Here, we extend this result and show that the uniform partial calmness of over is equivalent to the global exact penalization of , i.e., there exists such that the global optimal solution set of each associated to coincides with that of , where is called the threshold of the exact penalty.

###### Proposition 2.1

For the problems and , the following statements hold.

(a) The problem is uniformly partial calm over its global optimal solution set if and only if the problem is a global exact penalty of .

(b) Suppose that the function is coercive or the set is compact. Then, the partial calmness of over implies the global exact penalization of .

Proof: We denote by the global optimal solution set of associated to .

(a) “$\Longleftarrow$”. Since the problem is a global exact penalty of , there exists a constant such that for any , . Take an arbitrary point . Then, for any , is also a global optimal solution of . Thus, for all , from the feasibility of and the optimality of to ,

$$\theta(z)+(\overline{\mu}+\gamma)|h(z)|\ge\theta(z^*)+(\overline{\mu}+\gamma)|h(z^*)|,$$

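which, since $\theta(z^*)=v^*({\rm MP})$ and $h(z^*)=0$, is equivalent to saying that $\theta(z)-v^*({\rm MP})+(\overline{\mu}+\gamma)|h(z)|\ge 0$. Taking the limit $\gamma\downarrow 0$ in this inequality, we obtain $\theta(z)-v^*({\rm MP})+\overline{\mu}|h(z)|\ge 0$. This shows that (MP) is uniformly partial calm over its global optimal solution set.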

“$\Longrightarrow$”. Since the problem is uniformly partial calm over its global optimal solution set , there exists a constant such that for all ,

$$\theta(z)-v^*({\rm MP})+\widehat{\mu}|h(z)|\ge 0.$$

We first prove that for any , . Let be an arbitrary point from . Fix an arbitrary . From the last inequality, it follows that for any ,

$$\theta(z)+\mu|h(z)|\ge\theta(z)+\widehat{\mu}|h(z)|\ge\theta(z^*)=\theta(z^*)+\mu|h(z^*)|.$$

This, by the arbitrariness of , implies that . Consequently, for any , it holds that . Next we shall prove that for any , . To this end, fix an arbitrary and take an arbitrary point . Let . Then,

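$$\theta(z_{\mu})+\mu|h(z_{\mu})|\le\theta(z^*)+\mu|h(z^*)|=v^*({\rm MP})+\frac{\mu+\widehat{\mu}}{2}|h(z^*)|\le\theta(z_{\mu})+\frac{\mu+\widehat{\mu}}{2}|h(z_{\mu})|$$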

where the first inequality is by the optimality of and the feasibility of to , and the second one is due to for , implied by the above arguments. The last inequality implies , and then . This shows that is feasible to the problem . Together with the first inequality in the last equation, is optimal to . The stated inclusion follows by the arbitrariness of in .

(b) Since is coercive or the set is compact, for each we have . Assume that is partially calm over . To prove that is a global exact penalty for , we first argue that there exists such that for any , . If not, for each sufficiently large , there exist and such that

$$\theta(z^k)+k|h(z^k)|<\theta(z^{k,*})+k|h(z^{k,*})|=v^*({\rm MP}). \tag{6}$$

If is compact, clearly, is bounded. If is coercive, inequality (6) implies that is also bounded. Thus, from and the closedness of , we assume (if necessary taking a subsequence) that . Notice that (6) can be equivalently written as

$$0\le|h(z^k)|<\frac{1}{k}\big[v^*({\rm MP})-\theta(z^k)\big].$$

Take the limit $k\to+\infty$ on both sides. By the continuity of $h$ and the lower semi-continuity of $\theta$,

$$0\le|h(\overline{z})|=\lim_{k\to+\infty}|h(z^k)|\le\lim_{k\to+\infty}\frac{1}{k}\big[v^*({\rm MP})-\theta(z^k)\big]=0.$$

In addition, from (6) it follows that . This shows that is a global optimal solution to . But then inequality (6) gives a contradiction to the partial calmness of at , which is implied by the given assumption that is partially calm over . Thus, there exists such that for any , . In addition, using the same arguments as those for the direction “$\Longrightarrow$” in part (a), one may prove that for any , . Thus, is a global exact penalty of .

###### Remark 2.1

Proposition 2.1 shows that under the coerciveness of or the compactness of , the partial calmness of over , the uniform partial calmness of over and the global exact penalization of are equivalent to each other.
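To make Proposition 2.1(a) concrete, the following sketch works through a one-dimensional toy instance that is not taken from the paper: $\theta(z)=-z$, $h(z)=z-1$ and $\Delta=[0,2]$, so that the feasible set is $\{1\}$ and $v^*({\rm MP})=-1$. On a grid over $\Delta$ it checks the uniform partial calmness inequality and lists the minimizers of the penalized objective $\theta(z)+\mu|h(z)|$. All function names and the grid resolution are hypothetical, and a grid search is only a numerical check, not a proof.

```python
def theta(z):
    """Toy objective (an assumption made for illustration)."""
    return -z


def h(z):
    """Toy equality constraint h(z) = 0, i.e. z = 1."""
    return z - 1.0


def grid(lo=0.0, hi=2.0, n=2000):
    """Uniform grid on the toy set Delta = [lo, hi]."""
    return [lo + (hi - lo) * k / n for k in range(n + 1)]


def penalized_minimizers(mu, points, tol=1e-12):
    """Grid points minimizing theta(z) + mu * |h(z)| over Delta."""
    values = [theta(z) + mu * abs(h(z)) for z in points]
    best = min(values)
    return [z for z, v in zip(points, values) if v <= best + tol]


def is_uniformly_partially_calm(mu, points, optimal_value=-1.0, tol=1e-12):
    """Check theta(z) - v* + mu * |h(z)| >= 0 at every grid point of Delta."""
    return all(theta(z) - optimal_value + mu * abs(h(z)) >= -tol for z in points)


if __name__ == "__main__":
    pts = grid()
    for mu in (0.5, 1.0, 2.0):
        mins = penalized_minimizers(mu, pts)
        print(mu, is_uniformly_partially_calm(mu, pts), mins[0], mins[-1])
```

In this toy instance the calmness inequality holds exactly when $\mu\ge 1$, and the penalized problem has the single minimizer $z=1$ for $\mu>1$, whereas for $\mu=0.5$ its minimizer is the infeasible point $z=2$. At $\mu=1$ every $z\in[1,2]$ minimizes the penalized objective, which illustrates why the exact penalty statement is made for parameters beyond a threshold.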

## 3 Equivalent L-surrogates of group zero-norm problems

Let be a partition of . For any given , define

$$G_{J,p}(x):=\big(\|x_{J_1}\|_p,\|x_{J_2}\|_p,\ldots,\|x_{J_m}\|_p\big)^{\rm T}\quad\text{for } x\in\mathbb{R}^n.$$

The number of nonzero components in , denoted by , is called the group zero-norm of induced by the partition and the norm . Clearly, when and for , reduces to the zero-norm of . As a producer of structured sparsity, the group zero-norm has a wide application in statistics, signal and image processing, machine learning, and bioinformatics (see, e.g., [47, 1, 43]). By the definition of the function family , for any , with one has that

$$\|G_{J,p}(x)\|_0=\min_{w\in\mathbb{R}^m}\Big\{\sum_{i=1}^m\phi(w_i):\ \langle e-w,G_{J,p}(x)\rangle=0,\ 0\le w\le e\Big\}, \tag{7}$$

that is, the group zero-norm is an optimal value function of a parameterized problem.
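The characterization (7) can be checked numerically on a small instance. The sketch below assumes $\phi(t)=t$ (an illustrative choice satisfying (1), not necessarily one of the paper's examples), a hypothetical three-block partition, and $p=2$. It computes the group zero-norm directly and compares it with a brute-force minimization of $\sum_i\phi(w_i)$ over a grid of $w\in[0,1]^m$ restricted to the complementarity constraint $\langle e-w,G_{J,p}(x)\rangle=0$. All function names are hypothetical.

```python
from itertools import product


def group_norms(x, partition, p=2.0):
    """G_{J,p}(x): the l_p norm of each block x_{J_i} (blocks are index lists)."""
    return [sum(abs(x[j]) ** p for j in block) ** (1.0 / p) for block in partition]


def group_zero_norm(x, partition, p=2.0):
    """Number of blocks with a nonzero l_p norm."""
    return sum(1 for g in group_norms(x, partition, p) if g > 0)


def phi(t):
    """Illustrative choice phi(t) = t (an assumption)."""
    return t


def variational_value(x, partition, p=2.0, grid_size=10):
    """Brute-force the right-hand side of (7) over a grid of w in [0, 1]^m."""
    g = group_norms(x, partition, p)
    levels = [k / grid_size for k in range(grid_size + 1)]
    best = float("inf")
    for w in product(levels, repeat=len(g)):
        # Every term (1 - w_i) * g_i is nonnegative, so the sum vanishes
        # only if w_i = 1 on each block with a nonzero norm.
        complementarity = sum((1.0 - wi) * gi for wi, gi in zip(w, g))
        if complementarity == 0.0:
            best = min(best, sum(phi(wi) for wi in w))
    return best


if __name__ == "__main__":
    x = [0.0, 0.0, 3.0, -4.0, 0.5]
    blocks = [[0, 1], [2, 3], [4]]
    print(group_zero_norm(x, blocks), variational_value(x, blocks))
```

The constraint forces $w_i=1$ on every nonzero block, while on zero blocks the minimization picks $w_i=t^*$ with $\phi(t^*)=0$, so both quantities count the nonzero blocks.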

### 3.1 Group zero-norm minimization problems

Let be a proper lsc function, and let be a closed set. This subsection is devoted to the following group zero-norm minimization problem

$$\min_{x\in\mathbb{R}^n}\big\{\|G_{J,p}(x)\|_0:\ f(x)\le\delta,\ x\in\Omega\big\} \tag{8}$$

where is a constant to represent the noise level. We assume that (8) has a nonempty global optimal solution set and a nonzero optimal value, denoted by . Let denote the feasible set of (8). From equation (7), it is immediate to obtain the following result.

###### Lemma 3.1

Let . The group zero-norm minimization problem (8) is equivalent to

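$$\min_{x\in\mathbb{R}^n,\,w\in\mathbb{R}^m}\Big\{\sum_{i=1}^m\phi(w_i):\ \langle e-w,G_{J,p}(x)\rangle=0,\ 0\le w\le e,\ x\in\Omega,\ f(x)\le\delta\Big\} \tag{9}$$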

in the sense that if is a global optimal solution of (8), then is globally optimal to (9) with the optimal value equal to ; conversely, if is a global optimal solution to (9), then is globally optimal to (8).

Observe that the minimization problem in (9) involves an equilibrium constraint

$$\langle e-w,G_{J,p}(x)\rangle=0,\quad e-w\ge 0,\quad G_{J,p}(x)\ge 0.$$

Lemma 3.1 shows that the (group) zero-norm minimization problem is essentially an MPEC. Such an equivalent reformulation was employed in [3] to develop a penalty decomposition method for zero-norm minimization problems, and used in [17] to study the stationary point conditions for zero-norm optimization problems. Next we shall establish the uniform partial calmness of this MPEC over its global optimal solution set.

###### Theorem 3.1

Let . Suppose that there exists such that for all . Then (9) is uniformly partial calm over its global optimal solution set and

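$$\min_{x\in\mathbb{R}^n,\,w\in\mathbb{R}^m}\Big\{\sum_{i=1}^m\phi(w_i)+\varrho\langle e-w,G_{J,p}(x)\rangle:\ f(x)\le\delta,\ x\in\Omega,\ 0\le w\le e\Big\} \tag{10}$$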

is a global exact penalty for the MPEC (9) with threshold .

Proof: Let be an arbitrary feasible point from . Then, it holds that

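$$\begin{aligned}
\sum_{i=1}^m\phi(w_i)+\overline{\varrho}\langle e-w,G_{J,p}(x)\rangle
&\ge\sum_{i=1}^m\phi(\pi_i(w))+\overline{\varrho}\sum_{i=1}^m\pi_i(G_{J,p}(x))(1-\pi_i(w))\\
&\ge\sum_{i=1}^{s^*}\big[\phi(\pi_i(w))+\overline{\varrho}\,\pi_{s^*}(G_{J,p}(x))(1-\pi_i(w))\big]\\
&\ge\sum_{i=1}^{s^*}\big[\phi(\pi_i(w))+\phi'_-(1)(1-\pi_i(w))\big]\\
&\ge s^*\phi(1)=s^*,
\end{aligned}$$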

where the first inequality is using , the third one is by and , and the last one is due to for implied by the convexity of in . Since is the optimal value of (9) by Lemma 3.1, the last inequality along with the arbitrariness of in shows that (9) is uniformly partial calm over its global optimal solution set, which by Proposition 2.1(a) is equivalent to saying that (10) is a global exact penalty.

With the function in (2) associated to and its conjugate , we can represent the dual variable in (10) by the variable , and obtain the following conclusion.
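The elimination of $w$ can be written out for a single block. Writing $a_i:=\|x_{J_i}\|_p$ (a shorthand introduced only for this derivation), and using definition (2) of $\psi$ together with the definition of the conjugate, the inner minimization over $w_i$ in (10) evaluates as follows; this is a sketch of the computation rather than a quotation of the authors' argument.

```latex
\begin{aligned}
\min_{0\le w_i\le 1}\big\{\phi(w_i)+\varrho(1-w_i)a_i\big\}
  &= \varrho a_i-\max_{0\le w_i\le 1}\big\{\varrho a_i w_i-\phi(w_i)\big\}\\
  &= \varrho a_i-\sup_{t\in\mathbb{R}}\big\{\varrho a_i t-\psi(t)\big\}
   = \varrho a_i-\psi^*(\varrho a_i).
\end{aligned}
```

Since the objective of (10) is separable in $w$, summing this identity over $i=1,\ldots,m$ leaves a function of $x$ alone, namely $\varrho\sum_{i=1}^m\|x_{J_i}\|_p-\sum_{i=1}^m\psi^*(\varrho\|x_{J_i}\|_p)$.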

###### Corollary 3.1

Let . Under the assumption of Theorem 3.1, the problem (8) has the same global optimal solution set as the following problem does for every $\varrho$ admitted by the exact penalty result of Theorem 3.1:

$$\min_{x\in\mathbb{R}^n}\Big\{\varrho\sum_{i=1}^m\|x_{J_i}\|_p-\sum_{i=1}^m\psi^*(\varrho\|x_{J_i}\|_p):\ f(x)\le\delta,\ x\in\Omega\Big\}. \tag{11}$$

Notice that is nondecreasing and convex in . So, the function is convex in . Thus, the objective function of (11) is locally Lipschitz in by [38, Theorem 10.4] and provides a class of equivalent Lipschitz surrogates for the group zero-norm problem (8). If the feasible set of (8) is convex, it also provides a class of equivalent D.C. surrogates since its objective function is now the difference of two convex functions.
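To see what the objective of (11) looks like in a concrete case, the sketch below again assumes $\phi(t)=t$, for which $\psi^*(s)=\max(s-1,0)$ and each summand $\varrho\|x_{J_i}\|_p-\psi^*(\varrho\|x_{J_i}\|_p)$ reduces to $\min(\varrho\|x_{J_i}\|_p,1)$, a capped group norm. The partition, the test vector and the values of $\varrho$ are hypothetical, and the code only evaluates the surrogate; it does not verify the assumption of Theorem 3.1.

```python
def group_norms(x, partition, p=2.0):
    """G_{J,p}(x): the l_p norm of each block x_{J_i} (blocks are index lists)."""
    return [sum(abs(x[j]) ** p for j in block) ** (1.0 / p) for block in partition]


def psi_conjugate(s):
    """psi*(s) for the assumed phi(t) = t; other members of the family differ."""
    return max(s - 1.0, 0.0)


def surrogate(x, partition, rho, p=2.0):
    """Objective of (11): rho * sum_i ||x_{J_i}||_p - sum_i psi*(rho * ||x_{J_i}||_p)."""
    norms = group_norms(x, partition, p)
    return sum(rho * g - psi_conjugate(rho * g) for g in norms)


if __name__ == "__main__":
    x = [0.0, 0.0, 3.0, -4.0, 0.5]
    blocks = [[0, 1], [2, 3], [4]]
    for rho in (1.0, 4.0):
        print(rho, surrogate(x, blocks, rho))
```

With block norms $(0, 5, 0.5)$, the surrogate equals $1.5$ for $\varrho=1$ and equals the group zero-norm $2$ for $\varrho=4$, since then $\varrho$ times the smallest nonzero block norm is at least $1$.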

To close this subsection, we show that the assumption of Theorem 3.1 is very mild.

###### Lemma 3.2

Suppose that is bounded, or and for a proper lsc coercive function and a matrix . Then there exists such that for all .

Proof: Suppose the conclusion does not hold. Then there exist with and such that . We proceed with the argument in two cases.

Case 1: is bounded. Now we may assume (if necessary taking a subsequence) that . Together with and the continuity of , it follows that . Notice that . This means that , which contradicts the fact that is the optimal value of the problem (8).

Case 2: and . Now since , we may assume (if necessary taking a subsequence) that there exist such that with . Notice that . By rearranging the columns of if necessary, for each there exists such that where with and . Write for some with . Since is lower semi-continuous and coercive, the sequence is bounded. Without loss of generality, we assume for some with . Since the set is closed by [5, Proposition 2.41], there exists such that , and then or . Since , we obtain a contradiction to the fact that is the optimal value of (8).

###### Remark 3.1

When and , the coerciveness of does not imply the boundedness of . The conclusions of Theorem 3.1 and Lemma 3.2 extend the exact penalty result in [3, Theorem 3.3]. In fact, when taking , and for , one can recover the exact penalty result in [3, Theorem 3.3].

### 3.2 Group zero-norm regularized problems

This subsection is devoted to the group zero-norm regularized minimization problem

$$\min_{x\in\mathbb{R}^n}\big\{\nu f(x)+\|G_{J,p}(x)\|_0:\ x\in\Omega\big\}, \tag{12}$$

where is the regularization parameter, and and are the same as those in Subsection 3.1. Assume that (12) has a nonempty global solution set and write its optimal value as . By the characterization of the group zero-norm in (7), the following result holds.

###### Lemma 3.3

Let . The group zero-norm regularized problem (12) is equivalent to

$$\min_{x\in\mathbb{R}^n,\,w\in\mathbb{R}^m}\Big\{\nu f(x)+\sum_{i=1}^m\phi(w_i):\ \langle e-w,G_{J,p}(x)\rangle=0,\ 0\le w\le e,\ x\in\Omega\Big\}, \tag{13}$$

in the sense that if is globally optimal to (12), then is a global optimal solution of (13) with optimal value equal to ; conversely, if is a global optimal solution of (13), then is globally optimal to (12).

Lemma 3.3 states that the group zero-norm regularized problem is also an MPEC. Next, under a suitable restriction on , we show that the MPEC (13) is uniformly partial calm in its global solution set. To this end, for any and , with define

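$$x^{\varrho}_j:=\begin{cases}0 & \text{if } j\in\bigcup_{i\notin{\rm supp}(y(x,\varrho))}J_i,\\ x_j & \text{otherwise}\end{cases}\quad\text{for } j=1,2,\ldots,n \tag{14}$$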

with

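$$y_i(x,\varrho):=\begin{cases}\|x_{J_i}\|_p & \text{if } \varrho\|x_{J_i}\|_p>\phi'_-(1),\\ 0 & \text{otherwise}\end{cases}\quad\text{for } i=1,2,\ldots,m. \tag{15}$$

###### Theorem 3.2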

Let . Suppose that is Lipschitzian relative to with constant , and for any given and , the vector lies in . Then, the MPEC (13) is uniformly partial calm in its global optimal solution set, or equivalently the problem

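$$\min_{x\in\mathbb{R}^n,\,w\in\mathbb{R}^m}\Big\{\nu f(x)+\sum_{i=1}^m\phi(w_i)+\varrho\langle e-w,G_{J,p}(x)\rangle:\ x\in\Omega,\ 0\le w\le e\Big\} \tag{16}$$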

is a global exact penalty of (13) with threshold , where is the constant in Lemma 1 of Appendix A and .

Proof: Let be an arbitrary point in . Define the following index sets

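$$I=\Big\{i:\ \frac{1}{1-t^*}\le\overline{\varrho}\|x_{J_i}\|_p\le\phi'_-(1)\Big\}\quad\text{and}\quad\overline{I}=\Big\{i:\ 0\le\overline{\varrho}\|x_{J_i}\|_p<\frac{1}{1-t^*}\Big\}.$$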

By using Lemma 1 of Appendix A with , there exists such that

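$$\begin{aligned}
\sum_{i=1}^m\phi(w_i)+\overline{\varrho}\langle e-w,G_{J,p}(x)\rangle
&=\sum_{i=1}^m\big[\phi(w_i)+\overline{\varrho}(1-w_i)\|x_{J_i}\|_p\big]\\
&\ge\|y(x,\overline{\varrho})\|_0+\frac{\overline{\varrho}(1-t_0)}{\phi'_-(1)(1-t^*)}\sum_{i\in I}\|x_{J_i}\|_p+\overline{\varrho}(1-t_0)\sum_{i\in\overline{I}}\|x_{J_i}\|_p\\
&\ge\|y(x,\overline{\varrho})\|_0+\beta\nu L_f\sum_{i\in I\cup\overline{I}}\|x_{J_i}\|_p
 \ge\|y(x,\overline{\varrho})\|_0+\nu L_f\|x-x^{\overline{\varrho}}\|_2\\
&\ge\|G_{J,p}(x^{\overline{\varrho}})\|_0+\nu\big(f(x^{\overline{\varrho}})-f(x)\big)
\end{aligned}$$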

where the first inequality uses the definition of , the second one uses by the convexity of in , and the last one is due to the Lipschitz continuity of relative to and . From the last inequality and , we have

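$$\nu f(x)+\sum_{i=1}^m\phi(w_i)+\overline{\varrho}\langle e-w,G_{J,p}(x)\rangle\ge\|G_{J,p}(x^{\overline{\varrho}})\|_0+\nu f(x^{\overline{\varrho}})\ge\varpi^*.$$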

This, by the arbitrariness of in , shows that the MPEC (13) is uniformly partial calm over its optimal solution set. The proof is completed.
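The truncated vector $x^{\varrho}$ from (14)-(15), which the proof above compares with $x$, can be computed blockwise. The sketch below is a minimal illustration under stated assumptions: the left derivative $\phi'_-(1)$ is passed in as a plain number (it equals $1$ for the illustrative choice $\phi(t)=t$), blocks are given as lists of zero-based indices, and the function names are hypothetical.

```python
def group_norms(x, partition, p=2.0):
    """G_{J,p}(x): the l_p norm of each block x_{J_i} (blocks are index lists)."""
    return [sum(abs(x[j]) ** p for j in block) ** (1.0 / p) for block in partition]


def truncate_groups(x, partition, rho, phi_left_derivative_at_1, p=2.0):
    """Return (y, x_rho) following (15) and (14).

    y_i keeps the block norm when rho * ||x_{J_i}||_p > phi'_-(1) and is 0
    otherwise; x_rho zeroes every block whose index is outside supp(y).
    """
    norms = group_norms(x, partition, p)
    y = [g if rho * g > phi_left_derivative_at_1 else 0.0 for g in norms]
    x_rho = list(x)
    for yi, block in zip(y, partition):
        if yi == 0.0:
            for j in block:
                x_rho[j] = 0.0
    return y, x_rho


if __name__ == "__main__":
    x = [0.1, 0.0, 3.0, -4.0, 0.5]
    blocks = [[0, 1], [2, 3], [4]]
    # phi'_-(1) = 1 corresponds to the assumed phi(t) = t.
    print(truncate_groups(x, blocks, rho=1.0, phi_left_derivative_at_1=1.0))
```

Here only the middle block has $\varrho\|x_{J_i}\|_2=5>1$, so the first and last blocks are set to zero and $x^{\varrho}=(0,0,3,-4,0)$.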

When takes ,