# Sparse Solutions of a Class of Constrained Optimization Problems

## Abstract

In this paper, we consider a well-known sparse optimization problem that aims to find a sparse solution of a possibly noisy underdetermined system of linear equations. Mathematically, it can be modeled in a unified manner by minimizing $\|x\|_p^p$ subject to $\|Ax-b\|_q\le\sigma$ for given $A\in\mathbb{R}^{m\times n}$, $b\in\mathbb{R}^m$, $\sigma\ge 0$, $0\le p\le 1$ and $q\ge 1$. We then study various properties of the optimal solutions of this problem. Specifically, without any condition on the matrix $A$, we provide upper bounds in cardinality and infinity norm for the optimal solutions, and show that all optimal solutions must be on the boundary of the feasible set when $0<p<1$. Moreover, for $q\in\{1,\infty\}$, we show that the problem with $0<p<1$ has a finite number of optimal solutions and prove that there exists $0<p^*<1$ such that the solution set of the problem with any $0<p<p^*$ is contained in the solution set of the problem with $p=0$, and there further exists $0<\bar{p}<p^*$ such that the solution set of the problem with any $0<p\le\bar{p}$ remains unchanged. An estimation of such $p^*$ is also provided. In addition, to solve the constrained nonconvex non-Lipschitz $L_p$-$L_1$ problem ($0<p<1$ and $q=1$), we propose a smoothing penalty method and show that, under some mild conditions, any cluster point of the sequence generated is a KKT point of our problem. Some numerical examples are given to implicitly illustrate the theoretical results and show the efficiency of the proposed algorithm for the constrained $L_p$-$L_1$ problem under different noises.

Key words. Sparse optimization; nonconvex non-Lipschitz optimization; cardinality minimization; penalty method; smoothing approximation.

AMS subject classifications. 90C46, 49K35, 90C30, 65K05

## 1 Introduction

In this paper, we consider a class of sparse optimization problems, which can be modeled in a unified manner as the following constrained $L_p$-$L_q$ problem:

$$\min_{x\in\mathbb{R}^n}\ \|x\|_p^p:=\sum_{i=1}^n|x_i|^p \quad \mathrm{s.t.}\quad \|Ax-b\|_q\le\sigma, \tag{1.1}$$

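where $A\in\mathbb{R}^{m\times n}$, $b\in\mathbb{R}^m$, $\sigma\ge 0$, $0\le p\le 1$ and $1\le q\le\infty$ are given. Let $\mathrm{SOL}(A,b,\sigma,p,q)$ denote the optimal solution set of (1.1). We assume that so that . Obviously, when $p=1$, (1.1) is a convex optimization problem and when $0<p<1$, (1.1) yields a nonconvex and non-Lipschitz optimization problem.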
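To make the ingredients of (1.1) concrete, the following is a minimal sketch in plain Python that evaluates the objective $\|x\|_p^p$ and tests the constraint $\|Ax-b\|_q\le\sigma$ for a candidate vector. The function names (`lp_objective`, `lq_norm`, `residual`, `is_feasible`), the tolerance `tol`, and the small data set are illustrative assumptions and do not come from the paper. The case $p=0$ is handled separately as a count of nonzero entries, since Python evaluates `0.0 ** 0` as `1.0`.

```python
import math

def lp_objective(x, p):
    """Return ||x||_p^p for 0 <= p <= 1; p = 0 counts the nonzero entries."""
    if p == 0:
        return float(sum(1 for xi in x if xi != 0))
    return sum(abs(xi) ** p for xi in x)

def lq_norm(v, q):
    """Return ||v||_q for 1 <= q <= infinity (pass math.inf for the max norm)."""
    if q == math.inf:
        return max(abs(vi) for vi in v)
    return sum(abs(vi) ** q for vi in v) ** (1.0 / q)

def residual(A, b, x):
    """Return Ax - b, where A is given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) - bi for row, bi in zip(A, b)]

def is_feasible(A, b, x, sigma, q, tol=1e-12):
    """Check the constraint ||Ax - b||_q <= sigma up to a small tolerance."""
    return lq_norm(residual(A, b, x), q) <= sigma + tol

# Hypothetical data: a 2 x 3 underdetermined system with a 1-sparse solution.
A = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]]
b = [2.0, 1.0]
x = [0.0, 1.0, 0.0]

print(is_feasible(A, b, x, sigma=0.1, q=2))
print(lp_objective(x, 0), lp_objective([0.5, 0.0, 2.0], 0.01))
```

The last line also illustrates, for one fixed vector, that $\|x\|_p^p$ is close to the number of nonzero entries when $p$ is small.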

Problem (1.1) aims to find a sparse vector $x$ from the corrupted observation $b=Ax+e$, where $e$ denotes an unknown noise vector bounded by $\sigma$ (the noise level) in the $L_q$-norm, i.e., $\|e\|_q\le\sigma$. This problem arises in many contemporary applications and has been widely studied under different choices of , and in the literature; see, for example, [3, 4, 5, 6, 7, 8, 12, 13, 15, 16, 17, 18, 20, 23, 31, 34, 35, 41, 42, 43]. Among these studies, the $L_2$-norm is commonly used for measuring the noise and leads to a mathematically tractable problem when the noise exists and comes from a Gaussian distribution [3, 5, 12, 13, 17, 20, 34, 35]. In particular, it is known that a sparse vector can be (approximately) recovered by the solution of the convex optimization problem (1.1) with $p=1$ and $q=2$ under some well-known recovery conditions such as the restricted isometry property (RIP) [5], the mutual coherence condition [3, 17] and the null space property (NSP) [15, 41]. Such a convex constrained $L_1$-$L_2$ problem can also be solved efficiently by a spectral projected gradient minimization algorithm (SPGL1) proposed by Van den Berg and Friedlander [35]. On the other hand, it is natural to find a sparse vector by solving (1.1) with $0<p<1$ since $\|x\|_p^p$ approaches $\|x\|_0$ as $p\to 0$. Indeed, under certain RIP conditions, Foucart and Lai [20] showed that a sparse vector can be (approximately) recovered by the solution of the nonconvex non-Lipschitz problem (1.1) with $0<p<1$ and $q=2$. Chen, Lu and Pong [12] also proposed a penalty method for solving this constrained $L_p$-$L_2$ problem ($0<p<1$) with promising numerical performance. Later, this penalty method and SPGL1 were combined to solve (1.1) with and for recovering sparse signals on the sphere in [13]. However, when the noise does not come from a Gaussian distribution but from a heavy-tailed distribution (e.g., Student's t-distribution) or contains outliers, using $\|Ax-b\|_2$ as the data fitting term is no longer appropriate. In this case, some robust loss functions such as the $L_1$-norm [19, 36, 37] and the $L_\infty$-norm [4, 7] are used to develop robust models. Recently, Zhao, Jiang and Luo [43] also established a fairly comprehensive weak stability theory for problem (1.1) with and under a so-called weak range space property (RSP) condition. The weak RSP condition can be induced by several existing compressed sensing matrix properties and hence can be the mildest one for the sparse solution recovery. However, it is still not easy to verify this condition in practice.

In this paper, we focus on problem (1.1) with different choices of $p$ and $q$, and establish the following theoretical results concerning its optimal solutions without any condition on the sensing matrix $A$.

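(i) For any $x^*\in\mathrm{SOL}(A,b,\sigma,p,q)$ with $0\le p<1$ and $1\le q\le\infty$, we have

$$\|x^*\|_0=\mathrm{rank}(A_J)\quad\text{and}\quad\|x^*\|_\infty\le\frac{\sigma m^{1-\frac{1}{q}}+\|b\|_2}{\sqrt{\lambda_{\min}(A_J^\top A_J)}},$$

where $J:=\{i: x_i^*\neq 0\}$ and $\lambda_{\min}(A_J^\top A_J)$ is the smallest eigenvalue of $A_J^\top A_J$. Moreover, for any , for ; and with some for .

(ii) For $q\in\{1,\infty\}$, the solution set $\mathrm{SOL}(A,b,\sigma,p,q)$ with $0<p<1$ has a finite number of elements.

(iii) There exists a such that for any . An explicit estimation of such is also given. Moreover, there exists a such that for any .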

Here, we would like to point out that the sparse solution recovery result (iii) is developed without any aforementioned recovery condition on $A$. This not only complements the existing recovery results in the literature, but also shows the potential advantage of using the $L_p$-norm ($0<p<1$) for recovering the sparse solution over the $L_1$-norm.

Note that problem (1.1) is a constrained problem, while, in statistics and computer science, the - problem/minimization often refers to the following unconstrained regularized problem [9, 11, 14]:

$$\min_{x\in\mathbb{R}^n}\ \|Ax-b\|_q^q+\lambda\|x\|_p^p, \tag{1.2}$$

where is a positive regularization parameter. Indeed, when and , problem (1.2) is the well-known -regularized least-squares problem (namely, the LASSO problem) and it is known that, in this case, there exists a such that, for , the constrained problem (1.1) is equivalent to the unconstrained problem (1.2) regarding solutions; see, for example, [3, Section 3.2.3]. However, Example 3.1 in [12] shows that for and , there does not exist a so that problems (1.1) and (1.2) have a common global or local minimizer. Hence, for , one cannot expect to solve (1.1) by solving the regularized problem (1.2) with some fixed . In view of this, we shall consider a penalty method for solving problem (1.1) with , which basically solves the constrained problem (1.1) by solving a sequence of unconstrained penalty problems. Specifically, we consider the following penalty problem of (1.1):

$$\min_{x\in\mathbb{R}^n}\ \|x\|_p^p+\lambda\left(\|Ax-b\|_q^q-\sigma^q\right)_+. \tag{1.3}$$
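As a rough illustration of how the penalty term in (1.3) behaves, the sketch below evaluates the penalized objective for a finite $q$ in plain Python. The function name `penalty_objective` and the test data are assumptions made for illustration only; the sketch evaluates the objective and does not implement any solver.

```python
def penalty_objective(A, b, x, p, q, lam, sigma):
    """Evaluate ||x||_p^p + lam * (||Ax - b||_q^q - sigma^q)_+ .

    Assumes 0 < p <= 1 and 1 <= q < infinity; A is a list of rows.
    """
    r = [sum(a * xi for a, xi in zip(row, x)) - bi for row, bi in zip(A, b)]
    fit = sum(abs(ri) ** q for ri in r)       # ||Ax - b||_q^q
    violation = max(fit - sigma ** q, 0.0)    # the (.)_+ operation
    return sum(abs(xi) ** p for xi in x) + lam * violation

# Hypothetical data: identity sensing matrix, noise level 0.5, q = 1.
A = [[1.0, 0.0], [0.0, 1.0]]
b = [1.0, 0.0]

# A feasible point pays no penalty; an infeasible one pays lam times the violation.
print(penalty_objective(A, b, [0.5, 0.0], p=0.5, q=1, lam=10.0, sigma=0.5))
print(penalty_objective(A, b, [0.0, 0.0], p=0.5, q=1, lam=10.0, sigma=0.5))
```

For points satisfying the constraint of (1.1), the penalized objective coincides with $\|x\|_p^p$, so the two problems differ only outside the feasible set.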

Note that the function $x\mapsto\|Ax-b\|_q^q$ is continuously differentiable for $1<q<\infty$. Then, based on problem (1.3), one can extend the penalty method proposed in [12] for solving problem (1.1) with $0<p<1$ and $q=2$ to solve problem (1.1) with $0<p<1$ and $1<q<\infty$. However, for $q\in\{1,\infty\}$, the function $\|Ax-b\|_q$ is nonsmooth, and therefore, the penalty method in [12] cannot be readily applied. To handle these nonsmooth cases ($q\in\{1,\infty\}$), in this paper, we propose a smoothing penalty method for solving

$$\min_{x\in\mathbb{R}^n}\ \|x\|_p^p \quad \mathrm{s.t.}\quad \|Ax-b\|_1\le\sigma, \tag{1.4}$$

where $0<p<1$. Notice that we omit the case of $q=\infty$ to save space in this paper. Nevertheless, our approach can be extended without much difficulty to solve problem (1.1) with $0<p<1$ and $q=\infty$, because the $L_1$-constrained problem and the $L_\infty$-constrained problem have similar properties in the sense that both constraints $\|Ax-b\|_1\le\sigma$ and $\|Ax-b\|_\infty\le\sigma$ can be represented as linear constraints, and the functions $\|Ax-b\|_1$ and $\|Ax-b\|_\infty$ are piecewise linear. We shall show that problem (1.3) with $q=1$ is the exact penalty problem of problem (1.4) regarding local minimizers and global minimizers. We also prove that any cluster point of a sequence generated by our smoothing penalty method is a KKT point of problem (1.4). Moreover, some numerical results are reported to show that all computed KKT points have the properties in our theoretical contribution (i) mentioned above. Here, we would like to emphasize that finding a global optimal solution of (1.4) is NP-hard [11, 21]. Thus, it is interesting to see that our smoothing penalty method can efficiently find a 'good' KKT point of problem (1.4), which has important properties of a global optimal solution of problem (1.4).

The rest of this paper is organized as follows. In Section 2, we rigorously prove properties (i)-(iii) listed above and give a concrete example to verify these properties. In Section 3, we present a smoothing penalty method for solving problem (1.4) and show some convergence results. Some numerical results are presented in Section 4, with some concluding remarks given in Section 5.

### Notation and Preliminaries

In this paper, we use the convention that . For an index set , let denote its cardinality and denote its complementarity set. We denote by the restriction of a vector onto and denote by the submatrix formed from a matrix by picking the columns indexed by . Recall from [33, Definition 8.3] that, for a proper closed function , the regular (or Fréchet) subdifferential, the (limiting) subdifferential and the horizon subdifferential of at are defined respectively as

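$$\begin{aligned}
\hat{\partial}f(x) &:= \left\{d\in\mathbb{R}^n : \liminf_{y\to x,\ y\neq x}\frac{f(y)-f(x)-\langle d,\,y-x\rangle}{\|y-x\|}\ge 0\right\},\\
\partial f(x) &:= \left\{d\in\mathbb{R}^n : \exists\, x^k\xrightarrow{f}x,\ d^k\to d\ \text{ with }\ d^k\in\hat{\partial}f(x^k)\right\},\\
\partial^{\infty}f(x) &:= \left\{d\in\mathbb{R}^n : \exists\, x^k\xrightarrow{f}x,\ \lambda_kd^k\to d,\ \lambda_k\downarrow 0\ \text{ with }\ d^k\in\hat{\partial}f(x^k)\right\}.
\end{aligned}$$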

It can be observed from the above definitions (or see [33, Proposition 8.7]) that

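$$\begin{aligned}
\left\{d\in\mathbb{R}^n : \exists\, x^k\xrightarrow{f}x,\ d^k\to d\ \text{ with }\ d^k\in\partial f(x^k)\right\} &\subseteq \partial f(x),\\
\left\{d\in\mathbb{R}^n : \exists\, x^k\xrightarrow{f}x,\ \lambda_kd^k\to d,\ \lambda_k\downarrow 0\ \text{ with }\ d^k\in\partial f(x^k)\right\} &\subseteq \partial^{\infty}f(x).
\end{aligned} \tag{1.5}$$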

For a closed convex set , its indicator function is defined by if and otherwise. For any , let be a set defined as:

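$$\mathrm{SIGN}(x)=\left\{d\in\mathbb{R}^n : \begin{array}{ll} d_i=1, & \text{if } x_i>0,\\ d_i\in[-1,1], & \text{if } x_i=0,\\ d_i=-1, & \text{if } x_i<0, \end{array}\ \text{ for } i=1,\cdots,n\right\}.$$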
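The definition of $\mathrm{SIGN}(x)$ is a componentwise rule, and a membership test makes it easy to experiment with. The following sketch is illustrative only; the function name `in_sign_set` is an assumption and not notation from the paper.

```python
def in_sign_set(d, x):
    """Return True if d belongs to SIGN(x) under the componentwise definition."""
    if len(d) != len(x):
        return False
    for di, xi in zip(d, x):
        if xi > 0 and di != 1:
            return False          # positive entries force d_i = 1
        if xi < 0 and di != -1:
            return False          # negative entries force d_i = -1
        if xi == 0 and not (-1 <= di <= 1):
            return False          # zero entries allow any d_i in [-1, 1]
    return True

x = [2.0, 0.0, -3.0]
print(in_sign_set([1, 0.3, -1], x))   # the zero entry may take any value in [-1, 1]
print(in_sign_set([1, 1.5, -1], x))   # 1.5 lies outside [-1, 1]
```

The set is a singleton exactly when $x$ has no zero entries, which is why it is convenient for describing subgradients of piecewise linear functions such as the $L_1$-norm.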

In addition, we use to denote the closed ball of radius centered at , i.e., , and to denote the feasible set of (1.1).

## 2 Properties of solutions of problem (1.1)

In this section, we shall characterize the properties of the optimal solutions of problem (1.1) with different choices of and . Our first theorem is given for and .

###### Theorem 2.1.

Let . For any , the following statements hold with .

(i) For , ; and for , there is a scalar such that and .

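(ii) For $0\le p<1$, $\|x^*\|_0=\mathrm{rank}(A_J)$.

(iii) For $0\le p<1$, $\|x^*\|_\infty\le\dfrac{\sigma m^{1-\frac{1}{q}}+\|b\|_2}{\sqrt{\lambda_{\min}(A_J^\top A_J)}}$, where $\lambda_{\min}(A_J^\top A_J)$ is the smallest eigenvalue of $A_J^\top A_J$.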

###### Proof.

Statement (i). We assume that .

Consider . From , we see that . Then, it is easy to verify that there exists a constant such that . Thus, , but for . This leads to a contradiction. Hence, we have .

Consider . From the continuity of the function , and , there exists a scalar such that . Obviously, and thus .

Statement (ii). Let for simplicity. We then consider the following two cases.

Case 1, . We first prove by contradiction. Assume that . Thus, there exists a vector such that and , since . Let be a vector such that and . Thus, we have . Now, let

$$\tau:=\min_{h_i\neq 0,\ i\in J}\left\{\frac{|x_i^*|}{|h_i|}\right\}=\frac{|x_{i_0}^*|}{|h_{i_0}|}\ \text{ for some } i_0.$$

Then, we see that since . Moreover, from the definition of , one can verify that . This leads to a contradiction. Hence, we have . Then, we have that . Now, we assume that . Thus, there also exists a vector such that and . Using the similar arguments as above, we can get a contradiction. Hence, we only have .

Case 2, . Similar to Case 1, we first prove by contradiction. Assume that . Thus, there exists a vector such that and , since . Let be a vector such that and . Thus, we have that and hence for any . Moreover, we can choose a sufficiently small real positive number such that, for all ,

$$x_J^*+th_J\neq 0,\quad\text{and}\quad \mathrm{sgn}(x_i^*)=\mathrm{sgn}(x_i^*+th_i)\ \text{ for } i\in J. \tag{2.1}$$

Let . Then, we have

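$$\begin{aligned}
f(0) &=\sum_{i\in J}|x_i^*|^p=\|x^*\|_p^p=\min_{t\in[-t_0,t_0]}\|x^*+th\|_p^p\\
&=\min_{t\in[-t_0,t_0]}\sum_{i\in J}\left[\mathrm{sgn}(x_i^*+th_i)(x_i^*+th_i)\right]^p=\min_{t\in[-t_0,t_0]}f(t),
\end{aligned}$$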

where the third equality follows because and the last equality follows from (2.1). However, for all ,

$$f''(t)=p(p-1)\sum_{i\in J}\left[\mathrm{sgn}(x_i^*)(x_i^*+th_i)\right]^{p-2}h_i^2<0.$$

This leads to a contradiction. Hence, we have and . We further assume that . Then, there also exists a vector such that and . Using the similar arguments as above, we can get a contradiction. Hence, we only have .

Statement (iii). From statement (ii), it is easy to see that $A_J$ has full column rank and hence $\lambda_{\min}(A_J^\top A_J)>0$. Moreover,

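$$\begin{aligned}
\sigma &\ge\|Ax^*-b\|_q=\|A_Jx_J^*-b\|_q\ge m^{\frac{1}{q}-1}\|A_Jx_J^*-b\|_1\ge m^{\frac{1}{q}-1}\|A_Jx_J^*-b\|_2\\
&\ge m^{\frac{1}{q}-1}\left(\|A_Jx_J^*\|_2-\|b\|_2\right)\ge m^{\frac{1}{q}-1}\left(\sqrt{\lambda_{\min}(A_J^\top A_J)}\,\|x_J^*\|_2-\|b\|_2\right),
\end{aligned}$$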

where the second inequality follows from Hölder's inequality and the last inequality follows from $\|A_Jx_J^*\|_2^2=(x_J^*)^\top A_J^\top A_Jx_J^*\ge\lambda_{\min}(A_J^\top A_J)\|x_J^*\|_2^2$. Then, we see from the above relation that

$$\|x^*\|_\infty\le\|x^*\|_2=\|x_J^*\|_2\le\frac{\sigma m^{1-\frac{1}{q}}+\|b\|_2}{\sqrt{\lambda_{\min}(A_J^\top A_J)}}.$$

This completes the proof. ∎
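The bound in statement (iii) is cheap to evaluate once the support is known. The sketch below computes its right-hand side in plain Python for a hypothetical single-column $A_J$, where the smallest eigenvalue of $A_J^\top A_J$ is simply the squared Euclidean norm of that column. The function name `infinity_norm_bound` and all numbers are illustrative assumptions, not data from the paper.

```python
import math

def infinity_norm_bound(sigma, m, q, b_norm2, lam_min):
    """Return (sigma * m^(1 - 1/q) + ||b||_2) / sqrt(lam_min).

    q may be math.inf; lam_min is the smallest eigenvalue of A_J^T A_J
    and must be positive (A_J has full column rank).
    """
    if lam_min <= 0:
        raise ValueError("lam_min must be positive")
    exponent = 1.0 - (0.0 if q == math.inf else 1.0 / q)
    return (sigma * m ** exponent + b_norm2) / math.sqrt(lam_min)

# Hypothetical instance: A_J is the single column a = (3, 4), b = (3, 4), sigma = 1.
a = [3.0, 4.0]
lam_min = sum(ai * ai for ai in a)               # a^T a for a single column
b_norm2 = math.sqrt(sum(bi * bi for bi in [3.0, 4.0]))

for q in (1, 2, math.inf):
    print(q, infinity_norm_bound(1.0, m=2, q=q, b_norm2=b_norm2, lam_min=lam_min))
```

In this toy instance with $q=1$, the constraint reads $7|x-1|\le 1$, so every feasible scalar $x$ satisfies $|x|\le 8/7$, which is below the computed bound, as the theorem predicts for optimal solutions.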

###### Remark 2.1 (The sparse solution of the Lp-L2 problem).

Theorem 2.1(ii) implies that without any condition on the sensing matrix , for any with and , while Shen and Mousavi show in [34, Proposition 3.1] that for any with if every submatrix of is invertible. Combining these results gives a formal confirmation that if , all solutions of the - problem with are sparse, but the - problem with does not have sparse solutions.

In the following, we shall derive more theoretical results for the optimal solution set of the $L_1$-constrained problem (1.4) with . But we should point out that all results established later can be extended without too much difficulty to the $L_\infty$-constrained case or other more general cases; see Remarks 2.2 and 2.3 for more details. As we shall see later, solving (1.4) with a sufficiently small $p>0$ actually gives an optimal solution of (1.4) with $p=0$. This nice result is obtained based on a simple observation that the feasible set $\mathrm{FEA}(A,b,\sigma,1)$ is indeed a convex polyhedron in $\mathbb{R}^n$ (see Lemma B.1). Moreover, observe that $\mathbb{R}^n$ can be represented as a union of $2^n$ orthants, denoted by $P_j$ for $j=1,\cdots,2^n$, such that any two vectors $x$ and $y$ in each $P_j$ have the same sign for each entry, i.e., for each $j\in\{1,\cdots,2^n\}$, we have

$$\forall\, x,y\in P_j\ \Longrightarrow\ x_iy_i\ge 0\ \text{ for } i=1,\cdots,n. \tag{2.2}$$

For example, when , we have , where , , and . Then, for each , one can see that is empty or a polyhedron that has a finite number of extreme points because contains no lines; see [32, Corollary 18.5.3] and [32, Corollary 19.1.1].
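The orthant decomposition can be enumerated directly for small $n$. The sketch below represents each closed orthant of $\mathbb{R}^n$ by a sign pattern in $\{1,-1\}^n$, which gives $2^n$ patterns, and checks property (2.2) for a pair of vectors. The function names and the sign-pattern encoding are assumptions made for illustration.

```python
from itertools import product

def orthant_sign_patterns(n):
    """Return the 2^n sign patterns, one per closed orthant of R^n."""
    return list(product((1, -1), repeat=n))

def in_orthant(x, signs):
    """Return True if x lies in the closed orthant {x : s_i * x_i >= 0 for all i}."""
    return all(s * xi >= 0 for s, xi in zip(signs, x))

def share_an_orthant(x, y):
    """Return True if some closed orthant contains both x and y."""
    return any(in_orthant(x, s) and in_orthant(y, s)
               for s in orthant_sign_patterns(len(x)))

patterns = orthant_sign_patterns(2)
print(len(patterns))                                  # number of orthants for n = 2
print(share_an_orthant([1.0, 0.0], [2.0, -3.0]))      # both lie in {x1 >= 0, x2 <= 0}
print(share_an_orthant([1.0, 1.0], [1.0, -1.0]))      # second entries have opposite signs
```

Two vectors share a closed orthant exactly when $x_iy_i\ge 0$ for every $i$, which is the condition stated in (2.2).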

###### Lemma 2.1.

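Let $0<p<1$. Suppose that $j$ is an arbitrary index such that $P_j\cap\mathrm{FEA}(A,b,\sigma,1)\neq\emptyset$, where $P_j$ is defined in (2.2). Then, any optimal solution of the following problem

$$\min_{x\in\mathbb{R}^n}\ \|x\|_p^p \quad \mathrm{s.t.}\quad x\in P_j\cap\mathrm{FEA}(A,b,\sigma,1) \tag{2.3}$$

is an extreme point of $P_j\cap\mathrm{FEA}(A,b,\sigma,1)$.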

###### Proof.

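Let $x^*$ be an optimal solution of (2.3). Suppose that there exist $y,z\in P_j\cap\mathrm{FEA}(A,b,\sigma,1)$ such that $x^*=\lambda y+(1-\lambda)z$ for some $\lambda\in(0,1)$. Then, we have

$$\begin{aligned}
\|x^*\|_p^p &=\|\lambda y+(1-\lambda)z\|_p^p=\sum_{i=1}^n|\lambda y_i+(1-\lambda)z_i|^p=\sum_{i=1}^n\left(\lambda|y_i|+(1-\lambda)|z_i|\right)^p\\
&\ge\sum_{i=1}^n\left(\lambda|y_i|^p+(1-\lambda)|z_i|^p\right)=\lambda\|y\|_p^p+(1-\lambda)\|z\|_p^p\ge\|x^*\|_p^p,
\end{aligned}$$

where the third equality follows because any $y,z\in P_j$ have the same sign for each entry, the first inequality follows because $t\mapsto t^p$ is strictly concave on $[0,\infty)$ for $0<p<1$, and the last inequality follows because $x^*$ is an optimal solution of (2.3). Note that the above relation holds if and only if $y=z$. This implies that $x^*$ is an extreme point of $P_j\cap\mathrm{FEA}(A,b,\sigma,1)$. ∎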
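The key step in the proof above is the strict concavity of $t\mapsto t^p$ on $[0,\infty)$ for $0<p<1$. A small numerical check can make the inequality tangible; the function name `concavity_gap` and the sample values are illustrative assumptions, and the check is of course no substitute for the proof.

```python
def concavity_gap(a, c, lam, p):
    """Return (lam*a + (1-lam)*c)^p - (lam*a^p + (1-lam)*c^p) for a, c >= 0.

    For 0 < p < 1 and 0 < lam < 1 the gap is nonnegative, and it vanishes
    only when a == c.
    """
    mixed = (lam * a + (1.0 - lam) * c) ** p
    averaged = lam * a ** p + (1.0 - lam) * c ** p
    return mixed - averaged

# Distinct magnitudes give a strictly positive gap; equal magnitudes give (numerically) zero.
print(concavity_gap(0.0, 4.0, lam=0.5, p=0.5))
print(concavity_gap(3.0, 3.0, lam=0.3, p=0.5))
```

Summing this gap over the entries $a=|y_i|$, $c=|z_i|$ reproduces the first inequality in the displayed chain, so a nontrivial convex combination of two distinct feasible points in the same orthant always has a strictly larger objective than the weighted average of theirs.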

Based on Lemma 2.1, we are able to characterize the number of the optimal solutions of problem (1.4) with . For notational simplicity, for , let

$$\mathrm{EXT}(P_j\cap\mathrm{FEA}(A,b,\sigma,1)):=\{\text{all extreme points of } P_j\cap\mathrm{FEA}(A,b,\sigma,1)\}.$$

###### Proposition 2.1.

For any $0<p<1$, the optimal solution set $\mathrm{SOL}(A,b,\sigma,p,1)$ of (1.4) is a finite set. Moreover, the set $\bigcup_{0<p<1}\mathrm{SOL}(A,b,\sigma,p,1)$ is a finite set.

###### Proof.

For a given , let be an optimal solution of (1.4), i.e., . Then, there must exist a such that and is also an optimal solution of (2.3) with in place of . Then, it follows from Lemma 2.1 that is an extreme point of . This implies that

$$\mathrm{SOL}(A,b,\sigma,p,1)\subseteq\bigcup_{j\in\{1,\cdots,2^n\}}\mathrm{EXT}(P_j\cap\mathrm{FEA}(A,b,\sigma,1)). \tag{2.4}$$

Note that, for each , is empty or a polyhedron that has a finite number of extreme points since contains no lines; see [32, Corollary 18.5.3] and [32, Corollary 19.1.1]. This together with (2.4) implies that is a finite set.

Moreover, since (2.4) holds for any , then we have

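$$\bigcup_{0<p<1}\mathrm{SOL}(A,b,\sigma,p,1)\subseteq\bigcup_{j\in\{1,\cdots,2^n\}}\mathrm{EXT}(P_j\cap\mathrm{FEA}(A,b,\sigma,1)),$$

which implies that $\bigcup_{0<p<1}\mathrm{SOL}(A,b,\sigma,p,1)$ is a finite set. This completes the proof. ∎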

###### Remark 2.2 (Comments on Proposition 2.1).

Proposition 2.1 is obtained based on the observation that the feasible set is a convex polyhedron in . From this observation, we can extend Proposition 2.1 to that for any , the optimal solution set of (1.1) with is a finite set. However, it is not clear whether for any , the optimal solution set of (1.1) with is a finite set. Thanks to Theorem 2.1, we can claim that if satisfies , the optimal solution set is a finite set, where is a positive integer. Indeed, in this case, by Theorem 2.1(ii), any optimal solution satisfies that and hence has at most two nonzero entries supported on . Then, there are only different choices of the support set . Let be the optimal objective value and, without loss of generality, assume that , , . Then, . Also, let and . We then see from Theorem 2.1(i) that and this equation can be further written as a -th order polynomial equation , which has at most real roots. This implies that, for each satisfying , there are only different choices of and . Hence, the optimal solution set is a finite set and the number of solutions is at most .

We next give two supporting lemmas and relegate the proofs to Appendix A.

###### Lemma 2.2.

Suppose that and satisfy

$$a_1\le a_2\le\cdots\le a_n,\qquad b_1\le b_2\le\cdots\le b_n,\qquad \sum_{j=1}^na_j^k=\sum_{j=1}^nb_j^k,\quad k=1,\cdots,n,$$

then .

###### Lemma 2.3.

Given with . Then, there exists a sufficiently small such that

(i) if holds, then holds for any ;

(ii) if holds, then holds for any .

Now, we are ready to present our results concerning the optimal solution set with different choices of .

###### Theorem 2.2.

There exists a such that for any . Moreover, there exists a such that for any .

###### Proof.

We prove the first result by contradiction. Assume that there does not exist a number such that, for any , . Consider a sequence with and as . Thus, from the hypothesis, for each , there exists a point such that and . Now, we consider the sequence . Note that all elements in come from the set and are not contained in . Since there are only finitely many points in (see Proposition 2.1), then there exists at least one point such that contains infinitely many , i.e., there exists a subsequence so that for all . Moreover, let . Then, for all , we have since . Then, we see that

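$$\|x^*\|_0=\lim_{k_j\to\infty}\|x^*\|_{p_{k_j}}^{p_{k_j}}\ge\lim_{k_j\to\infty}\|x^{k_j}\|_{p_{k_j}}^{p_{k_j}}=\lim_{k_j\to\infty}\|\hat{x}\|_{p_{k_j}}^{p_{k_j}}=\|\hat{x}\|_0,$$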

which implies that