
# Sample average approximation with heavier tails II: localization in stochastic convex optimization and persistence results for the Lasso††thanks: Submitted to the editors DATE.

Roberto I. Oliveira Instituto de Matemática Pura e Aplicada (IMPA), Rio de Janeiro, RJ, Brazil. (rimfo@impa.br). Roberto I. Oliveira’s work was supported by a Bolsa de Produtividade em Pesquisa from CNPq, Brazil. His work in this article is part of the activities of FAPESP Center for Neuromathematics (grant #2013/07699-0, FAPESP - S. Paulo Research Foundation).    Philip Thompson Center for Mathematical Modeling (CMM), Santiago, Chile & CREST-ENSAE, Paris-Saclay, France. (Philip.THOMPSON@ensae.fr).
###### Abstract

We present exponential finite-sample nonasymptotic deviation inequalities for the SAA estimator’s near-optimal solution set over the class of convex stochastic optimization problems with heavy-tailed random Hölder continuous functions in the objective and constraints. Such setting is better suited for problems where a sub-Gaussian data generating distribution is less expected, e.g., in stochastic portfolio optimization. One of our contributions is to exploit convexity of the perturbed objective and the perturbed constraints as a property which entails localized deviation inequalities for joint feasibility and optimality guarantees. This means that our bounds are significantly tighter in terms of diameter and metric entropy since they depend only on the near-optimal solution set but not on the whole feasible set. As a result, we obtain a much sharper sample complexity estimate when compared to a general nonconvex problem. In our analysis, we derive some localized deterministic perturbation error bounds for convex optimization problems which are of independent interest. To obtain our results, we only assume a metric regular convex feasible set, possibly not satisfying the Slater condition and not having a metric regular solution set. In this general setting, joint near feasibility and near optimality are guaranteed. If in addition the set satisfies the Slater condition, we obtain finite-sample simultaneous exact feasibility and near optimality guarantees. Another contribution of our work is to present, as a proof of concept of our localized techniques, a persistent result for a variant of the LASSO estimator under very weak assumptions on the data generating distribution.

Key words.

AMS subject classifications.

## 1 Introduction

Consider the following set-up.

###### Set-up 1 (The exact problem).

We are given the optimization problem

 f∗:=minx∈Y f0(x) s.t. fi(x)≤0,∀i∈I, (1)

with a nonempty feasible set and a nonempty solution set . In the above, the hard constraint is a (possibly unbounded) closed and convex set and . Given , we also define the -near optimal solution set . We set .

The central question of this work is:

###### Problem 1 (The approximate problem).

With respect to Set-up LABEL:setup:convex:optimization:problem, suppose that is directly inaccessible, but that we do have access to “randomly perturbed” continuous versions of defined over (to be made precise in what follows). Based on this information, we choose real numbers and consider the problem

 ˆF∗:=minx∈Y ˆF0(x) s.t. ˆFi(x)≤ˆϵi,∀i∈I, (2)

with feasible set and solution set . Given , we define the -near optimal solution set . We also set . We further consider the following assumption:

###### Assumption 1.1 (Convex problem).

The functions and are continuous and convex on for all .

We then ask the following questions:

• How can we ensure that nearly optimal solutions of the accessible problem (LABEL:problem:min:SAA) are nearly optimal solutions of the original inaccessible problem (LABEL:problem:min)?

• Related to the above question, one of our main concerns in this paper will be localization: under convexity, “where in space” is it enough for the perturbations to be controlled?

Another possible question is to bound in terms of .

The numbers are “tuning” parameters used to handle the challenge of constraint perturbation. For this reason, we also consider, for given (seen as a “small parameter”), the relaxed feasible set

 Xγ:={x∈Y:fi(x)≤γ,∀i∈I}. (3)

For the same reason, given and , it will be convenient to define the set . We set .

In this work we are concerned with stochastic optimization (SO) where the problem suffers from random perturbations. Problem LABEL:problem:convex:perturbed:opt is made precise by the Sample Average Approximation (SAA) methodology, which is explained as follows. We consider a distribution over a sample space and suppose the data of problem (LABEL:problem:min) satisfies, for any ,

 fi(x):=PFi(x,⋅):=∫ΞFi(x,ξ)dP(ξ),(x∈Y), (4)

where the measurable function is such that the above integrals are well defined. It is then assumed that, although there is no access to , the decision maker can evaluate over an acquired independent identically distributed (i.i.d.) size- sample of . Within this framework, the SAA approach is to solve (LABEL:problem:min:SAA) with

 ˆFi(x):=ˆPFi(x,⋅)=(1/N)∑Nj=1Fi(x,ξj),(i∈I0,x∈Y), (5)

where denotes the empirical measure associated to and is the Dirac measure at the point . For notational convenience, we will omit the dependence on and and consider a probability space and a random variable with distribution such that for any in the -algebra of and for any measurable . Problem LABEL:problem:convex:perturbed:opt under (LABEL:equation:expected:data)-(LABEL:equation:empirical:data) is a relevant topic in Stochastic Optimization [16]. It also relates to problems in Mathematical Statistics such as -estimation and Empirical Risk Minimization [9]. In the first setting, knowledge of the data is limited, but one can resort to samples using Monte Carlo simulation. In the second setting, a limited number of samples is acquired from measurements and an empirical contrast estimator is built to fit the data to a certain risk criterion given by a loss function over a hypothesis class.
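To make the SAA recipe (5) concrete, here is a minimal numerical sketch (our own illustration, not from the paper): the risk f(x) = E[(x − ξ)²] is replaced by its empirical average over an i.i.d. sample, with ξ drawn from a heavy-tailed Student-t distribution (finite variance but no sub-Gaussian tails). The exact SAA minimizer in this toy case is the sample mean.

```python
import numpy as np

# Illustrative SAA sketch: approximate f(x) = E[(x - xi)^2] by its
# empirical average and minimize. xi ~ Student-t with 3 degrees of
# freedom: heavy-tailed, finite variance, true minimizer E[xi] = 0.
rng = np.random.default_rng(0)
N = 10_000
xi = rng.standard_t(df=3, size=N)

def F_hat(x):
    """Empirical objective: (1/N) * sum_j (x - xi_j)^2."""
    return np.mean((x - xi) ** 2)

x_saa = xi.mean()                       # closed-form minimizer of F_hat
grid = np.linspace(-2.0, 2.0, 2001)
x_grid = grid[np.argmin([F_hat(x) for x in grid])]
```

Both the closed-form SAA minimizer and the grid search land near the population minimizer 0, despite the heavy tails of the sample.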

In this work, we are mainly concerned with a nonasymptotic analysis of the SAA problem: given a tolerance , state deviation inequalities with an explicit nonasymptotic rate in which guarantee that, with exponentially high probability, the optimal value and (nearly) optimal solutions of (LABEL:problem:min:SAA) are close enough to the optimal value and (nearly) optimal solutions of (LABEL:problem:min). See, e.g., [16, 12] and references therein concerning the classical asymptotic analysis and also previous work regarding the nonasymptotic analysis. In [12], the authors address the nonasymptotic analysis of the SAA methodology for general (possibly nonconvex) SO problems with respect to two main concerns:

• The presence of a heavy-tailed data generating distribution,

• The presence of random perturbations in the constraints.

Concern (1) is mainly motivated by the fact that most of the previous literature in SO assumes a light-tailed distribution (e.g., one generating bounded or sub-Gaussian data), an assumption which is unrealistic in many problem instances (e.g., in stochastic portfolio optimization). With respect to (2), most of the existing work has been carried out without stochastic constraints. We refer the reader to the recent review [6], which points out this gap in the literature. In [12], the authors were mainly concerned with nonasymptotic results which ensure, with high probability, simultaneous feasibility and optimality guarantees. Moreover, they were concerned with imposing weaker assumptions on the regularity of the optimization problem than those in previous works. In particular, they give a nonasymptotic analysis assuming only metric regularity of the feasible set, without the need of metric regularity of the solution set.111Metric regularity of the solution set often imposes stronger assumptions on the objective functions such as, e.g., strong convexity.

From now on we will assume the following standard condition in SO [16].

###### Assumption 1.2 (Hölder continuous heavy-tailed data).

Let be a norm on . For all , there exist random variable with and , such that a.e. and for all ,

In [12], the authors show that Assumption LABEL:assump:random:holder:continuity is enough in order to obtain nonasymptotic bounds for the SAA problem in high probability without requiring in addition the classical assumption of a sub-Gaussian distribution [16]. A price to pay in imposing this significantly weaker assumption is that the bounds given depend on data dependent quantities.

Even in the deterministic case, regularity assumptions on the feasible set are needed. We shall study Set-up LABEL:setup:convex:optimization:problem and Problem LABEL:problem:convex:perturbed:opt under the two following different assumptions on the regularity of . These were also considered in [12] regarding general (possibly nonconvex) problems. In the following, for any .

###### Assumption 1.3 (Convex problem with metric regular feasible set).

The feasible set is metric regular: there exists such that

###### Assumption 1.4 (Convex problem with Slater constraint qualification).

There exist such that

We remark that Assumption LABEL:assump:SCQ, under compactness of , implies Assumption LABEL:assump:metric:regularity:set by Robinson’s theorem [15]. Note that Assumption LABEL:assump:SCQ holds in particular if there exists an interior solution. As we shall see, this is satisfied by a variant of the Lasso estimator [17, 2] studied in this work. In that case, tighter bounds can be achieved. Nevertheless, the weaker Assumption LABEL:assump:metric:regularity:set also holds for convex sets which are not strictly feasible. One example is a polyhedron, a result due to Hoffman [5]. Our analysis could also be carried out for Hölderian metric regular sets: for some (see Section 4.2 in [14]). Our results then translate to this setting almost immediately with the additional condition number . In this case, another example of a convex problem which may not satisfy Assumption LABEL:assump:SCQ is one whose feasible set is described by polynomial constraints, such as, e.g., convex quadratic constraints, a result related to Łojasiewicz’s inequality [8, 14].

### 1.1 Related work and contributions

In the following we summarize our main contributions.

(i) Localized finite-sample inequalities for stochastic convex problems: In [12], the authors consider items (1)-(2) above for general (possibly nonconvex) SO problems. Although this includes a broader class of problems, the price to be paid for such generality is that one must consider the whole feasible set in order to state nonasymptotic deviation inequalities for the SAA solution set. More precisely, for the concentration of measure property to hold, the variance-type error induced by sample average approximation depends on the diameter and metric entropy of the whole feasible set (see Section LABEL:subsection:results:deviation:inequalities for a precise statement). For example, if the objective and constraints are Lipschitz continuous, is an upper bound on the (random) Lipschitz moduli and Assumptions LABEL:assump:random:holder:continuity-LABEL:assump:metric:regularity:set or Assumptions LABEL:assump:random:holder:continuity-LABEL:assump:SCQ hold, the bounds stated in [12] ensure joint feasibility and optimality guarantees with probability at least as long as the sample size satisfies with , where is the diameter of and is a quantity related to the metric entropy of (typically of the order in our setting; see [12], Section 3.1 for details).

In this work, one of our main contributions is to improve the analysis of [12] by focusing on convex problems with the aim of obtaining sharper results. Under convexity, we obtain significantly tighter and localized bounds for the statistical error of the SAA estimator with respect to joint feasibility and optimality guarantees: essentially, we show that it is enough to require with the variance depending only on the diameter and metric entropy of the -near-optimal solution set (or explicit Hausdorff-distance approximations of if the constraints are perturbed and the set is not strictly feasible). In fact, it is enough to consider even proper subsets of in which the optimality gap and the constraints are -active (that is, equal to ). We refer to Section LABEL:subsection:results:deviation:inequalities, Theorems LABEL:thm:fixed:set:SAA-LABEL:thm:interior:approx:SAA and Corollary LABEL:cor:interior:solution:SAA for a more detailed description of such results. This is a significant improvement since the diameter and metric entropy of (and of its proper active subsets) are typically much smaller than the diameter and metric entropy of the whole feasible set .222The extreme case of this is when there exists a unique solution and is a small region concentrated around . This is the case, e.g., of well-conditioned strongly convex problems. As a simple example, consider and where is the -norm. Then our bounds show that and a sample size of is enough to obtain -near optimality guarantees, without the typical dependence.

We remark that our analysis only requires a metric regular convex feasible set (possibly not satisfying the Slater condition and without a metric regular solution set). In that case, our bounds imply near feasibility guarantees. Nevertheless, our results in Theorems LABEL:thm:exterior:approx:SAA-LABEL:thm:interior:approx:SAA and Corollary LABEL:cor:interior:solution:SAA also give an interesting finite-sample “transition regime” for strictly feasible convex sets satisfying Assumption LABEL:assump:SCQ. For any sample size , -near feasibility and -optimality are guaranteed. For larger sample sizes satisfying and , exact feasibility and -optimality are guaranteed with high probability, where is an error associated to set approximation (see Section LABEL:subsection:deterministic:perturbation for details).333Here, and change but are still given in terms of . If the problem admits an interior solution, then can be removed.

Our analysis follows essentially from two consecutive steps. First, we derive localized deterministic error bounds for perturbed convex problems in terms of deviations of the objective and constraint functions (Theorems LABEL:thm:convex:solution:approximation:fixed:set-LABEL:thm:convex:solution:approximation in Section LABEL:subsection:deterministic:perturbation). Secondly, we use concentration inequalities in order to control such deviations over prescribed sets with high-probability. We believe that such deterministic perturbation results stated in Section LABEL:subsection:deterministic:perturbation should be of independent interest.

(ii) A persistent result for the LASSO estimator under weak moment assumptions: A second contribution of this work is to present an application of our previous analysis to least-squares estimators with LASSO-type constraints [17, 2]. These are fundamental problems in Mathematical Statistics which have been analyzed in many works. The main goal of our application to these classes of problems is to highlight two points. On the one hand, our methodology gives optimal persistent results for these type of estimators under weak assumptions on the data generating distribution. On the other hand, our proofs will clarify the different roles of the assumptions and conditions of our deterministic perturbation Theorem LABEL:thm:convex:solution:approximation as a “proof of concept”. We refer to Section LABEL:subsection:lasso and Theorem LABEL:thm:lasso for more details.

The paper is organized as follows. In Section LABEL:section:statement:of:results we present and discuss our main results mentioned above in separated Subsections LABEL:subsection:deterministic:perturbation-LABEL:subsection:lasso. The proofs are presented in Section LABEL:section:proofs. Some needed concentration inequalities are presented in the Appendix.

We present some needed notation. For a set , we will denote its topological interior and frontier respectively by and and set and , where is the norm in Assumption LABEL:assump:random:holder:continuity. The excess or deviation between compact sets is defined as . Note that iff , where denotes the Minkowski sum and denotes the closed unit ball with respect to the norm . Given , will denote its -norm and will denote the number of nonzero entries of . For , we write if there exists a constant such that . Also, and . For , we denote . Unless otherwise stated, the entries of a matrix and a vector will be denoted by and respectively. For random variables , denotes the -algebra generated by . Given a -algebra of sets in , denotes the conditional expectation with respect to . We will write RHS for “right hand side”.
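The excess (one-sided Hausdorff deviation) used throughout can be illustrated on finite point sets; the helper below is our own hypothetical implementation of the set distance D(A, B) = sup over a in A of the distance from a to B.

```python
import numpy as np

# Assumed reading of the excess D(A, B) between compact sets, here for
# finite point sets: D(A, B) = max_{a in A} min_{b in B} ||a - b||.
# Note D is not symmetric, matching the notion of a one-sided deviation.
def excess(A, B):
    return max(min(float(np.linalg.norm(a - b)) for b in B) for a in A)

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[0.0, 0.0]])
```

Here the point (1, 0) is at distance 1 from B, so D(A, B) = 1 while D(B, A) = 0, consistent with the characterization D(A, B) ≤ ε iff A is contained in B plus the ε-ball.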

## 2 Statement of main results

In this section we state the main results of this work. In Section LABEL:subsection:deterministic:perturbation we present deterministic perturbation results for convex optimization problems in terms of localized sets which are close to the original solution set. These deterministic results are later combined with concentration of measure techniques in order to obtain the exponential nonasymptotic deviation inequalities presented in Section LABEL:subsection:results:deviation:inequalities. These inequalities concern the near-optimal solutions and the optimal value of Problem LABEL:problem:min:SAA.

### 2.1 Localized perturbation of convex problems

In this section we state deterministic results with the following content: feasibility and optimality deviations are derived in terms of bounds on the perturbations over specific localized sets. In the presence of convexity of the optimization problem, one does not need to control such perturbations over the whole feasible set but just over its near-optimal solution set (in a precise sense stated in the following). In fact, even smaller subsets determined by active regions of the optimality gap and constraints suffice.

We first consider the case where the constraints are fixed (). In order to state a localized deviation inequality, we define, for given , the -active level set of the optimality gap:

 X∗0,γ:={x∈X:f(x)=f∗+γ}. (6)

Certainly, (often with a proper inclusion).

###### Theorem 2.1 (Localized optimality deviation with fixed constraints).

Suppose that, in the Set-up LABEL:setup:convex:optimization:problem and Problem LABEL:problem:convex:perturbed:opt, we have fixed constraints () and

• (Convex problem) Assumption LABEL:assump:convex:problem holds.

• (Existence of solutions) There exists .

Let , and consider the quantity

 ^Δ(x∗):=0∨sup{t−[ˆF(x)−ˆF(x∗)]:x∈X∗0,t}.

Suppose further that the following condition holds:

• .

Then .

Theorem LABEL:thm:convex:solution:approximation:fixed:set establishes a localized perturbation result for convex problems with fixed constraints in the sense that it is enough to control over instead of the whole feasible set .

We now consider the case where the constraints are also perturbed (). It will be convenient to define the following function deviations: given and , we define:

 ^δ(y,x) := 0∨[ˆF(y)−ˆF(x)−(f(y)−f(x))],
 ^Δ(y,x) := 0∨[f(y)−f(x)−(ˆF(y)−ˆF(x))],
 ^δi(x) := 0∨[ˆFi(x)−fi(x)],
 ^Δi(x) := 0∨[fi(x)−ˆFi(x)].
###### Theorem 2.2 (Localized optimality deviation with perturbed constraints).

Consider the Set-up LABEL:setup:convex:optimization:problem and Problem LABEL:problem:convex:perturbed:opt. Suppose that

• (Convex problem) Assumption LABEL:assump:convex:problem holds.

• (Interior near-solution of the relaxed problem) Let , , and . Suppose there exist and such that and for all .

Given the parameters specified in (ii), we define the set

 ˆXy∗t1:={x∈ˆX:ˆF(x)≤ˆF(y∗)+t1},

and the quantities:

 ^Δ(x∗|y∗,γ) := sup{^Δ(x,x∗) : x∈(Xγ)∗t+t2∩ˆXy∗t1, f(x)−minXγf=t+t2},
 ^Δi(y∗,γ) := 0∨sup{γ−ˆFi(x) : x∈(Xγ)∗t+t2∩ˆXy∗t1, fi(x)=γ}, (i∈I).

Suppose further that the following conditions hold:

• .

• , .

• , .

Then and , . In other words: .

Theorem LABEL:thm:convex:solution:approximation establishes, in a somewhat general form, a localized perturbation result for convex problems with perturbed constraints: it is enough to control and at points and over prescribed approximations of the near-optimal solution set given tolerance (or even “tighter” associated active-level sets).

It is instructive to illustrate the use of Theorem LABEL:thm:convex:solution:approximation in our specific context. Consider first the case where Assumption LABEL:assump:convex:problem holds and we use the following exterior set approximation of :

 ˘Xγ:=(X+γB)∩Y. (7)

Then, for , our localized perturbation will be given in terms of the sets:

 ˘X∗0,γ := {x∈˘Xγ : f(x)=f∗+γ}, (8)
 ˘X∗i,γ := {x∈˘Xγ : f(x)≤f∗+γ, fi(x)=γ}, (i∈I). (9)

Note that “approximates” defined in (LABEL:equation:active:level:set:opt:gap) up to the feasibility error . Analogously, if

 Xi,γ:={x∈X:fi(x)=γ},

denotes the -active level set of the th constraint, then “approximates” up to the feasibility error . Under this set-up and given a tolerance , assumptions in items (i)-(ii) of Theorem LABEL:thm:convex:solution:approximation hold true for specific parameter choices with . Also, conditions (C1)-(C3) take a specific form in terms of a localized perturbation control restricted to the sets . Theorem LABEL:thm:convex:solution:approximation then establishes the following near-feasibility and near-optimality guarantees: and for all . We refer to FACT 1 in the proof of Theorem LABEL:thm:exterior:approx:SAA in Section LABEL:subsection:proofs:exponential:deviations for more details.

Under the stronger Assumption LABEL:assump:SCQ and with the use of the interior set approximation for given , our localized bounds will be given in terms of the sets:

 ˚X∗0,γ := {x∈X : f(x)=f∗+γ+gap(γ)}, (10)
 ˚X∗i,γ := {x∈X : f(x)≤f∗+γ+gap(γ), fi(x)=0}, (i∈I), (11)

where

 gap(γ):=minX−γf−f∗, (12)

is the optimality error associated to set approximation. Note that, up to , the set approximates and, for , approximates the set

 X∗i,γ :={x∈X:f(x)≤f∗+γ,fi(x)=0}, (13)

that is, . Under this set-up and given a tolerance , assumptions in items (i)-(ii) of Theorem LABEL:thm:convex:solution:approximation hold true for specific parameter choices with . Moreover, conditions (C1)-(C3) require the control of perturbations over the localized sets . Theorem LABEL:thm:convex:solution:approximation then establishes the following exact feasibility and near-optimality guarantees: We refer to FACT 1 in the proof of Theorem LABEL:thm:interior:approx:SAA in Section LABEL:subsection:proofs:exponential:deviations for more details.

Finally, if, in addition, there exists an interior solution satisfying the Slater condition , our localized bounds can be given in terms of even “tighter” sets: the sets in (LABEL:equation:active:level:set:opt:gap) and (LABEL:equation:active:level:set:constraints). In that case, for any tolerance , tuning sequence and the perturbation control over the sets , Theorem LABEL:thm:convex:solution:approximation guarantees the following tighter exact feasibility and near optimality: See the proof of Corollary LABEL:cor:interior:solution:SAA in Section LABEL:subsection:proofs:exponential:deviations for details.

### 2.2 Localized nonasymptotic deviation inequalities for stochastic convex problems

The following exponential nonasymptotic deviation inequalities quantify the order of the size of the sample necessary so that and is close to in some sense for a given tolerance . The order of will be a function of , the confidence level and some variance which will depend on intrinsic condition numbers of the problem: the Hölder parameters of the objective and the diameter and metric entropies of the sets , or defined respectively in (LABEL:equation:set:exterior:opt:gap)-(LABEL:equation:set:exterior:constraints), (LABEL:equation:set:interior:opt:gap)-(LABEL:equation:set:interior:constraints) and in (LABEL:equation:active:level:set:opt:gap) and (LABEL:equation:active:level:set:constraints). The assumptions of the problem determine which classes of these sets appear in our deviation inequalities.

To state our deviation inequalities, we shall need the following definition of the metric entropy of a set. In terms of uniform sample average approximation, this turns out to be a useful notion of “complexity” of a set.

###### Definition 2.3 (Metric entropy).

Let be a totally bounded metric space. Given , a -net for is a finite set of maximal cardinality such that for all with , one has . The -entropy number is . The function is called the metric entropy of .
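A small computational sketch of Definition 2.3 for a finite point cloud (our own helper, greedy variant): a greedy pass builds a maximal ε-separated subset, whose log-cardinality lower-bounds the entropy number (greedy yields *a* maximal set, not necessarily one of maximum cardinality).

```python
import numpy as np

# Greedy maximal eps-separated subset of a finite point cloud. The
# packing-type net of Definition 2.3 has maximal cardinality; the greedy
# set is maximal for inclusion, so log(len(net)) only lower-bounds H(eps, M).
def separated_subset(points, eps):
    net = []
    for p in points:
        if all(np.linalg.norm(p - q) > eps for q in net):
            net.append(p)
    return np.array(net)

rng = np.random.default_rng(1)
cloud = rng.random((500, 2))            # 500 points in the unit square
net = separated_subset(cloud, eps=0.25)
H_lb = np.log(len(net))                 # lower bound on H(0.25, cloud)
```

Any two points kept in `net` are at distance greater than 0.25, and the resulting cardinality is of the order of the packing number of the unit square at scale 0.25.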

With respect to Definition LABEL:definition:metric:entropy, we also define, for given :

 Aα(M) := ∞∑i=1 (D(M)α/2iα) √( H(D(M)/2i, M) + H(D(M)/2i−1, M) + ln[i(i+1)] ), (14)

where denotes the diameter of .

With respect to Assumption LABEL:assump:random:holder:continuity, we will need the following definitions: for and , we define the Hölder moduli

 L2i := PLi(⋅)2,ˆL2i:=ˆPLi(⋅)2, (15)

and the pointwise variances

 σi(x)2:=P[Fi(x,⋅)−PFi(x,⋅)]2,ˆσi(x)2:=ˆP[Fi(x,⋅)−PFi(x,⋅)]2. (16)

For given , we also define the quantity

 ˆσ0(Z) :=Aα0(Z)√ˆL20+L20, (17)

We first state the case where there are no stochastic constraints (). Recall definitions (LABEL:equation:active:level:set:opt:gap), (LABEL:equation:A:alpha) and (LABEL:equation:variance:def1).

###### Theorem 2.4 (Fixed feasible set).

Consider the Set-up LABEL:setup:convex:optimization:problem and Problem LABEL:problem:convex:perturbed:opt under Assumptions LABEL:assump:convex:problem-LABEL:assump:random:holder:continuity and (LABEL:equation:expected:data)-(LABEL:equation:empirical:data). Suppose and . Then, for any and , with probability :

 N ≥ O(1)·(ˆσ0(X∗0,2ϵ)2/ϵ2)·ln(1/p) ⟹ ˆX∗ϵ ⊂ X∗2ϵ.

Hence, for fixed sets, the minimum sample size is proportional to the variance over the localized set . This is the variance associated to the sample average approximation of the objective of a convex problem.

We now consider the case where the soft constraints are perturbed (). In addition to the quantities (LABEL:equation:holder:moduli)-(LABEL:equation:variance:def1), we will need to define some variance-type quantities associated to the sample average approximation of the constraints. Given a point , we define . Recall the set definitions (LABEL:equation:set:exterior:relaxation)-(LABEL:equation:set:exterior:constraints) with respect to an exterior approximation. Recall also (LABEL:equation:A:alpha). For and , we define:

 ˆσI,\tiny{out}(γ) :=supi∈I{Aαi(˘X∗i,γ)√ˆL2i+L2i}.

Under Assumption LABEL:assump:metric:regularity:set and an exterior set approximation we have the following result.

###### Theorem 2.5 (SAA with exterior approximation of a metric regular feasible set).

Consider the Set-up LABEL:setup:convex:optimization:problem and Problem LABEL:problem:convex:perturbed:opt under Assumptions LABEL:assump:convex:problem-LABEL:assump:random:holder:continuity and (LABEL:equation:expected:data)-(LABEL:equation:empirical:data). Suppose satisfies Assumption LABEL:assump:metric:regularity:set, and for some . Let and . Then, for any and any , with probability :

 ˆϵi ≡ ϵ and N ≥ O(1)·(^σ2/ϵ2)·[lnm+ln(1/p)] ⟹ D(ˆX,X) ≤ 2cϵ and f(ˆx) ≤ f∗+2ϵ for all ˆx ∈ ˆX∗ϵ,

with .

Hence, for convex problems with a metric regular feasible set and perturbed constraints, the minimum sample size which guarantees -near feasibility and -optimality is proportional to the maximum variance associated to the localized sets .

We now consider the case where the stronger Assumption LABEL:assump:SCQ holds and an interior set approximation is used. Recall definitions in (LABEL:equation:set:interior:opt:gap)-(LABEL:equation:gap) and (LABEL:equation:A:alpha). We define, for given ,

 ˆσI,\tiny{inn}(γ):=supi∈I{Aαi(˚X∗i,γ)√ˆL2i+L2i}.
###### Theorem 2.6 (SAA with interior approximation of a strictly feasible set).

Consider the Set-up LABEL:setup:convex:optimization:problem and Problem LABEL:problem:convex:perturbed:opt under Assumptions LABEL:assump:convex:problem-LABEL:assump:random:holder:continuity and (LABEL:equation:expected:data)-(LABEL:equation:empirical:data). Suppose satisfies Assumption LABEL:assump:SCQ, for all and for some . Let and . Then, for any and for any and , with probability :

 ˆϵi ≡ −ϵ and N ≥ O(1)·(^σ2/ϵ2)·[lnm+ln(1/p)] ⟹ ˆX∗ϵ ⊂ X∗2ϵ+gap(2ϵ),

with .

Hence, for strictly feasible convex problems with perturbed constraints, a sample size larger than guarantees exact feasibility and -near optimality if is proportional to the maximum variance associated to the localized sets .

If, additionally, the constraints of the convex problem are perturbed but an interior solution exists, we have the following tighter result. We define, for given ,

 ˆσI,\tiny{int}(γ) :=supi∈I{Aαi(X∗i,γ)√ˆL2i+L2i}.
###### Corollary 2.7 (SAA with interior set approximation and interior solution).

If, in addition to assumptions of Theorem LABEL:thm:interior:approx:SAA, Assumption LABEL:assump:SCQ holds for some with , then for any and , with probability :

 ˆϵi ≡ −ϵ and N ≥ O(1)·(^σ2/ϵ2)·[lnm+ln(1/p)] ⟹ ˆX∗ϵ ⊂ X∗2ϵ,

with .

Under an interior solution, Corollary LABEL:cor:interior:solution:SAA states a tighter bound in comparison to Theorem LABEL:thm:interior:approx:SAA: the error is removed from the optimality guarantee and from the localized sets appearing in the variance-type quantities.

### 2.3 An application to high-dimensional least squares with LASSO-type constraints

In this section we discuss an application of Theorem LABEL:thm:convex:solution:approximation to least squares-type problems with random samples. These are fundamental problems in Mathematical Statistics and much has been said about them. The main goal of this section is to observe two points. On the one hand, our methodology gives optimal results for these problems under weak assumptions on the data generating distribution. On the other hand, our proofs will clarify the different roles of the assumptions and conditions of the deterministic Theorem LABEL:thm:convex:solution:approximation as a “proof of concept”. Our application will rely on a variant of the LASSO method for least squares in very high dimensions.

The sample space considered is and a point will be decomposed as where and . We consider the Set-up LABEL:setup:convex:optimization:problem and Problem LABEL:problem:convex:perturbed:opt with respect to the loss function

 F(x,ξ):=[y(ξ)−⟨x(ξ),x⟩]2(x∈Rd),

where denotes the standard inner product. As described in the introduction, we define the risk and the empirical risk , where is the empirical distribution with respect to a size- i.i.d. sample of .

Finally, we let denote a closed convex set. We will be interested in the problem of nearly minimizing over , or some subset thereof, via estimators . The population and empirical design matrices, and , defined respectively via

 ∀v∈Rd, Σv := P⟨v,x(⋅)⟩x(⋅), (18)
 ˆΣv := ˆP⟨v,x(⋅)⟩x(⋅), (19)

will be important in what follows.
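For a sample stored as a matrix whose rows are the design vectors, the empirical design matrix of (19) acts as a matrix-vector product; the sketch below (our own illustration with an isotropic Gaussian design, for which the population matrix is the identity) checks this numerically.

```python
import numpy as np

# Minimal sketch: with sample rows x(xi_j) stacked in X, the empirical
# design matrix acts as  Sigma_hat v = (X^T X / N) v.  For a standard
# Gaussian design, the population Sigma is the identity, so Sigma_hat v ~ v.
rng = np.random.default_rng(4)
N, d = 2000, 3
X = rng.standard_normal((N, d))
v = np.array([1.0, -2.0, 0.5])

Sigma_hat_v = X.T @ (X @ v) / N         # matrix-free evaluation of (19)
```

Evaluating `X.T @ (X @ v) / N` avoids forming the d-by-d matrix explicitly, which matters when d is large.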

Recall that the usual ordinary least squares method minimizes . When , this method typically produces a good approximation of the minimizer of . This is not true in the setting, where the least squares estimator is not consistent. For this setting, Tibshirani [17] proposed minimizing subject to a constraint on the norm of : for some ,

 ˆxR\tiny{lasso},0:=argmin{ˆF(x):x∈Rd,∥x∥1≤R}.
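A minimal sketch of this constrained estimator on synthetic data (our own setup and solver, not the paper's experiment): the empirical risk is minimized over the l1 ball by projected gradient descent, using a standard sort-based Euclidean projection onto the l1 ball.

```python
import numpy as np

# Standard sort-based Euclidean projection onto {v : ||v||_1 <= R}.
def project_l1(v, R):
    if np.abs(v).sum() <= R:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]
    cssv = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > cssv - R)[0][-1]
    theta = (cssv[rho] - R) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

# Hypothetical high-dimensional data (d > N) with a sparse ground truth.
rng = np.random.default_rng(2)
N, d = 100, 300
X = rng.standard_normal((N, d))
beta = np.zeros(d)
beta[:3] = [2.0, -1.5, 1.0]              # ||beta||_1 = 4.5
y = X @ beta + 0.1 * rng.standard_normal(N)

# Projected gradient on F_hat(x) = ||y - Xx||^2 / N over {||x||_1 <= R}.
R = 5.0
step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2 / N)   # 1/L for the gradient
x = np.zeros(d)
for _ in range(1000):
    grad = 2.0 * X.T @ (X @ x - y) / N
    x = project_l1(x - step * grad, R)
```

Despite d being three times larger than N, the l1 constraint keeps the iterate close to the sparse ground truth, in line with the persistence phenomenon discussed below.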

Since then there has been an explosion of theoretical and practical work on the LASSO. Most of the current literature considers a penalized variant of this estimator. Let us denote the -th coordinate of by and the diagonal matrix in with entries by . Given , we define the following diagonal matrices in :

 ˆDq := diag((ˆP|x(⋅)[ℓ]|q)1/q)dℓ=1, Dq := diag((P|x(⋅)[ℓ]|q)1/q)dℓ=1. (20)
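A short sketch of (20) on simulated data (helper names are ours): the diagonal of the empirical matrix holds the empirical L_q norms of each design coordinate, and for q = 2 its squared diagonal coincides exactly with the diagonal of the empirical design matrix.

```python
import numpy as np

# Sketch of (20): diagonal entries are empirical L_q norms of each
# coordinate of the design vector. For q = 2, the squared diagonal equals
# the diagonal of X^T X / N, consistent with the relation to (18)-(19).
rng = np.random.default_rng(3)
N, d, q = 1000, 4, 2
scales = np.array([1.0, 2.0, 0.5, 3.0])
Xs = rng.standard_normal((N, d)) * scales       # columns with known scales

D_hat_q = np.diag(np.mean(np.abs(Xs) ** q, axis=0) ** (1.0 / q))
```

With N = 1000 samples the estimated diagonal recovers the known column scales up to sampling error.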

It is instructive to remark that the diagonal elements of the matrices in (LABEL:def:population:matrix)-(LABEL:def:design:matrix) and (LABEL:def:pen:diag:matrix:empirical) are related by and . Bickel, Ritov and Tsybakov [2] analyze the following penalized estimator in the fixed design setting: for some ,

 ˆxλ\tiny{lasso},1:=argmin{ˆF(x)+λ∥ˆD2x∥1:x∈Rd}.

Their main result is that this estimator satisfies so-called oracle inequalities under strong distributional assumptions on (e.g. sub-Gaussian “noise” terms) and additional assumptions on the design matrix. These inequalities aim at quantifying the accuracy of the LASSO with respect to minimizing the risk and selecting the “best” penalization parameter (corresponding to an unknown sparsity pattern). There have been many other results on this problem, many of which require conditions on the design matrix ensuring that small subsets of rows are “well conditioned”.

In what follows we discuss a different kind of result on the persistence of the LASSO estimator, one which relates to the general question addressed in Problem LABEL:problem:convex:perturbed:opt: as stated in Theorem LABEL:thm:lasso below, with exponentially high probability and with a sample size which depends only logarithmically on the dimension , a variant of the LASSO estimator minimizing the constrained empirical risk is guaranteed to solve the original constrained risk minimization problem given in terms of the unknown distribution . The main attraction of this result is that it requires only very weak assumptions on the data generating distribution . Related results have been proven, e.g., in [1]. Our main purpose here is to show that similar (and in some ways improved) results are consequences of the framework discussed in Sections LABEL:subsection:deterministic:perturbation-LABEL:subsection:results:deviation:inequalities.

###### Theorem 2.8 (A persistent result for LASSO-type constraints with heavier tails).

Assume is a random vector with finite th moments. Considering definition (LABEL:def:population:matrix), we assume that there exist numbers and such that

 ∀v∈Rd, P{ξ∈Ξ : |⟨v,x(ξ)⟩| > u√⟨v,Σv⟩} ≥ p, (21)
 ∀1≤ℓ≤d, (P|x(⋅)[ℓ]|7)1/7 ≤ C·(P|x(⋅)[ℓ]|3)1/3. (22)

Let be the empirical distribution corresponding to a size- i.i.d. sample of . Choose and define:

 ˆx\tiny{\emph{lasso}}:=argminx∈Rd{