Foundations of gauge and perspective duality

July 5, 2019

A.Y. Aravkin Department of Applied Mathematics, University of Washington, Seattle (sasha.aravkin@gmail.com). Research supported by the Washington Research Foundation Data Science Professorship.    J.V. Burke Department of Mathematics, University of Washington, Seattle (burke@uw.edu). Research supported in part by NSF award DMS-1514559.    D. Drusvyatskiy Department of Mathematics, University of Washington, Seattle (ddrusv@uw.edu). Research partially supported by AFOSR YIP award FA9550-15-1-0237.    M.P. Friedlander Departments of Computer Science and Mathematics, University of British Columbia, Vancouver, BC, Canada (mpf@cs.ubc.ca). Research supported by the ONR award N00014-16-1-2242.    K. MacPhee Department of Mathematics, University of Washington, Seattle (kmacphee@uw.edu).
Abstract

Common numerical methods for constrained convex optimization are predicated on efficiently computing nearest points to the feasible region. The presence of a design matrix in the constraints yields feasible regions with more complex geometries. When the functional components are gauges, there is an equivalent optimization problem—the gauge dual—where the matrix appears only in the objective function and the corresponding feasible region is easy to project onto. We revisit the foundations of gauge duality and show that the paradigm arises from an elementary perturbation perspective. We therefore put gauge duality and Fenchel duality on an equal footing, explain gauge dual variables as sensitivity measures, and show how to recover primal solutions from those of the gauge dual. In particular, we prove that optimal solutions of the Fenchel dual of the gauge dual are precisely the primal solutions rescaled by the optimal value. The gauge duality framework is extended beyond gauges to the setting when the functional components are general nonnegative convex functions, including problems with piecewise linear-quadratic functions and constraints that arise from generalized linear models used in regression.

Keywords: convex optimization, gauge duality, nonsmooth optimization

AMS subject classifications: 90C15, 90C25

1 Introduction

This work revolves around optimization problems of the form

(G)   $\min_{x}\ \kappa(x)\quad\text{subject to}\quad\rho(b-Ax)\le\sigma,$

where $A$ is a linear map, $b$ is an $m$-vector, and $\kappa$ and $\rho$ are closed gauges – nonnegative, sublinear functions that vanish at the origin. In statistical and machine learning applications, $\kappa$ is often a structure-inducing regularizer, such as the elastic net for group detection [23]. The function $\rho$ may be interpreted as a penalty that measures the misfit between the measurements $b$ and the prediction $Ax$. For example, $\rho$ can be a norm or the Huber [15] function in the case of regression, or the logistic loss, used for classification problems [16, 1]. In high-dimensional applications, the number $m$ of measurements is often much smaller than the dimension $n$ of the predictor $x$, and the matrix $A$ is only available through matrix-vector products $Ax$ and $A^{\mathsf T}y$.
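To fix ideas, here is a representative instance (an illustration of ours, not prescribed by the text): taking the regularizer $\kappa=\|\cdot\|_1$ and the misfit $\rho=\|\cdot\|_2$ yields the familiar basis-pursuit denoising problem

$\min_{x\in\mathbb{R}^n}\ \|x\|_1 \quad\text{subject to}\quad \|b-Ax\|_2\le\sigma.$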

The formulation Eq. G gives rise to two different “dual” problems:

(L)   $\max_{y}\ \langle b,y\rangle-\sigma\,\rho^{\circ}(y)\quad\text{subject to}\quad \kappa^{\circ}(A^{\mathsf T}y)\le 1,$
(G)   $\min_{y}\ \kappa^{\circ}(A^{\mathsf T}y)\quad\ \ \text{subject to}\quad\ \langle b,y\rangle-\sigma\,\rho^{\circ}(y)\ge 1.$

Here $\kappa^{\circ}$ and $\rho^{\circ}$ are the polar gauges; see Section 2 for a precise definition. The first formulation Eq. L is the classical Lagrangian (or Fenchel) dual, routinely used in the design and analysis of algorithms. Under mild interiority conditions, the optimal values of Eq. G and Eq. L coincide, and the optimal value of Eq. L is attained. The second formulation Eq. G is called the gauge dual and is less well known. Gauge duality was introduced by Freund [11] for minimizing nonnegative sublinear functions over convex sets, and subsequently examined by Friedlander, Macêdo, and Pong [13]. Under standard interiority conditions, the product of the optimal values of the primal and the gauge dual equals one, and the optimal value of Eq. G is attained.

The gauge dual Eq. G can be preferable for computation to the primal Eq. G and the Lagrangian dual Eq. L. Indeed, numerous convex optimization algorithms rely on being able to project easily onto the feasible region. The appearance of the matrix $A$ in the constraints of both Eq. G and Eq. L precludes such methods from being directly applicable. In contrast, the design matrix $A$ appears in the gauge dual Eq. G only in the objective. Moreover, typical applications occur in the regime $m\ll n$. For example, $m$ is often logarithmic in $n$ [7, 6, 21, 10]. Since the decision variables of the gauge dual Eq. G lie in the lower-dimensional space $\mathbb{R}^m$, projections onto its feasible region can be computed efficiently, for example by interior-point methods. Friedlander and Macêdo [12] use gauge duality to derive an effective algorithm for an important class of spectral optimization problems that arise in signal-recovery applications, including phase recovery and blind deconvolution.
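Continuing the illustrative instance above (again with our assumed choices $\kappa=\|\cdot\|_1$ and $\rho=\|\cdot\|_2$), the polars are $\kappa^{\circ}=\|\cdot\|_\infty$ and $\rho^{\circ}=\|\cdot\|_2$, so the gauge dual takes the form

$\min_{y\in\mathbb{R}^m}\ \|A^{\mathsf T}y\|_\infty \quad\text{subject to}\quad \langle b,y\rangle-\sigma\|y\|_2\ge 1.$

The constraint involves no matrix and lives in $\mathbb{R}^m$, which is what makes projection onto the dual feasible region comparatively cheap.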

1.1 A roadmap

Broadly speaking, our goals are two-fold. First, we revisit the foundations of gauge duality in Section 3, recasting them within the modern approach to duality through a “perturbation framework”. That is, following Rockafellar and Wets [20, 11.H], consider an arbitrary convex function $F$ on $\mathbb{R}^n\times\mathbb{R}^m$ and define the value functions

(1.1)

This set-up immediately yields the primal-dual pair

(1.2)

Fenchel duality is a standard example that follows from an appropriate choice of . We show that gauge duality fits equally well into this framework under a judicious choice of the perturbation function , thereby putting Fenchel and gauge dualities on an equal footing. Strong duality, primal-dual optimality conditions, and an interpretation of the gauge dual solutions as sensitivity measures—i.e., subgradients of the value function — quickly follow (Section 3.2). These results, in particular, answer the main open question posed by Freund in his original work [11] on an interpretation of gauge dual variables as sensitivity measures.

We also prove a striking relationship between optimal solutions of the Lagrangian dual of the gauge dual and the primal problem: the two coincide up to scaling by the optimal value function (Section 3.4). Consequently Lagrangian primal-dual methods applied to the gauge dual can always be trivially translated to methods on the original primal problem. We explore this viewpoint in Section 3.4 and illustrate its application to Chambolle and Pock’s primal-dual algorithm [8] in Section 7.

Our second aim is to extend the applicability of the gauge duality paradigm beyond gauges to capture more general convex problems.  Section 4 extends gauge duality to problems that involve convex functions that are merely nonnegative. The approach is based on using the perspective transform

of a convex function to reduce to the gauge setting. We call the resulting dual problem the perspective dual. The perspective-polar transformation, needed to derive the perspective dual problem, is developed in Section 4. We provide concrete illustrations of perspective duality for the logistic loss and the family of piecewise linear-quadratic functions in Section 5, which are used often in data-fitting applications. Numerical illustrations for several case-studies of perspective duals appear in Section 7.

1.2 Notation

The derivation of our results relies mainly on standard notions from convex analysis [18]. We define these briefly below.

Throughout the paper, $\overline{\mathbb{R}}$ denotes the extended real line, while $f$ and $g$ denote general closed convex functions. We routinely use the symbols $b$ and $A$ for an $m$-vector and a matrix, respectively. The domain and epigraph of $f$ are the sets
\[
\operatorname{dom} f:=\{x: f(x)<+\infty\}
\qquad\text{and}\qquad
\operatorname{epi} f:=\{(x,\alpha): f(x)\le\alpha\}.
\]

A function is proper if it has a nonempty domain and never takes the value $-\infty$, and it is called closed if its epigraph is closed, which corresponds to lower semi-continuity [18, Theorem 7.1]. The closure of $f$, denoted $\operatorname{cl} f$, is the function whose epigraph is the closure of $\operatorname{epi} f$. The indicator function of a set $Q$ is denoted by
\[
\delta_Q(x):=\begin{cases}0 & \text{if } x\in Q,\\ +\infty & \text{otherwise.}\end{cases}
\]

We use the symbol $\operatorname{ri} Q$ to denote the interior of $Q$ relative to its affine span. It is a standard fact that if a convex function is finite at a point of the relative interior of its domain, then it is proper. We will use this observation implicitly in what follows.

The conjugate of a proper convex function $f$ is
\[
f^{\star}(y):=\sup_{x}\,\{\langle x,y\rangle-f(x)\},
\]
which is a proper closed convex function [18, Theorem 12.2]. In particular, for any convex set $Q$, the conjugate $\delta_Q^{\star}(y)=\sup_{x\in Q}\langle x,y\rangle$ is called the support function of $Q$. For any $x$, the subdifferential of $f$ at $x$ is the set
\[
\partial f(x):=\{v: f(z)\ge f(x)+\langle v,z-x\rangle\ \text{for all } z\}.
\]
For any convex cone $\mathcal{K}$, the polar cone is the set
\[
\mathcal{K}^{\circ}:=\{v: \langle v,x\rangle\le 0\ \text{for all } x\in\mathcal{K}\}.
\]

Observe the equality $\delta_{\mathcal{K}}^{\star}=\delta_{\mathcal{K}^{\circ}}$ for any convex cone $\mathcal{K}$.

For any convex function , its perspective is the function whose epigraph is the cone generated by the set . Equivalently, we may write

Though may not be closed, the closure of admits the convenient description

where is the recession function of [18, Theorem 8.5]. Importantly, when is a proper convex function, is positively homogeneous. A calculus for the perspective transform is described by Aravkin, Burke, and Friedlander [2, Section 3.3]. We often apply more than one transformation to a function, and in those cases, the multiple transformations are applied in the order that they appear; e.g.,
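As a concrete illustration (standard facts under the usual convention $f^{\pi}(x,\lambda)=\lambda f(x/\lambda)$ for $\lambda>0$, which we assume here): the perspective of the scalar function $f(x)=\tfrac12 x^{2}$ is

$f^{\pi}(x,\lambda)=\dfrac{x^{2}}{2\lambda}\qquad(\lambda>0),$

a jointly convex, positively homogeneous function of $(x,\lambda)$; its closure assigns the recession value $f^{\infty}(x)=\delta_{\{0\}}(x)$ on the boundary $\lambda=0$.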

2 Gauge optimization and duality

In this section, we review the main elements of gauge duality. The original description is due to Freund [11], but here we summarize the more recent treatment given by Friedlander, Macêdo, and Pong [13].

A convex function is called a gauge if it is nonnegative, positively homogeneous, and vanishes at the origin. The symbols $\kappa$ and $\rho$ will always denote closed gauges. The polar of a gauge $\kappa$ is the function $\kappa^{\circ}$ defined by

(2.1)   $\kappa^{\circ}(y):=\inf\{\mu>0:\ \langle x,y\rangle\le\mu\,\kappa(x)\ \text{for all } x\},$

which is also a gauge. For example, if $\kappa$ is a norm, then $\kappa^{\circ}$ is the corresponding dual norm. Note the equality

It follows directly from the definition and positive homogeneity of $\kappa$ that the polar can be characterized as the support function to the unit level set:

(2.2)   $\kappa^{\circ}(y)=\sup\{\langle x,y\rangle:\ \kappa(x)\le 1\}.$

Moreover, $\kappa$ and its polar satisfy a Hölder-like inequality

(2.3)   $\langle x,y\rangle\le\kappa(x)\,\kappa^{\circ}(y)\quad\text{for all } x \text{ and } y,$

which we refer to as the polar-gauge inequality.
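A few standard polar pairs help fix ideas (these particular examples are ours):

$\|\cdot\|_1^{\,\circ}=\|\cdot\|_\infty,\qquad \|\cdot\|_2^{\,\circ}=\|\cdot\|_2,\qquad \|\cdot\|_\infty^{\,\circ}=\|\cdot\|_1,$

and in each case (2.3) specializes to the classical Hölder inequality.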

Define the following primal and dual feasible sets:

(2.4)   $\mathcal{F}_{\mathrm{p}}:=\{x:\ \rho(b-Ax)\le\sigma\} \qquad\text{and}\qquad \mathcal{F}_{\mathrm{d}}:=\{y:\ \langle b,y\rangle-\sigma\,\rho^{\circ}(y)\ge 1\}.$

The gauge primal (G) and dual (G) problems are said to be feasible, respectively, if the following intersections are nonempty:

The primal and dual problems are relatively strictly feasible, respectively, if the following intersections are nonempty:

If the intersections above are nonempty, with interior replacing relative interior, then we say that the problems are strictly feasible, respectively.

Assume throughout that $\rho(b)>\sigma$. Otherwise, the primal feasible region contains the origin, which is a trivial solution of (G). We generally assume that $\sigma$ is positive, though in certain cases it is useful to allow $\sigma=0$ and then assume that $\rho$ vanishes only at the origin; this allows us to extend many of the following results to problems where the feasible set is affine.

Lemma 1 (Primal-dual constraint activity)

If the primal optimal value is attained at with , then . Similarly, if the dual optimal value is attained at with , then .

Proof

We prove the contrapositive. If , then by lower-semicontinuity of , we have for all close to . Consequently, we deduce , with strict inequality unless . Since is optimal, we conclude . The proof of the dual statement is similar.

The duality relations in the gauge framework follow principles analogous to those of Lagrange duality, except that instead of an additive relationship between the primal and dual optimal values, the relationship is multiplicative. The next result summarizes weak and strong duality for gauge optimization; a concrete instance is sketched after the theorem.

Theorem 2.1 (Gauge duality [13])

The following relationships hold for the gauge primal-dual pair (G) and (G).

  1. (Weak duality) If and are primal and dual feasible, then

  2. (Strong duality) If the dual (resp. primal) is feasible and the primal (resp. dual) is relatively strictly feasible, then and the gauge dual (resp. primal) attains its optimal value.
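For the canonical pair from the introduction, and under our assumed formulations of (G) and its gauge dual, the multiplicative relationship reads as follows: weak duality gives

$1\ \le\ \kappa(x)\,\kappa^{\circ}(A^{\mathsf T}y)$

for every primal-feasible $x$ and dual-feasible $y$, and strong duality asserts that the product of the two optimal values equals one.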

3 Perturbation analysis for gauge duality

Modern treatment of duality in convex optimization is based on an interpretation of multipliers as giving sensitivity information relative to perturbations in the problem data. No such analysis, however, has existed for gauge duality. In this section we show that for a particular kind of perturbation, the gauge dual (G) can in fact be derived via such an approach. This resolves a question posed by Friedlander, Macêdo, and Pong [13].

3.1 The perturbation framework

In this section we review the perturbation argument for deriving duality. Our summary follows the discussion in Rockafellar and Wets [20, 11.H]. Fix an arbitrary convex function and consider the value functions defined by (1.1)–(1.2). Observe the equality . Rockafellar-Fenchel duality for the problem

where and are closed and convex, is obtained by setting . In that case, the primal-dual pair takes the familiar form
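For the record, with the choice $F(x,u)=f(x)+g(Ax+u)$ (our notation for the construction just described; the exact normalization in the text may differ), the resulting pair is the familiar one, up to the usual sign conventions:

$\min_{x}\ f(x)+g(Ax) \qquad\text{and}\qquad \max_{y}\ -f^{\star}(-A^{\mathsf T}y)-g^{\star}(y).$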

Under certain conditions, described in the following theorem, strong duality holds, i.e., , and the optimal value is attained.

Theorem 3.1 (Multipliers and sensitivity [20, Theorem 11.39])

Consider the primal-dual pair (1.2), where is proper, closed, and convex.

  1. The inequality always holds.

  2. If , then equality holds and the infimum is attained, if finite. Similarly, if , then equality holds and the infimum is attained, if finite.

  3. The set is nonempty and bounded if and only if and is finite, in which case .

  4. The set is nonempty and bounded if and only if and is finite, in which case .

  5. Optimal solutions are characterized jointly through the conditions

Remark 1

Part (b) of Theorem 3.1 is stated in [20, Theorem 11.39] with interior in place of relative interior. We give a quick argument here for the claimed result with relative interiors. Suppose . If , then equality follows by (a). Hence we can suppose that is finite, and therefore is proper. Thus there exists a subgradient [18, Theorem 23.4]. By the subgradient inequality, the following holds for any :

Taking the infimum over , we recognize the right-hand side as . We deduce . Combining this with Part (a) of Theorem 3.1 yields equality, and we see that the infimum is attained. The symmetric argument for the other case is analogous.

3.2 Derivation of gauge duality as a perturbation

We now show that the problems Eq. G and Eq. G constitute a primal-dual pair under the framework set out by Theorem 3.1. The key is to postulate the correct pairing function .

3.2.1 The perturbation function

Our starting point is the primal perturbation scheme:

Note that is equal to the optimal value of the primal Eq. G. Since and multiply each other in the description above, it is convenient to reparametrize the problem by setting and . By positive homogeneity of and , this yields the equivalent description

where is the unit level set for . In particular, this reparameterization shows that is convex because it is the infimal projection of a convex function; it is proper when the primal Eq. G is feasible. Note that minimizing is equivalent to minimizing . With this in mind, define the convex function :

(3.1)

The function is proper and closed because , and and are closed. The associated infimal projection

is essentially the negative reciprocal of . We formalize this in the following lemma. We omit the proof since it is immediate.

Lemma 2

Equality holds provided that is nonzero and finite. Moreover, implies , and implies .

We now compute the conjugate of , which is needed to derive the dual value function . By Rockafellar and Wets [20, Theorem 11.23(b)],

where the closure operation is applied to the function on the right-hand side with respect to the argument . Because is nonsingular, there is a unique vector that satisfies the constraints in the description of . The closure operation therefore turns out to be superfluous, and we can further simplify the description to

Taking into account the equalities and , this expression transforms to

In particular, we conclude

(3.2)

Thus the dual problem recovers, up to a sign change, the required gauge dual problem.

We are thus justified in defining the dual perturbation function or equivalently

Note that is the optimal value of (G). In summary, and , respectively, play the roles of and as defined in (1.1) and used in Theorem 3.1.

3.2.2 Proof of gauge duality (Theorem 2.1)

We now use the perturbation framework to prove the gauge duality result given by Theorem 2.1. The following auxiliary result ties the feasibility of the gauge pair (G) and (G) to the domain of the value function. The proof of this result, which is largely an application of the calculus of relative interiors, is deferred to Appendix A.

Lemma 3 (Feasibility and domain of the value function)

If the primal (G) is relatively strictly feasible, then . If the dual (G) is relatively strictly feasible, then . The analogous implications, where the operator is replaced by the operator, hold under strict feasibility (not relative).

As in the hypotheses of Theorem 2.1, in this subsection we denote the optimal primal and dual values by and (without arguments), i.e., and . Similarly, we let .

Proof (Proof of Theorem 2.1)

Part (a): We proceed by proving that the two inequalities (i) and (ii) hold always. This in particular will imply that the assumptions of part (a) guarantee and are nonzero and finite. Hence the conclusion of (a) follows. We begin with (i). Theorem 3.1 guarantees the inequality

(3.3)

By Lemma 2, whenever is nonzero and finite, equality holds, which together with (3.3) yields (i). If, on the other hand, , then (i) is trivial. Finally, if , Lemma 2 yields , and hence (3.3) implies , and (i) again holds. Thus, (i) holds always. To establish (ii), it suffices to consider the case . From (3.3) we conclude , that is either or . By Lemma 2, the first case implies and therefore (ii) holds. The second case implies that the primal problem is infeasible, that is , and again (ii) holds. Thus (ii) holds always, as required.

Part (b): Suppose the dual is feasible and the primal is relatively strictly feasible. Part (a) implies that both and are nonzero and finite and hence . On the other hand, by Lemma 3 the assumption that the primal is relatively strictly feasible implies . This last inequality thus implies is finite, and hence is proper. Hence by Theorem 3.1, equality holds and the infimum in the dual is attained. Thus we deduce , as claimed.

Conversely, suppose that the primal is feasible and the dual is relatively strictly feasible. The first assumption implies by Lemma 3. This in turn implies and that the infimum in is attained. Since the primal is feasible, by Lemma 2, is nonzero, and hence and the infimum in the primal is attained.

3.3 Optimality conditions

The perturbation framework can be harnessed to develop optimality conditions for the gauge pair that relate the primal-dual solutions to subgradients of the corresponding value function. This yields a version of parts (c) and (d) in Theorem 3.1 specialized to gauge duality.

Theorem 3.2 (Gauge multipliers and sensitivity)

The following relationships hold for the gauge primal-dual pair (G) and (G).

  1. If the primal is strictly feasible and the dual is feasible, then the set of optimal solutions for the dual is nonempty and bounded, and coincides with

  2. If the dual is strictly feasible and the primal is feasible, then the set of optimal solutions for the primal is nonempty and bounded with solutions given by , where

Proof

Part (a). Because the primal problem is strictly feasible, it follows from Lemma 3 that , and because the dual is feasible, is finite. Theorem 3.1 and Lemma 2 then imply the conclusion of Part (a).

Part (b). Because the dual problem is strictly feasible, it follows from Lemma 3 that , and because the primal is feasible, is finite. Theorem 3.1 then implies that the optimal primal set is nonempty and bounded, and . Because the primal problem is feasible, any pair must satisfy by Lemma 2. Thus, this inclusion is equivalent to being optimal for the primal problem, with optimal value . This proves Part (b).

We next use the sensitivity interpretation given by Theorem 3.2 to develop a set of explicit necessary and sufficient optimality conditions that mirror the more familiar KKT conditions from Lagrange duality.

Theorem 3.3 (Optimality conditions)

Suppose both the gauge primal and gauge dual problems are strictly feasible. Then the pair is primal-dual optimal if and only if it satisfies the conditions

(primal activity) (3.4a)
(dual activity) (3.4b)
(objective alignment) (3.4c)
(constraint alignment) (3.4d)
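Written out for the canonical pair from the introduction (a sketch, assuming the formulations given there, with $(\bar x,\bar y)$ denoting the pair), these conditions typically read

$\rho(b-A\bar x)=\sigma,\qquad \langle b,\bar y\rangle-\sigma\,\rho^{\circ}(\bar y)=1,\qquad \langle \bar x, A^{\mathsf T}\bar y\rangle=\kappa(\bar x)\,\kappa^{\circ}(A^{\mathsf T}\bar y),\qquad \langle b-A\bar x,\,\bar y\rangle=\rho(b-A\bar x)\,\rho^{\circ}(\bar y).$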

Proof

Suppose is primal-dual optimal. Then by strong duality, and are both nonzero and finite, and Lemma 1 tells us that the constraints and are active. Hence, (3.4a)-(3.4b) hold.

Now define and Note that . By Theorem 3.1(e) and Theorem 3.2(b), is primal-dual optimal if and only if . By [20, Theorem 6.14] and [18, Theorem 23], whenever and we have

In particular, this subdifferential formula holds for . We deduce existence of and such that the following hold:

(3.5a)
(3.5b)
(3.5c)

Notice that cannot satisfy (3.5a), so (3.5c) together with the polar-gauge inequality implies

Therefore equality holds throughout, and dividing through by we see that (3.4c) is satisfied. Finally, recall that from the characterization (2.2) of the polar,

(3.6)

which implies

(3.7)

In particular, if then If , then (3.7) implies that must satisfy

which gives condition (3.4d) after dividing through by . On the other hand, if then the set (3.7) is given by . Thus we again have , and dividing through by gives (3.4d). This finishes one direction of the proof.

For the reverse implication, suppose that satisfies (3.4a)-(3.4d). Then clearly satisfies the primal constraint, and satisfies the dual constraint. By weak duality, to show that is primal-dual optimal it is sufficient to show that . Adding (3.4c) and (3.4d), we obtain

Plug in and then use (3.4b) to get , as desired.

The following corollary describes a variation of the optimality conditions outlined by Theorem 3.3. These conditions assume that a solution of the dual problem is available, and show how a corresponding solution of the primal problem can be determined. An application of the following result appears in Section 7.

Corollary 1 (Gauge primal-dual recovery)

Suppose that the primal (G) and dual (G) are strictly feasible. If is optimal for (G), then for any the following conditions are equivalent:

  1. is optimal for (G);

  2. and ;

  3. and .

Proof

We use the optimality conditions given in Theorem 3.3. Note that by Lemma 1 we have equality (3.4b) in the dual constraint.

We first show that (b) implies (a). Suppose (b) holds. Then (3.4c) holds automatically. From the characterization (2.2) of the polar, we have

(3.8)

and thus is the set of maximizing elements in this supremum. Because , it therefore holds that . If we additionally use the polar-gauge inequality, we deduce that

and therefore the above inequalities are all tight. Thus conditions (3.4a) and (3.4d) hold, and by Theorem 3.3, is a primal-dual optimal pair.

We next show that (a) implies (b). Suppose that is optimal for (G). Then the first condition of (b) holds by (3.4c), and (3.4a) and (3.4d) combine to give us

This implies that is a maximizing element of the supremum in (3.8), and thus

Finally, to show the equivalence of (b) and (c), note that by the polar-gauge inequality, if and only if minimizes the convex function This, in turn, is true if and only if , or equivalently .
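To illustrate how alignment conditions of this type identify primal solutions in practice, consider once more our running $\ell_1$ example (an illustration of ours): with $\kappa=\|\cdot\|_1$, the equality $\langle x, A^{\mathsf T}\bar y\rangle=\|x\|_1\,\|A^{\mathsf T}\bar y\|_\infty$ forces

$x_i\neq 0\ \Longrightarrow\ |(A^{\mathsf T}\bar y)_i|=\|A^{\mathsf T}\bar y\|_\infty \ \text{ and }\ \operatorname{sign}(x_i)=\operatorname{sign}\big((A^{\mathsf T}\bar y)_i\big),$

so an optimal gauge-dual $\bar y$ reveals the support and signs of candidate primal solutions; the remaining degrees of freedom can then be resolved from the active constraints.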

3.4 The relationship between Lagrange and gauge multipliers

We now use the perturbation framework for duality to establish a relationship between gauge dual and Lagrange dual variables. We begin with an auxiliary result that characterizes the subdifferential of the perspective function.

Lemma 4 (Subdifferential of perspective function)

Let be a closed proper convex function. Then for , equality holds:

Proof

Recall that for any closed proper convex function , we have

(3.9)

and in particular if is a nonempty closed convex set, then [18, Theorem 23.5 and Corollary 23.5.3]. By [18, Corollary 13.5.1], we have where is a closed convex set. If , then is nonempty and

(3.10)

Suppose now that . Then

Hence, again by (3.9), if and only if and . On the other hand, if then

Hence, again by (3.9), if and only if and .

We now state the main result relating the optimal solutions of (G) to the optimal solutions of the Lagrange dual of (G).

Theorem 3.4

Suppose that the gauge dual (G) is strictly feasible and the primal (G) is feasible. Let denote the Lagrange dual of (G), and let denote its optimal value. Then

Proof

We first note that can be derived via the framework of Theorem 3.1 through the Lagrangian value function

Here plays the role of in Theorem 3.1; cf. [20, Example 11.41]. The feasibility of (G) guarantees that is finite, and by Lemma 3 we have

Thus by Theorem 3.1 the optimal points for are characterized by . Note also

On the other hand, by Theorem 3.2(b) the solutions to (G