Foundations of gauge and perspective duality

July 5, 2019
Abstract
Common numerical methods for constrained convex optimization are predicated on efficiently computing nearest points to the feasible region. The presence of a design matrix in the constraints yields feasible regions with more complex geometries. When the functional components are gauges, there is an equivalent optimization problem—the gauge dual—where the matrix appears only in the objective function and the corresponding feasible region is easy to project onto. We revisit the foundations of gauge duality and show that the paradigm arises from an elementary perturbation perspective. We therefore put gauge duality and Fenchel duality on an equal footing, explain gauge dual variables as sensitivity measures, and show how to recover primal solutions from those of the gauge dual. In particular, we prove that optimal solutions of the Fenchel dual of the gauge dual are precisely the primal solutions rescaled by the optimal value. The gauge duality framework is extended beyond gauges to the setting when the functional components are general nonnegative convex functions, including problems with piecewise linear quadratic functions and constraints that arise from generalized linear models used in regression.
Keywords: convex optimization, gauge duality, nonsmooth optimization
AMS subject classifications: 90C15, 90C25
1 Introduction
This work revolves around optimization problems of the form
(G)   $\min_x\ \kappa(x) \quad\text{subject to}\quad \rho(b - Ax) \le \sigma,$

where $A \in \mathbb{R}^{m\times n}$ is a linear map, $b$ is an $m$-vector, and $\kappa$ and $\rho$ are closed gauges – nonnegative, sublinear functions that vanish at the origin. In statistical and machine learning applications, $\kappa$ is often a structure-inducing regularizer, such as the elastic net for group detection [23]. The function $\rho$ may be interpreted as a penalty that measures the misfit between the measurements $b$ and the prediction $Ax$. For example, $\rho$ can be a norm or the Huber [15] function in the case of regression, or the logistic loss, used for classification problems [16, 1]. In high-dimensional applications, the number of measurements $m$ is often much smaller than the dimension $n$ of the predictor $x$, and the matrix $A$ is only available through the matrix-vector products $Ax$ and $A^Ty$.
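The Huber function mentioned above also illustrates why the later extension beyond gauges (Section 4) is needed: it is a nonnegative convex penalty, but it is not positively homogeneous and hence not a gauge. A minimal sketch (the function and threshold names are our own):

```python
def huber(r, delta=1.0):
    """Huber penalty: quadratic for |r| <= delta, linear beyond."""
    a = abs(r)
    return 0.5 * a**2 if a <= delta else delta * (a - 0.5 * delta)

# Gauge properties: nonnegative and vanishes at the origin...
assert huber(0.0) == 0.0
assert huber(3.0) > 0
# ...but NOT positively homogeneous: huber(2r) != 2*huber(r) in general,
# so the Huber function is not a gauge.
assert huber(2 * 0.5) != 2 * huber(0.5)
```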
The formulation (G) gives rise to two different “dual” problems:

(L)   $\max_y\ \langle b, y\rangle - \sigma\rho^\circ(y) \quad\text{subject to}\quad \kappa^\circ(A^Ty) \le 1$

(G°)   $\min_y\ \kappa^\circ(A^Ty) \quad\text{subject to}\quad \langle b, y\rangle - \sigma\rho^\circ(y) \ge 1$

Here $\kappa^\circ$ and $\rho^\circ$ are the polar gauges; see Section 2 for a precise definition. The first formulation (L) is the classical Lagrangian (or Fenchel) dual, routinely used in the design and analysis of algorithms. Under mild interiority conditions, the optimal values of (G) and (L) are equal and the optimal value of (L) is attained. The second formulation (G°) is called the gauge dual and is less well-known. Gauge duality was introduced by Freund [11] for minimizing nonnegative sublinear functions over convex sets, and subsequently examined by Friedlander, Macêdo, and Pong [13]. Under standard interiority conditions, the equality $\nu_p \cdot \nu_d = 1$ between the optimal values of (G) and (G°) holds and the optimal value of (G°) is attained.
The gauge dual (G°) can be preferable for computation to the primal (G) and the Lagrangian dual (L). Indeed, numerous convex optimization algorithms rely on being able to project easily onto the feasible region. The appearance of the matrix $A$ in the constraints of both (G) and (L) precludes such methods from being directly applicable. In contrast, the design matrix appears in the gauge dual (G°) only in the objective. Moreover, typical applications occur in the regime $m \ll n$. For example, $m$ is often logarithmic in $n$ [7, 6, 21, 10]. Since the decision variables of (G°) lie in the small-dimensional space $\mathbb{R}^m$, projections onto the feasible region can be computed efficiently, for example by interior-point methods. Friedlander and Macêdo [12] use gauge duality to derive an effective algorithm for an important class of spectral optimization problems that arise in signal-recovery applications, including phase recovery and blind deconvolution.
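As a small illustration of this setup (our own toy instance, not an example from the paper): take $\kappa = \|\cdot\|_1$, $\sigma = 0$, and $\rho$ vanishing only at the origin, so that (G) becomes basis pursuit, $\min \|x\|_1$ subject to $Ax = b$, and the gauge dual becomes $\min \|A^Ty\|_\infty$ subject to $\langle b, y\rangle \ge 1$. Both are small linear programs, and their optimal values multiply to 1; a sketch using `scipy.optimize.linprog`:

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 1.0])
m, n = A.shape

# Primal (G): min ||x||_1  s.t. Ax = b, via the split x = u - v with u, v >= 0.
primal = linprog(c=np.ones(2 * n),
                 A_eq=np.hstack([A, -A]), b_eq=b,
                 bounds=[(0, None)] * (2 * n))
nu_p = primal.fun

# Gauge dual: min ||A^T y||_inf  s.t. <b, y> >= 1,
# via min t  s.t.  -t <= (A^T y)_i <= t  and  -<b, y> <= -1.
A_ub = np.vstack([np.hstack([A.T, -np.ones((n, 1))]),
                  np.hstack([-A.T, -np.ones((n, 1))]),
                  np.hstack([-b.reshape(1, m), np.zeros((1, 1))])])
b_ub = np.concatenate([np.zeros(2 * n), [-1.0]])
dual = linprog(c=np.concatenate([np.zeros(m), [1.0]]),
               A_ub=A_ub, b_ub=b_ub,
               bounds=[(None, None)] * m + [(0, None)])
nu_d = dual.fun

print(nu_p, nu_d, nu_p * nu_d)   # product of optimal values is 1
```

Note that the gauge dual has only $m + 1 = 3$ variables, while the primal LP has $2n = 6$.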
1.1 A roadmap
Broadly speaking, our goals are twofold. First, we revisit the foundations of gauge duality in Section 3, recasting them in the modern approach to duality through a “perturbation framework”. That is, following Rockafellar and Wets [20, 11.H], consider an arbitrary convex function $F$ on $\mathbb{R}^n\times\mathbb{R}^m$ and define the value functions
(1.1)   $v_p(u) := \inf_x F(x,u) \qquad\text{and}\qquad v_d(w) := \inf_y F^*(w,y).$
This setup immediately yields the primal-dual pair
(1.2)   $\min_x\ F(x,0) \qquad\text{and}\qquad \max_y\ -F^*(0,y).$
Fenchel duality is a standard example that follows from an appropriate choice of $F$. We show that gauge duality fits equally well into this framework under a judicious choice of the perturbation function $F$, thereby putting Fenchel and gauge dualities on an equal footing. Strong duality, primal-dual optimality conditions, and an interpretation of the gauge dual solutions as sensitivity measures (i.e., subgradients of the value function $v_p$) quickly follow (Section 3.2). These results, in particular, answer the main open question posed by Freund in his original work [11] on an interpretation of gauge dual variables as sensitivity measures.
We also prove a striking relationship between optimal solutions of the Lagrangian dual of the gauge dual and the primal problem: the two coincide up to a scaling by the optimal value (Section 3.4). Consequently, Lagrangian primal-dual methods applied to the gauge dual can always be trivially translated to methods on the original primal problem. We explore this viewpoint in Section 3.4 and illustrate its application to Chambolle and Pock’s primal-dual algorithm [8] in Section 7.
Our second aim is to extend the applicability of the gauge duality paradigm beyond gauges to capture more general convex problems. Section 4 extends gauge duality to problems that involve convex functions that are merely nonnegative. The approach is based on using the perspective transform $f^\pi(x,\lambda) := \lambda f(\lambda^{-1}x)$ (for $\lambda > 0$) of a convex function $f$ to reduce to the gauge setting. We call the resulting dual problem the perspective dual. The perspective-polar transformation, needed to derive the perspective dual problem, is developed in Section 4. We provide concrete illustrations of perspective duality for the logistic loss and the family of piecewise linear-quadratic functions in Section 5, which are used often in data-fitting applications. Numerical illustrations for several case studies of perspective duals appear in Section 7.
1.2 Notation
The derivation of our results relies mainly on standard notions from convex analysis [18]. We define these briefly below.
Throughout the paper, $\overline{\mathbb{R}} := \mathbb{R}\cup\{\pm\infty\}$ denotes the extended real line, while $f$ and $g$ denote general closed convex functions. We routinely use the symbols $b$ and $A$ for an $m$-vector and an $m\times n$ matrix, respectively. The domain and epigraph of $f$ are the sets
$\mathrm{dom}\,f := \{x : f(x) < +\infty\} \qquad\text{and}\qquad \mathrm{epi}\,f := \{(x,\mu) : f(x) \le \mu\}.$
A function $f$ is proper if it has nonempty domain and never takes the value $-\infty$; it is called closed if its epigraph is closed, which corresponds to lower semicontinuity [18, Theorem 7.1]. The closure of $f$, denoted $\mathrm{cl}\,f$, is the function whose epigraph is the closure of $\mathrm{epi}\,f$. The indicator function of a set $C$ is denoted by $\delta_C$ (zero on $C$ and $+\infty$ elsewhere).
We use the symbol $\mathrm{ri}\,C$ to denote the interior of $C$ relative to its affine span. It is a standard fact that if a convex function $f$ is finite at a point of $\mathrm{ri}(\mathrm{dom}\,f)$, then $f$ is proper. We will use this observation implicitly in what follows.
The conjugate of a proper convex function $f$ is $f^*(y) := \sup_x\{\langle x,y\rangle - f(x)\}$, which is a proper closed convex function [18, Theorem 12.2]. In particular, for any convex set $C$, the conjugate $\delta_C^*(y) = \sup_{x\in C}\langle x,y\rangle$ is called the support function of $C$. For any $x \in \mathrm{dom}\,f$, the subdifferential of $f$ at $x$ is the set $\partial f(x) := \{y : f(z) \ge f(x) + \langle y, z - x\rangle \ \text{for all } z\}$. For any convex cone $K$, the polar cone is the set $K^\circ := \{y : \langle x,y\rangle \le 0 \ \text{for all } x \in K\}$.
Observe the equality $\delta_K^* = \delta_{K^\circ}$ for any convex cone $K$.
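As a concrete instance of a support function: for the unit Euclidean ball $C$, the supremum defining $\delta_C^*$ is attained at $x = y/\|y\|$, giving $\delta_C^*(y) = \|y\|_2$. A small numerical check (pure Python; helper names are our own):

```python
import math, random

def norm2(v):
    return math.sqrt(sum(vi * vi for vi in v))

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

random.seed(0)
y = [random.uniform(-1, 1) for _ in range(5)]

# Support function of the unit 2-norm ball: sup {<x, y> : ||x|| <= 1}.
# The maximizer is x = y / ||y||, so the value equals ||y||.
best = dot([yi / norm2(y) for yi in y], y)
assert abs(best - norm2(y)) < 1e-12

# No other point of the ball does better (Cauchy-Schwarz).
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(5)]
    n = norm2(x)
    if n > 1:                      # project onto the ball
        x = [xi / n for xi in x]
    assert dot(x, y) <= best + 1e-12
```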
For any convex function $f$, its perspective is the function $f^\pi$ whose epigraph is the cone generated by the set $\{(x,1,\mu) : (x,\mu) \in \mathrm{epi}\,f\}$. Equivalently, we may write
$f^\pi(x,\lambda) = \lambda f(\lambda^{-1}x) \ \text{ for } \lambda > 0, \qquad f^\pi(x,\lambda) = +\infty \ \text{ for } \lambda < 0.$
Though $f^\pi$ may not be closed, the closure of $f^\pi$ admits the convenient description
$(\mathrm{cl}\,f^\pi)(x,\lambda) = \begin{cases} \lambda f(\lambda^{-1}x) & \text{if } \lambda > 0,\\ f^\infty(x) & \text{if } \lambda = 0,\\ +\infty & \text{if } \lambda < 0,\end{cases}$
where $f^\infty$ is the recession function of $f$ [18, Theorem 8.5]. Importantly, when $f$ is a proper convex function, $f^\pi$ is positively homogeneous. A calculus for the perspective transform is described by Aravkin, Burke, and Friedlander [2, Section 3.3]. We often apply more than one transformation to a function; in those cases, the transformations are applied in the order that they appear, e.g., $f^{\pi\circ} := (f^\pi)^\circ$.
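For instance, the perspective of $f(x) = \tfrac12 x^2$ is $f^\pi(x,\lambda) = x^2/(2\lambda)$ for $\lambda > 0$, which is positively homogeneous in $(x,\lambda)$ jointly. A quick numerical check under these assumptions (function names are our own):

```python
def f(x):
    return 0.5 * x * x

def persp(x, lam):
    """Perspective f^pi(x, lam) = lam * f(x / lam), for lam > 0."""
    assert lam > 0
    return lam * f(x / lam)

# Closed form: lam * 0.5 * (x/lam)**2 = x**2 / (2*lam).
assert abs(persp(3.0, 2.0) - 3.0**2 / (2 * 2.0)) < 1e-12

# Positive homogeneity: f^pi(a*x, a*lam) = a * f^pi(x, lam) for a > 0.
for a in (0.5, 2.0, 7.0):
    assert abs(persp(a * 3.0, a * 2.0) - a * persp(3.0, 2.0)) < 1e-12
```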
2 Gauge optimization and duality
In this section, we review the main elements of gauge duality. The original description is due to Freund [11], but here we summarize the more recent treatment given by Friedlander, Macêdo, and Pong [13].
A convex function is called a gauge if it is nonnegative, positively homogeneous, and vanishes at the origin. The symbols $\kappa$ and $\rho$ will always denote closed gauges. The polar of a gauge $\kappa$ is the function $\kappa^\circ$ defined by
(2.1)   $\kappa^\circ(y) := \inf\{\mu > 0 : \langle x,y\rangle \le \mu\,\kappa(x) \ \text{ for all } x\},$
which is also a gauge. For example, if $\kappa$ is a norm, then $\kappa^\circ$ is the corresponding dual norm. Note the equality $\kappa^{\circ\circ} = \mathrm{cl}\,\kappa$.
It follows directly from the definition and positive homogeneity of $\kappa$ that the polar can be characterized as the support function of the unit level set:
(2.2)   $\kappa^\circ(y) = \sup\{\langle x,y\rangle : \kappa(x) \le 1\}.$
Moreover, $\kappa$ and its polar satisfy a Hölder-like inequality
(2.3)   $\langle x,y\rangle \le \kappa(x)\,\kappa^\circ(y) \qquad\text{for all } x \in \mathrm{dom}\,\kappa, \ y \in \mathrm{dom}\,\kappa^\circ,$
which we refer to as the polar-gauge inequality.
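For example, when $\kappa = \|\cdot\|_1$ the polar is the dual norm $\kappa^\circ = \|\cdot\|_\infty$, and (2.3) reduces to the familiar Hölder inequality $\langle x,y\rangle \le \|x\|_1\|y\|_\infty$. A numerical spot check, which also confirms that the supremum in (2.2) is attained at a signed coordinate vector:

```python
import random

random.seed(1)
n = 6

def l1(x):   return sum(abs(xi) for xi in x)
def linf(x): return max(abs(xi) for xi in x)
def dot(u, v): return sum(ui * vi for ui, vi in zip(u, v))

for _ in range(500):
    x = [random.uniform(-1, 1) for _ in range(n)]
    y = [random.uniform(-1, 1) for _ in range(n)]
    # Polar-gauge (Holder) inequality: <x, y> <= ||x||_1 * ||y||_inf.
    assert dot(x, y) <= l1(x) * linf(y) + 1e-12

# The supremum in (2.2) over the unit l1-ball is attained at +/- e_i
# for the index i maximizing |y_i|, and equals ||y||_inf.
y = [random.uniform(-1, 1) for _ in range(n)]
i = max(range(n), key=lambda j: abs(y[j]))
e = [0.0] * n
e[i] = 1.0 if y[i] >= 0 else -1.0      # signed coordinate vector, l1-norm 1
assert abs(dot(e, y) - linf(y)) < 1e-12
```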
Define the following primal and dual feasible sets:
(2.4)   $\mathcal{D}_p := \{x : \rho(b - Ax) \le \sigma\} \qquad\text{and}\qquad \mathcal{D}_d := \{y : \langle b,y\rangle - \sigma\rho^\circ(y) \ge 1\}.$
The gauge primal (G) and dual (G°) problems are said to be feasible, respectively, if the following intersections are nonempty:
$\mathrm{dom}\,\kappa \cap \mathcal{D}_p \qquad\text{and}\qquad \mathrm{dom}\,(\kappa^\circ\circ A^T) \cap \mathcal{D}_d.$
The primal and dual problems are relatively strictly feasible, respectively, if the following intersections are nonempty:
$\mathrm{ri}(\mathrm{dom}\,\kappa) \cap \mathrm{ri}(\mathcal{D}_p) \qquad\text{and}\qquad \mathrm{ri}(\mathrm{dom}\,(\kappa^\circ\circ A^T)) \cap \mathrm{ri}(\mathcal{D}_d).$
If the intersections above are nonempty, with the interior replacing the relative interior, then we say that the problems are strictly feasible, respectively.
Assume throughout that $\rho(b) > \sigma$. Otherwise, $\mathcal{D}_p$ contains the origin, which is a trivial solution of (G). We generally assume that $\sigma$ is positive, though in certain cases it is useful to allow $\sigma = 0$ and then assume that $\rho$ vanishes only at the origin; this allows us to extend many of the following results to problems where the feasible set is affine.
Lemma 1 (Primal-dual constraint activity)
If the primal optimal value is attained at $\bar x$ with $\kappa(\bar x) > 0$, then $\rho(b - A\bar x) = \sigma$. Similarly, if the dual optimal value is attained at $\bar y$ with $\kappa^\circ(A^T\bar y) > 0$, then $\langle b,\bar y\rangle - \sigma\rho^\circ(\bar y) = 1$.
Proof
We prove the contrapositive. If $\rho(b - A\bar x) < \sigma$, then by lower semicontinuity of $\rho$, we have $\rho(b - A(t\bar x)) \le \sigma$ for all $t < 1$ close to $1$. Consequently, we deduce $\kappa(t\bar x) = t\,\kappa(\bar x) \le \kappa(\bar x)$, with strict inequality unless $\kappa(\bar x) = 0$. Since $\bar x$ is optimal, we conclude $\kappa(\bar x) = 0$. The proof of the dual statement is similar.
The duality relations in the gauge framework follow principles analogous to Lagrange duality, except that instead of an additive relationship between the primal and dual optimal values $\nu_p$ and $\nu_d$, the relationship is multiplicative. The next result summarizes weak and strong duality for gauge optimization.
Theorem 2.1 (Gauge duality [13])
The following relationships hold for the gauge primal-dual pair (G) and (G°).

(a) (Weak duality) If $x$ and $y$ are primal and dual feasible, then $\kappa(x)\cdot\kappa^\circ(A^Ty) \ge 1$.

(b) (Strong duality) If the dual (resp. primal) is feasible and the primal (resp. dual) is relatively strictly feasible, then $\nu_p\,\nu_d = 1$ and the gauge dual (resp. primal) attains its optimal value.
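Weak duality is easy to test numerically. Take, for illustration, $\kappa = \|\cdot\|_1$ with $\sigma = 0$ and $\rho$ vanishing only at the origin, so the primal constraint is $Ax = b$ and the dual constraint is $\langle b,y\rangle \ge 1$. Then for any feasible pair, $\|x\|_1 \cdot \|A^Ty\|_\infty \ge \langle x, A^Ty\rangle = \langle b,y\rangle \ge 1$. A randomized sanity check under those assumptions (the instance data is our own):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 8
A = rng.standard_normal((m, n))
x0 = rng.standard_normal(n)
b = A @ x0                      # guarantees the primal is feasible

for _ in range(200):
    # Random primal-feasible x: particular solution plus a null-space step.
    z = rng.standard_normal(n)
    x = x0 + (z - np.linalg.pinv(A) @ (A @ z))     # satisfies A @ x == b
    # Random dual-feasible y: rescale so that <b, y> = 1.
    y = rng.standard_normal(m)
    if abs(b @ y) < 1e-8:
        continue
    y = y / (b @ y)
    # Gauge weak duality: kappa(x) * kappa_polar(A^T y) >= 1.
    assert np.linalg.norm(x, 1) * np.linalg.norm(A.T @ y, np.inf) >= 1 - 1e-9
```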
3 Perturbation analysis for gauge duality
The modern treatment of duality in convex optimization is based on an interpretation of multipliers as giving sensitivity information relative to perturbations in the problem data. No such analysis, however, has existed for gauge duality. In this section we show that, for a particular kind of perturbation, the gauge dual (G°) can in fact be derived via such an approach. This resolves a question posed by Friedlander, Macêdo, and Pong [13].
3.1 The perturbation framework
In this section we review the perturbation argument for deriving duality. Our summary follows the discussion in Rockafellar and Wets [20, 11.H]. Fix an arbitrary convex function $F$ and consider the value functions defined by (1.1)–(1.2). Observe the equality $v_p(0) = \inf_x F(x,0)$. Rockafellar–Fenchel duality for the problem
$\min_x\ f(x) + g(Ax),$
where $f$ and $g$ are closed and convex, is obtained by setting $F(x,u) := f(x) + g(Ax+u)$. In that case, the primal-dual pair takes the familiar form
$\min_x\ f(x) + g(Ax) \qquad\text{and}\qquad \max_y\ -f^*(-A^Ty) - g^*(y).$
Under certain conditions, described in the following theorem, strong duality holds, i.e., the two optimal values coincide, and the optimal value is attained.
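To make the Fenchel specialization concrete, the conjugate of $F(x,u) = f(x) + g(Ax+u)$ at $(0,y)$ can be computed directly (a routine calculation, included here for completeness):

```latex
\begin{aligned}
F^*(0,y) &= \sup_{x,u}\ \bigl\{ \langle u, y\rangle - f(x) - g(Ax+u) \bigr\} \\
         &= \sup_{x,s}\ \bigl\{ \langle s - Ax, y\rangle - f(x) - g(s) \bigr\}
            \qquad (s := Ax + u) \\
         &= \sup_{s}\ \bigl\{ \langle s, y\rangle - g(s) \bigr\}
            + \sup_{x}\ \bigl\{ \langle x, -A^T y\rangle - f(x) \bigr\} \\
         &= g^*(y) + f^*(-A^T y),
\end{aligned}
```

so that the dual $\sup_y -F^*(0,y)$ is exactly $\sup_y\,\{-f^*(-A^Ty) - g^*(y)\}$.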
Theorem 3.1 (Multipliers and sensitivity [20, Theorem 11.39])
Consider the primal-dual pair (1.2), where $F$ is proper, closed, and convex.

(a) The inequality $\inf_x F(x,0) \ge \sup_y -F^*(0,y)$ always holds.

(b) If $0 \in \mathrm{ri}(\mathrm{dom}\,v_p)$, then equality holds and the supremum in the dual is attained, if finite. Similarly, if $0 \in \mathrm{ri}(\mathrm{dom}\,v_d)$, then equality holds and the infimum in the primal is attained, if finite.

(c) The set $\operatorname*{argmax}_y\,-F^*(0,y)$ is nonempty and bounded if and only if $0 \in \mathrm{int}(\mathrm{dom}\,v_p)$ and $v_p(0)$ is finite, in which case $\operatorname*{argmax}_y\,-F^*(0,y) = \partial v_p(0)$.

(d) The set $\operatorname*{argmin}_x F(x,0)$ is nonempty and bounded if and only if $0 \in \mathrm{int}(\mathrm{dom}\,v_d)$ and $v_d(0)$ is finite, in which case $\operatorname*{argmin}_x F(x,0) = \partial v_d(0)$.

(e) Optimal solutions are characterized jointly through the conditions $(0,\bar y) \in \partial F(\bar x, 0)$.
Remark 1
Part (b) of Theorem 3.1 is stated in [20, Theorem 11.39] with the interior in place of the relative interior. We give a quick argument here for the claimed result with relative interiors. Suppose $0 \in \mathrm{ri}(\mathrm{dom}\,v_p)$. If $v_p(0) = -\infty$, then equality follows by (a). Hence we can suppose that $v_p(0)$ is finite, and therefore $v_p$ is proper. Thus there exists a subgradient $\bar y \in \partial v_p(0)$ [18, Theorem 23.4]. By the subgradient inequality, the following holds for any $x$ and $u$:
$F(x,u) \ge v_p(u) \ge v_p(0) + \langle \bar y, u\rangle.$
Taking the infimum over $x$ and $u$ of $F(x,u) - \langle\bar y, u\rangle$, we recognize the left-hand side as $-F^*(0,\bar y)$. We deduce $-F^*(0,\bar y) \ge v_p(0)$. Combining this with Part (a) of Theorem 3.1 yields equality, and we see that $\bar y$ attains the supremum $\sup_y -F^*(0,y)$. The symmetric argument for the case $0 \in \mathrm{ri}(\mathrm{dom}\,v_d)$ is analogous.
3.2 Derivation of gauge duality as a perturbation
We now show that the problems (G) and (G°) constitute a primal-dual pair under the framework set out by Theorem 3.1. The key is to postulate the correct pairing function $F$.
3.2.1 The perturbation function
Our starting point is the primal perturbation scheme:
Note that is equal to the optimal value of the primal (G). Since and multiply each other in the description above, it is convenient to reparametrize the problem by setting and . By positive homogeneity of and , this yields the equivalent description
where is the unit level set for . In particular, this reparameterization shows that is convex because it is the infimal projection of a convex function; it is proper when the primal Eq. G is feasible. Note that minimizing is equivalent to minimizing . With this in mind, define the convex function :
(3.1) 
The function is proper and closed because , and and are closed. The associated infimal projection
is essentially the negative reciprocal of . We formalize this in the following lemma. We omit the proof since it is immediate.
Lemma 2
Equality holds provided that is nonzero and finite. Moreover, implies , and implies .
We now compute the conjugate of , which is needed to derive the dual value function . By Rockafellar and Wets [20, Theorem 11.23(b)],
where the closure operation is applied to the function on the right-hand side with respect to the argument . Because is nonsingular, there is a unique vector that satisfies the constraints in the description of . The closure operation therefore turns out to be superfluous, and we can further simplify the description to
Taking into account the equalities and , this expression transforms to
In particular, we conclude
(3.2) 
Thus the dual problem recovers, up to a sign change, the required gauge dual problem.
3.2.2 Proof of gauge duality (Theorem 2.1)
We now use the perturbation framework to prove the gauge duality result given by Theorem 2.1. The following auxiliary result ties the feasibility of the gauge pair (G) and (G°) to the domain of the value function. The proof of this result, which is largely an application of the calculus of relative interiors, is deferred to Appendix A.
Lemma 3 (Feasibility and domain of the value function)
As in the hypotheses of Theorem 2.1, in this subsection we denote the optimal primal and dual values of (G) and (G°) by $\nu_p$ and $\nu_d$ (without arguments). Similarly, we let .
Proof (of Theorem 2.1)
Part (a): We proceed by proving that the two inequalities (i) and (ii) hold always. This in particular will imply that the assumptions of part (a) guarantee that and are nonzero and finite. Hence the conclusion of (a) follows. We begin with (i). Theorem 3.1 guarantees the inequality
(3.3) 
By Lemma 2, whenever is nonzero and finite, equality holds, which together with (3.3) yields (i). If, on the other hand, , then (i) is trivial. Finally, if , Lemma 2 yields , and hence (3.3) implies , and (i) again holds. Thus, (i) holds always. To establish (ii), it suffices to consider the case . From (3.3) we conclude , that is either or . By Lemma 2, the first case implies and therefore (ii) holds. The second case implies that the primal problem is infeasible, that is , and again (ii) holds. Thus (ii) holds always, as required.
Part (b): Suppose the dual is feasible and the primal is relatively strictly feasible. Part (a) implies that both $\nu_p$ and $\nu_d$ are nonzero and finite, and hence . On the other hand, by Lemma 3 the assumption that the primal is relatively strictly feasible implies . This last inequality thus implies that is finite, and hence is proper. Hence, by Theorem 3.1, equality holds and the infimum in the dual is attained. Thus we deduce $\nu_p\,\nu_d = 1$, as claimed.
3.3 Optimality conditions
The perturbation framework can be harnessed to develop optimality conditions for the gauge pair that relate the primaldual solutions to subgradients of the corresponding value function. This yields a version of parts (c) and (d) in Theorem 3.1 specialized to gauge duality.
Theorem 3.2 (Gauge multipliers and sensitivity)
The following relationships hold for the gauge primal-dual pair (G) and (G°).

If the primal is strictly feasible and the dual is feasible, then the set of optimal solutions for the dual is nonempty and bounded, and coincides with

If the dual is strictly feasible and the primal is feasible, then the set of optimal solutions for the primal is nonempty and bounded with solutions given by , where
Proof
Part (a). Because the primal problem is strictly feasible, it follows from Lemma 3 that , and because the dual is feasible, is finite. Theorem 3.1 and Lemma 2 then imply the conclusion of Part (a).
Part (b). Because the dual problem is strictly feasible, it follows from Lemma 3 that , and because the primal is feasible, is finite. Theorem 3.1 then implies that the optimal primal set is nonempty and bounded, and . Because the primal problem is feasible, any pair must satisfy by Lemma 2. Thus, this inclusion is equivalent to being optimal for the primal problem, with optimal value . This proves Part (b).
We next use the sensitivity interpretation given by Theorem 3.2 to develop a set of explicit necessary and sufficient optimality conditions that mirror the more familiar KKT conditions from Lagrange duality.
Theorem 3.3 (Optimality conditions)
Suppose both the gauge primal and gauge dual problems are strictly feasible. Then the pair $(\bar x, \bar y)$ is primal-dual optimal if and only if it satisfies the conditions
(3.4a)   (primal activity)   $\rho(b - A\bar x) = \sigma$
(3.4b)   (dual activity)   $\langle b, \bar y\rangle - \sigma\rho^\circ(\bar y) = 1$
(3.4c)   (objective alignment)   $\langle \bar x, A^T\bar y\rangle = \kappa(\bar x)\,\kappa^\circ(A^T\bar y)$
(3.4d)   (constraint alignment)   $\langle b - A\bar x, \bar y\rangle = \sigma\rho^\circ(\bar y)$
Proof
Suppose $(\bar x, \bar y)$ is primal-dual optimal. Then by strong duality, $\nu_p$ and $\nu_d$ are both nonzero and finite, and Lemma 1 tells us that the primal and dual constraints are active. Hence, (3.4a)–(3.4b) hold.
Now define and . Note that . By Theorem 3.1(e) and Theorem 3.2(b), the pair is primal-dual optimal if and only if . By [20, Theorem 6.14] and [18, Theorem 23], whenever and , we have
In particular, this subdifferential formula holds for . We deduce the existence of and such that the following hold:
(3.5a)  
(3.5b)  
(3.5c) 
Notice that cannot satisfy (3.5a), so (3.5c) together with the polar-gauge inequality implies
Therefore equality holds throughout, and dividing through by we see that (3.4c) is satisfied. Finally, recall from the characterization (2.2) of the polar that
(3.6) 
which implies
(3.7) 
In particular, if then If , then (3.7) implies that must satisfy
which gives condition (3.4d) after dividing through by . On the other hand, if then the set (3.7) is given by . Thus we again have , and dividing through by gives (3.4d). This finishes one direction of the proof.
For the reverse implication, suppose that satisfies (3.4a)(3.4d). Then clearly satisfies the primal constraint, and satisfies the dual constraint. By weak duality, to show that is primaldual optimal it is sufficient to show that . Adding (3.4c) and (3.4d), we obtain
Plug in and then use (3.4b) to get , as desired.
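The conditions (3.4) are easy to verify on a toy instance. With $\kappa = \|\cdot\|_1$, $\sigma = 0$, and an equality constraint $Ax = b$ (so that the $\rho^\circ$ term in (3.4d) vanishes), the pair $\bar x = (0,0,1)$, $\bar y = (\tfrac12,\tfrac12)$ for the data below satisfies all four conditions; the instance is our own:

```python
# Toy data: constraint A x = b with A of size 2x3.
A = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
b = [1.0, 1.0]
x_bar = [0.0, 0.0, 1.0]          # primal candidate
y_bar = [0.5, 0.5]               # gauge-dual candidate

ATy = [sum(A[i][j] * y_bar[i] for i in range(2)) for j in range(3)]

# (3.4a) primal activity: A x_bar = b (the sigma = 0 case).
assert all(abs(sum(A[i][j] * x_bar[j] for j in range(3)) - b[i]) < 1e-12
           for i in range(2))
# (3.4b) dual activity: <b, y_bar> = 1.
assert abs(sum(bi * yi for bi, yi in zip(b, y_bar)) - 1.0) < 1e-12
# (3.4c) objective alignment: <x_bar, A^T y_bar> = ||x_bar||_1 * ||A^T y_bar||_inf.
lhs = sum(xi * vi for xi, vi in zip(x_bar, ATy))
rhs = sum(abs(xi) for xi in x_bar) * max(abs(v) for v in ATy)
assert abs(lhs - rhs) < 1e-12
# (3.4d) constraint alignment: <b - A x_bar, y_bar> = 0, trivial here since A x_bar = b.
```

Note also that the objective values multiply to one: $\kappa(\bar x)\,\kappa^\circ(A^T\bar y) = 1\cdot 1 = 1$, consistent with strong gauge duality.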
The following corollary describes a variation of the optimality conditions outlined by Theorem 3.3. These conditions assume that a solution of the dual problem is available, and can be used to determine a corresponding solution of the primal problem. An application of the following result appears in Section 7.
Corollary 1 (Gauge primaldual recovery)
Proof
We use the optimality conditions given in Theorem 3.3. Note that by Lemma 1 we have equality (3.4b) in the dual constraint.
We first show that (b) implies (a). Suppose (b) holds. Then (3.4c) holds automatically. From the characterization (2.2) of the polar, we have
(3.8) 
and thus is the set of maximizing elements in this supremum. Because , it therefore holds that . If we additionally use the polar-gauge inequality, we deduce that
and therefore the above inequalities are all tight. Thus conditions (3.4a) and (3.4d) hold, and by Theorem 3.3, is a primaldual optimal pair.
We next show that (a) implies (b). Suppose that is optimal for (G). Then the first condition of (b) holds by (3.4c), and (3.4a) and (3.4d) combine to give us
This implies that is a maximizing element of the supremum in (3.8), and thus
Finally, to show the equivalence of (b) and (c), note that by the polar-gauge inequality, if and only if minimizes the convex function . This, in turn, is true if and only if , or equivalently .
3.4 The relationship between Lagrange and gauge multipliers
We now use the perturbation framework for duality to establish a relationship between gauge dual and Lagrange dual variables. We begin with an auxiliary result that characterizes the subdifferential of the perspective function.
Lemma 4 (Subdifferential of perspective function)
Let $f$ be a closed proper convex function. Then for , equality holds:
Proof
Recall that for any closed proper convex function , we have
(3.9) 
and in particular if is a nonempty closed convex set, then [18, Theorem 23.5 and Corollary 23.5.3]. By [18, Corollary 13.5.1], we have where is a closed convex set. If , then is nonempty and
(3.10) 
Suppose now that . Then
Hence, again by (3.9), if and only if and . On the other hand, if then
Hence, again by (3.9), if and only if and .
We now state the main result relating the optimal solutions of (G) to the optimal solutions of the Lagrange dual of (G°).
Theorem 3.4
Proof
We first note that can be derived via the framework of Theorem 3.1 through the Lagrangian value function
Here plays the role of in Theorem 3.1; cf. [20, Example 11.41]. The feasibility of (G) guarantees that is finite, and by Lemma 3 we have
Thus by Theorem 3.1 the optimal points for are characterized by . Note also
On the other hand, by Theorem 3.2(b) the solutions to (G