Computation of the Epsilon-Subdifferential of Convex Piecewise-Defined Functions in Optimal Worst-Case Time††thanks: This is a pre-print of an article published in Set-Valued and Variational Analysis (SVAA). The final authenticated version is available online at: http://dx.doi.org/10.1007/s11228-018-0476-5
The -subdifferential of convex univariate piecewise linear-quadratic functions can be computed in linear worst-case time complexity as the level-set of a convex function. Using dichotomic search, we show how the computation can be performed in logarithmic worst-case time. Furthermore, a new algorithm to compute the entire graph of the -subdifferential in linear time is presented. Both algorithms are not limited to convex PLQ functions but are also applicable to any convex piecewise-defined function with little restrictions.
Keywords. Subdifferential; -Subdifferentials; Piecewise linear-quadratic functions; Convex Function; Computational Convex Analysis (CCA); Computer-Aided Convex Analysis; Visualization.
The -subdifferential quantifies the approximation error intrinsic to numerical computation when one is interested in the solution of a convex nonsmooth optimization problem. It extends Fermat’s rule and approximates the convex subdifferential through the Brøndsted-Rockafellar Theorem [BR65]. It plays a critical role in the convergence of Bundle methods [HUL13, HUL93].
While the numerical evaluation of an -subgradient occurs in several numerical optimization algorithms, the full computation of the graph of the subdifferential is too time consuming except for specific classes of functions. Such endeavors are the goal of computational convex analysis, a field that started with the numerical computation of the Legendre-Fenchel transform [GL11, Luc96, Luc97, Luc06, HUL07, LBT09, Luc13] and has since tackled the computation of the main transforms encountered in convex analysis, e.g. the Moreau envelope, the Lasry-Lions double envelope, the proximal average [Tri07, BGLW08, Har09, Goe10, BMW11, JKL11, GHW12], etc. See [Luc10] for historical notes and a survey of numerous applications. While pure symbolic computation was considered [BM06, BH06, BH08], most work in computational convex analysis use a hybrid symbolic-numerical framework that considers a specific class of functions e.g. piecewise linear or piecewise linear-quadratic functions [RW09]. Most algorithms have been made publicly available in the CCA open source numerical library [Luc16].
More recent work has considered computing operators i.e. multifunctions such as the -subdifferential. Bajaj et al. [BHL16] proposed an algorithm to compute the -subdifferential of a univariate convex PLQ function as the level set of its conjugate minus an affine function, and used the CCA numerical library to carry out that computation in linear time.
We propose a new algorithm to perform the same computation in logarithmic time by using binary and dichotomic search to avoid computing the entire graph of the conjugate. The approach relies on Goebel’s graph-matrix calculus as made explicit in [GL11] especially the parametrization of the graph of the conjugate, which has been previously exploited in computational convex analysis in [HUL07].
We then propose another new algorithm to compute the entire graph of the -subdifferential in linear time. We use symmetry to simplify the implementation and introduce a new data structure to store , which is a piecewise non-PLQ function for which we give explicit formulas.
Both algorithms are implemented in Scilab within the CCA numerical library [Luc16]. Finally, we point out that both algorithms can be easily adapted to convex piecewise univariate functions that are not necessarily PLQ.
The remaining of the paper is organized as follow. Section 2 recalls needed facts and sets the notations, Section 3 details the pointwise computation of the -subdifferential while Section 4 explains how to compute its entire graph. We briefly explain in Section 5 how to extend the algorithms to non-PLQ functions. Finally, Section 6 concludes the paper and proposes future research directions.
2 Preliminaries and Notations
Throughout this paper, we restrict ourselves to univariate functions and adapt more general definitions and results to that context. Unless otherwise stated, functions are lower semi-continuous (lsc).
The -subdifferential of a function is defined for and as
and is defined as empty when . Any element is called an -subgradient. The function is said to be proper when it has nonempty domain.
To make the distinction with the approximate subdifferential introduced in [Iof84], following [BHL16] we use the accepted terminology -subdifferential, although historically [BR65] used the term “approximate subgradients”.
When , reduces to the convex subdifferential that we will denote . In our computation, we will use the fact that for any .
The conjugate of
will play a critical role in the computation due to the following fact.
Fact 2.1 ([Bhl16, Proposition 3.1]).
For any function ,
For a lsc convex function , the following are equivalent
A function is piecewise linear-quadratic (PLQ) if can be represented as the union of finitely many closed intervals on each of which is linear or quadratic. Note that a PLQ function is continuous on its domain and can be represented in the form
where for , , and . (The above formula represents more general functions than PLQ functions since some coefficient choices may result in a function that is discontinuous on its domain.)
Any PLQ function will be stored as a matrix [LBT09]
with the convention that if or , then the structure demands that or respectively. A quadratic function on is stored as and while an indicator function of a single point
where , is stored as a single row vector . We will call the later function, a needle function.
In this section, we present Algorithm 1 to compute the -subdifferential of an univariate proper convex PLQ function at a particular . The key idea is to use dichotomic search to reduce the time complexity.
For a PLQ function defined as (1) we note and where and . We also note i.e. .
Since is proper convex PLQ the equation has at most two solutions. We focus on the computation of $̱s$ (the computation of is similar).
If , for all . Hence, .
So let us assume there is a single solution to on the interval . The first step of Algorithm 1 is to locate the interval containing $̱s$. When , we can compute $̱s$ directly so we now assume . Using binary search we locate the index such that . We know that since i.e. we have , see Figure 1.
We then perform a dichotomic search between and by computing the middle index and updating or depending on whether . Using Fact 2.2, we have so i.e. we can perform the dichotomic search without computing explicitly, see Algorithm 1.
Invoking the function Intersection(l,u) computes by considering several cases. The dichotomic search ensures that , and that the equation has at most a solution in that interval. If has a solution in , we compute explicitly on that interval by interpolation ( is at most quadratic and goes through with derivative for ) and solve the resulting linear or quadratic equation. Otherwise we have , and in that case, .
The algorithm computes similarly.
The algorithm performs the same computation as [BHL16, Algorithm 2], which proves its correctness. Its complexity is logarithmic since it performs constant time operations (solving a quadratic equation), a binary search, and up to 2 dichotomic searches in sequential order. ∎
To validate the complexity numerically, we build a convex plq function with a large number of pieces by sampling the function , then building a (zeroth-order) piecewise linear approximation and finally building its Moreau envelope. The resulting function is convex PLQ and is alternating between being linear and quadratic. We then time the original algorithm and our new algorithm. When the function contains around pieces, the original linear-time algorithm takes seconds while our algorithm takes seconds on a desktop quad-core hyperthreading Intel Xeon with 64GB of memory running Windows 7 64 bits and Scilab 5.5.2 (64 bits). Table 1 shows the resulting timings.
The graph of is defined as
Our objective in this section is to fully describe for bivariate convex PLQ functions.
First, we note that we only need to describe the lower and upper bound of . Define by , and . Then .
Next, only $̱g$ needs to be computed as the following lemma, which follows directly from the definition of , indicates.
Assume , is a proper function, and denote . Then
and in particular
So we only need to compute $̱g$; then we apply the same algorithm to to deduce the upper bound.
Before explaining the algorithm, we define our notations. Given and a convex lsc PLQ function , we note the point at which we wish to evaluate . We name the line going through with and tangent to the graph of at a point with and . The line has equation and is illustrated on Figure 2 when is smooth at and on Figure 3 when is nonsmooth at .
The function $̱g$ is a piecewise function but not a PLQ function. In order to make our algorithm more generic, we adopt a data structure that only stores enough information to evaluate $̱g$ with the knowledge of the PLQ matrix of . We will store the output using an matrix where each row stores a piece of $̱g$ in the format . Similarly to a PLQ matrix, the first column stores a sorted array of points. The second column stores a type index where if the function is smooth at the point associated with (see Figure 2) while if is nonsmooth at (see Figure 3). The value indicates that is constant on that interval with value (see Figure 4). The index (resp. ) refers to the index in the PLQ matrix of the piece storing (resp. ).
Once computed, the matrix allows us to obtain as follow. Giving , we find the row in such that . We then obtain the type . If , we immediately return . If , we return , the slope of the line going through and where and . Otherwise, and we return the slope of the line going through and tangent to the quadratic function , which corresponds to the piece of . The later requires solving a quadratic equation. Our argument proved the following result.
Given a PLQ function with pieces stored as PLQ matrix , , the matrix storing the function , and a point ; the value can be computed in logarithmic time.
In addition, given a PLQ matrix corresponding to , and a matrix storing ; the value can be computed in logarithmic time. Consequently, the set can be computed in logarithmic time.
Given a sorted set of values for and all of the above input, computing for can be performed in .
First, given a single value , we search for the row in such that using binary search. Then a constant time calculation gives (obvious for and ; by Lemma 4.3 below for ). We perform the same computation for and to obtain in logarithmic time.
Then given a sorted set of values, we can repeat the same computation in worst-case time. Alternatively, when is large, we can do a linear search of and consider each point in or only once, resulting in a evaluation algorithm. ∎
The previous result relies on solving the following equations in constant time. The algorithm performs that computation in the subroutine esub_compute_xb whose code is not included since it only includes special cases checks and solving a quadratic (the full algorithm is available for download in the CCA toolbox or by contacting the corresponding author).
Defined as the point on the graph of at which the line going through is tangent to with . Assume is differentiable at . Note (resp. ) the row index in the PLQ matrix of for the piece containing (resp. ), i.e. (resp. ). Given , , and , the point is a solution of
where is the quadratic function corresponding to the th piece of , and is the derivative of . Conversely, given , the point can be computed as the solution to the same equations.
The equations translate the facts that is on the line going through with slope with the derivative of at , and (resp. ) the image of (resp. ) by .
Substituting the variables and the coefficient of the quadratic functions , we obtain the quadratic equation
which can be solved explicitly in constant time. While that quadratic equation may have 0, 1, or 2 solutions, the assumptions always ensure there is at least one solution, and the geometric positions of with respect to always allow us to pick the correct root. ∎
It is always possible to adopt an explicit data structure similar to the PLQ matrix by expliciting the formula for as a function of . The resulting data structure only works with PLQ functions while the one suggested will be extended beyond PLQ functions in the following section.
More precisely, $̱g$ is a piecewise function. Around a point , the type indicates the explicit formula to use. If , , i.e. $̱g$ is constant around . If , or more explicitly,
where is constant on that interval. Finally, if , is the solution of (3) where is now the moving variable, i.e. on that interval
In conclusion, the function $̱g$ is piecewise and on each piece it is either a constant, a rational function, or a linear plus the square root of a quadratic function.
For example, the absolute value function has PLQ matrix
and its associated function is stored as
The value NaN refers to the standard IEEE 754 Not a Number value. In our context, it indicates that the value should not be used i.e. it is irrelevant. So the first row of indicates that for any value (column 2 value of means it is type i.e. a constant value equal to ). The second row of means that for any the value can be computed as the slope of the line going through and with (resp. ) belonging to the piece of with index (resp (). In this case, the line goes through and so . In the general case, using the information in with the matrix , we can compute .
Note that since , is symmetric with respect to as predicted by Lemma 4.1.
The algorithm outline is as follow. The first step of the algorithm is to handle special cases: indicator functions of a point, linear functions, and quadratic functions, i.e. when the function is not piecewise defined. Next, the algorithm initializes the output data structure with the leftmost piece of the PLQ function. Then the main loop sweeps through tangent points associated with a given point at which we compute . Finally, the rightmost piece is handled by considering all 3 possibilities: the function equals , the function is linear, or the function is quadratic.
Algorithm 2 shows the first part of our algorithm that focuses on the initialization. The cases of a needle function (indicator function of a single point), a linear function, and a quadratic function are handled directly. Then the first piece of is considered; it is either (domain is left-bounded), linear, or quadratic. For each case, the first row of the lb matrix is computed (our code use the variable lb to store the matrix ). In addition, when the first piece is linear or quadratic, the line may be above several pieces of the function . So we need to store all the resulting rows in the lb matrix. This is achieved by looping on the index ib (variable name corresponding to ) till the point is no longer below the line , where has slope . Similarly, we need to store the same information in the matrix lb for the line with slope , which is performed in the function _update_nonsmooth_xt displayed in Algorithm 3.
The main loop of the algorithm relies on the fact that breakpoints can only appear from the original breakpoints in the PLQ matrix or from the intersection of the line with the graph of ; see Figure 7. We also need to carefully distinguish between linear and quadratic pieces to record the correct information. Algorithm 4 describes the main loop.
Finally, the contribution from the last piece of needs to be accounted for, which is the purpose of Algorithm 5. Similar to the first piece, we consider the 3 possible cases: a right-bounded domain, a linear piece, or a quadratic last piece. We then update the matrix lb accordingly.
All the loops in the algorithm increment either (variable it) or (variable ib), hence the algorithm runs in linear time. The correctness of the algorithm follows from the definition of and the definitions of and . ∎
5 Extension to non-PLQ functions
All the computation in Section 3 and 4 apply to convex piecewise lsc PLQ functions. However, the algorithms are easily extended to convex piecewise lsc functions for which the intersection of a line with a given piece can be computed in constant time.
More precisely, the algorithm only requires detecting whether the piece of a convex piecewise lsc function is linear, computing at any point, and running the subroutine esub_compute_xb. The later can be performed explicitly for convex piecewise cubic or quartic polynomials. Other functions require using a numerical method like Newton’s method since we guarantee that the intersection exists on a given interval (note that there may be 2 intersection points so a little care must be exercised to select the right point).
6 Conclusion and future work
We proposed two algorithms. The first one is an improvement from the linear-time algorithm proposed in [BHL16]. It is a dual algorithm that relies on the conjugate, the fact that a points on the graph of are in one-to-one correspondence with points on the graph of , and a dichotomic search. The result is a logarithmic time algorithm to evaluate the -subdifferential at a given point.
The second algorithm computes the full graph of . It is a line sweep algorithm that moves a point on the graph of while computing the associated point on the graph of . The data structure we adopt allows the evaluation of at a point in logarithmic time for a given point, or in linear-time for a given grid of points. Hence, after a linear pre-processing time, the algorithm is as efficient as the previous one for a small number of points, and more efficient (linear time vs. log-linear time) for a grid of points.
Finally, we indicated how the algorithms readily extend to convex lsc non-PLQ functions. Our data structure is particularly efficient in that regard since the algorithms are exactly the same; only the subroutine to compute the intersection of a line with a piece of has to be changed.
The algorithms for convex PLQ functions have been implemented in Scilab within the CCA numerical library [Luc16].
Future work involves implementing the algorithms for non-PLQ functions, and considering functions of 2 variables, which involves completely different data structures.
This work was supported in part by Discovery Grants #298145-2013 (Lucet) from NSERC, and The University of British Columbia, Okanagan campus. Part of the research was performed in the Computer-Aided Convex Analysis (CA2) laboratory funded by a Leaders Opportunity Fund (LOF, John R. Evans Leaders Fund – Funding for research infrastructure) from the Canada Foundation for Innovation (CFI) and by a British Columbia Knowledge Development Fund (BCKDF).
This work was started at the end of Anuj Bajaj MSc research under the guidance of Dr. Warren Hare. Their preliminary efforts, ideas, and interest motivated the authors to pursue more efficient algorithms. The authors thank them for their initial contribution without which this work would not have been possible.
- [BGLW08] Heinz H. Bauschke, Rafal Goebel, Yves Lucet, and X. Wang. The proximal average: Basic theory. SIAM J. Optim., 19(2):768–785, 2008.
- [BH06] Jonathan M. Borwein and Chris H. Hamilton. Symbolic computation of multidimensional Fenchel conjugates. In ISSAC 2006, pages 23–30. ACM, New York, 2006.
- [BH08] Jonathan M. Borwein and Chris H. Hamilton. Symbolic Fenchel conjugation. Math. Program., 116(1):17–35, 2008.
- [BHL16] Anuj Bajaj, Warren Hare, and Yves Lucet. Visualization of the -subdifferential of piecewise linear-quadratic functions. Comput. Optim. Appl., 2016. Accepted for publication.
- [BM06] Heinz H. Bauschke and Martin v. Mohrenschildt. Symbolic computation of Fenchel conjugates. ACM Commun. Comput. Algebra, 40(1):18–28, 2006.
- [BMW11] Heinz H. Bauschke, Sarah M. Moffat, and Xianfu Wang. Self-dual smooth approximations of convex functions via the proximal average. In Heinz H. Bauschke, Regina S. Burachik, Patrick L. Combettes, Veit Elser, D. Russell Luke, and Henry Wolkowicz, editors, Fixed-Point Algorithms for Inverse Problems in Science and Engineering, volume 49 of Springer Optimization and Its Applications, pages 23–32. Springer New York, 2011.
- [BR65] A. Brøndsted and R. T. Rockafellar. On the subdifferentiability of convex functions. Proc. Amer. Math. Soc., 16:605–611, 1965.
- [GHW12] Rafal Goebel, Warren Hare, and Xianfu Wang. The optimal value and optimal solutions of the proximal average of convex functions. Nonlinear Analysis: Theory, Methods & Applications, 75(3):1290 – 1304, 2012.
- [GL11] Bryan Gardiner and Yves Lucet. Graph-matrix calculus for computational convex analysis. In Heinz H. Bauschke, Regina S. Burachik, Patrick L. Combettes, Veit Elser, D. Russell Luke, and Henry Wolkowicz, editors, Fixed-Point Algorithms for Inverse Problems in Science and Engineering, volume 49 of Springer Optimization and Its Applications, pages 243–259. Springer New York, 2011.
- [Goe10] Rafal Goebel. The proximal average for saddle functions and its symmetry properties with respect to partial and saddle conjugacy. J. Nonlinear Convex Anal., 11(1):1–11, 2010.
- [Har09] Warren Hare. A proximal average for nonconvex functions: A proximal stability perspective. SIAM J. Optim., 20(2):650–666, 2009.
- [HUL93] J.B. Hiriart-Urruty and C. Lemaréchal. Convex Analysis and Minimization Algorithms II: Advanced Theory and Bundle Methods, vol. 306 of Grundlehren der mathematischen Wissenschaften. Springer-Verlag, New York, 1993.
- [HUL07] Jean-Baptiste Hiriart-Urruty and Yves Lucet. Parametric computation of the Legendre–Fenchel conjugate. J. Convex Anal., 14(3):657–666, August 2007.
- [HUL13] J.B. Hiriart-Urruty and C. Lemaréchal. Convex Analysis and Minimization Algorithms I: Fundamentals, volume 305. Springer Science & Business Media, 2013.
- [Iof84] A. D. Ioffe. Approximate subdifferentials and applications. I. The finite-dimensional theory. Trans. Amer. Math. Soc., 281(1):389–416, 1984.
- [JKL11] Jennifer Johnstone, Valentin Koch, and Yves Lucet. Convexity of the proximal average. Journal of Optimization Theory and Applications, 148(1):107–124, 2011.
- [LBT09] Yves Lucet, Heinz H. Bauschke, and Michael Trienis. The piecewise linear-quadratic model for computational convex analysis. Comput. Optim. Appl., 43:95–118, May 2009.
- [Luc96] Yves Lucet. A fast computational algorithm for the Legendre–Fenchel transform. Comput. Optim. Appl., 6(1):27–57, July 1996.
- [Luc97] Yves Lucet. Faster than the Fast Legendre Transform, the Linear-time Legendre Transform. Numer. Algorithms, 16(2):171–185, January 1997.
- [Luc06] Yves Lucet. Fast Moreau envelope computation I: Numerical algorithms. Numer. Algorithms, 43(3):235–249, November 2006.
- [Luc10] Yves Lucet. What shape is your conjugate? A survey of computational convex analysis and its applications. SIAM Rev., 52(3):505–542, 2010.
- [Luc13] Yves Lucet. Techniques and open questions in computational convex analysis. In Computational and analytical mathematics, volume 50 of Springer Proc. Math. Stat., pages 485–500. Springer, New York, 2013.
- [Luc16] Y. Lucet. Computational convex analysis library, 1996–2016. http://atoms.scilab.org/toolboxes/CCA/, 1996-2016.
- [RW09] R.T. Rockafellar and R.J.B. Wets. Variational Analysis, volume 317. Springer Science & Business Media, 2009.
- [Tri07] M.J. Trienis. Computational convex analysis: From continuous deformation to finite convex integration. Master thesis, 2007.