# Faster Width-dependent Algorithm for Mixed Packing and Covering LPs

Digvijay Boob
Georgia Tech, Atlanta, GA
digvijaybb40@gatech.edu

Saurabh Sawlani
Georgia Tech, Atlanta, GA
sawlani@gatech.edu

Di Wang
Atlanta, GA
(Work done while the author was at Georgia Tech.)
###### Abstract

In this paper, we give a faster width-dependent algorithm for mixed packing-covering LPs. Mixed packing-covering LPs are fundamental to combinatorial optimization in computer science and operations research. Our algorithm finds a $(1+\varepsilon)$-approximate solution in time $O(Nw/\varepsilon)$, where $N$ is the number of nonzero entries in the constraint matrices, and $w$ is the maximum number of nonzeros in any constraint. This run-time is better than that of Nesterov’s smoothing algorithm, which requires $O(N\sqrt{n}\,w/\varepsilon)$ time, where $n$ is the dimension of the problem. Our work utilizes the framework of area convexity introduced in [Sherman-FOCS’17] to obtain the best dependence on $\varepsilon$ while breaking the infamous $\ell_\infty$ barrier to eliminate the factor of $\sqrt{n}$. The current best width-independent algorithm for this problem runs in time $\tilde{O}(N/\varepsilon^2)$ [Young-arXiv-14] and hence has worse running-time dependence on $\varepsilon$. Many real-life instances of the mixed packing-covering problem exhibit small width, and for such cases our algorithm can report higher-precision results than width-independent algorithms. As a special case of our result, we report a $(1+\varepsilon)$-approximation algorithm for the densest subgraph problem which runs in time $O(md/\varepsilon)$, where $m$ is the number of edges in the graph and $d$ is the maximum graph degree.

## 1 Introduction

Mixed packing and covering linear programs (LPs) are a natural class of LPs where coefficients, variables, and constraints are non-negative. They model a wide range of important problems in combinatorial optimization and operations research. In general, they model any problem which contains a limited set of available resources (packing constraints) and a set of demands to fulfill (covering constraints).

Two special cases of the problem have been widely studied in the literature: pure packing, formulated as $\max\{c^T x : Ax \le b,\ x \ge 0\}$; and pure covering, formulated as $\min\{c^T x : Ax \ge b,\ x \ge 0\}$, where $A$, $b$ and $c$ are all non-negative. These are known to model fundamental problems such as maximum bipartite graph matching, minimum set cover, etc. LubyN93 . Algorithms to solve packing and covering LPs have also been applied to great effect in designing flow control systems BartalBR04 , scheduling problems PlotkinST95 , zero-sum matrix games Nesterov05 and in mechanism design ZurelN01 . In this paper, we study the mixed packing and covering (MPC) problem, formulated as checking the feasibility of the set $\{x \ge 0 : Px \le \mathbf{1}_p,\ Cx \ge \mathbf{1}_c\}$, where $P$ and $C$ are non-negative matrices. We say that $x$ is an $\varepsilon$-approximate solution to MPC if it belongs to the relaxed set $\{x \ge 0 : Px \le (1+\varepsilon)\mathbf{1}_p,\ Cx \ge (1-\varepsilon)\mathbf{1}_c\}$. MPC is a generalization of pure packing and pure covering, hence it is applicable to a wider range of problems such as multi-commodity flow on graphs Young01 ; Sherman17 , non-negative linear systems and X-ray tomography Young01 .

General LP-solving techniques such as the interior point method can approximate solutions to MPC in a number of iterations depending only logarithmically on $1/\varepsilon$ - however, they incur a large per-iteration cost. In contrast, iterative approximation algorithms based on first-order optimization methods require $\mathrm{poly}(1/\varepsilon)$ iterations, but the iterations are fast and in most cases conducive to efficient parallelization. This property is of utmost importance in the context of ever-growing datasets and the availability of powerful parallel computers, resulting in much faster algorithms in relatively low-precision regimes.

### 1.1 Previous work

In the literature, algorithms for the MPC problem can be grouped into two broad categories: width-dependent and width-independent. Here, width is an intrinsic property of a linear program which typically depends on the dimensions and the largest entry of the constraint matrix, and is an indication of the range of values any constraint can take. In the context of this paper and the MPC problem, we define $w_P$ and $w_C$ as the maximum number of non-zeros in any constraint (row) of $P$ and $C$, respectively. We define the width of the LP as $w = \max(w_P, w_C)$.

One of the first approaches used to solve LPs was Lagrangian relaxation: replacing hard constraints with loss functions which enforce the same constraints indirectly. Using this approach, Plotkin, Shmoys and Tardos PlotkinST95 and Grigoriadis and Khachiyan GrigoriadisK96 obtained width-dependent polynomial-time approximation algorithms for MPC. Luby and Nisan LubyN93 gave the first width-independent parallelizable algorithm for pure packing and pure covering, which ran in $O(\log^2 N/\varepsilon^4)$ parallel time and near-linear total work. Here, parallel time (sometimes termed depth) refers to the longest chain of dependent operations, and work refers to the total number of operations in the algorithm.

Young Young01 extended this technique to give the first width-independent parallel algorithm for MPC, running in $\tilde{O}(\varepsilon^{-4})$ parallel time and $\tilde{O}(Nd/\varepsilon^2)$ total work, where $d$ is the maximum number of constraints that any variable appears in. Young Young14 later improved his algorithm to run using total work $\tilde{O}(N/\varepsilon^2)$. Mahoney et al. MahoneyRWZ16 later gave an algorithm with a faster parallel run-time of $\tilde{O}(\varepsilon^{-3})$.

The other most prominent approach in the literature towards solving an LP is to convert it into a smooth function Nesterov05 , and then apply general first-order optimization techniques Nesterov05 ; Nesterov12 . Although the dependence on $\varepsilon$ from using first-order techniques is much improved, it usually comes at the cost of sub-optimal dependence on the input size and width. For the MPC problem, Nesterov’s accelerated method Nesterov12 , as well as Bienstock and Iyengar’s adaptation BienstockI06 of Nesterov’s smoothing Nesterov05 , give rise to algorithms whose runtimes depend linearly on $1/\varepsilon$, but with far-from-optimal dependence on input size and width. For pure packing and pure covering problems, however, Allen-Zhu and Orecchia AllenZhuO19 were the first to incorporate Nesterov-like acceleration while still being able to obtain near-linear width-independent runtimes, giving a $\tilde{O}(N/\varepsilon)$-time algorithm for the packing problem. For the covering problem, they gave a $\tilde{O}(N/\varepsilon^{1.5})$-time algorithm, which was then improved to $\tilde{O}(N/\varepsilon)$ by WangRM16 . Importantly, however, the above algorithms do not generalize to MPC.

### 1.2 Our contributions

We give the best parallel width-dependent algorithm for MPC, while only incurring a linear dependence on $1/\varepsilon$ in the parallel runtime and total work. Additionally, the total work has near-linear dependence on the input size. Formally, we state our main theorem as follows.

###### Theorem 1.1.

There exists a parallel $(1+\varepsilon)$-approximation algorithm for the mixed packing covering problem, which runs in $\tilde{O}(w/\varepsilon)$ parallel time, while performing $\tilde{O}(Nw/\varepsilon)$ total work, where $N$ is the total number of non-zeros in the constraint matrices, and $w$ is the width of the given LP.

Table 1 compares the running time of our algorithm to previous works solving this problem (or its special cases).

Sacrificing width independence for faster convergence with respect to precision proves to be a valuable trade-off for several combinatorial optimization problems which naturally have a low width. Prominent examples of such problems which are not pure packing or covering problems include multicommodity flow and densest subgraph, where the width is bounded by the degree of a vertex. In a large number of real-world graphs, the maximum vertex degree is usually small, hence our algorithm proves to be much faster when we want high-precision solutions. We explicitly show that this result directly gives the fastest algorithm for the densest subgraph problem on low-degree graphs in Appendix C.

## 2 Notation and Definitions

For any integer $q$, we use $\|\cdot\|_q$ to represent the $\ell_q$-norm of a vector. We represent the infinity-norm as $\|\cdot\|_\infty$. We denote the infinity-norm ball (sometimes called the $\ell_\infty$ ball) of radius $r$ as the set $B^n_\infty(r) = \{x \in \mathbb{R}^n : \|x\|_\infty \le r\}$. The nonnegative part of this ball is denoted as $B^n_{+,\infty}(r) = B^n_\infty(r) \cap \mathbb{R}^n_+$. For radius $r = 1$, we drop the radius specification and use the short notation $B^n_\infty$ and $B^n_{+,\infty}$. We denote the extended simplex of dimension $q$ as $\Delta^+_q = \{x \in \mathbb{R}^q_+ : \mathbf{1}^T x \le 1\}$. Further, for any set $X$, we represent its interior, relative interior and closure as $\mathrm{int}(X)$, $\mathrm{relint}(X)$ and $\mathrm{cl}(X)$, respectively. The $\log$ function is applied to a vector element-wise. Division of two vectors of the same dimension is also performed element-wise.
For any matrix $A$, we use $\mathrm{nnz}(A)$ to denote the number of nonzero entries in it. We use $A_i$ and $A^j$ to refer to the $i$-th row and $j$-th column of $A$, respectively. We use the notation $A_{ij}$, or alternatively $A(i,j)$, to denote the element in the $i$-th row and $j$-th column of matrix $A$. $\|A\|$ denotes the operator norm of $A$. For a symmetric matrix $M$ and an antisymmetric matrix $J$, we define the operator $\succeq_i$ by: $M \succeq_i J$ iff $\begin{bmatrix} M & -J \\ J & M \end{bmatrix}$ is positive semi-definite. We formally define an $\varepsilon$-approximate solution to the mixed packing-covering (MPC) problem as follows.

###### Definition 2.1.

We say that $x \in B^n_{+,\infty}$ is an $\varepsilon$-approximate solution of the mixed packing-covering problem if $x$ satisfies $Px \le (1+\varepsilon)\mathbf{1}_p$ and $Cx \ge (1-\varepsilon)\mathbf{1}_c$.
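For concreteness, a candidate solution can be tested against this definition directly. The following NumPy sketch is illustrative only: the matrices are made-up toy data, and the $(1\pm\varepsilon)$-relaxed constraints follow our reading of the definition above.

```python
import numpy as np

def is_eps_approx(P, C, x, eps):
    """Check: x in [0,1]^n with Px <= (1+eps)*1 and Cx >= (1-eps)*1."""
    in_box = np.all(x >= 0) and np.all(x <= 1)
    packing_ok = np.all(P @ x <= (1 + eps) * np.ones(P.shape[0]))
    covering_ok = np.all(C @ x >= (1 - eps) * np.ones(C.shape[0]))
    return bool(in_box and packing_ok and covering_ok)

# Toy instance: one packing and one covering constraint on two variables.
P = np.array([[0.5, 0.5]])   # packing: 0.5*x1 + 0.5*x2 <= 1
C = np.array([[1.0, 1.0]])   # covering: x1 + x2 >= 1
x = np.array([1.0, 1.0])
```

Here `x = (1, 1)` satisfies both constraints exactly, so it passes for any $\varepsilon \ge 0$, while the all-zeros vector violates the covering constraint.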

Here, $\mathbf{1}_q$ denotes a vector of $1$’s of dimension $q$, for any integer $q$.
The saddle point problem on two sets $X$ and $Y$ can be defined as follows:

 \min_{x\in X}\ \max_{y\in Y}\ L(x,y) \qquad (1)

where $L$ is some bilinear form on $X \times Y$. For this problem, we define the primal-dual gap function as $Q((x,y),(\bar x,\bar y)) := L(x,\bar y) - L(\bar x, y)$. This gap function can be used as a measure of accuracy of a candidate saddle point solution.

###### Definition 2.2.

We say that $(x,y) \in X \times Y$ is an $\varepsilon$-optimal solution for (1) if $\sup_{(\bar x,\bar y)\in X\times Y} Q((x,y),(\bar x,\bar y)) \le \varepsilon$.

## 3 Technical overview

The mixed packing-covering (MPC) problem is formally defined as follows.

Given two nonnegative matrices $P \in \mathbb{R}^{p\times n}_+$ and $C \in \mathbb{R}^{c\times n}_+$, find an $x \in B^n_{+,\infty}$ such that $Px \le \mathbf{1}_p$ and $Cx \ge \mathbf{1}_c$ if it exists; otherwise report infeasibility.

Note that the vector of $1$’s on the right-hand side of the packing and covering constraints can be obtained by simply scaling each constraint appropriately. We also assume that each entry in the matrices $P$ and $C$ is at most one. This assumption, and subsequently the constraints on $x$, cause no loss of generality. (This transformation can be achieved by adapting techniques from WangRM16 while increasing the dimension of the problem by up to a logarithmic factor; details of this fact are in Appendix B in the full paper, supplementary file.) For the purpose of the main text, we work with this assumption.
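The scaling step mentioned above - dividing each constraint by its right-hand side so that the RHS becomes all-ones - is mechanical. A small NumPy sketch with toy data (not from the paper):

```python
import numpy as np

def normalize_rows(A, b):
    """Rescale each constraint A_i x <?> b_i (with b_i > 0) so its RHS becomes 1."""
    b = np.asarray(b, dtype=float)
    return A / b[:, None]   # divide row i of A by b_i

# Example: 2*x1 + 4*x2 <= 8  becomes  0.25*x1 + 0.5*x2 <= 1.
A = np.array([[2.0, 4.0]])
b = np.array([8.0])
```

The same rescaling applies to covering rows, since dividing by a positive number preserves the inequality direction.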

We reformulate MPC as a saddle point problem, as defined in Section 2:

 \lambda^* \stackrel{\mathrm{def}}{=} \min_{x\in B^n_{+,\infty}}\ \max_{y\in\Delta^+_p,\ z\in\Delta^+_c}\ L(x,y,z), \qquad (2)

where $L(x,y,z) := y^T(Px - \mathbf{1}_p) + z^T(\mathbf{1}_c - Cx)$. The relation between the two formulations is shown in Section 4. For the rest of the paper, we focus on the saddle point formulation (2).
$\eta(x) := \max_{y\in\Delta^+_p,\,z\in\Delta^+_c} L(x,y,z)$ is a piecewise linear convex function. Assuming oracle access to this “inner" maximization problem, the “outer" problem of minimizing $\eta$ can be performed using first-order methods like mirror descent, which are suitable when the underlying problem space is the unit $\ell_\infty$ ball. One drawback of this class of methods is their rate of convergence, which is standard for non-accelerated first-order methods on non-differentiable objectives: $O(1/\varepsilon^2)$ iterations are required to obtain an $\varepsilon$-approximate minimizer $\hat x$ of $\eta$, i.e., one which satisfies $\eta(\hat x) \le \eta(x^*) + \varepsilon$, where $x^*$ is an optimal solution. This means that the algorithm needs to access the inner maximization oracle $O(1/\varepsilon^2)$ times, which can become prohibitively large in the high-precision regime.

Note that even though $\eta$ is a piecewise linear non-differentiable function, it is not a black-box function, but a maximum of linear functions in $x$. This structure can be exploited using Nesterov’s smoothing technique Nesterov05 . In particular, $\eta$ can be approximated by choosing a strongly convex function $\phi$ and considering

 \tilde\eta(x) = \max_{y\in\Delta^+_p,\ z\in\Delta^+_c}\ L(x,y,z) - \phi(y,z).

This strongly convex regularization yields that $\tilde\eta$ is a Lipschitz-smooth convex function. (Definitions of Lipschitz smoothness and strong convexity can be found in many texts in nonlinear programming and machine learning, e.g. bubeck2014theory . Intuitively, $f$ is Lipschitz-smooth if the rate of change of $\nabla f$ can be bounded by a quantity known as the “constant of Lipschitz smoothness”.) If $L_s$ is the constant of Lipschitz smoothness of $\tilde\eta$, then application of any of the accelerated gradient methods in the literature will converge in $O(\sqrt{L_s/\varepsilon})$ iterations. Moreover, it can also be shown that in order to construct a smooth $\varepsilon$-approximation of $\eta$, the Lipschitz smoothness constant can be chosen to be of the order $O(1/\varepsilon)$, which in turn implies an overall convergence rate of $O(1/\varepsilon)$. In particular, Nesterov’s smoothing achieves an oracle complexity of the order $\sqrt{\Theta_x\,\Theta_{y,z}}/\varepsilon$, where $\Theta_x$ and $\Theta_{y,z}$ denote the sizes of the ranges of their respective regularizers, which are strongly convex functions. $\Theta_{y,z}$ can be made of the order of $\log p + \log c$ by using entropy regularizers over the extended simplices. However, $\Theta_x$ can be problematic since $x$ belongs to an $\ell_\infty$ ball. More on this will soon follow.
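As a concrete instance of this smoothing, the entropy regularizer over a simplex turns the piecewise-linear maximum into the log-sum-exp function: $\max_{y\in\Delta} \langle y, v\rangle + \mu H(y) = \mu\log\sum_i e^{v_i/\mu}$, which over-approximates $\max_i v_i$ by at most $\mu\log k$. A NumPy sketch with toy data (not the paper's actual regularizer):

```python
import numpy as np

def smooth_max(v, mu):
    """Entropy-smoothed maximum over the simplex:
    max_y <y, v> - mu * sum(y log y) has closed form mu * log(sum(exp(v/mu))).
    It upper-bounds max(v) by at most mu * log(len(v))."""
    v = np.asarray(v, dtype=float)
    m = v.max()  # subtract the max before exponentiating, for numerical stability
    return m + mu * np.log(np.exp((v - m) / mu).sum())

v = np.array([0.2, 1.0, 0.4])
```

Shrinking $\mu$ tightens the approximation but increases the smoothness constant (of order $1/\mu$), which is exactly the trade-off described above.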
Nesterov’s dual extrapolation algorithm Nesterov07 gives a very similar complexity, but it is a different algorithm in that it directly addresses the saddle point formulation (2) rather than viewing the problem as optimizing the non-smooth function $\eta$. The final convergence of the dual extrapolation algorithm is given in terms of the primal-dual gap function of the saddle point problem (2). This algorithm views the saddle point problem as solving a variational inequality for an appropriate monotone operator over the joint domain $B^n_{+,\infty} \times \Delta^+_p \times \Delta^+_c$. Moreover, as opposed to smoothing techniques, which only regularize the dual, this algorithm regularizes both the primal and dual parts, and is hence a different scheme altogether.

Note that for both schemes mentioned above, the maximization oracle itself has an analytical expression which involves matrix-vector multiplication. Hence each call to the oracle incurs a sequential run-time of $O(N)$. The overall complexity of both schemes is then of the order $O(N\sqrt{n}/\varepsilon)$, up to logarithmic factors.

#### The ℓ∞ barrier

Note that both methods, i.e., Nesterov’s smoothing and dual extrapolation, involve a term $\Theta_x$, which denotes the range of a strongly convex function over the domain of $x$. The following lemma states a lower bound on this range in the case of $\ell_\infty$ balls.

###### Lemma 3.1.

Any $1$-strongly convex function has a range of at least $\Omega(n)$ on any $\ell_\infty$ ball of unit radius.
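A quick numeric illustration (not the lemma's proof): for the canonical $1$-strongly-convex function $\phi(x) = \tfrac{1}{2}\|x\|_2^2$, the range over the unit $\ell_\infty$ ball $[-1,1]^n$ is exactly $n/2$, i.e., it grows linearly with the dimension:

```python
import numpy as np

def euclidean_range_on_linf_ball(n):
    """Range of phi(x) = 0.5 * ||x||_2^2 (1-strongly convex w.r.t. l2) over
    the unit l-infinity ball [-1, 1]^n: minimum 0 at the origin, maximum at
    a corner of the cube. The range is n/2, i.e., Omega(n)."""
    corner = np.ones(n)
    return 0.5 * (corner @ corner) - 0.0
```

The lemma says this linear growth is unavoidable for the whole class of strongly convex regularizers, which is exactly why the $\sqrt{n}$ factor appears in the smoothing-based bounds.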

Since $\Theta_x = \Omega(n)$ for each member function of this wide class, there is no hope of eliminating this $\sqrt{n}$ factor using techniques involving explicit use of strong convexity.
So, the goal now is to find functions which have a small range over $\ell_\infty$ balls, but still act as good enough regularizers to enable accelerated convergence of the descent algorithm. In pursuit of breaking this barrier, we draw inspiration from the notion of area convexity introduced by Sherman Sherman17 . Area convexity is a weaker notion than strong convexity; however, it is still strong enough to ensure that accelerated first-order methods go through when using area convex regularizers. Since this is a weaker notion than strong convexity, we can construct area convex functions which have only polylogarithmic range on the $\ell_\infty$ ball.

First, we define area convexity, and then go on to mention its relevance to the saddle point problem (2). Area convexity is a notion defined in the context of an antisymmetric matrix $J$ and a convex set $W$.

###### Definition 3.2.

A function $\phi$ is area convex with respect to an antisymmetric matrix $J$ on a convex set $W$ iff for any $a, b, c \in W$, $\phi$ satisfies

 \phi\Big(\frac{a+b+c}{3}\Big) \ \le\ \frac{\phi(a)+\phi(b)+\phi(c)}{3} \ -\ \frac{1}{3}\,(b-a)^T J (c-a).

To understand the definition above, first let’s look at the notion of strong convexity. $\phi$ is strongly convex if for any two points $a, b$, the average $\frac{\phi(a)+\phi(b)}{2}$ exceeds $\phi\big(\frac{a+b}{2}\big)$ by an amount proportional to $\|a-b\|^2$. Definition 3.2 generalizes this notion, in the context of a matrix $J$, to any three points $a, b, c$: $\phi$ is area-convex on a set $W$ if for any three points $a, b, c \in W$, the average $\frac{\phi(a)+\phi(b)+\phi(c)}{3}$ exceeds $\phi\big(\frac{a+b+c}{3}\big)$ by an amount proportional to the area of the triangle defined by the convex hull of $a, b, c$.

Consider the case when the points $a, b, c$ are collinear. For this case, the area term (i.e., the term involving $J$) in Definition 3.2 is $0$ since the matrix $J$ is antisymmetric. In this sense, area convexity is even weaker than strict convexity. Moreover, the notion of area is parameterized by the matrix $J$. To see a specific example of this notion of area, consider $a, b, c \in \mathbb{R}^2$ and $J = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$. Then, over all possible permutations of $a, b, c$, the bilinear term $(b-a)^T J (c-a)$ takes a value equal to $\pm 2\,\mathrm{Area}(\triangle abc)$. Since the condition holds irrespective of the permutation, the inequality effectively involves the magnitude $2\,\mathrm{Area}(\triangle abc)$. Hence the area term is just a high-dimensional matrix-based generalization of the area of a triangle.
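The 2-D example above can be checked numerically: with $J = \begin{bmatrix}0&1\\-1&0\end{bmatrix}$, the cyclic sum $a^TJb + b^TJc + c^TJa$ (which equals $(b-a)^TJ(c-a)$ by antisymmetry) is twice the signed area of the triangle $(a,b,c)$, and vanishes when the points are collinear. The shoelace formula supplies the reference value in this sketch:

```python
import numpy as np

J = np.array([[0.0, 1.0], [-1.0, 0.0]])

def area_term(a, b, c):
    """Cyclic bilinear sum a^T J b + b^T J c + c^T J a for antisymmetric J."""
    return a @ J @ b + b @ J @ c + c @ J @ a

def signed_area(a, b, c):
    """Signed area of triangle (a, b, c) via the shoelace formula."""
    return 0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]))

a, b, c = np.array([0., 0.]), np.array([1., 0.]), np.array([0., 1.])
```

Swapping any two of the points flips the sign of both quantities, which is why the definition is insensitive to the labeling of the triple.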

Coming back to the saddle point problem (2), we need to pick a suitable area convex function on the set $W := B^{n+1}_{+,\infty} \times \Delta^+_p \times \Delta^+_c$. Since it is defined on the joint space, it has the property of joint regularization vis-à-vis (2). However, we need an additional parameter: a suitable matrix $J$. The choice of this matrix is related to the bilinear form of the primal-dual gap function of (2). We delve into the technical details of this in Section 4; however, we note that the matrix $J$ is composed of $P$, $C$ and some additional constants. The algorithm we state exactly follows Nesterov’s dual extrapolation method described earlier. One notable difference is that in Nesterov07 , the authors consider joint regularization by a strongly convex function which does not depend on the problem matrices but only on the constraint set $W$. Our area convex regularizer, on the other hand, is tailor-made for the particular problem matrices as well as the constraint set.

## 4 Area Convexity for Mixed Packing Covering LPs

In this section, we present our technical results and algorithm for the MPC problem, with the end goal of proving Theorem 1.1. First, we relate an $\varepsilon$-optimal solution of the saddle point problem to an $\varepsilon$-approximate solution of MPC. Next, we present some theoretical background towards the goal of choosing and analyzing an appropriate area-convex regularizer in the context of the saddle point formulation, where the key requirement on the area convex function is to obtain a provable and efficient convergence result. Finally, we explicitly show an area convex function which is generated using a simple “gadget" function. We show that this area convex function satisfies all key requirements and hence achieves the desired accelerated rate of convergence. This section closely follows Sherman17 , in which the author chooses an area convex function specific to the undirected multicommodity flow problem. Due to space constraints, we relegate almost all proofs to Appendix A and simply include pointers to proofs in Sherman17 when they are directly applicable.

### 4.1 Saddle Point Formulation for MPC

Consider the saddle point formulation (2) for the MPC problem. Given a feasible primal-dual solution pair $(x,u) \in B^{n+1}_{+,\infty}$ and $(y,z) \in \Delta^+_p \times \Delta^+_c$ for (2), we denote $w = (x, u, y, z)$ and $\bar w = (\bar x, \bar u, \bar y, \bar z)$, where $w, \bar w \in W := B^{n+1}_{+,\infty} \times \Delta^+_p \times \Delta^+_c$. Then, we define a function $Q$ as

 Q(w,\bar w) \stackrel{\mathrm{def}}{=} \begin{bmatrix} \bar y^T & \bar z^T \end{bmatrix} \begin{bmatrix} P & -\mathbf{1}_p \\ -C & \mathbf{1}_c \end{bmatrix} \begin{bmatrix} x \\ u \end{bmatrix} \;-\; \begin{bmatrix} y^T & z^T \end{bmatrix} \begin{bmatrix} P & -\mathbf{1}_p \\ -C & \mathbf{1}_c \end{bmatrix} \begin{bmatrix} \bar x \\ \bar u \end{bmatrix}.

Note that if $u = \bar u = 1$, then

 \sup_{\bar w\in W} Q(w,\bar w) \;=\; \sup_{\bar x\in B^n_{+,\infty},\ \bar y\in\Delta^+_p,\ \bar z\in\Delta^+_c} L(x,\bar y,\bar z) - L(\bar x, y, z)

is precisely the primal-dual gap function defined in Section 2. Notice that if $(x^*, y^*, z^*)$ is a saddle point of (2), then we have

 L(x^*, y, z) \;\le\; L(x^*, y^*, z^*) \;\le\; L(x, y^*, z^*)

for all $(x,y,z) \in B^n_{+,\infty} \times \Delta^+_p \times \Delta^+_c$. From the above equation, it is clear that $Q(w^*, \bar w) \le 0$ for all $\bar w \in W$, where $w^* = (x^*, 1, y^*, z^*)$ and $\bar w = (\bar x, \bar u, \bar y, \bar z)$. Moreover, $\sup_{\bar w\in W} Q(w^*, \bar w) = 0$. This motivates the following accuracy measure of a candidate approximate solution $w$.

###### Definition 4.1.

We say that $w \in W$ is an $\varepsilon$-optimal solution of (2) iff

 \sup_{\bar w\in W} Q(w,\bar w) \le \varepsilon.
###### Remark 4.2.

Recall the definition of area convexity with respect to a matrix from Section 3. We can rewrite $Q(w,\bar w) = \bar w^T J w$, where $w = (x, u, y, z)$ and

 H = \begin{bmatrix} P & -\mathbf{1}_p \\ -C & \mathbf{1}_c \end{bmatrix} \ \Rightarrow\ J := \begin{bmatrix} 0_{n\times n} & 0_{n\times 1} & -P^T & C^T \\ 0_{1\times n} & 0 & \mathbf{1}_p^T & -\mathbf{1}_c^T \\ P & -\mathbf{1}_p & 0_{p\times p} & 0_{p\times c} \\ -C & \mathbf{1}_c & 0_{c\times p} & 0_{c\times c} \end{bmatrix}.

Thus, the gap function in Definition 4.1 can be written in the bilinear form $\sup_{\bar w\in W} \bar w^T J w$.
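The block structure of $J$ can be assembled directly. The NumPy sketch below uses toy $P$ and $C$ (not from the paper) to build $J$ for $w = (x, u, y, z)$ and confirms that it is antisymmetric, as the framework requires:

```python
import numpy as np

def build_J(P, C):
    """Assemble the antisymmetric block matrix J for w = (x, u, y, z)."""
    p, n = P.shape
    c = C.shape[0]
    op = np.ones((p, 1))  # the all-ones vector 1_p
    oc = np.ones((c, 1))  # the all-ones vector 1_c
    xrows = np.hstack([np.zeros((n, n)), np.zeros((n, 1)), -P.T, C.T])
    urow  = np.hstack([np.zeros((1, n)), np.zeros((1, 1)), op.T, -oc.T])
    yrows = np.hstack([P, -op, np.zeros((p, p)), np.zeros((p, c))])
    zrows = np.hstack([-C, oc, np.zeros((c, p)), np.zeros((c, c))])
    return np.vstack([xrows, urow, yrows, zrows])

P = np.array([[0.5, 0.5]])   # p = 1 packing row, n = 2 variables
C = np.array([[1.0, 0.0]])   # c = 1 covering row
J = build_J(P, C)            # (n + 1 + p + c) x (n + 1 + p + c) = 5 x 5
```

Antisymmetry ($J = -J^T$) is what makes the gap function vanish on the diagonal $w = \bar w$ and makes the area term in Definition 3.2 well-behaved.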

Lemma 4.3 relates an $\varepsilon$-optimal solution of (2) to an $\varepsilon$-approximate solution of MPC.

###### Lemma 4.3.

Let $w = (x, 1, y, z) \in W$ satisfy $\sup_{\bar w\in W} Q(w,\bar w) \le \varepsilon$. Then either
1. $x$ is an $\varepsilon$-approximate solution of MPC, or
2. $(y, z)$ satisfy $L(\bar x, y, z) > 0$ for all $\bar x \in B^n_{+,\infty}$.

This lemma states that in order to find an $\varepsilon$-approximate solution of MPC, it suffices to find an $\varepsilon$-optimal solution of (2). Henceforth, we will focus on $\varepsilon$-optimality of the saddle point formulation (2).

### 4.2 Area Convexity with Saddle Point Framework

Here we state some useful lemmas which help in determining whether a differentiable function is area convex. We start with the following remark which follows from the definition of area convexity (Definition 3.2).

###### Remark 4.4.

If $\phi$ is area convex with respect to $J$ on a convex set $W$, and $W' \subseteq W$ is a convex set, then $\phi$ is area convex with respect to $J$ on $W'$.

The following two lemmas from Sherman17 provide the key characterization of area convexity.

###### Lemma 4.5.

Let $A$ be a symmetric $2\times 2$ matrix and let $J = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$. Then $A \succeq_i J$ if and only if $A \succeq 0$ and $\det(A) \ge 1$.

###### Lemma 4.6.

Let $\phi$ be twice differentiable on the interior of a convex set $W$, i.e., on $\mathrm{int}(W)$.

1. If $\phi$ is area convex with respect to $J$ on $\mathrm{int}(W)$, then $d^2\phi(w) \succeq_i J$ for all $w \in \mathrm{int}(W)$.

2. If $d^2\phi(w) \succeq_i 3J$ for all $w \in \mathrm{int}(W)$, then $\phi$ is area convex with respect to $J$ on $\mathrm{int}(W)$. Moreover, if $\phi$ is continuous on $\mathrm{cl}(W)$, then $\phi$ is area convex with respect to $J$ on $\mathrm{cl}(W)$.

In order to handle the operator $\succeq_i$ (recall its definition from Section 2), we state some basic but important properties of this operator, which will come in handy in later proofs.

###### Remark 4.7.

For symmetric matrices $A$ and $C$ and antisymmetric matrices $B$ and $D$:

1. If $A \succeq_i B$ then $A \succeq_i (-B)$.

2. If $A \succeq_i B$ and $\lambda \ge 0$ then $\lambda A \succeq_i \lambda B$.

3. If $A \succeq_i B$ and $C \succeq_i D$ then $(A + C) \succeq_i (B + D)$.

Having laid a basic foundation for area convexity, we now focus on its relevance to solving the saddle point problem (2). Considering Remark 4.2, we can write the gap-function criterion of optimality in terms of the bilinear form of the matrix $J$. Suppose we have a function $\phi$ which is area convex with respect to $J$ on the set $W$. Then, consider the following jointly-regularized version of the bilinear form:

 \tilde\eta(w) := \sup_{\bar w\in W}\ \bar w^T J w - \phi(\bar w). \qquad (3)

Similar to Nesterov’s dual extrapolation, one can attain accelerated convergence for the function $\tilde\eta$ in (3) over the variable $w$. In order to obtain gradients of $\tilde\eta$, we need access to $\arg\max_{\bar w\in W}\ \bar w^T J w - \phi(\bar w)$. However, it may not be possible to find an exact maximizer in all cases. Again, one can get around this difficulty by instead using an approximate optimization oracle for the problem in (3).

###### Definition 4.8.

A $\delta$-optimal solution oracle ($\delta$-OSO) for $\phi$ takes an input vector $a$ and outputs $w \in W$ such that

 a^T w - \phi(w) \ \ge\ \sup_{\bar w\in W}\ a^T \bar w - \phi(\bar w) - \delta.

Given $\Phi$ as a $\delta$-OSO for a function $\phi$, consider the following algorithm (Algorithm 1):

For Algorithm 1, Sherman17 shows the following:

###### Lemma 4.9.

Let $\rho := \sup_{w\in W}\phi(w) - \inf_{w\in W}\phi(w)$. Suppose $\phi$ is area convex with respect to $J$ on $W$. Then for all $t \ge 1$ and for all $\bar w \in W$ we have $w_t/t \in W$ and,

 \sup_{\bar w\in W}\ \bar w^T J \frac{w_t}{t} \ \le\ \delta + \frac{\rho}{t}.

In particular, in $O(\rho/\varepsilon)$ iterations, Algorithm 1 obtains a $(\delta+\varepsilon)$-optimal solution of the saddle point problem (2).

The analysis of this lemma closely follows the analysis of Nesterov’s dual extrapolation.

Note that each iteration consists of matrix-vector multiplications, vector additions, and calls to the approximate oracle. Since the former two are parallelizable to $O(\log N)$ depth, the same remains to be shown for the oracle computation in order to complete the proof of the run-time in Theorem 1.1.

Recall from the discussion in Section 3 that the critical bottleneck of Nesterov’s method is that the range of any strongly convex regularizer over the $\ell_\infty$ ball is $\Omega(n)$, a bound attained even by the Euclidean norm squared. This makes $\rho$ in Lemma 4.9 also $\Omega(n)$, which can be a major bottleneck for high-dimensional LPs, which are commonplace among real-world applications.

Although, on the face of it, area convexity applied to the saddle point formulation (2) has a similar framework to Nesterov’s dual extrapolation, the challenge is to construct a $\phi$ for which we can overcome the above bottleneck. Particularly, there are three key challenges to tackle:
1. We need to show the existence of a function $\phi$ that is area convex with respect to $J$ on $W$.
2. $\phi$ should be such that its range $\rho$ over $W$ is not too large.
3. There should exist an efficient $\delta$-OSO for $\phi$.
In the next subsection, we focus on these three aspects in order to complete our analysis.

### 4.3 Choosing an area convex function

First, we consider a simple 2-D gadget function and prove a “nice" property of this gadget. Using this gadget, we construct a function which can be shown to be area convex using the aforementioned property of the gadget.

Let $\gamma_\beta : \mathbb{R}_{++} \times \mathbb{R}_{++} \to \mathbb{R}$ be a function parameterized by $\beta > 0$, defined as

 \gamma_\beta(a,b) = ab\log a + \beta b\log b.
###### Lemma 4.10.

Suppose $\beta \ge 2$. Then $d^2\gamma_\beta(a,b) \succeq_i \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$ for all $a \in (0,1]$ and $b \in (0,1]$.
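Under the reconstruction used here ($\gamma_\beta(a,b) = ab\log a + \beta b\log b$, whose Hessian determinant works out to $\beta/a - (1+\log a)^2$, independent of $b$), the determinant condition $\det \ge 1$ from Lemma 4.5 can be spot-checked numerically for $\beta = 2$. This is a sanity check of the stated formula, not a proof:

```python
import numpy as np

def det_hessian_gamma(a, beta):
    """Determinant of the Hessian of gamma_beta(a, b) = a*b*log(a) + beta*b*log(b).
    The b-dependence cancels: det = beta/a - (1 + log a)^2."""
    return beta / a - (1.0 + np.log(a)) ** 2

# Grid check on a in (0, 1] for beta = 2: the determinant stays >= 1,
# which (together with PSD-ness) is the characterization in Lemma 4.5.
grid = np.linspace(1e-3, 1.0, 10_000)
min_det = det_hessian_gamma(grid, 2.0).min()
```

The minimum is attained at the endpoint $a = 1$, where the determinant equals $\beta - 1$; this is what forces the threshold $\beta \ge 2$.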

Now, using the function $\gamma_\beta$, we construct a function $\phi$ and use the sufficiency criterion provided in Lemma 4.6 to show that $\phi$ is area convex with respect to $J$ on $W$. Note that our set of interest $W$ is not full-dimensional, whereas Lemma 4.6 is only stated for $\mathrm{int}(W)$ and not for $\mathrm{relint}(W)$. To get around this difficulty, we consider a larger set $\widehat{W} \supseteq W$ such that $\widehat{W}$ is full-dimensional and $\phi$ is area convex on $\widehat{W}$. Then we use Remark 4.4 to obtain the final result, i.e., area convexity of $\phi$ on $W$.

###### Theorem 4.11.

Let $\beta \ge 2$, and let $\phi$ be the regularizer assembled from copies of the gadget $\gamma_\beta$ over the entries of $P$ and $C$, defined on a full-dimensional convex set $\widehat{W} \supseteq W$; then $\phi$ is area convex with respect to $J$ on the set $\widehat{W}$. In particular, this also implies that $\phi$ is area convex with respect to $J$ on the set $W$.

Theorem 4.11 addresses the first of the three key challenges. Next, Lemma 4.12 shows an upper bound on the range of $\phi$.

###### Lemma 4.12.

The function $\phi$ defined in Theorem 4.11 has range at most $\tilde{O}(w)$ over $W$, where $w$ is the width of the given LP.

Finally, we need an efficient $\delta$-OSO. Consider the following alternating minimization algorithm (Algorithm 2).

Beck15 shows the following convergence result.

###### Lemma 4.13.

For any $\delta > 0$, Algorithm 2 is a $\delta$-OSO for $\phi$ which converges in $O(\log(1/\delta))$ iterations.

We show that for our chosen $\phi$, we can compute the two argmaxes in each iteration of Algorithm 2 analytically with $O(N)$ computation time, and hence we obtain a $\delta$-OSO running in $O(N\log(1/\delta))$ total work. Parallelizing the matrix-vector multiplications eliminates the linear dependence on $n$ and $N$ from the parallel run-time, at the cost of an additional $O(\log N)$ factor.

###### Lemma 4.14.

Each iterate in Algorithm 2 can be computed in closed form, simultaneously for all coordinates.

In particular, each iterate can be computed in $O(N)$ work and $O(\log N)$ parallel time.

## Acknowledgements

We thank Richard Peng for many important pointers and discussions.

## References

• (1) Allen-Zhu, Z., and Orecchia, L. Nearly linear-time packing and covering LP solvers - achieving width-independence and -convergence. Math. Program. 175, 1-2 (2019), 307–353.
• (2) Bahmani, B., Goel, A., and Munagala, K. Efficient primal-dual graph algorithms for mapreduce. In Algorithms and Models for the Web Graph - 11th International Workshop, WAW 2014, Beijing, China, December 17-18, 2014, Proceedings (2014), pp. 59–78.
• (3) Bartal, Y., Byers, J. W., and Raz, D. Fast, distributed approximation algorithms for positive linear programming with applications to flow control. SIAM J. Comput. 33, 6 (2004), 1261–1279.
• (4) Beck, A. On the convergence of alternating minimization for convex programming with applications to iteratively reweighted least squares and decomposition schemes. SIAM Journal on Optimization 25, 1 (2015), 185–209.
• (5) Bienstock, D., and Iyengar, G. Approximating fractional packings and coverings in O(1/ε) iterations. SIAM J. Comput. 35, 4 (2006), 825–854.
• (6) Bubeck, S. Theory of convex optimization for machine learning. arXiv preprint arXiv:1405.4980 15 (2014).
• (7) Charikar, M. Greedy approximation algorithms for finding dense components in a graph. In Proceedings of the Third International Workshop on Approximation Algorithms for Combinatorial Optimization (Berlin, Heidelberg, 2000), APPROX ’00, pp. 84–95.
• (8) Grigoriadis, M. D., and Khachiyan, L. G. Approximate minimum-cost multicommodity flows in Õ(ε⁻²knm) time. Math. Program. 75 (1996), 477–482.
• (9) Luby, M., and Nisan, N. A parallel approximation algorithm for positive linear programming. In Proceedings of the Twenty-Fifth Annual ACM Symposium on Theory of Computing, May 16-18, 1993, San Diego, CA, USA (1993), pp. 448–457.
• (10) Mahoney, M. W., Rao, S., Wang, D., and Zhang, P. Approximating the solution to mixed packing and covering LPs in parallel Õ(ε⁻³) time. In 43rd International Colloquium on Automata, Languages, and Programming, ICALP 2016, July 11-15, 2016, Rome, Italy (2016), pp. 52:1–52:14.
• (11) Nesterov, Y. Smooth minimization of non-smooth functions. Math. Program. 103, 1 (2005), 127–152.
• (12) Nesterov, Y. Dual extrapolation and its applications to solving variational inequalities and related problems. Math. Program. 109, 2-3 (2007), 319–344.
• (13) Nesterov, Y. Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM Journal on Optimization 22, 2 (2012), 341–362.
• (14) Plotkin, S. A., Shmoys, D. B., and Tardos, É. Fast approximation algorithms for fractional packing and covering problems. Math. Oper. Res. 20, 2 (1995), 257–301.
• (15) Sherman, J. Area-convexity, ℓ∞ regularization, and undirected multicommodity flow. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, Montreal, QC, Canada, June 19-23, 2017 (2017), pp. 452–460.
• (16) Wang, D., Rao, S., and Mahoney, M. W. Unified acceleration method for packing and covering problems via diameter reduction. In 43rd International Colloquium on Automata, Languages, and Programming, ICALP 2016, July 11-15, 2016, Rome, Italy (2016), pp. 50:1–50:13.
• (17) Young, N. E. Sequential and parallel algorithms for mixed packing and covering. In 42nd Annual Symposium on Foundations of Computer Science, FOCS 2001, 14-17 October 2001, Las Vegas, Nevada, USA (2001), pp. 538–546.
• (18) Young, N. E. Nearly linear-time approximation schemes for mixed packing/covering and facility-location linear programs. CoRR abs/1407.3015 (2014).
• (19) Zurel, E., and Nisan, N. An efficient approximate allocation algorithm for combinatorial auctions. In Proceedings 3rd ACM Conference on Electronic Commerce (EC-2001), Tampa, Florida, USA, October 14-17, 2001 (2001), pp. 125–136.

## Appendix A Proof of auxiliary results

In this section, we include proofs of lemmas from the main paper. In some cases, the lemmas are direct restatements of results from other papers, for which we provide appropriate pointers.

###### Proof of Lemma 3.1.

Consider an arbitrary $1$-strongly convex function $\phi$. Assume WLOG that $\inf \phi = 0$ (otherwise, we can shift it accordingly). We will show the claimed lower bound by induction on $k$ for the sets $B^k_\infty$. This suffices because $B^k_\infty$ is isomorphic to a $k$-dimensional face of $B^n_\infty$. The claim holds for $k = 1$ by the definition of strong convexity. Now, suppose it is true for $k-1$; then there exists $x \in B^{k-1}_\infty$ attaining the inductively guaranteed range. Moving one unit in the last coordinate from $x$ in the direction of nonnegative slope, suppose we reach $y$. Then, due to strong convexity of $\phi$, we have $\phi(y) \ge \phi(x) + \tfrac{1}{2}$, which completes the induction. ∎

###### Proof of Lemma 4.3.

Suppose we are given $w = (x, 1, y, z) \in W$ such that $\sup_{\bar w\in W} Q(w,\bar w) \le \varepsilon$. If there exists $x'$ which is feasible for MPC, then choosing $\bar x = x'$ and $\bar u = 1$ we have $L(\bar x, y, z) \le 0$. Hence we have

 \sup_{(\bar y,\bar z)\in\Delta^+_p\times\Delta^+_c} L(x,\bar y,\bar z) \ \le\ \sup_{\bar w\in W} Q(w,\bar w) \ \le\ \varepsilon,

where the first inequality follows by optimality over the extended simplices $\Delta^+_p \times \Delta^+_c$. So we obtain: if there exists a feasible solution for MPC, then $x$ is an $\varepsilon$-approximate solution of MPC.
On the other hand, suppose $x$ is not an $\varepsilon$-approximate solution. Then

 \max\{\ \|[Px-\mathbf{1}_p]_+\|_\infty,\ \|[-Cx+\mathbf{1}_c]_+\|_\infty\ \} > \varepsilon \ \Rightarrow\ \sup_{(\bar y,\bar z)\in\Delta^+_p\times\Delta^+_c} L(x,\bar y,\bar z) = \|[Px-\mathbf{1}_p]_+\|_\infty + \|[-Cx+\mathbf{1}_c]_+\|_\infty > \varepsilon.

Let $(\hat y, \hat z) \in \Delta^+_p \times \Delta^+_c$ be such that $L(x,\hat y,\hat z) = \sup_{(\bar y,\bar z)} L(x,\bar y,\bar z) > \varepsilon$; then we have

 \sup_{\bar x\in B^n_{+,\infty}} L(x,\hat y,\hat z) - L(\bar x, y, z) \le \varepsilon \ \Rightarrow\ L(x,\hat y,\hat z) - \inf_{\bar x\in B^n_{+,\infty}} L(\bar x, y, z) \le \varepsilon \ \Rightarrow\ \inf_{\bar x\in B^n_{+,\infty}} L(\bar x, y, z) > 0.

Hence, if $x$ is not an $\varepsilon$-approximate solution of MPC, then $(y,z)$ satisfies $L(\bar x, y, z) > 0$ for all $\bar x \in B^n_{+,\infty}$, implying that MPC is infeasible. ∎

###### Proof of Lemma 4.5.

Let $A = \begin{bmatrix} a & b \\ b & c \end{bmatrix}$ and $M = \begin{bmatrix} A & -J \\ J & A \end{bmatrix}$.
Then $A \succeq_i J$ iff $M \succeq 0$, iff all principal minors of $M$ are nonnegative. Now, $M \succeq 0$ implies $A \succeq 0$. It is easy to verify that the third leading principal minor, $a(\det(A) - 1)$, is nonnegative iff $\det(A) \ge 1$. So $M \succeq 0$ implies $A$ must be invertible. Then, applying the Schur complement lemma, we obtain that $M \succeq 0$ iff $A \succeq 0$ and $A + JA^{-1}J \succeq 0$. Now, for a $2\times 2$ matrix, it is easy to verify that $JA^{-1}J = -A/\det(A)$, so $A + JA^{-1}J = \big(1 - \tfrac{1}{\det(A)}\big)A$, which is PSD iff $\det(A) \ge 1$ (given $A \succeq 0$). Hence we conclude the proof. ∎

###### Proof of Lemma 4.6.

This lemma appears exactly as Theorem 1.6 in Sherman17 . The proof follows from the same. ∎

###### Proof of Remark 4.7.
1.  A \succeq_i B \ \Leftrightarrow\ \begin{bmatrix} A & -B \\ B & A \end{bmatrix} \succeq 0 \ \Leftrightarrow\ x^TAx + y^TAy + y^TBx - x^TBy \ge 0\ \ \forall\, x, y \ \Leftrightarrow\ x^TAx + y^TAy - y^TBx + x^TBy \ge 0\ \ \forall\, x, y \ \Leftrightarrow\ \begin{bmatrix} A & B \\ -B & A \end{bmatrix} \succeq 0 \ \Leftrightarrow\ A \succeq_i (-B)

Here, the third equivalence follows after replacing $y$ by $-y$. Hence we conclude the proof of part 1.

2.  A \succeq_i B \ \Leftrightarrow\ \begin{bmatrix} A & -B \\ B & A \end{bmatrix} \succeq 0 \ \Rightarrow\ \begin{bmatrix} \lambda A & -\lambda B \\ \lambda B & \lambda A \end{bmatrix} \succeq 0 \ \Leftrightarrow\ \lambda A \succeq_i \lambda B
3. $A \succeq_i B$ implies $\begin{bmatrix} A & -B \\ B & A \end{bmatrix} \succeq 0$. Similarly, $C \succeq_i D$ implies $\begin{bmatrix} C & -D \\ D & C \end{bmatrix} \succeq 0$. Hence

 \begin{bmatrix} A+C & -(B+D) \\ B+D & A+C \end{bmatrix} \succeq 0.

So we obtain $(A+C) \succeq_i (B+D)$. ∎

###### Proof of Lemma 4.9.

This lemma appears as Theorem 1.3 in Sherman17 , and the proof follows from the same. ∎

###### Proof of Lemma 4.10.

We use the equivalent characterization proved in Lemma 4.5. We need to show that $d^2\gamma_\beta(a,b) \succeq 0$ and $\det(d^2\gamma_\beta(a,b)) \ge 1$ for all $a \in (0,1]$ and $b \in (0,1]$. First of all, note that $d^2\gamma_\beta$ is well-defined on this domain. In particular, ordering the coordinates as $(b,a)$, we can write

 d^2\gamma_\beta(a,b) = \begin{bmatrix} \frac{\beta}{b} & 1+\log a \\ 1+\log a & \frac{b}{a} \end{bmatrix}.

Note that a symmetric $2\times 2$ matrix is PSD if and only if its diagonal entries and its determinant are nonnegative. Clearly, the diagonal entries of $d^2\gamma_\beta$ are nonnegative for the given values of $a$ and $b$. Hence, in order to prove the lemma, it suffices to show that $\det(d^2\gamma_\beta(a,b)) \ge 1$.
$\det(d^2\gamma_\beta(a,b)) = \frac{\beta}{a} - (1+\log a)^2$ is only a function of $a$ for any fixed value of $\beta$. Moreover, it can be shown that it is a decreasing function of $a$ on the set $(0,1]$. Clearly, the minimum over $(0,1]$ occurs at $a = 1$, where the determinant equals $\beta - 1 \ge 1$ for all $\beta \ge 2$. Hence we have that $\det(d^2\gamma_\beta(a,b)) \ge 1$ for all $a \in (0,1]$ and $b \in (0,1]$.
Finally, to see the claim that $\det(d^2\gamma_\beta(a,b))$ is a decreasing function of $a$ for any $\beta \ge 2$, consider

 \frac{d}{da}\Big(\det(d^2\gamma_\beta(a,b))\Big) = -\frac{\beta}{a^2} - \frac{2(1+\log a)}{a} \ \le\ -\frac{2\big(1 + a(1+\log a)\big)}{a^2} \ <\ 0,

where the last inequality follows from the observation that $a(1+\log a) > -1$ for all $a \in (0,1]$. ∎