Approximating Operator Norms via Generalized Krivine Rounding

Vijay Bhattiprolu (vpb@cs.cmu.edu). Supported by NSF CCF-1422045 and CCF-1526092. Part of the work was done while visiting UC Berkeley and CMSA, Harvard.

Mrinalkanti Ghosh (mkghosh@ttic.edu). Supported by NSF CCF-1254044.

Venkatesan Guruswami (guruswami@cmu.edu). Supported in part by NSF grant CCF-1526092. Part of the work was done while visiting CMSA, Harvard.

Euiwoong Lee (euiwoong@cims.nyu.edu). Supported by the Simons Institute for the Theory of Computing.

Madhur Tulsiani (madhurt@ttic.edu). Supported by NSF CCF-1254044.

Abstract

We consider the $(\ell_p, \ell_r)$-Grothendieck problem, which seeks to maximize the bilinear form $y^T A x$ for an input matrix $A$ over vectors $x, y$ with $\|x\|_p = \|y\|_r = 1$. The problem is equivalent to computing the $p \to r^*$ operator norm of $A$, where $\ell_{r^*}$ is the dual norm to $\ell_r$. The case $p = r = \infty$ corresponds to the classical Grothendieck problem. Our main result is an algorithm for arbitrary $p, r \ge 2$ with approximation ratio $(1+\varepsilon_0)/(\sinh^{-1}(1)\cdot\gamma_{p^*}\gamma_{r^*})$ for some fixed constant $\varepsilon_0$. Here $\gamma_t$ denotes the $t$'th norm of the standard Gaussian. Comparing this with Krivine's approximation ratio of $(\pi/2)/\sinh^{-1}(1)$ for the original Grothendieck problem, our guarantee is off from the best known hardness factor of $(\gamma_{p^*}\gamma_{r^*})^{-1}$ for the problem by a factor similar to Krivine's defect (up to the constant $(1+\varepsilon_0)$).

Our approximation follows by bounding the value of the natural vector relaxation for the problem, which is convex when $p, r \ge 2$. We give a generalization of random hyperplane rounding using Hölder-duals of Gaussian projections rather than taking the sign. We relate the performance of this rounding to certain hypergeometric functions, which prescribe necessary transformations to the vector solution before the rounding is applied. Unlike Krivine's rounding, where the relevant hypergeometric function was $\arcsin$, we have to study a family of hypergeometric functions. The bulk of our technical work then involves methods from complex analysis to gain detailed information about the Taylor series coefficients of the inverses of these hypergeometric functions, which then dictate our approximation factor.

Our result also implies improved bounds for "factorization through $\ell_2^n$" of operators from $\ell_p^n$ to $\ell_q^m$ (when $p \ge 2 \ge q$). Such bounds are of significant interest in functional analysis, and our work provides modest supplementary evidence for an intriguing parallel between factorizability and constant-factor approximability.

1 Introduction

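We consider the problem of finding the $\ell_p \to \ell_q$ norm of a given matrix $A \in \mathbb{R}^{m\times n}$, which is defined as

\[ \|A\|_{p\to q} := \max_{x\in\mathbb{R}^n\setminus\{0\}} \frac{\|Ax\|_q}{\|x\|_p}. \]

The quantity $\|A\|_{p\to q}$ is a natural generalization of the well-studied spectral norm ($p = q = 2$) and computes the maximum distortion (stretch) of the operator $A$ from the normed space $\ell_p^n$ to $\ell_q^m$. The case when $p = \infty$ and $q = 1$ is the well known Grothendieck problem [KN12, Pis12], where the goal is to maximize $\langle y, Ax\rangle$ subject to $\|y\|_\infty, \|x\|_\infty \le 1$. In fact, via simple duality arguments, the general problem of computing $\|A\|_{p\to q}$ can be seen to be equivalent to the following variant of the Grothendieck problem

\[ \|A\|_{p\to q} = \max_{\|y\|_{q^*}\le 1,\ \|x\|_p\le 1} \langle y, Ax\rangle, \]

where $p^*, q^*$ denote the dual norms of $p$ and $q$, satisfying $1/p + 1/p^* = 1/q + 1/q^* = 1$. The above quantity is also known as the injective tensor norm of $A$, where $A$ is interpreted as an element of the space $\ell_q^m \otimes \ell_{p^*}^n$.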
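As a small illustration of this duality in the Grothendieck case $p = \infty$, $q = 1$, the sketch below computes both sides by brute force on a tiny matrix. It relies on the standard fact that a convex function over the cube is maximized at a vertex, so only sign vectors need to be enumerated. The function names are illustrative and not from the paper, and the enumeration is exponential in the dimensions, so this is only a sanity check.

```python
from itertools import product

def norm_inf_to_1(A):
    """Brute-force ||A||_{inf->1}: maximize ||Ax||_1 over sign vectors x."""
    n = len(A[0])
    best = 0.0
    for x in product((-1.0, 1.0), repeat=n):
        value = sum(abs(sum(a * xj for a, xj in zip(row, x))) for row in A)
        best = max(best, value)
    return best

def bilinear_max(A):
    """Brute-force maximum of y^T A x over sign vectors y and x."""
    m, n = len(A), len(A[0])
    best = float("-inf")
    for y in product((-1.0, 1.0), repeat=m):
        for x in product((-1.0, 1.0), repeat=n):
            value = sum(y[i] * A[i][j] * x[j] for i in range(m) for j in range(n))
            best = max(best, value)
    return best

A = [[1.0, 2.0], [3.0, -4.0]]
operator_norm = norm_inf_to_1(A)
bilinear_value = bilinear_max(A)
```

Both quantities agree on this example, as the duality argument predicts.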

In this work, we consider the case of $p \ge q$, where the problem is known to admit good approximations when $2 \in [q, p]$, and is hard otherwise. Determining the right constants in these approximations when $2 \in [q, p]$ has been of considerable interest in the analysis and optimization community.

For the case of the $\infty \to 1$ norm, Grothendieck's theorem [Gro56] shows that the integrality gap of a semidefinite programming (SDP) relaxation is bounded by a constant, and the (unknown) optimal value is now called the Grothendieck constant $K_G$. Krivine [Kri77] proved an upper bound of $\pi/(2\ln(1+\sqrt{2})) = 1.782\ldots$ on $K_G$, and it was later shown by Braverman et al. [BMMN13] that $K_G$ is strictly smaller than this bound. The best known lower bound on $K_G$ is about $1.676$, due to (an unpublished manuscript of) Reeds [Ree91] (see also [KO09] for a proof).

A very relevant work of Nesterov [Nes98] proves an upper bound of $K_G$ on the approximation factor for the $p \to q$ norm for any $p \ge 2 \ge q$ (although the bound stated there is slightly weaker; see Section 5.3 for a short proof). A later work of Steinberg [Ste05] also gave an upper bound of $\min\{\gamma_p/\gamma_q,\ \gamma_{q^*}/\gamma_{p^*}\}$, where $\gamma_p$ denotes the $p$-th norm of a standard normal random variable (i.e., the $p$-th root of the $p$-th Gaussian moment).
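Since the Gaussian norms recur in every bound that follows, here is a minimal sketch for evaluating them numerically. It assumes the standard closed form for absolute Gaussian moments, $\mathbb{E}|g|^p = 2^{p/2}\,\Gamma((p+1)/2)/\sqrt{\pi}$; the function name `gaussian_norm` is illustrative.

```python
import math

def gaussian_norm(p):
    """p-th root of the p-th absolute moment of a standard Gaussian (p > 0)."""
    moment = 2.0 ** (p / 2.0) * math.gamma((p + 1.0) / 2.0) / math.sqrt(math.pi)
    return moment ** (1.0 / p)

gamma_1 = gaussian_norm(1)   # equals sqrt(2/pi)
gamma_2 = gaussian_norm(2)   # equals 1
```

The two familiar special cases serve as a check: the first absolute moment is $\sqrt{2/\pi}$ and the second moment is $1$.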

On the hardness side, Briët, Regev and Saket [BRS15] showed NP-hardness of $\pi/2$ for the $\infty \to 1$ norm, strengthening a hardness result of Khot and Naor based on the Unique Games Conjecture (UGC) [KN08] (for a special case of the Grothendieck problem when the matrix $A$ is positive semidefinite). Assuming UGC, a hardness result matching Reeds' lower bound was proved by Khot and O'Donnell [KO09], and hardness of approximating within $K_G$ was proved by Raghavendra and Steurer [RS09]. In a companion paper [BGG18], the authors proved NP-hardness of approximating the $p \to q$ norm within any factor better than $1/(\gamma_{p^*}\cdot\gamma_q)$, for any $p \ge 2 \ge q$. Stronger hardness results are known, and in particular the problem admits no constant-factor approximation, for the cases not considered in this paper, i.e., when $p \le q$ or $2 \notin [q, p]$. We refer the interested reader to a detailed discussion in [BGG18].

1.1 The Search For Optimal Constants and Optimal Algorithms

The goal of determining the right approximation ratio for these problems is closely related to the question of finding the optimal (rounding) algorithms. For the Grothendieck problem, the goal is to find $y \in \mathbb{R}^m$ and $x \in \mathbb{R}^n$ with $\|y\|_\infty, \|x\|_\infty \le 1$ maximizing $y^T A x$, and one considers the following semidefinite relaxation:

\[
\begin{array}{lll}
\text{maximize} & \sum_{i,j} A_{i,j}\cdot\langle u_i, v_j\rangle & \\
\text{subject to} & \|u_i\|_2 \le 1,\ \|v_j\|_2 \le 1 & \forall i\in[m],\ j\in[n] \\
 & u_i, v_j \in \mathbb{R}^{m+n} & \forall i\in[m],\ j\in[n]
\end{array}
\]

By the bilinear nature of the problem above, it is clear that the optimal $x, y$ can be taken to have entries in $\{-1, 1\}$. A bound on the approximation ratio of the above program is then obtained by designing a good "rounding" algorithm which maps the vectors $u_i, v_j$ to values in $\{-1, 1\}$. (Footnote: since we will be dealing with problems where the optimal solution may not be integral, we will use the term "approximation ratio" instead of "integrality gap".) Krivine's analysis [Kri77] corresponds to a rounding algorithm which considers a random vector $g \sim N(0, I_{m+n})$ and rounds to $x, y$ defined as

\[ y_i := \mathrm{sgn}(\langle\varphi(u_i), g\rangle) \quad\text{and}\quad x_j := \mathrm{sgn}(\langle\psi(v_j), g\rangle), \]

for some appropriately chosen transformations $\varphi$ and $\psi$. This gives the following upper bound on the approximation ratio of the above relaxation, and hence on the value of the Grothendieck constant $K_G$:

\[ K_G \ \le\ \frac{1}{\sinh^{-1}(1)}\cdot\frac{\pi}{2} \ =\ \frac{1}{\ln(1+\sqrt{2})}\cdot\frac{\pi}{2}. \]

Braverman et al. [BMMN13] show that the above bound can be strictly improved (by a very small amount) using a two dimensional analogue of the above algorithm, where the value $y_i$ is taken to be a function of the two dimensional projection $(\langle\varphi(u_i), g_1\rangle, \langle\varphi(u_i), g_2\rangle)$ for independent Gaussian vectors $g_1, g_2 \in \mathbb{R}^{m+n}$ (and similarly for $x$). Naor and Regev [NR14] show that such schemes are optimal in the sense that it is possible to achieve an approximation ratio arbitrarily close to the true (but unknown) value of $K_G$ by using $k$-dimensional projections for a large (constant) $k$. A similar existential result was also obtained by Raghavendra and Steurer [RS09], who proved that there exists a (slightly different) rounding algorithm which can achieve the (unknown) approximation ratio $K_G$.

For the case of arbitrary $p \ge 2 \ge q$, Nesterov [Nes98] considered the convex program in Fig. 1, denoted as $\mathsf{CP}(A)$, generalizing the one above.

Note that since $q^* \ge 2$ and $p \ge 2$, the above program is convex in the entries of the Gram matrix of the vectors $\{u_i\}_{i\in[m]} \cup \{v_j\}_{j\in[n]}$. Although the stated bound in [Nes98] is slightly weaker (as it is proved for a larger class of problems), the approximation ratio of the above relaxation can be shown to be bounded by $K_G$. By using the Krivine rounding scheme of considering the sign of a random Gaussian projection (aka random hyperplane rounding) one can show that Krivine's upper bound on $K_G$ still applies to the above problem.

Motivated by applications to robust optimization, Steinberg [Ste05] considered the dual of (a variant of) the above relaxation, and obtained an upper bound of $\min\{\gamma_p/\gamma_q,\ \gamma_{q^*}/\gamma_{p^*}\}$ on the approximation factor. Note that while Steinberg's bound is better (approaches 1) as $p$ and $q$ approach 2, it is unbounded when $p, q^* \to \infty$ (as in the Grothendieck problem).

Based on the inapproximability result of factor $1/(\gamma_{p^*}\cdot\gamma_q)$ obtained in a companion paper by the authors [BGG18], it is natural to ask if this is the "right form" of the approximation ratio. Indeed, this ratio is $\pi/2$ when $p^* = q = 1$, which is the ratio obtained by Krivine's rounding scheme, up to a factor of $\ln(1+\sqrt{2})$. We extend Krivine's result to all $p \ge 2 \ge q$ as below.

Theorem 1.1.

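There exists a fixed constant $\varepsilon_0$ such that for all $p \ge 2 \ge q$, the approximation ratio of the convex relaxation $\mathsf{CP}(A)$ is upper bounded by

\[ \frac{1+\varepsilon_0}{\sinh^{-1}(1)}\cdot\frac{1}{\gamma_{p^*}\cdot\gamma_q} \ =\ \frac{1+\varepsilon_0}{\ln(1+\sqrt{2})}\cdot\frac{1}{\gamma_{p^*}\cdot\gamma_q}. \]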
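To get a feel for the bound, the following sketch evaluates it with the $(1+\varepsilon_0)$ factor dropped, so the numbers are slightly smaller than what the theorem guarantees. The closed form used for the Gaussian moments is the standard one, and all function names are illustrative rather than taken from the paper.

```python
import math

def gaussian_norm(p):
    """gamma_p = (E|g|^p)^(1/p) for a standard Gaussian g."""
    moment = 2.0 ** (p / 2.0) * math.gamma((p + 1.0) / 2.0) / math.sqrt(math.pi)
    return moment ** (1.0 / p)

def dual_exponent(p):
    """Return p* with 1/p + 1/p* = 1, for p >= 2 (p may be math.inf)."""
    return 1.0 if math.isinf(p) else p / (p - 1.0)

def ratio_without_epsilon(p, q):
    """Evaluate 1 / (asinh(1) * gamma_{p*} * gamma_q); the (1 + eps_0) factor is omitted."""
    return 1.0 / (math.asinh(1.0) * gaussian_norm(dual_exponent(p)) * gaussian_norm(q))

grothendieck_case = ratio_without_epsilon(math.inf, 1.0)
krivine_bound = math.pi / (2.0 * math.log(1.0 + math.sqrt(2.0)))
```

At $p = \infty$, $q = 1$ the expression coincides with Krivine's bound $\pi/(2\ln(1+\sqrt{2})) \approx 1.782$, since $\gamma_1^2 = 2/\pi$.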

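Perhaps more interestingly, the above theorem is proved via a generalization of hyperplane rounding, which we believe may be of independent interest. Indeed, for a given collection of vectors $w_1, \ldots, w_m$ considered as rows of a matrix $W$, Gaussian hyperplane rounding corresponds to taking the "rounded" solution $y$ to be the

\[ y := \operatorname{argmax}_{\|y'\|_\infty \le 1} \langle y', Wg\rangle = \big(\mathrm{sgn}(\langle w_i, g\rangle)\big)_{i\in[m]}. \]

We consider the natural generalization to (say) $\ell_r$ norms, given by

\[ y := \operatorname{argmax}_{\|y'\|_r \le 1} \langle y', Wg\rangle = \left(\frac{\mathrm{sgn}(\langle w_i, g\rangle)\cdot|\langle w_i, g\rangle|^{r^*-1}}{\|Wg\|_{r^*}^{r^*-1}}\right)_{i\in[m]}. \]

We refer to $y$ as the "Hölder dual" of $Wg$, since the above rounding can be obtained by viewing $Wg$ as lying in the dual ($\ell_{r^*}$) ball, and finding the $y$ for which Hölder's inequality is tight. Indeed, in the above language, Nesterov's rounding corresponds to considering the $\ell_\infty$ ball (hyperplane rounding). While Steinberg used a somewhat different relaxation, the rounding there can be obtained by viewing $Wg$ as lying in the primal $\ell_r$ ball instead of the dual one. In the case of hyperplane rounding, the analysis is motivated by the identity that for two unit vectors $u$ and $v$, we have

\[ \mathbb{E}_g\big[\mathrm{sgn}(\langle g, u\rangle)\cdot\mathrm{sgn}(\langle g, v\rangle)\big] = \frac{2}{\pi}\cdot\sin^{-1}(\langle u, v\rangle). \]

We prove the appropriate extension of this identity to $\ell_r$ balls (and analyze the functions arising there), which may also be of interest for other optimization problems over $\ell_r$ balls.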

1.2 Proof overview

As discussed above, we consider Nesterov's convex relaxation and generalize the hyperplane rounding scheme using "Hölder duals" of the Gaussian projections, instead of taking the sign. As in the Krivine rounding scheme, this rounding is applied to transformations of the SDP solutions. The nature of these transformations depends on how the rounding procedure changes the correlation between two vectors. Let $u, v$ be two unit vectors with $\langle u, v\rangle = \rho$. Then, for $g \sim N(0, I)$, $\langle g, u\rangle$ and $\langle g, v\rangle$ are $\rho$-correlated Gaussian random variables. Hyperplane rounding then gives $\pm 1$ valued random variables whose correlation is given by

\[ \mathbb{E}_{g_1\sim_\rho g_2}\big[\mathrm{sgn}(g_1)\cdot\mathrm{sgn}(g_2)\big] = \frac{2}{\pi}\cdot\sin^{-1}(\rho). \]
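The identity above is easy to check empirically. The sketch below is a Monte Carlo estimate, not part of the paper's argument: it samples $\rho$-correlated Gaussians as $g_1 = \rho g_2 + \sqrt{1-\rho^2}\,g_3$ and compares the average sign product against $(2/\pi)\sin^{-1}(\rho)$. The sample size, seed and function name are arbitrary choices.

```python
import math
import random

def sign_correlation(rho, samples=200_000, seed=0):
    """Monte Carlo estimate of E[sgn(g1) * sgn(g2)] for rho-correlated standard Gaussians."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        g2 = rng.gauss(0.0, 1.0)
        g3 = rng.gauss(0.0, 1.0)
        g1 = rho * g2 + math.sqrt(1.0 - rho * rho) * g3
        total += 1 if (g1 >= 0) == (g2 >= 0) else -1
    return total / samples

rho = 0.5
estimate = sign_correlation(rho)
exact = 2.0 / math.pi * math.asin(rho)  # equals 1/3 for rho = 1/2
```

With this many samples the standard error is roughly $0.002$, so the estimate should agree with the exact value to about two decimal places.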

The transformations $\varphi$ and $\psi$ (to be applied to the vectors $u$ and $v$) in Krivine's scheme are then chosen depending on the Taylor series for the $\sin$ function, which is the inverse of the function computed on the correlation. For the case of Hölder-dual rounding, we prove the following generalization of the above identity

\[ \mathbb{E}_{g_1\sim_\rho g_2}\big[\mathrm{sgn}(g_1)|g_1|^{q-1}\cdot\mathrm{sgn}(g_2)|g_2|^{p^*-1}\big] = \gamma_q^{q}\cdot\gamma_{p^*}^{p^*}\cdot\rho\cdot{}_2F_1\!\left(1-\frac{q}{2},\ 1-\frac{p^*}{2};\ \frac{3}{2};\ \rho^2\right), \]

where ${}_2F_1$ denotes a hypergeometric function with the specified parameters. The proof of the above identity combines simple tools from Hermite analysis with known integral representations from the theory of special functions, and may be useful in other applications of the rounding procedure.

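Note that in the Grothendieck case, we have $\gamma_{p^*}^{p^*}\cdot\gamma_q^{q} = 2/\pi$, and the remaining part is simply the $\sin^{-1}$ function. In the Krivine rounding scheme, the transformations $\varphi$ and $\psi$ are chosen to satisfy $(2/\pi)\cdot\sin^{-1}(\langle\varphi(u), \psi(v)\rangle) = c\cdot\langle u, v\rangle$, where the constant $c$ then governs the approximation ratio. The transformations $\varphi(u)$ and $\psi(v)$ are taken to be of the form $\varphi(u) = \oplus_{i=1}^{\infty} a_i\cdot u^{\otimes i}$ such that

\[ \langle\varphi(u), \psi(v)\rangle = \sin(c'\cdot\langle u, v\rangle) \quad\text{and}\quad \|\varphi(u)\|_2 = \|\psi(v)\|_2 = 1, \]

with $c' = (\pi/2)\cdot c$.

If $f$ represents (a normalized version of) the function of $\rho$ occurring in the identity above (which is $\sin^{-1}$ for hyperplane rounding), then the approximation ratio is governed by the function $h$ obtained by replacing every Taylor coefficient of $f^{-1}$ by its absolute value. While $f^{-1}$ is simply the $\sin$ function (and thus $h$ is the $\sinh$ function) in the Grothendieck problem, no closed-form expressions are available for general $p$ and $q$.

The task of understanding the approximation ratio thus reduces to the analytic task of understanding the family of the functions $h$ obtained for different values of $p$ and $q$. Concretely, the approximation ratio is given by the value $1/(h^{-1}(1)\cdot\gamma_q\,\gamma_{p^*})$. At a high level, we prove bounds on $h^{-1}(1)$ by establishing properties of the Taylor coefficients of the family of functions $f^{-1}$, i.e., the family given by

\[ \left\{ f^{-1} \ \middle|\ f(\rho) = \rho\cdot{}_2F_1(a_1, b_1; 3/2; \rho^2),\ \ a_1, b_1 \in [0, 1/2] \right\}. \]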
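For concreteness, the Taylor coefficients of a member $f$ of this family can be generated from the standard term ratio of the Gauss hypergeometric series. The sketch below uses exact rational arithmetic; the function name is illustrative. It recovers the $\sin^{-1}$ series at $a_1 = b_1 = 1/2$ (the Grothendieck case) and the identity function at $a_1 = b_1 = 0$.

```python
from fractions import Fraction

def family_coeffs(a1, b1, terms):
    """Coefficients c_k of f(rho) = sum_k c_k * rho^(2k+1), f(rho) = rho * 2F1(a1, b1; 3/2; rho^2)."""
    a1, b1, c1 = Fraction(a1), Fraction(b1), Fraction(3, 2)
    coeffs = [Fraction(1)]
    for k in range(terms - 1):
        # Ratio of consecutive 2F1 terms: (a1 + k)(b1 + k) / ((c1 + k)(k + 1)).
        coeffs.append(coeffs[-1] * (a1 + k) * (b1 + k) / ((c1 + k) * (k + 1)))
    return coeffs

half = Fraction(1, 2)
arcsin_coeffs = family_coeffs(half, half, 4)   # rho + rho^3/6 + 3 rho^5/40 + 5 rho^7/112
identity_coeffs = family_coeffs(0, 0, 4)       # just rho
```

Note that this only produces the coefficients of $f$; the difficulty discussed in the text lies in the coefficients of the inverse $f^{-1}$.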

While in the cases considered earlier, the functions $h$ are easy to determine in terms of $f^{-1}$ via succinct formulae [Kri77, Haa81, AN04] or can be truncated after the cubic term [NR14], neither of these is true for the family of functions we consider. Hypergeometric functions are a rich and expressive class of functions, capturing many of the special functions appearing in mathematical physics and various ensembles of orthogonal polynomials. Due to this expressive power, the set of inverses is not well understood. In particular, while the coefficients of $f$ are monotone in $p$ and $q$, this is not true for $f^{-1}$. Moreover, the rates of decay of the coefficients may range from inverse polynomial to super-exponential. We analyze the coefficients of $f^{-1}$ using complex-analytic methods inspired by (but quite different from) the work of Haagerup [Haa81] on bounding the complex Grothendieck constant. The key technical challenge in our work is in arguing systematically about a family of inverse hypergeometric functions, which we address by developing methods to estimate the values of a family of contour integrals.

While our methods only give a bound of the form $h^{-1}(1) \ge \sinh^{-1}(1)/(1+\varepsilon_0)$, we believe this is an artifact of the analysis and the true bound should indeed be $h^{-1}(1) \ge \sinh^{-1}(1)$.

1.3 Relation to Factorization Theory

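Let $X, Y$ be Banach spaces, and let $A : X \to Y$ be a continuous linear operator. As before, the norm $\|A\|_{X\to Y}$ is defined as

\[ \|A\|_{X\to Y} := \sup_{x\in X\setminus\{0\}} \frac{\|Ax\|_Y}{\|x\|_X}. \]

The operator $A$ is said to factorize through Hilbert space if the factorization constant of $A$, defined as

\[ \Phi(A) := \inf_{H}\ \inf_{BC = A}\ \frac{\|C\|_{X\to H}\cdot\|B\|_{H\to Y}}{\|A\|_{X\to Y}}, \]

is bounded, where the infimum is taken over all Hilbert spaces $H$ and all operators $B : H \to Y$ and $C : X \to H$. The factorization gap for spaces $X$ and $Y$ is then defined as $\Phi(X, Y) := \sup_A \Phi(A)$, where the supremum runs over all continuous operators $A : X \to Y$.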

The theory of factorization of linear operators is a cornerstone of modern functional analysis and has also found many applications outside the field (see [Pis86, AK06] for more information). An application to theoretical computer science was found by Tropp [Tro09] who used the Grothendieck factorization [Gro56] to give an algorithmic version of a celebrated column subset selection result of Bourgain and Tzafriri [BT87].

As an almost immediate consequence of convex programming duality, our new algorithmic results also imply some improved factorization results for $\Phi(\ell_p^n, \ell_q^m)$ (a similar observation was already made by Tropp [Tro09] in the special case of $\Phi(\ell_\infty^n, \ell_1^m)$ and for a slightly different relaxation). We first state some classical factorization results, for which we will use $T_2(X)$ and $C_2(X)$ to respectively denote the Type-2 and Cotype-2 constants of $X$. We refer the interested reader to Section 5 for a more detailed description of factorization theory as well as the relevant functional analysis preliminaries.

The Kwapień-Maurey [Kwa72a, Mau74] theorem states that for any pair of Banach spaces $X$ and $Y$

\[ \Phi(X, Y) \le T_2(X)\cdot C_2(Y). \]

However, Grothendieck's result [Gro56] shows that a much better bound is possible in a case where $T_2(X)$ is unbounded. In particular,

\[ \Phi(\ell_\infty^n, \ell_1^m) \le K_G, \]

for all $m, n \in \mathbb{N}$. Pisier [Pis80] showed that if $X$ or $Y$ satisfies the approximation property (which is always satisfied by finite-dimensional spaces), then

\[ \Phi(X, Y) \le \big(2\cdot C_2(X^*)\cdot C_2(Y)\big)^{3/2}. \]

We show that the approximation ratio of Nesterov's relaxation is in fact an upper bound on the factorization gap for the spaces $\ell_p^n$ and $\ell_q^m$. Combined with our upper bound on the integrality gap, we show an improved bound on the factorization constant, i.e., for any $p \ge 2 \ge q$ and $m, n \in \mathbb{N}$, we have that for $X = \ell_p^n$, $Y = \ell_q^m$,

\[ \Phi(X, Y) \le \frac{1+\varepsilon_0}{\sinh^{-1}(1)}\cdot\big(C_2(X^*)\cdot C_2(Y)\big), \]

where $\varepsilon_0$ is the constant from Theorem 1.1. This improves on Pisier's bound for all $p \ge 2 \ge q$, and for certain ranges of $(p, q)$ it also improves upon $K_G$ and the bound of Kwapień-Maurey.

1.4 Approximability and Factorizability

Let $(X_n)$ and $(Y_m)$ be sequences of Banach spaces such that $X_n$ is over the vector space $\mathbb{R}^n$ and $Y_m$ is over the vector space $\mathbb{R}^m$. We shall say a pair of sequences $((X_n), (Y_m))$ factorize if $\Phi(X_n, Y_m)$ is bounded by a constant independent of $m$ and $n$. Similarly, we shall say a pair of families $((X_n), (Y_m))$ are computationally approximable if there exists a polynomial $r(m, n)$, such that for every $m, n \in \mathbb{N}$, there is an algorithm with runtime $r(m, n)$ approximating $\|A\|_{X_n\to Y_m}$ within a constant independent of $m$ and $n$ (given an oracle for computing the norms of vectors and a separation oracle for the unit balls of the norms). We consider the natural question of characterizing the families of norms that are approximable and their connection to factorizability and Cotype.

The pairs $(p, q)$ for which $(\ell_p^n, \ell_q^m)$ is known (resp. not known) to factorize are precisely those pairs $(p, q)$ which are known to be computationally approximable (resp. inapproximable assuming hardness conjectures like $P \ne NP$ and ETH). Moreover, the Hilbertian case, which trivially satisfies factorizability, is also known to be computationally approximable (with approximation factor 1).

It is tempting to ask whether the set of computationally approximable pairs coincides with the set of factorizable pairs, or with the pairs for which $X_n^*, Y_m$ have bounded (independent of $m, n$) Cotype-2 constant. Further yet, is there a connection between the approximation factor and the factorization constant, or between the approximation factor and the Cotype-2 constants (of $X_n^*$ and $Y_m$)? Our work gives some modest additional evidence towards such conjectures. Such a result would give credibility to the appealing intuitive idea of the approximation factor being dependent on the "distance" to a Hilbert space.

1.5 Notation

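For a non-negative real number $r$, we define the $r$-th Gaussian norm of a standard Gaussian $g$ as

\[ \gamma_r := \big(\mathbb{E}_{g\sim N(0,1)}\,[\,|g|^r\,]\big)^{1/r}. \]

Given a vector $x$, we define the $r$-norm as $\|x\|_r^r = \sum_i |x_i|^r$ for all $r \ge 1$. For any $r \ge 1$, we denote the dual norm by $r^*$, which satisfies the equality $1/r + 1/r^* = 1$.

For $p \ge 2 \ge q \ge 1$, we will use the following notation: $a := p^* - 1$ and $b := q - 1$. We note that $a, b \in [0, 1]$.

For an $m \times n$ matrix $M$ (or vector, when $n = 1$) and a scalar function $f$, we define $f[M]$ to be the matrix with entries defined as $(f[M])_{i,j} = f(M_{i,j})$ for $i \in [m], j \in [n]$. For vectors $u, v \in \mathbb{R}^\ell$, we denote by $u \circ v \in \mathbb{R}^\ell$ the entry-wise/Hadamard product of $u$ and $v$. We denote the concatenation of two vectors $u$ and $v$ by $u \oplus v$. For a vector $u$, we use $D_u$ to denote the diagonal matrix with the entries of $u$ forming the diagonal, and for a matrix $M$ we use $\mathrm{diag}(M)$ to denote the vector of diagonal entries.

For a function $f(\tau) = \sum_{k\ge 0} f_k\cdot\tau^k$ defined as a power series, we denote the function $\mathrm{abs}(f)(\tau) := \sum_{k\ge 0} |f_k|\cdot\tau^k$.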

2 Analyzing the Approximation Ratio via Rounding

We will show that $\mathsf{CP}(A)$ is a good approximation to $\|A\|_{p\to q}$ by using an appropriate generalization of Krivine's rounding procedure. Before stating the generalized procedure, we shall give a more detailed summary of Krivine's procedure.

2.1 Krivine’s Rounding Procedure

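Krivine's procedure centers around the classical random hyperplane rounding. In this context, we define the random hyperplane rounding procedure on an input pair of matrices $U \in \mathbb{R}^{m\times\ell}, V \in \mathbb{R}^{n\times\ell}$ as outputting the vectors $\mathrm{sgn}[Ug]$ and $\mathrm{sgn}[Vg]$, where $g \in \mathbb{R}^\ell$ is a vector with i.i.d. standard Gaussian coordinates ($f[v]$ denotes entry-wise application of a scalar function $f$ to a vector $v$; we use the same convention for matrices). The so-called Grothendieck identity states that for vectors $u, v \in \mathbb{R}^\ell$,

\[ \mathbb{E}\big[\mathrm{sgn}\langle g, u\rangle\cdot\mathrm{sgn}\langle g, v\rangle\big] = \frac{\sin^{-1}\langle\hat u, \hat v\rangle}{\pi/2}, \]

where $\hat u$ denotes $u/\|u\|_2$. This implies the following equality, which we will call the hyperplane rounding identity:

\[ \mathbb{E}\big[\mathrm{sgn}[Ug]\,(\mathrm{sgn}[Vg])^T\big] = \frac{\sin^{-1}[\hat U\hat V^T]}{\pi/2}, \tag{1} \]

where for a matrix $U$, we use $\hat U$ to denote the matrix obtained by replacing the rows of $U$ by the corresponding unit (in $\ell_2$ norm) vectors. Krivine's main observation is that for any matrices $U, V$, there exist matrices $\varphi(\hat U), \psi(\hat V)$ with unit vectors as rows, such that

\[ \varphi(\hat U)\,\psi(\hat V)^T = \sin\big[(\pi/2)\cdot c\cdot\hat U\hat V^T\big], \]

where $c = \sinh^{-1}(1)\cdot 2/\pi$. Taking $U, V$ to be the optimal solution to $\mathsf{CP}(A)$ (whose rows may be taken to be unit vectors in the Grothendieck case, so that $\hat U\hat V^T = UV^T$), it follows from Eq. 1 that

\[ \|A\|_{\infty\to 1} \ \ge\ \big\langle A,\ \mathbb{E}\big[\mathrm{sgn}[\varphi(\hat U)g]\,(\mathrm{sgn}[\psi(\hat V)g])^T\big]\big\rangle \ =\ \langle A,\ c\cdot\hat U\hat V^T\rangle \ =\ c\cdot\mathsf{CP}(A). \]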
The proof of Krivine’s observation follows from simulating the Taylor series of a scalar function using inner products. We will now describe this more concretely.

Observation 2.1 (Krivine).

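Let $f : [-1, 1] \to \mathbb{R}$ be a scalar function satisfying $f(\rho) = \sum_{k\ge 1} f_k\,\rho^k$ for an absolutely convergent series $(f_k)_{k\ge 1}$. Let $c_f := (\mathrm{abs}(f))^{-1}(1)$ and further, for vectors $u, v \in \mathbb{R}^\ell$ of $\ell_2$-length at most $1$, let

\[
\begin{aligned}
S_L(f, u) &:= \big(\mathrm{sgn}(f_1)\sqrt{|f_1|}\cdot u\big) \oplus \big(\mathrm{sgn}(f_2)\sqrt{|f_2|}\cdot u^{\otimes 2}\big) \oplus \big(\mathrm{sgn}(f_3)\sqrt{|f_3|}\cdot u^{\otimes 3}\big) \oplus \cdots \\
S_R(f, v) &:= \big(\sqrt{|f_1|}\cdot v\big) \oplus \big(\sqrt{|f_2|}\cdot v^{\otimes 2}\big) \oplus \big(\sqrt{|f_3|}\cdot v^{\otimes 3}\big) \oplus \cdots
\end{aligned}
\]

Then for any $U \in \mathbb{R}^{m\times\ell}, V \in \mathbb{R}^{n\times\ell}$, $S_L(f, \sqrt{c_f}\cdot\hat U)$ and $S_R(f, \sqrt{c_f}\cdot\hat V)$ have $\ell_2$-unit vectors as rows, and

\[ S_L(f, \sqrt{c_f}\cdot\hat U)\ S_R(f, \sqrt{c_f}\cdot\hat V)^T = f\big[c_f\cdot\hat U\hat V^T\big], \]

where for a matrix $U$, $S_L(f, \cdot)$ is applied to $U$ row-wise, and similarly for $S_R$.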

Proof.

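Using the facts $\langle y^1 \otimes y^2,\ y^3 \otimes y^4\rangle = \langle y^1, y^3\rangle\cdot\langle y^2, y^4\rangle$ and $\langle y^1 \oplus y^2,\ y^3 \oplus y^4\rangle = \langle y^1, y^3\rangle + \langle y^2, y^4\rangle$, we have

\[
\begin{aligned}
\langle S_L(f, u), S_R(f, v)\rangle &= \sum_{k\ge 1} f_k\,\langle u, v\rangle^k = f(\langle u, v\rangle), \\
\|S_L(f, u)\|_2^2 &= \sum_{k\ge 1} |f_k|\,\|u\|_2^{2k} = \mathrm{abs}(f)(\|u\|_2^2), \\
\|S_R(f, v)\|_2^2 &= \sum_{k\ge 1} |f_k|\,\|v\|_2^{2k} = \mathrm{abs}(f)(\|v\|_2^2).
\end{aligned}
\]
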
The claim follows.
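The tensor-power construction in Observation 2.1 can be checked numerically for a truncated series. The sketch below is illustrative only: it uses the first three nonzero terms of the sine series as $f$, builds the two concatenated vectors explicitly (which is feasible only in tiny dimensions), and compares their inner product with $f(\langle u, v\rangle)$. Helper names such as `simulate` are not from the paper.

```python
import math

def tensor_power(u, k):
    """Flattened k-fold tensor power of the vector u."""
    out = [1.0]
    for _ in range(k):
        out = [x * ui for x in out for ui in u]
    return out

def simulate(coeffs, u, left):
    """Concatenate sqrt(|f_k|) * (k-th tensor power of u) for k >= 1.

    The left copy additionally carries the sign of f_k.
    """
    blocks = []
    for k, fk in enumerate(coeffs, start=1):
        scale = math.sqrt(abs(fk))
        if left and fk < 0:
            scale = -scale
        blocks.extend(scale * x for x in tensor_power(u, k))
    return blocks

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

# Truncated sine series: f(rho) = rho - rho^3/6 + rho^5/120 (coeffs[k-1] is f_k).
coeffs = [1.0, 0.0, -1.0 / 6.0, 0.0, 1.0 / 120.0]
u, v = [0.6, 0.8], [1.0, 0.0]
rho = inner(u, v)
lhs = inner(simulate(coeffs, u, left=True), simulate(coeffs, v, left=False))
rhs = sum(fk * rho ** k for k, fk in enumerate(coeffs, start=1))
left_norm_sq = inner(simulate(coeffs, u, True), simulate(coeffs, u, True))
```

Here `left_norm_sq` equals $\mathrm{abs}(f)(\|u\|_2^2) = 1 + 1/6 + 1/120$ for the unit vector `u`, which is why the observation rescales inputs by $\sqrt{c_f}$ to obtain unit vectors.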

Before stating our full rounding procedure, we first discuss a natural generalization of random hyperplane rounding, and much like in Krivine’s case this will guide the final procedure.

2.2 Generalizing Random Hyperplane Rounding – Hölder Dual Rounding

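Fix any convex bodies $B_1 \subset \mathbb{R}^m$ and $B_2 \subset \mathbb{R}^n$. Suppose that we would like a strategy that for given vectors $y \in \mathbb{R}^m, x \in \mathbb{R}^n$, outputs $\bar y \in B_1, \bar x \in B_2$ so that $\bar y^T A\,\bar x = \langle A, \bar y\,\bar x^T\rangle$ is close to $\langle A, y x^T\rangle$ for all $A$. A natural strategy is to take

\[ (\bar y, \bar x) := \operatorname{argmax}_{(\tilde y, \tilde x)\in B_1\times B_2} \langle \tilde y\,\tilde x^T,\ y x^T\rangle = \Big(\operatorname{argmax}_{\tilde y\in B_1}\langle\tilde y, y\rangle,\ \operatorname{argmax}_{\tilde x\in B_2}\langle\tilde x, x\rangle\Big). \]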

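In the special case where $B$ is the unit $\ell_p$ ball, there is a closed form for an optimal solution to $\operatorname{argmax}_{\tilde x\in B}\langle\tilde x, x\rangle$, given by $\Psi_{p^*}(x)/\|x\|_{p^*}^{p^*-1}$, where $\Psi_{p^*}(x) := \mathrm{sgn}[x]\circ|[x]|^{p^*-1}$. Note that for $p = \infty$, this strategy recovers the random hyperplane rounding procedure. We shall call this procedure Gaussian Hölder Dual Rounding, or Hölder Dual Rounding for short.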
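The closed form for the unit $\ell_p$ ball is the equality case of Hölder's inequality: the maximizer of $\langle\tilde x, x\rangle$ has entries $\mathrm{sgn}(x_i)|x_i|^{p^*-1}/\|x\|_{p^*}^{p^*-1}$, with $1/p + 1/p^* = 1$. A minimal sketch follows, assuming $1 \le p^* < \infty$ and $x \ne 0$; the function names are illustrative.

```python
import math

def p_norm(x, p):
    """The ell_p norm of x for finite p >= 1."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def holder_dual(x, p_star):
    """Maximizer of <y, x> over the unit ell_p ball, where 1/p + 1/p_star = 1."""
    scale = p_norm(x, p_star) ** (p_star - 1.0)
    return [math.copysign(abs(t) ** (p_star - 1.0), t) / scale for t in x]

x = [3.0, -4.0]
y_sign = holder_dual(x, 1.0)      # p = infinity: plain sign rounding
y_cubic = holder_dual(x, 1.5)     # p = 3
value = sum(a * b for a, b in zip(y_cubic, x))
```

For `p_star = 1` this reduces to taking signs, matching hyperplane rounding. For `p_star = 1.5` the output has unit $\ell_3$ norm and attains $\langle y, x\rangle = \|x\|_{1.5}$, so Hölder's inequality is tight.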

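Just like earlier, we will first understand the effect of Hölder Dual Rounding on a solution pair $U, V$. For $\rho \in [-1, 1]$, let $g_1 \sim_\rho g_2$ denote $\rho$-correlated standard Gaussians, i.e., $g_1 = \rho\,g_2 + \sqrt{1-\rho^2}\,g_3$ where $(g_2, g_3) \sim N(0, I_2)$, and let

\[ \tilde f_{a,b}(\rho) := \mathbb{E}_{g_1\sim_\rho g_2}\big[\mathrm{sgn}(g_1)|g_1|^b\cdot\mathrm{sgn}(g_2)|g_2|^a\big]. \]

We will work towards a better understanding of $\tilde f_{a,b}(\cdot)$ in later sections. For now note that we have, for vectors $u, v \in \mathbb{R}^\ell$,

\[ \mathbb{E}\big[\mathrm{sgn}\langle g, u\rangle\,|\langle g, u\rangle|^b\cdot\mathrm{sgn}\langle g, v\rangle\,|\langle g, v\rangle|^a\big] = \|u\|_2^b\cdot\|v\|_2^a\cdot\tilde f_{a,b}(\langle\hat u, \hat v\rangle). \]

Thus given matrices $U, V$, we obtain the following generalization of the hyperplane rounding identity for Hölder Dual Rounding:

\[ \mathbb{E}\big[\Psi_q([Ug])\,\Psi_{p^*}([Vg])^T\big] = D_{(\|u_i\|_2^b)_{i\in[m]}}\cdot\tilde f_{a,b}\big([\hat U\hat V^T]\big)\cdot D_{(\|v_j\|_2^a)_{j\in[n]}}. \tag{2} \]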

2.3 Generalized Krivine Transformation and the Full Rounding Procedure

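We are finally ready to state the generalized version of Krivine's algorithm. At a high level the algorithm simply applies Hölder Dual Rounding to a transformed version of the optimal convex program solution pair $U, V$. Analogous to Krivine's algorithm, the transformation is a type of "inverse" of Eq. 2.

1. Let $(U, V)$ be the optimal solution to $\mathsf{CP}(A)$, and let $(u_i)_{i\in[m]}$ and $(v_j)_{j\in[n]}$ respectively denote the rows of $U$ and $V$.

2. Let $c_{a,b} := \big(\mathrm{abs}(\tilde f_{a,b}^{-1})\big)^{-1}(1)$ and let

\[
\begin{aligned}
\varphi(U) &:= D_{(\|u_i\|_2^{1/b})_{i\in[m]}}\ S_L\big(\tilde f_{a,b}^{-1},\ \sqrt{c_{a,b}}\cdot\hat U\big), \\
\psi(V) &:= D_{(\|v_j\|_2^{1/a})_{j\in[n]}}\ S_R\big(\tilde f_{a,b}^{-1},\ \sqrt{c_{a,b}}\cdot\hat V\big).
\end{aligned}
\]

3. Let $g \sim N(0, I)$ be an infinite dimensional i.i.d. Gaussian vector.

4. Return $y := \Psi_q(\varphi(U)\,g)/\|\varphi(U)\,g\|_q^{b}$ and $x := \Psi_{p^*}(\psi(V)\,g)/\|\psi(V)\,g\|_{p^*}^{a}$.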

Remark 2.2.

Note that $\|\Psi_r(x)\|_{r^*} = \|x\|_r^{r-1}$, and so the returned solution pair always lies on the unit $\ell_{q^*}$ and $\ell_p$ spheres respectively.

Remark 2.3.

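Like in [AN04], the procedure above can be made algorithmic by observing that there always exist $\varphi'(U) \in \mathbb{R}^{m\times(m+n)}$ and $\psi'(V) \in \mathbb{R}^{n\times(m+n)}$, whose rows have the exact same lengths and pairwise inner products as those of $\varphi(U)$ and $\psi(V)$ above. Moreover they can be computed without explicitly computing $\varphi(U)$ and $\psi(V)$ by obtaining the Gram decomposition of

\[
M := \begin{bmatrix}
\mathrm{abs}(\tilde f_{a,b}^{-1})\big[c_{a,b}\cdot\hat U\hat U^T\big] & \tilde f_{a,b}^{-1}\big(\big[c_{a,b}\cdot\hat U\hat V^T\big]\big) \\[4pt]
\tilde f_{a,b}^{-1}\big(\big[c_{a,b}\cdot\hat V\hat U^T\big]\big) & \mathrm{abs}(\tilde f_{a,b}^{-1})\big[c_{a,b}\cdot\hat V\hat V^T\big]
\end{bmatrix},
\]

and normalizing the rows of the decomposition according to the definition of $\varphi(\cdot)$ and $\psi(\cdot)$ above. The entries of $M$ can be computed in polynomial time with exponentially (in $m$ and $n$) good accuracy by implementing the Taylor series of $\tilde f_{a,b}^{-1}$ up to $\mathrm{poly}(m, n)$ terms (Taylor series inversion can be done up to $k$ terms in time $\mathrm{poly}(k)$).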
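The remark about Taylor series inversion can be illustrated with a simple coefficient-by-coefficient scheme. The sketch below is one straightforward polynomial-time method, not necessarily the one the authors have in mind: it computes the compositional inverse of a power series with zero constant term by cancelling the error one degree at a time, using exact rational arithmetic. All names are illustrative.

```python
from fractions import Fraction

def compose_truncated(f, g, n):
    """Coefficients (degrees 0..n) of f(g(x)) mod x^(n+1); f[0] and g[0] must be 0."""
    result = [Fraction(0)] * (n + 1)
    power = [Fraction(1)] + [Fraction(0)] * n  # g(x)^0
    for k in range(1, n + 1):
        # power <- power * g, truncated at degree n
        new = [Fraction(0)] * (n + 1)
        for i, pi in enumerate(power):
            if pi == 0:
                continue
            for j in range(1, n + 1 - i):
                new[i + j] += pi * g[j]
        power = new
        for d in range(n + 1):
            result[d] += f[k] * power[d]
    return result

def invert_series(f, n):
    """Return g with f(g(x)) = x mod x^(n+1); requires f[0] == 0 and f[1] != 0."""
    g = [Fraction(0)] * (n + 1)
    g[1] = 1 / Fraction(f[1])
    for d in range(2, n + 1):
        # With g[d] still zero, the degree-d coefficient of f(g(x)) is the error;
        # g[d] enters that coefficient linearly as f[1] * g[d].
        error = compose_truncated(f, g, d)[d]
        g[d] = -error / Fraction(f[1])
    return g

# Sine series up to degree 5: x - x^3/6 + x^5/120.
sin_series = [Fraction(0), Fraction(1), Fraction(0), Fraction(-1, 6), Fraction(0), Fraction(1, 120)]
arcsin_series = invert_series(sin_series, 5)
```

Inverting the sine series recovers the first terms of $\sin^{-1}$, namely $x + x^3/6 + 3x^5/40$. Each of the $k$ steps performs one truncated composition, so the total cost is polynomial in $k$.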

Remark 2.4.

Note that the $2$-norm of the $i$-th row (resp. $j$-th row) of $\varphi(U)$ (resp. $\psi(V)$) is $\|u_i\|_2^{1/b}$ (resp. $\|v_j\|_2^{1/a}$).

We commence the analysis by defining some convenient normalized functions and we will also show that above is well-defined.

2.4 Auxiliary Functions

Let  ,   ,  and  . Also note that .

Well Definedness.

By Lemma 4.7, and are well defined for . By (M1) in Corollary 3.19,   and hence and . Combining this with the fact that is continuous and strictly increasing on , implies that is well defined on .

We can now proceed with the analysis.

2.5 $1/(h_{p,q}^{-1}(1)\cdot\gamma_{p^*}\gamma_q)$ Bound on Approximation Factor

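For any vector random variable $X$ in a universe $\Omega$, and scalar valued functions $f_1 : \Omega \to \mathbb{R}$ and $f_2 : \Omega \to (0, \infty)$, let $\lambda := \mathbb{E}[f_1(X)]/\mathbb{E}[f_2(X)]$. Now we have

\[ \max_{x\in\Omega}\ f_1(x) - \lambda\cdot f_2(x) \ \ge\ \mathbb{E}\big[f_1(X) - \lambda\cdot f_2(X)\big] = 0 \quad\Rightarrow\quad \max_{x\in\Omega}\ \frac{f_1(x)}{f_2(x)} \ \ge\ \lambda = \frac{\mathbb{E}[f_1(X)]}{\mathbb{E}[f_2(X)]}. \]

Thus we have

\[ \|A\|_{p\to q} \ \ge\ \frac{\mathbb{E}\big[\langle A,\ \Psi_q(\varphi(U)g)\ \Psi_{p^*}(\psi(V)g)^T\rangle\big]}{\mathbb{E}\big[\|\Psi_q(\varphi(U)g)\|_{q^*}\cdot\|\Psi_{p^*}(\psi(V)g)\|_p\big]} \ =\ \frac{\big\langle A,\ \mathbb{E}\big[\Psi_q(\varphi(U)g)\ \Psi_{p^*}(\psi(V)g)^T\big]\big\rangle}{\mathbb{E}\big[\|\Psi_q(\varphi(U)g)\|_{q^*}\cdot\|\Psi_{p^*}(\psi(V)g)\|_p\big]}, \]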

which allows us to consider the numerator and denominator separately. We begin by proving the equality that the above algorithm was designed to satisfy:

Proof.
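\[
\begin{aligned}
\mathbb{E}\big[\Psi_q(\varphi(U)g)\ \Psi_{p^*}(\psi(V)g)^T\big]
&= D_{(\|u_i\|_2)_{i\in[m]}}\cdot\tilde f_{a,b}\Big(\big[S_L(\tilde f_{a,b}^{-1}, \sqrt{c_{a,b}}\cdot\hat U)\cdot S_R(\tilde f_{a,b}^{-1}, \sqrt{c_{a,b}}\cdot\hat V)^T\big]\Big)\cdot D_{(\|v_j\|_2)_{j\in[n]}} && \text{(by Eq. 2 and Remark 2.4)} \\
&= D_{(\|u_i\|_2)_{i\in[m]}}\cdot\tilde f_{a,b}\Big(\big[\tilde f_{a,b}^{-1}\big(\big[c_{a,b}\cdot\hat U\hat V^T\big]\big)\big]\Big)\cdot D_{(\|v_j\|_2)_{j\in[n]}} && \text{(by Observation 2.1)} \\
&= D_{(\|u_i\|_2)_{i\in[m]}}\cdot c_{a,b}\cdot\hat U\hat V^T\cdot D_{(\|v_j\|_2)_{j\in[n]}} \\
&= c_{a,b}\cdot UV^T. \qquad\blacksquare
\end{aligned}
\]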

It remains to upper bound the denominator which we do using a straightforward convexity argument.

Proof.
 E[∥φ(U)g∥bq⋅∥ψ(V)g∥ap∗] ≤ E[∥φ(U)g∥q∗bq]1/q∗⋅E[∥ψ(V)g∥pap∗]1/p (1p+1q∗≤1) =