
Large deviations for high-dimensional random projections of $\ell_p^n$-balls

Abstract.

The paper provides a description of the large deviation behavior for the Euclidean norm of projections of $\ell_p^n$-balls to high-dimensional random subspaces. More precisely, for each integer $n\geq 1$, let $k_n\leq n$ be a positive integer, $E^{(n)}$ be a uniform random $k_n$-dimensional subspace of $\mathbb{R}^n$ and $X^{(n)}$ be a random point that is uniformly distributed in the $\ell_p^n$-ball of $\mathbb{R}^n$ for some $p\in[1,\infty]$. Then the Euclidean norms $\|P_{E^{(n)}}X^{(n)}\|_2$ of the orthogonal projections are shown to satisfy a large deviation principle as the space dimension $n$ tends to infinity. Its speed and rate function are identified, thereby making visible how they depend on $p$ and the growth of the sequence of subspace dimensions $k_n$. As a key tool we prove a probabilistic representation of $\|P_{E^{(n)}}X^{(n)}\|_2$ which allows us to separate the influence of the parameter $p$ and the subspace dimension $k_n$.

Key words and phrases:
Convex bodies, large deviation principles, $\ell_p^n$-balls, random projections, stochastic geometry
2010 Mathematics Subject Classification:
Primary: 60F10, 52A23 Secondary: 60D05, 46B09

1. Introduction

The geometry of convex bodies in high dimensions is a fascinating and vivid field at the core of what is known today as Asymptotic Geometric Analysis, a branch of mathematics at the crossroads between analysis, geometry and probability. In particular, it has been realized in the last decades that the presence of high dimensions forces certain regularity on the geometry of convex bodies that in many instances has a probabilistic flavor; compare with the surveys of Guédon [16, 17] and the monograph [8], for example. Arguably the most prominent example is the central limit theorem, which is widely known in probability theory to capture the fluctuations of a sum of (independent) random variables (see, e.g., Chapter 5 in [18]). In the geometric context it roughly says that most $k$-dimensional marginals of a high-dimensional isotropic convex body in $\mathbb{R}^n$ are approximately Gaussian, provided that $k$ is of smaller order than $n^{\kappa}$ for some universal constant $\kappa\in(0,1)$, i.e., $k=o(n^{\kappa})$. The central limit theorem for convex bodies was conjectured in [2] by Anttila, Ball and Perissinaki (for $k=1$), who proved the conjecture for the case of uniform distributions on convex sets whose modulus of convexity and diameter satisfy some additional quantitative assumptions. Other contributions to different facets of the central limit problem for (special classes of) convex bodies are due to Bobkov and Koldobsky [7], Brehm, Hinow, Vogt and Voigt [9], E. Meckes [23, 24], E. and M. Meckes [25], E. Milman [27] or Paouris [29], just to mention a few. For general bodies, based on a principle going back to the work of Sudakov [34], and Diaconis and Freedman [12], a central limit theorem was proved by Klartag in [20, 21], who obtained that . If in addition the convex body is $1$-unconditional, that is, symmetric with respect to all coordinate hyperplanes, this has been extended by M. Meckes [26] to $k$-dimensional marginals with . In particular, this class of convex bodies includes the $\ell_p^n$-balls considered in the present text.

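On the one hand, the central limit theorem underlines the universal behavior of Gaussian fluctuations. On the other hand, it is widely known in probability theory that the so-called large deviation behavior, which considers fluctuations beyond the Gaussian scale, is much more sensitive to the distributions of the involved random variables. For example, Cramér’s theorem (see, e.g., [10, Theorem 2.2.3] or [18, Theorem 27.5]) guarantees that if $X_1,X_2,\ldots$ are independent, identically distributed and centered random variables with cumulant generating function $\Lambda(u)=\log\mathbb{E}\,e^{uX_1}<\infty$ for all $u\in\mathbb{R}$, one has that

$$\lim_{n\to\infty}\frac{1}{n}\log\mathbb{P}\Big(\frac{1}{n}\sum_{i=1}^{n}X_i\geq t\Big)=-\Lambda^*(t)$$

for all $t>0$, where $\Lambda^*$ is the Legendre-Fenchel transform of $\Lambda$. Equivalently, this means that for any $t>0$ and any $\varepsilon>0$ there exists some natural number $n_0$ so that for each $n\geq n_0$,

$$e^{-n(\Lambda^*(t)+\varepsilon)}\leq\mathbb{P}\Big(\frac{1}{n}\sum_{i=1}^{n}X_i\geq t\Big)\leq e^{-n(\Lambda^*(t)-\varepsilon)}.$$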

We emphasize that this function usually displays an entirely different behavior for random variables sharing the same properties on the scale of the central limit theorem. While large deviations have been investigated intensively in probability theory (see, for instance, [10, 11] and the references cited therein), they have, in sharp contrast to the central limit theorem, left almost no traces in Asymptotic Geometric Analysis so far. However, as already anticipated above, the study of large deviations of marginals of high-dimensional convex bodies might open new perspectives and give access to non-universal features that make transparent the properties distinguishing different convex bodies. In addition, random projections of random vectors in high dimensions naturally appear in machine learning and information science, for instance, in linear regression [22] when searching for the best regression function, and for the purpose of dimension reduction in information retrieval in text documents [6], where they reduce the computational complexity.

It was only recently that Gantert, Kim and Ramanan [14, 13], and Kim and Ramanan [19] opened this field by deriving, in particular, a Large Deviation Principle (LDP) in the spirit of Donsker and Varadhan for $1$-dimensional random projections of $\ell_p^n$-balls in $\mathbb{R}^n$, as the space dimension $n$ tends to infinity. More precisely, their results show that if for each $n\in\mathbb{N}$, $\theta^{(n)}$ is a uniform random direction and $X^{(n)}$ is an independent random point uniformly distributed in the $\ell_p^n$-ball of $\mathbb{R}^n$ for some fixed $p\in[1,\infty]$, then the sequence of rescaled random variables

$$n^{\frac{1}{p}-\frac{1}{2}}\,\langle X^{(n)},\theta^{(n)}\rangle,\qquad n\in\mathbb{N},$$

satisfies an LDP with speed $n$ if $p\geq 2$ and speed $n^{2p/(2+p)}$ if $p<2$ and with a certain rate function that also depends on $p$ (all notions and notation are explained in Section 2 below). In view of Klartag’s multi-dimensional version of the central limit theorem for convex bodies (see [20, Theorem 1.3] and [21, Theorem 1.1]) it is natural to consider projections onto higher-dimensional random subspaces as well. The purpose of the present paper is to put the results from [13] into a wider context and to provide a description of the large deviation behavior for the Euclidean norm of projections of $\ell_p^n$-balls onto random subspaces in high dimensions. At the same time it helps to clarify the rôle of the involved parameters. The essential step towards an extension to higher dimensions is our novel probabilistic representation of the Euclidean norm of a random projection (see Theorem 3.1) that allows us to separate the influence of the parameter $p$ and the subspace dimension. This representation might be of independent interest.

Let us explain our main results in more detail (again, we refer to Section 2 below for any unexplained notion or notation). We fix and let for each , be an independent random point that is uniformly distributed in the -ball of . Furthermore, we let be an integer and assume that is a random subspace distributed according to the Haar probability measure on the Grassmann manifold of -dimensional subspaces in which is independent of . The sequence of random variables of interest to us are the Euclidean norms of the orthogonal projections of onto , that is,

We set and note that if for all , this reduces to the sequence of random variables studied in [13]. We first consider the case and define for the function

where is the Legendre-Fenchel transform of

with , , being the density of a -generalized Gaussian random variable. To handle the exceptional case simultaneously, we write with being the Legendre-Fenchel transform of . Our first main result reads as follows.

Theorem 1.1.

Let and assume that the limit exists in . Then the sequence satisfies an LDP with speed and rate function

where we understand the cases as the corresponding limits.

We emphasize at this point that while the LDP in Theorem 1.1 shows a universal speed, its rate function depends in a subtle way on the underlying convex body via the parameter $p$.

Next, we shall discuss the special case $p=2$, which corresponds to the Euclidean unit ball, in some more detail. First of all, in this situation the rate function can be made fully explicit and is given by

(1)

where we understand the cases as the corresponding limits and with or included in the effective domain of . In particular, if takes the value zero the rate function reduces to

and this is exactly the rate function that already appeared in the $1$-dimensional LDP in [4, Theorem 3.4] or [13, Theorem 2.12]. In other words, this means that in the Euclidean case the LDP does not ‘feel’ the random subspaces we project onto as long as their dimension $k_n$ is growing slowly with $n$, that is, if $k_n=o(n)$. The difference to the $1$-dimensional projections becomes visible only in the ‘truly’ high-dimensional regime in which $k_n$ is eventually proportional to $n$.

We now turn to the case $p\in[1,2)$, which already for the $1$-dimensional projections shows a large deviation behavior at different scales, but this time with a fully explicit rate function (see [13, Theorem 2.3]). Our next result shows that this continues to hold for high-dimensional random projections as well.

Theorem 1.2.

Let and assume that the limit exists in . Then the sequence satisfies an LDP with speed and rate function

where .

We emphasize that for $p\in[1,2)$ the LDP for random projections of $\ell_p^n$-balls holds at a non-universal and $p$-dependent speed. Moreover, a comparison with [13, Theorem 2.3] shows that both the speed and the rate function differ from those for the $1$-dimensional random projections. In fact, in this situation (where $k_n=1$ for all $n$) the sequence satisfies an LDP with speed $n^{2p/(2+p)}$ and rate function

Note that the rate function stated here slightly differs from the rate function in [13], since we are not dealing with signed distances in our set-up but rather with their absolute values. Our Theorem 1.2 leaves open the case where the subspace dimensions are such that $k_n/n\to 0$, as $n\to\infty$. We conjecture that in this case the LDP is the same as the one for $k_n=1$ discussed above.

After having presented our main theorems, let us comment on the tools we are going to use in their proofs. They basically reflect a lively interplay between geometric arguments and techniques and methods from large deviation theory. As already anticipated above, the key to Theorem 1.1 and Theorem 1.2 is a new probabilistic representation of the Euclidean norms of the random projections. Notably, in the special case that $k_n=1$ for all $n$ this is different from the one that has been used in [13]. More precisely, for each $n$ we will identify this norm, in distribution, with the product of three independent random variables:

Here,

  • is uniformly distributed on ,

  • is the quotient of the - and the -norm of an -dimensional random vector consisting of independent -generalized Gaussian random entries,

  • is given by with standard Gaussian random variables that are independent.

The essential feature of this representation is that the parameter influences only the random variables , while on the other hand the dimension parameter shows up exclusively in the definition of . This in turn allows us to study the different effects separately and paves the way to the higher-dimensional generalizations of the results in [13]. We emphasize that the representation of as a product is well reflected by the rate function appearing in Theorem 1.1, which possesses the following probabilistic interpretation: while the radial part has no influence as already seen in the -dimensional case, the rate function is the infimum of the sum of two rate functions corresponding to LDPs for and . Moreover, the latter corresponds to the rate function (1) appearing in the particular Euclidean case .

The rest of this paper is structured as follows. In Section 2 we introduce our notation, recall the necessary background material from large deviation theory and provide some preliminaries on the geometry of $\ell_p^n$-balls. The aforementioned probabilistic representation of the Euclidean norm of a random projection is the content of Section 3. We prove some auxiliary LDPs in Section 4 and in the final Section 5 we eventually prove Theorem 1.1 and Theorem 1.2. Since we have a broad readership in mind, we decided to include background material, tools and arguments from both Asymptotic Geometric Analysis and probability theory.

2. Preliminaries

2.1. Notation

In this paper we denote by the -dimensional Lebesgue measure of a Lebesgue measurable set and we write for the -field of all Lebesgue measurable subsets of . The collection of Borel sets in is denoted by . We supply the -dimensional Euclidean space with its standard inner product and the Euclidean norm . The interior and the closure of a set are denoted by and , respectively.

We write for the Euclidean unit ball and for the corresponding unit sphere in , and for the uniform probability measure on , that is, the normalized spherical Lebesgue measure. As subsets of they carry natural Borel -fields that we denote by and , respectively. Moreover, we recall that

(2)

where is the Gamma-function.

The group of -orthogonal matrices is denoted by and we let be the subgroup of orthogonal matrices with determinant . As subsets of , and can be equipped with the trace -field of . Moreover, both compact groups and carry a unique Haar probability measure which we denote by and , respectively. Since consists of two copies of , the measure can easily be derived from and vice versa.

Given , we use the symbol to denote the Grassmannian of -dimensional linear subspaces of . Denoting by the Hausdorff distance we supply with the metric , , where and stand for the Euclidean unit balls in and , respectively. The Borel -field on induced by this metric is denoted by and we supply the arising measurable space with the unique Haar probability measure . It can be identified with the image measure of the Haar probability measure on under the mapping with . Here, we write for the standard orthonormal basis in and , , for the -dimensional linear subspace spanned by the first vectors of this basis.

2.2. Large Deviation Principles

The purpose of this section is to provide the necessary background material from large deviation theory, which may be found in [10, 11, 18], for example. We directly start with the definition of what we understand by a full and a weak large deviation principle. We refrain from presenting these definitions in the most general possible framework and rather restrict to the set-up needed in this paper. For this reason, let $d\geq 1$ be a fixed integer and assume that the $d$-dimensional Euclidean space $\mathbb{R}^d$ is supplied with its standard topology. In this subsection we denote for clarity the space dimension by $d$ instead of $n$ in order to distinguish it from our index parameter $n$. Finally, we make the assumption that all random objects we are dealing with are defined on a common (and sufficiently rich) probability space $(\Omega,\mathcal{F},\mathbb{P})$.

Definition 2.1.

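Let $X=(X^{(n)})_{n\in\mathbb{N}}$ be a sequence of random vectors taking values in $\mathbb{R}^d$. Further, let $s:\mathbb{N}\to(0,\infty)$ and $I:\mathbb{R}^d\to[0,\infty]$ be a lower semi-continuous function with compact level sets $\{x\in\mathbb{R}^d:I(x)\leq\alpha\}$, $\alpha\in\mathbb{R}$. We say that $X$ satisfies a (full) large deviation principle with speed $s(n)$ and (good) rate function $I$ if

$$-\inf_{x\in A^\circ}I(x)\leq\liminf_{n\to\infty}\frac{1}{s(n)}\log\mathbb{P}(X^{(n)}\in A)\leq\limsup_{n\to\infty}\frac{1}{s(n)}\log\mathbb{P}(X^{(n)}\in A)\leq-\inf_{x\in\overline{A}}I(x)\tag{3}$$

for all measurable sets $A\subseteq\mathbb{R}^d$, where $A^\circ$ and $\overline{A}$ denote the interior and the closure of $A$. Moreover, we say that $X$ satisfies a weak large deviation principle with speed $s(n)$ and rate function $I$ if the lower bound in (3) holds as stated, while the upper bound is valid only for compact sets $A\subseteq\mathbb{R}^d$.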

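We notice that on the class of all $I$-continuity sets, that is, on the class of sets $A$ for which $I(A^\circ)=I(\overline{A})$ with $I(A):=\inf_{x\in A}I(x)$, one has the exact limit relation

$$\lim_{n\to\infty}\frac{1}{s(n)}\log\mathbb{P}(X^{(n)}\in A)=-I(A).$$

In our paper we use the convention that the rate function in an LDP for a sequence of random vectors $X$ is denoted by $I_X$.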

What separates a weak from a full LDP is the so-called exponential tightness of the sequence of random variables (see, for instance, [10, Lemma 1.2.18] and [18, Lemma 27.9]).

Proposition 2.2.

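Let $X=(X^{(n)})_{n\in\mathbb{N}}$ be a sequence of random vectors taking values in $\mathbb{R}^d$. Suppose that $X$ satisfies a weak LDP with speed $s(n)$ and rate function $I_X$. Then $X$ satisfies a full LDP if and only if $X$ is exponentially tight, that is, if and only if

$$\inf_{K}\limsup_{n\to\infty}\frac{1}{s(n)}\log\mathbb{P}(X^{(n)}\notin K)=-\infty,$$

where the infimum is running over all compact sets $K\subseteq\mathbb{R}^d$.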

The following proposition (see, for instance, [10, Theorem 4.1.11]) shows that it is sufficient to prove a weak LDP for a sequence of random variables solely for sets in a basis of the underlying topological space.

Proposition 2.3.

Let and be a basis of the standard topology in . Let be a sequence of -valued random vectors. For every , define

and for set . Suppose that for all ,

Then satisfies a weak LDP with speed and rate function .

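Let $d\geq 1$ be a fixed integer and let $X$ be an $\mathbb{R}^d$-valued random vector. We write

$$\Lambda(u):=\log\mathbb{E}\,e^{\langle X,u\rangle},\qquad u\in\mathbb{R}^d,$$

for the cumulant generating function of $X$. Moreover, we define the (effective) domain of $\Lambda$ to be the set $D_\Lambda:=\{u\in\mathbb{R}^d:\Lambda(u)<\infty\}$.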

Definition 2.4.

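The Legendre-Fenchel transform of a convex function $\Lambda:\mathbb{R}^d\to(-\infty,\infty]$ is defined as

$$\Lambda^*(x):=\sup_{u\in\mathbb{R}^d}\big[\langle u,x\rangle-\Lambda(u)\big],\qquad x\in\mathbb{R}^d.$$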

The Legendre-Fenchel transform of the cumulant generating function plays a crucial rôle in the following result, usually referred to as Cramér’s theorem (see, e.g., [10, Theorem 2.2.30, Theorem 6.1.3, Corollary 6.1.6] or [18, Theorem 27.5]).

Proposition 2.5 (Cramér’s theorem).

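Let $X_1,X_2,\ldots$ be independent and identically distributed random vectors taking values in $\mathbb{R}^d$. Assume that the origin is an interior point of $D_\Lambda$, where $\Lambda$ stands for the cumulant generating function of $X_1$. Then the normalized partial sums $\frac{1}{n}\sum_{i=1}^{n}X_i$, $n\in\mathbb{N}$, satisfy an LDP with speed $n$ and good rate function $\Lambda^*$.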
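As a small numerical illustration of Cramér's theorem, the following sketch approximates a Legendre-Fenchel transform on a grid and compares it with exact tail probabilities. The example distribution (a fair coin with values $\pm 1$, whose cumulant generating function is $\log\cosh u$), the grid bounds and all function names are assumptions chosen for illustration; none of them is taken from the paper.

```python
import math


def legendre_fenchel(cgf, x, u_min=-20.0, u_max=20.0, steps=40001):
    """Approximate sup over u of (u*x - cgf(u)) on a uniform grid in [u_min, u_max]."""
    best = -math.inf
    for i in range(steps):
        u = u_min + (u_max - u_min) * i / (steps - 1)
        best = max(best, u * x - cgf(u))
    return best


def coin_cgf(u):
    """Cumulant generating function of a fair +/-1 coin (illustrative choice)."""
    return math.log(math.cosh(u))


def coin_tail_exponent(n, t):
    """Exact value of -(1/n) * log P(S_n / n >= t) for a sum S_n of n fair +/-1 coins."""
    j_min = math.ceil(n * (1 + t) / 2)  # minimal number of +1 outcomes so that S_n >= n*t
    tail = sum(math.comb(n, j) for j in range(j_min, n + 1))
    return -(math.log(tail) - n * math.log(2)) / n


if __name__ == "__main__":
    t = 0.5
    print("grid value of the transform at t:", legendre_fenchel(coin_cgf, t))
    for n in (100, 1000, 4000):
        print(n, coin_tail_exponent(n, t))
```

For growing `n` the exact exponents should approach the grid value of the transform from above, with a gap of order $\log(n)/n$, which is the kind of behavior the theorem predicts at speed $n$.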

It will be rather important for us to deduce from an already existing large deviation principle a new one by applying various transformations. We first consider the large deviation behavior under the formation of vectors. For this, assume that $d_1,d_2\geq 1$ are integers, that $X=(X^{(n)})_{n\in\mathbb{N}}$ is a sequence of $\mathbb{R}^{d_1}$-valued random vectors and that $Y=(Y^{(n)})_{n\in\mathbb{N}}$ is a sequence of $\mathbb{R}^{d_2}$-valued random vectors. Assuming that $X$ and $Y$ satisfy large deviation principles, does the sequence $(X^{(n)},Y^{(n)})_{n\in\mathbb{N}}$ of $\mathbb{R}^{d_1+d_2}$-valued random vectors also satisfy a large deviation principle and, if so, what is its rate function? The following result is only implicit in [10]. For the sake of completeness we present a self-contained proof in the appendix, since we were not able to precisely locate it in the existing literature.

Proposition 2.6.

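Assume that the sequence $X=(X^{(n)})_{n\in\mathbb{N}}$ of $\mathbb{R}^{d_1}$-valued random vectors satisfies an LDP with speed $s(n)$ and good rate function $I_X$ and that the sequence $Y=(Y^{(n)})_{n\in\mathbb{N}}$ of $\mathbb{R}^{d_2}$-valued random vectors satisfies an LDP with speed $s(n)$ and good rate function $I_Y$. Then, if $X^{(n)}$ and $Y^{(n)}$ are independent for every $n\in\mathbb{N}$, the sequence $(X^{(n)},Y^{(n)})_{n\in\mathbb{N}}$ satisfies an LDP with speed $s(n)$ and rate function $I_{(X,Y)}$, where $I_{(X,Y)}(x,y):=I_X(x)+I_Y(y)$ for all $(x,y)\in\mathbb{R}^{d_1}\times\mathbb{R}^{d_2}$.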

Next, assume that a sequence of random variables satisfies an LDP with speed and rate function . Suppose now that is a sequence of random variables that are ‘close’ to the ones from . Our aim is to transfer in such a situation the LDP from to . The conditions under which such an approach is working are the content of the next result, which we took from [10, Theorem 4.2.13] or [18, Lemma 27.13].

Proposition 2.7.

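Let $X=(X^{(n)})_{n\in\mathbb{N}}$ and $Y=(Y^{(n)})_{n\in\mathbb{N}}$ be two sequences of $\mathbb{R}^d$-valued random vectors and assume that $X$ satisfies an LDP with speed $s(n)$ and rate function $I_X$. Further, suppose that $X$ and $Y$ are exponentially equivalent, i.e.,

$$\limsup_{n\to\infty}\frac{1}{s(n)}\log\mathbb{P}\big(\|X^{(n)}-Y^{(n)}\|_2>\delta\big)=-\infty$$

for any $\delta>0$. Then $Y$ satisfies an LDP with the same speed and the same rate function.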

Remark 2.8.

If the dimension is fixed, then, since all norms are equivalent, we may consider the -norm instead of the -norm in the definition of exponential equivalence.

Finally, we consider the possibility to ‘transport’ a large deviation principle to another one by means of a continuous function. This device is known as the contraction principle and we refer to [10, Theorem 4.2.1] or [18, Theorem 27.11(i)].

Proposition 2.9 (Contraction principle).

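Let $d_1,d_2\in\mathbb{N}$ and let $F:\mathbb{R}^{d_1}\to\mathbb{R}^{d_2}$ be a continuous function. Further, let $X=(X^{(n)})_{n\in\mathbb{N}}$ be a sequence of $\mathbb{R}^{d_1}$-valued random vectors that satisfies an LDP with speed $s(n)$ and rate function $I_X$. Then the sequence $Y:=(F(X^{(n)}))_{n\in\mathbb{N}}$ of $\mathbb{R}^{d_2}$-valued random vectors satisfies an LDP with the same speed and with rate function $I_Y=I_X\circ F^{-1}$, i.e., $I_Y(y):=\inf\{I_X(x):F(x)=y\}$, $y\in\mathbb{R}^{d_2}$, with the convention that $I_Y(y)=+\infty$ if $F^{-1}(\{y\})=\emptyset$.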
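The infimum over the preimage in the contraction principle can be explored numerically. The following is only a rough sketch under illustrative assumptions: the rate function $x\mapsto x^2/2$, the map $x\mapsto x^2$, the grid and the tolerance are hypothetical choices and do not come from the paper. For this pair the contracted rate function is $y/2$ for $y\geq 0$ and $+\infty$ for $y<0$.

```python
import math


def contracted_rate(rate, F, y, grid, tol=1e-3):
    """Approximate inf{rate(x) : F(x) = y} using grid points x with |F(x) - y| <= tol.

    Returns +inf if no grid point is mapped close to y, mirroring the
    convention for an empty preimage.
    """
    candidates = [rate(x) for x in grid if abs(F(x) - y) <= tol]
    return min(candidates) if candidates else math.inf


def gaussian_rate(x):
    """Illustrative rate function x^2 / 2 (an assumption for this sketch)."""
    return 0.5 * x * x


def square(x):
    """Illustrative continuous map F(x) = x^2 (an assumption for this sketch)."""
    return x * x


# Uniform grid on [-5, 5] with mesh 0.001 (illustrative discretization).
grid = [-5.0 + 0.001 * i for i in range(10001)]

if __name__ == "__main__":
    for y in (-1.0, 1.0, 4.0):
        print(y, contracted_rate(gaussian_rate, square, y, grid))
```

The grid search is crude and only meant to make the formula tangible; it is not a substitute for computing the infimum analytically.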

While this form of the contraction principle was sufficient to analyse the large deviation behavior for $1$-dimensional random projections of $\ell_p^n$-balls, we will need a refinement to treat the higher-dimensional cases. More precisely, to handle this situation we need to allow the continuous function to depend on $n$. The following result can be found in [10, Corollary 4.2.21].

Proposition 2.10.

Let and let be a continuous function. Suppose that is a sequence of -valued random variables that satisfies an LDP with speed and rate function . Further, suppose that for each , is a measurable function such that for all , and

Then the sequence of -valued random variables satisfies an LDP with the same speed and with rate function .

2.3. Geometry of $\ell_p^n$-balls

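Let $n\geq 1$ be an integer and consider the $n$-dimensional Euclidean space $\mathbb{R}^n$. For any $p\in[1,\infty]$ the $\ell_p^n$-norm, $\|x\|_p$, of $x=(x_1,\ldots,x_n)\in\mathbb{R}^n$ is given by

$$\|x\|_p:=\begin{cases}\Big(\sum\limits_{i=1}^{n}|x_i|^p\Big)^{1/p} & : p<\infty,\\ \max\{|x_1|,\ldots,|x_n|\} & : p=\infty.\end{cases}$$

Although $\|\cdot\|_p$ depends on the space dimension $n$, we decided to suppress this dependency in our notation for simplicity, since $n$ will always be clear from the context.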

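For any $p\in[1,\infty]$ and $n\in\mathbb{N}$ let us denote by $\mathbb{B}_p^n:=\{x\in\mathbb{R}^n:\|x\|_p\leq 1\}$ the $\ell_p^n$-ball in $\mathbb{R}^n$ and denote by $\mathbb{S}_p^{n-1}:=\{x\in\mathbb{R}^n:\|x\|_p=1\}$ the corresponding unit sphere. The restriction of the Lebesgue measure to $\mathbb{B}_p^n$ provides a natural volume measure on $\mathbb{B}_p^n$. Although one could supply $\mathbb{S}_p^{n-1}$ with the $(n-1)$-dimensional Hausdorff measure, the so-called cone measure turns out to be more useful as explained later (see [28] for the relation between these two measures).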

Definition 2.11.

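For a Borel set $A\subseteq\mathbb{S}_p^{n-1}$ we define

$$\mu_p(A):=\frac{|\{rx:x\in A,\,r\in[0,1]\}|}{|\mathbb{B}_p^n|},$$

where $|\cdot|$ stands for the $n$-dimensional Lebesgue measure. The measure $\mu_p$ is called the cone (probability) measure of $\mathbb{B}_p^n$.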

We remark that the cone measure coincides with the $(n-1)$-dimensional Hausdorff probability measure on $\mathbb{S}_p^{n-1}$ if and only if $p=1$, $p=2$ or $p=\infty$. In particular, $\mu_2$ is the same as the normalized spherical Lebesgue measure on the Euclidean unit sphere.

The proofs of our results heavily rely on the following probabilistic representations for the volume and the cone probability measure of for , which are taken from [30] and [32] (we also refer to [5] for a different representation).

Proposition 2.12.

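Let $n\in\mathbb{N}$ and $p\in[1,\infty)$. Suppose that $Y_1,\ldots,Y_n$ are independent $p$-generalized Gaussian random variables whose distribution has density

$$f_p(x):=\frac{1}{2p^{1/p}\Gamma\big(1+\frac{1}{p}\big)}\,e^{-|x|^p/p},\qquad x\in\mathbb{R},$$

with respect to the Lebesgue measure on $\mathbb{R}$. Consider the random vector $Y:=(Y_1,\ldots,Y_n)\in\mathbb{R}^n$. Furthermore, let $U$ be a uniformly distributed random variable on $[0,1]$, which is independent of the $Y_i$’s. Then,

  • the random vector $Y/\|Y\|_p$ is independent of $\|Y\|_p$ and is distributed according to the cone measure $\mu_p$,

  • the random vector $U^{1/n}\,Y/\|Y\|_p$ is uniformly distributed in $\mathbb{B}_p^n$.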
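The representation in Proposition 2.12 translates directly into a sampling recipe, sketched below for finite $p$. The sketch assumes that the $p$-generalized Gaussian density is proportional to $e^{-|x|^p/p}$, in which case $|Y_i|^p/p$ is Gamma distributed with shape $1/p$; this normalization, the Gamma-based sampler and all function names are assumptions made for illustration and are not taken from the paper.

```python
import random


def sample_generalized_gaussian(p, rng):
    """Draw from the density proportional to exp(-|x|^p / p) on the real line.

    Assumption of this sketch: if G is Gamma(1/p, 1) distributed, then
    (p * G)^(1/p) with an independent random sign has this density.
    """
    g = rng.gammavariate(1.0 / p, 1.0)
    magnitude = (p * g) ** (1.0 / p)
    return magnitude if rng.random() < 0.5 else -magnitude


def lp_norm(x, p):
    """The l_p-norm of a finite sequence x for finite p >= 1."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def sample_cone_measure(n, p, rng):
    """Sample Y / ||Y||_p, a point on the l_p unit sphere."""
    y = [sample_generalized_gaussian(p, rng) for _ in range(n)]
    norm = lp_norm(y, p)
    return [t / norm for t in y]


def sample_uniform_lp_ball(n, p, rng):
    """Sample U^(1/n) * Y / ||Y||_p, a point in the l_p unit ball."""
    radius = rng.random() ** (1.0 / n)
    return [radius * t for t in sample_cone_measure(n, p, rng)]


if __name__ == "__main__":
    rng = random.Random(0)
    point = sample_uniform_lp_ball(5, 1.5, rng)
    print(point, lp_norm(point, 1.5))
```

The radial factor `U^(1/n)` accounts for the fact that the volume of a ball of radius $r$ scales like $r^n$.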

In the rest of this paper $g_1,g_2,\ldots$ will always denote a sequence of independent real-valued standard Gaussians, $U$ will denote an independent random variable uniformly distributed on $[0,1]$ and, for $p\in[1,\infty)$, $Y_1,Y_2,\ldots$ will denote a sequence of independent $p$-generalized Gaussian random variables with density $f_p$. All these random variables are assumed to be independent.

For further probabilistic aspects pertaining to the geometry of $\ell_p^n$-balls we refer to [5, 31, 32, 33] as well as the references cited therein.

3. A probabilistic representation for $\|P_EX\|_2$

In this section the dimension of the space will be fixed. Thus, for simplicity in the notation, we will omit the indices that will refer to the dimension . Fix , let be a point chosen according to the uniform distribution on and let be an independent random subspace with distribution for some . In this section we will develop the already announced probabilistic representation for , which will turn out to be crucial in the proofs of Theorems 1.1 and 1.2. The key feature of this representation is that it will allow us to identify with a continuous function of two random variables and . These random variables in turn can be written as functions of sums of independent identically distributed random variables. Besides, only one of them will depend on , while the other one will depend only on the dimension of the random subspace . These properties, together with Cramér’s theorem and the contraction principle will give us the LDPs in the main theorems.

Theorem 3.1.

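For any $n\in\mathbb{N}$ and $k\in\{1,\ldots,n\}$ let $X$ be a random vector uniformly distributed in the $\ell_p^n$-ball of $\mathbb{R}^n$ for some $p$ and let $E$ be an independent random subspace distributed according to the Haar probability measure on the Grassmannian of $k$-dimensional linear subspaces of $\mathbb{R}^n$. Then the random variable $\|P_EX\|_2$ has the same distribution as the random variable

$$U^{1/n}\,\frac{\|Y\|_2}{\|Y\|_p}\,\frac{\big(\sum_{i=1}^{k}g_i^2\big)^{1/2}}{\big(\sum_{i=1}^{n}g_i^2\big)^{1/2}},$$

where $U$ is uniformly distributed on $[0,1]$, $Y=(Y_1,\ldots,Y_n)$ has independent $p$-generalized Gaussian entries, $g_1,\ldots,g_n$ are independent standard Gaussians, and all these random variables are independent.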

Proof.

Let be a fixed vector. By construction of the Haar measure on and uniqueness of the Haar measure on , we have that, for any ,

where . Again, by the uniqueness of the Haar measure on , is a random vector uniformly distributed on according to , provided that has distribution . Thus,

Since is a standard Gaussian random vector in , by Proposition 2.12, the random vector is distributed on according to . Thus,

Consequently, if is a random vector uniformly distributed on , is a random subspace independent of having distribution , and is a standard Gaussian random vector in that is independent of and , we have that

Here, denotes the joint distribution of the random vector , while stands for that of . Now, let be a random vector having independent -generalized Gaussian random entries. Then, by Proposition 2.12, the random vector is uniformly distributed in . Therefore,

with being the joint distribution of the random vector . Consequently, we conclude that the two random variables

have the same distribution. ∎
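The distributional identity of Theorem 3.1 can be sanity-checked by simulation. The sketch below compares a direct simulation of the projection norm with the product of the three factors described in the introduction, namely $U^{1/n}$, the ratio $\|Y\|_2/\|Y\|_p$ and the Gaussian ratio $(\sum_{i\leq k}g_i^2)^{1/2}/(\sum_{i\leq n}g_i^2)^{1/2}$. Several ingredients are assumptions of this sketch rather than statements from the paper: the density normalization $e^{-|x|^p/p}$ with its Gamma-based sampler, the construction of the random subspace as the span of $k$ independent Gaussian vectors orthonormalized by Gram-Schmidt, and all function names.

```python
import math
import random


def generalized_gaussian(p, rng):
    """Draw from the density proportional to exp(-|x|^p / p) (assumed normalization)."""
    magnitude = (p * rng.gammavariate(1.0 / p, 1.0)) ** (1.0 / p)
    return magnitude if rng.random() < 0.5 else -magnitude


def lp_norm(x, p):
    """The l_p-norm of a finite sequence x for finite p >= 1."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def projection_norm_direct(n, k, p, rng):
    """Norm of the projection of a uniform point of the l_p ball onto a random subspace."""
    y = [generalized_gaussian(p, rng) for _ in range(n)]
    scale = rng.random() ** (1.0 / n) / lp_norm(y, p)
    x = [scale * t for t in y]  # uniform in the l_p unit ball
    basis = []  # orthonormal basis of the span of k Gaussian vectors
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for b in basis:  # Gram-Schmidt step against earlier basis vectors
            c = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        basis.append([vi / norm for vi in v])
    coords = [sum(xi * bi for xi, bi in zip(x, b)) for b in basis]
    return math.sqrt(sum(c * c for c in coords))


def projection_norm_product(n, k, p, rng):
    """Product of the radial factor, the norm ratio and the Gaussian ratio."""
    y = [generalized_gaussian(p, rng) for _ in range(n)]
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    radial = rng.random() ** (1.0 / n)
    norm_ratio = lp_norm(y, 2.0) / lp_norm(y, p)
    gaussian_ratio = math.sqrt(sum(t * t for t in g[:k]) / sum(t * t for t in g))
    return radial * norm_ratio * gaussian_ratio


if __name__ == "__main__":
    rng = random.Random(1)
    n, k, p, samples = 8, 3, 1.5, 5000
    direct = sum(projection_norm_direct(n, k, p, rng) for _ in range(samples)) / samples
    product = sum(projection_norm_product(n, k, p, rng) for _ in range(samples)) / samples
    print("mean (direct):", direct, "mean (product):", product)
```

Agreement of the two empirical means is of course only a consistency check of the first moment and not a proof of equality in distribution.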

4. Proof of auxiliary LDPs

The purpose of this section is to derive a number of auxiliary LDPs for the factors appearing in the probabilistic representation for in Theorem 3.1. These results can be seen as intermediate steps in the proof of Theorem 1.1. Recall the set-up and the notation introduced above, define for each the random variables

  • ,

and the sequences ,