# Singularity of random symmetric matrices – a combinatorial approach to improved bounds

Asaf Ferber (Massachusetts Institute of Technology, Department of Mathematics. Email: ferbera@mit.edu. Research is partially supported by NSF 6935855.)

Vishesh Jain (Massachusetts Institute of Technology, Department of Mathematics. Email: visheshj@mit.edu. Research is partially supported by NSF CCF 1665252, NSF DMS-1737944 and ONR N00014-17-1-2598.)

###### Abstract

Let $M_n$ denote a random symmetric $n \times n$ matrix whose upper-diagonal entries are independent and identically distributed Bernoulli random variables (which take values $1$ and $-1$ with probability $1/2$ each). It is widely conjectured that $M_n$ is singular with probability at most $(2+o(1))^{-n}$. On the other hand, the best known upper bound on the singularity probability of $M_n$, due to Vershynin (2011), is $2^{-n^{c}}$, for some unspecified small constant $c > 0$. This improves on a polynomial singularity bound due to Costello, Tao, and Vu (2005), and a bound of Nguyen (2011) showing that the singularity probability decays faster than any polynomial. In this paper, improving on all previous results, we show that the probability of singularity of $M_n$ is at most $2^{-n^{1/4}\sqrt{\log n}/1000}$ for all sufficiently large $n$. The proof utilizes and extends a novel combinatorial approach to discrete random matrix theory, which has been recently introduced by the authors together with Luh and Samotij.

## 1 Introduction

The invertibility problem for Bernoulli matrices is one of the most outstanding problems in discrete random matrix theory. Letting $A_n$ denote a random $n \times n$ matrix, whose entries are independent and identically distributed (i.i.d.) Bernoulli random variables which take values $\pm 1$ with probability $1/2$ each, this problem asks for the value of $c_n$, which is the probability that $A_n$ is singular. By considering the event that two rows or two columns of $A_n$ are equal (up to a sign), it is clear that

$$c_n \geq (1+o(1))\, n^{2}\, 2^{1-n}.$$

It has been widely conjectured that this bound is, in fact, tight. On the other hand, perhaps surprisingly, it is non-trivial even to show that $c_n$ tends to $0$ as $n$ goes to infinity; this was accomplished in a classical work of Komlós in 1967 [6] which showed that

$$c_n = O(n^{-1/2})$$

using the classical Erdős-Littlewood-Offord anti-concentration inequality. Subsequently, a breakthrough result due to Kahn, Komlós, and Szemerédi in 1995 [5] showed that

$$c_n = O(0.999^{n}).$$

Improving upon an intermediate result by Tao and Vu [13], the current ‘world record’ is

$$c_n \leq (2+o(1))^{-n/2},$$

due to Bourgain, Vu, and Wood [1].

Another widely studied model of random matrices is that of random symmetric matrices; apart from being important for applications, it is also very interesting from a technical perspective as it is one of the simplest models with nontrivial correlations between its entries. Formally, let $M_n$ denote a random $n \times n$ symmetric matrix, whose upper-diagonal entries are i.i.d. Bernoulli random variables which take values $\pm 1$ with probability $1/2$ each, and let $q_n$ denote the probability that $M_n$ is singular. Despite its similarity to $c_n$, much less is known about $q_n$.

The problem of whether $q_n$ tends to $0$ as $n$ goes to infinity was first posed by Weiss in the early 1990s and only settled in 2005 by Costello, Tao, and Vu [2], who showed that

$$q_n = O(n^{-1/8+o(1)}).$$

In order to do this, they introduced and studied a quadratic variant of the Erdős-Littlewood-Offord inequality. Subsequently, Nguyen [7] developed a quadratic variant of inverse Littlewood-Offord theory to show that

$$q_n = O_C(n^{-C})$$

for any $C > 0$, where the implicit constant in $O_C(\cdot)$ depends only on $C$. This so-called quadratic inverse Littlewood-Offord theorem in [7] builds on previous work of Nguyen and Vu [8], which is itself based on deep Freiman-type theorems in additive combinatorics (see [14] and the references therein). The current best known upper bound on $q_n$ is due to Vershynin [15], who used a sophisticated and technical geometric framework pioneered by Rudelson and Vershynin [11, 12] to show that

$$q_n = O(2^{-n^{c}})$$

for some unspecified small constant $c > 0$.

As far as lower bounds on $q_n$ are concerned, once again, by considering the event that the first and last rows of $M_n$ are equal (up to a sign), we see that $q_n = \Omega(2^{-n})$. It is commonly believed that this lower bound is tight.
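For very small $n$, the quantities in this discussion can be computed exactly by brute force. The following is a minimal sketch, not part of the paper's argument: the helper names `det`, `symmetric_sign_matrices`, `q_exact` and `first_last_rows_agree` are our own, and the enumeration is only feasible for tiny $n$ since there are $2^{n(n+1)/2}$ matrices.

```python
from fractions import Fraction
from itertools import product


def det(m):
    """Exact integer determinant by Laplace expansion (fine for tiny matrices)."""
    n = len(m)
    if n == 0:
        return 1
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def symmetric_sign_matrices(n):
    """Yield every symmetric n x n matrix with entries in {-1, +1}."""
    positions = [(i, j) for i in range(n) for j in range(i, n)]
    for signs in product((-1, 1), repeat=len(positions)):
        m = [[0] * n for _ in range(n)]
        for (i, j), s in zip(positions, signs):
            m[i][j] = m[j][i] = s
        yield m


def q_exact(n):
    """Exact singularity probability q_n, by full enumeration."""
    mats = list(symmetric_sign_matrices(n))
    singular = sum(1 for m in mats if det(m) == 0)
    return Fraction(singular, len(mats))


def first_last_rows_agree(n):
    """Probability that the first and last rows are equal up to a sign."""
    mats = list(symmetric_sign_matrices(n))
    hits = sum(1 for m in mats
               if m[0] == m[-1] or m[0] == [-v for v in m[-1]])
    return Fraction(hits, len(mats))


for n in range(2, 5):
    # The second number is a lower bound for the first.
    print(n, q_exact(n), first_last_rows_agree(n))
```

Since the row-coincidence event forces singularity, the second printed value never exceeds the first; the sketch only illustrates the lower bound and says nothing about the conjectured asymptotics.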

###### Conjecture 1.1 ([2, 16]).

We have

$$q_n = (2+o(1))^{-n}.$$

In this paper, we obtain a much stronger upper bound on $q_n$, thereby making progress towards Conjecture 1.1.

###### Theorem 1.2.

There exists a natural number $n_0$ such that for all $n \geq n_0$,

$$q_n \leq 2^{-n^{1/4}\sqrt{\log n}/1000}.$$

Apart from providing a stronger conclusion, our proof of the above theorem is considerably shorter than previous works, and introduces and extends several novel combinatorial tools and ideas in discrete random matrix theory (some of which are based on joint work of the authors with Luh and Samotij [3]). We believe that these ideas allow for a unified approach to the singularity problem for many different discrete random matrix models, which have previously been handled in an ad hoc manner. For completeness and for the convenience of the reader, we have included full proofs of all the simple background lemmas that we use from other papers, making this paper completely self-contained.

### 1.1 Outline of the proof and comparison with previous work

In this subsection, we provide a very brief, and rather imprecise, outline of our proof, and compare it to previous works of Nguyen [7] and Vershynin [15]; for further comparison with the work of Costello, Tao, and Vu, see [7].

Let $x := (x_1, \dots, x_n)$ be the first row of $M_n$, let $M_{n-1}$ denote the bottom-right $(n-1) \times (n-1)$ submatrix of $M_n$, and for $2 \leq i, j \leq n$, let $c_{ij}$ denote the cofactor of $M_{n-1}$ obtained by removing its $(i-1)^{st}$ row and $(j-1)^{st}$ column. Then, Laplace’s formula for the determinant gives

$$\det(M_n) = x_1 \det(M_{n-1}) - \sum_{i,j=2}^{n} c_{ij} x_i x_j,$$

so that our goal is to bound the probability (over the randomness of $x$ and $M_{n-1}$) that this polynomial is zero. By a standard reduction due to [2] (see Lemma 2.1, Lemma 2.3, and Corollary 2.4), we may further assume that $M_{n-1}$ has rank either $n-2$ or $n-1$. In this outline, we will only discuss the case when $M_{n-1}$ has rank $n-1$; the other case is easier, and is handled exactly as in [7] (see Section 2.2, in particular Lemma 2.5).
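The Laplace-type expansion above is easy to sanity-check numerically. The sketch below is ours, not the paper's: it assumes that $c_{ij}$ is the signed cofactor, i.e. $(-1)^{i+j}$ times the corresponding minor, and uses exact integer arithmetic on random symmetric sign matrices. The helper names are hypothetical.

```python
import random


def det(m):
    """Exact determinant by Laplace expansion along the first row."""
    if not m:
        return 1  # determinant of the empty matrix
    return sum((-1) ** j * m[0][j] * det([r[:j] + r[j + 1:] for r in m[1:]])
               for j in range(len(m)))


def cofactor(m, i, j):
    """Signed cofactor: (-1)^(i+j) times the minor without row i and column j."""
    minor = [r[:j] + r[j + 1:] for k, r in enumerate(m) if k != i]
    return (-1) ** (i + j) * det(minor)


def random_symmetric_signs(n, rng):
    """Random symmetric n x n matrix with entries in {-1, +1}."""
    m = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            m[i][j] = m[j][i] = rng.choice((-1, 1))
    return m


def laplace_gap(m):
    """det(M_n) - [x_1 det(M_{n-1}) - sum_{i,j>=2} c_ij x_i x_j]; should be 0."""
    x = m[0]
    sub = [row[1:] for row in m[1:]]  # bottom-right (n-1) x (n-1) block
    quad = sum(cofactor(sub, i - 1, j - 1) * x[i] * x[j]
               for i in range(1, len(m)) for j in range(1, len(m)))
    return det(m) - (x[0] * det(sub) - quad)


rng = random.Random(0)
gaps = [laplace_gap(random_symmetric_signs(5, rng)) for _ in range(20)]
print(gaps)  # every entry should be 0
```

The code uses 0-based indices, so position `i` in `x` corresponds to the paper's index $i+1$.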

A decoupling argument due to [2] (see Lemma 2.10) further reduces the problem (albeit in a manner incurring a loss) to bounding from above the probability that

$$\sum_{i \in U_1} \sum_{j \in U_2} c_{ij} (x_i - x'_i)(x_j - x'_j) = 0,$$

where $U_1 \sqcup U_2$ is an arbitrary non-trivial partition of $\{2, \dots, n\}$, and $x'_i, x'_j$ are independent copies of $x_i, x_j$ (see Corollary 2.11). For the remainder of this discussion, the reader should think of $|U_2|$ as ‘small’ (more precisely, $|U_2| \sim n^{1/4}\sqrt{\log n}$). We remark that a similar decoupling based reduction is used in [15] as well, whereas [7] also uses a similar decoupling inequality in proving the so-called quadratic inverse Littlewood-Offord theorem. The advantage of decoupling is that for any given realization of the variables $(x_j)_{j \in U_2}$ and $(x'_j)_{j \in U_2}$, the problem reduces to bounding from above the probability that the linear sum

$$\sum_{i \in U_1} R_i (x_i - x'_i) = 0,$$

where $R_i := \sum_{j \in U_2} c_{ij}(x_j - x'_j)$. Problems of this form are precisely the subject of standard (linear) Littlewood-Offord theory.

Broadly speaking, Littlewood-Offord theory applied to our problem says that the less ‘additive structure’ the vector $R := (R_i)_{2 \leq i \leq n}$ possesses, the smaller the probability of the above sum being zero. Quantifying this in the form of ‘Littlewood-Offord type theorems’ has been the subject of considerable research over the years; we refer the reader to [9, 12] for general surveys on the Littlewood-Offord problem with a view towards random matrix theory. Hence, our goal is to show that with very high probability, the vector $R$ is additively ‘very unstructured’. This is the content of our structural theorem (Theorem 3.2), which is at the heart of our proof.

The statement (and usefulness) of our structural theorem is based on the following simple, yet powerful, observations.

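• The $(n-1)$-dimensional vector $R$ is zero if and only if $x_j = x'_j$ for all $j \in U_2$, which happens with probability exponentially small in $|U_2|$; the if and only if statement holds since the matrix $(c_{ij})_{2 \leq i,j \leq n}$ is proportional to the matrix $M_{n-1}^{-1}$, and $M_{n-1}$ is assumed to be invertible.

• The vector $R$ is orthogonal to at least $n-1-|U_2|$ rows of $M_{n-1}$ (Lemma 2.12). This follows since for every $j \in U_2$, the $(n-1)$-dimensional vector $(c_{ij})_{2 \leq i \leq n}$ is orthogonal to all but the row of $M_{n-1}$ indexed by $j$, again since the matrix $(c_{ij})_{2 \leq i,j \leq n}$ is proportional to the matrix $M_{n-1}^{-1}$.

• The probability of the linear sum $\sum_{i \in U_1} R_i (x_i - x'_i)$ being zero is ‘not much more’ than the probability of the linear sum $\sum_{2 \leq i \leq n} R_i (x_i - x'_i)$ being zero (Lemma 2.9).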

Taken together, these observations show that it suffices to prove a structural theorem of the following form: every non-zero integer vector which is orthogonal to ‘most’ rows of $M_{n-1}$ is ‘very unstructured’. In [7], a structural theorem along similar lines is also proven. However, it suffers from two drawbacks. First, the notion of ‘very unstructured’ in the conclusion there is much weaker, leading to the bound $q_n = O_C(n^{-C})$ for any constant $C > 0$, as opposed to our bound from Theorem 1.2. Second, such a conclusion is not obtained for every non-zero integer vector, but only for those non-zero integer vectors for which ‘most’ coefficients satisfy the additional additive constraint of being contained in a ‘small’ generalized arithmetic progression (GAP) of ‘low complexity’. Consequently, the simple observations mentioned above no longer suffice, and the rest of the proof in [7] is necessarily more complicated.

The structural theorem in [15] is perhaps closer in spirit to ours, although there are many key differences, of which we mention here the most important one. Roughly speaking, both [15] and the present work prove the respective structural theorems by taking the union bound, over the choice of a non-zero (integer) vector which is not ‘very unstructured’, of the probability that the matrix-vector product of $M_{n-1}$ with this vector is contained in a small prescribed set. A priori, this union bound is over an infinite collection of vectors. In order to overcome this obstacle, [11, 15] adopt a geometric approach of grouping vectors on the unit sphere into a finite number of clusters based on Euclidean distances; using the union bound and a non-trivial estimate of the number of clusters to show that with very high probability, the matrix-vector product of $M_{n-1}$ with a representative of each cluster is ‘far’ from the small prescribed set; and then, using estimates on the operator norm of $M_{n-1}$ to deduce a similar result for all other vectors in each cluster. Naturally, this geometric approach is very involved, and leads to additional losses at various steps (which is why [15] obtains a worse bound on $q_n$ than Theorem 1.2); however, it is worth mentioning that [15] provides bounds not just for the probability of singularity of $M_n$, but also for the probability that the ‘least singular value’ of $M_n$ is ‘very small’.

In contrast, we overcome this obstacle with a completely novel and purely combinatorial approach of clustering vectors based on the residues of their coordinates modulo a large prime, and using a combinatorial notion due to Halász [4] to quantify the amount of additive structure in a vector (Proposition 3.3). In particular, with our approach, the analogue of the problem of ‘bounding the covering number of sub-level sets of regularized LCD’ – which constitutes a significant portion of [15] (see Section 7.1 there), is one of the key contributions of that work, and is also a major contributor to the sub-optimality of the final result – can be solved more efficiently and with a short double-counting argument (see Theorem 3.10, which is based on joint work of the authors with Luh and Samotij in [3], and Corollary 3.11).

The rest of this paper is organized as follows. In Section 2, we discuss in detail the overall proof strategy leading to the reduction to the structural theorem; in Section 3, we state and prove our structural theorem; and in Section 4, we put everything together to quickly complete our proof. Appendix A reproduces the proof of the ‘counting lemma’ from [3], and Appendix B contains a proof of Halász’s inequality over $\mathbb{F}_p$, which follows the outline of the original proof of Halász [4].

Notation: Throughout the paper, we will omit floors and ceilings when they make no essential difference. For convenience, we will also say ‘let $p = x$ be a prime’, to mean that $p$ is a prime between $x$ and $2x$; again, this makes no difference to our arguments. As is standard, we will use $[n]$ to denote the discrete interval $\{1, \dots, n\}$. All logarithms are natural unless noted otherwise.

## 2 Proof strategy: reduction to the structural theorem

In this section, we discuss the strategy underlying our proof of Theorem 1.2. The key conclusions are Eq. 2, Eq. 8, and Eq. 14, which show that it suffices to prove the structural theorem in Section 3 in order to prove Theorem 1.2.

### 2.1 Preliminary reductions

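For any $n \in \mathbb{N}$ and $0 \leq \ell \leq n$, let $\operatorname{Rk}_{\ell}(n)$ denote the event that $M_n$ has rank exactly $\ell$, and let $\operatorname{Rk}_{\leq \ell}(n)$ denote the event that $M_n$ has rank at most $\ell$. Thus, our goal is to bound the probability of $\operatorname{Rk}_{\leq n-1}(n)$. The next lemma, which is due to Nguyen [7], shows that it suffices to bound the probability of $\operatorname{Rk}_{n-1}(n)$.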

###### Lemma 2.1 (Lemma 2.1 in [7]).

For any $\ell \leq n-2$,

$$\Pr[\operatorname{Rk}_{\ell}(n)] \leq 10 \times \Pr[\operatorname{Rk}_{2n-\ell-2}(2n-\ell-1)].$$

The proof of this lemma uses the following simple observation due to Odlyzko [10]:

###### Observation 2.2.

Let $W$ be any subspace of $\mathbb{R}^n$ of dimension at most $\ell$. Then, $|W \cap \{\pm 1\}^n| \leq 2^{\ell}$.
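Observation 2.2 can be checked by brute force in small dimensions. The sketch below is illustrative only and the helper names are ours: it counts the sign vectors lying in the span of a given list of vectors, using exact rational Gaussian elimination to test membership. The first example is an $\ell$-dimensional subspace containing exactly $2^{\ell}$ sign vectors, so the bound cannot be improved in general.

```python
from fractions import Fraction
from itertools import product
import random


def rank(rows):
    """Rank over the rationals via exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def sign_vectors_in_span(basis):
    """Return (dim W, |W ∩ {±1}^n|) for W = span(basis)."""
    n = len(basis[0])
    d = rank(basis)
    # v lies in W exactly when appending it does not increase the rank.
    count = sum(1 for v in product((-1, 1), repeat=n)
                if rank(basis + [list(v)]) == d)
    return d, count


# Extremal example: W = {v : v_3 = v_4 = v_5} in R^5, of dimension 3.
tight = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 1, 1, 1]]
print(sign_vectors_in_span(tight))

rng = random.Random(1)
random_basis = [[rng.choice((-1, 1)) for _ in range(6)] for _ in range(3)]
d, count = sign_vectors_in_span(random_basis)
print(d, count, count <= 2 ** d)
```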

###### Proof of Lemma 2.1.

It suffices to show that for any $\ell \leq n-1$,

$$\Pr[\operatorname{Rk}_{\ell+2}(n+1) \mid \operatorname{Rk}_{\ell}(n)] \geq 1 - 2^{-n+\ell}. \tag{1}$$

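Indeed, iterating this inequality shows that

$$\begin{aligned}
\Pr[\operatorname{Rk}_{2n-\ell-2}(2n-\ell-1) \mid \operatorname{Rk}_{\ell}(n)]
&\geq \prod_{j=1}^{n-\ell-1} \Pr[\operatorname{Rk}_{\ell+2j}(n+j) \mid \operatorname{Rk}_{\ell+2j-2}(n+j-1)] \\
&\geq \prod_{j=1}^{n-\ell-1} \left(1 - 2^{-n+\ell+j}\right) \geq 0.1,
\end{aligned}$$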

which gives the desired conclusion.
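The final numerical inequality in the display can be confirmed directly. Re-indexing by $k = n - \ell - j$, the product becomes $\prod_{k=1}^{n-\ell-1}(1 - 2^{-k})$, which decreases in the number of factors and stays well above $0.1$. A small sketch (the function name and the parameter `gap`, standing for $n - \ell - 1$, are ours):

```python
from fractions import Fraction


def iterated_lower_bound(gap):
    """prod_{k=1}^{gap} (1 - 2^{-k}), where gap plays the role of n - l - 1."""
    prod = Fraction(1)
    for k in range(1, gap + 1):
        prod *= 1 - Fraction(1, 2 ** k)
    return prod


for gap in (1, 2, 5, 10, 60):
    print(gap, float(iterated_lower_bound(gap)))
```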

In order to prove Eq. 1, consider the coupling of $M_n$ and $M_{n+1}$ where $M_n$ is the top left $n \times n$ sub-matrix of $M_{n+1}$. Suppose $M_n$ has rank $\ell$, and let $W$ be the ($\ell$-dimensional) subspace spanned by its rows. By Observation 2.2, $|W \cap \{\pm 1\}^n| \leq 2^{\ell}$. Therefore, the probability that the vector formed by the first $n$ coordinates of the last row of $M_{n+1}$ lies in $W$ is at most $2^{-n+\ell}$. If this vector does not lie in $W$, then the symmetry of the matrix also shows that the last column of $M_{n+1}$ does not lie in the span of the first $n$ columns of $M_{n+1}$, so that the rank of $M_{n+1}$ exceeds the rank of $M_n$ by $2$. ∎

The following lemma, also due to Nguyen, allows us to reduce to the case where the rank of the $(n-1) \times (n-1)$ symmetric matrix obtained by removing the first row and the first column of $M_n$ is at least $n-2$.

###### Lemma 2.3 (Lemma 2.3 in [7]).

Assume that $M_n$ has rank $n-1$. Then, there exists $i \in [n]$ such that the removal of the $i^{th}$ row and the $i^{th}$ column of $M_n$ results in a symmetric matrix $M^{i}_{n-1}$ of rank at least $n-2$.

###### Proof.

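Without loss of generality, we can assume that the last $n-1$ rows of $M_n$ are independent. Therefore, the matrix $M^{1}_{n-1}$, which is obtained by removing the first row and first column of $M_n$, has rank at least $n-2$, since deleting a single column lowers the rank by at most $1$. ∎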

As a simple corollary of the above lemma, we obtain the following:

###### Corollary 2.4.

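For $i \in [n]$, let $\operatorname{Rk}^{i}_{n-1}(n)$ denote the event that $M_n$ has rank $n-1$, and the symmetric matrix obtained by removing the $i^{th}$ row and the $i^{th}$ column of $M_n$ has rank at least $n-2$. Then,

$$\Pr[\operatorname{Rk}_{n-1}(n)] \leq n \Pr[\operatorname{Rk}^{1}_{n-1}(n)].$$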
###### Proof.

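Suppose that $M_n$ has rank $n-1$. By Lemma 2.3, there exists an $i \in [n]$ for which the matrix obtained by deleting the $i^{th}$ row and $i^{th}$ column has rank at least $n-2$. Moreover, by symmetry,

$$\Pr[\operatorname{Rk}^{i}_{n-1}(n)] = \Pr[\operatorname{Rk}^{1}_{n-1}(n)] \quad \text{for all } i \in [n].$$

Therefore, by the union bound,

$$\Pr[\operatorname{Rk}_{n-1}(n)] = \Pr\left[\bigcup_{i=1}^{n} \operatorname{Rk}^{i}_{n-1}(n)\right] \leq \sum_{i=1}^{n} \Pr[\operatorname{Rk}^{i}_{n-1}(n)] = n \Pr[\operatorname{Rk}^{1}_{n-1}(n)].$$

∎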

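Let $M^{1}_{n-1}$ denote the $(n-1) \times (n-1)$ symmetric matrix obtained by deleting the first row and first column of $M_n$. Let $\operatorname{D}(n-1)$ denote the ‘degenerate’ event that $M^{1}_{n-1}$ has rank $n-2$, and let $\operatorname{ND}(n-1)$ denote the ‘non-degenerate’ event that $M^{1}_{n-1}$ has full rank $n-1$. By definition,

$$\operatorname{Rk}^{1}_{n-1}(n) = \left(\operatorname{Rk}^{1}_{n-1}(n) \cap \operatorname{D}(n-1)\right) \sqcup \left(\operatorname{Rk}^{1}_{n-1}(n) \cap \operatorname{ND}(n-1)\right),$$

and hence,

$$\Pr[\operatorname{Rk}^{1}_{n-1}(n)] = \Pr[\operatorname{Rk}^{1}_{n-1}(n) \cap \operatorname{D}(n-1)] + \Pr[\operatorname{Rk}^{1}_{n-1}(n) \cap \operatorname{ND}(n-1)]. \tag{2}$$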

It is thus enough to bound each of the above two summands.

### 2.2 Bounding $\Pr[\operatorname{Rk}^{1}_{n-1}(n) \cap \operatorname{D}(n-1)]$

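Let $x := (x_1, \dots, x_n)$ denote the first row of $M_n$. It follows from Laplace’s formula for the determinant that

$$\det(M_n) = x_1 \det(M^{1}_{n-1}) - \sum_{2 \leq i,j \leq n} c_{ij} x_i x_j, \tag{3}$$

where $c_{ij}$ denotes the cofactor of $M^{1}_{n-1}$ obtained by removing its $(i-1)^{st}$ row and $(j-1)^{st}$ column. In order to deal with $\operatorname{Rk}^{1}_{n-1}(n) \cap \operatorname{D}(n-1)$, we use the following observation due to Nguyen (see Section 9 in [7]).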

###### Lemma 2.5.

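For every $M^{1}_{n-1} \in \operatorname{D}(n-1)$, there exists some $\lambda \in \mathbb{Q} \setminus \{0\}$ and some $a = (a_2, \dots, a_n) \in \mathbb{Z}^{n-1} \setminus \{0\}$ such that

$$M^{1}_{n-1}\, a = 0, \tag{4}$$

and

$$\det(M_n) = \lambda \left(\sum_{2 \leq i \leq n} a_i x_i\right)^{2}. \tag{5}$$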
###### Proof.

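Let $\operatorname{adj}(M^{1}_{n-1})$ denote the adjugate matrix of $M^{1}_{n-1}$; note that this is an integer-valued symmetric matrix since $M^{1}_{n-1}$ is an integer-valued symmetric matrix. Since $M^{1}_{n-1}$ is of rank $n-2$, its kernel is of dimension $1$. Moreover, the equation

$$M^{1}_{n-1} \operatorname{adj}(M^{1}_{n-1}) = \det(M^{1}_{n-1})\, I_{n-1} \tag{6}$$

shows that every column of $\operatorname{adj}(M^{1}_{n-1})$ is in the kernel of $M^{1}_{n-1}$ as $\det(M^{1}_{n-1}) = 0$ by assumption. It follows that the matrix $\operatorname{adj}(M^{1}_{n-1})$ is an integer-valued symmetric matrix of rank $1$; it cannot be zero since $M^{1}_{n-1}$ is of rank $n-2$, and therefore has a non-vanishing minor of order $n-2$. Hence, there exists some $\lambda \in \mathbb{Q} \setminus \{0\}$ and a vector $a = (a_2, \dots, a_n) \in \mathbb{Z}^{n-1} \setminus \{0\}$ such that

$$\operatorname{adj}(M^{1}_{n-1}) = \lambda\, a a^{T}. \tag{7}$$

In particular, every column of $\operatorname{adj}(M^{1}_{n-1})$ is equal to a multiple of the vector $a$. By considering any column which is a non-zero multiple of $a$, Eq. 6 along with $\det(M^{1}_{n-1}) = 0$ gives Eq. 4. Moreover, by writing the entries of the adjugate matrix in terms of the cofactors, we see that Eq. 7 is equivalent to the following: for all $2 \leq i, j \leq n$:

$$c_{ij} = \lambda a_i a_j.$$

Substituting this in Eq. 3 and using $\det(M^{1}_{n-1}) = 0$ gives Eq. 5 (with $-\lambda$ in place of $\lambda$). ∎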

Before explaining how to use Lemma 2.5, we need the following definition.

###### Definition 2.6 (Atom probability).

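Let $R$ be an arbitrary ring (with a unit element). For a vector $a := (a_1, \dots, a_n) \in R^n$, we define its $\mu$-atom probability by

$$\rho^{R}_{\mu}(a) := \sup_{c \in R} \Pr_{x^{\mu}_1, \dots, x^{\mu}_n}\left[a_1 x^{\mu}_1 + \dots + a_n x^{\mu}_n = c\right],$$

where the $x^{\mu}_i$’s are i.i.d. random variables taking on the value $0$ with probability $\mu$ and the values $\pm 1$, each with probability $(1-\mu)/2$.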

###### Remark 2.7.

We will often refer to the $0$-atom probability simply as the atom probability, and denote it by $\rho^{R}(a)$ instead of $\rho^{R}_{0}(a)$. Similarly, we will denote $\rho^{\mathbb{Z}}_{\mu}(a)$ simply as $\rho_{\mu}(a)$.
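For integer vectors, the atom probability of Definition 2.6 can be computed exactly by convolving the distributions of the summands. The sketch below is our own illustration for the ring $\mathbb{Z}$ (the function name `atom_probability` is hypothetical); it contrasts a highly structured vector with an unstructured one.

```python
from collections import defaultdict
from fractions import Fraction


def atom_probability(a, mu=0):
    """Exact mu-atom probability of the integer vector a (ring = Z)."""
    mu = Fraction(mu)
    p_sign = (1 - mu) / 2          # probability of +1, and of -1
    dist = {0: Fraction(1)}        # distribution of the partial sum
    for coeff in a:
        nxt = defaultdict(Fraction)
        for s, p in dist.items():
            nxt[s] += p * mu               # x_i = 0
            nxt[s + coeff] += p * p_sign   # x_i = +1
            nxt[s - coeff] += p * p_sign   # x_i = -1
        dist = nxt
    return max(dist.values())


# Structured: all coefficients equal, so many sign patterns collide.
print(atom_probability([1, 1, 1, 1]))
# Unstructured: the 16 signed sums of 1, 2, 4, 8 are pairwise distinct.
print(atom_probability([1, 2, 4, 8]))
# A lazy walk with mu = 1/2 on a single coordinate.
print(atom_probability([1], Fraction(1, 2)))
```

The gap between the first two values is the phenomenon that Littlewood-Offord theory quantifies: less additive structure means a smaller atom probability.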

Although we will not need them in this subsection, we will later make use of the following two simple lemmas about the atom probability. The first lemma shows that the $\mu$-atom probability of a vector is bounded above by the $\mu$-atom probability of any of its restrictions.

###### Lemma 2.8.

Let $a \in R^n$, and let $a|_{U_1}$ denote the restriction of $a$ to $U_1 \subseteq [n]$. Then,

$$\rho^{R}_{\mu}(a) \leq \rho^{R}_{\mu}(a|_{U_1}).$$
###### Proof.

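Let $c^{*} := \arg\max_{c \in R} \Pr_{x^{\mu}}\left[\sum_{i \in [n]} a_i x^{\mu}_i = c\right]$, and write $\overline{U_1} := [n] \setminus U_1$. Then,

$$\begin{aligned}
\rho^{R}_{\mu}(a)
&= \Pr_{x^{\mu}}\left[\sum_{i \in [n]} a_i x^{\mu}_i = c^{*}\right]
= \Pr_{x^{\mu}}\left[\sum_{i \in U_1} a_i x^{\mu}_i = c^{*} - \sum_{i \in \overline{U_1}} a_i x^{\mu}_i\right] \\
&= \mathbb{E}_{(x^{\mu}_i)_{i \in \overline{U_1}}}\left[\Pr_{(x^{\mu}_i)_{i \in U_1}}\left[\sum_{i \in U_1} a_i x^{\mu}_i = c^{*} - \sum_{i \in \overline{U_1}} a_i x^{\mu}_i\right]\right] \\
&\leq \mathbb{E}_{(x^{\mu}_i)_{i \in \overline{U_1}}}\left[\rho^{R}_{\mu}(a|_{U_1})\right] = \rho^{R}_{\mu}(a|_{U_1}),
\end{aligned}$$

where the third equality follows from the law of total probability, and the inequality follows from the definition of $\rho^{R}_{\mu}(a|_{U_1})$. ∎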

The second lemma complements Lemma 2.8, and shows that the $\mu$-atom probability cannot increase too much if, instead of the original vector, we work with its restriction to a sufficiently large subset of coordinates.

###### Lemma 2.9.

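Let $a \in R^n$, let $U_1 \sqcup U_2$ be a partition of $[n]$, and let $a|_{U_1}$ denote the restriction of $a$ to $U_1$. Then,

$$\rho^{R}_{\mu}(a|_{U_1}) \leq \max\left\{\mu, \frac{1-\mu}{2}\right\}^{-|U_2|} \rho^{R}_{\mu}(a).$$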
###### Proof.

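Let $c_0 := \arg\max_{c \in R} \Pr_{(x^{\mu}_i)_{i \in U_1}}\left[\sum_{i \in U_1} a_i x^{\mu}_i = c\right]$, where the $x^{\mu}_i$’s are as in Definition 2.6, and let $c_1 := c_0 + \sum_{j \in U_2} a_j$. Then,

$$\Pr_{x^{\mu}}\left[\sum_{i \in [n]} a_i x^{\mu}_i = c_0\right] \geq \Pr_{(x^{\mu}_i)_{i \in U_1}}\left[\sum_{i \in U_1} a_i x^{\mu}_i = c_0\right] \prod_{j \in U_2} \Pr_{x^{\mu}_j}\left[x^{\mu}_j = 0\right] = \rho^{R}_{\mu}(a|_{U_1})\, \mu^{|U_2|},$$

and

$$\Pr_{x^{\mu}}\left[\sum_{i \in [n]} a_i x^{\mu}_i = c_1\right] \geq \Pr_{(x^{\mu}_i)_{i \in U_1}}\left[\sum_{i \in U_1} a_i x^{\mu}_i = c_0\right] \prod_{j \in U_2} \Pr_{x^{\mu}_j}\left[x^{\mu}_j = 1\right] = \rho^{R}_{\mu}(a|_{U_1}) \left(\frac{1-\mu}{2}\right)^{|U_2|}.$$

Since $\rho^{R}_{\mu}(a)$ is at least each of the two left hand sides, taking the maximum of the two expressions gives

$$\rho^{R}_{\mu}(a) \geq \max\left\{\mu, \frac{1-\mu}{2}\right\}^{|U_2|} \rho^{R}_{\mu}(a|_{U_1}),$$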

and by rearranging we obtain the desired conclusion. ∎
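Lemmas 2.8 and 2.9 together sandwich the atom probability of a restriction. The following sketch checks both inequalities exactly on random integer vectors; it is only an illustration over $\mathbb{Z}$, with $U_1$ taken (as an assumption made for convenience) to be the first `k` coordinates, and with helper names of our own choosing.

```python
from fractions import Fraction
import random


def rho(a, mu):
    """Exact mu-atom probability over the integers (dynamic programming)."""
    dist = {0: Fraction(1)}
    for coeff in a:
        nxt = {}
        for s, p in dist.items():
            for step, w in ((0, mu), (coeff, (1 - mu) / 2), (-coeff, (1 - mu) / 2)):
                nxt[s + step] = nxt.get(s + step, 0) + p * w
        dist = nxt
    return max(dist.values())


def restriction_sandwich(a, k, mu):
    """Return (rho(a), rho(a|U1), Lemma 2.9 upper bound) for U1 = first k coordinates."""
    full, restricted = rho(a, mu), rho(a[:k], mu)
    factor = max(mu, (1 - mu) / 2) ** (len(a) - k)
    return full, restricted, full / factor


rng = random.Random(3)
for mu in (Fraction(0), Fraction(1, 4), Fraction(1, 2)):
    a = [rng.randint(-3, 3) for _ in range(8)]
    full, restricted, bound = restriction_sandwich(a, 5, mu)
    # Lemma 2.8: full <= restricted.  Lemma 2.9: restricted <= bound.
    print(mu, full <= restricted <= bound)
```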

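Returning to the goal of this subsection, for $0 < \rho \leq 1$, let $\operatorname{Null}_{\rho}(n-1)$ denote the event – depending only on $M^{1}_{n-1}$ – that every non-zero integer null vector of $M^{1}_{n-1}$ has atom probability (in $\mathbb{Z}$) at most $\rho$. Then, we have

$$\begin{aligned}
\Pr_{M_n}[\operatorname{Rk}^{1}_{n-1}(n) \cap \operatorname{D}(n-1)]
&\leq \Pr_{M_n}[\operatorname{Rk}^{1}_{n-1}(n) \cap \operatorname{D}(n-1) \cap \operatorname{Null}_{\rho}(n-1)] + \Pr_{M^{1}_{n-1}}\left[\overline{\operatorname{Null}_{\rho}(n-1)}\right] \\
&\leq \sum_{A_{n-1} \in \operatorname{Null}_{\rho}(n-1) \cap \operatorname{D}(n-1)} \Pr_{x}\left[\sum_{2 \leq i \leq n} a_i(A_{n-1})\, x_i = 0\right] \Pr_{M^{1}_{n-1}}[M^{1}_{n-1} = A_{n-1}] + \Pr_{M^{1}_{n-1}}\left[\overline{\operatorname{Null}_{\rho}(n-1)}\right] \\
&\leq \rho + \Pr_{M^{1}_{n-1}}\left[\overline{\operatorname{Null}_{\rho}(n-1)}\right],
\end{aligned} \tag{8}$$

where the first inequality is trivial; the second follows from Eq. 5; and the last follows from the definition of $\operatorname{Null}_{\rho}(n-1)$. Theorem 3.2 shows that ‘typically’, every non-zero integer null vector of $M^{1}_{n-1}$ has ‘small’ atom probability, and will be used to bound the right hand side of Eq. 8.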

### 2.3 Bounding $\Pr[\operatorname{Rk}^{1}_{n-1}(n) \cap \operatorname{ND}(n-1)]$

Once again, we start with Eq. 3. However, for $M^{1}_{n-1} \in \operatorname{ND}(n-1)$, the matrix $M^{1}_{n-1}$ is invertible, and we no longer have the factorization of the determinant in Lemma 2.5 available to us. In this case, in order to reduce to a problem involving the anti-concentration of a linear form, we will follow an idea by Costello, Tao and Vu [2]. The basic tool is the following decoupling inequality from [2].

###### Lemma 2.10 (Lemma 4.7 in [2]).

Let $Y$ and $Z$ be independent random variables, and $E = E(Y, Z)$ be an event depending on $Y$ and $Z$. Then,

$$\Pr[E(Y,Z)]^{4} \leq \Pr[E(Y,Z) \cap E(Y',Z) \cap E(Y,Z') \cap E(Y',Z')],$$

where $Y'$ and $Z'$ denote independent copies of $Y$ and $Z$, respectively.

###### Proof.

For simplicity, and since this is the case of interest to us, we may assume that $Y, Z$ take only finitely many values; for the general case, see [2]. Suppose that $Y$ takes the values $y_1, \dots, y_n$ and $Z$ takes the values $z_1, \dots, z_m$. Note that one can write

$$\Pr[E(Y,Z)] = \sum_{i=1}^{n} \Pr[E(y_i, Z)] \Pr[Y = y_i],$$

and

$$\Pr[E(Y,Z) \cap E(Y,Z')] = \sum_{i=1}^{n} \Pr[E(y_i, Z)]^{2} \Pr[Y = y_i],$$

since $Z$ and $Z'$ are i.i.d. Therefore, by Jensen’s inequality, we obtain

$$\Pr[E(Y,Z)]^{2} \leq \sum_{i=1}^{n} \Pr[E(y_i, Z)]^{2} \Pr[Y = y_i] = \Pr[E(Y,Z) \cap E(Y,Z')]. \tag{9}$$

We also have

$$\Pr[E(Y,Z) \cap E(Y,Z')] = \sum_{i=1}^{m} \sum_{j=1}^{m} \Pr[E(Y, z_i) \cap E(Y, z_j)] \Pr[Z = z_i] \Pr[Z = z_j],$$

and, since $Y$ and $Y'$ are i.i.d.,

$$\Pr[E(Y,Z) \cap E(Y,Z') \cap E(Y',Z) \cap E(Y',Z')] = \sum_{i=1}^{m} \sum_{j=1}^{m} \Pr[E(Y, z_i) \cap E(Y, z_j)]^{2} \Pr[Z = z_i] \Pr[Z = z_j].$$

Once again, by Jensen’s inequality, we obtain

$$\Pr[E(Y,Z) \cap E(Y,Z')]^{2} \leq \Pr[E(Y,Z) \cap E(Y,Z') \cap E(Y',Z) \cap E(Y',Z')]. \tag{10}$$

By combining Eq. 9 and Eq. 10, we obtain the desired conclusion. ∎
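The decoupling inequality of Lemma 2.10 can be verified exhaustively on small finite examples. In the sketch below, which is ours and purely illustrative, $Y$ and $Z$ are taken to be uniform on small finite sets (a special case chosen for simplicity) and the event $E$ is an arbitrary random truth table; both sides are computed exactly.

```python
from fractions import Fraction
from itertools import product
import random


def decoupling_sides(event, ys, zs):
    """Both sides of Lemma 2.10 for Y uniform on ys, Z uniform on zs, independent."""
    total = len(ys) * len(zs)
    single = Fraction(sum(1 for y in ys for z in zs if event(y, z)), total)
    fourfold = Fraction(
        sum(1 for y, y2 in product(ys, repeat=2) for z, z2 in product(zs, repeat=2)
            if event(y, z) and event(y2, z) and event(y, z2) and event(y2, z2)),
        total ** 2)
    return single ** 4, fourfold


rng = random.Random(2)
ys, zs = range(4), range(5)
# An arbitrary event, stored as a truth table indexed by (y, z).
table = {(y, z): rng.random() < 0.4 for y in ys for z in zs}
lhs, rhs = decoupling_sides(lambda y, z: table[y, z], ys, zs)
print(lhs, rhs, lhs <= rhs)
```

When the event is a product event (depending on $Y$ and $Z$ through independent conditions), the right-hand side equals $\Pr[E]^{2}$, which is already at least $\Pr[E]^{4}$; the loss incurred by decoupling shows up for events that genuinely couple $Y$ and $Z$.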

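Next, we explain how to use the above decoupling lemma for our purpose. For this discussion, recall Eq. 3. Fix a non-trivial partition $U_1 \sqcup U_2$ of $\{2, \dots, n\}$. Let $Y := (x_i)_{i \in U_1}$ and $Z := (x_i)_{i \in U_2}$. Let $E_{\alpha,c} := E_{\alpha,c}(Y,Z)$ denote the event that

$$Q_{\alpha,c}(Y,Z) := \alpha - \sum_{2 \leq i,j \leq n} c_{ij} x_i x_j = 0,$$

where $\alpha$ and $c := (c_{ij})_{2 \leq i,j \leq n}$ are fixed. Then, the previous lemma shows that

$$\Pr[E_{\alpha,c}(Y,Z)]^{4} \leq \Pr[E_{\alpha,c}(Y,Z) \cap E_{\alpha,c}(Y',Z) \cap E_{\alpha,c}(Y,Z') \cap E_{\alpha,c}(Y',Z')].$$

On the other hand, whenever the event on the right holds, we also have

$$Q_{\alpha,c}(Y,Z) - Q_{\alpha,c}(Y',Z) - Q_{\alpha,c}(Y,Z') + Q_{\alpha,c}(Y',Z') = 0.$$

Direct computation, using the symmetry $c_{ij} = c_{ji}$, shows that the left hand side equals $2R_c$, where

$$R_c := \sum_{i \in U_1} \sum_{j \in U_2} c_{ij} (x_i - x'_i)(x'_j - x_j) = \sum_{i \in U_1} R_i (x_i - x'_i),$$

$x'$ denotes an independent copy of $x$, and $R_i$ denotes the random sum $\sum_{j \in U_2} c_{ij}(x'_j - x_j)$. In particular, $R_c = 0$ whenever the event on the right holds. To summarize, we have deduced the following.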
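The ‘direct computation’ can be checked mechanically. The sketch below is ours: it uses 0-based indices, an arbitrary symmetric integer matrix in place of the cofactor matrix, and hypothetical helper names. It verifies that the alternating sum of the four quadratic forms equals twice the bilinear form $R_c$; the factor $2$ comes from $c_{ij} = c_{ji}$ and is irrelevant for the event that the expression vanishes.

```python
import random


def quadratic(alpha, c, x):
    """Q_{alpha,c}(x) = alpha - sum_{i,j} c[i][j] x[i] x[j]."""
    m = len(x)
    return alpha - sum(c[i][j] * x[i] * x[j] for i in range(m) for j in range(m))


def decoupled_difference(alpha, c, u1, x, xp):
    """Q(Y,Z) - Q(Y',Z) - Q(Y,Z') + Q(Y',Z'), with Y = coordinates in u1."""
    def mix(y_src, z_src):
        # Take the U1-coordinates from y_src and the remaining ones from z_src.
        return [y_src[k] if k in u1 else z_src[k] for k in range(len(x))]
    return (quadratic(alpha, c, mix(x, x)) - quadratic(alpha, c, mix(xp, x))
            - quadratic(alpha, c, mix(x, xp)) + quadratic(alpha, c, mix(xp, xp)))


def bilinear_form(c, u1, u2, x, xp):
    """R_c = sum_{i in U1} sum_{j in U2} c[i][j] (x_i - x'_i)(x'_j - x_j)."""
    return sum(c[i][j] * (x[i] - xp[i]) * (xp[j] - x[j]) for i in u1 for j in u2)


rng = random.Random(4)
m = 6
c = [[0] * m for _ in range(m)]
for i in range(m):
    for j in range(i, m):
        c[i][j] = c[j][i] = rng.randint(-5, 5)   # symmetric stand-in for the cofactors
u1, u2 = {0, 1, 2, 3}, {4, 5}
x = [rng.choice((-1, 1)) for _ in range(m)]
xp = [rng.choice((-1, 1)) for _ in range(m)]
print(decoupled_difference(7, c, u1, x, xp), 2 * bilinear_form(c, u1, u2, x, xp))
```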

###### Corollary 2.11.

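Let $U_1 \sqcup U_2$ be an arbitrary non-trivial partition of $\{2, \dots, n\}$. Let $w := (w_i)_{i \in U_1}$ be the random vector with coordinates $w_i := x_i - x'_i$. Then, with notation as above, and for any $(n-1) \times (n-1)$ symmetric matrix $A_{n-1}$, we have

$$\Pr_{M_n}\left[\operatorname{Rk}^{1}_{n-1}(n) \,\middle|\, M^{1}_{n-1} = A_{n-1}\right] \leq \Pr_{x, x'}\left[\sum_{i \in U_1} R_i w_i = 0 \,\middle|\, M^{1}_{n-1} = A_{n-1}\right]^{1/4}.$$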

Using this corollary, we thus see that

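$$\begin{aligned}
\Pr_{M_n}[\operatorname{Rk}^{1}_{n-1}(n) \cap \operatorname{ND}(n-1)]^{4}
&\leq \sum_{A_{n-1} \in \operatorname{ND}(n-1)} \Pr_{M_n}\left[\operatorname{Rk}^{1}_{n-1}(n) \,\middle|\, M^{1}_{n-1} = A_{n-1}\right]^{4} \Pr[M^{1}_{n-1} = A_{n-1}] \\
&\leq \sum_{A_{n-1} \in \operatorname{ND}(n-1)} \Pr_{x, x'}\left[\sum_{i \in U_1} R_i w_i = 0 \,\middle|\, M^{1}_{n-1} = A_{n-1}\right] \Pr[M^{1}_{n-1} = A_{n-1}] \\
&= \Pr_{x, x', M^{1}_{n-1}}\left[\left(\sum_{i \in U_1} R_i w_i = 0\right) \cap \operatorname{ND}(n-1)\right],
\end{aligned} \tag{11}$$

where the first inequality follows from Jensen’s inequality, and the second from Corollary 2.11. Hence, we have reduced the problem of bounding $\Pr[\operatorname{Rk}^{1}_{n-1}(n) \cap \operatorname{ND}(n-1)]$ to a linear anti-concentration problem.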

In order to use Eq. 11 profitably, we will rely on the following simple, but crucial, observation about the vector $R := (R_i)_{2 \leq i \leq n}$, where $R_i$ is defined as above.

###### Lemma 2.12.

$R$ is orthogonal to at least $n-1-|U_2|$ rows of $M^{1}_{n-1}$.

###### Proof.

Observe that $R$ is a linear combination of the columns of $\operatorname{adj}(M^{1}_{n-1})$ corresponding to the indices in $U_2$. By Eq. 6, each of these columns is orthogonal to each of the rows of $M^{1}_{n-1}$ with indices in $U_1$; therefore, the same is true for $R$. Since $|U_1| = n-1-|U_2|$, we are done. ∎
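Lemma 2.12 is an immediate consequence of the adjugate identity in Eq. 6, and it can be observed numerically. The sketch below is ours (0-based indices, hypothetical helper names): it builds $R$ from the columns of the adjugate indexed by $U_2$ and reports which rows of the matrix fail to be orthogonal to $R$; these always lie in $U_2$.

```python
import random


def det(m):
    """Exact determinant by Laplace expansion along the first row."""
    if not m:
        return 1
    return sum((-1) ** j * m[0][j] * det([r[:j] + r[j + 1:] for r in m[1:]])
               for j in range(len(m)))


def adjugate(m):
    """adj[i][j] is the signed cofactor obtained by deleting row j and column i."""
    n = len(m)
    return [[(-1) ** (i + j)
             * det([r[:i] + r[i + 1:] for k, r in enumerate(m) if k != j])
             for j in range(n)] for i in range(n)]


def non_orthogonal_rows(m, u2, x, xp):
    """Indices k with <row_k(m), R> != 0, where R_i = sum_{j in U2} adj[i][j] (x'_j - x_j)."""
    adj = adjugate(m)
    n = len(m)
    r_vec = [sum(adj[i][j] * (xp[j] - x[j]) for j in u2) for i in range(n)]
    return {k for k in range(n) if sum(m[k][i] * r_vec[i] for i in range(n)) != 0}


rng = random.Random(6)
n = 5
m = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(i, n):
        m[i][j] = m[j][i] = rng.choice((-1, 1))
u2 = {3, 4}
x = [rng.choice((-1, 1)) for _ in range(n)]
xp = [rng.choice((-1, 1)) for _ in range(n)]
print(non_orthogonal_rows(m, u2, x, xp))  # always a subset of u2
```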

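For $0 < \delta, \gamma \leq 1$, let $\operatorname{Orth}_{\delta, \gamma n}(n-1)$ denote the event – depending only on $M^{1}_{n-1}$ – that every integer non-zero vector which is orthogonal to at least $(1-\gamma)n$ rows of $M^{1}_{n-1}$ has $\mu$-atom probability (in $\mathbb{Z}$) at most $\delta$, uniformly for all $0 \leq \mu \leq 1/2$. Let $U_1 \sqcup U_2$ be a partition of $\{2, \dots, n\}$ where $|U_2| = \gamma n - 1$. Then, with the vector $R$ defined as above, we have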

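$$\begin{aligned}
\Pr_{x, x', M^{1}_{n-1}}\left[\left(\sum_{i \in U_1} R_i w_i = 0\right) \cap \operatorname{ND}(n-1)\right]
&\leq \Pr_{x, x', M^{1}_{n-1}}\left[\left(\sum_{i \in U_1} R_i w_i = 0\right) \cap \operatorname{Orth}_{\delta, \gamma n}(n-1) \cap \operatorname{ND}(n-1)\right] \\
&\quad + \Pr_{M^{1}_{n-1}}\left[\overline{\operatorname{Orth}_{\delta, \gamma n}(n-1)}\right] \\
&\leq \sum_{A_{n-1} \in \operatorname{Orth}_{\delta, \gamma n}(n-1) \cap \operatorname{ND}(n-1)} \Pr_{w}\left[\sum_{i \in U_1} R_i(A_{n-1})\, w_i = 0\right] \Pr_{M^{1}_{n-1}}[M^{1}_{n-1} = A_{n-1}] \\
&\quad + \Pr_{M^{1}_{n-1}}\left[\overline{\operatorname{Orth}_{\delta, \gamma n}(n-1)}\right].
\end{aligned} \tag{12}$$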

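As in Section 2.2, we will provide an upper bound on $\Pr_{w}\left[\sum_{i \in U_1} R_i(A_{n-1})\, w_i = 0\right]$ which is uniform in the choice of $A_{n-1}$. We start by observing that

$$\begin{aligned}
\Pr_{w}\left[\sum_{i \in U_1} R_i(A_{n-1})\, w_i = 0\right]
&\leq \Pr_{w}\left[\left(\sum_{i \in U_1} R_i(A_{n-1})\, w_i = 0\right) \cap \left(R(A_{n-1}) \neq 0\right)\right] + \Pr_{w}\left[R(A_{n-1}) = 0\right] \\
&= \Pr_{w}\left[\left(\sum_{i \in U_1} R_i(A_{n-1})\, w_i = 0\right) \cap \left(R(A_{n-1}) \neq 0\right)\right] + 2^{-|U_2|} \\
&\leq \Pr_{w}\left[\left(\sum_{i \in U_1} R_i(A_{n-1})\, w_i = 0\right) \cap \left(R(A_{n-1}) \neq 0\right)\right] + 2^{-\gamma n + 1}.
\end{aligned} \tag{13}$$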

To see why the second equality holds, observe as before that