
# On the Bivariate Erdős-Kac Theorem and Correlations of the Möbius Function

Alexander P. Mangerel
Department of Mathematics, University of Toronto
###### Abstract.

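Let such that . Let $\omega_y(n)$ denote the number of distinct prime factors $p$ of $n$ such that $p \le y$, and let $\mu_y(n) := \mu^2(n)(-1)^{\omega_y(n)}$, where $\mu$ is the Möbius function. We prove that if is not too large (in terms of ) then for each fixed $a \in \mathbb{N}$,

$$\sum_{n\le x}\mu_y(n)\mu_y(n+a) \ll x\left(\frac{1}{\log_2 y} + e^{-\frac{1}{21}\beta\log\beta}\right).$$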

This can be seen as a partial result towards the binary Chowla conjecture. Our main input is a quantitative bivariate analogue of the Erdős-Kac theorem regarding the distribution of the pairs $(\omega(n),\omega(n+a))$, where $n$ and $n+a$ both belong to any subset of the positive integers with suitable sieving properties; moreover, we show that the set of squarefree integers is an example of such a set. We end with a further application of this probabilistic result related to a problem of Erdős and Mirsky on the number of integers $n \le x$ such that $\tau(n) = \tau(n+1)$.

## 1. Introduction

### 1.1. On the Binary Chowla Conjecture

Let $\mu$ denote the Möbius function. It is well known that the Prime Number Theorem is equivalent to the statement that $\sum_{n\le x}\mu(n) = o(x)$. Thus, $\mu$ exhibits a lot of cancellation, and we in fact expect (according to the density hypothesis) that for each and sufficiently large, $\mu$ has a sign change in any interval of the form . We also expect that the sign changes are random and do not exhibit conspiratorial tendencies, such as $\mu(n)$ and $\mu(n+a)$ frequently changing sign simultaneously for a given, fixed $a$. One of the first enunciations of this latter principle is due to Chowla [2].

###### Conjecture 1.1 (Chowla).

Let $k \in \mathbb{N}$ and let $a_1 < a_2 < \cdots < a_k$ be integers. Then

$$\sum_{n\le x}\mu(n+a_1)\cdots\mu(n+a_k) = o(x). \tag{1}$$

In the binary case, i.e., $k = 2$, Chowla’s conjecture is the statement that for any fixed $a \in \mathbb{N}$,

$$\sum_{n\le x}\mu(n)\mu(n+a) = o(x). \tag{2}$$

This is currently not known for any $a$. In fact, even showing that the left side of (2) has absolute value at most $(1-\delta)x$, for some $\delta > 0$, was an intractable problem until very recently, when Matomäki and Radziwiłł were able to prove this as a consequence of their work on short averages of multiplicative functions (see Corollary 2 of [16] for the corresponding result with the Liouville function $\lambda$ in place of $\mu$).
There has been some remarkable recent progress on Conjecture 1.1 itself; we mention the following two notable examples. Tao [21] proved a logarithmically averaged version of (2), namely

$$\sum_{n\le x}\frac{\mu(n)\mu(n+a)}{n} = o(\log x). \tag{3}$$

Unfortunately, (3) is a strictly weaker estimate than (2). In a different direction, Matomäki, Radziwiłł and Tao [17] proved that if one averages over all shift vectors , where as , then (2) holds for almost every such .
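For small $x$ the sums in (2) and (3) can be tabulated directly. The following Python sketch is our own illustration and plays no role in the arguments; the function names are hypothetical. It computes $\mu$ with a sieve and prints the normalized binary correlation $x^{-1}\sum_{n\le x}\mu(n)\mu(n+a)$ alongside the logarithmic average from (3) divided by $\log x$. Finite computations of this kind say nothing about the limits in (2) and (3); they only make the objects concrete.

```python
import math


def mobius_table(n_max):
    """Return mu with mu[n] equal to the Möbius function at n, for 1 <= n <= n_max."""
    mu = [1] * (n_max + 1)
    mu[0] = 0
    is_composite = [False] * (n_max + 1)
    for p in range(2, n_max + 1):
        if is_composite[p]:
            continue
        # p is prime: flip the sign once for each distinct prime factor ...
        for m in range(p, n_max + 1, p):
            if m > p:
                is_composite[m] = True
            mu[m] = -mu[m]
        # ... and set mu to 0 on every multiple of p^2 (non-squarefree n).
        for m in range(p * p, n_max + 1, p * p):
            mu[m] = 0
    return mu


def binary_correlation(x, a):
    """Left side of (2): sum_{n <= x} mu(n) mu(n + a)."""
    mu = mobius_table(x + a)
    return sum(mu[n] * mu[n + a] for n in range(1, x + 1))


def log_correlation(x, a):
    """Left side of (3): sum_{n <= x} mu(n) mu(n + a) / n."""
    mu = mobius_table(x + a)
    return sum(mu[n] * mu[n + a] / n for n in range(1, x + 1))


if __name__ == "__main__":
    x = 10**5
    for a in (1, 2, 3):
        print(a, binary_correlation(x, a) / x, log_correlation(x, a) / math.log(x))
```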
We shall prove the following two partial results in the direction of (2), which provide further motivation for the binary case of Conjecture 1.1 for any fixed shift.
Set $\log_k x := \log(\log_{k-1} x)$ for $k \ge 2$, with $\log_1 x := \log x$, where $\log_k$ is the $k$th iterated logarithm and $x$ is sufficiently large.

###### Theorem 1.2.

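a) Let and let . Suppose that . For $n \in \mathbb{N}$ let $\omega_y(n)$ denote the number of primes $p$ dividing $n$ with $p \le y$, and let $\mu_y(n) := \mu^2(n)(-1)^{\omega_y(n)}$. Then for each fixed $a \in \mathbb{N}$,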

$$\sum_{n\le x}\mu_y(n)\mu_y(n+a) \ll x\left(\frac{1}{\log_2 y} + e^{-\frac{1}{21}\beta\log\beta}\right). \tag{4}$$

b) Let . If and , we have

$$\sum_{n\le x}\mu(n;u)\mu(n+a;v) \ll x\left(w\log(1/w) + (\log x)^{-\frac{1}{3}(u^2+v^2)}\right). \tag{5}$$

Note that (4) states that if we only account for those prime factors $p$ of integers $n$ such that $p \le y$, then the corresponding correlation sums of $\mu_y$ are small. In other words, those sign changes of $\mu(n)$ that are caused by “small primes” do not appear to correlate with those of $\mu(n+a)$. Estimate (5), while not directly related to (2), roughly shows that if we replace by for any that is not too small as then the corresponding correlation sums are also small.
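The truncated sum in (4) can be explored in the same way. The sketch below assumes $\mu_y(n) := \mu^2(n)(-1)^{\omega_y(n)}$, where $\omega_y(n)$ counts the distinct primes $p \le y$ dividing $n$; the helper names are hypothetical. Taking $y \ge x + a$ makes $\mu_y$ agree with $\mu$ on $[1, x+a]$, so the sum then reduces to the left side of (2).

```python
def truncated_mobius_table(n_max, y):
    """Return mu_y with mu_y[n] = mu(n)^2 * (-1)^{omega_y(n)} for 1 <= n <= n_max.

    Assumed definition: omega_y(n) counts the distinct primes p <= y dividing n.
    """
    squarefree = [True] * (n_max + 1)
    omega_y = [0] * (n_max + 1)
    is_composite = [False] * (n_max + 1)
    for p in range(2, n_max + 1):
        if is_composite[p]:
            continue
        for m in range(2 * p, n_max + 1, p):
            is_composite[m] = True
        for m in range(p * p, n_max + 1, p * p):
            squarefree[m] = False
        if p <= y:  # only the "small" primes contribute a sign
            for m in range(p, n_max + 1, p):
                omega_y[m] += 1
    mu_y = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        if squarefree[n]:
            mu_y[n] = (-1) ** omega_y[n]
    return mu_y


def truncated_correlation(x, a, y):
    """Left side of (4): sum_{n <= x} mu_y(n) mu_y(n + a)."""
    mu_y = truncated_mobius_table(x + a, y)
    return sum(mu_y[n] * mu_y[n + a] for n in range(1, x + 1))
```

For instance, one can compare `truncated_correlation(10**5, 1, y) / 10**5` for several values of $y$ to see how the truncation affects the sum at a fixed, small scale.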

###### Remark 1.3.

Results like (5) appear to be completely new, and provide a collection of examples in the direction of a general conjecture on the size of correlations of non-pretentious multiplicative functions, originally due to Elliott (see, for example, Conjecture 1.5 in [17]).
Results like (4) do exist in the literature, but in weaker forms. To the author’s knowledge, the first result of this form is Theorem 5 in [1], where a correlation estimate of the type of (4) is established for a truncated version of the Liouville function, i.e., $\lambda_y(n) := (-1)^{\Omega_y(n)}$, where $\Omega_y(n)$ is the number of prime factors $p$ of $n$ with $p \le y$, counted with multiplicity. However, the parameter $y$ is restricted in such a way that .
Subsequently, Daboussi and Sárközy [3] established a more general theorem, applicable to the truncation of any 1-bounded multiplicative function, in which one may take any . As a particular case, they show that

$$\sum_{n\le x}\mu_y(n)\mu_y(n+1) \ll x\left(\frac{1}{(\log y)^{9}} + e^{-\beta/8}\right).$$

Their result is not explicitly stated for shifts other than 1, though this restriction appears to be merely technical. We emphasize, however, that our probabilistic view of the problem motivates the methods that we employ in this paper, and these are substantially different from those used in [3]. Among other things, Brun’s sieve is used in [3], whereas in the present paper we appeal to the Rosser-Iwaniec sieve (see Lemma 2.4 below). This difference accounts for the additional factor of $\log\beta$ in the exponent of the second term in (4) above. In particular, when our estimate is superior to theirs. We believe that this improvement is worthwhile, considering that the main interest in a result like (4) is its effectiveness when $y$ is as close to $x$ as possible, in support of Conjecture 1.1.
We also note that a ternary extension of the result in [3] was worked out by Ganguli in his thesis [9]. This extension essentially follows the same method of proof as that in [3].

As our results suggest, the main focus of our arguments is on the behaviour of the pairs $(\omega(n),\omega(n+a))$, where $n$ and $n+a$ are squarefree. Indeed, observe that we can express (2) equivalently in the form

$$\sum_{n\le x}\mu^2(n)\mu^2(n+a)e^{iu\omega(n)}e^{iv\omega(n+a)} = o(x),$$

where $u = v = \pi$. A moment’s reflection (assuming one has probabilistic inclinations) suggests that the left-hand side, viewed as a function of $u$ and $v$, resembles some variant of a characteristic function for a random vector. That is, suppose $X$ is a random vector on a fixed probability space $(\Omega,\mathcal{F},\mathbb{P})$, and let $\sigma_X$ denote its law. We recall that the characteristic function of $X$ is the Fourier transform of $\sigma_X$, i.e.,

$$\phi_X(t) := \mathbb{E}\left[e^{iX\cdot t}\right] = \int_{\mathbb{R}^2}e^{iu\cdot t}\,d\sigma_X(u).$$

Let denote the set of integers $n \le x$ such that $n$ and $n+a$ are both squarefree. Choosing the finite probability space given by with its power set and normalized counting measure, the characteristic function of the random vector $(\omega(n),\omega(n+a))$ is

$$\tilde\phi_x(t) := E^*(x;a)^{-1}\sum_{n\le x}\mu^2(n)\mu^2(n+a)e^{i(\omega(n),\omega(n+a))\cdot t},$$

where $E^*(x;a) := |\{n\le x : \mu^2(n) = \mu^2(n+a) = 1\}|$. In particular, (2) is equivalent to the statement that $\tilde\phi_x(\pi,\pi) = o(1)$ as $x\to\infty$. In the next subsection, we discuss an analogue of this estimate that is achievable, by way of a quantitative, bivariate generalization of the Erdős-Kac theorem. This is the crucial input into the proof of Theorem 1.2.
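The link between $\tilde\phi_x$ and (2) is the pointwise identity $e^{i\pi(\omega(n)+\omega(n+a))} = \mu(n)\mu(n+a)$ for squarefree $n$ and $n+a$. The Python sketch below (hypothetical helper names; an illustration only) computes the empirical characteristic function on the squarefree pairs and checks that $E^*(x;a)\,\tilde\phi_x(\pi,\pi)$ reproduces $\sum_{n\le x}\mu(n)\mu(n+a)$ up to rounding.

```python
import cmath
import math


def omega_and_squarefree(n_max):
    """omega[n] = number of distinct prime factors of n; squarefree[n] = True iff mu(n) != 0."""
    omega = [0] * (n_max + 1)
    squarefree = [True] * (n_max + 1)
    for p in range(2, n_max + 1):
        if omega[p] == 0:  # no smaller prime divides p, so p is prime
            for m in range(p, n_max + 1, p):
                omega[m] += 1
            for m in range(p * p, n_max + 1, p * p):
                squarefree[m] = False
    return omega, squarefree


def empirical_char_fn(x, a, t1, t2):
    """Return (phi_tilde_x(t1, t2), E*(x; a)) over the n <= x with n and n + a squarefree."""
    omega, squarefree = omega_and_squarefree(x + a)
    support = [n for n in range(1, x + 1) if squarefree[n] and squarefree[n + a]]
    total = sum(cmath.exp(1j * (t1 * omega[n] + t2 * omega[n + a])) for n in support)
    return total / len(support), len(support)
```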

###### Remark 1.4.

In principle, our arguments extend to -ary correlations of and , as well as to certain classes of unimodular multiplicative functions. We postpone such extensions to a separate paper.

###### Remark 1.5.

In a sense, the approach we use as support for Chowla’s conjecture is misguided. In light of an argument of Tao [22], it is known that the estimate

$$\sum_{n\le x}\mu(n)\mu(an+2) \ll_\epsilon \frac{x}{(\log x)^{2+\epsilon}}, \tag{6}$$

for each $\epsilon > 0$, would be sufficient to prove the Twin Prime conjecture. The notorious Parity Problem in Sieve Theory prevents the Twin Prime conjecture from being tractable via sieve methods alone; hence, it would seem that any attempt at proving Chowla’s conjecture must also invoke additional arguments that break the parity barrier. Our approach, as outlined in Section 2, involves generalizing a probabilistic framework of Kubilius in order to analyze joint distributions, and makes use of a composition of Rosser-Iwaniec sieves (see Lemma 2.4). Thus, our method is heuristically insufficient to provide a full proof of Chowla’s conjecture. In spite of this, we are content to provide here a basic framework for correlation problems upon which further investigations may build.

### 1.2. A Quantitative Bivariate Erdős-Kac Theorem

Given , let

$$\tilde\omega_x(n) := \frac{\omega(n)-\log_2 x}{\sqrt{\log_2 x}},$$

where $\omega(n)$ is the number of distinct prime factors of $n$. Furthermore, for $z \in \mathbb{R}$ let

$$H_x(z) := x^{-1}\left|\{n\le x : \tilde\omega_x(n)\le z\}\right|.$$

$H_x$ is the distribution function for $\tilde\omega_x$, a centred and normalized version of $\omega$. It is well known that if we define a collection of indicator functions $1_p$, indexed by primes $p$, such that $1_p(n) = 1$ if $p \mid n$ and $1_p(n) = 0$ otherwise, then $\omega(n) = \sum_p 1_p(n)$ for every $n$, and the values of $1_p$ and $1_q$, for distinct primes $p$ and $q$, are only asymptotically independent as $x\to\infty$. In spite of this, $\omega$ still behaves like a sum of genuinely independent random variables, in the sense that it obeys a Central Limit Theorem. Indeed, this conclusion is furnished by the Erdős-Kac theorem, which states that $H_x \to \Phi$ almost everywhere as $x\to\infty$, where

$$\Phi(z) := \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{z}e^{-\frac{1}{2}t^2}\,dt$$

is the distribution function of a standard Gaussian random variable. The rate of convergence of $H_x$ to the normal distribution has also been studied in this connection, and a best-possible estimate for the distance between $H_x$ and $\Phi$ was conjectured by LeVeque [15] and proven by Rényi and Turán [18] to be $O\big((\log_2 x)^{-1/2}\big)$. This result echoes the best-possible rate in the Lindeberg Central Limit Theorem, given by the Berry-Esseen Theorem (see Chapter XVI of [7]).
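To see $\tilde\omega_x$, $H_x$ and $\Phi$ side by side, the sketch below computes the Kolmogorov distance $\sup_{z}|H_x(z)-\Phi(z)|$ for a given $x$, normalizing with $\log_2 x = \log\log x$ as in the definition of $\tilde\omega_x$. It is an illustration only, and the function names are ours.

```python
import math


def omega_table(n_max):
    """omega[n] = number of distinct prime factors of n."""
    omega = [0] * (n_max + 1)
    for p in range(2, n_max + 1):
        if omega[p] == 0:  # p is prime
            for m in range(p, n_max + 1, p):
                omega[m] += 1
    return omega


def normal_cdf(z):
    """Phi(z), the standard Gaussian distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kolmogorov_distance(x):
    """sup_z |H_x(z) - Phi(z)|, with omega normalized by log_2 x = log log x."""
    omega = omega_table(x)
    L = math.log(math.log(x))
    values = sorted((omega[n] - L) / math.sqrt(L) for n in range(1, x + 1))
    dist = 0.0
    # H_x is a right-continuous step function and Phi is increasing, so the supremum
    # is attained or approached at a jump: compare Phi with H_x just before and at each jump.
    for i, v in enumerate(values):
        phi = normal_cdf(v)
        dist = max(dist, abs((i + 1) / x - phi), abs(i / x - phi))
    return dist
```

Since the Rényi-Turán rate is only $(\log_2 x)^{-1/2}$, the distance decays extremely slowly, and one should not expect it to be small at any $x$ within reach of this kind of computation.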
One expects that since $n$ and $n+a$ share only finitely many common prime factors, the values of $\omega(n)$ and $\omega(n+a)$ are also asymptotically independent as $x\to\infty$. It is therefore reasonable to guess that a bivariate analogue of the Erdős-Kac theorem should hold, in which the distribution function $H_x$ is replaced by the two-dimensional analogue

$$H'_x(z,z') := x^{-1}\left|\{n\le x : \tilde\omega_x(n)\le z,\ \tilde\omega_x(n+a)\le z'\}\right|,$$

and the Gaussian limit distribution $\Phi$ is replaced by the uncorrelated bivariate Gaussian distribution

$$\Phi^{(2)}(z,z') := \Phi(z)\Phi(z').$$

This was in fact proven by LeVeque [15], though his result does not yield an effective rate of decay for $\|H'_x-\Phi^{(2)}\|_{L^\infty(\mathbb{R}^2)}$.

###### Remark 1.6.

Actually, more general theorems than that of LeVeque regarding the existence and characterization of joint distributions of additive functions and their shifts appear in the literature; for a synthesis of these results, see Chapter VII of [14]. However, the methods employed to prove these results, e.g., the Cramér-Wold trick, are typically not useful for producing quantitative results. In a related though distinct vein, see Chapter V in [14] for quantitative results related to the distribution of sums , where each is a real-valued, strongly multiplicative function.

In the sequel, we will concern ourselves with a restricted analogue of LeVeque’s theorem. Specifically, let $a \in \mathbb{N}$ be fixed, let $E^*(x;a)$ and $\tilde\omega_x$ be as above, and for $z, z' \in \mathbb{R}$ let

$$F_{x,a}(z,z') := E^*(x;a)^{-1}\left|\{n\le x : \mu^2(n)=\mu^2(n+a)=1,\ \tilde\omega_x(n)\le z,\ \tilde\omega_x(n+a)\le z'\}\right|.$$

We shall prove the following.

###### Theorem 1.7.

Let $a \in \mathbb{N}$. Then as $x\to\infty$,

$$\|F_{x,a}-\Phi^{(2)}\|_{L^\infty(\mathbb{R}^2)} \ll (\log_3 x)(\log_2 x)^{-\frac{1}{4}}.$$

We will actually prove a rather more general result, Theorem 2.3, which gives an $L^\infty$ distance estimate as in Theorem 1.7 for a distribution function associated to pairs $(\omega(n),\omega(n+a))$, where $n$ and $n+a$ belong to any subset of $\mathbb{N}$ with suitable sieve properties. Theorem 1.7 is a corollary of this (in this connection, see Proposition 2.2).
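Theorem 1.7 concerns the sup-norm over all of $\mathbb{R}^2$; evaluating $|F_{x,a}-\Phi^{(2)}|$ on a finite grid of points $(z,z')$ only yields a lower bound for that norm. The sketch below (our illustration; the names are hypothetical) does exactly this for the squarefree pairs counted by $E^*(x;a)$, again normalizing with $\log_2 x$.

```python
import math


def omega_and_squarefree(n_max):
    """omega[n] = number of distinct prime factors of n; squarefree[n] = True iff mu(n) != 0."""
    omega = [0] * (n_max + 1)
    squarefree = [True] * (n_max + 1)
    for p in range(2, n_max + 1):
        if omega[p] == 0:  # p is prime
            for m in range(p, n_max + 1, p):
                omega[m] += 1
            for m in range(p * p, n_max + 1, p * p):
                squarefree[m] = False
    return omega, squarefree


def normal_cdf(z):
    """Phi(z), the standard Gaussian distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def grid_distance(x, a, grid):
    """max over z, z' in grid of |F_{x,a}(z, z') - Phi(z) Phi(z')|.

    This is only a lower bound for the sup-norm appearing in Theorem 1.7.
    """
    omega, squarefree = omega_and_squarefree(x + a)
    L = math.log(math.log(x))
    s = math.sqrt(L)
    points = [((omega[n] - L) / s, (omega[n + a] - L) / s)
              for n in range(1, x + 1) if squarefree[n] and squarefree[n + a]]
    e_star = len(points)
    worst = 0.0
    for z in grid:
        for z_prime in grid:
            count = sum(1 for u, v in points if u <= z and v <= z_prime)
            worst = max(worst, abs(count / e_star - normal_cdf(z) * normal_cdf(z_prime)))
    return worst
```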

###### Remark 1.8.

As in the univariate case, we expect that the best error term here is $O\big((\log_2 x)^{-1/2}\big)$ (to see that this is best possible, see Lemma 5.2). As mentioned, Rényi and Turán [18] obtained this error term in the classical Erdős-Kac theorem using analytic methods from multiplicative number theory. Indeed, their proof depends crucially on the fact that the characteristic function of the distribution they consider is

$$u \mapsto x^{-1}\sum_{n\le x}e^{iu\omega(n)},$$

the mean value of the multiplicative function $n \mapsto e^{iu\omega(n)}$ for $n \le x$. This can be estimated quite precisely for each $u$ by the Selberg-Delange method (see Chapter II.5 of [23]). Such techniques are not at our disposal, however, and our methods are not sufficiently powerful to yield the best-possible estimate.

###### Remark 1.9.

As an application of the more general Theorem 2.3 below, one can deduce an asymptotic formula for the number of integers (without the squarefree restriction) such that and , where each is sufficiently close to , and is fixed. We note in this connection that Goudout (see Théorème 3 in [10]) has recently proved an upper bound of the correct order of magnitude for the number of such integers that is uniform over all , given any fixed .

It will be convenient for us to use a slight modification of the function $\tilde\phi_x$ defined in the previous subsection. For $u, v \in \mathbb{R}$ let

$$\phi_{x,a}(u,v) := E^*(x;a)^{-1}\sum_{n\le x}\mu^2(n)\mu^2(n+a)e^{i(u\tilde\omega_x(n)+v\tilde\omega_x(n+a))}.$$

In terms of $\phi_{x,a}$, (2) is equivalent to the statement that $\phi_{x,a}(u,v) = o(1)$, where $u = v = \pi\sqrt{\log_2 x}$. Note that for this choice of $u$ and $v$, $e^{-\frac{1}{2}(u^2+v^2)} = o(1)$, so it would suffice to know that $\phi_{x,a}(u,v) - e^{-\frac{1}{2}(u^2+v^2)} = o(1)$ to prove (2). By a well-known theorem of Lévy, the convergence in distribution $F_{x,a}\to\Phi^{(2)}$ is equivalent to the pointwise convergence of the corresponding characteristic functions, i.e., $\phi_{x,a}(u,v)\to e^{-\frac{1}{2}(u^2+v^2)}$ as $x\to\infty$ for each $(u,v)\in\mathbb{R}^2$. The assertion that this continues to hold when $u$ and $v$ are allowed to grow as functions of $x$ is more subtle and difficult. While we do not demonstrate that the convergence is uniform in general, we develop a method to prove an effective bound, in terms of $u$, $v$ and $x$, for the distance between $\phi_{x,a}(u,v)$ and $e^{-\frac{1}{2}(u^2+v^2)}$ in a range of $u$ and $v$ depending on $x$ (which unfortunately does not include $u = v = \pi\sqrt{\log_2 x}$). We shall use this bound in conjunction with a bivariate analogue of a smoothing lemma of Esseen (see Lemma 3.1) to prove Theorem 1.7.
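The reduction of (2) to $\phi_{x,a}$ rests on the identity $\phi_{x,a}(u,u) = e^{-2\pi i\log_2 x}\,\tilde\phi_x(\pi,\pi)$ for $u = \pi\sqrt{\log_2 x}$, which follows from $u\,\tilde\omega_x(n) = \pi\omega(n)-\pi\log_2 x$. In particular $|\phi_{x,a}(u,u)| = E^*(x;a)^{-1}\left|\sum_{n\le x}\mu(n)\mu(n+a)\right|$, while the Gaussian value at that point is $e^{-\pi^2\log_2 x}$. The sketch below (hypothetical names, illustration only) computes both quantities.

```python
import cmath
import math


def omega_and_squarefree(n_max):
    """omega[n] = number of distinct prime factors of n; squarefree[n] = True iff mu(n) != 0."""
    omega = [0] * (n_max + 1)
    squarefree = [True] * (n_max + 1)
    for p in range(2, n_max + 1):
        if omega[p] == 0:  # p is prime
            for m in range(p, n_max + 1, p):
                omega[m] += 1
            for m in range(p * p, n_max + 1, p * p):
                squarefree[m] = False
    return omega, squarefree


def phi_xa(x, a, u, v):
    """phi_{x,a}(u, v), with omega centred and scaled by log_2 x = log log x."""
    omega, squarefree = omega_and_squarefree(x + a)
    L = math.log(math.log(x))
    s = math.sqrt(L)
    support = [n for n in range(1, x + 1) if squarefree[n] and squarefree[n + a]]
    total = sum(cmath.exp(1j * (u * (omega[n] - L) / s + v * (omega[n + a] - L) / s))
                for n in support)
    return total / len(support)


def gaussian_cf(u, v):
    """Characteristic function of the uncorrelated standard bivariate Gaussian."""
    return math.exp(-0.5 * (u * u + v * v))
```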

### 1.3. On a Problem of Erdős and Mirsky

For a positive-valued multiplicative function $f$ let

$$S_f(x) := \left|\{n\le x : f(n) = f(n+1)\}\right|.$$

It is generally a difficult problem even to determine whether $S_f(x)$ tends to infinity with $x$. Much of the literature on problems of this type relates to the case where $f = \tau$, the divisor function, and this shall be our focus as well. We thus henceforth write $S(x)$ to mean $S_\tau(x)$.
Erdős and Mirsky [5] famously conjectured that $S(x)\to\infty$, i.e., that there are infinitely many integers $n$ such that $\tau(n) = \tau(n+1)$. Based on ideas of C. Spiro, Heath-Brown proved this conjecture in the affirmative, giving the lower bound $S(x) \gg x/(\log x)^7$. Erdős, Pomerance and Sárközy have made the following conjecture regarding the order of magnitude of $S(x)$.

###### Conjecture 1.10 ([6]).

We have $S(x) \asymp x/\sqrt{\log_2 x}$.

The latter three authors proved that the upper bound in Conjecture 1.10 holds, and cite a heuristic argument due to Bateman and Spiro as further motivation for this conjecture. In the opposite direction, Hildebrand [13] has shown that $S(x) \gg x/(\log_2 x)^3$. See the beginning of Section 6 for our version of the Bateman-Spiro heuristic.
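The counting function $S(x)$ is straightforward to compute for moderate $x$. The sketch below (illustrative; the names are ours) tabulates $\tau$ with a divisor sieve and prints $S(x)$ divided by $x/\sqrt{\log_2 x}$, the order of magnitude predicted by Conjecture 1.10. Given how slowly $\log_2 x$ grows, such ratios are not evidence either way at this scale.

```python
import math


def divisor_counts(n_max):
    """tau[n] = number of positive divisors of n, for 1 <= n <= n_max."""
    tau = [0] * (n_max + 1)
    for d in range(1, n_max + 1):
        for m in range(d, n_max + 1, d):
            tau[m] += 1
    return tau


def count_equal_divisor_pairs(x):
    """S(x) = #{n <= x : tau(n) = tau(n + 1)}."""
    tau = divisor_counts(x + 1)
    return sum(1 for n in range(1, x + 1) if tau[n] == tau[n + 1])


if __name__ == "__main__":
    for x in (10**3, 10**4, 10**5):
        scale = x / math.sqrt(math.log(math.log(x)))
        print(x, count_equal_divisor_pairs(x) / scale)
```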
Implementing techniques related to those used to prove Theorem 1.2 and Theorem 2.3, we can prove a partial result in this direction. Let .

###### Theorem 1.11.

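Let such that if then . Let $\tau_y(n)$ be the number of divisors $d$ of $n$ such that if $p \mid d$ then $p \le y$. Then

$$\left|\{n\le x : \tau_y(n) = \tau_y(n+1)\}\right| \gg \frac{x}{\sqrt{\log_2 x}}.$$

In Section 6 we actually prove a more general result on the number of $n \le x$ such that $\tau_y(n) = \tau_y(n+1)$, in a range of $y$ depending on $x$ that includes . See Theorem 6.1 for a precise statement.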
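The truncated count in Theorem 1.11 can be computed in the same way. The sketch below assumes that $\tau_y(n)$ counts the divisors of $n$ all of whose prime factors are at most $y$, that is, $\tau_y(n) = \prod_{p^k\parallel n,\ p\le y}(k+1)$; the function names are hypothetical. For $y \ge x+1$ the count reduces to $S(x)$.

```python
def smallest_prime_factor(n_max):
    """spf[n] = least prime factor of n for n >= 2 (spf[0] = 0, spf[1] = 1)."""
    spf = list(range(n_max + 1))
    for p in range(2, int(n_max ** 0.5) + 1):
        if spf[p] == p:  # p is prime
            for m in range(p * p, n_max + 1, p):
                if spf[m] == m:
                    spf[m] = p
    return spf


def tau_y(n, y, spf):
    """Number of divisors d of n whose prime factors are all <= y (assumed definition)."""
    count = 1
    while n > 1:
        p = spf[n]
        k = 0
        while n % p == 0:
            n //= p
            k += 1
        if p <= y:  # primes above y contribute only the trivial choice p^0
            count *= k + 1
    return count


def count_equal_truncated_pairs(x, y):
    """#{n <= x : tau_y(n) = tau_y(n + 1)}."""
    spf = smallest_prime_factor(x + 1)
    return sum(1 for n in range(1, x + 1) if tau_y(n, y, spf) == tau_y(n + 1, y, spf))
```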
Our methods also extend to the more general question of estimating from below the number of integers with for , and with more effort, to questions such as determining how often , for and .

### 1.4. Notation and Conventions

Throughout this paper, will always stand for positive integers, and will always denote primes. We write $\log_k x$ to mean $\log(\log_{k-1} x)$, with $\log_1 x := \log x$, and $x$ is always assumed to be sufficiently large so that these quantities are positive. We will frequently write to mean and . We will also denote by the shift map defined on .
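We shall employ the usual conventions of Probability Theory. We denote by $\mathbb{P}$ a fixed probability measure on a measurable space $(\Omega,\mathcal{F})$ on which a given random vector $X : \Omega \to \mathbb{R}^n$ is defined. If $A \subseteq \mathbb{R}^n$ is Borel measurable, then we will write $\mathbb{P}(X \in A)$ in place of $\mathbb{P}(X^{-1}(A))$. We write $\mathcal{L}(X)$ to denote the measure $\mathbb{P}\circ X^{-1}$, which we call the law of $X$. Obviously, the law of any random vector is a probability measure on $\mathbb{R}^n$ equipped with its Borel $\sigma$-algebra. We let $\mathbb{E}X$ denote the expectation of $X$, i.e.,

$$\mathbb{E}X := \int_\Omega X\,d\mathbb{P} = \int_{\mathbb{R}^n}t\,d\mathcal{L}(X)(t).$$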

Given two probability measures $\nu_1$ and $\nu_2$ on the measurable space $(\Omega,\mathcal{B})$, we define the total variation distance between them by

$$d_{TV}(\nu_1,\nu_2) := \sup_{A\in\mathcal{B}}|\nu_1(A)-\nu_2(A)|.$$
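On a finite set with its power set as $\sigma$-algebra, the supremum defining $d_{TV}$ is attained at $A = \{s : \nu_1(\{s\}) > \nu_2(\{s\})\}$, which gives the standard closed form $d_{TV}(\nu_1,\nu_2) = \frac{1}{2}\sum_{s}|\nu_1(\{s\})-\nu_2(\{s\})|$. The sketch below (illustrative names) compares a brute-force supremum over all subsets with this formula for small discrete measures; the brute force is exponential in the size of the support, so it is only meant for toy examples.

```python
from itertools import combinations


def tv_bruteforce(nu1, nu2):
    """sup over all subsets A of |nu1(A) - nu2(A)|; nu1, nu2 map points to masses."""
    points = list(set(nu1) | set(nu2))
    best = 0.0
    for r in range(len(points) + 1):
        for subset in combinations(points, r):
            diff = sum(nu1.get(p, 0.0) - nu2.get(p, 0.0) for p in subset)
            best = max(best, abs(diff))
    return best


def tv_half_l1(nu1, nu2):
    """Closed form on a finite space: half the L^1 distance between the mass functions."""
    points = set(nu1) | set(nu2)
    return 0.5 * sum(abs(nu1.get(p, 0.0) - nu2.get(p, 0.0)) for p in points)
```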

## 2. A Bivariate Kubilius Model and Multivariate Poisson Approximation

In this section we give a general framework to study the distribution of the vector $(\omega(n),\omega(n+a))$, where $n$ and $n+a$ are confined to subsets of $\mathbb{N}$ with suitable properties. To be precise, we introduce the following definition.

###### Definition 2.1.

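Let $\mathcal{S}\subseteq\mathbb{N}$ and let $a\in\mathbb{N}$. We say that $\mathcal{S}$ is siftable with respect to $a$ if: a) we have

$$E(x;a) := \sum_{\substack{n\le x\\ (n,a)=1}}1_{\mathcal{S}}(n)1_{\mathcal{S}}(n+a) \gg_a x,$$

where $1_{\mathcal{S}}$ is the indicator function of $\mathcal{S}$, and b) there exist a non-negative multiplicative function $f$ and a real number $\theta$ such that: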
i) for each prime ,
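ii) for each pair of coprime squarefree integers $k_1, k_2$ with we have

$$\sum_{\substack{n\le x\\ n\equiv 0\ (k_1),\ n+a\equiv 0\ (k_2)}}1_{\mathcal{S}}(n)1_{\mathcal{S}}(n+a)1_{(n,a)=1} = f(k_1k_2)\big(E(x;a)+R(x;k_1,k_2)\big), \tag{7}$$

and the remainders satisfy

$$\sum_{k_1k_2\le x^\theta}|R(x;k_1,k_2)| \ll \frac{x}{(\log x)^3}.$$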

Finally, we will say that a siftable set is regular if for all sufficiently large.

Without loss of generality, we can, and will, assume that whenever .
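The simplest instance of Definition 2.1 is $\mathcal{S} = \mathbb{N}$. If $k_1, k_2$ are coprime and $(k_1k_2, a) = 1$, then by the Chinese remainder theorem the conditions $k_1 \mid n$, $k_2 \mid n+a$ and $(n,a) = 1$ are independent modulo $ak_1k_2$, which suggests taking $f(k) = 1/k$ for such $k$; if $k_1$ or $k_2$ shares a prime with $a$, then $(n,a) > 1$ is forced and the left side of (7) vanishes. This choice of $f$ is our own illustration. The sketch below (hypothetical names) measures the proportion appearing in (7) directly.

```python
import math


def sift_ratio(x, a, k1, k2):
    """Among n <= x with (n, a) = 1, the proportion with k1 | n and k2 | n + a (case S = N)."""
    base = [n for n in range(1, x + 1) if math.gcd(n, a) == 1]
    hits = sum(1 for n in base if n % k1 == 0 and (n + a) % k2 == 0)
    return hits / len(base)
```

When $x$ is a multiple of $ak_1k_2$ the proportion equals $1/(k_1k_2)$ exactly; otherwise the discrepancy comes from incomplete residue classes at the end of the range, which is the kind of error that the remainder $R(x;k_1,k_2)$ in (7) absorbs.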
Trivially, $\mathbb{N}$ is siftable and regular. Less trivially:

###### Proposition 2.2.

The set of squarefree integers is siftable and regular with respect to each fixed $a\in\mathbb{N}$.

We shall prove this fact in Section 4, as it is crucial in our application to both Theorems 1.2 and 2.3. By a similar argument, the set of $k$-free integers is siftable for each $k \ge 2$. We leave the verification of this statement to the reader.
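Given a siftable set $\mathcal{S}$ and a fixed $a$, let $E(x;a)$ denote the number of $n\le x$ coprime to $a$ such that $n, n+a\in\mathcal{S}$, and let $E^*(x;a)$ denote the number of such $n$ without the coprimality condition. In this context, put and

$$\tilde\omega_{x,\mathcal{S}}(n) := \frac{\omega(n)-\lambda(x)}{\sqrt{\lambda(x)}},$$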

and set

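$$\begin{aligned}
F_{x,\mathcal{S},a}(z,z') &:= E(x;a)^{-1}\left|\{n\le x : n,n+a\in\mathcal{S},\ (n,a)=1,\ \tilde\omega_{x,\mathcal{S}}(n)\le z,\ \tilde\omega_{x,\mathcal{S}}(n+a)\le z'\}\right|,\\
F^*_{x,\mathcal{S},a}(z,z') &:= E^*(x;a)^{-1}\left|\{n\le x : n,n+a\in\mathcal{S},\ \tilde\omega_{x,\mathcal{S}}(n)\le z,\ \tilde\omega_{x,\mathcal{S}}(n+a)\le z'\}\right|.
\end{aligned}$$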

For convenience, we will write and in place of and , respectively, whenever and are clearly defined. We shall establish an estimate for the rate of convergence of the limiting process as . In the statement below we denote by the shift map , for , and we write to denote the composition of with itself times.

###### Theorem 2.3.

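Let $a\in\mathbb{N}$ be fixed and let $\mathcal{S}$ be a siftable, regular set with respect to $a$. Then

$$\|F_{x,\mathcal{S},a}-\Phi^{(2)}\|_{L^\infty(\mathbb{R}^2)} \ll (\log_3 x)(\log_2 x)^{-\frac{1}{4}}.$$

Moreover, if is multiplicative and is also siftable for each divisor of then

$$\|F^*_{x,\mathcal{S},a}-\Phi^{(2)}\|_{L^\infty(\mathbb{R}^2)} \ll (\log_3 x)(\log_2 x)^{-\frac{1}{4}}.$$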

In particular, the limiting distribution of the vectors in the set is the uncorrelated bivariate Gaussian distribution.

Let $\mathcal{S}$ be a siftable set and let $a\in\mathbb{N}$ be fixed. Let where , and put . Our goal is to approximate the vector on by a discrete random vector. In this direction, it is sufficient to construct a measure against which there is a collection of independent random vectors that are good approximations for the deterministic events , for distinct primes dividing .
Given coprime $k_1, k_2$, define the set

$$E_{k_1,k_2} := \{n\le x : (n,a)=1,\ n,n+a\in\mathcal{S},\ k_1\mid n,\ k_2\mid(n+a),\ (n,P/k_1)=(n+a,P/k_2)=1\}.$$

We define a set function

$$\nu_y(E_{k_1,k_2}) := E(x;a)^{-1}|E_{k_1,k_2}|.$$

Note that the sets $E_{k_1,k_2}$ are mutually disjoint, and their union is precisely the set of all $n\le x$ coprime to $a$ such that $n, n+a\in\mathcal{S}$. We use a sieve to estimate each individual $\nu_y(E_{k_1,k_2})$.
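The partition property can be checked numerically for the squarefree integers of Proposition 2.2. The sketch assumes $P := \prod_{p\le y}p$, which is our reading of the definition of $P$; the function names are hypothetical. Each admissible $n$ is placed in the unique set $E_{k_1,k_2}$ with $k_1 = (n,P)$ and $k_2 = (n+a,P)$; these are coprime because $(n,n+a) = (n,a) = 1$, and $(n,P/k_1) = 1$ because $P$ is squarefree.

```python
import math
from collections import Counter


def squarefree_table(n_max):
    """sqf[n] is True iff n is squarefree."""
    sqf = [True] * (n_max + 1)
    for d in range(2, math.isqrt(n_max) + 1):
        for m in range(d * d, n_max + 1, d * d):
            sqf[m] = False
    return sqf


def primorial(y):
    """Product of the primes p <= y (assumed form of P)."""
    P = 1
    for p in range(2, y + 1):
        if all(p % q for q in range(2, math.isqrt(p) + 1)):
            P *= p
    return P


def partition_counts(x, a, y):
    """Map (k1, k2) -> |E_{k1,k2}| for S = squarefree integers."""
    sqf = squarefree_table(x + a)
    P = primorial(y)
    counts = Counter()
    for n in range(1, x + 1):
        if math.gcd(n, a) == 1 and sqf[n] and sqf[n + a]:
            # n lies in exactly one E_{k1,k2}: the one with k1 = (n, P), k2 = (n + a, P).
            counts[(math.gcd(n, P), math.gcd(n + a, P))] += 1
    return counts
```

The values $\nu_y(E_{k_1,k_2})$ are then `counts[(k1, k2)] / sum(counts.values())`, since the total is $E(x;a)$.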

###### Lemma 2.4.

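Let and . Put , for . Then if $k_1, k_2$ are coprime,

$$\nu_y(E_{k_1,k_2}) = \left(1+O\left(\max_{j=1,2}\left\{e^{-(1+o(1))s_j\log s_j}+(\log D_j)^{-\frac{1}{6}}\right\}\right)\right)f(k_1)f(k_2)\prod_{p\mid P/k_1k_2}(1-2f(p)) + O\left(E(x;a)^{-1}\sum_{m_1m_2\le k_1k_2D_1D_2}f(k_1)f(k_2)|R(x;m_1,m_2)|\right).$$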
###### Proof.

For $j = 1, 2$, let $\lambda_j^+$ and $\lambda_j^-$ be upper and lower bound sieves, i.e., sequences of real numbers that satisfy

$$(1*\lambda_j^-)(d) \le (1*\mu)(d) \le (1*\lambda_j^+)(d)$$

for each $d$ with . We take $\lambda_j^+$ and $\lambda_j^-$ to be the upper and lower bound sieve weights of Rosser-Iwaniec, respectively (see Chapter 11 of [8]).
We can compose these sieve weights to produce upper and lower bound sieves by defining the two-dimensional weights

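$$\begin{aligned}
\varphi^+(m,n) &:= (1*\lambda_1^+)(m)(1*\lambda_2^+)(n),\\
\varphi^-(m,n) &:= (1*\lambda_1^-)(m)(1*\lambda_2^+)(n)+(1*\lambda_1^+)(m)(1*\lambda_2^-)(n)-(1*\lambda_1^+)(m)(1*\lambda_2^+)(n).
\end{aligned}$$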

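That $\varphi^+$ is an upper bound sieve weight is immediate; that $\varphi^-$ is a lower bound sieve weight follows from the identity

$$\varphi^-(m,n) = (1*\lambda_1^-)(m)(1*\lambda_2^-)(n)-\big((1*\lambda_1^+)(m)-(1*\lambda_1^-)(m)\big)\big((1*\lambda_2^+)(n)-(1*\lambda_2^-)(n)\big).$$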
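The upper and lower bound properties of the composed weights only use the pointwise inequalities $(1*\lambda_j^-)(d) \le (1*\mu)(d) \le (1*\lambda_j^+)(d)$ together with $(1*\mu)(d)\in\{0,1\}$. Writing (in our own notation) $A^\pm := (1*\lambda_1^\pm)(m)$, $B^\pm := (1*\lambda_2^\pm)(n)$, $s := (1*\mu)(m)$ and $t := (1*\mu)(n)$, one can check that $st-\varphi^-(m,n) = (A^+-s)(B^+-t)+A^+(t-B^-)+B^+(s-A^-)$, and each term on the right is non-negative. The Python sketch below (hypothetical names) verifies the identity above, this decomposition, and $\varphi^- \le st \le \varphi^+$ exhaustively on a small box of integer values.

```python
from itertools import product


def composed_weights(a_lo, a_hi, b_lo, b_hi):
    """Return (phi_minus, phi_plus) from A^- = a_lo, A^+ = a_hi, B^- = b_lo, B^+ = b_hi."""
    phi_plus = a_hi * b_hi
    phi_minus = a_lo * b_hi + a_hi * b_lo - a_hi * b_hi
    return phi_minus, phi_plus


def check_composition(spread=3):
    """Exhaustively test the identity and phi^- <= s*t <= phi^+ on a small integer box."""
    for s, t in product((0, 1), repeat=2):  # s = (1*mu)(m), t = (1*mu)(n)
        boxes = product(range(s - spread, s + 1), range(s, s + spread + 1),
                        range(t - spread, t + 1), range(t, t + spread + 1))
        for a_lo, a_hi, b_lo, b_hi in boxes:
            phi_minus, phi_plus = composed_weights(a_lo, a_hi, b_lo, b_hi)
            # The identity stated in the text.
            assert phi_minus == a_lo * b_lo - (a_hi - a_lo) * (b_hi - b_lo)
            # s*t - phi^- written as a sum of non-negative terms.
            assert s * t - phi_minus == (a_hi - s) * (b_hi - t) + a_hi * (t - b_lo) + b_hi * (s - a_lo)
            assert phi_minus <= s * t <= phi_plus
    return True
```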

Therefore, by Definition 2.1 with $X := E(x;a)$,

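\begin{align}
|E_{k_1,k_2}| &\le X\sum_{d_1\mid P/k_1}\sum_{\substack{d_2\mid P/k_2\\(k_1d_1,k_2d_2)=1}}\lambda_1^+(d_1)\lambda_2^+(d_2)f(k_1k_2d_1d_2)+\sum_{d_1\mid P/k_1}\sum_{\substack{d_2\mid P/k_2\\(k_1d_1,k_2d_2)=1}}f(k_1k_2)|R(x;k_1d_1,k_2d_2)|, \tag{8}\\
|E_{k_1,k_2}| &\ge X\sum_{d_1\mid P/k_1}\sum_{\substack{d_2\mid P/k_2\\(k_1d_1,k_2d_2)=1}}\big(\lambda_1^+(d_1)\lambda_2^-(d_2)+\lambda_1^-(d_1)\lambda_2^+(d_2)-\lambda_1^+(d_1)\lambda_2^+(d_2)\big)f(k_1k_2d_1d_2) \tag{9}\\
&\quad -\sum_{d_1\mid P/k_1}\sum_{\substack{d_2\mid P/k_2\\(k_1d_1,k_2d_2)=1}}f(k_1k_2)|R(x;k_1d_1,k_2d_2)|. \tag{10}
\end{align}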

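Since $f$ is multiplicative, for any of the sign pairs $(\eta_1,\eta_2)\in\{+,-\}^2$,

$$\sum_{d_1\mid P/k_1}\sum_{\substack{d_2\mid P/k_2\\(k_1d_1,k_2d_2)=1}}\lambda_1^{\eta_1}(d_1)\lambda_2^{\eta_2}(d_2)f(k_1k_2d_1d_2) = f(k_1)f(k_2)\sum_{\substack{d_1\mid P/k_1\\(d_1,k_2)=1}}\lambda_1^{\eta_1}(d_1)f(d_1)\sum_{\substack{d_2\mid P/k_2\\(d_2,k_1d_1)=1}}\lambda_2^{\eta_2}(d_2)f(d_2).$$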

Put . Then by Theorem 11.12 of [8], we have

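$$\begin{aligned}
\sum_{\substack{d_1\mid P/k_1\\(d_1,k_2)=1}}\lambda_1^{\eta_1}(d_1)f(d_1)\sum_{\substack{d_2\mid P/k_2\\(d_2,k_1d_1)=1}}\lambda_2^{\eta_2}(d_2)f(d_2) &= \sum_{d_1\mid P/k_1}\lambda_1^{\eta_1}(d_1)h_{k_2}(d_1)\sum_{d_2\mid P}\lambda_2^{\eta_2}(d_2)h_{k_1k_2d_1}(d_2)\\
&= \left(1+O\left(e^{-s_2\log s_2}+(\log D_2)^{-\frac{1}{6}}\right)\right)\sum_{d_1\mid P/k_1}\lambda_1^{\eta_1}(d_1)h_{k_2}(d_1)\prod_{p\mid P/(k_1k_2d_1)}(1-f(p))\\
&= \left(1+O\left(e^{-(1+o(1))s_2\log s_2}+(\log D_2)^{-\frac{1}{6}}\right)\right)\prod_{p\mid P/k_1k_2}(1-f(p))\sum_{d_1\mid P/k_1}\lambda_1^{\eta_1}(d_1)h_{k_2,k_1}(d_1)\prod_{p\mid d_1}(1-f(p))^{-1}\\
&= \left(1+O\left(\max_{j=1,2}\left\{e^{-(1+o(1))s_j\log s_j}+(\log D_j)^{-\frac{1}{6}}\right\}\right)\right)\prod_{p\mid P/k_1k_2}(1-f(p))\left(1-f(p)(1-f(p))^{-1}\right)\\
&= \left(1+O\left(\max_{j=1,2}\left\{e^{-(1+o(1))s_j\log s_j}+(\log D_j)^{-\frac{1}{6}}\right\}\right)\right)\prod_{p\mid P/k_1k_2}(1-2f(p)).
\end{aligned}$$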

This last estimate together with (8) and (10) gives

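$$\begin{aligned}
|E_{k_1,k_2}| &= \left(1+O\left(\max_{j=1,2}\left\{e^{-(1+o(1))s_j\log s_j}+(\log D_j)^{-\frac{1}{6}}\right\}\right)\right)Xf(k_1)f(k_2)\prod_{p\mid P/k_1k_2}(1-2f(p))\\
&\quad + O\left(\sum_{d_1\le D_1,\,d_2\le D_2}f(k_1k_2)|R(x;k_1d_1,k_2d_2)|\right).
\end{aligned}$$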

Moreover, we can trivially bound the remainder by

$$\sum_{d_1\le D_1,\,d_2\le D_2}|R(x;k_1d_1,k_2d_2)| \le \sum_{m_1m_2\le D_1D_2k_1k_2}|R(x;m_1,m_2)|.$$

This completes the proof. ∎
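The last step in the chain of equalities in the proof above uses, prime by prime, the elementary identity $(1-f(p))\big(1-f(p)(1-f(p))^{-1}\big) = 1-2f(p)$. A quick exact check with Python's standard `fractions` module follows; the sample values $f(p) = 1/p$ are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def stepwise_product(f_values):
    """prod over the given f(p) of (1 - f)(1 - f / (1 - f)), computed exactly."""
    result = Fraction(1)
    for f in f_values:
        result *= (1 - f) * (1 - f / (1 - f))
    return result


def target_product(f_values):
    """prod over the given f(p) of (1 - 2 f), computed exactly."""
    result = Fraction(1)
    for f in f_values:
        result *= 1 - 2 * f
    return result


if __name__ == "__main__":
    sample = [Fraction(1, p) for p in (3, 5, 7, 11, 13)]
    print(stepwise_product(sample) == target_product(sample))
```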

###### Corollary 2.5.

Fix with , where $\theta$ is associated to $\mathcal{S}$ as in Definition 2.1. Let and . Then

$$\nu_y(E_{k_1,k_2}) = \left(1+O\left(e^{-\frac{\theta}{8}\beta\log\beta}+(\log x)^{-\frac{1}{6}}\right)\right)f(k_1)f(k_2)\prod_{p\mid P/k_1k_2}(1-2f(p)).$$
###### Proof.

Choose in Lemma 2.4 so that . Note that . Thus, since $\mathcal{S}$ is siftable, the sum of the remainder terms in Lemma 2.4 can be estimated as

$$\sum_{m_1m_2\le k_1k_2D_1D_2}|R(x;m_1,m_2)| \le \sum_{m_1m_2\le x^\theta}|R(x;m_1,m_2)| \ll \frac{x}{\log^3 x}.$$

Note moreover that