# Primes and polynomials with restricted digits

James Maynard Magdalen College, Oxford, England, OX1 4AU
###### Abstract.

Let $q$ be a sufficiently large integer, and $a_0\in\{0,\dots,q-1\}$. We show there are infinitely many prime numbers which do not have the digit $a_0$ in their base $q$ expansion. Similar results are obtained for values of a polynomial (satisfying the necessary local conditions) and if multiple digits are excluded.

## 1. Introduction

Let $a_0\in\{0,\dots,q-1\}$ and let

$$A=\Bigl\{\sum_{i\ge 0}n_iq^i : n_i\in\{0,\dots,q-1\}\setminus\{a_0\}\Bigr\}$$

be the set of numbers which have no digit equal to $a_0$ when written in base $q$. For fixed $q$, the number of elements of $A$ which are less than $x$ is $O(x^{1-c_q})$, where $c_q=\log(q/(q-1))/\log q>0$. In particular, $A$ is a sparse subset of the natural numbers. A set being sparse in this way presents several analytic difficulties if one tries to answer arithmetic questions such as whether the set contains infinitely many primes. Typically we can only show that sparse sets contain infinitely many primes when the set in question possesses some additional multiplicative structure.
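The sparsity of the set can be checked directly for small parameters. The following is a minimal sketch, assuming $a_0\neq 0$ so that leading zeros play no role; the function names `digits`, `in_A` and `count_A_below` are illustrative and do not come from the paper. It counts the integers in $[0,q^k)$ with no base-$q$ digit equal to $a_0$ and compares the count with $(q-1)^k$.

```python
def digits(n, q):
    """Base-q digits of n, least significant first (0 has the single digit 0)."""
    if n == 0:
        return [0]
    out = []
    while n > 0:
        n, r = divmod(n, q)
        out.append(r)
    return out


def in_A(n, q, a0):
    """True if no base-q digit of n equals a0."""
    return a0 not in digits(n, q)


def count_A_below(q, a0, k):
    """Number of n in [0, q**k) with no base-q digit equal to a0."""
    return sum(1 for n in range(q ** k) if in_A(n, q, a0))


if __name__ == "__main__":
    q, a0 = 10, 7  # illustrative choice with a0 != 0
    for k in range(1, 5):
        # For a0 != 0 the two printed counts agree: each of the k digits has q - 1 choices.
        print(k, count_A_below(q, a0, k), (q - 1) ** k)
```

Since $(q-1)^k=(q^k)^{\log(q-1)/\log q}$, the count grows like a power of $x=q^k$ with exponent strictly less than 1.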

The set has unusually nice structure in that its Fourier transform has a convenient explicit analytic description, and is often unusually small in size. There has been much previous work [1, 2, 4, 5, 6, 10, 13] studying and related sets by exploiting this Fourier structure. In particular the work of Dartyge and Mauduit [7, 8] shows the existence of infinitely many integers in with at most prime factors, this result relying on the fact that is well distributed in arithmetic progressions [7, 12, 14]. We also mention the related work of Mauduit-Rivat [15] who showed the sum of digits of primes was well-distributed, and the work of Bourgain [3] which showed the existence of primes in the sparse set created by prescribing a positive proportion of the digits.

We show that there are infinitely many primes in $A$, and that any polynomial satisfying suitable local conditions takes infinitely many values in $A$, provided the base $q$ is sufficiently large (i.e. provided $A$ is not too sparse). Our proof is based on the circle method, and in particular makes key use of the Fourier structure of $A$, in the same spirit as the aforementioned works. Somewhat surprisingly, the Fourier structure is sufficient to deduce the existence of primes in $A$ using only existing exponential sum estimates for the primes, and without having to investigate further bilinear sums.

###### Theorem 1.1.

Let , and be the set of numbers with no digit in base equal to . Then for any constant we have

 ∑n

where

$$\kappa_q(a_0)=\begin{cases}\dfrac{q}{q-1}, & \text{if } (a_0,q)\neq 1,\\[2ex] \dfrac{q(\phi(q)-1)}{(q-1)\phi(q)}, & \text{if } (a_0,q)=1.\end{cases}$$

Thus there are infinitely many primes with no digit when written in base . There is nothing special about the fact we sum up to a power of ; one could sum up to instead of and have instead of in the statement.
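The constant $\kappa_q(a_0)$ is simple to evaluate exactly. The sketch below is illustrative only: the names `euler_phi` and `kappa` are not from the paper, and the small values of $q$ used are far below the range in which Theorem 1.1 applies, so this only demonstrates the arithmetic of the case formula. One heuristic reading (an interpretation, not a statement from the text) is that the final digit of a large prime must be coprime to $q$, and excluding $a_0$ removes one of the $\phi(q)$ admissible final digits exactly when $(a_0,q)=1$.

```python
from fractions import Fraction
from math import gcd


def euler_phi(q):
    """Euler's totient by direct count (adequate for small illustrative q)."""
    return sum(1 for m in range(1, q + 1) if gcd(m, q) == 1)


def kappa(q, a0):
    """kappa_q(a0) as given by the two-case formula in Theorem 1.1."""
    if gcd(a0, q) != 1:
        return Fraction(q, q - 1)
    phi = euler_phi(q)
    return Fraction(q * (phi - 1), (q - 1) * phi)


if __name__ == "__main__":
    for a0 in range(10):
        print(a0, kappa(10, a0))
```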

We have made no particular effort to optimize the lower bound on ; it is likely that it could be improved significantly. In particular, a more involved calculation shows that is sufficient by the same method, whilst it appears that the method of bilinear sums, Harman’s sieve and zero density estimates all have the potential to show the existence of primes missing digits when the base is noticeably smaller. One might conjecture that the result would remain true for all .

As presented here the bound is ineffective due to the reliance on estimates for primes in arithmetic progressions. However, since these estimates are only used when the modulus is highly composite, in fact Siegel zeros do not play a role, and so the error terms could be replaced by effective ones of size if desired.

An analysis of our method reveals that in fact one can choose digits , and we obtain the same statement for primes with uniformly over all such choices of .

Our results hold for sufficiently large not only because we require to be not too sparse, but also because we separately get superior control on the Fourier transform of as . A similar feature was present in the earlier work [11].

###### Theorem 1.2.

Let , and be a polynomial of degree with lead coefficient . Then for any we have

 ∑P(n)

where

 S(P)=limJ→∞#{(n,m):0≤n,m

Given a polynomial it is a straightforward computation to determine whether , in which case it takes infinitely many values in , or whether in which case it takes finitely many values in . (This is because is a disjoint union of open balls and a finite set of points in the -adic topology.) In particular, by Hensel lifting we see that Theorem 1.2 shows that there are infinitely many powers in , provided that .

Again we have made no particular effort to optimize the lower bound on . It is clear that the statement must require to grow with , since the main term is only larger than if is large enough in terms of . Presumably this bound would be improved if one used stronger bounds of Vinogradov type for the Weyl sums which appear rather than bounds based on Weyl-differencing for large , and by less crude numerical bounds. We note that although the implied constant in the error term in the statement of the Theorem depends on the coefficients of , the lower bound on depends only on the degree.

###### Theorem 1.3.

Let , and let be sufficiently large in terms of . Let be distinct and let be the set of -digit numbers in base with no digit in the set . Then we have

 ∑n

where .

Moreover, if are consecutive integers then the same result holds provided only that and is sufficiently large in terms of .

In the case of consecutive with we see that Theorem 1.3 shows the existence of primes in a set containing elements less than . The exponent is ultimately related to the exponent of Lemma 4.2 for an exponential sum over primes, and represents a limit of our basic method. As with Theorem 1.1, one would hope that utilizing Type I-II sums and Harman’s sieve would extend this to sets of smaller density.

The conclusion of Theorem 1.3 holds in the case and , so one can choose . Thus there are infinitely many prime numbers with no string of 15 consecutive base 10 digits being the same. (Again, we expect 15 to be able to be reduced with slightly more effort.)

An analogous statement for the set for polynomial values also holds, but in the more restrictive region for arbitrary or for consecutive .

## 2. Notation

We use $e(x)=e^{2\pi i x}$ as the complex exponential and $\|\cdot\|$ to denote the distance to the nearest integer. We will use various expressions of the form $\min(M,\|x\|^{-1})$, which are interpreted to take the value $M$ if $\|x\|=0$. We use to abbreviate . Any implied constants in asymptotic notation $\ll$ or $O(\cdot)$ are allowed to depend on the base $q$ and, when dealing with polynomials as in Theorem 1.2, the polynomial $P$, but on no other quantity unless explicitly indicated by a subscript. Outside of Section 4 all quantities should be thought of as . In particular, will implicitly be assumed to be larger than any fixed constant.

## 3. Outline

We give an informal sketch of the overall outline of the proof, which is essentially an application of the Hardy-Littlewood circle method. We let $\hat F_{q^k}$ be the Fourier transform (over ) of the set $A$ restricted to . Thus for we have

$$\hat F_{q^k}(\theta)=\sum_{n\le q^k}\mathbf{1}_A(n)e(n\theta)=\prod_{i=0}^{k-1}\Bigl(\sum_{0\le n_i\le q-1}\mathbf{1}_A(n_i)e(n_iq^i\theta)\Bigr).$$

Here we have written . It is this factorization of and the fact that the sum over is almost a geometric series which allows us very good Fourier control over . By Fourier inversion on

 1A(n)=1qk∑0≤a

Thus

 ∑n≤qkΛ(n)1A(n)=1qk∑0≤a

where

$$S_{\Lambda,q^k}(\theta)=\sum_{n\le q^k}\Lambda(n)e(n\theta).$$
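Both the product formula for the Fourier transform and the Fourier-inversion identity behind the circle method can be checked numerically for tiny parameters. The following is a sketch under stated assumptions: sums run over $0\le n<q^k$ with zero-padded $k$-digit expansions (a convention chosen here so that the factorization is exact), the inversion is taken over residues $a$ modulo $q^k$, and all function names are illustrative rather than taken from the paper.

```python
import cmath
import math


def e(x):
    """Complex exponential e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * math.pi * x)


def in_A(n, q, a0, k):
    """True if none of the k base-q digits of n (zero-padded) equals a0."""
    for _ in range(k):
        n, digit = divmod(n, q)
        if digit == a0:
            return False
    return True


def von_mangoldt(n):
    """Lambda(n) = log p if n is a power of a prime p, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
        p += 1
    return math.log(n)  # no divisor up to sqrt(n), so n is prime


def F_direct(q, k, a0, theta):
    """Fourier transform of A restricted to [0, q^k), summed term by term."""
    return sum(e(n * theta) for n in range(q ** k) if in_A(n, q, a0, k))


def F_product(q, k, a0, theta):
    """The same quantity via the digit-by-digit product formula."""
    result = 1.0 + 0.0j
    for i in range(k):
        result *= sum(e(d * q ** i * theta) for d in range(q) if d != a0)
    return result


def S_Lambda(q, k, theta):
    """Exponential sum over 0 <= n < q^k weighted by the von Mangoldt function."""
    return sum(von_mangoldt(n) * e(n * theta) for n in range(q ** k))


def weighted_count_direct(q, k, a0):
    """Sum of Lambda(n) over n in A, 0 <= n < q^k."""
    return sum(von_mangoldt(n) for n in range(q ** k) if in_A(n, q, a0, k))


def weighted_count_fourier(q, k, a0):
    """The same sum written as (1/q^k) * sum_a F(a/q^k) * S(-a/q^k)."""
    N = q ** k
    total = sum(F_product(q, k, a0, a / N) * S_Lambda(q, k, -a / N)
                for a in range(N))
    return (total / N).real


if __name__ == "__main__":
    q, k, a0 = 3, 3, 1  # tiny illustrative parameters
    print(abs(F_direct(q, k, a0, 0.123) - F_product(q, k, a0, 0.123)))
    print(weighted_count_direct(q, k, a0), weighted_count_fourier(q, k, a0))
```

The product form needs only about $kq$ exponentials per evaluation instead of $q^k$, which is what makes $\hat F_{q^k}$ so tractable.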

We split the contribution up depending on whether $a/q^k$ is close to a rational with small denominator or not. This distinguishes between those $a$ for which $S_{\Lambda,q^k}(a/q^k)$ is large and those for which it is not. It turns out that $\hat F_{q^k}(\theta)$ is large if $\theta$ is ‘close’ to a number with few non-zero base-$q$ digits, but these are somewhat rare and ‘spread out’ except when close to a rational with denominator being a small power of $q$, and so it turns out that this decomposition is adequate for describing $\hat F_{q^k}$ as well as $S_{\Lambda,q^k}$.

If is the set of such that for some integers of and some with of size , we use a - bound to show their contribution is at most

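$$\sup_{a\in D}\Bigl|S_{\Lambda,q^k}\Bigl(\frac{a}{q^k}\Bigr)\Bigr|\sum_{a\in D}\frac{1}{q^k}\Bigl|\hat F_{q^k}\Bigl(\frac{a}{q^k}\Bigr)\Bigr|.$$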

One can save a small power of over the trivial bound on for . By using a large-sieve type argument (and the analytic description of ) we show equidistribution for a truncated version of of

$$\sum_{a/q^k=\ell/d+\beta}\Bigl|\hat F_J\Bigl(\frac{a}{q^k}\Bigr)\Bigr|\approx J\int_0^1|\hat F_J(\theta)|\,d\theta,$$

where . We then use the explicit analytic description of to obtain a final bound which is unusually strong. In particular, we make important use of the averaging over different . This bound loses only a small power of over the size of the largest individual terms in the sum. Crucially this power decreases to 0 as , whilst the power saving in was independent of , and so we have an overall saving of a small power of if is sufficiently large. This saving shows that these ‘minor arc’ contributions when is large are negligible.

Thus only those which are very close to a rational (i.e. is small) make a noticeable contribution. In this case the problem simply reduces to estimating primes and elements of separately in short intervals and arithmetic progressions. For primes this is well known, whilst for the set this follows from a suitable bound on .

After writing this paper, the author discovered that very similar ideas appeared earlier in the literature, notably in [15, 12, 14, 3]. For simplicity we give an essentially self-contained proof, but emphasize to the reader that many of the Lemmas appearing here are not new. It appears possible (at least in the case when the base is large) that an argument similar to the one here might simplify or extend other arguments in the study of digit-related functions.

Much of the previous work on estimating correlations of primes with digit-related functions relied on exploiting a certain property of the Fourier transform described in [16] as the ‘carry property’, which often allowed one to simplify bilinear expressions so the Fourier transform only relied on the lower-order digits. This feature is not present in our work.

## 4. Exponential sums for primes and polynomials

We first collect some results for exponential sums for primes and polynomials. The bounds here are well-known, but we give essentially complete proofs since they differ slightly from some standard references.

###### Lemma 4.1.

Let with coprime integers and satisfying . Then we have

$$\sum_{n=1}^{N}\min\bigl(M,\|\alpha n\|^{-1}\bigr)\ll\Bigl(N+NMd|\beta|+\frac{1}{d|\beta|}+d\Bigr)\log N.$$

###### Proof.

If then we let for non-negative integers with and . If then

$$\|\alpha n\|=\|n_0a/d+\beta n\|\ge\|n_0a/d\|-\|\beta n\|\ge\|n_0a/d\|/2$$

since . We let be such that . Thus the terms with contribute a total

 ≪∑n1

The terms with contribute

 ≪∑1≤n1

Here we have used the fact that since and we sum over we must have .

We now consider the case . We let , with , and . Thus we obtain

 N∑n=1min(M,∥αn∥−1)≪∑cn1≤1/d2βn2≪Nβ/d∑0≤n0

where we have put for convenience. The inner sum is of the form for points which are separated. Therefore the sum over is

 ≪dlogd+sup0≤m0

since for all . The term contributes to the total sum, which is acceptable. Thus we are left to bound

 ∑m2≪Mdβsupθ∈R∑m1≤1/dβmin(N,d∥θ+(m1+O(1))d2β∥).

The inner sum is of the form ( copies of) for points which are -separated . Therefore the inner sum is , and this gives a bound

$$\ll Nd|\beta|\Bigl(M+\frac{1}{d|\beta|}\Bigr)\log N\ll\bigl(MNd|\beta|+N\bigr)\log N.$$

Putting these bounds together gives

$$\sum_{n=1}^{N}\min\bigl(M,\|\alpha n\|^{-1}\bigr)\ll\Bigl(N+NMd|\beta|+\frac{1}{d|\beta|}+d\Bigr)\log N.$$
∎

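To get a feel for the sum bounded in Lemma 4.1, one can evaluate it directly and compare it with the shape of the right-hand side. This is only a sketch: the implied constant is not specified, so the printed ratio is merely indicative, and the function names and sample parameters are illustrative choices rather than anything taken from the paper.

```python
import math


def dist_to_nearest_integer(x):
    """||x||, the distance from x to the nearest integer."""
    return abs(x - round(x))


def min_sum(alpha, N, M):
    """sum_{n=1}^{N} min(M, 1/||alpha*n||), taking the value M when ||alpha*n|| = 0."""
    total = 0.0
    for n in range(1, N + 1):
        dist = dist_to_nearest_integer(alpha * n)
        total += M if dist == 0 else min(M, 1.0 / dist)
    return total


def bound_shape(N, M, d, beta):
    """(N + N*M*d*|beta| + 1/(d*|beta|) + d) * log N, with no implied constant."""
    b = abs(beta)
    return (N + N * M * d * b + 1.0 / (d * b) + d) * math.log(N)


if __name__ == "__main__":
    a, d, beta = 3, 7, 1e-4  # alpha = a/d + beta with gcd(a, d) = 1 (sample values)
    N, M = 2000, 500.0
    lhs = min_sum(a / d + beta, N, M)
    print(lhs, bound_shape(N, M, d, beta), lhs / bound_shape(N, M, d, beta))
```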
###### Lemma 4.2.

Let with and . Then

 SΛ,x(α)=∑n
###### Proof.

From [9, (6), Page 142], taking we have that for any choice of with

 ∑n

The sum over is clearly and the sum over is similarly . Putting and into dyadic intervals and applying Lemma 4.1 to the resulting sums (or the trivial bound when ) gives a bound

$$\ll\Bigl(UV+xd|\beta|+\frac{1}{d|\beta|}+d+\frac{x}{U^{1/2}}+\frac{x}{V^{1/2}}+x|d\beta|^{1/2}+\frac{x^{1/2}}{|d\beta|^{1/2}}+x^{1/2}d^{1/2}\Bigr)(\log x)^4.$$

Choosing and simplifying the terms then gives the result. ∎

###### Lemma 4.3.

Let be an integer polynomial of degree with lead coefficient . Let be such that with and . Then for any constant we have

 SP,x(α)=∑P(n)
###### Proof.

If is an interval contained in then

 ∣∣∑n∈Ie(αP(n))∣∣2=∑|h|

where is an interval contained in , and is a polynomial of degree with lead coefficient . Applying this and Cauchy’s inequality times gives

 ∣∣∑n∈Ie(αP(n))∣∣2r−1 ≤(2x)2r−1−r∑|h1|,…,|hr−1|

where we have put . We split the sum depending on whether or not, for some quantity which we choose later. This shows that the inner sum is of size

 ≪∑HBxτr−1(H)2B ≪B(xr−1+xrd|β|+1d|β|+d)logx+xr(logx)(r−1)2B

by applying Lemma 4.1. Writing this bound as and choosing then gives the result, noting that . ∎
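The proof of Lemma 4.3 starts from the standard Weyl differencing step, which rewrites the squared modulus of a polynomial exponential sum as a double sum over shifts $h$ and over the sub-interval $I_h=\{n\in I: n+h\in I\}$. The sketch below checks that identity numerically for a sample cubic; it assumes the usual form of the identity, and the function names, the polynomial and the value of $\alpha$ are illustrative.

```python
import cmath
import math


def e(x):
    """Complex exponential e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * math.pi * x)


def weyl_sum(alpha, P, I):
    """sum_{n in I} e(alpha * P(n)) for an integer range I."""
    return sum(e(alpha * P(n)) for n in I)


def differenced_sum(alpha, P, I):
    """sum_h sum_{n in I_h} e(alpha * (P(n+h) - P(n))), where I_h = {n in I : n+h in I}."""
    lo, hi = I.start, I.stop
    total = 0j
    for h in range(-(hi - lo) + 1, hi - lo):
        # n must satisfy lo <= n < hi and lo <= n + h < hi
        for n in range(max(lo, lo - h), min(hi, hi - h)):
            total += e(alpha * (P(n + h) - P(n)))
    return total


if __name__ == "__main__":
    P = lambda n: n ** 3 + 2 * n  # sample polynomial of degree 3
    alpha, I = 0.37, range(0, 12)
    print(abs(weyl_sum(alpha, P, I)) ** 2, differenced_sum(alpha, P, I))
```

For each fixed $h$ the difference $P(n+h)-P(n)$ has degree one lower in $n$, which is why repeating the step eventually reduces to linear sums of the type handled by Lemma 4.1.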

## 5. Fourier analysis

We now establish in turn several properties of the function , which are the key ingredient in our result.

###### Lemma 5.1 ($L^1$ bound).

There exists a constant such that

 supθ∈R∑0≤a
###### Proof.

We expand out the definition of , and let be the base- expansion of .

 ^Fqk(t)

The sum over is a sum over all values in , and so is bounded by

 (5.1)

For , we write with and . We see that for some . In particular, if . Thus we see that

 supθ∈R∑0≤a

Here we used a small computation to verify for all integers , whilst for (where is Euler’s constant), we have . (This bound is only relevant to the final lower bound on ; for a qualitative statement a bound suffices.) ∎
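The size of the $L^1$ sum $\sum_{0\le a<q^k}|\hat F_{q^k}(\theta+a/q^k)|$ can be explored numerically for small $q$ and $k$ using the product formula. This is a rough sketch with illustrative names and parameters; it assumes the zero-padded convention $0\le n<q^k$, evaluates the sum only at a single $\theta$ rather than taking a supremum, and makes no claim about the constant $C_q$.

```python
import cmath
import math


def e(x):
    """Complex exponential e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * math.pi * x)


def F_hat(q, k, a0, theta):
    """Fourier transform of A restricted to [0, q^k), via the product formula."""
    result = 1.0 + 0.0j
    for i in range(k):
        result *= sum(e(d * q ** i * theta) for d in range(q) if d != a0)
    return result


def l1_sum(q, k, a0, theta=0.0):
    """sum over 0 <= a < q^k of |F_hat(theta + a/q^k)|."""
    N = q ** k
    return sum(abs(F_hat(q, k, a0, theta + a / N)) for a in range(N))


if __name__ == "__main__":
    q, a0 = 7, 3  # illustrative parameters
    for k in range(1, 5):
        s = l1_sum(q, k, a0)
        # The k-th root gives the growth per digit; compare it with q - 1 and q.
        print(k, s, s ** (1.0 / k))
```

The term $a=0$ alone contributes $(q-1)^k$, so the interest lies in how little the per-digit growth rate exceeds $q-1$ in comparison with $q$.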

###### Lemma 5.2 (Large sieve estimate).

We have

 supθ∈R∑d∼D∑0<ℓ

Here is the constant described in Lemma 5.1.

###### Proof.

We have that

$$\hat F_{q^k}(t)=\hat F_{q^k}(u)+\int_u^t\hat F'_{q^k}(v)\,dv.$$

Thus integrating over we have

$$|\hat F_{q^k}(t)|\ll\frac{1}{\delta}\int_{t-\delta}^{t+\delta}|\hat F_{q^k}(u)|\,du+\int_{t-\delta}^{t+\delta}|\hat F'_{q^k}(u)|\,du.$$

We note that the fractions with , and are separated from one another by . Thus

 ∑d∼D∑0<ℓ

We note that, writing we have

 ^F′qk(t) =2πi∑n≤qkn1A(n)e(nt) =2πik−1∑j=0qj(∑0≤nj

Thus, as in Lemma 5.1, we have

and we have the same bound for but without the factor. We let for some and . We see that, as in Lemma 5.1 we have

 ∫10k−1∏i=0min(q,1+12∥qit∥)dt ≪1qk∑t1,…,tk

Putting this all together then gives the result. ∎

###### Lemma 5.3 (Hybrid estimate).

Let . Then we have

 ∑d∼D∑ℓ

where is the constant described in Lemma 5.1 and

$$\alpha_q=\frac{\log\bigl(C_q\frac{q}{q-1}\log q\bigr)}{\log q}.$$

###### Proof.

The result follows immediately from Lemma 5.1 if , so we may assume . For any integer we have

 ^Fqk(α) =k−k1−1∏i=0(∑ni

Using this and the trivial bound , for we have that

Substituting this bound gives

 ∑d∼D ∑ℓ

We choose minimally such that , and extend the inner sum to . Applying Lemma 5.1 to the inner sum, and then Lemma 5.2 to the sum over gives

 ∑d∼D∑ℓ

We choose . We see that

 (Cqqlogqq−1)k1+k2 ≪(D2B)αq, D2qk1(Cqlogqq−1)k1+k2 ≪D2B(q−1)k(Cqlogq)k+(D2B)αq.

Combining these bounds gives the result. ∎

###### Lemma 5.4 ($L^\infty$ bound).

Let be of the form with and , and let . Then for any integer coprime with we have

$$\Bigl|\hat F_{q^k}\Bigl(\frac{\ell}{d}+\epsilon\Bigr)\Bigr|\ll(q-1)^k\exp\Bigl(-c_q\frac{k}{\log d}\Bigr)$$

for some constant depending only on .

###### Proof.

We have that

$$|e(n\theta)+e((n+1)\theta)|^2=2+2\cos(2\pi\theta)<4\exp(-2\|\theta\|^2).$$

This implies that

 ∣∣∑ni