Norm convergence of continuous-time polynomial multiple ergodic averages
For a jointly measurable probability-preserving action and a tuple of polynomial maps , , the multiple ergodic averages
converge in as for any . This confirms the continuous-time analog of the conjectured norm convergence of discrete polynomial multiple ergodic averages, which in its original formulation remains open in most cases. A proof of convergence can be given based on the idea of passing up to a sated extension of in order to find a simple partially characteristic factor, similarly to the recent development of this idea for the study of related discrete-time averages, together with a new inductive scheme on tuples of polynomials. The new induction scheme becomes available upon changing the time variable in the above integral by some fractional power, and provides an alternative to Bergelson’s PET induction, which has been the mainstay of positive results in this area in the past.
Given commuting probability-preserving transformations , the study of the associated ‘multiple’ (or ‘diagonal’) ergodic averages
by now has an extensive history. Interest in them originated in Furstenberg’s proof in  of the Multiple Recurrence Theorem, which effectively shows that the integrals of the above averages stay uniformly positive as when for some fixed and for some non-negligible set . That work was followed by a multidimensional generalization in Furstenberg and Katznelson’s paper , and since then the above averages, various generalizations of them and a host of related questions have been investigated by several researchers: see, in particular, the papers [12, 13, 14], , , , , , [20, 22, 21], ,  and .
In a sequence of breakthroughs culminating in Tao’s work, it has now been shown that the above averages always converge in as , augmenting our understanding of their asymptotic nonvanishing due to Furstenberg and Katznelson. However, their generalization to ‘polynomial’ multiple averages, such as
for , remains far less well understood. In case their convergence and also a reasonably good description of their limit have been obtained in [26, 21], and similarly in the case of more general commuting transformations under some assumptions of weak mixing  or of some algebraic independence among the polynomials . A handful of other higher-dimensional cases are now at least partly understood [10, 2, 3]. However, the following broader conjecture remains open in general:
For any probability-preserving action and any polynomial mappings the averages
converge in for any .
A version of this appears as Question 9 in Bergelson , although it had certainly been proposed informally before that, by Furstenberg and others. In addition to the special cases mentioned above, it has been proved for general polynomial mappings under the assumption that the action is totally ergodic by Johnson in . In , Bergelson and Leibman extend this conjecture even further to the setting of actions of a discrete nilpotent group and ‘polynomial mappings’ ; here we will omit the technical preliminaries needed to set up this latter notion.
This conjecture has been verified in all special cases that have been successfully analyzed to date; but the partial results obtained in  indicate that the general conjecture may still lie some way beyond the scope of current approaches to these results. However, in this paper we will see that the ‘continuous-time’ analogs of the above averages are rather simpler to understand, and do indeed enjoy the analogous convergence.
For any jointly measurable action and any tuple of polynomials , , …, the associated multiple ergodic averages
converge in as for any , , …, , where denotes the average for any bounded real interval .
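For orientation, the averages in this statement can be written out in one plausible notation. This is a sketch only: the symbols $\tau$, $D$, $p_i$, $f_i$ and $k$ are assumed names chosen for illustration, and the display should be read as a hedged rendering of the objects described in words, not as a quotation.

```latex
% Assumed notation (hypothetical symbol names):
%   \tau : a jointly measurable, probability-preserving action of \mathbb{R}^D on (X,\mu)
%   p_i  : polynomial maps \mathbb{R} \to \mathbb{R}^D, for i = 1, \dots, k
%   f_i  : bounded measurable functions on X
\[
  \frac{1}{T}\int_0^T
    \bigl(f_1\circ\tau^{p_1(t)}\bigr)\,
    \bigl(f_2\circ\tau^{p_2(t)}\bigr)\cdots
    \bigl(f_k\circ\tau^{p_k(t)}\bigr)\,\mathrm{d}t ,
  \qquad T\to\infty .
\]
```

The integrand is a pointwise product of functions on $X$, so each average is itself a function on $X$, and the assertion concerns convergence of these functions in norm.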
By analogy with Bergelson and Leibman , it should be possible to formulate an extension of the above conclusion that applies to a tuple of polynomial maps taking values in some nilpotent Lie group and an action ; however, this seems to lead to new algebraic complications that we will not try to surmount in this paper.
Theorem 1.2 will be proved by induction on the set of polynomial maps . An important invention from  is Bergelson’s ‘Polynomial Ergodic Theorem’ (‘PET’) induction scheme: a well-ordering on such sets of polynomials that he and many others have now used as the basis for analyzing multiple polynomial ergodic averages. However, in this paper we depart from this scheme, using instead a simple but crucial change in the time-variable that allows us to work instead with families of ‘fractional polynomial maps’, among which a different and rather more efficient induction scheme becomes more natural. In a sense, the availability of these fractional polynomial maps in the continuous-time setting indicates a much greater ‘smoothness’ that is enjoyed by the averages of Theorem 1.2 than by their discrete-time analogs, hence leading to simplified behaviour. This flexibility in choosing time-changes was also shown to me by Vitaly Bergelson.
In Section 2 we will introduce and justify these changes of variables, and reformulate Theorem 1.2 accordingly. Then, in Section 3, we introduce an extension of that reformulated result which will be needed in order that our induction close on itself, and then we introduce the partial ordering on families of (fractional) polynomials that we will use and complete the proof. In addition to the time-change and this partial order, the main tool we need is the technology of sated extensions developed in , which in our setting enables us to extend an initially-given -system to one in which the averages of interest exhibit simplified behaviour from which their convergence can be deduced.
Before launching into technical details, we should mention two other recent works concerning multiple averages in continuous-time. The first, that of Potts , studies the averages
for a single flow . She shows that for these averages the powerful machinery developed by Host and Kra (see , and also ) on multiple averages for a single transformation has a direct counterpart (and in fact that the Host-Kra-like factors of the system that underlie the resulting analysis coincide with those for the single transformation for a suitable choice of time-step ). As a result Potts is able to obtain convergence and multiple recurrence for the above continuous averages, as well as various finer results describing their limit.
A different approach to the continuous-time setting has also recently been investigated by Bergelson, Leibman and Moreira . Their work is based on a direct reduction to analogous questions for discrete transformations, for which the desired convergence is then assumed as a black-box result, rather than showing how discrete-action machinery can be reconstructed in the continuous setting. This reduction, in turn, is based on a very general result asserting that convergence to a common limit for a family of discrete Cesàro averages implies the similar convergence for an enveloping continuous Cesàro average (Theorem 0.1 in their preprint). Their method gives results for the above one-parameter linear multiple averages, and also for polynomial multiple averages such as
However, neither of the above approaches is currently sufficient to prove convergence for general polynomial averages in an action of for , because in both cases the authors make heavy appeal to results for the discrete-time setting that are already known: either by showing how structural results from that setting have analogs for continuous-time flows, or by simply reducing the desired convergence assertion to known facts about discrete transformations. By contrast, one of the surprising features of Theorem 1.2 is that the analysis of continuous-time polynomial multiple averages in higher-dimensional actions is actually much simpler than its discrete-time relative, so that we are able to prove the theorem above while the behaviour of the analogous discrete-time averages remains largely mysterious. Our proof uses the machinery of sated extensions, previously employed in [2, 3] and related to earlier arguments from  and , but which has not yet provided enough insight to handle the general case of Conjecture 1.1. (We should also remark that another consequence of using extensions in proofs of convergence is that relatively little can then be deduced about the structure of the limit, whereas Potts and Bergelson, Leibman and Moreira do obtain some such descriptive results as well.) The greater simplicity in the continuous-time setting results from the greater ‘smoothness’ of the continuous averages alluded to above, which will be exploited concretely through the use of the fractional-power time-changes that were mentioned there.
I am grateful to Vitaly Bergelson for several helpful suggestions, and to Bryna Kra for motivating me to return to this project after a period of neglect. This research was supported by a Fellowship from Microsoft Corporation and a Fellowship from the Clay Mathematics Institute.
2 Formulating the right question
It turns out that one of the chief obstacles to proving Theorem 1.2 is finding the right conjecture: the assertion of Theorem 1.2 is actually too weak. Instead we will consider functions (rather than polynomials ) that are vector-valued sums of maps with powers between and . This will be made simpler by also allowing some greater flexibility in the location of the intervals over which we take our ergodic averages.
Definition 2.1 (Tempered sequence; tempered-uniform convergence)
A sequence of bounded open intervals is tempered if and there is some fixed such that for all .
Given now a locally integrable map into some Hilbert space, the averages of over bounded intervals in converge tempered-uniformly if the sequence of averages converges as for any tempered sequence of intervals .
In fact, tempered-uniform convergence is not really a new property of such a map . Indeed, one has
for any real , and if the sequence of intervals is tempered then the coefficients and that appear here remain bounded with difference equal to , so the tempered-uniform convergence of the averages is actually equivalent to the Cesàro convergence of . (This also clearly implies that the limit is the same along any tempered sequence of intervals.) The usefulness of the above definition is in providing a convenient handle for this convergence along sequences of intervals that may not be pinned to the origin.
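The decomposition behind this remark can be written out explicitly. The following is a sketch under assumed notation: the endpoints $a < b$, the map $f$, and the form $b_n - a_n \ge b_n/M$ of the temperedness condition are all assumptions made for illustration.

```latex
% Assumed notation: f is a locally integrable Hilbert-space-valued map on [0,\infty),
% and 0 < a < b are the endpoints of a bounded interval.
\[
  \frac{1}{b-a}\int_a^b f(t)\,\mathrm{d}t
  \;=\;
  \frac{b}{b-a}\cdot\frac{1}{b}\int_0^b f(t)\,\mathrm{d}t
  \;-\;
  \frac{a}{b-a}\cdot\frac{1}{a}\int_0^a f(t)\,\mathrm{d}t .
\]
% The two coefficients differ by exactly 1:
\[
  \frac{b}{b-a}-\frac{a}{b-a}=1 .
\]
% Under the assumed temperedness bound b_n - a_n \ge b_n / M, both coefficients
% stay bounded along the sequence:
\[
  0 \le \frac{a_n}{b_n-a_n} < \frac{b_n}{b_n-a_n} \le M .
\]
```

Read this way, an average over an interval away from the origin is a bounded affine combination (coefficients summing to $1$) of two averages over intervals pinned to the origin, which is why Cesàro convergence passes to tempered sequences.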
Suppose that is a norm-continuous and bounded function taking values in some Hilbert space. Then the tempered-uniform convergence of the averages
is equivalent to that of the averages
for any , and the limits are the same.
Proof Let . By symmetry it suffices to prove the forward implication and show that the limits are the same. This is achieved by making the substitution in the second integral to obtain
We will now show how this can always be written as a weighted integral of averages of the form in such a way that we can apply the tempered-uniform convergence of these latter. Let us write for their tempered-uniform limit. Since the case is trivial, this argument falls into two remaining cases.
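The substitution used here can be made concrete as follows. This is a sketch under assumed notation: the exponent is called $\alpha > 0$, the interval is $(a,b)$ with $0 \le a < b$, and the new variable is $s = t^{\alpha}$; none of these names is fixed by the surrounding text.

```latex
% Assumed notation: \alpha > 0 is the exponent of the time-change,
% f is bounded and norm-continuous, and 0 \le a < b.
% Substituting s = t^{\alpha}, so that t = s^{1/\alpha} and
% \mathrm{d}t = \tfrac{1}{\alpha}\, s^{1/\alpha - 1}\,\mathrm{d}s :
\[
  \frac{1}{b-a}\int_a^b f(t^{\alpha})\,\mathrm{d}t
  \;=\;
  \frac{1}{\alpha\,(b-a)}\int_{a^{\alpha}}^{b^{\alpha}}
     f(s)\, s^{1/\alpha - 1}\,\mathrm{d}s .
\]
% The weight integrates to 1 over the new interval, since
\[
  \int_{a^{\alpha}}^{b^{\alpha}} s^{1/\alpha - 1}\,\mathrm{d}s
  \;=\; \alpha\,\bigl(b - a\bigr) .
\]
```

So the time-changed average is an average of $f$ against a probability density proportional to $s^{1/\alpha - 1}$ on $(a^{\alpha}, b^{\alpha})$. When $\alpha > 1$ the exponent $1/\alpha - 1$ lies in $(-1, 0)$, so the density is unbounded near $0$ but still locally integrable there.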
Case 1: In fact let us assume further that , since this irritating case can be treated by exactly the same method except that the antiderivative of the function cannot be written using the usual formula that is valid for other .
We first compute that
Now suppose that is a tempered sequence of intervals, say with for all . Define by , so that a re-arrangement of the temperedness inequality gives , and now a simple computation gives
which is uniformly positive over the possible values . Therefore for any we can select some (typically much larger than ) such that
for any . Defining , we now have , but on the other hand , so any sequence of intervals of the form or for some selection of is still tempered.
Using the above inequality, another simple calculation gives
Combining this with the change-of-variables made above and using that is a bounded function we can write
where . Now as , any sequence of intervals such that for some for each also satisfies for some . Hence the tempered-uniform convergence of the averages to some limiting vector, say , implies that we must actually have
for all sufficiently large . Inserting this approximation into the above average, we deduce that for all sufficiently large we have
with . Since was arbitrary we conclude that
Case 2: This is similar, except now we start with the computation
Just as before, given a tempered sequence of intervals we can approximate the above expressions arbitrarily well by truncating the second integrals so that they involve only for for some suitable sequence , and then argue that all of the expressions for such converge uniformly fast to , so that the left-hand integrals must do the same. Note that in this case, if the values actually tend to , then this argument also requires the fact that and so the function is locally integrable at : this is needed so that the small range that we initially omit from the second integral on the right-hand side can be chosen so as to give an arbitrarily small contribution overall.
Now let us examine a little the taxonomy of the functions that result from applying such fractional-power time-changes to polynomials.
Definition 2.3 (Fractional polynomial; height; degree; goodness)
If then a map is a fractional polynomial (‘f-polynomial’) of height if it takes the form
for some tuple of vectors , , …, . Note that there may be more than one possible choice of the height for a given map , and in the following we will need to keep track of the height as well. We allow the possibility that , and let the degree of , denoted by , be the largest fraction for which ; we also say that is of top-degree if .
A height- f-polynomial is good if the list of vectors , , …, is linearly independent (including the assertion that they are all non-zero). Note that this property depends on the choice of height: for example, given as above, we may always re-write it as a sum in which for all odd .
Also, we set
which agrees with the linear span of the image , and we set
(so when is not of top-degree).
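A small worked example may help fix these notions. It is a sketch under assumed conventions: the indexing $p(t) = \sum_{j=1}^{h} t^{j/h} a_j$, the vectors $v_1, v_2$, and the ambient dimension $D$ are illustrative assumptions, not notation taken from the definition itself.

```latex
% Assumed convention: a height-h f-polynomial has the form
%   p(t) = t^{1/h} a_1 + t^{2/h} a_2 + \cdots + t^{h/h} a_h ,  a_j \in \mathbb{R}^D .
% Start from an ordinary quadratic map with no constant term:
\[
  q(t) = t\,v_1 + t^{2}\,v_2 , \qquad v_1, v_2 \in \mathbb{R}^D .
\]
% The time-change t \mapsto t^{1/2} turns it into an f-polynomial of height 2:
\[
  p(t) = q\bigl(t^{1/2}\bigr) = t^{1/2}\,v_1 + t\,v_2 ,
  \qquad a_1 = v_1,\quad a_2 = v_2 .
\]
% If v_2 \neq 0 the largest exponent with non-zero coefficient is 2/2 = 1,
% so p is of top-degree; p is good at height 2 when v_1, v_2 are linearly independent.
% The same map regarded at height 4 has coefficients
\[
  a'_1 = 0,\quad a'_2 = v_1,\quad a'_3 = 0,\quad a'_4 = v_2 ,
\]
% so the zero coefficients at odd indices prevent it from being good at height 4.
```

This illustrates why the height must be tracked alongside the map: the function $p$ is unchanged, but whether it counts as good depends on the height at which its coefficients are listed.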
Definition 2.4 (Good families of f-polynomials)
A family of f-polynomials, say expressed as
is good if each of the individual f-polynomials is good, and moreover all of the non-zero vectors appearing in the above expressions are linearly independent.
We can now give the simple reformulation of Theorem 1.2 in terms of fractional polynomials:
For any action , any good family of f-polynomials , , …, and any tempered sequence of intervals the associated averages
converge in as for any , , …, .
and let , so this is a norm-continuous and bounded map into .
We first reduce to the case in which all the are non-zero and linearly independent. Introduce a formal collection of vectors for and that are all non-zero and linearly independent. Let be their linear span, and define a linear map by setting
(so there may be linear dependences among these images, and some of them may be zero). Now define by and also
and observe that . Hence we can write our multiple averages in terms of and the , and so it suffices to prove Theorem 1.2 in case the coefficient vectors are all non-zero and linearly independent.
Assuming this, define
and observe that this is a good family. On the other hand, by Lemma 2.2, the norm convergence of the averages follows from the tempered-uniform convergence of the averages
This completes the proof.
3 The full induction
3.1 Furstenberg self-joinings and partially characteristic factors
Here we reformulate in our present setting some older machinery from the study of multiple ergodic averages.
First, suppose that we have already established convergence for some family of f-polynomials , that is a system and that , , , …, are Borel subsets of . Then our assumption implies that the scalar averages
converge as . Denoting the limiting value by , it is now easy to check that this may be extended by multilinearity and continuity, and that this actually defines a -fold self-joining of the system (which depends also on , although we suppress this in our notation). This construct has its origins in Furstenberg’s original work on multiple recurrence , and is referred to as the Furstenberg self-joining associated to the family . This will prove an important tool for analyzing our averages, much as in the previous works in [12, 31, 4, 2].
Next, a factor is partially characteristic for a given tuple of f-polynomials , , …, if
for any , , …, and any tempered sequence of intervals , where we write to denote that as . This notion is based on a definition that first appears in Furstenberg and Weiss’ paper , and which has gone through a number of incarnations since. Note that it involves operating only on the last function in the list.
As in most previous proofs of convergence for some family of multiple ergodic averages, the heart of our induction will be finding a partially characteristic factor that has some additional structure allowing averages of interest to be re-written into a simpler form. Here we will also make crucial use of a more recent twist on this strategy, in which an initially-given system must first be extended (that is, expressed as a factor of some ‘larger’ system) before the desired factor can be shown to be partially characteristic. This approach has been developed in [4, 1, 2, 3]. Here we will appeal to the very general machinery of sated extensions from  in order to make this initial construction of an enlarged system.
Following , a class of standard Borel probability-preserving -systems is said to be idempotent if it is closed under isomorphisms, inverse limits and (not necessarily ergodic) joinings. These conditions are enough to guarantee that any such system has a maximal factor whose target system is a member of the class . In these terms, a system is -sated if whenever is an extension, the factors and are relatively independent over the further common factor . More concretely, the -satedness of means that if and we prove that
for some extension , then we can deduce that in fact
Here we will be concerned with idempotent classes of the following kinds. If is a vector subspace then we may associate to it the class of -systems whose -subaction is trivial; and now given several subspaces we let be the class of all joinings of systems drawn from each of the classes . Both of these examples are readily verified to be idempotent (or see Section 3 in ), and if we abbreviate then a simple check also shows that
as factor maps of , in the sense that these maps define the same factor of up to negligible sets, and where the right-hand side denotes the factor map generated by the individual factor maps .
In these terms, we now make the following definition, which closely follows the idea of a ‘fully isotropy-sated’ system from .
Definition 3.1 (Fully rationally sated systems)
We will write that is fully rationally sated (‘FRS’) if it is sated for the idempotent classes
whenever is a finite collection of rational subspaces of , where a subspace is rational if it has a basis consisting of members of .
The usefulness of this definition derives from the following general fact, which is an immediate special case of Theorem 3.11 in :
Theorem 3.2 (Sated extensions exist)
Every -system has an extension that is FRS.
3.3 An ordering on fractional polynomial families
The proof of Theorem 2.5 will require an induction on good families of f-polynomials , and so we make a separate step of introducing the relevant ordering on these families, and introducing two particular kinds of ‘downward movement’ through the collection of such families that will appear during the induction.
Definition 3.3 (The precedence ordering)
Given two non-empty good families of height- fractional polynomials, say and , we say that precedes , written , if
when the and are ordered so that their degrees are non-increasing in , one has
(so the for are not needed here), and
either , or if then strict inequality holds in the second condition above for some .
In addition, we always have if and are families of fractional polynomials of distinct heights and .
It is clear that this defines a partial order on the collection of all families of height- fractional polynomials, and that it satisfies the descending chain condition. Note that the inequality is not by itself enough to guarantee that : it is also necessary that the former family not have too many high-degree members, in the sense of the second point above.
Suppose now that is a good family with . Two special kinds of precedent for will be important in the sequel:
On the one hand, suppose that is minimal such that there is some for which the leading term has degree . In this case the family
precedes . Indeed, we have removed one instance of degree , and for every we have that still has degree at least (the goodness assumption implies that all coefficients of and are distinct, so there can be no cancellation of coefficients here, and by the minimality of ). Also, is still a good family since the collection of its coefficients is
and it is clear that these are all still nonzero and linearly independent.
We refer to a family constructed this way as a precedent of of type I.
On the other hand, if is a top-degree member then the family
has swapped out an entry of degree and replaced it with an entry of degree (where in case we instead take the above to mean that has been omitted altogether). So again it clearly precedes and is still good. We call this a precedent of of type II.
3.4 The main induction
In order to formulate an inductive hypothesis that includes Theorem 2.5 and can be closed on itself, we will actually prove a composite of three different properties of the multiple averages associated to each family . To this end we insert the statement of Theorem 2.5 as the second of the three related conclusions.
Suppose that is an -system and that is a good family of height- f-polynomials with expressions
where each . Then the following hold:
if is FRS and is of top-degree then the factor
is partially characteristic for the averages
associated to ;
the averages converge tempered-uniformly in for any ;
the Furstenberg self-joining exists and is invariant under the off-diagonal flow
for each .
Clearly the assumption that involves no loss of generality in that we may simply change basis to make it true, but it does fix what we mean by ‘FRS’ in the statement of conclusion , and for this it is important.
In the above notation, for we will write for the assertion that holds for all .
Our next task is to establish the base case of Theorem 3.4.