Nonlinear Markov semigroups and refinement schemes

# Nonlinear Markov semigroups and refinement schemes on metric spaces.

## Abstract.

This article settles the convergence question for multivariate barycentric subdivision schemes with nonnegative masks on complete metric spaces of nonpositive Alexandrov curvature, also known as Hadamard spaces. We establish a link between these types of refinement algorithms and the theory of Markov chains by characterizing barycentric subdivision schemes as nonlinear Markov semigroups. Exploiting this connection, we subsequently prove that any such scheme converges on arbitrary Hadamard spaces if and only if it converges for real-valued input data. Moreover, we generalize a characterization of convergence from the linear theory, and consider approximation qualities of barycentric subdivision schemes. A concluding section addresses the relationship between the convergence properties of a scheme and its so-called characteristic Markov chain.

###### Key words and phrases:
Hadamard space; conditional expectation; Markov chain; barycentric subdivision scheme
###### 2010 Mathematics Subject Classification:
53C23, 60J20, 65D17
The author is supported by the Austrian science fund, grants W1230 and P19870.

## Introduction and main results

The convergence and smoothness analysis of refinement schemes processing data from manifolds and more generally metric spaces has been a subject of intense research over the last few years. As to convergence, complete spaces of nonpositive curvature have proven most accessible in terms of generalizing well-known facts from the linear theory to the nonlinear setting. An example of such a structure prominent in applications is the space of positive definite symmetric matrices, which represent measurements in diffusion tensor imaging.

While the question whether the smoothness properties of the linear model scheme prevail when passing to the nonlinear setting was successfully addressed in [6], the corresponding convergence problem remained unsolved. The aim of this article is to fill this gap in the theory.

Relying on a martingale theory for discrete-time stochastic processes with values in negatively curved spaces developed in [10], we observe that the refinement processes in question actually act on bounded input data as nonlinear Markov semigroups. This fact will substantially facilitate their convergence analysis.

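Let us specify the general setup. Given a metric space $(X,d)$, a refinement scheme is a map $S:\ell^\infty(\mathbb{Z}^s,X)\to\ell^\infty(\mathbb{Z}^s,X)$. We call $S$ convergent if for all $x\in\ell^\infty(\mathbb{Z}^s,X)$ there exists a continuous function $S^\infty x:\mathbb{R}^s\to X$ such that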

$$d_\infty\bigl(S^\infty x(\cdot/2^n),S^nx\bigr)=\sup_{j}d\bigl(S^\infty x(j/2^n),S^nx_j\bigr)\to 0\quad\text{as } n\to\infty. \tag{1}$$

Visualizing as a function on the refined grid , convergence to is tantamount to .

Throughout the present paper we are mostly concerned with so-called barycentric refinement schemes associated to nonnegative real-valued $s$-variate sequences $a=(a_i)_{i\in\mathbb{Z}^s}$ of finite support, henceforth referred to as masks, which we require to fulfill the basic sum rule

$$\sum_{j\in\mathbb{Z}^s}a_{i-2j}=1\quad\text{for } i\in\mathbb{Z}^s. \tag{2}$$
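For a univariate mask the sum rule (2) says that the entries with even index and the entries with odd index each sum to one. The following minimal sketch checks this; the helper name `satisfies_sum_rule`, the dictionary encoding of a mask, and the example mask $(1/4,3/4,3/4,1/4)$ are assumptions made for illustration and do not appear in the article.

```python
from fractions import Fraction

def satisfies_sum_rule(mask):
    """Check sum_j a_{i-2j} = 1 for all i, for a univariate mask {index: weight}.

    For s = 1 the indices i - 2j run through one parity class, so the rule
    splits into one condition for even and one for odd indices.
    """
    even = sum(w for k, w in mask.items() if k % 2 == 0)
    odd = sum(w for k, w in mask.items() if k % 2 == 1)
    return even == 1 and odd == 1

# Example mask (an assumption): a_0, ..., a_3 = 1/4, 3/4, 3/4, 1/4.
chaikin = {0: Fraction(1, 4), 1: Fraction(3, 4), 2: Fraction(3, 4), 3: Fraction(1, 4)}
print(satisfies_sum_rule(chaikin))
```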

Barycentric refinement schemes act on data from a complete metric space of nonpositive curvature in the sense of A. D. Alexandrov according to the following rule:

$$Sx_i=\operatorname{argmin}\Bigl(\sum_{j\in\mathbb{Z}^s}a_{i-2j}\,d^2(x_j,\cdot)\Bigr). \tag{3}$$
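To make rule (3) concrete, here is a minimal sketch for $s=1$ and finitely many data points. It assumes two flat Hadamard spaces in which the minimizer is explicit: the real line, where it is the weighted arithmetic mean, and the positive reals with $d(x,y)=|\log x-\log y|$, which are isometric to the real line via the logarithm and where it is the weighted geometric mean. The function names, the dictionary encoding, and the example mask are assumptions for illustration.

```python
import math

def refine(x, mask, barycenter):
    """One step of rule (3) for s = 1 on finite data.

    x: {j: point}, mask: {k: weight}, barycenter(points, weights) -> point.
    Only indices i whose full stencil lies inside the data are returned.
    """
    lo, hi = min(mask), max(mask)
    out = {}
    for i in range(2 * min(x) + lo, 2 * max(x) + hi + 1):
        terms = [(x[j], mask[i - 2 * j]) for j in x if (i - 2 * j) in mask]
        # By the sum rule, the weights add up to 1 exactly when no term is missing.
        if terms and abs(sum(w for _, w in terms) - 1.0) < 1e-12:
            points, weights = zip(*terms)
            out[i] = barycenter(points, weights)
    return out

def mean(points, weights):
    """Barycenter on the real line."""
    return sum(w * p for p, w in zip(points, weights))

def geometric_mean(points, weights):
    """Barycenter on (0, inf) with d(x, y) = |log x - log y|."""
    return math.exp(sum(w * math.log(p) for p, w in zip(points, weights)))

chaikin = {0: 0.25, 1: 0.75, 2: 0.75, 3: 0.25}  # example mask (an assumption)
linear = refine({0: 0.0, 1: 4.0}, chaikin, mean)
positive = refine({0: 1.0, 1: 16.0}, chaikin, geometric_mean)
```

Passing the barycenter as a parameter keeps the refinement step independent of the underlying space, which mirrors how rule (3) only uses the metric.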

Much is known about the convergence of this type of refinement algorithm in the case . On complete manifolds of nonpositive sectional curvature, convergence analysis was initiated in the article [12]. The author in [5] recently proved general convergence statements for arbitrary Hadamard spaces using the principle of contractivity: A scheme $S$ is called contractive with respect to some nonnegative function $D:\ell^\infty(\mathbb{Z}^s,X)\to\mathbb{R}_{\ge 0}$ if and only if there is $\gamma<1$ such that

$$D(Sx)<\gamma D(x)\quad\text{for all } x\in\ell^\infty(\mathbb{Z}^s,X).$$

The function $D$ is referred to as a contractivity function for $S$. An important class of contractivity functions is associated to balanced, convex and bounded subsets $\Omega$ of $\mathbb{R}^s$:

$$D_\Omega(x)=\sup_{\rho(i-j)<2}d(x_i,x_j), \tag{4}$$

where $\rho$ denotes the Minkowski functional of $\Omega$. Contractivity functions of this type are called admissible, cf. [5]. The following result is taken from loc. cit.:

###### Proposition 1.

A barycentric refinement scheme with nonnegative mask which is contractive with respect to some admissible contractivity function also converges. This implies convergence in case the support of the mask coincides with the set of lattice points within a centered unimodular zonotope or a lattice quad with nonempty interior.
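As a numerical illustration of contractivity (a sketch, not taken from [5]), assume $s=1$, real-valued data, the example mask $(1/4,3/4,3/4,1/4)$ and $\Omega=[-1,1]$, so that $D_\Omega(x)$ is the largest distance between neighbouring data points. The helper names `chaikin_step` and `spread` are hypothetical.

```python
def chaikin_step(x):
    """One refinement step for the assumed mask (1/4, 3/4, 3/4, 1/4) on a finite list."""
    out = []
    for m in range(1, len(x)):
        out.append(0.25 * x[m] + 0.75 * x[m - 1])  # refined index 2m
        out.append(0.75 * x[m] + 0.25 * x[m - 1])  # refined index 2m + 1
    return out

def spread(x):
    """D(x) for Omega = [-1, 1]: largest distance between neighbouring points."""
    return max(abs(b - a) for a, b in zip(x, x[1:]))

data = [0.0, 4.0, 1.0, 7.0, 3.0]
spreads = [spread(data)]
for _ in range(5):
    data = chaikin_step(data)
    spreads.append(spread(data))
print(spreads)
```

For this mask, neighbouring refined points differ by at most half the previous spread, so the recorded values decay at least geometrically.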

The main result of the present article is a substantial extension of this statement and describes a phenomenon which could be referred to as linear equivalence:

###### Theorem 1.

A barycentric refinement scheme converges on arbitrary Hadamard spaces if and only if it converges on the real line.

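The proof of this fact, given in Section 2, relies on a stochastic interpretation of the subdivision rule (3). More precisely, for each nonnegative mask $a$ satisfying the basic sum rule (2) one finds a so-called characteristic Markov chain $(X^a_n)_{n\ge 0}$ with state space $\mathbb{Z}^s$ and transition matrix $(a_{i-2j})_{i,j\in\mathbb{Z}^s}$ in terms of which the iterates of the refinement algorithm acting on $x\in\ell^\infty(\mathbb{Z}^s,X)$ may be written as

$$S^nx_i=E\bigl(x\circ X^a_n\,|||\,X^a_0=i\bigr),$$

see Theorem 4. Here $E(\cdot\,|||\,\cdot)$ denotes the filtered conditional expectation introduced by K.-T. Sturm in [10]. Thus, as in the linear case,

$$\mathbb{N}_0\to\operatorname{Lip}_1\bigl(\ell^\infty(\mathbb{Z}^s,X)\bigr);\quad n\mapsto S^n$$

may be considered a (nonlinear) Markov semigroup. Here $\operatorname{Lip}_1(\ell^\infty(\mathbb{Z}^s,X))$ refers to the set of maps $T:\ell^\infty(\mathbb{Z}^s,X)\to\ell^\infty(\mathbb{Z}^s,X)$ satisfying the Lipschitz condition

$$d_\infty(Tx,Ty)\le d_\infty(x,y)\quad\text{for } x,y\in\ell^\infty(\mathbb{Z}^s,X). \tag{5}$$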

Combining Theorem 1 with other recent developments in the theory of linear subdivision schemes with nonnegative masks and their barycentric counterparts on nonlinear objects, one comes up with a variety of remarkable results:

In the articles [13] and [14], X. Zhou establishes general theorems on the relation of the mask’s support with its convergence properties, which, by virtue of Theorem 1, now generalize to the following:

###### Theorem 2.

A barycentric subdivision scheme with nonnegative mask converges under each of the following circumstances:

(i) The support of coincides with the set of grid points inside a balanced zonotope.

(ii) The grid dimension and, if, after a possible index translation, , the integers within the support are relatively prime and . This also constitutes a necessary condition for convergence.

Moreover, as far as finite-dimensional Hadamard manifolds are concerned, the smoothness question is settled by a combination of Theorem 1 and recent work from [6]:

###### Theorem 3.

On a smooth Hadamard manifold, a barycentric subdivision scheme with nonnegative mask converges and produces -times differentiable limit functions if and only if the same is true for the corresponding linear scheme.

## 1. Stochastic preliminaries

This section is devoted to a stochastic interpretation of the subdivision rule (3). More precisely, we view barycentric subdivision as the semigroup acting on associated to the so-called characteristic Markov chain of . This result requires some prerequisites about the notion of conditional expectation of random variables with values in Hadamard spaces. This type of metric space was first investigated by A.D. Alexandrov in his seminal articles [1] and [2]. In loc. cit., Alexandrov uses the Gauss-Bonnet theorem, i.e. the fact that the deficiency of the angle sum in a geodesic triangle can be expressed by means of the ambient space’s curvature, to generalize the notion of curvature bounds to a purely metric setting. A comprehensive introduction to the nowadays well-established theory of such spaces is [3]. Furthermore we refer to the survey article [11] on probability measures and their centers of mass on Hadamard spaces. A treatise of the smooth case can be found in [7], which investigates infinite-dimensional Riemannian manifolds of nonpositive curvature.

A complete metric space $(X,d)$ is called a Hadamard space or global NPC space if and only if for each pair $x_0,x_1\in X$ one finds $y\in X$ such that the so-called Hadamard inequality holds true for all $z\in X$:

$$d(z,y)^2\le\tfrac12 d(z,x_0)^2+\tfrac12 d(z,x_1)^2-\tfrac14 d(x_0,x_1)^2. \tag{6}$$

In other words, a Hadamard space is a complete metric space with nonpositive curvature in the sense of A. D. Alexandrov. The Hadamard inequality (6) expresses the fact that geodesic triangles are ’slim’ compared to Euclidean ones with the same edge lengths.
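In Euclidean space the Hadamard inequality (6) holds with equality when $y$ is the midpoint of $x_0$ and $x_1$ (this is the classical formula for the length of a median). The following sketch checks this numerically in the plane; the function names are hypothetical and the choice of the Euclidean plane is an assumption made to have the simplest possible Hadamard space.

```python
import math
import random

def dist(p, q):
    """Euclidean distance in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def hadamard_gap(x0, x1, z):
    """Right-hand side minus left-hand side of (6), with y the midpoint of x0 and x1."""
    y = ((x0[0] + x1[0]) / 2, (x0[1] + x1[1]) / 2)
    rhs = 0.5 * dist(z, x0) ** 2 + 0.5 * dist(z, x1) ** 2 - 0.25 * dist(x0, x1) ** 2
    return rhs - dist(z, y) ** 2

random.seed(0)
gaps = []
for _ in range(100):
    x0, x1, z = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(3)]
    gaps.append(hadamard_gap(x0, x1, z))
print(max(abs(g) for g in gaps))
```

In a space of strictly negative curvature the gap would be positive for generic triples, which is the sense in which triangles are slimmer than their Euclidean counterparts.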

Recall that a continuous curve , , is called geodesic if and only if for all . A well-known property of Hadamard spaces is that they are strongly geodesic: given any two points , one finds a unique geodesic joining them. In particular, there is a meaningful notion of convexity. More precisely, a subset of a global NPC space is called convex if and only if for each two , the joining geodesic remains in . Nonpositive curvature turns out to have a major impact on the convexity properties of the metric .

As announced above, we are particularly interested in the concept of conditional expectation for random variables with values in Hadamard spaces. Although several approaches appear in the literature, a definition due to K.-T. Sturm, see [10], serves our purposes best. Sturm’s definition relies on a convex projection property enjoyed by NPC spaces:

###### Proposition 2.

Suppose $(X,d)$ is a global NPC space, and $C\subseteq X$ a closed and convex subset. Then there exists a well-defined projection map $\pi_C:X\to C$ defined by the relation

$$d(\pi_C(x),x)=\min_{y\in C}d(y,x).$$

This projection is Lipschitz-continuous in the sense that $d(\pi_C(x),\pi_C(y))\le d(x,y)$ for $x,y\in X$.

Given a probability space and a metric space , a strongly measurable function is called square-integrable if and only if

$$\int_\Omega d^2(f(\omega),x)\,P(d\omega)<\infty$$

for one (and then all) . The space of square-integrable functions factorized by the relation of being equal almost surely is endowed with a metric

$$d_2(f,g)=\Bigl(\int_\Omega d^2(f(\omega),g(\omega))\,P(d\omega)\Bigr)^{1/2}.$$

In case is a global NPC space, it is well-known that inherits the Hadamard property. Moreover, given a subalgebra , it is easy to show that , the subspace of corresponding to -measurable square-integrable maps, is closed and convex. In view of Proposition 2 one thus obtains the following Definition:

###### Definition 2 ([10]).

Suppose is a probability space, and let be a square-integrable random variable with separable image in the Hadamard space . Given a subalgebra , there is a -measurable minimizing the functional

$$Z'\mapsto E\bigl(d^2(Y,Z')\bigr)$$

among all -measurable square-integrable random variables. Any other minimizer coincides with almost surely. One refers to as the conditional expectation of given and uses the notation .

Otherwise put, following the notation of Proposition 2, . This definition follows the principle that in the real-valued case, the conditional expectation as introduced by Kolmogorov restricts to as the orthogonal projection to the space of -measurable -functions.

###### Remark 3.

The space is, as in the case , defined to be the space of strongly measurable functions such that modulo equality almost surely. Again comes with a metric of the form

$$d_p(f,g)=\Bigl(\int_\Omega d^p(f(\omega),g(\omega))\,P(d\omega)\Bigr)^{1/p}.$$

The conditional expectation as defined above is continuous in the sense that for each two and ,

$$d_p\bigl(E(Y|\mathcal{G}),E(Z|\mathcal{G})\bigr)\le d_p(Y,Z).$$

In particular, extends continuously to .

###### Remark 4.

It is inherent in the existence statement provided by Definition 2 that each integrable random variable with values in possesses an expected value defined by

$$E(Y):=E\bigl(Y\mid\{\emptyset,\Omega\}\bigr),$$

which is the unique minimizer of the functional

$$z\mapsto\int_\Omega d^2(Y(\omega),z)\,P(d\omega).$$

Recall the elementary smoothing property of conditional expectations for random variables with values in a linear space: Given an ordered pair of subalgebras , one has

$$E\bigl(E(\cdot\,|\,\tilde{\mathcal{G}})\,\big|\,\mathcal{G}\bigr)=E(\cdot\,|\,\mathcal{G})$$

almost surely. Not surprisingly, this associativity property is violated in the nonlinear case, see Example 3.2 in [10]. Actually a generalization of this feature would render the theory of nonlinear subdivision schemes of type (3) obsolete, as the discussion following Theorem 5 below illustrates.
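A tripod (three half-lines glued at a common origin) is a simple Hadamard space on which this failure can be observed. The sketch below is an illustration constructed for this purpose and is not Example 3.2 of [10]; the function name `tripod_barycenter` and the encoding of points as `(leg, radius)` pairs are assumptions. It uses the fact that, restricted to leg $k$, the functional $z\mapsto\sum_i w_i d^2(p_i,z)$ is a quadratic in the radius $t$ of $z$ whose unconstrained minimizer is $m_k=\sum_{p_i\text{ on leg }k}w_ir_i-\sum_{p_i\text{ off leg }k}w_ir_i$ (for weights summing to one); at most one $m_k$ is positive, and the barycenter is the origin if none is.

```python
def tripod_barycenter(points, weights):
    """Minimise sum_i w_i d^2(p_i, .) on a tripod; points are (leg, radius) pairs."""
    total = sum(weights)
    for leg in range(3):
        on = sum(w * r for (l, r), w in zip(points, weights) if l == leg)
        off = sum(w * r for (l, r), w in zip(points, weights) if l != leg)
        m = (on - off) / total
        if m > 0:
            return (leg, m)
    return (0, 0.0)  # the origin, encoded as radius 0 on leg 0

p = [(0, 1.0), (1, 1.0), (2, 1.0)]  # one point at distance 1 on each leg
one_shot = tripod_barycenter(p, [0.5, 0.25, 0.25])
inner = tripod_barycenter(p[1:], [0.5, 0.5])        # average legs 1 and 2 first
nested = tripod_barycenter([p[0], inner], [0.5, 0.5])
print(one_shot, nested)
```

Averaging all three points at once yields the origin, whereas first averaging the points on legs 1 and 2 and then averaging the result with the point on leg 0 yields the point at distance 1/2 on leg 0.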

It is for the reason of lacking associativity that K.-T. Sturm in his treatise [10] defines:

###### Definition 3.

Suppose is a sequence of subalgebras. Furthermore assume . Then one defines the filtered conditional expectation of given as

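$$E(Y\,|||\,\mathcal{F}_0)=E\bigl(\cdots E\bigl(E(Y|\mathcal{F}_{N-1})\,\big|\,\mathcal{F}_{N-2}\bigr)\cdots\,\big|\,\mathcal{F}_0\bigr). \tag{7}$$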

Let us briefly recall the basic construction of a Markov chain, beginning with a foundational definition. Recall that a family of maps

$$p_{n,m}:\mathbb{Z}^s\times\mathbb{Z}^s\to\mathbb{R}$$

parametrized by nonnegative integers $n\le m$, is called a Markov transition kernel if the following conditions are fulfilled:

1. is a probability measure for all

2. .

3. The Chapman-Kolmogorov equations are fulfilled: for each one has

$$p_{n,m}(i,j)=\sum_{k\in\mathbb{Z}^s}p_{n,\ell}(i,k)\,p_{\ell,m}(k,j). \tag{8}$$

We use the notation .

Let denote the space of sequences on the grid , endowed with the infinite power of the discrete sigma algebra on . Consider the filtration

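$$\mathcal{F}_n=\sigma\bigl(\{i_0\}\times\cdots\times\{i_n\}\times\mathbb{Z}^s\times\cdots\mid i_j\in\mathbb{Z}^s\text{ for } j=1,\dots,n\bigr).$$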

Choose an initial distribution on , and introduce a probability measure on via

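$$P_\alpha\bigl(\{i_0\}\times\cdots\times\{i_n\}\times\mathbb{Z}^s\times\cdots\bigr)=\alpha_{i_0}\,p_{0,1}(i_0,i_1)\,p_{1,2}(i_1,i_2)\cdots p_{n-1,n}(i_{n-1},i_n),$$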

and the standard extension theorems. Whenever equals , the point measure on , we will write . Expected values of integrable random variables with respect to are, as usual, written as

$$E_\alpha(Y)=\operatorname{argmin}\Bigl(\int_\Omega d^2(Y(\omega),\cdot)\,P_\alpha(d\omega)\Bigr).$$

Finally, the discrete stochastic process

$$X_n:(\Omega,\mathcal{F},P)\to\mathbb{Z}^s;\quad\omega\mapsto\omega_n$$

constitutes a Markov chain associated to the transition kernel , meaning that the linear Markov property holds true: For any nonnegative one has

$$E_\alpha\bigl(f(X_m)\,|\,\mathcal{F}_n\bigr)=\sum_{j\in\mathbb{Z}^s}p_{n,m}(X_n,j)\,f(j). \tag{9}$$

In particular, this linear Markov property allows for the interpretation of as the transition probability .

In view of the convergence analysis of barycentric subdivision schemes, it is of particular interest to gain a deeper understanding of the principle of conditioning in case the filtration stems from a Markov chain. A nonlinear Markov property analogous to (9), see [10, Theorem 5.2], leads to a representation of the conditional expectation explicit enough for our purposes. We provide a short proof adapted to our setting, beginning with an auxiliary result which can be found e.g. in [10]:

###### Lemma 5.

Suppose is a Markov chain in associated to the transition kernel . Choose an initial distribution . Furthermore assume is -measurable, and let . Then for nonnegative and measurable and we have

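$$\int_\Omega f\bigl(x\circ X_m(\omega),Y(\omega)\bigr)\,P_\alpha(d\omega)=\int_\Omega\sum_{j}p_{n,m}(X_n(\omega),j)\,f\bigl(x_j,Y(\omega)\bigr)\,P_\alpha(d\omega).$$

###### Proposition 6 (Nonlinear Markov property).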

Let be a Markov chain as in Lemma 5, and suppose , with Hadamard. Choose with . Then

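$$E_\alpha\bigl(x(X_m)\,|\,\mathcal{F}_n\bigr)(\omega)=\operatorname{argmin}\sum_{j\in\mathbb{Z}^s}p_{n,m}(X_n(\omega),j)\,d^2(x_j,\cdot)=E_{X_n(\omega)}\bigl(x(X_m)\bigr). \tag{10}$$

###### Proof.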

By the linear Markov property (9),

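$$\begin{aligned}Y(\omega)&:=\operatorname{argmin}\Bigl(E_\alpha\bigl(d^2(x\circ X_m,\cdot)\mid\mathcal{F}_n\bigr)(\omega)\Bigr)\\&=\operatorname{argmin}\sum_{j\in\mathbb{Z}^s}p_{n,m}(X_n(\omega),j)\,d^2(x_j,\cdot).\end{aligned}$$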

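Clearly $Y$, as a measurable function of $X_n$, is $\mathcal{F}_n$-measurable. Thus, in order to verify that $Y$ is indeed the conditional expectation of $x\circ X_m$ given $\mathcal{F}_n$, it remains to show that for each $\mathcal{F}_n$-measurable function $Z$ the inequality $E_\alpha(d^2(x\circ X_m,Y))\le E_\alpha(d^2(x\circ X_m,Z))$ holds true, cf. Definition 2. To this end, define

$$\psi:\mathbb{Z}^s\times X\to\mathbb{R}_{\ge 0}\cup\{\infty\};\quad(i,z)\mapsto\sum_{j}p_{n,m}(i,j)\,d^2(x_j,z).$$

By construction of $Y$ we have $\psi(X_n,Y)\le\psi(X_n,Z)$. Thus, Lemma 5 implies

$$E_\alpha\bigl(d^2(x\circ X_m,Y)\bigr)=E_\alpha\bigl(\psi(X_n,Y)\bigr)\le E_\alpha\bigl(\psi(X_n,Z)\bigr)=E_\alpha\bigl(d^2(x\circ X_m,Z)\bigr).$$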

###### Remark 7.

Proposition 6 implies that the expression $E_\alpha(x(X_m)\,|\,\mathcal{F}_n)$ actually is independent of the initial distribution $\alpha$. Therefore we will omit $\alpha$ in the following and simply write $E(x(X_m)\,|\,\mathcal{F}_n)$.

We are now in a position to establish a link between nonlinear Markov semigroups and barycentric refinement processes. Suppose is a nonnegative compactly supported -variate sequence such that for all . Define recursively and

$$a^{(n+1)}_i=\sum_{j\in\mathbb{Z}^s}a_{i-2j}\,a^{(n)}_j.$$

Then clearly

$$p^a_{n,m}(i,j)=a^{(m-n)}_{i-2^{m-n}j}$$

defines a Markov transition kernel. This kernel is homogeneous in the sense that for any . We write , denote the associated Markov chain by , and refer to as the characteristic Markov chain for . The central observation of this article is the following consequence of the nonlinear Markov property (10):

###### Theorem 4.

Suppose is bounded, where is a Hadamard space. Let be a barycentric refinement scheme acting on data from according to the subdivision rule (3). Let denote the characteristic Markov chain of . Then

$$S^nx\circ X^a_0=E\bigl(x\circ X^a_n\,|||\,\mathcal{F}_0\bigr).$$

###### Proof.

This statement is proven by induction over using the following computation and Proposition 6:

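$$\begin{aligned}E\bigl(S^{n-k}x\circ X^a_k\mid\mathcal{F}_{k-1}\bigr)&=\operatorname{argmin}\Bigl(\sum_{j\in\mathbb{Z}^s}p_{k-1,k}(X^a_{k-1},j)\,d^2(S^{n-k}x_j,\cdot)\Bigr)\\&=\operatorname{argmin}\Bigl(\sum_{j\in\mathbb{Z}^s}a_{X^a_{k-1}-2j}\,d^2(S^{n-k}x_j,\cdot)\Bigr)\\&=S^{n-k+1}x\circ X^a_{k-1}.\end{aligned}$$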
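The transition kernel of the characteristic Markov chain can be checked numerically for a concrete univariate mask. The sketch below assumes the starting value $a^{(0)}=\delta$ (the sequence that is 1 at the origin and 0 elsewhere) for the recursion, uses the example mask $(1/4,3/4,3/4,1/4)$, and works with exact fractions; the function names `iterate_mask` and `kernel` are hypothetical.

```python
from fractions import Fraction

def iterate_mask(mask, n):
    """Return a^(n) as a dict, with a^(0) = delta and a^(k+1)_i = sum_j a_{i-2j} a^(k)_j."""
    current = {0: Fraction(1)}
    for _ in range(n):
        nxt = {}
        for j, w in current.items():
            for k, a in mask.items():  # k = i - 2j, hence i = k + 2j
                i = k + 2 * j
                nxt[i] = nxt.get(i, Fraction(0)) + a * w
        current = nxt
    return current

def kernel(mask, n, m, i, j):
    """p_{n,m}(i, j) = a^(m-n)_{i - 2^(m-n) j} for n <= m."""
    step = m - n
    return iterate_mask(mask, step).get(i - 2 ** step * j, Fraction(0))

mask = {0: Fraction(1, 4), 1: Fraction(3, 4), 2: Fraction(3, 4), 3: Fraction(1, 4)}
window = range(-12, 13)  # large enough to contain every nonzero term used below

# Each row of the two-step kernel is a probability vector.
row_sums = [sum(kernel(mask, 0, 2, i, j) for j in window) for i in range(-5, 6)]

# Chapman-Kolmogorov: p_{0,2}(i, j) = sum_k p_{0,1}(i, k) p_{1,2}(k, j).
ck_holds = all(
    kernel(mask, 0, 2, i, j)
    == sum(kernel(mask, 0, 1, i, k) * kernel(mask, 1, 2, k, j) for k in window)
    for i in range(-5, 6)
    for j in range(-3, 4)
)
print(row_sums, ck_holds)
```

Note that the chain moves from an index $i$ on the fine grid to an index $j$ on the coarser grid with probability $a_{i-2j}$, so $n$ steps of the chain trace a refined data point back to the initial data that influence it.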

## 2. The convergence problem

We begin this section by summarizing some well-known facts about the convergence of barycentric schemes acting on real-valued input data. Classical references on this topic are [8, 4, 9].

###### Theorem 5.

Suppose is an -variate compactly supported sequence of nonnegative reals. Define a refinement scheme via

$$\tilde{S}x_i=\sum_{j\in\mathbb{Z}^s}a_{i-2j}\,x_j,\quad\text{where } x\in\ell^\infty(\mathbb{Z}^s).$$

Then a necessary condition for the convergence of on is the basic sum rule (2). In case the mask obeys this rule, we conclude

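$$\tilde{S}x_i=\operatorname{argmin}\Bigl(\sum_{j\in\mathbb{Z}^s}a_{i-2j}\,|x_j-\cdot|^2\Bigr)=\operatorname{argmin}\Bigl(\sum_{j\in\mathbb{Z}^s}a_{i-2j}\,d_{|\cdot|}(x_j,\cdot)^2\Bigr).$$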

Moreover, converges if and only if there exists a continuous subject to the functional equations

$$\varphi(t)=\sum_{j}a_j\,\varphi(2t-j), \tag{11}$$

$$\sum_{j}\varphi(t-j)=1. \tag{12}$$

Due to Equation (11), is referred to as an -refinable function. Given bounded, real-valued input data , the limit function may be written as

$$\tilde{S}^\infty x(t)=\sum_{j\in\mathbb{Z}^s}\varphi(t-j)\,x_j.$$

In particular, , where denotes the Dirac distribution on the origin.

Assuming that conditional expectations of bounded random variables mapping to the metric space are well-defined in the sense of Definition 2, and in addition satisfy the smoothing property (7), we could deduce from Theorem 4

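$$\begin{aligned}S^nx\circ X_0&=E\bigl(x\circ X^a_n\,|||\,\mathcal{F}_0\bigr)\\&=E\bigl(x\circ X^a_n\,|\,\mathcal{F}_0\bigr)\\&=\operatorname{argmin}\Bigl(\sum_{j}p^{(n)}(X_0,j)\,d^2(x_j,\cdot)\Bigr).\end{aligned}$$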

Note, however, that , the -step transition probabilities of , can be viewed as , where denotes the linear counterpart to , and the Dirac delta on the origin, cf. Theorem 5. Thus, the assumption of associativity would guarantee that every scheme converging for linear input data would converge on as well. Indeed, the limit functions for given input data would satisfy

$$S^\infty x(t)=\operatorname{argmin}\Bigl(\sum_{j}\varphi(t-j)\,d^2(x_j,\cdot)\Bigr),$$

where , leading to a complete analogy to the linear case. However, nonlinear conditioning is a non-associative operation. The above observations demonstrate that the lack of property (7) is what necessitates a further discussion of convergence.

The first result of this section is a small, but useful generalization of Theorem 1 in [5]. Although the proof transcribes more or less directly, we give a detailed exposition for the reader’s convenience.

###### Theorem 6.

Let be refinement schemes acting on data from a Hadamard space . Then converges under the following assumptions:

(i) There is a function , a nonnegative real number and a positive integer such that

$$D(S^nx)\le\gamma^{[n/n_0]}D(x) \tag{13}$$

for and .

(ii) is convergent and satisfies

$$d_\infty(Tx,Ty)\le d_\infty(x,y)$$

for .

(iii) There is such that

$$d_\infty(Sx,Tx)\le C\cdot D(x)$$

for .

###### Proof.

We set and claim that this defines a Cauchy sequence in . Note first that given and , by continuity of respectively , we find and such that

$$d\bigl(f_r(y),f_r(2^{-m}j)\bigr)$$

Moreover, due to convergence of , by multiplying both the numerator and the denominator of the number with a power of two if necessary we may assume to be sufficiently large for

$$d\bigl(f_r(2^{-m}j),T^{m-r}(S^rx)_j\bigr)=d\bigl(T^\infty S^rx(2^{r-m}j),T^{m-r}(S^rx)_j\bigr)$$

to hold for , in addition to (14). This together with (i) and (ii) implies

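$$\begin{aligned}d\bigl(f_n(y),f_{n+1}(y)\bigr)\le{}&d\bigl(f_n(y),f_n(2^{-m}j)\bigr)+d\bigl(f_n(2^{-m}j),T^{m-n}S^nx_j\bigr)\\&+d\bigl(T^{m-n}S^nx_j,T^{m-n-1}S^{n+1}x_j\bigr)+d\bigl(T^{m-n-1}S^{n+1}x_j,f_{n+1}(2^{-m}j)\bigr)\\&+d\bigl(f_{n+1}(2^{-m}j),f_{n+1}(y)\bigr)<5C\cdot D(x)\,\gamma^{[n/n_0]},\end{aligned}$$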

showing that is a Cauchy sequence. Since is complete, we find a continuous with uniformly. We claim that converges to in the sense of (1). For and , we obtain the inequality

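$$\begin{aligned}d\bigl(T^{m-n}S^nx_j,S^mx_j\bigr)&\le\sum_{k=n}^{m-1}d\bigl(T^{m-k}S^kx_j,T^{m-k-1}S^{k+1}x_j\bigr)\\&\le\sum_{k=n}^{m-1}\gamma^{[k/n_0]}\cdot D(x)\,C\le\gamma^{[n/n_0]}\Bigl(\frac{n_0\,D(x)\,C}{1-\gamma}\Bigr),\end{aligned}$$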

which together with

$$d\bigl(f_n(2^{-m}j),S^mx_j\bigr)\le d\bigl(f_n(2^{-m}j),T^{m-n}S^nx_j\bigr)+d\bigl(T^{m-n}S^nx_j,S^mx_j\bigr)$$

establishes the claim. ∎

###### Definition 4.

In accordance with [5], we call a scheme satisfying (13) weakly contractive. Thus, a weakly contractive scheme is contractive if and only if .

In the following we rely on a nonlinear version of Jensen’s inequality due to K.-T. Sturm, cf. [10]:

###### Lemma 8 (Conditional Jensen’s inequality).

Suppose is a convex, lower semicontinuous function on a Hadamard space , and is a probability space. Moreover suppose is a filtration in . Then for each bounded, -measurable random variable the following holds true:

$$\psi\bigl(E(Y\,|||\,(\mathcal{F}_k)_{k\ge n})\bigr)\le E\bigl(\psi(Y)\,|\,\mathcal{F}_n\bigr). \tag{15}$$

###### Lemma 9.

Suppose the linear scheme associated to converges, and , with bounded, convex and balanced. Denote by the Minkowski functional of . Furthermore define via

$$D(x)=\sup_{\rho(i-j)<2}d(x_i,x_j).$$

Then the barycentric scheme associated to is weakly contractive with respect to .

###### Proof.

The Hadamard property implies that for each the function

$$X\to\mathbb{R}_{\ge 0};\quad z\mapsto d(z,z_0),$$

which clearly is continuous, is convex as well. Thus, by Jensen’s inequality (15) and Theorem 4,

$$d\bigl(S^nx\circ X_0,z_0\bigr)=d\bigl(E(x\circ X^a_n\,|||\,\mathcal{F}_0),z_0\bigr)\le E\bigl(d(x\circ X^a_n,z_0)\,|\,\mathcal{F}_0\bigr). \tag{16}$$

Recall that the transition kernel of takes the form

$$P^a=\bigl(a^{(m-n)}_{i-2^{m-n}j}\bigr)_{n\le m}.$$

Thus Proposition 6 implies . Together with (16) this gives

$$d(S^nx_i,z_0)\le\sum_{k\in\mathbb{Z}^s}a^{(n)}_{i-2^nk}\,d(x_k,z_0)\quad\text{for all } i\in\mathbb{Z}^s.$$

Substituting for , we deduce

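$$\begin{aligned}d(S^nx_i,S^nx_j)&\le\sum_{k\in\mathbb{Z}^s}a^{(n)}_{i-2^nk}\,d(x_k,S^nx_j)\\&\le\sum_{k\in\mathbb{Z}^s,\,\ell\in\mathbb{Z}^s}a^{(n)}_{i-2^nk}\,a^{(n)}_{j-2^n\ell}\,d(x_k,x_\ell).\end{aligned}$$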

The fact that the support of is contained in the balanced, convex and bounded set together with the recursion (which amounts to the Chapman-Kolmogorov equations (8)) implies , see also [4].

Since the linear subdivision scheme with mask $a$ converges, we find a refinable function $\varphi$ satisfying (11) and (12), cf. Theorem 5. Recall that one obtains this refinable function as the limit of the linear scheme acting on the input data $\delta$:

$$\sup_{i}\bigl|a^{(n)}_i-\varphi(i/2^n)\bigr|=\epsilon_n\underset{n\to\infty}{\longrightarrow}0. \tag{17}$$

Accordingly, setting , we obtain

Now, if are such that , and , one concludes

$$\rho(k-\ell)\le\frac{1}{2^n}\bigl(\rho(i-2^nk)+\rho(i-j)+\rho(j-2^n\ell)\bigr)<\frac{1}{2^n}\bigl(2(2^n-1)+2\bigr)=2. \tag{19}$$

Define

$$\psi(s,t)=\sum_{i\in\mathbb{Z}^s}\varphi(t-i)\,\varphi(s-i).$$

Then, since the refinable function is uniformly continuous, the property (12) implies that for large enough,

$$\alpha_n=\inf_{\rho(t-s)<2^{-n+1}}\psi(s,t)>\epsilon>0. \tag{20}$$

By boundedness of we also obtain

$$M=\sup_{t\in\mathbb{R}^s}\bigl|\mathbb{Z}^s\cap(t+\Omega)\bigr|<\infty. \tag{21}$$

Combining (12) with (18) through (21) further gives

$$D(S^nx)=\sup_{\rho(i-j)<2}d(S^nx_i,S^nx_j)\le\gamma_n\,D(x), \tag{22}$$

where . Clearly, for large enough, . Moreover, the estimate (22) is uniform in (and even ). The same argument leading to the first inequality in (18) together with (19) provides

$$D(S^mx)\le D(S^kx)\quad\text{for } m\ge k.$$

Thus, for one concludes:

$$D(S^nx)\le\gamma\,D(S^{n-n_0}x)\le\gamma^{[n/n_0]}D\bigl(S^{n-n_0[n/n_0]}x\bigr)\le\gamma^{[n/n_0]}D(x),$$

which completes the proof. ∎

Recall that the tensor product of two masks and is defined by

$$(a\otimes b)_{(i,j)}=a_i\cdot b_j.$$

###### Lemma 10 ([5]).

Suppose and the corresponding are as in Lemma 9. Define via ,