# Optimal point sets for quasi–Monte Carlo integration of bivariate periodic functions with bounded mixed derivatives

Aicke Hinrichs, Institut für Analysis, Johannes-Kepler-Universität Linz, Altenberger Straße 69, 4040 Linz, Austria. Email: aicke.hinrichs@uni-rostock.de

Jens Oettershagen, Institute for Numerical Simulation, Wegelerstraße 6, 53115 Bonn, Germany. Email: oettershagen@ins.uni-bonn.de
###### Abstract

We investigate quasi-Monte Carlo (QMC) integration of bivariate periodic functions with dominating mixed smoothness of order one. While there exist several QMC constructions which asymptotically yield the optimal rate of convergence, it is yet unknown which point set is optimal in the sense that it is a global minimizer of the worst case integration error. We present a computer-assisted proof by exhaustion that the Fibonacci lattice is the unique minimizer of the QMC worst case error in the periodic Sobolev space $H^{1}_{\mathrm{mix}}(\mathbb{T}^2)$ for small Fibonacci numbers $N$. Moreover, we investigate the situation for point sets whose cardinality $N$ is not a Fibonacci number. It turns out that for small $N$ the optimal point sets are integration lattices.


## 1 Introduction

Quasi-Monte Carlo (QMC) rules are equal-weight quadrature rules which can be used to approximate integrals over the $d$-dimensional unit cube $[0,1)^d$:

$$\int_{[0,1)^d} f(\boldsymbol{x})\,\mathrm{d}\boldsymbol{x} \approx \frac{1}{N}\sum_{i=1}^{N} f(\boldsymbol{x}_i),$$

where $\boldsymbol{x}_1,\dots,\boldsymbol{x}_N$ are deterministically chosen quadrature points in $[0,1)^d$. The integration error for a specific function $f$ is given as

$$\left|\,\int_{[0,1)^d} f(\boldsymbol{x})\,\mathrm{d}\boldsymbol{x} - \frac{1}{N}\sum_{i=1}^{N} f(\boldsymbol{x}_i)\,\right|.$$

To study the behavior of this error as $N$ increases for functions $f$ from a Banach space $H$ one considers the worst case error

$$\mathrm{wce}(H,P_N) = \sup_{\substack{f\in H\\ \|f\|\le 1}} \left|\,\int_{[0,1)^d} f(\boldsymbol{x})\,\mathrm{d}\boldsymbol{x} - \frac{1}{N}\sum_{i=1}^{N} f(\boldsymbol{x}_i)\,\right|.$$

Particularly nice examples of such function spaces are reproducing kernel Hilbert spaces Aronszajn_1950 (). Here, we will consider the reproducing kernel Hilbert space of 1-periodic functions with mixed smoothness. Details on these spaces are given in Section 2. The reproducing kernel is a tensor product kernel of the form

$$K_{d,\gamma}(\boldsymbol{x},\boldsymbol{y}) = \prod_{j=1}^{d} K_{1,\gamma}(x_j,y_j) \qquad\text{for } \boldsymbol{x}=(x_1,\dots,x_d),\ \boldsymbol{y}=(y_1,\dots,y_d)\in[0,1)^d$$

with a univariate kernel $K_{1,\gamma}$, given explicitly in Section 2.1, and a parameter $\gamma>0$. It turns out that minimizing the worst case error among all $N$-point sets with respect to the Hilbert space norm corresponding to this kernel is equivalent to minimizing the double sum

$$G_\gamma(\boldsymbol{x}_1,\dots,\boldsymbol{x}_N) = \sum_{i,j=1}^{N} K_{d,\gamma}(\boldsymbol{x}_i,\boldsymbol{x}_j).$$

There is a general connection between the discrepancy of a point set and the worst case error of integration. Details can be found in (NW10, Chapter 9). In our case, the relevant notion is the $L_2$-norm of the periodic discrepancy. We describe the connection in detail in Section 2.3.

There are many results on the rate of convergence of worst case errors and of the optimal discrepancies for $N\to\infty$, see e.g. Niederreiter (); NW10 (), but results on the optimal point configurations for fixed $N$ and $d$ are scarce. For discrepancies, we are only aware of W77 (), where point configurations minimizing the standard $L_2$-star-discrepancy are determined in special cases, PVC06 (), where the point minimizing the standard $L_1$- and $L_\infty$-star discrepancy for a single point is found, and LP07 (), where this is extended to more points.

It is the aim of this paper to provide a method which for $d=2$ and small $N$ yields the optimal points for the periodic $L_2$-discrepancy and the worst case error in $H^{1,\gamma}_{\mathrm{mix}}(\mathbb{T}^2)$. Our approach is based on a decomposition of the global optimization problem into exponentially many local ones, each of which possesses a unique solution that can be approximated efficiently by a nonlinear block Gauß-Seidel method. Moreover, we use the symmetries of the two-dimensional torus to significantly reduce the number of local problems that have to be considered.

It turns out that in the case that $N$ is a (small) Fibonacci number, the Fibonacci lattice yields the optimal point configuration. It is common wisdom, see e.g. BTY12 (); NS84 (); SJ94 (); SZ82 (), that the Fibonacci lattice provides a very good point set for integrating periodic functions. Our results now support the conjecture that its points are actually the best ones.

These results may suggest that the optimal point configurations are always integration lattices or at least lattice point sets. This seems to be true for some numbers of points, for example for Fibonacci numbers, but not always. However, it can be shown that integration lattices are always local minima of the worst case error. Moreover, our numerical results also suggest that for small $N$ the optimal points are always close to a lattice point set, i.e. to $N$-point sets of the form

$$\left\{\left(\frac{i}{N},\,\frac{\sigma(i)}{N}\right) : i=0,\dots,N-1\right\},$$

where $\sigma$ is a permutation of $\{0,1,\dots,N-1\}$.

The remainder of this article is organized as follows: In Section 2 we recall Sobolev spaces of functions with bounded mixed derivatives, the notion of the worst case integration error in reproducing kernel Hilbert spaces and the connection to periodic discrepancy. In Section 3 we discuss necessary and sufficient conditions for optimal point sets and derive lower bounds of the worst case error on certain local patches of the whole search domain. In Section 4 we compute candidates for optimal point sets up to machine precision. Using arbitrary precision rational arithmetic we prove that they are indeed near the global minimum, which also turns out to be unique up to torus symmetries. For certain numbers of points the global minima are integration lattices, as is the case if $N$ is a Fibonacci number. We close with some remarks in Section 5.

## 2 Quasi–Monte Carlo Integration in $H^1_{\mathrm{mix}}(\mathbb{T}^2)$

### 2.1 Sobolev Spaces of Periodic Functions

We consider univariate 1-periodic functions which are given by their values on the torus $\mathbb{T}\cong[0,1)$. For $k\in\mathbb{Z}$, the $k$-th Fourier coefficient of a function $f$ is given by $\hat f_k = \int_{\mathbb{T}} f(x)\,e^{-2\pi i k x}\,\mathrm{d}x$. The definition

$$\|f\|^2_{H^{1,\gamma}} = \hat f_0^2 + \gamma \sum_{k\in\mathbb{Z}} |2\pi k|^2 \hat f_k^2 = \left(\int_{\mathbb{T}} f(x)\,\mathrm{d}x\right)^2 + \gamma \int_{\mathbb{T}} f'(x)^2\,\mathrm{d}x \qquad (1)$$

for a function $f$ in the univariate Sobolev space $H^1(\mathbb{T})$ of functions with first weak derivative bounded in $L_2(\mathbb{T})$ gives a Hilbert space norm on $H^1(\mathbb{T})$ depending on the parameter $\gamma>0$. The corresponding inner product is given by

$$(f,g)_{H^{1,\gamma}(\mathbb{T})} = \left(\int_0^1 f(x)\,\mathrm{d}x\right)\left(\int_0^1 g(x)\,\mathrm{d}x\right) + \gamma \int_0^1 f'(x)\,g'(x)\,\mathrm{d}x.$$

We denote the Hilbert space equipped with this inner product by $H^{1,\gamma}(\mathbb{T})$.

Since $H^{1,\gamma}(\mathbb{T})$ is continuously embedded in $C(\mathbb{T})$, it is a reproducing kernel Hilbert space (RKHS), see Aronszajn_1950 (), with a symmetric and positive definite kernel $K_{1,\gamma}$, given by Wahba75 ()

$$K_{1,\gamma}(x,y) := 1 + \gamma \sum_{k\in\mathbb{Z}\setminus\{0\}} |2\pi k|^{-2}\exp\bigl(2\pi i k (x-y)\bigr) = 1 + \gamma\, k(|x-y|), \qquad (2)$$

where $k(t) = \tfrac{1}{2}\bigl(t^2 - t + \tfrac{1}{6}\bigr)$ is the Bernoulli polynomial of degree two divided by two.

This kernel has the property that it reproduces point evaluations in $H^{1,\gamma}(\mathbb{T})$, i.e. $f(x) = (f, K_{1,\gamma}(x,\cdot))_{H^{1,\gamma}(\mathbb{T})}$ for all $f\in H^{1,\gamma}(\mathbb{T})$ and $x\in\mathbb{T}$. The reproducing kernel of the tensor product space $H^{1,\gamma}_{\mathrm{mix}}(\mathbb{T}^2) = H^{1,\gamma}(\mathbb{T})\otimes H^{1,\gamma}(\mathbb{T})$ is the product of the univariate kernels, i.e.

$$K_{2,\gamma}(\boldsymbol{x},\boldsymbol{y}) = K_{1,\gamma}(x_1,y_1)\cdot K_{1,\gamma}(x_2,y_2) = 1 + \gamma k(|x_1-y_1|) + \gamma k(|x_2-y_2|) + \gamma^2 k(|x_1-y_1|)\,k(|x_2-y_2|). \qquad (3)$$
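The kernels (2) and (3) are straightforward to implement. The following Python sketch (function names are ours) also lets one check the Fourier series in (2) against the closed form numerically.

```python
import math

def bernoulli2_half(t):
    """k(t) = B_2(t)/2 = (t^2 - t + 1/6)/2, the degree-2 Bernoulli polynomial halved."""
    return 0.5 * (t * t - t + 1.0 / 6.0)

def K1(x, y, gamma=1.0):
    """Univariate reproducing kernel K_{1,gamma}(x, y) = 1 + gamma * k(|x - y|), cf. (2)."""
    return 1.0 + gamma * bernoulli2_half(abs(x - y))

def K2(p, q, gamma=1.0):
    """Bivariate tensor-product kernel K_{2,gamma} = K_{1,gamma} x K_{1,gamma}, cf. (3)."""
    return K1(p[0], q[0], gamma) * K1(p[1], q[1], gamma)
```

Truncating the Fourier series $\sum_{k\ne 0}|2\pi k|^{-2}e^{2\pi i k t} = 2\sum_{k\ge 1}\cos(2\pi k t)/(2\pi k)^2$ reproduces $k(t)$ up to the tail of the series.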

### 2.2 Quasi–Monte Carlo Cubature

A linear cubature algorithm with uniform weights $1/N$ on a point set $P_N = \{\boldsymbol{x}_1,\dots,\boldsymbol{x}_N\}\subset[0,1)^d$ is called a QMC cubature rule. Well-known examples of point sets used in such quadrature methods are digital nets, see e.g. DickPillich (); Niederreiter (), and lattice rules SJ94 (). A two-dimensional integration lattice is a set of points given as

$$\left\{\left(\frac{i}{N},\,\frac{ig}{N}\bmod 1\right) : i=0,\dots,N-1\right\}$$

for some $g\in\{1,\dots,N-1\}$ coprime to $N$. A special case of such a rank-1 lattice rule is the so-called Fibonacci lattice, which only exists for $N = F_n$ being a Fibonacci number and is given by the generating vector $(1, F_{n-1})$, where $F_n$ denotes the $n$-th Fibonacci number. It is well known that the Fibonacci lattices yield the optimal rate of convergence in certain spaces of periodic functions.
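Generating a Fibonacci lattice is a one-liner once the Fibonacci numbers are available; the sketch below assumes the indexing convention $F_1 = F_2 = 1$ (helper names are ours).

```python
def fibonacci(n):
    """Return the n-th Fibonacci number with F_1 = F_2 = 1."""
    a, b = 1, 1
    for _ in range(n - 2):
        a, b = b, a + b
    return b

def fibonacci_lattice(n):
    """Fibonacci lattice: N = F_n points with generating vector (1, F_{n-1})."""
    N, g = fibonacci(n), fibonacci(n - 1)
    return [(i / N, (i * g % N) / N) for i in range(N)]
```

For example, `fibonacci_lattice(7)` yields the 13-point lattice with generator $(1, 8)$.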

In the setting of a reproducing kernel Hilbert space $H$ with kernel $K$ on a general domain $D$, the worst case error of the QMC rule based on $P_N$ can be computed as

$$\mathrm{wce}(H,P_N)^2 = \int_D\int_D K(\boldsymbol{x},\boldsymbol{y})\,\mathrm{d}\boldsymbol{x}\,\mathrm{d}\boldsymbol{y} - \frac{2}{N}\sum_{i=1}^{N}\int_D K(\boldsymbol{x}_i,\boldsymbol{y})\,\mathrm{d}\boldsymbol{y} + \frac{1}{N^2}\sum_{i,j=1}^{N} K(\boldsymbol{x}_i,\boldsymbol{x}_j),$$

which is the norm of the error functional, see e.g. DickPillich (); NW10 (). For the kernel $K_{2,\gamma}$ we obtain

$$\mathrm{wce}\bigl(H^{1,\gamma}_{\mathrm{mix}}(\mathbb{T}^2),P_N\bigr)^2 = -1 + \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} K_{2,\gamma}(\boldsymbol{x}_i,\boldsymbol{x}_j).$$
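This kernel double sum makes the squared worst case error computable in $O(N^2)$ operations. A sketch (helper names are ours); as a sanity check, the Fibonacci lattice with $N=13$ gives a much smaller error than the degenerate set repeating a single point $13$ times.

```python
def k(t):
    """Bernoulli polynomial of degree two divided by two."""
    return 0.5 * (t * t - t + 1.0 / 6.0)

def K2(p, q, gamma):
    """Bivariate kernel K_{2,gamma}, cf. (3)."""
    return ((1.0 + gamma * k(abs(p[0] - q[0])))
            * (1.0 + gamma * k(abs(p[1] - q[1]))))

def wce_squared(points, gamma=1.0):
    """Squared worst case error in H^{1,gamma}_mix(T^2) for the QMC rule on points."""
    N = len(points)
    return -1.0 + sum(K2(p, q, gamma) for p in points for q in points) / N ** 2
```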

There is a close connection between the worst case error of integration in $H^{1,\gamma}_{\mathrm{mix}}(\mathbb{T}^2)$ for the case $\gamma=6$ and the periodic $L_2$-discrepancy, which we will describe in the following.

### 2.3 Periodic Discrepancy

The periodic $L_2$-discrepancy is measured with respect to periodic boxes. In dimension $d=1$, periodic intervals $I(x,y)$ for $x,y\in[0,1)$ are given by

$$I(x,y)=[x,y)\ \text{ if } x\le y \qquad\text{and}\qquad I(x,y)=[x,1)\cup[0,y)\ \text{ if } x>y.$$

In dimension $d$, the periodic boxes $B(\boldsymbol{x},\boldsymbol{y})$ for $\boldsymbol{x},\boldsymbol{y}\in[0,1)^d$ are products of the one-dimensional intervals, i.e.

$$B(\boldsymbol{x},\boldsymbol{y}) = I(x_1,y_1)\times\cdots\times I(x_d,y_d).$$

The discrepancy of a set $P_N\subset[0,1)^d$ with respect to such a periodic box $B$ is the deviation of the relative number of points of $P_N$ in $B$ from the volume of $B$:

$$D(P_N,B) = \frac{\#(P_N\cap B)}{N} - \mathrm{vol}(B).$$

Finally, the periodic $L_2$-discrepancy of $P_N$ is the $L_2$-norm of the discrepancy function taken over all periodic boxes $B(\boldsymbol{x},\boldsymbol{y})$, i.e.

$$D_2(P_N) = \left(\int_{[0,1)^d}\int_{[0,1)^d} D\bigl(P_N,B(\boldsymbol{x},\boldsymbol{y})\bigr)^2\,\mathrm{d}\boldsymbol{y}\,\mathrm{d}\boldsymbol{x}\right)^{1/2}.$$

It turns out, see (NW10, page 43), that the periodic $L_2$-discrepancy can be computed as

$$D_2(P_N)^2 = -3^{-d} + \frac{1}{N^2}\sum_{\boldsymbol{x},\boldsymbol{y}\in P_N}\tilde K_d(\boldsymbol{x},\boldsymbol{y}) = 3^{-d}\,\mathrm{wce}\bigl(H^{1,6}_{\mathrm{mix}}(\mathbb{T}^d),P_N\bigr)^2,$$

where $\tilde K_d = 3^{-d} K_{d,6}$ is the tensor product of $d$ copies of the univariate kernel $\tfrac{1}{3}K_{1,6}$. So minimizing the periodic $L_2$-discrepancy is equivalent to minimizing the worst case error in $H^{1,\gamma}_{\mathrm{mix}}(\mathbb{T}^d)$ for $\gamma=6$. Let us also remark that the periodic $L_2$-discrepancy is (up to a factor) sometimes also called diaphony. This terminology was introduced in Zin1997 ().
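The identity above makes the periodic $L_2$-discrepancy computable from the kernel double sum with $\gamma = 6$. A sketch for $d=2$ (helper names are ours); as a check, the discrepancy is invariant under torus shifts of the whole point set, since $k(t) = k(1-t)$.

```python
def k(t):
    return 0.5 * (t * t - t + 1.0 / 6.0)

def wce_squared(points, gamma):
    """Squared worst case error in H^{1,gamma}_mix(T^2), cf. the formula in Section 2.2."""
    N = len(points)
    s = sum((1.0 + gamma * k(abs(p[0] - q[0]))) * (1.0 + gamma * k(abs(p[1] - q[1])))
            for p in points for q in points)
    return -1.0 + s / N ** 2

def periodic_L2_discrepancy(points):
    """D_2(P_N) = 3^{-d/2} * wce(H^{1,6}_mix(T^d), P_N) with d = 2, i.e. a factor 1/3."""
    return (wce_squared(points, gamma=6.0) / 9.0) ** 0.5
```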

## 3 Optimal Cubature Points

In this section we deal with (local) optimality conditions for a set of two-dimensional points $P_N = \{(x_i,y_i) : i=0,\dots,N-1\}$, where $\boldsymbol{x} = (x_0,\dots,x_{N-1})$ and $\boldsymbol{y} = (y_0,\dots,y_{N-1})$ denote the vectors of the first and second components of the points, respectively.

### 3.1 Optimization Problem

We want to minimize the squared worst case error

$$\begin{aligned} \mathrm{wce}\bigl(H^{1,\gamma}_{\mathrm{mix}}(\mathbb{T}^2),P_N\bigr)^2 &= -1 + \frac{1}{N^2}\sum_{i,j=0}^{N-1} K_{1,\gamma}(x_i,x_j)\,K_{1,\gamma}(y_i,y_j)\\ &= -1 + \frac{1}{N^2}\sum_{i,j=0}^{N-1}\Bigl(1+\gamma k(|x_i-x_j|)+\gamma k(|y_i-y_j|)+\gamma^2 k(|x_i-x_j|)\,k(|y_i-y_j|)\Bigr)\\ &= \frac{\gamma}{N^2}\sum_{i,j=0}^{N-1}\Bigl(k(|x_i-x_j|)+k(|y_i-y_j|)+\gamma k(|x_i-x_j|)\,k(|y_i-y_j|)\Bigr)\\ &= \frac{\gamma\bigl(2k(0)+\gamma k(0)^2\bigr)}{N} + \frac{2\gamma}{N^2}\sum_{i=0}^{N-2}\sum_{j=i+1}^{N-1}\Bigl(k(|x_i-x_j|)+k(|y_i-y_j|)+\gamma k(|x_i-x_j|)\,k(|y_i-y_j|)\Bigr). \end{aligned}$$

Thus, minimizing the squared worst case error is equivalent to minimizing either

$$F_\gamma(\boldsymbol{x},\boldsymbol{y}) := \sum_{i=0}^{N-2}\sum_{j=i+1}^{N-1}\Bigl(k(|x_i-x_j|)+k(|y_i-y_j|)+\gamma k(|x_i-x_j|)\,k(|y_i-y_j|)\Bigr) \qquad (4)$$

or

$$G_\gamma(\boldsymbol{x},\boldsymbol{y}) := \sum_{i,j=0}^{N-1}\bigl(1+\gamma k(|x_i-x_j|)\bigr)\bigl(1+\gamma k(|y_i-y_j|)\bigr). \qquad (5)$$

For theoretical considerations we will sometimes use $G_\gamma$, while for the numerical implementation we will use $F_\gamma$ as objective function, since it has fewer summands.
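The last step of the derivation above can be checked numerically: with $k(0)=1/12$, the squared worst case error equals $\gamma\bigl(2k(0)+\gamma k(0)^2\bigr)/N + (2\gamma/N^2)\,F_\gamma(\boldsymbol{x},\boldsymbol{y})$. A sketch (names ours):

```python
def k(t):
    return 0.5 * (t * t - t + 1.0 / 6.0)

def F_gamma(xs, ys, gamma):
    """Objective (4): sum over unordered pairs i < j."""
    s = 0.0
    N = len(xs)
    for i in range(N - 1):
        for j in range(i + 1, N):
            kx, ky = k(abs(xs[i] - xs[j])), k(abs(ys[i] - ys[j]))
            s += kx + ky + gamma * kx * ky
    return s

def wce_squared(xs, ys, gamma):
    """Squared worst case error via the full kernel double sum."""
    N = len(xs)
    s = sum((1.0 + gamma * k(abs(xs[i] - xs[j]))) * (1.0 + gamma * k(abs(ys[i] - ys[j])))
            for i in range(N) for j in range(N))
    return -1.0 + s / N ** 2
```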

Let $\tau,\sigma$ be two permutations of $\{0,1,\dots,N-1\}$. Define the sets

$$D_{\tau,\sigma} = \Bigl\{\boldsymbol{x}\in[0,1)^N,\ \boldsymbol{y}\in[0,1)^N : x_{\tau(0)}\le x_{\tau(1)}\le\cdots\le x_{\tau(N-1)},\ y_{\sigma(0)}\le y_{\sigma(1)}\le\cdots\le y_{\sigma(N-1)}\Bigr\} \qquad (6)$$

on which all points maintain the same order in both components; hence the absolute values $|x_i-x_j|$ and $|y_i-y_j|$ can be resolved with fixed signs for $(\boldsymbol{x},\boldsymbol{y})\in D_{\tau,\sigma}$. It follows that the restriction of $F_\gamma$ to $D_{\tau,\sigma}$, i.e. $F_\gamma|_{D_{\tau,\sigma}}$, is a polynomial of degree four in $(\boldsymbol{x},\boldsymbol{y})$. Moreover, $F_\gamma|_{D_{\tau,\sigma}}$ is convex for sufficiently small $\gamma$.

###### Proposition 1

$F_\gamma|_{D_{\tau,\sigma}}$ and $G_\gamma|_{D_{\tau,\sigma}}$ are convex if $\gamma\le 6$.

###### Proof

It is enough to prove the claim for

$$G_\gamma(\boldsymbol{x},\boldsymbol{y}) = \sum_{i,j=0}^{N-1}\bigl(1+\gamma k(|x_i-x_j|)\bigr)\bigl(1+\gamma k(|y_i-y_j|)\bigr).$$

Since the sum of convex functions is convex, and since the composition of a convex function with an affine map is again convex, it is enough to show that $h(s,t) = (1+\gamma k(s))(1+\gamma k(t))$ is convex for $s,t\in[0,1]$. To this end, we show that the Hesse matrix $H$ of $h$ is positive definite if $\gamma<6$. First, the diagonal entry $\gamma k''(s)(1+\gamma k(t)) = \gamma(1+\gamma k(t))$ is positive if $\gamma<6$. Hence it is enough to check that the determinant of $H$ is positive, which is equivalent to the inequality

$$\bigl(1+\gamma k(s)\bigr)\bigl(1+\gamma k(t)\bigr) > \gamma^2\left(s-\tfrac{1}{2}\right)^2\left(t-\tfrac{1}{2}\right)^2.$$

So it remains to see that

$$1+\gamma k(s) = 1+\frac{\gamma}{2}\left(s^2-s+\frac{1}{6}\right) > \gamma\left(s-\frac{1}{2}\right)^2.$$

But this is elementary to check for $\gamma<6$ and $s\in[0,1]$. In the case $\gamma=6$ the determinant of $H$ may vanish and some additional argument is necessary which we omit here. ∎

Since

$$[0,1)^N\times[0,1)^N = \bigcup_{(\tau,\sigma)\in S_N\times S_N} D_{\tau,\sigma},$$

one can obtain the global minimum of $F_\gamma$ on $[0,1)^N\times[0,1)^N$ by computing the minimum of $F_\gamma|_{D_{\tau,\sigma}}$ for all $(\tau,\sigma)\in S_N\times S_N$ and choosing the global minimum as the smallest of all the local ones.

### 3.2 Using the Torus Symmetries

We now want to analyze how symmetries of the two-dimensional torus allow us to reduce the number of regions $D_{\tau,\sigma}$ for which the optimization problem has to be solved.

The symmetries of the torus which do not change the worst case error for the considered classes of periodic functions are generated by

1. Shifts in the first coordinate $(x,y)\mapsto(x+\delta \bmod 1,\, y)$ and shifts in the second coordinate $(x,y)\mapsto(x,\, y+\delta \bmod 1)$ for $\delta\in[0,1)$.

2. Reflection of the first coordinate $(x,y)\mapsto(-x \bmod 1,\, y)$ and reflection of the second coordinate $(x,y)\mapsto(x,\, -y \bmod 1)$.

3. Interchanging the first coordinate and the second coordinate: $(x,y)\mapsto(y,x)$.

4. The points are indistinguishable, hence relabeling the points does not change the worst case error.

Applying finite compositions of these symmetries to all the points in the point set leads to an equivalent point set with the same worst case integration error. This shows that the group $\mathcal{G}$ of symmetries acting on the pairs $(\tau,\sigma)$ indexing the regions $D_{\tau,\sigma}$, generated by the following operations,

1. replacing $\tau$ or $\sigma$ by a cyclically shifted permutation,

2. replacing $\tau$ or $\sigma$ by its flipped (order-reversed) permutation,

3. interchanging $\tau$ and $\sigma$: $(\tau,\sigma)\mapsto(\sigma,\tau)$,

4. applying a permutation $\pi$ to both $\tau$ and $\sigma$: $(\tau,\sigma)\mapsto(\tau\circ\pi,\,\sigma\circ\pi)$,

leads to equivalent optimization problems. So let us call the pairs $(\tau,\sigma)$ and $(\tau',\sigma')$ in $S_N\times S_N$ equivalent if they are in the same orbit with respect to the action of $\mathcal{G}$. In this case we write $(\tau,\sigma)\sim(\tau',\sigma')$.

Using the torus symmetries 1. and 4. it can always be arranged that $\tau = \mathrm{id}$ and $\sigma(0)=0$, which together with fixing the point $(x_0,y_0)=(0,0)$ leads to the sets

$$D_\sigma = \Bigl\{\boldsymbol{x}\in[0,1)^N,\ \boldsymbol{y}\in[0,1)^N : 0=x_0\le x_1\le\cdots\le x_{N-1},\ 0=y_0\le y_{\sigma(1)}\le\cdots\le y_{\sigma(N-1)}\Bigr\}, \qquad (7)$$

where $\sigma$ denotes a permutation of $\{1,\dots,N-1\}$.

But there are many more symmetries, and it would be algorithmically desirable to cycle through exactly one representative of each equivalence class without ever touching the other equivalent $D_\sigma$. This seems to be difficult to implement, hence we settled for a little less, which still reduces the number of permutations to be handled considerably.

To this end, let us define the symmetrized metric

$$d(i,j) = \min\bigl\{|i-j|,\ N-|i-j|\bigr\} \qquad\text{for } 0\le i,j\le N-1 \qquad (8)$$

and the following subset of $S_N$.

###### Definition 1

The set of semi-canonical permutations consists of the permutations $\sigma$ with $\sigma(0)=0$ which fulfill:

1. the tuple of consecutive distances $\bigl(d(\sigma(0),\sigma(1)),\,d(\sigma(1),\sigma(2)),\,\dots,\,d(\sigma(N-1),\sigma(N))\bigr)$ is lexicographically smaller than or equal to each tuple obtained from it by a cyclic shift or by reversing the order.

Here we identify $\sigma(N)$ with $\sigma(0)$.

This means that $\sigma$ is semi-canonical if the distance between $\sigma(0)$ and $\sigma(1)$ is minimal among all distances between consecutive $\sigma(i)$ and $\sigma(i+1)$, which can be arranged by a shift. Moreover, the distance between $\sigma(1)$ and $\sigma(2)$ is at most as large as the distance between $\sigma(N-1)$ and $\sigma(N-2)$, which can be arranged by a reflection and a shift if this is not the case. Hence we have obtained the following lemma.

###### Lemma 1

For any permutation $\sigma\in S_N$ with $\sigma(0)=0$ there exists a semi-canonical $\tilde\sigma$ such that the sets $D_\sigma$ and $D_{\tilde\sigma}$ are equivalent up to torus symmetry.

Thus we need to consider only semi-canonical $\sigma$, which is easy to do algorithmically.
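The shift/reflection reduction can be illustrated in code. The sketch below is our own illustrative implementation, not necessarily the paper's exact semi-canonical test: it groups permutations by the lexicographically smallest tuple of consecutive cyclic distances over all shifts and the reversal, so that one representative per group suffices.

```python
from itertools import permutations

def dist(i, j, N):
    """Symmetrized metric d(i, j) = min(|i - j|, N - |i - j|) on Z_N, cf. (8)."""
    return min(abs(i - j), N - abs(i - j))

def canonical_key(sigma):
    """Lexicographically smallest tuple of consecutive cyclic distances over all
    rotations of sigma and of its reversal; permutations sharing a key are
    related by the shift/reflection torus symmetries."""
    N = len(sigma)

    def dists(p):
        return tuple(dist(p[i], p[(i + 1) % N], N) for i in range(N))

    candidates = []
    for p in (sigma, sigma[::-1]):
        for r in range(N):
            candidates.append(dists(p[r:] + p[:r]))
    return min(candidates)

# all sigma with sigma(0) = 0 for N = 4, grouped into symmetry classes
sigmas = [(0,) + p for p in permutations(range(1, 4))]
keys = {canonical_key(s) for s in sigmas}
```

For $N=4$ this collapses the six permutations with $\sigma(0)=0$ into two classes, already a substantial reduction of the local problems.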

###### Remark 1

If $\sigma$ is semi-canonical, it holds that $d(\sigma(0),\sigma(1)) = \min_i d(\sigma(i),\sigma(i+1))$.

Another main advantage of considering our objective function only on the domains $D_\sigma$ is that it is not only convex but strictly convex there. This is due to the fact that we fix $(x_0,y_0)=(0,0)$.

###### Proposition 2

$F_\gamma|_{D_\sigma}$ and $G_\gamma|_{D_\sigma}$ are strictly convex if $\gamma<6$.

###### Proof

Again it is enough to prove the claim for

$$G_\gamma(\boldsymbol{x},\boldsymbol{y}) = \sum_{i,j=0}^{N-1}\bigl(1+\gamma k(|x_i-x_j|)\bigr)\bigl(1+\gamma k(|y_i-y_j|)\bigr).$$

Now we use that the sum of a convex and a strictly convex function is again strictly convex. Hence it is enough to show that the function

$$f(x_1,\dots,x_{N-1},y_1,\dots,y_{N-1}) = \sum_{i=1}^{N-1}\bigl(1+\gamma k(|x_i-x_0|)\bigr)\bigl(1+\gamma k(|y_i-y_0|)\bigr) = \sum_{i=1}^{N-1}\bigl(1+\gamma k(x_i)\bigr)\bigl(1+\gamma k(y_i)\bigr)$$

is strictly convex on $D_\sigma$. In the proof of Proposition 1 it was actually shown that $(x_i,y_i)\mapsto\bigl(1+\gamma k(x_i)\bigr)\bigl(1+\gamma k(y_i)\bigr)$ is strictly convex for $\gamma<6$ for each fixed $i$. Hence the strict convexity of $f$ follows from the following easily verified lemma. ∎

###### Lemma 2

Let $f_1,\dots,f_m$ be strictly convex functions on the convex domains $D_1,\dots,D_m$. Then the function

$$f : D = D_1\times\cdots\times D_m \to \mathbb{R},\qquad (z_1,\dots,z_m)\mapsto\sum_{i=1}^{m} f_i(z_i)$$

is strictly convex.

Hence there is indeed a unique point in each $D_\sigma$ where the minimum of $F_\gamma$ is attained.

### 3.3 Minimizing $F_\gamma$ on $D_\sigma$

Our strategy will be to compute the local minimum of $F_\gamma$ on each region $D_\sigma$ for all semi-canonical permutations $\sigma$ and to determine the global minimum by choosing the smallest of all the local ones.

This gives for each $\sigma$ the constrained optimization problem

$$\min_{(\boldsymbol{x},\boldsymbol{y})\in D_\sigma} F_\gamma(\boldsymbol{x},\boldsymbol{y}) \quad\text{subject to } v_i(\boldsymbol{x})\ge 0 \text{ and } w_i(\boldsymbol{y})\ge 0 \text{ for all } i=1,\dots,N-1, \qquad (9)$$

where the inequality constraints are linear and given by

$$v_i(\boldsymbol{x}) = x_i - x_{i-1} \quad\text{and}\quad w_i(\boldsymbol{y}) = y_{\sigma(i)} - y_{\sigma(i-1)} \qquad\text{for } i=1,\dots,N-1. \qquad (10)$$

In order to use the necessary (and due to local strict convexity also sufficient) conditions for local minima

$$\frac{\partial}{\partial x_k} F_\gamma(\boldsymbol{x},\boldsymbol{y}) = 0 \quad\text{and}\quad \frac{\partial}{\partial y_k} F_\gamma(\boldsymbol{x},\boldsymbol{y}) = 0 \qquad\text{for } k=1,\dots,N-1$$

at points in the interior of $D_\sigma$ we need to evaluate the partial derivatives of $F_\gamma$.

###### Proposition 3

For a given permutation $\sigma$ the partial derivative of $F_\gamma$ restricted to $D_\sigma$ with respect to the second component is given by

$$\left.\frac{\partial}{\partial y_k}F_\gamma(\boldsymbol{x},\boldsymbol{y})\right|_{D_\sigma} = y_k\Biggl(\,\sum_{\substack{i=0\\ i\ne k}}^{N-1} c_{i,k}\Biggr) - \sum_{\substack{i=0\\ i\ne k}}^{N-1} c_{i,k}\,y_i + \frac{1}{2}\Biggl(\,\sum_{i=0}^{k-1} c_{i,k}\,s_{i,k} - \sum_{j=k+1}^{N-1} c_{k,j}\,s_{k,j}\Biggr), \qquad (11)$$

where $c_{i,j} = 1+\gamma k(|x_i-x_j|)$ and $s_{i,j} = \mathrm{sign}(y_i-y_j)$.

Interchanging $\boldsymbol{x}$ and $\boldsymbol{y}$, the same result holds for the partial derivatives with respect to $x_k$, with the obvious modification $c_{i,j} = 1+\gamma k(|y_i-y_j|)$ and the simplification that $s_{i,j} = \mathrm{sign}(x_i-x_j) = -1$ for $i<j$ on $D_\sigma$.

The second order derivatives with respect to $\boldsymbol{y}$ are given by

$$\left.\frac{\partial^2}{\partial y_k\,\partial y_j}F_\gamma(\boldsymbol{x},\boldsymbol{y})\right|_{D_\sigma} = \begin{cases}\displaystyle\sum_{i=0}^{k-1} c_{i,k} + \sum_{i=k+1}^{N-1} c_{i,k} & \text{for } j=k,\\[2ex] -c_{k,j} & \text{for } j\ne k,\end{cases} \qquad k,j\in\{1,\dots,N-1\}. \qquad (12)$$

Again, the analogue for $\boldsymbol{x}$ is obtained with the obvious modification of the coefficients $c_{i,j}$.

###### Proof

We prove the claim for the partial derivative with respect to $y_k$:

$$\begin{aligned} \frac{\partial}{\partial y_k}F_\gamma(\boldsymbol{x},\boldsymbol{y}) &= \sum_{i=0}^{N-2}\sum_{j=i+1}^{N-1} \frac{\partial}{\partial y_k}\,k(|y_i-y_j|)\underbrace{\bigl(1+\gamma k(|x_i-x_j|)\bigr)}_{=:c_{i,j}}\\ &= \sum_{i=0}^{N-2}\sum_{j=i+1}^{N-1} c_{i,j}\,\frac{\partial}{\partial y_k}\,k(|y_i-y_j|)\\ &= \sum_{i=0}^{N-2}\sum_{j=i+1}^{N-1} c_{i,j}\,k'\bigl(s_{i,j}(y_i-y_j)\bigr)\cdot\begin{cases} s_{i,j} & \text{for } i=k,\\ -s_{i,j} & \text{for } j=k,\\ 0 & \text{else}\end{cases}\\ &= \sum_{j=k+1}^{N-1} c_{k,j}\,s_{k,j}\Bigl(s_{k,j}(y_k-y_j)-\tfrac{1}{2}\Bigr) - \sum_{i=0}^{k-1} c_{i,k}\,s_{i,k}\Bigl(s_{i,k}(y_i-y_k)-\tfrac{1}{2}\Bigr)\\ &= y_k\Biggl(\,\sum_{\substack{i=0\\ i\ne k}}^{N-1} c_{i,k}\Biggr) - \sum_{\substack{i=0\\ i\ne k}}^{N-1} c_{i,k}\,y_i + \frac{1}{2}\Biggl(\,\sum_{i=0}^{k-1} c_{i,k}\,s_{i,k} - \sum_{j=k+1}^{N-1} c_{k,j}\,s_{k,j}\Biggr). \end{aligned}$$

From this we immediately get the second derivative (12). ∎
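Formula (11) can be sanity-checked against a finite difference. The sketch below (helper names ours) uses the algebraically equivalent form $\sum_{j\ne k} c_{k,j}\bigl((y_k-y_j) - \tfrac{1}{2}\mathrm{sign}(y_k-y_j)\bigr)$ of the right-hand side of (11):

```python
def k(t):
    return 0.5 * (t * t - t + 1.0 / 6.0)

def F(xs, ys, gamma):
    """Objective (4)."""
    s = 0.0
    N = len(xs)
    for i in range(N - 1):
        for j in range(i + 1, N):
            kx, ky = k(abs(xs[i] - xs[j])), k(abs(ys[i] - ys[j]))
            s += kx + ky + gamma * kx * ky
    return s

def dF_dyk(xs, ys, gamma, kk):
    """Partial derivative from Proposition 3, with c_{i,j} = 1 + gamma*k(|x_i - x_j|)
    and s_{i,j} = sign(y_i - y_j), rewritten as a single sum over j != kk."""
    g = 0.0
    for j in range(len(xs)):
        if j == kk:
            continue
        c = 1.0 + gamma * k(abs(xs[kk] - xs[j]))
        sgn = 1.0 if ys[kk] > ys[j] else -1.0
        g += c * ((ys[kk] - ys[j]) - 0.5 * sgn)
    return g

# central finite difference at an interior point (no ordering ties nearby)
xs = [0.0, 0.2, 0.45, 0.8]
ys = [0.0, 0.55, 0.15, 0.9]
gamma, kk, h = 2.0, 2, 1e-6
yp = ys[:]; yp[kk] += h
ym = ys[:]; ym[kk] -= h
fd = (F(xs, yp, gamma) - F(xs, ym, gamma)) / (2 * h)
```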

### 3.4 Lower Bounds of $F_\gamma$ on $D_\sigma$

Up to now we are able to approximate local minima of $F_\gamma$ on a given $D_\sigma$. If this is done for all semi-canonical $\sigma$ we obtain a candidate for the global minimum, but due to the finite precision of floating point arithmetic one can never be sure to be close to the actual global minimum. However, it is also possible to compute a lower bound on $F_\gamma|_{D_\sigma}$ for each $\sigma$ using Wolfe duality for constrained optimization. It is known NW06 () that for a convex problem with linear inequality constraints like (9) the Lagrangian

$$\begin{aligned} L_F(\boldsymbol{x},\boldsymbol{y},\lambda,\mu) :=\;& F(\boldsymbol{x},\boldsymbol{y}) - \lambda^T v(\boldsymbol{x}) - \mu^T w(\boldsymbol{y}) &(13)\\ =\;& F(\boldsymbol{x},\boldsymbol{y}) - \sum_{i=1}^{N-1}\bigl(\lambda_i v_i(\boldsymbol{x}) + \mu_i w_i(\boldsymbol{y})\bigr) &(14)\end{aligned}$$

gives a lower bound on the minimum of $F$ over $D_\sigma$, i.e.

$$\min_{(\boldsymbol{x},\boldsymbol{y})\in D_\sigma} F(\boldsymbol{x},\boldsymbol{y}) \ge L_F(\tilde{\boldsymbol{x}},\tilde{\boldsymbol{y}},\lambda,\mu)$$

for all $(\tilde{\boldsymbol{x}},\tilde{\boldsymbol{y}},\lambda,\mu)$ that fulfill the constraint

$$\nabla_{(\boldsymbol{x},\boldsymbol{y})} L_F(\tilde{\boldsymbol{x}},\tilde{\boldsymbol{y}},\lambda,\mu) = 0 \quad\text{and}\quad \lambda,\mu\ge 0 \ \text{(component-wise)}. \qquad (15)$$

Here, $\nabla_{(\boldsymbol{x},\boldsymbol{y})} = (\nabla_{\boldsymbol{x}},\nabla_{\boldsymbol{y}})$, where $\nabla_{\boldsymbol{x}}$ denotes the gradient of a function with respect to the variables in $\boldsymbol{x}$. Hence it is our goal to find for each $\sigma$ such an admissible point which yields a lower bound that is larger than some given candidate for the global minimum. If the relevant computations are carried out in infinite precision rational number arithmetic, these bounds are mathematically reliable.

In order to accomplish this we first have to compute the Lagrangian of (9). To this end, let $P_\sigma$ denote the permutation matrix corresponding to $\sigma$ and

$$B := \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 & 0\\ 0 & 1 & -1 & \cdots & 0 & 0\\ \vdots & & \ddots & \ddots & & \vdots\\ 0 & \cdots & & 0 & 1 & -1\\ 0 & \cdots & & & 0 & 1 \end{pmatrix}\in\mathbb{R}^{(N-1)\times(N-1)}. \qquad (16)$$

Then the gradients of $L_F$ with respect to $\boldsymbol{x}$ and $\boldsymbol{y}$ are given by

$$\nabla_{\boldsymbol{x}} L_F(\boldsymbol{x},\boldsymbol{y},\lambda,\mu) = \nabla_{\boldsymbol{x}} F(\boldsymbol{x},\boldsymbol{y}) - \begin{pmatrix}\lambda_1-\lambda_2\\ \vdots\\ \lambda_{N-2}-\lambda_{N-1}\\ \lambda_{N-1}\end{pmatrix} = \nabla_{\boldsymbol{x}} F(\boldsymbol{x},\boldsymbol{y}) - B\lambda \qquad (17)$$

and

$$\nabla_{\boldsymbol{y}} L_F(\boldsymbol{x},\boldsymbol{y},\lambda,\mu) = \nabla_{\boldsymbol{y}} F(\boldsymbol{x},\boldsymbol{y}) - \begin{pmatrix}\mu_{\sigma(1)}-\mu_{\sigma(2)}\\ \vdots\\ \mu_{\sigma(N-2)}-\mu_{\sigma(N-1)}\\ \mu_{\sigma(N-1)}\end{pmatrix} = \nabla_{\boldsymbol{y}} F(\boldsymbol{x},\boldsymbol{y}) - BP_\sigma\mu. \qquad (18)$$
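The structure of (16)-(18) is easy to verify: multiplying $B$ by a vector $\lambda$ reproduces the gradient of $\lambda^T v(\boldsymbol{x})$ with respect to $(x_1,\dots,x_{N-1})$. A small sketch (names ours):

```python
def build_B(N):
    """The (N-1)x(N-1) bidiagonal matrix from (16): ones on the diagonal,
    minus ones on the superdiagonal."""
    B = [[0] * (N - 1) for _ in range(N - 1)]
    for i in range(N - 1):
        B[i][i] = 1
        if i + 1 < N - 1:
            B[i][i + 1] = -1
    return B

def grad_lambda_v(lam):
    """Gradient of lambda^T v(x), with v_i(x) = x_i - x_{i-1}, with respect to
    (x_1, ..., x_{N-1}): entry k is lambda_k - lambda_{k+1} (lambda_N = 0)."""
    n = len(lam)
    return [lam[i] - (lam[i + 1] if i + 1 < n else 0) for i in range(n)]

lam = [3.0, 1.0, 4.0, 1.5]          # lambda_1, ..., lambda_{N-1} for N = 5
B = build_B(5)
Bl = [sum(B[i][j] * lam[j] for j in range(4)) for i in range(4)]
```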

This leads to the following theorem.

###### Theorem 3.1

For $\delta\ge 0$ and a permutation $\sigma$ let the point $(\tilde{\boldsymbol{x}}^\sigma,\tilde{\boldsymbol{y}}^\sigma)\in D_\sigma$ fulfill

$$\frac{\partial}{\partial x_k}F(\tilde{\boldsymbol{x}}^\sigma,\tilde{\boldsymbol{y}}^\sigma) = \delta \quad\text{and}\quad \frac{\partial}{\partial y_k}F(\tilde{\boldsymbol{x}}^\sigma,\tilde{\boldsymbol{y}}^\sigma) = \delta \qquad\text{for } k=1,\dots,N-1. \qquad (19)$$

Then

 F(x,y)≥ F(~xσ,~y