# A low-rank approach to the computation of path integrals

###### Abstract

We present a method for solving the reaction-diffusion equation with a general potential in free space. It is based on the approximation of the Feynman-Kac formula by a sequence of convolutions on sequentially diminishing grids. For the computation of the convolutions we propose a fast algorithm based on the low-rank approximation of Hankel matrices. The algorithm has complexity of $O(nrM\log M)$ flops and requires $O(Mr)$ floating-point numbers in memory, where $n$ is the dimension of the integral, $r \ll n$, and $M$ is the mesh size in one dimension. The presented technique can be generalized to higher-order diffusion processes.

###### keywords:

Low-rank approximation, Feynman-Kac formula, Path integral, Multidimensional integration, Skeleton approximation, Convolution^{†}

^{†}journal: Journal of Computational Physics

July 3, 2019

## 1 Introduction

Path integrals Feynman (1948); Feynman and Hibbs (1965); Garrod (1966) play a dominant role in the description of a wide range of problems in physics and mathematics. They are a universal and powerful tool for condensed matter and high-energy physics, the theory of stochastic processes and parabolic differential equations, financial mathematics, quantum chemistry and many other fields. Different theoretical and numerical approaches have been developed for their computation, such as perturbation theory Abrikosov et al. (1975), the stationary phase approximation Mahan (2000); Norman Bleistein (2010), the functional renormalization group Zinn-Justin (1996); Polonyi (2003), and various Monte Carlo Dick et al. (2012) and sparse grid methods Holtz (2011); Garcke and Griebel (2013). The interested reader can find further details in the original reviews and books Wong (2014); Masujima (2008); Kleinert (2009).

In this paper we focus on the one-dimensional reaction-diffusion equation with an initial distribution and a constant diffusion coefficient $\sigma$,

$$\frac{\partial u(x,t)}{\partial t} = \sigma\,\frac{\partial^2 u(x,t)}{\partial x^2} - V(x)\,u(x,t), \qquad u(x,0) = f(x). \tag{1}$$

This equation may be treated in terms of a Brownian particle motion Crank (1975); Bass (1998); Chaichian and Demichev (2001), where the solution $u(x,t)$ is the density distribution of the particles. The potential $V(x)$ (or the dissipation rate) is bounded from below. We do not consider the drift term because it can be easily excluded by a substitution Borodin and Salminen (2002).

The solution of (1) can be expressed by the Feynman-Kac formula Borodin and Salminen (2002); Kac (1951); Karatzas and Shreve (1991)

$$u(x,t) = \int_{C\{x,0;\,t\}} f(\xi(t))\,\exp\!\left(-\int_0^{t} V(\xi(\tau))\,d\tau\right) \mathcal{D}_{\xi}, \tag{2}$$

where the integration is done over the set of all continuous paths $\xi(\tau)$ from the Banach space $C\{x,0;t\}$ starting at $\xi(0) = x$ and stopping at arbitrary endpoints at time $t$. Here $\mathcal{D}_{\xi}$ is the Wiener measure, and $\xi(\tau)$ is the Wiener process Mazo (2002); Nelson (2001). One of the advantages of the formulation (2) is that it can be directly applied to the unbounded domain without any additional (artificial) boundary conditions.

Path integral (2) corresponding to the Wiener process is typically approximated by a finite multidimensional integral with the Gaussian measure (details are given in Section 2.1). The main drawback is that this integral is high-dimensional and its computation requires special treatment. Several approaches have been developed to compute such multidimensional integrals efficiently. The sparse grid method Smolyak (1964); Gerstner and Griebel (2003) has been applied to the computation of path integrals in Gerstner and Griebel (1998), but only for moderate dimensions, which is not enough in some applications. The main disadvantage of Monte Carlo simulation is that it does not allow one to achieve high accuracy Makri and Miller (1987); Gorshkov et al. (2013) in some cases (highly oscillatory functions, functions of the sum of all arguments).

The multidimensional integrand can be represented numerically as a
multidimensional array (a tensor), which contains values of a multivariate
function on a fine uniform grid.
Over the last decades several approaches have been developed
to work with tensors efficiently. They are based on the idea of separation of
variables de Lathauwer (2009); Smilde et al. (2004); Khoromskij (2010a, 2012)
first introduced in Hitchcock (1927a, b).
It allows one to represent a tensor in a low-rank or low-parametric
format Grasedyck et al. (2013); Kolda and Bader (2009); Grasedyck and Wolfgang (2011),
where the number of parameters used for the approximation is
almost linear with respect to the dimensionality.
To construct such decompositions numerically,
very efficient algorithms have been developed recently:
the two-dimensional incomplete cross
approximation^{1}^{1}1Because the low-rank
representation of large matrices based on the adaptive cross approximation
is directly related to the manuscript, we summarize the basics of the method
in A. for the skeleton decomposition,
the three-dimensional cross approximation Oseledets et al. (2008) for
the Tucker format Tucker (1966, 1963, 1964); de Lathauwer et al. (2000) in 3D,
and the tt-cross approximation Oseledets and Tyrtyshnikov (2010) for the tensor train
decomposition Oseledets and Tyrtyshnikov (2009); Oseledets (2011),
which can also be considered a particular case of the
hierarchical Tucker format Hackbusch and Kühn (2009); Grasedyck (2010); Hackbusch (2012)
in the higher-dimensional case.
For certain classes of functions commonly used in the computational physics
(multiparticle Schrödinger operator Khoromskij et al. (2009); Khoromskij and Khoromskaia (2009); Khoromskaia and Khoromskij (2015); Schneider et al. (2009); Flad et al. (2007); Khoromskij et al. (2011),
functions of a discrete elliptic operator Kazeev and Khoromskij (2012); Hackbusch and
Khoromskij (2006a, b); Beylkin and Mohlenkamp (2002); Gavrilyuk et al. (2005); Khoromskij (2009),
Yukawa, Helmholtz and Newton potentials Khoromskij and Khoromskaia (2007); Hackbusch and Khoromskij (2007); Hackbusch and Braess (2005); Oseledets and Muravleva (2010), etc.) there exist
low-parametric representations in separated formats
and explicit algorithms Oseledets (2013); Beylkin and Mohlenkamp (2005) to obtain
and effectively work with them
(especially quantized tensor train (QTT)
format Dolgov et al. (2012a); Khoromskij and Oseledets (2010a, b); Oseledets (2010); Lebedeva (2011); Khoromskij and Oseledets (2011); Savostyanov (2011); Khoromskij (2011)).
In many cases it is very effective to compute the multidimensional
integrals Khoromskaia et al. (2013)
using separated representations Ballani (2012),
particularly for multidimensional convolutions Rakhuba and Oseledets (2015); Khoromskij (2010b); Kazeev et al. (2011); Dolgov et al. (2012b) and
highly oscillatory functions Beylkin and Monzón (2010).

Our approach presented here is based on the low-rank approximation of matrices used in an essentially different manner. We formulate the Feynman-Kac formula as an iterative sequence of convolutions defined on grids of diminishing sizes. This is done in Section 3.2. To reduce the complexity of this computation, in Section 3.3 we find a low-rank basis set by applying the cross approximation (see A) to a matrix constructed from the values of a one-dimensional function on a very large grid. This reduces the computational time and memory requirements, resulting in the fast and efficient algorithm presented in Section 3.4. Numerical examples are considered in Section 4. The most interesting part is that we are able to treat non-periodic potentials without any artificial boundary conditions (Section 4.3).

## 2 Problem statement

### 2.1 Time discretization

Equation (2) corresponds to the Wiener process. A standard way to discretize the path integral is to break the time range $[0, t]$ into $n$ uniform subintervals by intermediate points

The average path of a Brownian particle after $k$ steps is defined as the cumulative sum of random increments,

where every random step $\delta\xi_k$, $k = 1, \dots, n$, is independently taken from a normal distribution with zero mean and variance proportional to the time step $\delta t$. By definition, $\xi(0) = 0$.
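This discretization can be illustrated with a short numerical experiment. The sketch below (our own minimal example, using a unit-rate Wiener process with per-step variance $\delta t$; the paper's variance convention depends on the diffusion coefficient, elided here) samples discrete Brownian paths as cumulative sums of independent normal steps and checks the endpoint variance:

```python
import numpy as np

def brownian_paths(n_steps, n_paths, dt, rng):
    """Discrete Brownian paths: xi(t_k) is a cumulative sum of i.i.d. normal
    steps with zero mean and variance dt (unit-rate Wiener process; the
    paper's per-step variance depends on the diffusion coefficient)."""
    steps = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    return np.cumsum(steps, axis=1)      # xi(t_1), ..., xi(t_n); xi(0) = 0

rng = np.random.default_rng(0)
paths = brownian_paths(n_steps=100, n_paths=50_000, dt=0.01, rng=rng)
print(paths.shape, paths[:, -1].var())   # endpoint variance close to 100*0.01 = 1
```

For a Wiener process the variance of $\xi(t)$ grows linearly in $t$, so the sampled endpoint variance should be close to $n\,\delta t$.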

Application of a suitable quadrature rule on the uniform grid (e.g., the trapezoidal or Simpson rules) with weights $w_k$ to the time integration in (2) gives

(3) |

and transforms the exponential factor to the approximate expression

The Wiener measure, in turn, transforms to the ordinary -dimensional measure

and the problem reduces to an $n$-dimensional integral over the Cartesian coordinate space. Thus, we can approximate the exact solution (2) by

written in the following compact form

(4) |

The integration sign here denotes an $n$-dimensional integration over the particle steps $\delta\xi_k$, and the exponential factor is defined in (3). The convergence criterion in terms of $n$ for the sequence (4) is discussed and proven in Chaichian and Demichev (2001), p. 33. The limit of (4) exists if it is a Cauchy sequence.

Our goal is to compute the integral (4) numerically in an efficient way.

## 3 Computational technique

### 3.1 Notations

In this paper vectors (written as columns) are denoted by boldface lowercase letters, e.g., $\mathbf{v}$; matrices are denoted by boldface capital letters, e.g., $\mathbf{A}$. The $i$-th element of a vector is denoted by $v_i$, and the $(i,j)$ element of a matrix is denoted by $a_{ij}$. A set of vectors $\mathbf{v}^{(1)}, \dots, \mathbf{v}^{(m)}$ is denoted by $\{\mathbf{v}^{(k)}\}$, and the $i$-th element of a vector $\mathbf{v}^{(k)}$ is denoted by $v_i^{(k)}$.

###### Definition

Let $\mathbf{f}$ and $\mathbf{g}$ be vectors of lengths $n$ and $m$, $m \le n$. We say that a vector $\mathbf{h}$ is a convolution of the two ordered vectors $\mathbf{f}$ and $\mathbf{g}$ and write

if $\mathbf{h}$ has the following components

The computation of the convolution can be naturally carried out as a multiplication by a Hankel matrix.

###### Definition

We say that the Hankel matrix is generated by a row and a column, and denote this by

if

(5) |

This compact notation will be used to compute convolutions (when they are written as Hankel matrix-vector products). As can be directly verified,

(6) |

For two vectors and from (5) for the case , , we will also write

(7) |

This notation will be used when one vector is a subvector of the other.
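As a minimal illustration of the definition above and of identity (6), the sketch below builds a Hankel matrix whose rows are sliding windows of a vector and verifies that the matrix-vector product reproduces the corresponding (correlation-type) convolution sums. The index convention $h_i = \sum_j f_{i+j}\,g_j$ is our assumption, since the displayed formula (5) is not recoverable from this copy:

```python
import numpy as np
from scipy.linalg import hankel

# h_i = sum_j f_{i+j} g_j : each output entry is an inner product of g
# with a sliding window of f, i.e. a Hankel matrix-vector product.
f = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
g = np.array([0.5, 0.25, 0.25])

n, m = len(f), len(g)
# Hankel matrix with entries H[i, j] = f[i + j], rows are windows f[i:i+m]
H = hankel(f[: n - m + 1], f[n - m :])
h_hankel = H @ g
h_direct = np.array([f[i : i + m] @ g for i in range(n - m + 1)])
print(np.allclose(h_hankel, h_direct))  # True
```

The same sums are produced by `np.correlate(f, g, mode="valid")`, which is how such products are usually computed in practice.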

### 3.2 Multidimensional integration via the sequence of one-dimensional convolutions

Multidimensional integral (4) can be represented in terms of an iterative sequence of one-dimensional convolutions. Indeed, for a sequence of one-dimensional functions such that

(8) |

with

(9) |

and the initial condition

(10) |

the solution (4) reads

(11) |

The iteration starts from the last time step and goes down to the first. Since the function is bounded and the convolution (8) contains the exponentially decaying Gaussian, the integral can be restricted to finite lower and upper bounds. Consider

(12) |

We suppose that the product rapidly decays, so that for a large enough truncation parameter we can approximate the improper integral in (8) by an integral with finite limits and assume that this approximation has a small error in some norm

This approximation has an important drawback: since the function has to be computed on a finite interval, its domain at earlier iterations should be taken larger, growing with the number of remaining steps, because of the convolution structure of the integral (12). Indeed, suppose that the function is computed on the uniform mesh

(13) |

and the integration mesh is chosen to be nested in (13) with the same step

(14) |

then the function is defined on the mesh

(15) |

and

(16) |

The last equality follows from definitions (13) and (14). This is illustrated in Figure 1.
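The equivalence between the nested integral and a sequence of one-dimensional convolutions on diminishing grids can be checked numerically. The toy sketch below (our own notation and grid sizes, with a Gaussian weight playing the role of the heat kernel) computes a doubly nested Gaussian integral both directly as a 2-D quadrature and as two sequential 1-D convolutions, the inner one tabulated on a larger nested grid:

```python
import numpy as np

# Toy check of Section 3.2:
#   I(x) = sum over (y1, y2) of f(x + y1 + y2) * rho(y1) * rho(y2)
# computed (a) as a direct 2-D quadrature and (b) as two sequential 1-D
# convolutions. Grids, the weight rho, and the function f are illustrative.
h = 0.05
y = np.arange(-6.0, 6.0 + h / 2, h)                  # integration mesh
rho = np.exp(-y**2 / 2) / np.sqrt(2 * np.pi) * h     # Gaussian weight * step

f = lambda s: np.exp(-s**2)                          # bounded test function
x = np.array([-0.3, 0.0, 0.7])                       # target points

# (a) direct two-dimensional quadrature sum
Y1, Y2 = np.meshgrid(y, y, indexing="ij")
direct = np.array([(f(xi + Y1 + Y2) * np.outer(rho, rho)).sum() for xi in x])

# (b) the inner integral first, tabulated on a larger nested grid
x_mid = np.arange(-7.0, 7.0 + h / 2, h)
g = np.array([(f(xm + y) * rho).sum() for xm in x_mid])
seq = np.array([(np.interp(xi + y, x_mid, g) * rho).sum() for xi in x])

print(np.max(np.abs(direct - seq)))                  # agrees to rounding error
```

Note that the intermediate grid `x_mid` must be wider than the target grid, exactly as in (15)-(16): the inner convolution is needed at points shifted by the integration variable.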

The integral (12) can be calculated for every fixed point of the mesh (13) as a quadrature sum with weights

(17) |

(18) |

The complexity of the direct computation of (17) for all mesh points at one iteration is quadratic in the grid size. It can be reduced to almost linear cost by applying the Fast Fourier transform (FFT) to the convolution (17). Nevertheless, the full computation over all iterations becomes prohibitive for small time steps (a large number of iterations), but can be reduced further. Below we present a fast approximate method based on low-rank decompositions, with almost linear complexity and memory cost in the one-dimensional grid size.
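The FFT speedup of one quadrature step of type (17) can be sketched as follows; the grid sizes, the kernel, and the trapezoidal weights are illustrative, not the paper's actual parameters:

```python
import numpy as np
from scipy.signal import fftconvolve

# One iteration of type (17), schematically: for each target point compute
#   sum_j w_j * f(x_i + y_j) * K(y_j)
# with a Gaussian kernel K. The direct O(M*m) loop is compared with the
# FFT-based version, which costs O((M + m) log(M + m)).
M, m, h = 2000, 501, 0.01
f = np.exp(-0.5 * (np.arange(M) * h - 10.0) ** 2)    # current iterate on a grid
y = (np.arange(m) - m // 2) * h
K = np.exp(-y**2 / 2) / np.sqrt(2 * np.pi)           # Gaussian kernel
w = np.full(m, h); w[0] = w[-1] = h / 2              # trapezoidal weights

direct = np.array([f[i : i + m] @ (K * w) for i in range(M - m + 1)])
fast = fftconvolve(f, (K * w)[::-1], mode="valid")   # same sums via FFT
print(np.allclose(direct, fast))
```

Reversing the weighted kernel turns the FFT convolution into the correlation-type sums used above, so both computations return identical values up to rounding.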

### 3.3 Low-rank basis set for the convolution array

In this section we provide a theoretical justification for our approach. Consider a sequence of matrices corresponding to the iterative process (17) and constructed in the following way

(19) |

where $k$ is the iteration number.

Let us now consider iteration (17) at step $k$ and for simplicity omit the index $k$ in the matrix and mesh notations (19). Let us also write the sum (17), for points taken from the grid (13), in matrix-vector form. Then

(20) |

The equality (20) establishes the recurrence relation between iterations at the step (the right-hand side) and the step (the left-hand side) according to (17) and (18), see Figure 2.

The matrix is a Hankel matrix, as follows from definition (19), and consists of square blocks , , such that

(21) |

Here, every block is a Hankel matrix as well, generated by its upper row and its right column, respectively:

(the notation is introduced in Section 3.1, Definition 3.1), where

(22) |

by the definition, see Figure 2.
It can also be represented as a sum of two
anti-triangular^{2}^{2}2By an anti-triangular matrix we mean a matrix that is
triangular with respect to its anti-diagonal. Hankel matrices

(23) |

where the upper-left summand has a nonzero anti-diagonal and the bottom-right one has a zero anti-diagonal, according to (22).

Equation (20) may be rewritten in the block form (see again Figure 2)

(24) |

Here every block is multiplied by the same vector. The number of matrix-vector multiplications can be reduced if the dimension of the corresponding linear span is less than the number of blocks. Before estimating this dimension we formulate some auxiliary lemmas (proven in C).


###### Lemma 1.

Let be a basis set of span , , and let matrix . Then is a basis set of span from (23).

###### Lemma 2.

Let be a basis set of span , , and let matrix . Then is a basis set of span from (23).

###### Lemma 3.

Let be a basis set of span , such that according to (7). Then is a basis set of span .

Let us define a basis set as follows

(25) |

An obvious corollary of the previous Lemma is the following Theorem.

###### Theorem 1.

The dimension of the linear span of matrices is equal to . Moreover, it is contained in the linear span of the matrices defined in (25).

###### Proof.

###### Lemma 4.

Let be a basis set of span . Then for basis matrices defined in (25) the computation of the matrix-by-vector products

(26) |

costs flops for a fixed .

###### Proof.

Once the basis for the span of is found, the complexity of the multiplication in (20) can be estimated as follows.

###### Theorem 2.

###### Proof.

Indeed, by the assumption for each , . The complexity of the product , for a fixed is flops by Lemma 4. The computation of such products for all takes flops.

The vector , which is a subvector of , is represented via few matrix-by-vector products (26) as follows

(27) |

The computation of its th component takes flops for any . Computation of components of the vector , which are also the components of the particular vector (for ), costs, in turn, flops. Finally, . ∎

### 3.4 Final algorithm

To compute , which defines the final solution (11)
on the mesh (13),
one needs to carry out iterations (17) starting from the last time step down to the first.
At each iteration step we construct a function
which approximates the required entries in equation (28) as follows.
Suppose that the function has already been constructed
at the previous step.
Then, to compute the current iterate,
we consider^{3}^{3}3but do not compute all its elements
the matrix with the entries

(29) |

and apply the cross approximation (39) to this matrix. The columns of this matrix are vectors element-wise multiplied by the corresponding exponential factor with the potential (29), see Figure 3. The algorithm of the cross approximation requires only a small number of matrix entries, which are chosen adaptively. They are calculated by the function on-the-fly at the particular points requested. Thus,

(30) |

where $\mathbf{U}$ and $\mathbf{V}$ are matrices of rank $r$ saved in memory. By construction, the columns of one factor are the vectors from (22) and the columns of the other are the basis vectors from Lemma 1; the latter is the matrix of coefficients of the decomposition (45). Once the cross approximation (30) is obtained, the memory allocated for all data structures related to the previous iteration can be overwritten at the next iteration.
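A minimal sketch of the cross (adaptive cross approximation) idea is given below. For simplicity it uses full pivoting on an explicitly formed matrix, whereas the algorithm of this section touches only a few entries computed on-the-fly; the test matrix is an illustrative smooth kernel assumed to be numerically low-rank:

```python
import numpy as np

def aca(A, tol=1e-10, max_rank=50):
    """Adaptive cross approximation with full pivoting (simplest variant).

    Returns U, V with A ~= U @ V.T, built from a few rows and columns of A.
    A production code would use partial pivoting and never form A fully.
    """
    R = A.copy()
    us, vs = [], []
    for _ in range(max_rank):
        i, j = np.unravel_index(np.argmax(np.abs(R)), R.shape)
        if abs(R[i, j]) < tol:
            break
        u = R[:, j] / R[i, j]            # scaled column of the residual
        v = R[i, :].copy()               # row of the residual
        us.append(u); vs.append(v)
        R = R - np.outer(u, v)           # rank-one update of the residual
    return np.array(us).T, np.array(vs).T

# A smooth bivariate function sampled on a grid gives a numerically
# low-rank matrix (the situation assumed for the matrix in (29)).
x = np.linspace(0.0, 1.0, 200)
A = np.exp(-np.subtract.outer(x, x) ** 2)
U, V = aca(A)
print(U.shape[1], np.linalg.norm(A - U @ V.T) / np.linalg.norm(A))
```

The number of columns of `U` is the rank found adaptively; for smooth kernels it is far smaller than the matrix size, which is exactly what makes the factored convolution cheap.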

Computation of the circulant matrix-vector products (26) is done according to Lemma 4 by a convolution with each column of the factor matrix. The resulting vectors are also saved in memory. Then the next iterate is calculated by equation (28), and the algorithm proceeds to the next iteration.

At some iteration step the rank of the decomposition (30) will reach the number of columns, and from this iteration on it will be more efficient to carry out the convolution (20) without low-rank approximation. The complexity of one iteration of the presented algorithm is estimated in Theorem 2; summation over all steps gives the total cost of the method. The standard FFT-based algorithm applied to the whole array without any low-rank compression at each step has a higher total complexity. We illustrate these theoretical estimates with the example from Section 4.2 in Figure 4.

Basically, the asymptotic complexity proven in Theorem 2 is practically useful when the rank is small. This is the main assumption on the matrix from (29). The existence of such an approximation (and its connection to the properties of the initial problem) is in general still an open question. Some particular cases were studied in Chiu and Demanet (2013), where it was shown that the cross approximation converges for matrices whose singular vectors satisfy the coherence property. Some estimates can also be found in Bebendorf and Grzhibovskis (2006); Bebendorf and Kunis (2009); Bebendorf (2011). There is a theoretical idea of how to identify the existence of low-rank structure of a given matrix generated by a one-dimensional target function a priori (see Hairer et al. (2006) and B for details).
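The a-priori low-rank structure of Hankel matrices generated by a one-dimensional function can be probed numerically. In the sketch below the generator is an illustrative damped oscillation, i.e. a sum of two complex exponentials, for which the Hankel matrix has exact rank two; the numerical rank is read off from the singular values:

```python
import numpy as np
from scipy.linalg import hankel, svdvals

# A cheap a-priori probe of low-rank structure: form the Hankel matrix
# generated by samples of a one-dimensional function and inspect its
# singular values. g(t) = exp(-0.3 t) cos(2 t) is a sum of two complex
# exponentials, so H[i, j] = g(t_{i+j}) has exact rank 2.
t = np.linspace(0.0, 10.0, 799)          # 799 samples -> a 400 x 400 Hankel
g = np.exp(-0.3 * t) * np.cos(2.0 * t)
H = hankel(g[:400], g[399:])
s = svdvals(H)
num_rank = int(np.sum(s > 1e-10 * s[0]))
print(num_rank)                          # 2: far below the matrix size
```

More generally, a generator well approximated by a sum of $r$ exponentials yields a Hankel matrix of numerical rank about $r$, which is the kind of structure the algorithm exploits.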

## 4 Numerical experiments and discussions

### 4.1 Harmonic Oscillator

As a first example, let us consider a model system, which can be solved analytically, with the initial condition and the dissipation rate defined as

(31) |

According to equation (11) the exact solution for the particular case (31) has the following form (see D for derivation)

(32) |

Comparison of the numerical low-rank solution with the exact one (32) gives the relative error

(33) |

which is of the order of magnitude of the machine precision, where the approximate solution on the final mesh is compared with the exact one (32) on the same mesh. For our example

Here the parameters and the uniform mesh are fixed as stated above. It is interesting that the scheme is exact for this case.

### 4.2 Cauchy Distribution

The second example is taken from Gerstner and Griebel (1998) and is interesting because it can be solved analytically as well. For the potential and initial condition chosen such that

(34) |

the exact solution is

In Table 1 we present numerical results demonstrating the numerical order of the scheme, estimated by the Runge formula

with respect to the time step, together with the timings for the whole computation. Here the computed solution at the final step in time is used in the formula.

Using our approach, it becomes possible to calculate the solution for large values of the final time due to the low-rank approximation of matrices composed from the columns of the integrand values (see Section 3.4). That significantly reduces the computational cost. For example, for the last row of Table 1 the iterations start from the calculation of the convolution on a very large range of mesh points; this is reduced to the calculation of only a few (the rank) convolutions of one-dimensional arrays.

As can be seen from our results, the scheme has second order in time. It can be improved to higher orders by Richardson extrapolation on fine meshes Brezinski and Zaglia (1991); Stoer and Bulirsch (2002). Another way is to use other path integral formulations with high-order propagators Makri (1995, 2014).
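The Runge formula used in Table 1 can be demonstrated on any second-order time integrator. The sketch below (a toy explicit midpoint scheme for $u' = -u$, not the paper's solver) estimates the order from three runs with successively halved time steps:

```python
import numpy as np

def integrate(n_steps):
    """Explicit midpoint (second-order) time stepping for u' = -u, u(0) = 1."""
    dt = 1.0 / n_steps
    u = 1.0
    for _ in range(n_steps):
        u_half = u - 0.5 * dt * u        # half step
        u = u - dt * u_half              # full step with midpoint slope
    return u

u1, u2, u4 = integrate(100), integrate(200), integrate(400)
# Runge formula: the order p is log2 of the ratio of successive differences
p = np.log2(abs(u1 - u2) / abs(u2 - u4))
print(p)
```

For a scheme of order $p$ the differences between solutions on halved time steps shrink by the factor $2^p$, so the printed estimate should be close to 2 here.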


### 4.3 Nonperiodic potential with impurity

The dissipation rate causes the creation and annihilation of diffusing particles, as it follows from the main equation (1). Without the Laplacian, which is responsible for the free diffusion, we have

It can be seen that the density of particles increases over time for $V(x) < 0$ and decreases for $V(x) > 0$, correspondingly. The case $V(x) < 0$ may lead to an instability in the solution, because the integral

(35) |

may diverge (see Eq. (4)). Therefore, when choosing a potential that is negative somewhere, one should make sure that the integral in (35) converges.

Consider the following problem (see Figure 5)

(36) |

It can be interpreted as a nonperiodic system with an impurity. The potential term does not decay in the spatial domain and it is not periodic. Therefore the reduction of this problem to a bounded domain is not a trivial task and would require sophisticated artificial boundary conditions.

In Table 2 we present the results of numerical calculations, which show the order of the numerical scheme. In Figure 6 we also present the computed solutions for different values of the final time. Even in this case, the solution converges with the expected order. We also used the Richardson extrapolation of the solutions for different time steps to get higher-order schemes in time.


### 4.4 Monte Carlo experiments

In this section we present the results of Monte Carlo simulation. To estimate the solution at a fixed point the following formula is used

(37) |
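A minimal Monte Carlo estimator of the Feynman-Kac solution can be sketched as follows. The variance convention ($2\sigma\,\delta t$ per step for the equation $u_t = \sigma u_{xx} - Vu$) and the left-point quadrature of the exponent are our assumptions, not necessarily those of the paper; the sanity check uses a constant potential, for which the answer is known in closed form:

```python
import numpy as np

def feynman_kac_mc(x, t, V, f, n_steps=200, n_paths=50_000, sigma=0.5, seed=1):
    """Monte Carlo estimate of u(x,t) = E[ f(x + W_t) exp(-int_0^t V(x + W_s) ds) ]
    for u_t = sigma * u_xx - V(x) u. W has variance 2*sigma*dt per step and
    the exponent is integrated by the left-point rule (both our assumptions)."""
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    pos = np.full(n_paths, float(x))
    weight = np.zeros(n_paths)           # accumulates int V ds along each path
    for _ in range(n_steps):
        weight += V(pos) * dt            # left-point quadrature of the exponent
        pos += rng.normal(0.0, np.sqrt(2.0 * sigma * dt), n_paths)
    return float(np.mean(f(pos) * np.exp(-weight)))

# Sanity check on a case solvable by hand: V = c and f = 1 give
# u(x,t) = exp(-c*t) regardless of the diffusion.
c, t_final = 0.7, 1.0
est = feynman_kac_mc(0.0, t_final,
                     V=lambda s: c + 0.0 * s,
                     f=lambda s: 1.0 + 0.0 * s)
print(abs(est - np.exp(-c * t_final)))
```

For nonconstant potentials the estimator converges only at the slow $O(N^{-1/2})$ Monte Carlo rate, which is exactly the accuracy limitation discussed in the introduction and compared against in this section.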