A low-rank approach to the computation of path integrals

Mikhail S. Litsarev m.litsarev@skoltech.ru Ivan V. Oseledets Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Building 3, 143026 Moscow, Russia Institute of Numerical Mathematics of Russian Academy of Sciences, Gubkina St. 8, 119333 Moscow, Russia

We present a method for solving the reaction-diffusion equation with a general potential in free space. It is based on the approximation of the Feynman-Kac formula by a sequence of convolutions on sequentially diminishing grids. For the computation of the convolutions we propose a fast algorithm based on the low-rank approximation of Hankel matrices. The complexity in flops and the memory requirements of the algorithm are governed by the dimension of the integral, the approximation rank, and the mesh size in one dimension. The presented technique can be generalized to higher-order diffusion processes.

Low-rank approximation, Feynman-Kac formula, Path integral, Multidimensional integration, Skeleton approximation, Convolution
journal: Journal of Computational Physics

July 3, 2019

1 Introduction

Path integrals Feynman (1948); Feynman and Hibbs (1965); Garrod (1966) play a dominant role in the description of a wide range of problems in physics and mathematics. They are a universal and powerful tool for condensed matter and high-energy physics, the theory of stochastic processes and parabolic differential equations, financial mathematics, quantum chemistry, and many other fields. Different theoretical and numerical approaches have been developed for their computation, such as perturbation theory Abrikosov et al. (1975), the stationary phase approximation Mahan (2000); Norman Bleistein (2010), the functional renormalization group Zinn-Justin (1996); Polonyi (2003), various Monte Carlo Dick et al. (2012) and sparse grid methods Holtz (2011); Garcke and Griebel (2013). The interested reader can find particular details in the original reviews and books Wong (2014); Masujima (2008); Kleinert (2009).

In this paper we focus on the one-dimensional reaction-diffusion equation with initial distribution and a constant diffusion coefficient 


This equation may be treated in terms of a Brownian particle motion Crank (1975); Bass (1998); Chaichian and Demichev (2001), where the solution is the density distribution of the particles. The potential (or the dissipation rate) is bounded from below. We do not consider the drift term because it can be easily excluded by a substitution  Borodin and Salminen (2002).

The solution of (1) can be expressed by the Feynman-Kac formula Borodin and Salminen (2002); Kac (1951); Karatzas and Shreve (1991)


where the integration is done over the set  of all continuous paths from the Banach space starting at and stopping at arbitrary endpoints at time . is the Wiener measure, and is the Wiener process Mazo (2002); Nelson (2001). One of the advantages of the formulation (2) is that it can be directly applied for the unbounded domain without any additional (artificial) boundary conditions.

The path integral (2) corresponding to the Wiener process is typically approximated by a finite multidimensional integral with a Gaussian measure (details are given in Section 2.1). The main drawback is that this integral is high-dimensional and its computation requires special treatment. Several approaches have been developed to compute such multidimensional integrals efficiently. The sparse grid method Smolyak (1964); Gerstner and Griebel (2003) has been applied to the computation of path integrals in Gerstner and Griebel (1998), but only for dimensions up to a moderate size, which is not enough in some applications. The main disadvantage of Monte Carlo simulation is that it does not allow one to achieve high accuracy Makri and Miller (1987); Gorshkov et al. (2013) in some cases (highly oscillatory functions, functions of the sum of all arguments).

The multidimensional integrand can be represented numerically as a multidimensional array (a tensor), which contains the values of a multivariate function on a fine uniform grid. Over the last decades several approaches have been developed to work with tensors efficiently. They are based on the idea of separation of variables de Lathauwer (2009); Smilde et al. (2004); Khoromskij (2010a, 2012), first introduced in Hitchcock (1927a, b). It allows one to represent a tensor in a low-rank or low-parametric format Grasedyck et al. (2013); Kolda and Bader (2009); Grasedyck and Wolfgang (2011), where the number of parameters used for the approximation is almost linear (with respect to the dimensionality). To construct such decompositions numerically, very efficient algorithms have been developed recently: the two-dimensional incomplete cross approximation (because the low-rank representation of large matrices based on the adaptive cross approximation is directly related to this manuscript, we summarize the basics of the method in A) for the skeleton decomposition, the three-dimensional cross approximation Oseledets et al. (2008) for the Tucker format Tucker (1966, 1963, 1964); de Lathauwer et al. (2000) in 3D, and the tt-cross Oseledets and Tyrtyshnikov (2010) approximation for the tensor train decomposition Oseledets and Tyrtyshnikov (2009); Oseledets (2011), which can also be considered a particular case of the hierarchical Tucker format Hackbusch and Kühn (2009); Grasedyck (2010); Hackbusch (2012) in the higher-dimensional case. For certain classes of functions commonly used in computational physics (the multiparticle Schrödinger operator Khoromskij et al. (2009); Khoromskij and Khoromskaia (2009); Khoromskaia and Khoromskij (2015); Schneider et al. (2009); Flad, Heinz-Jürgen et al. (2007); Khoromskij et al. (2011), functions of a discrete elliptic operator Kazeev and Khoromskij (2012); Hackbusch and Khoromskij (2006a, b); Beylkin and Mohlenkamp (2002); Gavrilyuk et al. (2005); Khoromskij (2009), the Yukawa, Helmholtz and Newton potentials Khoromskij and Khoromskaia (2007); Hackbusch and Khoromskij (2007); Hackbusch and Braess (2005); Oseledets and Muravleva (2010), etc.) there exist low-parametric representations in separated formats and explicit algorithms Oseledets (2013); Beylkin and Mohlenkamp (2005) to obtain them and work with them effectively (especially the quantized tensor train (QTT) format Dolgov et al. (2012a); Khoromskij and Oseledets (2010a, b); Oseledets (2010); Lebedeva (2011); Khoromskij and Oseledets (2011); Savostyanov (2011); Khoromskij (2011)). In many cases it is very effective to compute multidimensional integrals Khoromskaia et al. (2013) using separated representations Ballani (2012), particularly for multidimensional convolutions Rakhuba and Oseledets (2015); Khoromskij (2010b); Kazeev et al. (2011); Dolgov et al. (2012b) and highly oscillatory functions Beylkin and Monzón (2010).

Our approach presented here is based on the low-rank approximation of matrices used in an essentially different manner. We formulate the Feynman-Kac formula as an iterative sequence of convolutions defined on grids of diminishing sizes. This is done in Section 3.2. To reduce the complexity of this computation, in Section 3.3 we find a low-rank basis set by applying the cross approximation (see A) to a matrix constructed from the values of a one-dimensional function on a very large grid. This reduces the computational time and memory requirements, resulting in the fast and efficient algorithm presented in Section 3.4. Numerical examples are considered in Section 4. The most interesting part is that we are able to treat non-periodic potentials without any artificial boundary conditions (Section 4.3).

2 Problem statement

2.1 Time discretization

Equation (2) corresponds to the Wiener process. A standard way to discretize the path integral is to break the time range into intervals by points

The average path of a Brownian particle after steps is defined as

where every random step , , is independently taken from a normal distribution  with zero mean and variance equal to . By definition, .
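The time discretization above can be checked with a quick numerical experiment. The following sketch (with illustrative parameter values of our own choosing, not taken from the paper) samples the independent normal steps and verifies that the endpoint of the discretized Brownian path has variance equal to the final time:

```python
import numpy as np

# Sketch: discretize a Brownian path on n time steps. Each step is drawn
# independently from N(0, delta_t), so the position after n steps has
# variance n * delta_t = t.
rng = np.random.default_rng(0)

t, n = 1.0, 100          # final time and number of time steps (illustrative)
delta_t = t / n
n_paths = 200_000        # number of sampled Brownian paths

steps = rng.normal(loc=0.0, scale=np.sqrt(delta_t), size=(n_paths, n))
endpoints = steps.sum(axis=1)    # particle positions at time t

print(endpoints.mean(), endpoints.var())   # mean near 0, variance near t
```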

Application of a suitable quadrature rule on the uniform grid (i.e., trapezoidal or Simpson rules) with the weights  to the time integration in (2) gives


and transforms the exponential factor to the approximate expression

The Wiener measure, in turn, transforms to the ordinary -dimensional measure

and the problem reduces to an -dimensional integral over the Cartesian coordinate space. Thus, we can approximate the exact solution (2) by

written in the following compact form


The integration sign here denotes an -dimensional integration over the particle steps , and is defined in (3). The convergence criterion in terms of  for the sequence (4) is discussed and proven in Chaichian and Demichev (2001), p. 33. The limit of (4) exists if it is a Cauchy sequence.

Our goal is to compute the integral (4) numerically in an efficient way.

3 Computational technique

3.1 Notations

In this paper vectors (written in columns) are denoted by boldface lowercase letters, e.g., , matrices are denoted by boldface capital letters, e.g., . The -th element of a vector is denoted by , the element of a matrix is denoted by . A set of vectors , is denoted by , and the th element of a vector is denoted by .




Let and be vectors and . We say that a vector is a convolution of two ordered vectors and and write

if  has the following components

The computation of the convolution can be naturally carried out as a multiplication by a Hankel matrix.

Definition. We say that the Hankel matrix is generated by row  and column , and denote this by



This compact notation will be used to compute convolutions (when they are written as Hankel matrix-vector products). As can be directly verified,


For two vectors and from (5) for the case , , we will also write


This notation will be used when vector is a subvector of .
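The correspondence between Hankel matrix-vector products and convolutions can be verified directly. In this sketch (with arbitrary illustrative vectors, not the paper's notation), a Hankel matrix H[i, j] = a[i + j] multiplied by a reversed vector reproduces a contiguous segment of the full discrete convolution:

```python
import numpy as np

# Sketch: a Hankel matrix H[i, j] = a[i + j] times the reversed vector b
# gives a contiguous segment of the full convolution a * b, where
# (a * b)[k] = sum_j a[j] b[k - j].
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
b = np.array([1.0, -1.0, 2.0])

m, n = 4, len(b)                    # H is m x n; requires len(a) >= m + n - 1
H = np.array([[a[i + j] for j in range(n)] for i in range(m)])

hankel_prod = H @ b[::-1]           # the vector is taken in reverse order
full_conv = np.convolve(a, b)       # full linear convolution, length m + n + 1

print(hankel_prod)                  # equals full_conv[n-1 : n-1+m]
```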

3.2 Multidimensional integration via the sequence of one-dimensional convolutions

The multidimensional integral (4) can be represented in terms of an iterative sequence of one-dimensional convolutions. Indeed, for a one-dimensional function , such that




and the initial condition


the solution (4) reads


The iteration starts from  and goes down to . Since the function  is bounded and the convolution (8) contains an exponentially decaying Gaussian, the integral can be restricted to finite lower and upper bounds. Consider


We suppose that the product rapidly decays, so that for large enough, we can approximate the integral in (8) by and assume that this approximation has an error  in some norm
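The essence of this reduction, replacing the multidimensional integral by a sequence of one-dimensional Gaussian convolutions on a mesh, can be illustrated with a toy computation. In the sketch below the potential is set to zero so that the exact answer is a spreading Gaussian; all parameter values are our own illustrative choices:

```python
import numpy as np

# Sketch of the iteration in Section 3.2 with zero potential: the
# path-integral solution reduces to n successive convolutions with a
# Gaussian kernel of variance delta_t.  Convolving an initial Gaussian of
# variance s0^2 with N(0, delta_t) n times yields a Gaussian of variance
# s0^2 + n * delta_t, which we use as the exact reference.
x = np.linspace(-10.0, 10.0, 2001)   # uniform spatial mesh
h = x[1] - x[0]                      # mesh step
delta_t, n = 0.05, 20                # time step and number of iterations
s0 = 0.5                             # std of the initial distribution

f = np.exp(-x**2 / (2 * s0**2)) / np.sqrt(2 * np.pi * s0**2)
kernel = np.exp(-x**2 / (2 * delta_t)) / np.sqrt(2 * np.pi * delta_t)

for _ in range(n):                   # one convolution per time step
    f = np.convolve(f, kernel, mode="same") * h   # quadrature weight h

s = np.sqrt(s0**2 + n * delta_t)     # exact std after n steps
exact = np.exp(-x**2 / (2 * s**2)) / np.sqrt(2 * np.pi * s**2)
print(np.max(np.abs(f - exact)))     # small discretization error
```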

Figure 1: A correspondence of meshes for two nearest iterations from equation (16) for . Blue filled circles separate the ranges corresponding to different steps , in time . Ticks on the axes label the mesh points. Violet curved lines show correspondence (16) between the two meshes for nearest iterations.

This approximation has an important drawback: since  has to be computed on the semi-open interval , the domain of  should be taken larger, i.e.,  for  steps, because of the convolution structure of the integral (12). Indeed, suppose that the function  is computed on the uniform mesh


and the integration mesh is chosen to be nested in (13) with the same step


then the function is defined on the mesh




The last equality follows from definitions (13) and (14). This is illustrated in Figure 1.

The integral (12) can be calculated for every fixed of the mesh (13) as the quadrature sum with the weights


The complexity of the computation of  for all  is  flops. It can be reduced to  by applying the fast Fourier transform (FFT) to the convolution (17). The full computation of  costs  operations and  floating-point numbers in memory. This complexity becomes prohibitive for large  (i.e., for small time steps), but can be reduced. Below we present a fast approximate method for the calculation of  in  flops and  memory with , by applying low-rank decompositions.
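The FFT-based convolution underlying this estimate is the standard zero-padded linear convolution; a minimal sketch (with random test data) is:

```python
import numpy as np

# Sketch: linear convolution of two arrays via zero-padded FFTs.
# Direct summation costs O(M^2) flops; the FFT route costs O(M log M).
def fft_convolve(f, g):
    L = len(f) + len(g) - 1                  # length of the full convolution
    nfft = 1 << (L - 1).bit_length()         # next power of two, for speed
    F = np.fft.rfft(f, nfft)
    G = np.fft.rfft(g, nfft)
    return np.fft.irfft(F * G, nfft)[:L]

rng = np.random.default_rng(1)
f = rng.standard_normal(1000)
g = rng.standard_normal(1000)

# agrees with direct convolution up to rounding error
print(np.max(np.abs(fft_convolve(f, g) - np.convolve(f, g))))
```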

3.3 Low-rank basis set for the convolution array.

In this section we provide a theoretical justification for our approach. Consider a sequence of matrices corresponding to the iterative process (17) and constructed in the following way


where is the iteration number.

Let us now consider iteration (17) at the step and for simplicity omit the index in the matrix and mesh notations (19). Let us also denote the sum (17) for taken from the grid (13) by and set . Then


The equality (20) establishes the recurrence relation between iterations at the step  (the right-hand side) and the step  (the left-hand side) according to (17) and (18), see Figure 2.

Figure 2: The transition between two neighbouring iterations. The left-hand-side matrix  is multiplied by the vector , resulting in the vector  according to (20). The explicit structure of the matrix blocks  in (21) and the vector blocks  in (24) is shown. The entries of the vector  are then multiplied by the corresponding factor  to produce the next iteration step (9). From the new vector  a new matrix  is formed according to (19). The last point from the previous iteration is not needed and is discarded. The steps then repeat for the next iteration.

The matrix is a Hankel matrix, as follows from definition (19), and consists of square blocks , , such that


Here, every block is a Hankel matrix as well, generated by its upper row and right column, respectively:

(the notation is introduced in Section 3.1, Definition 3.1), where


by definition; see Figure 2. It can also be represented as a sum of two anti-triangular (i.e., triangular with respect to the anti-diagonal) Hankel matrices


where the upper-left one has a nonzero anti-diagonal and the bottom-right one has a zero anti-diagonal, according to (22).

Equation (20) may be rewritten in the block form (see again Figure 2)


Here every block is multiplied by the same vector . The number of matrix-vector multiplications can be reduced if the dimension of the linear span is less than . Before estimating this dimension we formulate some auxiliary lemmas (proven in C).



Lemma 1.

Let be a basis set of span , , and let matrix . Then is a basis set of span from (23).

Lemma 2.

Let be a basis set of span , , and let matrix . Then is a basis set of span from (23).

Lemma 3.

Let be a basis set of span , such that according to (7). Then is a basis set of span .

Let us define a basis set as follows


An obvious corollary of the previous lemmas is the following theorem.

Theorem 1.

The dimension of the linear span of matrices is equal to . Moreover, it is contained in the linear span of the matrices defined in (25).


The matrix  can be written as a sum (23), . According to Lemma 1, the set  is a basis set of the span . By Lemma 3,  is a basis set of the span , and by Lemma 2,  is a basis set of . The subspaces  and  have only the zero matrix in common, so the dimension of the basis is . ∎

Lemma 4.

Let be a basis set of span . Then for basis matrices defined in (25) the computation of the matrix-by-vector products


costs flops for a fixed .


Consider a Hankel matrix

A product  is the result of the convolution , which can be computed by the FFT procedure Brigham (1988); Nussbaumer (1981) in  flops for a fixed . The vector  is taken in reverse order. ∎

Once the basis for the span of is found, the complexity of the multiplication in (20) can be estimated as follows.

Theorem 2.

Let the set defined in (25) be a basis set of the linear span generated by the set of Hankel matrices  defined in (21). Then the computation of any elements of the vector  (20) costs flops for .


Indeed, by the assumption for each , . The complexity of the product , for a fixed  is flops by Lemma 4. The computation of such products for all takes flops.

The vector , which is a subvector of , is represented via a few matrix-by-vector products (26) as follows


The computation of its th component takes  flops for any . The computation of the  components of the vector , which are also components of the particular vector  (for ), costs, in turn,  flops. Finally, . ∎


Each component of the resulting vector can be computed by the formula


Here  is the -th component of the vector  and  is the -th component of the vector .

Remark. It follows from Lemma 3 that  in (27).

3.4 Final algorithm

To compute , which defines the final solution (11) on the mesh (13), one needs to carry out  iterations (17) starting from  down to . At each iteration step we construct a function , which approximates the entries  in equation (28) as follows. Suppose that the function has already been constructed at the previous step . Then, to compute  at the current iteration , we consider (but do not compute all the elements of) the matrix  with the entries


and apply the cross approximation (39) to this matrix. The columns of this matrix are vectors element-wise multiplied by the corresponding exponential factor with the potential (29), see Figure 3. The cross approximation algorithm requires only a few entries, which are chosen adaptively. They are calculated on-the-fly by the function  for the particular points . Thus,


where  and  are rank-  matrices stored in memory. By construction, the -th column of the matrix  is the vector  from (22) and the -th column of the matrix  is the basis vector  from Lemma 1. Hence,  is the matrix of coefficients of the decomposition (45). Once the cross approximation (30) is obtained, the memory allocated for all data structures related to  can be overwritten at the next iteration.
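The cross (skeleton) approximation used at each iteration can be sketched in a few lines. The version below uses full pivoting for clarity, which is an illustrative simplification: the actual adaptive cross algorithm summarized in A selects pivots from a few computed rows and columns rather than scanning the whole matrix.

```python
import numpy as np

# Minimal skeleton (cross) approximation sketch: greedily pick the pivot
# (i, j) with the largest residual, subtract the rank-1 cross through it,
# and repeat until the residual drops below the tolerance.  A ~= U @ V.
def cross_approximation(A, tol=1e-10, max_rank=50):
    R = A.copy()
    U_cols, V_rows = [], []
    for _ in range(max_rank):
        i, j = np.unravel_index(np.argmax(np.abs(R)), R.shape)
        if abs(R[i, j]) < tol:
            break
        U_cols.append(R[:, j] / R[i, j])     # scaled column through the pivot
        V_rows.append(R[i, :].copy())        # row through the pivot
        R -= np.outer(U_cols[-1], V_rows[-1])
    return np.array(U_cols).T, np.array(V_rows)

# A matrix generated by a smooth one-dimensional kernel has rapidly
# decaying singular values, hence a small epsilon-rank.
x = np.linspace(0.0, 1.0, 200)
A = np.exp(-np.subtract.outer(x, x) ** 2)
U, V = cross_approximation(A, tol=1e-8)
print(U.shape[1], np.max(np.abs(A - U @ V)))   # small rank, small error
```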

Figure 3: Construction of the matrix  from a one-dimensional convolution (20) according to the algorithm in Section 3.4. On a spatially homogeneous mesh (13) the corresponding entries of the vector  (20) are calculated. By definition, the vector  is composed of the vectors  (24). Each column of the matrix  is composed of  multiplied by the corresponding factor . This matrix is then decomposed by the cross approximation (30). Only some elements of the matrix  are needed for the approximation; they are chosen adaptively and computed on-the-fly. The convolutions are then calculated via the fast Fourier transform and saved in memory. Particular values of  for the next iteration step can then be computed by formula (28).

The computation of the circulant matrix-vector products (26) is done according to Lemma 4 by the convolution , where  is a column of the matrix . The vectors  are also saved in memory. Then  is calculated by equation (28), and the algorithm proceeds to the next iteration.

At some iteration step  the rank of the decomposition (30) reaches the number of columns, and from this iteration on it is more efficient to carry out the convolution (20) without low-rank approximation. The complexity of one iteration of the presented algorithm is estimated in Theorem 2. Finally, for all steps it is  flops, . The standard FFT-based algorithm applied to the whole array without any low-rank compression at each step gives a total complexity of  flops for all steps. We illustrate these theoretical estimates with the example from Section 4.2 in Figure 4.

Figure 4: A numerical illustration of the theoretical estimates for the example from Section 4.2. For the standard FFT-based algorithm applied to the whole array, the flop complexity is labeled by square points. The low-rank complexity is labeled by circles. The time is scaled in minutes;  is the number of dimensions (iteration steps), , .

The asymptotic complexity proven in Theorem 2 is practically useful for . This is the main assumption on the matrix  from (29). The existence of such an approximation (and the corresponding properties of the initial problem) is, in general, still an open question. Some particular cases were studied in Chiu and Demanet (2013), where it was shown that the cross approximation converges for matrices whose singular vectors satisfy the coherence property. Some estimates can also be found in Bebendorf and Grzhibovskis (2006); Bebendorf and Kunis (2009); Bebendorf (2011). There is also a theoretical approach to identifying the existence of a low-rank structure of a given matrix generated by a one-dimensional target function a priori (see Hairer et al. (2006) and B for details).

4 Numerical experiments and discussions

4.1 Harmonic Oscillator

As a first example, let us consider a model system, which can be solved analytically, with the initial condition and the dissipation rate defined as


According to equation (11) the exact solution for the particular case (31) has the following form (see D for derivation)


Comparison of the numerical low-rank solution with the exact one (32) gives the relative error


which is of the order of machine precision, where  is the approximate solution on the final mesh and  is the exact one on the same mesh. For our example

Here , , , and the mesh is uniform on  with  points. It is interesting that the scheme is exact in this case.

4.2 Cauchy Distribution

The second example is taken from Gerstner and Griebel (1998) and is interesting because it can be solved analytically as well. For and initial condition such that


the exact solution is

In Table 1 we present numerical results demonstrating the numerical order of the scheme by the Runge formula

with respect to  and the timings for the whole computation. Here is the computed solution at the final step in time.

Using our approach, it becomes possible to calculate the solution for large values of the final time  due to the low-rank approximation of the matrices  composed of the columns of the integrand values (see Section 3.4). This significantly reduces the computational cost. For example, for the last row of Table 1 the iterations start from the calculation of the convolution on the range  with  mesh points. This is reduced to the calculation of  (the rank) convolutions of two arrays with  elements.

As can be seen from our results, the scheme is of second order in time. It can be improved to higher orders by Richardson extrapolation on fine meshes Brezinski and Zaglia (1991); Stoer and Bulirsch (2002). Another way is to use other path-integral formulations with high-order propagators Makri (1995, 2014).
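Richardson extrapolation, mentioned above as a way to raise the order, can be illustrated on a generic second-order approximation (a central difference here; this is an illustrative example, not the paper's scheme):

```python
import math

# Sketch: a second-order quantity A(h) = A + c*h^2 + O(h^4) can be
# extrapolated to fourth order via (4*A(h/2) - A(h)) / 3.
def A(h):
    # central difference for d/dx sin(x) at x = 1: second order in h
    return (math.sin(1 + h) - math.sin(1 - h)) / (2 * h)

exact = math.cos(1.0)
h = 0.1
err_h = abs(A(h) - exact)
err_rich = abs((4 * A(h / 2) - A(h)) / 3 - exact)
print(err_h, err_rich)   # the extrapolated error is much smaller
```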

Table 1: Convergence rate for system (34). Accuracy of the cross approximation . Direct convolutions start from , ; the range of the final spatial domain is , . The dimension of the integral (4) is labeled by ,  is the time step,  is the final time for the solution ,  is the error estimated by Richardson extrapolation, and  is the order of the scheme for . The ranks of the matrix  from (30) are presented in the column labeled by . The CPU time for the computation of the integral (4) at all points of the mesh is reported in the last column.
Figure 5: Potential  and initial distribution  for the periodic system with impurity (36). The potential oscillates in free space. The functions  and  are shifted relative to each other to break the symmetry.
Figure 6: Convergence of the solution for the nonperiodic potential with impurity (36) for different . These results correspond to the data presented in Table 2. The number of spatial mesh points in the final range is . The dissipation rate (36) leads to a decrease in the norm of the distribution density. As seen in the picture, the solution is far from the correct one for dimensions .
Figure 7: The first few singular values (s.v.) of the matrix  (30) for system (34) at each iteration step. The first s.v.  is presented in absolute value. The others are given relative to it, as . Values below the cross accuracy  are discarded. As can be seen, the approximate SVD rank is similar to the cross rank (in the sense of criterion (40)).

4.3 Nonperiodic potential with impurity

The dissipation rate causes the creation and annihilation of diffusing particles, as follows from the main equation (1). Without the Laplacian, which is responsible for the free diffusion, we have

It can be seen that the density of particles increases over time for  and decreases for , correspondingly. The case  may lead to an instability in the solution, because the integral


may diverge (see Eq. (4)). Therefore, when choosing , one should make sure that the integral in (35) converges.

Consider the following problem (see Figure 5)


It can be interpreted as a nonperiodic system with an impurity. The term does not decay in the spatial domain and it is not periodic. Therefore the reduction of this problem to a bounded domain is not a trivial task and would require sophisticated artificial boundary conditions.

In Table 2 we present the results of numerical calculations, which show the order of the numerical scheme. In Figure 6 we also present the computed solutions for different values of . Even in this case, the solution converges with order . We also used Richardson extrapolation of  for different  to get higher-order schemes in time.

Table 2: Convergence rate for system (36). Accuracy of the cross approximation . Direct convolutions start from , ; the final domain is , . The dimension of the integral (4) is labeled by ,  is the time step, and  is the final time. The order of the scheme for  and the relative error  (33) are estimated from the original data computed by the algorithm from Section 3.4. The next values  and  are estimated by Richardson extrapolation. As can be seen, the scheme has fourth order in time after the extrapolation. The ranks of the matrix  from (30) are given in the column labeled by . The CPU time for the computation of the integral (4) at all points of the mesh is reported in the last column.

4.4 Monte Carlo experiments

In this section we present the results of Monte Carlo simulations. To estimate the solution at a fixed point, the following formula is used
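Since the estimator formula itself is not reproduced here, the following sketch shows a generic Feynman-Kac Monte Carlo estimator under our own illustrative choices: a constant potential and a Gaussian initial condition, for which the exact answer is known in closed form.

```python
import numpy as np

# Sketch of a Feynman-Kac Monte Carlo estimator at a fixed point x:
#   u(x, t) ~ mean over sampled paths of exp(-sum_k V(x_k) * dt) * f(x + W_t),
# with the illustrative convention Var(W_t) = t.  Here V = c is constant,
# so the weight exp(-c*t) is deterministic and the exact solution is the
# heat-kernel-smoothed initial Gaussian times exp(-c*t).
rng = np.random.default_rng(2)

x, t, n = 0.0, 1.0, 50
dt = t / n
c, s0 = 0.3, 0.5                    # potential strength and initial std
n_paths = 100_000

steps = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n))
paths = x + np.cumsum(steps, axis=1)            # positions x_1, ..., x_n
weight = np.exp(-c * t)                         # constant-V path weight
f_end = np.exp(-paths[:, -1]**2 / (2 * s0**2)) / np.sqrt(2 * np.pi * s0**2)
estimate = np.mean(weight * f_end)

exact = np.exp(-c * t) / np.sqrt(2 * np.pi * (s0**2 + t))
print(estimate, exact)              # agree to within Monte Carlo error
```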