Nonparametric Uncertainty Quantification for Stochastic Gradient Flows4footnote 44footnote 4This work is partially supported by the Office of Naval Research Grants MURI N00014-12-1-0912

# Nonparametric Uncertainty Quantification for Stochastic Gradient Flows444This work is partially supported by the Office of Naval Research Grants MURI N00014-12-1-0912

Tyrus Berry222Department of Mathematics, the Pennsylvania State University, University Park, PA     John Harlim222Department of Mathematics, the Pennsylvania State University, University Park, PA 333Department of Meteorology, the Pennsylvania State University, University Park, PA 555The research of JH is also partially supported by the Office of Naval Research Grants N00014-13-1-0797 and the National Science Foundation DMS-1317919.
July 6, 2019
###### Abstract

This paper presents a nonparametric statistical modeling method for quantifying uncertainty in stochastic gradient systems with isotropic diffusion. The central idea is to apply the diffusion maps algorithm to a training data set to produce a stochastic matrix whose generator is a discrete approximation to the backward Kolmogorov operator of the underlying dynamics. The eigenvectors of this stochastic matrix, which we will refer to as the diffusion coordinates, are discrete approximations to the eigenfunctions of the Kolmogorov operator and form an orthonormal basis for functions defined on the data set. Using this basis, we consider the projection of three uncertainty quantification (UQ) problems (prediction, filtering, and response) into the diffusion coordinates. In these coordinates, the nonlinear prediction and response problems reduce to solving systems of infinite-dimensional linear ordinary differential equations. Similarly, the continuous-time nonlinear filtering problem reduces to solving a system of infinite-dimensional linear stochastic differential equations. Solving the UQ problems then reduces to solving the corresponding truncated linear systems in finitely many diffusion coordinates. By solving these systems we give a model-free algorithm for UQ on gradient flow systems with isotropic diffusion. We numerically verify these algorithms on a 1-dimensional linear gradient flow system where the analytic solutions of the UQ problems are known. We also apply the algorithm to a chaotically forced nonlinear gradient flow system which is known to be well approximated as a stochastically forced gradient flow.

Key words.   nonparametric UQ, nonlinear filtering, nonlinear response, diffusion maps, statistical prediction, gradient flows

AMS subject classifications.   58J65, 82C31, 93E11, 60G25

## 1 Introduction

An important emerging scientific discipline is to quantify the evolution of low-order statistics of dynamical systems in the presence of uncertainties due to errors in modeling, numerical approximations, parameters, and initial and boundary conditions [30]. While many uncertainty quantification (UQ) methods have been proposed, they rely on some knowledge about the underlying dynamics, at least at the coarse grained levels. One class of popular UQ methods is low-order truncation modeling based on projection of the dynamics onto some basis [30, 17], such as the principal component analysis [14, 16], the truncated polynomial chaos basis [23, 17], the dynamically orthogonal basis [28], the Nonlinear Laplacian Spectral Analysis (NLSA) basis [12], and recently, the “ROMQG” which stands for Reduced Order Modified Quasilinear Gaussian” method [29] which was carefully designed for predicting statistics of turbulent systems.

This paper considers UQ problems in which one has no explicit parametric form for the underlying dynamics. Given only a time series of the underlying dynamics, our goal is to devise UQ methods based on nonparametric modeling, applying ideas from diffusion maps [3, 9, 6]. While the classical Cauchy problem for solving the backward Kolmogorov equation is to find the semigroup solutions of this PDE, the diffusion maps technique can be interpreted as the inverse of this Cauchy problem. Namely, given a realization of the associated stochastic process, we construct a stochastic matrix (also known as Markov matrix), whose generator is a discrete approximation to the backward Kolmogorov operator. Numerically, the stochastic matrix is constructed by evaluating an appropriate kernel function on all pairs of data points and carefully re-normalizing the resulting matrix to account for sampling bias in the data set and to insure the Markov property. In this paper, we apply the recently developed variable bandwidth diffusion kernel [6], which generalizes the original result of diffusion maps introduced in [9] to non-compact manifolds. Our choice is motivated by the fact that this variable bandwidth kernel is more robust and it significantly improves the operator estimation in areas of sparse sampling, e.g. on the tail of a distribution [6]. The UQ methods discussed in this paper are only applicable for a class of nonlinear dynamical systems which can be described by stochastically forced gradient flows, since this is the class of problems for which one can learn the (backward) Kolmogorov operator from the diffusion maps algorithm [9, 6]. In the conclusion we discuss the possibility of applying the UQ framework developed here to more general systems.

The representation of the probabilistic forecasting problem in diffusion coordinates was first noted in [22, 8], however this forecasting approach was never demonstrated. In fact, such forecasting would have been difficult for the non-compact examples which we will consider, because they used fixed bandwidth kernels in [22, 8] which would not recover the true Kolmogorov operator, as shown in [6]. In fact, the goal of [22, 8] was not to solve the forecasting problem, but instead to a low-dimensional representations of complex dynamical systems. We should note that a ‘locally-scaled’ version of the approach in [22, 8] was taken in [25], which used an ad-hoc variable bandwidth function. While the ad-hoc bandwidth of [25] is valid for finding low-dimensional coordinates, it would not be useful for the forecasting problem since, as shown in [6], the bandwidth function would affect the estimated operator. The previous work in [22, 8, 25] all focused on using diffusion coordinates as a nonlinear map to obtain a low-dimensional representation of the observed dynamics. In this paper we take a significantly different perspective, namely, we treat the diffusion coordinates as a basis for smooth functions and we represent probability densities in this basis for the purposes of UQ.

We will consider three UQ problems, namely the problems of forecasting, filtering [20], and response [18]. While most stochastic projection methods chose the projection basis coordinate [30, 17] depending on the nature of the problems (such as the geometry of the state space or the distribution of the parameters), here we consider solving these UQ problems in the diffusion coordinates. For diffusion processes on nonlinear manifolds, this data-driven basis is the most natural choice since these eigenfunctions implicitly describe both the dynamics and geometry of the underlying dynamical system. For example, in this coordinate basis, the forecasting problem reduces to solving a diagonal infinite-dimensional systems of linear ODEs, as originally noted in [22]. For continuous-time nonlinear filtering problems, the solutions are characterized by the unnormalized conditional distribution which solves the Zakai equation [2]. In diffusion coordinates, we shall see that this nonlinear filtering problem reduces to solving a system of infinite-dimensional multiplicatively forced linear stochastic differential equations. Finally, we consider the response problem studied in [18], which applied the fluctuation-dissipation theorem to estimate the statistical linear response of perturbed systems given the data set of the unperturbed system and the functional form of the external perturbation. Finding the response in our framework requires the diffusion coordinates of the perturbed system. By assuming the perturbation is given by a known change in the potential function, we show that this perturbed basis can be constructed using an appropriate kernel, evaluated on unperturbed data. In this basis, the corresponding nonlinear response problem reduces to solving an infinite dimensional linear system of ODEs. In each case described above, the nonparametric UQ algorithms consist of solving truncated systems of ODEs and SDEs with finitely many diffusion coordinates.

This paper will be organized as follows: In Section LABEL:DM, we briefly review the diffusion maps and several computational manipulations for obtaining the appropriate nonparametric probabilistic models by projecting the operators to a basis of the data set itself. In Section LABEL:UQ, we formulate two UQ problems (prediction and filtering) in the diffusion coordinates. In Section LABEL:sec4, we will formulate the third UQ problem (nonlinear response) in the diffusion coordinates. In Section LABEL:sec5, we verify our numerical methods on linear and nonlinear examples. We conclude the paper with a short summary and discussion in Section LABEL:sec6. We accompany this paper with several movies in the electronic supplementary materials, depicting the evolution of the estimated time-dependent distributions.

## 2 Diffusion Kernels as Nonparametric Models

Consider a state variable evolving on a -dimensional manifold according to the stochastic gradient flow,

 dx=−∇U(x)dt+√2DdWt, \hb@xt@.01(2.1)

where denotes the potential of the vector field at and is a positive scalar that characterizes the amplitude of a dimensional white noise on the manifold . Given data sampled independently from the invariant measure of (LABEL:SDE), , our goal is to approximate the generator,

 L=DΔ−∇U⋅∇, \hb@xt@.01(2.2)

with a stochastic matrix constructed from the data set. In order to estimate from the data set, we will assume that (LABEL:SDE) is ergodic, and for estimating the coefficient we will require the system to be wide-sense stationary as well. The Markov matrix that we will construct is a nonparametric model in the sense that the structure of the manifold and the form of the potential function are not assumed to be known. In (LABEL:BKop), is the Laplacian (with negative eigenvalues), and is the gradient operator on with respect to the Riemannian metric inherited from the ambient space .

The diffusion maps algorithm [9] generates a stochastic matrix by evaluating a fixed bandwidth isotropic kernel function on all pairs of data points and then renormalizing the matrix and extracting the generator. When the data set lies on a compact manifold , the resulting matrix converges to in the limit of large data for any smooth potential . Recently, this result was extended by both authors to non-compact manifolds by using a variable bandwidth kernel [6]. They showed that the variable bandwidth kernels produce significantly improved estimation in areas of sparse sampling (e.g. such as rare events which occur on the tail of a distribution) and are less dependent on the choice of bandwidth. As mentioned in the introduction, we will consider the variable bandwidth kernels [6] as the key ingredient in constructing the nonparametric models in this paper.

In Section LABEL:findingBK below, we briefly review the methodology in [6] for approximating the generator . Since our goal is to approximate , then we need to determine from the data. In Section LABEL:findingD will provide one method to determine using the correlation time of the data and we also mention several other methods to determine .

### 2.1 Approximating the Generator of Stochastic Gradient Flows

The key to our approach is the intuition that continuous notions such as functions and operators have discrete representation in the basis of the data set itself. Given a data set , sampled independently from the invariant measure , a function is represented by a vector . Similarly, an integral operator , where denotes the volume form on , is represented by a matrix-vector multiplication between the matrix, and the -dimensional vector, . With these definitions, the matrix product yields a vector of length with -th component,

 1N(K→f)i=1NN∑j=1Kij→fj=1NN∑j=1K(xi,xj)f(xj)\lx@stackrelN→∞⟶Gϵf(xi),

where the limit follows from interpreting the summation as a Monte-Carlo integral. Thus, the matrix takes functions defined on to functions defined on , so in this sense we think of as an operator written in the basis of delta functions on the data set. Notice the crucial fact that the data is not uniformly distributed on , so the operator is biased by the sampling measure . This same bias applies to inner products, if are functions on and are their representations as vectors evaluated at , then the dot product has the following interpretation,

 1N→f⋅→g=1NN∑i=1f(xi)g(xi)\lx@stackrelN→∞⟶∫Mf(y)g(y)peq(y)dV(y)≡⟨f,g⟩L2(M,peq).

The previous formula shows that inner products weighted by the sampling measure will be easy to compute in this framework.

With the above intuition in mind, we construct a matrix that will converge to the generator , where is defined in (LABEL:BKop), in the limit as in the sense that,

 limN→∞N∑j=1(Lϵ)ij→fj=D−1Lf(xi)+O(ϵ).

The theory developed in [6] shows that such an can be constructed with a variable bandwidth kernel,

 Kϵ(x,y) =exp{−∥x−y∥24ϵρ(x)ρ(y)}, \hb@xt@.01(2.3)

where the bandwidth function, , is chosen to be inversely proportional to the sampling density, . This choice enables us to control the error bounds in the area of sparse sample (see [6] for the detailed error estimates). Using this kernel requires us to first estimate the sampling density in order to define the bandwidth function . There are many methods of estimating , such as kernel density estimation methods [26, 24, 31, 27]. In this paper, we estimate using a variable bandwidth kernel method based on the distance to the nearest neighbors (see Section 4 of [6] for details).

Given the kernel in (LABEL:VBkernel), we apply the following normalization (first introduced in [9]) to remove the sampling bias,

Note that is a kernel density estimate of the sampling distribution based on the kernel and this is the estimate which will be used in all our numerical examples below. Throughout this paper we will use the normalization with as suggested by the theory of [6], where is the intrinsic dimension of the manifold . We note that and can be estimated from data using the method described in [6, 4]. The next step is to construct a Markov matrix from by defining,

Finally, the discrete approximation to the continuous generator is given by,

 Lϵ(xi,xj) =^Kϵ,α(xi,xj)−δijϵρ(xi)2. \hb@xt@.01(2.4)

By computing the eigenvectors and eigenvalues of , for , we are approximating the eigenfunctions and eigenvalues of , respectively. Since is a scalar constant, obviously, the eigenvectors and eigenvalues are discrete approximations of the eigenfunctions and eigenvalues of the continuous generator , respectively. We should note that this trivial argument does not hold when is a general non-diagonal matrix describing anisotropic diffusion on with respect to the Riemannian metric inherited from the ambient space . In that case, we suspect that one must use a different kernel function, such as the Local Kernels defined in [7] which we discuss in the conclusion.

In addition to the generator , we can also approximate the solution semi-group by the matrix,

 Fϵ≡IN×N+ϵDLϵ,

where has the same eigenvectors as but with eigenvalues where are the eigenvalues of . For arbitrary we can approximate the eigenvalues of the solution semi-group by which converges to as .

In order to insure that the eigenvectors (which approximate the eigenfunctions ) form an orthonormal basis with respect to the inner product , we note that,

 δij=⟨φi,φj⟩L2(M,peq)=∫Mφi(x)φj(x)peq(x)dV(x)=limN→∞1NN∑l=1(→φi)l(→φj)l=limN→∞→φ⊤i→φjN.

Since are eigenvectors, they are already orthogonal, thus we only need to renomalize so that . To do this we simply replace with . To simplify the notation below, we define .

### 2.2 Determining the Diffusion Coefficient

The algorithm described in Section LABEL:findingBK approximates the generator , where is defined in (LABEL:BKop). Since our aim is to obtain the generator , then we must approximate the scalar diffusion coefficient . Several nonparametric methods for estimating have been proposed. For example, [10] considered an optimization problem based on a variational formulation which matches information from the discrete approximation of a conditional Markov chain to the generators of the forward and backward Kolmogorov operators. Another method employed in [21] used the one-lag correlations to approximate the diffusion coefficient. The problem with this estimate is that when the true dynamics are not described by a gradient flow, we would like to use a gradient flow to approximate the dynamics on long time scales (see Section LABEL:sec52 for an example). If we estimate using the one-lag correlation as in [21], we will only match the short-time correlation of the dynamics, but we are interested in the long-time correlation which is better captured by the correlation time.

In this section, we introduce a method to determine using the correlation time of a one-dimensional observable , which results from applying the observation function to the multidimensional time series, . By computing the correlation time of this observable for the estimated dynamical system, , and comparing to the empirical correlation time estimated from the training data set, we will be able to extract the intrinsic parameter . This approach generalizes the Mean Stochastic Model (MSM) of [19, 20]. Intuitively, in the previous section we used the invariant measure to implicitly determine the potential function (through the kernel based generator), and in this section we use the correlation time to fit the stochastic forcing constant . The correlation time is defined by,

 Tc=∫∞0C(τ)C(0)−1dτ,

where is the correlation function of the one-dimensional time series, . The correlation function can be determined from the data by averaging over , or by the inverse Fourier transform of the power spectrum, which follows from the Wiener-Khinchin formula, . For small data sets we found the Wiener-Khinchin approach to be more robust and this is the approach used in the examples in Section LABEL:sec5. We note that the observation function can be any centered functional on the data set as long as the observed process is stationary and is not identically zero (which guarantees that the correlation time is well defined). Note that in the case of anisotropic diffusion, one would need to compute the entire correlation matrix as a function of . Since we assume the diffusion is isotropic, we will only need a single correlation statistic, and the correlation time of the time series will be sufficient.

Once is estimated from the data, we need to find the value of which makes (LABEL:SDE) have correlation time . In order to do this, we will show that can be estimated from the eigenvalues and eigenfunctions of . Let be the semigroup solution for the backward equation. Following [18] we have,

 C(τ)=⟨S(x(t+τ))S(x(t))⟩=∫M(eτL(S)(x))S(x)peq(x)dV(x).

Writing in the eigenbasis of , we have,

 C(τ)=∫\@fontswitchM∑ieDλiτ⟨S,φi⟩⊤peqφi(x)S(x)peq(x)dV(x)=∑ieDλiτ⟨S,φi⟩2peq. \hb@xt@.01(2.5)

Note that since and , so we require to be centered (otherwise the integral of the first term diverges). Noting that for , we can compute the correlation time analytically as,

 Tc=∫∞0C(τ)C(0)−1dτ=−∑i≥1(Dλi)−1⟨S,φi⟩2peq∑j≥1⟨S,φj⟩2peq,

and therefore,

 D=−1Tc∑i≥1λ−1i⟨S,φi⟩2peq∑j≥1⟨S,φj⟩2peq. \hb@xt@.01(2.6)

This gives us an equation for the diffusion coefficient which matches any given correlation time . Using the empirical correlation time allows us to find the diffusion coefficient of the system defined by the data. We should note that the correlation function, , and hence the correlation time , are not intrinsic to the manifold due to the dependence on the observation function, . However, the intrinsic diffusion constant can be recovered from the ratio (LABEL:Dformula) since both formulas for are based the same function .

For one-dimensional linear problems, such as the Ornstein-Uhlenbeck process, (LABEL:Dformula) reduces to the mean stochastic model (MSM) introduced in [19, 20]. To see this, let the potential function be , so that the gradient flow system in (LABEL:SDE) is a one-dimensional Ornstein-Uhlenbeck (OU) process and the eigenfunctions of are the Hermite polynomials . All the inner products are zero except for , in which case the inner product is and the associated eigenvalue is so that (LABEL:Dformula) becomes which is the MSM formula. For nonlinear problems, the correlation function (LABEL:correlationfunction) for the gradient flow system is more accurate than that of the MSM fit as we will show in Section LABEL:sec5.

To numerically estimate (LABEL:Dformula), we first approximate,

 ⟨S,φi⟩peq≈1NN∑l=1S(xl)φi(xl)=1NS(x)⊤→φi,

where is an matrix with th row given by, . We then approximate the diffusion coefficient as,

 D≈−1Tc∑Mi=1λ−1i(S(x)⊤→φi)2∑Mi=1(S(x)⊤→φi)2,

which estimates (LABEL:Dformula) with summations over modes. We obtained the best empirical results with the observation function , which results from summing the (centered) coordinates of the data point . Of course, this observable is vulnerable to certain pathological examples, such as a two dimensional time series with . A more robust choice for would be the norm of the multi-dimensional time series , however we found better results with the summation.

To verify the formula for estimating , we simulate the Ornstein-Uhlenbeck process with true to produce a time series with and discrete time step . We estimate the correlation function as a simple average of for lags starting at and increasing until the first value of for which the average was negative. Using the trapezoid rule to estimate the integral over all the values of the shift, , we estimate the correlation time to be . We then subsample every -th data point to produce a time series of length , the samples of which are approximately independent samples of the invariant measure. We apply the diffusion maps algorithm with the variable bandwidth kernel to the subsampled data to estimate the operator as well as the first eigenvalues and eigenfunctions. Approximating the formula (LABEL:Dformula) as described above, we found . In order to verify that this result for depends only on the intrinsic geometry, we then map the time series into with the isometric embedding,

 xi↦F(xi)=(sin(2x),cos(2x),x)⊤/√5.

One can easily check that is an isometry since . We then repeat the above procedure for the time series and found and . Notice that the estimates for are very similar, whereas the estimates for are different. The isometric change in the geometry, , has decreased the correlation time for the observable , however the intrinsic variable is not effected. This is because is approximated in (LABEL:Dformula) as a ratio between two different methods of estimating the correlation time, and the isometry has the same effect on both estimates of the correlation time.

## 3 Forecasting and Filtering in Diffusion Coordinates

In this section we formulate two UQ problems, namely forecasting and filtering, in diffusion coordinates. These two problems involve pushing a probability measure forward in time, and therefore they involve the Fokker-Planck operator . By projecting these infinite dimensional systems onto the eigenfunctions of the Fokker-Planck operator, we find infinite dimensional linear systems which govern the evolution on the projected coordinates. We can then approximate the solution by truncating the linear system to describe finitely many coordinates.

For gradient flows, it is easy to determine the eigenvalues and eigenfunctions of the Fokker-Planck operator, , from the eigenvalues and eigenfunctions of the generator . To see this, note that , and,

 1peqL∗(fpeq) =1p\textupdiv(D∇(fpeq)+fpeq∇U)=1peq\textupdiv(Dpeq∇f+Df∇peq+fpeq∇U) =1peq\textupdiv(Dpeq∇f)=DΔf+D∇f⋅∇peqpeq=DΔf−∇f⋅∇U=Lf, \hb@xt@.01(3.1)

since . Also, it follows from (LABEL:conj) that . The two operators have the same eigenvalues and it is easy to show that the semi-group solutions are related by and the eigenfunctions of and are given by where are the eigenfunctions of and . From the orthonormality of , it is easy to deduce that,

 ⟨ψi,ψj⟩1/peq=⟨ψi,φj⟩=⟨φi,φj⟩peq=δij. \hb@xt@.01(3.2)

Of course, since these eigenfunctions are all represented by -dimensional eigenvectors (which represent evaluation on the data set ) the actual computation of any inner product will always be realized by rewriting it with respect to , following the rule in (LABEL:innerproductrule).

### 3.1 Nonlinear forecasting

The forecasting problem is, given an arbitrary initial distribution for a state variable , find the density at any future time . This will be the most straightforward problem to solve in diffusion coordinates since solves the Fokker-Planck equation,

 ∂p∂t=L∗p,p(x,0)=p0(x). \hb@xt@.01(3.3)

Let be the eigenfunctions of with eigenvalues , and write in this basis as, where . We define the vector of eigencoordinates and the diagonal matrix , and writing (LABEL:forecasteq) in these coordinates we have,

 d→cdt=DΛ→c,→c(0)=⟨p0,ψi⟩1/peq. \hb@xt@.01(3.4)

In order to solve the linear system in (LABEL:forecastprojection), we first need to find the initial conditions by projecting the initial density onto . Since densities and are given on the training data set , we define the discrete densities and . Following the inner product rule in (LABEL:innerproductrule), we write the projection of the initial condition as,

 \hb@xt@.01(3.5)

since this converges to the true value as . With this initial condition, each component in (LABEL:forecastprojection) can be solved analytically as . Numerically, we will approximate the solutions with finitely many modes, . In Section LABEL:QoI we show how to use these solutions to reconstruct the density or to estimate a quantity of interest at any time .

### 3.2 Nonlinear filtering

The filtering problem is, given a time series of noisy observations of the state variable , find the posterior density given all the observations up to time . For an observation function , we consider continuous-time observations of the form,

 dz=h(x)dt+√RdWt, \hb@xt@.01(3.6)

where are i.i.d Gaussian noise in . With these observations, the evolution of the unnormalized posterior density, , defined as follows,

 P(x,t|z(s),s≤t)=p(x,t)∫\@fontswitchMp(x,t)dV(x),

is given by the Zakai equation [2],

 dp=L∗pdt+ph⊤R−1dz. \hb@xt@.01(3.7)

Writing the evolution becomes,

 ∑idciψi=∑iDλiciψidt+∑iciψih⊤R−1dz,

and projecting both sides of this equation onto we find,

 dcj=Dλjcjdt+∑ici⟨ψih⊤,ψj⟩1/peqR−1dz, \hb@xt@.01(3.8)

which is an infinite-dimensional system of stochastic differential equations with multiplicative stochastic forcing.

To numerically estimate the solution of (LABEL:infinitesde), we truncate this infinite-dimensional system of SDEs by solving only a system of dimensional SDEs for . This strategy is simply a Galerkin approximation for the Zakai equation for which we choose the diffusion coordinates as a basis rather than the Gaussian series as proposed in [1] or the Hermite polynomial basis as proposed in [11]. We should note that for scalar OU processes, the diffusion coordinates are exactly the Hermite polynomials since these polynomials are the eigenfunctions of the backward Kolmogorov operator for the OU process.

Defining as a dimensional vector for each pair ; where the -th component is given by , we can write the truncated dimensional system of SDEs in compact form as follows,

 d→c=DΛ→cdt+(H→c)R−1dz. \hb@xt@.01(3.9)

To solve this system of SDEs, we first project the initial condition to obtain as in (LABEL:coeffest) for . We then numerically solve the system of SDEs in (LABEL:filterprojection) using the splitting-up method (see for example [11]). Explicitly, given we compute by first using the solution to the deterministic part,

 →c0(ti)=exp(DΛΔt)→c(ti−1),

and then the solution to the stochastic part,

 →c(ti)=exp(HR−1dz(ti)−12H2R−1→1Δt)→c0(ti).

Here, the exponent term consists of an matrix given by

 HR−1dz(ti)−12H2R−1→1Δt=∑kHk(R−1dz(ti))k−12(Hk)2(R−1→1)kΔt.

Notice that unlike the forecasting problem which had an analytic solution for any time , this procedure must be iterated for each observation .

### 3.3 Quantity of Interest

In the uncertainty quantification problems described above, we reduce the PDE and SPDE problems which describe the evolution of density into systems of ODEs in (LABEL:forecastprojection) and SDEs in (LABEL:filterprojection), respectively, for the coefficients . Using the coefficients we can approximate the density function as follows,

 p(x,t)≈pM(x,t)=M∑i=0ci(t)ψi(x)=M∑i=0ci(t)φi(x)peq(x), \hb@xt@.01(3.10)

where the coefficients are from the solution of the dimensional systems of ODEs or SDEs.

One complication in some of the algorithms above is that the evolution of the coefficients may not preserve the normalization of the corresponding density. In fact, for the nonlinear filtering problems in (LABEL:filtereq), the solutions are unnormalized densities. Fortunately we can use the eigenfunctions to estimate the normalization factor of an unnormalized density from its diffusion coordinates. Assume that in (LABEL:reconstruct) is an unnormalized density with diffusion coordinates obtained from the solution of the dimensional systems of ODEs or SDEs. We can then estimate the normalization constant as,

 Z=∫MpM(x,t)dV(x)=M∑i=0ci∫Mφi(x)peq(x)dV(x)=M∑i=0ci⟨1,φi⟩peq.

where we estimate inner product, . Subsequently, we normalize our density by replacing with . In the remainder of this paper, we will refer to as the normalized density without ambiguity. We also note that by reconstructing the density from the coefficients we are implicitly projecting the density into the subspace of Hilbert space spanned by the first eigenfunctions. This has a smoothing effect which improves the clarity of figures and we always show reconstructed densities in the figures and videos.

In many cases, we are not interested in the entire density ; instead we are often interested in the expectation of an observable with respect to , namely . A common example of functionals of interest are the moments for . Although is a functional on , can be given as a functional on the ambient space because we only evaluate on the data set, which lies on the manifold . Using the formula (LABEL:reconstruct) for the density estimate at time , we can approximate the expectation of the functional at time as,

 Ep(x,t)[A(x)]≈⟨A,pM⟩=M∑i=0ci(t)⟨A,φi⟩peq=M∑i=0ci(t)ai=→c(t)⊤→a. \hb@xt@.01(3.11)

The formula (LABEL:functionalexpectation) shows that we can find the expectation of a functional by writing in diffusion coordinates as . In these coordinates, the expectation of is simply the inner product of the diffusion coefficients of and . Finally, if the quantity of interest is given by a functional evaluated on the training data set as , we approximate the inner product .

## 4 Nonlinear Response in Diffusion Coordinates

The response problem considered in [18] is to determine the change in the expectation of a functional as function of time, after perturbing the dynamics of a system at equilibrium. Letting be the density evolved according to the perturbed system, we define the response of a functional as the expected difference,

 δE[A(x)](t)=Epδ[A(x)](t)−Epeq[A(x)],t≥0, \hb@xt@.01(4.1)

We assume the unperturbed system has the form,

 dx=F(x)dt+√2DdW. \hb@xt@.01(4.2)

with equilibrium density and Fokker-Planck operator , so that . We assume the perturbed system has the form,

 dx=(F(x)+δF(x,t))dt+√2DdW, \hb@xt@.01(4.3)

with associated Fokker-Planck operator so that solves the Fokker-Planck equation,

 ∂pδ∂t=~L∗pδ=L∗pδ+δL∗pδ, \hb@xt@.01(4.4)

where denotes the Liouville operator corresponding to the external forcing with functional form .

This reveals that the response problem is closely related to the forecasting problem. Given the Fokker-Planck operator for the perturbed system, one must solve (LABEL:responseeq) with initial condition chosen to be the equilibrium distribution of the unperturbed system, . Since computational cost becomes an issue, especially for higher dimensional problems, one typically approximates by solving ensemble solutions of (LABEL:perturbed) with initial ensemble sampled from , and in fact we will use this approach as a benchmark in Section LABEL:dwresponsesection. However, as in [18], we are interested in cases where the Fokker-Planck operators and and also the functional form of are all unknown. We will assume that we only have access to data from the unperturbed system and the functional form of the external perturbation . Of course, since the techniques of Section LABEL:DM are restricted to gradient flow systems, we will assume that both the unperturbed and perturbed systems are well approximated by stochastically forced gradient flows.

For gradient flow systems, we have and , where the perturbation is a time-independent potential. Given a data set from sampled from and the functional form of , it is tempting to proceed as in Section LABEL:UQ by writing and projecting (LABEL:responseeq) to eigenfunctions of . The main issue with this projection is that it is difficult to computationally attain the external perturbed forcing term,

 ⟨δL∗ψi,ψj⟩1/peq=−⟨\textupdiv(δFψi),ψj⟩1/peq=⟨δF,ψi∇φj)⟩=⟨δU,\textupdiv(φi∇φj)⟩peq.

Even if or are known, we have no provably convergent technique to compute the gradient or divergence operators on the unknown Riemannian manifold. In fact, the technique discussed in Section LABEL:DM does not provide a framework to represent vector fields on manifolds.

As a remedy, we propose to project (LABEL:responseeq) onto the eigenfunctions of which can be accessed from the functional form of and the data set . This is an application of a result in [6], which showed that one can approximate the generator of any gradient flow system with a (partially) known potential even if the data is sampled from a different gradient flow system. In particular, it was shown (see eq.(17) in [6]) that for general choice of bandwidth function in (LABEL:VBkernel) and any , the matrix constructed in (LABEL:approxgenerator) converges to a weighted Laplacian in the following sense,

 Lϵf=Δf+2(1+d4)∇f⋅∇peqpeq+(d+2)∇f⋅∇ρρ+O(ϵ). \hb@xt@.01(4.5)

If we choose the following bandwidth function,

 ρ=p−1/2eqe−δUD(d+2), \hb@xt@.01(4.6)

where is estimated from the data set as before, substituting this into (LABEL:Lgeneral), we obtain,

 Lϵf =Δf+∇f⋅∇peqpeq−∇f⋅∇δUD+O(ϵ) =Δf−∇f⋅(∇U+∇δU)D+O(ϵ) =D−1~Lf+O(ϵ). \hb@xt@.01(4.7)

With this result, we construct a stochastic matrix following exactly the procedure described in Section LABEL:findingBK, except with the bandwidth function in (LABEL:modbandwidth).

Let and be the eigenvalues and eigenfunctions of , which is a discrete approximation to the continuous operator as shown in (LABEL:newL). Following the same argument as in Section LABEL:UQ, the corresponding Fokker-Planck operator, , has eigenvalues with eigenfunctions, , where is the equilibrium measure of the perturbed system. Now letting and projecting (LABEL:responseeq) with initial condition, , on the perturbed coordinate basis, the evolution of the perturbed density becomes a linear forecasting problem,

 d~cidt=D~λi~ci,~ci(0)=⟨peq,~ψi⟩1/~peq. \hb@xt@.01(4.8)

Therefore, the response in (LABEL:response) formula can be approximated by,

 δE[A](t)≈M∑i=1(~ci(t)−~ci(0))⟨A,~φi⟩~peq,t≥0 \hb@xt@.01(4.9)

utilizing (LABEL:functionalexpectation) on the perturbed diffusion coordinates. Notice that the zeroth mode is excluded in the summation in (LABEL:responsefunctionalexpectation) since and therefore, , for any .

A complication in numerically implementing the above technique is that the eigenfunctions are orthonormal with respect to the inner product weighted by the perturbed equilibrium density, , but we can only estimate inner products weighted with respect to the unperturbed equilibrium density, . This requires us to rewrite all of the inner products with respect to . In particular, we can rewrite the initial coefficients in (LABEL:responseprojection) as,

 ~ci(0)=⟨peq,~ψi⟩1/~peq=⟨peq,~φi⟩=⟨1,~φi⟩peq≈1NN∑j=1~φi(xj).

With these initial conditions, we have explicit solutions for the linear problem in (LABEL:responseprojection) given by . To compute the response formula in (LABEL:responsefunctionalexpectation), we evaluate the inner product terms in (LABEL:responsefunctionalexpectation) as follows,

 ⟨A,~φi⟩~peq=⟨A,~peqpeq~φi⟩peq≈1NN∑j=1A(xj)~p% eq(xj)peq(xj)~φi(xj)=1NZN∑j=1A(xj)e−δU(xj)/D~φi(xj),

using the fact that the equilibrium density of the perturbed system is given by, , where is the normalization factor for the density obtained as discussed in Section LABEL:QoI.

## 5 Numerical Results

For a state variable evolving according to (LABEL:SDE), we have shown how to estimate the density function , which solves either the forecasting, filtering, or response problem. We have also shown how to find the expectation of any quantity of interest, , with respect to this measure. In this section, we first validate the numerical algorithms developed in Sections LABEL:UQ and LABEL:sec4 on a simple linear gradient flow system. We then test our algorithms on a chaotically forced gradient flow system which is known to be well approximated as a stochastically forced gradient flow. In each example we will use only