Large deviations for Gaussian diffusions with delay

Robert Azencott, Brett Geiger, and William Ott, University of Houston
July 1, 2019
Abstract.

Dynamical systems driven by nonlinear delay SDEs with small noise can exhibit important rare events on long timescales. When there is no delay, classical large deviations theory quantifies rare events such as escapes from metastable fixed points. Near such fixed points, one can approximate nonlinear delay SDEs by linear delay SDEs. Here, we develop a fully explicit large deviations framework for (necessarily Gaussian) processes driven by linear delay SDEs with small diffusion coefficients. Our approach enables fast numerical computation of the action functional controlling rare events and of the most likely transition paths between given states. Via linear noise local approximations, we can then compute most likely routes of escape from metastable states for nonlinear delay SDEs. We apply our methodology to the detailed dynamics of a genetic regulatory circuit, namely the co-repressive toggle switch, which may be described by a nonlinear chemical Langevin SDE with delay.

Key words and phrases:
Gaussian process, diffusion, delay, large deviations, optimal transition path, chemical Langevin equation, linear noise approximation, bistable genetic switch
2010 Mathematics Subject Classification:
60F10, 60G15, 60H10

1. Introduction

Dynamical processes are often influenced by small random fluctuations acting on a variety of spatiotemporal scales. Small noise can dramatically affect the underlying deterministic dynamics by transforming stable states into metastable states and giving positive probability to rare events of high interest, such as excursions away from metastable states or transitions between metastable states. These rare events play important functional roles in a wide range of applied settings, including genetic circuits [16], molecular dynamics, turbulent flows [9], and other systems with multiple timescales [8].

The main goal of this paper is to present an explicit computational and theoretical large deviations analysis of rare events for Gaussian diffusion processes with delays. We are motivated in part by the importance of delay for the dynamics of genetic regulatory circuits. Indeed, we apply our approach to a bistable genetic switch driven by a delay stochastic differential equation (delay SDE) of Langevin type.

Consider a family of random processes indexed by a small parameter and driven by the following generic small-noise SDE with drift , diffusion , and no delays:

Large deviations theory for SDEs of this form was developed by Freidlin and Wentzell [18]. Freidlin-Wentzell theory estimates the probability that the process lies within a small tube around any given continuous path taking values in the state space, in terms of the action of that path:

Here the probability is conditioned on the initial point of the process, and we assume that the path starts at that same point.
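For orientation, in the classical setting with drift $b$, invertible diffusion $\sigma$, and noise of size $\varepsilon$ (a generic sketch in standard notation), the action and the tube estimate take the form
\[
I_T(\phi) = \frac{1}{2}\int_0^T \bigl(\dot\phi_t - b(\phi_t)\bigr)^{T}\bigl(\sigma(\phi_t)\sigma(\phi_t)^{T}\bigr)^{-1}\bigl(\dot\phi_t - b(\phi_t)\bigr)\,dt,
\qquad
\mathbb{P}_x\Bigl(\sup_{0\le t\le T}\lvert x^\varepsilon_t - \phi_t\rvert < \delta\Bigr) \approx \exp\bigl(-I_T(\phi)/\varepsilon^{2}\bigr).
\]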

The Freidlin-Wentzell action functional was originally defined for uniformly bounded coefficients and uniformly elliptic by an explicit time integral involving , , and . These remarkable results were widely extended by S. Varadhan [3] to arbitrary sets of trajectories and by R. Azencott [3] to hypoelliptic diffusions with unbounded coefficients. Numerous extensions and applications to broad classes of stochastic processes have been published by D. Stroock, R. Ellis, A. Dembo, O. Zeitouni, P. Dupuis, and many others. For SDEs with delays, large deviations principles have been established or reasonably justified under a variety of hypotheses [38, 37, 6, 12, 28, 41, 19].

For fixed time and points in the state space, the path that minimizes (under the constraints and ) is the most likely transition path starting at and reaching at time . A second minimization over provides the most likely transition path from to and the energy associated with this optimal path. This minimal energy, often called the quasi-potential, is central to the quantification of large deviations on long timescales [18].
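In the same generic notation, the two-stage minimization defining the quasi-potential $V$ reads
\[
V(x_1, x_2) = \inf_{T>0}\ \inf\bigl\{\, I_T(\phi) : \phi(0) = x_1,\ \phi(T) = x_2 \,\bigr\}.
\]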

A computational framework has been developed for the application of Freidlin-Wentzell theory to systems with no delays. This framework includes the minimum action method [15], an extension called the geometric minimum action method that synthesizes the minimum action method and the string method [24], as well as variants of these approaches (see e.g. [29, 30]).

For nonlinear delay SDEs, it is possible to compute a linear noise approximation [10] that is valid in a neighborhood of a given metastable state. Since linear noise approximations are Gaussian diffusions with delays, we have deliberately focused the present paper on Gaussian diffusions with delays. For such diffusions, we rigorously develop and implement a fully explicit large deviation framework, enabling fast numerical computation of optimal transition paths and the quasi-potential. Our methodology does not require the numerical solution of Hamilton-Jacobi equations, a significant advantage, since Hamilton-Jacobi equations are computationally costly in even moderately high spatial dimension.

We thus center our study on the Itô delay SDE

(1)

Here , denotes time, is the delay, , and are real matrices, denotes standard -dimensional Brownian motion, denotes the diffusion matrix, and is a small noise parameter. The initial history of the process is given by the Lipschitz continuous curve . We work with a single fixed delay to simplify the presentation – all of our results apply just as well to multiple fixed delays and to delays distributed over a finite time interval.

The Gaussian diffusion (1) arises via linear noise approximation of nonlinear delay SDEs near metastable states in the following way. Suppose the nonlinear delay SDE

has a metastable state ; that is, is a stable fixed point of the deterministic limit ODE

Writing and expanding and around yields the linear noise approximation

where and denote differentiation with respect to the first and second sets of arguments, respectively. This is (1) with , , , and .
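As a sketch of this expansion (generic notation: drift $b(x_t, x_{t-\tau})$, diffusion $\varepsilon\,\sigma(x_t, x_{t-\tau})$, metastable state $x^*$), the deviation $z_t = x_t - x^*$ approximately satisfies
\[
dz_t = \bigl[\partial_1 b(x^*,x^*)\, z_t + \partial_2 b(x^*,x^*)\, z_{t-\tau}\bigr]\,dt + \varepsilon\,\sigma(x^*,x^*)\,dW_t,
\]
so that the two drift matrices in (1) correspond to $\partial_1 b(x^*,x^*)$ and $\partial_2 b(x^*,x^*)$, and the diffusion matrix is the constant $\sigma(x^*,x^*)$.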

We demonstrate the utility of our approach by computing optimal escape trajectories for the co-repressive toggle switch, a bistable genetic circuit driven by a nonlinear delay Langevin equation.

The paper is organized as follows. In Section 2, we review the theory of large deviations for Gaussian processes and present optimal transition path theory for (1). We detail our numerical implementation of this theory in Section 3. Section 4 discusses the general idea of linear noise approximation. We present our computational study of a bistable genetic toggle switch in Section 5.

2. Theory

In this section we develop a rigorous large deviations framework for (1).

2.1. Outline

For brevity, we will often omit the superscript , writing instead of . We first show that the process driven by (1) is in fact a Gaussian process (Section 2.2). This is expected since (1) is linear, but not obvious because of the presence of delay. Since is a Gaussian process, it is completely determined by its mean and covariance matrices . Here denotes matrix transpose. We derive and analytically solve delay ODEs verified by the mean and covariance matrices of in Sections 2.3–2.7.

We center by writing . The probability distribution of is a centered Gaussian measure on the space of continuous paths starting at . To apply large deviations theory to the paths of and as , one needs to compute the action functional (or Cramer transform), , of . Classical large deviations results for centered Gaussian measures on Banach spaces express in terms of the integral operator determined by the matrix-valued covariance function of (see Sections 2.8–2.10). Here we derive explicit formulas and implementable computational schemes that allow us to numerically evaluate . We complete this program by explicitly deriving, as , the most likely transition path of between two points and . We achieve this by minimizing under appropriate constraints.

We finish the outline by introducing notation that will be used throughout Section 2.

Notation 2.1.

For a matrix and vector , let and denote matrix norm and Euclidean norm, respectively. The scalar product of vectors is denoted .

Let be the Hilbert space of -valued measurable functions on such that is square-integrable. The Banach space of -valued continuous functions on is denoted . Let be the Banach subspace of all such that . Endow these three spaces with their Borel -algebras.

2.2. The solution of (1) is a Gaussian process

Proposition 2.2.

The delay SDE (1) has a unique strong solution . The process is Gaussian with almost surely continuous paths, and hence has a continuous version.

Proof of Proposition 2.2.

The existence of a unique strong solution is classical (see e.g. [34]). To prove that is Gaussian, we consider Euler-Maruyama discretizations [25] of . For positive integers , let denote timestep size. The Euler-Maruyama approximate solution to (1) is defined first at nonnegative integer multiples of by

Then is extended to by linear interpolation. The convergence of this discretization scheme is well-known (see e.g. [4, 35]) and Theorem 2.1 of [35] yields

Since is a Gaussian process by construction, this -convergence implies that is a Gaussian process as well. The expected values are then finite and clearly bounded due to the delay SDE driving . Again using the delay SDE, this implies that the covariance matrix of and remains bounded by a constant multiple of . A classical result of Fernique for Gaussian processes then implies that is almost surely continuous. ∎
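To illustrate the Euler-Maruyama discretization used in this proof, the following sketch simulates a linear delay SDE of the assumed form dx_t = (A x_t + B x_{t-tau}) dt + eps * sigma dW_t with Lipschitz history gamma on [-tau, 0]; the function name, the omission of any constant drift term, and the requirement that tau be an integer multiple of the step size are illustrative choices, not part of the paper's implementation.

import numpy as np

def euler_maruyama_delay(A, B, sigma, eps, tau, gamma, T, h, rng=None):
    """Euler-Maruyama sketch for dx = (A x_t + B x_{t-tau}) dt + eps * sigma dW.

    gamma : callable giving the (Lipschitz) history on [-tau, 0].
    Assumes tau is an integer multiple of the step size h.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = A.shape[0]
    n_delay = int(round(tau / h))             # steps spanning one delay
    n_steps = int(round(T / h))
    x = np.empty((n_delay + n_steps + 1, d))  # history on [-tau, 0] followed by path on [0, T]
    for j in range(n_delay + 1):
        x[j] = gamma(-tau + j * h)
    for k in range(n_steps):
        i = n_delay + k                       # index of time t_k = k * h
        drift = A @ x[i] + B @ x[i - n_delay]
        dW = np.sqrt(h) * rng.standard_normal(sigma.shape[1])
        x[i + 1] = x[i] + drift * h + eps * (sigma @ dW)
    times = np.linspace(-tau, T, n_delay + n_steps + 1)
    return times, x

Averaging many such sample paths gives a direct Monte Carlo check of the mean and covariance computed analytically below.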

2.3. Delay ODE for the mean of

Writing (1) in integral form, we have

(2)

Taking the expectation of (2) and applying Fubini gives

or, in differential form, a delay ODE for :

(3)
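In generic notation consistent with the linear drift of (1) (a sketch; any constant drift term would simply be added to the right side), the mean $m_t = \mathbb{E}[x^\varepsilon_t]$ satisfies
\[
\dot m_t = A\, m_t + B\, m_{t-\tau}, \quad t > 0, \qquad m_t = \gamma(t), \quad t \in [-\tau, 0],
\]
where $\gamma$ denotes the given initial history.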

2.4. The centered Gaussian process

The process is clearly not centered in general. The centered process defined by is a centered Gaussian diffusion with delay. Since verifies (1) and verifies (3), elementary algebra shows that verifies the delay SDE

(4)

Note that this delay SDE does not depend on . The same is then true for . This is a crucial point further on because our key large deviations estimates for will be stated in path space for the “small” centered Gaussian process . As we will see, our large deviations computations will ultimately involve the deterministic mean path of and the covariance function of the process .

2.5. Delay ODEs for the covariances of

We now find delay ODEs for the covariance function of . Let

(5)

be the covariance matrix of and , where superscript denotes matrix transpose. Since the history of prior to is deterministic, when either or is in . Fix , and let vary. We have

We thus obtain

(6)

Let . Differentiating with respect to gives

(7)

which is a first-order delay ODE in for each fixed . To close (7), we compute a differential equation for . Proceeding as just done for (6), one checks that the function satisfies the delay ODE

(8)

where for . Let denote the Heaviside function

We can rewrite (8) as

(9)

Note that the partial derivative of the Heaviside distribution is classically given by

where the distribution is the Dirac point mass concentrated at zero. By definition of and by (9), the function is continuous for all and and differentiable in and for . For , we write

(10)

and observe that verifies the initial condition for and .

Differentiating (9) in for and switching the order of partial derivatives yields a linear delay ODE in for , namely

(11)

with initial condition for all .

Once is determined, the covariance for each fixed will be computed by solving the delay ODE

(12)
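For orientation, in the standard theory of linear delay SDEs with constant coefficients, the covariance $K(s,t)$ of the centered process and its diagonal $R_t = K(t,t)$ satisfy equations of the following type (a generic sketch in our assumed notation):
\[
\partial_t K(s,t) = K(s,t)\,A^{T} + K(s,t-\tau)\,B^{T}, \qquad t > s \ge 0,
\]
\[
\dot R_t = A\,R_t + R_t\,A^{T} + B\,K(t-\tau,t) + K(t,t-\tau)\,B^{T} + \sigma\sigma^{T}.
\]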

We now describe how to successively solve the delay ODEs driving and .

2.6. Analytical solution of the delay ODE verified by the mean

First-order delay ODEs can be analytically solved by a natural stepwise approach, sometimes called the “method of steps,” a terminology which we will avoid since it has a different meaning in classical numerical analysis. The basic idea is to convert each one of our delay ODEs into a finite sequence of nonhomogeneous ODEs in which the delay terms successively become known terms.

Consider first the delay ODE (3) for with . The delay term is unknown for but is known for . So we can solve the delay ODE (analytically or numerically) on the interval as a linear nonhomogeneous first-order ODE. Then, for , the delay term in the delay ODE has just been computed, so this delay ODE once again becomes a linear nonhomogeneous first-order ODE.

Notation 2.3.

To get a solution on the whole of , one successively solves the delay ODE on closed intervals with . Here is the largest integer less than or equal to , and

We now compute the explicit solution for the mean on . Let denote the solution of (3) on the interval . For , we have on . For , is the solution of (3) on :

We thus have

Similarly, given on , has the form

Piecing together the yields the full solution on all of .
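The stepwise construction above translates directly into code. The sketch below assumes the mean equation has the form m'(t) = A m(t) + B m(t - tau) with history gamma on [-tau, 0] (a reading of (3); any constant drift term would enter as an extra forcing) and integrates interval by interval, treating the delayed term as a known forcing on each interval; the function name and numerical tolerances are illustrative.

import numpy as np
from scipy.integrate import solve_ivp
from scipy.interpolate import interp1d

def mean_by_steps(A, B, gamma, tau, T, pts_per_interval=200):
    """Stepwise solution of m'(t) = A m(t) + B m(t - tau), with m = gamma on [-tau, 0].

    Returns a callable interpolant of the mean on [-tau, T].
    """
    # start from the known history on [-tau, 0]
    t_hist = np.linspace(-tau, 0.0, pts_per_interval)
    m_hist = np.array([gamma(t) for t in t_hist])
    m_known = interp1d(t_hist, m_hist, axis=0, fill_value="extrapolate")
    t_all, m_all = list(t_hist), list(m_hist)
    t0 = 0.0
    while t0 < T:
        t1 = min(t0 + tau, T)
        # on [t0, t1] the delayed term m(t - tau) is already known
        rhs = lambda t, m: A @ m + B @ m_known(t - tau)
        sol = solve_ivp(rhs, (t0, t1), m_known(t0),
                        t_eval=np.linspace(t0, t1, pts_per_interval), rtol=1e-8)
        t_all.extend(sol.t[1:])
        m_all.extend(sol.y.T[1:])
        # extend the known interpolant to cover the newly computed interval
        m_known = interp1d(np.array(t_all), np.array(m_all), axis=0,
                           fill_value="extrapolate")
        t0 = t1
    return interp1d(np.array(t_all), np.array(m_all), axis=0)

The same interval-by-interval pattern applies, with matrix-valued unknowns and known forcing terms, to the delay ODEs for the covariances treated in Section 2.7.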

Note that many characteristics of the given history function , such as continuity, differentiability, discontinuities, etc., will essentially propagate through to the full solution . More precisely, if is of class for some integer , then will be of class for all positive except possibly at integer multiples of . Since we assume here that is Lipschitz continuous, will be differentiable except possibly at integer multiples of .

2.7. Analytical solutions of the delay ODEs verified by and

We can extend the preceding method to the delay ODE in verified by for each fixed and then to the delay ODE in verified by . We first focus on . Fix . Due to the delay ODE (11), the distribution defined on by

clearly verifies the delay ODE

(13)

with initial condition

for all and . Note that on , this initial condition is constant and hence continuous. For each fixed , we write the right side of equation (13) as

Observe that for , is bounded in (uniformly in ) and continuous in except for the two points and . As was done above for , one can perform the iterative analysis of the delay ODE (13) on successive time intervals . Since both the initial condition and the right side are known, the step of this iterative construction amounts to solving a first-order linear ODE with constant coefficients and known right side. So this construction is essentially stepwise explicit and proves by induction on that the distribution is actually a bounded function of which is differentiable except maybe at points of the form and .

For each , once the full solution has been constructed on as just outlined, we immediately obtain .

At this stage, is theoretically known over for each and can be plugged into the delay ODE (7) in verified by for each fixed , with initial conditions for . For each fixed , this delay ODE for can again be solved iteratively on the successive time intervals .

The preceding analysis is easy to implement numerically to solve the three types of delay ODEs involved. Each reduction to a succession of linear nonhomogeneous ODEs enables the use of classical numerical schemes to compute and . We have used the now fairly standard numerical approach of [7]. Our numerical implementation is presented in Section 3.

We have focused on and because these two functions essentially determine the rate functional of large deviations theory for the Gaussian diffusion with delay .

2.8. General large deviations framework

We present, without proofs, a brief overview of large deviations theory for Gaussian measures and processes (refer to Chapter in [3] for proofs of theorems). We will then apply these principles to Gaussian diffusions with delay.

The following notations and definitions will be used throughout this section.

  • is any separable Banach space with dual space and duality pairing for and .

  • is any probability on the Borel -algebra .

  • For , the image probability is defined on by for all Borel subsets of .

  • is called centered iff is centered for all .

  • is called Gaussian iff for all , the image probability is a Gaussian distribution on .

The Laplace transform, , is defined as follows for :

A classical result asserts that when is a Gaussian probability measure on a separable Banach space E, then its Laplace transform is finite for all

We first recall key large deviations results for generic Borel probabilities on separable Banach spaces . Later on below, will be the space and will be Gaussian. Probabilities of rare events for the empirical mean of independent random vectors with identical distribution can be estimated via a non-negative functional, the Cramer transform of , defined as follows (see Theorem 3.2.1 in [3]).

Definition 2.4.

Assume has finite Laplace transform . The Cramer transform of , also called the large deviations rate functional of , is defined for by

Note that . The Cramer set functional is then defined for all by

When is a Banach space of continuous paths , the value of the Cramer transform can be viewed as the “energy” of the path . For instance, if is the probability distribution of the Brownian motion in its path space, the Cramer transform is the kinetic energy (see Proposition 6.3.8 in [3]). The set functional quantifies the probability of rare events in path space through classical large deviations inequalities, which we now recall.
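As a concrete sketch in generic notation (the standard definitions, consistent with the references to [3] above): for a probability $\mu$ on $E$ with Laplace transform $\mathrm{Lap}$, the Cramer transform and the Cramer set functional are
\[
\lambda(x) = \sup_{\xi \in E'} \bigl[\langle \xi, x\rangle - \log \mathrm{Lap}(\xi)\bigr], \qquad \lambda(A) = \inf_{x \in A} \lambda(x),
\]
and for the Wiener measure on paths started at $0$ this reduces to the kinetic energy
\[
\lambda(f) = \frac{1}{2}\int_0^T \lvert \dot f(t)\rvert^2\,dt
\]
for absolutely continuous $f$ with $f(0) = 0$ and square-integrable derivative, with $\lambda(f) = +\infty$ otherwise.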

2.9. Gaussian framework and associated Hilbert space

Here, is the random path of a centered continuous Gaussian process driven by the delay SDE (4). The probability distribution of is a centered Gaussian probability on the separable Banach space . So we now focus on Gaussian probabilities on separable Banach spaces. We first state key large deviations inequalities due to S. Varadhan.

Theorem 2.5 (see Theorem 6.1.6 in [3]).

Let be a centered Gaussian probability measure on a separable Banach space . Let be an E-valued random variable with probability distribution . Let be the Cramer set functional of . For every Borel subset of one has

(14)

where and denote the interior and the closure of , respectively.

Whenever , then the lower and upper limits in (14) are equal and

In this case, for small one has the rough estimate

The equality holds, for example, when the Cramer transform is finite and continuous on , and is the closure of .
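In generic notation (a sketch of the standard inequalities; here the small parameter multiplies the Gaussian variable, matching the small-noise convention of this paper), the bounds in (14) take the form
\[
-\lambda(\mathrm{int}\,A) \;\le\; \liminf_{\varepsilon \to 0}\, \varepsilon^2 \log \mathbb{P}(\varepsilon Z \in A) \;\le\; \limsup_{\varepsilon \to 0}\, \varepsilon^2 \log \mathbb{P}(\varepsilon Z \in A) \;\le\; -\lambda(\mathrm{cl}\,A),
\]
and when the two outer quantities coincide one obtains, for small $\varepsilon$, the rough estimate $\mathbb{P}(\varepsilon Z \in A) \approx \exp\bigl(-\lambda(A)/\varepsilon^2\bigr)$.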

Definition 2.6.

In the Banach space context, the covariance kernel of is defined for all by

(15)

Let be the probability space . The covariance kernel defines a linear embedding of into a Hilbert subspace of as follows. For each , define a Gaussian random variable on by for all . Let be the closure in of the vector space spanned by all the with . The linear map defined by is continuous with dense image in . The inner product in verifies

Define a continuous linear operator by

for all . Then for all and , one has

so that restricted to is injective. Within this classical Gaussian framework, one obtains a generic expression for the Cramer transform of (see Theorem 6.1.5 in [3]).

Theorem 2.7.

Let be a centered Gaussian probability on the separable Banach space . Let and be, respectively, the Hilbert space and the continuous linear injection associated to by the preceding construction. Then is a compact operator, and for , the Cramer transform of is given by

2.10. Large deviations rate functional for continuous Gaussian processes

For a centered continuous Gaussian process on with probability distribution in the path space , the generic framework in Section 2.9 can be applied to with and as the space of bounded Radon measures on to obtain a more explicit form of the operator in Theorem 2.7. For a detailed explanation of the connection among the spaces and in this context, see Section 6.3 in [3]. The rate functional of , when viewed either on or , can be expressed via Proposition 6.3.7 and Lemma 6.3.6 in [3], which we reformulate as follows.

Proposition 2.8.

Let be any centered continuous Gaussian process on with continuous matrix-valued covariance function . The random path takes values in the Banach space , and hence in the Hilbert space . Since the inclusion is continuous and injective, the probability distribution of can be viewed either as a centered Gaussian measure on or . Recall that the self-adjoint covariance operator of is defined by

(16)

for all . Then is a compact linear operator with finite trace, and . Moreover, is positive semidefinite. Let be the orthogonal complement in of the kernel of . Call the restriction of to . Then is injective and maps onto . Call the Cramer transform of on the Hilbert space . Then for any , one has

(17)

When viewed as a probability on the Banach space , the probability has a Cramer transform , and one has

(18)

for all .
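As a sketch of the standard expression behind (17) (generic notation; $\Lambda$ denotes the covariance operator of (16) and $\Lambda_1$ its restriction to the orthogonal complement of its kernel): for $h$ in the image of $\Lambda_1$,
\[
\lambda_H(h) = \tfrac{1}{2}\,\bigl\langle \Lambda_1^{-1} h,\; h \bigr\rangle_H,
\]
with $\lambda_H(h) = +\infty$ when $h$ lies outside the image of $\Lambda_1^{1/2}$.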

2.11. Large deviations rate functional for Gaussian diffusions with delay

In this section we compute explicitly the large deviations rate functional for the centered Gaussian process driven by the delay SDE (4). We will adapt to this delay SDE a technique introduced by R. Azencott in [2] and [3] to study large deviations for hypoelliptic diffusions with unbounded smooth coefficients.

Proposition 2.9.

Consider the centered Gaussian process driven by the delay SDE (4), and call the random path . Let be the probability distribution of on the Banach space . Let be the continuous random path of the Brownian motion driving the delay SDE verified by . Let be the matrix of diffusion coefficients in the delay SDE verified by . We now assume that has full rank .

Then, there is a bijective linear map from onto such that and are both continuous and the random paths and verify almost surely . Moreover, is defined by iterating operators explicitly given in equation (22) below. Let be the dense subspace of all paths in such that is in . The restriction of to is a linear bijection onto . For the function is given by equation (23) below, and is in .

In the Banach space , the support of the probability distribution of is equal to , and the only closed vector subspace of such that is . Similarly, in the Hilbert space , the support of is equal to , and the only closed subspace of such that is .

Proof.

The centered process is driven by the delay SDE

For and , define as above the interval with and . Extend any function in by giving it the value on . For any in , we now prove that there is a unique , denoted , verifying on and the following delay ODE:

(19)

To construct given , we set . The linear delay ODE (19) can be (uniquely) solved in successively on the intervals as seen earlier, and since the same recurrence on easily shows that is in . We now note that equation (19) has the following equivalent integral form, valid for all with

(20)

After setting on , equation (19) can hence be successively solved on the intervals with by applying, for the recurrence formula

(21)

Integration by parts of in (21) yields, for and ,

(22)

The new iterative formula (22) no longer involves and hence remains well-defined for all functions . For each , denote the continuous function determined iteratively on the intervals by the integral equations (22), initialized with on . These equations show by recurrence on that is a continuous linear mapping of the Banach space into . We now show that given any , one can construct a unique such that . Fix . Assume that for all , the value is already known and verifies, for some constant independent of the function ,

for . We then want to solve in the equation (22) for . Define

so that . Note that is known. For define

Let denote matrix norm. For , the expression of yields

Note that for , one has and so that the preceding bound is valid with . Then for equation (22) becomes

with . This linear ODE in has, for , a unique solution given by

The bound on for then implies

for . For , the function is now uniquely determined by

This implies for all , where the constant is given by

as is easily deduced from the bounds on and . Hence, is now known for all and on this interval, one has , with . We know that , and that , so we can start this construction of with and , which will yield . We then proceed by recurrence on as just indicated to uniquely determine on such that . We have also proved that the bound holds for all for the fixed constant . This proves that the continuous linear map has a continuous inverse , which of course must be linear.

By construction, for any , the function is also in and verifies the delay ODE (19). Conversely, given any , we know that there is a unique such that . We now prove that must belong to . Indeed, we can define explicitly by and by

(23)

for all . This relation clearly implies and . Hence, the restriction of to defines a linear bijection of onto .

Let now be the random path of the Brownian motion on . Then is a continuous random path on , starting at . By equation (22), the path verifies, for ,

(24)

where we have set

(25)

For , Itô calculus enables the integration by parts of the last integral in (