Control Theory and Experimental Design in Diffusion Processes
Abstract
This paper considers the problem of designing timedependent, realtime control policies for controllable nonlinear diffusion processes, with the goal of obtaining maximallyinformative observations about parameters of interest. More precisely, we maximize the expected Fisher information for the parameter obtained over the duration of the experiment, conditional on observations made up to that time. We propose to accomplish this with a twostep strategy: when the full state vector of the diffusion process is observable continuously, we formulate this as an optimal control problem and apply numerical techniques from stochastic optimal control to solve it. When observations are incomplete, infrequent, or noisy, we propose using standard filtering techniques to first estimate the state of the system, then apply the optimal control policy using the posterior expectation of the state. We assess the effectiveness of these methods in 3 situations: a paradigmatic bistable model from statistical physics, a model of action potential generation in neurons, and a model of a simple ecological system.
1 Introduction
This paper considers a novel problem in experimental design, that of adaptively choosing inputs in stochastic systems to yield more precise estimates of system parameters. We suppose an experimenter is studying a physical or biological system that can be modeled by a diffusion process of the general form
(1) 
Here, is the state vector the system at time , which evolves according to a (deterministic) vectorvalued drift function that depends on an unknown parameter , as well as a timevarying input under the direct control of the experimenter. In addition to the drift f, the system is also influenced by noise, modeled as a multivariate Brownian motion with a statedependent covariance matrix . The functional forms of f and are assumed known – usually from first principles – and experiments are to be conducted to estimate the value of . We consider the problem: How can the experimenter design a protocol for setting the control signal dynamically, in real time, using all the information available up to time to optimize the precision of our estimate of ? To accomplish this, we use the tools of dynamic programming in order to account not only for the information gained from the future evolution of the system, but also the future control policies that will be applied to it.
As a concrete example, consider a typical in vitro neuron experiment: an electrode is attached to an isolated neuron, through which the experimenter can inject a timedependent current and measure the resulting membrane voltage . By injecting a sufficiently large current, the experimenter can elicit a rapid electrical pulse, or “spike,” from the neuron. These are generated by the flow of ionic currents across voltagesensitive membrane ion channels. A standard model for spike generation in neurons is the MorrisLecar model (Morris and Lecar, 1981), which has the form
(2) 
The auxiliary functions , , , and are given in Appendix C. In these equations, represents the externally applied, timedependent current, which we treat as the control variable; the variable is the membrane voltage. The first equation expresses Kirchoff’s current conservation law, and the constants , and are socalled “reversal potentials” associated with the potassium, calcium, and leakage currents, respectively and represent the equilibrium voltage that would be attained if the corresponding current were the only one present. The quantities , , and are the corresponding conductances; when divided by the membrane capacitance , they give the equilibration rate associated with each type of current (Ermentrout and Terman, 2010). The dimensionless variable is a “gating variable.” It takes value in , and describes the fraction of membrane ion channels that are open at any time. The Brownian motions and model membrane voltage fluctuations and ion channel noise, respectively. (This model is discussed in more detail in Sect. 5.)
In typical experiments, the current is a relatively simple function of , e.g., a step, square pulse, or a sinusoid, and the amplitude of is adjusted so that a spike is triggered. We seek instead to set in realtime^{1}^{1}1This can potentially be accomplished in “dynamic clamp” experiments, in which the recorded voltages are fed directly into a computer, which also sets the injected current. according to the measured voltage trace up to that time, i.e., , and a precomputed “control policy.” The latter is designed to optimize the Fisher Information gained about a parameter of interest, e.g., or , so that a precise estimate can be obtained more quickly.
In this paper, we propose a general, twostep strategy for designing optimal control policies for such dynamic experiments. First, assuming the experimenter can observe the full state vector for all , we show that there exists a function such that if the experimenter sets the control parameter to be at all times , then the Fisher information of the unknown parameters will be maximized. We call this the full observation case. The optimal “control policy” in this case can be precomputed offline using stochastic optimal control techniques. Second, in the more common partial observation case where one only has partial, infrequent, and/or noisy measurements of the system state, we show that stochastic optimal control can be usefully combined with standard filtering methods in a “closedloop” setup(see Fig. 1). For simplicity, we focus here on the estimation of a single parameter in the system, but extensions to linear combinations of such criteria, such as the trace of the information matrix, are easy.
We illustrate the methods on two examples: a doublewell potential example taken from statistical physics and the neuron example mentioned above. Appendix D also illustrates an ecological experiment, in which models of the form (1) are used to describe relative population sizes in a contained glass tank or chemostat, and the rate at which resources are added to and removed from the system is the control signal. Together, these model systems represent useful contrasts in the quality of data, system behavior, and level of stochastic variation in system dynamics.
In particular, the doublewell example presents a scenario where our method might be especially useful: in that example, the relevant parameter is a potential barrier height controlling the frequency of “barriorcrossing” events. he barrier occupies a small volume of phase space, but has a significant impact on the dynamics. In this system, a dynamic control policy can dramatically increase barriercrossing frequency and hence the quality of our parameter estimates; we do not expct control schemes that do not make use of current state information, including existing optimal “openloop” controls (see below), to be as effective. The other examples show that our method can have varying degrees of impact: the neuron model represents a realistic application in which membrane voltage can be measured nearly continuously and with very high precision, and the signals exhibit small, but important, stochastic fluctuations on top of regular (e.g., oscillatory) behavior. Our control policy modifies the shape of these oscillations to improve estimation precision but parameters can also be reasonably estimated with a wellchosen constant input. In Appendix D, data from chemostat experiments can be gathered at only infrequent intervals, and are subject to variation due to binomial sampling of the tank and counting error. Their dynamics exhibit relatively high stochasticity but tend to an overall fixed level, meaning that for longrun experiments, our methods only alter the fixedpoint and are not substantially different from constant controls, although they may make a difference in shortterm experiments.
We view the main purpose (and contribution) of the present paper as being the presentation of a novel problem in experimental design and frameworks for approaching it, and to explore dynamical conditions when our method might be most effective. Accordingly, the systems studied here represent the simplest possible models and experiments in their scientific domain, i.e., they are all lowdimensional, involving only one or two state variables. The strategy proposed in this paper can potentially be applied to systems with more than two state variables, but direct application of the numerical methods employed in this paper to systems with more than a few dimensions may be prohibitively expensive. Higherdimensional systems thus require substantially new numerical techniques and ideas; this is beyond the scope of the present paper, and is the subject of ongoing work.
Related work
Recent statistical literature has given considerable attention to providing methods for performing inference in mechanistic, nonlinear dynamic systems models, both those described by ordinary differential equations (Brunel, 2008; Ramsay, Hooker, Campbell, and Cao, 2007; Girolami, Calderhead, and Chin, 2010; Huang and Wu, 2006) and explicitly stochastic models (Ionides, Bretó, and King, 2006; AïtShahalia, 2008; Wilkinson, 2006) along with more general modeling concerns such as providing diagnostic methods for goodness of fit and model improvement (Hooker, 2009; Müller and Yang, 2010; Hooker and Ellner, 2013). However, little attention has been given to the problem of designing experiments on dynamic systems so as to yield maximal information about the parameters of interest. In this paper, we use stochastic optimal control techniques to construct dynamic, datadriven experimental protocols, which can improve the accuracy of parameter estimates based on dynamic experiments.
To our knowledge, this is the first time that experimental design has been employed to adaptively select timevarying inputs for explicitly stochastic, nonlinear systems for the purposes of improving statistical inference. Within the context of stochastic models, adaptation can be expected to be particularly important; information about parameters is typically maximized at particular parts of the state space and the choice of that will maintain a system close to high information regions will depend on the current state of the system and cannot be determined prior to the experiment. In related work, experimental design has been considered for ordinary differential equation models in Bauer, Bock, Körkel, and Schlöder (2000) and the choice of subject and sampling times has been investigated in Wu and Ding (2002) for mixedeffects models with ordinary differential equations. Huggins and Paninski (2011) consider the optimal placement of probes on dendritic trees following linear dynamics. A number of recent studies have examined the use of perturbations to improve parameter estimation, with a view toward biochemical reaction networks; see Busetto et al. (2009); Nandy et al. (2012); Zechner et al. (2012); Ruess et al. (2013). However, in these papers the sequence of perturbations is decided prior to the experiment, and is not adapted to the observations except in a sequence of discrete experiments, i.e., these methods employ a form of “openloop” control (Fig. 1). We argue that when they are feasible, adaptive experiments, i.e., “closedloop” control, are particularly advantageous. They are provably optimal among the class of all experiments. Because they are based on the system’s current state, they are likely to be particularly helpful relative to nonadaptive states for systems with complex dynamics that cannot be forecast with much precision.
The techniques applied in this paper rely on dynamic programming and resemble methods in sequential designs, particularly for nonlinear regression problems and item response theory. See Chaudhuri and Mykland (1993) for likelihoodbased methods or Chaloner and Verdinelli (1995) for a review of Bayesian experimental design methods, Berger (1994) in item response theory and Bartroff and Lai (2010) for recent developments. Our problem, however, is quite distinct from the sequential choice of treatments for independent samples that is generally considered. In this paper, dynamic programming is employed to account for the dependence of the future of the stochastic process on the current choice of the control variable rather than its effect on future design choices via an updated estimate (or posterior distribution) for the parameters of interest. Indeed, unlike most sequential design methods we will not attempt to estimate parameters during the course of the experiment. In some contexts this can certainly be incorporated in our framework if so desired (see Thorbergsson and Hooker (2012) for techniques for doing this in discrete systems), while in others — e.g., single neuron experiments — this would be computationally challenging given the timescales involved. Where the optimal design depends on the parameter, we propose averaging the Fisher Information over a prior, corresponding to the use of a quadratic loss function in Chaloner and Larntz (1989).
Organization
In Sect. 2, we provide a precise formulation of the dynamic experimental design problem and outline our solution strategy for “full observation” problems where we can observe the state vector continuously. Sect. 3 discusses our approximate strategy for solving partial observation problems. An illustrative example, taken from statistical physics, is studied in Sect. 4. A detailed study of the MorrisLecar neuron model is presented in Sect. 5. Some final remarks and directions for further research are detailed in Sect. 6. Appendices provide details of the numerical implementation of our algorithms and a further study of chemostat models. We have implemented all the algorithms described in this paper in R, and used the implementation to compute the examples. The source code is available as an online appendix.
2 Maximizing Fisher Information in Diffusion Processes
In this section we demonstrate that when the full state vector of the system is continuously observable, the problem of designing optimal dynamic inputs for parameter estimation can be cast as a problem in control theory. Once this is done, we can follow welldeveloped methods. A brief review of some such methods is provided here (see also Appendix A); interested readers are referred to Kushner (2000) for further details.
The solution of the “full observation” problem described in this section will form the basis for the more realistic but challenging “partial observation” problem, which we discuss in Sect. 3.
2.1 Problem formulation and general solution strategy
Consider the multivariate diffusion process (1). Our goal is to estimate using observations of for , up to some final time . At each moment in time, we assume the experimenter can adjust the control using all available information from the past, i.e., for . Our problem is to find an optimal control policy, i.e., a procedure for choosing a control value based on past observations such that the resulting estimator of is “optimal,” which we interpret here as meaning “minimum asymptotic variance.” Other criteria for experimental design could be applied; Busetto et al. (2009), for example, maximizes the expected divergence between prior and posterior for the purpose of model selection (see also Chaloner and Larntz, 1989). Some (though not all) of these can be put in the same dynamic programing framework that we employ here.
To begin, recall that the likelihood of based on a single realization or sample path of the diffusion process in (1) is given by
(see, e.g., Rao, 1999). Since we are interested in choosing to minimize the asymptotic variance of the parameter estimate, we should maximize the associated Fisher information
(3) 
with . Note that the Fisher Information (3) — and hence the optimal control policy — in general depends on the parameter which is, of course, unknown. This can be handled by assuming a prior distribution for ; see Sect. 2.2. For the discussion at hand, we assume that a reasonably good initial guess of is available, which we use in computing (3). In this paper we focus on parameters that appear only in the drift term f, but note that similar expressions – also represented as integrals over the duration of the experiment – can be obtained for parameters that appear in . However, unless also depends on the state , the use of a control policy that changes only the drift may not change the Fisher Information for its parameters.
A priori, the control can be any function of the past history and , i.e., it is a stochastic process adapted to the filtration generated by Note also that the choice of a control at time affects the future cost, and hence subsequent choices of the control. Because of the form of (3) as an integral over time and the Marcovian dynamics of the system, the optimal control only depends on the current state (and not its past). That is, there is a function , where is the state space, is the duration of the experiment, and is the set of permissible control values, such that gives the optimal control policy. Hereafter we will refer to this function as the optimal control policy.
Computing the optimal control policy given an equation of the form (1) is a standard problem in stochastic optimal control. Here, we follow an approach due to Kushner (see, e.g., Kushner (1971, 2000); Bensoussan (1992) for details), which consists of two main steps: (i) a finite difference approximation is used to discretize the problem in both space and time, thus reducing the problem to one involving a finitestate Markov chain; and (ii) using the wellknown dynamic programming algorithm, construct an optimal control policy for the discretized problem. For conceptual and notational simplicity, we review the basic dynamic programming idea — applicable to both continuous and discrete state spaces — in the context of a timediscretized version of the diffusion process, deferring details of the spatial discretization to Appendix A.
Dynamic programming
Let us approximate the diffusion process (1) at discrete times , via the EulerMaruyama scheme (Kloeden and Platen, 1992), yielding
(4) 
with the independent vectors of independent standard normal random variables, and approximate the Fisher Information by
(5) 
Note that in this section, we use X to refer to nonrealized random variables over which we take expectations, and use x to refer to specific realizations and deterministic values.
Our goal is now to choose at each step to maximize over all possible choices of , with the requirement that (i) for all , and (ii) is a function of for , i.e., u is adapted to the filtration generated by . To this end, we define for each timestep the Fisher Information To Go (FITG)
where denotes the conditional expectation over all , given , and the supremum is taken over all controls adapted to . The FITG is the maximal expected FI, given that the we start at step in state ; the optimal Fisher information in Eq. (5) is given by
Dynamic programming is based on the observation that we can rewrite the FITG at step recursively:
(6) 
where is the conditional expectation given and that we choose the control value at step . (Note that choice of affects not only the second term inside the expectation, but also the first term, since choice of affects .) This means that if we start at the final step and work progressively backwards in time, then at each step , the FITG will already be computed. Thus, the problem reduces to a sequence of optimizations over , which is generally straightforward to do (see below). The resulting optimal control policy is
(7) 
To compute on a computer, we can discretize the state space and tabulate on a discrete grid of x’s. In this way, dynamic programming allows us to construct a good approximation of the control policy for each time step .
Optimization over . The optimization of in (7) may be done in a number of ways, depending on the allowed set of control values . For example, if is an interval or an open region in derivativebased optimization techniques like NewtonRaphson may be useful. In the present paper, we assume the control values are drawn from a relatively small finite set ; the optimization in Eq. (7) is thus done by picking the maximum over this finite set via exhaustive search.
In more general situations than what we consider in this paper, one may wish to allow to be an unbounded set such as In these situations, the FITG may be maximized at very large or even infinite values of . As an example, consider the univariate OrnsteinUhlenbeck process with additive control
(8) 
approximated via discrete timesteps as in (4); at step the FITG for is leading to
which is maximized for .
Aside from the question of mathematical wellposedness of the optimization problem (7), large values of constitutes an important practical issue: in experimental setups, may correspond to, say, an electrical current, and allowing to become too large may have undesirable consequences. An alternative to restricting to a finite set of values is to allow continuous values of but add a penalty to the Fisher Information, so that (7) has a finite, welldefined solution.
Throughout the discussion above, we have discretized time but assume that expectations with respect to can be calculated. To make dynamic programming practical, we further discretize the state space of our system (using the methods of Kushner (1971)) to obtain a finitestate Markovchain approximation, to which dynamic programming can be readily applied. As the details of this implementation are somewhat cumbersome to describe and not directly relevant to the remainder of the paper, we refer interested readers to Appendix A.
2.2 Parameter dependence and priors
Throughout the discussion above, we have constructed the Fisher Information and the resulting control policy assuming a specific . In general, will not be known — there would otherwise be little point in the experiment — and both Fisher Information and the optimal control policy may depend on its value.
We address this issue by constructing a prior over plausible values of and maximizing . The choice of this prior is important: the dynamics of may depend on the value of , and the computed control policy may be ineffective if the value of assumed in computing the optimal control policy is very different from the true value. This averaging is easy to implement numerically: a grid is chosen over the relevant region of parameter space, then the FI (3) is averaged over this grid of (a weighted average can be used to implement a nonuniform prior). It is easy to check that all the algorithms described in Sect. 2, as well as those described below, apply in a straightforward way in this setting.
Averaging Fisher Information over a prior in this way corresponds to the use of a quadratic loss function in a Bayesian design setting as in Chaloner and Larntz (1989). We have employed this here as we wish to minimize the variance of the estimated parameter. A Bayesian Doptimal design corresponding to averaging log Fisher Information (or its log determinant if is multivariate) can be employed following the same methods.
We note that while averaging the objective over possible values of the parameter is a natural procedure in some situations, it may not always be appropriate. When an estimate of parameters can be obtained and updated as the experiment progresses, it may be preferable to use the control policy associated with using these online estimates, e.g., by precomputing a set of control policies (for a grid of parameter values) and use the current best parameter estimate to choose a control policy. Alternatively, as the experiment progresses, Fisher Information could be averaged over a posterior for the parameters. These ideas are explored in Thorbergsson and Hooker (2012) in the context of discretestate Markov models; such online strategies are more complicated to implement for the types of diffusion processes discussed here, and will be examined in a future publication.
3 Partial Observations
The algorithms discussed in Sect. 2 are appropriate for a model in which the entire state vector is observed essentially continuously in time, without error. This is rarely the case in practice. In neural recording models, only membrane voltages can be measured, leaving as latent, although voltage measurements have both high accuracy and high frequency. In the chemostat system described in Appendix D, algae are measured by placing a slide with a sample from the ecology in a particle counter and are thus subject to sampling and counting errors and can be taken at most every few hours. Nitrogen can also be measured but with less accuracy and greater expense.
Generally, not all state variables will be measured in most applications, and measurements will be taken at discrete times and are will likely be corrupted by measurement errors. We denote both conditions by the term partial observations. The observation process causes two problems for the strategy outlined in Sect. 2.1: first, since not all state variables are being measured, the Fisher information in Eq. (5) is no longer the correct expression for the asymptotic estimator variance; and second, the dynamic programming methodology outlined in Sect. 2 is no longer applicable.
These difficulties associated with partially observed systems are of a fundamental nature: projections of Markov processes are typically not Markovian, and the Markov property is essential to the dynamic programming algorithm (see Sect. 2.1 and references). While one can derive an explicit expression for the Fisher information of partially observed diffusion processes, the functional no longer has the form of a timeintegral of a function of the state vector (see Louis (1982)), making it much more difficult to work with. Even when algebraic expressions for the Fisher Information are available, either in specific cases (linear diffusions with continuouslyobserved stochastic noise, also discretestate systems) or by approximation (Komorowski et al., 2011, 2012), a straightforward application of dynamic program cannot be employed directly. See Thorbergsson and Hooker (2012) for approximations in discretestate systems.
In this section, we propose a simple approximation strategy aimed at overcoming these difficulties. We also provide some theoretical justification for the approximation strategy in the context of linear systems. We assume throughout this section that the dynamics have been timediscretized (see Sect. 2.1) to yield a sequence of state vectors , and that every steps a noisy observation is made, yielding the observation vectors , where is the number of observations. Furthermore, we assume that a noise model is given. The assumption of strictly timeperiodic observations is for simplicity only and can be easily relaxed.
3.1 Filtering and estimation for partially observed systems
Our strategy for dealing with partially observed systems entails the following steps:

We solve the full observation problem by finding the control policy that minimizes the fullobservation Fisher Information (FOFI) given by (3).

During the experiment, filtering techniques are applied to provide realtime estimates of the system state, which are then plugged into to obtain a control value.
While this strategy is not expected to be optimal in general because the control values are determined by the current state rather than the entire past history, we argue that it nevertheless provides an efficient control strategy for problems where accurate filtering is feasible, i.e., when the conditional variation of the process given observations of it is small.
Step (i) above follows exactly the methodology from Sect. 2. The main difference between the partial and full observation cases is the need for filtering. For conceptual and implementation simplicity, in this paper we use a version of a particle filter (see, e.g., Liu, 2008). These allow us to both produce a filter distribution for the and also to calculate a likelihood for each candidate value of . This allows us to employ a maximum likelihood estimate of in our simulations. The details of our particle filter and likelihood maximization algorithm are given in Appendix B.
3.2 Approximation Methods and Linear Systems
We now provide a partial theoretical justification of our approximation strategy for systems in which the observations are very informative about . This requires us to both examine the utility of the full observation Fisher Information as well as the use of a filtered estimate of within the control policy.
Beginning with the approximation to Fisher information, within a partially observed system the complete data log likelihood can be written as
assuming that does not appear in the observation process. From the formulation for the observed Fisher Information
the (expected) Fisher Information for the partially observed diffusion process can be written as
(see Louis, 1982) where we observe that the first term is the objective of our dynamic program. Taking a Taylor expansion of in centered on , we observe that the second term shrinks with assuming that is differentiable in x.
The argument above indicates that, provided the conditional variance of x given observations is small, FOFI is a reasonable approximation to the Fisher Information obtained from . Indeed, in the limiting case that this variance goes to zero, we recover FOFI exactly.
It remains necessary to demonstrate that employing the filtered estimate within the control policy will approximately maximize FOFI. A full theoretical justificaiton, however, is not straightforward: for linear diffusion processes, , with a quadratic objective function , the Separation Theorem guarantees that this strategy is optimal (see Kushner, 1971). This needs not be the case for nonlinear systems or nonquadratic objective functions. Extensions of the separation theorem have been developed in Kilicaslan and Banks (2009) based on successive approximations of (1) in terms of a linear system with timevarying parameters. Particle filter methods can also be employed to average the future cost over the current distribution of the estimated state variables (Andrieu et al., 2003). Both of these schemes require rerunning the dynamic program at each time point, a strategy that will not always be computationally feasible.
In contrast, the approach of using the filtered estimate of the state allows the map from state variables to controls to be precomputed. This was employed in Botchu and Ungarala (2007) for optimal control, for example, and we demonstrate here that it is also effective for estimating parameters. If we consider the limit of informative observation, that is in a sequence of systems in which where is the filtered estimate, then the Fisher Information resulting from this procedure can be shown to converge to its optimal value under suitable regularity conditions. The details of the conditions required will depend on contextspecific factors such as whether the control policy takes discrete or continuous values, and we do not attempt to fill these in here. The behavior suggested here can be seen in the comparative performance of results in the MorrisLecar and chemostat simulations examined here, as well as in results in Thorbergsson and Hooker (2012).
4 An illustrative example
To illustrate our method and its impact, we present a simulation study for a model in which we expect our methods will yield significant improvements in parameter estimation. We consider a small particle trapped in a doublewell potential (see Fig. 2(a)). This is a commonlyused model in statistical and chemical physics (Van Kampen, 1992; Hänggi et al., 1990); we use it here to illustrate our methodology, and to indicate when our methodology may be particularly effective.
The governing equation is
(9) 
where is the position of the particle and the potential is given by
(10) 
(See Fig. 2(a).) Here, we study the model with , , and . We selected the last value randomly picked from an interval centered around in order to avoid cherrypicking favorable parameter values. In this regime, the unforced system (i.e., with ) exhibits dynamics on two separate timescales: on the shorter timescale, the particle position fluctuates about the bottom of one of the wells due to thermal fluctuation; on longer timescales, on the order of where is the depth of the well, the system jumps between the two potential wells (Van Kampen, 1992; Freidlin and Wentzell, 2012). A sample path is shown in Fig. 2(c) (bottom, dark black curve).



(a) The potential  (b) Optimal control at 
[4ex]
(c) Controlled and uncontrolled sample paths
Models of this type are paradigms for physical systems with multiple metastable states, and calculating transition rates between metastable states (due to large deviations in the driving white noise process) are of interest in, e.g., in reaction rate theory. By extension, the heights of relevant potential barriers are also of interest. Our goal here is to estimate the barrier height in Eq. (10).
For this 1D model, it is straightforward to solve the optimal control problem. Specificially, we discretize the interval (with high probability the position will stay within this range) by 100 grid points. The control values are drawn from the finite set . We also assume a uniform prior for on the interval , which we discretize into 10 parameter values. The diffusion process (9) is discretized in time with a timestep of ; this is sufficiently small to satisfy the criteria set forth in Sect. A. An optimal control policy is computed over the time interval for various values of . The computed control for stabilize rather quickly as increases; the control at is shown in Fig. 2(b). Not surprisingly, the control encourages more frequent jumps by pushing left when the particle is in the right well, and vice versa.
To see the control policy “in action,” we carry out simulations of the controlled diffusion proceess for and for . For each choice of , we carry out 256 independent trials, and use the result to estimate the barrier height . The diffusion process is observed every 0.25 units of time; at observation times the system state is estimated, and the control value updated. The observations are assumed to have additive, Gaussian observation noise of standard deviation 0.05. Because of the observation noise, even though this is a 1D model (and so the full state “vector” is observed), filtering is still needed. Here, we use a particle filter with particles; far fewer particles would have sufficed for the controlled process, but the uncontrolled process needed more particles to obtain a reasonable parameter estimate. A sample trajectory subjected to the optimal control is shown in Fig. 2(c) (lighter / blue curve).
Duration  Control  N  In range  Mean  Bias  Std. Dev.  Std. Dev. Err. 

4  Dynamic  256  100%  4.233  0.3933  0.05947  0.002628 
4  0  256  98%  4.242  0.4016  0.3133  0.01385 
30  Dynamic  256  100%  4.225  0.3854  0.02098  9.272e4 
30  0  256  99.6%  4.223  0.3832  0.2888  0.01277 
[2ex] (a) Continuous noisefree observations
[4ex]
Duration  Control  N  In range  Mean  Bias  Std. Dev.  Std. Dev. Err 

4  Dynamic  256  100%  4.225  0.3846  0.1094  0.004833 
4  0  256  77%  4.202  0.3623  0.6048  0.02673 
30  Dynamic  256  100%  4.197  0.357  0.04881  0.002157 
30  0  256  71%  4.316  0.4763  0.5953  0.02631 
[2ex] (b) Infrequent, noisy observations
Table 1 shows the results of these trials. In Table 1(a), results are shown for full observation trials, in which the system is observed at every timestep, i.e., observation period = . In addition to the controlled diffusion process, we also computed estimates for the diffusion process with a range of constant control values , and found that gives minimumvariance estimates among the control values tested. As can be seen, the standard deviation of the estimate based on the controlled diffusion is significantly smaller for both and — for by a factor of and for by a factor of We note that there is a measurable bias in the estimates from both dynamic and constant control regimes. In practical experiments, this can be examined and corrected by conducting a simulation; we note that the dynamic control policy does not appear to affect the magnitude of this bias.
Table 1(b) shows the corresponding results for the case of noisyandinfrequent observations. For , we obtain a roughly factor of 6 reduction in standard deviation, while for , roughly a factor of 11. The reduction in the estimator variance is significant, though as expected somewhat less than the full observation case, at least over longer timescales. More telling is the fraction of estimates that were “inrange” (see the discussion of MLE in Appendix B): the controlled diffusion process always produced estimates that were within the prior parameter range, whereas the uncontrolled diffusion produces a nontrivial number of parameter estimates outside the range.
The reason that the optimal control strategy is particularly effective in this model is that while the barrier height has a significant impact on the dynamics of the system over long timescales, it only impacts the dynamics in a small part of state space, and on shorter timescales it has relatively little effect. The optimal control policy is able to drive the system into crossing the barrier much more frequently, thereby yielding more information about . Note in order to know which way to drive the system to speed up barriercrossing, the controller needs to know which state the particle is in.
We expect that, in general, our method will be particularly effective in situations like this, where the informationrich region of phase space is relatively small (in terms of how much time the uncontrolled trajectory spends there), and the control is able to increase the frequency of visits to these regions.
5 MorrisLecar (ML) Neuron Model
We now demonstrate the application of the optimal control methodology to a more complex model with different dynamics, with the goal of examining how the type of dynamics can affect the efficacy of parameter estimation in the presence of dynamic control. Appendix D presents a further exploration of a different system where our methods are less helpful. The MorrisLecar (ML) neuron model mentioned in the introduction is often used to model membrane voltage oscillations and other properties of neuronal dynamics; it describes how membrane voltage and ion channels interact to generate electrical impulses, or “spikes,” which is the primary means for neuronal information transmission. The ML model is planar, and hence more amenable to analysis than higherdimensional models like the HodgkinHuxley equations (Hodgkin and Huxley, 1952). At the same time, it faithfully captures certain important aspects of neuronal dynamics (Ermentrout and Rinzel, 1998), making it a commonlyused model in computational neuroscience. It has a rich bifurcation structure, and exhibits two dramatically different timescales. See Ermentrout and Terman (2010) and Ermentrout and Rinzel (1998) for a derivation and description of this model and its behavior.
The ML model is given by (2). To provide further details, the second equation can be interpreted as the master equation for a twostate Markov chain describing the opening and closing of ion channels. The terms with and are independent noise terms modeling different sources of noise: the term models voltage fluctuations, and is simply additive white noise. The term models random fluctuations in the number of open ion channels due to finitesize effects; here we use the function
(11) 
to scale the Wiener process. This function arises from an underlying Markov chain model; see, e.g., Kurtz (1971, 1981); Smith (2002).
Here, we assume a typical experimental setup in which an electrode (a “dynamic clamp”) is attached to the neuron, through which an experimenter can inject a current and measure the resulting voltage ; the gating variable is not directly observable. The electrode is usually attached directly to a computer, which records the measured voltage trace and generates the timedependent injected current, making this type of experiment a natural candidate for our method. Neuronal membrane voltages are measured with signaltonoise ratios of 1000 to 1 or more. On the other hand, dynamical events of interest can take place on timescales of milliseconds or less. So for this application it is vital that one precomputes as much as possible. These speed requirements may necessitate the use of computationally cheaper — but less accurate — state estimation technique, e.g., ensemble Kalman filters. Since we are interested evaluating the utility of dynamic control via simulations, we continue to use particle filters here.
We focus on a parameter regime in which the noiseless system can switch between having a globallyattracting fixed point and an unstable fixed point surrounded by a stable limit cycle (Fig. 3(a) illustrates the latter). The limit cycle represents a “tonically” (periodically) spiking neuron; the effect of noise is to “smear out” this limit cycle. The precise parameters we use come from Ermentrout and Terman (2010); they are summarized in Appendix C. The key parameter here is the injected DC current; in Fig. 3(a) this is . As one decreases from this value, the system undergoes a subcritical Hopf bifurcation: the unstable fixed point becomes stable, and at the same time an unstable periodic orbit emerges around the fixed point (Ermentrout and Terman, 2010). Postbifurcation, the system is bistable: while tonic spiking continues to be viable, a quiescent state (corresponding to the stable fixed point) has emerged. If we decrease even further, the stable and unstable cycles collide in a saddlenode bifurcation, leaving behind a single stable fixed point.
In this example, our control is The control values we use are corresponding to (we take ). The three values of the injected current place the system in the stable fixed point, bistable, and limit cycle regimes, respectively.
(a) Noiseless ML phase portrait  (b) Controlled and uncontrolled trajectories 
Simulation results
We have performed simulations in which we estimated the calcium conductance constant from simulated data; this is chosen over other parameters because it gives rise to a more complex control policy (as we show below). In detail, we assume a flat prior for on the interval ; in all experiments reported below, we fix a randomlygenerated value of 4.415, which we take to be the “true” value. The optimal control is computed using a timestep of ms, and we assume that measurements are available at every time step. State space is discretized by cutting the region into bins. Unless otherwise stated, “constant control” means . The value of was chosen as the best performing among several possible values of a constant control, which all exhibited significantly higher standard deviations.
(a)  (b) 
Fig. 3(b) shows examples of controlled and uncontrolled (with , or ) trajectories: the perturbations are measurable, but not large. Simulation results for estimates of using trajectories of duration ms are given in Table 2.
Observation  Control  Mean  Bias  Std. Dev. 

Full  Dynamic  4.416  6.315e4  0.01106 
Full  5.0  4.416  8.82e4  0.01405 
Noisy partial  Dynamic  4.413  0.002062  0.01579 
Noisy partial  5.0  4.416  0.001458  0.01847 
To better understand why the dynamic control performs better, we examine the structure of the control policy and Fisher information. Fig. 4(a) shows the Fisher information function for , which is easily shown to be proportional to
the controlled trajectory is superposed. From the geometry, one expects the optimal control should either find a way to keep the trajectory at a constant voltage or, failing that, try to increase the firing rate, so that trajectories cross the informationrich region as much as possible. As can be seen, the controlled trajectory does exactly that: starting at the resting voltage (around 80 mV), it jumps toward 0 mV, runs along the “informationrich” strip around before dipping back toward the resting voltage. In comparison, the uncontrolled trajectory in Fig. 3(b) appears to spend less time in this region. Fig. 4(b) shows how the optimal control policy accomplishes this by suitably pushing the trajectory at critical parts of its cycle, thus increasing the overall firing rate.
To see more clearly the effect of the dynamic control, we plot time traces of the controlled and uncontrolled trajectories in Fig. 5. As can be seen, the dynamic control not only tries to push the trajectory into the informationrich region, it also makes sure it spends more time there.
Some final remarks on the structure of the optimal control:

We note that the optimal control depends a great deal on the parameter being estimated. If one were to try to estimate , for example, the resulting control policy is essentially constant for most of phase space.

In this dynamic regime, with FI as in Fig. 4(a) (a unimodel function of ), it is natural to expect that increasing firing rate leads to an increase in the FI. This can be accomplished by either dynamic control or by static control. In more general situations, e.g., neuron models with additional currents and exhibiting dynamics on multiple timescales (Dayan et al., 2001), we expect dynamic control to yield a greater gain in FI. However, since such models usually entail degrees of freedom, they would be difficult to study using present numerical methods, and we leave the study of such models for future work.
6 Discussion
There has been increasing interest in combining experimental data and statistical methods with mechanistic dynamical models describing system behavior. This paper adds to this literature in considering the problem of designing inputs into such experiments that are directed at improving the precision of parameter estimates that result from them. In particular, we have demonstrated that in diffusion processes, maximizing Fisher Information about a parameter can be cast as a problem of optimal control and shown that using this strategy can substantially improve parameter estimates. In order to make control methods feasible when systems are observed with noise or only some state variables are observable, we employ the strategy of estimating the value of the state online and using this within a precomputed control policy.
We have demonstrated this approach on two examples that showcase when this form of adaptive control is most likely to be useful. One situation when we expect significant benefit from dynamic control is when visits to the informationrich regions of state space are relatively rare, as in the doublewell example. As we have seen, optimal control can be effective in increasing the frequency of visits to informationrich regions. In systems where trajectories naturally return to informationrich regions in a recurrent fashion even with static controls, our method may still yield modest benefits, but the exact degree of information gain will depend on the specific dynamic situation.
In systems with stable fixed points that are already informative about the parameters of interest, dynamic control may not be better than simply choosing an optimal static control. However, our methods also yield information about which constant control is likely to be most useful; this is helpful in, e.g., the chemostat experiments described in Appendix D, where measurements are relatively infrequent and dynamic control is relatively easy to implement. In systems such as (8) in which the location of fixed points do not yield information about parameters, a control policy that repeatedly returns the system to informative transient states could be more advantageous.
There remain numerous unresolved problems and open areas in which these methods can be extended. Computational cost, both in speed and memory, represent the largest limiting factor in employing these methods. In particular, the storage and computation costs of the policy scale exponentially with the number of state variables. Possible strategies here involve the application of sparse grid strategies for discretizing the state variables (Xiu, 2009) or – more heuristically – intensive Monte Carlo methods to simulate the state variables forward combined with machine learning methods to estimate control strategies. It may also be possible to precompute control policies only for highprobability regions of the state space and employ techniques of approximate dynamic programming (Powell, 2007).
Further extensions involve alternative targets for the control policy, including the Bayesian criteria explored in Busetto et al. (2009). Thorbergsson and Hooker (2012) explores maximizing the Fisher Information for a partially observed Markov decision process, although this exacerbates the computational difficulties discussed above. Another direction for extension is to systems with vector parameters:our methods can be readily extended to maximizing the trace of a Fisher Information matrix (optimal designs); other criteria such as the determinant (optimality) that can be expressed as combinations of time integrals can also be employed using the same ideas.
We have dealt with parameter uncertainty before the experiment by averaging the Fisher Information over the prior within our objective. Alternative strategies such as maximizing the minimum Fisher Information in a range of parameters can be implemented within the numerical control strategy. We have also not experimented with updating our prior as the experiment progresses; this can be accomplished within particle filtering, for example by treating parameters as additional state variables – an approach taken in Ionides et al. (2006) – although when observations are highly informative about the system state, this can result in particle filter collapse (Snyder et al., 2008) (recent advances such as implicit filters (Chorin et al., 2012) may be helpful here). Such an approach would also require either recomputing the optimal policy after each observation, or precomputing the control for a range of parameters and suitably combining them, but when the optimal control policy depends strongly on the system parameters such updates may be expected to produce significant improvements.
Acknowledgments.
We thank Prof. Cindy Greenwood for many helpful comments throughout the initial stages of this work, and Prof. Xueying Wang for help with the MorrisLecar model. We are also grateful to the anonymous referees for numerous suggestions and relevant references. This work was supported in part by the Statistical and Applied Mathematical Sciences Institute (KL, BR), the National Science Foundation under grants DMS1418775 (KL), DMS1053252 and DEB1353039 (GH).
References
 AïtShahalia (2008) AïtShahalia, Y. (2008). Closedform likelihood expansions for multivariate diffusions. Annals of Statistics 36(2), 906–937.
 Andrieu et al. (2003) Andrieu, C., A. Doucet, S. S. Singh, and V. B. Tadić (2003). Proceedings of the IEEE 92, 423 – 438.
 Asmussen and Glynn (2008) Asmussen, S. and P. W. Glynn (2008). Stochastic Simulation: Algorithms and Analysis. Springer.
 Bartroff and Lai (2010) Bartroff, J. and T. L. Lai (2010). Approximate dynamic programming and its applications to the design of pahse 1 cancer trials. Statistical Science 25, 245–257.
 Bauer et al. (2000) Bauer, I., H. G. Bock, S. Körkel, and J. P. Schlöder (2000). Numerical methods for optimum experimental design in DAE systems. Journal of Computational and Applied Mathematics 120, 1–25.
 Bensoussan (1992) Bensoussan, A. (1992). Stochastic Control of Partially Observable Systems. Cambridge, UK: Cambridge University Press.
 Berger (1994) Berger, M. P. F. (1994). Doptimal sequential sampling designs for item response theory models. Journal of Educational Statistics 19(1), pp. 43–56.
 Botchu and Ungarala (2007) Botchu, S. K. and S. Ungarala (2007). Nonlinear model predictive control based on sequential monte carlo state estimation. In 8th International IFAC Symposium on Dynamics and Control of Process Systems.
 Brunel (2008) Brunel, N. (2008). Parameter estimation of ODEs via nonparametric estimators. Electronic Journal of Statistics 2, 1242–1267.
 Busetto et al. (2009) Busetto, A. G., C. S. Ong, and J. M. Buhmann (2009). Optimized expected information gain for nonlinear dynamical systems. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML ’09, New York, NY, USA, pp. 97–104. ACM.
 Chaloner and Larntz (1989) Chaloner, K. and K. Larntz (1989). Optimal bayesian design applied to logistic regression experiments. Journal of Statistical Planning and Inference 21, 191–208.
 Chaloner and Verdinelli (1995) Chaloner, K. and I. Verdinelli (1995). Bayesian dexperimental design: A review. Statistical Science 10(3), 273–304.
 Chaudhuri and Mykland (1993) Chaudhuri, P. and P. A. Mykland (1993). Nonlinear experiments: Optimal design and inference based on likelihood. Journal of the American Statistical Association 88(422), 538–546.
 Chorin et al. (2012) Chorin, A., M. Morzfeld, X. Tu, and E. Atkins (2012). A random map implementation of implicit filters. Journal of Computational Physics 231, 2049–2066.
 Dayan et al. (2001) Dayan, P., L. F. Abbott, and L. Abbott (2001). Theoretical neuroscience: Computational and mathematical modeling of neural systems.
 Ermentrout and Rinzel (1998) Ermentrout, G. B. and J. Rinzel (1998). Analysis of neural excitability and oscillations. In C. Koch and I. Segev (Eds.), Methods of Computational Neuroscience, pp. 251–292. Bradford.
 Ermentrout and Terman (2010) Ermentrout, G. B. and D. Terman (2010). Foundations of Mathematical Neuroscience. Springer.
 Freidlin and Wentzell (2012) Freidlin, M. I. and A. D. Wentzell (2012). Random perturbations of dynamical systems, Volume 260. Springer.
 Girolami et al. (2010) Girolami, M., B. Calderhead, and S. A. Chin (2010). Riemannian manifold Hamiltonian Monte Carlo. Journal of the Royal Statistical Society in press.
 Hänggi et al. (1990) Hänggi, P., P. Talkner, and M. Borkovec (1990). Reactionrate theory: fifty years after kramers. Reviews of Modern Physics 62(2), 251.
 Hodgkin and Huxley (1952) Hodgkin, A. L. and A. F. Huxley (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. 133, 444–479.
 Hooker (2009) Hooker, G. (2009). Forcing function diagnostics for nonlinear dynamics. Biometrics 65, 613–620.
 Hooker and Ellner (2013) Hooker, G. and S. P. Ellner (2013). Goodness of fit and nonlinear dynamics: Misspecified rates or misspecified states? under review.
 Huang and Wu (2006) Huang, Y. and H. Wu (2006). Bayesian experimental design for longterm longitudinal hiv dynamic studies. Journal of Statistical Planning and Inference 138, 105–113.
 Huggins and Paninski (2011) Huggins, J. H. and L. Paninski (2011). Optimal experimental design for sampling voltage on dendritic trees in the lowSNR regime. Journal of Computational Neuroscience, to appear.
 Ionides et al. (2006) Ionides, E. L., C. Bretó, and A. A. King (2006). Inference for nonlinear dynamical systems. Proceedings of the National Academy of Sciences.
 Kilicaslan and Banks (2009) Kilicaslan, S. and S. P. Banks (2009). A separation theorem for nonlinear dsystems. Automatica 45, 928–935.
 Kloeden and Platen (1992) Kloeden, P. E. and E. Platen (1992). Numerical solution of stochastic differential equations, Volume 23. Springer Verlag.
 Komorowski et al. (2011) Komorowski, M., M. J. Costa, D. a. Rand, and M. P. H. Stumpf (2011, May). Sensitivity, robustness, and identifiability in stochastic chemical kinetics models. Proceedings of the National Academy of Sciences of the United States of America 108(21), 8645–50.
 Komorowski et al. (2012) Komorowski, M., J. Zurauskiene, and M. P. H. Stumpf (2012, March). StochSens–Matlab package for sensitivity analysis of stochastic chemical systems. Bioinformatics (Oxford, England) 28(5), 731–3.
 Kurtz (1971) Kurtz, T. G. (1971). Limit theorems for sequences of jump markov processes approximating ordinary differential processes. Journal of Applied Probability 8, 344–356.
 Kurtz (1981) Kurtz, T. G. (1981). Approximation of Population Processes, Volume 36 of CBMSNSF Regional Conference Series in Applied Mathematics. SIAM.
 Kushner (1971) Kushner, H. (1971). Introduction to Stochastic Control. New York: Holt, Rinehart and Winston.
 Kushner (2000) Kushner, H. (2000). Numerical Methods for Stochastic Control Problems in Continuous Time. New York: Springer.
 Liu (2008) Liu, J. S. (2008). Monte Carlo Strategies in Scientific Computing. Springer.
 Louis (1982) Louis, T. A. (1982). Finding the observed information matrix when using the em algorithm. Journal of the Royal Statistical Society, Series B 44(2), 226–233.
 Morris and Lecar (1981) Morris, C. and H. Lecar (1981). Voltage oscillations in the barnacle giant muscle fiber. Biophysical Journal 35(1), 193Â213.
 Müller and Yang (2010) Müller, H.G. and W. Yang (2010). Dynamic relations for sparsely sampled gaussian processes. TEST 19, 1–29.
 Nandy et al. (2012) Nandy, P., M. Unger, C. Zechner, and H. Koeppl (2012). Optimal Perturbations for the Identification of Stochastic Reaction Dynamics. Computer and Information Science.
 Powell (2007) Powell, W. B. (2007). Approximate Dynamic Programming: Solving the Curses of Dimensionality. Wiley.
 Ramsay et al. (2007) Ramsay, J. O., G. Hooker, D. Campbell, and J. Cao (2007). Parameter estimation in differential equations: A generalized smoothing approach. Journal of the Royal Statistical Society, Series B 16, 741–796.
 Rao (1999) Rao, B. L. S. P. (1999). Semimartingales and their statistical inference. New York: Chapman and Hall/CRC.
 Ruess et al. (2013) Ruess, J., A. MiliasArgeitis, and J. Lygeros (2013). Designing experiments to understand the variability in biochemical reaction networks Designing experiments to understand the variability in biochemical reaction networks. Journal of the Royal Society Interface 10(August).
 Smith (2002) Smith, G. (2002). Modeling the stochastic gating of ion channels. In Computational Cell Biology, Volume 20 (II) of Interdisciplinary Applied Mathematics.
 Snyder et al. (2008) Snyder, C., T. Bengtsson, P. Bickel, and J. Anderson (2008). Obstacles to highdimensional particle filtering. Monthly Weather Review 136(12), 4629–4640.
 Strikwerda (1989) Strikwerda, J. C. (1989). Finite Difference Schemes for Partial Differential Equations. New York: Chapman & Hall.
 Thorbergsson and Hooker (2012) Thorbergsson, L. and G. Hooker (2012). Experimental design in partially observed markov decision processes. under review.
 Van Kampen (1992) Van Kampen, N. G. (1992). Stochastic processes in physics and chemistry, Volume 1. North holland.
 Wilkinson (2006) Wilkinson, D. J. (2006). Stochastic Modelling for Systems Biology. London: Chapman and Hall/CRC.
 Wu and Ding (2002) Wu, H. and L. Ding (2002). Identification of significant host factors for hiv dynamics modeled by nonlinear mixedeffect models. Statistics in Medicine 21, 753–771.
 Xiu (2009) Xiu, D. (2009). Fast numerical methods for stochastic computations: a review. Commun. Comput. Phys. 5, 242–272.
 Zechner et al. (2012) Zechner, C., P. Nandy, M. Unger, and H. Koeppl (2012, December). Optimal variational perturbations for the inference of stochastic reaction dynamics. 2012 IEEE 51st IEEE Conference on Decision and Control (CDC), 5336–5341.
Appendix A Markov chain approximation
This section provides technical details of our numerical methods for obtaining the optimal control policy. As noted above, in order to execute the dynamic programming strategy on a computer and tabluate the resulting control policy , it is necessary to discretize the state space. Here we follow a discretization strategy due to Kushner (Kushner, 1971). The starting point is the time distribution of the diffusion , which satisfies the forward equation
(12) 
where is the optimal control plan described above in the limit We discretize the optimization problem (7) as follows: (i) we cover the relevant portions of state space with a finite grid of points with spacing ; (ii) we then discetize Eq. (12) in space and time by a finite difference approximation on this grid with a timestep depending on ; and (iii) we interpret the coefficients of the finite difference approximation as the transition probabilities of a finitestate Markov chain (whose states are the discrete grid points chosen in (i)) to approximate the diffusion process. The dynamic programming algorithm can then be applied directly to this finite state Markov chain in a straightforward manner.
As an example of what we mean by (iii), consider the simple diffusion equation . The standard finite difference discretization of this PDE is
where Thus, if , we can interpret the equation above as describing a discrete statespace Markov chain whose states are with transition probability and . Note that the condition imposes an upper bound on the stepsize, specifically . Timestep limitations like this are a general feature of finite difference discretization schemes, and a poorlydesigned finite difference scheme can lead to a great increase in computational cost.
The main technical problem is thus step (ii), since we need to ensure that the coefficients of the finite difference scheme can be interpreted as probabilities. In our implementation, we employ a “split operator” finite difference discretization. Roughly speaking, this amounts to discretizing the drift and diffusion terms in the SDE (1) separately, then composing the resulting difference schemes. This is a standard technique often used for the numerical solution of partial differential equations (see, e.g., Strikwerda, 1989). In our test examples, the use of such a split operator scheme permits us to use larger grid spacings and timesteps while maintaining numerical stability, and can significantly speed up overall running speeds.
In detail: first, suppose for simplicity that the state space in Eq. (1) is (generalization to higher dimensions is straightforward, but with more complex notation). For each , let denote the regular grid with spacing . For the moment, assume extends to all of ; boundary conditions are discussed below. The grid will be the state space of our discretetime approximating Markov chain . To choose time step sizes, we fix a constant and an integer , and set the timestep to be Then our approximation can be described as follows:
Step 1. Suppose at time , the chain has state Then
with probability , jump to if and to if ; and
stay at with probability
Step 2. Let denote the state of the chain after the previous step. Then jump to with probability , and stay at with probability .
This discretization scheme treats the drift and diffusion terms in Eq. (1) separately, hence the name “split operator.” It is straightforward to show that the composite finite difference discretization provides a consistent discretization of Eq. (12). Step 1 interprets the drift term as a biased jump, and Step 2 interprets the diffusion term as one step of a symmetric random walk.
Both conceptually and in terms of programming, split operator schemes like the one above are easy to work with; however they also impart an additional variance on the motion of the approximating Markov chain. In the scheme above, this extra variance is on the order of per step; the relative effect of this extra variance (socalled numerical diffusion) vanishes as .
Clearly, in order for the various expressions above to be valid transition probabilities, we must have We also need . Since , the second condition can always be achieved by taking sufficiently small. The first condition, however, places a rather stringent constraint on the timestep The purpose of introducing the “skip factor” is precisely to partially alleviate this constraint, at the cost of losing some accuracy. In practice, going from to can have a large impact on the overall running time. If , our scheme closely resembles the “upwind” scheme of Kushner (Kushner, 1971); a scheme similar to our case is also described there.
In practice, rather than choosing and , we usually first choose and , and take with a constant of proportionality between 1 and 2, where is the maximum of over the domain of interest. This guarantees that the scheme converges (see below). We usually use relatively coarse grids, as (i) the structure of the optimal control policy is not too fine, and (ii) we usually operate in the presence of observation and dynamical noise, so there is not much point trying to pin down fine structures in the control policy.
Convergence. In general, a sufficient condition for the Markov chain approximation method to be valid is that the approximating chain satisfy
(Kushner, 1971). In the above, denotes conditional expectation given all information up to step , denotes the corresponding autocovariance matrix, and is a constant. Note the last line should be interpreted to hold surely, i.e., it says the discrete Markov chain makes jumps of in size. If the conditions above are satisfied, then as , the optimal control policy for the approximating chain will converge to an optimal control policy for the diffusion (1). It is easy to see that our scheme satisfies the convergence criteria above with
Boundary conditions. In practice, must be a finite grid. We assume that it spans a subset of sufficiently large that on the timescale of interest, trajectories have very low probability of exiting. We then impose that when the approximating Markov chain attempts to exit the domain, it is forced to stay at its current state. This gives a simple way to obtain a Markov chain on bounded grids.
Higher dimensions. The generalization of the preceding scheme to higher dimensions is straightforward: we simply treat each dimension separately and independently.
Appendix B Numerical Methods for Particle Filters and Maximum Likelihood Estimation
In this appendix we provide details of our particle filter methods to estimate both the state value given current observations and for maximum likelihood estimation. We discretize the diffusion process using the standard EulerMaruyama approximation (4), and incorporate observations via Bayes’s formula. In all the examples considered in this paper, the observations are linear functions of the state vector plus Gaussian error. This allows us to avoid weighting and resampling in the particle filter. Instead, we incorporate the observations within the EulerMaruyama scheme (4) by first applying the drift and then sampling the Gaussian disturbance conditional on the observation at time . When multiple steps are taken between observations, this update is applied for the final step. In practical terms, our particle filters are simpler and more robust than bootstrap filters, and do not differ significantly from other popular filtering schemes, e.g., the ensemble Kalman filter.
One additional implementation detail: when observations are infrequent, we simply hold the control value constant between observations. In principle, we can also use the past information to extrapolate the state trajectory between observations, and use a timedependent control. But for the examples considered in this paper, the simpler strategy appears sufficient; it is also more realistically applicable in some experiments.
Maximum likelihood estimates
Another advantage of using particle filters is that we can easily evaluate the likelihood as a biproduct. Let be the relevant parameter interval (see Sect. 2.2), and for simplicity assume a flat prior over . We discretize into a discrete grid of size . For each putative parameter value in we run a particle filter and compute an estimate of the loglikelihood using the formula (Ionides et al., 2006)
where denotes the ensemble of particles at step . If the computed loglikelihood curve takes on a maximum in the interior of the interval we do a standard 3point quadratic interpolation around the maximum and find the maximizer; we use this as the MLE. On the rare occasions when the loglikelihood curve does not have a maximum inside the interval we use the corresponding endpoint as the estimate (but record the estimate as being “out of range”).
One further “trick” can be used to improve the performance of the gridbased MLE: for each simulation experiment, corresponding “particles” for different parameter values are driven using the same sequences of Gaussian random numbers. That is, if the th particle in the ensemble for the th parameter value is in state at timestep , then the particle positions at the next timestep (where number of points in the parameter grid) are generated from the current positions using the same Gaussian random numbers. (To ensure correct sampling, the particles for fixed still receive independent Gaussian random numbers.) This samenoise coupling or method of same paths reduces the variance of the resulting estimates (Asmussen and Glynn, 2008), and produces smooth likelihood curves.
For comparison purposes, we also study the full observation problem in some of the examples, i.e., assume the full state vector is continuously and noiselessly observed. For such problems, the loglikelihood associated with particular sample paths are straightforward to calculate directly, and filtering is not needed.
Appendix C ML Model Parameters
Here we briefly summarize the details of the ML model used in this paper. The interested reader is referred to (Ermentrout and Rinzel, 1998; Ermentrout and Terman, 2010) for more details.
Recall the (deterministic) ML equations
As explained in Sect. 5, the ML model tracks the membrane voltage and a gating variable . The constant is the membrane capacitance, a timescale parameter, and an injected current. The reversal potentials and associated conductances characterize the ion channels and their dependence on the membrane voltage (see below).
Spike generation depends crucially on the presence of voltagesensitive ion channels that are permeable only to specific types of ions. The ML equations include just two ionic currents, here denoted calcium and potassium. The voltage response of ion channels is governed by the equation and the auxiliary functions , , and , which have the form
The function models the action of the relatively fast calcium ion channels; is the “reversal” (bias) potential for the calcium current and the corresponding conductance. The gating variable and the functions and model the dynamics of sloweracting potassium channels, with its own reversal potential and conductance . The constants and characterize the “leakage” current that is present even when the neuron is in a “quiescent” state. The forms of , , and (as well as the values of the ) can be obtained by fitting data, or reduction from more biophysicallyfaithful models of HodgkinHuxley type (see, e.g., (Ermentrout and Terman, 2010)).
The precise parameter values used in this paper are summarized in Table 3. These are obtained from (Ermentrout and Terman, 2010). Throughout, the noise intensities are , .
Parameter  Value 

95  
20.0  
4.41498308  
8.0  
2.0  
0.04 
Parameter  Value 

84.0  
60.0  
120.0  
1.2  
18.0  
2.0  
30.0 
Appendix D Chemostat Growth Models
Our second example comes from experimental ecology. In these experiments, a glass tank or chemostat is inoculated with a population of algae. The tank is bubbled to ensure that the contents are mixed, and to prevent oxygen deprivation. A nitrogenrich medium is continuously injected while the contents of the tank are evacuated at the same rate as the injection. Algae are assumed to consume nitrogen in proportion to their population size and nitrogen concentration, until the population size stabilizes due to saturation. For ecological models a set of stochastic differential equations can be proposed with drift term
(13)  
where the model has a mechanistic interpretation for an infinite population given in terms of:

( mol/l) represents the nitrogen concentration in the chemostat

( cells per liter) gives the relative algal density

(percentage per day) is the dilution rate; i.e., the rate at which medium is injected and the chemostat evacuated

( mol/l) is the nitrogen concentration in medium

( mol/ cells) is the rate of algal consumption of nitrogen

( mol/l) is a halfsaturation constant indicating the value of at which is halfway to its asymptote

( cells/ mol) is the algal conversion rate; how fast algae turn consumed nitrogen into new algae.
This system is made stochastic by multiplicative lognormal noise. This is equivalent to a diffusion process on and with additive noise:
Here, the diffusion terms provide an approximation to stochastic variation for large but finite populations and account for other sources of extrademographic variation. While alternative parametrizations of stochastic evolution can be employed, the diffusion approximation is convenient for our purposes. A typical goal is to estimate the algal conversion rate and the halfsaturation constant .
This experiment involves several experimental parameters that can be used as the dynamic input:

The dilution rate of the chemostat

The concentration of nitrogen input

The times at which the samples are taken from the chemostat

The quantities that are observed.
In this paper we use as the control parameter, which fits well within the design methodology outlined above. Realistic values for the parameters, estimated from previous experiments, are . We chose based on visual agreement with past experiments; since these act multiplicatively on the dynamics it is reasonable for them to be of the same order of magnitude. We hold constant; within the experimental apparatus, can only be modified by changing between discrete sources of medium. Chemostats are typically inoculated with a small number of algal cells and the models above give algal density relative to a total population carrying capacity in the tens of millions. Under these circumstances, initial conditions can be given by (the input concentration) and representing an initial population of a few hundred cells. The experimental apparatus places further constraints on the problem: the dilution rate must be positive, and cannot exceed the maximal rate at which medium can be pumped through the chemostat.
We focus on the estimation of In this model system, one can measure both and , though nitrogen measurements are both more costly to make and prone to distortion by various factors. Continuous measurements are not practical for either quantity; one can at best make measurements a few times a day. Here, we focus mainly on the completebutinfrequenth observation regime, where we assume we can make (noisy) measurements of both state variables.
This system represents a mostmarginal case for adaptive designs in that the system has a stable fixed point for each dilution rate and most information is gained near the fixed point. Further, the value of mostly affects the fixed point of . We thus expect that adaptive controls will exhibit little improvement over the best choice of a constant control policy and that estimates based solely on – by far the easier quantity to measure – will have poor statistical properties whatever policy is used. Figure 6 shows an example trajectory of this model along with the fixed point as a function of .
Model properties and further parameter constraints
We begin by examining the dynamic properties of the chemostat system. In the absence of noise, i.e., when , the phase space is fairly simple: it has a single fixed point given by
(14) 
It is easy to check that this fixed point is linearly stable, and attracts all initial conditions in the relevant part of state space. When all trajectories eventually converge to a neighborhood of the fixed point, and the stationary distribution is concentrated around the fixed point for moderate , . See Fig. 6. The fixed point exists only if the dilution rate is not too high; when for some critical the fixed point moves to Physically, this means the dilution rate is so high that the algal population crashes to 0. For (the typical value we assume in this example), the critical is roughly To avoid crashing populations, we keep the dilution rate below this number.
As mentioned above, is easier to measure experimentally because it can potentially be done by an optical particle counter. Unfortunately, in the parameter regime of interest, it is actually fairly difficult to infer from measurements alone. To see this, observe that for and as varies over the fixed point moves essentially vertically; see Fig. 6. Since trajectories are expected to stay near the fixed point, it follows that changing will have a significant, observable impact on the dynamics but relatively little on Equivalently, because our estimation strategy is based on the likelihoods of state variable observations, measurements (or combined measurements) will be much more useful for estimating than alone. This is our reason for focusing on the completeobservation case.
Finally, we note that the FI for is given by
which is maximized when but for As discussed in Sect. 2.2, if we allowed unbounded control values, a penalization would need to be imposed in order to avoid unrealistically large diluation rates. Since we are choosing dilution rates from a finite set here, this is not necessary. Figure 7 presents the control policy at times = 4, 7 and 30.
(a)  (b)  (c) 
Simulation results
We now examine the effectiveness of the optimal control methodology via simulations of the chemostat model. We start at , corresponding to typical experimental conditions. With this initial condition, it typically takes a trajectory about 7 days or so (physical time) to reach equilibrium. In our simulations, we carry out experiments of duration for various values of , ranging from 4 to 30 days. All simulation results are based on 256 realizations of the (controlled) diffusion process.
(a) A controlled trajectory  (b) The control 
(c) Payoff / versus  (d) The controled 
The optimal control policies, for different values of , are shown in Fig. 7. The control values are drawn from the set (remaining below the dilution rate at which all the algae is eventually removed from the system), and we assume a flat prior for over the interval As can be seen, the computed control policy is nontrivial and depends on However, we observed that after about , the control policy stops changing. The “longtime,” steadystate control policy uses only the extreme values 0.1 and 0.68; on shorter timescales, the control policy uses all available values.
Fig. 8(c) shows the quantity