Likelihood-free stochastic approximation EM for inference in complex models

# Likelihood-free stochastic approximation EM for inference in complex models

Umberto Picchini

Centre for Mathematical Sciences, Lund University

Sölvegatan 18
SE-22100 Lund, Sweden
Email:
umberto@maths.lth.se
###### Abstract

A maximum likelihood methodology for the parameters of models with an intractable likelihood is introduced. We produce a likelihood-free version of the stochastic approximation expectation-maximization (SAEM) algorithm to maximize the likelihood function of model parameters. While SAEM is best suited for models having a tractable “complete likelihood” function, its application to moderately complex models is a difficult or even impossible task. We show how to construct a likelihood-free version of SAEM by using the “synthetic likelihood” paradigm. Our method is completely plug-and-play, requires almost no tuning and can be applied to both static and dynamic models. Four simulation studies illustrate the method, including a stochastic differential equation model, a stochastic Lotka-Volterra model and data from -and- distributions. MATLAB code is available as supplementary material.

Keywords: incomplete data; intractable likelihood; Lotka-Volterra; SAEM; stochastic differential equation; synthetic likelihood; state space model.

## 1 Introduction

Most mathematical/statistical models for realistic experiments include unobservable (latent) components that complicate the statistical inference for model parameters . Here we consider the problem of estimating , given an observable process from which data are generated, in models characterized by missing (incomplete) data in the sense discussed in Dempster1977 when introducing the celebrated EM algorithm. Therefore, our goal is to estimate , in presence of a latent (unobservable) on which observed data depend.

While here we deal with a modification of an EM-type algorithm, for the moment our interest is to discuss the inference problem for models having so-called “intractable likelihoods”. For these models the likelihood function is unavailable in closed form and obtaining an approximation (or evaluating said approximation) is computationally prohibitive. Two of the discussed examples are state-space models (SSM), and for SSM recent advancements in sequential Monte Carlo methods (also known as particle filters) have revolutionised the practical application of statistical inference, especially the Bayesian kind, see the review in kantas2015particle. For more general models than SSM, approximate Bayesian computation (ABC) is often the only available solution to perform statistical inference for the parameters of complex models with intractable likelihoods. ABC (see marin-et-al(2011) for a review) is an ensemble of algorithms that only requires the ability to generate synthetic observations from the assumed data generating model, hence these are “plug-and-play” algorithms. While ABC algorithms have been developed since the ’90s, the most important issues for a successful implementation of ABC are still as relevant today as they were twenty years ago. In particular, the most typical usage of ABC requires the analyst to specify summary statistics that are “informative” regarding the unknown . Moreover, a threshold parameter is introduced to compare summary statistics computed on the available data with summaries computed on simulations from the assumed data generating model. The problem of selecting appropriate summaries is the most serious of the two (see fearnhead-prangle(2011)). The determination of the threshold for summaries comparison is also very important and has a significant impact on the computational budget. Finally, when ABC is implemented within an MCMC sampler, there is a further layer of practical issues that are usually of difficult management for the non-expert practitioner, such as coding an appropriate adaptive MCMC method for the generation of parameter proposals, also noting that the frequency of the adaptation affects results. It is fair to say that calibration of ABC algorithms is often not trivial. A more recent plug-and-play methoology is given by synthetic likelihoods (SL) (wood2010statistical). SL requires the specification of data summaries, but no threshold parameter is introduced and the weighting of the summaries is automatically handled, thus the method is very easy to implement. However, while ABC sets no assumptions on said summaries, SL assumes a multivariate Gaussian distribution: hence, SL is less general than ABC and as discussed in price2016bayesian and in section 5.3, significant departures from the assumed Gaussianity can have a negative impact on inference results.

In this work we consider the idea underlying the synthetic likelihood approach, and embed this into the stochastic approximation (SAEM) algorithm of Delyon1999, for maximum likelihood inference. The resulting SAEM-SL algorithm is a likelihood-free version of SAEM which is easy to code, requires minimal tuning and appeals a general class of incomplete-data models, either “static” (time-independent) and dynamic models. Since two of our simulation studies use state-space models, our notation introduces quantities that are time-indexed, however we emphasize that the methodology is suited for dynamic models that are not SSM and also for static models, see the example in the Supplementary Material.

State-space models (SSM, cappe2005inference), are used in many fields, such as biology, chemistry, ecology, signal processing etc. We now introduce some notation. Consider a stochastic process , , which is observed at discrete sampling times with , and we denote with the corresponding observations (data) from collected at said time points, where for . Consider also a latent (unobservable) continuous-time stochastic process , . Process is assumed Markovian with transition densities , . Denote with the unobserved values for at times and set , where is the (random or fixed) initial state for at time . Both processes and depend on their own (assumed unknown) vector-parameters and respectively. We think at as a measurement-error-corrupted version of and assume that observations for are conditionally independent given . The SSM can be summarised as

 {Yj∼p(yj|Xj,θy),j=1,...,nXt∼p(xt|xs,θx),t0≤s

Typically is a known density (or probability mass) function. Regarding the transition density , this is typically unknown except for very simple toy models.

Goal of our work is to estimate the parameters by maximum likelihood using data . For ease of notation we refer to the vector as the object of our inference. As previously remarked, the SAEM-SL methodology we introduce does not require data generated from a SSM, hence conditional independence of observations and Markovianity of are not necessary for SAEM-SL to work.

The well-known EM algorithm (Dempster1977) is suitable for maximum likelihood estimation for incomplete-data models. EM computes the conditional expectation of the complete-likelihood for the pair and then produces a (local) maximizer for the data likelihood function based on observations . One of the difficulties with EM is to compute the conditional expectation of the state given the observations . This conditional expectation can be computed exactly with the Kalman filter when the state-space is linear and Gaussian (cappe2005inference), and otherwise it has to be approximated. In this work we focus on a stochastic approximation of the EM algorithm, namely the Stochastic Approximation EM (SAEM) (Delyon1999). The problem with implementing SAEM is at least two-fold: (i) it is necessary to generate an appropriate “proposal” for the state , conditionally on the current value of . Sequential Monte Carlo (SMC) algorithms (doucet2001sequential) can provide such state proposal, and have already been coupled to stochastic EM algorithms (see e.g. HuysPaninski2009; Lindsten2013; Ditlevsen2014 and references therein). However (ii) a second and perhaps more serious difficulty is that in order to use SAEM the complete likelihood of based on the joint distribution of must be tractable. With “tractable” we mean that the model at hand has a complete likelihood that it is possible to write in closed-form, and that additionally it is possible to derive analytically essential quantities, such as the corresponding sufficient statistics: this is because the convergence of SAEM to the maximizer of the data likelihood is ensured only for observations belonging to the exponential family. These requirements are usually very difficult to satisfy, or result impossible for most realistic models. Even when these can be satisfied, the required analytic work is at best a tedious, difficult and error-prone task. Also, such difficulties force the modeller to formulate oversimplified, tractable models so that SAEM can be implemented. However realistic models call for more complex formulations which are usually not amenable to closed form analytic computations.

## 2 The complete likelihood and stochastic approximation EM

Recall that denotes the available data collected at times and denote with the corresponding unobserved states. We additionally set for the vector including an initial (fixed or random) state , that is is generated as . When the transition densities between sampling times are available in closed form (), the “data likelihood” function for (sometimes denoted “incomplete data likelihood”) can be written as

 p(Y1:n;θ) =∫p(X0){n∏j=1p(Yj|Xj;θ)p(Xj|Xj−1;θ)}dX0⋯dXn (2)

where we have assumed a random initial state with density . Here is the “complete data likelihood”, the conditional density of and the joint density of . The last equality in (2) exploits the notion of conditional independence of observations given latent states and the Markovian property of . In general the likelihood (2) is not explicitly known either because the integral is multidimensional or because expressions for transition densities are typically not available. In addition, when an exact simulator for the solution of the dynamical process associated with the Markov process is unavailable, hence it is not possible to sample from , numerical discretisation methods are required, see the example in section 5.2. Without loss of generality, say that we have equispaced sampling times such that , with . Now introduce a discretisation for the interval given by where and . We take , and therefore for . We denote with the number of elements in the discretisation and with the corresponding values of obtained when using a given numerical/approximated method of choice. Then the likelihood function becomes

 p(Y1:n;θ) =∫{n∏j=1p(Yj|Xj;θ)}p(X0)N∏i=1p(Xi|Xi−1;θ)dX0⋯dXN,

where the product having index is over the ’s and the product having index is over the ’s.

### 2.1 The standard SAEM algorithm

Let us briefly cover the EM principle (Dempster1977). The complete data of the model is , where if numerical discretisation is not required, and for ease of writing we denote this as for the remaining of this section. The EM algorithm maximizes the function in two steps, where is the log-likelihood of the complete data and is the conditional expectation under the conditional distribution . More explicitly, by denoting with the parameter estimate obtained at iteration of EM, at th iteration of EM the E-step computes . The M-step computes . The resulting sequence converges to a stationary point of the data likelihood , under weak assumptions. In most cases the E-step is difficult to perform, while the M-step can be considered relatively straightforward, meaning that standard optimization procedures for the M-step can be implemented, or closed form solutions are possible.

Important strategies for dealing with an intractable E-step are MCEM (wei1990monte) and SAEM (Delyon1999), see also Lindsten2013 for a synthetic review. In SAEM the integral in is approximated using a stochastic procedure. SAEM is proved to converge under general conditions if belongs to the regular exponential family

 Lc(Y,X;θ)=−Λ(θ)+⟨Sc(Y,X),Γ(θ)⟩, (3)

where is the scalar product, and are two functions of and is the minimal sufficient statistic of the complete model. The E-step is then divided into a simulation step (S-step) of the missing data under the conditional distribution and a stochastic approximation step (SA-step) of the conditional expectation, using a sequence of real numbers in , such that and . This SA-step approximates at each iteration by the value defined recursively as follows

 sk=sk−1+γk(Sc(Y,X(k))−sk−1).

The M-step is thus the update of the estimates

 ^θ(k)=argmaxθ∈Θ(−Λ(θ)+⟨sk,Γ(θ)⟩). (4)

A schematic description of the SAEM procedure (coupled with a bootstrap filter) is in algorithm 1, see also picchini2016coupling. Moreover, when it is possible to parametrize the complete loglikelihood in terms of as in (3), then it is sometimes possible to determine the in (4) explicitly (see sections 5.15.2), and this has an obvious computational advantage.

Usually, the simulation step of the hidden trajectory conditionally to the observations cannot be performed directly. A standard possibility is to use “particles” from sequential Monte Carlo filters, such as the bootstrap filter (gordon1993novel), see algorithm 2.

The quantity ESS in algorithm 2 is the effective sample size (e.g. liu2008monte) often estimated as and taking values between 1 and , while is a threshold value that “activates” the resampling step, see Cappe2007 for an introduction to particle filters. In addition to the procedure outlined in algorithm 2, once the set of normalised weights is available at the end of the bootstrap filter, we sample a single index from the set having associated probabilities . Denote with such index and with the “ancestor” of the generic th particle sampled at time , with (, ). Then we have that particle has ancestor and in general particle at time has ancestor , with . Hence, at the end of algorithm 2 we can sample and construct its genealogy (see also andrieu2010particle): the sequence of states resulting from the genealogy of is the chosen path that will be passed to SAEM in algorithm 1.

However, as explained in the Introduction and self-evident in the application in section 5.2, constructing the SAEM machinery is a challenging task for most realistic models as typically the sufficient statistics for the complete loglikelihood need to be available, for computational efficiency. Moreover, for state-space models it is necessary to know the expression of the transition densities, to construct the complete loglikelihood. For most stochastic nonlinear models, transition densities are typically unavailable in closed form. Finally, even when SAEM is implemented for state-space models, as highlighted in picchini2016coupling the particles selected from the bootstrap filter might result in a poor estimation when the resampling step is frequently triggered (see picchini2016coupling for solutions).

In section 3 we propose a new, likelihood-free version of SAEM, that is not restricted to dynamic models. But first, it is necessary to introduce the synthetic likelihoods methodology, due to wood2010statistical.

## 3 Synthetic likelihoods

Same as for approximate Bayesian computation (ABC) algorithms, synthetic likelihoods (wood2010statistical) is an “information reduction strategy” that constructs inference based on a set of ad-hoc summaries of the data , rather than use the full dataset directly. These summaries are defined by the analyst and have nothing to do with the complete sufficient summaries in (3). The synthetic likelihoods methodology assumes the data summaries to be jointly multivariate Gaussian as , with unknown mean and unknown covariance matrix . Instead, ABC does not make any parametric assumption on the summaries. Notation-wise we make explicit the dependence of the mean and covariance on , as later on it will be important to highlight this fact when estimating (e.g. in equation (11)).

Estimators for and are found by simulating datasets independently from the assumed data-generating model, conditionally on some . We denote the artificial datasets simulated from model (1) with . These are such that , . For each dataset wood2010statistical constructs the corresponding (vector valued) summary , with . Then he computes the following estimators:

 ^μθ=1RR∑r=1S∗r,^Σθ=1R−1R∑r=1(S∗r−^μθ)(S∗r−^μθ)′.

A “synthetic likelihood” based on the summaries for the observed data is defined as . It is then possible to numerically maximize with respect to or compute the MAP (maximum a posteriory) for the associated posterior distribution using MCMC, by using uniform priors for the parameters. In order to construct synthetic likelihoods the only parameter that needs to be set is (we consider the statistics as part of the model specification).

## 4 SAEM with synthetic likelihoods

We now use synthetic likelihoods (SL) to develop a likelihood-free version of SAEM. The main consequences of our approach are (i) sufficient statistics for the complete (synthetic) likelihood are immediately available, via simulation; (ii) we allow the SAEM optimizer to be implemented for complex/intractable models and (iii) the algorithm does not require advanced tuning. With specific reference to existing synthetic likelihoods approaches, with SAEM-SL the user does not need to set-up an MCMC implementation, as instead required in wood2010statistical and price2016bayesian and this usually comes with a need for expert tuning, as discussed in the introduction. A disadvantage of SAEM-SL is that uncertainty quantification is not provided. Denote with and user-defined summary statistics for and respectively. Again, these are meant to encode information regarding . Define and assume the complete likelihood for to be a multivariate Gaussian with mean and covariance . That is for the corresponding “complete synthetic log-likelihood” evaluated at we set

 Lc(S;θ):=Lc(S(Y),S(X);θ)=logN(S;μθ,Σθ). (5)

Of course and are in general unknown. Also, here and are not the same moments defined for the data likelihood in section 3, as the latter is based solely on .

Here we illustrate an instance of SL for the current , this returning estimators and . We call this procedure “internal SAEM-SL” to be distinguished from an “external” procedure described later. Crucially, thanks to the Gaussian assumption set on the user’s summaries it is known that is jointly sufficient for . Hence we are allowed to set the following equality for the complete sufficient statistics without the need to perform analytic calculations. Then we plug the obtained moment estimates into the “external SAEM-SL”. While below we describe the several steps of our approach, the complete procedure is illustrated in algorithm 3.

### Internal SAEM-SL

Assume a value for is given.

1. Simulate independently from the model realizations of processes and : and , .

2. compute user-defined summaries for each .

3. estimate moments (sufficient statistics for )

 ^μθ=1RR∑r=1S∗r,^Σθ=1R−1R∑r=1(S∗r−^μθ)(S∗r−^μθ)′. (6)

### External SAEM-SL

A generic iteration of SAEM is executed using the estimators from (6). At iteration we update separately the moments for the complete loglikelihood as

 ^μ(k)θ =^μ(k−1)θ+γ(k)(^μθ−^μ(k−1)θ) (7) ^Σ(k)θ =^Σ(k−1)θ+γ(k)(^Σθ−^Σ(k−1)θ). (8)

From the quantities computed in (7)-(8) extract the corresponding mean and covariances for the two simulated processes, that is set and

 ^Σ(k)≡^Σ(k)θ=[^Σx^Σxy^Σyx^Σy].

We now sample conditionally on by using well known properties of Gaussian distributions: we have where (here we drop the index and subscript for ease of reading)

 ^μx|y =^μx+^Σxy^Σ−1y(S(Y)−^μy) (9) ^Σx|y =^Σx−^Σxy^Σ−1y^Σyx. (10)

Some care should be used with the covariance matrix when sampling , as such covariance must be positive semi-definite. In fact is extracted from , however while it is known that a linear combination (via (8)) of semi-positive definite matrices is a semi-positive definite matrix and while the sample covariance created in the Internal SAEM-SL is by definition semi-positive definite, in numerical calculations it can still happen that the resulting matrix has negative eigenvalues due to round-off errors in floating point approximations. Therefore, before using in our conditional sampling, we first check whether this is a positive definite matrix. If it turns out to be positive definite, by using the Cholesky decomposition of , then we proceed with the sampling, that is we obtain the lower triangular matrix such that and then sample using , where is a vector of independent draws from the standard normal distribution. For those rare instances where it is not positive definite (and not even semi-positive definite) it is possible to compute a “nearest semi-positive definite matrix” (e.g. higham1988computing) and use this one for the sampling.

With the that has been sampled, set and compute the M-step

 ^θ(k)=argmaxθ∈Θ^Lc(S(k);θ)=argmaxθ∈ΘlogN(S(k);μθ,Σθ) (11)

where maximization is obtained numerically, for example using iterations of a Nelder–Mead simplex. Each iteration of the maximizer used for (11) tests a different value of by invoking the Internal-SL procedure, hence each call evaluates the complete synthetic loglikelihood using a different set of simulated moments produced using the synthetic likelihoods approach. At the end of the M-step (11), besides we also retrieve the corresponding “optimal moments” . Optimal moments are passed to (7)-(8) for a further iteration of the External SAEM-SL procedure. Algorithm 3 details a single iteration of the SAEM-SL procedure, which should be executed for iterations, with quantities having denoting input/starting values. The generality of the algorithm implies that to implement all our case studies we did not need to produce significant changes to our test code.

We initialize algorithm 3 by setting and to a vector of zeros and to a diagonal matrix with positive entries respectively, with and the -dimensional identity matrix with the length of vector . Notice that each time a numeric maximizer evaluates (11) for the current candidate parameters the vector does not vary within the Internal SL: contains both the observed summaries and the summaries for the latent state , which should not be altered when (11) is executed. Also notice that while in step 2 of the Internal-SL procedure the quantity is computed from the user defined set of summaries, the that is plugged into is instead sampled from a multivariate Gaussian distribution.

For the sake of discussion, here we illustrate an ideal scenario which in practice cannot be attained for most realistic models, namely assuming that (a) the user defined sumaries are jointly sufficient statistics for , and that (b) is distributed according to a multivariate Gaussian, though (b) is much easier to obtain than (a). Then under (a)–(b) SAEM-SL does not result in any approximation and converges to a (local) maximizer of the data likelihood function under the same assumptions set for SAEM in Delyon1999. In fact, if is sufficient for then it encodes the same amount of information regarding as the couple , hence . Then, under the additional Gaussian assumption, we have . Therefore, since the synthetic complete loglikelihood (5) is a member of the exponential family and can thus be written as (3), the two assumptions for the “ideal” scenario fit within the SAEM approach in section 2.1. Even if the two assumptions (a)–(b) are met, deviations from what is expected from the theory is due to the non-availability of an explicit M-step, as with SAEM-SL (11) has to be solved numerically. Hence, for a finite computational budget we might not really obtain the exact maximizer from the M-step.

The advantages of the proposed method, which we call SAEM-SL (SAEM using synthetic likelihoods) are that (i) unlike the “standard” SAEM, SAEM-SL is completely plug-and-play, only the ability to simulate from the model is required; (ii) while SAEM has been (perhaps exclusively?) applied to dynamic models since SMC methods are available to simulate , SAEM-SL is easily applicable also to static models. The disadvantage with SAEM-SL is the requirement to specify a set of summaries and that for each iteration of SAEM-SL the maximization of the loglikelihood (11) consists of an iterative procedure. On the other hand SAEM-SL considerably expands the set of problems that is possible to treat with SAEM. The standard SAEM itself is unable to deal with complex models, unless it is possible to derive the necessary constructs (sufficient statistics for the complete likelihood and corresponding updating equations for the M-step), which is often a difficult and tedious task. If the model has an intractable complete likelihood, the task is actually impossible.

## 5 Simulation studies

Simulations were coded in MATLAB (except for examples using the R pomp package) and executed on a Intel Core i7-4790 CPU 3.60 GhZ. In SAEM we always set for the first iterations and for as in lavielle2014mixed. However, we found that small modifications to this setup do not affect results significantly, that is using for and some is also valid. The numerical maximization of (11) is performed using the Nelder-Mead simplex as implemented in the Matlab function fminsearch. We compare our results with state-of-art algorithms for Bayesian and “classical” inference. MATLAB code is available at https://github.com/umbertopicchini/SAEM-SL.

### 5.1 Non-linear Gaussian state-space model

Here we study a simple non-linear model, useful to introduce the methods. We use a setup similar to jasra2012filtering. See also picchini2016coupling for inference using algorithm 1 as well as SAEM coupled with an ABC filter. We have

 {Yj=Xj+σyνj,j≥1Xj=2sin(eXj−1)+σxτj, (12)

with i.i.d. and . We assume as the only unknowns and therefore conduct inference for . We first consider the standard SAEM methodology outlined in section 2.1, and therefore construct the set of sufficient statistics corresponding to the complete log-likelihood . For this model the task is simple since and and it is easy to show that and are sufficient for and respectively. By plugging these statistics into and equating to zero the gradient of with respect to , we find that the M-step of SAEM results in updated values for and given by and respectively. In the following, we write SAEM-SMC to refer to Algorithm 1.

We generate observations for using model (12) with . Our setup consists in running 30 independent experiments with SAEM-SMC: for each experiment we simulate parameter starting values for independently generated from a bivariate Gaussian distribution with mean the true value of the parameter, i.e. , and diagonal covariance matrix having (2,2) on its diagonal. Hence the starting values are very spread. We take as the number of warmup iterations (see beginning of section 5) and use different numbers of particles in our simulation studies, see Table 1. We impose resampling when the effective sample size ESS gets smaller than , for any value of . In summary, for all 30 simulations we use the same data and the same setup except that in each simulation we use different starting values for the parameters. Table 1 reports the median of the 30 estimates and their quartiles. Simulations for converge to completely wrong values. We also experimented with using but this does not solve the problem with SAEM-SMC, even if we let the algorithm start at the true parameter values. However, in picchini2016coupling we learned that SAEM-SMC (this one using the bootstrap filter) is affected by “particles impoverishment” degrading the quality of the inference, and therefore it is better to set a very low : in fact, when using with results improve sensibly, see Table 1, though estimation of is still unsatisfactory. See picchini2016coupling for further insight on the problem.

We now compare the results above with the iterated filtering IF2 (ionides2015inference) using the R package pomp. We do not provide a detailed description of IF2 here: it suffices to say that in IF2 particles are generated for both (e.g. via perturbations using random walks) and for the systems state (using the bootstrap filter). Moreover a “temperature” parameter (to use an analogy with the simulated annealing optimization method) is let decrease until the algorithm “freezes” around an approximated MLE. This parameter that here we denote with is let decrease in where the first value is used for the first 500 iterations of IF2, then each of the remaining values is used for 100 iterations, for a total of 900 iterations. Notice that the tested version of pomp (v. 1.4.1.1) uses a bootstrap filter that resamples at each time point, and therefore results obtained with IF2 are not directly comparable with SAEM-SMC, hence the asterisk in Table 1. The output from one of the experiments obtained with is in Figure 1. From Figure 1 we notice that the last major improvement for the loglikelihood maximization takes place at iteration 600 when becomes , and reducing further does not give any significant benefit (we have verified this in a number of experiments with this model), therefore we are confident about our setup. With IF2 the estimation of is much improved compared to SAEM-SMC, however inference for is more biased than with SAEM-SMC.

We now consider a particle marginal method (PMM, andrieu2009pseudo) on a single simulation (instead of thirty), as PMM is a full Bayesian methodology and results are not directly comparable with SAEM nor IF2. Once more we make use of tools provided in pomp. We set wide uniform priors for both and and use particles. Also, we set the algorithm in the most favourable way, by starting it at the true parameter values (here we are only interested in using PMM to obtain exact Bayesian inference, not as a competitor to the other frequentist approaches we have illustrated). Parameters are proposed using an adaptive MCMC algorithm, and the algorithm is tuned to achieve the optimal 7% acceptance rate (sherlock2015efficiency). We obtained the following posterior means and 95% intervals: [0.49,2.46], [0.49,2.40]. Therefore, PMM seems to return values not very different from the ranges provided by IF2.

Finally, we consider inference with SAEM-SL. We performed simulations using , 1,000 and 2,000 simulated summaries and iterations for the numerical maximization step. We used the same data as for SAEM-SMC and IF2, however we decide to make the estimation procedure more challenging, so we let the parameter start at random locations sampled from a Gaussian centred at and having diagonal covariance with variances . Here we need to set a vector of summaries . Vector contains (i) the median value of ; (ii) the median absolute deviation of and (iii) the 10th, 20th, 75th and 90th percentile of . Vector contains the same summary functions, except that these are applied to . Of course summary functions for observed data are the same functions considered for except that now they are evaluated at . Same as before we consider thirty repetitions of our experiment: for each experiment we run a warmup of iterations and a total number of SAEM-SL iterations. Results are in Table 1 and trace plots for the case are in Figure 2. As from Figure 2 we notice that those parameters initialized at much higher values than the true parameter values decay rapidly to approach the true values. As shown in Table 1, the majority of them converges to reasonable values. SAEM-SL produces excellent inference for all tested values of , and convergence is very rapid, well within 10 iterations, corresponding to about 10 seconds on a computer desktop when .

For one of the thirty repetitions, Figure 3 shows the normal qq-plots for the twelve chosen summary statistics (the six statistics in and the six in ) for the case , generated at the optimum returned by SAEM-SL. Clearly there are no major departures from normality. Interestingly, we reach the same conclusion for the case (plots not reported).

### 5.2 A pharmacokinetics model

Here we consider a model for pharmacokinetics dynamics. For example, we may imagine to study the Theophylline drug pharmacokinetics, e.g. pinheiro1995approximations. It will be evident that in order to apply a standard SAEM it is required some preliminary analytic effort from the modeller. We denote with the level of Theophylline drug concentration in blood at time (hrs). Consider the following non-authonomous stochastic differential equation (SDE):

 dXt=(Dose⋅Ka⋅KeCle−Kat−KeXt)dt+σ√XtdWt,t≥t0 (13)

where is the known drug oral dose received by a subject, is the elimination rate constant, the absorption rate constant, the clearance of the drug and the intensity of intrinsic stochastic noise. We simulate data measured at equispaced sampling times where . The drug oral dose is chosen to be 4 mg. After the drug is administered, we consider as the time when the concentration first reaches . The error model is assumed to be linear, where the are i.i.d., . Inference is based on data collected at corresponding sampling times. Parameter is assumed known as it is not possible to determine the sufficient statistic for analytically, hence parameters of interest are as is also assumed known.

Equation (13) has no available closed-form solution, hence simulated data are created in the following way. We first simulate numerically a solution to (13) using the Euler–Maruyama discretization with stepsize on the time interval . The Euler-Maruyama scheme is defined as

 Xt+h=Xt+(Dose⋅Ka⋅KeCle−Kat−KeXt)h+σ√XtZt+h,

where the are i.i.d. distributed. The grid of generated values is then linearly interpolated at sampling times to give , and finally residual error is added to according to the error model as explained above. Data are conditionally independent given the latent process and are generated with