Bandwidth Selection in Pre-Smoothed Particle Filters
Abstract
For the purpose of maximum likelihood estimation of static parameters, we apply a kernel smoother to the particles in the standard SIR filter for nonlinear state space models with additive Gaussian observation noise. This reduces the Monte Carlo error in the estimates of both the posterior density of the states and the marginal density of the observation at each time point. We correct for variance inflation in the smoother, which, together with the use of Gaussian kernels, results in a Gaussian (Kalman) update when the amount of smoothing tends to infinity. We propose and study a criterion for choosing the optimal bandwidth in the kernel smoother. Finally, we illustrate our approach using examples from econometrics. Our filter is shown to be highly suited for dynamic models with a high signal-to-noise ratio, for which the SIR filter has problems.
Keywords: adaptive bandwidth selection; kernel smoothing; likelihood estimation; particle filter; state space model; variance inflation
1 Introduction
State space models are commonly used to represent dynamical systems in a wide range of scientific fields. For linear and Gaussian state space models, the Kalman filter can be used to sequentially obtain the posterior mean and covariance of the current state vector, as well as the likelihood function required for estimation of model parameters. Gaussian mixture filters (Alspach and Sorenson, 1972) were among the first attempts to account for non-normality in the posterior, resulting from nonlinearity either in the state equation or in the observation equation. Later, sequential Monte Carlo (MC) based filtering methods, collectively known as particle filters, were introduced (Gordon et al., 1993; Kitagawa, 1996; Liu and Chen, 1998). The particle filter has the prospect of providing a sampling-based consistent estimate of the posterior distribution, but in many cases the sample size (number of particles) required to bring the MC error within tolerable bounds is prohibitively large. Consequently, there is now a large literature on improving the baseline particle filtering algorithms to work for a moderate number of particles. These include Pitt and Shephard (1999), various methods proposed in the chapters of Doucet et al. (2001) and, more recently, Polson et al. (2008), Chorin and Tu (2009) and Chopin et al. (2013).
Recently, a renewed interest in the use of particle filters for computing the marginal likelihood (integrating over state variables) for the purpose of parameter estimation has emerged (Fernandez-Villaverde and Rubio-Ramirez, 2007; Andrieu et al., 2010; Kantas et al., 2009; Malik and Pitt, 2011; DeJong et al., 2013). This is also the context of the present paper. Similarly to Malik and Pitt (2011) and DeJong et al. (2013), we obtain a likelihood approximation which is continuous in the parameters, hence facilitating numerical optimization. We target situations with highly nonlinear state evolution, high signal-to-noise ratios, and a low-to-moderate dimensional state vector, for which adaptation is difficult.
Throughout, we assume that the measurement model is linear and Gaussian, which at first glance may appear restrictive. However, nonlinear measurement equations with additive Gaussian noise can also be handled by a simple augmentation of the state vector, as shown in Section 3.1 below.
Let x_t and y_t denote the state vector and observation vector, respectively, at time t, and define y_{1:t} = (y_1, ..., y_t). In particle or ensemble methods the predictive density p(x_t | y_{1:t-1}) is represented by a random sample. We use a kernel smoother which can be updated analytically against a linear Gaussian measurement model p(y_t | x_t). From the resulting mixture approximation of the posterior p(x_t | y_{1:t}) we draw a uniformly weighted sample of particles, which, after a parallel run through the state equations, constitutes the approximation of the next predictive distribution p(x_{t+1} | y_{1:t}). The resulting filter, which we call the Pre-Smoothed Particle Filter (PSPF), is a special case of the pre-regularized particle filter (Le Gland et al., 1998; Hürzeler and Künsch, 1998).
The main contribution of the present paper is to determine the optimal amount of smoothing in each updating step of the PSPF. This is done adaptively, i.e. for each time point an optimal bandwidth parameter is sought. For small bandwidths the PSPF approaches the SIR filter, i.e. it has low bias but high variance. Further, we correct for variance inflation (Jones, 1991), and hence when the bandwidth tends to infinity the kernel estimate reduces to a Gaussian density with mean and covariance calculated from the ensemble representation of p(x_t | y_{1:t-1}). At this end of the spectrum the PSPF is strongly related to the Ensemble Kalman Filter (Stordal et al., 2011), which has low MC variance but high bias.
The rest of this paper is laid out as follows. Section 2 introduces notation and explains challenges related to particle filtering. Section 3 explains the presmoothed update and provides a method for automatic bandwidth selection. Section 4 introduces the PSPF, and also compares the PSPF to other particle filters using simulation experiments. Finally, Section 5 outlines two realistic applications, and Section 6 provides a discussion.
2 Model, Notation and Background
2.1 Model and notation
We consider a generic state space model consisting of a state transition equation and an observation equation, with the former given by

(1)  x_t = g_t(x_{t-1}, u_t),

where g_t is the state transition function.
The random disturbance term u_t, which can account for incomplete model specification, may be either absent, of fixed dimension, or of infinite dimension in the case that the state dynamics are governed by a stochastic differential equation. Under the assumption that the u_t s are independent, (1) describes a Markov process, with transition probability density denoted by p(x_t | x_{t-1}). Given the realization of u_t, evaluation of g_t typically amounts to solving a differential equation. The system (1) is initialized by drawing x_0 from a distribution with density p(x_0). It is assumed that g_t and p(x_0) are sufficiently regular to ensure that p(x_t | y_{1:t-1}) is continuous, and thereby that all involved conditional distributions can be estimated consistently using kernel density estimators.
The observation equation is

(2)  y_t = F_t x_t + e_t,   e_t ~ N(0, R_t),

where R_t is nondegenerate and the matrix F_t is independent of the state, but both may vary nonstochastically with time t. Moreover, we use the notation x_{1:t} = (x_1, ..., x_t) and y_{1:t} = (y_1, ..., y_t). N(x; μ, Σ) denotes the multivariate Gaussian probability density function with mean μ and covariance Σ, evaluated at x, and I_d denotes the d-dimensional identity matrix. Finally, we indicate which stochastic variable an expectation or variance is taken over using subscripts (e.g. E_x when expectation is taken over the variable x).
2.2 The SIR filter and sample impoverishment
This section introduces particle filters and the Sampling Importance Resampling (SIR) filter of Gordon et al. (1993), which is obtained as the limit of the PSPF as the amount of smoothing tends to zero. Any particle filtering approach relies on alternating between two steps: prediction (p), in which p(x_t | y_{1:t-1}) is represented by a random sample {x_{t,i}^p}_{i=1}^N, and filtering (f), in which p(x_t | y_{1:t}) similarly is approximated by {x_{t,i}^f}_{i=1}^N. These random samples of size N are referred to as predictive and filter swarms, respectively, and are updated iteratively from each other. The prediction step, used in both SIR and PSPF, consists of x_{t,i}^p = g_t(x_{t-1,i}^f, u_{t,i}), where the u_{t,i} are independent random draws from the distribution of u_t. In the filtering step Bayes formula is invoked:
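In code, the shared prediction step amounts to pushing the filter swarm through the state equation with fresh disturbances. A minimal sketch in Python (a scalar state with an additive Gaussian disturbance is assumed purely for illustration; the function name is ours):

```python
import numpy as np

def predict(x_filt, g, u_sd, rng):
    """Prediction step shared by SIR and PSPF: propagate each filter particle
    through the state transition function with an independent disturbance."""
    u = u_sd * rng.standard_normal(len(x_filt))  # independent draws of u_t
    return g(x_filt) + u                         # predictive swarm
```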
(3)  p(x_t | y_{1:t}) = p(y_t | x_t) p(x_t | y_{1:t-1}) / p(y_t | y_{1:t-1}).
The SIR filter approximates (3) by performing a SIR update (Rubin, 1987), representing p(x_t | y_{1:t}) as a weighted sample with locations x_{t,i}^p and corresponding weights

(4)  w_{t,i} = c_t^{-1} N(y_t; F_t x_{t,i}^p, R_t),
where c_t is a normalizing constant. Obtaining a uniformly weighted sample to complete the next time step's prediction is simply a matter of drawing multinomially from {x_{t,i}^p} with weights (4). A byproduct of the SIR filter is that the marginal likelihood of y_{1:T} needed for parameter estimation can be approximated as
(5)  p̂(y_{1:T}) = Π_{t=1}^T [ (1/N) Σ_{i=1}^N N(y_t; F_t x_{t,i}^p, R_t) ],
for large N (see e.g. Del Moral (2004, Proposition 7.4.1)).
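For a linear Gaussian measurement, one SIR filtering step with the weights (4), together with one factor of the likelihood estimate (5), can be sketched as follows (scalar state and observation assumed for brevity; the function names are ours, not the paper's):

```python
import numpy as np

def sir_update(x_pred, y, F, R, rng):
    """One SIR filtering step: Gaussian weights (4), the likelihood factor of
    (5) as the mean unnormalized weight, and multinomial resampling."""
    w_un = np.exp(-0.5 * (y - F * x_pred) ** 2 / R) / np.sqrt(2 * np.pi * R)
    like_factor = w_un.mean()                 # estimate of p(y_t | y_{1:t-1})
    w = w_un / w_un.sum()                     # normalized weights (4)
    idx = rng.choice(len(x_pred), size=len(x_pred), p=w)  # multinomial draw
    return x_pred[idx], like_factor
```

With a standard Gaussian predictive sample, the likelihood factor can be checked against the exact marginal density N(y; 0, F^2 + R).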
Sample impoverishment in the SIR filter occurs when, at time step t, the predictive particle swarm and the data likelihood are poorly aligned (see e.g. Pitt and Shephard (1999)). The multinomial probabilities (4) then become very unevenly distributed, and the multinomial sampling will yield many repeated particles. Over time the swarm will degenerate in the sense that all particles can be traced back to a single particle in the initial swarm. Sample impoverishment also increases the MC error of the likelihood estimator (5). This is likely to occur during numerical optimization of the likelihood, when the optimization algorithm tries an infeasible parameter value rendering the particle swarm and the data likelihood incompatible. The effect is amplified by a high signal-to-noise ratio in the system. Numerous strategies have been proposed for aligning (adapting) the predictive swarm to the coming observation (see e.g. Cappe et al. (2007) for an overview), but these typically rely on evaluation of p(x_t | x_{t-1}) (or some of its characteristics), which may be costly. The PSPF, on the other hand, avoids evaluation of p(x_t | x_{t-1}), and relies only on the ability to simulate (1) efficiently.
3 The PreSmoothing Update
In this section we consider the pre-smoothing (PS) update as an alternative to Rubin (1987)'s SIR update when the observation equation is linear in the state and additively Gaussian. Focusing on a single updating step, we drop the time index t in our notation. In a general perspective, the problem we address is the Bayesian updating problem of evaluating the posterior density p(x | y) and the marginal density p(y) when the prior is represented by a random sample. In particular, we focus on optimal selection of the smoothing parameter in the PS update, with the overarching objective of producing accurate estimates of p(y). In Section 4 we return to the filter setting.
3.1 The updating problem
Consider the evaluation of the posterior p(x | y) and marginal p(y) for the model

(6)  x ~ π(x),

(7)  y = F x + e,   e ~ N(0, R),

in a setting where π is an unknown prior density, while F and R are given matrices. The available information about π is a random sample x^(1), ..., x^(N) drawn from π. Our aim is to estimate both p(x | y) and p(y) for a given y. We denote by x̄ and Σ̂ the empirical mean and covariance matrix of the sample x^(1), ..., x^(N), respectively.
Consider the shrunk kernel estimate (Jones, 1991; West, 1993)
(8)  π̂(x) = (1/N) Σ_{i=1}^N N(x; m_i, Σ_λ),

(9)  m_i = x̄ + √λ (x^(i) − x̄),

(10)  Σ_λ = (1 − λ) Σ̂,
where the smoothing parameter λ ∈ (0, 1] governs the proportions of the total variance under π̂ stemming from inter-kernel variance (λ) and intra-kernel variance (1 − λ) in the Gaussian mixture (8). The replacement of the more conventional bandwidth parameter h by λ simplifies certain expressions in what follows. The estimator (8) avoids the "variance inflation" to which the standard kernel estimator (Silverman, 1986) is prone, as it is easily verified that the mean and variance under π̂ are x̄ and Σ̂, respectively. For λ close to 1 (h close to 0), π̂ behaves as a standard kernel estimator, with equally weighted point masses located at the x^(i) as the limit. For λ → 0 (h → ∞) a Gaussian limit is obtained, i.e. π̂(x) → N(x; x̄, Σ̂).
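As a check on the variance-inflation correction, the shrunk kernel components can be constructed and their first two moments verified numerically. This one-dimensional sketch assumes the parameterization above (locations shrunk toward the sample mean by √λ, common kernel variance (1 − λ) times the sample variance):

```python
import numpy as np

def shrunk_kernel(x, lam):
    """Component locations (9) and common kernel variance (10) of the shrunk
    kernel estimate (8).  The resulting mixture reproduces the sample mean
    and variance exactly for any lam in (0, 1]."""
    xbar, Sigma = x.mean(), x.var()        # empirical mean and variance
    m = xbar + np.sqrt(lam) * (x - xbar)   # shrunk locations
    comp_var = (1.0 - lam) * Sigma         # common kernel variance
    return m, comp_var
```

The mixture mean is mean(m) and the mixture variance is comp_var + var(m), which equal the sample mean and variance by construction, i.e. no variance inflation.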
By substituting π̂ for π in the Bayes rule (3), we obtain the PS estimators

(11)  p̂(y) = (1/N) Σ_{i=1}^N N(y; F m_i, S_λ),

(12)  π̂(x | y) = Σ_{i=1}^N w_i N(x; μ_i, Λ_λ),

where

S_λ = F Σ_λ F' + R,   w_i = N(y; F m_i, S_λ) / Σ_{j=1}^N N(y; F m_j, S_λ),
μ_i = m_i + K_λ (y − F m_i),   Λ_λ = (I − K_λ F) Σ_λ,   K_λ = Σ_λ F' S_λ^{-1}.

In our notation we have omitted the dependence on λ.
Table 1: The two limiting cases of the PS update.

          Gaussian (λ → 0)          SIR (λ = 1)
Property  High bias, low variance   Low bias, high variance
As λ varies from 1 to 0 the PS update moves from a SIR update to the Gaussian update, both of which are summarized in Table 1. The mean of each posterior mixture component concurrently moves smoothly from x^(i) toward what is dictated by the likelihood, reducing the potential for sample impoverishment. In the same vein, we have w_i → 1/N as λ → 0, i.e. uniform weighting. These properties of the PS update (and the updates employed in other pre-regularized filters) differ from those of the update mechanisms employed in the post-smoothed particle filters advocated by Musso et al. (2001) and Flury and Shephard (2009), where the (one step) posterior locations and weights are unchanged relative to the SIR. However, these latter approaches do not require the Gaussian data likelihood which underlies the PS update.
The fact that π̂(x | y) is a finite Gaussian mixture (for λ > 0) has a number of practical advantages. Firstly, moments, marginal and conditional distributions of the approximate posterior are easily derived from the representation (12). Further, π̂(x | y) has continuous support, and therefore the direct copying of particles which is applied in the SIR filter is avoided in the resampling step. Sampling from (12) is trivial. Moreover, continuous (with respect to the parameters) sampling, resulting in a continuous simulated likelihood function, can be implemented.
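The component-wise conjugate update behind (11)-(12) can be sketched as follows (one-dimensional, with our own variable names; each prior component with common variance comp_var is updated by a Kalman step, and the marginal likelihood estimate is the average of the component marginals):

```python
import numpy as np

def ps_update(m, comp_var, y, F, R):
    """Update an equally weighted Gaussian mixture prior with common variance
    comp_var against y = F x + e, e ~ N(0, R).  Returns the posterior mixture
    weights, means, common variance, and the marginal likelihood estimate."""
    S = F * comp_var * F + R                 # innovation variance (same for all i)
    dens = np.exp(-0.5 * (y - F * m) ** 2 / S) / np.sqrt(2 * np.pi * S)
    p_hat = dens.mean()                      # marginal likelihood estimate (11)
    w = dens / dens.sum()                    # posterior mixture weights
    K = comp_var * F / S                     # Kalman gain
    mu = m + K * (y - F * m)                 # posterior component means
    post_var = (1.0 - K * F) * comp_var      # common posterior variance
    return w, mu, post_var, p_hat
```

When all locations coincide, the update reduces to a single exact Kalman step, which provides a convenient unit test.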
The apparently restrictive linearity assumption (6) can be relaxed by augmenting the state variable. The case of a nonlinear measurement function s(x) with additive Gaussian noise can be accommodated without any conceptual change to the framework. The measurement variance R is then split in two parts κR and (1 − κ)R, κ ∈ (0, 1), and the augmented state vector is z = (x', v')', where v = s(x) + η, η ~ N(0, κR), is an auxiliary variable introduced for convenience. For the augmented system, equations (6)-(7) take the form
where π̃ is the induced prior of z and F̃ is the matrix that selects v from z, so that y = F̃ z + ẽ with ẽ ~ N(0, (1 − κ)R). Now, κ is a tuning parameter that must be chosen jointly with λ. Estimates of p(x | y) are easily obtained as a marginal in the finite mixture representation of the posterior of z. An application of this approach is given in Section 5.2 below.
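A sketch of the augmentation step (with hypothetical names: s is the nonlinear measurement function and kappa the variance-splitting tuning parameter):

```python
import numpy as np

def augment(x, s, kappa, R, rng):
    """Augment state particles with v = s(x) + eta, eta ~ N(0, kappa*R), so that
    the measurement y = s(x) + e, e ~ N(0, R) becomes linear in z = (x, v):
    y = v + e_tilde with e_tilde ~ N(0, (1 - kappa)*R)."""
    v = s(x) + np.sqrt(kappa * R) * rng.standard_normal(len(x))
    return np.column_stack([x, v])
```

The total measurement noise variance is preserved: kappa*R enters through the auxiliary variable and (1 - kappa)*R through the remaining observation error.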
3.2 Criterion for smoothing parameter selection
A critical part of the PS update is the selection of the smoothing parameter, with the aim of obtaining both a representative posterior particle swarm and an accurate estimate of the marginal likelihood. For this purpose, Flury and Shephard (2009) argue that the integrated mean squared error (MISE) of π̂(x | y), which is commonly used in the kernel smoothing literature (Silverman, 1986), is not a suitable criterion in a particle filter setting. Nevertheless, it has been used for pre- and post-smoothed particle filters by e.g. Le Gland et al. (1998); Hürzeler and Künsch (1998); Musso et al. (2001). Instead, Flury and Shephard (2009) propose to minimize the MISE of the posterior cumulative distribution function. We propose a third criterion, namely to minimize the mean squared error (MSE) of p̂(y), which is given as
(13)  MSE(λ) = E_{x^(1:N)}[(p̂(y) − p(y))²] = (E_{x^(1:N)}[p̂(y)] − p(y))² + Var_{x^(1:N)}[p̂(y)].
This criterion has the advantage of being analytically simple, in addition to targeting minimal Monte Carlo error in the likelihood function, as explained below. Minimization of (13) gives an optimal bias-variance balance that depends on the observation y.
Switching momentarily to a dynamical system setting for the remainder of this paragraph, p̂(y) estimates p(y_t | y_{1:t-1}), and thus choosing (13) as the criterion targets directly the factors involved in the likelihood function (5). However, it should be noted in the dynamic setting that the current period's filter distribution must also be represented accurately, as it serves as an important input to the next period's likelihood evaluation. We show in Section 4.2 that using an approximation to (13) (which targets p̂(y)) also leads to competitive estimates of the filtering distribution (i.e. π̂(x | y)). This will in particular be true whenever most of the information carried in the posterior comes from the likelihood (which is typical for the class of models we consider), as the posterior is then almost proportional to the likelihood, and therefore the posterior estimator is relatively insensitive to the choice of smoothing parameter. On the other hand, assuming a concentrated observation likelihood, and in addition that F is invertible, p̂(y) will be highly sensitive to the choice of smoothing parameter, since a zeroth order approximation of p̂(y) is then proportional to the prior density estimate π̂ evaluated near F^{-1} y.
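The bias-variance trade-off in (13) is easy to probe by brute force when the prior is known. The sketch below (our own toy setup: a standard Gaussian prior, so p(y) is available in closed form, and a shrunk-kernel marginal estimator as an instance of p̂(y)) estimates MSE(λ) by repeated sampling:

```python
import numpy as np

def ps_marginal(x, lam, y, F, R):
    """Marginal likelihood estimate p_hat(y) from a shrunk kernel mixture."""
    xbar, Sigma = x.mean(), x.var()
    m = xbar + np.sqrt(lam) * (x - xbar)          # shrunk locations
    S = F * (1.0 - lam) * Sigma * F + R           # per-component marginal variance
    return np.mean(np.exp(-0.5 * (y - F * m) ** 2 / S) / np.sqrt(2 * np.pi * S))

def mc_mse(lam, y, F, R, N=200, reps=400, seed=0):
    """Monte Carlo estimate of MSE(lam) = E[(p_hat(y) - p(y))^2] for a
    known N(0, 1) prior, for which p(y) = N(y; 0, F^2 + R) exactly."""
    rng = np.random.default_rng(seed)
    p_true = np.exp(-0.5 * y ** 2 / (F * F + R)) / np.sqrt(2 * np.pi * (F * F + R))
    errs = [(ps_marginal(rng.standard_normal(N), lam, y, F, R) - p_true) ** 2
            for _ in range(reps)]
    return float(np.mean(errs))
```

For a concentrated likelihood (small R) the SIR end (lam = 1) suffers from uneven weights and a large Monte Carlo variance, so the criterion favors heavier smoothing; for a diffuse likelihood the two ends are much closer.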
3.3 Plugin and approximation
There are two obstacles to direct use of the criterion as given in (13). First, the expectation is taken over x^(1:N), which has unknown distribution π (see (7)). The same problem occurs in standard kernel estimation, and is solved by the use of a plug-in estimator (Silverman, 1986). The second problem is caused by the use of the shrunk kernel estimator, which involves the empirical (depending on x^(1:N)) quantities x̄ and Σ̂ through (9) and (10). Even if π were known, analytical evaluation of the expectation in (13) would not be possible, and we have to resort to an approximation. Jones (1991) encounters the same problem, but argues that the effect can be ignored asymptotically as N → ∞ and h → 0. However, we consider the full range λ ∈ (0, 1], so the same asymptotic arguments do not apply. Instead we attempt to approximate the expectation (13) for finite N.
We start by addressing the second problem. For the purpose of evaluating the mean and variance in (13), we replace x̄ and Σ̂ in expressions (8-11) by new random variables, μ and S, respectively, both taken to be independent of x^(1:N). The simplification (approximation) lies mostly in this independence assumption, but also in the distributional assumptions made about μ and S below. The reason that we cannot ignore the sampling variability in x̄ and Σ̂ is that the variance term in (13) would then be exactly zero for λ → 0. Hence, for small λ we would underestimate the MSE of p̂(y).
We make the following distributional choices

(14)  μ ~ N(E_π(x), N^{-1} Var_π(x)),

and

(15)  S ~ W_d(Var_π(x) / (N − 1), N − 1),

i.e. μ and S are distributed as if they were calculated from a sample of N iid vectors. Plug-in versions of (14) and (15), i.e. where E_π(x) has been replaced by x̄ and Var_π(x) by Σ̂, are used immediately below for notational convenience. Strictly speaking, these replacements take place after all moment calculations have been carried out.
After these simplifications, it is necessary to restate our criterion
(16) 
where the expectation and variance now are taken relative to x^(1:N), μ and S, which we emphasize are independent by assumption. Writing out the details of (16) we get
(17) 
with and .
The next sections outline pilot distributions and develop asymptotic approximations (in N) that will enable us to evaluate the mean and variance in (16).
Expression  Interpretation 

,  . 
where .  
.  . 
.  . 
,  . 
where and . 
Pilot distributions
For the variance term in (16), we employ for convenience a Gaussian pilot,

(18)  π_v(x) = N(x; x̄, Σ̂).
For the squared bias term in (16) a Gaussian pilot is ruled out because, as shown below, this leads asymptotically to zero bias for all λ. Instead, a two-component Gaussian mixture

(19)  π_b(x) = ρ N(x; ν_1, Γ_1) + (1 − ρ) N(x; ν_2, Γ_2)

is used. The bias pilot π_b is flexible, allows for analytical computation of moments, and may be estimated from x^(1:N) using an EM algorithm (see e.g. McLachlan and Peel, 2000, Section 2.8 for details). To minimize the computational burden we perform only a few EM iterations, and further computational savings are obtained by running the EM on a subsample of x^(1:N) when N is large.
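A bare-bones version of this pilot fit, in one dimension and with our own initialization heuristic (a median split), could look as follows; production code would instead use a library routine such as MATLAB's gmdistribution.fit:

```python
import numpy as np

def em_two_component(x, iters=5):
    """A few EM iterations for a two-component Gaussian bias pilot as in (19).
    Returns the mixture weight and the two component means and variances."""
    lo = x < np.median(x)                     # crude median-split initialization
    rho = 0.5
    mu = np.array([x[lo].mean(), x[~lo].mean()])
    var = np.array([x[lo].var(), x[~lo].var()]) + 1e-12
    for _ in range(iters):
        # E-step: component responsibilities under the current parameters
        d0 = rho * np.exp(-0.5 * (x - mu[0]) ** 2 / var[0]) / np.sqrt(var[0])
        d1 = (1 - rho) * np.exp(-0.5 * (x - mu[1]) ** 2 / var[1]) / np.sqrt(var[1])
        r = d0 / (d0 + d1)
        # M-step: weighted moment updates
        rho = r.mean()
        mu = np.array([np.average(x, weights=r), np.average(x, weights=1 - r)])
        var = np.array([np.average((x - mu[0]) ** 2, weights=r),
                        np.average((x - mu[1]) ** 2, weights=1 - r)]) + 1e-12
    return rho, mu, var
```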
Practical squared bias
Under the simplifying approximations introduced above, and in particular under the pilot density π_b, we have that
(20)  
A closed form expression for the inner expectation is given in Table 2. The expectation over S in (20) does not appear to have closed form, and we therefore employ the asymptotic (in N) mean statement of Corollary 2.2 of Iwashita and Siotani (1994) to obtain
(21) 
where the resulting expression serves as the practical approximation to the mean of p̂(y).
Note that p(y) in (13) depends on the unknown π, and hence needs to be estimated using the pilot density. Under the pilot density π_b it has the closed form expression
(22)  
Finally, the squared difference between (21) and (22) is taken as the practical squared bias term. It is easily verified that the practical squared bias also vanishes for λ → 1, as the PS estimator then reduces to the (unbiased) SIR estimator.
To underpin the claim that a non-Gaussian bias pilot is needed, we momentarily choose the parameters of π_b so that π_b coincides with the Gaussian pilot π_v, e.g. via ρ = 1, ν_1 = x̄ and Γ_1 = Σ̂. Then the two terms of the practical squared bias coincide, which shows that the practical bias would vanish for all λ if a Gaussian bias pilot were employed.
Practical variance
The variance of p̂(y), taken under the pilot density π_v, relies on the identity developed in Appendix A:
(23)  
where explicit expressions for the required terms can be found in Table 2. As in the calculations leading to the squared bias, the expectations and variance over S in (23) do not appear to have closed form expressions. Consequently, we employ the mean statement of Corollary 2.2 of Iwashita and Siotani (1994) to obtain
(24)  
The variance over S in (23) is treated using the delta rule variance statement of Corollary 2.2 of Iwashita and Siotani (1994):
(25)  
Here, the Jacobian matrix (up to a factor) has an explicit expression given in Table 2, and tr denotes the matrix trace. Combining (24) and (25) with (23) yields the practical variance approximation as
(26) 
Finally, collecting the results obtained in (21), (22) and (26) yields the practical MSE approximation
(27) 
which will be used throughout the rest of the paper.
Implementation
Given a prior sample x^(1:N), an observation y and the matrices F and R, locating an approximately optimal smoothing parameter λ* involves the following steps:

Calculate x̄ and Σ̂ from x^(1:N), and estimate π_b from (possibly a subset of) x^(1:N) using a few iterations of the EM algorithm.

Compute the pilot-based estimate of p(y) using (22).

Numerically minimize (27) with respect to λ, and return the minimizer as λ*.
Our MATLAB implementation of the smoothing parameter selection procedure uses the built-in function gmdistribution.fit for estimating π_b, and the minimization of (27) is carried out using the fminbnd optimizer. Our corresponding C++ implementation uses routines similar to those provided in Press et al. (2007), Section 16.1 (EM) and Section 10.4 (minimization).
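In Python, the bounded minimization step (the analogue of fminbnd) could use scipy.optimize.minimize_scalar; here mse_fn stands in for the practical MSE approximation (27), which we do not reproduce:

```python
from scipy.optimize import minimize_scalar

def select_lambda(mse_fn, eps=1e-6):
    """Return the minimizer of mse_fn over the smoothing parameter range (0, 1]."""
    res = minimize_scalar(mse_fn, bounds=(eps, 1.0), method="bounded")
    return res.x
```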
4 PreSmoothed Particle Filters
Equipped with an optimal one-step PS update, a PSPF that accumulates the log-likelihood follows quite directly, and consists of the following steps:

Simulate x_{0,i}^f ~ p(x_0), i = 1, ..., N, and set l = 0 and t = 1.

As for the SIR filter (Section 2.2), set x_{t,i}^p = g_t(x_{t-1,i}^f, u_{t,i}) to obtain an approximate sample from p(x_t | y_{1:t-1}).

Compute the optimal smoothing parameter λ* using the algorithm given in Section 3.3, with the prior sample x_{t,1:N}^p, the observation y_t, and F, R given by the model specification.

Sample x_{t,i}^f, i = 1, ..., N, from the posterior representation (12) to obtain an equally weighted approximate sample from p(x_t | y_{1:t}).

Set l ← l + log p̂(y_t), so that l now approximates log p(y_{1:t}).

If t < T, set t ← t + 1 and go to step 2, else stop.
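Putting the steps above together for a scalar model gives the following sketch. For brevity it uses a fixed smoothing parameter lam instead of the adaptive choice of Section 3.3, and the shrunk-kernel parameterization described earlier; on a linear Gaussian model its log-likelihood can be checked against the exact Kalman filter:

```python
import numpy as np

def pspf_loglik(y_seq, g, u_sd, F, R, lam, N, rng):
    """Sketch of the PSPF recursion (steps 1-6) for a scalar state, x_0 ~ N(0, 1)."""
    x = rng.standard_normal(N)                            # step 1
    loglik = 0.0
    for y in y_seq:
        x = g(x) + u_sd * rng.standard_normal(N)          # step 2: predict
        xbar, Sigma = x.mean(), x.var()
        m = xbar + np.sqrt(lam) * (x - xbar)              # shrunk locations
        cv = (1.0 - lam) * Sigma                          # kernel variance
        S = F * cv * F + R                                # innovation variance
        dens = np.exp(-0.5 * (y - F * m) ** 2 / S) / np.sqrt(2 * np.pi * S)
        loglik += np.log(dens.mean())                     # step 5: increment
        w = dens / dens.sum()                             # mixture weights
        K = cv * F / S                                    # Kalman gain
        mu = m + K * (y - F * m)                          # posterior means
        pv = (1.0 - K * F) * cv                           # posterior variance
        idx = rng.choice(N, size=N, p=w)                  # step 4: resample
        x = mu[idx] + np.sqrt(pv) * rng.standard_normal(N)
    return loglik
```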
Various bias reduction techniques are available (see e.g. Shephard and Pitt (1997)) for computing the log-likelihood increment log p̂(y_t), but we do not employ those here in order to make comparisons easy. The PSPF has, like the SIR filter, computational complexity that is linear in N. We have found that in practice the bottleneck in the PS update is the fitting of the prior pilot using the EM algorithm. Other, more problem-specific pilots are conceivable, but we do not discuss this further here.
4.1 Continuous resampling
The resampling step obtains equally weighted particles from the finite mixture representation (12) of the filter distribution. Resampling is typically done by repeatedly first drawing a component in the mixture, and subsequently sampling from this component. This process introduces discontinuities in the simulated likelihood function, even if common random numbers are applied for repeated evaluations, and makes maximizing the simulated log-likelihood difficult. This issue was first addressed by Pitt (2002), who obtains continuous draws from a univariate mixture of point masses. For multivariate finite Gaussian mixture representations, Malik and Pitt (2011) provide an algorithm that may be used as the resampling step in the PSPF in arbitrary dimension (as the variances of all components of (12) are equal). However, for low-dimensional states we have found algorithms based on computing π̂(x | y) on a fine grid using fast Fourier transform (FFT) methods, and sampling from the corresponding CDFs, desirable (more detailed descriptions are given in Appendix B). Like Malik and Pitt (2011)'s algorithm, these FFT-based algorithms have linear complexity in N, but we find them easier to program and tune.
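The grid-based idea can be illustrated in one dimension by direct CDF inversion (a simplified stand-in for the FFT-based scheme of Appendix B, which we do not reproduce; the function name is ours). Using common random numbers u makes the draws vary continuously with the mixture parameters:

```python
import numpy as np
from scipy.special import erf

def continuous_resample(mu, pv, w, u):
    """Draw from a 1-D Gaussian mixture with means mu, common variance pv and
    weights w by inverting the mixture CDF evaluated on a fine grid."""
    sd = np.sqrt(pv)
    grid = np.linspace(mu.min() - 6 * sd, mu.max() + 6 * sd, 4096)
    z = (grid[None, :] - mu[:, None]) / (sd * np.sqrt(2.0))
    cdf = (w[:, None] * 0.5 * (1.0 + erf(z))).sum(axis=0)  # mixture CDF on grid
    return np.interp(u, cdf, grid)                          # inverse-CDF draws
```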
4.2 Comparison with other particle filters
To compare the proposed methodology with currently preferred particle filters, we carry out some simulation experiments.
Experiment 1
The first model we consider is given as
(28)  
(29) 
where 1 denotes a matrix with each element equal to 1.
The distribution of the initial state is a Gaussian mixture consisting of three equally weighted components.
We consider each combination of the dimensions and measurement error scales reported in Table 3. The log-likelihood is available via the Kalman filter by conditioning on each component in the mixture, and therefore admits comparison between the particle filter-based log-likelihood approximations and the truth for this globally non-Gaussian model. We consider a short time series so that the non-Gaussian features introduced by the initial Gaussian mixture do not die out. The contending filters are PSPF, SIR, Ensemble Kalman Filter (EnKF), and post- and pre-smoothed regularized filters as described in Musso et al. (2001). The latter two are implemented using Gaussian kernels with bandwidth selection based on the standard MISE-based plug-in formulas (Musso et al., 2001, equation 2.3), and are therefore referred to as MISE-Post and MISE-Pre, respectively. The above mentioned filters rely only on simulation from the state equation, and are therefore directly comparable with respect to scope. As additional references, we also compare with Auxiliary SIR (ASIR, based on the mean of x_t | x_{t-1}, as described in Section 3.2 of Pitt and Shephard (1999)) and a fully adapted Auxiliary SIR (FASIR, Pitt and Shephard (1999)). ASIR and FASIR use knowledge of the mean of x_t | x_{t-1}, and knowledge of the full distribution of x_t | x_{t-1}, respectively, and are therefore not directly comparable to PSPF with respect to scope. We report in Table 3 the bias (log-like. bias), standard error (log-like. std. dev.) and RMSE (log-like. RMSE) of the respective estimates of the log-likelihood across 10,000 data sets simulated from (28-29). In addition, as a crude measure for comparing the posterior simulation performance, we also report the square root of the expected squared Euclidean distance between the mean of the filtering distribution and the simulated state (filter RMSE). We used different numbers of particles for PSPF and for the other filters so that the computing times for each filter are comparable when implemented in MATLAB. The mean computing times relative to PSPF (relative CPU time) are also reported in Table 3. For all filters but EnKF, non-continuous resampling was performed in each time step.
From Table 3, we see that PSPF produces smaller log-likelihood RMSEs than the other filters that are based only on simulation of the state, except in one case where MISE-Pre has the smallest log-likelihood RMSE. However, assuming momentarily that the log-likelihood RMSEs decay at the standard Monte Carlo rate, it should be noted that even in that case the log-likelihood RMSE would be the smallest for PSPF if the same N was applied. The log-likelihood performances of the PSPF and the EnKF are fairly similar, but it should be noted that the EnKF is not consistent, and therefore its biases cannot be eliminated. For increasing dimension and fixed N, PSPF and EnKF become more similar, which is a consequence of λ being chosen closer to 0 in the high-dimensional cases (to counteract the curse of dimensionality). SIR and MISE-Post perform poorly with respect to log-likelihood estimation in all the high signal-to-noise ratio cases, and also in the moderate signal-to-noise ratio cases in high dimensions. MISE-Pre performs well in the low-dimensional cases, but its performance relative to PSPF deteriorates as the dimension grows.
The ASIR exhibits highly variable second stage weights, suggesting that the generic importance sampling density it implicitly introduces works poorly for this model. Being the optimal one step ahead particle filter, FASIR works extremely well in all cases, with log-likelihood RMSEs that are two orders of magnitude smaller than those of PSPF. Thus, in the (not too often encountered) cases where full adaptation is possible, one should opt for the FASIR over the PSPF.
With respect to posterior simulation performance, PSPF produces filter RMSE results almost identical to those of FASIR, indicating that the posterior samples of PSPF are close to those of the optimal one step ahead filter. MISE-Pre also produces filter RMSEs close to those of FASIR, which underpins the claim made in Section 3.2 for this model, namely that the posterior estimator of pre-smoothed updates is relatively insensitive to the choice of smoothing parameter. In the same vein, the log-likelihood results of PSPF relative to those of MISE-Pre show that log-likelihood estimation is more sensitive to smoothing parameter selection, and therefore targeting the MSE of p̂(y), as is done here, seems highly sensible.
PSPF  SIR  EnKF  MISE-Post  MISE-Pre  ASIR  FASIR
loglike. bias  0.070  3.561  0.109  3.468  0.022  96.33  1.6e5 
loglike. std. dev.  0.303  27.49  0.418  24.82  0.328  27.55  0.006 
loglike. RMSE  0.311  27.72  0.432  25.06  0.329  100.2  0.006 
filter RMSE  0.014  0.017  0.014  0.017  0.014  0.017  0.014 
relative CPU time  1.0  0.6  0.5  1.2  1.0  0.9  1.2 
loglike. bias  0.293  1.6e3  0.356  1.6e3  0.572  1.5e3  9.0e5 
loglike. std. dev.  0.608  649.5  0.676  647.4  1.192  585.2  0.007 
loglike. RMSE  0.675  1.8e3  0.764  1.8e3  1.322  1.6e3  0.007 
filter RMSE  0.022  0.183  0.022  0.183  0.022  0.180  0.022 
relative CPU time  1.0  1.1  1.2  2.4  1.7  1.6  2.1 
loglike. bias  0.612  2.2e4  0.634  2.3e4  4.086  2.1e4  3.8e6 
loglike. std. dev.  0.813  4.6e3  0.798  4.6e3  2.540  4.3e3  0.008 
loglike. RMSE  1.018  2.3e4  1.019  2.3e4  4.811  2.2e4  0.008 
filter RMSE  0.032  0.674  0.032  0.674  0.032  0.671  0.032 
relative CPU time  1.0  1.4  1.7  2.9  2.1  2.1  2.6 
loglike. bias  0.066  0.024  0.104  0.019  0.013  16.53  3.5e5 
loglike. std. dev.  0.291  0.300  0.404  0.244  0.173  6.772  0.006 
loglike. RMSE  0.299  0.301  0.417  0.245  0.174  17.86  0.006 
filter RMSE  0.139  0.140  0.139  0.140  0.139  0.152  0.139 
relative CPU time  1.0  0.8  0.5  1.5  1.0  1.0  1.1 
loglike. bias  0.278  3.423  0.340  3.381  0.304  50.47  3.8e5 
loglike. std. dev.  0.597  4.510  0.671  4.420  0.761  8.899  0.009 
loglike. RMSE  0.658  5.662  0.752  5.564  0.819  51.25  0.009 
filter RMSE  0.220  0.244  0.220  0.244  0.221  0.258  0.220 
relative CPU time  1.0  1.1  1.2  2.2  1.7  1.7  2.0 
loglike. bias  0.584  131.2  0.611  131.6  2.694  129.2  7.6e5 
loglike. std. dev.  0.809  41.74  0.797  41.99  1.983  29.49  0.012 
loglike. RMSE  0.998  137.7  1.005  138.1  3.345  132.5  0.012 
filter RMSE  0.309  0.678  0.309  0.677  0.312  0.603  0.309 
relative CPU time  1.0  1.4  1.7  2.7  2.1  2.1  2.6 
Experiment 2
For the near-Gaussian model (28-29), the EnKF has a similar performance to PSPF. To further explore the difference between PSPF and EnKF, we consider a second simulation experiment that involves the nonlinear model
(30)  
(31)  
(32) 
where the disturbance and measurement noise terms are independent Gaussians. The nonlinear measurement equation is taken from a well-known test case used by e.g. Andrieu et al. (2010). In particular, such models are capable of generating bimodal filtering distributions, as the sign of the state cannot be determined from the observations. For the PSPF and EnKF filters to be applicable, we need to augment the state as indicated in Section 3.1, namely
(33)  
(34)  
(35)  