Approximate simulation-free Bayesian inference for multiple changepoint models with dependence within segments

Approximate simulation-free Bayesian inference for multiple changepoint models with dependence within segments

Jason Wyse, Nial Friel and Håvard Rue
University College London, London, UK
University College Dublin, Belfield, Dublin 4, Ireland
Norwegian University of Science and Technology, Trondheim, Norway
May 2011

This paper proposes approaches for the analysis of multiple changepoint models when dependency in the data is modelled through a hierarchical Gaussian Markov random field. Integrated nested Laplace approximations are used to approximate data quantities, and an approximate filtering recursions approach is proposed for savings in compuational cost when detecting changepoints. All of these methods are simulation free. Analysis of real data demonstrates the usefulness of the approach in general. The new models which allow for data dependence are compared with conventional models where data within segments is assumed independent.

1 Introduction

There is a substantial volume of literature devoted to the estimation of multiple changepoint models. These models are used frequently in econometrics, signal processing and bioinformatics as well as other areas. The idea is that “time” ordered data (where time may be fictitious and only refers to some natural ordering of the data) is assumed to follow a statistical model which undergoes abrupt changes at some time points, termed the changepoints. The changepoints split the data into contiguous segments. The parametric model assumed for the data usually remains the same accross segments, but changes occur in its specification. For example, in the famous coal mining disasters data (?), disasters are usually assumed to follow a Poisson distribution where the rate of this distribution undergoes abrupt changes at specific timepoints. ? discusses how to perform exact simulation from the posterior distribution of multiple changepoints for a specific class of models using recursive techniques based on filtering distributions. The class of models considered assumes data is independent within a homogeneous segment and the prior taken on the unknown model parameters for that segment allows analytical evaluation of the marginal likelihood for that segment. The paper of ? proposes a very promising step forward for the analysis of multiple changepoint models, where the number of changepoints is not known beforehand. The methods developed there allow for efficient simulation of large samples of changepoints without resorting to MCMC.

An obstacle which may prevent wide applicability of the methods discussed in ?, is the requirement that the assumed model must have a segment marginal likelihood which is analytically tractable. However, such a requirement can usually not be fulfilled by models which allow for data dependency within a segment, a desirable assumption in many situations. Dependency is possible across regimes in some cases (see ?), but the assumption of independent data still holds. The main aim of this paper is to provide a solution to these issues and open up the opportunity for more complex segment models which allow for temporal dependency between data points. This is achieved by hybridizing the methods in ? and recent methodology for the approximation of Gaussian Markov random field (GMRF) model quantities due to ? termed INLAs (integrated nested Laplace approximations).

The INLA methodology provides computationally efficient approximations to GMRF posteriors, which have been demonstrated to outperform MCMC in certain situations (?). An advantage to such approximations is that they avoid lengthly MCMC runs to fully explore the posterior support and they also avoid the need to demonstrate that these runs have converged. Another advantage is that the approximations may be used to estimate quantities such as the marginal likelihood of the data under a given GMRF model, the quantity which is of main interest here to overcome the requirement of an analytically tractable segment marginal likelihood in ?.

The R-INLA package ? for R-2.11.1 may be used to do all of the aforementioned approximations for a range of GMRF hierarchical models. It aims to give an off-the-shelf tool for INLAs. Currently the package implements many exponential family models; Gaussian with identity-link; Poisson with log-link; Binomial with logit-link; for many different temporal GMRFs; random effects models; first order auto-regressive; first and second order random walk (neither of these lists are exhaustive). The package also implements spatial GMRFs in two and three dimensions and is currently still evolving with new additions on a regular basis. Use of this package avoids programming for specific models as it allows the selection of any observational data model and selection of the desired GMRF through a one line call to the R-INLA package. The R-INLA package is used for all the computations on hierarchical GMRF models in this paper.

The remainder of this paper is organised as follows. Section 2 gives a brief review of recursions for performing inference conditional on a particular number of changepoints as given in ?. In Section 3 possible computational difficulties are discussed and solutions for these are proposed. Sections 4 and 5 analyze real data examples; the coal-mining data is analyzed using a model with dependency and this is compared with the analysis of ?; and Well-log data (?) is analyzed with a model that allows for dependency between adjacent data points, such that the dependency relation may change across segments. Section 6 explores the possibility of detecting changepoints under the assumption of a stochastic volatility model. The paper concludes with a discussion.

2 Changepoint models

? gives a detailed account of how filtering recursions approaches may be applied in changepoint problems. Some of the models considered there used a Markov point process prior for the number and position of the changepoints.  ? demonstrated that the posterior distribution may sometimes be sensitive to the choice of the parameters for the point process. In this paper, the focus will be on performing inference for the changepoint positions after estimating the most probable number of changepoints a posteriori, although it is noted that the methods also apply to the case of a point process prior. Denote ordered changepoints by . The likelihood for the data , conditional on the changepoints and the latent field , assuming segments are independent of one another is

where , represents the part of the GMRF which belongs to the segment, and are the segment hyperparameters. Independent priors are taken on the members of and the changepoints given their number. The prior taken on changepoints is assumed to have the product form

where . Note that this prior is conditional on a given number of changepoints, . The idea is to introduce a prior on and use the hierarchical form


to find the most likely number of changepoints. Using this, the most likely positions for the changepoints can then be found.

2.1 Recursively computing the posterior

Let . Then is the probability of the data from time point onwards given the changepoint is at time and there are changepoints in total, meaning that there are changepoints between times and . It is possible to compute in a backward recursion;

with going from to and going from to , where is the marginal likelihood of the segment . The marginal likelihood of under a changepoint model may be computed as


2.2 Choice of changepoint prior and computational cost

It will be necessary to compute for a range of values, say in order to do inference for using (1). This requires computational effort in and storage requirements in which could be costly. Both of these may be reduced by choosing an appropriate changepoint prior. One such prior, as used and noted by ?, is to take changepoint positions distributed as the even numbered order statistics of uniform draws from the set without replacement. Doing this gives

where and the normalizing constant . Using this prior restricts the dependence of the prior on the number of changepoints to the normalizing constant only, meaning that

Reusing these values gives a reduction by a factor of in computational effort and storage requirements. The recursions are now




Then (4) is divided by to correctly normalize the prior and (1) is obtained by multiplying this by the prior weight for changepoints . This prior will be used in the examples later.

2.3 Posterior of any changepoint

Since the prior on changepoints makes the changepoint model factorizable, it is possible to write down the posterior distribution of conditional on and ;

This is used for the forward simulation of changepoints once the backward recursions have been computed. It is also used to give the modal changepoint configuration as in the examples later.

3 Approximate changepoint inference using INLAs

The essential ingredient of the approach presented in this paper is to replace the segment marginal likelihood in the recursions

with a segment marginal likelihood approximated using INLA. It is the case that needs to be available in closed form to use a filtering recursions approach. This will never be the case for hierarchical GMRF models, which can account for within segment dependency. However, INLAs can be used to get a good approximation to for hierarchical GMRF segment models. This opens up the opportunity for more realistic data models in many cases. There are also two other advantages: the posterior of the number of changepoints may be well approximated for model selection; and the posterior of any given changepoint can be computed to a high degree of accuracy.

There are two potential drawbacks of the proposed approach however. The first is that it could require fitting a GMRF model to a very small amount of data, which could be limiting depending on the complexity of the within-regime model. For example, at least five data points would be required to make fitting a first order auto-regressive random field feasible. This means that for the approach to be reasonable it may be necessary to expect changepoints to be quite well separated. The second potential drawback contrasts with the first. For large amounts of data, using INLAs to compute the segment marginal likelihoods necessary to compute the recursions (3) could be costly. The next section proposes a way to overcome both of these problems simultaneously, while still retaining almost all of the advantages of using a filtering recursions approach. This proposed solution is termed reduced filtering recursions for changepoints (RFRs).

3.1 Reduced filtering recursions for changepoints

The main idea of RFRs is to compute the recursions at a reduced number of time points and approximate the full recursions (3). The recursion is not computed at every time point which takes computation. The motivation is that if segments have a reasonable duration, changepoints can be detected in the region where they have occurred.

An analysis using RFRs permits a changepoint to occur at some point in the reduced time index set with for all . The assumption is that if there is a changepoint between and it can be detected at . For convenience, define and . The spacing of the is an important issue. If the spacing is too wide, then changepoints will not be detected. If the spacing is too narrow, many points are required to represent the entire data, thus increasing the computation time. If there is little prior knowledge of where changepoints occur the natural choice is equal spacing; for some choice of . The following example briefly explores the choice of and makes the preceding discussion clearer.

Consider the data simulated from a Gaussian changepoint model shown at the top of Figure 1(a) with a clear change at 97. Searching for one changepoint, the bottom three plots in Figure 1(a) show the posterior probability of a changepoint for reduced time index sets given by . Note that corresponds to the original recursions (3). For the changepoint is detected at 95 and detects it at 100. In both cases the changepoint is identified as the closest possible point to its actual position. Figure 1(b) shows a similar example, where this time one of the segments is very short (only 13 points). Again, the changepoint is identified at the closest possible position in the cases of . In the case of it is the second closest, possibly due to the noise in the data contaminating the separation of the two regimes. The magnitude of the regime changes in these examples are large for illustration. If the magnitude of the change is small it will be necessary to have a longer segment to identify its presence with high power. It is also the case that if changes are short-lived, a large value of may cause changepoints to be missed.

Figure 1: Results when searching for one changepoint in simulated Gaussian data for . It can be seen that the changepoint is detected at one of its closest neighbouring points in the reduced time index set.

3.1.1 Recursions on the reduced time index set

The changepoints are . The reduced time index set is . The changepoint prior is now defined on the set of numbers and we let if . That is, corresponds to the changepoint position if time is indexed by whereas gives the changepoint position in the reduced time index set . Define


Then recursively, for and

After computing these, the approximate marginal likelihood of the data conditional on changepoints follows as,

When the grid spacing is not too large, that is is greater than a reasonable multiple of , the approximation to the marginal probability of changepoints should be reasonable for the competing models. There are many computational savings with this approach. Using the RFRs decreases the number of marginal likelihood evaluations required to where

3.1.2 Distribution of any changepoint

When the maximum a posteriori number of changepoints has been found, it is determined where the changepoints are most likely to occur on the reduced time index set. The distribution of is


Instead of generating samples of changepoints, our focus is to deterministically search for the most probable changepoint positions a posteriori. The first changepoint detected on the reduced time index set will be

Conditioning on the search proceeds for in the same way. In general,

This procedure is repeated until the changepoints are found.

3.1.3 Refining changepoint detection

After detecting changepoints on the reduced time index set, it is possible to refine the search and hone in on the most likely position of the changepoint. To begin, the changepoints obtained from the search above, where , will all be multiples of . Condition on the value of to update . Compute

using INLAs for . Then take to be the which maximizes this. Similarly maximizes

This procedure can be carried out just once, or repeated until there is no difference between updates.

This step does of course require additional computation. It may not be necessary in all cases to carry out a refined search. For example, the case of large and small would mean that refining the search would probably give little additional information. This approach should give near the global MAP for the changepoint positions. To ensure the global MAP is found it would be necessary to use some sort of Viterbi algorithm which would also use the approximated marginal likelihood values.

3.1.4 Simulation of changepoint positions

The approximate methods discussed here are entirely simulation free. It may be useful to allow for simulation of changepoints from their joint posterior, once all of the marginal likelihoods have been approximated as in ?. Introduction of the RFRs makes this a little more difficult here. We expect that for larger values of the distribution of the changepoints on the reduced time index set will be quite degenerate. This is since for a changepoint in a given region, it will always be detected at the same point in the reduced time index set. However, for smaller values of it may still be possible to simulate changepoints on the reduced time index set and refine their positions by simulating from a distribution which conditions on the two neighbouring changepoints- a stochastic version of the approach to refine the position of the changepoint discussed above (Section 3.1.3).

3.1.5 Exploring approximation error and computational savings in a DNA segmentation example

To get a rough idea of the approximation error and the possible computational savings to be made by using RFRs, the methods were applied in a DNA segmentation task with a conditional independence model. This deviates from the general theme of the paper (to fit models relaxing conditional independence), however, it is included to offer some insight into RFRs in general.

DNA sequence data is a string of the letters A,C,G and T representing the four nucleic acids, adenine, cytosine, guanine and thymine. Interest focuses on segmenting the sequence into contiguous segments characterized by their CG content. It is assumed that within a segment the frequency of constituent acids follows a multinomial distribution, so that

With a prior on the marginal likelihood for a segment is

where is the number of occurences of acid in the segment from to inclusive.

The data analyzed is the genome of a parasite of the intestinal bacterium Escherichia coli. The sequence consits of 48,502 base pairs, and so will provide a good measure of the computational savings to be made for larger datasets when using RFRs. This data has previously been analyzed by ?, who implemented a hidden Markov model using RJMCMC to select the Markov order. Here however, a changepoint model assuming data in segments are independent is applied. Cumulative counts of the nucleic acids over location along the genome are shown in Figure 2.

Figure 2: Cumulative counts of A,C,G,T for the DNA data. Identified changepoints are overlain (vertical lines).
1 5 10 15 20 25
Time taken (s) 1353.69 51.78 13.05 5.99 4.09 3.03
Changepoints 176 176 176 176 176
20092 20101 20092 20092 20092 20092
20920 20920 20920 20920 20920 20920
22546 22585 22546 22546 22546 22546
24119 24119 24119 24119 24119 24119
27831 27831 27831 27831 27831 27831
31226 31226 31226 31226
33101 33101 33101 33101 33101 33089
38036 38036 38049 38011 38036 38036
46536 46536 46536 46536 46501 46501
Table 1: Location of changepoints and computing time for DNA segementation example. As increases there is little deviation in changepoint estimates. Reported changepoints are found after a refined search.

The RFRs were applied to this data using an equally spaced reduced time index set with . The prior taken on the number of changes was uniform on . All runs were on a 2.66GHz processor written in C and the segment marginal likelihoods calculated in a step before the recursions were computed. Table 1 gives the identified changepoints and the computing time for each analysis. The value corresponds to filtering recursions on the entire data. It can be seen that using RFRs does not appear to have a considerable effect on the detected changepoints. However, there are drastic differences in computing time- the RFRs for give a 450 fold decrease in computing time with respect to recursions on the full data set. It can be seen that as the value of increases there are slight deviations in the result. For example, with , nine changepoints is most probable a posteriori, compared with ten for all other values. Also, for the second and fourth changepoints are detected in a different position to all other values of . This could be overcome by allowing for a wider search than just points either side in the refined search (Section 3.1.3). Despite this, the computational savings are large, and the approach appears to successfully isolate the regions where changepoints occur in a large search space.

It should be noted that the computation of the marginal likelihoods can be nested, although this was not done here. For example, the marginal likelihood calculations for could be reused for and likewise, some of the calculations for can be used for if it is desired to perform analysis for different values of .

3.2 Other approaches to save on computation

The RFRs approach reduces the computation necessary to perform analysis for changepoint models by introducing an approximation to full filtering recursions. This is complimentary with another approach to reduce computation suggested by ?. There, recursions are “pruned” by truncating the calculation of the backward recursion when the terms to be truncated will only contribute negligibly to the overall value. This occurs in situations where it is clear that a changepoint will have occurred before a certain time in the future, and so considering times after this point does not lead to gains in information. These ideas could be used together with the approach presented here to gain extra speed up in computation. In particular, RFRs are useful for situations where segment sizes are large while pruning ideas are useful when the number of changepoints is large (so that calculations may be truncated often), so combining the two approachess should be useful for large scale problems with regular changes and will lead to even greater computational savings than just using RFRs alone.

4 Coal mining disasters

This data records the dates of serious coal-mining disasters between 1851 and 1962 and is a benchmark dataset for new changepoint approaches. It has been analyzed in ????? and ?, amongst others. In all of these analyses it is assumed that observations arise from a Poisson process. This Poisson process is assumed to have intensity which follows a step function with a known or unknown number of steps. These steps or “jumps” in intensity occur at the changepoints. Other models have also been fit to this data. For example, a smoothly changing log-linear function for the intensity of the Poisson process:

(see for example ? and the original source of this data ?). The log-linear intensity model would favour more gradual change, rather than the abrupt changes implied by changepoint models. There is an argument for some of the elements of such a model that allows for gradual change. Although, as noted in ?, abrupt changes in this data are most likely due to changes in the coal mining industry at the time, such as trade unionization, the possibility of more subtle changes in rate could and should be entertained. A GMRF model applied to this data should be able to model gradual as well as abrupt change.

As in ? a week is the basic time unit. The data spans 5,853 weeks over 112 years. The latent field is taken as AR(1). This allows for an inhomogeneous Poisson process within segments, opening up the possibility for gradual change. The rate of the Poisson process is related to the field through a log-link function. More specifically,


The parameter is an intercept and follows an AR(1) process with persistence parameter .

Priors were chosen to loosely mimic the behaviour of the data. The priors chosen were

where we have reparametrized . Following ? and ?, the prior on the number of changepoints was taken to be Poisson with mean 3.

A spacing of was used. Figure 3 (a) shows the posterior distribution of the number of changepoints for the AR(1) latent field model. A two changepoint model is most likely, a posteriori. Figure 3 (b) shows the most likely position of these changepoints computed using the methods of Section 3.1.2. A plot of the log intensity of the Poisson process over the entire 5,853 weeks is shown in Figure 4, obtained by conditioning on the MAP changepoint positions from the two changepoint model. From this it can be argued that a model accounting for gradual changes in the rate of disasters is not entirely unjustified. There appears to be small fluctuations of rate around a mean rate. These fluctuations are treated differently to the two abrupt changes that are detected by the GMRF model.

Figure 3: Coal mining data: results from an anlysis using INLAs and . The figure on the left shows the posterior distribution of the number changes while that on the right shows the cumulative counts of disasters and the changepoints indicated (blue dashed line).
Figure 4: Coal mining data: Inferred log intensity by week.

There is a discrepancy between the posterior of the number of changepoints from RFRs given here and that given in ? (see Figure 1(a) there) which both allowed changepoints at all possible points in the data. This is a good opportunity to further investigate the approximation error introduced by using RFRs. Figure 5 shows the posterior number of changepoints obtained from using grids of size for the model and prior assumptions in ?. It is clear that as the value of increases, the RFRs become less sensitive to small or short lived changes for this model, as might be expected. However, at large values of the ability to pick out two abrupt changes does not seem to diminish. As pointed out by one reviewer, a simple strategy for choosing is possible by exploiting the nesting ideas outlined in Section 3.1.5. Starting out with a large value of corresponding to a coarse search this may be gradually reduced to see how values of the approximated marginal likelihood for a given number of changepoints differs. Approximate marginal likelihoods computed for larger values of may be recycled in doing computations for the more refined searches.

Figure 5: Investigating approximation error in RFRs; results from analyses of coal mining disasters with different values of using the model from Fearnhead (2006).

It is possible to compute approximate Bayes factors for the GMRF and independent data models conditional on there being a given number of changepoints. The marginal likelihood of the data conditional on changepoints is approximately

The different models are characterized by model assumptions and consequently the way in which the segment marginal likelihoods are computed;

The approximate Bayes factor for the GMRF model versus the analytic model conditioning on changepoints is given by

For a one changepoint model, this was and for two changepoints it was . This implies that there is more support for the GMRF model in these cases, suggesting that modelling small scale variation in the rate of disasters is worthwhile. This supports the interpretation of Figure 4. It is well known that Bayes factors can be sensitive to prior assumptions. In this case prior hyperparameters have been chosen with care for both the independent data model and the GMRF model. A change in these choices could potentially lead to a different strength of conclusion as to which is the best model. However, it is still promising in this setting to see that modelling the dependency in the data appears worthwhile.

5 Well-log data

The Well-log data (?) records 4050 measurements on the magnetic response of underground rocks obtained from a probe lowered into a bore-hole in the Earth’s surface. The data is shown in Figure 6. The model fitted here aims to account for dependency in the nuclear magnetic response as the probe is lowered into the bore-hole. This is an improvement on the independence model fitted in Section 4.2 of ?; as the probe lowers, it moves through different rock strata and some will have greater depth than others. Therefore, it would be expected to see some correlation between observations arising from rock strata of the same type. Fitting this model can also reduce the detection of false signals as changepoints. See ? for a discussion of the issue of outliers in Well-log data.

Figure 6: Well-log data. Observations are the nuclear magnetic response recorded by a probe being lowered into a bore-hole in the Earth’s surface.

Since this is a large data set a larger value of should be used to isolate regions where changepoints occur. This vastly reduces the computational time required for the necessary approximations for data of this size. Analyses using were carried out, choosing the prior parameters using the information obtained from an analysis using MCMC and an independent data model. In each instance numerical instability prevented the recursions on the reduced time index set from being computed. This happened because the scale of the data is so large (). In general, measures need to be introduced to prevent numerical instabilities in these types of recursions. In the computations of the RFRs a measure similar to those in ? (changepoint models) and ? (hidden Markov models) was employed. This consisted of two steps to ensure stability. Firstly, compute

and then

The reason these do not work here is that the large scale of the data means that is much larger than usual, since it is the marginal likelihood of points. It thus makes the argument to the exponential function in the first stabilizing equation cause instabilities at some points. This then carries through the remainder of the recursions.

A simple way to overcome the issues is to just do an equivalent analysis of the data on a smaller scale, so that large is avoided. Simply dividing the data by its sample standard deviation reduces the scale appropriately. The parameters for the prior specification were also adjusted to allow for the difference in scale to give the priors

where . The prior on here gives most prior weight to values of in (about 93%). This will allow the possibility for the AR(1) GMRF model to closely approximate the behaviour of a random walk of order one. However, it still allows the freedom for the dependence pattern to vary across segments. ? fits a random walk model of order one to this data, showing that a latent field can be robust to short lived changes and outliers for Well-log data. A uniform prior on was taken for the number of changepoints.

Figure 7: Posterior of the number of changepoints for the Well-log data fitting an AR(1) GMRF model. This suggests the most likely number of changepoints a posteriori is 19.

Figure 8: Well-log data: results from RFRs and INLA (top) and independent data model.

For the final analysis was taken to be 25. This reduced the necessary number of approximate marginal likelihood approximations from roughly (for ) to ; over 600 times less. The computations for these approximations took about a day of computing time. This appears lengthly, however this should be judged along with the fact that the model is more flexible and that the mean signal can be estimated at every point in the data. Figure 7 shows the posterior probability of the number of changepoints. The mode is at 19, but there appears to be support for up to 22. Conditioning on 19 changepoints, their locations were determined using the search strategy outlined in Section 3.1.2. These locations were then refined to hone in on the actual changepoint positions. Conditioning on these positions inference was carried out for the latent field. This is shown in the top figure of Figure 8. The field appears to follow the trend of the data closely, while the changepoint model caters for abrupt change. ? compared the results of a first order random walk field to those from an independent Gaussian model for the data. Similarly, the results from the GMRF model here are compared with those obtained using an MCMC sampler with an independent data model on the Well-log data. For comparison, the 54 most likely changepoints (mode of posterior) were taken from the independent Gaussian model, and segment means were computed conditional on these (bottom of Figure 8). It can be seen that the independent model is sensitive to changes in the mean and is conservative when inferring changepoints (more rather than less). The GMRF model however appears to be more robust to noisy data points and only infers changepoints when abrupt changes occur in the field.

6 Stochastic volatility data

The aim of this section is to explore whether INLAs and RFRs can be used to estimate changepoint models where segment observations are assumed to arise from a stochastic volatility model. To this end, the approach proposed could potentially used to detect shocks in financial and other time series. The segment model assumed is

with following an AR(1) process with persistence parameter and innovation variance where may be interpreted as an intercept for the volatilities. Data in different segments are assumed independent, so that concern here is only in the complex intra segment correlation structure.

Figure 9: Simulated stochastic volatility data with the corresponding latent log squared volatilities shown on the bottom.
Figure 10: Posterior of the number of changepoints for simulated stochastic volatility data.
Figure 11: MAP changepoint positions conditioning on six changepoints.

The approach was applied to a simulated data set of length 1000 with eight changepoints at times 200, 400, 600, 700, 800, 850, 900 and 950, shown along with the corresponding log latent squared volatilities in Figure 9. Segment parameters were chosen as outlined in Table 2. The length of the segments were reduced as well as the magnitude of the regime change to see how powerful the approach is in detecting smaller and smaller changepoints.

Segment 1 2 3 4 5 6 7 8 9
0.9 0.8 0.9 0.7 0.8 0.9 0.8 0.9 0.8
0 2 0 1 0 0.5 0 0.25 0
Table 2: Segment parameters for simulated stochastic volatility data.

The priors assumed for the analysis were

and the computations were done for a reduced time index set with spacing . Priors were chosen to loosely mimic the behaviour of the data. The prior chosen for the number of changepoints was .

Figure 10 shows the posterior of the number of changepoints with most support for six changes, but reasonable support for any number from five to eight. The changepoint positions found using the search strategy of Section 3.1.3 while conditioning on six changepoints were 192, 400, 595, 697, 806 and 939. These changepoints are shown with the data in Figure 11. The changepoints are not detected at their exact positions, but are roughly close to the true positions given the scale of the data. The two changepoints at 850 and 900 are missed by this estimation strategy. These two changes are small in magnitude and are barely noticeable by simply looking at the data. In this situation it may be too difficult for any estimation strategy to distinguish between noise and a weak signal.

7 Discussion

This paper demonstrates two new useful approximate methods for changepoint problems when the assumption of independent data is relaxed. The first of these was INLAs, a new approximate inference method for GMRFs due to ?. This allows the marginal likelihood for complex segment models to be evaluated approximately, so that it may be used for an approximate filtering recursions approach.

Some computational considerations led to the second proposed method. Instead of performing filtering recursions analysis on the entire data, RFRs were introduced so that recursions may be computed only on a reduced time index set, thus using all of the data, but only searching for changepoints in the general region where they occur. It was demonstrated that this method can be useful in cutting computation time for larger datasets by applying it to a DNA segmentation example with about 49,000 data points.

The hybrid INLAs-RFRs methodology was applied to three different data examples. The first of these was an analysis of the coal mining disasters data where the model allowed for small scale variation in the intensity of the process and allowed for week to week dependency. This new model was more supported by the data than the usual step function intensity models which are often fitted. This was demonstrated by approximate calculation of Bayes factors for the GMRF model and the independent data model for one and two changepoint models. The GMRF model out-performed the independent data model in both cases. The second example was an analysis the Well-log data of ?. It was shown that allowing for segment dependency can be more robust to noisy observations, and that unnecessary changepoints (short lived changes, outliers etc.) are not inferred in this case. For the final example, the methods were applied to some simulated stochastic volatility data. Performance of the approach was promising in this case, however, there was difficulty in detecting some smaller changes.

It is worth noting again that RJMCMC would be practically infeasible for the data models considered here. This is since in addition to the issue of efficient sampling from hierarchical GMRF models (see ?), there is also segmentation of the data. Thus adding a new changepoint would require designing a reversible move between proposed and current field hyperparameters and in addition resampling field elements. Making moves of this type which exhibit good mixing would be challenging, and further diagnosing convergence would be difficult with the chains possibly requiring very long run times. This gives the approximate approach even more of an advantage. This is true especially in the case of models which require good corresponding proposal densities to perform well when it comes to MCMC, such as stochastic volatility models.

Overall, this paper has explored a promising new direction for estimation of changepoint models by creating a hybrid of two popular methods in their respective fields, namely INLAs in the GMRF field of study, and filtering recursions for sequential changepoint model estimation. Other data models are possible which have not been applied to any of the examples in this paper. For example, it is possible to have higher order Markov dependencies for random walk fields in the R-INLA package. Zero inflated Poisson and Binomial data models are also possible.


The authors would like to thank the Editor, Associate Editor and two anonymous reviewers for their attentive reading of the manuscript and valuable suggestions which improved the presentation of the ideas in the paper. In particular we would like to thank the Editor for pointing out an inconsistency in results which led to us spotting a coding error in the DNA example. The first author would like to dedicate his work on this paper to the memory Cian Costello (1984-2010) who is enormously missed as a colleague and friend.


  • [1]
  • [2] [] Boys, R. J. & Henderson, D. A. (2004), ‘A Bayesian Approach to DNA Sequence Segmentation’, Biometrics 60, 573–588.
  • [3]
  • [4] [] Carlin, B. P., Gelfand, A. E. & Smith, A. F. M. (1992), ‘Hierarchical Bayesian Analysis of Changepoint Problems’, Applied Statistics 2, 389–405.
  • [5]
  • [6] [] Chib, S. (1998), ‘Estimation and comparison of multiple change-point models’, Journal of Econometrics 86, 221–241.
  • [7]
  • [8] [] Cox, D. R. & Lewis, P. A. W. (1966), The Statistical Analysis of Series of Events, Methuen, London.
  • [9]
  • [10] [] Fearnhead, P. (2005), ‘Exact Bayesian Curve Fitting and Signal Segmentation’, IEEE Transactions on Signal Processing 53, 2160–2166.
  • [11]
  • [12] [] Fearnhead, P. (2006), ‘Exact and efficient Bayesian inference for multiple changepoint problems’, Statistics and Computing 16, 203–213.
  • [13]
  • [14] [] Fearnhead, P. & Clifford, P. (2003), ‘On-Line Inference for Hidden Markov Models via Particle Filters’, Journal of the Royal Statistical Society, Series B 65, 887–899.
  • [15]
  • [16] [] Fearnhead, P. & Liu, Z. (2010), ‘Efficient Bayesian analysis of multiple changepoint models with dependence across segments’, Statistics and Computing . To appear.
  • [17]
  • [18] [] Green, P. (1995), ‘Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model determination’, Biometrika 82, 711–732.
  • [19]
  • [20] [] Jarrett, R. G. (1979), ‘A note on the intervals between coal-mining disasters’, Biometrika 66, 191–193.
  • [21]
  • [22] [] Ó Ruanaidh, J. J. K. & Fitzgerald, W. J. (1996), Numerical Bayesian Methods applied to Signal Processing, Springer, New York.
  • [23]
  • [24] [] Raftery, A. E. & Akman, V. E. (1986), ‘Bayesian Analysis of a Poisson Process with a Change-Point’, Biometrika 73, 85–89.
  • [25]
  • [26] [] Rue, H. & Held, L. (2005), Gaussian Markov Random Fields: Theory and Applications, Vol. 104 of Monographs on Statistics and Applied Probability, Chapman & Hall, London.
  • [27]
  • [28] [] Rue, H., Martino, S. & Chopin, N. (2009), ‘Approximate Bayesian Inference for Latent Gaussian Models Using Integrated Nested Laplace Approximations (with discussion)’, Journal of the Royal Statistical Society, Series B 71, 319–392.
  • [29]
  • [30] [] Scott, S. L. (2002), ‘Bayesian Methods for Hidden Markov Models: Recursive Computing in the 21st Century’, Journal of the American Statistical Association 97, 337–351.
  • [31]
  • [32] [] Wyse, J. & Friel, N. (2010), ‘Simulation-based Bayesian analysis for multiple changepoints’, Available at .
  • [33]
  • [34] [] Yang, T. Y. & Kuo, L. (2001), ‘Bayesian Binary Segmentation Procedure for a Poisson Process with Multiple Changepoints’, Journal of Computational and Graphical Statistics 10, 772–785.
  • [35]
Comments 0
Request Comment
You are adding the first comment!
How to quickly get a good reply:
  • Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
  • Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
  • Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters
Add comment
Loading ...
This is a comment super asjknd jkasnjk adsnkj
The feedback must be of minumum 40 characters
The feedback must be of minumum 40 characters

You are asking your first question!
How to quickly get a good answer:
  • Keep your question short and to the point
  • Check for grammar or spelling errors.
  • Phrase it like a question
Test description