A procedure for the change point problem in parametric models based on divergence test statistics
Abstract
This paper studies the change point problem for a general parametric, univariate or multivariate family of distributions. An information theoretic procedure, based on general divergence measures, is developed for testing the hypothesis of the existence of a change. To assess the accuracy of the new test statistic, a simulation study is performed for the special case of a univariate discrete model. Finally, the proposed procedure is illustrated through a classical change point example.
MSC: primary 62F03; 62F05; secondary 62H15
Keywords: Change point; Information criterion; Divergence; Wald test statistic; General distributions.
1 Introduction
The change point problem has been studied by several authors over the last five decades. Change point analysis is a statistical tool for determining whether a change has taken place at some point of a sequence of observations, such that the observations are described by one distribution up to that point and by another distribution after it. Change point analysis is concerned with the detection and estimation of the point at which the distribution changes. Both the single change point problem and the multiple change points problem have been studied in the literature, depending on whether one or more change points are present in a sequence of random variables. Several methods, parametric and nonparametric, have been developed to address this problem, and the range of applications of change point analysis is broad. Applications arise in areas such as statistical quality control, public health, medicine, finance, biomedical signal processing, meteorology and seismology. The monograph by Chen and Gupta (2000) summarizes recent developments in parametric change point analysis.
Typical situations encountered in the literature of parametric multiple change points analysis are as follows: Let be independent variate observations () and let the statistical space associated with the random variable (r.v.) , . The probability density function with respect to a finite measure given by , , , . For simplicity, is either the Lebesgue measure or a counting measure. We adopt in the sequel the formulation of the multiple change point problem as it appeared in Srivastava and Worsley (1986) and Chen and Gupta (2000, 2004). Based on these authors, suppose that adjacent observations are grouped in groups, so that , are in the first group, , are in the second group and we continue in a similar manner until are in the th group.
Consider the model for changes in the parameters. This is formulated as a problem of testing the following hypotheses,
(1) 
versus the alternative
where , , is the unknown number of changes and are the unknown positions of the change points. The above hypotheses can be equivalently stated in the form
(2) 
versus the alternative
with .
There is an extensive bibliography on the subject, and several methods for addressing the change point problem have appeared in the literature. Among them are the generalized likelihood ratio test, Bayesian solutions of the problem, information criterion approaches and the cumulative sum method. Based on these methods, several papers discuss change point problems in specific probabilistic models, such as the univariate and multivariate normal distributions, the gamma model and the exponential model. For instance, Sen and Srivastava (1980) focused on the single change point problem. Moreover, they assume that within each section the distributions are the same, while the distribution in a section differs from that in the preceding and the following section in mean vector or covariance matrix. For an exposition of these methods and their application to specific distributions we refer to the monograph or the survey paper by Chen and Gupta (2000, 2001) and the references therein.
It has been proposed in these and other treatments (cf., for instance, Vostrikova (1981)) that, in order to study the multiple change point problem formulated by (1) or (2), it suffices to test the single change point hypothesis and then repeat the procedure for each subsequence. Hence, we turn to the testing of (2) against the alternative,
(3) 
where the symbol is used to denote that the observations on the left follow the parametric density on the right. In (3), represents the position of a single change point, which is supposed to be unknown. A general description of this technique for detecting changes is summarized in the following steps by Chen and Gupta (2001). First we test for no change point versus one change point, that is, we test the null hypothesis given by (2) versus the alternative given by (3) and equivalently stated by : . Here, is the unknown location of the single change point. If is not rejected, then the procedure is finished and there is no change point. If is rejected, then there is a change point and we continue with step 2. In the second step we test separately, for a change, the two subsequences before and after the change point found in the first step. We then repeat these two steps until no further subsequences have change points. At the end of the procedure, the collection of change point locations found in the previous steps constitutes the set of change points.
The subject of change point analysis is twofold. The first aspect is to detect whether there are one or more changes in a sequence of observations. The second aspect is the estimation of the number of changes and their corresponding locations. In this paper we develop an information theoretic procedure, based on divergence measures, to study the change point problem. The underlying setting is a general parametric, univariate or multivariate family of distributions. We describe formally the framework and the problem in Section 2, and the main results are presented in Section 3. In Section 4 we focus on a specific distribution, the binomial distribution, and a simulation study is performed in order to compare the accuracy of the new test statistic with some preexisting test statistics. In the final Section 5, the general results of this paper are illustrated by means of the well-known Lindisfarne scribes data set.
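The two-step scheme described above (test for a single change, then recurse on the two subsequences) can be sketched as follows. This is a minimal illustration, not the procedure of Chen and Gupta (2001) verbatim: the function names `binary_segmentation` and `cusum_single_change`, the CUSUM mean-shift detector and the threshold value are all assumptions introduced here, and any single change point test returning a statistic and a location could be plugged in instead.

```python
import numpy as np

def cusum_single_change(x):
    """Hypothetical single-change detector (standardized CUSUM for a mean shift).

    Returns (statistic, k), where k is the number of observations before the
    candidate change, or (0.0, None) when the segment is too short or constant.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    if n < 4:
        return 0.0, None
    sd = x.std(ddof=1)
    if sd == 0:
        return 0.0, None
    k = np.arange(1, n)
    partial = np.cumsum(x)[:-1]                      # sums of the first k values
    stat = np.abs(partial - k * x.mean()) / (sd * np.sqrt(n))
    best = int(np.argmax(stat))
    return float(stat[best]), int(k[best])

def binary_segmentation(x, single_test, threshold, offset=0):
    """Step 1: test the segment for one change. Step 2: recurse on both halves."""
    stat, k = single_test(x)
    if k is None or stat <= threshold:
        return []                                    # null hypothesis not rejected
    left = binary_segmentation(x[:k], single_test, threshold, offset)
    right = binary_segmentation(x[k:], single_test, threshold, offset + k)
    return left + [offset + k]  + right

# Usage on simulated data with one clear mean shift after 50 observations.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 0.1, 50), rng.normal(5.0, 0.1, 50)])
change_points = binary_segmentation(x, cusum_single_change, threshold=3.0)
print(change_points)
```

In this sketch a change point `k` means that the first `k` observations of the segment follow one distribution and the remaining ones another; the threshold is an illustrative constant and would in practice come from the null distribution of the chosen statistic.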
2 Information theoretic procedure
Consider now the single change point problem, that is the problem of testing the pair of hypotheses
(4a)  
(4b)  
which are presented by (2) and (3), respectively. In the above formulation, and are unknown. Since is the unknown location of the single change point, we will consider all the candidate points . Let denote the maximum likelihood estimator (MLE) of which is based on the random sample from and let denote the MLE of which is based on the random sample from . If the hypothesis is true, then there is a difference between the probabilistic models and , which causes a large value for a measure of the distance between and . Given that the divergence is a broad family of distance measures between probability distributions, the divergence between and is large if is true and hence it can be used to decide whether the candidate point in (4b) is a change point (). Taking into account that the MLEs and of and , respectively, depend on the candidate change point , we will adopt the following notation for the divergence between and , 
(5) 
provided that the convex function satisfies some additional conditions (see page 408 in Pardo (2006)) which ensure the existence of the above integral. Moreover, we consider convex functions which satisfy and . Large values of support the existence of a change point and therefore large values of suggest rejection of the null hypothesis . Hence can be used as a test statistic for testing the hypotheses (4a). Then, motivated by the fact that large values of are in favor of , a test for the existence of a single change point, that is, for the hypotheses (4a), should be based on the divergence test statistic,
(6) 
where
(7) 
Moreover, the unknown position of the change point is estimated by such that
(8) 
Based on the above discussion, in (4a) is rejected for , where is a constant to be determined by the null distribution of . Hence, in order to use of (6) for testing hypotheses (4a), the distribution of under must be known.
There are two important reasons why working directly with the test statistics , defined in (6), is avoided. On the one hand, its asymptotic distribution , is not an easy to handle random variable (see for instance Theorems 1.2 and 1.3 in Gombay and Horváth (1989)). On the other hand, in practice, cases such that are very difficult to detect. Let be the set of all possible integers such that , with , small enough. We shall modify (6) to be maximized in , i.e.
(9) 
and in the same manner (8) becomes
(10) 
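The restricted maximization in (9) and the estimator in (10) can be illustrated with a short scan over trimmed candidate points. The sketch below rests on several assumptions that are not taken from the displayed formulas: the data are Bernoulli, the divergence is the Kullback-Leibler divergence, the trimmed set is taken to be all k with k/n between `trim` and 1 - `trim`, and the scaling factor 2k(n-k)/n is assumed as the normalization. The function names are hypothetical.

```python
import numpy as np

def bernoulli_kl(p, q, eps=1e-12):
    """Kullback-Leibler divergence between Bernoulli(p) and Bernoulli(q).

    Probabilities are clipped away from 0 and 1 to keep the logarithms finite.
    """
    p = np.clip(p, eps, 1 - eps)
    q = np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def trimmed_divergence_scan(x, trim=0.1):
    """Maximize an (assumed) scaled divergence statistic over trimmed candidates.

    Returns (max statistic, k_hat), where k_hat is the number of observations
    placed before the estimated change point.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    ks = np.arange(1, n)
    ks = ks[(ks >= trim * n) & (ks <= (1 - trim) * n)]   # assumed trimmed set
    csum = np.cumsum(x)
    p1 = csum[ks - 1] / ks                        # MLE from the first k values
    p2 = (csum[-1] - csum[ks - 1]) / (n - ks)     # MLE from the remaining values
    stats = 2.0 * ks * (n - ks) / n * bernoulli_kl(p1, p2)  # assumed scaling
    best = int(np.argmax(stats))
    return float(stats[best]), int(ks[best])

# Usage: Bernoulli data whose success probability shifts after 60 observations.
rng = np.random.default_rng(1)
x = np.concatenate([rng.binomial(1, 0.2, 60), rng.binomial(1, 0.8, 60)])
stat, k_hat = trimmed_divergence_scan(x, trim=0.1)
print(stat, k_hat)
```

Trimming removes candidate points very close to either end of the sequence, where one of the two estimators is based on only a handful of observations.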
3 Main result
In order to obtain the asymptotic distribution of the family of test statistics , given in (6), we shall assume the usual regularity assumptions for the multiparameter Central Limit Theorem (see for instance Theorem 5.2.2 in Sen and Singer (1993)):
 (i)

The parameter space, , is either or a rectangle in .
 (ii)

For all ,
 (iii)

For ,
exist almost everywhere and are such that
where
 (iv)

Denoting ,
exist almost everywhere and are such that the Fisher information matrix is finite and positive definite. In addition, where
with and is the Euclidean norm.
Theorem 1
Proof. According to the properties of the MLEs we know that
where for , such that , , is the information matrix. If we consider that , then
This means that under , i.e. ,
and hence we can construct a Wald-type test statistic as follows
(13) 
where is any consistent estimator of . From Theorem 1 in Hawkins (1987) we know that
In addition from Pardo (2006), page 443, we have
where
Combining both results, we conclude (11).
Remark 2
If we compare (13) with formula (2.3) in Hawkins (1987), the two are apparently not equivalent, because in our case appears rather than of formula (2.3). This difference is due to the way the Fisher information matrix is defined; in fact, our Wald test statistic coincides with the empirical stochastic process denoted by at the beginning of Section 3 in Hawkins (1987).
Remark 3
The probability distribution function of random variable , for , given in (12), can be found in Sen (1981, page 397) and De Long (1981). The computation of the probability distribution function is complex, however it is possible to approximate the value of the test in which the distribution of is considered under the null hypothesis. In Estrella (2003), for instance,
(14) 
with being the Gamma function, is proposed as an approximation of
When calibrating the approximation for the univariate parameter (), we can take into account that the exact quantiles of order for , are , and respectively, i.e. , , . If we use (14) with and the aforementioned quantiles, we obtain , , . We can see that in particular, approximates very well when is the quantile of order , which is in practice of major interest.
4 Simulation Study
In this section we focus on change point analysis for a particular discrete probability model, the binomial model. For this special case we give an explicit expression for the divergence-based test statistics. Their accuracy is compared by simulation with that of preexisting test statistics. In this context, suppose we are dealing with a sequence of independent r.v.’s , , for which we are interested in testing (1). To do so, we consider a sequence of independent Bernoulli r.v.’s , , , whose probability mass function (p.m.f.) is given by , , and , . If we denote the cumulative steps between consecutive binomial r.v.’s by
the change points are located at for and at for . Hence, is the only sequence of r.v.’s which are strictly identically distributed, but the change points of interest are located in . This means that we can construct the test statistic by considering a sequence of i.i.d. r.v.’s, but in addition we restrict the set of possible change points to , rather than one-step change points. When the change point is located at , the MLEs of and are given by
The likelihood ratio test statistic is given by , where
(15) 
Two important papers which cover are Worsley (1983) and Horváth (1989). The expressions they gave for are not exactly the same, but they are equivalent to (15) (see formula (3.22) in Horváth and Serbinowska (1995)). Horváth (1989) found that a kind of normalization of based on the Darling-Erdős formula
is asymptotically distributed as an extreme value random variable with parameters and . In addition, in Theorem 1.2 of Horváth and Serbinowska (1995), a modified version of the likelihood ratio test statistic was given, , where
The asymptotic distribution of is the supremum in of a standard univariate Brownian bridge (its probability distribution function is tabulated in Kiefer (1959)). We consider the version of the Wald test statistic , with
where the consistent estimator of is given by
Finally, in order to give an explicit expression for the divergence-based test statistics, we focus on a family of divergences, the power divergences (see Read and Cressie (1988)), for which , if and , if , that is, for each we obtain a different divergence measure between the p.m.f.s and ,
When the power divergence coincides with the so-called Kullback divergence
and when the power divergence coincides with the modified Kullback divergence . Hence, the form of the power-divergence-based test statistics is , where
that is
(16) 
and
(17) 
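As an illustration of the power divergence family, the following sketch evaluates the divergence between two Bernoulli p.m.f.s for an arbitrary value of the parameter. It assumes the standard Read and Cressie (1988) parametrization, under which the limiting cases 0 and -1 give the Kullback divergence and the modified Kullback divergence respectively, and under which the value 1 gives half of Pearson's chi-square distance. The function names `power_divergence` and `bernoulli_pmf` are hypothetical, and no normalization from (16) or (17) is applied.

```python
import numpy as np

def bernoulli_pmf(theta):
    """P.m.f. of a Bernoulli(theta) variable as the vector (P(X=0), P(X=1))."""
    return np.array([1.0 - theta, theta])

def power_divergence(p, q, lam):
    """Cressie-Read power divergence between strictly positive p.m.f.s p and q.

    lam = 0 and lam = -1 are handled as the limiting Kullback and modified
    Kullback divergences (assumed parametrization).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if np.isclose(lam, 0.0):
        return float(np.sum(p * np.log(p / q)))
    if np.isclose(lam, -1.0):
        return float(np.sum(q * np.log(q / p)))
    return float(np.sum(p * ((p / q) ** lam - 1.0)) / (lam * (lam + 1.0)))

# Usage: divergence between the estimated models before and after a candidate
# change point, for several members of the family.
p_hat = bernoulli_pmf(0.3)
q_hat = bernoulli_pmf(0.6)
for lam in (-1.0, -0.5, 0.0, 2.0 / 3.0, 1.0):
    print(lam, power_divergence(p_hat, q_hat, lam))
```

The sketch requires both p.m.f.s to be strictly positive; estimated probabilities equal to 0 or 1 would need separate handling.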
Assuming that there is a monotone, continuous function such that and
the asymptotic distribution of and , for all , is the supremum in of the univariate tied-down Bessel process, i.e. (12) with . This assumption is very similar to the assumption given in Horváth and Serbinowska (1995) for the asymptotic distribution of .
A simulation study is performed in order to compare the accuracy of the proposed power divergence type test with that of preexisting test statistics. In this context we apply the test statistics , , , , , with replication. The design is essentially the same as that of the study performed in Horváth and Serbinowska (1995):