Partial quasi likelihood analysis

1. This work was in part supported by CREST JPMJCR14D7 Japan Science and Technology Agency; Japan Society for the Promotion of Science Grants-in-Aid for Scientific Research No. 17H01702 (Scientific Research); and by a Cooperative Research Program of the Institute of Statistical Mathematics.



Summary  The quasi likelihood analysis is generalized to the partial quasi likelihood analysis. Limit theorems for the quasi likelihood estimators, especially the quasi Bayesian estimator, are derived in the situation where the existence of a slow mixing component prevents the Rosenthal type inequality from applying to the derivation of the polynomial type large deviation inequality for the statistical random field. We give two illustrative examples.  
 
Keywords and phrases  Partial quasi likelihood analysis, large deviation, quasi maximum likelihood estimator, quasi Bayesian estimator, mixing, partial mixing.  

1 Introduction

The Ibragimov-Has’minskii theory enhanced the asymptotic decision theory of Le Cam and Hájek through convergence of the likelihood ratio random field, and was programmed by Kutoyants into statistical inference for semimartingales. The core of the theory is the large deviation inequality for the associated likelihood ratio random field. Asymptotic properties of the likelihood estimators are deduced from those of the likelihood ratio random field. Precise estimates of the tail probability, and hence convergence of moments of the estimators, follow in a unified manner once such a strong mode of convergence of the likelihood ratio random field is established. For details, see Ibragimov and Has’minskii [IbragimovHascprimeminskiui1972, IbragimovHascprimeminskiui1973, IbragimovHascprimeminskiui1981] and Kutoyants [Kutoyants1984, Kutoyants1994, Kutoyants1998, Kutoyants2004].

The quasi likelihood analysis (QLA) descended from the Ibragimov-Has’minskii-Kutoyants program. In Yoshida [yoshida2011polynomial], it was shown that a polynomial type large deviation (PLD) inequality universally follows from certain separation of the random field, such as the local asymptotic quadraticity of the random field, and estimates of easily tractable random variables. Since the PLD inequality is no longer a bottleneck of the program, the QLA applies to various complex random fields.
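In its prototypical form, the PLD inequality bounds the tail of the normalized random field polynomially in the radius; schematically (the notation $\mathbb{Z}_n$, $\mathbb{U}_n$ follows [yoshida2011polynomial], and the exponents shown are only indicative):

```latex
% Prototypical polynomial type large deviation (PLD) inequality:
% for every L>0 there exists a constant C_L, independent of n, such that
P\Bigl[\,\sup_{u\in\mathbb{U}_n,\ |u|\ge r}\ \mathbb{Z}_n(u)\ \ge\ e^{-r}\Bigr]
\ \le\ \frac{C_L}{r^{L}}
\qquad (r>0,\ n\in\mathbb{N}).
```

Bounds of this type yield uniform integrability of polynomials of the normalized estimation error, and hence convergence of moments of the estimators.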

The QLA is a framework of statistical inference for stochastic processes. It features the polynomial type large deviation of the quasi likelihood random field. Through QLA, one can systematically derive limit theorems and precise tail probability estimates of the associated QLA estimators such as quasi maximum likelihood estimator (QMLE), quasi Bayesian estimator (QBE) and various adaptive estimators. The importance of such precise estimates of tail probability is well recognized in asymptotic decision theory, prediction, theory of information criteria for model selection, asymptotic expansion, etc. The QLA is rapidly expanding the range of its applications: for example, sampled ergodic diffusion processes (Yoshida [yoshida2011polynomial]), contrast-based information criterion for diffusion processes (Uchida [Uchida2010]), approximate self-weighted LAD estimation of discretely observed ergodic Ornstein-Uhlenbeck processes (Masuda [Masuda2010a]), jump diffusion processes (Ogihara and Yoshida [OgiharaYoshida2011]), adaptive estimation for diffusion processes (Uchida and Yoshida [UchidaYoshida2012Adaptive]), adaptive Bayes type estimators for ergodic diffusion processes (Uchida and Yoshida [uchida2014adaptive]), asymptotic properties of the QLA estimators for volatility in regular sampling of finite time horizon (Uchida and Yoshida [UchidaYoshida2013]) and in non-synchronous sampling (Ogihara and Yoshida [ogihara2014quasi]), Gaussian quasi-likelihood random fields for ergodic Lévy driven SDE (Masuda [masuda2013convergence]), hybrid multi-step estimators (Kamatani and Uchida [KamataniUchida2014]), parametric estimation of Lévy processes (Masuda [masuda2015parametric]), ergodic point processes for limit order book (Clinet and Yoshida [clinet2015statistical]), a non-ergodic point process regression model (Ogihara and Yoshida [ogihara2015quasi]), threshold estimation for stochastic processes with small noise (Shimizu [shimizu2015threshold]), AIC for non-concave penalized 
likelihood method (Umezu et al. [umezu2015aic]), Schwarz type model comparison for LAQ models (Eguchi and Masuda [eguchi2016schwarz]), adaptive Bayes estimators and hybrid estimators for small diffusion processes based on sampled data (Nomura and Uchida [nomura2016adaptive]), moment convergence of regularized least-squares estimator for linear regression model (Shimizu [shimizu2017moment]), moment convergence in regularized estimation under multiple and mixed-rates asymptotics (Masuda and Shimizu [masuda2017moment]), asymptotic expansion in quasi likelihood analysis for volatility (Yoshida [Yoshida2017asymptoticexpansion]) among others.

As already mentioned, the PLD inequality is the key to the QLA. Once a PLD inequality is established, we can obtain a very strong mode of convergence of the random field and the associated estimators. However, the present theory assumes boundedness of high-order moments of certain functionals. If, for example, the statistical model has a component with a slow mixing rate, the Rosenthal inequality does not serve to validate the boundedness of moments of very high order. How do the QMLE and QBE behave in such a situation? This question motivates us to introduce the partial quasi likelihood analysis (PQLA).
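For orientation, a Rosenthal type inequality for a sum of centered random variables has, in schematic form (constants and the dependence on the mixing rate are suppressed), the shape

```latex
% Schematic Rosenthal type bound for a centered sum; C_p depends on p
% and, in the mixing case, on the decay rate of the mixing coefficients.
E\Bigl|\sum_{i=1}^{n} X_i\Bigr|^{p}
\ \le\ C_{p}\Bigl\{\sum_{i=1}^{n} E|X_i|^{p}
\ +\ \Bigl(\sum_{i=1}^{n} E|X_i|^{2}\Bigr)^{p/2}\Bigr\}.
```

For mixing sequences, such a bound is available only when the mixing coefficients decay sufficiently fast relative to the order $p$; a merely polynomial mixing rate caps the admissible $p$, which is precisely the obstruction to the PLD inequality described above.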

The aim of this short note is to formulate the PQLA and to exemplify it. The basic idea is conditioning by partial information. Easiest to understand is a situation where the stochastic process has two components, one with a fast mixing rate and the other with a slow mixing rate. Suppose that the Rosenthal inequality can control the moments of a functional of the fast component but cannot control the moments of a functional involving the slow component. In this situation, we can apply neither the present QLA theory nor the existing derivation of the PLD inequality to random fields expressed in terms of both components. However, if there is a partial mixing structure, in that the fast component possesses a very good mixing rate conditionally on the slow component, then we can apply a conditional version of the QLA theory given the slow component. Even if the slow component has a bad mixing rate and its temporal impact on the system is unbounded, there is a possibility that we can recover limit theorems for the QLA estimators. Technically, a method of truncation is essential to detach the slow mixing component’s effects from the main body of the randomness.
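As a schematic illustration (the symbols $X$, $Y$, $a$, $b$ below are ours and are not taken from the models of Sections 5 and 6), one may keep in mind a two-component system

```latex
% X: fast component, observed state; Y: slowly mixing component; W: Wiener process
dX_t \;=\; a(X_t,\,Y_t,\,\theta)\,dt \;+\; b(X_t,\,Y_t)\,dW_t,
```

in which $X$ mixes rapidly conditionally on the path of $Y$, while $Y$ itself mixes only slowly. Conditioning on $Y$ restores the fast mixing needed for the moment bounds, at the price of working with conditional versions of the limit theorems.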

Partial QLA naturally emerges in the structure of partial mixing. The notion of partial mixing was used in Yoshida [yoshida2004partial] to derive an asymptotic expansion of the distribution of an additive functional of a conditional -Markov process admitting a component with long-range dependence.

The organization of this note is as follows. Section 2 presents a framework of the partial quasi likelihood analysis; the asymptotic properties of the QMLE and QBE are provided there. The conditional polynomial type large deviation inequality is the key to the partial QLA, and Section 3 gives a set of sufficient conditions for it. A conditional version of a Rosenthal type inequality is stated in Section 4. Section 5 illustrates a diffusion process having slow and fast mixing components. The statistical problem of Section 5 is ergodic, while a non-ergodic statistical problem will be discussed in Section 6.

2 Partial quasi likelihood analysis

2.1 Quasi likelihood analysis

Given a probability space , we consider a sequence of random fields , , where is a subset of with , is a bounded domain in and is its closure. We assume that is -measurable and that the mapping is continuous for every . By convention, is simply denoted by .

The random field serves like the log likelihood function in the likelihood analysis, but does more. A measurable mapping is called a quasi maximum likelihood estimator (QMLE) if

for all . The mapping , the convex hull of , is defined by

and called the quasi Bayesian estimator (QBE) with respect to the prior density . We assume is continuous and satisfies . We call these estimators together quasi likelihood estimators.
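In the standard QLA notation (cf. [yoshida2011polynomial]; we write $\mathbb{H}_n$ for the random field and $\varpi$ for the prior density, as a schematic reconstruction rather than the paper's exact display), the two estimators are characterized by

```latex
% quasi maximum likelihood estimator (QMLE):
\mathbb{H}_n\bigl(\hat\theta^{M}_n\bigr)\;=\;\max_{\theta\in\bar\Theta}\,\mathbb{H}_n(\theta),
\\[1ex]
% quasi Bayesian estimator (QBE) with respect to the prior density \varpi:
\hat\theta^{B}_n\;=\;
\biggl[\int_{\Theta}\exp\bigl(\mathbb{H}_n(\theta)\bigr)\,\varpi(\theta)\,d\theta\biggr]^{-1}
\int_{\Theta}\theta\,\exp\bigl(\mathbb{H}_n(\theta)\bigr)\,\varpi(\theta)\,d\theta .
```

The QBE takes values in the convex hull of $\Theta$, which is why its defining mapping is extended there.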

The quasi likelihood analysis (QLA) is formulated with the random field

Here is the target value of in estimation and , where . The matrix satisfies as . It is possible to extend to so that the extension has a compact support and . We denote this extended random field by the same . Let . Then .
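With $\theta^{*}$ denoting the target value and $a_n$ the normalizing matrices, the random field underlying the QLA is, in the usual notation (a sketch consistent with [yoshida2011polynomial]),

```latex
% quasi likelihood ratio random field on the local parameter space U_n
\mathbb{Z}_n(u)\;=\;\exp\Bigl\{\mathbb{H}_n\bigl(\theta^{*}+a_n u\bigr)-\mathbb{H}_n\bigl(\theta^{*}\bigr)\Bigr\},
\qquad u\in\mathbb{U}_n=\bigl\{u:\ \theta^{*}+a_n u\in\Theta\bigr\}.
```

Then the normalized error $a_n^{-1}(\hat\theta^{M}_n-\theta^{*})$ maximizes $\mathbb{Z}_n$, so asymptotics of the quasi likelihood estimators reduce to asymptotics of the random field $\mathbb{Z}_n$.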

Consider -fields and such that . We introduce -measurable variables . These functionals are helpful to localize QLA.

2.2 Quasi maximum likelihood estimator

Let be a positive constant. We start with the so-called polynomial type large deviation inequality, which plays an essential role in the theory of QLA as in [yoshida2011polynomial]. Let . Let for . The modulus of continuity of is

Let and let

Let . Let be the set of sequences of numbers in such that for all and . Let be a sequence of -valued -measurable random variables.

[A1

] There exists a sequence of positive -measurable random variables such that a.s. and that

for every .

[A1

] For a sequence of positive numbers with and a sequence of positive random variables with a.s., it holds that

for every .

[A2

] as for every , and .

Remark 2.1.

The estimate of the modulus of continuity is used only countably many times to prove tightness.

We consider and its extension , that is, , and . Let be a -valued random variable defined on .

[A3

] (i) For any , , and any bounded -measurable random variable ,

(ii)

.

[A4

] With probability one, there exists a unique element that maximizes .

Remark: From [A3] (i), we can remove but keeping it explicitly is helpful in applications. We may assume is -measurable; the given mapping has a measurable version. The following theorems claim -conditional -stable convergence of and .

Theorem 2.2.

Suppose that , and are satisfied. Then

(2.1)

for any and any bounded -measurable random variable . In particular,

as .

Theorem 2.3.

Suppose that , , and are satisfied. Then

(2.2)

for any and any bounded -measurable random variable . In particular,

as .

Proof of Theorems 2.2 and 2.3. (a) We may assume that . Let

In view of [A3] (ii), we may show

(2.3)

for in order to show (2.1). Then, by a subsequence argument, it suffices to show that for any sequence with , there exists a subsequence of such that (2.3) holds along . For , let be a countable subset of that determines probability measures on .

Let be a regular conditional distribution of on given . Let be a regular conditional distribution of given . Moreover let

for , and let

According to (ii) and , there exists a subsequence of such that a.s. and that

for all . Moreover, from , for , there exists an such that for all . Then

Thus, thanks to [A1], [A2] and [A3], there exist an event with and a subsequence of such that for any , the following conditions hold:

(i)

(ii)

(iii)

For every ,

as for all .

(iv)

as .

(v)

for all .

For , and , there exist such that

and

where

and

respectively. Let . Then is a compact set in and . Therefore the family of probability measures is tight since . Let be any subsequence of . Then there exist a subsequence of , depending on , and a probability measure on such that as . In particular,

as for every and every , . Therefore

for all , . Since all finite-dimensional marginal distributions coincide, . This implies as , and hence

(2.4)

for every . In particular, we obtain (2.3) along , which gives Theorem 2.2.
(b) We may assume . Consider sequences and in Step (a). Let with

where for . If satisfies , then the already obtained (2.4) yields

(2.5)

as , where

and

for . We notice that as well as is a probability measure by (v) of Part (a). The convergence (2.5) gives

for all , and hence

(2.6)

for all , since this is obvious when . By definition,

and

Therefore (2.6) implies

(2.7)

for all , and

(2.8)

for all , where means for all , and we used uniqueness of in the last part of each.

Denote by [resp. ] a regular conditional probability of [resp. ] given . From (2.7) and (2.8), there exists with such that

for all and all , and that

for all and all . If , then

(2.9)

for all and all , where the probability measures and on are given by

for and . For any continuity point of , we take with so that both are sufficiently close to , and apply (2.9) to conclude for such . Thus

(2.10)

for , with . In the case , it is obvious, so (2.10) holds for all . This concludes the proof of Theorem 2.3. ∎

The conditional type PLD inequality provides convergence of the conditional moments of under truncation.

Theorem 2.4.

Suppose that and . Suppose that , , and are satisfied. Then (2.1) of Theorem 2.2 holds. Moreover

(2.11)

for any bounded -measurable random variable and any such that . In particular, as .

Remark 2.5.

The conditional expectation of a random variable is defined as the integral with respect to a regular conditional probability of given . If , then it coincides with the ordinary conditional expectation almost surely. However in general we do not assume nor in this article. We should be careful when applying the formula ; it is possible only when is well defined. The same remark applies to . On the other hand, each is bounded because is bounded, so in (2.11) is well defined in any sense.

Proof of Theorem 2.4. Let . We may assume along . Almost surely

where is a random variable bounding the right-hand side of the inequality of . The variable a.s. because . By the convergence (2.2) of Theorem 2.3, we have

for and some sequence , and then the conditional monotone convergence theorem gives

by letting