Partial quasi likelihood analysis
Summary The quasi likelihood analysis is generalized to the partial quasi likelihood analysis.
Limit theorems for the quasi likelihood estimators, especially the quasi Bayesian estimator, are derived
in a situation where the existence of a slow mixing component prevents the Rosenthal type inequality from being used to derive
the polynomial type large deviation inequality for the statistical random field.
We give two illustrative examples.
Keywords and phrases Partial quasi likelihood analysis, large deviation,
quasi maximum likelihood estimator, quasi Bayesian estimator, mixing, partial mixing.
1 Introduction
The Ibragimov-Has’minskii theory enhanced the asymptotic decision theory of Le Cam and Hájek through convergence of the likelihood ratio random field, and was programmed by Kutoyants for statistical inference for semimartingales. The core of the theory is the large deviation inequality for the associated likelihood ratio random field. Asymptotic properties of the likelihood estimators are deduced from those of the likelihood ratio random field. Precise estimates of the tail probability, and hence convergence of moments of the estimators, follow in a unified manner once such a strong mode of convergence of the likelihood ratio random field is established. For details, see Ibragimov and Has’minskii [IbragimovHascprimeminskiui1972, IbragimovHascprimeminskiui1973, IbragimovHascprimeminskiui1981] and Kutoyants [Kutoyants1984, Kutoyants1994, Kutoyants1998, Kutoyants2004].
The quasi likelihood analysis (QLA) descended from the Ibragimov-Has’minskii-Kutoyants program.
The QLA is a framework of statistical inference for stochastic processes. It features the polynomial type large deviation (PLD) of the quasi likelihood random field. Through QLA, one can systematically derive limit theorems and precise tail probability estimates of the associated QLA estimators such as the quasi maximum likelihood estimator (QMLE), the quasi Bayesian estimator (QBE) and various adaptive estimators. The importance of such precise estimates of tail probability is well recognized in asymptotic decision theory, prediction, theory of information criteria for model selection, asymptotic expansion, etc. The QLA is rapidly expanding the range of its applications: for example, sampled ergodic diffusion processes (Yoshida [yoshida2011polynomial]), contrast-based information criterion for diffusion processes (Uchida [Uchida2010]), approximate self-weighted LAD estimation of discretely observed ergodic Ornstein-Uhlenbeck processes (Masuda [Masuda2010a]), jump diffusion processes (Ogihara and Yoshida [OgiharaYoshida2011]), adaptive estimation for diffusion processes (Uchida and Yoshida [UchidaYoshida2012Adaptive]), adaptive Bayes type estimators for ergodic diffusion processes (Uchida and Yoshida [uchida2014adaptive]), asymptotic properties of the QLA estimators for volatility in regular sampling of finite time horizon (Uchida and Yoshida [UchidaYoshida2013]) and in nonsynchronous sampling (Ogihara and Yoshida [ogihara2014quasi]), Gaussian quasi-likelihood random fields for ergodic Lévy driven SDE (Masuda [masuda2013convergence]), hybrid multi-step estimators (Kamatani and Uchida [KamataniUchida2014]), parametric estimation of Lévy processes (Masuda [masuda2015parametric]), ergodic point processes for limit order book (Clinet and Yoshida [clinet2015statistical]), a nonergodic point process regression model (Ogihara and Yoshida [ogihara2015quasi]), threshold estimation for stochastic processes with small noise (Shimizu [shimizu2015threshold]), AIC for nonconcave penalized likelihood method (Umezu et al. [umezu2015aic]), Schwarz type model comparison for LAQ models (Eguchi and Masuda [eguchi2016schwarz]), adaptive Bayes estimators and hybrid estimators for small diffusion processes based on sampled data (Nomura and Uchida [nomura2016adaptive]), moment convergence of regularized least-squares estimator for linear regression model (Shimizu [shimizu2017moment]), moment convergence in regularized estimation under multiple and mixed-rates asymptotics (Masuda and Shimizu [masuda2017moment]), asymptotic expansion in quasi likelihood analysis for volatility (Yoshida [Yoshida2017asymptoticexpansion]) among others.
As already mentioned, the polynomial type large deviation (PLD) inequality is the key to the QLA. Once a PLD inequality is established, we can obtain a very strong mode of convergence of the random field and the associated estimators. However, the present theory assumes boundedness of high-order moments of functionals. If, for example, the statistical model has a component with a slow mixing rate, the Rosenthal inequality cannot be used to validate the boundedness of moments of very high order. How do QMLE and QBE behave in such a situation? This question motivates us to introduce the partial quasi likelihood analysis (PQLA).
The aim of this short note is to formulate the PQLA and to exemplify it. The basic idea is conditioning on partial information. An easily understood situation is one with two component processes, one with a fast mixing rate and the other with a slow mixing rate. Suppose that the Rosenthal inequality can control the moments of a functional of the fast component but cannot control the moments of a functional of the slow component. In this situation, we cannot apply the present QLA theory, or its way of deriving the PLD inequality, to random fields expressed in terms of both components. However, if there is a partial mixing structure, in the sense that the fast component possesses a very good mixing rate conditionally on the slow component, then we can apply a conditional version of the QLA theory given the slow component. Even if the slow component has a bad mixing rate and its temporal impact on the system is unbounded, it may still be possible to recover limit theorems for the QLA estimators. Technically, a method of truncation is essential to detach the slow mixing component’s effects from the main body of the randomness.
Partial QLA naturally emerges in the structure of the partial mixing. The notion of partial mixing was used in Yoshida [yoshida2004partial] to derive asymptotic expansion of the distribution of an additive functional of the conditional Markov process admitting a component with long-range dependency.
The organization of this note is as follows. Section 2 presents a framework for the partial quasi likelihood analysis. The asymptotic properties of the QMLE and QBE are provided there. The conditional polynomial type large deviation inequality is the key to the partial QLA. Section 3 gives a set of sufficient conditions for it. A conditional version of a Rosenthal type inequality is stated in Section 4. Section 5 illustrates a diffusion process having slow and fast mixing components. The statistical problem in Section 5 is ergodic, while a nonergodic statistical problem is discussed in Section 6.
2 Partial quasi likelihood analysis
2.1 Quasi likelihood analysis
Given a probability space , we consider a sequence of random fields , , where is a subset of with , is a bounded domain in and is its closure. We assume that is measurable and that the mapping is continuous for every . By convention, is simply denoted by .
The random field serves like the log likelihood function in the likelihood analysis, but does more. A measurable mapping is called a quasi maximum likelihood estimator (QMLE) if
for all . The mapping , the convex hull of , is defined by
and called the quasi Bayesian estimator (QBE) with respect to the prior density . We assume is continuous and satisfies . We call these estimators together quasi likelihood estimators.
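As a purely illustrative aside, the two estimators can be computed numerically in a toy model. The sketch below assumes the standard forms used in the QLA literature: the QMLE is a maximizer of the random field over the closed parameter set, and the QBE is the ratio of the integrals of theta * exp(H) * prior and exp(H) * prior over the parameter set. The Gaussian quasi-log-likelihood, the interval (-5, 5), the uniform prior, the grid approximation and all function names are assumptions made for this example only.

```python
import math
import random


def quasi_log_likelihood(theta, xs):
    """Toy Gaussian quasi-log-likelihood H_n(theta) = -0.5 * sum (x - theta)^2."""
    return -0.5 * sum((x - theta) ** 2 for x in xs)


def qmle_and_qbe(xs, lo=-5.0, hi=5.0, n_grid=2001):
    """Grid approximations of the QMLE and the QBE on [lo, hi].

    The prior density is assumed uniform on (lo, hi), so it cancels in the
    ratio defining the QBE.
    """
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]
    h = [quasi_log_likelihood(t, xs) for t in grid]
    h_max = max(h)
    # QMLE: a maximizer of H_n over the (discretized) closed parameter set.
    qmle = grid[h.index(h_max)]
    # QBE: ratio of Riemann sums for the integrals of theta * exp(H_n) and exp(H_n).
    # Subtracting h_max avoids underflow and cancels between the two sums.
    w = [math.exp(v - h_max) for v in h]
    qbe = sum(t * wi for t, wi in zip(grid, w)) / sum(w)
    return qmle, qbe


rng = random.Random(0)
xs = [1.0 + rng.gauss(0.0, 1.0) for _ in range(200)]
qmle, qbe = qmle_and_qbe(xs)
sample_mean = sum(xs) / len(xs)
```

In this toy model the random field is an exact quadratic in theta, so both estimators should agree with the sample mean up to the grid resolution; in general the QMLE and the QBE differ.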
The quasi likelihood analysis (QLA) is formulated with the random field
Here is the target value of in estimation and , where . The matrix satisfies as . It is possible to extend to so that the extension has a compact support and . We denote this extended random field by the same . Let . Then .
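For orientation, the following display sketches how this rescaling usually looks in the QLA literature (for example [yoshida2011polynomial]) and how the normalized estimators are expressed through the rescaled field. All symbols are assumptions chosen for the sketch: \mathbb{H}_T for the random field, \theta^* for the target value, a_T for the scaling matrix, \varpi for the prior density, \mathsf{p} for the dimension of the parameter, and \hat\theta^M_T, \hat\theta^B_T for the QMLE and QBE.

```latex
% Assumed standard form of the rescaled random field and its domain.
\[
  \mathbb{Z}_T(u)
  = \exp\bigl\{ \mathbb{H}_T(\theta^* + a_T u) - \mathbb{H}_T(\theta^*) \bigr\},
  \qquad
  u \in \mathbb{U}_T = \{ u \in \mathbb{R}^{\mathsf{p}} : \theta^* + a_T u \in \Theta \}.
\]
% Assumed defining relation of the QBE with prior density \varpi.
\[
  \hat\theta^B_T
  = \frac{\int_\Theta \theta \exp\{\mathbb{H}_T(\theta)\}\, \varpi(\theta)\, d\theta}
         {\int_\Theta \exp\{\mathbb{H}_T(\theta)\}\, \varpi(\theta)\, d\theta}.
\]
% Divide numerator and denominator by exp{H_T(theta^*)} and substitute
% theta = theta^* + a_T u; the Jacobian |det a_T| cancels in the ratio.
\[
  \hat u^B_T := a_T^{-1}\bigl(\hat\theta^B_T - \theta^*\bigr)
  = \frac{\int_{\mathbb{U}_T} u\, \mathbb{Z}_T(u)\, \varpi(\theta^* + a_T u)\, du}
         {\int_{\mathbb{U}_T} \mathbb{Z}_T(u)\, \varpi(\theta^* + a_T u)\, du}.
\]
% Likewise, \hat u^M_T := a_T^{-1}(\hat\theta^M_T - \theta^*) maximizes \mathbb{Z}_T,
% because u \mapsto \theta^* + a_T u is a bijection and exp is increasing.
```

Under these assumptions, limit theorems for both estimators reduce to the behavior of the single random field \mathbb{Z}_T, which is why its convergence and tail control are the central objects of the analysis.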
Consider fields and such that . We introduce measurable variables . These functionals are helpful to localize QLA.
2.2 Quasi maximum likelihood estimator
Let be a positive constant. We start with the so-called polynomial type large deviation inequality, which plays an essential role in the theory of QLA as in [yoshida2011polynomial]. Let . Let for . The modulus of continuity of is
Let and let
Let . Let be the set of sequences of numbers in such that for all and . Let be a sequence of valued measurable random variables.
[A1] There exists a sequence of positive measurable random variables such that a.s. and that
for every .
[A1] For a sequence of positive numbers with and a sequence of positive random variables with a.s., it holds that
for every .
[A2] as for every , and .
Remark 2.1.
The estimate of the modulus of continuity is used only countably many times to prove tightness.
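To indicate why an inequality of this type is so powerful, the display below gives a schematic, unconditional analogue as it commonly appears in the QLA literature, together with the moment bound it yields for a maximizer. The threshold e^{-r}, the constant C_L, the symbols \mathbb{Z}_T, \mathbb{U}_T, \hat u_T and the assumption 0 \in \mathbb{U}_T are assumptions of this sketch; the conditions of this note are conditional versions in which the probability is conditional and the bound is a random variable.

```latex
% Schematic unconditional PLD inequality (illustrative form, not the exact condition of this note).
\[
  P\Bigl[ \sup_{u \in \mathbb{U}_T,\ |u| \ge r} \mathbb{Z}_T(u) \ge e^{-r} \Bigr]
  \le \frac{C_L}{r^L}
  \qquad (r > 0,\ T \in \mathbb{T}).
\]
% Since Z_T(0) = 1, a maximizer \hat u_T satisfies Z_T(\hat u_T) >= 1 >= e^{-r}.
% Hence |\hat u_T| >= r forces the supremum over |u| >= r to be at least e^{-r}:
\[
  P\bigl[ |\hat u_T| \ge r \bigr]
  \le P\Bigl[ \sup_{u \in \mathbb{U}_T,\ |u| \ge r} \mathbb{Z}_T(u) \ge e^{-r} \Bigr]
  \le \frac{C_L}{r^L}.
\]
% Integrating the tail bound gives uniform boundedness of moments of order p < L.
\[
  E\bigl[ |\hat u_T|^p \bigr]
  = \int_0^\infty p\, r^{p-1}\, P\bigl[ |\hat u_T| \ge r \bigr]\, dr
  \le 1 + p\, C_L \int_1^\infty r^{p-1-L}\, dr < \infty
  \qquad (L > p).
\]
```

The bound is uniform in T, so convergence in distribution of the normalized estimator upgrades to convergence of its moments of order below L.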
We consider and its extension , that is,
, and .
Let be a valued random variable defined on
.
[A3] (i) For any , , and any bounded measurable random variable ,
 (ii)

.
[A4] With probability one, there exists a unique element that maximizes .
Remark: From [A3] (i), we can remove but keeping it explicitly is helpful in applications. We may assume is measurable; the given mapping has a measurable version. The following theorems claim conditional stable convergence of and .
Theorem 2.2.
Suppose that , and are satisfied. Then
(2.1) 
for any and any bounded measurable random variable . In particular,
as .
Theorem 2.3.
Suppose that , , and are satisfied. Then
(2.2) 
for any and any bounded measurable random variable . In particular,
as .
Proof of Theorems 2.2 and 2.3. (a) We may assume that . Let
In view of [A3] (ii), we may show
(2.3) 
for in order to show (2.1). Then, by subsequence argument, it suffices to show that for any sequence with , there exists a subsequence of such that (2.3) holds along . For , let be a countable subset of that determines probability measures on .
Let be a regular conditional distribution of on given . Let be a regular conditional distribution of given . Moreover let
for , and let
According to (ii) and , there exists a subsequence of such that a.s. and that
for all . Moreover, from , for , there exists an such that for all . Then
Thus, thanks to [A1], [A2] and [A3], there exist an event with and a subsequence of such that for any , the following conditions hold:
 (i)

 (ii)

 (iii)

For every ,
as for all .
 (iv)

as .
 (v)

for all .
For , and , there exist such that
and
where
and
respectively. Let . Then is a compact set in and . Therefore the family of probability measures is tight since . Let be any subsequence of . Then there exist a subsequence of , depending on , and a probability measure on such that as . In particular,
as for every and every , . Therefore
for all , . Since all finitedimensional marginal distributions coincide, . This implies as , and hence
(2.4) 
for every .
In particular, we obtain (2.3) along , which gives
Theorem 2.2.
(b) We may assume .
Consider sequences and in Step (a).
Let with
where for . If satisfies , then the already obtained (2.4) yields
(2.5) 
as , where
and 
for . We notice that as well as is a probability measure by (v) of Part (a). The convergence (2.5) gives
for all , and hence
(2.6)  
for all , since this is obvious when . By definition,
and
Therefore (2.6) implies
(2.7)  
for all , and
(2.8)  
for all , where means for all , and we used uniqueness of in the last part of each.
Denote by [resp. ] a regular conditional probability of [resp. ] given . From (2.7) and (2.8), there exists with such that
for all and all , and that
for all and all . If , then
(2.9)  
for all and all , where the probability measures and on are given by
for and . For any continuity point of , we take with so that both are sufficiently close to , and apply (2.9) to conclude for such . Thus
(2.10) 
for , with . In the case , it is obvious, so (2.10) holds for all . This concludes the proof of Theorem 2.3. ∎
Conditional type PLD provides convergence of the conditional moments of under truncation.
Theorem 2.4.
Remark 2.5.
The conditional expectation of a random variable is defined as the integral with respect to a regular conditional probability of given . If , then it coincides with the ordinary conditional expectation almost surely. However in general we do not assume nor in this article. We should be careful when applying the formula ; it is possible only when is well defined. The same remark applies to . On the other hand, each is bounded because is bounded, so in (2.11) is well defined in any sense.
Proof of Theorem 2.4. Let . We may assume along . Almost surely
where is a random variable bounding the right-hand side of the inequality of . The variable a.s. because . By the convergence (2.2) of Theorem 2.3, we have
for and some sequence , and then the conditional monotone convergence theorem gives
by letting