Asymptotic analysis for RCBAR processes

# Asymptotic results for random coefficient bifurcating autoregressive processes

Vassili Blandin, Université Bordeaux 1, Institut de Mathématiques de Bordeaux, UMR CNRS 5251, and INRIA Bordeaux, team ALEA, 351 cours de la libération, 33405 Talence cedex, France.
###### Abstract.

The purpose of this paper is to study the asymptotic behavior of the weighted least squares estimators of the unknown parameters of random coefficient bifurcating autoregressive processes. Under suitable assumptions on the immigration and the inheritance, we establish the almost sure convergence of our estimators, as well as a quadratic strong law and central limit theorems. Our study mostly relies on limit theorems for vector-valued martingales.

###### Key words and phrases:
bifurcating autoregressive process; random coefficient; weighted least squares; martingale; almost sure convergence; central limit theorem
###### 2010 Mathematics Subject Classification:
Primary 60F15; Secondary 60F05, 60G42

## 1. Introduction

In this paper, we study random coefficient bifurcating autoregressive (RCBAR) processes. These processes are an adaptation of random coefficient autoregressive (RCAR) processes to binary tree structured data. They can also be seen as a combination of RCAR processes and bifurcating autoregressive (BAR) processes. RCAR processes were first studied by Nicholls and Quinn [18, 19], while BAR processes were first investigated by Cowan and Staudte [5]. Both inherited and environmental effects are taken into consideration in RCBAR processes in order to explain the evolution of the characteristic under study. The binary tree structure naturally suggests cell division as an example.

More precisely, the first-order RCBAR process is defined as follows. The initial cell is labelled $1$ and the offspring of the cell labelled $n$ are labelled $2n$ and $2n+1$. Denote by $X_n$ the characteristic of individual $n$. Then, the first-order RCBAR process is given, for all $n \geq 1$, by

$$
\begin{cases}
X_{2n} = a_n X_n + \varepsilon_{2n}, \\
X_{2n+1} = b_n X_n + \varepsilon_{2n+1}.
\end{cases}
$$

The environmental effect is given by the driven noise sequence $(\varepsilon_{2n}, \varepsilon_{2n+1})$ while the inherited effect is given by the random coefficient sequence $(a_n, b_n)$. The cell division example leads us to consider that $\varepsilon_{2n}$ and $\varepsilon_{2n+1}$ are correlated, since the environmental effects on two sister cells can reasonably be seen as correlated.

This study is inspired by experiments on the single-celled organism Escherichia coli, see Stewart et al. [21] or Guyon et al. [10], which reproduces by dividing itself into two poles, one being called the new pole, the other being called the old pole. Experimental data seem to show that some variables among cell lines, such as the life span of the cells, do not evolve in the same way for the new pole and for the old pole. This difference in evolution leads us to consider an asymmetric RCBAR process. Considering an RCBAR process instead of a BAR process allows us to assume that the inherited effect is no longer deterministic, as randomness often appears in nature. Moreover, since we also allow the random variables modeling the inherited effect to be deterministic, both deterministic and random inherited effects are covered, which makes this study usable for RCBAR as well as BAR processes.

This paper, which is an adaptation of [4] to RCBAR processes, studies the asymptotic behavior of the weighted least squares (WLS) estimators of first-order RCBAR processes using a martingale approach. This martingale approach was first proposed by Bercu et al. [3] and de Saporta et al. [6] for BAR processes. The WLS estimation of the parameters of branching processes was previously investigated by Wei and Winnicki [24] and Winnicki [25]. We will repeatedly make use of the strong law of large numbers [8] as well as the central limit theorem [8, 11] for martingales, in order to investigate the asymptotic behavior of the WLS estimators. Those theorems have been previously used by Basawa and Zhou [2, 26, 27].

Several approaches have appeared for BAR processes, and we tried not to set aside any of them. Thus, we took into account the classical BAR studies as seen in Huggins and Basawa [13, 14] and Huggins and Staudte [15], who studied the evolution of cell diameters and lifetimes, and also the bifurcating Markov chain model introduced by Guyon [9] and used in Delmas and Marsalle [7]. We also considered the analogy with Galton-Watson processes as studied in Delmas and Marsalle [7] and Heyde and Seneta [12]. Several methods have also been used for parameter estimation in RCAR processes. Koul and Schick [17] used an M-estimator while Aue et al. [1] preferred a quasi-maximum likelihood approach. Schick [20] introduced a new class of estimators that Vanecek [22] used in his work. Hwang et al. [16] also tackled the critical case where the environmental effect follows a Rademacher distribution.

The paper is organized as follows. Section 2 describes more precisely the model we are interested in, then Section 3 formulates the WLS estimators of the unknown parameters we will study. Section 4 introduces the martingale point of view of this paper. The main results are collected in Section 5. They concern the asymptotic behavior of our WLS estimators: more precisely, we establish the almost sure convergence, the quadratic strong law and the asymptotic normality of our estimators. Finally, the remaining sections gather the proofs of our main results, except the last section, which illustrates our results with a small simulation study.

## 2. Random coefficient bifurcating autoregressive processes

Consider the first-order RCBAR process given, for all $n \geq 1$, by

$$
\tag{2.1}
\begin{cases}
X_{2n} = a_n X_n + \varepsilon_{2n}, \\
X_{2n+1} = b_n X_n + \varepsilon_{2n+1},
\end{cases}
$$

where the initial state $X_1$ is the ancestor of the process and $(\varepsilon_{2n}, \varepsilon_{2n+1})$ stands for the driven noise of the process. In all the sequel, we shall assume that . We also assume that both $(a_n, b_n)$ and $(\varepsilon_{2n}, \varepsilon_{2n+1})$ are i.i.d., and that those two sequences are independent. One can see the RCBAR process given by (2.1) as a first-order random coefficient autoregressive process on a binary tree, where each node represents an individual, node 1 being the original ancestor. For all $n \geq 0$, denote the $n$-th generation by $\mathbb{G}_n = \{2^n, 2^n + 1, \ldots, 2^{n+1} - 1\}$. In particular, $\mathbb{G}_0 = \{1\}$ is the initial generation and $\mathbb{G}_1 = \{2, 3\}$ is the first generation of offspring from the first ancestor. Recall that the two offspring of individual $n$ are labelled $2n$ and $2n+1$, or conversely, the mother of individual $n$ is $[n/2]$ where $[x]$ stands for the largest integer less than or equal to $x$. Finally denote by

$$
\mathbb{T}_n = \bigcup_{k=0}^{n} \mathbb{G}_k
$$

the sub-tree of all individuals from the original individual up to the $n$-th generation. One can observe that the cardinality $|\mathbb{G}_n|$ of $\mathbb{G}_n$ is $2^n$ while that of $\mathbb{T}_n$ is $|\mathbb{T}_n| = 2^{n+1} - 1$.
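The labelling and the recursion (2.1) can be illustrated by a minimal simulation sketch. The helper names `generation`, `mother` and `simulate_rcbar`, as well as the particular distributions chosen for the coefficients and the noise, are hypothetical choices made for illustration only and do not come from the paper.

```python
import random

def generation(n):
    """Labels of the n-th generation G_n = {2^n, ..., 2^(n+1) - 1}."""
    return range(2 ** n, 2 ** (n + 1))

def mother(k):
    """Label of the mother of individual k >= 2, i.e. floor(k / 2)."""
    return k // 2

def simulate_rcbar(n_gen, draw_coeffs, draw_noise, x1=1.0):
    """Simulate X_k for every k in T_{n_gen} following (2.1).

    X[0] is unused padding so that X[k] is the characteristic of individual k.
    """
    X = [0.0] * (2 ** (n_gen + 1))
    X[1] = x1
    for n in range(n_gen):
        for k in generation(n):
            a_k, b_k = draw_coeffs()
            e_even, e_odd = draw_noise()
            X[2 * k] = a_k * X[k] + e_even
            X[2 * k + 1] = b_k * X[k] + e_odd
    return X

rng = random.Random(0)

def draw_coeffs():
    # Hypothetical choice: uniform coefficients, so that E[a^2] < 1 and E[b^2] < 1.
    return rng.uniform(0.2, 0.6), rng.uniform(0.1, 0.5)

def draw_noise():
    # Hypothetical choice: sister noises correlated through a shared Gaussian term.
    shared = rng.gauss(0.0, 0.5)
    return 1.0 + shared + rng.gauss(0.0, 0.5), 0.5 + shared + rng.gauss(0.0, 0.5)

X = simulate_rcbar(10, draw_coeffs, draw_noise)
```

Storing the tree in a flat list indexed by the label keeps the mother/offspring relations as plain integer arithmetic, which mirrors the notation $2n$, $2n+1$ and $[n/2]$ used above.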

## 3. Weighted least-squares estimation

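Denote by $\mathbb{F} = (\mathcal{F}_n)_{n \geq 0}$ the natural filtration associated with the first-order RCBAR process, which means that $\mathcal{F}_n$ is the $\sigma$-algebra generated by all individuals up to the $n$-th generation, in other words $\mathcal{F}_n = \sigma\{X_k, k \in \mathbb{T}_n\}$. We will assume in all the sequel that, for all $n \geq 0$ and for all $k \in \mathbb{G}_n$,

$$
\tag{3.1}
\begin{cases}
\mathbb{E}[a_k | \mathcal{F}_n] = a & \text{a.s.} \\
\mathbb{E}[b_k | \mathcal{F}_n] = b & \text{a.s.} \\
\mathbb{E}[\varepsilon_{2k} | \mathcal{F}_n] = c & \text{a.s.} \\
\mathbb{E}[\varepsilon_{2k+1} | \mathcal{F}_n] = d & \text{a.s.}
\end{cases}
$$

Consequently, we deduce from (2.1) and (3.1) that, for all $n \geq 0$ and for all $k \in \mathbb{G}_n$,

$$
\tag{3.2}
\begin{cases}
X_{2k} = a X_k + c + V_{2k}, \\
X_{2k+1} = b X_k + d + V_{2k+1},
\end{cases}
$$

where $V_{2k} = X_{2k} - \mathbb{E}[X_{2k} | \mathcal{F}_n] = (a_k - a) X_k + (\varepsilon_{2k} - c)$ and $V_{2k+1} = X_{2k+1} - \mathbb{E}[X_{2k+1} | \mathcal{F}_n] = (b_k - b) X_k + (\varepsilon_{2k+1} - d)$. Therefore, the two relations given by (3.2) can be rewritten in a classic autoregressive form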

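$$
\tag{3.3}
\chi_n = \theta^t \Phi_n + W_n
$$

where

$$
\chi_n = \begin{pmatrix} X_{2n} \\ X_{2n+1} \end{pmatrix}, \qquad
\Phi_n = \begin{pmatrix} X_n \\ 1 \end{pmatrix}, \qquad
W_n = \begin{pmatrix} V_{2n} \\ V_{2n+1} \end{pmatrix},
$$

and the matrix parameter

$$
\theta = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.
$$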

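Our goal is to estimate $\theta$ from the observation of all individuals up to $\mathbb{T}_n$. We propose to make use of the WLS estimator $\hat\theta_n$ of $\theta$ which minimizes

$$
\Delta_n(\theta) = \frac{1}{2} \sum_{k \in \mathbb{T}_{n-1}} \frac{1}{c_k} \| \chi_k - \theta^t \Phi_k \|^2
$$

where the choice of the weighting sequence $(c_n)_{n \geq 1}$ is crucial. We shall choose $c_n = 1 + X_n^2$ and we will go back to this suitable choice in Section 4. Consequently, we have for all $n \geq 1$

$$
\tag{3.4}
\hat\theta_n = S_{n-1}^{-1} \sum_{k \in \mathbb{T}_{n-1}} \frac{1}{c_k} \Phi_k \chi_k^t,
\qquad \text{where} \qquad
S_n = \sum_{k \in \mathbb{T}_n} \frac{1}{c_k} \Phi_k \Phi_k^t.
$$

In order to avoid a useless invertibility assumption, we shall assume, without loss of generality, that for all $n \geq 0$, $S_n$ is invertible. Otherwise, we only have to add the identity matrix of order 2, $I_2$, to $S_n$. In all that follows, we shall make a slight abuse of notation by identifying $\theta$ as well as $\hat\theta_n$ with

$$
\mathrm{vec}(\theta) = \begin{pmatrix} a \\ c \\ b \\ d \end{pmatrix}
\qquad \text{and} \qquad
\mathrm{vec}(\hat\theta_n) = \begin{pmatrix} \hat a_n \\ \hat c_n \\ \hat b_n \\ \hat d_n \end{pmatrix}.
$$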

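Therefore, we deduce from (3.4) that

$$
\hat\theta_n = \Sigma_{n-1}^{-1} \sum_{k \in \mathbb{T}_{n-1}} \frac{1}{c_k} \mathrm{vec}(\Phi_k \chi_k^t)
= \Sigma_{n-1}^{-1} \sum_{k \in \mathbb{T}_{n-1}} \frac{1}{c_k}
\begin{pmatrix} X_k X_{2k} \\ X_{2k} \\ X_k X_{2k+1} \\ X_{2k+1} \end{pmatrix}
$$

where $\Sigma_n = I_2 \otimes S_n$ and $\otimes$ stands for the standard Kronecker product. Consequently, (3.3) leads to

$$
\tag{3.5}
\hat\theta_n - \theta = \Sigma_{n-1}^{-1} \sum_{k \in \mathbb{T}_{n-1}} \frac{1}{c_k} \mathrm{vec}(\Phi_k W_k^t)
= \Sigma_{n-1}^{-1} \sum_{k \in \mathbb{T}_{n-1}} \frac{1}{c_k}
\begin{pmatrix} X_k V_{2k} \\ V_{2k} \\ X_k V_{2k+1} \\ V_{2k+1} \end{pmatrix}.
$$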
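As an illustration of (3.4), here is a minimal sketch of the WLS estimator with the weights $c_k = 1 + X_k^2$. The function name `wls_theta` and the flat-list storage of the tree are assumptions made for this example. Because $\Sigma_{n-1} = I_2 \otimes S_{n-1}$, the four-dimensional system splits into two $2 \times 2$ systems sharing the same matrix $S_{n-1}$, which the code solves explicitly. The check at the end uses a noise-free tree, for which the estimator should recover the parameters up to rounding.

```python
def wls_theta(X, n):
    """WLS estimate (a, c, b, d) from all individuals up to generation n.

    X[k] holds X_k for 1 <= k < 2^(n+1); X[0] is unused.
    The sums run over k in T_{n-1}, that is 1 <= k < 2^n.
    """
    s11 = s12 = s22 = 0.0   # entries of S_{n-1}
    even = [0.0, 0.0]       # sum of Phi_k * X_{2k} / c_k
    odd = [0.0, 0.0]        # sum of Phi_k * X_{2k+1} / c_k
    for k in range(1, 2 ** n):
        w = 1.0 / (1.0 + X[k] ** 2)   # weight 1 / c_k
        s11 += w * X[k] ** 2
        s12 += w * X[k]
        s22 += w
        even[0] += w * X[k] * X[2 * k]
        even[1] += w * X[2 * k]
        odd[0] += w * X[k] * X[2 * k + 1]
        odd[1] += w * X[2 * k + 1]
    det = s11 * s22 - s12 ** 2

    def solve(r):
        # Explicit inverse of the symmetric 2x2 matrix S_{n-1}.
        return ((s22 * r[0] - s12 * r[1]) / det,
                (s11 * r[1] - s12 * r[0]) / det)

    a_hat, c_hat = solve(even)
    b_hat, d_hat = solve(odd)
    return a_hat, c_hat, b_hat, d_hat

# Noise-free synthetic tree (illustrative values only).
a, c, b, d = 0.5, 1.0, 0.3, 0.5
n = 4
X = [0.0] * 2 ** (n + 1)
X[1] = 1.0
for k in range(1, 2 ** n):
    X[2 * k] = a * X[k] + c
    X[2 * k + 1] = b * X[k] + d
est = wls_theta(X, n)
```

The output ordering $(\hat a_n, \hat c_n, \hat b_n, \hat d_n)$ follows the $\mathrm{vec}$ convention introduced above.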

In all the sequel, we shall make use of the following moment hypotheses, numbered 1 to 5 below and referred to as (H.1) to (H.5).

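1. For all $k \geq 1$,

$$
\mathbb{E}[a_k^2] < 1 \qquad \text{and} \qquad \mathbb{E}[b_k^2] < 1.
$$
2. For all $n \geq 0$ and for all $k \in \mathbb{G}_n$

$$
\mathrm{Var}[a_k | \mathcal{F}_n] = \sigma_a^2 \geq 0 \qquad \text{and} \qquad \mathrm{Var}[b_k | \mathcal{F}_n] = \sigma_b^2 \geq 0 \qquad \text{a.s.}
$$
$$
\mathrm{Var}[\varepsilon_{2k} | \mathcal{F}_n] = \sigma_c^2 > 0 \qquad \text{and} \qquad \mathrm{Var}[\varepsilon_{2k+1} | \mathcal{F}_n] = \sigma_d^2 > 0 \qquad \text{a.s.}
$$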
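3. For all and for all , if , and are conditionally independent given and for all , if , and are conditionally independent given . Otherwise, there exist $\rho_{cd}$ and $\rho_{ab}$ such that, for all

$$
\mathbb{E}[(\varepsilon_{2k} - c)(\varepsilon_{2k+1} - d) | \mathcal{F}_n] = \rho_{cd} \qquad \text{a.s.}
$$
$$
\mathbb{E}[(a_k - a)(b_k - b) | \mathcal{F}_n] = \rho_{ab} \qquad \text{a.s.}
$$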
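4. One can find $\mu_a^4$, $\mu_b^4$, $\mu_c^4$ and $\mu_d^4$ such that, for all $n \geq 0$ and for all $k \in \mathbb{G}_n$

$$
\mathbb{E}[(a_k - a)^4 | \mathcal{F}_n] = \mu_a^4 \qquad \text{and} \qquad \mathbb{E}[(b_k - b)^4 | \mathcal{F}_n] = \mu_b^4 \qquad \text{a.s.}
$$
$$
\mathbb{E}[(\varepsilon_{2k} - c)^4 | \mathcal{F}_n] = \mu_c^4 \qquad \text{and} \qquad \mathbb{E}[(\varepsilon_{2k+1} - d)^4 | \mathcal{F}_n] = \mu_d^4 \qquad \text{a.s.}
$$

In addition, there exist $\nu_{ab}^2$ and $\nu_{cd}^2$ such that, for all $n \geq 0$ and for all $k \in \mathbb{G}_n$

$$
\mathbb{E}[(a_k - a)^2 (b_k - b)^2 | \mathcal{F}_n] = \nu_{ab}^2 \qquad \text{and} \qquad \mathbb{E}[(\varepsilon_{2k} - c)^2 (\varepsilon_{2k+1} - d)^2 | \mathcal{F}_n] = \nu_{cd}^2 \qquad \text{a.s.}
$$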

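5. There exists such that

$$
\sup_{n \geq 0} \sup_{k \in \mathbb{G}_n} \mathbb{E}[|a_k - a|^\alpha | \mathcal{F}_n] < \infty, \qquad
\sup_{n \geq 0} \sup_{k \in \mathbb{G}_n} \mathbb{E}[|b_k - b|^\alpha | \mathcal{F}_n] < \infty \qquad \text{a.s.}
$$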

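One can observe that those hypotheses allow us to consider the deterministic case where there exist constants $a$ and $b$, with $\max(|a|, |b|) < 1$ as required by (H.1), such that, for all $n \geq 1$, $a_n = a$ and $b_n = b$ a.s. Moreover, under assumption (H.2), we have for all $n \geq 0$ and for all $k \in \mathbb{G}_n$

$$
\tag{3.6}
\mathbb{E}[V_{2k}^2 | \mathcal{F}_n] = \sigma_a^2 X_k^2 + \sigma_c^2
\qquad \text{and} \qquad
\mathbb{E}[V_{2k+1}^2 | \mathcal{F}_n] = \sigma_b^2 X_k^2 + \sigma_d^2 \qquad \text{a.s.}
$$

Consequently, if we choose $c_n = 1 + X_n^2$ for all $n \geq 1$, we clearly have for all $k \in \mathbb{G}_n$

$$
\mathbb{E}[V_{2k}^2 | \mathcal{F}_n] \leq \max(\sigma_a^2, \sigma_c^2) \, c_k
\qquad \text{and} \qquad
\mathbb{E}[V_{2k+1}^2 | \mathcal{F}_n] \leq \max(\sigma_b^2, \sigma_d^2) \, c_k \qquad \text{a.s.}
$$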

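This is exactly the reason why we have chosen this weighting sequence in (3.4). A similar WLS estimation approach for branching processes with immigration may be found in [24] and [25]. For all $n \geq 0$ and for all $k \in \mathbb{G}_n$, denote $v_{2k} = V_{2k}^2 - \mathbb{E}[V_{2k}^2 | \mathcal{F}_n]$. We deduce from (3.6) that for all $k \in \mathbb{G}_n$, $V_{2k}^2 = \eta^t \psi_k + v_{2k}$, where $\eta$ and $\psi_k$ are defined by

$$
\eta^t = \begin{pmatrix} \sigma_a^2 & \sigma_c^2 \end{pmatrix}
\qquad \text{and} \qquad
\psi_k = \begin{pmatrix} X_k^2 \\ 1 \end{pmatrix}.
$$

It leads us to estimate the vector of variances $\eta$ by the WLS estimator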

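$$
\tag{3.7}
\hat\eta_n = Q_{n-1}^{-1} \sum_{k \in \mathbb{T}_{n-1}} \frac{1}{d_k} \hat V_{2k}^2 \psi_k,
\qquad \text{where} \qquad
Q_n = \sum_{k \in \mathbb{T}_n} \frac{1}{d_k} \psi_k \psi_k^t
$$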

and for all ,

$$
\begin{cases}
\hat V_{2k} = X_{2k} - \hat a_n X_k - \hat c_n, \\
\hat V_{2k+1} = X_{2k+1} - \hat b_n X_k - \hat d_n.
\end{cases}
$$

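Finally the weighting sequence $(d_n)_{n \geq 1}$ is given, for all $n \geq 1$, by $d_n = (1 + X_n^2)^2$. This choice is due to the fact that for all $n \geq 0$ and for all $k \in \mathbb{G}_n$

$$
\begin{aligned}
\mathbb{E}[v_{2k}^2 | \mathcal{F}_n] &= \mathbb{E}[V_{2k}^4 | \mathcal{F}_n] - \left(\mathbb{E}[V_{2k}^2 | \mathcal{F}_n]\right)^2 \qquad \text{a.s.} \\
&= (\mu_a^4 - \sigma_a^4) X_k^4 + 4 \sigma_a^2 \sigma_c^2 X_k^2 + (\mu_c^4 - \sigma_c^4) \qquad \text{a.s.}
\end{aligned}
$$

Consequently, as $d_k = (1 + X_k^2)^2$, we clearly have for all $n \geq 0$ and for all $k \in \mathbb{G}_n$

$$
\mathbb{E}[v_{2k}^2 | \mathcal{F}_n] \leq \max(\mu_a^4 - \sigma_a^4, \, 2 \sigma_a^2 \sigma_c^2, \, \mu_c^4 - \sigma_c^4) \, d_k \qquad \text{a.s.}
$$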

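We have a similar WLS estimator $\hat\zeta_n$ of the vector of variances

$$
\zeta^t = \begin{pmatrix} \sigma_b^2 & \sigma_d^2 \end{pmatrix}
$$

by replacing $\hat V_{2k}^2$ by $\hat V_{2k+1}^2$ in (3.7). Let us remark that, for all $n \geq 0$ and for all $k \in \mathbb{G}_n$,

$$
\tag{3.8}
\mathbb{E}[V_{2k} V_{2k+1} | \mathcal{F}_n] = \rho_{ab} X_k^2 + \rho_{cd}.
$$

Then, for all $n \geq 0$ and for all $k \in \mathbb{G}_n$, denote $w_{2k} = V_{2k} V_{2k+1} - \mathbb{E}[V_{2k} V_{2k+1} | \mathcal{F}_n]$. We deduce from (3.8) that for all $k \in \mathbb{G}_n$, $V_{2k} V_{2k+1} = \nu^t \psi_k + w_{2k}$, where $\nu$ is defined by

$$
\nu = \begin{pmatrix} \rho_{ab} \\ \rho_{cd} \end{pmatrix}.
$$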

It leads us to estimate the vector of covariances $\nu$ by the WLS estimator

$$
\tag{3.9}
\hat\nu_n = Q_{n-1}^{-1} \sum_{k \in \mathbb{T}_{n-1}} \frac{1}{d_k} \hat V_{2k} \hat V_{2k+1} \psi_k.
$$

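This choice of weighting sequence is due to the fact that for all $n \geq 0$ and for all $k \in \mathbb{G}_n$

$$
\mathbb{E}[V_{2k}^2 V_{2k+1}^2 | \mathcal{F}_n] = \nu_{ab}^2 X_k^4 + (\sigma_a^2 \sigma_d^2 + 4 \rho_{ab} \rho_{cd} + \sigma_b^2 \sigma_c^2) X_k^2 + \nu_{cd}^2 \qquad \text{a.s.}
$$

Consequently, as $d_k = (1 + X_k^2)^2$, we clearly have for all $n \geq 0$ and for all $k \in \mathbb{G}_n$

$$
\begin{aligned}
\mathbb{E}[w_{2k}^2 | \mathcal{F}_n] &= (\nu_{ab}^2 - \rho_{ab}^2) X_k^4 + (\sigma_a^2 \sigma_d^2 + \sigma_b^2 \sigma_c^2 + 2 \rho_{ab} \rho_{cd}) X_k^2 + (\nu_{cd}^2 - \rho_{cd}^2) \qquad \text{a.s.} \\
&\leq \max\left(\nu_{ab}^2, \, \nu_{cd}^2, \, (\sigma_a^2 + \sigma_c^2)(\sigma_b^2 + \sigma_d^2)\right) d_k \qquad \text{a.s.}
\end{aligned}
$$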
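The estimators (3.7) and (3.9) share the same matrix $Q_{n-1}$ and differ only in the response being regressed on $\psi_k$. The following sketch computes the three of them from the residuals, with weights $d_k = (1 + X_k^2)^2$. The function name `wls_second_order` and the synthetic data are hypothetical. In the check, the true value of $\theta$ is plugged in instead of an estimate, and the tree is built so that the squared residuals are exactly linear in $X_k^2$, which makes the expected output known in advance.

```python
import math

def wls_second_order(X, n, theta_hat):
    """Return (eta_hat, zeta_hat, nu_hat) as in (3.7) and (3.9).

    X[k] holds X_k for 1 <= k < 2^(n+1); theta_hat = (a, c, b, d) estimates.
    """
    a_hat, c_hat, b_hat, d_hat = theta_hat
    q11 = q12 = q22 = 0.0                              # entries of Q_{n-1}
    r_eta, r_zeta, r_nu = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]
    for k in range(1, 2 ** n):
        x2 = X[k] ** 2
        w = 1.0 / (1.0 + x2) ** 2                      # weight 1 / d_k
        v_even = X[2 * k] - a_hat * X[k] - c_hat       # residual for X_{2k}
        v_odd = X[2 * k + 1] - b_hat * X[k] - d_hat    # residual for X_{2k+1}
        q11 += w * x2 * x2
        q12 += w * x2
        q22 += w
        for r, y in ((r_eta, v_even ** 2), (r_zeta, v_odd ** 2),
                     (r_nu, v_even * v_odd)):
            r[0] += w * x2 * y
            r[1] += w * y
    det = q11 * q22 - q12 ** 2

    def solve(r):
        # Explicit inverse of the symmetric 2x2 matrix Q_{n-1}.
        return ((q22 * r[0] - q12 * r[1]) / det,
                (q11 * r[1] - q12 * r[0]) / det)

    return solve(r_eta), solve(r_zeta), solve(r_nu)

# Synthetic tree whose squared residuals are exactly 0.3 X^2 + 0.7 and 0.2 X^2 + 0.4.
a, c, b, d = 0.5, 1.0, 0.3, 0.5
n = 4
X = [0.0] * 2 ** (n + 1)
X[1] = 1.0
for k in range(1, 2 ** n):
    sign = 1.0 if k % 2 == 0 else -1.0
    X[2 * k] = a * X[k] + c + sign * math.sqrt(0.3 * X[k] ** 2 + 0.7)
    X[2 * k + 1] = b * X[k] + d - sign * math.sqrt(0.2 * X[k] ** 2 + 0.4)
eta_hat, zeta_hat, nu_hat = wls_second_order(X, n, (a, c, b, d))
```

On real data one would pass the output of the WLS estimator of $\theta$ rather than the true parameters.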

## 4. A martingale approach

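In order to establish all the asymptotic properties of our estimators, we shall make use of a martingale approach. For all $n \geq 1$, denote

$$
M_n = \sum_{k \in \mathbb{T}_{n-1}} \frac{1}{c_k}
\begin{pmatrix} X_k V_{2k} \\ V_{2k} \\ X_k V_{2k+1} \\ V_{2k+1} \end{pmatrix}.
$$

We can clearly rewrite (3.5) as

$$
\tag{4.1}
\hat\theta_n - \theta = \Sigma_{n-1}^{-1} M_n.
$$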

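As in [3], we make use of the notation $M_n$ since it appears that $(M_n)$ is a martingale. This fact is a crucial point of our study and it justifies the vector notation, since most asymptotic results for martingales were established for vector-valued martingales. Let us rewrite $M_n$ in order to emphasize its martingale quality. Let $\Psi_n = I_2 \otimes \varphi_n$ where $\varphi_n$ is the matrix of dimension $2 \times 2^n$ given by

$$
\varphi_n = \begin{pmatrix}
\dfrac{X_{2^n}}{\sqrt{c_{2^n}}} & \dfrac{X_{2^n+1}}{\sqrt{c_{2^n+1}}} & \cdots & \dfrac{X_{2^{n+1}-1}}{\sqrt{c_{2^{n+1}-1}}} \\
\dfrac{1}{\sqrt{c_{2^n}}} & \dfrac{1}{\sqrt{c_{2^n+1}}} & \cdots & \dfrac{1}{\sqrt{c_{2^{n+1}-1}}}
\end{pmatrix}.
$$

It represents the individuals of the $n$-th generation, which is also the collection of all $\Phi_k / \sqrt{c_k}$ where $k$ belongs to $\mathbb{G}_n$. Let $\xi_n$ be the random vector of dimension $2^n$

$$
\xi_n^t = \left(
\frac{V_{2^n}}{\sqrt{c_{2^{n-1}}}} \;\;
\frac{V_{2^n+2}}{\sqrt{c_{2^{n-1}+1}}} \;\; \cdots \;\;
\frac{V_{2^{n+1}-2}}{\sqrt{c_{2^n-1}}} \;\;
\frac{V_{2^n+1}}{\sqrt{c_{2^{n-1}}}} \;\;
\frac{V_{2^n+3}}{\sqrt{c_{2^{n-1}+1}}} \;\; \cdots \;\;
\frac{V_{2^{n+1}-1}}{\sqrt{c_{2^n-1}}}
\right).
$$

The vector $\xi_n$ gathers the noise variables of $\mathbb{G}_n$. The special ordering separating odd and even indices has been made in [3] so that $M_n$ can be written as

$$
M_n = \sum_{k=1}^{n} \Psi_{k-1} \xi_k.
$$

Under (3.1), we clearly have for all $n \geq 0$, $\mathbb{E}[\xi_{n+1} | \mathcal{F}_n] = 0$ a.s. and $\Psi_n$ is $\mathcal{F}_n$-measurable. In addition it is not hard to see that under (H.1) and (H.2), $(M_n)$ is a locally square integrable vector martingale with increasing process given, for all $n \geq 1$, by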

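$$
\tag{4.2}
\langle M \rangle_n = \sum_{k=0}^{n-1} \Psi_k \mathbb{E}[\xi_{k+1} \xi_{k+1}^t | \mathcal{F}_k] \Psi_k^t = \sum_{k=0}^{n-1} L_k \qquad \text{a.s.}
$$

where

$$
\tag{4.3}
L_k = \sum_{i \in \mathbb{G}_k} \frac{1}{c_i^2}
\begin{pmatrix} P(X_i) & Q(X_i) \\ Q(X_i) & R(X_i) \end{pmatrix}
\otimes
\begin{pmatrix} X_i^2 & X_i \\ X_i & 1 \end{pmatrix},
$$

with

$$
\begin{cases}
P(X) = \sigma_a^2 X^2 + \sigma_c^2, \\
Q(X) = \rho_{ab} X^2 + \rho_{cd}, \\
R(X) = \sigma_b^2 X^2 + \sigma_d^2.
\end{cases}
$$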

One can remark that we obviously have but it is necessary to establish the convergence of $\langle M \rangle_n$, properly normalized, in order to prove the asymptotic results for our RCBAR estimators $\hat\theta_n$, $\hat\eta_n$, $\hat\zeta_n$ and $\hat\nu_n$.

## 5. Main results

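We have to introduce some more notation in order to state our main results. From the original process $(X_n)_{n \geq 1}$, we shall define a new process $(Y_n)_{n \geq 1}$ recursively defined by $Y_1 = X_1$, and if $Y_n = X_k$ with $k \geq 1$, then

$$
Y_{n+1} = X_{2k + \kappa_n}
$$

where $(\kappa_n)_{n \geq 1}$ is a sequence of i.i.d. random variables with Bernoulli $\mathcal{B}(1/2)$ distribution. Such a construction may be found in [9] for the asymptotic analysis of BAR processes. The process $(Y_n)$ gathers the values of the original process $(X_n)$ along the random branch of the binary tree $(\mathbb{T}_n)$ given by $(\kappa_n)$. Denote by $k_n$ the unique $k \geq 1$ such that $Y_n = X_k$. Then, for all $n \geq 1$, we have

$$
\tag{5.1}
Y_{n+1} = \tilde a_{n+1} Y_n + e_{n+1}
$$

where, with $k_n$ the unique number such that $Y_n = X_{k_n}$,

$$
\tag{5.2}
\tilde a_{n+1} = \begin{cases} a_{k_n} & \text{if } \kappa_n = 0, \\ b_{k_n} & \text{otherwise,} \end{cases}
\qquad \text{and} \qquad
e_n = \varepsilon_{k_n}.
$$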
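The construction of the random branch can be sketched as follows. The function name `random_branch` is hypothetical, and the stand-in values $X_k = k$ are used only so that the labels visited are easy to read off; any tree stored as a flat list indexed by label would do.

```python
import random

def random_branch(X, n_steps, rng):
    """Follow a uniformly random branch of the tree from the ancestor.

    Returns the labels (k_1, ..., k_n) and the values (Y_1, ..., Y_n),
    where k_1 = 1 and k_{i+1} = 2 * k_i + kappa_i with kappa_i Bernoulli(1/2).
    """
    k = 1
    labels, values = [k], [X[k]]
    for _ in range(n_steps - 1):
        kappa = rng.randint(0, 1)   # 0 selects offspring 2k, 1 selects 2k + 1
        k = 2 * k + kappa
        labels.append(k)
        values.append(X[k])
    return labels, values

# Stand-in tree with X_k = k for 1 <= k < 2^6 (illustrative values only).
X = list(range(2 ** 6))
labels, values = random_branch(X, 6, random.Random(1))
```

Each step moves from generation $\mathbb{G}_{i-1}$ to $\mathbb{G}_i$, so the $i$-th label always lies in $\{2^{i-1}, \ldots, 2^i - 1\}$.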
###### Lemma 5.1.

Assume that (H.1) and (H.2) are satisfied. Then, we have

$$
Y_n \xrightarrow{\ \mathcal{L}\ } T
$$

where $T$ is a positive non degenerate random variable with .

Denote .

###### Lemma 5.2.

Assume that (H.1) and (H.2) are satisfied. Then, for all , we have

$$
\lim_{n \to \infty} \frac{1}{|\mathbb{T}_n|} \sum_{k \in \mathbb{T}_n} f(X_k) = \mathbb{E}[f(T)] \qquad \text{a.s.}
$$
###### Proposition 5.3.

Assume that (H.1) to (H.3) are satisfied. Then, we have

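$$
\tag{5.3}
\lim_{n \to \infty} \frac{\langle M \rangle_n}{|\mathbb{T}_{n-1}|} = L \qquad \text{a.s.}
$$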

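where $L$ is the positive definite matrix given by

$$
L = \mathbb{E}\left[ \frac{1}{(1 + T^2)^2}
\begin{pmatrix} P(T) & Q(T) \\ Q(T) & R(T) \end{pmatrix}
\otimes
\begin{pmatrix} T^2 & T \\ T & 1 \end{pmatrix} \right].
$$

Our first result deals with the almost sure convergence of our WLS estimator $\hat\theta_n$.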

###### Theorem 5.4.

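Assume that (H.1) to (H.5) are satisfied. Then, $\hat\theta_n$ converges almost surely to $\theta$ with the rate of convergence

$$
\| \hat\theta_n - \theta \|^2 = O\left( \frac{n}{|\mathbb{T}_{n-1}|} \right) \qquad \text{a.s.}
$$

In addition, we have the quadratic strong law

$$
\tag{5.4}
\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} |\mathbb{T}_{k-1}| (\hat\theta_k - \theta)^t \Lambda (\hat\theta_k - \theta) = \mathrm{tr}(\Lambda^{-1/2} L \Lambda^{-1/2}) \qquad \text{a.s.}
$$

where

$$
\tag{5.5}
\Lambda = I_2 \otimes C
\qquad \text{and} \qquad
C = \mathbb{E}\left[ \frac{1}{1 + T^2} \begin{pmatrix} T^2 & T \\ T & 1 \end{pmatrix} \right].
$$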

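Our second result concerns the almost sure asymptotic properties of our WLS variance and covariance estimators $\hat\eta_n$, $\hat\zeta_n$ and $\hat\nu_n$. Let

$$
\eta_n = Q_{n-1}^{-1} \sum_{k \in \mathbb{T}_{n-1}} \frac{1}{d_k} V_{2k}^2 \psi_k, \qquad
\zeta_n = Q_{n-1}^{-1} \sum_{k \in \mathbb{T}_{n-1}} \frac{1}{d_k} V_{2k+1}^2 \psi_k,
$$
$$
\nu_n = Q_{n-1}^{-1} \sum_{k \in \mathbb{T}_{n-1}} \frac{1}{d_k} V_{2k} V_{2k+1} \psi_k.
$$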
###### Theorem 5.5.

Assume that (H.1) to (H.5) are satisfied. Then, $\hat\eta_n$ and $\hat\zeta_n$ converge almost surely to $\eta$ and $\zeta$ respectively. More precisely,

 (5.6) ∥ˆηn−ηn∥ (5.7) ∥ˆζn−ζn∥

In addition, $\hat\nu_n$ converges almost surely to $\nu$ with

$$
\tag{5.8}
\| \hat\nu_n - \nu_n \| = O\left( \frac{n}{|\mathbb{T}_{n-1}|} \right) \qquad \text{a.s.}
$$
###### Remark 5.6.

We also have the almost sure rates of convergence

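$$
\| \hat\eta_n - \eta \|^2 = O\left( \frac{n}{|\mathbb{T}_{n-1}|} \right), \qquad
\| \hat\zeta_n - \zeta \|^2 = O\left( \frac{n}{|\mathbb{T}_{n-1}|} \right), \qquad
\| \hat\nu_n - \nu \|^2 = O\left( \frac{n}{|\mathbb{T}_{n-1}|} \right) \qquad \text{a.s.}
$$

Our last result is devoted to the asymptotic normality of our WLS estimators $\hat\theta_n$, $\hat\eta_n$, $\hat\zeta_n$ and $\hat\nu_n$.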

###### Theorem 5.7.

Assume that (H.1) to (H.5) are satisfied. Then, we have the asymptotic normality

 (5.9)

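$$
\tag{5.10}
\sqrt{|\mathbb{T}_{n-1}|} \, (\hat\eta_n - \eta) \xrightarrow{\ \mathcal{L}\ } \mathcal{N}(0, D^{-1} M_{ac} D^{-1}),
$$
$$
\tag{5.11}
\sqrt{|\mathbb{T}_{n-1}|} \, (\hat\zeta_n - \zeta) \xrightarrow{\ \mathcal{L}\ } \mathcal{N}(0, D^{-1} M_{bd} D^{-1}),
$$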

where

$$
D = \mathbb{E}\left[ \frac{1}{(1 + T^2)^2} \begin{pmatrix} T^4 & T^2 \\ T^2 & 1 \end{pmatrix} \right],
$$

Finally,

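$$
\tag{5.12}
\sqrt{|\mathbb{T}_{n-1}|} \, (\hat\nu_n - \nu) \xrightarrow{\ \mathcal{L}\ } \mathcal{N}(0, D^{-1} H D^{-1})
$$

where

$$
H = \mathbb{E}\left[ \frac{(\nu_{ab}^2 - \rho_{ab}^2) T^4 + (\sigma_a^2 \sigma_d^2 + \sigma_b^2 \sigma_c^2 + 2 \rho_{ab} \rho_{cd}) T^2 + (\nu_{cd}^2 - \rho_{cd}^2)}{(1 + T^2)^4}
\begin{pmatrix} T^4 & T^2 \\ T^2 & 1 \end{pmatrix} \right].
$$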

The rest of the paper is dedicated to the proof of our main results.

## 6. Proof of Lemma 5.1

We can reformulate (5.1) and (5.2) as

$$
Y_n = \tilde a_n \tilde a_{n-1} \ldots \tilde a_2 Y_1 + \sum_{k=2}^{n-1} \tilde a_n \tilde a_{n-1} \ldots \tilde a_{k+1} e_k + e_n.
$$

We already made the assumption that both $(a_n, b_n)$ and $(\varepsilon_{2n}, \varepsilon_{2n+1})$ are i.i.d. and that those two sequences are independent. Consequently, the couples and share the same distribution. Hence, for all $n \geq 2$, $Y_n$ has the same distribution as the random variable

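$$
\begin{aligned}
Z_n &= \tilde a_2 \ldots \tilde a_n Y_1 + \sum_{k=2}^{n-1} \tilde a_2 \tilde a_3 \ldots \tilde a_{n-k+1} e_{n-k+2} + e_2, \\
&= \tilde a_2 \ldots \tilde a_n Y_1 + \sum_{k=3}^{n} \tilde a_2 \tilde a_3 \ldots \tilde a_{k-1} e_k + e_2.
\end{aligned}
$$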

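For the sake of simplicity, we will denote, with the convention that the empty product arising for $k = 2$ is equal to 1,

$$
\tag{6.1}
Z_n = \tilde a_2 \ldots \tilde a_n Y_1 + \sum_{k=2}^{n} \tilde a_2 \tilde a_3 \ldots \tilde a_{k-1} e_k.
$$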

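On the one hand, since

$$
|\mathbb{E}[\tilde a_2]| = \left| \frac{a + b}{2} \right| < 1,
$$

we have

$$
\lim_{n \to \infty} \tilde a_2 \tilde a_3 \ldots \tilde a_n Y_1 = 0 \qquad \text{a.s.}
$$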

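On the other hand, let $(T_n)$ be defined as

$$
T_n = \sum_{k=2}^{n} \tilde a_2 \tilde a_3 \ldots \tilde a_{k-1} e_k
$$

and $T$ given by

$$
T = \sum_{k=2}^{\infty} \tilde a_2 \tilde a_3 \ldots \tilde a_{k-1} e_k.
$$

We have

$$
\begin{aligned}
\mathbb{E}[|T - T_n|] &= \mathbb{E}\left[ \left| \sum_{k=n+1}^{\infty} \tilde a_2 \tilde a_3 \ldots \tilde a_{k-1} e_k \right| \right], \\
&\leq \sum_{k=n+1}^{\infty} \mathbb{E}[|\tilde a_2 \tilde a_3 \ldots \tilde a_{k-1} e_k|],
\end{aligned}
$$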