# An estimator for the quadratic covariation of asynchronously observed Itô processes with noise: Asymptotic distribution theory

Markus Bibinger

Institut für Mathematik, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany

Financial support from the Deutsche Forschungsgemeinschaft via SFB 649 ‘Ökonomisches Risiko’, Humboldt-Universität zu Berlin, is gratefully acknowledged.

###### Abstract

The article is devoted to the nonparametric estimation of the quadratic covariation of non-synchronously observed Itô processes in an additive microstructure noise model. In a high-frequency setting, we aim at establishing an asymptotic distribution theory for a generalized multiscale estimator including a feasible central limit theorem with optimal convergence rate on convenient regularity assumptions. The inevitably remaining impact of asynchronous deterministic sampling schemes and noise corruption on the asymptotic distribution is precisely elucidated. A case study for various important examples, several generalizations of the model and an algorithm for the implementation warrant the utility of the estimation method in applications.

###### keywords:
non-synchronous observations, microstructure noise, integrated covolatility, multiscale estimator, stable limit theorem
MSC Classification: 62M10, 62G05, 62G20, 91B84
JEL Classification: C14, C32, C58, G10

## 1 Introduction

The nonparametric estimation of the univariate quadratic variation of a latent semimartingale from n observations in a high-frequency setting with additive observation noise has been studied intensively in recent years. It is known from Gloter and Jacod (2001) that n^{\nicefrac{{1}}{{4}}} constitutes a lower bound for the rate of convergence. An important motivation, which has brought together economists and statisticians to establish estimation techniques for this kind of latent semimartingale model, is their utility in estimating daily integrated (co-)volatilities from high-frequency intraday returns, which serve as a basis for risk management as well as for portfolio optimization and hedging strategies. Recent years have seen an enormous increase in trading activity for many liquid securities. Paradoxically, the availability of high-frequency data necessitated a new angle on financial modeling. In fact, for every semimartingale the discrete realized (co-)volatilities converge in probability to the integrated measures. However, realized volatilities of typical high-frequency financial time series explode at very high frequencies. This effect is ascribed to market microstructure frictions. The sources of market microstructure noise are manifold. Bid-ask spreads play an important role. Apart from that, transaction costs, strategic trading, limited market depth and the discreteness of prices blur the structure of the long-run dynamics that can be characterized by semimartingales.
This strand of literature followed Zhang et al. (2005), which attracted a lot of attention to this estimation problem. The so-called two-scales realized volatility of Zhang et al. (2005) is based on subsampling and a bias correction, and a stable central limit theorem with n^{\nicefrac{{1}}{{6}}}-rate has been proved for it. A refinement of the subsample approach using multiple scales in Zhang (2006) and related alternative techniques in Barndorff-Nielsen et al. (2008a), Podolskij and Vetter (2009) and Xiu (2010) have led to rate-optimal estimators and feasible stable central limit theorems. For the more specific nonparametric model with Gaussian noise, asymptotic equivalence in the Le Cam sense to a Gaussian shift experiment is shown in Reiß (2011) and an asymptotically efficient estimator whose asymptotic variance equals the parametric efficiency bound is constructed.
In the present article we are concerned with a multivariate setting and, apart from taking additive microstructure noise into account, we focus on a way to deal with non-synchronous observation schemes. This is also a central theme in financial applications. When realized covolatilities are calculated for fixed time distances and a previous-tick interpolation is applied, the so-called Epps effect described in Epps (1979) appears: the realized covolatility tends to zero at the highest frequencies.
A methodology to deal with non-synchronous observations in a bivariate Itô processes model has been proposed by Hayashi and Yoshida (2005). The so-called Hayashi-Yoshida estimator has superseded simpler previous-tick interpolation methods setting the standard for the estimation of the quadratic covariation from asynchronous observations in the absence of microstructure noise effects.
Our estimation approach, first proposed in Bibinger (2011b), for the most general case in the presence of noise and non-synchronicity arises as a combination of the multiscale estimator to handle noise contamination on the one hand and a synchronization algorithm in accordance with the Hayashi-Yoshida estimator to cope with non-synchronicity on the other hand. A first attempt in the same direction, combining one-scale subsampling and the Hayashi-Yoshida estimator, has been given in Palandri (2006).
In Bibinger (2011b) it has been shown in the spirit of Gloter and Jacod (2001) that the optimal convergence rate n^{\nicefrac{{1}}{{4}}} carries over to the general multidimensional setup. The mathematical analysis of our generalized multiscale estimator in Bibinger (2011b) shows that it is rate-optimal.
Alternative approaches to similar statistical models have been suggested by Barndorff-Nielsen et al. (2008b), Christensen et al. (2010) and Aït-Sahalia et al. (2010). In Barndorff-Nielsen et al. (2008b) a kernel-based method with a previous-tick interpolation to so-called refresh times is proposed and a stable central limit theorem with sub-optimal n^{\nicefrac{{1}}{{5}}}-rate is established for a multivariate non-synchronous design. This estimator, furthermore, ensures that the estimated covariance matrix is positive semi-definite. Christensen et al. (2010) combine pre-averaging (Podolskij and Vetter (2009), Jacod et al. (2009)) with the Hayashi-Yoshida estimator, whereas Aït-Sahalia et al. (2010) combine the univariate quasi-maximum-likelihood method by Xiu (2010), the polarization identity and a generalized synchronization scheme that differs from the Hayashi-Yoshida ansatz we use. Both approaches also attain the optimal rate.
In this article we aim at providing an asymptotic distribution theory for the generalized multiscale estimator. In distinction from alternative methods, the influence of non-synchronicity effects on the expectation is null and on the variance limited up to an interaction of interpolation steps and microstructure noise. The main result is a feasible stable central limit theorem for its estimation error with optimal rate and a closed-form asymptotic variance that does not hinge on interpolation errors in the signal term. The stable weak convergence of the estimation error to a centred mixed Gaussian limit and the consistent estimation of the random unknown asymptotic variances are the essential steps towards statistical inference and confidence sets. The theory is grounded on stable limit theorems for semimartingales from Jacod (1997).
In Section 2 we present the model and our main findings. Section 3 gives a concise overview of the construction of the estimator and in Section 4 we develop the asymptotic theory. In Section 5 we propose a consistent estimator for the asymptotic variance and Section 6 comprises various extensions and a concluding discussion. The proofs are postponed to the Appendix.

## 2 Model and key result

The considered statistical model of noisy latently observed Itô processes at deterministic observation times is precisely described by Assumptions 1-3 in this section.

###### Assumption 1 (efficient processes).

On a filtered probability space \left(\Omega,\mathcal{F},\left(\mathcal{F}_{t}\right),\mathbb{P}\right), the efficient processes X=(X_{t})_{t\in\mathds{R}^{+}} and Y=(Y_{t})_{t\in\mathds{R}^{+}} are Itô processes defined by the following stochastic differential equations:

\begin{align*}
dX_{t} &= \mu_{t}^{X}\,dt+\sigma_{t}^{X}\,dB_{t}^{X}, \\
dY_{t} &= \mu_{t}^{Y}\,dt+\sigma_{t}^{Y}\,dB_{t}^{Y},
\end{align*}

with two \left(\mathcal{F}_{t}\right)–adapted standard Brownian motions B^{X} and B^{Y} and \rho_{t}\,dt=d\left[B^{X},B^{Y}\right]_{t}. The drift processes \mu_{t}^{X} and \mu_{t}^{Y} are \left(\mathcal{F}_{t}\right)–adapted locally bounded stochastic processes and the spot volatilities \sigma_{t}^{X} and \sigma_{t}^{Y} and \rho_{t} are assumed to be \left(\mathcal{F}_{t}\right)–adapted with continuous paths. We assume strictly positive volatilities and the Novikov condition \mathbb{E}\left[\exp{\left((1/2)\int_{0}^{T}(\mu^{\,\cdot\,}/\sigma^{\,\cdot\,})^{2}_{t}\,dt\right)}\right]<\infty for X and Y.

###### Assumption 2 (observations).

The deterministic observation schemes \mathcal{T}^{X,n}=\{0\leq t_{0}^{(n)}<t_{1}^{(n)}<\ldots<t_{n}^{(n)}\leq T\} of X and \mathcal{T}^{Y,m}=\{0\leq\tau_{0}^{(m)}<\tau_{1}^{(m)}<\ldots<\tau_{m}^{(m)}\leq T\} of Y are assumed to be regular in the following sense: There exists a constant 0<\alpha\leq 1/9 such that

\begin{align*}
\delta_{n}^{X} &= \sup_{i\in\{1,\ldots,n\}}\left(\left(t_{i}^{(n)}-t_{i-1}^{(n)}\right),t_{0}^{(n)},T-t_{n}^{(n)}\right)=\mathcal{O}\left(n^{-\nicefrac{8}{9}-\alpha}\right), \tag{1a} \\
\delta_{m}^{Y} &= \sup_{j\in\{1,\ldots,m\}}\left(\left(\tau_{j}^{(m)}-\tau_{j-1}^{(m)}\right),\tau_{0}^{(m)},T-\tau_{m}^{(m)}\right)=\mathcal{O}\left(m^{-\nicefrac{8}{9}-\alpha}\right). \tag{1b}
\end{align*}

We consider asymptotics where the number of observations of X and Y are assumed to be of the same asymptotic order n=\mathcal{O}(m) and m=\mathcal{O}(n) and express that shortly by n\sim m. The efficient processes X and Y which satisfy Assumption 1 are discretely observed at the times \mathcal{T}^{X,n} and \mathcal{T}^{Y,m} with additive observation noise:

\begin{align*}
\tilde{X}_{t_{i}^{(n)}} &= \int_{0}^{t_{i}^{(n)}}\mu_{t}^{X}\,dt+\int_{0}^{t_{i}^{(n)}}\sigma_{t}^{X}\,dB_{t}^{X}+\epsilon_{t_{i}^{(n)}}^{X}, \quad 0\leq i\leq n, \tag{2a} \\
\tilde{Y}_{\tau_{j}^{(m)}} &= \int_{0}^{\tau_{j}^{(m)}}\mu_{t}^{Y}\,dt+\int_{0}^{\tau_{j}^{(m)}}\sigma_{t}^{Y}\,dB_{t}^{Y}+\epsilon_{\tau_{j}^{(m)}}^{Y}, \quad 0\leq j\leq m. \tag{2b}
\end{align*}

Although we consider sequences of deterministic observation times, the case of random sampling that is independent of the observed processes is included when regarding the conditional law.
We prove the key result of the article under the following i.i.d. assumption on the microstructure noise, since a closed-form expression for the asymptotic variance is not available for a combination of general asynchronous observation schemes and serially dependent observation errors. Since an extension to non-i.i.d. noise is crucial for the utility in financial applications, we comment on the robustness of our estimator to that case in Section 6.

###### Assumption 3 (microstructure noise).

The discrete microstructure noise processes

\begin{align*}
\epsilon_{t_{i}^{(n)}}^{X},\ \epsilon_{\tau_{j}^{(m)}}^{Y},\quad 0\leq i\leq n,\ 0\leq j\leq m,
\end{align*}

are centred i.i.d., independent of each other and independent of the efficient processes X and Y. We assume that the observation errors have finite fourth moments and denote the variances by

\begin{align*}
\eta_{X}^{2}=\mathbb{V}\textnormal{ar}\left(\epsilon_{t_{1}^{(n)}}^{X}\right),\quad\eta_{Y}^{2}=\mathbb{V}\textnormal{ar}\left(\epsilon_{\tau_{1}^{(m)}}^{Y}\right).
\end{align*}

The number of synchronized observations N\sim n\sim m which appears in the rate of our feasible stable central limit theorem is introduced in Section 3.

###### Theorem 1 (feasible stable central limit theorem).

The generalized multiscale estimator (12), specified by the weights (30) given later and with M_{N}=c_{multi}\cdot\sqrt{N}, converges under Assumptions 1, 2, 3 and further mild regularity conditions on the asymptotics of the sampling schemes, stated below in Assumptions 4 and 5, \mathcal{F}-stably in law with optimal rate N^{\nicefrac{{1}}{{4}}}\sim n^{\nicefrac{{1}}{{4}}}\sim m^{\nicefrac{{1}}{{4}}} to a mixed Gaussian limiting distribution:

\begin{align*}
N^{\nicefrac{1}{4}}\left(\widehat{\left[X,Y\right]}_{T}^{multi}-\left[X,Y\right]_{T}\right)\stackrel{st}{\rightsquigarrow}\mathbf{N}\left(0,\operatorname{\mathbf{AVAR}}_{multi}\right)
\end{align*}

with an almost surely finite random asymptotic variance given in (18) in Theorem 3. With the consistent estimator \widehat{\operatorname{\mathbf{AVAR}}}_{multi} for the asymptotic variance from Proposition 2, the feasible central limit theorem

\begin{align*}
N^{\nicefrac{1}{4}}\,\frac{\widehat{\left[X,Y\right]}_{T}^{multi}-\left[X,Y\right]_{T}}{\sqrt{\widehat{\operatorname{\mathbf{AVAR}}}_{multi}}}\stackrel{st}{\rightsquigarrow}\mathbf{N}(0,1) \tag{3}
\end{align*}

holds true.

The notion of stable weak convergence going back to Rényi (1963) is essential for our asymptotic theory. Stable weak convergence X_{n}\stackrel{st}{\rightsquigarrow}X is the joint weak convergence of (X_{n},Z) to (X,Z) for every measurable bounded random variable Z. The limiting random variables in stable limit theorems are defined on extensions of the original underlying probability spaces. We rely on this stronger mode of weak convergence because we derive mixed normal limiting distributions whose asymptotic variances are themselves strictly positive random variables. Provided we have a consistent estimator V_{n}^{2} for such a random asymptotic variance V^{2} at hand, the stable central limit theorem X_{n}\stackrel{st}{\rightsquigarrow}VZ, with Z distributed according to a standard Gaussian law, yields the joint weak convergence (X_{n},V_{n}^{2})\rightsquigarrow(VZ,V^{2}) and also X_{n}/V_{n}\rightsquigarrow Z, and hence allows us to perform statistical inference by providing tests or confidence intervals.
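To illustrate how a feasible limit theorem of this type is used, the following minimal Python sketch turns a point estimate, a variance estimate and the number of synchronized observations into an asymptotic confidence interval. It assumes the standardization X_n/V_n described above, with V_n the square root of the variance estimate and the rate N^{1/4}. The function name and its arguments are hypothetical and not part of the article.

```python
import math


def confidence_interval(estimate, avar_hat, n_sync, z=1.959963984540054):
    """Asymptotic confidence interval based on a feasible CLT with rate N^(1/4).

    estimate : point estimate of the quadratic covariation
    avar_hat : consistent estimate of the (random) asymptotic variance
    n_sync   : number N of synchronized observations
    z        : standard normal quantile (default: two-sided 95% level)
    """
    if avar_hat <= 0 or n_sync <= 0:
        raise ValueError("avar_hat and n_sync must be positive")
    # N^(1/4) * (estimate - truth) / sqrt(avar_hat) is approximately N(0, 1)
    half_width = z * math.sqrt(avar_hat) / n_sync ** 0.25
    return estimate - half_width, estimate + half_width
```

Because the limit is mixed Gaussian, the width of the interval depends on the realized path through the variance estimate.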
In the proofs of our limit theorems we will ‘remove’ the drifts in the sense that after a transformation to an equivalent martingale measure stable central limit theorems for Itô processes without drift are proved and, as illustrated in Mykland and Zhang (2009), stability of the weak convergence ensures that the asymptotic law holds true under the original measure. In this sense stable convergence is commutative with measure change.
From now on, we often omit the superscripts of observation times for a shorter notation.

## 3 Brief review on the foundation

### 3.1 Subsampling and the multiscale estimator

In the model imposed by Assumption 1, Assumption 2 with synchronous observations, n=m and t_{j}^{(n)}=\tau_{j}^{(n)},1\leq j\leq n, and Assumption 3, the realized (co-)volatilities do not provide consistent estimators for the quadratic (co-)variations any more. The variance due to noise conditional on the paths of the efficient processes

\begin{align*}
\mathbb{V}\textnormal{ar}_{X,Y}\left(\sum_{j=1}^{n}\left(\tilde{X}_{t_{j}}-\tilde{X}_{t_{j-1}}\right)\left(\tilde{Y}_{t_{j}}-\tilde{Y}_{t_{j-1}}\right)\right)=4n\,\eta_{X}^{2}\eta_{Y}^{2},
\end{align*}

increases linearly with n. The error due to noise perturbation can be reduced by the following estimator, which has been proposed for the univariate estimation of integrated volatility as the “second best approach” in Zhang et al. (2005) and which is called one-scale subsampling estimator in Bibinger (2011b) and throughout this article.

It can be motivated from two perspectives that are both sketched in Figure 1. On the left-hand side we have visualized that one can calculate simultaneously lower frequent realized covolatilities using subsamples, e. g. to the lag three in Figure 1, and (post-)average them.

\begin{align*}
\widehat{\left[X,Y\right]}_{T}^{sub}=\frac{1}{i}\sum_{j=i}^{n}\left(\tilde{X}_{t_{j}}-\tilde{X}_{t_{j-i}}\right)\left(\tilde{Y}_{t_{j}}-\tilde{Y}_{t_{j-i}}\right). \tag{4a}
\end{align*}

This motivation given in Zhang et al. (2005) is in line with the former common practice of a sparse-sampled low-frequency realized (co-)volatility estimator and proposes to use an average instead of one single lower-frequency realized measure. The same estimator arises as the usual realized covolatility calculated from a time series to which a linear filter has been applied beforehand, which means that non-noisy observations at a time t_{j} are estimated by a (pre-)average of noisy observations at times t_{j},\ldots,t_{j+i} for some i. This is sketched on the right-hand side of Figure 1 for i=3. Passing over to increments leads to telescoping sums and we finally end up with the one-scale subsampling estimator. Since under Assumption 3 there is no bias due to noise for the bivariate estimator, it already corresponds to the “first best approach” from Zhang et al. (2005), whereas in the univariate case a bias correction completes the two scales realized volatility (TSRV):

\begin{align*}
\widehat{\left[X\right]}_{T}^{TSRV}=\frac{1}{i}\sum_{j=i}^{n}\left(\tilde{X}_{t_{j}}-\tilde{X}_{t_{j-i}}\right)^{2}-\frac{1}{2n}\sum_{j=1}^{n}\left(\tilde{X}_{t_{j}}-\tilde{X}_{t_{j-1}}\right)^{2}. \tag{4b}
\end{align*}

There is a trade-off between the signal term and the error due to noise. Choosing i=c_{sub}n^{\nicefrac{{2}}{{3}}} dependent on n with a constant c_{sub}, the overall mean square error is minimized and of order n^{-\nicefrac{{1}}{{3}}}. The one-scale subsampling estimator (4a) is hence a consistent and asymptotically unbiased estimator. The rate of convergence n^{\nicefrac{{1}}{{6}}}, however, is slow and does not attain the optimal rate n^{\nicefrac{{1}}{{4}}} determined in Bibinger (2011b). For this reason, we focus on a multiscale extension of the subsampling approach, on which the methods developed in Bibinger (2011b) are based. The multiscale realized covolatility (MSRC), and the univariate multiscale realized volatility (MSRV) introduced in Zhang (2006), are linear combinations of one-scale subsampling estimators with M_{n} different subsampling frequencies i=1,\ldots,M_{n}:
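The one-scale subsampling estimator (4a) for synchronous observations can be sketched in a few lines of Python with NumPy. This is an illustrative sketch under the assumption that the noisy observations are stored in two arrays of equal length n+1, indexed by the common observation times. The function names and the default constant c_sub = 1 are hypothetical choices.

```python
import numpy as np


def subsample_covariation(x_obs, y_obs, i):
    """One-scale subsampling estimator (4a) with lag i for synchronous data.

    x_obs, y_obs : noisy observations at the common times t_0, ..., t_n
    i            : subsampling lag, 1 <= i <= n
    """
    x = np.asarray(x_obs, dtype=float)
    y = np.asarray(y_obs, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x_obs and y_obs must be 1-d arrays of equal length")
    n = len(x) - 1
    if not 1 <= i <= n:
        raise ValueError("lag i must satisfy 1 <= i <= n")
    # lag-i increments X_{t_j} - X_{t_{j-i}} for j = i, ..., n
    dx = x[i:] - x[:-i]
    dy = y[i:] - y[:-i]
    return float(np.sum(dx * dy) / i)


def subsampling_lag(n, c_sub=1.0):
    """Lag of order n^(2/3); the constant c_sub is a tuning parameter."""
    return max(1, min(n, int(round(c_sub * n ** (2.0 / 3.0)))))
```

With i = 1 the function reduces to the ordinary realized covolatility, which makes the noise sensitivity at the highest frequency easy to reproduce in simulations.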

\begin{align*}
\widehat{\left[X,Y\right]}_{T}^{multi} &= \sum_{i=1}^{M_{N}}\frac{\alpha_{i,M_{N}}^{opt}}{i}\sum_{j=i}^{n}\left(\tilde{X}_{t_{j}}-\tilde{X}_{t_{j-i}}\right)\left(\tilde{Y}_{t_{j}}-\tilde{Y}_{t_{j-i}}\right), \tag{5a} \\
\widehat{\left[X\right]}_{T}^{multi} &= \sum_{i=1}^{M_{N}}\frac{\alpha_{i,M_{N}}^{opt}}{i}\sum_{j=i}^{n}\left(\tilde{X}_{t_{j}}-\tilde{X}_{t_{j-i}}\right)^{2}. \tag{5b}
\end{align*}

The weights are chosen such that the estimator is asymptotically unbiased and the error due to noise minimized. They are given later in (30) and can be chosen equally for the bivariate and the univariate case. Those are the standard discrete weights of Zhang (2006) and we abstain from giving a more general class of possible weight functions.
The mean square error of the multiscale realized covolatility (5a) can be split into uncorrelated addends that stem from discretization, from microstructure noise, and from cross terms and end-effects. They are of orders M_{n}/n, n/M_{n}^{3}, and M_{n}^{-1}, respectively. Hence, a choice M_{n}=c_{multi}\sqrt{n} leads to a rate-optimal n^{\nicefrac{{1}}{{4}}}-consistent estimator.
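A minimal Python sketch of the multiscale realized covolatility (5a) for synchronous data follows. The weights of formula (30) are not reproduced in this part of the article. The sketch therefore assumes the discrete weights a_i = 12 i^2/(M^3 - M) - 6 i/(M^2 - M), a form commonly attributed to Zhang (2006), which satisfy sum_i a_i = 1 and sum_i a_i/i = 0. Both the weight formula and the function names should be read as assumptions.

```python
import numpy as np


def multiscale_weights(M):
    """Discrete multiscale weights (assumed form, see lead-in); requires M >= 2."""
    if M < 2:
        raise ValueError("M must be at least 2")
    i = np.arange(1, M + 1, dtype=float)
    return 12.0 * i**2 / (M**3 - M) - 6.0 * i / (M**2 - M)


def multiscale_covariation(x_obs, y_obs, M):
    """Weighted sum of one-scale subsampling estimators with lags 1, ..., M."""
    x = np.asarray(x_obs, dtype=float)
    y = np.asarray(y_obs, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x_obs and y_obs must be 1-d arrays of equal length")
    if M > len(x) - 1:
        raise ValueError("M must not exceed the number of increments")
    w = multiscale_weights(M)
    est = 0.0
    for i in range(1, M + 1):
        # one-scale term: (1/i) * sum_j (X_j - X_{j-i}) (Y_j - Y_{j-i})
        est += w[i - 1] / i * np.sum((x[i:] - x[:-i]) * (y[i:] - y[:-i]))
    return float(est)
```

The two constraints on the weights reflect the two goals stated above: weights summing to one keep the signal part asymptotically unbiased, and the second constraint cancels the leading contribution of the noise.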
The following stable central limit theorems for the multiscale realized covolatility (5a) and the one-scale estimator (4a) are implied by Theorem 3 and Corollary 4.1:

###### Proposition 3.0.

On Assumptions 1, 2 and 3 in the synchronous setup and if (n/T)\sum_{i}(t_{i}^{(n)}-t_{i-1}^{(n)})^{2} converges to a continuously differentiable limiting function G and the difference quotients converge uniformly to G^{\prime} on [0,T], the multiscale realized covolatility (5a) and the subsampling estimator (4a) converge stably in law to mixed normal limiting random variables:

\begin{align*}
n^{\nicefrac{1}{4}}\left(\widehat{\left[X,Y\right]}_{T}^{multi}-\left[X,Y\right]_{T}\right) &\stackrel{st}{\rightsquigarrow}\mathbf{N}\left(0\,,\,\operatorname{\mathbf{AVAR}}_{multi,syn}\right), \tag{6a} \\
n^{\nicefrac{1}{6}}\left(\widehat{\left[X,Y\right]}_{T}^{sub}-\left[X,Y\right]_{T}\right) &\stackrel{st}{\rightsquigarrow}\mathbf{N}\left(0\,,\,\operatorname{\mathbf{AVAR}}_{sub,syn}\right), \tag{6b}
\end{align*}

with

\begin{align*}
\operatorname{\mathbf{AVAR}}_{multi,syn} &= c_{multi}^{-3}\,24\,\eta_{X}^{2}\eta_{Y}^{2}+c_{multi}\frac{26}{35}T\int_{0}^{T}G^{\prime}(t)(\rho_{t}^{2}+1)(\sigma_{t}^{X}\sigma_{t}^{Y})^{2}\,dt \\
&\quad+c_{multi}^{-1}\frac{12}{5}\left(\eta_{X}^{2}\eta_{Y}^{2}+\eta_{X}^{2}\int_{0}^{T}(\sigma_{t}^{Y})^{2}\,dt+\eta_{Y}^{2}\int_{0}^{T}(\sigma_{t}^{X})^{2}\,dt\right), \tag{6c} \\
\operatorname{\mathbf{AVAR}}_{sub,syn} &= c_{sub}^{-2}\,4\,\eta_{X}^{2}\eta_{Y}^{2}+c_{sub}\frac{2}{3}T\int_{0}^{T}G^{\prime}(t)(\rho_{t}^{2}+1)(\sigma_{t}^{X}\sigma_{t}^{Y})^{2}\,dt. \tag{6d}
\end{align*}

### 3.2 Synchronization and the Hayashi–Yoshida estimator

We use the short notation \Delta X_{t_{i}},i=1,\ldots,n from now on for increments X_{t_{i}}-X_{t_{i-1}} and analogously for Y. The Hayashi-Yoshida estimator

\begin{align*}
\widehat{\left[X,Y\right]}_{T}^{(HY)}=\sum_{i=1}^{n}\sum_{j=1}^{m}\Delta X_{t_{i}}\Delta Y_{\tau_{j}}\mathbbm{1}_{[\min{(t_{i},\tau_{j})}>\max{(t_{i-1},\tau_{j-1})}]}, \tag{7}
\end{align*}

where the product terms include all increments of the processes with overlapping observation time instants, has been proved in Hayashi and Yoshida (2005) to be consistent in a model of asynchronously observed Itô processes with deterministic correlation, drift and volatility functions in the absence of observation noise and on further regularity conditions to be asymptotically normally distributed in Hayashi and Yoshida (2008).
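The definition (7) translates directly into a short vectorized Python sketch. It assumes noise-free observations given as sorted time arrays with matching value arrays. The function name is hypothetical, and the full n-by-m overlap matrix is built only for clarity, not for efficiency.

```python
import numpy as np


def hayashi_yoshida(t, x, tau, y):
    """Hayashi-Yoshida estimator (7) from asynchronous observations.

    t, x   : sorted observation times of X and the observed values
    tau, y : sorted observation times of Y and the observed values
    """
    t, x, tau, y = (np.asarray(a, dtype=float) for a in (t, x, tau, y))
    dx = np.diff(x)  # increments of X over (t_{i-1}, t_i]
    dy = np.diff(y)  # increments of Y over (tau_{j-1}, tau_j]
    # indicator that the i-th X interval and the j-th Y interval overlap
    left = np.maximum(t[:-1, None], tau[None, :-1])
    right = np.minimum(t[1:, None], tau[None, 1:])
    overlap = (right > left).astype(float)
    return float(dx @ overlap @ dy)
```

For synchronous sampling the overlap matrix is the identity, so the sketch reduces to the ordinary realized covolatility.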
For a combination of the strategy of the Hayashi-Yoshida estimator with techniques to handle noise contamination, we use an iterative algorithm introduced in Palandri (2006) as ‘pseudo-aggregation’. Incorporating telescoping sums there are the following rewritings of the estimator (7):

\begin{align*}
\widehat{\left[X,Y\right]}_{T}^{(HY)} &= \sum_{i=1}^{n}\Delta X_{t_{i}}\left(Y_{t_{i,+}}-Y_{t_{i-1,-}}\right) \\
&= \sum_{i=1}^{N}\left(X_{g_{i}}-X_{l_{i}}\right)\left(Y_{\gamma_{i}}-Y_{\lambda_{i}}\right) \tag{8} \\
&= \sum_{i=1}^{N}\left(X_{T_{i,+}^{X}}-X_{T_{i-1,-}^{X}}\right)\left(Y_{T_{i,+}^{Y}}-Y_{T_{i-1,-}^{Y}}\right),
\end{align*}

with the notion of next-tick interpolated times t_{i,+}:=\min_{0\leq j\leq m}{\left(\tau_{j}|\tau_{j}\geq t_{i}\right)} and previous-tick interpolated ones t_{i,-}:=\max_{0\leq j\leq m}{\left(\tau_{j}|\tau_{j}\leq t_{i}\right)} in the first equality. This rewriting can be done in the symmetric way as well.
The illustration of (8) that serves as a basis for the construction of the generalized multiscale estimator relies on an aggregation of the observations according to Algorithm 1. This algorithm, which is a concise version of the construction in Bibinger (2011b), stops after (N+1)\leq\min{(n,m)}+1 steps when all observation times are grouped. Summation in (8) can start with i=0 or i=1.

In the last equality only the notation g_{i},\gamma_{i},l_{i},\lambda_{i} is substituted, emphasizing that the sampling times obtained by Algorithm 1 can again be interpreted as previous- and next-tick interpolations with respect to a synchronous sampling scheme T_{k}:=\min{(g_{k},\gamma_{k})},\,1\leq k\leq N, which we call the closest synchronous approximation. Increments in (8) are taken from previous-tick interpolations at the left end points of the instants [T_{k-1},T_{k}],\,2\leq k\leq N, to next-tick interpolated sampling times at the right end points. Since T_{k}=\max{(l_{k+1},\lambda_{k+1})},\,1\leq k\leq(N-1), holds true, we split the estimation error of (8) into two uncorrelated parts D_{T}^{N}+A_{T}^{N} with

\begin{align*}
D_{T}^{N} := {} & \sum_{i=1}^{N}\left(\left(X_{T_{i}}-X_{T_{i-1}}\right)\left(Y_{T_{i}}-Y_{T_{i-1}}\right)-\int_{T_{i-1}}^{T_{i}}\rho_{t}\sigma_{t}^{X}\sigma_{t}^{Y}\,dt\right) \tag{9} \\
& -\int_{0}^{t_{0}\wedge\tau_{0}}\rho_{t}\sigma_{t}^{X}\sigma_{t}^{Y}\,dt-\int_{t_{n}\wedge\tau_{m}}^{T}\rho_{t}\sigma_{t}^{X}\sigma_{t}^{Y}\,dt
\end{align*}

being the discretization error of a synchronous-type realized covolatility including in general non-observable idealized values at the times of the closest synchronous approximation and A_{T}^{N} an additional error due to the lack of synchronicity, in particular next- and previous-tick interpolations. The times T_{k} equal the so-called refresh times of Barndorff-Nielsen et al. (2008b) and thus our synchronization differs from the one in Barndorff-Nielsen et al. (2008b) by replacing pure previous-tick interpolation by the above given machinery of previous- and next-tick interpolations.
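Since Algorithm 1 itself is not reproduced here, the following Python sketch relies only on the characterization just given: the times T_k are refresh times, the left end points are previous-tick interpolations at T_{k-1} and the right end points are next-tick interpolations at T_k. It is a hedged reconstruction rather than the article's algorithm. All function names are hypothetical, and the clipping of the last right end point to the final available observation is an assumption about the end-effects.

```python
import numpy as np


def refresh_times(t, tau):
    """Refresh times T_0 < T_1 < ... < T_N of two sorted tick arrays."""
    t, tau = np.asarray(t, dtype=float), np.asarray(tau, dtype=float)
    T = [max(t[0], tau[0])]
    while True:
        i = np.searchsorted(t, T[-1], side="right")    # first X tick after T[-1]
        j = np.searchsorted(tau, T[-1], side="right")  # first Y tick after T[-1]
        if i == len(t) or j == len(tau):
            break  # one process has no further observation
        T.append(max(t[i], tau[j]))
    return np.array(T)


def interpolation_indices(ticks, T):
    """Indices of previous ticks at T_{k-1} and next ticks at T_k, k = 1..N."""
    ticks = np.asarray(ticks, dtype=float)
    prev_idx = np.searchsorted(ticks, T[:-1], side="right") - 1
    next_idx = np.searchsorted(ticks, T[1:], side="left")
    # if no tick lies at or after T_N, fall back to the last observation
    next_idx = np.minimum(next_idx, len(ticks) - 1)
    return prev_idx, next_idx


def synchronized_covariation(t, x, tau, y):
    """Sum over k of (X_{g_k} - X_{l_k}) (Y_{gamma_k} - Y_{lambda_k}) as in (8)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    T = refresh_times(t, tau)
    l, g = interpolation_indices(t, T)
    lam, gam = interpolation_indices(tau, T)
    return float(np.sum((x[g] - x[l]) * (y[gam] - y[lam])))
```

On simulated asynchronous data the result can be compared with a direct evaluation of (7), which is a convenient sanity check for any implementation of the aggregation step.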
The asymptotic theory for the estimator (8) as N\rightarrow\infty, concisely repeated here, is separately proved and presented in a more elaborate way in Bibinger (2011a).
First, we take up the illustrative example from Bibinger (2011b) to motivate the synchronization procedure. For further details and examples we refer to Bibinger (2011b) and Palandri (2006). Figure 2 visualizes the aggregation carried out by Algorithm 1 and the times T_{i},\,i=0,\ldots,8 for a toy example. The example emphasizes the important fact that consecutive right-end points of increments can be the same time points. The realized covolatility calculated from previous-tick interpolated values to refresh times equals

\begin{align*}
&(X_{t_{2}}-X_{t_{0}})(Y_{\tau_{1}}-Y_{\tau_{0}})+(X_{t_{3}}-X_{t_{2}})(Y_{\tau_{3}}-Y_{\tau_{1}})+(X_{t_{5}}-X_{t_{3}})(Y_{\tau_{4}}-Y_{\tau_{3}}) \\
&+(X_{t_{6}}-X_{t_{5}})(Y_{\tau_{5}}-Y_{\tau_{4}})+(X_{t_{7}}-X_{t_{6}})(Y_{\tau_{6}}-Y_{\tau_{5}})+(X_{t_{8}}-X_{t_{7}})(Y_{\tau_{7}}-Y_{\tau_{6}}) \\
&+(X_{t_{9}}-X_{t_{8}})(Y_{\tau_{8}}-Y_{\tau_{7}})+(X_{t_{10}}-X_{t_{9}})(Y_{\tau_{10}}-Y_{\tau_{8}})
\end{align*}

and is systematically biased downwards by interpolations, whereas (8) yields

\begin{align*}
&(X_{t_{3}}-X_{t_{0}})(Y_{\tau_{1}}-Y_{\tau_{0}})+(X_{t_{3}}-X_{t_{2}})(Y_{\tau_{3}}-Y_{\tau_{1}})+(X_{t_{6}}-X_{t_{3}})(Y_{\tau_{4}}-Y_{\tau_{3}}) \\
&+(X_{t_{7}}-X_{t_{5}})(Y_{\tau_{5}}-Y_{\tau_{4}})+(X_{t_{8}}-X_{t_{6}})(Y_{\tau_{6}}-Y_{\tau_{5}})+(X_{t_{8}}-X_{t_{7}})(Y_{\tau_{8}}-Y_{\tau_{6}}) \\
&+(X_{t_{9}}-X_{t_{8}})(Y_{\tau_{9}}-Y_{\tau_{7}})+(X_{t_{10}}-X_{t_{9}})(Y_{\tau_{10}}-Y_{\tau_{8}}),
\end{align*}

which is not biased due to interpolations.

###### Definition (quadratic (co-)variations of time).

For any N\in\mathds{N} let T_{i}^{(N)},~i=0,\ldots,N be the times of the closest synchronous approximation and g_{i}^{(N)},\gamma_{i}^{(N)},l_{i}^{(N)},\lambda_{i}^{(N)} the corresponding observation times that appear in the estimator (8) defined above by Algorithm 1. T/N is the mean of the time instants \Delta T_{i}^{(N)}=T_{i}^{(N)}-T_{i-1}^{(N)},~i=1,\ldots,N. Define the following sequences of functions

\begin{align*}
G^{N}(t) &= \frac{N}{T}\sum_{T_{i}^{(N)}\leq t}\left(\Delta T_{i}^{(N)}\right)^{2}, \tag{10a} \\
F^{N}(t) &= \frac{N}{T}\sum_{T_{i+1}^{(N)}\leq t}\left(T_{i}^{(N)}-\lambda_{i}^{(N)}\right)\left(g_{i}^{(N)}-T_{i}^{(N)}\right)+\left(T_{i}^{(N)}-l_{i}^{(N)}\right)\left(\gamma_{i}^{(N)}-T_{i}^{(N)}\right) \\
&\quad+\Delta T_{i+1}^{(N)}\left(T_{i}^{(N)}-l_{i+1}^{(N)}\right)+\Delta T_{i+1}^{(N)}\left(T_{i}^{(N)}-\lambda_{i+1}^{(N)}\right), \tag{10b} \\
H^{N}(t) &= \frac{N}{T}\sum_{T_{i+1}^{(N)}\leq t}\left(T_{i}^{(N)}-l_{i+1}^{(N)}\right)\left(g_{i}^{(N)}-T_{i}^{(N)}\right)+\left(T_{i}^{(N)}-\lambda_{i+1}^{(N)}\right)\left(\gamma_{i}^{(N)}-T_{i}^{(N)}\right), \tag{10c}
\end{align*}

for t\in[0,T] that we call sequences of quadratic (co-)variations of times. A stable central limit theorem for the estimation error is deduced in Bibinger (2011a) on the assumption that the sequences defined by (10a), (10b) and (10c) converge pointwise to continuously differentiable limiting functions G,F,H and that the sequences of difference quotients converge uniformly. The asymptotic quadratic variation of time G of the times T_{i}^{(N)} influences the asymptotics of D_{T}^{N}. The covariation of times F^{N} measures an interaction of interpolation errors between the two processes and H^{N} the impact of the in general non-zero correlations of the products involving previous- and next-tick interpolations at the same times T_{i}^{(N)} for each process separately.
Consider as the simplest example the synchronous equidistant sampling schemes with N=n=m and t_{i}^{(n)}=\tau_{i}^{(n)}=i/n,i=0,\ldots,n. In this case F^{N} and H^{N} are identically zero since interpolations are redundant. The function G^{N} is a step function that will tend to the identity on [0,T] as N\rightarrow\infty.
Then, consider a situation of completely non-synchronous sampling schemes that originates from the completely synchronous equidistant one by shifting one time-scale by half a time instant 1/(2N). We will call this situation intermeshed sampling. For this example, the synchronous approximation is still equidistant with instants 1/N and, hence, G is the identity function. F and H are linear limiting functions with slope 1 and 1/4, respectively.
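The step function G^N from (10a) is simple to evaluate numerically, which can help to check the convergence assumption for a given sampling design. The sketch below covers only G^N. The function name and the explicit `horizon` argument (the terminal time T) are hypothetical.

```python
import numpy as np


def quadratic_variation_of_time(T_grid, t, horizon):
    """Evaluate G^N(t) = (N / T) * sum over T_i <= t of (T_i - T_{i-1})^2.

    T_grid  : times T_0 < T_1 < ... < T_N of the closest synchronous approximation
    t       : evaluation point in [0, T]
    horizon : terminal time T
    """
    T_grid = np.asarray(T_grid, dtype=float)
    N = len(T_grid) - 1
    dT = np.diff(T_grid)         # Delta T_i for i = 1, ..., N
    included = T_grid[1:] <= t   # increments whose right end point is at most t
    return float(N / horizon * np.sum(dT[included] ** 2))
```

For an equidistant grid on [0, 1] the function returns floor(tN)/N, which approaches the identity as N grows, in line with the two examples above.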
In Bibinger (2011a) we show for an important special case, independent homogeneous Poisson sampling, that the convergence assumptions on (10a)-(10c) are fulfilled when replacing deterministic convergence by convergence in probability. Furthermore, the stochastic limits G^{\prime}(t),F^{\prime}(t),H^{\prime}(t) are calculated explicitly.
The main result for the estimator (8) is Theorem 2. It serves as preparation to prove the stable limit theorem for the generalized multiscale estimator in Theorem 3 and gives insight into the asymptotic distribution of (8). For the proof we refer to Bibinger (2011a). A similar stable limit theorem for the original Hayashi-Yoshida estimator is provided in Hayashi and Yoshida (2011).

###### Theorem 2.

The estimation error of (8) converges on the Assumptions 1, 2 and convergence assumptions on (10a)-(10c) and the difference quotients stably in law to a centred, mixed Gaussian distribution:

\begin{align*}
\sqrt{N}\left(\sum_{i=1}^{N}\left(X_{g_{i}}-X_{l_{i}}\right)\left(Y_{\gamma_{i}}-Y_{\lambda_{i}}\right)-\left[X\,,\,Y\right]_{T}\right)\stackrel{st}{\rightsquigarrow}\mathbf{N}\left(0\,,\,v_{D_{T}}+v_{A_{T}}\right), \tag{11}
\end{align*}

with the asymptotic variance

\begin{align*}
v_{D_{T}}+v_{A_{T}}=T\int_{0}^{T}G^{\prime}(t)\left(\sigma_{t}^{X}\sigma_{t}^{Y}\right)^{2}\left(\rho_{t}^{2}+1\right)dt+T\int_{0}^{T}\left(F^{\prime}(t)\left(\sigma_{t}^{X}\sigma_{t}^{Y}\right)^{2}+2H^{\prime}(t)\left(\rho_{t}\sigma_{t}^{X}\sigma_{t}^{Y}\right)^{2}\right)dt
\end{align*}

where the two addends come from the asymptotic variances of D_{T}^{N} and A_{T}^{N}, respectively.

### 3.3 Hybrid approach to non-synchronous and noisy observations

In Bibinger (2011b) we have proposed the following combined estimation method for the quadratic covariation or integrated covolatility from noisy asynchronous observations. After applying Algorithm 1 to the observation times, the generalized multiscale estimator is defined by

 \displaystyle\widehat{\left[X,Y\right]}_{T}^{multi}=\sum_{i=1}^{M_{N}}\frac{% \alpha_{i,M_{N}}^{opt}}{i}\sum_{j=i}^{N}\left(\tilde{X}_{g_{j}^{(N)}}-\tilde{X% }_{l_{j-i+1}^{(N)}}\right)\left(\tilde{Y}_{\gamma_{j}^{(N)}}-\tilde{Y}_{% \lambda_{j-i+1}^{(N)}}\right)~{}. (12)

It is a weighted sum of M_{N} one-scale subsampling estimators of the type

 \widehat{\left[X,Y\right]}_{T}^{sub}=\frac{1}{i_{N}}\sum_{j=i_{N}}^{N}\left(\tilde{X}_{g_{j}^{(N)}}-\tilde{X}_{l_{j-i_{N}+1}^{(N)}}\right)\left(\tilde{Y}_{\gamma_{j}^{(N)}}-\tilde{Y}_{\lambda_{j-i_{N}+1}^{(N)}}\right) (13)

with subsampling frequencies i=1,\ldots,M_{N} and optimal weights given later in (30). Because the non-synchronous observation times are aggregated before subsampling and the multiscale approach are applied, the resulting estimator has the same form as in the synchronous case (5a). Recall that in the synchronous setting g_{j}=\gamma_{j}=T_{j} and l_{j-i+1}=\lambda_{j-i+1}=T_{j-i} hold.
Choosing M_{N}=c_{multi}\cdot\sqrt{N} and i_{N}=c_{sub}\cdot N^{\nicefrac{2}{3}}, the two estimators above are consistent and asymptotically unbiased with convergence rates N^{\nicefrac{1}{4}} and N^{\nicefrac{1}{6}}, respectively.
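
To make the structure of (12) and (13) concrete, the following minimal Python sketch evaluates both estimators in the synchronous special case, where every increment runs from T_{j-i} to T_{j}. It is an illustration only: the function names are hypothetical, and the finite-sample weights are an assumption in the style of Zhang (2006), chosen so that they sum to one, satisfy \sum_i \alpha_i/i = 0 and agree to leading order with 12i^{2}/M_{N}^{3}-6i/M_{N}^{2}. The explicit weights (30) of this article may differ in lower-order terms.

```python
import numpy as np


def multiscale_weights(M):
    """Weights a_1..a_M with sum(a_i) = 1 and sum(a_i / i) = 0.

    Assumption: finite-sample form in the style of Zhang (2006); the leading
    order is 12 i^2 / M^3 - 6 i / M^2. Not taken from equation (30).
    """
    if M < 2:
        raise ValueError("M must be at least 2")
    i = np.arange(1, M + 1, dtype=float)
    return 12.0 * i / M**2 * (i / M - 0.5 - 0.5 / M) / (1.0 - 1.0 / M**2)


def subsampling_estimator(x, y, i):
    """One-scale estimator (1/i) * sum_{j=i}^{N} (x_j - x_{j-i}) (y_j - y_{j-i}).

    x and y hold synchronous (noisy) observations at T_0, ..., T_N.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x[i:] - x[:-i]  # lag-i increments of the first process
    dy = y[i:] - y[:-i]  # lag-i increments of the second process
    return float(np.dot(dx, dy)) / i


def multiscale_estimator(x, y, M):
    """Weighted sum of the one-scale estimators with lags i = 1, ..., M."""
    w = multiscale_weights(M)
    return sum(w[i - 1] * subsampling_estimator(x, y, i) for i in range(1, M + 1))
```

In the non-synchronous case the same sums are taken over the next-tick and previous-tick times produced by Algorithm 1 instead of the common grid; that step is not reproduced in this sketch.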

## 4 Asymptotics and a stable central limit theorem for the generalized multiscale estimator

A comprehensive analysis of the asymptotic distribution of the estimation error requires a careful examination of the interplay between Algorithm 1 and the joint sampling design \left(\mathcal{T}^{X,n},\mathcal{T}^{Y,m}\right).
Note that the generalized multiscale estimator (12) differs from the other plausible Hayashi-Yoshida version of a multiscale estimator

 \sum_{i=1}^{M_{N}}\frac{\beta_{i,M_{N}}^{opt}}{i}\sum_{j=i}^{n}\sum_{k\in\mathds{Z}}\Big(\tilde{X}_{t_{j}^{(n)}}-\tilde{X}_{t_{j-i}^{(n)}}\Big)\Big(\tilde{Y}_{\tau_{j+k\cdot i}^{(m)}}-\tilde{Y}_{\tau_{j+(k-1)\cdot i}^{(m)}}\Big)\mathbbm{1}_{\{\max(t_{j-i}^{(n)},\tau_{j+(k-1)\cdot i}^{(m)})<\min(t_{j}^{(n)},\tau_{j+k\cdot i}^{(m)})\}}, (14)

which arises as the natural Hayashi-Yoshida multiscale estimator when, on the basis of (non-synchronized) observations of \tilde{X} and \tilde{Y}, sparse-sample Hayashi-Yoshida estimators are averaged to one-scale subsample estimators and these are extended to a linear combination using different time lags. We state without proof that this estimator is consistent, asymptotically unbiased and attains the optimal rate of convergence. Nevertheless, we benefit from the data aggregation method and from applying subsampling to the synchronized scheme, since the variance of our estimator (12) is smaller than that of this alternative estimator and we are able to find a feasible closed-form expression for the asymptotic variance.
The crucial difference between the two approaches is that, for the alternative method, next- and previous-tick interpolation errors occur on sparse-sampling time intervals of average order i/N, whereas the interpolation errors of the generalized multiscale estimator (12) occur on the highest-frequency scale and hence on intervals of average order 1/N. In particular, the decomposition

 \left(X_{g_{j}^{(N)}}-X_{l_{j-i_{N}+1}^{(N)}}\right)=\big(\underbrace{X_{g_{j}^{(N)}}-X_{T_{j}^{(N)}}}_{=\mathcal{O}_{p}(N^{-1/2})}+\underbrace{X_{T_{j}^{(N)}}-X_{T_{j-i}^{(N)}}}_{=\mathcal{O}_{p}((i/N)^{1/2})}+\underbrace{X_{T_{j-i}^{(N)}}-X_{l_{j-i_{N}+1}^{(N)}}}_{=\mathcal{O}_{p}(N^{-1/2})}\big)

of the increments of X, and analogously for Y, gives a heuristic argument that the interpolation errors driving the error due to non-synchronicity do not asymptotically affect the variance of the signal term. The stochastic orders are given for time instants of average order N^{-1}.
For a rigorous treatment of the asymptotic error due to noise and of the cross term, both influenced by the i.i.d. observation errors at times g_{i},l_{i},\gamma_{i},\lambda_{i}, we identify the times with g_{i}=g_{i+1} and the right-end points g_{i}^{(N)}=l_{i+1}^{(N)},g_{i}^{(N)}=l_{i+2}^{(N)} that also occur as left-end points, and analogously for the sampling times of \tilde{Y}.
Each observation time \gamma_{i},\lambda_{i} falls into exactly one of the following four mutually exclusive cases. Denote by \gamma_{j,-} the last observation time of \tilde{Y} before \gamma_{j} and by \gamma_{j,+} the first one after \gamma_{j}. We illustrate the allocation of the observation times for \mathcal{T}^{Y,m} and \gamma_{j}\,,\,j=1,\ldots,N-2:

 ①~\gamma_{j}\leq g_{j}\;\Rightarrow\;\gamma_{j}\neq\gamma_{j+1},~\gamma_{j}=\lambda_{j+1},~\gamma_{j}\neq\lambda_{j+2},
 ②~\gamma_{j}>g_{j},~\gamma_{j}\geq g_{j,+}\;\Rightarrow\;\gamma_{j}=\gamma_{j+1},~\gamma_{j}\neq\lambda_{j+1},~\gamma_{j}=\lambda_{j+2},
 ③~\gamma_{j}>g_{j},~\gamma_{j}\,[\ldots]\,g_{j,+}\;\Rightarrow\;\gamma_{j}\neq\gamma_{j+1},~\gamma_{j}\neq\lambda_{j+1},~\gamma_{j}=\lambda_{j+2},
 ④~\gamma_{j}>g_{j},~\gamma_{j}\,[\ldots]

Only sampling times falling under case ② lead to repeated \gamma_{i}=\gamma_{i+1}. In cases ①, ② and ③, a subsequent left-end point \lambda_{k}, k=i+1 or k=i+2, of the observation time instants incorporated in the subsampling estimators is given by \gamma_{i}. All other \lambda_{k},\,k=2,\ldots,N, appear in an allocation of sampling times of type ④, where \lambda_{j+2}=\gamma_{j,+}\neq\gamma_{l}\,\forall l. Recall that \lambda_{i}\neq\lambda_{k} holds for all i\neq k.
If ② holds for \gamma_{j} with fixed j\in\{1,\ldots,N-2\} and if k:=\arg\min_{k\in\{j,\ldots,N-1\}}\left(\gamma_{k}>g_{k}\,,\,\gamma_{k}\geq g_{k,+}\right) exists, then ② necessarily holds for one g_{l},l\in\{j+1,\ldots,k-1\} or g_{l}=\gamma_{l}.
In Table 1 we list the relations for the sampling design of our previous example.

###### Assumption 4 (asymptotic quadratic variation of time).

Assume that for the sequences of sampling schemes and the times T_{i}^{(N)} of the closest synchronous approximations and for the sequence of quadratic variations of time G^{N}(t) defined in Definition 3.2, the following holds true:

1. G^{N}(t)\rightarrow G(t) as N\rightarrow\infty, where G(t) is a continuously differentiable function on [0,T].

2. For any null sequence (h_{N}),\,h_{N}=\mathcal{O}\left(N^{-1}\right)

 \displaystyle\frac{G^{N}(t+h_{N})-G^{N}(t)}{h_{N}}\rightarrow G^{\prime}(t) (15)

uniformly on [0,T] as N\rightarrow\infty.

3. The derivative G^{\prime}(t) is bounded away from zero.

###### Definition 4 (degree of regularity of asynchronicity).

For N\in\mathds{N} and sets \mathcal{H}^{i},\mathcal{G}^{i},i=0,\ldots,N constructed from aggregated sampling schemes \mathcal{T}^{X,n},\mathcal{T}^{Y,m} that fulfill Assumption 2, define the following sequences of functions:

 I_{X}^{N}(t)=\frac{1}{N}\sum_{g_{j}^{(N)}\leq t}\mathbbm{1}_{\{g_{j}^{(N)}=g_{j-1}^{(N)}\}}, (16a)
 I_{Y}^{N}(t)=\frac{1}{N}\sum_{\gamma_{j}^{(N)}\leq t}\mathbbm{1}_{\{\gamma_{j}^{(N)}=\gamma_{j-1}^{(N)}\}}, (16b)

which describe the degree of regularity of asynchronicity between observation times \mathcal{T}^{X,n} and \mathcal{T}^{Y,m}. In the completely asynchronous case, we can directly conclude that |I^{N}_{X}(t)-I^{N}_{Y}(t)|\leq T/N for all t\in[0,T] and one sequence suffices to reflect the regularity of the non-synchronous sampling schemes.
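
The empirical functions (16a) and (16b) are simple counting statistics. The following Python sketch evaluates one of them from a given array of next-tick times; the function name is hypothetical, and it is an assumption of this illustration that the array holds g_{1},\ldots,g_{N} in non-decreasing order with N equal to its length and that repeated times are stored as identical floating-point values.

```python
import numpy as np


def asynchronicity_degree(g, t):
    """Evaluate I^N(t) = (1/N) * #{j : g_j <= t and g_j == g_{j-1}}.

    Assumption: g contains the next-tick times g_1, ..., g_N in
    non-decreasing order, so N = len(g).
    """
    g = np.asarray(g, dtype=float)
    N = g.size
    repeated = np.zeros(N, dtype=bool)
    repeated[1:] = g[1:] == g[:-1]  # next-tick time equals its predecessor
    return np.count_nonzero(repeated & (g <= t)) / N
```

For example, with g = (0.1, 0.2, 0.2, 0.4, 0.4, 0.5) two of the six times repeat their predecessor, so the function returns 1/6 at t = 0.3 and 2/6 at t = 1.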

###### Assumption 5 (asymptotic degree of regularity of asynchronicity).

Assume that for the sequences of sampling schemes and for the sequences of functions I_{X}^{N},I^{N}_{Y} defined in Definition 4, the following holds true:

1. I^{N}_{X}(t)\rightarrow I_{X}(t),I^{N}_{Y}(t)\rightarrow I_{Y}(t) as N\rightarrow\infty, where I_{X}(t),I_{Y}(t) are continuously differentiable functions on [0,T].

2. For any null sequence (h_{N}),\,h_{N}=\mathcal{O}\left(N^{-1}\right)

 \frac{I_{X}^{N}(t+h_{N})-I_{X}^{N}(t)}{h_{N}}\rightarrow I_{X}^{\prime}(t), (17a)
 \frac{I_{Y}^{N}(t+h_{N})-I_{Y}^{N}(t)}{h_{N}}\rightarrow I_{Y}^{\prime}(t) (17b)

uniformly on [0,T] as N\rightarrow\infty.

For both synchronous and intermeshed sampling, which were introduced in the last section, the sequences of functions I_{X}^{N},I_{Y}^{N} are identically zero. The functions defined in Definition 4 are non-negative and bounded above by 1. In Section 6 we explicitly deduce the asymptotic degree of regularity of asynchronicity for mutually independent homogeneous Poisson sampling schemes. The term (asymptotic) degree of regularity of asynchronicity has been chosen since Assumption 5 holds for all non-degenerate sequences in which observation times conforming to one of the cases ①-④ above tend to be distributed according to some regular pattern, and since it gives information on the interaction of allocations of observation times.
It is interesting, and might seem surprising at first glance, that the asymptotics of the estimator (12) hinge on this asymptotic feature whereas, as indicated before, the interpolations to the closest synchronous approximation are asymptotically immaterial. The reason is that, when an estimator is constructed with Algorithm 1, as for the original Hayashi-Yoshida estimator (8), observed values of the processes at next-tick interpolated observation times can appear twice. If there is observation noise, the number of observations allocated according to case ② has an impact on the asymptotics. In contrast to the estimator (8) with the faster convergence rate \sqrt{N}, the influence of interpolations vanishes asymptotically for the combined method, since interpolation steps take place on the time scale of the high-frequency observations while lower-frequency sparse-sampled increments of the synchronous approximation are involved to reduce the error due to noise. We continue with the central result of this article:

###### Theorem 3 (Central limit theorem for the generalized multiscale estimator).

Under Assumptions 1, 2, 3, 4 and 5, the generalized multiscale estimator (12) with noise-optimal weights \alpha_{i,M_{N}}^{opt}=(12i^{2}/M_{N}^{3})-(6i/M_{N}^{2})\left(1+{\scriptstyle\mathcal{O}}(1)\right), which are given explicitly in (30), and with M_{N}=c_{multi}\cdot\sqrt{N} converges \mathcal{F}-stably in law with optimal rate N^{\nicefrac{1}{4}} to a mixed Gaussian limiting distribution:

 N^{\nicefrac{1}{4}}\left(\widehat{\left[X,Y\right]}_{T}^{multi}-\left[X,Y\right]_{T}\right)\stackrel{st}{\rightsquigarrow}\mathbf{N}\left(0,\operatorname{\mathbf{AVAR}}_{multi}\right)

with the asymptotic variance

 \operatorname{\mathbf{AVAR}}_{multi}=c_{multi}^{-3}\,\underbrace{\left(24+12\,(I_{X}(T)+I_{Y}(T))\right)\eta_{X}^{2}\eta_{Y}^{2}}_{=\operatorname{\mathbf{AVAR}}_{\text{noise}}}+c_{multi}^{-1}\,\frac{12\eta_{X}^{2}\eta_{Y}^{2}}{5}
 \quad+c_{multi}\,\underbrace{\frac{26}{35}T\int_{0}^{T}G^{\prime}(t)(\sigma_{t}^{X}\sigma_{t}^{Y})^{2}(1+\rho_{t}^{2})\,dt}_{=\operatorname{\mathbf{AVAR}}_{dis,multi}}
 \quad+c_{multi}^{-1}\,\underbrace{\frac{12}{5}\left(\eta_{Y}^{2}\int_{0}^{T}(1+I^{\prime}_{Y}(t))(\sigma_{t}^{X})^{2}\,dt+\eta_{X}^{2}\int_{0}^{T}(1+I^{\prime}_{X}(t))(\sigma_{t}^{Y})^{2}\,dt\right)}_{=\operatorname{\mathbf{AVAR}}_{cross}}. (18)
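
As a function of the tuning constant, the asymptotic variance (18) has the form A c^{-3} + B c^{-1} + D c with A = \operatorname{\mathbf{AVAR}}_{\text{noise}}, B = 12\eta_{X}^{2}\eta_{Y}^{2}/5 + \operatorname{\mathbf{AVAR}}_{cross} and D = \operatorname{\mathbf{AVAR}}_{dis,multi}. The following Python sketch, which is an illustration and not part of the original derivation, evaluates this expression and the minimizing constant. The closed form follows from setting the derivative to zero, which gives the quadratic D u^{2} - B u - 3A = 0 in u = c^{2}. The function and argument names are hypothetical, and in practice the three inputs are random and would have to be replaced by estimates.

```python
import math


def avar_multi(c, a_noise, b_inv, d_dis):
    """Asymptotic variance as a function of c: A / c^3 + B / c + D * c."""
    return a_noise / c**3 + b_inv / c + d_dis * c


def optimal_c_multi(a_noise, b_inv, d_dis):
    """Minimizer of avar_multi over c > 0, assuming A >= 0, B >= 0, D > 0.

    Setting the derivative -3A/c^4 - B/c^2 + D to zero and substituting
    u = c^2 gives D u^2 - B u - 3A = 0; the positive root is used.
    """
    u = (b_inv + math.sqrt(b_inv**2 + 12.0 * a_noise * d_dis)) / (2.0 * d_dis)
    return math.sqrt(u)
```

Since each of the three terms is convex in c on the positive half-line, the stationary point is the unique minimizer under the stated sign assumptions.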

The weak convergence is proved to be stable with respect to the \sigma-algebra \mathcal{F} associated with the efficient processes. As a side result, we also obtain a stable central limit theorem for a simpler one-scale subsampling estimator:

###### Corollary 4.1 (Central limit theorem for the one-scale subsampling estimator).

On the Assumptions 1, 2, 3 and 4, the one-scale subsampling estimator with subsampling frequency i_{N}=c_{sub}\cdot N^{\nicefrac{{2}}{{3}}} converges \mathcal{F}-stably in law with rate N^{\nicefrac{{1}}{{6}}} to a mixed Gaussian limiting distribution:

 N^{\nicefrac{1}{6}}\left(\widehat{\left[X,Y\right]}_{T}^{sub}-\left[X,Y\right]_{T}\right)\stackrel{st}{\rightsquigarrow}\mathbf{N}\left(0,\operatorname{\mathbf{AVAR}}_{sub}\right), (19)

with the asymptotic variance

 \operatorname{\mathbf{AVAR}}_{sub}=c_{sub}^{-2}\,\underbrace{4\eta_{X}^{2}\eta_{Y}^{2}}_{=\operatorname{\mathbf{AVAR}}_{\text{noise,sub}}}+\,c_{sub}\,\underbrace{\frac{2}{3}T\int_{0}^{T}G^{\prime}(t)(\sigma_{t}^{X}\sigma_{t}^{Y})^{2}(1+\rho_{t}^{2})\,dt}_{=\operatorname{\mathbf{AVAR}}_{dis,sub}}. (20)

For the proof of Theorem 3, we split the total estimation error of the generalized multiscale estimator into three asymptotically uncorrelated addends due to noise, cross terms and the signal term. For the one-scale subsampling estimator we follow the same approach. The orders of the errors have been derived in Bibinger (2011b) and we focus on the asymptotic distribution here.
The error due to microstructure noise of the one-scale subsampling estimator has expectation zero and variance

 i_{N}^{-2}\sum_{j=i_{N}}^{N}\mathbb{E}\left[\left(\epsilon_{g_{j}}^{X}-\epsilon_{l_{j-i_{N}+1}}^{X}\right)^{2}\left(\epsilon_{\gamma_{j}}^{Y}-\epsilon_{\lambda_{j-i_{N}+1}}^{Y}\right)^{2}\right]=4Ni_{N}^{-2}\eta_{X}^{2}\eta_{Y}^{2}+{\scriptstyle\mathcal{O}}\left(Ni_{N}^{-2}\right),

since the observation noises of \tilde{X} and \tilde{Y} are independent of each other by Assumption 3, l_{k}\neq l_{r} and \lambda_{k}\neq\lambda_{r} for k\neq r, and g_{k}=g_{k+1} implies \gamma_{k}<\gamma_{k+1} for 0\leq k\leq N-1. Hence, the error due to noise is a sum of uncorrelated centred random variables with equal variances and the standard central limit theorem applies.
For the generalized multiscale estimator, we further decompose the error due to noise into a main part of order N^{\nicefrac{1}{2}}M_{N}^{-\nicefrac{3}{2}} and two terms due to end-effects of order M_{N}^{-\nicefrac{1}{2}}, where all three terms are asymptotically uncorrelated. In Propositions 3 and 4 we prove central limit theorems for these terms. Asymptotic normality holds both conditionally and unconditionally on the paths of the efficient processes.
The error due to noise of the one-scale estimator does not depend on any further influence of the sampling schemes except the number of constructed sets N and G^{\prime}. Cross terms are, in contrast to the multiscale case, asymptotically negligible since

 \mathbb{E}\left[\left(i_{N}^{-1}\sum_{j=i_{N}}^{N}\left(\left(X_{g_{j}}-X_{l_{j-i_{N}+1}}\right)\left(\epsilon_{\gamma_{j}}^{Y}-\epsilon_{\lambda_{j-i_{N}+1}}^{Y}\right)+\left(Y_{\gamma_{j}}-Y_{\lambda_{j-i_{N}+1}}\right)\left(\epsilon_{g_{j}}^{X}-\epsilon_{l_{j-i_{N}+1}}^{X}\right)\right)\right)^{2}\right]
 \quad=i_{N}^{-2}\sum_{j=i_{N}}^{N}\left(\int_{l_{j-i_{N}+1}}^{g_{j}}(\sigma_{t}^{X})^{2}\,dt~2\eta_{Y}^{2}+\int_{\lambda_{j-i_{N}+1}}^{\gamma_{j}}(\sigma_{t}^{Y})^{2}\,dt~2\eta_{X}^{2}\right)+{\scriptstyle\mathcal{O}}\left(i_{N}^{-2}\right)
 \quad=\mathcal{O}\left(i_{N}^{-2}\right)={\scriptstyle\mathcal{O}}(1).

For the generalized multiscale estimator, by contrast, the cross terms are of order M_{N}^{-\nicefrac{1}{2}} and affect the asymptotic distribution. In Proposition 11 a limit theorem is stated in which the weak convergence again holds both conditionally and unconditionally on the paths of the efficient processes. The asymptotic variance \operatorname{\mathbf{AVAR}}_{cross} includes the influence of the asymptotic degree of regularity of asynchronicity.
The discretization error of the one-scale subsampling estimator decomposes as follows:

 \frac{1}{i}\sum_{j=i}^{N}\left(X_{g_{j}}-X_{l_{j-i+1}}\right)\left(Y_{\gamma_{j}}-Y_{\lambda_{j-i+1}}\right)-\left[X,Y\right]_{T}
 \quad=\frac{1}{i}\sum_{j=i}^{N}\left(X_{T_{j}}-X_{T_{j-i}}\right)\left(Y_{T_{j}}-Y_{T_{j-i}}\right)-\left[X,Y\right]_{T}
 \qquad+\frac{1}{i}\sum_{j=i}^{N}\Big[\left(X_{g_{j}}-X_{T_{j}}\right)\left(Y_{T_{j}}-Y_{T_{j-i}}\right)+\left(X_{g_{j}}-X_{T_{j}}\right)\left(Y_{T_{j-i}}-Y_{\lambda_{j-i+1}}\right)
 \qquad\quad+\left(X_{T_{j-i}}-X_{l_{j-i+1}}\right)\left(Y_{\gamma_{j}}-Y_{T_{j}}\right)+\left(X_{T_{j-i}}-X_{l_{j-i+1}}\right)\left(Y_{T_{j}}-Y_{T_{j-i}}\right)
 \qquad\quad+\left(Y_{\gamma_{j}}-Y_{T_{j}}\right)\left(X_{T_{j}}-X_{T_{j-i}}\right)+\left(Y_{T_{j-i}}-Y_{\lambda_{j-i+1}}\right)\left(X_{T_{j}}-X_{T_{j-i}}\right)\Big]
 \quad=\frac{1}{i}\sum_{j=i}^{N}\Big(\sum_{k=j-i+1}^{j}\Delta X_{T_{k}}\Big)\Big(\sum_{k=j-i+1}^{j}\Delta Y_{T_{k}}\Big)-\left[X,Y\right]_{T}
 \qquad+\frac{1}{i}\sum_{j=i}^{N-1}\Big[\left(X_{g_{j}}-X_{T_{j}}\right)\Big(\sum_{k=j-i+1}^{j}\Delta Y_{T_{k}}\Big)+\left(X_{g_{j}}-X_{T_{j}}\right)\left(Y_{T_{j-i}}-Y_{\lambda_{j-i+1}}\right)
 \qquad\quad+\left(X_{T_{j-i}}-X_{l_{j-i+1}}\right)\left(Y_{\gamma_{j}}-Y_{T_{j}}\right)+\left(X_{T_{j}}-X_{l_{j+1}}\right)\Big(\sum_{k=j-i+1}^{j}\Delta Y_{T_{k}}\Big)
 \qquad\quad+\left(Y_{\gamma_{j}}-Y_{T_{j}}\right)\Big(\sum_{k=j-i+1}^{j}\Delta X_{T_{k}}\Big)+\left(Y_{T_{j}}-Y_{\lambda_{j+1}}\right)\Big(\sum_{k=j-i+1}^{j}\Delta X_{T_{k}}\Big)\Big]+\mathcal{O}_{p}\left(N^{-1}\right)
 \quad=\frac{1}{i}\sum_{j=i}^{N}\Big(\sum_{k=j-i+1}^{j}\Delta X_{T_{k}}\Delta Y_{T_{k}}+\sum_{\begin{subarray}{c}l\neq r\\ l,r\in\{j-i+1,\ldots,j\}\end{subarray}}\Delta X_{T_{l}}\Delta Y_{T_{r}}\Big)-\left[X,Y\right]_{T}
 \qquad+\frac{1}{i}\sum_{j=i}^{N-1}\Big[\left(X_{g_{j}}-X_{T_{j}}\right)\Big(\sum_{k=j-i+1}^{j}\Delta Y_{T_{k}}\Big)+\left(Y_{\gamma_{j}}-Y_{T_{j}}\right)\Big(\sum_{k=j-i+1}^{j}\Delta X_{T_{k}}\Big)
 \qquad\quad+\Delta X_{T_{j+1}}\Big(\sum_{k=j-i+1}^{j}\left(Y_{T_{k}}-Y_{\lambda_{k+1}}\right)\Big)+\Delta Y_{T_{j+1}}\Big(\sum_{k=j-i+1}^{j}\left(X_{T_{k}}-X_{l_{k+1}}\right)\Big)\Big]
 \qquad+\mathcal{O}_{p}\left(i^{-1}N^{-\frac{1}{2}}\right)+\mathcal{O}_{p}\left(N^{-1}\right).

We have written the overall discretization error of the one-scale estimator as the sum of a discretization error of the closest synchronous approximation

 \sum_{j=1}^{N}\left(\Delta X_{T_{j}}\sum_{l=1}^{i\wedge j}\left(1-\frac{l}{i}\right)\Delta Y_{T_{j-l}}+\Delta Y_{T_{j}}\sum_{l=1}^{i\wedge j}\left(1-\frac{l}{i}\right)\Delta X_{T_{j-l}}\right)+\mathcal{O}_{p}\left(iN^{-1}\right)+\mathcal{O}_{p}\left(N^{-\nicefrac{1}{2}}\right) (21)

and the asymptotically negligible error due to the lack of synchronicity. A stable central limit theorem using the theory of Jacod (1997) for the leading term of order i^{\nicefrac{{1}}{{2}}}N^{-\nicefrac{{1}}{{2}}} that will drive the asymptotic distribution is postponed to Proposition 6. The error due to asynchronicity is treated in Proposition 10.
The discretization error of the generalized multiscale estimator is of order M_{N}^{\nicefrac{1}{2}}N^{-\nicefrac{1}{2}} and that of the one-scale estimator of order i_{N}^{\nicefrac{1}{2}}N^{-\nicefrac{1}{2}}. For both estimators there is a trade-off between the error due to noise and the discretization error. For the generalized multiscale estimator these are of orders N^{\nicefrac{1}{2}}M_{N}^{-\nicefrac{3}{2}} and M_{N}^{\nicefrac{1}{2}}N^{-\nicefrac{1}{2}}, respectively. The remaining terms are of order M_{N}^{-\nicefrac{1}{2}}. Thus, choosing M_{N}=c_{multi}\cdot N^{\nicefrac{1}{2}} minimizes the total estimation error, which is then of order M_{N}^{-\nicefrac{1}{2}}=N^{-\nicefrac{1}{4}}, the optimal rate of convergence in Theorem 3.
The weak convergence of the discretization error is proved to be stable, so it converges jointly in law with every bounded \mathcal{F}-measurable random variable defined on the same probability space. Since the asymptotic normality of the cross term and of the error due to noise holds both conditionally and unconditionally given the efficient processes, and the discretization error is independent of \epsilon^{X} and \epsilon^{Y}, we can apply a central limit theorem for mixing triangular arrays as in Utev (1990) to the sum that is adapted with respect to \mathcal{A}_{j}=\sigma\left(\epsilon_{t_{k}}^{X},t_{k}<T_{j+1},\epsilon_{\tau_{k}}^{Y},\tau_{k}<T_{j+1},\mathcal{F}_{T_{j}}\right), where \mathcal{F} is the \sigma-algebra associated with the efficient processes. The asymptotic variance is the sum of those of the uncorrelated addends. With the Cramér-Wold device, joint normality and asymptotic independence of the different errors can be concluded.
The same reasoning applies to the one-scale estimator and Corollary 4.1. Choosing the subsampling frequency i_{N}=c_{sub}\cdot N^{\nicefrac{2}{3}} balances the variance of the error due to noise, which is of order Ni^{-2}, and the discretization variance of order iN^{-1}.
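
The two rate choices can be checked by equating the variance orders quoted above. The following short calculation is only a restatement of that balancing argument, using the variance orders obtained by squaring the error orders in the text.

```latex
% One-scale estimator: noise variance N i_N^{-2} versus discretization variance i_N N^{-1}
N\, i_N^{-2} \asymp i_N\, N^{-1}
  \iff i_N^{3} \asymp N^{2}
  \iff i_N \asymp N^{2/3},
\qquad \text{both variances} \asymp N^{-1/3}, \quad \text{rate } N^{1/6}.

% Multiscale estimator: noise variance N M_N^{-3} versus discretization variance M_N N^{-1}
N\, M_N^{-3} \asymp M_N\, N^{-1}
  \iff M_N^{4} \asymp N^{2}
  \iff M_N \asymp N^{1/2},
\qquad \text{both variances} \asymp N^{-1/2}, \quad \text{rate } N^{1/4}.
```

With M_{N}\asymp N^{1/2} the remaining terms of variance order M_{N}^{-1} are of the same order N^{-1/2}, which is why they also enter the limit in Theorem 3.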

## 5 Asymptotic variance estimation

The asymptotic variances (18) and (20) of the generalized multiscale estimator (12) and the one-scale subsampling estimator (13), appearing in the stable central limit theorems in Theorem 3 and Corollary 4.1, are random and depend on unknown quantities. In this section, we aim at estimating these asymptotic variances consistently to make our limit theorems feasible.
It is a known result that a consistent estimator of the noise variances is given by (24) (cf. Zhang et al. (2005)). Furthermore, the estimators for \eta_{X}^{2} and \eta_{Y}^{2} are asymptotically uncorrelated under Assumption 3, since the uncorrelated noise terms dominate the correlated Brownian parts. The constant I_{X}(T)+I_{Y}(T) in the noise part of (18) can be estimated by its empirical version I_{X}^{N}(T)+I_{Y}^{N}(T), which converges as N\rightarrow\infty under Assumption 5. Finally, consistent estimators for the discretization variances and for the variance due to cross terms of the multiscale estimator are required.
We propose histogram-type estimators using bins according to timescales associated with the quadratic variation of synchronized sampling times and associated with the degree of regularity of asynchronicity, respectively. For this purpose, given a chosen number of bins K_{N}, with K_{N}\rightarrow\infty and K_{N}^{-1}N\rightarrow\infty as N\rightarrow\infty, we determine the assigned non-equispaced bin-widths \Delta G_{j}^{N}=G_{j}^{N}-G_{j-1}^{N}, \Delta{(I_{X})}_{j}^{N}={(I_{X})}_{j}^{N}-{(I_{X})}_{j-1}^{N} and \Delta{(I_{Y})}_{j}^{N}={(I_{Y})}_{j}^{N}-{(I_{Y})}_{j-1}^{N}, j\in\{1,\ldots,K_{N}\}, where

 G_{j}^{N}:=\inf\big\{t\in[0,T]\,\big|\;G^{N}(t)=(N/T)\sum_{T_{k}^{(N)}\leq t}\big(\Delta T_{k}^{(N)}\big)^{2}\geq(j/K_{N})\cdot G^{N}(T)\big\},

j\in\{1,\ldots,K_{N}\}, and analogously for the functions I_{X}^{N} and I_{Y}^{N} if I_{X}^{N}(T)>0 and I_{Y}^{N}(T)>0. Set G_{0}^{N}=(I_{X})_{0}^{N}=(I_{Y})_{0}^{N}:=0 and recall that those functions are monotone increasing on [0,T]. On each bin we calculate multiscale estimators in the same spirit as (12) and its univariate version from Zhang (2006) for the increase of the quadratic (co-)variations, which are denoted \widehat{\Delta\left[X\right]}_{G_{j}^{N}}, \widehat{\Delta\left[Y\right]}_{G_{j}^{N}}, \widehat{\Delta\left[X,Y\right]}_{G_{j}^{N}}, \widehat{\Delta\left[X\right]}_{(I_{Y})_{j}^{N}} and \widehat{\Delta\left[Y\right]}_{(I_{X})_{j}^{N}} in the following. The underlying idea is to approximate the continuous random processes (\sigma_{t}^{X}\sigma_{t}^{Y}\rho_{t})^{2}, (\sigma_{t}^{X})^{2} and (\sigma_{t}^{Y})^{2}, or rather their time-transformed versions, by locally constant functions. This construction leads to time-adjusted histogram estimators

 \hat{I}_{1}=\sum_{j=1}^{K_{N}}\left(\frac{\widehat{\Delta\left[X,Y\right]}_{G_{j}^{N}}}{\Delta G_{j}^{N}}\right)^{2}\frac{G^{N}(T)}{K_{N}}\quad\text{for}\quad\int_{0}^{T}G^{\prime}(t)(\sigma_{t}^{X}\sigma_{t}^{Y}\rho_{t})^{2}\,dt, (22a)
 \hat{I}_{2}=\sum_{j=1}^{K_{N}}\left(\frac{\widehat{\Delta\left[X\right]}_{G_{j}^{N}}\widehat{\Delta\left[Y\right]}_{G_{j}^{N}}}{\left(\Delta G_{j}^{N}\right)^{2}}\right)\frac{G^{N}(T)}{K_{N}}\quad\text{for}\quad\int_{0}^{T}G^{\prime}(t)(\sigma_{t}^{X}\sigma_{t}^{Y})^{2}\,dt, (22b)
 \hat{I}_{3}=\sum_{j=1}^{K_{N}}\left(\frac{\widehat{\Delta\left[X\right]}_{(I_{Y})_{j}^{N}}}{\Delta(I_{Y})_{j}^{N}}\right)\frac{I_{Y}^{N}(T)}{K_{N}}\quad\text{for}\quad\int_{0}^{T}I_{Y}^{\prime}(t)(\sigma_{t}^{X})^{2}\,dt, (22c)
 \hat{I}_{4}=\sum_{j=1}^{K_{N}}\left(\frac{\widehat{\Delta\left[Y\right]}_{(I_{X})_{j}^{N}}}{\Delta(I_{X})_{j}^{N}}\right)\frac{I_{X}^{N}(T)}{K_{N}}\quad\text{for}\quad\int_{0}^{T}I_{X}^{\prime}(t)(\sigma_{t}^{Y})^{2}\,dt. (22d)
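
The bin boundaries G_{j}^{N} used in (22a) and (22b) can be obtained from the synchronized times alone. The Python sketch below is a minimal illustration under the assumption that the closest synchronous approximation is available as an increasing array T_{0}=0<T_{1}<\ldots<T_{N}=T; the function name is hypothetical. Because G^{N} is a step function that jumps only at the times T_{k}, the infimum in the definition is attained at one of these times, which the code locates by a sorted search.

```python
import numpy as np


def quadratic_variation_of_time_bins(T_sync, K):
    """Return the boundaries G_0 = 0, G_1, ..., G_K (as times in [0, T]).

    Assumption: T_sync holds T_0 = 0 < T_1 < ... < T_N = T.
    G^N(T_k) = (N / T) * sum_{l <= k} (T_l - T_{l-1})^2 is evaluated on the
    grid, and G_j is the first T_k with G^N(T_k) >= (j / K) * G^N(T).
    """
    T_sync = np.asarray(T_sync, dtype=float)
    N = T_sync.size - 1
    T = T_sync[-1]
    G = (N / T) * np.cumsum(np.diff(T_sync) ** 2)  # G^N at T_1, ..., T_N
    levels = np.arange(1, K + 1) / K * G[-1]
    idx = np.searchsorted(G, levels, side="left")
    idx = np.minimum(idx, N - 1)  # guard against rounding at the last level
    return np.concatenate([[0.0], T_sync[1:][idx]])
```

For an equidistant grid the boundaries are themselves equidistant; for irregular grids the bins shrink where the squared time increments accumulate quickly. The boundaries for I_{X}^{N} and I_{Y}^{N} can be sketched in the same way from their respective step functions.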
###### Proposition 5.0.

The asymptotic variances (18) and (20) of the generalized multiscale estimator (12) and the one-scale subsampling estimator (13) with M_{N}=c_{multi}N^{\nicefrac{{1}}{{2}}} and i_{N}=c_{sub}N^{\nicefrac{{2}}{{3}}}, can be estimated consistently by

 \widehat{\operatorname{\mathbf{AVAR}}}_{multi}=\left(c_{multi}^{-3}\left(24+12\,\frac{I_{X}^{N}(T)+I_{Y}^{N}(T)}{T}\right)+\frac{12}{5}c_{multi}^{-1}\right)\widehat{\eta_{X}^{2}}\widehat{\eta_{Y}^{2}}
 \quad+c_{multi}\frac{26}{35}T\left(\hat{I}_{1}+\hat{I}_{2}\right)+c_{multi}^{-1}\frac{12}{5}\left(\widehat{\eta_{Y}^{2}}(1+\hat{I}_{3})+\widehat{\eta_{X}^{2}}(1+\hat{I}_{4})\right), (23a)
 \widehat{\operatorname{\mathbf{AVAR}}}_{sub}=c_{sub}^{-2}4\widehat{\eta_{X}^{2}}\widehat{\eta_{Y}^{2}}+c_{sub}\frac{2}{3}\left(\hat{I}_{1}+\hat{I}_{2}\right), (23b)

where \hat{I}_{1}-\hat{I}_{4} are the estimators (22a)-(22d) and

 \widehat{\eta_{X}^{2}}=(2n)^{-1}\sum_{i=1}^{n}(\Delta X_{t_{i}})^{2},\quad\widehat{\eta_{Y}^{2}}=(2m)^{-1}\sum_{j=1}^{m}(\Delta Y_{\tau_{j}})^{2}. (24)
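
The noise variance estimator (24) is half the average squared increment of the observed series. A minimal Python sketch follows, with a hypothetical function name and under the assumption that the input array holds the observed values in time order, so that n is the number of increments.

```python
import numpy as np


def noise_variance_estimate(obs):
    """Estimate the noise variance as sum of squared increments / (2 n).

    Assumption: obs holds the observed values in time order and n is the
    number of increments, len(obs) - 1.
    """
    obs = np.asarray(obs, dtype=float)
    increments = np.diff(obs)
    return float(np.sum(increments**2)) / (2 * increments.size)
```

For the alternating series (0, 1, 0, 1, 0) all four squared increments equal one, so the estimate is 4/8 = 0.5.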
###### Remark.

Convergence rates of the estimators (23a) and (23b) for the asymptotic variances depend on the smoothness of \sigma^{X},\sigma^{Y} and \rho. For common stochastic volatility models such as the Heston model, they are N^{\nicefrac{1}{5}}-consistent when choosing K_{N}=c_{K}N^{\nicefrac{1}{5}} for a constant c_{K} and M_{N}\sim N^{\nicefrac{3}{5}} for the binwise multiscale estimators.
In the absence of noise, a consistent estimator for the asymptotic variance 2T\int_{0}^{T}G_{X}^{\prime}(t)(\sigma_{t}^{X})^{4}dt of the realized volatility has been proposed in Barndorff-Nielsen and Shephard (2002) as (2n/3)\sum_{i=1}^{n}(\Delta X_{t_{i}})^{4}. In the bivariate synchronous setting, (n/2)\sum_{i=1}^{n-1}(\Delta X_{t_{i}})^{2}\left((\Delta Y_{t_{i}})^{2}+(\Delta Y_{t_{i+1}})^{2}\right) is a convenient estimator. Consistency can be proved with Itô’s formula and integration by parts, and can be understood by analogy with a bivariate Gaussian distribution (X,Y)\sim\mathbf{N}(0,\Sigma) with a covariance matrix \Sigma with entries \sigma_{X}^{2},\sigma_{Y}^{2},\rho\sigma_{X}\sigma_{Y}. Then, \mathbb{E}X^{4}=3\sigma_{X}^{4} and \mathbb{E}\left[X^{2}Y^{2}\right]=2\rho^{2}\sigma_{X}^{2}\sigma_{Y}^{2}+\sigma_{X}^{2}\sigma_{Y}^{2} hold true.
In the noisy case smoothed versions of the estimators (using multiscale or alternative methods) are adequate (cf. Christensen et al. (2010)). However, in the non-synchronous non-noisy setting, there is no direct extension available and for that reason we have made the effort to construct the consistent histogram-based estimators (22a)-(22d) above.

## 6 Discussion and application

### 6.1 A case study

We have learned by now that in a synchronous setting the special version of the central limit theorem (Theorem 3) given in Proposition 1 holds true. Since the asymptotics of the estimator (12) do not hinge on interpolations in the signal term, the same central limit theorem applies in the case of intermeshed sampling introduced in Section 3. Now we focus on observation schemes that arise as realizations of two homogeneous Poisson processes that are mutually independent and independent of the processes \tilde{X} and \tilde{Y}. This model can be criticized because the sampling schemes of two correlated processes follow two independent processes and because of its time homogeneity, both of which might seem unrealistic in financial applications. Nevertheless, independent and homogeneous Poisson sampling constitutes the most commonly used model in this research area (cf. Zhang (2006), Hayashi and Yoshida (2005) among others) and serves to show that the general form of (18) is tractable.
Let \tilde{n}^{(n)}(t) and \tilde{m}^{(n)}(t) be sequences of two independent homogeneous Poisson processes with parameters Tn/\theta_{1} and Tn/\theta_{2} (n\in\mathds{N}), such that the waiting times between jumps of \tilde{n}^{(n)} and \tilde{m}^{(n)} are exponentially distributed with expectations \mathbb{E}\left[\Delta t_{i}^{(n)}\right]=\theta_{1}/n and \mathbb{E}\left[\Delta\tau_{j}^{(n)}\right]=\theta_{2}/n,~i\in\mathds{N},j\in\mathds{N}. In this case

 \Delta T_{k}^{(n)}\sim F(t)=1-\exp\left(-\frac{tn}{\theta_{1}}\right)-\exp\left(-\frac{tn}{\theta_{2}}\right)+\exp\left(-tn\left(\frac{1}{\theta_{1}}+\frac{1}{\theta_{2}}\right)\right),\quad k\in\mathds{N},
 I_{X}^{N}(t)\stackrel{p}{\longrightarrow}\frac{\theta_{1}\theta_{2}t}{(\theta_{1}+\theta_{2})^{2}},\quad I_{Y}^{N}(t)\stackrel{p}{\longrightarrow}\frac{\theta_{1}\theta_{2}t}{(\theta_{1}+\theta_{2})^{2}}\quad\left(=\frac{z\,t}{(z+1)^{2}}~\text{if}~\theta_{1}=z\theta_{2}\right), (25)

hold true and we derive the following Poisson sampling version of Theorem 3:

###### Corollary 6.1.

On the Assumptions 1 and 3, the generalized multiscale estimator (12) with noise-optimal weights (30), and M_{N}=c_{multi}\cdot\sqrt{N}, converges conditionally on the independent Poisson sampling scheme with 0<\theta_{1}<\infty and 0<\theta_{2}<\infty stably in law with rate N^{\nicefrac{{1}}{{4}}} to a mixed normal limit:

 N^{\nicefrac{1}{4}}\left(\widehat{\left[X,Y\right]}_{T}^{multi}-\left[X,Y\right]_{T}\right)\stackrel{st}{\rightsquigarrow}\mathbf{N}\left(0,\operatorname{\mathbf{AVAR}}_{multi}^{poiss}\right)

with the asymptotic variance

 \operatorname{\mathbf{AVAR}}_{multi}^{poiss}=c_{multi}^{-3}\,\left(24+12\,\frac{2\theta_{1}\theta_{2}}{(\theta_{1}+\theta_{2})^{2}}\right)\eta_{X}^{2}\eta_{Y}^{2}+c_{multi}^{-1}\,\frac{12\eta_{X}^{2}\eta_{Y}^{2}}{5}
 \quad+c_{multi}\,\frac{26}{35}\int_{0}^{T}2\left(1-\frac{2\theta_{1}^{2}\theta_{2}^{2}}{\theta_{1}^{2}\theta_{2}^{2}+(\theta_{1}^{2}+\theta_{2}^{2})(\theta_{1}+\theta_{2})^{2}}\right)(\sigma_{t}^{X}\sigma_{t}^{Y})^{2}(1+\rho_{t}^{2})\,dt
 \quad+c_{multi}^{-1}\frac{12}{5}\left(\eta_{Y}^{2}\int_{0}^{T}\left(1+\frac{\theta_{1}\theta_{2}}{(\theta_{1}+\theta_{2})^{2}}\right)(\sigma_{t}^{X})^{2}\,dt+\eta_{X}^{2}\int_{0}^{T}\left(1+\frac{\theta_{1}\theta_{2}}{(\theta_{1}+\theta_{2})^{2}}\right)(\sigma_{t}^{Y})^{2}\,dt\right). (26)

The asymptotic variance of the N^{\nicefrac{{1}}{{6}}}-consistent one-scale estimator becomes

 \operatorname{\mathbf{AVAR}}_{sub}^{poiss}=c_{sub}^{-2}\,4\eta_{X}^{2}\eta_{Y}^{2}+c_{sub}\,\frac{2}{3}\int_{0}^{T}2\left(1-\frac{2\theta_{1}^{2}\theta_{2}^{2}}{\theta_{1}^{2}\theta_{2}^{2}+(\theta_{1}^{2}+\theta_{2}^{2})(\theta_{1}+\theta_{2})^{2}}\right)(\sigma_{t}^{X}\sigma_{t}^{Y})^{2}(1+\rho_{t}^{2})\,dt. (27)

The order for the supremum of time instants in Assumption 2 holds in probability, so the proof in Appendix A remains valid. A Poisson sampling version of Theorem 2 is given in Bibinger (2011a), where the stochastic limit of G^{N} is deduced using the distribution of a maximum of two exponentials stated above.

### 6.2 A bridge between the noisy and the non-noisy setup

So far we have considered noise variances that do not depend on N, but from an applied point of view there is interest in the case where the noise level may vary with N\sim n\sim m. The primary motivation to accommodate dependence of the noise on the sample size in the model originates from the economic background. Empirical studies of (ultra) high-frequency financial data suggest modeling the observed log-prices as the sum of a latent semimartingale and noise whose variance decreases in N, as reported in Kalnina and Linton (2008) and Awartani et al. (2009), among others.
Estimation approaches that rely on previous-tick interpolation, such as the one proposed in Barndorff-Nielsen et al. (2008b), are no longer accurate in that setting. This also becomes apparent in the simulation study in Bibinger (2011b) when the performance of the estimators is compared for varying noise levels. The generalized multiscale estimator is not biased due to asynchronicity and passes over to the Hayashi-Yoshida estimator (8) for M_{N}=1, a \sqrt{N}-consistent estimator in the complete absence of noise. For that reason, our estimation method achieves an improved convergence rate in the model with decreasing noise variances. The next corollary is obtained by a direct extension of the proof of Theorem 3 in Appendix A when replacing the moments of the noise processes. A similar extension holds analogously for the one-scale estimator, for which we obtain the rate N^{\frac{1}{6}+\frac{\alpha}{3}} with subsample frequency i_{N}=c_{sub}N^{\frac{2}{3}(1-\alpha)}.

###### Corollary 6.2.

Consider the model imposed by Assumptions 1, 2 and 3, but with noise variances \eta_{X}^{2}(N)=\zeta_{X}N^{-\alpha}\,,\,\eta_{Y}^{2}(N)=\zeta_{Y}N^{-\alpha}\,,\,0<\alpha<1 and constants 0<\zeta_{X}<\infty\,,\,0<\zeta_{Y}<\infty. The generalized multiscale estimator (12) with M_{N}=c_{multi}N^{\frac{1}{2}-\frac{\alpha}{2}} and optimal weights (30) converges stably in law to a mixed Gaussian limit:

 \displaystyle N^{\frac{1}{4}+\frac{\alpha}{4}}\left(\widehat{\left[X,Y\right]}% _{T}^{multi}-\left[X,Y\right]_{T}\right)\lx@stackrel{{\scriptstyle st}}{{% \rightsquigarrow}}\mathbf{N}\left(0,\operatorname{\mathbf{AVAR}}_{multi}^{*}\right) (28)

with the asymptotic variance

 \displaystyle\operatorname{\mathbf{AVAR}}_{multi}^{*} \displaystyle=c_{multi}^{-3}\left(24+12\,\frac{I_{X}(T)+I_{Y}(T)}{T}\right)% \zeta_{X}\zeta_{Y}+c_{multi}^{-1}\,\frac{12\zeta_{X}\zeta_{Y}}{5} \displaystyle\;+c_{multi}\frac{26}{35}T\int_{0}^{T}G^{\prime}(t)(\sigma_{t}^{X% }\sigma_{t}^{Y})^{2}(1+\rho_{t}^{2})\,dt \displaystyle\;+c_{multi}^{-1}\frac{12}{5}\left(\zeta_{Y}\int_{0}^{T}(1+I^{% \prime}_{Y}(t))(\sigma_{t}^{X})^{2}\,dt\,+\zeta_{X}\int_{0}^{T}(1+I^{\prime}_{% X}(t))(\sigma_{t}^{Y})^{2}\,dt\right)~{}.

Incorporating a pure previous-tick interpolation strategy as in Barndorff-Nielsen et al. (2008b), one cannot gain an improved convergence rate in that setting due to the bias by non-synchronicity effects.
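The rates stated above interpolate between the standard noisy setting and the noise-free one: for \alpha=0 one recovers N^{\nicefrac{{1}}{{4}}}, and as \alpha approaches 1 the exponent approaches the 1/2 of the Hayashi-Yoshida estimator. The following Python sketch merely tabulates the exponents of N quoted in this subsection; the function names are illustrative and not part of the original exposition.

```python
from fractions import Fraction


def multiscale_exponents(alpha):
    """Exponents of N for the convergence rate and for M_N (Corollary 6.2)."""
    alpha = Fraction(alpha)
    rate = Fraction(1, 4) + alpha / 4   # rate N^(1/4 + alpha/4)
    m_n = Fraction(1, 2) - alpha / 2    # M_N = c_multi * N^(1/2 - alpha/2)
    return rate, m_n


def one_scale_exponents(alpha):
    """Exponents of N for the one-scale rate and the subsample frequency i_N."""
    alpha = Fraction(alpha)
    rate = Fraction(1, 6) + alpha / 3   # rate N^(1/6 + alpha/3)
    i_n = Fraction(2, 3) * (1 - alpha)  # i_N = c_sub * N^(2/3 (1 - alpha))
    return rate, i_n


for a in (Fraction(0), Fraction(1, 2), Fraction(9, 10)):
    print(a, multiscale_exponents(a), one_scale_exponents(a))
```

The value \alpha=0 is included only as the boundary case corresponding to Theorem 3; the corollary itself assumes 0<\alpha<1.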

### 6.3 Application

#### 6.3.1 Note on modeling assumptions

For an application to financial time series data, the conditions imposed by Assumptions 1-3 are restrictive, and the model will not describe all stylized facts of the data adequately. In particular, relaxing the i.i.d. assumption on the noise is important. On the other hand, the model underlying the preceding sections is convenient for establishing an asymptotic distribution theory and yields a closed-form expression for the asymptotic variance. Yet, the generalization to serially dependent observation errors, as carried out in Aït-Sahalia et al. (2011) for the one-dimensional case, is possible for the generalized multiscale estimator (12) as well. On the assumption of stationary strong mixing noise processes, the multiscale estimator remains consistent and rate-optimal without any further adjustment. The analysis for the synchronous case can be adopted from Aït-Sahalia et al. (2011), but for the general non-synchronous noisy setting a closed-form expression of the asymptotic variance and a corresponding limit theorem are not available. Furthermore, the condition that the noise processes are mutually independent can be relaxed if one wants to allow some correlation \mathbb{E}\left[\epsilon_{t_{i}}^{X}\epsilon_{\tau_{j}}^{Y}\right]=\eta_{X,Y}^{i,j} for t_{i} and \tau_{j} located near each other. In any case, the generalized multiscale estimator remains asymptotically unbiased and N^{\nicefrac{{1}}{{4}}}-consistent. This does not necessarily hold true for the one-scale estimator, which would have to be bias-corrected like the TSRV estimator by Zhang et al. (2005) in the one-dimensional case.
More general efficient processes including jumps can be covered in the model when the method is combined with a two-stage approach as presented by Fan and Wang (2007) for the one-dimensional case. Considering semimartingales

 X_{t}=\int_{0}^{t}\mu_{s}^{X}\,ds+\int_{0}^{t}\sigma_{s}^{X}dB_{s}^{X}+\sum_{l% =1}^{J_{t}^{X}}L_{l}^{X}~{},~{}Y_{t}=\int_{0}^{t}\mu_{s}^{Y}\,ds+\int_{0}^{t}% \sigma_{s}^{Y}dB_{s}^{Y}+\sum_{l=1}^{J_{t}^{Y}}L_{l}^{Y}~{},

with locally bounded drifts, continuous volatilities and counting processes J_{t}^{X},J_{t}^{Y} counting the jumps of X and Y with jump sizes L_{l}^{X},l=1,\ldots,J_{t}^{X} and L_{l}^{Y},l=1,\ldots,J_{t}^{Y}, respectively, the generalized multiscale estimator converges in probability to the total quadratic covariation

 \int_{0}^{T}\rho_{t}\sigma_{t}^{X}\sigma_{t}^{Y}\,dt+\sum_{0\leq s\leq T}\Delta X_{s}\Delta Y_{s}

where \Delta X_{s}=X_{s}-X_{s-},\Delta Y_{s}=Y_{s}-Y_{s-}, and the second addend is the sum of the simultaneous co-jumps. When one is interested in disentangling the continuous part from the jumps, one convincing possibility following Fan and Wang (2007) is to use wavelet methods that locate the jumps in the sample paths, estimate the jump sizes and afterwards apply our estimation approach to the validated observations.
In conclusion, the generalized multiscale estimator (12) is suitable for use in various applications. In Bibinger (2011b), we have confirmed that the estimator performs well and has satisfactory finite-sample properties in simulations including serially dependent noise and typical stochastic volatility models.

#### 6.3.2 Choice of tuning parameters

An implementation of the estimation approach requires first a rule to choose tuning parameters. We provide an accurate algorithm to implement the estimators and to obtain estimates for their asymptotic variances.

One plausible selection of the constants c_{multi}=M_{N}/\sqrt{N} and c_{sub}=i_{N}/N^{\nicefrac{{2}}{{3}}} can be derived as solutions of the minimization problems of the asymptotic variances. This leads to formula (29) in Algorithm 2.
The tactic of Algorithm 2 is the following: Evaluate a pilot estimate \hat{c}_{multi}^{(p)} for c_{multi} as solution of formula (29) inserting a sparse-sampled estimator for the signal term. Then set up the estimation of the asymptotic variances involving the estimators (22a)-(22d). Take M_{N}^{b}=c_{multi}^{b}\sqrt{NK_{N}} fixed for the multiscale estimators on all bins and set M_{N}^{b}=c_{multi}^{b}\sqrt{Nc_{K}N^{\nicefrac{{1}}{{5}}}} where c_{K}=K_{N}^{-1}N^{\nicefrac{{1}}{{5}}}. This selection is optimal for common volatility models. We obtain c_{multi}^{b}c_{K}^{-\nicefrac{{1}}{{2}}}=\hat{c}_{multi}^{(p)} and from the orders of the different errors of the histogram estimators c_{multi}^{b}=c_{K}^{\nicefrac{{5}}{{2}}}. Hence, c_{multi}^{b}=\left(c_{multi}^{(p)}\right)^{\nicefrac{{5}}{{4}}} and c_{K}=\sqrt{c_{multi}^{(p)}} is derived. Using estimators (22a)-(22d), we calculate estimates for the addends of the asymptotic variance and \hat{c}_{multi} according to formula (29) again and M_{N}=\lceil\hat{c}_{multi}\sqrt{N}\,\rceil is used for the final estimator. It turns out that this strategy is quite robust to the a priori chosen sparse-sample frequency, which can be chosen with the help of the usual diagnostic tools such as signature plots and autocorrelation functions.
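To illustrate the minimization step, note that the asymptotic variances above have the generic forms A\,c^{-3}+B\,c^{-1}+D\,c (multiscale) and A\,c^{-2}+D\,c (one-scale). The Python sketch below solves the corresponding first-order conditions in closed form. It is only a sketch under the assumption that the aggregated coefficients A, B, D have already been estimated; the function names are hypothetical, and the result need not coincide in presentation with formula (29), which is not reproduced in this section.

```python
import math


def optimal_c_multi(a, b, d):
    """Minimizer over c > 0 of a*c**-3 + b*c**-1 + d*c (a, b >= 0, d > 0).

    The first-order condition d*c**4 - b*c**2 - 3*a = 0 is a quadratic in c**2.
    """
    c_squared = (b + math.sqrt(b * b + 12.0 * a * d)) / (2.0 * d)
    return math.sqrt(c_squared)


def optimal_c_sub(a, d):
    """Minimizer over c > 0 of a*c**-2 + d*c (a, d > 0): c**3 = 2*a/d."""
    return (2.0 * a / d) ** (1.0 / 3.0)


def avar_multi(c, a, b, d):
    """Generic multiscale asymptotic variance as a function of c."""
    return a * c ** -3 + b / c + d * c


# Illustrative coefficients only (not estimates from data).
c_star = optimal_c_multi(2.0, 1.0, 0.5)
print(c_star, avar_multi(c_star, 2.0, 1.0, 0.5))
```

In the notation of Theorem 3, A collects the noise term of order c_{multi}^{-3}, B the two terms of order c_{multi}^{-1}, and D the discretization term of order c_{multi}.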

#### 6.3.3 Simulation and data analysis check

As a complement to the detailed simulation study in the supplementary material to Bibinger (2011b), we investigate the performance of Algorithm 2 and the histogram estimators (22a)-(22d) here. For this purpose we simulate from a simple Brownian motion model with zero drifts and constant volatilities \sigma^{X}=\sigma^{Y}=1 and \rho=1/2 with equal noise variances \eta^{2}. Sampling schemes are generated by independent time-homogeneous Poisson sampling with 30,000 expected observations for both processes on [0,1]. Results of the estimates are listed in Table 2.
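The data-generating design described above can be sketched in a few lines of Python. This is a minimal illustration and not the code used for Table 2: the function names, the Gaussian noise distribution, the noise standard deviation `eta=0.01` and the seed are assumptions made for the example.

```python
import math
import random


def poisson_times(rate, horizon, rng):
    """Arrival times of a homogeneous Poisson process on [0, horizon]."""
    times = []
    t = rng.expovariate(rate)
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def simulate_noisy_observations(rate_x=30000.0, rate_y=30000.0, rho=0.5,
                                eta=0.01, horizon=1.0, seed=0):
    """Noisy, non-synchronous observations of two correlated Brownian motions.

    Volatilities are 1 and drifts are 0; eta is the (assumed) noise
    standard deviation, so the noise variance is eta**2.
    """
    rng = random.Random(seed)
    tx = poisson_times(rate_x, horizon, rng)
    ty = poisson_times(rate_y, horizon, rng)
    # Simulate both paths jointly on the merged grid of all sampling times.
    grid = sorted(set(tx) | set(ty))
    x = y = prev = 0.0
    path_x, path_y = {}, {}
    for t in grid:
        sd = math.sqrt(t - prev)
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x += sd * z1
        y += sd * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        path_x[t], path_y[t] = x, y
        prev = t
    # Each process is observed only at its own times, with additive noise.
    obs_x = [(t, path_x[t] + eta * rng.gauss(0.0, 1.0)) for t in tx]
    obs_y = [(t, path_y[t] + eta * rng.gauss(0.0, 1.0)) for t in ty]
    return obs_x, obs_y
```

Calling `simulate_noisy_observations()` with the default rates mimics the design with 30,000 expected observations per process; smaller rates are convenient for quick experiments.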

In addition, our approach is tested in an application to EUREX future tick-data taken from a database provided by the Research Data Center (RDC) of the CRC 649 ‘Economic Risk’ in Berlin. We aim at estimating integrated covolatilities between the four financial securities with the highest tick-frequencies in the database. These are the Euro-Bund Future (FGBL), that is based on a notional long-duration debt instrument issued by the Federal Republic of Germany, the Euro-Bobl Future (FGBM), a likewise medium-duration contract, and futures on the EURO STOXX 50 (FESX) and the German DAX (FDAX).
We apply the procedure with Algorithm 1 and Algorithm 2 to a suitably filtered dataset and give the results for two days in Table 3 with the associated estimated optimal multiscale frequencies. The estimates for the quadratic covariations are given \pm the estimated standard deviation obtained from the estimated asymptotic variances, together with the pertaining N for each pair. Although there are characteristics of the dataset not in accordance with the model assumptions, above all price discreteness and the fact that most returns are zero, the estimation approach passes this plausibility check. Since the ESX and the DAX share 13 companies constituting c. 28.5% weighting in the ESX and c. 72.4% in the DAX, there is a large systematic positive correlation between both, and we presume that there is also a high correlation between the two debt instruments; both are reflected in the estimates. On 09/11/2001, there was a tremendous impact, so that FGBL/FGBM increased and FESX/FDAX decreased. For that day we have significantly negative integrated covolatilities between debt instruments and stock indices, which is not the case for the ordinary trading day used for comparison. In response to the large number of zero returns, the only adjustment of the method that we undertake is to estimate noise variances in (24) by dividing the realized volatility by twice the number of non-zero returns instead of all returns.
To sum up, the estimation approach based on Algorithm 1, a multiscale extension of subsampling and Algorithm 2 provides a convincing method to obtain integrated covolatility estimates for very general high-frequency data.
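The zero-return adjustment of the noise variance estimate mentioned above can be sketched as follows. The sketch assumes that the estimator in (24), which is not reproduced in this section, is the realized volatility divided by twice the number of returns; the function name and the flag are illustrative.

```python
def noise_variance_estimate(log_prices, skip_zero_returns=True):
    """Realized volatility divided by twice the number of (non-zero) returns.

    With skip_zero_returns=False all returns are counted in the denominator.
    """
    returns = [b - a for a, b in zip(log_prices, log_prices[1:])]
    realized_volatility = sum(r * r for r in returns)
    if skip_zero_returns:
        count = sum(1 for r in returns if r != 0.0)
    else:
        count = len(returns)
    if count == 0:
        raise ValueError("no returns available for the noise variance estimate")
    return realized_volatility / (2.0 * count)
```

Since zero returns add nothing to the realized volatility, counting them in the denominator lowers the estimate; the adjustment removes this effect for data in which most returns are zero.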

## Appendix A Proof of Theorem 3

### A.1 Error due to noise and choosing the weights

The error due to microstructure noise of the generalized multiscale estimator is given by

 \displaystyle\sum_{i=1}^{M_{N}}\frac{\alpha_{i,M_{N}}}{i}\sum_{j=i}^{N}\left(% \epsilon_{g_{j}}^{X}-\epsilon_{l_{j-i+1}}^{X}\right)\left(\epsilon_{\gamma_{j}% }^{Y}-\epsilon_{\lambda_{j-i+1}}^{Y}\right)=\sum_{i=1}^{M_{N}}\frac{\alpha_{i,% M_{N}}}{i}\Big{(}\sum_{j=1}^{N}\left(\epsilon_{g_{j}}^{X}\epsilon_{\gamma_{j}}% ^{Y}+\epsilon_{l_{j}}^{X}\epsilon_{\lambda_{j}}^{Y}\right) \displaystyle                                 -\sum_{j=i}^{N}\left(\epsilon_{g% _{j}}^{X}\epsilon_{\lambda_{j-i+1}}^{Y}+\epsilon_{\gamma_{j}}^{Y}\epsilon_{l_{% j-i+1}}^{X}\right)-\sum_{j=1}^{i-1}\epsilon_{g_{j}}^{X}\epsilon_{\gamma_{j}}^{% Y}-\sum_{j=N-i+1}^{N}\epsilon_{l_{j}}^{X}\epsilon_{\lambda_{j}}^{Y}\Big{)}.

In addition to the condition

 \sum_{i=1}^{M_{N}}\alpha_{i,M_{N}}=1~{}, (C1)

that is necessary for asymptotic unbiasedness and consistency, we now impose the auxiliary condition

 \sum_{i=1}^{M_{N}}\frac{\alpha_{i,M_{N}}}{i}=0~{}, (C2)

on the weights which assures that the leading term of the noise error equals zero. Hence, there remain three uncorrelated addends in the error induced by microstructure noise.

###### Proposition A.0.

Let Assumptions 2 and 5 on the observation times and Assumption 3 on the observation errors hold true. The asymptotic variance of the term

 -\sum_{i=1}^{M_{N}}\frac{\alpha_{i,M_{N}}}{i}\sum_{j=i}^{N}\left(\epsilon_{g_{% j}}^{X}\epsilon_{\lambda_{j-i+1}}^{Y}+\epsilon_{\gamma_{j}}^{Y}\epsilon_{l_{j-% i+1}}^{X}\right)

is minimized by the weights

 \displaystyle\alpha_{i,M_{N}}^{opt}=\left(\frac{12i^{2}}{(M_{N}^{3}-M_{N})}-% \frac{6i}{(M_{N}^{2}-1)}-\frac{6i}{(M_{N}^{3}-M_{N})}\right)=\frac{12i^{2}}{M_% {N}^{3}}-\frac{6i}{M_{N}^{2}}\left(1+{\scriptstyle{\mathcal{O}}}(1)\right) (30)

as M_{N},N\rightarrow\infty and M_{N}/N\rightarrow 0 with N={\scriptstyle{\mathcal{O}}}\left(M_{N}^{4}\right). The following asymptotic normality result holds true:

 \displaystyle\sqrt{\frac{M_{N}^{3}}{N}}\left(\sum_{i=1}^{M_{N}}\frac{\alpha_{i,M_{N}}^{opt}}{i}\sum_{j=i}^{N}\left(\epsilon_{g_{j}}^{X}\epsilon_{\lambda_{j-i+1}}^{Y}+\epsilon_{\gamma_{j}}^{Y}\epsilon_{l_{j-i+1}}^{X}\right)\right)\rightsquigarrow\mathbf{N}\left(0\,,\,\operatorname{\mathbf{AVAR}}_{\text{noise}}\right)~{}, (31)

with the asymptotic variance

 \operatorname{\mathbf{AVAR}}_{\text{noise}}=(24+12(I_{X}(T)+I_{Y}(T)))\eta_{X}% ^{2}\eta_{Y}^{2} (32)

with the functions I_{X} and I_{Y} defined in Assumption 5. The weak convergence also holds true conditionally given the paths of the efficient processes.

###### Proof.

The term is centred and we rewrite it as

 \displaystyle-\sum_{i=1}^{M_{N}}\frac{\alpha_{i,M_{N}}}{i}\sum_{j=i}^{N}\left(% \epsilon_{g_{j}}^{X}\epsilon_{\lambda_{j-i+1}}^{Y}+\epsilon_{\gamma_{j}}^{Y}% \epsilon_{l_{j-i+1}}^{X}\right)=-\sum_{j=1}^{N}\sum_{i=1}^{M_{N}\wedge j}\frac% {\alpha_{i,M_{N}}}{i}\Big{(}\epsilon_{g_{j}}^{X}\epsilon_{\lambda_{j-i+1}}^{Y} \displaystyle                      (\mathbbm{1}_{\{g_{j}=g_{j+1}\}}+\mathbbm{1% }_{\{g_{j}\neq g_{j+1}\}})+\epsilon_{\gamma_{j}}^{Y}\epsilon_{l_{j-i+1}}^{X}(% \mathbbm{1}_{\{\gamma_{j}=\gamma_{j+1}\}}+\mathbbm{1}_{\{\gamma_{j}\neq\gamma_% {j+1}\}})\Big{)}~{}.

For fixed i the addends of the inner sum are uncorrelated because l_{i}\neq l_{j} and \lambda_{i}\neq\lambda_{j} for all i\neq j. Consecutive right-end points g_{i},\gamma_{i} can be the same observation times instead, so that the inner sums are 2-dependent random variables. Thus, the variance is given by

 \displaystyle\mathbb{V}\hskip-1.422638pt\textnormal{a r}\left(\sum_{j=1}^{N}\sum_{i=1}^{M_{N}\wedge j}\frac{\alpha_{i,M_{N}}}{i}\left(\epsilon_{g_{j}}^{X}\epsilon_{\lambda_{j-i+1}}^{Y}+\epsilon_{\gamma_{j}}^{Y}\epsilon_{l_{j-i+1}}^{X}\right)\right) \displaystyle=\sum_{j=1}^{N}\sum_{i=1}^{M_{N}\wedge j}\left(\frac{\alpha_{i,M_{N}}}{i}\right)^{2}2\eta_{X}^{2}\eta_{Y}^{2}+\sum_{j=1}^{N}\sum_{i=1}^{(M_{N}-1)\wedge(j-1)}\frac{\alpha_{i,M_{N}}\alpha_{i+1,M_{N}}}{i(i+1)}\eta_{X}^{2}\eta_{Y}^{2}(1-\mathbbm{1}_{\{g_{j}\neq g_{j+1}\,,\,\gamma_{j}\neq\gamma_{j+1}\}})~{}.

The weights that minimize the first addend of the above variance and also the total variance asymptotically have been determined in Bibinger (2011b). Those weights are in line with the standard weights from Zhang (2006) in the univariate setting and correspond to a cubic kernel for the kernel estimator by Barndorff-Nielsen et al. (2008b).
Inserting the noise-optimal weights (30), we can apply a central limit theorem for strong mixing triangular arrays from Utev (1990) to

 \sqrt{\frac{M_{N}^{3}}{N}}\sum_{j=1}^{N}\sum_{i=1}^{M_{N}\wedge j}\frac{\alpha_{i,M_{N}}^{opt}}{i}\left(\epsilon_{g_{j}}^{X}\epsilon_{\lambda_{j-i+1}}^{Y}+\epsilon_{l_{j-i+1}}^{X}\epsilon_{\gamma_{j}}^{Y}\right)~{}.

The sequence of variances with the chosen weights according to (30)

 \displaystyle\mathbb{V}\hskip-1.422638pt\textnormal{a r}\left(\sqrt{\frac{M_{N}^{3}}{N}}\sum_{j=1}^{N}\sum_{i=1}^{M_{N}\wedge j}\frac{\alpha_{i,M_{N}}^{opt}}{i}\left(\epsilon_{g_{j}}^{X}\epsilon_{\lambda_{j-i+1}}^{Y}+\epsilon_{l_{j-i+1}}^{X}\epsilon_{\gamma_{j}}^{Y}\right)\right) \displaystyle~{}=\frac{M_{N}^{3}}{N}\sum_{j=0}^{N}\sum_{i=1}^{M_{N}\wedge(j+1)}\left(\frac{\alpha_{i,M_{N}}^{opt}}{i}\right)^{2}\eta_{X}^{2}\eta_{Y}^{2}(3-\mathbbm{1}_{\{g_{j}\neq g_{j+1}\,,\,\gamma_{j}\neq\gamma_{j+1}\}})+{\scriptstyle{\mathcal{O}}}(1) \displaystyle~{}\longrightarrow 36\eta_{X}^{2}\eta_{Y}^{2}-12(1-(I_{X}(T)+I_{Y}(T)))\eta_{X}^{2}\eta_{Y}^{2}

converges to \operatorname{\mathbf{AVAR}}_{\text{noise}} on the Assumption 5.
Since the inner sums are 2-dependent and hence in particular \phi-mixing, the Lyapunov condition that holds obviously suffices to apply the central limit theorem of Utev (1990). This completes the proof of the proposition.∎
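The algebraic properties of the weights (30) that enter the computation above can be checked directly. The following Python sketch evaluates the exact expression in (30) in rational arithmetic, verifies the conditions (C1) and (C2), and computes \sum_{i}(\alpha_{i,M_{N}}^{opt}/i)^{2}, whose product with M_{N}^{3} tends to 12, the constant behind the limit 36\eta_{X}^{2}\eta_{Y}^{2}. The function name is illustrative.

```python
from fractions import Fraction


def noise_optimal_weights(m):
    """Exact weights alpha_{i,m}, i = 1..m, from (30); requires m >= 2."""
    if m < 2:
        raise ValueError("m must be at least 2")
    d3 = m ** 3 - m
    d2 = m ** 2 - 1
    return [Fraction(12 * i * i, d3) - Fraction(6 * i, d2) - Fraction(6 * i, d3)
            for i in range(1, m + 1)]


m = 50
alpha = noise_optimal_weights(m)
c1 = sum(alpha)                                          # condition (C1): equals 1
c2 = sum(a / i for i, a in enumerate(alpha, start=1))    # condition (C2): equals 0
sq = sum((a / i) ** 2 for i, a in enumerate(alpha, start=1))
print(c1, c2, float(m ** 3 * sq))
```

For these weights the sum of squares equals 12/(M_{N}^{3}-M_{N}) exactly, so that M_{N}^{3}\sum_{i}(\alpha_{i,M_{N}}^{opt}/i)^{2}\rightarrow 12.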

Next, we consider the remaining addends of the error induced by microstructure frictions and insert the weights (30):

###### Proposition A.0.

On the Assumptions 2, 5 and 3, the following weak convergence to a centred normal distribution holds true:

 \displaystyle\sqrt{M_{N}}\left(\sum_{i=1}^{M_{N}}\frac{\alpha_{i,M_{N}}^{opt}}% {i}\left(\sum_{j=1}^{i-1}\epsilon_{g_{j}}^{X}\epsilon_{\gamma_{j}}^{Y}+\sum_{j% =N-i+1}^{N}\epsilon_{l_{j}}^{X}\epsilon_{\lambda_{j}}^{Y}\right)\right)% \rightsquigarrow\mathbf{N}\left(0,\frac{12}{5}\eta_{X}^{2}\eta_{Y}^{2}\right)~% {}. (33)

This convergence also holds conditionally on the paths of the efficient processes.

###### Proof.
 \displaystyle\sum_{i=1}^{M_{N}}\frac{\alpha_{i,M_{N}}^{opt}}{i}\left(\sum_{j=1% }^{i-1}\epsilon_{g_{j}}^{X}\epsilon_{\gamma_{j}}^{Y}+\sum_{j=N-i+1}^{N}% \epsilon_{l_{j}}^{X}\epsilon_{\lambda_{j}}^{Y}\right)=\sum_{j=1}^{M_{N}-1}% \left(\epsilon_{g_{j}}^{X}\epsilon_{\gamma_{j}}^{Y}+\epsilon_{l_{N-j}}^{X}% \epsilon_{\lambda_{N-j}}^{Y}\right)\sum_{i=j+1}^{M_{N}}\frac{\alpha_{i,M_{N}}^% {opt}}{i}

Both addends are uncorrelated and treated analogously. We restrict ourselves to the proof for the first term. \sqrt{M_{N}}\sum_{j=1}^{M_{N}-1}\epsilon_{g_{j}}^{X}\epsilon_{\gamma_{j}}^{Y}% \sum_{i=j+1}^{M_{N}}\alpha_{i,M_{N}}^{opt}/i is the endpoint of a discrete centred martingale with respect to the filtration \overline{\mathcal{A}}_{j}^{N}\mathrel{\mathop{:}}=\sigma\left(\epsilon_{t_{k}% }^{X}|t_{k}\leq g_{j}\,,\,X_{t_{k}}|0\leq k\leq n\right)\vee\sigma\left(% \epsilon_{\tau_{k}}^{Y}|\tau_{k}\leq\gamma_{j}\,,\,Y_{\tau_{k}}|0\leq k\leq m\right). Namely, since g_{j}=g_{j-1}\Rightarrow\gamma_{j}>\gamma_{j-1} and \gamma_{j}=\gamma_{j-1}\Rightarrow g_{j}>g_{j-1} analogously:

 \displaystyle\mathbb{E}\left[\sqrt{M_{N}}\epsilon_{g_{l}}^{X}\epsilon_{\gamma_% {l}}^{Y}\sum_{i=l+1}^{M_{N}}\frac{\alpha_{i,M_{N}}^{opt}}{i}\Big{|}\overline{% \mathcal{A}}_{l-1}^{N}\right] \displaystyle=\hskip-1.422638pt\sqrt{M_{N}}\hskip-1.422638pt\left(\mathbbm{1}_% {\{g_{l}=g_{l-1}\}}\mathbb{E}\left[\epsilon_{\gamma_{l}}^{Y}\right]\epsilon_{g% _{l-1}}^{X}\hskip-1.422638pt+\hskip-1.422638pt\mathbbm{1}_{\{\gamma_{l}=\gamma% _{l-1}\}}\mathbb{E}\left[\epsilon_{g_{l}}^{X}\right]\epsilon_{\gamma_{l-1}}^{Y% }\hskip-1.422638pt+\hskip-1.422638pt\mathbbm{1}_{\{g_{l}\neq g_{l-1}\,,\,% \gamma_{l}\neq\gamma_{l-1}\}}\mathbb{E}\left[\epsilon_{g_{l}}^{X}\right]% \mathbb{E}\left[\epsilon_{\gamma_{l}}^{Y}\right]\right)\hskip-1.422638pt=% \hskip-1.422638pt0\,.

A central limit theorem for martingale triangular arrays from Hall and Heyde (1980) is applied, in particular the non-stable version of Corollary 3.1 (cf. the following remark in Hall and Heyde (1980) and references cited therein). The conditional Lindeberg condition can be verified by the stronger conditional Lyapunov condition. The proof of it is obtained by a similar calculation as the following one and we omit it. The conditional variance equals

 \displaystyle M_{N}\sum_{j=1}^{M_{N}-1}\mathbb{V}\hskip-1.422638pt\textnormal{% a r}\left(\epsilon_{g_{j}}^{X}\epsilon_{\gamma_{j}}^{Y}\sum_{i=j+1}^{M_{N}}% \frac{\alpha_{i,M_{N}}^{opt}}{i}\Big{|}\mathcal{A}_{N,j-1}\right)=M_{N}\sum_{j% =1}^{M_{N}-1}\left(\sum_{i=j+1}^{M_{N}}\frac{\alpha_{i,M_{N}}^{opt}}{i}\right)% ^{2} \displaystyle                          \times\Big{(}\eta_{X}^{2}\eta_{Y}^{2}% \mathbbm{1}_{\{g_{j}\neq g_{j-1}\,,\,\gamma_{j}\neq\gamma_{j-1}\}}+\left(% \epsilon_{g_{j}}^{X}\right)^{2}\mathbbm{1}_{\{g_{j}=g_{j-1}\}}\eta_{Y}^{2}+% \left(\epsilon_{\gamma_{j}}^{Y}\right)^{2}\mathbbm{1}_{\{\gamma_{j}=\gamma_{j-% 1}\}}\eta_{X}^{2}\Big{)} \displaystyle~{}\lx@stackrel{{\scriptstyle p}}{{\longrightarrow}}\frac{6}{5}% \eta_{X}^{2}\eta_{Y}^{2}~{}.

We have used the formula \sum_{j=1}^{M_{N}-1}\left(\sum_{i=j+1}^{M_{N}}(\alpha_{i,M_{N}}^{opt}/i)\right)^{2}=(6/5)M_{N}^{-1}+{\scriptstyle{\mathcal{O}}}(M_{N}^{-1}).
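The constant 6/5 in the formula above can be confirmed numerically. The Python sketch below, with an illustrative function name, computes M_{N} times the sum of squared tail sums of \alpha_{i,M_{N}}^{opt}/i exactly for the weights (30).

```python
from fractions import Fraction


def scaled_tail_sum_of_squares(m):
    """m * sum_{j=1}^{m-1} (sum_{i=j+1}^{m} alpha_i / i)^2 for the weights (30)."""
    if m < 2:
        raise ValueError("m must be at least 2")
    d3 = m ** 3 - m
    d2 = m ** 2 - 1
    # ratios[i - 1] holds alpha_i / i for i = 1..m
    ratios = [Fraction(12 * i, d3) - Fraction(6, d2) - Fraction(6, d3)
              for i in range(1, m + 1)]
    total = Fraction(0)
    tail = Fraction(0)
    for j in range(m - 1, 0, -1):
        tail += ratios[j]      # adds alpha_{j+1} / (j+1), giving the tail sum for j
        total += tail * tail
    return m * total


for m in (10, 100, 400):
    print(m, float(scaled_tail_sum_of_squares(m)))
```

The printed values approach 6/5 as m grows, in line with the conditional variance limit (6/5)\eta_{X}^{2}\eta_{Y}^{2} for each of the two boundary terms.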

### A.2 Discretization errors of the estimators

###### Proposition A.0.

On the Assumptions 1, 2 and 4, the discretization error of the one-scale subsampling estimator with subsampling frequency i_{N} converges stably in law to a centred mixed normal limit as i_{N}\rightarrow\infty,N\rightarrow\infty,i_{N}/N^{\alpha}\rightarrow 0 for every \alpha>2/3:

 \displaystyle\sqrt{\frac{N}{i_{N}}}\left(\frac{1}{i_{N}}\sum_{j=i_{N}}^{N}\left(X_{g_{j}}-X_{l_{j-i_{N}+1}}\right)\left(Y_{\gamma_{j}}-Y_{\lambda_{j-i_{N}+1}}\right)-\left[X,Y\right]_{T}\right)\lx@stackrel{{\scriptstyle st}}{{\rightsquigarrow}}\mathbf{N}\left(0,\operatorname{\mathbf{AVAR}}_{dis,sub}\right)~{},

with asymptotic variance

 \displaystyle\operatorname{\mathbf{AVAR}}_{dis,sub}=\frac{2}{3}T\int_{0}^{T}G^% {\prime}(t)(\sigma_{t}^{X}\sigma_{t}^{Y})^{2}(1+\rho_{t}^{2})\,dt~{}. (34)
###### Proposition A.0.

On the Assumptions 1, 2 and 4, the discretization error of the generalized multiscale estimator with the noise-optimal weights given in (30) converges with rate \sqrt{N/M_{N}} stably in law to a centred mixed Gaussian limit as M_{N}\rightarrow\infty,N\rightarrow\infty,M_{N}/N^{\alpha}\rightarrow 0 for every \alpha>2/3:

 \displaystyle\sqrt{\frac{N}{M_{N}}}\left(\sum_{i=1}^{M_{N}}\frac{\alpha_{i,M_{% N}}^{opt}}{i}\sum_{j=i}^{N}\left(X_{g_{j}}-X_{l_{j-i+1}}\right)\left(Y_{\gamma% _{j}}-Y_{\lambda_{j-i+1}}\right)-\left[X,Y\right]_{T}\right)\lx@stackrel{{% \scriptstyle st}}{{\rightsquigarrow}}\mathbf{N}\left(0,\operatorname{\mathbf{% AVAR}}_{dis,multi}\right)~{},

with asymptotic variance

 \displaystyle\operatorname{\mathbf{AVAR}}_{dis,multi}=\frac{26}{35}T\int_{0}^{% T}G^{\prime}(t)(\sigma_{t}^{X}\sigma_{t}^{Y})^{2}(1+\rho_{t}^{2})\,dt~{}. (35)

#### A.2.1 Time-change in the asymptotic quadratic variation of time

###### Proposition A.0.

In the proof of a central limit theorem for the discretization error of the closest synchronous approximation T_{k}^{(N)},\,k=0,\ldots,N, of our generalized multiscale estimator (12) on the Assumptions 2 and 5, we can additionally, without loss of further generality, assume that

 \displaystyle\sum_{k=1}^{N}\left(\Delta T_{k}^{(N)}-\frac{T}{N}\right)^{2}={% \scriptstyle{\mathcal{O}}}\left(N^{-1}\right)~{}. (36)
###### Remark.

From Assumptions 1 and 2, we can deduce directly that the sum above is at most of order N^{-1}. The stronger assertion, that the closest synchronous approximation defined by the times T_{k}^{(N)},\,k=0,\ldots,N introduced in paragraph 3.2 is close to equidistant sampling in the sense that the sum above is of smaller asymptotic order than N^{-1}, is derived by the concept of a time-change in the asymptotic quadratic variation of time from Assumption 4. For the proof of (36) we refer to Lemma 1 from Zhang (2006) where this concept has been presented for the univariate multiscale approach and it directly carries over to the synchronous multivariate case.
On the Assumption 4, a transformation g can be defined that maps the refresh times T_{k}^{(N)} to values g(T_{k}^{(N)}), so that (36) holds true for the transformed synchronous observation scheme. Since the corresponding time-changed processes L_{g(t)} and M_{g(t)} fulfill Assumption 1 again and the transformed observation scheme fulfills Assumption 2, we are able to prove a central limit theorem for the time-changed version of the discretization error even if (36) does not hold for the original scheme.
Since the resulting asymptotic variance will be invariant under the transformation g, the central limit theorem will analogously hold true for the original sampling scheme. Hence, no further restriction has to be made when assuming (36).

#### A.2.2 Discretization error of the closest synchronous approximation

Note that it suffices to prove the foregoing limit theorems for the zero-drift case. Since our limit theorems are stable, asymptotic mixed normality is assured to hold for the general setting on Assumption 1. Denote L_{t}=\int_{0}^{t}\sigma_{s}^{X}dW_{s}^{X} and M_{t}=\int_{0}^{t}\sigma_{s}^{Y}dW_{s}^{Y} the continuous martingales that represent the efficient processes under the equivalent martingale measure after a Girsanov transformation. The Novikov condition has been imposed in Assumption 1 to allow for this transformation.
The asymptotic mixed normality result is implied as marginal distribution at t=T of a limiting time-changed Brownian motion which is proven to be the stable weak limit of the process corresponding to the discretization error (21) with the theory of Jacod (1997).
We begin with the discretization error of the closest synchronous approximation of a one-scale subsampling estimator.

###### Proposition A.0.

On the same assumptions as in Proposition 5, the continuous martingale

 \displaystyle\mathfrak{D}_{t}^{N}\mathrel{\mathop{:}}=\sqrt{\frac{N}{i_{N}T}} \displaystyle\left[\sum_{T_{k}\leq t}\left(\Delta L_{T_{k}}+\int_{T_{k}}^{t}% \sigma_{s}^{X}dW_{s}^{X}\right)\left(\sum_{l=1}^{i\wedge k}\left(1-\frac{l}{i}% \right)\Delta M_{T_{k-l}}\right)\right. \displaystyle\left.+\sum_{T_{k}\leq t}\left(\Delta M_{T_{k}}+\int_{T_{k}}^{t}% \sigma_{s}^{Y}dW_{s}^{Y}\right)\left(\sum_{l=1}^{i\wedge k}\left(1-\frac{l}{i}% \right)\Delta L_{T_{k-l}}\right)\right]

for t\in[0,T], where \Delta\,\cdot\,_{T_{k}}=\,\cdot\,_{T_{k}}-\,\cdot\,_{T_{k-1}} is the backward difference operator, converges stably in law as N\rightarrow\infty,i_{N}\rightarrow\infty,i_{N}/N\rightarrow 0 to a limiting time-changed Brownian motion

 \displaystyle\mathfrak{D}^{N}_{t}\lx@stackrel{{\scriptstyle st}}{{% \rightsquigarrow}}\int_{0}^{t}\sqrt{v_{\mathfrak{D}_{s}}}d{\mathfrak{W}}_{s}^{% \bot}~{},

where \mathfrak{W}^{\bot} is independent of \mathcal{F} and

 \displaystyle v_{\mathfrak{D}_{s}}=\frac{2}{3}G^{\prime}(s)(\sigma_{s}^{X}% \sigma_{s}^{Y})^{2}(1+\rho_{s}^{2})~{}.

Proof of Proposition 8:
The subscript of the subsampling frequency is omitted in the following proof and C denotes a generic constant and \delta_{N}=\sup_{i\in\{1,\ldots,N\}}{(T_{i}-T_{i-1})}.
We apply a simplified martingale version of the stable central limit theorem 2–1 from Jacod (1997). For other applications and expositions of the theory from Jacod (1997) we refer to Podolskij and Vetter (2010), Fukasawa (2010) and Bibinger (2011a). The above limit theorem is implied by the following three conditions:

 \displaystyle\left[\mathfrak{D}^{N}\right]_{t}\lx@stackrel{{\scriptstyle p}}{{\longrightarrow}}\int_{0}^{t}v_{\mathfrak{D}_{s}}\,ds~{}, (37a) \displaystyle\left[\mathfrak{D}^{N},L\right]_{t}\lx@stackrel{{\scriptstyle p}}{{\longrightarrow}}0~{},~{}\left[\mathfrak{D}^{N},M\right]_{t}\lx@stackrel{{\scriptstyle p}}{{\longrightarrow}}0, (37b) \displaystyle\left[\mathfrak{D}^{N},L^{\bot}\right]_{t}\lx@stackrel{{\scriptstyle p}}{{\longrightarrow}}0~{},~{}\left[\mathfrak{D}^{N},M^{\bot}\right]_{t}\lx@stackrel{{\scriptstyle p}}{{\longrightarrow}}0, (37c)

for all t\in[0,T] and for all M^{\bot}\in\mathcal{M}^{\bot} and L^{\bot}\in\mathcal{L}^{\bot}, where \mathcal{M}^{\bot} and \mathcal{L}^{\bot} denote the sets of \left(\mathcal{F}_{t}\right)-adapted bounded martingales orthogonal to M and L, respectively.
Calculating the quadratic variation of \mathfrak{D}_{t}^{N} yields

 \displaystyle\left[\mathfrak{D}^{N}\right]_{t} \displaystyle=\frac{N}{iT}\left[\sum_{T_{k}\leq t}\left(\Delta\left[L\right]_{T_{k}}\left(\sum_{l=1}^{i\wedge k}\left(1-\frac{l}{i}\right)\Delta M_{T_{k-l}}\right)^{2}+\Delta\left[M\right]_{T_{k}}\left(\sum_{l=1}^{i\wedge k}\left(1-\frac{l}{i}\right)\Delta L_{T_{k-l}}\right)^{2}\right)\right. \displaystyle\left.~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}+2\sum_{T_{k}\leq t}\Delta\left[L,M\right]_{T_{k}}\left(\sum_{l=1}^{i\wedge k}\left(1-\frac{l}{i}\right)\Delta L_{T_{k-l}}\right)\left(\sum_{l=1}^{i\wedge k}\left(1-\frac{l}{i}\right)\Delta M_{T_{k-l}}\right)\right]+{\scriptstyle{\mathcal{O}}}_{p}(1) \displaystyle=~{}\frac{N}{iT}\left[\sum_{T_{k}\leq t}\Delta\left[L\right]_{T_{k}}\sum_{l=1}^{i\wedge k}\left(1-\frac{l}{i}\right)^{2}\left(\Delta M_{T_{k-l}}\right)^{2}+\sum_{T_{k}\leq t}\Delta\left[M\right]_{T_{k}}\right. \displaystyle\left.~{}~{}~{}~{}~{}~{}~{}~{}\times\sum_{l=1}^{i\wedge k}\left(1-\frac{l}{i}\right)^{2}\left(\Delta L_{T_{k-l}}\right)^{2}+2\sum_{T_{k}\leq t}\Delta\left[L,M\right]_{T_{k}}\sum_{l=1}^{i\wedge k}\left(1-\frac{l}{i}\right)^{2}\Delta L_{T_{k-l}}\Delta M_{T_{k-l}}\right]+{\scriptstyle{\mathcal{O}}}_{p}(1) \displaystyle=~{}\frac{N}{iT}\left[\sum_{T_{k}\leq t}\int_{T_{k-1}}^{T_{k}}(\sigma_{s}^{X})^{2}ds\left(\sum_{l=1}^{i\wedge k}\left(1-\frac{l}{i}\right)^{2}\int_{T_{k-l-1}}^{T_{k-l}}(\sigma_{s}^{Y})^{2}ds\right)\right. \displaystyle~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}\left.+\sum_{T_{k}\leq t}\int_{T_{k-1}}^{T_{k}}(\sigma_{s}^{Y})^{2}ds\left(\sum_{l=1}^{i\wedge k}\left(1-\frac{l}{i}\right)^{2}\int_{T_{k-l-1}}^{T_{k-l}}(\sigma_{s}^{X})^{2}ds\right)\right. 
\displaystyle\left.~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}+2\sum_{T_{k}\leq t}\int_{T_{k-1}}^{T_{k}}\rho_{s}\sigma_{s}^{X}\sigma_{s}^{Y}ds\left(\sum_{l=1}^{i\wedge k}\left(1-\frac{l}{i}\right)^{2}\int_{T_{k-l-1}}^{T_{k-l}}\rho_{s}\sigma_{s}^{X}\sigma_{s}^{Y}ds\right)\right]+{\scriptstyle{\mathcal{O}}}_{p}(1) \displaystyle{\underset{\text{\tiny{Lemma A.1}}}{=}}~{}\frac{N}{iT}\sum_{T_{k}\leq t}2(1+\rho_{T_{k-1}}^{2})(\sigma_{T_{k-1}}^{X}\sigma_{T_{k-1}}^{Y})^{2}\left(\Delta T_{k}\right)^{2}\sum_{l=1}^{i\wedge k}\left(1-\frac{l}{i}\right)^{2}+{\scriptstyle{\mathcal{O}}}_{p}(1) \displaystyle~{}~{}=\sum_{T_{k}\leq t}\frac{2}{3}\frac{G^{N}(T_{k})-G^{N}(T_{k-1})}{T_{k}-T_{k-1}}\left(\rho_{T_{k-1}}^{2}+1\right)\left(\sigma_{T_{k-1}}^{X}\sigma_{T_{k-1}}^{Y}\right)^{2}\Delta T_{k}+{\scriptstyle{\mathcal{O}}}_{p}(1) \displaystyle~{}~{}\lx@stackrel{{\scriptstyle p}}{{\longrightarrow}}\frac{2}{3}\int_{0}^{t}(1+\rho_{s}^{2})(\sigma_{s}^{X}\sigma_{s}^{Y})^{2}G^{\prime}(s)\,ds~{}~{}.

In the first step cross terms of the inner sums have been neglected since they are centred and by Itô isometry it can be shown that their second moments are bounded from above by Ci^{3}\delta_{N}^{3}. We frequently use estimates \delta_{N}^{l-1} for sums of the type \sum_{i}^{N}(\Delta T_{i})^{l}\,,l>1, by Hölder’s inequality with the supremum norm to obtain upper bounds.
Subsequently squared increments of L and M and the increments of the product L\cdot M in these inner sums are substituted by the increments of the quadratic (co-)variation processes. The induced error terms are centred by Itô isometry and involving Cauchy-Schwarz inequality it follows that C\delta_{N} is an upper bound for their second moments.
The crucial non-standard approximation is that on each block (T_{k-1},\ldots,T_{k-i\vee 0}) the increments of the form \int_{T_{k-l-1}}^{T_{k-l}}f(t)dt with continuous functions f for l=1,\ldots,k\wedge i are approximated by \Delta T_{k}f\left(T_{k-1}\right). This blockwise approximation is treated in Lemma A.1 and makes use of the concept of a time-changed quadratic variation of times and particularly (36). Finally, 1/i\sum_{l=1}^{i}(1-(l/i))^{2}=1/3+{\scriptstyle{\mathcal{O}}}(1) and the convergence in probability is ensured by Assumption 4 and the convergence of the Riemann sums to the integral.
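The Riemann-sum limit 1/i\sum_{l=1}^{i}(1-(l/i))^{2}\rightarrow 1/3 used in the last step, which produces the factor 2/3 in the asymptotic variance, can be checked with the following Python sketch. The function name is illustrative.

```python
from fractions import Fraction


def subsampling_weight_average(i):
    """Exact value of (1/i) * sum_{l=1}^{i} (1 - l/i)^2."""
    return sum((1 - Fraction(l, i)) ** 2 for l in range(1, i + 1)) / i


for i in (2, 10, 1000):
    print(i, float(subsampling_weight_average(i)))
```

The exact value is (i-1)(2i-1)/(6i^{2}), which differs from 1/3 by a term of order 1/i.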

###### Lemma A.1.

On the same assumptions as in Proposition 5, it holds true that the term

 \displaystyle\frac{N}{iT}\sum_{T_{k}\leq t}\left(\int_{T_{k-1}}^{T_{k}}(\sigma% _{s}^{X})^{2}ds\left(\sum_{l=1}^{i\wedge k}\left(1-\frac{l}{i}\right)^{2}\int_% {T_{k-l-1}}^{T_{k-l}}(\sigma_{s}^{Y})^{2}ds\right)-(\sigma_{T_{k-1}}^{X}\sigma% _{T_{k-1}}^{Y}\Delta T_{k})^{2}\sum_{l=1}^{i\wedge k}\left(1-\frac{l}{i}\right% )^{2}\right)~{},

and the analogous blockwise approximations of \Delta[M]_{T_{k}} and \Delta[L,M]_{T_{k}} through constant left-end points converge to zero in probability.

###### Proof.

The approximation uses the concept of a time-change in the asymptotic quadratic variation of refresh times introduced in Zhang (2006) which is expounded in 7. By virtue of that concept we may suppose without loss of generality that the sampling design of the closest synchronous approximation satisfies (36).
The asymptotic orders of the three terms are deduced analogously, and we restrict ourselves to the proof for the first term given above. An application of the mean value theorem yields

 \displaystyle\frac{N}{iT}\sum_{T_{k}\leq t}\int_{T_{k-1}}^{T_{k}}(\sigma_{s}^{% X})^{2}ds\left(\sum_{l=1}^{i\wedge k}\left(1-\frac{l}{i}\right)^{2}\int_{T_{k-% l-1}}^{T_{k-l}}(\sigma_{s}^{Y})^{2}ds\right) \displaystyle=\frac{N}{iT}\sum_{T_{k}\leq t}(\sigma_{\zeta_{k}}^{X})^{2}\Delta T% _{k}\left(\sum_{l=1}^{i\wedge k}\left(1-\frac{l}{i}\right)^{2}(\sigma_{\zeta^{% *}_{k-l}}^{Y})^{2}\Delta T_{k-l}\right)

with \zeta_{k}\in[T_{k-1},T_{k}], \zeta^{*}_{q}\in[T_{q-1},T_{q}]. Since the volatility processes \sigma^{X},\sigma^{Y} are uniformly continuous on [0,T] by Assumption 1

 \displaystyle\sum_{l=1}^{i\wedge k}\left(1-\frac{l}{i}\right)^{2}\left|(\sigma_{\zeta^{*}_{k-l}}^{Y})^{2}-(\sigma_{T_{k-1}}^{Y})^{2}\right|\Delta T_{k-l}\leq i\delta_{N}\sup_{|t-s|\leq i\delta_{N}}{\left|(\sigma_{t}^{Y})^{2}-(\sigma_{s}^{Y})^{2}\right|}={\scriptstyle{\mathcal{O}}}_{a.\,s.\,}(i\delta_{N})~{},
 \displaystyle\sum_{T_{k}\leq t}\left|(\sigma_{\zeta_{k}}^{X})^{2}-(\sigma_{T_{k-1}}^{X})^{2}\right|(\sigma_{T_{k-1}}^{Y})^{2}\Delta T_{k}\sum_{l=1}^{i\wedge k}\left(1-\frac{l}{i}\right)^{2}\Delta T_{k-l}\leq i\delta_{N}\sup_{|t-s|\leq\delta_{N}}{\left|(\sigma_{t}^{X})^{2}-(\sigma_{s}^{X})^{2}\right|}={\scriptstyle{\mathcal{O}}}_{a.\,s.\,}(i\delta_{N})

hold almost surely (abbreviated a. s.).
With the Cauchy-Schwarz inequality and (36), we obtain

 \displaystyle\frac{N}{iT}\sum_{T_{k}\leq t} \displaystyle(\sigma_{T_{k-1}}^{X}\sigma_{T_{k-1}}^{Y})^{2}\Delta T_{k}\left(\sum_{l=1}^{i\wedge k}\left(1-\frac{l}{i}\right)^{2}\left|\Delta T_{k-l}-\frac{T}{N}\right|\right) \displaystyle\leq\frac{N}{iT}\sup_{s\in[0,t]}{\left(\sigma_{s}^{X}\sigma_{s}^{Y}\right)^{2}}\sum_{l=1}^{i}\sum_{j=1}^{N-l}\left|\left(\Delta T_{j}-\frac{T}{N}\right)\Delta T_{j+l}\right| \displaystyle\leq\frac{N}{iT}C\left(\sum_{j=1}^{N}\left(T_{(j+i)\wedge N}-T_{j}\right)^{2}\sum_{j=1}^{N}\left(\Delta T_{j}-\frac{T}{N}\right)^{2}\right)^{\nicefrac{{1}}{{2}}}={\scriptstyle{\mathcal{O}}}_{a.\,s.\,}(1)~{}.

Furthermore,

 \displaystyle\frac{N}{iT}\sum_{T_{k}\leq t}(\sigma_{T_{k-1}}^{X}\sigma_{T_{k-1% }}^{Y})^{2}\Delta T_{k}\left|\Delta T_{k}-\frac{T}{N}\right|\sum_{l=1}^{i% \wedge k}\left(1-\frac{l}{i}\right)^{2} \displaystyle\leq\frac{N}{T}C\hskip-2.845276pt\left(\sum_{j=1}^{N}(\Delta T_{j% })^{2}\sum_{j=1}^{N}\left(\Delta T_{j}-\frac{T}{N}\right)^{2}\right)^{% \nicefrac{{1}}{{2}}}

holds, where the right-hand side converges to zero almost surely due to (36) and the Cauchy-Schwarz inequality. The preceding estimates imply the statement of the lemma. ∎

We proceed by proving (37b), namely that the quadratic covariations \left[\mathfrak{D}^{N},L\right]_{t} and \left[\mathfrak{D}^{N},M\right]_{t} converge to zero in probability for all t\in[0,T].

 \left[\mathfrak{D}^{N},L\right]_{t}=\sqrt{\frac{N}{iT}}\sum_{T_{k}\leq t}\left(\Delta\left[L\right]_{T_{k}}\left(\sum_{l=1}^{i\wedge k}\left(1-\frac{l}{i}\right)\Delta M_{T_{k-l}}\right)+\Delta\left[L,M\right]_{T_{k}}\left(\sum_{l=1}^{i\wedge k}\left(1-\frac{l}{i}\right)\Delta L_{T_{k-l}}\right)\right)

has an expectation equal to zero for all t\in[0,T] and its second moment is bounded above by iN\delta_{N}^{2}. The order follows from an evaluation of the second moment that is carried out analogously to the calculation of \left[\mathfrak{D}^{N}\right]_{t} before. For this reason, \left[\mathfrak{D}^{N},L\right] converges to zero in probability on [0,T]. It can be deduced in the same way that \left[\mathfrak{D}^{N},M\right]_{t}={\scriptstyle{\mathcal{O}}}_{p}(1) as well. If L^{\bot} is a bounded (\mathcal{F}_{t})-martingale with \left[L,L^{\bot}\right]\equiv 0, the quadratic covariation

 \left[\mathfrak{D}^{N},L^{\bot}\right]_{t}=\sqrt{\frac{N}{iT}}\sum_{T_{k}\leq t% }\Delta\left[L^{\bot},M\right]_{T_{k}}\left(\sum_{l=1}^{i\wedge k}\left(1-% \frac{l}{i}\right)\Delta L_{T_{k-l}}\right)

converges to zero in probability on [0,T], which can be concluded following the same principles, and likewise \left[\mathfrak{D}^{N},M^{\bot}\right]_{t}={\scriptstyle{\mathcal{O}}}_{p}(1)~{}\forall M^{\bot}\in\mathcal{M}^{\bot},\,t\in[0,T]. An application of Jacod’s Theorem from Jacod (1997) completes the proof of Proposition 8. ∎

###### Proposition A.1.

On the same assumptions as in Proposition 6, the continuous martingale

 \displaystyle\mathfrak{M}_{t}^{N}\mathrel{\mathop{:}}=\sqrt{\frac{N}{M_{N}}} \displaystyle\sum_{i=1}^{M_{N}}\left[\sum_{T_{k}\leq t}\left(\Delta L_{T_{k}}+% \int_{T_{k}}^{t}\sigma_{s}^{X}dW_{s}^{X}\right)\left(\sum_{l=1}^{i\wedge k}% \left(1-\frac{l}{i}\right)\Delta M_{T_{k-l}}\right)\right. \displaystyle\left.+\sum_{T_{k}\leq t}\left(\Delta M_{T_{k}}+\int_{T_{k}}^{t}% \sigma_{s}^{Y}dW_{s}^{Y}\right)\left(\sum_{l=1}^{i\wedge k}\left(1-\frac{l}{i}% \right)\Delta L_{T_{k-l}}\right)\right]

for t\in[0,T] converges stably in law as N\rightarrow\infty,M_{N}\rightarrow\infty,M_{N}/N^{\alpha}\rightarrow 0 for every \alpha>2/3 to a limiting time-changed Brownian motion

 \displaystyle\mathfrak{M}^{N}_{t}\lx@stackrel{{\scriptstyle st}}{{% \rightsquigarrow}}\int_{0}^{t}\sqrt{v_{\mathfrak{M}_{s}}}d\tilde{\mathfrak{W}}% _{s}^{\bot}~{},

where \tilde{\mathfrak{W}}^{\bot} is independent of \mathcal{F} and with

 \displaystyle v_{\mathfrak{M}_{s}}=\frac{26}{35}TG^{\prime}(s)(\sigma_{s}^{X}% \sigma_{s}^{Y})^{2}(1+\rho_{s}^{2})~{}.

Proof of Proposition 9:
The discretization error of the generalized multiscale estimator calculated with the closest synchronous approximation under the equivalent martingale measure where the drift terms equal zero

 \displaystyle\sum_{i=1}^{M_{N}}\frac{\alpha_{i,M_{N}}^{opt}}{i}\sum_{j=i}^{N}% \left(L_{T_{j}}-L_{T_{j-i}}\right)\left(M_{T_{j}}-M_{T_{j-i}}\right)-\left[X,Y% \right]_{T} \displaystyle=\sum_{i=1}^{M_{N}}\alpha_{i,M_{N}}^{opt}\left(\frac{1}{i}\sum_{j% =i}^{N}\left(L_{T_{j}}-L_{T_{j-i}}\right)\left(M_{T_{j}}-M_{T_{j-i}}\right)-% \left[X,Y\right]_{T}\right)

equals a weighted sum of M_{N} discretization errors of the type considered in Proposition 5, where M_{N}\rightarrow\infty, because \sum_{i=1}^{M_{N}}\alpha_{i,M_{N}}^{opt}=1. Note that all approximation errors in the preceding proof of Proposition 8 converge to zero in probability as long as N\rightarrow\infty,i/N^{\alpha}\rightarrow 0 for every \alpha>2/3.
We begin with the proof of a multivariate stable central limit theorem for a finite-dimensional vector:

###### Lemma A.2.

Consider the sequence of K-dimensional vectors \mathds{D}^{N}=\left(\mathfrak{D}^{i_{N}^{1}},\ldots,\mathfrak{D}^{i_{N}^{K}}\right) where the entries {\mathfrak{D}}^{i_{N}^{k}},k=1,\ldots,K<\infty are the continuous martingales

 \displaystyle{\mathfrak{D}}^{i_{N}^{k}}_{t} \displaystyle=\left[\sum_{T_{r}\leq t}\left(\Delta L_{T_{r}}+\int_{T_{r}}^{t}% \sigma_{s}^{X}dW_{s}^{X}\right)\left(\sum_{l=1}^{i_{N}^{k}\wedge r}\left(1-% \frac{l}{i_{N}^{k}}\right)\Delta M_{T_{r-l}}\right)\right. \displaystyle\left.+\sum_{T_{r}\leq t}\left(\Delta M_{T_{r}}+\int_{T_{r}}^{t}% \sigma_{s}^{Y}dW_{s}^{Y}\right)\left(\sum_{l=1}^{i_{N}^{k}\wedge r}\left(1-% \frac{l}{i_{N}^{k}}\right)\Delta L_{T_{r-l}}\right)\right]

with a sequence of integers i_{N}^{k},k=1,\ldots,K. On the Assumptions 1, 2 and 4 and if for every k\in\{1,\ldots,K\} there exists a constant q_{k} with i_{N}^{k}/M_{N}\rightarrow q_{k}, the following stable convergence holds true as N\rightarrow\infty,M_{N}\rightarrow\infty,M_{N}/N^{\alpha}\rightarrow 0 for every \alpha>2/3:

 \displaystyle\sqrt{\frac{N}{M_{N}}}{\mathds{D}}^{N}_{t}\lx@stackrel{{% \scriptstyle st}}{{\rightsquigarrow}}\int_{0}^{t}w_{s}d{\mathds{W}}_{s}~{}, (38)

with a K-dimensional Brownian motion \mathds{W} independent of \mathcal{F} and a predictable process w_{s} with

 \displaystyle\left(w_{s}w_{s}^{*}\right)_{mn}=\frac{T}{3}\min{(q_{m},q_{n})}% \left(3-\frac{\min{(q_{m},q_{n})}}{\max{(q_{m},q_{n})}}\right)(1+\rho_{s}^{2})% \left(\sigma_{s}^{X}\sigma_{s}^{Y}\right)^{2}G^{\prime}(s) (39)

with the convention that for q_{m}=q_{n}=0 the ratio is one.
For \mathds{D}^{N}_{T} we obtain the following multivariate stable central limit theorem

 \displaystyle\sqrt{\frac{N}{M_{N}}}{\mathds{D}}^{N}_{T}\lx@stackrel{{% \scriptstyle st}}{{\rightsquigarrow}}\mathbf{N}\left(0,\eta^{2}\Sigma\right)~{}, (40)

with \eta^{2}=2T\int_{0}^{T}(1+\rho_{t}^{2})(\sigma_{t}^{X}\sigma_{t}^{Y})^{2}G^{% \prime}(t)dt and

 \Sigma_{mn}=\frac{1}{6}\min{(q_{m},q_{n})}\left(3-\frac{\min{(q_{m},q_{n})}}{\max{(q_{m},q_{n})}}\right)~{}.

###### Proof.

Define for k\in\{1,\ldots,K\} the continuous martingales

 \mathfrak{M}_{t}^{i_{N}^{k}}=\sqrt{\frac{N}{M_{N}}}\mathfrak{D}_{t}^{i_{N}^{k}% }~{}.

By virtue of Proposition 5, we already have that

 \left[\mathfrak{M}^{i_{N}^{k}}\right]_{t}\lx@stackrel{{\scriptstyle p}}{{% \longrightarrow}}\frac{2}{3}Tq_{k}\int_{0}^{t}(1+\rho_{s}^{2})(\sigma_{s}^{X}% \sigma_{s}^{Y})^{2}G^{\prime}(s)\,ds~{}.

The limit of the quadratic covariations \left[\mathfrak{M}^{i_{N}^{m}},\mathfrak{M}^{i_{N}^{k}}\right] is derived using the same approximations as for the quadratic variation in the preceding proof:

 \displaystyle\left[\mathfrak{M}^{i_{N}^{m}},\mathfrak{M}^{i_{N}^{k}}\right]_{t} \displaystyle=\frac{N}{M_{N}}\left[\sum_{T_{r}\leq t}\Delta\left[L\right]_{T_{% r}}\left(\sum_{l=1}^{\min{(i_{N}^{m},i_{N}^{k},r)}}\left(1-\frac{l}{i_{N}^{m}}% \right)\left(1-\frac{l}{i_{N}^{k}}\right)\left(\Delta M_{T_{r-l}}\right)^{2}% \right)\right. \displaystyle\left.~{}+\sum_{T_{r}\leq t}\Delta\left[M\right]_{T_{r}}\left(% \sum_{l=1}^{\min{(i_{N}^{m},i_{N}^{k},r)}}\left(1-\frac{l}{i_{N}^{m}}\right)% \left(1-\frac{l}{i_{N}^{k}}\right)\left(\Delta L_{T_{r-l}}\right)^{2}\right)\right. \displaystyle\left.~{}+\sum_{T_{r}\leq t}2\Delta\left[L,M\right]_{T_{r}}\left(% \sum_{l=1}^{\min{(i_{N}^{m},i_{N}^{k},r)}}\left(1-\frac{l}{i_{N}^{m}}\right)% \left(1-\frac{l}{i_{N}^{k}}\right)\Delta L_{T_{r-l}}\Delta M_{T_{r-l}}\right)% \right]+{\scriptstyle{\mathcal{O}}}_{p}(1) \displaystyle=N\sum_{T_{r}\leq t}2\,\frac{G^{(N)}(T_{r})-G^{(N)}(T_{r-1})}{% \Delta T_{r}}(\rho_{T_{r-1}}^{2}+1)(\sigma_{T_{r-1}}^{X}\sigma_{T_{r-1}}^{Y})^% {2}\Delta T_{r} \displaystyle~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{% }~{}~{}~{}~{}\times\left(\sum_{l=1}^{\min{(i_{N}^{m},i_{N}^{k},r)}}\left(1-% \frac{l}{i_{N}^{m}}\right)\left(1-\frac{l}{i_{N}^{k}}\right)\right)+{% \scriptstyle{\mathcal{O}}}_{p}(1) \displaystyle\lx@stackrel{{\scriptstyle p}}{{\longrightarrow}}2T\int_{0}^{t}(% \rho_{s}^{2}+1)(\sigma_{s}^{X}\sigma_{s}^{Y})^{2}G^{\prime}(s)\left(\frac{1}{6% }\min{(q_{m},q_{k})}\left(3-\frac{\min{(q_{m},q_{k})}}{\max{(q_{m},q_{k})}}% \right)\right)\,ds~{},

since \sum_{l=1}^{m}(1-(l/m))(1-(l/M))=(1/2)m-(m^{2}/(6M))-1/2+1/(6M) for integers 1\leq m\leq M. The multi-dimensional version of Jacod’s stable central limit Theorem 2–1 from Jacod (1997) enables us to prove the stable weak convergence of the vector, provided we can verify the conditions

 \left[\mathds{D}^{N},\mathds{L}\right]_{t}\lx@stackrel{{\scriptstyle p}}{{% \longrightarrow}}0~{}~{},~{}~{}\left[\mathds{D}^{N},\mathds{M}\right]_{t}% \lx@stackrel{{\scriptstyle p}}{{\longrightarrow}}0,~{}~{}\forall t\in[0,T]~{},

where \mathds{L} denotes the vector with entries \mathds{L}^{j}=L,j=1,\ldots,K and \mathds{M} with \mathds{M}^{j}=M,j=1,\ldots,K, respectively, and

 \left[\mathds{D}^{N},\mathds{L}^{\bot}\right]_{t}\lx@stackrel{{\scriptstyle p}% }{{\longrightarrow}}0~{}~{},~{}~{}\left[\mathds{D}^{N},\mathds{M}^{\bot}\right% ]_{t}\lx@stackrel{{\scriptstyle p}}{{\longrightarrow}}0,~{}~{}\forall t\in[0,T]

where \mathds{L}^{\bot} and \mathds{M}^{\bot} are bounded (\mathcal{F}_{t})-adapted martingales orthogonal to \mathds{L} and \mathds{M}, respectively. That is because the reference continuous martingales for all entries of the vector \mathds{D}^{N} are L and M. The componentwise proof of the conditions above is again analogous to the univariate case in the preceding proof. We conclude that the asymptotic distribution of the vector is described by a limiting time-changed Brownian motion on [0,T], and the marginal distribution at time T by a mixed Gaussian limit, where the normal distribution, like all componentwise marginals, is defined on an orthogonal extension of the original underlying probability space. ∎
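
The kernel sum that produces the covariance structure in (39) can be examined in exact arithmetic. The sketch below is an illustration under our own naming (`kernel_cross_sum` and `leading_term` are hypothetical helpers, not part of the paper): it evaluates \sum_{l=1}^{m}(1-l/m)(1-l/M) for integers 1\leq m\leq M and compares it with the leading-order term (m/6)(3-m/M). The difference stays bounded, so it disappears after the normalization with M_{N}.

```python
from fractions import Fraction


def kernel_cross_sum(m, M):
    """Exact value of sum_{l=1}^{m} (1 - l/m)(1 - l/M) for integers 1 <= m <= M."""
    return sum((1 - Fraction(l, m)) * (1 - Fraction(l, M)) for l in range(1, m + 1))


def leading_term(m, M):
    """Leading-order term (m/6)(3 - m/M) = m/2 - m^2/(6M)."""
    return Fraction(m, 6) * (3 - Fraction(m, M))


for m, M in [(5, 5), (10, 40), (200, 1000)]:
    remainder = kernel_cross_sum(m, M) - leading_term(m, M)
    # The remainder is bounded uniformly in m and M.
    assert abs(remainder) <= Fraction(1, 2)
    print(m, M, float(remainder))
```

With m=q_{m}M_{N} and M=q_{k}M_{N}, dividing the leading term by M_{N} gives (1/6)\min(q_{m},q_{k})(3-\min(q_{m},q_{k})/\max(q_{m},q_{k})), which is the entry of \Sigma stated in the lemma.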

From the preceding multivariate limit theorem, the Cramér-Wold device allows us to conclude the weak convergence of all one-dimensional linear combinations of the transformed discretization errors of a finite collection of one-scale subsampling estimators. For an asymptotically \mathbf{N}\left(0,\Sigma\right)-distributed random vector, the sum of all components is asymptotically normally distributed with variance \sum_{i,j}(\Sigma_{ij}) by the Cramér-Wold device and the normality of any linear combination of components of a multivariate normal distribution (see e. g. pp. 516-517 in Rao (2001)).
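
As a small worked example of this step, the following sketch builds the matrix \Sigma of the preceding lemma for a finite set of limiting ratios q_{1},\ldots,q_{K} and sums its entries, which gives the variance of the sum of the components of an \mathbf{N}(0,\Sigma) vector. The helper names and the particular ratios (1/4, 1/2, 1) are our own illustrative choices.

```python
from fractions import Fraction


def sigma_entry(qm, qn):
    """Entry (1/6) * min(q) * (3 - min(q)/max(q)); the ratio is set to one if both are zero."""
    lo, hi = min(qm, qn), max(qm, qn)
    ratio = Fraction(1) if hi == 0 else lo / hi
    return lo * (3 - ratio) / 6


def variance_of_component_sum(qs):
    """Sum of all entries of Sigma, i.e. the variance of 1' Z for Z ~ N(0, Sigma)."""
    return sum(sigma_entry(a, b) for a in qs for b in qs)


# Hypothetical limiting ratios i_N^k / M_N for three subsampling frequencies.
qs = [Fraction(1, 4), Fraction(1, 2), Fraction(1)]
for a in qs:
    print([str(sigma_entry(a, b)) for b in qs])
print("variance of the sum:", variance_of_component_sum(qs))
```

For a general linear combination with coefficients c_{1},\ldots,c_{K} the same double sum is taken with the additional factor c_{m}c_{n}; the weighted version with the multiscale weights is what enters the asymptotic variance of the multiscale estimator.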
The asymptotic variance in Proposition 6 is deduced from the multivariate limit and

 \sum_{k,l}(\Sigma_{k,l})=2\sum_{k=1}^{M_{N}}\sum_{l=1}^{k}\frac{l}{6M_{N}}% \left(3-\frac{l}{k}\right)\alpha_{k,M_{N}}^{opt}\alpha_{l,M_{N}}^{opt}+{% \scriptstyle{\mathcal{O}}}(1)=\frac{13}{35}+{\scriptstyle{\mathcal{O}}}(1)

with the weights (30) inserted.
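
The constant 13/35 can be reproduced numerically. The weights (30) are not restated in this appendix, so the sketch below assumes the closed form \alpha_{i,M}^{opt}=12i^{2}/(M^{3}-M)-6i/(M^{2}-1)-6i/(M^{3}-M); this form is an assumption made for illustration, chosen because it satisfies \sum_{i}\alpha_{i}=1 and \sum_{i}\alpha_{i}/i=0. The function names are hypothetical as well.

```python
def noise_optimal_weights(M):
    """Assumed closed form of the weights (30), valid for M >= 2 (an assumption, see lead-in)."""
    return [
        12.0 * i * i / (M**3 - M) - 6.0 * i / (M**2 - 1) - 6.0 * i / (M**3 - M)
        for i in range(1, M + 1)
    ]


def weighted_sigma_sum(M):
    """Sum over k, l of alpha_k * alpha_l * (min/(6M)) * (3 - min/max)."""
    a = noise_optimal_weights(M)
    total = 0.0
    for k in range(1, M + 1):
        for l in range(1, M + 1):
            lo, hi = min(k, l), max(k, l)
            total += a[k - 1] * a[l - 1] * lo / (6.0 * M) * (3.0 - lo / hi)
    return total


for M in (25, 100, 300):
    # The values approach 13/35 (about 0.3714) as M grows.
    print(M, weighted_sigma_sum(M), 13.0 / 35.0)
```

The double loop is quadratic in M, which is adequate for a sanity check of the constant but not meant as an efficient implementation.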
To complete the proof of Proposition 9, and hence of Proposition 6, it remains to extend the result to asymptotically infinitely many addends. This part of the proof can be adapted from Zhang (2006), where a stable central limit theorem for a multiscale estimator of the integrated volatility in the univariate setting is proved. ∎

#### A.2.3 Discretization error due to the lack of synchronicity

###### Proposition A.2.

On the Assumptions 1 and 2, it holds true that

 \displaystyle\mathcal{A}_{T}^{N} \displaystyle=\frac{1}{i}\sum_{j=i}^{N-1}\left[\left(L_{g_{j}}-L_{T_{j}}\right)\left(\sum_{k=j-i+1}^{j}\Delta M_{T_{k}}\right)+\left(M_{\gamma_{j}}-M_{T_{j}}\right)\left(\sum_{k=j-i+1}^{j}\Delta L_{T_{k}}\right)\right. \displaystyle\left.~{}~{}~{}~{}~{}~{}+\Delta L_{T_{j+1}}\left(\sum_{k=j-i+1}^{j}\left(M_{T_{k}}-M_{\lambda_{k+1}}\right)\right)+\Delta M_{T_{j+1}}\left(\sum_{k=j-i+1}^{j}\left(L_{T_{k}}-L_{l_{k+1}}\right)\right)\right]=\mathcal{O}_{p}\left((iN)^{-\nicefrac{{1}}{{2}}}\right)

for the error \mathcal{A}_{T}^{N} due to interpolations for a one-scale subsampling estimator.

###### Proof.

\mathcal{A}_{T}^{N} is the endpoint of a discrete martingale adapted to \mathcal{F}_{j,N}=\mathcal{F}_{T_{j+1}^{(N)}} whose increments have conditional expectation zero, since the addends incorporate products of Brownian increments over disjoint time intervals. The conditional variance yields

 \displaystyle\frac{1}{i^{2}}\sum_{j=i}^{N-1}\mathbb{E}\left[\left(\left(L_{g_{% j}}-L_{T_{j}}\right)\left(\sum_{k=j-i+1}^{j}\Delta M_{T_{k}}\right)+\left(M_{% \gamma_{j}}-M_{T_{j}}\right)\left(\sum_{k=j-i+1}^{j}\Delta L_{T_{k}}\right)% \right.\right. \displaystyle\left.\left.~{}~{}~{}~{}~{}~{}+\Delta L_{T_{j+1}}\left(\sum_{k=j-% i+1}^{j}\left(M_{T_{k}}-M_{\lambda_{k+1}}\right)\right)+\Delta M_{T_{j+1}}% \left(\sum_{k=j-i+1}^{j}\left(L_{T_{k}}-L_{l_{k+1}}\right)\right)\right)^{2}% \big{|}\mathcal{F}_{T_{j}^{(N)}}\right] \displaystyle=\frac{1}{i^{2}}\sum_{j=i}^{N-1}\left(\mathbb{E}\left[(L_{g_{j}}-% L_{T_{j}})^{2}\right]\left(\sum_{k=j-i+1}^{j}\Delta M_{T_{k}}\right)^{2}+% \mathbb{E}\left[(M_{\gamma_{j}}-M_{T_{j}})^{2}\right]\left(\sum_{k=j-i+1}^{j}% \Delta L_{T_{k}}\right)^{2}\right. \displaystyle\left.~{}~{}~{}~{}~{}~{}+\mathbb{E}\left[(\Delta L_{T_{j+1}})^{2}% \right]\left(\sum_{k=j-i+1}^{j}\left(M_{T_{k}}-M_{\lambda_{k+1}}\right)\right)% ^{2}+\mathbb{E}\left[(\Delta M_{T_{j+1}})^{2}\right]\left(\sum_{k=j-i+1}^{j}% \left(L_{T_{k}}-L_{l_{k+1}}\right)\right)^{2}\right. \displaystyle\left.~{}~{}~{}~{}~{}~{}+\mathbb{E}\left[\int_{T_{j}}^{g_{j}}(% \sigma_{t}^{X})^{2}dt\right]\left(\sum_{k=j-i+1}^{j}\left(M_{T_{k}}-M_{\lambda% _{k+1}}\right)\right)\left(\sum_{k=j-i+1}^{j}\Delta M_{T_{k}}\right)\right. \displaystyle\left.~{}~{}~{}~{}~{}~{}+\mathbb{E}\left[\int_{T_{j}}^{\gamma_{j}% }(\sigma_{t}^{Y})^{2}dt\right]\left(\sum_{k=j-i+1}^{j}\left(L_{T_{k}}-L_{l_{k+% 1}}\right)\right)\left(\sum_{k=j-i+1}^{j}\Delta L_{T_{k}}\right)\right. \displaystyle\left.~{}~{}~{}~{}~{}~{}+\mathbb{E}\left[\int_{T_{j}}^{g_{j}}\rho% _{t}\sigma_{t}^{X}\sigma_{t}^{Y}dt\right]\left(\sum_{k=j-i+1}^{j}\left(L_{T_{k% }}-L_{l_{k+1}}\right)\right)\left(\sum_{k=j-i+1}^{j}\Delta M_{T_{k}}\right)\right. 
\displaystyle\left.~{}~{}~{}~{}~{}~{}+\mathbb{E}\left[\int_{T_{j}}^{\gamma_{j}% }\rho_{t}\sigma_{t}^{X}\sigma_{t}^{Y}dt\right]\left(\sum_{k=j-i+1}^{j}\left(M_% {T_{k}}-M_{\lambda_{k+1}}\right)\right)\left(\sum_{k=j-i+1}^{j}\Delta L_{T_{k}% }\right)\right) \displaystyle=\mathcal{O}_{p}\left(i^{-1}N^{-1}\right)~{}.

The variance of the term is of order (iN)^{-1} which can be proved by taking the expectation of the above given conditional variance and an upper bound of the second moment. The asymptotic orders of the addends follow from taking the expectations using Itô isometry and analyzing the differences of the addends minus their expectations, that converge to zero at a faster rate. That part is similar to the proofs above and we forgo a more detailed computation here. ∎

Denote by \mathcal{A}_{T}^{N,i} the error due to non-synchronicity and interpolations for a fixed subsampling frequency i=1,\ldots,M_{N} in the following. The error due to asynchronicity of the generalized multiscale estimator (12) equals the weighted sum \sum_{i=1}^{M_{N}}\alpha_{i,M_{N}}^{opt}\mathcal{A}_{T}^{N,i}. It has expectation zero and the variance is of order

 \displaystyle\mathbb{V}\textnormal{ar}\left(\sum_{i=1}^{M_{N}}\alpha_{i,M_{N}}^{opt}\mathcal{A}_{T}^{N,i}\right) \displaystyle=\sum_{i,k}\alpha_{i,M_{N}}^{opt}\alpha_{k,M_{N}}^{opt}\mathbb{C}\textnormal{ov}\left(\mathcal{A}_{T}^{N,i}\,,\,\mathcal{A}_{T}^{N,k}\right) \displaystyle=\underbrace{\sum_{i=1}^{M_{N}}\left(\alpha_{i,M_{N}}^{opt}\right)^{2}\mathbb{E}\left[\left(\mathcal{A}_{T}^{N,i}\right)^{2}\right]}_{=\mathcal{O}\left(M_{N}^{-2}N^{-1}\right)}+\underbrace{\sum_{i\neq k}\alpha_{i,M_{N}}^{opt}\alpha_{k,M_{N}}^{opt}\mathbb{E}\left[\mathcal{A}_{T}^{N,i}\mathcal{A}_{T}^{N,k}\right]}_{=\mathcal{O}\left(M_{N}^{-1}N^{-1}\right)}={\scriptstyle{\mathcal{O}}}\left(\frac{M_{N}}{N}\right)~{}.

Thus, the error due to interpolations is of smaller asymptotic order than the discretization error of the closest synchronous approximation and asymptotically negligible.

### A.3 Asymptotics of the cross term

For a one-scale subsampling estimator, cross terms are asymptotically negligible and hence the stable central limit theorem in Theorem 4.1 is implied by Theorem 5. For the proof of the stable central limit theorem in Theorem 3 for the multiscale approach, we address the asymptotics of the cross terms in this subsection.

###### Proposition A.2.

On the Assumptions 1, 2, 3 and 5, the cross terms of the generalized multiscale estimator (12) with noise-optimal weights (30) converge weakly to a mixed normal limit as M_{N}\rightarrow\infty,\,N\rightarrow\infty,\,M_{N}\delta_{N}\rightarrow 0:

 \displaystyle\sqrt{M_{N}}\sum_{i=1}^{M_{N}}\frac{\alpha_{i,M_{N}}^{opt}}{i}% \sum_{j=i}^{N}\left((X_{g_{j}}-X_{l_{j-i+1}})(\epsilon^{Y}_{\gamma_{j}}-% \epsilon^{Y}_{\lambda_{j-i+1}})+(Y_{\gamma_{j}}-Y_{\lambda_{j-i+1}})(\epsilon^% {X}_{g_{j}}-\epsilon^{X}_{l_{j-i+1}})\right) \displaystyle\rightsquigarrow\mathbf{N}\left(0,\operatorname{\mathbf{AVAR}}_{% cross}\right)~{}, (41)

with asymptotic variance

 \displaystyle\operatorname{\mathbf{AVAR}}_{cross}=\frac{12}{5}\left(\eta_{Y}^{% 2}\int_{0}^{T}(1+I^{\prime}_{Y}(t))(\sigma_{t}^{X})^{2}\,dt\,+\eta_{X}^{2}\int% _{0}^{T}(1+I^{\prime}_{X}(t))(\sigma_{t}^{Y})^{2}\,dt\right)~{}. (42)

The convergence holds conditionally given the paths of the efficient processes.

###### Proof.

This proof ties in with the discussion in Section 4, where the degrees of regularity of non-synchronous sampling schemes have been defined in Definition 4; they are assumed to converge to continuously differentiable functions.
On the Assumption 3 of independent observation noise of X and Y, the two different cross terms are uncorrelated, and we prove a central limit theorem for the first one:

 \sqrt{M_{N}}\sum_{i=1}^{M_{N}}\frac{\alpha_{i,M_{N}}^{opt}}{i}\sum_{j=i}^{N}(X% _{g_{j}}-X_{l_{j-i+1}})(\epsilon^{Y}_{\gamma_{j}}-\epsilon^{Y}_{\lambda_{j-i+1% }})\rightsquigarrow\mathbf{N}\left(0,\frac{12}{5}\eta_{Y}^{2}\int_{0}^{T}(1+I^% {\prime}_{Y}(t))(\sigma_{t}^{X})^{2}\,dt\right)~{}.

The parallel result for the other term can be proved analogously.
For the purpose of a shorter notation we have left out superscripts of the observation times, and write \alpha_{i},\,i=1,\ldots,M_{N} for the weights although we are interested in the specific weights (30). Denote \delta_{N}=\sup_{i\in\{1,\ldots,N\}}\Delta T_{i} and \gamma_{j,+}=\min{\left(\tau_{k}\in\mathcal{T}^{Y}|\tau_{k}\in\mathcal{G}^{j+1% }\right)},g_{j,+}=\min{\left(t_{k}\in\mathcal{T}^{X}|t_{k}\in\mathcal{H}^{j+1}% \right)} and C a generic constant as before. From

 \displaystyle\mathbb{E}\left[\left(\sqrt{M_{N}}\sum_{i=1}^{M_{N}}\frac{\alpha_% {i}}{i}\sum_{j=i}^{N}(X_{g_{j}}-X_{T_{j}})(\epsilon^{Y}_{\gamma_{j}}-\epsilon^% {Y}_{\lambda_{j-i+1}})+(X_{T_{j-i}}-X_{l_{j-i+1}})(\epsilon^{Y}_{\gamma_{j}}-% \epsilon^{Y}_{\lambda_{j-i+1}})\right)^{2}\right] \displaystyle~{}~{}\leq M_{N}\sum_{i,k\in\{1,\ldots,M_{N}\}}\frac{\alpha_{i}% \alpha_{k}}{ik}2\eta_{Y}^{2}\left(\sum_{j=i\vee k}^{N}\mathbb{E}(X_{g_{j}}-X_{% T_{j}})^{2}+\sum_{j=0}^{N-(i\vee k)}\mathbb{E}(X_{T_{j}}-X_{l_{j+1}})^{2}\right) \displaystyle~{}~{}\leq M_{N}\,C4\eta_{Y}^{2}\sum_{i,k\in\{1,\ldots,M_{N}\}}% \frac{\alpha_{i}\alpha_{k}}{ik}=\mathcal{O}\left(M_{N}^{-1}\right)~{},

for the errors due to interpolations and

 \displaystyle\mathbb{E}\left[\left(\sqrt{M_{N}}\sum_{i=1}^{M_{N}}\frac{\alpha_% {i}}{i}\left(\sum_{k=N-i+1}^{N}\epsilon^{Y}_{\gamma_{k}}(X_{T_{k}}-X_{T_{k-i}}% )-\sum_{k=1}^{i}\epsilon^{Y}_{\lambda_{k}}(X_{T_{k+i}}-X_{T_{k}})\right)\right% )^{2}\right] \displaystyle~{}~{}=M_{N}\sum_{i,k\in\{1,\ldots,M_{N}\}}\frac{\alpha_{i}\alpha% _{k}}{ik}\eta_{Y}^{2}\left(\sum_{r=N-(i\wedge k)+1}^{N}\mathbb{E}(X_{T_{r}}-X_% {T_{r-i}})^{2}+\sum_{r=1}^{i\wedge k}\mathbb{E}(X_{T_{r+i}}-X_{T_{r}})^{2}\right) \displaystyle~{}~{}=\mathcal{O}\left(M_{N}\delta_{N}\right)

for boundary terms, we conclude that

 \displaystyle\sqrt{M_{N}}\sum_{i=1}^{M_{N}}\frac{\alpha_{i}}{i}\sum_{j=i}^{N}(X_{g_{j}}-X_{l_{j-i+1}})(\epsilon^{Y}_{\gamma_{j}}-\epsilon^{Y}_{\lambda_{j-i+1}}) \displaystyle=\sqrt{M_{N}}\sum_{i=1}^{M_{N}}\frac{\alpha_{i}}{i}\left(\sum_{j=i}^{N-i}\left(\epsilon^{Y}_{\gamma_{j}}(X_{T_{j}}-X_{T_{j-i}})-\epsilon^{Y}_{\lambda_{j+1}}(X_{T_{j+i}}-X_{T_{j}})\right)\right)+{\scriptstyle{\mathcal{O}}}_{p}(1) \displaystyle=\sqrt{M_{N}}\sum_{j=2}^{N-2}\left(\epsilon^{Y}_{\gamma_{j}}\sum_{i=1}^{M_{N}^{*}(j)}\frac{\alpha_{i}}{i}(X_{T_{j}}-X_{T_{j-i}})-\epsilon^{Y}_{\lambda_{j+1}}\sum_{i=1}^{M_{N}^{*}(j)}\frac{\alpha_{i}}{i}(X_{T_{j+i}}-X_{T_{j}})\right)+{\scriptstyle{\mathcal{O}}}_{p}(1) \displaystyle=\sqrt{M_{N}}\left(\sum_{j\in\mathcal{Y}_{1}}\epsilon^{Y}_{\gamma_{j}}\sum_{i=1}^{M_{N}^{*}(j)}\frac{\alpha_{i}}{i}\zeta_{i,j}^{1}+\sum_{j\in\mathcal{Y}_{2}}\epsilon^{Y}_{\gamma_{j}}\sum_{i=1}^{M_{N}^{*}(j)}\frac{\alpha_{i}}{i}\zeta_{i,j}^{2}+\sum_{j\in\mathcal{Y}_{3}}\epsilon^{Y}_{\gamma_{j}}\sum_{i=1}^{M_{N}^{*}(j)}\frac{\alpha_{i}}{i}\zeta_{i,j}^{3}\right. \displaystyle\left.~{}~{}~{}~{}~{}~{}+\sum_{j\in\mathcal{Y}_{4}}\epsilon^{Y}_{\gamma_{j}}\sum_{i=1}^{M_{N}^{*}(j)}\frac{\alpha_{i}}{i}\zeta_{i,j}^{4a}-\sum_{j\in\mathcal{Y}_{4}}\epsilon^{Y}_{\gamma_{j,+}}\sum_{i=1}^{M_{N}^{*}(j)}\frac{\alpha_{i}}{i}\zeta_{i,j}^{4b}\right)+{\scriptstyle{\mathcal{O}}}_{p}(1)~{}.

Here, we aggregate the observation times \gamma_{j},\lambda_{j},~{}j=2,\ldots,N-2 in disjoint sets conforming to the four cases discussed in Section 4. To this end, denote

 \displaystyle\mathcal{Y}_{1} \displaystyle=\{j\in\{2,\ldots,N_{2}\}|\gamma_{j}\neq\gamma_{j-1}\,,\,\gamma_{% j}\leq g_{j}\}~{}, \displaystyle\mathcal{Y}_{2} \displaystyle=\{j\in\{2,\ldots,N_{2}\}|\gamma_{j}>g_{j}\,,\,\gamma_{j}\geq g_{% j,+}\}~{}, \displaystyle\mathcal{Y}_{3} \displaystyle=\{j\in\{2,\ldots,N_{2}\}|\gamma_{j}>g_{j}\,,\,\gamma_{j}g_{j,+}\}~{}, \displaystyle\mathcal{Y}_{4} \displaystyle=\{j\in\{2,\ldots,N_{2}\}|\gamma_{j}>g_{j}\,,\,\gamma_{j}

and M_{N}^{*}(j)=\min{(j,N-j,M_{N})}. The increments of X that are multiplied with each observation error differ according to the set \mathcal{Y}_{k},~{}1\leq k\leq 4 to which \gamma_{j} belongs. We use the notation

 \displaystyle\zeta_{i,j}^{1} \displaystyle=(X_{T_{j}}-X_{T_{j-i}})-(X_{T_{j+i}}-X_{T_{j}})~{}, \displaystyle\zeta_{i,j}^{2} \displaystyle=(X_{T_{j}}-X_{T_{j-i}})+(X_{T_{j+1}}-X_{T_{j-i+1}})-(X_{T_{j+i+1% }}-X_{T_{j+1}})~{}, \displaystyle\zeta_{i,j}^{3} \displaystyle=(X_{T_{j}}-X_{T_{j-i}})-(X_{T_{j+i+1}}-X_{T_{j+1}})~{}, \displaystyle\zeta_{i,j}^{4a} \displaystyle=(X_{T_{j}}-X_{T_{j-i}})~{},~{}\zeta_{i,j}^{4b}=(X_{T_{j+i+1}}-X_% {T_{j+1}})~{}.

The resulting aggregated leading term of the cross term above is the endpoint of a discrete martingale with respect to the filtration \mathcal{F}_{j,N}\mathrel{\mathop{:}}=\sigma\big{(}\epsilon^{Y}_{\tau_{k}}|\tau_{k}<\gamma_{j+1}\,,\,X,Y\big{)}. Since j\in\mathcal{Y}_{4} implies \gamma_{j,+}<\gamma_{j+1}, the martingale property with respect to the filtration \mathcal{F}_{j,N} is assured by Assumption 3.
An application of the non-stable version of the central limit theorem for martingale triangular arrays from Hall and Heyde (1980) will prove the asymptotic normality of the cross term conditionally on the paths of the efficient processes. The conditional Lindeberg condition can be verified (using Chebyshev’s inequality or directly verifying the conditional Lyapunov condition) in the same way as before and we omit it here. The sum of conditional variances yields

 \displaystyle\sum_{l\in\{1,2,3,4a\}}\hskip-4.267913pt\left(\sum_{j\in\mathcal{% Y}_{l}}\hskip-2.133957pt\mathbb{E}\hskip-2.133957pt\Big{[}\Big{(}\sqrt{M_{N}}% \epsilon^{Y}_{\gamma_{j}}\sum_{i=1}^{M_{N}^{*}(j)}\frac{\alpha_{i}}{i}\zeta_{i% ,j}^{l}\Big{)}^{2}\Big{|}\mathcal{F}_{j-1,N}\hskip-2.133957pt\Big{]}\hskip-2.1% 33957pt+\hskip-2.845276pt\sum_{j\in\mathcal{Y}_{4}}\hskip-2.133957pt\mathbb{E}% \hskip-2.133957pt\Big{[}\Big{(}\hskip-3.556594pt-\hskip-2.133957pt\sqrt{M_{N}}% \epsilon^{Y}_{\gamma_{j,+}}\sum_{i=1}^{M_{N}^{*}(j)}\frac{\alpha_{i}}{i}\zeta_% {i,j}^{4b}\Big{)}^{2}\Big{|}\mathcal{F}_{j-1,N}\hskip-1.422638pt\Big{]}\hskip-% 1.422638pt\right) \displaystyle=M_{N}\eta_{Y}^{2}\left(\sum_{j\in\mathcal{Y}_{1}\cup\mathcal{Y}_% {3}\cup\mathcal{Y}_{4}}\left(\sum_{i=1}^{M_{N}^{*}(j)}\frac{\alpha_{i}}{i}% \zeta_{i,j}^{1}\right)^{2}+\sum_{j\in\mathcal{Y}_{2}}\left(\sum_{i=1}^{M_{N}^{% *}(j)}\frac{\alpha_{i}}{i}\zeta_{i,j}^{2}\right)^{2}\right)+{\scriptstyle{% \mathcal{O}}}_{p}(1) \displaystyle=M_{N}\eta_{Y}^{2}\left(\sum_{j\in\mathcal{Y}_{1}\cup\mathcal{Y}_% {3}\cup\mathcal{Y}_{4}}\sum_{i,k\in\{1,\ldots,M_{N}^{*}(j)\}}\hskip-7.113189pt% \frac{\alpha_{i}\alpha_{k}}{ik}\left(\zeta_{i\wedge k,j}^{1}\right)^{2}\hskip-% 2.845276pt+\hskip-2.845276pt\sum_{j\in\mathcal{Y}_{2}}\hskip-2.845276pt\left(% \sum_{i,k\in\{1,\ldots,M_{N}^{*}(j)\}}\frac{\alpha_{i}\alpha_{k}}{ik}\left(% \zeta_{i\wedge k,j}^{2}\right)^{2}\right)\hskip-2.845276pt\right)\hskip-2.8452% 76pt+{\scriptstyle{\mathcal{O}}}_{p}(1) \displaystyle=M_{N}\eta_{Y}^{2}\left(\sum_{j\in\mathcal{Y}_{1}\cup\mathcal{Y}_% {3}\cup\mathcal{Y}_{4}}\sum_{i,k\in\{1,\ldots,M_{N}^{*}(j)\}}\frac{\alpha_{i}% \alpha_{k}}{ik}\left((X_{T_{j}}-X_{T_{j-(i\wedge k)}})^{2}+(X_{T_{j+(i\wedge k% )}}-X_{T_{j}})^{2}\right)\right. 
\displaystyle\left.~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}~{}+\sum_{j\in% \mathcal{Y}_{2}}\sum_{i,k\in\{1,\ldots,M_{N}^{*}(j)\}}\frac{\alpha_{i}\alpha_{% k}}{ik}\left(4(X_{T_{j}}-X_{T_{j-(i\wedge k)}})^{2}+(X_{T_{j+(i\wedge k)}}-X_{% T_{j}})^{2}\right)\right)+{\scriptstyle{\mathcal{O}}}_{p}(1) \displaystyle=M_{N}\eta_{Y}^{2}\sum_{j=2}^{N-2}\sum_{i,k\in\{1,\ldots,M_{N}^{*% }(j)\}}\frac{\alpha_{i}\alpha_{k}}{ik}(2+\mathbbm{1}_{\{j\in\mathcal{Y}_{2}\}}% )(X_{T_{j}}-X_{T_{j-(i\wedge k)}})^{2}+{\scriptstyle{\mathcal{O}}}_{p}(1) \displaystyle=M_{N}\eta_{Y}^{2}\left(\sum_{i,k\in\{1,\ldots,M_{N}^{*}(j)\}}% \frac{\alpha_{i}\alpha_{k}}{ik}\left(2(i\wedge k)\widehat{\left[X\right]}_{T}^% {sub,i\wedge k}+\sum_{j=i\wedge k}^{N}\mathbbm{1}_{\{j\in\mathcal{Y}_{2}\}}(X_% {T_{j}}-X_{T_{j-(i\wedge k)}})^{2}\right)\right)+{\scriptstyle{\mathcal{O}}}_{% p}(1) \displaystyle\lx@stackrel{{\scriptstyle p}}{{\longrightarrow}}\frac{12}{5}\eta% _{Y}^{2}\left(\left[X\right]_{T}+\int_{0}^{T}I^{\prime}_{Y}(t)(\sigma_{t}^{X})% ^{2}\,dt\right)~{}.

Since for the shifted increments

 (X_{T_{j+i+1}}-X_{T_{j+1}})=(X_{T_{j+i}}-X_{T_{j}})+\mathcal{O}_{p}\left(N^{-% \nicefrac{{1}}{{2}}}\right)

holds, where the order refers to time intervals of average length N^{-1}, the variances of the sums over all j\in\mathcal{Y}_{1} and j\in\mathcal{Y}_{3} are asymptotically equal. The variance of both uncorrelated sums over maxima \gamma_{j} and minima \gamma_{j,+} distributed according to the fourth case is also asymptotically equal to the variances of those two addends. Only the asymptotic variance of the sum over all j\in\mathcal{Y}_{2} is bigger. For this reason, the total asymptotic variance hinges on the asymptotic degree of regularity of the non-synchronous sampling scheme (\mathcal{T}^{X},\mathcal{T}^{Y}) defined in Definition 4.
In the calculation of the asymptotic variance we have used that

 \zeta_{i,j}^{1}\zeta_{k,j}^{1}=\left(\zeta_{i\wedge k,j}^{1}\right)^{2}+\zeta^{1}_{i\wedge k,j}\left(\sum_{l=j-(i\vee k)+1}^{j-(i\wedge k)}\Delta X_{T_{l}}-\sum_{l=j+(i\wedge k)+1}^{j+(i\vee k)}\Delta X_{T_{l}}\right)~{},

where the second addend has expectation zero, and analogous formulae hold for \zeta_{i,j}^{2}, for all 1\leq i\leq M_{N}\,,1\leq k\leq M_{N}\,,\,k\vee i\leq j\leq N-(i\vee k).
Furthermore, an application of the mean value theorem, Itô isometry and approximations in the same spirit as in the calculation of the asymptotic variance in the proof of the central limit theorem for the discretization errors of the estimators, lead to the Riemann sum in the calculation of the asymptotic variance above. The cross terms in (\zeta_{i,j}^{l})^{2}, l=1,2 are asymptotically negligible. Since in \mathcal{Y}_{4} repeating maxima \gamma_{i}=\gamma_{i+1} are considered only once, it holds true that |\mathcal{Y}_{1}|+|\mathcal{Y}_{3}|+|\mathcal{Y}_{4}|+2|\mathcal{Y}_{2}|=N-3\pm 1 (the last addend can appear due to boundary term effects). In the last step we have used that

 M_{N}\sum_{i,k\in\{1,\ldots,M_{N}\}}\frac{\alpha_{i,M_{N}}^{opt}\alpha_{k,M_{N}}^{opt}}{ik}(i\wedge k)=6/5+{\scriptstyle{\mathcal{O}}}(1)

when inserting the weights (30).
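The limit 6/5 can be checked numerically. The following is a minimal sketch, assuming that the weights (30) have, to leading order, the cubic form \alpha_{i,M}=12i^{2}/M^{3}-6i/M^{2} familiar from multiscale estimation; this closed form is an assumption made here for illustration and may differ from (30) in lower-order terms. The function name `weighted_min_sum` is hypothetical.

```python
def weighted_min_sum(M):
    """Return M * sum_{i,k=1}^{M} alpha_i * alpha_k * min(i, k) / (i * k).

    Assumed (leading-order) weights: alpha_i = 12 i^2 / M^3 - 6 i / M^2,
    so that alpha_i / i = 12 i / M^3 - 6 / M^2.
    """
    # a[i - 1] stores alpha_i / i for i = 1, ..., M
    a = [12.0 * i / M**3 - 6.0 / M**2 for i in range(1, M + 1)]
    total = 0.0
    for i in range(1, M + 1):
        for k in range(1, M + 1):
            total += a[i - 1] * a[k - 1] * min(i, k)
    return M * total


if __name__ == "__main__":
    # The values should approach 6/5 = 1.2 as M grows.
    for M in (50, 200, 400):
        print(M, weighted_min_sum(M))
```

Under this assumption the double sum has the closed form (6/5)(M+1)((M+1)^{4}-1)/M^{5}, so the deviation from 6/5 decays at rate M^{-1}.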
From the analysis for the asymptotic discretization error of a one-scale subsampling estimator, we know that

 \widehat{\left[X\right]}_{T}^{sub,i\wedge k}=\frac{1}{i\wedge k}\sum_{j=i\wedge k}^{N}(X_{T_{j}}-X_{T_{j-(i\wedge k)}})^{2}=\left[X\right]_{T}+\mathcal{O}_{p}\left(\sqrt{\frac{(i\wedge k)}{N}}\right)

holds true. Similarly, it can be deduced that

 \displaystyle\frac{1}{i}\sum_{j=i}^{N}\mathbbm{1}_{\{j\in\mathcal{Y}_{2}\}}(X_{T_{j}}-X_{T_{j-i}})^{2}
 \displaystyle=\frac{1}{i}\sum_{l=1}^{N}\left(\Delta X_{T_{l}}\right)^{2}\left(\sum_{k=1}^{i}\mathbbm{1}_{\{(k+l-1)\in\mathcal{Y}_{2}\}}\right)+\mathcal{O}_{p}\left(\sqrt{i/N}\right)
 \displaystyle=\int_{0}^{T}I^{\prime}_{Y}(t)(\sigma_{t}^{X})^{2}\,dt+\mathcal{O}_{p}\left(\sqrt{i/N}\right)~,

on Assumption 5. ∎
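The approximation of \left[X\right]_{T} by the one-scale subsampling estimator used in the last step can be illustrated by simulation. This is only a sketch under simplifying assumptions that are not those of the general model: a Brownian motion with constant volatility, equidistant observation times and no noise. The function names are hypothetical.

```python
import math
import random


def subsampled_realized_variance(path, i):
    """(1/i) * sum_{j=i}^{N} (X_j - X_{j-i})^2 for path = [X_0, ..., X_N]."""
    n = len(path) - 1
    return sum((path[j] - path[j - i]) ** 2 for j in range(i, n + 1)) / i


def simulate_brownian_path(n, sigma=1.0, horizon=1.0, seed=0):
    """Simulate sigma * W on an equidistant grid with n increments on [0, horizon]."""
    rng = random.Random(seed)
    step_sd = sigma * math.sqrt(horizon / n)
    path = [0.0]
    for _ in range(n):
        path.append(path[-1] + rng.gauss(0.0, step_sd))
    return path


if __name__ == "__main__":
    # Here [X]_T = sigma^2 * T = 1; the estimates should scatter around 1,
    # with a spread that grows with the lag i, in line with the order sqrt(i/N).
    path = simulate_brownian_path(20000)
    for lag in (1, 10, 100):
        print(lag, subsampled_realized_variance(path, lag))
```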

## B Proof of Proposition 2

Let R_{j}^{N}, R_{j}^{n}, R_{j}^{m}, S_{j}^{N,X} and S_{j}^{N,Y} denote the numbers of sampling times T_{k}^{(N)},0\leq k\leq N, t_{i}^{(n)} and \tau_{j}^{(m)} in the bins [G_{j}^{N},G_{j+1}^{N}), [(I_{Y})_{j}^{N},(I_{Y})_{j+1}^{N}), [(I_{X})_{j}^{N},(I_{X})_{j+1}^{N}), 0\leq j\leq K_{N}-1. Define the generalized multiscale estimator in the fashion of (12)

 \widehat{\Delta\left[X,Y\right]}_{G_{j+1}^{N}}=\sum_{i=1}^{M_{N}(j)}\frac{\alpha_{i,M_{N}(j)}^{opt}}{i}\sum_{r=i}^{R_{j}^{N}}\left(\tilde{X}_{g_{r}}-\tilde{X}_{l_{r-i+1}}\right)\left(\tilde{Y}_{\gamma_{r}}-\tilde{Y}_{\lambda_{r-i+1}}\right)

for the increase of the quadratic covariation \Delta\left[X,Y\right]_{G_{j+1}^{N}} and the univariate multiscale estimators

 \displaystyle\widehat{\Delta\left[X\right]}_{G_{j+1}^{N}}=\sum_{i=1}^{M_{n}(j)}\frac{\alpha_{i,M_{n}(j)}^{opt}}{i}\sum_{r=i}^{R_{j}^{n}}\left(\tilde{X}_{t_{r}}-\tilde{X}_{t_{r-i}}\right)^{2},\;
 \displaystyle\widehat{\Delta\left[Y\right]}_{G_{j+1}^{N}}=\sum_{i=1}^{M_{m}(j)}\frac{\alpha_{i,M_{m}(j)}^{opt}}{i}\sum_{r=i}^{R_{j}^{m}}\left(\tilde{Y}_{\tau_{r}}-\tilde{Y}_{\tau_{r-i}}\right)^{2}~,

and \widehat{\Delta\left[X\right]}_{(I_{Y})_{j+1}^{N}},\widehat{\Delta\left[Y\right]}_{(I_{X})_{j+1}^{N}} analogously, where all binwise multiscale frequencies are of order \sqrt{NK_{N}} with possibly differing constants. Essential when considering the multiscale estimators on bins is that on Assumption 4 the distances between sampling times are of order N^{-1}\sim n^{-1}\sim m^{-1}, whereas the numbers of observations R_{j}^{\cdot},S_{j}^{\cdot},1\leq j\leq K_{N} in the specific bin are at most of order NK_{N}^{-1}. Following the analysis for the four uncorrelated parts of the estimation error in Section A, the discretization variances are of orders

 \sum_{i,k\in\{1,\ldots,M_{N}(j)\}}\frac{\alpha_{i,M_{N}(j)}^{opt}\alpha_{k,M_{N}(j)}^{opt}}{ik}\cdot i\cdot R_{j}^{N}\frac{i^{2}}{N^{2}}\sim M_{N}(j)\frac{R_{j}^{N}}{N^{2}}\sim\frac{M_{N}(j)}{K_{N}N}~,

M_{n}(j)/(nK_{N}), M_{m}(j)/(mK_{N}), M_{n}(j)\frac{S_{j}^{N,X}}{N^{2}} and M_{n}(j)\frac{S_{j}^{N,Y}}{N^{2}}, respectively. Cross terms are of order R_{j}^{N}/(NM_{N}(j))\sim(M_{N}(j)K_{N})^{-1}, and of analogous orders for the univariate estimators. The errors due to noise, by contrast, depend only on the number of observations in the considered interval. Therefore, the addends are of orders R_{j}^{N}/M_{N}^{3}(j)\sim\frac{N}{K_{N}M_{N}^{3}(j)} and M_{N}^{-1}(j), and analogously for the univariate estimators.
Choosing all multiscale frequencies M_{\cdot}(j)\sim N^{\nicefrac{{1}}{{2}}}K_{N}^{\nicefrac{{1}}{{2}}} for every j, so that M_{N}(\cdot)N^{-\nicefrac{{1}}{{2}}}\rightarrow\infty, the error due to end-effects in the noise part and the discretization error dominate asymptotically the two other addends and are of order N^{-\nicefrac{{1}}{{4}}}K_{N}^{-\nicefrac{{1}}{{4}}}. This holds as long as K_{N}N^{-\nicefrac{{1}}{{3}}}\rightarrow 0, such that M_{N}(j)(N/K_{N})^{-1}\rightarrow 0 as N\rightarrow\infty.
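For orientation, the four variance orders stated above can be collected in one display under the choice M_{N}(j)\sim N^{1/2}K_{N}^{1/2}. This is merely a bookkeeping sketch that restates the rates from the text with constants suppressed; the labels are shorthand for the four parts of the estimation error.

```latex
\begin{align*}
\text{discretization:}\quad & \frac{M_N(j)}{K_N N} \sim N^{-1/2} K_N^{-1/2}, \\
\text{cross terms:}\quad & \frac{1}{M_N(j)\, K_N} \sim N^{-1/2} K_N^{-3/2}, \\
\text{noise:}\quad & \frac{N}{K_N M_N^{3}(j)} \sim N^{-1/2} K_N^{-5/2}, \\
\text{end-effects:}\quad & \frac{1}{M_N(j)} \sim N^{-1/2} K_N^{-1/2}.
\end{align*}
```

Taking square roots of the two leading orders gives the stated rate N^{-1/4}K_{N}^{-1/4}, and M_{N}(j)K_{N}/N\sim N^{-1/2}K_{N}^{3/2} tends to zero precisely when K_{N}N^{-1/3}\rightarrow 0.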
The estimators (22a)-(22d) are consistent as K_{N}\rightarrow\infty with K_{N}N^{-\nicefrac{{1}}{{3}}}\rightarrow 0 as N\rightarrow\infty, since

 \displaystyle\hat{I}_{2}
 \displaystyle=\sum_{j=1}^{K_{N}}\left(\frac{\widehat{\Delta\left[X\right]}_{G_{j}^{N}}\widehat{\Delta\left[Y\right]}_{G_{j}^{N}}}{\left(\Delta G_{j}^{N}\right)^{2}}\right)\frac{G^{N}(T)}{K_{N}}
 \displaystyle=\sum_{j=1}^{K_{N}}\left(\frac{\int_{G_{j-1}^{N}}^{G_{j}^{N}}\left(\sigma_{t}^{X}\right)^{2}\,dt\int_{G_{j-1}^{N}}^{G_{j}^{N}}\left(\sigma_{t}^{Y}\right)^{2}\,dt+\mathcal{O}_{p}\left(N^{-\nicefrac{{1}}{{4}}}K_{N}^{-\nicefrac{{1}}{{4}}}\right)}{\left(\Delta G_{j}^{N}\right)^{2}}\right)\frac{G^{N}(T)}{K_{N}}
 \displaystyle=\sum_{j=1}^{K_{N}}\left(\sigma^{X}\right)_{\overline{G_{j}^{N}}}^{2}\left(\sigma^{Y}\right)_{\widetilde{G_{j}^{N}}}^{2}\frac{G^{N}(T)}{K_{N}}+\mathcal{O}_{p}\left(K_{N}^{\nicefrac{{1}}{{4}}}N^{-\nicefrac{{1}}{{4}}}\right)
 \displaystyle=\sum_{j=1}^{K_{N}}\left(\sigma^{X}\sigma^{Y}\right)_{G_{j-1}^{N}}^{2}\frac{G^{N}(T)}{K_{N}}+\mathcal{O}_{p}\left(K_{N}^{\nicefrac{{1}}{{4}}}N^{-\nicefrac{{1}}{{4}}}\right)

and similar conclusions hold for the other three estimators. We have used that \Delta G_{j}^{N}\sim N^{-1} and applied the mean value theorem. Here \overline{G_{j}^{N}} is some value with G_{j-1}^{N}\leq\overline{G_{j}^{N}}\leq G_{j}^{N}, and analogously \widetilde{G_{j}^{N}}. Finally, elementary inequalities such as \left|\left(\sigma^{X}\right)^{2}_{\overline{G_{j}^{N}}}\left(\sigma^{Y}\right)^{2}_{\widetilde{G_{j}^{N}}}-\left(\sigma^{X}\sigma^{Y}\right)^{2}_{G_{j-1}^{N}}\right|\leq\left(\sigma^{X}\right)^{2}_{\overline{G_{j}^{N}}}\left|\left(\sigma^{Y}\right)^{2}_{\widetilde{G_{j}^{N}}}-\left(\sigma^{Y}\right)^{2}_{G_{j-1}^{N}}\right|+\left(\sigma^{Y}\right)^{2}_{G_{j-1}^{N}}\left|\left(\sigma^{X}\right)^{2}_{\overline{G_{j}^{N}}}-\left(\sigma^{X}\right)^{2}_{G_{j-1}^{N}}\right| and \sum_{j=1}^{K_{N}}\left|\left(\rho\sigma^{X}\sigma^{Y}\right)_{\overline{G_{j}^{N}}}^{2}-\left(\rho\sigma^{X}\sigma^{Y}\right)_{G_{j-1}^{N}}^{2}\right|\,\frac{G^{N}(T)}{K_{N}}\leq\sup_{|t-s|\leq\sup_{j}\Delta G_{j}^{N}}{\left|\rho_{t}\sigma_{t}^{X}\sigma_{t}^{Y}-\rho_{s}\sigma_{s}^{X}\sigma_{s}^{Y}\right|}G^{N}(T)={\scriptstyle{\mathcal{O}}}_{a.\,s.\,}(1) are involved in approximations of the type above. Considering (22c) and (22d), note that bin-widths chosen according to I_{Y}^{N} are asymptotically of order K_{N}^{-1} in any subinterval of [0,T] on which the corresponding part of the integral \int I_{Y}^{\prime}(t)(\sigma_{t}^{X})^{2}\,dt is strictly positive.
Denote by R_{N}^{k},\,k=1,\ldots,4, the orders of the approximation errors between the four integrals given above and their Riemann sums evaluated on the partition given by the K_{N} bins. The variances of the estimators \widehat{\eta_{X}^{2}} and \widehat{\eta_{Y}^{2}} for the noise variances are known from Zhang et al. (2005) to be \mathbb{E}\left[\left(\epsilon_{t_{1}}^{X}\right)^{4}\right]N^{-1} and \mathbb{E}\left[\left(\epsilon_{\tau_{1}}^{Y}\right)^{4}\right]N^{-1}, and hence \mathcal{O}\left(N^{-1}\right) on Assumption 3. From

 \displaystyle\hat{I}_{k}=I_{k}+\mathcal{O}_{p}\left(R_{N}^{k}+K_{N}^{\nicefrac{{1}}{{2}}}N^{-\nicefrac{{1}}{{2}}}\right)~,~k=1,\ldots,4~,

we derive that

 \widehat{\operatorname{\mathbf{AVAR}}}_{multi}=\operatorname{\mathbf{AVAR}}_{multi}+\mathcal{O}_{p}\left(\max_{k}{R_{N}^{k}}+K_{N}^{\nicefrac{{1}}{{2}}}N^{-\nicefrac{{1}}{{2}}}\right)~,

and the same result holds for the one-scale estimator. ∎

## References

• Aït-Sahalia et al. (2010) Y. Aït-Sahalia, J. Fan, D. Xiu, High-frequency estimates with noisy and asynchronous financial data, Journal of the American Statistical Association 105 (2010) 1504–1516.
• Aït-Sahalia et al. (2011) Y. Aït-Sahalia, L. Zhang, P.A. Mykland, Ultra high frequency volatility estimation with dependent microstructure noise, Journal of Econometrics 160 (2011) 160–165.
• Awartani et al. (2009) B. Awartani, V. Corradi, W. Distaso, Assessing market microstructure effects via realized volatility measures with an application to the Dow Jones Industrial Average stocks, Journal of Business and Economic Statistics 27 (2009) 251–265.
• Barndorff-Nielsen et al. (2008a) O.E. Barndorff-Nielsen, P.R. Hansen, A. Lunde, N. Shephard, Designing realised kernels to measure the ex-post variation of equity prices in the presence of noise, Econometrica 76 (2008a) 1481–1536.
• Barndorff-Nielsen et al. (2008b) O.E. Barndorff-Nielsen, P.R. Hansen, A. Lunde, N. Shephard, Multivariate realised kernels: consistent positive semi-definite estimators of the covariation of equity prices with noise and non-synchronous trading, SSRN working paper 1154144, University of Aarhus (2008b).
• Barndorff-Nielsen and Shephard (2002) O.E. Barndorff-Nielsen, N. Shephard, Econometric analysis of realized volatility and its use in estimating stochastic volatility models, Journal of the Royal Statistical Society 64 (2002) 253–280.
• Bibinger (2011a) M. Bibinger, Asymptotics of asynchronicity, Technical Report, Humboldt-Universität zu Berlin (2011a), URL: http://sfb649.wiwi.hu-berlin.de/papers/pdf/SFB649DP2011-033.pdf.
• Bibinger (2011b) M. Bibinger, Efficient covariance estimation for asynchronous noisy high-frequency data, Scandinavian Journal of Statistics 38 (2011b) 23–45.
• Christensen et al. (2010) K. Christensen, S. Kinnebrock, M. Podolskij, Pre-averaging estimators of the ex-post covariance matrix in noisy diffusion models with non-synchronous data, Journal of Econometrics 159 (2010) 116–133.
• Epps (1979) T.W. Epps, Comovements in stock prices in the very short run, Journal of the American Statistical Association 74 (1979) 291–298.
• Fan and Wang (2007) J. Fan, Y. Wang, Multi-scale jump and volatility analysis for high-frequency data, Journal of the American Statistical Association 102 (2007) 1349–1362.
• Fukasawa (2010) M. Fukasawa, Realized volatility with stochastic sampling, Stochastic Processes and their Applications 120 (2010) 209–233.
• Gloter and Jacod (2001) A. Gloter, J. Jacod, Diffusions with measurement errors 1 and 2, ESAIM Probability and Statistics 5 (2001) 225–242.
• Hall and Heyde (1980) P. Hall, C. Heyde, Martingale Limit Theory and its Application, Academic Press, Boston, 1980.
• Hayashi and Yoshida (2005) T. Hayashi, N. Yoshida, On covariance estimation of non-synchronously observed diffusion processes, Bernoulli 11 (2005) 359–379.
• Hayashi and Yoshida (2008) T. Hayashi, N. Yoshida, Asymptotic normality of a covariance estimator for nonsynchronously observed diffusion processes, Annals of the Institute of Statistical Mathematics 60 (2008) 367–406.
• Hayashi and Yoshida (2011) T. Hayashi, N. Yoshida, Nonsynchronous covariation process and limit theorems, Stochastic Processes and their Applications In Press, Uncorrected Proof (2011).
• Jacod (1997) J. Jacod, On continuous conditional Gaussian martingales and stable convergence in law, Séminaire de Probabilités (1997) 232–246.
• Jacod et al. (2009) J. Jacod, Y. Li, P.A. Mykland, M. Podolskij, M. Vetter, Microstructure noise in the continuous case: the pre-averaging approach, Stochastic Processes and their Applications 119 (2009) 2249–2276.
• Kalnina and Linton (2008) I. Kalnina, O. Linton, Estimating quadratic variation consistently in the presence of endogenous and diurnal measurement error, Journal of Econometrics 147 (2008) 47–59.
• Mykland and Zhang (2009) P. Mykland, L. Zhang, Inference for continuous semimartingales observed at high frequency, Econometrica 77 (2009) 1403–1445.
• Palandri (2006) A. Palandri, Consistent Realized Covariance for Asynchronous Observations Contaminated by Market Microstructure Noise, Technical Report, University of Copenhagen, 2006.
• Podolskij and Vetter (2009) M. Podolskij, M. Vetter, Estimation of volatility functionals in the simultaneous presence of microstructure noise and jumps, Bernoulli 15 (2009) 634–658.
• Podolskij and Vetter (2010) M. Podolskij, M. Vetter, Understanding limit theorems for semimartingales: a short survey, Statistica Nederlandica 64 (2010) 329–351.
• Rao (2001) C.R. Rao, Linear Statistical Inference and its Applications, Wiley, New York, 2nd edition, 2001.
• Reiß (2011) M. Reiß, Asymptotic equivalence for inference on the volatility from noisy observations, Annals of Statistics, Forthcoming (2011).
• Rényi (1963) A. Rényi, On stable sequences of events, Sankhya: The Indian Journal of Statistics, Series A 25 (1963) 293–302.
• Utev (1990) S. Utev, Central limit theorem for dependent random variables, In Grigelionis et al., Probability theory and mathematical statistics 2 (1990) 519–528.
• Xiu (2010) D. Xiu, Quasi-maximum likelihood estimation of volatility with high frequency data, Journal of Econometrics 159 (2010) 235–250.
• Zhang (2006) L. Zhang, Efficient estimation of stochastic volatility using noisy observations: A multi-scale approach, Bernoulli 12 (2006) 1019–1043.
• Zhang et al. (2005) L. Zhang, P.A. Mykland, Y. Aït-Sahalia, A tale of two time scales: Determining integrated volatility with noisy high-frequency data, Journal of the American Statistical Association 100 (2005) 1394–1411.