NonParametric Estimation of Spot Covariance Matrix with HighFrequency Data
Abstract
Estimating spot covariance is an important issue to study, especially with the increasing availability of highfrequency financial data. We study the estimation of spot covariance using a kernel method for highfrequency data. In particular, we consider first the kernel weighted version of realized covariance estimator for the price process governed by a continuous multivariate semimartingale. Next, we extend it to the threshold kernel estimator of the spot covariances when the underlying price process is a discontinuous multivariate semimartingale with finite activity jumps. We derive the asymptotic distribution of the estimators for both fixed and shrinking bandwidth. The estimator in a setting with jumps has the same rate of convergence as the estimator for diffusion processes without jumps. A simulation study examines the finite sample properties of the estimators. In addition, we study an application of the estimator in the context of covariance forecasting. We discover that the forecasting model with our estimator outperforms a benchmark model in the literature.
Keywords: highfrequency data; kernel estimation; jump; forecasting covariance matrix MOS subject classification: 62F12, 62G05, 60J75.
1 Introduction
Spot covariance has important applications in studying the intraday patterns of the covariance process, cojump tests (Bibinger and Winkelmann (2015)) and estimating parametric multivariate stochastic volatility models (Kanaya and Kristensen (2016)). Moreover, understanding covariance dynamics is crucial for effective portfolio choice, derivative pricing, and risk management. The availability of highfrequency intraday data of asset returns has given rise to several approaches for estimating integrated (co)variances and spot variances. While the literature proposes few measures of integrated covariance, see e.g. BarndorffNielsen and Shephard (2004a), Hayashi and Yoshida (2011), there is sparse literature on empirical approaches and statistical theory to estimate spot covariances with highfrequency data.
In this paper, we consider the nonparametric filtering of spot covariance with highfrequency financial data. Our study is at the intersection of two fields of literature. The first strand of literature is on estimating integrated covariance matrices over a fixed period. This topic has been studied extensively in highfrequency econometrics. For example, the highly celebrated paper by BarndorffNielsen and Shephard (2004a) makes important contributions to the use of realized covariance to estimate integrated covariance matrix in a setup without market microstructure noise. The quasimaximum likelihood estimator by AïtSahalia et al. (2010), the multivariate preaveraging estimator by Christensen et al. (2013), the twoscale estimator by Zhang (2011) are robust to microstructutre noise. However, all above mentioned realized covariance estimators do not account for jumps in the underlying price process.
The second strand focuses on spot volatility estimation. Several approaches of estimating spot volatility were proposed. Foster et al. (1996) were the first to introduce the spot volatility estimator: rolling and sampling filters. Later, kerneltype estimators were introduced in Fan and Wang (2008) and Kristensen (2010). These estimators of spot variance neglect the microstructure noise and jumps. The examples of spot variance estimators accounting for microstructure noise include Zu and Boswijk (2014), Bos et al. (2012), Mykland and Zhang (2008). Yu et al. (2014) extend kernel spot volatility estimator of Kristensen (2010) to the case when the underlying price process has jumps.
The estimation of spot covariance matrix is, however, an area that has been studied the least. For a multidimensional continuous semimartingale logasset price process Bibinger et al. (2017) propose an estimator for spot covariance which is constructed based on a local average of blockwise parametric spectral covariance estimates. Aiming to fill this gap in the literature we study the spot covariance estimation of both continuous and discontinuous semimartingales.
Our contribution is following. First, for a setup without jumps, we study asymptotic properties of the kernel covariance estimator, which was mentioned in Kristensen (2010) as an extension to the multivariate case and was left for the future research. Second, we propose the threshold kernel covariance estimator when the underlying price process is a discontinuous semimartingale with finite activity jumps. We derive the asymptotic distribution of this estimator for a fixed bandwidth. The estimator is an extension to the multivariate case of the threshold kernel volatility estimator proposed by Yu et al. (2014). Third, we conduct numerical studies to examine finite sample properties of both estimators. Next, we study an application of the kernel estimator in the context of covariance forecasting.
In a setup without jumps the estimator is a kernelweighted version of the standard integrated covariance estimator, which depends on a kernel function and choice of bandwidth. It can be regarded as a kernel regression in the time domain. The bandwidth choice allows us to focus on the covariance behavior at specific points in time, and give different weights to the covariance matrix over the window used. As the bandwidth shrinks to zero, the spot covariance can be extracted. We establish asymptotic normality of the estimator for both fixed and shrinking bandwidth. The proofs are componentwise. We construct our proofs referring to the techniques of BarndorffNielsen and Shephard (2004a) and Kristensen (2010). We first derive the mean and covariance of the estimator. We then derive the asymptotic distribution by employing central limit theorem for triangular arrays and CramérWold device. We also prove the asymptotic normality for the threshold kernel estimator with fixed bandwidth. In the proof of this theorem we combine our results from the first theorem, techniques from Yu et al. (2014) and employ CramérWold device. In simulation study we examine the finite sample properties of both estimators using the integrated mean square error and the integrated bias performance measurements.
The rate of convergence of both estimators is \sqrt{n}. The local method of moments estimator of the spot covariances of Bibinger et al. (2017) attains slower optimal rate of convergence (\sqrt{n^{4}}). However, it should be noted this is due to the fact that Bibinger et al. (2017) consider the setting with market microstructure noise, whereas we target for a complementary jump case. The kernel and threshold kernel covariance estimators are fairly easy to implement.
In terms of applications of this kernel covariance estimator, considerable efforts has been put into covariance forecasting, see e.g. Alexander (2018), Andersen et al. (2013). Multivariate GARCH models are a standard tool used in modelling and forecasting covariances. However, more recent studies propose models based on highfrequency data and options implied data. In a comprehensive empirical study by Symitsi et al. (2018) several approaches to the covariance forecasting are compared based on statistical and economic criteria. In this study the authors conclude that models based on highfrequency data offer a clear advantage in terms of statistical accuracy. In particular, a Vector Heterogeneous Autoregressive (VHAR) model achieves the best performance amongst the competing models. The VHAR model is a linear combination of past daily, weekly and monthly realized covariance estimators of BarndorffNielsen and Shephard (2004a).
Motivated by this we use the VHAR model to forecast covariance, however instead of the realized covariance estimator we use newly proposed kernel covariance estimator. We further show that with the VHAR model the kernel covariance estimator outperforms the benchmark realized covariance estimator in all three measures of accuracy: the Euclidean loss function, the Frobenius distance and the multivariate quasilikelihood loss function.
The paper is structured as follows. In Section 2.1 we review theoretical setup of the problem and the kernel covariance estimator which was proposed in Kristensen (2010) and left for the future research. In Section 2.2 we study the asymptotic properties of the estimator for a fixed and small (tending to zero) bandwidth. In Section 3 we introduce the setup with jumps, propose the estimator for jump case and derive its asymptotic distribution. In Section 4 we conduct Monte Carlo simulations and investigate the finite sample properties of both estimators. In Section 5 we present an application of the estimator in the context of covariance forecasting. Finally, in Section 6 we summarise our findings.
2 Kernel Covariance Estimation
2.1 Theoretical Setup and the Kernel Covariance Estimator
In this section we start by considering a multidimensional continuous semimatingale, describe the theoretical setup and review the kernel covariance estimator in Kristensen (2010). Our aim is to accurately estimate the spot covariance matrix of a fixed ddimensional logprice process \left(X(t)\right)_{t\geq 0}=\left(X_{1}(t),X_{2}(t),...,X_{d}(t)\right)_{t\geq 0}. We assume that X(t) follows a continuous semimartingale
X(t)=X(0)+\int_{0}^{t}\mu(s)ds+\int_{0}^{t}\theta(s)dW(s),\ \ \ t\in\left[0,T% \right],  (1) 
defined on a filtered probability space (\Omega,\mathcal{F},(\mathcal{F})_{t\geq 0},P), with an initial condition X(0)\in\mathbb{R}^{d}, the drift vector \mu(t), the ddimensional standard Brownian motion W(t) and the instantaneous volatility matrix \theta(t) which has elements that are all càdlàg. The latter yields the (d\times d)dimensional spot covariance matrix \Sigma(t)=\theta(t)\theta(t)^{\top}, which is our object of interest. We also denote the integrated covariance matrix by \Sigma^{*}(t)=\int_{0}^{t}\Sigma(s)ds. We consider the finite and fixed time horizon [0,T] with n+1 highfrequency discrete observations X_{k}(t_{0}),X_{k}(t_{1}),...,X_{k}(t_{n1}),X_{k}(t_{n}) of the realization of kth asset, with k=1,2,...,d. For an arbitrary partition 0=t_{0}<t_{1}<...<t_{n}=T of the interval [0,T] we require that \max_{i=1,...,n}t_{i}t_{i1} approaches zero under the asymptotic limit. For simplicity, we consider the case of equally spaced and synchronous observation times. We denote \delta=T/n, so that t_{i}=i\delta for i=1,2,\cdots,n.
A kernel is a nonnegative integrable function K satisfying the following condition: \int_{\mathbb{R}}K(u)du=1. The kernel weighted measure of the integrated covariance, which is an extension of the measure of the integrated variance introduced in Kristensen (2010), is of the following form
KCV(\tau)=\int_{0}^{T}K_{h}(s\tau)\Sigma(s)ds,  (2) 
where the function K_{h}(z) is given by K\left(\frac{z}{h}\right)/h, satisfies \int_{\mathbb{R}}K(z)dz=1, and h>0 is the fixed bandwidth. KCV(\tau) delivers a kernel weighted average of the quadratic covariation.
An estimator of the integrated covariance in equation (2) is the kernel smoothed sample average of the increments, which was mentioned in Kristensen (2010) as an extension of the univariate case and was left for the future research:
\widehat{KCV}(\tau)=\sum_{i=1}^{n}K_{h}(t_{i1}\tau)\Delta X(t_{i1})\Delta X% ^{\top}(t_{i1}),  (3) 
where \Delta X(t_{i1})=X(t_{i})X(t_{i1}) is the ddimensional vector (d is fixed) of the increments of the process X over time interval [t_{i1},t_{i}]. As demonstrated above, for a fixed h>0, KCV(\tau) gives a weighted measure of the integrated covariance. However, as h\to 0, the instantaneous covariance can be recovered at any point of continuity \tau of t\mapsto\Sigma(t):
\Sigma(\tau)=\lim_{h\to\infty}KCV(\tau).  (4) 
To emphasize that we are working with an estimator of the instantaneous covariance at time \tau, we shall denote:
\widehat{\Sigma}(\tau)=\sum_{i=1}^{n}K_{h}(t_{i1}\tau)\Delta X(t_{i1})% \Delta X(t_{i1})^{\top}  (5) 
Note that, \sum_{i=1}^{n}K_{h}(t_{i1}\tau)\Delta X(t_{i1})\Delta X(t_{i1})^{\top} can be regarded as the NadaryaWatson estimator. An overview of this types of kernel can be found in Silverman (1986). In the univariate case, i.e. when d=1, we recover the spot variance estimator from Kristensen (2010).
2.2 Asymptotic Properties of the Kernel Covariance Estimator
In this section we state the necessary assumptions and present the two out of the three main results of the paper. Our first theorem derives the asymptotic distribution of the kernel covariance estimator for the fixed bandwidth. Theorem 2 proves asymptotic normality of the kernel covariance estimator for a tending to zero bandwidth. Throughout our work we shall consider the following set of assumptions:
Assumption 1.
The processes \mu and \Sigma are jointly independent of W.
This assumption holds for a widely used stochastic volatility models, such as Heston (1993), Hull and White (1987). Assumption 1 greatly facilitates the proof by allowing us to make all arguments conditional on \mu and \Sigma. Under Assumption 1, the volatility process being independent of W, the model falls into the case without leverage effects. However, this assumption does not appear to be strictly necessary as demonstrated in Kanaya and Kristensen (2016).
Assumption 2.
For any sequences (i1)\delta\leq s_{i}\leq t_{i}\leq i\delta, with i=1,\cdots,n and every k=1,\cdots,d, as \delta\to 0
\delta\sum_{i=1}^{n}\mu_{k}^{2}(s_{i})\mu_{k}^{2}(t_{i})=o(1),\ \ \ \ \ \ % \delta\sum_{i=1}^{n}\Omega(s_{i})\Omega(t_{i})=o(1),  (6) 
where \Omega(t)=:\left\{\Sigma_{kk^{\prime}}(t)\Sigma_{ll^{\prime}}(t)+\Sigma_{kl^{% \prime}}(t)\Sigma_{lk^{\prime}}(t)\right\}_{k,k^{\prime},l,l^{\prime}=1,\cdots% ,d}.
Assumptions 2 imposes a restriction on the local behavior of the mean and covariance processes. It allows for the deterministic patterns, jumps, and nonstationarity, and is automatically satisfied when the mean and volatility processes have continuous trajectories. In particular, standard diffusion models such as Heston (1993), Hull and White (1987) satisfy this assumption.
Assumption 3.
For every k=1,\cdots,d and i=1,\cdots,n the quantities
\delta^{1}\int_{t_{i1}}^{t_{i}}\Sigma_{kk}(s)ds  (7) 
are bounded away from 0 and infinity uniformly in \delta.
Equation (7) in Assumption 3 essentially means that, on any bounded interval, \Sigma_{kk}(t) itself is bounded away from infinity. This is the case, for example for CoxIngersollRoss (CIR) and OrnsteinUhlenbeck (OU) processes in Cox et al. (1985), Uhlenbeck and Ornstein (1930) respectively. The above mentioned assumptions are sufficient to derive asymptotic distribution of \widehat{KCV}(\tau), however in order to get the asymptotics of \widehat{\Sigma}(\tau), when h\to 0, the general smoothness condition needs to be imposed on the covariance process.
Assumption 4.
The space C^{m,\gamma}[0,T] for some m\geq and 0<\gamma<1 consists of functions f:[0,T]\mapsto\mathbb{R} that are m times differentiable with the mth derivative f^{(m)}(t), satisfying
f^{(m)}(t+\delta)f^{(m)}(t)\leq\mathcal{L}(t,\delta)\delta^{\gamma}+o(% \delta^{\gamma}),\ \ \ \delta\to 0,\ \ (a.s.),  (8) 
where \mathcal{L}(t,\delta) is Lipschitz coefficient, a slowly varying function at zero and t\mapsto\mathcal{L}(t,0) is continuous. The mapping t\mapsto\Sigma_{k,l}(t) for k,l=(1,...,d) lies in C^{m,\gamma}[0,T] for some m\geq 0 and \gamma\geq 0.
As stated in Yu et al. (2014) this condition is satisfied by commonly used diffusion processes. When Assumption 5 holds with m=0 and \gamma<0.5 the model is driven by a Brownian motion (see e.g. Revuz and Yor (1998, ch.5)).
We also impose requirements on the kernel function:
Assumption 5.
The kernel K:\mathbb{R}\mapsto\mathbb{R}

satisfies \int_{\mathbb{R}}K(x)dx=1 and continuously differentiable, i.e. K\in C^{1,0}, such that
\bar{K}_{z}\coloneqq\sup_{0\leq u\leq T}K^{(z)}(u)<\infty,\ \ \ \ z=0,1. 
satisfies the condition that there exists some constants \Lambda,L and \Gamma_{i}<\infty such that K^{(i)}(u)\leq\Lambda, and for some v>1, K^{(i)}(u)\leq\Gamma_{i}u^{v} for u\geq L, i=0,1.

satisfies \int_{\mathbb{R}}x^{i}K(x)dx=0, i=1,...,r1 and \int_{\mathbb{R}}x^{r}K(x)dx,\infty, for some r\geq 0.
The assumptions above are satisfied by most standard kernels for r\leq 2. When r>2, K is called a higherorder kernel. If m>2 as well, the higherorder kernels can be used to reduce the bias in the estimation of more than twice differentiable functions. Although, as mentioned in Kristensen (2010), since m=0 is a usual case, Cline and Hart (1991) demonstrated that higherorder kernels can potentially reduce bias even when the object of interest is nonsmooth and has jumps.
Now we are ready to derive the asymptotics of the kernel covaraince estimator for a fixed bandwidth.
Theorem 1.
If Assumptions 15 hold, we have that for fixed h and any \tau\in[0,T]
\sqrt{\delta^{1}}\left\{\widehat{KCV}(\tau)\int_{0}^{T}K_{h}(s\tau)\Sigma(s% )ds\right\}\overset{\mathcal{L}}{\to}N\left(0,\int_{0}^{T}K_{h}^{2}(s\tau)% \Omega(s)ds\right),  (9) 
where \Omega(t) is a d^{2}\times d^{2} array with elements
\Omega(t)=:\left\{\Sigma_{kk^{\prime}}(t)\Sigma_{ll^{\prime}}(t)+\Sigma_{kl^{% \prime}}(t)\Sigma_{lk^{\prime}}(t)\right\}_{k,k^{\prime},l,l^{\prime}=1,\cdots% ,d}.  (10) 
Proof.
We give the proof in several steps. First we derive the means, variances and covariances of the variates
\displaystyle\widehat{KCV}_{kl}(\tau)  \displaystyle=  \displaystyle\sum_{i=1}^{n}K_{h}(t_{i1}\tau)\Delta X_{k}(t_{i1})\Delta X_{l% }(t_{i1})  
\displaystyle=  \displaystyle\sum_{i=1}^{n}K_{h}(t_{i1}\tau)\left(X_{k}(t_{i})X_{k}(t_{i1}% )\right)\left(X_{l}(t_{i})X_{l}(t_{i1})\right). 
with k,l=1,2,\cdots,d. Second, the Theorem 1 is proved for the case, where the mean processes \mu_{k} are identically 0, by employing CramerWold device. Finally, the latter restriction is lifted and using lemma 5 in Appendix D the negligibility of nonzero drift term is shown. The proof is componentwise and based on the results and techniques employed by BarndorffNielsen and Shephard (2004a) and Kristensen (2010). See Appendix A for the details of the proof. ∎
This theorem is an intermediate step in the derivation of the asymptotic distribution of the estimator for a shrinking bandwidth. The Theorem 1 is necessary for the proof of the asymptotic normality of the spot kernel covariance estimator in (5).
Theorem 2.
If Assumptions 15 hold with r\geq m+\gamma, then as nh\to\infty and nh^{2(m+\gamma)+1}\to 0 for any t\in(0,T) we have
\sqrt{\delta^{1}h}\left\{\widehat{\Sigma}(t)\Sigma(t)\right\}\overset{% \mathcal{L}}{\to}N\left(0,\Omega(t)\int_{\mathbb{R}}K^{2}(z)dz\right)  (11) 
where \Omega(t) is a d^{2}\times d^{2} array with elements
\Omega(t)=:\left\{\Sigma_{kk^{\prime}}(t)\Sigma_{ll^{\prime}}(t)+\Sigma_{kl^{% \prime}}(t)\Sigma_{lk^{\prime}}(t)\right\}_{k,k^{\prime},l,l^{\prime}=1,\cdots% ,d}.  (12) 
Proof.
See Appendix B. ∎
Bibinger et al. (2017) propose spot covariance estimator which is constructed based on local averages of blockwise parametric spectral covariance estimates. This is an extension of the local method of moments (LMM) in Bibinger and Reiss (2014). Since Bibinger et al. (2017) consider a setting with market microstructure noise, their estimator attains the optimal rate of convergence (\sqrt{n^{4}}) which is slower compared to the convergence rate of the kernel covariance estimator (\sqrt{n}). The kernel estimator in equation (5) is fairly easy to implement.

It is helpful to focus on the bivariate case in order to gain further understanding. We will look at the results for the assets k and l, whose logprices will be written as X_{missing}kandX_missinglrespectively.Thenthehighfrequencyreturnsattimet_iis\begin{equation*}\Delta X_{k}(t_{i})=X_{k}(t_{i})X_{k}(t_{i}1)\ \ \ \text{% and}\ \ \ \Delta X_{l}(t_{i})=X_{l}(t_{i})X_{l}(t_{i}1)\ \ \ \text{for}\ i=1% ,\cdots,n.\end{equation*}% Inordertoavoidthesymmetricreplicationinthecovariationmatrixweemployahalf% vectorization,oralternatively,avechtransformation.Thehalf% vectorizationofasymmetricmatrixisobtainedbyvectorizingonlythelowertriangularpartofthematrix% (seeKolloandRosen\@@cite[citeyearpar]{(\@@bibref{Year}{KolloRosen}{}{})},L{\"{% u}}tkeohl\@@cite[citeyearpar]{(\@@bibref{Year}{Lutkeohl}{}{})}).% InthiscaseTheorem\ref{theorem1}tellsusthatjointasymptoticdistributionforidentifyingelementsofrealizedcovariationoftwoassetsX_kandX_lbecomes\@@eqnarray$\displaystyle\sqrt{\delta^{1}}\left(\begin{array}[]{c}\sum% _{i=1}^{n}K_{h}(t_{i1}\tau)\Delta X_{k}^{2}(t_{i})\int_{0}^{T}K_{h}(s\tau)% \Sigma_{kk}(s)ds\\ \sum_{i=1}^{n}K_{h}(t_{i1}\tau)\Delta X_{k}(t_{i})\Delta X_{l}(t_{i})\int_{% 0}^{T}K_{h}(s\tau)\Sigma_{kl}(s)ds\\ \sum_{i=1}^{n}K_{h}(t_{i1}\tau)\Delta X_{l}^{2}(t_{i})\int_{0}^{T}K_{h}(s% \tau)\Sigma_{ll}(s)ds\end{array}\right)\overset{\mathcal{L}}{\to}$&$$\\ $\displaystyle N\left[0,\displaystyle\int_{0}^{T}K_{h}^{2}(s\tau)\left(% \scalebox{0.8}{\mbox{$\displaystyle\begin{array}[]{ccc}2\Sigma_{kk}^{2}(s)&2% \Sigma_{kk}(s)\Sigma_{kl}(s)&2\Sigma_{kl}^{2}(s))\\ 2\Sigma_{kk}(s)\Sigma_{kl}(s)&\Sigma_{kk}(s)\Sigma_{ll}(s)+\Sigma_{kl}^{2}(s)2% \Sigma_{ll}(s)\Sigma_{kl}(s)&2\Sigma_{ll}(s)\Sigma_{kl}(s)\\ 2\Sigma_{kl}^{2}(s)&2\Sigma_{ll}(s)\Sigma_{kl}(s)&2\Sigma_{ll}^{2}(s))\end{% array}$}}\right)du\right].$\par
3 Jump Case: Threshold Kernel Covariance Estimation
In this section we assume that the price process is governed by a discontinuous semimartingale with finite activity jumps. We propose a threshold kernel spot covariance estimator, which is an extension of the threshold kernel spot volatility estimator in Yu et al. (2014) to the multivariate case. Theorem 3 derives the asymptotic distribution of the threshold kernel covariance estimator for a fixed bandwidth.
Consider a filtered probability space (\Omega,(\mathcal{F})_{t\in[0,T]},\mathcal{F},P). Let the ddimensional (with fixed d) logprice X(t)=(X_{1}(t),X_{2}(t),...,X_{d}(t)) be defined on the this space and satisfy the following stochastic differential equation:
dX(t)=\mu(t)dt+\theta(t)dW(t)+dJ(t),\ \ \ t\in[0,T]. (13) where \mu(t) is the drift vector, \theta(t) is the instantaneous volatility matrix, W(t) is the ddimensional standard Brownian motion and J(t)=(J_{1}(t),...,J_{d}(t)) is a compound Poisson process with finite activity of jumps, which can be written as J(t)=\sum_{i=1}^{N(t)}(Z_{1}(\tau_{i}),...,Z_{d}(\tau_{i})) =\left(\sum_{i=1}^{N(t)}Z_{1}(\tau_{i}),...,\sum_{i=1}^{N(t)}Z_{d}(\tau_{i})\right). Here (N(t))_{t\geq 0} is a homogeneous Poisson process with constant intensity \lambda>0 and (Z_{k})_{k\in\mathbb{N}} is a sequence of i.i.d. random variables with values in \mathbb{R}^{d}, which denotes the jump size at the jump location \tau_{i}. We assume Z_{k}(\tau_{i}) for k=1,2,...d are i.i.d. and independent of N_{t}. Denote the (d\times d)dimensional spot covariance matrix by \Sigma(t)=\Theta(t)\Theta(t)^{\top}.
Suppose that on a finite and fixed time horizon [0,T], we have n+1 highfrequency discrete observations X_{k}(t_{0}),X_{k}(t_{1}),...,X_{k}(t_{n1}),X_{k}(t_{n}) of the realization of kth asset, with k=1,2,...,d. Here, 0=t_{0}<t_{1}<...<t_{n}=T is an arbitrary partition of the interval [0,T]. Although the observations are not necessarily equidistant, we require that \max_{i=1,...,n}t_{i}t_{i1} approaches zero under the asymptotic limit. We consider the case of equally spaced and synchronous observation times, though this assumption can easily be lifted. Denote \delta=T/n, so that t_{i}=i\delta for i=1,2,\cdots,n.
The quantity of interest is the spot covariance matrix \Sigma(t). The threshold kernel covariance estimator, denoted by \widehat{TCV}, is defined as
\widehat{TCV}(\tau)=\sum_{i=1}^{n}K_{h}(t_{i1}\tau)\Delta X(t_{i1})\Delta X% ^{\top}(t_{i1})\mathbbm{1}_{\{\lVert\Delta X_{t_{i1}}\rVert\leq dr(\delta)\}}, (14) where \mathbbm{1}(\cdot) is the indicator function and \Delta X(t_{i1})=X(t_{i})X(t_{i1}) is the ddimensional vector of increments of process X over time interval [t_{i1},t_{i}]. The function K_{h}(x) is given by K(x/h)/h, where h is bandwidth and the kernel function K(x) satisfies \int_{\mathbb{R}}K(x)dx=1. The threshold function r(\delta) is a deterministic function of the step length \delta. As the bandwidth h\to 0 we recover the spot covariance. The threshold function r(\delta) has to vanish more slowly than the modulus of the continuity of the Brownian motion in order to have the convergence in probability. Thus we have the following additional assumption.
Assumption 6.
r(\delta) is a deterministic function of the step length \delta such that \underset{\delta\to 0}{\lim}r(\delta)=0 and \underset{\delta\to 0}{\lim}\frac{\delta\log\frac{1}{\delta}}{r(\delta)}=0.
We now can derive the asymptotics of the threshold kernel covariance estimator.
Theorem 3.
If Assumptions 16 hold, we have that for fixed h and any t\in[0,T]
\sqrt{\delta^{1}}\left\{\widehat{TCV}(\tau)\int_{0}^{T}K_{h}(s\tau)\Sigma(s% )ds\right\}\overset{\mathcal{L}}{\to}N\left(0,\int_{0}^{T}K_{h}^{2}(s\tau)% \Omega(s)ds\right), (15) where \Omega(t) is a d^{2}\times d^{2} array with elements
\Omega(t)=:\left\{\Sigma_{kk^{\prime}}(t)\Sigma_{ll^{\prime}}(t)+\Sigma_{kl^{% \prime}}(t)\Sigma_{lk^{\prime}}(t)\right\}_{k,k^{\prime},l,l^{\prime}=1,\cdots% ,d}. (16) Proof.
See Appendix C ∎
The threshold kernel covariance estimator in equation (3) is an extension of the threshold kernel estimator of the timedependent spot volatility in Yu et al. (2014) to the multivariate case. In Theorem 3 we derive asymptotic distribution for the estimator for a fixed bandwidth h of the kernel. The similar results as in Theorem 3 was achieved for univariate case in Yu et al. (2014).
4 Simulation Study
In this section we examine the performance of the kernel and threshold kernel covariance estimators. In particular, we investigate the finitesample performances of the estimators relative to the time distance between observations. Throughout we work with bivariate stochastic volatility model. First, we examine the kernel covariance estimator in a setup without jumps and assume that asset prices, Y(t)=(Y_{1}(t),Y_{2}(t)), follows Heston model:
dY(t)=\mu Y(t)dt+\theta(t)Y(t)dW(t),\ \ \ \ \ \ \ \Sigma(t)=\theta(t)\theta^{% \prime}(t), (17) where
\Sigma(t)=\begin{pmatrix}\Sigma_{11}(t)&\Sigma_{12}(t)\\ \Sigma_{12}(t)&\Sigma_{22}(t)\end{pmatrix}=\begin{pmatrix}\sigma_{1}^{2}(t)&% \sigma_{1,2}(t)\\ \sigma_{1,2}(t)&\sigma_{2}^{2}(t)\end{pmatrix} (18) with the covariance \sigma_{1,2}(t)=\sigma_{1}(t)\sigma_{2}(t)\rho(t), the drift vector \mu(t)=(\mu_{1}(t),\mu_{2}(t)) and a standard two dimensional Brownian motion W(t)=(W_{1}(t),W_{2}(t)) such that d\left\langle W_{1},W_{2}\right\rangle_{t}=\rho dt. The variance processes, \sigma_{i}(t) for i=1,2, follow the CIR model Cox et al. (1985):
d\sigma^{2}_{i}(t)=\kappa_{i}(\theta_{i}\sigma^{2}_{i}(t))dt+\eta_{i}\sigma_{% i}(t)dZ_{i}(t). (19) We set the correlation between asset and its volatility process to zero in order for Assumption 1 to hold. The remaining data generating parameters are chosen to match the estimated parameter values in BarndorffNielsen and Shephard (2002). In our simulation we set T=2 (48 hours). We consider frequencies \Delta^{1}=12\times 60\times 24,2\times 60\times 24,60\times 24,12\times 24,6% \times 24 corresponding to sampling every 5 seconds, 20 seconds, 1 minute, 5 minutes and 10 minutes. In order to simulate the data using model (4) we employ the Euler discretization scheme from Kloeden and Platen (1999, ch.14). We simulate one trajectory of each \{\sigma_{i}^{2}(t)\} for i=1,2 and keep them fixed. Then we run 500 Monte Carlo repetitions for prices of two assets \{Y_{1}(t),Y_{2}(t)\}. In each repetition we compute \hat{\Sigma}_{kl}(t) for i=1,2 based on sampling frequencies.
Three different estimators of instantaneous covariance: Gaussian kernel estimator, onesided kernel estimator and beta kernel estimator are implemented. For all three estimators crossvalidation was used to select the bandwidth (see Kristensen (2010)). We used the following integrated squared error (ISE) as the goodnessoffit criterion:
\text{ISE}(h)=\int_{t_{l}}^{t_{u}}\left(\Sigma_{kl}(s)\widehat{\Sigma}_{kl}(s% )\right)^{2}ds,\ \ \ \ \ \ \ \text{for}\ \ 0<t_{l}<t_{u}<T, (20) where \Sigma_{kl}(s) and \widehat{\Sigma}_{kl}(s) for k,l=1,2 are the true and the estimated spot covariances. Two performance measurements are used to evaluate the finitesample properties of the estimators: the integrated mean squared error and the integrated bias
[b]
Table 1: Interior performance of the KCV estimator Gaussian kernel \text{Onesided kernel}^{*} Beta kernel Data Frequency IMSE ISB IMSE ISB IMSE ISB 5 seconds 0.14 0.37 0.11 0.21 0.13 0.28 20 seconds 0.73 0.63 0.43 0.49 0.66 0.46 1 minute 0.80 0.74 0.59 0.71 0.76 0.69 5 minutes 1.85 1.97 1.17 1.24 2.03 1.43 10 minutes 3.88 4.21 2.16 2.14 2.85 3.16 
Note: Integrated mean squared error (\times 10^{5}) and integrated squared bias (\times 10^{5}).
\text{IMSE}=\int_{t_{l}}^{t_{u}}\left[(\Sigma_{kl}(s)\widehat{\Sigma}_{kl}(s)% )^{2}\right]ds,\ \ \text{ISB}=\int_{t_{l}}^{t_{u}}\left[E(\Sigma_{kl}(s)% \widehat{\Sigma}_{kl}(s))^{2}ds\right], (21) where 0\leq t_{l}\leq t_{u}\leq T. The results for the performance of the estimator of the covariance, \widehat{\Sigma}_{12}(t), are reported in Table 1. Figure 1 displays QQ plot for observed standardized error terms of Kernel Covariance Estimator using minutebyminute data.
Next, we examine the finite sample performance of the threshold covariance estimator. Though several models combining jumps and stochastic volatility appeared in the literature, we use the model from Bates (1996), one of the most popular examples of the class, an independent jump component is added to the Heston stochastic volatility model:
dX(t)=\mu dt+\theta(t)dW(t)+dJ(t),\ \ \ \ \ \ \ \Sigma(t)=\theta(t)\theta^{% \prime}(t), (22) with
\Sigma(t)=\begin{pmatrix}\Sigma_{11}(t)&\Sigma_{12}(t)\\ \Sigma_{12}(t)&\Sigma_{22}(t)\end{pmatrix}=\begin{pmatrix}\sigma_{1}^{2}(t)&% \sigma_{1,2}(t)\\ \sigma_{1,2}(t)&\sigma_{2}^{2}(t),\end{pmatrix} (23) where X(t)=(X_{1}(t),X_{2}(t)) is log of asset prices, \sigma_{1,2}(t)=\sigma_{1}(t)\sigma_{2}(t)\rho(t), \mu=(\mu_{1},\mu_{2}) is the drift vector, J(t)=\sum_{i=1}^{N(t)}(Z_{1}(\tau_{i}),Z_{2}(\tau_{i})) is a two dimensional compound Poisson jump process and W(t)=(W_{1}(t),W_{2}(t)) is a standard two dimensional Brownian motion such that d\left\langle W_{1},W_{2}\right\rangle_{t}=\rho dt. The variance processes, \sigma_{i}(t) for i=1,2, follow the CIR model:
[b]
Table 2: Interior performance of the TKCV estimator Gaussian kernel \text{Onesided kernel}^{*} Beta kernel Data Frequency IMSE ISB IMSE ISB IMSE ISB 5 seconds 1.76 1.38 1.25 1.22 2.34 1.75 20 seconds 2.24 1.13 1.87 1.34 2.13 2.03 1 minute 3.76 1.45 2.31 1.67 3.54 2.43 5 minutes 9.35 1.67 7.31 1.35 3.52 6.67 10 minutes 5.53 1.25 3.65 7.38 1.83 4.39 
Note: Integrated mean squared error (\times 10^{5}) and integrated squared bias (\times 10^{5}).
d\sigma^{2}_{i}(t)=\kappa_{i}(\theta_{i}\sigma^{2}_{i}(t))dt+\eta_{i}\sigma^{% 2}_{i}(t)dZ_{i}(t). (24) As in simulations for Heston model without jumps we set T=2 (48 hours) and consider sampling frequencies 5 seconds, 30 seconds, 1 minute. We employ Euler discretization scheme from Kloeden and Platen (1999, ch.14) for the simulation. We simulate one trajectory of each \{\sigma_{i}^{2}(t)\} and \{J_{i}(t)\} for i=1,2 and keep them fixed. Then we run 500 repetitions of (X_{1}(t),X_{2}(t)). For each simulated path of the bivariate log asset price we compute \widehat{TCV} based on sampling frequencies.
We use two IMSE and ISB performance measurements in equation (21) for three different estimators: Gaussian, beta and onesided kernel estimator. The results for the performance of the \widehat{TCV} estimator are reported in Table 2. Figure 2 displays QQ plot for observed standardized error terms of Threshold Kernel Covariance Estimator using minutebyminute data.
5 Applications: Covariance Forecasting
Forecasting covariance has an important economic value in the context of asset pricing and portfolio allocation. Multivariate GARCH model is a standard tool of modelling and forecasting covariances. However, the more recent approaches advocate the use of highfrequency data.
Symitsi et al. (2018) undertake a comprehensive empirical comparison of two generic families of covariance forecasting models: multivariate GARCH models that employ daily data and models that use highfrequency and options data. The authors conclude that models based on highfrequency data offer both a clear advantage in terms of statistical accuracy and yield more theoretically consistent predictions leading to superior outofsample portfolio performance. In particular, a Vector Heterogeneous Autoregressive Model (VHAR) achieves the best performance out of the models under consideration. Motivated by this, we use the VHAR model to forecast the integrated covariance, however, when implementing for a finite sample, we use the kernel covariance estimator (3) in Section 2 instead of the realized covariance estimator of BarndorffNielsen and Shephar (2004a).
Heterogeneous Autoregressive model (HAR), see Corsi (2009), was proposed as a simple way to approximate the longmemory behaviour of volatility. Vector HAR, implemented in Chiriac (2011), is a multivariate extension of HAR. In the VHAR the realized covariance is expressed as a linear combination of past daily, weekly and monthly realized covariances:
RC_{t+1}=\alpha+\beta_{d}RC_{t}+\beta_{w}RC_{t5:t}+\beta_{m}RC_{t22:t}+% \epsilon_{t+1}, (25) where RC_{t} is obtained from Cholesky decomposition of realized covariance matrix. If H_{t} is a matrix of realized covariances, its Cholesky decomposition gives H_{t}=C_{t}C_{t}^{\prime} and then RC_{t}=vech(C_{t}). In order to allow direct comparison among quantities defined over various time horizons, these multiperiod factors are normalized sums of the daily realized factors, i.e.
RC_{tk:t}=\frac{1}{k}\sum_{i=0}^{k1}RC_{ti} (26) is the past k day values of RC, \alpha is a constant term and \beta_{d},\beta_{w},\beta_{m} are, respectively, the parameters of daily, weekly and monthly components of the model. The covariance forecasts, H_{t}, are obtained by the reverse transformations of the RC_{t}’s. Modelling the Cholesky factors rather than covariances directly is done in order to avoid unnecessary restrictions that ensure positive definiteness.
We simulate the logprices of two assets and their volatilises using model (4) in Section 4. Since we use simulated data, we have the true integrated covariance matrix and we propose to forecast the true covariance matrix using two measures of integrated covariance: standard in the literature realized covariance estimator of BarndorffNielsen and Shephard (2002) and newly proposed kernel filtering of the covariance in equation (3). Thus we have two models for forecasting integrated covariance. First model is VHAR model where we use the realized covariance as a measure of integrated covariance:
\displaystyle IC_{t+1}=\alpha+\beta_{d}RC_{t}+\beta_{w}RC_{t5:t}+\beta_{m}RC_% {t22:t}+\epsilon_{t+1}, (27) where IC is the halfvectorized Cholesky decomposition of the integrated covariance matrix.
[b]
Table 3: The table reports the out of sample forecast loses for the 1, 5, 22day horizons, respectively. The model with the lowes outofsample loss is market with asterisk (*). 1day horizon 1week horizon 2week horizon VHAR VHARKCV^{*} VHAR VHARKCV^{*} VHAR VHARKCV^{*} \alpha 0.3243 0.3213 0.4987 0.4896 0.4124 0.4126 \beta_{d} 0.6904 0.6064 0.2443 0.2032 0.2295 0.2175 \beta_{w} 0.6909 0.6028 0.1765 0.1483 0.2257 0.1591 \beta_{m} 0.8922 0.8374 0.9007 0.7289 0.5219 0.4328 \mathcal{L}_{E} 0.1267 0.0529 0.1831 0.0772 0.2412 0.1841 \mathcal{L}_{F} 0.1387 0.0546 0.1796 0.0797 0.2981 0.1902 \mathcal{L}_{Q} 10.143 14.0537 9.893 13.2624 7.8503 11.5561 In light of this it is natural to define the VHARKCV model, in which we borrow the VHAR model above to predict the integrated covariance matrix, however we use kernel covariance estimator:
\displaystyle IC_{t+1}=\alpha+\beta_{d}\widehat{KCV}_{t}+\beta_{w}\widehat{KCV% }_{t5:t}+\beta_{m}\widehat{KCV}_{t22:t}+\epsilon_{t+1}, (28) where \widehat{KCV} is the halfvectorized Cholesky decomposition of the kernel covariance estimator in (3). We benchmark the VHARKCV against the VHAR.
In line with Symitsi et al. (2018) we evaluate forecasting ability of the the VHARKCV model (28) based on three multivariate loss functions and compare its performance to the performance of the benchmark VHAR model (27). We use the Euclidean loss function, \mathcal{L}_{E}, which is equallyweighted elements of the forecast error matrix; the Frobenius distance, \mathcal{L}_{F}, which is the extension of the mean squared error to the multivariate space and the multivariate quasilikelihood loss function, \mathcal{L}_{Q}, which is scale invariant:
\displaystyle\mathcal{L}_{E}=vech(\Sigma_{t}H_{t})^{\prime}vech(\Sigma_{t}H_% {t}), (29) \displaystyle\mathcal{L}_{F}=Tr[(\Sigma_{t}H_{t})^{\prime}(\Sigma_{t}H_{t})], (30) \displaystyle\mathcal{L}_{Q}=\logH_{t}+Tr(H_{t}^{1}\Sigma_{t}). (31) Here Tr denotes the trace of square matrix, \Sigma_{t} denotes the integrated covariance matrix at time t and H_{t} is time t matrix of conditional covariance forcasts.
Results are reported in Tables 3. It is clear that for all forecasting horizons, the VHARKCV model outperforms the VHAR model which was shown to be the best model for forecasting covariance matrix in large study by Symitsi et al. (2018).
6 Concluding Remarks
Inspired by the kernel filtering of spot volatility, in this paper we develop estimators of spot covariances for two types of the underlying price process: continuous and discontinuous semimartingales. We show the asymptotic normality of the estimators. An important result is that we are able to attain the convergence rate for both estimators, which is \sqrt{n}. The convergence rate of spot covariance matrix estimator for continuous martingales in a setup with microstructure noise proposed by Bibinger et al. (2017) is, in turn, \sqrt{n^{4}}. In financially realistic scenarios, we conduct Monte Carlo experiments to study the finite sample properties of our estimators. In addition, we investigate one of the possible applications of the estimator, the forecasting of covariance matrix. We conclude that our estimator performs better in the context of forecasting than the benchmark realized covariance estimator of BarndorffNielsen and Shephard (2004a). One of the possible extensions of the estimators is to consider a marketmicrostructure noise.
References
 AïtSahalia et al. (2010) AïtSahalia Y, Fan J, Xiu D. 2010. Highfrequency estimates with noisy and asynchronous financial data. Journal of the American Statistical Association 105: 15041516.
 Alexander et al. (2018) Alexander C. 2008. Market risk analysis (vol. 2): practical financial econometrics. Chichester: John Wiley & Sons.
 Andersen et al. (2013) Andersen TG, Bollerslev T, Christoffersen PF, and Diebold FX. 2013. Financial risk measurement for financial risk management. Handbook of the Economics of Finance 53: 11271220.
 BarndorffNielsen et al. (2004a) BarndorffNielsen OE, Shephard N. 2004a. Econometric analysis of realised covariation: High frequency based covariance, regression and correlation in financial economics. Econometrica 72: 885â925.
 BarndorffNielsen et al. (2002) BarndorffNielsen, OE, Shephard N. 2002. Econometric analysis of realized volatility and its use in estimating stochastic volatility models. Journal of the Royal Statistical Society 64: 253280.
 Bates et al. (1996) Bates D. 1996. Jumps and stochastic volatility: the exchange rate processes implicit in Deutschemark options. The Review of Financial Studies 9: 69107.
 Bibinger et al. (2017) Bibinger M, Hautsch N, Malec P, Reiss M. 2017. Estimating the spot covariation of asset prices â statistical theory and empirical evidence. Journal of Business and Economic Statistics. 15041516.
 Bibinger et al. (2014) Bibinger M, Reiss M. 2014. Spectral estimation of covolatility from noisy observations using local Weights. Scandinavian Journal of Statistics 6: 2350.
 Bibinger et al. (2015) Bibinger M, and Winkelmann L. 2015. Econometrics of cojumps in high frequency data with noise. Journal of Econometrics 184: 361378.
 Bos et al. (2012) Bos CS, Janus P, Koopman SJ. 2012. Spot variance path estimation and its application to highfrequency jump testing. Journal of Financial Econometrics 10: 354389.
 Chiriac et al. (2011) Chiriac R, Voev V. 2011. Modelling and forecasting multivariate realized volatility. Journal of Applied Econometrics 26: 922947.
 Christensen et al. (2013) Christensen K, Podolskij M, Vetter M. 2013. On covariation estimation for multivariate continuous Itô semimartingales with noise in nonsynchronous observation schemes. Journal of Multivariate Analysis 120: 5984.
 Cline et al. (1991) Cline DBH, Hart JD. 1991. Kernel estimation of densities with discontinuities or discontinuous derivatives. Statistics 22: 6984.
 Corsi et al. (2009) Corsi F. 2009. A simple approximate longmemory model of realized volatility. Journal of Financial Econometrics 7: 174196.
 Cox et al. (1985) Cox J, Ingersoll J, Ross S. 1985. A theory of the term structure of interest rates. Econometrica 53: 385407.
 Fan et al. (2008) Fan J, Wang Y. 2008. Spot volatility estimation for highfrequency data. Statistics and Its Interface 1: 279288.
 Foster et al. (1996) Foster DP, Nelson DB. 1996. Continuous record asymptotics for rolling sample variance estimators. Econometrica 64: 139174.
 Hayashi et al. (2011) Hayashi T, Yoshida N. 2011. Nonsynchronous covariation process and limit theorems. Stochastic processes and their applications 121: 24162454.
 Heston et al. (1993) Heston SL. 1993. A closedform solution for options with stochastic volatility with applications to bond and currency options. Review of Financial Studies 6: 327343.
 Hull et al. (1987) Hull J, White A. 1987. The pricing of options on assets with stochastic volatility. Journal of Finance 42: 281300.
 Kanaya et al. (2016) Kanaya S, Kristensen D. 2016. Estimation of stochastic volatility models by nonparametric filtering. Econometric Theory 32: 861916.
 Karatzas et al. (1999) Karatzas I, Shreve SE. 1999. Brownian motion and stochastic calculus. New York: Springer.
 Kloeden et al. (1999, ch.14) Kloeden P, Platen E. 1999. Numerical solutions of stochastic differential equations. Berlin: SpringerVerlag.
 Kollo et al. (2005) Kollo T, Rosen D. 2005. Advanced multivariate statistics with matrices. Dordrecht: Springer.
 Kristensen et al. (2010) Kristensen D. 2010. Nonparametric filtering of the realized spot volatility: A kernelbased approach. Econometric Theory 26: 60–93.
 Lutkeohl et al. (1996) Lütkeohl, H. 1996. Handbook of matrices. Chichester: John Wiley & Sons Ltd.
 Mykland et al. (2008) Mykland PA, Zhang L. 2008. Inference for Volatilitytype objects and implications for hedging. Statistics and Its Interface 1: 255278
 Revuz et al. (1998, ch.5) Revuz D, Yor M. 1998. Continuous martingales and Brownian motion. Berlin: SpringerVerlag.
 Silverman et al. (1986) Silverman BW. 1986. Density estimation for statistics and data analysis. New York: Chapman and Halls.
 Symitsi et al. (2018) Symitsi E, Symeonidis L, Kourtis A, Markellos R. 2018. Covariance forecasting in equity markets. Journal of Banking and Finance 96: 153168.
 Uhlenbeck et al. (1930) Uhlenbeck GE, Ornstein LS. 1930. On the theory of Brownian Motion. Phys.Rev 36: 82341.
 Yu et al. (2014) Yu C, Fang Y, Li Z, Zhao X. 2014. Nonparametric estimation of highfrequency spot volatility for Brownian semimartingale with jumps. Journal of Time Series Analysis 35: 572591.
 Zhang et al. (2011) Zhang L. 2011. Estimating covariation: Epps effect and microstructure noise. Journal of Econometrics 160: 3347.
 Zu et al. (2014) Zu Y, Boswijk HP. 2014. Estimating spot volatility with highfrequency financial data. Journal of Econometrics 181: 117135.
Appendix A Proof of Theorem 1
A.1 Notation
In a similar way to BarndorffNielsen and Shephard (2004a) for the purpose of simplifying the proof we will use the index (or equivalently, tensor) notation instead of vector or matrix notation. We rewrite the d stochastic processes X_{k} (k=1,\cdots,d) in equation (1) in index notation as
X_{k}(t)=\int_{0}^{t}\mu_{k}(s)ds+\int_{0}^{t}\theta_{k}^{a}(u)dW_{a}(s), (32) with initial condition X_{k}(0)=0. Here
\Theta(t)=\{\theta_{(k)}^{a}(t)\}_{k,a=1,2,\cdots,d}. In the index notation the Einstein summation convention is used, which means if an index variable appears twice in a single expression then it implies summation over that index. Thus (32) is understood to mean
X_{k}(t)=\int_{0}^{t}\mu_{(k)}(s)ds+\sum_{a=1}^{d}\int_{0}^{t}\theta_{(k)}^{a}% (s)dW_{a}(s). (33) We apply summation convention to indices a,b,c,d, but not to indices k,l,k^{\prime},l^{\prime}, unless otherwise specified. Furthermore, we write
\theta_{kl}^{ab}=\theta_{k}^{a}\theta_{l}^{b}, (34) with similar notation for other index combination. In (34) no superscripts or subscripts are repeated and so no summation operator is generated. Combining the Einstein summation convention and the notional rule for \theta_{kl}^{ab}, the (k,l)th element of the spot covalatility matrix of model (1) is
\Sigma_{kl}(t)=\theta_{kl}^{aa}=\sum_{a=1}^{d}\theta_{k}^{a}(t)\theta_{l}^{a}(% t). (35) A.2 Mean and variances
The proof of Theorem 1 consists of several steps. First step is to derive the means and covariances of the variates
\displaystyle\widehat{KCV}_{kl}(\tau) \displaystyle= \displaystyle\sum_{i=1}^{n}K_{h}(t_{i1}\tau)\Delta X_{k}(t_{i1})\Delta X_{l% }(t_{i1}) (36) \displaystyle= \displaystyle\sum_{i=1}^{n}K_{h}(t_{i1}\tau)\left(X_{k}(t_{i})X_{k}(t_{i1}% )\right)\left(X_{l}(t_{i})X_{l}(t_{i1})\right), (37) with k,l=1,2,\cdots,d. Next, the Theorem 1 is proved for the case, where the mean processes \mu_{k} (k=1,\cdots,d) are identically 0. Finally, the latter restriction is lifted. The proof is componentwise and based on the results and techniques employed by BarndorffNielsen and Shephard (2004a) and Kristensen (2010).
We start by computing the expectation of \widehat{KCV}_{kl}(\tau) in equation (37).\displaystyle\mathrm{E}\left[\widehat{KCV}_{kl}(\tau)\right] \displaystyle= \displaystyle\mathrm{E}\left[\sum_{i=1}^{n}K_{h}(t_{i1}\tau)\left(X_{k}(t_{i% })X_{k}(t_{i1})\right)\left(X_{l}(t_{i})X_{l}(t_{i1})\right)\right] (38) \displaystyle= \displaystyle\sum_{i=1}^{n}K_{h}(t_{i1}\tau)\mathrm{E}\left[\left(X_{k}(t_{i% })X_{k}(t_{i1})\right)\left(X_{l}(t_{i})X_{l}(t_{i1})\right)\right] \displaystyle= \displaystyle\sum_{i=1}^{n}K_{h}(t_{i1}\tau)\int_{t_{i1}}^{t_{i}}\theta_{kl% }^{aa}(s)ds, where the final equation is due to the results of BarndorffNielsen and Shephard (2004a):
\mathrm{E}\left[\Delta X_{k}(t_{i1})\Delta X_{l}(t_{i1})\right]=\int_{t_{i1% }}^{t_{i}}\theta_{kl}^{aa}(s)ds. (39) Next, we apply Lemma 5 and have
\sum_{i=1}^{n}K_{h}(t_{i1}\tau)\int_{t_{i1}}^{t_{i}}\theta_{kl}^{aa}(s)ds=% \int_{0}^{T}K_{h}(s\tau)\theta_{kl}^{aa}(s)ds+o(\sqrt{\delta}). (40) Thus
\mathrm{E}\left[\widehat{KCV}_{kl}(\tau)\right]=\int_{0}^{T}K_{h}(s\tau)% \theta_{kl}^{aa}(s)ds. (41) In order to compute the covariance of \widehat{KCV}_{kl}(\tau) in equation (37) we use the following results from BarndorffNielsen and Shephard (2004a):
\displaystyle\text{Cov}\left\{\left[\Delta X_{k}(t_{i1})\Delta X_{l}(t_{i1}% \right],\left[\Delta X_{k^{\prime}}(t_{i1})\Delta X_{l^{\prime}}(t_{i1}% \right]\right\}= (42) \displaystyle\int_{t_{i1}}^{t_{i}}\theta_{kk^{\prime}}^{aa}(s)ds\int_{t_{i1}% }^{t_{i}}\theta_{kl^{\prime}}^{cc}(s)ds+\int_{t_{i1}}^{t_{i}}\theta_{kl^{% \prime}}^{aa}(s)ds\int_{t_{i1}}^{t_{i}}\theta_{lk^{\prime}}^{cc}(s)ds. (43) Now, using the definition of covariance and equations (41), (40) and (42) we have
\displaystyle\text{Cov}\left\{\widehat{KCV}_{kl}(\tau),\widehat{KCV}_{k^{% \prime}l^{\prime}}(\tau)\right\} \displaystyle= \displaystyle\text{Cov}\left\{[\sum_{i=1}^{n}K_{h}(t_{i1}\tau)\Delta X_{k}(t% _{i1})\Delta X_{l}(t_{i1})],[\sum_{i=1}^{n}K_{h}(t_{i1}\tau)\Delta X_{k^{% \prime}}(t_{i1})\Delta X_{l^{\prime}}(t_{i1})]\right\} \displaystyle= \displaystyle\mathrm{E}\{[\sum_{i=1}^{n}K_{h}(t_{i1}\tau)\Delta X_{k}(t_{i1% })\Delta X_{l}(t_{i1})\int_{0}^{T}K_{h}(s\tau)\theta^{aa}_{kl}(s)ds] \displaystyle\times \displaystyle[\sum_{i=1}^{n}K_{h}(t_{i1}\tau)\Delta X_{k^{\prime}}(t_{i1})% \Delta X_{l^{\prime}}(t_{i1})\int_{0}^{T}K_{h}(s\tau)\theta^{cc}_{k^{\prime% }l^{\prime}}(s)ds]\} \displaystyle= \displaystyle\sum_{i=1}^{n}K_{h}^{2}(t_{i1}\tau)\{\int_{t_{i1}}^{t_{i}}% \theta^{aa}_{kk^{\prime}}(s)ds\int_{t_{i1}}^{t_{i}}\theta^{cc}_{ll^{\prime}}(% s)ds+\int_{t_{i1}}^{t_{i}}\theta^{aa}_{kl^{\prime}}(s)ds\int_{t_{i1}}^{t_{i}% }\theta^{cc}_{lk^{\prime}}(s)ds\}. Apply Lemma 5 and invoke Riemann integration
\displaystyle\delta^{1}\sum_{i=1}^{n}K_{h}^{2}(t_{i1}\tau)\left\{\int_{t_{i% 1}}^{t_{i}}\theta^{aa}_{kk^{\prime}}(s)ds\int_{t_{i1}}^{t_{i}}\theta^{cc}_{% ll^{\prime}}(s)ds+\int_{t_{i1}}^{t_{i}}\theta^{aa}_{kl^{\prime}}(s)ds\int_{t_% {i1}}^{t_{i}}\theta^{cc}_{lk^{\prime}}(s)ds\right\} \displaystyle\to\int_{0}^{T}K_{h}^{2}(s\tau)\Omega_{kl,k^{\prime}l^{\prime}}(% s)ds, where \Omega(s) is given in equation (10).
A.3 Asymptotic normality
To prove the results of Theorem 1 in the case where the mean processes \mu_{k} are identically 0, we apply CramerWold device, i.e. it suffices to show that for any real constants a^{kl} we have, as \delta\to 0
\frac{1}{\sqrt{\delta}}(a^{kl}(\widehat{KCV}_{kl}(\tau)\int_{0}^{T}K_{h}(s% \tau)\theta_{kl}^{aa}(s)ds)\overset{\mathcal{L}}{\to}\mathcal{N}(0,a^{kl}a^{k^% {\prime}l^{\prime}}(\int_{0}^{T}K_{h}^{2}(s\tau)\Omega_{kl,k^{\prime}l^{% \prime}}(s)ds)). (44) We rewrite (44) as
\displaystyle\frac{1}{\sqrt{\delta}}\sum_{i=1}^{n}a^{kl}(K_{h}(t_{i1}\tau)% \Delta X_{k}(t_{i1})\Delta X_{l}(t_{i1})\int_{t_{i1}}^{t_{i}}K_{h}(s\tau)% \theta^{aa}_{kl}(s)ds) \displaystyle\overset{\mathcal{L}}{\to}\mathcal{N}(0,a^{kl}a^{k^{\prime}l^{% \prime}}\sum_{i=1}^{n}\int_{t_{i1}}^{t_{i}}K_{h}^{2}(s\tau)\Omega_{kl,k^{% \prime}l^{\prime}}(s)ds). (45) Here we apply the Einstein summation convention also to the indices k,l. By the above calculations,
\displaystyle\text{Var}\left\{\sum_{i=1}^{n}a^{kl}(K_{h}(t_{i1}\tau)\Delta X% _{k}(t_{i1})\Delta X_{l}(t_{i1})\int_{t_{i1}}^{t_{i}}K_{h}(s\tau)\theta^{% aa}_{kl}(s)ds)\right\} \displaystyle\to a^{kl}a^{k^{\prime}l^{\prime}}\sum_{i=1}^{n}\int_{t_{i1}}^{t% _{i}}K_{h}^{2}(s\tau)\Omega_{kl,k^{\prime}l^{\prime}}(s)ds. (46) Now we apply the results of LinderbergFeller Central Limit Theorem for triangular arrays of independent random variables x_{n1},\cdots,x_{nk_{n}}(n=1,2,\cdots,i=1,2,\cdots,k_{n} with \ k_{n}\to\infty) and let x_{n}=x_{n1}+\cdots+x_{nk_{n}}. We state the Corollary 3 from BarndorffNielsen and Shephard (2004a) below.
Corollary 1.
Suppose that \mathrm{E}[y_{ni}]=0 for all n and i and there exists a nonnegative number v that \text{Var}[x_{n}]\to v for n\to\infty. Then
y_{n}\overset{\mathcal{L}}{\to}\mathcal{N}(0,v) (47) if and only if
\sum_{i=1}^{k_{n}}\mathrm{E}[y_{ni}^{2}\mathbf{1}_{(\psi,\infty)}(y_{ni})]% \to 0\ \ \text{as}\ \ n\to\infty (48) for an arbitrary \psi>0.
A Lyapunov condition is sufficient for (48), that is
\sum_{i=1}^{k_{n}}\mathrm{E}[y_{ni}^{2+\epsilon}]\to 0 (49) for some \epsilon>0. Now, let
y_{ni}=\frac{1}{\sqrt{\delta}}a^{kl}\{K_{h}(t_{i1}\tau)\Delta X_{k}(t_{i1})% \Delta X_{l}(t_{i1})\int_{t_{i1}}^{t_{i}}K_{h}(s\tau)\theta^{aa}_{kl}(s)ds\}, (50) we find
\displaystyle y_{ni} \displaystyle\overset{\mathcal{L}}{=} \displaystyle\frac{1}{\sqrt{\delta}}a^{kl}\{K_{h}(t_{i1}\tau)\sqrt{\int_{t_{% i1}}^{t_{i}}\theta^{aa}_{kk}(s)ds}\sqrt{\int_{t_{i1}}^{t_{i}}\theta_{ll}^{cc% }(s)ds}U_{kj}U_{lj} (51) \displaystyle\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ % \ \ \ \ \ \ \ \int_{t_{i1}}^{t_{i}}K_{h}(s\tau)\theta^{aa}_{kl}(s)ds\} \displaystyle\overset{\mathcal{L}}{=} \displaystyle\frac{1}{\sqrt{\delta}}a^{kl}\{K_{h}(t_{i1}\tau)\sqrt{\int_{t_{% i1}}^{t_{i}}\theta^{aa}_{kk}(s)ds}\sqrt{\int_{t_{i1}}^{t_{i}}\theta^{cc}_{ll% }(s)ds}U_{kj}U_{lj} \displaystyle\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ % \ \ \ \ K_{h}(t_{i1}\tau)\int_{t_{i1}}^{t_{i}}\theta^{aa}_{kl}(s)ds\} \displaystyle\overset{\mathcal{L}}{=} \displaystyle\frac{1}{\sqrt{\delta}}\{a^{kl}K_{h}(t_{i1}\tau)(\sqrt{\int_{t_% {i1}}^{t_{i}}\theta^{aa}_{kk}(s)ds}\sqrt{\int_{t_{i1}}^{t_{i}}\theta^{cc}_{% ll}(s)ds}(U_{kj}U_{lj}\rho_{kl})\} \displaystyle\overset{\mathcal{L}}{=} \displaystyle\sqrt{\delta}a^{kl}\{K_{h}(t_{i1}\tau)\sqrt{\widehat{\Gamma}_{% ki}\widehat{\Gamma}_{li}}(U_{kj}U_{lj}\rho_{kl})\}, where
\tilde{\Gamma}_{ki}=\frac{1}{\delta}\int_{t_{i1}}^{t_{i}}\theta^{aa}_{kk}(s)ds (52) (53) and
\rho_{kl}=\frac{\int_{t_{i1}}^{t_{i}}\theta^{aa}_{kl}(s)ds)}{\sqrt{\int_{t_{i% 1}}^{t_{i}}\theta^{cc}_{kk}(s)ds}\sqrt{\int_{t_{i1}}^{t_{i}}\theta^{dd}_{ll}% (s)ds}} (54) is the correlation coefficient between U_{k} and U_{l}. By our Assumption on the process \Sigma, as \delta varies the quantities \Gamma are bounded away from 0 and infinity, uniformly in k and j. This implies that
\mathrm{E}[a^{kl}K_{h}(t_{i1}\tau)\sqrt{\widehat{\Gamma}_{ki}\widehat{% \Gamma}_{li}}(U_{kj}U_{lj}\rho_{kl}))^{2+\epsilon}] (55) is uniformly bounded above, and hence, by (51), we have
\sum_{i=1}^{n}\mathrm{E}[y_{ni}^{2+\epsilon}]\to 0 (56) as to be shown. Next, we show that the effect of a nonzero drift term is negligible:
\displaystyle\widehat{KCV}_{kl}(\tau)\widehat{KCV}^{*}_{kl}(\tau)=\sum_{i=1}^% {n}K_{h}(t_{i1}\tau)(\int_{t_{i1}}^{t_{i}}\mu_{k}(s)ds)(\int_{t_{i1}}^{t_{% i}}\mu_{l}(s)ds) \displaystyle+\sum_{i=1}^{n}K_{h}(t_{i1}\tau)(\int_{t_{i1}}^{t_{i}}\mu_{k}(% s)ds)(\int_{t_{i1}}^{t_{i}}\theta_{l}(s)dW(s)) \displaystyle+\sum_{i=1}^{n}K_{h}(t_{i1}\tau)(\int_{t_{i1}}^{t_{i}}\mu_{l}(% s)ds)(\int_{t_{i1}}^{t_{i}}\theta_{k}(s)dW(s)). (57) By Lemma 5 the first term in equation (57) is
\displaystyle\sum_{i=1}^{n}K_{h}(t_{i1}\tau)(\int_{t_{i1}}^{t_{i}}\mu_{k}(s% )ds)(\int_{t_{i1}}^{t_{i}}\mu_{l}(s)ds) \displaystyle=\delta\int_{0}^{T}K_{h}(s\tau)\mu_{k}(s)\mu_{l}(s)ds+o(\delta). (58) The second term is
\displaystyle\sum_{i=1}^{n}K_{h}(t_{i1}\tau)(\int_{t_{i1}}^{t_{i}}\mu_{k}(s% )ds)(\int_{t_{i1}}^{t_{i}}\theta_{l}(s)dW(s)) \displaystyle\sim\mathcal{N}\left(0,\sum_{i=1}^{n}K^{2}_{h}(t_{i1}\tau)(\int% _{t_{i1}}^{t_{i}}\mu_{k}(s)ds)^{2}\int_{t_{i1}}^{t_{i}}\theta_{ll}^{cc}(s)ds\right) and, similarly, the third term
\displaystyle\sum_{i=1}^{n}K_{h}(t_{i1}\tau)(\int_{t_{i1}}^{t_{i}}\mu_{l}(s% )ds)(\int_{t_{i1}}^{t_{i}}\theta_{k}(s)dW(s)) \displaystyle\sim\mathcal{N}\left(0,\sum_{i=1}^{n}K^{2}_{h}(t_{i1}\tau)\int_% {t_{i1}}^{t_{i}}\mu_{l}(s)ds)^{2}\int_{t_{i1}}^{t_{i}}\theta_{kk}^{aa}(s)ds% \right), where
\displaystyle\sum_{i=1}^{n}K^{2}_{h}(t_{i1}\tau)(\int_{t_{i1}}^{t_{i}}\mu_{% k}(s)ds)^{2}\int_{t_{i1}}^{t_{i}}\theta_{ll}^{cc}(s)ds \displaystyle\leq\delta\sup_{s}\theta_{ll}^{cc}(s)\times\sum_{i=1}^{n}K^{2}_{h% }(t_{i1}\tau)(\int_{t_{i1}}^{t_{i}}\mu_{k}(s)ds)^{2} \displaystyle=\delta^{2}\sup_{s}\theta_{ll}^{cc}(s)\times\left(\int_{0}^{T}K^{% 2}_{h}(s\tau)\mu_{k}^{2}(s)ds+o(1)\right) and
\displaystyle\sum_{i=1}^{n}K^{2}_{h}(t_{i1}\tau)(\int_{t_{i1}}^{t_{i}}\mu_{% l}(s)ds)^{2}\int_{t_{i1}}^{t_{i}}\theta_{kk}^{aa}(s)ds \displaystyle\leq\delta\sup_{s}\theta_{kk}^{aa}(s)\times\sum_{i=1}^{n}K^{2}_{h% }(t_{i1}\tau)(\int_{t_{i1}}^{t_{i}}\mu_{l}(s)ds)^{2} \displaystyle=\delta^{2}\sup_{s}\theta_{kk}^{aa}(s)\times\left(\int_{0}^{T}K^{% 2}_{h}(s\tau)\mu_{l}^{2}(s)ds+o(1)\right). Appendix B Proof of Theorem 2
The convergence results in the proof of Theorem 1 still holds when h\to 0. Now, we consider the shrinking bandwidth, h\to 0, and we derive the means and covariances of the varieties
\displaystyle\hat{\Sigma}_{kl}(\tau) \displaystyle= \displaystyle\sum_{i=1}^{n}K_{h}(t_{i1}\tau)\Delta X_{k}(t_{i1})\Delta X_{l% }(t_{i1}) (59) \displaystyle= \displaystyle\sum_{i=1}^{n}K_{h}(t_{i1}\tau)\left(X_{k}(t_{i})X_{k}(t_{i1}% )\right)\left(X_{l}(t_{i})X_{l}(t_{i1})\right). (60) Following the proof of Theorem 1 and applying Lemma 6 we obtain:
\sum_{i=1}^{n}K_{h}(t_{i1}t)\int_{t_{i}1}^{t_{i}}\theta_{kl}^{aa}(s)dds=% \theta_{kl}^{aa}+h^{m+\gamma}\mathcal{K}(\tau,0)\int_{\mathbb{R}}K(z)z^{m+% \gamma}dz+O\left(\frac{\delta}{h}\right)+O(h^{m+\gamma}) where \mathcal{K}(\tau,0) denotes ”Lipschitz coefficient” of \theta_{kl}(s). Thus we have:
\mathrm{E}\left[\hat{\Sigma}_{kl}(\tau)\right]=\Sigma_{kl}(\tau). (61) For deriving the covariance of the varieties in (60) we use the following result from proof of theorem 1:
\displaystyle\text{Cov}\left\{\hat{\Sigma}_{kl}(\tau),\hat{\Sigma}_{k^{\prime}% l^{\prime}}(\tau)\right\}= \displaystyle\sum_{i=1}^{n}K_{h}^{2}(t_{i1}\tau)\{\int_{t_{i1}}^{t_{i}}% \theta^{aa}_{kk^{\prime}}(s)ds\int_{t_{i1}}^{t_{i}}\theta^{cc}_{ll^{\prime}}(% s)ds+\int_{t_{i1}}^{t_{i}}\theta^{aa}_{kl^{\prime}}(s)ds\int_{t_{i1}}^{t_{i}% }\theta^{cc}_{lk^{\prime}}(s)ds\}. Now we using Lemma 6 and invoking Riemann integration we obtain:
\displaystyle\delta^{1}h\sum_{i=1}^{n}K_{h}^{2}(t_{i1}\tau)\left\{\int_{t_{% i1}}^{t_{i}}\theta^{aa}_{kk^{\prime}}(s)ds\int_{t_{i1}}^{t_{i}}\theta^{cc}_{% ll^{\prime}}(s)ds+\int_{t_{i1}}^{t_{i}}\theta^{aa}_{kl^{\prime}}(s)ds\int_{t_% {i1}}^{t_{i}}\theta^{cc}_{lk^{\prime}}(s)ds\right\} \displaystyle\to\Omega_{kl,k^{\prime}l^{\prime}}(\tau)\int_{\mathbb{R}}K^{2}(z% )dz, where \Omega_{k,l,k^{\prime},l^{\prime}}(\tau) is defined in equation (10). One can easily show the asymptotic normality by following Section A.3 in the proof of Theorem 1 by applying CramerWold device, i.e. to show that for any real constants a^{kl} we have, as \delta\to 0 and h\to 0:
{\sqrt{\delta^{1}h}}(a^{kl}(\widehat{\Sigma}_{kl}(\tau)\Sigma_{kl}(\tau))% \overset{\mathcal{L}}{\to}\mathcal{N}(0,a^{kl}a^{k^{\prime}l^{\prime}}\Omega_{% kl,k^{\prime}l^{\prime}}(\tau)\int_{\mathbb{R}}K^{2}(z)dz). (62) Appendix C Proof of Theorem 3
Here we follow the notation in section A.1 with \Sigma_{kl}(t) denoting the (k,l)th element of spot covariance matrix at time t (see equation (35)). We first derive the asymptotic distribution of elements (\Sigma_{kl})_{k,l=1,...d} of the covariance matrix by following Yu et al. (2014) and then using CramérWold theorem prove multivariate convergence in distribution using univariate results.
Let \widehat{TCV}_{kl} denote the (k,l)th component of the estimator and X^{*} denote the diffusion part of X. So, we have
\displaystyle\sqrt{n}\frac{\widehat{TCV}_{kl}\int_{0}^{T}K_{h}(st)\Sigma_{kl% }(s)ds}{\sqrt{\int_{0}^{T}K^{2}_{h}(st)\Omega_{kl,k^{\prime}l^{\prime}}(s)ds}} (63) \displaystyle= \displaystyle\sqrt{n}\frac{\sum_{i=1}^{n}K_{h}(t_{i1}t)\Delta_{i1}X^{*}_{kl% }\Delta_{i1}X^{*}_{k^{\prime}l^{\prime}}\mathbbm{1}_{\{\Delta_{i1}N=0\}}% \int_{0}^{T}K_{h}(st)\Sigma_{kl}(s)ds}{\sqrt{\int_{0}^{T}K^{2}_{h}(st)\Omega% _{kl,k^{\prime}l^{\prime}}(s)ds}} \displaystyle= \displaystyle\sqrt{n}\frac{\sum_{i=1}^{n}K_{h}(t_{i1}t)\Delta_{i1}X^{*}_{kl% }\Delta_{i1}X^{*}_{k^{\prime}l^{\prime}}\int_{0}^{T}K_{h}(st)\Sigma_{kl}(s)% ds}{\sqrt{\int_{0}^{T}K^{2}_{h}(st)\Omega_{kl,k^{\prime}l^{\prime}}(s)ds}} \displaystyle \displaystyle\sqrt{n}\frac{\sum_{i=1}^{n}K_{h}(t_{i1}t)\Delta_{i1}X^{*}_{kl% }\Delta_{i1}X^{*}_{k^{\prime}l^{\prime}}\mathbbm{1}_{\{\Delta_{i1}N\neq 0\}}% }{\sqrt{\int_{0}^{T}K^{2}_{h}(st)\Omega_{kl,k^{\prime}l^{\prime}}(s)ds}}. The first term in equation (63) for the fixed h, as \delta\to 0 is
\sqrt{n}\frac{\sum_{i=1}^{n}K_{h}(t_{i1}t)\Delta_{i1}X^{*}_{kl}\Delta_{i1}% X^{*}_{k^{\prime}l^{\prime}}\int_{0}^{T}K_{h}(st)\Sigma_{kl}(s)ds}{\sqrt{% \int_{0}^{T}K^{2}_{h}(st)\Omega_{kl,k^{\prime}l^{\prime}}(s)ds}}\overset{% \mathcal{L}}{\to}N(0,1). (64) Now, the assumption 4 states that the kernel K is bounded K_{h}(t_{i1}t_{i})\leq\Lambda/h for some constant \Lambda. the number of jumps occurring over the interval [0,T] is finite. Then the second term in equation (63)
\displaystyle\sqrt{n}\frac{\left\sum_{i=1}^{n}K_{h}(t_{i1}t)\Delta_{i1}X^{% *}_{kl}\Delta_{i1}X^{*}_{k^{\prime}l^{\prime}}\mathbbm{1}_{\{\Delta_{i1}N% \neq 0\}}\right}{\sqrt{\int_{0}^{T}K^{2}_{h}(st)\Omega_{kl,k^{\prime}l^{% \prime}}(s)ds}} \displaystyle\leq (65) \displaystyle\sqrt{n}\frac{\frac{\Lambda}{h}\sum_{i=1}^{n}\Delta_{i1}X^{*}_{% kl}\Delta_{i1}X^{*}_{k^{\prime}l^{\prime}}\mathbbm{1}_{\{\Delta_{i1}N\neq 0% \}}}{\sqrt{\int_{0}^{T}K^{2}_{h}(st)\Omega_{kl,k^{\prime}l^{\prime}}(s)ds}} \displaystyle\leq \displaystyle\sqrt{n}\frac{N_{T}\times\frac{\Lambda}{h}\times\sup\int_{t_{i1}% }^{t_{i}}\theta_{kl}(s)dW_{s}\int_{t_{i1}}^{t_{i}}\theta_{k^{\prime}l^{\prime% }}(s)dB_{s}}{\sqrt{\int_{0}^{T}K^{2}_{h}(st)\Omega_{kl,k^{\prime}l^{\prime}}(% s)ds}}. Here the integral \int_{0}^{t}\theta_{kl}(s)dW_{s}, \int_{0}^{t}\theta_{k^{\prime}l^{\prime}}(s)dB_{s} are time changed Brownian motion and by the Levy law of the modulus of continuity of Brownian motion’s path Karatzas and Shreve (1999), for small s we have
\sup_{i\in{1,...,n}}\frac{\left\int_{t_{i1}}^{t_{i}}\theta_{kl}(s)dW_{s}% \right}{\sqrt{2\delta\log\frac{1}{\delta}}}\leq\sqrt{M},\ \ \ \ \sup_{i\in{1,% ...,n}}\frac{\left\int_{t_{i1}}^{t_{i}}\theta_{k^{\prime}l^{\prime}}(s)dB_{s% }\right}{\sqrt{2\delta\log\delta^{1}}}\leq\sqrt{L}, (66) where M,L are a nonnegative constants. Therefor the last term in equation (63) is
\displaystyle\sqrt{n}\frac{\left\sum_{i=1}^{n}K_{h}(t_{i1}t)\Delta_{i1}X^{% *}_{kl}\Delta_{i1}X^{*}_{k^{\prime}l^{\prime}}\mathbbm{1}_{\{\Delta_{i1}N% \neq 0\}}\right}{\sqrt{\int_{0}^{T}K^{2}_{h}(st)\Omega_{kl,k^{\prime}l^{% \prime}}(s)ds}}\leq (67) \displaystyle\sqrt{n}\frac{N_{T}\times\frac{\Lambda}{h}\times\sqrt{M}\times% \sqrt{L}\times 2\delta\log\delta^{1}}{\sqrt{\int_{0}^{T}K^{2}_{h}(st)\Omega_% {kl,k^{\prime}l^{\prime}}(s)ds}}\overset{P}{\to}0. To prove multivariate convergence, given that we have asymptotic distribution of the elements of the covariance matrix, we employ CramérWold device:
Lemma 4.
For any real a\in\mathbb{R}^{d\times d}, as \delta\to 0
a^{T}\left(\frac{1}{\sqrt{\delta}}(\widehat{TCV}(t)\int_{0}^{T}K_{h}(st)% \Sigma(s)ds)\right)\overset{\mathcal{L}}{\to}a^{T}N\left(0,\int_{0}^{T}K_{h}^{% 2}(st)\Omega(s)ds\right)a. (68) Proof.
This follows from univariate case, since a^{kl}\left(\frac{1}{\sqrt{\delta}}(\widehat{TCV}_{kl}(t)\int_{0}^{T}K_{h}(s% t)\Sigma_{kl}(s)ds)\right) for k,l=1,...,d are independent and identically distributed with mean 0 and variance a^{kl}a^{k^{\prime}l^{\prime}}\int_{0}^{T}K_{h}^{2}(st)\Omega_{kl,k^{\prime}l% ^{\prime}}(s)ds. ∎
Appendix D Lemmas
We rewrite the Lemma 6 and Lemma 7 from Kristensen (2010) in terms of the components of covariance matrix.
Lemma 5.
Under Assumption 2 and Assumption 3, we have for every k,l=1,\cdots,d
\displaystyle(i)\ \ \sum_{i=1}^{n}K_{h}(t_{i1}\tau)\int_{t_{i1}}^{t_{i}}% \Sigma_{kl}(s)ds=\int_{0}^{T}K_{h}(s\tau)\Sigma_{kl}(s)ds+o(\delta)\bar{K}_{1}, \displaystyle(ii)\ \ \delta^{1}\sum K_{h}^{2}(t_{i1}\tau)\left(\int_{t_{i1% }}^{t_{i}}\Sigma_{kl}(s)ds\right)^{2}=\int_{0}^{T}K_{h}^{2}(s\tau)\Sigma_{kl}% ^{2}(s)ds+o(1)\times\bar{K}_{0} \displaystyle\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ % \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ +O(\delta)\times\bar{K}_{1} uniformly over \tau\in[0,T], as \delta\to 0.
Proof.
See Kristensen (2010) Lemma 6. ∎
Lemma 6.
Under Assumption 4 a, b and Assumption 5, uniformly over \tau\in[a,Ta], as \delta,h,a/h\to 0 we have:
\displaystyle(i)\ \ \sum_{i=1}^{n}K_{h}(t_{i1}\tau)\int_{t_{i1}}^{t_{i}}% \Sigma_{kl}(s)ds=\Sigma_{kl}(\tau)+h^{m+\gamma}\mathcal{K}(\tau,0)\int_{% \mathbb{R}}K(z)z^{m+\gamma}dz+ \displaystyle\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ % \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ O\left(\frac{\delta}{h}\right)+o(h^{m+\gamma}), \displaystyle(ii)\ \ \sum K_{h}^{2}(t_{i1}\tau)\left(\int_{t_{i1}}^{t_{i}}% \Sigma_{kl}(s)ds\right)^{2}=\frac{\delta}{h}\Sigma_{kl}^{2}(\tau)\int_{\mathbb% {R}}K^{2}(z)dz+O\left(\frac{\delta^{1+\gamma}}{h}\right)+O\left(\frac{\delta^{% 2}}{h^{2}}\right). Proof.
See Kristensen (2010) Lemma 7. ∎
