Low-rank diffusion matrix estimation for high-dimensional time-changed Lévy processes
Abstract
The estimation of the diffusion matrix Σ of a high-dimensional, possibly time-changed Lévy process is studied, based on discrete observations of the process with a fixed distance. A low-rank condition is imposed on Σ. Applying a spectral approach, we construct a weighted least-squares estimator with nuclear-norm penalisation. We prove oracle inequalities and derive convergence rates for the diffusion matrix estimator. The convergence rates show a surprising dependency on the rank of Σ and are optimal in the minimax sense for fixed dimensions. Theoretical results are illustrated by a simulation study.
Volatility estimation, Lasso-type estimator, minimax convergence rates, nonlinear inverse problem, oracle inequalities, time-changed Lévy process
1 Introduction
Nonparametric statistical inference for Lévy-type processes has been attracting the attention of researchers for many years, initiated by the works of Rubin and Tucker [35] as well as Basawa and Brockwell [5]. The popularity of Lévy processes is related to their simplicity on the one hand and their ability to reproduce many stylised features present in economic and financial data, or to model stochastic phenomena in biology and physics, on the other hand. While nonparametric inference for one-dimensional Lévy processes is nowadays well understood (see, e.g., the lecture notes [7]), there are surprisingly few results for multidimensional Lévy or related jump processes. Possible applications, however, call for multi- or even high-dimensional methods. As a first contribution to statistics for jump processes in high dimensions, we investigate the optimal estimation of the diffusion matrix Σ of a d-dimensional Lévy process, where the dimension d may grow with the number of observations.
More generally, we consider the larger class of time-changed Lévy processes. Let L be a d-dimensional Lévy process and let 𝒯 = (𝒯(t))_{t≥0} be a non-negative, non-decreasing stochastic process with 𝒯(0) = 0. Then the time-changed Lévy process is defined via Y_t := L_{𝒯(t)} for t ≥ 0. For instance, in the financial context the change of time is motivated by the fact that some economic effects (such as the nervousness of the market, which is indicated by volatility) can be better understood in terms of a “business time” which may run faster than the physical one in some periods (see, e.g., Veraart and Winkel [41]). In view of large portfolios and the huge number of assets traded at the stock markets, we have high-dimensional observations.
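For intuition, such a process is straightforward to simulate once the time change is. The following Python sketch (our illustration; the rank-2 diffusion matrix and the gamma time change are arbitrary choices, not taken from the paper) subordinates a correlated Brownian motion and checks that the increment covariance equals the mean time-change increment times the diffusion matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

def time_changed_bm(n, delta, Sigma, rng):
    """Simulate n increments of Y_t = L_{T(t)} on a grid of width delta,
    where L is a Brownian motion with diffusion matrix Sigma and T is a
    gamma subordinator with unit parameters, independent of L."""
    d = Sigma.shape[0]
    # time-change increments T(k*delta) - T((k-1)*delta) ~ Gamma(delta, 1), i.i.d.
    tau = rng.gamma(shape=delta, scale=1.0, size=n)
    # conditionally on tau, each increment is N(0, tau_k * Sigma)
    root = np.linalg.cholesky(Sigma + 1e-12 * np.eye(d))  # jitter for the rank-deficient case
    z = rng.standard_normal((n, d))
    return np.sqrt(tau)[:, None] * (z @ root.T)

Sigma = np.diag([1.0, 0.5, 0.0])  # rank-2 diffusion matrix
incr = time_changed_bm(20_000, 1.0, Sigma, rng)
# the increment covariance is E[tau] * Sigma = delta * Sigma
print(np.round(np.cov(incr, rowvar=False), 2))
```

With delta = 1 the empirical covariance of the increments should be close to Sigma itself, including the vanishing third eigendirection.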
Monroe [28] has shown that even in the case of a Brownian motion L, the resulting class of time-changed Lévy processes is rather large and basically coincides, at least in the one-dimensional case, with the class of all semimartingales. Nevertheless, a practical application of this fact for statistical modelling meets two major problems: first, the change of time can be highly intricate, for instance if 𝒯 has discontinuous trajectories, cf. Barndorff-Nielsen and Shiryaev [4]; second, the dependence structure between L and 𝒯 can also be quite involved. In order to avoid these difficulties and to achieve identifiability, we allow L to be a general Lévy process, but assume independence of L and 𝒯.
A first natural question is which parameters of the underlying Lévy process can be identified from discrete observations of Y at equidistant times kΔ, k = 0, …, n, as n → ∞. This question has been recently addressed in the literature (see Belomestny [6] and references therein) and the answer turns out to depend crucially on the asymptotic behaviour of Δ and on the degree of our knowledge about 𝒯. We study the so-called low-frequency regime, meaning that the observation distance Δ is fixed. In this case, we have severe identifiability problems, which can easily be seen from the fact that subordinated Lévy processes are again Lévy processes. Assuming the time-change is known, all parameters of the underlying Lévy process can be identified as n → ∞. Since any Lévy process can be uniquely parametrised by the so-called Lévy triplet (Σ, γ, ν) with a positive semidefinite diffusion (or volatility) matrix Σ, a drift vector γ and a jump measure ν, we face a semiparametric estimation problem. Describing the covariance of the diffusion part of the Lévy process, the matrix Σ is of particular interest. Aiming for a high-dimensional setting, we study the estimation of Σ under the assumption that it has low rank. This low-rank condition reflects the idea of a few principal components driving the whole process. We also discuss the case where the diffusion matrix can be decomposed into a low-rank matrix and a sparse matrix, as has been suggested by Fan et al. [17, 18].
For discretely observed Lévy processes it has been recently shown that estimation methods designed for low-frequency observations attain optimal convergence results also in the high-frequency case, where Δ = Δ_n → 0, cf. Jacod and Reiß [21] for volatility estimation and Nickl et al. [31] for the estimation of the jump measure. In contrast to estimators which are tailor-made for high-frequency observations, these low-frequency methods thus turn out to be robust with respect to the sampling frequency. This observation is a key motivation for considering a fixed Δ in the present article. It is very common in the literature on statistics for stochastic processes that the observations are supposed to be equidistant. In view of the empirical results by [2], this model assumption might, however, often be violated in financial applications. Our observation scheme can be equivalently interpreted as random observation times of the Lévy process L. For one-dimensional Lévy processes general observation distances have been studied by Kappus [23].
The research area on statistical inference for discretely observed stochastic processes is rapidly growing. Let us only mention some closely related contributions, while we refer to [7] for a recent overview of estimation for Lévy processes. Nonparametric estimation for time-changed Lévy models has been studied in [6, 8, 13]. In a two-dimensional setting the jump measure of a Lévy process has been estimated in [12]. For inference on the volatility matrix (in small dimensions) for continuous semimartingales in a high-frequency regime we refer to [20, 10] and references therein. Large sparse volatility matrix estimation for continuous Itô processes has been recently studied in [42, 37, 38].
Our statistical problem turns out to be closely related to a deconvolution problem for a multivariate normal distribution with zero mean and covariance matrix Σ, convolved with a nuisance distribution which is unknown except for some of its structural properties. Due to the time-change or random sampling, respectively, the normal vector is additionally multiplied by a non-negative random variable. Hence, we face a covariance estimation problem in a generalised mixture model. Since we have no direct access to a sample of the underlying normal distribution, our situation differs considerably from the inference problems which have been studied in the literature on high-dimensional covariance matrix estimation so far.
The analysis of many deconvolution and mixture models becomes more transparent in the spectral domain. Our accordingly reformulated problem bears some similarity to the so-called trace regression problem and the related matrix estimation, which have recently received considerable attention in the statistical literature (see, e.g., [34, 29, 1, 24]). We face a nonlinear analogue of the estimation of low-rank matrices based on noisy observations. Adapting some ideas, especially from Koltchinskii et al. [24], we construct a weighted least-squares estimator with nuclear-norm penalisation in the spectral domain.
We prove oracle inequalities for the estimator of Σ, implying that the estimator adapts to the low-rank condition on Σ and that it also applies in the misspecified case of only approximate sparsity. The resulting convergence rate fundamentally depends on the time-change, while the dimension d may grow polynomially in the sample size. Lower bounds verify that the rates are optimal in the minimax sense for fixed dimensions. The influence of the rank of Σ on the convergence rate reveals a new phenomenon which has not been observed in multidimensional nonparametric statistics before: namely, in a certain regime the convergence rate in n is faster for a larger rank of Σ. To understand this surprising phase transition from a more abstract perspective, we briefly discuss a related regression model.
The paper is organised as follows. In Section 2, we introduce the notation and formulate the statistical model. In Section 3 the estimator for the diffusion matrix is constructed and oracle inequalities for this estimator are derived. Based on these oracle inequalities, we derive in Section 4 upper and lower bounds for the convergence rate. Section 5 is devoted to several extensions, including mixing time-changes, incorporating a sparse component of the diffusion matrix and the related trace-regression model. Some numerical examples can be found in Section 6. In Sections 7 and 8 the proofs of the oracle inequalities and of the minimax convergence rates, respectively, are collected. The appendix contains some (uniform) concentration results for multivariate characteristic functions which are of independent interest.
Acknowledgement. The authors thank Markus Reiß and two anonymous referees for helpful comments. D.B. acknowledges the financial support from the Russian Academic Excellence Project “5-100” and from the Deutsche Forschungsgemeinschaft (DFG) through the SFB 823 “Statistical modelling of nonlinear dynamic processes”. M.T. gratefully acknowledges the financial support by the DFG research fellowship TR 1349/1-1. A part of this paper has been written while M.T. was affiliated with the Université Paris-Dauphine.
2 Main setup
Recall that a random time change 𝒯 = (𝒯(t))_{t≥0} is an increasing right-continuous process with left limits such that 𝒯(0) = 0. For each t ≥ 0 the random variable 𝒯(t) is a stopping time with respect to the underlying filtration. For a Lévy process L the time-changed Lévy process is given by Y_t = L_{𝒯(t)}. We throughout assume that 𝒯 is independent of L. By reparametrising the time change we can assume without loss of generality that Δ = 1, so that the observations are given by the increments
for k = 1, …, n. Note that in general the observations are not independent, in contrast to the special case of low-frequency observations of a Lévy process. The estimation procedure relies on the following crucial insight: if the sequence of time-change increments
𝒯(k) − 𝒯(k − 1), k ≥ 1,
is stationary and admits an invariant measure π, then the independence of 𝒯 and L together with the Lévy–Khintchine formula yields that the observations have a common characteristic function given by
φ(u) := ℒ(−ψ(u)), u ∈ ℝ^d,
where ℒ(z) := ∫ e^{−zt} π(dt) is the Laplace transform of π and where the characteristic exponent ψ is given by
ψ(u) = −½⟨u, Σu⟩ + i⟨γ, u⟩ + ∫_{ℝ^d} (e^{i⟨u,x⟩} − 1 − i⟨u, x⟩1_{{|x| ≤ 1}}) ν(dx) (2.1)
with the diffusion matrix Σ ∈ ℝ^{d×d}, a drift parameter γ ∈ ℝ^d and a jump measure ν. If ν = 0 and 𝒯(t) = t, we end up with the problem of estimating a covariance matrix. The integral term in (2.1) appears due to the presence of jumps and can be viewed as a nuisance parameter. Let us give some examples of typical time-changes and their Laplace transforms.
Examples 2.1.

- Low-frequency observations of L with observation distance Δ: ℒ(u) = e^{−Δu}.

- Poisson process time-change or exponential waiting times with intensity parameter λ > 0: ℒ(u) = exp(λ(e^{−u} − 1)).

- Gamma process time-change with parameters c, λ > 0: ℒ(u) = (1 + u/λ)^{−c}.

- Integrated CIR process 𝒯(t) = ∫₀ᵗ V_s ds for a Cox–Ingersoll–Ross process V (which has mixing increments, cf. [26]); the Laplace transform is known in closed form.
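The first three Laplace transforms admit simple closed forms. The Python lines below (illustrative parameter choices of our own, with the observation distance normalised to one) write them down and verify the gamma case by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(1)
u = 0.7  # point at which to evaluate L(u) = E[exp(-u * T(1))]

lap_deterministic = np.exp(-u)          # T(t) = t (low-frequency sampling, Delta = 1)
lap_poisson = np.exp(np.exp(-u) - 1.0)  # Poisson process with intensity 1
lap_gamma = (1.0 + u) ** (-1.0)         # gamma subordinator with unit parameters

# Monte Carlo check of the gamma case: T(1) ~ Gamma(1, 1), i.e. Exp(1)
samples = rng.gamma(shape=1.0, scale=1.0, size=200_000)
mc = np.exp(-u * samples).mean()
print(mc, lap_gamma)
```

The exponential decay in u of the deterministic case versus the polynomial decay of the gamma case is exactly the distinction that later separates the severely and mildly ill-posed regimes.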
If the time change is unknown, one faces severe identifiability problems. Indeed, even in the case of a multivariate time-changed Brownian motion we can identify the matrix Σ and the time change 𝒯 only up to a multiplicative factor. More generally, if L is not restricted to be a Brownian motion, there is much more ambiguity, as the following example illustrates.
Example 2.2.
Consider two time changes 𝒯₁(t) = G_t and 𝒯₂(t) = t, where G is a Gamma subordinator with unit parameters and the second time change is deterministic. We have ℒ₁(u) = (1 + u)^{−1} and ℒ₂(u) = e^{−u}. Note that ℒ₁(u) = exp(−log(1 + u)). Let X be a one-dimensional Lévy process with characteristic exponent ψ and let X̃ be another Lévy process with characteristic exponent ψ̃(u) = −log(1 − ψ(u)) (the existence of such a Lévy process follows from the well-known fact that (1 − ψ(u))^{−1} is the characteristic function of some infinitely divisible distribution). Then we obviously have ℒ₁(−ψ(u)) = ℒ₂(−ψ̃(u)) for all u. Hence, the corresponding time-changed Lévy processes Y_t = X_{𝒯₁(t)} and Ỹ_t = X̃_{𝒯₂(t)} have increments which coincide in distribution for any k. Moreover, the increments are independent in this case.
The above example shows that, even under additional normalisation assumptions, we cannot in general consistently estimate the parameters of a one-dimensional time-changed Lévy process from low-frequency observations. As shown by Belomestny [6], all parameters of a d-dimensional (with d ≥ 2) Lévy process can be identified under the additional assumption that all components of L are independent. However, this assumption would imply that the matrix Σ is diagonal and is thus much too restrictive for our setting. Therefore, we throughout assume that the time-change, and consequently its Laplace transform ℒ, are known; see Section 5.1 for a discussion of the case of unknown 𝒯. Let us note that similar identification problems appear even in the high-frequency setting, see [19].
Before we construct the volatility matrix estimator in the next section, let us introduce some notation. Recall the Frobenius or trace inner product ⟨A, B⟩ := tr(AᵀB) for matrices A, B ∈ ℝ^{d×d}. For p ∈ [1, ∞] the Schatten-p norm of a matrix A is given by the ℓ^p-norm of its singular values σ₁(A), σ₂(A), …. In particular, ‖A‖₁, ‖A‖₂ and ‖A‖_∞ denote the nuclear, the Frobenius and the spectral norm of A, respectively. We will frequently apply the trace duality property
|⟨A, B⟩| ≤ ‖A‖_∞ ‖B‖₁.
Moreover, we may consider the entrywise ℓ^p-norms for p ∈ [1, ∞], with the usual modification that ‖A‖_{ℓ⁰} denotes the number of non-zero entries of A.
For symmetric matrices A, B we write A ≤ B if B − A is positive semidefinite.
For two sequences (a_n), (b_n) we write a_n ≲ b_n if there is a constant C > 0, independent of n and the involved parameters, such that a_n ≤ C b_n.
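The Schatten norms and the trace duality above are easy to check numerically; the following Python snippet is a plain illustration of the definitions, not part of the estimation procedure.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

def schatten(M, p):
    """Schatten p-norm: the l^p norm of the vector of singular values of M."""
    return np.linalg.norm(np.linalg.svd(M, compute_uv=False), ord=p)

nuclear = schatten(A, 1)        # ||A||_1, the nuclear norm
frobenius = schatten(A, 2)      # ||A||_2, equals the entrywise l^2 norm
spectral = schatten(A, np.inf)  # ||A||_infinity, the largest singular value

# trace duality: |<A, B>| <= ||A||_infinity * ||B||_1
inner = np.trace(A.T @ B)
print(abs(inner) <= spectral * schatten(B, 1))
```

The ordering spectral ≤ Frobenius ≤ nuclear, which is used implicitly throughout the paper, also follows directly from the ℓ^p-norm inequalities on the singular values.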
3 The estimator and oracle inequalities
In view of the Lévy–Khintchine formula, we apply a spectral approach. The natural estimator of the characteristic function φ is the empirical characteristic function
φ_n(u) := (1/n) Σ_{k=1}^n e^{i⟨u, Y_k − Y_{k−1}⟩},
which is consistent whenever the sequence of time-change increments is ergodic. Even more, it concentrates around the true characteristic function with parametric rate uniformly on compact sets, cf. Theorems A.2 and A.4 in the appendix. A plug-in approach yields the estimator of the characteristic exponent given by
ψ̂_n(u) := ℒ^{−1}(φ_n(u)), (3.1)
where ℒ^{−1} denotes a continuously chosen inverse of the map ℒ. Since Σ only appears in the real part of the characteristic exponent, we may use Re ψ̂_n.
Remark 3.1.
For high-frequency observations, that is Δ = Δ_n → 0, the characteristic function of the observations can be written in terms of the characteristic exponent ψ as before and the Laplace transform of the rescaled time-change increments, which may be assumed to be stationary. Under some regularity assumption on the Laplace transform, we obtain an asymptotic expansion.
Instead of the inversion in (3.1), a simpler estimator can be used in this case, depending on the Laplace transform only through the first moment of the rescaled time-change increments. However, this asymptotic identification can only be applied in the high-frequency regime Δ → 0, while the estimator (3.1) is robust with respect to the sampling frequency.
Applying Re ψ̂_n, estimating the low-rank matrix Σ can thus be reformulated as the regression problem
−2|u|^{−2} Re ψ̂_n(u) = ⟨X_u, Σ⟩ − 2|u|^{−2} Re Ψ(u) + ξ(u), X_u := uuᵀ/|u|², (3.2)
where Ψ collects the jump part of the characteristic exponent, ξ(u) denotes the stochastic error and we have normalised the whole formula by the factor |u|^{−2}. The design matrix X_u for an arbitrary unit vector u/|u| is deterministic and degenerate. The second term in (3.2) is a deterministic error which will be small only for large |u|. The last term reflects the stochastic error. Due to the identification via the Laplace transform ℒ, the estimation problem is nonlinear and turns out to be ill-posed: the stochastic error grows for large frequencies.
Inspired by the regression formula (3.2), we define the diffusion matrix estimator as a penalised least-squares-type estimator with regularisation parameter λ > 0 and spectral cut-off U > 0:
Σ̂ := argmin_{S ∈ 𝒮} { ∫ w_U(u) (−2|u|^{−2} Re ψ̂_n(u) − ⟨X_u, S⟩)² du + λ‖S‖₁ }, (3.3)
where 𝒮 is a subset of the positive semidefinite matrices and w_U is a weight function. We impose the following standing assumption on the weight function, which is chosen by the practitioner.
Assumption A.
Let w be a radial, non-negative function supported on the annulus {u ∈ ℝ^d : 1/2 ≤ |u| ≤ 1}. For any U > 0 let w_U(u) := U^{−d} w(u/U).
In the special case of a Lévy process with finite jump activity λ_ν := ν(ℝ^d) < ∞, we can write the remainder Ψ from (2.1) as
Ψ(u) = i⟨γ₀, u⟩ + ℱν(u) − λ_ν
for a modified drift γ₀ and the Fourier transform ℱν(u) := ∫ e^{i⟨u,x⟩} ν(dx). Since ℱν converges to zero as |u| → ∞ under a mild regularity assumption on the finite measure ν, we can reduce the bias of our estimator by the following modification:
(Σ̂, λ̂_ν) := argmin_{S ∈ 𝒮, m ∈ I} { ∫ w_U(u) (−2|u|^{−2} Re ψ̂_n(u) − ⟨X_u, S⟩ − 2|u|^{−2} m)² du + λ(‖S‖₁ + U^{−2}|m|) } (3.4)
with some interval I ⊆ [0, ∞). The most interesting cases are I = {0}, where both estimators coincide, and I = [0, ∞). The factor U^{−2} in front of |m| is natural, since the ill-posedness of the estimation problem for the jump activity is two degrees larger than for estimating the volatility, cf. Belomestny and Reiß [9]. As a side product, λ̂_ν is an estimator for the jump intensity. By penalising large m, the estimator is pushed back to zero if the least-squares part cannot profit from a finite m. It thus coincides with the convention to set the estimator to zero in the infinite intensity case.
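To isolate the role of the nuclear-norm penalty, it is instructive to look at a drastically simplified analogue of (3.3): a single noisy symmetric observation of the matrix in place of the weighted integral over frequencies. In this toy model (our own simplification, not the paper's estimator) the penalised least-squares problem is solved exactly by soft-thresholding the eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(4)
d, r = 8, 2

# ground-truth rank-r positive semidefinite matrix (arbitrary example)
U = rng.standard_normal((d, r))
Sigma = U @ U.T

# a single noisy symmetric observation of Sigma
noise = rng.standard_normal((d, d))
M = Sigma + 0.1 * (noise + noise.T) / np.sqrt(2)

def svt(M, lam):
    """Minimiser of ||M - S||_F^2 / 2 + lam * ||S||_nuclear over all S,
    computed for symmetric M by soft-thresholding the eigenvalues."""
    w, V = np.linalg.eigh(M)
    w = np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)
    return (V * w) @ V.T

Sigma_hat = svt(M, lam=1.0)  # lam of the order of the spectral norm of the noise
print(np.round(np.linalg.eigvalsh(Sigma_hat), 2))
```

Choosing lam slightly above the spectral norm of the noise kills the spurious directions while shrinking the leading eigenvalues only mildly, which is the mechanism behind the rank adaptivity of the oracle inequalities below.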
Our estimators Σ̂ and λ̂_ν are related to the weighted scalar product
⟨A, B⟩_w := ∫ w_U(u) ⟨X_u, A⟩⟨X_u, B⟩ du
with the usual notation ‖A‖_w² := ⟨A, A⟩_w. For convenience we abbreviate ‖·‖_w := ‖·‖_{w_U}. The scalar product ⟨·, ·⟩_w is the counterpart to the empirical scalar product in the matrix estimation literature. As the following lemma reveals, the weighted norm and the Frobenius norm are equivalent. Since this norm equivalence is not restricted to any sparse subclass of matrices, it in particular implies the often imposed restricted isometry property. The proof is postponed to Section 7.1.
Lemma 3.2.
For any positive semidefinite matrix S ∈ ℝ^{d×d} and any U > 0 it holds
c_w ‖S‖₂² ≤ ‖S‖_w² ≤ C_w ‖S‖₂², (3.5)
where c_w and C_w are positive constants depending only on the weight function w.
Using well-known calculations from the Lasso literature, we obtain the following elementary oracle inequality, which is proven in Section 7.2. The condition Σ ∈ 𝒮 imposed below is trivially satisfied if 𝒮 is the whole cone of positive semidefinite matrices.
Proposition 3.3.
Let 𝒮 be an arbitrary subset of matrices and define the error term
(3.6) 
where we set the quotient to zero whenever the denominator vanishes. Suppose Σ ∈ 𝒮. On the event that the error term from (3.6) is bounded by λ/2 for some λ > 0, we have
(3.7) 
If the set 𝒮 is convex, then subdifferential calculus can be used to refine the inequality (3.7). The proof is inspired by Koltchinskii et al. [24, Thm. 1] and postponed to Section 7.3. Note that this second oracle inequality improves (3.7) in two respects: the bound is quadratic in the regularisation parameter λ, and the nuclear norm of the oracle matrix is replaced by its rank.
Theorem 3.4.
This oracle inequality is purely non-asymptotic and sharp in the sense that we have the constant one in front of the bias term on the right-hand side. Combining Lemma 3.2 and Theorem 3.4 immediately yields an oracle inequality for the error in the Frobenius norm.
Corollary 3.5.
Let 𝒮 be a convex subset of the positive semidefinite matrices and let Σ ∈ 𝒮. On the event that the error term from (3.6) is bounded by λ/2 for some λ > 0, we have
for a constant depending only on the weight function w.
The remaining question is how the parameters λ and U should be chosen in order to control the event appearing in the oracle inequalities. The answer will be given in the next section.
4 Convergence rates
The above oracle inequalities hold true for any Lévy process and any independent time-change with stationary increments. However, to find λ and U such that the probability of the corresponding event with the error term from (3.6) is indeed large, we naturally need to impose some assumptions. For simplicity we will concentrate on the well-specified case, noting that, thanks to Corollary 3.5, the results carry over to the misspecified situation.
Assumption B.
Let the jump measure ν of L fulfil:

for some and for some constant let
(4.1) 
for some .
Assumption C.
Let the time-change 𝒯 satisfy:

for some .

The sequence 𝒯(k) − 𝒯(k − 1), k ≥ 1, is mutually independent and identically distributed with some law π on [0, ∞).

The derivatives of the Laplace transform satisfy for some for all with .
If the Laplace transform decays polynomially, we may impose the stronger assumption

for some for all with .
Remarks 4.1.

For α > 0, Assumption B(i) allows for Lévy processes with infinite jump activity. In that case α in (4.1) is an upper bound for the Blumenthal–Getoor index of the process. A more pronounced singularity of the jump measure at zero corresponds to larger values of α. The lower bound is natural, since any jump measure satisfies ∫ (|x|² ∧ 1) ν(dx) < ∞. On the contrary, α = 0 in (4.1) implies that ν is a finite measure. In that case we could further profit from its regularity, which we measure by the decay of its Fourier transform. Altogether, the parameter α will determine the approximation error that is due to the term Ψ in (3.2).

Assumption B(ii) yields moment bounds for the jumps and, together with the moment condition in Assumption C(i), we conclude that the observations possess sufficiently many moments, cf. Lemma 8.1. Assumption C(ii) implies that the increments are independent and identically distributed. Note that stationarity of the time-change increments with invariant measure π is necessary to construct the estimator. The independence can, however, be relaxed to a mixing condition, as discussed in Section 5.2. Finally, Assumptions C(iii) and (iv) are used to linearise the stochastic error.
In the sequel we denote by B_R the d-dimensional ball of radius R > 0. We can now state the first main result of this section.
Theorem 4.2.
In order to prove this theorem, we decompose the estimation error into a stochastic error term and a deterministic approximation error. More precisely, the regression formula (3.2) yields for any matrix S
(4.2)  
The order of the deterministic error is determined by the jump activity α, cf. Lemma 8.3, and decays as the cut-off U grows. The rate deteriorates for higher jump activities. This is reasonable, since even for high-frequency observations it is very difficult to distinguish between small jumps and fluctuations due to the diffusion component, cf. Jacod and Reiß [21]. The stochastic error is dominated by its linearisation,
which is increasing in the frequencies due to the decay of the denominator. To obtain a sharp oracle inequality for the spectral norm of the linearised stochastic error, we use the non-commutative Bernstein inequality by Recht [32]. To bound the remainder, we apply a concentration result (Theorem A.4) for the empirical characteristic function around φ, uniformly on compact sets.
The lower bound on λ reflects the typical dependence on the dimension that also appears in the theory of matrix completion, cf. Corollary 2 by Koltchinskii et al. [24]. Our upper bound on U ensures that the remainder term in the stochastic error is negligible.
The choice of U is determined by the trade-off between the approximation error and the stochastic error. Since neither λ nor U depends on the rank of Σ, the theorem verifies that the estimator is adaptive with respect to the rank. Supposing lower bounds on the eigenvalues that depend only on the radius R, the only term that depends on the dimension is a volume factor. It is uniformly bounded by the volume of the unit ball in ℝ^d, which in turn is uniformly bounded in d (in fact, it is decreasing as d → ∞). We discuss this quite surprising factor further after Corollary 4.4 and, from a more abstract perspective, in Section 5.5.
For specific decay behaviours of the Laplace transform ℒ we can now conclude convergence rates for the diffusion matrix estimator. In contrast to the nonparametric estimation of the coefficients of a diffusion process, as studied by Chorowski and Trabs [15], the convergence rates depend heavily on the sampling distribution (resp. the time-change). We start with an exponential decay of the Laplace transform of π. This is especially the case if the Lévy process is observed at equidistant time points, where ℒ(u) = e^{−Δu}.
Corollary 4.3.
Grant Assumptions B and C(i)–(iii) and let Σ ∈ 𝒮, where 𝒮 is a convex subset of positive semidefinite matrices. Under an exponential decay condition on the Laplace transform, we set
for some constant which can be chosen independently of n and d. Then we have for sufficiently large n
with probability larger than 1 − ε for some ε depending only on the model parameters.
The assertion follows from Corollary 3.5 and Theorem 4.2 (with a suitable choice of the parameters), taking into account that due to (2.1) (and (8.1)) we have
This corollary shows that the exponential decay of ℒ leads to a severely ill-posed problem and, as a consequence, the rate is only logarithmic. Therein, the interplay between the sample size n and the dimension d has also been observed for matrix completion [24, Cor. 2]. Whenever the dimension grows at most polynomially in n, the logarithmic term in the upper bound simplifies accordingly. The slow rate implies that Σ̂ is consistent in absolute error only if the spectral norm of Σ does not grow too fast. Considering the relative error, this restriction vanishes. Note also that in a high-dimensional principal component analysis the spectral norm may grow in d, too, cf. the discussion of the factor model by Fan et al. [18].
If ℒ decays only polynomially, as is the case for instance for the gamma subordinator, cf. Examples 2.1, the estimation problem is only mildly ill-posed and the convergence rates are polynomial in n.
Corollary 4.4.
Grant Assumptions B and C(i)–(iv). For a convex subset 𝒮 of positive semidefinite matrices, let Σ ∈ 𝒮, and suppose the Laplace transform decays polynomially. Denoting the smallest strictly positive eigenvalue of Σ by λ_min, we set
for a constant independent of n and d. Then we have for sufficiently large n
with probability larger than 1 − ε for some ε depending only on the model parameters.
Remarks 4.5.

The convergence rate reflects the regularity and the degree of ill-posedness of the statistical inverse problem, where the decay rate of the characteristic function determines the ill-posedness and we gain two degrees since the signal in (3.2) grows quadratically in the frequencies. The term appearing in the denominator of the rate is very surprising, since the rate becomes faster as the rank of Σ increases, up to some critical threshold value. To see this, it remains to note that the imposed assumption yields
In order to shed some light on this phenomenon, we consider a related regression problem in Section 5.5.

It is interesting to note that for very small values of this parameter we almost attain the parametric rate. Having the example of gamma-distributed increments in mind, a small parameter corresponds to measures π which are highly concentrated at the origin. In that case we expect many quite small increments, where the jump component has only a small effect. On the other hand, for the remaining few rather large increments, the estimator is not worse than in a purely low-frequency setting. Hence, this regime heuristically corresponds to an interpolation between high- and low-frequency observations, cf. also Kappus [23].

If the condition is not satisfied, the linearised stochastic error appears to be smaller than the remainder of the linearisation. In that case we only obtain a presumably suboptimal rate (for fixed d). It is still an open problem whether this condition is only an artefact of our proofs or whether there is some intrinsic reason for it.
Let us now investigate whether the rates are optimal in the minimax sense. The dependence on the rank of Σ is the same as is usual in matrix estimation problems and seems natural. The optimality of the influence of the dimension is less clear. In lower bounds for matrix completion, cf. Koltchinskii et al. [24, Thm. 6], a similar dependence appears, except for the fact that we face nonparametric rates. In the following we prove lower bounds for the rate of convergence in the number of observations n for a fixed dimension d. More general lower bounds in a high-dimensional setting where d may grow with n are highly intricate, since lower bounds for high-dimensional statistical problems are usually based on Kullback–Leibler divergences in Gaussian models, an approach that cannot be applied in our model. This problem is left open for future research.
Let us introduce the class of all Lévy measures satisfying Assumption B with a given jump activity α and a given constant. In order to prove sharp lower bounds, we need the following stronger assumptions on the time change:
Assumption D.
Grant (i) and (ii) from Assumption C. Let the Laplace transform ℒ satisfy one of the following two conditions:

for some and all
or

for some and all with