Approximated maximum likelihood estimation in multifractal random walks

# Approximated maximum likelihood estimation in multifractal random walks

## Abstract

We present an approximated maximum likelihood method for the multifractal random walk processes of [E. Bacry et al., Phys. Rev. E 64, 026103 (2001)]. The likelihood is computed using a Laplace approximation and a truncation in the dependency structure for the latent volatility. The procedure is implemented as a package in the R computer language. Its performance is tested on synthetic data and compared to an inference approach based on the generalized method of moments. The method is applied to estimate parameters for various financial stock indices.

###### pacs:
05.40.-a, 02.50.-r, 47.53.+n, 95.75.Wx

## I Introduction

Multifractal models were first introduced in the 1960s by the so-called “Russian school” in turbulence theory (Obukhov, 1962; Kolmogorov, 1962). In turbulence, multifractality can be conceived as a weakening of the spatial selfsimilarity in the velocity field implicitly assumed in Kolmogorov’s 1941-theory (Kolmogorov, 1941). This generalization is called the Kolmogorov-Obukhov model and entails modeling the spatial variability of the energy dissipation rate as a random measure with certain multi-scaling properties. The Kolmogorov-Obukhov model is treated rigorously by Kahane Kahane (1985) and this construction is known as Gaussian multiplicative chaos.

In recent years multifractal random processes and multifractal random measures have received increased attention and are widely used in physics, geophysics and complex systems theory. Examples include phenomena as diverse as internet traffic Riedi and Ribeiro (1999), geomagnetic activity Rypdal and Rypdal (2010, 2011) and rainfall patterns Pathirana and Herath (2002). In addition, multifractal processes provide natural models for the long-range volatility persistence observed in financial time series. This was first discovered by Ghashghaie Ghashghaie et al. (1996) and Mandelbrot Mandelbrot et al. (1997), and since the late 1990s much work has been done on multifractal modeling of financial markets (Di Matteo, 2007; Calvet and Fisher, 2001). Logarithmic returns of assets are modeled as , where are continuous-time processes with stationary increments and multifractal scaling. The latter means that the moments of are power-laws as functions of time;

 E|X(t)|q∼tζ(q), (1)

either in some interval or asymptotically as . The scaling function is linear for self-similar processes, but may in general be concave. Processes satisfying equation (1) with strictly concave scaling functions are generally referred to as multifractal.

Two well-known “stylized facts” of financial time series are that log-returns are uncorrelated and non-Gaussian. Based on this, Mandelbrot Mandelbrot (1963) deduced that if prices are described as selfsimilar processes, then these processes must be so-called Lévy flights, i.e. -stable Lévy processes with . However, if one allows non-linear scaling functions, then one can maintain uncorrelated log-returns by simply imposing the condition . The concave shape of implies that the variables are increasingly leptokurtic with decreasing , and consequently non-Gaussian. Moreover, as opposed to Lévy flights, multifractal processes have strongly dependent increments and can therefore describe a third “stylized fact” of financial time series, namely volatility clustering.

Notwithstanding that multifractal processes provide accurate and parsimonious descriptions of temporal financial fluctuations, the models are rarely implemented for forecasting and risk-analysis in financial institutions. This is partially due to a lack of accurate, stable and efficient inference methods for multifractal processes. Parameter estimation has so far mostly been made using various moment-based estimators, such as the generalized method of moments (GMM). Alternatively, one can fit the estimated scaling functions to theoretical expressions of . However, as pointed out in e.g. Lux (2003) and Chapman et al. (2005), the standard estimators of scaling exponents have large mean square errors for time series of length comparable to those typically available in econometrics.

An exception to the statements above is the Markov-Switching Multifractal (MSM) model (Calvet and Fisher, 2001) where maximum likelihood estimation is feasible. In discrete time MSM implies that the increments are described by a stochastic volatility model on the form

 xt=σ√Mtεt. (2)

Here are independent variables and the volatility is a product on the form

 Mt=Mt,1Mt,2⋯Mt,K,

where (for each time step ) are updated from a distribution with a probability . In this model however, maximum likelihood estimation is only possible in the case where is defined on a discrete state space, and there is a limitation on the magnitude of which should not exceed (Lux, 2008). These restrictions not only limit flexibility with respect to the distribution of returns, but also the possible range in the volatility dependency.

This paper concerns parametric inference for the multifractal random walk (MRW) introduced by Bacry et al. Bacry et al. (2001). The increment process is still a discrete-time stochastic process described by equation (2), but now the volatility is modeled as , where is a stationary and centered Gaussian process with the co-variance structure

 Cov(ht,hs)=λ2log+T(|t−s|+1)Δt, (3)

where . Here is called the correlation range 1 and is called the intermittency parameter. The constant ensures normalization and is chosen so that . We denote .

Let denote the parameter vector and a fixed time series. The main result of this paper is the development of a method for efficiently computing approximations to the likelihood function

 L(θ|y)=px(y|θ),

where is the probability density function of a random vector produced by the MRW model with parameters . Using the likelihood function, parameters can be determined by means of the maximum likelihood (ML) estimator:

 ^θ=argmaxθL(θ|y).

Our method exploits that the discrete MRW model has a construction similar to simple volatility (SV) models. The distinguishing feature is that the processes are autoregressive in SV models. By truncating the dependency structure in the logarithmic volatility , the computation of the likelihood function is mapped on to a similar problem for SV models, and hence existing techniques for further approximations are available.

To our knowledge the present paper is the first to present results on ML estimation for multifractal models with continuous state spaces for the volatility. Such estimates may be of great practical importance, since accurate parameter estimation is essential for volatility forecasts and risk estimates. In the MRW model this degree of accuracy is particularly important for the intermittency parameter which determines the peakedness of the return distributions on all time scales. In applications other than finance, accurate estimates of can be used as supplements to the empirical scaling functions, and thereby the ML estimator can provide a tool for quantifying multifractality in data.

The paper is organized as follows: In section II we briefly explain the construction of the continuous-time process for which the model given by equations (2) and (3) is a discretization. There exists a large class of multifractal processes which are related to a construction known as infinitely divisible cascades (IDC). In general the random walk models associated with IDC processes have logartithmic volatility with infinitely divisible distributions, and the MRW model considered in this paper is a discrete approximation to the random walk model obtained in the special case when the logarithmic volatility is Gaussian.

Section III contains the procedure for approximated ML estimation in the MRW model. In section IV we test the estimator by first applying it to various stock market indices, and then by running a small Monte Carlo study. The results are compared with the GMM method used in (Bacry et al., 2008).

We finally remark that the methods presented in this paper have been implemented in a package for the R statistical software (R Development Core Team, 2011). This package is available online Løvsletten and Rypdal (2011).

## Ii Motivation of the model

There exist several popular models for multifractal stochastic processes with uncorrelated increments. All of these models can be written either on the form , where is a Brownian motion and is the distribution function of a multifractal random measure on the time axis, or as

 X(t)=limr→0∫t0√Ar(t′)dB(t′),

where as . The meaning of is discussed below. These two types of constructions are equivalent as long as is a Brownian motion. (This is not the case for fractional Brownian motions with .)

The differences between the various multifractal models are then related to the construction of the random measure . The log-normal MRW model is on one hand based on a particular construction of known as multiplicative chaos, and on the other hand it can be seen as a special case of the more general IDC constructions.

In multiplicative chaos, which was first developed rigorously in Kahane (1985), one considers a sequence of measures defined via random densities on the form

 dmn(t)=cnehn(t)dt,

where , and are centered Gaussian processes with co-variance structures that converge to some expression in the limit . Kahane Kahane (1985) showed that if is -positive, meaning that there are positive and positive definite functions such that

 gn(t,s)=n∑m=1Km(t,s),

then the sequence converges weakly to a Borel measure which depends only on the function . One can therefore informally think of as being on the form where and is a “Gaussian” process with co-variance . Then, if one makes the choice

 γ(t,s)=λ2log+R|t−s|, (4)

one easily obtains the relation , where are independent of and distributed according to . It follows that we for and have the scaling relation

 m([0,at])\lx@stackreld=M(a)m([0,t]), (5)

with . See proposition 3.3 in (Robert and Vargas, 2010) for a rigorous proof of (5), and see example 2.3 of the same paper for a verification that the function in equation (4) is -positive. By using the well-known formula for the -th moments of log-normal variables together with equation (5), we easily verify the multifractality of the process : Denote and observe that

 E|m([0,t])|q=CqEM(t)q=CqtζA(q),

where . Since a Brownian motion is self-similar with the scaling function of is given by .

Alternatively the model defined by equations (2) and (3) can be motivated by considering the more general class of IDC models. Here we briefly mention the main ideas and results in this theory, and we refer to Bacry and Muzy (2003); Muzy and Bacry (2002) for details. At the base of this construction is an object called an independently scattered infinitely divisible random measure defined on the halfplane . The defining properties of the random measure are: (1) for any measurable set , the random variable is infinitely divisible with characteristic function

 φP(A)(q)=eψ(q)μ(A),

where . (2) for any finite sequence of disjoint and measurable sets, the corresponding random variables are independent. If we assume that , the random measure induces a family of centered and stationary stochastic processes through the equation

 hr(t)=P(A(r,t)),

where are cone-like domains defined by

 A(r,t)={(t′,r′)∈S+|r′≥r,|t′−t|≤f(r′)/2},

with for and for . The time correlations in the processes are characterized by the functions

 ρr(t)=μ(A(0,r)∩A(t,r))={logRr+1−tr,t

In fact, the co-variance of is given by

 Cov(hr(t),hr(s))=λ2ρr(|t−s|),

where .

Random measures are defined by , where . The corresponding distribution functions are and corresponding random walks are

 Xr(t)=∫t0√Ar(t′)dB(t′).

By using the relation one can show that

 har(at)\lx@stackreld=hr(t)+Ω(a),

for and , where are independent of and have characteristic functions . Consequently the limit process has scaling function

 ζ(q)=(1+ψ(−i))q/2−ψ(−iq/2).

In the case that are Gaussian, i.e. , the co-variance is on the form

 Cov(hr(t),hr(s))=λ2log+R|t−s|

for , and hence it can be approximated by the process defined by equation (3). In this case the scaling function is

 ζ(q)=(1+λ2/2)q/2−λ2q2/8.

We note that for the process is reduced to a Brownian motion and .

We point out that this paper presents a ML estimator for the discrete-time process defined by equations (2) and (3). This is sufficient for the purpose of modeling and forecasting volatility in financial time series, since the discrete-time MRW model is directly comparable to GARCH-type models. In other applications, such as modeling the velocity field in turbulence, one is interested in the continuous-time process . Since is an approximation to the continuous-time process , our method can also be interpreted as an estimator for this process. In this case one must be aware that the increment process is not proportional (in law) to , and that this is only an approximation in the limit . See appendix A.1. in (Bacry et al., 2008). In the case of strong intermittency, the estimator for the continuous-time process may therefore depend significantly on the time-scale for which the data is sampled. An analysis of how our method performs as an estimator for will require extensive Monte Carlo simulations (with varying and ), and this is beyond the scope of this paper.

We also remark that it in some applications is relevant to estimate the parameters of the measure , for instance when modeling the energy dissipation fields in turbulence. In the discrete-time approximation this corresponds to the process , where is described by equation (3). Since is Gaussian, this problem is much easier than the one considered in this paper. The ML estimator for can be constructed using standard methods McLeod et al. (2007) and no approximations are required.

## Iii Approximated maximum likelihood

In this section we explain our method of approximated maximum likelihood estimation. Let and be the processes defined by (2) and (3). Denote and . The first step is to write

 px(x)=∫Rnpx,h(x,h)dh=∫Rnpx|h(x|h)ph(h)dh. (6)

The first factor in the integrand is computed by noting that, when conditioned on , the variables are independent and Gaussian. In fact,

 logpx|h(x|h)=n∑t=1logpxt|ht(xt|ht)=−nlog√2πcσ+n∑t=1(−ht2−x2t2σ2ceht). (7)

For the second factor we use that is a centered Gaussian process with a specified co-variance structure . First we decompose the density into one-dimensional marginals;

 ph(h)=ph1(h1)n∏t=2pht|h1:t−1(ht|h1:t−1), (8)

where we have used the notation

 hn:m={(hn,hn+1,…,hm) for m≥n(hn,hn−1,…,hm) for n>m.

Denote by the co-variance matrix of the vector , and let . The co-variance matrix can be written on the block form:

 Γt=(γ(0)γ1:t−1γT1:t−1Γt−1).

By performing standard computations of conditional marginals in multivariable Gaussian distributions we deduce that is a Gaussian with mean

 mt=γ1:t−1Γ−1t−1hT(t−1):1

and variance

 S2t=γ(0)−γ1:t−1Γ−1t−1γT1:t−1.

As usual it is convenient to introduce vectors defined by . This allows us to write the mean as and the variance as . Then from equation (8) we have

 logph(h)=−nlog√2π−n∑t=1logSt−n∑t=1(ht−ϕ(t−1)hT(t−1):1)22S2t. (9)

Combining equation (9) with equation (7) we get an expression for the full likelihood:

 logpx,h(x,h)=−nlog(2π√cσ)+n∑t=1(−ht2−x2t2σ2ceht)−n∑t=1logSt−n∑t=1(ht−ϕ(t−1)hT(t−1):1)22S2t. (10)

We keep in mind that depends on and through the relation .

Approximation 1: By comparing co-variances the process can be written as

 ht=ϕ(t−1)1ht−1+⋯+ϕ(t−1)t−1h1+wt, (11)

where are independent Gaussian variables with zero mean and variances equal to . As approximations to we can consider processes obtained by truncating the sum in equation (11). We fix a parameter , and for we replace equation (11) with

 ht=ϕ(τ)1ht−1+⋯+ϕ(τ)τht−τ+w(τ)t, (12)

where are independent Gaussian variables with zero mean and variances equal to . Note that in equation (11) has the same distribution as obtained from (12), namely a Gaussian with mean and variance . In effect we have approximated the distribution of , by truncating the dependency after a lag . As a result of this approximation equation (9) becomes

 logph(h)=−nlog√2π−τ∑t=1logSt−(n−τ)logSτ+1−τ∑t=1(ht−ϕ(t−1)hT(t−1):1)22S2t+n∑t=τ+1(ht−ϕ(τ)hTt−1:t−τ)22S2τ+1. (13)

In order to compute the expression in equation (13) we need to solve the equations

 ϕ(t)Γt=γ1:t,t=1,…,τ.

This is done efficiently using the Durbin-Levinson algorithm (Levinson, 1946; Trench, 1964). We remark that for the expression in equation (13) is exact.

Approximation 2: The second approximation is the so-called Laplace’s method, which is frequently used for approximation of likelihoods in SV models, see e.g. Skaug and Yu (2009); Martino et al. (2011). We write equation (6) on the form

 px(x)=∫Rnenfx(h)dh, (14)

where

 fx(h)=1nlogpx,h(x,h)=1nn∑t=1logpxt|ht(xt)+1nn∑t=1logpht|h1:t−1(ht). (15)

Laplace’s method is to assume that has a global maximum in , which we denote by . When is large the contribution to the integral in equation (14) is concentrated around , and therefore we make a second order Taylor approximation to around this point. Since is also a local maximum we have

 fx(h)≈1nlogpx,h(x,h∗)+12n(h−h∗)Ωx(h−h∗)T,

where

 Ωx=∂2logpx,h(x,h∗)∂h∂hT

is the Hessian matrix of at the point . The approximation now reads

The maximum is found by computing the partial derivatives of with respect to , setting them equal to zero and solving the corresponding system of equations numerically using the algorithm DF-SANE (Cruz et al., 2006), which is implemented in R  package “BB” Varadhan and Gilbert (2009). The matrix is obtained using analytical expressions for the second derivatives. This matrix is band-diagonal with bandwidth equal to the truncation parameter , and in the R  software such matrices are efficiently stored and manipulated using the package “Matrix” (Bates and Maechler, 2011).

## Iv Estimator comparisons

In this section the ML estimator is compared with an GMM approach which is similar to the one used in Bacry et al. (2008). This GMM version is essentially a least-square fitting of the auto-correlation function for the logarithmic volatility, and we briefly explain this method in the following: Denote and observe that

 mt=ht+yt

where are independent and identically distributed. We can use the sample standard deviation to normalize so that it has unit variance. Then, if we let denote the mean of , the auto-correlation function of has the form

 ACFm(t)=E[(m1−μm)(mt+1−μm)]=E[h1ht+1]=λ2log+Rt+1.

For we have

 ACFm(t)=λ2logR−λ2log(t+1),

and and can be found by linear regression of the auto-correlation function versus .

We begin testing the approximated ML estimator by applying it to various stock market indices. We use daily log-returns and in all of the estimates the truncation parameter is set to days. The results are presented in table 1. We observe that the intermittency parameter varies from to for the different indices and time periods. We also observe that the correlation range parameter varies by roughly one order of magnitude, in the range 1.4-12.2 years. If we compare with the GMM we see that, for all the indices, the estimates of are lower using the ML method. For the parameter the estimates using ML and GMM are more or less consistent, but with quite large variations between the two estimators.

To further test the performance of the proposed ML estimator we run a small-sample Monte Carlo study. We have used three different sample lengths , and for each sample length we simulated 500 sample realizations. The parameter vector considered is , and . For the truncation parameter we have considered the cases , and in the GMM method we use a maximum time lag days in the auto-correlation function of .

The results are presented in table 2. For both the GMM and the ML methods the estimates of are highly unstable. This is also pointed out in Bacry et al. (2008). However, the processes only depend on the through expressions on the form . Therefore, in order to have an estimator which is comparable to , we should consider the variable . The estimators of behave reasonably well, even though there are significant mean square errors and some bias. We see that both the ML and GMM method underestimate and that the errors are roughly the same for the two estimators.

On the other hand we observe that the ML estimates of have a standard deviations which are much smaller than the corresponding standard deviation for the GMM estimate, especially for . This can also be seen from figures 2 and 3 where the probability density functions for the GMM estimates and the ML estimates are presented. Based on this we conclude that the ML estimator performs better than the GMM. Moreover, if one allows longer computing times, the truncation parameter can be increased to obtain even more accurate estimates. For a time series of data points, an ML estimate with takes a few minutes on a personal computer.

In figure 4 we have plotted the mean square errors (MSE) for the ML estimator with and the GMM estimator. We see that for both the estimators the MSE is roughly invserly proportional to the sample length. However, from table 2 we see that is a slight negative bias in for the ML estimator. This bias decreases with increasing , and we suspect the estimator to be asymptotically unbiased in the limit .

## V Concluding remarks

In this paper we have presented an approximate ML estimator for MRW processes. The method is implemented and tested in a Monte Carlo study, and the results show significant improvements over existing methods for the intermittency parameter .

The methods of this paper represent a suitable starting point for two important generalizations. The first generalization is to allow for correlated innovations, for instance by letting be a fractional Gaussian noise. This has several important applications, for instance in modeling of geomagnetic activity Rypdal and Rypdal (2010, 2011) and electricity spot prices Malo (2006). Another interesting generalization is to consider the non-Gaussian IDC models referred to in section II.

We also point out that the techniques presented in section III can be used to calculate conditional densities on the form . At time , such an expression provides a complete forecast over the next time steps. Forecasting and risk analysis based on the MRW model and the methods in this paper is a promising topic that will be pursued in future work.

Acknowledgment. This project was supported by Sparebank 1 Nord-Norge and the Norwegian Research Council (project number 208125). We thank K. Rypdal and the anonymous referee for useful comments and suggestions.

### Footnotes

1. In turbulence corresponds to the integral scale.

### References

1. A. M. Obukhov, J. Geophys. Res. 67, 3011 (1962).
2. A. N. Kolmogorov, J. Fluid. Mech. 13, 83 (1962).
3. A. N. Kolmogorov, Dokl. Akad. Nauk. SSSR 31, 301 (1941).
4. J. P. Kahane, Ann. Sci. Math. Quebec 9, 105 (1985).
5. R. H. Riedi and V. J. Ribeiro, IEEE T. Inform. Theory 45, 992 (1999).
6. M. Rypdal and K. Rypdal, J. Geophys. Res. 115, A11216 (2010).
7. M. Rypdal and K. Rypdal, J. Geophys. Res. 116, A02202 (2011).
8. A. Pathirana and S. Herath, Hydrol. Earth Syst. Sc. 6, 695 (2002).
9. S. Ghashghaie, S. Brewmann, W. Peinke, J. Talkner, and Y. Dodge, Nature 381, 767 (1996).
10. B. Mandelbrot, A. Fisher, and L. Calvet, Cowles Foundation for Research in Economics Working Papers (1997).
11. T. Di Matteo, Quant. Finance 7, 21 (2007).
12. L. Calvet and A. Fisher, J. Econometrics 105, 27 (2001).
13. B. B. Mandelbrot, J. Bus. 36, 394 (1963).
14. T. Lux, Economics Working Papers, Christian-Albrechts-University of Kiel (2003).
15. S. C. Chapman, B. Hnat, G. Rowlands, and N. W. Watkins, Nonlinear Proc. Geoph. 12, 767 (2005).
16. T. Lux, J. Bus. Econ. Stat. 26, 194 (2008).
17. E. Bacry, J. Delour, and J. F. Muzy, Phys. Rev. E 64, 026103 (2001).
18. Note1, in turbulence corresponds to the integral scale.
19. E. Bacry, A. Kozhemyak, and J.-F. Muzy, J. Econ. Dyn. Control 32, 156 (2008).
20. R Development Core Team (2011), URL http://www.R-project.org/.
21. O. Løvsletten and M. Rypdal, R package (2011), URL http://complexityandplasmas.net/nordforsk/Papers_files/MLE%20MRW%20R%20package.zip.
22. R. Robert and V. Vargas, Ann. Probab. 38, 605 (2010).
23. E. Bacry and J. F. Muzy, Commun. Math. Phys. 236, 449 (2003).
24. J. F. Muzy and E. Bacry, Phys. Rev. E 66, 056121 (2002).
25. I. A. McLeod, H. Yu, and Z. L. Krougly, J. Stat. Softw. 23, 1 (2007).
26. N. Levinson, J. Math. Phys. 25, 261 (1946).
27. W. F. Trench, J. Soc. Indust. Appl. Math. 12, 515 (1964).
28. H. Skaug and J. Yu, Research Collection School of Economics, Singapore Management University (2009).
29. S. Martino, K. Aas, O. Lindqvist, L. R. Neef, and H. Rue, Europ. J. Finance 17, 487 (2011).
30. W. L. Cruz, J. Martínez, and M. Raydan, Math. Comput. 75, 1429 (2006).
31. R. Varadhan and P. Gilbert, J. Stat. Softw. 32, 1 (2009).
32. D. Bates and M. Maechler, R package (2011), URL http://CRAN.R-project.org/package=Matrix.
33. P. Malo, Helsinki School of Economics Working Papers (2006).
You are adding the first comment!
How to quickly get a good reply:
• Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
• Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
• Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters