# Mean-Reverting Portfolios: Tradeoffs Between Sparsity and Volatility

## Abstract

Mean-reverting assets are one of the holy grails of financial markets: if such assets existed, they would provide trivially profitable investment strategies for any investor able to trade them, thanks to the knowledge that such assets oscillate predictably around their long-term mean. The modus operandi of cointegration-based trading strategies (Tsay, 2005, §8) is first to create a portfolio of assets whose aggregate value mean-reverts, and then to exploit that knowledge by selling short or buying that portfolio when its value deviates from its long-term mean. Such portfolios are typically selected using tools from cointegration theory (Engle and Granger, 1987; Johansen, 1991), whose aim is to detect combinations of assets that are stationary, and therefore mean-reverting. We argue in this work that focusing on stationarity only may not suffice to ensure profitability of cointegration-based strategies. While it might be possible to create synthetically, using a large array of financial assets, a portfolio whose aggregate value is stationary and therefore mean-reverting, trading such a large portfolio incurs significant trading or borrowing costs in practice. Looking for stationary portfolios formed by many assets may also result in portfolios that have a very small volatility and which require significant leverage to be profitable. We study in this work algorithmic approaches that can mitigate these effects by searching for maximally mean-reverting portfolios which are sufficiently sparse and/or volatile.

## 1 Introduction

Mean-reverting assets, namely assets whose price oscillates predictably around a long term mean, provide investors with an ideal investment opportunity. Because of their tendency to pull back to a given price level, a naive contrarian strategy of buying the asset when its price lies below that mean, or selling short the asset when it lies above that mean can be profitable. Unsurprisingly, assets that exhibit significant mean-reversion are very hard to find in efficient markets. Whenever mean-reversion is observed in a single asset, it is almost always impossible to profit from it: the asset may typically have very low volatility, be illiquid, hard to short-sell, or its mean-reversion may occur at a time-scale (months, years) for which the borrow-cost of holding or shorting the asset may well exceed any profit expected from such a contrarian strategy.

Since mean-reverting assets rarely appear in liquid markets, investors have focused instead on creating synthetic assets that can mimic the properties of a single mean-reverting asset, and trading such synthetic assets as if they were a single asset. Such a synthetic asset is typically designed by combining long and short positions in various liquid assets to form a mean-reverting portfolio, whose aggregate value exhibits significant mean-reversion.

Constructing such synthetic portfolios is, however, challenging. Whereas simple descriptive statistics and unit-root test procedures can be used to test whether a single asset is mean-reverting, building mean-reverting portfolios requires finding a proper vector of algebraic weights (long and short positions) that describes a portfolio which has a mean-reverting aggregate value. In that sense, mean-reverting portfolios are made by the investor, and cannot be simply chosen among tradable assets. A mean-reverting portfolio is characterized both by the pool of assets the investor has selected (starting with the dimension of the vector), and by the fixed nominal quantities (or weights) of each of these assets in the portfolio, which the investor also needs to set. When only two assets are considered, such baskets are usually known as long-short trading pairs. We consider in this paper baskets that are constituted by more than two assets.

#### Mean-Reverting Baskets with Sufficient Volatility and Sparsity

A mean-reverting portfolio must exhibit sufficient mean-reversion to ensure that a contrarian strategy is profitable. To meet this requirement, investors have relied on cointegration theory (Engle and Granger, 1987; Maddala and Kim, 1998; Johansen, 2005) to estimate linear combinations of assets which exhibit stationarity (and therefore mean-reversion) using historical data. We argue in this work, as we did in earlier references (d’Aspremont, 2011; Cuturi and d’Aspremont, 2013), that mean-reverting strategies cannot, however, only rely on this approach to be profitable. Arbitrage opportunities can only exist if they are large enough to be traded without using too much leverage or incurring too many transaction costs. For mean-reverting baskets, this condition translates naturally into a first requirement that the gap between the basket valuation and its long term mean is large enough on average, namely that the basket price has sufficient variance or volatility. A second desirable property is that mean-reverting portfolios require trading as few assets as possible to minimize costs, namely that the weights vector of that portfolio is sparse. We propose in this work methods that maximize a proxy for mean reversion, and which can take into account at the same time constraints on variance and sparsity.

We first propose in Section 2 three proxies for mean reversion. Section 3 defines the basket optimization problems corresponding to these quantities. We show in Section 4 that each of these problems translates naturally into a semidefinite relaxation which produces either exact or approximate solutions using sparse PCA techniques. Finally, we present numerical evidence in Section 5 that taking into account sparsity and volatility can significantly boost the performance of mean-reverting trading strategies in trading environments where trading costs are not negligible.

## 2 Proxies for Mean-Reversion

Isolating stable linear combinations of variables of multivariate time series is a fundamental problem in econometrics. A classical formulation of the problem reads as follows: given a vector-valued process $x = (x_t)_t$ taking values in $\mathbb{R}^n$ and indexed by time $t \in \mathbb{N}$, and making no assumptions on the stationarity of each individual component of $x$, can we estimate one or many directions $y \in \mathbb{R}^n$ such that the univariate process $(y^T x_t)$ is stationary? When such a vector $y$ exists, the process $x$ is said to be cointegrated. The goal of cointegration techniques is to detect and estimate such directions $y$. Assuming that such techniques can efficiently isolate sparse mean-reverting baskets, their financial application can either be straightforward, using simple event triggers to buy, sell or simply hold the basket (Tsay, 2005, §8.6), or rely on more elaborate optimal trading strategies if one assumes that the mean-reverting basket value is an Ornstein-Uhlenbeck process, as discussed in (Jurek and Yang, 2007; Liu and Timmermann, 2010; Elie and Espinosa, 2011).

### 2.1 Related Work and Problem Setting

Engle and Granger (1987) provided in their seminal work a first approach to compare two non-stationary univariate time series $(x_t, y_t)$, and test for the existence of a term $\alpha$ such that $y_t - \alpha x_t$ becomes stationary. Following this seminal work, several techniques have been proposed to generalize that idea to multivariate time series. As detailed in the survey by Maddala and Kim (1998, §5), cointegration techniques differ in the modeling assumptions they require on the time series themselves. Some are designed to identify only one cointegrated relationship, whereas others are designed to detect many or all of them. Among these references, Johansen (1991) proposed a popular approach that builds upon a VAR model, as surveyed in (Johansen, 2005, 2004). These approaches all discuss issues that are relevant to econometrics, such as de-trending and seasonal adjustments. Some of them focus more specifically on testing procedures designed to check whether such cointegrated relationships exist or not, rather than on the robustness of the estimation of that relationship itself. We follow in this work a simpler approach proposed by d’Aspremont (2011), which is to trade off interpretability, testing and modeling assumptions for a simpler optimization framework which can be tailored to include aspects other than stationarity alone. d’Aspremont (2011) did so by adding regularizers to the predictability criterion proposed by Box and Tiao (1977). We follow in this paper the approach we proposed in (Cuturi and d’Aspremont, 2013) to design mean-reversion proxies that do not rely on any modeling assumption.

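Throughout this paper, we write $\mathbf{S}_n$ for the cone of $n \times n$ positive definite matrices. We consider in the following a multivariate stochastic process $x = (x_t)_{t \in \mathbb{N}}$ taking values in $\mathbb{R}^n$. We write $\mathcal{A}_k = E[x_t x_{t+k}^T]$, $k \geq 0$, for the lag-$k$ autocovariance matrix of $x_t$ if it is finite. Using a sample path $\mathbf{x}$ of $(x_t)$, where $\mathbf{x} = (x_1, \dots, x_T)$ and each $x_t \in \mathbb{R}^n$, we write $A_k$ for the empirical counterpart of $\mathcal{A}_k$ computed from $\mathbf{x}$,

$$A_k \overset{\mathrm{def}}{=} \frac{1}{T-k-1} \sum_{t=1}^{T-k} \tilde{x}_t \tilde{x}_{t+k}^T, \qquad \tilde{x}_t \overset{\mathrm{def}}{=} x_t - \frac{1}{T} \sum_{s=1}^{T} x_s. \tag{1}$$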
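
As an illustration, Equation (1) can be sketched in a few lines of NumPy. The function name `autocovariance`, the row-per-observation layout of `X`, and the toy random-walk data are assumptions made for this example only.

```python
import numpy as np

def autocovariance(X, k):
    """Empirical lag-k autocovariance A_k of Equation (1).

    X is a (T, n) array whose row t holds the observation x_t (assumed layout).
    """
    T = X.shape[0]
    Xc = X - X.mean(axis=0)            # centered observations
    # Sum over t = 1..T-k of x~_t x~_{t+k}^T, normalized by T-k-1.
    return Xc[:T - k].T @ Xc[k:] / (T - k - 1)

# Toy sample path (synthetic random walks, for illustration only).
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3)).cumsum(axis=0)
A0 = autocovariance(X, 0)   # covariance matrix
A1 = autocovariance(X, 1)   # lag-1 autocovariance, not symmetric in general
```

For $k = 0$ the normalization is $T - 1$, so `A0` coincides with the usual unbiased sample covariance.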

Given $y \in \mathbb{R}^n$, we now define three measures which can all be interpreted as proxies for the mean reversion of $y^T x_t$. Predictability, defined for stationary processes by Box and Tiao (1977) and generalized for non-stationary processes by Bewley et al. (1994), measures how close to noise the series is. The portmanteau statistic (Ljung and Box, 1978) is used to test whether a process is white noise. Finally, the crossing statistic (Ylvisaker, 1965) measures the probability that a process crosses its mean per unit of time. In all three cases, low values for these criteria imply a fast mean-reversion.

### 2.2 Predictability

We briefly recall the canonical decomposition derived in Box and Tiao (1977). Suppose that $x_t$ follows the recursion:

$$x_t = \hat{x}_{t-1} + \varepsilon_t, \tag{2}$$

where $\hat{x}_{t-1}$ is a predictor of $x_t$ built upon past values of the process recorded up to $t-1$, and $\varepsilon_t$ is a vector of i.i.d. Gaussian noise with zero mean and covariance $\Sigma$, independent of all variables $(x_r)_{r<t}$. The canonical analysis in Box and Tiao (1977) starts as follows.

#### Univariate case

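Suppose $n = 1$, so that the noise covariance $\Sigma$ is a nonnegative scalar. Equation (2) then leads to

$$E[x_t^2] = E[\hat{x}_{t-1}^2] + E[\varepsilon_t^2], \quad \text{thus} \quad 1 = \frac{\hat{\sigma}^2}{\sigma^2} + \frac{\Sigma}{\sigma^2},$$

by introducing the variances $\sigma^2$ and $\hat{\sigma}^2$ of $x_t$ and $\hat{x}_{t-1}$ respectively. Box and Tiao measure the predictability of $x_t$ by the ratio

$$\lambda \overset{\mathrm{def}}{=} \frac{\hat{\sigma}^2}{\sigma^2}.$$

The intuition behind this variance ratio is simple: when it is small, the variance of the noise dominates that of $\hat{x}_{t-1}$ and $x_t$ is dominated by the noise term; when it is large, $\hat{x}_{t-1}$ dominates the noise and $x_t$ can be accurately predicted on average.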

#### Multivariate case

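Suppose $n > 1$ and consider now the univariate process $(y^T x_t)_t$ with weights $y \in \mathbb{R}^n$. Using (2) we know that $y^T x_t = y^T \hat{x}_{t-1} + y^T \varepsilon_t$, and we can measure its predictability as

$$\lambda(y) \overset{\mathrm{def}}{=} \frac{y^T \hat{\mathcal{A}}_0 y}{y^T \mathcal{A}_0 y}, \tag{3}$$

where $\hat{\mathcal{A}}_0$ and $\mathcal{A}_0$ are the covariance matrices of $\hat{x}_{t-1}$ and $x_t$ respectively. Minimizing predictability $\lambda(y)$ is then equivalent to finding the minimum generalized eigenvalue $\lambda$ solving

$$\det(\lambda \mathcal{A}_0 - \hat{\mathcal{A}}_0) = 0. \tag{4}$$

Assuming that $\mathcal{A}_0$ is positive definite, the basket with minimum predictability will be given by $y = \mathcal{A}_0^{-1/2} y_0$, where $y_0$ is the eigenvector corresponding to the smallest eigenvalue of the matrix $\mathcal{A}_0^{-1/2} \hat{\mathcal{A}}_0 \mathcal{A}_0^{-1/2}$.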

#### Estimation of λ(y)

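All of the quantities used to define $\lambda$ above need to be estimated from sample paths. $\mathcal{A}_0$ can be estimated by $A_0$ following Equation (1). All other quantities depend on the predictor $\hat{x}_{t-1}$. Box and Tiao assume that $x_t$ follows a vector autoregressive model of order $p$ (VAR($p$) in short) and therefore $\hat{x}_{t-1}$ takes the form

$$\hat{x}_{t-1} = \sum_{k=1}^{p} \mathcal{H}_k x_{t-k},$$

where the $p$ matrices $(\mathcal{H}_k)$ each contain $n \times n$ autoregressive coefficients. Estimating $\mathcal{H}_k$ from the sample path $\mathbf{x}$, Box and Tiao solve for the optimal basket by inserting these estimates in the generalized eigenvalue problem displayed in Equation (4). If one assumes that $p = 1$ (the case $p > 1$ can be trivially reformulated as a VAR(1) model with adequate reparameterization), then

$$\hat{\mathcal{A}}_0 = \mathcal{H}_1 \mathcal{A}_0 \mathcal{H}_1^T \quad \text{and} \quad \mathcal{A}_1 = \mathcal{A}_0 \mathcal{H}_1,$$

and thus the Yule-Walker estimator (Lütkepohl, 2005, §3.3) $H_1$ of $\mathcal{H}_1$ can be computed from the empirical matrices $A_0$ and $A_1$. Minimizing predictability boils down to solving in that case

$$\min_y \hat{\lambda}(y), \qquad \hat{\lambda}(y) \overset{\mathrm{def}}{=} \frac{y^T (H_1 A_0 H_1^T) y}{y^T A_0 y} = \frac{y^T (A_1 A_0^{-1} A_1^T) y}{y^T A_0 y},$$

which is equivalent to computing the eigenvector associated with the smallest eigenvalue of the matrix $A_0^{-1/2} A_1 A_0^{-1} A_1^T A_0^{-1/2}$ if the covariance matrix $A_0$ is invertible.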
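
The estimation procedure above can be sketched as follows, under the VAR(1) assumption. This is a minimal illustration, not the authors' implementation: the function name `box_tiao_basket` is hypothetical, and $H_1$ is fitted by ordinary least squares (an assumption of this sketch) and plugged directly into $H_1 A_0 H_1^T$ rather than through a closed-form expression in $A_1$.

```python
import numpy as np

def box_tiao_basket(X):
    """Least predictable basket under a VAR(1) model (illustrative sketch).

    X is a (T, n) array of observations; returns (lambda_min, y) with ||y||_2 = 1.
    """
    Xc = X - X.mean(axis=0)
    past, future = Xc[:-1], Xc[1:]
    # Least-squares fit of x_t ~ H1 x_{t-1}; lstsq solves past @ B = future, so H1 = B^T.
    H1 = np.linalg.lstsq(past, future, rcond=None)[0].T
    A0 = Xc.T @ Xc / (len(Xc) - 1)        # empirical covariance of x_t
    A0_hat = H1 @ A0 @ H1.T               # covariance of the fitted predictor
    # Whitening: work with A0^{-1/2} A0_hat A0^{-1/2} (requires A0 positive definite).
    w, V = np.linalg.eigh(A0)
    A0_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
    lam, Z = np.linalg.eigh(A0_inv_sqrt @ A0_hat @ A0_inv_sqrt)
    y = A0_inv_sqrt @ Z[:, 0]             # eigh sorts eigenvalues in ascending order
    return lam[0], y / np.linalg.norm(y)
```

Normalizing `y` does not change the predictability ratio, which is invariant to rescaling of the weights.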

The machinery of Box and Tiao to quantify mean-reversion requires defining a model to form $\hat{x}_{t-1}$, the conditional expectation of $x_t$ given previous observations. We consider in the following two criteria that do without such modeling assumptions.

### 2.3 Portmanteau Criterion

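Recall that the portmanteau statistic of order $p$ (Ljung and Box, 1978) of a centered univariate stationary process $x$ (with $n = 1$) is given by

$$\mathrm{por}_p(x) = \frac{1}{p} \sum_{i=1}^{p} \left( \frac{E[x_t x_{t+i}]}{E[x_t^2]} \right)^2$$

where $E[x_t x_{t+i}] / E[x_t^2]$ is the $i$th order autocorrelation of $x_t$. The portmanteau statistic of a white noise process is by definition $0$ for any $p$. Given a multivariate ($n > 1$) process $x$ we write

$$\phi_p(y) = \mathrm{por}_p(y^T x) = \frac{1}{p} \sum_{i=1}^{p} \left( \frac{y^T \mathcal{A}_i y}{y^T \mathcal{A}_0 y} \right)^2,$$

for a coefficient vector $y \in \mathbb{R}^n$. By construction, $\phi_p(ty) = \phi_p(y)$ for any $t \neq 0$ and in what follows, we will impose $\|y\|_2 = 1$. The quantities $\phi_p(y)$ are computed using the following estimates (Hamilton, 1994, p.110):

$$\hat{\phi}_p(y) = \frac{1}{p} \sum_{i=1}^{p} \left( \frac{y^T A_i y}{y^T A_0 y} \right)^2. \tag{5}$$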
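
A short sketch of the estimate in Equation (5) follows. The function name `portmanteau` and the convention of passing the matrices as a list `[A_0, ..., A_p]` are choices made for this example.

```python
import numpy as np

def portmanteau(y, A):
    """Estimate of Equation (5); A = [A_0, A_1, ..., A_p] is a list of (n, n) arrays."""
    variance = y @ A[0] @ y
    # Mean of the squared empirical autocorrelations of the basket, lags 1..p.
    return np.mean([(y @ Ai @ y / variance) ** 2 for Ai in A[1:]])
```

Because numerator and denominator are both quadratic in `y`, rescaling `y` leaves the value unchanged, which is why the norm of `y` can be fixed to one.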

### 2.4 Crossing Statistics

Kedem and Yakowitz (1994, §4.1) define the zero crossing rate of a univariate ($n = 1$) process $x$ (its expected number of crosses around $0$ per unit of time) as

$$\gamma(x) = E\left[ \frac{\sum_{t=2}^{T} \mathbf{1}_{\{x_t x_{t-1} \leq 0\}}}{T-1} \right]. \tag{6}$$

A result known as the cosine formula states that if $x_t$ is an autoregressive process of order one AR(1), namely if $|a| < 1$, $\varepsilon_t$ is i.i.d. standard Gaussian noise and $x_t = a x_{t-1} + \varepsilon_t$, then (Kedem and Yakowitz, 1994, §4.2.2):

$$\gamma(x) = \frac{\arccos(a)}{\pi}.$$

Hence, for AR(1) processes, minimizing the first order autocorrelation $a$ also directly maximizes the crossing rate of the process $x$. For $n > 1$, since the first order autocorrelation of $(y^T x_t)$ is equal to $y^T A_1 y$ (up to normalization by the variance $y^T A_0 y$), we propose to minimize $y^T A_1 y$ and ensure that all other absolute autocorrelations $|y^T A_k y|$, $k > 1$, are small.
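
The cosine formula can be checked numerically on a simulated AR(1) path. The helper names, the sample size and the value `a = 0.5` below are arbitrary choices for this sketch.

```python
import numpy as np

def crossing_rate(x):
    """Empirical counterpart of Equation (6): fraction of steps with a sign change."""
    return np.mean(x[1:] * x[:-1] <= 0)

def simulate_ar1(a, T, rng):
    """Simulate x_t = a * x_{t-1} + eps_t with standard Gaussian noise."""
    eps = rng.standard_normal(T)
    x = np.empty(T)
    x[0] = eps[0]
    for t in range(1, T):
        x[t] = a * x[t - 1] + eps[t]
    return x

rng = np.random.default_rng(0)
a = 0.5
x = simulate_ar1(a, 100_000, rng)
empirical = crossing_rate(x)             # Monte Carlo estimate of gamma(x)
theoretical = np.arccos(a) / np.pi       # cosine formula
```

On a long enough path the two quantities should agree up to sampling error, and lowering `a` toward $-1$ pushes both toward one crossing per time step.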

## 3 Optimal Baskets

Given a centered multivariate process $x$, we form its covariance matrix $A_0$ and its $p$ autocovariances $(A_1, \dots, A_p)$. Because $y^T A y = y^T (A + A^T) y / 2$ for any square matrix $A$, we symmetrize all autocovariance matrices $A_i$. We investigate in this section the problem of estimating baskets that have maximal mean reversion (as measured by the proxies proposed in Section 2), while being at the same time sufficiently volatile and supported by as few assets as possible. The latter will be achieved by selecting portfolios $y$ that have a small “0-norm”, namely that the number of non-zero components in $y$,

$$\|y\|_0 \overset{\mathrm{def}}{=} \#\{1 \leq i \leq n \mid y_i \neq 0\},$$

is small. The former will be achieved by selecting portfolios whose aggregated value exhibits a variance over time that exceeds a given threshold $\nu > 0$. Note that for the variance of $(y^T x_t)$ to exceed a level $\nu$, the largest eigenvalue of $A_0$ must necessarily be larger than $\nu$, which we always assume in what follows. Combining these two constraints, we propose three different mathematical programs that reflect these trade-offs.

### 3.1 Minimizing Predictability

Minimizing Box-Tiao’s predictability defined in §2.2, while ensuring both that the variance of the resulting process exceeds $\nu$ and that the vector of loadings is sparse with a 0-norm equal to $k$, means solving the following program:

$$\begin{array}{ll} \text{minimize} & y^T M y \\ \text{subject to} & y^T A_0 y \geq \nu, \\ & \|y\|_2 = 1, \; \|y\|_0 = k, \end{array} \tag{P1}$$

in the variable $y \in \mathbb{R}^n$ with $M \overset{\mathrm{def}}{=} A_1 A_0^{-1} A_1^T$, where $M, A_0 \in \mathbf{S}_n$. Without the normalization constraint $\|y\|_2 = 1$ and the sparsity constraint $\|y\|_0 = k$, problem (P1) is equivalent to a generalized eigenvalue problem in the pair $(M, A_0)$. That problem quickly becomes unstable when $A_0$ is ill-conditioned or $M$ is singular. Adding the normalization constraint $\|y\|_2 = 1$ solves these numerical problems.

### 3.2 Minimizing the Portmanteau Statistic

Using a similar formulation, we can also minimize the order $p$ portmanteau statistic defined in §2.3 while ensuring a minimal variance level $\nu$ by solving:

$$\begin{array}{ll} \text{minimize} & \sum_{i=1}^{p} (y^T A_i y)^2 \\ \text{subject to} & y^T A_0 y \geq \nu, \\ & \|y\|_2 = 1, \; \|y\|_0 = k, \end{array} \tag{P2}$$

in the variable $y \in \mathbb{R}^n$, for some parameter $\nu > 0$. Problem (P2) has a natural interpretation: the objective function directly minimizes the portmanteau statistic, while the constraints normalize the norm of the basket weights to one, impose a variance larger than $\nu$ and impose a sparsity constraint on $y$.

### 3.3 Minimizing the Crossing Statistic

Following the results in §2.4, maximizing the crossing rate while keeping the rest of the autocorrelogram low,

$$\begin{array}{ll} \text{minimize} & y^T A_1 y + \mu \sum_{k=2}^{p} (y^T A_k y)^2 \\ \text{subject to} & y^T A_0 y \geq \nu, \\ & \|y\|_2 = 1, \; \|y\|_0 = k, \end{array} \tag{P3}$$

in the variable $y \in \mathbb{R}^n$, for some parameters $\mu, \nu > 0$, will produce processes that are close to being AR(1), while having a high crossing rate.
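
To make the three programs concrete, the sketch below evaluates their objectives and checks their shared constraints for a candidate weight vector. It does not solve the programs; the function names and the list convention `A = [A_0, ..., A_p]` are assumptions of this example.

```python
import numpy as np

def objectives(y, A, M, mu):
    """Objective values of (P1), (P2) and (P3) at y; A = [A_0, ..., A_p]."""
    q = [y @ Ai @ y for Ai in A]                      # quadratic forms y^T A_i y
    p1 = y @ M @ y                                    # predictability objective
    p2 = sum(qi ** 2 for qi in q[1:])                 # portmanteau objective
    p3 = q[1] + mu * sum(qi ** 2 for qi in q[2:])     # crossing-statistic objective
    return p1, p2, p3

def is_feasible(y, A0, nu, k, tol=1e-9):
    """Check the variance, unit-norm and 0-norm constraints shared by (P1)-(P3)."""
    return bool(y @ A0 @ y >= nu - tol
                and abs(np.linalg.norm(y) - 1.0) <= tol
                and np.count_nonzero(y) == k)
```

The exact-cardinality test `np.count_nonzero(y) == k` is what makes these programs combinatorial: enumerating supports of size $k$ is only practical for very small pools.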

## 4 Semidefinite Relaxations and Sparse Components

Problems (P1), (P2) and (P3) are not convex, and can be in practice extremely difficult to solve, since they involve a sparse selection of variables. We detail in this section convex relaxations to these problems which can be used to derive relevant sub-optimal solutions.

### 4.1 A Semidefinite Programming Approach to Basket Estimation

We propose to relax problems (P1), (P2) and (P3) into Semidefinite Programs (SDP) (Vandenberghe and Boyd, 1996). We show that these semidefinite programs can naturally handle sparsity and volatility constraints while still aiming at mean-reversion. In some restricted cases, one can show that these relaxations are tight, in the sense that they solve exactly the programs described above. In such cases, the true solution $y$ of some of the programs above can be recovered using their corresponding SDP solution $Y$.

However, in most of the cases we will be interested in, such a correspondence is not guaranteed and these SDP relaxations can only serve as a guide to propose solutions to these hard non-convex problems when considered with respect to the vector $y$. To do so, the optimal solution $Y$ needs to be deflated from a large rank $n \times n$ matrix to a rank one matrix $y y^T$, where $y$ can be considered a good candidate for basket weights. A typical approach to deflate a positive definite matrix into a vector is to consider its eigenvector with the leading eigenvalue. Having sparsity constraints in mind, we propose to apply a heuristic grounded on sparse PCA (Zou et al., 2006; d’Aspremont et al., 2007). Instead of considering the leading eigenvector, we recover the leading sparse eigenvector of $Y$ (with a 0-norm constrained to be equal to $k$). Several efficient algorithmic approaches have been proposed to solve that problem approximately; we use the SPASM toolbox (Sjöstrand et al., 2012) in our experiments.
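
As a rough illustration of the deflation step, the sketch below uses a simple truncated power iteration to approximate a leading sparse eigenvector. This is a stand-in heuristic chosen for brevity, not the SPASM toolbox used in the experiments, and the function name `sparse_leading_eigvec` is hypothetical.

```python
import numpy as np

def sparse_leading_eigvec(Y, k, iters=200):
    """Truncated power iteration (heuristic stand-in, not SPASM).

    Approximates the leading eigenvector of the symmetric PSD matrix Y
    using at most k non-zero entries.
    """
    n = Y.shape[0]
    y = np.linalg.eigh(Y)[1][:, -1]           # start from the dense leading eigenvector
    for _ in range(iters):
        z = Y @ y
        keep = np.argsort(np.abs(z))[-k:]     # indices of the k largest magnitudes
        y_new = np.zeros(n)
        y_new[keep] = z[keep]                 # zero out all other coordinates
        norm = np.linalg.norm(y_new)
        if norm == 0.0:                       # degenerate case: Y @ y vanished
            break
        y = y_new / norm
    return y
```

When $Y$ is exactly rank one with a $k$-sparse factor, this iteration returns that factor up to sign; in general it only provides a candidate basket whose quality should be checked against the original objective and constraints.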

### 4.2 Predictability

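We can form a convex relaxation of the predictability optimization problem (P1) over the variable $y \in \mathbb{R}^n$,

$$\begin{array}{ll} \text{minimize} & y^T M y \\ \text{subject to} & y^T A_0 y \geq \nu, \\ & \|y\|_2 = 1, \; \|y\|_0 = k, \end{array}$$

by using the lifting argument of Lovász and Schrijver (1991), i.e. writing $Y = y y^T$, to solve now the problem using a semidefinite variable $Y$, and by introducing a sparsity-inducing regularizer on $Y$ which considers the $\ell_1$ norm of $Y$,

$$\|Y\|_1 \overset{\mathrm{def}}{=} \sum_{ij} |Y_{ij}|,$$

so that Problem (P1) becomes (here $\rho > 0$),

$$\begin{array}{ll} \text{minimize} & \mathrm{Tr}(MY) + \rho \|Y\|_1 \\ \text{subject to} & \mathrm{Tr}(A_0 Y) \geq \nu, \\ & \mathrm{Tr}(Y) = 1, \; \mathrm{Rank}(Y) = 1, \; Y \succeq 0. \end{array}$$

We relax this last problem further by dropping the rank constraint, to get

$$\begin{array}{ll} \text{minimize} & \mathrm{Tr}(MY) + \rho \|Y\|_1 \\ \text{subject to} & \mathrm{Tr}(A_0 Y) \geq \nu, \\ & \mathrm{Tr}(Y) = 1, \; Y \succeq 0, \end{array} \tag{SDP1}$$

which is a convex semidefinite program in $Y \in \mathbf{S}_n$.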
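
The identities behind the lifting step can be verified numerically. In the sketch below the matrix `M` is an arbitrary positive semidefinite stand-in, and the vector `y` is a synthetic $k$-sparse unit vector; both are assumptions of the example.

```python
import numpy as np

def lift(y):
    """Rank-one lifting Y = y y^T used in the relaxation."""
    return np.outer(y, y)

rng = np.random.default_rng(0)
n, k = 6, 3
y = np.zeros(n)
y[:k] = rng.standard_normal(k)
y /= np.linalg.norm(y)                   # unit norm with 0-norm equal to k
B = rng.standard_normal((n, n))
M = B @ B.T                              # synthetic PSD stand-in for M
Y = lift(y)

quad_form = y @ M @ y                    # objective of (P1)
lifted_obj = np.trace(M @ Y)             # same value written as Tr(MY)
trace_Y = np.trace(Y)                    # equals ||y||_2^2 = 1
l1_norm = np.abs(Y).sum()                # equals (sum_i |y_i|)^2, at most k
```

The last quantity explains why the $\ell_1$ penalty is a reasonable surrogate for sparsity: by the Cauchy-Schwarz inequality, a unit vector with $k$ non-zero entries satisfies $\|y y^T\|_1 \leq k$.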

### 4.3 Portmanteau

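Using the same lifting argument and writing $Y = y y^T$, we can relax problem (P2) by solving

$$\begin{array}{ll} \text{minimize} & \sum_{i=1}^{p} \mathrm{Tr}(A_i Y)^2 + \rho \|Y\|_1 \\ \text{subject to} & \mathrm{Tr}(A_0 Y) \geq \nu, \\ & \mathrm{Tr}(Y) = 1, \; Y \succeq 0, \end{array} \tag{SDP2}$$

a semidefinite program in $Y \in \mathbf{S}_n$.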

### 4.4 Crossing Stats

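As above, we can write a semidefinite relaxation for problem (P3):

$$\begin{array}{ll} \text{minimize} & \mathrm{Tr}(A_1 Y) + \mu \sum_{i=2}^{p} \mathrm{Tr}(A_i Y)^2 + \rho \|Y\|_1 \\ \text{subject to} & \mathrm{Tr}(A_0 Y) \geq \nu, \\ & \mathrm{Tr}(Y) = 1, \; Y \succeq 0. \end{array} \tag{SDP3}$$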

#### Tightness of the SDP Relaxation in the Absence of Sparsity Constraints

Note that for the crossing stats criterion (with $p = 1$ and no quadratic term in $Y$), the original problem (P3) and its relaxation (SDP3) are equivalent, provided that no sparsity constraint is considered in the original problem and $\rho$ is set to $0$ in the relaxation. The relaxation then boils down to an SDP that only has a linear objective, a linear constraint and a constraint on the trace of $Y$. In that case, Brickman (1961) showed that the range of two quadratic forms over the unit sphere is a convex set when the ambient dimension $n \geq 3$, which means in particular that for any two square matrices $A, B$ of dimension $n$

$$\{(y^T A y, y^T B y) : y \in \mathbb{R}^n, \; \|y\|_2 = 1\} = \{(\mathrm{Tr}(AY), \mathrm{Tr}(BY)) : Y \in \mathbf{S}_n, \; \mathrm{Tr}\, Y = 1, \; Y \succeq 0\}.$$

We refer the reader to (Barvinok, 2002, §II.13) for a more complete discussion of this result. As remarked in (Cuturi and d’Aspremont, 2013), the same equivalence holds for (P1) and (SDP1). This means that, in the case where $\rho = 0$ and the 0-norm of $y$ is not constrained, for any solution $Y^\star$ of the relaxation (SDP1) there exists a vector $y^\star$ which satisfies $\|y^\star\|_2^2 = \mathrm{Tr}(Y^\star) = 1$, $y^{\star T} A_0 y^\star = \mathrm{Tr}(A_0 Y^\star)$ and $y^{\star T} M y^\star = \mathrm{Tr}(M Y^\star)$, which means that $y^\star$ is an optimal solution of the original problem (P1). Boyd and Vandenberghe (2004, App. B) show how to explicitly extract such a solution $y^\star$ from a matrix $Y^\star$ solving (SDP1). This result is however mostly anecdotal in the context of this paper, in which we look for sparse and volatile baskets: using these two regularizers breaks the tightness result between the original problems in $\mathbb{R}^n$ and their SDP counterparts.

## 5 Numerical Experiments

In this section, we evaluate the ability of our techniques to extract mean-reverting baskets with sufficient variance and small 0-norm from a universe of tradable assets. We measure performance by applying to these baskets a trading strategy designed specifically for mean-reverting processes. We show that, under realistic trading costs assumptions, selecting sparse and volatile mean-reverting baskets translates into lower incurred costs and thus improves the performance of trading strategies.

### 5.1 Historical Data

We consider daily time series of option implied volatilities for 210 stocks from January 4 2004 to December 30 2010. A key advantage of using option implied volatility data is that these numbers vary in a somewhat limited range. Volatility also tends to exhibit regime switching, hence can be considered piecewise stationary, which helps in extracting structural relationships. We plot a sample time series from this dataset in Figure 1 that corresponds to the implied volatility of Apple’s stock. In what follows, we mean by asset the implied volatility of any of these stocks, whose value can be efficiently replicated using option portfolios.

### 5.2 Mean-reverting Basket Estimators

We compare the three basket selection techniques detailed here (predictability, portmanteau and crossing statistic), implemented with varying targets for both sparsity and volatility, with two cointegration estimators that build upon principal component analysis (Maddala and Kim, 1998, §5.5.4). By the label ‘PCA’ we mean in what follows the eigenvector with smallest eigenvalue of the covariance matrix $A_0$ of the process (Stock and Watson, 1988). By ‘sPCA’ we mean the sparse eigenvector of $A_0$ with 0-norm $k$ that has the smallest eigenvalue, which can be simply estimated by computing the leading sparse eigenvector of $\lambda I - A_0$, where $\lambda$ is bigger than the leading eigenvalue of $A_0$. This sparse principal component of the covariance matrix $A_0$ should not be confused with our use of sparse PCA in Section 4.1 as a way to recover a vector solution from the solution of a positive semidefinite problem. Note also that techniques based on principal components do not explicitly take variance levels into account when estimating the weights of a cointegrated relationship.
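
The shift used for the ‘sPCA’ baseline can be illustrated in the dense case: the eigenvector of $A_0$ with smallest eigenvalue is the leading eigenvector of $\lambda I - A_0$. The function names and the `factor` parameter below are choices made for this sketch.

```python
import numpy as np

def pca_basket(A0):
    """'PCA' baseline: eigenvector of A0 associated with its smallest eigenvalue."""
    w, V = np.linalg.eigh(A0)          # eigenvalues returned in ascending order
    return V[:, 0]

def shifted(A0, factor=1.1):
    """Return lambda * I - A0 with lambda above the leading eigenvalue of A0.

    The factor 1.1 is an arbitrary choice; any value above 1 works when A0 is
    positive definite.
    """
    lam = factor * np.linalg.eigvalsh(A0)[-1]
    return lam * np.eye(A0.shape[0]) - A0
```

Feeding `shifted(A0)` to any routine that computes a leading sparse eigenvector then yields a sparse approximation of the smallest-eigenvalue direction of `A0`.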

### 5.3 Jurek and Yang Trading Strategy

While option implied volatility is not directly tradable, it can be synthesized using baskets of call options, and we treat it as a tradable asset with (significant) transaction costs in what follows. For baskets of volatilities isolated by the techniques listed above, we apply the Jurek and Yang (2007) strategy for log utilities to the basket process, recording out-of-sample performance. Jurek and Yang propose to trade a stationary autoregressive process $(x_t)_t$ of order $1$ with mean $\mu$, mean reversion speed $\rho$ and volatility $\sigma$ by taking a position $N_t$ in the asset $x_t$ which is proportional to

$$N_t = \frac{\rho (\mu - x_t)}{\sigma^2} W_t. \tag{7}$$

In effect, the strategy advocates taking a long (resp. short) position in the asset whenever it is below (resp. above) its long-term mean, and adjusting the position size to account for the volatility $\sigma$ of $x_t$ and its mean reversion speed $\rho$. Given basket weights $y$, we apply standard AR estimation procedures on the in-sample portion of $y^T x$ to recover estimates for $\rho$ and $\sigma$ and plug them directly in Equation (7). This approach is illustrated for two baskets in Figure 2.
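
The position rule of Equation (7) is simple enough to state as code. In this sketch the parameter `wealth` stands for $W_t$, which is read here as the investor's current wealth (an assumption, since the text does not define it), and the estimation of `mu`, `rho` and `sigma` is left out.

```python
def jurek_yang_position(x, mu, rho, sigma, wealth):
    """Position of Equation (7): N_t = rho * (mu - x_t) / sigma**2 * W_t.

    wealth plays the role of W_t (interpretation assumed for this sketch).
    """
    return rho * (mu - x) / sigma ** 2 * wealth
```

For example, `jurek_yang_position(x=1.0, mu=2.0, rho=0.5, sigma=2.0, wealth=100.0)` gives a long position because the basket trades below its mean, and the position shrinks as `sigma` grows.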

### 5.4 Transaction Costs

We assume that fixed transaction costs are negligible, but that transaction costs per contract unit are incurred at each trading date. We vary the size of these costs across experiments to show the robustness of the approaches tested here to fluctuations in trading costs. We let the transaction cost per contract unit vary between 0.03 and 0.17 cents by increments of 0.02 cents. Since the average value of a contract over our dataset is about 40 cents, this is akin to considering trading costs ranging from about 7 to about 40 basis points (BP), that is 0.07% to 0.4%.
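
The conversion from per-contract costs to basis points can be reproduced with a few lines of arithmetic. The helper name and the default contract value of 40 cents (taken from the text above) are the only inputs.

```python
def cost_in_basis_points(cost_cents, contract_value_cents=40.0):
    """Relative cost in basis points (1 BP = 0.01%) for a given per-contract cost."""
    return 1e4 * cost_cents / contract_value_cents

# Cost grid from the text: 0.03 to 0.17 cents in steps of 0.02 cents.
costs = [round(0.03 + 0.02 * i, 2) for i in range(8)]
costs_bp = [cost_in_basis_points(c) for c in costs]
```

The end points of the grid map to 7.5 and 42.5 basis points, consistent with the rounded range of about 7 to about 40 quoted above.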

### 5.5 Experimental Setup

We consider 20 sliding windows of one year (255 trading days) taken in the history, and consider each of these windows independently. Each window is split between 85% of days to estimate and 15% of days to test-trade our models, resulting in 38 test-trading days. We do not recompute the weights of the baskets during the test phase. The 210 stock volatilities (assets) we consider are grouped into 13 subgroups, depending on the economic sector of their stock. This results in 13 sector pools whose size varies between 3 assets and 43 assets. We look for mean-reverting baskets in each of these 13 sector pools.
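
The in-sample/out-of-sample split of each window can be sketched as below. The rounding convention (truncating the test share) is an assumption chosen so that the defaults reproduce the 38 test-trading days mentioned above.

```python
def split_window(n_days=255, test_fraction=0.15):
    """Return (n_train, n_test) for one window.

    The test share is truncated to an integer number of days (assumed convention).
    """
    n_test = int(n_days * test_fraction)
    return n_days - n_test, n_test

n_train, n_test = split_window()
```

With the default values this yields 217 estimation days followed by 38 test-trading days, during which the basket weights are held fixed.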

Because all combinations of stocks in each of the 13 sector pools may not necessarily be mean-reverting, we select smaller candidate pools of $n$ assets through a greedy backward-forward minimization scheme, where $8 \leq n \leq 12$. To do so, we start with an exhaustive search of all pools of size 3 within the sector pool, and proceed by adding or removing an asset using the PCA estimator (the smallest eigenvalue of the covariance matrix of a set of assets). We use the PCA estimator in that backward-forward search because it is the fastest to compute. We score each pool using that PCA statistic, the smaller the better. We generate up to 200 candidate pools for each of the 13 sector pools. Out of all these candidate pools, we keep the best 50 in each window, and then use our cointegration estimation approaches separately on these candidates. One such pool was, for instance, composed of the stocks {BBY,COST,DIS,GCI,MCD,VOD,VZ,WAG,T} observed during the year 2006. Figure 2 provides a closeup on that universe of stocks, and shows the results of three trading experiments using either PCA, sparse PCA or the Crossing Stats estimator to build trading strategies.

### 5.6 Results

#### Robustness of Sharpe Ratios to Costs

In Figure 3, we plot the average of the Sharpe ratio over the baskets estimated in our experimental set versus transaction costs. We consider different PCA settings as well as our three estimators, setting in all three cases the variance bound $\nu$ to a fixed multiple of the median of all variances of assets available in a given asset pool, and the 0-norm to 0.3 times the size of the universe (itself between 8 and 12). We observe that Sharpe ratios decrease the fastest for the naive PCA-based method, this decrease being somewhat mitigated when adding a constraint on the 0-norm of the basket weights obtained with sparse PCA. Our methods require, in addition to sparsity, enough volatility to secure sufficient gains. These empirical observations agree with the intuition of this paper: simple cointegration techniques can produce synthetic baskets with high mean-reversion, large support and low variance. Trading a portfolio with low variance which is supported by multiple assets translates in practice into high trading costs which can damage the overall performance of the strategy. Both sparse PCA and our techniques manage instead to achieve a trade-off between desirable mean-reversion properties and, at the same time, control for sufficient variance and small basket size to allow for lower overall transaction costs.

#### Tradeoffs Between Mean Reversion, Sparsity, and Volatility

In the plots of Figures 4 and 5, this analysis is further detailed by considering various settings for $\nu$ (volatility threshold) and $k$. To improve the legibility of these results we summarize, following the observation in Figure 3 that the relationship between Sharpe ratios and transaction costs seems almost linear, each of these curves by two numbers: an intercept level (Sharpe ratio when costs are low) and a slope (degradation of the Sharpe ratio as costs increase). Using these two numbers, we locate all considered strategies in the intercept/slope plane. We first show the spectral techniques, PCA and sPCA with different levels of sparsity, meaning that $k$ is set to a fixed fraction of the size of the original basket. Each of the three estimators we propose is studied in a separate plot. For each we present various results characterized by two numbers: a volatility threshold $\nu$ and a sparsity level $k$. To avoid cumbersome labels, we attach an arrow to each point: the arrow’s length in the vertical direction encodes $k$ and characterizes the size of the basket, while its horizontal length encodes $\nu$ and characterizes the volatility level. As can be seen in these 3 plots, an interesting interplay between these two factors allows for a continuum of strategies that trade mean-reversion (and thus Sharpe levels) for robustness to cost level.
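
The intercept/slope summary of a Sharpe-versus-cost curve amounts to a degree-one least-squares fit. The sketch below uses `numpy.polyfit`; the function name `summarize_curve` is hypothetical and any numbers passed to it in an example would be synthetic, not results from the paper.

```python
import numpy as np

def summarize_curve(costs, sharpes):
    """Least-squares line through (cost, Sharpe) points.

    Returns (intercept, slope): the Sharpe level at zero cost and its
    degradation per unit of cost.
    """
    slope, intercept = np.polyfit(costs, sharpes, deg=1)   # highest degree first
    return intercept, slope
```

A strategy with a high intercept and a slope close to zero is the most desirable in the intercept/slope plane, since it starts from a high Sharpe ratio and loses little of it as costs grow.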

## 6 Conclusion

We have described three different criteria to quantify the amount of mean reversion in a time series. For each of these criteria, we have detailed a tractable algorithm to isolate a vector of weights that has optimal mean reversion, while constraining both the variance (or signal strength) of the resulting univariate series to be above a certain level and its 0-norm to be at a certain level. We show that these bounds on variance and support size, together with our new criteria for mean reversion, can significantly improve the performance of mean reversion statistical arbitrage strategies and provide useful controls to adjust mean-reverting strategies to varying trading conditions, notably liquidity risk and cost environment.

### References

1. A. Barvinok. A course in convexity. American Mathematical Society, 2002.
2. R. Bewley, D. Orden, M. Yang, and L.A. Fisher. Comparison of Box-Tiao and Johansen Canonical Estimators of Cointegrating Vectors in VEC (1) Models. Journal of Econometrics, 64:3–27, 1994.
3. G.E.P. Box and G.C. Tiao. A canonical analysis of multiple time series. Biometrika, 64(2):355–365, 1977.
4. Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
5. L. Brickman. On the field of values of a matrix. Proceedings of the American Mathematical Society, pages 61–66, 1961.
6. Marco Cuturi and Alexandre d’Aspremont. Mean reversion with a variance threshold. In Proceedings of the International Conference in Machine Learning 2013, 2013.
7. Alexandre d’Aspremont. Identifying small mean reverting portfolios. Quantitative Finance, 11(3):351–364, 2011.
8. Alexandre d’Aspremont, Laurent El Ghaoui, Michael I Jordan, and Gert RG Lanckriet. A direct formulation for sparse PCA using semidefinite programming. SIAM review, 49(3):434–448, 2007.
9. R. Elie and G.-E. Espinosa. Optimal stopping of a mean reverting diffusion: minimizing the relative distance to the maximum. hal-00573429, 2011.
10. Robert F. Engle and C. W. J. Granger. Co-integration and error correction: Representation, estimation, and testing. Econometrica, 55(2):251–276, 1987.
11. J.D. Hamilton. Time series analysis, volume 2. Cambridge Univ Press, 1994.
12. S. Johansen. Cointegration: a survey. Palgrave Handbook of Econometrics, 1, 2005.
13. Soren Johansen. Estimation and hypothesis testing of cointegration vectors in gaussian vector autoregressive models. Econometrica, 59(6):1551–80, November 1991.
14. Søren Johansen. Cointegration: Overview and development. In Torben Gustav Andersen, Richard A Davis, Jens-Peter Kreiß, and Thomas V Mikosch, editors, Handbook of financial time series. Springer, 2004.
15. Jakub W. Jurek and Halla Yang. Dynamic Portfolio Selection in Arbitrage. SSRN eLibrary, 2007. doi: 10.2139/ssrn.882536.
16. B. Kedem and S. Yakowitz. Time series analysis by higher order crossings. IEEE press Piscataway, NJ, 1994.
17. J. Liu and A. Timmermann. Optimal arbitrage strategies. Technical report, UC San Diego Working Paper, 2010.
18. G.M. Ljung and G.E.P. Box. On a measure of lack of fit in time series models. Biometrika, 65(2):297–303, 1978.
19. L. Lovász and A. Schrijver. Cones of matrices and set-functions and 0-1 optimization. SIAM Journal on Optimization, 1(2):166–190, 1991.
20. H. Lütkepohl. New Introduction to Multiple Time Series Analysis. Springer, 2005.
21. GS Maddala and I.M. Kim. Unit roots, cointegration, and structural change. Cambridge Univ Pr, 1998.
22. Karl Sjöstrand, Line Harder Clemmensen, Rasmus Larsen, and Bjarne Ersbøll. Spasm: A matlab toolbox for sparse statistical modeling. Journal of Statistical Software, 2012.
23. J.H. Stock and M.W. Watson. Testing for common trends. Journal of the American Statistical Association, pages 1097–1107, 1988.
24. Ruey S Tsay. Analysis of financial time series, volume 543. Wiley-Interscience, 2005.
25. L. Vandenberghe and S. Boyd. Semidefinite programming. SIAM review, 38(1):49–95, 1996.
26. N Donald Ylvisaker. The expected number of zeros of a stationary gaussian process. The Annals of Mathematical Statistics, pages 1043–1046, 1965.
27. Hui Zou, Trevor Hastie, and Robert Tibshirani. Sparse principal component analysis. Journal of computational and graphical statistics, 15(2):265–286, 2006.