
Heterotic Risk Models


Zura Kakushadze 1,2


1 Quantigic Solutions LLC

1127 High Ridge Road #135, Stamford, CT 06905

2 Free University of Tbilisi, Business School & School of Physics

240, David Agmashenebeli Alley, Tbilisi, 0159, Georgia

(April 30, 2015)


We give a complete algorithm and source code for constructing what we refer to as heterotic risk models (for equities), which combine: i) granularity of an industry classification; ii) diagonality of the principal component factor covariance matrix for any sub-cluster of stocks; and iii) dramatic reduction of the factor covariance matrix size in the Russian-doll risk model construction. This appears to be a powerful approach for constructing out-of-sample stable short-lookback risk models. Thus, for intraday mean-reversion alphas based on overnight returns, Sharpe ratio optimization using our heterotic risk models sizably improves the performance characteristics compared with weighted regressions based on principal components or industry classification. We also give source code for: a) building statistical risk models; and b) Sharpe ratio optimization with homogeneous linear constraints and position bounds.

1 Introduction

When the number of stocks in a portfolio is large and the number of available (relevant) observations in the historical time series of returns is limited – which is essentially a given for short-horizon quantitative trading strategies – the sample covariance matrix (SCM) is badly singular. This makes portfolio optimization – e.g., Sharpe ratio maximization – challenging, as it requires the covariance matrix to be invertible. A standard method for circumventing this difficulty is to employ factor models,3 which, instead of computing SCM for a large number of stocks, allow one to compute a factor covariance matrix (FCM) for many fewer risk factors. However, the number of relevant risk factors itself can be rather large. E.g., in a (desirably) granular industry classification (IC), the number of industries4 can be in the 3 digits for a typical (liquid) trading universe. Then even the sample FCM can be singular.

In (Kakushadze, 2015c) a simple idea is set forth: model FCM itself via a factor model, and repeat this process until the remaining FCM is small enough and can be computed. In fact, at the end of this process we may even end up with a single factor, for which “FCM” is simply its variance.5 This construction – termed “Russian-doll” risk models (Kakushadze, 2015c) – dramatically reduces the number of – or altogether eliminates – the factors for which (off-diagonal) FCM must be computed. The “catch” is that at each successive step we must: i) identify the risk factors; and ii) compute the specific (idiosyncratic) risk (ISR) and FCM consistently.

Identifying the risk factors in the Russian-doll construction is facilitated by using a binary industry classification:6 with BICS as an illustrative example, industries serve as the risk factors for sub-industries; sectors – there are only 10 of them – serve as the risk factors for industries; and – if need be – the “market” serves as the sole risk factor for sectors. Correctly computing ISR and FCM is less straightforward: the algorithms for this are generally deemed proprietary. One method in the “lore” is to use a cross-sectional linear regression, where the returns are regressed over a factor loadings matrix (FLM), and FCM is identified with the (serial) covariance matrix of the regression coefficients, whereas ISR squared is identified with the (serial) variance of the regression residuals. However, as discussed in (Kakushadze, 2015c), generally this does not satisfy a nontrivial requirement (which is often overlooked in practice) that the factor model reproduce the historical in-sample total variance.

In this paper we share a complete algorithm and source code for building what we refer to as “heterotic” risk models. It is based on a simple observation that, if we use principal components (PCs) as FLM, the aforementioned total variance condition is automatically satisfied. Unfortunately, the number of useful PCs is small as it is limited by the number of observations, and they also tend to be unstable out-of-sample (as they are based on off-diagonal covariances), with the first PC being the most stable. We circumvent this by building FLM from the first PCs of the blocks (sub-matrices) of the sample correlation matrix7 corresponding to – in the BICS language – the sub-industries. I.e., if there are $N$ stocks and $K$ sub-industries, FLM is $N\times K$, and in each column $A$ the elements corresponding to the tickers in the sub-industry $A$ are proportional to the first PC of the corresponding block, with all other elements vanishing.8 The total variance condition is automatically satisfied. Then applying the Russian-doll construction yields a nonsingular factor model covariance matrix, which appears to be stable out-of-sample, considering that it sizably adds value in Sharpe ratio optimization for certain intraday mean-reversion alphas we backtest.

Heterotic risk models are based on our proprietary know-how. We hope sharing it with the investment community encourages organic building of custom risk models.

This paper is organized as follows. In Section 2 we briefly review some generalities of factor models and discuss in detail the total variance condition. In Section 3 we discuss the PC approach and an algorithm for fixing the number of PC factors, with the R source code in Appendix A. We discuss heterotic risk models in detail in Section 4, with the complete Russian-doll embedding in Section 5 and the R source code in Appendix B. In Section 6 we run a horse race of intraday mean-reversion alphas via i) weighted regressions and ii) optimization using heterotic risk models. For optimization with homogeneous linear constraints and (liquidity/position) bounds we use the R source code in Appendix C.9 We briefly conclude in Section 7.

2 Multi-factor Risk Models

2.1 Generalities

In a multi-factor risk model, a sample covariance matrix $C_{ij}$ for $N$ stocks ($i,j = 1,\dots,N$), which is computed based on a time series of stock returns (e.g., daily close-to-close returns), is modeled via a constructed covariance matrix $\Gamma_{ij}$ given by

$\Gamma \equiv \Xi + \Omega\,\Phi\,\Omega^T$   (1)
$\Xi_{ij} \equiv \xi_i^2\,\delta_{ij}$   (2)

where $\delta_{ij}$ is the Kronecker delta; $\Gamma \equiv \Gamma_{ij}$ is an $N\times N$ matrix; $\xi_i$ is the specific risk (a.k.a. idiosyncratic risk) for each stock; $\Omega \equiv \Omega_{iA}$ is an $N\times K$ factor loadings matrix; and $\Phi \equiv \Phi_{AB}$ is a $K\times K$ factor covariance matrix, $A,B = 1,\dots,K$, where $K \ll N$. I.e., the random processes $\Upsilon_i$ corresponding to the $N$ stock returns are modeled via $N$ random processes $\chi_i$ (specific risk) together with $K$ random processes $f_A$ (factor risk):

$\Upsilon_i = \chi_i + \sum_{A=1}^{K}\Omega_{iA}\,f_A$   (3)
$\mathrm{Cov}(\chi_i,\,\chi_j) = \Xi_{ij}$   (4)
$\mathrm{Cov}(\chi_i,\,f_A) = 0$   (5)
$\mathrm{Cov}(f_A,\,f_B) = \Phi_{AB}$   (6)
$\mathrm{Cov}(\Upsilon_i,\,\Upsilon_j) = \Gamma_{ij}$   (7)

When $M \leq N$, where $M$ is the number of observations in each time series, the sample covariance matrix $C_{ij}$ is singular with only $M-1$ nonzero eigenvalues. In contrast, assuming all $\xi_i > 0$ and $\Phi_{AB}$ is positive-definite, $\Gamma_{ij}$ is automatically positive-definite (and invertible). Furthermore, the off-diagonal elements of $C_{ij}$ typically are not expected to be too stable out-of-sample. On the contrary, the factor model covariance matrix $\Gamma_{ij}$ is expected to be much more stable, as the number of risk factors, for which the factor covariance matrix $\Phi_{AB}$ needs to be computed, is $K \ll N$.
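For concreteness, here is a minimal R sketch (the function name and interface are ours, for illustration only, and not part of the appendix code) that assembles the factor model covariance matrix (1)-(2) from its ingredients:

# Minimal sketch: assemble Gamma = Xi + Omega Phi Omega^T per (1)-(2).
# xi: length-N vector of specific risks; omega: N x K factor loadings
# matrix; phi: K x K factor covariance matrix. Names are illustrative.
fm.cov <- function(xi, omega, phi)
{
   gamma <- omega %*% phi %*% t(omega)
   diag(gamma) <- diag(gamma) + xi^2
   gamma
}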

2.2 Conditions on Total Variances

The prime aim of a risk model is to predict the covariance matrix out-of-sample as precisely as possible, including the out-of-sample total variances. However, albeit this requirement is often overlooked in practical applications, a well-built factor model had better reproduce the in-sample total variances. That is, we require that the factor model total variance $\Gamma_{ii}$ coincide with the in-sample total variance $C_{ii}$:

$\Gamma_{ii} = C_{ii},\quad i = 1,\dots,N$   (8)

A priori this gives $N$ conditions10 for the $N + K(K+1)/2$ unknowns $\xi_i$ and $\Phi_{AB}$, so we need additional assumptions11 to compute $\xi_i$ and $\Phi_{AB}$.

2.3 Linear Regression

One such assumption – intuitively – is that the total risk should be attributed to the factor risk to the greatest extent possible, i.e., the part of the total risk attributed to the specific risk should be minimized. One way to formulate this requirement mathematically is via least squares. First, mimicking (3), we decompose the stock returns $R_i$ via a linear model

$R_i = \sum_{A=1}^{K}\Omega_{iA}\,f_A + \varepsilon_i$   (9)

Here the residuals $\varepsilon_i$ are not the same as $\chi_i$ in (3); in particular, generally the covariance matrix $\mathrm{Cov}(\varepsilon_i,\,\varepsilon_j)$ is not diagonal (see below). We can require that

$\sum_{i=1}^{N} z_i\,\varepsilon_i^2 \to \min$   (10)

where $z_i$ are some positive weights, and the minimization is w.r.t. $f_A$. This produces a weighted linear regression12 with the regression weights $z_i$. So, what should these weights be?

2.4 Correlations, Not Covariances

While choosing unit weights $z_i \equiv 1$ might appear to be the simplest thing to do, this suffers from a shortcoming. Intuitively it is clear that – on average – the residuals $\varepsilon_i$ are larger for more volatile stocks, so the regression with unit weights would produce skewed results.13 This can be readily rectified using nontrivial regression weights. A “natural” choice is $z_i = 1/\sigma_i^2$, where $\sigma_i^2 \equiv C_{ii}$ is the sample variance of the $i$-th stock's returns. In fact, we then have a regression with unit weights:

$\hat R_i = \sum_{A=1}^{K}\hat\Omega_{iA}\,f_A + \hat\varepsilon_i$   (11)
$\sum_{i=1}^{N}\hat\varepsilon_i^2 \to \min$   (12)

where $\hat R_i \equiv R_i/\sigma_i$, $\hat\Omega_{iA} \equiv \Omega_{iA}/\sigma_i$, $\hat\varepsilon_i \equiv \varepsilon_i/\sigma_i$, and on average $\hat\varepsilon_i$ are expected to be much more evenly distributed compared with $\varepsilon_i$ – we have scaled away the volatility skewness via rescaling the returns, factor loadings and residuals by $1/\sigma_i$.

So, we are now modeling the sample correlation matrix $\Psi_{ij} \equiv C_{ij}/\sigma_i\,\sigma_j$ (note that $\Psi_{ii} = 1$, while $|\Psi_{ij}| \leq 1$ for $i \neq j$)14 via another factor model matrix

$\hat\Gamma_{ij} \equiv \hat\xi_i^2\,\delta_{ij} + \sum_{A,B=1}^{K}\hat\Omega_{iA}\,\hat\Phi_{AB}\,\hat\Omega_{jB}$   (13)

where $\hat\Gamma_{ij} \equiv \Gamma_{ij}/\sigma_i\,\sigma_j$, $\hat\xi_i \equiv \xi_i/\sigma_i$, and $\hat\Phi_{AB} \equiv \Phi_{AB}$ (the factor covariance matrix is unaffected by the rescaling). The solution to (12) is given by (in matrix notation)

$f = \left(\hat\Omega^T\,\hat\Omega\right)^{-1}\hat\Omega^T\,\hat R$   (14)
$\hat\varepsilon = \hat R - \hat\Omega\,f = (1 - Q)\,\hat R$   (15)
$Q \equiv \hat\Omega\left(\hat\Omega^T\,\hat\Omega\right)^{-1}\hat\Omega^T$   (16)

where $Q$ is a projection operator: $Q^2 = Q$. Consequently, we have:

$\mathrm{Cov}(\hat\varepsilon_i,\,\hat\varepsilon_j) = \left[(1-Q)\,\Psi\,(1-Q)\right]_{ij}$   (17)
$\hat\Phi = \left(\hat\Omega^T\,\hat\Omega\right)^{-1}\hat\Omega^T\,\Psi\,\hat\Omega\left(\hat\Omega^T\,\hat\Omega\right)^{-1}$   (18)

Note that the matrix $\mathrm{Cov}(\hat\varepsilon_i,\,\hat\varepsilon_j)$ is not diagonal. However, the idea here is to identify $\hat\xi_i^2$ with the diagonal part of $\mathrm{Cov}(\hat\varepsilon_i,\,\hat\varepsilon_j)$:

$\hat\xi_i^2 \equiv \left[(1-Q)\,\Psi\,(1-Q)\right]_{ii}$   (19)

and we have

$\hat\Gamma_{ij} = \hat\xi_i^2\,\delta_{ij} + \left[Q\,\Psi\,Q\right]_{ij}$   (20)

Note that $\hat\xi_i^2$ defined via (19) are automatically positive (nonnegative, to be precise – see below). However, we must satisfy the conditions (8), which reduce to

$\hat\Gamma_{ii} = \Psi_{ii} = 1$   (21)

and imply

$\left[Q\,\Psi\,(1-Q)\right]_{ii} = 0,\quad i = 1,\dots,N$   (22)
$\sum_{i=1}^{N}\left[Q\,\Psi\,(1-Q)\right]_{ii} = \mathrm{Tr}\left(Q\,\Psi\,(1-Q)\right) \equiv 0$   (23)

The conditions (22) are not all independent: by (23), their sum vanishes identically (as $(1-Q)\,Q \equiv 0$). Thus, we have at most $N-1$ independent conditions.
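The computation (14)-(20) is straightforward to code. The following R sketch (our own illustration with hypothetical names, not the code in the appendices) takes the sample correlation matrix and a factor loadings matrix and returns the regression-based factor covariance matrix, specific risks and model matrix:

# Sketch of (14)-(20): q is the projector Q of eq. (16); phi is given by
# (18); xi2 by (19); gamma is the factor model matrix (20).
fm.regress <- function(psi, omega)
{
   inv.qf <- solve(t(omega) %*% omega)
   q <- omega %*% inv.qf %*% t(omega)                       # eq. (16)
   p <- diag(nrow(psi)) - q                                 # 1 - Q
   phi <- inv.qf %*% t(omega) %*% psi %*% omega %*% inv.qf  # eq. (18)
   xi2 <- diag(p %*% psi %*% p)                             # eq. (19)
   gamma <- q %*% psi %*% q                                 # eq. (20)
   diag(gamma) <- diag(gamma) + xi2
   list(phi = phi, xi2 = xi2, gamma = gamma)
}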

3 Principal Components

The conditions (22) are nontrivial. They are not satisfied for an arbitrary factor loadings matrix $\hat\Omega_{iA}$. However, there is a simple way of satisfying these conditions, to wit, by building $\hat\Omega_{iA}$ from the principal components of the correlation matrix $\Psi_{ij}$.

Let $V_i^{(a)}$, $a = 1,\dots,N$, be the principal components of $\Psi_{ij}$ forming an orthonormal basis

$\sum_{j=1}^{N}\Psi_{ij}\,V_j^{(a)} = \lambda^{(a)}\,V_i^{(a)}$   (24)
$\sum_{i=1}^{N}V_i^{(a)}\,V_i^{(b)} = \delta_{ab}$   (25)

such that the eigenvalues $\lambda^{(a)}$ are ordered decreasingly: $\lambda^{(1)} \geq \lambda^{(2)} \geq \cdots \geq \lambda^{(N)}$. More precisely, some eigenvalues may be degenerate. For simplicity – and this is not critical here – we will assume that all positive eigenvalues are non-degenerate. However, we can have multiple null eigenvalues. Typically, the number of nonvanishing eigenvalues15 is $M-1$, where, as above, $M$ is the number of observations in the stock return time series. We can readily construct a factor model with $K \leq M-1$ factors:

$\hat\Omega_{iA} \equiv \sqrt{\lambda^{(A)}}\;V_i^{(A)},\quad A = 1,\dots,K$   (26)

Then the factor covariance matrix is simply $\hat\Phi_{AB} = \delta_{AB}$, and we have

$\hat\xi_i^2 = 1 - \sum_{A=1}^{K}\lambda^{(A)}\left(V_i^{(A)}\right)^2$   (27)
$\hat\Gamma_{ij} = \hat\xi_i^2\,\delta_{ij} + \sum_{A=1}^{K}\lambda^{(A)}\,V_i^{(A)}\,V_j^{(A)}$   (28)

so the conditions (21) are automatically satisfied: $\hat\Gamma_{ii} = \Psi_{ii} = 1$. See Appendix A for the R source code, which includes the following algorithm for fixing $K$.
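Meanwhile, here is a minimal R sketch of the construction (26)-(28) (illustrative, with our own names; it is not a substitute for the Appendix A code):

# Sketch of (26)-(28): FLM from the first k principal components of the
# sample correlation matrix psi (k <= M-1); here Phi is the identity, so
# Gamma_ii = 1 holds automatically.
pc.fm <- function(psi, k)
{
   ed <- eigen(psi, symmetric = TRUE)
   omega <- ed$vectors[, 1:k, drop = FALSE] %*%
      diag(sqrt(ed$values[1:k]), k)                # eq. (26)
   xi2 <- 1 - rowSums(omega^2)                     # eq. (27)
   gamma <- omega %*% t(omega)
   diag(gamma) <- diag(gamma) + xi2                # eq. (28)
   list(omega = omega, xi2 = xi2, gamma = gamma)
}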

3.1 Fixing $K$

When $K = M-1$ we have $\hat\xi_i^2 \equiv 0$ and $\hat\Gamma_{ij} = \Psi_{ij}$, which is singular.16 Therefore, we must have $K < M-1$. So, how do we determine $K$? And is there an answer other than the evident $K = M-2$? Here we can do a lot of complicated, even convoluted things. Or we can take a pragmatic approach and come up with a simple heuristic. Here is one simple algorithm that does a very decent job at fixing $K$.

The idea here is simple. It is based on the observation that, as $K$ approaches $M-1$, the $\hat\xi_i^2$ go to 0 (i.e., less and less of the total risk is attributed to the specific risk, and more and more of it is attributed to the factor risk), while as $K$ approaches 0, the $\hat\xi_i^2$ go to 1 (i.e., less and less of the total risk is attributed to the factor risk, and more and more of it is attributed to the specific risk). So, as a rough cut, we can think of $K_{max}$ and $K_{min}$ as the maximum/minimum values of $K$ such that $\zeta(K) \geq \zeta_{min}$ and $\zeta(K) \leq \zeta_{max}$, where $\zeta(K) \equiv \frac{1}{N}\sum_{i=1}^{N}\hat\xi_i^2(K)$, and $\zeta_{min}$ and $\zeta_{max}$ are some desired bounds on the fraction of the contribution of the specific risk into the total risk. E.g., we can preset some sensible values for $\zeta_{min}$ and $\zeta_{max}$. In practice, we actually need to fix the value of $K$, not $K_{min}$ and $K_{max}$, especially that for some preset values of $\zeta_{min}$ and $\zeta_{max}$ we may end up with $K_{min} > K_{max}$. However, the above discussion aids us in coming up with a simple heuristic definition for what $K$ should be. Here is one:

$K \equiv \underset{K'}{\mathrm{argmin}}\;\left|g(K') - 1\right|$   (29)
$g(K') \equiv \zeta(K')\big/\zeta_*$   (30)

i.e., we take $K$ for which $g(K)$ (which monotonically decreases with increasing $K$) is closest to 1, where $\zeta_*$ is a preset target for the specific-risk fraction $\zeta(K)$ (e.g., a value between $\zeta_{min}$ and $\zeta_{max}$). This simple algorithm works pretty well in practical applications.17

3.2 Limitations

An evident limitation of the principal component approach is that the number of risk factors is limited by the number of observations: $K < M-1$. If long lookbacks are unavailable/undesirable, as, e.g., in short-holding quantitative trading strategies, then typically $M \ll N$. Yet, the number of the actually relevant underlying risk factors can be substantially greater than $M-1$, and most of these risk factors are missed by the principal component approach. In this regard, we can ask: can we use principal components other than the first $M-1$ to build a factor model? The answer, prosaically, is that, without some additional information, it is unclear what to do with the principal components with null eigenvalues. They simply do not contribute to any sample factor covariance matrix. However, not all is lost. There is a way around this difficulty.

4 Heterotic Construction

4.1 Industry Risk Factors

Without long lookbacks, the number of risk factors based on principal components is limited.18 However, risk factors based on a granular enough industry classification can be plentiful. Furthermore, they are independent of the pricing data and, in this regard, are insensitive to the lookback. In fact, typically they tend to be rather stable out-of-sample as companies seldom jump industries, let alone sectors.

For terminological definiteness, here we will use the BICS nomenclature for the levels in the industry classification, albeit this is not critical here. Also, BICS has three levels: “sector → industry → sub-industry” (where “sub-industry” is the most granular level). The number of levels in the industry hierarchy is not critical here either. So, we have: $N$ stocks labeled by $i = 1,\dots,N$; $K$ sub-industries labeled by $A = 1,\dots,K$; $F$ industries labeled by $a = 1,\dots,F$; and $L$ sectors labeled by $\alpha = 1,\dots,L$. More generally, we can think of such groupings as “clusters”. Sometimes, loosely, we will refer to such cluster-based factors as “industry” factors.19

4.2 “Binary” Property

The binary property implies that each stock belongs to one and only one sub-industry, industry and sector (or, more generally, cluster). Let $G$ be the map between stocks and sub-industries, $S$ be the map between sub-industries and industries, and $T$ be the map between industries and sectors:

$G:\ \{1,\dots,N\} \mapsto \{1,\dots,K\}$   (31)
$S:\ \{1,\dots,K\} \mapsto \{1,\dots,F\}$   (32)
$T:\ \{1,\dots,F\} \mapsto \{1,\dots,L\}$   (33)

The nice thing about the binary property is that the clusters (sub-industries, industries and sectors) can be used to identify blocks (sub-matrices) in the correlation matrix $\Psi_{ij}$. E.g., at the most granular level, for sub-industries, the binary matrix $\delta_{G(i),A}$ defines such blocks. Thus, the sum $\sum_{j:\,G(j)=A}\Psi_{ij}\,x_j$, where $x_j$ is an arbitrary $N$-vector, is the same as $\sum_{j\in J(A)}\Psi_{ij}\,x_j$, where $J(A) \equiv \{j\,|\,G(j)=A\}$ is the set of tickers in the sub-industry labeled by $A$. These blocks are the backbone of the following construction.
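In R, with the map $G$ stored as a length-$N$ vector of sub-industry labels, extracting the block of $\Psi$ for a given sub-industry is a one-liner (a hypothetical helper of ours, not part of the paper's code):

# Sketch: block (sub-matrix) of the correlation matrix psi for the set
# J(A) of tickers in sub-industry a; g is the length-N label vector.
block.psi <- function(psi, g, a)
{
   j <- which(g == a)
   psi[j, j, drop = FALSE]
}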

4.3 Heterotic Models

Consider the following factor loadings matrix:

$\hat\Omega_{iA} \equiv \delta_{G(i),A}\;U_i^{(A)}$   (34)
$\sum_{i\in J(A)}\left(U_i^{(A)}\right)^2 = 1$   (35)

where $J(A)$ is the set of tickers (whose number is $N(A) \equiv |J(A)|$) in the sub-industry labeled by $A$. Then the $N(A)$-vector $U_i^{(A)}$, $i \in J(A)$, is the first principal component of the $N(A)\times N(A)$ matrix $\Psi^{(A)}$ defined via $\Psi^{(A)}_{ij} \equiv \Psi_{ij}$, $i,j \in J(A)$. (Note that $\Psi^{(A)}_{ii} = 1$; also, let the corresponding (largest) eigenvalue of $\Psi^{(A)}$ be $\lambda^{(A)}$.)20 With this factor loadings matrix we can compute the factor covariance matrix and specific risk via a linear regression as above (note that $\hat\Omega^T\,\hat\Omega = 1$, as the columns of $\hat\Omega$ do not overlap and the $U^{(A)}$ are normalized), and we get:

$\hat\Phi_{AB} = \left[\hat\Omega^T\,\Psi\,\hat\Omega\right]_{AB}$   (36)
$\hat\xi_i^2 = 1 - \lambda^{(G(i))}\left(U_i^{(G(i))}\right)^2$   (37)

so we have21

$\hat\Gamma_{ij} = \hat\xi_i^2\,\delta_{ij} + \sum_{A,B=1}^{K}\hat\Omega_{iA}\,\hat\Phi_{AB}\,\hat\Omega_{jB}$   (38)

and automatically $\hat\Gamma_{ii} = \Psi_{ii} = 1$. This simplicity is due to the use of the (first) principal components corresponding to the blocks of the sample correlation matrix.
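Here is a self-contained R sketch of the single-level heterotic construction (34)-(38) (our own illustrative code; the paper's full implementation is in Appendix B):

# Sketch of (34)-(38): the FLM column for each sub-industry is the first
# principal component of the corresponding block of psi; then
# t(omega) %*% omega = identity, phi = t(omega) %*% psi %*% omega, and
# xi2[i] = 1 - lambda^(G(i)) * U[i]^2. Names are illustrative.
het.fm <- function(psi, g)
{
   labs <- sort(unique(g))
   omega <- matrix(0, nrow(psi), length(labs))
   for(a in seq_along(labs))
   {
      j <- which(g == labs[a])
      u <- eigen(psi[j, j, drop = FALSE], symmetric = TRUE)$vectors[, 1]
      if(sum(u) < 0) u <- -u        # sign convention only
      omega[j, a] <- u              # eq. (34)
   }
   phi <- t(omega) %*% psi %*% omega                 # eq. (36)
   xi2 <- 1 - rowSums((psi %*% omega) * omega)       # eq. (37)
   list(omega = omega, phi = phi, xi2 = xi2)
}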

4.4 Multiple Principal Components

For the sake of completeness, let us discuss an evident generalization. Above in (34) we took the binary map between the tickers and sub-industries and augmented it with the first principal components of the corresponding blocks in the sample correlation matrix. Instead of taking only the first principal component, we can take the first $F_A$ principal components for each block labeled by the sub-industry $A$ ($F_A \geq 1$). Then we have $\sum_{A=1}^{K}F_A$ risk factors labeled by pairs $(A,a)$, where for a given value of $A$ we have $a = 1,\dots,F_A$ (with $F_A \leq N(A)$). The factor loadings matrix reads:

$\hat\Omega_{i,(A,a)} \equiv \delta_{G(i),A}\;U_i^{(A,a)}$   (39)

where $U^{(A,a)}$ is the $N(A)\times F_A$ matrix whose columns are the first $F_A$ principal components (with eigenvalues $\lambda^{(A,a)}$) of the matrix $\Psi^{(A)}$ (as above, $\Psi^{(A)}_{ij} \equiv \Psi_{ij}$, $i,j \in J(A)$, and $\sum_{i\in J(A)}U_i^{(A,a)}\,U_i^{(A,b)} = \delta_{ab}$). In order to have nonvanishing specific risks, it is necessary that we take $F_A < \min(N(A),\,M-1)$ ($M$ is the number of observations in the time series). We then have

$\hat\xi_i^2 = 1 - \sum_{a=1}^{F_{G(i)}}\lambda^{(G(i),a)}\left(U_i^{(G(i),a)}\right)^2$   (40)

and (as in the case above with all $F_A = 1$) automatically $\hat\Gamma_{ii} = \Psi_{ii} = 1$.
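The generalization (39) amounts to stacking more than one principal component per block. A hypothetical R sketch, in the same setup as het.fm() above:

# Sketch of (39): the first f[a] principal components of each block of
# psi, stacked into an N x sum(f) FLM (f[a] = 1 recovers eq. (34)).
het.fm.multi <- function(psi, g, f)
{
   labs <- sort(unique(g))
   cols <- vector("list", length(labs))
   for(a in seq_along(labs))
   {
      j <- which(g == labs[a])
      w <- matrix(0, nrow(psi), f[a])
      w[j, ] <- eigen(psi[j, j, drop = FALSE],
         symmetric = TRUE)$vectors[, 1:f[a], drop = FALSE]
      cols[[a]] <- w
   }
   do.call(cbind, cols)
}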

4.5 Caveats

The above construction might look like a free lunch, but it is not. Let us start with the case $F_A \equiv 1$ (first principal components only). For short lookbacks, the number of risk factors typically is too large: $K$ can easily be greater than $M-1$, so22

$\hat\Phi_{AB} = \left[\hat\Omega^T\,\Psi\,\hat\Omega\right]_{AB}$   (41)

is singular. In general, the sample factor covariance matrix is singular if the number of factors is greater than $M-1$. We will deal with this issue below via the nested Russian-doll risk model construction.

This issue is further exacerbated in the multiple principal component construction (with at least some $F_A > 1$), as the number of risk factors $\sum_A F_A$ is even larger. This too can be dealt with via the Russian-doll construction. However, there is yet another caveat pertinent to using multiple principal components, irrespective of whether the factor covariance matrix is singular or not. The principal components are based on off-diagonal elements of $\Psi_{ij}$ and tend to be unstable out-of-sample, with the first principal component typically being the most stable. So, for the sake of simplicity, below we will focus on the case with first principal components only.

5 Russian-Doll Construction

5.1 General Idea

As discussed above, the sample factor covariance matrix $\hat\Phi_{AB}$ is singular if the number of factors $K$ is greater than $M-1$. The simple idea behind the Russian-doll construction is to model such $\hat\Phi_{AB}$ itself via yet another factor model matrix (as opposed to computing it as a sample covariance matrix of the risk factors $f_A$):23

$\hat\Phi_{AB} = \gamma_A\,\gamma_B\,\hat\Psi_{AB},\quad \gamma_A^2 \equiv \hat\Phi_{AA}$   (42)
$\hat\Psi_{AB} \approx \hat\Gamma_{AB} \equiv \hat\xi_A^2\,\delta_{AB} + \sum_{a,b=1}^{F}\hat\Omega_{Aa}\,\hat\Phi_{ab}\,\hat\Omega_{Bb}$   (43)

where $\gamma_A$ are the sample factor volatilities and $\hat\Psi_{AB}$ is the sample correlation matrix of the factors; $\hat\xi_A$ is the specific risk for the “normalized” factor return $\hat f_A \equiv f_A/\gamma_A$; $\hat\Omega_{Aa}$, $a = 1,\dots,F$, is the corresponding $K\times F$ factor loadings matrix; and $\hat\Phi_{ab}$ is the factor covariance matrix for the underlying risk factors, $a,b = 1,\dots,F$, where we assume that $F < K$. If the smaller factor covariance matrix $\hat\Phi_{ab}$ is still singular, we model it via yet another factor model with fewer risk factors, and so on – until the resulting factor covariance matrix is nonsingular. If, at the final stage, we are left with a single factor, then the resulting “factor covariance matrix” is automatically nonsingular – it is simply the sample variance of the remaining factor.
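Schematically, the nesting can be coded as a recursion: normalize the factor returns, model their correlation matrix via the next (coarser) grouping, and descend until the last level, whose sample FCM is assumed nonsingular. A hypothetical R sketch using het.fm() from Section 4.3 (this is our illustration; the paper's complete code is in Appendix B):

# Sketch of the Russian-doll recursion, eqs. (42)-(43): maps is a list of
# label vectors, maps[[1]] over the N stocks, maps[[l+1]] over the factors
# produced at level l (columns ordered by sort(unique(maps[[l]]))).
het.nested <- function(psi, maps)
{
   fm <- het.fm(psi, maps[[1]])
   if(length(maps) == 1)
      phi.star <- fm$phi           # assumed nonsingular at the last level
   else
   {
      gamma.f <- sqrt(diag(fm$phi))               # factor volatilities
      psi.f <- fm$phi / outer(gamma.f, gamma.f)   # factor correlations
      phi.star <- outer(gamma.f, gamma.f) *
         het.nested(psi.f, maps[-1])$gamma        # model them recursively
   }
   gamma <- fm$omega %*% phi.star %*% t(fm$omega)
   diag(gamma) <- diag(gamma) + fm$xi2
   list(gamma = gamma, omega = fm$omega, xi2 = fm$xi2, phi.star = phi.star)
}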

5.2 Complete Heterotic Russian-Doll Embedding

For concreteness we will use the BICS terminology for the levels in the industry classification, albeit this is not critical here. Also, BICS has three levels: “sector → industry → sub-industry” (where “sub-industry” is the most granular level). For definiteness, we will assume three levels here; the generalization to more levels is straightforward. So, we have: $N$ stocks labeled by $i = 1,\dots,N$; $K$ sub-industries labeled by $A = 1,\dots,K$; $F$ industries labeled by $a = 1,\dots,F$; and $L$ sectors labeled by $\alpha = 1,\dots,L$. A nested Russian-doll risk model then is constructed as follows:

$\Gamma_{ij} \equiv \sigma_i\,\sigma_j\,\hat\Gamma_{ij}$   (44)
$\hat\Gamma_{ij} \equiv \hat\xi_i^2\,\delta_{ij} + \sum_{A,B=1}^{K}\hat\Omega_{iA}\,\hat\Phi^*_{AB}\,\hat\Omega_{jB}$   (45)
$\hat\Phi^*_{AB} \equiv \gamma_A\,\gamma_B\,\hat\Gamma_{AB}$   (46)
$\hat\Gamma_{AB} \equiv \hat\xi_A^2\,\delta_{AB} + \sum_{a,b=1}^{F}\hat\Omega_{Aa}\,\hat\Phi^*_{ab}\,\hat\Omega_{Bb}$   (47)
$\hat\Phi^*_{ab} \equiv \gamma_a\,\gamma_b\,\hat\Gamma_{ab}$   (48)
$\hat\Gamma_{ab} \equiv \hat\xi_a^2\,\delta_{ab} + \sum_{\alpha,\beta=1}^{L}\hat\Omega_{a\alpha}\,\hat\Phi^*_{\alpha\beta}\,\hat\Omega_{b\beta}$   (49)
$\hat\Phi^*_{\alpha\beta} \equiv \gamma_\alpha\,\gamma_\beta\,\hat\Gamma_{\alpha\beta}$   (50)
$\hat\Gamma_{\alpha\beta} \equiv \hat\xi_\alpha^2\,\delta_{\alpha\beta} + \hat\Omega_\alpha\,\hat\phi\,\hat\Omega_\beta$   (51)

where

$\hat\Omega_{iA} \equiv \delta_{G(i),A}\;U_i^{(A)}$   (52)
$\hat\Omega_{Aa} \equiv \delta_{S(A),a}\;U_A^{(a)}$   (53)
$\hat\Omega_{a\alpha} \equiv \delta_{T(a),\alpha}\;U_a^{(\alpha)}$   (54)
$\hat\Omega_{\alpha} \equiv U_\alpha$   (55)

and

$\hat\Phi_{AB} \equiv \mathrm{Cov}(f_A,\,f_B),\quad f_A \equiv \sum_{i=1}^{N}\hat\Omega_{iA}\,\hat R_i$   (56)
$\hat\Phi_{ab} \equiv \mathrm{Cov}(f_a,\,f_b),\quad f_a \equiv \sum_{A=1}^{K}\hat\Omega_{Aa}\,\hat f_A,\quad \hat f_A \equiv f_A/\gamma_A$   (57)
$\hat\Phi_{\alpha\beta} \equiv \mathrm{Cov}(f_\alpha,\,f_\beta),\quad f_\alpha \equiv \sum_{a=1}^{F}\hat\Omega_{a\alpha}\,\hat f_a,\quad \hat f_a \equiv f_a/\gamma_a$   (58)
$\hat\phi \equiv \mathrm{Var}(f),\quad f \equiv \sum_{\alpha=1}^{L}\hat\Omega_\alpha\,\hat f_\alpha,\quad \hat f_\alpha \equiv f_\alpha/\gamma_\alpha$   (59)

so we have $\hat\Gamma_{ii} = 1$, $\hat\Gamma_{AA} = 1$, $\hat\Gamma_{aa} = 1$, and $\hat\Gamma_{\alpha\alpha} = 1$ (see below); here $\gamma_A^2 \equiv \hat\Phi_{AA}$, $\gamma_a^2 \equiv \hat\Phi_{aa}$ and $\gamma_\alpha^2 \equiv \hat\Phi_{\alpha\alpha}$, so the “normalized” factor returns $\hat f_A$, $\hat f_a$ and $\hat f_\alpha$ have unit sample variances. Also, $J(A) \equiv \{i\,|\,G(i) = A\}$ ($N(A)$ tickers in sub-industry $A$), $J(a) \equiv \{A\,|\,S(A) = a\}$ ($K(a)$ sub-industries in industry $a$), $J(\alpha) \equiv \{a\,|\,T(a) = \alpha\}$ ($F(\alpha)$ industries in sector $\alpha$), and the maps $G$ (tickers to sub-industries), $S$ (sub-industries to industries) and $T$ (industries to sectors) are defined in (31), (32) and (33). Furthermore,

$\Psi^{(A)}_{ij} \equiv \Psi_{ij},\quad i,j \in J(A)$   (60)
$\Psi^{(a)}_{AB} \equiv \hat\Phi_{AB}\big/\gamma_A\,\gamma_B,\quad A,B \in J(a)$   (61)
$\Psi^{(\alpha)}_{ab} \equiv \hat\Phi_{ab}\big/\gamma_a\,\gamma_b,\quad a,b \in J(\alpha)$   (62)
$\Psi_{\alpha\beta} \equiv \hat\Phi_{\alpha\beta}\big/\gamma_\alpha\,\gamma_\beta$   (63)

and

$\hat\xi_i^2 \equiv 1 - \lambda^{(G(i))}\left(U_i^{(G(i))}\right)^2$   (64)
$\hat\xi_A^2 \equiv 1 - \lambda^{(S(A))}\left(U_A^{(S(A))}\right)^2$   (65)
$\hat\xi_a^2 \equiv 1 - \lambda^{(T(a))}\left(U_a^{(T(a))}\right)^2$   (66)

The $N(A)$-vector $U_i^{(A)}$ is the first principal component of $\Psi^{(A)}$ with the (largest) eigenvalue $\lambda^{(A)}$ ($i \in J(A)$, $A = 1,\dots,K$); the $K(a)$-vector $U_A^{(a)}$ is the first principal component of $\Psi^{(a)}$ with the eigenvalue $\lambda^{(a)}$ ($A \in J(a)$, $a = 1,\dots,F$); the $F(\alpha)$-vector $U_a^{(\alpha)}$ is the first principal component of $\Psi^{(\alpha)}$ with the eigenvalue $\lambda^{(\alpha)}$ ($a \in J(\alpha)$, $\alpha = 1,\dots,L$); while $U_\alpha$ is the first principal component of $\Psi_{\alpha\beta}$ with the eigenvalue $\lambda$, and $\hat\xi_\alpha^2 \equiv 1 - \lambda\,U_\alpha^2$. The vectors $U^{(A)}$, $U^{(a)}$ and $U^{(\alpha)}$ are normalized, so $\sum_{i\in J(A)}(U_i^{(A)})^2 = 1$, $\sum_{A\in J(a)}(U_A^{(a)})^2 = 1$, $\sum_{a\in J(\alpha)}(U_a^{(\alpha)})^2 = 1$, and also $\sum_{\alpha=1}^{L}U_\alpha^2 = 1$.

For the sake of completeness, above we included the last step, where the sample factor covariance matrix for the sectors is further approximated via a 1-factor model. If $\hat\Phi_{\alpha\beta}$ computed via (58) is nonsingular, then this last step can be omitted,24 so at the last stage we have $L$ factors (as opposed to a single factor).25 Similarly, if we have enough observations to compute a nonsingular sample covariance matrix for the industries, we can stop at that stage. Finally, note that in the above construction we are guaranteed to have $\hat\xi_A^2 \geq 0$, $\hat\xi_a^2 \geq 0$, $\hat\xi_\alpha^2 \geq 0$, and $\hat\xi_i^2 \geq 0$ (with the last equality occurring only for single-ticker sub-industries and not posing a problem – see below).26 In Appendix B we give the R code for building heterotic risk models.
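Hypothetical usage of the above sketches with the three BICS-like levels, assuming sub-industry labels $g_i \in \{1,\dots,K\}$, a length-$K$ vector s of industry labels (the map $S$) and a length-$F$ vector w of sector labels (the map $T$), with ret an $N\times M$ matrix of returns whose rows correspond to stocks (all names are ours):

# Illustrative only: build the model covariance matrix Gamma of eq. (44).
sigma <- apply(ret, 1, sd)                 # sample volatilities
psi <- cor(t(ret))                         # sample correlation matrix Psi
fm <- het.nested(psi, list(g, s, w))       # eqs. (44)-(59)
gamma <- outer(sigma, sigma) * fm$gamma    # eq. (44)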

5.3 Model Covariance Matrix and Its Inverse

The model covariance matrix is given by $\Gamma_{ij}$ defined in (44). For completeness, let us present it in the “canonical” form:

$\Gamma = \Xi + \tilde\Omega\,\tilde\Phi\,\tilde\Omega^T$   (67)

where

$\Xi_{ij} \equiv \sigma_i^2\,\hat\xi_i^2\,\delta_{ij}$   (68)
$\tilde\Omega_{iA} \equiv \sigma_i\,\hat\Omega_{iA}$   (69)
$\tilde\Phi_{AB} \equiv \hat\Phi^*_{AB}$   (70)

where $\hat\Omega_{iA}$ is defined in (52), $\hat\xi_i^2$ is defined in (64), $\hat\Phi^*_{AB}$ is defined in (46), and we use the star superscript in our factor covariance matrix (which is nonsingular) to distinguish it from the sample factor covariance matrix $\hat\Phi_{AB}$ (which is singular).

In many applications, such as portfolio optimization, one needs the inverse of the matrix $\Gamma$. When we have no single-ticker sub-industries (so all $\hat\xi_i^2 > 0$), the inverse is given by (in matrix notation)

$\Gamma^{-1} = \Xi^{-1} - \Xi^{-1}\,\tilde\Omega\,\Delta^{-1}\,\tilde\Omega^T\,\Xi^{-1}$   (71)
$\Delta \equiv \tilde\Phi^{-1} + \tilde\Omega^T\,\Xi^{-1}\,\tilde\Omega$   (72)
$\Xi^{-1}_{ij} = \delta_{ij}\big/\left(\sigma_i^2\,\hat\xi_i^2\right)$   (73)

However, when there are some single-ticker sub-industries, the corresponding $\hat\xi_i^2 = 0$ ($i \in J(A)$, $N(A) = 1$), so (71) “breaks”. Happily, there is an easy “fix”. This is because for such tickers the specific risk and factor risk are indistinguishable. Recall that $\hat\Omega_{iA} = \delta_{G(i),A}\,U_i^{(A)}$, and $U_i^{(A)} = 1$, $\lambda^{(A)} = 1$ for $N(A) = 1$ ($i \in J(A)$). We can rewrite $\Gamma$ via

$\Gamma = \Xi' + \tilde\Omega\,\tilde\Phi'\,\tilde\Omega^T$   (74)

where: $\Xi'_{ii} = \Xi_{ii}$ for $N(G(i)) > 1$; $\Xi'_{ii} = \kappa_A\,\sigma_i^2\,\tilde\Phi_{AA}$, $A = G(i)$, for $N(A) = 1$ with arbitrary $\kappa_A$, $0 < \kappa_A < 1$; $\tilde\Phi'_{AB} = \tilde\Phi_{AB}$ if $N(A) > 1$ or $N(B) > 1$ or $A \neq B$; and $\tilde\Phi'_{AA} = (1 - \kappa_A)\,\tilde\Phi_{AA}$ for $N(A) = 1$. (Here we have taken into account that $\hat\Omega_{iA} = 1$, $\hat\xi_i^2 = 0$ for $i \in J(A)$, $N(A) = 1$.) Now we can invert $\Gamma$ via

$\Gamma^{-1} = \Xi'^{-1} - \Xi'^{-1}\,\tilde\Omega\,\Delta'^{-1}\,\tilde\Omega^T\,\Xi'^{-1}$   (75)
$\Delta' \equiv \left(\tilde\Phi'\right)^{-1} + \tilde\Omega^T\,\Xi'^{-1}\,\tilde\Omega$   (76)
$\Xi'^{-1}_{ij} = \delta_{ij}\big/\Xi'_{ii}$   (77)

Note that, due to the factor model structure, to invert the $N\times N$ matrix $\Gamma$, we only need to invert two $K\times K$ matrices, $\tilde\Phi'$ and $\Delta'$. If there are no single-ticker sub-industries, then $\tilde\Phi$ itself has a factor model structure, and its inversion involves inverting two $F\times F$ matrices, one of which in turn has a factor model structure, and so on.
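A minimal R sketch of the inversion (71)-(73) (or (75)-(77) with the primed quantities); the names are ours:

# Sketch of (71)-(73): invert Gamma = Xi + Omega Phi* Omega^T via the
# factor model structure. xi2: diagonal of Xi (i.e., sigma_i^2 times
# hat-xi_i^2); all its entries must be positive.
fm.inv <- function(xi2, omega, phi.star)
{
   xi2.inv <- 1 / xi2                          # Xi^{-1}, eq. (73)
   a <- omega * xi2.inv                        # Xi^{-1} Omega
   delta <- solve(phi.star) + t(omega) %*% a   # eq. (72)
   diag(xi2.inv) - a %*% solve(delta, t(a))    # eq. (71)
}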

6 Horse Race

So, suppose we have built a complete heterotic risk model. How do we know it adds value? I.e., how do we know that the off-diagonal elements of the factor model covariance matrix are stable out-of-sample to the extent that they add value? We can run a horse race. There are many ways of doing this. Here is one. For a given trading universe we compute some expected returns, e.g., based on overnight mean-reversion. We can construct a trading portfolio by using our heterotic risk model covariance matrix in the optimization whereby we maximize the Sharpe ratio (subject to the dollar neutrality constraint). On the other hand, we can run the same optimization with a diagonal sample covariance matrix subject to neutrality (via linear homogeneous constraints) w.r.t. the underlying heterotic risk factors (plus dollar neutrality).27 In fact, optimization with such a diagonal covariance matrix and subject to such linear homogeneous constraints is equivalent to a weighted cross-sectional regression with the loadings matrix (over which the expected returns are regressed) identified with the factor loadings matrix (augmented by the intercept, i.e., the unit vector, for dollar neutrality) and the regression weights identified with the inverse sample variances (see (Kakushadze, 2015a) for details). So, we will refer to the horse race as between optimization (using the heterotic risk model) and weighted regression (with the aforementioned linear homogeneous constraints).28
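For reference, the weighted-regression benchmark can be sketched in R as follows (a hypothetical illustration consistent with the equivalence just described, not the paper's Appendix C code): the expected returns are regressed over the loadings augmented by the intercept, and the desired holdings are proportional to minus the weighted residuals, which automatically satisfy the neutrality constraints:

# Sketch: weighted cross-sectional regression alpha. e: expected returns;
# lam: N x m loadings matrix; vr: sample variances (weights = 1 / vr).
# The intercept column enforces dollar neutrality. Names are illustrative.
wt.reg.alpha <- function(e, lam, vr)
{
   x <- cbind(rep(1, length(e)), lam)
   eps <- residuals(lm(e ~ -1 + x, weights = 1 / vr))
   h <- -eps / vr                # fade the residuals (mean-reversion)
   h / sum(abs(h))               # normalize to unit gross exposure
}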

6.1 Notations

Let $P_i(t_s)$ be the time series of stock prices, where $i = 1,\dots,N$ labels the stocks, and $t_s$, $s = 0,1,2,\dots$, labels the trading dates, with $s = 0$ corresponding to the most recent date in the time series. The superscripts $O$ and $C$ (unadjusted open and close prices) and $AO$ and $AC$ (open and close prices fully adjusted for splits and dividends) will distinguish the corresponding prices, so, e.g., $P^C_i(t_s)$ is the unadjusted close price. $V_i(t_s)$ is the unadjusted daily volume (in shares). Also, for each date $t_s$ we define the overnight return as the previous-close-to-open return:

$E_i(t_s) \equiv \ln\left(P^{AO}_i(t_s)\big/P^{AC}_i(t_{s+1})\right)$   (78)

This return will be used in the definition of the expected return in our mean-reversion alpha. We will also need the close-to-close return

$R_i(t_s) \equiv \ln\left(P^{AC}_i(t_s)\big/P^{AC}_i(t_{s+1})\right)$   (79)

An out-of-sample (see below) time series of these returns will be used in constructing the heterotic risk model and computing, among other things, the sample variances $\sigma_i^2$. Also note that all prices in the definitions of $E_i(t_s)$ and $R_i(t_s)$ are fully adjusted.

We assume that: i) the portfolio is established at the open29 with fills at the open prices $P^O_i(t_s)$; ii) it is liquidated at the close on the same day – so this is a purely intraday alpha – with fills at the close prices $P^C_i(t_s)$; and iii) there are no transaction costs or slippage – our aim here is not to build a realistic trading strategy, but to test whether our heterotic risk model adds value to the alpha. The P&L for each stock is

$\Pi_i(t_s) = H_i(t_s)\left[\frac{P^C_i(t_s)}{P^O_i(t_s)} - 1\right]$   (80)

where $H_i(t_s)$ are the dollar holdings. The shares bought plus sold (establishing plus liquidating trades) for each stock on each day are computed via $Q_i(t_s) = 2\left|H_i(t_s)\right|\big/P^O_i(t_s)$.

6.2 Universe Selection

For the sake of simplicity,30 we select our universe based on the average daily dollar volume (ADDV) $A_i$ defined via (note that $A_i(t_s)$ is out-of-sample for each date $t_s$):

$A_i(t_s) \equiv \frac{1}{d}\sum_{r=1}^{d}V_i(t_{s+r})\,P^C_i(t_{s+r})$   (81)

We take $d = 21$ (i.e., one month), and then take our universe to be the top 2000 tickers by ADDV. To ensure that we do not inadvertently introduce a universe selection bias, we rebalance monthly (every 21 trading days, to be precise). I.e., we break our 5-year backtest period (see below) into 21-day intervals, we compute the universe using ADDV (which, in turn, is computed based on the 21-day period immediately preceding such interval), and use this universe during the entire such interval. We do have the survivorship bias as we take the data for the universe of tickers as of 9/6/2014 that have historical pricing data on http://finance.yahoo.com (accessed on 9/6/2014) for the period 8/1/2008 through 9/5/2014. We restrict this universe to include only U.S. listed common stocks and class shares (no OTCs, preferred shares, etc.) with BICS sector, industry and sub-industry assignments as of 9/6/2014.31 However, as discussed in detail in Section 7 of (Kakushadze, 2015a), the survivorship bias is not a leading effect in such backtests.32
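A hypothetical R sketch of the universe selection (81), where column 1 of the price/volume matrices corresponds to the most recent date $t_s$ and columns 2 through d+1 to the prior dates $t_{s+1},\dots,t_{s+d}$ (names are ours):

# Sketch of eq. (81): 21-day ADDV from unadjusted close prices (prc.c)
# and share volumes (vol), using only dates prior to t_s (out-of-sample);
# the universe is the top 2000 tickers by ADDV.
d <- 21
addv <- rowMeans(prc.c[, 1 + (1:d)] * vol[, 1 + (1:d)])
univ <- order(addv, decreasing = TRUE)[1:2000]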

6.3 Backtesting

We run our simulations over a period of 5 years (more precisely, 1260 trading days going back from 9/5/2014, inclusive). The annualized return-on-capital (ROC) is computed as the average daily P&L divided by the intraday investment level (with no leverage) and multiplied by 252. The annualized Sharpe ratio (SR) is computed as the daily Sharpe ratio multiplied by $\sqrt{252}$. Cents-per-share (CPS) is computed as the total P&L in cents divided by the total shares traded.33
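In R, with pnl the vector of daily P&Ls, inv.lvl the intraday investment level and shares the total shares traded (all hypothetical names), these statistics read:

# Sketch of the Section 6.3 performance statistics.
roc <- 252 * mean(pnl) / inv.lvl              # annualized ROC
sr <- sqrt(252) * mean(pnl) / sd(pnl)         # annualized Sharpe ratio
cps <- 100 * sum(pnl) / sum(shares)           # cents-per-share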

6.4 Weighted Regression Alphas

We will always require that our portfolio be dollar neutral:

$\sum_{i=1}^{N}H_i = 0$   (82)

We will further require neutrality

$\sum_{i=1}^{N}\Lambda_{iA}\,H_i = 0$   (83)

with the three different incarnations of the loadings matrix $\Lambda_{iA}$ (for each trading day $t_s$, so we omit the index $s$)34 defined via:

$\Lambda_{iA} \equiv V_i^{(A)}$ (the first principal components of $\Psi_{ij}$)   (84)
$\Lambda_{iA} \equiv \delta_{G(i),A}$ (the binary sub-industry loadings)   (85)
$\Lambda_{iA} \equiv \hat\Omega_{iA}$ (the heterotic loadings (52))   (86)

Here