Bayesian Emulation for Optimization
in Multi-Step Portfolio Decisions
We discuss the Bayesian emulation approach to computational solution of
multi-step portfolio studies in financial time series. Bayesian emulation for
decisions involves mapping the technical structure of a decision analysis problem to that of
Bayesian inference in a purely synthetic “emulating” statistical model. This provides access to standard posterior analytic, simulation and optimization methods that yield indirect solutions of the decision problem.
We develop this in time series portfolio analysis using classes of economically and psychologically relevant multi-step ahead portfolio utility functions.
Studies with multivariate currency, commodity and stock index time series illustrate the approach and show some of the practical utility and benefits of the Bayesian emulation methodology.
Some key words and phrases: Bayesian forecasting; Dynamic dependency network models; Marginal and joint modes; Multi-step forecasting; Portfolio decisions; Synthetic model
The research reported here was developed while Kaoru Irie was a graduate student in the Department of Statistical Science at Duke University. Kaoru was partly supported by a fellowship from the Nakajima Foundation of Japan, and he received the 2014-15 BEST Award for Student Research at Duke University. Support from the Nakajima and BEST Foundations are gratefully acknowledged. Any opinions, findings and conclusions or recommendations expressed in this work are those of the authors and do not necessarily reflect the views of the Nakajima or BEST Foundations.
The authors thank Mengran Deng for sharing his unpublished undergraduate research findings (with M.W.) related to the contents in Section 2.
This work stems from an interest in Bayesian portfolio decision problems with long-term, multi-step investment objectives that lead to the need for computational methods for portfolio optimization. Methodological advances reflect the fact that some such optimization problems can be recast– purely technically– as problems of computing modes of marginal posterior distributions in “synthetic” statistical models. We then have access to analytic and computational machinery for exploring posterior distributions whose marginal modes represent target optima in originating optimization/decision problems. We refer to this as Bayesian emulation for decisions, with the synthetic statistical model regarded as an emulating framework for computational reasons.
The use of decision analysis for portfolios coupled with dynamic models for forecasting financial time series continues to be a very active area of Bayesian analysis– in research and in broad application in personal and corporate gambling on markets of all kinds. Forecasting with multivariate dynamic linear/volatility models coupled with extensions of traditional Markowitz mean-variance optimization (Markowitz 1991) define benchmark approaches (e.g. Quintana 1992; Aguilar and West 2000; Quintana et al. 2003, 2010; Polson and Tew 2000, chapter 10 of Prado and West 2010, Jacquier and Polson 2011, among others). Much recent work has emphasised advances in forecasting ability based on increasingly structured multivariate models (e.g. Zhou et al. 2014; Nakajima and West 2013b, a, 2015, 2016; Zhao et al. 2016; Gruber and West 2016a, b) with benefits in portfolio outcomes based, in part, on improved characterizations of dynamics in multivariate stochastic volatility. However, relatively little Bayesian work addresses interests in more relevant utility/loss functions, especially in longer-term, multi-step portfolio contexts; much of the cited work here employs standard myopic/one-step ahead decision rules. Our emphasis is to complement these time series forecasting advances with Bayesian decision analysis that explicitly reflects personal or commercial utilities for stable portfolios in a multi-step context.
In stylized forecasting and decision problems, analysis involves computing portfolio weight vectors to minimize expected portfolio loss functions, and to repeatedly apply this sequentially over time. The solutions can be approximated numerically in a number of ways, depending on the form of the loss function, but typically need customization of the numerical techniques. The approach here– emerging naturally in the specific context of multi-step portfolios– is a general approach applicable to a variety of loss functions. At any one time point with decision variable and expected loss function , the Bayesian emulation strategy is useful if/when there exists a purely synthetic statistical model involving hypothetical random vectors (parameters, latent variables or states) and generating a posterior density under which the marginal model of is theoretically equal to the optimal in the portfolio decision. Minimizing can then be approached by exploring with standard analytic and numerical methods for posterior analysis. While novel in terms of our context and development, the basic idea here goes back (at least) to Müller (1999). There, with discrete decision variables in non-sequential design contexts, optimization is solved using a similar synthetic posterior idea and combining optimization with estimation using MCMC. This approach has, surprisingly, seen limited development, although recent work by Ekin et al. (2014) represents extension and new application. Our work takes a broader emulating perspective with complete separation of models/forecasting and decisions/optimization. We develop emulation of portfolio decisions using forecast information from state-of-the-art multivariate dependency network models (Zhao et al. 2016), treated as given. We then define the new multi-step decision strategy for computing and revising Bayesian portfolios over time based on these forecasts.
Section 2 summarizes the multi-step portfolio set-up in sequential forecasting. To define and exemplify the emulation approach, we give summary details of its use in multi-step portfolios with extensions of standard (myopic, constrained) quadratic loss functions. Here the emulating synthetic statistical models are conditionally linear and normal state-space models, i.e., dynamic linear models (DLMs), amenable to evaluation using analytic forward filtering and backward smoothing (FFBS) methods. This is extended in Section 3 to a class of portfolios with sparsity-inducing penalties on portfolio weights and turnover. The emulating models here also have state-space forms, but now with non-normal structure. With augmented state-spaces, we can convert these to conditional DLMs in which posterior evaluation and mode search are efficiently performed by combining FFBS with a customized EM method. Section 4 discusses a fundamental question of definition of portfolio loss functions and objectives in multi-step contexts, and a strategy for marginal mode evaluation. A range of portfolio loss functions are then evaluated in sequential forecasting and portfolio construction with a dimensional series of daily FX, commodity and market index prices. Section 5 discusses this, highlighting choices of portfolio loss functions and objectives, and practical benefits arising with sparsity-inducing, multi-step portfolio strategies. The latter shows the potential to improve portfolio outcomes, particularly in the presence of realistic transaction costs. Comments in Section 6 conclude the main paper. Appendices provide technical details on optimization and on dynamic dependency network models used for forecasting.
Notational Remarks: We use for a generic density of given Normal, exponential and gamma distributions are written as with mean and with shape and mean ; the values of their density functions at a particular are denoted by and respectively. Indices for are shortened as . The -dimensional all-ones and all-zeros vectors are and , respectively, and represents a zero vector or matrix when dimensions are obvious.
2 Multi-Step Emulation: Constrained Quadratic Losses
2.1 Setting and Notation
Over times we observe a vector asset price time series ; the returns vector has elements At time with current information set , a model defines a forecast distribution for returns at the next time points. With no loss of generality and to simplify notation, take current time with initial information set Predicting ahead, the predictive mean vectors and precision (inverse variance) matrices are denoted by and over the steps ahead The time portfolio weight vector has elements some of which may be negative reflecting short-selling. Standing at with a current, known portfolio stylized myopic (one-step) Markowitz analyses are Bayesian decision problems focused on choosing subject to constraints. Standard mean-variance portfolios minimize subject to a chosen expected return target and usually a sum-to-one constraint , i.e., allowing only portfolios closed to draw-down or additional investment.
For multi-step portfolios, extend to consider the sequence of potential portfolio vectors over the next periods. The decision is to choose but we are interested in target returns and portfolio turnover control over multiple steps, and so must consider how the decision analysis might play-out up to time Consider multi-step (expected) loss functions of the form
where , and are specified positive weights defining relative contributions of the terms in this sum, while is the (least-norm) generalized inverse of a specified positive-semi-definite matrix , and will be the usual inverse in cases of positive-definiteness. Also, now includes the current portfolio vector
The first set of terms in the sum involve specified multi-step target returns . Individual investors typically prefer realized portfolios to progress relatively smoothly towards an end-point target , rather than bouncing from high to low interim returns. The weights can be used to increasingly emphasize the importance of later-stage returns as approaches Note that allowing theoretically implies the hard constraint on expected return, as in the standard myopic case. Hence we refer to eqn. (1) as including “soft target constraints,” while having the ability to enforce the hard constraint at the terminal point via sending to zero.
The second set of terms in eqn. (1) penalize portfolio uncertainty using the standard risk measures , again allowing differential weighting as a function of steps-ahead . The final set of terms relates to portfolio turnover. If these terms penalize changes in allocations across all assets. If trades are at a fixed rate, this is a direct transaction cost penalty; otherwise, it still relates directly to transactions costs and so that terminology will be used. With a heavy emphasis on these terms– as defined by the weights– optimal portfolios will be more stable over time, providing less stress on investors (including emotional as well as workload stress for individual investors). The can play several constraint-related roles, as we discuss below.
2.2 Portfolio Optimization and Emulating Models
There are, of course, no new computational challenges to simple quadratic optimization implied by eqn. (1). Key points are that it is easy to: (i) compute the joint optimizing values , and (ii) deduce the one-step optimizing for the Bayesian decision. Optimization with respect to alone can be immediately performed using a forward-backward dynamic programming algorithm. Importantly, the optimizing value for (or for any subset of the ) is– as a result of the quadratic nature of eqn. (1)– precisely that sub-vector (or subset of vectors) arising at the global/joint maximizer
The emulation idea translates the above concepts into a synthetic Bayesian model immediately interpretable by statisticians. Rewrite eqn. (1) as
where each term is a specific normal p.d.f., the are interpreted as random quantities in a multivariate normal distribution underlying this density form, and where each is set at Specifically, consider a dynamic linear model (DLM) generating pairs of observations – with scalar and a vector– based on latent vector states via
with a known initial state (the current portfolio) and where the are independent and mutually independent innovations sequences. In this model, observing the sequence of synthetic observations with immediately implies the resulting posterior as given in eqn. (2).
Observe that computing the minimizer of is equivalent to calculating the posterior mode for in the synthetic DLM. It is immediate that the required (marginal) optimizing value for is the marginal mode in this joint posterior. Since the joint posterior is normal, marginal modes coincide with values at the joint mode, so we can regard the Bayesian optimization as solved either way. We easily compute the mode of using the forward filtering, backward smoothing (FFBS) algorithm– akin to a Viterbi-style optimization algorithm (e.g. Viterbi 1967; Godsill et al. 2001, 2004)– widely used in applications of DLMs (e.g. West and Harrison 1997; Prado and West 2010).
2.3 Imposing Linear Constraints
As noted above, some applications may desire a hard target at the terminal point, and this is formally achieved by setting the synthetic variance in eqn. (3). The multivariate normal posterior is singular due to the resulting constraint , but this raises no new issues as the FFBS computations apply directly.
The general framework also applies with singular matrices , now playing the roles of the variance matrices of state innovations in eqn. (5). These arise to enforce linear portfolio constraints where is a given vector and is a full-rank matrix with Choose to satisfy these constraints and ensure that each is such that . Then the priors and posteriors for the synthetic states are singular and constrained such that (almost surely). Again the FFBS analysis applies directly to generate the optimal portfolio vector – and the sequence of interim optimizing values even though only is used at . This now involves propagating singular normal posteriors for states, as is standard in, for example, constrained seasonal DLMs (e.g. West and Harrison 1997, sect. 8.4). A key portfolio case is the sum-to-one constraint for all Here we redefine beginning with the identity – representing equal and independent penalization of turnover across assets– and then condition on the constraints to give rank matrices
3 Multi-Step Emulation: Constrained Laplace Losses
3.1 Basic Setting
Now consider modifications to (i) more aggressively limit switching in/out of specific assets between time points– for both transaction and psychological cost considerations, and to (ii) limit the numbers of assets invested at any time point. Several authors have considered absolute loss/penalties to encourage shrinkage-to-zero of optimizing portfolio vectors (e.g. Jagannathan and Ma 2003; Brodie et al. 2009) and we build on this prior work. Key points, however, are that such approaches have not been consistent with a Bayesian decision analysis framework, while goals with respect to marginal versus joint optimization in the multi-step context have been poorly understood and explored, and require clarification. Our fully Bayesian emulation strategy adds to this literature while also clarifying this critical latter point and defining relevant methodology.
The “Laplace loss” terminology relates to novel synthetic statistical models that emulate portfolio optimization with absolute norm terms to penalize portfolio weight changes. Modify eqn. (1) to the form
where the final term now replaces the quadratic score with the sum of absolute changes of asset weights Relative to eqn. (1), this aims to more aggressively limit transaction costs, both monetary and psychological. Optimizing globally over may/will encounter boundary values in which some portfolio weights are unchanged between times and This theoretical lasso-style fact is one reason for the interest in such loss functions, due to the implied expectation of reduced portfolio turnover– or “churn”– and hence reduced costs.
3.2 Emulating Dynamic Laplace Models
In parallel to Section 2.2, we identify a synthetic statistical model– again a state-space model but now with non-normal evolution/transition components for the synthetic latent states – of the form
where denotes the Laplace (double exponential) distribution– the p.d.f. for each element is . Also, the are independent and mutually independent across the ranges of all suffices.
One of the immediate benefits of the Bayesian emulating model approach is that we can exploit latent variable constructs. In particular here, the Laplace distributions are known to be scale mixtures of normals (Andrews and Mallows 1974; West 1984, 1987). Thus, there exist latent random quantities , such that independently over , and based on which each synthetic state evolution in eqn. (9) has the form
Augmenting by the vectors of latent scales , the evolutions in eqn. (9) become
This defines a conditionally normal DLM and the above/standard FFBS algorithm can be used to evaluate the posterior mode of for any including that at zero. To maximize over portfolios in the implied marginal with respect to , Bayesian EM (e.g. Dempster et al. 1977) is the obvious and easily implemented approach. Here the E-step applies to the latent , while FFBS gives the exact M-step for at each iterate. In summary:
Initialization: Set each arbitrarily. Candidates for initial values are the current , or the trivially computed values that optimize the multi-step portfolios under the quadratic loss of Section 2.
For EM iterates under a chosen stopping rule, repeat the following:.
On stopping at iterate , use as the approximate optimizing portfolio vector.
3.3 Extended Laplace Loss for Sparser Portfolios
In the one-step, myopic context, penalizing portfolio variance with a term proportional to is an obvious strategy towards the goal of inducing shrinkage to zero of optimized portfolio weights. As noted earlier, a number of recent works have introduced such a lasso-style penalty directly on portfolio weights, rather than on changes in weights, and with standard convex optimization algorithms for solution (e.g. Brodie et al. 2009) and demonstrating improved portfolio performance in some cases (e.g. Jagannathan and Ma 2003). We now integrate such penalties as components of a more general class loss function embedded in the multi-step framework, and develop the Bayesian emulation methodology for this novel context.
The shrinkage-inducing penalty aims to drive some subset of weights to zero– exactly in the one-step, myopic context when balanced only by portfolio risk. A key point to note is that, when the portfolio vector is also subject to the sum-to-one constraint, then the combined loss function also more aggressively penalizes negative weights, i.e., short positions, and so is particularly of interest to personal investors and institutional funds that generally adopt long positions. That is, the absolute weight penalty operates as a soft constraint towards non-negative weights. In our broader context below, this does not theoretically imply non-negative optimal weights, but does often yield such solutions. Modify eqn. (6) to the form
with weights on the new absolute loss terms at each horizon Extending the latent variable construction of double exponential distributions to these terms in addition to the turnover terms, we now see that optimizing eqn. (12) is equivalent to computing the mode over states in a correspondingly extended synthetic DLM. This emulating model is:
with synthetic observations (scalar) and (vectors), and where latent scales are augmented with additional terms for each Conditioning on converts the Laplace term to a conditional normal. To incorporate exact linear constraints on each the above is modified only through the implied changes to the ; this is precisely as detailed at the end of Section 3.2 above.
Extension of the FFBS/EM algorithm of Section 3.2 provides for computation of the optimizing . Each E-step now applies to the latent as well as , while the M-step applies as before to at each iterate. Following initialization at , the earlier details of iterates are modified as follows:
Update the via to give a new matrix
Update the via to give a new matrix
M-step: FFBS applied to the extended emulating model eqn. (13) yields the exact mode of the synthetic posterior conditional on current
The resulting defines the optimizing portfolio vector.
4 One-Step Decisions with Multi-Step Goals
4.1 Profiled Loss and Marginal Loss
In multi-step portfolio analysis, the decision faced at time is to choose only. The future weights are involved in the initial specification of the joint loss function in order to weigh expected fluctuations in risk and costs up to the target horizon From the viewpoint of Bayesian decision theory, this is perfectly correct in the context of the actual decision faced if the approach is understood to be minimizing
Joint optimization over to deliver the actionable vector is Bayesian decision analysis with this implied loss as a function of alone.
The emulation framework provides an approach to computation, but also now suggests an alternative loss specification. With emulating synthetic joint density , minimizing the loss above is equivalent to profiling out the future hypothetical vectors by conditioning on their (joint) modal values. It is then natural to consider the alternative of marginalization over ; that is, define the implied marginal loss function as
Call the profiled loss function and the marginal loss function.
In general, the resulting optimal vectors (profiled) and (marginal) will differ. A key exception is the case of the quadratic loss function and normal synthetic models of Section 2 where the joint posterior is multivariate normal. In that case, joint modes are joint means, whose elements are marginal means, i.e., The situation is different in cases of non-normal emulating models, such based on the Laplace forms. These are now considered further for comparisons of marginal and profile approaches.
4.2 Computing Marginal Portfolios under Laplace Loss
Return to the Laplace loss framework of Sections 3.1 and 3.2 (i.e., the extended Laplace context with ) with sum-to-one constraints. Here the key issues of profiled versus marginal losses are nicely illustrated. Similar features arise in the extended Laplace loss context of Section 3.3, but with no new conceptual or practical issues so details of that extension are left to the reader. The FFBS/EM algorithm easily computes the optimal profile portfolio but it does not immediately extend to evaluating the optimal marginal portfolio . Of several approaches explored, the most useful is based on Markov Chain Monte Carlo (MCMC) analysis of the synthetic DLM, coupled with iterative, gradient-based numerical evaluation of the mode of the resulting Monte Carlo approximation to the required marginal density function. Summary details are given here and further explored in application in Section 5.
The density is the margin under the full joint posterior of where is the vector of step ahead latent scales. The FFBS/EM approach is enabled by the nice analytic forms of implied conditional posteriors; these also enable MCMC analysis in this conditionally normal DLM with uncertain scale factors. This approach is nowadays standard and easily implemented (e.g. West and Harrison 1997, chapt. 15; Prado and West 2010, sect. 4.5). Now the FFBS is exploited to generate backward sampling, rather than the backward smoothing that evaluates posterior modes. At each MCMC iterate, FFBS applies to simulate one draw of the full trajectory of states from the retrospective posterior conditional on current values of the latent scales. Then, conditional on this state trajectory, the conditional posterior is simulated to draw a new sample of the latent scales. In the emulating model of eqn. (11) this second step involves a set of conditionally independent univariate draws, each from a specific GIG (generalized inverse Gaussian) distribution. Applying the sum-to-one constraint on each vector changes this structure for the however, and direct sampling of the is then not facile. To address this, we define a Metropolis-Hastings extension for these elements to allow use of the constraint. Summary details of this, and of MCMC convergence diagnostics related to the real-data application in Section 5, are given in Appendix A.
The MCMC generate samples indexed by superscript , for some chosen sample size The Rao-Blackwellized Monte Carlo approximation to the required margin for is then
Importantly, this is the density of a mixture of normals: each conditional in the sum is the implied normal margin in the DLM defined by conditioning values of latent scales, with moments trivially computable via FFBS (using backward smoothing), and the density values are easily evaluated at any Thus the portfolio optimization problem reduces to mode-finding in a mixture of multivariate normals, and there are a number of numerical approaches to exploit for this. The most effective is really one of the simplest– a Newton-type updating rule based on the first order derivative of the density, with repeat searches based on multiple initial values for numerical iterates. Relevant candidate initial values can be generated by evaluating the mixture at each of the normal component means, and selecting some of those with highest mixture density. Further details are noted in Appendix A.
5 Studies in FX and Commodity Price Portfolios
Evaluation of multi-step portfolios uses data on daily returns of financial series: exchange rates of 10 international currencies (FX) relative to the US dollar, two commodities and two asset market indices; see Table 1. The time series runs from August 8, 2000 to December 30, 2011. An initial period of this data is used for exploratory analysis, followed by formal sequential filtering using a multivariate dynamic model, as noted below. The main interest in portfolio evaluation is then explored over the period of days from January 1, 2009 to December 30, 2011.
|Australian Dollar||AUD||Swiss Franc||CHF|
|Japanese Yen||JPY||New Zealand Dollar||NZD|
|Canadian Dollar||CAD||Norwegian Kroner||NOK|
|South African Rand||ZAR||Oil price||OIL|
5.2 Forecasting Model
Forecasts are generated from a time-varying, vector auto-regressive model of order 2 (TV-VAR(2), e.g. Primiceri 2005; Nakajima and West 2013b), with dynamic dependence network structure (DDN, Zhao et al. 2016). Exploratory analysis of the first 500 observations is used to define the sparsity structure of the dynamic precision matrix for the TV-VAR innovations, i.e., a sparse representation of multivariate volatility, following examples in the above references. From day 501, the analysis is run sequentially in time, updating and forecasting each day. The DDN structure enables analytic filtering and one-step forecasting; forecasting multiple steps ahead in a TV-VAR with DDN structure is performed by direct simulation into the future. For each day during the investment period, the model generates multiple-step ahead forecast mean vectors and variance matrices, , given as Monte Carlo averages of forecast trajectories of the return vectors simulated at time We take days as the portfolio horizon, and reset the time index so that represents the start of the investment period, January 1, 2009. Appendix B provides detailed discussion of the DDN model, use of exploratory training data, filtering and simulation-based forecasting.
5.3 Parameters and Metrics
Comparisons use various values of portfolio parameters in the quadratic/normal and Laplace loss frameworks. In all cases, we take the target return schedule to be constant, with representing daily return targets of , annualized (261 trading days) to about Then, we have for rather than strictly enforcing the constraint by so that the interim targets are “soft” rather than strictly enforced. The initial portfolio is the myopic Markowitz portfolio for comparison. Parameters define relative weights of the four components of loss. In a long-term context (e.g., when indexes months or more) some use of discounting into the future becomes relevant. For example, we may take may be chosen to increase with , but to decrease with to more aggressively target the soft targets as approaches given the accurate and reliable long-term predictions. In short-term contexts, such as with our day context, this is not relevant, so we take constant weights Setting loses no loss of generality, as the remaining three weights are relative to Examples use various values of to highlight their impact on portfolio outcomes. Larger values of reduce the penalty for risk in terms of overall portfolio variance; larger values of leads to more volatile portfolio dynamics due to reduced penalties on transaction costs; and larger values of reduce shrinkage of portfolio weights, also relaxing the penalties on shorting.
Portfolios are compared in several ways, including realized returns. With a fixed transaction cost of time optimal portfolio vector and realized return vector cumulative return from the period is In our examples, we compare cases with and We also compare our multi-step portfolios with the standard one-step/myopic Markowitz approach– naturally expected to yield higher cumulative returns with no transaction costs as it then generates much more volatile changes in portfolio weights day-to-day. Our portfolios constraining turnover are naturally expected to improve this in terms of both stability of portfolios and cumulative return in the presence of practically relevant, non-zero . Additional metrics of interest are portfolio “risk” as traditionally measured by the realized portfolio standard deviations , and patterns of volatility in trajectories of optimized portfolio weights over time.
5.4 Normal Loss Portfolios
First examples use the normal loss framework of Section 2 with Figure 1 shows trajectories over time of optimized portfolio weight vectors using and as well as those from the standard, myopic Markowitz analysis that corresponds to We see the increased smoothness of changes as decreases; at the trajectories (not shown) are almost constant.
Figure 2 plots trajectories of cumulative returns for three normal loss portfolios and 10,000) and for the Markowitz analysis. Markowitz and larger normal loss portfolios performs best– in this metric– with no transaction costs; but the Markowitz approach is very substantially out-performed by the smoother, multi-step portfolios under even very modest transaction costs Smaller induces portfolios more robust to transaction costs. Of note here is that, during 2009 following the financial crisis, portfolios with larger benefit as they are less constrained in adapting; but, later into 2010 and 2011, portfolios with lower are more profitable as they define ideal allocations with less switching and therefore save on transaction costs.
Figure 3 shows trajectories of realized SDs of optimized portfolios, i.e. over time, for each of the portfolios in Figure 2; also plotted is the theoretical lower bound trajectory from the myopic, one-step, minimum variance portfolio. Less constrained portfolios with larger have lower standard deviations, approaching those of the Markowitz portfolio while also generating smoother changes in portfolio weights and higher cumulative returns. Thus, these portfolios are improved in these latter metrics at the cost of only modest increases in traditional portfolio “risk.” Interestingly, the relationship between and realized standard deviations is not monotone; we see larger standard deviations in the case of than with , the latter, very low value yielding an almost constant portfolio over time that, in this study, turns out to control risk at a level not matched by modestly more adaptive portfolios.
5.5 Extended Laplace Loss Portfolios
We explore similar graphical summaries from analyses using the extended Laplace loss framework of Section 3.3, and with sum-to-one constraints. Figure 4 shows optimal weight trajectories with and (this change of is only in this example), and with the same level of penalization of turnover and absolute weights, i.e., We see expected effects of the two types of shrinkage– of changes in weights and in weights themselves. First, the hard shrinkage of changes induces much less switching in portfolio allocation over time, with longish periods of constants weights on subsets of equities. This occurs even with larger where the portfolio becomes more volatile and similar to the Markowitz case. Second, the penalty on absolute weights themselves, and implicitly on short positions as a result in this context of sum-to-one weights, yields trajectories that are basically non-negative on all equities over time. The joint optimization drives some of the weights exactly to zero at some periods of time, indicating a less than full portfolio over these periods. Furthermore, it is evident that there are periods where some of the weights– while not zero– are shrunk to very small values, so that a practicable strategy of imposing a small threshold would yield sparser portfolios– i.e., a “soft” sparsity feature. Values favor more stability/persistence in the portfolio allocations, and we see more “stepwise” allocation switches rather than more volatile turnover. Conversely, more aggressively favors no-shorting and encourages “soft” sparsity of allocations, resulting in dynamically switching portfolio weights over, generally, fewer assets.
Figure 5 plots trajectories of cumulative returns for three extended Laplace loss portfolios to show variation with the value of together with one highly adaptive normal loss portfolio and the Markowitz analysis, for and fixed. Again we compare cases with transaction cost and 0.001. As with normal loss comparisons, all multi-step cases dominate the traditional Markowitz analysis under even modest transaction costs. In addition, we now see the ability of the increasingly constrained multi-step Laplace portfolios to outperform unconstrained– and hence more volatile– Markowitz as well as multi-step normal loss portfolios even when transactions costs are zero or ignored. Then, cumulative returns with are essentially uniformly dominated by those with and , regardless of the existence of transaction costs in this example. This suggests values of smaller than or comparable to to appropriately balance the two degrees of shrinkage while maintaining relevant returns. One reason for this is the encouragement towards less volatile swings in weights to larger negative/positive values and towards no-shorting as part of that, features that can lead to increased risk and transaction costs.
5.6 Comparison of Profiled and Marginal Loss Approaches
We now discuss some analysis summaries related to the discussion of profiled and marginal losses of Section 4. As discussed in Section 4.2 we do this in the Laplace loss framework of Sections 3.1 and 3.2 (i.e., with in the extended context). First, Figure 6 shows optimal weight trajectories with and comparing the profiled Laplace loss weights of Section 3.2 with the marginal Laplace loss weights of Section 4. Both strategies generate positive weights on the JPY, GBP and CAD FX rates, with a number of the other assets having quite small weights for longer periods of time, while the weights under profiled loss vary more widely to higher absolute values. We see constant weights for long periods on adaptively updated subsets of assets using the profiled weights, as expected; these trajectories are effectively smoothed-out and shrunk towards zero under the marginal weights. The latter do not exhibit the exact zero values that the former can, as we now understand is theoretically implied by our representation via the emulating statistical model: marginal modes will not be exactly at zero even when joint modes have subsets at zero.
Figure 7 plots trajectories of cumulative returns for both profiled and marginal portfolios in each of the cases with and Without transaction cost, the profiled and marginal portfolios are similarly lucrative whatever the value of , whereas the profiled portfolios show greater differences. In contrast, both approaches are more substantially impacted by transaction costs and in a similar way; the cumulative return performance of portfolios decreases drastically in the presence of transaction costs. Not shown here, portfolios with smaller values define far more stable weights while resulting in very similar cumulative returns under both profiled and marginal strategies, as the resulting portfolio weights are very stable over time; this extends this observation as already noted in the normal loss context in Section 5.4.
The marginal strategy tends to be less sensitive to than the profiled strategy, suggesting relevance in a “conservative” investment context with respect to loss function misspecification. Even with quite widely varying , resulting marginal loss portfolios will be more stable, and far less susceptible to substantial changes and potential deterioration in terms of cumulative returns, than profiled loss portfolios.
6 Additional Comments
The selected illustrations in our application to financial time series highlight key features and benefits of our Bayesian emulation approach to computing optimal decisions, as well as our development of new multi-step portfolio decision analysis. Versions of the Laplace loss functions generate multi-step portfolios that consistently outperform traditional myopic approaches, both with and without transaction costs; they define psychologically and practically attractive framework for investors concerned about portfolio stability over multiple periods with defined targets. Examples show the opportunity– through appropriate selection of loss function parameters– for resulting portfolios to cushion the impacts of economically challenging times for the market, and enhance recovery afterwards, as highlighted in the examples using FX, commodity and market index data over 2009-2012. In addition to showcasing the application of the concept of “Bayesian emulation for decisions”, our interest in multi-step portfolios also highlights the central question of optimization for one-step decisions in the multi-step view; while specific numerical methods using dynamic programming might be tuned and customized to a specific loss function in this context, the Bayesian emulation approach opens up new approaches and suggests new avenues for development. As we begin in our discussion of marginal versus profiled loss functions, there is now opportunity to drive some part of the research agenda from synthetic statistical models as a starting point, exploring and evaluating the implied loss functions. One specific, current direction linked to this is to define classes of non-normal, state-space models with skewed innovation/error distributions that induce asymmetric loss functions; a key idea here is to use discrete/mean-scale mixtures of normals for the innovation/error distributions, so maintaining the ability to use MCMC coupled with FFBS/EM methods for mode-finding while generating a very rich class of resulting loss functions. One key and desirable feature of the latter, in particular, is to represent high penalties on portfolio short-fall relative to moderate or expected gains. This direction, and others opened-up by the “Bayesian emulation for decisions” approach, offers potential for impact on research frontiers in statistics and decision theory as well as application in financial portfolio development and other areas.
- Aguilar and West (2000) O. Aguilar and M. West. Bayesian dynamic factor models and portfolio allocation. Journal of Business and Economic Statistics, 18:338–357, 2000.
- Andrews and Mallows (1974) D. F. Andrews and C. L. Mallows. Scale mixtures of normal distributions. Journal of the Royal Statistical Society (Series B: Methodological), 36:99–102, 1974.
- Brodie et al. (2009) J. Brodie, I. Daubechies, C. De Mol, D. Giannone, and I. Loris. Sparse and stable Markowitz portfolios. Proceedings of the National Academy of Sciences, 106:12267–12272, 2009.
- Dempster et al. (1977) A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society (Series B: Methodological), 39:1–38, 1977.
- Ekin et al. (2014) T. Ekin, N. G. Polson, and R. Soyer. Augmented Markov chain Monte Carlo simulation for two-stage stochastic programs with recourse. Decision Analysis, 11:250–264, 2014. doi: 10.1287/deca.2014.0303.
- Godsill et al. (2001) S. J. Godsill, A. Doucet, and M. West. Maximum a posteriori sequence estimation using Monte Carlo particle filters. Annals of the Institute of Statistical Mathematics, 53:82–96, 2001.
- Godsill et al. (2004) S. J. Godsill, A. Doucet, and M. West. Monte Carlo smoothing for non-linear time series. Journal of the American Statistical Association, 99:156–168, 2004.
- Gruber and West (2016a) L. F. Gruber and M. West. GPU-accelerated Bayesian learning in simultaneous graphical dynamic linear models. Bayesian Analysis, 11:125–149, 2016. doi: 10.1214/15-BA946.
- Gruber and West (2016b) L. F. Gruber and M. West. Bayesian forecasting and scalable multivariate volatility analysis using simultaneous graphical dynamic models. Technical Report, Duke University, 2016. arXiv:1606.08291.
- Jacquier and Polson (2011) E. Jacquier and N. G. Polson. Bayesian methods in finance. In John F. Geweke, Gary Koop, and Herman Van Dijk, editors, The Oxford Handbook of Bayesian Econometrics, chapter 9, pages 439–512. Oxford University Press, 2011. doi: 10.1093/oxfordhb/9780199559084.001.0001.
- Jagannathan and Ma (2003) R. Jagannathan and T. Ma. Risk reduction in large portfolios: Why imposing the wrong constraints helps. The Journal of Finance, 58:1651–1684, 2003.
- Markowitz (1991) H. M. Markowitz. Portfolio Selection: Efficient Diversification of Investments. Wiley, New York, 2nd edition, 1991. Reprinted several times from 1959 original.
- Müller (1999) P. Müller. Simulation based optimal design. In J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, editors, Bayesian Statistics 6, pages 459–474. Oxford University Press, 1999.
- Nakajima and West (2013a) J. Nakajima and M. West. Bayesian dynamic factor models: Latent threshold approach. Journal of Financial Econometrics, 11:116–153, 2013. doi: 10.1093/jjfinec/nbs013.
- Nakajima and West (2013b) J. Nakajima and M. West. Bayesian analysis of latent threshold dynamic models. Journal of Business & Economic Statistics, 31:151–164, 2013. doi: 10.1080/07350015.2012.747847.
- Nakajima and West (2015) J. Nakajima and M. West. Dynamic network signal processing using latent threshold models. Digital Signal Processing, 47:6–15, 2015. doi: 10.1016/j.dsp.2015.04.008.
- Nakajima and West (2016) J. Nakajima and M. West. Dynamics and sparsity in latent threshold factor models: A study in multivariate EEG signal processing. Brazilian Journal of Probability and Statistics, to appear, 2016. arXiv:1606.08292.
- Park and Casella (2008) T. Park and G. Casella. The Bayesian lasso. Journal of the American Statistical Association, 103:681–686, 2008.
- Polson and Tew (2000) N. G. Polson and B. V. Tew. Bayesian portfolio selection: An empirical analysis of the S&P 500 Index 1970-1996. Journal of Business and Economic Statistics, 18:164–73, 2000.
- Prado and West (2010) R. Prado and M. West. Time Series: Modeling, Computation and Inference. Chapman and Hall/CRC Press, 2010.
- Primiceri (2005) G. E. Primiceri. Time varying structural vector autoregressions and monetary policy. The Review of Economic Studies, 72:821–852, 2005.
- Quintana (1992) J. M. Quintana. Optimal portfolios of forward currency contracts. In J. O. Berger, J. M. Bernardo, A. P. Dawid, and A. F. M. Smith, editors, Bayesian Statistics IV, pages 753–762. Oxford University Press, 1992.
- Quintana et al. (2003) J. M. Quintana, V. Lourdes, O. Aguilar, and J. Liu. Global gambling. In J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith, and M. West, editors, Bayesian Statistics VII, pages 349–368. Oxford University Press, 2003.
- Quintana et al. (2010) J. M. Quintana, C. M. Carvalho, J. Scott, and T. Costigliola. Futures markets, Bayesian forecasting and risk modelling. In A. O’Hagan and M. West, editors, The Oxford Handbook of Applied Bayesian Analysis, chapter 14, pages 343–365. Oxford University Press, 2010.
- Viterbi (1967) A. Viterbi. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 13:260–269, 1967. doi: 10.1109/TIT.1967.1054010.
- West (1984) M. West. Outlier models and prior distributions in Bayesian linear regression. Journal of the Royal Statistical Society (Series B: Methodological), 46:431–439, 1984.
- West (1987) M. West. On scale mixtures of normal distributions. Biometrika, 74:646–648, 1987.
- West and Harrison (1997) M. West and P. J. Harrison. Bayesian Forecasting and Dynamic Models. Springer Verlag, 2nd edition, 1997.
- Zhao et al. (2016) Z. Y. Zhao, M. Xie, and M. West. Dynamic dependence networks: Financial time series forecasting & portfolio decisions (with discussion). Applied Stochastic Models in Business and Industry, 32:311–339, 2016. doi: 10.1002/asmb.2161. arXiv:1606.08339.
- Zhou et al. (2014) X. Zhou, J. Nakajima, and M. West. Bayesian forecasting and portfolio decisions using dynamic dependent sparse factor models. International Journal of Forecasting, 30:963–980, 2014. doi: http://dx.doi.org/10.1016/j.ijforecast.2014.03.017.
Appendix A Appendix: Mode Searching for Marginal Laplace
a.1 Gibbs Sampler and Maximization of Mixture of Normal Densities
To construct the approximate density in eqn. (16), we need to sample from the (joint) posterior of the model in eqn. (11). The Gibbs sampler for this model has components related to those of Bayesian lasso regression (Park and Casella 2008), but now in the extended context of dynamic models using FFBS methods. The MCMC proceeds over iterations as follows:
Sample each from its generalized inverse Gaussian (GIG)111The density of is , complete conditional posterior distribution, independently across and
Sample using FFBS.
For later use, record the means and variances of the marginal normal posterior generated by the above FFBS analysis.
With the samples and the by-products , the Monte Carlo approximation to is
The next step is mode-finding in this mixture of normals. Modes satisfy
with and We iterate this fixed-point equation to compute approximate modes, with the strategy for multiple “global” starting values as noted in the main paper. Normal mixtures can exhibit multiple modes, and our starting values– using means prioritized by the resulting values of – explicitly address this by defining a set of “spanning” mode searches.
a.2 Sum-to-One Constraint
Note that, when the sum-to-one constraint is imposed on the original model by setting , then the full conditional of the is no longer a product of univariate GIG distributions. To sample each , we therefore use a novel, independence chain Metropolis-Hastings algorithm. Here the product of the initial GIG distributions in the unconstrained model is used as the obvious proposal distribution. Acceptance probabilities involve singular normal densities based on the generalized inverses where . The acceptance probability of each proposed sample conditional on its previous value and all other parameters is where and . We observe in our empirical studies and the application of the paper, in particular, that the acceptance probability is generally very high– typically around 98%.
We note also that, due to sum-to-one constraint, the conditional, variance matrices are rank-deficient, being of rank . The generalized inverse in eqn. (17) are based on singular value decompositions in this iterative numerical solver for the modes of the normal mixture.
Appendix B Appendix: Dynamic Dependence Network Models
For time series analysis and forecasting, we adapt the framework of dynamic dependence network models (DDNMs) introduced in Zhao et al. (2016). This model framework builds on prior work in multivariate dynamic modelling and innovates in bringing formal and adaptive Bayesian model uncertainty analysis to parsimonious, dynamic graphical structures of real practical relevance to financial (and other) forecasting contexts. Specific classes of DDNMs represent both lagged and cross-sectional dependencies in multivariate, time-varying autoregressive structures, with an ability to adapt over time to dynamics in cross-series relationships that advances the ability to characterize changing patterns of feed-forward relationships and of multivariate volatility, and to potentially improve forecasts as a result.
Denote the vector of assets by ; in our application, is the vector of log prices of the financial assets. The DDNM extension of a TV-VAR(2) model represents via
where , is a matrix of time-varying intercept auto-regressive coefficients, is a time-varying, lower triangular matrix with diagonal zeros, and with time-varying univariate volatilities on the diagonal. The model can be written element-wise as
where is the -th row of , is the parental set of series defined as the indices of -th row of with non-zero elements, and and are the corresponding subvectors with elements of and -th row of . The state parameters are assumed to follow normal random walks with a discount factor method applied to define the state evolution variance matrices as is standard in univariate DLMs (West and Harrison 1997, chap. 6). The observational variance is modeled as a gamma-beta stochastic volatility process over time, again based on standard DLM methodology (West and Harrison 1997, sect. 10.8). Sparsity of the parental sets defines patterns of zeros below the diagonal in . This in turn defines the sparsity structure of the implied residual precision matrix; by inversion, the conditional precision matrix of given the past values and all dynamic parameters is which has the form of sparse Cholesky decomposition when is sparse. If the level of sparsity in parental sets is high, then this precision matrix will also have zeros in some of the off-diagonal elements, representing conditional independencies in the innovations; since elements of and are time-varying, these conditional independencies represent an underlying dynamic graphical model for the innovations.
Given the parental sets , the sequential, forward filtering analysis of the multivariate DDNM partitions into a parallel set of univariate models with standard, analytic computation of on-line prior-to-posterior updating and one-step ahead forecasting. For forecast distributions more than one-step ahead, simulation methods are used as the unknown future observations () are required as conditional predictors. Direct simulation from the exact predictive distributions is easily implemented recursively, as detailed in Zhao et al. (2016)
As in much of our past work in practical financial time series forecasting, we apply the DDNM to log prices in the vector , and returns are then inferred. For univariate series with price at time the return is with . Similar relationships define the -step ahead returns at any time. Our portfolio analyses require predictive mean vectors and variance matrices of returns, which can be directly computed by transformation of the predictive samples of log prices.
The DDNM requires specification of the parental sets . We choose these based on exploratory analysis of preliminary data over first 500 days. Filtering and forecasting with the defined DDNM then run from day 501, redefined as in the formal sequential analysis. This exploratory analysis runs full models over the first 500 days, i.e., DDNMs using for each Then, we simply compute the Cholesky decomposition of the posterior mean of and threshold its off-diagonal elements using a threshold of those elements exceeding the threshold in row define the parental set (with, of course, that we adopt for the forward filtering and forecasting analysis from then on. The choice of the threshold is naturally important here; a higher threshold yields sparser parental sets and hence sparser matrices. Exploratory analysis on the first 500 days is used to explore and evaluate this, and guide the choice informally. With a very low threshold, forecasts of returns in this training period tends to have very narrow credible intervals but show substantial biases, especially in multi-step ahead prediction. Higher thresholds– consistent with increased sparsity– lead to wider credible intervals but less adaptive models. On a purely exploratory basis, we chose as a “sweet-spot” balancing forecast mean accuracy and uncertainty. This– ad-hoc but practically rationale– exploratory analysis defined a relevant, specific DDNM for use here. The resulting parental sets are displayed in Table 2.
Our results are based on the resulting DDNM with additional parameters as follows: for the normal DLM state evolutions, we use discount factor of 0.98 for each series, and 0.97 for residual stochastic volatilities. Direct simulation of multi-step ahead predictive distributions used a Monte Carlo sample size of 50,000.
|JPY||GBP EUR CAD AUD|
|GLD||GBP ZAR CAD CHF|
|S&P||GBP EUR NOK CAD AUD NZD|
|NSD||AUD JPY CHF S&P|