QCNN: Quantile Convolutional Neural Network

Abstract

A dilated causal one-dimensional convolutional neural network architecture is proposed for quantile regression. The model can forecast any arbitrary quantile, and it can be trained jointly on multiple similar time series. An application to Value at Risk forecasting shows that QCNN outperforms linear quantile regression and constant quantile estimates.

1 Introduction

Convolutional neural networks have shown great results in time series forecasting. However, the applications so far, like time series forecasting in general, have focused mainly on predicting the mean. This article presents a convolutional neural network for forecasting quantiles.

2 QCNN

This section describes the proposed QCNN model.

2.1 CNNs for Time Series Forecasting

Convolutional neural networks use sliding local receptive fields to find local features in the input data. This enables them to model certain data types particularly well, for example, images (with strong 2D structure) or time series (with strong 1D structure). Variables spatially or temporally nearby are often correlated, so we should take advantage of the topology of the inputs [LeCun et al., 1995]. This is exactly what convolutional neural networks do.
CNNs are most often associated with images, yet they can be applied to one-dimensional data as well. Even the earliest convolutional architectures were applied to the time domain [Waibel et al., 1995]. Still, several notable convolutional architectures for time series forecasting were proposed only recently.
Recurrent neural networks (LSTMs and GRUs) are often considered the best or even the optimal neural network architectures for sequential data (including time series). However, Bai et al. [2018] compared generic recurrent and convolutional architectures on various sequence modeling tasks, and found that the latter might be a better choice.
Learning long temporal dependencies is a challenging task for CNNs, but dilations [Yu and Koltun, 2015] can help. A dilated convolution is a convolution with holes: the convolutional filter is enlarged by skipping over a few points. The dilation rate of layer $l$ can be set to $2^{l-1}$ in order to allow an exponential growth in the effective receptive field: by increasing the number of layers, we can exponentially increase the time horizon that the network can see. The convolution should also be causal, meaning that outputs can never depend on future inputs.
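As a quick illustration (ours, not from any specific reference), the receptive field of a stack of causal convolutions is one plus (kernel size minus one) times the sum of the dilation rates, so doubling dilations give exponential coverage:

```python
# Receptive field of stacked causal convolutions:
# 1 + (kernel_size - 1) * sum(dilation_rates).
def receptive_field(kernel_size, dilation_rates):
    return 1 + (kernel_size - 1) * sum(dilation_rates)

# With kernel size 2, six layers with doubling dilations see 64 time
# steps, while six undilated layers would only see 7.
print(receptive_field(2, [1, 2, 4, 8, 16, 32]))  # 64
print(receptive_field(2, [1] * 6))               # 7
```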
The WaveNet model uses dilated causal convolutions to generate audio waveforms [Oord et al., 2016]. Borovykh et al. [2017] used an adaptation of WaveNet for time series forecasting, and found it an easy-to-implement and time-efficient alternative to recurrent networks.
QCNN is a one-dimensional dilated causal convolutional network with an appropriately chosen quantile loss function.

2.2 Quantile Regression

While simple least squares regressions estimate the conditional mean of a given variable, quantile regressions estimate conditional quantiles [Koenker and Hallock, 2001]. This requires a new objective, the quantile (pinball) loss:

$$\rho_\tau(e) = \max\big(\tau e, (\tau - 1)\, e\big) = e\big(\tau - \mathbb{1}_{\{e < 0\}}\big).$$

While the mean squared error targets the mean, this loss targets arbitrary $\tau$-quantiles. Thus, we only need to change the loss function.
Denote our convolutional neural network by $f(x; \theta)$, taking as arguments the input values $x$ and the trainable parameters $\theta$. Now we can use the quantile loss function to turn our one-dimensional causal dilated CNN model into QCNN (1).

$$\hat{\theta} = \arg\min_{\theta} \sum_{t} \rho_\tau\big(y_t - f(x_t; \theta)\big) \qquad (1)$$
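A minimal NumPy sketch of this loss (variable names are ours):

```python
import numpy as np

def quantile_loss(y_true, y_pred, tau):
    """Pinball loss: under-predictions are weighted by tau,
    over-predictions by (1 - tau)."""
    e = y_true - y_pred
    return np.mean(np.maximum(tau * e, (tau - 1.0) * e))

# For tau = 0.05, the constant prediction minimizing this loss
# is the 5% quantile of y_true.
```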

2.3 Training

The loss function can be minimized using any proven optimization algorithm (e.g., stochastic gradient descent or one of its variants).
Neural networks provide greater flexibility than simple linear models. However, a single time series might not be enough to exploit this flexibility. Thus, we may expect QCNN to work better when trained jointly on a set of similar time series.
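One simple way to realize such joint training (a sketch under our own assumptions, not a procedure prescribed by the model) is to pool the windowed training samples of all series and fit a single network:

```python
import numpy as np

def pool_series(series_list, make_windows):
    """Stack windowed (input, target) pairs from several similar time
    series into one training set for a single, jointly trained QCNN.
    `make_windows` is a hypothetical helper that turns one series
    into (inputs, targets) arrays."""
    xs, ys = zip(*(make_windows(s) for s in series_list))
    return np.concatenate(xs), np.concatenate(ys)
```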

3 Empirical Study

This section presents an application of QCNN.

3.1 Value at Risk Forecasting

Value at Risk (VaR) is an important risk measure in finance. It aims to find a realistically worst-case outcome, in the sense that anything worse happens only with a given (small) probability [Shin, 2010]. VaR is the worst loss over a given horizon that will not be exceeded with a given level of confidence [Jorion, 2000]. By convention, it is reported as a positive number.
Mathematically, the VaR of a return $X$ with confidence level $c$ can be defined through the quantile function of $X$ evaluated at $1 - c$ (2).

$$\mathrm{VaR}_c(X) = -Q_X(1 - c), \qquad Q_X(p) = \inf\{x \in \mathbb{R} : F_X(x) \ge p\} \qquad (2)$$
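Under this sign convention, a quantile forecast turns into a VaR forecast by a simple sign flip; a toy illustration (our own, with simulated heavy-tailed returns):

```python
import numpy as np

rng = np.random.default_rng(0)
returns = rng.standard_t(df=4, size=10_000) * 0.01  # toy daily returns

c = 0.99                                # confidence level
var_99 = -np.quantile(returns, 1 - c)   # 99% VaR, a positive number
print(var_99)
```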

VaR can be estimated in many ways. A usual procedure estimates the return variance and assumes a normal (or Student's t) distribution to compute the quantile estimates. However, these assumptions are often invalid. Efforts have been made to relax them; see, for example, Hull and White [1998] or Glasserman et al. [2002]. One of the great advantages of the quantile regression approach is that it does not require any such distributional assumptions.
Quantile regression methods are often applied to Value at Risk forecasting. Engle and Manganelli [1999] proposed the CAViaR (Conditional Value at Risk by Quantile Regression) model. Taylor [2000] applied a quantile regression neural network approach to VaR estimation. Chen and Chen [2002] found that the quantile regression approach outperforms the variance-covariance approach. Taylor [2007] developed an exponentially weighted quantile regression. White et al. [2015] proposed a vector autoregressive extension to quantile models. Xu et al. [2016] applied a quantile autoregression neural network (QARNN) as a flexible generalization of quantile autoregression [Koenker and Xiao, 2006]. Yan et al. [2018] used a long short-term memory neural network in a quantile regression framework to learn the tail behavior of financial asset returns. These are just a few of the notable studies.
VaR deals with rare events, and rare events produce small data. The higher the confidence level we choose, the fewer loss events we have to learn from. Data volume is thus a crucial issue in Value at Risk forecasting. It is, therefore, beneficial to use data from several stocks to learn to forecast VaR. Many stocks together have experienced more extreme events than any single one, and we may expect those events to have similar sources. If they do, then a joint VaR-forecasting model might outperform individual models.

3.2 Data

Our stock market dataset was obtained from Kaggle¹. We randomly chose 100 stocks listed on NASDAQ, NYSE, and AMEX over the 10-year period under study (2009-01-01 to 2018-12-31). Daily logarithmic returns were computed and fed to the algorithms. The last 30% of each time series, about the last 3 years, was used as a test set.

3.3 Models and Experiments

VaRs are forecasted for 3 confidence levels (95%, 99%, and 99.9%), using 4 different methods:

  • a constant quantile estimate,

  • a linear quantile regression,

  • a QCNN,

  • a joint QCNN trained on all available training data.

The QCNN network contains 6 causal convolutional layers (each with 8 filters of kernel size 2, a nonlinear activation, and exponentially increasing dilation rates), followed by a convolutional layer with a single filter of kernel size 1 and a linear activation.
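A sketch of such a network in Keras (an assumption on our part: the paper does not name the framework here, and we use ReLU as the unspecified hidden activation):

```python
from tensorflow.keras import layers, models

def build_qcnn(seq_len=128):
    """6 dilated causal Conv1D layers (8 filters, kernel size 2,
    dilation rates 1..32), then a 1x1 convolution with a linear
    activation producing one quantile forecast per time step."""
    inp = layers.Input(shape=(seq_len, 1))
    x = inp
    for rate in [1, 2, 4, 8, 16, 32]:
        x = layers.Conv1D(filters=8, kernel_size=2, padding="causal",
                          dilation_rate=rate, activation="relu")(x)
    out = layers.Conv1D(filters=1, kernel_size=1, activation="linear")(x)
    return models.Model(inp, out)
```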
The inputs are overlapping 128-step sequences extracted from the time series of stock returns. The output sequences are the inputs shifted by one step, so that at every time step we predict one day ahead.
The dataset was scaled by subtracting the mean and dividing by the standard deviation, and it was fed to the algorithms in batches of 128. The QCNN models were trained for 128 epochs, using the Adadelta [Zeiler, 2012] optimizer.
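A minimal sketch of the windowing and the training setup under these settings (the pinball loss mirrors Equation 1; the names and the TensorFlow backend are our assumptions):

```python
import numpy as np
import tensorflow as tf

def make_windows(returns, seq_len=128):
    """Overlapping input windows with targets shifted by one step."""
    x = np.stack([returns[i:i + seq_len]
                  for i in range(len(returns) - seq_len)])
    y = np.stack([returns[i + 1:i + seq_len + 1]
                  for i in range(len(returns) - seq_len)])
    return x[..., None], y[..., None]

def pinball(tau):
    def loss(y_true, y_pred):
        e = y_true - y_pred
        return tf.reduce_mean(tf.maximum(tau * e, (tau - 1.0) * e))
    return loss

# model = build_qcnn()  # the sketch above
# model.compile(optimizer="adadelta", loss=pinball(0.05))
# model.fit(x_train, y_train, batch_size=128, epochs=128)
```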
The linear model is an autoregression using 4 lags of the target variable.
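This benchmark can be reproduced with, e.g., statsmodels' QuantReg (a sketch; the exact estimation settings used in the study are our assumptions):

```python
import numpy as np
import statsmodels.api as sm

def fit_linear_qr(returns, tau, n_lags=4):
    """Linear quantile autoregression on 4 lags of the returns."""
    y = returns[n_lags:]
    lags = np.column_stack([returns[n_lags - k:-k]
                            for k in range(1, n_lags + 1)])
    X = sm.add_constant(lags)
    return sm.QuantReg(y, X).fit(q=tau)
```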
The constant quantile estimate was computed as the empirical quantile of the training returns, using linear interpolation between order statistics (3), where $x_{(1)} \le \dots \le x_{(n)}$ are the ordered observations.

$$\hat{Q}(\tau) = x_{(\lfloor h \rfloor)} + (h - \lfloor h \rfloor)\big(x_{(\lfloor h \rfloor + 1)} - x_{(\lfloor h \rfloor)}\big), \qquad h = (n - 1)\,\tau + 1 \qquad (3)$$
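This matches the default linearly interpolated quantile in common numerical libraries; a sketch (0-indexed, equivalent to the formula above):

```python
import numpy as np

def empirical_quantile(x, tau):
    """Linearly interpolated empirical quantile, Equation (3);
    equivalent to np.quantile(x, tau) with the default method."""
    xs = np.sort(np.asarray(x))
    h = (len(xs) - 1) * tau   # fractional order statistic (0-indexed)
    lo = int(np.floor(h))
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])
```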

3.4 Evaluation

We apply the Dynamic Quantile test of Engle and Manganelli [1999]. A hit variable (4) is constructed, which takes the value $1 - \tau$ at exceedances and $-\tau$ otherwise, and so has an expected value of zero. This variable is regressed on a matrix of explanatory variables $X$ (5), which may contain the hit variable's past lags, the Value at Risk forecast (and its lags), and possibly further variables. This regression should have no explanatory power, so we test the hypothesis $\beta = 0$. Applying the central limit theorem, we can construct an asymptotically chi-square distributed test statistic (6). Here $q$ denotes the number of input variables and $T$ the number of time steps, so $X$ is a $T \times q$ matrix.

$$Hit_t = \mathbb{1}_{\{y_t < -\mathrm{VaR}_t\}} - \tau \qquad (4)$$
$$Hit = X\beta + \varepsilon \qquad (5)$$
$$DQ = \frac{\hat{\beta}^\top X^\top X \hat{\beta}}{\tau(1 - \tau)} \;\sim\; \chi^2(q) \qquad (6)$$

This test is simple and easy to implement. Here we use the VaR forecast and 3 lags of $Hit_t$ as input variables.
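A compact implementation sketch of Equations (4)–(6) (ours; the constant term and the variable names are assumptions):

```python
import numpy as np

def dq_test(returns, var_forecasts, tau, n_lags=3):
    """Dynamic Quantile test statistic: regress the hit sequence on a
    constant, the VaR forecast, and lagged hits, and test beta = 0."""
    hit = (returns < -var_forecasts).astype(float) - tau   # Eq. (4)
    T = len(hit) - n_lags
    X = np.column_stack(
        [np.ones(T), var_forecasts[n_lags:]] +
        [hit[n_lags - k:-k] for k in range(1, n_lags + 1)])
    h = hit[n_lags:]
    beta = np.linalg.lstsq(X, h, rcond=None)[0]            # Eq. (5)
    dq = beta @ X.T @ X @ beta / (tau * (1.0 - tau))       # Eq. (6)
    return dq  # asymptotically chi-square, X.shape[1] degrees of freedom
```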
The average estimated VaR values and exceedance rates are also reported for a more detailed assessment of model performance.

3.5 Results

The means, medians, and standard deviations of exceedance rates, the average rejection rates of the DQ test at two different significance levels, and the average VaR values are reported in Tables 1, 2, and 3. Zero-exceedance forecasts are assigned a DQ test p-value of 0. The DQ test results are not reported for the 99.9% level.

            | Exceedances             | DQ Test Rejections | VaR
            | Mean    Median  SD      | 0.01     0.05      | Mean
Constant    | 0.0399  0.0358  0.0291  | 0.5300   0.6600    | 0.0727
QR          | 0.0409  0.0344  0.0293  | 0.6200   0.7200    | 0.0689
QCNN        | 0.1269  0.1245  0.0362  | 0.7400   0.8000    | 0.0423
Joint QCNN  | 0.0433  0.0444  0.0101  | 0.0500   0.1600    | 0.0583
Table 1: 95% VaR forecasts

            | Exceedances             | DQ Test Rejections | VaR
            | Mean    Median  SD      | 0.01     0.05      | Mean
Constant    | 0.0084  0.0061  0.0117  | 0.3800   0.4100    | 0.1935
QR          | 0.0097  0.0066  0.0125  | 0.5400   0.5900    | 0.1852
QCNN        | 0.0599  0.0576  0.0280  | 0.9500   0.9600    | 0.0676
Joint QCNN  | 0.0115  0.0119  0.0056  | 0.0900   0.1600    | 0.1023
Table 2: 99% VaR forecasts

            | Exceedances             | VaR
            | Mean    Median  SD      | Mean
Constant    | 0.0023  0.0007  0.0052  | 0.3764
QR          | 0.0043  0.0013  0.0075  | 0.3630
QCNN        | 0.0249  0.0212  0.0186  | 0.1079
Joint QCNN  | 0.0023  0.0013  0.0026  | 0.2062
Table 3: 99.9% VaR forecasts

The single-stock QCNN overshot the targeted exceedance rates, in some cases by orders of magnitude. All other methods produced more conservative, and seemingly more reliable, forecasts. Even the constant estimate produced exceedance rates close to the desired levels. However, the joint QCNN achieved similar exceedance accuracy with lower average VaR estimates. This means that while a simple historical quantile estimate might be enough to set a budget against a realistically worst-case outcome, the joint QCNN gives a cheaper solution. The joint QCNN also produced exceedance rates with consistently lower standard deviation than the benchmark methods. Finally, the Dynamic Quantile test is rejected for far fewer stocks in the case of the joint QCNN than for any other method, which further indicates that it produces the highest-quality estimates.
The experiments were repeated for the previous 10 years’ data (1999-2008) with a different set of randomly chosen stocks, and the results were quite similar.

4 Summary

This article proposed a one-dimensional convolutional neural network architecture for quantile regression. The model takes as input a series of observations, and directly makes one-step ahead forecasts for arbitrary quantiles. It is essentially a convolutional extension of previously proposed quantile regression models. The QCNN model was applied to Value at Risk forecasting, and produced better estimates than the benchmark models. The model seems most applicable when there are multiple similar time series to train on.

Footnotes

  1. https://www.kaggle.com/qks1lver/amex-nyse-nasdaq-stock-histories

References

  1. Shaojie Bai, J Zico Kolter, and Vladlen Koltun. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271, 2018.
  2. Anastasia Borovykh, Sander Bohte, and Cornelis W Oosterlee. Conditional time series forecasting with convolutional neural networks. arXiv preprint arXiv:1703.04691, 2017.
  3. Mei-Yuan Chen and Jau-Er Chen. Application of quantile regression to estimation of value at risk. Review of Financial Risk Management, 1(2):15, 2002.
  4. Robert F Engle and Simone Manganelli. CAViaR: Conditional value at risk by quantile regression. Technical report, National Bureau of Economic Research, 1999.
  5. Paul Glasserman, Philip Heidelberger, and Perwez Shahabuddin. Portfolio value-at-risk with heavy-tailed risk factors. Mathematical Finance, 12(3):239–269, 2002.
  6. John Hull and Alan White. Value at risk when daily changes in market variables are not normally distributed. Journal of Derivatives, 5:9–19, 1998.
  7. Philippe Jorion. Value at Risk: The New Benchmark for Managing Financial Risk. McGraw-Hill, 2000.
  8. Roger Koenker and Kevin F Hallock. Quantile regression. Journal of Economic Perspectives, 15(4):143–156, 2001.
  9. Roger Koenker and Zhijie Xiao. Quantile autoregression. Journal of the American Statistical Association, 101(475):980–990, 2006.
  10. Yann LeCun, Yoshua Bengio, et al. Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks, 3361(10):1995, 1995.
  11. Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499, 2016.
  12. Hyun Song Shin. Risk and liquidity. Oxford University Press, 2010.
  13. James W Taylor. A quantile regression neural network approach to estimating the conditional density of multiperiod returns. Journal of Forecasting, 19(4):299–311, 2000.
  14. James W Taylor. Using exponentially weighted quantile regression to estimate value at risk and expected shortfall. Journal of Financial Econometrics, 6(3):382–406, 2007.
  15. Alexander Waibel, Toshiyuki Hanazawa, Geoffrey Hinton, Kiyohiro Shikano, and Kevin J Lang. Phoneme recognition using time-delay neural networks. Backpropagation: Theory, Architectures and Applications, pages 35–61, 1995.
  16. Halbert White, Tae-Hwan Kim, and Simone Manganelli. VAR for VaR: Measuring tail dependence using multivariate regression quantiles. Journal of Econometrics, 187(1):169–188, 2015.
  17. Qifa Xu, Xi Liu, Cuixia Jiang, and Keming Yu. Quantile autoregression neural network model with applications to evaluating value at risk. Applied Soft Computing, 49:1–12, 2016.
  18. Xing Yan, Weizhong Zhang, Lin Ma, Wei Liu, and Qi Wu. Parsimonious quantile regression of financial asset tail dynamics via sequential learning. In Advances in Neural Information Processing Systems, pages 1575–1585, 2018.
  19. Fisher Yu and Vladlen Koltun. Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122, 2015.
  20. Matthew D Zeiler. Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701, 2012.