QCNN: Quantile Convolutional Neural Network
Abstract
A dilated causal one-dimensional convolutional neural network architecture is proposed for quantile regression. The model can forecast arbitrary quantiles, and it can be trained jointly on multiple similar time series. An application to Value at Risk forecasting shows that QCNN outperforms linear quantile regression and constant quantile estimates.
1 Introduction
Convolutional neural networks have shown strong results in time series forecasting. However, the applications so far, like time series forecasting in general, have focused mainly on predicting the mean. This article presents a convolutional neural network for forecasting quantiles.
2 QCNN
This section describes the proposed QCNN model.
2.1 CNNs for Time Series Forecasting
Convolutional neural networks slide local receptive fields over the input to find local features. This enables them to model certain data types particularly well, for example, images (with strong 2D structure) or time series (with strong 1D structure). Variables that are spatially or temporally nearby are often correlated, so we should take advantage of the topology of the inputs [LeCun et al., 1995]. Convolutional neural networks do this very neatly.
CNNs are most often associated with images, yet they can be applied to one-dimensional data as well. Even the earliest convolutional architectures were applied to the time domain [Waibel et al., 1995]. Nevertheless, several notable convolutional architectures for time series forecasting were proposed only recently.
Recurrent neural networks (LSTMs and GRUs) are often considered the best or even the optimal neural network architectures for sequential data (including time series). However, Bai et al. [2018] compared generic recurrent and convolutional architectures on various sequence modeling tasks, and found that the latter might be a better choice.
Learning long temporal dependencies is a challenging task for CNNs, but dilations [Yu and Koltun, 2015] can help. A dilated convolution is a convolution with holes: the convolutional filter is enlarged by skipping a few points. The dilation rate of layer $l$ can be set to $2^{l-1}$ so that the effective receptive field grows exponentially: by increasing the number of layers, we can exponentially increase the time horizon that the network can see. The convolution should also be causal, meaning that outputs can never depend on future inputs.
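The mechanics of a dilated causal convolution can be sketched in a few lines of NumPy. This is an illustrative sketch, not the implementation used in the experiments: the function name `causal_dilated_conv1d`, the zero padding on the left, and the toy weights are all assumptions made for the example.

```python
import numpy as np

def causal_dilated_conv1d(x, weights, dilation):
    """Compute y[t] = sum_j weights[j] * x[t - j * dilation].

    Values before the start of the series are treated as zeros
    (an assumption of this sketch), so y has the same length as x.
    """
    x = np.asarray(x, dtype=float)
    k = len(weights)
    pad = (k - 1) * dilation
    x_padded = np.concatenate([np.zeros(pad), x])
    y = np.zeros(len(x))
    for j, w in enumerate(weights):
        # Tap j looks j * dilation steps into the past, never the future.
        start = pad - j * dilation
        y += w * x_padded[start:start + len(x)]
    return y

# Stack three layers with dilation rates 1, 2, 4 and feed in a unit impulse.
signal = np.zeros(12)
signal[0] = 1.0
response = signal
for rate in (1, 2, 4):
    response = causal_dilated_conv1d(response, [1.0, 1.0], rate)
```

With kernel size 2, each added layer doubles the number of past steps that influence the output, so the impulse response of the three stacked layers spans 8 consecutive steps.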
The WaveNet model uses dilated causal convolutions for generating audio waveforms [Oord et al., 2016]. Borovykh et al. [2017] used an adaptation of WaveNet for time series forecasting, and found it an easy-to-implement and time-efficient alternative to recurrent networks.
QCNN is a one-dimensional dilated causal convolutional network with an appropriately chosen quantile loss function.
2.2 Quantile Regression
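While simple least squares regressions estimate the conditional mean of a given variable, quantile regressions estimate conditional quantiles [Koenker and Hallock, 2001]. This requires a new objective: the quantile loss $\rho_\tau(u) = u\,(\tau - \mathbb{1}\{u < 0\})$, where $\tau \in (0, 1)$ is the quantile level. Mean squared error targets the mean, whereas this loss targets arbitrary quantiles. Thus, we only need to change the loss function.
Denote our convolutional neural network by $f(x, \theta)$, taking as arguments the input values $x$ and trainable parameters $\theta$, and let $y_t$ be the observed target at time $t$. Now we can use the quantile loss function to turn our one-dimensional causal dilated CNN model into QCNN (1).
(1) $\hat{\theta} = \arg\min_{\theta} \sum_{t} \rho_\tau\big(y_t - f(x_t, \theta)\big)$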
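As a concrete illustration, the standard quantile (pinball) loss of Koenker and Hallock [2001] can be written as below. The name `quantile_loss`, the symbol `tau` for the quantile level, and the toy sample are assumptions of this sketch. The small search at the end checks the defining property: the constant forecast that minimizes the loss is an empirical quantile of the sample.

```python
import numpy as np

def quantile_loss(y_true, y_pred, tau):
    """Mean pinball loss at quantile level tau in (0, 1)."""
    u = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # Under-predictions (u > 0) cost tau * u, over-predictions cost (1 - tau) * |u|.
    return float(np.mean(np.maximum(tau * u, (tau - 1.0) * u)))

# Toy sample: the integers 1..100, with the 0.9 quantile as the target.
sample = np.arange(1, 101, dtype=float)
losses = [quantile_loss(sample, c, 0.9) for c in sample]
best_constant = sample[int(np.argmin(losses))]
```

Any differentiable-almost-everywhere loss of this form can be plugged into a gradient-based optimizer in place of mean squared error.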
2.3 Training
The loss function can be minimized using any proven optimizer algorithm (e.g., stochastic gradient descent or its variants).
Neural networks provide greater flexibility than simple linear models. However, a single time series might not be enough to exploit this flexibility. Thus, we may expect QCNN to work better when trained jointly on a set of similar time series.
3 Empirical Study
This section presents an application of QCNN.
3.1 Value at Risk Forecasting
Value at Risk (VaR) is an important risk measure in finance. It aims to find a realistically worst-case outcome in the sense that anything worse happens with a given (small) probability [Shin, 2010]. VaR is the worst loss over a horizon that will not be exceeded with a given level of confidence [Jorion, 2000]. It is reported as a positive number, by convention.
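Mathematically, the VaR of a loss variable $L$ with a confidence level $\alpha \in (0, 1)$ can be defined as the quantile function of $L$ at $\alpha$ (2).
(2) $\mathrm{VaR}_\alpha(L) = F_L^{-1}(\alpha) = \inf\{\ell \in \mathbb{R} : F_L(\ell) \ge \alpha\}$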
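A minimal sketch of this definition on sample data follows. It assumes the input holds returns (so losses are their negatives) and that VaR is reported as a positive number; the function name `historical_var` and the toy returns are hypothetical.

```python
import numpy as np

def historical_var(returns, confidence):
    """Empirical VaR: the `confidence` quantile of the losses (negated returns)."""
    losses = -np.asarray(returns, dtype=float)
    return float(np.quantile(losses, confidence))

# Toy example with five returns and a 75% confidence level.
toy_returns = [-0.05, -0.02, 0.0, 0.01, 0.03]
toy_var = historical_var(toy_returns, 0.75)
```

Here a loss larger than `toy_var` occurs in only one of the five observations, which is what a 75% VaR on such a small sample is meant to express.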
VaR can be estimated in many ways. A common procedure estimates the return variance and assumes a normal (or t) distribution to compute the quantile estimates. However, these assumptions are often invalid. Efforts have been made to relax them; see, for example, Hull and White [1998] or Glasserman et al. [2002]. One of the great advantages of the quantile regression approach is that it does not require any such distributional assumptions.
Quantile regression methods are often applied to Value at Risk forecasting. Engle and Manganelli [1999] proposed the CAViaR (Conditional Value at Risk By Quantile Regression) model. Taylor [2000] applied a quantile regression neural network approach to VaR estimation. Chen and Chen [2002] found that the quantile regression approach outperforms the variance-covariance approach. Taylor [2007] developed an exponentially weighted quantile regression. White et al. [2015] proposed a vector autoregressive extension to quantile models. Xu et al. [2016] applied a quantile autoregression neural network (QARNN) as a flexible generalization of quantile autoregression [Koenker and Xiao, 2006]. Yan et al. [2018] used a long short-term memory neural network in a quantile regression framework to learn the tail behavior of financial asset returns. These are just a few notable studies.
VaR deals with rare events, and rare events produce small data. The higher the confidence level we choose, the fewer loss events we have to learn from. Data volume is thus a crucial issue in Value at Risk forecasting. It is, therefore, beneficial to use several stocks' data to learn to forecast VaR. A larger set of stocks has experienced more extreme events, and we may expect these events to have similar sources. If they do, then a joint forecasting model might outperform individual models.
3.2 Data
Our stock market dataset was obtained from Kaggle (see Footnotes).
3.3 Models and Experiments
VaRs are forecasted for 3 confidence levels (95%, 99%, and 99.9%), using 4 different methods:
- a constant quantile estimate,
- a linear quantile regression,
- a QCNN,
- a joint QCNN trained on all available training data.
The QCNN network contains 6 causal convolutional layers (each with 8 filters of kernel size 2, a nonlinear activation, and exponentially increasing dilation rates), followed by a convolutional layer with a single filter of kernel size 1 and a linear activation.
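The size of this architecture can be checked with a little arithmetic. The sketch below assumes dilation rates of 1, 2, 4, 8, 16, 32 (one reading of "exponentially increasing"), a single input channel (the return series), and one bias term per filter; none of these details are stated explicitly above, and the helper names are hypothetical.

```python
def receptive_field(kernel_size, dilations):
    """Number of time steps (including the current one) seen by the top layer."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

def conv1d_params(in_channels, filters, kernel_size):
    """Kernel weights plus one bias per filter."""
    return kernel_size * in_channels * filters + filters

# Assumed dilation schedule: 2**0, 2**1, ..., 2**5.
dilations = [2 ** layer for layer in range(6)]
field = receptive_field(2, dilations)

total_params = conv1d_params(1, 8, 2)        # first layer reads the raw series
total_params += 5 * conv1d_params(8, 8, 2)   # five further dilated layers
total_params += conv1d_params(8, 1, 1)       # final 1x1 layer with linear output
```

Under these assumptions the network sees 64 past steps and has 713 trainable parameters, which is small enough that a single stock's history may still be too little data to fit it reliably.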
The inputs are overlapping 128-step sequences extracted from the time series of stock returns. The output sequences are the inputs shifted by 1 step, so that at any time we predict one day ahead.
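The construction of overlapping input windows and their one-step-shifted targets can be sketched as follows. The function name `make_windows` and the stride of one step between consecutive windows are assumptions of this example.

```python
import numpy as np

def make_windows(series, window=128):
    """Return (inputs, targets), where targets are the inputs shifted by one step."""
    series = np.asarray(series, dtype=float)
    n_windows = len(series) - window
    # Row i selects observations i, i + 1, ..., i + window (window + 1 values).
    index = np.arange(window + 1)[None, :] + np.arange(n_windows)[:, None]
    blocks = series[index]
    inputs = blocks[:, :-1]   # steps i .. i + window - 1
    targets = blocks[:, 1:]   # steps i + 1 .. i + window
    return inputs, targets

# Small demonstration with a window of 4 instead of 128.
demo_inputs, demo_targets = make_windows(np.arange(10), window=4)
```

Because the network is causal, every position in a target sequence is a genuine one-day-ahead label for the corresponding position in the input sequence.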
The dataset was scaled by subtracting the mean and dividing by the standard deviation, and it was fed to the algorithms in batches of 128. The QCNN models were trained for 128 epochs, using the Adadelta [Zeiler, 2012] optimizer.
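The preprocessing described here can be sketched as below. Whether the scaling statistics were computed per stock or over the whole dataset is not stated, so the single global mean and standard deviation used here are an assumption, as are the helper names.

```python
import numpy as np

def standardize(values):
    """Subtract the mean and divide by the standard deviation."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()

def iter_batches(inputs, targets, batch_size=128):
    """Yield consecutive (inputs, targets) batches; the last one may be smaller."""
    for start in range(0, len(inputs), batch_size):
        stop = start + batch_size
        yield inputs[start:stop], targets[start:stop]

scaled = standardize([1.0, 2.0, 3.0, 4.0])
batch_sizes = [len(x) for x, _ in iter_batches(np.zeros((300, 128)), np.zeros((300, 128)))]
```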
The linear model is an autoregression using 4 lags of the target variable.
The constant quantile estimate was computed using linear interpolation (3).
(3) 
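One common linear-interpolation rule for an empirical quantile is sketched below. It follows the same convention as NumPy's default `np.quantile` method; whether this coincides exactly with the rule in (3) is an assumption, and the function name is hypothetical.

```python
import math

def interpolated_quantile(sample, tau):
    """Empirical tau-quantile with linear interpolation between order statistics."""
    xs = sorted(sample)
    h = (len(xs) - 1) * tau          # fractional position in the sorted sample
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)    # guard against running past the last element
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

median_example = interpolated_quantile([4, 1, 3, 2], 0.5)
tail_example = interpolated_quantile([10, 20, 30, 40, 50], 0.1)
```

The constant benchmark then forecasts the same value, the quantile of the training sample, for every day of the test period.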
3.4 Evaluation
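We apply the Dynamic Quantile (DQ) test of Engle and Manganelli [1999]. A variable $Hit_t$ (4) is constructed, which takes the value $1 - \tau$ at exceedances and $-\tau$ otherwise, where $\tau$ is the target exceedance probability (one minus the confidence level) and $y_t$ is the return at time $t$. Under a correct forecast it has an expected value of zero. This variable is regressed on $X_t$ (5), which may contain past lags of $Hit_t$, the Value at Risk (and its lags), and possibly further variables. This regression should have no explanatory power, so we test the hypothesis $H_0: \beta = 0$. Applying the central limit theorem, we can construct an asymptotically chi-square distributed test statistic (6). Here $X$ is the $T \times q$ matrix with rows $X_t$, $q$ denotes the number of input variables, and $T$ denotes the number of time steps.
(4) $Hit_t = \mathbb{1}\{y_t < -\mathrm{VaR}_t\} - \tau$
(5) $Hit_t = X_t \beta + u_t$
(6) $DQ = \dfrac{\hat{\beta}^\top X^\top X \hat{\beta}}{\tau (1 - \tau)} \overset{a}{\sim} \chi^2_q$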
This test is simple and easy to implement. Here we use the VaR forecast and 3 lags of the variable (4) as input variables.
The average estimated VaR values and exceedance rates are also reported for a more detailed assessment of model performance.
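A minimal NumPy sketch of the Dynamic Quantile statistic is given below. Several details are assumptions of the example rather than statements about the experiments: the regressors include a constant term, the current VaR forecast, and 3 lags of the hit variable; returns are signed and VaR forecasts are positive; `tau` denotes the target exceedance probability; and the function name is hypothetical.

```python
import numpy as np

def dq_statistic(returns, var_forecasts, tau, n_lags=3):
    """Return (DQ statistic, degrees of freedom) for a series of VaR forecasts."""
    returns = np.asarray(returns, dtype=float)
    var_forecasts = np.asarray(var_forecasts, dtype=float)
    # Hit variable: 1 - tau at exceedances (loss beyond VaR), -tau otherwise.
    hit = (returns < -var_forecasts).astype(float) - tau
    n = len(hit)
    rows = n - n_lags
    # Assumed regressors: constant, current VaR forecast, n_lags past hits.
    columns = [np.ones(rows), var_forecasts[n_lags:]]
    for lag in range(1, n_lags + 1):
        columns.append(hit[n_lags - lag:n - lag])
    X = np.column_stack(columns)
    y = hit[n_lags:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    # beta' X'X beta equals the squared norm of the fitted values.
    stat = float(fitted @ fitted / (tau * (1.0 - tau)))
    return stat, X.shape[1]

rng = np.random.default_rng(0)
sim_returns = rng.standard_normal(500)
sim_var = np.full(500, 1.645) + 0.1 * rng.random(500)
sim_stat, sim_dof = dq_statistic(sim_returns, sim_var, tau=0.05)
```

The statistic is then compared with a chi-square distribution whose degrees of freedom equal the number of regressors, for example through `scipy.stats.chi2.sf(sim_stat, sim_dof)` if SciPy is available.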
3.5 Results
The means, medians, and standard deviations of exceedances, the average rejection rates of the DQ test at two different significance levels, and the average VaR values are reported in Tables 1, 2, and 3. Zero-exceedance forecasts are assigned a p-value of 0 for the DQ test. The DQ test results are not reported for the 99.9% level.
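Table 1: 95% confidence level.
| Method | Exceedances (mean) | Exceedances (median) | Exceedances (SD) | DQ test rejections (0.01) | DQ test rejections (0.05) | VaR (mean) |
| --- | --- | --- | --- | --- | --- | --- |
| Constant | 0.0399 | 0.0358 | 0.0291 | 0.5300 | 0.6600 | 0.0727 |
| QR | 0.0409 | 0.0344 | 0.0293 | 0.6200 | 0.7200 | 0.0689 |
| QCNN | 0.1269 | 0.1245 | 0.0362 | 0.7400 | 0.8000 | 0.0423 |
| Joint QCNN | 0.0433 | 0.0444 | 0.0101 | 0.0500 | 0.1600 | 0.0583 |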
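Table 2: 99% confidence level.
| Method | Exceedances (mean) | Exceedances (median) | Exceedances (SD) | DQ test rejections (0.01) | DQ test rejections (0.05) | VaR (mean) |
| --- | --- | --- | --- | --- | --- | --- |
| Constant | 0.0084 | 0.0061 | 0.0117 | 0.3800 | 0.4100 | 0.1935 |
| QR | 0.0097 | 0.0066 | 0.0125 | 0.5400 | 0.5900 | 0.1852 |
| QCNN | 0.0599 | 0.0576 | 0.0280 | 0.9500 | 0.9600 | 0.0676 |
| Joint QCNN | 0.0115 | 0.0119 | 0.0056 | 0.0900 | 0.1600 | 0.1023 |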
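Table 3: 99.9% confidence level.
| Method | Exceedances (mean) | Exceedances (median) | Exceedances (SD) | VaR (mean) |
| --- | --- | --- | --- | --- |
| Constant | 0.0023 | 0.0007 | 0.0052 | 0.3764 |
| QR | 0.0043 | 0.0013 | 0.0075 | 0.3630 |
| QCNN | 0.0249 | 0.0212 | 0.0186 | 0.1079 |
| Joint QCNN | 0.0023 | 0.0013 | 0.0026 | 0.2062 |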
The single-stock QCNN overshot the targeted exceedance rates, in some cases by orders of magnitude. All other methods produced more conservative and seemingly more reliable forecasts. Even the constant estimate produced exceedance rates close to the desired levels. However, the joint QCNN achieved similar exceedance accuracy with lower average VaR estimates. This means that while a simple historical quantile estimate might be enough to set a budget against a realistically worst-case outcome, the joint QCNN gives a cheaper solution. Also, the joint QCNN produced exceedance rates with consistently lower standard deviation than the benchmark methods. The Dynamic Quantile test is rejected for far fewer stocks in the case of the joint QCNN than for any other method, which also supports the conclusion that it produces the highest-quality estimates.
The experiments were repeated for the previous 10 years' data (1999-2008) with a different set of randomly chosen stocks, and the results were quite similar.
4 Summary
This article proposed a one-dimensional convolutional neural network architecture for quantile regression. The model takes as input a series of observations, and directly makes one-step-ahead forecasts for arbitrary quantiles. It is essentially a convolutional extension of previously proposed quantile regression models. The QCNN model was applied to Value at Risk forecasting, and produced better estimates than the benchmark models. The model seems most applicable when there are multiple similar time series to train on.
Footnotes
 https://www.kaggle.com/qks1lver/amexnysenasdaqstockhistories
References
 Shaojie Bai, J Zico Kolter, and Vladlen Koltun. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271, 2018.
 Anastasia Borovykh, Sander Bohte, and Cornelis W Oosterlee. Conditional time series forecasting with convolutional neural networks. arXiv preprint arXiv:1703.04691, 2017.
 Mei-Yuan Chen and Jau-Er Chen. Application of quantile regression to estimation of value at risk. Review of Financial Risk Management, 1(2):15, 2002.
 Robert F Engle and Simone Manganelli. CAViaR: conditional value at risk by quantile regression. Technical report, National Bureau of Economic Research, 1999.
 Paul Glasserman, Philip Heidelberger, and Perwez Shahabuddin. Portfolio value-at-risk with heavy-tailed risk factors. Mathematical Finance, 12(3):239–269, 2002.
 John Hull and Alan White. Value at risk when daily changes in market variables are not normally distributed. Journal of Derivatives, 5:9–19, 1998.
 Philippe Jorion. Value at risk. 2000.
 Roger Koenker and Kevin F Hallock. Quantile regression. Journal of Economic Perspectives, 15(4):143–156, 2001.
 Roger Koenker and Zhijie Xiao. Quantile autoregression. Journal of the American Statistical Association, 101(475):980–990, 2006.
 Yann LeCun, Yoshua Bengio, et al. Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks, 3361(10):1995, 1995.
 Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499, 2016.
 Hyun Song Shin. Risk and liquidity. Oxford University Press, 2010.
 James W Taylor. A quantile regression neural network approach to estimating the conditional density of multiperiod returns. Journal of Forecasting, 19(4):299–311, 2000.
 James W Taylor. Using exponentially weighted quantile regression to estimate value at risk and expected shortfall. Journal of Financial Econometrics, 6(3):382–406, 2007.
 Alexander Waibel, Toshiyuki Hanazawa, Geoffrey Hinton, Kiyohiro Shikano, and Kevin J Lang. Phoneme recognition using time-delay neural networks. Backpropagation: Theory, Architectures and Applications, pages 35–61, 1995.
 Halbert White, Tae-Hwan Kim, and Simone Manganelli. VAR for VaR: Measuring tail dependence using multivariate regression quantiles. Journal of Econometrics, 187(1):169–188, 2015.
 Qifa Xu, Xi Liu, Cuixia Jiang, and Keming Yu. Quantile autoregression neural network model with applications to evaluating value at risk. Applied Soft Computing, 49:1–12, 2016.
 Xing Yan, Weizhong Zhang, Lin Ma, Wei Liu, and Qi Wu. Parsimonious quantile regression of financial asset tail dynamics via sequential learning. In Advances in Neural Information Processing Systems, pages 1575–1585, 2018.
 Fisher Yu and Vladlen Koltun. Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122, 2015.
 Matthew D Zeiler. Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701, 2012.