Evaluating Data Augmentation for Financial Time Series Classification
Abstract
Data augmentation methods in combination with deep neural networks have been used extensively in computer vision on classification tasks, achieving great success; however, their use in time series classification is still at an early stage. This is even more so in the field of financial prediction, where data tends to be small, noisy and non-stationary. In this paper we evaluate several augmentation methods applied to stock datasets using two state-of-the-art deep learning models. The results show that several augmentation methods significantly improve financial performance when used in combination with a trading strategy. For a relatively small dataset, augmentation methods achieve up to a four-fold improvement in risk-adjusted return performance; for a larger stock dataset, results show a smaller but still substantial improvement.
Elizabeth Fons
AllianceBernstein, London, UK.
Department of Electrical and Computer Engineering, Aarhus University, Denmark
Keywords: Data augmentation, financial signal processing, stock classification, deep learning
1 Introduction
Time series classification is an important and challenging problem that has garnered much attention, as time series data is found across a wide range of fields, such as weather prediction, financial markets, medical records, etc. Recently, given the success of deep learning methods in areas such as computer vision and natural language processing, deep neural networks have been increasingly used for time series classification tasks. However, unlike image or text datasets, (annotated) time series datasets tend to be smaller in comparison, which often leads to poor performance on the classification task [10]. This is especially true of financial data, where a year of stock price data consists only of daily prices [17]. Therefore, in order to leverage the full potential of deep learning methods for time series classification, more labeled data is needed.
A common strategy to address this problem is the use of data augmentation techniques to generate new sequences that cover unexplored regions of input space while maintaining correct labels, thus preventing overfitting and improving model generalization [15]. This practice has been shown to be very effective in other areas, but it is not an established procedure for time series classification [20, 9]. Moreover, most of the methods used are adaptations of image-based augmentation methods that rely on simple transformations, such as scaling, rotation, adding noise, etc. While a few data augmentation methods have been specifically developed for time series [10, 8], their effectiveness in the classification of financial time series has not been systematically studied.
Stock classification is a challenging task due to high volatility and noise from the influence of external factors, such as the global economy and investor behaviour [6]. An additional challenge is that financial datasets tend to be small; ten years of daily stock prices include only a few thousand samples per stock, which would be insufficient to train even a small neural network (e.g. a single-layer LSTM network). In this work we perform a systematic analysis of multiple individual data augmentation methods on stock classification. To compare the different data augmentation methods, we evaluate them using two state-of-the-art neural network models that have been used for financial tasks. As the usual purpose of stock classification tasks is to build portfolios, we compare the results of each method and each architecture by building simple rule-based portfolios and calculating the main financial metrics to assess the performance of each portfolio. Finally, we analyse the combination of multiple data augmentation methods, focusing on the best performing ones.
The contributions of the paper are as follows:
- We provide the first, to the best of our knowledge, thorough evaluation of popular data augmentation methods for time series on the stock classification problem; we perform an in-depth analysis of a number of methods on two state-of-the-art neural network architectures using daily stock return datasets.
- We evaluate performance using traditional classification metrics. In addition, we build portfolios using a simple rule-based strategy and evaluate performance based on financial metrics.
The remainder of the paper is organized as follows: Section 2 overviews previous work on data augmentation; Section 3 describes the data augmentation methods used in our evaluations; Section 4 describes the experimental setup; Section 5 provides the experimental results; conclusions and future work are presented in Section 6.
2 Related work
Data augmentation has proven to be an effective approach to reduce overfitting and improve generalization in neural networks [5]. While there are several methods to reduce overfitting in neural networks, such as regularization, dropout and transfer learning, data augmentation tackles the issue from the root, i.e., by enriching the information related to the class distributions in the training dataset. Since it assumes that more information can be extracted from the dataset through augmentations, it has the further advantage of being a model-independent solution [15].
In tasks such as image recognition, data augmentation is a common practice, and may be seen as a way of preprocessing the training set only [7]. For instance, Krizhevsky et al. [13] used random cropping, flipping and changing image intensity in AlexNet, and Simonyan et al. [16] used scale jittering and flipping on the VGG network. However, such augmentation strategies are not easily extended to time series data in general, due to the non-i.i.d. property of the measurements forming each time series. Data augmentation has been applied with great success to domain-specific time series data encoding information of natural phenomena. Cui et al. [5] use stochastic feature mapping as a label-preserving transformation for automatic speech recognition. Um et al. [19] test a series of transformation-based methods (many inspired directly by computer vision) on sensor data for Parkinson's disease and show that rotations, permutations and time warping of the data, as well as combinations of those methods, improve test accuracy.
To date, little work has been done on studying the effect of data augmentation methods for financial data or developing methods specialized for financial time series. For regression tasks, Teng et al. [17] use a time-sensitive data augmentation method for stock trend prediction, where data is augmented by corrupting high-frequency patterns of original stock price data while preserving low-frequency ones in the frame of the wavelet transform. For stock market index forecasting, Yujin et al. [3] propose ModAugNet, a framework consisting of two modules: an overfitting prevention LSTM module and a prediction LSTM module.
3 Time Series Augmentation
Most time series data augmentation methods correspond to random transformations in the magnitude and time domains, such as jittering (adding noise), slicing, permutation (rearranging slices) and magnitude warping (smooth element-wise magnitude change). In our analysis, the following methods were used for evaluation, and examples of these transformations are shown in Figure 1:
Magnify: a variation of window slicing proposed by Le Guennec et al. [8]. In window slicing, a window covering a fraction of the original time series is selected at random. Instead, we randomly slice windows of varying length from the original time series, but always from its fixed end (i.e. we cut off the beginning of the time series by a random factor). Randomly selecting the starting point of the slice would make sense in an anomaly detection framework, but not in a trend prediction setting such as ours. We interpolate the resulting time series back to the original length in order to make it comparable to the other augmentation methods.
Reverse: the time series is reversed; hence a time series of the form (x_1, ..., x_n) is transformed to (x_n, ..., x_1). This method is inspired by the flipping data augmentation process followed in computer vision.
Jittering: Gaussian noise with zero mean and a small standard deviation is added to the time series [19].
Pool: reduces the temporal resolution without changing the length of the time series by averaging within a small pooling window. This method is inspired by the resizing data augmentation process followed in computer vision.
Quantise: the time series is quantised to a level set: the difference between the maximum and minimum values of the time series is divided into a number of levels, and the values in the time series are rounded to the nearest level [18].
Convolve: the time series is convolved with a Hann kernel window.
Time Warping: the time intervals between samples are distorted based on a random smooth warping curve by cubic spline with four knots at random magnitudes [19].
Suboptimal warped time series generator (SPAWNER): SPAWNER [11] creates a time series by averaging two random suboptimally aligned patterns that belong to the same class. Following Iwana et al. [10], noise is added to the average in order to avoid cases where there is little change.
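As an illustration, some of the simpler transformations above can be sketched in a few lines of NumPy. This is a minimal sketch under assumed parameter values: the noise level in jitter and the minimum window fraction in magnify are illustrative placeholders, not the values used in our experiments.

```python
import numpy as np

def jitter(x, sigma=0.01):
    # Add zero-mean Gaussian noise; sigma is an illustrative value.
    return x + np.random.normal(0.0, sigma, size=x.shape)

def reverse(x):
    # Flip the series in time, mirroring image flipping.
    return x[::-1].copy()

def magnify(x, min_keep=0.7, rng=None):
    # Slice off a random prefix, always keeping the fixed end of the
    # series, then interpolate back to the original length.
    rng = rng or np.random.default_rng()
    n = len(x)
    keep = int(rng.integers(int(min_keep * n), n + 1))  # window length kept
    window = x[n - keep:]                               # anchored at the end
    old_grid = np.linspace(0.0, 1.0, num=keep)
    new_grid = np.linspace(0.0, 1.0, num=n)
    return np.interp(new_grid, old_grid, window)
```

Each function maps a 1-D array of returns to an augmented array of the same length, so augmented samples can be fed to the networks unchanged.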
For the methods Pool, Quantise, Convolve and Time warping we used the code from Arundo [1]
4 Methodology
4.1 Datasets
Full S&P 500 dataset: The data used in this study consists of the daily returns of all constituent stocks of the S&P 500 index, approximately 500 stocks per day. We use the data preprocessing scheme from Krauss et al. [12], where the data is divided into overlapping splits and a model is trained on each one, resulting in 25 splits in total. Inside each of the 25 splits, the data is segmented into sequences consisting of 240 time steps for each stock, with a sliding window of one day, as shown in Figure 2. The first 750 days make up the training set, with the test set consisting of the remaining days. The training set has approximately 255K samples ((750 − 240) × 500) and the test set has approximately 125K samples.
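The windowing step can be sketched as below. The code assumes the returns of one split are held in a (days × stocks) matrix, which is a simplification of the actual preprocessing pipeline.

```python
import numpy as np

def make_sequences(returns, seq_len=240):
    # Segment a (days x stocks) return matrix into overlapping sequences
    # of seq_len time steps with a one-day stride; the label for day t
    # is predicted from the seq_len days preceding it.
    days, n_stocks = returns.shape
    X, label_days = [], []
    for t in range(seq_len, days):
        X.append(returns[t - seq_len:t].T)  # one row per stock
        label_days.append(t)
    return np.concatenate(X, axis=0), np.array(label_days)
```

With 750 training days and 500 stocks this yields (750 − 240) × 500 ≈ 255K training samples, matching the count above.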
The data is standardised by subtracting the mean of the training set (mu_train) and dividing by its standard deviation (sigma_train), i.e., r_ts = (r_ts − mu_train) / sigma_train, with r_ts the return of stock s at time t. We define the problem as a binary classification task, where the target variable for stock s and date t can take two values: 1 if the returns are above the daily median (trend up) and 0 if they are below the daily median. This leads to a balanced dataset.
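A minimal sketch of the standardisation and labelling steps, assuming a (days × stocks) layout; the statistics are computed on the training set only.

```python
import numpy as np

def standardise(train_ret, test_ret):
    # Z-score both sets using training-set statistics only,
    # to avoid look-ahead bias.
    mu, sigma = train_ret.mean(), train_ret.std()
    return (train_ret - mu) / sigma, (test_ret - mu) / sigma

def median_labels(returns_by_day):
    # Binary target: 1 if a stock's return is above the daily
    # cross-sectional median, 0 otherwise; approximately balanced
    # by construction.
    med = np.median(returns_by_day, axis=1, keepdims=True)
    return (returns_by_day > med).astype(int)
```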
50 stocks dataset: In order to have a smaller dataset, we use the same preprocessing scheme but only for the 50 largest stocks in the S&P 500, measured by market capitalization on each data split. This leads to approximately 25.5K samples for training and 12.5K for testing.
4.2 Augmentation
The training data (750 days) is divided into training and validation sets. Before splitting the data, all samples are shuffled in order to make sure that all stocks and time steps are randomly assigned to train or validation. Each training set is augmented with transformed samples equal to 1× its original size.
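The split-and-augment procedure can be sketched as follows; the validation fraction here is an illustrative placeholder, and augment_fn stands for any of the transformations of Section 3.

```python
import numpy as np

def train_val_split_augment(X, y, augment_fn, val_frac=0.2, seed=0):
    # Shuffle all samples so stocks and time steps are randomly
    # assigned, split off a validation set, then append one augmented
    # copy of every training sample (1x augmentation).
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))
    n_val = int(val_frac * len(X))
    val_idx, tr_idx = perm[:n_val], perm[n_val:]
    X_tr, y_tr = X[tr_idx], y[tr_idx]
    X_aug = np.stack([augment_fn(x) for x in X_tr])
    return (np.concatenate([X_tr, X_aug]),
            np.concatenate([y_tr, y_tr]),
            X[val_idx], y[val_idx])
```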
4.3 Network architectures and training
We used two neural network architectures proposed in previous financial studies, optimizing the cross entropy loss:
LSTM: Following Krauss et al. [12], we train a single-layer LSTM network with a fully connected two-neuron output layer. We use early stopping with patience on the validation loss and RMSProp as optimizer.
Temporal Logistic Neural Bag-of-Features (TLoNBoF): we adapt the network architecture proposed by Passalis et al. [14] for forecasting limit order book data. The original network was used on data samples of 15 time steps and 144 features, so we adapt it for our univariate data of 240 time steps. It comprises a 1D convolution, a TLoNBoF layer, a fully-connected layer and a fully-connected output layer of two neurons. The learning rate is decreased on plateau of the validation loss, and the optimizer is Adam.
4.4 Rulebased portfolio strategy and evaluation
In order to evaluate whether data augmentation provides an improvement in asset allocation, we propose a simple trading strategy, following the conclusions of Krauss et al. [12]. The trading rule on the full S&P 500 dataset is as follows: stocks in both classes are ranked daily by their predicted probability of belonging to that class; we then take the top and bottom stocks and build a long-short portfolio by equally weighting them. Portfolios are analysed after transaction costs of 5 bps per trade.
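The trading rule can be sketched as below. The number of stocks per leg (k) and the assumption of full daily turnover in the cost term are illustrative, not the exact parameters of our backtest.

```python
import pandas as pd

def long_short_returns(prob_up, next_ret, k, cost_bps=5.0):
    # prob_up: (days x stocks) predicted probabilities of class "up";
    # next_ret: realised next-day returns aligned to the same index.
    daily = []
    for day in prob_up.index:
        ranked = prob_up.loc[day].sort_values()
        longs, shorts = ranked.index[-k:], ranked.index[:k]
        gross = (next_ret.loc[day, longs].mean()
                 - next_ret.loc[day, shorts].mean())
        # Assume both legs turn over fully each day: 2 trades per unit notional.
        daily.append(gross - 2.0 * cost_bps * 1e-4)
    return pd.Series(daily, index=prob_up.index)
```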
On the 50-stocks dataset, a long-short portfolio would not be profitable, as the universe consists of the largest US stocks by market capitalization, so we only build a long portfolio from the top stocks [4]. In order to compare our methods with the performance of their stock universe, we build a benchmark consisting of all 50 stocks weighted by market cap. All portfolios are built including transaction costs.
We evaluate portfolio performance by calculating the information ratio (IR), the ratio between excess return (portfolio return minus benchmark return) and tracking error (the standard deviation of excess returns) [2]. We also calculate the downside information ratio (DIR), the ratio between excess return and downside risk (the variability of underperformance below the benchmark), which differentiates harmful volatility from total overall volatility.
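Both metrics can be computed from daily portfolio and benchmark returns as sketched below; the annualisation factor of 252 trading days is a conventional assumption, not stated in the text.

```python
import numpy as np

def information_ratio(port, bench, periods=252):
    # Annualised mean excess return divided by tracking error.
    excess = port - bench
    return np.sqrt(periods) * excess.mean() / excess.std(ddof=1)

def downside_information_ratio(port, bench, periods=252):
    # Same numerator, but risk counts only periods of underperformance
    # relative to the benchmark (downside deviation).
    excess = port - bench
    downside = np.sqrt(np.mean(np.minimum(excess, 0.0) ** 2))
    return np.sqrt(periods) * excess.mean() / downside
```

When the portfolio rarely underperforms the benchmark, the downside deviation is smaller than the tracking error, so the DIR exceeds the IR.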
5 Results
Tables 5 and 5 present the results obtained for each individual augmentation method, and for the combination of the most successful individual methods, on the small 50-stocks dataset using the LSTM and TLoNBoF networks. For comparison, we also show the results without augmentation.
We also report classification metrics (accuracy and F1) over the 25 data splits, expressed as mean and standard deviation. In both models, the improvement in classification accuracy and F1 with respect to no augmentation is very small. However, both the IR and DIR improve under several augmentation methods. Magnify and Time warp are consistently good performers, as is SPAWNER. For TLoNBoF, the IR increases four-fold with respect to no augmentation, and Time warp doubles the IR of the LSTM model. We anticipated that the Reverse method would not be effective, and indeed it decreases overall performance in both cases. Further, we note that the combination of two augmentation methods does not always improve performance.
Figures 3 and 4 show the out-of-sample cumulative profit over time of the models trained with different augmentation methods and the baseline (no augmentation). We focus on the most competitive techniques and, for comparison, add the benchmark calculated from the market-weighted returns of the 50 constituent stocks. The top plots show the full history, while the bottom plots show the last 10 years. Both models perform well over time, and the cumulative profits of the models trained with augmentation are higher than without it; however, only TLoNBoF remains competitive in the most recent testing period (2007-2017), along with several of the augmentation methods. The LSTM model fluctuates around zero and does not improve on the benchmark. Krauss et al. [12] observe that the edge of the LSTM method seems to have been arbitraged away in the latter years.
Table 5 presents the results obtained for each individual augmentation method, and for the combination of the most successful methods, on the large S&P 500 dataset trained with the LSTM network. As the portfolios are long-short, they are market-neutral (the performance of the portfolio is therefore independent of the performance of the market, and no benchmark has to be subtracted). As with the small dataset, Magnify and Time warp show strong IR and DIR performance, as does their combination. Jitter performs well on this dataset, whereas on the small dataset it decreased performance; this suggests that on a larger dataset the added noise helps with generalization, while on smaller data it diminishes the signal. The changes to the classification metrics are not significant.
6 Conclusions
Data augmentation is a ubiquitous technique to improve generalization in supervised learning. In this work, we have studied the impact of various data augmentation methods for time series on the stock classification problem. We have shown that even with very noisy datasets such as stock returns, it is beneficial to use data augmentation to improve generalization. Magnify, Time warp and SPAWNER consistently improve both the information ratio and the downside information ratio across all models and datasets. On the small dataset, augmentation achieves up to four-times (TLoNBoF) and two-times (LSTM) improvement in IR compared to no augmentation. On the larger dataset, as expected, the improvement is not as sharp, but augmentation still increases the IR.
We also tested the TLoNBoF network, which has not previously been used on low-frequency stock data; this network shows consistent positive returns over the last ten years of data, so, unlike with the LSTM architecture, the profit has not been arbitraged away.
Footnotes
This work was supported by the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie Grant Agreement no. 675044 (http://bigdatafinance.eu/), Training for Big Data in Financial Research and Risk Management. A. Iosifidis acknowledges funding from the Independent Research Fund Denmark project DISPA (Project Number: 904100004).
 https://arundotsaug.readthedocshosted.com/en/stable/
References
 (2020) tsaug. GitHub. https://tsaug.readthedocs.io/en/stable/index.html. Cited by: §3.
 (2012) Practical riskadjusted performance measurement. The Wiley Finance Series, Wiley. External Links: ISBN 9781118391525, LCCN 2012025787 Cited by: §4.4.
 (2018) ModAugNet: a new forecasting framework for stock market index value with an overfitting prevention LSTM module and a prediction LSTM module. Expert Systems with Applications 113, pp. 457–480. External Links: ISSN 0957-4174 Cited by: §2.
 (2015) Dissecting investment strategies in the cross section and time series. Econometric Modeling: Derivatives eJournal. Cited by: §4.4.
 (2015) Data augmentation for deep neural network acoustic modeling. IEEE/ACM Transactions on Audio, Speech, and Language Processing 23 (9), pp. 1469–1477. Cited by: §2, §2.
 (2018) Deep learning with long shortterm memory networks for financial market predictions. European Journal of Operational Research 270 (2), pp. 654–669. Cited by: §1.
 (2016) Deep learning. MIT Press. Cited by: §2.
 (2016) Data augmentation for time series classification using convolutional neural networks. In ECML/PKDD Workshop on Advanced Analytics and Learning on Temporal Data, Cited by: §1, §3.
 (2020) An empirical survey of data augmentation for time series classification with neural networks. arXiv preprint arXiv:2007.15951. Cited by: §1.
 (2020) Time series data augmentation for neural networks by time warping with a discriminative teacher. In 2020 25th International Conference on Pattern Recognition (ICPR), Cited by: §1, §1, §3.
 (2019) Data augmentation with suboptimal warping for time-series classification. Sensors (Basel, Switzerland) 20 (1), pp. 98. Cited by: §3.
 (2017) Deep neural networks, gradient-boosted trees, random forests: statistical arbitrage on the S&P 500. European Journal of Operational Research 259 (2), pp. 689–702. External Links: ISSN 0377-2217 Cited by: §4.1, §4.3, §4.4, §5.
 (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou and K. Q. Weinberger (Eds.), pp. 1097–1105. Cited by: §2.
 (2020) Temporal logistic neural bagoffeatures for financial time series forecasting leveraging limit order book data. Pattern Recognition Letters. Cited by: §4.3.
 (2019) A survey on image data augmentation for deep learning. Journal of Big Data 6 (1), pp. 60. External Links: ISSN 2196-1115 Cited by: §1, §2.
 (2015) Very deep convolutional networks for largescale image recognition. In International Conference on Learning Representations, Cited by: §2.
 (2020) Enhancing stock price trend prediction via a timesensitive data augmentation method. Complexity. Cited by: §1, §2.
 (2000) Temporal pattern recognition in noisy non-stationary time series based on quantization into symbolic streams: lessons learned from financial volatility trading. Report Series SFB "Adaptive Information Systems and Modelling in Economics and Management Science", Technical Report 46, WU Vienna University of Economics and Business, Vienna. Cited by: §3.
 (2017) Data augmentation of wearable sensor data for Parkinson's disease monitoring using convolutional neural networks. In Proceedings of the 19th ACM International Conference on Multimodal Interaction, ICMI '17, pp. 216–220. External Links: ISBN 9781450355438 Cited by: §2, §3.
 (2020) Time series data augmentation for deep learning: a survey. ArXiv. Cited by: §1.