Stock Prices Prediction using Deep Learning Models
Financial markets play a vital role in the development of modern society because they allow economic resources to be deployed. Changes in stock prices reflect changes in the market. This study focuses on predicting stock prices with deep learning models. This is a challenging task because the information related to stock prices contains much noise and uncertainty. This work therefore uses sparse autoencoders built from one-dimensional (1-D) residual convolutional networks, a deep learning model, to de-noise the data. Long-short term memory (LSTM) is then used to predict the stock price. Past prices, indices and macroeconomic variables are the features used to predict the next day’s price. Experimental results show that 1-D residual convolutional networks de-noise data and extract deep features better than a model that combines wavelet transforms (WT) and stacked autoencoders (SAEs). In addition, we compare the performance of the model for two different forecast targets: the absolute stock price and the price rate of change. The results show that predicting the stock price through the price rate of change is better than predicting the absolute price directly.
©20XX IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Stock time series forecasting is one of the main challenges for machine learning technology because it requires time series analysis. Two kinds of methods are usually used to predict financial time series: machine learning models and statistical methods.
Statistical methods can be used to predict a financial time series. Common methods are autoregressive conditional heteroscedastic (ARCH) methods, autoregressive moving average (ARMA) methods and autoregressive integrated moving average (ARIMA) methods. However, traditional statistical methods generally assume that the stock time series is generated by a linear process, and they model that generation process for a latent time series in order to forecast future stock prices, whereas a stock time series is generally a dynamic nonlinear process.
Many machine learning models can capture nonlinear characteristics in data without prior knowledge. These models are often used to model financial time series. The most commonly used models for stock forecasts are artificial neural networks (ANNs), support vector machines (SVMs), and hybrid and ensemble methods. Artificial neural networks have found many applications in business because they can deal with stock time series data that is non-linear, non-parametric, discontinuous or chaotic. The support vector machine is a statistical machine learning model that is widely applied for pattern recognition. An SVM learns by minimizing a risk function that combines the empirical error with a regularization term, which is derived to minimize the structural risk. Box et al. presented a revised least squares (LS)-SVM model and, after training, predicted movements in the Nasdaq Index with satisfactory results.
Deep learning models, which are an extension of ANNs, have developed rapidly in recent years. Many studies use deep learning to predict financial time series. For example, Ting et al. used a deep convolutional neural network to forecast the effect of events on stock price movements. Bengio et al. used long-short term memory (LSTM) to predict stock prices.
This study addresses the problem of noise in stock time series. Noise and volatile features are major challenges for stock price forecasting because they hinder the extraction of useful information. A stock time series can be considered as waveform data, so techniques from communication electronics, such as the wavelet transform, are pertinent. Bao et al. used a model that combines the wavelet transform and stacked autoencoders (SAEs) to de-noise a financial time series. This study de-noises data using an autoencoder [16, 5] built from a convolutional residual neural network (ResNet). This is an adaptive method for reducing the noise and dimension of time sequences. It differs from wavelet transforms in that the kernels of the convolutional neural network adapt to the dataset automatically, so it can eliminate noise more effectively while retaining useful information. Experiments are performed on the CSI 300 index, the Nifty 50 index, the Hang Seng index, the Nikkei 225 index, the S&P 500 index and the DJIA index, and the results are compared with those of Bao et al. The proposed model gives more accurate predictions, as measured by the mean absolute percentage error (MAPE), Theil U and the linear correlation between the predicted prices and the real prices. Experiments are conducted both on predicting the stock price directly and on predicting the price rate of change and then calculating the price indirectly. The latter achieves better accuracy. Predicting the future price indirectly can be seen as adding prior knowledge to improve model performance.
The remainder of this paper has five sections. The next section gives background on market analysis. Section III details a small experiment on the properties of a de-noising CNN. Section IV details the structure of the proposed model with sparse autoencoders and LSTM. Section V describes the features and data resources for the experiments, details the experiments and analyzes the results. The last section draws conclusions.
Understanding the behavior of the market in order to improve investors' decisions is the main purpose of market analysis. Several market attributes and features that are related to stock price time series have been studied. Depending on the market factors that are used, market analysis can be divided into two categories: fundamental analysis and technical analysis.
Technical analysis often uses only historical prices as market characteristics to identify the pattern of price movement. It assumes that the relevant factors are incorporated in the movement of the market price and that history will repeat itself. Some investors have used technical approaches to predict stock prices with great success. However, the Efficient Market Hypothesis assumes that all available factors are already incorporated in prices, so only new information affects the movement of market prices, and new information is unpredictable.
Fundamental analysis assumes that the related factors are the internal and external attributes of a company. These attributes include the interest rate, product innovation, the number of employees and the management policy. In order to improve the prediction, other information, such as the exchange rate, public policy, the Web and financial news, is used as features. Nassirtoussi et al. used news headlines as features to predict the market. Twitter sentiment has also been used to improve predictions.
In 1995, one study showed that 85% of respondents depend on both fundamental analysis and technical analysis. Technical analysis is more useful for short-term forecasting, so it is pertinent to high-frequency trading. Lui et al. showed that technical analysis forecasts turning points better than trends, whereas fundamental analysis gives a better prediction of trends.
Depending on the prediction target, tasks can be classified as regression tasks or classification tasks. In a regression task, the prediction target for the model is the future price, whereas a classification model predicts the rise or fall of the stock price. In the regression case, if the predicted price is higher than the current price, the recommended strategy is to buy, and vice versa. This is the buy-and-sell trading strategy, which is widely used in studies. If the task is to identify the rise or fall in the price, then the resultant strategy is obvious. Market analysis is also used for recommendation systems. Huang et al. used SVR to predict the return of each stock and selected the stocks with the highest predicted profit margins (top 10, 20 and 30) to calculate the profit margin.
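The buy-and-sell rule described above can be written as a very small function. This is only an illustrative sketch: the function name and the "hold" outcome for equal prices are assumptions that do not come from the studies cited here.

```python
def buy_and_sell_signal(predicted_price: float, current_price: float) -> str:
    """Return the action suggested by a one-step-ahead price forecast."""
    if predicted_price > current_price:
        return "buy"
    if predicted_price < current_price:
        return "sell"
    # Assumption: the text does not say what to do when the two prices are equal.
    return "hold"


# Two hypothetical (predicted, current) price pairs.
signals = [buy_and_sell_signal(p, c) for p, c in [(101.0, 100.0), (99.5, 100.0)]]
```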
This study uses technical analysis to predict the stock price for the next day. Sparse autoencoders with 1-D convolution networks and prior knowledge are used to give a more accurate prediction than other techniques.
III De-noising CNN
To create a 1-D convolutional neural network for sequence analysis, a convolutional neural network can be combined with an LSTM in a single neural network. The convolution layers extract high-level features from the input, and the LSTM layer then predicts the price directly. During training, the gradient propagates back to the convolution layers through the LSTM layer. However, if there is too much noise in the data, this model tends to over-fit.
A notional problem is used to compare this single neural network with a model that uses the features after de-noising. The task is a bias prediction task, in which each data point corresponds to a function. The target is to predict the value of the bias in this function, which is sampled from a uniform distribution. Each data point has a feature vector whose length is the size of the sequence. Two types of noise are then added to the features. The first type is Gaussian noise. The second type is formed from a term sampled from a uniform distribution, a term sampled from a 0-1 distribution with a given probability, and a weight for this noise. This noise has multiple peaks that interfere with prediction. Figure 1(a) shows the training curves for both models. The red and green lines are, respectively, the training curves for the model that combines CNN with LSTM and for the model that uses the features after de-noising. The solid and dashed lines respectively represent the training loss and the test loss. In Figure 1(b), the dotted curves indicate the loss gap. As the training loss decreases, the loss gap for the proposed model grows more slowly than that for the single neural network. The minimum test loss for the proposed model is also less than that for the single neural network. De-noising the features therefore helps to prevent over-fitting.
The noise in stock forecasting is much more complex than the noise in this notional task, so in this study the noise in the stock forecast data is first reduced using 1-D convolutional autoencoders. The features produced by the 1-D CNN autoencoder are examined next. In Figure 2, the yellow dots denote the rebuilt curve for the sine function. The red curve is the ground truth, which is the sine function curve without noise, and the green dots are the feature points with noise. The ordinate axis represents the feature value, and each point represents one element of the model input. The curve for the yellow dots is smoother than that for the green dots and is close to the real curve.
Figure 3 shows the values of the weights in the convolutional kernels for the models with minimal test loss. The kernel weights of the 1-D convolutional autoencoder are smoother than those of the single neural network. Because the sine function is smoother than the noise, the kernel in the single network is more likely to match the noise than the kernel in the 1-D convolutional autoencoder. The single network thus tries to establish a relationship between the noise and the label. In fact, the noise and the label are unrelated, so the single network is more prone to over-fitting.
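To make the notional task of this section more concrete, the following sketch generates one noisy sample. The exact expressions and constants of the experiment are not reproduced in this text, so every functional form and value below is an assumption: the clean signal is taken to be a sine curve plus a uniformly sampled bias, and the second noise is taken to be the product of a weight, a uniform amplitude and a 0-1 mask.

```python
import numpy as np


def make_notional_sample(rng, length=64, sigma=0.1, spike_prob=0.1, spike_weight=2.0):
    """Generate one (features, target) pair for a bias prediction task.

    All constants and functional forms are illustrative assumptions.
    """
    x = np.linspace(0.0, 2.0 * np.pi, length)
    bias = rng.uniform(-1.0, 1.0)                    # prediction target
    clean = np.sin(x) + bias                         # assumed clean signal
    gaussian = rng.normal(0.0, sigma, size=length)   # first noise type
    amplitude = rng.uniform(-1.0, 1.0, size=length)  # uniform term
    mask = rng.binomial(1, spike_prob, size=length)  # 0-1 term
    spikes = spike_weight * amplitude * mask         # second noise type (peaks)
    return clean + gaussian + spikes, bias


rng = np.random.default_rng(0)
features, target = make_notional_sample(rng)
```

Under these assumptions, most points carry only the small Gaussian noise, while a few carry large peaks. That combination is the kind of noise a model could mistakenly fit.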
In order to extract high-level abstract features and predict future prices from the stock time series, two deep models are applied in the system: one is used for de-noising and the other is used for prediction. The prediction process involves three steps: (1) data preprocessing, which involves calculating technical indicators and clipping and normalizing features; (2) encoding and decoding features using a 1-D ResNet block to minimize the reconstruction loss; and (3) using the LSTM to process the high-level abstract features and give a one-step-ahead output.
Figure 4 shows the overall framework. The input feature for a data sequence is a matrix whose dimensions are the number of channels and the length of the sequence. Daily trading data, technical indicators and macroeconomic variables each form such a matrix. After preprocessing, they are merged into one matrix, so the input data sequence has 17 channels. The prices are then predicted by the LSTM after the noise and dimension have been reduced by the encoder model.
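The clipping, normalizing and merging steps can be sketched as follows. This is a minimal illustration, not the preprocessing used in the experiments: the percentile clipping bounds, the z-score normalization and the 5 + 10 + 2 channel split are all assumptions, and the split is chosen only so that the total is 17 channels.

```python
import numpy as np


def clip_and_normalize(train, test, lower_pct=1.0, upper_pct=99.0):
    """Clip each channel to training percentiles, then z-score it.

    Both arrays have shape (channels, length). Only training statistics are
    used, so no information from the test period leaks into preprocessing.
    """
    lo = np.percentile(train, lower_pct, axis=1, keepdims=True)
    hi = np.percentile(train, upper_pct, axis=1, keepdims=True)
    train_c = np.clip(train, lo, hi)
    test_c = np.clip(test, lo, hi)
    mean = train_c.mean(axis=1, keepdims=True)
    std = train_c.std(axis=1, keepdims=True) + 1e-8  # avoid division by zero
    return (train_c - mean) / std, (test_c - mean) / std


rng = np.random.default_rng(0)
# Random stand-ins for the three feature sets (hypothetical channel counts).
trading = rng.normal(size=(5, 200))
indicators = rng.normal(size=(10, 200))
macro = rng.normal(size=(2, 200))
merged = np.concatenate([trading, indicators, macro], axis=0)  # (17, 200)
train_n, test_n = clip_and_normalize(merged[:, :150], merged[:, 150:])
```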
IV-A Sparse autoencoders
Sparse autoencoders are models that can reduce the dimension of the data. An autoencoder neural network is trained to rebuild its input (see Figure 5). The loss function used to train the autoencoder [16, 5] is the reconstruction error between the feature vector of each sample and its reconstructed feature vector, computed over all data points, plus a weighted sparse penalty term.
The sparse penalty is a kind of regularization that drives most units of the network toward an inactive state in order to reduce over-fitting. This is the difference between sparse autoencoders and traditional autoencoders.
The sparse penalty depends on a sparse parameter, on the number of units in the hidden layer and on the activation of each hidden unit. Weight decay is also used to reduce over-fitting of the model. After training, only the features from the middle layer of the network are used (see Figure 5).
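As an illustration of such a loss, the sketch below uses the Kullback-Leibler sparsity penalty that is standard for sparse autoencoders. The exact form used in this study is not reproduced in this text, so the squared-error reconstruction term with a factor of 1/2, the averaging over samples and the default values of `beta` and `rho` are assumptions.

```python
import numpy as np


def kl_sparsity_penalty(hidden, rho=0.05, eps=1e-8):
    """Sum over hidden units of KL(rho || rho_hat_j).

    hidden: array of shape (n_samples, n_units) with activations in (0, 1).
    rho_hat_j is the mean activation of unit j over all samples.
    """
    rho_hat = np.clip(hidden.mean(axis=0), eps, 1.0 - eps)
    kl = (rho * np.log(rho / rho_hat)
          + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))
    return float(kl.sum())


def sparse_autoencoder_loss(x, x_rebuilt, hidden, beta=3.0, rho=0.05):
    """Mean squared reconstruction error plus the weighted sparsity penalty."""
    reconstruction = 0.5 * np.mean(np.sum((x - x_rebuilt) ** 2, axis=1))
    return float(reconstruction + beta * kl_sparsity_penalty(hidden, rho))


x = np.ones((4, 6))
sparse_hidden = np.full((4, 3), 0.05)  # mean activation equals rho: no penalty
dense_hidden = np.full((4, 3), 0.5)    # mean activation far from rho: penalized
loss_sparse = sparse_autoencoder_loss(x, x, sparse_hidden)
loss_dense = sparse_autoencoder_loss(x, x, dense_hidden)
```

With a perfect reconstruction, the loss is close to zero when the mean activation matches the sparse parameter. It grows as the hidden units become more active, which is the effect the penalty is meant to have.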
The sparse autoencoders [16, 5] are implemented as a 1-D CNN, which allows the de-noising performance of WT and CNN on stock time series data to be compared. A convolution network is used as the encoding network and a deconvolution network is used as the decoding network, so the model used in the SAEs is a fully convolutional network. The autoencoder not only reduces noise but also reduces the dimension of the features, so that the subsequent network can use fewer weights. The CNN applied here is the ResNet, a type of convolutional neural network that uses “shortcut connections” to back-propagate the gradient and thereby speeds up training.
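The idea of a shortcut connection in a 1-D residual block can be sketched in plain NumPy. This is a simplified illustration rather than the architecture used here: the two-convolution layout, the ReLU activations, the zero padding and the equal input and output channel counts are all assumptions.

```python
import numpy as np


def conv1d_same(x, kernels, bias):
    """1-D cross-correlation with zero padding so the length is preserved.

    x: (c_in, length); kernels: (c_out, c_in, k) with odd k; bias: (c_out,).
    """
    c_out, _, k = kernels.shape
    pad = k // 2
    x_padded = np.pad(x, ((0, 0), (pad, pad)))
    length = x.shape[1]
    out = np.zeros((c_out, length))
    for t in range(length):
        window = x_padded[:, t:t + k]  # (c_in, k)
        out[:, t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1])) + bias
    return out


def residual_block_1d(x, k1, b1, k2, b2):
    """Two convolutions plus a shortcut that adds the input back."""
    hidden = np.maximum(conv1d_same(x, k1, b1), 0.0)  # ReLU
    residual = conv1d_same(hidden, k2, b2)
    return np.maximum(x + residual, 0.0)              # shortcut, then ReLU


rng = np.random.default_rng(0)
x = np.abs(rng.normal(size=(4, 16)))  # non-negative input with 4 channels
zero_k, zero_b = np.zeros((4, 4, 3)), np.zeros(4)
identity_out = residual_block_1d(x, zero_k, zero_b, zero_k, zero_b)
```

When all convolution weights are zero, the block simply passes its non-negative input through. This shows why the shortcut gives the gradient a direct path even when the convolution branch contributes little.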
IV-B Long-short term memory
LSTM is a type of recurrent neural network (RNN) that can transfer information from the past to the present. However, the structure of an RNN has a defect: the gradient can vanish or explode when the input series is too long. The problem of the exploding gradient is generally solved by gradient clipping, which limits the magnitude of the gradient of each parameter. The problem of the vanishing gradient is solved by the structure of the LSTM. An LSTM differs from a conventional RNN in that it has an additional memory, the cell state, which passes to the next step without matrix multiplication or an activation function, so the gradient is back-propagated smoothly. The details of the LSTM are shown in Figure 6. The left part of the figure is the structure of the LSTM unit, and the dotted arrows indicate indirect effects. At each time step t, the inputs to the unit are the previous cell state vector, the previous hidden state vector and the current input feature. All gates receive the previous hidden state and the new feature, and the cell state and the hidden state are then updated.
Four vectors are computed: the new information that is used to update the cell state, the input gate and the forget gate, which respectively select the information that is added to the cell state and the information that is forgotten, and the output gate. The cell state is updated using element-wise multiplication, and the output gate is then used to select the output and the hidden state.
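The LSTM update and gradient clipping can be sketched in NumPy using the standard textbook formulation. The equations of this section are not reproduced in this text, so the stacking order of the gates in the weight matrices and the use of norm-based clipping are assumptions. The names `lstm_step` and `clip_by_norm` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    W: (4 * hidden, n_in), U: (4 * hidden, hidden), b: (4 * hidden,).
    Rows are stacked as input gate, forget gate, output gate and candidate.
    """
    hidden = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:hidden])                 # input gate: what to add
    f = sigmoid(z[hidden:2 * hidden])        # forget gate: what to keep
    o = sigmoid(z[2 * hidden:3 * hidden])    # output gate: what to expose
    g = np.tanh(z[3 * hidden:4 * hidden])    # candidate new information
    c_t = f * c_prev + i * g                 # element-wise cell update
    h_t = o * np.tanh(c_t)                   # hidden state and output
    return h_t, c_t


def clip_by_norm(grad, threshold):
    """Rescale a gradient whose L2 norm exceeds the threshold."""
    norm = np.linalg.norm(grad)
    if norm > threshold:
        return grad * (threshold / norm)
    return grad


hidden, n_in = 2, 3
h1, c1 = lstm_step(np.ones(n_in), np.zeros(hidden), np.ones(hidden),
                   np.zeros((4 * hidden, n_in)), np.zeros((4 * hidden, hidden)),
                   np.zeros(4 * hidden))
clipped = clip_by_norm(np.array([3.0, 4.0]), threshold=1.0)
```

The cell update involves only element-wise products and a sum, with no matrix multiplication on the path from the previous cell state to the new one. That is the property described above as letting the gradient back-propagate smoothly.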
The experiments compare the accuracy of the proposed method with that of a deep learning framework for the CSI 300 index, the DJIA index, the Hang Seng index, the Nifty 50 index, the Nikkei 225 index and the S&P 500 index. As in a previous study, more than one market is used. The predictive accuracy is evaluated by MAPE, Theil U and the linear correlation between the predicted and real prices [14, 19, 1, 11]. The data is divided into different groups for training and testing, in order to reduce the time span.
Two experiments test the performance of the two methods: (1) a 1-D ResNet autoencoder is used to predict prices (called C1D-LSTM) and (2) a 1-D ResNet autoencoder is used to predict the rate of change of prices (called C1D-ROC). The accuracy of the models is compared and the prediction curve for one year is plotted.
V-A Data descriptions
Data resource. Following a previous study, the data is obtained from the Figshare website. The data was sampled from the WIND (http://www.wind.com.cn) and CSMAR (http://www.gtarsc.com) databases of Shanghai Wind Information Co., Ltd and Shenzhen GTA Education Tech. Ltd, respectively. The stock time series runs from Jul. 2008 to Sep. 2016 (see Table I).
Data features. Following a previous study, three sets of features are selected as the inputs. The first set is the past trading data, including the opening, closing, high and low prices and the trading volume. The second set includes technical indicators that are widely used for stock analysis. They are listed in Table II and are calculated from the closing, low and high prices at time t, from double exponential moving averages, and from the highest high price and the lowest low price in a given range. The last set of features is macroeconomic information. Stock prices are affected by many factors, so using macroeconomic information as features can reduce uncertainty in the stock prediction. The US dollar index and the interbank offered rate for each market form the third set of features.
Data division. The data is divided in order to train multiple models. Each model is trained using past data. The training data and test data cannot be randomly sampled from the dataset, because only data from the past can be used to predict future stock prices. The greater the time interval between two segments of a stock time series, the smaller the correlation between them, so using outdated data does not improve performance. For these reasons, and to simplify the results, the forecast period is divided into 6 years, each running from Oct. to Sep. (see Table I).
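A chronological split of this kind can be sketched with the standard library. The six Oct.-to-Sep. test windows follow from the date range stated above, but the choice to train on all data before each test window is an assumption, since Table I is not reproduced in this text. The function names are hypothetical.

```python
from datetime import date


def yearly_test_windows(first_test_year=2010, n_years=6):
    """Each forecast year runs from 1 Oct. to 30 Sep. of the following year."""
    return [(date(first_test_year + k, 10, 1), date(first_test_year + k + 1, 9, 30))
            for k in range(n_years)]


def split_by_window(dates, window, train_start=date(2008, 7, 1)):
    """Return (train_idx, test_idx) so that training data precedes the test window."""
    test_start, test_end = window
    train_idx = [i for i, d in enumerate(dates) if train_start <= d < test_start]
    test_idx = [i for i, d in enumerate(dates) if test_start <= d <= test_end]
    return train_idx, test_idx


windows = yearly_test_windows()
# Four hypothetical trading days around the first window boundaries.
sample_dates = [date(2010, 9, 30), date(2010, 10, 1), date(2011, 9, 30), date(2011, 10, 3)]
train_idx, test_idx = split_by_window(sample_dates, windows[0])
```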
|Indicator|Description|
|---|---|
|MACD|Moving Average Convergence Divergence|
|CCI|Commodity channel index|
|ATR|Average true range|
|BOLL|Bollinger Band MID|
|EMA20|20 day Exponential Moving Average|
|MA5/MA10|5/10 day Moving Average|
|MTM6/MTM12|6/12 month Momentum|
|ROC|Price rate of change|
|SMI|Stochastic Momentum Index|
|WVAD|Williams’s Variable Accumulation/Distribution|
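A few of the simpler indicators in the table can be computed as in the sketch below. The calculation column of Table II is not reproduced in this text, so these are the common textbook definitions and should be read as assumptions: a simple moving average, an exponential moving average with smoothing factor 2 / (span + 1), and a one-step relative rate of change.

```python
import numpy as np


def moving_average(close, window):
    """Simple moving average; the first window - 1 entries are NaN."""
    out = np.full(close.shape, np.nan)
    csum = np.cumsum(np.insert(close, 0, 0.0))
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out


def exponential_moving_average(close, span):
    """Recursive EMA seeded with the first closing price."""
    alpha = 2.0 / (span + 1.0)
    out = np.empty(len(close), dtype=float)
    out[0] = close[0]
    for t in range(1, len(close)):
        out[t] = alpha * close[t] + (1.0 - alpha) * out[t - 1]
    return out


def rate_of_change(close, lag=1):
    """Relative price change over `lag` steps; the first `lag` entries are NaN."""
    out = np.full(close.shape, np.nan)
    out[lag:] = (close[lag:] - close[:-lag]) / close[:-lag]
    return out


close = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # toy closing prices
ma3 = moving_average(close, 3)
ema = exponential_moving_average(np.full(5, 10.0), span=20)
roc = rate_of_change(close)
```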
The experiments use MAPE, the linear correlation (R) between the predicted price and the real price, and Theil U to evaluate the model. These measures are computed from the price predicted by the proposed model and the actual price on each day, together with their respective average values. MAPE measures the average relative error. R is the correlation coefficient for two variables and describes the linear correlation between them. A large value for R means that the forecast moves closely with the actual value. Theil U is a relative measure of the difference between the predicted and actual series. Smaller values of MAPE and Theil U denote greater accuracy.
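The three measures can be sketched as follows. The defining equations are not reproduced in this text, so the forms below are the commonly used ones and are assumptions here. In particular, Theil U is taken to be the inequality coefficient, which normalizes the root mean squared error by the root mean squares of the two series.

```python
import numpy as np


def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction."""
    return float(np.mean(np.abs((actual - predicted) / actual)))


def correlation(actual, predicted):
    """Pearson linear correlation coefficient R."""
    a = actual - actual.mean()
    p = predicted - predicted.mean()
    return float(np.sum(a * p) / np.sqrt(np.sum(a ** 2) * np.sum(p ** 2)))


def theil_u(actual, predicted):
    """Theil's inequality coefficient (assumed form); 0 means a perfect fit."""
    rmse = np.sqrt(np.mean((actual - predicted) ** 2))
    scale = np.sqrt(np.mean(actual ** 2)) + np.sqrt(np.mean(predicted ** 2))
    return float(rmse / scale)


actual = np.array([100.0, 102.0, 101.0, 105.0])  # toy prices
biased = actual * 1.01                           # forecast that is 1% too high
scores = (mape(actual, biased), correlation(actual, biased), theil_u(actual, biased))
```

In this toy case the forecast has a perfect linear correlation with the actual prices but a non-zero MAPE and Theil U, so a high R alone does not imply a small error.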
V-C Predictive accuracy test
Tables III-VIII show that the 1-D CNN gives slightly better results than WSAEs. This shows that the convolutional network, which adaptively de-noises noisy data and reduces its dimensionality, is effective in processing stock data. The markets with higher prediction errors are almost the same for both models. Moreover, the CSI 300 index, the Hang Seng index and the Nifty 50 index are more difficult to predict than the DJIA index and the S&P 500 index.
In some individual cases, a higher correlation between predicted and actual prices does not correspond to higher prediction accuracy. However, the averages over different years show that the prediction accuracy and the linear correlation are positively correlated.
[Tables III-VIII: predictive accuracy for the six market indices. Each table includes Panel B (correlation coefficient) and Panel C (Theil U); the table values are not reproduced here.]
If past prices are used to predict future stock prices, then predicting the rate of change of the price also yields the price to be predicted, because the current price is known. For most stock price series, the scale of the price is much larger than that of the rate of change. If the prediction target for the model is the absolute price, information about price changes is easily ignored because a change in the price has a smaller effect on the loss than the absolute price. Tables III-VIII show that the model that predicts prices indirectly, by predicting the rate of change, achieves higher accuracy. This demonstrates that predicting the rate of change is better than predicting prices directly.
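The indirect route from a predicted rate of change back to a price can be illustrated with a short sketch. It assumes that the rate of change is the one-day relative change, which may differ from the definition used in the experiments. The prices are made-up values chosen only to show the difference in scale.

```python
import numpy as np


def price_from_rate_of_change(current_price, predicted_roc):
    """Recover the next price from the current price and a relative change."""
    return current_price * (1.0 + predicted_roc)


prices = np.array([3000.0, 3030.0, 3015.0])  # hypothetical index levels
roc = prices[1:] / prices[:-1] - 1.0         # one-day relative changes
rebuilt = price_from_rate_of_change(prices[:-1], roc)
```

The rates here are on the order of 0.01 while the prices are on the order of 3000. This is the difference in scale that the argument above relies on.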
V-D Predictive curve
The predicted results for the first year for each market index are shown in Figure 7. The curve for C1D-ROC is closer to the actual curve than that for C1D-LSTM. The curve for C1D-LSTM occasionally deviates far from the actual price curve, but that for C1D-ROC does so only rarely. This demonstrates that future prices can be derived from the current price and the predicted price change. The current input features include the current price, but it is difficult to preserve this feature fully when the input passes through an autoencoder. If the change in the price is predicted and the price is then inferred from the exact current value, the model can use the full information in the current price.
1-D ResNet sparse autoencoders are used to de-noise the data and reduce its dimensionality. A notional experiment compares the performance of a model that uses features after de-noising with that of a single network with LSTM. The former reduces over-fitting when there is a lot of noise in the data. The experimental results show that the proposed method gives more accurate predictions than WSAEs. This is the first contribution of this paper. The second contribution is that prior knowledge about the relationship between prices and the rate of change is added to the model to improve performance. The experimental results show that using the rate of change to predict the stock price indirectly is more accurate than predicting the stock price directly.
Future work will use an attention model to improve the performance. This model assumes that the price for the next day is approximately related to the prices for previous days. The attention model will be applied to express the relationship between the prices for previous days and the next day, which should give improved performance and results that are more easily interpreted.
[1] (2005) Stock market forecasting: artificial neural network and linear regression comparison in an emerging market. Journal of Financial Management & Analysis 18 (2), pp. 18.
[2] (2009) Surveying stock market forecasting techniques–part ii: soft computing methods. Expert Systems with Applications 36 (3), pp. 5932–5941.
[3] (2018) ModAugNet: a new forecasting framework for stock market index value with an overfitting prevention lstm module and a prediction lstm module. Expert Systems with Applications 113, pp. 457–480.
[4] (2017) A deep learning framework for financial time series using stacked autoencoders and long-short term memory. PLOS ONE 12 (7), pp. e0180944.
[5] (2007) Greedy layer-wise training of deep networks. In Advances in Neural Information Processing Systems, pp. 153–160.
[6] (2015) Time series analysis: forecasting and control. John Wiley & Sons.
[7] (2016) Computational intelligence and financial markets: a survey and future directions. Expert Systems with Applications 55, pp. 194–211.
[8] (2010) SVM application of financial time series forecasting using empirical technical indicators. In Information Networking and Automation (ICINA), 2010 International Conference on, Vol. 1, pp. V1–77.
[9] (2017) Enhancing recurrent neural networks with positional attention for question answering. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 993–996.
[10] (2015) Deep learning for event-driven stock prediction. In IJCAI, pp. 2327–2333.
[11] (2010) Forecasting nigerian stock exchange returns: evidence from autoregressive integrated moving average (arima) model. SSRN Electronic Journal.
[12] (1982) Autoregressive conditional heteroscedasticity with estimates of the variance of united kingdom inflation. Econometrica: Journal of the Econometric Society, pp. 987–1007.
[13] (1965) The behavior of stock-market prices. The Journal of Business 38 (1), pp. 34–105.
[14] (2014) A feature fusion based forecasting model for financial time series. PLOS ONE 9 (6), pp. e101113.
[15] (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
[16] (2006) Reducing the dimensionality of data with neural networks. Science 313 (5786), pp. 504–507.
[17] (1997) Long short-term memory. Neural Computation 9 (8), pp. 1735–1780.
[18] (2016) Behavior recognition for humanoid robots using long short-term memory. International Journal of Advanced Robotic Systems 13 (6), pp. 1729881416663369.
[19] (2011) Forecasting stock markets using wavelet transforms and recurrent neural networks: an integrated system based on artificial bee colony algorithm. Applied Soft Computing 11 (2), pp. 2510–2525.
[20] (2012) A hybrid stock selection model using genetic algorithms and support vector regression. Applied Soft Computing 12 (2), pp. 807–818.
[21] (2013) Performance analysis of indian stock market index using neural network time series model. In Pattern Recognition, Informatics and Mobile Engineering (PRIME), 2013 International Conference on, pp. 72–78.
[22] (2004) Neural network techniques for financial performance prediction: integrating fundamental and technical analysis. Decision Support Systems 37 (4), pp. 567–581.
[23] (2012) Fluctuation prediction of stock market index by legendre neural network with random time strength function. Neurocomputing 83, pp. 12–21.
[24] (1998) The use of fundamental and technical analyses by foreign exchange dealers: hong kong evidence. Journal of International Money and Finance 17 (3), pp. 535–545.
[25] (2015) Text mining of news-headlines for forex market prediction: a multi-layer dimension reduction algorithm with semantics and sentiment. Expert Systems with Applications 42 (1), pp. 306–324.
[26] Sparse autoencoder. CS294A Lecture notes. https://web.stanford.edu/class/cs294a/sparseAutoencoder.pdf
[27] (2013) Machine learning in prediction of stock market indicators based on historical data and data from twitter sentiment analysis. In Data Mining Workshops (ICDMW), 2013 IEEE 13th International Conference on, pp. 440–444.
[28] (2011) CAST: using neural networks to improve trading systems based on technical analysis by means of the rsi financial indicator. Expert Systems with Applications 38 (9), pp. 11489–11500.
[29] (2013) OBST-based segmentation approach to financial time series. Engineering Applications of Artificial Intelligence 26 (10), pp. 2581–2596.
[30] (2001) Application of support vector machines in financial time series forecasting. Omega 29 (4), pp. 309–317.
[31] (2012) A novel text mining approach to financial time series forecasting. Neurocomputing 83, pp. 136–145.
[32] (2011) Forecasting stock indices with back propagation neural network. Expert Systems with Applications 38 (11), pp. 14346–14355.
[33] (1999) Neural networks for technical analysis: a study on klci. International Journal of Theoretical and Applied Finance 2 (02), pp. 221–241.
[34] (2010) Deconvolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.