A Deep Learning Framework for News-oriented Stock Trend Prediction

Listening to Chaotic Whispers: A Deep Learning Framework for News-oriented Stock Trend Prediction

Ziniu Hu Key Lab of High-Confidence Software Technology, MoEPeking University, China bull@pku.edu.cn Weiqing Liu Microsoft ResearchBeijingChina Weiqing.Liu@microsoft.com Jiang Bian Microsoft ResearchBeijingChina Jiang.Bian@microsoft.com Xuanzhe Liu Key Lab of High-Confidence Software Technology, MoEPeking University, China liuxuanzhe@pku.edu.cn  and  Tie-Yan Liu Microsoft ResearchBeijingChina Tie-Yan.Liu@microsoft.com

Stock trend prediction plays a critical role in seeking maximized profit from the stock investment. However, precise trend prediction is very difficult since the highly volatile and non-stationary nature of the stock market. Exploding information on the Internet together with the advancing development of natural language processing and text mining techniques have enabled investors to unveil market trends and volatility from online content. Unfortunately, the quality, trustworthiness, and comprehensiveness of online content related to stock market vary drastically, and a large portion consists of the low-quality news, comments, or even rumors. To address this challenge, we imitate the learning process of human beings facing such chaotic online news, driven by three principles: sequential content dependency, diverse influence, and effective and efficient learning. In this paper, to capture the first two principles, we designed a Hybrid Attention Networks (HAN) to predict the stock trend based on the sequence of recent related news. Moreover, we apply the self-paced learning mechanism to imitate the third principle. Extensive experiments on real-world stock market data demonstrate the effectiveness of our framework. A further simulation illustrates that a straightforward trading strategy based on our proposed framework can significantly increase the annualized return.

stock trend prediction; deep learning; text mining
journalyear: 2018copyright: acmcopyrightconference: WSDM 2018: The Eleventh ACM International Conference on Web Search and Data Mining ; February 5–9, 2018; Marina Del Rey, CA, USAbooktitle: WSDM 2018: The Eleventh ACM International Conference on Web Search and Data Mining , February 5–9, 2018, Marina Del Rey, CA, USAprice: 15.00doi: 10.1145/3159652.3159690isbn: 978-1-4503-5581-0/18/02

1. Introduction

In order for seeking maximized profit, stock investors continuously attempt to predict the future trends of market (Nuij2014An, ), which, however, is quite challenging due to highly volatile and non-stationary nature of the market (adam2016stock, ). Traditional efforts on predicting the stock trend have been carried out based on information from various fields. One of most basic ways relies on technical analysis upon recent prices and volumes on the market. Such methods yield the very limitations on unveiling the rules that govern the drastic dynamics of the market. Meanwhile, another basic method focuses on analyzing financial statements of each company, which is though incapable of catching the impact of recent trends.

With the rapid growth of the Internet, content from online media has indeed become a gold mine for investors to understand market trends and volatility. Even more, the advancing development of Natural Language Processing techniques has inspired increasing efforts on stock trend prediction by automatically analyzing stock-related articles. For instance, Tetlock et al. (Tetlock2007Giving, ) extracts and quantifies the optimism and pessimism of Wall Street Journal reports and observes that trading volume tends to increase after pessimism reports and high pessimism scored reports tend to be followed by a downtrend and a reversion of market prices.

Not surprisingly, the effectiveness of such textual analyses depends on the quality of target articles. For instance, comparing with reading a comprehensive report of a company from Wall Street, analyzing a simple declarative news about this company is less likely to produce an accurate prediction. Unfortunately, the quality, trustworthiness, and comprehensiveness of online content related to stock market vary drastically, and a large portion of the online content consists of the low-quality news, comments, or even rumors.

To address this challenge, we imitate the learning process of human beings facing such chaotic online news, which can be summarized into three principles:

  • Sequential Context Dependency: Even if a single piece of news is very likely to be of low-quality or not informative enough, human can comprehensively consider a sequence of related recent news as a unified context of the stock and consequently make a more reliable prediction on the subsequent stock trend. More importantly, within the unified sequential context, humans can pay different attention to various parts according to their respective importance and influence.

  • Diverse Influence: While the influence of different online news can be diverse greatly, human beings can discriminate them based on the intrinsic content. For instance, some breaking news (e.g. air crash, military conflict) will profoundly affect the trend of related stocks, whilst some useless comments or vague rumors may cause little disturb on the stock trends. In real-world investment, people tend to consciously and comprehensively consider the estimated impact of each news at the time of predicting the subsequent stock trend.

  • Effective and Efficient Learning: News cannot always provide obviously informative indications on stock trend, especially when the content of news contains vague information on stock trend or even there is very limited number of news about certain stocks in a period. In order for both effective and efficient learning, human beings tend to first gain knowledge by focusing on informative occasions, and then turn to disturbing evidence to obtain uneasy experience.

To capture the first two principles of human learning process, we design a Hybrid Attention Networks (HAN) to predict the stock trend based on the sequence of recent related news. First, to imitate the human cognition on sequential context with diverse attentions, we construct attention-based recurrent neural networks (RNN) at the higher level. In particular, the RNN structure enables the processing of recent related news for a stock in a unified sequence, and the attention mechanism is capable of identifying more influential time periods of the sequence. Second, to further model diverse influence of news, we propose news-level attention-based neural networks at the lower level, which aims at recognizing more important news from others at the same time point.

To imitate the effective and efficient learning of human, we employ the self-paced learning (SPL) (Kumar2011Self, ) mechanism. Since the news-based stock trend prediction is more challenging in some situations, SPL enables us to automatically skip those training samples from some challenging periods in the early stage of model training, and progressively increase the complexity of training samples. Self-paced learning mechanism can automatically choose the suitable training samples for different training stage, which enhanced the final performance of our framework.

To validate the effectiveness of our approach, in this paper, we performed extensive experiments on real-world data. Comparing with traditional approaches, the experiment results show that our framework can significantly improve the performance of stock trend prediction. Furthermore, we simulated the stock investment using a simple trading strategy based on our framework, and the results illustrate that a straightforward trading strategy can achieve much better annualized return than the baseline methods.

To sum up, the contributions of our work include:

  • A summarization of principles for imitating the learning process of human beings, particularly for stock trend prediction from the chaotic online news.

  • A Hybrid Attention Networks with self-paced learning for stock trend prediction, driven by principles of human learning process.

  • Experimental studies on real-world data with simulated investment performance based on the real stock market.

The rest of the paper is organized as follows. We introduce related work in Section 2. We present empirical analysis to reveal principles for designing news-oriented stock prediction framework in Section 3, based on which we propose a new deep learning framework with details in Section 4. Experimental setup and results are demonstrated in Section 5. We conclude the paper and point out future directions in Section 6.

2. Related Work

Stock trend prediction has attracted many research efforts due to its decisive role in stock investment. In general, traditional approaches can be categorized into two primary approaches: technical and fundamental analysis, according to the various types of information they mainly relied on.

Technical analysis deals with the time-series historic market-data, such as trading price and volume, and make predictions based on that. The main goal of this type of approach is to discover the trading patterns that can be leveraged for future prediction. One of the most widely used model in this direction is the Autoregressive (AR) model for linear and stationary time-series (Li2016AR, ). However, the non-linear and non-stationary nature of stock prices limits the applicality of AR models. Hence, previous studies attempted to applied non-linear learning methods (Nayak2015SVMKNN, ) to catch the complex patterns underlying the market trend. With the developement of deep learning, more research efforts have been paid on exploiting deep neurual networks for financial prediction (Kim2012simultaneous, ; Laboissiere2015maximum, ; Gocken2016IMA, ; Adebiyi2014Comparison, ; Patel2015predicting, ; Ticknor2013bayesian, ). To further model the long-term dependency in time series, recurrent neurual networks (RNN), especially Long Short-Term Memory (LSTM) network, have also been employed in financial predition (Rather2015recurrent, ; Akita2016deep, ; Gao2016stock, ). In most recent time, Zhang et al. (Zhang2017MultiFreq, ) proposed a new RNN, called State Frequency Memory (SFM), to discover multi-frequency trading patterns for stock price prediction.

One major limitation of technical analysis is that it is incapable of unveiling the rules that govern the dynamics of the market beyond price data. Fundamental approaches, on the contrary, seek information from outside market-historic-data, such as geopolitics, financial environment and business principles. Information explosion on the Internet has promoted online content, especially news, as one of the most important sources for fundamental analysis. There have been many attempts to mine news data for better predicting market trends. Nassirtoussi et al. (Nassirtoussi2015Text, ) proposed a multi-layer dimension reduction algorithm with semantics and sentiment to predict intraday directional-movements of a currency-pair in the foreign exchange market. Ding et al. (Ding2015Deep, ) proposed a deep learning method for event-driven stock market prediction. They further augmented their approach (ding2016knowledge, ) by incorporating an outside knowledge graph into the learning process for event embeddings. Wang et al. (Wang2014A, ) performed a text regression task to predict the volatility of stock prices. Xie et al. (Xie2013Semantic, ) introduced a novel tree representation, and use it to train predictive models with tree kernels using support vector machines. Hagenau et al. (Hagenau2013Automated, ) extract a large scale of expressive features to represent the unstructured text data and employs a robust feature selection to enhance the stock prediction.

Another major aspects for market news mining is to analyze sentiments from public news and social media, and then use it to predict market trends. Li et al. (Li2014News, ) implements a generic stock price prediction framework using sentiment analysis. Zhouet al. (zhou2016can, ) studies particularly the Chinese stock market. They conduct a thorough study over 10 million stock-relevant tweets from Weibo, and find five attributes that stock market in China can be competently predicted by various online emotions. Nguyen et al.(si2013exploiting, ) explicitly consider the topics relating to the target stocks, and extracting topics and related sentiments from social media to make prediction.

While there have been many efforts in exploiting news for stock prediction, few of them paid enough attention to the quality, trustworthiness, and comprehensiveness of news, which highly affects the effectiveness of textual analysis. In this paper, we address the challenge of chaotic news by imitating the learning process of human beings, inspired by which we propose a Hybrid Attention Networks (HAN) to predict the stock trend based on the sequence of recent related news and employ self-paced learning for effective and efficient learning.

3. Empirical Analysis

Figure 1. An example of two news sequences about the petrol industry reform, jointly shown with the stock price of SINOPEC that is one of the biggest petrol companies in China. In the figure, red, green and yellow circles represent negative, positive and neutral news respectively.
Figure 2. Proportion of the consecutive days without news.

In this section, through empirical analysis, we will reveal three principles of the human learning process with respect to stock trend prediction via chaotic news. These principles can consequently provide essential guidelines in designing our learning framework.

3.1. Sequential Context Dependency

Due to the diverse quality of online financial news, human investors usually prefer not to rely on a single piece of news to make prediction due to its limited or even vague information. Instead, by broadly analyzing a sequence of news and combining them into a unified context, each news can provide complementary information and thus a more reliable assessment of stock trend can be made.

For example, Figure 1 illustrates two news sequences referring to petrol industry reforms happening in September 2014 and March 2017, respectively. The figure also displays the share price of SINOPEC, which is one of the biggest petrol companies in China.

From the figure, we can see two yellow circles with the fluctuation sign that represent two news declaring the initiation of two reforms, respectively. By merely referring to the two news, we can hardly tell the future trend of SINOPEC, since they revealed quite limited details about the reforms. However, the difference between these two reforms could be inferred by their previous news sequences. In particular, previous news with the down sign on September 2014 indicated that the reform might cause negative effects; while on March 2017, the news with the rise sign demonstrates quite positive signals.

In reality, human investors can naturally synthesize these analytical reports before the reforms actually began, in order to better assess the influence of these reforms on relevant stocks. Therefore, to imitate such analysis process as a human, an ideal framework should integrate and interpret each news in a sequential temporal context, rather than analyze them separately.

3.2. Diverse Influence

Significant news has more intensive and durable influence on the market than those trivial ones. For example, the third news with the down sign in Figure 1(b) summarizes that the share price of SINOPEC has been continuously going down, which should give a negative signal to the stock trend. However, compared with the positive news reporting the reform on petrol industry in the same period, this negative one yields much less importance. As it turns out later, after the SINOPEC’s price drops down for only one day, it starts to rise up and keeps the uptrend for quite a long time, proving that the influence of the negative news is indeed weaker than the positive one.

Based on the diversity of news influence, we can conclude that when analyzing the news, an ideal framework should have the ability to distinguish the news with more intensive and durable influence, and pay more attention to them.

3.3. Effective and Efficient Learning

News cannot always provide an informative indication of the stock trend, especially when there exists only an insufficient number of news about specific stocks in a period. According to the online news we collected, Figure 2 shows the occurrence rates of the situation that there is no news about a particular stock “Jiai Technology” for consecutive days or more. Within those of time periods when there is no news reported for more than days, it is quite tough to make any news-oriented prediction. Additionally, other situations, such as the aggregation of vague news, also introduce difficulties to the prediction.

Such diverse difficulty in news-oriented stock trend prediction task has motivated human investors to find a more effective and efficient learning process. In reality, human investors tend to first gain an overall knowledge by focusing on common occasions, and then turn to exceptional cases.

Inspired by this, an ideal learning framework for stock prediction should follow a similar process, which in particular conducts learning on more informative news at the earlier stage, and further optimized to tackle harder samples.

4. Deep Learning Framework for News-Oriented Trend Prediction

In this section, we first formalize the problem of the stock trend prediction. Then, we present our framework based on the three design principles discussed in Empirical Analysis (Section 3). We first propose a Hybrid Attention Networks (HAN), which consists of two attention layers on news level and temporal level, respectively. Next, we incorporate a self-paced learning mechanism that enables the model to adjust the learning sequence in order to achieve better performance.

4.1. Problem Statement

We regard the problem of stock trend prediction as a classification problem. For a given date and a given stock , we can calculate its rise percent by:

Similar to many previous studies, we can divide rise percent into three classes: DOWN, UP, and PRESERVE, representing the significant dropping, rising, and steady stock trend on the next date, respectively.

The stock trend prediction task can be formulated as follows: given the length of a time sequence , the stock and date , the goal is to use the news corpus sequence from time to , denoted as , to predict the class of , i.e. DOWN, UP, or PRESERVE. Note that each news corpus contains a set of news with the size of , , denoting related news on date .

4.2. Hybrid Attention Networks

Based on the Sequential Context Dependency principle, our framework should interpret and analyze news in the sequential temporal context and pay more attention to critical time periods. In addition, based on the Diverse Influence principle, our framework should distinguish more significant news from others. To capture these two principles, we design a hybrid attention network (HAN), which incorporates attention mechanisms at both the news level and the temporal level.

We summarize the overall framework in Figure 3. Given the input of a news corpus sequence, a news embedding layer encodes each news into a news vector . Next, a news-level attention layer assigns an attention value to each news vector in a date, and calculate the weighted mean of these news vectors as a corpus vector for this date. Afterwards, these corpus vectors are encoded by a bi-directional Gated Recurrent Units (GRU). Then, another temporal attention layer assigns an attention value to each date, and calculate the weighted mean of these encoded corpus vectors to represent the overall sequential context information. Finally, the classification is made by a discriminative network. The details of the architecture are elaborated below.

Figure 3. The overall framework of the Hybrid Attention Networks (HAN).

News Embedding: For each news in news corpus of date , we use a word embedding layer to calculate the embedded vector for each word and then average all the words’ vectors to construct a news vector . To reduce the complexity of the framework, we pre-train an unsupervised Word2Vec as the word embedding layer rather than tuning its parameters in the learning process.

News-level Attention: Since not all news contributes equally to predicting the stock trend, we introduce an attention mechanism to aggregate the news weighted by an assigned attention value, in order to reward the news offering critical information. Specifically,

We first estimate attention values by feeding the news vector through a one-layer network to get the news-level attention value , and then calculate a normalized attention weight through a softmax function. Finally, we calculate the overall corpus vector as a weighted sum of each news vector respectively, and use this vector to represent all news information for date . Thus, we get a temporal sequence of corpus vector . Obviously, the attention layer can be trained end-to-end and thus gradually learn to assign more attention to the reliable and informative news based on its content.

Sequential Modeling: To encode the temporal sequence of corpus vectors, we adopt Gated Recurrent Units (GRU). GRU is a variant of recurrent neural networks that uses a gating mechanism to check the state of sequences without separate memory cells. At date , the GRU computes the news state by linearly interpolating the previous state and the current updated state , as:

The current updated state is computed by non-linearly combining the corpus vector input for this time-stamp and the previous state, as:

where denotes the reset gate, controlling how much past state should be used for updating the new state, and is the update gate, deciding how much past information should be kept and how much new information should be added. These two gates are calculated by:

Therefore, we can get the latent vector for each date through GRU. In order to capture the information from the past and future of a news as its context, we concatenate the latent vectors from both directions to construct a bi-directional encoded vector as:

The resulted incorporates the information of both its surrounding context and itself. In this way, we encode the temporal sequence of corpus vectors.

Temporal Attention: Since the news published at different dates contribute to the stock trend unequally, we adopt the temporal-level attention mechanism, which incorporates both the inherent temporal pattern and the news content, to distinguish the temporal difference as:

where is the parameter for each date in the softmax layer, indicating in general which date is more significant, and is the latent representations of encoded corpus vectors. By combining them through a softmax layer, we can get an attention vector to distinguish the temporal difference. Then we use to calculate the weighted sum , so that it can incorporate the sequential news context information with temporal attention, and will be used for classification.

Trend Prediction: The final discriminative network is a standard Multi-layer Perceptron (MLP), which takes as input and produces the three-class classification of the future stock trend.

4.3. Self-paced Learning Mechanism

As discussed in Section 3.3, there exist some natural challenges of news-oriented stock trend prediction, such as the scarceness of news and the aggregation of vague news, which cause severe learning difficulties. To conduct an effective and efficient learning, the designed model should skip those challenging training samples at the early training stages, and progressively incorporate them into the model training.

Curriculum Learning (Bengio2009Curriculum, ) is a learning mode that can imitate such a learning process. However, the sequence of training samples in curriculum learning is fixed by predetermined heuristics (curriculum), which cannot be adjusted to the feedback from the dynamic learned models. To alleviate this issue, Kuma et al. (Kumar2011Self, ) designed Self-Paced Learning (SPL) to embed curriculum design into the learning objective, so it can jointly optimize the curriculum and the learned model simultaneously.

Therefore, we take advantage of SPL in our framework to learn the news influence in an organized manner. Formally, given a training set , where denotes all the news inputs for the observed sample, and represents the corresponding stock trend label. Let denotes the loss function between label and the output of the whole model , and represents the model parameter to be learned. We assign each learning sample an importance weight . The goal of SPL is to jointly learn the model parameter and the latent weight by:


where denotes a self-paced regularizer which controls the learning scheme for penalizing the latent weight variables, and is a hyper-parameter that controls the pace at which the model learns new samples. In this paper, we choose the linear regularizer proposed in  (Jiang2015Self, ), which linearly discriminates samples with respect to their loss, as:


Similar to (Jiang2015Self, ) (Kumar2011Self, ), we adopt Alternative Convex Search (ACS)(gorski2007biconvex, ) to solve Equation (1). ACS divides the variables into two disjoint blocks. In each iteration, a block of variables is optimized while fixing the other block. With the fixed , the unconstrained close-formed solution for the linear regularizer (2) can be calculated by:

where denotes the element in the iterated optimal solution, and denotes the loss for each element . In this way, the latent weight for samples that are different to what model has already learned will receive a linear penalty.

0:  Input dataset, ; linear regularizer, ; self-paced step size, ;
0:  Model Parameter, ;
1:  Initialize evenly;
2:  while not converged do
3:     Update ;
4:     Update ;
5:     if  is small then
6:        increase by the stepsize ;
7:     end if
8:  end while
9:  return  ;
Algorithm 1 Self-paced Learning.

Based on the optimization equation, the overall learning process can be abstracted as Algorithm 1. It alternates between two optimization steps described in ACS until the model finally converges: Step 3 learns the optimal model parameters with the fixed and most recent by standard back-propagation mechanism; Step 4 learns the optimal weight variables with the fixed by the linear regulizer. Noted that a small initial let only samples with rather small loss affect the learning. As the training progresses and increases in Step 6, samples with larger loss will be gradually incorporated, realizing the effective and efficient learning.

5. Evaluation

Figure 4. Distribution of the rise percent among the whole dataset.
Figure 5. The five attention weight clusters for temporal attention in different context, along with an overall distribution of all the attention weights.

In this section, we first present our experimental setup. Then, we conduct comprehensive experiments to evaluate the performance of our proposed deep learning framework, followed by a systematic trading simulation to examine the effectiveness of our framework on the real-world market.

Figure 6. Three classes of demonstrative news in different levels of attention values.
Figure 7. The learning curves of the model using self-paced learning and that without using SPL.

5.1. Experimental Setup

5.1.1. Data Collection

We obtained the Chinese stock data, including time series in terms of price and trading volume from 2014 to 2017 in daily frequency. There are totally 2527 stocks, covering the vast majority of Chinese stocks.

In the meantime, we collected economic news between 2014 and 2017***The news data are collected from two famous economic websites in China:  http://www.eastmoney.com/ and  http://finance.sina.com.cn/.. For each news, we extracted the publication time-stamp, title, and its content.

Then, we correlate each of the collected news to a specific stock if the news mentioned the name of the stock in the title or content. We then filter out the news without any correlation to stocks. After such a process, the total news amount is . For each stock, we then aggregate all the news in a certain date to construct the daily news corpus.

5.1.2. Learning Settings

To specify the label of the classification problem, we set up two particular thresholds to bin the rise percents, i.e., DOWN (), UP (), and PRESERVE (). We define the thresholds so that the three categories are approximately even, as is shown in Figure 4.

For the following experiments, we split the dataset into a training set () from September 2014 to May 2016, and a test set () from May 2016 to March 2017. Then we further randomly sample a validation set from the training set with size of it, in order to optimize the hyper-parameters and choose the best epoch.

In all the following experiments, we tokenize each news and remove the stop words and the words appearing less than 5 times to build the vocabulary. We then obtain the word-level embedding by training an un-supervised CBOW Word2Vec(mikolov2013efficient, ) model with the dimension of 500 through the whole dataset. All the models use the length of a time sequence as 10.

Figure 8. The accuracy result for different methods.
Figure 9. The annualized return for different methods.

5.1.3. Compared Methods

To evaluate the effectiveness of our proposed deep learning framework, we compare our result with the performance of the following methods:

Random Forest: we use the Random Forest (RF) classifier, with the number of trees in the forest as . we define a corpus vector for one date by averaging all news vectors of a certain date, and concatenate the corpus vector in 10 days to construct the input.

Multi-layer Perceptron: we use the multilayer perceptron (MLP) classifier, with three layers of sizes as 256, 128, and 64, respectively. The input of MLP is the same as that in RF.

News-RNN: We use a bi-directional GRU-based RNN, with the dimensionality of RNN output layer as 64. The RNN model takes corpus vectors organized in a temporal sequence as the input.

One-RNN: To evaluate the effectiveness of bi-directional setting, we use a standard one-direction GRU-based RNN as a comparison with News-RNN.

Temporal-Attention-RNN: To evaluate the temporal attention layer, we use Temporal-Attention-RNN (Temp-ATT) by adding a single temporal-level attention layer on top of the News-RNN.

News-Attention-RNN: To evaluate the news attention layer, we use News-Attention-RNN (News-ATT) by adding a single news-level attention layer to process corpus vector before the News-RNN.

HAN: Our proposed hybrid attention networks with the normal learning process.

HAN-SPL: Our proposed hybrid attention networks with self-paced learning process.

5.2. Effects of Two Attention Mechanisms

5.2.1. Demonstration on News Attention

Since the news attention values are constructed by a single perception with the embedded news vector as input, we can calculate the attention values for every piece of news on the testing dataset. To validate whether our framework can select significant news and filter out those less informative ones, Figure 6 shows some detailed results of the news attention. In this figure, the number in front of the news is the attention value, and it shows demonstrative news within three classes that have the lowest, middle, and highest attention values, respectively. From the figure, we can find that the sentences with the lowest attention value merely display the price and the code of the corresponding stock, which obviously have no implication to the future trend; the news with the middle attention value records the previous market performance of the corresponding stock, which provides some information of the stock; the news with the highest value describes some prediction from official analyst or some significant events, which explicitly contain the predictive information to the market trend. These cases clearly illustrate that our news attention mechanism can indeed distinguish important news from uninformative ones, explaining why it can boost the performance.

5.2.2. Visualization of Temporal Attention

To further illustrate the effect of temporal attention, we calculate all the temporal attention vectors on the testing dataset. As is shown in Figure 5, the last plot with the title Overall Distribution implies that, in general, the news reported recently has more impact on the current stock trend than those reported earlier.

Next, we adopt the spectral cluster method to generate five clusters representing different patterns of temporal attention, as is shown in the first five plots in Figure 5. We can find that despite the overall distribution, in different contexts with different news contents, the attention distribution can be quite diverse as well as dynamic. The informative news even in a few days ago can also express their influences in the temporal attention layer.

5.3. Effects of Self-paced learning

To evaluate the effectiveness of the self-paced learning (SPL) in our framework, we compare the training process using the SPL with that using the standard training process. Both the convergence speed and the final result are considered as the comparison metric. As is shown in Figure 7, at the beginning, the learning speed for SPL is slower than the one without using it, probably because SPL could neglect some difficult samples at early stages. However, as the round of epochs increases, the testing accuracy of the model using SPL gradually outperforms the one without using it, and eventually converge to a better result.

5.4. Overall Performance Experiments

5.4.1. Classification Accuracy Result

In the experimental setting of a tri-label classification problem, as the three label is approximately evenly split, we choose accuracy, which means the proportion of true results among the total number of testing samples, as the evaluation metric. The final results of all the methods are shown in Figure 9, in which each bar indicates the average accuracy of the testing dataset. Our proposed framework is able to gain the best accuracy result among all the baseline methods.

5.4.2. Market Trading Simulation

To further evaluate the effectiveness of our proposed framework, we conduct a back-testing by simulating the stock trading for approximately one year, from May 2016 to March 2017.

Our estimation strategy conducts the trading in the daily frequency. At the beginning of each trading date, the model will give each stock a score based on the probability to have a rising trend minus the probability to have a declining trend. Based on these scores, a straightforward portfolio construction strategy called top-K selects stocks with the highest scores to construct a new portfolio for the next trading day. The selected stocks are evenly invested according to their OpenPrices of the next trading day. To approximate the real-world trading, we also consider a transaction cost of for each trading. We then calculate the average return on the stock market by evenly holding every stock as the baseline, indicating the overall market trend.

Figure 10. The cumulative profit curve of different methods with the portfolio of choosing top 40 stocks.

To evaluate the performance of each prediction method, we use the annualized return as the metric, which is the cumulative profit per year. In reality, investors always choose multiple stocks to avoid risks. Therefore, we evenly invest on top stocks, where is set as , , and , respectively, to compare the results in Figure 9. In general, the highest profits should be obtained by selecting the most top stocks. However, when the is , the improvement is not that obvious. It is because that selecting few stocks leads to high turnover rate, which causes high transaction cost and offsets the profits. The detailed comparison and analysis of each algorithm will be elaborated in the next section.

We also plot a cumulative profit curve for each method with as . As is illustrated in Figure 10, the cumulative profit results are consistent with the Overall Performance Analysis (section 5.4), and our proposed framework is able to gain the best profit result among all the baseline methods.

5.5. Performance Discussion

As is illustrated in Figure 9 and Figure 9, the tendency of the accuracy result and the annualized return result are consistent among different methods. Furthermore, the profit curves in Figure 10 demonstrated that the profit gaps between different methods almost do not change with time. Our proposed framework, HAN-SPL, can achieve the best accuracy of (Figure 9) and can obtain the highest profits in all different settings (Figure 9). In particular, the highest annualized return are obtained while investing on top stocks, which is a remarkable improvement compared to the Market performance of . We further analyze the results of different methods to discuss why our proposed framework can take effects. The discussions are as follows:

Discussion on the RNN setting: The performances of MLP and RF are apparently worse than the others, probably because the input to these two methods are not organized in sequential contexts. This result to some extent indicates the significance to use RNN, which processes the news in the sequential context.

Discussion on the bi-directional GRU setting: In our experiment setting, the News-RNN and One-RNN share the similar model structure, except that News-RNN uses a bi-directional GRU while One-RNN uses a one-directional GRU. From the figures, we can see that the News-RNN outperforms the One-RNN, showing the bi-directional setting, which can utilize the information from both past and future for prediction, effectively increases the performance.

Discussion on the attention mechanisms: As we can see from the figures, the two baseline models with one attention layer, Temp-ATT and News-ATT, can achieve a better result than that of News-RNN, which has no attention layer, indicating the effectiveness to distinguish diverse influences of different news and dates.

Discussion on our proposed HAN framework: When we combine the two attention layers, our HAN can achieve even higher performance than all above-mentioned models, indicating the effectiveness of our overall model structure. In addition, the HAN model trained by self-paced learning (HAN-SPL) outperforms HAN with the standard training process, which shows the importance of effective and efficient learning. In Figure 10, our HAN-SPL significantly outperforms all other methods for all the test time, which shows our proposed framework is indeed effective in the problem of stock trend prediction.

6. Conclusion and Future Work

In this paper, we pointed out three principles for news-oriented stock trend prediction, including sequential context dependency, diverse influence and effective and efficient learning, by imitating the learning process of human. Based on these principles, we proposed a new learning framework, particularly a Hybrid Attention Network (HAN) with self-paced learning mechanism, for stock trend prediction from online news. Extensive experiments on real stock market data demonstrated that our proposed framework can result in significant improvement in terms of the accuracy of stock trend prediction. Meanwhile, through back-testing, our framework can produce appreciable profits, with highly increased annualized excess return, in a one-year round trading simulation.

In the future, beyond modeling the sequence of news related to one stock, we plan to further leverage the relationship between news related to different individual stocks, according to their industrial connections in real world. Moreover, we will investigate how to integrate the news-oriented approach with technical analysis for more accurate stock trend prediction.


  • [1] K. Adam, A. Marcet, and J. P. Nicolini. Stock market volatility and learning. The Journal of Finance, 71(1):33–82, 2016.
  • [2] A. A. Adebiyi, A. O. Adewumi, and C. K. Ayo. Comparison of arima and artificial neural networks models for stock price prediction. Journal of Applied Mathematics, 2014.
  • [3] R. Akita, A. Yoshihara, T. Matsubara, and K. Uehara. Deep learning for stock prediction using numerical and textual information. In Proceedings of the 15th International Conference on Information Systems, pages 1–6, 2016.
  • [4] Y. Bengio, J. Louradour, R. Collobert, and J. Weston. Curriculum learning. Journal of the American Podiatry Association, 60(60):6, 2009.
  • [5] X. Ding, Y. Zhang, T. Liu, and J. Duan. Deep learning for event-driven stock prediction. In Proceedings of the 24th International Joint Conference on Artificial Intelligence, pages 2327–2333, 2015.
  • [6] X. Ding, Y. Zhang, T. Liu, and J. Duan. Knowledge-driven event embedding for stock prediction. In Proceedings of the 26th International Conference on Computational Linguistics, pages 2133–2142, 2016.
  • [7] Q. Gao. Stock market forecasting using recurrent neural network. PhD thesis, University of Missouri–Columbia, 2016.
  • [8] M. Göçken, M. Özçalıcı, A. Boru, and A. T. Dosdoğru. Integrating metaheuristics and artificial neural networks for improved stock price prediction. Expert Systems with Applications, 44:320–331, 2016.
  • [9] J. Gorski, F. Pfeuffer, and K. Klamroth. Biconvex sets and optimization with biconvex functions: a survey and extensions. Mathematical Methods of Operations Research, 66(3):373–407, 2007.
  • [10] M. Hagenau, M. Liebmann, and D. Neumann. Automated news reading: Stock price prediction based on financial news using context-capturing features. Decision Support Systems, 55(3):685–697, 2013.
  • [11] L. Jiang, D. Meng, Q. Zhao, S. Shan, and A. Hauptmann. Self-paced curriculum learning. In Proceedings of the 29th AAAI Conference on Artificial Intelligence, pages 2694–2700, 2015.
  • [12] K.-J. Kim and H. Ahn. Simultaneous optimization of artificial neural networks for financial forecasting. Applied Intelligence, pages 887–898, 2012.
  • [13] M. P. Kumar, B. Packer, and D. Koller. Self-paced learning for latent variable models. 1:1189–1197, 2010.
  • [14] L. A. Laboissiere, R. A. Fernandes, and G. G. Lage. Maximum and minimum stock price forecasting of brazilian power distribution companies based on artificial neural networks. Applied Soft Computing, 35:66–74, 2015.
  • [15] L. Li, S. Leng, J. Yang, and M. Yu. Stock market autoregressive dynamics: A multinational comparative study with quatile regression. In Mathematical Problems in Engineering, 2016.
  • [16] X. Li, H. Xie, L. Chen, J. Wang, and X. Deng. News impact on stock price return via sentiment analysis. Knowledge-Based Systems, 69(1):14–23, 2014.
  • [17] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
  • [18] A. K. Nassirtoussi, S. Aghabozorgi, T. Y. Wah, and D. C. L. Ngo. Text mining of news-headlines for forex market prediction: A multi-layer dimension reduction algorithm with semantics and sentiment. Expert Systems with Applications, 42(1):306–324, 2015.
  • [19] R. K. Nayak, D. Mishra, and A. K. Rath. A naive svm-knn based stock market trend reversal analysis for indian benchmark indices. In Applied Soft Computing, 2015.
  • [20] W. Nuij, V. Milea, F. Hogenboom, F. Frasincar, and U. Kaymak. An automated framework for incorporating news into stock trading strategies. IEEE Transactions on Knowledge and Data Engineering, 2(11):823–835, 2014.
  • [21] J. Patel, S. Shah, P. Thakkar, and K. Kotecha. Predicting stock market index using fusion of machine learning techniques. Expert Systems with Applications, 42(4):2162–2172, 2015.
  • [22] A. M. Rather, A. Agarwal, and V. Sastry. Recurrent neural network and a hybrid model for prediction of stock returns. Expert Systems with Applications, 42(6):3234–3241, 2015.
  • [23] J. Si, A. Mukherjee, B. Liu, Q. Li, H. Li, and X. Deng. Exploiting topic based twitter sentiment for stock prediction. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 24–29, 2013.
  • [24] P. C. Tetlock. Giving content to investor sentiment: The role of media in the stock market. Journal of Finance, 62(3):1139–1168, 2007.
  • [25] J. L. Ticknor. A bayesian regularized artificial neural network for stock market forecasting. Expert Systems with Applications, 40(14):5501–5506, 2013.
  • [26] W. Y. Wang and Z. Hua. A semiparametric gaussian copula regression model for predicting financial risks from earnings calls. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pages 1155–1165, 2014.
  • [27] B. Xie, R. J. Passonneau, L. Wu, and G. G. Creamer. Semantic frames to predict stock price movement. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 873–883, 2013.
  • [28] L. Zhang, C. Aggarwal, and G.-J. Qi. Stock price prediction via discovering multi-frequency trading patterns. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 2141–2149, 2017.
  • [29] Z. Zhou, J. Zhao, and K. Xu. Can online emotions predict the stock market in china? In Proceedings of the 17th International Conference on Web Information Systems Engineering, pages 328–342. Springer, 2016.
Comments 0
Request Comment
You are adding the first comment!
How to quickly get a good reply:
  • Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
  • Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
  • Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters
Add comment
Loading ...
This is a comment super asjknd jkasnjk adsnkj
The feedback must be of minumum 40 characters
The feedback must be of minumum 40 characters

You are asking your first question!
How to quickly get a good answer:
  • Keep your question short and to the point
  • Check for grammar or spelling errors.
  • Phrase it like a question
Test description