Using Twitter to Model the EUR/USD Exchange Rate
Fast, global, and sensitive to political, economic and social events of any kind: these are attributes that social media such as Twitter share with foreign exchange markets. Does the former allow us to predict the latter? The leading assumption of this paper is that time series of Tweet counts have predictive content for exchange rate movements. This assumption prompted a Twitter-based exchange rate model that harnesses regARIMA analyses for short-term out-of-sample ex post forecasts of the daily closing prices of EUR/USD spot exchange rates. The analyses made use of Tweet counts collected from January 1, 2012 to September 27, 2013 via the Otter API of topsy.com. To identify concepts mentioned on Twitter with predictive potential, the analysis followed a 2-step selection. Firstly, a heuristic qualitative analysis assembled a long list of 594 concepts (e.g., Merkel, Greece, Cyprus, crisis, chaos, growth, unemployment) expected to covary with the ups and downs of the EUR/USD exchange rate. Secondly, cross-validation using window averaging with a fixed-size rolling origin was deployed to select concepts and corresponding univariate time series whose error scores were below chance level, as defined by the random walk model, which is based only on the EUR/USD exchange rate. With regard to a short list of 17 concepts (covariates), in particular SP (Standard & Poor’s) and risk, the out-of-sample predictive accuracy of the Twitter-based regARIMA model was found to be repeatedly better than that obtained from both the random walk model and a random noise covariate in 1-step ahead forecasts of the EUR/USD exchange rate. This advantage was evident on the level of forecast error metrics (MSFE, MAE) when a majority vote over different estimation windows was conducted.
The results challenge the semi-strong form of the efficient market hypothesis (Fama, 1970, 1991), which, when applied to the FX market, maintains that all publicly available information is already integrated into exchange rates.
The foreign exchange market is known to be the largest financial market in the world. Over the last decades, “the FX trading volume has exploded reflecting an electronic revolution that has lowered trading costs, attracted new groups of market participants, and enabled aggressive new trading strategies” (King et al., 2011, p. 3). In this market, the EUR/USD exchange rate[1] is currently considered to be the most important currency pair. When wars, disasters, economic decline, high unemployment rates or elections in major countries are looming, the EUR/USD exchange rate often reacts like a global crisis barometer. Usually, these reactions, like exchange rate movements in general, can be well explained in hindsight, but over short time periods they are hard to predict above chance level. In a recent comprehensive review of the vast, but not always satisfying, work on exchange rate modeling, Barbara Rossi underscores that success in this area depends to a large degree on the choice of predictors (Rossi, 2013). The work described in this paper has been sparked by the question whether such predictors can be distilled from public discussions on the microblogging platform Twitter. Information harvested from Twitter has already become a promising source of data for predictive modeling of many real-world phenomena (Lică and Tută, 2011). Areas modeled are as diverse as box-office revenues for movies (Asur and Huberman, 2010), disease activity of influenza-type illnesses (Signorini et al., 2011), stock market indicators like the Dow Jones Industrial Average (Bollen et al., 2010) or political elections (Tumasjan et al., 2010). More closely related to the present work is a recent study by Papaioannou et al. (2013), who used data sourced from Twitter between 25/10/2010 and 05/01/2011 to model the EUR/USD exchange rate on a high-frequency intraday trading scale.

[1] The EUR/USD spot exchange rate is stated as U.S. Dollars per Euro.
The work introduced in this paper shares the general research goal pursued by Papaioannou et al. (2013). It differs from this study, however, insofar as it looks at the influence of public debates on the EUR/USD exchange rate. Accordingly, the present study makes use of a less fine-grained temporal resolution (daily closing prices of EUR/USD spot exchange rates) studied over a longer time span (21 months). The more specific goal of this paper is to address the question whether and to what extent information extracted from public debates on Twitter facilitates prediction of the EUR/USD exchange rate.
Being global and sensitive to small changes in the international news agenda, Twitter seems to fit the metaphor of a global crisis barometer. Can this metaphor be turned into a predictive model of the EUR/USD exchange rate? Browsing the research on predictive modeling via data from social media, three requirements of the domain to be modeled spring to mind. Firstly, all areas modeled should provoke intensive public discussions on social media platforms. Actually acquiring these data may be non-trivial or costly, but there should not be any shortage of data available for modeling. Secondly, for each area to be modeled, an unequivocal numerical measure external to the Internet, e.g., revenues, stock exchange rates or results of a general election, is essential. This requirement emphasizes that a variable to be predicted is an indispensable part of any forecasting setup. Thirdly, social media usually generate an overabundance of data. Twitter, for instance, is a hub for half a billion text messages a day (Goel, 2013). This sheer volume means that finding information that facilitates the chosen modeling task is challenging. The third requirement for successful modeling via data from social media is therefore the selection of information with explanatory or predictive potential. For instance, Papaioannou et al. (2013) addressed the selection problem by searching via the Archivist API for Tweets that used the expression “buy EUR/USD”.
The three requirements discussed above help to narrow down conditions under which Twitter can be deployed to model the EUR/USD exchange rate. Firstly, Twitter provides a global platform where news, opinions, comments are exchanged on an unprecedented scale. Often, this type of information varies with the volatility of markets including the foreign exchange markets. Secondly, the EUR/USD exchange rate is obviously a numerical measure amenable to modeling. Whether or not data gleaned from Twitter can fulfill the third requirement, i.e., selection of information with explanatory or predictive potential, is an open question that merits a more detailed discussion.
The US Dollar and the Euro are mentioned in a large number of Tweets that do not seem to have any relationship with the EUR/USD exchange rate, e.g., advertisements. Conversely, there is a large number of Tweets that do not mention the US Dollar or the Euro explicitly but which may be connected with the EUR/USD exchange rate, e.g., mentions of unemployment figures in the US or a crisis in an EU country. Even though the information found on Twitter is often nonsensical or simply unrelated to foreign exchange markets, it can be assumed that a sizeable number of the messages communicated on Twitter include information that bears some relationship with the foreign exchange markets. Following this assumption, the question at stake is which information exchanged on Twitter has predictive potential with regard to variables of the foreign exchange market. If this type of information is available on Twitter, it needs to be identified using appropriate selectors. What could be such an appropriate selector? The research described in this paper is guided by the assumption that on Twitter the selection of variables with predictive content for the EUR/USD exchange rate is achievable by referring to the concepts used in discussions of the Euro crisis. This is a specific focus, and it has to be specific to work as a selector for Tweets. At the same time, this focus does not in itself rule out that data elicited in this way may also reflect positive developments in the Euro zone. This holds as long as there are some discussions on the Euro crisis on Twitter; otherwise, there is no variance in the data and prediction becomes impossible. It is conjectured that intensive discussions of the Euro crisis are associated with a bearish Euro and a bullish US Dollar. Conversely, if the discussion on any crisis in the Euro zone is losing momentum, e.g., because of positive news from the Euro zone, then the value of the Euro relative to the US Dollar can be expected to rise.
Furthermore, it is assumed that such discussions on the Euro crisis are not only associated with the EUR/USD exchange rate but bear potential to predict it.
The assumption that there exists information on Twitter (Tweets) that facilitates predicting variables of interest is termed here the hypothesis of sufficient predictive information. This hypothesis has been inspired by the conjecture that Twitter is a fast, efficient and globally operating aggregator of news and opinions. Using information from Twitter to forecast foreign exchange rates with more accuracy than the random walk model (RW) is considered to be supporting evidence in favor of the hypothesis of sufficient predictive information. A competing approach to the prediction of foreign exchange rates, which leads to opposing predictions when applied to Twitter, is the well-known efficient market hypothesis (EMH). There are three forms of the EMH (weak, semi-strong and strong), and in this paper the focus is on the semi-strong form. It maintains that investors are rational and swiftly pick up all market-relevant information, so that the exchange rate fully reflects the information that is publicly available. According to this hypothesis, only truly new information, which occurs at random, can affect the markets. A corollary of the efficient market hypothesis is that the random walk model predicts market prices best. The hypothesis is typically applied to the analysis of stock markets but is also common in the analysis of foreign exchange markets. However, the question of market efficiency has sparked a number of controversial debates (e.g., Lee and Sodoikhuu, 2012).
The position maintained in this paper is that the efficient market hypothesis is sufficiently general to put information gleaned from Twitter into perspective as well (see also Papaioannou et al., 2013). The Twitter-based information considered in this study consists of Tweet counts. This type of information is not as easily accessible as information taken from, e.g., newspapers, but it is publicly available. The semi-strong form of the efficient market hypothesis predicts that discussions on Twitter, like any other market-relevant publicly available information, are rapidly arbitraged away and impounded into exchange rates. A truly efficient foreign exchange market would mean that discussions on Twitter have no predictive value for exchange rates. Still, harnessing Twitter for prediction would be methodologically naïve without considering both the potential and the limits of the fast and dynamically changing character of discussions on this microblogging platform. On the one hand, data gleaned from Twitter may facilitate predictions of variables of interest. In fact, the very existence of Twitter is a challenge to the efficient market hypothesis, as it offers real-time data with possible predictive power for foreign exchange markets and other markets. On the other hand, the predictive power of concepts mentioned on Twitter is expected to vary across time and may easily be subject to structural breaks. The research question addressed in this study is whether, and under what conditions, the hypothesis of sufficient predictive information (operating on data gleaned from Twitter) or the efficient market hypothesis facilitates better predictions of the EUR/USD exchange rate. The conditions to be examined include the identification of concepts that facilitate prediction of the EUR/USD exchange rate and the expected time-dependence of their predictive power.
To answer this question, a horse race between regARIMA models and the random walk model has been conducted. The models are nested, in that the regARIMA model is an extension of the parsimonious random walk model. The former is taken to model the hypothesis of sufficient predictive information, the latter models the efficient market hypothesis. The mean square forecast error (MSFE) and the mean absolute error (MAE) of the regARIMA model relative to those of the random walk model are used as loss functions and criteria of forecast accuracy. Akaike’s information criterion (AIC, Akaike, 1974) and the Bayesian information criterion (BIC, Schwarz, 1978) are harnessed as measures of model fit. In what follows, the expression “outperforming concepts” refers to concepts which, when used as covariates in a regARIMA model, lead to smaller forecast error scores (MSFE, MAE) than the random walk model. Seen from this vantage point, this study is part of a strand of research in exchange rate modeling that attempts to ‘beat’ the random walk model (MacDonald and Taylor, 1994; Lisi and Medio, 1997; Kilian and Taylor, 2003; Hong et al., 2007; Rossi, 2013). At the same time, however, this study differs from this research in that the exchange rate model proposed makes use of data gathered from social media. The analysis will reveal whether or not there are outperforming concepts talked about on Twitter.
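The relative loss criteria described above can be made concrete with a short Python sketch. It assumes that the actual closing prices and the one-step forecasts of both models are already aligned in equal-length sequences. The function names are illustrative and not taken from the study. A ratio below 1 means the regARIMA model produced smaller errors than the random walk, which is what the text calls an outperforming concept.

```python
from typing import Dict, Sequence


def msfe(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean square forecast error over aligned one-step forecasts."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def mae(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute error over aligned one-step forecasts."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def relative_to_random_walk(
    actual: Sequence[float],
    model_fc: Sequence[float],
    rw_fc: Sequence[float],
) -> Dict[str, float]:
    """Ratios model error / random walk error; values below 1 favor the model.

    Raises ZeroDivisionError if the random walk forecasts are error-free.
    """
    return {
        "MSFE": msfe(actual, model_fc) / msfe(actual, rw_fc),
        "MAE": mae(actual, model_fc) / mae(actual, rw_fc),
    }
```

Requiring both ratios to fall below 1 is the stricter reading of "smaller forecast error scores (MSFE, MAE)". A concept could also beat the random walk on only one of the two metrics.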
The remainder of this article is organized as follows. To motivate the Twitter-based approach to exchange rate modeling, section 2 discusses the relationship between the EUR/USD exchange rate and public discussions. Section 3 spells out the theoretical background and the econometric models used in this study. In section 4, a 5-step approach to data elicitation from Twitter and feature extraction is described. Section 5 presents the results of this study. The paper concludes with a discussion that relates the findings of this study to the hypothesis of sufficient predictive information and the efficient market hypothesis.
2 Exchange Rates and Public Communication
The existence of a link between communication and exchange rates is well known to observers of the FX market. Attempts by politicians or high-ranking bank officials to talk currencies up or down via the news are a typical example of instrumentalizing this link. In this study, however, the focus is on Twitter-based public discussions around currencies, irrespective of whether they have been prompted by politicians, bank officials or any other party. Does the general public discuss the EUR/USD exchange rate? If such a discussion exists, does it influence exchange rates? Indirectly, the vast literature on the EUR/USD exchange rate seems to give a negative answer to these questions. The majority of the academic work on exchange rates does not consider discussions of the general public but focuses on macro-fundamentals and, to a smaller degree, on non-fundamentals, e.g., news or sentiments. What the general public discusses has attracted far less attention among researchers of exchange rates. The small number of studies that do exist on public discussions and exchange rates examine the role of media or institutions of the finance sector on exchange rates (Thompson, 2009) or examine tactical public communications of politicians, but these studies do not work towards exchange rate modeling (Bracke et al., 2008; Weiss and Kemper, 2011). Hence, it is an open question whether and to what degree the ups and downs of the EUR/USD exchange rate are explicitly discussed by the general public. But it seems safe to assume that the state of the economy in the United States, in the Euro zone and beyond matters to many people (Bruegger and Knorr-Cetina, 2002). A large number of economic topics the general public is concerned with, e.g., real income, unemployment, debts, prices, rents, wages and housing, are known to be closely related to exchange rates. In fact, many of these topics directly relate to economic fundamentals.
Until recently, there was hardly any platform available where people outside financial expert communities could articulate their thoughts and sentiments related to these topics. The Internet, in particular the microblogging service Twitter, has changed this. Used by non-professionals and professionals alike, Twitter offers a global platform for discussing a huge variety of topics, including those which directly or indirectly concern the EUR/USD exchange rate. In this study, the intensity of public debates on Twitter will be measured by the number of Tweets that include one or more of the concepts considered to be indicative for the EUR/USD exchange rate. The units of interest in this study are central concepts in public Twitter-based discussions on the Euro crisis, on the conjecture that this debate is a viable proxy for discussions on the EUR/USD exchange rate. It is expected that some of these concepts have the potential to predict the EUR/USD exchange rate. In order to test this assumption, the concepts typically used in discussions of the Euro crisis have to be identified. Discussions on the Euro crisis can be described as a system of narratives (Propp, 1968), which in turn provides a heuristic approach for selecting concepts on Twitter that are possibly related to the EUR/USD exchange rate.
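The measurement of debate intensity can be illustrated as follows. The study obtained its counts from the Otter API of topsy.com. Purely for illustration, the sketch below instead assumes a local collection of (date, text) pairs and counts, per day, the Tweets that mention at least one monitored n-gram. Whole-word, case-insensitive matching is an assumption, not a documented detail of the study.

```python
import re
from collections import Counter
from datetime import date
from typing import Iterable, Sequence, Tuple


def daily_concept_counts(
    tweets: Iterable[Tuple[date, str]], ngrams: Sequence[str]
) -> Counter:
    """Count, per day, the Tweets mentioning at least one of the given n-grams."""
    if not ngrams:
        raise ValueError("at least one n-gram is required")
    # One case-insensitive, whole-word alternation over all n-grams (an assumption;
    # \b may behave unexpectedly for n-grams starting or ending in punctuation).
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(g) for g in ngrams) + r")\b", re.IGNORECASE
    )
    counts: Counter = Counter()
    for day, text in tweets:
        # A Tweet contributes at most 1, however many n-grams it contains.
        if pattern.search(text):
            counts[day] += 1
    return counts
```

Counting Tweets rather than n-gram occurrences matches the phrasing "number of Tweets that include one or more of the concepts". A Tweet repeating "Cyprus" twice still counts once.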
Is the focus on concepts of the Euro crisis overly specific in that other determinants of the EUR/USD exchange rate, e.g., economic problems in the US, are left out? Such reasoning overlooks that public discussions today take place in a competitive attention economy (Davenport and Beck, 2001; Falkinger, 2008). This means that, on the assumption that public global discourse is not massively manipulated, the Euro crisis or any other major theme that enters the global news agenda will be discussed relative to other topics. Thus, even with a clear focus on concepts of the Euro crisis, this approach does not in itself rule out other sources of influence on the EUR/USD exchange rate. For instance, intensive global discussions of news related specifically to the US, e.g., a government shutdown, may reduce the global attention devoted to the Euro crisis. Whether and to what degree an event actually takes public attention away from another event can be conceived of as the result of a global voting process, which is expected to manifest itself on Twitter and other social media.
3 Theoretical Background
The conceptual framework used for conducting a horse race between the efficient market hypothesis and the hypothesis of sufficient predictive information was the autoregressive integrated moving average (ARIMA) methodology (Box and Jenkins, 1976). The ARIMA models corresponding to the efficient market hypothesis on the one hand and to the hypothesis of sufficient predictive information on the other are the random walk model and regression with ARIMA errors, or regARIMA model, respectively (equations 1 and 6). The decision between the competing models is made on the basis of in-sample information criteria (AIC, BIC) and out-of-sample forecast error measures (MAE, MSFE) using time series cross-validation (Arlot and Celisse, 2010; Hyndman, 2010).
3.1 Random Walk Model
The first and simplest model to be examined is the univariate random walk model. According to this model, the value $y_t$ of a time series hinges only on its predecessor $y_{t-1}$ and a random process $\varepsilon_t$:

$$y_t = y_{t-1} + \varepsilon_t \qquad \text{or, in backshift notation,} \qquad (1 - B)\,y_t = \varepsilon_t \qquad (1)$$

A number of authors found little evidence that exchange rate movements (EUR/USD, ECU/USD) do not follow the random walk model (Chen, 2011; Newbold et al., 1998). In this study, the random walk model without drift has been chosen as it is the toughest benchmark to beat (Rossi, 2013; Meese and Rogoff, 1983).
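The random walk model is a special case of the autoregressive model of order $p$, abbreviated AR($p$). This model uses $p$ lagged versions of the forecast variable for prediction:

$$y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t \qquad (2)$$

If $p = 1$ and $\phi_1 = 1$, then the autoregressive model is equivalent to the random walk model.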
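The one-step-ahead forecasts implied by equations 1 and 2 fit in a few lines of Python. This minimal sketch takes the AR coefficients as given, without estimating them. It shows that AR(1) with $\phi_1 = 1$ reproduces the random walk forecast, i.e., the last observed value. The function names are hypothetical.

```python
from typing import Sequence


def random_walk_forecast(history: Sequence[float]) -> float:
    """One-step forecast of a random walk without drift (eq. 1): the last value."""
    if len(history) == 0:
        raise ValueError("history must not be empty")
    return history[-1]


def ar_forecast(history: Sequence[float], phi: Sequence[float]) -> float:
    """One-step forecast of an AR(p) model without constant (eq. 2).

    phi[0] multiplies y_T, phi[1] multiplies y_{T-1}, and so on.
    """
    if len(phi) == 0 or len(history) < len(phi):
        raise ValueError("need at least one coefficient and p observations")
    return sum(coef * history[-lag] for lag, coef in enumerate(phi, start=1))
```

For example, `ar_forecast(y, [1.0])` and `random_walk_forecast(y)` return the same value for any non-empty `y`.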
3.2 regARIMA Model
The second type of model harnessed to predict and to explain the EUR/USD exchange rate is the multivariate regARIMA model (Hyndman and Athanasopoulos, 2012). This type of time series method incorporates one or several exogenous series (regressors) into an ARIMA framework to assess whether the marginal explanatory power of the regressor(s) used is larger than that of a pure autoregressive model.[2] A regARIMA model can be conceived of as a generalization of either a regression or an ARIMA model and constructed accordingly. Suppose model development starts with the regression part of a regARIMA model. Let $y_t$, $t = 1, \dots, T$, be a time series of length $T$ modeled via an OLS regression with $k$ predictor variables $x_{1,t}, \dots, x_{k,t}$, denoted as

$$y_t = \beta_0 + \beta_1 x_{1,t} + \dots + \beta_k x_{k,t} + \eta_t \qquad (3)$$

[2] In this study, all regARIMA models considered after completion of the feature selection process are bivariate or uni-covariate. Each of them ties in one exogenous time series, which is the Tweet count of one concept used in the discourse of the Euro crisis.
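The term $\eta_t$ refers to the residuals or regression errors, i.e., the difference between a score $y_t$ to be predicted and its model prediction $\hat{y}_t$, with

$$\eta_t = y_t - \hat{y}_t, \qquad \hat{y}_t = \beta_0 + \sum_{i=1}^{k} \beta_i x_{i,t} \qquad (4)$$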
The regression errors $\eta_t$ themselves form a univariate time series, which typically includes correlated residuals. This is at odds with the assumptions of OLS regression, as autocorrelated errors lead to biased standard errors and thus to unreliable inference about the parameters. At this point, the ARIMA part of a regARIMA model enters the game, because correlated scores that cause problems in an OLS regression model can well be accounted for in an ARIMA($p$,$d$,$q$) model. To specify the parameters of the ARIMA($p$,$d$,$q$) model such that it fits the time series of residuals best, a number of methods are available, e.g., analysis of the patterns of (partial) autocorrelation of $\eta_t$ or the automated procedure for optimal ARIMA selection (Hyndman and Khandakar, 2008). If, for instance, an appropriate model identification procedure indicates that a non-seasonal ARIMA(1,1,1) model accounts for the time series of residuals best, then this model can be written in succinct backshift notation as

$$(1 - \phi_1 B)(1 - B)\,\eta_t = (1 + \theta_1 B)\,\varepsilon_t \qquad (5)$$

where $B$ is the backshift operator ($B\eta_t = \eta_{t-1}$), and $\phi_1$ and $\theta_1$ are the autoregressive and moving average coefficients.
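While the error term $\eta_t$ of the regression analysis occupies the slot of the time series, the error term $\varepsilon_t$ of this ARIMA model is expected to be a Gaussian white noise process (Hyndman and Athanasopoulos, 2012). To make the forecast variable $y_t$, the regressors $x_{i,t}$ and their (fixed) coefficients $\beta_i$ visible, the residual term $\eta_t$ in equation 5 is to be replaced by $y_t - \beta_0 - \sum_{i=1}^{k} \beta_i x_{i,t}$. Then, the regARIMA model can be re-expressed as

$$(1 - \phi_1 B)(1 - B)\left(y_t - \beta_0 - \sum_{i=1}^{k} \beta_i x_{i,t}\right) = (1 + \theta_1 B)\,\varepsilon_t \qquad (6)$$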
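To show how equation 6 turns into a forecast, the following sketch computes a one-step-ahead forecast from a regression with ARIMA(1,1,1) errors and a single covariate ($k = 1$), as in the uni-covariate models of this study. It assumes the parameters are already known; in the study they were selected and estimated by auto.arima(). Innovations are recovered recursively with pre-sample values set to zero. This conditional approximation is not necessarily what R's forecast package does internally. The covariate value at the forecast date is treated as known, which corresponds to ex post forecasting. The function name is hypothetical.

```python
from typing import Sequence


def regarima_111_forecast(
    y: Sequence[float],
    x: Sequence[float],
    x_next: float,
    beta0: float,
    beta1: float,
    phi1: float,
    theta1: float,
) -> float:
    """One-step forecast of y_t = beta0 + beta1 * x_t + eta_t, where
    (1 - phi1 B)(1 - B) eta_t = (1 + theta1 B) eps_t (eqs. 3, 5 and 6, k = 1)."""
    if len(y) != len(x) or len(y) < 2:
        raise ValueError("y and x must have equal length of at least 2")
    # Regression errors eta_t = y_t - beta0 - beta1 * x_t (eq. 4).
    eta = [yt - beta0 - beta1 * xt for yt, xt in zip(y, x)]
    # w_t = (1 - B) eta_t follows an ARMA(1,1) process.
    w = [later - earlier for earlier, later in zip(eta, eta[1:])]
    # Recover innovations eps_t recursively; pre-sample w and eps are set to 0.
    prev_w, prev_eps = 0.0, 0.0
    for wt in w:
        prev_eps = wt - phi1 * prev_w - theta1 * prev_eps
        prev_w = wt
    # Forecast the differenced error, then undo the differencing.
    w_next = phi1 * prev_w + theta1 * prev_eps
    eta_next = eta[-1] + w_next
    return beta0 + beta1 * x_next + eta_next
```

With `beta1 = phi1 = theta1 = 0`, the forecast collapses to the last observed value. This illustrates the nesting of the random walk model within the regARIMA model.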
3.3 Diagnostic Checking
By its very nature, regARIMA modeling involves diagnostic tests that are motivated both by the regression part and the time series part of this model. Firstly, to check whether the requirements of the regression part of the regARIMA model were fulfilled, variance inflation was examined. This was achieved by calculating variance inflation factors (VIF), thereby testing for collinearity of the covariates (Fox, 1997). High VIF scores indicate collinearity due to redundant explanatory variables, and variables with high VIF scores were discarded. Secondly, diagnostic checking for the time series part was addressed. Typically, this stage of the analysis involves a visual examination of the residuals from the tentatively entertained model, as the distribution of residuals can reveal problems of the model applied. In this study, however, numerous time series segments (estimation windows) were analyzed for each of the component time series considered. This followed from the fixed-size rolling window approach used for cross-validation (see below). Thus, visually examining autocorrelation or partial autocorrelation plots was not an option. Instead, for each concept and for each estimation window, the auto.arima() function of the R forecast package (Hyndman and Khandakar, 2008) was applied as part of each of the forecasting runs conducted. This function uses unit-root tests, seasonal unit-root tests and a collection of standard model selection criteria, e.g., AIC, to identify the regARIMA model parameters p, d and q that explain the data best.
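The collinearity check can be sketched with NumPy. For each covariate $j$, it is regressed on all other covariates plus an intercept. The resulting coefficient of determination $R_j^2$ then gives $\mathrm{VIF}_j = 1 / (1 - R_j^2)$. The text does not state the cut-off used for discarding a covariate; values above 5 or 10 are common rules of thumb. The function name is illustrative.

```python
from typing import List

import numpy as np


def variance_inflation_factors(X) -> List[float]:
    """VIF_j = 1 / (1 - R_j^2) for every column j of the n x k matrix X."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    if k < 2:
        raise ValueError("VIFs need at least two covariates")
    vifs: List[float] = []
    for j in range(k):
        target = X[:, j]
        ss_tot = float(((target - target.mean()) ** 2).sum())
        if ss_tot == 0.0:
            raise ValueError(f"covariate {j} is constant")
        # Regress covariate j on an intercept and all remaining covariates.
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        r_squared = 1.0 - float(resid @ resid) / ss_tot
        # Very large values signal (near-)perfect collinearity.
        vifs.append(1.0 / (1.0 - r_squared))
    return vifs
```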
For each of the component time series analyzed, the regARIMA model parameters suggested by auto.arima() for each estimation window were used for model fitting. Likewise, according to the output of auto.arima(), each estimation segment has been differenced, detrended and seasonally adjusted. To double-check whether the adjustments made actually removed time series anomalies, standard residual-based diagnostic tests have been carried out for each estimation window. These include the White test for nonlinearity (Lee et al., 1993), the Ljung-Box portmanteau test to check the independence of the residuals, and the Augmented Dickey-Fuller and Phillips-Perron tests to examine the stationarity of the residuals via unit-root testing. Anomalies usually mean that the model performance is seriously affected or that the model does not converge. In the majority of cases, the test results confirmed that the adjustments made removed the anomalies tested. Computation of the model prediction, either via the random walk model or the regARIMA model, was skipped for a segment whenever an anomaly was detected. In this case, the mean forecast errors (MAE, MSFE) and mean information criteria scores (AIC, BIC) for a component time series were calculated without the anomalous segment. This applied to less than 2% of all estimation segments examined. In all other cases, the fitted model was used for one-step ahead forecasting. For the next estimation window, auto.arima() was again harnessed to identify the ARIMA model until the end of the component time series was reached.
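The fixed-size rolling-origin procedure described in this section can be summarized as a loop. In the sketch below, `forecaster` stands in for a model re-fitted per window (auto.arima() in the study, the random walk as benchmark). `is_anomalous` stands in for the residual diagnostics. Both are hypothetical callables supplied by the caller. Skipped windows are counted but contribute no error, mirroring the handling of anomalous segments described above.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# forecaster(y_window, x_window, x_next) -> one-step-ahead forecast of y
Forecaster = Callable[[Sequence[float], Sequence[float], float], float]
# is_anomalous(y_window, x_window) -> True if diagnostics flag the window
AnomalyCheck = Callable[[Sequence[float], Sequence[float]], bool]


def rolling_origin_errors(
    y: Sequence[float],
    x: Sequence[float],
    window: int,
    forecaster: Forecaster,
    is_anomalous: Optional[AnomalyCheck] = None,
) -> Tuple[List[float], int]:
    """One-step forecast errors from a fixed-size rolling window.

    Returns the list of errors and the number of skipped windows.
    """
    if len(y) != len(x):
        raise ValueError("y and x must have equal length")
    if not 1 <= window < len(y):
        raise ValueError("window must be at least 1 and shorter than the series")
    errors: List[float] = []
    skipped = 0
    for start in range(len(y) - window):
        end = start + window  # estimation window is y[start:end]; target is y[end]
        y_win, x_win = y[start:end], x[start:end]
        if is_anomalous is not None and is_anomalous(y_win, x_win):
            skipped += 1  # anomalous segment: no forecast, no error
            continue
        errors.append(y[end] - forecaster(y_win, x_win, x[end]))
    return errors, skipped


def summarize(errors: Sequence[float]) -> Tuple[float, float]:
    """Return (MAE, MSFE) of the collected one-step errors."""
    if not errors:
        raise ValueError("no errors to summarize")
    n = len(errors)
    return sum(abs(e) for e in errors) / n, sum(e * e for e in errors) / n
```

Passing `lambda ys, xs, x_next: ys[-1]` as `forecaster` yields the random walk benchmark. Running the same loop with a regARIMA forecaster on the same windows keeps the two error series directly comparable.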
3.4 Randomized Data
What could have been achieved by just guessing randomly? Computer-generated random “guesses” provide reference scores that help to answer this question. They are essential in work on modeling, as they allow modelers and the scientific community to put the results of the model tested (alternative model) into perspective. Reference scores can be derived from some sort of random model (null model), or they can be based on randomized data. The random walk model introduced above is an example of the first strategy. In addition, reference scores have been generated by randomizing data. In this work, both strategies for generating reference scores have been deployed, as this increases the chances of detecting false positives.
Usage of randomized data as a validation strategy comes in many forms. Examples in point are studies that create randomized data to facilitate direct comparison with original data (e.g., Sato and Takayasu, 2013). Other approaches proceed by repeatedly shuffling the original data, thereby approximating the null distributions to develop significance tests in the sense of Theiler et al. (1992). The work in this paper follows the first type of approach in that a random sample of the original data has been used to establish a noise covariate Rand as a reference score that is instrumental in rejecting false positives (Flack and Chang, 1987). The covariate Rand has been generated by sampling with replacement from the values (Tweet counts) of all remaining 17 “true” covariates considered (e.g., bank, risk). As a consequence, the final error metrics (MAE, MSFE) and information criteria (AIC, BIC) were calculated for 17 + 1 = 18 predictors or covariates.
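The construction of the noise covariate Rand can be sketched as follows. The sketch assumes that the Tweet counts of the 17 covariates are pooled before sampling with replacement, and that the resulting series has the same length as the other covariates. The fixed seed is an illustrative choice for reproducibility, not a detail reported in the study.

```python
import random
from typing import List, Sequence


def make_noise_covariate(
    covariate_counts: Sequence[Sequence[int]], length: int, seed: int = 0
) -> List[int]:
    """Draw `length` values with replacement from the pooled Tweet counts."""
    pool = [count for series in covariate_counts for count in series]
    if not pool:
        raise ValueError("no Tweet counts to sample from")
    if length < 0:
        raise ValueError("length must be non-negative")
    rng = random.Random(seed)  # local generator, reproducible for a fixed seed
    return rng.choices(pool, k=length)
```

Sampling with replacement keeps Rand on the same scale as real Tweet counts but destroys any temporal ordering. Any apparent predictive power of Rand therefore indicates how often chance alone produces an "outperforming" covariate.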
4 Data Selection and Feature Extraction
One of the major challenges when analyzing data from the Internet, e.g., from Twitter, consists of selecting appropriate data and then extracting features[3] correlated with or predictive of the forecast variable, e.g., the EUR/USD exchange rate. In the context of the efficient market hypothesis, this challenge has been referred to as the “messy problem of deciding what are useful information” (Fama, 1991, p. 1575). In this study, data selection and feature extraction followed a grow-and-shrink approach of 5 steps (see Table 1). Steps 1 and 2 make use of heuristic approaches to generate a list of 594 concepts that are plausible candidates, i.e., a long list, for predictors of the forecast variable (“grow”). Step 3 refers to data collection, i.e., the collection of frequencies of Tweets that mention concepts identified in the preceding step. Step 4 involves data preprocessing, feature selection and extraction, which leads to a short list of 17 concepts. Step 5 identifies concepts which, when used as covariates in regARIMA models, outperform the prediction of the random walk model. Next is a more detailed description of each of these steps.

[3] In this paper, the terms concepts, features and covariates have related, though distinguishable meanings. Concept is a term used to refer to n-grams monitored on Twitter, e.g., “loose spending”. More precisely, the number of Tweets mentioning one or more n-grams is recorded and analyzed. The term feature is used in situations when concepts are involved in feature extraction. Covariate is the term taken to refer to concepts as seen from the viewpoint of a regARIMA analysis.
| Step | Concept Elicitation and Feature Extraction | # |
|---|---|---|
| 1 | Generating a Seed List of Concepts | 65 |
| 2 | Extension of the Seed List by Human Analysts | 594 |
| 3 | Collection of Tweet Counts of Concepts Related to the Euro Crisis | |
| 4 | Data Preprocessing, Feature Selection and Extraction | 17 |
| 5 | Identifying Concepts with Repeatedly Low Forecast Errors | 2 |
Step 1: Generating a Seed Set of Concepts
The EUR/USD exchange rate is influenced by a large number of determinants, most of which are unstable or can be identified only post hoc. Which of these is associated with the EUR/USD exchange rate or even predicts it? Clearly, there is a vast number of public debates that could possibly influence the EUR/USD exchange rate. A plausible candidate of a public discussion that could have this potential is the global discussion on the Euro crisis. At the time of writing this article, the Euro crisis can be described as a narrative or a system of different and competing narratives[4] with far-reaching effects on foreign exchange markets, including the EUR/USD exchange rate (Stracca, 2013). In this work, collecting Tweets related to the Euro crisis is based on the following assumptions. The intensity of the public debates on the Euro crisis is associated with or even predictive of the EUR/USD exchange rate. Using a keyword-based approach, narratives about the Euro crisis can be described by recurring concepts which are used on Twitter and on other media to spread information about the Euro crisis. The language of the Euro crisis narratives includes concepts like, e.g., debts, protests, Germany, Greece. Clearly, any concept of a given language may become part of a Euro crisis narrative, and the language of the Euro crisis is bound to change to some degree. The Euro crisis can spread or change its symptoms, new financial or political threats may loom, sooner or later different politicians enter the political arena while others drop out of the public limelight, new financial instruments may be introduced or new linguistic ways of describing phenomena may emerge. But there are also concepts of the Euro crisis that seem to have a longer lifetime, e.g., names of countries. It is assumed that at a given point in time, a set of concepts of the Euro crisis with a high typicality can be defined.

[4] The British, French, German, Greek, etc. views of the Euro crisis.
Concepts of the Euro crisis narrative(s) need to be elicited and assembled into a seed list of concepts hypothesized to correlate with or be predictive of the scalar forecast variable, the EUR/USD exchange rate.[5]

[5] In this study, only English concepts have been considered.
Led by the aforementioned assumptions, a heuristic selection process has been used to assemble an initial seed list of candidate concepts. The discussion of these concepts on Twitter is expected to be predictive of the EUR/USD exchange rate. The candidate list of concepts should be large because all subsequent selection steps will lead to a reduction of this list. Errors of commission, i.e., including wrong concepts, can be fixed in subsequent steps of feature selection, but errors of omission cannot. To compile a seed list of concepts with predictive potential for the EUR/USD exchange rate (step 1 of Table 1), theories from narrative science were considered. The work of Vladimir Propp (1895 – 1970) was found to be useful in achieving this purpose. Propp analyzed the structure of more than 100 Russian folktales and identified a set of 31 recurrent components, which he called functions or narratemes, performed by 7 dramatis personae (Propp, 1968). According to Propp, the characters can be either fused or spread across different persons. There can be fewer functions or narratemes, but their sequence is always kept. Some of the narratemes always come in pairs, e.g., the interdiction and its violation. In addition to the narratemes, Propp suggests that there are 8 recurrent characters like the villain, the dispatcher, the magical helper and others. These structural elements have repeatedly been used to analyze narratives in fields as diverse as fairy tales, religious texts, political discourse (Pierce, 2008) and computational linguistics (Bod et al., 2012).
Table 3 shows in prose how Propp's narratemes can be mapped to the Euro crisis. This reconstruction indicates that many of the historical events of the Euro crisis seem to fit Propp's narratemes. It spans the narrative of the Euro crisis from absentation (to join the common currency, many European states give up their national currencies) through interdiction (the treaty of Maastricht stipulates that only member states with budget deficits of up to 3% of GDP are entitled to join the Euro) and violation of interdiction (many member states of the EU have budget deficits higher than the agreed-upon 3%) to guidance (guidance and recipes on how to overcome the crisis are offered by various parties). This reconstruction did not deploy all 31 narratemes of Propp's methodology. Firstly, at the time of conducting this study the Euro crisis was still ongoing. Secondly, a narrative analysis according to Propp does not have to use all narratemes. The narrative of the Euro crisis may be told differently in different countries; for instance, the role of the villain may be filled differently from country to country. The latter aspect is very important, as a significant part of the discussion on the Euro crisis is concerned with a competition between different versions of this narrative. Still, each of these competing narrative instances complies with a Proppian analysis. One way to retell the Euro crisis in Proppian terms is to conceive of the common people in Europe as the hero. Table 4 is simply a stripped-down version of Table 3, in which the prose account of the Euro crisis is reduced to keywords. The keyword-based version is not committed to a particular view of any of the Euro crisis narratives: whenever different instantiations of one role (e.g., hero, villain) were possible, each of them was used. The keyword-only reconstruction of the Euro crisis narrative(s) provided a seed list of 65 concepts.
Step 2: Extension of the Seed List by Human Analysts
The seed list of 65 concepts generated in step 1 covered essential concepts related to the Euro crisis. Intuitively, however, it seemed obvious that a large number of concepts was missing. In December 2011, a concept elicitation study was conducted to extend the seed list. Three male and two female native English-speaking computer science students of the National College of Ireland in Dublin, aged 19–23 years, took part in this study as part of their course requirements. The participants were presented with the seed list of concepts generated in step 1. They were told that the seed list was intended to become a comprehensive concept inventory on the Euro crisis, but since the list was obviously incomplete, their task was to provide additional concepts, which could be unigrams or n-grams. It was emphasized that a broad spectrum of concepts was required that should tap into economic, political, social, emotional or other aspects of the Euro crisis. The participants were asked to work individually. They were free to use any Internet resources of their choice, and they had 90 minutes to complete the task. The lists of concepts obtained from each student were pooled, typos were corrected and duplicates were removed. Different linguistic forms of the same concept, e.g., bank and banking, were kept, however. The participants used the abbreviation SP for Standard & Poor's, and hence this short form rather than the long version was deployed. The resulting list covered 529 concepts, which together with the 65 concepts obtained in step 1 summed up to a long list of 594 concepts. These concepts were used to elicit daily Tweet counts for each concept throughout the study (Table 5).
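The pooling step can be pictured with a minimal Python sketch. It is not the procedure actually used in the study: the function name `pool_concepts`, the typo-correction mapping and the case-insensitive matching of duplicates are assumptions made for illustration, and the concept lists are invented. Distinct forms such as bank and banking survive because only identical strings (after correction) are merged.

```python
def pool_concepts(seed, participant_lists, corrections):
    """Pool the seed list with participants' lists (hypothetical helper).

    seed: concepts from step 1, kept first and in their original order.
    participant_lists: one list of proposed concepts per participant.
    corrections: assumed mapping from misspelled to corrected forms.
    """
    pooled, seen = [], set()
    candidates = list(seed) + [c for lst in participant_lists for c in lst]
    for raw in candidates:
        concept = raw.strip()
        concept = corrections.get(concept, concept)  # fix known typos
        key = concept.lower()  # assumption: duplicates matched case-insensitively
        if key not in seen:  # "bank" and "banking" remain distinct keys
            seen.add(key)
            pooled.append(concept)
    return pooled


seed = ["debt", "Greece", "Merkel"]
students = [["banking", "bank", "Greese"], ["greece", "bank", "unemployment"]]
long_list = pool_concepts(seed, students, {"Greese": "Greece"})
print(long_list)  # ['debt', 'Greece', 'Merkel', 'banking', 'bank', 'unemployment']
```

In the study, the same kind of merge turned 65 seed concepts and 529 additional concepts into the long list of 594.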
Step 3: Collection of Tweet Counts of Concepts Related to the Euro Crisis
Concept usage on social media has repeatedly been shown to be indicative of variables of interest, both with regard to individuals (De Choudhury et al., 2013) and to groups or communities (Bryden et al., 2013). In this study, concept usage on Twitter has been harnessed as a social marker that reflects cognition and emotion related to the Euro crisis. From January 1, 2012 – September 27, 2013, Tweet counts of concepts used to discuss the Euro crisis were elicited on a daily basis via R (R Core Team, 2013) and an API of the Twitter search engine http://www.topsy.com (topsy.com has access to the full Twitter firehose). In this way, 594 automated search requests were submitted every day. Likewise, and for the same time interval, the EUR/USD exchange rate was sourced in an automated way from http://www.quandl.com/.
It is obvious that simply counting the Tweets that use a particular concept will generate misleading results. For instance, if the EUR/USD exchange rate is to be examined, then counting the Tweets that mention the concept Euro would return all Tweets that provide some price information in Euro. This type of retrieval would hardly separate Tweets that discuss the Euro currency from Tweets with everyday chitchat about prices. This is why the 594 concepts assembled in steps 1 and 2 were used as selectors to target the search at Tweets on the Euro crisis. Hence, each of the 594 daily requests to the Twitter search engine Topsy followed the pattern
Euro and Crisis and concept.
The data returned reflect the count of Tweets or Retweets that mention one or more of the concepts in connection with the Euro crisis. The data do not include information on the regional origin of Tweets or on the network structure among Twitter users. (Clearly, this procedure cannot completely rule out false positives or false negatives. It is expected, however, to filter out the large number of commercial offerings on Twitter that express prices in US Dollar or Euro.) All data elicited and used in this study, i.e., the EUR/USD exchange rate and the frequency of Tweets on the Euro crisis, are time series. The time series are evenly spaced, reflecting the rhythm of daily data collection.
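The request schedule can be sketched in Python. The query strings follow the pattern quoted above. The text does not give the Topsy request format or endpoint, so this sketch does not attempt to reproduce them: the hypothetical generator `daily_queries` only builds the search strings. The sketch also checks the arithmetic of the collection period. January 1, 2012 to September 27, 2013 spans 636 calendar days (2012 is a leap year), which amounts to 594 × 636 = 377,784 requests.

```python
from datetime import date, timedelta

# Search pattern quoted in the text; "{concept}" is filled with each concept.
QUERY_TEMPLATE = "Euro and Crisis and {concept}"


def daily_queries(concepts, start, end):
    """Yield (day, query) pairs: one query per concept per calendar day."""
    day = start
    while day <= end:
        for concept in concepts:
            yield day, QUERY_TEMPLATE.format(concept=concept)
        day += timedelta(days=1)


def n_days(start, end):
    """Number of calendar days in [start, end], both ends included."""
    return (end - start).days + 1


START, END = date(2012, 1, 1), date(2013, 9, 27)
days = n_days(START, END)
print(days, 594 * days)  # 636 377784
first = next(daily_queries(["SP", "risk"], START, END))
print(first)  # (datetime.date(2012, 1, 1), 'Euro and Crisis and SP')
```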
In more formal terms, the univariate time series of the EUR/USD exchange rate, together with the 594 univariate time series of the frequency of Tweets that mention one or more of the concepts, forms a multivariate time series with 595 univariate component time series. With a forecast horizon of $h = 1$, the EUR/USD exchange rate was the scalar target variable, denoted by $y_{t+h}$ or simply $y$; its forecast made at time $t$ is denoted by $f_t$ (Giacomini and Rossi, 2013). The univariate component time series of Tweet counts were the explanatory variables. The predictive power of each of them was assessed via a forecasting scheme with a rolling estimation window of fixed size $R$. Following this forecasting scheme, the total length $T$ of the multivariate time series to be analyzed was successively split into in-sample portions of fixed length $R$ and out-of-sample portions of length $P$. For each component time series, the number of possible predictions can then be calculated as follows:
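The rolling scheme with a fixed estimation window can be illustrated with a small Python sketch. The paper's own counting formula is not reproduced here, so the sketch assumes a common convention. Observations are indexed 0 to T − 1, each window holds R consecutive observations, and the forecast target lies h steps after the window's last observation. Under this convention there are T − R − h + 1 forecasts. The function name `rolling_splits` is hypothetical.

```python
def rolling_splits(T, R, h=1):
    """Yield (window, target) pairs for a fixed-size rolling estimation window.

    window: range of R consecutive in-sample indices.
    target: index of the observation forecast h steps after the window ends.
    Assumed convention: T - R - h + 1 pairs in total.
    """
    if not 0 < R <= T - h:
        raise ValueError("need 0 < R <= T - h")
    for start in range(T - R - h + 1):
        window = range(start, start + R)
        yield window, start + R + h - 1


splits = list(rolling_splits(T=10, R=6, h=1))
print(len(splits))                  # 4 forecasts = T - R - h + 1
print(splits[0][1], splits[-1][1])  # first and last target: 6 9
```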
Step 4: Data Preprocessing
It is obvious that not all concepts elicited in steps 1–3 have the potential to predict the EUR/USD exchange rate. To identify those that do, negative selection (rejecting inappropriate concepts) was combined with positive selection (accepting appropriate concepts). This step was concerned with negative selection. A number of concepts which initially appeared to be plausible predictors of the forecast variable turned out to be rarely used on Twitter, so that their Tweet counts were low or even zero.
These concepts can easily be detected because their variance is zero or near zero.
Since concepts with low variance cannot be good correlates or predictors of the EUR/USD exchange rate, they were discarded. (Removal of zero- and near-zero-variance predictors was achieved via the nearZeroVar() function of the R caret package; Kuhn, 2013.)
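The near-zero-variance filter can be sketched in plain Python. As far as we understand the caret documentation, the rule below mirrors the defaults of nearZeroVar() (freqCut = 95/5, uniqueCut = 10). A series is flagged if it has a single distinct value. It is also flagged if the ratio of the most frequent to the second most frequent value exceeds 19 while distinct values make up at most 10% of the observations. This is an illustrative re-implementation, not caret itself, and the concept names and counts are invented.

```python
from collections import Counter


def near_zero_var(values, freq_cut=95 / 5, unique_cut=10.0):
    """Return True if a series has zero or near-zero variance.

    Illustrative re-implementation of a caret-style rule (not caret itself).
    """
    counts = Counter(values)
    if len(counts) <= 1:
        return True  # zero variance: only one distinct value
    (_, most), (_, second) = counts.most_common(2)
    freq_ratio = most / second
    percent_unique = 100.0 * len(counts) / len(values)
    return freq_ratio > freq_cut and percent_unique <= unique_cut


# Hypothetical daily Tweet counts over 100 days.
series = {
    "rare_concept": [0] * 97 + [1, 2, 3],     # almost always zero
    "never_used": [0] * 100,                  # constant
    "risk": [i % 7 + 3 for i in range(100)],  # varies from day to day
}
kept = [name for name, counts in series.items() if not near_zero_var(counts)]
print(kept)  # ['risk']
```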
As the Breusch-Pagan test revealed heteroscedasticity in the time series of the EUR/USD exchange rate, the series was log-transformed prior to the analysis.
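A Breusch-Pagan check can be sketched without external libraries. The text does not state which regressors entered the auxiliary regression, so this sketch assumes a linear time trend. It uses Koenker's studentized form: LM = n·R² of the regression of squared residuals on the trend, referred to a chi-square distribution with one degree of freedom. The function names and both synthetic series are invented. One series has constant dispersion around its level and the other has dispersion that grows over time.

```python
import math


def _simple_ols(x, y):
    """Intercept and slope of a least-squares line y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def breusch_pagan_trend(y):
    """Koenker's studentized Breusch-Pagan test, assuming a linear time trend.

    Returns the LM statistic n * R^2 of the auxiliary regression of squared
    residuals on time, and its chi-square(1) p-value.
    """
    n = len(y)
    t = list(range(n))
    a, b = _simple_ols(t, y)
    e2 = [(yi - a - b * ti) ** 2 for ti, yi in zip(t, y)]
    a2, b2 = _simple_ols(t, e2)
    mean_e2 = sum(e2) / n
    ss_tot = sum((v - mean_e2) ** 2 for v in e2)
    ss_reg = sum((a2 + b2 * ti - mean_e2) ** 2 for ti in t)
    lm = n * ss_reg / ss_tot
    # For one degree of freedom, P(chi2 > lm) = erfc(sqrt(lm / 2)).
    return lm, math.erfc(math.sqrt(lm / 2))


sign = [1 if t % 2 == 0 else -1 for t in range(200)]
homoscedastic = [1.2 + 0.5 * s for s in sign]
heteroscedastic = [1.2 + (0.01 + 0.002 * t) * s for t, s in enumerate(sign)]
print(breusch_pagan_trend(homoscedastic)[1] > 0.05)    # True
print(breusch_pagan_trend(heteroscedastic)[1] < 0.01)  # True
```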
Step 5: Identifying Concepts with Repeatedly Low Forecast Errors
All concepts assembled in steps 1–2 were assessed with respect to their power to forecast the EUR/USD exchange rate. Good predictors ought to have a low forecast error. But how low should the forecast error be to call a predictor good? To answer this question, the forecast errors of the regARIMA model under study for each of the 594 concepts were compared with the forecast errors of a baseline model. In time series analysis, this baseline is usually the random walk model. The forecasts were used to set up a horse race between the hypothesis of sufficient predictive information and the efficient market hypothesis. While the former maintains that regARIMA models predict the EUR/USD exchange rate best, the latter claims the same of the random walk model. The identification of concepts with low forecast errors is outlined in more detail in the next section.
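The horse race against the random walk can be made concrete with a minimal sketch. Under the random walk model, the one-step-ahead forecast of tomorrow's exchange rate is simply today's rate. A candidate model is then judged by the ratio of its MSFE to that of the random walk over the same out-of-sample points. The regARIMA forecasts themselves are not computed here: `candidate` is an invented list standing in for them, and the prices are made up.

```python
def msfe(actual, forecast):
    """Mean squared forecast error."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def random_walk_forecasts(y, R):
    """One-step-ahead random walk forecasts for y[R:]: each forecast is the previous value."""
    return y[R - 1:-1]


def relative_msfe(y, model_forecasts, R):
    """MSFE of a candidate model divided by the MSFE of the random walk.

    Values below 1 mean the candidate beat the random walk baseline.
    """
    actual = y[R:]
    return msfe(actual, model_forecasts) / msfe(actual, random_walk_forecasts(y, R))


# Invented EUR/USD closes and forecasts of a hypothetical covariate model.
y = [1.30, 1.31, 1.29, 1.32, 1.33]
candidate = [1.30, 1.31, 1.33]  # forecasts for y[2], y[3], y[4]
print(round(relative_msfe(y, candidate, R=2), 4))  # 0.1429
```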
From January 1, 2012 – September 27, 2013, i.e., for 636 days (at this point, topsy.com changed its terms of trade and introduced some technical changes to the API), Tweet counts on the Euro crisis were garnered from topsy.com, a reseller of Twitter data. The EUR/USD exchange rate was recorded for the same time span. After preprocessing and selection, the initial long list of 594 concepts was reduced to the following short list of 17 concepts, each associated with one time series of daily Tweet counts:
bank, banking, banks, debt, ECB, economy, Euro, Germany, Greece, Greek, Hollande, Italian, Italy, Moodys, risk, SP, Spain
A descriptive account of the time series of Tweet counts along with the EUR/USD exchange rate is given in Figure 1. The three panels of this figure use a common time axis to align the EUR/USD exchange rate and the number of Tweets that mentioned at least one of the 17 shortlisted concepts. The top panel presents the EUR/USD exchange rate, and the remaining two panels show the Tweet counts collected during the same time span. The middle panel presents the Tweet counts of all 17 concepts; the lower panel shows the same Tweet counts but leaves out those for Euro. Not surprisingly, the concept Euro was used most often (blue series in the middle panel of Figure 1). As this series eclipses the other series, it has been left out in the lower panel. Prima facie, both panels of Tweet counts suggest that a downward (“bearish”) trend of the EUR/USD exchange rate is associated with an increased discussion of the Euro crisis, as evidenced by peaks in the usage of the concepts Euro and Greece around the middle of June 2012 and the middle of December 2012. This type of eyeball econometrics is limited. For instance, focusing only on some salient features means that the data base used is bound to be thin. The following sections use a different approach to data analysis and prediction.
5.2 In-Sample Model Selection
The research question of this paper asks whether, and under what conditions, the hypothesis of sufficient information (which operates on data gleaned from Twitter) or the efficient market hypothesis facilitates better predictions of the EUR/USD exchange rate. This question can be rephrased as a selection between these two classes of models and also as a selection among uni-covariate regARIMA models that outforecast the random walk model. In work on time series model selection, information-theoretic approaches play a dominant role, and it is this class of methods that will be considered here. Methodological debates in time series model selection often revolve around the question whether Akaike's information criterion (AIC, Akaike, 1974), the Bayesian information criterion (BIC, Schwarz, 1978) or other information criteria are most appropriate for a given model selection problem. It is well documented that AIC tends to overfit and to under-penalize complex models, while BIC uses a stricter penalty for additional parameters. Hence, in comparison to AIC, BIC tends to prefer simpler, more parsimonious models. In this study, the models examined are nested, which necessarily involves issues of model complexity. Comparing and contrasting AIC and BIC scores is useful when deciding whether the additional complexity of the regARIMA models leads to different results of the model comparison or whether, despite this complexity, AIC and BIC scores agree.
Against this backdrop, both in-sample AIC and BIC scores have been calculated for all models considered. The scores provided the basis for a ranking of all models or covariates. Table 2 presents the averaged rescaled information criteria for the random walk model and the regARIMA models. All scores have been rescaled, which is denoted by ΔAIC and ΔBIC: the AIC and BIC scores are expressed relative to their minimum value found in the overall set of random walk and regARIMA models for all predictors and estimation windows.
| Predictors | AIC: random walk | AIC: regARIMA | BIC: random walk | BIC: regARIMA |
Differences between random walk model and regARIMA models. The random walk model was clearly outperformed by all regARIMA models. In other words, when integrated into uni-covariate regARIMA analyses, each concept (predictor, covariate) listed in Table 2, or rather its associated time series of Tweet counts, yielded better AIC and BIC scores than the random walk model. It is unlikely that this result is attributable to the higher model complexity of regARIMA models, as it was consistently found for AIC and for BIC.
Comparison of AIC and BIC. In line with the discussion on information criteria, AIC scores were lower than BIC scores, indicating a better model fit, which, however, may be due to the smaller penalty that AIC imposes on model complexity. For the random walk model, AIC and BIC provided uniform values, as covariates are by definition ignored in this class of model. For the regARIMA models, separate rankings for AIC and for BIC scores could be established; the ranking positions are indicated in brackets in Table 2. While AIC and BIC agree on the 5 best explaining concepts (debt, Moody's, ECB, Euro, SP), their rankings partly disagree for the remaining positions.
Comparison among Covariates. The covariate debt performed better than all other covariates. This performance gain was consistent in terms of AIC and BIC and, at the same time, across all estimation windows considered. The noise covariate Rand came second, which means that all predictors but debt performed below chance level. This is a remarkable finding. It indicates that using the scores of the random walk model as baseline values appears to be too liberal an approach, which may contaminate the results with false positives.
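The rescaling of the information criteria described above can be sketched as follows. The sketch uses the textbook definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where L is the maximized likelihood, k the number of estimated parameters and n the number of observations. The text does not state how the study's software computed the likelihoods. The function names are illustrative, and the scores in the example are invented.

```python
import math


def aic(loglik, k):
    """Akaike information criterion for k estimated parameters."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian (Schwarz) information criterion for n observations."""
    return k * math.log(n) - 2 * loglik


def averaged_rescaled(scores):
    """Average of (score - overall minimum) per model across estimation windows.

    scores maps a model name to its criterion values, one per estimation window.
    The minimum is taken over all models and all windows.
    """
    overall_min = min(v for values in scores.values() for v in values)
    return {m: sum(v - overall_min for v in values) / len(values)
            for m, values in scores.items()}


# Invented criterion values for two estimation windows.
delta = averaged_rescaled({"random walk": [10.0, 12.0], "debt": [4.0, 6.0]})
print(delta)  # {'random walk': 7.0, 'debt': 1.0}
```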
5.3 Out-of-Sample Forecasting
Forecasting and evaluating the accuracy of the forecasts are essential parts of predictive modeling. The smaller the difference between the realization $y_{t+h}$ of the forecast variable at time $t+h$ and its forecast $f_t$ made at time $t$, the higher the accuracy. This intuition is typically expressed by a loss function $L(y_{t+h}, f_t)$. In the majority of studies that involve time series forecasting, the quadratic loss function (mean squared forecast error, MSFE) or the absolute loss function (mean absolute error, MAE) is used. Accordingly, the accuracy of a forecast can be described as the expected loss $E[L(y_{t+h}, f_t)]$. The latter is usually operationalized by averaged pseudo out-of-sample forecast error scores using MAE, MSFE or other forecast error metrics (e.g., Giacomini and Rossi, 2013). Both MSFE and MAE are then used to express the forecast accuracy of the models of interest, e.g., regARIMA or random walk models. In time series analysis it is well known that large estimation windows are often necessary to average out measurement errors. Clearly, this is only true if there are no structural breaks, the risk of which increases with the length of the estimation windows considered. In this study, medium to large estimation windows and thus small sample split values have been used. This choice was motivated by the assumption that time series of Tweet counts can be expected to be noisy. Starting with $R = 310$, which corresponds to a sample split of $P/R = 322/310 \approx 1.04$, different forecasting runs were conducted, each time increasing the size of the estimation window by ten. With 636 days of data recording, this led to a set-up with $R \in \{310, 320, \dots, 620\}$, resulting in $s = 32$ sample splits and thus 32 forecasting runs with $h = 1$. Clearly, this is an exploratory approach. It can be justified, though, by the novelty of time series analysis on Tweet counts. Moreover, the results for all 32 estimation window sizes examined are reported here so that the robustness of the findings can be assessed.
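The two loss functions and the grid of estimation windows can be written down in a few lines of Python. The grid R = 310, 320, …, 620 is inferred from the step size of ten and the 32 runs stated above, together with the window sizes 310 and 620 quoted in the Q-Q plot discussion. The toy forecasts are invented. They show why the MSFE reacts more strongly to a single large error than the MAE does.

```python
def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def msfe(actual, forecast):
    """Mean squared forecast error."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


# Estimation window sizes R = 310, 320, ..., 620: 32 forecasting runs.
windows = list(range(310, 621, 10))
print(len(windows), windows[0], windows[-1])  # 32 310 620

# Same MAE, different MSFE: one large miss is penalized more by MSFE.
actual = [0.0, 0.0, 0.0, 0.0]
even_misses = [1.0, -1.0, 1.0, -1.0]
one_big_miss = [0.0, 0.0, 0.0, 4.0]
print(mae(actual, even_misses), mae(actual, one_big_miss))    # 1.0 1.0
print(msfe(actual, even_misses), msfe(actual, one_big_miss))  # 1.0 4.0
```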
Each of the following two sections reports on a different rationale for examining the predictability of covariates. Section 5.3.1 reports on a scoring approach to predictability, while Section 5.3.2 describes a classificatory approach to examining the predictive power of the 17+1 covariates under scrutiny.
5.3.1 Predictability of Covariates by Averaging across Sample Splits
What is the average predictive strength, or predictability, of each of the 17+1 covariates? How do the predictability scores of the 17+1 covariates compare to each other, and what is the relationship between the frequency and the predictability of the covariates considered? The analysis described in this section uses a scoring approach to answer these questions.
The results of this analysis are shown in Figure 2. It presents the frequency of Tweet counts (top panel), the predictability of each covariate averaged across sample splits (middle panel) and the frequency of covariates (concepts) relative to their predictability (bottom panel). The top panel of Figure 2 presents the Tweet counts of the 17 concepts considered from a cross-sectional angle; thus, it provides an alternative perspective on the longitudinal arrangement of the same data in Figure 1. The noise variable Rand does not enter this panel, as this variable is synthetic and has not been recorded. For each of the 17+1 covariates considered, the middle panel of Figure 2 expresses the predictive power relative to the predictability of the random walk model, averaged across sample splits. Here, Rand has been included. To ease readability, $\overline{\mathrm{MSFE}}$ is used to refer to the predictability scores averaged across sample splits. (More formally, let $\mathrm{MSFE}_{i,j}$ denote the predictability of covariate $i$ of a regARIMA model relative to the corresponding random walk model for a particular sample split $j$. Then the MSFE score of covariate $i$ averaged across all $s$ sample splits under consideration can be expressed as $\overline{\mathrm{MSFE}}_i = \frac{1}{s}\sum_{j=1}^{s}\mathrm{MSFE}_{i,j}$ (8).) The middle panel shows that, among the 17+1 covariates analyzed, the covariates risk, Italy and SP have the highest predictability scores when used in regARIMA analyses. The finding that Rand occupies the 4th predictability rank position means that the predictive power of all remaining covariates is negligible.
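The averaging in equation (8) and the comparison with the noise covariate can be sketched as follows. The text does not spell out how the predictability "relative to the random walk model" is formed. This sketch assumes it is the ratio of the regARIMA MSFE to the random walk MSFE, so that lower values mean better predictability. The function names and all numbers are invented for illustration.

```python
def averaged_relative_msfe(rel_msfe):
    """Equation (8): mean of a covariate's relative MSFE over the s sample splits."""
    return {name: sum(values) / len(values) for name, values in rel_msfe.items()}


def beats_noise(avg, noise="Rand"):
    """Covariates whose averaged relative MSFE is lower (better) than the noise covariate's."""
    return sorted((n for n in avg if n != noise and avg[n] < avg[noise]), key=avg.get)


# Invented relative MSFE values (assumed: regARIMA MSFE / random walk MSFE)
# for three sample splits.
rel = {
    "risk": [0.97, 0.99, 0.98],
    "Euro": [1.05, 1.03, 1.04],
    "Rand": [1.00, 1.01, 0.99],
}
avg = averaged_relative_msfe(rel)
print(beats_noise(avg))  # ['risk']
```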
The bottom panel of Figure 2 integrates the information of the two panels above by presenting the frequency of covariates (concepts) relative to their predictability. Comparing the frequency and predictability ranks of Tweet counts helps to answer the question whether, and to what degree, concepts were “overtalked” or “undertalked” on Twitter within the context of the Euro crisis and the methodological setup described. To this end, all covariates were first ranked in terms of their frequency and in terms of their predictability. Then, for each concept the
difference
$$d_i = \text{predictability rank position}_i - \text{frequency rank position}_i \qquad (9)$$
has been calculated. For instance, the concept risk has predictability rank position 1 and frequency rank position 16, so its score is 1 - 16 = -15. In this sense, risk was found to be “undertalked” on Twitter. The same holds for all covariates that beat both the random walk model and the noise covariate Rand (risk, Italy and SP): each of them is “undertalked” on Twitter. By contrast, for the concept Euro the score is 17 - 1 = 16. The positive value of this score expresses that Euro is “overtalked” on Twitter: the frequency of using Euro on Twitter is disproportionately high relative to its very low power to forecast the EUR/USD exchange rate.
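The “overtalked”/“undertalked” score can be sketched as below. The sign convention follows the worked examples for risk (1 − 16 = −15) and Euro (17 − 1 = 16), i.e., predictability rank minus frequency rank. Predictability is again assumed to be measured by an averaged relative MSFE (lower is better). Ties are not handled, the function names are illustrative, and all numbers are invented.

```python
def rank_positions(scores, higher_is_better):
    """Map each name to its rank (1 = best) under the given ordering; ties not handled."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {name: pos for pos, name in enumerate(ordered, start=1)}


def talk_scores(tweet_counts, avg_rel_msfe):
    """Predictability rank minus frequency rank.

    Negative: "undertalked" (predicts better than its frequency suggests).
    Positive: "overtalked".
    """
    freq_rank = rank_positions(tweet_counts, higher_is_better=True)
    # Assumption: lower averaged relative MSFE means higher predictability.
    pred_rank = rank_positions(avg_rel_msfe, higher_is_better=False)
    return {name: pred_rank[name] - freq_rank[name] for name in tweet_counts}


# Invented totals and averaged relative MSFE values for three concepts.
counts = {"Euro": 90000, "Greece": 20000, "risk": 1500}
avg_msfe = {"Euro": 1.04, "Greece": 1.00, "risk": 0.98}
print(talk_scores(counts, avg_msfe))  # {'Euro': 2, 'Greece': 0, 'risk': -2}
```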
5.3.2 Predictability of Covariates that Outforecast the Random Walk Model and the Noise Variable
While the study of predictability described in the previous section followed a scoring approach, the analysis described in this section pursues a classificatory approach. In a nutshell, the rationale of this analysis is (i) to classify each covariate as to whether or not it outforecasted both the random walk model and the noise covariate Rand, and then (ii) to count the number of times each of the 17+1 covariates and each of the estimation window sizes achieved this performance. This count provides the basis for a mode vote indicating which covariate and which estimation window size secured the best forecasting results. Figure 3 illustrates the results of this analysis, broken down by covariates (Figure 3a) and by estimation window sizes (Figure 3b). Each of the two dot plots of this figure expresses the forecast error via MAE and MSFE. The x-axis in Figure 3a shows how often, out of 32 runs, each covariate outforecasted the random walk model and the noise covariate. Accordingly, the x-axis in Figure 3b indicates, for each of the estimation window sizes studied, how many of the 17 predictors outforecasted both the random walk model and the noise covariate Rand.
Differences between MAE and MSFE. Even a casual look at Figure 3 reveals differences between the results expressed by MAE on the one hand and by MSFE on the other. Often, when MSFE is used as the forecast error metric, a shorter estimation window is required to identify predictors that outperform the random walk model. One reason for the differences is that the MSFE is more strongly influenced by large errors than the MAE. This difference becomes more pronounced when the estimation window is small, as small windows typically lead to larger errors.
Mode vote by covariates. If the focus is on MSFE, the winners of the mode vote by covariates were the predictors risk and SP (Figure 3a). Each of the regARIMA models that uses the Tweet count of risk or SP as a covariate outforecasted both the random walk model and the noise covariate Rand in 19 out of the 32 runs (one per estimation window size), which amounts to a hit rate of 59%. The covariate Euro never outforecasted the random walk model and the noise covariate Rand.
Mode vote by estimation window size. The findings reveal that, when analyzing time series of Tweet counts, large estimation windows were indeed required to identify predictors that outperform the random walk model. There is, however, no simple linear relationship between the size of the estimation window and forecast accuracy. The best result in terms of MSFE was achieved with an estimation window size of 530 days. Since the total length of the time series studied was 636 days, this amounts to a sample split of $P/R \approx 0.2$.
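The classificatory count behind the mode vote can be sketched as follows. The dictionary layout `errors[(model, R)]`, the labels "rw" and "Rand", and the function name `mode_vote` are assumptions for illustration, and the error values are invented. The last line reproduces the arithmetic of the reported hit rate, 19/32 ≈ 59%.

```python
def mode_vote(errors, covariates, windows, rw="rw", noise="Rand"):
    """Count wins against both the random walk and the noise covariate.

    errors[(model, R)] is the out-of-sample error (e.g., MSFE) of a model for
    estimation window size R. Returns wins per covariate and per window size.
    """
    by_covariate = {c: 0 for c in covariates}
    by_window = {R: 0 for R in windows}
    for R in windows:
        for c in covariates:
            e = errors[(c, R)]
            if e < errors[(rw, R)] and e < errors[(noise, R)]:
                by_covariate[c] += 1
                by_window[R] += 1
    return by_covariate, by_window


# Invented MSFE values for two window sizes.
errors = {
    ("rw", 310): 1.00, ("Rand", 310): 0.99, ("risk", 310): 0.95, ("Euro", 310): 1.20,
    ("rw", 320): 1.00, ("Rand", 320): 0.99, ("risk", 320): 1.01, ("Euro", 320): 1.20,
}
wins, per_window = mode_vote(errors, ["risk", "Euro"], [310, 320])
print(wins, per_window)  # {'risk': 1, 'Euro': 0} {310: 1, 320: 0}

# Hit rate reported for risk and SP: 19 wins out of 32 runs.
print(f"{19 / 32:.0%}")  # 59%
```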
5.4 Residual Diagnostics
Time series diagnostics were conducted on the fly as an integral part of each of the forecasting runs for each estimation segment. Depending on the results of the diagnostic tests, each segment was differenced, detrended, seasonally adjusted or, in case of serious anomalies, discarded (cf. section 3.3). This section reports additional diagnostic sanity checks. The focus is on the out-of-sample residuals of the final fitted regARIMA models that incorporated the covariates SP and risk, respectively, which were examined for all sample splits or time series runs. The analysis of the residuals included the first four standardized central moments of the residuals to assess their distributional properties, Q-Q plots to assess the normality of the residuals, ACF and PACF plots to illustrate the correlative properties of the residuals, and residual tests for autocorrelation (Box-Ljung portmanteau test), for trend stationarity (KPSS test) and for nonlinearity (White neural network test).
Residual Moments. Tables 6–7 present the first four standardized central moments calculated for $s = 32$ estimation window sizes or sample splits. For both covariates, SP and risk, the values obtained for the first moment were small. This is true for the standardized values reported but applies also to the absolute values of the first moment. Positive and negative values of the first moment are reasonably well distributed across the sample splits studied. There is some variation of the first moment across the different sample splits, both for SP and for risk. Results for the second moment exhibit only small variation, and thus there is no obvious indication of heteroscedasticity of the residuals. With decreasing sample-split ratio, the values obtained for skewness become increasingly negative. The consistency of this trend with decreasing sample split, both for SP and for risk, seems to suggest that regARIMA model forecasts tend to overpredict whenever the values for $R$ are large and the values for $P$ are small. However, this interpretation is not supported by the results obtained for the first moment. Any constant tendency to generate forecasts systematically higher than the actual values of the EUR/USD exchange rate would have led to consistently negative average residuals. But this is not the case. Still, it seems that there is occasionally some pronounced overprediction. This is reflected by the negative skewness scores, while it is usually averaged out in the first moment. Finally, both for SP and for risk, the values of the fourth moment are moderate in magnitude. They exhibit a gentle increase of kurtosis with decreasing sample split ratio (larger $R$), indicating a tendency towards fat tails of the residual distribution.
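The residual moments can be computed as sketched below. The first two standardized moments are 0 and 1 by construction. The sketch therefore reports the mean and variance of the residuals together with the standardized third and fourth moments. Whether Tables 6–7 use exactly these population formulas is an assumption. Residuals are taken as actual minus forecast, so overpredictions appear as negative values. The residual values are invented.

```python
import math


def residual_moments(e):
    """Mean, variance, skewness and (non-excess) kurtosis of residuals.

    Population formulas (division by n); a normal distribution has
    skewness 0 and kurtosis 3.
    """
    n = len(e)
    mean = sum(e) / n
    var = sum((x - mean) ** 2 for x in e) / n
    sd = math.sqrt(var)
    skew = sum(((x - mean) / sd) ** 3 for x in e) / n
    kurt = sum(((x - mean) / sd) ** 4 for x in e) / n
    return mean, var, skew, kurt


symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(residual_moments(symmetric))
# One large negative residual (an occasional overprediction) gives negative skew.
occasional_overprediction = [-5.0, 0.0, 1.0, 1.0, 1.0, 2.0]
print(residual_moments(occasional_overprediction)[2] < 0)  # True
```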
Q-Q Plots. Figure 4 shows the residual quantile-quantile plots of the covariates SP (top row) and risk (bottom row) across three different sample splits (left, middle and right column). The Q-Q plots of the left column of Figure 4 ($R = 310$, $P = 322$) illustrate the situation when a comparatively small estimation window is used to forecast a large number of data points. Even though the data points at each end depart to some degree from the straight line, most of the data are within the 95% confidence bands. Overall, both Q-Q plots of the left column indicate that the residuals can reasonably be accounted for by a normal distribution. The Q-Q plots of the middle column of Figure 4 ($R = 460$, $P = 172$) exhibit some increased scatter around the reference line. But on the whole, the two Q-Q plots of the middle column are similar to the ones of the left column, which were generated on the basis of a much larger set of residuals. The Q-Q plots of the right column of Figure 4 ($R = 620$, $P = 12$) have reference lines that cut the x-axis higher than their counterparts in the four remaining Q-Q plots. This reflects the somewhat higher average score of this small group of residuals. Most of the residuals are on the reference line or fall within the 95% confidence bands. In brief, the distributional pattern of the residuals does not change drastically across the 3 sample splits studied. Deviations from the straight line are small, so that the Q-Q plots are reasonably in line with the normal distribution. This interpretation is consistent with the small scores of the standardized third and fourth central moments obtained for the corresponding sample splits.
Residual ACF and Residual PACF Plots. At this stage of the analysis, the goal of interpreting the ACF and PACF plots was not to identify regARIMA model parameters but to examine whether the selected residuals show signs of anomaly. Figure 5 presents the sample autocorrelation plots of the residuals that result from the regARIMA analysis with SP and risk as covariates. Figure 6 shows the corresponding partial autocorrelation plots. Note that the ACF plots do not include the redundant spike at lag 0.
For both SP and risk, the ACF and PACF plots of the left column ($R = 310$, $P = 322$), i.e., at relatively high sample splits, reveal a non-zero spike at lag 1. Apparently, the residuals analyzed are not white noise, which suggests that the model parameters chosen for this range of $R$ and $P$ values do not fit the data well. As the sample split decreases, the residuals become well behaved, as illustrated by the middle and right columns of Figures 5 and 6. Here, the non-zero spikes disappear or reach just above the confidence bands, and the plots give no indication of non-stationarity of the residuals, such as a slow decay of autocorrelations.
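The white-noise check behind the ACF plots can be sketched in plain Python, together with the Ljung-Box statistic used in the residual tests. The ±1.96/√n band is the usual large-sample approximation for white noise. We assume that the confidence bands in Figures 5 and 6 are of this kind. In practice, Q is compared with a chi-square distribution whose degrees of freedom equal the number of lags minus the number of fitted ARMA parameters. That step is omitted here. The function names are illustrative, and the residual series is invented.

```python
import math


def acf(x, max_lag):
    """Sample autocorrelations at lags 1..max_lag (lag 0 omitted)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d) / n
    return [sum(d[t] * d[t + k] for t in range(n - k)) / n / c0
            for k in range(1, max_lag + 1)]


def spikes_outside_band(x, max_lag, z=1.96):
    """Lags whose autocorrelation lies outside the white-noise band +/- z / sqrt(n)."""
    band = z / math.sqrt(len(x))
    return [k for k, r in enumerate(acf(x, max_lag), start=1) if abs(r) > band]


def ljung_box_q(x, h):
    """Ljung-Box statistic Q = n (n + 2) * sum_k r_k^2 / (n - k) over lags 1..h."""
    n = len(x)
    return n * (n + 2) * sum(r ** 2 / (n - k) for k, r in enumerate(acf(x, h), start=1))


# Residuals that alternate in sign have a strong negative spike at lag 1.
alternating = [1.0, -1.0] * 50
print(round(acf(alternating, 1)[0], 2))          # -0.99
print(1 in spikes_outside_band(alternating, 5))  # True
```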
Residual Tests. Tables 8–9 report the results of the univariate residual tests conducted: the Ljung-Box test for autocorrelation (LB), the KPSS test for trend stationarity (KP) and the White neural network test for nonlinearity (WH). For each residual test, $s = 32$ null hypotheses (one per sample split) have been tested. To address the problem of multiple testing, the raw p-values have been adjusted via the Bonferroni procedure. Both the raw and the Bonferroni-adjusted p-values are reported in Tables 8–9.
Ljung-Box Test. The Ljung-Box test examines the null hypothesis of independence in the time series studied. Applied to residuals, a small p-value indicates that the residuals are not independent and possibly autocorrelated. A number of observed p-values were below 5%, in particular for relatively high sample splits. After applying the Bonferroni procedure, the overwhelming majority of the adjusted p-values were higher than 5%. Hence, for most of the sample splits the null of independence could not be rejected. Using an alternative procedure for controlling the family-wise error rate, e.g., Holm's procedure, changed this outcome only marginally.
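Both corrections are short enough to sketch directly. The definitions follow the standard Bonferroni and Holm procedures; to our knowledge, the Holm variant matches R's p.adjust(method = "holm"). The p-values are invented and stand in for the per-sample-split values of Tables 8–9. The Holm-adjusted values never exceed the Bonferroni-adjusted ones, which is why switching procedures can only change the outcome in the direction of more rejections.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: min(1, m * p)."""
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]


def holm(pvals):
    """Holm step-down adjusted p-values (controls the family-wise error rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):  # rank 0 = smallest p-value
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted


raw = [0.01, 0.04, 0.03, 0.20]  # invented raw p-values for four sample splits
print([round(p, 2) for p in bonferroni(raw)])  # [0.04, 0.16, 0.12, 0.8]
print([round(p, 2) for p in holm(raw)])        # [0.04, 0.09, 0.09, 0.2]
```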
KPSS Test. The KPSS test has been used to examine the null hypothesis that the time series of residuals for each value of $R$ is stationary. A high p-value means that there is no evidence against stationarity of the residuals. The observed (uncorrected) p-values were never lower than 5%, so the null of level stationarity cannot be rejected.
White Test. The White neural network test for neglected nonlinearity (Lee et al., 1993) has been used to test the null of linearity in mean. For SP and for risk, and across all sample splits, the results gave no reason to reject the null.
In short, the residual diagnostics characterized the conditions of good regARIMA model performance relative to the random walk model. While most of the residual tests conducted indicate that the requirements for model adequacy are fulfilled, the ACF and PACF plots suggest that caution should be exercised with short estimation windows.
“Does anything forecast exchange rates” (Rossi, 2013)? According to the efficient market hypothesis (Fama, 1970, 1991), the answer is a resounding “No”. For some researchers this hypothesis has empirical support that is “virtually unparalleled in economics” (Geweke and Feige, 1979, p. 334). Early on, however, and even more so in recent times, other scientists have critiqued the efficient market hypothesis from different vantage points (e.g., Grossman and Stiglitz, 1980). For the purpose of this study it may be useful to remember that the efficient market hypothesis was developed in a more traditional ecosystem of markets and media. Compared to the second half of the last century, markets and media today rely on a much more sophisticated technical infrastructure. The change of the technological basis of markets and media means faster spreading of information and enormous speed gains in trading. It is an open question which market mechanisms have not only sped up but changed inherently. The concept of information is a candidate for the latter. Even though the efficient market hypothesis is defined relative to an information set, this aspect has usually been underspecified. Moreover, its possible dependence on the technology at a given historical point in time remains largely undiscussed. Today's ecosystem of markets and media adds a question mark to the simple dichotomous distinction between “publicly available information” on the one hand and “insider information” on the other, which seems to be essential to the efficient market hypothesis. For instance, how do insights gained from big data analyses fit into this conceptual framework? The data that feed into such analyses may be publicly available, and the results may simulate or even go beyond the information of insiders.
It is against this backdrop of considerations about information and markets that this study was conducted. Information gleaned from Twitter is treated here as publicly available information. Even though the proponents of the efficient market hypothesis in the sixties and seventies may not have imagined this type of information, it qualifies as publicly available because today it can be accessed by all market participants. Is Twitter efficient in the sense that it provides marginal predictive information not, or not yet, reflected in the EUR/USD exchange rate? Or is the market efficient in the sense that all market-relevant information is already priced into the EUR/USD exchange rate? If the FX market is informationally efficient in the semi-strong form, then information obtained from Twitter should have no marginal predictive content relative to the random walk model. The main result of this study challenges the efficient market hypothesis: publicly available information gleaned from Twitter can be shown to repeatedly predict the EUR/USD exchange rate above chance level. The boundary conditions under which this result was achieved merit particular consideration.
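The random walk benchmark and the error metrics used to judge "marginal predictive content" can be written out as follows. This is a sketch in standard notation under stated assumptions: a driftless random walk for the spot rate s_t, an information set \mathcal{I}_t, a fixed-size estimation window of R observations and P one-step-ahead forecasts. The symbols s_t, \mathcal{I}_t, R and P are introduced here for illustration and are not taken from the paper.

```latex
% Random walk benchmark: the best forecast of tomorrow's rate is today's rate
s_{t+1} = s_t + \varepsilon_{t+1}, \qquad
\mathrm{E}\left[\varepsilon_{t+1} \mid \mathcal{I}_t\right] = 0
\quad\Longrightarrow\quad
\hat{s}^{\,\mathrm{RW}}_{t+1 \mid t} = s_t

% Out-of-sample error metrics over P one-step-ahead forecasts
\mathrm{MSFE} = \frac{1}{P} \sum_{t=R}^{R+P-1} \left( s_{t+1} - \hat{s}_{t+1 \mid t} \right)^2,
\qquad
\mathrm{MAE} = \frac{1}{P} \sum_{t=R}^{R+P-1} \left| s_{t+1} - \hat{s}_{t+1 \mid t} \right|
```

Read this way, semi-strong efficiency suggests that adding publicly available Tweet counts to \mathcal{I}_t should not systematically push a covariate model's MSFE or MAE below that of \hat{s}^{\,\mathrm{RW}}_{t+1 \mid t}; the ratio P/R corresponds to the sample split discussed under Parameter Settings.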
Selection of Tweets. As described above, a 2-step selection of Tweets proved successful in identifying covariates that predicted the EUR/USD exchange rate above chance level. The covariates risk and SP were found to repeatedly out-forecast the random walk model and the noise covariate. The semantic content of these words suggests that talk of risk on Twitter can be harnessed to forecast the EUR/USD exchange rate. Apart from the meaning of risk and SP, the result is encouraging in the sense that these concepts are not short-lived like, e.g., the names of politicians, many of whom play a role in public discussion for only a relatively short time.
Data Quality. Data obtained from Twitter are not standard in econometric time series analysis. For the covariates risk and SP, however, the data quality was at the level required for meaningful time series analysis, as evidenced by most of the diagnostic tests carried out. The residual tests conducted for these covariates gave no indication that the residuals are not independent, that the data are not level stationary, or that the data are not linear. The corresponding Q-Q plots indicate that the residuals are reasonably in line with the normal distribution.
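According to the table notes, independence of the residuals was checked with the Ljung-Box test (LB). As an illustration only, the statistic Q = n(n+2) Σ_{k=1}^{h} r_k²/(n−k) and its chi-square p-value can be computed in plain Python. This is a minimal sketch, not the routine used in the study: the helper names `acf`, `chi2_sf` and `ljung_box`, the default lag h = 10, the `fitdf` degrees-of-freedom correction and the synthetic series are assumptions. `chi2_sf` uses the closed-form chi-square tail for integer degrees of freedom so that no third-party library is needed; in R, `Box.test(x, lag = 10, type = "Ljung-Box", fitdf = ...)` provides the same test.

```python
import math
import random


def acf(x, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag of the series x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


def chi2_sf(q, df):
    """P(X > q) for X ~ chi-square with integer df >= 1 (closed-form tail)."""
    if q <= 0:
        return 1.0
    half = q / 2.0
    if df % 2 == 0:
        # Even df: exp(-q/2) * sum_{i < df/2} (q/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite series in odd powers of sqrt(q)
    tail = math.erfc(math.sqrt(half))
    if df >= 3:
        term = math.sqrt(q)
        series = term
        for j in range(2, (df - 1) // 2 + 1):
            term *= q / (2 * j - 1)
            series += term
        tail += math.sqrt(2.0 / math.pi) * math.exp(-half) * series
    return tail


def ljung_box(residuals, lags=10, fitdf=0):
    """Ljung-Box Q statistic and p-value with (lags - fitdf) degrees of freedom."""
    n = len(residuals)
    r = acf(residuals, lags)
    q = n * (n + 2) * sum(rk * rk / (n - k) for k, rk in enumerate(r, start=1))
    return q, chi2_sf(q, lags - fitdf)


# Synthetic check: white noise vs. a strongly autocorrelated AR(1) series.
random.seed(42)
white = [random.gauss(0.0, 1.0) for _ in range(500)]
ar1 = [0.0]
for _ in range(499):
    ar1.append(0.8 * ar1[-1] + random.gauss(0.0, 1.0))
q_white, p_white = ljung_box(white)
q_ar1, p_ar1 = ljung_box(ar1)
print(f"white noise: Q={q_white:.2f}, p={p_white:.3f}")
print(f"AR(1):       Q={q_ar1:.2f}, p={p_ar1:.3g}")
```

A large p-value, as expected for the white-noise series, gives no indication against independence; the AR(1) series is rejected decisively.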
Parameter Settings. The ratio between the number of possible predictions and the size of the estimation window (sample split) turned out to be the most influential parameter in this study. Large estimation windows were required for predictive effects to be detected; they were instrumental in reducing the sampling error given the noisy Tweet-count data.
Safeguarding against Chance Results. The ubiquitous modeling practice of using only the random walk model as a benchmark turned out to be insufficient to guard against chance findings. For this purpose, it became evident that both a benchmark model, i.e., the random walk model, and benchmark data, i.e., the noise covariate Rand, had to be used. With regard to model fit, all 17 covariates outperformed the random walk model, but only the covariate debt also outperformed the noise covariate. In out-of-sample forecasting, only regARIMA models that outperformed both the corresponding random walk model and the covariate Rand can be considered evidence of a predictive effect above chance level. This turned out to be true for SP and risk in the majority of the cases (sample splits) considered.
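The decision rule described in this paragraph (a covariate counts as predictive only if it beats both the random walk and the noise covariate) can be combined with a majority vote over estimation windows, as sketched here. The function names and the error values in the example are hypothetical and purely illustrative, not results from the study; counting ties as failures and requiring a strict majority are assumptions of this sketch.

```python
def msfe(errors):
    """Mean squared forecast error."""
    return sum(e * e for e in errors) / len(errors)


def mae(errors):
    """Mean absolute error."""
    return sum(abs(e) for e in errors) / len(errors)


def beats_both(err_cov, err_rw, err_rand, metric):
    """True only if the covariate model is strictly better than BOTH benchmarks:
    the random walk model and the model using the noise covariate."""
    score = metric(err_cov)
    return score < metric(err_rw) and score < metric(err_rand)


def majority_vote(errors_by_window, metric):
    """errors_by_window maps an estimation-window size to the tuple
    (errors of covariate model, errors of random walk, errors of noise model).
    Returns (wins, windows, verdict); the verdict requires a strict majority."""
    wins = sum(beats_both(c, rw, rnd, metric)
               for c, rw, rnd in errors_by_window.values())
    windows = len(errors_by_window)
    return wins, windows, wins > windows / 2


# Illustrative (made-up) one-step-ahead forecast errors for three windows.
example = {
    100: ([0.1, -0.2, 0.1], [0.2, -0.3, 0.2], [0.15, -0.25, 0.1]),
    150: ([0.3, -0.1], [0.2, -0.1], [0.25, -0.2]),
    200: ([0.05, 0.05], [0.1, 0.1], [0.1, 0.05]),
}
print(majority_vote(example, msfe))  # (2, 3, True)
print(majority_vote(example, mae))   # (2, 3, True)
```

In the window of size 150 the covariate beats the noise model but not the random walk, so that window does not count; the verdict still holds because two of three windows pass.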
In summary, this study has provided supporting evidence for the hypothesis of sufficient predictive information. Under the conditions discussed above, the EUR/USD exchange rate proved forecastable above chance level using Tweet counts. The study confirms and extends the findings of Papaioannou et al. (2013), who provide evidence that information gleaned from microblogging platforms such as Twitter can enhance the forecasting efficiency of intraday exchange rates. That the present study and that of Papaioannou et al. (2013) largely agree in their findings despite using different methodologies speaks for the robustness of the results. The efficient market hypothesis and the analysis of public discussions on Twitter may seem an unlikely couple at first blush, but the analysis has revealed that they illuminate each other. Seen from the viewpoint of the efficient market hypothesis, the analysis of data obtained from Twitter helps to identify information that may predict the FX market. In doing so, the methodology introduced in this article confronts, rather than sidesteps, what Fama (1991, p. 1575) called “the messy problem of deciding what are useful information”. Seen from the vantage point of the notoriously theory-poor but data-rich field of Twitter analysis, the study has shown how the efficient market hypothesis can be harnessed to gain insights from social media data. Clearly, more research is required to examine the relationship between public communication on social media and exchange rates. Such research may well pave the way for applying other econometric concepts, e.g., Granger causality analysis, to data harvested from social media, thereby contributing to a better understanding of social and economic phenomena.
- Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6):716–723.
- Arlot, S. and Celisse, A. (2010). A survey of cross-validation procedures for model selection. Statistics Surveys, 4:40–79.
- Asur, S. and Huberman, B. A. (2010). Predicting the future with social media. In Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM, volume 1, pages 492–499. IEEE.
- Bod, R., Fisseni, B., Kurji, A., and Löwe, B. (2012). Objectivity and reproducibility of Proppian narrative annotations. Workshop on Computational Models of Narrative 2012 at the International Conference on Language Resources and Evaluation (LREC), Istanbul, May 2012.
- Bollen, J., Mao, H., and Zeng, X. (2010). Twitter mood predicts the stock market. Journal of Computational Science, 2:1–8.
- Box, G. E. P. and Jenkins, G. M. (1976). Time series analysis: Forecasting and control. Holden-Day, San Francisco, CA.
- Bracke, T., Skala, M., and Thimann, C. (2008). Thirty years of exchange rate communication: How, when and why does the G7 speak?
- Bruegger, U. and Knorr-Cetina, K. (2002). Global microstructures: The virtual societies of financial markets. American Journal of Sociology, 107(4):905–950.
- Bryden, J., Funk, S., and Jansen, V. A. A. (2013). Word usage mirrors community structure in an online social network. EPJ Data Science, 2(3):1–9.
- Chen, J.-H. (2011). Variance ratio tests of random walk hypothesis of the Euro exchange rate. International Business & Economics Research Journal (IBER), 7(12).
- Davenport, T. H. and Beck, J. C. (2001). The attention economy: Understanding the new currency of business. Harvard Business Press, Boston, MA.
- De Choudhury, M., Counts, S., and Horvitz, E. (2013). Major life changes and behavioral markers in social media: Case of childbirth. In Proceedings of the 2013 Conference on Computer Supported Cooperative Work, pages 1431–1442. ACM.
- Falkinger, J. (2008). Limited attention as a scarce resource in information-rich economies. The Economic Journal, 118(532):1596–1620.
- Fama, E. F. (1970). Efficient capital markets: A review of theory and empirical work. Journal of Finance, 25:383–417.
- Fama, E. F. (1991). Efficient capital markets: II. Journal of Finance, 46(5):1575–1617.
- Flack, V. F. and Chang, P. C. (1987). Frequency of selecting noise variables in subset regression analysis: A simulation study. The American Statistician, 41(1):84–86.
- Fox, J. (1997). Applied Regression Analysis, Linear Models, and Related Methods. Sage, Thousand Oaks, CA, USA.
- Geweke, J. and Feige, E. (1979). Some joint tests of the efficiency of markets for forward foreign exchange. The Review of Economics and Statistics, 61(3):334–341.
- Giacomini, R. and Rossi, B. (2013). Forecasting in macroeconomics. In Hashimzade, N. and Thornton, M., editors, Handbook of Research Methods and Applications on Empirical Macroeconomics, chapter 7, pages 618–658. Edward Elgar Publishing, Cheltenham, UK.
- Goel, V. (2013). Twitter introduces tool to make collecting and sharing tweets easier. New York Times, November 13.
- Grossman, S. J. and Stiglitz, J. E. (1980). On the impossibility of informationally efficient markets. The American Economic Review, 70(3):393–408.
- Hong, Y., Li, H., and Zhao, F. (2007). Can the random walk model be beaten in out-of-sample density forecasts? Evidence from intraday foreign exchange rates. Journal of Econometrics, 141(2):736–776.
- Hyndman, R. J. (2010). Why every statistician should know about cross-validation. http://robjhyndman.com/researchtips/crossvalidation/.
- Hyndman, R. J. and Athanasopoulos, G. (2012). Forecasting: Principles and practice. An online textbook. http://otexts.com/fpp/. Accessed Sept. 14, 2012 and Oct. 11, 2013.
- Hyndman, R. J. and Khandakar, Y. (2008). Automatic time series forecasting: The forecast package for R. Journal of Statistical Software, 27(3).
- Kilian, L. and Taylor, M. P. (2003). Why is it so difficult to beat the random walk forecast of exchange rates? Journal of International Economics, 60(1):85–107.
- King, M. R., Osler, L., and Rime, D. (2011). Foreign exchange market structure, players and evolution. Technical report, Norges Bank.
- Kuhn, M. (2013). caret: Classification and Regression Training. R package version 5.15-61.
- Lee, H.-Y. and Sodoikhuu, K. (2012). Efficiency tests in foreign exchange market. International Journal of Economics and Financial Issues, 2:216–224.
- Lee, T.-H., White, H., and Granger, C. W. (1993). Testing for neglected nonlinearity in time series models: A comparison of neural network methods and alternative tests. Journal of Econometrics, 56(3):269–290.
- Lică, L. and Tută, M. (2011). Using data from social media for making predictions about product success and improvement of existing models. International Journal of Research and Reviews in Applied Sciences, 8:301–306.
- Lisi, F. and Medio, A. (1997). Is a random walk the best exchange rate predictor? International Journal of Forecasting, 13(2):255–267.
- MacDonald, R. and Taylor, M. P. (1994). The monetary model of the exchange rate: Long-run relationships, short-run dynamics and how to beat a random walk. Journal of International Money and Finance, 13(3):276–290.
- Meese, R. A. and Rogoff, K. S. (1983). Empirical exchange rate models of the seventies: Do they fit out of sample? Journal of International Economics, 14(1–2):3–24.
- Newbold, P., Rayner, T., Kellar, N., and Ennew, C. (1998). Is the Dollar/ECU exchange rate a random walk? Applied Financial Economics, 8:553–558.
- Papaioannou, P., Russo, L., Papaioannou, G., and Siettos, C. I. (2013). Can social microblogging be used to forecast intraday exchange rates? NETNOMICS: Economic Research and Electronic Networking, 14(1-2):47–68.
- Pierce, R. (2008). Research Methods in Politics. Sage, London.
- Propp, V. (1968). Morphology of the folktale. University of Texas Press, Austin, TX, 2nd edition.
- R Core Team (2013). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
- Rossi, B. (2013). Exchange rate predictability. Journal of Economic Literature, forthcoming.
- Sato, A.-H. and Takayasu, H. (2013). Segmentation procedure based on Fisher’s exact test and its application to foreign exchange rates. arXiv:1309.0602.
- Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2):461–464.
- Signorini, A., Segre, A. M., and Polgreen, P. M. (2011). The use of Twitter to track levels of disease activity and public concern in the US during the influenza A H1N1 pandemic. PLoS ONE, 6(5):e19467.
- Stracca, L. (2013). Our currency, your problem? The global effects of the Euro debt crisis. University of St. Gallen, Switzerland, Finance Research Seminar.
- Theiler, J., Eubank, S., Longtin, A., Galdrikian, B., and Doyne Farmer, J. (1992). Testing for nonlinearity in time series: The method of surrogate data. Physica D: Nonlinear Phenomena, 58(1):77–94.
- Thompson, P. (2009). Market manipulation? Applying the propaganda model to financial media reporting. Westminster Papers on Communication and Culture, 6(2):73–96.
- Tumasjan, A., Sprenger, T. O., Sandner, P. G., and Welpe, I. M. (2010). Predicting elections with Twitter: What 140 characters reveal about political sentiment. In Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media, pages 178–185.
- Weiss, J. and Kemper, S. (2011). Analysis of Exchange Rate Communication. GRIN Verlag, Munich.
| No. | Propp function | Euro crisis narrative |
|---|---|---|
| 1 | Absentation | To join the common currency many European states give up their |
| 2 | Interdiction | The treaty of Maastricht stipulates that only member states with budget deficits of up to 3% of the GDP are entitled to join the Euro. |
| 3 | Violation of Interdiction | Many member states of the EU have budget deficits higher than the agreed-upon 3%. |
| 4 | Reconnaissance | Financial markets monitor the economic situation of the states of the Euro |
| 5 | Delivery | Banks receive information about the sorry condition of the economy in some states of the Euro zone through ratings. |
| 6 | Trickery | Financial markets and their helpers enforce austerity measures which are basically used to finance the banking system. |
| 7 | Complicity | Large parts of the people of Europe and their governments are deceived by the banks, and some are unwittingly helping them. |
| 8 | Villainy | The banks and their helpers consolidate their influence. Austerity, lack of growth and unemployment are the consequences. |
| 9 | Mediation | The bad situation is made public; it is debated in the media and at EU summits. |
| 10 | Beginning counteraction | People of Europe start to protest; governments begin to introduce first measures against the banks, e.g., a transaction tax. |
| 11 | Departure | Departure of EU member states from the Eurozone is looming, which is a threat to some EU member states and a hope for others. |
| 12 | First function of the donor | Some politicians and representatives of the EU signal that they understand the deplorable situation of those EU countries that are in deep economic trouble. |
| 13 | Hero’s reaction | People in Europe do not react in a uniform way. Some respond with a revival of national stereotypes, anger, riots and strikes; others leave their country. |
| 14 | Receipt of a magical agent | Slowly but steadily, rules and regulations that control the banks are put in place. |
| 15 | Guidance | Guidance and recipes for overcoming the crisis are offered by various parties. |
| No. | Propp function | Concepts |
|---|---|---|
| 1 | Absentation | national currency, dignity, loss of independence |
| 2 | Interdiction | treaty, Maastricht, rules, contract, agreement, BIP, GDP |
| 3 | Violation of Interdiction | budget, deficit, debt, loose spending, squandering, unaffordable |
| 5 | Delivery | Moodies, Fitch, ratings, S&P |
| 6 | Trickery | saving, structural reforms, productivity |
| 7 | Complicity | bailout, IMF, pro austerity, troika |
| 8 | Villainy | austerity, interest rates, lack of growth, saving, unemployment, job losses |
| 9 | Mediation | agreement, summit, talk |
| 10 | Beginning counteraction | protest, government, transaction tax |
| 11 | Departure | grexit, leave, threat, hope |
| 12 | First function of the donor | compassion, growth, sympathy |
| 13 | Hero’s reaction | anger, chaos, clashes, collapse, emigration, riot, strike, suicide, turmoil |
| 14 | Receipt of a magical agent | ESF, ESM, euro bonds, fiscal pact, final union, transaction tax, |
indicates whether the RW model has been outperformed (+) or not (-); LB: Ljung-Box test for autocorrelation; KP: KPSS test for trend stationarity; WH: White neural network test for nonlinearity; raw: raw p-values; adj: Bonferroni-adjusted p-values.