Forecasting in light of Big Data
Abstract.
Predicting the future state of a system has always been a natural motivation for science and practical applications. Such a topic, beyond its obvious technical and societal relevance, is also interesting from a conceptual point of view. This owes to the fact that forecasting lends itself to two equally radical, yet opposite methodologies: a reductionist one, based on first principles, and a naïve inductivist one, based only on data. This latter view has recently gained some attention in response to the availability of unprecedented amounts of data and increasingly sophisticated algorithmic analytic techniques. The purpose of this note is to assess critically the role of big data in reshaping the key aspects of forecasting and in particular the claim that bigger data leads to better predictions. Drawing on the representative example of weather forecasts we argue that this is not generally the case. We conclude by suggesting that a clever and context-dependent compromise between modelling and quantitative analysis stands out as the best forecasting strategy, as anticipated nearly a century ago by Richardson and von Neumann.
Key words and phrases:
Forecasting; Big Data; Epistemology

Nothing is more practical than a good theory (L. Boltzmann)
1. Introduction and motivation
Uncertainty spans our lives and forecasting is how we cope with it, individually, socially, institutionally, and scientifically. As a consequence, the concept of forecast is an articulate one. Science, as a whole, moves forward by making and testing forecasts. Political institutions make substantial use of economic forecasting to devise their policies. Most of us rely on weather forecasts to plan our daily activities. Thus, in forecasting, the boundaries between the natural and the social sciences are often crossed, as well as the boundaries between the scientific, technological and ethical domains.
This rather complex picture has been enriched significantly, over the past few years, by the rapidly increasing availability of methods for collecting and processing vast amounts of data. This revived a substantial interest in purely inductive methods which are expected to serve the most disparate needs, from commercial services to data-driven science. Data brokers sell to third parties the digital footprints recorded by our internet activities or credit card transactions. Those can be put to a number of different uses, not all of them ethically neutral. For instance, aggressive forms of personalised marketing algorithms identify women who are likely to be pregnant based on their internet activity, and similarly health-related web searches have been shown to influence individual credit scorings [26]. However, data-intensive projects lie at the heart of extremely ambitious, cutting-edge scientific enterprises, including the US “Brain Research through Innovative Neurotechnologies” (http://www.braininitiative.nih.gov/) and the African-Australian Square Kilometer Array, a radio telescope array consisting of thousands of receivers (http://skatelescope.org/). (See, e.g., [2] for an appraisal of how experiments of this kind may lead to a paradigm shift in the philosophy of science.)
Those examples illustrate clearly that big data spans radically diverse domains. This, together with its sodality with machine learning, has recently been fuelling an all-encompassing enthusiasm, which is loosely rooted in a twofold presupposition. First, the idea that big data will lead to much better forecasts. Second, the idea that it will do so across the board, from scientific discovery to medical, financial, commercial and political applications. It is this enthusiasm which has recently led to making a case for the predictive-analytics analogue of universal Turing machines, unblushingly referred to as The Master Algorithm [12].
Based on this twofold presupposition, big data and predictive analytics are expected to have a major impact in society, in technology, and all the way up to the scientific method itself [21]. The extent to which those promises are likely to be fulfilled is currently a matter for debate across a number of disciplines [1, 15, 22, 17, 3, 23], while some early success stories rather quickly turned into macroscopic failures [16]. This note adds to the methodological debate by challenging both aspects of the presupposition for big data enthusiasm. First, more data may lead to worse predictions. Second, a suitably specified context is crucial for forecasts to be scientifically meaningful. Both points will be made with reference to a highly representative forecasting problem: weather predictions.
The remainder of the paper is organised as follows. Section 2 begins by recalling that the very meaning of scientific prediction depends significantly on an underlying theoretical context. Then we move on, in Section 3, to challenging the naïve inductivist view which goes hand in hand with big data enthusiasm. In a rather elementary setting we illustrate the practical impossibility of inferring future behaviour from the past when the dimension of the problem is moderately large. Section 4 develops this further by emphasising that forecasts depend significantly on the modeller’s ability to identify the proper level of description of the target system. To this end we draw on the history of weather forecasting, where the early attempts at arriving at a quantitative solution turned out to be unsuccessful precisely because they took into account too much data. The representativeness of the example suggests that this constitutes a serious challenge to the view according to which big data could make do with the sole analysis of correlations.
The main lesson can be put as follows: as anticipated nearly a century ago by Richardson and von Neumann, a clever and context-dependent trade-off between modelling and quantitative analysis stands out as the best strategy for meaningful prediction. This flies in the face of the by now infamous claim put forward in 2008 in Wired by its then editor C. Anderson: “the data deluge makes the scientific method obsolete”. In our experience academics have a tendency to roll their eyes when confronted with this, and similar claims, and hasten to add that non-academic publications should not be given so much credit. We believe otherwise. Indeed we think that the importance of the cultural consequences of such claims is reason enough for academics to take scientific and methodological issue with them, independently of their publication venue. Whilst Anderson’s argument fails to withstand methodological scrutiny, as the present paper recalls, its key message, big data enthusiasm, has clearly percolated through society at large. This may lead to very serious social and ethical shortcomings. For the combination of statistical methods and machine learning techniques for predictive analytics is currently finding cavalier application in a number of very sensitive intelligence and policing activities, as we now briefly recall.
This clearly illustrates that the scope of the epistemological problem tackled by this note extends far beyond the scientific method and the academic silos.
1.1. From SKYNET to PredPol
Early in 2016 a debate took place on alleged drone attacks in Pakistan. The controversial article by C. Grothoff and J.M. Porup (http://arstechnica.co.uk/security/2016/02/thensasskynetprogrammaybekillingthousandsofinnocentpeople/) opened as follows:
In 2014, the former director of both the CIA and NSA proclaimed that “we kill people based on metadata.” Now, a new examination of previously published Snowden documents suggests that many of those people may have been innocent.
Recall that SKYNET is the US National Security Agency’s programme aimed at monitoring mobile phone networks in Pakistan. Leaked documents [31] show that the primary goal of this programme is the identification of potential affiliates of the Al Qaida network. Further information suggests that SKYNET builds on classification techniques, fed primarily on GSM data drawn from the entire Pakistani population. This obviously puts the classification method at high risk of overfitting, given, of course, that the vast majority of the population is not linked to terrorist activities. Not surprisingly then, the Snowden papers revealed a rather telling result of SKYNET’s sophisticated machine learning, which assigned Ahmad Zaidan, a bureau chief for Al Jazeera in Islamabad, the highest probability of being an Al Qaida courier.
Two points are worth observing. First note, as some commentators have reported [30], that the classification of Zaidan as strongly linked to Al Qaida cannot be dismissed as utterly wrong. It of course all depends on what we mean by “being linked”. As a journalist in the field he was certainly “linked” to the organisation, and very much so if one counts the two interviews he did with Osama Bin Laden. But of course, “being linked” with a terror organisation may mean something entirely different, namely being actively involved in the pursuit of its goals. This fundamental bit of contextual information is probably impossible for a classification technique to infer, even the most accurate one. But SKYNET algorithms are far from it, which brings us to the second noteworthy point. The leaked documents assess the rate of false positives of the classification method used by SKYNET at between 0.008% and 0.18%. Since the surveillance programme gathers data from a population of 55 million people, this leads to up to 99 thousand Pakistanis who may have been wrongly labelled as “terrorists”. Whether or not this actually led to deadly attacks through the “Find-Fix-Finish” strategy based on Predator drones, this example illustrates the shortcomings of the alleged universality of the combination of big data and machine learning. For if the SKYNET programme were about detecting unsolicited emails, rather than potential terror suspects, the false positive rate of 0.008% would be considered exceptionally good. It is far from good if it may lead to highly defamatory accusations against, if not the outright death of, thousands of innocent people. The observation that terrorist identification and spam detection are completely different problems, with incomparable social, legal, and ethical implications, though apparently trivial, may easily be overlooked as a consequence of the big data enthusiasm.
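The arithmetic behind these figures can be checked with a short sketch. It uses only the numbers quoted above (a population of 55 million and false positive rates of 0.008% and 0.18%); the function name is illustrative, and the calculation assumes, as the text does, that essentially the whole population consists of true negatives.

```python
# Expected number of people wrongly flagged when a classifier is applied
# to an entire population. The function name is illustrative only.

def expected_false_positives(population: int, rate_percent: float) -> float:
    """Return population * false positive rate, with the rate given in percent."""
    return population * rate_percent / 100.0

POPULATION = 55_000_000  # population quoted in the text

low = expected_false_positives(POPULATION, 0.008)   # most favourable quoted rate
high = expected_false_positives(POPULATION, 0.18)   # least favourable quoted rate

print(f"lower bound: {low:,.0f} people")
print(f"upper bound: {high:,.0f} people")
```

Even the most favourable rate leaves several thousand people wrongly flagged, which is the point of the comparison with spam filtering: the same error rate has very different consequences in the two contexts.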
On a less spectacular, but no less worrying scale, this can be seen to feed the increasing excitement for predictive policing. Police departments in the United States and in Europe have recently been purchasing commercially available software to predict crimes. California-based PredPol (http://www.predpol.com/) is widely used across the country and by some police departments in the United Kingdom. The New York Times reports (“The Risk to Civil Liberties of Fighting Crime With Big Data”, 6 November 2016) that Coplogic (http://www.coplogic.com/) has contracts with 5,000 police departments in the US. Keycrime (http://www.keycrime.com/) is a Milan-based firm which has recently been contracted by the Italian police. This list could be extended. Predictive policing’s main selling point is of course expense reduction. If we can predict where the next crime is going to be committed, we can optimise patrolling. Being more precise requires fewer resources, less taxpayers’ money, and it delivers surgical results. But context is once again neglected. When introducing the methods and techniques underlying predictive policing, the authors of the 190-page RAND report [27] on the subject note that
These analytical tools, and the IT that supports them, are largely developed by and for the commercial world.
This, we believe, suffices to illustrate the relevance and urgency of a matter which we now move on to discuss in greater generality. To this end we shall begin by recalling a seemingly obvious, and yet surprisingly often overlooked, feature of the forecasting problem, namely that not all forecasts are equal.
2. On forecasting
Laplace grasped rather clearly one important feature of how probability and uncertainty relate to information when he pointed out that probability depends partly on our knowledge and partly on our ignorance. What we do know clearly affects our understanding of what we don’t know and, consequently, our ability to estimate its probability.
It cannot be surprising then, that the meaning of scientific prediction or forecasts changes with the growth of science. In [25], for instance, it is suggested that one can get a clearer understanding of what physics is by being specific about the accepted meaning of physical predictions.
The origins of the very concept of scientific forecast can in fact be traced back to the beginning of modern physics. The paradigmatic example is classical mechanics, the deterministic world in which (for a limited class of phenomena) one can submit definite Yes/No predictions to experimental testing. A major conceptual revolution took place in the mid 1800s with the introduction of probabilistic prediction, a notion which in the intervening two centuries has taken three distinct interpretations. The first relates to the introduction of statistical mechanics, and is indeed responsible for introducing a novel, stochastic, view of the laws of nature. The second started at the beginning of the 1900s with the discovery of quantum mechanics. The third, which is coming of age, relates to the investigation of complex systems. It is also observed in [25] that this development of the meaning of scientific forecasting amounted to its progressive weakening. Whilst the concept of stochastic prediction in statistical mechanics is clearly weaker than the Yes/No prediction of the next solar eclipse, it can be regarded as being stronger than predictions about complex systems which may involve probability intervals. The upside of increasingly weaker notions of forecasts is the extension of the applicability of physics to a wider set of problems. The downside is the lack of precision.
It is interesting to note that the first major shift in perspective – from the binary forecasts of classical mechanics to the probabilistic ones of statistical mechanics – can be motivated from an informational point of view. To illustrate, we borrow from a classic presentation of Ergodic Theory [13], in which a gas with $N$ molecules contained in a three-dimensional box is considered. Since particles can move in any direction of the (Euclidean) space, we are looking at a system with $3N$ degrees of freedom. Assuming complete information about the molecules’ masses and the forces they exert, the instantaneous state of the system can be described fully, at least in principle, by fixing $3N$ spatial coordinates and the corresponding $3N$ velocities, i.e. by picking a point in $6N$-dimensional Euclidean space. We are now interested in looking at how the system evolves in time according to some underlying physical law. In practice though, the information we do possess is seldom enough to determine the answer.
[This led Gibbs to] abandon the deterministic study of one state (i.e., one point in phase space) in favor of a statistical study of an ensemble of states (i.e., a subset of phase space). Instead of asking “what will the state of the system be at time $t$?”, we should ask “what is the probability that at time $t$ the state of the system will belong to a specified subset of phase space?”. [13]
This (to our present lights) very natural observation led to enormous consequences. So it is likewise natural to ask, today, whether the present ability to acquire, store, and analyse unprecedented amounts of data may lead the concept of forecasts to the next level.
In what follows we address this question in an elementary setting. In particular we ask whether meaningful predictions about the future are possible using only our knowledge of the past states of a system, without the use of models for the evolution equation. Our answer is negative to the extent that rather severe difficulties are immediately found, even in a very abstract and simplified situation. As we shall point out, the most difficult challenge to this view is understanding the “proper level” of abstraction of the system. This is apparent in the paramount case of weather forecasting discussed in Section 4. We will see there that the key to understanding the “proper level” of abstraction lies in identifying the “relevant variables” and the effective equations which rule their time evolution. It is important to stress that the procedure of building such a description does not follow a fixed protocol, applicable in all contexts given that certain conditions are met. It should rather be considered as a sort of art, based on the intuition and the experience of the researcher.
3. An extreme inductivist approach to forecasting using Big Data
According to a vaguely defined yet rather commonly held view [15], big data may allow us to dispense with theory, modelling or even hypothesising. All of this would be encompassed, across domains, by smart enough machine learning algorithms operating on large enough data sets. This extreme inductivist conception of forecasts is thought of as depending solely on data. Is this providing us with a new meaning of predictions, and indeed one which will outdate scientific modelling as we currently understand it?
Two hypotheses, which are seldom made explicitly, are needed to articulate an affirmative answer:

Similar premisses lead to similar conclusions (Analogy);

Systems which exhibit a certain behaviour, will continue doing so (Determinism).
Note that both assumptions are clearly at work in the very idea of predictive policing recalled above. For predicting who is going to commit the next crime and where this is going to happen, requires one to think of the disposition to commit crimes as a persistent feature of certain people, who in turn, tend to conform to certain specific features. Those analogies and the deterministic character of the ‘disposition to commit crimes’ are very prone to mistake correlation with causation. Racial profiling is the most obvious, but certainly not the sole ethical concern which is being currently raised in connection with the first performance assessments of predictive policing [32].
Let us go back to our key point by noting that Analogy and Determinism have been long debated in connection to forecasting and scientific prediction.
“If a system behaves in a certain way, it will do so again” seems a rather natural claim, but, as pointed out by Maxwell (quoted in Lewis Campbell and William Garnett, The Life of James Clerk Maxwell, Macmillan, London (1882); reprinted by Johnson Reprint, New York (1969), p. 440), it is not such an obvious assumption after all.
It is a metaphysical doctrine that from the same antecedents follow the same consequents. […] But it is not of much use in a world like this, in which the same antecedents never again concur, and nothing ever happens twice. […] The physical axiom which has a somewhat similar aspect is “That from like antecedents follow like consequents”.
In his Essai philosophique sur les probabilités Laplace argued that analogy and induction, along with a “happy tact”, provide the principal means for “approaching certainty” in situations in which the probabilities involved are “impossible to submit to calculus”. Laplace then hastened to warn the reader against the subtleties of reasoning by induction and the difficulties of pinning down the right “similarity” between causes and effects which is required for the sound application of analogical reasoning.
More recently de Finetti sought to redo the foundations of probability by challenging the very idea of repeated events, which constitutes the starting point of frequentist approaches à la von Mises, a view which is not central to Kolmogorov’s axiomatisation, but for which the Russian voiced some sympathy. In a vein rather similar to Maxwell’s, de Finetti argued extensively [10, 11] that thinking of events as “repeatable” is a modelling assumption. If the modeller thinks that two events are in fact instances of the same phenomenon, she/he should state that as a subjective and explicit assumption.
This assumption is certainly not mentioned in the extreme inductivist big data narrative, which advocates an approach to forecasting which uses just knowledge of the past, without the aid of theory. Let us then turn our attention to this view, and frame the question in the simplest possible terms. We are interested in forecasts such that future states of a system are predicted solely on the basis of known past states. If this turns out to be problematic in a highly abstract situation, then it can hardly be expected to work in contexts marred by high model uncertainty, like the ones of interest for big data applications.
Basically [34], one looks for a past state of the system “near” to the present one: if it can be found at day $k$, then it makes sense to assume that tomorrow the system will be “near” to day $k+1$. In more formal terms, given the series $\{x_1, \dots, x_M\}$, where $x_j$ is the vector describing the state of the system at time $j\Delta t$, we look in the past for an analogous state, that is a vector $x_k$ with $k < M$ “near enough” to $x_M$ (i.e. such that $|x_k - x_M| < \epsilon$, $\epsilon$ being the desired degree of accuracy). Once we find such a vector, we “predict” the future at times $M+n > M$ by simply assuming for $x_{M+n}$ the state $x_{k+n}$. It all seems quite easy, but it is not at all obvious that an analog can be found.
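The procedure just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in [34]: the function name `analog_forecast`, the Euclidean distance, and the toy periodic record are all choices made here for the example.

```python
import math

def analog_forecast(series, n, eps):
    """Method of analogs: find the past state closest to the present one and
    return the state that followed it n steps later.

    series : list of state vectors (tuples), oldest first; the last is "now"
    n      : forecast horizon in sampling steps
    eps    : required accuracy of the analog
    Returns None when no past state lies within eps of the present one.
    """
    last = len(series) - 1
    present = series[last]
    best_k, best_dist = None, math.inf
    # Only indices with k + n <= last have a known successor n steps later.
    for k in range(last - n + 1):
        dist = math.dist(series[k], present)
        if dist < best_dist:
            best_k, best_dist = k, dist
    if best_k is None or best_dist >= eps:
        return None
    return series[best_k + n]

# Toy record (an assumption for illustration): a point moving on a circle
# with period 50 samples, i.e. a system with a single degree of freedom.
PERIOD = 50
record = [(math.cos(2 * math.pi * j / PERIOD), math.sin(2 * math.pi * j / PERIOD))
          for j in range(201)]

prediction = analog_forecast(record, n=5, eps=0.01)
truth = (math.cos(2 * math.pi * 205 / PERIOD), math.sin(2 * math.pi * 205 / PERIOD))
print("forecast error:", math.dist(prediction, truth))

# A record that never comes back near its present state yields no analog.
drifting = [(float(j),) for j in range(10)]
print(analog_forecast(drifting, n=1, eps=0.5))
```

The toy record works only because the system is exactly periodic and low-dimensional, so good analogs abound. The drifting record shows the other outcome: when no past state is within the required accuracy, the method has nothing to offer.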
The problem of finding an analog is strictly linked to the celebrated Poincaré recurrence theorem: after a suitable time, a deterministic system with a bounded phase space returns to a state near to its initial condition [28, 14]. (In its original version the Poincaré recurrence theorem states that, given a Hamiltonian system with a bounded phase space $\Omega$ and a set $A \subseteq \Omega$, all the trajectories starting from $x \in A$ will return to $A$ after some time, repeatedly and infinitely many times, except for some of them in a set of zero probability. Though this is seldom stressed in elementary courses, the theorem can easily be extended to dissipative ergodic systems, provided one only considers initial conditions on the attractor and “zero probability” is interpreted with respect to the invariant probability on the attractor [6].) Thus an analog surely exists, but how far back do we have to go to find it? The answer was given by the Polish mathematician Mark Kac, who proved a lemma [14] to the effect that the average return time in a region $A$ is proportional to the inverse of the probability $P(A)$ that the system is in $A$.
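Kac’s lemma can be checked numerically on a very simple ergodic system. The sketch below is an illustration chosen here, not an example from the cited literature: it uses the rotation of the circle $x \mapsto x + \alpha \pmod 1$ with an irrational $\alpha$, whose invariant probability is uniform, so that the region $A = [0, \epsilon)$ has probability $\epsilon$ and the expected mean return time is $1/\epsilon$ steps.

```python
import math

def mean_return_time(alpha, eps, n_steps):
    """Average number of steps between successive visits of the map
    x -> (x + alpha) mod 1 to the region A = [0, eps)."""
    x = 0.0
    first_visit = last_visit = None
    visits = 0
    for step in range(n_steps):
        if x < eps:
            if first_visit is None:
                first_visit = step
            last_visit = step
            visits += 1
        x = (x + alpha) % 1.0
    # Mean gap between consecutive visits over the observed orbit.
    return (last_visit - first_visit) / (visits - 1)

ALPHA = (math.sqrt(5) - 1) / 2  # golden-mean rotation number (an assumption)

for eps in (0.1, 0.05, 0.01):
    observed = mean_return_time(ALPHA, eps, 200_000)
    print(f"eps = {eps}: observed {observed:.2f}, Kac's lemma predicts {1 / eps:.2f}")
```

The observed averages should track $1/\epsilon$ closely: halving the size of the target region doubles the waiting time. This system has a single degree of freedom, which is why the waiting times stay manageable.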
To understand how hard it is to observe a recurrence, and hence to find an analog, consider a system of dimension $D$. (To be precise, if the system is dissipative, $D$ is the fractal dimension of the attractor [4].) The probability of being in a region $A$ that extends in every direction by a fraction $\epsilon$ is proportional to $\epsilon^{D}$, therefore the mean recurrence time is $O(\epsilon^{-D})$. If $D$ is large, then even for not very high levels of precision $\epsilon$ the return time is so large that in practice a recurrence is never observed.
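To get a feeling for the scaling $\epsilon^{-D}$, the following sketch tabulates the order of magnitude of the mean recurrence time. The values of $\epsilon$ and $D$ and the one-sample-per-day conversion are assumptions chosen here for illustration, and the unknown proportionality constant is set to one.

```python
def mean_recurrence_steps(eps: float, dim: float) -> float:
    """Order-of-magnitude estimate eps**(-dim) of the mean recurrence time,
    in sampling steps, with the proportionality constant set to 1."""
    return eps ** (-dim)

EPS = 0.05                 # assumed relative precision of the analog
SAMPLES_PER_YEAR = 365.0   # assumes one observation per day

for dim in (1, 3, 6, 10):  # assumed dimensions
    steps = mean_recurrence_steps(EPS, dim)
    print(f"D = {dim:2d}: ~{steps:.1e} samples, ~{steps / SAMPLES_PER_YEAR:.1e} years of daily data")
```

Under these assumptions a three-dimensional system needs a record of some thousands of samples, while at $D = 10$ the required record is of order $10^{13}$ samples, far longer than any observational archive.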
That is to say that the required analog, whose existence is guaranteed in theory, sometimes cannot be expected to be found in practice, even if complete and precise information about the system is available to us.
Fig. 1 shows how, even for moderately large values of the fractal dimension of the attractor $D$, a good analog can be obtained only in time series of enormous length. If $D$ is small, a comparatively short sequence is enough to find an analog with a given precision; on the contrary, for larger $D$ a very large sequence is needed.
In addition usually we do not know the vector describing the state of the system. Such rather serious difficulty is well known in statistical physics; it has been stressed e.g. by Onsager and Machlup [24] in their seminal work on fluctuations and irreversible processes, with the caveat: how do you know you have taken enough variables, for it to be Markovian?; and by Ma [20]: the hidden worry of thermodynamics is: we do not know how many coordinates or forces are necessary to completely specify an equilibrium state.
Takens [33] made an important contribution to this topic: he showed that from the study of a time series $\{u_1, \dots, u_M\}$, where $u_j$ is an observable sampled at the discrete times $t_j = j\Delta t$, it is possible (if we know that the system is deterministic and is described by a finite-dimensional vector, and $M$ is large enough) to determine the proper variable $x$. Unfortunately, at a practical level, the method has rather severe limitations:

It works only if we know a priori that the system is deterministic;

The protocol fails if the dimension of the attractor is large enough.
Once again Kac’s lemma sheds light on the key difficulty encountered here: the minimum size of the time series allowing for the use of Takens’ approach increases exponentially with the dimension $D$ of the attractor [34, 4]. Therefore this method cannot be used, apart from special cases (with a small dimension), to build up a model from the data. All extreme inductivist approaches will have to come to terms with this fundamental fact. One of the few successes of the method of the analogs is tidal prediction from past history. This is in spite of the fact that tides are chaotic; the reason is the low number of effective degrees of freedom involved [4].
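The reconstruction step at the heart of Takens’ approach, building candidate state vectors out of delayed copies of a single observable, can be sketched as follows. The function name, the embedding dimension `m` and the delay `lag` are assumptions introduced here; choosing them well is itself part of the practical difficulty discussed above.

```python
def delay_embed(u, m, lag=1):
    """Build delay vectors y_k = (u_k, u_{k-lag}, ..., u_{k-(m-1)*lag})
    from a scalar time series u (a list of numbers, oldest first).

    m   : embedding dimension (number of delayed copies per vector)
    lag : delay between copies, in sampling steps
    """
    start = (m - 1) * lag  # first index with a complete history
    return [tuple(u[k - i * lag] for i in range(m)) for k in range(start, len(u))]

# Toy scalar record (illustrative values only).
observable = [0.0, 1.0, 2.0, 3.0, 4.0]
vectors = delay_embed(observable, m=3)
for v in vectors:
    print(v)
```

Each delay vector plays the role of the unknown state. The catch is the one stated in the text: the number of samples needed to populate the reconstructed space densely enough grows exponentially with the dimension of the attractor.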
4. Weather forecasting: the mother of all approaches to prediction
Weather forecasts provide a very good illustration of some central aspects of predictive models, not least because of the extreme accuracy which this field has managed to achieve over the past decades. And yet this accuracy could be attained only when it became clear that too much data would be detrimental to the accuracy of the model. Indeed, as we now briefly review, in the early days weather forecasts featured a naïve form of inductivism not dissimilar to the one fuelling the big data enthusiasm.
Let us stress that the main limit to predictions based on analogs is not the sensitivity to initial conditions, typical of chaos. But, as realized by Lorenz [4], the main issue is actually to find good analogs.
The first modern steps in weather forecasting are due to Richardson [29, 19] who, in his visionary work, introduced many of the ideas on which modern meteorology is based. His approach was, to a certain extent, in line with genuine reductionism, and may be summarised as follows: the atmosphere evolves according to the hydrodynamic (and thermodynamic) equations for the velocity, the density, and so on. Therefore, future weather can be predicted, in principle at least, by solving the proper partial differential equations, with initial conditions given by the present state of the atmosphere.
The key idea by Richardson to forecast the weather was correct, but in order to put it into practice it was necessary to introduce one further ingredient that he could not possibly have known [5]. A few decades later von Neumann and Charney noticed that the equations originally proposed by Richardson, even though correct, are not suitable for weather forecasting [19, 9]. The apparently paradoxical reason is that they are too accurate: they also describe high-frequency wave motions that are irrelevant for meteorology. So it is necessary to construct effective equations that get rid of the fast variables.
The effective equations have great practical advantages, e.g. it is possible to adopt large integration time steps making the numerical computations satisfactorily efficient. Even more importantly, they are able to capture the essence of the phenomena of interest, which could otherwise be hidden in too detailed a description, as in the case of the complete set of original equations. It is important to stress that the effective equations are not mere approximations of the original equations, and they are obtained with a subtle mixture of hypotheses, theory and observations [9, 5].
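The computational side of this point can be illustrated with a toy example. The linear system below is an assumption made here purely for illustration and has nothing to do with the atmospheric equations: a slow variable $x$ obeys $\dot{x} = -y$, while a fast variable $y$ relaxes towards $x$ on a time scale `EPS`, $\dot{y} = (x - y)/\mathrm{EPS}$. Setting $y = x$ gives the reduced equation $\dot{x} = -x$, which contains no fast variable.

```python
import math

def euler_full(x, y, eps, dt, steps):
    """Explicit Euler for the full toy system x' = -y, y' = (x - y) / eps."""
    for _ in range(steps):
        x, y = x - dt * y, y + dt * (x - y) / eps
    return x

def euler_effective(x, dt, steps):
    """Explicit Euler for the reduced equation x' = -x (fast variable removed)."""
    for _ in range(steps):
        x = x - dt * x
    return x

EPS = 0.01       # assumed time scale of the fast variable
T_FINAL = 5.0    # integration time

full_coarse = euler_full(1.0, 1.0, EPS, dt=0.1, steps=50)        # large step
full_fine = euler_full(1.0, 1.0, EPS, dt=0.001, steps=5000)      # small step
effective_coarse = euler_effective(1.0, dt=0.1, steps=50)        # large step

print("full system, dt = 0.1     :", full_coarse)
print("full system, dt = 0.001   :", full_fine)
print("reduced equation, dt = 0.1:", effective_coarse)
print("exp(-T)                   :", math.exp(-T_FINAL))
```

With the large step the full system blows up, because the explicit scheme must resolve the fast time scale even though it is irrelevant to the slow behaviour, whereas the reduced equation tolerates a step a hundred times larger. In this toy case the reduced equation is a plain approximation; the text stresses that the effective equations of meteorology are more than that.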
5. Concluding remarks
The above argument shows that in weather forecasting the accuracy of prediction need not be monotonic with the sheer amount of data. Indeed, beyond a certain point the opposite is true. This, in our opinion, is a serious methodological objection to the piecemeal big data enthusiasm. Given its representativeness among all forecasting methods, the conclusions drawn with respect to predicting the weather are far reaching, and help unify a number of observations that have recently been put forward along the same lines.
In many sciences and in engineering, an ever increasing gap between theory and experiment can be observed. This gap tends to widen particularly in the presence of complex features in natural systems science [25]. In socioeconomic systems the gap between data and our scientific ability to actually understand them is typically enormous. The availability of huge amounts of data, sophisticated methods for its retrieval and unprecedented computational power available for its analysis will undoubtedly help move science and technology forward. But in spite of a persistent emphasis on a fourth paradigm (beyond the traditional ones, i.e. experiment, theory and computation) based only on data, there is as yet no evidence that data alone can bring about scientifically meaningful advance. On the contrary, as nicely illustrated by Crutchfield [8], up to now it seems that the only way to understand some non-trivial scientific or technological problem is to follow the traditional approach based on a clever combination of data, theory (and/or computations), intuition and wise use of previous knowledge. Similar conclusions have been reached in the computational biosciences. The authors of [7] not only point out very clearly the methodological shortcomings (and ineffectiveness) of relying on data alone, but also unfold the implications of methodologically unwarranted big data enthusiasm for the allocation of research funds to healthcare-related projects: “A substantial portion of funding used to gather and process data should be diverted towards efforts to discern the laws of biology”.
Big data undoubtedly constitute a great opportunity for scientific and technological advance, with a potential for considerable socioeconomic impact. To make the most of it, however, the ensuing developments at the interface of statistics, machine learning and artificial intelligence must be coupled with adequate methodological foundations, not least because of the serious ethical, legal and more generally societal consequences of the possible misuses of this technology. This note contributed to elucidating the terms of this problem by focussing on the potential for big data to reshape our current understanding of forecasting. To this end we pointed out, in a very elementary setting, some serious problems that the naïve inductivist approach to forecasting must face: the idea according to which reliable predictions can be obtained solely on the grounds of our knowledge of the past faces insurmountable problems, even in the most idealised and controlled modelling setting.
Chaos is often considered the main limiting factor to predictability in deterministic systems. However, this is an unavoidable difficulty as long as the evolution laws of the system under consideration are known. On the contrary, if the information on the system evolution is based only on observational data, the bottleneck lies in Poincaré recurrences which, in turn, depend on the number of effective degrees of freedom involved. Indeed, even in the most optimistic conditions, if the state vector of the system were known with arbitrary precision, the amount of data necessary to make meaningful predictions would grow exponentially with the effective number of degrees of freedom, independently of the presence of chaos. However, when, as for tidal predictions, the number of degrees of freedom associated with the scales of interest is relatively small, the future can be successfully predicted from past history. In addition, in the absence of a theory, a purely inductive modelling methodology can only be based on time series and the method of the analogs, with the already discussed difficulties [34].
We therefore conclude that the big data revolution is by all means a welcome one for the new opportunities it opens. However, the role of modelling cannot be discounted: larger datasets alone do not guarantee better predictions, and the lack of an appropriate level of description [9, 5] may make useful forecasting practically impossible.
References
 [1] C. S. Calude and G. Longo. The Deluge of Spurious Correlations in Big Data. Foundations of Science, 21, 1–18, 2016.
 [2] D. Casacuberta and J. Vallverdú. E-science and the data deluge. Philosophical Psychology, 27(1), 126–140, 2014.
 [3] S. Canali. Big Data, epistemology and causality: Knowledge in and knowledge out in EXPOsOMICS. Big Data & Society, 3(2):1–11, 2016.
 [4] F. Cecconi, M. Cencini, M. Falcioni, and A. Vulpiani. The prediction of future from the past: an old problem from a modern perspective. American Journal of Physics, 80(11), 1001–1008, 2012.
 [5] S. Chibbaro, L. Rondoni, and A. Vulpiani. Reductionism, Emergence and Levels of Reality. Springer-Verlag, Berlin, 2014.
 [6] P. Collet and J.-P. Eckmann. Concepts and Results in Chaotic Dynamics: A Short Course. Springer-Verlag, Berlin, 2006.
 [7] P. V. Coveney, E. R. Dougherty, and R. R. Highfield. Big data need big theory too. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 280, 374, 1–11, 2016.
 [8] J. P. Crutchfield. The dreams of theory. Wiley Interdisciplinary Reviews: Computational Statistics, 6, 75–79, 2014.
 [9] A. Dahan Dalmedico. History and epistemology of models: meteorology as a case study. Archive for the History of Exact Sciences, 55, 395–422, 2001.
 [10] B. de Finetti. Theory of Probability, Vol 1. John Wiley and Sons, New York, 1974.
 [11] B. de Finetti. Philosophical lectures on probability. Ed. A. Mura, Translated by H. Hosni, Springer Verlag, Berlin, 2008.
 [12] P. Domingos. The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World. Basic Books, New York, 2015.
 [13] P. R. Halmos. Lectures on Ergodic Theory. Chelsea Publishing, London, 1956.
 [14] M. Kac. On the notion of recurrence in discrete stochastic processes. Bulletin of the American Mathematical Society, 53, 1002–1010, 1947.
 [15] R. Kitchin. Big Data, new epistemologies and paradigm shifts. Big Data & Society, 1:1–12, 2014.
 [16] D. Lazer, R. Kennedy, G. King, and A. Vespignani. The Parable of Google Flu: Traps in Big Data Analysis. Science, 343(6167), 1203–1205, 2014.
 [17] S. Leonelli. Data-Centric Biology: A Philosophical Study. Chicago University Press, Chicago, 2016.
 [18] E. N. Lorenz. Predictability: a problem partly solved. In Proc. Seminar on Predictability, ECMWF, Reading, UK, pp. 1–18, 1996.
 [19] P. Lynch. The Emergence of Numerical Weather Prediction: Richardson’s Dream. Cambridge University Press, Cambridge, 2006.
 [20] S. K. Ma. Statistical Mechanics. World Scientific, Singapore, 1985.
 [21] V. Mayer-Schönberger and K. Cukier. Big Data: A Revolution That Will Transform How We Live, Work, and Think. Houghton Mifflin, New York, 2013.
 [22] E. Nowotny. The Cunning of Uncertainty. Polity, London, 2016.
 [23] M. Nural, M. E. Cotterell, and J. Miller. Using Semantics in Predictive Big Data Analytics. Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015, pages 254–261, 2015.
 [24] L. Onsager and S. Machlup. Fluctuations and irreversible processes. Physical Review, 91, 1505–1512, 1953.
 [25] G. Parisi. Complex Systems: a Physicist’s Viewpoint. Physica A, 263:557–564, 1999.
 [26] F. Pasquale. The Black Box Society, volume 36. Harvard University Press, Harvard, 2015.
 [27] W. L. Perry, B. McInnes, C. C. Price, S. C. Smith, and J. S. Hollywood. Predictive Policing: The Role of Crime Forecasting in Law Enforcement Operations. RAND Corporation, Santa Monica, 2013.
 [28] H. Poincaré. Sur le problème des trois corps et les équations de la dynamique. Acta Mathematica, 13, 1–270, 1890.
 [29] L. F. Richardson. Weather Prediction by Numerical Methods. Cambridge University Press, Cambridge, 1922.
 [30] M. Robbins. Has a rampaging AI algorithm really killed thousands in Pakistan? The Guardian, 18 February 2016. http://www.theguardian.com/science/thelayscientist/2016/feb/18/hasarampagingaialgorithmreallykilledthousandsinpakistan
 [31] SKYNET: Applying Advanced Cloud-based Behavior Analytics. The Intercept, 8 May 2015. https://theintercept.com/document/2015/05/08/skynetapplyingadvancedcloudbasedbehavioranalytics/
 [32] J. Saunders, P. Hunt, and J. S. Hollywood. Predictions put into practice: a quasi-experimental evaluation of Chicago’s predictive policing pilot. Journal of Experimental Criminology, 12, 1–25, 2016.
 [33] F. Takens. Detecting strange attractors in turbulence. In: D. Rand and L.-S. Young (Eds.), Dynamical Systems and Turbulence, Lecture Notes in Mathematics, 898, 366–381, 1981.
 [34] A. S. Weigend and N. A. Gershenfeld (Eds.). Time Series Prediction: Forecasting the Future and Understanding the Past. Addison-Wesley, Reading, 1994.