No Return to Classical Reality
Abstract
At a fundamental level, the classical picture of the world is dead, and has been dead now for almost a century. Pinning down exactly which quantum phenomena are responsible for this has proved to be a tricky and controversial question, but a lot of progress has been made in the past few decades. We now have a range of precise statements showing that whatever the ultimate laws of Nature are, they cannot be classical. In this article, we review results on the fundamental phenomena of quantum theory that cannot be understood in classical terms. We proceed by first granting quite a broad notion of classicality, then describe a range of quantum phenomena (such as randomness, discreteness, the indistinguishability of states, measurement-uncertainty, measurement-disturbance, complementarity, non-commutativity, interference, the no-cloning theorem, and the collapse of the wavepacket) that do fall under its liberal scope, and finally describe some aspects of quantum physics that can never admit a classical understanding – the intrinsically quantum mechanical aspects of Nature. The most famous of these is Bell’s theorem, but we also review two more recent results in this area. Firstly, Hardy’s theorem shows that even a finite dimensional quantum system must contain an infinite amount of information, and secondly, the Pusey–Barrett–Rudolph theorem shows that the wavefunction must be an objective property of an individual quantum system. Besides being of foundational interest, results of this sort now find surprising practical applications in areas such as quantum information science and the simulation of quantum systems.
1 Introduction
We are constantly told that quantum theory has revolutionized our understanding of the universe, and reveals a strange new world, radically different from classical Newtonian mechanics – cats can be both alive and dead; particles can disappear and reappear behind the moon; spooky action-at-a-distance causes instantaneous effects at the other side of the universe; measuring one observable disturbs the value of another in a strange, conspiratorial way. But one can press the matter beyond the overused lines of newspaper articles and pop-science accounts, and ask: which quantum phenomena unequivocally force us to discard long-held, classical conceptions of the universe in the same way that a constant speed of light for all local observers forces us to discard the notion of absolute time? Which phenomena of quantum theory are intrinsically non-classical?
We might quickly point to things like “wavefunctions” and “Hilbert space”[1, 2, 3], but these are simply technical features of the mathematics of quantum theory, and on their own shed no light on how the physics of quantum mechanics radically departs from the classical realm. Indeed long ago Koopman and von Neumann showed that classical mechanics itself can be formulated in Hilbert space [4, 5, 6].
Discretization of observables, such as atomic energy levels, does not really present any dramatic shift in our perspectives – one can easily define a classical phase space that is discrete in positions and momenta with permutations for dynamics. Another answer might be that quantum reality is fundamentally probabilistic and that “uncertainty is hardwired into physics”, but again is this really such a radical departure from classicality? The behaviour of the particles in a hot cup of coffee is massively unpredictable, and we have absolutely no chance of describing them in any precise sense – should we view quantum physics as an exaggerated form of statistical randomness, perhaps in which we now have an irreducibly poor resolution of the complicated underlying details?
Yet another response might be that the collapse of the wavefunction allows us to instantaneously cause the state of the rest of the universe to change entirely. However, consider the following scenario: both you and a friend (who lives on the other side of the galaxy) have been given a box each. You are both told that one box contains a gold coin while the other box contains a silver coin. Before you open your box, you are completely ignorant as to what is in your particular box, and so you can only predict that with probability 1/2 you have gold, and with probability 1/2 you have silver. Moreover, correlations exist: you know that if you have a silver coin then your friend has a gold coin, and conversely if you have gold then they have silver. You open the box, and to your delight, discover that you have gold. At that same instant you also know that the box on the other side of the galaxy must contain silver. An instantaneous collapse of the probability distribution has taken place! Is quantum entanglement and the collapse of the wavefunction simply an exaggerated form of this probabilistic updating?
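The coin-box reasoning above is easy to make quantitative. The following is a minimal sketch (the sampling procedure and all names in it are our own illustration, not taken from the text) showing that the "instantaneous collapse" here is nothing but Bayesian updating of a correlated classical distribution:

```python
import random

random.seed(0)

# One box gets gold, the other silver, assigned at random;
# you keep the first coin, your distant friend gets the second.
def sample_boxes():
    coins = ["gold", "silver"]
    random.shuffle(coins)
    return coins[0], coins[1]

trials = [sample_boxes() for _ in range(100_000)]

# Before opening: your coin is gold with probability ~1/2.
p_gold = sum(mine == "gold" for mine, _ in trials) / len(trials)

# After opening and finding gold: the friend's coin is *always* silver.
friend_given_gold = {theirs for mine, theirs in trials if mine == "gold"}

print(p_gold)             # close to 0.5
print(friend_given_gold)  # {'silver'}
```

The "collapse" on opening the box changes only your description of the distant coin, not the coin itself – exactly the classical updating the text describes.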
The aim of this review is really twofold: firstly to show that the above phenomena are not the features of quantum physics that overthrow a classical conception of the world, and secondly to identify a range of deeper phenomena that do.
Of course, in order for us to provide meaningful answers we must commit to some minimal notion of classicality. The simple criterion that guides us is the following:
If a phenomenon of quantum physics also occurs within a classical statistical physics setting, perhaps with minor additional assumptions that don’t violently clash with our everyday conceptions, then it should not be viewed as an intrinsically quantum mechanical phenomenon.
This informal condition provides a standard for how surprised we should be by any quantum phenomenon. The term “everyday conceptions” is intentionally vague at this point, and ultimately depends on what the reader deems “classically reasonable”. However, the key point here is that the more liberal you are with “classically reasonable”, the stricter you are with what aspects of quantum theory challenge your classical conception of the world. In what follows we adopt a fairly generous notion of “classicality”, or equivalently, we adopt a high standard for what we call “intrinsically quantum”. (Ultimately, the formal definition of “classical” will be that the theory is a “local, non-contextual theory in which non-orthogonal pure states are represented by overlapping statistical distributions defined on some state space”.) We start by first fleshing out the above notion of classicality, and then exhibiting a range of quantum phenomena that, on their own, do not seriously challenge our classical conceptions. After mapping out these “classical fragments” of quantum theory, we then identify those quantum phenomena that forever banish the classical realm.
1.1 Overview
This review covers several interrelated facets of the foundations of quantum theory, which at times can become quite abstract and subtle. To avoid confusion, we start with a rough outline.
The purpose of §2 is to show that the quantum phenomena of measurement-disturbance, complementarity, randomness, collapse of the wavepacket, and others, also appear in classical statistical mechanics supplemented with minor additional assumptions. §2.1 contains the core concepts, and is largely self-contained. §2.2 provides a more physical model that makes a direct connection to quantum physics. The conclusion of this is that Gaussian quantum physics is essentially classical in nature (see Appendix D for a definition of Gaussian quantum physics). §2.3 and §2.4 analyse the no-cloning theorem, and wavepacket collapse in the Einstein-Podolsky-Rosen experiment within this model, showing that they too are essentially classical.
Leading on from this, §2.5 and §2.6 discuss what it means for quantum phenomena to have a classical statistical model in general. This sets the scene for discussing intrinsically quantum-mechanical phenomena. These sections are a bit more abstract and technical, so a casual reader should just take the core message from §2.5 that a probabilistic framework allows us to place quantum theory in a broader context, and in doing so contrast it with other theories such as classical theory.
§3 identifies three intrinsically quantum-mechanical phenomena. §3.1 reviews Bell’s theorem. This is a mostly self-contained discussion, and can be read on its own with only a rough overview of §2. The take-home message is that the correlations obtained from measuring entangled states force us into a dilemma: either abandon basic notions of realism or abandon the relativistic notion that influences cannot travel faster than light. §3.2 discusses Hardy’s theorem. This requires understanding the basic framework of §2.5, and the take-home message is that quantum systems display seemingly contradictory properties of being both continuous and discrete and, contrary to traditional statements, it is the continuity which is puzzling. Finally, §3.3 discusses the recent Pusey-Barrett-Rudolph theorem, which shows that the wavefunction must be an objective property of an individual system. This requires a little more background from §2.5 as well as a basic understanding of §2.6.
2 Classical Fragments of Quantum Theory
It is clear that quantum mechanics must accommodate some kind of emergent “classical properties”, and some kind of a “classical regime”, in which Newtonian mechanics is recovered as a limiting case. However our goal here is not to describe a classical limit, but rather to set up a sufficiently broad notion of classicality so that anything which does not fall under this notion must be deemed intrinsically quantum in character.
In this section we show that classical fragments exist within quantum theory, i.e. there are a range of phenomena in quantum physics that also appear in classical statistical physics supplemented with assumptions that do not violently clash with our intuitions about classical physics. However, if one tries to stretch this classical framework across all of quantum physics, then “classically unreasonable” features always appear – quantum physics can never be fit cleanly into a classical framework. By carving out these classical fragments and delineating their boundaries, we can identify the genuinely nonclassical aspects of Nature.
2.1 A toy classical universe with some odd features
Let us begin with an extremely simple statistical mechanics example due to Spekkens [7] that captures some of the conceptual problems we face in identifying genuinely quantum phenomena. Imagine a classical particle that can be in one of four possible states. For example, we might imagine a box with four internal cells to it, and the particle is located in one of the cells. For simplicity we can imagine that the cells form a 2×2 grid, which allows us to label these cells via discrete coordinates as (x, y), with x, y ∈ {0, 1}. These are the exact states, or microstates, of the system (see Figure 1).
However, now suppose that the box is so small that all our measurement devices are blunt, clumsy probes that only return coarse answers as to where the particle is actually located. Instead of precise microstates, we are forced to use statistical macrostates, which are probability distributions p(x, y), where p(x, y) is the probability that the particle is in the cell (x, y).
So far everything is elementary classical statistical mechanics of a simple system, which requires exactly two bits of data, x and y, to specify the microstate of the particle. However, suppose we now conjure up a new fundamental law for this toy-universe [7], and make the following assumption on our ultimate ability to determine the microstate of the particle:
Resolution Restriction (RR-condition): It is impossible to possess more than a single bit of information about the microstate of the classical system.
In other words, the macrostate is allowed to be fully random, assigning probability 1/4 to every cell, or to have a single sharp coordinate, x = 0 say, so that the distribution that is uniform over the cells (0, 0) and (0, 1) is allowed. However, it cannot have any weaker form of randomness; for example, a sharp microstate such as (0, 0) is fundamentally excluded from the toy-theory.
There are six extremal macrostates in the theory, which are the six minimal uncertainty states shown in Figure 1.
The key thing to note is that while clearly not a fundamental part of classical mechanics, this RR-condition does not dramatically overthrow our classical conceptions – it simply describes a classical scenario where we have a bound on our resolving power. However, despite such simplicity, the RR-condition has a range of surprising consequences for the physics of this toy-universe. For a start, it implies that the six extremal macrostates of the theory cannot be perfectly distinguished. For example, if the system is prepared either in the macrostate that is uniform over (0, 0) and (0, 1), or in the macrostate that is uniform over (0, 0) and (1, 0), and you do not know which, then there is no procedure that will tell you which is the case with certainty. This is because the two distributions overlap – they each assign probability 1/2 to the microstate (0, 0). Therefore, if the system happens to occupy this microstate, which will happen with probability 1/2, then there is nothing you can possibly do to distinguish the two macrostates. This parallels the fact that non-orthogonal quantum states, such as two distinct, non-antiparallel spin states of a spin-1/2 particle [1, 8, 2], cannot be perfectly distinguished. Secondly, the RR-condition not only places restrictions on what can be measured, but it also implies that all measurements in the toy-universe must uncontrollably disturb the particle.
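The indistinguishability of overlapping macrostates can be checked directly. The sketch below assumes a hypothetical (x, y) labelling of the four cells; it computes the optimal probability of correctly guessing which of two overlapping macrostates was prepared, granting the guesser an impossibly perfect view of the microstate:

```python
from itertools import product

# Two overlapping macrostates on a 2x2 grid of cells (the labels are
# illustrative, not taken from the text).
cells = list(product([0, 1], repeat=2))
mu = {(0, 0): 0.5, (0, 1): 0.5}   # macrostate A
nu = {(0, 0): 0.5, (1, 0): 0.5}   # macrostate B, overlaps A in (0, 0)

# Even an observer who sees the exact microstate cannot always tell
# A from B: on the shared cell the best strategy is a 50/50 guess.
# Optimal success probability with equal priors on A and B:
success = 0.0
for c in cells:
    pa = mu.get(c, 0.0)
    pb = nu.get(c, 0.0)
    success += 0.5 * max(pa, pb)  # always guess the likelier macrostate

print(success)  # 0.75, not 1.0: perfect discrimination is impossible
```

The shared microstate contributes the irreducible error, mirroring the overlap argument in the text.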
Since we want to respect the RR-condition in this toy-universe in the simplest way, it is reasonable to assume that any idealized measurement that we can perform obeys the following two conditions:

1. Consistency with the RR-condition: Whenever the system starts in a macrostate that obeys the RR-condition, it must end up in a macrostate that still obeys the RR-condition after the measurement has been performed and we have recorded the outcome.

2. Repeatability: When a measurement is performed and a certain outcome is obtained, then repeating the measurement immediately afterwards should yield the same outcome [2].
Given these two conditions, it is easy to see that there is no measurement that reveals exactly which cell the particle is in. Suppose such a measurement were allowed and consider, for example, what would happen if its outcome revealed that the system was in the cell (0, 0). Because of repeatability, the system must remain in the cell (0, 0) after the measurement, since otherwise there would be some probability of obtaining a different result upon repeating the measurement. However, this is incompatible with the RR-condition because it would leave us with full information about which cell the particle is in. We conclude that measurements in the toy universe must necessarily only reveal coarse-grained information about the microstate.
Whilst we cannot do a measurement that tells us which exact cell the particle is in, it turns out that we are allowed to do a coarse-grained measurement M₁, with two outcomes labelled by the value of x, that returns one of two answers:

Outcome x = 0: “The particle is in (0, 0) or (0, 1)”
Outcome x = 1: “The particle is in (1, 0) or (1, 1)”.
Now consider a situation in which the system is prepared in the macrostate that is uniform over (0, 0) and (1, 1), namely half the time the particle is in the cell (0, 0), and the other half of the time it is in the cell (1, 1).
If we perform the measurement M₁ then we will get the outcome x = 0 half the time. Crucially, if the measurement does not cause a disturbance to the system, then getting this outcome would allow us to conclude that the particle must be in the cell (0, 0), because it had zero probability of being in the cell (0, 1) to begin with. To avoid this violation of the RR-condition, the particle must sometimes get kicked into another cell during the measurement procedure. By repeatability, the only cell it can get kicked to is (0, 1), because this is the only other cell that gives the outcome x = 0 with certainty. In fact, to satisfy both the RR-condition and repeatability for any valid starting distribution, upon obtaining the outcome x = 0, half the time the particle must remain where it is, and the other half of the time the microstates (0, 0) and (0, 1) must be swapped. Thus, starting in the macrostate that is uniform over (0, 0) and (1, 1) before the measurement, the measurement disturbs the system and, upon the outcome x = 0, sends it to the macrostate that is uniform over (0, 0) and (0, 1) (see Figure 2). Measurement-disturbance necessarily exists in the physics of this toy universe. Indeed, the only macrostates that are not disturbed by measuring M₁ are the two macrostates with a sharp value of x.
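This disturbance rule can be simulated. The sketch below assumes a hypothetical labelling of the four cells as pairs (x, y) with x, y in {0, 1}, and models the coarse x-measurement as reporting x and then swapping the two cells within that column with probability 1/2 – one simple rule consistent with both repeatability and the RR-condition:

```python
import random

random.seed(1)

def measure_x(cell):
    """Coarse measurement: report x, then swap the two cells of the
    column with probability 1/2 to protect the RR-condition."""
    x, y = cell
    if random.random() < 0.5:
        y = 1 - y  # the disturbance
    return x, (x, y)

post = {}
for _ in range(100_000):
    # Prepare the macrostate uniform over (0,0) and (1,1).
    cell = random.choice([(0, 0), (1, 1)])
    outcome, cell = measure_x(cell)
    # Repeatability: measuring again must give the same outcome.
    outcome2, cell = measure_x(cell)
    assert outcome2 == outcome
    post[(outcome, cell)] = post.get((outcome, cell), 0) + 1

# Given outcome x = 0, the particle ends up uniform over (0,0) and
# (0,1), so we never gain full microstate knowledge.
n00 = post[(0, (0, 0))]
n01 = post[(0, (0, 1))]
print(round(n00 / (n00 + n01), 2))  # ~0.5
```

The assertion checks repeatability on every run, while the final ratio shows that the post-measurement macrostate still carries a full bit of uncertainty.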
In addition to M₁, there are two other extremal measurements in the toy theory: M₂, which distinguishes (0, 0) and (1, 0) from (0, 1) and (1, 1), and M₃, which distinguishes (0, 0) and (1, 1) from (0, 1) and (1, 0). Each of these measurements necessarily induces a disturbance of a similar type to a measurement of M₁. The three extremal measurements are illustrated in Figure 3.
This measurement disturbance is entirely classical, however it leads to phenomena that are familiar in quantum mechanics (in appendices B–D, we review those aspects of quantum mechanics that are relevant to this article) – for example, the measurement of spin along the z axis for a spin-1/2 particle disturbs all quantum states except the two eigenstates of that measurement. As it turns out, the classical measurements on macrostates have probabilities and disturbance patterns that perfectly mimic the three quantum spin measurements along the x, y and z directions performed on the corresponding spin eigenstates. The macrostates in the toy-model also display a form of complementarity in terms of the three measurements M₁, M₂ and M₃. For example, the macrostates with a sharp value of x have deterministic outcomes for measurement M₁; however, the outcomes of measurements M₂ and M₃ are fully random on these macrostates (see [2] or [8] for careful discussions of complementarity in quantum theory).
The disturbance induced by measurements also implies that the results obtained in a sequence of measurements depend on the order in which the measurements are made. Such a dependence is often taken to be a signature of non-commutativity in quantum mechanics, so in this sense we can say that the classical measurements in the toy model are non-commutative. For example, if we make two M₁ measurements in a row, one immediately after the other, then, by repeatability, we will get the same result both times. However, if M₂ is measured between the two M₁ measurements, then the disturbance it induces will cause the outcome of the second M₁ measurement to be totally random, and uncorrelated with the first.
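The order dependence can also be seen in simulation. As before, this sketch assumes hypothetical (x, y) cell labels and models each coarse measurement as revealing one coordinate while randomising the conjugate one with probability 1/2:

```python
import random

random.seed(2)

def measure(cell, axis):
    """Reveal one coordinate, randomising the other with prob. 1/2."""
    x, y = cell
    if axis == "x":
        if random.random() < 0.5:
            y = 1 - y
        return x, (x, y)
    else:
        if random.random() < 0.5:
            x = 1 - x
        return y, (x, y)

same = 0
trials = 100_000
for _ in range(trials):
    cell = (random.randint(0, 1), random.randint(0, 1))
    a1, cell = measure(cell, "x")
    _,  cell = measure(cell, "y")   # intervening measurement disturbs x
    a2, cell = measure(cell, "x")
    same += (a1 == a2)

# Without the intervening y-measurement, repeatability forces a1 == a2;
# with it, the two x-outcomes agree only about half the time.
print(round(same / trials, 2))  # ~0.5
```

Removing the middle measurement makes the agreement exact, which is the classical analogue of the order-dependence described above.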
It is vital to emphasize that this example is most definitely not trying to show that quantum theory is actually some funny classical model, but simply that the phenomena of measurement uncertainty and measurement disturbance in quantum physics also arise within a classical model with simple additional assumptions (a bound on the sharpness of classical measurements) that do not overthrow our classical conception of the world. Therefore, according to the previously stated notion of classicality, measurement uncertainty, complementarity, non-commutativity, and measurement disturbance are not viewed as intrinsically quantum mechanical phenomena, and so we must search deeper.
Because the three sharpest measurements M₁, M₂ and M₃ on the six extremal macrostates turn out to have identical statistics to elementary quantum states and measurements, the toy-theory also mimics interference – in spite of it being a classical statistical theory. At the simplest level, interference is our ability to reversibly evolve a quantum state such as |+⟩ = (|0⟩ + |1⟩)/√2 to another state |0⟩, where {|0⟩, |1⟩} is an orthonormal basis for a 2-dimensional, qubit system (see appendix C for details). The evolution U acts linearly on states and is defined by the transformations U|+⟩ = |0⟩ and U|−⟩ = |1⟩, where |−⟩ = (|0⟩ − |1⟩)/√2. Under such an evolution the |0⟩ component of |+⟩ is enhanced (constructive interference), while the |1⟩ component of |+⟩ is eliminated (destructive interference) [1, 8]. If we were only keeping track of the measurement statistics of measuring in the basis {|0⟩, |1⟩}, then the above interference would be described by the reversible evolution of measurement statistics (1/2, 1/2) to/from (1, 0).
Such behaviour is perfectly mimicked within the classical toy model. For example, the minimal uncertainty macrostate that is uniform over (0, 0) and (1, 0) gives rise to outcome statistics (1/2, 1/2) for the M₁ measurement, but it can be converted in a reversible way (deterministically shuffle the cells around) into any of the other minimal uncertainty macrostates. The reversible transformation that swaps the cells (1, 0) and (0, 1) transforms it into the macrostate that is uniform over (0, 0) and (0, 1), and hence the measurement statistics of M₁ go from (1/2, 1/2) to (1, 0). If we simply look at the measurement statistics, this is indistinguishable from the quantum example of interference described above. Again, it is important to note that we are not claiming that general quantum interference is classical, merely that a similar phenomenon can appear in the reversible dynamics of classical statistical models. A simple notion of “interference” is therefore too blunt an answer to the question of what quantum phenomenon is intrinsically non-classical, and the question requires more precision.
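The statistics of this classical "interference" can be verified in a few lines. The cell labels and the particular permutation below are illustrative choices, not taken from the text:

```python
# Macrostates as probability distributions over hypothetical (x, y) cells.
state = {(0, 0): 0.5, (1, 0): 0.5}   # M1 statistics: 50/50 on x

# A reversible (permutation) dynamics on the four cells:
# swap (1, 0) <-> (0, 1), leave the other two cells alone.
perm = {(0, 0): (0, 0), (1, 0): (0, 1), (0, 1): (1, 0), (1, 1): (1, 1)}
evolved = {perm[c]: p for c, p in state.items()}

def m1_stats(rho):
    """Probability of each x-outcome for the coarse measurement M1."""
    out = {0: 0.0, 1: 0.0}
    for (x, _), p in rho.items():
        out[x] += p
    return out

print(m1_stats(state))    # {0: 0.5, 1: 0.5}
print(m1_stats(evolved))  # {0: 1.0, 1: 0.0}
```

The 50/50 statistics evolve reversibly into deterministic ones, the classical shadow of constructive and destructive interference.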
Remarkably, the above classical toy-model exhibits an array of other phenomena traditionally associated with quantum mechanics, such as the collapse of the wavepacket, state teleportation [9] and the impossibility of cloning states [10, 11]. All of these striking phenomena stem solely from a single restriction on classical resolving power. Instead of describing all these within the toy-model, we shall upgrade the basic RR-condition to obtain a more natural and intuitive scenario in which these phenomena are more vivid and generate a genuine classical fragment of quantum theory.
2.2 Gaussian states and operations are a classical fragment of quantum physics
The toy model of a classical particle in a box, subject to a single constraint on resolving power, leads to a mimicking of phenomena that are traditionally deemed quantum mechanical in character. However, this simple classical model can now be magnified into a more surprising result that has direct contact with quantum mechanics [12].
Consider the phase space of a classical system [13], parameterized by position and momentum variables (x, p), but now imagine discretizing the phase space into boxes of some fixed size and imposing a Resolution Restriction condition such that our Liouville distributions can never have smaller support than some limited number of boxes over the phase space. Such a simplistic approach would be clunky in that it would depend on an arbitrary way of partitioning phase space. To get around this we can instead impose an RR-constraint at the level of the expectation values of canonical coordinates. When we do so, we find that the resulting physical theory makes exactly the same predictions as Gaussian quantum mechanics (we give a short account of Gaussian states and operations in appendix D; see [14] for a more in-depth review).
For simplicity, we restrict attention to a single classical particle moving in one spatial dimension, but the construction can easily be generalized to any classical system. The particle has microstates (x, p) that make up the system’s phase space. A statistical Liouville distribution is then a probability distribution μ(x, p) on the phase space with μ(x, p) ≥ 0 and such that ∫ μ(x, p) dx dp = 1. From this we can compute the statistical properties of the system, such as the expectation value of position ⟨x⟩ = ∫ x μ(x, p) dx dp, or the expectation value of momentum ⟨p⟩ = ∫ p μ(x, p) dx dp.
In classical mechanics there is no limit on how sharp the predictions of μ can be – we can know the precise microstate to any degree of accuracy. An RR-constraint can be imposed by limiting how small the fluctuations about the mean can be. The form that this constraint should take is motivated by the symmetries and structure of phase space, e.g. we want the constraint to be preserved by classical time evolution of the system. Since classical dynamics may cause fluctuations in x to be transferred into fluctuations in p and vice versa, and may induce correlations between x and p, it is natural to impose the constraint on the fluctuations in x, in p, and on their correlations taken all together. To do this we construct a fluctuation matrix γ, which is given in terms of position and momentum by
(1)  γ = [[ ⟨Δx²⟩, ⟨ΔxΔp⟩ ], [ ⟨ΔxΔp⟩, ⟨Δp²⟩ ]],

where Δx := x − ⟨x⟩ and Δp := p − ⟨p⟩, so that ⟨Δx²⟩ and ⟨Δp²⟩ are the variances of the position and momentum in the particular Liouville distribution μ, and ⟨ΔxΔp⟩ is their covariance.
Using the matrix γ, we can define an RR-condition that restricts how small the fluctuations in phase space can get. An elegant way to do this is to demand that γ obey the matrix inequality
(2)  γ + sM ≥ 0,
for some matrix M, and some constant scale s that measures the size of the “boxes” on phase space, where Eq. (2) means that the eigenvalues of γ + sM are all non-negative. Since the eigenvalues of γ itself are always non-negative, the case s = 0 corresponds to switching off the RR-constraint, and so s controls the level of fluctuations within the toy classical universe.
The question of which matrix M to use is more subtle, but we want to choose it such that Eq. (2) is preserved under classical time evolution, i.e. if it holds at an initial time then it should also hold at any later time under any dynamics allowed by classical mechanics. It turns out that setting M = iΩ, where i is the imaginary unit and
(3)  Ω = [[ 0, 1 ], [ −1, 0 ]],
does the trick, so the final RR-condition is
(4)  γ + isΩ ≥ 0,
for some fixed minimal resolving scale s on the classical phase space. See appendix A for more details of this construction.
Finally, because we are following a statistical mechanics account of the physics [15, 16], for a given fluctuation matrix γ we use the Gibbsian distribution that maximizes the thermodynamic entropy. Thus the scenario we have described is precisely one of classical statistical mechanics where our classical resolving power is bounded in phase space by a scale s. Indeed, Eq. (4) implies that ⟨Δx²⟩⟨Δp²⟩ ≥ s², and so we find that the RR-condition encodes a classical uncertainty relation on the statistical system.
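The RR-condition is easy to probe numerically. The sketch below assumes the reconstructed form γ + isΩ ≥ 0 of Eq. (4) and tests it via the eigenvalues of the corresponding Hermitian matrix; the variances and the unit scale are illustrative inputs:

```python
import numpy as np

Omega = np.array([[0.0, 1.0], [-1.0, 0.0]])
s = 1.0  # the minimal resolving scale (units absorbed)

def satisfies_rr(var_x, var_p, cov=0.0, scale=s):
    """Check gamma + i*scale*Omega >= 0 for the given second moments."""
    gamma = np.array([[var_x, cov], [cov, var_p]])
    eigs = np.linalg.eigvalsh(gamma + 1j * scale * Omega)  # Hermitian
    return bool(np.all(eigs >= -1e-12))

# The condition enforces the classical uncertainty relation dx*dp >= s:
print(satisfies_rr(2.0, 2.0))   # True:  dx*dp = 2 >= 1
print(satisfies_rr(0.5, 0.5))   # False: dx*dp = 0.5 < 1
print(satisfies_rr(1.0, 1.0))   # True:  boundary case dx*dp = 1
```

The determinant of γ + isΩ is ⟨Δx²⟩⟨Δp²⟩ − ⟨ΔxΔp⟩² − s², so positivity of the eigenvalues reproduces the uncertainty relation quoted above.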
The RR-condition can be interpreted as a kind of externally imposed complementarity between position and momentum for the classical system, and, as with the previous toy model, we have measurement-disturbance: localizing the position of the particle must randomly disturb its momentum in order to maintain repeatability and the RR-condition. Once again, the theory displays a type of “interference” in terms of its macrostates in the sense described earlier, and the structure of the extremal macrostates of the theory is surprisingly rich.
To what degree does this mimic quantum mechanical phenomena? What classical fragment of quantum theory does this model define? The answer is perhaps surprising – the above scenario, in which we use classical statistical mechanics under the RR-condition Eq. (4), is precisely isomorphic to Gaussian quantum mechanics [12, 14] (see appendix D for a definition of Gaussian quantum mechanics). In other words,
If Nature only prepared Gaussian quantum states, and only performed Gaussian evolution and Gaussian measurements, then classical statistical mechanics with a single resolving constraint (with s = ℏ/2) would perfectly reproduce all physical predictions.
The proof that all of the features of Gaussian quantum mechanics are reproduced is an involved computation, and we refer the reader to [12] for the details.
Thankfully, there is more to life than Gaussian physics, but this correspondence still tells us some useful things. Firstly, it shows that Liouville mechanics under the RR-condition reproduces all quantum phenomena that are present in Gaussian quantum mechanics [14]. This includes teleportation, superdense coding [17], remote steering [18], secure key distribution [19], no-cloning [10, 11], and the collapse of the wavepacket (we discuss some of these shortly). However, the real value of the result is that it narrows our hunt and tells us that the intrinsically non-classical phenomena, such as Bell non-locality [20, 21], quantum computation [22] and contextuality [23], must necessarily be non-Gaussian in nature.
It should also be emphasized that, in a precise sense, there is no “middle ground” between Gaussian quantum mechanics and the full set of quantum operations. Specifically, the unitary transformations that describe the dynamics in Gaussian quantum mechanics are all those of the form U = e^{−iHt/ℏ}, where the Hamiltonian H is at most quadratic in the canonical coordinates x and p. Now imagine that we have a single non-quadratic term that can be added to the Hamiltonian and which can be switched on or off whenever we want. By adding this single term to the set of quadratic Hamiltonians, the set of unitary transformations we can achieve explodes to become the full set of unitaries on the Hilbert space [24]. This means that the Gaussian fragment is in a sense the largest classical fragment we can obtain by following this line, and must rest right up against genuine non-classicality.
2.3 No-cloning is a classical statistical phenomenon
It is often maintained that the impossibility of cloning quantum information is a distinctly quantum mechanical phenomenon. Formally, the no-cloning theorem [10, 11] says that it is impossible to construct a physical device that, on input of any quantum state |ψ⟩, will return the duplicated state |ψ⟩ ⊗ |ψ⟩. Indeed, if such a magical device existed then one could even violate relativity and signal faster than light. (The rough idea is that if Alice repeatedly clones one half of an entangled state that has been collapsed by a remote measurement made by Bob on the other half of the state, then she can magnify the information as to what type of state she possesses, e.g. whether it is a momentum eigenstate or a position eigenstate. Bob can use this to signal faster than light by choosing which type of state to collapse Alice’s system to, e.g. momentum eigenstates = “yes”, position eigenstates = “no”.)
The proof of the no-cloning theorem in quantum theory is very straightforward. Suppose a device existed that could clone two non-orthogonal and non-identical states |ψ⟩, |ϕ⟩ ∈ H, where H is the Hilbert space of the system. Any physically allowed transformation in quantum theory is described by a unitary operation U on the joint Hilbert space of the primary system and some apparatus system, which together form a closed system. For a device that clones |ψ⟩ and |ϕ⟩, this transformation must satisfy
(5)  U(|ψ⟩ ⊗ |A₀⟩) = |ψ⟩ ⊗ |ψ⟩,
(6)  U(|ϕ⟩ ⊗ |A₀⟩) = |ϕ⟩ ⊗ |ϕ⟩,
where |A₀⟩ is the initial state of the apparatus. We can now compute the inner product of the two output states U(|ψ⟩ ⊗ |A₀⟩) and U(|ϕ⟩ ⊗ |A₀⟩) in two different ways. Firstly, using Eqs. (5) and (6), we have
(7)  (⟨ψ| ⊗ ⟨ψ|)(|ϕ⟩ ⊗ |ϕ⟩) = ⟨ψ|ϕ⟩².
Alternatively, we can use the fact that U is a reversible unitary evolution, and so U†U = I, where I is the identity operator, to give
(8)  (⟨ψ| ⊗ ⟨A₀|) U†U (|ϕ⟩ ⊗ |A₀⟩) = ⟨ψ|ϕ⟩⟨A₀|A₀⟩ = ⟨ψ|ϕ⟩,
where the last equality follows from the fact that |A₀⟩ is a unit vector. In other words, a unitary transformation preserves inner products. Equating the two expressions gives ⟨ψ|ϕ⟩² = ⟨ψ|ϕ⟩, which is only satisfied when ⟨ψ|ϕ⟩ is either 0 or 1. However, since we assumed that |ψ⟩ and |ϕ⟩ are non-identical and non-orthogonal, this is a contradiction, and thus there is no physically allowed cloning device in quantum theory.
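The two facts used in this proof – that unitaries preserve inner products, and that c² = c only for c in {0, 1} – can be checked numerically. This is an illustrative sketch; the QR-based random-unitary recipe is a standard numerical trick, not part of the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_unitary(n):
    """Random unitary via QR decomposition with phase correction."""
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))  # rescale column phases

def ket(v):
    v = np.asarray(v, dtype=complex)
    return v / np.linalg.norm(v)

U = random_unitary(4)
psi, phi = ket(rng.normal(size=4)), ket(rng.normal(size=4))

# Unitaries preserve inner products ...
lhs = np.vdot(psi, phi)
rhs = np.vdot(U @ psi, U @ phi)
print(np.isclose(lhs, rhs))  # True

# ... so a cloner would need <psi|phi> = <psi|phi>**2, which holds
# only for inner product 0 or 1:
c = np.linspace(0, 1, 11)
print(c[np.isclose(c, c**2)])  # [0. 1.]
```

Any pair of non-identical, non-orthogonal inputs therefore rules the cloning map out of the set of unitaries.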
Whilst classical bits can be freely copied, this only applies to the exact values of the bits themselves. In light of the theories presented so far, it is perhaps better to think of a pure quantum state as analogous to a probability distribution over the values of classical bits, and these cannot be cloned. We find that the imposition of the RR-condition on the above classical statistical model generates precisely the same prohibition as in quantum theory: it is impossible to build a cloning device within this classical theory.
The proof parallels the quantum proof, and relies only on a basic property of Hamiltonian dynamics [12]. Suppose a classical device existed that could clone two overlapping but non-identical Liouville distributions μ and ν defined on a classical system S. Let A be a second classical “apparatus” system of equal size to S that is initialized in some fiducial macrostate σ. The composite system undergoes some Hamiltonian dynamics on the underlying microstates, which is assumed to clone μ and ν. The initial joint state is thus either μ_S σ_A or ν_S σ_A, where μ and ν are the input macrostates of the system, and μ_S σ_A and ν_S σ_A are defined on the joint phase space of S and A. If this evolution is a cloning process then, under the Hamiltonian dynamics, we must have
(9)  μ_S σ_A → μ_S μ_A,
(10)  ν_S σ_A → ν_S ν_A,

where μ_S and μ_A are identical distributions (both equal to μ), and similarly for ν_S and ν_A.
To mirror the inner product computation for quantum states, we use a classical measure [25] of how much two statistical distributions overlap. The only fact we need about the dynamics is the following: if μ and ν are two distributions on phase space, then the overlap integral F(μ, ν) = ∫ √(μν) dx dp is constant in time. Here, F is the classical “fidelity” measure, quantifying the degree to which the distributions overlap – if μ = ν then they overlap fully and F(μ, ν) = 1, while if they have zero overlap then F(μ, ν) = 0. Put another way, Hamiltonian dynamics evolves phase space distributions in a volume-preserving way, similar to an incompressible fluid, which implies that the fidelity is also preserved.
Now, we can compute the overlap of the two final states in two different ways. Firstly, using Eqs. (9) and (10), we have
(11) $F(\tilde{\mu}_1 \times \tilde{\mu}_1',\, \tilde{\mu}_2 \times \tilde{\mu}_2') = F(\mu_1, \mu_2)^2$
However, since Hamiltonian dynamics preserves fidelity, we can alternatively compute the fidelity of the initial states, which is
(12) $F(\mu_1 \times s_0,\, \mu_2 \times s_0) = F(\mu_1, \mu_2)\, F(s_0, s_0) = F(\mu_1, \mu_2)$
In parallel to the quantum case, these two equations can only be satisfied if $F(\mu_1, \mu_2) = 0$ or $F(\mu_1, \mu_2) = 1$ but, since we assumed $\mu_1$ and $\mu_2$ are overlapping and non-identical, this is a contradiction. We conclude that it is fundamentally impossible to construct a device that clones overlapping statistical distributions within the classical theory. The no-cloning theorem therefore applies not only to quantum theory, but also to classical statistical mechanics. Hence, it should not be considered an intrinsically quantum mechanical phenomenon.
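The two facts doing the work here – that the classical fidelity factorizes over independent subsystems, and that cloning would therefore require $F = F^2$ – can be checked numerically. The following is a minimal sketch in Python (not from the original text; the grid spacing, Gaussian widths and means are arbitrary illustrative choices):

```python
import numpy as np

# Discretize one coordinate of phase space (illustrative grid).
x = np.linspace(-8.0, 8.0, 801)
dx = x[1] - x[0]

def gaussian(mean, sigma=1.0):
    g = np.exp(-(x - mean) ** 2 / (2 * sigma ** 2))
    return g / (g.sum() * dx)   # normalize so that sum * dx = 1

def fidelity(f, g, measure):
    # Classical fidelity: F = integral of sqrt(f * g)
    return float(np.sum(np.sqrt(f * g)) * measure)

mu1, mu2 = gaussian(0.0), gaussian(1.0)   # overlapping, non-identical inputs
s0 = gaussian(3.0)                        # fiducial apparatus macrostate
F = fidelity(mu1, mu2, dx)                # 0 < F < 1

# Initial joint states mu_i x s0: fidelity factorizes, so it equals F * F(s0, s0) = F.
F_initial = fidelity(np.outer(mu1, s0), np.outer(mu2, s0), dx * dx)

# Hypothetical final (cloned) states mu_i x mu_i: fidelity would be F^2.
F_final = fidelity(np.outer(mu1, mu1), np.outer(mu2, mu2), dx * dx)

# Since Hamiltonian dynamics preserves fidelity, cloning needs F = F^2,
# which fails for any 0 < F < 1.
assert np.isclose(F_initial, F) and np.isclose(F_final, F ** 2)
assert not np.isclose(F, F ** 2)
```

For these two unit-width Gaussians the fidelity is $F = e^{-1/8} \approx 0.88$, strictly between $0$ and $1$, so no volume-preserving dynamics can implement the cloning map.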
2.4 The EPR argument and the collapse of the wavepacket
The seminal 1935 paper [26] by Einstein, Podolsky and Rosen asked whether quantum theory is “complete” or “incomplete”. In other words, perhaps quantum theory is only a stopgap, and there is a yet deeper theory in which “God does not play dice”. Their analysis revolved around a two-particle state that displays correlations between both the position and momentum degrees of freedom.
It turns out that the EPR state and the measurements they considered actually lie within the classical fragment we have just described – in other words, at least as far as the EPR argument and measurements are concerned, the answer is a tentative “yes”: a perfectly deterministic and local completion does exist that reproduces all the statistics of the EPR experiment. It is only when we get to Bell’s theorem that entanglement prohibits any such underlying degrees of freedom that only interact locally, and thus the classical picture of the world becomes untenable.
The EPR two-particle state is perfectly correlated in the two position degrees of freedom and has zero total momentum. Specifically, in the position representation, the state is
(13) $\Psi(x_1, x_2) = \delta(x_1 - x_2 - L)$
where $\delta$ is the Dirac delta-function distribution and $L$ is a constant. In the momentum representation the same state is
(14) $\tilde{\Psi}(p_1, p_2) \propto e^{-i p_1 L/\hbar}\, \delta(p_1 + p_2)$
Such a state might be obtained through the decay of some massive particle with zero momentum into two lighter particles that propagate in opposite directions and are now a distance $L$ apart. (Note that, although the delta functions make this state unphysical, the same argument can be run with properly normalized Gaussian states that approximate them [12]. We use the idealized version for simplicity.)
Now, according to the orthodox account of quantum theory (i.e. the textbook treatment of quantum mechanics), the state represents a situation in which neither particle has a definite position, since $\Psi$ is not an eigenstate of either of the position operators $\hat{x}_1$ or $\hat{x}_2$. Similarly, neither particle has a definite momentum. All we can say is that the system is in a state of definite relative position and definite total momentum, i.e. there is perfect correlation between the two positions and between the two momenta. Note that, in the orthodox account, it is not simply a matter of each particle having a definite position and momentum that is currently unknown to us, but rather that the individual positions and momenta do not exist, since the only properties that can be ascribed to the system are those corresponding to operators of which the state is an eigenstate.
The EPR argument amounts to the observation that if one measures the position of the first particle and obtains the value $a$, then the state of the remotely separated second particle is collapsed to a sharp position state at $x_2 = a - L$, which, if measured, will always yield the value $a - L$ with certainty. Thus, according to the orthodox account, the position of the second particle pops into existence as soon as the first is measured and, since $L$ is arbitrary, the particles may be arbitrarily far apart. This represents a kind of nonlocality in the orthodox account, since an observation made over here can cause something to instantaneously pop into existence very far away. The only way to avoid this is to “complete” quantum mechanics by positing that, in fact, the second particle did have a definite position before the first was measured, and this is exactly what EPR argued for.
The same argument can also be run in momentum space. If one chooses to measure the momentum of the first particle and finds it to be $p$, then the state of the second particle is collapsed to a sharp momentum state at $p_2 = -p$. Thus, according to the orthodox account, the momentum of the second particle pops into existence upon measuring the momentum of the first, and so EPR argued that the second particle must have a definite value of momentum prior to measurement in order to avoid nonlocality. It is rather striking that, on the orthodox account, the choice of which observable to measure affects which property pops into existence at a distant location, and that the completion of quantum mechanics proposed by EPR would violate a strict interpretation of the uncertainty principle. Note that this extra wrinkle on the argument is not required to establish the nonlocality of the orthodox account, which already follows from considering position measurements on their own.
It turns out that the statistical model [12] with restricted resolving power can exactly reproduce the EPR measurement statistics for position and momentum, since these are Gaussian measurements. Within this parallel model the “paradoxical” collapse, and the potential conflict with relativity, are illusory and no longer a concern. To represent the state given in Eqs. (13, 14) in the classical scenario, we define the two-particle distribution on the phase space of the two particles given by
(15) $\mu(x_1, p_1; x_2, p_2) \propto \delta(x_1 - x_2 - L)\, \delta(p_1 + p_2)$
which is the limit of a sequence of Gaussian distributions that all satisfy the RR-condition. This distribution has perfect correlations between the spatial and momentum degrees of freedom of the particles. The account of the EPR experiment now takes on a simple form: the actual states of the particles are microstates of definite position and definite momentum. When we perform a local measurement that determines the position of the first particle, it is simply the probability distribution for the total system that changes, and not the physical state of the remote system. This is no different to the example of the correlated coins that was provided in the introduction, and highlights that in many ways the collapse of the wavepacket is no more strange than the updating of a probability distribution; the objective state of the remote system remains the same, and there is no conflict with relativity theory. It requires a stronger result, such as Bell’s theorem, to conclusively rule out an account along these lines.
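This classical account can be made concrete with a small simulation. The sketch below (Python; the separation $L$, the measured value and the detector resolution are arbitrary illustrative choices) samples microstates of definite position and momentum from a broad Gaussian approximation to Eq. (15), and shows that “measuring” particle 1 merely conditions the probability distribution for particle 2:

```python
import numpy as np

rng = np.random.default_rng(0)
L = 5.0        # separation constant from Eq. (13) (illustrative value)
N = 100_000

# Microstates with definite positions and momenta, drawn from a broad
# Gaussian approximation to the correlated distribution of Eq. (15).
x1 = rng.normal(0.0, 10.0, N)
p1 = rng.normal(0.0, 10.0, N)
x2 = x1 - L    # perfect correlation: x1 - x2 = L
p2 = -p1       # zero total momentum: p1 + p2 = 0

# "Measuring" particle 1's position just conditions the distribution;
# the microstate of the remote particle never changes.
a = 2.0
mask = np.abs(x1 - a) < 0.05   # finite-resolution position measurement
x2_posterior = x2[mask]

# The updated distribution for particle 2 is sharply peaked at a - L.
assert mask.sum() > 0
assert np.all(np.abs(x2_posterior - (a - L)) < 0.06)
# Likewise, conditioning on p1 = p would peak particle 2's momentum at -p.
```

The “collapse” here is manifestly just Bayesian updating of the observer’s probability distribution over pre-existing microstates.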
2.5 Is $|\psi\rangle$ analogous to a thermodynamic Gibbs state?
Some people might claim that quantum mechanics is not the final theoretical framework for physics – that there might be some even more fundamental theory yet to be discovered, which includes quantum theory as some kind of limiting case. Could it really be that quantum theory is incomplete [26], and that there are underlying variables that give a more fine-grained description, and that our measurement devices actually respond to these underlying variables? Historically [15, 27], this is what happened with thermodynamics and statistical mechanics – the macroscopic properties of heat, temperature and pressure are well-defined properties obeying the laws of thermodynamics, however they admit a statistical mechanical description in terms of the rapid motion of underlying variables – atoms. How do we know that something like this could not happen in the future with quantum mechanics? Could the wavefunction be more like the Gibbs distribution of statistical mechanics, and somehow point to new underlying degrees of freedom?
We shall see that this can never happen. In a precise sense, and under a broad range of entirely reasonable assumptions, no such thing can happen in any future theory of physics. To tackle such a seemingly nebulous question, we must use a sufficiently general framework that contains only the most primitive notions of “states” and “statistical measurements”, and which can account for the predictions of quantum theory as a special case. Since the framework we describe can account for the predictions of quantum physics, as well as for theories which are not quantum theory, it effectively allows us to regard quantum mechanics as an object in itself, and to delineate its properties in contrast to other theories, including classical mechanics.
According to the textbook account [1, 3], the quantum state $|\psi\rangle$ provides a full description of a quantum system, both in terms of its subsequent evolution in time and how it responds to any measurement that we may wish to perform. Every orthonormal basis of the Hilbert space corresponds to a quantum measurement, with outcome probabilities given by the Born rule. A qubit is any quantum system whose Hilbert space is two-dimensional, and so any state of the system is expressible as $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$ for an orthonormal basis $\{|0\rangle, |1\rangle\}$ and complex numbers $\alpha, \beta$ obeying $|\alpha|^2 + |\beta|^2 = 1$. A general orthonormal basis contains two states $\{|\phi_0\rangle, |\phi_1\rangle\}$, and when a measurement is performed in this basis on a system prepared in the state $|\psi\rangle$, the outcome probabilities are simply given by $|\langle\phi_0|\psi\rangle|^2$ and $|\langle\phi_1|\psi\rangle|^2$.
Given this framework, what would it mean for the quantum state to be like a Gibbs state and admit some hypothetical microscopic structure? This would require that there exists some set $\Lambda$ of perhaps more fundamental states $\lambda$ that give a sharper description of the system [28, 29]. If this did turn out to be the case, then instead of using $|\psi\rangle$ to describe the physics we could replace it with a probability distribution over this set of variables. More precisely, each quantum state $|\psi\rangle$ would be associated with a probability distribution $\mu_\psi(\lambda)$ (even more precisely, probability distributions should be associated to the procedures for preparing quantum states rather than the states themselves, to account for a subtlety called preparation contextuality [50]; this subtlety does not affect any of the results presented here, and see [30] for a more rigorous treatment) for which $\mu_\psi(\lambda) \geq 0$ for any point $\lambda \in \Lambda$ and which is normalized as
(16) $\int_\Lambda \mu_\psi(\lambda)\, d\lambda = 1$
In particular, this means that the integral
(17) $\int_\Omega \mu_\psi(\lambda)\, d\lambda$
is the probability that the underlying microstate $\lambda$ of the system lies in the region $\Omega \subseteq \Lambda$ of the full state space when the state $|\psi\rangle$ is prepared experimentally.
Once we have a notion of quantum states being described by probability distributions, how do we describe quantum measurements? This too is relatively easy. Suppose we perform a measurement in the basis $\{|\phi_0\rangle, |\phi_1\rangle\}$ on a qubit prepared in the quantum state $|\psi\rangle$. According to quantum theory, the probabilities of the two outcomes are $|\langle\phi_0|\psi\rangle|^2$ and $|\langle\phi_1|\psi\rangle|^2$. However, in terms of the underlying variables, the measurement is described by a conditional probability distribution $\Pr(k|\lambda)$, where $\Pr(k|\lambda)$ is the probability that the measurement will return the outcome $k$ when the system occupies the microstate $\lambda$. This means that we must have $\Pr(k|\lambda) \geq 0$ for all $\lambda$, and also $\Pr(0|\lambda) + \Pr(1|\lambda) = 1$, so that the probabilities sum to one. If these functions are to correctly describe the observed measurement statistics then they must obey:
(18) $\int_\Lambda \Pr(k|\lambda)\, \mu_\psi(\lambda)\, d\lambda = |\langle\phi_k|\psi\rangle|^2$

for $k = 0, 1$.
More generally, for systems of arbitrary dimension $d$, a measurement in the basis $\{|\phi_k\rangle\}_{k=1}^d$ performed on a system prepared in the state $|\psi\rangle$ is described by a conditional probability distribution $\Pr(k|\lambda)$ over the $d$ outcomes that obeys
(19) $\int_\Lambda \Pr(k|\lambda)\, \mu_\psi(\lambda)\, d\lambda = |\langle\phi_k|\psi\rangle|^2$

for all $k$. This is what would be required in any hypothetical theory in order for the quantum predictions for measurement outcomes to be reproduced.
It is vital to emphasize that this formulation includes the orthodox description of quantum mechanics, and so it is a broader framework that allows questions to be posed that are impossible within the traditional setting. To show that it includes the orthodox account, simply take $\Lambda$ to be the set of all quantum states $|\psi\rangle$, where states that differ by a global phase are identified, and let $\mu_\psi$ be a delta-function distribution with weight just on the quantum state that is prepared. The measurement conditional probabilities are then simply taken to be the Born-rule probabilities $\Pr(k|\lambda) = |\langle\phi_k|\lambda\rangle|^2$, which trivially recovers the Born rule.
Why then should we bother to use such a description? The reason is that since this is a general, probabilistic setting that only requires the notion of abstract states and probability distributions, it uses only the most elementary notions of what one normally calls a “physical theory”. Such breadth makes it powerful, and will allow us to rule out alternative theories, identify intrinsically quantum phenomena, and to study quantum theory “from the outside”.
2.6 Overlapping distributions: The Kochen–Specker model
In the orthodox account of quantum theory, in which $\Lambda$ is just the set of quantum states, the distributions corresponding to different quantum states do not overlap – they are simply delta functions located at the different quantum states. However, for a qubit, there is a slightly more interesting representation, due to Kochen and Specker, which can be used to frame a fundamental question concerning the objectivity of the wavefunction in quantum mechanics [28, 23].
An arbitrary quantum state of a qubit can be written as
(20) $|\psi\rangle = \cos\frac{\theta}{2}\,|0\rangle + e^{i\varphi}\sin\frac{\theta}{2}\,|1\rangle$
where $0 \leq \theta \leq \pi$ and $0 \leq \varphi < 2\pi$. By defining the vector $\vec{n} = (\sin\theta\cos\varphi, \sin\theta\sin\varphi, \cos\theta)$, we see that the set of states of a qubit can be represented as points on the unit sphere $S^2$, which is known in this context as the Bloch sphere (see appendix C for further details). There is a one-to-one relation between every qubit state $|\psi\rangle$ and a vector $\vec{n}$ on the surface of the Bloch sphere and, in what follows, we simply use this vector to represent the quantum state. Moreover, if a state $\vec{n}$ is measured in a basis $\{\vec{m}, -\vec{m}\}$ then the probability of obtaining the outcome $\vec{m}$ is
(21) $\Pr(\vec{m}\,|\,\vec{n}) = \frac{1}{2}\left(1 + \vec{m}\cdot\vec{n}\right)$
Kochen and Specker’s model for a qubit employs the unit sphere $S^2$ as its space of microstates and, to every quantum state $\vec{n}$, they associate a probability distribution
(22) $\mu_{\vec{n}}(\vec{\lambda}) = \frac{1}{\pi}\, H(\vec{n}\cdot\vec{\lambda})\; \vec{n}\cdot\vec{\lambda}$
where $H$ is the Heaviside step function
(23) $H(x) = \begin{cases} 1 & \text{if } x \geq 0 \\ 0 & \text{if } x < 0 \end{cases}$
In the Bloch sphere representation, this means that $\mu_{\vec{n}}(\vec{\lambda})$ is only nonzero if the angle between $\vec{\lambda}$ and $\vec{n}$ is less than $\pi/2$, and it takes the value $\frac{1}{\pi}\,\vec{n}\cdot\vec{\lambda}$ on this hemisphere. See Figure 4 for an illustration of these probability distributions.
To represent measurements, a basis vector $\vec{m}$ is represented by the conditional probabilities $\Pr(\vec{m}\,|\,\vec{\lambda}) = H(\vec{m}\cdot\vec{\lambda})$, which describe the probability of getting the outcome $\vec{m}$ when the exact microstate is $\vec{\lambda}$. On the Bloch sphere, this means that the outcome will be $\vec{m}$ if the angle between $\vec{m}$ and $\vec{\lambda}$ is less than $\pi/2$ and will be the orthogonal basis state $-\vec{m}$ otherwise. A direct calculation (e.g. see [30]) shows that the model yields
(24) $\int_{S^2} \Pr(\vec{m}\,|\,\vec{\lambda})\, \mu_{\vec{n}}(\vec{\lambda})\, d\vec{\lambda} = \frac{1}{2}\left(1 + \vec{m}\cdot\vec{n}\right)$
and so does in fact reproduce the Born rule for a qubit system.
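Eq. (24) can be verified by Monte Carlo. The sketch below (Python; the measurement angle and sample size are arbitrary illustrative choices, and the state $\vec{n}$ is placed along the $z$-axis without loss of generality) samples microstates from $\mu_{\vec{n}}$ and applies the Heaviside response function:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000

# Quantum state n along the z-axis (without loss of generality) and a
# measurement direction m at angle alpha to it (illustrative value).
n = np.array([0.0, 0.0, 1.0])
alpha = 1.0
m = np.array([np.sin(alpha), 0.0, np.cos(alpha)])

# Sample microstates lam ~ mu_n: the density (1/pi) n.lam on the
# hemisphere n.lam > 0 gives p(cos t) = 2 cos t, so cos t = sqrt(u).
u = rng.random(N)
phi = 2.0 * np.pi * rng.random(N)
ct = np.sqrt(u)
st = np.sqrt(1.0 - ct ** 2)
lam = np.column_stack([st * np.cos(phi), st * np.sin(phi), ct])

# Heaviside response function: outcome m whenever m.lam >= 0.
p_model = float(np.mean(lam @ m >= 0.0))
p_born = 0.5 * (1.0 + float(m @ n))   # Eq. (21)

assert abs(p_model - p_born) < 0.01   # agrees with the Born rule
```

The sampled frequency of the $\vec{m}$ outcome matches $\frac{1}{2}(1 + \vec{m}\cdot\vec{n})$ to within the Monte Carlo error, as Eq. (24) requires.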
Why is this construction of interest to us? Firstly, the Kochen–Specker construction exactly reproduces the Born rule for a two-dimensional system and so can be viewed as a mini classical fragment of quantum theory. However, the real reason it is of interest is that if we plot the distribution functions for two different states we notice a distinct feature of the representation – two different quantum states $\vec{n}_1$ and $\vec{n}_2$ can have overlapping distributions. The core significance of this is that there is a region of microstates (the light-shaded part of Figure 4) that belongs to both distributions. If the system occupies a microstate $\vec{\lambda}$ that lies in this overlap region, then a unique wavefunction cannot be associated to it. In other words, within such a hypothetical model the quantum state would not be an objective property that is “carved into” the physical system!
Why might we want the distributions corresponding to different quantum states to overlap? Recall from §2.1 that the extremal states of the toy model overlap, as do the distributions in the restricted Liouville mechanics presented in §2.2. This overlap naturally explains why the extremal states cannot be perfectly distinguished, and why distributions cannot be cloned. Essentially, if two preparation procedures sometimes lead to the exact same microstate, then the action of any physical device cannot depend on which of the two preparation procedures was used whenever a microstate in the overlap region happens to be occupied. Therefore, the probability of successfully distinguishing or cloning macrostates is limited by the probability assigned to the overlap region. In the same way, the overlapping distributions in the Kochen–Specker model explain why qubit states cannot be perfectly distinguished or cloned within this model.
This raises the question of whether such overlapping distributions could ever occur in some future theory for general quantum systems, and, if they could, then how much we would have to contort our picture of the world. If we are necessarily forced to adopt something ridiculous or bizarre then we would conclude that such overlaps cannot occur – the quantum state must indeed be carved into physical systems, and must be an objective label of a quantum system. This would show that the explanations of quantum phenomena in terms of overlapping distributions, despite intuitive appeal, must in fact be wrong.
3 Genuinely Non-Classical Aspects of Quantum Physics
The previous results show that some fragments of quantum theory can be reproduced in ways that do not violently clash with our classical intuitions, but it is clear that we were being led into more and more contrived models in order to capture more and more of the phenomena of quantum theory. The take-home message so far is simply that many commonly touted phenomena – intrinsic randomness, complementarity, measurement-disturbance, no-cloning, collapse of the wavepacket, etc. – do not in themselves dramatically challenge our classical notions, as they already appear within theories such as the statistical mechanics model with the RR-condition presented above. In addition, we have shown that the physics of a two-dimensional quantum system can be reproduced by a statistical model in which the probability distributions associated to different quantum states can overlap. This raises the question of whether the quantum-mechanical wavefunction is necessarily an objective property of a quantum system.
We now describe results that reveal fundamentally non-classical phenomena, in the sense that any classical account underlying the quantum phenomena must be rather contorted. We begin with the seminal result of John Bell [20], and then move on to two more recent results that provide further insights into the strangeness of quantum theory. The first of these is Hardy’s theorem [32], which shows that the set of microstates must be infinitely large, even for a finite-dimensional system, and hence that such systems must contain an infinite amount of information. The second is the Pusey–Barrett–Rudolph theorem [34], which shows that, under reasonable assumptions, the quantum state must be an objective property of an individual quantum system.
3.1 Bell’s theorem: Quantum physics violates local causality
The departure of quantum mechanics from classicality was put into a very sharp and powerful form by John Bell [20, 2], who showed that some aspects of quantum entanglement can never fit into any model in which systems possess objective properties prior to measurement and which also obeys a principle of locality. Since the result only depends on certain empirically observed predictions of quantum theory, rather than the structure of the theory itself, any future theory beyond quantum theory will be subject to the same argument, so there can be no going back to a conception of the world that is both classical and local.
The version of Bell’s theorem we present here is due to Clauser, Horne, Shimony and Holt [21], and is the one most commonly used in experiments. To understand it, we need to explain both the mathematical components and the physical concepts. To avoid confusion, it is helpful to separate these two parts, so we begin with the mathematics.
The easiest way to understand the mathematics of Bell’s theorem is in terms of a cooperative game, in which Alice and Bob are playing as a team against Charlie. Suppose that Alice and Bob are captured and held captive by Charlie. Alice and Bob are told that the next morning they will be placed into two separate interrogation rooms with no possibility of communicating with each other. They will each be asked one of two possible yes/no questions. We call the question that Alice gets asked $x$ and the question that Bob gets asked $y$. For definiteness, we can imagine that both Alice’s and Bob’s questions are labelled $0$ and $1$, so that $x$ and $y$ are binary variables that take values $0$ or $1$. Let $a$ be the answer that Alice gives to question $x$ and let $b$ be the answer that Bob gives to question $y$. To be released, they must get their stories straight in a very particular way. If they are both asked question $1$ then their answers must obey $a \neq b$. In all other cases, i.e. if $(x, y) = (0,0)$, $(0,1)$ or $(1,0)$, they must provide answers for which $a = b$.
Alice and Bob are told that they can spend the night together to discuss their strategy for answering the questions. They also have access to devices for generating classical randomness – for definiteness suppose they have a set of dice with different weightings – which they may use to determine their strategy and which they may also bring into the interrogation room with them. They are assured that Charlie will not eavesdrop on their discussions and, in fact, that he will choose which questions to ask completely randomly by flipping two separate coins. What is the best strategy for Alice and Bob to adopt that gives them the highest chance of being released?
To begin with, let’s ignore the possibility of using randomness and ask what is the best that Alice and Bob can do if they employ a deterministic strategy, i.e. they simply have to decide, in advance, which answer Alice will give to each of her questions and which answer Bob will give to each of his. It is helpful to represent the answers $a_0$, $a_1$, $b_0$ and $b_1$ to the four possible questions as the four vertices of a graph [35, 36, 37], where we connect the variables that should be equal by a solid line and those that should be different by a dotted line (see Figure 5).
Now, let’s traverse the links in the graph and try to satisfy as many of the requirements as possible. Suppose we start by setting $a_0$ to “yes”. Following the top solid line, we see that $b_0$ should equal $a_0$, so it should also be assigned “yes”. The diagonal solid line connecting $b_0$ to $a_1$ then implies that $a_1$ should also be “yes”. The dotted line from $a_1$ to $b_1$ implies that $b_1$ should be different from $a_1$, so we set $b_1$ to “no”. However, now we have a problem because the diagonal solid line from $b_1$ to $a_0$ implies that these two should be equal, but we have already set $a_0$ to “yes” and $b_1$ to “no”, so we have only managed to satisfy three of the four requirements. Therefore, using this strategy, Alice and Bob will win if Charlie picks any of the question pairs $(0,0)$, $(1,0)$ or $(1,1)$, but they will lose if he picks $(0,1)$. Hence, their probability of winning is $3/4$, since the chance of Charlie picking $(0,1)$ is $1/4$ if he chooses his questions by two fair coin flips.
It is fairly easy to see that, however Alice and Bob assign “yes” and “no” to their questions, they can only satisfy at most three of the four requirements. This is because, however you go about traversing the graph and assigning answers, you can never satisfy the requirement encoded in the final link, because it contradicts the implications of the other three. Thus, $3/4$ is the largest probability with which Alice and Bob can win the game via a deterministic strategy.
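The $3/4$ bound can also be confirmed by brute force over all $2^4 = 16$ deterministic strategies. A short sketch in Python (answers are encoded as bits, with the game rule as stated above):

```python
from itertools import product

# Winning rule of the game: answers must differ iff both questions are 1.
def wins(a, b, x, y):
    return (a != b) if (x, y) == (1, 1) else (a == b)

# Enumerate all 16 deterministic strategies (a0, a1, b0, b1).
best = 0.0
for a0, a1, b0, b1 in product((0, 1), repeat=4):
    a, b = (a0, a1), (b0, b1)
    # Charlie picks (x, y) uniformly, so average over the four pairs.
    p = sum(wins(a[x], b[y], x, y) for x in (0, 1) for y in (0, 1)) / 4.0
    best = max(best, p)

assert best == 0.75   # no deterministic strategy beats 3/4
```

Every strategy satisfies exactly two, three or zero of the four constraints simultaneously, and the maximum over all of them is $3/4$.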
This is the basic mathematics of Bell’s theorem, but we still have to deal with the possibility of probabilistic strategies that employ randomness. There are two ways in which Alice and Bob can use randomness. The first is that, when they are still together the night before the interrogation, they can roll some dice and each write down the results. They can then make their choice of answers depend on the outcomes of the dice rolls. For example, they might roll one die and agree that if it comes up odd then Alice and Bob should both answer “no” to all of the questions, whereas if it comes up even Bob will switch his answer to question $1$ to “yes”. It will not make a difference if Alice and Bob look at the dice roll outcomes while they are still together and compute their answers, or if they simply write down the outcomes of the dice rolls, take them with them to the interrogation room, and perform the computation after the questions are asked. So long as they have agreed on a strategy for computing the answers, this will have the same result. It is easy to see that this cannot increase their probability of winning the game. On any given outcome of the dice rolls, Alice and Bob will end up with some specific set of answers, and the bound of $3/4$ will apply to these. On average, they will win with probability $3/4$ whenever the dice rolls lead to an assignment of answers that is an optimal deterministic strategy, and with probability less than $3/4$ when they do not. Therefore, overall they may as well just pick an optimal deterministic strategy to begin with.
The second way they can use randomness is to take some dice with them into the interrogation room, roll them after the questions have been asked, and use the outcome to determine their answer based on a pre-agreed strategy. This looks like it adds generality because they could choose to roll different dice, with different weightings, depending on which questions they are asked. For example, Alice might take a green die and a blue die into the interrogation room with her and roll the green one if she is asked question $0$ and the blue one if she is asked question $1$. In this case, Alice’s answer does not even really come into existence until the question is asked, so this perhaps looks a bit more like what is going on in a quantum measurement.
However, this cannot make a difference to the probability of winning either. Instead of waiting until she gets to the interrogation room, Alice could just roll both the green die and the blue die while she is still together with Bob. Then, she could write down both outcomes and use one of them if she is asked question $0$ and the other if she is asked question $1$. This is already covered by the first way of using randomness, where Alice and Bob can make their answers an arbitrary function of the dice rolls they make when they are together. Although she is doing something physically different – rolling two dice in advance vs. rolling just one die when she already knows the question – as far as the probability of winning the game is concerned, we might as well just move all the randomness generation to the beginning, when Alice and Bob are still together, and we already know this is no better than just choosing a deterministic strategy at the outset.
Let us now reflect a bit on exactly what we have proved. In general, Alice and Bob can generate randomness when they are together and do not yet know which questions they will be asked, and also separately after they have been asked their questions. Let’s call the variable that they generate when they are together $\lambda$, taking values in some space $\Lambda$. This will have some probability distribution $\mu(\lambda)$. If Alice and Bob were together and able to communicate when they are asked their questions, then the most general thing they could do would be to base their answers $a$ and $b$ on the questions $x$ and $y$ they are asked, their prior shared randomness $\lambda$, and any new randomness they choose to generate in the interrogation room. This can be described by conditional probabilities $\Pr(a, b|x, y, \lambda)$.
In general, a conditional probability distribution can be decomposed as
(25) $\Pr(a, b|x, y, \lambda) = \Pr(a|x, y, \lambda)\, \Pr(b|a, x, y, \lambda)$
What happens if we now take into account the fact that Alice and Bob are separated and unable to communicate when they are asked their questions? Firstly, $\Pr(a|x, y, \lambda)$ cannot depend on $y$, as Alice does not know $y$ when she is asked her question, so we have $\Pr(a|x, y, \lambda) = \Pr(a|x, \lambda)$. Secondly, Bob does not know $x$ or $a$ when he is asked his question, so $\Pr(b|a, x, y, \lambda) = \Pr(b|y, \lambda)$. Note that this does not mean that Bob has no information about how Alice will answer her question. In particular, if they have chosen a deterministic strategy then Bob knows exactly how Alice will answer each question. However, the point is that $\lambda$ already encodes all of the information Bob has about Alice’s strategy so, given $\lambda$, Bob’s answer has no additional dependence on $a$ or $x$.
Altogether then, we have that
(26) $\Pr(a, b|x, y, \lambda) = \Pr(a|x, \lambda)\, \Pr(b|y, \lambda)$
This condition is known as local causality, for reasons we shall see shortly.
Finally, to work out the probabilities that Alice and Bob will give the answers $a$ and $b$ to the pair of questions $(x, y)$, we need to average over the randomness $\lambda$ they generated to obtain
(27) $\Pr(a, b|x, y) = \int_\Lambda \Pr(a|x, \lambda)\, \Pr(b|y, \lambda)\, \mu(\lambda)\, d\lambda$
What we have proved via our long discussion is that, in any strategy that satisfies local causality, Alice and Bob can win the game with probability no greater than $3/4$ or, equivalently, if we define the correlators $C_{xy} = \Pr(a = b|x, y) - \Pr(a \neq b|x, y)$ and note that the winning probability is $\frac{1}{2} + \frac{1}{8}\left(C_{00} + C_{01} + C_{10} - C_{11}\right)$, we have
(28) $C_{00} + C_{01} + C_{10} - C_{11} \leq 2$
which is an example of a Bell inequality.
Now, instead of classical randomness, let’s see what happens if we allow Alice and Bob to take correlated quantum systems with them into the interrogation rooms, and to base their responses on the outcomes of quantum measurements. Remarkably, this allows them to violate the bound and win the game with probability $\cos^2(\pi/8) \approx 0.85$. Specifically, suppose they share a pair of qubits in the singlet state
(29) $|\Psi^{-}\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle|1\rangle - |1\rangle|0\rangle\right)$
If Alice is asked question $0$ then she measures the Pauli observable $\sigma_z$. (If Alice’s qubit is a spin-1/2 particle then the Pauli observables $\sigma_x$, $\sigma_y$ and $\sigma_z$ correspond to the angular momenta along the $x$, $y$ and $z$ directions respectively. The Pauli observables are defined in appendix C.) If the outcome is $+1$ she answers “yes” and if it is $-1$ she answers “no”. If she is asked question $1$ then she instead measures the Pauli observable $\sigma_x$ and answers “yes” if she gets $+1$ and “no” if she gets $-1$. Bob measures $-(\sigma_z + \sigma_x)/\sqrt{2}$ if asked question $0$ and answers “yes” if he gets $+1$ and “no” if he gets $-1$. If he is asked question $1$ he instead measures $(\sigma_x - \sigma_z)/\sqrt{2}$, and answers “yes” if he gets $+1$ and “no” if he gets $-1$. A straightforward computation reveals that, with this strategy, Alice and Bob will win with probability $\cos^2(\pi/8) = \frac{1}{2} + \frac{\sqrt{2}}{4} \approx 0.85$ [2]. This is strictly larger than the classical bound of $3/4$ and results from the super-strong correlations that exist within the singlet state.
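The quoted winning probability can be checked directly by linear algebra. The sketch below (Python with NumPy; the specific observables are the choice described above, which is one convention that attains the quantum maximum) encodes the singlet state, computes the correlators $C_{xy} = \langle\Psi^-| A_x \otimes B_y |\Psi^-\rangle$, and recovers $\cos^2(\pi/8)$:

```python
import numpy as np

# Pauli matrices and the singlet state (|01> - |10>)/sqrt(2).
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
psi = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)

# Measurement choices described above (+1 outcome = "yes").
A = [Z, X]                                           # Alice: questions 0 and 1
B = [-(Z + X) / np.sqrt(2), (X - Z) / np.sqrt(2)]    # Bob: questions 0 and 1

def correlator(Ax, By):
    # C_xy = Pr(a = b) - Pr(a != b) = <psi| Ax (x) By |psi>
    return float(np.real(psi.conj() @ np.kron(Ax, By) @ psi))

# Winning probability: answers should agree unless (x, y) = (1, 1).
p_win = 0.0
for x in (0, 1):
    for y in (0, 1):
        p_eq = (1.0 + correlator(A[x], B[y])) / 2.0
        p_win += ((1.0 - p_eq) if (x, y) == (1, 1) else p_eq) / 4.0

assert abs(p_win - np.cos(np.pi / 8) ** 2) < 1e-12   # beats the 3/4 bound
```

Each of the first three correlators evaluates to $+1/\sqrt{2}$ and the last to $-1/\sqrt{2}$, so the left hand side of Eq. (28) reaches $2\sqrt{2}$ and the winning probability reaches $\frac{1}{2} + \frac{\sqrt{2}}{4}$.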
This concludes our discussion of the mathematics of Bell’s theorem, but what does it all mean physically? Consider the spacetime diagram in Figure 6. A source emits two qubits in the singlet state, which travel to two spacelike separated detectors, $A$ and $B$, where an observable is measured on each of them. Each of the detectors has two settings, corresponding to the two questions that Alice and Bob might be asked in the game. We label the outcome at detector $A$ when it has setting $x$ as $a$, and the outcome at detector $B$ when it has setting $y$ as $b$. The choice of which setting to use at each detector is made at random, sufficiently late that there is no possibility of a signal, travelling at the speed of light or less, from the point at which the choice of setting $x$ is made to the detection event at $B$, and similarly for $y$ and the detection event at $A$. If we set the detectors to measure the observables described above, then we know we can obtain the value $2\sqrt{2}$ for the left hand side of Eq. (28), in violation of the inequality.
Given the causal structure of Figure 6, what would we expect to happen in a classical statistical model? Consider a region of spacetime that “screens off” $a$ and $x$ from $b$ and $y$ and vice versa. By this we mean that any timelike path from $a$ or $x$ to $b$ or $y$ that passes through the source must travel through the region. One such region is highlighted in light blue in Figure 6, but many others are possible. Let $\lambda$ denote the state of all of the fundamental degrees of freedom that exist in this region.
We would expect that any correlation that exists between the two wings of the experiment should be mediated by $\lambda$. This is because we assumed that $\lambda$ describes all of the fundamental degrees of freedom in the region, so there is nothing else within the light blue region that could possibly mediate correlations, and any causal link between the two wings that does not pass through this region would require superluminal signalling. In other words, we are in precisely the same scenario that Alice and Bob face in their separate interrogation rooms, unable to communicate with each other. Thus, for the exact same reasons as before, we expect local causality to hold, i.e. $\Pr(a, b|x, y, \lambda) = \Pr(a|x, \lambda)\,\Pr(b|y, \lambda)$. As we have already shown, this implies that the inequality given in Eq. (28) should hold.
Remarkably, quantum violations of the inequality have been observed experimentally, first in the experiments of Aspect et al. [38], and in numerous experiments since then [39, 40, 41]. The implication is that local causality must fail, and therefore either superluminal influences exist at some fundamental level, or elementary notions of realism within statistical theories must be discarded forever. In the literature, this is often summarized by saying that either “locality” or “realism” must be given up. However you wish to parse the dilemma, it is clear that Bell inequality violations imply a radical departure from classical physics.
3.2 Hardy’s theorem: Quantum systems contain an infinite amount of information
At the heart of classical information theory is the idea of a classical bit – the information revealed by a single yes/no question. Our ability to quantify, encode and transform information has revolutionised the world in countless ways (telecommunications, the internet, computers, etc.), and its study has shed light on the foundations of physics. Central to this is the idea that information does not care how we choose to encode it – we can encode information on paper, in electronic pulses, or carve it into stone. For almost all of history our encoding of information has been into classical degrees of freedom. However, Nature is quantum-mechanical and, in recent years, we have begun to use quantum degrees of freedom to encode information. A central question therefore arises: does information in quantum mechanics have the same properties as in classical mechanics?
Now, the state of even the simplest quantum system – a qubit – is specified by two continuous parameters. This means that it requires an infinite amount of information to specify the state exactly. For example, the amplitude of $|1\rangle$ in the superposition $\alpha|0\rangle + \beta|1\rangle$ could encode the decimal expansion of $\pi$. Thus, at first glance, it seems that quantum systems can carry vastly more information than classical systems. However, Holevo [42, 22, 43] showed that only a single bit of classical information can ever be extracted from a qubit via measurement. Further, in spite of having a continuous infinity of pure states, quantum computers do not suffer from the problems that rule out analog classical computers [22]. Powerful theorems on the discretization of errors [22] tell us that we do not need to correct a continuum of errors, but only particular discrete types. These surprising characteristics present a basic conundrum: how is it that qubits behave as if they are discrete systems when their state space forms a continuum?
As already discussed, in classical statistical mechanics we can consider the allowed macrostates: the set of probability distributions over some state space of microstates. It is easy to see that these distributions form a continuum – even if there is only a discrete, finite set of microstates. As an example, consider the case of DNA bases, which can be in one of four microstates: $A$, $C$, $G$ or $T$. The macrostate for a single base is therefore a probability distribution $(p_A, p_C, p_G, p_T)$, obeying $p_A + p_C + p_G + p_T = 1$ and $p_i \geq 0$ for all $i$. The set of such distributions therefore forms a solid tetrahedron (a simplex) in 3-dimensional space, and there is a continuum of macrostates (see Figure 7).
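As a concrete illustration (the numerical values below are our own), the simplex conditions, and the fact that any mixture of macrostates is again a macrostate, are easy to check:

```python
import numpy as np

# A macrostate over the four DNA bases is a probability vector (pA, pC, pG, pT)
p = np.array([0.1, 0.2, 0.3, 0.4])
q = np.array([0.25, 0.25, 0.25, 0.25])  # the uniform macrostate

def is_macrostate(v, tol=1e-12):
    """Check the simplex conditions: non-negative entries summing to 1."""
    return bool(np.all(v >= -tol) and abs(v.sum() - 1.0) < tol)

# Any convex mixture of macrostates is again a macrostate, so the set of
# macrostates is a continuum (the solid tetrahedron of the text).
for t in np.linspace(0, 1, 11):
    assert is_macrostate(t * p + (1 - t) * q)
print("all mixtures lie in the simplex")
```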
The fact that qubits behave in many ways like discrete, finite systems would be easily explained if perhaps there were only a finite number of more fundamental states – like the finite number of DNA bases – and if the continuum of quantum states only represented our uncertainty about which one of them is occupied – like the continuum of DNA macrostates. Surprisingly, in spite of Holevo’s bound and the discretization of errors, this cannot be the case: any future physical theory that reproduces the physics of finitedimensional quantum systems must have an infinite number of fundamental states.
The proof, due to Hardy [32] (see [33] for an earlier related result), is quite straightforward. Firstly, for the sake of contradiction, assume that $\Lambda$ is a finite set having $n$ elements, i.e. there are $n$ fundamental states in the theory. Let $\mu_\psi$ be the probability distribution corresponding to the quantum state $|\psi\rangle$ in some hypothetical future theory, and define the support of $\mu_\psi$ to be
$\mathrm{Supp}(\mu_\psi) = \{\lambda \in \Lambda : \mu_\psi(\lambda) > 0\}.$   (30)
Consider a measurement basis that includes the state $|\psi\rangle$. In the underlying theory, the outcome $\psi$ is represented by a conditional probability $p(\psi|\lambda)$. By the discrete version of Eq. (19), with integrals replaced by sums, if we prepare the state $|\psi\rangle$ and measure in this basis, the underlying theory must satisfy
$\sum_{\lambda \in \Lambda} p(\psi|\lambda)\,\mu_\psi(\lambda) = 1.$   (31)
Now note that $\mu_\psi$, being a probability distribution, must satisfy $\sum_{\lambda \in \Lambda} \mu_\psi(\lambda) = 1$. This means that $p(\psi|\lambda)$ must equal $1$ for all $\lambda \in \mathrm{Supp}(\mu_\psi)$ in order to also make Eq. (31) true.
Next, consider a two-dimensional subspace spanned by two orthonormal states $|0\rangle$ and $|1\rangle$, and consider the states
$|\psi_j\rangle = \cos\!\left(\tfrac{j\pi}{2K}\right)|0\rangle + \sin\!\left(\tfrac{j\pi}{2K}\right)|1\rangle$   (32)
for $j = 1, \ldots, K$, as illustrated in Figure 7, where $K$ can be chosen as large as we wish. For any finite $K$, these states satisfy
$0 < |\langle \psi_j | \psi_k \rangle|^2 < 1$   (33)
for any $j \neq k$.
We will now show that reproducing the statistics of these states implies that $\Lambda$ must contain at least $\log_2 K$ states and hence, since $K$ can be chosen to be arbitrarily large, must be infinite.
Consider preparing the system in the state $|\psi_j\rangle$ and measuring it in a basis that contains $|\psi_k\rangle$, for some $k \neq j$. Then, Eq. (33) implies that
$\sum_{\lambda \in \Lambda} p(\psi_k|\lambda)\,\mu_{\psi_j}(\lambda) = |\langle \psi_k | \psi_j \rangle|^2 < 1.$   (34)
This means that there must exist a $\lambda \in \mathrm{Supp}(\mu_{\psi_j})$ such that $p(\psi_k|\lambda) < 1$, since otherwise the sum would equal $1$. Since $p(\psi_k|\lambda) = 1$ everywhere on $\mathrm{Supp}(\mu_{\psi_k})$, this means that $\mathrm{Supp}(\mu_{\psi_j})$ and $\mathrm{Supp}(\mu_{\psi_k})$ must be different subsets of $\Lambda$. Since this argument applies to every pair $j \neq k$, we must in total have $K$ distinct subsets of $\Lambda$.
Now, if $\Lambda$ has $n$ elements then it has $2^n$ distinct subsets, so we must have $2^n \geq K$, or $n \geq \log_2 K$. However, $K$ can be arbitrarily large, and since $\log_2 K \to \infty$ as $K \to \infty$, we conclude that if the microstate description reproduces quantum theory then $\Lambda$ must have infinitely many elements. No finite set of states will ever work, and even the most primitive quantum system must contain an infinite amount of information – in stark contrast with classical theory. One can further prove that there must in fact be a continuous infinity of microstates. A heuristic way to see this is that the experimental probability distributions for quantum states vary smoothly as we vary the measurement basis, and so any underlying model must also inherit this smoothness. We refer the reader to the literature [44, 45] for a rigorous proof of this.
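Hardy’s counting argument is easy to explore numerically. The sketch below takes the states of Eq. (32) to be spaced at angles $j\pi/2K$ in the plane spanned by $|0\rangle$ and $|1\rangle$ (one natural choice consistent with the text), verifies Eq. (33) for every pair, and reports the resulting lower bound $n \geq \log_2 K$:

```python
import numpy as np

K = 16  # number of states; can be made as large as we wish

# The states of Eq. (32): |psi_j> = cos(j*pi/2K)|0> + sin(j*pi/2K)|1>
angles = np.array([j * np.pi / (2 * K) for j in range(1, K + 1)])
states = np.column_stack([np.cos(angles), np.sin(angles)])  # one state per row

# Check Eq. (33): every distinct pair is neither identical nor orthogonal
overlaps = (states @ states.T) ** 2  # |<psi_j|psi_k>|^2
for j in range(K):
    for k in range(K):
        if j != k:
            assert 0 < overlaps[j, k] < 1

# Each pair forces distinct supports, so a microstate set with n elements
# (and hence 2**n subsets) must satisfy n >= log2(K).
print(f"need at least {np.log2(K):.0f} microstates for K = {K}")
```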
It is often said that discreteness is a distinctly quantum-mechanical phenomenon, but anyone who has ever played a video game will tell you that the concept of a discrete (pseudorandom) classical world is really not so strange. Hardy’s theorem, together with results such as Holevo’s bound and the discretization of errors, shows that precisely the opposite is the case: it is instead the continuity of quantum physics that is so strange. How can it be that we have a continuum of quantum states that ostensibly behave discretely, yet we do not have, and cannot have, an underlying discrete structure?
3.3 The Pusey–Barrett–Rudolph Theorem: Is the wavefunction carved into a quantum system?
In §2.6, we alluded to a particularly subtle question: is the wavefunction an objective property of a quantum system? Again, our meaning comes through comparison with classical statistical mechanics. There the microstates are the objective properties of the system – at any instant of time the “real” state of the system is actually a particular microstate, in contrast to the system’s macrostate, which simply describes the ensemble properties of the system and yields the thermodynamic variables of interest. Despite there being no consensus on what, if any, objective properties exist in the quantum realm, we can still ask whether it is possible for a system to have the same objective properties when it is prepared in one of two different quantum states. If the answer is “no”, then we know that the wavefunction must be considered an intrinsic, or objective, property of a system. Note that Hardy’s theorem does not settle the question of whether the wavefunction is an intrinsic property because, although the space of fundamental states must be a continuum, the distributions corresponding to distinct quantum states may still overlap, as they do in the Kochen–Specker model discussed in §2.6.
The broad statistical framework allows us to frame this question in a quantitative way. For any hypothetical theory with fundamental states $\Lambda$ that reproduces the predictions of quantum mechanics for some system, we say that the wavefunction is an intrinsic property of the system if any two distinct quantum states $|\psi_0\rangle$ and $|\psi_1\rangle$ have corresponding probability distributions $\mu_0$ and $\mu_1$ that do not overlap. To put it another way, consider the case where $\Lambda$ is a finite set (measure-theoretic qualifications are needed to deal with the general case; see [30] for details). Then, by $\mu_0$ and $\mu_1$ overlapping, we mean that there exists a microstate $\lambda$ for which both $\mu_0(\lambda) > 0$ and $\mu_1(\lambda) > 0$. If the system occupies $\lambda$ then we would not be able to tell with certainty which of the two quantum states had been prepared, even if we had access to the full microstate of the system. Hence, it would be impossible to ascribe a unique wavefunction to this microstate – in such a hypothetical scenario the wavefunction would not be an intrinsic property of the system.
We now discuss a recent result due to Pusey, Barrett and Rudolph, which shows that, under two additional reasonable assumptions, the wavefunction must be an intrinsic property of a quantum system [34].
It suffices to consider a two-dimensional quantum system since, for any higher-dimensional system, we can simply restrict attention to a two-dimensional subspace. We establish a reductio ad absurdum: we suppose that there exists some future statistical theory in which $\psi$ is not an intrinsic property, and from this arrive at a contradiction.
Suppose that for two quantum states $|\psi_0\rangle$ and $|\psi_1\rangle$, the corresponding distributions $\mu_0$ and $\mu_1$ overlap. Again, we specialize to a finite space $\Lambda$ for simplicity, so this means that there is an underlying microstate $\lambda^*$ which has a nonzero probability of occurring whether we prepare the quantum state $|\psi_0\rangle$ or the quantum state $|\psi_1\rangle$. More precisely, regardless of which of these states is prepared, the microstate $\lambda^*$ will be occupied at least a fraction $q > 0$ of the time.
We now introduce the two assumptions used to prove the theorem. Firstly, imagine preparing two copies of the system in one of the four quantum states
$|\psi_0\rangle \otimes |\psi_0\rangle, \qquad |\psi_0\rangle \otimes |\psi_1\rangle, \qquad |\psi_1\rangle \otimes |\psi_0\rangle, \qquad |\psi_1\rangle \otimes |\psi_1\rangle.$   (35)
We can imagine that the two systems are initially located very far apart, and that Alice and Bob each choose whether to prepare $|\psi_0\rangle$ or $|\psi_1\rangle$ independently of each other.
The first assumption is that, when two systems are prepared independently like this, each system gets its own copy of $\Lambda$. The total state space of the two systems is simply the product of two copies of $\Lambda$, and so microstates are given by pairs $(\lambda_1, \lambda_2)$. This means that the quantum states are associated with probability distributions of the form $\mu(\lambda_1, \lambda_2)$, and measurements on the joint system are associated with conditional probability distributions of the form $p(\phi|\lambda_1, \lambda_2)$, where $|\phi\rangle$ is a vector in the basis we are measuring.
The second assumption is that the distribution describing the total state factorizes as $\mu(\lambda_1, \lambda_2) = \mu_j(\lambda_1)\,\mu_k(\lambda_2)$, where $\mu_j$ and $\mu_k$ are the distributions that would be associated with the prepared states $|\psi_j\rangle$ and $|\psi_k\rangle$ for a single system. Taken together, these two assumptions are called preparation independence.
We now illustrate how preparation independence can be used to prove that the quantum state must be an intrinsic property of a quantum system by an example, before outlining the general result. Suppose that $|\psi_0\rangle = |0\rangle$ and $|\psi_1\rangle = |+\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$. Under the assumption of preparation independence, if we prepare one of the four states in Eq. (35), then, at least a fraction $q^2$ of the time, the joint system will be in the microstate $(\lambda^*, \lambda^*)$, i.e. both systems will occupy the microstate $\lambda^*$, for which it is impossible to tell with certainty which of $|0\rangle$ or $|+\rangle$ was prepared. This leads to the desired contradiction, once we consider the following two-qubit measurement in the basis consisting of the four vectors
$|\phi_1\rangle = \tfrac{1}{\sqrt{2}}\left(|0\rangle|1\rangle + |1\rangle|0\rangle\right)$
$|\phi_2\rangle = \tfrac{1}{\sqrt{2}}\left(|0\rangle|-\rangle + |1\rangle|+\rangle\right)$
$|\phi_3\rangle = \tfrac{1}{\sqrt{2}}\left(|+\rangle|1\rangle + |-\rangle|0\rangle\right)$
$|\phi_4\rangle = \tfrac{1}{\sqrt{2}}\left(|+\rangle|-\rangle + |-\rangle|+\rangle\right)$   (36)
where $|-\rangle = (|0\rangle - |1\rangle)/\sqrt{2}$.
Why is this particular (entangled) basis significant? The key point is that, writing $|\Psi_1\rangle = |0\rangle|0\rangle$, $|\Psi_2\rangle = |0\rangle|+\rangle$, $|\Psi_3\rangle = |+\rangle|0\rangle$ and $|\Psi_4\rangle = |+\rangle|+\rangle$ for the four preparations, we have $\langle \phi_j | \Psi_j \rangle = 0$ for every choice of $j$. Put more simply: the measurement outcome $\phi_j$ never occurs when the quantum state $|\Psi_j\rangle$ is prepared. However, we know that, whichever of these states is prepared, a fraction $q^2$ of the time the system is in the hypothetical microstate $(\lambda^*, \lambda^*)$. If the system is in this microstate, then which particular outcome occurs when we make a measurement in this basis? Suppose we get the outcome $\phi_j$ when the system occupies $(\lambda^*, \lambda^*)$. We know that this microstate occurs when $|\Psi_j\rangle$ is prepared a nonzero fraction of the time, so, in order to reproduce the quantum predictions, the outcome $\phi_j$ should never occur for this microstate. Thus, a nonzero fraction of the time the measurement device cannot give any outcome that is consistent with quantum mechanics, and we have our contradiction.
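Both claimed properties of the basis in Eq. (36), its orthonormality and the fact that outcome $j$ is orthogonal to the $j$-th preparation, can be verified directly:

```python
import numpy as np

k0 = np.array([1.0, 0.0]); k1 = np.array([0.0, 1.0])
plus = (k0 + k1) / np.sqrt(2); minus = (k0 - k1) / np.sqrt(2)

# The four product-state preparations: |0,0>, |0,+>, |+,0>, |+,+>
preps = [np.kron(a, b) for a in (k0, plus) for b in (k0, plus)]

# The entangled measurement basis of Eq. (36)
phi = [
    (np.kron(k0, k1) + np.kron(k1, k0)) / np.sqrt(2),
    (np.kron(k0, minus) + np.kron(k1, plus)) / np.sqrt(2),
    (np.kron(plus, k1) + np.kron(minus, k0)) / np.sqrt(2),
    (np.kron(plus, minus) + np.kron(minus, plus)) / np.sqrt(2),
]

# It really is an orthonormal basis...
G = np.array([[u @ v for v in phi] for u in phi])
assert np.allclose(G, np.eye(4))

# ...and outcome j never occurs when the j-th preparation is made
for j in range(4):
    assert abs(phi[j] @ preps[j]) < 1e-12
print("each outcome rules out one preparation")
```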
The above argument was specific to the states $|0\rangle$ and $|+\rangle$, but it can be extended to show that every pair of pure states must correspond to non-overlapping distributions. Here, we will just outline a version of this argument due to Moseley [46], and refer the reader to [30] for details.
First, note that if $|\langle \psi_0 | \psi_1 \rangle|^2 \leq 1/2$, then the states $|\psi_0\rangle$ and $|\psi_1\rangle$ are at least as distinguishable as $|0\rangle$ and $|+\rangle$ (for which $|\langle 0|+\rangle|^2 = 1/2$), in the sense that it is at least as easy to tell them apart via a quantum measurement. It is known that, when this is the case, it is possible to find a physical transformation that maps $|\psi_0\rangle$ to $|0\rangle$ and $|\psi_1\rangle$ to $|+\rangle$ [47, 48]. If we apply this transformation to both systems and then make a measurement in the basis of Eq. (36), then this whole procedure can itself be thought of as a measurement on the states of Eq. (35). This will have the same measurement probabilities as before, so the previous argument can be adapted to this case.
It remains to deal with the case where $|\langle \psi_0 | \psi_1 \rangle|^2 > 1/2$. For this, we note that, if instead of preparing one system in the state $|\psi_0\rangle$ or $|\psi_1\rangle$, we prepare $n$ systems either all in the state $|\psi_0\rangle$ or all in the state $|\psi_1\rangle$, then the mod squared inner product of the resulting states will be $|\langle \psi_0 | \psi_1 \rangle|^{2n}$. For $|\langle \psi_0 | \psi_1 \rangle|^2 < 1$, it is possible to choose $n$ such that $|\langle \psi_0 | \psi_1 \rangle|^{2n} \leq 1/2$, and thus we can apply the previous argument to show that the distributions corresponding to $|\psi_0\rangle^{\otimes n}$ and $|\psi_1\rangle^{\otimes n}$ can have no overlap. However, preparation independence implies that these distributions are just $n$-fold products of the distributions corresponding to $|\psi_0\rangle$ and $|\psi_1\rangle$, so we infer that these cannot have any overlap either.
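A quick sketch of this copy-counting step (the function name and the example overlap value are ours): given $|\langle \psi_0 | \psi_1 \rangle|^2 < 1$, we can always find an $n$ that brings the $n$-copy overlap below $1/2$:

```python
import numpy as np

def copies_needed(overlap_sq):
    """Smallest n with overlap_sq**n <= 1/2, assuming 0 < overlap_sq < 1."""
    return int(np.ceil(np.log(0.5) / np.log(overlap_sq)))

# e.g. two nearly parallel states with |<psi0|psi1>|^2 = 0.9
n = copies_needed(0.9)
assert 0.9 ** n <= 0.5 < 0.9 ** (n - 1)
print(n)  # 7 copies suffice
```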
The contradiction we have derived rules out a nonzero $q$, which quantifies how much the distributions corresponding to any pair of pure states can overlap in any future theory that reproduces the predictions of quantum theory. We conclude that there can never exist a theory in which pure states have overlapping distributions, unless preparation independence is violated [49]. Preparation independence carries great force, and is an extremely natural assumption to make – imagine preparing a Hydrogen atom close to its ground state in one part of the universe, while another person prepares such an atom on the other side of the galaxy; no correlations ought to be created between these two preparations, which should be fully independent. We have just shown that this deceptively simple principle has a deep implication: whatever future theory may arise, the quantum wavefunction will always be carved into a quantum system as an objective label of reality.
4 Conclusion
To sum up, we have shown that many phenomena that are traditionally viewed as intrinsically quantum-mechanical – such as randomness, discreteness, the indistinguishability of states, measurement uncertainty, measurement disturbance, complementarity, noncommutativity, interference, the no-cloning theorem, and the collapse of the wavepacket – all appear within classical statistical mechanics under reversible dynamics. These serve to map out classical fragments of quantum physics, in a search for the genuinely strange aspects of the theory. In addition to Bell’s theorem on the failure of local causality at a fundamental level, we have described two less well-known results that reveal further deep and subtle insights into the quantum realm. Quantum systems unavoidably contain a continuous infinity of information, despite their apparently discrete behaviour, while the ability to prepare physical systems independently of one another implies that the quantum wavefunction is carved onto a quantum system as an objective physical property of its microstate.
In this article, we have only presented a small sample of recent results, in what is a flourishing area of current research. For example, we have not discussed the recently developed operational approach to contextuality [50], which is another genuinely nonclassical quantum phenomenon, or the powerful graph-theoretic approach to contextuality [51, 52]. We have neglected many beautiful and deep results, such as those of Colbeck and Renner [53, 54], who rule out theories beyond quantum mechanics using weaker assumptions than Bell; the results of Montina [55, 56, 57, 58, 59], which reveal links between the structure of classical fragments and the far more grounded topic of the communication complexity of quantum channels; and, following on from the Pusey–Barrett–Rudolph theorem, many other results and experiments on the reality of the wavefunction [30, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73]. There are also ambitious programs that seek to reformulate quantum theory as a theory of Bayesian inference [74], to derive quantum theory from physically reasonable axioms [75, 76, 77, 78, 79], and to formulate quantum theory in the absence of fixed causal structure [80, 81, 82, 83, 84]. Finally, we have not discussed prominent areas of research such as quantum computing [22], quantum cryptography [85] and quantum metrology [86], which are the practical fruit of foundational investigations, and have their own insights to offer about the difference between classical and quantum physics. For these and more, we refer the interested reader to the bibliography, where they will find an array of diverse and vibrant research programs that continue to delve into the very foundations of quantum physics.
Acknowledgements
DJ is supported by the Royal Society. ML is supported by the Foundational Questions Institute (FQXi). Research at Perimeter Institute is supported by the Government of Canada through Industry Canada and by the Province of Ontario through the Ministry of Research and Innovation. DJ thanks Sania Jevtic for useful comments on an earlier draft.
Appendix A The Resolution Restriction on classical statistical mechanics
Any statistical Liouville distribution is a function $\rho \geq 0$ on the system’s phase space (given by $\mathbb{R}^{2n}$ for a system of $n$ canonical coordinates), normalized so that $\int \rho = 1$. For simplicity we restrict attention to a single particle in one dimension, so that $\rho = \rho(x, p)$, but the same argument holds for more general systems. The expectation value of the particle’s location is given by $\langle x \rangle = \int x\, \rho(x, p)\, \mathrm{d}x\, \mathrm{d}p$, while the expectation value of its momentum is