Quantum realism and quantum surrealism


Mateus Araújo
July 13, 2019





Master's thesis

Presented to the Graduate Program in Physics of the Universidade Federal de Minas Gerais (footnote: This version incorporates further corrections.)

Author: Mateus Araújo Santos
Supervisor: Marcelo O. Terra Cunha
Examiners: Ernesto F. Galvão and Carlos H. Monken
June, 2012

…we always have had a great deal of difficulty in understanding the world view that quantum mechanics represents. At least I do, because I'm an old enough man that I haven't got to the point that this stuff is obvious to me. Okay, I still get nervous with it. And therefore, some of the younger students…you know how it always is, every new idea, it takes a generation or two until it becomes obvious that there's no real problem. It has not yet become obvious to me that there's no real problem. I cannot define the real problem, therefore I suspect there's no real problem, but I'm not sure there's no real problem.
Richard Feynman


To my Luciana, for having made me a happy man and for having managed to control her jealousy of this mistress of mine.

To my parents, for being who they are, and for making me who I am. Your support was and still is indispensable.

To my advisor Marcelo Terra Cunha, for giving me the freedom to fool around while I could, and for sending me off to work when I needed it.

To my great friend Marco Túlio Quintino, without whom this dissertation would be much worse.

To Marcelo França, for the idle chats that kept me from working, and for not letting me ignore his suggestions.

To Gláucia Murta, for the indispensable help in reading and rereading the dissertation in search of errors and obscure passages. Any flaw of mathematics or style that remains in the text is her fault. I also thank her for being a local resource capable of carrying out protocols inaccessible to a highly nonlocal person.

To my friends in the graduate program, thank you for the good atmosphere. You make it possible to be happy and learn physics.

To the professors of the graduate program, for everything they taught me, and for everything they did not teach me.



In this thesis we explore the questions of what should be considered a “classical” theory, and which aspects of quantum theory cannot be captured by any theory that respects our intuition of classicality.

This exploration is divided into two parts: in the first we review classical results of the literature, such as the Kochen-Specker theorem, von Neumann’s theorem, Gleason’s theorem, as well as more recent ideas, such as the distinction between $\psi$-ontic and $\psi$-epistemic ontological models, Spekkens’ definition of contextuality, Hardy’s ontological excess baggage theorem and the PBR theorem.

The second part is concerned with pinning down what should be the “correct” definition of contextuality. We settle on the definition advocated by Abramsky and Brandenburger, motivated by the Fine theorem, and show the connection of this definition with the work of George Boole. This definition allows us to unify the notions of locality and noncontextuality, and use largely the same tools to characterize how quantum mechanics violates these notions of classicality. Exploring this formalism, we find a new family of noncontextuality inequalities. We conclude by reviewing the notion of state-independent contextuality.

Chapter: Introduction


Quantum mechanics is magic.
Daniel Greenberger

This thesis is meant to explore the question posed by Chris Fuchs: what is “Zing!” [fuchs02]? What is the property of quantum mechanics which is essentially quantum, absent from any classical theory? Contrary to the goals of Chris Fuchs, our exploration is operationalist rather than axiomatic: our “Zing!” is not a deep axiom that reveals the essence of quantum theory, but rather logically connected sets of probability distributions that cannot be reproduced by any classical theory. Although finding his axiom would be nice, we feel that our approach is more useful, as these sets of probability distributions are the resources needed for quantum magic: quantum computing and quantum key distribution.

This is emphatically not a historical account of the subject: these are plentiful, and another one is unnecessary. Therefore, we shall try to keep references to the great works of von Neumann, Bell, Kochen, and Specker to a bare minimum, while emphasising the newer works of Abramsky, Busch, Cabello, Hardy, Pitowsky, and Spekkens (footnote 1: As a result, the median year of publication of our references is 2002.). The sole exception shall be the work of George Boole, which, although very old, is still largely unknown.

Having given a general picture of my motivations and goals, let me now give a more detailed account of the structure of this thesis.

The chapter on ontological embeddings of quantum theory presents introductory material (footnote 2: The reader who is already well-acquainted with the subject, or who is a mathematician, may find it better to skip it.) on the question “is quantum mechanics really different from ‘classical’ theories?”. It begins by capturing some notions of classicality within the framework of ontological theories; then this question is made more precise as “is there an ontological embedding of quantum theory?”.

The chapter proceeds by detailing specific ontological models, and showing which problems arise in trying to reproduce the results of quantum mechanics within them. These problems are then understood as their failure to respect noncontextuality, a notion that we argue to be fundamental in defining classicality. After giving a precise definition of noncontextuality, we proceed to prove Spekkens’ theorem of the impossibility of embedding quantum theory within a preparation noncontextual ontological model.

We proceed then to revisit our assumptions, and try to find whether a less ambitious notion of classicality can embed quantum theory. To do that, we revisit the historical theorems of von Neumann and Gleason, culminating with the recent version of Busch. In each of their frameworks, a “classical” formulation of quantum mechanics is again ruled out.

The next stop is the famous theorem of Kochen and Specker, that uses the weakest assumptions yet. We present three recent versions of it, by Cabello et al., Yu and Oh, and Peres and Mermin, that are considerable simplifications of the original proof.

The chapter concludes by presenting a recent theorem of Hardy, that “any ontological embedding of quantum theory is very uncomfortable”, and two specific contextual ontological embeddings of quantum theory.

Our conclusion is then that any reasonable ontological embedding of quantum theory is impossible; therefore there is something more in quantum mechanics that classical theories cannot quite capture. The subsequent chapter is then dedicated to detailing what this something is.

We begin by constructing our final definition of noncontextuality. Based on the recent work of Abramsky and Brandenburger, we show that the Fine theorem admits a natural generalization that applies to any set of observables, without regard to spatial separation. This generalization in its turn motivates a definition of noncontextuality that is a natural generalization of the definition of locality, with mostly the same mathematical structure – this allows us to consider generalizations of Bell inequalities that test noncontextuality instead of locality. Interestingly, this “new” definition was already implicit in the ancient works of Boole (and in the more recent works by Pitowsky), which motivates us to call these generalized Bell inequalities Boole inequalities.

This “new” approach is then formalized via a classical problem in mathematics, the marginal problem. Using its formalism, we gain access to powerful tools to separate contextual from noncontextual probability distributions, and with them derive a new result: a set of Boole inequalities that completely describes an infinite family of noncontextual polytopes.

Chapter: Notation and definitions

The purpose of this part of the thesis is only to establish notation, not to teach quantum mechanics to anyone. If one needs such an introduction, we recommend the excellent book of Michael Nielsen and Isaac Chuang [chuang00].

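We say that an operator $A$ is self-adjoint, i.e., $A = A^\dagger$, if $\langle \phi | A \psi \rangle = \langle A \phi | \psi \rangle$ for all $|\phi\rangle, |\psi\rangle$. We shall only deal with finite-dimensional operators. The set of all self-adjoint operators is $\mathcal{O}(\mathcal{H})$.

A quantum-mechanical observable is a self-adjoint operator.

We say that an operator $A$ is positive, i.e., $A \ge 0$, if $\langle \psi | A | \psi \rangle \ge 0$ for all $|\psi\rangle$.

A quantum state is a positive operator $\rho$ such that $\operatorname{tr}\rho \le 1$ [hardy01]. Since we shall have no use for states such that $\operatorname{tr}\rho < 1$, we can omit the normalization of our quantum states without ambiguity. The set of all quantum states is $\mathcal{D}(\mathcal{H})$. A pure quantum state is an extremal point of $\mathcal{D}(\mathcal{H})$, a rank-one projector $\psi$. The vector of a pure quantum state will be denoted by $|\psi\rangle$, and the vectors are connected to the projectors by

$$\psi = |\psi\rangle\langle\psi|.$$

The set of all pure states is $\mathcal{PH}$.

An effect is a positive operator smaller than identity, i.e., $0 \le E \le \mathbb{1}$. The set of all effects is $\mathcal{E}(\mathcal{H})$. A set of effects $\{E_k\}$ such that $\sum_k E_k = \mathbb{1}$ describes a measurement (footnote 3: Except for the post-measurement state.) and is called a POVM.

A projector is a self-adjoint operator $\Pi$ such that $\Pi^2 = \Pi$. The set of all projectors is $\mathcal{P}(\mathcal{H})$. A set of projectors $\{\Pi_k\}$ such that $\sum_k \Pi_k = \mathbb{1}$ describes a measurement and is called a PVM. Note that a PVM is a special case of a POVM.

The Born rule is the quantum mechanical rule for associating measurement probabilities with states and effects. We say that

$$p(k \,|\, \rho) = \operatorname{tr}(\rho E_k)$$

is the probability of obtaining outcome $k$ of the POVM $\{E_k\}$ when the system is in the state $\rho$.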
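As a small numerical illustration of the Born rule, the sketch below evaluates $\operatorname{tr}(\rho E_k)$ for a qubit state measured with a two-outcome PVM. It is only a sketch under stated assumptions: the helper names (`outer`, `trace_of_product`, `born_probability`) and the particular state and PVM are chosen here for illustration and do not come from the thesis.

```python
import math

def outer(vec):
    """Rank-one projector |v><v| for a normalised vector v (illustrative helper)."""
    return [[a * b.conjugate() for b in vec] for a in vec]

def trace_of_product(a, b):
    """tr(a b) for two square matrices stored as nested lists."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))

def born_probability(rho, effect):
    """Born rule p = tr(rho E); the imaginary part vanishes for valid inputs."""
    return complex(trace_of_product(rho, effect)).real

# Assumed example: the state cos(t/2)|0> + sin(t/2)|1>, measured in the
# computational basis {|0><0|, |1><1|}.
t = math.pi / 3
rho = outer([math.cos(t / 2), math.sin(t / 2)])
pvm = [outer([1.0, 0.0]), outer([0.0, 1.0])]

probs = [born_probability(rho, effect) for effect in pvm]
print(probs)
```

Because the two projectors sum to the identity, the two probabilities sum to one, which is the normalisation that every measurement must satisfy.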



Chapter: Ontological embeddings of quantum theory


Classical measurements reveal information. Quantum measurements produce information.
Marcelo Terra Cunha

The quest for embedding quantum mechanics in a “classical” theory is almost as old as quantum theory itself. People were disturbed by the role of measurement in the theory, particularly by its intrinsic randomness and non-repeatability. So they tried to explain away these features as emergent rather than fundamental, as if they appeared because of a lack of control and understanding of a more refined theory, one that would describe the “deeper” physics behind quantum phenomena. We call this refined theory an ontological theory.

But despite being familiar, the words “classical” and “ontological” have very fuzzy meanings. In the next section we shall pin them down and clarify them.

1 What is an ontological theory?

The first ontological models that appeared tried to “solve” the problem of non-determinism. They postulated that $|\psi\rangle$ was not the real state of nature, but rather some kind of shadow of it. So they postulated that there was a real state, an ontic state (footnote 4: The reader who is well-acquainted with the subject might be wondering when the expression “hidden-variable” will appear. Well, it won’t.), called $\lambda$, that if known would render all measurement outcomes deterministic. That is, given a PVM $\{\Pi_k\}$ (footnote 5: Even the most determined determinist can’t hope for a POVM to be deterministic. We’ll explain why in a while.), the probability of outcome $k$ given $\lambda$ would be either $0$ or $1$, that is, we can define a response function

$$\xi_k : \Lambda \to \{0, 1\}$$

such that $\xi_k(\lambda)$ is the probability of outcome $k$. Here, $\Lambda$ is any space in which our ontic states are defined, and to account for the fact that $\sum_k \Pi_k = \mathbb{1}$, we require that $\sum_k \xi_k(\lambda) = 1$ for all $\lambda$. This is just the requirement that some outcome must occur in a measurement.

Then the subjective indeterminism of quantum theory would be recovered by the ignorance of which ontic states were really present in an experiment. That is, a quantum state $\psi$ would determine a probability distribution $\mu_\psi(\lambda)$ over $\Lambda$. This property can be thought of as “you were trying to generate state $\psi$, but you ended up generating an ensemble of ontic states $\mu_\psi(\lambda)$”. As in quantum (and classical) mechanics, we shall call the ensemble itself a state, while reserving the term pure ontic state for the individual $\lambda$, which can of course be represented as an ensemble with a $\delta$ distribution.

Of course, we want this subjective indeterminism to agree with the predictions of quantum mechanics, so

$$\int_\Lambda \mathrm{d}\lambda\, \mu_\psi(\lambda)\, \xi_k(\lambda) = \operatorname{tr}(\psi \Pi_k).$$


1.1 On mixed states and POVMs

The early literature of ontological theories did not make this separation between states and measurements (footnote 6: With the honourable exception of the Kochen-Specker model, discussed in section 3.1.) [bell66, mermin93]; instead they tried to define a deterministic value function $v(\psi, \lambda)$ that would answer with certainty the outcome of an experiment, given the quantum state and the ontic state, and recover the quantum statistics by averaging over $\lambda$. This is quite problematic, since it can only describe models in which $\psi$ itself has an ontic status (footnote 7: See section 3 for further discussion of this point.); it therefore can never describe experiments where the quantum state is explicitly epistemic, e.g., a mixed state. For instance, let’s say we have two pure states $\psi_0$ and $\psi_1$ with different deterministic outcomes $v(\psi_0, \lambda)$ and $v(\psi_1, \lambda)$. Then if I prepare state $\psi_0$ with probability $p$ or state $\psi_1$ with probability $1-p$, corresponding to the mixed state $\rho = p\,\psi_0 + (1-p)\,\psi_1$, the outcome must be

$$v(\rho, \lambda) = p\, v(\psi_0, \lambda) + (1-p)\, v(\psi_1, \lambda),$$

which is neither $v(\psi_0, \lambda)$ nor $v(\psi_1, \lambda)$ for non-trivial $p$, a contradiction.

Using probability distributions like we do, this can be accommodated in a very natural manner:

Lemma 1.

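If one prepares the quantum states $\rho_i$ with probabilities $p_i$, then the corresponding ontic state is

$$\mu_\rho(\lambda) = \sum_i p_i\, \mu_{\rho_i}(\lambda).$$

Proof. Quantum mechanics tells us that $p(k|\rho) = \sum_i p_i\, p(k|\rho_i)$. Writing these probabilities ontologically, we have (footnote 8: When doing calculations we shall often omit the integration variable $\lambda$, but only when there’s no risk of ambiguity.)

$$\int_\Lambda \mu_\rho\, \xi_k = \sum_i p_i \int_\Lambda \mu_{\rho_i}\, \xi_k.$$

Since $\xi_k$ is positive and arbitrary, this implies that

$$\mu_\rho = \sum_i p_i\, \mu_{\rho_i}.$$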

Note that this same rule is used to describe convex combinations of states in quantum and classical mechanics.

The issue with POVMs is similar: one can implement the POVM

$$\{\, p\,\Pi_k + (1-p)\,\Pi'_k \,\}$$

simply by measuring the PVM $\{\Pi_k\}$ with probability $p$ and the PVM $\{\Pi'_k\}$ with probability $1-p$ [ariano05]; we must have then $\xi_k(\lambda) = p\,\xi_{\Pi_k}(\lambda) + (1-p)\,\xi_{\Pi'_k}(\lambda)$, which is obviously not deterministic. We must accept, then, that for these kinds of “mixed” POVMs (footnote 9: Following [ariano05], we are calling “mixed” the POVMs that can be written as a convex combination of different POVMs, and “pure” those that can’t.) the response functions must be modified to

$$\xi_k : \Lambda \to [0, 1],$$

that is, allowing the whole interval $[0,1]$ as image.

For “pure” POVMs, this argument does not apply, and we cannot decide a priori whether to demand them to be deterministic. In fact, it is fruitful to allow even PVMs to be objectively non-deterministic (footnote 10: However discomforting that may seem for some people, it’s certainly a milder discomfort than abandoning the notion of reality altogether as in quantum mechanics. See section 2.1.), so we shall not exclude this possibility.

The most general case is, therefore,

$$\int_\Lambda \mathrm{d}\lambda\, \mu_\rho(\lambda)\, \xi_k(\lambda) = \operatorname{tr}(\rho E_k),$$


and this is what an ontological theory should strive to reproduce, only falling back to pure states and PVMs when unavoidable.

2 Ontological models

With the definitions given in the previous section, it is already possible to construct some examples of ontological theories, to examine their features in a more concrete manner.

2.1 The naïve ontology

If we allow an ontological model to have objective non-determinism, what do we gain in relation to quantum mechanics? Not much, actually. This ontological model is so similar to quantum mechanics that it can be confused with a naïve interpretation of it, one that ascribes ontological status to the pure states. Nevertheless, it is quite useful to examine this ontological model meticulously, to be aware of the problems that such a naïve interpretation has. This particular model was first proposed in [beltrametti95], and further explored in [harrigan10].

In this model, we are considering the pure states to be the ontic states $\lambda$, so we identify the ontic state space $\Lambda$ with $\mathcal{PH}$, and define

$$\mu_\psi(\lambda) = \delta(\lambda - \psi).$$

The response function is then

$$\xi_k(\lambda) = \operatorname{tr}(\lambda E_k),$$

and we recover the results of quantum mechanics by

$$\int_\Lambda \mathrm{d}\lambda\, \delta(\lambda - \psi)\, \operatorname{tr}(\lambda E_k) = \operatorname{tr}(\psi E_k).$$

We can see, then, that mathematically this ontological model is quite trivial. One interesting thing to examine, though, is the representation of mixed states in this formalism. Following lemma 1, we see that a mixed state $\rho = \sum_i p_i \psi_i$ is represented by

$$\mu_\rho(\lambda) = \sum_i p_i\, \delta(\lambda - \psi_i),$$

which trivially reproduces the required quantum statistics. The problem with this approach, however, is that the ontic state depends on which convex decomposition of $\rho$ we chose to use. This makes the notation $\mu_\rho$ suspect, since it should actually be $\mu_{\{p_i, \psi_i\}}$, and blatantly violates the $C^*$-algebraic definition of state [strocchi12], which requires that states that give rise to the same statistics have the same mathematical representation. We call this (unwanted) feature preparation contextuality, which we shall define more carefully in section 4.
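The decomposition dependence can be made concrete with a minimal sketch. It assumes the naïve model as described above, with the ontic state of an ensemble $\{p_i, \psi_i\}$ being a weighted sum of point masses on the $\psi_i$; the two ensembles and the helper names are hypothetical choices made only for this example. Pure states are stored as real vectors, which is enough here.

```python
import math

def outer(vec):
    """Projector |v><v| for a real, normalised vector v."""
    return [[a * b for b in vec] for a in vec]

def density_matrix(ensemble):
    """rho = sum_i p_i |psi_i><psi_i| for an ensemble [(p_i, psi_i), ...]."""
    dim = len(ensemble[0][1])
    rho = [[0.0] * dim for _ in range(dim)]
    for p, vec in ensemble:
        proj = outer(vec)
        for r in range(dim):
            for c in range(dim):
                rho[r][c] += p * proj[r][c]
    return rho

def naive_ontic_state(ensemble):
    """mu = sum_i p_i delta(lambda - psi_i), stored as {pure state: weight}."""
    mu = {}
    for p, vec in ensemble:
        mu[tuple(vec)] = mu.get(tuple(vec), 0.0) + p
    return mu

s = 1 / math.sqrt(2)
ensemble_z = [(0.5, (1.0, 0.0)), (0.5, (0.0, 1.0))]  # equal mix of |0>, |1>
ensemble_x = [(0.5, (s, s)), (0.5, (s, -s))]         # equal mix of |+>, |->

rho_z, rho_x = density_matrix(ensemble_z), density_matrix(ensemble_x)
mu_z, mu_x = naive_ontic_state(ensemble_z), naive_ontic_state(ensemble_x)

same_rho = all(abs(rho_z[r][c] - rho_x[r][c]) < 1e-12
               for r in range(2) for c in range(2))
disjoint_supports = set(mu_z).isdisjoint(mu_x)
print(same_rho, disjoint_supports)
```

Both ensembles give the same density matrix, yet their ontic distributions are supported on entirely different pure states, which is the preparation contextuality described in the text.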

Remember that it is common for beginners to be surprised by the fact that it is impossible to know which convex combination was actually used to construct a given density matrix. Regarding the pure states as ontological, this feeling becomes quite natural, since the mystery is why the ensemble $\{p_i, \psi_i\}$ should give the same statistics as the ensemble $\{q_j, \phi_j\}$ whenever $\sum_i p_i \psi_i = \sum_j q_j \phi_j$.

To solve this problem, one might be tempted to ignore common sense (and lemma 1) and ascribe ontological status to mixed states, identifying $\Lambda$ with $\mathcal{D}(\mathcal{H})$ instead of $\mathcal{PH}$; then the ontic states would be just

$$\mu_\rho(\lambda) = \delta(\lambda - \rho),$$

relieving us of the dependence on the decomposition. But this is in fact a terrible idea, since one can always write a mixed state $\rho$ as a convex combination of two different states $\rho_0$ and $\rho_1$, as

$$\rho = p\,\rho_0 + (1-p)\,\rho_1.$$

If you want to regard every mixed state as ontological, you have, by lemma 1,

$$\delta(\lambda - \rho) = p\,\delta(\lambda - \rho_0) + (1-p)\,\delta(\lambda - \rho_1),$$

a flat-out contradiction.

One can now begin to suspect that it is not possible to avoid preparation contextuality; this will be proved in section 5. For now, we see that even the most humble ontological model, which does not even provide determinism, already has some very undesirable features. One might then ask whether a deterministic ontological model is even possible; fortunately this question was answered in the positive a long time ago. We shall see how in the next subsection.

2.2 Constructing a deterministic ontological model

In 1964, Bell had an idea on how to make a deterministic ontological model [bell66]: hide the quantum mechanical probability of an outcome in the measure of the set of ontic states associated to that outcome. I shall present here a modified version of his model that makes this point quite clear.

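This model can describe in a deterministic way the measurement of a one-qubit PVM $\{\Pi_0, \Pi_1\}$. The ontic space is $\Lambda = \mathcal{PH} \times [0,1]$, with ontic variable $\lambda = (\phi, x)$. The ontic state of a given quantum state $\psi$ is

$$\mu_\psi(\phi, x) = \delta(\phi - \psi),$$

and the response functions (footnote 11: Note that the response functions depend explicitly on the label of the projectors, so it would be desirable to set a consistent ordering convention to avoid giving different results to $\{\Pi_0, \Pi_1\}$ and $\{\Pi_1, \Pi_0\}$.) are

$$\xi_0(\phi, x) = \Theta\big(\operatorname{tr}(\phi \Pi_0) - x\big), \qquad \xi_1(\phi, x) = 1 - \xi_0(\phi, x),$$

where $\Theta$ is the Heaviside step function defined by

$$\Theta(t) = \begin{cases} 1 & \text{if } t > 0, \\ 0 & \text{if } t \le 0. \end{cases}$$

One then recovers quantum statistics by uniform averaging over the ontic space:

$$\int \mathrm{d}\phi \int_0^1 \mathrm{d}x\; \delta(\phi - \psi)\, \Theta\big(\operatorname{tr}(\phi \Pi_0) - x\big) = \int_0^{\operatorname{tr}(\psi \Pi_0)} \mathrm{d}x = \operatorname{tr}(\psi \Pi_0).$$

The reader might have noticed that although the model claims to only work for a qubit, the mathematical formalism does not make any reference to this, and one might be tempted to think that it actually works for any two-outcome PVM. The reason why it does not work is more subtle, and we shall see it in section 7.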
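The mechanism of hiding the probability in the measure of a set of ontic states can be checked numerically. The sketch below assumes a threshold form for the response function: outcome $0$ occurs exactly when the extra variable $x \in [0,1]$ falls below the Born probability $p_0 = \operatorname{tr}(\psi \Pi_0)$. The function names, and the choice to pass $p_0$ directly rather than the state, are simplifications made for this illustration.

```python
import math

def heaviside(t):
    """Step function with the convention Theta(t) = 1 for t > 0, else 0."""
    return 1 if t > 0 else 0

def xi0(p0, x):
    """Deterministic response for outcome 0: fires exactly when x < p0."""
    return heaviside(p0 - x)

def xi1(p0, x):
    """Response for outcome 1; the two responses always sum to one."""
    return 1 - xi0(p0, x)

def averaged_response(p0, n=10000):
    """Uniform average of xi0 over x in [0, 1], using the midpoint rule."""
    return sum(xi0(p0, (j + 0.5) / n) for j in range(n)) / n

# Assumed example: a qubit whose Bloch vector makes an angle t with the
# direction of Pi_0, so that the Born probability is cos^2(t / 2).
t = 1.0
p0 = math.cos(t / 2) ** 2
estimate = averaged_response(p0)
print(p0, estimate)
```

Each individual pair (state, $x$) gives a definite outcome, while the uniform average over $x$ reproduces the Born probability up to the grid resolution.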

3 $\psi$-ontic and $\psi$-epistemic models

Both models presented in the previous section share a common feature: the quantum state has an ontological status. Either the ontic state is the quantum state itself, as in the naïve model, or it is the quantum state supplemented by a real number in the unit interval, as in the Bell model. In both cases, knowing the (pure) ontic state of the system is enough to determine uniquely the (pure) quantum state that was prepared. Models of this kind are called $\psi$-ontic (footnote 12: The concept of ontic and epistemic states was first introduced in [hardy04], and further formalized in [spekkens05, harrigan10]. A nice discussion of these concepts can be found in [leifer11].), and have the equivalent but more operational definition:

Definition 2.

An ontological model is $\psi$-ontic if for different quantum states $\psi$ and $\phi$ the ontic states have disjoint support, i.e., $\mu_\psi(\lambda)\,\mu_\phi(\lambda) = 0$ for all $\lambda$.

To motivate this definition it might be useful to make an analogy with classical mechanics: in it, an ontic state is a point in phase space, and ontic properties of it (like energy, momentum) are functions of the phase space point. Likewise, anything that is uniquely determined by the ontic state in an ontological theory should be regarded as ontic itself, as a change in it requires a change of the underlying ontic states. As the quantum state is uniquely determined by the ontic state in $\psi$-ontic models, it has to be regarded as ontic, as it is not possible to change it without changing the underlying ontic states.

Apart from conceptual clarity, a reason to make this definition is that it is easy to see that $\psi$-ontic models necessarily require instant transfer of information (footnote 13: Only in the formalism, of course; if they displayed an observable violation of causality that would be a contradiction with quantum mechanics.). In the first case, where $\psi$ is the whole ontic state, it suffices to consider a measurement on an entangled state: Alice and Bob share $|\Phi^+\rangle = (|00\rangle + |11\rangle)/\sqrt{2}$ and are spatially separated, Alice then measures the PVM $\{|0\rangle\langle 0|, |1\rangle\langle 1|\}$ and obtains, e.g., the result $0$. Bob’s state then changes instantly from $\mathbb{1}/2$ to $|0\rangle\langle 0|$, violating causality. Of course, if $\psi$ is not the whole ontic state, there is no need for a violation of causality: $\lambda$ can tell us that the state of Bob’s system actually was $|0\rangle\langle 0|$ all along, and so the ontic state does not change during the measurement.

To deal with this case, we need the EPR gedankenexperiment (footnote 14: The version presented here is Einstein’s version, reproduced in [harrigan10].) [einstein35]: consider that Alice can also measure the PVM $\{|+\rangle\langle +|, |-\rangle\langle -|\}$; then after her measurement Bob’s state will belong to the set $\{|0\rangle, |1\rangle\}$ if she measures the first PVM, or to the set $\{|+\rangle, |-\rangle\}$ if Alice measures the second PVM. Even if the results of any given measurement can be predetermined by $\lambda$, it cannot tell which measurement was made (footnote 15: Indeed, it could conceivably determine which measurement Alice will make; here we are using the assumption that she has free will.). Since Bob’s quantum state does depend on which measurement was made (since the four possibilities are different), the formalism again needs instant transfer of information.

Another way to avoid the violation of causality is to say that $\psi$ is not ontic, but merely the representation of Alice’s knowledge of reality, i.e., epistemic. Then what changed after the measurement was actually just what Alice knew about Bob’s state, which is in fact a quite reasonable proposition. But this amounts to giving up $\psi$-ontic models in favour of $\psi$-epistemic ones (footnote 16: It is interesting to notice that although we’ve known this since 1935, the first ontological models were all $\psi$-ontic.):

Definition 3.

An ontological model is $\psi$-epistemic if it is not $\psi$-ontic.

Again, an analogy with classical mechanics might be useful: the classical mixed state is a probability distribution over the phase space, and it is interpreted as epistemic, as it merely expresses ignorance about which is the real phase space point that the system occupies. This is only possible as there is no restriction on the overlaps of different mixed states, i.e., the same phase space point can belong to numerous different mixed states. Notice that this definition is quite weak compared to the classical case: it only requires that there is one pair $\psi$, $\phi$ whose ontic states $\mu_\psi$ and $\mu_\phi$ share a single $\lambda$ in their support.

The obvious question to ask: is there a $\psi$-epistemic model?

3.1 The Kochen-Specker model

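Even before this question was raised, it was already answered by Simon Kochen and Ernst Specker [kochen67], by the ontological model they constructed as a counterexample to von Neumann’s theorem [vonneumann32]. It seems that the authors were trying to make a model that was somewhat physically plausible, and ended up making a $\psi$-epistemic model. We present it here as rendered in [harrigan10].

The ontic space is the unit sphere $S^2$, and we shall use the Bloch vectors $\vec{\psi}$ and $\vec{\phi}$ to represent a pure state $\psi$ and a measurement projector $\phi$ in $S^2$ as well, defined via the isomorphism $\psi = \frac{1}{2}(\mathbb{1} + \vec{\psi}\cdot\vec{\sigma})$. The ontic state is then

$$\mu_\psi(\vec{\lambda}) = \frac{1}{\pi}\, \Theta(\vec{\psi}\cdot\vec{\lambda})\, \vec{\psi}\cdot\vec{\lambda},$$

making the model clearly $\psi$-epistemic, since the only states that do not overlap are orthogonal states. The response function is given by

$$\xi_\phi(\vec{\lambda}) = \Theta(\vec{\phi}\cdot\vec{\lambda}).$$

To recover the quantum statistics, notice that each of $\mu_\psi$ and $\xi_\phi$ has as support a hemisphere centred in $\vec{\psi}$ and $\vec{\phi}$, so their intersection defines a spherical lune. To take advantage of this, let’s choose coordinates such that $\vec{\psi}$ and $\vec{\phi}$ lie in the equator of $S^2$, so that $\vec{\psi} = (1, 0, 0)$, $\vec{\phi} = (\cos\alpha, \sin\alpha, 0)$, and $\vec{\lambda} = (\sin\theta\cos\varphi, \sin\theta\sin\varphi, \cos\theta)$. We have then, for $0 \le \alpha \le \pi$,

$$\int_{S^2} \mathrm{d}\vec{\lambda}\; \mu_\psi(\vec{\lambda})\, \xi_\phi(\vec{\lambda}) = \frac{1}{\pi} \int_0^\pi \mathrm{d}\theta\, \sin^2\theta \int_{\alpha - \pi/2}^{\pi/2} \mathrm{d}\varphi\, \cos\varphi = \frac{1}{2}(1 + \cos\alpha) = \operatorname{tr}(\psi\phi).$$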

This model does seem to be the most “natural” of the ontological models yet considered, and there have even been attempts to understand it physically [rudolph06]. In this same article, Terry Rudolph explores extensions of the Kochen-Specker model to higher dimensions, but fails to precisely reproduce quantum mechanics with them. A $\psi$-epistemic model for higher dimensions has since then been found (we discuss it in section 9.2), but it does not have the simplicity of the Kochen-Specker model, and so it would be unfair to call it an extension of it.
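The Kochen-Specker model can be checked by brute-force integration over the sphere. The sketch below assumes the usual form of the model from the literature: the ontic state is $\frac{1}{\pi}\Theta(\vec{\psi}\cdot\vec{\lambda})\,\vec{\psi}\cdot\vec{\lambda}$ and the response function is $\Theta(\vec{\phi}\cdot\vec{\lambda})$, for Bloch vectors $\vec{\psi}$ and $\vec{\phi}$. The function names, the grid sizes and the angle between the two Bloch vectors are arbitrary choices made for the illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mu(psi_vec, lam):
    """Assumed ontic state: (1/pi) * Theta(psi.lam) * psi.lam."""
    c = dot(psi_vec, lam)
    return c / math.pi if c > 0 else 0.0

def xi(phi_vec, lam):
    """Assumed response function: Theta(phi.lam)."""
    return 1.0 if dot(phi_vec, lam) > 0 else 0.0

def sphere_integral(f, n_theta=180, n_phi=720):
    """Midpoint-rule integral of f over the unit sphere."""
    d_theta = math.pi / n_theta
    d_phi = 2 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        st, ct = math.sin(theta), math.cos(theta)
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            lam = (st * math.cos(phi), st * math.sin(phi), ct)
            total += f(lam) * st  # st is the Jacobian of the area element
    return total * d_theta * d_phi

alpha = math.pi / 3  # angle between the two Bloch vectors
psi_vec = (1.0, 0.0, 0.0)
phi_vec = (math.cos(alpha), math.sin(alpha), 0.0)

norm = sphere_integral(lambda lam: mu(psi_vec, lam))
prob = sphere_integral(lambda lam: mu(psi_vec, lam) * xi(phi_vec, lam))
born = (1 + dot(psi_vec, phi_vec)) / 2
print(norm, prob, born)
```

The first integral confirms that the ontic state is a normalised probability distribution, and the second agrees with the Born probability $\frac{1}{2}(1 + \vec{\psi}\cdot\vec{\phi})$ up to discretisation error.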

3.2 Two theorems on $\psi$-epistemic models

We can see, then, that $\psi$-epistemic models are desirable and can actually be constructed. There are, however, two theorems that say that any such model, if it exists, has to be very unnatural. They are both based on the following idea:

Lemma 4.

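If there are quantum states $\{\rho_k\}$ and a measurement $\{E_k\}$ such that $\operatorname{tr}(\rho_k E_k) = 0$ for every $k$, then there can be no $\lambda$ in the support of all $\mu_{\rho_k}$.

Proof. If these conditions are satisfied, then it must be true that

$$\int_\Lambda \mu_{\rho_k}\, \xi_k = 0,$$

and therefore that $\xi_k(\lambda) = 0$ for all $\lambda$ in the support of $\mu_{\rho_k}$. If there is a $\lambda^*$ in the support of all the $\mu_{\rho_k}$, making the model $\psi$-epistemic, then $\sum_k \xi_k(\lambda^*) = 0$, which is absurd, since in the definition of the response functions we require that $\sum_k \xi_k(\lambda) = 1$ for all $\lambda$. ∎

Of course, if we could prove that for any pair of states the hypotheses of the lemma are satisfied, we would have proven that no $\psi$-epistemic model is possible; but for a pair of states the hypotheses of the lemma are satisfied only if they are orthogonal, and by the following lemma they must have disjoint support anyway:

Lemma 5.

If there are quantum states $\rho_0$, $\rho_1$ and a measurement $\{E_0, E_1\}$ such that $\operatorname{tr}(\rho_0 E_0) = 1$ and $\operatorname{tr}(\rho_1 E_0) = 0$, then

$$\mu_{\rho_0}(\lambda)\, \mu_{\rho_1}(\lambda) = 0 \quad \text{for all } \lambda.$$

Proof. $\int_\Lambda \mu_{\rho_0}\, \xi_0 = 1$, so the support of $\mu_{\rho_0}$ is contained in the support of $\xi_0$. But $\int_\Lambda \mu_{\rho_1}\, \xi_0 = 0$ implies that the supports of $\mu_{\rho_1}$ and $\xi_0$ are disjoint, and therefore the supports of $\mu_{\rho_0}$ and $\mu_{\rho_1}$ are disjoint, so $\mu_{\rho_0}(\lambda)\, \mu_{\rho_1}(\lambda) = 0$.

Instead, the two theorems we shall present consider larger families: the first considers families of three states to show that there are non-trivial examples, and the second argues that the existence of some specific families implies that any $\psi$-epistemic model must be very unnatural.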

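Theorem 6 (Caves, Fuchs, Schack [caves02]).

If the convex hull of a family of states $\{\psi_k\}$ contains $\mathbb{1}/d$, where $d$ is the Hilbert space dimension, then there can be no $\lambda$ in the common support of all $\mu_{\psi_k}$.

Proof. For any pure state $\psi_k$, it is true that $\operatorname{tr}\big(\psi_k (\mathbb{1} - \psi_k)\big) = 0$. If we can find coefficients $\alpha_k$ such that $\{\alpha_k(\mathbb{1} - \psi_k)\}$ is a POVM, then lemma 4 applies and we’re done. What we need is

$$\sum_k \alpha_k (\mathbb{1} - \psi_k) = \mathbb{1}$$

for $\alpha_k \ge 0$. Taking the trace on both sides we get that $\sum_k \alpha_k = \frac{d}{d-1}$. Simple algebra then shows us that

$$\sum_k \frac{d-1}{d}\, \alpha_k\, \psi_k = \frac{\mathbb{1}}{d},$$

so the required coefficients exist whenever $\mathbb{1}/d$ lies in the convex hull of the $\psi_k$.

This theorem was first proven in [caves02], with a different objective. While it does not exclude $\psi$-epistemic models, it shows there is a wide variety of families of states that can’t have an overlap. If the number of states is three, there are already examples in any dimension where they are not orthogonal; see equations (6) for an example.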
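The construction in the proof can be tried on a concrete family. The sketch below uses the three qubit “trine” states, whose Bloch vectors are $120^\circ$ apart; this family is an assumption made for the illustration and is not necessarily the example referred to in the text. Their uniform mixture is $\mathbb{1}/2$, so the coefficients are $\alpha_k = p_k\, d/(d-1) = 2/3$. The code checks that the effects $\alpha_k(\mathbb{1} - \psi_k)$ sum to the identity, that each has zero probability on its own state, and that the states are not orthogonal.

```python
import math

def projector(bloch_angle):
    """Projector onto cos(a/2)|0> + sin(a/2)|1>, with a the Bloch angle."""
    c, s = math.cos(bloch_angle / 2), math.sin(bloch_angle / 2)
    return [[c * c, c * s], [c * s, s * s]]

def trace_of_product(a, b):
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))

d = 2
states = [projector(2 * math.pi * k / 3) for k in range(3)]  # trine states
weights = [1 / 3] * 3           # sum_k p_k psi_k equals identity / d
alphas = [p * d / (d - 1) for p in weights]

# E_k = alpha_k (1 - psi_k): a positive multiple of a projector, so positive.
effects = [
    [[a * ((1.0 if r == c else 0.0) - psi[r][c]) for c in range(d)]
     for r in range(d)]
    for a, psi in zip(alphas, states)
]

total = [[sum(e[r][c] for e in effects) for c in range(d)] for r in range(d)]
self_overlaps = [trace_of_product(psi, e) for psi, e in zip(states, effects)]
pairwise_overlap = trace_of_product(states[0], states[1])
print(total, self_overlaps, pairwise_overlap)
```

Since the effects form a POVM with $\operatorname{tr}(\psi_k E_k) = 0$, lemma 4 applies: no single ontic state can lie in the support of all three trine states, even though each pair has overlap $1/4$.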

The next theorem needs the following (very natural, in the author’s opinion) assumption about the composition of different systems:

Assumption 1.

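If two quantum states $\psi$ and $\phi$ are prepared independently, such that their joint state is $\psi \otimes \phi$, then the corresponding ontic state for the joint system is $\mu_{\psi\otimes\phi}(\lambda_1, \lambda_2) = \mu_\psi(\lambda_1)\, \mu_\phi(\lambda_2)$.

Theorem 7 (Pusey, Barrett, Rudolph [pusey11]).

Given assumption 1, no $\psi$-epistemic ontological model of quantum mechanics is possible.

Proof. Consider the four quantum states $\psi_0 \otimes \psi_0$, $\psi_0 \otimes \psi_1$, $\psi_1 \otimes \psi_0$, and $\psi_1 \otimes \psi_1$. If there is a $\lambda$ in the support of $\mu_{\psi_0}$ and $\mu_{\psi_1}$, then $(\lambda, \lambda)$ is in the support of all four $\mu_{\psi_i \otimes \psi_j}$. If there is a POVM $\{E_{ij}\}$ such that $\operatorname{tr}\big((\psi_i \otimes \psi_j) E_{ij}\big) = 0$, then lemma 4 applies and we’re done.

Consider now the particular case $|\psi_0\rangle = |0\rangle$ and $|\psi_1\rangle = |+\rangle$. Then if $E_{ij}$ is the projector onto $|\xi_{ij}\rangle$, with

$$|\xi_{00}\rangle = \tfrac{1}{\sqrt{2}}\big(|01\rangle + |10\rangle\big), \quad |\xi_{01}\rangle = \tfrac{1}{\sqrt{2}}\big(|0-\rangle + |1+\rangle\big),$$
$$|\xi_{10}\rangle = \tfrac{1}{\sqrt{2}}\big(|{+}1\rangle + |{-}0\rangle\big), \quad |\xi_{11}\rangle = \tfrac{1}{\sqrt{2}}\big(|{+}{-}\rangle + |{-}{+}\rangle\big),$$

it is easy to see that $\operatorname{tr}\big((\psi_i \otimes \psi_j) E_{ij}\big) = 0$, and it is also easy (but tedious) to check that $\langle \xi_{ij} | \xi_{kl} \rangle = \delta_{ik}\delta_{jl}$, so that $\{E_{ij}\}$ is a PVM. Unfortunately, this simple strategy only works for this pair of states, and states with smaller overlap require measurements on a larger number of parts. For the proof of the general case, see the original article [pusey11] (footnote 17: This proof uses the notation from [leifer11], which is clearer than the one in the original article.). ∎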
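The “easy but tedious” check can be delegated to a few lines of code. The sketch below assumes the entangled measurement basis of the original Pusey-Barrett-Rudolph article for the states $|0\rangle$ and $|+\rangle$; the variable and helper names are illustrative. It verifies that the four measurement vectors are orthonormal and that the $k$-th vector is orthogonal to the $k$-th product state.

```python
import math

def kron(a, b):
    """Tensor product of two vectors stored as lists."""
    return [x * y for x in a for y in b]

def add_scaled(a, b, scale):
    """scale * (a + b), componentwise."""
    return [scale * (x + y) for x, y in zip(a, b)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

s = 1 / math.sqrt(2)
zero, one = [1.0, 0.0], [0.0, 1.0]
plus, minus = [s, s], [s, -s]

# The four product states |00>, |0+>, |+0>, |++>.
joint_states = [kron(zero, zero), kron(zero, plus),
                kron(plus, zero), kron(plus, plus)]

# Assumed measurement basis (as in the original PBR article).
xi = [
    add_scaled(kron(zero, one), kron(one, zero), s),
    add_scaled(kron(zero, minus), kron(one, plus), s),
    add_scaled(kron(plus, one), kron(minus, zero), s),
    add_scaled(kron(plus, minus), kron(minus, plus), s),
]

gram = [[dot(a, b) for b in xi] for a in xi]
zero_probabilities = [dot(x, psi) ** 2 for x, psi in zip(xi, joint_states)]
print(gram, zero_probabilities)
```

The Gram matrix is the identity, so the projectors onto these vectors form a PVM, and every outcome has zero probability on its matching product state, which is what lemma 4 requires.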

This theorem has two immediate corollaries:

Corollary 8.

Any ontological model of quantum mechanics must violate causality.

One only has to notice that since the theorem excludes $\psi$-epistemic models, we’re left with $\psi$-ontic ones. And we have shown at the beginning of this section that those violate causality.

Corollary 9.

The ontic state space is uncountable.

In a $\psi$-ontic model there is an injection of $\mathcal{PH}$ into $\Lambda$. Since $\mathcal{PH}$ is uncountable, $\Lambda$ must be uncountable. In fact, even without assumption 1 we can still prove that $\Lambda$ is infinite; we shall do this in section 8.

The obvious question that this theorem raises is: can we do away with assumption 1 and prove once and for all that $\psi$-epistemic models are always impossible? The existence of the Kochen-Specker model already hints that at least some weaker assumption is needed, since it is a bona fide $\psi$-epistemic model. Of course, its existence does not contradict the theorem, since it only forbids models for dimension 4 or greater. In fact, soon after the Pusey-Barrett-Rudolph theorem was published, some of the same authors showed that without assumption 1 they could make a $\psi$-epistemic model for a quantum system of any dimension. We shall describe this model in section 9.2.

This theorem already hints at a theme that shall be recurrent in the search for ontological models: we can in fact make ontological models for quantum theory, and in fact we can make them in almost any way that we like, but there’s a price to pay: the various aspects of the model become more and more intertwined. We can’t really talk of independent quantum systems, of a separation between state and experiment, nor even (as we shall see in the next section) of a measurement outcome without talking about the whole experiment. Of course, this bodes very badly for the idea of ontological models: in the extreme limit of this interdependence our ontological model only lists possible experiments and their results, without ever trying to make sense of them in a simpler and more general theory. A model like this wouldn’t be falsifiable by its very nature, but precisely because of this it is a perversion of the scientific method [popper34], and should therefore be rejected on methodological grounds.

What we seek, therefore, is not just any ontological model, but one that has some plausibility. The ontological models presented hitherto are of course very contrived, but by themselves they should not be taken as evidence against the possibility of a reasonable ontological model, since they were conceived only as proofs of principle, without any inspiration from physical grounds.

4 Contextuality

One should contrast the state of research into contextuality to the state of research into nonlocality. It is quite clear that nonlocality has a better status: it was subjected to experimental tests much earlier1818181972 [freedman72], in contrast with 2000 [michler00]., and also had its potential as a resource for practical applications recognized much earlier1919191991 [ekert91], versus 2000 [bechmann-pasquinucci00].

This state of affairs has many causes, which certainly includes the intuitive appeal of nonlocality via its relation with relativity, but I’d like to focus in a more formal one: the definitions of nonlocality and contextuality. Right in the first paper about nonlocality, John Bell [bell64] already gave a clear operational definition of nonlocality, that was not dependent on quantum theory, but instead only on a general probabilistic framework. By contrast, the first definition of contextuality, also due to John Bell202020The concept appeared first in 1966 [bell66], in a critique of the Gleason theorem, whereas the name “contextuality” was created in 1978 [clauser78], by Clauser and Shimony., was very specific to quantum theory, and was not at all operational:

Definition 10 (Bell’s contextuality).

We say that an ontological model for quantum theory is noncontextual if the response function associated to the outcome $k$ of a PVM $\{\Pi_j\}$, i.e., $\xi_{\{\Pi_j\},k}(\lambda)$, depends only on $\Pi_k$ and not on the whole $\{\Pi_j\}$.

This definition also lacks conceptual clarity: John Bell even thought that it was reasonable for a physical theory to be contextual [bell66]:

The result of an observation may reasonably depend not only on the state of the system (including hidden variable) but also on the complete disposition of the apparatus.

But one consequence of contextuality is precisely the violation of causality that he abhorred: consider, for instance, the PVM

If the real result , associated with the projector , depends on whether the other side of the PVM is or , then the apparatuses must always be able to communicate their arrangement to each other, even when the choice of arrangement is made with a space-like separation, which is of course absurd. This settles the question about ontological models of independent quantum systems. But what about single systems? Is there any unacceptable consequence of contextuality for them?

Yes! It also implies a violation of causality. As put by Asher Peres and Amiran Ron [peres88]:

More generally, if $[A,B] = [A,C] = 0$ but $[B,C] \neq 0$, suppose that we measure $A$ first and only at a later time decide whether to measure $B$ or $C$ or none of them. How can the outcome of the $A$ measurement depend on this future decision?

Furthermore, this whole story about communicating apparatuses is quite queer, even when it is not a violation of causality. After all, all the evidence we have is that measurements of commuting observables do not affect each other, and an ontological theory that requires this kind of communication would be very weird indeed. Another problem is that this communication could affect only the individual measurements, and must never be detectable in the quantum experiments we do. To postulate this kind of “cryptocontextuality” (with apologies to Asher Peres) seems very unscientific: we would be making a theory which is about precisely what we can’t measure.

Another way to think about the weirdness of a contextual model is operationally: imagine that you are an experimentalist who has implemented an apparatus that can differentiate between the ground state and the excited states of a many-level atom. You try hard, repeat your experiment many times with different input states, gather the statistics, and are confident that your apparatus is quite trustworthy; you now want to teach a fellow experimentalist how to build a similar apparatus. Quite simple, isn’t it? You just tell him how you did it, ask him to gather statistics, and compare with yours: if the statistics match, you’ve implemented the same experiment. Except it isn’t so if your physical theory is contextual: the statistics of the projector $\Pi_0$ (the projector onto the ground state) are not enough to determine the results of the experiment, since according to definition 10 the real results depend on the rest of the (unmeasured) projectors; and these are not only the higher energy levels of the atom, but can in principle include any environmental data, such as the apparatus’ mass, the local weather, whether Virgo is ascendant…

In this way, we are rendered incapable of comparing experiments and establishing patterns, the very foundation of our scientific method. Notice the strong parallel between this discussion and the definitions of state and observable in the $C^*$-algebraic axiomatization done by Franco Strocchi [strocchi12]. This motivates a new definition of contextuality, due to Spekkens [spekkens05], that takes into account these arguments:

A noncontextual ontological model of an operational theory is one wherein if two experimental procedures are operationally equivalent, then they have equivalent representations in the ontological model.

Within this reasoning, it becomes sufficient to have equivalent statistics to be able to identify different experiments, and we are able again to do science. But a definition that uses only words is quite imprecise, and we should codify it in order to avoid misinterpretations:

Definition 11 (Spekkens’ contextuality).

Let $p(k|M,P)$ be the probability of obtaining the outcome $k$ when doing the measurement $M$ on a state prepared via procedure $P$. Then we say that an ontological model of an operational theory is measurement noncontextual if

$$\forall P \quad p(k|M,P) = p(k|M',P) \;\Rightarrow\; \xi_{M,k}(\lambda) = \xi_{M',k}(\lambda).$$

Analogously, we say that an ontological model of an operational theory is preparation noncontextual if

$$\forall M \quad p(k|M,P) = p(k|M,P') \;\Rightarrow\; \mu_{P}(\lambda) = \mu_{P'}(\lambda).$$

The central idea is simple: if measurements $M$ and $M'$ give the same statistics for every preparation procedure $P$, then we must say that they are in fact the same measurement, with equivalent mathematical representation, and if preparation procedures $P$ and $P'$ give the same statistics for every measurement $M$, then we must say that they are in fact the same preparation procedure, with equivalent mathematical representation.

Note that this definition improves on Bell’s definition by removing any explicit reference to quantum theory, talking about only an “operational theory”, i.e., a theory in which we can talk about preparation procedures, measurements, and probabilities. However, this is still not the definition we’re looking for. We want to be able to say whether a given probability distribution is contextual or not, as we do with the definition of nonlocality. This we shall do in the next chapter; for this one, this definition is good enough.

We want to specialize this definition to ontological models of quantum theory, as a matter of convenience, since that’s all we’ll be talking about. Note that in quantum theory $p(k|M,P)$ is completely defined by the measurement operator $E_k$ and the quantum state $\rho$, so that’s all our ontological model can take into account. More precisely:

Definition 12.

We say that an ontological model of quantum theory is measurement noncontextual if

$$\xi_{M,k}(\lambda) = \xi_{E_k}(\lambda),$$

that is, if the response function associated to the outcome $k$ of a measurement $M$ depends only on the measurement operator $E_k$. Analogously, we say that an ontological model of quantum theory is preparation noncontextual if

$$\mu_{P}(\lambda) = \mu_{\rho}(\lambda),$$

that is, if the ontic state associated to the preparation procedure $P$ depends only on the quantum state $\rho$ that is prepared.

What else could the ontic state possibly depend on? Well, in the ontological models we discussed in sections 2.1 and 2.2 it depended on the “true” basis of $\rho$, making these states preparation contextual. It could also depend on the “true” purification of $\rho$, or really anything that one might deem plausible or implausible. What about measurements? Well, the most famous sort of context is that of Bell’s definition of contextuality: the whole PVM $\{\Pi_j\}$, as in the ontological models discussed in section 9.2, but it could also be anything, such as the colour of the measurement apparatus, the latitude and longitude of the laboratory where the experiment is performed, etc.

One final remark: if quantum theory were an ontological model of itself then definition 11 (and 12) would imply that it is not contextual, since it is trivial to prove that


Since it is not, the oft-heard claim that “quantum mechanics is contextual” is just meaningless. What one probably means by it is that any ontological model of quantum theory must be contextual, repeating a situation that happens in the area of nonlocality: quantum mechanics is obviously a local theory, in the relativistic sense, but any ontological model of quantum theory must be nonlocal, leading to the meaningless sentence “quantum mechanics is nonlocal”.

5 Contextuality for preparation procedures

In this section we shall show that it is not possible to construct a preparation noncontextual ontological model of quantum theory [spekkens05]. This is not the conflict with quantum theory usually discussed, but we feel that it is appropriate to begin with it for three reasons:

  1. It is independent of assumptions on determinism

  2. It is simple

  3. It is novel

To begin, we’ll need to prove a simple lemma about how orthogonal states are represented in the ontic space $\Lambda$. We’ll see that the possibility of distinguishing orthogonal states with certainty by a single-shot measurement implies that their representations in the ontic space must have disjoint support.

Lemma 13.

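If two quantum states $\rho$ and $\sigma$ are orthogonal then the corresponding ontic states $\mu_\rho$ and $\mu_\sigma$ have disjoint support:

$$\mu_\rho(\lambda)\,\mu_\sigma(\lambda) = 0 \quad \forall \lambda.$$

If $\rho$ and $\sigma$ are orthogonal, then they can be distinguished with certainty in a single-shot measurement. To construct one such measurement, note that the supports of $\rho$ and $\sigma$ must be orthogonal, and let $\Pi$ be the projector onto the support of $\rho$. Then

$$\operatorname{tr}(\rho\,\Pi) = 1 \quad\text{and}\quad \operatorname{tr}(\sigma\,\Pi) = 0.$$

Writing these measurements ontologically, we have

$$\int d\lambda\, \mu_\rho(\lambda)\,\xi_\Pi(\lambda) = 1 \quad\text{and}\quad \int d\lambda\, \mu_\sigma(\lambda)\,\xi_\Pi(\lambda) = 0,$$

so $\xi_\Pi(\lambda) = 1$ for all $\lambda$ in the support of $\mu_\rho$, and $\xi_\Pi(\lambda) = 0$ for all $\lambda$ in the support of $\mu_\sigma$, so the supports of $\mu_\rho$ and $\mu_\sigma$ are disjoint, and $\mu_\rho(\lambda)\,\mu_\sigma(\lambda) = 0$ for all $\lambda$. ∎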

We will also need the assumption that is violated by all the ontological models discussed so far:

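Assumption 2 (Preparation noncontextuality).

The ontic state associated to a preparation procedure $P$ depends only on the quantum state $\rho$ that is prepared: $\mu_P(\lambda) = \mu_\rho(\lambda)$.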

With the groundwork laid, we can now state the theorem and prove it.

Theorem 14 (Spekkens [spekkens05]).

It is not possible to embed quantum theory into a preparation noncontextual ontological theory.


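Let $\rho_a$, $\rho_A$, $\rho_b$, $\rho_B$, $\rho_c$, and $\rho_C$ be quantum states such that

$$\rho_a\rho_A = \rho_b\rho_B = \rho_c\rho_C = 0, \tag{5a}$$

$$\tfrac{1}{2}\mathbb{1} = \tfrac{1}{2}\rho_a + \tfrac{1}{2}\rho_A = \tfrac{1}{2}\rho_b + \tfrac{1}{2}\rho_B = \tfrac{1}{2}\rho_c + \tfrac{1}{2}\rho_C, \tag{5b}$$

$$\tfrac{1}{2}\mathbb{1} = \tfrac{1}{3}\left(\rho_a + \rho_b + \rho_c\right) = \tfrac{1}{3}\left(\rho_A + \rho_B + \rho_C\right). \tag{5c}$$

That such a family of states exists can be proven by exhibiting an example in dimension 2, that can be easily embedded in higher dimensions:

$$\begin{aligned}
|\psi_a\rangle &= |0\rangle, & |\psi_A\rangle &= |1\rangle, \\
|\psi_b\rangle &= \tfrac{1}{2}|0\rangle + \tfrac{\sqrt{3}}{2}|1\rangle, & |\psi_B\rangle &= \tfrac{\sqrt{3}}{2}|0\rangle - \tfrac{1}{2}|1\rangle, \\
|\psi_c\rangle &= \tfrac{1}{2}|0\rangle - \tfrac{\sqrt{3}}{2}|1\rangle, & |\psi_C\rangle &= \tfrac{\sqrt{3}}{2}|0\rangle + \tfrac{1}{2}|1\rangle,
\end{aligned} \tag{6}$$

with $\rho_x = |\psi_x\rangle\langle\psi_x|$.

A nice way to visualize the orthogonality and completeness relations (5) is to represent states (6) in the plane of the Bloch sphere, as done in figure 1.

Figure 1: Representation of states (6) in the plane of the Bloch sphere. The barycenter of antipodal states or states which are connected by a triangle is $\mathbb{1}/2$.

Now we shall use lemmas 1 and 13 together with assumption 2 and relations (5) to derive a contradiction, writing $\mu_x$ for the ontic state of $\rho_x$. Lemma 13 together with (5a) implies that

$$\mu_a(\lambda)\mu_A(\lambda) = \mu_b(\lambda)\mu_B(\lambda) = \mu_c(\lambda)\mu_C(\lambda) = 0. \tag{7}$$

Lemma 1, together with assumption 2 and relations (5b), implies that

$$\mu_{\mathbb{1}/2}(\lambda) = \tfrac{1}{2}\mu_a(\lambda) + \tfrac{1}{2}\mu_A(\lambda), \tag{8a}$$

$$\mu_{\mathbb{1}/2}(\lambda) = \tfrac{1}{2}\mu_b(\lambda) + \tfrac{1}{2}\mu_B(\lambda), \tag{8b}$$

$$\mu_{\mathbb{1}/2}(\lambda) = \tfrac{1}{2}\mu_c(\lambda) + \tfrac{1}{2}\mu_C(\lambda), \tag{8c}$$

and together with relations (5c)

$$\mu_{\mathbb{1}/2}(\lambda) = \tfrac{1}{3}\left[\mu_a(\lambda) + \mu_b(\lambda) + \mu_c(\lambda)\right], \tag{9a}$$

$$\mu_{\mathbb{1}/2}(\lambda) = \tfrac{1}{3}\left[\mu_A(\lambda) + \mu_B(\lambda) + \mu_C(\lambda)\right]. \tag{9b}$$

We shall conclude the proof by showing that the only simultaneous solution to (8), (9), and (7) is the all-zero solution

$$\mu_a(\lambda) = \mu_A(\lambda) = \mu_b(\lambda) = \mu_B(\lambda) = \mu_c(\lambda) = \mu_C(\lambda) = \mu_{\mathbb{1}/2}(\lambda) = 0,$$

which is absurd, since probability distributions can’t be zero everywhere.

The disjointness relations (7) imply that for each $\lambda$ at least one of $\mu_a(\lambda)$ and $\mu_A(\lambda)$ must be zero, and the same for the other letters. Therefore there are 8 different cases to examine, although only two are essentially different. The first one is when $\mu_a(\lambda)$, $\mu_b(\lambda)$, and $\mu_c(\lambda)$ are zero. Then (9) implies that $\mu_A(\lambda)$, $\mu_B(\lambda)$, and $\mu_C(\lambda)$ must also be zero, since probability densities are non-negative. The second case is when $\mu_A(\lambda)$, $\mu_b(\lambda)$, and $\mu_c(\lambda)$ are zero. Then (8a) implies that $\mu_{\mathbb{1}/2}(\lambda) = \tfrac{1}{2}\mu_a(\lambda)$, and (9a) implies that $\mu_{\mathbb{1}/2}(\lambda) = \tfrac{1}{3}\mu_a(\lambda)$. But the only solution to $\tfrac{1}{2}\mu_a(\lambda) = \tfrac{1}{3}\mu_a(\lambda)$ is $\mu_a(\lambda) = 0$, and we can apply the previous argument to show that all probability distributions must be zero. The six remaining cases are simply relabellings of these two.

As the above argument applies to every $\lambda$, we have that all probability distributions are zero for every $\lambda$, and thus are not probability distributions. ∎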
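The existence claim used in the proof can be checked numerically. The sketch below assumes the six qubit states are the ones from Spekkens' construction (three pure states forming a triangle in a plane of the Bloch sphere, plus their antipodes); the labels and helper names are illustrative, not taken from the text. It checks only the orthogonality and mixing relations, not the contradiction itself.

```python
import math

def projector(ket):
    """Return |ket><ket| for a real, normalised 2-vector, as a nested list."""
    return [[ket[i] * ket[j] for j in range(2)] for i in range(2)]

def mix(weights, states):
    """Convex combination sum_i w_i * rho_i of 2x2 matrices."""
    return [[sum(w * rho[i][j] for w, rho in zip(weights, states))
             for j in range(2)] for i in range(2)]

def overlap(rho, sigma):
    """tr(rho sigma); zero means the two states are orthogonal."""
    return sum(rho[i][j] * sigma[j][i] for i in range(2) for j in range(2))

def is_close(a, b, tol=1e-12):
    """Entrywise comparison of two 2x2 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

s = math.sqrt(3) / 2
# Assumed realisation: lowercase labels form one triangle, uppercase their antipodes.
kets = {"a": (1.0, 0.0), "A": (0.0, 1.0),
        "b": (0.5, s),   "B": (s, -0.5),
        "c": (0.5, -s),  "C": (s, 0.5)}
rho = {label: projector(ket) for label, ket in kets.items()}
maximally_mixed = [[0.5, 0.0], [0.0, 0.5]]

# Orthogonality of each state with its antipode.
antipodes_orthogonal = all(
    abs(overlap(rho[x], rho[x.upper()])) < 1e-12 for x in "abc")
# Equal mixtures of antipodal pairs give the maximally mixed state.
pairs_mix_to_identity = all(
    is_close(mix([0.5, 0.5], [rho[x], rho[x.upper()]]), maximally_mixed)
    for x in "abc")
# Equal mixtures over each triangle give the maximally mixed state too.
triangles_mix_to_identity = all(
    is_close(mix([1 / 3] * 3, [rho[x] for x in labels]), maximally_mixed)
    for labels in ("abc", "ABC"))

print(antipodes_orthogonal, pairs_mix_to_identity, triangles_mix_to_identity)
```

The five different decompositions of the same maximally mixed state are what preparation noncontextuality forces to share a single ontic distribution, which is where the proof gets its leverage.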

6 Gleason theorems

There are three theorems that I call “Gleason theorems”: von Neumann’s theorem [vonneumann32], Gleason’s theorem [gleason57] and Busch’s theorem [busch03]. Of these three, the most famous is certainly Gleason’s (the most infamous being von Neumann’s; Busch’s theorem is still new), and that is why I chose to name this section after it. All three theorems share a similar structure: they postulate some properties that a measurement should have, and then prove that the only measurement that satisfies those properties is the quantum mechanical one, $\operatorname{tr}(\rho\,\cdot)$. They can be interpreted in two ways:

  1. As an axiomatic improvement, by showing that the notion of quantum state and Born’s rule follow from weaker axioms.

  2. As excluding deterministic ontological theories, by saying that properties of should be true in any theory, not only in quantum mechanics. Then one only has to notice that Born’s rule is not deterministic.

If one chooses the first interpretation, all three theorems are perfectly fine, and in fact quite similar. Problems arise, however, if one insists on interpreting them as excluding deterministic ontological theories. Then von Neumann’s theorem becomes foolish [mermin93], as its assumptions already exclude a large class of ontological theories, without good reason. (The hasty reader might wonder why learn a foolish theorem. A quick answer would be to avoid repeating mistakes of the past [alicki08, zukowski09]. For a longer answer, read the section.)

6.1 von Neumann’s theorem

Theorem 15 (von Neumann [vonneumann32]).

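Let $A, B$ be self-adjoint operators, and $f$ a function such that

  1. $f(\alpha A) = \alpha f(A)$ for real $\alpha$.

  2. $f(A+B) = f(A) + f(B)$ for commuting $A, B$.

  3. $f(A+B) = f(A) + f(B)$ for non-commuting $A, B$.

  4. $f(\mathbb{1}) = 1$.

  5. $f(A) \ge 0$ for positive $A$.

Then any such function can be written as

$$f(A) = \operatorname{tr}(\rho A),$$

where $\rho$ is a positive operator of unit trace.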


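Properties 1, 2, and 3 establish that $f$ is a linear functional on the space of self-adjoint operators, and by the Riesz lemma can be represented as an inner product $f(A) = \operatorname{tr}(\rho A)$. Property 4 then implies that $\rho$ has unit trace, as $f(\mathbb{1}) = \operatorname{tr}\rho = 1$, and property 5 implies its positivity, since in particular projectors are positive operators, and $f(|\psi\rangle\langle\psi|) = \langle\psi|\rho|\psi\rangle \ge 0$ for all $|\psi\rangle$ is the definition of positivity. ∎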

We can see, then, that the theorem itself is quite simple, and its value resides in the strength of its assumptions, which we shall examine now. The first thing one may notice is that the theorem already makes use of the Hilbert space formalism for the observables, and the fact that the states also follow the same formalism seems almost like a tautology. But this is not the case. Quantum mechanics can already implement this formalism in experiments in a quite successful manner, and one may regard the observable $A$ as just a proxy for the experiment that implements it; as $f$ can be any function a priori (we don’t even assume it is continuous), there is no limitation in using the self-adjoint operators as its domain. We shall now proceed to examine the physical content of the assumptions.

Assumptions 1 and 2 can be interpreted as doing classical post-processing on the data of a single experiment, the measurement of a PVM $\{\Pi_k\}$, that we define from the eigendecomposition of $A$. The multiplication of $A$ by a constant is implemented just by multiplying its eigenvalues by the same constant. To implement the observable corresponding to the sum of commuting operators $A$ and $B$ one notices that they can be diagonalized simultaneously as $A = \sum_k a_k \Pi_k$ and $B = \sum_k b_k \Pi_k$, and so their sum is just a combination and rescaling of the data coming from the outputs. Assumptions 4 and 5 can be justified by the possibility of interpreting $f(\Pi_k)$ as a probability: probabilities are positive, and some outcome must happen.

The one which is harder to justify is assumption 3, since $A$, $B$, and $A+B$ correspond to different experimental configurations: so the possibility of measuring $A+B$ just by processing the data coming from the PVMs that measure $A$ or $B$ is excluded. Its justification comes from the fact that in quantum mechanics $\langle A+B\rangle = \langle A\rangle + \langle B\rangle$, and our ontological theory must reproduce its results. But this is where von Neumann slips, and to make the slip clearer, it’s best to use the ontological notation, the correspondence being $f(A) \mapsto \xi_A(\lambda)$. So assumption 3 translates to

$$\xi_{A+B}(\lambda) = \xi_A(\lambda) + \xi_B(\lambda),$$

which is clearly overkill, since correspondence with quantum mechanics only requires that

$$\int d\lambda\, \mu(\lambda)\, \xi_{A+B}(\lambda) = \int d\lambda\, \mu(\lambda) \left[\xi_A(\lambda) + \xi_B(\lambda)\right],$$

that is, that the expected values correspond, not the values of the response functions themselves. For instance, in the Bell-Mermin model, discussed in the appendix, we can see that the response function (36) is clearly linear with respect to the sum of commuting observables (note that $A = a_0\mathbb{1} + \vec{a}\cdot\vec{\sigma}$ and $B = b_0\mathbb{1} + \vec{b}\cdot\vec{\sigma}$ commute iff $\vec{a} = \alpha\vec{b}$ for some real $\alpha$),

$$\xi_{A+B}(\lambda) = \xi_A(\lambda) + \xi_B(\lambda) \quad \text{for } [A,B] = 0,$$

since the values that $\xi_A(\lambda)$ assumes are the eigenvalues of $A$, and eigenvalues are linear with respect to the sum of commuting observables. Of course, this is not true when the observables do not commute, as we can see in the following example:

Therefore, we must conclude that this assumption is unfounded, and if no justification can be found for it, we must abandon von Neumann’s prohibition of ontological models. We shall see, however, that even if we abandon this assumption, we can still prove a von Neumann-like theorem, valid in a more restricted context: that is Gleason’s theorem. More surprising, however, is the fact that this assumption can be justified, by the consideration of POVMs. This realisation is what motivated the proof of Busch’s theorem.
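The point about eigenvalues can be made concrete with a small numerical check. The sketch below uses $\sigma_x$ and $\sigma_z$ as the non-commuting pair; this choice is an illustration and not necessarily the example used in the text. Any deterministic value assignment must give each observable one of its eigenvalues, and the eigenvalues of the sum are not sums of eigenvalues of the summands.

```python
import math

def eigenvalues(m):
    """Eigenvalues of a real symmetric 2x2 matrix [[a, b], [b, d]], ascending."""
    (a, b), (_, d) = m
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return (mean - radius, mean + radius)

def add(m, n):
    """Entrywise sum of two 2x2 matrices."""
    return [[m[i][j] + n[i][j] for j in range(2)] for i in range(2)]

sigma_x = [[0.0, 1.0], [1.0, 0.0]]
sigma_z = [[1.0, 0.0], [0.0, -1.0]]

# Non-commuting pair: sigma_x + sigma_z has eigenvalues -sqrt(2), +sqrt(2).
eig_sum = eigenvalues(add(sigma_x, sigma_z))
# Values reachable by adding one eigenvalue of each summand: -2, 0, 2.
possible_sums = {x + z for x in eigenvalues(sigma_x) for z in eigenvalues(sigma_z)}
additive_for_noncommuting = any(
    abs(e - total) < 1e-9 for e in eig_sum for total in possible_sums)

# Commuting pair (both diagonal): eigenvalues add in the shared eigenbasis.
a = sigma_z
b = [[4.0, 0.0], [0.0, -2.0]]
eig_commuting_sum = eigenvalues(add(a, b))
expected = tuple(sorted((a[0][0] + b[0][0], a[1][1] + b[1][1])))

print(eig_sum, sorted(possible_sums), additive_for_noncommuting)
print(eig_commuting_sum, expected)
```

Expectation values remain additive in both cases, since the trace is linear; it is only the pointwise values that fail to add.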

6.2 Gleason’s theorem

Andrew Gleason was not concerned with von Neumann’s theorem, not even with the problem of ontological models for quantum mechanics. His goal was to study the mathematical foundations of quantum mechanics, and to strengthen its axiomatic basis by showing that essentially every measure on a Hilbert space is given by Born’s rule [gleason57]. Its significance to the exclusion of ontological models of quantum mechanics was first noticed by Bell [bell66], who also remarked that contextual ontological models were not bound by Gleason’s theorem.

Theorem 16 (Gleason [gleason57]).

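Let $\mathcal{H}$ be a separable Hilbert space over $\mathbb{C}$ with $\dim\mathcal{H} \ge 3$, and $f$ a function from the projectors on $\mathcal{H}$ to $[0,1]$ such that $\sum_k f(\Pi_k) = 1$ for any PVM $\{\Pi_k\}$. Then any such function can be written as

$$f(\Pi) = \operatorname{tr}(\rho\,\Pi),$$

where $\rho$ is a positive operator of unit trace.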

The proof of this theorem is already well-known, and a bit boring, so we shall omit it. The interested reader may find it in the original work [gleason57], or in the clearer version by Bell [bell66].

It is easy to see that von Neumann’s functions satisfy all the properties of Gleason’s functions, and continue to do so even if we drop his questionable assumption 3, so it is certainly possible to interpret Gleason’s theorem as a “reasonable” von Neumann theorem, with weaker assumptions. Also notice that Gleason’s assumptions are explicitly non-contextual, by assuming that $f(\Pi)$ is only a function of the projector $\Pi$, and not of the whole PVM.

6.3 Busch’s theorem

Paul Busch was concerned with the justification of von Neumann’s assumption 3. He noticed that if one measures a POVM instead of a PVM, then it is possible to have in a single experiment two outcomes $E_1$ and $E_2$ that do not commute (in fact, this happens in all non-trivial POVMs), so it is perfectly natural to demand that $f(E_1 + E_2) = f(E_1) + f(E_2)$, since one can measure $E_1 + E_2$ just by combining the outcomes corresponding to $E_1$ and $E_2$. He then restricted assumption 3 to sums of effects belonging to a single POVM, and was able to derive Born’s rule from it, thus resurrecting von Neumann’s theorem [busch99]. Later he realized that the form of his theorem was actually closer to Gleason’s than von Neumann’s; to obtain it from Gleason’s one only has to demand $\sum_k f(E_k) = 1$ to be true for POVMs, instead of just for PVMs. Interpreted in this way, his theorem is a much stronger version of Gleason’s with a much simpler proof [busch03].
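Busch's observation can be illustrated with a concrete POVM. The sketch below assumes the qubit "trine" POVM (three effects proportional to projectors at 120° from each other in a plane of the Bloch sphere) and an arbitrary test state; both are illustrative choices, not taken from the text. It checks that two effects of the same POVM fail to commute, and that merging them into one outcome is consistent with Born's rule.

```python
import math

def scaled_projector(angle, weight):
    """weight * |psi><psi| for the real qubit state at Bloch angle `angle`."""
    ket = (math.cos(angle / 2), math.sin(angle / 2))
    return [[weight * ket[i] * ket[j] for j in range(2)] for i in range(2)]

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(a):
    return a[0][0] + a[1][1]

# Trine POVM: three effects (2/3)|psi_k><psi_k| that sum to the identity.
e1, e2, e3 = (scaled_projector(2 * math.pi * k / 3, 2 / 3) for k in range(3))
total = add(add(e1, e2), e3)

# The effects e1 and e2 belong to the same POVM but do not commute.
p, q = matmul(e1, e2), matmul(e2, e1)
commutator_size = max(abs(p[i][j] - q[i][j]) for i in range(2) for j in range(2))

# Illustrative test state (positive, unit trace) and Born's rule.
rho = [[0.7, 0.2], [0.2, 0.3]]

def born(effect):
    return trace(matmul(rho, effect))

# Coarse-graining: report "1 or 2" as a single outcome with effect e1 + e2.
coarse = add(e1, e2)
additivity_gap = abs(born(coarse) - (born(e1) + born(e2)))
normalisation_gap = abs(born(coarse) + born(e3) - 1)

print(total, commutator_size, additivity_gap, normalisation_gap)
```

Here additivity holds because Born's rule is assumed from the start; the content of the theorem is the converse, that additivity within each POVM already forces Born's rule.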

The proof presented here mostly follows the one presented in [fuchs02], with the difference that it does not require the domain of to be extended.

Theorem 17 (Busch [busch03]).

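Let $\mathcal{H}$ be a separable Hilbert space over $\mathbb{Q}[i]$ or $\mathbb{C}$ (where $\mathbb{Q}[i]$ is the field extension of the rationals with the imaginary number $i$, $\mathbb{Q}[i] = \{a + bi : a, b \in \mathbb{Q}\}$), and $f$ a function from the effects on $\mathcal{H}$ to $[0,1]$ such that $\sum_k f(E_k) = 1$ for any POVM $\{E_k\}$. Then any such function can be written as

$$f(E) = \operatorname{tr}(\rho E),$$

where $\rho$ is a positive operator of unit trace.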


The proof begins by noticing that $f$ is in fact a linear functional on the space of self-adjoint operators. From that, the Riesz lemma establishes that it can be represented as an inner product. Positivity and normalization of $\rho$ then come from the positivity and normalization of $f$. We shall first prove the case where $\mathcal{H}$ is over the complex rationals, and later extend the proof to the continuum.

First note that if is an effect, is also an effect. Then considering the POVMs and , where , we see that . Considering the particular case , we get that . On the other hand, if we consider and , we get . Combining these two cases, we see that , that is, for whenever both and are effects. Wrapping up, we have that

for rational whenever are effects, so already has some restricted linearity. If we can remove the restriction that are effects, we get full linearity on , and that’s what we’ll do now.

Consider the effects and . Then , and , so . Consider now and such that , but at least one of and is larger than unity, so and are not necessarily effects. Without loss of generality, let . Then , , and are all effects, and by the property we just proved, , so and

for any rational , so we have full linearity on . Let then be a MIC-POVM and, as such, a basis for . Then any effect can be written as for (a moment’s thought will convince you that complex numbers aren’t allowed). We can now define by solving the equations , and see that

Positivity of comes from considering the case where is a one-dimensional projector:

The unity of the trace comes from

This completes the proof for . To extend it to the continuum, note again that if , then , and so . Let then and be sequences of rational numbers tending to the real number such that . We have , and as such , so . From this fact, one can now retrace the proof and see that it also holds for . ∎

The reason that we decided to highlight the fact that Busch’s theorem holds for is that the original Gleason theorem fails for it, hinting that traditional contextuality might have problems dealing with subsets of [meyer99, pitowsky83]. This feature of Busch’s theorem was first noticed in [caves04].

6.4 Wrapping up

Busch’s theorem is clearly superior to von Neumann’s in every way, but this is not true for Gleason’s: they can be interpreted in different ways. Busch’s shows that there can’t be a non-contextual model capable of reproducing quantum mechanics in any dimension, while Gleason’s opens up the possibility of such a model existing in dimension two, if we only care about projective measurements. That such a model exists can be seen by looking at the Bell-Mermin model in the appendix; but if, like Gleason, the reader is not interested in the question of ontological theories, but in which measures are allowed given the Hilbert space structure of observables, the following counterexample (due to Marcelo Terra Cunha and Rafael Rabelo) should suffice:

Note that for this formula is simply Born’s rule.

It is easy to check that

as required in Gleason’s assumptions.

To see that for this formula can’t equal Born’s rule, notice that

only has one root, if considered as a function of the angle , whereas our has roots.

7 The Kochen-Specker theorem

A corollary of the Gleason theorem is that one can’t embed quantum theory in a noncontextual ontological model if $\dim\mathcal{H} \ge 3$, since the Born rule is explicitly noncontextual and non-deterministic; a direct proof of this fact might seem superfluous. But one might not like its assumptions: after all, it already assumes a fair bit of structure that is not quite needed and, more importantly, it needs to assume that the quantum valuation is defined for a continuum of projectors, which of course can never have experimental justification. This was the motivation for Simon Kochen and Ernst Specker to develop a finite proof of contextuality, finding an inconsistency in any deterministic assignment of values to a set of experiments realizable in quantum mechanics [kochen67]. (The motivation can come from Gleason’s theorem, or from a 1960 work of Specker [specker60, seevinck11], that was independent of Gleason and also contained a “continuous” proof of contextuality.) Another motivation to present it here is that it proves the claim in section 2.2 that noncontextual deterministic ontological models can not describe two-outcome PVMs.

In modern parlance, the Kochen-Specker theorem is referred to as a proof of state-independent contextuality, as the logical contradiction found depends only on the structure of quantum observables, and not on the statistics from the measurement of specific states. This situation contrasts, of course, with proofs of state-dependent contextuality, which we shall explore mainly in the next chapter.

More specifically, their proof says that we can’t attribute deterministic values to a set of projectors in dimension three respecting the quantum mechanical observation that in the measurement of a PVM one answer (and only one answer) always occurs. An elegant way to proceed with the proof is to represent this set of projectors in an orthogonality graph (where each vertex corresponds to a projector, and two vertices are connected iff the corresponding projectors are orthogonal), and map the quantum mechanical observation into two rules for colouring the graph:

  1. Two connected vertices can’t both have the value $1$ – If two projectors $\Pi_i$ and $\Pi_j$ are orthogonal, they can be measured simultaneously, and therefore $v(\Pi_i)$ and $v(\Pi_j)$ can’t both equal $1$.

  2. In a loop of three connected vertices, one of them must have the value $1$ – If three projectors are mutually orthogonal, they form a PVM, and in a PVM one answer (and only one answer) always occurs.

The proof concludes by showing that no such colouring of the graph can exist, and therefore one can’t attribute deterministic values to this set of projectors. We shall, however, omit it. Even though it is quite beautiful, the proof is mainly of historical interest, as simpler proofs have since been found. We refer the interested reader to the original paper, or the excellent exposition of it by Cabello [cabello96].

7.1 An 18-projector proof by Cabello, Estebaranz, and García-Alcaine

The simplest (with fewest projectors) such no-colouring proof that we currently know was found in 1996 by Cabello, Estebaranz, and García-Alcaine [cabello96b]; we do know that in dimensions 3 and 4 there are no no-colouring proofs with 17 projectors or fewer [cabello06, arends11]. In contrast with Kochen-Specker’s 117 projectors, it needs only 18 to generate a contradiction. These projectors are represented in figure 2, where is just a shorthand notation for the projector onto . This figure does not represent an orthogonality graph, which would be quite cumbersome, but an orthogonality hypergraph, where sets of four commuting projectors are connected by edges of the same colour.

Figure 2: Vectors for the 18-projector proof of the Kochen-Specker theorem. Reproduced from [cabello08] with permission from the author.

One could in fact proceed to prove directly that it is non-colourable (there are few non-equivalent potential colourings), but it is more elegant to use a parity argument: we know that in each of the 9 contexts we must have exactly one answer $1$, so the sum over all answers in all contexts must be $9$. But if we do this sum projector by projector, we see that each projector appears in exactly two contexts, and likewise each answer appears twice, so the sum over them must be an even number, a contradiction.
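Both routes (direct search and parity) can be checked by machine. The sketch below assumes a specific list of nine orthogonal bases in dimension 4, written from memory of the 18-vector construction; the exact vectors and their ordering are an assumption and may differ from figure 2 by a relabelling. The code verifies the properties the argument relies on (orthogonality within each context, each vector in exactly two contexts) before searching for a colouring.

```python
from itertools import product

# Nine orthogonal bases of R^4 (unnormalised vectors). Assumed realisation of
# the 18-vector set; each vector is meant to appear in exactly two bases.
contexts = [
    [(0, 0, 0, 1), (0, 0, 1, 0), (1, 1, 0, 0), (1, -1, 0, 0)],
    [(0, 0, 0, 1), (0, 1, 0, 0), (1, 0, 1, 0), (1, 0, -1, 0)],
    [(1, -1, 1, -1), (1, -1, -1, 1), (1, 1, 0, 0), (0, 0, 1, 1)],
    [(1, -1, 1, -1), (1, 1, 1, 1), (1, 0, -1, 0), (0, 1, 0, -1)],
    [(0, 0, 1, 0), (0, 1, 0, 0), (1, 0, 0, 1), (1, 0, 0, -1)],
    [(1, -1, -1, 1), (1, 1, 1, 1), (1, 0, 0, -1), (0, 1, -1, 0)],
    [(1, 1, -1, 1), (1, 1, 1, -1), (1, -1, 0, 0), (0, 0, 1, 1)],
    [(1, 1, -1, 1), (-1, 1, 1, 1), (1, 0, 1, 0), (0, 1, 0, -1)],
    [(1, 1, 1, -1), (-1, 1, 1, 1), (1, 0, 0, 1), (0, 1, -1, 0)],
]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

vectors = sorted({v for context in contexts for v in context})

# Sanity checks on the assumed data: every context is an orthogonal basis,
# and every vector belongs to exactly two contexts.
each_context_is_orthogonal = all(
    dot(u, v) == 0
    for context in contexts
    for i, u in enumerate(context)
    for v in context[i + 1:]
)
appearances = {v: sum(v in context for context in contexts) for v in vectors}

# Parity argument: 9 contexts, one "1" each, gives an odd total, while every
# value is counted twice, which would give an even total.
required_total = len(contexts)

# Brute force over all 2^18 assignments: count those with exactly one "1"
# in every context.
index = {v: i for i, v in enumerate(vectors)}
context_indices = [[index[v] for v in context] for context in contexts]
valid_colourings = sum(
    1
    for values in product((0, 1), repeat=len(vectors))
    if all(sum(values[i] for i in ctx) == 1 for ctx in context_indices)
)

print(len(vectors), each_context_is_orthogonal,
      set(appearances.values()), required_total % 2, valid_colourings)
```

The exhaustive search takes a moment in pure Python but needs no insight, whereas the parity count gives the same verdict by inspection.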

7.2 A 13-projector proof by Yu and Oh

Shockingly, more recently it has been found that a non-colourable graph is not necessary to prove state-independent contextuality. Yu and Oh [yu12] have found such a proof in dimension 3 based on a set of 13 projectors that does have a colouring that obeys rules 1 and 2. They argue that every possible colouring of their graph contradicts another prediction of quantum theory. The orthogonality graph is represented in figure 3, and its quantum realization is given by the vectors

where is just a shorthand notation for the projector onto . It is important for the proof that this is actually the unique quantum realization of the orthogonality graph up to a global unitary transformation, which is trivial to prove.

Figure 3: Orthogonality graph for the proof of Yu and Oh. Reproduced from [cabello12] with permission from the authors.

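To obtain the contradiction with quantum mechanics, first note that no two $h_k$ can be assigned $1$ simultaneously. We shall prove this by contradiction. By the symmetry of the graph, there are only two cases:

  1. Assume that $v(h_0) = v(h_1) = 1$. Then by the KS rules we must assign $0$ to $y_2^\pm$ and $y_3^\pm$, which obliges us to assign $1$ to $z_2$ and $z_3$, a contradiction.

  2. Assume that $v(h_1) = v(h_2) = 1$. Then by the KS rules we must assign $0$ to $y_1^\pm$ and $y_2^\pm$, which obliges us to assign $1$ to $z_1$ and $z_2$, a contradiction.

This implies that $\sum_{k=0}^{3} v(h_k) \le 1$, and furthermore that

$$\sum_{k=0}^{3} \langle v(h_k) \rangle \le 1.$$

But the lhs must be equal to the quantum expectation value $\sum_{k=0}^{3} \langle h_k \rangle$; since

$$\sum_{k=0}^{3} h_k = \frac{4}{3}\mathbb{1},$$

we get that $\frac{4}{3} \le 1$ for any state, a contradiction.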
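The two ingredients of this argument can be verified by enumeration. The sketch below assumes the usual 13 vectors of the Yu-Oh construction with labels `z`, `y`, `h`; the labels and the explicit vectors are an assumption, since the realization is not reproduced above. It builds the orthogonality graph from the vectors, enumerates every colouring that obeys rules 1 and 2, and compares the largest attainable sum of the `h` values with the quantum value.

```python
from fractions import Fraction
from itertools import combinations, product

# Assumed realisation of the 13 rays (unnormalised), with illustrative labels.
vectors = {
    "z1": (1, 0, 0), "z2": (0, 1, 0), "z3": (0, 0, 1),
    "y1+": (0, 1, 1), "y1-": (0, 1, -1),
    "y2+": (1, 0, 1), "y2-": (1, 0, -1),
    "y3+": (1, 1, 0), "y3-": (1, -1, 0),
    "h0": (1, 1, 1), "h1": (-1, 1, 1), "h2": (1, -1, 1), "h3": (1, 1, -1),
}
names = list(vectors)
h_names = ["h0", "h1", "h2", "h3"]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Orthogonality graph: edges join orthogonal rays, triangles are full PVMs.
edges = [(u, v) for u, v in combinations(names, 2)
         if dot(vectors[u], vectors[v]) == 0]
edge_set = set(edges)
triangles = [t for t in combinations(names, 3)
             if all(pair in edge_set for pair in combinations(t, 2))]

def obeys_ks_rules(value):
    """Rule 1: no edge with two 1s. Rule 2: exactly one 1 per triangle."""
    no_two_ones = all(value[u] + value[v] <= 1 for u, v in edges)
    one_per_triangle = all(sum(value[n] for n in t) == 1 for t in triangles)
    return no_two_ones and one_per_triangle

colourings = [value
              for bits in product((0, 1), repeat=len(names))
              for value in [dict(zip(names, bits))]
              if obeys_ks_rules(value)]
max_h_sum = max(sum(value[h] for h in h_names) for value in colourings)

# Quantum side: the four normalised projectors onto the h rays sum to (4/3) * identity.
h_projector_sum = [[sum(Fraction(vectors[h][i] * vectors[h][j], 3) for h in h_names)
                    for j in range(3)] for i in range(3)]

print(len(edges), len(triangles), len(colourings) > 0, max_h_sum, h_projector_sum[0][0])
```

Unlike the 18-projector set, this one does admit colourings; the contradiction comes from the gap between the noncontextual bound of 1 and the state-independent quantum value of 4/3.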

7.3 A 9-observable proof by Peres and Mermin

Last but not least, we’d like to present the beautiful proof of the Kochen-Specker theorem done in 1990 by Asher Peres and David Mermin [peres90, mermin90], the Peres-Mermin square. It uses 9 four-dimensional observables, so in some sense it is larger than the previous two proofs, and also older; but it is also quite elegant, and so it might seem smaller to the human mind.



be the Peres-Mermin square, where $\sigma_x$, $\sigma_y$, and $\sigma_z$ are Pauli matrices. Note that observables that lie in the same line or column always commute, so they are simultaneously measurable, and we should be justified in assigning them a predefined value $\pm 1$. But also note that the product of the observables in each line or column is always plus or minus identity, a relation that our predefined values should also respect. More specifically, this reasoning leads us to the relations