What Quantum Measurements Measure



A solution to the second measurement problem, determining what prior microscopic properties can be inferred from measurement outcomes (“pointer positions”), is worked out for projective and generalized (POVM) measurements, using consistent histories. The result supports the idea that equipment properly designed and calibrated reveals the properties it was designed to measure. Applications include Einstein’s hemisphere and Wheeler’s delayed choice paradoxes, and a method for analyzing weak measurements without recourse to weak values. Quantum measurements are noncontextual in the original sense employed by Bell and Mermin: if $[A,B]=[A,C]=0$ but $[B,C]\neq 0$, the outcome of an $A$ measurement does not depend on whether it is measured with $B$ or with $C$. An application to Bohm’s model of the Einstein-Podolsky-Rosen situation suggests that a faulty understanding of quantum measurements is at the root of this paradox.

I Introduction

I A The Second Measurement Problem

The measurement problem is a central issue in quantum foundations, because textbook quantum mechanics uses the idea of a measurement to give a physical interpretation to probabilities generated from a quantum wavefunction, but never explains the measurement process itself in terms of more fundamental quantum principles. If, as is widely believed, quantum mechanics applies to macroscopic as well as microscopic phenomena, then it should be possible, at least in principle, to describe actual laboratory measurements in terms of basic quantum properties and processes, rather than employing “measurement” as an unanalyzed primitive.

It is convenient to divide the measurement problem into two parts. The first measurement problem, which is at the center of most discussions in the literature, is to understand how the measurement process can result in a well-defined macroscopic outcome or pointer position, to use the archaic but picturesque language of the foundations community, rather than some strange quantum superposition of the pointer in different positions, as results in many cases from a straightforward application of unitary time development: Schrödinger’s equation leads to Schrödinger’s cat. But even if the first measurement problem is solved, so the pointer comes to rest at a single position, the second measurement problem remains: what can one infer from the pointer position regarding the microscopic situation that existed before the measurement took place, which the apparatus was designed to measure? Experimental physicists talk all the time about gamma rays triggering a detector, neutrinos arriving from the sun, and other microscopic objects or events which are invisible, and whose existence can only be inferred from the macroscopic outcomes of suitable measurements. Should we take this talk seriously? Maybe we do, but why, if the second measurement problem remains unresolved? Would we have any confidence in the stories told us by cosmologists if they did not understand the operation of their telescopes well enough to interpret the data these instruments provide?

A recent (and at the time of writing continuing) controversy [1, 2] about the path followed by a photon passing through an interferometer on its way to a detector shows how difficult it is to analyze, using the tools of textbook quantum theory, with perhaps some additional ad hoc principles, a microscopic situation that is really not very complicated. This problem is, in turn, related to a hotly contested claim, published in a reputable journal, that information can be sent between two parties by means of a photon that is actually never—or at least hardly ever—present in the optical fiber that connects them [3, 4, 5]. What this suggests is that the failure of quantum physicists to solve the measurement problem(s) is not only an intellectual embarrassment—surely it is that, as pointed out by some leading physicists (see [6] and Sec. 3.7 of [7])—but also a serious impediment to ongoing research in areas such as quantum information, where understanding microscopic quantum properties and how they depend on time is central to the enterprise. In addition, a fuzzy understanding of quantum principles makes the subject hard to teach as well as to learn. Students confused by unfamiliar mathematics are not helped by the absence of a clear physical interpretation of what the mathematics means, something which neither textbooks nor instructors seem able to provide.

In this paper the second (and, incidentally, the first) measurement problem is addressed using the consistent histories, also known as decoherent histories, interpretation of quantum mechanics. While this approach is controversial (as is everything else in quantum foundations) it possesses specific principles and clear rules for applying and interpreting quantum theory at the microscopic level. These principles are comparatively few in number, include no reference to measurements, and apply universally to all quantum processes, whether microscopic or macroscopic, “from the quarks to the quasars.” They are, so far as is known at present, consistent in the sense that when properly applied they do not lead to contradictions, and they have resolved (perhaps “tamed” would be a better term) various quantum paradoxes; see Chs. 21 to 25 of [8] for a number of examples.

I B Article Overview

The remainder of the paper is structured as follows. Section II explores the second measurement problem from a phenomenological perspective using two paradoxes, the first by Einstein and the second by Wheeler, that show why the problem is both difficult and confusing. Section II C is a brief discussion of how a measurement apparatus can be calibrated to ensure its reliability. Next a brief summary of the consistent histories approach, along with references to literature that provides further details, constitutes Section III; readers already familiar with consistent histories ideas can skip it.

Section IV is the heart of the paper, and contains the key ideas needed to address the second measurement problem both for projective measurements, Sec. IV A, and for generalized measurements (positive operator-valued measures, or POVMs), Sec. IV B. The emphasis is on simple cases of single measurements; situations where there are several successive measurements on the same system are not discussed, though the histories methodology can also be extended to such situations. A useful conceptual tool, which so far as we know has not been pointed out previously, is the backwards map from output (pointer) states to earlier microscopic properties. It is very helpful in identifying the microscopic properties which have been measured in the case of a generalized measurement. A separate Sec. IV C discusses nondestructive measurements and preparations, both closely related to von Neumann’s measurement model. This may assist the reader in connecting the approach followed in this paper to ideas, such as wavefunction collapse, frequently encountered in textbook treatments and quantum foundations literature. The final Sec. IV D has a few comments about density operators.

Next in Sec. V the tools developed in Sec. IV are applied to six different situations, where the first two, Secs. V A and V B, are closely related to the examples discussed earlier in Sec. II. The third, Sec. V C, is an elementary but not entirely trivial example of a POVM that is not a projective measurement. A fairly elementary, but again nontrivial, example in Sec. V D shows how a weak measurement can be interpreted in terms of quantum properties instead of the widely used “weak values.” The last two applications address topics which often come up in the quantum foundations literature, and are hence somewhat controversial. It is argued in Sec. V E, using a less formal and more physical approach than [9], that if one uses Bell’s original definition of “contextual,” quantum mechanics is in fact noncontextual, despite confusing claims to the contrary. Finally, the Bohm (spin singlet) model of the famous Einstein-Podolsky-Rosen paradox is discussed in Sec. V F from the perspective of what one can infer from measurements on one of the spin-half particles about its prior properties and those of the other spin-half particle.

The final Sec. VI is an attempt to summarize the most important conclusions about what it is that quantum measurements measure, while summarizing the principles which make it possible for the consistent histories interpretation to arrive at a satisfactory resolution of the second (as well as the first) measurement problem.

I C Notation and Acronyms

In addition to standard Dirac notation note the following:
 The symbol $\odot$ is a tensor product symbol equivalent to the usual $\otimes$, but used in a quantum history to separate an earlier event to the left of $\odot$ from a later event to its right.
 The curly brackets in $\{P,Q\}\odot R$ indicate two histories: $P\odot R$ and $Q\odot R$.
 $[\psi]=|\psi\rangle\langle\psi|$ when $|\psi\rangle$ is a normalized state. Square brackets may be omitted if the meaning is clear: $x^+\odot z^-$ in place of $[x^+]\odot[z^-]$.
 Superscripts are used as labels and not as exponents on projectors (where an exponent is never needed) and sometimes on other symbols, in order to reserve the subscript position to label the system or the time. Thus $S^x_a$ refers to the $x$ component of spin of system $a$, and $z^+_1$ refers to $[z^+]$ at time $t_1$.

The following acronyms (to be precise, initialisms) are placed here for ready reference:
 EPR = Einstein-Podolsky-Rosen
 PDI = projective decomposition of the identity. See (6) in Sec. III B
 POVM = positive operator-valued measure. See Sec. IV B

II Measurement Phenomenology

II A Einstein’s Paradox

Figure 1: Einstein paradox. (a) Spherical wave; (b) Particle moving on straight line through collimator; (c) Quantum wavepacket passing through collimator.
Figure 2: (a) Collimator with two holes. (b) Fluorescent screen a large distance to the right of the collimator. Due to constructive interference of waves coming from the two holes a particle can sometimes be observed in a region which is classically forbidden.

Figure 1(a) shows Einstein’s paradox (pp. 115-117 in [10], pp. 440-442 in [11]). A particle emerges from a small hole at the left and propagates as a spherical wave towards a curved fluorescent screen where its arrival is signalled by a flash of light at a particular point on the screen, a point which varies randomly on successive repetitions of the experiment. It seems as if the quantum wave collapses instantly when the particle reaches the screen, a result which bothered Einstein, since it would imply a superluminal effect if every point on the screen is equidistant from the hole. An experimental physicist, on the other hand, might say that the particle travels on a straight line from the source to the screen, and could support that explanation by placing a collimator, a thick plate with a hole in it, between the source and the screen, and noting that now flashes are detected only at places on the screen which are connected to the source by a straight line passing through the hole, Fig. 1(b).

But isn’t this second perspective classical, not quantum mechanical? No, for there is a good quantum mechanical description in which the particle is a small wave packet traveling from the source to the screen, Fig. 1(c); one only has to assume that the particle emerging from the source is described by such a wave packet whose initial direction of propagation is random from one run to the next. (And this gets around another problem with wavefunction collapse. If the particle reaches the screen, does this mean that its failing to interact with the collimator has collapsed the spherical wave enough so that it can fit through the hole?)

Continuing on, if the collimator has two holes, Fig. 2(a), one will observe flashes on the screen due to particles which have passed through one hole or the other, but never simultaneous flashes behind both holes. Again, easy to understand using the picture of little wave packets. But consider the situation in Fig. 2(b) where, if the two holes are formed very carefully and the fluorescent screen placed a long distance away, the result will be an interference pattern with the distance between fringes determined by, among other things, the distance between the two holes and the de Broglie wavelength of the quantum particle. The particle must, in this case, be thought of as a wave passing simultaneously through both holes and emerging behind them with a well-defined phase. We have arrived at the double slit or two hole paradox so well described by Feynman [12].

Everyone knows that quantum particles are waves, and quantum waves are particles. The gedanken experiments just discussed, especially the contrast between Fig. 2(a) and (b), illustrate the fact that sometimes a particle (fairly well localized wavepacket) and sometimes a wave (coherence in phase over a macroscopic distance) description is needed in order to understand what is going on. The need to use different, and seemingly incompatible, descriptions is one of the fundamental difficulties behind the second measurement problem. One aim of the present article is to show how it can be addressed without invoking retrocausation: a future measurement influencing past behavior.

II B Mach-Zehnder with Removable Beamsplitter

Figure 3: Mach-Zehnder interferometer (a) with a source $S$, two beamsplitters $B_1$ and $B_2$, and detectors $C$ and $D$; (b) with the second beamsplitter $B_2$ removed.

Einstein’s paradox becomes easier to analyze if we consider the case of a Mach-Zehnder interferometer, Fig. 3(a), with an upper and lower arm connecting two beam splitters $B_1$ and $B_2$, and the phases adjusted so that a photon (hereafter referred to as a ‘particle’) from the source $S$ on the left is always detected by the lower detector $D$ on the right. That the particle is, in some sense at least, in both the upper and the lower arm while inside the interferometer can be checked by inserting two phase shifters, one in each arm. One then finds that, depending on the choice of phases, the particle will sometimes be detected in $C$ and sometimes in $D$. However, if both phases are identical, the particle will always be detected in $D$. Additional checks can be made by blocking either the upper arm or the lower arm, and noting that when one arm is blocked the particle will sometimes arrive in $C$ and sometimes in $D$.

If, on the other hand, the second beam splitter $B_2$ is absent, Fig. 3(b), the experimentalist will say that a particle detected in $D$ was originally in the upper arm of the interferometer, and if detected in $C$ it was in the lower arm, as these are the direct paths from the first beam splitter to the detectors. This can be checked by placing barriers in the upper or lower arms of the interferometer and noting that a barrier in the upper arm will prevent the particle arriving at $D$, and one in the lower arm suppresses counts in $C$. Similarly, if a nondestructive measuring device, something which will register the particle’s presence without seriously perturbing its motion, is placed in one of the arms, its outcome will show the expected correlation with the final detectors.

Wheeler’s delayed choice paradox [13] comes from asking what will occur if, just before the particle arrives at $B_2$, when it has already passed $B_1$ and is inside the interferometer, the second beamsplitter is removed. Alternatively, suppose that the second beamsplitter is absent while the particle is traversing the interferometer, but is suddenly inserted just before the particle arrives at the crossing point. One can imagine either of these experiments repeated many times, and the result will be that the presence or absence of $B_2$ at the crossing point at the instant the particle arrives there determines whether the particle is always detected in $D$ or randomly detected in $C$ and $D$. And experimental checks can be carried out with phase shifters or barriers placed on the paths inside the interferometer. The paradox is perhaps most telling if one starts off with a series of runs in which $B_2$ is absent, and the particle arrives randomly in $C$ or $D$, so about half the time it is detected in $C$, and hence, plausibly, it has been following the lower path through the interferometer. Now undertake a series of runs in which $B_2$ is initially absent, but is inserted in its proper place at the very last moment. In all of these runs the particle is detected by $D$. But in roughly half of these cases, assuming there is no retrocausal effect from the later insertion of $B_2$, the particle must have been traveling through the lower arm, and were it traveling through the lower arm it would, upon passing through $B_2$, arrive with equal probability in either of the detectors. Thus it might seem that sometimes the particle when traveling through the lower arm of the interferometer senses that at a future moment $B_2$ will be present and decides to split itself into a pair of wavepackets, one in each arm, with an appropriate phase, so that it will arrive with certainty at $D$. That seems very strange. Is there not some other way of understanding what is going on without invoking magic or retrocausation?

Figure 4: Mach-Zehnder interferometer with two inputs (a) arranged to determine relative phase between the two arms; (b) arranged to measure which path (which arm).

Adding a second source to Wheeler’s paradox, Fig. 4, makes it somewhat analogous to our previous discussion of Einstein’s paradox. In any given run, only one source emits a photon, and the phases have been chosen so that with the second beamsplitter present a particle (photon) which originates in source $S$ will later arrive in $D$, and one emitted by $S'$ will arrive at $C$. In both cases the particle while inside the interferometer is a superposition of a state $|u\rangle$ in the upper arm and a state $|\ell\rangle$ in the lower arm; in particular let us assume the phases are such that

$$|\psi_S\rangle = \bigl(|u\rangle + |\ell\rangle\bigr)/\sqrt{2}, \qquad |\psi_{S'}\rangle = \bigl(|u\rangle - |\ell\rangle\bigr)/\sqrt{2}. \tag{1}$$

One can then regard the second beam splitter $B_2$ and the two detectors as forming a single measurement apparatus that measures “which phase?” (the difference between the two possible relative phases, $+$ vs. $-$ in (1)) when $B_2$ is in place, or “which arm?” if $B_2$ has been removed. Note the analogy with the situation depicted in Fig. 2 (with (a) and (b) interchanged). The fact that in any particular run the experimenter, by leaving $B_2$ in place or removing it, can measure which phase or which path, but cannot determine both, is a fundamental fact of quantum mechanics. Taking it into account is essential if one is to make progress in resolving the second measurement problem.
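The numerical sketch below may help make the “which phase” versus “which arm” distinction concrete. Its conventions are ours and are not fixed by the figures. Arm amplitudes are ordered as (upper $u$, lower $\ell$). With $B_2$ in place, $|u\rangle \to (|C\rangle+|D\rangle)/\sqrt2$ and $|\ell\rangle \to (|D\rangle-|C\rangle)/\sqrt2$. With $B_2$ removed, the arms are assumed to cross, so that $|u\rangle\to|D\rangle$ and $|\ell\rangle\to|C\rangle$. The helper names are hypothetical.

```python
import math

R = 1 / math.sqrt(2)

# Assumed conventions: arm amplitudes ordered (upper, lower); outputs ordered (C, D).
# With B_2 present: C <- (u - l)/sqrt2, D <- (u + l)/sqrt2 (a unitary 50/50 splitter).
WITH_B2 = [[R, -R],
           [R,  R]]
# With B_2 removed the arms are assumed to cross: u -> D, l -> C.
WITHOUT_B2 = [[0, 1],
              [1, 0]]

def apply(M, v):
    """Matrix-vector product on nested lists."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

def detector_probs(M, arms):
    """Born-rule probabilities (P_C, P_D) for the given arm amplitudes."""
    return tuple(abs(a) ** 2 for a in apply(M, arms))

from_S = [R, R]          # (|u> + |l>)/sqrt2, source S in Eq. (1)
from_S_prime = [R, -R]   # (|u> - |l>)/sqrt2, source S'

for name, arms in [("S", from_S), ("S'", from_S_prime)]:
    pc, pd = detector_probs(WITH_B2, arms)
    qc, qd = detector_probs(WITHOUT_B2, arms)
    print(f"{name}: with B2 P(C)={pc:.2f} P(D)={pd:.2f}; "
          f"without B2 P(C)={qc:.2f} P(D)={qd:.2f}")
```

With $B_2$ in place the outcome identifies the source, and hence the sign in (1), with certainty. Without $B_2$ each source gives $C$ and $D$ with probability 1/2. The outcome then says which arm was occupied but carries no information about the relative phase.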

II C Calibration

Competent experimenters check their apparatus in various ways to make sure it is operating as designed and gives reliable results. There is a variety of tests, some suggested earlier: placing collimators in various places, removing beam splitters from a Mach-Zehnder interferometer or placing absorbers in its arms, etc. If the apparatus is designed to measure the value of some quantity (observable) $A$ associated with a particle, the simplest form of calibration means preparing many particles with known values of $A$, each thus having the property corresponding to some particular eigenvalue, and checking whether the measurement outcome (pointer position) corresponds in each case to the known property. Once the calibration has been carried out the experimenter can be confident that when a particle of this type is measured by the apparatus, the outcome will indicate the value of $A$ possessed by the particle just before it reached the apparatus, even when the particle’s prior history is unknown. Experimenters frequently make assumptions of this kind, and without them a significant part of experimental physics would be impossible. A proper quantum mechanical theory of measurement must be able to justify this practice. In reality things are not always so simple, since apparatus is never perfect and one may have to account for possible errors; however, for the present discussion we shall focus on the ideal case in order to get to the essentials of quantum measurements.

III Properties, Probabilities and Histories

This section contains a rapid review of material found elsewhere; readers familiar with consistent histories can skip ahead to Sec. IV. See [14] for an introduction to consistent histories, [8] for a detailed treatment, and [15] for extended comments on some conceptual difficulties.

III A Quantum Properties

We use the term physical property for something like “the energy is less than 2 Joules” or “the particle is in a region in space,” something which can be true or false, and thus distinct from a physical variable such as the energy or position, represented by a real number in suitable units. Von Neumann, Sec. III.5 of [16], proposed that a quantum property should correspond to a subspace of the quantum Hilbert space, or, equivalently, the projector (orthogonal projection operator) onto this subspace. (We are only concerned here with finite-dimensional Hilbert spaces for which all subspaces are closed.) What one finds in textbooks is consistent with von Neumann’s prescription, though this is not always clearly stated.

A projector, a Hermitian operator equal to its square, is the quantum analog of an indicator function $P(\gamma)$ on a classical phase space $\Gamma$, a function that takes the value 1 if at the point $\gamma$ the corresponding physical property is true, or 0 if it is false. For example, the property that the energy of a harmonic oscillator is less than 2 Joules corresponds to an indicator equal to 1 for $\gamma=(x,p)$ inside, and 0 for $\gamma$ outside, an ellipse centered at the origin of the phase plane. A quantum projector’s eigenvalues are 0 or 1, which supports the analogy with a classical indicator. One can make a plausible case that any “classical” property of a macroscopic physical object, when viewed in quantum terms, is represented by a quantum projector on a very high-dimensional subspace of an enormous Hilbert space.

The smallest nontrivial quantum subspace is one-dimensional, consisting of all complex multiples of a normalized ket $|\psi\rangle$, and the projector is given by the corresponding Dirac dyad

$$[\psi] = |\psi\rangle\langle\psi|. \tag{2}$$

We will often make use of this convenient square bracket notation. A projector on a two-dimensional subspace can be written in the form $[\psi]+[\phi]$, where $|\psi\rangle$ and $|\phi\rangle$ form an orthonormal basis for the subspace, and similarly for larger subspaces.
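As a quick check of these definitions, the Python sketch below builds $[\psi]$ from a normalized ket and verifies that it is Hermitian and equal to its square. It does the same for a sum $[e_1]+[e_2]$ of dyads of orthonormal kets. It uses plain lists and no external libraries; the function names and the particular kets are illustrative choices.

```python
import math

def dyad(psi):
    """Matrix of [psi] = |psi><psi| for a normalized ket given as a list."""
    return [[a * b.conjugate() for b in psi] for a in psi]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def is_projector(P, tol=1e-12):
    """True if P is Hermitian and idempotent (P^2 = P)."""
    n = len(P)
    hermitian = all(abs(P[i][j] - P[j][i].conjugate()) < tol
                    for i in range(n) for j in range(n))
    P2 = matmul(P, P)
    idempotent = all(abs(P2[i][j] - P[i][j]) < tol for i in range(n) for j in range(n))
    return hermitian and idempotent

r = 1 / math.sqrt(2)
psi = [r, 1j * r]                 # an arbitrary normalized ket in C^2
print(is_projector(dyad(psi)))    # True

# Two-dimensional subspace of C^3 spanned by orthonormal |e1>, |e2>: [e1] + [e2]
e1, e2 = [1, 0, 0], [0, 1, 0]
P12 = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(dyad(e1), dyad(e2))]
print(is_projector(P12))          # True
```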

The analogy between quantum projectors and classical indicators also works for negation. The projector corresponding to the property “NOT $P$” is $I-P$, where $I$ is the identity operator, and the same holds for a classical indicator when $I$ is understood as the function taking the value 1 everywhere on the phase space. Given two indicator functions $P(\gamma)$ and $Q(\gamma)$ representing properties $P$ and $Q$, their product, which is obviously the same written in either order, $P(\gamma)Q(\gamma)=Q(\gamma)P(\gamma)$, is the indicator for the property $P$ AND $Q$. (Think of “energy less than one Joule” AND “momentum is positive”.) But in the quantum world the product of two projectors $P$ and $Q$ is itself a projector if and only if they commute, $PQ=QP$, and in this case the product $PQ$ can be associated with the property $P$ AND $Q$.

But suppose that $P$ and $Q$ do not commute, what then? Consider a specific example, that of a spin-half particle, where the Hilbert space is two-dimensional, and spanned by two orthonormal kets $|z^+\rangle$ and $|z^-\rangle$, eigenvectors of $S_z$, the $z$ component of spin angular momentum, with eigenvalues $+1/2$ and $-1/2$ in units of $\hbar$. The projectors

$$[z^+] = |z^+\rangle\langle z^+|, \qquad [z^-] = |z^-\rangle\langle z^-|, \tag{3}$$

in the notation used in (2), represent these two physical properties, $S_z=+1/2$ and $S_z=-1/2$; they commute and their product is 0. Similarly,

$$|x^\pm\rangle = \bigl(|z^+\rangle \pm |z^-\rangle\bigr)/\sqrt{2} \tag{4}$$

are eigenvectors corresponding to the eigenvalues $+1/2$ and $-1/2$ of the $x$ component of spin angular momentum $S_x$. The corresponding projectors

$$[x^+] = |x^+\rangle\langle x^+|, \qquad [x^-] = |x^-\rangle\langle x^-| \tag{5}$$

commute, and their product is zero. However, neither $[x^+]$ nor $[x^-]$ commutes with either $[z^+]$ or $[z^-]$. Because the projectors do not commute there is, in the consistent histories approach, no way to make sense of a statement like “$S_z=+1/2$ AND $S_x=+1/2$.” And there is no nontrivial subspace of the Hilbert space which can be associated with such a combination. (In quantum logic [17, 18] one would associate the trivial subspace containing only the 0 ket with such a conjunction, but quantum logic has its own set of conceptual difficulties; see [15].) This is an instance of the single framework rule discussed in more detail in Sec. III C.
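The noncommutation can be checked directly in the $S_z$ basis, using the kets (4). The sketch below uses illustrative helper names and pure Python. It confirms that $[z^+]$ and $[z^-]$ commute with zero product, and that $[z^+]$ and $[x^+]$ do not commute. It also shows that the product $[z^+][x^+]$ is not Hermitian, so it is not a projector that could represent “$S_z=+1/2$ AND $S_x=+1/2$.”

```python
import math

r = 1 / math.sqrt(2)
z_plus, z_minus = [1, 0], [0, 1]     # S_z eigenkets
x_plus, x_minus = [r, r], [r, -r]    # Eq. (4), written in the S_z basis

def dyad(psi):
    """[psi] = |psi><psi| as a nested list."""
    return [[a * b.conjugate() for b in psi] for a in psi]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def is_zero(M, tol=1e-12):
    return all(abs(x) < tol for row in M for x in row)

def commutator(A, B):
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(len(AB[0]))] for i in range(len(AB))]

def is_hermitian(M, tol=1e-12):
    n = len(M)
    return all(abs(M[i][j] - M[j][i].conjugate()) < tol for i in range(n) for j in range(n))

Zp, Zm, Xp, Xm = (dyad(v) for v in (z_plus, z_minus, x_plus, x_minus))
print(is_zero(commutator(Zp, Zm)), is_zero(matmul(Zp, Zm)))      # True True
print(is_zero(commutator(Zp, Xp)), is_zero(commutator(Zm, Xm)))  # False False
print(is_hermitian(matmul(Zp, Xp)))                              # False
```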

From time to time the claim has been made that the consistent histories approach is logically inconsistent. However, none of these claims when scrutinized has turned out to be correct. What typically happens is that the author has either overlooked the single framework rule or has not taken it seriously. Arguments that show that consistent histories is internally consistent will be found in Ch. 16 of [8], Sec. 4.1 of [15], and Sec. 8.1 of [19].

III B Quantum Probabilities

Ordinary (Kolmogorov) probability theory employs a sample space of mutually exclusive items or situations which together exhaust all possibilities, and an event algebra which in simple situations consists of all subsets (including the empty set) of items from the sample space. In classical statistical mechanics the sample space can consist of all the distinct points $\gamma$ that make up the phase space $\Gamma$, but one could also cut up the phase space into nonoverlapping regions, “cells”, and use these for the sample space. The quantum analog of a sample space is a projective decomposition of the identity (PDI): a collection $\{P^j\}$ of projectors (the superscripts are labels, not exponents) satisfying

$$I = \sum_j P^j, \qquad P^j P^k = \delta_{jk} P^j. \tag{6}$$

Obviously, each projector commutes with every other projector in the PDI. The simplest choice for a corresponding event algebra, one which will suffice for our purposes, consists of the 0 projector, all projectors belonging to the PDI, and in addition all sums of two or more distinct projectors from the PDI.
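Here is a minimal sketch of the conditions (6) and of the event algebra just described. It assumes projectors are given as square matrices (nested Python lists); the function names are illustrative. For a PDI with $n$ elements the algebra has $2^n$ members, counting the 0 projector.

```python
from itertools import combinations

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def matsum(mats, n):
    """Sum of n x n matrices; the zero matrix for an empty list."""
    S = [[0] * n for _ in range(n)]
    for M in mats:
        S = [[S[i][j] + M[i][j] for j in range(n)] for i in range(n)]
    return S

def close(A, B, tol=1e-12):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def is_pdi(projs):
    """Check Eq. (6): sum_j P^j = I and P^j P^k = delta_jk P^j."""
    n = len(projs[0])
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    zero = [[0] * n for _ in range(n)]
    if not close(matsum(projs, n), identity):
        return False
    return all(close(matmul(P, Q), P if j == k else zero)
               for j, P in enumerate(projs) for k, Q in enumerate(projs))

def event_algebra(projs):
    """All sums of distinct PDI elements, including the empty sum (the 0 projector)."""
    n = len(projs[0])
    return [matsum(list(c), n) for size in range(len(projs) + 1)
            for c in combinations(projs, size)]

# Illustrative PDI on C^3: projectors onto the three standard basis vectors
P = [[[int(i == j == m) for j in range(3)] for i in range(3)] for m in range(3)]
print(is_pdi(P))               # True
print(len(event_algebra(P)))   # 8
```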

Given a physical variable represented by a Hermitian operator $A$ (there is no harm in using the same symbol for both) there is an associated PDI $\{P^j\}$ employed for the spectral decomposition of $A$,

$$A = \sum_j a_j P^j, \tag{7}$$

where the eigenvalues $a_j$ are the possible values which $A$ can take on, and $P^j$ identifies the subspace where $A$ takes on the value $a_j$. (We assume that $a_j \neq a_k$ if $j \neq k$ in (7); thus for degenerate eigenvalues the corresponding $P^j$ may project onto a space of dimension greater than 1.)
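For the spin-half operator $S_x$, in units of $\hbar$ and written in the $S_z$ basis, (7) reads $S_x = \tfrac12[x^+] - \tfrac12[x^-]$. The sketch below checks this numerically. It also builds a hypothetical degenerate example on $\mathbb{C}^3$, in which the eigenvalue 1 belongs to a two-dimensional projector.

```python
import math

def dyad(psi):
    return [[a * b.conjugate() for b in psi] for a in psi]

def spectral_sum(pairs):
    """Return sum_j a_j P^j, as in Eq. (7), from (eigenvalue, projector) pairs."""
    n = len(pairs[0][1])
    A = [[0] * n for _ in range(n)]
    for a, P in pairs:
        A = [[A[i][k] + a * P[i][k] for k in range(n)] for i in range(n)]
    return A

r = 1 / math.sqrt(2)
S_x = [[0, 0.5], [0.5, 0]]   # S_x in units of hbar, S_z basis
A = spectral_sum([(0.5, dyad([r, r])), (-0.5, dyad([r, -r]))])
print(all(abs(A[i][k] - S_x[i][k]) < 1e-12 for i in range(2) for k in range(2)))  # True

# Hypothetical degenerate example on C^3: eigenvalue 1 on a 2-dimensional subspace
e1, e2, e3 = [1, 0, 0], [0, 1, 0], [0, 0, 1]
P_one = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(dyad(e1), dyad(e2))]
B = spectral_sum([(1, P_one), (3, dyad(e3))])
print(B)  # [[1, 0, 0], [0, 1, 0], [0, 0, 3]]
```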

In classical physics it is usually the case that only a single sample space need be considered when discussing a particular physical problem, and so its choice needs no emphasis, and it may not even be mentioned. In quantum physics this is no longer the case: many mistakes and numerous paradoxes, e.g., the Kochen-Specker Paradox (see Sec. V E), are based on not paying sufficient attention to the sample space in circumstances in which several distinct and incompatible sample spaces may seem like reasonable choices. For this reason it is convenient to use a special term, framework, to indicate the sample space or the corresponding event algebra which is under discussion.

A central feature of consistent histories is the single framework rule, which states that probabilistic reasoning in the quantum context must always be carried out using a specific and well-defined framework. This rule does not prevent the physicist from using many different frameworks when analyzing a particular physical problem; instead it prohibits combining results from incompatible frameworks. Two PDIs $\{P^j\}$ and $\{Q^k\}$ and the corresponding event algebras are compatible provided all the projectors in one commute with all the projectors in the other: $P^jQ^k = Q^kP^j$ for every $j$ and $k$. In this case there is a common refinement, a PDI consisting of all nonzero products of the form $P^jQ^k$. Otherwise the frameworks are incompatible, and the single framework rule prohibits combining a (probabilistic) inference made using one framework with another that employs a different framework. If the two frameworks are compatible, then inferences in one can be combined with those in the other using the common refinement, which contains both of the event algebras, so again only a single framework is required. (An additional requirement for combining frameworks, the consistency conditions, arises in the case of quantum histories; see Sec. III C.)
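The compatibility test and the common refinement can be sketched as follows. The $4\times4$ diagonal PDIs are an arbitrary illustration. The spin-half pair $\{[z^+],[z^-]\}$ and $\{[x^+],[x^-]\}$ from (3) and (5) gives the incompatible case, for which the hypothetical `common_refinement` helper deliberately refuses to form products.

```python
import math

def dyad(psi):
    return [[a * b.conjugate() for b in psi] for a in psi]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def is_zero(M, tol=1e-12):
    return all(abs(x) < tol for row in M for x in row)

def commute(A, B):
    AB, BA = matmul(A, B), matmul(B, A)
    return is_zero([[x - y for x, y in zip(ra, rb)] for ra, rb in zip(AB, BA)])

def compatible(pdi1, pdi2):
    """Every projector of one PDI commutes with every projector of the other."""
    return all(commute(P, Q) for P in pdi1 for Q in pdi2)

def common_refinement(pdi1, pdi2):
    """All nonzero products P^j Q^k; refused for incompatible PDIs."""
    if not compatible(pdi1, pdi2):
        raise ValueError("incompatible frameworks have no common refinement")
    products = (matmul(P, Q) for P in pdi1 for Q in pdi2)
    return [M for M in products if not is_zero(M)]

def diag(*d):
    return [[d[i] if i == j else 0 for j in range(len(d))] for i in range(len(d))]

# Illustrative compatible PDIs on C^4
A_pdi = [diag(1, 1, 0, 0), diag(0, 0, 1, 1)]
B_pdi = [diag(1, 0, 1, 0), diag(0, 1, 0, 1)]
print(compatible(A_pdi, B_pdi), len(common_refinement(A_pdi, B_pdi)))  # True 4

# Spin-half frameworks from (3) and (5): incompatible
r = 1 / math.sqrt(2)
Z = [dyad([1, 0]), dyad([0, 1])]
X = [dyad([r, r]), dyad([r, -r])]
print(compatible(Z, X))  # False
```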

A PDI can be assigned a probability distribution $\{p_j\}$, where the $p_j$ are nonnegative real numbers that sum to 1, and this distribution will generate the probabilities for all the elements in the corresponding event algebra, just as in ordinary probability theory; e.g., the property $P^1+P^2$ is assigned the probability $p_1+p_2$. In quantum mechanics there are various schemes for assigning probabilities. One method starts with a wavefunction or pure quantum state $|\psi\rangle$, and assigns to the elements of a PDI $\{P^j\}$ probabilities

$$p_j = \langle\psi|P^j|\psi\rangle. \tag{8}$$

In this situation it is helpful to refer to $|\psi\rangle$ as a pre-probability, i.e., it is used to construct a probability distribution. Since probability distributions are generally not considered part of physical reality, at least not in the same sense as physical properties, a ket or wavefunction used in this way need not be interpreted as something physical; instead it is simply a tool used to compute probabilities. But in some other context $|\psi\rangle$ may be a way of referring to the property represented by the projector $[\psi]$. Carelessly combining these two usages can cause a great deal of confusion. Note in particular that as long as two of the $p_j$ in (8) are nonzero, the property $[\psi]$, or to be more precise the minimal PDI $\{[\psi], I-[\psi]\}$ that contains it, is incompatible with the PDI $\{P^j\}$. Hence the single framework rule prevents using $|\psi\rangle$ as a pre-probability, as in (8), while at the same time regarding it as a physical property of the quantum system.
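Eq. (8) is easy to evaluate numerically. In the sketch below, which uses illustrative helper names, the state $|z^+\rangle$ serves as a pre-probability for the $\{[x^+],[x^-]\}$ framework. Both $p_j$ come out equal to 1/2. Since two of the $p_j$ are nonzero, this is exactly the situation in which $[z^+]$ is incompatible with that framework.

```python
import math

def dyad(psi):
    return [[a * b.conjugate() for b in psi] for a in psi]

def born_probability(psi, P):
    """p = <psi|P|psi> for a normalized ket psi, Eq. (8)."""
    n = len(psi)
    value = sum(psi[i].conjugate() * P[i][j] * psi[j] for i in range(n) for j in range(n))
    return value.real

r = 1 / math.sqrt(2)
psi = [1, 0]                                # |z+> used as a pre-probability
framework = [dyad([r, r]), dyad([r, -r])]   # {[x+], [x-]}
p = [born_probability(psi, P) for P in framework]
print([round(x, 12) for x in p])            # [0.5, 0.5]
```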

Since the consistent histories interpretation of quantum theory allows many distinct but incompatible frameworks, a natural question is: Which is the right framework to use in describing some situation of physical interest? In thinking about this it is helpful to remember that a fundamental difference between classical and quantum mechanics is that the former employs a phase space and the latter a Hilbert space for describing a physical system. At a single time a single point in the phase space represents the “actual” state of a classical system: all properties (subsets of points in the phase space) which contain this point are true and all which do not contain the point are false. The term unicity has been used in Sec. 27.3 of [8], and in [15, 14] to describe this concept of a single unique state of affairs at any given time. However, in the quantum Hilbert space the closest analogy to a single point in classical phase space is a one-dimensional subspace or ray. If one assumes that one particular ray is true, then one might suppose that all rays orthogonal to it are false. But there are many rays that are neither identical to nor orthogonal to the ray in question; what shall be said of them? Thus attempting to extend the concept of unicity into the quantum domain runs into problems. We have good reason to believe that physical reality is better described by quantum theory than by classical physics, and hence certain classical concepts must be abandoned, to join others, such as the earth immobile at the center of the universe, which modern science has rendered untenable, even though for certain purposes they may remain useful approximations. Unicity seems to belong to that category.

But the question remains: what are the criteria which lead to the use of a particular framework, rather than another which is incompatible with it? The examples in Sec. II and various applications in Sec. V suggest that quantum physical situations possess what one might call different aspects, and a quantum description of a particular aspect can only be constructed using a framework compatible with that aspect. For example, the $S_x$ “aspect” of a spin-half particle can only be discussed using the $\{[x^+],[x^-]\}$ framework; the $\{[z^+],[z^-]\}$ framework is of no use. As is usual with unfamiliar concepts, the best way to understand them is to apply them to several different examples. In particular, in Secs. V A and V B we will show how the use of frameworks can “untangle” the paradoxes in Secs. II A and II B.

III C Histories and the Extended Born Rule

A quantum history is best understood as a sequence of quantum properties at successive times. A classical analogy is a sequence of coin tosses, or rolls of dice. The theory is simplest if one employs a finite set of discrete times, rather than continuous time. This is no real limitation, as these times may be arbitrarily close together. A history associated with the times $t_0 < t_1 < \cdots < t_f$ can be written in the form

$$Y = F_0 \odot F_1 \odot \cdots \odot F_f, \tag{9}$$

where each $F_j$ is a projector representing some quantum property at the time $t_j$, and the $\odot$ separating properties at successive times are tensor product symbols, a variant of $\otimes$. Thus if $\mathcal{H}$ is the quantum Hilbert space at one time, $Y$ in (9) is a projector on the tensor product history Hilbert space $\breve{\mathcal{H}} = \mathcal{H}\otimes\mathcal{H}\otimes\cdots\otimes\mathcal{H}$. A family of histories is a collection $\{Y^\alpha\}$ of such projectors that sum to the history identity $\breve I = I\odot I\odot\cdots\odot I$, thus a PDI. For present purposes it suffices to use a family in which the histories are of the form

$$Y^\alpha = [\psi_0] \odot P_1^{\alpha_1} \odot P_2^{\alpha_2} \odot \cdots \odot P_f^{\alpha_f}, \tag{10}$$

where $[\psi_0]$, see (2), is the projector on a pure state $|\psi_0\rangle$. The superscripts $\alpha_j$ are labels distinguishing different projectors at the same time, and together they form a vector $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_f)$. In addition there is a special history $Y^0 = (I-[\psi_0])\odot I\odot\cdots\odot I$, which is assigned zero probability, and whose sole purpose is to ensure that the history projectors sum to $\breve I$.

A complete family of histories is one in which the $Y^{\boldsymbol{\alpha}}$ sum to $\breve{I}$, but we will also use the term if they sum to $[\psi_0] \odot I \odot \cdots \odot I$. One way to ensure that the family is complete is if for each time $t_j$ the $\{P_j^{\alpha_j}\}$ form a PDI of $\mathcal{H}$, but this is often too restrictive. There is no reason why a family should not contain projectors on states “entangled” between different times, but in the following discussion we will only need “product” histories as in (9).

Since a family of histories is a PDI it can serve as a probabilistic sample space for the quantum analog of a classical stochastic process such as a random walk. As in the classical case there is no fixed rule for assigning probabilities to such a process. However, in a closed quantum system for which Schrödinger’s equation yields unitary time development operators $T(t,t')$ (e.g., $T(t,t') = \exp[-i(t-t')H/\hbar]$ in the case of a time-independent Hamiltonian $H$), these can be used to assign probabilities to a history family using an extension of the Born rule, provided certain consistency (or decoherence) conditions are satisfied. If all histories start with the same initial pure state $|\psi_0\rangle$ one defines a chain ket (an element of $\mathcal{H}$, not $\breve{\mathcal{H}}$):

$$|\boldsymbol{\alpha}\rangle = P_f^{\alpha_f}\,T(t_f,t_{f-1})\,P_{f-1}^{\alpha_{f-1}} \cdots P_1^{\alpha_1}\,T(t_1,t_0)\,|\psi_0\rangle. \tag{11}$$

The consistency conditions are the requirement that the chain kets are orthogonal for distinct histories,

$$\langle\boldsymbol{\alpha}|\boldsymbol{\alpha}'\rangle = 0 \quad \text{for } \boldsymbol{\alpha} \neq \boldsymbol{\alpha}'. \tag{12}$$

When they are satisfied the extended Born rule assigns to each history of the sample space a probability

$$\Pr(\boldsymbol{\alpha}) = \langle\boldsymbol{\alpha}|\boldsymbol{\alpha}\rangle. \tag{13}$$

The orthogonality requirement (12) is not unnatural when one remembers that the $|\boldsymbol{\alpha}\rangle$ are elements of the single-time Hilbert space $\mathcal{H}$, not the history space $\breve{\mathcal{H}}$, and the ordinary Born rule is used to assign probabilities to an orthonormal basis, or, more generally, a PDI. In fact, for a history involving only two times, $t_0$ and $t_1$, the consistency condition is automatically satisfied because the $P_1^{\alpha_1}$ for different $\alpha_1$ form a PDI on $\mathcal{H}$, and then (13) is just the usual Born probability.
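A minimal numerical sketch of (11)-(13), assuming finite-dimensional Hilbert spaces with operators stored as nested Python lists; the helper names `chain_ket` and `family_probabilities` are hypothetical, not taken from the paper. It builds the chain ket of every history in a product family, tests the orthogonality condition (12), and returns the probabilities (13). The spin-half example uses trivial dynamics, $T = I$, purely for simplicity.

```python
import itertools
import math


def mv(A, v):
    """Apply the matrix A (nested lists) to the vector v."""
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]


def inner(u, v):
    """Inner product <u|v>, antilinear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def proj(v):
    """Projector |v><v| for a normalized vector v."""
    return [[a * b.conjugate() for b in v] for a in v]


def chain_ket(psi0, Ts, Ps):
    """P_f T(t_f, t_{f-1}) ... P_1 T(t_1, t_0) |psi0>; Ts[j] and Ps[j] belong to t_{j+1}."""
    v = psi0
    for T, P in zip(Ts, Ps):
        v = mv(P, mv(T, v))
    return v


def family_probabilities(psi0, Ts, pdis, tol=1e-12):
    """pdis[j] lists the projectors (a PDI) used at t_{j+1}.
    Returns (consistent, {history label: probability})."""
    labels = list(itertools.product(*(range(len(p)) for p in pdis)))
    kets = {a: chain_ket(psi0, Ts, [pdis[j][a[j]] for j in range(len(a))])
            for a in labels}
    consistent = all(abs(inner(kets[a], kets[b])) < tol
                     for a, b in itertools.combinations(labels, 2))
    return consistent, {a: inner(kets[a], kets[a]).real for a in labels}


r = 1 / math.sqrt(2)
I2 = [[1, 0], [0, 1]]
Z = [proj([1, 0]), proj([0, 1])]    # [z+], [z-]
X = [proj([r, r]), proj([r, -r])]   # [x+], [x-]
x_plus = [r, r]

# Two times: consistency is automatic and (13) is the ordinary Born rule.
ok2, p2 = family_probabilities(x_plus, [I2], [Z])
# Three times, S_z at t_1 followed by S_x at t_2, starting in |x+>.
ok3, p3 = family_probabilities(x_plus, [I2, I2], [Z, X])
print(ok2, p2)
print(ok3)
```

With these choices `ok2` is `True` with both probabilities equal to 1/2 (up to rounding), while `ok3` is `False`: the chain kets for $[z^+]$ and $[z^-]$ followed by the same $[x^\pm]$ overlap, so (13) cannot be applied to that three-time family.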

It is important to notice that quantum mechanics allows a description of what happens in an individual realization of a quantum stochastic process, even though the dynamics is probabilistic, just as in a classical stochastic theory. One is sometimes given the impression that quantum theory only allows a discussion of statistical averages over many runs of an experiment. This is not the case, and it is easy to identify instances where individual outcomes and not just averages play a significant role. For example, if Shor’s quantum algorithm [20, 21] is employed to factor a long integer, then at the end of each run the outcome of a measurement is processed to see if this result solves the problem, and if it does, no further runs are needed. While it may take more than one run to achieve success, the outcome of a particular run is a significant quantity, and not just the average over several runs. Similarly, in the case of Einstein’s paradox, Sec. II A, a flash of light at a particular point on the fluorescent screen, Fig. 1(a), can be understood to mean that the particle traveled on a straight (or almost straight) path from the source to the screen on this particular occasion.

IV Measurement Models

IV A Projective Measurements

Our first model is a generalization of the one introduced by von Neumann in Sec. VI.3 of [16]. Let $\mathcal{H}_s$ be the Hilbert space of the system to be measured, which for convenience will hereafter be referred to as “the particle”, whereas the measuring apparatus, including its environment if that is significant, is described by a Hilbert space $\mathcal{H}_m$. The total system with Hilbert space $\mathcal{H}_s \otimes \mathcal{H}_m$ is thought of as closed, so its dynamics can be associated with a collection of unitary time development operators $T(t,t')$. We will focus on histories involving three times $t_0 < t_1 < t_2$, where the interval from $t_0$ to $t_1$ is so short that the time development during it can be ignored, and thus

$$T(t_1,t_0) = I, \qquad T(t_2,t_0) = T(t_2,t_1) \tag{14}$$

with negligible error. At the initial time $t_0$ the particle can be assigned a quantum state $|\psi_0\rangle$ in $\mathcal{H}_s$, and the apparatus (and environment) a state $|\Phi_0\rangle$ in $\mathcal{H}_m$; hence an initial state

$$|\Psi_0\rangle = |\psi_0\rangle \otimes |\Phi_0\rangle \tag{15}$$

for the combined, closed system. The use of pure states rather than density operators does not involve any loss of generality; see Sec. IV D for additional comments. But the requirement that $|\Psi_0\rangle$ in (15) be a product state is important. It means that the particle and the apparatus (or environment) are initially uncorrelated, at least to a sufficiently good approximation.

We assume that the interaction between the particle and the apparatus takes place during the time interval between $t_1$ and $t_2$, and as a consequence of this interaction

$$T(t_2,t_1)\big(|s^j\rangle \otimes |\Phi_0\rangle\big) = |\Phi^j\rangle, \tag{16}$$

where the $\{|s^j\rangle\}$ form an orthonormal basis for the particle Hilbert space $\mathcal{H}_s$, while the $|\Phi^j\rangle$, which lie in the Hilbert space $\mathcal{H}_M$, are states of the particle plus apparatus that correspond to distinct macroscopic outcomes of the measurement (distinct “pointer positions” of the apparatus, to use the traditional terminology of quantum foundations), in the sense of satisfying (17) below. (The space $\mathcal{H}_M$ has the same dimension as $\mathcal{H}_s \otimes \mathcal{H}_m$, but we have not written it in that form since sometimes the particle does not even exist at the end of the measurement. See the discussion of nondestructive measurements in Sec. IV C.) These pointer states are mutually orthogonal, as is always the case for states which are macroscopically distinct. To be more precise, we assume there is a PDI $\{M^k\}$ on $\mathcal{H}_M$ such that

$$M^k|\Phi^j\rangle = \delta_{jk}\,|\Phi^j\rangle, \tag{17}$$

where each $M^k$ is a projector on a macroscopic subspace (property) whose interpretation is that the pointer is in position $k$, and (17) says that $|\Phi^k\rangle$ lies within the subspace defined by $M^k$. To ensure that the $M^k$ sum to the identity $I_M$ on $\mathcal{H}_M$, assume that the possible pointer positions are represented by $k = 1, 2, \ldots, n$, and let

$$M^0 = I_M - \sum_{k=1}^{n} M^k \tag{18}$$

project on the subspace that includes all other possibilities (e.g., the apparatus has broken down).

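To better understand what this measurement measures it is useful to introduce an isometry $J$ from $\mathcal{H}_s$ to $\mathcal{H}_M$ defined by

$$J|\psi\rangle = T(t_2,t_1)\big(|\psi\rangle \otimes |\Phi_0\rangle\big) \quad \text{for every } |\psi\rangle \in \mathcal{H}_s. \tag{19}$$

An isometry, like a unitary, preserves lengths, and is characterized by the requirement that

$$J^\dagger J = I_s, \tag{20}$$

where $J^\dagger$ is the adjoint of $J$ and $I_s$ is the identity on $\mathcal{H}_s$. (The operator $JJ^\dagger$ is a projector on the subspace of $\mathcal{H}_M$ that is the image of $\mathcal{H}_s$ under $J$, and is not important for our discussion.)

The isometry that corresponds to $T(t_2,t_1)$ in (16) is

$$J = \sum_j |\Phi^j\rangle\langle s^j|. \tag{21}$$

Combining this with (17) leads to

$$M^k J = |\Phi^k\rangle\langle s^k|. \tag{22}$$

Multiplying both sides on the left by $J^\dagger$ and using (20), together with $|\Phi^k\rangle = J|s^k\rangle$ from (21), yields

$$J^\dagger M^k J = J^\dagger J\,|s^k\rangle\langle s^k| = |s^k\rangle\langle s^k|, \tag{23}$$

which implies that

$$J^\dagger M^k J = [s^k]. \tag{24}$$

That is, the “backwards map” $J^\dagger(\cdot)J$ applied to the projector $M^k$ on the subspace that corresponds to pointer position $k$ is the projector $[s^k]$ on the prior microscopic state giving rise to this outcome.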
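The identities (20) and (24) are easy to check numerically for a concrete isometry. The sketch below is a toy model under stated assumptions, not a model of any real apparatus: the particle is a qubit with basis $|s^1\rangle, |s^2\rangle$, $\mathcal{H}_M$ is four-dimensional, pointer position 1 occupies a two-dimensional subspace, and one basis vector stands for the “anything else” subspace $M^0$ of (18). The helper `span_projector` is a hypothetical name.

```python
import math


def mm(A, B):
    """Matrix product of nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def dag(A):
    """Adjoint (conjugate transpose)."""
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]


def close(A, B, tol=1e-12):
    """Entrywise comparison of two matrices."""
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def span_projector(dim, indices):
    """Projector on the span of the listed standard basis vectors."""
    return [[1 if (i == j and i in indices) else 0 for j in range(dim)] for i in range(dim)]


r = 1 / math.sqrt(2)
# Hypothetical pointer states |Phi^1>, |Phi^2> in a 4-dimensional H_M.
Phi = {1: [r, r, 0, 0], 2: [0, 0, 1, 0]}
# Pointer PDI: positions 1 and 2, plus M^0 for "anything else", cf. (18).
M = {1: span_projector(4, {0, 1}), 2: span_projector(4, {2}), 0: span_projector(4, {3})}
# J = sum_j |Phi^j><s^j| as in (21); column j-1 holds J|s^j>.
J = [[Phi[1][i], Phi[2][i]] for i in range(4)]

s_proj = {1: [[1, 0], [0, 0]], 2: [[0, 0], [0, 1]]}     # [s^1], [s^2]
print(close(mm(dag(J), J), [[1, 0], [0, 1]]))            # J is an isometry, (20)
for k in (1, 2):
    print(k, close(mm(dag(J), mm(M[k], J)), s_proj[k]))  # backwards map, (24)
print(close(mm(dag(J), mm(M[0], J)), [[0, 0], [0, 0]]))  # M^0 maps back to zero
```

Every check prints `True`: the backwards map recovers $[s^k]$ even though the pointer subspace for $k = 1$ is larger than the one-dimensional image of $|s^1\rangle$.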

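To complete the discussion of projective measurements we need to introduce families of histories. Let us begin with the family consisting of histories

$$Y^k = [\Psi_0] \odot I \odot M^k \tag{25}$$

at times $t_0 < t_1 < t_2$, where $|\Psi_0\rangle$ was defined in (15), and

$$|\psi_0\rangle = \sum_j c_j\,|s^j\rangle \tag{26}$$

is an arbitrary state of $\mathcal{H}_s$. The chain kets

$$|k\rangle = M^k\,T(t_2,t_1)|\Psi_0\rangle = \sum_j c_j\,M^k|\Phi^j\rangle = c_k\,|\Phi^k\rangle \tag{27}$$

associated with these histories (remember that $T(t_1,t_0) = I$) are obviously orthogonal to each other in view of (17) and the fact that the $M^k$ form a PDI. Thus the Born rule assigns a probability

$$\Pr(M^k) = \langle k|k\rangle = |c_k|^2, \tag{28}$$

the absolute square of the coefficient $c_k$ of $|s^k\rangle$ in (26), to the pointer outcome $M^k$, in agreement with textbooks, but without employing any special rule for measurements, since (28) is nothing but a particular application of the general formula (13) that assigns probabilities to histories.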

Note that the first measurement problem, attempting to give a physical interpretation to the macroscopic superposition state

$$|\Psi_2\rangle = T(t_2,t_0)|\Psi_0\rangle = \sum_j c_j\,|\Phi^j\rangle, \tag{29}$$

never arises, because $[\Psi_2]$ has never entered the discussion. To be sure, from the consistent histories perspective there is nothing wrong with the family consisting of just the two histories

$$[\Psi_0] \odot I \odot \big\{[\Psi_2],\; I - [\Psi_2]\big\}, \tag{30}$$

where each history uses one of the projectors inside the curly brackets. It (trivially) satisfies the consistency condition, and the Born rule assigns a probability of 1 to $[\Psi_2]$. It is a perfectly good quantum description which, however, is incompatible with the family (25) if at least two of the $c_j$ in (26) are nonzero, since $[\Psi_2]$ will then not commute with the corresponding $M^k$, rendering a discussion of measurement outcomes impossible. Combining the families in (25) and (30) is as silly as simultaneously assigning to a spin-half particle a value for $S_x$ along with one for $S_z$. The choice of which of these families to use will generally be made on pragmatic grounds. In particular, if one wants to discuss real experiments of the sort actually carried out in laboratories and what one can infer from their outcomes (one might call this practical physics), the choice is clear: one needs to employ a family in which measurements have outcomes.
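Whether $[\Psi_2]$ commutes with a pointer projector can be tested directly. This sketch assumes, for simplicity, one-dimensional pointer subspaces spanned by the standard basis of a two-dimensional $\mathcal{H}_M$, so that $|\Psi_2\rangle = \sum_j c_j |\Phi^j\rangle$ is just the coefficient vector $(c_1, c_2)$; `commutator_size` is a hypothetical helper.

```python
import math


def mm(A, B):
    """Matrix product of nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def proj(v):
    """Projector |v><v| for a normalized vector v."""
    return [[a * b.conjugate() for b in v] for a in v]


def commutator_size(A, B):
    """Largest |entry| of AB - BA; zero exactly when A and B commute."""
    AB, BA = mm(A, B), mm(B, A)
    return max(abs(AB[i][j] - BA[i][j]) for i in range(len(A)) for j in range(len(A)))


M1 = [[1, 0], [0, 0]]          # projector on pointer position 1 (toy model)
r = 1 / math.sqrt(2)
superposed = commutator_size(proj([r, r]), M1)   # c_1 and c_2 both nonzero
definite = commutator_size(proj([1, 0]), M1)     # only c_1 nonzero
print(superposed, definite)
```

The commutator is about 1/2 when both $c_j$ are nonzero and vanishes when only one is, which is exactly the condition under which the families (25) and (30) are incompatible.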

There are physicists who object to a framework choice based on pragmatic grounds which seem related to human choice, e.g., see Sec. 3.7 of [7], though they might not object to astronomers interested in the properties of Jupiter using concepts appropriate to that planet rather than, say, Mars. Of course this is a classical analogy, but thinking about it, along with the spin-half example mentioned earlier, may help in understanding how the single framework rule can assist in sorting out quantum paradoxes while still allowing quantum theory to be an objective science. The idea that there can only be exactly one valid quantum description, the principle of unicity discussed in Sec. III B, runs into difficulties in the case of Einstein’s paradox, Sec. II A, as well as rendering the infamous first measurement problem insoluble for reasons that have just been discussed.

After this diversion let us return to the second measurement problem. To see how the macroscopic measurement outcomes are related to the microscopic properties the measurement was designed to measure, we introduce a refinement

$$Y^{jk} = [\Psi_0] \odot [s^j] \odot M^k \tag{31}$$

of the family (25) considered previously. Here $[s^j]$ at the intermediate time $t_1$ is to be interpreted, following the usual physicists’ convention, as $[s^j] \otimes I_m$: the property $[s^j]$ of the particle and no information about anything else. The corresponding chain kets, see (26) and (16),

$$|jk\rangle = M^k\,T(t_2,t_1)\big([s^j] \otimes I_m\big)|\Psi_0\rangle = c_j\,M^k|\Phi^j\rangle = \delta_{jk}\,c_j\,|\Phi^j\rangle, \tag{32}$$

are mutually orthogonal since the $|\Phi^j\rangle$ are orthogonal. Thus the family is consistent, and yields a joint probability distribution

$$\Pr(s^j_1, M^k_2) = \delta_{jk}\,|c_j|^2, \tag{33}$$

where the subscripts of the arguments of $\Pr$ indicate time. Summing over $j$ gives (28), and combining that with (33) yields conditional probabilities:

$$\Pr(s^j_1 \mid M^k_2) = \delta_{jk}, \tag{34}$$

assuming $\Pr(M^k_2)$ is nonzero. In words: if the measurement outcome (pointer position) is $k$, i.e., $M^k$, at time $t_2$, the particle certainly had the property $[s^k]$ at time $t_1$. Thus the second measurement problem has been solved for the case of projective measurements. Note that this conclusion does not depend upon the initial state $|\psi_0\rangle$, which only determines the probability of the measurement outcome as noted above in (28). (If $c_k = 0$, (34) does not hold, but it is also not needed, since the outcome $M^k$ will never occur.)
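The chain kets of the refined family (31) and the probabilities (28), (33) and (34) can be evaluated numerically. This minimal sketch assumes a hypothetical model in which $J|s^1\rangle = (e_0 + e_1)/\sqrt{2}$ and $J|s^2\rangle = e_2$ in a four-dimensional $\mathcal{H}_M$ with standard basis $e_0, \ldots, e_3$, with pointer projectors $M^1$ on $\mathrm{span}\{e_0, e_1\}$ and $M^2$ on $\mathrm{span}\{e_2\}$, and an arbitrarily chosen initial state with $c_1 = 0.6$, $c_2 = 0.8i$.

```python
def mv(A, v):
    """Apply the matrix A (nested lists) to the vector v."""
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]


def inner(u, v):
    """Inner product <u|v>, antilinear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def span_projector(dim, indices):
    """Projector on the span of the listed standard basis vectors."""
    return [[1 if (i == j and i in indices) else 0 for j in range(dim)] for i in range(dim)]


r = 2 ** -0.5
J = [[r, 0], [r, 0], [0, 1], [0, 0]]             # columns: J|s^1>, J|s^2>
M = {1: span_projector(4, {0, 1}), 2: span_projector(4, {2})}
S = {1: [[1, 0], [0, 0]], 2: [[0, 0], [0, 1]]}   # [s^1], [s^2] on H_s
c = [0.6, 0.8j]                                   # |psi_0> = c_1|s^1> + c_2|s^2>, (26)

pairs = [(j, k) for j in (1, 2) for k in (1, 2)]
# Chain kets |jk> = M^k J [s^j] |psi_0>, using T(t_1, t_0) = I from (14).
kets = {(j, k): mv(M[k], mv(J, mv(S[j], c))) for (j, k) in pairs}
consistent = all(abs(inner(kets[p], kets[q])) < 1e-12
                 for p in pairs for q in pairs if p != q)
joint = {p: inner(kets[p], kets[p]).real for p in pairs}                  # (33)
marginal = {k: joint[(1, k)] + joint[(2, k)] for k in (1, 2)}              # (28)
conditional = {(j, k): joint[(j, k)] / marginal[k] for (j, k) in pairs}   # (34)
print(consistent, marginal, conditional)
```

The marginals reproduce $|c_1|^2 = 0.36$ and $|c_2|^2 = 0.64$, and for this choice of $c_j$ every conditional probability is 1 or 0, in line with (34).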

IV B Generalized Measurements and POVMs

The basic setup for discussing generalized measurements is the same as that in Sec. IV A: times $t_0 < t_1 < t_2$, an initial state (15) at time $t_0$, negligible time development (see (14)) between $t_0$ and $t_1$, the isometry $J$ defined in (19), and a PDI $\{M^k\}$ corresponding to different pointer positions at $t_2$. However, we now drop the assumption of an orthonormal basis $\{|s^j\rangle\}$ of $\mathcal{H}_s$ with $J|s^j\rangle = |\Phi^j\rangle$ lying in the subspace $M^j\mathcal{H}_M$. Instead, use the backwards map of the projectors on the pointer subspaces to define for each $k$ an operator

$$E^k = J^\dagger M^k J \tag{35}$$

on $\mathcal{H}_s$. For a projective measurement $E^k = [s^k]$ is the property possessed by the particle at the earlier time $t_1$ when the measurement outcome is $k$, and we shall see that something similar, though a bit more complicated, holds for generalized measurements. Another special case, a generalized projective measurement, is one in which each $E^k$ is a projector and the $E^k$ form a PDI, but one or more of them may have rank greater than 1, that is, project onto a subspace of dimension greater than 1.

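The collection $\{E^k\}$ forms a POVM (positive operator-valued measure), a collection of positive semi-definite operators with sum equal to the identity $I_s$ on $\mathcal{H}_s$. The equality

$$\langle\psi|E^k|\psi\rangle = \langle\psi|J^\dagger M^k J|\psi\rangle = \langle\phi|M^k|\phi\rangle \geq 0, \tag{36}$$

for an arbitrary $|\psi\rangle$ in $\mathcal{H}_s$, with $|\phi\rangle = J|\psi\rangle$, demonstrates that $E^k$, just like the projector $M^k$, is a positive semi-definite operator. Summing both sides of (36) over $k$ and remembering that the $M^k$ form a PDI shows that

$$\sum_k E^k = J^\dagger J = I_s, \tag{37}$$

completing the proof that $\{E^k\}$ is a POVM. (Note that the special $M^0$ in (18) gives rise to $E^0 = 0$.)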
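Any isometry yields a POVM through the backwards map (35). As an illustration chosen here, not taken from the text, the sketch uses an isometry from a qubit into a three-dimensional $\mathcal{H}_M$ whose rows produce the so-called trine POVM, with one-dimensional pointer subspaces. It checks (20), the positivity argued in (36), and the completeness relation (37). Positivity is tested through the trace and determinant, a shortcut that is valid only for 2×2 Hermitian matrices.

```python
import math


def mm(A, B):
    """Matrix product of nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def dag(A):
    """Adjoint (conjugate transpose)."""
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]


def close(A, B, tol=1e-12):
    """Entrywise comparison of two matrices."""
    return all(abs(x - y) < tol for ra, rb in zip(A, B) for x, y in zip(ra, rb))


def is_psd_2x2(A, tol=1e-12):
    """A 2x2 Hermitian matrix is positive semidefinite iff trace >= 0 and det >= 0."""
    tr = (A[0][0] + A[1][1]).real
    det = (A[0][0] * A[1][1] - A[0][1] * A[1][0]).real
    return tr >= -tol and det >= -tol


a = math.sqrt(2 / 3)
h = math.sqrt(3) / 2
# Hypothetical isometry from a qubit H_s into a 3-dimensional H_M (orthonormal columns).
J = [[a, 0.0],
     [-a / 2, a * h],
     [-a / 2, -a * h]]
# Pointer PDI: M^k projects on the k-th standard basis vector of H_M.
M = [[[1 if i == j == k else 0 for j in range(3)] for i in range(3)] for k in range(3)]

E = [mm(dag(J), mm(Mk, J)) for Mk in M]                  # E^k = J^dag M^k J, (35)
total = [[sum(Ek[i][j] for Ek in E) for j in range(2)] for i in range(2)]
print(close(mm(dag(J), J), [[1, 0], [0, 1]]))            # isometry, (20)
print(all(is_psd_2x2(Ek) for Ek in E))                    # positivity, cf. (36)
print(close(total, [[1, 0], [0, 1]]))                     # completeness, (37)
```

Each $E^k$ here is $2/3$ times a rank-one projector, so by the discussion following (46) an outcome $k$ tells us that the particle had the corresponding pure-state property at $t_1$.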

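The first measurement problem for such a POVM is solved in exactly the same way as for the von Neumann model: use the PDI $\{M^k\}$ at time $t_2$, not the projector $[\Psi_2]$ of the unitarily evolved state. The second measurement problem is more subtle, as it requires introducing suitable properties as events at $t_1$ to produce a consistent family. The choice is not unique, but the following is a quite general and fairly useful approach. The spectral decomposition of $E^k$ can be written in the form

$$E^k = \sum_l e_{kl}\,Q^{kl}, \tag{38}$$

where for each fixed $k$ the $Q^{kl}$ labeled by $l$ are projectors that form a PDI on $\mathcal{H}_s$, while the $e_{kl}$ are the corresponding eigenvalues of $E^k$. We assume the eigenvalues are distinct, $e_{kl} \neq e_{kl'}$ when $l \neq l'$, so some of the $Q^{kl}$ may have rank greater than one. As with any PDI the projectors are orthogonal and sum to the identity:

$$Q^{kl}Q^{kl'} = \delta_{ll'}\,Q^{kl}, \qquad \sum_l Q^{kl} = I_s. \tag{39}$$

The family of histories

$$Y^{kl} = [\Psi_0] \odot Q^{kl} \odot M^k, \tag{40}$$

when augmented with the uninteresting $Y^0$ (of zero weight), is complete, since

$$\sum_k \sum_l Y^{kl} = [\Psi_0] \odot I \odot \sum_k M^k = [\Psi_0] \odot I \odot I. \tag{41}$$

The chain kets

$$|kl\rangle = M^k\,T(t_2,t_1)\big(Q^{kl} \otimes I_m\big)|\Psi_0\rangle = M^k J\,Q^{kl}|\psi_0\rangle \tag{42}$$

are obviously mutually orthogonal if the two $k$ values differ, since $M^k M^{k'} = 0$ for $k \neq k'$. For a given $k$ we need to consider

$$\langle kl|kl'\rangle = \langle\psi_0|Q^{kl}J^\dagger M^k J\,Q^{kl'}|\psi_0\rangle = \langle\psi_0|Q^{kl}E^k Q^{kl'}|\psi_0\rangle \tag{43}$$

$$= \delta_{ll'}\,e_{kl}\,\langle\psi_0|Q^{kl}|\psi_0\rangle, \tag{44}$$

where the second equality follows from (35) and the third from (38) and (39). Thus the family defined in (40) is consistent, with probabilities

$$\Pr(Q^{kl}_1, M^k_2) = e_{kl}\,\langle\psi_0|Q^{kl}|\psi_0\rangle, \tag{45}$$

where subscripts 1 and 2 identify the times $t_1$ and $t_2$ before and after the measurement takes place. It follows that

$$\Pr(Q^{kl}_1 \mid M^k_2) = \frac{e_{kl}\,\langle\psi_0|Q^{kl}|\psi_0\rangle}{\langle\psi_0|E^k|\psi_0\rangle}. \tag{46}$$

What (46) tells us is that if the outcome (pointer position) is $k$ the system earlier had one of the properties $Q^{kl}$, with probabilities that will in general depend on the initial particle state $|\psi_0\rangle$. If $E^k$ is itself a projector or proportional to a projector, as will be the case for a generalized projective measurement, one can be sure that the particle possessed at time $t_1$ the property represented by that projector. If the support of $E^k$ is a proper subspace of $\mathcal{H}_s$, the system can be assigned the property corresponding to this subspace at the time $t_1$ immediately before the measurement. If neither of these conditions holds it may be possible on the basis of additional information about $|\psi_0\rangle$ to assign probabilities to the different $Q^{kl}$ for this $k$, or perhaps argue that some of these probabilities are negligible, allowing one with reasonable confidence to say something nontrivial about the property possessed earlier by the particle.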
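For a non-projective case, the following sketch assumes a hypothetical noisy $S_z$ detector with $E^\pm = p\,[z^\pm] + (1-p)\,[z^\mp]$ and $p = 0.9$, realized by an isometry into a four-dimensional $\mathcal{H}_M$ of our own choosing. It checks the spectral form (38) (with the projectors supplied by hand rather than computed), verifies that chain kets with the same $k$ are orthogonal, and evaluates the conditional probabilities (46) for the initial state $|x^+\rangle$.

```python
import math


def mm(A, B):
    """Matrix product of nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def dag(A):
    """Adjoint (conjugate transpose)."""
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]


def mv(A, v):
    """Apply the matrix A to the vector v."""
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]


def inner(u, v):
    """Inner product <u|v>, antilinear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(u, v))


def close(A, B, tol=1e-12):
    """Entrywise comparison of two matrices."""
    return all(abs(x - y) < tol for ra, rb in zip(A, B) for x, y in zip(ra, rb))


p = 0.9                                   # hypothetical detector reliability
sp, sq = math.sqrt(p), math.sqrt(1 - p)
# Hypothetical J: |z+> -> sp e0 + sq e2 and |z-> -> sq e1 + sp e3 in a 4-dimensional H_M.
J = [[sp, 0], [0, sq], [sq, 0], [0, sp]]
M = {"+": [[1 if i == j < 2 else 0 for j in range(4)] for i in range(4)],    # span{e0, e1}
     "-": [[1 if i == j >= 2 else 0 for j in range(4)] for i in range(4)]}   # span{e2, e3}
E = {k: mm(dag(J), mm(M[k], J)) for k in M}                                   # (35)

# Spectral decomposition (38), supplied by hand: Q^{k0} = [z+], Q^{k1} = [z-].
Q = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
e = {"+": [p, 1 - p], "-": [1 - p, p]}
for k in M:
    rebuilt = [[sum(e[k][l] * Q[l][i][j] for l in range(2)) for j in range(2)] for i in range(2)]
    assert close(E[k], rebuilt)

psi0 = [1 / math.sqrt(2), 1 / math.sqrt(2)]       # |x+>, chosen for illustration
# Chain kets M^k J Q^{kl} |psi_0>; for fixed k they must be mutually orthogonal.
kets = {(k, l): mv(M[k], mv(J, mv(Q[l], psi0))) for k in M for l in range(2)}
assert all(abs(inner(kets[(k, 0)], kets[(k, 1)])) < 1e-12 for k in M)
joint = {kl: inner(v, v).real for kl, v in kets.items()}                       # (45)
cond = {(k, l): joint[(k, l)] / (joint[(k, 0)] + joint[(k, 1)]) for (k, l) in kets}  # (46)
print(cond)
```

Here the outcome $M^+$ makes $[z^+]$ at $t_1$ likely (conditional probability 0.9) but not certain. Since $E^+$ is neither proportional to a projector nor of reduced support, this is the third case discussed after (46), where only probabilistic inferences are available.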

Note that whereas for a fixed $k$ the $Q^{kl}$ for different $l$ are mutually orthogonal, for different $k$ values, different outcomes of the experiment, one may be able to draw different and perhaps mutually incompatible conclusions about the prior properties. This is a feature of quantum measurements which has given rise to a lot of confusion, and is best discussed in terms of a specific example; see the one in Sec. V C. While the consistent family in (40) is not the only possibility for discussing what one can learn about the prior state of the particle from measurement outcomes, it is a rather natural choice, especially when nothing else is known about the measured system.

IV C Nondestructive Measurements and Preparations

A measurement determines a past property whereas a preparation is a procedure to prepare a particular quantum state, and a nondestructive measurement combines the two: the apparatus both measures and prepares certain properties. While preparations lie somewhat outside the scope of the present paper, it is worthwhile making some remarks on the subject, if only because of the confusion found in textbooks and other publications, where “measurement” is often (incorrectly) defined as something that has to do with “wavefunction collapse.” The confusion goes back to von Neumann’s original measurement model in which, using the notation of the present paper, $\mathcal{H}_M = \mathcal{H}_s \otimes \mathcal{H}_m$, and the isometry $J$ in (19) takes the form

$$J|s^j\rangle = |s^j\rangle \otimes |\Phi^j\rangle, \tag{47}$$

with the $|\Phi^j\rangle$ an orthonormal basis of $\mathcal{H}_m$. (The $|\Phi^j\rangle$ and the PDI $\{M^k\}$ now refer to $\mathcal{H}_m$ rather than $\mathcal{H}_M$, as in our earlier discussion, but this is a minor difference.) In place of (31) use the family

$$Y^{jkl} = [\Psi_0] \odot [s^j] \odot \big([s^l] \otimes M^k\big). \tag{48}$$

It is straightforward to show that it is consistent, since all the chain kets vanish except for the cases $j = k = l$, with the result

$$\Pr(s^j_1, s^l_2, M^k_2) = \delta_{jk}\,\delta_{kl}\,|c_k|^2, \tag{49}$$

$$\Pr(s^j_1 \mid M^k_2) = \delta_{jk} = \Pr(s^j_2 \mid M^k_2). \tag{50}$$

This measurement is nondestructive in the sense that from the outcome $k$ one can immediately infer that the particle property both before and after the measurement was $[s^k]$, so it did not change. Furthermore, this conclusion is independent of the initial particle state (assuming only that $c_k$ in (26) is not zero; if it is zero the outcome $k$ will never occur). That the earlier $|\psi_0\rangle$ is replaced by the later $|s^k\rangle$ in the case of outcome $k$ is the idea of “wavefunction collapse,” a confusing notion best replaced with the second equality in (50).

Discussions of measurements are sometimes based on a generalization of (47) in which for any $|\psi\rangle$ in $\mathcal{H}_s$ the isometry is assumed to be of the form

$$J|\psi\rangle = \sum_k \big(K_k|\psi\rangle\big) \otimes |\Phi^k\rangle, \tag{51}$$

where the $|\Phi^k\rangle$ are an orthonormal collection, and the Kraus operators $K_k$ (note that $k$ is a label) are arbitrary maps of $\mathcal{H}_s$ to itself subject only to the condition that

$$\sum_k K_k^\dagger K_k = I_s, \tag{52}$$

which guarantees that $J$ in (51) is an isometry. Regarded as a measurement, which is to say something that determines the property of the particle at $t_1$, this is equivalent to a POVM in which

$$E^k = K_k^\dagger K_k. \tag{53}$$

The nondestructive model in (47) is easily extended to a general PDI $\{P^k\}$ on $\mathcal{H}_s$ by setting the Kraus operator $K_k$ in (51) equal to $P^k$, whence it follows that any initial $|\psi\rangle$ in $\mathcal{H}_s$ with the property $P^k$, i.e., $P^k|\psi\rangle = |\psi\rangle$, will result in the measurement outcome $k$ and will emerge unchanged at time $t_2$. This is the essence of Lüders’ proposal [22, 23], which is best regarded as a particular model of a nondestructive measurement and not (as sometimes supposed) a general principle of quantum theory.
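The Kraus form (51)-(53) and the Lüders special case can be checked in a few lines. This sketch assumes a qutrit $\mathcal{H}_s$, pointer states $|\Phi^k\rangle$ given by the standard basis of a two-dimensional $\mathcal{H}_m$, and $K_k = P^k$ for a PDI containing one projector of rank 2. The helpers `isometry_from_kraus` and `pointer_projector` are hypothetical names, and outcomes are indexed from 0 in the code.

```python
def mm(A, B):
    """Matrix product of nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def dag(A):
    """Adjoint (conjugate transpose)."""
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]


def mv(A, v):
    """Apply the matrix A to the vector v."""
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]


def close(A, B, tol=1e-12):
    """Entrywise comparison of two matrices."""
    return all(abs(x - y) < tol for ra, rb in zip(A, B) for x, y in zip(ra, rb))


def isometry_from_kraus(K):
    """Matrix of J in (51), with |Phi^k> the standard basis of an n-dimensional H_m
    (an assumption). Row a*n + k stands for |a> ⊗ |Phi^k>."""
    n, d = len(K), len(K[0])
    return [[K[k][a][j] for j in range(d)] for a in range(d) for k in range(n)]


def pointer_projector(d, n, k):
    """I_s ⊗ [Phi^k] on the (d*n)-dimensional space H_s ⊗ H_m."""
    return [[1 if (i == j and i % n == k) else 0 for j in range(d * n)] for i in range(d * n)]


d, n = 3, 2
# Lüders choice K_k = P^k, with P[0] (rank 2) and P[1] (rank 1) a PDI on a qutrit.
P = [[[1, 0, 0], [0, 1, 0], [0, 0, 0]],
     [[0, 0, 0], [0, 0, 0], [0, 0, 1]]]
J = isometry_from_kraus(P)
identity3 = [[1 if i == j else 0 for j in range(d)] for i in range(d)]
E = [mm(dag(J), mm(pointer_projector(d, n, k), J)) for k in range(n)]
print(close(mm(dag(J), J), identity3))                            # (52) makes J an isometry
print(all(close(E[k], mm(dag(P[k]), P[k])) for k in range(n)))    # E^k = K_k^dag K_k, (53)

psi = [0.6, 0.8, 0.0]                 # has the property P[0]: P[0] psi = psi
Jpsi = mv(J, psi)
prob = [sum(abs(Jpsi[a * n + k]) ** 2 for a in range(d)) for k in range(n)]
after = [Jpsi[a * n + 0] for a in range(d)]   # particle part paired with |Phi^0>
print(prob, after)
```

For $|\psi\rangle = (0.6, 0.8, 0)$, which has the property given by the rank-2 projector, outcome 0 occurs with probability 1 and the particle component paired with $|\Phi^0\rangle$ is $|\psi\rangle$ itself, the nondestructive behavior described above.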

In the case of a preparation one is not interested in the property of the particle at an earlier time, but instead its state at a time $t_2$ after the interaction with the measuring device is over. If, for example, the isometry is given by (47), then according to (50) if the pointer is in position $k$ at time $t_2$ one can be certain that the particle is in state $|s^k\rangle$ at this time. But a simpler and more general preparation model is obtained if in place of (47) one assumes there is a normalized state $|\psi_0\rangle$ at time $t_0$ and an isometry $J$ such that

$$J|\psi_0\rangle = \sum_k \sqrt{p_k}\,|\sigma^k\rangle \otimes |\Phi^k\rangle, \tag{54}$$

where the $p_k$ are probabilities that sum to 1. The states $|\sigma^k\rangle$ are normalized, but we do not assume that they form a basis; in particular, they need not be mutually orthogonal. Nonetheless one can infer that if at $t_2$ the pointer is in position $k$, the particle at this time is in the state $|\sigma^k\rangle$. Note that even if the $|\sigma^k\rangle$ are not orthogonal the states $|\sigma^k\rangle \otimes |\Phi^k\rangle$ are orthogonal and hence distinct; see Ch. 14 in [8] for some discussion of states of this sort. One might worry that this preparation model is stochastic: if outcome $k = 3$ is desired, sometimes it will occur and sometimes it won’t. But since the pointer position is macroscopic it is not difficult to design a system whereby undesired outcomes are removed (e.g., run the particle into a barrier), or, if one is repeating the experiment many times, simply keep a record of the value of $k$ for each run, and throw out the runs for which it is not equal to 3.
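A minimal sketch of the preparation model (54), assuming a qubit particle, three pointer positions with hypothetical probabilities $p_k = 0.5, 0.3, 0.2$, and real normalized particle states (written $|\sigma^k\rangle$ here) at 60° intervals, so that they are deliberately not orthogonal. It does not construct $J$ itself, only the state on the right side of (54), and reads off the particle state that accompanies a given pointer position.

```python
import math


def inner(u, v):
    """Inner product <u|v>, antilinear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


d, n = 2, 3                                   # qubit particle, three pointer positions
p = [0.5, 0.3, 0.2]                           # hypothetical probabilities p_k
sigma = [[math.cos(t), math.sin(t)] for t in (0.0, math.pi / 3, 2 * math.pi / 3)]


def branch(k):
    """sqrt(p_k) |sigma^k> ⊗ |Phi^k>, stored at index a*n + k for |a> ⊗ |Phi^k>."""
    v = [0.0] * (d * n)
    for a in range(d):
        v[a * n + k] = math.sqrt(p[k]) * sigma[k][a]
    return v


branches = [branch(k) for k in range(n)]
state = [sum(b[i] for b in branches) for i in range(d * n)]   # right side of (54)


def prepared_state(k):
    """Normalized particle state found together with pointer position k."""
    comp = [state[a * n + k] for a in range(d)]
    norm = math.sqrt(inner(comp, comp))
    return [x / norm for x in comp]


print(inner(sigma[0], sigma[1]))        # the particle states are not orthogonal
print(inner(branches[0], branches[1]))  # but the branches are
print(prepared_state(2))                # index 2 is the outcome k = 3 of the text
```

The overlap of the first two particle states is 1/2, yet the corresponding branches are orthogonal, and the state accompanying the third pointer position is $|\sigma^3\rangle$, so discarding runs with any other pointer position leaves particles prepared in that state.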

IV D Some Remarks About Density Operators

The foregoing discussion of measurement models employed pure states and projectors on pure states, and it is natural to ask what the appropriate formulation ought to be if one is dealing with mixed states. Mixed states arise in quantum mechanics in two somewhat different ways. The first is analogous to a classical probability distribution: one has in mind some collection of pure states $\{|\psi^j\rangle\}$ with associated probabilities $\{p_j\}$, known as an ensemble, and the associated density operator is

$$\rho = \sum_j p_j\,|\psi^j\rangle\langle\psi^j|. \tag{55}$$

Suppose particles are prepared in states chosen from this ensemble with the specified probabilities, and then measured. What can one infer about the state of a particle just before the measurement, given a particular outcome? Since the only role of the initial state $|\psi_0\rangle$ in Secs. IV A and IV B is to assign probabilities, in the case of a random input one replaces $[\psi_0]$ by $\rho$ when computing averages; e.g., in (45) $\langle\psi_0|Q^{kl}|\psi_0\rangle$ is replaced with $\mathrm{Tr}(\rho\,Q^{kl})$. Note that the state inferred in this way from the measurement outcome in a particular run need not be the same as the member of the ensemble sent into the measurement apparatus. This is no more surprising than the fact that the property $Q^{kl}$ in (38) inferred from an outcome can be different from $[\psi_0]$.
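The replacement of $\langle\psi_0|Q^{kl}|\psi_0\rangle$ by $\mathrm{Tr}(\rho\,Q^{kl})$ rests only on linearity, which the following sketch checks numerically. It assumes a hypothetical noisy $S_z$ measurement with $E^\pm = 0.9\,[z^\pm] + 0.1\,[z^\mp]$, so that in (38) $Q^{k0} = [z^+]$ and $Q^{k1} = [z^-]$, and an equal-weight ensemble of $|z^+\rangle$ and $|x^+\rangle$; none of these choices come from the text.

```python
import math


def mm(A, B):
    """Matrix product of nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def mv(A, v):
    """Apply the matrix A to the vector v."""
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]


def inner(u, v):
    """Inner product <u|v>, antilinear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def proj(v):
    """Projector |v><v| for a normalized vector v."""
    return [[a * b.conjugate() for b in v] for a in v]


def trace(A):
    """Sum of the diagonal entries."""
    return sum(A[i][i] for i in range(len(A)))


eps = 0.9                                          # hypothetical detector reliability
Q = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]           # Q^{k0} = [z+], Q^{k1} = [z-] for k = +, -
e = {"+": [eps, 1 - eps], "-": [1 - eps, eps]}     # eigenvalues e_{kl} of E^+ and E^-
r = 1 / math.sqrt(2)
ensemble = [(0.5, [1.0, 0.0]), (0.5, [r, r])]      # hypothetical ensemble: |z+> or |x+>
rho = [[sum(pj * proj(v)[i][j] for pj, v in ensemble) for j in range(2)]
       for i in range(2)]                          # density operator (55)

labels = [(k, l) for k in e for l in range(2)]
# Joint probabilities (45) with <psi_0|Q^{kl}|psi_0> replaced by Tr(rho Q^{kl}) ...
joint_rho = {(k, l): e[k][l] * trace(mm(rho, Q[l])).real for (k, l) in labels}
# ... compared with the ensemble average of the pure-state expression.
joint_avg = {(k, l): sum(pj * e[k][l] * inner(v, mv(Q[l], v)).real for pj, v in ensemble)
             for (k, l) in labels}
print(joint_rho)
print(joint_avg)
```

Both routes give the same joint distribution, approximately $0.675, 0.025, 0.075, 0.225$, which sums to 1.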

The second way in which a density operator arises is through taking a partial trace of an entangled pure state on a composite system down to one of the subsystems; see Ch. 15 of [8] for further details. If one is only concerned with properties of this particular subsystem and not its correlations with the others, and if only this subsystem interacts with the measuring apparatus, then the previous discussion applies: the situation is exactly the same as for the case of an ensemble. If, however, one is interested in correlations with another subsystem or subsystems, it is best to treat the entire system under consideration as a single system when working out what one can infer from a measurement, even if the measurement is carried out on just one of the subsystems, as the density operator may not provide the sort of information one is interested in. See Sec. V F below for an example.

One may also be concerned about using a pure initial state for a macroscopic apparatus rather than a density operator or a projector onto a large (macroscopic) subspace. This gives rise to a different set of concerns, and we refer the reader to the treatment in Ch. 17 of [8].

V Applications

Various applications below will illustrate the approach outlined in Sec. IV. Those in Secs. V A and V B show how a proper application of quantum principles can give physically reasonable results for the cases considered in Secs. II A and II B, while avoiding paradoxes. Simple examples of POVMs and weak measurements are considered in Secs. V C and V D. Quantum (non)contextuality and aspects of the Einstein-Podolsky-Rosen (EPR) paradox are examined in Secs. V E and V F.

V A Spin Half

The simplest nontrivial example of a quantum system is the spin of a spin-half particle, and the spin was first measured in the Stern-Gerlach experiment mentioned in every textbook. Using the notation $|z^\pm\rangle$ for the eigenstates of the $z$ component $S_z$ of angular momentum introduced earlier in Sec. III A, suppose that a measurement of $S_z$ corresponds to an isometry

$$J = |\Phi^+\rangle\langle z^+| + |\Phi^-\rangle\langle z^-| \tag{56}$$

of the form (21), with $j$ equal to $+$ or $-$, and the macroscopic outcomes correspond to projectors $M^+$ and $M^-$ on pointer subspaces satisfying (17). Then (24) takes the form

$$J^\dagger M^+ J = [z^+], \qquad J^\dagger M^- J = [z^-]. \tag{57}$$

Hence if the macroscopic outcome is $M^+$ (e.g., an atom is detected in the upper beam emerging from a Stern-Gerlach magnet) one can conclude, using the family of four histories at times $t_0 < t_1 < t_2$ (at $t_1$ and $t_2$ choose one of the two properties inside the curly brackets)

$$[\Psi_0] \odot \big\{[z^+], [z^-]\big\} \odot \big\{M^+, M^-\big\}, \tag{58}$$

that at time $t_1$ before the measurement began the particle had the property $[z^+]$ corresponding to $S_z = +1/2$, whatever the initial state $|\psi_0\rangle$. Similarly, $M^-$ would indicate $S_z = -1/2$ at the earlier time.

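One can check this by a direct calculation assuming an initial state

$$|\psi_0\rangle = a\,|z^+\rangle + b\,|z^-\rangle, \qquad |a|^2 + |b|^2 = 1, \tag{59}$$

and using the chain kets to evaluate the probabilities for the four histories in (58):

$$\Pr(z^+_1, M^+_2) = |a|^2, \quad \Pr(z^-_1, M^-_2) = |b|^2, \quad \Pr(z^+_1, M^-_2) = \Pr(z^-_1, M^+_2) = 0. \tag{60}$$

The marginals and conditionals are then

$$\Pr(M^+_2) = |a|^2, \quad \Pr(M^-_2) = |b|^2, \quad \Pr(z^+_1 \mid M^+_2) = 1, \quad \Pr(z^-_1 \mid M^-_2) = 1, \tag{61}$$

where the last two hold if $a$ (respectively, $b$) is nonzero. In short, the particle at $t_1$ had the value of $S_z$ indicated by the measurement outcome at $t_2$, independent of the state at $t_0$, in agreement with (57).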
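A direct numerical version of the check leading to (60) and (61), assuming a minimal model in which $\mathcal{H}_M$ is two-dimensional and the pointer states are $|\Phi^+\rangle = e_1$, $|\Phi^-\rangle = e_0$ (so $J$ is a swap). This choice, the helper name `family_58`, and the sample amplitudes $a = 0.6$, $b = 0.8i$ are assumptions made for illustration.

```python
def mv(A, v):
    """Apply the matrix A (nested lists) to the vector v."""
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]


def inner(u, v):
    """Inner product <u|v>, antilinear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


Z = {"+": [[1, 0], [0, 0]], "-": [[0, 0], [0, 1]]}     # [z+], [z-] on H_s
J = [[0, 1], [1, 0]]                                    # hypothetical: |z+> -> e_1, |z-> -> e_0
Mp = {"+": [[0, 0], [0, 1]], "-": [[1, 0], [0, 0]]}     # M^+ = [e_1], M^- = [e_0]


def family_58(a, b):
    """Joint, marginal and conditional probabilities for the family (58)."""
    psi0 = [a, b]                                       # initial state (59)
    joint = {}
    for s in "+-":
        for k in "+-":
            # Chain ket M^k J [z^s] |psi_0>, using T(t_1, t_0) = I.
            ket = mv(Mp[k], mv(J, mv(Z[s], psi0)))
            joint[(s, k)] = inner(ket, ket).real        # (60)
    marginal = {k: joint[("+", k)] + joint[("-", k)] for k in "+-"}
    conditional = {(s, k): joint[(s, k)] / marginal[k]
                   for s in "+-" for k in "+-" if marginal[k] > 0}   # (61)
    return joint, marginal, conditional


joint, marginal, conditional = family_58(0.6, 0.8j)
print(marginal)
print(conditional)
```

With these amplitudes the joint probabilities are $0.36$ and $0.64$ on the diagonal and zero off it, so both conditionals in (61) equal 1.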

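Next, assuming the same unitary dynamics (56), consider a different family of histories,

$$[\Psi_0] \odot \big\{[x^+], [x^-]\big\} \odot \big\{M^+, M^-\big\}, \tag{62}$$

in which the initial $|\psi_0\rangle$ is now $|x^+\rangle$, and the properties at $t_1$ refer to $S_x$ instead of $S_z$. It is straightforward to show that the family is consistent, with joint probabilities (obtained from chain kets)

$$\Pr(x^+_1, M^+_2) = \Pr(x^+_1, M^-_2) = \tfrac{1}{2}, \qquad \Pr(x^-_1, M^+_2) = \Pr(x^-_1, M^-_2) = 0. \tag{63}$$

The conditionals

$$\Pr(x^+_1 \mid M^+_2) = \Pr(x^+_1 \mid M^-_2) = 1 \tag{64}$$

are exactly the same for $M^+$ and $M^-$, so the measurement outcomes at $t_2$ tell us nothing at all about $S_x$ at time $t_1$. Instead its value is determined entirely by the initial state at $t_0$.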
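The family (62) can be checked with a minimal toy model: a two-dimensional $\mathcal{H}_M$ with $J|z^+\rangle = e_1$ and $J|z^-\rangle = e_0$, an assumption made only for illustration. The sketch computes the chain kets, confirms consistency, and evaluates (63) and (64).

```python
import math


def mv(A, v):
    """Apply the matrix A (nested lists) to the vector v."""
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]


def inner(u, v):
    """Inner product <u|v>, antilinear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def proj(v):
    """Projector |v><v| for a normalized vector v."""
    return [[a * b.conjugate() for b in v] for a in v]


r = 1 / math.sqrt(2)
x_plus, x_minus = [r, r], [r, -r]
X = {"+": proj(x_plus), "-": proj(x_minus)}             # [x+], [x-] at t_1
J = [[0, 1], [1, 0]]                                    # hypothetical: |z+> -> e_1, |z-> -> e_0
Mp = {"+": [[0, 0], [0, 1]], "-": [[1, 0], [0, 0]]}     # pointer projectors M^+, M^-

labels = [(s, k) for s in "+-" for k in "+-"]
# Chain kets M^k J [x^s] |x+>, with T(t_1, t_0) = I.
kets = {(s, k): mv(Mp[k], mv(J, mv(X[s], x_plus))) for (s, k) in labels}
consistent = all(abs(inner(kets[u], kets[v])) < 1e-12
                 for u in labels for v in labels if u != v)
joint = {u: inner(kets[u], kets[u]).real for u in labels}          # (63)
marginal = {k: joint[("+", k)] + joint[("-", k)] for k in "+-"}
cond = {(s, k): joint[(s, k)] / marginal[k] for (s, k) in labels}  # (64)
print(consistent, joint, cond)
```

Both conditional probabilities for $[x^+]$ equal 1 whatever the pointer outcome, which is the content of (64).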

Given the initial state $|x^+\rangle$ of (62) and a pointer outcome, say $M^+$ at $t_2$, are we to infer $S_x = +1/2$ at the earlier time $t_1$ using (64), or $S_z = +1/2$ using (61)? Both inferences are correct, but in separate frameworks which cannot be combined. Frameworks are chosen by the physicist depending on which aspect of the situation is of interest. The physicist who sets up an apparatus to prepare a spin-half particle with a particular polarization may wish to explain in quantum mechanical terms how it functions, in which case the family (62) is an appropriate starting point, and (64) will confirm that later measurements do not have any undesirable retrocausal influence. On the other hand the physicist who has constructed an apparatus to measure a particular polarization can best explain how it functions in that capacity by using the family (58). Even if $|\psi_0\rangle$ is not an eigenstate of $S_z$, (61) shows that the later pointer position reveals the prior property the instrument was designed to measure. These two physicists might be one and the same; several incompatible frameworks may be useful for analyzing a particular experimental arrangement, while the single framework rule prevents one from drawing meaningless or paradoxical conclusions.

Properties at an additional intermediate time $t_1'$, after $t_1$ but before the measurement has begun, can be added to (62) to form a consistent family at times $t_0 < t_1 < t_1' < t_2$:

$$[\Psi_0] \odot \big\{[x^+], [x^-]\big\} \odot \big\{[z^+], [z^-]\big\} \odot \big\{M^+, M^-\big\}, \tag{65}$$

where we assume that $T(t_1', t_1) = I$. Using it one can show that

$$\Pr(x^+_1 \mid M^+_2) = 1, \qquad \Pr(z^+_{1'} \mid M^+_2) = 1. \tag{66}$$

Thus if the later measurement outcome is $M^+$ one can be sure (based on the initial state) that $S_x = +1/2$ at $t_1$ and also (based on the measurement outcome) that $S_z = +1/2$ at $t_1'$. This seems odd if one tries to imagine a physical process rotating the direction of the spin from $+x$ to $+z$, since the particle is moving in a field-free region and is not subject to a torque. Once again the choice of framework which allows a description of a particular aspect of the situation must be carefully distinguished from a dynamical physical process. While there is no exact classical counterpart of a framework choice, the following analogy may help. If one looks at a coffee cup from above one can discern certain things (is it filled with coffee?) which are not visible from below, whereas things visible from below, such as a crack in the bottom, may not be visible from above. Changing the point of view does not change the coffee cup or its contents, but does allow one to see different things. The analogy with the quantum case breaks down in that it makes sense to speak of a cup that both contains coffee and has a (small) crack in the bottom, whereas $[x^+]$ AND $[z^+]$ is meaningless, as the projectors do not commute. To be sure, $[x^+]$ at an earlier time is correctly combined in (65) with $[z^+]$ at a later time: think of first looking at the coffee cup from the top and later from the bottom. However, interchanging the intermediate events in (65) so that the $S_z$ properties precede the $S_x$ properties results in an inconsistent family. Classical analogies help, but in the end there is no substitute for a consistent quantum analysis.
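The consistency of (65), and the inconsistency of the family with the $S_x$ and $S_z$ events interchanged, can be confirmed with a minimal toy model: a two-dimensional $\mathcal{H}_M$ with $J|z^+\rangle = e_1$, $J|z^-\rangle = e_0$ (an assumption), no dynamics between $t_0$, $t_1$ and $t_1'$, and the hypothetical helper `four_time_family`.

```python
import itertools
import math


def mv(A, v):
    """Apply the matrix A (nested lists) to the vector v."""
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]


def inner(u, v):
    """Inner product <u|v>, antilinear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def proj(v):
    """Projector |v><v| for a normalized vector v."""
    return [[a * b.conjugate() for b in v] for a in v]


r = 1 / math.sqrt(2)
x_plus = [r, r]
X = {"+": proj(x_plus), "-": proj([r, -r])}
Z = {"+": proj([1, 0]), "-": proj([0, 1])}
J = [[0, 1], [1, 0]]                                    # hypothetical: |z+> -> e_1, |z-> -> e_0
Mp = {"+": [[0, 0], [0, 1]], "-": [[1, 0], [0, 0]]}     # pointer projectors M^+, M^-


def four_time_family(first, second, tol=1e-12):
    """Histories [x+] ⊙ first ⊙ second ⊙ {M+, M-}, with T = I before the measurement."""
    kets = {(a, b, k): mv(Mp[k], mv(J, mv(second[b], mv(first[a], x_plus))))
            for a in "+-" for b in "+-" for k in "+-"}
    consistent = all(abs(inner(kets[u], kets[v])) < tol
                     for u, v in itertools.combinations(kets, 2))
    return consistent, {u: inner(v, v).real for u, v in kets.items()}


ok_65, probs_65 = four_time_family(X, Z)   # S_x at t_1, then S_z at t_1', as in (65)
ok_swapped, _ = four_time_family(Z, X)     # S_z first, then S_x
print(ok_65, ok_swapped)
# Conditional probability of [z+] at t_1' given M^+ at t_2, cf. (66).
pr_M_plus = sum(q for (a, b, k), q in probs_65.items() if k == "+")
pr_z_and_M = sum(q for (a, b, k), q in probs_65.items() if b == "+" and k == "+")
print(pr_z_and_M / pr_M_plus)
```

The function reports `True` for (65) and `False` for the interchanged order, and the conditional probability of $[z^+]$ at $t_1'$ given $M^+$ is 1, as in (66).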

V B Mach-Zehnder

A correspondence between spin-half measurements as discussed in Sec. V A and the Mach-Zehnder setup of Sec. II B will assist in understanding the latter. Consider a time $t_1$ at which, see Fig. 3, the photon has been reflected from the upper and lower mirrors, but has yet to reach the location of the second beam splitter or, if the latter is absent, the crossing point of the two trajectories. Let $|u\rangle$ be the part of the photon wavepacket in the upper arm and $|\ell\rangle$ the part in the lower arm of the interferometer at this time, and let $|c\rangle$ and $|\bar{c}\rangle$ be the coherent superpositions of $|u\rangle$ and $|\ell\rangle$ defined in (4). Further assume that the action of the first beam splitter in Fig. 3 is to prepare the photon in the state $|c\rangle$. Let $M^u$ be the projector on the macroscopic subspace in which one of the two detectors in Fig. 3 has detected the photon while the other has not, and $M^\ell$ its counterpart for detection by the other detector rather than the first.

If the second beam splitter is absent, Fig. 3(b), a photon in the state $|u\rangle$ in the upper arm will trigger the first detector, giving the outcome $M^u$, while $|\ell\rangle$ in the lower arm will trigger the second, giving $M^\ell$. This can be discussed using a family of four histories as in (58), with $|\psi_0\rangle = |c\rangle$:

$$[\Psi_0] \odot \big\{[u], [\ell]\big\} \odot \big\{M^u, M^\ell\big\}. \tag{67}$$

The conditional probabilities are the same as in (61): if $M^u$ occurs one can be certain the photon was earlier in the state $|u\rangle$, so in the upper arm of the interferometer, whereas $M^\ell$ indicates the earlier state $|\ell\rangle$ in the lower arm. These are the same conclusions one would arrive at from a naive inspection of Fig. 3(b), but they have now been confirmed using an analysis based upon consistent quantum principles.

Now add an additional time $t_1'$, later than $t_1$, at which the photon is still inside the interferometer. The consistent family

$$[\Psi_0] \odot \big\{[c], [\bar{c}]\big\} \odot \big\{[u], [\ell]\big\} \odot \big\{M^u, M^\ell\big\} \tag{68}$$

(note that histories with $[\bar{c}]$ at $t_1$ have zero probability, so can be ignored) is formally identical to (65), but introduces a new conceptual difficulty. In the spin-half case the issue was how a spin angular momentum of $S_x = +1/2$ at $t_1$ could suddenly precess into $S_z = +1/2$ or $S_z = -1/2$ at $t_1'$. However mysterious that might be, one could still imagine the change taking place at the location of the spin-half particle. But for the Mach-Zehnder, $|c\rangle$ is a nonlocal superposition between the two arms at $t_1$; can it suddenly collapse into one or the other arm, $|u\rangle$ or $|\ell\rangle$, at a time $t_1'$, even if the interval between $t_1$ and $t_1'$ is very short, so making this collapse essentially instantaneous? Is this (seeming) nonlocality consistent with relativity theory?

Just as in the case of spin half this (apparent) paradox may be dealt with by noting that a change in what is being described is not the same a