# Contextual-value approach to the generalized measurement of observables

J. Dressel and A. N. Jordan

Department of Physics and Astronomy, University of Rochester, Rochester, New York 14627, USA

July 12, 2019

###### Abstract

We present a detailed motivation for and definition of the contextual values of an observable, which were introduced by Dressel et al., Phys. Rev. Lett. 104, 240401 (2010). The theory of contextual values is a principled approach to the generalized measurement of observables. It extends the well-established theory of generalized state measurements by bridging the gap between partial state collapse and the observables that represent physically relevant information about the system. To emphasize the general utility of the concept, we first construct the full theory of contextual values within an operational formulation of classical probability theory, paying special attention to observable construction, detector coupling, generalized measurement, and measurement disturbance. We then extend the results to quantum probability theory built as a superstructure on the classical theory, pointing out both the classical correspondences to and the full quantum generalizations of both Lüders' rule and the Aharonov-Bergmann-Lebowitz rule in the process. As such, our treatment doubles as a self-contained pedagogical introduction to the essential components of the operational formulations for both classical and quantum probability theory. We find in both cases that the contextual values of a system observable form a generalized spectrum that is associated with the independent outcomes of a partially correlated and generally ambiguous detector; the eigenvalues are a special case when the detector is perfectly correlated and unambiguous. To illustrate the approach, we apply the technique to both a classical example of marble color detection and a quantum example of polarization detection. For the quantum example we detail two devices: Fresnel reflection from a glass coverslip, and continuous beam displacement from a calcite crystal. We also analyze the three-box paradox to demonstrate that no negative probabilities are necessary in its analysis. Finally, we provide a derivation of the quantum weak value as a limit point of a pre- and postselected conditioned average and provide sufficient conditions for the derivation to hold.

###### PACS numbers:
03.65.Ta, 03.67.-a, 02.50.Cw, 03.65.Ca

## I Introduction

Since the advent of quantum mechanics, practitioners have struggled with an inherent conceptual dualism in its formalism. On one hand, time evolution of a quantum state is a continuous, deterministic, and reversible process well described by a wave equation. On the other hand, there is irreducible stochasticity present in the measurement process that leads to discontinuous and generally irreversible state evolution in the form of so-called “quantum jumps” or “state collapse.”

To cope with the necessary introduction of the stochastic element of the theory while still preserving ties with the deterministic classical mechanics, traditional quantum mechanics Dirac (1930); von Neumann (1932) emphasizes the role of Hermitian observable operators that are analogous to classical observables. Indeed, we find that observables underlie most of the core concepts in the quantum theory: commutation relations of observables, complete sets of commuting observables, spectral expansions of observables, conjugate pairs of observables, expectation values of observables, uncertainty relations between observables, and time evolution generated by a Hamiltonian observable. Even the quantum state is introduced as a superposition of observable eigenvectors. The stochasticity of the theory manifests itself as a single prescription for how to average the omnipresent observables under a deterministically evolving quantum state: the implicit projective quantum jumps corresponding to laboratory measurements are largely hidden by the formalism.

Experimental control of quantum systems has improved since the early days of quantum mechanics, however, so the discontinuous evolution present in the measurement process can now be more carefully investigated. Modern optical and condensed matter systems, for example, can condition the evolution of a state on the outcomes of weakly coupled measurement devices (e.g. Korotkov and Jordan (2006); Katz et al. (2008); Kim et al. (2009)), resulting in nonprojective quantum jumps that alter the state more gently, or even resulting in continuous controlled evolution of the state. Since observables are defined in terms of projective jumps that strongly affect the state, it becomes unclear how to correctly apply a formalism based on observables to such nonprojective measurements. A refinement of the traditional formalism must be employed to correctly describe the general case.

To address this need, the theory of quantum operations, or generalized measurement, was introduced in the early 1970’s by Davies Davies and Lewis (1970) and Kraus Kraus (1971), and has been developed over the past forty years to become a comprehensive and mathematically rigorous theory Kraus (1983); Braginski and Khalili (1992); Busch et al. (1995); Nielsen and Chuang (2000); Alicki and Fannes (2001); Keyl (2002); Breuer and Petruccione (2007); Barnett (2009); Wiseman and Milburn (2010). The formalism of quantum operations has seen the most use in quantum optics, quantum computation, and quantum information communities, where it is indispensable and well-supported by experiment. However, it has not yet seen wide adoption outside of those communities.

Unlike the traditional observable formalism, the formalism of quantum operations emphasizes the states. Observables are mentioned infrequently in the quantum operations literature, usually appearing only in the context of projective measurements where they are well-understood. Some references (e.g. Keyl (2002); Breuer and Petruccione (2007); Wiseman and Milburn (2010)) define “generalized observables” in terms of the generalized measurements and detector outcome labels, but give no indication about their relationship to traditional observables, if any. As a result, there is a conceptual gap between the traditional quantum mechanics of observables and the modern treatment of quantum operations that encompasses a much larger class of possible measurements than the traditional observables seemingly allow.

A possible response to this conceptual gap is to declare that traditional observables are meaningless outside the context of projective measurements. This argument is supported by the fact that any generalized measurement can be understood as a part of a projective measurement being made on a larger joint system that can be associated with a traditional observable in the usual way (i.e. (Keyl, 2002, p. 20)). However, there has been parallel research into the “weak measurement” of observables Aharonov et al. (1988); Duck et al. (1989); Aharonov and Vaidman (1990); Aharonov and Botero (2005); Aharonov and Vaidman (2008); Jozsa (2007); Di Lorenzo and Egues (2008); Aharonov et al. (2009, 2010); Tollaksen (2007); Geszti (2010); Hosoya and Shikano (2010); Shikano and Hosoya (2010, 2011); Kagami et al. (2011); Wu and Li (2011); Haapasalo et al. (2011) that suggests that linking generalized measurements to traditional observables may not be such an outlandish idea.

Weak measurements were introduced as a consequence of the von Neumann measurement protocol von Neumann (1932) that uses an interaction Hamiltonian with variable coupling strength to correlate an observable of interest to the generator of translations for a continuous meter observable. The resulting shift in the meter observable is then used to infer information about the observable of interest in a nonprojective manner. The technique has been used to great effect in the laboratory Ritchie et al. (1991); Wiseman (2002, 2003); Brunner et al. (2004); Pryde et al. (2005); Mir et al. (2007); Hosten and Kwiat (2008); Dixon et al. (2009); Lundeen and Steinberg (2009); Starling et al. (2009); Howell et al. (2010); Starling et al. (2010a, b); Goggin et al. (2011); Kocsis et al. (2011) to measure physical quantities like pulse delays, beam deflections, phase shifts, polarization, and averaged trajectories. Therefore, we conclude that there must be some meaningful way to reconcile nonprojective measurements with traditional observables more formally.

The primary purpose of the present work is to detail a synthesis between generalized measurements and observables that is powerful enough to encompass projective measurements, weak measurements, and any strength of measurement in between. The formalism of contextual values, which we explicitly introduced in Dressel et al. (2010); Not (a) and further developed in Dressel et al. (2011a, 2012); Dressel and Jordan (2012), forms a bridge between the traditional notion of an observable and the modern theory of quantum operations. For a concise introduction to the topic in the context of the quantum theory, we recommend reading our letter Dressel et al. (2010).

The central idea of the contextual-value formalism is that an observable can be completely measured indirectly using an imperfectly correlated detector by assigning an appropriate set of values to the detector outcomes. The assigned set of values generally differs from the set of eigenvalues for the observable, and forms a generalized spectrum that is associated with the operations of the generalized measurement, rather than the spectral projections for the observable. Thus, the spectrum that one associates with an observable will depend on the context of how the measurement is being performed; such an inability to completely discuss observables without specifying the full measurement context is reminiscent of Bell-Kochen-Specker contextuality Bell (1966); Kochen and Specker (1967); Mermin (1993); Spekkens (2005); Leifer and Spekkens (2005); Tollaksen (2007); Spekkens (2008) and motivates the name “contextual values.”

The secondary purpose of the present work is to demonstrate that the contextual values formalism for generalized observable measurement is essentially classical in nature. Hence, it has potential applications outside the usual scope of the quantum theory. Indeed, we will show that any system that can be described by Bayesian probability theory can benefit from the contextual-value formalism.

Extending contextual values to the quantum theory from the classical theory clarifies which features of the quantum theory are novel. The quantum theory can be seen as an extension of a classical probability space to a continuous manifold of incompatible frameworks, where each framework is a copy of the original probability space. Hence, intrinsically quantum features arise not from the observables defined in any particular framework, but instead from the relative orientations of the incompatible frameworks. As we shall see, the differences manifest in sequential measurements and conditional measurements due to the probabilistic structure of the incompatible frameworks, rather than the observables or contextual values themselves.

To keep the paper self-contained with these aims in mind, we first develop both the operational approach to measurement and the contextual values formalism completely within the confines of classical probability theory, giving illustrative examples to cement the ideas. We then port the formalism to the quantum theory and identify the essential differences that arise. Our analysis therefore doubles as a pedagogical introduction to the operational approaches for both classical and quantum probability theory that should be accessible to a wide audience.

The paper is organized as follows: In Sec. I.1, we provide a simple intuitive example to introduce the concept of contextual values. In Secs. II.1 through II.3, we develop the classical version of the operational approach to measurement. In Sec. II.4, we introduce the contextual values formalism classically and then give several examples similar to the initial example. In Secs. III.1 through III.3, we generalize the classical operations to quantum operations and highlight the key differences with explicit examples. In Sec. III.4, we apply the contextual values formalism to the quantum case and show that it is unchanged. We also specifically address how to treat weak measurements as a special case of our more general formalism and provide a derivation of the quantum weak value in Sec. III.5. Finally, we give our conclusions in Sec. IV.

### I.1 Example: Colorblind Detector

The idea of the contextual values formalism is deceptively simple. Its essence can be distilled from the following classical example of an ambiguous detector: Suppose we wish to measure a marble that may be colored either red or green. A person with normal vision can distinguish the colors unambiguously and so would represent an ideal detector for the color state of the marble. A partially colorblind person, however, may only estimate the color correctly some percentage of the time and so would represent an ambiguous detector of the color state of the marble.

If the person is only mildly colorblind, then the estimations will be strongly correlated to the actual color of the marble. The ambiguity would then be perturbative and could be interpreted as noise introduced into the measurement. However, if the person is strongly colorblind, then the estimations may be only mildly correlated to the actual color of the marble. The ambiguity becomes nonperturbative, so the noise dominates the signal in the measurement.

We can design an experimental protocol where an experimenter holds up a marble and the colorblind person gives a thumbs-up if he thinks the marble is green or a thumbs-down if he thinks the marble is red. Suppose, after testing a large number of known marbles, the experimenter determines that a green marble correlates with a thumbs-up 51% of the time, while a red marble correlates with a thumbs-down 53% of the time. The experimental outcomes of thumbs-up and thumbs-down are thus only weakly correlated with the actual color of the marble.

Having characterized the detector in this manner, the experimenter provides the colorblind person with a very large bag of an unknown distribution of colored marbles. The colorblind person examines every marble, and for each one records a thumbs-up or a thumbs-down on a sheet of paper, which he then returns to the experimenter. The experimenter then wishes to reconstruct what the average distribution of marble colors in the bag must be, given only the ambiguous output of his colorblind detector.

For simplicity, the clever experimenter decides to associate the colors with numerical values: $+1$ for green ($g$) and $-1$ for red ($r$). In order to compare the ambiguous outputs with the colors, he also assigns them different numerical values: $a$ for thumbs-up ($u$), and $b$ for thumbs-down ($d$). He then writes down the following probability constraint equations for obtaining the average marble color, $\langle \text{color} \rangle$, based on what he has observed (a green marble yields a thumbs-down the remaining 49% of the time, and a red marble yields a thumbs-up the remaining 47% of the time),

$$
\begin{aligned}
P(u) &= (.51)\,P(g) + (.47)\,P(r), \\
P(d) &= (.49)\,P(g) + (.53)\,P(r), \\
\langle \text{color} \rangle &= (+1)\,P(g) + (-1)\,P(r) = a\,P(u) + b\,P(d),
\end{aligned}
\tag{1}
$$

which he can rewrite as a matrix equation in the basis of the color probabilities $P(g)$ and $P(r)$,

$$
\begin{pmatrix} 1 \\ -1 \end{pmatrix} = \begin{pmatrix} .51 & .49 \\ .47 & .53 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix}.
\tag{2}
$$

After solving this equation, he finds that he must assign the amplified values $a = 25.5$ and $b = -24.5$ to the outcomes of thumbs-up and thumbs-down, respectively, in order to compensate for the detector ambiguity. After doing so, he can confidently calculate the average color of the marbles in the large unknown bag using the identity (1).

The classical color observable has eigenvalues of $+1$ and $-1$ that correspond to an ideal measurement. The amplified values of $25.5$ and $-24.5$ that must be assigned to the ambiguous detector outcomes are contextual values for the same color observable. The context of the measurement is the characterization of the colorblind detector, which accounts for the degree of colorblindness. The expansion (1) relates the spectrum of the observable to its generalized spectrum of contextual values. With this identity, both an ideal detector and a colorblind detector can measure the same observable; however, the assigned values must change depending on the context of the detector being used.
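The colorblind-detector calculation can be sketched numerically. The following is a minimal illustration, assuming only the detector characterization quoted in the text (a green marble gives a thumbs-up 51% of the time, a red marble gives a thumbs-down 53% of the time); the function names and the 30%-green test bag are hypothetical choices made for the example, and exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction as Fr

# Detector characterization quoted in the text, P(outcome | color).
# Each color's two entries sum to one: every marble gets exactly one response.
P_UP = {"g": Fr(51, 100), "r": Fr(47, 100)}     # thumbs-up
P_DOWN = {"g": Fr(49, 100), "r": Fr(53, 100)}   # thumbs-down
EIGENVALUES = {"g": Fr(1), "r": Fr(-1)}         # color observable: +1 green, -1 red


def contextual_values():
    """Solve f(c) = a*P(u|c) + b*P(d|c) for c in {g, r} using Cramer's rule."""
    det = P_UP["g"] * P_DOWN["r"] - P_DOWN["g"] * P_UP["r"]
    a = (EIGENVALUES["g"] * P_DOWN["r"] - P_DOWN["g"] * EIGENVALUES["r"]) / det
    b = (P_UP["g"] * EIGENVALUES["r"] - EIGENVALUES["g"] * P_UP["r"]) / det
    return a, b


def average_color(p_green, a, b):
    """Average color inferred only from the thumbs-up / thumbs-down statistics."""
    p_red = 1 - p_green
    p_up = P_UP["g"] * p_green + P_UP["r"] * p_red
    p_down = P_DOWN["g"] * p_green + P_DOWN["r"] * p_red
    return a * p_up + b * p_down


a, b = contextual_values()
print(float(a), float(b))                      # contextual values for u and d
print(float(average_color(Fr(3, 10), a, b)))   # hypothetical bag with 30% green
```

The inferred average agrees with the direct average $P(g) - P(r)$ for any bag composition, which is the content of the identity (1); the price is that the assigned values are much larger in magnitude than the eigenvalues.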

## II Classical probability theory

To define contextual values more formally, we shall define generalized measurements within the classical theory of probability using the same language as quantum operations. In particular, rather than representing the observables of classical probability theory in the traditional way as functions, we shall adopt a more calculationally flexible, yet equivalent, algebraic representation that closely resembles the operator algebra for quantum observables.

We also briefly comment that the relevant subset of probability theory that is summarized here may slightly differ in emphasis from incarnations that the reader may have encountered previously. Our treatment acknowledges that probability theory, in its most general incarnation, is a system of formal reasoning about Boolean logic propositions Cox (1946, 1961); specifically, our treatment emphasizes logical inference rather than the traditional frequency analysis of concrete random variable realizations. However, the “frequentist” approach of random variables is not displaced by the logical approach, but is rather subsumed as an important special case pertaining to repeatable experiments with logically independent outcomes. Due to its clarity and generality, the logical approach has been widely adopted in diverse disciplines under the distinct name “Bayesian probability theory.” Several physicists, including (but certainly not limited to) Jaynes Jaynes (2003), Caves Caves et al. (2002), Fuchs Fuchs (2010), Spekkens Spekkens (2007), Harrigan Harrigan and Spekkens (2010), Wiseman Wiseman and Milburn (2010), and Leifer Leifer and Spekkens (2011), have also extolled its virtues in recent years. We follow suit to emphasize the generality of the contextual-value concept.

### II.1 Sample spaces and observables

In what follows, we shall consider the stage on which classical probability theory unfolds, namely its space of observables, to be a commutative algebra over the reals that we denote $\Sigma^{\mathbb{R}}_X$. This choice of notation is motivated by the fact that the observable algebra is built from and contains two related spaces, $X$ and $\Sigma_X$, that are conceptually distinct and equally important to the theory. The three are illustrated in Fig. 1 to orient the discussion. To avoid distracting technical detail, we will briefly describe finite-dimensional versions of these three spaces here, and note straightforward generalizations to the continuous case when needed Not (b).

Sample spaces.—The core of a probability space is a set of mutually exclusive logic propositions, $X$, known as the sample space of atomic propositions. In other words, elements of the sample space, such as $x \in X$, represent “yes or no” questions that cannot be answered “yes” simultaneously and cannot be broken into simpler questions. For example, $g$ = “Is the marble green?” and $r$ = “Is the marble red?” are valid mutually exclusive atomic propositions. To be a proper sample space, the propositions should form a complete set, meaning that there must always be exactly one true proposition. Physically, such propositions typically correspond to mutually independent outcomes of an experiment that probes some system of interest. Indeed, any accessible physical property must be testable by some experiment, and any experiment can be described by such a collection of yes or no questions.

Boolean algebra.—The atomic propositions in $X$ can be extended to more complex propositions by logical combination in order to form the larger space $\Sigma_X$. Specifically, we can combine them algebraically with a logical or denoted by addition and a logical and denoted by multiplication. For example, given propositions $x, y, z, w \in \Sigma_X$, the quantity $xy + zw$ would denote the proposition “($x$ and $y$) or ($z$ and $w$).” Importantly, both the sum and the product commute since the corresponding logical operations commute, and the propositions are idempotent so $x^2 = x$ for any $x \in \Sigma_X$. Furthermore, the product of any two nonequal propositions in $X$ must be trivially false since they are mutually exclusive; we denote the trivially false proposition as $0$ since its product with any proposition is also trivially false. Similarly, the sum of all propositions in $X$ will be trivially true since one of the atomic propositions must be true by construction; we denote the trivially true proposition as $1$ since its product with any proposition leaves that proposition invariant, $1x = x$. The logical operation of not, or complementation ($\neg$) with respect to $X$, can then be defined as the subtraction from the identity, $\neg x = 1 - x$, since $x + \neg x = 1$ must be true for any proposition $x$ by definition. The proposition space $\Sigma_X$ contains $X$ and is closed under the operations of and, or, and not; hence, it forms a Boolean logic algebra Not (c).

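Observables.—Finally, we extend $\Sigma_X$ linearly over the real numbers to obtain the commutative algebra of observables $\Sigma^{\mathbb{R}}_X$. That is, any linear combination of propositions $a x + b y$ with $a, b \in \mathbb{R}$ and $x, y \in \Sigma_X$ is an observable in $\Sigma^{\mathbb{R}}_X$; similarly any linear combination of observables $a F + b G$ with $a, b \in \mathbb{R}$ and $F, G \in \Sigma^{\mathbb{R}}_X$ is also an observable in $\Sigma^{\mathbb{R}}_X$. Countable sums are permitted provided the coefficients converge. The three spaces $X$, $\Sigma_X$, and $\Sigma^{\mathbb{R}}_X$ are illustrated in Fig. 1.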

The observables combine logical propositions with numbers that describe the relation of each proposition to some meaningful reference. For example, one could define a simple observable $(+1)\,g + (-1)\,r$ that assigns a value of $+1$ to the proposition $g$ asking whether a marble looks green and assigns a value of $-1$ to the proposition $r$ asking whether that same marble looks red in order to distinguish the colors by a sign. Alternatively, one can bestow a physical meaning to the color propositions by defining a wavelength observable instead: $\lambda_g\, g + \lambda_r\, r$, where $\lambda_g$ and $\lambda_r$ are the characteristic wavelengths of green and red light. One could even define an observable that indicates a monetary bet made on the color of the marble, with a fixed payout being awarded for a color of green and a fixed stake being lost for a color of red. Such numerical labels are always assigned by convention, but indicate physically relevant information about the type of questions being asked by the experimenter that are answerable by the independent propositions.

Representation.—The algebra $\Sigma_X$ can be represented as the lattice of projection operators acting on a Hilbert space exactly as in the standard representation of quantum theory von Neumann (1932); Jauch (1968); Alicki and Fannes (2001). The elements of $X$ correspond to rank-1 projection operators onto orthogonal subspaces spanned by orthonormal vectors in the Hilbert space. Any sum of $n$ distinct elements of $X$, $x_1 + \cdots + x_n$, corresponds to a rank-$n$ projection operator onto a subspace spanned by $n$ orthonormal vectors in the Hilbert space. Hence, we shall casually refer to propositions of the Boolean algebra as projections in what follows. However, it is important to note that the Boolean algebra need not be represented in this fashion to be well defined.

Just as the propositions can be represented as projections on a Hilbert space, the observables can also be represented as the algebra of Hermitian operators acting on the same Hilbert space. Hence, we shall casually refer to observables in $\Sigma^{\mathbb{R}}_X$ as observable operators in analogy to the quantum theory. However, unlike quantum observables, all classical observables commute. It is important to note that the representation of observables as operators on a Hilbert space in both the classical and the quantum case remains strictly optional for calculational convenience.

Independent Probability Observables.—We note that the identity observable $1$ can be partitioned into many distinct sets of independent propositions $\{y_i\}$ in $\Sigma_X$, such as $1 = \sum_i y_i$, which is known as a closure relation. Each partitioning corresponds to a particular detector arrangement that only probes those propositions. Such a partitioning has the common mathematical name projection-valued measure (PVM) since it forms a measure over the index $i$ and has a representation that consists of orthogonal projections. However, we shall make an effort to call the propositions $y_i$ independent probability observables to be more physically descriptive. We will later contrast them with more general probability observables.

General observables can be constructed from independent probability observables by associating a real value $f_i$ to each index in the sum, $F = \sum_i f_i\, y_i$. The product of the observable with any of its constituent probability observables simplifies, $F y_i = f_i\, y_i$; hence, the associated values $f_i$ form the set of eigenvalues for the observable. For a finite observable space $\Sigma^{\mathbb{R}}_X$, the set of atomic propositions $X$ itself is a maximally refined set of independent probability observables that can construct any observable in the space,

$$
F = \sum_{x \in X} f(x)\, x.
\tag{3}
$$

In the continuous case the values $f(x)$ form a measurable function that specifies the spectrum of the observable; the sum (3) is then commonly written as an integral over the continuous set of propositions $|x\rangle\langle x|$, $F = \int_X f(x)\, |x\rangle\langle x|\, \mathrm{d}x$. We use the Hilbert space notation in the integral to avoid later confusion with real-valued integrals.
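The algebraic rules of this subsection can be made concrete in a few lines. The sketch below is only an illustration under stated assumptions: it takes a hypothetical two-element sample space (a marble that is green or red), stores each observable as its coefficients on the atomic propositions, and uses helper names (`atom`, `add`, `scale`, `mul`, `logical_not`) that are invented for the example.

```python
# Hypothetical finite sample space: the marble is green or red.
X = ("g", "r")


def atom(x):
    """Atomic proposition x as an observable with coefficient 1 on x."""
    return {y: (1 if y == x else 0) for y in X}


def add(F, G):
    """Sum of observables (logical 'or' for mutually exclusive propositions)."""
    return {x: F[x] + G[x] for x in X}


def scale(c, F):
    """Real multiple of an observable."""
    return {x: c * F[x] for x in X}


def mul(F, G):
    """Product of observables; atoms are orthogonal idempotents, so it is pointwise."""
    return {x: F[x] * G[x] for x in X}


ZERO = {x: 0 for x in X}   # trivially false proposition
ONE = {x: 1 for x in X}    # trivially true proposition (identity)


def logical_not(p):
    """Complement of a proposition: identity minus p."""
    return add(ONE, scale(-1, p))


g, r = atom("g"), atom("r")
F = add(scale(+1, g), scale(-1, r))   # color observable with eigenvalues +1, -1
print(mul(F, g))                      # F g = f(g) g, the eigenvalue relation
```

Storing only coefficients works because every observable in a finite space expands uniquely over the atomic propositions, as in (3); the operator representation on a Hilbert space would give the same results with diagonal matrices.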

### II.2 States, densities, and collapse

Probability measures.—A state $P$ is a probability measure over the Boolean algebra $\Sigma_X$, meaning that it is a linear map from $\Sigma_X$ to the interval $[0,1]$ such that $P(1) = 1$. Such a state assigns a numerical value $P(x)$ to each proposition that quantifies its degree of plausibility; that is, $P(x)$ formally indicates how likely it is that the question $x$ would be answered “yes” were it to be answered, with $P(x) = 1$ indicating a certain “yes” and $P(x) = 0$ indicating a certain “no.” The value $P(x)$ is called the probability for the proposition $x$ to be true. Normalizing $P(1) = 1$ ensures that exactly one proposition in the sample space must be true. For continuous spaces, the state becomes an integral, normalized such that $P(1) = \int_X \mathrm{d}P(x) = 1$.

Frequencies.—Empirically, one can check probabilities by repeatedly posing a proposition in $\Sigma_X$ to identically prepared systems and collecting statistics regarding the answers. For a particular proposition $x$, the ratio of yes-answers to the number of trials will converge to the probability $P(x)$ as the number of trials becomes infinite. However, the probability has a well-defined meaning as a plausibility prediction even without actually performing such a repeatable experiment. Indeed, designing good quality repeatable experiments to check the probabilities assigned by a predictive state is the primary goal of experimental science, and is generally quite difficult to achieve.

Expectation functionals.—The linear extension of a state $P$ to the whole observable algebra $\Sigma^{\mathbb{R}}_X$ is an expectation functional that averages the observables, and is traditionally notated with angled brackets $\langle \cdot \rangle$. Specifically, for an observable $F = \sum_{x \in X} f(x)\, x$, then,

$$
\langle F \rangle = \sum_{x \in X} f(x)\, P(x),
\tag{4}
$$

is the expectation value, or average value, of $F$ under the functional $\langle \cdot \rangle$ that extends the probability state $P$. Since $\langle \cdot \rangle$ is linear, it passes through the sum and the constant factors of $f(x)$ to apply directly to the propositions $x$. The restriction of $\langle \cdot \rangle$ to $\Sigma_X$ is $P$, so $\langle x \rangle = P(x)$ as written in (4). That is, the expectation value of a pure proposition is the probability of that proposition. The probability state and its linear extension are illustrated in Fig. 1. For continuous spaces the sum (4) becomes an integral of the measurable function $f(x)$, $\langle F \rangle = \int_X f(x)\, \mathrm{d}P(x)$.

Moments.—The $n$th statistical moment of $F$ is $\langle F^n \rangle = \sum_{x \in X} f^n(x)\, P(x)$ and empirically corresponds to measuring the observable $n$ times in a row per trial on identical systems and averaging the repeated results. Hence, the moments quantify the fluctuations of the observable measurements that stem from uncertainty in the state. For continuous spaces, the higher moments also become integrals $\langle F^n \rangle = \int_X f^n(x)\, \mathrm{d}P(x)$.
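As a small worked instance of the expectation functional (4) and its moments, the sketch below evaluates the mean and variance of a two-valued observable. The state (30% green, 70% red) and the function name `moment` are hypothetical choices for illustration.

```python
from fractions import Fraction as Fr

# Hypothetical state and observable on a two-element sample space.
P = {"g": Fr(3, 10), "r": Fr(7, 10)}   # probabilities of the atomic propositions
f = {"g": Fr(1), "r": Fr(-1)}          # eigenvalues f(x) of the observable F


def moment(f, P, n=1):
    """n-th moment <F^n> = sum_x f(x)^n P(x); n = 1 gives the expectation value."""
    return sum(f[x] ** n * P[x] for x in P)


mean = moment(f, P)
variance = moment(f, P, 2) - mean ** 2
print(mean, variance)
```

The variance here comes entirely from uncertainty in the state, since the observable itself is a fixed assignment of values to propositions.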

Densities.—States can often be represented as densities with respect to some reference measure $\mu$ from $\Sigma_X$ to $\mathbb{R}$, which can be convenient for calculational purposes. Just as the state $P$ can be linearly extended to an expectation functional $\langle \cdot \rangle$, any reference measure $\mu$ can be linearly extended to a functional $\langle \cdot \rangle_\mu$. For continuous spaces, such a reference functional takes the form of an integral $\langle F \rangle_\mu = \int_X f(x)\, \mathrm{d}\mu(x)$. The representation of a state as a density follows from changing the integration measure for the state $P$ to the reference measure $\mu$. The Jacobian conversion factor $\mathrm{d}P/\mathrm{d}\mu$ from the integral over $P$ to the integral over a different measure $\mu$ is the probability density for $P$ with respect to $\mu$, if it exists Not (d). We can then define a state density observable $P_\mu$ that relates the expectation functional $\langle \cdot \rangle$ to the reference functional $\langle \cdot \rangle_\mu$ directly according to the relation $\langle F \rangle = \langle P_\mu F \rangle_\mu$.

For continuous spaces, the standard integral $\int \mathrm{d}x$ is most frequently used as a reference. Hence, the probability density with respect to the standard integral is given the simple notation $p(x)$ such that $\langle F \rangle = \int_X f(x)\, p(x)\, \mathrm{d}x$. Importantly, the probability for $x$ is not the density $p(x)$, but is the (generally infinitesimal) integral of the density over a single point Robinson (1966); Nelson (1987), commonly notated $\mathrm{d}P(x) = p(x)\, \mathrm{d}x$.

In discrete spaces we apply the same idea by defining a state density observable directly in terms of measure ratios,

$$
P_\mu = \sum_{x \in X} \frac{P(x)}{\mu(x)}\, x.
\tag{5}
$$

Then by definition and linearity, $\langle P_\mu F \rangle_\mu = \sum_{x \in X} \frac{P(x)}{\mu(x)}\, f(x)\, \mu(x) = \sum_{x \in X} f(x)\, P(x) = \langle F \rangle$, as required. Evidently, the measure $\mu$ must be nonzero for all propositions for which $P$ is nonzero in order for such a state density to be well defined. This definition as a ratio of functionals will correctly reproduce the Jacobian derivative in the continuous case using a limiting prescription.

Trace.—An important reference measure which is nonzero for any nonzero proposition is the counting measure, or trace Tr, which evaluates to the rank of any proposition in $\Sigma_X$; for example, given $x, y, z \in X$ then $x + y + z$ is a rank-3 proposition and $\mathrm{Tr}(x + y + z) = 3$. Since the trace evaluates to unity on any atomic proposition, any state $P$ has a trace-density defined by equation (5) that is traditionally notated as $\rho$,

$$
\rho = \sum_{x \in X} P(x)\, x.
\tag{6}
$$

The trace-density is the only state density that is always defined and exactly determined by the probabilities of the atomic propositions $x \in X$. Because of this, the trace-representation of a state can be naturally interpreted as an inner product,

$$
\langle \rho, F \rangle = \mathrm{Tr}(\rho F) = \langle F \rangle,
\tag{7}
$$

between the trace-density and the observable, known as the Hilbert-Schmidt inner product. The trace will become particularly important when we generalize to quantum mechanics, which is why we mention it here. Indeed, the trace-density will be equivalent to the quantum mechanical density operator when extended to the noncommutative case. For continuous spaces the integral is traditionally preferred to the trace as a reference because the trace can frequently diverge.
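To make the density representations concrete, the following sketch checks numerically that the trace-density reproduces the expectation value through $\mathrm{Tr}(\rho F)$, and that a density with respect to a different reference measure gives the same answer. It assumes the optional operator representation with diagonal matrices; the state, the observable, the reference measure `mu`, and all helper names are hypothetical.

```python
from fractions import Fraction as Fr

X = ("g", "r")
P = {"g": Fr(3, 10), "r": Fr(7, 10)}   # hypothetical state
f = {"g": Fr(1), "r": Fr(-1)}          # hypothetical observable F


def diag(values):
    """Diagonal matrix: commuting classical observables are simultaneously diagonal."""
    n = len(values)
    return [[values[i] if i == j else Fr(0) for j in range(n)] for i in range(n)]


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


rho = diag([P[x] for x in X])          # trace-density, Eq. (6)
F = diag([f[x] for x in X])            # observable, Eq. (3)

hs_inner = trace(matmul(rho, F))       # Tr(rho F), Eq. (7)
direct = sum(f[x] * P[x] for x in X)   # <F>, Eq. (4)

# Density with respect to an arbitrary nonzero reference measure, Eq. (5).
mu = {"g": Fr(2), "r": Fr(5)}
P_mu = {x: P[x] / mu[x] for x in X}
via_mu = sum(P_mu[x] * f[x] * mu[x] for x in X)   # <P_mu F>_mu

print(hs_inner, direct, via_mu)
```

All three numbers coincide; the trace is singled out only because it assigns unit weight to every atomic proposition, so its density is always defined.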

State collapse.—If a question on the probability space is answered by some experiment, then the state indicating the plausibilities for future answers must be updated to reflect the acquired answer. The update process is known as Bayesian state conditioning, or state collapse. Specifically, if a proposition $y \in \Sigma_X$ is verified to be true, then the experimenter updates the expectation functional $\langle \cdot \rangle$ to the conditioned functional,

$$
\langle F \rangle_y = \frac{\langle y F \rangle}{P(y)},
\tag{8}
$$

that reflects the new information. For a proposition $z$, the conditional probability $\langle z \rangle_y$ has the traditional notation $P(z|y)$ and is read as “the probability of $z$ given $y$.”

From (8), any state density $P_\mu$ corresponding to $\langle \cdot \rangle$ will be similarly updated to a new density $P_{\mu|y}$ via a product,

$$
P_{\mu|y} = \frac{P_\mu\, y}{P(y)}.
\tag{9}
$$

Notably, conditioning the trace-density $\rho$ on an atomic proposition $x \in X$ will collapse the density to become the proposition itself, $\rho_{|x} = x$.

Note that the proposition $y$ serves a dual role in the conditioning procedure. First, it is used to compute the normalization probability $P(y)$. Second, it directly updates the state via a product action. The product indicates that future questions will be logically linked to the answered question with the and operation; that is, the knowledge about the system has been refined by the answered question. The process of answering a question about the system and then conditioning the state on the new information is called a measurement; moreover, since the proposition is a projection acting on the density, this kind of measurement is called a projective measurement.
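The conditioning procedure can be sketched for a discrete state as follows. This is an illustration only: the three-color sample space, the prior probabilities, and the helper names `prob` and `collapse` are assumptions made for the example, with a proposition represented as the set of atomic propositions it contains.

```python
from fractions import Fraction as Fr

# Hypothetical three-color sample space and prior state.
P = {"red": Fr(1, 2), "green": Fr(3, 10), "blue": Fr(1, 5)}


def prob(P, y):
    """Probability of a proposition y, given as a set of atomic propositions."""
    return sum(P[x] for x in y)


def collapse(P, y):
    """Condition the state on y being true: P(x|y) = P(xy) / P(y)."""
    norm = prob(P, y)                      # first role of y: normalization
    return {x: (P[x] / norm if x in y else Fr(0)) for x in P}   # second role: product


not_red = {"green", "blue"}                # the answered question "is it not red?"
posterior = collapse(P, not_red)
print(posterior)
```

Conditioning twice on the same proposition changes nothing further, which reflects the idempotence of projections, and conditioning on a single atomic proposition leaves a state concentrated entirely on that proposition.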

Bayes’ rule.—If we pick another proposition as the observable in (8) we can derive Bayes’ rule as a necessary consequence by interchanging and and then equating the joint probabilities ,

$$P(z|y) = \frac{P(y|z)\,P(z)}{P(y)}. \tag{10}$$

Bayes’ rule relates conditioned expectation functionals to one another and so is a powerful logical inference tool that drives much of the modern emphasis on the logical approach to probability theory.
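Bayes' rule (10) can be checked numerically on a small example. The sketch below assumes a hypothetical three-outcome state and two overlapping propositions represented as sets; `prob` and `cond_prob` are illustrative helper names.

```python
from fractions import Fraction as Fr

P = {"a": Fr(1, 2), "b": Fr(1, 3), "c": Fr(1, 6)}   # hypothetical state

def prob(P, prop):
    """Probability of a proposition given as a set of atomic outcomes."""
    return sum(P[x] for x in prop)

def cond_prob(P, z, y):
    """P(z|y) = P(zy) / P(y); the product zy is the set intersection."""
    return prob(P, z & y) / prob(P, y)

y = {"a", "b"}
z = {"b", "c"}

lhs = cond_prob(P, z, y)                              # P(z|y)
rhs = cond_prob(P, y, z) * prob(P, z) / prob(P, y)    # P(y|z) P(z) / P(y)
```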

Disturbance.—Conditioning, however, is not the only way that one can alter a state. One can also disturb a state without learning any information about it, which creates a transition to an updated expectation functional that we denote with a tilde according to,

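$$\begin{aligned}
\langle\tilde{F}\rangle &= \langle\mathcal{D}(F)\rangle, &&\text{(11a)}\\
\mathcal{D}(F) &= \sum_{x\in X}\langle F\rangle_{\mathcal{D}_x}\,x, &&\text{(11b)}\\
\langle F\rangle_{\mathcal{D}_x} &= \sum_{x'\in X} f(x')\,\mathcal{D}_x(x'). &&\text{(11c)}
\end{aligned}$$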

Here the disturbance is a map from to that is governed by a collection of states that specify transition probabilities from old propositions to new propositions . To be normalized, the transition states must satisfy , so that and therefore . Updating the state according to (11) is also known as Bayesian belief propagation Leifer and Spekkens (2011) and is more commonly written in the fully expanded form .
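The disturbance (11) can be sketched with a table of transition probabilities. The two-outcome space, the state, and the transition table `D` below are hypothetical. The code evaluates the disturbed average in two equivalent ways: by mapping the observable as in (11b) and (11c), and by propagating the state forward.

```python
from fractions import Fraction as Fr

P = {"a": Fr(1, 2), "b": Fr(1, 2)}                 # hypothetical state
# D[x][x_new]: hypothetical transition probabilities; each row sums to 1.
D = {"a": {"a": Fr(3, 4), "b": Fr(1, 4)},
     "b": {"a": Fr(1, 3), "b": Fr(2, 3)}}
f = {"a": 1, "b": -1}                              # illustrative observable

def expect(P, f):
    return sum(P[x] * f[x] for x in P)

def disturb_observable(D, f):
    """Eqs. (11b)-(11c): D(F) takes the value sum_x' f(x') D_x(x') at x."""
    return {x: sum(D[x][xn] * f[xn] for xn in D[x]) for x in D}

def propagate_state(P, D):
    """Equivalent update of the state: P~(x') = sum_x P(x) D_x(x')."""
    return {xn: sum(P[x] * D[x][xn] for x in P) for xn in P}

avg_via_observable = expect(P, disturb_observable(D, f))
avg_via_state = expect(propagate_state(P, D), f)
normalized = sum(propagate_state(P, D).values())
```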

Time evolution.—As an important special case, the time evolution of a Markovian stochastic process is a form of disturbance , known as a propagator, that is parametrized by a time interval . No information is learned as the system evolves, so the knowledge about the system as represented by the expectation functional can only propagate according to the laws governing the time evolution. For a Hamiltonian system, the time evolution is of Liouville form; that is, if we define a time-evolving observable as then we have , where is defined pointwise as the Poisson bracket. The differential equation implicitly specifies the form of the disturbance .

Correlation functions.—Correlations between observables at different times can be obtained by inserting a time-evolution disturbance between the observable measurements,

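$$\begin{aligned}
\langle F(0)\,G(t)\rangle &= \langle F\,\mathcal{D}_t(G)\rangle \\
&= \sum_{x\in X}P(x)\,f(x)\sum_{x'\in X}\mathcal{D}_{x,t}(x')\,g(x').
\end{aligned}\tag{12}$$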

Operationally this corresponds to measuring the observable , waiting an interval of time , then measuring the observable . Similarly, -time correlations can be defined with time-evolution disturbances between the observable measurements . Computing the correlation of an observable with itself at the same time will produce a higher moment .
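A two-time correlation of the form (12) can be sketched for a discrete-time Markov chain, which is an assumption made here for simplicity: the propagator for `steps` time steps is the repeated application of a hypothetical one-step transition table `step`.

```python
from fractions import Fraction as Fr

X = ["a", "b"]
P = {"a": Fr(1, 2), "b": Fr(1, 2)}                 # hypothetical initial state
# Hypothetical one-step transition probabilities step[x][x_new].
step = {"a": {"a": Fr(3, 4), "b": Fr(1, 4)},
        "b": {"a": Fr(1, 4), "b": Fr(3, 4)}}
f = {"a": 1, "b": -1}                              # illustrative observable

def apply(D, g):
    """Map the observable G to D(G), whose value at x is sum_x' D_x(x') g(x')."""
    return {x: sum(D[x][xn] * g[xn] for xn in X) for x in X}

def correlation(P, f, g, steps):
    """Eq. (12): <F(0)G(t)> = sum_x P(x) f(x) [D_t(G)](x)."""
    for _ in range(steps):
        g = apply(step, g)
    return sum(P[x] * f[x] * g[x] for x in X)

c0 = correlation(P, f, f, 0)   # zero delay: the second moment <F^2>
c1 = correlation(P, f, f, 1)
c2 = correlation(P, f, f, 2)
```

For this symmetric chain the autocorrelation halves with every step.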

Invasive measurement.—A system may also be disturbed during the physical process that implements conditioning, which will alter the state above and beyond the pure conditioning expression (8). With such an invasive measurement, one conditions a state after a disturbance induced by the measurement process has occurred; hence, one obtains a new state,

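$$\begin{aligned}
\langle\tilde{F}\rangle_y &= \frac{\langle\mathcal{D}(yF)\rangle}{\langle\mathcal{D}(y)\rangle} \\
&= \frac{\sum_{x\in X}P(x)\sum_{x'\in X}\mathcal{D}_x(yx')\,f(x')}{\sum_{x\in X}P(x)\,\mathcal{D}_x(y)},
\end{aligned}\tag{13}$$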

which is a composition of the measurement disturbance (11) followed by the pure conditioning (8).
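The invasive conditioning rule (13) can be compared with pure conditioning (8) in a short sketch. The state, the observable, and the transition table `shuffle` are hypothetical; with the identity table the result reduces to the pure conditioned average.

```python
from fractions import Fraction as Fr

P = {"a": Fr(1, 2), "b": Fr(1, 3), "c": Fr(1, 6)}   # hypothetical state
f = {"a": 1, "b": 2, "c": 3}                        # illustrative observable
y = {"b", "c"}                                      # proposition being verified

# No disturbance: every outcome stays where it is.
identity = {x: {xn: Fr(int(x == xn)) for xn in P} for x in P}
# Hypothetical disturbance induced by the measurement process.
shuffle = {"a": {"a": Fr(1, 2), "b": Fr(1, 2), "c": Fr(0)},
           "b": {"a": Fr(0), "b": Fr(1, 2), "c": Fr(1, 2)},
           "c": {"a": Fr(1, 2), "b": Fr(0), "c": Fr(1, 2)}}

def invasive_average(P, D, y, f):
    """Eq. (13): disturb first, then condition on the proposition y."""
    num = sum(P[x] * sum(D[x][xn] * f[xn] for xn in y) for x in P)
    den = sum(P[x] * sum(D[x][xn] for xn in y) for x in P)
    return num / den

pure = invasive_average(P, identity, y, f)
disturbed = invasive_average(P, shuffle, y, f)
```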

As we shall see later in Sec. III.2, the quantum projection postulate (Lüders’ rule) can be understood as an invasive measurement similar to (13), but not as pure conditioning similar to (8). This observation has also been recently emphasized by Leifer and Spekkens (2011), who show that a careful extension of (8) to the noncommutative quantum setting does not reproduce the projection postulate. Hence, a better understanding of classical invasive measurement should provide considerable insight into the quantum measurement process. However, to properly understand the implications of invasive measurements on the measurement of observables, we must consider the measurement process in more detail.

### ii.3 Detectors and probability observables

For a single ideal experiment that answers questions of interest with perfectly correlated independent outcomes, knowing the spectrum of an observable for that experiment is completely sufficient. However, in many (if not most) cases the independent propositions corresponding to the experimental outcomes are only imperfectly correlated with the questions of interest about the system. Since in such a case one may not have direct access to the questions of interest, one also may not have direct access to the observables of interest. One must instead infer information about the observables of interest indirectly from the correlated outcomes of the detector to which one does have access.

Joint sample space.—To handle this case formally, we first enlarge the sample space to include both the sample space of interest, which we call the system, and the accessible sample space, which we call the detector, . Questions about the system and the detector can be asked independently, so every question for the system can be paired with any question from the detector; therefore, the resulting joint sample space must be a product space, , where the products of propositions from different sample spaces commute. The Boolean algebra and observable algebra are constructed in the usual way from the joint sample space, and contain the algebras , , , and as subalgebras. When represented as operators on a Hilbert space, the corresponding joint representation exists within the tensor product of the system and detector space representations.

Product states.—If the probabilities of the system propositions are uncorrelated with the probabilities of the detector propositions under a joint state on the joint sample space, then the joint state can be written as a composition of independent states that are restricted to the sample spaces of the system and detector, . Just as the state has a linear extension to , its restrictions and have linear extensions and , respectively. Thus, for any joint observable an uncorrelated expectation has the form . Such an uncorrelated joint state is known as a product state. The name stems from the fact that for a simple product of system and detector observables the corresponding joint expectation decouples into a product of system and detector expectations separately, .

Similarly, general measures on the joint sample space can be product measures. A particularly useful example is the trace on , which is composed of the partial traces, and . The trace serves as a convenient reference measure since it is a product measure for which any joint state has a corresponding density. On continuous spaces the standard integral is also a product measure, , which tends to have noninfinitesimal densities.

Correlated states.—In addition to product states, the joint space admits a much larger class of correlated states where the detector and system questions are dependent on one another. With such a correlated state a measurement on the detector cannot be decoupled in general from a measurement on the system. Information gathered from a measurement on a detector under a correlated state will also indirectly provide information about the system, thus motivating the term “detector.”

Reduced states.—For a pure system observable or a pure detector observable , the average under a joint state will be equivalent to the average under a state restricted to either the system or the detector space, known as a reduced state, or a marginalized state. We can define such a reduced state by using the joint state density under any reference product measure , such as the trace Tr. It then follows that,

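$$\begin{aligned}
\langle F_X\rangle &= \langle\langle P_\mu\rangle_{\mu_Y}F_X\rangle_{\mu_X} = \langle P_{\mu_X}F_X\rangle_{\mu_X}, &&\text{(14a)}\\
\langle F_Y\rangle &= \langle\langle P_\mu\rangle_{\mu_X}F_Y\rangle_{\mu_Y} = \langle P_{\mu_Y}F_Y\rangle_{\mu_Y}. &&\text{(14b)}
\end{aligned}$$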

The quantities and are the reduced state densities that define the reduced states and with expectation functionals,

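$$\begin{aligned}
\langle F_X\rangle_X &= \langle P_{\mu_X}F_X\rangle_{\mu_X}, &&\text{(15a)}\\
\langle F_Y\rangle_Y &= \langle P_{\mu_Y}F_Y\rangle_{\mu_Y}. &&\text{(15b)}
\end{aligned}$$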

By definition, and . However, in general , , and unless is a product state. The resulting reduced expectations and are independent of the choice of reference product functional .
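For a discrete joint sample space with the trace as the reference measure, the reduced states of (14) and (15) are ordinary marginals. The sketch below assumes a hypothetical correlated joint state on a two-outcome system and a two-outcome detector; the labels and helper name `marginal` are illustrative.

```python
from fractions import Fraction as Fr

# Hypothetical correlated joint state P(x, y) with x in {g, r}, y in {b, y}.
joint = {("g", "b"): Fr(3, 10), ("g", "y"): Fr(1, 5),
         ("r", "b"): Fr(1, 10), ("r", "y"): Fr(2, 5)}

def marginal(joint, index):
    """Partial trace: sum the joint density over the other sample space."""
    reduced = {}
    for pair, p in joint.items():
        reduced[pair[index]] = reduced.get(pair[index], Fr(0)) + p
    return reduced

P_X = marginal(joint, 0)          # reduced system state
P_Y = marginal(joint, 1)          # reduced detector state

# A pure system observable has the same average under the joint state
# and under the reduced system state.
f_X = {"g": 1, "r": -1}
avg_joint = sum(p * f_X[x] for (x, _), p in joint.items())
avg_reduced = sum(P_X[x] * f_X[x] for x in P_X)

# The joint state is a product state only if it factors into its marginals.
is_product = all(joint[(x, y)] == P_X[x] * P_Y[y] for (x, y) in joint)
```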

Probability observables.—Any correlation between the system and detector in the joint state allows us to directly relate propositions on the detector to observables on the system. We can compute the relationship directly by using a closure relation and rearranging the conditioning procedure (8) to find,

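$$\begin{aligned}
P(y) &= \sum_{x\in X}P(y|x)\,P_X(x) = \langle E_y\rangle_X, &&\text{(16)}\\
E_y &= \sum_{x\in X}P(y|x)\,x. &&\text{(17)}
\end{aligned}$$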

The resulting set of system observables exactly correspond to the detector outcomes . Analogously to a set of independent probability observables, they form a partition of the system identity, but are indexed by detector propositions rather than by system propositions, . Such a set has the common mathematical name positive operator-valued measure (POVM) Nielsen and Chuang (2000), since it forms a measure over the detector sample space consisting of positive operators. However, we shall make an effort to refer to them as general probability observables to emphasize their physical significance. As long as the detector outcomes are not mutually exclusive with the system, the probability observables (17) will be a faithful representation of the reduced state of the detector in the observable space of the system.
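The construction (17) can be sketched directly from a table of conditional likelihoods. The three-outcome system, two-outcome detector, likelihood values, and system state below are hypothetical. Each probability observable is stored as the function on the system sample space whose value at x is P(y|x).

```python
from fractions import Fraction as Fr

X = ["x1", "x2", "x3"]
Y = ["y1", "y2"]
# Hypothetical conditional likelihoods P(y|x); each row sums to 1.
likelihood = {"x1": {"y1": Fr(9, 10), "y2": Fr(1, 10)},
              "x2": {"y1": Fr(1, 2), "y2": Fr(1, 2)},
              "x3": {"y1": Fr(1, 5), "y2": Fr(4, 5)}}

# Eq. (17): E_y = sum_x P(y|x) x.
E = {y: {x: likelihood[x][y] for x in X} for y in Y}

# The probability observables partition the system identity.
identity = {x: sum(E[y][x] for y in Y) for x in X}

# Detector probabilities as system averages, P(y) = <E_y>_X.
P_X = {"x1": Fr(1, 4), "x2": Fr(1, 2), "x3": Fr(1, 4)}   # hypothetical state
P_Y = {y: sum(P_X[x] * E[y][x] for x in X) for y in Y}
```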

Process tomography.—The probability observables are completely specified by the conditional likelihoods for a detector proposition to be true given that a system proposition is true. Such conditional likelihoods are more commonly known as response functions for the detector and can be determined via independent detector characterization using known reduced system states; such characterization is also known as detector tomography, or process tomography. Any good detector will then maintain its characterization with any unknown reduced system state. That is, a noninvasive coupling of such a good detector to an unknown system produces a correlated joint state according to , where is the unknown reduced system state prior to the interaction with the detector.

Generalized state collapse.—In addition to allowing the computation of detector probabilities, , probability observables also have the dual role of updating the reduced system state following a measurement on the detector. To see this, we apply the general rule for state collapse (8) for a detector proposition on the joint state to find,

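$$\begin{aligned}
\langle F_X\rangle_y &= \frac{\langle yF_X\rangle}{P(y)} = \frac{\sum_{x\in X}f_X(x)\,P(y|x)\,P_X(x)}{P(y)} \\
&= \frac{\langle E_yF_X\rangle_X}{\langle E_y\rangle_X},
\end{aligned}\tag{18}$$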

which can be seen as a generalization of the Bayesian conditioning rule (8) to account for the effect of an imperfectly correlated detector, and can also be understood as a form of Jeffrey’s conditioning Jeffrey (1965). For this reason, probability observables are commonly called effects of the generalized measurement. A reduced state density for the system updates as . Such a generalized measurement is nonprojective, so is not constrained to the disjoint questions on the sample space of the system. As a result, it answers questions on the system space ambiguously or noisily.
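The generalized collapse rule (18) amounts to multiplying the reduced system density by the probability observable and renormalizing. A minimal sketch, assuming a hypothetical prior and a single hypothetical probability observable `E_b`:

```python
from fractions import Fraction as Fr

X = ["g", "r"]
P_X = {"g": Fr(1, 2), "r": Fr(1, 2)}     # hypothetical prior system state
E_b = {"g": Fr(3, 5), "r": Fr(1, 5)}     # hypothetical probability observable

def collapse(P_X, E_y):
    """Eq. (18): the updated density is E_y P_X / <E_y>_X."""
    norm = sum(P_X[x] * E_y[x] for x in P_X)     # <E_y>_X = P(y)
    return {x: E_y[x] * P_X[x] / norm for x in P_X}

posterior = collapse(P_X, E_b)
f_X = {"g": 1, "r": -1}
cond_avg = sum(posterior[x] * f_X[x] for x in X)
```

The posterior keeps weight on both system outcomes, which is the sense in which the measurement answers the system question ambiguously.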

Weak measurement.—The extreme case of such an ambiguous measurement is a weak measurement, which is a measurement that does not (appreciably) collapse the system state. Such a measurement is inherently ambiguous to the extent that only a minuscule amount of information is learned about the system with each detection. Formally, the probability observables for a weak measurement are all nearly proportional to the identity on the system space. Typically, an experimenter has access to some control parameter (such as the correlation strength) that can alter the weakness of the measurement such that,

$$\forall y,\quad \lim_{\epsilon\to 0}E_y(\epsilon) = P_Y(y)\,1_X, \tag{19}$$

where is the nonzero probability of obtaining the detector outcome in the absence of any interaction with the system. Then for small values of the measurement leaves the system state nearly unperturbed, . The limit as such a control parameter is known as the weak measurement limit and is a formal idealization not strictly achievable in an experiment.
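To illustrate the weak limit (19), the sketch below assumes a hypothetical one-parameter family of two-outcome probability observables, E_±(ε) = (1 ± ε s(x))/2 with s(x) = ±1, which is not taken from the text. The code measures how far a single detection can move the system state as ε shrinks.

```python
from fractions import Fraction as Fr

X = ["g", "r"]
P_X = {"g": Fr(1, 3), "r": Fr(2, 3)}     # hypothetical system state
sign = {"g": 1, "r": -1}

def weak_E(eps):
    """Hypothetical family that tends to P_Y(y) 1_X with P_Y = 1/2 as eps -> 0."""
    return {"+": {x: (1 + eps * sign[x]) / 2 for x in X},
            "-": {x: (1 - eps * sign[x]) / 2 for x in X}}

def collapse(P_X, E_y):
    norm = sum(P_X[x] * E_y[x] for x in P_X)
    return {x: E_y[x] * P_X[x] / norm for x in P_X}

def max_shift(eps):
    """Largest change of any system probability over both detector outcomes."""
    E = weak_E(eps)
    return max(abs(collapse(P_X, E[y])[x] - P_X[x]) for y in E for x in X)

shifts = [max_shift(Fr(1, 10 ** k)) for k in range(1, 4)]   # eps = 0.1, 0.01, 0.001
no_coupling = max_shift(Fr(0))
```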

Strong measurement.—The opposite extreme case is a strong measurement or projective measurement, which is a measurement for which all outcomes are independent, as in (3). In other words, the probability observables are independent for a strong measurement. The projective collapse rule (8) can therefore be seen as a special case of the general collapse rule (18) from this point of view.

Measurement sequences.—A further benefit of the probability observable representation of a detector is that it becomes straightforward to discuss sequences of generalized measurements performed on the same system. For example, consider two detectors that successively couple to a system and have the outcomes and measured, respectively. To describe the full joint state of the system and both detectors requires a considerably enlarged sample space. However, if the detectors are characterized by two sets of probability observables and we can immediately write down the probability of both outcomes to occur as well as the resulting final collapsed system state without using the enlarged sample space,

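$$\begin{aligned}
P(yz) &= \langle E'_zE_y\rangle_X, &&\text{(20a)}\\
\langle F_X\rangle_{yz} &= \frac{\langle E'_zE_yF_X\rangle_X}{\langle E'_zE_y\rangle_X}. &&\text{(20b)}
\end{aligned}$$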

Similarly, a conditioned density takes the form . The detectors have been abstracted away to leave only their effect upon the system of interest.
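The sequence rules (20) can be sketched for two noninvasive detectors. Both sets of probability observables and the system state are hypothetical; the point is that the joint outcome probabilities and the collapsed state are computed on the system sample space alone.

```python
from fractions import Fraction as Fr

X = ["g", "r"]
P_X = {"g": Fr(1, 2), "r": Fr(1, 2)}     # hypothetical system state
# Hypothetical probability observables for the first and second detectors.
E1 = {"b": {"g": Fr(3, 5), "r": Fr(1, 5)},
      "y": {"g": Fr(2, 5), "r": Fr(4, 5)}}
E2 = {"u": {"g": Fr(3, 4), "r": Fr(1, 4)},
      "d": {"g": Fr(1, 4), "r": Fr(3, 4)}}

def joint_prob(y, z):
    """Eq. (20a): P(yz) = <E'_z E_y>_X."""
    return sum(P_X[x] * E2[z][x] * E1[y][x] for x in X)

def collapsed_state(y, z):
    """System density conditioned on both outcomes, as in Eq. (20b)."""
    p = joint_prob(y, z)
    return {x: E2[z][x] * E1[y][x] * P_X[x] / p for x in X}

table = {(y, z): joint_prob(y, z) for y in E1 for z in E2}
post = collapsed_state("b", "u")
```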

Generalized invasive measurement.—The preceding discussion holds provided that the detector can be noninvasively coupled to a reduced system state to produce a joint state . However, more generally the process of coupling a reduced detector state to the reduced system state will disturb both states as discussed for (11). The disturbance produces a joint state from the original product state of the system and detector according to,

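$$\begin{aligned}
\langle\widetilde{xy}\rangle &= \langle\langle\mathcal{D}(xy)\rangle_Y\rangle_X, &&\text{(21)}\\
\mathcal{D}(xy) &= \sum_{x'\in X}\sum_{y'\in Y}\mathcal{D}_{x',y'}(xy)\,x'y', &&\text{(22)}
\end{aligned}$$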

where are states specifying the joint transition probabilities for the disturbance. The noninvasive coupling is a special case of this where the reduced system state is unchanged by the coupling.

As a result, we must slightly modify the derivation of the probability observables (16) to properly include the disturbance,

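$$\begin{aligned}
\langle\tilde{y}\rangle &= \langle\langle\mathcal{D}(y)\rangle_Y\rangle_X = \langle\tilde{E}_y\rangle_X, &&\text{(23a)}\\
\tilde{E}_y &= \langle\mathcal{D}(y)\rangle_Y &&\text{(23b)}\\
&= \sum_{x\in X}\sum_{y'\in Y}P_Y(y')\,\mathcal{D}_{x,y'}(y)\,x.
\end{aligned}$$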

The modified probability observable includes both the initial detector state and the disturbance from the measurement. Detector tomography will therefore find the effective characterization probabilities .

The generalized collapse rule similarly must be modified to include the disturbance,

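$$\begin{aligned}
\langle\tilde{F}_X\rangle_y &= \frac{\langle\mathcal{E}_y(F_X)\rangle_X}{\langle\tilde{E}_y\rangle_X}, &&\text{(24)}\\
\mathcal{E}_y(F_X) &= \langle\mathcal{D}(yF_X)\rangle_Y &&\text{(25)}\\
&= \sum_{x'\in X}x'\sum_{y'\in Y}P_Y(y')\sum_{x\in X}\mathcal{D}_{x',y'}(yx)\,f(x).
\end{aligned}$$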

Surprisingly, we can no longer write the conditioning in terms of just the probability observables ; instead we must use an operation that takes into account both the coupling of the detector and the disturbance of the measurement in an active way. The measurement operation is related to the effective probability observable according to, .
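The difference between the effective probability observable (23b) and the measurement operation (25) can be sketched with a toy disturbance. Everything in the model is an assumption made for illustration: the new detector outcome responds to the old system value through the likelihoods `R`, the system flips with probability `flip`, and the initial detector value is ignored.

```python
from fractions import Fraction as Fr

X = ["g", "r"]
Y = ["b", "y"]
P_Y = {"b": Fr(1, 2), "y": Fr(1, 2)}     # hypothetical initial detector state
R = {"g": {"b": Fr(3, 5), "y": Fr(2, 5)},
     "r": {"b": Fr(1, 5), "y": Fr(4, 5)}}
flip = Fr(1, 4)                          # hypothetical chance of flipping the system

def D(x_old, y_old, x_new, y_new):
    """Hypothetical joint transition probability from (x_old, y_old) to (x_new, y_new)."""
    move = 1 - flip if x_new == x_old else flip
    return R[x_old][y_new] * move        # y_old plays no role in this toy model

def effective_E(y):
    """Eq. (23b): the effective probability observable, as a function of the old x."""
    return {xo: sum(P_Y[yo] * D(xo, yo, xn, y) for yo in Y for xn in X) for xo in X}

def operation(y, f):
    """Eq. (25): the measurement operation applied to a system observable f."""
    return {xo: sum(P_Y[yo] * D(xo, yo, xn, y) * f[xn] for yo in Y for xn in X)
            for xo in X}

one = {x: 1 for x in X}
f_X = {"g": 1, "r": -1}
disturbed = operation("b", f_X)
noninvasive = {x: effective_E("b")[x] * f_X[x] for x in X}
```

Comparing (23b) with (25) shows that applying the operation to the identity returns the effective probability observable, which the code confirms; applied to a nontrivial observable, the operation differs from the simple product with that probability observable.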

The change from observables to operations when the disturbance is included becomes particularly important for a sequence of invasive measurements. Consider an initial system state that is first coupled to a detector state via a disturbance , then conditioned on the detector proposition , then coupled to a second detector state via a disturbance , and finally conditioned on the detector proposition . The joint probability for obtaining the ordered sequence can be written as

$$\langle\langle\mathcal{D}_1(y\,\langle\mathcal{D}_2(z)\rangle_Z)\rangle_Y\rangle_X. \tag{26}$$

The effective probability observable for the ordered measurement sequence is no longer a simple product of the probability observables and as in (20a), but is instead an ordered composition of operations.

The ordering of operations also leads to a new form of postselected conditioning. Specifically, if we condition only on the second measurement of in an invasive sequence , we obtain,

 (27)

$$\mathcal{E}(\tilde{E}'_z) = \sum_{y'\in Y}\mathcal{E}_{y'}(\tilde{E}'_z) = \langle\mathcal{D}(\tilde{E}'_z)\rangle_Y. \tag{28}$$

The different position of the subscript serves to distinguish the postselected probability from the preselected probability corresponding to the reverse measurement ordering of . The operation appearing in the denominator is called a nonselective measurement since it includes the disturbance induced by the measurement coupling, but does not condition on any particular detector outcome. When the disturbance to the reduced system state vanishes, the conditioning becomes order-independent and both types of conditional probability reduce to .

The two forms of conditioning for invasive measurements in turn lead to a modified form of Bayes’ rule that relates the preselected conditioning of a sequence to the postselected conditioning of the same sequence,

 (29)

When the disturbance to the reduced system state vanishes, the nonselective measurement reduces to the identity operation, reduces to , reduces to , and we correctly recover the noninvasive Bayes’ rule (10).

### ii.4 Contextual values

Observable correspondence.—With the preliminaries about generalized state conditioning out of the way, we are now in a position to discuss the measurement of observables in more detail. First we observe an important corollary of the observable representation of the detector probabilities from (16): detector observables can be mapped into equivalent system observables,

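$$\begin{aligned}
\langle F_Y\rangle &= \sum_{y\in Y}f_Y(y)\,P(y) = \langle F_X\rangle_X, &&\text{(30)}\\
F_X &= \sum_{y\in Y}f_Y(y)\,E_y. &&\text{(31)}
\end{aligned}$$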

Note that the eigenvalues of the equivalent system observable are not the same as the eigenvalues of the original detector observable , but are instead their average under the detector response. If the system propositions were accessible then the system observable would allow nontrivial inference about the detector observable , provided that the probability observables were nonzero for all in the support of .

Contextual values.—A more useful corollary of the expansion (31) is that any system observable that can be expressed as a combination of probability observables may be equivalently expressed as a detector observable,

$$F_X = \sum_{y\in Y}f_Y(y)\,E_y \;\implies\; F_Y = \sum_{y\in Y}f_Y(y)\,y, \tag{32}$$

which is the classical form of our main result. Using this equivalence, we can indirectly measure such system observables using only the detector. We dub the eigenvalues of the detector observable the contextual values (CVs) of the system observable under the context of the specific detector characterized by a specific set of probability observables . The CVs form a generalized spectrum for the observable since they are associated with general probability observables for a generalized measurement and not independent probability observables for a projective measurement; the eigenvalues are a special case when the probability observables are the spectral projections of the observable being measured.

With this point of view, we can understand an observable as an equivalence class of possible measurement strategies for the same average information. That is, using appropriate pairings of probability observables and CVs, one can measure the same observable average in many different ways, . Each such expansion corresponds to a different experimental setup.

Moments.—Similarly, the th statistical moment of an observable can be measured in many different, yet equivalent, ways. For instance, the th moment of an observable can be found from the expansion (32) as,

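$$\begin{aligned}
\langle(F_X)^n\rangle &= \Big\langle\Big(\sum_{y\in Y}f_Y(y)\,E_y\Big)^n\Big\rangle_X \\
&= \sum_{y_1,\ldots,y_n\in Y}f_Y(y_1)\cdots f_Y(y_n)\,\langle E_{y_1}\cdots E_{y_n}\rangle_X.
\end{aligned}\tag{33}$$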

By examining the general collapse rule for measurement sequences (20a) we observe that the quantity must be the joint probability for a sequence of noninvasive measurements that couple the same detector to the system times in succession. Furthermore, the average in (33) is explicitly different from the th statistical moment of the raw detector results, .

We conclude that, for imperfectly correlated noninvasive detectors, one must perform measurement sequences to obtain the correct statistical moments of an observable using a particular set of CVs. Only for unambiguous measurements with independent probability observables do such measurement sequences reduce to simple powers of the eigenvalues being averaged with single measurement probabilities. If a single measurement by the detector is done per trial, then only the statistical moments of the detector observable can be inferred from that set of CVs, as opposed to the true statistical moments of the inferred system observable .

We can, however, change the CVs to define new observables that correspond to powers of the original observable, such as . These new observables can then be measured indirectly using the same experimental setup without the need for measurement sequences. The CVs for the th power of will not be simple powers of the CVs for unless the measurement is unambiguous.
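The distinction between the three second moments can be made concrete with a sketch. It assumes a hypothetical two-outcome detector whose likelihoods match the marble detector of Sec. II.4.1, a sign observable on the system, and candidate contextual values 3 and −2, which the code checks against (32) before using them. The system state is also hypothetical.

```python
from fractions import Fraction as Fr

X = ["g", "r"]
Y = ["b", "y"]
E = {"b": {"g": Fr(3, 5), "r": Fr(1, 5)},
     "y": {"g": Fr(2, 5), "r": Fr(4, 5)}}
f_X = {"g": 1, "r": -1}
cv = {"b": 3, "y": -2}                   # candidate contextual values, checked below
P_X = {"g": Fr(1, 2), "r": Fr(1, 2)}     # hypothetical system state

# Check Eq. (32): sum_y f_Y(y) E_y must reproduce F_X.
reconstructed = {x: sum(cv[y] * E[y][x] for y in Y) for x in X}

# Second moment of F_X from a two-measurement sequence, Eq. (33) with n = 2.
seq_moment = sum(cv[y1] * cv[y2] * sum(P_X[x] * E[y1][x] * E[y2][x] for x in X)
                 for y1 in Y for y2 in Y)

# Second moment of the raw detector results, sum_y f_Y(y)^2 P(y).
P_Y = {y: sum(P_X[x] * E[y][x] for x in X) for y in Y}
raw_moment = sum(cv[y] ** 2 * P_Y[y] for y in Y)

# New contextual values for the observable (F_X)^2, which equals 1_X here.
cv_sq = {"b": 1, "y": 1}
square_ok = all(sum(cv_sq[y] * E[y][x] for y in Y) == f_X[x] ** 2 for x in X)
single_shot = sum(cv_sq[y] * P_Y[y] for y in Y)
```

In this example the sequence and the redefined contextual values both give the true second moment 1, whereas the raw detector second moment is 6, and the contextual values for the square are (1, 1) rather than the squares (9, 4).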

Invasive measurements.—If the measurement is invasive, then the disturbance forces us to associate the CVs with the measurement operations and not solely with their associated probability operators in order to properly handle measurement sequences as in (25). Specifically, we must define the observable operation,

$$\mathcal{F}_X = \sum_{y\in Y}f_Y(y)\,\mathcal{E}_y, \tag{34}$$

which produces the identity similar to (32).

Correlated sequences of invasive observable measurements can be obtained by composing the observable operations,

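$$\langle(\mathcal{F}_X)^n(1_X)\rangle_X = \sum_{y_1,\ldots,y_n}f_Y(y_1)\cdots f_Y(y_n)\,\langle\mathcal{E}_{y_1}(\mathcal{E}_{y_2}(\cdots(\tilde{E}_{y_n})\cdots))\rangle_X. \tag{35}$$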

Such an -measurement sequence reduces to the th moment (33) when the disturbance vanishes.

If time evolution disturbance is inserted between different invasive observable measurements, then we obtain an invasive correlation function instead,

$$\langle\tilde{F}_X(0)\,G_X(t)\rangle = \langle\mathcal{F}_X(\mathcal{D}_t(\mathcal{G}_X(1_X)))\rangle_X. \tag{36}$$

When the observable measurements become noninvasive, then this correctly reduces to the noninvasive correlation function (12). Similarly, -time invasive correlations can be defined with time-evolution disturbances between the invasive observable measurements .

Conditioned averages.—In addition to statistical moments of the observable, we can also use the CVs to construct principled conditioned averages of the observable. Recall that in the general case of an invasive measurement sequence we can condition the observable measurement in two distinct ways. If we condition on an outcome before the measurement of we obtain the preselected conditioned average defined in (24). On the other hand, if the invasive conditioning measurement of happens after the invasive observable measurement then we must use the postselected conditional probabilities (27) to construct a postselected conditioned average,

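$${} = \frac{\sum_{y\in Y}f_Y(y)\,\langle\mathcal{E}_y(\tilde{E}'_z)\rangle_X}{\sum_{y\in Y}\langle\mathcal{E}_y(\tilde{E}'_z)\rangle_X} = \frac{\langle\mathcal{F}_X(\tilde{E}'_z)\rangle_X}{\langle\mathcal{E}(\tilde{E}'_z)\rangle}. \tag{37}$$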

The observable operation and the nonselective measurement encode the relevant details from the first measurement. When the disturbance to the reduced system state vanishes, both the preselected and the postselected conditioned averages simplify to the pure conditioned average defined in (18) that depends only on the system observable .

While the pure conditioned average is independent of the order of conditioning and is always constrained to the eigenvalue range of the observable, the postselected invasive conditioned average can, perhaps surprisingly, stray outside the eigenvalue range with ambiguous measurements. The combination of the amplified CVs and the disturbance can lead to a postselected average that lies anywhere in the full CV range, rather than just the eigenvalue range. We will see an example of this in Sec. II.4.2.
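A toy model shows how the right-hand side of (37) can leave the eigenvalue range. The assumptions are all hypothetical: a two-outcome detector with contextual values 3 and −2 for a sign observable (checked in the code), a final projective measurement of the system value z, and a disturbance in which each detector outcome resets the system to a definite value regardless of its prior value.

```python
from fractions import Fraction as Fr

X = ["g", "r"]
Y = ["b", "y"]
R = {"g": {"b": Fr(3, 5), "y": Fr(2, 5)},
     "r": {"b": Fr(1, 5), "y": Fr(4, 5)}}
cv = {"b": 3, "y": -2}
f_X = {"g": 1, "r": -1}
P_X = {"g": Fr(1, 2), "r": Fr(1, 2)}     # hypothetical system state

# The contextual values reproduce the sign observable, as required by Eq. (32).
cv_ok = all(sum(cv[y] * R[x][y] for y in Y) == f_X[x] for x in X)

def T_none(y, x_old, x_new):
    """No disturbance: the system value is unchanged."""
    return 1 if x_new == x_old else 0

def T_reset(y, x_old, x_new):
    """Hypothetical disturbance: outcome b leaves g behind, outcome y leaves r."""
    target = "g" if y == "b" else "r"
    return 1 if x_new == target else 0

def postselected_average(T, z):
    """Right-hand side of Eq. (37) for a projective postselection on z."""
    weight = {y: sum(P_X[x] * R[x][y] * T(y, x, z) for x in X) for y in Y}
    return sum(cv[y] * weight[y] for y in Y) / sum(weight.values())

avg_noninvasive = postselected_average(T_none, "g")
avg_invasive = postselected_average(T_reset, "g")
```

Without disturbance the conditioned average is the eigenvalue +1; with the resetting disturbance, postselecting on g forces the detector outcome b and the average becomes the contextual value 3.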

Inversion.—So far we have treated the CVs in the expansion (32) as known quantities. However, for a realistic detector situation, the CVs will need to be experimentally determined from the characterization of the detector and the observable that one wishes to measure. The reduced system state will generally not be known a priori, since the point of a detector is to learn information about the system in the absence of such prior knowledge. We can still solve for the CVs without knowledge of the system state, however, since the probability observables are only specified by the conditional likelihoods that can be obtained independently from detector tomography.

To solve for the CVs when the system state is presumed unknown, we rewrite (32) in the form,

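$$\begin{aligned}
F_X &= \sum_{x\in X}x\sum_{y\in Y}P(y|x)\,f_Y(y) \\
&= \sum_{x\in X}x\,\langle F_Y\rangle_x = \mathcal{S}(F_Y),
\end{aligned}\tag{38}$$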

where is the map that converts observables in the detector space to observables in the system space . Our goal is to invert this map and solve for the required spectrum of given a desired system observable . However, the inverse of such a map is not generally unique; for it to be uniquely invertible it must be one-to-one between system and detector spaces of equal size. If the detector space is smaller than the system, then no exact inverse solutions are possible; it may be possible, however, to find coarse-grained solutions that lose some information. Perhaps more alarmingly, if the detector space is larger than the system, then it is possible to have an infinite set of exact solutions.

When disturbance is taken into account as in (23), the equality (38) becomes,

$$F_X = \langle\mathcal{D}(F_Y)\rangle_Y = \mathcal{S}(F_Y), \tag{39}$$

so the composition of the disturbance and the detector expectation produces the map that must be inverted. Equation (38) is a special case when the reduced system state is unchanged by the coupling disturbance.

Pseudoinversion.—The entire set of possible solutions to (39) may be completely specified using the Moore-Penrose pseudoinverse of the map , which we denote as . The pseudoinverse is the inverse of the restriction of to the space ; that is, the null space of is removed from the detector space before constructing the inverse. We will show a practical method for computing the pseudoinverse using the singular value decomposition in the examples to follow.

Using the pseudoinverse, all possible solutions of (39) can be written compactly as,

$$F_Y = \mathcal{S}^+(F_X) + (\mathcal{I} - \mathcal{S}^+\mathcal{S})(G), \tag{40}$$

where is the identity map and is an arbitrary detector observable. The solutions specified by the pseudoinverse in this manner contain exact inverses and coarse-grainings as special cases.

Detector variance.—Since is a projection operation to the null space of , the second term of (40) lives in the null space of and is orthogonal to the first term. Therefore, the norm squared of has the form,

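$$\begin{aligned}
\|F_Y\|^2 &= \sum_{y}\big(f_Y(y)\big)^2 \\
&= \|\mathcal{S}^+(F_X)\|^2 + \|(\mathcal{I} - \mathcal{S}^+\mathcal{S})(G)\|^2,
\end{aligned}\tag{41}$$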

making the solution have the smallest norm.
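The solution family (40) and the norm decomposition (41) can be sketched for a detector that is larger than the system. The likelihood matrix below is hypothetical. Instead of a singular value decomposition, the sketch uses the closed form S⁺ = Sᵀ(SSᵀ)⁻¹, which is a substitute technique valid only when S has full row rank; it keeps the example in exact arithmetic with the standard library.

```python
from fractions import Fraction as Fr

# S[x][y] = P(y|x): hypothetical two-outcome system, three-outcome detector.
S = [[Fr(1, 2), Fr(1, 2), Fr(0)],
     [Fr(0), Fr(1, 2), Fr(1, 2)]]
f_X = [Fr(1), Fr(-1)]                    # desired system observable

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def apply(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def norm2(v):
    return sum(x * x for x in v)

# Full-row-rank shortcut for the Moore-Penrose pseudoinverse.
S_T = transpose(S)
S_plus = matmul(S_T, inverse_2x2(matmul(S, S_T)))

f_min = apply(S_plus, f_X)                      # Eq. (40) with G = 0
G = [Fr(3), Fr(0), Fr(0)]                       # arbitrary detector observable
null_part = [g - p for g, p in zip(G, apply(S_plus, apply(S, G)))]   # (I - S+S)(G)
f_alt = [m + k for m, k in zip(f_min, null_part)]
```

Both `f_min` and `f_alt` map back to the same system observable, and the squared norm of `f_alt` exceeds that of `f_min` by exactly the squared norm of the null-space part.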

The norm of the CV solution is relevant because the second moment of the detector observable is simply bounded by the norm squared . The second moment is similarly an upper bound for the variance of the detector observable . Therefore, the norm squared is a reasonable upper bound for the detector variance that one can make without prior knowledge of the state.

Mean-squared error.—The variance of governs the mean-squared error of any estimation of its average with a finite sample, such as an empirically measured sample in a laboratory. Specifically, one measures a sequence of detector outcomes of length , , and uses this finite sequence to estimate the average of via the unbiased estimator,

$$\overline{F_Y} = \frac{1}{n}\sum_{i}^{n}f_Y(y_i), \tag{42}$$

that converges to the true mean value as . The mean-squared error of this estimator from the true mean is the variance over the number of trials in the sequence . Hence, the maximum mean-squared error for a finite sequence of length must be bounded by the norm squared of the CVs divided by the length of the sequence,

$$\mathrm{MSE}(\overline{F_Y}) = \frac{\mathrm{Var}(F_Y)}{n} \le \frac{\|F_Y\|^2}{n}. \tag{43}$$

That is, the norm bounds the number of trials necessary to obtain an experimental estimation of observable averages to a desired precision using the imperfect detector.
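The bound (43) translates into a number of trials for a target precision. The sketch below assumes hypothetical contextual values and detector probabilities; `trials_needed` is an illustrative helper that inverts MSE = spread / n.

```python
import math
from fractions import Fraction as Fr

cv = {"b": 3, "y": -2}                       # hypothetical contextual values
P_Y = {"b": Fr(2, 5), "y": Fr(3, 5)}         # hypothetical detector probabilities

mean = sum(cv[y] * P_Y[y] for y in cv)
variance = sum(cv[y] ** 2 * P_Y[y] for y in cv) - mean ** 2
norm_sq = sum(v ** 2 for v in cv.values())   # state-independent upper bound

def trials_needed(spread, target_mse):
    """Smallest n with spread / n <= target_mse."""
    return math.ceil(spread / target_mse)

target = Fr(1, 100)
n_actual = trials_needed(variance, target)   # requires knowing the state
n_bound = trials_needed(norm_sq, target)     # worst case from Eq. (43)
```

Here the state-dependent variance would call for 600 trials, while the norm bound, which needs no knowledge of the state, calls for 1300.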

Pseudoinverse prescription.—Choosing the arbitrary observable to be therefore not only picks the solution that is uniquely related to by discarding the irrelevant null space of , but also picks the solution with the smallest norm, which places a reasonable upper bound on the statistical error. Without prior knowledge of the system state, the pseudoinverse solution does a reasonable job at obtaining an optimal fit to the relation (39). Moreover, when (39) is not satisfied by the direct pseudoinverse then an exact solution is impossible, but the pseudoinverse still gives the “best fit” coarse-graining of an exact solution in the least-squares sense. As such, we consider the direct pseudoinverse of to be the preferred solution in the absence of other motivating factors stemming from prior knowledge of the state being measured.

#### ii.4.1 Example: Ambiguous marble detector

As an illustrative example similar to the one given in the introduction, suppose that one wishes to know whether the color of a marble is green or red, but one is unable to examine the marble directly. Instead, one only has a machine that can display a blue light or a yellow light after it examines the marble color. In such a case, the marble colors are the propositions of interest, but the machine lights are the only accessible propositions. The lights may be correlated imperfectly with the marble color; for instance, if a blue light is displayed one may learn something about the possible marble color, but it may still be partially ambiguous whether the marble is actually green or actually red.

The relevant Boolean algebra for the system is , where is the proposition for the color green, is the proposition for the color red, and is the logical or of the two possible color propositions. We consider the task of measuring a simple color observable that distinguishes the colors with a sign using an imperfectly correlated detector.

The relevant Boolean algebra for the detector is , where is the proposition for the blue light, is the proposition for the yellow light, and . In order to measure the marble observable using only the detector, the experimenter must determine the proper form of the corresponding detector observable .

First, the experimenter characterizes the detector by sending in known samples and observing the outputs of the detector. After many characterization trials, the experimenter determines to some acceptable precision the four conditional probabilities,

$$\begin{aligned}
P(b|g) &= 0.6, & P(y|g) &= 0.4, &&\text{(44a)}\\
P(b|r) &= 0.2, & P(y|r) &= 0.8, &&\text{(44b)}
\end{aligned}$$

for the detector outcomes and given specific marble preparations and . These characterization probabilities completely determine the detector response in the form of its probability observables (17),

$$\begin{aligned}
E_b &= P(b|g)\,g + P(b|r)\,r, &&\text{(45a)}\\
E_y &= P(y|g)\,g + P(y|r)\,r. &&\text{(45b)}
\end{aligned}$$

By construction, .
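The characterization step can be sketched as a frequency estimate. The counts below are hypothetical and were chosen so that the estimated likelihoods coincide with (44); `estimate_likelihoods` is an illustrative helper name. The probability observables (45) are then read off and checked to sum to the identity on the marble colors.

```python
from fractions import Fraction as Fr

# Hypothetical characterization data: counts[x][y] is the number of times
# light y was displayed after a marble of known color x was sent in.
counts = {"g": {"b": 600, "y": 400},
          "r": {"b": 200, "y": 800}}

def estimate_likelihoods(counts):
    """Relative frequencies as estimates of the conditional likelihoods P(y|x)."""
    return {x: {y: Fr(n, sum(row.values())) for y, n in row.items()}
            for x, row in counts.items()}

likelihood = estimate_likelihoods(counts)

# Eq. (45): each probability observable is the function x -> P(y|x).
E_b = {x: likelihood[x]["b"] for x in counts}
E_y = {x: likelihood[x]["y"] for x in counts}
total = {x: E_b[x] + E_y[x] for x in counts}
```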

Second, the experimenter expands the system observable using the detector probability observables (45) and unknown contextual values (CVs) and (32),

$$F_X = (+1)\,g + (-1)\,r = f_Y(b)\,E_b + f_Y(y)\,E_y. \tag{46}$$

After expressing this relation as the equivalent matrix equation,

 (+1−1)