Measurement Uncertainty for Finite Quantum Observables
Measurement uncertainty relations are lower bounds on the errors of any approximate joint measurement of two or more quantum observables. The aim of this paper is to provide methods to compute optimal bounds of this type. The basic method is semidefinite programming, which we apply to arbitrary finite collections of projective observables on a finite dimensional Hilbert space. The quantification of errors is based on an arbitrary cost function, which assigns a penalty to getting result rather than , for any pair . This induces a notion of optimal transport cost for a pair of probability distributions, and we include an appendix with a short summary of optimal transport theory as needed in our context. There are then different ways to form an overall figure of merit from the comparison of distributions. We consider three, which are related to different physical testing scenarios. The most thorough test compares the transport distances between the marginals of a joint measurement and the reference observables for every input state. Less demanding is a test just on the states for which a “true value” is known in the sense that the reference observable yields a definite outcome. Finally, we can measure a deviation as a single expectation value by comparing the two observables on the two parts of a maximally entangled state. All three error quantities have the property that they vanish if and only if the tested observable is equal to the reference. The theory is illustrated with some characteristic examples.
Measurement uncertainty relations are quantitative expressions of complementarity. As Bohr often emphasized, the predictions of quantum theory are always relative to some definite experimental arrangement, and these settings often exclude each other. In particular, one has to make a choice of measuring devices, and typically quantum observables cannot be measured simultaneously. This often used term is actually misleading, because time has nothing to do with it. For a better formulation recall that quantum experiments are always statistical, so the predictions refer to the frequency with which one will see certain outcomes when the whole experiment is repeated very often. So the issue is not simultaneous measurement of two observables, but joint measurement in the same shot. That is, a device is a joint measurement of observable with outcomes and observable with outcomes , if it produces outcomes of the form in such a way that if we ignore outcome , the statistics of the outcomes is always (i.e., for every input state) the same as obtained with a measurement of , and symmetrically for ignoring and comparing with . It is in this sense that non-commuting projection valued observables fail to be jointly measurable.
However, this is not the end of the story. One is often interested in approximate joint measurements. One such instance is Heisenberg’s famous -ray microscope , in which a particle’s position is measured by probing it with light of some wavelength , which from the outset sets a scale for the accuracy of this position measurement. Naturally, the particle’s momentum is changed by the Compton scattering, so if we make a momentum measurement on the particles after the interaction, we will find a different distribution from what would have been obtained directly. Note that in this experiment we get from every particle a position value and momentum value. Moreover, errors can be quantified by comparing the respective distributions with some ideal reference: The accuracy of the microscope position measurement is judged by the degree of agreement between the distribution obtained and the one an ideal position measurement would give. Similarly, the disturbance of momentum is judged by comparing a directly measured distribution with the one after the interaction. The same is true for the uncontrollable disturbance of momentum. This refers to a scenario, where we do not just measure momentum after the interaction, but try to build a device that recovers the momentum in an optimal way, by making an arbitrary measurement on the particle after the interaction, utilizing everything that is known about the microscope, correcting all known systematic errors, and even using the outcome of the position measurement. The only requirement is that at the end of the experiment, for each individual shot, some value of momentum must come out. Even then it is impossible to always reproduce the pre-microscope distribution of momentum. The tradeoff between accuracy and disturbance is quantified by a measurement uncertainty relation. Since it simply quantifies the impossibility of a joint exact measurement, it simultaneously gives bounds on how an approximate momentum measurement irretrievably disturbs position. The basic setup is shown in Fig. 1.
Note that in this description of errors we did not ever bring in a comparison with some hypothetical “true value”. Indeed it was noted already by Kennard  that such comparisons are problematic in quantum mechanics. Even if one is willing to feign hypotheses about the true value of position, as some hidden variable theorists will, an operational criterion for agreement will always have to be based on statistical criteria, i.e., the comparison of distributions. Another fundamental feature of this view of errors is that it provides a figure of merit for the comparison of two devices, typically some ideal reference observable and and an approximate version of it. An “accuracy” in this sense is a promise that no matter which input state is chosen, the distributions will not deviate by more than . Such a promise does not involve a particular state. This is in contrast to preparation uncertainty relations, which quantify the impossibility to find a state for which the distributions of two given observables (e.g., position and momentum) are both sharp.
Measurement uncertainty relations in the sense described here were first introduced for position and momentum in , and were initially largely ignored. A bit earlier, an attempt by Ozawa  to quantify error-disturbance tradeoffs with state dependent and somewhat unfortunately chosen  quantities had failed, partly for reasons already pointed out in . When experiments confirmed some predictions of the Ozawa approach (including the failure of the error-disturbance tradeoff), a debate ensued [4, 16, 6, 2]. Its unresolved part is whether a meaningful role for Ozawa’s definitions can be found. Technically, the computation of measurement uncertainty for position and momentum in  carries over immediately to more general phase spaces [24, 3]. Apart from some special further computed instances [8, 5], this remained the only case in which sharp measurement uncertainty relations could be obtained. This was in stark contrast with preparation uncertainty, for which an algorithm based on solving ground state problems  efficiently provides the optimal relations for generic sets of observables. The main aim of the current paper is to provide efficient algorithms also for sharp measurement uncertainty relations.
In order to do that we restrict the setting in some ways, but allow maximal generality in others. We will restrict to finite dimensional systems, and reference observables which are projection valued and non-degenerate. Thus, each of the ideal observables will basically be given by an orthonormal basis in the same -dimensional Hilbert space. The labels of this basis are the outcomes of the measurement, where is a set of elements. We could choose all , but it will help to keep track of things using a separate set for each observable. Moreover, this includes the choice , the set of eigenvalues of some hermitian operator. We allow not just two observables but any finite number of them. This is makes some expressions easier to write down, since the sum of an expression involving observable and analogous one for observable becomes an indexed sum. We also allow much generality in the way errors are quantified. In earlier works, we relied on two elements to be chosen for each observable, namely a metric on the outcome set, and an error exponent , distinguishing, say absolute (), root-mean-square (), and maximal () deviations. Deviations were then averages of . Here we generalize further to an arbitrary cost function , which we take to be positive, and zero exactly on the diagonal (e.g., ), but not necessarily symmetric. Again this generality comes mostly as a simplification of notation. For a reference observable with outcome set and an approximate version with the same outcome set, this defines an error . Our aim is to provide algorithms for computing the uncertainty diagram associated with such data, of which Fig. 2 gives an example. The given data for such a diagram are projection valued observables , with outcome sets , for each of which we are given also a cost function for quantifying errors. An approximate joint measurement is then an observable with outcome set , and hence with POVM elements , where . By ignoring every output but one we get the marginal observables
and a corresponding tuple
of errors. The set of such tuples, as runs over all joint measurements, is the uncertainty region. The surface bounding this set from below describes the uncertainty tradeoffs. For we call it the tradeoff curve. Measurement uncertainty is the phenomenon that, for general reference observables , the uncertainty region is bounded away from the origin. In principle there are many ways to express this mathematically, from a complete characterization of the exact tradeoff curve, which is usually hard to get, to bounds which are simpler to state, but suboptimal. Linear bounds will play a special role in this paper.
We will consider three ways to build a single error quantity out of the comparison of distributions, denoted by , , and . These will be defined in Sect. 2. For every choice of observables and cost functions, each will give an uncertainty region, denoted by , , and , respectively. Since the errors are all based on the same cost function , they are directly comparable (see Fig. 2). We show in Sect. 3 that the three regions are convex, and hence characterized completely by linear bounds. In Sect. 4 we show how to calculate the optimal linear lower bounds by semidefinite programs. Finally, an Appendix collects the basic information on the beautiful theory of optimal transport, which is needed in Sects. 2.1 and 4.1.
2 Deviation measures for observables
Here we define the measures we use to quantify how well an observable approximates a desired observable . In this section we do not use the marginal condition (1), so is an arbitrary observable with the same outcome set as , i.e., we drop all indices identifying the different observables. Our error quantities are operational in the sense that each is motivated by an experimental setup, which will in particular provide a natural way to measure them. All error definitions are based on the same cost function , where is the “cost” of getting a result , when would have been correct. The only assumptions are that with iff .
As described above, we consider a quantum system with Hilbert space . As a reference observable we allow any complete von Neumann measurement on this system, that is, any observable whose the set of possible measurement outcomes has size and whose POVM elements () are mutually orthogonal projectors of rank ; we can then also write with an orthonormal basis of . For the approximating observable the POVM elements (with ) are arbitrary with and .
The comparison will be based on a comparison of output distributions, for which we use the following notations: Given a quantum state on this system, i.e., a density operator with and , and an observable such as , we will denote the outcome distribution by , so . This is a probability distribution on the outcome set and can be determined physically as the empirical outcome distribution after many experiments.
For comparing just two probability distributions and , a canonical choice is the “minimum transport cost”
where the infimum runs over the set of all couplings, or “transport plans” of to , i.e., the set of all probability distributions satisfying the marginal conditions and . The motivations for this notion, and the methods to compute it efficiently are described in the Appendix. Since is finite, the infimum is over a compact set, so it is always attained. Moreover, since we assumed and , we also have with equality iff . If one of the distributions, say , is concentrated on a point , only one coupling exists, namely . In this case we abbreviate , and get
i.e., the average cost of moving all the points distributed according to to .
2.1 Maximal measurement error .
The worst case error over all input states is
which we call the maximal measurement error. Note that, like the cost function and the transport costs , the measure need not be symmetric in its arguments, which is sensible as the reference and approximating observables have distinct roles. Similar definitions for the deviation of an approximating measurement from an ideal one have been made, for specific cost functions, in [4, 6] and  before.
The definition (5) makes sense even if the reference observable is not a von Neumann measurement. Instead, the only requirement is that and be general observables with the same (finite) outcome set , not necessarily of size . All our results below that involve only the maximal measurement error immediately generalize to this case as well.
One can see that it is expensive to determine the quantity experimentally according to the definition: one would have to measure and compare the outcome statistics and for all possible input states , which form a continuous set. The following definition of observable deviation alleviates this burden.
2.2 Calibration error .
Calibration is a process by which one tests a measuring device on inputs (or measured objects) for which the “true value” is known. Even in quantum mechanics we can set this up by demanding that the measurement of the reference observable on the input state gives a sharp value . In a general scenario with continuous outcomes this can only be asked with a finite error , which goes to zero at the end , but in the present finite scenario we can just demand . Since, for every outcome of a von Neumann measurement, there is only one state with this property (namely ) we can simplify even further, and define the calibration error by
Note that the calibration idea only makes sense when there are sufficiently many states for which the reference observable has deterministic outcomes, i.e., for projective observables .
A closely related quantity has recently been proposed by Appleby . It is formulated for real valued quantities with cost function , and has the virtue that it can be expressed entirely in terms of first and second moments of the probability distributions involved. So for any , let and be the mean and variance of , and the mean quadratic deviation of from . Then Appleby defines
Here we added the square to make Appleby’s quantity comparable to our variance-like (rather than standard deviation-like) quantities, and chose the letter , because Appleby calls this the -error. Since in the supremum we have also the states for which has a sharp distribution (i.e. ), we clearly have . On the other hand, let and with some parameter . Then one easily checks that , so is a pricing scheme in the sense defined in the Appendix. Therefore
Maximizing this expression over gives exactly (7). Therefore .
2.3 Entangled reference error .
In quantum information theory a standard way of providing a reference state for later comparison is by applying a channel or observable to one half of a maximally entangled system. Two observables would be compared by measuring them (or suitable modifications) on the two parts of a maximally entangled system. Let us denote the entangled vector by . Since later we will look at several distinct reference observables, the basis kets in this expression have no special relation to or its eigenbasis . We denote by the transpose of an operator in the basis, and by the observable with POVM elements , where is the complex conjugate of in -basis. These transposes are needed due to the well-known relation . We now consider an experiment, in which is measured on the first part and on the second part of the entangled system, so we get the outcome pair with probability
As is a complete von Neumann measurement, this probability distribution is concentrated on the diagonal () iff , i.e., there are no errors of relative to . Averaging with the error costs we get a quantity we call the entangled reference error
Note that this quantity is measured as a single expectation value in the experiment with source . Moreover, when we later want to measure different such deviations for the various marginals, the source and the tested joint measurement device can be kept fixed, and only the various reference observables acting on the second part need to be adapted suitably.
2.4 Summary and comparison
The quantities , and constitute three different ways to quantify the deviation of an observable from a projective reference observable . Nevertheless, they are all based on the same distance-like measure, the cost function on the outcome set . Therefore it makes sense to compare them quantitatively. Indeed, they are ordered as follows:
Moreover, as we argued before Eq. (10), if and only if , which is hence equivalent also to and .
3 Convexity of uncertainty diagrams
In this section we will consider tuples of projection valued non-degenerated observables, as described in the introduction. We will collect some basic properties of the uncertainty regions , where , that is,
For two observables and with the same outcome set we can easily realize their mixture, or convex combination by flipping a coin with probability for heads in each instance and then apply when heads is up and otherwise. In terms of POVM elements this reads . We show first that this mixing operation does not increase the error quantities from Sect. 2.
For the error quantity , is a convex function of , i. e. for and :
The basic fact used here is that the pointwise supremum of affine functions (i.e., those for which equality holds in the definition of a convex function) is convex. This is geometrically obvious, and easily verified from the definitions. Hence we only have to check that each of the error quantities is indeed represented as a supremum of functions, which are affine in the observable .
For we even get an affine function, because (10) is linear in . For equation (6) has the required form. For the definition (5) is as a supremum, but the function is defined as an infimum. However, we can use the duality theory described in the Appendix (e.g. in (49)) to write it instead as a supremum over pricing schemes, of an expression which is just the expectation of plus a constant, and therefore an affine function. Finally, for Appleby’s case (7), we get the same supremum, but over a subset of pricing schemes (the quadratic ones, see below (7)). ∎
The convexity of the error quantities distinguishes measurement from preparation uncertainty. Indeed, the variances appearing in preparation uncertainty relations are typically concave functions, because they arise from minimizing the expectation of over . Consequently, the preparation uncertainty regions may have gaps, and non-trivial behaviour on the side of large variances. The following proposition will show that measurement uncertainty regions are better behaved.
For every cost function on a set we can define a “radius” , the largest transportation cost from the uniform distribution (the “center” of the set of probability distributions) and a “diameter” , the largest transportation cost between any two distributions:
Let observables and cost functions be given, and define and . Then, for , the uncertainty regions is a convex set and has the following (monotonicity) property: When and such that , then .
Let us first clarify how to make the worst possible measurement , according to the various error criteria, for which we go back to the setting of Sect. 2, with just one observable , and cost function . In all cases, the worst measurement is one with constant and deterministic output, i.e., . For and such a measurement will have , and we can choose to make this equal to . For we get instead the average, which is maximized by .
We can now make a given joint measurement worse by replacing it partly by a bad one, say for the first observable . That is, we set, for ,
Then all marginals for are unchanged, but . Now as changes from to , the point in the uncertainty diagram will move continuously in the first coordinate direction from to the point in which the first coordinate is replaced by its maximum value (see Fig. 6(left)). Obviously, the same holds for every other coordinate direction, which proves the monotonicity statement of the proposition.
Let and be two observables, and let be their mixture. For proving the convexity of we will have to show that every point on the line between and can be attained by a tuple of errors corresponding to some allowed observable (see Fig. 6 (right)). Now lemma 1 tells us that every component of is convex, which implies that . But, by monotonicity, this also means that is in again, which shows the convexity of . ∎
3.1 Example: Phase space pairs
As is plainly visible from Fig. 2, the three error criteria considered here usually give different results. However, under suitable circumstances they all coincide. This is the case for conjugate pairs related by Fourier transform . The techniques needed to show this are the same as for the standard position/momentum case [6, 22], and in addition imply that the region for preparation uncertainty is also the same.
In the finite case there is not much to choose: We have to start from a finite abelian group, which we think of as position space, and its dual group, which is then the analogue of momentum space. The unitary connecting the two observables is the finite Fourier associated with the group. The cost function needs to be translation invariant, i.e., . Then, by an averaging argument, we find for all error measures that a covariant phase space observable minimizes measurement uncertainty (all three versions). The marginals of such an observable can be simulated by first doing the corresponding reference measurement, and then adding some random noise. This implies  that . But we know more about this noise: It is independent of the input state so that the average and the maximum of the noise (as a function of the input) coincide, i.e., . Finally, we know that the noise of the position marginal is distributed according to the position distribution of a certain quantum state which is, up to normalization and a unitary parity inversion, the POVM element of the covariant phase space observable at the origin. The same holds for the momentum noise. But then the two noise quantities are exactly related like the position and momentum distributions of a state, and the tradeoff curve for that problem is exactly preparation uncertainty, with variance criteria based on the same cost function.
If we choose the discrete metric for , the uncertainty region depends only on the number of elements in the group we started from . The largest for all quantities is the distance from a maximally mixed state to any pure state, which is . The exact tradeoff curve is then an ellipse, touching the axes at the points and . The resulting family of curves, parameterized by , is shown in Fig. 7. In general, however, the tradeoff curve requires the solution of a non-trivial family of ground state problems, and cannot be given in closed form. For bit strings of length , and the cost some convex function of Hamming distance there is an expression for large .
4 Computing uncertainty regions via semidefinite programming
We show here how the uncertainty regions – and therefore optimal uncertainty relations – corresponding to each of the three error measures can actually be computed, for any given set of projective observables and cost functions . Our algorithms will come in the form of semidefinite programs (SDPs) [20, 19], facilitating efficient numerical computation of the uncertainty regions via the many existing program packages to solve SDPs. Moreover, the accuracy of such numerical results can be rigorously certified via the duality theory of SDPs. To obtain the illustrations in this paper we used the CVX package [10, 9] under MATLAB.
As all our uncertainty regions (for ) are convex and closed (Sect. 3), they are completely characterized by their supporting hyperplanes (for a reference to convex geometry see ). Due to the monotonicity property stated in Prop. 2 some of these hyperplanes just cut off the set parallel along the planes . The only hyperplanes of interest are thus those with nonnegative normal vectors (see Fig. 8). Each hyperplane is completely specified by its “offset” away from the origin, and this function determines :
In fact, due to homogeneity we can restrict everywhere to the subset of vectors that, for example, satisfy , suggesting an interpretation of the as weights of the different uncertainties . Our algorithms will, besides evaluating , also allow to compute an (approximate) minimizer , so that one can plot the boundary of the uncertainty region by sampling over , which is how the figures in this paper were obtained.
Let us further note that knowledge of for some immediately yields a quantitative uncertainty relation: every error tuple attainable via a joint measurement is constrained by the affine inequality , meaning that some weighted average of the attainable error quantities cannot become too small. When is strictly positive, this excludes in particular the zero error point . The obtained uncertainty relations are optimal in the sense that there exists which attains strict equality .
Having reduced the computation of an uncertainty region essentially to determining (possibly along with an optimizer ), we now treat each case in turn.
4.1 Computing the uncertainty region
On the face of it, the computation of the offset looks daunting: expanding the definitions we obtain
where the infimum runs over all joint measurements with outcome set , inducing the marginal observables according to (1), and the supremum over all sets of quantum states , and where the transport costs are given as a further infimum (3) over the couplings of and .
The first simplification is to replace the infimum over each coupling , via a dual representation of the transport costs, by a maximum over optimal pricing schemes , which are certain pairs of functions , where runs over some finite label set . The characterization and computation of the pairs , which depend only on the chosen cost function on , is described in the Appendix. The simplified expression for the optimal transport costs is then
We can then continue our computation of :
where denotes the maximum eigenvalue of a Hermitian operator . Note that , which one can also recognize as the dual formulation of the convex optimization over density matrices, so that
We obtain thus a single constrained minimization:
Making the constraints on the POVM elements of the joint observable explicit and expressing the maginal observables directly in terms of them by (1), we finally obtain the following SDP representation for the quantity :
The derivation above shows further that, when , the attaining the infimum equals , where is the marginal coming from a corresponding optimal joint measurement . Since numerical SDP solvers usually output an (approximate) optimal variable assignment, one obtains in this way directly a boundary point of when all are strictly positive. If vanishes, a corresponding boundary point can be computed via from an optimal assignment for the POVM elements .
For completeness we also display the corresponding dual program  (note that strong duality holds, and the optima of both the primal and the dual problem are attained):
4.2 Computing the uncertainty region
To compute the offset function for the calibration uncertainty region we use the last form in (6) and recall that the projectors onto the sharp eigenstates of (see Sect. 2.2) are exactly the POVM elements for :
where again the infimum runs over all joint measurements , inducing the marginals , and we have turned, for each , the maximum over into a linear optimization over probabilities () subject to the normalization constraint . In the last step, we have made the explicit via (1).
The first main step towards a tractable form is von Neumann’s minimax theorem [14, 18]: As the sets of joint measurements and of probabilities are both convex and the optimization function is an affine function of and, separately, also an affine function of the , we can interchange the infimum and the supremum:
The second main step is to use SDP duality  to turn the constrained infimum over into a supremum, abbreviating the POVM elements as :
Putting everything together, we arrive at the following SDP representation for the offset quantity :
The dual SDP program reads (again, strong duality holds, and both optima are attained):
This dual version can immediately be recognized as a translation of Eq. (27) into SDP form, via an alternative way of expressing the maximum over (or via the linear programming dual of from Eq. (29)).
To compute a boundary point of lying on the supporting hyperplane with normal vector , it is best to solve the dual SDP (33) and obtain from an (approximate) optimal assignment of the . Again, this works when , whereas otherwise one can compute from an optimal assingment of the . From many primal-dual numerical SDP solvers (such as CVX [10, 9]), one can alternatively obtain optimal POVM elements also from solving the primal SDP (32) as optimal dual variables corresponding to the constraints , and compute from there.
4.3 Computing the uncertainty region
As one can see by comparing the last expressions in the defining equations (6) and (10), respectively, the evaluation of is quite similar to (27), except that the maximum over is replaced by a uniform average over . This simply corresponds to fixing for all in Eq. (29), instead of taking the supremum. Therefore, the primal and dual SDPs for the offset are