An Efficient Algorithm for Optimizing Adaptive Quantum Metrology Processes

Alexander Hentschel and Barry C. Sanders
Institute for Quantum Information Science, University of Calgary, Calgary, Alberta, Canada T2N 1N4

Quantum-enhanced metrology infers an unknown quantity with accuracy beyond the standard quantum limit (SQL). Feedback-based metrological techniques are promising for beating the SQL but devising the feedback procedures is difficult and inefficient. Here we introduce an efficient self-learning swarm-intelligence algorithm for devising feedback-based quantum metrological procedures. Our algorithm can be trained with simulated or real-world trials and accommodates experimental imperfections, losses, and decoherence.

Precise metrology underpins modern science and engineering. However, the ‘standard quantum limit’ (SQL) restricts achievable precision, beyond which measurement must be treated on a quantum level. Quantum-enhanced metrology (QEM) aims to beat the SQL by exploiting entangled or squeezed input states and a sophisticated detection strategy giovannetti:010401 (); Pezze&Smerzi:PhysRevLett.2009 (); Luo:Lett.Math.Phys:2000 (). Feedback-based QEM is most effective because accumulated measurement data are exploited to maximize information gain in subsequent measurements, but finding an optimal QEM policy for a given measurement device is computationally intractable even for pure input states, unitary evolution, and projective measurements. Typically, policies have been devised by clever guessing PhysRevLett.85.5098 (); PhysRevA.63.053804 () or brute-force numerical optimization PhysRevA.63.053804 (). Recently we introduced swarm-intelligence reinforcement learning to devise optimal policies for measuring an interferometric phase shift QLearning:hentschel:PRL:2010 (). That algorithm is space efficient, i.e. its memory requirement is a polynomial function of the number N of times that the interferometer is used, in contrast to the exponentially expensive brute-force algorithm. Although this result demonstrated the power of reinforcement learning, the algorithm requires a run-time that is exponential in N and a perfect interferometer, thereby effectively restricting its applicability to proofs of principle. Here we report a space- and time-efficient algorithm (based on new heuristics) for devising QEM policies. Our algorithm works for noisy evolution and loss, thus making reinforcement learning viable for autonomous design of feedback-based QEM in a real-world setting.

We restrict our focus to single-parameter QEM. Interferometric phase estimation is the canonical quantum metrology problem and is applicable to measurements of time, displacements, and imaging. Therefore, we develop and benchmark our algorithm for autonomous policy design in this context. To beat the SQL, we employ an entangled sequence of input photons, feedback control, and direct measurements of the interferometer output. For adaptive phase estimation, the interferometer processes one photon at a time. Each input photon can be in either of two modes, corresponding to the interferometer’s two paths. Thus, a time-ordered sequence of N photons implements an N-qubit state.

Figure 1: Adaptive feedback scheme for estimating an unknown interferometric phase. The input state is fed into the unital quantum channel one qubit at a time, and each output qubit is either measured or lost. The processing unit (PU) shifts the controllable interferometric phase after each successful measurement, prior to processing the next qubit.

We assume that the interferometric transformation (Fig. 1) can be expressed as a tensor product of identical quantum channels (i.e. completely positive, trace-nonincreasing maps Hou:J.Phys.A:2010 ()), each depending on the unknown phase shift being estimated and on a controllable phase. Each channel is a noisy version of the restrictive single-qubit unitary process normally considered in QEM. Our tensor-product description corresponds to the assumption that the interferometric process, other than the control, does not change during the measurement procedure. Photons of the N-qubit input state enter the interferometer one by one and are transformed by the channel. Detectors measure where each photon exits, thereby implementing a projection-valued measure whose elements yield one bit if the photon is not lost. The processing unit (PU) modifies the controllable phase shift according to the measurement history up to the current photon, prior to the next photon being processed. After all input qubits have passed through the interferometer, the PU reports an estimate of the unknown phase shift. A policy is a ‘behavior pattern’ for the PU, i.e., a collection of rules that tell the PU how to set the controllable phase given the measurement history and which phase estimate to report at the end.

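The error probability distribution of the policy determines the precision of the phase estimate. As the phase is cyclic over [0, 2π), this precision is quantified by the Holevo variance $V_{\mathrm H} = S^{-2} - 1$, where $S = |\langle e^{i(\tilde\theta - \theta)} \rangle|$ is the sharpness of the distribution of the estimate $\tilde\theta$ around the true phase $\theta$ Holevo1984 (). Evaluating $S$ exactly requires computing time exponential in N and is thus computationally intractable. However, from trial runs of the policy with randomly chosen phases, we can infer an estimate of the sharpness, and hence of the error of the phase estimate. For QEM, $V_{\mathrm H}$ should scale better than the SQL, $V_{\mathrm H} \propto N^{-1}$, and as close as possible to the ultimate Heisenberg limit, $V_{\mathrm H} \propto N^{-2}$ Luo:Lett.Math.Phys:2000 (); giovannetti:010401 (); Pezze&Smerzi:PhysRevLett.2009 ().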
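The sharpness estimate from trial runs can be illustrated with a short sketch. It assumes only the definitions of the Holevo variance: each trial pairs a true phase with the policy's estimate, the sample mean of exp(i(estimate − phase)) replaces the expectation value, and the Holevo variance follows as S^{-2} − 1. The function names are illustrative, and the Gaussian errors in the example are synthetic rather than produced by any policy from this paper.

```python
import numpy as np


def sampled_sharpness(true_phases, estimates):
    """Estimate the sharpness S = |<exp(i(estimate - theta))>| from trial runs."""
    errors = np.asarray(estimates, dtype=float) - np.asarray(true_phases, dtype=float)
    return float(np.abs(np.mean(np.exp(1j * errors))))


def holevo_variance(sharpness):
    """Holevo variance V_H = S**-2 - 1; it vanishes for a perfect estimator."""
    return sharpness ** -2 - 1.0


# Example: simulated trials with synthetic Gaussian estimation errors.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2 * np.pi, size=10_000)
estimate = (theta + rng.normal(0.0, 0.2, size=theta.size)) % (2 * np.pi)
S = sampled_sharpness(theta, estimate)
print(S, holevo_variance(S))
```

Because only exp(i(estimate − phase)) enters, wrapping an estimate by 2π leaves S unchanged. For Gaussian errors of small standard deviation σ, V_H = e^{σ²} − 1 ≈ σ², so the example should print a Holevo variance close to 0.04.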

For unitary evolution, the interferometer transforms each input qubit by a rotation generated by the Pauli matrices about a unit vector, with the interferometric phase difference as the rotation angle. Without loss of generality, we can restrict our analysis to a fixed choice of this unit vector. However, because of imperfections, a real-world interferometer is represented by a non-unitary quantum channel. We assume an unbiased interferometer, i.e. a random input qubit (the maximally mixed state) is mapped to itself, corresponding to a unital channel. Hence, the channel can be written as a mixture of unitary rotations, whose mixing distribution over rotation angles and axes may be continuous or discrete and countable Mendl&Wolf:ComMathPhys2009 (),


where, for an ideal interferometer, the distribution is concentrated on a single unitary rotation. In contrast, a total weight of the distribution below unity corresponds to an input-state-independent loss rate, and quantum noise is incorporated by taking the distribution to be a general one whose means coincide with the ideal phase and rotation axis. We simulate noise using normal distributions with these means and small standard deviations, corresponding to a visibility close to unity. For an optical interferometer, phase noise corresponds to path-length-difference fluctuations, and noise in the rotation axis corresponds to beam-splitter reflectivity fluctuations. We utilize the input state

from PhysRevA.63.053804 (); PhysRevLett.85.5098 (); QLearning:hentschel:PRL:2010 (), with Wigner’s d-matrix Group_theory_and_its_application_to_QM.Wigner.1931 (). Each basis state of this superposition is a permutationally-symmetric state with a fixed number of qubits in one mode and the remaining qubits in the other Hentschel:Permutationally-Symmetric_Qubit_Strings:J.Phys.A:2011 (). The state is appealing because it allows precision close to the Heisenberg limit PhysRevA.63.053804 (); PhysRevLett.85.5098 () and is robust against loss QLearning:hentschel:PRL:2010 (), but our learning methods work for other states as well.
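The link between a small standard deviation of the noise and a visibility close to unity can be checked numerically for the phase part of the noise. This is a minimal sketch under simplifying assumptions that are ours, not the paper's channel model: a single photon with ideal fringe P(0) = (1 + cos φ)/2 and zero-mean Gaussian phase jitter δ of standard deviation σ, with no axis noise and no loss. Averaging over δ multiplies the fringe contrast by exp(−σ²/2). The helper names are hypothetical.

```python
import numpy as np


def averaged_fringe(phi, sigma, samples=200_000, seed=1):
    """Monte Carlo mean of P(0) = (1 + cos(phi + delta)) / 2, delta ~ N(0, sigma**2)."""
    rng = np.random.default_rng(seed)
    delta = rng.normal(0.0, sigma, size=samples)
    return float(np.mean(0.5 * (1.0 + np.cos(phi + delta))))


def visibility_gaussian_phase_noise(sigma):
    """Closed form: <cos(phi + delta)> = cos(phi) * exp(-sigma**2 / 2)."""
    return float(np.exp(-sigma ** 2 / 2.0))


sigma = 0.2                              # illustrative phase-noise standard deviation
p_max = averaged_fringe(0.0, sigma)      # bright fringe
p_min = averaged_fringe(np.pi, sigma)    # dark fringe
visibility = (p_max - p_min) / (p_max + p_min)
print(visibility, visibility_gaussian_phase_noise(sigma))
```

For σ = 0.2 both numbers should be close to exp(−0.02) ≈ 0.980.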

Figure 2: (a) Decision tree representation of a GLS policy for a smaller (solid) and a larger (entire tree) number of input qubits. For each path in the tree, the inner nodes represent the applied feedback phases and the leaf shows the final phase estimate. At each depth, one measurement result directs the path to the left and the other to the right. (b) Embedding the best policy for one fewer qubit in the policy space, shown for three input qubits. From the best two-qubit policy, a three-qubit policy is generated as a guideline. The initial candidate policies for three input qubits are chosen according to probability density (3), indicated by the shaded area around the guideline. (For clarity, this low-dimensional case is depicted, although in practice only candidate policies for larger numbers of qubits are chosen according to (3).)

The control flow graph of any deterministic policy under lossless conditions and for a fixed N-qubit input state can be represented as a binary decision tree of depth N, with an example shown in Fig. 2(a). Each node of the tree corresponds to one specific state of the experiment and represents the resultant action of the policy. Numerically optimizing all node values is computationally intractable because the number of nodes grows exponentially with N. Therefore, we restrict our search to policies that implement a ‘generalized logarithmic search’ (GLS) heuristic, described below, because the set of all GLS policies can be parametrized by a number of parameters that grows only linearly with N and contains phase estimation policies with optimal precision scaling with respect to N QLearning:hentschel:PRL:2010 ().
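The size of an unrestricted decision tree follows from a short count. Assuming, as in Fig. 2(a), that every inner node of the depth-N tree stores one feedback phase and every leaf stores one phase estimate,

$$\underbrace{\sum_{m=0}^{N-1} 2^{m}}_{\text{inner nodes}} + \underbrace{2^{N}}_{\text{leaves}} = (2^{N} - 1) + 2^{N} = 2^{N+1} - 1,$$

so a general deterministic policy already has $2^{11} - 1 = 2047$ free values at $N = 10$, and the count roughly doubles with each additional qubit.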

For a uniform prior of the unknown phase, the GLS heuristic commences with a fixed initial feedback phase. After each measurement result, the feedback phase is shifted by a policy-specific step whose sign is set by the result. If the qubit is lost, the feedback phase remains unchanged. After all input qubits are processed, the GLS heuristic reports a phase estimate determined by the recorded measurement results. According to this parametrization, every GLS policy for an N-qubit input state is represented by a vector in the policy space, and any such vector is a valid policy. As any such policy utilizes a string of N input qubits, we refer to it as an N-qubit policy. Every such vector implements a GLS because its entries are variable, in contrast to logarithmic search (LS), for which they are fixed Nowak:GLS:2008 (). The N-qubit LS policy is itself contained in the policy space but does not surpass the SQL. The duality between GLS policies and points in the policy space allows the use of function optimization techniques to search for an optimal policy with minimum Holevo variance. Unfortunately, this optimization problem is non-convex and hence difficult QLearning:hentschel:PRL:2010 ().
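To make the parametrization concrete, the sketch below simulates trials of a GLS-type policy. It rests on assumptions that are ours and not taken from the text: the feedback phase starts at zero, each result x ∈ {0, 1} shifts it by (−1)^x Δ_m, the final feedback phase is reported as the estimate, and each photon passes independently through a lossy interferometer with P(x = 0) = (1 + sin(θ − Φ))/2, i.e. a separable input rather than the entangled state used in the paper. The step sizes π/2^m stand in for logarithmic search, and the names `run_gls_trial` and `gls_sharpness` are hypothetical.

```python
import numpy as np


def run_gls_trial(deltas, theta, rng, loss_rate=0.0):
    """Simulate one trial of a GLS-type policy (hypothetical conventions, see text).

    deltas    -- policy vector (Delta_1, ..., Delta_N), one step size per qubit
    theta     -- unknown phase in [0, 2*pi)
    loss_rate -- probability that a photon is lost before detection
    Returns the phase estimate reduced to [0, 2*pi).
    """
    phi = 0.0  # assumed initial feedback phase for a uniform prior
    for delta in deltas:
        if rng.random() < loss_rate:
            continue  # lost photon: no result, feedback phase unchanged
        # Assumed fringe offset by pi/2, so the outcome indicates on which
        # side of the feedback phase the unknown phase lies.
        p0 = 0.5 * (1.0 + np.sin(theta - phi))
        x = 0 if rng.random() < p0 else 1
        phi += (-1) ** x * delta  # step towards the indicated side
    return phi % (2 * np.pi)


def gls_sharpness(deltas, trials, rng, loss_rate=0.0):
    """Sampled sharpness of a policy over trials with uniformly random phases."""
    theta = rng.uniform(0.0, 2 * np.pi, size=trials)
    est = np.array([run_gls_trial(deltas, t, rng, loss_rate) for t in theta])
    return float(np.abs(np.mean(np.exp(1j * (est - theta)))))


rng = np.random.default_rng(7)
N = 8
ls_policy = [np.pi / 2 ** m for m in range(1, N + 1)]  # logarithmic-search step sizes
print(gls_sharpness(ls_policy, 4000, rng))
print(gls_sharpness(np.zeros(N), 4000, rng))  # no feedback: sharpness near 0
```

Every vector of step sizes yields a valid policy in this sketch, which is what lets a generic optimizer search the policy space directly.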

Particle swarm optimization (PSO) algorithms Eberhart1 (); Engelbrecht:2006:Computational_Swarm_Intelligence () have been highly successful for non-convex optimization. PSO is a ‘collective intelligence’ strategy from the field of machine learning that learns by trial and error and performs as well as or better than simulated annealing and genetic algorithms SA_vs_PSO_Ethni:2009 (); Kennedy98matchingalgorithms (); Groenwold:2002 (). We have shown that PSO also delivers an autonomous approach to devising adaptive phase-estimation policies for ideal interferometry QLearning:hentschel:PRL:2010 (); QLearning:hentschel:ITNG:2010 ().

To search for an optimal policy, the PSO algorithm models a ‘swarm’ of ‘particles’ that move in the policy space. A particle’s position represents a candidate policy, which is initially chosen at random. Furthermore, each particle remembers the best position it has visited so far (including its current position). In addition, each particle communicates with the other particles in its neighborhood. We adopt the common approach of fixing each neighborhood in advance, regardless of the particles’ positions, by arranging the particles in a ring topology: all particles within a fixed maximum distance on the ring belong to a particle’s neighborhood. In each iteration, the PSO algorithm updates the positions of all particles in a round-based manner as follows.

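  1. Each particle samples the sharpness of its current position with trial runs.

  2. Each particle re-samples the sharpness of its personal-best policy, and the performance of this policy is taken to be the arithmetic mean of all its sharpness evaluations.

  3. Each particle replaces its personal-best position with its current position if the latter is sharper, and

  4. communicates its personal-best position and the associated sharpness to all members of its neighborhood.

  5. Each particle determines the sharpest policy found so far by any particle in its neighborhood (including itself), and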

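  6. moves according to

$$v \leftarrow \omega\, v + \beta_{\mathrm p}\, r_{\mathrm p}\,(p - x) + \beta_{\mathrm g}\, r_{\mathrm g}\,(g - x), \qquad x \leftarrow x + v, \qquad (2)$$

where $x$, $v$, $p$, and $g$ denote the particle’s position (its current policy), velocity, personal-best position, and the best position in its neighborhood; the remaining coefficients are described below.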
The arrows indicate that the value on the right is assigned to the variable on the left. The damping factor assists convergence, and the two random factors are uniformly distributed numbers from the interval [0, 1] that are re-generated each time Eq. (2) is evaluated. The ‘exploitation weight’ parametrizes the attraction of a particle to its personal-best position, and the ‘exploration weight’ describes its attraction to the best position in its neighborhood. To improve convergence, we bound each component of the velocity by a maximum value. These user-specified parameters determine the swarm’s behavior, and we selected their values by tests to maximize the probability of finding an optimal policy.
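The six steps and the update rule can be sketched as a generic particle swarm optimizer. This is a minimal sketch, not the authors' implementation: the damping factor `omega`, the weights `beta_p` and `beta_g`, the velocity bound `v_max`, the ring radius `r`, and the swarm size are placeholder values (the tuned values are not reproduced here), the random factors are drawn once per particle per update, and `noisy_fitness` is a hypothetical stand-in for sampled sharpness. Positions are not wrapped into [0, 2π), which is harmless for a periodic objective.

```python
import numpy as np


def ring_neighbors(i, n_particles, r):
    """Indices of particles within ring distance r of particle i (including i)."""
    return [(i + k) % n_particles for k in range(-r, r + 1)]


def pso_maximize(noisy_fitness, dim, n_particles=20, iterations=150, omega=0.7,
                 beta_p=1.5, beta_g=1.5, v_max=0.5, r=1, seed=0):
    """Generic ring-topology PSO for a noisy objective (placeholder parameters)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 2 * np.pi, size=(n_particles, dim))  # random initial positions
    v = np.zeros_like(x)
    p = x.copy()                                              # personal-best positions
    p_sum = np.array([noisy_fitness(xi) for xi in x])         # sum of samples of each p
    p_cnt = np.ones(n_particles)                              # number of samples of each p
    for _ in range(iterations):
        # Step 1: sample the fitness of every current position once.
        fx = np.array([noisy_fitness(xi) for xi in x])
        # Step 2: re-sample every personal best; its score is the mean of all samples.
        p_sum += np.array([noisy_fitness(pi) for pi in p])
        p_cnt += 1
        # Step 3: adopt the current position as personal best if it scores higher.
        better = fx > p_sum / p_cnt
        p[better] = x[better]
        p_sum[better] = fx[better]
        p_cnt[better] = 1
        p_mean = p_sum / p_cnt
        # Steps 4-5: best personal best within each particle's ring neighborhood.
        g = np.empty_like(x)
        for i in range(n_particles):
            nbrs = ring_neighbors(i, n_particles, r)
            g[i] = p[nbrs[int(np.argmax(p_mean[nbrs]))]]
        # Step 6: damped velocity update with per-particle random factors, clamped.
        r_p = rng.random((n_particles, 1))
        r_g = rng.random((n_particles, 1))
        v = omega * v + beta_p * r_p * (p - x) + beta_g * r_g * (g - x)
        v = np.clip(v, -v_max, v_max)
        x = x + v
    return p[int(np.argmax(p_sum / p_cnt))]


# Toy usage: a periodic objective peaked at `target` plus sampling noise stands in
# for the sampled sharpness of a policy (hypothetical, not a quantum simulation).
noise_rng = np.random.default_rng(42)
target = np.array([1.0, 2.0, 3.0])


def noisy_fitness(z):
    return float(np.mean(np.cos(z - target)) + noise_rng.normal(0.0, 0.01))


best = pso_maximize(noisy_fitness, dim=3)
print(np.angle(np.exp(1j * (best - target))))  # deviation from target, wrapped to (-pi, pi]
```

The sketch mirrors steps 1 to 3 in judging a personal best by the mean of all its samples while a newly visited position competes with a single sample; the averaging suppresses sampling noise that would otherwise let lucky evaluations dominate.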

The trial runs for assessing sharpness can be simulated or performed with a real-world experiment. For a finite number of trial runs, the sampled sharpness has statistical errors that can prevent the PSO algorithm from learning optimal solutions Bartz-Beielstein&el:Metaheuristics:2007 (). We reduce sharpness errors by averaging over multiple samples in step 2 SWIS-CONF-2005-004 (). However, for sufficiently large N, the PSO algorithm fails to learn good policies from scratch due to sharpness errors QLearning:hentschel:ITNG:2010 (). Therefore, we maintain our earlier strategy of running the learning algorithm independently for each N only when N is small. For larger N, our new heuristic bootstraps a starting point for the optimization of an N-qubit policy from the best (N−1)-qubit policy. Our heuristic exploits the fact that an (N−1)-qubit policy can be used as an N-qubit policy by ignoring the last measurement result. For large N, the optimal (N−1)-qubit policy, when used with the N-qubit input, estimates phases only slightly less accurately than an optimal N-qubit policy (see Fig. 4). Furthermore, the performance difference between the optimal N-qubit policy and the (N−1)-qubit policy decreases with increasing N because the relative change in qubit number decreases with increasing N. Therefore, a good (N−1)-qubit policy is a valuable starting point for optimizing an N-qubit policy.

Previously learned policies are utilized at the initialization step of the PSO algorithm. The initial policy of each particle, i.e. its starting position, is selected with probability density


with a truncated normal distribution. See Fig. 2(b) for an illustration of this strategy. One standard deviation determines how similar the first N−1 actions of the newly generated policies are to those of the template policy; the other determines the extent to which the action for the new qubit agrees with the last action of the template policy. We found that suitably chosen values of these two standard deviations yield a high success rate for our PSO heuristic.
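A minimal sketch of this bootstrapping step follows. It encodes our reading of the description rather than a transcription of Eq. (3): the first N−1 entries are drawn from normal distributions centered on the template policy with standard deviation `sigma`, the new N-th entry is centered on the template's last entry with standard deviation `sigma_new`, and truncation is implemented by rejecting draws outside [0, 2π). The helper names and numerical values are hypothetical.

```python
import numpy as np


def sample_truncated_normal(mean, std, low, high, rng):
    """Draw from N(mean, std**2) restricted to [low, high) by rejection sampling."""
    while True:
        z = rng.normal(mean, std)
        if low <= z < high:
            return z


def bootstrap_candidate(template, sigma, sigma_new, rng, low=0.0, high=2 * np.pi):
    """Candidate N-qubit policy seeded from an (N-1)-qubit template policy."""
    template = np.asarray(template, dtype=float)
    guideline = np.append(template, template[-1])  # repeat last action for the new qubit
    stds = np.full(guideline.size, sigma, dtype=float)
    stds[-1] = sigma_new
    return np.array([sample_truncated_normal(m, s, low, high, rng)
                     for m, s in zip(guideline, stds)])


rng = np.random.default_rng(3)
template = [1.6, 0.8, 0.4]  # hypothetical best 3-qubit policy
swarm_init = [bootstrap_candidate(template, 0.05, 0.2, rng) for _ in range(5)]
print(np.round(swarm_init, 3))
```

A small `sigma` keeps candidates close to the proven template, while a larger `sigma_new` lets the swarm explore the action for the added qubit.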

For small N and perfect interferometry, we verified that our new PSO algorithm with a fixed swarm size learns optimal N-qubit policies regardless of whether each policy’s sharpness is evaluated exactly (which requires time exponential in N) or sampled from trial runs (which requires runtime polynomial in N when simulated). Therefore, in each PSO iteration we sample the sharpness of each particle’s current position and personal-best position, using a number of trial runs polynomial in N. As we run the PSO algorithm for a constant number of iterations, the entire optimization process requires a number of trials polynomial in N. However, to obtain an N-qubit policy, we have to optimize policies for fewer input qubits beforehand, as our algorithm requires an (N−1)-qubit policy for devising an N-qubit policy for all but the smallest N. Therefore, learning an N-qubit policy requires a number of trial runs that is still polynomial in N. When the trials are simulated, the computational complexity of our PSO heuristic is polynomial in N (hence efficient), as a single trial run can be simulated in polynomial time Hentschel:Permutationally-Symmetric_Qubit_Strings:J.Phys.A:2011 (). Once learned, the execution of an N-qubit policy requires N entangled input qubits.
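Why bootstrapping keeps the total cost polynomial can be seen from a one-line bound. Assume for illustration (the exponents are placeholders, not values from the text) that optimizing an n-qubit policy with constant swarm size and iteration count takes at most $c\,n^{a}$ trial runs, and that each simulated trial costs at most $c'\,N^{b}$ operations. Summing over all stages $n = 1, \dots, N$,

$$\sum_{n=1}^{N} c\, n^{a} \;\le\; c\, N \cdot N^{a} = c\, N^{a+1},$$

so the total simulated cost is at most $c\, c'\, N^{a+b+1}$, which remains polynomial in N. For instance, with $a = 2$ the sum is exactly $c\, N(N+1)(2N+1)/6$.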

Figure 3: Holevo phase variance of PSO-optimized policies compared to other schemes vs. the number of input qubits N, for (a) the entangled input state and (b) a separable input state. The dashed line shows the SQL. Due to limited computational resources, some simulations are carried out only up to smaller N. (Loss rate is given in percent.)

We trained our PSO algorithm with simulated trial runs for various noise levels and loss rates. In each case, our PSO algorithm tries to find the sharpest policy for a given N. As the algorithm uses stochastic optimization, it is not guaranteed to learn the optimal policy in every run and must be run several times independently for each N. Nevertheless, within the limits of available computational resources, the PSO algorithm succeeded in at least 25% of the runs, independently of N. We compared the policies generated by our new machine-learning algorithm to our previous numerically-optimized policies QLearning:hentschel:PRL:2010 (), the Berry-Wiseman (BW) policy PhysRevLett.85.5098 (), and policies obtained by brute-force numerical optimization PhysRevA.63.053804 ().
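The quoted success rate translates directly into a number of independent runs. Assuming, for illustration, that runs are independent and each succeeds with probability at least $p = 0.25$, the probability that at least one of $k$ runs finds the optimal policy is

$$P_k \;\ge\; 1 - (1 - p)^{k} = 1 - 0.75^{k},$$

so $k = 10$ runs give $P_{10} \ge 1 - 0.0563 \approx 0.944$, and $P_k \ge 0.99$ requires $k \ge \ln 0.01 / \ln 0.75 \approx 16.01$, i.e. $k = 17$ runs.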

We first discuss policies for a noiseless, lossless setup, i.e. for unitary evolution. Fig. 3(a) shows that our new method, tested to the limits of available computational resources, outperforms the BW policy. We quantify the performance difference by fitting the power-law scaling of the Holevo variance with N. Our policies yield a larger scaling exponent than the one inferred for the BW policy. Furthermore, our new efficient method greatly surpasses our previous optimization scheme QLearning:hentschel:PRL:2010 () by more than tripling the range of N for which policies can be developed, while maintaining the same precision. The inefficient brute-force optimization was carried out in the full policy space, i.e. without restriction to GLS policies. However, the resulting globally optimal policies perform better than our PSO-optimized policies only by a constant factor and do not yield better scaling. As expected, the PSO algorithm yields policies approaching the SQL for separable input states (Fig. 3(b)) Luo:Lett.Math.Phys:2000 (); giovannetti:010401 (); Pezze&Smerzi:PhysRevLett.2009 ().
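The scaling comparison amounts to fitting a power law V_H ∝ N^{−α} on a log-log scale. The sketch below shows one way to do this with an ordinary least-squares fit via `numpy.polyfit`; the data are synthetic with a known exponent and are not results from Fig. 3, and the function name is hypothetical. For reference, the SQL corresponds to α = 1 and the Heisenberg limit to α = 2.

```python
import numpy as np


def fit_scaling_exponent(n_qubits, holevo_var):
    """Least-squares fit of log V_H = log C - alpha * log N; returns (alpha, C)."""
    slope, intercept = np.polyfit(np.log(n_qubits), np.log(holevo_var), 1)
    return float(-slope), float(np.exp(intercept))


N = np.arange(4, 21)
V = 0.8 * N ** -1.5  # synthetic data with a known exponent, not results from the paper
alpha, C = fit_scaling_exponent(N, V)
print(alpha, C)
```

With real, noisy variances the fitted exponent depends on the chosen range of N, so exponents from different schemes are best compared over the same range.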

Our new algorithm delivers the first QEM policies optimized for a simulated imperfect interferometer with loss and Gaussian quantum noise. Under noisy conditions, policies generated by our new algorithm perform significantly better than policies optimized for perfect interferometry. As expected, the performance difference increases with the noise level (see Fig. 6). We verified that our algorithm also devises superior policies for non-Gaussian noise by using skew-normal distributions with nonzero skewness for both noise parameters Azzalini:Scand.J.Stat.:2005 (). We find that a nonzero third standardized moment, with the variances kept as before, does not reduce the performance of the policies learned by our new PSO algorithm (see Fig. 5).
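Skew-normal noise of this kind can be generated from Azzalini's stochastic representation. The sketch assumes, as our reading of "variances kept as before", that samples are rescaled to the same mean and standard deviation as in the Gaussian case. The names `skew_normal_noise` and `skew_normal_skewness` are hypothetical, and `shape` is Azzalini's shape parameter α rather than the skewness itself, which the second function computes.

```python
import numpy as np


def skew_normal_noise(mean, std, shape, size, rng):
    """Skew-normal samples rescaled to a prescribed mean and standard deviation.

    Uses X = d*|U0| + sqrt(1 - d**2)*U1 with d = shape / sqrt(1 + shape**2)
    and U0, U1 independent standard normal variables.
    """
    d = shape / np.sqrt(1.0 + shape ** 2)
    u0 = rng.standard_normal(size)
    u1 = rng.standard_normal(size)
    x = d * np.abs(u0) + np.sqrt(1.0 - d ** 2) * u1
    m = d * np.sqrt(2.0 / np.pi)  # mean of X
    s = np.sqrt(1.0 - m ** 2)     # standard deviation of X
    return mean + std * (x - m) / s


def skew_normal_skewness(shape):
    """Theoretical third standardized moment of the skew-normal distribution."""
    d = shape / np.sqrt(1.0 + shape ** 2)
    m = d * np.sqrt(2.0 / np.pi)
    return (4.0 - np.pi) / 2.0 * m ** 3 / (1.0 - m ** 2) ** 1.5


rng = np.random.default_rng(5)
samples = skew_normal_noise(mean=0.0, std=0.05, shape=4.0, size=200_000, rng=rng)
z = (samples - samples.mean()) / samples.std()
print(samples.mean(), samples.std(), np.mean(z ** 3), skew_normal_skewness(4.0))
```

Rescaling after sampling keeps the first two moments fixed, so any change in policy performance can be attributed to the third moment alone.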

In summary, we have devised an efficient machine-learning algorithm that autonomously constructs adaptive-feedback measurement policies for time-independent, single-parameter estimation problems. Our only prerequisite is a criterion, available during the training phase, for comparing the success of candidate policies. Within the limits of available computational resources, our PSO-generated policies outperform all known schemes for adaptive single-shot phase estimation with direct measurement of the channel output. Our algorithm learns to account for experimental errors and loss, thereby making time-consuming error modeling and extensive calibration unnecessary.

Acknowledgments: We thank B. Bunk and Humboldt-Universität zu Berlin for computational resources, and D. W. Berry, L. Maccone, and H. M. Wiseman for comments on an earlier draft. This project has been supported by iCORE, AITF, NSERC, and CIFAR. BCS is supported by a CIFAR Fellowship.


  • (1) V. Giovannetti, S. Lloyd, and L. Maccone, “Quantum Metrology,” Phys. Rev. Lett., vol. 96, no. 1, p. 010401, 2006.
  • (2) L. Pezzé and A. Smerzi, “Entanglement, Nonlinear Dynamics, and the Heisenberg Limit,” Phys. Rev. Lett., vol. 102, p. 100401, Mar 2009.
  • (3) S. Luo, “Quantum Fisher Information and Uncertainty Relations,” Letters in Mathematical Physics, vol. 53, pp. 243–251, 2000.
  • (4) D. W. Berry and H. M. Wiseman, “Optimal States and Almost Optimal Adaptive Measurements for Quantum Interferometry,” Phys. Rev. Lett., vol. 85, no. 24, pp. 5098–5101, 2000.
  • (5) D. W. Berry, H. M. Wiseman, and J. K. Breslin, “Optimal input states and feedback for interferometric phase estimation,” Phys. Rev. A, vol. 63, no. 5, p. 053804, 2001.
  • (6) A. Hentschel and B. C. Sanders, “Machine Learning for Precise Quantum Measurement,” Phys. Rev. Lett., vol. 104, p. 063603, Feb 2010.
  • (7) J. Hou, “A characterization of positive linear maps and criteria of entanglement for quantum states,” J. Phys. A, vol. 43, no. 38, p. 385201, 2010.
  • (8) A. S. Holevo, “Covariant measurements and imprimitivity systems,” in Quantum Probability and Applications to the Quantum Theory of Irreversible Processes, vol. 1055 of Lecture Notes in Mathematics, pp. 153–172, Springer Berlin / Heidelberg, 1984.
  • (9) C. Mendl and M. Wolf, “Unital Quantum Channels – Convex Structure and Revivals of Birkhoff’s Theorem,” Commun. Math. Phys., vol. 289, pp. 1057–1086, 2009, doi:10.1007/s00220-009-0824-2.
  • (10) E. P. Wigner, Group Theory and its Application to the Quantum Mechanics of Atomic Spectra. Academic Press, New York, 1971.
  • (11) A. Hentschel and B. C. Sanders, “Ordered Measurements of Permutationally-Symmetric Qubit Strings,” J. Phys. A, vol. 44, p. 115301, Feb 2011.
  • (12) R. D. Nowak, “Generalized Binary Search,” in Proc. 46th Allerton Conference on Communications, Control, and Computing, Illinois, USA, 2008, pp. 568–574, IEEE, New York, 2008.
  • (13) R. Eberhart and J. Kennedy, “A new optimizer using particle swarm theory,” in Proc. Sixth International Symposium on Micro Machine and Human Science, Nagoya, Japan, 1995, pp. 39–43, IEEE, New York, 1995.
  • (14) A. P. Engelbrecht, Fundamentals of Computational Swarm Intelligence. John Wiley & Sons, England, 2006.
  • (15) S. Ethni, B. Zahawi, D. Giaouris, and P. Acarnley, “Comparison of Particle Swarm and Simulated Annealing Algorithms for Induction Motor Fault Identification,” in Proc. 7th IEEE Int. Conf. on Industrial Informatics, Cardiff, UK, June 2009, IEEE, New York, 2009.
  • (16) J. Kennedy and W. M. Spears, “Matching Algorithms to Problems: An Experimental Test of the Particle Swarm and Some Genetic Algorithms on the Multimodal Problem Generator,” in Proc. IEEE Congress on Evolutionary Computation, Anchorage, Alaska, pp. 78–83, IEEE, New York, May 1998.
  • (17) P. Fourie and A. Groenwold, “The particle swarm optimization algorithm in size and shape optimization,” Structural and Multidisciplinary Optimization, vol. 23, pp. 259–267, 2002.
  • (18) A. Hentschel and B. C. Sanders, “Machine Learning for Adaptive Quantum Measurement,” in Proc. 7th Int. Conf. Information Technology: New Generations, Las Vegas, 2010, pp. 506–511, IEEE, New York, Apr 2010.
  • (19) T. Bartz-Beielstein, D. Blum, and J. Branke, “Particle Swarm Optimization and Sequential Sampling in Noisy Environments,” in Metaheuristics, vol. 39 of Operations Research/Computer Science Interfaces Series, pp. 261–273, Springer US, 2007.
  • (20) J. Pugh, Y. Zhang, and A. Martinoli, “Particle swarm optimization for unsupervised robotic learning,” in Proc. Swarm Intelligence Symposium, Pasadena, 2005, pp. 92–99, IEEE, New York, 2005.
  • (23) A. Azzalini, “The Skew-normal Distribution and Related Multivariate Families,” Scandinavian Journal of Statistics, vol. 32, no. 2, pp. 159–188, 2005.


Figure 4: Holevo phase variance of PSO-optimized GLS policies. Purple crosses: N-qubit policy used with the N-qubit input state. Brown pluses: (N−1)-qubit policy used with the N-qubit input state (the last measurement result is ignored by the policy).
Figure 5: Holevo phase variance of policies optimized for simulated Gaussian quantum noise and for skew-normal quantum noise with nonzero skewness. The noise distributions have the same standard deviations in both cases.
Figure 6: Holevo phase variance of policies from QLearning:hentschel:PRL:2010 (), which are optimized for a perfect interferometer, compared to the policies optimized by our new algorithm for the specific imperfections. The performance of all policies is evaluated for Gaussian quantum noise and loss. (a) For low noise, there is no noticeable performance enhancement. (b) For larger noise, the policies optimized for perfect interferometry show a poorer performance scaling with N, whereas the policies optimized for the given noise and loss achieve a better scaling.