An Efficient Algorithm for Optimizing Adaptive Quantum Metrology Processes
Abstract
Quantum-enhanced metrology infers an unknown quantity with accuracy beyond the standard quantum limit (SQL). Feedback-based metrological techniques are promising for beating the SQL, but devising the feedback procedures is difficult and inefficient. Here we introduce an efficient self-learning swarm-intelligence algorithm for devising feedback-based quantum metrological procedures. Our algorithm can be trained with simulated or real-world trials and accommodates experimental imperfections, losses, and decoherence.
Precise metrology underpins modern science and engineering. However, the 'standard quantum limit' (SQL) restricts achievable precision, beyond which measurement must be treated on a quantum level. Quantum-enhanced metrology (QEM) aims to beat the SQL by exploiting entangled or squeezed input states and a sophisticated detection strategy giovannetti:010401 (); Pezze&Smerzi:PhysRevLett.2009 (); Luo:Lett.Math.Phys:2000 (). Feedback-based QEM is most effective, as accumulated measurement data are exploited to maximize the information gain in subsequent measurements, but finding an optimal QEM policy for a given measurement device is computationally intractable even for pure input states, unitary evolution, and projective measurements. Typically, policies have been devised by clever guessing PhysRevLett.85.5098 (); PhysRevA.63.053804 () or brute-force numerical optimization PhysRevA.63.053804 (). Recently we introduced swarm-intelligence reinforcement learning to devise optimal policies for measuring an interferometric phase shift QLearning:hentschel:PRL:2010 (). That algorithm is space-efficient, i.e. its memory requirement is a polynomial function of the number $N$ of times that the interferometric transformation is applied, in contrast to the exponentially expensive brute-force algorithm. Although this result demonstrated the power of reinforcement learning, the algorithm requires a runtime that is exponential in $N$ and a perfect interferometer, thereby effectively restricting its applicability to proofs of principle. Here we report a space- and time-efficient algorithm (based on new heuristics) for devising QEM policies. Our algorithm works for noisy evolution and loss, thus making reinforcement learning viable for the autonomous design of feedback-based QEM in a real-world setting.
We restrict our focus to single-parameter QEM. Interferometric phase estimation is the canonical quantum metrology problem and is applicable to measurements of time, displacements, and imaging. Therefore, we develop and benchmark our algorithm for autonomous policy design in this context. To beat the SQL, we employ an entangled sequence of input photons, feedback control, and direct measurements of the interferometer output. For adaptive phase estimation, the interferometer processes one photon at a time. Each input photon can be in two modes, labeled $|0\rangle$ and $|1\rangle$, corresponding to the interferometer's two paths. Thus, a time-ordered sequence of $N$ photons implements an $N$-qubit state.
We assume that the interferometric transformation (Fig. 1) can be expressed as a tensor product $\bigotimes_{m=1}^{N}\mathcal{U}_{\varphi-\Phi_m}$ of quantum channels (i.e. completely-positive trace-non-increasing maps Hou:J.Phys.A:2010 ()) for the unknown phase shift $\varphi$ being estimated and a controllable phase $\Phi_m$ with $m\in\{1,\dots,N\}$. The channel $\mathcal{U}$ is a noisy version of the restrictive single-qubit unitary process normally considered in QEM. Our tensor-product description corresponds to the assumption that the interferometric process, other than the control, is unchanging during the measurement procedure. Photons of the $N$-qubit input state $|\Psi_N\rangle$ enter the interferometer one by one and are transformed by $\mathcal{U}_{\varphi-\Phi_m}$. Detectors measure where each photon exits, thereby implementing a projection-valued measure with elements $\{|0\rangle\langle0|,|1\rangle\langle1|\}$ that yields one bit $x_m$ if the photon is not lost. The processing unit (PU) sets the controllable phase to $\Phi_{m+1}$, according to the measurement history $x_1\dots x_m$ up to the $m$th photon, before the next photon is processed. After all $N$ input qubits have passed through the interferometer, the PU estimates the interferometric phase shift as $\tilde\varphi$. A policy is a 'behavior pattern' for the PU, i.e., a collection of rules that tell the PU how to set $\Phi_{m+1}$ given $x_1\dots x_m$ and which phase estimate $\tilde\varphi$ to report at the end.
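The error probability distribution $P(\tilde\varphi-\varphi)$ of the policy yields the standard error $\Delta\tilde\varphi$ of the estimate $\tilde\varphi$ for $\varphi$. As $\varphi$ is cyclic over $[0,2\pi)$, $\Delta\tilde\varphi$ is given by the Holevo variance $V_H = S^{-2}-1$, for $S = |\langle e^{i(\tilde\varphi-\varphi)}\rangle|$ the sharpness of $P$ Holevo1984 (). Evaluating $S$ exactly requires computing time exponential in $N$ and is thus intractable for large $N$. However, from $K$ trial runs of the policy with randomly chosen phases $\varphi_k$, we can infer a sharpness estimate $\tilde S = |K^{-1}\sum_{k=1}^{K} e^{i(\tilde\varphi_k-\varphi_k)}|$ for the error of the phase estimate. For QEM, $\Delta\tilde\varphi$ should scale better than the SQL, $\Delta\tilde\varphi\propto N^{-1/2}$, and as close as possible to the ultimate Heisenberg limit, $\Delta\tilde\varphi\propto N^{-1}$ Luo:Lett.Math.Phys:2000 (); giovannetti:010401 (); Pezze&Smerzi:PhysRevLett.2009 ().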
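As a concrete illustration of the sampled sharpness, the minimal Python sketch below (the function names are our own, hypothetical choices) estimates $S = |\langle e^{i(\tilde\varphi-\varphi)}\rangle|$ from the true phases and estimates of $K$ trial runs and converts it into the Holevo variance $V_H = S^{-2}-1$. Because only the phase factor $e^{i(\tilde\varphi-\varphi)}$ enters, errors are automatically treated modulo $2\pi$.

```python
import numpy as np

def sharpness_estimate(true_phases, estimates):
    """Sampled sharpness |mean(exp(i(phi_est - phi)))| over K trial runs."""
    errors = np.asarray(estimates, dtype=float) - np.asarray(true_phases, dtype=float)
    return float(np.abs(np.mean(np.exp(1j * errors))))

def holevo_variance(sharpness):
    """Holevo variance V_H = S**-2 - 1; infinite for vanishing sharpness."""
    if sharpness <= 0.0:
        return float("inf")
    return float(sharpness ** -2 - 1.0)
```

For instance, errors of $0$ and $\pi$ cancel completely ($\tilde S \approx 0$), whereas shifting all estimates by the same offset leaves $\tilde S = 1$, so the sharpness measures the spread of the errors but not a systematic bias.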
For unitary evolution, the interferometer transforms each input qubit by $U_{\hat n}(\varphi) = \exp(-i\varphi\,\hat n\cdot\boldsymbol\sigma/2)$ for $\boldsymbol\sigma$ the vector of Pauli matrices, $\hat n$ a unit vector, and $\varphi$ the interferometric phase difference. Without loss of generality, we can restrict our analysis to a fixed axis $\hat n_0$. However, because of imperfections, a real-world interferometer is represented by a non-unitary quantum channel $\mathcal{U}_\varphi$. We assume an unbiased interferometer, i.e. a random (maximally mixed) input qubit is mapped to itself, $\mathcal{U}_\varphi(\mathbb{1})\propto\mathbb{1}$, corresponding to a unital channel. Hence, for continuous or discrete and countable rotation angles $\vartheta$ and axes $\hat n$ Mendl&Wolf:ComMathPhys2009 (),
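$$\mathcal{U}_\varphi(\rho) = \int \mathrm{d}\vartheta\,\mathrm{d}\hat n\; p_\varphi(\vartheta,\hat n)\, U_{\hat n}(\vartheta)\,\rho\,U_{\hat n}^\dagger(\vartheta), \qquad \int \mathrm{d}\vartheta\,\mathrm{d}\hat n\; p_\varphi(\vartheta,\hat n) = \eta \le 1, \tag{1}$$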
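with $p_\varphi(\vartheta,\hat n) = \delta(\vartheta-\varphi)\,\delta(\hat n-\hat n_0)$ and $\eta = 1$ for an ideal interferometer (for discrete $\vartheta$ and $\hat n$ the integrals become sums). In contrast, $\eta<1$ corresponds to an input-state-independent loss rate $1-\eta$, and quantum noise is incorporated by $p_\varphi$ being a general distribution with mean rotation angle $\varphi$ and mean rotation axis $\hat n_0$. We simulate noise using normal distributions with the aforementioned means and small standard deviations …, corresponding to a visibility of …. For an optical interferometer, noise in $\vartheta$ corresponds to path-length difference fluctuations and noise in $\hat n$ to beam-splitter reflectivity fluctuations. We utilize the input state
$$|\Psi_N\rangle = \sum_{\mu,\nu=-j}^{j} \frac{1}{\sqrt{j+1}}\,\sin\!\left[\frac{(\mu+j+1)\pi}{N+2}\right] e^{i\frac{\pi}{2}(\mu-\nu)}\, d^{\,j}_{\nu\mu}\!\left(\frac{\pi}{2}\right)|j,\nu\rangle, \qquad j = N/2,$$
from PhysRevA.63.053804 (); PhysRevLett.85.5098 (); QLearning:hentschel:PRL:2010 (), with $d^{\,j}$ Wigner's $d$ matrix Group_theory_and_its_application_to_QM.Wigner.1931 (). Here $|j,\nu\rangle$ is a permutationally-symmetric state with $j+\nu$ qubits in $|1\rangle$ and $j-\nu$ in $|0\rangle$ Hentschel:PermutationallySymmetric_Qubit_Strings:J.Phys.A:2011 (). The state $|\Psi_N\rangle$ is appealing because it allows precision close to the Heisenberg limit PhysRevA.63.053804 (); PhysRevLett.85.5098 () and is robust against loss QLearning:hentschel:PRL:2010 (), but our learning methods work for other states as well.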
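The noise model of Eq. (1) can be sampled by Monte Carlo. The Python sketch below is an illustration under our own assumptions, not the simulation behind the reported results: the mean rotation axis is taken to be $\hat y$, axis noise is modeled as a Gaussian tilt angle within the $y$-$z$ plane, angle noise is Gaussian around the phase, and loss enters as an overall factor $\eta$. Each sample uses the closed form $\exp(-i\vartheta\,\hat n\cdot\boldsymbol\sigma/2) = \cos(\vartheta/2)\,\mathbb{1} - i\sin(\vartheta/2)\,\hat n\cdot\boldsymbol\sigma$, valid for a unit vector $\hat n$; the function names are hypothetical.

```python
import numpy as np

PAULI = (
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
)

def rotation(angle, axis):
    """U = exp(-i * angle * n.sigma / 2) for a unit vector n (closed form)."""
    n_sigma = sum(a * s for a, s in zip(axis, PAULI))
    return np.cos(angle / 2) * np.eye(2) - 1j * np.sin(angle / 2) * n_sigma

def noisy_channel(rho, phase, sigma_angle, sigma_tilt, eta, samples, rng):
    """Monte Carlo estimate of eta * E[U rho U^dagger] over noisy rotations."""
    out = np.zeros((2, 2), dtype=complex)
    for _ in range(samples):
        angle = rng.normal(phase, sigma_angle)     # path-length fluctuations
        chi = rng.normal(0.0, sigma_tilt)          # beam-splitter fluctuations (axis tilt)
        axis = (0.0, np.cos(chi), np.sin(chi))     # assumed mean axis: y
        u = rotation(angle, axis)
        out += u @ rho @ u.conj().T
    return eta * out / samples                     # trace eta: loss rate 1 - eta
```

Because every sampled rotation maps $\mathbb{1}/2$ to itself, the estimated channel sends the maximally mixed state to $\eta\,\mathbb{1}/2$ irrespective of the noise, which is the unitality assumed above.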
The control flow graph of any deterministic policy for lossless conditions and a fixed $N$-qubit input state can be represented as a binary decision tree of depth $N$, an example of which is shown in Fig. 2(a). Each node of the tree corresponds to one specific state of the experiment and represents the resultant action of the policy. Numeric optimization over such trees is computationally intractable because the number of nodes grows exponentially with $N$. Therefore, we restrict our search to policies that implement a 'generalized logarithmic search' (GLS) heuristic, described below, because the set of all GLS policies can be parametrized by only $N$ parameters and contains phase-estimation policies with optimal precision scaling with respect to $N$ QLearning:hentschel:PRL:2010 ().
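For a uniform prior of $\varphi$, the GLS heuristic commences with the initial feedback $\Phi_1 = 0$. After the measurement result $x_m$, the feedback phase is $\Phi_{m+1} = \Phi_m - (-1)^{x_m}\Delta_m$. If the qubit is lost, the feedback phase remains unchanged, $\Phi_{m+1} = \Phi_m$. After all $N$ input qubits are processed, the measurement results $x_1,\dots,x_N$ have been recorded, and the GLS heuristic reports the phase estimate $\tilde\varphi = \Phi_{N+1}$. According to this parametrization, every GLS policy for an $N$-qubit input state is represented by a vector $\boldsymbol\rho = (\Delta_1,\dots,\Delta_N)$ in the policy space $[0,2\pi)^N$, and any such vector is a valid policy. As any policy $\boldsymbol\rho\in[0,2\pi)^N$ utilizes a string of $N$ input qubits, we refer to it as an $N$-qubit policy. Every $\boldsymbol\rho$ implements a GLS because $\boldsymbol\rho$ has freely variable entries, in contrast to logarithmic search (LS), for which each step size is half the previous one, $\Delta_{m+1} = \Delta_m/2$ Nowak:GLS:2008 (). The $N$-qubit LS policy is itself a point in this policy space but does not surpass the SQL. The duality between GLS policies and points in $[0,2\pi)^N$ allows the use of function optimization techniques to search for an optimal policy $\boldsymbol\rho^*$ with minimum Holevo variance, i.e. maximum sharpness, $\boldsymbol\rho^* = \arg\max_{\boldsymbol\rho} S(\boldsymbol\rho)$. Unfortunately, this optimization problem is non-convex and hence difficult QLearning:hentschel:PRL:2010 ().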
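A GLS policy is just a vector of step sizes, so applying it is a few lines of code. The Python sketch below assumes the update $\Phi_1 = 0$, $\Phi_{m+1} = \Phi_m - (-1)^{x_m}\Delta_m$ (unchanged for a lost photon) and reports $\Phi_{N+1}$ as the estimate. To produce outcomes it uses a deliberately simple, hypothetical detection model with independent (unentangled) photons, $P(x=1) = [1+\sin(\varphi-\Phi)]/2$, with the port labels chosen so that the update moves the feedback towards $\varphi$; this is not the entangled-state simulation used in the paper, and all function names are illustrative.

```python
import numpy as np

def gls_update(feedback, delta, x):
    """One GLS step: Phi_{m+1} = Phi_m - (-1)**x_m * Delta_m; x=None marks a lost photon."""
    if x is None:
        return feedback              # lost photon: feedback phase unchanged
    return feedback - (-1) ** x * delta

def gls_estimate(policy, outcomes):
    """Replay a recorded outcome string and return the estimate Phi_{N+1}."""
    feedback = 0.0                   # Phi_1 = 0 for a uniform prior
    for delta, x in zip(policy, outcomes):
        feedback = gls_update(feedback, delta, x)
    return feedback

def simulate_trial(policy, phase, eta, rng):
    """One trial with independent photons under the hypothetical model
    P(x=1) = (1 + sin(phase - Phi)) / 2; each photon survives with probability eta."""
    feedback = 0.0
    for delta in policy:
        if rng.random() >= eta:
            x = None                 # photon lost
        else:
            x = int(rng.random() < 0.5 * (1.0 + np.sin(phase - feedback)))
        feedback = gls_update(feedback, delta, x)
    return feedback

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    policy = [np.pi / 2 ** m for m in range(1, 11)]   # LS-like halving steps (start pi/2 assumed)
    phases = rng.uniform(0.0, 2 * np.pi, 2000)
    errors = np.array([simulate_trial(policy, p, 1.0, rng) - p for p in phases])
    print("sampled sharpness:", abs(np.mean(np.exp(1j * errors))))
```

Swapping `simulate_trial` for a simulation of the entangled input state, or for real interferometer runs, leaves the policy representation unchanged, which is what makes the vector form convenient for optimization.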
Particle swarm optimization (PSO) algorithms Eberhart1 (); Engelbrecht:2006:Computational_Swarm_Intelligence () are highly successful for non-convex optimization. PSO is a 'collective intelligence' strategy from the field of machine learning that learns via trial and error and performs as well as or better than simulated annealing and genetic algorithms SA_vs_PSO_Ethni:2009 (); Kennedy98matchingalgorithms (); Groenwold:2002 (). We have shown that PSO also delivers an autonomous approach to devising adaptive phase-estimation policies for ideal interferometry QLearning:hentschel:PRL:2010 (); QLearning:hentschel:ITNG:2010 ().
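To search for $\boldsymbol\rho^*$, the PSO algorithm models a 'swarm' of 'particles' $i$ that move in the search space $[0,2\pi)^N$. A particle's position $\boldsymbol\rho_i$ represents a candidate policy for estimating $\varphi$ and is initially chosen at random. Furthermore, particle $i$ remembers the best position $\hat{\boldsymbol\rho}_i$ it has visited so far (including its current position). In addition, $i$ communicates with other particles in its neighborhood $\mathcal{N}_i$. We adopt the common approach of setting each $\mathcal{N}_i$ in a predefined way, regardless of the particles' positions, by arranging the particles in a ring topology: for particle $i$, all particles within a maximum distance $r$ on the ring are in $\mathcal{N}_i$. In iteration $t$, the PSO algorithm updates the positions of all particles in a round-based manner as follows.

(i) Each particle $i$ samples the sharpness $\tilde S(\boldsymbol\rho_i)$ of its current position with $K$ trial runs.

(ii) Particle $i$ resamples the sharpness $\tilde S(\hat{\boldsymbol\rho}_i)$ of its personal-best policy $\hat{\boldsymbol\rho}_i$, and the performance of $\hat{\boldsymbol\rho}_i$ is taken to be the arithmetic mean of all its sharpness evaluations.

(iii) Each $i$ updates $\hat{\boldsymbol\rho}_i \leftarrow \boldsymbol\rho_i$ if $\tilde S(\boldsymbol\rho_i) > \tilde S(\hat{\boldsymbol\rho}_i)$, and

(iv) communicates $\hat{\boldsymbol\rho}_i$ and $\tilde S(\hat{\boldsymbol\rho}_i)$ to all members of $\mathcal{N}_i$.

(v) Each particle $i$ determines the sharpest policy $\hat{\boldsymbol g}_i$ found so far by any one particle in $\mathcal{N}_i$ (including itself), and

(vi) moves to
$$\boldsymbol v_i \leftarrow w\,\boldsymbol v_i + \beta\, r_1\left(\hat{\boldsymbol\rho}_i - \boldsymbol\rho_i\right) + \gamma\, r_2\left(\hat{\boldsymbol g}_i - \boldsymbol\rho_i\right), \qquad \boldsymbol\rho_i \leftarrow \boldsymbol\rho_i + \boldsymbol v_i. \tag{2}$$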
The arrows indicate that the value on the right is assigned to the variable on the left. The damping factor $w$ assists convergence, and $r_1, r_2$ are uniformly distributed random numbers from the interval $[0,1]$ that are regenerated each time Eq. (2) is evaluated. The 'exploitation weight' $\beta$ parametrizes the attraction of a particle to its personal best position $\hat{\boldsymbol\rho}_i$, and the 'exploration weight' $\gamma$ describes the attraction to the best position $\hat{\boldsymbol g}_i$ in the neighborhood. To improve convergence, we bound each component of the velocity $\boldsymbol v_i$ by a maximum value $v_{\max}$. The user-specified parameters $w$, $\beta$, $\gamma$ and $v_{\max}$ determine the swarm's behavior. Tests indicate that $w = \dots$, $\beta = \dots$, $\gamma = \dots$ and $v_{\max} = \dots$ result in the highest probability of finding an optimal policy.
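The round-based update of the six steps and Eq. (2) can be sketched in Python as below. This is a minimal illustration, not the authors' code: the function name `pso_maximize`, the generic noisy objective `noisy_score(position, rng)` (standing in for a sampled sharpness), the uniform initialization in $[0,2\pi)^N$ and all default parameter values are hypothetical placeholders rather than the tuned values. As in the text, personal bests are resampled every iteration and judged by the mean of their samples, neighborhoods are fixed rings, velocity components are clipped to $\pm v_{\max}$, and $r_1, r_2$ are drawn afresh for every evaluation of Eq. (2).

```python
import numpy as np

def pso_maximize(noisy_score, dim, n_particles=20, radius=1, iterations=100,
                 damping=0.8, exploit_weight=0.5, explore_weight=1.0,
                 v_max=0.2, init=None, seed=0):
    """Maximize a noisy objective with ring-topology PSO (illustrative defaults)."""
    rng = np.random.default_rng(seed)
    if init is None:   # random candidate policies in [0, 2*pi)^dim
        pos = rng.uniform(0.0, 2 * np.pi, size=(n_particles, dim))
    else:              # caller-supplied initializer, e.g. policies seeded from a template
        pos = np.array([init(rng) for _ in range(n_particles)], dtype=float)
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_samples = [[] for _ in range(n_particles)]
    pbest_mean = np.full(n_particles, -np.inf)
    for _ in range(iterations):
        for i in range(n_particles):
            current = noisy_score(pos[i], rng)                     # step (i)
            pbest_samples[i].append(noisy_score(pbest[i], rng))    # step (ii): resample
            pbest_mean[i] = np.mean(pbest_samples[i])
            if current > pbest_mean[i]:                            # step (iii)
                pbest[i] = pos[i]
                pbest_samples[i] = [current]
                pbest_mean[i] = current
        for i in range(n_particles):
            # steps (iv)-(v): best personal best among ring neighbours (including i)
            ring = [(i + k) % n_particles for k in range(-radius, radius + 1)]
            g = pbest[max(ring, key=lambda j: pbest_mean[j])]
            # step (vi), Eq. (2): fresh r1, r2 in [0, 1) for every evaluation
            r1, r2 = rng.random(2)
            vel[i] = (damping * vel[i]
                      + exploit_weight * r1 * (pbest[i] - pos[i])
                      + explore_weight * r2 * (g - pos[i]))
            vel[i] = np.clip(vel[i], -v_max, v_max)   # bound each velocity component
            pos[i] = pos[i] + vel[i]
    best = int(np.argmax(pbest_mean))
    return pbest[best].copy(), float(pbest_mean[best])
```

Separating the evaluation loop (steps i to iii) from the movement loop (steps iv to vi) keeps the update round-based: every particle compares itself with the neighborhood's personal bests from the same iteration.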
The trial runs for assessing sharpness can be simulated or performed with a real-world experiment. For finite $K$, the sampled sharpness has statistical errors that can prevent the PSO algorithm from learning optimal solutions BartzBeielstein&el:Metaheuristics:2007 (). We reduce sharpness errors by averaging over multiple samples in step (ii) SWISCONF2005004 (). However, for $N > \dots$, the PSO algorithm fails to learn good policies from scratch due to sharpness errors QLearning:hentschel:ITNG:2010 (). Therefore, we maintain our earlier strategy of running the learning algorithm for each $N$ independently only when $N \le \dots$. For larger $N$, our new heuristic bootstraps a starting point for the optimization of an $N$-qubit policy from the best $(N-1)$-qubit policy $\boldsymbol\rho^*_{N-1}$. Our heuristic exploits the fact that an $(N-1)$-qubit policy can be used as an $N$-qubit policy by ignoring the last measurement result. For these larger $N$, the optimal $(N-1)$-qubit policy estimates phases with only slightly lower accuracy than an optimal $N$-qubit policy when used with the $N$-qubit input (see Fig. 5). Furthermore, the performance difference between the optimal $N$-qubit policy and the $(N-1)$-qubit policy decreases with increasing $N$ because the relative change in qubit number decreases with increasing $N$. Therefore, a good $(N-1)$-qubit policy is a valuable starting point for optimizing an $N$-qubit policy.
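Previously learned policies are utilized in the initialization step of the PSO algorithm. The initial policy $\boldsymbol\rho = (\rho_1,\dots,\rho_N)$ is selected as a particle's starting position with probability
$$P(\boldsymbol\rho) = \prod_{m=1}^{N} p_m(\rho_m), \tag{3}$$
$$p_m(\rho_m) = \mathcal{N}_T\!\left(\rho_m;\, \rho^*_{N-1,m},\, \sigma_1\right), \qquad m = 1,\dots,N-1, \tag{4}$$
$$p_N(\rho_N) = \mathcal{N}_T\!\left(\rho_N;\, \rho^*_{N-1,N-1},\, \sigma_2\right), \tag{5}$$
with $\mathcal{N}_T(x;\mu,\sigma)$ a truncated normal distribution whose parent normal distribution has mean $\mu$ and standard deviation $\sigma$. See Fig. 2(b) for an illustration of this strategy. The standard deviation $\sigma_1$ determines the similarity of the first $N-1$ actions of the newly generated policies to those of the template policy $\boldsymbol\rho^*_{N-1}$, and $\sigma_2$ determines the extent to which the action for the new qubit agrees with the last action of $\boldsymbol\rho^*_{N-1}$. We found that $\sigma_1 = \dots$ and $\sigma_2 = \dots$ yield a high success rate for our PSO heuristic.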
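The bootstrapped initialization can be sketched as follows: each inherited action of a new $N$-qubit policy is drawn from a truncated normal centred on the corresponding action of the $(N-1)$-qubit template with width `sigma_1`, and the action for the new qubit is drawn around the template's last action with width `sigma_2`. The truncation interval $[0,2\pi]$, the rejection-sampling implementation and the function names are our assumptions for illustration.

```python
import numpy as np

def truncated_normal(mean, std, low, high, rng):
    """Draw from N(mean, std**2) restricted to [low, high] by rejection sampling.
    Assumes mean lies inside [low, high], so acceptance is not negligible."""
    while True:
        x = rng.normal(mean, std)
        if low <= x <= high:
            return float(x)

def bootstrap_policy(template, sigma_1, sigma_2, rng, low=0.0, high=2 * np.pi):
    """Hypothetical initial N-qubit policy built from an (N-1)-qubit template."""
    # first N-1 actions: stay close to the template's actions (width sigma_1)
    inherited = [truncated_normal(t, sigma_1, low, high, rng) for t in template]
    # new N-th action: drawn around the template's last action (width sigma_2)
    new_last = truncated_normal(template[-1], sigma_2, low, high, rng)
    return np.array(inherited + [new_last])
```

Drawing a whole swarm this way concentrates the initial particles near the previous optimum while the two widths control how far they may deviate from it.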
For $N \le \dots$ and perfect interferometry, we verified that our new PSO algorithm with a swarm of … particles learns optimal $N$-qubit policies regardless of whether each policy's sharpness is evaluated exactly (which requires time exponential in $N$) or sampled from $K$ trial runs (which requires runtime polynomial in $N$ when simulated). Therefore, we sample the sharpness of each particle's current position and personal best position in each PSO iteration. As we run the PSO algorithm for a constant number of iterations, the optimization for a single $N$ requires $O(\dots)$ trials. However, to obtain an $N$-qubit policy, we have to optimize policies for all smaller numbers of input qubits beforehand, as our algorithm requires an $(N-1)$-qubit policy for devising an $N$-qubit policy for all but the smallest $N$. Therefore, learning an $N$-qubit policy requires $O(\dots)$ trial runs. When the trials are simulated, the computational complexity of our PSO heuristic is polynomial in $N$, and hence efficient, because a single trial run can be simulated in polynomial time Hentschel:PermutationallySymmetric_Qubit_Strings:J.Phys.A:2011 (). Once learned, the execution of an $N$-qubit policy requires $N$ entangled input qubits.
We trained our PSO algorithm with simulated trial runs for various noise and loss rates. In each case, our PSO algorithm tries to find the sharpest policy for a given $N$. As the algorithm uses stochastic optimization, it is not guaranteed to learn the optimal policy every time and must be run several times independently for each $N$. Nevertheless, within the limits of available computational resources, the PSO algorithm succeeded in at least 25% of the runs, independently of $N$. We compared the policies generated by our new machine-learning algorithm with our previous numerically optimized policies QLearning:hentschel:PRL:2010 (), the Berry-Wiseman (BW) policy PhysRevLett.85.5098 (), and policies obtained by brute-force numerical optimization PhysRevA.63.053804 ().
We first discuss policies for a noiseless, lossless setup, i.e., for unitary evolution. Fig. 3(a) shows that our new method, tested to the limits of available computational resources, outperforms the BW policy. We estimate the performance difference by calculating the scaling $V_H \propto N^{-\alpha}$ of the Holevo variance. Our policies yield $\alpha = \dots$, compared to the inferred scaling $\alpha = \dots$ for the BW policy. Furthermore, our new efficient method greatly surpasses our previous optimization scheme QLearning:hentschel:PRL:2010 () by more than tripling the range of $N$ for which policies can be developed while maintaining the same precision. The inefficient brute-force optimization was carried out in the full policy space, i.e. without restriction to GLS policies. However, the resulting globally optimal policies perform better than our PSO-optimized policies only by a constant factor of … and do not yield a better scaling exponent $\alpha$. As expected, the PSO algorithm yields policies approaching the SQL for separable input states (Fig. 3(b)) Luo:Lett.Math.Phys:2000 (); giovannetti:010401 (); Pezze&Smerzi:PhysRevLett.2009 ().
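A scaling exponent of the form $V_H \propto N^{-\alpha}$ can be extracted by a least-squares fit of $\log V_H$ against $\log N$; in this convention the SQL corresponds to $\alpha = 1$ and the Heisenberg limit to $\alpha = 2$. The Python sketch below (function name hypothetical) shows one standard way to perform such a fit with NumPy's `polyfit`; it is not necessarily the fitting procedure used for Fig. 3.

```python
import numpy as np

def scaling_exponent(ns, holevo_variances):
    """Fit V_H ~ c * N**(-alpha) by least squares in log-log space; returns (alpha, c)."""
    slope, intercept = np.polyfit(np.log(ns), np.log(holevo_variances), 1)
    return float(-slope), float(np.exp(intercept))
```

Restricting the fit to the larger values of $N$ reduces the influence of small-$N$ deviations from a pure power law on the fitted exponent.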
Our new algorithm delivers the first QEM policies optimized for a simulated imperfect interferometer with loss and Gaussian quantum noise. Under noisy conditions, policies generated by our new algorithm perform significantly better than policies optimized for perfect interferometry. As expected, the performance difference increases with the noise level (see Fig. 6). We verified that our algorithm also devises superior policies for non-Gaussian noise by using skew-normal distributions with skewness … for both the rotation-angle and the rotation-axis noise Azzalini:Scand.J.Stat.:2005 (). We find that a nonzero third standardized moment, with the variances kept as before, does not reduce the performance of the policies learned by our new PSO algorithm (see Fig. 5).
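To generate skewed noise with the same mean and variance as in the Gaussian case, one can use Azzalini's stochastic representation $Z = \delta|U_0| + \sqrt{1-\delta^2}\,U_1$ with $\delta = a/\sqrt{1+a^2}$ and $U_0, U_1$ independent standard normals, and then standardize $Z$ using its known mean $\delta\sqrt{2/\pi}$ and variance $1-2\delta^2/\pi$. The Python sketch below follows that recipe; the shape parameter $a$ (`shape`) and the function name are illustrative assumptions, not the settings used in the simulations.

```python
import numpy as np

def skew_normal_noise(mean, std, shape, size, rng):
    """Skew-normal samples rescaled to a prescribed mean and standard deviation."""
    delta = shape / np.sqrt(1.0 + shape ** 2)
    u0, u1 = rng.standard_normal((2, size))
    z = delta * np.abs(u0) + np.sqrt(1.0 - delta ** 2) * u1   # Z ~ SN(0, 1, shape)
    z_mean = delta * np.sqrt(2.0 / np.pi)                    # E[Z]
    z_std = np.sqrt(1.0 - 2.0 * delta ** 2 / np.pi)          # sqrt(Var[Z])
    return mean + std * (z - z_mean) / z_std                 # skewness is unchanged
```

For $a = 5$ the standardized third moment is about 0.85, while the sample mean and standard deviation match the requested values; $a = 0$ recovers the Gaussian case.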
In summary, we have devised an efficient machine-learning algorithm to construct adaptive-feedback measurement policies autonomously for time-independent, single-parameter estimation problems. Our only prerequisite is a training-phase comparison criterion to evaluate the success of candidate policies. Within the limits of available computational resources, our PSO-generated policies outperform all known schemes for adaptive single-shot phase estimation with direct measurement of the channel output. Our algorithm learns to account for experimental errors and loss, thereby making time-consuming error modeling and extensive calibration dispensable.
Acknowledgments: We thank B. Bunk and Humboldt-Universität zu Berlin for computational resources, and D. W. Berry, L. Maccone, and H. M. Wiseman for comments on an earlier draft. This project has been supported by iCORE, AITF, NSERC, and CIFAR. BCS is supported by a CIFAR Fellowship.
References
 (1) V. Giovannetti, S. Lloyd, and L. Maccone, “Quantum Metrology,” Phys. Rev. Lett., vol. 96, no. 1, p. 010401, 2006.
 (2) L. Pezzé and A. Smerzi, “Entanglement, Nonlinear Dynamics, and the Heisenberg Limit,” Phys. Rev. Lett., vol. 102, p. 100401, Mar 2009.
 (3) S. Luo, “Quantum Fisher Information and Uncertainty Relations,” Letters in Mathematical Physics, vol. 53, pp. 243–251, 2000.
 (4) D. W. Berry and H. M. Wiseman, “Optimal States and Almost Optimal Adaptive Measurements for Quantum Interferometry,” Phys. Rev. Lett., vol. 85, no. 24, pp. 5098–5101, 2000.
 (5) D. W. Berry, H. M. Wiseman, and J. K. Breslin, “Optimal input states and feedback for interferometric phase estimation,” Phys. Rev. A, vol. 63, no. 5, p. 053804, 2001.
 (6) A. Hentschel and B. C. Sanders, “Machine Learning for Precise Quantum Measurement,” Phys. Rev. Lett., vol. 104, p. 063603, Feb 2010.
 (7) J. Hou, “A characterization of positive linear maps and criteria of entanglement for quantum states,” J. Phys. A, vol. 43, no. 38, p. 385201, 2010.
 (8) A. S. Holevo, “Covariant measurements and imprimitivity systems,” in Quantum Probability and Applications to the Quantum Theory of Irreversible Processes, vol. 1055 of Lecture Notes in Mathematics, pp. 153–172, Springer Berlin / Heidelberg, 1984.
 (9) C. Mendl and M. Wolf, “Unital Quantum Channels – Convex Structure and Revivals of Birkhoff’s Theorem,” Commun. Math. Phys., vol. 289, pp. 1057–1086, 2009, doi:10.1007/s00220-009-0824-2.
 (10) E. P. Wigner, Group Theory and its Application to the Quantum Mechanics of Atomic Spectra. Academic Press, New York, 1971.
 (11) A. Hentschel and B. C. Sanders, “Ordered Measurements of Permutationally-Symmetric Qubit Strings,” J. Phys. A, vol. 44, p. 115301, Feb 2011.
 (12) R. D. Nowak, “Generalized Binary Search,” in Proc. 46th Allerton Conference on Communications, Control, and Computing, Illinois, USA, 2008, pp. 568–574, IEEE, New York, 2008.
 (13) R. Eberhart and J. Kennedy, “A new optimizer using particle swarm theory,” in Proc. Sixth International Symposium on Micro Machine and Human Science, Nagoya, Japan, 1995, pp. 39–43, IEEE, New York, 1995.
 (14) A. P. Engelbrecht, Fundamentals of Computational Swarm Intelligence. John Wiley & Sons, England, 2006.
 (15) S. Ethni, B. Zahawi, D. Giaouris, and P. Acarnley, “Comparison of Particle Swarm and Simulated Annealing Algorithms for Induction Motor Fault Identification,” in Proc. 7th IEEE Int. Conf. on Industrial Informatics, Cardiff, UK, 2009, IEEE, New York, 2009.
 (16) J. Kennedy and W. M. Spears, “Matching Algorithms to Problems: An Experimental Test of the Particle Swarm and Some Genetic Algorithms on the Multimodal Problem Generator,” in Proc. IEEE Congress on Evolutionary Computation, Anchorage, Alaska, pp. 78–83, IEEE, New York, May 1998.
 (17) P. Fourie and A. Groenwold, “The particle swarm optimization algorithm in size and shape optimization,” Structural and Multidisciplinary Optimization, vol. 23, pp. 259–267, 2002.
 (18) A. Hentschel and B. C. Sanders, “Machine Learning for Adaptive Quantum Measurement,” in Proc. 7th Int. Conf. Information Technology: New Generations, Las Vegas, 2010, pp. 506–511, IEEE, New York, Apr 2010.
 (19) T. Bartz-Beielstein, D. Blum, and J. Branke, “Particle Swarm Optimization and Sequential Sampling in Noisy Environments,” in Metaheuristics, vol. 39 of Operations Research/Computer Science Interfaces Series, pp. 261–273, Springer US, 2007.
 (20) J. Pugh, Y. Zhang, and A. Martinoli, “Particle swarm optimization for unsupervised robotic learning,” in Proc. Swarm Intelligence Symposium, Pasadena, 2005, pp. 92–99, IEEE, New York, 2005.
 (21) See Fig. 5, page 5.
 (22) See Fig. 6, page 6.
 (23) A. Azzalini, “The Skew-normal Distribution and Related Multivariate Families,” Scandinavian Journal of Statistics, vol. 32, no. 2, pp. 159–188, 2005.
 (24) See Fig. 5, page 5.
Appendix