Magnetic-field learning using a single electronic spin in diamond
with one-photon readout at room temperature
Abstract
Nitrogen-vacancy (NV) centres in diamond are appealing nanoscale quantum sensors for temperature, strain, electric fields and, most notably, magnetic fields. However, the cryogenic temperatures required for low-noise single-shot readout, which have enabled the most sensitive NV-magnetometry reported to date, are impractical for key applications, e.g. biological sensing. Overcoming the noisy readout at room temperature has until now demanded repeated collection of fluorescent photons, which increases the time cost of the procedure and thus reduces its sensitivity. Here we show how machine learning can process the noisy readout of a single NV centre at room temperature, requiring on average only one photon per algorithm step, to sense magnetic field strength with a precision comparable to that reported for cryogenic experiments. Analysing large data sets from NV centres in bulk diamond, we report absolute sensitivities of 60 nT s^{1/2}, including initialisation, readout, and computational overheads. We show that dephasing times can be simultaneously estimated, and that time-dependent fields can be dynamically tracked at room temperature. Our results dramatically increase the practicality of early-term single-spin sensors.
Quantum sensors are likely to be among the first quantum technologies to be translated from laboratory set-ups to commercial products Taylor et al. [2008]. The single electronic spin of a nitrogen-vacancy (NV) centre in diamond operates as a sensor for electric and magnetic fields with nanoscale spatial resolution McGuinness et al. [2011], Müller et al. [2014], Lovchinsky et al. [2017], Zhao et al. [2011], Barry et al. [2016]. However, achieving high sensitivities with NV-magnetometers has required a low-noise mode of operation available only at cryogenic temperatures, which constitutes a major obstacle to real-world applications Robledo et al. [2011], Bonato et al. [2015]. Machine learning has played an enabling role for new generations of applications in conventional information processing technologies, including pattern and speech recognition, diagnostics, and robot control Murphy [2012], Jordan and Mitchell [2015]. Here we show how machine learning algorithms Hentschel and Sanders [2010], Granade et al. [2012], Wiebe et al. [2014], Hincks et al. [2018] can be applied to single-spin magnetometers at room temperature to give a sensitivity that scales with the Heisenberg limit, while reducing overheads by requiring only one-photon readout. We go on to show that these methods allow multiparameter estimation, simultaneously learning the decoherence time, and we implement a routine for the dynamical tracking of time-dependent fields.
Magnetic field sensing with an NV centre uses Ramsey interferometry Taylor et al. [2008], Rondin et al. [2014], Jelezko and Wrachtrup [2006]. A microwave $\pi/2$ pulse rotates the spin vector into an equal superposition of its spin eigenstates, such that its magnetic moment is perpendicular to the magnetic field ($B$) to be sensed Degen [2008], Nusran et al. [2012]. For a Larmor precession time $\tau$ and frequency $\omega$, the relative phase between the eigenstates becomes $\phi = \omega\tau = \gamma B \tau$, where $\gamma$ is the electron gyromagnetic ratio (the ratio of magnetic moment to angular momentum). After a further $\pi/2$ pulse completes the Ramsey sequence, a measurement of the spin in the basis of its eigenstates provides an estimate of $\phi$, the precision of which is usually improved by repeating the procedure. Collecting statistics for a series of different $\tau$ produces a fringe, in which the measured spin-state probability oscillates with $\tau$, and from which $B$ can be inferred.
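A minimal simulation of the ideal Ramsey fringe may help fix ideas. It assumes a two-level spin with no dephasing and perfect projective readout, so the probability of returning to the initial state is $\cos^2(\phi/2)$ with $\phi = \gamma B \tau$. The value of $\gamma$ is a rounded textbook figure, and the field, the $\tau$ grid and all function names are hypothetical choices for illustration only.

```python
import numpy as np

# Electron gyromagnetic ratio, gamma ~ 2*pi * 28.0 GHz/T (rounded textbook value).
GAMMA_RAD_PER_S_PER_T = 2 * np.pi * 28.0e9

def ramsey_p0(b_tesla, tau_s):
    """Ideal Ramsey probability of finding the spin back in its initial state.

    The accumulated phase is phi = gamma * B * tau; after the second pi/2 pulse
    the initial-state probability is cos^2(phi / 2). Dephasing is ignored.
    """
    phi = GAMMA_RAD_PER_S_PER_T * b_tesla * tau_s
    return np.cos(phi / 2) ** 2

def simulate_fringe(b_tesla, taus_s, shots_per_tau, seed=0):
    """Fraction of initial-state outcomes from repeated projective measurements per tau."""
    rng = np.random.default_rng(seed)
    counts = rng.binomial(shots_per_tau, ramsey_p0(b_tesla, taus_s))
    return counts / shots_per_tau

# Hypothetical example: a 50 uT field, precession times 0 to ~2 us in 20 ns steps.
taus = np.arange(100) * 20e-9
fringe = simulate_fringe(50e-6, taus, shots_per_tau=1000)
```

The fringe period in $\tau$ is $2\pi/(\gamma B)$, which is why fitting (or otherwise inferring) the oscillation frequency yields $B$.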
Increasing the sensitivity of a magnetometer means increasing the rate at which its precision improves with sensing time. The intrinsic resource cost in estimating $B$ is the total phase accumulation time $T$ Arai et al. [2015], Waldherr et al. [2012], Puentes et al. [2014], which is the sum of every $\tau$ performed during an experiment. A fundamental limitation on the sensitivity of an estimate of $B$ is quantum projection noise, arising from the uncertain outcome of a basis measurement, whose effect is conventionally reduced through repeated measurements at the cost of increasing the total sensing time. A further typical limitation on sensing precision is the timescale $T_2^*$ on which spin states decohere due to inhomogeneous broadening (though spin-echo methods could extend this Balasubramanian et al. [2008]). In an idealised setting, with an optimal sensing protocol, the Heisenberg limit (HL) Berry et al. [2009] in sensitivity can be achieved, arriving at a precision limited by $T_2^*$ in the shortest time allowed by quantum mechanics. In practice, overheads such as the time required for initialisation, computation, and readout must also be accounted for, while repeated measurements due to experimental inefficiencies and low-fidelity readout increase the time needed to reach the precision limited by $T_2^*$. The increase in total sensing time due to overheads and repeated measurements thus decreases the sensitivity.
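The difference between the two scalings can be made concrete with a standard textbook argument (a sketch assuming ideal projective readout, not a derivation taken from this work), taking the sensitivity as $\eta = \sigma(B)\sqrt{T}$ with $T$ the total phase accumulation time:

```latex
% Standard measurement sensitivity: N repetitions at a fixed precession time tau
\sigma_\phi = \frac{1}{\sqrt{N}}, \qquad
\sigma(B) = \frac{\sigma_\phi}{\gamma\tau} = \frac{1}{\gamma\tau\sqrt{N}}, \qquad
T = N\tau
\;\Longrightarrow\;
\eta = \sigma(B)\sqrt{T} = \frac{1}{\gamma\sqrt{\tau}} \ge \frac{1}{\gamma\sqrt{T_2^*}}
\quad (\tau \le T_2^*).

% Heisenberg limit: the uncertainty falls inversely with the total phase accumulation time
\sigma(B) \propto \frac{1}{\gamma T}
\;\Longrightarrow\;
\eta = \sigma(B)\sqrt{T} \propto T^{-1/2}.
```

At fixed $\tau$ the sensitivity does not improve with $T$, whereas under HL scaling it keeps improving as $T$ grows, until the precession times reach $T_2^*$.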
A particularly relevant overhead is the time taken to read out the state of the spin, which depends on the experimental conditions. At cryogenic temperatures, spin-selective optical transitions can be accessed such that, during optical pumping, fluorescence is almost completely absent for one of the spin states. This single-shot method allows the spin state to be determined efficiently and with high confidence for any given Ramsey sequence (up to collection and detection efficiencies), resulting in a relatively low readout overhead. At room temperature, in contrast, where spin-selective optical transitions are not resolved, readout is typically performed by simultaneously exciting a spin triplet that includes both basis states and observing fluorescence from the subsequent decay, whose probabilities differ only modestly between the two spin states. Overcoming this classical uncertainty (in addition to quantum projection noise) to obtain a precise estimate of the relative spin-state probabilities after a given precession time $\tau$ has required repeated Ramsey sequences to produce a large ensemble of fluorescent photons. Such a large readout overhead significantly reduces the sensitivity of NV-magnetometry, and so far room-temperature operation has fallen short of the high sensitivities reported at cryogenic temperatures by several orders of magnitude. Yet NV-sensing at cryogenic temperatures is impractical for biological applications such as in-vivo measurements McGuinness et al. [2011] and monitoring of metabolic processes Degen et al. [2017].
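To illustrate why room-temperature readout has needed many repetitions, the sketch below treats each sequence as emitting a Poisson-distributed number of detected photons whose mean depends on the projected spin state. The detection rate (0.03 photons per bright readout) and the 30% contrast are hypothetical illustration values, not the measured parameters of this experiment, and shot noise is the only noise source modelled.

```python
import numpy as np

def sequences_needed(n_bright, contrast, snr=1.0):
    """Sequences per spin state needed to separate the two mean photon counts at a
    given signal-to-noise ratio, assuming Poissonian shot noise only:
    N = snr^2 * (n_bright + n_dark) / (n_bright - n_dark)^2."""
    n_dark = n_bright * (1 - contrast)
    return snr ** 2 * (n_bright + n_dark) / (n_bright - n_dark) ** 2

def normalised_p0(counts, n_seq, n_bright, n_dark):
    """Map the total photon count from n_seq sequences to an estimate of Pr(0)."""
    return (counts / n_seq - n_dark) / (n_bright - n_dark)

def simulate_counts(p0, n_seq, n_bright, n_dark, seed=0):
    """Each sequence projects the spin (bright with probability p0), then yields a
    Poisson-distributed number of detected photons; returns the total count."""
    rng = np.random.default_rng(seed)
    bright = rng.random(n_seq) < p0
    return int(rng.poisson(np.where(bright, n_bright, n_dark)).sum())

# Hypothetical numbers: 0.03 detected photons per bright readout, 30% contrast.
N_BRIGHT, CONTRAST = 0.03, 0.3
N_DARK = N_BRIGHT * (1 - CONTRAST)
counts = simulate_counts(0.5, 1_000_000, N_BRIGHT, N_DARK)
p0_estimate = normalised_p0(counts, 1_000_000, N_BRIGHT, N_DARK)
```

With these assumed numbers, `sequences_needed(0.03, 0.3)` is about 630 sequences per state just to separate the two states at unit signal-to-noise ratio, which is the kind of readout overhead the approach described here aims to avoid.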
A large body of work Giovannetti et al. [2004], Higgins et al. [2007], Berry et al. [2009], Higgins et al. [2009], Giovannetti et al. [2011], Said et al. [2011], Waldherr et al. [2012], Nusran et al. [2012], Bonato et al. [2015] has developed and improved quantum sensing algorithms to surpass the classical standard measurement sensitivity (SMS). While the SMS bounds the sensitivity that can be achieved for NV-magnetometry with a constant phase accumulation time, phase estimation algorithms using a set of different precession times $\tau$ allow the SMS to be overcome Waldherr et al. [2012], Nusran et al. [2012]. Further improvements in sensitivity are possible by adapting measurement bases, which requires fewer Ramsey sequences Bonato et al. [2015]. However, sensing algorithms that use a standard Bayesian approach typically involve probability distributions that are computationally intensive to update, or that contain outlying regions that significantly affect an estimate. An appealing alternative Granade et al. [2012], Wiebe et al. [2014], Wiebe and Granade [2016] uses techniques from machine learning to approximate a probability distribution with a relatively small collection of points, known as particles. These methods have been applied to the problem of learning a Hamiltonian Wiebe et al. [2014], Wang et al. [2017], and to implement noise-tolerant quantum phase estimation Paesani et al. [2017].
Here, we experimentally demonstrate a magnetic field learning (MFL) algorithm that operates with, on average, only one photon of readout from a single NV centre at room temperature, and achieves a level of sensitivity so far reported only for cryogenic operation Bonato et al. [2015]. MFL adapts efficient Bayesian phase estimation and Hamiltonian learning techniques to magnetometry, achieving fast convergence to the correct value of the magnetic field without requiring any adaptation of measurement bases. The parameters of our MFL algorithm, including the number of particles, can be optimised prior to operation without adding to the sensing time overhead. Each precession time $\tau$ is chosen Ruster et al. [2017] as the inverse of the uncertainty in the algorithm's previous estimate of $\omega$, allowing $\tau$ to grow exponentially and so achieve HL scaling in sensitivity. We tested MFL on a large data set from Ramsey interferometry experiments on an NV centre in bulk diamond. We benchmark the performance of MFL against standard FFT methods, as well as against previous experimental results from other phase estimation algorithms. Simultaneously with learning $\omega$, MFL produces an estimate of the dephasing time $T_2^*$, which, in contrast to other phase estimation algorithms, ensures that its sensitivity scales no worse than the SMS, however long the protocol runs. Remarkably, we show that MFL enables the dynamical tracking of time-varying magnetic fields at room temperature.
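A toy calculation (not the MFL algorithm itself) shows why setting each precession time to the inverse of the current uncertainty can give Heisenberg-like scaling. It assumes, hypothetically, that every epoch shrinks the uncertainty by a constant factor; then $\tau$ grows geometrically, the total time $T$ is dominated by the most recent $\tau$, and the product $\sigma T$ tends to a constant, i.e. $\sigma \propto 1/T$.

```python
import numpy as np

def heuristic_schedule(sigma0, shrink, n_epochs):
    """Toy model of the tau = 1/sigma heuristic.

    Assumes (hypothetically) that every epoch shrinks the uncertainty by a constant
    factor `shrink`; returns sigma after each epoch, the chosen taus and the running
    total phase accumulation time T.
    """
    sigmas = sigma0 * shrink ** np.arange(n_epochs + 1)   # sigma_0 ... sigma_n
    taus = 1.0 / sigmas[:-1]            # tau_k is set from the previous uncertainty
    total_time = np.cumsum(taus)        # T after each epoch
    return sigmas[1:], taus, total_time

sigmas, taus, T = heuristic_schedule(sigma0=1.0, shrink=0.9, n_epochs=200)
product = sigmas * T    # approaches shrink / (1 - shrink) = 9, i.e. sigma ~ 1/T
```

In a real experiment the shrink factor is neither constant nor guaranteed, and $\tau$ cannot usefully exceed $T_2^*$, so this only illustrates the mechanism.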
In general, Hamiltonian learning algorithms estimate the parameters of a model Hamiltonian $\hat{H}$ through iterations of informative measurements Wiebe et al. [2014]. At each step, a prior probability distribution stores estimates of every parameter and their uncertainties Granade et al. [2012]. Similarly, the four principal recursive steps of MFL, which together form an epoch and are depicted in Fig. 1(a-d), are: (a) choose the precession time $\tau$ for the next Ramsey sequence from the heuristic $\tau = 1/\sigma$, where $\sigma$ is the uncertainty embedded in the prior $P(\omega)$; (b) allow the system to evolve under $\hat{H}(\omega)$ for a time $\tau$, using the Ramsey sequence shown in Fig. 1(e-h); (c) measure the outcome $E$, extracted from the photoluminescence count, e.g. Fig. 1(i); (d) update the prior using Bayes' rule, $P(\omega \mid E;\tau) \propto \Pr(E \mid \omega;\tau)\,P(\omega)$, where $\Pr(E \mid \omega;\tau)$ is the likelihood function Granade et al. [2012]. The use of sequential Monte Carlo algorithms Granade et al. [2012], Wiebe et al. [2014], Wiebe and Granade [2016], in which particles are reallocated when required, makes the inference process practical and computationally efficient. Here, the Hamiltonian for the two relevant NV states is modelled as
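$$\hat{H}(\omega) = \frac{\hbar\omega}{2}\,\hat{\sigma}_z, \qquad (1)$$
so that $\omega = \gamma B$ is the only parameter to be estimated in order to learn the value of $B$.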
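The epoch loop (a)-(d) can be sketched from scratch as follows. This is a minimal illustration, not the authors' implementation (the Methods state that they adapted QInfer, whose API is not used here). Its assumptions are: an ideal likelihood $\cos^2(\omega\tau/2)$ without dephasing; outcomes pooled over a fixed number of sequences per epoch (binomial counts); a Liu-West resampler, a common choice in sequential Monte Carlo Hamiltonian learning; and a cap on $\tau$ standing in for the saturation of precession times. Units are arbitrary, and all names and parameter values are hypothetical.

```python
import numpy as np

def p0_ideal(omega, tau):
    """Likelihood Pr(E=0 | omega; tau) for an ideal Ramsey sequence (no dephasing)."""
    return np.cos(omega * tau / 2) ** 2

def posterior_moments(particles, log_w):
    """Normalised weights plus weighted mean and standard deviation of the particles."""
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    mean = np.sum(w * particles)
    return w, mean, np.sqrt(np.sum(w * (particles - mean) ** 2))

def liu_west(particles, w, rng, a=0.98):
    """Liu-West resampling: pick ancestors by weight, shrink them towards the mean and
    add Gaussian jitter, which preserves the posterior mean and variance."""
    mean = np.sum(w * particles)
    var = np.sum(w * (particles - mean) ** 2)
    idx = rng.choice(particles.size, size=particles.size, p=w)
    centres = a * particles[idx] + (1 - a) * mean
    return centres + rng.normal(0.0, np.sqrt((1 - a ** 2) * var), size=particles.size)

def mfl_sketch(omega_true, n_epochs=40, n_particles=2000, shots=100,
               omega_max=1.0, tau_max=100.0, seed=1):
    rng = np.random.default_rng(seed)
    particles = rng.uniform(0.0, omega_max, n_particles)   # flat prior on omega
    log_w = np.zeros(n_particles)
    for _ in range(n_epochs):
        _, _, sigma = posterior_moments(particles, log_w)
        tau = min(1.0 / sigma, tau_max)                     # (a) tau = 1/sigma, capped
        k = rng.binomial(shots, p0_ideal(omega_true, tau))  # (b)+(c) simulated counts of E=0
        p = np.clip(p0_ideal(particles, tau), 1e-12, 1 - 1e-12)
        log_w += k * np.log(p) + (shots - k) * np.log1p(-p)  # (d) Bayes' rule, log form
        log_w -= log_w.max()                                # keep log-weights bounded
        w, _, _ = posterior_moments(particles, log_w)
        if 1.0 / np.sum(w ** 2) < n_particles / 2:          # effective sample size too small
            particles = liu_west(particles, w, rng)
            log_w = np.zeros(n_particles)
    _, mean, sigma = posterior_moments(particles, log_w)
    return mean, sigma

estimate, uncertainty = mfl_sketch(omega_true=0.7)
```

Working with log-weights avoids numerical underflow when many sequences are pooled into one update, and resampling only when the effective sample size drops keeps the particle cloud from collapsing onto a few points.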
Experiments were performed using a confocal set-up at room temperature, with an external magnetic field of 450 G parallel to the NV centre axis, giving a Zeeman shift $\gamma B$ Waldherr et al. [2012], where $\gamma$ is the electron gyromagnetic ratio Jakobi et al. [2017]. For each Ramsey sequence, the electronic spin is initialised and read out with 532 nm laser pulses, and the photoluminescence (PL) signal is detected with an avalanche photodiode (APD) for 350 ns. The PL signal is then normalised to extract an experimental estimate of the spin-state probability. For every sequence, the experimental overhead is the sum of the laser pulse length (3 μs), an idle time for relaxation (1 μs), a short TTL pulse for synchronisation (20 ns) and the duration of the two MW pulses (together 50 ns).
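The per-sequence overhead quoted above adds up as follows. The wall-clock helper and the Zeeman-shift line are illustrative additions; the latter assumes the textbook value $\gamma/2\pi \approx 2.8025$ MHz/G rather than a figure stated in this text.

```python
# Per-sequence experimental overhead quoted in the text, in microseconds.
LASER_US = 3.0        # initialisation/readout laser pulses
IDLE_US = 1.0         # relaxation idle time
TTL_US = 0.020        # synchronisation TTL pulse
MW_US = 0.050         # both microwave pulses together
OVERHEAD_US = LASER_US + IDLE_US + TTL_US + MW_US   # 4.07 us

def wall_time_us(n_sequences, tau_us):
    """Wall-clock time of n Ramsey sequences at precession time tau, overheads included."""
    return n_sequences * (tau_us + OVERHEAD_US)

# Zeeman shift at 450 G, assuming the textbook value gamma/2pi ~ 2.8025 MHz/G.
ZEEMAN_SHIFT_GHZ = 2.8025e-3 * 450   # ~1.26 GHz
```

For a 1 μs precession time, roughly 80% of each sequence (4.07 μs out of 5.07 μs) is overhead, which is why reducing the number of sequences per estimate matters so much.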
Data for several hundred Ramsey fringes were generated from experiments on three NV centres (see Supplementary Table S1). In particular, one data set comprises Ramsey sequences for precession times $\tau$ increasing in steps of 20 ns. For each $\tau$, a fixed number of sequences was performed, and the data were stored such that the result of each individual sequence could be retrieved. Subsets of these data could therefore be selected and combined to construct fringes comprising fewer sequences per $\tau$. Running MFL on a sample of these subsets allowed its performance to be compared across fringes with different (but, within a fringe, fixed) numbers of sequences, down to the regime where (owing to low collection efficiencies) the average PL count is approximately one photon. Additional experiments on the three NVs generated further data sets of several hundred fringes, each comprising tens of thousands of averaged sequences. Unless otherwise stated, all MFL results are reported as representative behaviour averaged over 1000 independent protocol runs, each using a single fringe from these data sets.
We begin by analysing how the uncertainty in the magnetic field estimate, $\sigma(B)$, obtained from the variance of the posterior distribution, scales with the number of MFL epochs. For this purpose, we use a data set in which all fringes were obtained with the same number of sequences. At every MFL epoch, given the adaptively chosen phase accumulation time $\tau_k$, the experimental datum whose $\tau$ is closest to $\tau_k$ (i.e. minimising $|\tau - \tau_k|$) is provided to the MFL updater. Figure 2(a) shows $\sigma(B)$ decreasing exponentially up to a certain number of epochs. Beyond this point, the precession times selected by MFL saturate at 10 μs, and $\sigma(B)$ is reduced only polynomially, by accumulating further statistics at precession times already retrieved. This slowdown is analogous to that occurring when the heuristic requests $\tau$ exceeding the system dephasing time Granade et al. [2012] (see Supplementary Information for details). A comparison with FFT methods, inset in Fig. 2(a), finds that $\sigma(B)$ is markedly smaller for MFL.
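Because the data were recorded on a fixed grid, each requested precession time has to be mapped to the nearest recorded one. The sketch below shows this lookup. The 20 ns spacing and the 10 μs saturation are taken from the text, while the grid's lower end (20 ns) and the function name are hypothetical.

```python
import numpy as np

def nearest_datum(tau_requested, tau_grid):
    """Index of the recorded precession time closest to the requested one,
    i.e. argmin |tau - tau_k| over the available grid."""
    return int(np.argmin(np.abs(tau_grid - tau_requested)))

# Hypothetical grid in microseconds: 20 ns to 10 us in 20 ns steps.
tau_grid_us = np.arange(1, 501) * 0.020
idx = nearest_datum(3.1234, tau_grid_us)   # snaps to 3.12 us
```

Any request beyond the longest recorded time maps to the last grid point, which reproduces the saturation of the selected precession times described above.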
Neglecting overheads, the sensitivity $\eta$ of a magnetometer is calculated from
$$\eta^2 = \sigma^2(B)\,T, \qquad (2)$$
where $T$ is the total phase accumulation time, i.e. the sum of all precession times performed over the epochs so far. Figure 2(b) plots the sensitivity against $T$ for each epoch, and compares MFL with the standard FFT method using the same data set. The precision of MFL follows a scaling that overlaps with HL scaling ($\eta^2 \propto T^{-1}$). The FFT method rapidly approaches the SMS ($\eta^2 \propto T^{0}$), whereas (neglecting overheads) the scalings reported for quantum phase estimation methods are qualitatively comparable to that of MFL, at the expense of more intensive post-processing Said et al. [2011].
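The scaling exponent can be extracted with a linear least-squares fit of $\log \eta^2$ against $\log T$: a slope near $-1$ indicates HL scaling and a slope near $0$ indicates the SMS. The sketch below assumes exactly this log-log fit (the fitting details used for the figures are not given here) and checks it on synthetic, not experimental, data.

```python
import numpy as np

def scaling_exponent(total_time, eta):
    """Slope of log(eta^2) versus log(T) from a linear least-squares fit.
    Roughly -1 indicates Heisenberg-like scaling and 0 indicates the SMS."""
    slope, _intercept = np.polyfit(np.log(total_time), np.log(np.asarray(eta) ** 2), 1)
    return slope

# Synthetic illustration (not experimental data): eta^2 = c / T versus eta^2 = c.
T = np.logspace(-6, -2, 20)
hl_slope = scaling_exponent(T, np.sqrt(1e-12 / T))
sms_slope = scaling_exponent(T, np.full(T.size, 1e-6))
```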
For a true measure of absolute sensitivity, experimental and computational overheads must be accounted for. Including initialisation, readout and computation time in the total running time, we redefine Eq. (2) to obtain the absolute sensitivity $\eta$ (see Methods for details). The average number of luminescent photons used for readout during each epoch scales linearly with the number of sequences, since on average a fixed number of sequences is needed to detect one photon. As shown in Fig. 3, we use MFL to measure the scaling of $\eta$ with the total running time (up to a fixed maximum number of epochs) for decreasing numbers of sequences within each epoch. The plots have a shape characterised by an initial slow decrease, followed by a fast increase in precision. The relatively slow learning rate at the short phase accumulation times in the early stages of the algorithm leads to a slow increase in phase accumulation time, since $\tau \propto 1/\sigma$. The algorithm is learning slowly, but the total measurement time is increasing faster than the uncertainty decreases. However, once the particles start converging to a valid estimate of $\omega$, the uncertainty decreases exponentially, overcoming the corresponding increase in sensing time. Our analysis compares well with previous results obtained under cryogenic conditions, and scaling parameters from linear least-squares fitting are consistent with HL scaling for protocol update rates up to 13 Hz.
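The effect of overheads on the absolute sensitivity can be sketched by replacing $T$ with a total running time that also counts per-sequence overheads and computation. The exact definition is given in the Methods and is not reproduced here; the form below (every sequence contributes its $\tau$ plus a fixed overhead, and every epoch adds a fixed computation time) is an assumption, and the numbers are hypothetical.

```python
import numpy as np

def absolute_sensitivity(sigma_b, taus, n_seq, t_overhead, t_comp):
    """eta_abs = sigma_B * sqrt(T_tot), with an assumed T_tot that sums, over epochs,
    n_seq * (tau_k + per-sequence overhead) plus a per-epoch computation time."""
    taus = np.asarray(taus, dtype=float)
    t_tot = float(np.sum(n_seq * (taus + t_overhead)) + t_comp * taus.size)
    return sigma_b * np.sqrt(t_tot), t_tot

# Hypothetical numbers in microseconds: two epochs, 10 sequences each.
eta_abs, t_tot = absolute_sensitivity(2.0, [1.0, 2.0], n_seq=10, t_overhead=4.07, t_comp=100.0)
eta_bare = 2.0 * np.sqrt(10 * (1.0 + 2.0))   # same data with overheads neglected
```

Since the overhead term grows with the number of sequences per epoch, fewer sequences shorten the total running time, at the cost of noisier individual updates.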
Decreasing the number of sequences (and thus the number of detected photons) per epoch increases the statistical noise, which extends the slow learning period. However, the total running time decreases with the number of sequences, producing an increased sensitivity in a shorter time. For the smallest numbers of sequences, readout infidelities and losses become the dominant noise mechanisms; in that regime, these additional sources of noise were therefore included in the model. With on average one photon per epoch, we obtain a sensitivity of 60 nT s^{1/2} in 10 ms (see also the Supplementary Information).
When an NV-sensing algorithm begins to request precession times beyond the dephasing time $T_2^*$, where no information can be retrieved, the effectively wasted sensing time reduces the sensitivity. Knowledge of $T_2^*$ can ensure that all $\tau$ are less than $T_2^*$, preventing this reduction in sensitivity and instead guaranteeing that it scales at the SMS for long sensing times. Learning $T_2^*$ simultaneously with $\omega$, as part of a multiparameter estimation strategy Ralph et al. [2017], Ciampini et al. [2016], can be more efficient than independently estimating $T_2^*$ ahead of each sensing experiment. MFL naturally operates as a multiparameter estimation protocol when the prior probability distribution is multivariate Granade et al. [2012], with the uncertainty in its joint probability distribution captured by a normalised covariance matrix $\Sigma$.
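The Methods adopt the likelihood of Granade et al. [2012]; the sketch below assumes the exponential-decay form commonly written for that model, in which the fringe contrast decays as $e^{-\tau/T_2}$ and the outcome tends to a coin flip. It evaluates the likelihood over a two-parameter particle cloud $(\omega, T_2)$; the particle values and units are hypothetical.

```python
import numpy as np

def p0_dephased(omega, t2, tau):
    """Ramsey likelihood Pr(0 | omega, T2; tau) with exponential dephasing:
    the fringe contrast decays as exp(-tau/T2) and the outcome tends to a coin flip."""
    decay = np.exp(-tau / t2)
    return decay * np.cos(omega * tau / 2) ** 2 + (1 - decay) / 2

def fringe_slope_amplitude(tau, t2):
    """Amplitude of d Pr(0) / d omega, i.e. (tau/2) * exp(-tau/T2): how strongly a
    sequence at precession time tau responds to a change in omega."""
    return 0.5 * tau * np.exp(-tau / t2)

# Particles for the two-parameter model: column 0 = omega, column 1 = T2 (hypothetical units).
rng = np.random.default_rng(0)
particles = np.column_stack([rng.uniform(0.0, 1.0, 5), rng.uniform(5.0, 20.0, 5)])
probs = p0_dephased(particles[:, 0], particles[:, 1], tau=3.0)
```

Under this model the slope amplitude peaks at $\tau = T_2$ and then decays exponentially, so sequences much longer than the dephasing time carry almost no information about $\omega$.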
Each precession time is chosen proportional to the inverse of the (Frobenius) norm of the covariance matrix (see Methods). This can incur an initial slow learning period, because shorter $\tau$ are initially most useful for estimating $\omega$, while longer precession times are better for estimating $T_2^*$. We therefore begin MFL in single-parameter estimation mode for $\omega$, and introduce the simultaneous learning of $T_2^*$ at a later epoch (chosen empirically).
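A literal reading of "$\tau$ proportional to the inverse of the Frobenius norm of the normalised covariance matrix" can be sketched as below. The per-parameter `scales` used to normalise $\Sigma$ and the proportionality constant `k` are hypothetical placeholders; the actual normalisation is defined in the Methods and is not reproduced here.

```python
import numpy as np

def weighted_covariance(particles, weights):
    """Weighted covariance of an (n_particles, n_params) particle cloud."""
    mean = weights @ particles
    d = particles - mean
    return (weights[:, None] * d).T @ d

def tau_from_covariance(particles, weights, scales, k=1.0):
    """Hypothetical multiparameter heuristic: tau = k / ||Sigma||_F, where Sigma is the
    covariance after dividing each parameter by an assumed characteristic scale."""
    cov = weighted_covariance(particles, weights)
    cov_normalised = cov / np.outer(scales, scales)
    return k / np.linalg.norm(cov_normalised, "fro")

# Example cloud of (omega, T2) particles with hypothetical spreads and scales.
rng = np.random.default_rng(0)
cloud = np.column_stack([rng.normal(0.7, 0.05, 500), rng.normal(10.0, 2.0, 500)])
tau = tau_from_covariance(cloud, np.full(500, 1 / 500), scales=np.array([1.0, 10.0]))
```

Normalising each parameter first keeps quantities with different units (a frequency and a time) from being mixed arbitrarily in a single norm.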
Figure 4 shows results from running the multiparameter MFL algorithm on one of the data sets. As for the single-parameter estimation results, we find an exponential scaling of the generalised uncertainty with the number of epochs, though the two parameters are learned at different rates. There is a discrepancy between the estimate of $T_2^*$ from MFL shown in Fig. 4(a) and the (non-weighted least-squares) fit to the decaying sinusoid shown in Fig. 4(d). This discrepancy results from the particle guess heuristic (PGH) preferentially requesting $\tau < T_2^*$, so that the MFL estimate of $T_2^*$ is informed mostly by data at these relatively short timescales (see Methods).
The strength of $B$ may not be fixed in time in typical sensing experiments Bonato and Berry [2017]. The Bayesian inference process is designed to learn online when experimentally retrieved likelihoods conflict with its prior information. The ability to track time-varying magnetic fields thus follows naturally from MFL's processing speed and adaptivity. With a few additional checks in the Bayesian inference procedure, MFL can account for fluctuations and high-amplitude changes in the sensed $B$ (see Methods for details). Here, we test an algorithm that tracks a field using a data set in which $B$ was experimentally modulated by changing the position of the permanent magnet (see Fig. 1b). Data recording was paused during magnet adjustments, leading to step-wise transitions in this data set, in which the magnetic field instantly jumps to a new strength and then remains stable for a period of hundreds to thousands of milliseconds.
Results are shown in Fig. 5(a), which includes large, instantaneous, multi-fold changes in $B$. MFL detects when the posterior distribution has become non-representative of the most recent measurements and responds by increasing the uncertainty $\sigma(B)$. After approximately 10 epochs, the estimate converges to the new value of $B$. Figure 5(b) summarises the different computational and experimental contributions to the total running time per epoch (about 10 ms). Only a small part of this time is due to MFL computation, the remainder coming from experimental routines. We note that the computational efficiency of MFL yields a computational overhead (0.21 ms) that is smaller than the average phase accumulation time (0.41 ms) and almost two orders of magnitude smaller than the experimental overheads (16.28 ms).
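The Methods describe the authors' actual procedure for detecting a non-representative posterior. The following is only one simple, hypothetical way to implement such a check: a posterior predictive test that redraws the particles with an inflated spread when the newest datum is very unlikely. The `threshold` and `inflate` values are illustrative.

```python
import numpy as np

def predictive_probability(weights, likelihoods):
    """Probability of the newest datum under the current posterior:
    sum_i w_i * Pr(E | omega_i; tau)."""
    return float(np.sum(weights * likelihoods))

def reset_if_surprised(particles, weights, likelihoods, rng, threshold=1e-3, inflate=10.0):
    """Hypothetical change detector: if the newest datum is very unlikely under the
    posterior, redraw the particles around the current mean with an inflated spread,
    so the reported uncertainty grows and the filter can relearn the field."""
    if predictive_probability(weights, likelihoods) >= threshold:
        return particles, weights, False
    mean = np.sum(weights * particles)
    std = np.sqrt(np.sum(weights * (particles - mean) ** 2))
    new_particles = rng.normal(mean, inflate * std, size=particles.size)
    return new_particles, np.full(particles.size, 1.0 / particles.size), True
```

Inflating the spread around the current mean, rather than returning to the flat initial prior, keeps some of the accumulated information; a full reset to the prior is the more conservative alternative for very large jumps.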
Figure 5(c) shows numerical results demonstrating the resilience of MFL against a dynamic component of increasing frequency when tracking an AC oscillating field $B(t)$, composed of a constant component and an oscillating one. The effectiveness of the tracking in each run is captured by a time-dependent normalised squared error, averaged over all epochs performed, which accounts for $B$ not being constant across epochs. Typical estimation errors in $B$ are lower than 3% for dynamic components up to 18 μT/ms.
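The tracking metric can be sketched as follows, assuming a test field $B(t) = B_0 + \Delta B \sin(2\pi f t)$ and an error defined as the epoch average of $((\hat{B}_k - B_k)/B_k)^2$. Both forms are assumptions, since the exact definitions are not reproduced here. The peak rate of change $2\pi f \Delta B$ is what a "dynamic component" in μT/ms measures for such a field.

```python
import numpy as np

def field(t_ms, b0_ut, db_ut, freq_khz):
    """Hypothetical AC test field B(t) = B0 + dB * sin(2*pi*f*t), in microtesla."""
    return b0_ut + db_ut * np.sin(2 * np.pi * freq_khz * t_ms)

def normalised_squared_error(b_est, b_true):
    """Epoch-averaged normalised squared error: mean over epochs of ((B_est - B) / B)^2."""
    b_est, b_true = np.asarray(b_est), np.asarray(b_true)
    return float(np.mean(((b_est - b_true) / b_true) ** 2))

def max_slew_ut_per_ms(db_ut, freq_khz):
    """Peak rate of change |dB/dt| = 2*pi*f*dB of the oscillating component."""
    return 2 * np.pi * freq_khz * db_ut

# Hypothetical run: one epoch every 10 ms, and an estimator lagging the field by 2 ms.
t_ms = np.linspace(0.0, 100.0, 11)
b_true = field(t_ms, 100.0, 5.0, 0.01)
lagged = field(t_ms - 2.0, 100.0, 5.0, 0.01)
err = normalised_squared_error(lagged, b_true)
```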
The performance of magnetic field learning found for our room-temperature set-up is comparable to that of other protocols in cryogenic environments Bonato et al. [2015]. These methods could be applied to other sensing platforms where noise has been a limiting factor. Alternatively, in pursuit of the fundamental limits of absolute sensing precision, they could be combined with single-shot readout Robledo et al. [2011], adaptive measurement bases Bonato et al. [2015], faster communication, and dynamical decoupling techniques MacQuarrie et al. [2015], Farfurnik et al. [2018]. Our methods would be particularly effective in applications where single-spin sensing is desired for nanoscale resolution but cryogenic conditions are prohibitive, such as biological sensing and new nano-MRI applications Barry et al. [2016], Boretti and Castelletto [2016].
References
 Taylor et al. [2008] J M Taylor, P Cappellaro, L Childress, L Jiang, D Budker, P R Hemmer, A Yacoby, R Walsworth, and M D Lukin. High-sensitivity diamond magnetometer with nanoscale resolution. Nature Physics, 4(10):810–816, October 2008. doi: 10.1038/nphys1075. URL https://www.nature.com/articles/nphys1075.
 McGuinness et al. [2011] L P McGuinness, Y Yan, A Stacey, D A Simpson, L T Hall, D Maclaurin, S Prawer, P Mulvaney, J Wrachtrup, F Caruso, R E Scholten, and L C L Hollenberg. Quantum measurement and orientation tracking of fluorescent nanodiamonds inside living cells. Nature Nanotechnology, 6(6):358–363, June 2011. doi: 10.1038/nnano.2011.64. URL https://www.nature.com/articles/nnano.2011.64.
 Müller et al. [2014] C Müller, X Kong, J M Cai, K Melentijević, A Stacey, M Markham, D Twitchen, J Isoya, S Pezzagna, J Meijer, J F Du, M B Plenio, B Naydenov, L P McGuinness, and F Jelezko. Nuclear magnetic resonance spectroscopy with single spin sensitivity. Nature Communications, 5:4703, August 2014. doi: 10.1038/ncomms5703. URL https://www.nature.com/articles/ncomms5703.
 Lovchinsky et al. [2017] I. Lovchinsky, J. D. Sanchez-Yamagishi, E. K. Urbach, S. Choi, S. Fang, T. I. Andersen, K. Watanabe, T. Taniguchi, A. Bylinskii, E. Kaxiras, P. Kim, H. Park, and M. D. Lukin. Magnetic resonance spectroscopy of an atomically thin material using a single-spin qubit. Science, 355(6324):503–507, February 2017. ISSN 0036-8075. doi: 10.1126/science.aal2538. URL http://science.sciencemag.org/content/355/6324/503.
 Zhao et al. [2011] N Zhao, J L Hu, S W Ho, J T K Wan, and R B Liu. Atomic-scale magnetometry of distant nuclear spin clusters via nitrogen-vacancy spin in diamond. Nature Nanotechnology, 6(4):242–246, April 2011. doi: 10.1038/nnano.2011.22. URL https://www.nature.com/articles/nnano.2011.22.
 Barry et al. [2016] J F Barry, M J Turner, J M Schloss, David R Glenn, Y Song, M D Lukin, H Park, and R L Walsworth. Optical magnetic detection of single-neuron action potentials using quantum defects in diamond. Proceedings of the National Academy of Sciences, 113(49):14133–14138, November 2016. ISSN 0027-8424. doi: 10.1073/pnas.1601513113. URL http://www.pnas.org/content/113/49/14133.
 Robledo et al. [2011] L Robledo, L Childress, H Bernien, B Hensen, P F A Alkemade, and R Hanson. High-fidelity projective readout of a solid-state spin quantum register. Nature, 477(7366):574–578, September 2011. doi: 10.1038/nature10401. URL https://www.nature.com/articles/nature10401.
 Bonato et al. [2015] C Bonato, M S Blok, H T Dinani, D W Berry, M L Markham, D J Twitchen, and R Hanson. Optimized quantum sensing with a single electron spin using real-time adaptive measurements. Nature Nanotechnology, 11(3):247–252, November 2015. doi: 10.1038/nnano.2015.261. URL http://dx.doi.org/10.1038/nnano.2015.261.
 Murphy [2012] K P Murphy. Machine Learning: a Probabilistic Perspective. MIT Press, 2012. ISBN 9780262018029. URL https://mitpress.mit.edu/books/machine-learning-1.
 Jordan and Mitchell [2015] M I Jordan and T M Mitchell. Machine learning: Trends, perspectives, and prospects. Science, 349(6245):255–260, July 2015. ISSN 0036-8075. doi: 10.1126/science.aaa8415. URL http://science.sciencemag.org/content/349/6245/255.
 Hentschel and Sanders [2010] A Hentschel and B C Sanders. Machine learning for precise quantum measurement. Physical Review Letters, 104(6):063603, February 2010. doi: 10.1103/PhysRevLett.104.063603. URL https://link.aps.org/doi/10.1103/PhysRevLett.104.063603.
 Granade et al. [2012] C E Granade, C Ferrie, N Wiebe, and D G Cory. Robust online Hamiltonian learning. New Journal of Physics, 14(10):103013, October 2012. doi: 10.1088/1367-2630/14/10/103013. URL http://iopscience.iop.org/article/10.1088/1367-2630/14/10/103013.
 Wiebe et al. [2014] N Wiebe, C Granade, C Ferrie, and D G Cory. Hamiltonian learning and certification using quantum resources. Physical Review Letters, 112(19):190501–5, May 2014. doi: 10.1103/PhysRevLett.112.190501. URL http://link.aps.org/doi/10.1103/PhysRevLett.112.190501.
 Hincks et al. [2018] I Hincks, C Granade, and D G Cory. Statistical inference with quantum measurements: methodologies for nitrogen vacancy centers in diamond. New Journal of Physics, 20(1):013022, January 2018. doi: 10.1088/1367-2630/aa9c9f. URL http://iopscience.iop.org/article/10.1088/1367-2630/aa9c9f.
 Rondin et al. [2014] L Rondin, J-P Tetienne, T Hingant, J-F Roch, P Maletinsky, and V Jacques. Magnetometry with nitrogen-vacancy defects in diamond. Reports on Progress in Physics, 77(5):056503, May 2014. doi: 10.1088/0034-4885/77/5/056503. URL http://iopscience.iop.org/article/10.1088/0034-4885/77/5/056503/meta.
 Jelezko and Wrachtrup [2006] F Jelezko and J Wrachtrup. Single defect centres in diamond: A review. Physica Status Solidi (a), 203(13):3207–3225, October 2006. doi: 10.1002/pssa.200671403. URL https://onlinelibrary.wiley.com/doi/full/10.1002/pssa.200671403.
 Degen [2008] C L Degen. Scanning magnetic field microscope with a diamond single-spin sensor. Applied Physics Letters, 92(24):243111, June 2008. doi: 10.1063/1.2943282. URL http://aip.scitation.org/doi/10.1063/1.2943282.
 Nusran et al. [2012] N M Nusran, M Ummal Momeen, and M V G Dutt. High-dynamic-range magnetometry with a single electronic spin in diamond. Nature Nanotechnology, 7(2):109–113, February 2012. doi: 10.1038/nnano.2011.225. URL https://www.nature.com/articles/nnano.2011.225.
 Arai et al. [2015] K Arai, C Belthangady, H Zhang, N Bar-Gill, S J DeVience, P Cappellaro, A Yacoby, and R L Walsworth. Fourier magnetic imaging with nanoscale resolution and compressed sensing speedup using electronic spins in diamond. Nature Nanotechnology, 10(10):859–864, October 2015. doi: 10.1038/nnano.2015.171. URL https://www.nature.com/articles/nnano.2015.171.
 Waldherr et al. [2012] G Waldherr, J Beck, P Neumann, R S Said, M Nitsche, M L Markham, D J Twitchen, J Twamley, F Jelezko, and J Wrachtrup. High-dynamic-range magnetometry with a single nuclear spin in diamond. Nature Nanotechnology, 7(2):105–108, February 2012. doi: 10.1038/nnano.2011.224. URL https://www.nature.com/articles/nnano.2011.224.
 Puentes et al. [2014] G Puentes, G Waldherr, P Neumann, G Balasubramanian, and J Wrachtrup. Efficient route to high-bandwidth nanoscale magnetometry using single spins in diamond. Scientific Reports, 4(1):4677, April 2014. doi: 10.1038/srep04677. URL https://www.nature.com/articles/srep04677.
 Balasubramanian et al. [2008] G Balasubramanian, I Y Chan, R Kolesov, M Al-Hmoud, J Tisler, C Shin, C Kim, A Wojcik, P R Hemmer, A Krueger, T Hanke, A Leitenstorfer, R Bratschitsch, F Jelezko, and J Wrachtrup. Nanoscale imaging magnetometry with diamond spins under ambient conditions. Nature, 455(7213):648, October 2008. doi: 10.1038/nature07278. URL https://www.nature.com/articles/nature07278.
 Berry et al. [2009] D W Berry, B L Higgins, S D Bartlett, M W Mitchell, GJ Pryde, and H M Wiseman. How to perform the most accurate possible phase measurements. Physical Review A, 80(5):3–22, November 2009. doi: 10.1103/PhysRevA.80.052114. URL https://link.aps.org/doi/10.1103/PhysRevA.80.052114.
 Degen et al. [2017] C L Degen, F Reinhard, and P Cappellaro. Quantum sensing. Reviews of Modern Physics, 89(3):035002, July 2017. doi: 10.1103/RevModPhys.89.035002. URL http://link.aps.org/doi/10.1103/RevModPhys.89.035002.
 Giovannetti et al. [2004] V Giovannetti, S Lloyd, and L Maccone. Quantum-enhanced measurements: Beating the standard quantum limit. Science, 306(5700):1330–1336, November 2004. doi: 10.1126/science.1104149. URL http://www.sciencemag.org/cgi/doi/10.1126/science.1104149.
 Higgins et al. [2007] B L Higgins, D W Berry, S D Bartlett, H M Wiseman, and GJ Pryde. Entanglement-free Heisenberg-limited phase estimation. Nature, 450(7168):393–396, November 2007. doi: 10.1038/nature06257. URL https://www.nature.com/articles/nature06257.
 Higgins et al. [2009] B L Higgins, D W Berry, S D Bartlett, M W Mitchell, H M Wiseman, and GJ Pryde. Demonstrating Heisenberg-limited unambiguous phase estimation without adaptive measurements. New Journal of Physics, 11(7):073023, July 2009. doi: 10.1088/1367-2630/11/7/073023. URL http://iopscience.iop.org/article/10.1088/1367-2630/11/7/073023.
 Giovannetti et al. [2011] V Giovannetti, S Lloyd, and L Maccone. Advances in quantum metrology. Nature Photonics, 5(4):222–229, April 2011. doi: 10.1038/nphoton.2011.35. URL https://www.nature.com/articles/nphoton.2011.35.
 Said et al. [2011] R. S. Said, D. W. Berry, and J. Twamley. Nanoscale magnetometry using a single-spin system in diamond. Phys. Rev. B, 83:125410, Mar 2011. doi: 10.1103/PhysRevB.83.125410. URL https://link.aps.org/doi/10.1103/PhysRevB.83.125410.
 Wiebe and Granade [2016] N Wiebe and C Granade. Efficient Bayesian phase estimation. Physical Review Letters, 117(1):010503, June 2016. doi: 10.1103/PhysRevLett.117.010503. URL https://link.aps.org/doi/10.1103/PhysRevLett.117.010503.
 Wang et al. [2017] J. Wang, S Paesani, R Santagati, S Knauer, A A Gentile, N Wiebe, M Petruzzella, J L O’Brien, J G Rarity, A Laing, and M G Thompson. Experimental quantum Hamiltonian learning. Nature Physics, 13(6):551–555, June 2017. doi: 10.1038/nphys4074. URL http://www.nature.com/doifinder/10.1038/nphys4074.
 Paesani et al. [2017] S Paesani, A A Gentile, R Santagati, J. Wang, N Wiebe, D P Tew, J. L. O’Brien, and M. G. Thompson. Experimental Bayesian quantum phase estimation on a silicon photonic chip. Physical Review Letters, 118(10):100503, March 2017. doi: 10.1103/PhysRevLett.118.100503. URL https://link.aps.org/doi/10.1103/PhysRevLett.118.100503.
 Ruster et al. [2017] T Ruster, H Kaufmann, M A Luda, V Kaushal, C T Schmiegelow, F Schmidt-Kaler, and U G Poschinger. Entanglement-based dc magnetometry with separated ions. Physical Review X, 7(3):031050, September 2017. doi: 10.1103/PhysRevX.7.031050. URL https://link.aps.org/doi/10.1103/PhysRevX.7.031050.
 Bonato and Berry [2017] C Bonato and D W Berry. Adaptive tracking of a time-varying field with a quantum sensor. Physical Review A, 95(5):052348, May 2017. doi: 10.1103/PhysRevA.95.052348. URL http://link.aps.org/doi/10.1103/PhysRevA.95.052348.
 Jakobi et al. [2017] I Jakobi, P Neumann, Y Wang, D B R Dasari, F El Hallak, M A Bashir, M Markham, A Edmonds, D Twitchen, and J Wrachtrup. Measuring broadband magnetic fields on the nanoscale using a hybrid quantum register. Nature Nanotechnology, 12(1):67–72, January 2017. doi: 10.1038/nnano.2016.163. URL https://www.nature.com/articles/nnano.2016.163.
 Ralph et al. [2017] J F Ralph, S Maskell, and K Jacobs. Multiparameter estimation along quantum trajectories with sequential Monte Carlo methods. Phys. Rev. A, 96:052306, Nov 2017. doi: 10.1103/PhysRevA.96.052306. URL https://link.aps.org/doi/10.1103/PhysRevA.96.052306.
 Ciampini et al. [2016] M A Ciampini, N Spagnolo, C Vitelli, L Pezzè, A Smerzi, and F Sciarrino. Quantum-enhanced multiparameter estimation in multi-arm interferometers. Scientific Reports, 6(1):222, July 2016. doi: 10.1038/srep28881. URL https://www.nature.com/articles/srep28881.
 MacQuarrie et al. [2015] E. R. MacQuarrie, T. A. Gosavi, S. A. Bhave, and G. D. Fuchs. Continuous dynamical decoupling of a single diamond nitrogen-vacancy center spin with a mechanical resonator. Phys. Rev. B, 92:224419, Dec 2015. doi: 10.1103/PhysRevB.92.224419. URL https://link.aps.org/doi/10.1103/PhysRevB.92.224419.
 Farfurnik et al. [2018] D Farfurnik, A Jarmola, D Budker, and N Bar-Gill. Spin ensemble-based ac magnetometry using concatenated dynamical decoupling at low temperatures. Journal of Optics, 20(2):024008, January 2018. doi: 10.1088/2040-8986/aaa1bf. URL http://iopscience.iop.org/article/10.1088/2040-8986/aaa1bf.
 Boretti and Castelletto [2016] Albert Boretti and Stefania Castelletto. Nanometric resolution magnetic resonance imaging methods for mapping functional activity in neuronal networks. MethodsX, 3:297–306, 2016. ISSN 2215-0161. doi: 10.1016/j.mex.2016.04.003. URL http://www.sciencedirect.com/science/article/pii/S2215016116300085.
 Granade et al. [2017] C Granade, C Ferrie, Ian Hincks, S Casagrande, T Alexander, J Gross, M Kononenko, and Y Sanders. QInfer: Statistical inference software for quantum applications. Quantum, 1:5, April 2017. doi: 10.22331/q-2017-04-25-5. URL http://quantum-journal.org/papers/q-2017-04-25-5/.
 Cover and Thomas [2006] T M Cover and J A Thomas. Elements of information theory. John Wiley & Sons, Inc., Hoboken, NJ, USA, 2006. ISBN 9780471241959. doi: 10.1002/047174882X. URL http://doi.wiley.com/10.1002/047174882X.
 Ferrie [2014] C Ferrie. Dataprocessing inequalities for quantum metrology. Physical Review A, 90(1):014101, July 2014. doi: 10.1103/PhysRevA.90.014101. URL https://link.aps.org/doi/10.1103/PhysRevA.90.014101.
 Gill and Levit [1995] R D Gill and B Y Levit. Applications of the van trees inequality: a bayesian cramérrao bound. Bernoulli, 1(12):59–79, March 1995. doi: 10.2307/3318681. URL https://www.jstor.org/stable/3318681.
 Opper [1999] M Opper. A Bayesian approach to online learning. Cambridge University Press, New York, May 1999. ISBN 0521652634. URL http://dl.acm.org/citation.cfm?id=304710.304756.
Methods
MFL execution The data processing was performed by adapting the open-source Python package QInfer Granade et al. [2017] to the case of experimental metrology.
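In order to describe experimental data from Ramsey fringes collected from an NV centre with dephasing time $T_2$, immersed in a magnetic field of intensity $B$, we adopt the likelihood function of Granade et al. [2012]:
$$\Pr(0 \mid B, T_2; \tau) = e^{-\tau/T_2}\cos^2\!\left(\frac{\gamma B \tau}{2}\right) + \frac{1 - e^{-\tau/T_2}}{2}, \qquad (3)$$
where $\gamma$ is the electron gyromagnetic ratio, $\tau$ is the phase accumulation time, and $T_2$ is either a known parameter or approximated by $T_2 \to \infty$ whenever it is much longer than the phase accumulation times probed.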
When several sequences are combined for each datum, the datum adopted was obtained from the combined sequences, as stated in the main text. Results in Fig. 2a&b and Fig. 5 were all obtained by adopting a majority voting scheme to preprocess data from combined sequences Paesani et al. [2017]. Majority voting decides each single-shot datum according to the most frequent outcome. To this end, the average photoluminescence count $\bar{n}$ detected throughout the execution of a Ramsey sequence is determined beforehand, during the characterisation of the experimental setup. The outcome of a single measurement is then determined by comparing the number of photons $n$ detected during the measurement (accumulated over the combined sweeps) with $\bar{n}$: if $n > \bar{n}$, the outcome is set to $0$, otherwise to $1$. Without this scheme in place, the outcome of a measurement is assigned by sampling from the set $\{0, 1\}$ with probabilities $\{n/n_{\max}, 1 - n/n_{\max}\}$, respectively, where $n_{\max}$ is the maximum photoluminescence count estimated during the characterisation.
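The two preprocessing rules above can be sketched as follows. This is a minimal illustration rather than the code used for the analysis: the function names are hypothetical, and we assume that the probabilistic scheme normalises the photon count by $n_{\max}$, i.e. $\Pr(0) = n/n_{\max}$, clipped to $[0, 1]$.

```python
import random


def majority_vote(n_photons: float, n_mean: float) -> int:
    """Majority voting: counts above the average level n_mean give outcome 0, else 1."""
    return 0 if n_photons > n_mean else 1


def probabilistic_outcome(n_photons: float, n_max: float, rng: random.Random) -> int:
    """Sample a binary outcome with Pr(0) = n / n_max (assumed normalisation).

    The ratio is clipped to [0, 1] so that counts fluctuating above n_max
    still yield a valid probability.
    """
    p0 = min(max(n_photons / n_max, 0.0), 1.0)
    return 0 if rng.random() < p0 else 1


# Usage with illustrative (not measured) photon counts.
rng = random.Random(1)
vote = majority_vote(9, n_mean=7.5)  # 0: more photons than average
outcomes = [probabilistic_outcome(3, n_max=10, rng=rng) for _ in range(10_000)]
fraction_zero = outcomes.count(0) / len(outcomes)  # roughly 0.3
```

Majority voting discards the count once it has been thresholded, whereas the probabilistic rule keeps the binomial noise of a single-shot readout, which is why the two schemes lead to different learning rates.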
Besides the study of the absolute sensitivity $\eta$ in Fig. 3, further examples of the performance of MFL with no majority voting scheme in place are reported in the Supplementary Information.
Errors in the precision scaling are estimated with a bootstrapping procedure, i.e. by sampling with replacement from the available runs. The cardinality of each resample matches the number of available runs, and the resampling is repeated many times. The median precision scaling of each resample is estimated, and the standard deviation of this approximate population of scaling performances is quoted as the precision scaling error.
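A minimal sketch of this bootstrap, assuming that one precision-scaling exponent has already been fitted for each run. The function name, the default number of resamples and the example exponents are hypothetical, not values from the analysis.

```python
import random
import statistics


def bootstrap_scaling_error(scalings, n_resamples=1000, seed=0):
    """Return (median scaling, bootstrap error) from per-run scaling exponents.

    Each resample draws len(scalings) runs with replacement; the error is the
    standard deviation of the resample medians.
    """
    rng = random.Random(seed)
    medians = [
        statistics.median(rng.choices(scalings, k=len(scalings)))
        for _ in range(n_resamples)
    ]
    return statistics.median(scalings), statistics.stdev(medians)


# Illustrative exponents for five hypothetical runs (not experimental values).
median, error = bootstrap_scaling_error([-0.93, -1.02, -0.98, -1.05, -0.97])
```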
Absolute scaling In Fig. 3 we reported the absolute scaling of the sensitivity $\eta$, which requires taking into account the main experimental and computational overheads contributing to the total running time of a phase estimation (PE) protocol (communication time is not considered here). These are: the time required by the PE algorithm to compute the next experiment (here 0.4 μs per step, per particle, on a single-core machine), the duration of the laser pulse for initialisation and readout (3 μs in total), the waiting time for relaxation (1 μs), a short TTL pulse for the photodetector (20 ns), and the duration of the MW pulses (approximately 50 ns in total). Including variable and constant overheads, we obtain:
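$$\eta = \sigma(B)\,\sqrt{T}, \qquad (4)$$
where $\sigma(B)$ is the uncertainty on the field estimate and $T$ is the total running time, including all the overheads listed above, after $N$ epochs of a PE algorithm.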
With fewer combined sequences per datum, the final uncertainty is $\sigma(B) \approx 0.45$ μT after the last epoch, for a total running time $T \approx 18$ ms, that is $\eta \approx 60$ nT s$^{1/2}$. With more combined sequences, exhibiting a precision scaling that is essentially Heisenberg limited, the uncertainty saturates at protocol convergence to $\sigma(B) \approx 0.3$ μT, for a total running time $T \approx 78$ ms. This leads to a final sensitivity $\eta \approx 84$ nT s$^{1/2}$ and a repetition rate of 12.8 Hz.
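The figures quoted above follow from $\eta = \sigma(B)\sqrt{T}$. The short check below reproduces them from the reported uncertainties and running times; the variable names are ours.

```python
import math


def sensitivity(sigma_b: float, total_time: float) -> float:
    """eta = sigma(B) * sqrt(T), with sigma_b in tesla and T in seconds."""
    return sigma_b * math.sqrt(total_time)


eta_fast = sensitivity(0.45e-6, 18e-3)  # about 6.0e-8 T s^(1/2), i.e. 60 nT s^(1/2)
eta_hl = sensitivity(0.30e-6, 78e-3)    # about 8.4e-8 T s^(1/2), i.e. 84 nT s^(1/2)
rate_hl = 1.0 / 78e-3                   # about 12.8 Hz repetition rate
```

Because $\eta$ grows with $\sqrt{T}$, a smaller final uncertainty does not guarantee a better sensitivity if the running time, dominated by per-sequence overheads, grows faster.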
Multiparameter Learning For the multiparameter case, we again use Eq. 3, but now consider both $B$ and $T_2$ as unknown parameters. Each precession time $\tau$ is chosen proportionally to the inverse of the Frobenius norm of the covariance matrix $\Sigma$ of the two parameters, $\tau \propto \lVert\Sigma\rVert_F^{-1}$. Two normalisation constants, one for $B$ and one for $T_2$, are introduced to render $\Sigma$ dimensionless; they are set from the prior at the epoch when both parameters start to be learnt simultaneously. In this analysis, they corresponded to 11 μT and 20.2 μs; however, we stress that different choices are possible, with equivalent results up to a normalisation factor. We observed that MFL estimates of the dephasing time may differ systematically from those of a non-weighted least-squares fit. In the presence of dephasing, the MFL heuristic preferentially adopts experiments with $\tau \lesssim T_2$. This acts similarly to a weighting of the data (see also the Supplementary Information) that favours the more consistent, higher-contrast observations. A least-squares fit, on the other hand, weighs equally the data points where the fringe contrast is almost completely lost, thus underestimating $T_2$.
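A sketch of this choice of precession time, assuming a 2×2 covariance matrix for $(B, T_2)$ in SI units. The text only states proportionality, so the constant `tau_scale` is a hypothetical parameter; the normalisation constants in the usage example are the values quoted above.

```python
import math


def precession_time(cov, scale_b, scale_t2, tau_scale=1.0):
    """tau = tau_scale / ||Sigma||_F for the dimensionless (B, T2) covariance.

    cov: 2x2 covariance matrix of (B, T2) in SI units.
    scale_b, scale_t2: normalisation constants making Sigma dimensionless.
    tau_scale: proportionality constant in seconds (hypothetical).
    """
    scales = (scale_b, scale_t2)
    dimless = [[cov[i][j] / (scales[i] * scales[j]) for j in range(2)] for i in range(2)]
    frobenius = math.sqrt(sum(x * x for row in dimless for x in row))
    return tau_scale / frobenius


# Halving both standard deviations divides Sigma by 4, so tau grows 4x.
cov = [[(2e-6) ** 2, 0.0], [0.0, (5e-6) ** 2]]
tau_0 = precession_time(cov, 11e-6, 20.2e-6, tau_scale=1e-6)
tau_1 = precession_time([[c / 4 for c in row] for row in cov], 11e-6, 20.2e-6, tau_scale=1e-6)
```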
MFL tracking As mentioned in the main text, Bayesian inference processes are ideally suited to tracking. However, when a change in the parameter is so large that it completely invalidates the a posteriori credible region, the recovery time of a standard Hamiltonian learning protocol might be unsuitable for practical applications. To handle this situation as well, we modified the standard update procedure so that it resets its prior whenever a sequence of resampling events fails to restore the effective sample size of the particle ensemble above the resampling threshold. Details and pseudocode are provided in the Supplementary Information.
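The reset criterion can be sketched as below. The actual procedure is the pseudocode in the Supplementary Information; here the class name, the counting of consecutive sub-threshold updates, and the default values are our assumptions.

```python
def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) after normalising the particle weights."""
    total = sum(weights)
    return 1.0 / sum((w / total) ** 2 for w in weights)


class PriorResetMonitor:
    """Signal a prior reset when resampling repeatedly fails to recover the ESS.

    After every Bayes update, an ESS below resample_thresh * n_particles
    triggers resampling. If this happens max_consecutive times in a row, the
    posterior is assumed to have lost support for the true value (e.g. after an
    abrupt field jump) and the caller should redraw particles from the prior.
    """

    def __init__(self, n_particles, resample_thresh=0.5, max_consecutive=3):
        self.n_particles = n_particles
        self.resample_thresh = resample_thresh
        self.max_consecutive = max_consecutive
        self.consecutive = 0

    def after_update(self, weights) -> bool:
        if effective_sample_size(weights) < self.resample_thresh * self.n_particles:
            self.consecutive += 1  # resampling will be triggered again
        else:
            self.consecutive = 0   # the ensemble has recovered
        if self.consecutive >= self.max_consecutive:
            self.consecutive = 0
            return True            # reset to the initial prior
        return False


monitor = PriorResetMonitor(n_particles=4)
collapsed = [1.0, 0.0, 0.0, 0.0]  # ESS = 1
healthy = [0.25] * 4              # ESS = 4
sequence = (collapsed, collapsed, healthy, collapsed, collapsed, collapsed)
signals = [monitor.after_update(w) for w in sequence]
```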
FFT execution For most analyses, FFT estimates were computed from the whole available datasets. For example, for Fig. 2, the final estimate of a single FFT run was obtained using all 500 phase accumulation times at once, recorded in 20 ns steps, for a representative subset of the runs available in the corresponding dataset (Supplementary Table 1). We emphasise that this amounts to twice as many phase accumulation times as those actually used by the MFL algorithm, whose single-run estimate is reported as converged after 250 epochs.
The only exception is the tracking in Fig. 5, where the data points were cumulatively added to the dataset. In such tracking applications, as long as $B$ is kept constant, the FFT estimate compares to MFL much as in Fig. 2. However, FFT keeps estimating from the most prominent peak in the spectrum, which corresponds to the value of $B$ maintained for the longest time rather than the most recent one. It therefore fails to track changes as they occur.
Experimental details In Ramsey interferometry, as performed here, we measure the magnetic field component parallel to the NV centre's symmetry axis. However, the MFL protocol can be extended to NV centres with different orientations, to detect arbitrarily oriented magnetic fields.
The experiments are performed here using two different $^{12}$C isotopically purified diamond samples. For the Ramsey interferometry, we use the $m_s = 0$ and one of the $m_s = \pm 1$ electronic sublevels.
Photon number estimation After exciting a single NV centre with a 532 nm laser pulse, the red-shifted, individual photons were detected by an avalanche photodiode. A time-tagged single-photon counting card with nanosecond resolution was used for recording. A TTL connection between the time tagger and the MW pulse generator synchronises the photon arrival times with the pulse sequence and allows the number of detected photons to be recorded for every single laser pulse. The photon detection efficiency is mainly limited by the collection volume, total internal reflection within the diamond (due to its high refractive index) and further losses in the optics. As a result, a photon is detected only about every eighth laser pulse. Thus, to read out the NV state with high fidelity (with about 30% contrast), multiple measurements are usually required to obtain meaningful statistics.
Acknowledgements The authors thank Cristian Bonato and Marco Leonetti for useful discussions. M.G.T. acknowledges support from the ERC starting grant ERC-2014-STG 640079 and an EPSRC Early Career Fellowship EP/K033085/1. J.G.R. is sponsored under EPSRC grant EP/M024458/1. L.M. and F.J. would like to acknowledge DFG (Deutsche Forschungsgemeinschaft grants SFB/TR21 and FOR1493), BMBF, VW Stiftung, European Research Council DIADEMS (Simulators and Interfaces with Quantum Systems), ITN ZULF and BioQ. A. Laing acknowledges support from an EPSRC Early Career Fellowship EP/N003470/1.
Supplementary Information
Classical and quantum likelihood estimation
In the main text we introduced the Bayesian inference process underlying the MFL protocol (known as CLE, Classical Likelihood Estimation, Granade et al. [2012]) as composed of four main steps. Here we expand the discussion to provide additional details and comments on the adoption of CLE.

1. At each epoch, the prior distribution is used to choose the experimental setting for the next iteration. In MFL, the only experimental setting is the phase accumulation time $\tau$, which can be updated effectively using the so-called particle guess heuristic (see the section below).

2. The quantum system undergoes an appropriate evolution, according to the Hamiltonian $H$:

   (a) The system is prepared in an appropriate initial state $|\psi_0\rangle$, chosen so as to have an informative evolution under $H$, e.g. an equal superposition of its eigenstates (a state whose Bloch vector is orthogonal to the quantisation axis). We remark that $|\psi_0\rangle$ is not adaptive in CLE. In this work, the NV centre is always prepared in the same superposition state.

   (b) The system evolves according to its Hamiltonian for the chosen time $\tau$.

3. A measurement is performed on the system (here the quantum sensor). In MFL, we perform a projective measurement in the computational basis, obtaining a binary outcome $d \in \{0, 1\}$.

4. The computed likelihoods are used to update the probability distribution of the Hamiltonian parameters:

   (a) The same experiment is also performed on a simulator, implementing a generic parametrised Hamiltonian $H(B)$, thus providing an estimate of the likelihood $\Pr(d \mid B; \tau)$, i.e. the probability of measuring outcome $d$ when $B$ is chosen as the parameter.

   (b) It is then possible to apply Bayes' rule:
$$\Pr(B \mid d; \tau) = \frac{\Pr(d \mid B; \tau)\,\Pr(B)}{\Pr(d \mid \tau)}, \qquad \text{(S1)}$$
   where $\Pr(B)$ is given by the prior at the corresponding epoch, while $\Pr(d \mid \tau)$ is a normalisation factor.

Steps 1–4 are repeated until the variance of the probability distribution converges, or falls below a predefined target threshold. In cases with limited readout fidelity, such as the NV centre setup presented here, meaningful statistics can be accumulated in step 3 by repeating the same measurement a number of times, as discussed in the main text. The evidence in the main text suggests that in most cases this is a suboptimal choice for the absolute scaling performance of the MFL protocol. Note that only steps 2 and 3 involve the quantum sensor; all other steps require a simulator instead. In particular, step 4(a) can be performed on either a classical or a quantum simulator, the latter being justified whenever the size of the sensor, and the possible lack of an analytical model for its evolution, make classical simulation of the system expensive. In that case, the inference process is known as Quantum Likelihood Estimation (QLE, Wiebe and Granade [2016]).
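Steps 4(a) and 4(b) amount to reweighting a set of hypotheses by their simulated likelihoods. The following minimal sketch does this on a weighted particle set, assuming the likelihood has the Ramsey form $e^{-\tau/T_2}\cos^2(\gamma B\tau/2) + (1 - e^{-\tau/T_2})/2$; the function names are hypothetical, and the analysis itself relied on an adapted version of QInfer rather than this code.

```python
import math


def likelihood_0(b, tau, gamma, t2=math.inf):
    """Pr(d=0 | B; tau) for a Ramsey fringe; t2=inf gives the dephasing-free case."""
    decay = math.exp(-tau / t2)
    return decay * math.cos(gamma * b * tau / 2) ** 2 + (1 - decay) / 2


def bayes_update(particles, weights, datum, tau, gamma, t2=math.inf):
    """Step 4: reweight each particle by Pr(d | B_i; tau) and renormalise (Eq. S1)."""
    updated = []
    for b, w in zip(particles, weights):
        p0 = likelihood_0(b, tau, gamma, t2)  # step 4(a): simulated likelihood
        updated.append(w * (p0 if datum == 0 else 1.0 - p0))
    evidence = sum(updated)                   # Pr(d | tau), the normaliser
    return [w / evidence for w in updated]


# Two hypotheses that one experiment distinguishes perfectly (gamma = 1,
# arbitrary units): B = 0 always yields d = 0, B = pi always yields d = 1.
posterior = bayes_update([0.0, math.pi], [0.5, 0.5], datum=0, tau=1.0, gamma=1.0)
```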
label  NV centre 


(ns)  (T) 

()  
120  18500  20  52    ()  
115  18500  20  710    ()  
67  30000  100  8.3  16   ()  
1  20275  20  58     ( )  
1  8876  200  6.0  64  ()  
1  44000  20     ()  
Sequential Monte Carlo approximation and Particle Guess Heuristic
The computational efficiency of the protocol is made possible by advanced approximate Bayesian inference methods. In particular, MFL inherits from CLE the Sequential Monte Carlo (SMC) approximation Granade et al. [2012], Wiebe et al. [2014], Hincks et al. [2018]. Within this approximation, only a finite number of values (called particles) are sampled from the prior distribution and used to approximate the prior in each update step. This approximation makes the Bayesian update of Eq. S1 numerically feasible.
If the particle positions were held constant throughout the inference process, starting from a uniform prior, the cardinality of their set would have to scale approximately as $\Delta B/\sigma_B$, where $\Delta B$ is the expected magnetic field range to be sensed and $\sigma_B$ is the targeted uncertainty upon convergence. This is inefficient: as the inference progresses through the epochs, many particles provide very limited support to the updated prior approximation. Indeed, in a successful learning process the weights of most particles tend to zero, as those particles have been effectively ruled out by the observations.
This inefficiency can be addressed with resampling methods, which allow the particles to be sampled again from the updated posterior whenever their weights signal that the effective sample size has fallen below a user-defined threshold. Following Granade et al. [2017], here we adopt a Liu–West resampler (LWR) with optimised resampling threshold and smoothing parameter. These parameters tune when, and to what extent, the particle positions can be altered by the LWR Granade et al. [2012]. Hence, it was possible to accurately represent the posterior throughout the whole protocol execution, while employing only a limited number of particles for the discretisation in most cases. The only exceptions were the limited-fidelity cases: for the absolute scaling, we chose the number of particles according to the empirical rule
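The Liu–West move can be sketched for a single parameter as follows. This is an independent illustration, not QInfer's implementation; the value of `a` in the usage example is arbitrary rather than the optimised value used in the analysis, and deciding when to resample (the threshold test on the effective sample size) is left to the caller.

```python
import math
import random


def liu_west_resample(particles, weights, a=0.98, rng=None):
    """Liu-West resampling of a 1D weighted particle cloud.

    Each new particle is a*x_j + (1-a)*mu plus Gaussian noise of variance
    (1 - a^2)*var, where x_j is drawn according to the weights and mu, var are
    the weighted mean and variance. This preserves the first two moments in
    expectation; the weights are reset to uniform.
    """
    rng = rng or random.Random()
    total = sum(weights)
    w = [wi / total for wi in weights]
    mu = sum(wi * x for wi, x in zip(w, particles))
    var = sum(wi * (x - mu) ** 2 for wi, x in zip(w, particles))
    spread = math.sqrt((1.0 - a * a) * var)
    n = len(particles)
    chosen = rng.choices(particles, weights=w, k=n)
    new = [a * x + (1.0 - a) * mu + rng.gauss(0.0, spread) for x in chosen]
    return new, [1.0 / n] * n


rng = random.Random(2)
cloud = [rng.gauss(5.0, 2.0) for _ in range(20_000)]
new_cloud, new_weights = liu_west_resample(cloud, [1.0] * len(cloud), a=0.9, rng=rng)
```

Smaller values of `a` move the particles further from their previous positions, which helps explore new regions but blurs the posterior; this is the trade-off tuned by the smoothing parameter.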
(S2) 
with the number of averaged single sequences selected. This increase in the number of particles is justified by the corresponding reduction in the risk of “aggressive” resampling leading to inference failures, which are heralded especially by multimodality in the parameter distribution Hincks et al. [2018].
The particle guess heuristic (PGH) plays a fundamental role in the effectiveness of the MFL protocol. The PGH was introduced in Wiebe et al. [2014] to provide an optimal choice of the experimental setting (here the phase accumulation time) in analytically tractable Hamiltonian learning problems, which include the sensing Hamiltonian presented in the main paper as Eq. 1. The PGH samples two particles, $x_1$ and $x_2$, from the particle distribution and then chooses
$$\tau = \frac{1}{\lVert x_1 - x_2 \rVert}. \qquad \text{(S3)}$$
In the single-parameter case, where only $B$ is sensed, $\tau \propto 1/\sigma$ on average, where $\sigma$ represents the standard deviation of the Gaussian-approximated posterior distribution. Intuitively, this corresponds to selecting longer, more informative accumulation times as the estimated uncertainty about the parameter to be learned shrinks.
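A minimal sketch of the PGH in the single-parameter case follows. Particles are plain scalars, and the retry limit for a collapsed cloud is our addition; the usage example checks numerically that the average particle separation, and hence $1/\tau$, tracks the posterior width.

```python
import math
import random


def particle_guess_heuristic(particles, rng, weights=None, max_tries=100):
    """Eq. S3: draw two particles (by weight, if given) and return 1 / |x1 - x2|."""
    for _ in range(max_tries):
        x1, x2 = rng.choices(particles, weights=weights, k=2)
        if x1 != x2:
            return 1.0 / abs(x1 - x2)
    raise ValueError("particle cloud has collapsed; no distinct pair found")


# For a Gaussian cloud of width sigma, E|x1 - x2| = 2 sigma / sqrt(pi), so the
# chosen phase accumulation times grow roughly as 1/sigma as the posterior narrows.
rng = random.Random(3)
cloud = [rng.gauss(0.0, 1.0) for _ in range(5000)]
gaps = [1.0 / particle_guess_heuristic(cloud, rng) for _ in range(2000)]
mean_gap = sum(gaps) / len(gaps)
expected_gap = 2.0 / math.sqrt(math.pi)  # about 1.128 for sigma = 1
```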
The role of in time adaptivity
In the main text, we observed a slowdown in the learning rate when MFL chooses accumulation times that reach the longest available phase accumulation time, $\tau_{\max}$. This slowdown appears in the scaling of both quantities plotted in Fig. 2a&b (referring, for example, to the dataset shown there). That dataset represents a situation in which $\tau_{\max}$ acts as a maximum time budget per epoch. In this case, once the PGH encounters this limit, learning proceeds by statistical accumulation of data points with $\tau \approx \tau_{\max}$, and the MFL precision scaling tends to the standard quantum limit, $\propto T^{-1/2}$. We highlight this behaviour in Fig. S1b, using averaged sequences from the whole dataset, by plotting the precision scaling alongside the ratio $\tau/\tau_{\max}$. When the plateau occurs, we correspondingly observe that a typical run of MFL starts suggesting $\tau \approx \tau_{\max}$, before saturating as the uncertainty converges.
Note that this artefact, which derives from the artificial choice of a maximum time budget, is equivalent to the phenomenon exhibited in the presence of dephasing noise, which reduces the contrast of experimental Ramsey fringes, as in the data reported in Fig. 4. To show this, Fig. S1c reports the same performance for a dataset with $T_2 \approx 100$ μs (estimated from a least-squares fit) and $B \approx 6$ μT, so as to have approximately the same number of periods in the corresponding Ramsey fringe as in the dataset of Fig. 4. For MFL to deal properly with decaying data, in this analysis we remove any majority voting scheme from the data processing, and at each epoch the corresponding datum is probabilistically extracted from the experimentally estimated likelihood (see Methods). This explains the slowdown in the precision scaling, as each data point is now affected by the same amount of binomial noise that would occur in a setup with the same readout fidelity but single-shot measurements. In other words, the additional information about the likelihood acquired by combining sequences for each measurement is partially removed from the inference process by the binary sampling. For this case, we observe that the plateau occurs when the adaptive choice of the phase accumulation time saturates, on average, to a value set by $T_2$ (though a single run will oscillate around this value, as emphasised by the single-run behaviour also reported in Fig. S1c). Similarly, the precision scaling also plateaus at this point (not reported for brevity), slowing towards $T^{-1/2}$. A formal discussion of this saturation is given in the following section.
Precision bounds and sensitivity
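In assessing the performance of MFL, it is helpful to compare the uncertainties it achieves with those achieved by FFT on the same datasets. We begin by considering the Cramér–Rao bound Cover and Thomas [2006]. Suppose that our procedure implements a function $\hat{\omega}(\text{data})$ of the entire data record that estimates the true magnetic field, expressed in frequency units as $\omega = \gamma B$. We want to make the squared error $(\hat{\omega}(\text{data}) - \omega)^2$ as small as possible. If the average of $\hat{\omega}(\text{data}) - \omega$ over all possible data records is zero, we say that our procedure is unbiased. In this case, the Cramér–Rao bound states that, on average over all data records, Ferrie [2014]
$$\mathbb{E}_{\text{data}}\left[\left(\hat{\omega}(\text{data}) - \omega\right)^2\right] \geq \frac{1}{\sum_k \tau_k^2}, \qquad \text{(S4)}$$
where $\tau_k$ is the evolution time used at the $k$th step of the MFL procedure. We stress that this inequality holds only on average; after all, we might be “lucky” with the estimate that we assign to any particular data record.
The right-hand side of this inequality is derived using the Fisher information for a single measurement,
$$I(\omega; \tau) = \sum_{d \in \{0, 1\}} \Pr(d \mid \omega; \tau)\left[\frac{\partial}{\partial \omega}\log \Pr(d \mid \omega; \tau)\right]^2 \qquad \text{(S5)}$$
$$= \tau^2, \qquad \text{(S6)}$$
where the last equality holds for the dephasing-free likelihood $\Pr(0 \mid \omega; \tau) = \cos^2(\omega\tau/2)$.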
The Fisher information for an experiment consisting of multiple independent measurements is given by the sum of the Fisher informations for each measurement, giving the Cramér–Rao bound (Eq. S4).
Let $T$ be the total phase accumulation time used for a single “run” of a magnetometry procedure; in our case, $T = \sum_k \tau_k$. Since $\sum_k \tau_k^2 \leq T^2$, by the above argument the mean squared error can scale no better than $1/T^2$, corresponding to consolidating all of our phase accumulation into a single measurement. This observation is sometimes referred to as the Heisenberg limit for magnetometry.
At the other extreme, suppose that we have a total time budget $T_{\text{tot}}$ to spend on a magnetometry experiment, such that we can consider repeating a given procedure $R = T_{\text{tot}}/T$ times. The factor $R$ then factors out of the Cramér–Rao bound, giving
$$\mathbb{E}\left[(\hat{\omega} - \omega)^2\right] \geq \frac{1}{R\sum_k \tau_k^2} = \frac{T}{T_{\text{tot}}\sum_k \tau_k^2}. \qquad \text{(S7)}$$
The observation that the mean squared error scales as $1/R$, i.e. as $1/T_{\text{tot}}$, is sometimes referred to as the standard quantum limit, in the case that we repeat a magnetometry procedure for $R$ independent iterations. Indeed, we can use this observation to motivate a general figure of merit for the time budget of a given magnetometry procedure. Assume that the Fisher information for a given procedure is $R\,I_1$, where $I_1$ is the Fisher information for a single repetition using total phase accumulation time $T$. Then it follows that $\mathbb{E}[(\hat{\omega} - \omega)^2] \geq T/(T_{\text{tot}} I_1)$. Next, we define $\eta = \sqrt{T/I_1}$ as the sensitivity of the proposed magnetometry procedure, so that the achievable standard deviation is bounded by $\eta/\sqrt{T_{\text{tot}}}$.
Using this definition, we can restate the standard quantum limit as the statement that $\eta$ is constant in $T$. That is, a magnetometry procedure bound by the standard quantum limit gains no advantage from phase accumulation time beyond that conferred by repeating the entire procedure for independent runs. By contrast, a Heisenberg-limited magnetometry procedure has a sensitivity that scales as $\eta \propto T^{-1/2}$, indicating that an additional advantage can possibly be gained by using longer phase accumulation times.
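The contrast between the two limits can be made concrete with the dephasing-free bound $1/\sum_k \tau_k^2$. The sketch below, in arbitrary units, compares repeating one fixed phase accumulation time with a doubling schedule $\tau_k = 2^k$; both schedules are illustrative and are not those chosen by MFL.

```python
import math


def crb_std(taus, repetitions=1):
    """Cramér-Rao lower bound on the standard deviation of omega: 1/sqrt(R sum tau_k^2)."""
    return 1.0 / math.sqrt(repetitions * sum(t * t for t in taus))


def sensitivity(taus, repetitions=1):
    """sigma * sqrt(T_tot) with T_tot = repetitions * sum(taus); independent of repetitions."""
    return crb_std(taus, repetitions) * math.sqrt(repetitions * sum(taus))


# SQL-like: the same tau repeated N times gives a constant sensitivity.
sql = [sensitivity([1.0] * n) for n in (4, 16, 64)]
# Heisenberg-like: doubling tau_k = 2^k gives sensitivity close to sqrt(3 / T).
hl_times = [sum(2.0 ** k for k in range(n)) for n in (4, 8, 12)]
hl = [sensitivity([2.0 ** k for k in range(n)]) for n in (4, 8, 12)]
```

For the doubling schedule the sum of squares is dominated by the last, longest step, which is why the bound approaches the single-measurement (Heisenberg) value up to a constant factor.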
So far we have considered the case in which $\tau_k \ll T_2$, such that we can approximate the dynamics of our magnetometry experiment as dephasing-free. The dichotomy between the Heisenberg and standard quantum limit scalings, however, is changed by dephasing, such that we have to reconsider the definition of the sensitivity in the dephasing-limited case. In particular, Ferrie [2014] derived that the Fisher information for $T_2$-limited magnetometry is given by
$$I(\omega; \tau) = \frac{\tau^2 e^{-2\Gamma\tau}\sin^2(\omega\tau)}{1 - e^{-2\Gamma\tau}\cos^2(\omega\tau)}, \qquad \text{(S8)}$$
where $\Gamma = 1/T_2$ is used to represent dephasing in frequency units, in analogy with $\omega$. We note that, unlike the Fisher information describing the noiseless case, Eq. S8 for the dephasing-limited case is not independent of the true value of $\omega$. Thus, to determine the achievable sensitivity in the case of dephasing-limited magnetometry, we must either assume a particular value of the field being estimated, or generalise beyond the Cramér–Rao bound. We choose the latter in this work, which provides further insight into the trade-off between phase accumulation time and experimental repetitions in the dephasing-limited regime.
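The closed form of the dephasing-limited Fisher information can be checked numerically: for a binary outcome, Eq. S5 reduces to $(\partial_\omega p)^2/[p(1-p)]$ with $p = \Pr(0 \mid \omega; \tau)$. The sketch below assumes the Ramsey likelihood $e^{-\tau/T_2}\cos^2(\omega\tau/2) + (1 - e^{-\tau/T_2})/2$ with $\omega = \gamma B$, and compares a finite-difference evaluation with the closed form; the test point is arbitrary.

```python
import math


def pr0(omega, tau, t2):
    """Pr(d=0 | omega; tau) for the Ramsey likelihood, with omega = gamma * B."""
    decay = math.exp(-tau / t2)
    return decay * math.cos(omega * tau / 2) ** 2 + (1 - decay) / 2


def fisher_numeric(omega, tau, t2, h=1e-6):
    """Eq. S5 for d in {0, 1}: (dp/domega)^2 / (p (1 - p)), via a central difference."""
    p = pr0(omega, tau, t2)
    dp = (pr0(omega + h, tau, t2) - pr0(omega - h, tau, t2)) / (2 * h)
    return dp * dp / (p * (1 - p))


def fisher_closed(omega, tau, t2):
    """tau^2 e^{-2 tau/T2} sin^2(omega tau) / (1 - e^{-2 tau/T2} cos^2(omega tau))."""
    a = math.exp(-2 * tau / t2)
    return tau**2 * a * math.sin(omega * tau) ** 2 / (1 - a * math.cos(omega * tau) ** 2)


# Arbitrary test point in consistent units.
numeric = fisher_numeric(1.3, 0.7, 2.0)
closed = fisher_closed(1.3, 0.7, 2.0)
noiseless = fisher_numeric(1.3, 0.7, math.inf)  # recovers tau^2 = 0.49
```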
Specifically, we consider the van Trees inequality (also known as the Bayesian Cramér–Rao bound) Gill and Levit [1995],
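$$\mathbb{E}_{\omega}\,\mathbb{E}_{\text{data}}\left[(\hat{\omega} - \omega)^2\right] \geq \frac{1}{\mathbb{E}_{\omega}\left[I(\omega)\right] + J_0}, \qquad \text{(S9)}$$
where $J_0$ quantifies the information available from the prior, i.e. the precision achievable using prior information alone, and $\mathbb{E}_{\omega}$ denotes an expectation value over a distribution of different hypotheses about the field $\omega$. We intentionally do not further define $J_0$, as this term depends on the context in which a magnetometry procedure is used, rather than on the magnetometry procedure itself. Moreover, the effect of $J_0$ is minimal in the limit of large experimental data sets, such that $J_0$ in effect constitutes a correction to the Cramér–Rao bound in the case of finite data records Opper [1999].
In analogy to the Fisher information derivation above, averaging Eq. S8 over the field gives a field-averaged Fisher information $\mathbb{E}_{\omega}[I(\omega; \tau)]$ for a single phase accumulation time $\tau$. Hence, the bound analogous to Eq. S4 is given by
$$\mathbb{E}_{\omega}\,\mathbb{E}_{\text{data}}\left[(\hat{\omega} - \omega)^2\right] \geq \frac{1}{\sum_k \mathbb{E}_{\omega}\left[I(\omega; \tau_k)\right] + J_0}. \qquad \text{(S10)}$$
To derive the sensitivity in the van Trees case, let $\bar{I}$ be the average Fisher information for a single run of a dephasing-limited procedure. We can then define the average sensitivity $\bar{\eta} = \sqrt{T/\bar{I}}$ for a total phase accumulation time $T$, and reformulate the van Trees inequality for a total time budget $T_{\text{tot}}$, spent on $T_{\text{tot}}/T$ repetitions, in a more practical form for our purposes:
$$\mathbb{E}_{\omega}\,\mathbb{E}_{\text{data}}\left[(\hat{\omega} - \omega)^2\right] \geq \frac{1}{T_{\text{tot}}/\bar{\eta}^2 + J_0}. \qquad \text{(S11)}$$
Following Eq. S10, $\bar{\eta}$ is constant if a fixed phase accumulation time is used, while it decreases as $T^{-1/2}$ if the phase accumulation times grow as in a Heisenberg-limited procedure. The average Fisher information saturates at $\tau \approx T_2$, however, such that the Heisenberg and standard quantum limits coincide as $\tau$ approaches $T_2$. Therefore, the performance observed in Fig. 1a is limited by saturation near $T_2$.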
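To illustrate where this saturation occurs, the sketch below averages the dephasing-limited Fisher information over the field, under the simplifying assumption (ours) that the accumulated phase $\omega\tau$ is uniformly distributed over a period. Under that assumption the average has the closed form $\tau^2\left(1 - \sqrt{1 - e^{-2\tau/T_2}}\right)$, which is maximised near $\tau \approx T_2$ (at about $0.96\,T_2$).

```python
import math


def fisher_s8(omega, tau, t2):
    """Single-measurement Fisher information with dephasing (Eq. S8 form)."""
    a = math.exp(-2.0 * tau / t2)
    return tau**2 * a * math.sin(omega * tau) ** 2 / (1.0 - a * math.cos(omega * tau) ** 2)


def averaged_fisher_numeric(tau, t2, n_points=2000):
    """Average over a uniformly distributed phase omega*tau (midpoint rule)."""
    total = 0.0
    for i in range(n_points):
        phase = 2.0 * math.pi * (i + 0.5) / n_points
        total += fisher_s8(phase / tau, tau, t2)
    return total / n_points


def averaged_fisher_closed(tau, t2):
    """Closed form of the uniform-phase average."""
    return tau**2 * (1.0 - math.sqrt(1.0 - math.exp(-2.0 * tau / t2)))


# Scan tau in units of T2 = 1 to locate the maximum of the averaged information.
taus = [0.001 * k for k in range(1, 3001)]
tau_best = max(taus, key=lambda t: averaged_fisher_closed(t, 1.0))
```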
Absolute precision scaling
As discussed in the main text and Methods, using a room-temperature setup can be challenging because of quantum projection noise and readout infidelities. These need to be properly addressed when reduced sequence repetitions lead to a low number of PL photons being detected while recording a fringe.
The results in terms of absolute scaling have already been discussed (see e.g. Fig. 3 in the main paper). Here, we complete those analyses with additional studies.
In Fig. S2a&b, we report respectively the precision scaling and the ultimate uncertainty achievable by MFL after 150 epochs, for a subset of cases with different numbers of combined sequences. Fig. S2b suggests an approximate gain in the uncertainty achievable when halting the protocol after a fixed number of steps.
Fig. S2c shows that the choice of 150 epochs in this case is motivated by running MFL for enough steps to observe, in all cases, the convergence of the median quadratic loss, i.e. the squared error in the parameter estimate, here $(\hat{B} - B)^2$. To estimate the true value of $B$, we run MFL once over the whole dataset, checking that the result is consistent with FFT.
We remark that the advantages of increasing the number of combined sequences (including a precision scaling that improves up to the Heisenberg limit) come at the expense of worse final absolute sensitivities achievable by the protocol. This is due to the linear increase of the experimental overheads with the number of combined sequences.
The robustness of MFL against the sources of noise present in the room-temperature setup is emphasised by Fig. S2d, where we plot the MFL estimates for a similar subset of the cases analysed. We observe that, above a minimum number of combined sequences, the MFL estimates are all substantially consistent with the result obtained from the whole dataset, within the estimated uncertainty and taking into account minor fluctuations in $B$ that might have occurred during the collection of the whole dataset. By contrast, FFT estimates are completely unreliable at the corresponding noise levels.
The role of noise for low PL photon counts
Finally, we observe that for the smallest numbers of combined sequences, the Bayesian process fails due to increased experimental noise and reduced statistics, underestimating both the true value of $B$ and the associated uncertainty. This is evident, for example, from Fig. S2c, where the median quadratic loss does not improve with the number of epochs. In particular, losses in the system cause an asymmetry between $F_0$ and $F_1$, respectively the overall readout fidelities for the states $|0\rangle$ and $|1\rangle$ (i.e. taking all sources of noise and loss into account). From the experimental raw data (see Fig. S3), we observed a marked imbalance between the two fidelities. This translates into unbalanced output probabilities, which conflict with the assumption made so far of a binomial model for the outcomes $d \in \{0, 1\}$, with probabilities given by the likelihood in Eq. 3. This level of “poisoning” of the assumed model is evidently beyond the noise robustness of CLE Granade et al. [2017].
To show that there is no fundamental limit preventing MFL from providing correct estimates, within uncertainty, given a correct model, we modified the likelihood as follows:
(S12) 
where , and for we recover the usual of Eq. 3. In order to estimate , we use it as the free parameter in a preliminary CLE run against the same dataset, but assuming known from the inference process with (i.e. having ). We thus obtain , and use this as a known parameter when running MFL with . In principle, could also be estimated from a multiparameter inference model.
The result is reported in Fig. S4. Intuitively, the inference process now interprets measurement outcomes as less informative, as might be expected from additional losses. This slows down the learning rate per epoch, but at the same time restores the correct behaviour of MFL.
Wide range operability of MFL methods
It is known that in Ramsey experiments, adaptive choices of the phase accumulation time can lead not only to scalings beyond the standard quantum limit, but also to an improved dynamic range for the sensed magnetic field $B$. Given that MFL is adaptive in the choice of the phase accumulation time $\tau$, and that we have shown its precision scaling to be Heisenberg limited, it follows naturally that MFL also benefits from the high dynamic range already reported in previous experiments.
In the main paper, we already showed the applicability of MFL to the dataset presented in Fig. 2 of the main text. Here we complement this study with an additional dataset exhibiting $B \approx 713$ μT. For this dataset, as for the one in Fig. 2, multiple single sequences were collected and averaged by the experimental setup, so it was reasonable to adopt a majority voting scheme to exploit the additional information in the data. We stress that such high intensities of $B$ tend to make least-squares fit procedures fail when no initial guess of the parameters is provided.
The results in terms of uncertainty and precision scaling are reported in Fig. S5. We observe that after 250 epochs the final uncertainties provided by MFL for the two fields differ only slightly, so the achievable uncertainty can be considered approximately independent of the strength of the magnetic field. The precision scaling is also the same, within error, as that observed for the lower field.
Tracking
In the main text, we tested the tracking capabilities of the MFL protocol against experimental data. In Fig. 4 we reported the results for the case where the magnetic field intensity is synthetically altered stepwise at random times, in a fashion equivalent to a stochastic time-dependent Poisson process. Such random, abrupt variations in the magnetic field might, for example, reproduce application scenarios such as the raster scanning of a surface embedding magnetic nanoparticles. A sketch of a possible experimental setup is provided in Fig. S6a. The modifications to the standard CLE inference process required by this particularly demanding tracking scenario are summarised as pseudocode in Algorithm 1. They amount to detecting changes in the sensed parameter that completely invalidate the current posterior, in which case a reset of the prior is the most effective update step. Without triggering such reset events, large stepwise changes would require a long time for MFL to react, because the prior provides little support to the new value.
In Fig. S6b, we show the simulated performance of MFL in a representative run with a time-varying field $B(t)$. The figure exemplifies the decrease in the rate of failure events as the frequency of the oscillating signal decreases with time. We loosely define failure events as all those at which the quadratic loss of a single run exceeds the mean performance achievable by the protocol, estimated across independent runs. In the simulations, we synthetically modify the magnetic field as an oscillating signal, equivalently to Fig. 4c of the main text, but in this case we chirp the oscillation frequency during each run. We notice that the points where the second derivative of the oscillating magnetic signal is highest are those where failure events tend to occur.