Magnetic-field-learning using a single electronic spin in diamond with one-photon-readout at room temperature


R. Santagati, A.A. Gentile, S. Knauer, S. Schmitt, S. Paesani, C. Granade, N. Wiebe, C. Osterkamp, L.P. McGuinness, J. Wang, M.G. Thompson, J.G. Rarity, F. Jelezko, A. Laing
Quantum Engineering Technology Labs, H. H. Wills Physics Laboratory and Department of Electrical and Electronic Engineering, University of Bristol, Bristol BS8 1FD, UK. Centre of Excellence for Quantum Computation and Communication Technology, School of Physics, University of New South Wales, Sydney, NSW 2052, Australia. Institute of Quantum Optics, Ulm University, 89081 Ulm, Germany. Quantum Architectures and Computation Group, Microsoft Research, Redmond, Washington 98052, USA. These authors contributed equally to this work.

Nitrogen-vacancy (NV) centres in diamond are appealing nano-scale quantum sensors for temperature, strain, electric fields and, most notably, magnetic fields. However, the cryogenic temperatures required for the low-noise single-shot readout that has enabled the most sensitive NV-magnetometry reported to date are impractical for key applications, e.g. biological sensing. Overcoming the noisy readout at room temperature has until now demanded repeated collection of fluorescent photons, which increases the time-cost of the procedure and thus reduces its sensitivity. Here we show how machine learning can process the noisy readout of a single NV centre at room temperature, requiring on average only one photon per algorithm step, to sense magnetic field strength with a precision comparable to that reported for cryogenic experiments. Analysing large data sets from NV centres in bulk diamond, we report absolute sensitivities of 60 nT s$^{1/2}$ including initialisation, readout, and computational overheads. We show that dephasing times can be simultaneously estimated, and that time-dependent fields can be dynamically tracked at room temperature. Our results dramatically increase the practicality of early-term single spin sensors.


Quantum sensors are likely to be among the first quantum technologies to be translated from laboratory set-ups to commercial products Taylor et al. [2008]. The single electronic spin of a nitrogen-vacancy (NV) centre in diamond operates with nano-scale spatial resolution as a sensor for electric and magnetic fields McGuinness et al. [2011], Müller et al. [2014], Lovchinsky et al. [2017], Zhao et al. [2011], Barry et al. [2016]. However, achieving high sensitivities for NV-magnetometers has required a low-noise mode of operation available only at cryogenic temperatures, which constitutes a major obstacle to real-world applications Robledo et al. [2011], Bonato et al. [2015]. Machine learning has played an enabling role for new generations of applications in conventional information processing technologies, including pattern and speech recognition, diagnostics, and robot control Murphy [2012], Jordan and Mitchell [2015]. Here we show how machine learning algorithms Hentschel and Sanders [2010], Granade et al. [2012], Wiebe et al. [2014], Hincks et al. [2018] can be applied to single-spin magnetometers at room temperature to give a sensitivity that scales with the Heisenberg limit, and reduces overheads by requiring only one-photon-readout. We go on to show that these methods allow multiparameter estimation to simultaneously learn the decoherence time, and implement a routine for the dynamical tracking of time-dependent fields.

Magnetic field sensing with an NV centre uses Ramsey interferometry Taylor et al. [2008], Rondin et al. [2014], Jelezko and Wrachtrup [2006]. With a microwave $\pi/2$-pulse the spin vector is rotated into an equal superposition of its spin eigenstates, such that its magnetic moment is perpendicular to the magnetic field ($B$) to be sensed Degen [2008], Nusran et al. [2012]. For some Larmor precession time, $\tau$, and frequency, $\omega = \gamma B$, the relative phase between the eigenstates becomes $\phi = \omega\tau = \gamma B \tau$, where $\gamma$ is the electron gyromagnetic ratio of magnetic moment to angular momentum. After a further $\pi/2$-pulse to complete the Ramsey sequence, a measurement of the spin in its $\sigma_z$ basis provides an estimate of $\phi$, the precision of which is usually improved by repeating the procedure. Collecting statistics for a series of different $\tau$ produces a fringe of phase varying with time, from which $\omega$, and hence $B$, can be inferred.
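The Ramsey relation described above can be illustrated with a minimal sketch. It assumes an ideal sequence with no dephasing and perfect readout, a phase of the form $\phi = \gamma B \tau$, and the common convention that the probability of outcome 0 is $\cos^2(\phi/2)$; the function names and the rounded value of $\gamma$ are illustrative choices, not taken from the paper.

```python
import math

# Approximate electron gyromagnetic ratio, gamma / 2pi ~ 28 GHz/T (assumed, rounded value).
GAMMA = 2 * math.pi * 28.0e9  # rad s^-1 T^-1


def ramsey_phase(b_tesla: float, tau_s: float) -> float:
    """Relative phase accumulated during free precession: phi = gamma * B * tau."""
    return GAMMA * b_tesla * tau_s


def prob_outcome_zero(b_tesla: float, tau_s: float) -> float:
    """Ideal probability of outcome 0 after the second pi/2 pulse: cos^2(phi / 2)."""
    return math.cos(ramsey_phase(b_tesla, tau_s) / 2) ** 2


if __name__ == "__main__":
    b = 1.0e-6  # 1 microtesla, illustrative only
    for tau in (0.0, 5.0e-6, 10.0e-6, 20.0e-6):
        print(f"tau = {tau * 1e6:5.1f} us  P(0) = {prob_outcome_zero(b, tau):.3f}")
```

Sampling `prob_outcome_zero` over a range of precession times traces out the fringe from which the field is inferred.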

Increasing the sensitivity of a magnetometer translates to increasing the rate at which its sensing precision improves with sensing time. The intrinsic resource cost in estimating $B$ is the total phase accumulation time $T$ Arai et al. [2015], Waldherr et al. [2012], Puentes et al. [2014], which is the sum of every $\tau$ performed during an experiment. A fundamental limitation on the sensitivity of an estimate of $B$ is quantum projection noise (from the uncertain outcome of a $\sigma_z$-basis measurement), the effect of which is conventionally reduced through repeated measurements, at the cost of increasing the total sensing time. A further typical limitation on sensing precision is the timescale, $T_2^*$, on which spin states decohere due to inhomogeneous broadening (though spin-echo methods could extend this Balasubramanian et al. [2008]). In an idealised setting, with an optimal sensing protocol, the Heisenberg limit (HL) Berry et al. [2009] in sensitivity can be achieved, to arrive at a precision limited by $T_2^*$ in the shortest time allowed by quantum mechanics. In practice, overheads such as the time required for initialisation, computation, and readout must also be accounted for, while repeated measurements due to experimental inefficiencies and low-fidelity readout increase the time to reach the precision limited by $T_2^*$. The increase in total sensing time due to overheads and repeated measurements thus decreases the sensitivity.

A particularly relevant overhead is the time taken to read out the state of the spin, which depends on the experimental conditions. At cryogenic temperatures, spin-selective optical transitions can be accessed such that, during optical pumping, fluorescence is almost completely absent for one of the spin states. This single-shot method allows the spin state to be efficiently determined with high confidence for any given Ramsey sequence (up to collection and detection efficiencies), resulting in a relatively low readout overhead. At room temperature, in contrast, where spin-selective optical transitions are not resolved in a single shot, readout is typically performed by simultaneously exciting a spin-triplet that includes both basis states, and observing fluorescence from subsequent decay, the probabilities for which differ only modestly (a contrast of about 30%; see Methods). Overcoming this classical uncertainty (in addition to quantum projection noise) to allow a precise estimate of the relative spin state probabilities after a given precession time $\tau$ has required repeated Ramsey sequences to produce a large ensemble of fluorescent photons. Such a large readout overhead significantly reduces the sensitivity of NV-magnetometry, and so far the high sensitivities reported at cryogenic temperatures have been out of reach for room-temperature operation by several orders of magnitude. Yet NV-sensing at cryogenic temperatures is impractical for biological applications such as in-vivo measurements McGuinness et al. [2011] and monitoring of metabolic processes Degen et al. [2017].

A large body of work Giovannetti et al. [2004], Higgins et al. [2007], Berry et al. [2009], Higgins et al. [2009], Giovannetti et al. [2011], Said et al. [2011], Waldherr et al. [2012], Nusran et al. [2012], Bonato et al. [2015] has developed and improved quantum sensing algorithms to surpass the classical standard measurement sensitivity (SMS). While the SMS bounds the sensitivity that can be achieved for NV-magnetometry with constant phase accumulation time, phase estimation algorithms using a set of different precession times $\tau$ allow the SMS to be overcome Waldherr et al. [2012], Nusran et al. [2012]. Further improvements in sensitivity are possible by adapting measurement bases, to require fewer Ramsey sequences Bonato et al. [2015]. However, sensing algorithms that use a standard Bayesian approach typically involve probability distributions that are computationally intensive to update, or which contain outlying regions that significantly affect an estimate. An appealing alternative Granade et al. [2012], Wiebe et al. [2014], Wiebe and Granade [2016] uses techniques from machine learning to approximate a probability distribution with a relatively small collection of points, known as particles. These methods have been applied to the problem of learning a Hamiltonian Wiebe et al. [2014], Wang et al. [2017], and to implement noise-tolerant quantum phase estimation Paesani et al. [2017].

Here, we experimentally demonstrate a magnetic field learning (MFL) algorithm that operates with on average only one photon readout from a single NV centre at room temperature, and achieves a level of sensitivity so far only reported for cryogenic operation Bonato et al. [2015]. MFL adapts efficient Bayesian phase estimation and Hamiltonian learning techniques for magnetometry to achieve a fast convergence to the correct value of the magnetic field, and requires no adaptation of measurement bases. The parameters of our MFL algorithm, including the number of particles, can be optimised prior to operation without adding to the sensing time overhead. Each precession time $\tau$ is chosen Ruster et al. [2017] as the inverse of the uncertainty in the algorithm’s previous estimate of $B$, allowing $\tau$ to grow exponentially to achieve HL scaling in sensitivity. We tested MFL on a large data set from Ramsey interferometry experiments on a bulk diamond NV centre. We benchmark the performance of MFL against standard FFT methods, as well as previous experimental results from other phase estimation algorithms. Simultaneously with the learning of $B$, MFL produces an estimate of $T_2^*$, which, in contrast to other phase estimation algorithms, allows MFL to lower bound its sensitivity to the SMS, however long its implementation runtime. Remarkably, we show that MFL enables the dynamical tracking of time-varying magnetic fields at room temperature.

In general, Hamiltonian learning algorithms estimate the parameters $\vec{x}$ of a model Hamiltonian $\hat{H}(\vec{x})$, through iterations of informative measurements Wiebe et al. [2014]. At each step, a prior probability distribution stores estimates of every parameter and its uncertainty Granade et al. [2012]. Similarly, the four principal recursive steps of MFL, called an epoch and depicted in Fig. 1(a-d), are: (a) Choose $\tau$ for the next Ramsey sequence from the heuristic $\tau \approx 1/\sigma$, where $\sigma$ is the uncertainty embedded in the prior $P(\vec{x})$. (b) Allow the system to evolve under $\hat{H}$ for a time $\tau$, using the Ramsey sequence shown in Fig. 1(e-h). (c) Measure the outcome $E$, extracted from the photo-luminescence count, e.g. Fig. 1(i). (d) Update the prior using Bayes’ rule, $P'(\vec{x}|E) \propto \mathcal{L}(E|\vec{x};\tau)\,P(\vec{x})$, where $\mathcal{L}$ is the likelihood function Granade et al. [2012]. The use of sequential Monte Carlo algorithms Granade et al. [2012], Wiebe et al. [2014], Wiebe and Granade [2016], where particles are reallocated when required, makes the inference process practical and computationally efficient. Here, the Hamiltonian for the two relevant NV states is modelled (in units where $\hbar = 1$) as

$$\hat{H}(B) = \frac{\gamma B}{2}\,\hat{\sigma}_z \equiv \frac{\omega}{2}\,\hat{\sigma}_z, \qquad (1)$$

so that $\omega = \gamma B$ is the only parameter to be estimated to learn the value of $B$.
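The epoch described above (choose a precession time, evolve, measure, update) can be sketched as a small particle filter. This is an illustrative toy under stated assumptions, not the authors' implementation: the frequency is in arbitrary units, the likelihood is the ideal $\cos^2(\omega\tau/2)$ form without dephasing or readout noise, measurements are simulated, and the resampling step is a simple Liu-West style move. All function names and default values are hypothetical.

```python
import math
import random


def likelihood(outcome, omega, tau):
    """Ideal Ramsey likelihood P(outcome | omega; tau), with P(0) = cos^2(omega tau / 2)."""
    p0 = math.cos(omega * tau / 2) ** 2
    return p0 if outcome == 0 else 1.0 - p0


def mean_std(particles, weights):
    """Weighted mean and standard deviation of the particle cloud."""
    mean = sum(w * x for x, w in zip(particles, weights))
    var = sum(w * (x - mean) ** 2 for x, w in zip(particles, weights))
    return mean, math.sqrt(max(var, 0.0))


def resample(particles, weights, rng, a=0.98):
    """Liu-West style move: redraw by weight, shrink towards the mean, add jitter."""
    mean, std = mean_std(particles, weights)
    h = math.sqrt(1.0 - a * a)
    drawn = rng.choices(particles, weights=weights, k=len(particles))
    moved = [a * x + (1.0 - a) * mean + h * std * rng.gauss(0.0, 1.0) for x in drawn]
    return moved, [1.0 / len(moved)] * len(moved)


def mfl_epoch(particles, weights, measure, rng):
    """One epoch: (a) pick tau = 1/sigma, (b, c) measure, (d) Bayes update."""
    _, std = mean_std(particles, weights)
    tau = 1.0 / max(std, 1e-300)                       # (a) heuristic
    outcome = measure(tau)                             # (b), (c) experiment
    updated = [w * likelihood(outcome, x, tau) for x, w in zip(particles, weights)]
    norm = sum(updated)
    if norm > 0.0:                                     # (d) Bayes' rule
        weights = [w / norm for w in updated]
    ess = 1.0 / sum(w * w for w in weights)            # effective sample size
    if ess < 0.5 * len(particles):
        particles, weights = resample(particles, weights, rng)
    return particles, weights, tau


def run_mfl(true_omega, n_particles=500, n_epochs=60, seed=1):
    """Simulate a full run against a fixed 'true' frequency; returns (mean, std)."""
    rng = random.Random(seed)
    particles = [rng.uniform(0.0, 1.0) for _ in range(n_particles)]
    weights = [1.0 / n_particles] * n_particles

    def measure(tau):
        p0 = math.cos(true_omega * tau / 2) ** 2
        return 0 if rng.random() < p0 else 1

    for _ in range(n_epochs):
        particles, weights, _ = mfl_epoch(particles, weights, measure, rng)
    return mean_std(particles, weights)
```

Because the precession time is tied to the current uncertainty, it lengthens as the particle cloud narrows, which is the mechanism behind the exponential growth of the precession time described in the text.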

Figure 1: An epoch of the MFL algorithm including a Ramsey sequence and readout. (a) The uncertainty encoded in the prior distribution determines the phase accumulation time for the next set of Ramsey sequences. (b) A number of Ramsey sequences are implemented for $\tau$, with the precession driven by a $B$-field from permanent magnets. Laser light is focused with a confocal microscope. A planar copper wire on the surface of the bulk diamond generates MW pulses. (c) The outcomes from the Ramsey sequences are measured. (d) The prior distribution is updated through Bayesian inference, from which the next phase accumulation time, $\tau$, is determined. (e) The NV spin vector is initialised with laser light, rotated with MW pulses, and, using a second laser pulse, read out from photoluminescence (PL) with an avalanche photodiode (APD). (f) The electronic energy level triplet supports initialisation and MW manipulation between the and states, which encode the basis states, and , respectively. (g) The Bloch sphere depicts the transit of the electronic state vector for the MW rotations and Larmor precession. (h) Detection is performed by optically pumping the basis states to a higher energy level triplet and measuring the decay via (non spin-preserving) PL. (i) A representative PL fringe (theory plotted as dashed line) with orange data-points representing the number of detected photons for .
Figure 2: Experimental results for scaling of precision. Lines represent median values, and performance within the 68.27% percentile range is shown as shaded areas. (a) Estimated uncertainty is plotted as a function of the epoch number; data from one sample run is shown as blue circles. In the inset, a plot of the final in the Ramsey frequency for a typical protocol run, from FFT (Lorentzian fit) and MFL (Gaussian fit). (b) The scaling of precision with total phase accumulation time $T$, excluding all overheads, is shown as density plots with a linear least-squares fit (blue dashed line). The FFT approach is plotted as a grey dashed curve. Scaling for phase estimation algorithms in Refs. Waldherr et al. [2012], Nusran et al. [2012] (respectively green and violet lines) is also reproduced. The inset shows data from a Ramsey fringe in normalised PL, with a 20 ns sampling step, up to . A least-squares fit with a decaying sinusoid is shown as a blue dashed line.
Figure 3: The representative scaling of precision, including overheads, against total running time is plotted for different average numbers of photons detected per epoch (identified by different colours). Each protocol run for comprises epochs, and only Poissonian noise is modelled in the likelihood function. For , each run comprises epochs and an improved likelihood models also infidelities and losses.
Figure 4: Simultaneous learning of $T_2^*$ and magnetic field. (a) Simultaneous estimates of magnetic field (purple) and decoherence time (green) for epochs higher than . Solid lines are from MFL and dashed lines are from a least-squares fit to the Ramsey fringe data in (d). (b) 68.27% credible region at epoch 100 (green) and 500 (blue) for and , reported respectively on the y (x) axes. The smaller area of the distribution at the final epoch indicates the decreased uncertainty on both parameters. (c) The norm of the covariance matrix $\Sigma$, representing the uncertainty in simultaneous estimates of $B$ and $T_2^*$, is plotted against epoch number. The median performance is shown as a solid line, with a shaded area representing the 68.27% percentile range. (d) Renormalised experimental data for a Ramsey fringe, along with a least-squares fit and an MFL-learned decay function showing decoherence.
Figure 5: Magnetic field tracking. (a) Tracking with the MFL protocol is demonstrated on experimental data, where step changes in are indicated by the grey bars (Here, the number of sequences ). The solid red line represents typical performances of MFL, with the shaded area indicating performance within a 68.27% percentile range. For comparison, a dashed purple line indicates an FFT protocol applied cumulatively to all data available up to time , with the corresponding uncertainty from a Lorentzian fit as a shaded area of the same colour. Results after less than 10 data-points are omitted for FFT. (b) Itemisation of the contributions to the average total time taken into account in (a): the precession time , computational () and experimental () overheads . (c) Numerical study of MFL performance in tracking sinusoidally time-dependent magnetic fields , under ideal conditions (, ). The y-axis gives the median time-averaged square error (nms) in the Ramsey frequency estimate, against the peak speed at which changes along each simulated Ramsey sequence (). The blue dashed line refers to the case including only binomial noise in , while the green line is the case with limited readout fidelity (), as defined in Bonato and Berry [2017]. The dashed red line indicates the error obtained via a non-tracking strategy. Shaded areas indicate the 68.27% percentile range.

Experiments were performed using a confocal set-up, at room temperature, with an external magnetic field of 450 G, parallel to the NV centre axis, giving a Zeeman shift of  Waldherr et al. [2012], where  Jakobi et al. [2017]. For each Ramsey sequence, the electronic spin is initialised and read out with 532 nm laser pulses, with the photoluminescence (PL) signal detected by an avalanche photodiode (APD) for 350 ns. The PL signal is then normalised to extract an experimental estimate for . For every sequence, the experimental overhead is the sum of the laser pulse length (3 µs), an idle time for relaxation (1 µs), a short TTL pulse for synchronisation (20 ns) and the duration of the two MW pulses (together 50 ns).

Data for several hundred Ramsey fringes were generated from experiments on three NV centres, labelled , and (see Supplementary Table S1). In particular, the dataset comprises Ramsey sequences for precession times increasing from to in steps of 20 ns. For each $\tau$, sequences were performed, and data were stored such that the results from each individual sequence could be retrieved. Therefore, subsets of data from could be selected and combined to construct fringes comprised of sequences. Running MFL on a sample of these subsets allowed its performance to be compared over fringes with different (but fixed within a fringe) numbers of sequences including down to , where (due to low collection efficiencies) the average PL count () is approximately one photon. Additional experiments on the three NVs generated further data sets for several hundred fringes that each comprised tens of thousands of averaged sequences. All implementations of MFL are reported as representative behaviour averaged over 1000 independent protocol runs (unless otherwise stated), each using a single fringe from these data sets.

We begin by analysing how the estimate of uncertainty in the magnetic field, , given by the variance of , scales with the number of MFL epochs. For this purpose, we use the dataset , with fringes all obtained with sequences. At every MFL epoch, given the adaptively chosen phase accumulation time $\tau$, the experimental datum whose recorded precession time $\tau'$ minimises $|\tau' - \tau|$ is provided to the MFL updater. Figure 2(a) shows an exponential decrease in the scaling of , until epochs are reached. After this point, the precession times selected by MFL saturate at 10 µs, and is reduced only polynomially fast, by accumulating statistics for $\tau$ already retrieved. This slowdown is analogous to that occurring when the heuristic requires $\tau$ exceeding the system dephasing time Granade et al. [2012] (see Supplementary Information for details). A comparison with FFT methods, inset in Fig. 2(a), finds that is times smaller for MFL.
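Feeding a pre-recorded fringe to an adaptive algorithm requires mapping each requested precession time onto the grid of times that were actually measured. A minimal sketch of that lookup follows, assuming a uniform grid; the function name, the grid origin `tau_min`, and the clamping behaviour are illustrative assumptions.

```python
def nearest_recorded_index(tau_requested, tau_min, step, n_points):
    """Index of the recorded precession time closest to the requested one.

    The grid is tau_min + k * step for k = 0 .. n_points - 1; requests outside
    the recorded range are clamped to the first or last grid point.
    """
    index = round((tau_requested - tau_min) / step)
    return min(max(index, 0), n_points - 1)
```

With a 20 ns step and 500 recorded times (the figures quoted in the Methods for the FFT comparison), requests beyond the last grid point are clamped near 10 µs, which is consistent with the saturation of the selected precession times noted above.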

Neglecting overheads, the sensitivity of a magnetometer, $\eta$, is calculated from

$$\eta^2 = \delta B^2\, T, \qquad (2)$$

where $T = \sum_{i=1}^{N} \tau_i$ from $N$ epochs. Figure 2(b) plots $\eta^2$ against $T$, for each epoch, and compares MFL with the standard FFT method, using the same set. The precision of MFL scales as , which overlaps with HL scaling ($\eta^2 \propto T^{-1}$). The FFT method rapidly approaches the SMS ($\eta^2 \propto T^{0}$), whereas (neglecting overheads) the scaling reported for quantum phase estimation methods is qualitatively comparable to MFL, at the expense of more intensive post-processing Said et al. [2011].
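The scaling comparison can be made concrete with a short sketch that fits the exponent of $\eta^2 = \delta B^2 T$ against $T$ on a log-log scale. It assumes the sensitivity is defined as the squared uncertainty times the total phase accumulation time; the function name and the synthetic data are illustrative, not the experimental values.

```python
import math


def loglog_slope(total_times, eta_squared):
    """Least-squares slope of log(eta^2) against log(T)."""
    xs = [math.log(t) for t in total_times]
    ys = [math.log(e) for e in eta_squared]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    times = [1.0e-6 * 2 ** k for k in range(10)]
    # Heisenberg-like: uncertainty falls as 1/T, so eta^2 = dB^2 * T falls as 1/T.
    heisenberg = [(1.0 / t) ** 2 * t for t in times]
    # Standard-measurement-like: uncertainty falls as 1/sqrt(T), so eta^2 is constant.
    standard = [(1.0 / math.sqrt(t)) ** 2 * t for t in times]
    print(loglog_slope(times, heisenberg), loglog_slope(times, standard))
```

A fitted slope close to -1 indicates Heisenberg-like scaling, while a slope close to 0 indicates scaling at the standard measurement sensitivity.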

For a true measure of absolute sensitivity, experimental and computational overheads must be accounted for. Including initialisation, readout and computation time in the total running time , we redefine Eq. 2 for absolute scaling of (see Methods for details). The average number of luminescent photons, , used for readout during each epoch, scales linearly with the number of sequences (); on average, one photon every sequences is detected. As shown in Fig. 3, we use MFL to measure the scaling of with (up to epochs) for decreasing numbers within each epoch. The plots have a shape characterised by an initial slow decrease, followed by a fast increase in precision. The relatively slow learning rate for the short phase accumulation times in the early stages of the algorithm leads to a slow increase in phase accumulation time, since (). The algorithm is learning slowly, but the total measurement time is increasing faster than the uncertainty is decreasing. However, when the particles start converging to a valid estimate of , the uncertainty decreases exponentially, overcoming the corresponding increase in sensing times. Our analysis compares well with previous results obtained under cryogenic conditions, and scaling parameters from linear least-squares fitting overlap consistently with HL scaling for protocol update rates up to 13 Hz.

Decreasing the number of sequences (thus ) per epoch increases the statistical noise, which extends the slow learning period. However, the total time decreases with to produce an increased sensitivity in a shorter time. For , readout infidelities and losses become the dominant noise mechanisms. In the case for therefore, these additional sources of noise were included in the model. For we obtain a sensitivity of 60 nT s$^{1/2}$ in 10 ms (see also the Supplementary Information).

When an NV-sensing algorithm begins to request precession times beyond $T_2^*$, where no information can be retrieved, the effectively wasted sensing time reduces the sensitivity. Knowledge of $T_2^*$ can ensure that all $\tau$ are less than $T_2^*$, preventing this reduction in sensitivity and instead guaranteeing that it scales at the SMS for long sensing times. Learning $T_2^*$ simultaneously with $B$, as part of a multi-parameter estimation strategy Ralph et al. [2017], Ciampini et al. [2016], can be more efficient than independently estimating $T_2^*$ ahead of each sensing experiment. MFL naturally operates as a multi-parameter estimation protocol when the prior probability distribution is multivariate Granade et al. [2012], with the uncertainty in its joint probability distribution captured by a normalised covariance matrix $\Sigma$.

Each precession time is chosen proportionally to the inverse of the (Frobenius) norm of the covariance matrix (see Methods). This can incur an initial slow learning period, because shorter $\tau$ are initially most useful in estimating $B$ while longer precession times are better for an estimation of $T_2^*$. We therefore begin MFL in the single parameter estimation mode for $B$, and introduce the simultaneous learning of $T_2^*$ at epoch (chosen empirically).

Figure 4 shows results from running the MFL algorithm on the data set, where . As is the case for single parameter estimation results, we find an exponential scaling of the generalised uncertainty with the number of epochs, though the learning rate for is faster than that for . There is a discrepancy between the estimate of $T_2^*$ from MFL shown in Fig. 4(a) and the fit (non-weighted least-square) to the decaying sinusoid shown in Fig. 4(d). The discrepancy between these two estimates results from the particle guess heuristic (PGH) preferentially requesting , such that an estimate of $T_2^*$ is more informed by data at these relatively shorter time scales (see Methods).

The strength of $B$ may not be fixed in time for typical sensing experiments Bonato and Berry [2017]. The Bayesian inference process is conceived to learn on-line when experimentally retrieved likelihoods conflict with its prior information. Thus, the ability to track time-varying magnetic fields follows naturally from MFL’s processing speed and adaptivity. With minor controls in the Bayesian inference procedure, MFL can account for such fluctuations and high-amplitude changes in the sensed $B$ (see Methods for details). Here, we test an algorithm that tracks a $B$-field using the dataset, where $B$ was experimentally modulated by changing the position of the permanent magnet (see Fig. 1(b)). Data recording was paused during magnet adjustments, leading to stepwise transitions in this data set, where the magnetic field instantly jumps to a new strength then remains stable for a period of between hundreds and thousands of milliseconds.

Results are shown in Fig. 5(a), with a maximum -fold instantaneous change in $B$. MFL detects when the posterior distribution has become non-representative of the most recent measurements, by increasing the uncertainty, . After approximately 10 epochs, the estimate converges to the new value set for $B$. Figure 5(b) summarises the different computational and experimental contributions to the total running time per epoch (on the order of 10 ms). The computational time cost of MFL is , with the remaining time costs coming from experimental routines. We note the computational efficiency of MFL allows a computational overhead (0.21 ms) that is smaller than the average phase accumulation time (0.41 ms) and about two orders of magnitude smaller than the experimental overheads (16.28 ms).

Figure 5(c) shows numerical results demonstrating the resilience of MFL against a dynamic component of increasing frequency, when tracking an A.C. oscillating field , where we choose , with a constant and . The effectiveness of the tracking for each run is captured by a time-dependent normalised squared error , averaged over all epochs performed, capturing the efficiency of the tracking as $B$ is not constant along epochs. Typical estimation errors in $B$ are lower than 3% for dynamic components up to 18 µT/ms.
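One plausible reading of the tracking figure of merit is a squared relative error averaged over epochs. The sketch below implements that reading as an assumption; the exact definition is not recoverable from this text, and the function name is hypothetical.

```python
def time_averaged_normalised_squared_error(estimates, true_values):
    """Mean over epochs of ((estimate - truth) / truth)^2 (assumed definition)."""
    if len(estimates) != len(true_values) or not estimates:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(((e - t) / t) ** 2 for e, t in zip(estimates, true_values))
    return total / len(estimates)
```

Normalising by the instantaneous true value keeps the metric comparable across epochs in which the field itself is changing.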

The performance of magnetic field learning found for our room temperature set-up is comparable to other protocols in cryogenic environments Bonato et al. [2015]. These methods could be applied to other sensing platforms where noise has been a limiting factor. Alternatively, in pursuit of the fundamental limits in absolute sensing precision, they could be used together with single-shot readout Robledo et al. [2011], adaptive measurement bases Bonato et al. [2015], faster communication, and dynamical decoupling techniques MacQuarrie et al. [2015], Farfurnik et al. [2018]. Our methods would be particularly effective in applications where single-spin sensing is desired for nano-scale resolution, but where cryogenic conditions are prohibitive, such as biological sensing and new nano-MRI applications Barry et al. [2016], Boretti and Castelletto [2016].



MFL execution The data processing was performed by adapting the open Python package QInfer Granade et al. [2017] to the case of experimental metrology.

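In order to describe experimental data from Ramsey fringes collected from an NV centre with dephasing time $T_2^*$, immersed in a magnetic field of intensity $B$, we adopt the likelihood function as in Granade et al. [2012]:

$$P(0 \,|\, B, T_2^*; \tau) = e^{-\tau/T_2^*} \cos^2\!\left(\frac{\gamma B \tau}{2}\right) + \frac{1 - e^{-\tau/T_2^*}}{2}, \qquad (3)$$

where $T_2^*$ is a known parameter, or approximated by in all cases when .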
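A dephasing-aware Ramsey likelihood of the kind referred to here can be sketched as follows. The form used, an exponentially damped $\cos^2$ fringe relaxing towards 1/2, is the standard model in the cited Hamiltonian-learning literature and is assumed rather than copied from this text; the function name is illustrative.

```python
import math


def prob_zero_with_dephasing(omega, tau, t2_star):
    """P(0) = exp(-tau/T2*) cos^2(omega tau / 2) + (1 - exp(-tau/T2*)) / 2."""
    decay = math.exp(-tau / t2_star)
    return decay * math.cos(omega * tau / 2) ** 2 + (1.0 - decay) / 2.0
```

For precession times much longer than the dephasing time the probability approaches 1/2 regardless of the field, which is why such measurements carry no information about it.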

In cases when , the datum adopted was obtained from combined sequences as stated in the main text. Results in Fig. 2a&b and Fig. 5 were all obtained adopting a majority voting scheme to pre-process data from combined sequences Paesani et al. [2017]. Majority voting decides each single-shot datum according to the most frequent outcome. This is done by previously determining, during the characterisation of the experimental set-up, the average photoluminescence counts () detected throughout the execution of a Ramsey sequence. The datum of a single outcome is determined by comparing the number of photons detected during the measurement (extracted from sweeps), , and . If then we set the value of the outcome to , otherwise to . Without this scheme in place, the outcome of a measurement is assigned sampling from the set , with probabilities , respectively, with the maximum photoluminescence counts estimated during the characterisation.
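The two ways of turning a photon count into a binary datum described above can be sketched as below. The direction of the threshold (brighter than average read as outcome 0) and the sampling probability (count divided by the maximum count) are assumptions made for illustration, as are the function names.

```python
import random


def majority_vote_datum(photon_count, mean_count):
    """Threshold rule: a count above the characterised mean is read as outcome 0 (assumed mapping)."""
    return 0 if photon_count > mean_count else 1


def sampled_datum(photon_count, max_count, rng):
    """Without voting: draw outcome 0 with probability photon_count / max_count (assumed)."""
    p0 = min(max(photon_count / max_count, 0.0), 1.0)
    return 0 if rng.random() < p0 else 1


if __name__ == "__main__":
    rng = random.Random(0)
    print(majority_vote_datum(12, 10.0), sampled_datum(7, 10, rng))
```

The threshold rule discards the magnitude of the count and keeps only its sign relative to the mean, whereas the sampled rule preserves the count as a probability at the cost of extra randomness in each datum.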

Other than the study of in Fig. 3, further examples of the performance of MFL with no majority voting scheme in place are reported in the Supplementary Information.

Errors in the precision scaling are estimated from a bootstrapping procedure, involving a sampling with replacement from the available runs (). The cardinality of each resample matches . The resampling is repeated times. Median precision scalings from each resample are estimated, and the standard deviation from this approximate population of scaling performances is provided as the precision scaling error.
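The bootstrapping procedure can be sketched in a few lines. As a simplifying assumption, each run is summarised here by a single number (for example its fitted scaling exponent), and the error is the standard deviation of the medians of resamples drawn with replacement; the function name and the example values are illustrative.

```python
import random
import statistics


def bootstrap_error_of_median(values, n_resamples, rng):
    """Standard deviation of the medians of resamples drawn with replacement.

    Each resample has the same cardinality as the original set of runs.
    """
    medians = []
    for _ in range(n_resamples):
        resample = rng.choices(values, k=len(values))
        medians.append(statistics.median(resample))
    return statistics.pstdev(medians)


if __name__ == "__main__":
    rng = random.Random(42)
    exponents = [-1.0 + 0.05 * rng.gauss(0.0, 1.0) for _ in range(200)]  # synthetic
    print(bootstrap_error_of_median(exponents, 500, rng))
```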

Absolute scaling In Fig. 3 we reported the absolute scaling of the sensitivity $\eta$, which requires taking into account the main experimental and computational overheads contributing to the total running time of a phase estimation (PE) protocol (communication time is not considered here). In particular, these can be listed as: the time required by the PE algorithm to compute the next experiment (here 0.4 µs per step, per particle on a single-core machine), the duration of the laser pulse for initialisation and readout (3 µs in total), the waiting time for relaxation (1 µs), a short TTL pulse for the photodetector (20 ns) and the duration of MW pulses (approximately 50 ns in total). Including variable and constant overheads, we obtain:


after epochs of a PE algorithm.
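The overhead accounting listed above can be sketched as a simple time budget. The per-item durations are those quoted in the text; how they combine (every sequence pays the precession time plus the fixed experimental overhead, and every epoch pays one computation step per particle) is an assumption, as are the names used.

```python
LASER_PULSES = 3.0e-6           # s, initialisation and readout
RELAXATION = 1.0e-6             # s, idle time
TTL_PULSE = 20.0e-9             # s, synchronisation
MW_PULSES = 50.0e-9             # s, both microwave pulses together
COMPUTE_PER_PARTICLE = 0.4e-6   # s per step, per particle


def experimental_overhead_per_sequence():
    """Fixed experimental cost of one Ramsey sequence, excluding precession."""
    return LASER_PULSES + RELAXATION + TTL_PULSE + MW_PULSES


def total_running_time(taus, n_sequences, n_particles):
    """Assumed budget: sum over epochs of M * (tau + overhead) + particles * compute."""
    per_sequence = experimental_overhead_per_sequence()
    compute = n_particles * COMPUTE_PER_PARTICLE
    return sum(n_sequences * (tau + per_sequence) + compute for tau in taus)
```

With these figures the fixed experimental overhead comes to 4.07 µs per sequence, so for sub-microsecond precession times the budget is dominated by overheads rather than by sensing.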

In the case, the final uncertainty is 0.45 µT after epochs, and the total running time is 18 ms, that is, a sensitivity of 60 nT s$^{1/2}$. In the case, exhibiting a precision scaling that is essentially Heisenberg limited, the uncertainty saturates at protocol convergence ( epochs) to 0.3 µT, for a total running time of 78 ms. This leads to a final sensitivity of 84 nT s$^{1/2}$ and a repetition rate of 12.8 Hz.
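The quoted sensitivities follow from the quoted uncertainties and running times if the absolute sensitivity is taken as the uncertainty multiplied by the square root of the total running time. The symbol $\tilde{T}$ for the total running time is introduced here for illustration only.

```latex
\begin{align*}
\eta &= \delta B \,\sqrt{\tilde{T}} \\
0.45\,\mu\mathrm{T} \times \sqrt{18\,\mathrm{ms}}
  &= 0.45\,\mu\mathrm{T} \times 0.134\,\mathrm{s}^{1/2}
  \approx 60\,\mathrm{nT\,s^{1/2}} \\
0.3\,\mu\mathrm{T} \times \sqrt{78\,\mathrm{ms}}
  &= 0.3\,\mu\mathrm{T} \times 0.279\,\mathrm{s}^{1/2}
  \approx 84\,\mathrm{nT\,s^{1/2}} \\
1 / (78\,\mathrm{ms}) &\approx 12.8\,\mathrm{Hz}
\end{align*}
```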

Multi-parameter Learning For the multi-parameter case, we again use Eq. 3, but now considering the unknown parameters $\{B, T_2^*\}$. Each precession time is chosen proportionally to the inverse of the Frobenius norm of the covariance matrix, $\tau \propto 1/\|\Sigma\|_F$. The parameters and are introduced to render $\Sigma$ dimensionless, with the prior at epoch , when both parameters start to be learnt simultaneously. In this analysis this corresponded to 11 µT and 20.2 µs; however, we stress that different choices would be possible, with equivalent results for , up to a normalisation factor. We observed that MFL estimates of the dephasing time $T_2^*$ may differ systematically from a non-weighted least-square fit. In the presence of dephasing, the heuristic of MFL will preferentially adopt experiments with . This relation is similar to a weighting mechanism for the data (see also Supplementary Information), preferring more consistent observations. On the other hand, a least-square fit will attempt to average equally over data-points where the contrast in the fringes is almost completely lost, underestimating $T_2^*$.
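The multi-parameter heuristic can be sketched as follows: rescale each parameter by a reference value so that the covariance matrix is dimensionless, take its Frobenius norm, and set the precession time inversely proportional to it. The function names, the `scales` argument and the `prefactor` (which carries the units of time) are hypothetical.

```python
import math


def weighted_covariance(points, weights, scales):
    """Covariance matrix of weighted particles after dividing each coordinate by its scale."""
    dim = len(scales)
    scaled = [[x[k] / scales[k] for k in range(dim)] for x in points]
    mean = [sum(w * s[k] for s, w in zip(scaled, weights)) for k in range(dim)]
    return [
        [
            sum(w * (s[i] - mean[i]) * (s[j] - mean[j]) for s, w in zip(scaled, weights))
            for j in range(dim)
        ]
        for i in range(dim)
    ]


def frobenius_norm(matrix):
    """Square root of the sum of squared matrix entries."""
    return math.sqrt(sum(v * v for row in matrix for v in row))


def next_precession_time(points, weights, scales, prefactor):
    """tau = prefactor / ||Sigma||_F, with Sigma the dimensionless covariance."""
    return prefactor / frobenius_norm(weighted_covariance(points, weights, scales))
```

Rescaling matters because the field and the dephasing time have different units and very different magnitudes; without it the norm would be dominated by whichever parameter happens to be numerically larger.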

MFL tracking We mentioned that Bayesian inference processes are ideally suited for tracking purposes. However, we observe that in cases where the magnitude of the changes in the parameter completely invalidates the a-posteriori credibility region, the recovery time of a standard Hamiltonian learning protocol might be unsuitable for practical applications. To handle this situation as well, we modified the standard update procedure to reset its prior when the effective sample size of the particle ensemble is not restored above the resampling threshold by a sequence of resampling events. Details and pseudocode are provided in the Supplementary Information.
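A possible shape for the reset rule is sketched below. It is one interpretation of the description, not the authors' pseudocode (which is in their Supplementary Information): the effective sample size is checked once per epoch, and if it falls below the resampling threshold for several consecutive epochs the caller is told to reinitialise the prior. The class name and the default values are hypothetical.

```python
def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) for normalised weights."""
    return 1.0 / sum(w * w for w in weights)


class ResetMonitor:
    """Signals a prior reset after repeated failures to keep the ESS above threshold."""

    def __init__(self, n_particles, threshold_fraction=0.5, max_consecutive=3):
        self.threshold = threshold_fraction * n_particles
        self.max_consecutive = max_consecutive
        self.consecutive = 0

    def should_reset(self, weights):
        """Call once per epoch with the post-update weights, before resampling."""
        if effective_sample_size(weights) < self.threshold:
            self.consecutive += 1
        else:
            self.consecutive = 0
        if self.consecutive >= self.max_consecutive:
            self.consecutive = 0
            return True
        return False
```

On a reset, the caller would redraw the particles from the original broad prior and restart the heuristic from short precession times.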

FFT execution For most analyses, FFT estimates were run against the whole of the available datasets. For example, in the case of Fig. 2, the final estimate provided by a single run of FFT used all of the 500 phase accumulation times once, recorded with 20 ns steps, for a representative subset among those available in (Supplementary Table 1). We emphasise that this amounts to twice as many $\tau$'s as those actually used by the MFL algorithm (the single-run estimate being reported as converged after 250 epochs).

The only exception is the tracking in Fig. 5, where the data-points were cumulatively added to the dataset. In such tracking applications, as long as the magnetic field is kept constant, the estimate from FFT compares to MFL in a way similar to Fig. 2. However, FFT keeps estimating from the most prominent peak in the spectrum, which corresponds to the field value that was maintained for the longest time, not the most recent one. Thus, it fails to track changes as they occur.
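The behaviour described above can be illustrated with a toy calculation. The following is a minimal sketch, not the analysis code used for the figures: it assumes an idealised, noiseless fringe of the form cos²(π f τ) sampled at 500 times in 20 ns steps, and the names `ramsey_fringe` and `dft_peak_frequency` are hypothetical. A plain discrete Fourier transform stands in for an FFT routine so that only the standard library is needed.

```python
import cmath
import math

def ramsey_fringe(freq_hz, times):
    """Idealised fringe cos^2(pi f t), which oscillates at frequency f (assumed model)."""
    return [math.cos(math.pi * freq_hz * t) ** 2 for t in times]

def dft_peak_frequency(signal, dt):
    """Frequency (Hz) of the largest non-DC peak of a plain DFT power spectrum."""
    n = len(signal)
    mean = sum(signal) / n
    centred = [s - mean for s in signal]          # remove the DC offset
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(centred))
        power = abs(acc) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return best_k / (n * dt)

dt = 20e-9
times = [i * dt for i in range(500)]

# Constant field: the spectrum has a single peak at the fringe frequency.
steady = ramsey_fringe(2.0e6, times)
f_steady = dft_peak_frequency(steady, dt)

# Field changed for the last 100 samples only: the dominant peak still
# belongs to the value that was held for longer, not to the most recent one.
changed = steady[:400] + ramsey_fringe(3.0e6, times)[400:]
f_changed = dft_peak_frequency(changed, dt)
```

In this toy record both `f_steady` and `f_changed` fall on the 2 MHz bin, which is the sense in which a peak-picking spectral estimate lags behind a change in the field.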

Experimental details In Ramsey interferometry, as performed here, we measure the magnetic field component parallel to the NV centres’ symmetry axes. However, the MFL protocol can be extended to differently oriented NV centres, to detect arbitrarily oriented magnetic fields.

The experiments are performed here using two different ¹²C isotopically purified diamond samples. For the Ramsey interferometry we use the and electronic sublevels.

Photon number estimation After exciting a single NV centre with a 532 nm laser pulse, the red-shifted individual photons were detected by an avalanche photodiode. A time-tagged single-photon counting card with nanosecond resolution was used for recording. A TTL connection between the time-tagger and the MW pulse generator synchronises the photon arrival times with the pulse sequence and allows the number of detected photons to be recorded for every single laser pulse. The photon detection efficiency is mainly limited by the collection volume, total internal reflection within the diamond (due to its high refractive index) and further losses in the optics. This results in a photon being detected about every eighth laser pulse. Thus, to read out the NV state with high fidelity (and about 30% contrast), multiple measurements are usually required for meaningful statistics.
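A rough order-of-magnitude estimate shows why many repetitions are needed. The sketch below is an illustration under stated assumptions rather than the procedure used in the experiment: photon counts are taken to be Poisson distributed, the bright state yields on average 1/8 photon per laser pulse, the dark state yields 30% fewer, and the signal-to-noise ratio is defined as the difference of the mean counts divided by the square root of the summed variances. The function name `repetitions_for_snr` is hypothetical.

```python
import math

def repetitions_for_snr(target_snr, photons_per_pulse=1.0 / 8.0, contrast=0.30):
    """Smallest number of repetitions M reaching target_snr (assumed Poisson model).

    After M repetitions the mean counts are M*n (bright) and M*n*(1 - C) (dark),
    so SNR(M)^2 = M * n * C^2 / (2 - C) under the assumed definition.
    """
    snr_squared_per_shot = photons_per_pulse * contrast ** 2 / (2.0 - contrast)
    return math.ceil(target_snr ** 2 / snr_squared_per_shot)

m_unit_snr = repetitions_for_snr(1.0)   # repetitions needed for SNR = 1
m_double_snr = repetitions_for_snr(2.0)  # repetitions needed for SNR = 2
```

With these assumed numbers, on the order of 150 repetitions are needed just to reach unit signal-to-noise ratio, and the requirement grows quadratically with the target ratio.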

Acknowledgements The authors thank Cristian Bonato and Marco Leonetti for useful discussion. M.G.T. acknowledges support from the ERC starting grant ERC-2014-STG 640079 and an EPSRC Early Career Fellowship EP/K033085/1. J.G.R. is sponsored under EPSRC grant EP/M024458/1. L.M. and F.J. would like to acknowledge DFG (Deutsche Forschungsgemeinschaft grants SFB/TR21 and FOR1493), BMBF, VW Stiftung, European Research Council DIADEMS (Simulators and Interfaces with Quantum Systems), ITN ZULF and BioQ. A. Laing acknowledges support from an EPSRC Early Career Fellowship EP/N003470/1.

Supplementary Information

Classical and quantum likelihood estimation

In the main text we have introduced the Bayesian inference process underlying the MFL protocol (known as CLE, Classical Likelihood Estimation, Granade et al. [2012]) as composed of four main steps. Here we expand the discussion to provide additional details and comments about the adoption of CLE.

  1. At each epoch the prior distribution is used to choose which experimental setting to use for the next iteration. In MFL, the only experimental setting is the phase accumulation time , which can be chosen effectively using the so-called particle guess heuristic (see the section below).

  2. The quantum system undergoes an appropriate evolution, according to the Hamiltonian .

    1. The system is prepared in an appropriate initial state , chosen so as to have an informative evolution under , e.g. a state orthogonal to the Hilbert subspace spanned by the Hamiltonian eigenstates. We remark that is not adaptive in CLE. In this work, the NV centre is always prepared in .

    2. Let the system evolve according to its Hamiltonian for the chosen time .

  3. A measurement is performed on the system (here the quantum sensor). In MFL we perform a projective measurement in the computational basis, obtaining a binary outcome .

  4. The computed likelihoods are used to update the probability distribution of the Hamiltonian parameters

    1. The same experiment is also performed on a simulator, implementing a generic parametrised Hamiltonian , thus providing an estimate for the likelihood , i.e. the probability of measuring outcome when is chosen as parameter.

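    2. It is thus possible to apply Bayes’ rule:

      $$P(\vec{x} \mid d; \tau) = \frac{P(d \mid \vec{x}; \tau)\, P(\vec{x})}{P(d)} \qquad \text{(S1)}$$

      where $P(\vec{x})$ can be immediately inferred from the prior at the corresponding epoch, while $P(d)$ is a normalization factor.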

Steps 1 – 4 are repeated until the variance of the probability distribution converges, or falls below a predefined target threshold. In cases with limited readout fidelity, as for the NV centre set-up presented here, meaningful statistics might be accumulated in step 3 by repeating the same measurement a number of times , as suggested in the main text. Evidence from the main text suggests that in most cases this is a suboptimal choice for the absolute scaling performance of the MFL protocol. Note that only steps 2 & 3 involve the quantum sensor. All other steps require instead a simulator. In particular, step 4(a) can be performed on a classical or quantum simulator, the choice of the latter being justified whenever the size of the sensor, and the possible lack of an analytical model to simplify the evolution, make the system simulation classically expensive. In this case, the inference process is known as Quantum Likelihood Estimation (QLE, Wiebe and Granade [2016]).
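The four steps above can be sketched on a discretised parameter grid. This is a minimal illustration and not the implementation used in this work: it assumes a noiseless two-outcome likelihood P(0 | ω; τ) = cos²(ωτ/2) in arbitrary units, replaces the quantum sensor by a pseudo-random simulator, and picks each time as the inverse of the current posterior width (a simplified stand-in for the particle guess heuristic). All function names are hypothetical.

```python
import math
import random

def likelihood(d, omega, tau):
    """Assumed noiseless model: P(0 | omega; tau) = cos^2(omega * tau / 2)."""
    p0 = math.cos(omega * tau / 2.0) ** 2
    return p0 if d == 0 else 1.0 - p0

def bayes_update(grid, prior, d, tau):
    """Step 4: Bayes' rule on the grid; the sum is the normalisation P(d)."""
    unnorm = [likelihood(d, w, tau) * p for w, p in zip(grid, prior)]
    norm = sum(unnorm)
    return [u / norm for u in unnorm]

def mean_std(grid, probs):
    mean = sum(w * p for w, p in zip(grid, probs))
    var = sum((w - mean) ** 2 * p for w, p in zip(grid, probs))
    return mean, math.sqrt(var)

def run_cle(true_omega, n_epochs=40, n_grid=1000, omega_max=1.0, seed=1):
    rng = random.Random(seed)
    spacing = omega_max / n_grid
    grid = [(i + 0.5) * spacing for i in range(n_grid)]
    probs = [1.0 / n_grid] * n_grid                      # uniform prior
    for _ in range(n_epochs):
        _, std = mean_std(grid, probs)
        tau = 1.0 / max(std, 2.0 * spacing)              # step 1 (floored at the grid resolution)
        p0 = likelihood(0, true_omega, tau)              # steps 2-3: simulated sensor
        d = 0 if rng.random() < p0 else 1
        probs = bayes_update(grid, probs, d, tau)        # step 4
    return mean_std(grid, probs)

estimate, uncertainty = run_cle(true_omega=0.37)
```

A fixed grid is used here only for readability; its resolution bounds the achievable uncertainty, which is one motivation for the particle-based approximation discussed in the next section.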

label NV centre
sets (n)
sequences (M)
(ns) (T)
(from fit, \unit\micro\second)
120 18500 20 52 - (-)
115 18500 20 710 - (-)
67 30000 100 8.3 16 - (-)
1 20275 20 58 - - ( )
1 8876 200 6.0 64 (-)
1 44000 20 - - (-)
Table S1: Synopsis of available data. Table summarising the different data sets and systems used in this analysis, along with representative MFL performances in precision scaling. Datasets are discussed only in the Supplementary Information.
Figure S1: Analysis of the behaviour of the PGH for datasets where or , reported in the plots in darker and brighter colours, respectively. The first dataset is collected with µT, whereas the second has µT. a, Renormalized photon counts along two different Ramsey experiments with the same NV centre (scatter plots). Superimposed is a least-squares fit (dashed lines), adopting the oscillatory function with depolarizing noise as in Eq. 3 of Methods. b, Estimated uncertainty and ratio between PGH-generated time and as available from the first dataset, plotted against each epoch of the MFL algorithm. A majority voting method is adopted, under the hypothesis that . Solid lines are median values calculated over 1000 independent runs, whereas shaded areas are 68.27% percentile ranges centred around the median. Superimposed as a scatter plot is a sample of times generated by the PGH during a representative run. c, Same as in b, for the case where . No majority voting is in place, and data from the experiment are extracted probabilistically from the experimentally estimated likelihoods.

Sequential Monte Carlo approximation and Particle Guess Heuristic

The low computational overhead of the protocol is made possible by adopting advanced approximate Bayesian inference methods. In particular, MFL inherits from CLE the Sequential Monte Carlo (SMC) approximation Granade et al. [2012], Wiebe et al. [2014], Hincks et al. [2018]. Within this approximation only a finite number of values (called particles) are sampled from the prior distribution, and then used to approximate the prior in each update step. This approximation makes the Bayesian update as in Eq. S1 numerically feasible.

If the particle positions were held constant throughout the inference process, and starting the protocol from a uniform prior, the cardinality of their set should scale approximately as , where is the expected magnetic field range to be sensed, and is the targeted uncertainty upon convergence. This is inefficient because, as the inference progresses through epochs, many particles will provide very limited support to the updated prior approximation. Indeed, in a successful learning process the weights of most particles become negligible, as they have been effectively ruled out by the observations.

This inefficiency can be addressed with resampling methods, which allow the particles to be sampled again from the updated posterior whenever their weights indicate that the effective size of the sampled particles has fallen below a user-defined threshold. Following Granade et al. [2017], here we adopt a Liu-West resampler (LWR) with optimised resampling threshold and smoothing parameter . These parameters make it possible to tune when, and to what extent, the positions of the particles can be altered by the LWR Granade et al. [2012]. Hence, it was possible to accurately represent throughout the whole protocol execution, whilst employing no more than particles for the discretisation in most cases. The only exceptions were limited-fidelity cases, as for the absolute scaling we chose the number of particles according to the empirical rule


with the number of averaged single sequences selected. This increase in the number of particles can be justified by a corresponding reduction in the risk of “aggressive” resampling leading to inference failures, heralded especially by multi-modality in the parameter distribution Hincks et al. [2018].
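For readers unfamiliar with the resampling step, the following is a minimal single-parameter sketch of an effective-sample-size check followed by a Liu-West move. It is not the code used in this work: the threshold of 0.5 and the smoothing parameter a = 0.98 are placeholder values chosen for illustration, not the optimised values referred to above, and the function names are hypothetical.

```python
import math
import random

def effective_sample_size(weights):
    """N_ess = 1 / sum_i w_i^2 for normalised weights."""
    return 1.0 / sum(w * w for w in weights)

def needs_resampling(weights, threshold=0.5):
    """True when N_ess drops below threshold * (number of particles)."""
    return effective_sample_size(weights) < threshold * len(weights)

def liu_west_resample(particles, weights, a=0.98, rng=random):
    """Liu-West move: new particles keep the posterior mean and variance.

    Each new particle is drawn from a Gaussian centred at
    a * x_i + (1 - a) * mean, with variance (1 - a^2) * var.
    """
    mean = sum(w * x for x, w in zip(particles, weights))
    var = sum(w * (x - mean) ** 2 for x, w in zip(particles, weights))
    width = math.sqrt((1.0 - a * a) * var)
    chosen = rng.choices(particles, weights=weights, k=len(particles))
    new_particles = [rng.gauss(a * x + (1.0 - a) * mean, width) for x in chosen]
    new_weights = [1.0 / len(new_particles)] * len(new_particles)
    return new_particles, new_weights
```

Values of a close to 1 leave the selected particles almost where they were, while smaller values pull them towards the mean and add more jitter; this is the sense in which the smoothing parameter controls how far the resampler may move the particles.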

The particle guess heuristic (PGH) plays a fundamental role in the effectiveness of the MFL protocol. PGH was introduced in Wiebe et al. [2014] to provide an optimal choice of the experimental setting (here the phase accumulation time) in analytically tractable cases of Hamiltonian Learning protocols. Such cases happen to include the sensing Hamiltonian presented in the main paper as Eq. 1. The PGH samples two particles $\vec{x}'$ and $\vec{x}''$ from the particle distribution , and then chooses:

$$\tau = \frac{1}{\lVert \vec{x}' - \vec{x}'' \rVert} \qquad \text{(S3)}$$

In the single-parameter case, where only is sensed, $\tau \propto 1/\sigma$, where $\sigma$ represents the standard deviation of the Gaussian-approximated posterior distribution . Intuitively, this corresponds to selecting longer, more informative accumulation times as the estimated uncertainty about the parameter to be learned shrinks.
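The heuristic is short enough to write out. The sketch below assumes a single scalar parameter and weighted particles stored in plain lists; the function name and the retry guard for coincident particles are illustrative choices, not part of the original protocol description.

```python
import random

def particle_guess_heuristic(particles, weights, rng=random, max_tries=100):
    """Return tau = 1 / |x' - x''| for two particles drawn from the current distribution."""
    for _ in range(max_tries):
        x1, x2 = rng.choices(particles, weights=weights, k=2)
        if x1 != x2:                      # redraw if the same position was sampled twice
            return 1.0 / abs(x1 - x2)
    raise ValueError("particle cloud has collapsed to a single point")

# Two particle clouds that differ only in their width (hypothetical values).
base_rng = random.Random(3)
broad = [base_rng.gauss(0.5, 0.1) for _ in range(200)]
narrow = [0.5 + 0.1 * (x - 0.5) for x in broad]   # ten times narrower
uniform = [1.0 / 200] * 200

tau_broad = particle_guess_heuristic(broad, uniform, rng=random.Random(7))
tau_narrow = particle_guess_heuristic(narrow, uniform, rng=random.Random(7))
```

Because the same pair of particles is selected in both calls, `tau_narrow` is ten times `tau_broad` up to rounding, which illustrates how the chosen time lengthens in inverse proportion to the width of the distribution.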

The role of in time adaptivity

In the main text, we observed the emergence of a slowdown in the learning rate when MFL chooses accumulation times . This slowdown appears when plotting the scaling in either or (respectively Fig. 2a&b, referring for example to the dataset ). This dataset represents a situation where is chosen as a maximum time budget per epoch. In this case, once the PGH encounters the limit, learning by statistical accumulation of data-points with will occur, and the MFL precision scaling will tend to . We highlight this behaviour, using averaged sequences from the whole set , in Fig. S1b, plotting the scaling in alongside the ratio . When the plateau in occurs, we correspondingly observe that a typical run of MFL starts suggesting , before saturating as the uncertainty converges.

Note that this artefact, deriving from the artificial choice of a maximum time budget, is equivalent to the phenomenon exhibited in the presence of dephasing noise, which reduces the contrast of experimental Ramsey fringes, as in the data reported in Fig. 4. To show this, we report in Fig. S1c the same performance for the dataset , where 100 µs (estimated from a least-squares fit), and 6 µT to have approximately the same number of periods in the corresponding Ramsey fringe as in dataset (refer to Fig. 4). For MFL to deal properly with decaying data, in this analysis we remove any majority voting scheme from the data processing, and at each epoch the corresponding datum is probabilistically extracted from the experimentally estimated likelihood (see Methods). This justifies the slowdown in the scaling of , as each data-point is now affected by the same amount of binomial noise that would occur in a set-up with the same readout fidelity, but single-shot measurements. In other words, the additional information acquired about the likelihood by combining sequences for each measurement is partially removed from the inference process by the binary sampling. For this case, we observe that the plateau in occurs when the adaptive choice of the phase accumulation time saturates on average to (though a single run will oscillate around this value, as emphasised by the behaviour for a single run also reported in Fig. S1c). Similarly, the scaling in precision also plateaus when (not reported for brevity), slowing towards . A formal discussion of this saturation is given in the following section.

Figure S2: Key performances of MFL as the average number of photons collected is increased, estimated via 1000 independent runs of MFL, with an underlying lossless model (see Eq. 3 in Methods). a Scaling of the median error estimate for a selection of cases, as reported in the colour-coding legend. b Final median error achieved by MFL for all the cases considered, along with a power-law fit (dashed line). Error bars indicate 68.27% percentile ranges. c Scaling of the quadratic losses for the same representative runs as in a. d Comparison of the final estimates for the Ramsey frequency provided for representative cases, by FFT and QHL methods, respectively in solid and dash-dotted lines. In a&c, shaded areas indicate the credible intervals corresponding to 68.27% percentile ranges.

Precision bounds and sensitivity

In assessing the performance of MFL, it is helpful to compare the uncertainties achieved with those achieved by FFT, using the same datasets. We begin by considering the Cramér–Rao bound Cover and Thomas [2006]. Suppose that our procedure implements a function of the entire data record (data) that estimates the true magnetic field . We then want to minimise the squared error as much as possible. If , the average of over all possible data records, is zero, then we say that our procedure is unbiased. In this case, the Cramér–Rao bound states that, on average over all data records, one finds Ferrie [2014],


where is the evolution time used at the th step of the MFL procedure. We stress that this inequality holds only on average; after all, we might be “lucky” with the estimate that we assign to any particular data record.

The right-hand side of this inequality is derived using the Fisher information for a single measurement,


The Fisher information for an experiment consisting of multiple independent measurements is given by the sum of the Fisher informations for each measurement, giving the Cramér–Rao bound (Eq. S4).

Let be the total phase accumulation time used for a single “run” of a magnetometry procedure; in our case, . By the above argument, can then scale no better than , corresponding to consolidating our phase accumulation into a single measurement. This observation is sometimes referred to as the Heisenberg limit for magnetometry.
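As a worked illustration of the argument above, the single-measurement Fisher information can be computed explicitly under a simplifying assumption: a noiseless two-outcome likelihood written in terms of the precession frequency $\omega$ (the corresponding statement for the field follows by rescaling with the gyromagnetic ratio). The symbols below are chosen for this sketch and need not match the notation of Eqs. S4 and S5.

```latex
% Assumed noiseless Ramsey likelihood
P(0 \mid \omega; \tau) = \cos^2(\omega\tau/2), \qquad
P(1 \mid \omega; \tau) = \sin^2(\omega\tau/2)

% Fisher information of a single measurement
\mathcal{I}(\omega; \tau)
  = \sum_{d \in \{0,1\}}
    \frac{\left[\partial_\omega P(d \mid \omega; \tau)\right]^2}{P(d \mid \omega; \tau)}
  = \frac{\tau^2}{4}\,\sin^2(\omega\tau)
    \left[\frac{1}{\cos^2(\omega\tau/2)} + \frac{1}{\sin^2(\omega\tau/2)}\right]
  = \tau^2

% N independent measurements with total phase accumulation time T = \sum_i \tau_i
\mathbb{E}\!\left[(\hat{\omega} - \omega)^2\right]
  \;\ge\; \frac{1}{\sum_{i=1}^{N} \tau_i^2}
  \;\ge\; \frac{1}{T^2}
```

The last inequality uses $\sum_i \tau_i^2 \le \left(\sum_i \tau_i\right)^2$ for non-negative times, with equality only when the whole budget is spent on a single measurement; this is the consolidation referred to in the text.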

At the other extreme, suppose that we have a total time budget of , that we are able to spend on a magnetometry experiment, such that we can consider repeating a given procedure times. The factor of then factors out of the Cramér–Rao bound, giving


The observation that is sometimes referred to as the standard quantum limit, in the case that we repeat a magnetometry procedure for independent iterations. Indeed, we can use this observation to motivate a general figure of merit for the time budget of a given magnetometry procedure. Assume that the Fisher information for a given procedure is , where is the Fisher information for a single repetition using phase accumulation time . Then it follows . Next, we define as the sensitivity of the proposed magnetometry procedure.

Using this definition, we can then restate the standard quantum limit as the statement that is constant in . That is, a magnetometry procedure bound by the standard quantum limit gains no advantage from phase accumulation time beyond that conferred by repeating the entire procedure for independent runs. By contrast, a Heisenberg limited magnetometry procedure has a sensitivity which scales as , indicating that an additional advantage can possibly be gained by using longer phase accumulation times.

So far we have considered the case in which , such that we can approximate the dynamics of our magnetometry experiment as dephasing-free. The dichotomy between the Heisenberg and standard quantum limit scalings, however, is changed by dephasing such that we have to consider the definition of the sensitivity in the dephasing-limited case. In particular, Ref. Ferrie [2014] derived that the Fisher information for -limited magnetometry is given by


where is used to represent dephasing in frequency units, in analogy with . We note, that unlike the Fisher information describing the noiseless case, the bound Equation S8 for the dephasing-limited case is not independent of the true value of . Thus, to determine the achievable sensitivity in the case of dephasing-limited magnetometry, we must either assume a particular value of the field being estimated, or must generalise beyond the Cramér–Rao bound. We choose the latter case in this work, which provides further insight into the trade-off between phase accumulation time and experimental repetitions for .

Specifically, we consider the van Trees inequality (also known as the Bayesian Cramér–Rao bound) Gill and Levit [1995],


where describes the error that can be achieved using prior information, and an expectation value over a distribution of different hypotheses about the field . We intentionally do not further define , as this term depends on the context in which a magnetometry procedure is used, rather than on the magnetometry procedure itself. Moreover, the effect of is minimal in the limit of large experimental data sets, such that in effect constitutes a correction to the Cramér–Rao bound in the case of finite data records Opper [1999].

In analogy to the Fisher information derivation above, the field-averaged Fisher information in the dephasing-limited case gives for a single phase accumulation . Hence, the analogous bound to Equation S4 is given by


To derive the sensitivity in the van Trees case, let be the average Fisher information for a dephasing-limited procedure. We can then define the average sensitivity for a total phase accumulation time to reformulate the van Trees inequality in a more practical form for our purposes, thus


Following Equation S10, is constant if a fixed phase accumulation time is used, while if . The average Fisher information saturates at , however, such that the Heisenberg and standard quantum limits coincide as approaches . Therefore, the performance observed in Fig. 1a is limited by saturation near .

Absolute precision scaling

As discussed in the main text and Methods, using a room-temperature set-up can be challenging because of quantum projection noise and readout infidelities. These need to be properly addressed when reduced sequence repetitions lead to a low number of PL photons being detected when recording a fringe. The results in terms of absolute scaling have already been discussed (see e.g. Fig. 3 in the main paper). Here, we complete those analyses with additional studies. In Fig. S2a&b, we report respectively the scaling and ultimate uncertainty achievable by MFL after 150 epochs for a subset of cases with . Fig. S2b suggests an approximate gain in the uncertainty achievable by halting the protocol after a fixed number of steps. From Fig. S2c we observe that the choice for in this case is motivated by running MFL for enough steps to observe, for all cases, the convergence of the median quadratic losses (i.e. the squared error in the parameter estimate, here ). To estimate the true , we run MFL once over the whole dataset (), checking that the result is consistent with FFT.
We remark that the advantages of increasing the used (along with the higher precision scaling that increases from for to Heisenberg limited for ) come at the expense of worse final absolute sensitivities achievable by the protocol. This is due to the linear increase of experimental overheads with .
The robustness against sources of noise present in the room temperature set-up is emphasized by Fig. S2d, where we plot the estimates obtained by MFL for a similar subset of ’s analysed. We observe that for , MFL estimates are all substantially consistent with the result obtained for , within the estimated uncertainty and taking into account minor fluctuations in that might have occurred during the collection of the whole dataset. By contrast, we observe how FFT estimates are completely unreliable at the noise level corresponding to .

Figure S3: Ramsey sets averaged over different numbers of single sequences, corresponding to the various reported on the axis below. Experimental data (blue dots) are reported together with a sinusoidal fit obtained from the case (red lines). The imbalance towards the measurement outcome is evident in the cases . Data-points whose normalisation is higher than correspond to Poissonian-distributed multi-photon events still present in this case.

The role of noise for low PL photon counts

Finally, we observe that for , the Bayesian process fails due to increased experimental noise and reduced statistics, underestimating both the real and the uncertainty associated with it. For example, this is evident from Fig. S2c, as the does not improve with the number of epochs. In particular, losses in the system cause an asymmetry between and , respectively the overall readout fidelities for the states and (i.e. taking all sources of noise and loss into account). From experimental raw data for (see Fig. S3), we observed that if we assume , then . This translates into unbalanced output probabilities, which conflict with the underlying assumption made so far of a binomial model for the outcomes , with probabilities given by the likelihood in Eq. 3. This level of “poisoning” in the assumed model is evidently beyond the CLE noise robustness Granade et al. [2017].

In order to show that there is no fundamental limit preventing MFL from providing correct estimates, within uncertainty, given a correct model, we modified the likelihood such that:


where , and for we recover the usual of Eq. 3. In order to estimate , we use it as the free parameter in a preliminary CLE run against the same dataset, but assuming known from the inference process with (i.e. having ). We thus obtain , and use this as a known parameter when running MFL with . In principle, could also be estimated from a multi-parameter inference model.

The result is reported in Fig. S4. Intuitively, measurement outcomes are interpreted as less informative by the inference process, as might be due to additional losses. This effectively slows down the learning rate per epoch, but at the same time restores the correct behaviour of MFL.

Figure S4: Noise-compensation in the inference process. a, Estimate of the precession frequency from the set , using average photons collected per step, in peak configuration. In violet, the result adopting the usual likelihood in Eq. 3, capable of handling only Poissonian noise in the data. In blue, results from the modified likelihood Eq. S12, allowing for an extended number of epochs. Shaded areas here represent the median % credible interval provided intrinsically by MFL at each epoch, averaged over 1000 runs. b, Estimates for , and uncertainties as a Gaussian fit over the learnt posterior, for the two cases in a, along with some other representative runs from Fig. S2, after 150 epochs. The inference process with no model for infidelities in place, and , falls outside of the plotted interval.

Wide range operability of MFL methods

It is known that in Ramsey experiments adaptive choices of time can lead not only to scalings beyond the standard quantum limit, but also to improved dynamic ranges for the sensed magnetic field , up to . Given that MFL is adaptive in the choice of the phase accumulation time , and we have shown that its precision scaling is Heisenberg limited, it follows naturally that MFL also benefits from the high dynamic range already reported by previous experiments.

In the main paper, we already showed the applicability of MFL for cases in the dataset , with µT (see Fig. 2 in the main text). Here we complement this study with an additional case () exhibiting 713 µT. In the case of this dataset, equivalently to , single sequences were collected and averaged from the experimental set-up, so it was reasonable to adopt a majority voting scheme to use the additional information in the data. We stress that such high intensities of tend to make least-squares fit procedures with no initial guess of the parameters fail.

The results in terms of and precision scalings are reported in Fig. S5. We observe that after 250 epochs, the difference in the final uncertainties provided by MFL is µT. It can thus be considered approximately independent of the strength of the magnetic field. The precision scaling is also the same, within error, as that observed for the lower field in .

Figure S5: Sensing high-intensity magnetic fields ( 710 µT) with . a, Scaling in the median uncertainty over 1000 protocol runs performed each on a random Ramsey set, among those available in the ensemble of Supplementary Table 1. b, Precision scaling for MFL (in green), calculated over the same ensemble, compared with previous approaches (purple, blue) and the Heisenberg limited scaling (black). The error associated to the scaling is estimated via a bootstrap technique. The results from single runs are reported as a density plot in green. All offsets for clarity. (See for comparison and further details Fig. 2a&b in the main text.)


In the main text, we tested the tracking capabilities of the MFL protocol against experimental data. In Fig. 4 we reported the results in the case where the magnetic field intensity is synthetically altered stepwise, at random times, in a fashion completely equivalent to a stochastic time-dependent Poisson process. Such random, abrupt variations in the magnetic field might for example reproduce application scenarios such as the raster scanning of a surface embedding magnetic nanoparticles. A sketch of a possible experimental set-up is provided in Fig. S6a. The modifications to the standard CLE inference process required by this particularly demanding tracking scenario are summarised as pseudocode in Algorithm 1. These modifications amount to detecting changes in the sensed parameter that completely invalidate the current posterior, and thus suggest a reset of the prior as the most effective update step. Without triggering such reset events, large stepwise changes would otherwise require a long time for MFL to react, because of the little support provided by the prior to the new value.

In Fig. S6b, we show a simulated performance of MFL in a representative run with time-varying . The figure exemplifies the decrease in the rate of failure events as the frequency of the oscillating signal is decreased with time. We loosely define as failure events all those at which the quadratic loss of a single run , the mean performance achievable by the protocol, estimated across independent runs. We modify the magnetic field synthetically in the simulations as with , equivalently to Fig. 4c of the main text, but in this case we chirp the oscillating frequency for each run, and thus , with and constants. We notice that the points where the second derivative of the oscillating magnetic signal is highest are those where failure events tend to occur.

Finally, in Fig. S6c, we display the performance expected for MFL when tracking a Brownian-like varying signal. Here the ‘true’ signal is numerically simulated according to an Ornstein-Uhlenbeck process, similarly to the theoretical analysis in Bonato and Berry [2017].
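For reference, a signal of this kind can be generated with a few lines of code. The sketch below uses a simple Euler-Maruyama discretisation of an Ornstein-Uhlenbeck process; the discretisation scheme, the function name and all parameter values are assumptions made for illustration and are not taken from the simulations reported here.

```python
import math
import random

def simulate_ou(x0, mean, theta, sigma, dt, n_steps, rng):
    """Euler-Maruyama path of dx = theta * (mean - x) dt + sigma dW."""
    path = [x0]
    for _ in range(n_steps):
        x = path[-1]
        drift = theta * (mean - x) * dt                      # pull towards the mean
        noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)  # Wiener increment
        path.append(x + drift + noise)
    return path

# Hypothetical field trace in arbitrary units: relaxes towards 1.0 while fluctuating.
field_trace = simulate_ou(x0=0.0, mean=1.0, theta=0.5, sigma=0.2,
                          dt=0.01, n_steps=1000, rng=random.Random(42))
```

The relaxation rate `theta` sets how quickly the signal forgets its past, and `sigma` sets the size of the fluctuations; a tracking protocol has to update faster than the former to follow the signal.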

Figure S6: Magnetic field on-line tracking via MFL. a, Pictorial representation of possible applications of a tracking protocol, where an NV centre positioned at the end of a scanning microscope is used to scan the magnetic field in the proximity of a molecule adsorbed on a substrate. b, Simulation of MFL capabilities to track a chirped sinusoidal signal, with no experimental overhead and only Poissonian noise present in the data (i.e. high-fidelity readout). The frequency is linearly increased after each update step. c, Performance, averaged over 1000 independent runs, of CLE tracking a magnetic field undergoing an Ornstein-Uhlenbeck stochastic process. In b&c, shaded areas correspond to the usual % percentile credible range adopted in this paper.
Input: An initial prior distribution over models.
Input (additional): a rate parameter estimating how many occur before Resample is called
Input (additional): parameter adjusting the frequency of posterior-reset events
function EstimateAdaptive(n, , N, (the resampling parameter), (the ), Optimize, Util, n_guesses, GuessExperiment):
    draw each independently from
    for do:
        if :    // if the effective sample size is below the threshold
            if OR :    // resample as usual
                Resample( )
                store last resampling event
            else:    // reset the procedure
                draw each independently from
                store last reset event
                continue from
            end if
        end if
        Mean    // append the new estimate from the mean
    end for
end function
Output: , storing the instantaneous values of the unknown parameter
Algorithm 1 MFL algorithm with stepwise change detection
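The control flow of Algorithm 1 can be sketched in runnable form. The code below is an illustration under explicit assumptions, not a transcription of the algorithm: it uses a single parameter in arbitrary units, a noiseless likelihood P(0 | ω; τ) = cos²(ωτ/2), a uniform initial prior, and a simple reset criterion (redraw from the prior when resampling has been triggered in more than `max_failed_resamples` consecutive epochs). The names `track_with_reset` and `make_sensor`, and all numerical values, are hypothetical.

```python
import math
import random

def likelihood(d, omega, tau):
    p0 = math.cos(omega * tau / 2.0) ** 2          # assumed noiseless model
    return p0 if d == 0 else 1.0 - p0

def track_with_reset(measure, n_epochs, n_particles=300, omega_max=1.0,
                     ess_fraction=0.5, a=0.98, max_failed_resamples=3, seed=0):
    rng = random.Random(seed)
    uniform = [1.0 / n_particles] * n_particles
    def draw_prior():
        return [rng.uniform(0.0, omega_max) for _ in range(n_particles)]
    particles, weights = draw_prior(), list(uniform)
    failed, estimates, resets = 0, [], []
    for epoch in range(n_epochs):
        x1, x2 = rng.choices(particles, weights=weights, k=2)
        tau = 1.0 / max(abs(x1 - x2), 1e-9)        # particle guess heuristic
        d = measure(tau, epoch)
        weights = [w * likelihood(d, x, tau) for x, w in zip(particles, weights)]
        norm = sum(weights)
        if norm > 0.0:
            weights = [w / norm for w in weights]
            ess = 1.0 / sum(w * w for w in weights)
        else:
            ess = 0.0                               # datum excluded by every particle
        if ess < ess_fraction * n_particles:
            if norm > 0.0 and failed < max_failed_resamples:
                # Liu-West resampling, as in the standard procedure
                mean = sum(w * x for x, w in zip(particles, weights))
                var = sum(w * (x - mean) ** 2 for x, w in zip(particles, weights))
                chosen = rng.choices(particles, weights=weights, k=n_particles)
                width = math.sqrt((1.0 - a * a) * var)
                particles = [rng.gauss(a * x + (1.0 - a) * mean, width) for x in chosen]
                failed += 1
            else:
                particles = draw_prior()            # reset: posterior deemed invalid
                resets.append(epoch)
                failed = 0
            weights = list(uniform)
        else:
            failed = 0
        estimates.append(sum(w * x for x, w in zip(particles, weights)))
    return estimates, resets

def make_sensor(seed=1):
    """Simulated sensor whose true parameter jumps from 0.3 to 0.7 at epoch 100."""
    rng = random.Random(seed)
    def measure(tau, epoch):
        omega = 0.3 if epoch < 100 else 0.7
        return 0 if rng.random() < likelihood(0, omega, tau) else 1
    return measure

estimates, resets = track_with_reset(make_sensor(), n_epochs=200)
```

The reset criterion is the main design choice: a stricter one avoids spurious resets during normal learning, while a looser one shortens the recovery time after an abrupt change.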