Optimal fits of diffusion constants from single time data points of Brownian trajectories

# Optimal fits of diffusion constants from single time data points of Brownian trajectories

Denis Boyer Instituto de Física, Universidad Nacional Autónoma de México, D.F. 04510, México    David S. Dean Université de Bordeaux and CNRS, Laboratoire Ondes et Matière d’Aquitaine (LOMA), UMR 5798, F-33400 Talence, France    Carlos Mejía-Monasterio Laboratory of Physical Properties, Technical University of Madrid, Av. Complutense s/n 28040, Madrid, Spain Department of Mathematics and Statistics, P.O. Box 68 FIN-00014, Helsinki, Finland    Gleb Oshanin Laboratoire de Physique Théorique de la Matière Condensée (UMR CNRS 7600), Université Pierre et Marie Curie, 4 place Jussieu, 75252 Paris Cedex 5 France
July 13, 2019
###### Abstract

Experimental methods based on single particle tracking (SPT) are being increasingly employed in the physical and biological sciences, where nanoscale objects are visualized with high temporal and spatial resolution. SPT can probe interactions between a particle and its environment but the price to be paid is the absence of ensemble averaging and a consequent lack of statistics. Here we address the benchmark question of how to accurately extract the diffusion constant of one single Brownian trajectory. We analyze a class of estimators based on weighted functionals of the square displacement. For a certain choice of the weight function these functionals provide the true ensemble averaged diffusion coefficient, with a precision that increases with the trajectory resolution.

###### pacs:
05.40.Jc, 31.15.xk, 87.16.dp, 61.43.Er

Single particle tracking (SPT) generates the time series of the position of an individual particle trajectory in a medium (see, e.g., bra (); saxton ()). Properly interpreted, the information so obtained provides an insight into the mechanisms and forces that drive or constrain the motion of the particle moerner (). Nowadays single particle tracking is extensively used to characterize the microscopic rheological properties of complex media mason () and to probe the active motion of biomolecular motors greenleaf (). In biological cells and complex fluids, SPT methods have become instrumental in demonstrating deviations from standard Brownian motion (BM) golding (); weber (); bronstein (); seisenberger (); weigel ().

The reliability of the information drawn from SPT analysis is not always clear: data is obtained at high temporal and spatial resolution but at the expense of statistical sample size. Time averaged quantities associated with a given trajectory are subject to large fluctuations across trajectories. For a wide class of anomalous diffusion problems, for instance, time-averages of certain particle’s observables are, by their very nature, random variables distinct from their ensemble averages rebenshtok (); ralf (); he (); lubelski ().

Even though standard BM is much better understood than anomalous diffusion processes, averaging problems persist and complicate the analysis of single trajectories. Moreover, in bounded systems, substantial manifestations of sample-to-sample fluctuations occur in first passage time phenomena carlos (). Standard fitting procedures applied to a finite Brownian trajectory unavoidably lead to fluctuating estimates of the diffusion coefficient, due to different thermal histories, particle interactions with different defects, or simply due to blur and localization errors, as discussed in berglund (); michalet (); mb (). In fact, might be very different from the true ensemble average value , as noticed in SPT measurements of diffusion along DNA austin (), in the plasma membrane saxton () or in the cytoplasm of mammalian cells goulian ().

The broad dispersion of estimate values extracted from common SPT analysis raises an important question: Does an optimal methodology able to determine the diffusion coefficient from just one single trajectory exist? Clearly, it is highly desirable to have an estimator of this kind even for hypothetical pure cases, such as the unconstrained standard BM with perfectly known location at a given time. Such an estimator should possess an ergodic property, i.e., its most probable value should converge to the ensemble average and its variance should vanish as the observation time increases. Besides, the knowledge of the distribution of a family of estimators could provide a way to disentangle the effects of the medium complexity or localization errors from variations due to the thermal noise driving microscopic diffusion.

In this paper we study the ergodic properties of weighted one-time fits to a BM trajectory. We focus on a family of weighted least-squares estimators () of the diffusion coefficient of standard -dimensional BM, given by the following quadratic functionals of a trajectory :

 uμ=AμT∫T0dtω(t)B2t, (1)

where , is some “trial” weight function of the form:

 ω(t)=1(t0+t)μ, (2)

being a tunable exponent, - the observation time, - a certain lag time () and - the normalization constant. The term “least-squares” and the choice of the weight function will be made clear below.

Here we evaluate the distribution for arbitrary and spatial dimension . To easily compare the accuracy of estimators with different values of , we chose such that , where the symbol denotes the ensemble average. Hence, with being the diffusion constant,

 D=E{B2t}2dt, (3)

and - its estimate from . The best choice of should produce whose maximum is the closest to the ensemble averaged value and have the smallest variance . Ultimately, we seek the choice at which is ergodic, i. e. , independently of as .

Before we proceed, two remarks are in order. First, note that corresponds to the standard least square estimate (LSE) of the square displacement saxton (); berglund (); goulian (); saxton2 (). The case arises when the unconditional probability of observing the whole trajectory is maximized (assuming that it is Brownian). It is the so-called maximum likelihood estimate (MLE), known to be more accurate than the LSE berglund (); michalet (); mb (); boyer (); boyer1 ().

We next give a physical interpretation of the estimators in Eq. (1). Consider a least squares functional:

 F=12∫T0ω(t)dtt(B2t−2dDft)2, (4)

which we generalize by adding a time dependent weight function (the standard choice -LSE- is ). The value of that minimizes is:

 DfD=(1T∫T0dtω(t)B2t)/(2dDT∫T0dttω(t)) (5)

One recovers Eq. (1) by choosing the weight function and identifying the denominator with . Hence minimizes a functional (4) and can be referred to as a weighted least-squares estimator.

The moment generating function of the random variable , Eq. (1), is defined as

 Φ(σ)=E{e−σuμ}. (6)

This function can be calculated using the Feynman-Kac formula (see Refs.boyer (); boyer1 () for more details). For , we find that to leading order in

 Φ(σ)=⎡⎣Γ(ν)(σχ1)ν−12I1−ν(2√σχ1)⎤⎦−d/2, (7)

for , while for it obeys

 (8)

where , , and is the modified Bessel function abramowitz ().

The variance of is obtained by differentiating Eqs. (7) or (8) twice with respect to . For arbitrary it is then given explicitly by

 Var(uμ)=2d{(2−μ)/(3−μ),μ<2,(μ−2)/(2μ−3),μ>2. (9)

The consequence of the latter equation is shown in Fig. 1. Unexpectedly, the variance can be made arbitrarily small at leading order in by taking gradually closer to , either from above or from below! The slopes at and appear to be the same, so that the accuracy of the estimator will be the same for approaching from above or below.

A word of caution is now in order. Finite- corrections to the result in Eq. (9) are of order of for . Therefore the asymptotic behavior above can be only attained when . In other words, the variance can be made arbitrarily small by choosing closer to , but only at the expense of increasing the experimental resolution ( or ).

To confirm our analytical results we simulated random walks on a lattice and computed using Eq. (1) from a large ensemble of trajectories, for different values of and different resolution . For or , the variance computed numerically is well described by Eq. (9) and is independent of (Fig. 1). Near , corrections due to the finite resolution are noticeable, but the numerics clearly show that the variance of the distribution decreases as .

The large- and small- asymptotics of can be deduced directly from Eqs. (7) and (8). For and , shows a singular behavior:

 P(uμ)∼exp(−d24χ1uμ)1uζμ,ζ=32+d4μ|2−μ|. (10)

The asymptotic behavior for can be obtained from Eq. (10) by simply replacing . Note that Eq. (10) describes a bell-shaped function with a maximal value when from above or below for arbitrary . Next, for and , we find

 P(uμ)∼ud/2−1μexp(−χ1γ21−ν,14uμ), (11)

where is the 1st zero of the Bessel function abramowitz (). Results for follow from Eq. (11) via the replacements and . As gradually approaches , the distribution becomes increasingly narrow: the left tails vanish because of the divergence of the factor in the exponential, while the right tails vanish because and diverge.

The distributions , obtained by inverting Eqs. (7) and (8), are plotted in Fig. 2. Indeed, the maximal value when either from above or from below. Already for (or ) we get the most probable value , which outperforms the LSE () and the MLE (). For the variance , which is an order of magnitude less than the variances observed for LSE () and the MLE (). Similarly to Fig. 1, finite-resolution corrections are negligible for , and is well described by Eq. (7). For and finite resolution , we obtain a broader distribution and with a smaller than the corresponding to Eq. (7) for infinite resolution. However, note that the most probable value of that we obtain at finite resolution is , which outperforms the LSE and MLE for infinite resolution.

We turn next to the case with small but finite, seeking the variance and the distribution of . We consider a slightly more general form for :

 ω(t)={2ξ/t20,for t

where is a tunable amplitude. For such a choice, the moment generating function is given explicitly by

 Φ(σ)=(2δε(δ−1)/2ϕ+)d/2[1+ϕ−ϕ+εδ]−d/2, (13)
 ϕ±=(δ±1)(ch(√2γξσ)±δ∓12√2γξσsh(√2γξσ)),

where and . Differentiating Eq. (13), we find

 Var(u2)=43d3ln(1/ε)−3(1−ε)+2(1−ε)ξ+ξ2(ξ+ln(1/ε))2. (14)

It is a non-monotonic function of with a minimum at

 ξ=ξopt=(2+ε)ln(1/ε)−3(1−ε)ln(1/ε)+ε−1. (15)

The corresponding optimized variance is given by:

 Varopt(u2)=43d3ln(1/ε)−4+5ε−ε2ln(1/ε)(ln(1/ε)+1+2ε)−3(1−ε). (16)

From Eq. (16) we find that in for , respectively. When , vanishes as

 Varopt(u2)∼4d1ln(1/ε). (17)

Therefore, can be made arbitrarily small but at expense of a progressively higher resolution. In the limit the distribution converges to a delta-function. The estimators with are the only, in the family defined by Eq. (1), that possess an ergodic property. This is shown in Fig. 3 where we plot obtained by numerically inverting Eq. (13) for different resolutions. The symbols in this figure correspond to numerical simulations using the weight function of Eq. (12).

Finally let us consider the case of BM recorded at discrete time steps , , (), which is an important in its own right problem but also will allow us to justify the choice of the weight function in Eq. (2). We focus on the estimator of a general form

 ~u=12dΔN∑j=1ωjB2Δ⋅j/N∑j=1jωj, (18)

where now is an arbitrary weight function. The culminating point of our analysis is to determine, via a variational approach, the function which yields the lowest possible variance , given from Eq. (18) by

 Var(~u)=2dN∑j=1ωjN∑k=1ωkmin(k,j)2/(N∑j=1jωj)2 (19)

where is the minimum of and . Minimizing

 ~F=12N∑j=1ωjN∑k=1ωkmin(k,j)2−λ(N∑j=1jωj−1), (20)

with respect to each ( is a Lagrange multiplier enforcing the constraint ), we find that the optimal weight obeys

 N∑j=1ωjmin(k,j)2=λ⋅k,k=1,…,N (21)

which can be solved exactly to give

 λ=N(N∑k=1k4k2−1)−1 (22)

and

 ωj=2λ4j2−1=⎛⎜⎝N∑Nk=1k4k2−1⎞⎟⎠14j2−1. (23)

The optimal variance in this case reads

 (24)

Therefore, the weight function in Eq. (23) minimizes the discretized least squares functional in Eq. (20) and produces an ergodic estimator: the smallest possible variance (for the class of estimators defined by Eq. (18)) vanishes as . Choosing some initial time lag and turning to the limit and , while keeping fixed, the weight function in Eq. (23) converges to the form in Eq. (2) with , which thus justifies our choice of the power-law trial weight function for continuous-time Brownian motion. Note that for , the leading asymptotic behavior of the variance in Eq. (24) coincides with Eq. (17), but produces slightly higher values of the variance (as the former estimator is based on an everywhere discrete process).

To conclude, we have analyzed the ergodic properties and the asymptotic behavior of a family of least-squares estimators in Eq. (1). We have demonstrated that the estimators with are the only that possess an ergodic property, i.e., they can provide the true ensemble averaged diffusion coefficient from a single trajectory data with any necessary precision, but at expense of a progressively higher experimental resolution. Convergence to the true ensemble average value appears, however, to be rather slow: the variance of such an optimal estimator vanishes only in proportion to . This means that for practical purposes the methods based on two-time correlation functions can provide better estimators, because the variance of the corresponding estimator decays faster, as , even in the presence of localisation errors berglund (); michalet (); mb ().

###### Acknowledgements.
We thank Gregory Putzel for valuable comments. The authors acknowledge partial support from the European Science Foundation through the Research Network ”Exploring the Physics of Small Devices”. CMM is supported by the European Research Council and the Academy of Finland.

## References

• (1) C. Bräuchle, D. C. Lamb, and J. Michaelis, Eds., Single particle tracking and single molecule energy transfer (Wiley-VCH, Weinheim, 2010).
• (2) M. J. Saxton and K. Jacobson, Ann. Rev. Biophys. Biomol. Struct. 26, 373 (1997).
• (3) W. E. Moerner, Proc. Natl. Acad. Sci. USA 104, 12596 (2007).
• (4) T. G. Mason and D. A. Weitz, Phys. Rev. Lett. 74, 1250 (1995).
• (5) W. J. Greenleaf, M. T. Woodside, and S. M. Block, Annu. Rev. Biophys. Biomol. Struct. 36, 171 (2007).
• (6) I. Golding and E. C. Cox, Phys. Rev. Lett. 96, 098102 (2006).
• (7) S. C. Weber, A. J. Spakowitz, and J. A. Theriot, Phys. Rev. Lett. 104, 238102 (2010).
• (8) I. Bronstein et al., Phys. Rev. Lett. 103, 018102 (2009).
• (9) G. Seisenberger et al., Science 294, 1929 (2001).
• (10) A. V. Weigel, B. Simon, M. M. Tamkun and D. Krapf, Proc. Natl. Acad. Sci. USA 108, 6438 (2011)
• (11) A. Rebenshtok and E. Barkai, Phys. Rev. Lett. 99, 210601 (2007).
• (12) J. -H. Jeon et al., Phys. Rev. Lett. 106, 048103 (2011).
• (13) Y. He, S. Burov, R. Metzler and E. Barkai, Phys. Rev. Lett. 101, 058101 (2008).
• (14) A. Lubelski, I. M. Sokolov, and J. Klafter, Phys. Rev. Lett. 100, 250602 (2008).
• (15) C. Mejía-Monasterio, G. Oshanin, and G. Schehr, J. Stat. Mech. P06022 (2011).
• (16) A. J. Berglund, Phys. Rev. E 82, 011917 (2010).
• (17) X. Michalet, Phys. Rev. E 82, 041914 (2010); 83, 059904 (2011)
• (18) X. Michalet and A. J. Berglund, Phys. Rev. E 85, 061916 (2012)
• (19) Y. M. Wang, R. H. Austin and E. C. Cox, Phys. Rev. Lett. 97, 048302 (2006).
• (20) M. Goulian and S. M. Simon, Biophys. J. 79, 2188 (2000).
• (21) M. J. Saxton, Biophys. J. 72, 1744 (1997).
• (22) D. Boyer and D. S. Dean, J. Phys. A: Math. Gen. 44, 335003 (2011).
• (23) D. Boyer, D. S. Dean, C. Mejía-Monasterio and G. Oshanin, Phys. Rev. E 85, 031136 (2012).
• (24) M. Abramowitz and I. R. Stegun, Eds., Handbook of mathematical functions (Dover, New York, 1972).
• (25) D. Boyer, D. S. Dean, C. Mejía-Monasterio and G. Oshanin, unpublished.
You are adding the first comment!
How to quickly get a good reply:
• Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
• Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
• Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters