A The Persistence length

# Cavity approach for modeling and fitting polymer stretching

## Abstract

The mechanical properties of molecules are today captured by single molecule manipulation experiments, so that polymer features are tested at a nanometric scale. Yet devising mathematical models to get further insight beyond the commonly studied force–elongation relation is typically hard. Here we draw from techniques developed in the context of disordered systems to solve models for single and double–stranded DNA stretching in the limit of a long polymeric chain. Since we directly derive the marginals for the molecule local orientation, our approach allows us to readily calculate the experimental elongation as well as other observables at wish. As an example, we evaluate the correlation length as a function of the stretching force. Furthermore, we are able to fit successfully our solution to real experimental data. Although the model is admittedly phenomenological, our findings are very sound. For single–stranded DNA our solution yields the correct (monomer) scale and, yet more importantly, the right persistence length of the molecule. In the double–stranded case, our model reproduces the well-known overstretching transition and correctly captures the ratio between native DNA and overstretched DNA. Also in this case the model yields a persistence length in good agreement with consensus, and it gives interesting insights into the bending stiffness of the native and overstretched molecule, respectively.

###### pacs:
87.15.âv, 05.65.+b, 75.10.Nr, 89.75.Da

## I Introduction

The development of techniques such as atomic force spectroscopy Binnig and Rohrer (1987); Binnig et al. (1986), magnetic tweezers Gosse and Croquette (2002), or laser optical tweezers Ashkin (1970a, b) allows us today to manipulate single molecules in the laboratory. These experiments are aimed at unveiling the mechanical properties that are key to the biological processes such molecules are involved with. Understanding the energy scales involved in pulling, bending or twisting a molecule, or the structural conformation this assumes, is fundamental to fully appreciate and possibly control processes that such as DNA replication or RNA translation. In this work we focus in particular on DNA stretching experiments, which have been carried out since the early 1990s Smith et al. (1992); Bustamante et al. (1994) and that allowed us, in these two decades, to understand several properties of this molecule. Single DNA manipulation revealed, for instance, that DNA shows an elastic response to small pulling forces and a resistance to bending Hagerman (1988); Bustamante et al. (1994), implying that DNA develops a non negligible correlation along its molecular chain. Furthermore, stretching experiments allowed to uncover the overstretching transition of double–stranded DNA (dsDNA), which consists of a 1.7–fold sudden elongation of the B-DNA (i.e. the DNA in its well-known helical configuration) when it is stretched by forces of around 65 pN Smith et al. (1996); Cluzel et al. (1996); Williams et al. (2002); van Mameren et al. (2009); Lebrun and Lavery (1996); Cocco et al. (2004); Whitelam et al. (2010); Fu et al. (2010).

To go beyond these results and understand them on a more quantitative basis, it is necessary to complement the experiments with some theoretical models. Such models should be, in principle, simple enough to be easily fit to the experimental data in a laboratory, but yet realistic enough to give insights into the actual mechanisms behind the observations. Among the main models in the literature, the freely jointed chain (FJC) Flory (1969) is too simplistic and only works in a limited range of forces Smith et al. (1992); Marko and Siggia (1995), while the worm like chain (WLC) is not analytically solvable Fixman and Kovac (1973); Bustamante et al. (1994); Marko and Siggia (1995); Samuel and Sinha (2002). Additionally, the FJC does account for the monomer–monomer interaction that makes the molecule stiff, while the WLC completely discards the discreteness one expects to observe in real molecules. The Kratky-Porod (KP) model Kratky and Porod (1949), which has been lately revisited by some more recent works, tries to capture these two aspects together Storm and Nelson (2003); Chakrabarti and Levine (2005, 2006); Rahi et al. (2008). These models are in turn generally solved with transfer matrices, which are diagonalised by some (variational) eigenvector; an approach that is also adopted to solve the WLC model Marko and Siggia (1995); Bouchiat et al. (1999). Such an eigenvector, however, has no clear interpretation in terms of the physics of the stretching process Storm and Nelson (2003); Rahi et al. (2008); Chakrabarti and Levine (2006), so that gaining insight beyond the usual force–elongation relations is rather involved.

Here we propose a full solution of a KP–like model that relies on the so–called cavity method Mezard and Parisi (2001), which, conveniently, is exact on a chain system. Such method is a common technique in the field of disordered systems Mulet et al. (2002); Mézard et al. (2002) and has already proved useful in tackling other types of biological problems Braunstein et al. (2008); Weigt et al. (2009); Massucci et al. (2013). The equations we derive only assume the polymer to be sufficiently long and the solution we propose is thus as correct as a full numerical diagonalisation of the transfer matrix. The method we present, though, can give more insight into the processes happening locally on the molecule, because the set of equations that lie at the core of our approach allows us to compute, with no extra cost, several observables and molecular properties (such as the correlation length for varying force) when fitting the model to the experimental data. Finally, because we avoid choosing any arbitrary analytical form to carry out the solution, the quantitative outcome of our fits may not be biased by such a choice.

The remainder of this article is thus organised as follows: in the next section we introduce the model and the method adopted to solve it in the case of single–stranded DNA (ssDNA) stretching. In Sec. III we generalise the discussion to dsDNA and its overstretching transition. In both cases we successfully fit our analytical solution to experimental measurements and uncover insightful results.

## Ii Single–stranded DNA

The KP model describes a sequence of interacting segments of length stretched by an external force . It can be expressed by the Hamiltonian:

 −H(^t;\overarrow@\arrowfill@\leavevmode\nobreak \leavevmode\nobreak \raisebox−5.5pt[1.0pt][1.0pt]$\mathchar0"017E$f)=JBN−1∑i=1^ti⋅^ti+1+bBN∑i=1^ti⋅\overarrow@\arrowfill@\leavevmode\nobreak \leavevmode\nobreak \raisebox−5.5pt[1.0pt][1.0pt]$\mathchar0"017E$f\leavevmode\nobreak . (1)

Here is the orientation of monomer , , and is a ferromagnetic coupling to favour local alignment of the monomers. Such coupling has the dimensions of an energy and fixes the polymer rigidness: the larger , the stiffer the molecule. In particular, bigger values of yield a greater molecular persistence length , given by (Appendix A):

 ξ0=−bBlogL(βJB), (2)

where is the Langevin function. For fixed and , the persistence length is a monotonically increasing function of . When grows, interactions are stronger and ‘information’ propagates more easily.

The free energy per monomer is as usual given by taking the logarithm of the partition function :

 F=−1NβlogZ\leavevmode\nobreak ,Z=∫SNd^t\leavevmode\nobreak e−βH(^t;\overarrow@\arrowfill@\leavevmode\nobreak \leavevmode\nobreak \raisebox−5.5pt[1.0pt][1.0pt]$\mathchar0"017E$f)\leavevmode\nobreak , (3)

where denotes the integration over the 3D unit sphere. The elongation per monomer can be derived by differentiating with respect to the force. In doing so it is convenient to choose a reference frame with the axis along the force direction and to study the elongation along :

 L(f):=−∂F∂f=bBZNN∑i=1∫Sd^t\leavevmode\nobreak (e−βH^ti⋅^z). (4)

The elongation at small and large stretching force can be computed by expanding in and , respectively (see Fig 1). One finds that at small the elongation goes linearly with the stretching force as (Appendix B):

 L(f)=βb2Bζ0f, (5)

with given by Eq. (26). Formula (5) reproduces the known elastic properties of DNA at small force Bustamante et al. (1994) and correctly yields the FJC and the WLC results Flory (1969); Marko and Siggia (1995) when performing the and limits, respectively (Appendix B). Indeed, normalising the elongation as , for vanishing Eq. (5) gives the small force FJC limit Flory (1969). Conversely, when , Eq. (5) yields the WLC elongation , with being the persistence length, Eq. (2) Marko and Siggia (1995). Note that the limit is in practice achieved as long as , so that for sufficiently small forces (or ) the model reproduces the WLC elongation (more on this in Sec. II.2). When is large, the elongation saturates to (see Appendix C) and its expansion, reported in Eq. (34), reproduces the FJC elongation by evaluating the limit.

For the intermediate range of forces, the model can be solved in a few different ways: one could use for instance transfer matrices Storm and Nelson (2003); Chakrabarti and Levine (2005), noticing that the partition function can be represented as the trace over the –th power of an integral operator, or numerically, by using a Monte Carlo method to minimize the energy Chakrabarti and Levine (2006). Nevertheless, both approaches involve some approximation –analytical, in the first case, and stochastic, in the second– of the full analytical solution and make the fitting of the model to real measurements (which after all is the main point of building a model of this sort) unnecessarily cumbersome. We thus choose to follow another path: we borrow a standard technique from the field of disordered systems, i.e. the cavity method Mezard and Parisi (2001), and solve the model with the sole assumption of a large polymer limit 1.

### ii.1 Solution with the cavity method

We start by noticing that the Hamiltonian (1) only involves local terms and that the elongation can be calculated by means of local probability marginals that factorize. To see this, let us focus for instance on site in the bulk of the chain. The orientation of such monomer has the marginal probability density function (pdf)

 Pi(^ti)=1Z∫Sd^t∖ie−βH(^t;f), (6)

which is the trace of the Boltzmann weight over all variables except . Now the Hamiltonian can be rewritten as , where the term does not depend on . can be further divided into two independent terms, one containing site and its side of the chain, and the second including and the rest. This trick allows to integrate over all variables except and to rewrite Eq. (6) in the factorized form:

 Pi(^ti)=eβfbB^ti⋅^zZi∏j=i±1∫Sd^tj\leavevmode\nobreak eβJB^ti⋅^tjP(i)j(^tj), (7)

where is the marginal pdf of site in the system (the so called cavity marginal) and is a normalising constant (different from ). Note that is just the system where has been removed. As mentioned, in such system, the two sides of the chain at the left and the right of are independent and removing does not affect site . One can now define by connecting back site to and removing , so that the pdf of site reads:

 P(i+1)i(^ti)=eβfbB^ti⋅^zZ(i+1)i∫Sd^ti−1\leavevmode\nobreak eJB^ti⋅^ti−1P(i)i−1(^ti−1), (8)

with being a normalising constant. Equation (8) allows thus to mutually relate the cavity marginals and can be solved iteratively, for each pair of neighboring sites. The fixed point solution of Eq. (8) can be in turn plugged into Eq. (7) to compute the actual marginal of site .

The crucial points of the method are that, since the system is a chain, the factorization performed in Eq. (7) is exact and that the same cavity marginals appear in the integral of both Eqs. (7) and (8).

Note that so far no approximation has been made: in principle, we can choose a suitable chain size , fix appropriate boundary conditions and solve the resulting 2 equations for the cavity marginals iteratively. This would yield to single site marginals that are, by all means, exact. Nevertheless, if we are interested in the properties of the polymer’s bulk, it is possible to simplify the system by neglecting boundary effects. We can assume that the molecule is sufficiently long and that, because of homogeneity, each cavity marginal may be written as:

 Pcav(^t)=1ZcaveβfbB^t⋅^z∫Sd^s\leavevmode\nobreak eβJB^ti⋅^sPcav(^s)\leavevmode\nobreak , (9)

with being a normalisation constant. We thus choose to solve Eq. (9) iteratively and use its fixed point solution to evaluate the physical local marginals in the polymer’s bulk as:

 P(^t)=1ZeβfbB^t⋅^z[∫Sd^s\leavevmode\nobreak eβJB^ti⋅^sPcav(^s)]2\leavevmode\nobreak . (10)

Note that it is also possible to express the free energy of the system in terms of the cavity marginals Mézard and Parisi (2003), so that all we need to do to solve the model is to compute the fixed point solution of Eq. (9). With a reasonable choice of the integration routine, such fixed point is reached very quickly, so that the method is also convenient because of its reduced computational cost.

The present approach has another interesting feature, since it allows to easily compute averages of single– and two–site observables. For instance the elongation per site of the molecule is simply equal to

 L(f)=bB∫Sd^t\leavevmode\nobreak (P(^t)^z⋅^t), (11)

which gives the average of the projection of the molecule along the force axis. Two–site averages are instead computed by iterating the equations for the derivatives of the cavity marginals. As an example, we can evaluate the correlation length for any value of . To do so, let us first recall that the susceptibility at fixed force is equivalently given by differentiating twice the free energy with respect to or by summing the correlation function over the distance :

 ζf=−1β∂2F∂f2≡bBf∂L∂bB,ζf=ζ(0)+2∞∑r=1ζ(r), (12)

respectively. Now, the first relation can be related to the derivative (see Sec. II.2 and Appendix D for more details), while the second reduces to a geometric series when assuming that the correlation decays exponentially as . The constant is the connected (self) correlation at fixed . By expanding the geometric series one has in particular

 ξ(f)=−bBlog[(ζf−ζ(0))/(ζf+ζ(0))]. (13)

Since we can easily evaluate with marginal (10) and by iterating the derivative of the cavity marginal, we are able to get for any value of at practically no extra cost.

As an illustrative example to validate the method, we can compute some of these quantities and check their values against some numerical simulation. The model described by (1) can indeed be easily simulated with a Monte Carlo Heat Bath algorithm Miyatake et al. (1986), so that the accuracy of our calculations can be readily tested. In Fig. 1, we compare our analytical force–elongation curve, Eq. (11), with the outcome of the numerical simulations and the large and small force limits of the theory (Eqs. (5, 34), respectively), for a particular choice of the parameters and . The cavity solution agrees extremely well with both the numerical simulations and the two limits, which are computed without relying on the cavity formalism. In the inset of Fig. 1 we show, furthermore, the correlation length at fixed force computed with our method and we compare it with the correlation length evaluated with the numerical simulations. Again, the two results are seen to overlap extremely well and to reproduce, for , the correct persistence length at null force given by Eq. (2). These results show that the cavity approach can be successfully applied to solve the KP model. Hence, once the properties of our solution have been characterized, we can move forward in illustrating how the cavity approach comes particularly convenient when fitting the theory to the experimental results.

### ii.2 Fit to the experiments

Our aim is now to find the parameter values that best fit Eq. (11) to the experimental elongation. The fit value of and may be used, in turn, to compute other observables such as, for instance, the persistence length, given by Eq. (2), and the correlation length at fixed force, Eq. (13). Given the experimentally measured force–elongation points , the best guess for the parameter vector is thus found by least squares minimising the cost function

 χ2=N∑κ=1(L(exp)κ−L(f(exp)κ;μ))2, (14)

where is the value of Eq. (11) at force for a fixed parameter vector . Equation (14) measures in practice the distance between the elongation predicted by our model, Eq. (11), and the experimental elongation 2. The analytical minimisation of is achieved by computing the gradient , which ultimately depends on the derivatives of with respect to the parameters of the model. Interestingly, such derivatives turn out to be correlation functions that may be well studied within the cavity method (see Appendix D). As a consequence, solving the minimisation gives additional information on non local processes taking place in the molecule. Direct differentiation of the cavity equations ultimately leads to

 ∂Zc(^t)β∂JB=eβfbB^t⋅^z∫Sd^ueβJB^t⋅^u[^t⋅^uZc(^u)+∂Zc(^u)β∂JB]∂Zc(^t)β∂bB=f^t⋅^zZc(^t)+eβfbB^t⋅^z∫Sd^ueβJB^t⋅^u∂Zc(^u)β∂bB, (15)

where and its cavity counterpart . Indeed

 ∂P∂μi(^t)=1Z∂Z(^t)∂μi−P(^t)1Z∂Z∂μi\leavevmode\nobreak \leavevmode\nobreak \leavevmode\nobreak i=1,2. (16)

The fixed point of Eq. 15 yields and allows to implement the fit.

Therefore, we numerically iterate Eqs. (9), (15) for , , respectively. The fixed point solution is then used to compute . The value of yielding is finally retained as the best fit of the model to the experimental measurements. The key advantage of the approach is that by iterating these few equations, one is able to minimise the and to obtain simultaneously the force–elongation relations Eq. (11), the correlation length Eq. (13), or whichever additional thermodynamic quantity one may be seeking. We stress again that the computational cost to reach the fixed point solution of our equations is very limited: at fixed , convergence is reached on average in 14 iterations. By using Gaussian quadrature integration with 14 weights, which yields a very good estimate of the functions, this is achieved in less then 0.1s on an Intel i7 processor.

We hence go on to fit a stretching experiment performed on a ssDNA chain of 3000 bases in a 10 mM NaCl solution. The interbase distance was estimated to approximately 0.7 nm by x–ray diffraction. The fit, which is plotted in Fig. 2, allows us to find a monomer length 0.64 nm, i.e. a value very close to the actual interbase distance. Such finding suggests that the model is able to capture the real molecular fundamental length. We also have for the coupling constant pN nm. As mentioned in the previous section (and as explained in Appendix A), parameters and can be used to compute the persistence length of the model. Applying formula (2), we find nm, which is also consistent with experimental results Smith et al. (1996); Tinland et al. (1997).

In Fig. 2 we also show a fit of the WLC interpolation formula Marko and Siggia (1995); Hsin et al. (2011) to the experimental points. The resulting is larger than the cavity fit (Fig. 2 caption) and the corresponding persistence length nm is longer than our result, probably due to the approximation made by the interpolation formula in the intermediate range of forces (see, e.g., Fig. 3 in Ref. Marko and Siggia (1995)). Indeed, fitting the variational WLC formula Marko and Siggia (1995), which is a better estimate of the WLC exact solution, yields basically the same curve and roughly the same persistence length of the cavity fit (not shown). This is to be expected, since, as mentioned, our model should reproduce the WLC limit as long as , which is the case for a broad range of the experimental forces. Such findings stress thus the importance of choosing an exact approach against an approximated one. In this sense, we feel that our method is a convenient alternative to the WLC and other approaches in the literature Marko and Siggia (1995); Storm and Nelson (2003), because, in addition to the persistence length and the molecular fundamental unit , it allows us to evaluate other observables at no extra cost. In the inset of Fig. 2 we show for instance the correlation length at any force value ; Eq. (13). This is seen to be a decreasing function of , which correctly equals the persistence length nm at .

In conclusion, we have shown in this section how our method allows one to simultaneously compute different observables by simply fitting the theory to the experimental data. We derived indeed very reasonable estimates of the molecule elongation, of the length of its unit blocks , of the persistence length , and of the correlation length at fixed force , in a straightforward fashion.

## Iii Double–stranded DNA

Motivated by the accurate results obtained with ssDNA stretching, we now generalise the model to the case of dsDNA and to its structural transition. As explained in the introduction, B-DNA abruptly extends 1.7 times its native length when stretched by pN, getting into a state called S-DNA Smith et al. (1996); Cluzel et al. (1996). In the following, we are not making any explicit assumption about the actual nature of such overstretched DNA, but we take into account that it is different, somehow, from the native B-DNA. We thus allow the monomers in Eq. (1) to be into two possible states, similarly to Refs. Storm and Nelson (2003); Rahi et al. (2008) and in line with the spirit of Ref. Fiasconaro and Falo (2012), where a dynamical model of overstretched DNA featuring a double well potential was studied. For the sake of simplicity we will call the overstretched DNA S-DNA, but we do not make any further speculation on its actual conformation.

To proceed in our modeling, let us reconsider the chain of monomers described by Eq. (1), but let us denote the state of each monomer through a set of binary variables such that , . Since every site may now be in either or state, the parameters and must depend on the variables . We allow thus two different monomer lengths and three different interaction terms (we have ). We introduce some further terms based on energetics considerations: as B-DNA alone is found in nature, we make it energetically more stable than S-DNA by adding an effective field which is different from zero at site only if . The parameter can be interpreted as the energy needed to break a B-DNA monomer and transform it into S-DNA, so that increasing it enlarges the force needed to perform the transition. We also include the energetic cost of having a boundary between and regions, through the parameter . Varying this parameter changes instead the steepness of the transition. We note in passing that this same setup can be applied, for instance, to study the breaking of secondary structures (helix–coil transition) in protein stretching experiments Chakrabarti and Levine (2005, 2006).

Taking all this into account, our dsDNA Hamiltonian takes the form:

 H(^t,σ;\overarrow@\arrowfill@\leavevmode\nobreak \leavevmode\nobreak \raisebox−5.5pt[1.0pt][1.0pt]$\mathchar0"017E$f)=−N∑i=1(fbσi^ti⋅^z+γBδσi,B)−N−1∑i=1(Jσi,σi+1^ti⋅^ti+1+ϵBS(1−δσi,σi+1)), (17)

where is a Kronecker delta, equal to one only if . It is known that the fundamental blocs of real DNA are extensible themselves Smith et al. (1992); Marko and Siggia (1995), and that DNA experiences an enthalpic elongation for large enough stretching forces. To capture this feature we also introduce a stretching modulus and rewrite the monomerâs length as , with the Young modulus Wang et al. (1997) and being the native B monomer length at . On the contrary, we assume to be rigid. Considering this, the elongation along the axis may be again derived by differentiating the free energy with respect to the force modulus :

 L(f)=limN→∞1NN∑i=1⟨(δσi,BcB+δσi,SbS)^ti⋅^z⟩. (18)

Here denotes the usual thermal average, while the term is produced by differentiating with respect to .

The elongation and other thermal properties of the system may be again derived comfortably with the cavity method. Similarly to what was done for the ssDNA, one can notice that the Hamiltonian (17) still depends only on local terms, and, as in Eq. (7), one can write the marginal pdf of each site in terms of the factorized pdf of its neighbours. Discarding boundary effects, one is finally able to express the cavity marginals in the bulk of the molecule as:

 Pcav(^t,σ)=Z−1cav\leavevmode\nobreak exp[γBβδσ,B+βfbσ^t⋅^z]×∑τ∈{B,S}eβεστ∫Sd^ueβJστ^t⋅^uPcav(^u,τ)\leavevmode\nobreak , (19)

where now depends on the vector and on the binary variable . The constant assures normalisation so that . The physical marginal is found by connecting a given site to both its neighbours:

 P(^t,σ)=Z−1exp[γBβδσ,B+βfbσ^t⋅^z]×⎛⎝∑τ∈{B,S}eβεστ∫Sd^ueβJστ^t⋅^uPcav(^u,τ)⎞⎠2, (20)

where is again a normalising constant. Equation (20) above expresses the probability of finding a monomer in either the or the state, oriented along the direction . As a consequence, such marginal can in turn be used to compute the elongation as:

 L(f)=∫Sd^t(^t⋅^z)[cBP(^t,B)+bSP(^t,S)]\leavevmode\nobreak . (21)

Furthermore, it is possible to compute the fraction of and as the marginal probability of being in state at fixed force:

 P(σ)=1Z∫Sd^tP(^t,σ). (22)

The cavity approach is thus again particularly convenient: Eq. (19) may be solved via iteration and its fixed point solution can be used to compute and by expressing in terms of the cavity marginal. Again, we checked the validity of our approach against Heat Bath Monte Carlo simulations, where we added the degrees of freedom. Our solution was found to be in excellent agreement with the numerical results.

The strength of the method, though, lies again in that explicit differentiation of the cavity marginals allows one to easily compute the correlations that arise when fitting the model to experimental measurements, as we will show in the next section.

### iii.1 Fit to the experimental measurements

We proceed now to fit model Eq. (17) to a set of dsDNA stretching measurements. Once again, we fit the theoretical elongation, given by Eq. (21), to an experimentally measured one. The parameters to be fit are in this case , , , , , , and their value is sought again by minimising a cost function of the form of Eq. (14). We achieve such minimisation by equating to zero the gradient of with respect to the vector of parameters . The resulting value of the parameters enables us to quantify in turn several properties of the stretched molecule. In essence, and give an estimate of the B and S–DNA monomer lengths respectively, , and yield the stiffness of the different portions of the molecule, returns the transition free energy, i.e. the cost of breaking a B–DNA segment into a S–DNA, while is related to the energy penalty of having a B–S boundary. Once more, computation yields a series of correlation functions that may be seen as the propagation of perturbations of the the cavity marginals (see the appendix D.2 and E).

Thus, a convenient method to minimise is to compute its derivatives by means of the cavity equations. Let us express again the cavity and the physical marginals (19) and (20) by and , respectively. We have to evaluate the derivatives of , which are given by

 ∂μiZ(^t,σ)=2e−γBβδσ,B−βfbσ^t⋅^zZc(^t,σ)∂μiZc(^t,σ)−Z(^t,σ)∂μi(γBβδσ,B+βfbσ^t⋅^z). (23)

This equation is the key ingredient of the fit; once its fixed point solution is known, one is able to compute the derivatives of , and by setting them to zero, to minimise it. We thus proceed as before and iterate Eqs. (19) and (23) for the cavity marginals and of their derivatives to fixed point. The fixed point solution is used to evaluate , which is used to minimize the cost function. The optimal fixed point solution is finally plugged into (20), (21) to compute the elongation.

Since the model (17) features several parameters, we choose to divide the fitting procedure into two steps: we use the ssDNA elongation (11) to fit the experimental measurements below 50 pN, where no overstretching occurs. We then fix the values of , , and to fit Eq. (21) for larger stretching forces. Our results are summarised in Table 1 and plotted in Fig. 3.

Let us discuss and interpret the fit results. When fitting Eq. (11) below 50 pN, we find nm, i.e. about 10 base pairs. Such value is interestingly close to the dsDNA helix period (3.3 nm), suggesting that, for B–DNA, one can phenomenologically consider a whole helix twist as a monomer. We also find the Young modulus pN, which is considerably larger than the typical results obtained with the FJC and the WLC Storm and Nelson (2003), implying that our model captures the phenomenology of the stretching up to greater values of the force, without relying on . We finally uncover pN nm, which, plugged into Eq. (2), returns a persistence length of about 43 nm, which is again in good agreement with dsDNA typical values Hagerman (1988); Marko and Siggia (1995); Bouchiat et al. (1999). Fixing those parameters and fitting Eq. (21) for forces larger than 50 pN yields the results for the overstretched DNA. We find the unit length to be 5.63 nm, about 1.75 times bigger than the native one, yielding the known ratio between S-DNA and B-DNA total length. Confronting the and values with , we see that the native strand is much stiffer (and so more correlated as well) than the overstretched portion of the molecule. This finding may help to understand which type of overstretched DNA we are actually fitting: indeed, as recently shown Fu et al. (2010); King et al. (2013), different experimental conditions in terms of temperature and salt concentration may lead to different overstretched DNA configurations, which can either be melted or double stranded. In our case, the ratio suggests we are actually fitting a melted DNA, because this is expected to be about 60–fold more flexible than B–DNA. The value pN nm can be used to evaluate the B–S DNA transition free energy per base pair (bp). Since there are about 10 bp per native monomer , normalising over this factor such energy is about , which is again a reasonable estimate of the cost needed to break a B segment into a S–DNA Cocco et al. (2004). In Fig. 3b we also show the molecular fraction of B-DNA and (overstretched) S-DNA as a function of . We find that, in a very small force range around 70 pN, the molecule goes from an entire B-DNA conformation to a completely overstretched state, implying that the overstretching transition is highly cooperative, in agreement with consensus Smith et al. (1996); Cluzel et al. (1996). The cooperativity is modulated by , whose value is enough to confer such sharp and narrow shape to the B–S transition. We acknowledge that the resulting value is smaller than what expected for the boundary free energy Cluzel et al. (1996); Cocco et al. (2004), however such energy shall be evaluated from the transition width (e.g. as outlined in Ref. Rouzina and Bloomfield (2001) for a heteropolymer), rather than from . Additionally, it must be noted that, in a melting transition as it looks to be our case, the actual cooperativity will ultimately be dominated by the heterogeneity of the dsDNA bps stability Rouzina and Bloomfield (2001). These effects are not captured by the parameter and are beyond the scope of the present model, although they may be experimentally relevant. Finally, in Fig. 3c, we plot the correlation length, Eq. (13), as a function of . We find it again to decrease when the stretching force gets larger, starting from the correct persistence length value nm when .

The results obtained suggest that our model is also able to capture some of the main features of a dsDNA stretching experiment. We are able to fit the experimental data in a quite straightforward manner and to get quick conclusions about the mechanical properties of the molecule. It is perhaps useful to stress that the involved mathematical treatment required by, e.g., the transfer matrix method does not allow one to get to the same neat conclusions. The cavity method seems instead a more natural way to solve the problem, thanks to the local quantities it allows to compute, the useful relation between the derivatives of the cavity equations and correlations, and because it yields an exact solution in the case of a long molecule.

## Iv Conclusions

A paradigmatic issue in modeling polymer stretching is the difficulty in devising a realistic enough yet also mathematically treatable theory. The main models in literature, the FJC and the WLC, both lack some characteristics of real stretched molecules. The variational solution to the KP model, conversely, is an approximation that does not allow the simultaneous evaluation of several observables. In this work we presented an alternative approach based on a KP–like model, which describes the molecule as discrete and stiff at the same time. Our main contribution consists in the full and exact analytical treatment of the problem, through the cavity method, which proves to be very practical to compute different thermal observables and to fit the model to a set of experimental measurements. We were indeed able to comfortably fit two different versions of the model to experimental measurements, one of ssDNA stretching and the other for dsDNA, where the overstretching of the molecule was taken into account.

For ssDNA, we obtained a monomer length of 0.64 nm, a size comparable to the actual interbase distance measured by x–ray diffraction (0.7 nm). We also computed a persistence length of 0.78 nm, in fair agreement with consensus. The dsDNA model native monomer length was seen to be 3.22 nm, which is intriguingly close to the dsDNA helix period. This result suggests that our model captures as a single monomer a full helix twist. We also found the ovestretched monomer length to be 1.75 longer, reproducing the well known scaling between the two different types of DNA. The B-DNA persistence length was evaluated to be 43 nm, again in good agreement with experimental knowledge. Also, we estimated the transition free energy per bp to be of the order of , a value reasonably close to previous estimates. Finally, we saw B-DNA to be far stiffer and to resist more to bending than S-DNA. As recently shown, different overstretched configurations may be obtained depending on the experimental setup: the difference in B– and S–DNA flexibility we uncovered suggests in our case we dealt with a melted overstretched DNA.

The reasonable results we found encourage us to extend further our approach. Besides the obvious applications in the experimental context, we see some natural extensions of our work. A very similar approach may be used for instance to study molecules subject to random transverse forces Benetatos and Terentjev (2011), where more than one force is applied to the polymer, or to investigate block copolymers Blundell and Terentjev (2009). These can be sketched as networks of molecules locally attached together, where one wants to know the force deformation relations when a force is applied. We believe those problems may be profitably studied with a straightforward extension of our approach, thanks to its plain and easy implementation.

###### Acknowledgements.
We would like to thank Felix Ritort, Joan Camunas and Josep Maria Huguet for providing the experimental measurements carried out at the âSmall Biosystems labâ in Barcelona and for discussions. We kindly thank Roger Guimerà for a critical reading of the manuscript and for useful discussions. FAM acknowledges financial support from European Union Grants PIRG-GA-2010-277166 and PIRG-GA-2010-268342. CJPV is supported by the Spanish Mineco through grant FIS2012-38266-C02-02 and by the Generalitat de Catalunya grant 2014-SGR-608.

## Appendix A The Persistence length

We would like here to show how to compute the persistence length, that is, the correlation length at zero force, as a function of the parameters and of the KP model, Eq. (1). As explained in Sec. II, fixes the monomers length and , which has the dimension of an energy, sets the strength of the monomer interactions along the chain. When , model (1) is just a Heisenberg chain in zero external field. In this case it is possible to write down the problem in spherical coordinates and to solve it with the transfer matrix method Takahashi (1999). In particular, the largest eigenvalue of the matrix gives an estimate of the free energy as , while further expansion in eigenvalues allows to evaluate the correlation function at distance as:

 ζ(r)=13(cothβJB−1βJB)r\leavevmode\nobreak . (24)

Assuming that the correlation function decays exponentially, i.e. , we can define the persistence (or correlation) length of the model at zero force as written in Eq. (2):

 ξ0=−bBlogL(βJB)\leavevmode\nobreak ,

where is the Langevin function and is put to fix the length scale.

## Appendix B Small force expansion

As shown at the beginning of Sec. II, the elongation per monomer, , of the model is computed by differentiating once the free energy with respect to the stretching force . Thus, to study the force elongation relation in the small forces regime, one can perform a Taylor expansion of the free energy about and retain the first terms:

 L(f)=−limN→∞1N∞∑k=0fkk!∂k+1fF|f=0≃limN→∞fβb2BNN∑i,j=1⟨(^z⋅^ti)(^z⋅^tj)⟩0C\leavevmode\nobreak , (25)

where the first term vanishes because of symmetry at . Here the subscript stands for a connected correlation, i.e. , while the superscript denotes that averages are performed over an Hamiltonian with zero external force, i.e. the well known zero force KP model Kratky and Porod (1949) or, equivalently, the one dimensional Heisenberg chain in zero field Takahashi (1999). Note that such model describes a random walk on a sphere Ghosh et al. (2012), and can be solved exactly. The sum in Eq. (25) is known to give the Heisenberg susceptibility at , so that finally the elongation at small is equal to what shown in Eq. (5), i.e.:

 L(f)≃βb2Bζ0f,

where

 ζ0:=13JB+JBcoth(βJB)−β−1JB−JBcoth(βJB)+β−1\leavevmode\nobreak . (26)

Formula (5) reproduces the known elastic DNA response to weak stretching Bustamante et al. (1994) and it also correctly captures the FJC limit when . In such case and one has, normalising as :

 ℓFJC(f)≃βbB3f, (27)

which is the well known small force limit of the FJC Flory (1969).

It is also possible to compute the WLC limit of Eq. (5) by performing . To see this, we first note that Eq. (26) may be written in terms of the persistence length , Eq. (2), as:

 ζ0=131+e−bB/ξ01−e−bB/ξ0. (28)

Plugging such formula into Eq. (5) and normalising the elongation, one has

 ℓ≃131+e−bB/ξ01−e−bB/ξ0βbBf, (29)

with once again a normalised elongation ranging from 0 to 1. The limit of Eq. (28) yields , so that when , Eq. (29) reproduces the small WLC elongation Marko and Siggia (1995):

 ℓWLC(f)≃23βξ0f. (30)

## Appendix C Large force expansion

When the stretching force is much larger than , the contribution of the latter becomes negligible and one is allowed to expand in around zero. In such a situation the second term of the Hamiltonian (1), which is basically a FJC model, dominates over the first term. Therefore, we expect the large force extension to reproduce the FJC elongation plus corrections. Note, however, that such reasoning is correct as long as the monomer length is finite. If, conversely, one assumes to be in a large force limit but to have , one should change the sums to integrals, ending up with the known WLC large force limit, which scales as .

Keeping a finite length for , one may compute again the elongation by differentiation of the free energy as:

 L(f)=−limN→∞1N∞∑k=0JkBk!∂f∂kJBF|JB=0≃LFJC(f)+limN→∞βJBbBN∑i,j⟨(^z⋅^ti)(^tj⋅^tj+1)⟩FJCC. (31)

The first term of the expansion equals to the elongation of the FJC model , given by the Langevin function . The second term is a connected correlation in the FJC model, which can be evaluated by expressing the scalar products as

 ^z⋅^ti=cosθi,^tj⋅^tj+1=sinθjsinθj+1cos(ϕj−ϕj+1)+cosθjcosθj+1\leavevmode\nobreak .

In the FJC the terms depending on angle vanish when averaging over the whole period, and one is left with the computation of . This can be separated into a term having and another with the remainder. The latter however cancels out when computing connected correlations, as in the FJC all sites are independent. It only remains to evaluate , which, in the thermodynamic limit and for the polymer’s bulk, may be cast as:

 Missing or unrecognized delimiter for \Biggr (32)

where is the Langevin function and we assumed homogeneity dropping indices and replacing the sum with a factor 2. Placing everything together in formula (31), the large force elongation reads:

 Missing or unrecognized delimiter for \Bigr (33)

Expanding further the Langevin function and the hyperbolic cotangent for large , one finally gets to:

 L≃bB(1−1βbBf+2βJB(βbBf)2)\leavevmode\nobreak . (34)

which predicts saturation when . We see that the model adds a correction to the FJC large–force limit, such a correction incorporates indeed the monomers interaction. However, we remark again that the above limit is correct when the monomer length is finite.

## Appendix D Propagating the instability of the cavity equations

### d.1 Expansion in f

Equation (25) relates the small– elongation to a correlation, which can be also evaluated by direct differentiation in the cavity theory. Similarly to the correlation obtained in (25), differentiation of (11) yields the following relation in the thermodynamic limit:

 limN→∞1NN∑i

The correlation can be expressed in terms of the distance between and

 limN→∞1NN∑i

In the cavity theory, Eq. (10), one has that is produced by differentiating with respect to the cavity marginal :

 1βb2B∂L∂f∣∣f=0=∫Sd^t\leavevmode\nobreak (^z⋅^t)2P(^t)∣∣f=0+1βbB∫Sd^t\leavevmode\nobreak ^z⋅^t\leavevmode\nobreak Q(^t) (37)

with

 Q(^t)=2ZeβfbB^t⋅^z∫Sd^sd^u\leavevmode\nobreak eβJB^t⋅(^u+^u)Pcav(^s)∂fPcav(^u),

evaluated at . Now the first term on the right–hand side of Eq. (37) reproduces the local average appearing in the correlation (36). The remaining terms of relation (36) come from evaluation of , viz

 ∂Pcav∂f(^t)=Pcav(^t)(βbB(^z⋅^t)−1Zcav∂Zcav∂f)+eβtbBf^z⋅^tZcav∫Sd^s\leavevmode\nobreak eβJB^t⋅^s∂Pcav∂f(^s)\leavevmode\nobreak . (38)

Indeed one can write the above expression in a closed form to see the equivalence with the formula (36):

 ∂Pcav∂f(^t0)=βbB∞∑k=0Z−kcav∫d^tk\leavevmode\nobreak T(^t0,^tk