# Base pair opening and bubble transport in a DNA double helix induced by a protein molecule in a viscous medium

###### Abstract

We study the nonlinear dynamics of a protein-DNA molecular system by treating DNA as a set of two coupled linear chains and protein in the form of a single linear chain sliding along the DNA at the physiological temperature in a viscous medium. The nonlinear dynamics of the above molecular system in general is governed by a perturbed nonlinear Schrödinger equation. In the non-viscous limit, the equation reduces to the completely integrable nonlinear Schrödinger (NLS) equation which admits N-soliton solutions. The soliton excitations of the DNA bases make localized base pair opening and travel along the DNA chain in the form of a bubble. This may represent the bubble generated during the transcription process when an RNA-polymerase binds to a promoter site in the DNA double helical chain. The perturbed NLS equation is solved using a perturbation theory by treating the viscous effect due to surrounding as a weak perturbation and the results show that the viscosity of the solvent in the surrounding damps out the amplitude of the soliton.

###### pacs:

87.15.-v, 05.45.Yv, 02.30.Jr## I Introduction

Protein-Deoxyribonucleic acid(DNA) interaction plays an important role in a large number of cellular processes such as gene expression, suppression, replication, transcription, recombination, repair and in several other processes ref1 (). DNA participation in the above processes are mediated or catalyzed by DNA-binding proteins like polymerases, helicases, nucleases, isomerases, ligases and histones. Usually cellular processes start with the binding of a protein (enzyme) to the DNA. For example, the transcription process starts with the binding of RNA-polymerase (an enzyme protein) with a promoter site of the DNA and this binding is known to change the conformation of DNA by opening the bases. The above conformation change or base pair opening stresses the relevance of dynamics in understanding protein-DNA interaction. Experimentally, these conformational changes in DNA during protein-DNA interaction have been studied through atomic force microscopy ref2 (), dynamic spectroscopy ref3 (), X-ray crystallography of protein-DNA co-crystal ref4 (), NMR studies ref5 () and electrophoresis experiments ref6 () as well as through fluorescent measurements ref6a (). Results from free energy calculations combined with molecular dynamics simulations also explain the base flipping through protein-DNA interaction ref7 (); ref8 (); ref9 (); ref10 (). However, from a theoretical point of view, the understanding of protein-DNA interaction and its dynamics is still in its infancy. This is because the individual dynamics of DNA and protein itself is a very complex one. Also, interaction of the DNA molecule with the surrounding viscous medium and thermal fluctuations add to the complexity. The knowledge of the DNA dynamics through soliton-like excitations describe base pair opening or base flipping during DNA functions. In this direction, Englander et al ref11 (), used soliton excitations to explain base pair opening in DNA for the first time. Yomosa ref12 (); ref13 () proposed a plane base rotator model by taking into account the rotational motion of bases in a plane normal to the helical axis, and then Takeno and Homma ref14 (); ref15 () generalized the same and the nonlinear molecular excitations were shown to be governed by kink-antikink solitons. In contrast to the above, Peyrard and Bishop ref16 () and Christiansen and his collegues ref17 () proposed a model by taking into account the transverse and longitudinal motions of bases in DNA to describe the base pair opening through breather. Later, several authorsref18 (); ref20 (); ref21 () including the present authors ref22 (); ref23 () suggested that either kink soliton or breather would be a good candidate to play a basic role in base pair opening in DNA. In all the above studies, the dynamics of DNA molecular system has been studied without taking into account thermal fluctuations and viscosity of the surrounding medium and hence, it was shown that, the soliton representing the base pair opening in DNA may travel for infinite distance and time. Recently, few authors studied the effect of thermal fluctuations cm (); jm1 () and viscosity of the surrounding medium on unzipping and soliton-like base-pair opening and showed that these effects damp the solitons and hence travel only for a limited distance jm2 (); ref19 (); sm1 (); sm2 (); vs1 (). Like DNA dynamics, the dynamics of protein through soliton excitations also plays an important role in describing energy transfer in alpha helix proteins. In this regard, Davydov dvs (); dvs1 () who for the first time used soliton excitations to explain energy transfer in alpha helical proteins by proposing a new model by considering the coupling between the quantum transition occurring due to the vibrational structure of the bond and the elastic longitudinal wave propagation along the chain with the helix behaving like a spring. Later on, the above study was extended by several authors dvs2 (); dvs3 () including one of the present authors dvs4 (); dvs5 () to study the effect of exchange excitation between the chains, temperature, higher order interactions and interspine coupling. In contrast to Davydov’s model, Yomosa toda1 () and very recently Sataric et al toda2 () studied soliton excitations in protein by considering the molecule as a toda lattice chain. The recent results on the statistical mechanics of protein-DNA system through coarse-grain and worm-like chain models describe unzipping of the DNA chain sp (); sp1 (); sp2 (); sp3 (). Normally, protein molecule interacts with DNA either through non-specific interaction (sequence-independent) which is mainly driven by the electrostatic attractive force between positively charged amino acids and negatively charged phosphate groups of the DNA back bone or through specific interactions (sequence-dependent) including hydrogen bonds, van der Waals force and water mediated bonds between the protein molecule and specific site of DNA muru (). In a recent paper, Sataric et al ref24 () studied the impact of regulatory proteins through hydrogen bonds on breather excitations in DNA by considering Davydov model of amide-I vibration for protein dynamics and Peyrard-Bishop’s model of stretching of hydrogen bonds for DNA dynamics and found that binding of protein to DNA gives rise to a large-amplitude breather. Except the above, no author has so far studied the impact of protein on the dynamics of DNA. Hence, in the present paper, we study the conformational changes that take place in the form of base pair opening in DNA through nonlinear excitations induced by a protein molecule through interaction at the physiological temperature in a viscous surrounding medium. The paper is organized as follows. In section II, we present details of the above model and the Hamiltonian for the DNA molecular chain. The model Hamiltonian for protein-DNA interaction is given in section III. In section IV, we derive the equation of motion for the protein-DNA molecular system in the continuum limit and in the next section (section V) base pair opening in DNA as soliton solution of the associated nonlinear dynamical equation in a non-viscous medium is shown. The effect of viscosity of the surrounding medium on the base pair opening in DNA is understood through perturbation analysis in section VI. The results are concluded in section VII.

## Ii Model Hamiltonian for DNA dynamics

We consider a DNA double helix in B-form along with a protein molecule (say RNA-polymerase) at physiological temperature in a surrounding viscous medium and study the impact of protein interaction on the dynamics of DNA. The model we propose here for the study treats DNA as a set of two coupled linear molecular chains and protein as a single linear chain interacting through a linear coupling. A schematic representation of the above protein-DNA molecular system is shown in Fig. 1(a). In the figure and represent the two complementary strands in the DNA double helix. Each arrow represents the direction of the base attached to the strands and the dots between arrows represent the net hydrogen bonding effect between the complementary bases. The shaded ellipse overlapping the DNA double helical structure represents interaction of protein with the DNA molecule. The conformation and stability of DNA double helix is mainly determined by the stacking of bases through intrastrand dipole-dipole interaction and through hydrogen bonds between the complementary bases (interstrand interaction). From a heuristic argument, it was assumed that the hydrogen bonding energy between the complementary bases depends on the distance between them. Generally, the distance between the complementary bases can be expressed through longitudinal, transverse and rotational motion of bases. Among the different motions, the rotational motion of bases is found to contribute more towards the opening of bases pairs. Hence, for our study, we consider a plane-base rotator model for DNA ref12 (); ref14 () which the authors have extensively used for studying pure DNA dynamics in the recent times ref22 (); ref23 (). In order to find the distance between the complementary bases, in Figs. 1(b) and 1(c), horizontal projections of the base pair in the xy and xz-planes are presented respectively. In the figure, and denote the tip of the bases belonging to the complementary strands and at and respectively. Let and represent the angles of rotation of the bases in the base pair in the xz and xy-planes respectively.

By using simple geometry in Figs. 1(b,c), we can write the distance between the tips of bases as ref15 ()

(1) | |||||

where ‘’ is the radius of the circle depicted in Fig. 1(b).

The hydrogen bonding energy can be understood in a more clear and transparent way by introducing quasi-spin operators and . Using the above, Eq.(1) can be rewritten as

(2) | |||||

While writing Eq.(2), we have chosen . It is interesting to note that the form of given in Eq.(2) is the same as the Hamiltonian for a generalized form of the Heisenberg spin model. Therefore, the intrastrand base-base interaction in DNA also can be written using the same consideration. Also, it is reasonable to think that if such a quasi-spin model can be used in this problem, the double strand DNA and the rung-like base pairs can be conceived as a coupled spin chain model or a spin ladder system.

With the above consideration, we are at liberty to use the following Heisenberg model of the Hamiltonian for a coupled spin chain model or spin ladder system with ferromagnetic-type exchange interaction between nearest neighbouring spins in the same lattice (equivalent to stacking of bases in one strand i.e. intrastrand interaction) and ferromagnetic or antiferromagnetic rung-coupling (equivalent to hydrogen bonds between complementary bases i.e. interstrand interaction).

(3) |

Thus, the DNA double helical chain is mapped onto a two coupled spin chain model or a spin ladder system with ferromagnetic legs () and ferromagnetic () or antiferromagnetic () rungs. Therefore, in Hamiltonian (3), the terms proportional to correspond to stacking interaction between the base and its nearest neighbours in the two strands and the last term which is proportional to corresponds to the interstrand interaction or hydrogen bond energy between the complementary bases. In equilibrium, the parameter is expected to be less than zero (corresponding to antiferromagnetic coupling).

As DNA works at the biological temperature, the hydrogen atom attached to the bases are also normally in a thermally excited state. Therefore, it is necessary to generalize the above model into a thermal DNA. Thus, to include the effect of thermal phonons into the system, we add the following Hamiltonian.

(4a) | |||||

(4b) |

where , with overdot representing the time derivative and represents the elastic constant. is the mass of the hydrogen atom attached to the base and represents the displacement of it at the site along the direction of the hydrogen bond. The interaction Hamiltonian given in Eq. (4b) represents the coupling between the vibration of the above hydrogen atom (thermal fluctuation) and the rotation of bases.

## Iii Model Hamiltonian for Protein-DNA interaction

As protein binding to DNA induces large mechanical stress on it, which causes conformational changes, and thus acts as a precursor for all the functions of DNA. Hence, we investigate here the dynamics of a DNA when a protein molecule interacts with it by sliding on the DNA chain. Eventhough, proteins are much larger in size than the DNA(substrate), only a small portion of the protein molecule, namely the active site is directly interacting with the DNA molecule (see Fig.1 (a)). Hence, we consider the short active site region of the protein molecule that interacts with the DNA as a linear chain as has been treated by several other authors (see for e.g. Sataric et al toda2 ()). Thus, as said earlier, we propose the model for the above protein-DNA molecular system by considering DNA as a set of two coupled linear chains and the active site region of the protein molecule as a linear molecular chain interacting with the bases which is schematically shown in Fig. 2. The DNA part of the sketch is the linearized chains of Fig. 1(a) and represent the and sites representing the bases (similar to in Figs. 1(b,c) ). represents the linear protein molecular chain (active site region) interacting with the bases by sliding on the DNA chains. To explain the model for the above system further, we consider the protein molecule as a collection of mass points, with each mass point representing a peptide unit and connected by linear springs exhibiting longitudinal stretching motion parallel to the helical axis of DNA, i.e. along z-direction which couples to the hydrogen bonds of bases in a linear way. Hence, the model Hamiltonians for the longitudinal stretching motion of the protein molecule () and its interaction with the DNA chain () is written as

(5a) | |||||

(5b) |

where , and is the mass of the peptide. denotes displacement of the peptide in the protein chain from its equilibrium position and represents the elastic constant associated with the small amplitude oscillation of the protein molecule.

The interaction Hamiltonian given in Eq.(5b) is chosen to represent the change in hydrogen bonding energy due to change in the distance between the adjacent peptide units along the hydrogen bonding spines of the protein molecule and is the coupling coefficient. Further, as the protein molecule is assumed to slide on the DNA chain, the interaction energy along z-direction is expected to be dominant over the energy in the xy-plane normal to it (see Eq.(5b)). Thus, the total Hamiltonian for our model can be written using Eqs. (3), (4a), (4b), (5a) and (5b) as

(6) | |||||

Before proceeding further, for the sake of completeness, we present the form of the Hamiltonian (6) in terms of angles of rotation of bases as

(7) | |||||

In the case of Heisenberg spin systems, when the spin value is large, the spin dynamics is understood either through a classical approach or under semi-classical approximation by bosonizing the Hamiltonian (see for e.g. Ref. mdlk ()). Also, it should be mentioned that creation and annihilation operators were used to represent the Hamiltonian while studying the transport of charge and hole along short DNA molecules hol1 (); hol2 () and while investigating the nonlinear dynamics of alpha helical protein molecules using the model proposed by Davydov dvs (). Therefore, along the same lines, in order to understand the dynamics of the above protein-DNA molecular system, here also, we bosonize the Hamiltonian (6) using Holstein-Primakoff (H-P) representation hp () for quasi spin operators by writing where . In the low temperature limit, and hence, the H-P transformation can be expanded in a power series in terms of the parameter as

(8a) | |||||

(8b) |

and similar expansions for and in terms of . Here and represent creation and annihilation operators of the bases and satisfy the usual commutation relations, Substituting Eqs. (8a) and (8b) in Eq.(6) , we have the Hamiltonian upto O as

(9) | |||||

## Iv The dynamical equations

Having written down the Hamiltonian in the semi-classical description, the dynamics of the protein-DNA molecular system can be understood by constructing the equations of motion as

(10) | |||||

and similar one for . The equations of motion for and are written using the Hamilton’s equations of motion and . The explicit form of the equations of motion can be derived by substituting Hamiltonian (9) in the above equations of motion for and . Thus, we get

(11a) | |||||

(11b) | |||||

(11c) | |||||

(11d) | |||||

While writing the above equations (11a-11d), we have rescaled the time variable and redefined and . In order to represent the large amplitude collective modes by coherent states, we introduce Glauber’s coherent state representation glab () for boson operators and with and where and are the coherent amplitudes of the operators and for the system in the states and respectively. Further, as the length of the DNA and the protein chains are very large compared to the lattice parameter, we make a continuum approximation by introducing the new fields and , where with the expansions and similar ones for and . Under the above approximations, the equations of motion (11a), (11b), (11c) and (11d) after rescaling and redefining and upto can be written as

(12a) | |||||

(12b) | |||||

(12c) | |||||

(12d) |

In Eqs. (12a-12d), the suffices and represent partial derivatives with respect to time and the spatial variable respectively. On subtracting Eq. (12a) from (12b) and by choosing , Eqs. 12 (a-d) can be written as

(13a) | |||

(13b) | |||

(13c) |

It may be mentioned that, addition of Eqs. (12a) and (12b) satisfy identically. The term proportional to in Eq. (13a) can be transformed away using the transformation and the result reads (after dropping the hat)

(14) |

The set of coupled equations (13b,c) and (14) describe the dynamics of our protein-DNA molecular system at the biological temperature, when the protein molecule binds to the DNA double helical chain through linear harmonic coupling. The dynamics is found to be governed by the excitation of DNA bases and thermal vibration of the hydrogen atoms attached to the bases combined with the longitudinal motion of peptide units of the binding protein. In particular, we are concerned with the nonlinear excitation of bases induced by protein and thermal fluctuations, in which a cluster of DNA bases may undergo a large excursion as compared to the rest of the bases. It may be noted that when , Eqs. (14), (13b) and (13c) are decoupled and reduced to a set of linear equations. Thus, when the protein molecule is detached from the DNA chain (), the dynamics is governed by the following well known set of linear equations.

(15a) | |||||

(15b) |

While Eq. (15a) is the time-dependent Schrödinger equation for a free particle, Eqs.(15b) are the homogeneous linear wave equations. Eq. (15a) admits plane transverse wave solution of the form with the dispersion relation where is the constant amplitude. On the other hand, Eq. (15b) admits linear non-dispersive wave solutions and where and are arbitrary functions and and , represent the constant phase velocities of the wave. When the protein molecule started interacting with the DNA molecular chain at the physiological temperature, i.e when and , the excitation energy of the protein-DNA molecular system at the physiological temperature (thermal fluctuation) increases, and nonlinearity started playing its role. Thus, the set of full coupled nonlinear equations become important and it is essential that Eqs. (13b,c) and (14) should be solved in their full form to understand the underlying nonlinear dynamics.

## V Soliton, Base Pair Opening and Bubble Transport

In order to solve the set of coupled equations (14), (13b) and (13c), in their full form we differentiate Eqs. (13b) and (13c) with respect to once and define and , so that Eqs. (14), (13b) and (13c) are written as

(16a) | |||

(16b) | |||

(16c) |

Now, we rewrite Eqs. (16b) and (16c) by introducing the wave variable and writing .

(17a) | |||

(17b) |

where and . On integrating Eqs. (17a) and (17b) with respect to twice and assuming both the integration constants to be zero, we get and , which upon using in Eq. (16a) gives

(18) |

While writing Eq. (18), we have made use of the transformation .

Eq. (18) is the well known completely integrable nonlinear Schrödinger (NLS) equation which has been solved for N-soliton solutions using the inverse scattering transform (IST) method ref25 (). For instance, the one soliton solution is written as

(19) |

where and are four real parameters which determine the propagating amplitude, velocity, initial position and initial phase of the soliton. The solitons in the protein-DNA molecular system with thermal fluctuation are formed as a result of the dynamical balance between the dispersion due to interaction of intrastrand (stacking) dipole vibrations in each strand of the DNA with the nonlinearity provided by the interaction between the hydrogen bonds in DNA and the local displacement of the peptide groups in the protein molecule and the thermal phonons. The longitudinal waves that arise in the protein molecule and hydrogen atoms in DNA in turn provide a potential well that prevents dispersion of the rotational energy of the bases in DNA. Thus, the propagation of rotation of bases in DNA is coupled to the longitudinal waves of protein and the coupled excitations propagate as a localized and dynamically self-sufficient entity called solitons which travel along each strand of the DNA chain. In Fig. 3(a), we have plotted the square of the absolute value of the one soliton solution i.e. as given in Eq. (19). In Fig. 3(b), we present a schematic representation of the coherent base excitations in DNA in terms of rotation of bases induced by the protein molecule in the form of solitons propagating along the two strands which collectively form a travelling bubble created by energy delocalization due to nonlinear effects. Thus, the soliton solution describes an open state configuration in the individual strands of the DNA double helix which collectively represent a bubble. In the figure, the shaded ellipse represents the region of interaction of the protein molecule with DNA where the bubble is formed. Thus, the protein molecule acts as a zip-runner in opening the bases in DNA chain during the process of transcription. Similar results have been observed experimentally by Ha et alref26 (), on the winding and unwinding of E-coli Rep helicase-DNA complex. Further, our results on bubble propagation in DNA due to protein interaction is in accordance with the experimental data on the binding of RNA-polymerase to promoter ref27 (); ref28 (); ref29 (); ref30 (); ref31 (). From the expression for the soliton namely , it is noted that the amplitude of the soliton depends on the coupling of the DNA excitations to the thermal phonons () and to the molecular vibrations of the protein () which were also respossible for nonlinearity in the soliton equation. For large coupling, it is expected that the amplitude of the soliton decreases. Recently Campa cm () also showed through simulation studies that, in the case of large thermal coupling, the bubble travels only for a short distance with decreasing amplitude.

## Vi Effect of Viscosity

In a more realistic description of the dynamics of protein-DNA system, it is important to consider the effect of the surrounding medium or environment. Effectively, the interaction of DNA with the surrounding medium reduces to viscous damping effect. It is known that, in the case of a protein-DNA system the solvating water acts as a viscous medium that makes the nucleotide oscillations to damp out mp1 (). The effect of viscous force exerted on the DNA chain can be taken into account by adding a term of the form to the right-hand side of the completely integrable nonlinear Schrödinger equation (19) which now becomes a damped nonlinear Schrödinger equation as given below.

(20) |

As the viscosity of water is temperature dependent, from a simple fluid mechanics argument, one can estimate the magnitude of the damping coefficient as very small at the physiological temperature sm1 (). Hence, we treat the term proportional to in Eq. (20) as a weak perturbation. When Eq. (20) reduces to the completely integrable NLS equation as given in Eq. (18) and the one soliton solution is given in Eq. (19) which can be rewritten for convenience in the form

(21) |

where and .

We
carry out a perturbation analysis mdjb () to understand the impact of the viscous force by introducing a slow time variable and treat the quantities and as functions of this time scale and hence the envelope soliton solution (21) is written as

(22) |

Under the above assumption of quasi-stationarity, Eq. (20) reads

(23) |

where

(24) |

We assume Poincaré-type asymptotic expansion for as and further restrict ourselves to calculation of order such that , where . Further, we assume that , where and are real. On substituting the above, in Eqs. (23) and (24), we obtain

(25a) | |||

(25b) |

where

(26a) | |||||

(26b) |

In Eqs. (25a) and (25b), and are self-adjoint operators. It may be checked that, the solutions of the homogeneous part of Eqs. (25a) and (25b) are and and hence we have the following secularity conditions.

(27a) | |||

(27b) |

On evaluating the above integrals after substituting the values of and , we obtain , which can be written in terms of the original time variable after integrating once as

(28) |

where and are the initial amplitude and velocity of the soliton. The first of Eq. (28) says that when the protein-DNA system interacts with the surrounding viscous medium the velocity of the soliton remains constant. However, the second of Eq. (28) says that, the amplitude of the soliton decreases as soliton propagates. In other words, the viscous nature of the solvent medium damps out the soliton exponentially and hence the soliton is expected to travel only a short distance.

Now, we construct the perturbed soliton by solving Eqs. (25a) and (25b) . For that, first we solve the homogeneous part of Eq. (25a), which admits the following two particular solutions.

(29a) | |||||

(29b) | |||||

The general solution can be found out by using the following expression.

(30) | |||||

Here and are arbitrary constants. We construct by substituting the values of and given in Eqs. (29a), (29b) and (26a) in Eq. (30) and after evaluating the integrals, we get

(31) | |||||

The last term in Eq.(31) is a secular term that leads to a solution which is unbounded and hence, it is removed by choosing the arbitrary constant . Further, by applying the boundary conditions constant = and , we obtain and . Using the above results in Eq. (31), the general solution is written as

(32) |

Next, we solve Eq.(25b), the homogeneous part of which admits the following particular solutions.

(33a) | |||||

(33b) |

Knowing two particular solutions, the general solution of Eq.(25b) can be found from

(34) | |||||

where and are arbitrary constants. We construct the explicit form of by substituting the values of and given in Eqs. (33a), (33b) and (26b) and evaluating the integrals.

(35) | |||||

The above solution for contains secular term that is a term proportional to which can be removed by choosing