Fast and Flexible Geometric Method For Enhancing MC Sampling of Compact Configurations For Protein Docking Problem
Abstract
EASAL (efficient atlasing and sampling of assembly landscapes) is a recently reported geometric method for representing, visualizing, sampling and computing integrals over the potential energy landscape tailored for small molecular assemblies. EASAL’s efficiency arises from the fact that small assembly landscapes permit the use of socalled Cayley parameters (interatomic distances) for geometric representation and sampling of the assembly configuration space regions; this results in their isolation, convexification, customized sampling and systematic traversal using a comprehensive topological roadmap.
By sampling the assembly landscape of 2 TransMembrane Helices, with shortrange pairpotentials, this paper demonstrates that EASAL provides reasonable coverage of crucial but narrow regions of low effective dimension with much fewer samples and computational resources than traditional MonteCarlo or Molecular Dynamics based sampling. Promising avenues are discussed, for combining the complementary advantages of the two methods.
Additionally, since accurate computation of configurational entropy and other integrals is required for estimation of both free energy and kinetics, it is essential to obtain uniform sampling in appropriate cartesian or moduli space parameterization. EASAL’s flexibility is demonstrated with a variety of sampling distributions, from Cayley sampling skewed towards lower energy regions, to uniform Cartesian sampling at the two ends of the spectrum.
University of Florida] CISE department, University of Florida, Gainesville, FL 326116120 Carnegie Mellon University] Chemistry Department, Carnegie Mellon University, Pittsburgh, PA 15213 University of Florida] CISE department, University of Florida, Gainesville, FL 326116120 \phone+1 352 392 1200 \fax+1 352 392 1220 Carnegie Mellon University] Chemistry Department, Carnegie Mellon University, Pittsburgh, PA 15213
1 Introduction
The problem of proteinprotein assembly is an area of active research and development. [Janin(2010), Ritchie(2008), Vajda and Kozakov(2009), FernándezRecio and Sternberg(2010)]. Currently the most successful approach to docking two proteins together uses a direct exhaustive search of the whole configurational space. It is usually performed in the inverse space for translational moves on a cubic grid (using the fast Fourier Transform (FFT)). The FFT algorithm makes the translation search very efficient, but it has to be repeated for all orientations of a molecule being docked resulting in thousands of FFT operations, each comprising millions of translations. The majority of the available software for molecular docking can only deal with two proteins (i.e. a dimer). A set of docking procedures that is based on the shape recognition and image segmentation techniques of Computer Vision. PatchDock algorithm starts with a smooth representation of the molecular surface as a set of discrete points, but the set is restricted to critical points (convex caps, toroidal belts and concave pits), and the normal vectors at these points. A geometric hashing algorithm performs a very fast matching of the caps and pits with opposing normal directions on two surfaces, and collects all the rigidbody solutions that are geometrically acceptable after rejecting volume overlaps. Other geometric criteria can easily be incorporated in the procedure, for instance molecular symmetry in SymmDock, which allows building models of oligomeric proteins with up to twenty subunits.
It is more common that problem of computational protein assembly and folding is typically approached by the third class of methods: molecular dynamics (MD) or MonteCarlo (MC) simulations. Both these techniques sample the system’s configurational space with probabilities corresponding to the Boltzmann distribution. Ideally, such simulation can produce a probability density function for the whole phase space of the system. An absolute free energy and relative probabilities of various states in the phase space can then be estimated. However, in practice, systems of interest are rarely ergodic, in a sense that their energy landscapes consist of an unknown number of energy minima separated by large energy barriers, moreover, in tightly packed molecular systems, the majority of phase space has high energy and low probability. In such conditions, most sampling procedures have tendency to oversample local basins of the energy function, and have difficulty crossover between low energy basins. This results in uncertainty in both, i) relative probabilities of visited states, as well as ii) in whether the range of low energy configurations visited during simulations is ever complete. Despite recent progress all currently existing methods of protein assembly are extremely computationally expensive.
To overcome the problem of incomplete sampling of relevant phase space when modeling protein assembly we apply a recently introduced approach called EASAL (Efficient Atlasing of Assembly Landscapes), for representing, visualizing, sampling and computing integrals over the potential energy landscape tailored for small molecular assemblies. EASAL’s efficiency arises from the fact that small assembly landscapes permit the use of socalled Cayley parameters (interatomic distances) for geometric representation and sampling of constant potential energy regions of the assembly configuration space. This results in the isolation, convexification, customized sampling and systematic traversal of regions using a comprehensive topological roadmap, providing reasonable coverage of low potential energy, but narrow regions of low effective dimension, with surprisingly few samples.
Additionally, since accurate computation of configurational entropy and other integrals is required for estimation of both free energy and kinetics, it is essential to obtain uniform sampling in appropriate cartesian or moduli space parameterization. EASAL’s flexibility permits a variety of sampling distributions, from Cayley sampling skewed towards lower energy regions, to uniform Cartesian sampling at the two ends of the spectrum. The theory and algorithms behind EASAL appears in [Ozkan and Sitharam(2011), Ozkan et al.(2014)Ozkan, Wu, Peters, and Sitharam, Ozkan and Sitharam(2014)] and is sketched in the next section. Preliminary extensions of EASAL beyond dimer assemblies and viability of using EASAL for atlasing wide variety of assembly systems including clusters of up to 12 assembling spherical particles is demonstrated in [Ozkan et al.(2014)Ozkan, Wu, Peters, and Sitharam]. The software implementation of EASAL (architecture and functionalities) is reported in [Ozkan et al.(2014)Ozkan, Pence, Wu, Baker, Willoughbyand, Peters, and Sitharam]; the software has recently been employed to predict crucial intermonomeric interface interactions for viral capsid assembly [Wu et al.(2012)Wu, Ozkan, Bennett, AgbandjeMckenna, and Sitharam, Wu et al.(2014)Wu, Ozkan, Bennett, AgbandjeMcKenna, and Sitharam].
By sampling the assembly landscape of 2 TransMembrane Helices, with shortrange pairpotentials, our result demonstrates that EASAL provides reasonable coverage of crucial but narrow regions of low effective dimension with much fewer samples and computational resources than traditional MonteCarlo or Molecular Dynamics based sampling. Promising avenues are discussed, for combining the complementary advantages of the two methods.
2 Materials and Methods
2.1 Metropolis Monte Carlo method.
Metropolis Monte Carlo (MC) is an importance sampling method that generates an ensemble according to the Boltzmann factor. MC simulations were performed in order to explore the conformational space that is accessible by translational and rotational random steps of rigid helixes. In all simulations reported protein transmembrane alphahelixes were held rigid. The rigid body MC simulator was implemented in HARLEM program.
2.2 Move Sets.
Trial conformations of rigid bodies are generated by a basic move set of a small translational and rotational displacement. The maximum step size for both type of displacements are to follow a well known criterion of the MC acceptance ratio. This criterion establishes that for an optimum sampling the acceptance ratio should fluctuate around 50 % of trial move should be accepted. However, from our experience this high ratio of acceptance will generate random structures with small conformational fluctuations. In order to overcome high energy barriers we implemented a move set based on the exponential distribution of the maximum step size of translation and rotation. The purpose of this move set is to randomly generate trial conformations with higher probability to jump over the energy barrier.
Moreover, analyzing MC trajectories we found that low energy conformations of different energy basins are conformational different by the rotation around the principal axis of a rigid helix. Using this information we included in our MC implementation a move set in which a helix is randomly rotated around its principal axis.
Description of the level of representation, energy terms.
2.3 Scoring Energy
The intermolecular energy of a structure model in this work is calculated as
(1) 
Where is a pairwise distancedependent potential of mean force of interaction between residues and , is the steric overlap energy, is the energy term that constrains the TM helixes in the membrane plane and prevents the helix mass center to sample farther than a radius of 15 Å. accounts for the interaction of amino acids in different regions in the lipid bilayer. A description of each of these terms follows:
2.4 Background: Theory underlying EASAL
This subsection gives background from [Ozkan and Sitharam(2011), Ozkan et al.(2014)Ozkan, Wu, Peters, and Sitharam] for the theoretical underpinnings of EASAL’s key features  geometrization, stratification and convexification using Cayley parameters  culminating in the concept of an atlas of an assembly configuration space. We begin with a description of the input to EASAL. An assembly system consisting of the following.

A collection of rigid molecular components, drawn from a small set of rigid component types (often just a single type). Each type is a is specified as the set of positions of atomcenters, in a local coordinate system. In many cases, an atomcenter could be the representation for the average position of a collection of atoms in a residue. Note that an assembly configuration is given by the positions and orientations of the entire set of rigid molecular components in an assembly system, relative to one fixed component. Since each rigid molecular component has 6 degrees of freedom, a configuration is a point in dimensional Euclidean space. The maximum number of atomcenters in any rigid molecular component is denoted .

The potential energy is specified using LennardJones (which includes HardSphere) pairwise potential energy functions. The pairwise LennardJones term for a pair of atoms, and , one from each component, is given as a function of the distance between and ; The function is typically discretized to take different constant values on 3 intervals of the distance value : Typically, is the socalled Van der Waal or steric distance given by ”forbidden” regions around atoms and And is a distance where the interaction between the two atoms is no longer relevant. Over these 3 intervals respectively, the LennardJones potential assumes a very high value , a small value , and a medium value All of these bounds for the intervals for , as well as the values for the LennardJones potential on these intervals are specified constants as part of the input to the assembly model. These constants are specified for each pair of atoms and , i.e., the subscripts are necessary. The middle interval is called the well. In the special case of Hard Spheres, .

A nonpairwise component of the potential energy function in the form of global potential energy terms that capture other factors including the implicit solvent (water or lipid bilayer membrane) effect [Lazaridis and Karplus(1999), Lazaridis(2003), Im et al.(2003)Im, Feig, and Brooks]. These are specified as a function of the entire assembly configuration.
It is important to note that all the above potential energy terms are functions of the assembly configuration.
Note that the input to the assembly usually specifies the configurations of interest i.e., a region of the configuration space, often specified as a collection of atom pairs ”of interest” with the understanding that the only configurations of interest are those in which at least one of these pairs in occupy their corresponding LennardJones well. Clearly . In addition, we assume the desired level of refinement of sampling is specified as a desired number of sample configurations .
Geometrization
Observe that for the purposes of this paper stated in Section 1, it is sufficient to view the assembly landscape as a union of constant potential energy regions. Thus an assembly system can alternatively be represented as a set of rigid molecular components drawn from a small set of types, together with assembly constraints, in the form of distance intervals. These constraints define feasible configurations (where the pairwise interatoms distances are larger than , and any relevant tether and implicit solvent constraints are satisfied). The set of feasible configurations is called the assembly configuration space. The active constraints of a configuration are those atompairs in the configuration that lie in the LennardJones well. An active constraint region of the configuration space is a region consisting of all configurations where a specified (nonempty) set of constraints is active, i.e, those LennardJones interatom distances between atoms and lie in their corresponding wells, i.e, the interval .
Stratification, Active Constraint Graphs
Consider an assembly configuration space of rigid components, defined by a system of assembly constraints. The configuration space has dimension , the number of internal degrees of freedom of the configurations since a rigid object in Euclidean 3space has rotational and translational degrees of freedom. For , this dimension is at most and in the presence of two active constraints, it is at most .
A ThomWhitney stratification of the configuration space (see Fig. 3) is a partition of the space into regions grouped into strata of that form a filtration , . Each is a union of nonempty closed active constraint regions where the set of pairwise constraints are active, meaning each pair in lies in its corresponding LennardJones well, and the constraints are independent (i.e., no proper subset of these constraints generically implies any other constraint in the set). Each active constraint set is itself part of at least one, and possibly many, hence indexed, nested chains of the form . See Figures 4 and 3(left). These induce corresponding reverse nested chains of active constraint regions : Note that here for all , is closed and effectively dimensional; by which we mean that if all the LennardJones wells that define the active constraint set narrowed to zero width (i.e, if they degenerated to a HardSphere potentials), then the active constraint region would be dimensional.
We represent the active constraint system for a region, by an active constraint graph (sometimes called contact graph) whose vertices represent the participating atoms (at least in each rigid component) and edges representing the active constraints between them. Between a pair of rigid components, there are only a small number of possible active constraint graph isomorphism types since there are at most contact vertices. For the case of these are listed in Figure 5, and for higher a partial list appears in Figure 6.
There could be regions of the stratification of dimension whose number of active constraints exceeds , i.e. the active constraint system is overconstrained, or whose active constraints are not all independent. Dependent constraints diminish the set of realizations. For entropy calculations, these regions should be tracked explicitly, but in the present paper, we do not consider these overconstrained regions in the stratification. Our regions are obtained by choosing any independent active constraints.
Convex Representation of Active Constraint Region and Atlas
A new theory of Convex Cayley Configuration Spaces (CCCS) recently developed by the author [Sitharam and H.Gao(2010)] gives a clean characterization of active constraint graphs whose configuration spaces are convex when represented by a specific choice of socalled Cayley parameters i.e., distance parameters between pairs of atoms (vertices in the active constraint graph) that are inactive in the given active constraint region (nonedges in the active constraint graph). See Figure 14. Such active constraint regions are said to be convexifiable, and the corresponding Cayley parameters are said to be its convexifying parameters. See Figures 9 9
In general, the active constraint regions for an active constraint graph , can be entirely convexified after ignoring the remainder of the assembly constraint system, namely the atom markers not in and their constraints. Fig. 14 The true active constraint region is subset of , however the cut out regions are also defined by active constraints, hence they, too, could be convexified. See Figures 9, 9.
When a constraint (edge ) not in becomes active (at a configuration in ), defines a child active constraint region containing . This new region belongs to the stratum of the assembly configuration space that is of one lower dimension (Definition 2.4.2) and defines within a boundary of the smaller, true active constraint region . We can still choose the chart of as tight convex chart for , but now region has an exact or tight convex chart of its own. Then the configurations in the region have lower potential energy since the configurations in that region lie in one more LennardJones well. Hence they should be carefully sampled in free energy and entropy computations although the region has one lower effective dimension (e.g, represents a much narrower boundary channel). However, sampling in the larger parent chart of (of one higher effective dimension) often does not provide adequate coverage of the narrow boundary region . For example, Fig. 14 shows that providing a separate chart for each active constraint region can reveal additional realizations at the same level of sampling.
The Atlas of an assembly configuration space is a stratification of the configuration space into convexifiable regions. In [Ozkan and Sitharam(2011)], we have shown that molecular assembly configuration spaces with 2 rigid molecular components have an atlas. The software EASAL (Efficient Atlasing and Search of Assembly Landscapes) efficiently finds the stratification, incorporates provably efficient algorithms to choose the Cayley parameters [Sitharam and H.Gao(2010)] that convexify an active constraint region, efficiently computes bounds for the parametrized convex regions [Chittamuru(2010)], and converts the parametrized configurations into standard cartesian configurations [Peters et al.(2004)Peters, Fan, Sitharam, and Zhou].
EASAL variants and Sampling Distributions
EASAL’s flexibility is demonstrated with a variety of sampling distributions in Cayley space, which translates to sampling variants in Cartesian space: skewed towards lower energy regions at one end of the spectrum, towards higher energy regions at the other end of the spectrum and explicit adjustment to uniform Cartesian sampling at the middle. Since the basic EASAL (uniform sampling in Cayley) is already skewed towards lower energy regions in Cartesian, we would expect the second option to approximate the uniform Cartesian without explicit adjustment.
Accordingly, we have tested the following variants of the basic EASAL. EASAL3 is designed to have more samples closer to the boundaries of all active constraint regions; i.e, it uses step size linearly proportional with the Cayley parameter value. EASAL2 is designed to have more samples in the interiors; i.e, it uses step size inversely proportional with the Cayley parameter value. EASALJacobian uses a sophisticated Cayley sampling method [Ozkan and Sitharam(2014)] to force uniform sampling in Cartesian. It recursively and adaptively computes the next Cayley step size and direction using an interative computation of the Jacobian of the CartesianCayley map to achieve this goal. See Figure 23.
3 Experimental Results
We first describe the experimental setup.
We use reduced system with only 11 residues per helix represented by 20 atoms to reduce the size of the sampling space. See 24 for reduced TM helix.
3.1 Grid Generation

The Grid is uniform along the Cartesian configuration space.

The bounds of the Cartesian configuration space for both Grid and EASAL are:
20 to 20 Angstroms
3.5 to 3.5 Angstroms 
The angle parameters are described in Euler angles representation (Cardan angle ZXZ).
to 
Inter principalaxis angle degrees where where and are the principal axis of each rigid body. I.e. and are eigenvectors of the inertia matrix.

Additionally, there is the pairwise distance lower bound criterion:
For all atom pairs belonging to different rigid molecular components, where and are residues, is the distance for residues and , and are the radius of residue atoms and . 
Million Grid configurations are generated in this manner.

Over of them are discarded to ensure at least one pair , i.e, an active constraint and to eliminate collisions. About Million Grid configurations remain.
3.2 Epsilon Coverage
Ideally, we would expect each Grid point to be covered by at least one EASAL sample point that is situated in an cube centered around a Grid point with a range of in each of the 6 dimensions.

The value of is computed as follows: = ( of Grid points / # of Easal points)

We set to be since Grid points are by definition a discrete number of steps from each other.

In order to compute the coverage, we assign each EASAL/MC sample to its closest Grid point. Call those Grid points EASAL/MCmapped Grid points. We say that a Grid point is covered by EASAL/MC if there is at least one EASAL/MC mapped Grid point within the cube centered around
3.3 MultiGrid
All variants of EASAL are designed to isolate and sample each active constraint region with a variety of sampling distributions in Cayley space. In addition, EASALJacobian samples each such region uniformly in Cartesian. Yet, when we combine all such regions, those regions where more pairs of atoms are in their LennardJones wells (regions with more active constraints) will have denser sampling. i.e. EASAL tends to oversample the lower energy regions. This is a positive feature of EASAL that we preserve in EASALJacobian.
Since the 5D strata of the atlas generated by both versions of EASAL would sample a configuration that has active constraints times (once for each of the 5D active constraint regions in which the configuration lies), the meaningful comparison would require similarly replicating such configurations in the Grid, which we call the MultiGrid.
3.4 Computational Time/Resources
MC has around 100 million configurations. MultiGrid/Grid have around 12/6 million configurations.
EASAL1/2/3 have around 100K/40K/30K configurations respectively.
EASAL_Jacobian has around 1 million configurations.
Since EASAL is light in terms of resources and can generate
reasonable atlases with very few samples,
any undersampling caused by that should be taken
into account during comparisons.
The specification of the processor for MC is I52540 and for EASAL is Intel Core 2 Quad CPU Q9450 @ 2.66GHz x 4 with Memory:3.9 GiB.
EASAL1 took 3 hours 8 minutes(188 minutes).
EASAL2 took 4 hours 24 minutes(264 minutes).
EASAL3 took 10 hours 20 minutes(620 minutes).
EASALJacobian took 14 hours 22 minutes(862 minutes).
Now we are ready to show the results.
3.5 Coverage Results
sampling method  EASAL1  EASAL2  EASAL3  EASAL_Jacobian  MC 

coverage percentange 
EASAL1 and EASAL2 have more than coverage (See 1). EASAL has a dense coverage after we expand cube range to be as seen by fig. 35. Hence EASAL is able to have almost full coverage for where is in the range (0, 1).
We have located those uncovered regions on projection for cube by Fig. 43. They are populated around the outer circle where two molecules are far from each other that cause increase in in Equation 1. Hence we are assuming them not to be low energy regions. Hence they are most probably the regions out of interest.
Yet, EASAL2 has very similar coverage as MC, even though it has very less number of configurations. In a short time with few samples, EASAL2 covers most of the Grid.
3.6 Comparison of MC  Grid  EASAL
Next we look at projection on and where low energy regions are located. For the following plots, we have divided Grid into larger regions i.e., ( by ) instead of original ( by ) subregions. I.e. a region with value of in has index , in has index and so on. The and axes of following plots denote the ’indices’ for these and ranges.
Note. During any ratio computations, to avoid division by zero, we compute ratios only in the ’accepted’ region where there is guaranteed to be at least one Grid point.
For the following plots, there are no samples on the outer rim since Evol prevents the helix mass center to sample further than a radius of 15 Å. There are also no samples in the center of the plots because in these regions the helix mass center of the molecules are so close to each other hence forces pairwise distances between residues that causes elimination by steric constraints. The feasible regions that are close to center are boundary regions that possibly satisfy more contacts. Each additional active constraint (contact) essentially does drop the energy hence those boundary regions can be considered as lower energy regions. Hence those lower dimensional regions do have to be accurately sampled.
MultiGrid, which is effectively a weighted version of Grid according to number of contacts, shed light on those regions that satisfy more contacts. As seen by Fig. 51 a) and c) MultiGrid has denser sampling insider regions where contacts happens to be more (low energy regions) than Grid. EASAL is similar to MultiGrid in that sense. Whereas, MC is very similar to Grid as seen by Fig. 51 a) and b). MC has denser sampling at outer surrounding regions compared to MultiGrid see Fig. 57 a).
Hence, EASAL2 represents low energy regions better than MC since the regions with low energy basins have more probability to be sampled by EASAL thanks to higher refinement at the lower dimensions.
3.7 Distribution over Atlas Nodes
3.8 Coverage Results for Lower Dimensional Regions
We would like to see the lower dimesional sampling density through all these methods in order to see how they are good at covering the lower energy regions. Assume we are experimenting on a low dimensional region with i.e. contacts. This low dimensional region is populated through d regions. Then the ratio of number of samples of dimensional region over total number of samples in all d regions that lead to dimensional region of interest would give us how often this dimensional region is sampled.
For the experimental purpose, we have picked a random dimensional region with contacts , , where a contact represented by atom indices from helices.
The following 2 shows the results of the frequency of sampling for the chosen lower dimensional region.
sampling method  EASAL1  EASAL2  EASAL3  EASAL_Jacobian  MC  Grid  MultiGrid 

ratio percetange 
For all EASAL versions, the denominator is the total number of samples of three d nodes each labeled by one of contacts. The numerator is the sum of d samples per each d node.
For Grid and MC, the denominator is the total number of samples that satisfies any contacts, the numerator is the number of samples that satisfies all contacts. To compute that number, we have read all samples from provided trajectories and then identify the samples that satisfies the specific set of contacts.
For MultiGrid, the denominator and numerator are same as Grid however each sample that satisfy any contacts replicated that many times.
Among all those methods, EASAL2 covers more lower dimensional regions even though it has very low number of samples compared to MC/Grid/MultiGrid.
The MultiGrid also have a good coverage ratio, however to compute MultiGrid ratio, the number of samples in numerator is multiplied exactly times since they all satisfy all contacts i.e. each sample that contributes to numerator replicated times. Since it is not producing different samples hence not finding different lower dimensional regions, it is not considered exactly as sampling method. We keep MultiGrid for meaningful comparison purpose.
Besides, when the full EASAL is run, then a dimensional ( contact) sample will potentially appear in dimensional regions and should be appropriately weighted.
For the case of , d samples will be generated by d nodes, d nodes and d node, regions in total. The current ratios for EASAL in 2 considers only d regions. Hence when the full EASAL is run the ratio will increase and be multiplied times.
4 Discussion
By using a large number of samples, we can sample the grid by brute force. Over 90% are typically discarded since they are high energy configurations (no single active constraint). Also, low energy (more than 1 active constraint) has almost no chance of being sampled.
In the case of MC, the simulation has tendency to sample around a localized subset of low energy configurations that are located close by. To ensure coverage, one needs to compensate for the lack of ergodicity by computationally intensive use of a large number of samples.
EASAL has flexibility in sampling distributions, but in all cases guarantees reasonable coverage of low energy regions even if they have low effective dimension. It tends to oversample these lower energy regions (i.e, regions where more pairs of atoms are in their LeonardJones well). Hence EASAL can help MMCS to find high energy barriers and force the MC simulations to move to the locations of low energy basins. Besides, Atlas regions with lower energy values where MC did not sample can be used as a seed for MC in order to generate the free energy profile for the whole configuration space.
More precisely, we expect EASAL to be used to help evaluate/improve MC with the following path.

Run a coarsegrained version of MC with the usual energy function and a relatively small number of samples

Run EASAL with constraints extracted from the MC energy function, verify that EASAL space encompasses the relevant space

List description of Atlas regions where MC did not sample, and give an estimate of the volume of the regions.

Compute energy on these regions, and if low, then run MC seeded on these (lower volume) regions (from which larger volume regions can be reached), in order to generate the free energy profile for the whole configuration space.

Compare new trajectories with old MC trajectories.
5 Conclusion
Our results in sampling the assembly landscape of 2 TransMembrane Helices, with shortrange pairpotentials, demonstrate that variants of EASAL provide good coverage of narrow regions of low potential energy and low effective dimension with much fewer samples and computational resources than traditional MonteCarlo or Molecular Dynamics based sampling. Combining the complementary advantages of the two methods can significantly improve the current state of the art of free energy and other integral computations specifically for small assemblies.
EASAL is tailored for such assemblies and can be used to improve accuracy and ergodicity guarantees for Monte Carlo trajectories. It can also help explain the behavior of MC trajectories by using EASAL to infer geometric and topological features of the configuration space.
EASAL can additionally be used to evaluate and complement MC’s performance. EASAL could help MMCS to find high energy barriers and force the simulations to move to the locations of low energy basins. Additionally, lower energy regions located by EASAL can be used as seeds for MC trajectories, to speed up traversal of the entire configuration space. To validate these hypotheses, an MC simulation enhanced with EASAL will have to be compared with a standard MC simulation in terms of accuracy as well as resourceintensiveness.
References
 Janin, J. Mol. BioSyst. 2010, 6, 2351–2362.
 Ritchie, D. W. Current Protein and Peptide Science 2008, 9, 1–15, doi:10.2174/138920308783565741.
 Vajda, S.; Kozakov, D. Current Opinion in Structural Biology 2009, 19, 164 –170.
 FernándezRecio, J.; Sternberg, M. J. E. Prot.Struct.Func.Bioinfo. 2010, 78, 3066, 3065.
 Ozkan, A.; Sitharam, M. EASAL: Efficient Atlasing, Analysis and Search of Molecular Assembly Landscapes. Proceedings of the ISCA 3rd International Conference on Bioinformatics and Computational Biology, 2011.
 Ozkan, A.; Wu, R.; Peters, J.; Sitharam, M. (on arxiv).
 Ozkan, A.; Sitharam, M. (on arxiv).
 Ozkan, A.; Pence, J.; Wu, R.; Baker, T.; Willoughbyand, J.; Peters, J.; Sitharam, M. (on arxiv).
 Wu, R.; Ozkan, A.; Bennett, A.; AgbandjeMckenna, M.; Sitharam, M. Robustness Measure for an Adenoassociated Viral Shell Selfassembly is Accurately Predicted by Configuration Space Atlasing Using EASAL. Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine, New York, NY, USA, 2012; pp 690–695.
 Wu, R.; Ozkan, A.; Bennett, A.; AgbandjeMcKenna, M.; Sitharam, M. (on arxiv).
 Lazaridis, T.; Karplus, M. Proteins 1999, 35, 133–152.
 Lazaridis, T. Proteins 2003, 52, 176–192.
 Im, W.; Feig, M.; Brooks, C. L. Biophysical Journal 2003, 85, 2900–2918.
 Sitharam, M.; H.Gao, Discrete and Computational Geometry 2010.
 Chittamuru, U. Efficient Iterative algorithm for bounding and sampling the Cayley configuration space of partial 2trees in 3D; M.S. Thesis University Of Florida, 2010.
 Peters, J.; Fan, J.; Sitharam, M.; Zhou, Y. Elimination in generically rigid 3D geometric constraint systems. Proceedings of Algebraic Geometry and Geometric Modeling, Nice, 2004; pp 27–29.