ROMEO: A Plug-and-play Software Platform of Robotics-inspired Algorithms for Modeling Biomolecular Structures and Motions
Motivation: Due to the central role of protein structure in
molecular recognition, great computational efforts are devoted to
modeling protein structures and motions that mediate structural
rearrangements. The size, dimensionality, and non-linearity of the
protein structure space present outstanding challenges. Such
challenges also arise in robot motion planning, and robotics-inspired
treatments of protein structure and motion are increasingly showing
high exploration capability. Encouraged by such findings, we debut
here ROMEO, which stands for Robotics prOtein Motion ExplOration
framework. ROMEO is an open-source, object-oriented platform that
allows researchers access to and reproducibility of published
robotics-inspired algorithms for modeling protein structures and
motions, as well as facilitates novel algorithmic design via its
Availability and implementation: ROMEO is written in C++ and is available in GitLab (https://github.com/). This software is freely available under the Creative Commons license (Attribution and Non-Commercial).
Protein molecules regulate virtually all processes that maintain and replicate a living cell, and their tertiary structure largely determines their interactions with molecular partners (Boehr and Wright, 2008). A detailed understanding of the structure(s) at the disposal of a protein for biological activity and of the motions that mediate rearrangements between different structures for activity modulation is central to understanding molecular mechanisms (Boehr et al., 2009). Decades of research have shown that such understanding cannot be obtained via experimentation or computation alone; in particular, while great progress is made on modeling biomolecular structures and motions, challenges related to the size, dimensionality, and non-linearity of structure spaces associated with macromolecules, such as proteins, remain (Maximova et al., 2016). Over the past decade, great advancements in the ability to explore complex protein structure spaces have come from robotics-inspired algorithms that leverage analogies between molecular and robot configuration spaces (Shehu and Plaku, 2016).
Encouraged by such progress, we debut here ROMEO, which stands for Robotics prOtein Motion ExplOration framework. ROMEO is an open-source, object-oriented platform that allows researchers access to and reproducibility of published robotics-inspired algorithms for modeling protein structures and motions. For instance, ROMEO provides templates for popular robotics motion planning algorithms, such as the Rapidly-exploring Random Tree (RRT) and Probabilistic RoadMap (PRM), and offers disseminated adaptations of these canonical algorithms that address diverse application settings, from template-free protein structure prediction to modeling of motions that mediate re-arrangements between stable and meta-stable structures (Amato et al., 2002; Shehu and Olson, 2010a; Molloy and Shehu, 2013; Devaurs et al., 2015).
2. Plug and Play Design
ROMEO follows a plug-and-play architecture to facilitate novel algorithmic design and so allow researchers to further advance algorithmic research in molecular biology. ROMEO is written in C++ and its object-oriented design allows easy adaptation and expansion of its classes. These classes have been designed around a core set of components shared by many motion planning algorithms, which we summarize below (see Figure S1 for a visualization of the architecture).
Representation, Energy, and Forward Kinematics: The cfg class represents a protein configuration. Included with this initial release is an extension of this class that interfaces with the Rosetta package (Bradley et al., 2005) and utilizes the coarse-grained representation known as the centroid mode, which tracks heavy backbone atoms and a side-chain centroid pseudo-atom via dihedral angles. Developers can easily extend this class to support Rosetta’s all-atom representation or others. A forward kinematics class allows projecting a configuration (under the selected representation) into Cartesian space space (retrieving Cartesian coordinates for each of the represented atoms). This class also associates an energy/fitness score with a configuration. In this initial release, ROMEO utilizes Rosetta scoring functions.
Planners: Sampling-based algorithms are popular in robot motion planning due to their ability to handle high-dimensional configuration spaces. ROMEO provides base classes for tree- and roadmap-based planners. This initial release includes support for RRT, PRM, and variants (the reader is referred to the review in (Shehu and Plaku, 2016) for a description of these algorithms). These planners share a common set of operations, some of which we highlight below. (i) ROMEO provides samplers to create new configurations and offspring generators to extend the tree or roadmap in a target direction. Producing new samples or offspring using traditional robotic techniques results in high-energy, unrealistic configurations. ROMEO offers offspring generators that employ molecular fragment replacement, which remedies this issue (Bradley et al., 2005). (ii) Acceptors are responsible for determining whether configurations are to be added to the roadmap/tree. ROMEO provides examples of extending this class via an energy threshold or dynamic techniques utilizing the Metropolis. (iii) ROMEO provides distance classes to measure the distance between two configurations, supporting both angle-based and Cartesian coordinate-based representations. (iv) Edge cost evaluators are responsible for weighting each of the transitions encoded into the graph/tree. ROMEO provides many extensions to a baseline class to support encoding basic distances, energetic differences, and those based on mechanical work (Devaurs et al., 2015). (v) A goal acceptor class determines if a given configuration is similar enough to a given goal configuration and has connectivity to a given start configuration.
3. Usage examples
We showcase two selected examples of how ROMEO can be utilized for modeling structures and motions.
Structure Prediction: We showcase the plug-and-play nature of ROMEO by the FeLTR algorithm proposed in (Shehu and Olson, 2010b) for decoy generation in template-free protein structure prediction. FeLTR grow a search tree in the protein’s configuration space to search for low-energy configurations. To implement FeLTR, the planner class is extended to utilize a low-dimensional projection to guide the exploration. This projection has been shown to strike a good balance between breadth (the unexplored frontier) and depth (low-energy). Only 300 lines of code were required to extend ROMEO to implement FeLTR (source code and examples of running this method are included in the software distribution). Section S1.1 in the Supplementary Information showcases results.
Motion Computation: Tree-based planners have been used to compute energetically-feasible paths connecting known structures of a protein. ROMEO includes an example of such an algorithm known as SPRINT (Molloy and Shehu, 2013). SPRNT grows a tree in the configuration space from a start configuration and biases its selection based on a low-dimensional projection based on a progress coordinate to the goal configuration. The software distribution includes an example of utilizing SPRINT on the cyanovirin-n protein, where the goal is to compute motions that connect two given structures more than Åapart from each-other. Details and results are available in Section S1.2 of the Supplementary Information.
The examples above illustrate the power of ROMEO’s plug-and-play architecture. This software can be useful in advancing algorithmic research structure and motion computation. It can also be employed in a classroom setting to allow instructors to initiate students in computational molecular biology and easily extend components of the framework.
This work has been partially supported by the National Science Foundation Grant No.1440581.
ROMEO is written in C++ and consists of a core set of components that are shared among all sampling-based robot motion planning algorithms. The object-oriented design of ROMEO allows easy adaptation and expansion of its core components, making it possible for developers to customize ROMEO for additional applications.
Figure 1 illustrates the class inheritance hierarchy of the software. The following sections summarize the purpose of each class and describe in greater detail selected member functions.
Choice of Representation
The cfg class is the base class to represent a configuration. The class stores a vector of the backbone dihedral angles of a protein structure and in doing so assumes an idealized geometry (where bond lengths and valence angles do not deviate from equilibrium values). This choice of representation is popular in template-free protein structure prediction and motion computation. We point that other applications, such as folding, necessitate extending the base class by allowing bond lengths and valence angles to deviate (thus including them in the representation) or by introducing other configuration parameters.
Since the potential energy associated with a given protein structure is used in many planning operations (for instance, when determining whether to accept a new configuration or when calculating the cost associated with moving from a current to a new configuration), energy is associated with a configuration and is also stored in the cfg class.
The class MolecularStructureRosetta provides the interface between ROMEO and the Rosetta software suite. This class, derived from the CfgForwardKinematics class offers the ability to extract a configuration from a Protein DataBank (PDB) (Berman et al., 2003) file (which is the conventional way of storing information about a tertiary structure) and to score a configuration using a selected Rosetta scoring/energy functions. Member functions are also provided to perform forward kinematics on a cfg object, that is, placing the protein in a particular configuration and obtaining the Cartesian coordinates for each atom in a given representation. ROMEO provides support for Rosetta’s centroid representation, which tracks heavy backbone atoms and a side-chain centroid pseudo-atom.
ROMEO utilizes sampling-based motion planning algorithms to explore the configuration space of a protein. Direct support is provided for two popular sampling-based planners: the Rapidly Exploring Random Tree (RRT) (LaValle and Kuffner, 2001) and the Probabilistic RoadMap (PRM) (Kavraki et al., 1996). As a proof of concept, to illustrate ROMEO’s plug and play architecture, the FeLTR (Shehu and Olson, 2010a) and SPRINT (Molloy and Shehu, 2013) methods have also been implemented in this release of ROMEO. By expanding the sampler classes, many variants of these planners can be implemented by other developers.
These planners all share a common set of operations or core components, such as: generating new configurations, determining the validity of a new configuration, measuring the distance between two configurations, and projecting configurations into the workspace. ROMEO abstracts these operations using a set of base classes that allow for easy “plug and play” replacements of these components.
Samplers and Offspring Generators
ROMEO offers two classes, one for sampling and another for offspring generation. The sampling class is employed is during the generation of at-random samples/configurations. Examples include obtaining in the RRT planner and generating landmark configurations to populate the roadmap/graph in PRM. The offspring generator class is used to modify an existing configuration, potentially by perturbing it in a given direction. This is employed, for instance, when extending in the direction of in the RRT extend step, or when connecting landmark configurations during the local planning step within the PRM planner. ROMEO extends these baseline classes to support utilizing Rosetta’s molecular fragment libraries for structure and motion computations.
The acceptor class is used when deciding whether a new configuration should be added to the graph or tree maintained by the planner. A simple acceptor test would be to set a maximum energy value for all configurations (thus, the acceptor verifies that new configurations are below this threshold). When studying molecular transitions, the Metropolis criterion (which utilizes differences in energy between two configurations) is commonly employed. For this reason, ROMEO also provides an acceptor class based on MMC; in particular, ROMEO implements the transition test utilized in the SPRINT (Molloy and Shehu, 2013) and Transition-based RRT (T-RRT) planners (Jaillet et al., 2011; Jaillet et al., 2008).
Planners utilize a distance function to determine nearby configurations. ROMEO comes with basic distance classes (Euclidean distance), as well as distances commonly used in the study of proteins (such as the least root-mean-square-distance, or lRMSD(McLachlan, 1972).
Selected Examples of Applicability
This section outlines two applications that utilize the ROMEO framework. The first example utilizes ROMEO to perform template-free protein structure prediction by implementing a robotics-inspired method known as FeLTR (Shehu and Olson, 2010a). The second example employs RRT to compute energetically-feasible paths that connect two functionally-relevant configurations of the cyanovirin-n protein. All source code and scripts to run these examples are included with the ROMEO distribution.
This example showcases the FeLTR method (Shehu and Olson, 2010a) for performing template-free protein structure prediction. We summarize how ROMEO’s “plug and play” architecture is utilized to easily implement FeLTR.
As described in greater detail in (Shehu and Olson, 2010a), FeLTr employs an EST-like planner that expands the search tree iteratively via selection and expansion operations and utilizes projection layers for configuration selection. The SelectionVertex function of ROMEO’s TreeSamplingBasedPlanner class is overloaded to support FeLTr’s novel node selection technique that utilizes a low-dimensional projection of a configuration. The AddVertex function is overloaded to place new vertices into FeLTr’s projection layers. Changes to the planner are limited to the addition of lines of code (in addition to supporting code for computing projection coordinates from configurations). The ease with which this complex method is implemented in ROMEO highlights the advantages of its object-oriented design.
ROMEO’s distribution provides the required scripts and configuration files to run FeLTR to predict the structure of the ibeta subdomain of the mu end DNA binding domain of phage mu transposase. Figure 2 showcases the closest (in terms of lRMSD) sampled structure when compared to the native structure cataloged in the PDB. Different weighting schemes can be explored when selecting nodes from FeLTR’s low-dimensional projection, as described in (Molloy et al., 2013). Figure 3 highlights some of the behavior of each of these weighting schemes. The quad scheme utilizes a greedy strategy, selecting nodes with the lowest energies with high probability. The linear and norm (Gaussian based) schemes approach the lower lRMSD structure more gradually, and have been shown for longer executions to provide closer samples to the native structure (Molloy et al., 2013).
ROMEO can also be used to compute the motions that mediate rearrangements between two distinct functionally-relevant structures. We highlight this capability here on the cyanovirin-n protein, where we treat two distinct structures (found under PDB IDs 2EZM and 1L5E) as start and goal structures located almost 16 Å lRMSD apart from one another. We highlight here an implementation of the SPRINT method (Molloy and Shehu, 2013) with ROMEO.
We executed ROMEO for 12 hours and tested two different energy acceptors. Rosetta’s score3 energy scheme was utilized with the radius of gyration terms disabled (since this rewards more compact structures). The first acceptor limited the acceptable energy of a configuration to no higher than 60 Rosetta Energy Units (REUs). The second scheme utilized the Metropolis criterion, setting the temperature, such that an increase of REUs had a 0.1 probability of being accepted. Each scheme resulted in pathways that ended with configurations (structures) within 3 Å lRMSD of the goal configuration. The energy profiles for each are shown in Figure reffig:CVNPathEnergy. As expected, the Metropolis acceptor (MMC) shows a more gradually-increasing and overall a lower-energy path compared to the max energy acceptor. A few sample structures along the pathway computed from the MMC execution are showcased in Figure 5.
- Amato et al. (2002) N. M. Amato, K. A. Dill, and G. Song. 2002. Using motion planning to map protein folding landscapes and analyze folding kinetics of known native structures. J. Comp. Biol. 10, 3-4 (2002), 239–255.
- Berman et al. (2003) H. M. Berman, K. Henrick, and H. Nakamura. 2003. Announcing the worldwide Protein Data Bank. Nature Struct Biol 10, 12 (2003), 980–980.
- Boehr et al. (2009) D. D. Boehr, R. Nussinov, and P. E. Wright. 2009. The role of dynamic conformational ensembles in biomolecular recognition. Nature Chem Biol 5, 11 (2009), 789–96.
- Boehr and Wright (2008) D. D. Boehr and P. E. Wright. 2008. How do proteins interact? Science 320, 5882 (2008), 1429–1430.
- Bradley et al. (2005) P. Bradley, K. M. S. Misura, and D. Baker. 2005. Toward High-Resolution de Novo Structure Prediction for Small Proteins. Science 309, 5742 (2005), 1868–1871.
- Devaurs et al. (2015) D. Devaurs, K. Molloy, M. Vaisset, A. Shehu, T. Simeon, and J. Cortes. 2015. Characterizing Energy Landscapes of Peptides Using a Combination of Stochastic Algorithms. IEEE Transactions on NanoBioscience 14, 5 (July 2015), 545–552. https://doi.org/10.1109/TNB.2015.2424597
- Jaillet et al. (2011) L. Jaillet, F. J. Corcho, J.-J. Perez, and J. Cortés. 2011. Randomized tree construction algorithm to explore energy landscapes. J Comput Chem 32, 16 (2011), 3464–3474.
- Jaillet et al. (2008) L. Jaillet, J. Cortés, and T. Siméon. 2008. Transition-based RRT for path planning in continuous cost spaces. In IROS. IEEE/RSJ, Stanford, CA, 22–26.
- Kavraki et al. (1996) L. E. Kavraki, P. Svetska, J.-C. Latombe, and M. Overmars. 1996. Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Transaction on Robotics and Automation 12 (1996), 566–580.
- LaValle and Kuffner (2001) S. M. LaValle and J. J. Kuffner. 2001. Randomized kinodynamic planning. IJRR 20, 5 (2001), 378–400.
- Maximova et al. (2016) T. Maximova, R. Moffatt, B. Ma, R. Nussinov, and A. Shehu. 2016. Principles and Overview of Sampling Methods for Modeling Macromolecular Structure and Dynamics. PLoS Comp. Biol. 12, 4 (2016), e1004619.
- McLachlan (1972) A. D. McLachlan. 1972. A mathematical procedure for superimposing atomic coordinates of proteins. Acta Crystallographica 26, 6 (1972), 656–657.
- Molloy et al. (2013) K. Molloy, S. Saleh, and A. Shehu. 2013. Probabilistic Search and Energy Guidance for Biased Decoy Sampling in Ab-initio Protein Structure Prediction. IEEE/ACM Trans. Bioinf. and Comp. Biol. 10, 5 (2013), 1162–1175.
- Molloy and Shehu (2013) K. Molloy and A. Shehu. 2013. Elucidating the Ensemble of Functionally-relevant Transitions in Protein Systems with a Robotics-inspired Method. BMC Struct. Biol. 13, Suppl 1 (2013), S8.
- Shehu and Olson (2010a) A. Shehu and B. Olson. 2010a. Guiding the Search for Native-like Protein Conformations with an Ab-initio Tree-based Exploration. Intl. J. Robot. Res. 29, 8 (2010), 1106–1127.
- Shehu and Olson (2010b) Amarda Shehu and Brian S. Olson. 2010b. Guiding the Search for Native-like Protein Conformations with an Ab-initio Tree-based Exploration. I. J. Robotics Res. 29, 8 (2010), 1106–1127.
- Shehu and Plaku (2016) A. Shehu and E. Plaku. 2016. A Survey of omputational Treatments of Biomolecules by Robotics-inspired Methods Modeling Equilibrium Structure and Dynamics. J Artif Intel Res 597 (2016), 509–572.