# Knoto-ID: a tool to study the entanglement of open protein chains using the concept of knotoids

###### Abstract

The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

Abstract: The backbone of most proteins forms an open curve.
To study their entanglement, a common strategy is to search for knots in their backbones using topological invariants.
However, this approach requires closing the curve into a loop, which alters the geometry of the curve.
Knoto-ID evaluates the entanglement of open curves without the need to close them, using the recent concept of knotoids, which generalizes classical knot theory to open curves.
Knoto-ID can analyse the global topology of the full chain as well as the local topology, either by exhaustively studying all subchains or by determining only the knotted core.

Availability and Implementation: Knoto-ID is written in C++ and includes R (www.R-project.org) scripts to generate plots of projection maps, fingerprint matrices and disk matrices.
Knoto-ID is distributed under the GNU General Public License (GPL), version 2 or any later version and is available at https://github.com/sib-swiss/Knoto-ID. A binary distribution for Mac OS X and Linux with detailed user guide and examples can be obtained from http://www.vital-it.ch/software/Knoto-ID.

Contact: julien.dorier@sib.swiss

## I Introduction

The observation that protein backbones can form knots (Mansfield (1994)) initiated numerous studies of their nature and of the potential advantages or disadvantages they may provide (e.g. Virnau et al. (2006); Dabrowski et al. (2016)). In this context, it was important to classify protein knots in terms of their topology.

A knot is a closed curve in 3-dimensional space that does not intersect itself and can be freely deformed as long as it does not pass through itself (Adams (1994)). However, the backbones of many biomolecules, and specifically of many proteins, correspond to open spatial curves and so, in a strict topological sense, such curves are classified as unknotted. Until recently, the only way to study the topology of an open protein chain was to first close it and then proceed with the study of its entanglement. Of course, closing the chain alters its geometry. In 2012, V. Turaev introduced the concept of knotoids as a generalization of classical knot theory to open curves (Turaev (2012)). Knotoids were studied further by N. Gügümcü and L. H. Kauffman (Gügümcü et al. (2017)). As a consequence, a number of studies emerged that applied this new mathematical tool to the analysis of protein backbones (Alexander et al. (2017); Goundaroulis and Dorier et al. (2017); Goundaroulis and Gügümcü et al. (2017)).

In this note we introduce Knoto-ID, a command line tool that is able to analyse and classify open spatial curves using this new mathematical concept. Moreover, if a knot analysis is required, Knoto-ID can close an open 3D curve with either a direct closure (e.g. Taylor (2000)) or the uniform closure technique (e.g. Sulkowska et al. (2012)). This note focuses on individual open protein chains; however, Knoto-ID can be used to analyse any open linear conformation in 3-space, such as chromosomes (Siebert et al. (2017)), synthetic polymers and random walks.

## II Implementation

To analyse a protein, the coordinates of the atoms of the protein backbone have to be extracted from a .pdb file downloaded from the PDB (www.rcsb.org, Berman et al. (2000)) and stored in a text file, which is the input of Knoto-ID. The backbone is then placed inside a sufficiently large enclosing sphere. Each point of the sphere defines a direction of projection. For a given direction of projection, two infinite lines are introduced, one through each terminus of the curve, both parallel to the chosen direction. A triangle elimination method based on the KMT algorithm (Koniaris and Muthukumar (1991); Taylor (2000)) is then applied to simplify the curve while preserving its underlying topology with respect to the two parallel lines. A knotoid diagram is obtained by projecting the curve on an oriented surface (a plane or a sphere). Finally, a topological invariant is evaluated on the knotoid diagram. For curves projected on a sphere the topological invariant is the Jones polynomial for knotoids (Turaev (2012)), while for curves projected on a plane it is the Turaev loop bracket polynomial (Turaev (2012)) (see the Knoto-ID user guide for a brief description of the theory). The knotoid type corresponding to the resulting polynomial can optionally be obtained using a list of knotoid types distributed with Knoto-ID. Different choices of projection direction may yield different diagrams, so one has to sample an adequate number of projection directions in order to approximate the spectrum of knotoid types that corresponds to the spatial curve. The spectrum of knotoid types can be visualized using projection maps generated by Knoto-ID (Figure 1a and 1b). Knoto-ID is also able to handle closed chains as input, or to create a closed chain from an open one using either direct or uniform closure.
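The first steps of this pipeline can be illustrated with a minimal Python sketch. It is not Knoto-ID code: the function names are hypothetical, the backbone is assumed to be represented by the C-alpha atoms of a single chain, and the output layout (one point per line, three whitespace-separated coordinates) is an assumption, so the user guide should be consulted for the accepted input formats. The sketch extracts coordinates from PDB-format lines, samples projection directions uniformly on the unit sphere, and projects the curve onto the plane perpendicular to one direction while keeping the height along that direction, which is the information needed to decide over- and under-crossings. The KMT simplification and the polynomial evaluation are not shown.

```python
import math
import random


def extract_ca_coordinates(pdb_lines, chain_id="A"):
    """Collect (x, y, z) of the C-alpha atoms of one chain from PDB-format lines."""
    coords = []
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # keep only the first model of a multi-model file
        # Fixed PDB columns (1-based): atom name 13-16, altLoc 17, chain 22,
        # x 31-38, y 39-46, z 47-54.
        if (len(line) >= 54
                and line.startswith("ATOM")
                and line[12:16].strip() == "CA"
                and line[16] in (" ", "A")   # skip alternate locations other than A
                and line[21] == chain_id):
            coords.append((float(line[30:38]),
                           float(line[38:46]),
                           float(line[46:54])))
    return coords


def to_xyz_text(coords):
    """Format points as text, one point per line (assumed input layout)."""
    return "\n".join("{:.3f} {:.3f} {:.3f}".format(*p) for p in coords) + "\n"


def sample_directions(n, seed=0):
    """Draw n directions uniformly on the unit sphere (normalized Gaussian vectors)."""
    rng = random.Random(seed)
    directions = []
    while len(directions) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            directions.append(tuple(c / norm for c in v))
    return directions


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def project_to_plane(points, direction):
    """Return (u, v, height) per point: in-plane coordinates plus height along direction.

    `direction` is assumed to be a unit vector.
    """
    d = direction
    # Helper axis that is guaranteed not to be parallel to d.
    helper = (1.0, 0.0, 0.0) if abs(d[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = _cross(d, helper)
    norm = math.sqrt(_dot(u, u))
    u = tuple(c / norm for c in u)
    v = _cross(d, u)  # unit length because d and u are orthonormal
    return [(_dot(p, u), _dot(p, v), _dot(p, d)) for p in points]
```

In this sketch each sampled direction yields one projected curve; in the procedure described above, each such projection would then be simplified and turned into a knotoid diagram before the invariant is computed.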

Knoto-ID is also able to analyse all subchains of a given curve to produce a fingerprint matrix (King et al. (2007)) in the case of open chains, or a disk matrix (Rawdon et al. (2015)) in the case of closed chains (see Figure 1c and 1d). In addition, Knoto-ID can find the knotted core of the chain, which is the shortest subchain obtained by progressively shortening the input chain one point at a time without changing the dominant knot or knotoid type in the process.
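The two subchain analyses can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the Knoto-ID implementation: `classify` is a hypothetical callable that returns the dominant knot or knotoid type of a list of points (in practice it would wrap the projection and invariant steps), and the order in which the two ends are trimmed is an assumption, since the text above only states that the chain is shortened one point at a time while the dominant type is preserved.

```python
def fingerprint_matrix(points, classify, min_length=2):
    """Map each (start, end) index pair, end inclusive, to the type of that subchain."""
    n = len(points)
    matrix = {}
    for start in range(n):
        for end in range(start + min_length - 1, n):
            matrix[(start, end)] = classify(points[start:end + 1])
    return matrix


def knotted_core(points, classify):
    """Greedily trim one point at a time from either end while the type is unchanged.

    Returns the (start, end) indices, end inclusive, of the remaining subchain.
    The alternation between the two ends is an assumption of this sketch.
    """
    target = classify(points)
    start, end = 0, len(points) - 1
    changed = True
    while changed:
        changed = False
        # Try dropping the last point.
        if end - start >= 1 and classify(points[start:end]) == target:
            end -= 1
            changed = True
        # Try dropping the first point.
        if end - start >= 1 and classify(points[start + 1:end + 1]) == target:
            start += 1
            changed = True
    return start, end
```

For a chain whose dominant type is trivial, this greedy trimming simply shrinks the subchain to a single point, so the result is only meaningful when the full chain has a non-trivial type.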

## III Conclusion

Knoto-ID is the first tool that is able to handle, analyse and classify open linear conformations in 3-space, such as proteins, in terms of their topology without requiring them to be closed into a loop, using the concept of knotoids.

## Acknowledgements

This is the Author’s Original Version of the article that has been accepted for publication in Bioinformatics, published by Oxford University Press.

We thank Louis H. Kauffman for fruitful discussions and Frédéric Schütz for his advice on Knoto-ID packaging. We also thank Eric Rawdon, Elizabeth Annoni and Nicole Lopez for kindly providing the list of projections distributed with Knoto-ID.

## Funding

The work was funded in part by the Leverhulme Trust (RP2013-K-017) and by the Swiss National Science Foundation (31003A-138267), both credited to Andrzej Stasiak.

## References

- Adams (1994) Adams, C. C. (1994) The Knot Book. Freeman, New York.
- Alexander et al. (2017) Alexander, K., Taylor, A.J., Dennis, M. (2017) Proteins analysed as virtual knots. Sci. Rep., 7, 42300.
- Berman et al. (2000) Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., Bourne, P.E. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235-242.
- Dabrowski et al. (2016) Dabrowski-Tumanski, P., Stasiak, A., Sulkowska, J.I. (2016) In Search of Functional Advantages of Knots in Proteins. PloS one, 11(11), e0165986.
- Goundaroulis and Dorier et al. (2017) Goundaroulis, D., Dorier, J., Benedetti, F., Stasiak, A. (2017) Studies of global and local entanglements of individual protein chains using the concept of knotoids. Sci. Rep., 7, 6309.
- Goundaroulis and Gügümcü et al. (2017) Goundaroulis, D., Gügümcü, N., Lambropoulou, S., Dorier, J., Stasiak, A., Kauffman, L. (2017) Topological models for open-knotted protein chains using the concepts of knotoids and bonded knotoids. Polymers, 9, 444.
- Gügümcü et al. (2017) Gügümcü, N. and Kauffman, L. H. (2017) New invariants of knotoids, Eur. J. Combin., 65, 186-229.
- King et al. (2007) King, N. P., Yeates, E. O., Yeates, T. O. (2007) Identification of rare slipknots in proteins and their implications for stability and folding, J. Mol. Biol., 373, 153-166.
- Koniaris and Muthukumar (1991) Koniaris, K., Muthukumar, M. (1991) Self-entanglement in ring polymers. J. Chem. Phys., 95(4), 2873-2881.
- Mansfield (1994) Mansfield, M.L. (1994) Are there knots in proteins? Nat. Struct. Biol., 1, 213-214.
- Rawdon et al. (2015) Rawdon, E. J., Millett, K. C., Stasiak, A. (2015) Subknots in ideal knots, random knots, and knotted proteins. Sci. Rep., 5, 8298.
- Shi et al. (2006) Shi, Dashuang, Yu, Xiaolin, Roth, Lauren, Morizono, Hiroki, Tuchman, Mendel, Allewell, Norma M. (2006) Structures of N-acetylornithine transcarbamoylase from Xanthomonas campestris complexed with substrates and substrate analogs imply mechanisms for substrate binding and catalysis, Proteins: Struct., Funct., Bioinf., 64, 1097-0134.
- Siebert et al. (2017) Siebert, J., Kivel, A., Atkinson, L., Stevens, T., Laue, E., Virnau, P. (2017) Are There Knots in Chromosomes? Polymers, 9, 317.
- Sulkowska et al. (2012) Sulkowska, J.I., Rawdon, E.J., Millett, K.C., Onuchic, J.N., Stasiak, A. (2012) Conservation of complex knotting and slipknotting patterns in proteins. Proc. Natl. Acad. Sci. U. S. A., 109, E1715.
- Taylor (2000) Taylor, W. R. (2000) A deeply knotted protein structure and how it might fold. Nature, 406(6798), 916.
- Turaev (2012) Turaev, V. (2012) Knotoids. Osaka J. Math., 49, 195-223.
- Virnau et al. (2006) Virnau, P., Mirny, L.A., Kardar, M. (2006) Intricate knots in proteins: Function and evolution. PLoS Comput. Biol., 2, 1074-1079.