A probabilistic model for crystal growth applied to protein deposition at the microscale
Abstract
A probabilistic discrete model for 2D protein crystal growth is presented. This model takes into account the available space and can describe growing processes of different nature due to the versatility of its parameters which gives the model great flexibility. The accuracy of the simulation is tested against a real protein (SbpA) crystallization experiment showing high agreement between the proposed model and the actual images of the nucleation process. Finally, it is also discussed how the regularity of the interface (i.e. the curve that separates the crystal from the substrate) affects to the evolution of the simulation.
1 Introduction
A crystal is a (three dimensional) periodic arrangement of repeating ‘structural motifs’, which can be atoms, molecules, or ions [1]. Technically, the process in which a (small) crystal becomes larger is called crystal growth. Crystallization is commonly referred to the fact of creating nucleation points first, followed by crystal growth. Crystal growth is important for the understanding of biomineralization, molecular diffusion and adsorption, or fractal formation [2, 3, 4]. Theoretical crystallization models distinguish between thermodynamic and kinetic conditions [5, 6]. For problems concerning growth rate and growth from limited resources, the Avrami [7, 8] and the Gompertz functions can be used, respectively, for data modelling [9].
Two dimensional invivo protein crystallization is commonly utilized to study the function and structure of proteins [10]. Insitu crystal growth can also be used to test growth and molecular adsorption models. In this work we have taken advantage of a bacterial protein, SbpA from Lysinibacillussphaericus CCM2177, which forms crystalline arrays on different interfaces [11]. In particular, SbpA is able to selfassemble from solution building square lattices on silicon dioxide and selfassembly monolayers [12]. Former studies on SbpA crystallization indicate on one hand that the process consists of a transition from amorphous to crystalline state [13], and on another hand, that proteinsubstrate determines the properties of the crystalline domain [14].
In this work we propose an approach to describe and model 2D protein crystal growth. The structure of the paper is as follows. In Section 2 we present the probabilistic model and define all the parameters involved in it. This model takes into account the available space for growing and also reproduces different shapes and porosity (or lacunarity). In Section 3 we analyze the different parameters and how they affect the development of the simulation. All these different tunable parameters give the model a very high flexibility and allow us to successfully simulate real protein crystallization processes. Finally, in Section 4 we test the model by simulating the growth of a SbpA crystal. We conclude the paper discussing how the regularity of the interface affects the evolution of the crystallization process.
2 The model
The model we present here is a discretespace discretetime model for a square region in which a regular square mesh is defined. At each step , one cell is filled by the crystal, and the corresponding time is denoted by , with the initial time. In our model, is computed at each step and it determines only the growth rate of the crystal, but it does not affect to the structure, i.e. its shape.
2.1 Structure
The space filled by the crystal when the th cell is occupied is given by the occupation matrix , defined by if cell is occupied, and if it is free.
In this way, we also define a probability matrix so that the probability of the cell to be occupied at the th step is given by . Nevertheless, in the simulation procedure we shall use a relative probability matrix which gives the number of chances that has a cell to be occupied, and it is related with in this way:
For example, if we want that each cell has exactly the same probability of being occupied at the beginning, then we can set . On the other hand, if we want the cells on a particular region to be twice as likely to be occupied at the beginning, then we can set for such cells and otherwise. Moreover, we can also force to have, for example, only one crystallization nucleus taking except for one cell, as it is done in Figure 5 with a cell in the middle of the square region.
In our model, proteins will depose onto the substrate, occupying cells, following the next three structure rules:

If a cell is occupied, it cannot be occupied again, i.e. if , then .

If a cell is occupied, then the probability of occupation of the adjacent free cells is increased, see (1).
This rules are independent of and hence they only determine the structure of the crystal, not its growth rate. Note that the first rule implies that this is a fully 2D model, i.e. no height increase is considered here.
Also note that, taking into account the second rule, the space scale is much larger than the characteristic length of the crystal structure. That is, each cell does not represent a single crystal, but a larger quantity. In fact, the scale is such that one occupied cell only increases the probability of occupation of the adjacent cells.
In our model, we propose that the chance of occupation of a free cell at the th step is given by
(1) 
where is a positive constant, is the number of adjacent occupied cells, and is a function related to the available area surrounding the free cell (see (2)). The real parameter determines the importance of the adjacent occupied cells and it is usually set to (see Section 3). Note that, if a free cell has not any adjacent occupied cell, then its chance of occupation is .
Remark 2.1.
In order to speed up the algorithm, the third rule can be modified by only considering free cells adjacent to an occupied cell and so, equation (1) would only apply to free cells with . Hence, the chance of occupation of a free cell with no adjacent occupied cells (i.e. ) would be instead of . It gives practically the same results when the number of crystallization nuclei is low and the algorithm is much faster.
For determining we have to take into account that in the diagonal directions there is a weight factor of (see Figure 1). This weight factor is set in order to avoid squareshaped evolution produced by the discretization in space, where circular ones should appear.
With respect to , first we have to define some concepts. The effective radius, , of a free cell in a given direction with an opening angle is the distance from the cell to the nearest occupied cell lying in the range of this direction. Then, the corresponding effective area is the area of the circular section with radius and angle (see Figure 2). The underlying idea is that the effective area is a zone where the free cell can capture proteins, but in this model, it is very simplified.
Although could be a parameter, in our model we take because it greatly simplifies the discretized algorithm and it produces results matching the real measures (see Section 3). Moreover, we only take into account effective radii lesser than a given parameter called influence radius, , beyond which, occupied cells do not hinder the crystal growth. So, if (even ) then we take . Next, we define the maximum effective radius of a free cell at the th occupation stage as the maximum of the corresponding effective radius, but only in the eight main directions (for operational purposes): east, northeast, north, northwest, west, southwest, south and southeast.
Hence, for a free cell, we define
(2) 
where , , and hence, . The parameter determines the importance of the maximum effective radius in the model and it is usually taken close to . Setting means that the probability of occupation of a cell does not depend on the maximum effective radius and then, rule 3 is not satisfied. On the other hand, we usually set because in this case is the ratio between the effective area and the maximum possible effective area (i.e. the area of the circular section of radius and angle ). For convenience, for an occupied cell. With this statement, rule 1 is a consequence of equation (1).
2.2 Kinetics
Let be the area defined by the crystal measured in occupied cells. Note that, although in this model is a step function, in this work we model the physical phenomena in which can be considered as a derivable real variable function. Nevertheless, we are going to consider the occupied area proportion
(3) 
that ranges from to . In this section, we are going to set and hence . Moreover, will denote the number of expected occupied cells per time unit at . It is not too important for the kinetics in a qualitative sense because, for example, a simulation with will produce practically the same results as with (leaving all other parameters unchanged), but in half the time.
With respect to the crystal growth rate, we can fit the the Avrami function, designed for modeling crystallization and some chemical reactions [7, 8]:
(4) 
where and . However, in our case, where the crystal growth is 2dimensional, can be considered equal to . Regarding , it is given by the initial crystal growth rate, since
(5) 
The Avrami function fits well in the first stages of the crystal growth, but when the free space resource becomes scarce there are bigger differences (see Figure 8) since the Avrami function does not take into account this kind of resource. To solve this problem in the last stages, we can fit the Gompertz function, designed for growth with limited resources (in our case, the free space) [9]:
(6) 
where . But note that this function can only model the crystal growth rate in an advanced state, since the initial occupied area in the Gompertz model is not zero. So, it can be done a piecewise fit using the Avrami function (4) for the first stages, and the Gompertz function (6) for the rest, when the lack of free space becomes significant.
Finally, as an alternative for the piecewise fit, we can use a unique “hybrid” AvramiGompertz function:
(7) 
where . Note that (7) can only be applied in a time interval with because, according to (7), for . Moreover, the parameter in (7) is also equal to , as in (4), but the parameter in (7) is not the same as in (6).
3 The parameters
There are two kinds of parameters: structural and kinetic. The structural parameters determine the shape of the crystal, while the kinetic parameters are responsible for the crystal growth rate. We should also mention the parameter (size of the square mesh) which is of great importance since its value affects both, the structure and the kinetics of the crystal growth.
3.1 Structural parameters

: the “difficulty of filling” parameter, see (2). It ranges from to and determines the importance of the maximum effective radius in the model. It is also related to the rate at which void regions are filled. Setting means that the probability of occupation of a cell does not depend on the maximum effective radius. In Figure 5 it is shown how this parameter (along with ) affects the shape of the crystal.

: the “effective dimension” parameter, see (2). It must be positive and, as it is explained before, in our model it is set to . Also determines how the maximum effective radius affects the crystal growth.
3.2 Kinetic parameters

: Parameters of the Gompertz function (6), that models the last stages of the process.
4 Results
4.1 Protein recrystallization
In order to validate the model we tested it against the growth of a bacterial which crystallizes forming structures called Slayers (SbpA). This SbpA protein was recrystallized on a SiO substrate and the overall process was scanned by an AFM obtaining height images (see [12] for the specific details of the experiment). In Figure 7, the first row of images shows the crystallization process for times ranging from 10 to 110 minutes. The crystal growth continues steadily until more than the 98% of the substrate is covered 12 hours later.
Taking the images of the crystal deposition at different time stamps, the total surface occupied by the crystal was estimated by transforming the images to black and white 8bits images and using a selfdeveloped MATLAB code which computes the number of black pixels over the total amount of pixels of the image. The second row of images in Figure 7 depicts, as an example, the transformation into black and white images of the original images shown in the top row.
Once the images are transformed and the occupied fraction is estimated, a growth curve is fit to the data according to the growth models described in Section 2.2 (see Figure 8). It is noteworthy how the Avrami model fits the data accurately at the beginning, but the growth rate for later moments is too high because it does not consider the limitations of space. As the crystal grows, the available space is reduced and so is the crystal growth rate. On the other hand, the Gompertz model captures the slow growth rate from the first 30 minutes, but it is not suitable for the first stages of the crystallization process. In fact, since the Gompertz growth model does not vanish at , it cannot be applied there. Finally, the AvramiGompertz model is able to accurately fit the growth rate of the recrystallization process both at the beginning and at the final steady state moments. The nonlinear least square fits were performed using the R statistical software [15] and the LevenbergMarquardt algorithm provided by the minpack.LM package [16].
In order to assess the validity of the model as a description of the recrystallization process at the micro scale, we run several simulations using as initial condition the black and white image of the crystallization at minutes. The choice of this initial condition is based on the fact that, with a 50% of occupied fraction, despite the stochastic nature of the model, the simulation would give results that could be compared to the experimental data. Figure 9 shows the evolution of the recrystallization process for one of the runs of the simulation (all of them gave results which were indistinguishable at a glance). The top row shows the black and white pictures obtained from the original AFM images while the second row depicts the simulation results at the same occupation fractions. The equivalence between the occupation fraction and the time stamp was obtained by the AvramiGompertz kinetic model. The initial (leftmost) pictures are the same in both rows. It is remarkable the close agreement between the simulation results and the experimental data.
4.2 Influence of the contour regularity
Once the growth model has proven suitable for describing the protein recrystallization process, the question of the dependence of the growth rate on the regularity of the interface contour arises. The rationale behind all this is based on the fact that every system will evolve so its free energy is minimized. In crystal growth processes, the energy is highly dependent on the surface tension, or the line tension (for a 2D interface) [17]. Therefore, the higher the length of this tension line is, the higher the free energy will be. Thus if the system seeks to minimize the free energy in the fastest possible way, the crystal should grow the most in the parts were the interface line is longer. Since the length of a curve is closely related to its regularity, we should find that a crystal with an irregular contour grows faster (under the same conditions) than a crystal with a regular one.
In order to assess how the regularity of the contour affects the growth process in our model, a simulation experiment was conducted with the following initial condition: a regular and an irregular nucleus both with the same size (measuring 10000 occupied cells) were placed in opposite corners of a square big enough so the existence of any of the two nuclei did not affect the evolution of the other (see Figure 10 left picture). We run 2000 iterations of the simulation setting the parameters to , and . We also set for cells with no adjacent occupied cells so no new nuclei could appear. Figure 10 shows how the regular nucleus (bottom left corner) grows slower than the irregular nucleus (top right corner) and how its contour becomes less and less regular as the crystal deposition continues.
For a quantitative estimation of the differences between the regular and irregular nuclei, 100 simulations were conducted as described above for and . Figure 11 shows how the growth of the regular nucleus is slower than the irregular one. For example, in the case of , from the 2000 new cells grown by the simulation, only around 800 were added to the regular nucleus, while the other 1200 were added to the irregular one. It is also remarkable how, at the beginning, the differences in the growth rates are higher, and as the contour of the regular nucleus becomes more and more irregular, the growth rates are equalized. This also becomes clear in Figure 12, in which the ratio between the probabilities of occupation of a cell adjacent to both nuclei is plotted vs. the number of new deposited cells (). It can be seen how the curves are above 1, meaning that the irregular nucleus has more occupation probability than the regular one and how the curves decrease as the crystals grow, due to the fact that the differences in regularity between the two nuclei are less marked. In the long run, the quotients are asymptotically stabilized around a value slightly greater than 1, because the irregular nucleus has grown more in the initial stages of the simulation and these differences remain over time.
References
 [1] Charles Kittle. Introduction to Solid State Physics, 7th ed. Wiley India Pvt. Limited, 2007.
 [2] Arthur Veis and Jason R. Dorvee. Biomineralization mechanisms: A new paradigm for crystal nucleation in organic matrices. Calcified Tissue International, 93(4):307–315, Oct 2013.
 [3] Chengbin Huang, Shigang Ruan, Ting Cai, and Lian Yu. Fast surface diffusion and crystallization of amorphous griseofulvin. The Journal of Physical Chemistry B, 121(40):9463–9468, 2017. PMID: 28885850.
 [4] N. M. Semenova, A. A. Dick, V. N. Skokov, and V. P. Koverda. Fractal structurization under crystallization of amorphous germanium in threelayer films aggeag. physica status solidi (a), 139(2):287–293, 1993.
 [5] C. Himawan, V.M. Starov, and A.G.F. Stapley. Thermodynamic and kinetic aspects of fat crystallization. Advances in Colloid and Interface Science, 122(1):3 – 33, 2006. ECICXVII, XVIIth European Chemistry at Interfaces Conference, 27 June1 July, 2005, Department of Chemical Engineering, Loughborough University.
 [6] Deliang Zhou, Geoff G. Z. Zhang, Devalina Law, David J. W. Grant, and Eric A. Schmitt. Thermodynamics, molecular mobility and crystallization kinetics of amorphous griseofulvin. Molecular Pharmaceutics, 5(6):927–936, 2008. PMID: 18986160.
 [7] Melvin Avrami. Kinetics of phase change. i general theory. The Journal of Chemical Physics, 7(12):1103–1112, 1939.
 [8] Melvin Avrami. Kinetics of phase change. ii transformationtime relations for random distribution of nuclei. The Journal of Chemical Physics, 8(2):212–224, 1940.
 [9] Anna Kane Laird. Dynamics of tumor growth. British Journal of Cancer, 18(3):490–502, sep 1964.
 [10] Jonathan P.K. Doye and Wilson C.K. Poon. Protein crystallization in vivo. Current Opinion in Colloid and Interface Science, 11(1):40 – 46, 2006.
 [11] Uwe B Sleytr, Margit Sára, Dietmar Pum, and Bernhard Schuster. Characterization and use of crystalline bacterial cell surface layers. Progress in Surface Science, 68(7):231 – 278, 2001.
 [12] Aitziber Eleta López, Susana MorenoFlores, Dietmar Pum, Uwe B. Sleytr, and José L. TocaHerrera. Surface dependence of protein nanocrystal formation. Small, 6, 2010.
 [13] Sungwook Chung, SeongHo Shin, Carolyn R. Bertozzi, and James J. De Yoreo. Selfcatalyzed growth of s layers via an amorphoustocrystalline transition limited by folding kinetics. Proceedings of the National Academy of Sciences, 107(38):16536–16541, 2010.
 [14] Ainhoa Lejardi, Aitziber Eleta López, José R. Sarasua, U. B. Sleytr, and José L. TocaHerrera. Making novel biointerfaces through bacterial protein recrystallization on biocompatible polylactide derivative films. The Journal of Chemical Physics, 139(12):121903, 2013.
 [15] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2017.
 [16] Timur V. Elzhov, Katharine M. Mullen, AndrejNikolai Spiess, and Ben Bolker. minpack.lm: R Interface to the LevenbergMarquardt Nonlinear LeastSquares Algorithm Found in MINPACK, Plus Support for Bounds, 2016. R package version 1.21.
 [17] A.L. Barabási and H. E. Stanley. Fractal Concepts in Surface Growth. Cambridge University Press, Cambridge, 1995.