Learning to grow: control of materials self-assembly using evolutionary reinforcement learning

We show that neural networks trained by evolutionary reinforcement learning can enact efficient molecular self-assembly protocols. Presented with molecular simulation trajectories, networks learn to change temperature and chemical potential in order to promote the assembly of desired structures or choose between isoenergetic polymorphs. In the first case, networks reproduce in a qualitative sense the results of previously-known protocols, but faster and with slightly higher fidelity; in the second case they identify strategies previously unknown, from which we can extract physical insight. Networks that take as input the elapsed time of the simulation or microscopic information from the system are both effective, the latter more so. The network architectures we have used can be straightforwardly adapted to handle many input features and output control parameters, and so can be applied to a broad range of systems. Our results have been achieved with no human input beyond the specification of which order parameter to promote, pointing the way to the design of synthesis protocols by artificial intelligence.

Molecular self-assembly is the spontaneous organization of molecules or nanoparticles into ordered structures Whitesides et al. (1991); Biancaniello et al. (2005); Park et al. (2008); Nykypanchuk et al. (2008); Ke et al. (2012). It is a phenomenon that happens out of equilibrium, and so while we have empirical and theoretical understanding of certain self-assembling systems and certain processes that occur during assembly Doye et al. (2004); Romano and Sciortino (2011); Glotzer et al. (2004); Doye et al. (2007a); Rapaport (2010); Reinhardt and Frenkel (2014); Murugan et al. (2015); Whitelam and Jack (2015); Nguyen and Vaikuntanathan (2016); Whitelam et al. (2014a); Jadrich et al. (2017); Lutsko (2019), we lack a predictive theoretical framework for self-assembly. That is to say, given a set of molecules and ambient conditions, and an observation time, we cannot in general predict which structures and phases the molecules will form, and what will be the yield of the desired structure when (and if) it forms.

Figure 1: In this paper we show that neural-network policies trained by evolutionary reinforcement learning can enact efficient time- and configuration-dependent protocols for molecular self-assembly. A neural network periodically controls certain parameters of a system, and evolutionary learning applied to the weights of a neural network (indicated as colored nodes) results in networks able to promote the self-assembly of desired structures. The protocols that give rise to these structures are then encoded in the weights of a self-assembly kinetic yield net.

Absent such theoretical understanding, an alternative is to seek assistance from machine learning in order to attempt to control self-assembly without human intervention. In this paper we show that neural-network-based evolutionary reinforcement learning can be used to develop protocols for the control of self-assembly, without prior understanding of what constitutes a good assembly protocol. Reinforcement learning is a branch of machine learning concerned with learning to perform actions so as to achieve an objective Sutton and Barto (2018), and has been used recently to play computer games better than humans can Watkins and Dayan (1992); Mnih et al. (2013, 2015); Bellemare et al. (2013); Mnih et al. (2016); Tassa et al. (2018); Todorov et al. (2012); Puterman (2014); Asperti et al. (2018); Riedmiller (2005); Riedmiller et al. (2009); Schulman et al. (2017); Such et al. (2017); Brockman et al. (2016); Kempka et al. (2016); Wydmuch et al. (2018); Silver et al. (2016, 2017). Here we consider stochastic molecular simulations of patchy particles, a standard choice for representing anisotropically-interacting molecules, nanoparticles, or colloids Zhang and Glotzer (2004); Romano et al. (2011); Sciortino et al. (2007); Doye et al. (2007b); Bianchi et al. (2008); Doppelbauer et al. (2010); Whitelam et al. (2014b); Duguet et al. (2016). While a neural network cannot influence the fundamental dynamical laws by which such systems evolve Frenkel and Smit (1996), it can control the parameters that appear in the dynamical algorithm, such as temperature, chemical potential, and other environmental conditions. In this way the network can influence the sequence of microstates visited by the system. We show that a neural network can learn to enact a time-dependent protocol of temperature and chemical potential (called a policy in reinforcement learning) in order to promote the self-assembly of a desired structure, or choose between two competing polymorphs. 
In both cases the networks identify strategies different from those informed by human intuition, strategies that can be analyzed and used to provide new insight. We use networks that take only elapsed time as their input, and networks that take microscopic information from the system. Both learn under evolution, and networks that are given microscopic information learn better than those that are not.

Networks enact protocols that are out of equilibrium, in some cases far from equilibrium, and so are not well-described by existing theories. These “self-assembly kinetic yield” networks act to promote a particular order parameter for self-assembly at the end of a given time interval, with no consideration for whether a process results in an equilibrium outcome or not. This makes it distinct from feedback approaches designed to promote near-equilibrium behavior Klotsa and Jack (2013). Our approach is also complementary to efforts that use machine learning to analyze existing self-assembly pathways Long et al. (2015); Long and Ferguson (2014), or to infer or design structure-property relationships for self-assembling molecules Lindquist et al. (2016); Thurston and Ferguson (2018); Ferguson (2017). The network architectures we have used are straightforwardly extended to observe an arbitrary number of system features, and to control an arbitrary number of system parameters, and so the present scheme can be applied to a wide range of experimental and simulation systems.

In Section I we describe the evolutionary scheme, which involves alternating physical and evolutionary dynamics. In Section II we show that it can be used to increase the fidelity with which a certain structure assembles, until assembly fidelity is comparable to that of previously-known slow cooling protocols, but achieved in a fraction of the time. In Section III we show that networks can learn to select between two polymorphs that are equal in energy and that both form under slow cooling protocols. The strategy used by the networks to achieve this selection provides new insight into the self-assembly of this system. We conclude in Section IV. Networks learn these efficient and new self-assembly protocols with no human input beyond the specification of which target parameter to promote, pointing the way to the design of synthesis protocols by artificial intelligence.

I Evolutionary reinforcement learning of self-assembly protocols

The evolutionary learning scheme by which a self-assembly kinetic yield neural network learns is sketched in Fig. 1. We consider a computational model of molecular self-assembly, patchy discs of diameter σ on a two-dimensional square substrate of edge length L. The substrate (simulation box) possesses periodic boundary conditions in both directions. Discs, which cannot overlap, are minimal representations of molecules, and the patches denote their ability to make mutual bonds at certain angles. By choosing certain patch angles, widths, and binding-energy scales it is possible to reproduce the dynamic and thermodynamic behavior of a wide range of real molecular systems Whitelam et al. (2014b).

Two discs receive an energetic reward of −ε if their center-to-center distance lies between σ and a slightly larger cutoff, and if the line joining those discs cuts through one patch on each disc Kern and Frenkel (2003). In addition, we sometimes require patches to possess certain identities in order to bind, mimicking the ability of e.g. DNA to be chemically specific Whitelam (2016). In this paper we consider disc types with and without DNA-type specificity. Bound patches are shown green in figures, and unbound patches are shown black. In figures we often draw the convex polygons formed by joining the centers of bound particles Whitelam et al. (2014b). Doing so makes it easier to spot regions of order by eye, and polygon counts also serve as a useful order parameter for self-assembly. We denote by n_k the number of convex k-gons within a simulation box.
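To make the interaction concrete, here is a minimal Python sketch of a Kern-Frenkel-style patchy-disc pair energy. The function name, the cutoff of 1.1 disc diameters, and the patch half-width of 15 degrees are illustrative assumptions, not values taken from this work.

```python
import math

def pair_energy(pos_i, pos_j, patches_i, patches_j, eps,
                sigma=1.0, cutoff=1.1, half_width=math.pi / 12):
    """Hedged sketch of a Kern-Frenkel-style patchy-disc interaction.

    Two discs of diameter sigma receive an energetic reward -eps if
    their center-to-center distance lies between sigma and cutoff*sigma
    and the line joining their centers passes through one patch on each
    disc. Patch orientations are angles in radians; the cutoff and
    half_width values here are assumptions for illustration.
    """
    dx, dy = pos_j[0] - pos_i[0], pos_j[1] - pos_i[1]
    r = math.hypot(dx, dy)
    if r < sigma:
        return float("inf")        # hard discs: overlap is forbidden
    if r > cutoff * sigma:
        return 0.0                 # out of interaction range
    theta_ij = math.atan2(dy, dx)  # direction from disc i to disc j
    theta_ji = math.atan2(-dy, -dx)

    def through_patch(patch_angles, theta):
        # does the center-to-center line cut through any patch?
        return any(abs((a - theta + math.pi) % (2 * math.pi) - math.pi)
                   < half_width for a in patch_angles)

    if through_patch(patches_i, theta_ij) and through_patch(patches_j, theta_ji):
        return -eps
    return 0.0
```

With this convention, a bond forms only when both discs present a patch along the line joining their centers, which is what makes the interaction directional.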

Figure 2: (a) A 3-patch disc with chemically selective patches can form a structure equivalent to the 3.12.12 Archimedean tiling Whitelam (2016) (a tiling with a 3-gon and two 12-gons around each vertex). (b) The time network used with this disc in the evolutionary scheme of Fig. 1 produces progressively better yields of 12-gons with generation. We show the top 5 yields per generation, with the best shown in blue. The protocols leading to these yields are shown in (c,d), the better yields corresponding to rapid cooling and evacuation of the substrate, and snapshots from different generations are shown in (e). 12-gons are shown green, and 3-gons blue. (f) The microscopic network used in the same evolutionary scheme produces better yields than the time network, using (g,h) similar but slightly more nuanced protocols.

We simulated this system in order to mimic an experiment in which molecules are deposited on a surface and allowed to evolve. We use two stochastic Monte Carlo algorithms to do so. One is a grand-canonical algorithm that allows discs to appear on the substrate or disappear into a notional solution Frenkel and Smit (1996); the other is the virtual-move Monte Carlo algorithm Whitelam et al. (2009); Hedges () that allows discs to move collectively on the surface in an approximation of Brownian motion Haxton et al. (2015). If N is the instantaneous number of discs on the surface then we attempt virtual moves with a probability that grows with N, and attempt grand-canonical moves otherwise. Doing so ensures that particle deposition occurs at a rate (for fixed control parameters) that is roughly insensitive to substrate density. The acceptance rates for grand-canonical moves are given in Ref. Whitelam et al. (2014b) (essentially the textbook rates Frenkel and Smit (1996), modified to preserve detailed balance in the face of a fluctuating proposal rate). One such decision constitutes one Monte Carlo step 1.

The grand-canonical algorithm is characterized by a chemical potential μ, expressed in units of k_BT, the energy scale of thermal fluctuations. Positive values of this parameter favor a crowded substrate, while negative values favor a sparsely occupied substrate. If the interparticle bond strength ε is large, then there is, in addition, a thermodynamic driving force for particles to assemble into structures. (In experiment, bond strength can be controlled by different mechanisms, depending upon the physical system, including temperature or salt concentration; here, for convenience, we sometimes describe increasing ε as “cooling”, and decreasing ε as “heating”.) For fixed values of these parameters the simulation algorithm obeys detailed balance, and so the system will evolve toward its thermodynamic equilibrium. Depending on the parameter choices, this equilibrium may correspond to an assembled structure or to a gas or liquid of loosely-associated discs. For finite simulation time there is no guarantee that we will reach this equilibrium. Here we consider simulations or trajectories of fixed duration, measured in individual Monte Carlo steps (not sweeps, or steps per particle), starting from substrates containing 500 randomly-placed non-overlapping discs. These are relatively short trajectories in self-assembly terms: the slow cooling protocols of Ref. Whitelam (2016) used trajectories about 100 times longer.

Each trajectory starts with control-parameter values of ε and μ that do not give rise to self-assembly. As a trajectory progresses, a neural network chooses, at 1000 equally spaced times, changes Δε and Δμ in the two control parameters of the system (and so the same network acts 1000 times within each trajectory). These changes are added to the current values of the relevant control parameters, as long as the parameters remain within specified intervals (if a control parameter would move outside of its specified interval then it is returned to the edge of the interval). Between neural-network actions, the values of the control parameters are held fixed. Networks are fully-connected architectures with 1000 hidden nodes and two output nodes, and a number of input nodes appropriate for the information they are fed. We used tanh activations on the hidden nodes; the full network function is given in Section S1.
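The control-parameter update rule can be sketched as follows. The function and parameter names are illustrative, and the interval bounds are passed in as arguments because the specific ranges used in this work are not reproduced in this text.

```python
def apply_action(eps, mu, d_eps, d_mu, eps_range, mu_range):
    """Sketch: apply a network's proposed control-parameter changes.

    The proposed changes d_eps and d_mu are added to the current bond
    strength eps and chemical potential mu; a parameter that would
    leave its allowed interval is returned to the interval's edge.
    """
    def clamp(v, lo, hi):
        return min(max(v, lo), hi)

    return (clamp(eps + d_eps, *eps_range),
            clamp(mu + d_mu, *mu_range))
```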

Training of the network is done by evolution. We run 50 initial trajectories, each with a different, randomly-initialized neural network. Each network’s weights and biases are independent Gaussian random numbers with zero mean and unit variance. The collection of 50 trajectories produced by this set of 50 networks is called generation 0. After these trajectories have run we score each according to the number n_k of convex k-gons present in the simulation box; the value of k depends on the disc type under study and the structure whose assembly we wish to promote. The 5 networks whose trajectories have the largest values of n_k are chosen to be the “parents” of generation 1. Generation 1 consists of these 5 networks, plus 45 mutants. Mutants are made by choosing at random one of these parent networks and adding to each weight and bias a Gaussian random number of zero mean and variance 0.01. After simulation of generation 1, we repeat the evolutionary procedure in order to create generation 2. Alternating the physical dynamics (the self-assembly trajectories) and the evolutionary dynamics (the mutation procedure) results in populations of networks designed to control self-assembly conditions so as to promote certain order parameters.
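The alternation of physical and evolutionary dynamics described above can be sketched as follows. The trajectory is treated as a black-box fitness function, and the flat weight vector and function names are assumptions for illustration; only the population structure (5 parents plus 45 mutants, Gaussian mutations of variance 0.01) follows the text.

```python
import numpy as np

def evolve(run_trajectory, n_generations, n_weights,
           pop_size=50, n_parents=5, mutation_sigma=0.1, seed=0):
    """Sketch of the evolutionary scheme: alternate self-assembly
    trajectories (scored by run_trajectory, an assumed black box that
    returns the final polygon count for a flat weight vector) with
    truncation selection and Gaussian mutation (variance 0.01).
    """
    rng = np.random.default_rng(seed)
    # generation 0: independent standard-normal weights and biases
    population = [rng.normal(0.0, 1.0, n_weights) for _ in range(pop_size)]
    best = population[0]
    for _ in range(n_generations):
        scores = [run_trajectory(w) for w in population]
        order = np.argsort(scores)[::-1]          # descending fitness
        parents = [population[i] for i in order[:n_parents]]
        best = parents[0]
        # mutants: a randomly chosen parent plus Gaussian noise
        mutants = [parents[rng.integers(n_parents)]
                   + rng.normal(0.0, mutation_sigma, n_weights)
                   for _ in range(pop_size - n_parents)]
        population = parents + mutants
    return best
```

On a smooth toy fitness function this scheme hill-climbs reliably; in the setting of the paper the fitness is the stochastic outcome of a self-assembly trajectory, which is why well-performing protocols can still be eliminated between generations.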

Figure 3: A self-assembly trajectory produced by the best generation-18 microscopic network of Fig. 2(f–h). Panel (a) and the time-ordered snapshots in (d) show the dynamics to be hierarchical in an extreme way, with most 3-gons (blue) forming before 12-gons (green) are made. Snapshot times increase from top to bottom. More detail can be seen in snapshots by enlarging them on a computer screen. Defects, such as disordered regions and 10- and 14-gons, also form. Panel (b) shows the temperature and chemical potential protocols chosen by the network, and (c) shows the inputs to the network.

Each evolutionary scheme used one of two types of network. The first, called the time network for convenience, has a single input node that takes the value of the scaled elapsed time of the trajectory. The second, called the microscopic network for convenience, has one input node for each possible number m of engaged patches a disc can have (m runs up to the number of patches on the disc); each such node takes the value N_m/1000, where N_m is the number of particles in the simulation box that possess m engaged patches. The time network is chosen to explore the ability of a network to influence the self-assembly protocol if it cannot observe the system at all. The microscopic network is chosen to see if a network able to observe the system can do better than one that cannot. Note that the microscopic network sees the number of bonds possessed by a particle, but does not have access to the evolutionary order parameter n_k, the number of convex k-gons in the box. Thus it must learn the connection between these order parameters.
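The microscopic network's input vector can be built with a sketch like the following. The division by 1000 follows the text; the function name and the inclusion of the zero-bond count are assumptions made here.

```python
def microscopic_inputs(bonds_per_disc, n_patches, scale=1000.0):
    """Sketch: build the microscopic network's input vector.

    bonds_per_disc lists, for each disc on the substrate, its number of
    engaged patches (0..n_patches). Entry m of the returned vector is
    the number of discs with m engaged patches, divided by 1000 (the
    normalization stated in the text). Including the m = 0 count is an
    assumption made here for illustration.
    """
    counts = [0] * (n_patches + 1)
    for m in bonds_per_disc:
        counts[m] += 1
    return [c / scale for c in counts]
```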

Dynamical trajectories are stochastic, even given a fixed protocol (policy), and so networks that perform well in one generation may be eliminated in the next. This can happen if, for example, a certain protocol promotes nucleation, the onset time for which varies from one trajectory to another. By the same token, the best yield can decrease from one generation to the next, and independent trajectories generated using a given protocol have a range of yields. To account for this effect one could place evolutionary requirements on the yield associated with many independent trajectories using the same protocol. Here we opted not to do this, reasoning that over the course of several generations the evolutionary process will naturally identify protocols that perform well when measured over many independent trajectories.

II Promoting self-assembly

Figure 4: Evolutionary search of self-assembly protocols using the microscopic network (a) applied to a 4-patch disc with two distinct angles between patch bisectors. This disc can form a structure equivalent to an Archimedean tiling (b) or a rhombic structure (c). Evolutionary search can be used to do polymorph selection, promoting the assembly of the tiling (d–g) or the rhombic structure (h–k). In snapshots, 6-gons are dark blue, 3-gons are light blue, and 4-gons are red. More detail can be seen in snapshots by enlarging them on a computer screen.

In Fig. 2 we consider the “3.12.12” disc of Ref. Whitelam (2016), which has three chemically specific patches whose bisectors are separated by two distinct angles. This disc can form a structure equivalent to the 3.12.12 Archimedean tiling (a tiling with one 3-gon and two 12-gons around each vertex). The number of 12-gons is a suitable order parameter for the assembly of this structure. We use the time network within the evolutionary scheme of Fig. 1. Generation-0 trajectories are controlled by essentially random protocols, and many (e.g. those that involve weakening of interparticle bonds) result in no assembly (see Fig. S1). Some protocols result in low-quality assembly, and the best of these are used to create generation 1. Fig. 2(b) shows that assembly gets better with generation number: evolved networks learn to promote assembly of the desired structure. The protocols leading to these structures are shown in Fig. 2(c,d): early networks tend to strengthen bonds (“cool”) quickly and concentrate the substrate, while later networks strengthen bonds more quickly but also promote evacuation of the substrate. This strategy appears to reduce the number of obstacles to the closing of these relatively open and floppy structures. The most advanced networks refine these bond-strengthening and evacuation protocols.

The microscopic network [Fig. 2(f–h)] produces slightly more nuanced versions of the time-network protocols, and leads to better assembly. Thus, networks given access to configurational information learn more completely than those that know only the elapsed time of the procedure. In Fig. 3 we show in more detail a trajectory produced by the best generation-18 microscopic network. The self-assembly dynamics that results is a hierarchical assembly of the type seen in Ref. Whitelam (2016), in which trimers (3-gons) form first, and networks of trimers then form 12-gons, but is a more extreme version: in Fig. 3 we see that almost all the 3-gons made by the system form before the 12-gons begin to form. Thus the network has adopted a two-stage procedure in an attempt to maximize yield.

Networks given either temporal or microscopic information have therefore learned to promote self-assembly, without any external direction beyond an assessment, at the end of the trajectory, of which outcomes were best. Moreover, the quality of assembly is at least as good as under the intuition-driven procedures we have previously used. The procedure used in Ref. Whitelam (2016) was to increase ε linearly with time, at fixed μ, as slowly as possible in order to promote assembly and give defects plenty of time to anneal. In Fig. S2 we show a typical trajectory obtained under such a protocol, compared with a trajectory produced by the best generation-18 microscopic network from Fig. 2(f–h). The latter produces slightly better assembly as measured by the 12-gon count, and substantially fewer unnecessary 3-gons. Moreover, the neural network, which has been designed to act on short trajectories, produces structures 100 times faster than does slow cooling.

Here we have provided no prior input to the neural net to indicate what constitutes a good assembly protocol. One could also survey parameter space as thoroughly as possible, using intuition and prior experience of assembly, before turning to evolution. In such cases generation-0 assembly would be better. We found that even when generation-0 assembly was already of high quality, the evolutionary procedure was able to improve it. In Fig. S3 we consider evolutionary learning using the regular three-patch disc without patch-type specificity Whitelam et al. (2014b); Whitelam (2016). This disc forms the honeycomb network so readily that the best examples of assembly using 50 randomly-chosen protocols (generation 0) are already good. Nonetheless, evolution using the time network or microscopic network is able to improve the quality of assembly, with the microscopic network again performing better.

III Polymorph selection

Figure 5: Evolutionary search of self-assembly protocols using the microscopic network (left two columns) or the time network (right two columns) applied to a 4-patch disc with two distinct angles between patch bisectors. Networks instructed to maximize the number of 6-gons (columns 1 and 3) or 4-gons (columns 2 and 4) learn to promote the assembly of the Archimedean tiling or the rhombic structure. As in the other cases studied, the microscopic network is more effective than the time network. The boxed panels indicate the polygon number chosen as the evolutionary order parameter. We show the top 5 protocols per generation, with the best shown in blue.

In Fig. 4 we consider evolutionary search of self-assembly protocols using a 4-patch disc with two distinct angles between patch bisectors. This disc can form a structure equivalent to an Archimedean tiling (a tiling with two 3-gons and two 6-gons around each vertex), or a rhombic structure. Particles have equal energy within the bulk of each structure, and at zero pressure (the conditions experienced by a cluster growing in isolation on a substrate) there is no thermodynamic preference for one structure over the other. Slow cooling therefore produces a mixture of both polymorphs Whitelam (2016). The polymorph can be selected by making the patches chemically selective Whitelam (2016), but here we do not do this. Instead, we show that evolutionary search can be used to develop protocols able to choose between these two polymorphs.

In Fig. 4 we show results obtained using generation-10 microscopic neural networks. Panels (d–g) pertain to simulations in which n_6, the number of 6-gons, is the evolutionary order parameter. Panels (h–k) pertain to simulations in which n_4, the number of 4-gons, is the evolutionary order parameter. Neural networks have learned to promote the Archimedean-tiling polymorph (d–g) or the rhombic polymorph (h–k). Both contain defects and grain boundaries, but cover substantial parts of the substrate. In the case considered in Section II we already knew how to promote assembly – although the evolutionary protocol learned to do it more quickly and with higher fidelity – but here we did not possess advance knowledge of how to do polymorph selection using protocol choice.

Inspection of the polygon counts (d,h), the control-parameter histories (e,i), and the snapshots (g,k) provides insight into the selection strategies adopted by the networks. To select the Archimedean tiling (d–g) the network has induced a tendency for particles to leave the surface (small μ) and for bonds to be moderately strong (moderate ε). The balance of these tendencies appears to be such that trimers (3-gons), in which each particle has two engaged bonds, can form. Trimers serve as a building block for the tiling, which then forms hierarchically as the chemical potential is increased (and the bond strength slightly decreased). By contrast, the rhombic structure appears to be unable to grow because it cannot form hierarchically from collections of rhombi (which also contain particles with two engaged bonds): growing beyond a single rhombus involves the addition of particles via only one engaged bond, and these particles are unstable, at early times, to dissociation. To select the rhombic structure (h–k) the network selects moderate bond strength and concentrates the substrate by driving μ large. In a dense environment it appears that the rhombic structure is more accessible kinetically than the more open tiling. In addition, in a dense environment there is a thermodynamic preference for the more compact rhombic polymorph, a factor that may also contribute to selection of the latter.

The microscopic network receives microscopic information periodically from the system, but the information it receives – the number of particles with certain numbers of engaged bonds – does not distinguish between the bulk forms of the two polymorphs. Moreover, the time network learns substantially similar protocols, albeit with slightly less effectiveness: in Fig. 5 we show that both microscopic and time networks evolve to promote the self-assembly of one polymorph or the other.

Neural-network policies evolve so as to optimize certain structural order parameters, despite the fact that we provide no such information as input. The time network receives only elapsed time, and cannot view the system at all; the microscopic network receives only local bond information averaged over all discs. A network must therefore learn the relationships between these inputs, its resulting actions, and the final-time order parameter.

IV Conclusions

We have shown that neural networks trained by evolutionary reinforcement learning can control self-assembly protocols in molecular simulations. Networks learn to promote the assembly of desired structures, or choose between polymorphs. In the first case, networks reproduce the structures produced by previously-known protocols, but faster and with slightly higher fidelity; in the second case they identify strategies previously unknown, and from which we can extract physical insight. Networks that take as input only the elapsed time of the protocol are effective, and networks that take as input microscopic information from the system are more effective.

The problem we have addressed falls in the category of reinforcement learning in the sense that the neural network learns to perform actions (choosing new values of the control parameters) given an observation. The evolutionary approach we have applied to this problem requires only the assessment of a desired order parameter (here the polygon count n_k) at the end of a trajectory. This is an important feature because in self-assembly the best-performing trajectories at short times are not necessarily the best-performing trajectories at the desired observation time: see e.g. Fig. S1. For this reason it is not obvious that value-based reinforcement-learning methods Sutton and Barto (2018) are ideally suited to a problem such as self-assembly: rewarding “good” configurations at early times may not result in favorable outcomes at later times. Self-assembly is inherently a sparse-reward problem; which of the many ways of doing reinforcement learning is best for self-assembly is an open question.

Our results demonstrate proof of principle, and can be extended or adapted in several ways. We allow networks to act 1000 times per trajectory, in order to mimic a system in which we have only occasional control; the influence of a network could be increased by allowing it to act more frequently. We have chosen the hyperparameters of our scheme (mutation step size, neural network width, network activation functions, number of trajectories per generation) using values that seemed reasonable and that we subsequently observed to work, but these could be optimized (potentially by evolutionary search). The network architectures we have used are standard and can be straightforwardly adapted to handle an arbitrary number of inputs (system data) and outputs (changes of system control parameters). In addition, we have shown that learning can be effective using a modest number of trajectories (50) per generation. The evolutionary scheme should therefore be applicable to a broad range of experimental or computational systems. The results shown here have been achieved with no human input beyond the specification of which order parameter to promote, pointing the way to the design of synthesis protocols by artificial intelligence.

Acknowledgments – This work was performed as part of a user project at the Molecular Foundry, Lawrence Berkeley National Laboratory, supported by the Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02–05CH11231. I.T. performed work at the National Research Council of Canada under the auspices of the AI4D Program.

Appendix S1 Neural network

Each network is a fully-connected architecture with a number of input nodes, 1000 hidden nodes, and two output nodes. Let the indices i, j, and α label nodes in the input, hidden, and output layers, respectively. Let w_{ij} and w_{jα} be the weights connecting nodes i and j and nodes j and α, and let b_j be the bias applied to hidden-layer node j. Then the two output nodes take the values

o_α = Σ_j w_{jα} tanh( Σ_i w_{ij} x_i + b_j ),

where x_i denotes the input-node value(s). For the time network there is a single input node, whose value is the scaled elapsed time of the trajectory. For the microscopic network there is one input node for each possible number m of engaged patches, taking the value N_m/1000, where N_m is the number of particles in the simulation box having m engaged patches. The output-node values are taken to be the changes Δε and Δμ, provided that ε and μ remain in their allowed intervals.
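A minimal NumPy rendering of this network function, assuming tanh activations on the hidden layer (as stated in the main text) and a linear read-out on the two output nodes; all names are illustrative.

```python
import numpy as np

def network_outputs(x, w_in, b_hidden, w_out):
    """Sketch of the fully-connected policy network of Section S1.

    x        : input vector (length 1 for the time network, or one
               entry per engaged-patch count for the microscopic net)
    w_in     : (n_hidden, n_input) input-to-hidden weights
    b_hidden : (n_hidden,) hidden-layer biases
    w_out    : (2, n_hidden) hidden-to-output weights

    Returns the two output values, interpreted as the proposed changes
    in the two control parameters (bond strength and chemical
    potential). The linear output layer is an assumption made here.
    """
    hidden = np.tanh(w_in @ x + b_hidden)  # tanh hidden activations
    return w_out @ hidden                  # two outputs: (d_eps, d_mu)
```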

Appendix S2 Supplemental Figures

Figure S1: Supplement to Fig. 2. (a) Generation-0 trajectories of the time network applied to the 3.12.12 disc; most networks fail to produce assembly. (b) Generation-18 trajectories generally result in much better assembly. However, note that some networks, although they are offspring of successful generation-17 networks, result in low-quality assembly.
Figure S2: Supplement to Fig. 2. We compare (a) the slow cooling procedure of Ref. Whitelam (2016) with (b) a trajectory produced by the best generation-18 microscopic network from Fig. 2(f–h). The network produces slightly better assembly as measured by the 12-gon count (which is more relevant to the quality of assembly of the 3.12.12 structure than is the 3-gon count) and does so about 100 times faster.
Figure S3: Supplement to Fig. 2, but using evolutionary learning of self-assembly protocols with the regular three-patch disc without patch specificity. This disc forms the honeycomb network (a) so readily that assembly using 50 randomly-chosen protocols (generation-0) is already good; see panel (b). Hexagons are colored light blue. Nonetheless, evolution using the time network (c) or microscopic network (d) can improve the quality of assembly, and, again, the microscopic network is better than the time one. We show the top 5 protocols per generation, with the best shown in blue.


  1. The natural way to measure “real” time in such a system is to advance the clock by a particle-number-dependent amount upon making an attempted move. Dense systems and sparse systems then take very different amounts of CPU time to run. In order to move simulation generations efficiently through our computer cluster we instead updated the clock by one unit upon making a move. In this way we work in the constant event-number ensemble.


  1. George M Whitesides, John P Mathias,  and Christopher T Seto, “Molecular self-assembly and nanochemistry: A chemical strategy for the synthesis of nanostructures,” Science 254, 1312–1319 (1991).
  2. Paul L Biancaniello, Anthony J Kim, and John C Crocker, “Colloidal interactions and self-assembly using DNA hybridization,” Physical Review Letters 94, 058302 (2005).
  3. Sung Yong Park, Abigail KR Lytton-Jean, Byeongdu Lee, Steven Weigand, George C Schatz, and Chad A Mirkin, “DNA-programmable nanoparticle crystallization,” Nature 451, 553–556 (2008).
  4. Dmytro Nykypanchuk, Mathew M Maye, Daniel van der Lelie, and Oleg Gang, “DNA-guided crystallization of colloidal nanoparticles,” Nature 451, 549–552 (2008).
  5. Yonggang Ke, Luvena L Ong, William M Shih, and Peng Yin, “Three-dimensional structures self-assembled from DNA bricks,” Science 338, 1177–1183 (2012).
  6. J. P. K. Doye, A. A. Louis,  and M. Vendruscolo, “Inhibition of protein crystallization by evolutionary negative design,” Physical Biology 1, P9 (2004).
  7. Flavio Romano and Francesco Sciortino, “Colloidal self-assembly: patchy from the bottom up,” Nature materials 10, 171 (2011).
  8. SC Glotzer, MJ Solomon,  and Nicholas A Kotov, “Self-assembly: From nanoscale to microscale colloids,” AIChE Journal 50, 2978–2985 (2004).
  9. J. P. K. Doye, A. A. Louis, I. C. Lin, L. R. Allen, E. G. Noya, A. W. Wilber, H. C. Kok,  and R. Lyus, “Controlling crystallization and its absence: Proteins, colloids and patchy models,” Physical Chemistry Chemical Physics 9, 2197–2205 (2007a).
  10. D. C. Rapaport, “Modeling capsid self-assembly: design and analysis,” Phys. Biol. 7, 045001 (2010).
  11. Aleks Reinhardt and Daan Frenkel, “Numerical evidence for nucleated self-assembly of dna brick structures,” Physical Review Letters 112, 238103 (2014).
  12. Arvind Murugan, James Zou,  and Michael P Brenner, “Undesired usage and the robust self-assembly of heterogeneous structures,” Nature Communications 6 (2015).
  13. Stephen Whitelam and Robert L Jack, “The statistical mechanics of dynamic pathways to self-assembly,” Annual review of Physical Chemistry 66, 143–163 (2015).
  14. Michael Nguyen and Suriyanarayanan Vaikuntanathan, “Design principles for nonequilibrium self-assembly,” Proceedings of the National Academy of Sciences 113, 14231–14236 (2016).
  15. Stephen Whitelam, Lester O Hedges,  and Jeremy D Schmit, “Self-assembly at a nonequilibrium critical point,” Physical Review Letters 112, 155504 (2014a).
  16. RB Jadrich, BA Lindquist,  and TM Truskett, “Probabilistic inverse design for self-assembling materials,” The Journal of Chemical Physics 146, 184103 (2017).
  17. James F Lutsko, “How crystals form: A theory of nucleation pathways,” Science advances 5, eaav7399 (2019).
  18. Richard S Sutton and Andrew G Barto, Reinforcement learning: An introduction (2018).
  19. Christopher JCH Watkins and Peter Dayan, “Q-learning,” Machine learning 8, 279–292 (1992).
  20. Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra,  and Martin Riedmiller, “Playing atari with deep reinforcement learning,” arXiv preprint arXiv:1312.5602  (2013).
  21. Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al., “Human-level control through deep reinforcement learning,” Nature 518, 529 (2015).
  22. Marc G Bellemare, Yavar Naddaf, Joel Veness,  and Michael Bowling, “The arcade learning environment: An evaluation platform for general agents,” Journal of Artificial Intelligence Research 47, 253–279 (2013).
  23. Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver,  and Koray Kavukcuoglu, “Asynchronous methods for deep reinforcement learning,” in International conference on machine learning (2016) pp. 1928–1937.
  24. Yuval Tassa, Yotam Doron, Alistair Muldal, Tom Erez, Yazhe Li, Diego de Las Casas, David Budden, Abbas Abdolmaleki, Josh Merel, Andrew Lefrancq, et al., “Deepmind control suite,” arXiv preprint arXiv:1801.00690  (2018).
  25. Emanuel Todorov, Tom Erez,  and Yuval Tassa, “Mujoco: A physics engine for model-based control,” in Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on (IEEE, 2012) pp. 5026–5033.
  26. Martin L Puterman, Markov decision processes: discrete stochastic dynamic programming (John Wiley & Sons, 2014).
  27. Andrea Asperti, Daniele Cortesi,  and Francesco Sovrano, “Crawling in rogue’s dungeons with (partitioned) a3c,” arXiv preprint arXiv:1804.08685  (2018).
  28. Martin Riedmiller, ‘‘Neural fitted q iteration–first experiences with a data efficient neural reinforcement learning method,” in European Conference on Machine Learning (Springer, 2005) pp. 317–328.
  29. Martin Riedmiller, Thomas Gabel, Roland Hafner,  and Sascha Lange, “Reinforcement learning for robot soccer,” Autonomous Robots 27, 55–73 (2009).
  30. John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford,  and Oleg Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347  (2017).
  31. Felipe Petroski Such, Vashisht Madhavan, Edoardo Conti, Joel Lehman, Kenneth O Stanley,  and Jeff Clune, “Deep neuroevolution: genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning,” arXiv preprint arXiv:1712.06567  (2017).
  32. Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang,  and Wojciech Zaremba, “Openai gym,” arXiv preprint arXiv:1606.01540  (2016).
  33. Michał Kempka, Marek Wydmuch, Grzegorz Runc, Jakub Toczek,  and Wojciech Jaśkowski, “Vizdoom: A doom-based ai research platform for visual reinforcement learning,” in Computational Intelligence and Games (CIG), 2016 IEEE Conference on (IEEE, 2016) pp. 1–8.
  34. Marek Wydmuch, Michał Kempka,  and Wojciech Jaśkowski, “Vizdoom competitions: Playing doom from pixels,” arXiv preprint arXiv:1809.03470  (2018).
  35. David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al., ‘‘Mastering the game of go with deep neural networks and tree search,” nature 529, 484 (2016).
  36. David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al., “Mastering the game of go without human knowledge,” Nature 550, 354 (2017).
  37. Zhenli Zhang and Sharon C Glotzer, “Self-assembly of patchy particles,” Nano Letters 4, 1407–1413 (2004).
  38. Flavio Romano, Eduardo Sanz,  and Francesco Sciortino, “Crystallization of tetrahedral patchy particles in silico,” The Journal of chemical physics 134, 174502 (2011).
  39. Francesco Sciortino, Emanuela Bianchi, Jack F Douglas,  and Piero Tartaglia, “Self-assembly of patchy particles into polymer chains: A parameter-free comparison between wertheim theory and monte carlo simulation,” The Journal of chemical physics 126, 194903 (2007).
  40. J.P.K. Doye, A.A. Louis, I.C. Lin, L.R. Allen, E.G. Noya, A.W. Wilber, H.C. Kok,  and R. Lyus, “Controlling crystallization and its absence: proteins, colloids and patchy models,” Physical Chemistry Chemical Physics 9, 2197–2205 (2007b).
  41. Emanuela Bianchi, Piero Tartaglia, Emanuela Zaccarelli,  and Francesco Sciortino, “Theoretical and numerical study of the phase diagram of patchy colloids: Ordered and disordered patch arrangements,” The Journal of Chemical Physics 128, 144504 (2008).
  42. Günther Doppelbauer, Emanuela Bianchi,  and Gerhard Kahl, ‘‘Self-assembly scenarios of patchy colloidal particles in two dimensions,” Journal of Physics: Condensed Matter 22, 104105 (2010).
  43. Stephen Whitelam, Isaac Tamblyn, Thomas K Haxton, Maria B Wieland, Neil R Champness, Juan P Garrahan,  and Peter H Beton, “Common physical framework explains phase behavior and dynamics of atomic, molecular, and polymeric network formers,” Physical Review X 4, 011044 (2014b).
  44. Étienne Duguet, Céline Hubert, Cyril Chomette, Adeline Perro,  and Serge Ravaine, “Patchy colloidal particles for programmed self-assembly,” Comptes Rendus Chimie 19, 173–182 (2016).
  45. D. Frenkel and B. Smit, Understanding Molecular Simulation: From Algorithms to Applications (Academic Press, Inc. Orlando, FL, USA, 1996).
  46. Daphne Klotsa and Robert L Jack, “Controlling crystal self-assembly using a real-time feedback scheme,” The Journal of Chemical Physics 138, 094502 (2013).
  47. Andrew W Long, Jie Zhang, Steve Granick,  and Andrew L Ferguson, “Machine learning assembly landscapes from particle tracking data,” Soft Matter 11, 8141–8153 (2015).
  48. Andrew W Long and Andrew L Ferguson, ‘‘Nonlinear machine learning of patchy colloid self-assembly pathways and mechanisms,” The Journal of Physical Chemistry B 118, 4228–4244 (2014).
  49. Beth A Lindquist, Ryan B Jadrich,  and Thomas M Truskett, “Communication: Inverse design for self-assembly via on-the-fly optimization,” Journal of Chemical Physics 145 (2016).
  50. Bryce A Thurston and Andrew L Ferguson, “Machine learning and molecular design of self-assembling-conjugated oligopeptides,” Molecular Simulation 44, 930–945 (2018).
  51. Andrew L Ferguson, “Machine learning and data science in soft materials engineering,” Journal of Physics: Condensed Matter 30, 043002 (2017).
  52. Norbert Kern and Daan Frenkel, “Fluid–fluid coexistence in colloidal systems with short-ranged strongly directional attraction,” The Journal of Chemical Physics 118, 9882 (2003).
  53. Stephen Whitelam, “Minimal positive design for self-assembly of the archimedean tilings,” Physical Review Letters 117, 228003 (2016).
  54. Stephen Whitelam, Edward H Feng, Michael F Hagan,  and Phillip L Geissler, “The role of collective motion in examples of coarsening and self-assembly,” Soft Matter 5, 1251–1262 (2009).
  55. L. O. Hedges, “http://vmmc.xyz” .
  56. Thomas K Haxton, Lester O Hedges,  and Stephen Whitelam, “Crystallization and arrest mechanisms of model colloids,” Soft matter 11, 9307–9320 (2015).