Predicting Electron Paths
Chemical reactions can be described as the stepwise redistribution of electrons in molecules. As such, reactions are often depicted using “arrow-pushing” diagrams which show this movement as a sequence of arrows. We propose an electron path prediction model (Electro) to learn these sequences directly from raw reaction data. Instead of predicting product molecules directly from reactant molecules in one shot, learning a model of electron movement has the benefits of (a) being easy for chemists to interpret, (b) incorporating constraints of chemistry, such as balanced atom counts before and after the reaction, and (c) naturally encoding the sparsity of chemical reactions, which usually involve changes in only a small number of atoms in the reactants. We design a method to extract approximate reaction paths from any dataset of atom-mapped reaction SMILES strings. Our model achieves state-of-the-art results on a subset of the USPTO reaction dataset. Furthermore, we show that our model recovers a basic knowledge of chemistry without being explicitly trained to do so.
John Bradshaw University of Cambridge Max Planck Institute, Tübingen firstname.lastname@example.org Matt J. Kusner Alan Turing Institute University of Warwick email@example.com Brooks Paige Alan Turing Institute University of Cambridge firstname.lastname@example.org Marwin H. S. Segler BenevolentAI email@example.com José Miguel Hernández-Lobato University of Cambridge Alan Turing Institute firstname.lastname@example.org
Preprint. Work in progress.
The ability to reliably predict the products of chemical reactions is of tremendous importance for areas as diverse as health care, renewable energy, and construction, providing molecules which serve as medicines, energy capturing devices, and nanomaterials.
Theoretically, all chemical reactions can be described by the stepwise rearrangement of electrons in molecules . This sequence of bond-making and breaking is known as the reaction mechanism. Understanding the reaction mechanism is crucial because it not only determines the products (formed at the last step of the mechanism), but it also provides insight into why the products are formed on an atomistic level. Mechanisms can be treated at different levels of abstraction. On the lowest level, quantum-mechanical simulations of the electronic structure can be performed, which is prohibitively computationally expensive for most systems of interest. On the other end, chemical reactions can be treated as rules that “rewrite” reactant molecules to products, which abstracts away the individual electron redistribution steps into a single, global transformation step. To combine the advantages of both approaches, chemists use a tremendously powerful intermediate conceptual model, which simplifies the stepwise electron shifts using sequences of arrows which indicate the path of electrons throughout molecular graphs .
Recently, there have been a number of machine learning models proposed for directly predicting the products of chemical reactions [2, 7, 20, 21, 22, 24], largely using graph-based or machine translation models. The task of reaction product prediction is shown on the left-hand side of Figure 1.
In this paper we propose a machine learning model to predict the reaction mechanism, as shown on the right-hand side of Figure 1, for a particularly important subset of organic reactions. We argue that our model is not only more interpretable than product prediction models, but that it is also easier to encode the constraints imposed by chemistry in it. Previous approaches to predicting reaction mechanisms have been based on combining hand-coded heuristics and quantum mechanics [1, 11, 17, 18, 23, 26], rather than machine learning. We call our model Electro, as it directly predicts the path of electrons through molecules (i.e., the reaction mechanism). To train the model we devise a general technique to obtain approximate reaction mechanisms purely from data about the reactants and products. This allows one to train our model on large, unannotated reaction datasets such as USPTO. We demonstrate that our model not only achieves state-of-the-art results but, surprisingly, also learns chemical properties it was not explicitly trained on.
In Figure 1 we show the two challenges we tackle in this paper. On the left we show product prediction, where the goal is to predict the reaction products, given a set of reactants and reagents. However, for this task we do not care how the reactants react. This job of finding the how is the target of reaction mechanism prediction. Before describing how our model predicts mechanisms, we use this section to detail the types of reactions we consider in this paper, and their properties.
Molecules and Chemical reactions.
Organic molecules (those involving predominantly carbon) can be modeled as a graph structure, where each node is an atom and each edge is a covalent bond. These covalent bonds represent the fact that one or more pairs of electrons are shared between the atoms that the bond connects.
Just as electrons describe the current structure of molecules, they also describe how molecules react with other molecules to produce new ones. All chemical reactions involve the stepwise movement of electrons along the atoms in a set of reactant molecules. This movement causes the formation and breaking of chemical bonds, which change the reactants into a new set of product molecules. Reaction mechanisms can be classified by the topology of their “electron-pushing arrows”. Here, the class of reactions with linear electron flow topology is by far the most important, followed by those with cyclic topology.
In this work, we will only consider reactions with linear topology that involve the movement of pairs of electrons.
Reactions as single electron paths.
If reactions fall into this class, then a chemical reaction can be modeled as pairs of electrons moving in a single path through the reactant atoms. Further, this electron path will alternately remove existing bonds in molecules, and form new ones. We show this alternating structure in the right hand part of Figure 1. The reaction formally starts by taking the pair of electrons between the Li and C atoms and moving them to the C atom (step 1); this is a remove bond step. Next comes an add step where electrons are moved from the C atom to form a bond between the two reactant molecules (step 2). Then a pair of electrons are removed between the C and O atoms and moved to the O atom, giving rise to the products (step 3). Predicting the final product is thus a byproduct of predicting this series of electron steps.
Easy to interpret: If the model makes a mistake, it is easy to see where, and possibly why, it goes wrong by comparing the steps of the path with the correct steps.
Sparse: Reactions often only affect between 3 and 7 atoms out of anywhere from 10-50 reactant atoms. Modeling the reaction as a path allows us to exploit this sparsity.
Chemically consistent: Learning a path allows us to easily incorporate chemical constraints, such as the alternating removal and addition of bonds, among others.
Generalizable: As reaction paths follow similar trends in different molecules, our model naturally generalizes to unseen molecules.
The only other works we are aware of that use machine learning to predict reaction mechanisms are [3, 8, 9, 10]. All of these model a chemical reaction as an interaction between atoms which function as electron donors and those which function as electron acceptors. They predict the reaction mechanism via two independent models: one that identifies these likely electron sources and sinks, and another that ranks all combinations of them. However, this combining and then ranking of electron sources and sinks can be a slow process, as many plausible reactions need to be considered (the number of subgraphs of reacting atoms, where single bonds are either added or removed is ). So far, these models have also only been trained successfully on small hand-curated datasets.
In this section we define a probabilistic model that describes the movement of electrons that define linear topology reactions. We represent a set of molecules as a set of graphs , with atoms as vertices and bonds as edges; each connected component of the graph defines an individual molecule. We can associate an ordering over all the atoms in all the molecules in the set using an atom map number: an integer label assigned to each non-hydrogen atom in both the reactants and the products which both permits easy matching between atoms before and after the reaction, and gives us a consistent way to index particular atoms. Each atom includes a set of features, such as its atom type (e.g. carbon, oxygen, …); the full list of input atom features can be found in Table 3 of the appendix. Molecules input into the model are first put in a Kekulé form, a process which makes explicit the location of single and double bonds in aromatic structures; each bond is either a single, double, or triple bond.
Given an initial set of reactant molecules and a set of reagent molecules , our model defines a conditional distribution over a sequence of atoms (which we also refer to as actions) , which fully characterizes the electron path. This electron path in turn deterministically defines both a final product , denoting the outcome of the reaction, as well as a sequence of intermediate products , for , which correspond to the state of the graph after the first steps in the subsequence are applied to the initial . We also define a stopping sequence which indicates if the reaction should stop (i.e, if the reaction should stop and is otherwise).
We propose to learn a distribution over electron movements. We first detail the generative process that specifies , before describing how to train the model’s parameters, .
3.1 Generative process
First note that since our reactions are a single path of electrons through the reactants then, at any point, the next step in the path depends only on (i) the intermediate molecule formed by the action path up to that point, (ii) the previous action taken (indicating where the free pair of electrons are) and (iii) the point of time through the path, indicating whether we are on an add or remove bond step. We make the simplifying assumption that the stop probability and the actions after the initial action do not depend on the reagents. This leads to a parameterized model with dependency structure:
where we have defined to be the probability of continuing a reaction given the current molecule set . The other terms include , the probability of the initial state given the reactants and reagents; the conditional probability of next state given the intermediate products for ; and the probability that the reaction terminates with final product .
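Written out, the dependency structure described in this passage could take the following shape. The notation is assumed for illustration and is not taken from the source: $\mathcal{M}_0$ denotes the reactants, $\mathcal{M}_e$ the reagents, $\mathcal{M}_t$ the intermediate products, $a_t$ the atom selected at step $t$, $\mathcal{M}_{T+1}$ the final product, and $p^{c}_\theta$ the probability of continuing.

```latex
p_\theta(a_{0:T} \mid \mathcal{M}_0, \mathcal{M}_e)
  = p^{c}_\theta(\mathcal{M}_0)\,
    p_\theta(a_0 \mid \mathcal{M}_0, \mathcal{M}_e)
    \left[ \prod_{t=1}^{T} p^{c}_\theta(\mathcal{M}_t)\,
           p_\theta(a_t \mid \mathcal{M}_t, a_{t-1}, t) \right]
    \bigl(1 - p^{c}_\theta(\mathcal{M}_{T+1})\bigr)
```

Only the initial-action term conditions on the reagents, which reflects the simplifying assumption stated above; every later term depends on the current intermediate graph, the previous action, and the step index.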
It is possible to stop prior to selecting a first atom , indicating that no reaction would take place. However, we restrict our model to not stop at step , as it is necessary to pick up a complete electron pair. Given any particular selected atom which extends the reaction path, we can deterministically update the previous molecular graph to produce the next set of (intermediate) products .
Given our reaction assumptions, then, as stated earlier, there are two types of electron movements that alternate: (i) movement that removes an existing bond, and (ii) movement that adds a new bond. We define atoms with free electrons as having a self-bond. Thus, all reactions start by first selecting an atom, removing a bond (between two different atoms, or a self-bond), and then alternately adding and removing bonds; we can determine whether a particular step is an add step or a remove step by inspecting . Note that , as the initial action of selecting does not form or remove any bonds. Figure 2 presents a simple example reaction which demonstrates all the critical features of the model; the subfigures show the sequence of intermediate products and the distributions over actions.
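The alternating remove/add bookkeeping can be illustrated with a minimal Python sketch. The data layout (a dictionary from atom-index sets to bond orders, with a one-element set standing for a self-bond) and the function name `apply_electron_path` are assumptions made for this example, not the representation used by the model; charges and hydrogen counts are not tracked.

```python
from collections import Counter


def apply_electron_path(bonds, path):
    """Return the bond orders after applying an electron path.

    bonds: dict mapping frozenset({i, j}) -> bond order. A self-bond
           (free electron pair) on atom i is stored as frozenset({i}).
    path:  list of atom indices a_0, a_1, ..., a_T.
    """
    state = Counter(bonds)
    for t in range(1, len(path)):
        edge = frozenset({path[t - 1], path[t]})
        if t % 2 == 1:
            # Odd step: remove one unit of bond order (or a self-bond).
            if state[edge] < 1:
                raise ValueError(f"no bond to remove at step {t}")
            state[edge] -= 1
        else:
            # Even step: add one unit of bond order.
            state[edge] += 1
    # Drop pairs whose bond order has fallen to zero.
    return {edge: order for edge, order in state.items() if order > 0}


# Toy example: atom 1 bonded to atom 2, and a double bond between 3 and 4.
toy_bonds = {frozenset({1, 2}): 1, frozenset({3, 4}): 2}
print(apply_electron_path(toy_bonds, [1, 2, 3, 4]))
```

In the toy call the path removes the 1-2 bond, adds a 2-3 bond, and reduces the 3-4 double bond to a single bond, which mirrors the remove, add, remove pattern described above.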
Each of the conditional probabilities in Eq. (1) is parameterized by a neural network. At each step of the electron path, the network takes the current intermediate graphs, the previous action, and the reagents if relevant, and computes a probability distribution over next possible actions (i.e., selecting a particular atom, or stopping). The structure of these networks is described in the following section.
3.2 Computing atom and molecule features
We are left now with defining the functional form of our conditional distributions for continuing , picking the initial action , and picking subsequent actions . However, before describing these modules we need to describe how we compute node embeddings and graph embeddings, as these are essential to each. Full architectural details (e.g. number of layers and hidden units) and training settings are deferred to the appendix.
Node embeddings are representations of all the atoms in all the molecules present in . We denote them by the matrix . Each row contains a -dimensional embedding of an atom. A natural ordering for the rows are the atom-map numbers assigned to each atom. We define the function to map a set of graphs representing each molecule to these node embeddings. In general could be any graph model that uses the graph structure of to get graph-isomorphic node features, usually via message-passing techniques ; we choose to use gated graph neural network (GGNN) message functions .
It is also useful to be able to calculate graph embeddings , which are vectors that represent groups of nodes belonging to one or more graphs; i.e. an entire molecule or set of molecules. We define the function that maps node features belonging to each atom to their graph embedding by . These are similar to the readout functions used for regressing on graphs detailed in [4, Eq. 3] and the graph embeddings described in Li et al. [14, §B.1]. Specifically, consists of three functions, , and , which could be any multi-layer perceptron (MLP) but in practice we find that linear functions suffice. They are used to form the graph embedding, , as
Where is a sigmoid function. We can break this equation down into two stages. In stage (i), similar to Li et al. [14, §B.1], we form an embedding of one or more molecules (with vertices and with ) by performing a gated sum over the node features. In this manner the function is used to decide how much that node should contribute towards the embedding, and projects the node embedding up to a higher dimensional space; following Li et al. [14, §B.1], we choose this to be double the dimension of the node features. Having formed this embedding of the graphs, we project this down to a lower dimensional space in stage (ii), which is done by the function .
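The two-stage gated sum can be sketched in plain Python as follows. This is an illustrative reading of the description, not the paper's implementation: the names `gate`, `up`, and `down` stand in for the three functions, each is taken to be linear, and the gate is assumed to produce a single scalar per node (an elementwise gate would be an equally plausible reading).

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights, bias, x):
    """Affine map; weights is a list of rows, one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def graph_embedding(node_feats, gate, up, down):
    """Gated sum over node features (stage i), then projection down (stage ii).

    gate, up, down are (weights, bias) pairs; gate has one output unit.
    """
    pooled = [0.0] * len(up[1])
    for h in node_feats:
        g = sigmoid(linear(gate[0], gate[1], h)[0])  # contribution of this node
        u = linear(up[0], up[1], h)                  # project to higher dimension
        pooled = [p + g * ui for p, ui in zip(pooled, u)]
    return linear(down[0], down[1], pooled)


# Two atoms with 1-d features; `up` doubles the dimension to 2.
gate = ([[0.0]], [0.0])              # constant gate of 0.5
up = ([[1.0], [2.0]], [0.0, 0.0])
down = ([[1.0, 1.0]], [0.0])
print(graph_embedding([[1.0], [3.0]], gate, up, down))
```

Because the sum runs over whichever nodes are passed in, the same function can embed a single molecule or a whole set of molecules.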
3.3 Computing probabilities over actions
Having described how we compute node and graph embeddings, we are now ready to define each of our parameterized distributions over actions. The simplest of these is , which is the probability of continuing given the set of intermediate products at time . This probability is computed from a graph embedding via the function , which projects down to a single dimension. We then map this to the interval via the sigmoid function which yields the overall expression
Each of the three parameterized conditional probability distributions for the start, add and remove steps have similar forms, each defining a probability vector over actions. The transition distribution which selects the next atom in the sequence can be split into two distributions depending on the parity of : the remove bond step distribution is used when is odd, and the add bond step distribution is used when is even.
These three modules each have the same overall functional form
where is one of the networks , or ; is a context vector, and is a binary mask.
Each of the three actions has a different context and mask. The add step and remove step , have as context the node embedding of the atom selected at the previous step, . For the initial step, this context vector is an embedding of all the reagents present, computed by a graph embedding function . When computing the output probabilities, we use the binary vector to mask out specific actions known to be impossible. The value of this differs for the start, add and remove steps; for the start step any action can be picked, so is the all-ones vector. For the remove step, masks out (i.e. is set to zero for) any bonds that do not currently exist and thus cannot be removed (noting though that self-bonds are permitted in the first remove step). For the add step, only masks out the previous action, preventing the model from stalling in the same state for multiple time-steps.
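The masking rules described here can be sketched with a masked softmax, which is one common way to turn per-atom scores into a probability vector while forcing impossible actions to zero. The exact functional form used by the model is not reproduced here, so the softmax and the helper names below are assumptions for illustration; atoms are indexed from 0 and bonds are stored as a dictionary keyed by `frozenset` pairs.

```python
import math


def masked_softmax(logits, mask):
    """Softmax over actions; entries with mask 0 get probability 0."""
    allowed = [l for l, m in zip(logits, mask) if m]
    if not allowed:
        raise ValueError("mask excludes every action")
    top = max(allowed)  # subtract the maximum for numerical stability
    weights = [math.exp(l - top) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(weights)
    return [w / total for w in weights]


def start_step_mask(num_atoms):
    """Any atom may be picked first."""
    return [1] * num_atoms


def remove_step_mask(bonds, prev_atom, num_atoms):
    """Only atoms that currently share a bond with prev_atom are allowed.

    A self-bond stored as frozenset({prev_atom}) allows prev_atom itself.
    """
    return [1 if frozenset({prev_atom, a}) in bonds else 0
            for a in range(num_atoms)]


def add_step_mask(prev_atom, num_atoms):
    """Everything except the previously selected atom is allowed."""
    return [0 if a == prev_atom else 1 for a in range(num_atoms)]


bonds = {frozenset({0, 1}): 1}
mask = remove_step_mask(bonds, prev_atom=0, num_atoms=3)
print(mask, masked_softmax([0.2, 1.5, 3.0], mask))
```

Note that this sketch permits a self-bond at any remove step where one is stored, whereas the text only mentions self-bonds for the first remove step.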
We can learn the parameters of all the parameterized functions, by maximizing the log likelihood of a full path . This is evaluated by using a known electron path and intermediate products extracted from training data, rather than on simulated values. This allows us to train on all stages of the reaction at once, given electron path data. We train our models using Adam  and an initial learning rate of , with minibatches of size one, where minibatches often consist of multiple intermediate graphs.
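As a rough illustration of the training objective, the log likelihood of one known path is a sum of log continue probabilities, log action probabilities, and a final log stop probability. The sketch below assumes the factorization described in Section 3.1 and takes already-computed probabilities as input; the function name and argument layout are hypothetical.

```python
import math


def path_log_likelihood(continue_probs, action_probs):
    """Log likelihood of a single ground-truth electron path.

    continue_probs: T + 2 values, the continue probability evaluated
                    before each of the T + 1 actions and at the final product.
    action_probs:   T + 1 values, the probability the model assigns to each
                    ground-truth atom a_0, ..., a_T.
    """
    if len(continue_probs) != len(action_probs) + 1:
        raise ValueError("expected one more continue value than actions")
    total = 0.0
    for c, p in zip(continue_probs[:-1], action_probs):
        total += math.log(c) + math.log(p)
    # The path ends by stopping at the final product.
    total += math.log(1.0 - continue_probs[-1])
    return total


print(path_log_likelihood([1.0, 1.0, 0.5], [0.5, 0.5]))
```

Since every term is computed from ground-truth intermediate graphs, all steps of a reaction can be evaluated together, as noted above.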
Once trained, we can use our model to sample chemically-valid paths given an input set of reactants and reagents , simply by simulating from the conditional distributions until sampling a stop value . We instead would like to find a ranked list of the top- predicted paths, and do so using a modified beam search, in which we roll out a beam of width until a maximum path length , while recording all paths which have terminated. This search procedure is described in detail in Algorithm 1 in the Appendix.
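A beam search of this general kind can be sketched as follows. This is not the paper's Algorithm 1; it is a generic version under stated assumptions. The model is abstracted as a hypothetical callable `step_fn(path)` returning a stop probability and a dictionary of next-atom probabilities, and only non-empty terminated paths are recorded.

```python
import math


def beam_search(step_fn, beam_width, max_len, top_k):
    """Return up to top_k terminated paths as (log_prob, path), best first."""
    finished = []
    beam = [(0.0, [])]
    for _ in range(max_len):
        candidates = []
        for logp, path in beam:
            stop_prob, atom_probs = step_fn(path)
            if path and stop_prob > 0.0:
                # Record this path as terminated at its current length.
                finished.append((logp + math.log(stop_prob), path))
            cont = 1.0 - stop_prob
            if cont <= 0.0:
                continue
            for atom, p in atom_probs.items():
                if p > 0.0:
                    score = logp + math.log(cont) + math.log(p)
                    candidates.append((score, path + [atom]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_width]
        if not beam:
            break
    finished.sort(key=lambda c: c[0], reverse=True)
    return finished[:top_k]


def toy_model(path):
    """Hypothetical stand-in for the trained networks."""
    if len(path) == 0:
        return 0.0, {0: 0.75, 1: 0.25}
    if len(path) == 1:
        return 0.0, {2: 1.0}
    return 1.0, {}


for logp, path in beam_search(toy_model, beam_width=2, max_len=4, top_k=2):
    print(path, round(math.exp(logp), 3))
```

Recording terminated paths separately from the live beam lets short and long paths compete in the final ranking.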
4 Reaction Mechanism Identification
Our dataset for evaluating our model is a collection of chemical reactions extracted from the US patent database . We take as our starting point the 479,035 reactions, along with the training, validation, and testing splits which were used by Jin et al. , referred to as the USPTO dataset. This data consists of, per reaction, a group of bond changes and reaction SMILES strings . The bond changes indicate pairs of atoms which are connected differently in the reactants and products. The SMILES strings encode the molecules present in a text based format. Before we can apply our method, we perform two data preprocessing tasks described in the subsections below (using the open-source chemo-informatics software RDKit ). These steps automatically extract a subset of data appropriate for training our model of electron movement during a reaction.
4.1 Reactant and reagent separation
Typically, reaction SMILES strings are split into three parts — reactants, reagents, and products.
The reactant molecules are those which are consumed during the course of the chemical reaction to form the product, while the reagents are any additional molecules which provide context under which the reaction occurs (for example, catalysts), but do not explicitly take part in the reaction itself; we see this in the example in Figure 1.
Unfortunately, the USPTO dataset as extracted does not differentiate between reagents and reactants. We elect to preprocess the entire USPTO dataset by separating out the reagents from the reactants using the process outlined in Schwaller et al. , where we classify as a reagent any molecule for which either (i) none of its constituent atoms appear in the product, or (ii) the molecule appears in the product SMILES completely unchanged from the pre-reaction SMILES. This allows us to properly model molecules which are included in the dataset but do not materially contribute to the reaction.
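The two reagent criteria can be expressed as a short Python sketch. The input layout (each molecule given as a SMILES string plus its atom-map numbers) and the function name are assumptions for illustration, and the string comparison presumes that both sides use the same canonical SMILES form.

```python
def split_reactants_reagents(molecules, product_atom_maps, product_smiles):
    """Separate reagents from reactants.

    molecules:         list of (smiles, atom_map_numbers) before the reaction.
    product_atom_maps: atom-map numbers present in the product.
    product_smiles:    SMILES strings of the individual product molecules.
    """
    product_atoms = set(product_atom_maps)
    product_parts = set(product_smiles)
    reactants, reagents = [], []
    for smiles, atom_maps in molecules:
        no_shared_atoms = product_atoms.isdisjoint(atom_maps)  # criterion (i)
        unchanged = smiles in product_parts                    # criterion (ii)
        if no_shared_atoms or unchanged:
            reagents.append(smiles)
        else:
            reactants.append(smiles)
    return reactants, reagents


mols = [("CC(=O)Cl", [1, 2, 3, 4]), ("CO", [5, 6]), ("ClCCl", [7, 8, 9])]
print(split_reactants_reagents(mols, [1, 2, 3, 5, 6], ["COC(C)=O"]))
```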
4.2 Identifying reactions with linear electron topology
To train our model, it is necessary to extract a ground-truth representation of the electron paths from the SMILES strings and bond changes. Furthermore, not every reaction in the USPTO dataset has a linear electron topology; such reactions (for example, multi-step reactions and cycloadditions) will not have a single unique path through the atoms which describes the movement of the electrons.
The first step is to look at the bond changes present in a reaction. Each atom on the ends of the path will be involved in exactly one bond change; the atoms in the middle will be involved in two. We can then line up bond change pairs so that neighboring pairs have one atom in common, with this ordering forming a path. For instance, given the pairs "11-13, 14-10, 10-13" we form the unordered path "14-10, 10-13, 13-11". If we are unable to form such a path, for instance due to two paths being present as a result of multiple reaction stages, then we discard the reaction.
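Chaining the bond-change pairs into a path can be sketched as below. The function name and the choice to return `None` for reactions that must be discarded are assumptions for this example; the degree test follows the observation that end atoms appear in exactly one bond change and middle atoms in two.

```python
from collections import defaultdict


def order_bond_changes(pairs):
    """Chain bond-change pairs into one undirected path, or return None."""
    neighbours = defaultdict(list)
    for a, b in pairs:
        neighbours[a].append(b)
        neighbours[b].append(a)
    ends = [atom for atom, ns in neighbours.items() if len(ns) == 1]
    if len(ends) != 2 or any(len(ns) > 2 for ns in neighbours.values()):
        return None  # branching, a cycle, or several separate paths
    path, prev = [ends[0]], None
    while True:
        nxt = [n for n in neighbours[path[-1]] if n != prev]
        if not nxt:
            break
        prev = path[-1]
        path.append(nxt[0])
    # Every pair must be used, otherwise a disconnected piece was left over.
    return path if len(path) == len(pairs) + 1 else None


print(order_bond_changes([(11, 13), (14, 10), (10, 13)]))
```

For the pairs in the example above this yields the atoms 11, 13, 10, 14, which is the same unordered path as "14-10, 10-13, 13-11" read from the other end.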
For training our model we want to find the ordering of our path, so that we know in which direction the electrons flow. To do this we examine the changes of the properties of the atoms at the two ends of our path. In particular, we look at changes in charge and attached implicit hydrogen counts. The gain of negative charge (or analogously the gain of hydrogen as H+ ions without changing charge) indicates that electrons have arrived at this atom, implying that this is the end of the path; the reverse holds for the start of the path. However, sometimes the difference is not available in the USPTO data, as unfortunately only major products are recorded, and so details of what happens to some of the reactant molecules’ atoms may be missing. In these cases we fall back to using an element’s electronegativity to estimate the direction of our path, with more electronegative atoms attracting electrons towards them and so being at the end of the path.
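The direction-finding heuristic can be sketched as follows. This is a simplified illustration: it uses only charge changes and the electronegativity fallback (hydrogen-count changes are omitted), the function and argument names are hypothetical, and the table holds standard Pauling electronegativities for a handful of elements.

```python
# Pauling electronegativities for a few common elements.
PAULING = {"H": 2.20, "Li": 0.98, "C": 2.55, "N": 3.04, "O": 3.44,
           "F": 3.98, "Mg": 1.31, "S": 2.58, "Cl": 3.16, "Br": 2.96,
           "I": 2.66}


def orient_path(path, symbols, charge_change=None):
    """Return the path ordered from electron source to electron sink.

    symbols:       dict atom index -> element symbol.
    charge_change: optional dict atom index -> (product charge minus
                   reactant charge); atoms not listed count as unchanged.
    """
    first, last = path[0], path[-1]
    if charge_change:
        d_first = charge_change.get(first, 0)
        d_last = charge_change.get(last, 0)
        if d_first != d_last:
            # Electrons arrive at the end whose charge became more negative.
            return list(path) if d_last < d_first else list(reversed(path))
    # Fallback: the more electronegative end atom is taken as the sink.
    if PAULING[symbols[last]] >= PAULING[symbols[first]]:
        return list(path)
    return list(reversed(path))


symbols = {14: "O", 10: "C", 13: "C", 11: "Li"}
print(orient_path([14, 10, 13, 11], symbols))
```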
The next step of filtering checks that the path alternates between add steps (+1) and remove steps (-1). This is done by analyzing and comparing the bond changes on the path in the reactant and product molecules. Reactions that involve greater than one change (for instance going from no bond between two atoms in the reactants to a double bond between the two in the products) can indicate multi-step reactions with identical paths, and so are discarded. Finally, as a last sanity check, we use RDKit to produce all the intermediate and final products induced by our path acting on the reactants, to confirm that the final product that is produced by our extracted electron path is consistent with the major product SMILES in the USPTO dataset.
The end result of this is a set of extracted reaction paths for those entries in the USPTO dataset which correspond to reactions of linear topology. This comprises roughly 73% of the dataset (349,898 of the 479,035 reactions), of which 29,360 form the held-out test set.
5 Experiments and Evaluation
In our experiments we consider two variants of our model: the first model Electro is exactly as defined in Section 3, including all reagent information; a second version we call Electro-Lite ignores the reagents when selecting the initial action (i.e. no context vector is input into ), allowing us to gauge the importance of reagents in determining what reaction occurs. We now evaluate our model on the reaction mechanism prediction and reaction product prediction problems.
5.1 Reaction Mechanism Prediction
For mechanism prediction we are interested in ensuring we obtain the exact sequence of electron actions correctly. For instance, when forming a bond between two pairs of atoms we want to know which one of the atoms donated the electron pair needed to form the bond, even if the end result is the same. The representation of the reaction mechanism produced by our model is a sequence of atoms, detailing the path taken by the electrons in a series of alternating steps in which bonds are broken and formed; using the atom mapping from the USPTO dataset this takes the form of a series of integers.
The most straightforward approach then to evaluate our accuracy at predicting reaction mechanisms is to check whether the sequence of integers extracted from the raw data as described earlier is an exact match with the sequence of integers output by Electro; the top-1, top-2, top-3, and top-5 accuracies evaluated in this manner are reported in Table 1.
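The exact-match evaluation amounts to checking whether the ground-truth sequence of atom-map numbers appears among the first k ranked predictions. A minimal sketch, with hypothetical argument names:

```python
def top_k_accuracy(ranked_predictions, targets, k):
    """Fraction of reactions whose true path is among the top-k predictions.

    ranked_predictions: one list of predicted paths per reaction, best first.
    targets:            the ground-truth path for each reaction.
    """
    hits = sum(1 for ranked, truth in zip(ranked_predictions, targets)
               if truth in ranked[:k])
    return hits / len(targets)


preds = [[[1, 2, 3], [3, 2, 1]], [[4, 5], [5, 4]]]
truth = [[3, 2, 1], [9, 9]]
print(top_k_accuracy(preds, truth, 1), top_k_accuracy(preds, truth, 2))
```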
5.2 Reaction Product Prediction
Reaction mechanism prediction is useful for ensuring that we formed the correct product in the correct way. However, this can underestimate the model’s actual predictive accuracy: although a single atom mapping is provided as part of the USPTO dataset, in general atom mappings are not unique (e.g., if a molecule contains symmetries).
Here this would manifest as multiple different sequences of integers which correspond to chemically-identical electron paths. The first figure in the appendix shows an example of a reaction with symmetries, where different electron paths produce the same product.
Recent machine learning approaches to reaction product prediction [7, 20] have evaluated whether the major product reported in the test dataset matches predicted candidate products generated by their system, independent of mechanism. In our case, the top-5 accuracy for a particular reaction may include multiple different electron paths that ultimately yield the same product molecule.
Identifying whether two product molecules are chemically the same is equivalent to solving a graph isomorphism over the atoms and bond types, comparing the output of our system to the product molecule. To perform this comparison, we take the sequence of edits on the reactants graph defined by an electron path, apply these edits to define a product graph, and then define a deterministic mapping from the edited graph to a canonical string representation. This is done by mapping the molecule to Kekulé form, then applying the sequence of edits to the reactants graph, setting explicit charges or hydrogen counts on the first and last atom in the electron path in order to satisfy valence constraints. We then strip all atom map numbers from the graph. If this graph corresponds to a valid product molecule, we can use RDKit to express the molecule in a canonical SMILES string format (predicted electron paths which yield chemically infeasible products are considered failures). We can then evaluate whether a predicted electron path matches the ground truth by a string comparison.
To use our model to produce a ranked list of predicted products, we compute the canonicalized product SMILES for each of the predictions found by beam search over electron paths, removing duplicates along the way. These product-level accuracies are reported in Table 2. For product prediction we compare with the state-of-the-art graph-based method of Jin et al. [7]; we use their evaluation code and pre-trained model (https://github.com/wengong-jin/nips17-rexgen), re-evaluated on our extracted test set. We also compare against the Seq2Seq model proposed by Schwaller et al. [20]; unfortunately, however, we were unable to evaluate it on our extracted test set, and instead quote their reported performance on a different USPTO test set for reference. Overall, Electro outperforms all other approaches on this metric, with 87% top-1 accuracy and 95.9% top-5 accuracy. Omitting the reagents in Electro degrades top-1 accuracy slightly, but maintains a high top-3 and top-5 accuracy, suggesting that reagent information is necessary to provide context in disambiguating plausible reaction paths.
If the ultimate desired goal is to predict the product molecule rather than the reaction mechanism, a benefit of our approach is the predicted electron paths can then serve as an explanation. In this manner, when showing predicted products, we can list, alongside the maximum likelihood path, any other candidate paths that result in the same product.
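Grouping beam-search results by product can be sketched as below. The input format is an assumption for illustration: each candidate is a tuple of log probability, path, and canonical product SMILES, with `None` marking a chemically infeasible product; the canonicalization itself is taken as already done.

```python
def rank_products(candidates):
    """Group candidate paths by product, keeping the best-first order.

    candidates: list of (log_prob, path, canonical_smiles_or_None),
                sorted by decreasing log_prob.
    Returns a list of (smiles, [paths leading to it]) ranked by the best
    path for each product.
    """
    grouped = {}  # dicts keep insertion order, so ranking is preserved
    for _, path, smiles in candidates:
        if smiles is None:
            continue  # infeasible products count as failures
        grouped.setdefault(smiles, []).append(path)
    return list(grouped.items())


beam = [(-0.1, [1, 2], "CCO"), (-0.5, [2, 1], "CCO"),
        (-0.9, [3, 4], None), (-1.2, [4, 5], "CC")]
for smiles, paths in rank_products(beam):
    print(smiles, paths)
```

The first path listed for each product is the highest-scoring one, and the remaining paths are the alternative explanations that lead to the same molecule.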
5.3 Qualitative Analysis
Complex molecules often feature several potentially reactive functional groups, which compete for reaction partners. To predict the selectivity, that is, which functional group will predominantly react in the presence of other groups, students of chemistry learn heuristics and trends, which have been established over the course of three centuries of experimental observation. To qualitatively study whether the model has learned such trends from data, we queried the model with several typical textbook examples from the chemical curriculum (see Figure 3 and the appendix). We found that the model predicts most examples correctly. In the few incorrect cases, interpreting the model’s output reveals that the model made chemically plausible predictions.
In this paper we proposed Electro, a model for predicting electron paths. These electron paths, or reaction mechanisms, provide a detailed description of how two reactants react together. Our model (i) produces output that is easy for chemists to interpret, and (ii) is able to exploit the sparsity involved in chemical reactions, in which often only small numbers of atoms in the reactants interact. As a byproduct of predicting reaction mechanisms we are also able to perform reaction product prediction, and achieve state-of-the-art performance on this task.
We would like to thank Jennifer Wei, Dennis Sheberla, and David Duvenaud for their very helpful discussions. BP and MK are supported by The Alan Turing Institute under the EPSRC grant EP/N510129/1. JB acknowledges support from an EPSRC studentship.
- Bergeler et al.  Maike Bergeler, Gregor N Simm, Jonny Proppe, and Markus Reiher. Heuristics-guided exploration of reaction mechanisms. Journal of chemical theory and computation, 11(12):5712–5722, 2015.
- Coley et al.  Connor W Coley, Regina Barzilay, Tommi S Jaakkola, William H Green, and Klavs F Jensen. Prediction of organic reaction outcomes using machine learning. ACS central science, 3(5):434–443, 2017.
- Fooshee et al.  David Fooshee, Aaron Mood, Eugene Gutman, Mohammadamin Tavakoli, Gregor Urban, Frances Liu, Nancy Huynh, David Van Vranken, and Pierre Baldi. Deep learning for chemical reaction prediction. Molecular Systems Design & Engineering, 2018.
- Gilmer et al.  Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. In ICML, 2017.
- Herges [1994a] Rainer Herges. Coarctate transition states: The discovery of a reaction principle. Journal of chemical information and computer sciences, 34(1):91–102, 1994a.
- Herges [1994b] Rainer Herges. Organizing principle of complex reactions and theory of coarctate transition states. Angewandte Chemie International Edition in English, 33(3):255–276, 1994b.
- Jin et al.  Wengong Jin, Connor Coley, Regina Barzilay, and Tommi Jaakkola. Predicting organic reaction outcomes with Weisfeiler-Lehman network. In NIPS, pages 2604–2613, 2017.
- Kayala and Baldi  Matthew A Kayala and Pierre Baldi. Reactionpredictor: Prediction of complex chemical reactions at the mechanistic level using machine learning. J. Chem. Inf. Mod., 52(10):2526–2540, 2012.
- Kayala and Baldi  Matthew A. Kayala and Pierre F. Baldi. A machine learning approach to predict chemical reactions. In NIPS, 2011.
- Kayala et al.  Matthew A Kayala, Chloé-Agathe Azencott, Jonathan H Chen, and Pierre Baldi. Learning to predict chemical reactions. J. Chem. Inf. Mod., 51(9):2209–2222, 2011.
- Kim et al.  Yeonjoon Kim, Jin Woo Kim, Zeehyo Kim, and Woo Youn Kim. Efficient prediction of reaction paths through molecular graph and reaction network analysis. Chemical Science, 2018.
- Kingma and Ba  Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
- Li et al.  Yujia Li, Richard Zemel, Marc Brockschmidt, and Daniel Tarlow. Gated graph sequence neural networks. In ICLR, 2016.
- Li et al.  Yujia Li, Oriol Vinyals, Chris Dyer, Razvan Pascanu, and Peter Battaglia. Learning deep generative models of graphs. arXiv preprint arXiv:1803.03324, 2018.
- Lowe [2017] Daniel Lowe. Chemical reactions from US patents (1976-Sep2016). June 2017. doi: 10.6084/m9.figshare.5104873.v1. URL https://figshare.com/articles/Chemical_reactions_from_US_patents_1976-Sep2016_/5104873.
- Lowe [2012] Daniel Mark Lowe. Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge, 2012.
- Nandi et al. [2017] Surajit Nandi, Suzanne R McAnanama-Brereton, Mark P Waller, and Anakuthil Anoop. A tabu-search based strategy for modeling molecular aggregates and binary reactions. Computational and Theoretical Chemistry, 1111:69–81, 2017.
- Rappoport et al. [2014] Dmitrij Rappoport, Cooper J Galvin, Dmitry Yu Zubarev, and Alán Aspuru-Guzik. Complex chemical reaction networks from heuristics-aided quantum chemistry. Journal of chemical theory and computation, 10(3):897–907, 2014.
- RDKit, online. RDKit: Open-source cheminformatics. http://www.rdkit.org. [Online; accessed 01-February-2018].
- Schwaller et al. [2017] Philippe Schwaller, Theophile Gaudin, David Lanyi, Costas Bekas, and Teodoro Laino. "Found in translation": Predicting outcome of complex organic chemistry reactions using neural sequence-to-sequence models. arXiv preprint arXiv:1711.04810, 2017.
- Segler and Waller [2017] Marwin HS Segler and Mark P Waller. Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chem. Eur. J., 23(25):5966–5971, 2017.
- Segler et al. [2018] Marwin HS Segler, Mike Preuss, and Mark P Waller. Planning chemical syntheses with deep neural networks and symbolic AI. Nature, 555(7698):604, 2018.
- Simm and Reiher [2017] Gregor N Simm and Markus Reiher. Context-driven exploration of complex chemical reaction networks. Journal of chemical theory and computation, 13(12):6108–6119, 2017.
- Wei et al. [2016] Jennifer N Wei, David Duvenaud, and Alán Aspuru-Guzik. Neural networks for the prediction of organic chemistry reactions. ACS central science, 2(10):725–732, 2016.
- Weininger [1988] David Weininger. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. Journal of chemical information and computer sciences, 28(1):31–36, 1988.
- Zimmerman [2013] Paul M Zimmerman. Automated discovery of chemically reasonable elementary reaction steps. Journal of computational chemistry, 34(16):1385–1392, 2013.
Appendix A Example of symmetry affecting evaluation of electron paths
In the main text we described the challenge of evaluating our model: different electron paths can form the same products, for instance due to symmetry. Figure 4 gives an example of this.
Appendix B More training details
In this section we go through more specific model architecture details omitted from the main text.
B.1 Model architectures
In this section we provide further details of our model architectures.
Section 3 of the main paper discusses our model. In particular we are interested in computing three conditional probability terms: (1) the probability of the initial state given the reactants and reagents; (2) the conditional probability of the next state given the intermediate products; and (3) the probability that the reaction terminates with the final product.
Each of these is parametrized by NNs. We can split up the components of these NNs into a series of modules, all introduced in the main text. In this section we shall go through each of these in turn.
The first module computes node embeddings, which are used as input to all the other modules. For this we use Gated Graph Neural Networks (GGNN) [13, 4] with 4 propagation steps. The atom features we feed in are detailed in Table 3 and are calculated using RDKit. In total there are 101 features, and we maintain this dimensionality in the hidden layers during the propagation steps of the GGNN. Three edge labels are defined: single bonds, double bonds and triple bonds. RDKit is used to Kekulize the reactant molecules.
Table 3: Atom features.
|Feature||Description|
|Atom type||72 possible elements in total, one hot|
|Degree||One hot (0, 1, 2, 3, 4, 5, 6, 7, 10)|
|Explicit Valence||One hot (0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14)|
|Hybridization||One hot (SP, SP2, SP3, Other)|
|Part of an aromatic ring||boolean|
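As a rough illustration of how the rows listed in the table could be turned into a flat feature vector, the following sketch one-hot encodes each categorical row and appends the aromaticity flag. The `Atom` container, the function names and the shortened `ELEMENTS` list are hypothetical stand-ins; in practice the raw values would be read from RDKit and the element vocabulary would have 72 entries.

```python
from dataclasses import dataclass

# Category lists copied from the table of atom features.
DEGREES = [0, 1, 2, 3, 4, 5, 6, 7, 10]
VALENCES = [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14]
HYBRIDIZATIONS = ["SP", "SP2", "SP3", "Other"]

# Hypothetical stand-in: the model described here uses 72 elements.
ELEMENTS = ["C", "N", "O", "F", "S", "Cl", "Br"]


@dataclass
class Atom:
    """Hypothetical container for values that would come from RDKit."""
    symbol: str
    degree: int
    explicit_valence: int
    hybridization: str
    is_aromatic: bool


def one_hot(value, choices):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    if value not in choices:
        raise ValueError(f"{value!r} not in {choices}")
    return [1.0 if value == choice else 0.0 for choice in choices]


def atom_features(atom):
    """Concatenate the per-row encodings into one flat feature vector."""
    # Any hybridization other than SP, SP2 or SP3 falls into "Other".
    hyb = atom.hybridization if atom.hybridization in HYBRIDIZATIONS[:-1] else "Other"
    return (
        one_hot(atom.symbol, ELEMENTS)
        + one_hot(atom.degree, DEGREES)
        + one_hot(atom.explicit_valence, VALENCES)
        + one_hot(hyb, HYBRIDIZATIONS)
        + [1.0 if atom.is_aromatic else 0.0]
    )


carbonyl_carbon = Atom("C", degree=3, explicit_valence=4,
                       hybridization="SP2", is_aromatic=False)
features = atom_features(carbonyl_carbon)
```

With the shortened element list the vector has 7 + 9 + 12 + 4 + 1 = 33 entries; this sketch covers only the rows shown in the table.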
As mentioned in Section 3 of the main paper, the two graph-embedding modules (one used to compute the stop probability, the other to embed the reagents) each consist of three linear functions. In both, the first function decides how much each node should contribute towards the embedding and so projects down to a scalar value. The second projects the node embedding up to a higher-dimensional space, which we choose to be 202 dimensions. This is double the dimension of the node features, and similar to the approach taken by Li et al. [14, §B.1]. The third function differs between the two modules: for the stop module it projects down to one dimension (to later go through a sigmoid function and compute a stop probability), whereas for the reagent module it projects to a dimensionality of 100 to form the reagent embedding.
Two of the remaining modules operate on each node to produce an action logit; both are NNs consisting of one hidden layer of 100 units. The node features of the previous atom on the path are concatenated onto the node features going into these networks.
The final module is represented by an NN with hidden layers of 100 units. When conditioning on reagents (i.e. for Electro), the reagent embeddings calculated by the reagent-embedding module are concatenated onto the node embeddings and we use two hidden layers for our NN. When ignoring reagents (i.e. for Electro-Lite), we use one hidden layer for this network. In total, Electro has approximately 250,000 parameters and Electro-Lite has approximately 190,000.
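The three-function structure of these graph-level modules can be sketched as a gated sum over nodes. The sketch below assumes the scalar gate is squashed with a sigmoid before weighting each node, as in Li et al.; the weights are random, biases are zero, and names such as `graph_embedding` and `make_linear` are illustrative rather than taken from our implementation.

```python
import math
import random

NODE_DIM = 101   # node embedding size stated in the text
UP_DIM = 202     # higher-dimensional space stated in the text


def make_linear(in_dim, out_dim, rng):
    """Random weights and zero biases for a linear map (illustrative only)."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def linear(params, x):
    """Apply a linear map y = W x + b to a plain Python list."""
    weights, bias = params
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def graph_embedding(node_embeddings, gate, up, down):
    """Gate each node, project it up, sum over nodes, then project down."""
    pooled = [0.0] * UP_DIM
    for h in node_embeddings:
        weight = sigmoid(linear(gate, h)[0])   # assumed sigmoid gate in (0, 1)
        for i, value in enumerate(linear(up, h)):
            pooled[i] += weight * value
    return linear(down, pooled)


rng = random.Random(0)
nodes = [[rng.random() for _ in range(NODE_DIM)] for _ in range(5)]

# Stop module: projects down to one logit; a sigmoid gives a probability.
stop_logit = graph_embedding(nodes,
                             make_linear(NODE_DIM, 1, rng),
                             make_linear(NODE_DIM, UP_DIM, rng),
                             make_linear(UP_DIM, 1, rng))
stop_probability = sigmoid(stop_logit[0])

# Reagent module: same structure, but projects down to 100 dimensions.
reagent_embedding = graph_embedding(nodes,
                                    make_linear(NODE_DIM, 1, rng),
                                    make_linear(NODE_DIM, UP_DIM, rng),
                                    make_linear(UP_DIM, 100, rng))
```

Because the pooling is a sum over nodes, the output size is independent of the number of atoms in the graph.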
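A minimal sketch of such a per-node network is given below. It assumes a ReLU nonlinearity, omits biases, and normalizes the logits over atoms with a softmax; these choices, and the names `mlp_logit` and `action_distribution`, are assumptions made for illustration and are not specified in this appendix.

```python
import math
import random


def mlp_logit(x, w_hidden, w_out):
    """One hidden layer with ReLU, then a linear map to a single logit."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    return sum(w * h for w, h in zip(w_out, hidden))


def action_distribution(node_embeddings, previous_index, w_hidden, w_out):
    """Softmax over atoms of logits computed from [node ; previous node]."""
    previous = node_embeddings[previous_index]
    # List concatenation appends the previous atom's features to each node.
    logits = [mlp_logit(h + previous, w_hidden, w_out) for h in node_embeddings]
    peak = max(logits)                      # subtract the max for stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
dim, hidden_units, n_atoms = 101, 100, 6
nodes = [[rng.random() for _ in range(dim)] for _ in range(n_atoms)]
w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(2 * dim)]
            for _ in range(hidden_units)]
w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_units)]

probs = action_distribution(nodes, previous_index=2,
                            w_hidden=w_hidden, w_out=w_out)
```

The input to the hidden layer has twice the node dimension because of the concatenation with the previous atom's features.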
We train everything using Adam (Kingma and Ba, 2015) with an initial learning rate of 0.0001, which we decay by a factor of 0.1 after 5 and 9 epochs. We train for a total of 10 epochs. For training we use minibatches of a single reaction, although each of these can consist of multiple intermediate graphs.
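The step-decay schedule can be written down explicitly. The sketch below assumes zero-indexed epochs, so that "after 5 epochs" means the decay first applies to the sixth epoch; the function name `learning_rate` is illustrative.

```python
def learning_rate(epoch, initial=1e-4, decay_after=(5, 9), factor=0.1):
    """Learning rate for a zero-indexed epoch under step decay."""
    n_decays = sum(1 for boundary in decay_after if epoch >= boundary)
    return initial * factor ** n_decays


# One entry per training epoch (10 epochs in total).
schedule = [learning_rate(epoch) for epoch in range(10)]
```

Under this reading, the first five epochs use the initial rate, the next four use a rate ten times smaller, and the final epoch uses a rate one hundred times smaller than the initial one.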
Appendix C Prediction using our model
At prediction time, as discussed in the main text, we use beam search to find high-probability, chemically valid paths from our model. Further details are given in Algorithm 1.
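For readers unfamiliar with beam search, a generic sketch follows. It is not a transcription of Algorithm 1: the `expand` and `is_valid` callbacks, the toy model and the validity rule are all hypothetical, and a real implementation would score actions with the trained networks and check chemical validity of the resulting products.

```python
import math

STOP = "stop"


def beam_search(first_steps, expand, is_valid, beam_width=3, max_steps=10):
    """Return finished (path, log_prob) pairs, highest log-probability first.

    first_steps: list of (atom, log_prob) pairs for the initial action.
    expand(path): list of (action, log_prob) continuations, including STOP.
    is_valid(path): False for completed paths that should be discarded.
    """
    ranked = sorted(first_steps, key=lambda c: c[1], reverse=True)[:beam_width]
    beam = [([atom], log_prob) for atom, log_prob in ranked]
    finished = []
    for _ in range(max_steps):
        candidates = []
        for path, score in beam:
            for action, log_prob in expand(path):
                if action == STOP:
                    if is_valid(path):
                        finished.append((path, score + log_prob))
                else:
                    candidates.append((path + [action], score + log_prob))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = candidates[:beam_width]      # keep only the best partial paths
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished


def toy_expand(path):
    """Hypothetical model over four atoms that prefers paths of length three."""
    unused = [atom for atom in range(4) if atom not in path]
    if not unused:
        return [(STOP, 0.0)]
    stop_p = 0.8 if len(path) >= 3 else 0.1
    options = [(STOP, math.log(stop_p))]
    options += [(atom, math.log((1 - stop_p) / len(unused))) for atom in unused]
    return options


def toy_is_valid(path):
    """Hypothetical rule: a path must visit at least two atoms."""
    return len(path) >= 2


results = beam_search([(0, math.log(0.7)), (1, math.log(0.3))],
                      toy_expand, toy_is_valid)
best_path, best_log_prob = results[0]
```

Filtering invalid paths only when they terminate keeps the search simple; pruning invalid partial paths earlier would be an alternative design if validity can be checked incrementally.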
Appendix D Further example of actions proposed by our model
Figure 5 shows the model’s predictions for the mechanism of how two molecules will react.