UltraFast Reactive Transport Simulations When Chemical Reactions Meet Machine Learning: Chemical Equilibrium
Abstract
During reactive transport modeling, the computational cost associated with chemical reaction calculations is often 10–100 times higher than that of transport calculations. Most of these costs results from chemical equilibrium calculations that are performed at least once in every mesh cell and at every time step of the simulation. Calculating chemical equilibrium is an iterative process, where each iteration is in general so computationally expensive that even if every calculation converged in a single iteration, the resulting speedup would not be significant. Thus, rather than proposing a fastconverging numerical method for solving chemical equilibrium equations, we present a machine learning method that enables new equilibrium states to be quickly and accurately estimated, whenever a previous equilibrium calculation with similar input conditions has been performed. We demonstrate the use of this smart chemical equilibrium method in a reactive transport modeling example and show that, even at early simulation times, the majority of all equilibrium calculations are quickly predicted and, after some time steps, the machinelearningaccelerated chemical solver has been fully trained to rapidly perform all subsequent equilibrium calculations, resulting in speedups of almost two orders of magnitude. We remark that our new ondemand machine learning method can be applied to any case in which a massive number of sequential/parallel evaluations of a computationally expensive function needs to be done, . We remark, that, in contrast to traditional machine learning algorithms, our ondemand training approach does not require a statisticsbased training phase before the actual simulation of interest commences. The introduced ondemand training scheme requires, however, the firstorder derivatives for later smart predictions.
Geothermal Energy and Geofluids Group,
Department of Earth Sciences, ETH Zürich, Switzerland
[0.5em]Laboratory for Waste Management, Nuclear Energy and Safety Research Department,
Paul Scherrer Institut, 5232 Villigen PSI, Switzerland
1 Introduction
During reactive transport simulations, the following three degrees of chemical reactivity behavior may be observed across space and over time: weak, moderate, and intense. For example, at relatively distant locations from where a fluid enters a medium, none or little reactivity may occur, potentially over long periods of time, if those regions were initially in chemical, thermal, and mechanical equilibrium. Eventually, these equilibrium conditions are gradually disrupted due to fluid flow, heat transfer, and chemical transport, causing some moderate chemical reactivity. Later, the introduced perturbations reach those once calm locations and cause relatively intense local chemical changes.
Interestingly, after such a wave of perturbations has passed, very often local chemical reactivity becomes weak once again. This implies that only a relatively small percentage of the entire medium is experiencing fast and substantial chemical changes at any moment in time. As a result, most chemical reaction calculations are performed for lowreactivity regions at every time step of the simulation. Even though reaction calculations are relatively faster in lowreactivity regions, compared to highreactivity regions, their overall computational cost is typically 10–100 times greater than that for calculations of physical processes (e.g., mass transport, heat transfer). Thus, reactive transport simulations spend most of the computation time calculating chemical reactions and not transport processes, and typically require very long compute times. Hence, speeding up chemical calculations, without compromising accuracy, is crucial for significant performance gains in reactive transport simulations.
Whenever chemical reaction rates are considerably faster than rates of physical processes, the local chemical equilibrium assumption is a plausible and sufficient rate model for chemical reactions. In general, however, a combination of fast and slow reaction rates is present, so that the fast reactions are reasonably modeled employing the chemical equilibrium assumption and slow reaction rates are modeled using chemical kinetics. This approach is termed the local partial chemical equilibrium assumption [Ramshaw1980, Ramshaw1981, Ramshaw1985, Ramshaw1995, Lichtner1985, Steefel1994, Steefel1996]. Nevertheless, what needs to be noticed about these assumptions is that chemical equilibrium calculations are needed to model chemical processes in either case. In fact, such equilibrium calculations need to be performed at least once at every mesh node/cell during every transport time step. Thus, millions to billions of chemical equilibrium calculations tend to accumulate over the course of massive numerical reactive transport simulations, particularly when using fineresolution meshes, large threedimensional domains, and/or long simulation times. Hence, speeding up reactive transport modeling by a significant factor can only be accomplished by accelerating chemical reaction calculations in general, and chemical equilibrium calculations in particular. Here, we focus on accelerating chemical equilibrium calculations, whereas a companion paper (Leal2017c) introduces a similar strategy to accelerate chemical kinetics calculations.
Chemical equilibrium calculations are computationally expensive. This is because (i) they involve the iterative solution of a system of nonlinear algebraic equations, requiring at every iteration (ii) the evaluation of thermodynamic properties, such as activity coefficients, concentrations, activities, chemical potentials (e.g., using the Pitzer1973 model for aqueous solutions and the Peng1976 model for gaseous solutions), and (iii) the solution of a system of linear equations with dimension on the order of either the number of chemical species or the number of chemical elements [Leal2017]. Over several decades, major advances in developing fast, accurate, and robust methods for chemical equilibrium calculations have been made, either based on Gibbs energy minimization (GEM) or law of mass action (LMA) formulations [White1958, Smith1980, Smith1982, Alberty1992b, Crerar1975, DeCapitani1987, Eriksson1989, Ghiorso1994, Gordon1971, Gordon1994a, Gordon1994b, Harvey2013, Harvie1987, Karpov1997, Karpov2001, Karpov2002, Koukkari2011a, Kulik2013, Leal2013, Leal2014, Leal2016a, Leal2016c, Leal2017, Morel1972, Neron2012, Nordstrom1979, Trangenstein1986, VanZeggeren1970, Vonka1995, Wolery1975, Zeleznik1960]. However, even if one could devise an algorithm that would always converge in one single iteration, instead of the typical few to dozens of iterations, the computational cost of chemical equilibrium calculations would still be dominant among all other calculations. The question of utmost importance is, thus: Is it possible to calculate chemical equilibrium without actually solving the nonlinear equations governing equilibrium, which requires several evaluations of thermodynamic models and as many solutions of systems of linear equations? In this paper, we demonstrate that this can, indeed, be achieved with exceptional speed and accuracy.
Our hereintroduced smart chemical equilibrium algorithm operates like a human being. A human begins life with none to little problem solving skills. At an early age, the child is exposed to a variety of challenges and learns how to solve them, often with assistance from others. However, once the child has learned how to solve a specific problem, it can solve similar problems without requiring assistance. Solving a problem with the help of an already acquired skill is much faster than having to first learn how to solve that problem, since learning is a difficult and timeconsuming process. As the child grows, it can solve more and more problems until, eventually, as an adult, external help is needed only occasionally, under exceptional circumstances, i.e., in situations that have rarely or never been encountered before. Importantly, in contrast to going to school, this way of “learning by doing” may be viewed as ondemand learning or training, a highly efficient way of learning as one only learns what is actually needed at least once.
The above metaphor illustrates some key features of our new, ondemand machine learning algorithm for fast chemical equilibrium calculations. Initially, the algorithm has no recollection of any past calculation. Thus, the first time the algorithm faces a new chemical equilibrium problem, it fully solves the problem and saves the inputs and outputs describing the calculated equilibrium state. This is, as discussed previously, a computationally expensive process, now considered as an act of learning. The next time the algorithm is employed, it recalls that a calculation has previously been performed and attempts to quickly predict the new chemical equilibrium state, using sensitivity derivatives to project the previously recorded equilibrium state into the new one. Then, the algorithm checks if the prediction is accurately acceptable within some given tolerance. If it is, then the predicted equilibrium state is accepted as a solution, otherwise a full chemical equilibrium calculation is performed and the corresponding inputs and outputs recorded to describe the newly learned equilibrium state. This way, the memory of past and fully solved equilibrium problems grows, permitting searches among all saved and learned problems to determine the one that is closest to the new equilibrium problem.
In contrast to traditional machine learning algorithms, which first require a timeconsuming training phase, in which potentially many outcomes are calculated for problems that may later never occur, our new machine learning algorithm is only trained with problems that actually occur, relying on the recurrence of subsequent similar problems to quickly respond with an accurate prediction. Hence, traditional machine learning is equivalent to going to school, while our machine learning algorithm is more akin to ondemand learning or “learning by doing.” Furthermore, there is no clear criterion to decide, when the training phase of traditional machine learning algorithms is ideally terminated, i.e., when it is time to “leave school.”
The hereintroduced smart chemical equilibrium algorithm is particularly useful for applications that require repetitive calculations, such as chemical equilibrium calculations during reactive transport modeling. For such applications, the algorithm can eventually achieve a knowledgeable state, so that, after some time steps during a reactive transport simulation, all equilibrium calculations can be rapidly and accurately performed using previously learned key chemical equilibrium states. Even if the smart algorithm occasionally faces some exceptional circumstances that require additional ondemand training (e.g., a sudden change in the chemistry or temperature of the fluid entering a region of the medium), one can still reasonably expect significant speedups with this machinelearningaccelerated algorithm, if the occasions during which learning is needed are considerably less frequent than the occasions during which the algorithm can make smart predictions.
The “intelligence” of the hereintroduced algorithm is, thus, a combination of both the memory of already solved equilibrium problems, that needed to be solved anyway, and its ability to “learn” new ones ondemand. Physiologically speaking, the algorithm is like a brain that not only can record its experiences on solving chemical equilibrium problems, but can also make decisions about when it needs additional training, following its judgment on the accuracy/acceptability of an estimated chemical equilibrium outcome. This ondemand training is key to:

Performance: learning and keeping only what is needed results in compact storage and thus fast search operations when finding the closest past equilibrium problem already solved;

Accuracy: finer control on minimizing errors resulting from predicted equilibrium states by performing additional training whenever needed;

Convenience: neither a dedicated prior training stage nor anticipation of possible chemical conditions during the simulation are required;

Simplicity: users immediately benefit from highperformance reactive transport simulations without having to prepare any prior configuration and training of the smart chemical equilibrium solver.
Our ondemand training philosophy differs radically from most previous efforts in speeding up chemical reaction calculations in reactive transport simulations, which use conventional machine learning methods. Jatnieks2016, for example, present the steps necessary to construct a socalled surrogate model for fast speciation calculations. The construction of this surrogate model requires a prior training phase, during which many random input conditions are used in a speciation solver (PHREEQC [Parkhurst2013]) and the resultant outputs collected for statistical learning. During their numerical experiment, 32 different statistics and machine learning methods were tried to identify the potentially best one. For a specific reactive transport modeling problem, they collected all possible inputoutput combinations in speciation calculations, and from these combinations, 7880 random inputs were used for training the statistics model, which represented 80% of all collected input conditions. The various constructed surrogate models were subsequently used in a reactive transport simulation. Each surrogate model was constructed with different 7880 inputoutput samples.
Our new ondemand machine learning approach has clear advantages over conventional statisticsbased machine learning methods. First, the use of sensitivity derivatives of the calculated equilibrium states results in a machine learning method that better understands the behavior of chemical systems regarding its reaction behavior following changes in the input equilibrium conditions. Secondly, the use of these sensitivity derivatives permits extrapolating and predicting new equilibrium states with much higher accuracy and confidence. Thirdly, it requires no statistical training before it can be applied in a reactive transport simulation, because its ondemand training characteristics allow it to spontaneously learn only what is needed during a reactive transport simulation to keep the predictions accurate enough. Furthermore, our approach is not only simpler from a user point of view, but also potentially faster, as it will save only a few input conditions compared to any statistical approach that can only get more accurate the more it knows a priori. Finally, our method produces outputs that always satisfy the mass conservation conditions of chemical elements and electric charge, since these constraints are incorporated into the calculation of the sensitivity derivatives (see Leal2017) and, thus, our method contrasts with statisticsbased machine learning methods that can fail predicting equilibrium states that satisfy given mass balance conditions.
This communication is organized as follows. In Section 2, we introduce definitions and notation that are needed to describe the new algorithm rigorously. In Section LABEL:sec:Method, we formulate the smart chemical equilibrium method, providing details on the prediction of new equilibrium states from previously saved and learned states as well as on the acceptance criteria for the predicted equilibrium states. In Section LABEL:sec:Results, we compare the performance and accuracy of a reactive transport simulation using the hereproposed smart equilibrium algorithm against that using a conventional Newtonbased equilibrium algorithm. Finally, we discuss in Section LABEL:sec:ConclusionsandFutureWork the implications and conclusions of this study together with a planned roadmap for further research efforts in this direction.
The hereintroduced smart chemical equilibrium algorithm is implemented in Reaktoro [Leal2015b], a unified opensource framework for modeling chemically reactive systems (reaktoro.org).
2 Definitions and Notation
Consider a chemical system is a collection of chemical species composed of one or more components and distributed among one or more phases. The species can be substances such as aqueous ions (e.g., Na(aq), Cl(aq), HCO(aq)), neutral aqueous species (e.g., SiO2(aq), CO(aq), HO(l)), gases (e.g., CO(g), CH(g), N(g)), and pure condensed phases (e.g., CaCO(s, calcite), SiO(s, quartz), AlSiO(OH)(s, kaolinite)). The phases are each composed of one or more different chemical species with homogeneous properties within their boundaries; a phase can be aqueous, gaseous, liquid, solid solutions, a pure mineral, a plasma, etc. Note that substances with the same chemical formula, but in different phases, are distinct species (e.g., CO(aq) and CO(g) are different species). The components are chemical elements (e.g., H, O, C, Na, Cl, Ca, Si) and electrical charge (Z), but it can also be a linear combination of these, commonly known as primary species (e.g., H(aq), HO(l), CO(aq)). We shall use from now on the words elements and components interchangeably, with elements not only denoting chemical elements but also electrical charge [Leal2017].
A chemical system can exist at infinitely many chemical states. A chemical state is defined here as the triplet , where is temperature, is pressure, and is the vector of molar amounts of the species, with denoting the mole amount of the th species and N the number of species. If we denote by the vector of molar amounts of the elements, with denoting the amount of the th element and E the number of elements, then and are related through the following mass conservation equation:
(1) 
where is the formula matrix of the chemical system [Smith1982], with denoting the coefficient of the th element in the th species, and thus a matrix with dimensions . Below is an example of a formula matrix for a simple chemical system, containing species and elements, distributed among two phases, namely an aqueous and a gaseous phase, with names of aqueous species suffixed with (aq), except HO(l), and gaseous species with (g):
(2) 