AI Feynman: a Physics-Inspired Method for Symbolic Regression
A core challenge for both physics and artificial intellicence (AI) is symbolic regression: finding a symbolic expression that matches data from an unknown function. Although this problem is likely to be NP-hard in principle, functions of practical interest often exhibit symmetries, separability, compositionality and other simplifying properties. In this spirit, we develop a recursive multidimensional symbolic regression algorithm that combines neural network fitting with a suite of physics-inspired techniques. We apply it to 100 equations from the Feynman Lectures on Physics, and it discovers all of them, while previous publicly available software cracks only 71; for a more difficult test set, we improve the state of the art success rate from 15% to 90%.
In 1601, Johannes Kepler got access to the world’s best data tables on planetary orbits, and after 4 years and about 40 failed attempts to fit the Mars data to various ovoid shapes, he launched a scientific revolution by discovering that Mars’ orbit was an ellipse Koyré (2013). This was an example of symbolic regression: discovering a symbolic expression that accurately matches a given data set. More specifically, we are given a table of numbers, whose rows are of the form where , and our task is to discover the correct symbolic expression for the unknown mystery function , optionally including the complication of noise.
Growing data sets have motivated attempts to automate such regression tasks, with significant success. For the special case where the unknown function is a linear combination of known functions of , symbolic regression reduces to simply solving a system of linear equations. Linear regression (where is simply a linear function) is ubiquitous in the scientific literature, from finance to psychology. The case where is a linear combination of monomials in corresponds to linear regression with interaction terms, and to polynomial fitting more generally. There are countless other examples of popular regression functions that are linear combinations of known functions, ranging from Fourier expansions to wavelet transforms. Despite these successes with special cases, the general symbolic regression problem remains unsolved, and it is easy to see why: If we encode functions as strings of symbols, then the number of such strings grows exponentially with string length, so if we simply test all strings by increasing length, it may take longer than the age of our universe until we get to the function we are looking for.
This combinatorial challenge of an exponentially large search space characterizes many famous classes of problems, from codebreaking and Rubik’s cube to the natural selection problem of finding those genetic codes that produce the most evoutionarily fit organisms. This has motivated genetic algorithms Amil et al. (2009); Pal and Wang (2017) for targeted searches in exponentially large spaces, which replace the above-mentioned brute-force search by biology-inspired strategies of mutation, selection, inheritance and recombination; crudely speaking, the role of genes is played by useful symbol strings that may form part of the sought-after formula or program. Such algorithms have been successfully applied to areas ranging from design of antennas Lohn et al. (2002); Linden (2002) and vehicles Yu and Yu (2003) to wireless routing Panthong and Jantarang (2003), vehicle routing Oh et al. (2010), robot navigation Ram et al. (1994), code breaking Delman (2004), investment strategy Bauer Jr et al. (1994), marketing Venkatesan and Kumar (2002), classification La Cava et al. (2018), Rubik’s cube McAleer et al. (2018), program synthesis Koza and Koza (1992) and metabolic networks Schmidt et al. (2011).
The symbolic regression problem for mathematical functions (the focus of this paper) has been tackled with a variety of methods McRee (2010); Stijven et al. (2011); Kong et al. (2018), including genetic algorithms Searson et al. (2010); Dubčáková (2011). By far the most successful of these is, as we will see in Section III, the genetic algorithm outlined in Schmidt and Lipson (2009) and implemented in the commercial Eureqa software Dubčáková (2011).
The purpose of this paper is to further improve on this state-of-the-art, using physics-inspired strategies enabled by neural networks. Our most important contribution is using neural networks to discover hidden simplicity such as symmetry or separability in the mystery data, which enables us to recursively break harder problems into simpler ones with fewer variables. The rest of this paper is organized as follows. In Section II, we present our algorithm and the six strategies that it recursively combines. In Section III, we present a test suite of regression mysteries and use it to test both Eureqa and our new algorithm, finding major improvements. In Section IV, we summarize our conclusions and discuss opportunities for further progress.
Generic functions are extremely complicated and near-impossible for symbolic regression to discover. However, functions appearing in physics and many other scientific applications often have some of the following simplifying properties that make them easier to discover:
Units: and the variables upon which it depends have known physical units
Low-order polynomial: (or part thereof) is a polynomial of low degree
Compositionality: is a composition of a small set of elementary functions, each typically taking no more than two arguments
Smoothness: is continuous and perhaps even analytic in its domain
Symmetry: exhibits translational, rotational or scaling symmetry with respect to some of its variables
Separability: can be written as a sum or product of two parts with no variables in common
The question of why these properties are common remains controversial and not fully understood Mhaskar et al. (2016); Lin et al. (2017). However, as we will see below, this does not prevent us from discovering and exploiting these properties to facilitate symbolic regression.
Property (1) enables dimensional analysis, which often transforms the problem into a simpler one with fewer independent variables. Property (2) enables polynomial fitting, which quickly solves the problem by solving a system of linear equations to determine the polynomial coefficients. Property (3) enables to be represented as a parse tree with a small number of node types, sometimes enabling or a sub-expression to be found via a brute-force search. Property (4) enables approximating using a feed forward neural network with a smooth activation function. Property (5) can be confirmed using said neural network and enables the problem to be transformed into a simpler one with one independent variable less (or even fewer for rotational symmetry). Property (6) can be confirmed using said neural network and enables the independent variables to be partitioned into two disjoint sets, and the problem to be transformed into two simpler ones, each involving the variables from one of these sets.
ii.1 Overall Algorithm
The overall algorithm is schematically illustrated in Figure 1. It consists of a series of modules that try to exploit each of the the above-mentioned properties. Like a human scientist, it tries many different strategies (modules) in turn, and if it cannot solve the full problem in one fell swoop, it tries to transform it and divide it into simpler pieces that can be tackled separately, recursively re-launching the full algorithm on each piece. Figure 2 illustrates an example of how a particular mystery data set (Newton’s law of gravitation with 9 variables) is solved. Below we describe each of these algorithm modules in turn.
ii.2 Dimensional Analysis
Our dimensional analysis module exploits the well-known fact that many problems in physics can be simplified by requiring the units of the two sides of an equation to match. This often transforms the problem into a simpler one with a smaller number of variables that are all dimensionless. In the best case scenario, the transformed problem involves solving for a function of zero variables, i.e., a constant. We automate dimensional analysis as follows.
Table 3 shows the physical units of all variables appearing in our 100 mysteries, expressed as products of the fundamental units (meter, second, kilogram, kelvin, volt) to various integer powers. We thus represent the units of each variable by a vector u of 5 integers as in the table. For a mystery of the form , we define the matrix M whose column is the u-vector corresponding to the variable , and define the vector b as the u-vector corresponding to . We now let the vector p be a solution to the equation and the columns of the matrix U form a basis for the null space, so that , and define a new mystery where
By construction, the new variables and are dimensionless, and the number of new variables is equal to the dimensionality of the null space. When , we have the freedom to choose any basis we want for the null space and also to replace p by a vector of the form for any vector ; we use this freedom to set as many elements as possible in p and U equal to zero, i.e., to make the new variables depend on as few old variables as possible. This choice is useful because it typically results in the resulting powers of the dimensionless variables being integers, making the final expression much easier to find than when the powers are fractions or irrational numbers.
ii.3 Polynomial Fit
Many functions in physics and other sciences either are low-order polynomials, e.g., the kinetic energy , or have parts that are, e.g., the denominator of the gravitational force . We therefore include a module that tests if a mystery can be solved by a low-order polynomial. Our method uses the standard method of solving a system of linear equations to find the best fit polynomial coefficients. It tries fitting the mystery data to polynomials of degree 0, 1, …, and declares success if the best fitting polynomial gives r.m.s. fitting error (we discuss the setting of this threshold below).
ii.4 Brute Force
Our brute-force symbolic regression model simply tries all possible symbolic expressions within some class, in order of increasing complexity, terminating either when the maximum fitting error drops below a threshold or after a maximum runtime has been exceeded. Although this module alone could solve all our mysteries in principle, it would in many cases take longer than the age of our universe in practice. Our brute force method is thus typically most helpful once a mystery has been transformed/broken apart into simpler pieces by the modules described below.
We generate the expressions to try by representing them as strings of symbols, trying first all strings of length 1, then all of length 2, etc., saving time by only generating those strings that are syntactically correct. The symbols used are the independent variables as well a subset of those listed in Table 1, each representing a constant or a function. We minimize string length by using reverse Polish notation, so that parentheses become unnecessary. For example, can be expressed as the string “xy+”, the number can be expressed as the string “0<<1>>/” and the relativistic momentum formula can be expressed as the string “mv*1vv*cc*/-R/”.
“+-*/SPLICER”, “+-*/” and
Inspection of Table 1 reveals that many of the symbols are redundant. For example, “1”=“0>” and “x” = “0x-”. , so if we drop the symbol “P”, mysteries involving can still get solved with “P” replaced by “1N1>*” — it just takes longer.
Since there are strings of length using an alphabet of symbols, there can be a significant cost both from using too many symbols (increasing ) and from using too few symbols (increasing the required , or even making a solution impossible). As a compromise, our brute force module tries to solve the mystery using three different symbol subsets as explained in the caption of Table 1.
To exploit the fact that many equations or parts thereof have multiplicative or additive constants, our brute force method comes in two variants that automatically solves for such constants, thus allowing the algorithm to focus on the symbolic expression and not on numerical constants.
Although the problem of overfitting is most familiar when searching a continuous parameter space, the same phenomenon can occur when searching our discrete space of symbol strings. To mitigate this, we follow the prescription in Wu and Tegmark (2018) and define the winning function to be the one with r.m.s. fitting error that has the smallest total description length
where and is the rank of the string on the list of all strings tried. The two terms correspond roughly to the number of bits required to store the symbol string and the prediction errors, respectively, if the hyperparameter is set to equal the number of data points . We use in our experiments below, to prioritize simpler formulas. If the mystery has been generated using a neural network (see below), we set the precision threshold to ten times the validation error, otherwise we set it to .
ii.5 Neural-network-based tests & transformations
Even after applying the dimensional analysis, many mysteries are still too complex to be solved by the polyfit or brute force modules in a reasonable amount of time. However, if the mystery function can be found to have simplifying properties, it may be possible to transform it into one or more simpler mysteries that can be more easily solved. To search for such properties, we need to be able to evaluate at points of our choosing where we typically have no data. For example, to test if a function has translational symmetry, we need to test if for various constants , but if a given data point has its two variables separated by , we typically have no other examples in our data set with exactly that variable separation. To perform our tests, we thus need an accurate high-dimensional interpolation between our data point.
ii.5.1 Neural network training
In order to obtain such an interpolating function for a given mystery, we train a neural network to predict the output given its input. We train a feed-forward, fully connected neural network with 6 hidden layers, the first 3 having 128 neurons and the last 3 having 64 neurons. We use 80% of the mystery data as the training set and the remainder as the validation set, training for 100 epochs with a batch size of 2048. We use the r.m.s.-error loss function and the Adam optimizer was employed with a weight decay of . Softplus was used as the activation function and a learning rate of 0.005. The learning rate and momentum schedules were implemented as described in Smith and Topin (2018); Smith (2018) using the fastai package Howard et al. (2018). The ratio between the maximum and minimum learning rates is 20 while of the iterations are used for the last part of the training cycle. For the momentum, the maximum value was 0.95 and the minimum 0.85, while . We obtained r.m.s. validation errors between and across the range of tested equation, where is the r.m.s. of the -values in the dataset.
ii.5.2 Translational symmetry and generalizations
We test for translational symmetry using the neural network as detailed in Algorithm 1. We first check if the to within a precision . If that is the case, then depends on and only through their difference, so we replace these two input variables by a single new variable . Otherwise, we repeat this test for all pairs of input variables, and also test whether any variable pair can be replaced by its sum, product or ratio. The ratio case corresponds to scaling symmetry, where two variables can be simultaneously rescaled without changing the answer. If any of these simplifying properties is found, the resulting transformed mystery (with one fewer input variables) is iteratively passed into a fresh instantiation of our full AI Feynman symbolic regression algorithm, as illustrated in Figure 1. We choose the precision threshold to be 7 times the neural network validation error.
We test for separability using the neural network as exemplified in Algorithm 2. A function is separable if it can be split into two parts with no variables in common. We test for both additive and multiplicative separability, corresponding to these two parts being added and multiplied, respectively (the logarithm of a multiplicatively separable function is additively separable).
For example, to test if a function of 2 variables is multiplicatively separable, i.e., of the form for some univariate functions and , we first select two constants and ; for numerical robustness, we choose to be the means of all the values of in the mystery data set, . We then compute the quantity
for each data point. This is a measure of non-separability, since it vanishes if is multiplicatively separable. The equation is considered separable if the r.m.s. average over the mystery data set is less than an accuracy threshold , which is chosen to be times the neural network validation error.
If separability is found, we define the two new univariate mysteries and . We pass the first one, , back to a fresh instantiations of our full AI Feynman symbolic regression algorithm and if it gets solved, we redefine , where represents any multiplicative numerical constant that appears in . We then pass back to our algorithm and if it gets solved, the final solutions is . We test for additive separability analogously, simply replacing and by and above; also will represent an additive numerical constant in this case. If we succeed in solving the two parts, then the full solution to the original mystery is the sum of the two parts minus the numerical constant. When there are more than two variables , we are testing all the possible subsets of variables that can lead to separability, and proceed as above for the newly created two mysteries.
ii.5.4 Setting variables equal
We also exploit the neural network to explore the effect of setting two input variables equal and attempting to solve the corresponding new mystery with one fewer variable. We try this for all variable pairs, and if the resulting new mystery is solved, we try solving the mystery that has the found solution divided out.
As an example, this technique solves the Gaussian probability distribution mystery I.6.2. After making and equal, and dividing the initial equation by the result, we are getting rid of the denominator and the remaining part of the equation is an exponential. After taking the logarithm of this (see the below section) the resulting expression can be easily solved by the brute force method.
ii.6 Extra Transformations
In addition, several transformations are applied to the dependent and independent variables which proved to be useful for solving certain equations. Thus, for each equation, we ran the brute force and polynomial fit on a modified version of the equation in which the dependent variable was transformed by one of the following functions: square root, raise to the power of 2, log, exp, inverse, sin, cos, tan, arcsin, arccos, arctan. This reduces the number of symbols needed by the brute force by one and in certain cases it even allows the polynomial fit to solve the equation, when the brute force would otherwise fail. For example, the formula for the distance between 2 points in the 3D Euclidean space: , once raised to the power of 2 becomes just a polynomial which can be easily discovered by the polynomial fit algorithm. The same transformations are also applied to the dependent variables, one at a time. In addition multiplication and division by 2 were added as transformations in this case.
It should be noted that, like most machine-learning methods, the AI Feynman algorithm has some hyperparameters that can be tuned to optimize performance on the problems at hand. They were all introduced above, but for convenience, they are also summarized in Table 2.
|Tolerance in brute force module|
|Tolerance in polynomial fit module|
|Validation error tolerance for neural network use|
|Tolerance for separability|
|Tolerance for symmetry|
|Tolerance in brute force module after separability|
|Tolerance in polynomial fit module after separability|
|Importance of accuracy relative to complexity|
iii.1 The Feynman Symbolic Regression Database
To facilitate quantitative testing of our and other symbolic regression algorithms, we created the Feynman Symbolic Regression Database (FSReD) and made it freely available for download111 The Feynman Database for Symbolic Regression can be downloaded here: https://space.mit.edu/home/tegmark/aifeynman.html. For each regression mystery, the database contains the following:
Data table: A table of numbers, whose rows are of the form , where ; the challenge is to discover the correct analytic expression for the mystery function .
Unit table: A table specifying the physical units of the input and output variables as 6-dimensional vectors of the form seen in Table 3.
Equation: The analytic expression for the mystery function , for answer-checking.
To test an analytic regression algorithm using the database, its task is to predict for each mystery taking the data table (and optionally the unit table) as input. Of course, there are typically many symbolically different ways of expressing the same function. For example, if the mystery function is , then the symbolically different expression should count as a correct solution. The rule for evaluating an analytic regression method is therefore that a mystery function is deemed correctly solved by a candidate expression if algebraic simplification of the expression (say, with the Simplify function in Mathematica or the simplify function in the Python sympy package) produces the symbol .
|, , ,||Angular momentum||2||-1||1||0||0|
|, , , ,||Dimensionless||0||0||0||0||0|
|, , , , ,||Dimensionless||0||0||0||0||0|
|, , , ,||Dimensionless||0||0||0||0||0|
|, , , , ,||Dimensionless||0||0||0||0||0|
|Drift velocity constant||0||-1||1||0||0|
|Electric dipole moment||3||-2||1||0||-1|
|Grav. coupling ()||3||-2||1||0||0|
|, , ,||Length||1||0||0||0||0|
|, , ,||Length||1||0||0||0||0|
|, , , , ,||Length||1||0||0||0||0|
|, , , , ,||Length||1||0||0||0||0|
|, , ,||Light intensity||0||-3||1||0||0|
|, , ,||Magnetic field||-2||1||0||0||1|
|, , ,||Mass||0||0||1||0||0|
|Surface Charge density||0||-2||1||0||-1|
|, , , ,||Velocity||1||-1||0||0||0|
|,||Volume charge density||-1||-2||1||0||-1|
In order to sample equations from a broad range of physics areas, the database is generated using 100 equations from the seminal Feynman Lectures on Physics Feynman et al. (1963a, b, c), a challenging three-volume course covering classical mechanics, electromagnetism and quantum mechanics as well as a selection of other core physics topics; we prioritized the most complex equations, excluding ones involving derivatives or integrals. The equations are listed in tables 4 and 5, and can be seen to involve between 1 and 9 independent variables as well as the elementary functions , , , , sqrt, , , , , and . The numbers appearing in these equations are seen to be simple rational numbers as well as and .
We also included in the database a set of 20 more challenging “bonus” equations, extracted from other seminal physics books: Classical Mechanics by Herbert Goldstein, Charles P. Poole, John L. Safko Goldstein et al. (2002), Classical electrodynamics by J. Jackson Jackson (1999), Gravitation and Cosmology: Principles and Applications of the General Theory of Relativity by Steven Weinberg Weinberg (1972) and Quantum Field Theory and the Standard Model by Matthew D. Schwartz Schwartz (2014). These equations were selected for being both famous and complicated.
The data table provided for each mystery equation contains about rows corresponding to randomly generated input variables. These are uniformly sampled from a specified range where the mystery function is valid.
|eq.||time (s)||used||needed||by Eureqa||w/o DA||tolerance|
|I.6.20b||4792||sym–, ev, bf-log||no||yes|
|I.9.18||5975||da, sym–, sym–, sep, pf-inv||no||yes|
|I.29.16||2135||da, sym–, bf-squared||no||no|
|eq.||time (s)||used||needed||by Eureqa||w/o DA||tolerance|
|II.6.15a||2801||da, sm, bf||no||yes|
|II.11.28||1708||da, sym*, bf||no||yes|
|II.35.21||1597||da, halve-input, bf||yes||no|
|III.9.52||3162||da, sym–, sm, bf||no||yes|
|Source||Equation||Solved||Methods used||Solved by|
|Rutherford Scattering||yes||no||da, bf-sqrt|
|Friedman Equation||yes||no||da, bf-squared|
|Compton Scattering||yes||no||da, bf|
|Radiated gravitational wave power||no||no||-|
|Relativistic aberration||yes||no||da, bf-cos|
|N-slit diffraction||yes||no||da, sm, bf|
|Goldstein 3.16||yes||no||da, bf-squared|
|Goldstein 3.55||yes||no||da, sym-, bf|
|Goldstein 3.64 (ellipse)||yes||no||da, sym-, bf|
|Goldstein 3.74 (Kepler)||yes||no||da, bf|
|Goldstein 3.99||yes||no||da, sym*, bf|
|Goldstein 8.56||yes||no||da, sep, bf-squared|
|Goldstein 12.80||yes||yes||da, bf|
|Jackson 3.45||yes||no||da, bf-inv|
|Jackson 4.60||yes||no||da, sep, bf|
|Jackson 11.38 (Doppler)||yes||no||da, cos-input, bf|
|Weinberg 15.2.1||yes||yes||da, bf|
|Weinberg 15.2.2||yes||yes||da, bf|
|Schwarz 13.132 (Klein-Nishina)||yes||no||da, sym/, sep*, sin-input, bf|