We apply machine learning to the problem of finding numerical Calabi–Yau metrics. Building on Donaldson’s algorithm for calculating balanced metrics on Kähler manifolds, we combine conventional curve fitting and machine-learning techniques to numerically approximate Ricci-flat metrics. We show that machine learning is able to predict the Calabi–Yau metric and quantities associated with it, such as its determinant, having seen only a small sample of training data. Using this in conjunction with a straightforward curve fitting routine, we demonstrate that it is possible to find highly accurate numerical metrics much more quickly than by using Donaldson’s algorithm alone, with our new machine-learning algorithm decreasing the time required by between one and two orders of magnitude.
Machine Learning Calabi–Yau Metrics
Anthony Ashmore, Yang-Hui He, Burt A. Ovrut
Department of Physics, University of Pennsylvania, Philadelphia, PA 19104, USA
Merton College, University of Oxford, OX1 4JD, UK
Department of Mathematics, City, University of London, EC1V 0HB, UK
School of Physics, NanKai University, Tianjin, 300071, P.R. China
- 1 Introduction
- 2 Calabi–Yau metrics and Donaldson’s algorithm
- 3 Machine learning the Calabi–Yau metric
- 4 Extrapolating out to higher $k$
- 5 Supervised learning and extrapolation: results
- 6 Predicting the metric
- 7 Discussion
- A Donaldson’s algorithm in detail
- B Efficient numerical calculation of $\sigma$ and $\mathcal{R}$
- C More on machine learning
The promise of string theory as a unified theory of everything rests on the belief that it can reproduce the known physics in our universe. In particular, at low energies it must reduce to the Standard Model. The first, and perhaps still the most promising, way to produce string models with realistic low-energy physics is to compactify the heterotic string on a Calabi–Yau threefold. As it stands today, there are a number of viable heterotic models that lead to three generations of quarks/leptons with realistic gauge groups and the correct Higgs structure [12, 9, 14, 13, 6, 7, 8, 5, 66, 42, 23], with more predicted to exist.
Despite this progress, one should not lose sight of the necessary requirement that such vacua must satisfy; namely, that their observable properties be consistent with all known low energy phenomenology and properties of particle physics. To do this, one must explicitly perform top-down string computations of observable quantities and compare the results with the experimental data. Within the context of one such model, for example, the masses of the gauge bosons and the Higgs mass have been computed to one-loop accuracy using an explicit renormalization group calculation from the compactification scale, with the results shown to be accurate [2, 1, 64, 65, 66]. It was also demonstrated in this model that all supersymmetric sparticle masses are above their present experimental lower bounds. However, the values of the various dimensionful and dimensionless couplings of the low-energy theory – for example, the Yukawa couplings, the gauge coupling parameters and so on – have not been explicitly calculated to date. Among the many such quantities one would like to compute from a top-down string model, of particular interest are the Yukawa couplings. With these in hand, one could make a concrete prediction for the masses of elementary particles from string theory.
Generic discussions of the mathematical structure of Yukawa couplings within the context of heterotic compactifications have been presented in the literature. Unfortunately, it is not currently possible to compute these couplings explicitly in general. To do so requires finding the gauge-enhanced Laplacian on a Calabi–Yau threefold with a holomorphic vector bundle, using this to compute the harmonic representatives of various sheaf cohomologies and then integrating a cubic product of these harmonic forms over the manifold. However, there is no known analytic expression for the metric on a Calabi–Yau manifold, nor does one know the analytic form of the gauge connection on the vector bundle. Hence, it is presently impossible to determine the required harmonic one-forms analytically.
A number of previous works have tried to tackle this problem numerically. Building on the seminal work of Donaldson [36, 37], there are now algorithms that approximate Ricci-flat metrics on Kähler manifolds and solve the hermitian Yang–Mills equations [40, 39, 52, 51, 10, 11, 3, 4]. In principle, once one has the Ricci-flat metric and the gauge connection, one can find the normalized zero modes of various Laplacians on the compactification manifold and then, as stated above, compute the Yukawa couplings. Despite focussed work on this topic, this goal has not yet been achieved. The current state-of-the-art allows numerical calculations of the metric, the gauge connection and the eigenmodes of the “scalar” Laplacian (the Laplacian without the gauge connection acting on functions). There is no conceptual barrier to extending this numerical approach to the full problem of computing zero modes of gauge-coupled Laplacians. However, there is a very serious technical barrier. Moving away from simple Calabi–Yau manifolds, such as the quintic threefold, to non-simply connected Calabi–Yau manifolds with discrete symmetries and complicated gauge bundles – such as those referenced above – greatly increases both the time and computational power needed. A natural question is whether new computational techniques, such as “machine learning”, might be useful in reducing the time and resources required for these more phenomenologically realistic vacua.
Recently, there has been a great amount of interest in applying techniques of machine learning to string theory, as pioneered in [48, 49, 57, 68, 21]. In particular, methods of machine learning have been applied to various “Big Data” problems in string compactifications, such as string vacua, the AdS/CFT correspondence, bundle cohomology and stability, cosmology and beyond [20, 60, 24, 33, 73, 45, 17, 18, 25, 63, 54, 67, 43, 29, 56, 34, 16, 26], as well as the structure of mathematics [50, 53, 46, 47]. The idea of this present work is to apply these same methods to see whether they are able to increase the accuracy and/or reduce the time and cost of numerical calculations, specifically of the Calabi–Yau metric on generic threefolds. As we will see, machine learning does appear to have a part to play in this story.
First, we show that machine learning algorithms can “learn” the data of a Calabi–Yau metric. More specifically, Donaldson’s algorithm involves a choice of line bundle, whose sections provide an embedding of the Calabi–Yau within projective space. The choice of line bundle fixes the degree $k$ of the polynomials that appear in an ansatz for the Kähler potential. As $k$ increases, the numerical metric becomes closer to Ricci-flat and the algorithm increases in both its run-time and resource requirements. For clarity, we will introduce our machine learning algorithm within the context of the determinant of the Calabi–Yau metric – a single function rather than the nine components required to express the complete metric. The determinant is also of interest in its own right, since it is necessary to compute the so-called $\sigma$-measure which determines how close the metric is to being Ricci-flat. We will show that given the data of the determinant of the metric for low values of $k$, our machine-learning model can predict the determinant corresponding to higher values of $k$ (that is, closer to the actual Ricci-flat metric). However, this calculation needs to be “seeded” with some values of the determinant at larger values of $k$; in other words, this is a supervised learning problem.
Unfortunately, having to “seed” the calculation with values of the determinant at larger $k$ – which must be computed using Donaldson’s algorithm – greatly increases the run-time required. Ideally, one would like to take a preliminary numerical approximation to the determinant and improve on its accuracy without needing to input any data for larger values of $k$. There are a number of ways one might go about this. In this paper, we use a simple extrapolation based on curve fitting to predict how the determinant behaves at larger values of $k$, leaving more complicated methods to future work. We show that this curve fitting algorithm significantly reduces the time required to compute the determinant to higher accuracy. Be that as it may, although faster than using the above machine learning calculation, curve extrapolation is still rather time and resource expensive.
To overcome this problem, we combine both of these approaches: we use the extrapolated data from curve fitting to seed a supervised learning model. Remarkably, we find that this combination of the two algorithms is able to predict the values of the determinant much more quickly than either of the approaches individually. We compare the accuracy and run-time of this combined model with extrapolation and supervised learning individually, as well as Donaldson’s algorithm. We will see that one does not sacrifice much in the way of accuracy, but gains tremendously in speed. In particular, we will demonstrate a factor of roughly 75 speed-up over Donaldson’s algorithm alone.
As stated above, for clarity we present this combined algorithm within the context of calculating the determinant of the metric. We emphasise, however, that these results are immediately applicable to numerically computing the full Calabi–Yau metric. We will show this explicitly in the penultimate section of this paper. This combined algorithm – using Donaldson’s method to compute the Calabi–Yau metric for low values of $k$, combined with curve fitting to compute a small sample of training data and finally machine learning to predict the metric for the remaining points – is the main result of this paper. It provides a factor of 50 speed-up over using Donaldson’s algorithm alone.
We plan to show:
I) Donaldson’s algorithm can be pushed to greater accuracy using Mathematica’s fast linear algebra routines. However, this remains time and resource intensive, both scaling factorially as we increase $k$. Our aim is to use machine learning to mitigate these problems.
II) Focussing on the determinant of the Calabi–Yau metric for clarity:
Using supervised learning, a machine-learning algorithm (ML) can be trained to predict properties of a Calabi–Yau metric, specifically the determinant. This will show that the geometry of Calabi–Yau manifolds is amenable to the techniques of machine learning, at least in principle.
Unfortunately, the nature of supervised learning means that we need some sample data for whatever we are trying to predict. To side-step this, we use a straightforward curve-fitting analysis to extrapolate from lower accuracy, easily computable data to higher accuracy data that is otherwise very time consuming to obtain via Donaldson’s algorithm.
Curve fitting for a larger number of data sets is also time consuming. To avoid the shortcomings of both the machine-learning and curve-fitting approaches, we combine them. Curve fitting provides an easy way to obtain accurate values of the determinant that can then be used to train machine-learning via supervised learning. The curve fitting needs to be done on only a small sample of the total data since the ML needs only a small training set. Together, this allows one to compute the metric data many times more quickly than Donaldson’s algorithm alone, with a factor of 75 speed-up for the determinant.
III) For the complete Calabi–Yau metric:
The combined algorithm presented for predicting the determinant of the metric, that is, using both machine learning and curve fitting, will be shown to be applicable for computing the complete Calabi–Yau metric. We show that this allows one to compute the complete metric data many times more quickly than using Donaldson’s algorithm alone, with a speed-up by a factor of 50 or so for the full metric.
We begin in Section 2 with an overview of Donaldson’s algorithm for approximating Calabi–Yau metrics, with a more detailed discussion presented in Appendix A. In Section 3 we outline the general ideas of machine learning and the specific kind of machine learning we will be using, namely supervised learning. We then discuss how supervised learning can be applied to predict the data of the approximate Calabi–Yau metric. In Section 4, we outline how to extrapolate higher-accuracy data from lower-accuracy data via curve fitting, and we combine this with supervised learning in Section 5. Section 6 is devoted to showing that this combined algorithm, that is, using machine learning along with curve fitting a small number of training points, is directly applicable to the complete nine-component Calabi–Yau metric. We finish the text with a discussion of future work. The appendices contain a detailed discussion of Donaldson’s algorithm, a description of our numerical routine implemented in Mathematica and a rewriting of various error measures, a discussion of the machine-learning algorithm we use and finally, as a sanity check, we show that machine learning cannot be replaced by simple regression.
2 Calabi–Yau metrics and Donaldson’s algorithm
We begin with a review of Calabi–Yau metrics, Yukawa couplings and Donaldson’s algorithm for finding numerical metrics on Calabi–Yau manifolds. A more detailed discussion for the particular case of the Fermat quintic is included in Appendix A.
Let $X$ be a smooth, compact Calabi–Yau threefold, with Kähler form $\omega$, a compatible hermitian metric $g_{a\bar b}$ and a nowhere-vanishing complex three-form $\Omega$. Together, $\omega$ and $\Omega$ define an $SU(3)$ structure on $X$. The statement that $X$ has $SU(3)$ holonomy is equivalent to the differential conditions
$$d\omega = 0, \qquad d\Omega = 0, \tag{2.1}$$
which, in turn, imply that $g_{a\bar b}$ is Ricci-flat. Let $z^a$, $a = 1, 2, 3$, be the three complex coordinates on $X$. Since $g_{a\bar b}$ is hermitian, the pure holomorphic and anti-holomorphic components of the metric must vanish; that is
$$g_{ab} = g_{\bar a \bar b} = 0. \tag{2.2}$$
Only the mixed components survive, which are given as the mixed partial derivatives of a single real scalar function, the Kähler potential $K$:
$$g_{a\bar b} = \partial_a \bar\partial_{\bar b} K. \tag{2.3}$$
Note that, to simplify our notation, we will often denote the determinant of the hermitian metric by
$$g \equiv \det g_{a\bar b}. \tag{2.4}$$
The Kähler form derived from the Kähler potential is
$$\omega = \frac{i}{2}\, \partial \bar\partial K, \tag{2.5}$$
where $\partial$ and $\bar\partial$ are the Dolbeault operators. Recall that $K$ is only locally defined – globally one needs to glue together the local patches by finding appropriate transition functions $f$ (Kähler transformations) so that
$$K(z, \bar z) \sim K(z, \bar z) + f(z) + \bar f(\bar z). \tag{2.6}$$
Since $X$ is Kähler, the Ricci tensor is given by
$$R_{a\bar b} = -\partial_a \bar\partial_{\bar b} \ln \det g_{c\bar d}. \tag{2.7}$$
Practically, finding a Ricci-flat Kähler metric on $X$ reduces to finding the corresponding Kähler potential as a real function of $z$ and $\bar z$. Yau’s celebrated proof of the Calabi conjecture then guarantees that this Calabi–Yau metric is unique in each Kähler class.
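As a quick illustration of how a Kähler potential determines both the metric and its Ricci tensor, one can work through the Fubini–Study potential on a coordinate patch of $\mathbb{P}^1$. This worked example is not part of the original discussion, and it assumes the common sign convention $R_{a\bar b} = -\partial_a \bar\partial_{\bar b} \ln \det g$:

```latex
\begin{align*}
K &= \ln\left(1 + |z|^2\right), \\
g_{z\bar z} &= \partial_z \bar\partial_{\bar z} K
             = \frac{1}{\left(1 + |z|^2\right)^2}, \\
R_{z\bar z} &= -\partial_z \bar\partial_{\bar z} \ln g_{z\bar z}
             = 2\, \partial_z \bar\partial_{\bar z} \ln\left(1 + |z|^2\right)
             = 2\, g_{z\bar z}.
\end{align*}
```

Here the Ricci tensor is proportional to the metric rather than zero, so this metric is Einstein but not Ricci-flat. On a Calabi–Yau threefold the aim is instead a potential for which the analogous expression vanishes.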
The particle content of the low-energy theory one finds after compactifying heterotic string theory on a Calabi–Yau threefold is fixed by topological data of both the manifold and the choice of gauge bundle [19, 69, 70]. The masses and couplings of the particles, roughly speaking, are then fixed by cubic couplings (with masses coming from coupling to Higgs fields). Schematically, these couplings take the form
where the are zero modes of the Dirac operator on coupled to the connection on .
Note that does not give the physical couplings unless the zero modes are correctly normalized.
where indicates a contraction with ’s and ’s to give a -form. The normalized couplings are then given by calculating in a basis of zero modes where . Note that depends on the harmonic representative we take for the modes – it is not enough to only know their cohomology classes. For the simplest example where (and deformations thereof), one can compute using the tools of special geometry. It is not known how or if one can compute using similar tools for general choices of bundle . Instead, one must tackle the problem in its full glory by finding the Ricci-flat metric on , calculating the connection on , and finally explicitly computing the normalized -valued -forms.
To date, no analytic Calabi–Yau metric has ever been found on any compact Calabi–Yau manifold (other than for trivial cases, such as products of tori). Nevertheless, an explicit algorithm to numerically determine the Ricci-flat metric was given by Donaldson. This algorithm has subsequently been explored in a variety of papers where it has been used to find numerical Calabi–Yau metrics, find gauge bundle connections that satisfy the hermitian Yang–Mills equation, examine bundle stability and explore the metric on Calabi–Yau moduli spaces [52, 39, 40, 38, 10, 11, 51, 3, 4, 55].
In the remainder of this section, we describe Donaldson’s algorithm in more detail (with the specific case of the Fermat quintic presented in Appendix A) and discuss the computational problems one faces when trying to calculate to higher order in the iterative approximation. These challenges will motivate the machine-learning approach we discuss in the remainder of the paper.
2.1 Donaldson’s algorithm
The general idea of Donaldson’s algorithm is to approximate the Kähler potential of the Ricci-flat metric using a finite basis of degree-$k$ polynomials on $X$, akin to a Fourier series representation. This “algebraic” Kähler potential is parametrized by a hermitian matrix $h^{\alpha\bar\beta}$ with constant entries. Different choices of $h^{\alpha\bar\beta}$ correspond to different metrics within the same Kähler class. Following Donaldson’s algorithm, one then iteratively adjusts the entries of $h^{\alpha\bar\beta}$ to find the “best” degree-$k$ approximation to the unique Ricci-flat metric. Here “best” is taken to mean the balanced metric at degree $k$. Note that, as one increases $k$, the balanced metric becomes closer to Ricci-flat at the cost of exponentially increasing the size of the polynomial basis and the matrix $h^{\alpha\bar\beta}$. As we will discuss, at some point it becomes computationally extremely difficult to further increase $k$ and, hence, to obtain a more accurate approximation to the Ricci-flat metric.
One can check how good the approximation is – that is, how close the balanced metric is to being Ricci-flat for a given value of $k$ – by computing a variety of “error measures”. These include $\sigma$, a measure of how well the Monge–Ampère equation is solved, and $\mathcal{R}$, a direct measure of how close the metric is to Ricci-flat. We will describe exactly what these quantities are later in this section.
Let us begin by summarizing the algorithm as given by Donaldson. After this we will discuss how one implements it numerically, with many of the details left to Appendix A.
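Let the degree $k$ be a fixed positive integer. We denote by $\{s_\alpha\}$ a basis of global sections of $\mathcal{O}_X(k)$:
$$H^0(X, \mathcal{O}_X(k)) = \operatorname{span}\{s_\alpha\}, \qquad \alpha = 0, \dots, N_k - 1. \tag{2.10}$$
In other words, we choose an $N_k$-dimensional basis of degree-$k$ holomorphic polynomials $s_\alpha$ on $X$. The values of $N_k$ grow factorially with $k$; for a quintic Calabi–Yau, $N_k$ is given for any $k$ by equation (A.37).
With this basis, the ansatz for the Kähler potential is
$$K = \frac{1}{k\pi} \ln \sum_{\alpha, \bar\beta = 0}^{N_k - 1} h^{\alpha\bar\beta} s_\alpha \bar s_{\bar\beta}, \tag{2.11}$$
where $h^{\alpha\bar\beta}$ is an invertible hermitian matrix.
The pairing $h^{\alpha\bar\beta} s_\alpha \bar s_{\bar\beta}$ defines a natural inner product on the space of global sections, so that $h^{\alpha\bar\beta}$ gives a metric on $\mathcal{O}_X(k)$. Consider the hermitian matrix
$$H_{\alpha\bar\beta} = \frac{N_k}{\mathrm{Vol}_{\mathrm{CY}}} \int_X \mathrm{dVol}_{\mathrm{CY}} \, \frac{s_\alpha \bar s_{\bar\beta}}{h^{\gamma\bar\delta} s_\gamma \bar s_{\bar\delta}}, \tag{2.12}$$
where $\mathrm{Vol}_{\mathrm{CY}}$ is the integrated volume measure of $X$:
$$\mathrm{Vol}_{\mathrm{CY}} = \int_X \mathrm{dVol}_{\mathrm{CY}}, \qquad \mathrm{dVol}_{\mathrm{CY}} = \Omega \wedge \bar\Omega. \tag{2.13}$$
In general, $h^{\alpha\bar\beta}$ and $H_{\alpha\bar\beta}$ will be unrelated. However, if they are inverses of each other,
$$h^{\alpha\bar\beta} = (H_{\alpha\bar\beta})^{-1}, \tag{2.14}$$
the metric on $\mathcal{O}_X(k)$ given by $h^{\alpha\bar\beta}$ is said to be “balanced”. This balanced metric then defines a metric $g_{a\bar b}$ on $X$ via the Kähler potential (2.11). We also refer to this metric on $X$ as balanced.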
Donaldson’s theorem then states that for each $k$ a balanced metric exists and is unique. Moreover, as $k \to \infty$, the sequence of balanced metrics converges to the unique Ricci-flat Kähler (Calabi–Yau) metric on $X$.
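In principle, for each $k$, one could solve (2.14) for the $h^{\alpha\bar\beta}$ that gives the balanced metric using (2.12) as an integral equation. However, due to the highly non-linear nature of the equation, an analytic solution is not possible. Fortunately, for each integer $k$, one can solve for $h^{\alpha\bar\beta}$ iteratively as follows:
Define Donaldson’s “$T$-operator” as
$$T : h^{\alpha\bar\beta}_{(n)} \mapsto T(h_{(n)})_{\alpha\bar\beta} = \frac{N_k}{\mathrm{Vol}_{\mathrm{CY}}} \int_X \mathrm{dVol}_{\mathrm{CY}} \, \frac{s_\alpha \bar s_{\bar\beta}}{h^{\gamma\bar\delta}_{(n)} s_\gamma \bar s_{\bar\delta}}. \tag{2.15}$$
Let $h_{(0)}$ be an initial invertible hermitian matrix.
Then, starting with $h_{(0)}$, the sequence
$$h_{(n+1)} = \left[ T(h_{(n)}) \right]^{-1} \tag{2.16}$$
converges to the desired balanced metric $h^{\alpha\bar\beta}$ as $n \to \infty$.
The convergence is very fast in practice, with only a few iterations necessary to give a good approximation to the balanced metric. For all calculations in this paper, we iterate the $T$-operator ten times.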
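The iteration described above can be sketched numerically on a toy example. The following is a minimal sketch only, not the authors' Mathematica implementation: it replaces the quintic by $\mathbb{P}^1$ with its degree-$k$ monomials and samples points uniformly on $S^3$ in place of the Calabi–Yau measure. All function names (`sample_p1`, `sections_p1`, `t_operator`, `balanced_h`) are hypothetical, and transposing the inverse reflects one reading of the index placement (it is immaterial for the real, diagonal result in this toy case).

```python
import numpy as np

def sample_p1(n_points, seed=0):
    """Uniform points on S^3, used as homogeneous coordinates on P^1 (toy stand-in)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n_points, 2)) + 1j * rng.normal(size=(n_points, 2))
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def sections_p1(z, k):
    """Degree-k monomials z0^(k-j) z1^j, j = 0..k, evaluated at every point."""
    j = np.arange(k + 1)
    return z[:, [0]] ** (k - j) * z[:, [1]] ** j

def t_operator(h, s, w):
    """Monte Carlo estimate of T(h); the weights w are assumed to sum to 1."""
    n_k = s.shape[1]
    # Denominator h^{gamma delta-bar} s_gamma s-bar_delta at each sample point.
    denom = np.einsum("ia,ab,ib->i", s, h, s.conj()).real
    # Weighted sum of s_alpha s-bar_beta / denominator over the sample points.
    return n_k * np.einsum("i,ia,ib->ab", w / denom, s, s.conj())

def balanced_h(s, w, n_iter=10):
    """Iterate h -> inverse of T(h), starting from the identity matrix."""
    h = np.eye(s.shape[1], dtype=complex)
    for _ in range(n_iter):
        # Transposed inverse, so that sum_beta h[alpha, beta] * H[gamma, beta] = delta.
        h = np.linalg.inv(t_operator(h, s, w)).T
    return h

z = sample_p1(20000)
s = sections_p1(z, k=2)
w = np.full(len(z), 1.0 / len(z))
h = balanced_h(s, w)
```

In this toy setting the fixed point is known analytically: up to an overall scale and Monte Carlo noise, `h` should approach a diagonal matrix with binomial coefficients on the diagonal, since $\sum_j \binom{k}{j} |z_0|^{2(k-j)} |z_1|^{2j} = 1$ on the unit sphere. On the quintic no such closed form is available, which is why the iteration is needed.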
At this point, one has an approximation to the Calabi–Yau metric, given by the balanced metric computed at degree $k$. A natural question is: just how good is this approximation, that is, how close is the balanced metric evaluated for integer $k$ to being Ricci-flat? A number of “error measures” have been introduced in the literature for this purpose [40, 39, 10, 3], two of which we discuss here.
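The “$\sigma$ measure” is a measure of Ricci flatness encoded by the Monge–Ampère equation. Consider the top-form $\omega \wedge \omega \wedge \omega$ defined by the Kähler form. Since $X$ is a Calabi–Yau threefold with $\Omega$ the unique (up to scaling) non-vanishing $(3,0)$-form, these two must be related by an overall constant $\kappa$. That is
$$\omega \wedge \omega \wedge \omega = \kappa \, \Omega \wedge \bar\Omega.$$
This is equivalent to the Monge–Ampère equation which defines the Calabi–Yau metric. Comparing $\omega^3$ with $\Omega \wedge \bar\Omega$, one should find they agree pointwise up to an overall constant (which is the same for all points). To avoid computing the constant, we can compare the integral of the two top-forms so that $\kappa$ cancels:
$$\mathrm{Vol}_K \equiv \int_X \omega^3, \qquad \mathrm{Vol}_{\mathrm{CY}} \equiv \int_X \Omega \wedge \bar\Omega \qquad \Rightarrow \qquad \frac{\omega^3}{\mathrm{Vol}_K} = \frac{\Omega \wedge \bar\Omega}{\mathrm{Vol}_{\mathrm{CY}}}.$$
Note that one can compute $\Omega$ exactly using a residue theorem. Taking $\omega = \omega_k$, where $\omega_k$ is the Kähler form for the balanced metric at degree $k$, this equality holds if and only if the balanced metric is the desired Calabi–Yau metric. Said differently, the ratio of $\omega_k^3 / \mathrm{Vol}_K$ and $\Omega \wedge \bar\Omega / \mathrm{Vol}_{\mathrm{CY}}$ must be 1. Integrating over $X$, the quantity
$$\sigma_k \equiv \frac{1}{\mathrm{Vol}_{\mathrm{CY}}} \int_X \mathrm{dVol}_{\mathrm{CY}} \left| 1 - \frac{\omega_k^3 / \mathrm{Vol}_K}{\Omega \wedge \bar\Omega / \mathrm{Vol}_{\mathrm{CY}}} \right|$$
is 0 if and only if $\omega_k$ is the Kähler form of the Calabi–Yau metric. In other words, $\sigma_k$ is a measure of how far $\omega_k$ is from the Ricci-flat metric. As $k$ increases, $\sigma_k$ approaches zero at least as fast as $k^{-2}$ [10, 11]. Note that $\omega_k^3$, and thus $\sigma_k$, can be computed directly from the determinant of the metric $g$. This is one of the reasons we focus on the determinant later in this paper: it is straightforward to check how accurate our machine-learning approach is by computing $\sigma_k$.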
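To make the definition concrete, the following is a minimal sketch of how such a measure might be estimated from sampled data. It assumes one already has, at each sample point, the pointwise ratio of the two top-forms (which is proportional to the metric determinant) together with integration weights for the Calabi–Yau measure. The function name `sigma_measure` and its arguments are hypothetical and are not taken from the paper's code.

```python
import numpy as np

def sigma_measure(ratio, w):
    """Estimate the sigma measure from sampled data.

    ratio[i]: pointwise value of omega^3 / (Omega ^ Omega-bar) at sample point i.
    w[i]:     weights such that sum_i w[i] * f(p_i) approximates the integral
              of f against the Calabi-Yau volume form.
    """
    ratio = np.asarray(ratio, dtype=float)
    w = np.asarray(w, dtype=float)
    vol_cy = w.sum()              # estimate of Vol_CY
    vol_k = (w * ratio).sum()     # estimate of Vol_K
    # Average of |1 - (omega^3 / Vol_K) / (Omega ^ Omega-bar / Vol_CY)|.
    return (w * np.abs(1.0 - ratio * vol_cy / vol_k)).sum() / vol_cy
```

A constant ratio gives zero, reflecting the fact that the overall constant drops out, while any pointwise variation in the ratio gives a positive value.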
The “$\mathcal{R}$ measure” is a global measure of how close to zero the Ricci scalar is. The quantity
where is the Ricci scalar computed using the balanced metric for integer , is zero if and only if is the exact Calabi–Yau metric. The various factors of and that appear in this expression are there to remove any scaling dependence on .
Note that there are other error measures one could use, such as further measures introduced in the literature, or the pointwise values of the Ricci scalar or any components of the Ricci tensor. However, we will not discuss them in this paper. We leave the details of the numerical implementation of Donaldson’s algorithm and the calculation of the error measures $\sigma$ and $\mathcal{R}$ to Appendices A and B. They are also discussed in several reviews in the literature [40, 39, 10, 3, 4]. Here we focus only on those details which will be relevant to machine-learning the Calabi–Yau metric later in this paper.
Both the $T$-operator and the error measures involve integrating over the threefold, suggesting that we need to introduce local coordinate charts and all of the complications that come with these. Fortunately, we can avoid this by approximating integrals by sums over random points.
The number of points we need to take when approximating the integrals is important and will be explicitly discussed. In addition, once we have found the balanced metric for fixed integer $k$, we need to consider at how many points to evaluate $g_{a\bar b}$. That is, there are three (unrelated) numbers of points that must be specified. These are:
Let $N_p$ be the number of random points we sum over to approximate the $T$-operator in (2.15). As we discuss below, once we have chosen the degree $k$ – and, hence, $N_k$ – at which to approximate the Ricci-flat metric, $N_p$ is bounded by the requirement that $N_p \gg N_k^2$.
Let $N_t$ be the number of random points we sum over to approximate the integrals in the error measures.
Let $N_g$ be the number of points for which we want to know the value of the metric $g_{a\bar b}$.
We consider each of these in turn.
As discussed in previous work, since the $T$-operator leads to an $N_k \times N_k$ matrix $h^{\alpha\bar\beta}$, one needs $N_p \gg N_k^2$ points for convergence of $h^{\alpha\bar\beta}$ to the balanced metric. If one uses too few points, one finds $h^{\alpha\bar\beta}$ does not converge properly to the balanced representative and the resulting metric is further away from Ricci-flat than one would otherwise expect. Said differently, iterating the $T$-operator with too few points leads to an $h$-matrix, and the associated metric $g_{a\bar b}$ on $X$, that has error measures larger than those of the matrix computed with $N_p \gg N_k^2$. It was found in a previous study of the scalar Laplacian that a good choice is
$$N_p = 10 N_k^2 + 50{,}000. \tag{2.22}$$
Unless otherwise stated, for fixed $k$ and hence $N_k$, we will always evaluate the $T$-operator using this many points in the integral.
We do not need to evaluate the integrals in the error measures for all $N_p$ points. Instead, one can approximate these error measure integrals, for any integer $k$, with a fixed number of points $N_t$. Roughly, the relative error when computing, for example, $\sigma_k$ with $N_t$ points is $N_t^{-1/2}$. Hence, if one is interested only in checking how close the numerical metric is to Ricci-flat, it is sufficient to take $N_t = 10{,}000$ to get estimates that are good to 1%. Of course, using larger values for $N_t$ will result in even more accurate results. In the remainder of this paper, we will explicitly state the value of $N_t$ that we are choosing for a given calculation.
Finally, for a fixed value of $k$, $N_g$ is the number of points for which we want to know the value of the components of the resulting metric $g_{a\bar b}$. This is the desired output of Donaldson’s algorithm, giving us the ability to calculate the metric numerically and then to use it to compute other quantities on the Calabi–Yau threefold. For example, if one wants to solve the hermitian Yang–Mills equations [39, 3, 4] or find the eigenmodes of Laplace operators, one needs to know the metric numerically. In the latter case, it was found in previous work that for a quintic Calabi–Yau threefold it is sufficient to solve for the eigenmodes using 500,000 random points. Since the metric appears in the Laplace operator, one also needs to know the metric for those same random points; that is, $N_g = 500{,}000$. Similarly, when we predict the values of the metric on the quintic later in this paper, we will assume that we want to know these values for $N_g = 500{,}000$ random points.
Note that the bound (2.22) has implications for both the speed and feasibility of the numerical calculations. For example, for a quintic hypersurface embedded in , using (A.37) for and one has and respectively. This means one needs and points respectively to be confident that the -operator will converge properly to the balanced metric. For , needs to be on the order of 500 million points. We see that as we push to higher values of , in addition to the size of the polynomial basis increasing factorially, the number of points we sum over to find the balanced metric also grows factorially. Together, these make going to larger values of prohibitive in both time and computational resources. Previous studies have been limited to values of that give on the order of 500 sections (roughly or so for the Fermat quintic).
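The growth described above is easy to tabulate. The sketch below rests on two assumptions that are not spelled out in this section: that the number of sections on the quintic is the standard count of degree-$k$ monomials in five variables modulo the quintic equation (expected to agree with (A.37), which is not reproduced here), and that the number of integration points follows the rule $N_p = 10 N_k^2 + 50{,}000$ used in earlier numerical studies. The function names are hypothetical.

```python
from math import comb

def n_sections_quintic(k):
    """Assumed count of degree-k sections on a quintic threefold in P^4."""
    n = comb(k + 4, 4)          # degree-k monomials in five homogeneous coordinates
    if k >= 5:
        n -= comb(k - 1, 4)     # remove multiples of the quintic polynomial
    return n

def n_points(k):
    """Assumed number of integration points, N_p = 10 * N_k^2 + 50,000."""
    return 10 * n_sections_quintic(k) ** 2 + 50_000

for k in (4, 8, 12, 20):
    print(k, n_sections_quintic(k), n_points(k))
```

Under these assumptions, $k = 8$ gives 460 sections and roughly 2.2 million points, while $k = 20$ gives 6,750 sections and about $4.6 \times 10^8$ points, which illustrates how quickly the cost of the $T$-operator grows.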
2.2 A check using Donaldson’s algorithm
Although all analytic and numerical methods that we will discuss are valid on any threefold, for concreteness we focus on the Fermat quintic , defined by equation (A.35), for the remainder of the paper. The main aim of this paper is to use machine learning to enhance the speed of calculation of the Calabi–Yau metric, reducing the time needed compared to using Donaldson’s iterative algorithm. As we discuss in Appendix B, we have chosen to implement this using Mathematica, rather than C++ as was previously used, since it is well suited to the numerical linear algebra calculations that occur in Donaldson’s algorithm and provides an extensive suite of machine-learning tools. As a check of our Mathematica implementation, we first apply it explicitly to Donaldson’s algorithm, compute the various error measures, and compare these with the error measures previously found in  using a C++ implementation. In Figures 1 and 2, we plot and respectively for , for both our new Mathematica implementation – the blue line – and the C++ implementation – the red dashed line – used in . In all cases, calculations of the -operator were carried out using points, fixed by (2.22). The error measures were computed using for all . We use here so that we can directly compare our Mathematica implementation with the results in  which were computed using . We conclude that the C++ results are reproduced by our new Mathematica implementation, which we employ from here onwards.
Note that the numerical approximation to the Ricci-flat metric improves as increases. Unfortunately, as increases, the computational time and resources needed to carry out the numerical integrations grow dramatically. This is due to factorial growth of both the number of sections and the number of integration points . In Figure 3, we plot the times needed to calculate the -matrix as varies. We do indeed see factorial growth. For , calculations take on the order of 50 hours. Such long times might be acceptable if one is interested only in computing the balanced metric once to high accuracy. However, in reality, one would like to vary the complex structure or Kähler parameters to explore the moduli space of the Calabi–Yau threefold without reducing the accuracy of the approximation. In the case of gauge connections, one would like to employ similar methods to explore gauge bundle stability. For both of these, one needs to repeatedly calculate the -matrix quickly – a single calculation that takes 50 hours suddenly looks very slow when one has to repeat it for multiple choices of complex structure, Kähler and bundle moduli. Note that it is unlikely that one will be able to go to much larger values in the near future using Donaldson’s algorithm alone without moving to a cluster – for example, the above timings would suggest that would take approximately 35 years!
Ideally, we would like to find some way of greatly speeding up this calculation and improving the accuracy of our approximation (akin to going to larger values of $k$). It is clear that, to do so, one must modify the calculational procedure and no longer use Donaldson’s algorithm on its own. The remainder of this paper discusses how this might be done using a combination of machine learning and curve fitting.
3 Machine learning the Calabi–Yau metric
In [48, 49] a paradigm was proposed to use artificial intelligence in the form of machine learning (deep learning in particular) to bypass expensive algorithms in computational geometry. Indeed, [48, 49, 57, 68, 21] brought about much collaboration between the machine-learning and string theory communities. It was found that problems central to formal and phenomenological aspects of string theory, such as computing bundle cohomologies or determining particle spectra, appear to be machine learnable to very high precision (see  for a pedagogical introduction). It is therefore natural to ask whether machine-learning techniques may be of use in our present, computationally expensive problem. Henceforth, we will abbreviate any machine-learning techniques, be they neural networks or decision trees, collectively as ML.
As one can see from (A.37), the size of the monomial basis at degree $k$, and hence the size of the matrix $h^{\alpha\bar\beta}$, grows factorially. This presents a problem: the Ricci-flat metric is better approximated as $k \to \infty$, but the complexity growth with respect to $k$ is factorial. Furthermore, Donaldson’s algorithm involves evaluating the monomial basis at each sampled point on the manifold, multiple matrix multiplications and finally a matrix inverse. Taken together, it is clear that pushing the algorithm to higher values of $k$ is, at best, computationally expensive and, at worst, impossible with reasonable bounds on accessible hardware.
If one could predict the relevant quantities at higher $k$ given data computed at lower $k$, then one could bypass the most expensive steps of Donaldson’s algorithm. In this section, we discuss how this can be done given a small sample of values at the higher value of $k$. Note, however, that the small sample of values must still be computed by following Donaldson’s algorithm – we still need to evaluate the $T$-operator for $N_p$ points to find the balanced metric. This section should therefore be seen as a test of our machine-learning approach. However, to be useful in practice, we must find some way of calculating or predicting the higher $k$ values without Donaldson’s algorithm; in the later sections, we outline how this can be done.
3.1 Supervised learning
We begin with a somewhat abstract review of machine learning, focussing on the particular case of supervised learning. We will try to make this more concrete in Section 3.2, where we show how this applies to the problem at hand.
Our problem is a natural candidate for supervised learning:
We have a set of input values for which we know the output . For example, the inputs might be a set of points on the quintic and the values of at each such point. The outputs might be the values of or at each point for a larger value of . This constitutes a set of labelled data of the form .
Using this data, we can train an appropriate ML (which can be any of the standard ones such as a neural network, a classifier, a regressor, etc.) to predict output values from the inputs .
Here, training means that the ML optimizes its parameters in order to minimize some cost-function (such as the mean squared error, determined by how far off the predicted values are from the actual values of ).
Given a set of new inputs for which we do not know the outputs, we use the trained ML to predict a set of outputs, one for each .
In its simplest form, supervised learning is no different from regression, familiar from rudimentary statistics. The key difference with supervised learning (and machine learning more generally) is that one does not specify a single, usually quite simple, function as in the case of regression, but rather a set of non-linear functions, such as a complicated directed graph of nodes in the case of neural networks, or multi-level output in the case of classifiers. The more sophisticated the structure of the ML, the better it can approximate complicated systems.
In general, one needs a measure of how well trained the ML is; that is, how accurate its predictions are. The standard measure uses cross-validation. Take the labelled data and split it into two complementary sets, and , so that . These are usually referred to as the training data, , and the validation data, . Cross-validation proceeds as follows:
We train the ML on , the training set. This optimises the parameters of the ML to minimise whichever cost-function we pick.
We apply the optimized ML on the inputs of , the validation set, giving us a set of predicted values .
We then cross-check the predicted values against the known values within the validation set . We do this by examining some goodness-of-fit measure (such as percentage agreement or chi-squared). This allows us to see how well the ML is performing.
We then vary the size of the training set to see how the goodness-of-fit measure varies. For example, we could check how well the ML performs after training on 10%, 20%, etc., of the total data . The plot of against the size of is called the learning curve. Typically, the learning curve is a concave function that increases monotonically as we increase the percentage of training data. In other words, when the ML is trained on a larger sample of data, it performs better, but the improvement diminishes with each added training point.
The particular flavour of ML that we have chosen to focus on is that of gradient-boosted decision trees.
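To make the train/validate cycle concrete, here is a minimal sketch on a toy one-dimensional problem rather than the quintic data. The boosted-stump regressor is a deliberately tiny stand-in for gradient-boosted decision trees, written with the standard library only; it is not the implementation used for the results in this paper, and all names (`fit_stump`, `boost`, `learning_curve`) and the synthetic data are assumptions. Since the score used here is a validation error (RMSE), lower is better, so one expects the curve to fall rather than rise as the training set grows.

```python
import math
import random

def stump_value(stump, x):
    """Evaluate a one-split regression stump at x."""
    threshold, left, right = stump
    return left if x <= threshold else right

def fit_stump(xs, residuals):
    """Least-squares stump: choose the split that best explains the residuals."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    n, total, left_sum, best = len(xs), sum(residuals), 0.0, None
    for pos in range(1, n):
        i, j = order[pos - 1], order[pos]
        left_sum += residuals[i]
        if xs[i] == xs[j]:
            continue  # cannot split between identical inputs
        left, right = left_sum / pos, (total - left_sum) / (n - pos)
        gain = pos * left ** 2 + (n - pos) * right ** 2  # larger gain = smaller squared error
        if best is None or gain > best[0]:
            best = (gain, 0.5 * (xs[i] + xs[j]), left, right)
    return best[1:]

def boost(xs, ys, rounds=200, rate=0.1):
    """Gradient boosting for squared error: each stump fits the current residuals."""
    base = sum(ys) / len(ys)
    preds, stumps = [base] * len(ys), []
    for _ in range(rounds):
        stump = fit_stump(xs, [y - p for y, p in zip(ys, preds)])
        stumps.append(stump)
        preds = [p + rate * stump_value(stump, x) for p, x in zip(preds, xs)]
    return base, rate, stumps

def predict(model, x):
    base, rate, stumps = model
    return base + rate * sum(stump_value(s, x) for s in stumps)

def rmse(model, xs, ys):
    return math.sqrt(sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys))

# Synthetic labelled data (an assumption): x plays the role of an input, y of a known output.
random.seed(0)
xs = [random.random() for _ in range(1000)]
ys = [math.sqrt(x) + random.gauss(0.0, 0.02) for x in xs]

learning_curve = {}
for n_train in (50, 100, 200, 500):
    model = boost(xs[:n_train], ys[:n_train])                          # train on the training set
    learning_curve[n_train] = rmse(model, xs[n_train:], ys[n_train:])  # score on the validation set
    print(n_train, round(learning_curve[n_train], 4))
```

In practice one would reach for an established library rather than hand-rolled stumps; the point of the sketch is only the split, train, predict and score loop described above.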
3.2 Learning the determinant
One expects the analytic form of the Kähler potential for a Calabi–Yau metric to be a complicated non-holomorphic function; so complicated, in fact, that no explicit form has ever been written down, even for the simplest of compact Calabi–Yau manifolds.
As we discussed in Section 2.1, Donaldson’s algorithm gives a way to approximate the honest Ricci-flat metric on via a balanced metric , computed at some fixed degree . Since the metric, its determinant and the Ricci tensor can be derived in turn by simple operations such as logarithms and derivatives, we choose to focus on one of them.
Let us consider the determinant of the metric, , because:
It is a convenient scalar quantity, easily calculated from itself.
It encodes curvature information since the mixed partials of its logarithm give the Ricci tensor.
It allows one to integrate quantities over the manifold.
One can use it to compute the accuracy measure . From (2.19), the only approximate quantity appearing in is , but this is fixed by the determinant of via the relation
In other words, gives a straightforward example to which we can apply machine-learning techniques while still allowing us to compute the error measure to check the accuracy of our methods. We could, for example, have focussed on the Kähler potential itself, but since we are predicting the values at each point and not its functional form, we would have been unable to compute to check whether our approach was actually useful.
Since the metric itself can be thought of as a collection of patch-wise functions , our procedure for predicting can also be used to predict the values of itself. This is, of course, what we actually want to do in practice since it is the metric itself that enters into calculations of the gauge connection and various Laplace operators on .
Given the supervised learning routine outlined above, it is natural to ask whether machine-learning techniques can improve the accuracy of our approximation to the Calabi–Yau metric and/or reduce the amount of time needed for the calculation. Specifically, focussing on the determinant, we ask:
Given a set of points on the quintic and the corresponding values of computed at some low degree , can one predict the values of computed at a higher degree ?
As an example, imagine we want the value of the Calabi–Yau determinant for points on the quintic. Using Donaldson’s algorithm, we can find an approximation to by computing the determinant of the balanced metric, where the degree controls the accuracy of the approximation. As we increase , we get a better approximation to the honest Calabi–Yau determinant with the price being an explosion in computational time due to the factorial increases in both and .
Suppose we use Donaldson’s algorithm to compute the value of for all of the points and , for fixed , for only a small sample of them – can we use machine learning to predict the remaining values of the determinant ?
Suppose we use Donaldson’s algorithm to compute the value of for all of the points and , for fixed , for none of them – can we use machine learning to predict all of the values of ?
In the first case, one needs some of the values of as an input to our supervised-learning model, while, in the second, one does not need to have calculated at all. The first problem is amenable to supervised learning since we have both input () and output () data. The second problem, unlike the first, does not naturally fall within supervised learning. Obviously, we would like to find a solution to the second problem as it would side-step having to follow Donaldson’s algorithm for the higher value of , whereas in the first case we still need to compute some of the higher data.
Let us make clear why computing even a small sample of the higher data is unacceptable in practice. In the first problem, we need the value of for a small sample of the points so that we have some data for our ML to learn from. In order to compute any of these values, however, one must first iteratively solve for at degree . But we cannot simply compute the matrix using only the small sample of points we intend to use as the input data! Instead, (2.22) forces us to integrate over sufficient points so that holds. This means there is a hidden, and unacceptably large, computational cost in solving the first type of problem.
For example, imagine we tried to teach an ML to predict from . To generate the values in the first place, we would have to evaluate the -operator for approximately points, otherwise would not converge properly to the balanced metric.
A solution to the second type of problem, where we do not require any of the higher data, would indeed avoid Donaldson’s algorithm for the higher value of and so potentially greatly speed up the time of calculation – this is the main goal of the paper. We will spend the remainder of this section discussing the first problem, leaving a solution to the second problem to Sections 4 and 5.
Let the input values be of the form
where are coordinates
Next, let the output be
where . As increases Donaldson’s algorithm becomes more costly, both in time and computational resources, due to the sizes of the intermediate matrices involved. The idea is to avoid this by training a model to predict the values of the determinant.
To summarize, we have labelled data of the form
where signifies a data set with the values of as inputs and as outputs. Using this data structure, we will perform supervised learning and see whether an ML can accurately predict the determinant for from the values. We will then explore how the accuracy changes as we use higher values of as an input.
3.3 Warm-up: to
As a warm-up, let us first try learning the values of from the points and . Donaldson’s algorithm and the calculation of at are relatively fast, so it is easy to check how well the machine-learning model is doing.
Note that, even though we are eventually interested in the values of at all random points, it is sufficient to limit the data to a much smaller set of such points when checking the validity of our machine-learning algorithm. Here, we take our labelled data to consist of random points on the quintic, together with the values of the determinants of the balanced metrics computed by Donaldson’s algorithm at and . These are organised as in (3.26):
In principle, there is some complicated function which describes this map. Standard regression analysis would require one to guess some non-linear function with parameters which approximates this map, and then optimise the parameters using least squares, etc. However, even the form of this function is difficult to imagine. Herein lies the power of machine learning: one does not try to fit a single function but rather uses a combination of non-linear functions or decision trees in an interactive and interconnected fashion. The ML can then, in principle, approximate the function without us having to guess its form in the first place.
Suppose we take a training set of 2,000 random samples from . Our validation set will be the remaining 8,000 samples. The ML is trained on , the 2,000 samples of points on the quintic with their associated values of and . Once trained, we present it with the remaining 8,000 samples of from the validation set , and use it to predict the values of for those points. We then want to compare the set with the known values in for the sample of 8,000 points. This comparison is shown graphically in Figure 3(a), where we compare the 8,000 values of predicted by the ML versus the actual values of computed from the balanced metric. One sees that the predicted values are indeed a good approximation to the actual values of , with the points clustered around the line without any obvious bias. The best fit curve is
where perfect prediction corresponds to . We also compare in Figure 3(b) the values of and , both computed using the balanced metrics – this is the distribution one would see if the ML were simply using the input value of as its predicted value . We note that this shows a large deviation away from the perfect-prediction line, indicating that simply taking is worse than the ML. In other words, having seen only 2,000 samples of data, the ML has learned to predict the values of for the remaining 8,000 points with impressive accuracy and confidence, all in a matter of seconds.
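The comparison of predicted against actual values can be summarised by an ordinary least-squares line, for which perfect prediction means unit slope and zero intercept. The sketch below shows one way to compute such a line; the function name `best_fit_line` and the five sample values are made-up illustrations, not data from this paper.

```python
def best_fit_line(actual, predicted):
    """Least-squares line predicted = slope * actual + intercept."""
    n = len(actual)
    mean_x = sum(actual) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in actual)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(actual, predicted))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical values only: "actual" from a reference calculation, "predicted" from a model.
actual = [0.8, 1.0, 1.2, 1.5, 2.0]
predicted = [0.82, 0.99, 1.18, 1.52, 1.97]
slope, intercept = best_fit_line(actual, predicted)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

A slope close to one with a small intercept indicates no systematic bias, which is the visual impression given by points clustering around the diagonal.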
Our comparison of the actual values versus predicted values, though reassuring, is rather primitive. The linear coefficient in (3.28) indicates only how good the prediction is for values of the determinant at . What we are really interested in is how close the predicted metric is to the honest Ricci-flat metric (which one would find in the limit ). A good measure of this is the error measure, given in (2.19). Recall that is determined
The error measure , computed using the predicted values , is significantly smaller than and within 10% of the actual value of . This tells us that our ML provides a much better approximation to the determinant of the Ricci-flat metric than , and it is relatively close to in its accuracy.
Note that since Donaldson’s algorithm starts from a Kähler potential, the resulting balanced metric is guaranteed to be Kähler up to the numerical precision we are working with. One might worry that the predicted values (or if one were predicting the components of the metric) no longer correspond to an exact Kähler metric. This will indeed be the case since we are predicting the values of and, hence, its “Kählerness” is no longer built in. However, given the results of Figure 3(a) and other checks (such as comparing calculated using both and ), one can be confident that the underlying predicted metric is still approximately Kähler. This also holds true for the other calculations in this paper.
3.4 Varying the input and output
Having seen that our ML can learn to predict the values of from a small sample of data, it is natural to ask whether it can repeat this for higher values of . That is, can the ML learn to predict , where , from ?
For this experiment we fix random points on the quintic and split them into training and validation sets, each of size 10,000. In the notation of (3.26), for each we take to be 20,000 samples of data and split it into a training set and validation set , each of size 10,000. For each value of , up to , we train an ML on in . Using the ML, we predict the values of at each point for the 10,000 validation samples in and compute the resulting error measure. We plot the predicted values in Figure 5. We see that when the ML is trained on higher k data (as increases, the balanced metrics are closer to Ricci-flat), its predictions for result in smaller error measures. We note, however, that the improvement plateaus around , suggesting that the information contained in is not sufficient to predict the values of with greater accuracy.
Having seen that one can use ML to predict the determinant at higher degrees by training it on a small set of training data, consisting of both and the higher values, we now explore how the accuracy of our routine changes when we increase the degree used to compute the input data, replacing with for . For example, consider training an ML to predict the values of . We might try to predict from instead of .
Again, we fix random points on the quintic and split them into training and validation sets, each of size 10,000. We then train an ML on in , one ML for each pair of and , with and . Using the ML, we predict the values of at each point for the 10,000 validation samples in and compute the resulting error measure. In Figure 6 we plot the predicted values of as we vary the degree for the input determinant. For example, we see that training the ML on (using as input and as output) leads to a larger measure than (using data). As might be expected, the ML’s predictions are better (where “better” is measured by how close the predicted is to , the error measure computed using Donaldson’s balanced metric) when the degree of the input determinant is larger.
As we have seen in this section, an ML is able to learn the determinant of the balanced metric for a total labelled data set having seen only a small amount of the data given in the training set . This provides an important check on our approach. However, as discussed above, the method described in subsections 3.3 and 3.4 still requires the use of Donaldson’s algorithm to compute the values of , albeit for a smaller set of training points. Hence, it remains necessary to calculate at the higher value of – a very time-consuming procedure that becomes factorially slower as the value of increases. In the next few sections, we will discuss how to modify our machine-learning algorithm so as to remove the need for the sample data at the higher value of . We will then compare this new machine-learning algorithm with the known results from the balanced metric. In practice, when one is trying to extend calculations to degrees that are too large for Donaldson’s algorithm to finish in a reasonable time, one will not have the balanced metric to compare with. Thus it is important that we are confident that our supervised-learning model is trustworthy.
Note that the results of this section are of interest on their own – Calabi–Yau metrics (and the balanced metrics that approximate them) are algebraically complicated, so for those not familiar with machine learning it might be surprising that we can achieve such accuracy with so small an amount of training data. Again and again, machine learning has proved able to learn complicated algorithms or infer behaviour from data without any known unified mathematical description. The exact way it does this is often obscure – we are not able to offer any insight into why our data is so amenable to ML.
While Donaldson’s algorithm is factorial in complexity with respect to (the size of the monomial basis, the size of the matrix and the number of points all increase factorially), the machine-learning approach, which focuses only on the final result of as a distribution over the random points, does not grow in complexity. This makes machine learning extremely attractive from a speed point of view. One could well imagine packaging a trained ML to allow researchers to do their own calculations using Calabi–Yau metrics without having to go through the entire process of calculating the balanced metric, and so on.
As we have mentioned many times, the nature of supervised learning means that the ML has to be trained on a sample of values of in computed at the higher value of . To obtain these, one could follow Donaldson’s algorithm for computing the balanced metric and then compute at least some values of , as we did above. Ideally, however, one would like to avoid this calculation entirely, side-stepping the need to compute the balanced metric for the higher value of . In the following section, we present a simple extrapolation approach that does just this. In the section after that, we combine this extrapolation with our machine-learning model to quickly obtain accurate predictions for the determinant without having to compute the balanced metric at the higher value of .
4 Extrapolating out to higher k
In the previous section, we saw that supervised learning provides a quick and accurate way to obtain properties of the metric (such as the value of the determinant) for all points using only the data of a small number of training points.
We now discuss how one can obtain a similar result without needing to calculate for the higher values of . We do this using a simple extrapolation based on regression and curve fitting. We will see that given the values of for a small range of values, one can accurately extrapolate to higher values of . On its own, this provides a way to obtain more accurate numerical values of , side-stepping Donaldson’s algorithm and the need to find the balanced metric. Unfortunately, curve fitting for a large number of points, say , is still very time consuming. To mitigate this problem, in the next section, we will combine curve fitting with machine learning: curve fitting will be used to obtain the training data on a relatively small number of points, and then the previously discussed supervised-learning routine can be used to predict the values of for all 500,000 points. Together, this gives a substantial speed up compared with following Donaldson’s algorithm for larger values.
As before, we will focus on a scalar quantity, namely the determinant of the metric. Donaldson’s algorithm produces the balanced metric for each chosen degree . Thus, for every point on the quintic, one can compute, using Donaldson’s algorithm, the determinant for each degree . In this way, we have a list of values of , , and so on, for each point on the manifold. The idea is to examine the behaviour of for each individual point as varies. In Figure 7, we show how the value of changes with for ten randomly selected points. We see that the behaviour of the determinant for each point can be well approximated by a decaying exponential plus a constant term; that is
where , and are fixed parameters that depend on the choice of point .
The form of this equation is not entirely surprising and makes intuitive sense. As , the balanced metric that Donaldson’s algorithm produces gets closer and closer to the honest Ricci-flat metric. Similarly, the value of evaluated at a point on the manifold must also tend to the value corresponding to that of the Ricci-flat metric . The precise way that tends to its final value is certain to be complicated, but it is clear that, other than at particularly singular points, it should approach its asymptotic value in a relatively smooth manner. Moreover, the rate at which it tends to its final value should be such that for each extra degree in , there is a diminishing gain in the accuracy of (as measured by evaluating ). Together, these suggest a decaying exponential with a constant shift would be a reasonable description of how changes with increasing .
One simply fits an equation of the form (4.30) to the values of for each point on the manifold.
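As a rough illustration of such a per-point fit, the sketch below assumes the parametrisation f(k) = a + b·exp(−c·k), with a, b and c as hypothetical names for the three point-dependent parameters (the explicit form of (4.30) is not reproduced here). For each trial decay rate c the remaining two parameters enter linearly, so they follow from a closed-form least-squares solve, and c is chosen by a simple grid scan. The input degrees and the synthetic values are assumptions chosen only to exercise the code.

```python
import math

def fit_decaying_exponential(ks, ys, c_grid=None):
    """Fit ys ~ a + b * exp(-c * ks) by scanning c and solving for (a, b) linearly."""
    if c_grid is None:
        c_grid = [0.01 * i for i in range(1, 301)]  # assumed search range for the decay rate
    n = len(ks)
    mean_y = sum(ys) / n
    best = None
    for c in c_grid:
        es = [math.exp(-c * k) for k in ks]
        mean_e = sum(es) / n
        see = sum((e - mean_e) ** 2 for e in es)
        sey = sum((e - mean_e) * (y - mean_y) for e, y in zip(es, ys))
        b = sey / see                 # linear least squares for fixed c
        a = mean_y - b * mean_e
        sse = sum((a + b * e - y) ** 2 for e, y in zip(es, ys))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    return best[1:]

# Synthetic values at a single point (hypothetical parameters a=1.0, b=0.5, c=0.7).
ks = [1, 2, 3, 4, 5, 6]
ys = [1.0 + 0.5 * math.exp(-0.7 * k) for k in ks]

a_fit, b_fit, c_fit = fit_decaying_exponential(ks, ys)
extrapolated = a_fit + b_fit * math.exp(-c_fit * 12)  # value at a higher degree, here k = 12
print(a_fit, b_fit, c_fit, extrapolated)
```

In this parametrisation the constant a is the value approached as k grows without bound. A general-purpose non-linear least-squares routine would do the same job; the grid scan is used here only to keep the example dependency-free.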
One might wonder why we have picked the range as an input for the curve fitting. As we will see in the next section, this range results in predictions that are equivalent in accuracy to the balanced metric computed using Donaldson’s algorithm. This allows us to directly compare the calculation times that one needs to achieve the same accuracy, that is the same measures. In practice, one will not know in advance what kind of accuracy one will achieve with a given range of input values. Instead, the range might be chosen by deciding how much time one is willing to spend calculating the input data. For example, one might compute instead, which will take longer to calculate but will lead to better curve fitting and a lower predicted error measure. For the remainder of this paper, we stick with as the input data for curve fitting.
Recall that we are actually interested in predicting the determinant for points rather than the smaller sample of we have considered here. Unfortunately, fitting a curve for each of points is both time and resource hungry – curve fitting in this manner does not easily scale. Machine learning, however, is well suited to problems with large data sets. Our plan is to use this “curve extrapolated” data to provide the small amount of “seed” output data for the training set for our previous supervised-learning model. The idea is that we use Donaldson’s algorithm to compute the balanced metric, and thus , for just . We can then extrapolate to find the values of out to for a small number of points, say 10,000, and use this data as an input to the training set of our previous supervised-learning model. We can then train this model to estimate the values for the rest of the points. If one wants to obtain the “best” predictions of , one takes in (4.30), resulting in a predicted error measure that can be compared with computed using Donaldson’s balanced metric. As we will see, this provides a quick way to compute without sacrificing much in the way of accuracy.
5 Supervised learning and extrapolation: results
In the previous two sections, we have explored how a particular property of an approximation to a Calabi–Yau metric, namely the determinant, can be captured by machine learning or simple curve fitting. Let us remind ourselves of one of the goals stated in the introduction. If we are to use string theory to make contact with particle physics, we must be able to compute masses, couplings, and so on, from first principles. As a start, this will involve computing correctly normalized cubic couplings. To do this, we would like to have a robust and relatively quick numerical scheme for computing quantities associated with Calabi–Yau metrics. Practically speaking, this means being able to compute the metric, its determinant, zero modes of the Laplacian, and so on, to high accuracy without needing a supercomputer.
We have already seen that, given a small sample of training data , we can train an ML to accurately and quickly predict values for the determinant of the metric for the remaining points in the validation set . This is wonderful in principle but does not help us much in practice – we still have to “seed” the training data with some of the higher-accuracy (higher k) data. This requires using Donaldson’s algorithm to compute this higher k training data, which is computationally expensive and extremely time consuming for large values of k. In the previous section, we saw that one can actually extrapolate from lower k out to higher k using simple curve fitting. Unfortunately, this kind of fitting is also very slow in practice and not suited to computations with points, as are needed when computing Laplacian eigenmodes, for example.
The idea of this section is to combine both of these approaches to obtain accurate predictions for the determinant that are much faster than each of the above individual methods and, hence, useful in practice. Using the data computed for low values of , we will use curve fitting to extrapolate out to larger values of . We have to do this for only a small number of points, say 10,000, since the ML needs only a small amount of data to be trained (as we saw in Section 3). We can then use the extrapolated values of as the outputs in the training set. Using supervised learning, as in Section 3, we train an appropriate ML to quickly predict the values of at the higher value of for the remainder of the points.
Let us lay out explicitly the steps we will follow:
We fix points on the quintic for which we would like to compute to high accuracy.
Using this , we compute the values of for all points.
Select a subset of points along with their values of for . Using the curve fitting approach discussed in the previous section, we predict the values of for each point up to a larger value of . This gives us 10,000 extrapolated values of that we can use as input data to train an ML.
We train an ML (using the approach outlined in Section 3) using 10,000 samples of the form
where is the affine coordinate of a point, are the determinants computed from the balanced metric and are the values given by the curve fitting. Since we already have the values of for , we may as well include them when training the ML. This input data forms our total training set .
We now have an ML that can be used to quickly predict the values of for the remaining validation samples . We already have the points and values of for the remaining samples, from which the trained ML is able to predict .
Using the predicted values for all points, we can compute to check the accuracy of the predictions.
Following these steps, we have computed the values from the predicted values of for , which we show in Figure 9. We also plot the values of computed using the balanced metric itself (as in Section 2).
Taking in (4.30) to obtain the “best” possible prediction, combining curve fitting and machine learning then gives a predicted error measure equal to that of directly computing the balanced metric at ; one finds
to approximately 2%. This means we should compare our combined curve fitting and machine learning approach with calculating the balanced metric at . Remember that while the latter forces us to follow Donaldson’s algorithm for , the combined curve-fitting and machine-learning method only requires us to use Donaldson’s algorithm for . Note also that the value of is much smaller than with more than a factor of two improvement (recall that is the most accurate balanced metric that one must compute for the curve fitting). As a sanity check, we also computed the volumes defined by and via (A.59) and found agreement to better than 0.1%.
Most importantly, since the time that Donaldson’s algorithm takes scales factorially with , it turns out that our new combined method is much quicker – let us put some numbers on this. As we saw in Figure 3 in Section 2, following Donaldson’s algorithm with , one can find the balanced -matrix in approximately 182,000 seconds. Given , one can then calculate the values of for the points of interest in another 6,000 seconds, giving a total runtime of 188,000 seconds (or 52 hours). If, instead, we combine curve fitting and machine learning we have to sum: 1) 900 seconds to find for , with 2) 1,400 seconds to calculate the values of for for all 500,000 points, with 3) 130 seconds to curve fit and extrapolate out to for 10,000 points; and, finally with 4) 70 seconds to train an ML using the extrapolated data and predict for the remaining 490,000 points. This gives a total time of approximately 2,500 seconds (42 minutes). That is, for random points on the quintic
Comparing the two times, we see that utilizing curve fitting and machine learning leads to a speed-up by a factor of 75, almost two orders of magnitude.
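The arithmetic behind this comparison is easy to check. The short sketch below simply re-adds the timings quoted above; the variable names and step labels are illustrative.

```python
# Timings in seconds, as quoted in the text.
donaldson_seconds = 182_000 + 6_000   # balanced metric, then determinants at the points of interest

combined_steps = {
    "balanced metrics at the low degrees": 900,
    "determinants at the low degrees for all 500,000 points": 1_400,
    "curve fit and extrapolate for 10,000 points": 130,
    "train ML and predict the remaining 490,000 points": 70,
}
combined_seconds = sum(combined_steps.values())

speed_up = donaldson_seconds / combined_seconds
print(f"Donaldson alone: {donaldson_seconds} s = {donaldson_seconds / 3600:.1f} hours")
print(f"Combined method: {combined_seconds} s = {combined_seconds / 60:.1f} minutes")
print(f"Speed-up factor: {speed_up:.1f}")
```

Laying the steps out this way also shows where the combined method spends its time: the low-degree runs of Donaldson’s algorithm dominate, while curve fitting and training together account for only 200 of the 2,500 seconds.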
It is interesting to ask: rather than using both curve fitting and machine learning, might one simply use curve fitting alone? That is, one could simply predict the values of for all 500,000 points. Unfortunately, as we mentioned in Section 4, this is much slower than using machine learning. If we had used curve fitting alone, the timings would be: 1) 900 seconds to find for , with 2) 1,400 seconds to calculate the values of for for all 500,000 points, with 3) 6,500 seconds to curve fit