Machine learning and dynamo theory

Exploring helical dynamos with machine learning

July 8, 2019
Abstract

We use ensemble machine learning algorithms to study the evolution of magnetic fields in forced magnetohydrodynamic (MHD) turbulence that is helically forced. Using mean field formalism, we model the electromotive force (EMF) both as a linear and a non-linear function of the mean magnetic field and current density. The form of the EMF is determined using regularized linear regression and random forests. We also compare various analytical models to the data using Bayesian inference with Markov Chain Monte Carlo (MCMC) sampling. Our results demonstrate that linear regression is largely successful at predicting the EMF, and that the use of more sophisticated algorithms (random forests, MCMC) does not lead to significant improvement in the fits. We conclude that the data we are looking at is effectively low dimensional and essentially linear. Finally, to encourage further exploration by the community, we provide all of our simulation data and analysis scripts as open source IPython notebooks.

Farrukh Nauman¹ (E-mail: naumanf@chalmers.se) and Joonas Nättilä² (E-mail: joonas.nattila@su.se)
¹Department of Space, Earth and Environment, Chalmers University, SE-41296 Gothenburg, Sweden.
²Nordita, KTH Royal Institute of Technology and Stockholm University, Roslagstullsbacken 23, SE-106 91 Stockholm, Sweden.

1 Introduction

Large scale magnetic fields are observed in the Earth, the Sun, and galaxies. Understanding the origin and sustenance of these magnetic fields is the subject of dynamo theory. Magnetic fields observed in galaxies are thought to have their origin in the early phases of cosmological evolution [1]. On relatively smaller scales, magnetic fields likely play a significant role in star [2] and planet formation [3]. For the Sun, a long standing question has been to understand the origin of the cyclical behavior of the magnetic North and South poles [4].

A number of outstanding questions remain in dynamo theory. Classical models [5, 6] assume that the background flow is stationary and that the magnetic field is too weak to feed back on the flow. This reduces the problem of magnetic field generation to a closure problem: how does the fluctuating electromotive force (EMF) depend on the mean field? Previous work on dynamo theory has often modeled the EMF as a linear function of the mean field (see [7] for an extensive review). This approximation is reasonable in the growth (kinematic) phase of a flow-driven magnetohydrodynamic (MHD) system, where the initial magnetic energy is orders of magnitude smaller than the kinetic energy of the forced fluid. But its limitations become apparent once the magnetic fields have gained enough energy to modify the background flow through the Lorentz force. To model this backreaction of the mean magnetic field on the flow, various proposals have been put forward, including models where the EMF is a non-linear function of the mean field [8, 9, 10].

Two methods have been widely used in numerical simulations of dynamos to get an explicit form for the electromotive force, and both are essentially linear: the test field method [11] and linear regression using the least squares method [12]. The kinematic test field method uses a set of orthogonal 'test' fields that are pre-determined and can be used to compute the various coefficients in the EMF expansion corresponding to terms linear in the large scale field and its gradients. As input, the test field method only requires the small scale velocity field; it solves the test field equations dynamically (on the fly) to compute the test field coefficients. This has the advantage of retaining dynamical information [13, 14]. Linear regression, in contrast, is mostly used in post-processing to model the 'inverse' problem of determining the coefficients corresponding to different terms in the linear expansion of the EMF.

Machine learning and deep learning have revolutionized image processing, object detection, and natural language processing [15]. Recently there has been a surge of interest in using machine and deep learning tools in dynamical systems and computational fluid dynamics [16, 17, 18]. Data-driven modeling is not new to fluid dynamics; in meteorology, observational data has been used for decades to guide weather predictions (see [19] for a historical overview). A considerable increase in computational power, together with advances in hardware (GPUs), has made modern optimization methods increasingly tractable. The end goal is not only to build predictive data-driven models, but also to understand underlying complex physics that cannot be captured by simple linear regression.

The motivation to study reduced descriptions of the full 3D MHD simulations comes primarily from the extremely large, and currently numerically intractable, magnetic Reynolds numbers of most astrophysical systems [7]. Reduced-order modeling of dynamical systems is currently a highly active field of research that aims to describe physical systems in terms of the smallest possible number of degrees of freedom [20, 17]. In this paper, we attempt to build reduced-order models for MHD dynamos by applying non-linear regression tools to the widely studied $\alpha^2$ dynamo [7]. In this context, machine learning methods are particularly advantageous as they value generalization. This is usually tested by separating the data into distinct training and validation sets. A high score on the training set but a poor score on the validation set indicates poor generalization. This is in contrast to standard linear regression techniques employed in the dynamo literature, where the entire data set is taken to be the training set with no validation set to test for overfitting [12].

We use data from Direct Numerical Simulations (DNS) of forced helical turbulence to construct a reduced model where the EMF is an unknown (linear or non-linear) function of the large scale field. This allows us to quantitatively assess the standard assumption of modeling the EMF as linearly proportional to the mean magnetic field and current density. We employ three classes of machine learning models: (i) linear regression, (ii) random forests, (iii) Bayesian inference (with Markov Chain Monte Carlo sampling). Our selection of a well-studied system like forced helical MHD turbulence supports our second main motivation: we can compare our results from machine learning tools against relatively well-known results in the literature. The insights gained from such a comparison could then help guide future work on more complex MHD turbulent systems, such as MHD turbulence with differential rotation [21, 22].

We describe our data and numerical simulations in the next section. Section 3 begins by explaining the fitting methods considered in this paper and then applies these machine learning and statistical methods to the data. We discuss our results in Section 4, putting our work in the context of previous work and noting some caveats. Section 5 presents our conclusions. All of our analysis scripts and data are freely available as interactive IPython notebooks (https://github.com/fnauman/ML_alpha2).

2 Data

We describe the equations, setup, and code that were used to conduct DNS of forced turbulence in magnetized flows.

2.1 Description of DNS

We use the publicly available sixth-order finite difference pencil-code [23] (http://github.com/pencil-code). The MHD equations are solved in a triply periodic cubic domain of size $(2\pi)^3$:

$$\frac{\partial \mathbf{A}}{\partial t} = \mathbf{U}\times\mathbf{B} + \eta\nabla^2\mathbf{A}, \qquad (1)$$

$$\frac{D\mathbf{U}}{Dt} = -c_s^2\nabla\ln\rho + \frac{\mathbf{J}\times\mathbf{B}}{\rho} + \nu\left(\nabla^2\mathbf{U} + \frac{1}{3}\nabla\nabla\cdot\mathbf{U}\right) + \mathbf{f}, \qquad (2)$$

$$\frac{D\ln\rho}{Dt} = -\nabla\cdot\mathbf{U}, \qquad (3)$$

where $\nu$ is the viscosity, $D/Dt = \partial/\partial t + \mathbf{U}\cdot\nabla$ is the advective derivative, $\mathbf{f}$ is the forcing function, $\eta$ is the resistivity, and $\mathbf{J} = \nabla\times\mathbf{B}/\mu_0$ is the current density (with $\mathbf{B} = \nabla\times\mathbf{A}$).

The forcing function is taken to be homogeneous and isotropic, with explicit control over the fractional helicity through the parameter $\sigma$. It is defined as:

$$\mathbf{f}(\mathbf{x},t) = \mathrm{Re}\left\{ N \mathbf{f}_{\mathbf{k}(t)}\,\exp\left[ i\mathbf{k}(t)\cdot\mathbf{x} + i\phi(t) \right]\right\}, \qquad (4)$$

where the wavevectors $\mathbf{k}(t)$ and the phase $\phi(t)$ are random at each time step. The wavevectors are chosen within a band to specify a range of forcing centered around the forcing wavemode $k_{\rm f}$. The normalization factor $N$ is defined such that the forcing term matches the physical dimensions of the other terms in the Navier-Stokes equation, $N = f_0 c_s (|\mathbf{k}| c_s/\delta t)^{1/2}$, where $f_0$ is the forcing amplitude, $c_s$ is the sound speed, and $\delta t$ is the time step. The Fourier space forcing has the form [24]:

$$\mathbf{f}_{\mathbf{k}} = \frac{\mathbf{k}\times\hat{\mathbf{e}} - i\sigma\,\hat{\mathbf{k}}\times(\mathbf{k}\times\hat{\mathbf{e}})}{\sqrt{1+\sigma^2}\,\sqrt{\mathbf{k}^2 - (\mathbf{k}\cdot\hat{\mathbf{e}})^2}}, \qquad (5)$$

where $\sigma$ is a measure of the helicity of the forcing ($\sigma = 1$ for positive maximum helicity), and $\hat{\mathbf{e}}$ is an arbitrary unit vector that is not aligned with $\mathbf{k}$.

Name | $\eta^{-1}\,(=\nu^{-1})$ | $\mathrm{Re}_M$ | Lu | $u_{\rm rms}$
R5e2 | 500 | 1.68 | 4.97 | 0.0337
R1e3 | 1000 | 4.44 | 9.94 | 0.0446
R2e3 | 2000 | 10.31 | 19.88 | 0.052
R3e3 | 3000 | 16.71 | 30.12 | 0.055
R4e3 | 4000 | 22.64 | 39.76 | 0.057
R5e3 | 5000 | 28.72 | 49.70 | 0.058
R6e3 | 6000 | 34.61 | 59.63 | 0.058
R7e3 | 7000 | 40.41 | 69.58 | 0.058
R8e3 | 8000 | 46.22 | 79.52 | 0.058
R9e3 | 9000 | 50.16 | 89.55 | 0.056
R1e4* | 10000 | 55.79 | 99.40 | 0.056
R15e4* | 15000 | 82.12 | 149.1 | 0.055
Table 1: Summary of the simulations used in this paper. We used the same resolution and a box size of $(2\pi)^3$ for all of our simulations, and the same forcing scale $k_{\rm f}$ for all runs. Here $\mathrm{Re}_M = u_{\rm rms}/(\eta k_{\rm f})$ (cf. Eq. 7) and Lu is the corresponding Lundquist number based on the Alfvén speed. Shock viscosity was used in starred (*) simulations.

The initial velocity is set to zero and the density to a uniform value, while the magnetic field is initialized through a vector potential with small-amplitude Gaussian noise. We define the units [25]:

$$[x] = k_1^{-1}, \qquad [\mathbf{u}] = c_s, \qquad [\rho] = \rho_0, \qquad [\mathbf{B}] = \sqrt{\mu_0\rho_0}\,c_s, \qquad (6)$$

such that the magnetic field, $\mathbf{B}$, is in units of the Alfvén speed ($v_A = B/\sqrt{\mu_0\rho_0}$). The domain size is $L = 2\pi$, which leads to the smallest wavenumber $k_1 = 2\pi/L = 1$. Moreover, we use the following dimensionless numbers:

$$\mathrm{Re} = \frac{u_{\rm rms}}{\nu k_{\rm f}}, \qquad \mathrm{Re}_M = \frac{u_{\rm rms}}{\eta k_{\rm f}}, \qquad (7)$$

where $k_{\rm f}$ is the forcing wavemode. We provide a summary of the simulations in Table 1.

2.2 Reduced model

Numerical simulations of large scale astrophysical flows at realistic magnetic Reynolds numbers are prohibitive. Moreover, state-of-the-art simulations require very large amounts of CPU time even at moderate Reynolds numbers. To make modeling such systems more tractable, it is therefore necessary to look for simpler surrogate models that can capture the most significant properties of the high fidelity simulations (DNS).

Figure 1: 3D visualization of the saturated $\bar{\mathcal{E}}_y$ vs. $\bar{B}_x$ vs. $\bar{B}_y$ field evolution as a function of time (purple curves) and for the time-averaged fields (red curves), for a low $\mathrm{Re}_M$ run (left panel) and a high $\mathrm{Re}_M$ run (right). The small $\mathrm{Re}_M$ simulations have smoothly evolving circular dependencies, while higher $\mathrm{Re}_M$ cases develop secondary oscillations around the main "torus". This is consistent with the simulation becoming more "turbulent" at small scales as the Reynolds number is increased.

Large scale dynamo theory concerns itself with the evolution of the filtered induction equation,

$$\frac{\partial \bar{\mathbf{B}}}{\partial t} = \nabla\times\left(\bar{\mathbf{U}}\times\bar{\mathbf{B}}\right) + \nabla\times\bar{\boldsymbol{\mathcal{E}}} + \eta\nabla^2\bar{\mathbf{B}}, \qquad (8)$$

where the EMF is given as $\bar{\boldsymbol{\mathcal{E}}} = \overline{\mathbf{u}\times\mathbf{b}}$ (commonly closed as $\bar{\boldsymbol{\mathcal{E}}} = \alpha\bar{\mathbf{B}} - \eta_t\mu_0\bar{\mathbf{J}}$, where $\eta_t$ is the turbulent resistivity), and the overbar represents a filtered quantity. The filter could be ensemble, temporal or spatial depending on the problem. The first term on the right-hand side is typically ignored unless there is a strong non-constant mean velocity field, such as in the case of galaxies (the mean differentially rotating flow for the galactic dynamo is considered in [26]).

In keeping with previous work that has used horizontal averaging as the filter in equation (8) [25, 7], our data has been averaged over the whole $xy$-plane:

$$\bar{\mathbf{B}}(z,t) = \frac{1}{L^2}\int\!\!\int \mathbf{B}(x,y,z,t)\,dx\,dy. \qquad (9)$$

This averaging scheme has the interesting property that $\bar{B}_z$ is constant (and zero for our initial conditions), since $\nabla\cdot\mathbf{B} = 0$ reduces to $\partial_z\bar{B}_z = 0$ under horizontal averaging with periodic boundary conditions. We only consider the fields reduced further to 1D by averaging them in two different ways (a minimal NumPy sketch of these reductions is given after the list):

  1. $z$-averaged: $\langle\,\cdot\,\rangle_z = \frac{1}{z_2 - z_1}\int_{z_1}^{z_2} \cdot\; dz$, where $z_1$, $z_2$ are chosen such that they do not cover the entire $z$-domain, since that would correspond to volume averaged quantities that are zero.

  2. $t$-averaged: $\langle\,\cdot\,\rangle_t = \frac{1}{t_2 - t_1}\int_{t_1}^{t_2} \cdot\; dt$, where $t_1$, $t_2$ are chosen to correspond to either the kinematic or the saturated regime for different runs.
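The following is a minimal NumPy sketch of the horizontal averaging in Eq. (9) and the two 1D reductions above. The array shapes and the random data are hypothetical stand-ins for actual DNS snapshots:

```python
import numpy as np

# Hypothetical snapshot: B[component, z, y, x] on a triply periodic grid.
rng = np.random.default_rng(42)
B = rng.standard_normal((3, 64, 64, 64))

# Horizontal (xy-plane) average, Eq. (9): one profile B_bar(z) per component.
B_bar = B.mean(axis=(2, 3))              # shape (3, 64)

# (1) z-average over a sub-domain [z1, z2); averaging over the full domain
#     would give (near) zero for a mean field with one full wave in z.
z1, z2 = 0, 32
B_bar_z = B_bar[:, z1:z2].mean(axis=1)   # one number per component

# (2) t-average: stack B_bar over snapshots from the kinematic or the
#     saturated interval and average along the new time axis instead.
print(B_bar.shape, B_bar_z)
```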

For horizontally averaged ($xy$-plane) fields with periodic boundary conditions, $\bar{u}_z = 0$ due to incompressibility. This leaves two independent magnetic field components, $\bar{B}_x$ and $\bar{B}_y$, and the corresponding EMF components. Analytical models that close the EMF as $\bar{\boldsymbol{\mathcal{E}}} = \alpha\bar{\mathbf{B}} - \eta_t\mu_0\bar{\mathbf{J}}$ for such helically forced, shear-free systems are termed $\alpha^2$ dynamos and have been extensively studied in the literature. In principle, any filter that retains these independent components of the data while using this closure will be termed an $\alpha^2$ dynamo model.

In Fig. 1, we show a three-dimensional phase-space visualization of $\bar{\mathcal{E}}_y$ vs. $\bar{B}_x$ vs. $\bar{B}_y$ that shows the relation between the EMF and the mean magnetic fields (the situation is the same when considering $\bar{\mathcal{E}}_x$ too). The circle in the $\bar{B}_x$-$\bar{B}_y$ plane reflects the fact that the total mean magnetic energy $\bar{B}_x^2 + \bar{B}_y^2$ is roughly constant in the saturated state. The slight tilt with respect to the EMF (i.e., towards the $\bar{\mathcal{E}}_y$ direction) is a result of the linear correlation with the EMF. The increasing disorder with increasing $\mathrm{Re}_M$ is characteristic of highly turbulent systems.

3 Model fits

We describe the 1D linear and non-linear regression for temporal and vertical profiles in the following subsections. We will refer to the magnetic fields and EMFs as 'fields', 'features', and 'target' interchangeably. For most of the following, we will focus on $\bar{\mathcal{E}}_y$, but similar results hold for $\bar{\mathcal{E}}_x$ since we are looking at isotropically forced MHD turbulence with no background field.

3.1 Description of algorithms considered in this paper

Ordinary Least Squares (OLS) assumes that the data is linear, that the features/terms are independent and do not interact with one another, and that the variance of each point is roughly constant (homoscedasticity). Moreover, causal inference based on linear regression can be misleading [27], a problem shared by all curve fitting methods. We should highlight the distinction between linearity of the regression as opposed to linearity of the features: one can construct a non-linear basis (for example, polynomial or trigonometric functions of the original features) and do linear regression on it. Linear regression on a non-linear basis can potentially represent non-linear data well, assuming that the non-linear basis terms are independent and have low variance. Constructing the correct interacting/non-linear terms requires domain expertise. For this reason, it is important to also consider non-linear regression methods such as random forests, which are robust to noise and capable of modeling interactions and non-linearities.

3.1.1 Lasso

Most real data have considerable variance that can lead to misleading fits from plain linear regression. Regularized linear regression aims to construct models that are more robust to outliers. Two of the most common regularization schemes for linear regression are [28]: (i) LASSO (Least Absolute Shrinkage and Selection Operator; also known simply as the L1 norm), and (ii) ridge (also known as the L2 norm). As opposed to ridge, LASSO has the advantage of performing feature selection by shrinking the coefficients of insignificant features to zero. We will use LASSO regression in this work.

Intuitively, LASSO reduces variance at the cost of introducing more bias. LASSO works better than ridge regression if the true model is indeed low dimensional (or the coefficient matrix is sparse). For dense data that requires high dimensional complex models, neither ridge nor LASSO does well. The "bet on sparsity" principle [28] suggests trying LASSO on all datasets, since if the data turn out to be dense, it is unlikely that any regularization scheme will do well. Formally, LASSO optimizes:

$$\hat{\boldsymbol{\beta}} = \underset{\boldsymbol{\beta}}{\mathrm{argmin}}\left( \|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|_2^2 + \lambda \|\boldsymbol{\beta}\|_1 \right), \qquad (10)$$

where $\mathbf{y}$ represents the target vector (in our case, an EMF component), $\mathbf{X}$ is the feature matrix (the fields are the columns), $\boldsymbol{\beta}$ is the coefficient vector, and $\lambda \ge 0$ is the regularization parameter. The subscripts '1' and '2' represent the L1 ($\|\boldsymbol{\beta}\|_1 = \sum_i |\beta_i|$) and L2 ($\|\boldsymbol{\beta}\|_2 = (\sum_i \beta_i^2)^{1/2}$) norms, respectively. A higher $\lambda$ penalizes outliers more strongly and sets the coefficients of unimportant features to zero, while a lower $\lambda$ approaches ordinary linear regression with no regularization (OLS).
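As a concrete illustration, a minimal scikit-learn version of such a regularized fit is sketched below. The synthetic feature matrix stands in for the (standard-scaled) field data, and `alpha` is scikit-learn's name for the regularization strength $\lambda$; none of the numbers correspond to the actual simulation data:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Toy stand-in for the fields: 4 features (e.g. Bx, By, Jx, Jy), one target.
rng = np.random.default_rng(0)
X = rng.standard_normal((128, 4))
y = 0.8 * X[:, 1] + 0.05 * rng.standard_normal(128)  # only feature 1 matters

X_scaled = StandardScaler().fit_transform(X)

model = Lasso(alpha=0.1).fit(X_scaled, y)
print(model.coef_)  # coefficients of irrelevant features are driven to ~0
```

Increasing `alpha` drives more coefficients exactly to zero, which is the feature selection behavior described above.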

3.1.2 Random forests

Random forests belong to the class of ensemble machine learning algorithms [29, 28]. A random forest consists of a large number of decision trees, each trained on a bootstrapped sample of the data [30]. Moreover, each decision tree only picks a random subset of features when making its splits. This helps reduce strong correlations among different trees, although some correlations might still remain. Random forests are among the most popular and widely used machine learning algorithms because they require little fine-tuning and can handle large, noisy data sets. From a computational point of view, each decision tree in the ensemble is independent and can thus easily be processed in parallel. Moreover, because of the weighted average over the ensemble, random forests are robust to strong variance in the data, though the use of only a few deep decision trees can introduce bias.

Ensembles of decision trees (random forests and gradient boosting) are non-parametric models, meaning they do not require information about the underlying data distribution. In other words, unlike linear regression, where the shape of the function connecting the independent variables to the dependent variable is fixed, in random forests the functional form is not known beforehand and is instead determined through training. This offers a different perspective on modeling, termed 'algorithmic' modeling as opposed to 'data' modeling [31]. The non-parametric nature of random forests and gradient boosting makes them applicable to a wide class of problems, since in most real-world situations the data distribution is not known a priori. However, this comes at the cost of interpretability, making ensembles of decision trees harder to understand than linear regression [32].

Feature importance and selection: As opposed to linear regression, random forests do not directly measure the coefficients of the features. A single decision tree is highly interpretable, but a whole ensemble of decision trees is harder to interpret [31]. Random forests offer some insight through feature importances [28]. We use the "mean decrease impurity" method implemented in the python library scikit-learn [33], which is based on [34, 35]. With this method, the feature importance is computed as an average over all trees of the reduction of error corresponding to a particular feature, weighted by the fraction of samples in the relevant node. One popular alternative is to randomly permute the data for a particular feature and see whether this affects the mean squared error (or the information gain/gini importance for classification tasks). If the mean squared error remains nearly the same, the particular feature is not important. This is termed 'permutation' importance [29, 32].
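A short sketch of both importance measures with scikit-learn is given below; the synthetic data and the hyperparameter values are placeholders rather than the exact settings used for the fits in this paper:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(1)
X = rng.standard_normal((128, 4))
y = np.tanh(X[:, 1]) + 0.05 * rng.standard_normal(128)  # non-linear in one feature

# Shallow trees and no bootstrap, mirroring the small-data choices
# discussed in Sec. 3.2.
forest = RandomForestRegressor(
    n_estimators=100, max_depth=3, bootstrap=False, random_state=0
).fit(X, y)

# "Mean decrease impurity" importances (non-negative, sum to 1).
print(forest.feature_importances_)

# Permutation importance: shuffle one feature, measure the score drop.
perm = permutation_importance(forest, X, y, n_repeats=10, random_state=0)
print(perm.importances_mean)
```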

3.1.3 Bayesian inference

Bayesian inference is a powerful statistical framework that presents an alternative way to perform model parameter estimation. The main strengths of Bayesian approaches are their ability to quantify the uncertainty of the model and the possibility of incorporating a priori knowledge into the estimation. The downside, when compared to many conventional machine learning methods, is that the model needs to be known before the fit.

In general, let us denote our model by $M$ and our empirical (simulation) data by $D$. We are then interested in how well our model can describe the data, or quantitatively, in the probability that our model is valid given the data, $P(M|D)$. By directly comparing our model and the data, what we actually obtain is, however, the probability of the data given the model, known as the likelihood $P(D|M)$. Our original question of the validity of our model can be answered by applying Bayes' theorem (see e.g., [36])

$$P(M|D) = \frac{P(D|M)\,P(M)}{P(D)}, \qquad (11)$$

where $P(M)$ is our prior probability of the model and $P(D)$ is the evidence, which acts as a normalization constant. Therefore, because $P(D|M)$ can be computed and $P(M)$ is known a priori, we can use Bayes' theorem to quantitatively assess the validity of our dynamo models given the simulation data.

In practice, we are interested not only in the validity of the model, but also in the parameters of the model, $\theta_i$, that best describe the data. In Bayesian inference, these model parameters are determined using marginal estimation, where the posterior distribution for the model parameter $\theta_i$ is obtained by integrating over the probabilities of all the other model parameters $\theta_j$ ($j \neq i$). This (one-dimensional) distribution then represents the probability that the $i$th model parameter will take a particular value given our data $D$.

Here we solve Eq. (11) (and therefore also obtain posterior distributions for our model parameters) using Markov Chain Monte Carlo (MCMC) sampling. To perform the MCMC fit we use the publicly available emcee library [37]. We employ the affine-invariant stretch-move ensemble sampler, with the number of walkers (members of the ensemble) typically set to a few times the number of fit parameters. No prior knowledge is incorporated into the fits, and so we use uniform priors for all of our model parameters.

3.2 Vertical profiles

In this subsection, we apply the various methods to time averaged data that depends only on the vertical coordinate $z$.

Figure 2: Heat map of correlation coefficients between the different fields, for low $\mathrm{Re}_M$ (left) and high $\mathrm{Re}_M$ (right). This analysis implies that $\bar{\mathcal{E}}_x$ has a strong correlation with $\bar{B}_x$ while $\bar{\mathcal{E}}_y$ has a strong correlation with $\bar{B}_y$. Moreover, the mean fields ($\bar{B}_x$, $\bar{B}_y$) are not linearly independent of the current densities ($\bar{J}_x$, $\bar{J}_y$).

In Fig. 2, we show the linear (Pearson) correlation coefficients among the various fields. The magnitude of the correlation between the current densities and the magnetic fields is nearly unity, indicating that they are linearly dependent. Strong linear correlation does not rule out the physical importance of a variable; it only implies that, from a curve fitting point of view, it carries redundant information. Since we are using helical forcing leading to a nearly fully helical state, $\bar{\mathbf{J}} \propto \bar{\mathbf{B}}$. At high $\mathrm{Re}_M$, one might expect the power in the velocity and magnetic field spectra to be distributed across multiple modes, implying weaker correlations. However, Fig. 2 (right) indicates that the correlation between $\bar{\mathbf{J}}$ and $\bar{\mathbf{B}}$ is still nearly perfect. This means that we can eliminate the current density as an independent variable. Nonetheless, in keeping with previous work, we do consider the current density as an independent variable in the linear basis, but eliminate it in the non-linear (polynomial) basis, where its inclusion would lead to several redundant terms.
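The correlation matrix itself is straightforward to compute with pandas. The sketch below fabricates fully helical 1D profiles (so that $\bar{\mathbf{J}} \propto \bar{\mathbf{B}}$ by construction) purely to illustrate the mechanics behind a matrix like the one in Fig. 2:

```python
import numpy as np
import pandas as pd

# Hypothetical Beltrami-like profiles: J is proportional to B, so their
# correlation is perfect by construction.
z = np.linspace(0.0, 2.0 * np.pi, 128, endpoint=False)
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "Bx": np.cos(z), "By": np.sin(z),
    "Jx": -np.cos(z), "Jy": -np.sin(z),   # J = -k1 * B for this field
    "EMFy": 0.9 * np.sin(z) + 0.05 * rng.standard_normal(z.size),
})

corr = df.corr()   # Pearson correlation coefficients
print(corr.round(2))
# A heat map is then e.g. matplotlib's imshow(corr, vmin=-1, vmax=1).
```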

We consider two sets of basis functions in this subsection:

  1. Linear: $\bar{B}_x$, $\bar{B}_y$, $\bar{J}_x$, $\bar{J}_y$.

  2. Polynomial of $\bar{\mathbf{B}}$: $\bar{B}_x$, $\bar{B}_y$, $\bar{B}_x^3$, $\bar{B}_y^3$, $\bar{B}_x^2\bar{B}_y$, $\bar{B}_x\bar{B}_y^2$.

We note that for kinetically forced simulations, $u_{\rm rms}$ is roughly constant, which means that the total energy balance yields $\bar{B}_x^2 + \bar{B}_y^2 \approx$ const in the saturated state. This makes $\bar{B}_x^2$ and $\bar{B}_y^2$ linearly dependent terms, and thus we do not consider them separately. Another motivation to use cubic terms is that work on dynamo quenching often considers $\alpha \propto 1/(1 + \bar{\mathbf{B}}^2/B_{\rm eq}^2)$, which when expanded for small $\bar{\mathbf{B}}$ leads to cubic terms of the form $\bar{B}_x^3$, $\bar{B}_x\bar{B}_y^2$, etc. A sketch of this basis construction is given below.
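Assuming the basis listed above (the exact term ordering is our reading of the setup), the polynomial feature matrix can be assembled directly with NumPy:

```python
import numpy as np

def cubic_basis(Bx, By):
    """Linear plus cubic terms of the mean field; the quadratic terms are
    omitted because Bx^2 + By^2 ~ const makes them linearly dependent."""
    return np.column_stack([Bx, By, Bx**3, By**3, Bx**2 * By, Bx * By**2])

z = np.linspace(0.0, 2.0 * np.pi, 128, endpoint=False)
Bx, By = np.cos(z), np.sin(z)
X_poly = cubic_basis(Bx, By)   # shape (128, 6), one column per feature
print(X_poly.shape)
```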

We use the publicly available python library scikit-learn (see [38, 39] for introductions) for modeling the data.

Previous work on dynamo theory has often relied on linear regression methods [12, 40], with some effort to incorporate non-linear effects through magnetic helicity conservation ([9, 41]; see also the non-linear test field method [42]). In this section, we use linear regression with a regularization term for both the linear and the polynomial bases discussed previously.

Figure 3: The fits for the linear basis (left) and the polynomial basis (right) for random forests and LASSO. Note that the horizontal segments seen in the fits are a consequence of decision tree regression: decision trees predict a single value of the $y$-axis over a given range of the $x$-axis, leading to a step-function-like fit. In both cases, LASSO does a better job at capturing the curved shape of the vertical profiles. The black vertical dashed line marks the train-test split.

In Fig. 3, we show the fits using random forest regression and LASSO. Because our data set is small, we use two-fold cross validation to determine the various hyperparameters of LASSO and the random forests. We found that bootstrapping leads to spurious results, so we turned it off. A similar problem arose with random train/test splits: the spatial structure of the fields is essentially one full sinusoidal wave, and randomly picking a fraction of the points leads to strange patterns, so we use a contiguous split instead. Moreover, while deep trees can lead to overfitting, in our case we found through two-fold cross validation that a small maximum depth was optimal. The end results are still not as good as LASSO, perhaps owing to the fact that the training set is a fraction of an already small data set, which is too little for an algorithm like random forests to really excel. That is the key lesson we have learned while modeling this dataset: on "small" datasets, simpler algorithms will outperform more sophisticated algorithms like random forests. A minimal sketch of this cross-validation setup follows.
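The sketch below uses contiguous (unshuffled) folds and a small hyperparameter grid; the synthetic sinusoidal data imitates the single-wave structure of the vertical profiles, and the grid values are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, KFold

z = np.linspace(0.0, 2.0 * np.pi, 128, endpoint=False)
X = np.column_stack([np.cos(z), np.sin(z)])
y = 0.7 * np.sin(z)

# shuffle=False keeps the folds contiguous; random splits scramble the
# single sinusoidal wave and gave spurious fits on this kind of data.
cv = KFold(n_splits=2, shuffle=False)
search = GridSearchCV(Lasso(), {"alpha": [1e-4, 1e-3, 1e-2, 1e-1]}, cv=cv)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```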

Feature Ranking

Feature | OLS | LASSO | Stability | RanFor | RFE (OLS) | RFE (RF) | MIC | Corr | Mean
$\bar{B}_x$ | 0.0076 | 0.0048 | 0.0 | 0.0049 | 0.2 | 0.6 | 0.0000 | 0.1019 | 0.1149
$\bar{B}_y$ | 0.6345 | 0.2577 | 0.0 | 1.0000 | 0.8 | 1.0 | 1.0000 | 0.9972 | 0.8362
$\bar{B}_x^3$ | 0.0871 | 0.0240 | 0.0 | 0.0000 | 0.6 | 0.0 | 0.3979 | 0.1210 | 0.1538
$\bar{B}_y^3$ | 1.0000 | 1.0000 | 1.0 | 0.4404 | 1.0 | 0.8 | 1.0000 | 0.9982 | 0.7793
$\bar{B}_x^2\bar{B}_y$ | 0.0818 | 0.0000 | 0.0 | 0.0000 | 0.4 | 0.4 | 0.4048 | 0.1179 | 0.1756
$\bar{B}_x\bar{B}_y^2$ | 0.0000 | 0.0145 | 0.0 | 0.0001 | 0.0 | 0.2 | 0.2089 | 0.1366 | 0.0700

Table 2: Feature importances using various machine learning models and statistics. All numbers are normalized to be positive and to lie between 0 and 1 for the sake of comparison. The Stability column refers to randomized LASSO, where the LASSO shrinkage coefficient is randomly varied for different features and the feature that is most robust to this variation survives. RFE stands for Recursive Feature Elimination, where a model is trained with all features and the top ranking feature is given the most significance while the bottom ranking one gets the smallest score; we applied this to both OLS and random forests, and reverse sorted the entries to give the highest score to the top ranking feature. MIC stands for Maximal Information Coefficient and is a normalized measure of the mutual information between two variables, scaled between 0 and 1; it quantifies how much information about some variable Y can be obtained through some variable X. MIC is capable of capturing non-linear relationships that the Pearson correlation (Corr) cannot. In the last column we take the mean over all models, which represents a synthesis of several (linear and non-linear) models and shows that $\bar{B}_y$ and $\bar{B}_y^3$ are the two strongest features.

In Table 2, we compare the coefficients from OLS, LASSO, randomized LASSO (Stability), Recursive Feature Elimination (RFE) from OLS and from random forests, the Maximal Information Coefficient (MIC), the Pearson correlation (Corr), and, in the last column, the average of all these predictions. The meaning of these numbers differs between methods. For OLS and LASSO, the numbers represent coefficients from linear regression and can be negative or positive. For random forests, the numbers represent the relative feature importance based on 'mean decrease impurity', which is always positive; these feature importance numbers are scaled to sum to unity. Stability selection is a general term that refers to checking the robustness of model predictions by testing them on different random (bootstrapped) subsets of the data with different hyperparameter values. In our particular application, we use randomized LASSO, which randomly varies the LASSO coefficient [43] (influenced by https://blog.datadive.net/selecting-good-features-part-iv-stability-selection-rfe-and-everything-side-by-side/).

RFE is a model-agnostic (wrapper) feature selection method that starts with all features and recursively removes the weakest ones. This is an example of 'backward' feature selection [44, 28], and we used it here with both OLS and random forests. The Maximal Information Coefficient (MIC) is a measure of mutual information between two (random) variables and is capable of accounting for non-linear relationships that the Pearson correlation coefficient cannot (for example, the correlation between a symmetrically distributed $x$ and $x^2$ will be nearly zero, but MIC will yield an answer close to one). We point out that we scaled the whole set of features using standard scaling before calculating the coefficients/feature importances. Moreover, because LASSO, OLS and Corr returned negative as well as positive values, and RFE returned ranks, we used minmax scaling to re-scale all values to lie between 0 and 1.
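The final aggregation behind Table 2 then amounts to a column-wise minmax scaling followed by a row-wise mean. A minimal sketch with made-up scores is:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical raw scores: rows are features, columns are rankers
# (e.g. |OLS coef|, |LASSO coef|, RF importance, |Pearson r|).
scores = np.array([
    [0.01, 0.00, 0.00, 0.10],
    [0.63, 0.26, 1.00, 0.99],
    [0.09, 0.02, 0.00, 0.12],
])

scaled = MinMaxScaler().fit_transform(scores)  # each column mapped to [0, 1]
print(scaled.mean(axis=1))                     # unweighted ensemble score
```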

Ensembles of machine learning models can lead to improved accuracy [45]. Here we take the simplest possible approach: we take the unweighted average of the various predictors to compute an overall score for each feature. While stability selection shows that $\bar{B}_y^3$ is the dominant feature, the mean of all predictors gives $\bar{B}_y$ a slightly higher score. This could be because both $\bar{B}_y$ and $\bar{B}_y^3$ have the same sinusoidal form.

3.3 Time series analysis: Vertically averaged data

In contrast to the previous section, we now square the fields, average them vertically, and study the temporal dependence. The magnetic fields and the EMF grow exponentially in time, so some pre-processing was important: we take the logarithm of the squared, averaged fields. Because of this particular pre-processing, we do not consider polynomial expansions as before and instead focus on either the linear form, $\bar{\mathcal{E}} \propto \bar{B}$, or the quenched form $\bar{\mathcal{E}} \propto \bar{B}/(1 + \bar{\mathbf{B}}^2/B_{\rm sat}^2)$ (cf. Eqs. 12 and 13).

3.3.1 Preprocessing

Figure 4: EMF evolution. Left: one EMF component is shown without any pre-processing. Right: the squared EMF averaged over the $z$ direction. Moreover, we use a moving average with a fixed window size (orange line) to smooth out the fluctuations.

For the vertically averaged data set, we find that the fluctuations in the EMF are quite significant (see Fig. 4). Such large fluctuations make any kind of statistical analysis quite prohibitive. In this section, we therefore use the logarithm of the squared EMF averaged over the $z$ direction. Moreover, to reduce the fluctuations, we apply a moving average window (right panel of Fig. 4). This allows for cleaner fits; a minimal sketch of the preprocessing chain follows.
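The sketch below shows this preprocessing (squaring, logarithm, moving average) on hypothetical synthetic data:

```python
import numpy as np
import pandas as pd

# Hypothetical noisy, exponentially growing EMF time series.
t = np.linspace(0.0, 50.0, 2000)
rng = np.random.default_rng(3)
emf = np.exp(0.1 * t) * (1.0 + 0.5 * rng.standard_normal(t.size))

# Square (standing in for the z-average of the squared EMF), take the log
# to linearize the exponential growth, then smooth with a moving window.
log_emf2 = np.log(emf**2 + 1e-30)
smoothed = pd.Series(log_emf2).rolling(window=100, center=True).mean()
print(smoothed.dropna().iloc[:5])
```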

3.3.2 LASSO and random forest

Figure 5: The fits for the time series using LASSO and random forests. Just like the fits for the spatial data in Fig. 3, we find that LASSO does considerably better than random forests, the latter of which again produces a step-like shape. In the left panel, we show the time series only up to the kinematic phase. In the right panel, where the time series has started to approach saturation, LASSO captures the shape of the curve but is offset. In both cases, the random forest prediction returns a nearly horizontal line, characteristic of decision tree regression. The black vertical dashed line marks the train-test split.

In Fig. 5, we show fits for the time series using LASSO and random forests. Time series data is particularly difficult to fit in the presence of both fluctuations and a trend, as is the case here. In the kinematic phase (left panel), the random forest fit is not considerably different from the LASSO fit, whereas the random forest does poorly when the same time series is extended up to the saturation phase (right panel). This is a relatively common issue with random forests: they do not perform well when there is a "trend" in the series, because a decision tree cannot predict values outside the range of targets seen during training. Just like for the spatial profiles in the last subsection, LASSO outperforms random forests.

3.3.3 Bayesian inference

Figure 6: MCMC fits for the reconstructed fields. The three panels show various Monte Carlo realizations of the fits (blue lines) against the true curve (red line) for three different values of $\mathrm{Re}_M$.

In this section we focus on reconstructing the square of the "true" EMF field as a function of time. As one of the simplest forms possible, we construct the reconstructed EMF, $\tilde{\boldsymbol{\mathcal{E}}}$, from the corresponding magnetic field components as

$$\tilde{\mathcal{E}}_x = \frac{\alpha\,\bar{B}_x}{1 + \bar{\mathbf{B}}^2/B_{\rm sat}^2}, \qquad (12)$$

$$\tilde{\mathcal{E}}_y = \frac{\alpha\,\bar{B}_y}{1 + \bar{\mathbf{B}}^2/B_{\rm sat}^2}, \qquad (13)$$

where $\tilde{\mathcal{E}}_x$ and $\tilde{\mathcal{E}}_y$ are the reconstructed $x$ and $y$ components of the EMF field. Here $\alpha$ and the saturation field strength $B_{\rm sat}$ are our model parameters to optimize. This EMF form has the nice property that in the limit $B_{\rm sat} \to \infty$ it reduces to the linear model.

After this, we are left with the freedom to define our likelihood, or score function, that describes the quality of the reconstruction. Here we define it via a simple residual function

$$r = \left( \ln \tilde{\mathcal{E}}^2 - \ln \mathcal{E}^2 \right)^2, \qquad (14)$$

which penalizes the fit for both over- and underestimating the reconstructed EMF. Note that other forms of the residual function are also possible. For discrete simulation data (defined on an array), the log-likelihood then simplifies to $\ln L \propto -\sum_i r_i$, where the summation index $i$ is taken over the time dimension or the $z$ direction, depending on the type of fit we are considering. Finally, note that when employing the MCMC fit we work with $\ln L$, as is typically done, to obtain a more shallow and slowly varying score function. A minimal sketch of such a fit is given below.
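For illustration, the following is a minimal, self-contained sketch of such an MCMC fit with emcee. The synthetic data, the assumed parameter values ($\alpha = 0.5$, $B_{\rm sat} = 1$), the prior ranges and the walker count are all placeholders rather than the paper's actual setup:

```python
import numpy as np
import emcee

# Synthetic stand-in for the DNS time series: a mean field that grows and
# saturates, with a "true" EMF following the quenched model of Eqs. (12)-(13).
t = np.linspace(0.0, 30.0, 300)
B = np.tanh(0.2 * t)
emf_true = 0.5 * B / (1.0 + (B / 1.0) ** 2)

def log_prob(theta, B, emf):
    alpha, b_sat = theta
    if not (0.0 < alpha < 10.0 and 0.0 < b_sat < 10.0):
        return -np.inf                       # uniform (flat) priors
    model = alpha * B / (1.0 + (B / b_sat) ** 2)
    # Log-squared residual in the spirit of Eq. (14); the small constant
    # avoids log(0) at early times.
    r = (np.log(model**2 + 1e-30) - np.log(emf**2 + 1e-30)) ** 2
    return -np.sum(r)

ndim, nwalkers = 2, 16                       # several walkers per parameter
rng = np.random.default_rng(4)
p0 = 0.5 + 0.1 * rng.random((nwalkers, ndim))
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob, args=(B, emf_true))
sampler.run_mcmc(p0, 2000)
samples = sampler.get_chain(discard=500, flat=True)
print(samples.mean(axis=0))                  # posterior means for (alpha, B_sat)
```

Consistent with the loose constraints reported below, the $B_{\rm sat}$ posterior in toy fits of this kind tends to be much broader than the $\alpha$ posterior.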

Examples of the fit results are shown in Fig. 6 for three different values of $\mathrm{Re}_M$, depicting the maximum a posteriori model. The (quenched) EMF model does reproduce the qualitative behavior of the saturating field, but especially for higher $\mathrm{Re}_M$ the fits become worse. Most importantly, we have found that the constraints on the saturation field strength $B_{\rm sat}$ are always very loose. Results without the quenching form also lead to similar fits. This gives indirect support to the previous machine learning results: the data strongly favors an isotropic $\alpha$ with a simplified model of $\bar{\boldsymbol{\mathcal{E}}} = \alpha\bar{\mathbf{B}}$.

3.4 Results summary

For both the vertical and temporal profiles, various models suggest that $\bar{\boldsymbol{\mathcal{E}}} = \alpha\bar{\mathbf{B}}$ is a good fit. But if $\alpha$ is constant, this will lead to exponential growth without any saturation. Is there an inconsistency here? In the induction equation, the fields at time $t$ are related to the fields at earlier times. The machine learning models we use here are not capable of storing such long term memory. The fits shown in this paper compare $\bar{\boldsymbol{\mathcal{E}}}$ and $\bar{\mathbf{B}}$ in a pointwise (nearest neighbor) sense, and because both are growing exponentially in time, they appear to be linearly correlated. Capturing the dynamical information requires more sophisticated algorithms that we leave for future work.

4 Discussion

4.1 Caveats

Property | OLS | LASSO | RF | Bayesian
Non-linear | No | No | Yes | Yes
Interpretable | Yes | Yes | Yes (a) | Yes
Suitable for all sizes of data? | Yes | Yes | Intermediate to big data | Small to intermediate data
Interactions | No | No (b) | Yes | Yes
Prediction of trends | No | No | No | Yes
Table 3: Summary of the advantages and disadvantages of the different models considered in this work. (a) Feature importances and partial dependence plots can help interpret random forests. (b) Including cross terms such as $\bar{B}_x\bar{B}_y$ can help model interactions, but the method still assumes each feature is linearly independent.

We summarize the properties of the different machine learning algorithms considered in this work in Table 3. Linear regression methods (both OLS and LASSO) are by definition capable of dealing only with linear data. However, they can handle some non-linear properties through a non-linear basis (for example, polynomial regression). The biggest advantage of linear models is interpretability: determining the sensitivity of the target variable (the EMF) with respect to the features ($\bar{B}_x$, $\bar{B}_y$, ...) is as easy as reading off the coefficients of these terms. Linear models are also good at dealing with data of all sizes, although matrix inversion becomes expensive for large data. Linear models might also be able to model trends in the data if the features they consider have the same trend as the target variable.

Random forests are based on ensembles of decision trees. While random forests are quite robust to outliers and missing data, the feature importances computed from random forests can be misleading [32]. Moreover, random forests are not as interpretable as linear regression or single decision trees: a typical random forest fit can involve a large number of trees, which makes interpretation difficult. Computing feature importances and partial dependence plots helps in making sense of random forest results [31, 28]. Random forests typically bootstrap the data and use a random subset of features for each decision tree to determine which one leads to the lowest error. These two sources of randomness imply that random forests are best suited for intermediate to big data. Indeed, in our work we found that bootstrapping had to be turned off and that the best fit models used relatively few decision trees.

Bayesian methods offer an interesting complementary toolkit to various machine learning techniques. They also enable us to properly take the noisy nature of the DNS into account. Here the big caveat is that the model(s) need to be known beforehand. This, however, argues further in favour of the hybrid approach applied in this paper: we have used machine learning techniques to detect important features and then applied our physical knowledge to build valid models out of them. Another caveat is that Bayesian model optimization becomes more and more computationally demanding with multidimensional (and often multimodal) data. This puts a limitation on the real-life usability of the method when dealing with increasingly complex data.

The data considered in this paper is either spatially or temporally averaged, which reduces it to a small number of entries. Big data typically refers to cases with orders of magnitude more observations/rows. "Small" data faces problems of overfitting and is more likely to be sensitive to outliers. Because of these issues, simpler models like linear regression with regularization tend to do well. It is, therefore, no surprise that LASSO outperformed random forests. For larger datasets with irregular non-linear patterns, random forests and gradient boosting tend to do much better [46].

4.2 Comparison with previous work

We used the simplest model: an isotropic $\alpha$ with no non-local effects in either space or time. While earlier work has focused primarily on linear models of dynamos using linear regression and the test field method [7], non-linear models have also been considered [42]. We assumed isotropy, which could be a misleading assumption in the presence of shear (see [12, 47]). Furthermore, in our model we did not incorporate the non-local effects that were considered by [12, 13, 14]. Combining machine learning algorithms with anisotropic, non-local, non-linear models might lead to new insights, and we leave this for future work.

The $\alpha^2$ dynamo considered in this work is an idealized system, but it has the advantage that, due to the maximally helical forcing and periodic boundary conditions, magnetic helicity is both conserved and gauge invariant (see [41]). This simple setup allows developing analytical theory to explain the origin and saturation of magnetic fields [9, 48], and numerical results seem to be consistent with the theory [25, 49]. In this work, we considered existing analytical formulations and used machine learning tools to test whether the EMF is linear or non-linear in the mean magnetic fields. Machine learning models can complement analytical theory in the search for reduced descriptions of high dimensional systems [18].

5 Conclusions

Our main result can be summarized as follows: because we are in the "small" data regime and we are dealing with ordered data (a high signal to noise ratio due to the helically forced turbulence), regularized linear regression (LASSO) provides the best fit. For the small, organized data we have considered in this work, random train/test splits, bootstrapping and other sources of randomness do not help. For this particular dataset, many of the features are strongly correlated with one another, complicating the model fits even further. The strong correlations also imply that feature engineering to construct a polynomial basis is not particularly useful.

What we intended to demonstrate using the example of the $\alpha^2$ dynamo is that sophisticated machine learning algorithms can readily be used to analyze data from DNS of MHD turbulence, thanks to the publicly available machine learning libraries. This provides an exciting opportunity to revisit some long-standing problems in astrophysical flows where linear regression has dominated the modeling efforts. For example, one can look at the full data cube without planar averaging and study whether 3D localized structures play a dominant role in the sustenance of the turbulence. Here, deep learning [50] will be a useful alternative, as it is known to outperform classical machine learning methods in image processing, voice recognition, and natural language processing. One particular problem of interest is shear-driven dynamos [51, 52].

Acknowledgments

We thank A. Brandenburg for several stimulating discussions and comments on an earlier draft of the paper. The work has been performed under the Project HPC-EUROPA3 (INFRAIA-2016-1-730897), with the support of the EC Research Innovation Action under the H2020 Programme; in particular, the author gratefully acknowledges the support of Dhrubaditya Mitra (NORDITA, Stockholm) and the computer resources and technical support provided by PDC, Stockholm. We used the following python packages: numpy [53], jupyter notebook [54], matplotlib [55], pandas [56], scikit-learn [33], emcee [37], minepy [57].

References
