# Turbulence Model Development using CFD-Driven Machine Learning

###### Abstract

This paper presents a novel CFD-driven machine learning framework to develop Reynolds-averaged Navier-Stokes (RANS) models. For the CFD-driven training, the gene expression programming (GEP) method (Weatheritt & Sandberg, J. Comput. Phys., 325, 22-37 (2016)) uses RANS calculations in an integrated way to evaluate the fitness of candidate models. The resulting model, which is the one providing the most accurate CFD results at the end of the training process, is thus expected to show good performance in RANS calculations. To demonstrate the potential of this new approach, the CFD-driven machine learning is applied to develop a model for improved prediction of wake mixing in turbomachines. A new model is trained based on a high-pressure turbine training case with particular physical features. The developed model is shown to have a more compact functional form than models trained without CFD assistance. Furthermore, the trained model has been evaluated a posteriori for the training case and three additional test cases with different physical flow features, and the predicted wake mixing profiles are significantly improved in all cases. With the present framework, the model equation is explicitly given and available for analysis, thus it could be deduced that the enhanced wake prediction is due to the extra diffusion introduced by the CFD-driven model.

Key words: Turbulence Modelling, Wakes

## 1 Introduction

Reynolds-averaged Navier-Stokes (RANS) remains the primary tool in most industrial applications due to its low computational cost compared to high-fidelity (Hi-Fi) simulations such as direct numerical simulation (DNS) and large-eddy simulation (LES). However, RANS suffers from low-fidelity in prediction especially for complex geometries and flow fields (Hunt & Savill 2005), which is mainly due to the limiting assumptions used to model the Reynolds stress tensor . Therefore, there has been continuous interest in both academia and industry to improve the predictive accuracy of RANS models routinely applied to aerodynamic and aerothermal design verifications, especially in the aerospace field.

Recently, machine learning has gained popularity in turbulence modelling (Duraisamy et al. 2019). Various machine-learning techniques have been applied to modify traditional RANS models to enhance the flow prediction, including modifying model parameters with Bayesian methods (Edeling et al. 2014), introducing a correction factor for the turbulence production term using neural networks (Zhang & Duraisamy 2015), and adding a spatially distributed correction field via field inversion and Gaussian Process (Parish & Duraisamy 2016), etc. However, the various types of corrections trained for traditional RANS models are usually not physically interpretable.

Other than simply adding corrections to existing model parameters, more comprehensive efforts have been made to construct new Reynolds stress closures via physics-informed machine learning. In particular, many studies have focused on improving the Boussinesq hypothesis

(1.0) |

which assumes a linear relation between the anisotropy of the Reynolds stress and the deviatoric strain-rate tensor . Such linear relation only holds valid in limited portions of the flow field, while it is questionable in complex flow topologies. Here, denotes density, denotes turbulence kinetic energy, and represents turbulence viscosity. A random forest method was used in Wang et al. (2017) to train Reynolds stress discrepancy functions based on the tensor eigenvalues. The trained discrepancy functions were applied in test cases to show improved prediction of , but the performance of mean quantities needs to be further validated. With basis tensors and scalar invariants decomposed from , Ling et al. (2016) used a deep neural network to train the Reynolds stress tensor conserving Galilean invariance. Nevertheless, the resulting neural network from the training is not straightforward to be implemented in RANS solvers.

In a series of studies, the gene-expression programming (GEP) method has been introduced (Weatheritt & Sandberg 2016) to develop explicit algebraic (Reynolds) stress models (EASM) based on the stress tensor decomposition proposed by Pope (1975). EASM-like turbulence closures are represented in the form

(1.0) |

which can be taken as a solution to the shortcomings of the Boussinesq hypothesis. This implies the non-dimensionalized anisotropic stress tensor is represented by a combination of tensor bases and scalar invariants . Models following equation (1) can be explicitly provided through symbolic regression by GEP, which is used to determine the coefficients based on training data. Applied in RANS solvers, these GEP-trained models have shown improved predictive accuracy in several a posteriori tests such as rectangular ducts (Weatheritt et al. 2017) and turbomachinery flows (Weatheritt & Sandberg 2017; Akolekar et al. 2018).

Although machine learning for turbulence model development is becoming a growing trend, obstacles still exist in implementing the trained models to engineering applications. One of the major concerns is that most of the training methods aim at accurately modelling the Reynolds stress tensor from Hi-Fi data, while the inherent inconsistency between RANS modelling and high-fidelity data has been neglected (Duraisamy et al. 2019). As shown in previous studies (Thompson et al. 2016; Poroseva et al. 2016), RANS prediction can remain unsatisfactory even if Reynolds stresses from DNS are used in the CFD calculation. Such difficulty was also documented by Parneix et al. (1996), who tuned a full Reynolds stress model with the aid of the DNS of a backward facing step, but the application of the tuned model to RANS did not confirm the expected improvements. Therefore, it is questionable whether existing model development methods targeting only to accurately model from Hi-Fi data can provide accurate mean flow fields. Rather than training models exclusively based on high-fidelity data, model development strategies more adaptive to RANS calculations are needed to provide models applicable for pragmatic CFD.

In order to train models adaptive to RANS environments, a novel model development framework named CFD-driven training is introduced in the present work. By integrating RANS calculations in the GEP model training method introduced by Weatheritt & Sandberg (2016), the objective is to develop Reynolds stress closures of the form given in equation (1), which are i) straightforward to be implemented in RANS solvers; ii) robust in a sense of providing stable solutions in RANS calculations; and iii) showing improved predictive accuracy in different a posteriori tests.

The present paper is structured as follows. The CFD-driven training frameworks along with the basics for symbolic regression via the GEP method are introduced in § 2. A brief description of the case setups are also listed in § 2. Then the CFD-driven training is applied to turbine wake mixing cases, and the developed models are tested across different cases in § 3. Finally, conclusions are provided in § 4.

## 2 Methodology and Case Setup

### 2.1 GEP for model development

GEP, first introduced for turbulence model development by Weatheritt & Sandberg (2016), is applied in the present study to train an EASM-like Reynolds stress relation by adding an extra anisotropic stress as

(2.0) |

The is represented as a linear combination of tensor bases and scalar invariants as in equation (1). To focus on statistically two-dimensional flows which are representative of the mid-span section of turbine blades, the invariants and basis tensors considered in the present work are

(2.0) | ||||

Here, and are functions of the non-dimensionalized strain rate tensor and the rotation rate tensor , where is the turbulence time scale.

The GEP model training starts from a randomly generated population consisting of candidate EASMs . The population of models evolves in the GEP iterations by mutations and combinations until the best model in the generation provides an acceptable approximation to pre-set target functions. Through this non-deterministic evolution by GEP, symbolic regression is achieved to find an optimal model that best fits the training data (Ferreira 2001).

While more details about the GEP method are given in Weatheritt & Sandberg (2016), we want to stress here that the model development characteristics are mainly determined by the selection process. The fitness of every generated model is evaluated by its cost function , which describes the deviation between the model and training data. The general principal for the definition of the cost function is that candidate models that better fit training data should have lower cost values, which result in better chances to be selected for the next generation. We note that the specific form of the cost function usually depends on the training framework and the cases, which will be discussed in detail in § 2.2 and § 3.1.

### 2.2 Training frameworks

The training framework, in particular how to evaluate candidate models during evolution, is critical to the quality of the final model. Previous GEP studies (Weatheritt & Sandberg 2016, 2017; Akolekar et al. 2018), along with other existing machine-learning driven model development work (e.g. Ling et al. 2016; Parish & Duraisamy 2016), typically evaluate the models depending on time-averaged Hi-Fi data. This is denoted as ‘frozen’ training in the present study, with the ‘frozen’ here meaning the pre-processed Hi-Fi data remains unchanged in the training. It is noted that the inherent inconsistency between Hi-Fi data and RANS environment, such as discrepancies of the turbulent dissipation rate providing turbulence scales for non-dimensionalization, limits the applicability of the models trained with frozen training (Duraisamy et al. 2019). Therefore, a novel framework that includes CFD calculations in model training iterations is proposed in the present study. The CFD-driven training, aiming to train models well adapted to the RANS mean velocity and turbulence scales, is introduced following a brief description of the frozen training in this section.

#### 2.2.1 Frozen training

For frozen training, candidate models generated by GEP are evaluated based on Hi-Fi data. With the tensor bases and scalar invariants processed from a Hi-Fi flow field, the can be calculated using the candidate model equations from GEP. On the other hand, the directly obtained from Hi-Fi data is considered as ‘accurate’ Reynolds stress anisotropy. The cost functions are defined by the deviation between and . An example used in the present study is

(2.0) |

where denotes the number of training data points. We can infer from equation (2.2.1) that candidate models which better fit Hi-Fi data will have a lower cost. The cost functions are then used to direct the selection and reproduction of the models. Through a series of iterations, the final model with minimized cost functions will be obtained.

We can see that the models are trained exclusively based on Hi-Fi data. The frozen-trained model generally provide accurate predictions as long as Hi-Fi basis tensors and scalar invariants are available as input. However, once the model is implemented into RANS, good performance is not guaranteed. In fact RANS will predict a flow field that may deviate from the Hi-Fi flow field. The impact of such variations on the GEP trained model is non-linear and difficult to forecast.

#### 2.2.2 CFD-driven training

As shown by the schematic for the CFD-driven training in figure 1, the candidate models are evaluated by running RANS calculations that will provide feedback in the GEP training loop. For every generation of candidate models, the model equations are dynamically read into the RANS solver, and then a series of real-time CFD calculations are performed to test different models in parallel. The cost functions are quantified based on the RANS mean flow fields. If required, a new generation of models is formed through GEP evolution. Then the updated models will be evaluated again through the in-loop CFD calculations.

We remark that the CFD-driven training is executable with the GEP method because the model equations are explicitly given and can be instantly implemented into RANS solvers in iterations. With RANS calculations performed in the training loop, the CFD-driven models thus are ready to be implemented into industrial design tools. Another advantage of CFD-driven training is that the definition of cost function is more flexible compared to frozen training. Rather than being restricted to quantities that are part of the closure terms in frozen training, the CFD-driven cost functions can be defined based on any important flow feature of interest to designers, and an example for enhanced wake mixing predictions will be shown in § 3.1.

## 3 Numerical Results

### 3.1 Case setup for turbine wake mixing

As key components of turbomachines, high-pressure turbine (HPT) and low-pressure turbine (LPT) flows are challenging for RANS calculations due to their extremely complex flow features. Four different turbine cases, including the VKI LS89 HPT (Arts et al. 1990) and the T106A LPT (Stadtmüller & Fottner 2001) at different Reynolds numbers, are selected to test the CFD-driven training framework. The geometries of the HPT and LPT blades are shown in figure 2, with periodic boundary conditions implemented in the pitchwise direction denoted by , and inflow/outflow conditions applied at the boundaries in the axial direction by . The blade geometries and the inflow angles indicated by the black dashed arrow in figure 2 show the considerable differences between the cases. Furthermore, as listed in table 1, the flows in these cases are at different Reynolds and Mach numbers, and the flow features of the boundary layers on the blade surface also deviate significantly. We use the HPT case, denoted as case A, as the training case, while the other three cases are used as testing cases for cross-validation, to assess whether the model trained in case A is suitable for a broad parameter space.

Flow features | Purposes | |||
---|---|---|---|---|

HPT A | 0.9 | transition & shocks | training | |

HPT B | 0.9 | transition & shocks | testing | |

LPT C | 0.4 | transition & open separation | testing | |

LPT D | 0.4 | transition & closed separation | testing |

The Hi-Fi data for these cases is provided from previous DNS and high-resolution LES using code HiPSTAR (Sandberg et al. 2015; Michelassi et al. 2015; Pichler et al. 2017), and RANS calculations are conducted with the RANS solver TRAF (Arnone & Pacciani 1995). In the RANS calculations required for model development and testing, the SST with transitional correlation (Langtry & Menter 2009) is applied as the baseline model. A thorough grid independence study has also been conducted for the RANS calculation to ensure results are grid-independent.

As traditional RANS models applied in turbine flows share a common weakness in correctly capturing wake mixing (Pichler et al. 2016), the wake profile prediction, which is of critical interest for aerodynamic performance and characteristics of downstream blade boundary layers, is set as the optimisation target. Accordingly, the GEP fitness function for the CFD-driven training is defined as

(3.0) | ||||

Here, represents the kinetic loss profile along the pitchwise axis as represented by the red dashed lines downstream of the blade in figure 2, with , , and denoting inlet total pressure, outlet total pressure, and outlet static pressure, respectively. Moreover, stands for normalised deviation between wake loss profiles from Hi-Fi and RANS results. In order to train models for the whole wake region rather than just a profile at one location, the final fitness function is defined as a summation of at two axial locations and , as these two axial locations represent the range of most probable downstream blade leading edge position.

To focus on improving model accuracy in wake mixing, training data is extracted from the turbine wake. As indicated by the turbulence kinetic energy (TKE) contour in figure 2(b), the wake region is masked by the criterion . Moreover, another constraint is implemented to avoid the boundary layer and near trailing edge interference (Weatheritt & Sandberg 2017; Akolekar et al. 2018). Note that the trained Reynolds stress closure is only implemented in the masked wake region for both model development and testing.

### 3.2 Model development

The CFD-driven training framework is applied in the HPT case A to develop a new Reynolds stress closure for turbine wake mixing. Due to the RANS calculations required in the training loop, the CFD-driven training in the present study typically requires CPU hours which is more computationally expensive than the frozen training. Nevertheless, this is still acceptable as it is only required once, and the trained models can then be used for subsequent RANS calculations with no additional cost. The resulting model is presented as . In addition, a frozen-trained model from previous work in Weatheritt et al. (2017) is shown for comparison

(3.0) | ||||

It is noted that the CFD-driven model exhibits a much simpler form compared to the frozen model, as the coefficient functions contain less high-order invariants. This is because that the higher-order terms in the frozen model, like , usually result in very high-amplitude values on some points in RANS calculations, reducing numerical stability. Therefore, such terms fail to survive in the CFD-driven evolution, and the training via running RANS calculations in the loop tends to produce a more robust model. As non-linear turbulence model closures are known to come with stability issues, the capability of the CFD driven training to filter out numerically stiff models is very important.

### 3.3 Prediction results

The GEP models in equation (3.2) are first tested a posteriori for the training HPT case A. The trained model equations are implemented into the RANS solver, and the wake mixing profiles from RANS calculations with these models are shown in figure 3(a). In addition, profiles from Hi-Fi data and baseline models without any additional stress tensors are presented for comparison. We can see that the baseline model relying on a linear stress-strain relationship significantly over-predicts the wake peak and under-predicts the wake width. While the frozen model reduces the wake peak, the CFD-driven model provides an even better agreement with the reference Hi-Fi data, in terms of both the peak value and the profile width.

The wake prediction improvement from the trained models can be analysed by considering the leading-order terms in equation (3.2) because the scalar invariants and are less important compared to the constants (Akolekar et al. 2018). With , we have

(3.0) | ||||

Compared to the baseline model, the frozen model increases diffusion by roughly , while the increase of the CFD-driven training is even larger at . With the additional diffusion and extra anisotropic terms from and , the CFD-driven model allows for a stronger mixing and thus a better prediction of the wake profile spreading as seen in figure 3(a).

It is noted that improving predictions for mean flow quantities like kinetic wake loss is of critical interest for turbine designers, which is inherently guaranteed by the CFD-driven training as the fittest models are the ones with the smallest cost values. Improvement in Reynolds stress prediction, however, is also important for capturing the correct physics. As shown in figure 3(b), the CFD-driven model enhances the prediction of the shear stress profile, especially in terms of the suction-side peak value. Nevertheless, discrepancies exist on the pressure-side, which is presumably because steady RANS has limitations in modelling the coherent vortex shedding under the strong pressure-gradient near the blade trailing edge (Wheeler et al. 2016).

### 3.4 Cross-validation

The CFD-driven model in equation (3.2) is now applied to the other cases listed in table 1 to assess its applicability and generality in wake mixing flows with diverse flow parameters, geometries and key flow features. The kinetic wake loss profiles from this cross-validation are presented in figure 4, with the Hi-Fi data and baseline RANS available for comparison. As expected, the baseline RANS calculations based on the Boussinesq hypothesis result in over-prediction of the wake peak and under-prediction of the wake width across the different cases. In contrast, the CFD-driven model presents improved accuracy of the wake prediction in all test cases due to the additional diffusion and anisotropy added in the Reynolds stress. Considering the test cases cover a reasonably wide parameter space, this cross-validation underscores the robustness of the CFD-driven training.

## 4 Conclusions

A novel machine learning framework named CFD-driven training is introduced based on the GEP method to develop turbulence models. As the candidate EASM-like models are explicitly given via symbolic regression in GEP, the model equations can be implemented into a RANS solver, and the cost functions of the models are evaluated by running CFD in the training loop. By integrating the RANS calculations, the CFD-driven training is able to develop models adapted to the RANS environment and straightforward to be implemented into existing RANS solvers.

The CFD-driven framework was applied to one HPT case to train an EASM-like model for wake mixing in turbomachines. Compared to the traditional model training, the CFD-driven training produces a model that is relatively simple. Furthermore, the generated model is tested a posteriori for the training case and three other cases covering different flow configurations. Due to the extra diffusion introduced by the CFD-driven model, the results show overall good agreement with the kinetic loss profiles obtained from Hi-Fi data in all cases.

While the model trained in the present study is limited to wake mixing, the CFD-driven machine learning is shown to be a promising framework for general turbulence model development. Nevertheless, introducing RANS calculations in the model development iterations might increase the computational cost of training. Therefore, an efficient CFD solver needs to be integrated in the training loop.

This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725. This work was also supported by the resources provided by the Pawsey Supercomputing Centre with funding from the Australian Government and the Government of Western Australia.

- Akolekar et al. (2018) Akolekar, H. D., Weatheritt, J., Hutchins, N., Sandberg, R. D., Laskowski, G. & Michelassi, V. 2018 Development and use of machine-learnt algebraic Reynolds stress models for enhanced prediction of wake mixing in LPTs. J. Turbomach. (accepted manuscript).
- Arnone & Pacciani (1995) Arnone, A. & Pacciani, R.and Sestini, A. 1995 Multigrid computations of unsteady rotor-stator interaction using the Navier-Stokes equations. J. Fluids Eng. 117 (4), 647–652.
- Arts et al. (1990) Arts, T., Lambertderouvroit, M. & Rutherford, A. W. 1990 Aero-thermal investigation of a highly loaded transonic linear turbine guide vane cascade. A test case for inviscid and viscous flow computations. NASA STI/Recon Technical Report N 91.
- Duraisamy et al. (2019) Duraisamy, K., Iaccarino, G. & Xiao, H. 2019 Turbulence modeling in the age of data. Annu. Rev. Fluid Mech. 51, 357–377.
- Edeling et al. (2014) Edeling, W. N., Cinnella, P., Dwight, R. P. & Bijl, H. 2014 Bayesian estimates of parameter variability in the k– turbulence model. J. Comput. Phys. 258, 73–94.
- Ferreira (2001) Ferreira, C. 2001 Algorithm for solving gene expression programming: a new adaptive problems. Complex Syst. 13 (2), 87–129.
- Hunt & Savill (2005) Hunt, J. C. R. & Savill, A. M. 2005 Guidelines and criteria for the use of turbulence models in complex flows. In Prediction of Turbulent Flows, pp. 291–343. Cambridge University Press.
- Langtry & Menter (2009) Langtry, R. B. & Menter, F. R. 2009 Correlation-based transition modeling for unstructured parallelized computational fluid dynamics codes. AIAA J. 47 (12), 2894–2906.
- Ling et al. (2016) Ling, J., Kurzawski, A. & Templeton, J. 2016 Reynolds averaged turbulence modelling using deep neural networks with embedded invariance. J. Fluid Mech. 807, 155–166.
- Michelassi et al. (2015) Michelassi, V., Chen, L., Pichler, R. & Sandberg, R. D. 2015 Compressible direct numerical simulation of low-pressure turbines–Part II: Effect of inflow disturbances. J. Turbomach. 137 (7), 071005.
- Parish & Duraisamy (2016) Parish, E. J. & Duraisamy, K. 2016 A paradigm for data-driven predictive modeling using field inversion and machine learning. J. Comput. Phys. 305, 758–774.
- Parneix et al. (1996) Parneix, S., Laurence, D. & Durbin, P. 1996 Second moment closure analysis of the backstep flow database. In Proceedings of the Summer Program, pp. 47–62. Center for Turbulence Research, Stanford University.
- Pichler et al. (2017) Pichler, R., Sandberg, R. D., Laskowski, G. & Michelassi, V. 2017 High-fidelity simulations of a linear HPT vane cascade subject to varying inlet turbulence. ASME Paper GT2017-63079 .
- Pichler et al. (2016) Pichler, R., Sandberg, R. D., Michelassi, V. & Bhaskaran, R. 2016 Investigation of the accuracy of RANS models to predict the flow through a low-pressure turbine. J. Turbomach. 138 (12), 121009.
- Pope (1975) Pope, S. B. 1975 A more general effective-viscosity hypothesis. J. Fluid Mech. 72 (2), 331–340.
- Poroseva et al. (2016) Poroseva, S. V., Colmenares, J. D. & Murman, S. M. 2016 On the accuracy of RANS simulations with DNS data. Phys. Fluids 28 (11), 115102.
- Sandberg et al. (2015) Sandberg, R. D., Michelassi, V., Pichler, R., Chen, L. & Johnstone, R. 2015 Compressible direct numerical simulation of low-pressure turbinesâPart I: Methodology. Journal of Turbomachinery 137 (5), 051011.
- Stadtmüller & Fottner (2001) Stadtmüller, P. & Fottner, L. 2001 A test case for the numerical investigation of wake passing effects on a highly loaded LP turbine cascade blade. ASME Paper 2001-GT-0311 .
- Thompson et al. (2016) Thompson, R. L., Sampaio, L. E. B., de Bragança Alves, F. A. V., Thais, L. & Mompean, G. 2016 A methodology to evaluate statistical errors in DNS data of plane channel flows. Comput. Fluids 130, 1–7.
- Wang et al. (2017) Wang, J.-X., Wu, J.-L. & Xiao, H. 2017 Physics-informed machine learning approach for reconstructing reynolds stress modeling discrepancies based on DNS data. Phys. Rev. Fluids 2 (3), 034603.
- Weatheritt et al. (2017) Weatheritt, J., Pichler, R., Sandberg, R. D., Laskowski, G. & Michelassi, V. 2017 Machine learning for turbulence model development using a high-fidelity hpt cascade simulation. ASME Paper GT2017-63497 .
- Weatheritt & Sandberg (2016) Weatheritt, J. & Sandberg, R. D. 2016 A novel evolutionary algorithm applied to algebraic modifications of the rans stress–strain relationship. J. Comput. Phys. 325, 22–37.
- Weatheritt & Sandberg (2017) Weatheritt, J. & Sandberg, R. D. 2017 The development of algebraic stress models using a novel evolutionary algorithm. Int. J. Heat Fluid Fl. 68, 298–318.
- Wheeler et al. (2016) Wheeler, A. P. S., Sandberg, R. D.and Sandham, N. D., Pichler, R. & Michelassi, V. 2016 Direct Numerical Simulations of a High-Pressure Turbine Vane. J. Turbomach. 138 (7), 071003.
- Zhang & Duraisamy (2015) Zhang, Z. J. & Duraisamy, K. 2015 Machine learning methods for data-driven turbulence modeling. In AIAA Computational Fluid Dynamics Conf., pp. 2015–2460.