Characterization of Model-Based Uncertainties in Incompressible Turbulent Flows by Machine Learning

Characterization of Model-Based Uncertainties in Incompressible Turbulent Flows by Machine Learning

Mustafa Usta
\tensf Department of Mechanical Engineering
Lehigh University
Bethlehem, PA 18105
Address all correspondence to this author.
   Ali Tosyali
\tensf Department of Industrial and Systems Engineering
Rutgers, The State University of New Jersey
Piscataway, NJ 08854

This work determines the inaccuracy of using Reynolds averaged Navier Stokes (RANS) turbulence models in transition to turbulent flow regimes by predicting the model-based discrepancies between RANS and large eddy simulation (LES) models and then incorporates the capabilities of machine learning algorithms to characterize the discrepancies which are defined as a function of mean flow properties of RANS simulations. First, three-dimensional CFD simulations using k-omega Shear Stress Transport (SST) and dynamic one-equation subgrid-scale models are conducted in a wall-bounded channel containing a cylinder for RANS and LES, respectively, to identify the turbulent kinetic energy discrepancy. Second, several flow features such as viscosity ratio, wall-distance based Reynolds number, and vortex stretching are calculated from the mean flow properties of RANS. Then these flow features are regressed onto the discrepancy using a Random Forests regression algorithm. Finally, the discrepancy of the test flow is predicted using the trained algorithm. The results reveal that a significant discrepancy exists between RANS and LES simulations and ML algorithm successfully predicts the increased model uncertainties caused by the employment of k-omega SST turbulence model for transitional fluid flows.


cylinder diameter
blending function
turbulent kinetic energy
Reynolds number ,
flow velocity vector
average inlet velocity
turbulence closure coefficient
dissipation rate
closure coefficient
turbulent viscosity
kinematic viscosity
turbulent kinematic viscosity
specific dissipation rate
turbulence closure coefficient
turbulence closure coefficient
turbulence closure coefficient
subgrid-scale stress tensor


Turbulent flows constitute most of the fluid flow encountered in many processes. It is characterized by velocity fluctuations in all directions that has an infinite number of degrees of freedom. These flows can be classified as laminar, transition to turbulence and fully developed turbulent flows. The flow in an empty bounded channel becomes laminar until the bulk Reynolds number reaches to 1860 [1]. Once the flow speed exceeds that level transition to turbulence occurs, and after a critical Reynolds number, which depends on the flow type, the flow becomes fully-developed turbulent. However, with the presence of an obstruction, simply a cylinder, the fluid separates from both side of the cylinder and two shear layers develop. The formation of these layers causes a phenomenon called von Karman vortex street. On the vortex street, the transition from laminar to turbulent flow occurs at low Reynolds numbers due to increase flow instabilities [2]. Numerical simulations of such flows in a channel with high blockage ratio becomes very challenging since the interaction of vortex generated by both cylinder and wall adds more level of complexity. Resolving all scales of eddies in this type of flow with direct numerical simulations (DNS) methods becomes infeasible because of the need for significant computational resources. On the other hand, although RANS turbulence models provide slightly better predictions for the transitional to turbulent vortex region, it gives misleading results due to the adverse pressure gradients, flow separation, and laminar behavior in the upstream and further downstream of the cylinder.
The recent development of machine learning based techniques emerges as a promising tool to improve capabilities of computational fluid dynamics (CFD) by integrating data-driven approach to the numerical prediction. In this context, several studies have come to exist to investigate turbulent structures and improve current turbulent models by incorporating the ML techniques more than a decade ago. Milano and Koumoutsakos [3] have developed a neural network methodology in order to reconstruct the near wall field in a turbulent flow. They reported that nonlinear neural networks provided better prediction capabilities for the near wall velocity fields. Marusic et al. [4] carried out researches on real-time feature extraction of coherent spatiotemporal structures. They successfully extended the existing pattern discovery algorithms to establish the relationship among higher order clusters. These studies initiated a new research area, however, due to limitations in computational power, and lack of data analytics knowledge, not many studies have shown up for a while.
Recently, incorporation of ML techniques in fluid mechanics has gained momentum and applied to turbulence modeling in several different contexts. Generic steps of ML algorithms include a model on training observations and then making predictions on unseen testing observations using the fitted model. Yarlanki et al. [5] used an artificial neural network base ML method to optimize the model constants of the - turbulence model. They considered the experimental results of temperature distribution in a data center to be the ground truth training data. The researchers achieved to lowering the RMS error by and absolute average error by compared to the error obtained by using default - model constants. Gorle et al. [6] proposed an approach for uncertainty quantification of turbulent mixing models. First, the range of perturbations was obtained as a function of flow features from LES simulations. Then the prediction algorithms were successfully applied to RANS results in order to assess whether the Boussinesq hypothesis is appropriate.
Tracey et al. [7] developed an ML algorithm to quantify uncertainties of low-fidelity models by using information from related high-fidelity data sets. Even with the limited data, their method managed to provide upper and lower limits to RANS errors. Duraisamy et al. [8] inferred the functional form of deficiencies in known closure models by applying inverse problems to experimental data and developed an ML model to obtain more robust and accurate closure models. Ling and Templeton [9] proposed a classification type ML algorithm to predict the regions in a flow where high RANS uncertainty may occur. They achieved to confident assessment that their method enables the evaluation of RANS uncertainty using data-driven approach and capable of generalizing the markers to flows substantially different from those on which it was trained. Wang et al. [10] proposed a data-driven approach to predict Reynolds stress discrepancies in RANS by using regression type ML technique based on Random Forests (RF). They illustrated that the ML algorithms provide noticeable improvements on the baseline RANS simulations at almost no additional computational cost.
In this study, discrepancy of the turbulent kinetic energy (TKE) between - SST RANS and dynamic one-equation LES turbulence models are determined by conducting three-dimensional CFD simulations in a channel containing circular cylinder with high blockage ratio. The simulations are carried out with the Reynolds numbers ranging between and for the training and testing purposes of the RF. The learning algorithm trained on the flow fields at different Reynolds numbers except the one used for testing. Then the trained RF is used to predict the discrepancy of the unseen testing flow. To illustrate the predictive capability of ML, contour plots and profiles obtained at different locations are presented. The results suggest that ML algorithm successfully characterizes the model-based uncertainties in three-dimensional incompressible turbulent flows as a function of features derived from the mean flow properties of RANS.

Model Description

Governing Equations

In this study, steady and unsteady incompressible turbulent flow is considered for RANS and LES simulations, respectively. The equations governing the flow field are:


conservation of momentum for steady flow;


conservation of momentum for unsteady flow;


Here in is the density, is pressure, is viscosity, is turbulent viscosity, is kinematic viscosity, is velocity, and is time. In equation Governing Equations is the filtered velocity obtained by filtering the Navier-Stokes equation and is the subgrid-scale stress tensor.
To simulate steady-state turbulent flows, RANS equations are solved along with a two-equation eddy viscosity model proposed by Menter [11]. In the Menter’s - SST model the equations governing the and are described as:


Here , , , , and are closure coefficients and is the blending function which takes different values at the near wall and in the bulk.
To obtain high fidelity results LES simulations are conducted for unsteady flow field using a dynamic one-equation subgrid-scale (SGS) model presented by Kim and Menon [12]. The dynamic model improves on the limitation of Smagorinsky model by adjusting the proportionality coefficient, , defined in the subgrid eddy viscosity, locally during computation instead of defining a priori global constant. The equation governing the is described as:


Herein is the SGS stress which is defined as a function of turbulent kinematic viscosity, where is the model coefficient and is the grid scale filter. The three terms on the right-hand-side of Eq. (Governing Equations) represent, the production rate, the dissipation rate, and the transport rate of respectively.

Numerical Model

In the current study, the three-dimensional CFD model of flow in a channel containing circular cylinder was developed for steady and unsteady turbulent flows. Schematic of the flow domain along with the flow direction indication is illustrated in Figure 1. Spatially, , , and directions represent the normalized length, , height, , and width, of the channel, respectively. The normalization factor is the diameter of the cylinder, and the blockage ratio of the channel is . Reynolds number, , is defined based on the averaged inlet velocity, and cylinder diameter .
Regarding the boundary conditions, no-slip and no-penetration conditions are applied to the top, bottom, and cylinder walls. To better uncover the three-dimensional flow effects, transitional periodicity is considered in direction. The intensity of the TKE is set to be zero at the inlet since there is no flow disturbance resulting in a transition from laminar to fully developed turbulent flow in the upstream of the cylinder. The dimensionless wall distance, is achieved to be smaller than unity so that no wall functions are used along the walls for TKE.

FIGURE 1: Schematic of the flow domain.

As for the discretization and solution of the governing equations (Eqs. Governing Equations-Governing Equations), OpenFOAM-v1706 - an open source finite volume method (FVM) solver - is employed. Specifically, simpleFoam and pimpleFoam algorithms are used for RANS and LES simulations, respectively. To calculate and compare the discrepancies time-averaging of LES results are considered.

Random Forests

Machine learning explores algorithms that can learn from data through fitting a model on training observations and then making predictions for unseen testing observations using the fitted model. Typical data matrix for building a learning algorithm consists of input and output features. The features can be characterized as either numerical or categorical. Examples of numerical features include discrepancy of turbulent kinetic energy between RANS and LES simulations, turbulence intensity, and streamline curvature. Examples of categorical features include a point’s uncertainty level in RANS results (low or high) and violation of certain assumptions (yes or no). The problems with categorical output feature are referred to as the classification problems whereas the problems with numerical output feature are referred to as the regression problems. In this study, since our goal is to predict a numerical feature, we focus on the regression problem.
Over the past few decades, a variety of regression techniques have been proposed such as k-nearest neighbors [13], ridge regression [14], lasso [15], artificial neural networks [16], tree-based methods (e.g., regression trees, random forests, boosting) [17], and support vector regression [18]. Among these, we employ the random forests [19] in our study since it does not suffer from the curse of dimensionality and provides good predictions with physical interpretations such as the importance of features.
RF is one of the ensemble methods and is an ensemble of decision trees. Decision trees (DT) divides the input feature space into distinct and non-overlapping boxes, , with the goal of minimizing the variance within regions given by


where is the total number of regions, is the output feature value of training observation , and is the mean value of the output feature of training observations in region . As for the prediction, when a new observation enters the system, DT uses the mean value of training observations in the region in which the new observation falls. Figure 2 shows the schematic representation of partition and the corresponding decision tree.

FIGURE 2: Partition of two-dimensional input feature space into five distinct and non-overlapping regions and its corresponding decision tree representation. is the mean of the observations in region .
FIGURE 3: A graphical illustration of random forests algorithm which is the ensemble of decision trees. Observations are chosen randomly with replacement and two out-of-four attributes are used randomly at each split.

Decision trees are easy to interpret and implement and also computationally inexpensive. However, they are not robust to changes in the training data. Small changes in the training data can lead to large differences in the fitted model and corresponding predictions. Ensemble methods form a ”strong learner” using a group of ”weak learners.” To this extent, RF produces multiple decision trees to address the issue of high variance and then combines them to yield a prediction. First, the training data is split into subsets of observations. The observations and features are chosen randomly at each split. Then a separate decision tree is fitted to each subset. Randomness enhances tree diversity, avoids trees to be very similar to each other, and diminishes the tendency of the model to overfitting. For a given new observation, it is run down all the trees, and the average of the predictions of all trees is considered a single consensus prediction for the new observation. Figure 3 shows the graphical illustration of randoms forests.

Scenario No. Training flows Testing flow
TABLE 1: Training and testing scenarios.

As for the training and testing of the RF algorithm, RANS and LES simulations are carried out using OpenFOAM-v1706 with different flow rates in a three-dimensional transition to turbulence regime, where the data obtained from LES simulations are averaged over time. The mean flow features, which are proposed in [9, 10], are obtained from RANS simulations and used as input and the discrepancy of turbulent kinetic energy is used as the output to the RF algorithm. We use scikit-learn - an open source Python library for ML - for training and prediction purposes of the RF algorithm.

Results and Discussion

Within the range of Reynolds number considered in this study, the flow remains laminar in an empty channel. However, introduction of a cylinder interrupting the upcoming stream induce momentum mixing by flow separation and vortex shedding which results in flow instabilities leading to turbulent flow. Figure 4 illustrates an instantaneous contour and iso-surface of TKE for . As imposed by the inlet condition, TKE intensity remains nearly zero in the upstream flow and transition from laminar to turbulent flow occurs in the downstream. The increased turbulent activity is transported until several diameter away from the cylinder and decays down towards outlet. Such behavior confirms that the flow tends to remain in laminar regime unless it is perturbed. Additionally, the flow in the wake region turns into three-dimensional flow with strong secondary flows, and oscillates which is described by von Karman phenomena, as illustrated in Figure 4b. The accurate numerical solution of this flow field, essentially, requires a turbulent model which can capture the physics in both upstream and downstream of a cylinder where different flow regimes observed.

FIGURE 4: Instantaneous contour (a) and iso-surface (b) of turbulent kinetic energy obtained by LES at . Contour is rendered at mid y-plane and iso-surface level is set to be .

RF Prediction of TKE discrepancies at different Reynolds numbers

The RANS and LES simulations are carried out with four different Reynold numbers - 500, 750, 1000, 1250 - to obtain the train and test flows. The mean flow features are calculated for each flow using the raw flow properties such as mean pressure, velocity, turbulent kinematic viscosity, and wall distance. Then these features are used as input to the learning algorithm. The input features are normalized so that they are in the range of . The log discrepancy of TKE, , is obtained for each flow as an output. In order to nicely illustrate the TKE discrepancy under significant deviations between RANS and LES simulations, we use the log discrepancy as an output to the learning algorithm. Each flow has 2.8 million data points and 10 input features. As for the training and testing of the RF, we created four different scenarios which are shown in Table 1. For each scenario, we train and test our model on 8.4 and 2.8 million data points, respectively. We use decision trees to ensemble for each scenario. The value of the number of trees is obtained by 10-fold cross-validation approach [17].

FIGURE 5: Distributions of log discrepancy of the predicted and true TKE at and .

Figure 6 shows contours of log TKE obtained by - SST RANS, RF ML prediction and dynamic one-equation LES at and . The contours are depicted at mid z-plane to illustrate the TKE distribution in stream-wise and cross-wise directions. It is observed that RANS fails to capture the elevated TKE inside the boundary layers developing over the walls and cylinder. The near wall behavior of TKE becomes important when it comes to accurate predictions of forces exerted on the cylinder or temperature distribution along the wall as seen in many applications. Moreover, RANS simulations tends to under-predict and over-predict the TKE in the upstream and downstream of the cylinder, respectively, regardless of different flow speeds. Particularly, in the recirculation region right behind the cylinder the severity of over-prediction becomes more pronounced. It is apparent that LES simulations offer better solution at the near wall and cylinder under presence of the adverse pressure gradients and flow separation. Regarding the TKE intensity in further downstream of the cylinder, the dissipation of TKE, as suggested by the time-averaged LES, shows the tendency of transitioning back from the unstable flow to laminar.

FIGURE 6: Contours of log turbulent kinetic energy obtained by - SST RANS, RF prediction, and dynamic one-equation LES at (a) and (b) . The contours are rendered at mid z-plane.
FIGURE 7: Contours of log turbulent kinetic energy obtained by - SST RANS, RF prediction, and dynamic one-equation LES at (a) and (b) . The contours are rendered at mid z-plane.
FIGURE 8: Profiles of log turbulent kinetic energy for RANS, predicted, and LES at (a) , (b) , (c) , and (d) . The profiles are depicted along the cross-wise direction at mid z-plane and 10 different stream-wise locations.

The RF ML algorithm can address the discrepancies mentioned above and provides satisfactory predictions, as illustrated in Figure 6. It nicely differentiates the over-prediction in the wake region, and under-prediction at near walls and then incorporates the predictive capabilities to improve the results obtained by low fidelity RANS simulations. In particular, the prediction of TKE distribution at is sufficient however it deviates more when compared to one at since the training is achieved with the flow speeds all greater than . This indicates the necessity of representative data enough to reveal physic of flow behavior.
Figure 7 illustrates contours of log TKE at and . Each subfigure presents the results depicted from mid z-plane for RANS, RF predictions, and LES. As expected, the TKE intensity increases with the increasing flow speed. However, as discussed in the Figure 6, RANS unacceptably over-predicts the intensity in the downstream of the cylinder, particularly within the recirculating region. Apparently, the discrepancy between - SST RANS and dynamic one-equation LES model reaches as much as an order of magnitude. As one of the shortcomings of RANS is discussed in Figure 6, under-prediction of the increase in TKE intensity in the boundary layer persist even at higher Reynolds numbers presented in Figure 7.
ML algorithm characterizes the discrepancies and improves on the RANS solutions at both flow speed, as shown in Figure 7. In accordance with the issue discussed in Figure 6 regarding the training data variability, similar deviation in the prediction of is observed. However, it is realized that the severity of poor prediction is more significant at . To better assess the performance of prediction between and - the minimum and the maximum Reynolds number considered in this study - Figure 5, illustrating the distribution of log discrepancy of predicted and true TKE, is presented. Figure 5 quantitatively confirms the inference made above in regard to the performance of prediction so that the algorithm predicts better than . Such result is counterintuitive since the prediction of both case is achieved by an algorithm trained in the remaining three flow speeds. As such, it can be inferred that the TKE intensity is not that significant in the lowest Reynolds number for the flow geometry considered in this study. In other words, the flow characteristics at fall apart from the rest three flow speeds and some characteristic features of the lowest flow speed is not learned well since such information is not carried by training cases with , , and . This is a clear indication of changing flow phenomena with respect to flow speed in somewhere between and . Beyond predicting the discrepancies, the ML algorithm works as a descriptive analysis tool and reveals crucial information regarding the correlation between TKE intensity and flow speed in transitional flow regimes.
Figure 8 illustrates the profiles of log TKE suggested by RANS, LES, and RF predictions at all Reynolds numbers considered in the present study. The locations of each profile in the stream-wise direction are shown in a geometry schematic at the bottom of Figure 8. It is worth noting that the profiles appear on left side of y-axis since all TKEs are smaller than unity ending up with a negative value once log of the property is considered. As discussed in Figures 6 and 7, RANS offers a lower level of TKE in the upstream of the cylinder whereas it predicts higher level in the downstream. The TKE dissipates toward the walls right around and behind the cylinder. However, regarding the LES, the TKE intensity increases in the vicinity of the walls and interacts with the vortex shedding in the wake region resulting in increased turbulent activity at near wall. Such variation in the prediction obtained by RANS and LES proves that both methods behaves considerably different in the transition from laminar to turbulent flow regime.
Profiles depicted from different locations in each flow speed confirms that RF algorithm accurately characterizes the discrepancies and corrects the low fidelity solutions obtained by RANS. In particular, the learning algorithm differentiates the near wall and bulk region so that it captures the phenomena spatially through the flow domain. Such precise prediction offered by the machine learning algorithm also proves the variability of features which provide sufficient information regarding the flow dynamics.


This paper aims to determine the turbulent kinetic energy discrepancies between RANS and LES and characterize them as a function of features, obtained from mean flow properties of RANS, by means of machine learning algorithms. To accomplish this, three-dimensional CFD simulations in a channel containing a cylinder are conducted for various Reynolds numbers assuring to keep the flow in transition from laminar to the turbulent regime. Then the learning algorithm is trained with different scenarios to predict TKE at different flow speeds.
It is found that (1) RANS and LES predictions significantly deviates, especially in the vicinity of walls and in the downstream of the cylinder; (2) ML successfully predicts the model-based uncertainties in transition to turbulent flow regime; (3) the prediction performance of ML algorithm slightly lower at the lowest Reynolds number due to the fact that the flow characteristic changes at around ; (4) ML approach demonstrates the capability of revealing the correlation between TKE intensity and flow speed in transitional flow regimes.
Overall, the proposed study illustrates how the turbulence models can benefit from the ML algorithms and evidently elucidates that model-based uncertainties of low fidelity approaches can be predicted without requiring a high-fidelity simulation.


This work used the Extreme Science and Engineering Discovery Environment (XSEDE) for computational need, which is supported by National Science Foundation grant number TG-CTS170051. Specifically, it used the Bridges system at the Pittsburgh Supercomputing Center (PSC). The authors also would like to thank Dr. Alparslan Oztekin for assistance with technical discussion.


  • [1] Tsukahara, T., Seki, Y., Kawamura, H., and Tochio, D., 2005. “Dns of turbulent channel flow at very low reynolds numbers”. In TSFP Digital Library Online, Begel House Inc.
  • [2] Blevins, R. D., 1990. “Flow-induced vibration”.
  • [3] Milano, M., and Koumoutsakos, P., 2002. “Neural network modeling for near wall turbulent flow”. Journal of Computational Physics, 182(1), pp. 1–26.
  • [4] Marusic, I., Candler, G., Interrante, V., Subbareddy, P., and Moss, A., 2001. “Real time feature extraction for the analysis of turbulent flows”. In Data Mining for Scientific and Engineering Applications. Springer, pp. 223–238.
  • [5] Yarlanki, S., Rajendran, B., and Hamann, H., 2012. “Estimation of turbulence closure coefficients for data centers using machine learning algorithms”. In Thermal and Thermomechanical Phenomena in Electronic Systems (ITherm), 2012 13th IEEE Intersociety Conference on, IEEE, pp. 38–42.
  • [6] Gorle, C., Emory, M., and Iaccarino, G. “Rans modeling of turbulent mixing for a jet in supersonic cross flow: model evaluation and uncertainty quantification”. In ICHMT DIGITAL LIBRARY ONLINE, pp. 1794–1797.
  • [7] Tracey, B., Duraisamy, K., and Alonso, J., 2013. “Application of supervised learning to quantify uncertainties in turbulence and combustion modeling”. In 51st AIAA Aerospace Sciences Meeting including the New Horizons Forum and Aerospace Exposition, p. 259.
  • [8] Duraisamy, K., Zhang, Z. J., and Singh, A. P., 2015. “New approaches in turbulence and transition modeling using data-driven techniques”. In 53rd AIAA Aerospace Sciences Meeting, p. 1284.
  • [9] Ling, J., and Templeton, J., 2015. “Evaluation of machine learning algorithms for prediction of regions of high reynolds averaged navier stokes uncertainty”. Physics of Fluids, 27(8), p. 085103.
  • [10] Wang, J.-X., Wu, J.-L., and Xiao, H., 2017. “Physics-informed machine learning approach for reconstructing reynolds stress modeling discrepancies based on dns data”. Physical Review Fluids, 2(3), p. 034603.
  • [11] Menter, F. R., 1994. “Two-equation eddy-viscosity turbulence models for engineering applications”. AIAA journal, 32(8), pp. 1598–1605.
  • [12] Kim, W.-W., and Menon, S., 1995. “A new dynamic one-equation subgrid-scale model for large eddy simulations”. In 33rd Aerospace Sciences Meeting and Exhibit, p. 356.
  • [13] Altman, N. S., 1992. “An introduction to kernel and nearest-neighbor nonparametric regression”. The American Statistician, 46(3), pp. 175–185.
  • [14] Hoerl, A. E., and Kennard, R. W., 1970. “Ridge regression: Biased estimation for nonorthogonal problems”. Technometrics, 12(1), pp. 55–67.
  • [15] Tibshirani, R., 1996. “Regression shrinkage and selection via the lasso”. Journal of the Royal Statistical Society. Series B (Methodological), pp. 267–288.
  • [16] Yegnanarayana, B., 2009. Artificial neural networks. PHI Learning Pvt. Ltd.
  • [17] James, G., Witten, D., Hastie, T., and Tibshirani, R., 2013. An introduction to statistical learning, Vol. 112. Springer.
  • [18] Drucker, H., Burges, C. J., Kaufman, L., Smola, A. J., and Vapnik, V., 1997. “Support vector regression machines”. In Advances in neural information processing systems, pp. 155–161.
  • [19] Breiman, L., 2001. “Random forests”. Machine learning, 45(1), pp. 5–32.
Comments 0
Request Comment
You are adding the first comment!
How to quickly get a good reply:
  • Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
  • Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
  • Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters
Add comment
Loading ...
This is a comment super asjknd jkasnjk adsnkj
The feedback must be of minumum 40 characters
The feedback must be of minumum 40 characters

You are asking your first question!
How to quickly get a good answer:
  • Keep your question short and to the point
  • Check for grammar or spelling errors.
  • Phrase it like a question
Test description