ContinuousScale Kinetic Fluid Simulation
Abstract
Kinetic approaches, i.e., methods based on the lattice Boltzmann equations, have long been recognized as an appealing alternative for solving incompressible NavierStokes equations in computational fluid dynamics. However, such approaches have not been widely adopted in graphics mainly due to the underlying inaccuracy, instability and inflexibility. In this paper, we try to tackle these problems in order to make kinetic approaches practical for graphical applications. To achieve more accurate and stable simulations, we propose to employ the nonorthogonal centralmomentrelaxation model, where we develop a novel adaptive relaxation method to retain both stability and accuracy in turbulent flows. To achieve flexibility, we propose a novel continuousscale formulation that enables samples at arbitrary resolutions to easily communicate with each other in a more continuous sense and with loose geometrical constraints, which allows efficient and adaptive sample construction to better match the physical scale. Such a capability directly leads to an automatic sample construction which generates static and dynamic scales at initialization and during simulation, respectively. This effectively makes our method suitable for simulating turbulent flows with arbitrary geometrical boundaries. Our simulation results with applications to smoke animations show the benefits of our method, with comparisons for justification and verification.
1 Introduction
Fluid simulation in graphics has evolved for more than a decade. Since the pioneering work of [1], fluid simulation methods have been developed significantly, among which directly solving the incompressible NavierStokes equations (INSE) can be considered as the standard and very popular approach to simulating fluid flows. However, under turbulent conditions where the flow usually has small viscosity and thus high Reynolds number, accurately resolving the smallscale turbulence details in INSE effectively without numerical diffusion becomes a great challenge. In order to tackle this problem, different methods have been proposed in the literature [2, 3, 4, 5, 6, 7], but usually at a cost of more algorithmic and computational complexity. Some of them [8, 9, 10], which are mainly based on noise models, may not fully respect the underlying physics, making the simulated results unnatural in some circumstances.
While a large number of methods have been proposed to directly solve INSE, there exist other alternatives which can bypass the difficulty of nonlinear advection and global pressure solve in INSE, making the underlying solution simpler and sometimes more accurate. One of these alternatives is the kinetic approach based on the lattice Boltzmann equations (LBE) [11]. Researchers from the computational fluid dynamics (CFD) field are interested in such a method, since it transforms INSE into a computationally easier set of linear PDE system with a nonlinear source term, and without any global pressure solve. In addition, it is usually formulated by an explicit time evolution with a constant time step, which is much simpler to solve with only local updating dynamics. This facilitates the development of a simple and conservative numerical scheme based on LBE, which is formulated as:
(1) 
where is the th velocity distribution function associated with the th lattice velocity ; is the collision operator designed to be conservative and important to approximate INSE; with proper should be strictly satisfied for stability (CFL=1, with CFL a number defined as CFL= [12], where is the flow speed; larger CFL number indicates larger time stepping and faster simulation over time); and are the macroscopic density and velocity computed as and . Eq. 1 is usually accurate and stable in the incompressible limit (the entire flow speed is small, e.g., , as compared to the speed of sound in normalized lattice units).
Solving fluid flows using LBE has several known advantages. In addition to the benefits introduced before, the local treatment of boundary conditions leads to effective simulations with arbitrary geometrical boundaries and with efficient implementations for parallelism. It is well known that the stability and accuracy of Eq. 1 largely depends on the modeling of , where traditionally the BhatnagarGrossKrook (BGK) [11] and multiplerelaxationtime (MRT) [13] models were often used, but the inherent inaccuracy and instability prevent their widespread use in graphics. Thuerey et al. [14, 15] and Zhao et al. [16, 17] ever promoted the use of LBE in graphics, but only for flows with moderate Reynolds numbers. Liu et al. [18] and Guo et al. [19] also proposed to use LBE for graphical applications, but with more empirical approaches, resulting in less realistic results in some circumstances.
In this paper, we aim to solve fluid flows using the kinetic approach. However, to promote the practical use of such an approach for graphical fluid flow simulations, we need to improve the stability, accuracy and flexibility. To achieve higher stability and accuracy, a more suitable collision operator should be developed in order to approximate INSE more closely and suppress ghost and ringing artifacts more appropriately. To achieve computational flexibility, easy and more continuous adaptation of sample resolutions should be achieved in order to resolve the physical details during simulations. The solution to the above problems form the core contributions of this paper:

To significantly increase the stability and accuracy over the traditional BGK and MRT models, we employ a novel nonorthogonal centralmomentrelaxation (CMR) model for LBE [20], which has higher convergence to INSE given appropriate relaxation parameters, and with simpler algebra. However, like in MRT model, not all the relaxation parameters can give satisfactory results for CMR model, and determining the appropriate relaxation parameters is crucial to retain turbulence details with both stability and accuracy. With the observation that CMR model conducts relaxations independently for different orders of moments, and highorder moments influence the turblence details significantly, we propose to determine the highorder relaxation parameters adaptively according to the local velocity gradient in order to produce stable yet accurate flows under turbulent conditions, which makes the method practical for graphical applications.

On the other hand, traditional LBE simulations usually solve the fluid flows with a uniform lattice, which is hard to adapt computations to spatially and temporally varying physical details. Multiblock approaches [21, 15] solve this problem by dividing a largescale cell into a multiple of integer smallscale cells with strict geometrical alignment along the scale boundary, which is not flexible and violates the continuousscale nature of fluid flows, resulting in undesirable discontinuous structures in turbulent flows. In this work, we propose a novel continuousscale formulation, which allows samples in arbitrary scales to communicate with each other without strict spatial constraint, where mappings and interpolations of distribution functions are properly handled. By “continuousscale”, we mean that the ratio between different scales could be arbitrary rather than only integral. This immediately enables a fully automatic scheme to first generate static scales at initialization that transit more continuously based on domain geometries and inlet positions, and then refine the scales dynamically at runtime during the simulation, which is particularly useful in boundaryinduced turbulent flow simulations.
To justify our arguments, we present the results by applying our method to smoke simulations, where stable solutions can be readily achieved with sufficient turbulence details and over arbitrary geometries, see Fig. 1 for an example of our simulated smoke passing through a turning tube, where the boundary layer induces small vortices, and higher resolution samples are placed along the tube boundary. Note that in our approach, no turbulence nor noise models are used to resolve the fluid details. As a verification, our results are compared to the existing methods for smoke simulations [22, 7, 23], which suggest that more appropriate visual details can be achieved with less number of samples and with higher computational efficiency.
2 Related Work
There are a vast number of research work on fluid simulations in graphics. Here, we summarize the works that are quite related to our work on singlephase turbulent flow simulations, and ignore those that deal with multiphase flows. In addition, we categorize the related methods into direct approaches where INSE is directly solved, and indirect approaches where other equations are solved in order to approximate the solution of INSE.
2.1 Direct approach
To solve INSE directly, Stam [1] proposed the unconditionally stable semiLagrangian advection scheme, with the major drawback of excessive numerical diffusion. To overcome such a problem, many different algorithms have been proposed. Vorticity confinement [24, 8] was the early attempt to add fluid details by artificial force, which was later extended by noisebased approaches [9, 10] where the Kolmogorov energy spectrum is respected. To have better results around object boundary, turbulence models [4, 25] and precomputed artificial boundary layer method [26] were proposed. However, all of these methods do not fully respect the underlying physics, making the simulation sometimes not realistic.
To preserve fluid details without artificial modeling, different classes of methods were proposed. One class of such methods try to increase the accuracy of nonlinear advection where BFECC [27] , MacCormack [22] and highorder WENO schemes [28] , as well as improved highorder constrained interpolation profile (CIP) methods [29, 10] were used. Heo and Ko [30] combined polynomial representation with a highorder reinitialization method to preserve detailed structures of the fluid interface. However, the more widelyadopted approach is the hybrid method where advection is solved using particles while pressure and other parts are solved using grids [31, 32]. Jiang et al. [33] presented a novel technique to preserve linear and angular momentum in the hybrid approach in order to better resolve the details. Such a technique has later been extended to a more generalized local function to greatly improve the energy and vorticity conservation [34]. Vortex methods are also appealing to preserve fluid details [35, 36]. Vortex filaments [5] and vortex sheets [37, 6] are both effective ways to solve for turbulent flows with reduced computation. To improve the efficiency for solving Poisson equation, Zhang and Bridson [38] proposed a hybrid PPPM algorithm. More recently, Zhang et al. [7] proposed the IVOCK scheme to preserve more turbulence details based on the velocity correction from the vorticity equation.
In addition to the above methods, adaptive approaches [2, 39, 40] try to put more computations on fine structures to capture flow details. Zhu et al. [41] presented an adaptive grid to create a farfield coarse grid with fine grid at the focus of the simulation. Setaluri et al. [42] introduced a new data structure for adaptive grids with compact storage and efficient stream processing . Recently, Zhang et al. [23] proposed an adaptive particlegrid scheme to capture boundary layer dynamics more accurately.
There are also many Lagrangian particle solvers for INSE, which are mainly based on smoothed particle hydrodynamics (SPH), and are naturally spatially adaptive, e.g., Becker [43] presented a weakly compressible form of the SPH method for fluid flows based on the Tait equation; Solenthaler [44] presented a novel incompressible SPH method for fluid simulations based on predictioncorrection scheme; Ihmsen [45] proposed a novel formulation of the projection method for SPH; and Winchenbach [46] introduced a novel method for adaptive incompressible SPH simulations.
Another new approach to solving INSE is the datadriven approach based on machine learning [47, 48], which can produce fast solutions, but the results may contain unexpected artifacts.
Compared to the existing approaches above, our method does not solve the INSE directly, which overcomes the difficulty of handling nonlinear advection and global pressure solve with sufficient accuracy and stability. More importantly, we propose an efficient adaptive scale refinement formulation, which allows continuousscale construction with loose geometrical constraint. This facilitates flexible and more efficient simulations.
2.2 Indirect approach
INSE can also be solved indirectly by other model equations, among which the kinetic approach based on LBE is one alternative. The early work of LBEbased approach was pioneered by BGK model [11]. To improve stability and accuracy, MRT model [13] was proposed. However, the most significant progress for LBE modeling is the cascaded model with centralmoment relaxation [49, 50]. Very recently, De Rosis [20] proposed a nonorthogonal centralmomentrelaxation model with simple algebra. Turbulence models, on the other hand, can also be used to stabilize LBE and retain fluid details especially in coarse grid simulations [51, 52], but may introduce numerical artifacts. In this paper, we employ the nonorthogonal centralmoment relaxation model, but propose an adaptive relaxation scheme without the aid of turbulence models for graphical flow simulations, which respects the underlying physics more appropriately.
To enable adaptive computation, multiblockbased grid refinement [53, 54, 21, 17, 55], and unstructured mesh formulations [56, 57] were proposed for LBE. While multiblock formulation lacks scalecontinuity and has strict alignment constraint between scales, unstructured mesh formulation requires complicated meshing/remeshing process, which is difficult for dynamic refinement. Compared to these methods, we propose a novel continuousscale formulation that allows arbitrary scales to communicate with each other without strict spatial constraint. As described in Section 1, this immediately allows flexible scale construction and dynamic refinement to resolve turbulent flow details more appropriately with structure continuity.
There are some other works that deal with the interaction between fluid flows and solid objects using LBE approach in graphics. For example, Wei et al. [16] presented an approach for simulating natural dynamics that emerge from the interaction between a flow field and immersed objects. Zhao et al. [17] provided a physicallybased framework for simulating natural phenomena related to heat interaction between objects and the surrounding air. In addition to kinetic approaches, incompressible nonlinear Schrödinger equation has been recently employed in graphics to solve for inviscid fluid flows [58] with more accurate advection.
3 Fundamentals
Before presenting our approach, we first introduce the fundamentals on the nonorthogonal centralmoment relaxation model for LBE as well as the multiblock formulations. They also serve as a reference to differentiate our formulation with the existing ones.
3.1 Nonorthogonal centralmoment relaxation model
As introduced before, centralmoment relaxation (CMR) models have superior performance than BGK and MRT models in terms of stability and accuracy, and in particular, we employ the nonorthogonal CMR model [20], which is constructed with simpler algebra and sufficient stability.
Unlike the traditional MRT model, the nonorthogonal CMR model constructs the centralmoment space with translated lattice velocities: , where is the original lattice velocity (we use D3Q27 lattice velocity model for 3D simulations, see Fig. 2), and is the macroscopic velocity. The collision is performed in the centralmoment space, and thus a transformation of the distribution function into centralmoment space should be performed.
To obtain such a transformation, a matrix is first constructed as: , where indexes the corresponding velocity component; are the orders of moments, and indexes different components. Then, we perform where and aggregate all values for and to perform the transformation. By constructing an inverse matrix , we can transform the moment vector back to the distribution functions by . Note that the analytical forms for both and can be explicitly obtained, which are directly used during the simulation.
To model collision, the equilibrium state is constructed in centralmoment space as and the collision vector , which contains all the collision operators , is constructed by a relaxation process in centralmoment space as:
(2) 
where is a diagonal relaxation matrix. Note that some relaxation parameters are related to the kinematic viscosity:
(3) 
thus, are all of the same value. Parameters for conservative quantities can be arbitrary, and we set them to be 0. Other parameters are for highorder moments, which can be freely tuned within the range to achieve different accuracy and stability. The specific forms of , , and can be found in Appendix.
3.2 Multiblock lattice Boltzmann formulation
As mentioned in Section 2.2, adaptive approaches have been proposed to simulate LBE in order to save computation, among which multiblock formulations were often used due to simplicity especially considering dynamic refinement. In multiblock formulation, the uniform grid is subdivided in an octree manner. Fig. 3 (a) gives an example of grid refinement with two scales, where the coarse scale (orange one) is subdivided into the fine scale (blue one), with their boundaries matched (see the green line), and the ratio between different scales be strictly integral.
The idea behind multiblock formulation is to keep local Reynolds number invariant at the same sample point between different scales, which results in a mapping of at overlapped sample points between different scales (the green line in Fig. 3 (a)). There have been derivations of such a mapping for BGK [59, 53] and MRT [21, 52] models, but no derivation for nonorthogonal CMR models yet.
Our continuousscale approach is based on multiblock formulation. To explain it, we first define some notations: indicates a specific scale; and indicate the coarse and fine scales; and denote the grid spacing and time step at scale ; is the ratio of spacings between two scales.
To obtain the mapping of for MRT model, the invariance of local Reynolds numbers between coarse and fine scales leads to [21]:
(4) 
with . To further maintain continuity of macroscopic variables across scales, it is derived in [21] that:
(5) 
which results in a diagonal mapping matrix of all the nonequilibrium components in moment space from fine to coarse scales as: . Similarly, we can obtain a diagonal mapping matrix from coarse to fine scales. Returning back to the space of and following [21], the mapping from fine to coarse scales after collision is:
(6)  
where tilde indicates postcollision state and is the identity matrix. This immediately expresses and as:
(7)  
Then, we can rewrite as:
(8) 
Defining and , the mapping from fine to coarse scales can be rewritten as:
(9) 
The mapping from coarse to fine scales can be constructed and formulated similarly.
To summarize, in general, given two scales and , the mapping from to after collision can be given by:
(10) 
where
(11) 
and is a diagonal matrix with
(12) 
and is the ratio of spacings between scales and .
4 Our Formulations
Based on the above fundamental descriptions, we derive our own formulation for LBE, which can achieve stable and accurate simulations, with flexible sample placement and refinement, making our simulator more efficient to produce turbulent flows.
4.1 Adaptive relaxation
Traditional MRT model for LBE is unable to simulate fluid flows with small viscosity, which is mainly due to the violation of Galilean invariance, leading to ghost modes that induce instability. Even though turbulence models (e.g., Smagorinsky model) can stabilize the dynamics, it produces strong ringing artifacts, see Fig. 4 (a), which contaminates the whole velocity field (readers are suggested to see the supplementary video for more obvious ringing artifacts). The nonorthogonal CMR model can significantly reduce the ghost modes, and thus the ringing artifacts, but the selection of the relaxation parameters is crucial. Improper selection of these parameters may lead to oversmoothed results, see Fig. 4 (b) with the original parameter setting from [20].
To preserve turbulence details while reducing ringing artifacts, it is essential that the highorder relaxation parameters should be carefully tuned [20]. However, how these parameters are set to maintain stability while preserving turbulence details is still unknown. By our analysis, we noticed that each highorder relaxation parameter effectively corresponds to a diffusion viscosity similar to Eq. 3, which acts like an artificial viscosity to control high order oscillation modes (ringing artifacts). In practice and with our numerical experiments, to ensure stability and retain accuracy, should be progressively increased with respect to the order, and should be relatively large (e.g., ) for the highest order parameter , and relatively small (e.g., ) for the lowest order parameters . For orders in between, we linearly interpolate them based on these two values and their corresponding orders. This can effectively stabilize the dynamics while retaining sufficient turbulence details, but may not be able to fully suppress the ringing, see Fig. 4 (c) for an example.
To further reduce ringing artifacts, we propose to perform adaptive relaxation, meaning that instead of using fixed relaxation parameters, we adjust the relaxation for highorder moment adaptively according to the flow. In principle, fluctuating regions may generate strong numerical waves that propagate to other smoother regions, resulting in noticeable ringing artifacts. To suppress these rings, we can give more diffusion to the smoother regions, which has the effect of preventing rings from propagating out. Thus, we should set larger for smoother regions, but the interrelationship among the highorder parameters should be maintained in proportion to our previous setting in order to have stable simulations. Hence, we uniformly scale the previous artificial viscosity setting for highorder moments according to the velocity gradient, and for samples with smaller gradients, larger scaling factors are given in front of the original . This results in the following formulation of the new artificial viscosity for :
(13) 
where and are model parameters that can be tuned; is the maximum gradient magnitude for normalization.
Fig. 4 (d) shows the simulation result for the velocity field with such an adaptive relaxation, which is clear that ringing artifacts have been significantly suppressed. Note that such an adaptive relaxation method cannot be applied to the traditional MRT model since the moment of different orders are coupled for MRT, while in nonorthogonal CMR model, these highorder moments are more independent.
4.2 Continuousscale formulation
By using nonorthogonal CMR model with our adaptive relaxation, we can obtain stable turbulent flow simulations with sufficient fine details. However, such a simulation is only performed on a uniform grid, and as argued before, it is difficult to adapt computations to spatially and temporally varying fluid flows with different physical details. In reality, a fluid flow may contain both laminar and turbulent regions, as well as the transition region between them. The variations of flow quantities (such as velocity) are different, leading to the concept of “fluid scale”, which indicates the frequency of such variations. It is well known that in real fluid flows, the scale variations are continuous [60].
In principle, the sample resolution should vary with respect to the fluid scale, where turbulent regions should have more samples. As introduced in Section 3.2, multiblock formulation [21, 61] has been proposed to achieve this goal. While such a formulation is relatively simple, the scales do not respect the continuousscale nature of fluid flows. To retain scale continuity, unstructuredmesh formulations [57] were proposed, but they all inherited the difficulties for mesh construction and adaptive refinement.
In this paper, we propose a novel method from the idea of multiblock formulation, but allow sample resolutions (scales) to be constructed more continuously, with the ratio between different scales no longer restricted to integer values. By breaking such a restriction, we have two important benefits: i. the sample scales can be more continuous in order to better respect the physical scale; ii. as depicted in Section 5, efficient and flexible scale construction and refinement schemes can be developed in order to dynamically adapt sample scales and place more computations on turbulent finescale regions.
Mapping distribution functions
To achieve continuousscale formulation, we need first derive the mapping of distribution functions between different scales for the nonorthogonal CMR model with our adaptive relaxation. Similar as the derivation for MRT model in Section 3.2, we start from the invariance of local Reynolds numbers between a coarse and a fine scale, which leads to Eq. 4. Then, we will derive the relationship of nonequilibrium states between and for the nonorthogonal CMR model in order to obtain the mapping, like the relation in Eq.5.
From [62], we know that , where can be identified by the Knudsen number [63]. We also know that is given by:
(14) 
where is the lattice weight originally present in BGK model; is the speed of sound, and are related to the strain rate tensor through the relation:
(15) 
where is related to the kinematic viscosity as in Eq. 3, and the strain rate tensor is defined as . Since , we can find that is proportional to the gradient of the macroscopic velocity, and it is therefore necessary to be rescaled when communicating between different scales. By assuming and using Eqs. 14 and 15, we have:
(16) 
where and represent the same strain rate tensor with lattice units at fine and coarse scales respectively, which can be renormalized by and , leading to:
(17) 
where is the strain rate tensor in physical units. Thus, we have:
(18) 
Finally we get:
(19) 
Converting to the centralmoment space by multiplying the centralmoment matrix on both sides, we have:
(20) 
This is exactly the same relation expressed in Eq. 5 for MRT model. With the derivation in Section 3.2, we arrive at the same mapping expression for from to as given by Eq. 10 for the nonorthogonal CMR model.
The meaning of Eq. 10 is that when computing fluid flows with two different scales, in addition to interpolating from one scale to another, we need to apply another mapping in order to obtain the correct , which is Reynolds number consistent. As an illustration, take Fig. 3 (a) for an example, which shows the setting for traditional multiblock method, where , and the mapping happens only at the coincided boundary. The coarse scale iterates first before the small scale starts. When iterates, its values at the boundary, e.g., point , should first be interpolated from the nearby points of along the boundary, and then apply Eq. 10 to map from to , which provide the necessary boundary values for iterations at . After iterations at , the values at the boundary points of , e.g., point , are first copied from the overlapped point at , and then apply Eq. 10 to map from to to update the boundary values of . We call the mapping from to as “priormapping” and the mapping from to as “postmapping”.
Spatial scale mapping
As argued before, multiblock formulation with restricted to integers is problematic especially for turbulent flows. Fig. 5 (a) shows an example of the simulation with multiblock method () where turbulence structures are suddenly lost when transiting from fine to coarse scales due to the violation of scalecontinuity. As shown later, this discontinuity can be avoided or much reduced by constructing a more continuousscale setting and employing our continuousscale formulation, see Fig. 5 (c), where four scales are used and turbulence structures are more continuous across the scales with details better preserved even at the coarse scale region, see the red box.
When is not restricted to an integer and the boundaries between two scales do not coincide with each other (see the dark blue and red lines in Fig. 3 (b)), we arrive at our continuousscale setting. In such a case, the coarse scale still iterates first before the small scale , but the priormapping at the boundary of , e.g., at point , requires the interpolation from the eight corner points of the 3D cell at scale where locates, and then again apply Eq. 10 to map from to . The similar procedure applies for the postmapping at the boundary of , e.g., at point .
Transition between scales
The descriptions above assume that the mapping between two scales only happen at the scale boundaries, which is not suitable for turbulent flows, as structure discontinuity may occur at the boundary and vortices may sometimes be blocked from going through the scale boundaries. To avoid these artifacts, we can extend the scales with sufficient overlaps as suggested in [64], which results in overlapped inner samples (samples in the overlapped regions except at the boundaries), e.g., point in Fig. 3 (b), where values should be mapped from to after iterations at . In practice, we always overlay smallscales onto the large scales and overlap the entire smallscale regions.
To justify such a treatment, Fig. 5 makes a comparison, where Fig. 5 (a) shows multiblock formulation with a small portion of the overlapped region along the scale boundary for mapping, which is obvious that the scale boundary blocks some flow structures from successfully going through, leading to structure discontinuity artifacts. This can also lead to blurriness inside the smallscale region. With mapping in the entire overlapped region, such artifacts are reduced, but cannot be removed, see Fig. 5 (b). This indicates that scale continuity can be important to preserve consistent turbulence structures across the scale boundaries. With our continuous scale setting, we still map distribution functions in the entire overlapped region, but with more continuous scale transition (four scales rather than two), see Fig. 5 (c). It is clear that our continuousscale setting and the related treatment can result in more consistent flow structures across different scale boundaries, making the entire fluid flow more reasonable. Note that all the simulations in such a comparison are produced with the nonorthogonal CMR model and with our adaptive relaxation.
Temporal alignment and scale mapping
Mapping distribution functions spatially only addresses the spatial consistency of local Reynolds number. However, there are still temporal alignment and consistency problems during iterations. To solve these problems, we should first select a reference scale with the reference timestep , see the yellow timeline in Fig. 6 (a). Denote and to be the ratios between scale and any other scales larger ( with ) and smaller ( with ) than , respectively, see the red and green timelines in Fig. 6 (a). In multiblock formulation, is selected as the largest scale, so we only have scale with restricted to an integer. This makes temporal alignment simple since after iterations, scale naturally aligns with . However, in our continuousscale setting, we can select any scale to be , and have both and which are not restricted to integers. This raises the problem that after integer number of iterations, the temporal evolutions of and may not align with , and thus temporal interpolation is needed. Moreover, after each iteration at scale and , we need to update the boundary values from to maintain local Reynolds number consistency.
To do this, note that and are determined by and , respectively, which can be arbitrary. Before any iteration of scales and , we need to perform priormapping only on the scale boundaries except the overlapped inner samples, see the arrows at time in Fig. 6 (a), which is beneficial for preserving fluid details. To handle the overlapped inner samples for scale at time , we always iterate scale for one time step and linearly interpolate back at time using its values at and , see the dotted line at scale in Fig. 6 (b). At the boundary of scale , no temporal interpolation is needed, and their values are directly mapped from scale at time , see the arrow at time from scale to scale in Fig. 6 (a). For scale at time , we iterate times, which may exceed by a fractional timestep of . To obtain values of the overlapped inner samples at time , we use quadratic interpolation based on the points , and , see the dotted line at scale in Fig. 6 (b). At the boundary of scale , temporal interpolation of values at scale is not needed either, and their values are directly mapped from scale at time . For the first iterations of scale , values at the boundary should be updated before the iteration by first interpolating from scale temporally using two points at and , and then mapping from the interpolated values. Before the final fractional timestep iteration of scale to reach the time , we directly map values from scale at to the boundary of scale , see the arrows from scale to scale in Fig. 6 (a).
Handling multiple scales
In practice, there can be multiple rather than two regions with continuous scales overlapping with each other at the same point, see Fig. 7 (a). To allow flexibility and adaptivity for complex domains, we construct scales such that smallscales always superimpose over largescales with mapping in the entire overlapped regions, and the reference scale occupies the entire domain, which is very efficient to determine the overlapped regions between scales. In such a case, a specific mapping scheme should be developed.
Taking point and in Fig. 7 (a) for an example, where three scales () are overlapped, and is the overlapped inner sample and is the scale boundary point. Now we consider the iterations at scale (). Note that the reference scale should always be iterated first before any other scales . Priormapping at only happens at the scale boundary, so it is performed only at , where values should always be mapped from the nearest coarse scale , and is the set of all overlapped scales at . For postmapping, both and are needed, and all the scales larger than () should be updated by mapping from directly to these scales in all the overlapped samples, but excluding the scale boundary (otherwise the fluid field may be oversmoothed), see Fig. 7 (b) for an example where . For , the process is reversed for prior and postmappings. For priormapping, we map all the overlapped regions from to instead; however, for postmapping, we only update the boundaries of scales smaller than . The above procedure ensures reasonable continuousscale simulations with any number of continuous scales in arbitrary overlaps, without noticeable artifacts and excessive smoothing.
Note that Zhao et al. [65] employed the integralscale multiblock approach to simulating fluid flows in graphics, with spatial interpolation along grid lines only and without any temporal interpolation. They also applied scale mapping but with priormapping only. Although our spatial and temporal interpolations may introduce a certain amount of numerical viscosity, the influence is not obvious. In addition, since we involve postmapping, the fluid flow information can be transferred from small scales back to large ones, and thus the simulated flows can be more turbulent.
5 Scale construction
The continuousscale formulation proposed in Section 4 immediately allows us to have flexible scale construction since it looses the geometrical restriction between scales. In principle, two types of scale construction schemes can be used for simulation. During initialization, static scale construction, where the scales are purely determined by domain geometries (the shapes of objects inside the fluid region and the domain boundary) and inlet positions, is performed to place finer scale samples around object boundaries and the wakeflow regions behind them, as well as coarser scale samples for fields far from the simulation domain. During the simulation, dynamic scale construction, which refines the scales over time, adapts scale samples dynamically to track the turbulence details.
Static scale construction. To construct static scales, we rely on the distance map of the domain geometry, where each point in the domain is given the shortest distance to the object or domain boundaries, which is an indicator of scales, where smaller distance implies smaller scales. Such a distance field is then quantized and mapped to discrete scales, where is manually determined (e.g., ). To decide the exact spacing for each scale, we assign the smallest and largest spacings for the smallest and largest scales respectively, and linearly interpolate the spacings for scales in between.
In case of wake flows where turbulence could be strong, we determine the scales like a soft shadow in rendering, where the inlet is taken as an area light source, and the scale becomes small when a point is in the “umbra” region behind the object and gradually increases for points in the “penumbra” region. In addition, for practical simulations where the open space is usually configured, we divide the whole simulation domain into domain of interest (DOI) and farfield domain (FFD), like in [41]. In DOI, the largest scale is taken as the reference scale, and in FFD, scales are increased from the boundary of DOI and becomes very large at its outer boundary to damp out turbulent variations to approximate a real open space.
Fig. 8 demonstrates an example of static scale construction for flows around a solid ball, where five static scales are created. As mentioned in Section 4, we always overlay the entire regions of smallscales over largescales. With such a setting, boundary induced turbulence can be better resolved. Note that the samples in Fig. 8 are for illustration only; the true number of samples is several times denser.
Dynamic scale construction. The dynamic scales are fine scales that are dynamically created and overlaid onto all the other scales, which may change over time, and are beneficial to have necessary computations only when needed. The dynamic scales are more complicated to construct and sophisticated methods are required to create sufficiently connected and large enough regions. In this paper, we develop a simple dynamic scale construction method particularly for jet flows, which are very efficient to compute.
To create such scales, we first compute the gradient magnitude of the entire velocity field. Then we select a threshold to remove samples with the gradient magnitude below such a threshold. This is to ensure that the dynamic scales are constructed only in sufficiently fluctuating regions to capture the turbulent flow details. For the remaining samples, we threshold again, but instead based on the velocity magnitude, to further remove samples that have velocity magnitude smaller than the threshold. This is to ensure that the dynamic scales are constructed with sufficient spatial continuity. After these two thresholding processes, the remaining samples form one dynamic scale region. In practice, multiple thresholds can be selected to create multiple dynamic scales, and such a process is repeated for every iterations in our practical simulations.
Fig. 9 shows a typical dynamic scale construction process for a jet flow, where two timevarying dynamic scales are constructed, see the green and red samples, where sufficient flow details can be captured inside these regions. As can be seen in the timing statistics later in the next section, the dynamic scale construction effectively saves computations to solve for turbulent flows, which occupy only a portion of the entire flow region.
6 Results and Discussions
With our continuousscale formulation and scale construction methods, we can realize our fluid simulator, and Algorithm 1 gives the pseudocode as a reference for implementation. Note that when implementing our method, we need to first select a domain with a physical size which could be arbitrary, and then determine the physical spacing for each scale, which is finally used to determine the overlap and resolution of each scale. Note that in our kinetic simulator, is normalized and always set to 1; when we specify different scales, we use instead.
We implement our simulator on a computer installed with an Intel Xeon E52630 v3 @2.4GHz CPU and 48 GB system memory. Our method is easily parallelizable and the main iterations are implemented on an NVIDIA GTX 1080 GPU with 8 GB onboard memory, where our simulations take from to GB memory with overall number of samples from around to depending on the scenario we simulate. For each iteration of the entire domain to finish with respect to the reference scale including the FFD scales, our method takes around to seconds, without rendering. To visualize the velocity field, we take a crosssection and use direct colormapping. Note that to produce one animation frame, we usually have around 10 iterations.
6.1 Initialization and boundary treatment
To initialize the fluid flow field, we first give a constant initial density , and a calm velocity field except at the inlet where the velocity is set as , which is also taken as the Dirichlet boundary to keep injecting the flow into the domain. These macroscopic fields are then converted to distribution functions for each sample based on the equilibrium state of the CMR model: . For boundaries around objects including the ground, we apply a secondorder noslip boundary treatment method described in [62], which is more accurate. Note that the traditional noslip bounceback boundary treatment is firstorder accurate only and cannot give stable simulations. For FFD boundary, we use the Neumann condition. It should be noted that although standard boundary conditions (slipping and noslip conditions) can be easily applied in our solver, some particular boundary conditions should be further derived and may not be straightforward to apply, which can be a potential drawback of the kinetic approach for simulating fluid flows in more complex environments.
6.2 Stability and accuracy
With our adaptive relaxation in nonorthogonal CMR model, stability and accuracy can be retained, which allows very small viscosities (e.g., ) or even zero viscosity with small vortices, and with arbitrary boundary geometries. Our method does not rely on any turbulence model for stabilization, which reduces the uncertainty during simulations. As an example, Fig. 12 shows a rotating vortex ring from the boundary layer induced smoke flow with rotating structures around the ring, which can be generated only with accurate advection solvers especially at the boundary, and could be preserved with our method for a long time even in regions with coarse resolutions. However, the transition between scales rely on interpolation, which may break the conservation of the original LBE and introduce a certain amount of numerical diffusion that may smear out smallscale details.
When we say our solver is more accurate than the traditional ones, we mean it can faithfully preserve the necessary smallscale structures for more visual realism without the aid of any other empirical models. From the computational side, the model accuracy is reflected in two aspects: i. the collision model responsible for approximating INSE is more accurate, where nonorthogonal CMR model with our adaptive relaxation greatly reduces the ghost modes and ringing artifacts, and has a higher approximation order to the corresponding INSE; ii. the discretization on both space and time are second order including the boundary treatment, with conservative advection, which are also important to preserve turbulence structures. The main influence on discretization accuracy is the interpolation when mapping among different scales, but not obvious in practice.
6.3 Parallel implementation
Since our method is local in dynamics, it is easy to be parallelized on the GPU for fast computation. Since scales are coupled in overlapped regions in our method, we start from the reference scale and progress gradually to the smaller and larger scales in a serialized order. However, at each scale, we iterate the dynamics in parallel. If one scale has multiple regions, these regions are also iterated in parallel. Note that in our parallel implementation, we do not have any code or hardwarelevel optimizations, and the timings presented later are based on such a straightforward implementation, which could be further improved in the future.
6.4 Smoke simulation
To demonstrate the applicability, we apply our method to smoke simulations, where we inject around to smoke particles per iteration into the domain from a user specified smoke inlet (note that the smoke inlet can be different from the velocity inlet, where we ignore combustion). The particles move by integrating their positions with respect to the velocity field by 3rdorder RungeKutta method [66], and are rendered with particle renderer from [7], where the average rendering time is 6 to 40 seconds on the CPU per animation frame (we iterate around 10 times to generate one such frame) depending on different scenarios. Our scale construction can easily allow dense samples to be placed around the complex object with sufficient flexibility, which improves the accuracy of the important flow field, and thus more plausible visual effects around complex object boundaries can be obtained.
Fig. 10 shows the snapshots of a smoke simulation with an inlet in the left of the domain. In order to capture sufficient details and reduce storage and computation cost, we employ dynamic scales which are overlaid onto the reference scale and are evolving over time. The spacing for the reference scale is and the spacings for the dynamic scales are . No FFD scales are used. There are samples used in the simulation with seconds on average to produce one animation frame, and the maximum memory cost is 3 GB on both CPU and GPU. It is clear that the fine details can be well preserved.
While Fig. 10 shows a smoke simulation without any object inside, Fig. 11 shows the smoke simulation where a ball object is placed inside the domain and the smoke is injected from the surface of the ball. In such a case, we use static scales only and place dense samples around the ball as well as its wakeflow region. The reference scale is ; the spacing for the FFD scale is ; and the rest static scales are: . There are samples used in the simulation with seconds to produce one animation frame, and the memory cost is 5 GB on both CPU and GPU. It is clear that boundary layer turbulence details can be well preserved. It can also be noticed that a clear rotating vortex ring is produced and maintained for a long time, see Fig. 12. Note that only high resolution simulation with sufficient accuracy around the ball can produce such a vortex ring. Due to our solver flexibility to place higher resolution samples near the ball and thus higher accuracy, the vortex ring can be well generated, and due to conservative advection with more accurate collision model in each scale in our method, the vortex ring can be preserved without over smoothing even in the region with coarse grid resolution (the right region of the domain) which is four times sparser than the resolution around the ball.
The more powerful capability of our solver is to tackle arbitrary geometrical boundaries in an efficient manner. Figs. 13 & 14 give two examples of air flows passing through objects with complex shapes, where smokes are injected near the boundary. For clarity, we only inject smokes over the small area near the object in Fig. 14 . In Fig. 13, the spacing for the reference scale is and we use two FFD scales which are ; the spacings for the rest static scales are: , with CPU and GPU cost of 6.2 GB for samples, and it takes seconds to produce one animation frame. In Fig. 14, the spacing for the reference scale is and we use two FFD scales which are ; the spacings for the rest static scales are: , with CPU and GPU cost of 5.3 GB for samples, and it takes seconds to produce one animation frame. It is clear that visually appealing smoke patterns can be produced. Readers are suggested to refer to the supplementary video for these animations, and for smoke motion in Fig. 14 and in the related animation, it can be observed that the swirling feature of the smoke due to concave geometry can be faithfully resolved.
6.5 Comparisons
To verify our method, we conduct comprehensive comparisons with the wellknown unconditionally stable MacCormack method [22] as well as the more recent work from Zhang et al. [7, 23] for smoke simulations, where similar initial and boundary conditions as well as averaged resolutions are used between their simulations and ours, see Fig. 15. Note that the combustion force in the original simulations of these existing methods is ignored in all following comparative simulations in order to demonstrate and compare the capability of capturing selfinitiated turbulence without external activation. We also compare boundaryinduced turbulence under different resolutions to highlight the advantage of flexibility for our method, see Fig. 17. Readers are suggested to see the supplementary video for animations of these comparisons.
Comparisons with MacCormack advection scheme. In Fig. 15 (a) & (d), we simulate jet flows and boundary layer induced smoke motion respectively both by solving the incompressible Euler equation using the wellknown secondorder unconditionally stable MacCormack advection scheme, which is the standard approach for smoke simulations in graphics. In Fig. 15 (a), samples () are used to obtain the result. Since the advection is not accurate enough, turbulence is activated very late, and less smallscale vortices are created, which makes the simulation not quite realistic. With the same setting, we obtain our simulation result in Fig. 15 (c), where we use only samples in the maximum case with three scales (the reference scale is and the other scales are , without FFD scales) to simulate the smoke motion since we apply dynamic scales in this simulation. Since our method solves INSE, we use a very small viscosity () to approximate the result, which leads to a Reynolds number of .
Both the simulations are implemented on the same GPU, where the unconditionally stable MacCormack scheme with preconditioned conjugate gradient (PCG) pressure solver takes around 3 seconds with 2 iterations to produce one animation frame while our method takes around 4 seconds with iterations. Note that for unconditionally stable MacCormack scheme, we can use larger CFL number and we choose CFL=3 to balance between efficiency and accuracy. Although our method is slower than this traditional scheme, more reasonable turbulence patterns can be generated for our method in both horizontal and vertical directions, making our simulated smoke motion more plausible. The memory usage for Fig. 15 (a) & (c) are 0.4 GB on average and 3 GB in the maximum, respectively.
We also simulate the boundary layer induced smoke motion by solving the incompressible Euler equation with with around samples () using the unconditionally stable MacCormack scheme, with boundary treatment method by [67], which is shown in Fig. 15 (d). In Fig. 15 (f), we obtain our simulation result with the same setting using samples and number of static scales (the reference scale is and FFD scales are ; the other scales are ). Both simulations are also implemented on the same GPU, where it takes around 9.8 seconds with iterations for Fig. 15 (d) to produce one animation frame with CFL=5, while our method takes seconds with iterations to produce result in Fig. 15 (f). The memory usages for Fig. 15 (d) & (f) are 1 GB and 4 GB, respectively. Unlike the previous comparison, the unconditionally stable MacCormack scheme in this boundary layer simulation case has almost similar computational time on the GPU as our method, but unable to capture sufficient turbulence details.
To further capture fine turbulence details for the unconditionally stable MacCormack scheme, we can either reduce CFL number or increase the sample resolution. Both methods will significantly increase the required computing time. For example, Fig. 16 makes a comparison of simulation results between our method (Fig. 16 (b)) in Fig. 15 (f) ( samples) and higher resolution unconditionally stable MacCormack scheme (Fig. 16 (a), four times higher than the simulation in Fig. 15 (d) with samples). It can be seen that Fig. 16 (a) produces more turbulence details than Fig. 15 (d) and is closer to our simulation result, but it also takes much more computing time on GPU (35 seconds per animation frame) with GB memory usage. Thus, the advantage of our method is obvious.
Comparisons with methods of Zhang et al. To preserve vortices and retain more turbulence details with the same CFL number and sample resolution, the unconditionally stable MacCormack scheme can be enhanced by the IVOCK scheme proposed Zhang et al. [7], which is demonstrated in Fig. 15 (b), where samples () are used to obtain the simulation result, which is equal to the number of samples in Fig. 15 (a). It is clear that the turbulence structures are more reasonable compared to the result from the unconditionally stable MacCormack scheme in Fig. 15 (a), but some smallscale details are still missing. In addition, due to the involvement of vorticity correction for IVOCK scheme, much more computation is added to the solver. Since the IVOCK scheme is more difficult to have a GPU implementation, we run it on a CPU with GHz frequency, but with parallel execution on 20 cores to maximize its performance. It costs around seconds with 2 iterations and fixed CFL number (CFL=3) for the method of Zhang et al. [7] to produce one animation frame, which costs 2 GB memory, while our solver costs around seconds, which is much faster, but with more memory (4 GB). If we offload the IVOCK scheme to the GPU and considering that a GPU implementation of an incompressible Euler equation solver is generally only several times faster than its CPU equivalent, our solver is still very promising in performance while capturing even more turbulence details than the IVOCK scheme.
To enhance the simulation for boundary layer induced smoke motion, we employ the method of Zhang et al. [23] for comparison, and Fig. 15 (e) shows such a simulation with samples (two scales), which is almost equivalent to the number of samples in our simulation in Fig. 15 (f). To increase turbulence details, we set zero viscosity for Fig. 15 (e) and a viscosity of for Fig. 15 (f), which results in a Reynolds number of . From the comparison, it is clear that our solver can capture much more boundary layer turbulence details, especially in the wake flow region. Since the solver by Zhang et al. [23] is also more difficult to be implemented on the GPU, we still run it on a CPU with GHz frequency, but with 20 cores for parallel execution to maximize its performance, and it costs around seconds with iteration for the method of Zhang et al. [23] to produce one animation frame with 3 GB memory, while our solver costs around 10 seconds and 4 GB memory, with iterations per animation frame. Such a solver can be implemented on the GPU, and as argued similarly before, the performance can only be several times faster than its CPU equivalent. In this case, our solver is also very promising in performance for boundary layer flow simulations.
Comparison under different resolutions. In addition to the comparisons above, it is also interesting to compare our solver under different resolutions, where Fig. 17 (a) & (c) show the smoke simulations passing though a ball, which are solved on uniform grids only without any adaptive refinement, but with higher (around samples in DOI region in Fig. 17 (a)) and lower (around samples in DOI region in Fig. 17 (c)) resolutions respectively, while Fig. 17 (b) shows the simulation result with our continuousscale setting, but with higher resolution near the ball and the total number of samples ( samples in DOI region) is almost in the middle between those in Fig. 17 (a) & (c).
It is clear that with our continuousscale setting and flexible sample placement around object boundary (the resolution around the ball is even higher than the one in Fig. 17 (a)), more plausible turbulence structures can be captured around the ball, which is closer to the high resolution result in Fig. 17 (a), with even finer details. Such a turbulence structure can also be transmitted to the wake flow region far behind the ball, where we use a very coarse grid (four times sparser than the region around the ball), but the fluctuation can also be well preserved without significant diffusion due to conservative advection and collision in each scale. Considering that almost half number of samples are used in our simulation to produce a result even better than the one with uniform high resolution grid, our method obviously has performance gains.
6.6 Advection efficiency
In traditional kinetic approach with lattice Boltzmann method, the advection is under the restriction of CFL=1, which does not allow flexible tuning of time steps, and may first seem to be slow. However, considering the balance between efficiency and accuracy, we usually do not set a very large CFL number for traditional macroscopic solvers, and the maximum CFL number is usually set around 3 or even smaller. In this case, kinetic solver is not really slow, especially considering the adaptive continuous scale setting where the reference scale is usually much coarser than the smallest scale, resulting in relatively large time steps for one iteration of the entire domain. The parallel implementation is also straightforward, and the whole solution is almost conservative, which is beneficial for turbulent flows. The comparisons in Section 6.5 verified above arguments.
6.7 Limitations
Our method also suffers from several limitations. First, due to interpolations among scales, some smallscale features may be smoothed out. Currently, this can only be improved by increasing the sample resolution of the corresponding scale. Second, since we use D3Q27 lattice structure, and we always overlay small scales onto large ones, more memory will be used than the corresponding INSE approaches. Third, although we parallelize our method on the GPU, the computation is still serialized among different scales, which could be further accelerated in the future.
7 Conclusion
In this paper, we propose a novel continuousscale kinetic approach to simulating fluid flows with flexible sample placement. To significantly increase the stability and accuracy for turbulent flow simulations with kinetic approaches, we propose to employ a nonorthogonal centralmoment relaxation model where we develop a novel adaptive relaxation method to retain stability while preserving sufficient turbulence details. To respect scale continuity, we propose a new continuousscale formulation that can easily combine different scales together with loose geometrical constraint, which directly leads to a flexible scale construction and refinement scheme to adapt scales according to the domain geometries and the flows in simulation in a more continuous manner. The application to smoke simulations demonstrates the effectiveness of our method, with comprehensive comparisons to the existing methods to verify our advantages.
Acknowledgment
The authors would like to thank all reviewers for their constructive comments, as well as Dr. Xinxin Zhang from University of British Columbia for sharing his fluid simulation codes for comparisons. This work is supported by the National Natural Science Foundation of China (NSFC)  Outstanding Youth Foundation (Grant No. 61502305), as well as the startup funding of ShanghaiTech University.
In our continuousscale kinetic fluid simulation, we use the nonorthogonal central moment relaxation (CMR) model for the collision [20]. Here we give a more detailed description about the construction and application of the model. For the derivation, please refer to the paper directly. Like in MRT model, the CMR model first define the central moment space with the following lattice velocity definition based on the D3Q27 lattice structure:
(21)  
where , and are the x, y, zcomponent of the 27 lattice velocities . These velocities are then translated with the flow velocity to define a set of translated lattice velocities:
(22) 
The central moments are then defined by constructing a transformation matrix which transforms the velocity distribution functions to central moment space as:
(23) 
where and are vectors collecting all the components of the distribution functions and their corresponding central moments, and each component of is defined as:
(24) 
where indexes the corresponding velocity component; ; and . Note that indexes the 27 velocities and indexes different moments. By expanding different orders of moments, the specific forms for is defined as follows:
(25)  
where the vector . After we convert the distribution functions to the central moment space by , we can model the collision operator which gathers all the collision operators for each as:
(26) 
where is a diagonal matrix defining the relaxation parameters for different orders of moments and is a vector defining the equilibrium state in central moment space by:
(27)  