A Lagrangian Inertial Centroidal Voronoi Particle method for dynamic load balancing in particle-based simulations

A Lagrangian Inertial Centroidal Voronoi Particle method for dynamic load balancing in particle-based simulations

Zhe Ji zhe.ji@tum.de Lin Fu lin.fu@tum.de Xiangyu Y. Hu xiangyu.hu@tum.de Nikolaus A. Adams nikolaus.adams@tum.de Technical University of Munich, Department of Mechanical Engineering, Chair of Aerodynamics and Fluid Mechanics, 85748 Garching, Germany

In this paper we develop a Lagrangian Inertial Centroidal Voronoi Particle (LICVP) method to extend the original CVP method fu2017physics () to dynamic load balancing in particle-based simulations. Two new concepts are proposed to address the additional problems encountered in repartitioning the system. First, a background velocity is introduced to transport Voronoi particle according to the local fluid field, which facilitates data reuse and lower data redistribution cost during rebalancing. Second, in order to handle problems with skew-aligned computational load and large void space, we develop an inertial-based partitioning strategy, where the inertial matrix is utilized to characterize the load distribution, and to confine the motion of Voronoi particles dynamically adapting to the physical simulation. Intensive numerical tests in fluid dynamics simulations reveal that the underlying LICVP method improves the incremental property remarkably without sacrifices on other objectives, i.e. the inter-processor communication is optimized simultaneously, and the repartitioning procedure is highly efficient.

Inertial Centroidal Voronoi Particle, Centroidal Voronoi Particle, Dynamic Load Balance, SPH, Particle Simulation
journal: XXX


1 Introduction

Large scale parallel computing is essential for a wide range of scientific applications. The objective of the domain decomposition method is to facilitate the algorithms to harness computational resources more efficiently devine2005new (). In Computational Fluid Dynamics (CFD), the configuration of discretization-elements (mesh or particle) may evolve in time schloegel2000graph (), which requires partitioning algorithms to reassign computational load to processors periodically, i.e., achieve dynamic load balance. Comparing to static partitioning, rebalancing needs to meet the same targets, e.g. load balance, locality, optimization for inter-processor communication, but also is subject to further constrains. The key aspect for rebalancing schemes is to minimumize partitioning modifications subject to small topology changes, i.e. incremental property, and to optimize the inter-processor communication at smallest cost hendrickson2000dynamic () fu2016novel (). The reader may refer to bulucc2016recent () and schloegel2000graph () for a comprehensive review regarding recent development of repartitioning approaches.

For particle simulations, the discretization-elements, e.g. SPH particles, move according to the corresponding dynamic system, which may cause deformation and relocation of computational sub-domains. To achieve the incremental property it is crucial to maximize the data reuse during rebalancing process, namely, the rebalancer should be aware of the details of the underlying system, and repartition the system incorporating the existing partition to achieve lower data redistribution cost walshaw1997parallel (). Another problem encountered in particle simulation is that in some cases, e.g. the dambreak problem adami2012generalized (), high velocity impact of a projectile into a plate zhang2017generalized (), the rotating disk problem in astrophysics hopkins2015new (), the computational load distributes anisotropically in the computational domain, and the configuration of discretization-elements varies rapidly. The skew alignment and rapid change of computational load may result in void space which requires zero computing resources. Such a scenario is more problematic for partitioning algorithms with respect to satisfying all constrains simultaneously hendrickson2000dynamic ()meyerhenke2009graph ().

In our previous paper fu2017physics (), a CVP domain decomposition method based on physical analogy has been developed. The load balance target is achieved by solving a Voronoi Particle (VP) evolution model equation. Centroidal Voronoi Tessellation (CVT) is utilized for communication reduction by optimizing the compactness of partitioning sub-domains meyerhenke2009graph (). The CVP method is verified by various static partitioning tests, independently of the mesh-element type. Later, we have integrated the CVP method as dynamic partitioner and develop a new multi-resolution parallel framework for the SPH method zhe_ji_new (). An adaptive rebalancing criterion and monitoring system has been proposed to assess the imbalance during simulation and reassign equivalent load among all the processors.

Although the CVP method inherently features load balance and communication reduction, there is no explicit treatment in our previous work regarding to the handling of the aforementioned additional difficulties that will encountered in dynamic partitioning. The objective of the current paper is to extend the CVP method and improve the performance of particle-based methods in the dynamic partitioning problems. Two concepts are proposed in this paper: a Lagrangian background velocity and an inertial-based partitioning strategy. The newly developed method is called Lagrangian Inertial CVP (LICVP) method. In the following two paragraphs the basic idea of LICVP method is introduced.

With the CVP method each Voronoi particle possesses physical properties either integrated from meshes enclosed within its Voronoi cell region or calculated from neighboring cells. Generally speaking, the CVP method can be viewed as a coarse-grained modeling of underlining mesh elements. Based on this observation we can introduce a background velocity to adevect partitioning generators, i.e. Voronoi paticles, between two consecutive rebalancing steps. If the background velocity is given properly, the positions of the Voronoi particles will be updated describing the evolution of the dynamic system and geometry variation, e.g. mass center, of local sub-domains. When the rebalancing subroutine is triggered, the updated positions of Voronoi particle are utilized as input and initial condition for the new partitioning result. Since the equilibrium is calculated globally, the target of load balance and communication reduction is guaranteed by the CVP method. Moreover, since the new partitioning diagram is calculated aware of the original partitioning, data reuse is improved, i.e. incremental property can be achieved.

To handle problems with skew-aligned computational load and large void space, we propose another extension of the CVP method. The idea is to constrain the motion of Voronoi particles according to the load distribution in space during the rebalancing procedure. Similarly with the Recursive Inertial Bisection (RIB) method simon1991partitioning (), we choose the inertia matrix to characterize the load distribution, such that the skewness of the computational load can be obtained accordingly. A splitting operator is proposed to take the trajectory of Voronoi particles as the input and outputs the resultant vector subject to a certain type of constraint. Moreover, an adaptive filter is proposed, which selects a proper constraint dynamically adapting to the development of the underlying particle system. The filter ensures restoration of the original CVP method when the computational load is distributed homogeneously. With this method Voronoi particles are insensitive of the load variation along the confined direction, thus the convergence and incremental property are improved. We refer the new method as Inertial CVP.

In this paper, we focus on a specific version of particle-based method, the Smoothed Particle Hydrodynamics method. The validation of our new algorithm is performed with the code we have developed previously zhe_ji_new (). The remaining of this paper is arranged as follows. In section 2 the CVP method and imbalance monitoring strategy for dynamic load balancing is briefly reviewed. The concept and model equations of the Lagrangian Inertial CVP method are introduced in section 3. Numerical algorithms and boundary conditions are elaborated in section 4. Section 5 gives five numerical tests to verify our method.

2 Brief review of the CVP method

We briefly review the model equations of the CVP method and the imbalance monitoring strategy from previous work (fu2017physics, )zhe_ji_new ().

2.1 Model equations

The key concept of the CVP method is to combine CVT okabe2009spatial () and Voronoi Particle dynamics (VP) to achieve high-level compactness of partitioning sub-domains and error-controlled load balance simultaneously. The equilibrium is calculated iteratively utilizing a two-step time integration scheme. The CVT diagram is constructed employing the Lloyd method lloyd1982least ()du2006convergence (). The model equation for VP is


where denotes the acceleration, the velocity vector, the pressure, the density, the region corresponding to Voronoi particle and the Voronoi cell surface. The pressure of the Voronoi particle is defined as


where is the target mass. The pressure at the surface between two neighboring cells is computed by second-order approximation . The scale for Voronoi particle is defined as the average distances from all neighboring particles.

In each interation substep the Voronoi particles first are moved according to the VP method by


and then updated following CVT construction as


where and are pseudo timestep sizes. The relaxation parameter is set as 0.8.

2.2 Imbalance monitoring strategy

To facilitate dynamic load balancing in the current framework, we develop an imbalance monitoring system. Two criteria are constructed to indicate the imbalance of a computation: (1) imbalance caused by the load change of SPH particles in , defined as ; (2) imbalance due to the change of communication, i.e. the number of ghost buffer particles constructed in , defined as .

The computational load for each SPH particle is estimated by considering two key operations, the neighbor search (NS) and the calculation of inter-particle forces (CF),


where denotes the normalized computational load. The adaptive weight is obtained by


where denotes the net runtime elapsed for different subroutines since the last load-balance estimate. The total computational-load for is calculated by


where is the number of SPH particles included in the Voronoi cell .

To combine the two criteria, an imbalance monitoring tag is defined in Eq. 8. The CVP method is triggered when .


where and , with are local errors in each sub-domain. and are initial values set after each partitioning. and are user defined error tolerance respectively. In current framework, we set .

3 Lagrangian Inertial CVP (LICVP) method

In this section, the detailed formulation of the LICVP method is developed. First, a three-step time integration scheme is proposed by introducing a background velocity for advecting Voronoi particles. The Inertial CVP is described in subsection 3.2. A splitting operator and an adaptive filter is defined to optimize and select partitioning strategy dynamically.

Figure 1: (a) Demonstration of three-step time integration scheme. (b) Demonstration of the operator if partitioning on the plane .

3.1 Background velocity for Voronoi particles

We first define two phases in dynamic load balancing using the CVP method (illustrated in Fig. 1 (a)). The first is the partitioning phase and the second is the load imbalance monitoring phase. In the first phase, the partitioning subroutine is triggered. The initial position of Voronoi particle at time , denoted as , is evolved using a two-step integration scheme given by Eq. 3 and 4. The partition operation is terminated when equilibrium is achieved after pseudo time-steps. The final position of Voronoi particle is , and the computational domain is repartitioned accordingly. In the second phase, the imbalance monitoring system is launched. The imbalance errors and are examined during the computation. The error accumulates until the threshold is reached, i.e. , then phase 1 is activated again. The final position of Voronoi particle in phase 2 is marked as .

For phase I, we develop a two-step time integration scheme to achieve both load balance and communication reduction targets. To increase the data reuse in rebalancing, we propose a third step to advect Voronoi particles in phase II (see Fig. 1 (a)). The third time integration step can be defined as


where is the timestep size identical to the underlined dynamic system, e.g. the SPH flow model. is a background velocity for Voronoi particle , which can be given arbitrarily. However, to preserve the incremental property, we suggest that is correlated to the motion of .

Several options for calculating can be considered. The first is to simply set the background velocity of Voronoi particle identical to the mean fluid velocity within the current subdomain.


where is the velocity of SPH particle in subdomain .

Another option is to consider the influence of computational load distribution. The mass center of is given as


where denotes the volume differential, and the coordinates of specific SPH particle in . Then can be defined by the time differential of mass center,


In practice, instead of updating the Voronoi particles according to the velocity of the mass center of every timestep, we simply set the mass center as the position of Voronoi particles before repartitioning.

We consider the aforementioned two choices in this paper. The performance and detailed comparison will be given in section 5.

3.2 Inertial CVP

In order to constrain the motion of Voronoi particle, we define a splitting operator, which operates on the time-integration scheme in phase I. We can rewrite Eq. 3 and 4 as




where is the resulting displacement. The splitting operator functions via manipulating , where .

For a general input vector A and a given direction of constraint, returns a resultant vector defined as


where and denote the vector parallel and perpendicular to N, respectively. N is the input that defines the direction of constraint. As illustrated in Fig. 1 (b), the particle motion is constrained on a plane where N denotes its normal direction. In this case, returns that is the projection of A into the plane.

To calculate N, we need to characterize the computational load distribution. In the RIB method simon1991partitioning (), the inertia matrix is utilized to calculate the principle inertia axis hendrickson2000dynamic (). Inspired by the RIB method, we can find the direction of constraint by the same means.

The global mass center of simulation is calculated by


Then the inertial matrix is obtained as


where denotes the total number of SPH particles simulated. The eigenvalue (Eq. 20) and eigenvector (Eq. 21) of J indicate the value and direction of principle inertia, respectively. The eigenvalues are sorted in increasing order.


The direction of the constraint can be calculated accordingly. We propose an inertial-based adaptive filter defined as


where . and are user defined thresholds. The selection procedure is to compare the normalized eigenvalues with predefined thresholds, and determine an appropriate strategy. The filter is adaptive as well, which allows for dynamic optimization of the partitioning strategy via choosing different constraints. For instance, if , the load along the first principle inertial axis, i.e. , is considerably larger than the combination of the other two axis. Thus, the degrees of freedom in and are confined, and only motion along is allowed. During the computation, if the other condition is satisfied, the strategy will be altered accordingly. If both conditions are not satisfied, the original CVP method is restored.

3.3 Intermediate conclusions

In section 3.1 and 3.2, we propose two extensions of CVP method. Both algorithms are mutually independent and compatible since they function at different phases of the computation, thus they can be integrated into a single framework. The objective of the LICVP method is to optimize the incremental property in rebalancing without violating the other constrains in static partitioning. In general, the Lagrangian property improves the data reuse when SPH particles move following the fluid field, while the Inertial CVP guarantees that partitioning is aware of the global load distribution and insensitive of the load variation along directions of minimum interest. LICVP methods can restore to original CVP method under certain conditions, and the implementation requires trivial modification. Moreover, The additional cost due to the extension is minimum since updating Voronoi particles is localized within sub-domains and the global inertial matrix is only updated once the repartitioning is triggered.

4 Numerical algorithms

4.1 Partitioning boundary conditions

Figure 2: Sketch of (a) symmetric and (b) periodic boundary conditions. The orange particles indicate the boundary Voronoi particles; the grey particle is the mirrored particles constructed to enforce symmetric boundary condition; the solid green particles are neighbors of the current (orange) Voronoi particle in a periodic box, and the translucent particles are mapped neighbor particles.

Symmetric and periodic boundary conditions can be applied on the partitioning domain (see Fig. 2). The ghost particle method is employed to enforce both boundary conditions.

Ghost particles for symmetric boundary conditions (SBC) are constructed following fu2017physics (). SBC is employed when the computational domain is symmetric itself or is defined as open boundary.

Periodic boundary conditions (PBC) are enforced in a similar way. In the first step, Voronoi particles that have neighbors on the other side of the periodic boundary are identified, and marked as boundary particles. In the second step, all boundary particles are mapped by adding/subtracting the domain length in the periodic direction, and the Voronoi diagram is constructed again including ghost particles. All physical variables of the ghost particles are identical to the mapped boundary particles. When the background velocity is introduced, PBC is enforced for phase II as well, which facilitates Voronoi particles for better tracing the local sub-domains. The benefit of PBC is twofold: 1) SPH particles that transported across the boundary during simulation cost no extra inter-processor data relocation; 2) the rebalancing is calculated allowing Voronoi particles to move across the boundary, thus data reuse is better preserved.

4.2 Flowchart

1:if current timestep count is divisible by 20 then the imbalance monitoring system is activated every 20 timesteps
2:      calculate imbalance error (Eq. 9) and (Eq. 10);
3:      Check whether repartitioning is required, i.e. (Eq. 8);
4:      if  then the system will be rebalanced
5:            calculate the inertial matrix (Eq. 19), and find its eigenvalue and eigenvector (Eq. 20 and 21);
6:            calculate direction of constraint, i.e. N (Eq. 22):
7:            while the partitioning error is not converged do
8:                 construct Voronoi diagram and solve the model equation Eq. 2;
9:                 calculate the resulting displacement according to Eq. 16 and Eq. 17;
10:                 update Voronoi particles;
11:                 check partitioning error;
12:            end while
13:            migrate SPH particles to target processor according to the new partitioning result;
14:      end if
15:      update data structure and construct ghost buffer particles;
16:      solve SPH governing equations;
17:      calculate the background velocity for transporting Voronoi particles using Eq. 12 or 14;
18:      update the position of Voronoi particles according to Eq. 11;
19:end if
Algorithm 1 Flowchart of LICVP method for one timestep

The detailed algorithm for LICVP method is summerized in Alg. 1. For simplicity, we only list the flowchart for one simulation timestep. Other correlative algorithms, e.g. CVP method, data structure, communication strategy, etc., are not elaborated here. One can refer to our previous work fu2017physics ()linfu_fast_neighbor ()zhe_ji_new () for a comprehensive overview.

5 Numerical validation

In this section, 5 test cases are considered to validate the proposed LICVP method. The underlining Lagrangian property of LICVP method is demonstrated via case 1, 2 and 3, where the Inertial CVP method degenerates to the original CVP method. The feasibility and performance of Inertial CVP method is then discussed in case 4 and 5. Case 3 is tested in the mpp2 cluster provided by Leibniz-Rechenzentrum (LRZ), which is constructed by 28-way Haswell-EP nodes with Infiniband FDR14 interconnect. The rest cases are carried out on the same workstation with 12 Intel(R) Xeon(R) CPU E5-2630 v2 cores (64G memory and 2.6GHz) and Scientific Linux system (Release 6.8).

Before moving to the results, we first define two measurements to assess the incremental property and optimization for communication reduction, i.e. averaged communication load () and averaged particle migration (). and are defined as


where and denotes the number of ghost buffer particles constructed and the number of migrated particles respectively.

5.1 Case 1

Figure 3: Steady advection test: the partitioning result of fluid with horizontal (upper row) and diagonal (bottom row) velocity at four instants. SPH particles are rendered with sub-domain index, and octahedrons are positions of Voronoi particles.
Figure 4: Steady advection test: time history of averaged particle migration. (a) fluid with horizontal velocity, (b) fluid with diagonal velocity.

In the first case, we consider a steady advection test. We initialize a 2D periodic box of unit length. The fluid is assigned with constant density (), and advected with uniform bulk velocity. A weakly compressible SPH solver is employed following adami2012generalized (). The number of particles simulated is 10000, and 12 MPI tasks are launched. The partitioning domain is set with equal size of the fluid field and PBC is enforced in both direction.

Since the fluid is advected with constant velocity, the relative position of SPH particles remains identical in entire simulation. With the background velocity, the topology of Voronoi particles should be invariant as well, consequently, after rebalancing, we should obtain the same partitioning diagram every time, and inter-processor particle migration is zero, i.e. data reuse is maximized.

Two bulk velocities, i.e. horizontal () and diagonal () velocity, are studied. It is noticed that all the computational load for SPH particle is the same, the mean velocity of the fluid coincides with the velocity of mass center. We rebalance the simulation every 100 timesteps. The partitioning results at four instants are illustrated in Fig. 3. As anticipated, the topology of the partitioning diagram remains unaltered, and repeats every one second. The averaged particle migration (see Fig. 4) is zero during entire simulation, except for a small oscillation at the beginning of the second case.

5.2 Case 2

Figure 5: 2D Keplerian disk problem: comparison between non-moving Voronoi particles (upper row), moving Voronoi particles with velocity of mass center (middle row) and mean fluid velocity (bottom row). Four snapshots are illustrated, i.e. 5s, 6s, 7s and 8s. SPH particles corresponding to different sub-domains are assigned with distinct colors, and the octahedrons denote the Voronoi particles.
Figure 6: 2D Keplerian disk problem: The time history of averaged communication load percentage using non-moving Voronoi particles (a), moving Voronoi particles with velocity of mass center (b) and mean fluid velocity (c). The time history of averaged migration percentage using non-moving Voronoi particles (d), moving Voronoi particles with velocity of mass center (e) and mean fluid velocity (f).

We consider a 2D cold Keplerian disc problem. It is a typical case in astrophysics, where gas orbits a central point mass subjecting to the equilibrium of gravity, centrifugal force and pressure force cullen2010inviscid (). We initialize a uniform density disk following raskin2016examining (). A compressible SPH solver proposed by price2012smoothed () is employed, where artificial viscosity and conductivity are switched off. To stabilize the flow, a damping term is added to the radial component of particle acceleration raskin2016examining (). A total number of 47500 particles are simulated with 12 MPI tasks.

Since the flow is shearing, different orbiting velocity will cause the deformation of sub-domains and an increasing of communication load. The proposed two background velocities are chosen to compare with simulation without setting background velocity, i.e. . The snapshot with regarding to three situations at four instants are illustrated in Fig 5. The top row gives the result of . It is observed that the positions of Voronoi particles remain approximately constant after rebalancing and the topology of partitioning result is exactly the same. Conversely, if the Voronoi particles are advected with the background velocity, both results exhibit shifted partitioning sub-domains according to the local flow, and slight topology alteration in a long run. Statistics comparison is manifested in Fig. 6. All three cases demonstrate that after rebalancing, the communication load decreases instantly, and restores to approximately the same value. Moreover, exhibits a slightly larger value when the Voronoi particles are not updated comparing to the other two situations. The for the first case shows a portion of 39% to 48% particles are being migrated after each rebalancing. This number drops to approximately 15% for the other two cases. It can be concluded that with the background velocity, the incremental property of CVP method is improved. The results from two underlying background velocities exhibit slight differences with respect to performance.

5.3 Case 3

Figure 7: 2D KH instability: the density field (upper row) and partitioning diagram (bottom row) with respect to four instants.
Figure 8: 2D KH instability: (a) The time history of averaged communication load percentage. (b) the time history of averaged particle migration percentage. (c) runtime of CVP method and particle migration during simulation. (d) time history of graph parameters.

We consider a 2D Kelvin-Helmholtz instability problem following price2012smoothed (). A Godunov SPH with second order reconstruction solver is employed inutsuka2002reformulation ()zhe_ji_new (). PBC is enforced at the boarder of partitioning domain. Eq. 12 is utilized to calculate the background velocity. 629,0560 SPH particles are simulated on 112 processors. Each processor has 1 TBB thread.

Fig. 7 presents the simulation result as well as the partitioning diagram at four instants. The topological layout of sub-domains in smooth region manifests superior similarity, and sub-domains are drifted with the local flow. In regions with discontinuity, due to the development of instability, the partitioning result exhibits topological alteration in a long run. In the simulation duration, i.e. 1.4s, the system is repartitioned 11 times, and drops immediately after each rebalancing (see Fig. 8 (a)). The averaged data migration drops from 40% at the beginning to approximate 25% after 0.5s (see Fig. 8 (b)). The runtime for CVP method as well as data migration (illustrated in Fig. 8 (c)) is negligible comparing to the total simulation time (60559s). The graph parameter, which is introduced in Ref. zhe_ji_new () and characterizes the sparse communication relationship, demonstrates that the total number of communication relation is bounded and is about 3 times larger than the MPI task number (see Fig. 8 (d)). The communication finishes within 10 sub-steps during the entire simulation.

5.4 Case 4

Figure 9: 3D Keplerian disk problem: the disk is initialized on x-o-y plane. Partitioning result (a)(b). Time history of partitioning error (c) and energy (d).
Figure 10: 3D Keplerian disk problem: the disk is initialized on plane with angle to x-o-y plane. Initial condition and Voronoi particle distribution (a). Partitioning result (b). Time history of partitioning error (c) and energy (d).

The 3D Keplerian disk problem is employed to demonstrate the feasibility of proposed Inertial CVP method. We use the same setup in case 2, and extend the height of the disk to 0.1 in z direction with identical particle pitch. The dynamic evolution of this case gives the same conclusions as in case 2, thus we only focus on the initial partitioning. Two situations are calculated, where the disk is initialized on x-o-y plane and plane with angle to x-o-y plane respectively. The calculation is carried out with 12 MPI tasks. The Voronoi particles are initialized on disk plane with constant angular separation and a random radius ranging between the outer circle and the inner circle. We set and in this case.

The result for both cases are plotted in Fig 9 and Fig. 10 respectively. It can be observed that Inertial CVP method captures the load distribution successfully in both cases, where the motion of Voronoi particles is constrained on the disk plane. The partitioning results feature convex, well-shaped sub-domains identical to the original CVP method. The partitioning error converges rapidly. The energy descends monotonically as well, which essentially optimizes the communication volume.

5.5 Case 5

Figure 11: 2D dam-break: snapshots of simulation result at four instants using (a) original CVP method with background velocity, (b) Inertial CVP with adaptive filter switched off and (c) Inertial CVP with adaptive filter.
Figure 12: 2D dam-break: statistic comparison of original CVP method with background velocity, Inertial CVP with adaptive filter switched off and on. (a) time history of averaged communication load, i.e. proportion of ghost buffer particles versus local registered particles. (b) time history of averaged migrated particles, i.e. proportion of migrated particles versus local registered particles. (c) time history of edge color number. (d) runtime regarding to domain decomposition subroutine, i.e. CVP method and data migration.

Finally, we consider the 2D dambreak case following the setup from (adami2012generalized, ). Initially, a liquid column is placed at the corner of the sink, and collapses due to the existence of gravity. The evolution of flow consists of violent wave-breaking and splashing event, witch is crucial in free-surface flow modeling. Meanwhile, the dynamic characteristic of this case deteriorates the performance of repartitioner as well, as the computational load varies dramatically in space and rebalancing becomes expansive. Three means are employed here for a comprehensive assessment of the performance: (1) original CVP method with background velocity; (2) proposed LICVP method with adaptive filter switched off, i.e. only one constraint is applied throughout the simulation, and here only partitioning along the largest principle inertial axis is allowed; (3) proposed LICVP method with adaptive filter, and we set and . The background velocity is calculated by Eq. 14. The number of particles simulated is 8208, and 12 MPI tasks are launched.

Fig 11 (a),(b) and (c) manifests the simulation result at 4 instants regarding to test 1, 2 and 3 respectively. Test 1 and 3 exhibits similar partitioning result at , when the water column is not sufficiently influenced by gravity. Whereas test 2 features sub-domains of slim-shaped rectangular, since only one dof (degree of freedom) is unconstrained. Following the propagation of wave front, the load distribution of the system varies accordingly and the load along horizontal axis becomes dominant. At , the partitioning strategy in test 3 is altered, and constraint is applied along N, i.e. identical to test 2. The partitioning strategy remains the same afterwards. It is observed that, the incremental property is best preserved in test 2, where the topology of sub-domains is identical during entire simulation. The incremental property for test 1 is the worst, as the topological layout of sub-domains varies constantly. Test 3 generally achieves an intermediate performance, since the partitioning strategy is dynamically shifted at .

The communication load is compared in Fig. 12 (a). for test 2 is the largest before , while test 1 and 3 has close communication volume during this period. The slim-shaped sub-domains cannot guarantee the optimization of communication reduction comparing to the compact sub-domains presented in test 1 and 3. After , test 2 and 3 exhibit generally the same level of communication load, whereas test 1 achieves lower during to and surpasses test 2 and 3 afterwards. Although test 1 achieves roughly equivalent performance in optimizing communication load after , the number of communication sub-steps, i.e. edge color, is higher and the system is rebalanced more frequently (see Fig. 12 (c)). Regarding to incremental property, the same conclusion can be drawn as mentioned in last paragraph by comparing the averaged data migration (see Fig. 12 (b)). The benefit from switching partitioning strategy in test 3 is remarkable, which avoids the violent stage encountered with original CVP method. The runtime comparison is illustrated in Fig. 12 (d)). Test 1 is the most time consuming, and with the proposed Inertial CVP method, a speedup of about 6x is achieved.

6 Conclusions

In this paper, a Lagrangian Inertial CVP method is developed by compounding the concepts of a background velocity as well as an Inertial CVP method. The proposed LICVP method is employed as the rebalancer of a multi-resolution parallel framework for SPH method. The main accomplishment can be summarized as follows:


By defining a background velocity, Voronoi particles are able to track the motion of local sub-domains and characterize the topological variation of the system more precisely. Rebalancing upon the updated Voronoi-particle positions improves the incremental property remarkably. Moreover, since the equilibrium is calculated globally, the inter-processor communication is reduced implicitly.


The performance of simulations with extremely anisotropic computation-load distribution is improved utilizing the proposed Inertial CVP method. Due to the splitting operator, the Voronoi-particle motion is insensitive of the load variation along directions of minimum interest, which enhances the incremental property, and improves the convergence as well. Additionally, the adaptive filter allows a dynamic selection of partitioning strategy according to the evolution of load distribution. The selection procedure guarantees a relative balance between data-redistribution and inter-processor communication cost in extreme situations.


The first author is partially supported by China Scholarship Council (NO. 201506290038). The second author is partially supported by China Scholarship Council (NO. 201206290022).The computational resources are provided by Leibniz- Rechenzentrum der Bayerischen Akademie der Wissenschaften, München (LRZ).


  • (1) L. Fu, X. Y. Hu, N. A. Adams, A physics-motivated centroidal voronoi particle domain decomposition method, Journal of Computational Physics 335 (2017) 718–735.
  • (2) K. D. Devine, E. G. Boman, R. T. Heaphy, B. A. Hendrickson, J. D. Teresco, J. Faik, J. E. Flaherty, L. G. Gervasio, New challenges in dynamic load balancing, Applied Numerical Mathematics 52 (2-3) (2005) 133–152.
  • (3) K. Schloegel, G. Karypis, V. Kumar, Graph partitioning for high performance scientific simulations, Army High Performance Computing Research Center, 2000.
  • (4) B. Hendrickson, K. Devine, Dynamic load balancing in computational mechanics, Computer methods in applied mechanics and engineering 184 (2) (2000) 485–500.
  • (5) L. Fu, S. Litvinov, X. Y. Hu, N. A. Adams, A novel partitioning method for block-structured adaptive meshes, Journal of Computational Physics 341 (2017) 447–473.
  • (6) A. Buluç, H. Meyerhenke, I. Safro, P. Sanders, C. Schulz, Recent advances in graph partitioning, in: Algorithm Engineering, Springer, 2016, pp. 117–158.
  • (7) C. Walshaw, M. Cross, M. G. Everett, Parallel dynamic graph partitioning for adaptive unstructured meshes, Journal of Parallel and Distributed Computing 47 (2) (1997) 102–108.
  • (8) S. Adami, X. Hu, N. Adams, A generalized wall boundary condition for smoothed particle hydrodynamics, Journal of Computational Physics 231 (21) (2012) 7057–7075.
  • (9) C. Zhang, X. Y. Hu, N. A. Adams, A generalized transport-velocity formulation for smoothed particle hydrodynamics, Journal of Computational Physics.
  • (10) P. F. Hopkins, A new class of accurate, mesh-free hydrodynamic simulation methods, Monthly Notices of the Royal Astronomical Society 450 (1) (2015) 53–110.
  • (11) H. Meyerhenke, B. Monien, S. Schamberger, Graph partitioning and disturbed diffusion, Parallel Computing 35 (10) (2009) 544–569.
  • (12) Z. Ji, L. Fu, X. Y. Hu, N. A. Adams, A new parallel framework for SPH method with adaptive smoothing-length, in: Proceedings of the 12th International SPHERIC Workshop, 2016.
  • (13) H. D. Simon, Partitioning of unstructured problems for parallel processing, Computing Systems in Engineering 2 (2-3) (1991) 135–148.
  • (14) A. Okabe, B. Boots, K. Sugihara, S. N. Chiu, Spatial tessellations: concepts and applications of Voronoi diagrams, Vol. 501, John Wiley & Sons, 2009.
  • (15) S. Lloyd, Least squares quantization in PCM, IEEE transactions on information theory 28 (2) (1982) 129–137.
  • (16) Q. Du, M. Emelianenko, L. Ju, Convergence of the Lloyd algorithm for computing centroidal Voronoi tessellations, SIAM journal on numerical analysis 44 (1) (2006) 102–119.
  • (17) L. Fu, Z. Ji, X. Y. Hu, N. A. Adams, A parallel fast neighbor search and communication strategy for particle-based methods with adaptive smoothing-length, Computer Physics Communications (2017) submitted.
  • (18) L. Cullen, W. Dehnen, Inviscid smoothed particle hydrodynamics, Monthly Notices of the Royal Astronomical Society 408 (2) (2010) 669–683.
  • (19) C. Raskin, J. M. Owen, Examining the accuracy of astrophysical disk simulations with a generalized hydrodynamical test problem, The Astrophysical Journal 831 (1) (2016) 26.
  • (20) D. J. Price, Smoothed particle hydrodynamics and magnetohydrodynamics, Journal of Computational Physics 231 (3) (2012) 759–794.
  • (21) S.-i. Inutsuka, Reformulation of smoothed particle hydrodynamics with Riemann solver, Journal of Computational Physics 179 (1) (2002) 238–267.
Comments 0
Request Comment
You are adding the first comment!
How to quickly get a good reply:
  • Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
  • Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
  • Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters
Add comment
Loading ...
This is a comment super asjknd jkasnjk adsnkj
The feedback must be of minumum 40 characters
The feedback must be of minumum 40 characters

You are asking your first question!
How to quickly get a good answer:
  • Keep your question short and to the point
  • Check for grammar or spelling errors.
  • Phrase it like a question
Test description