Computing dynamics of thin films via large scale GPUbased simulations
Abstract
We present the results of large scale simulations of th order nonlinear partial differential equations of diffusion type that are typically encountered when modeling dynamics of thin fluid films on substrates. The simulations are based on the alternate direction implicit (ADI) method, with the main part of the computational work carried out in the GPU computing environment. Efficient and accurate computations allow for simulations on large computational domains in three spatial dimensions (3D) and for long computational times. We apply the methods developed to the particular problem of instabilities of thin fluid films of nanoscale thickness. The large scale of the simulations minimizes the effects of boundaries, and also allows for simulating domains of the size encountered in published experiments. As an outcome, we can analyze the development of instabilities with an unprecedented level of detail. A particular focus is on analyzing the manner in which instability develops, in particular regarding differences between spinodal and nucleation types of dewetting for linearly unstable films, as well as instabilities of metastable films. Simulations in 3D allow for consideration of some recent results that were previously obtained in the 2D geometry (J. Fluid Mech. 841, 925 (2018)). Some of the new results include using Fourier transforms as well as topological invariants (Betti numbers) to distinguish the outcomes of spinodal and nucleation types of instabilities, describing in precise terms the complex processes that lead to the formation of satellite drops, as well as distinguishing the shape of the evolving film front in linearly unstable and metastable regimes. We also discuss direct comparison between simulations and available experimental results for nematic liquid crystal and polymer films.
sort&compress
Simulating the dynamics of thin fluid films on substrates presents significant computational challenges, requiring consideration of complex setups that include evolving free surfaces and computational domains. Solving such problems using full numerical simulations of NavierStokes equations coupled with complex boundary conditions is computationally very expensive. While there has been significant progress based on Volume of Fluid, phase field and related methods Jacqmin1999 (); Jacqmin2000 (); JCP2015 (); Renardy2001 (); seric2017 (); Yue2010 (); ZHOU2010498 (), it is desirable to be able to consider evolving thin films in a simpler and cheaper manner.
Motivated by the desire to reduce computational cost, and even to gain some insight based on analytical methods, thin fluid films are often considered using asymptotic approaches, in particular based on the long wave approximation. Such an approach, in its simplest form, leads to a highly nonlinear 4th order partial differential equation (PDE) of diffusion type, which must then be solved numerically. While this is a much simpler task than that presented by the original problem, it still leads to significant computational challenges, particularly in configurations involving contact lines. Contact lines introduce short length scales in the problem, that result from the need to include additional physics resulting from liquidsolid interaction forces. These short length scales need to be resolved for the purpose of producing accurate results, leading to (in a finite difference setting) a large number of grid points. For evolving problems, the resulting outcome is still computationally expensive, typically limiting the simulations to relatively small computational domains and short times. Particularly when film instabilities are considered, simulating small computational domains is a significant restriction, since it is not obvious to which degree computational domain boundaries influence the results.
Numerical approaches that have been applied to the 4th order nonlinear partial (PDE) of diffusion type are numerous and we will not attempt to provide a comprehensive review. Only some examples are mentioned, with focus on the three dimensional setting that is of interest here. There is a number of finite difference based methods becker_nat03 (); dk_jcp02 (); fetzer_05 (); lin_pof12 (); sharma_epje03 (), pseudospectral approaches TBBB2003epje (), and finite element type of methods EWGT2016prf (). Within finite difference based computations, significant progress has been achieved using the ADI (Alternate Direction Implicit) approach, which allows for efficient and accurate simulations, with good stability properties. Such an approach was discussed within the context of thin films Witelski2003 () and later applied to a number of different problems, e.g. kd_pre09 (); lin_pof12 (); mk_jfm09 (). The main computational expense in this approach consists of solving a large number of linear algebra problems, involving in particular large sparse (pentadiagonal) matrices.
In this paper, we present a fast numerical method using a Graphics Processing Unit (GPU)and the Compute Unified Device Architecture (CUDA)Application Programming Interface (API). Using this approach, the most time consuming parts of the simulations are carried out on a GPU, producing as an outcome simulations that are more than an order of magnitude faster than a similarly coded serial Central Processing Unit (CPU)method.
The newly developed computational methods allow for consideration of problems that until now have been out of reach using reasonable computational resources. In this work we will focus on problems involving instabilities of thin films on the nanoscale, but it should be noted that the presented computational approach is quite general, and could be applied to a number of problems within the context of thin films, and also other physical problems that reduce to similar mathematical formulations that lead to the CahnHilliard and KuramotoSivashinsky type of equations.
Returning to the discussion of thin films, we note that such films are often unstable due to destabilizing liquidsolid interaction forces. The nature of these instabilities has been considered extensively in a variety of settings including polymer films becker_nat03 (); Jacobs2008 (); seeman_prl01 (), liquid crystal films Herminghaus1998 (); poulard2005 (); Schlagowski2002 (), as well as liquid metals fowlkes_nano11 (); gonzalez2013 () (see also cm_rmp09 (); oron_rmp97 () for excellent reviews of this topic). The instability development leads to many questions even in relatively simple settings. One question involves differences between the instabilities due to the presence of a free surface, and those due to localized perturbations; the latter could be present due to some imperfection of either the fluid, or the substrate, or both. Free surface perturbations are typically global in character, random, and small in size (compared to the film thickness); the instabilities due to their presence are typically classified as spinodal dewetting, with the name evolving from mathematically similar spinodal decomposition. Localized perturbations, on the other hand, are typically large and lead to socalled nucleation type instability. One question is how to compare the instabilities resulting from these two mechanisms. While elaborate approaches based on Minkowski functionals have proved useful becker_nat03 (), one wonders whether, given a sufficiently large computational domain, one could analyze the results using some more straightforward approach. We will discuss one possible approach in this work.
There are additional aspects of thin film instabilities that require large scale simulations to resolve. One could ask, for example, whether the drop size resulting from film breakup due to spinodal and nucleation type instabilities are comparable. This question is of relevance to applications where instability is harnessed to produce drops of a specific size for use in some application, such as (for example) plasmonic resonance in the context of metal films atwater_natmat10 (). Turning this around, if one is interested in the source of instability, one could analyze the resulting drop sizes, and use this information to infer the source. Furthermore, one could ask whether instability and resulting film breakup lead to the formation of small drops, often called satellite drops Lam2018 (); Seric2014 (). We will see that some of our findings could serve as a basis for answering these questions.
Clearly, a number of questions could be asked. We will focus on particular examples of nematic liquid crystal films and, to a smaller extent, on polymer films; however, the main features of our results are general, and will be of relevance to a number of problems involving thin films. What distinguishes different films, substrates and film thickness regimes, at least for slow evolution where inertial effects are not significant, is essentially the form of (effective) disjoining pressure that, in addition to liquid/solid interaction, may include the effects of anchoring in the context of liquid crystals; interactions of electric or magnetic type in the context of ferrofluids Seric2014 (); or composite substrates in the case of polymer films Jacobs2008 (). The influence of the functional form of such effective disjoining pressure on film stability was discussed in two spatial dimensions (2D) recently Lam2018 (). In that work, we presented numerical evidence for the formation of secondary (satellite) drops for positive values of the effective disjoining pressure, and discussed various regimes of instability development within linearly unstable as well as metastable regimes. The present paper will discuss some of these results in the 3D geometry.
The rest of this paper is organized as follows. Section 1 provides the basic mathematical formulation of the problems to be considered. The numerical methods are discussed in § 2, and the performance of the presented method in § 3. The aspects related to implementation on a GPUare relegated to A. An application specific to nematic liquid crystal (NLC) films is discussed in § 4, and comparison to experimental results for both NLC and polymer films in § 5. We present our conclusions in § 6.
1 Governing Equation
We consider equations describing nonlinear diffusion of the following general form
(1.1) 
where is the evolving quantity of interest; and , are some smooth nonlinear functions of . The square bracketed term may be interpreted as the flux that governs the diffusion process. Governing equations of the form stated in (1.1) appear in a variety of models, such as those leading to CahnHilliard, KuramotoSivashinsky and related equations.
1.1 Description of model problems
Equations of the form (1.1) commonly appear in the context of thin fluid films. In this context, the variable describes the film thickness, . For future reference, we specify here the governing equation for thin films in terms of (hatted) dimensional variables
(1.2) 
where is the fluid film thickness and is the viscosity. To nondimensonalize (1.2), four scaling factors are defined: , a representative film thickness scale, , the typical lengthscale of variations in the plane of the film, ; , the small aspect ratio; and , the timescale of fluid flow. Scaling , and in the obvious way, (1.2) becomes
(1.3) 
where and are some positive dimensional prefactors associated with and , respectively, such that and are nondimensional functions of the dimensionless film height , for example, . The prefactors of and in (1.3) are nondimensional; therefore, for simplicity, we absorb these prefactors into the definitions of the relevant functions.
1.2 Linear Stability Analysis (LSA)
We now present a brief overview of the Linear Stability Analysis (LSA)of a flat film of thickness in 2D (two dimensions), which will be used to validate our numerical code. To derive the dispersion relation, the solution is assumed to be of the form ), where . Substituting this ansatz into (1.1) yields
(1.4) 
A film of thickness is linearly unstable if , and in the unstable flat film thickness regime, the critical wavenumber (below which films are unstable), the most unstable mode, and the maximum growth rate are given by
(1.5) 
respectively.
1.3 Thin Film Models
We consider in this paper three different models i.e., three different sets of and in (1.3). The first model considered is a test case leading to a simple linear partial differential equation, i.e,
(1.6) 
where the sign of is chosen so that a flat film is linearly unstable. For the remaining two models, we assume and are of the form
(1.7) 
where is the inverse Capillary number (ratio of surface tension forces to viscous force) and is the disjoining pressure, typically describing the strength of fluid/solid interaction. In our second model, the disjoining pressure is specified as an ‘effective’ disjoining pressure derived in the context of nematic liquid crystal films Lin2013 (); Lam2018 (). For this model, , with
(1.8) 
where
(1.9) 
The scales are chosen based on the experiments of Herminghaus et al. and Cazabat et al. for thin films of NLC Cazabat2011 (); Herminghaus1998 ()
(1.10) 
The values of the nondimensional parameters are also based on these experiments, as discussed in some detail in our earlier work Lam2018 (), and are set to
(1.11) 
These parameters are used throughout the paper except if specified differently. We leave the discussion of the details of this model to 4, where we focus on extending previous results for 2D Lam2018 () to 3D films; however, for now, we note that the term with prefactor in the disjoining pressure defined in (1.8) is the powerlaw form of disjoining pressure consisting of Born repulsion and the van der Waals force. The powerlaw form of the disjoining pressure is commonly used in the literature; see e.g., the review cm_rmp09 () for detailed discussion. The term with a prefactor in (1.8) is due to the elastic response of Nematic Liquid Crystal (NLC), further discussed in 4.
The third model describes polymeric films Jacobs2008 (), where , with
(1.12) 
and the coefficients in (1.7) and (1.12) are given by
(1.13) 
with the scalings
(1.14) 
(here the physical parameters are taken from Seemann et al. Seemann2001b ()). The first term in (1.12) is due to steric effects (nonbonding intermolecular interactions), and the last two terms are van der Waals forces in a thin film of a polymer deposited on a silicon substrate (Si), coated with a silicon oxide (SiOx) layer of thickness .
2 Numerical Method
The numerical approach that we employ is based on the Alternate Direction Implicit method, discussed in the context of thin film flows Witelski2003 () and implemented in recent works, such as lin_pof12 (); Lin2013 (). In the present paper we provide a review of the method, both for completeness, and for the purpose of discussing particular issues involving implementation in a GPU computing environment. More precisely, in 2.1 we discuss the spatial dicretization scheme in terms of fluxes; temporal discretization is described in 2.2; discretization of fluxes in 2.3; and boundary conditions are discussed in 2.4.
2.1 Conservation Law
In terms of a conservation law, the governing equation may be expressed as
(2.1) 
is the flux vector, with two components, i.e., . To simplify the results, we generalize the flux to linear combinations of terms of the form
(2.2) 
where is some smooth function of and is some linear differential operator with two components, i.e., . The results to be shown can be easily extended to the flux in our original governing equation (2.1).
The governing equation for , the average value of on some subdomain , is given by
(2.3) 
where is the area of the subdomain , denotes its boundary, and is the outwardpointing normal to . To subdivide the entire domain, the solution is discretized on an equipartitioned grid; specifically, the grid points in the and directions are defined as
(2.4) 
where is the grid spacing; and are the numbers of points in the and directions, respectively; and and are the initial points in and , respectively. Furthermore, the cell average points are given by
(2.5) 
for and ; and their respective governing equations are
(2.6) 
where
(2.7) 
and similar notation is used for and . To solve (2.6), two secondorder accurate approximations are implemented: 1) the integrals on the righthand side of (2.6) are evaluated using the midpoint rule, e.g.,
(2.8) 
where
(2.9) 
are the center points of a cell; and 2) the cell averaged value is approximated by the cellcentered value, i.e.,
(2.10) 
Substituting (2.8) and (2.10) into (2.6) yields
(2.11) 
2.2 Temporal Discretization
Let us write (2.11) in the following general form
(2.12) 
where is a nonlinear matrix differential operator, representing the fluxes on the righthand side of (2.11); and is the collection of grid point values . The grid points in are ordered lexicographically and may be in rowmajor form, ; or columnmajor form, . Note that vector notation will be used to denote vectors associated with cell centered quantities.
To begin, a central difference discretization in time is applied to (2.12),
(2.13) 
where is the time step, and superscripts denote the value at the current time. Using the trapezoidal rule to evaluate the nonlinear term at the half step, , (2.13) may be expressed as
(2.14) 
where is the identity matrix. Equation (2.14) contains a nonlinear implicit term, i.e., nonlinear dependence on the unknown ; therefore the above equation is solved numerically using a Newton iterative scheme.
2.2.1 Newton Iterative Scheme
To rewrite (2.14) as a rootsolving problem, we define
(2.15) 
and find satisfying this equation using Newton’s iterative scheme. First, we compute the Jacobian of with respect to , yielding
(2.16) 
where is the Jacobian of . The Newton iterative scheme may be expressed as
(2.17) 
where corresponds to the values at the current iterative step, is an intermediate variable, and the subscript in indicates that the nonlinear terms in are evaluated using . The iterative scheme is initialized by setting . To reduce computational complexity, we apply an Alternating Direction Implicit (ADI)type scheme, discussed next.
2.2.2 Alternating Direction Implicit (ADI) Method
The ADImethod splits the implicit differential operator (2.16) in two parts (implicit derivatives for a fixed and vice versa). Specifically, the righthand side of (2.16) may be expressed as
(2.18) 
where and are the pure and derivative terms in , respectively; and represents the remaining mixed derivative terms, i.e. .
Neglecting the implicit mixed derivatives, , we may substitute (2.18) into (2.17) reducing the method to firstorder accuracy; however, as noted and confirmed numerically Witelski2003 (), implementing an appropriate iterative method, the ADIscheme will converge to second order accuracy in time. We will also show second order convergence for our implementation. The Newton iterative scheme in (2.17) becomes
(2.19)  
(2.20)  
(2.21) 
where and are intermediate variables. This pseudoNewton iterative method reduces (by an approximation) the left hand side of (2.18), from a large sparse matrix of size , to two block diagonal matrices (right hand side of (2.18)). In the implicit step, using columnmajor ordering, we form a block diagonal matrix, , with blocks of size . Each block can be inverted independently of others, thus reducing the computational complexity. Furthermore, each block is diagonal, where the width of the diagonal band, , depends on the stencil used to evaluate spatial derivatives. For our implementation, (pentadiagonal system). The diagonal matrices can be inverted with linear complexity, further reducing computational cost, for example by using an extension of the Thomas algorithm for an diagonal matrix. Similarly, in the implicit step, using rowmajor ordering, we form a block diagonal matrix, , with blocks of size .
2.2.3 Adaptive Time Stepping
To control the time step, we first specify the convergence conditions. Specifically, the solution at the next time step is accepted (i.e. ) if
(2.22) 
where is the error tolerance. In addition, if the error tolerance is not satisfied in a specified maximum number of iterative steps, the scheme is assumed to have failed, triggering time step decrease.
Note 1: Our simulations show that setting the maximum number of iterative steps, , to 10 gives the largest effective time steps, i.e., is maximized.
Note 2: Other restrictions besides the convergence of the iterative scheme may be placed on accepting the solution at the next iteration or next time step, e.g. sufficiently small truncation error, thus for generality, we will refer to the collection of such conditions as the solution criteria.
If the solution criteria are not met, then the timestep is reduced and the pseudoNewton iterative scheme is performed with the new reduced time step. Furthermore, if the new reduced time step is below some minimum, the numerical scheme halts and ends in a failed state. To improve the efficiency of the numerical scheme, the time step is increased if a minimum number of successful time steps have been consecutively computed. To remove possible numerical errors, the time step is bounded from above. Figure 1 shows a flow chart of the adaptive time stepping procedure, coupled with the pseudoNewton iterative method.
2.3 Flux Discretization
To summarize the results so far, a secondorder accurate scheme has been developed in terms of a nonlinear spatial differential operator , representing the fluxes across cell boundaries. In this section, to complete the scheme, we specify the form of . Recalling (2.11), we may split into two parts: the fluxes across the cell boundaries , and the fluxes across the cell boundaries . The values of the components of are specified in terms of fluxes, i.e.,
(2.23) 
where is either or , and and are nonlinear matrix differential operators for the fluxes at the two boundaries. Similarly to 2.1, the fluxes are generalized to a linear combination of functions of the form
(2.24) 
where is a function of , and is a linear differential operator. We may express as
(2.25) 
To derive an expression for and in (2.19) and (2.20), we recall that mixed derivative terms have been ignored, and therefore,
(2.26) 
where denotes the Jacobian, and the nonlinear matrix differential operator, , only contains derivatives with respect to its subscript.
To complete the derivation, expressions for , , , and are required. For brevity, we specify these expressions by the following descriptions:

at the cellcentered point ,

is evaluated at ,

is evaluated at ,

is evaluated at , and

is evaluated at ;


the linear matrix differential operator is computed using cellcentered values;

at a cell boundary is evaluated using interpolation, e.g.,
(2.27) 
the total combined stencil of all terms is shown in Figure 2.
2.4 Boundary Conditions
To complete the description of the numerical methods, we now discuss the boundary conditions. Typically, boundary conditions of the form
(2.28) 
are used, and may be interpreted as symmetry conditions. Furthermore, these boundary conditions correspond to zero flux at the boundary (recall (2.1)).
To implement the boundary conditions, the stencil shown in Figure 2 includes ghost points at the boundaries. While ghost points may be used for explicit terms ( and ), for the implicit term, , the finite difference stencil must be modified. However, since implicit terms are derivatives of a single variable, the stencil modification is relatively simple.
Using the boundary conditions to define the ghost points, the lefthand sides of the numerical scheme (2.19) and (2.20) are computable everywhere except at the corners of the domain, e.g., at the cellcenter point . There are two choices to compute the ghost points in the corner ( for ): First, the boundary conditions are applied, computing the solution at the ghost points for and , then the boundary conditions are applied at the new ghost points, computing the solution at the corner ghost points. Alternatively, the order of operations may be reversed. It is relatively easy to show that both procedures lead to identical results.
3 Numerical Performance
We now switch focus to the numerical performance and accuracy of our implementation. The details related to implementation in the GPU computing environment are discussed in A. Here, we discuss first convergence, followed by confirmation of conservative properties, and the comparison with LSA. Then, we discuss the performance by comparing the GPUand serial CPUcodes.
For the purpose of the numerical tests discussed in this section, we choose an initial condition of the form
(3.1) 
where and is the wave number. The initial film thickness, , is fixed at for the linear model (for simplicity), for the NLC model, and for the polymer model. For the last two models, the values are motivated by the experiments Jacobs2008 (); Vandenbrouck1999 ().
3.1 Validation
To test the convergence properties, in 3.1.1 and 3.1.2, the adaptive time stepping is removed and we set the maximum number of Newton iterative steps to 5000 (allowing for a more robust selection of spatial step size). In addition, we choose and in (3.1) where is given by (1.5). The spatial step size, , where is the most unstable wavelength, is the number of grid points, and simulations are carried out on a square computational domain, ; therefore, choosing (discussed in the next sections) determines . Note that the choice is made so as to maximize the range of grid sizes that fit into the GPU’s memory and maintain a sufficiently large spatial step size to maximize the (common) stable time step. In 3.1.3, simulation results are compared with the predictions of LSA, where adaptive time stepping is utilized, the maximum number of Newton iterative steps is reverted back to 10, we set , and we vary according to the dispersion curve from LSA.
3.1.1 Convergence
To study the convergence properties, we first define , the initial spatial grid size, i.e., , and , the initial time step size. In addition, since obtaining an analytical solution for the nonlinear models is not possible, we compute the error relative to the numerical solution obtained for the most refined partition.
For the purpose of checking temporal convergence, we fix for all simulations in this section, and set , i.e., we scale the time step with the growth rate of the most unstable mode. The initial simulation is carried out for one time step, , and for each subsequent simulation, the time step is halved, i.e., , where is the refinement level. Note that at each refinement level the total number of time steps doubles. Figure 3 shows the norm of the error for all three considered problems for two implementations: i) the Newton iterative convergence error tolerance, in (2.22), is set close to machine precision (solid curves), ; and ii) the error tolerance is set to scale with the temporal error (dashed curves), . Note that we limit since if the relative error in the Newton iterative scheme is smaller than machine error, the iterative method will always fail. We observe that in the first case, secondorder temporal accuracy is achieved (compare to the dotted black line). Note that for the linear model, the two curves lie on top of each other, as expected. Furthermore, the results show that it is sufficient to set the Newton error tolerance to be proportional to the temporal accuracy to maintain secondorder accuracy, and even higher order convergence is achieved for the nonlinear models, although the error is larger.
To confirm the spatial convergence, we fix for all simulations and set the coarsest grid size . Recalling 2.1, the numerical solution is evaluated at the cellcenters of the spatial grid; therefore, if the spatial step size is halved, the cellcentered points on the refined grid are not contained in the coarse grid. To avoid errors associated with interpolating the numerical solutions on different grid sizes to a common grid size, we instead triple the grid size (e.g., , where is the level of refinement) so that cellcentered points on the coarser grid are contained in the refined grid. Figure 3 confirms secondorder spatial accuracy. Furthermore, we observe that there is little difference between setting the Newton error tolerance to or close to machine precision, .
To summarize, the results show that the implemented method is secondorder accurate in time and space, and furthermore, that it is sufficient to set the error tolerance to scale with the order of the spatial and temporal accuracy. Such implementation significantly reduces computational time.
3.1.2 Conservation of Mass
To verify conservation of mass, we carry out the simulations until the final time, , and collect data in intervals of . Furthermore, motivated by the results of 3.1.1, we fix for spatial convergence simulations, and for temporal convergence simulations. To compute the average mass of the solution, recall that the cell average is given by the cellcentered value (with second order accuracy), therefore, the average mass on the entire domain is given by
(3.2) 
Figure 4 shows the change (relative to the initial condition) in total mass for the linear model (left column), NLCmodel (central column) and polymer model (right column), for fixed time step and varied spatial step (top row), and fixed spatial step and varied time step (bottom row). The results obtained when the spatial step size is varied (top row) show that there is no discernible trend in the mass error (numerical noise), with the exception of the smallest step size for the NLCmodel and the polymer model, indicated by the arrows in panels \subreffig:com_nlc_space and \subreffig:com_polymer_space, respectively. It is interesting to note that the conservation law formulation of the governing equation in terms of fluxes (recall equation (2.11) in 2.1) indicates that conservation of mass should be second order accurate in space. However, while the approximations of the fluxes are secondorder accurate, cancellation of these errors across a cell boundary leads to higher order accuracy.
When time step is varied, with the spatial step fixed, the bottom row of Figure 4 shows that conservation of mass is at least second order accurate. For the linear model, panel \subreffig:com_linear_time, the error is close to machine precision, so no trend is observed.
3.1.3 Comparison to LSA
Here, we validate our model by showing agreement with LSA. This is done by comparing the growth rates extracted from simulations to the dispersion relation (1.4). For each model, simulations are carried out for several values for perturbations in either the or directions. In addition, the linear domain is fixed to , i.e. in (3.1) with 32 points (). The final solution time is fixed at , where is defined in (1.4). Figure 5 confirms that the numerical results agree with the LSA predictions, validating the code.
3.2 Performance Comparison
Having validated the accuracy, convergence, and mass conservation of our numerical method, here we discuss the computational performance of our GPU implementation (details are given in A) in comparison to our serial CPU code. We note that CPU computations were performed on an Intel^{©} Core™ i76700K and GPU computations were performed on a Nvidia Geforce^{©} GTX Titan X Maxwell. For brevity, the performance results are presented only for the NLCmodel. For simplicity we compare two major components: inverting the pentadiagonal matrices in (A.1) and (A.2); and computing the terms in (A.4). These particular components were chosen since they are responsible for the dominant part of the computational cost.
We begin by comparing the GPU pentadiagonal solver to the inhouse serial CPU pentadiagonal solver. The serial CPU implementation is based on a method found in Numerical Recipes in FORTRAN William1992 (). Figure 6 shows that for the smallest domain size, the GPU and CPU codes show comparable performance; however, for larger domains, the speedup with the GPU reaches factor of 40. The saturation of the speed up occurs when the total number of independent linear systems is greater than the total core count (3072 for the GPU used in the reported computations).
Next, we compare the second major component of the solvers, computing the entries of the matrices in (2.19) and (2.20). Since no equivalent inhouse serial CPU implementation exists for computing these terms, for simplicity we only implement an equivalent version of the kernel used to compute the matrices in (2.19). We assume the time required for computing the matrix in (2.20) scales (with respect to the CPU) similarly to the computation time required to evaluate the matrices in (2.19). Figure 7 shows that on a large enough domain, the GPU implementation is at least 25 times faster than the serial CPU code. We note that in the GPU implementation, for the smallest two domain sizes in Figure 7, the domain is sufficiently large and all cores in the GPU should be utilized, therefore, the reduced performance (compared to the 4 largest domains) may be due to bottlenecks in the memory.
We have therefore showed that on large enough domain sizes our GPU implementation can perform up to times faster than the CPU code. While we have not compared the GPU implementation to an equivalent serial CPU implementation in full, we have shown significant performance increase in major components of the numerical scheme, thus it is reasonable to assume similar performance gains would be observed in a full comparison.
4 Nematic Liquid Crystals
In this section, we focus on the thin film model derived for nematic liquid crystals, i.e., the governing equation given by (1.1) and (1.8). The details of the model itself are discussed in detail in Lam2018 (), where a weak anchoring model was presented. In Lam2018 () it was shown that the nematic contribution to the thin film model results in an effective disjoining pressure, given by (1.8), and plotted in Figure 8. Similar functional forms of the disjoining pressure, characterized by a local maximum for nonzero film thicknesses, are found in other contexts, such as polymer films Jacobs2008 (), or even simpler setups, such as water on quartz homsy06 ().
In the present paper, we use largescale simulations to discuss some of the main features of the results from Lam2018 () in the three dimensional context. These include formation of satellite (secondary) drops, nucleation versus spinodal type instability, and metastability of films. The fact that we are able to simulate accurately large domains of linear dimensions measured in tens of wavelengths of maximum growth, allows us to obtain results that are only very weakly influenced by the presence of the domain boundaries. This section is organized as follows: in 4.1 we present the model in terms of the gradient dynamics formulation, 4.2 gives a model outline, and the main part, 4.3 presents the computational results.
4.1 Gradient Dynamics Formulation
It is convenient to express the governing equation (1.1) in terms of the gradient dynamics formulation; in particular, it is advantageous to express analytical results in terms of a disjoining pressure, . The benefit of this formulation is that previous analytical results for 2D films Lam2018 () show that the form of the disjoining pressure primarily determines the transition between different stability types of a thin film as a function of its thickness.
To motivate this assertion, we perform LSAon the gradient dynamics formulation of the governing equation, which is given by
(4.1) 
is the mobility function, is total interfacial energy (Gibbs energy), is the surface tension, and is disjoining pressure. Relating (4.1) to the nondimensional governing equation (1.3),
(4.2) 
where, similarly to before, and are dimensional constant prefactors derived from expressing and as dimensionless functions of the dimensionless variable , e.g., . The nondimensional linear stability analysis results (1.5) simplify to
(4.3) 
is a dimensionless constant and may be interpreted as the ratio of disjoining pressure to surface tension forces. We note that the disjoining pressure determines the transition between linear stability () and instability (). The instability wavelength is independent of the mobility function, which only affects the growth rate of instabilities.
4.2 Model Description
To motivate the form of the disjoining pressure shown in Figure 8 and defined by (1.8), here we provide a brief outline of the main features of NLC, and we refer the reader to our previous work Lam2018 () for a more detailed derivation. Typically, NLCmolecules are rodlike structures with an (electrical) dipole moment, which gives rise to a state of matter intermediate between a crystal and a liquid. Specifically, similarly to a Newtonian fluid, there is no positional ordering to the NLCmolecules; however, similarly to a crystal, molecules have shortrange orientational order due to interactions between dipoles (elastic response). To model NLC, in addition to modeling the velocity field of molecules (as with most fluids), shortrange ordering is often modeled with a director field. The director field is a unit vector, aligned with the long axis of the liquid crystal, see Figure 9, and it is often convenient to consider the polar angle, , and azimuthal angle, , of the director field orientation as functions of Cartesian space variables .
The evolution of NLCmay be modeled using the LeslieEricksen equations, an extension of the NavierStokes equations, with an additional equation modeling conservation of energy. Under the long wave approximation, and assuming that the time scale of fluid flow is much faster than that of the elastic reorientation, the conservation of energy equation decouples from the remaining equations and only depends on the director field. Using energy minimization Cummings2004 (); Lam2018 (); Lin2013 (), the polar and azimuthal angles of the director field are determined to be of the form
(4.4) 
where , , and are constant with respect to and are chosen to satisfy appropriate socalled anchoring boundary conditions at both the substrate and the free surface, thus the director field is a function of the instantaneous fluid height.
At the interface between the NLCfilm and another material, liquid crystal molecules molecules satisfy certain anchoring conditions. Typically, at the free surface (air/NLCinterface), molecules are normal to the free surface (a condition known as homeotropic anchoring; within the longwave approximation, ), while at the NLC/substrate interface, planar anchoring is appropriate, so , see Figure 9. However, for very thin films, or close to a contact line, this configuration induces a large energy penalty in the bulk due to rapid spatial variations in the director field; therefore, we implement a novel weak free surface anchoring model. In practice, the substrate anchoring is stronger than the free surface anchoring, thus we relax the free surface anchoring to that of the substrate for very thin films (weak anchoring model) to alleviate the large energy penalty in the bulk (compare Figures 9 and 9). Specifically,
(4.5) 
where and is defined in (1.9). The azimuthal anchoring in (4.4) is independent of ; therefore, assuming the substrate anchoring dominates, the azimuthal anchoring is determined by the substrate. The governing equation (1.1) is a simplification of the long wave approximation of the LeslieEricksen equations that ignores substrate anchoring. In the full (long wave) model, the mobility function may be expressed as
(4.6) 
where is the identity matrix, and and are anisotropic viscosities. To simplify the model, we fix and , which removes the dependence on . Note that by definition, , therefore the mobility function is positive definite and does not change stability properties. We leave the investigation of the effects of substrate anchoring for future work.
4.3 Simulations
For the remainder of this section, we focus on considering some aspects of the previous analysis carried out for 2D films Lam2018 () in the 3D geometry. We note that there has been a significant amount of work discussing various aspects of instability development, see, e.g. becker_nat03 (); neto_jphys03 (); pototsky_jcp05 (); sharma_epje03 (); Thie2003epje (); thiele_prl01 () and the relevant part of the review cm_rmp09 (). The simulation results that we present here are distinguished by significantly larger computational domains that allow for more elaborate analysis of the results. Direct comparison with some of the available experimental results is presented in 5.
This section is divided into three parts; in the first two parts we focus on the linearly unstable regime, investigating coarsening and satellite drop formation in 4.3.1; and the nucleation dominated regime in 4.3.2. In 4.3.3, the metastable regime is examined. Animations of the simulation results are available, and are provided as Supplementary Material SM (). Before presenting these results, we discuss the domain size, the spatial step size of simulations, and also the tools used to quantify simulation results: (i) the Fourier transform of the film profile, and (ii) the Betti numbers, topological measures of the film profile.
In 3.1 the spatial step size was determined by , , and , in this section, we instead fix , , and and then set the grid size , with ranging between and , giving the total number of grid points of the order of . For all simulations, we fix the spatial step size to except for the film thicknesses , where , respectively. The spatial step size was changed for these film thicknesses so as to maintain a sufficient number of points per period of the most unstable wavelength, . Note that for thicker films, while a larger may be chosen while maintaining a sufficient number of points per period (say 50), to adequately resolve the contact line, is chosen to be close to the precursor thickness, . In addition, simulations in the linearly unstable regime are scaled in two ways: (i) the linear domain size is set to where is an integer, i.e., domains are scaled by periods of the most unstable wavelength from LSA; and (ii) when comparing results for different film thicknesses as a function of time, it is convenient to define a new timescale, , i.e., we scale time by the most unstable growth rate obtained using LSA. We fix for all (linearly unstable) thicknesses.
One aspect of the analysis of the results focuses on the extraction of the dominant length scales. This goal is achieved by using a 2D Fourier transform. Assuming that the instability pattern is isotropic, the magnitude of the 2D Fourier transform may be mapped to a onedimensional (radial) function of the magnitude of the wave vector , where and are the and components of the wave vector. In addition, several smoothing techniques are required to reliably extract local maxima. The details of the procedure are not trivial if one wishes to extract accurate results, and are described in B.
Next, we discuss the use of Betti numbers to further quantify the simulation results. The Betti numbers, , are topological invariants that characterize the connectivity of dimensional objects. For our purposes, since the film thickness is a function of two spatial variables. To illustrate the meaning of and , we use landscape analogy: consider a landscape sliced by a horizontal plane at the height . Then, counts the number of mountain peaks (components) above , and counts the number of valleys (loops) surrounding the peaks. In any given landscape, the values of and change with the threshold level . In the present context, we will think of the film thickness, , as the variable describing the landscape; by computing Betti numbers for all relevant thresholds (ranging from precursor film thickness to the values larger than the initial film thickness), we will be in the position to describe in precise terms the topology of the evolving films, and to analyze the appearance of topological objects across scales. To minimize the influence of boundaries, it is important to carry out simulations in large domains such that boundaryrelated effects are minor; the computational domains that we choose are sufficiently large for the purpose. The calculations of Betti numbers are carried out using the Computational Homology Project (CHomP) software package.^{1}^{1}1http://chomp.rutgers.edu/
To examine the Betti numbers for various film thicknesses, we find it convenient to normalize them by the area corresponding to the most unstable wavelength, . Therefore, we define normalized Betti numbers, and by where is the linear domain size, see (4.7). The normalized Betti numbers give a measure of the number of features (components, loops) per , and will be shown as functions of the scaled threshold value , where is the average film thickness, and the rescaled time, .
4.3.1 Evolution of films exposed to infinitesimal perturbations of global character
Here we analyze the evolution of randomly perturbed films of thicknesses that are in the linearly unstable regime. The initial condition is set to a flat film that has been randomly perturbed, and to excite all modes in the 2D Fourier transform independently (combinations of and ), pseudoPerlin noise is used. Specifically, the initial condition is of the form
(4.7) 
where , is given in (1.5), and is the inverse Fourier transform of
(4.8) 
Here, , is some positive constant, and is a random variable, uniformly distributed on , for each . In addition, is scaled so that and we fix , where is the number of discretization points in the and directions, so that the spatial scale of the noise is proportional to .
Figures 10 and 11 show the results for the thinnest film that we consider, , and for times and , respectively. The central parts of the figures show the free surface thickness. The corresponding radial Fourier Transform is shown at the top right. We see that the most unstable wavelength dominates the morphology of the film at . On the bottom right, the local maxima of the radial Fourier transform are plotted as functions of time, showing consistently that for , dominates the wave pattern. At later times drops form and coarsening is observed in the radial Fourier transform. Similar coarsening effects are also observed in Volume of Fluid based simulations of nanoscale metal films Mahady2016 (). Additional simulations (figures not shown for brevity) uncover consistent results for all linearly unstable film thicknesses considered, in the range .
Figures 10 and 11 also plot the normalized Betti numbers (top left) and (bottom left) as contour plots with independent variables being (vertical axis) and the scaled time, (horizontal axis). (Note that the only difference between the Betti plots shown in these two figures is the position of vertical (magenta) lines, that show the current value of depicted in the central part of the figures.) The basic interpretation of these Betti numbers figures is straightforward for both early and late times. For the early times, , the perturbations are small and are not captured by the level of thresholding implemented; therefore both and are small (the exact values depend on the treatment of the boundaries, but this is not of particular interest here). For long times, , the drops form and there are no separate valleys/loops (they are all connected by the precursor); therefore is small. For in the range (the smallest value considered) to , for long times we see essentially constant values of , suggesting that the most of the drops are of approximately similar size; small values of for show that there are essentially no drops exceeding height