# Element orbitals for Kohn-Sham density functional theory

###### Abstract

We present a method to discretize the Kohn-Sham Hamiltonian matrix in the pseudopotential framework by a small set of basis functions automatically contracted from a uniform basis set such as planewaves. Each basis function is localized around an element, which is a small part of the global domain containing multiple atoms. We demonstrate that the resulting basis set achieves meV accuracy for 3D densely packed systems with a small number of basis functions per atom. The procedure is applicable to insulating and metallic systems.

###### pacs:

71.15.Ap, 71.15.Nc## I Introduction

Kohn-Sham density functional theory (KSDFT) Kohn and Sham (1965) is the most widely used electronic structure theory for condensed matter systems. When solving the Kohn-Sham equations, the choice of basis functions usually poses a dilemma for practitioners. The accurate and systematically improvable basis functions that are uniform in space, such as plane waves or finite elements, typically result in a large number of degrees of freedom () per atom in the framework of norm conserving pseudopotential Troullier and Martins (1991) especially for transition metal elements. The number of basis functions per atom can be reduced to the order of hundreds using ultrasoft pseudopotential Vanderbilt (1990), or augmentation techniques in the core-region such as the linearized augmented plane-wave (LAPW) method Andersen (1975) and the projector augmented wave (PAW) method Blöchl (1994). The relatively large number of basis functions used leads to a large prefactor in front of the already expensive cubic scaling for solving KSDFT.

Contracted basis functions, such as Gaussian type orbitals, atomic orbitals or muffin-tin orbitals, can represent the Kohn-Sham orbitals with a small number of degrees of freedom per atom (). These contracted basis functions contain a number of parameters to be determined. The flexibility for choosing different forms of parameters has generated a vast amount of literature (see e.g. Refs. Ozaki, 2003; Blum et al., 2009; Junquera et al., 2001; Chen et al., 2010; Andersen and Saha-Dasgupta, 2000; Qian et al., 2008) in the past few decades, which has been reviewed recently in Ref. Bowler and Miyazaki, 2012. The parameters in the contracted basis functions are typically constructed by a fitting procedure for a range of reference systems. The necessary inclusion of polarization basis Frisch et al. (1984), diffuse basis Clark et al. (1983), multiple radial functions for each angular momentum (multiple basis Junquera et al. (2001)) are just a few examples when the choice of basis functions becomes difficult and system dependent especially for complex systems.

It is desirable to combine the advantage of uniform basis set in which the accuracy is controlled by no more than a handful of universal parameters for almost all materials, and the advantage of contracted basis functions with a very small number of basis functions per atom. In other words, we would like to generate a small number of contracted basis functions by a unified procedure with high accuracy comparable to that obtained from uniform basis functions. In a recent work Lin et al. (2012a), we have developed a unified method for constructing a set of contracted basis functions from a uniform basis set such as planewaves in the pseudopotential framework. The new basis set, called adaptive local basis (ALB) set, is constructed by solving the Kohn-Sham problem restricted to a small part of the domain called element. Each ALB is discontinuous from the perspective of the global domain, and the continuous Kohn-Sham orbitals are approximated by the discontinuous ALBs under a discontinuous Galerkin framework Arnold (1982). It was demonstrated that the ALBs are able to achieve high accuracy (in the order of meV) using disordered Na and Si as examples. However, the number of basis functions per atom increases with respect to dimensionality. For example, basis functions per atom is needed to reach the accuracy of meV/atom for 3D bulk Na system.

In this paper, we propose a new basis set that is constructed from linear combination of adaptive local basis functions. Each new basis function, dubbed element orbital (EO), has localized nature around its associated element of the domain. The number of EOs used is significantly reduced compared to the number of ALBs for 3D bulk systems. We demonstrate that EOs per atom are sufficient to achieve meV per atom accuracy for 3D bulk Na system with disorderedness. We also apply EOs to study Na, Si and graphene, with varying system sizes, lattice constants or types of defects. The new method consistently achieves meV accuracy for calculating the total energy when compared to standard electronic structure software such as ABINIT Gonze et al. (2009). Since the EOs are contracted from a uniform basis set such as planewave basis set, the shape of the EOs has more flexibility to reflect the environmental effect than contracted basis sets which are centered around atoms. Numerical result indicates that the shape of EOs can resemble both atomic orbitals of different angular momentum and chemical bonds centered in the interstitial region, depending on their chemical environment.

We remark that the construction of the EOs is closely related to several existing techniques for reducing the number of basis functions per atom, starting from a large primitive basis set consisting of Gaussian orbitals or atomic orbitals Ozaki (2003); Blum et al. (2009); Rayson and Briddon (2009); Rayson (2010). However, the EOs are contracted from a fine uniform basis set such as planewaves, and a number of difficulties arise which makes the previous techniques difficult to be directly applied. For instance, the filtration technique in Ref. Rayson and Briddon, 2009; Rayson, 2010 constructs a near-minimal basis set from a large number of Gaussian type orbitals by applying a filtration matrix to a set of trial orbitals, taken from one or a few Gaussian-type orbitals. When the Gaussian-type orbitals are replaced by a fine uniform basis set such as planewaves, finding a good set of trial orbitals itself becomes a difficult task, and the construction of trial orbitals can inevitably introduce a set of undetermined parameters, which is not desirable in the current framework.

This paper is organized as follows: Section II introduces the adaptive local absis functions in the discontinuous Galerkin framework for solving Kohn-Sham density functional theory in the pseudopotential framework. The construction of the element orbitals is introduced in Section III. Section IV discusses briefly the implementation procedure of element orbitals. The performance of element orbitals is reported in Section V, followed by the discussion and conclusion in Section VI.

## Ii Adaptive local basis functions

Consider a quantum system with electrons under external potential by in a rectangular domain with periodic boundary condition. To simplify the equations, we ignore the electron spin for now. In Kohn-Sham density functional theory at a finite Kohn and Sham (1965); Mermin (1965), the Helmholtz free energy is given by

(1) |

Correspondingly and are the solutions to the minimization problem

(2) |

are the occupation numbers which add up to the total number of electrons . Here we use exchange-correlation functional under local density approximation (LDA) Ceperley and Alder (1980); Perdew and Zunger (1981) and adopt norm conserving pseudopotential Troullier and Martins (1991), with the projection vector of the nonlocal pseudopotential in the Kleinman-Bylander form Kleinman and Bylander (1982) denoted by , and is a sign. The number of eigenstates calculated in practice is chosen to be slightly larger than the number of electrons in order to compensate for the finite temperature effect, following the criterion that the occupation number is sufficiently small (less than . The electron density is given by

The Kohn-Sham equation, or the Euler-Lagrange equation associated with (2) is Kohn and Sham (1965); Mermin (1965)

(3) |

where the effective one-body potential is

and the occupation numbers follow the Fermi-Dirac distribution

Here the chemical potential is chosen so that . In each SCF iteration of (3), we freeze and solve for the lowest eigenfunctions . This linear eigenvalue problem is the focus of the following discussion.

The discontinuous Galerkin (DG) framework Arnold (1982) provides flexibility in choosing appropriate basis functions to discretize the Kohn-Sham Hamiltonian . In the DG framework, a smooth function delocalized across the global domain can be systematically approximated by a set of discontinuous functions that are localized in the real space. Let be a collection of elements, i.e. disjoint rectangular partitions of , and be the collection of surfaces that correspond to each element in . We associate with each a set of orthogonal basis functions supported in , with the total number of basis functions given by

Under such a basis set, the Hamiltonian is discretized into an matrix with entries given by

(4) |

where and are inner products in the bulk and on the surface respectively, and is a fixed parameter for penalizing cross-element discontinuity. The notations and stand for the average and jump operators across surfaces Arnold (1982). Comparing (3) and (4), the new terms involving the average and jump operators can be derived from integration by parts of the Laplacian operator, and provide consistency and stability of the DG method Arnold et al. (2002).

In the work of adaptive local basis set Lin et al. (2012a), the functions in each element are determined as follows. Let be the dimension of the system. For each (one black box in Fig. 1), we define an associated extended element , which includes both and its neighboring elements. Define to be the restriction of to with periodic boundary condition and with potential given by the restriction of to . is then discretized and diagonalized with uniform basis functions such as planewaves. We denote the corresponding eigenvalues and eigenfunctions by and , respectively, starting from the lowest eigenvalue. One then restricts the first functions of to , where is set to be proportional to the number of electrons inside the extended element (see the numerical examples for specific choice of ). In addition, we define for each

(5) |

i.e. the largest selected eigenvalue in which shall be used later. Applying the Gram-Schmidt procedure to then gives rise to a set of orthonormal functions

(6) |

for each . The union of such functions over all elements gives the set of adaptive local basis functions (ALBs).

For a given system, the partition of is kept to be the same even with changing atomic configurations as in the case of structure optimization and molecular dynamics. Dangling bonds may form when atoms are present on the surface of the extended elements, but we emphasize that these dangling bonds are not needed to be passivated by introducing auxiliary atoms near the surface of the extended elements Zhao et al. (2008). This is because the potential is not obtained self-consistently within , but instead from the restriction of the screened potential in the global domain to in each SCF iteration, which mutes the catastrophic damage of the dangling bonds. The oscillation in the basis functions caused by the discontinuity of the potential at the surface of the (called Gibbs phenomenon) still exists, but it damps exponentially away from the surface of and has controlled effect in . Using disordered Na and Si as examples, we demonstrated that ALB can achieve meV accuracy per atom using basis functions per atom Lin et al. (2012a).

## Iii Element orbitals

The high accuracy of ALBs indicates that the span of approximately contains the span of the Kohn-Sham orbitals . However, we found that the number of basis functions per atom may vary significantly with respect to the dimensionality of the system, which has not been seen reported in the literature using traditional contracted basis set to the extent of our knowledge.

The dimension dependence of ALBs can be intuitively understood as follows, motivated from the success of the contracted basis set such as atomic orbitals. Consider the case where an atom is positioned at the center of element and assume for simplicity that each of its atomic orbital overlaps only with the neighboring elements (i.e., those inside the extended element ). In order to include one such atomic orbital denoted by in the span of , each neighboring element in should allocate one of its ALBs for representing the restriction of in . This implies that , the total number of ALBs, should roughly be equal to , which becomes increasingly redundant with respect to the dimension . In fact, this is close to what has been observed in the numerical experiments Lin et al. (2012a).

In order to avoid this redundancy and motivated by the construction of atomic orbitals, we propose to build a new basis set by piecing the ALBs in neighboring elements in to construct functions that are qualitatively close to the atomic orbitals. To distinguish them from the pre-fitted atomic orbitals, we name these functions element orbitals (EOs). In order to achieve this, one is faced mainly with three issues. First, the ALBs are always discontinuous across the element boundaries, while qualitatively the EOs should be a continuous function since the atomic orbitals are continuous. Second, when one pieces back the ALBs to obtain the EOs, it is essential that the resulting functions have low-energy. Finally, one needs to make sure that the EOs of should be localized at in order to avoid degeneracy.

A two-step procedure is proposed to address these three issues. In the first step, we construct, for each element , a set of candidate functions that take care of the first two issues. Then in the second step, we identify the element orbitals by localizing the candidate functions. More specifically, the method proceeds as follows.

Let us fix an element . First, since each ALB is only supported in its associated element and equal to zero outside, we seek for a set of candidate functions for element that are linear combinations of the ALBs of both and its neighbors (Fig. 1). Denoting by the index of all the ALBs, and by the index set of ALBs supported in , we define a local Hamiltonian

i.e., the restriction of to the index set . Following the intuition that the atomic orbitals should only be affected by the local environment of , it is reasonable to assume that the low eigenfunctions of serve as good candidate functions. Computationally, we diagonalize by

(7) |

where the diagonal of contains all the eigenvalues bounded from above by the cut-off energy given by (5) and the columns of contains the corresponding eigenfunctions. The matrix is called the merging matrix for element . We argue that this step addresses the continuity and low-energy issues of the element orbitals since the eigenfunctions (7) are qualitatively smooth due to the cross-element penalty term of the DG formulation. Choosing the eigenfunctions below ensures that the candidate functions have low-energy.

Second, we localize these candidate functions to be centered at using a penalizing weight function defined for . is only nonzero in the extended element outside a certain distance, called the localization radius, from the boundary of (light gray area in Fig. 1). For simplicity we choose in the penalty area and otherwise. More sophisticated weighting function as developed for linear scaling methods García-Cervera et al. (2009) and confining potentials as developed for atomic orbitals Junquera et al. (2001) can be used and optimized for EOs in the future work. A weighting matrix for the adaptive basis functions in the index set is defined in the extended element by

In order to localize the candidate functions, we solve a second eigenvalue problem

where since are orthogonal from (7). The columns of and the diagonal of consist of the first eigenfunctions and eigenvalues, respectively. Here is the number of element orbitals (EOs) of . As will be shown later in the numerical results, a small number of EOs per atom already achieve high accuracy in the total energy calculation. We call the matrix the localization matrix, and the product gives the coefficients of the EOs in in terms of the ALBs indexed by . In order to present these EOs in terms of the whole adaptive basis set, we introduce an selection matrix such that is equal to the identity and all zero otherwise. By defining the coefficient matrix , we can construct the element orbitals associated with by

(8) |

Note that, since these functions are localized in by construction, the index only runs through the elements insides . Finally, the coefficient matrix

gives the whole set of coefficients of the element orbitals based on adaptive local basis functions. Once the element orbitals are identified, we solve an generalized eigenvalue problem

(9) |

where the diagonal of gives the Kohn-Sham eigenvalues and the columns of provide the coefficients of Kohn-Sham orbitals in terms of EOs. From , one can calculate the chemical potential and the occupation number . Finally, by introducing the Gram matrix

we can write as

(10) |

where indexes the element that contains . Notice that one only needs the knowledge of the diagonal blocks of the Gram matrix to construct the electron density. This allows us to use the recently developed pole expansion and selected inversion type fast algorithms Lin et al. (2009a, b, c, 2011a, 2011b, 2012b) to reduce the asymptotic scaling for solving the generalized eigenvalue problem (9) from cubic scaling to at most quadratic scaling for 3D bulk systems.

## Iv Parallel Implementation

Our algorithm is implemented fully in parallel for message-passing environment, based on the implementation details presented in Ref. Lin et al., 2012a. Here we summarize the key components of the parallel implementation.

The global domain is discretized with a uniform Cartesian grid with a spacing fine enough to capture the local oscillations of the Kohn-Sham orbitals and the electron density. Rather than using the dual grid approach with one set of grid for representing the Kohn-Sham wavefunctions, and another set of denser grid for representing the electron density, we only use one set of Cartesian grid for both the Kohn-Sham wavefunctions and the electron density for simplicity of the implementation. The grid inside an element is a three-dimensional Cartesian Legendre-Gauss-Lobatto (LGL) grid in order to accurately carry out the operations of the basis functions such as numerical integration. The ALBs are first represented in a planewave basis set in each extended element solved by LOBPCG algorithm Knyazev (2001) with a preconditioner Teter et al. (1989), and are interpolated to each element and orthogonalized. The eigenvalue problems involved in constructing the EOs are performed by LAPACK subroutine dsyevd.

To simplify the discussion of the parallel implementation, we assume that the number of processors is equal to the number of elements. It is then convenient to index the processors with the same index used for the elements. In the more general setting where the number of elements is larger than the number of processors, each processor takes a couple of elements and the following discussion will apply with only minor modification. Each processor locally generates and stores the ALBs for and the coefficients for the EOs for and . The EOs are not explicitly formed in the real space. We further partition the non-local pseudopotentials by assigning to the processor if and only if the atom associated to is located in the element .

Since the matrices and are sparse, the Hamiltonian matrix and the mass matrix in (9) are also sparse matrices. However, these matrices are treated as dense matrices in our implementation for simplicity. The parallel matrix-matrix multiplication for constructing and are performed using PBLAS subroutine pdgemm, and the generalized eigenvalue problem (9) is solved by converting it to a standard eigenvalue problem using ScaLAPACK Blackford et al. (1997) subroutine pdpotrf and pdsygst, and the standard eigenvalue problem is solved by ScaLAPACK subroutine pdsyevd.

In our implementation, the matrices and are constructed locally according to the element indices. However, the ScaLAPACK routines that operate on and require them to be stored in the two dimensional block cyclic pattern. In order to support these two types of data storage, we have implemented a rather general communication framework that only requires the programmer to specify the desired non-local data. This framework then automatically fetches the data from the processors that store them locally. The actual communication is mostly done using asynchronous communication routines MPI_Isend and MPI_Irecv.

## V Numerical results

The new method is implemented with Hartwigsen-Goedecker-Hutter (HGH) pseudopotential Hartwigsen et al. (1998), with the local and nonlocal pseudopotential implemented fully in the real space Pask and Sterne (2005). Finite temperature formulation of the Kohn-Sham density functional theory Mermin (1965) is used, and the temperature is set to be K only for the purpose of accelerating the convergence of SCF iteration. Since finite temperature is used, the accuracy is quantified by the error of the total free energy Alavi et al. (1994) per atom. HGH pseudopotential has analytic expression, which allows us to minimize the effect of numerical interpolation and perform accurate comparison with existing electronic structure code. We compare our result with ABINIT Gonze et al. (2009) which also supports HGH pseudopotential. The ALBs and EOs start from random initial guess, and are refined iteratively in the SCF iteration together with the electron density. In all the calculations, Anderson mixing Anderson (1965) with Kerker preconditioner Kerker (1981) are used for the SCF iteration. Gamma point Brillouin sampling is used for simplicity. In Section II and Section III, we count the number of basis functions in terms of the number of ALBs per element and the number of EOs per element. In this section, we count the number of ALBs and EOs per atom instead, in order to be consistent with literature. All computational experiments are performed on the Hopper system at the National Energy Research Scientific Computing (NERSC) center. Each Hopper node consists of two twelve-core AMD “MagnyCours” 2.1-GHz processors and has 32 gigabytes (GB) DDR3 1333-MHz memory. Each core processor has 64 kilobytes (KB) L1 cache and 512KB L2 cache. It also has access to a 6 megabytes (MB) of L3 cache shared among 6 cores.

As mentioned earlier, the ALBs have been shown to achieve effective dimension reduction for quasi-1D systems, but with deteriorating performance as the dimensionality of the system increases Lin et al. (2012a). Using Na as example, it has been shown that while ALBs per atom is enough to reach meV accuracy for quasi-1D systems, ALBs per atom is necessary to reach the same accuracy for 3D bulk systems. Now using a 3D bulk Na system with atoms as example, we illustrate that the number of basis functions per atom can be effectively reduced using EO.

The supercell for Na is simple cubic and the length of the supercell along each dimension is a.u.. A random perturbation with standard deviation a.u. is applied to each atom in the supercell to eliminate the translational invariance of the system. The supercell is partitioned into elements, with the length of each dimension of each element being a.u.. The length of each dimension of each extended element is a.u. which is times larger than that of the element. The penalty parameter in (4) is set to be . The supercell is discretized with a uniform mesh of dimension in the real space. This mesh is used for representing both the electron density and the Kohn-Sham orbitals, which corresponds to a planewave cutoff of Ry in the Fourier space. ABINIT uses a dual grid for representing the Kohn-Sham wavefunctions and the electron density. The planewave cutoff for wavefunctions used in ABINIT is Ry. This corresponds to a planewave cutoff for the electron density at Ry, with a uniform mesh of dimension in the real space. The different numbers of grid points along each dimension come from the automatic grid adjustment in ABINIT. We remark that the grid size is chosen to be larger than the typical setup in electronic calculation for Na to make sure that the error introduced by the grid size is small compared to that introduced by using ALBs and EOs. Inside each element a Legendre-Gauss-Lobatto (LGL) grid of dimension is used for numerical integration in the assembly process of the discretized Hamiltonian matrix . The error of the total free energy per atom only using ALBs is shown in Fig. 2 (a). The error systematically decreases with the increase of the number of ALBs. When the number of ALBs exceeds , the error of the total free energy per atom is less than meV.

Element orbitals (EOs) provide further dimension reduction compared to ALBs. Fig. 2 (b) shows the difference of the free energy per atom calculated from EOs and that from ABINIT. We construct EOs from as many as ALBs per atom, following the criterion (5) for the choice of the candidate functions and using a localization radius of a.u.. Compared to a converged ALB calculation, the error using only EOs per atom is already within meV per atom. When EOs are used, the total free energy calculated is essentially the same as that using ALBs, and the error compared to ABINIT is less than meV per atom. Fig. 2 (b) indicates that the EOs are indeed effective for reducing the number of basis functions per atom for 3D bulk systems.

Compared to ALB, the EO approach introduces an additional parameter which is the localization radius. Fig. 2 (c) shows the error of the total free energy per atom using ALBs per atom, and EOs per atom but with different localization radius. When the localization radius is a.u. which is the length of an element, the error of the total energy per atom is meV. Moderate choice of the localization radius of a.u. ( of the length of an element) yields accuracy around meV per atom. Fig. 2 (c) shows that our method is stable even for a large localization radius a.u. ( of the length of an element), and the error is even smaller and is below meV per atom. We also remark that if the localization radius is further increased, the EOs are no longer localized around the element, but become fully extended in the extended element. This can lead to an unstable scheme with large error. Fig. 2 (c) shows that the accuracy of the EO is not very sensitive to the choice of localization radius.

EOs can resemble atomic orbitals but with local modifications reflecting the environmental effect, despite the fact that they are constructed in the extended elements with rectangular domain. Using the same Na system as example, we show in Fig. 3 the isosurface of the first element orbitals ( to ) belonging to the same extended element, with the red and blue color indicating the positive and negative part of the EOs, respectively. atoms nearest to these EOs within a sphere of radius a.u. are also plotted in Fig. 3 as gold balls. We see that mimics orbital, - mimic -orbitals, and - mimic -orbitals. Both the general shape and the multiplicity of the element orbitals agree well with the physical intuition. We also find that hybridization of the orbitals naturally appears in the EOs, reflecting the effect of the environment. For example, the isosurface of exhibits “holes” around atoms. These holes are not described in the spherical symmetric atomic orbital, but can only be reflected in orbitals of higher angular momentum such as orbitals. Therefore, EOs are natural generalization of atom-centered orbitals, with both the atomic and environmental effect taken into account simultaneously.

EOs are localized in the extended elements. Since each candidate function is not continuous across the boundary of the extended element, EOs are still discontinuous across the boundary of the extended element. Nonetheless, the EOs are “qualitatively continuous” at the boundary of the extended elements. Fig. 4 (a) shows the behavior of for the Na system along one direction, with the zoom-in near the boundary of the extended element shown in Fig. 4 (b). EOs are very close to a continuous function especially for and with lower angular momentum. The value of EOs of higher angular momentum such as at the grid point closest to the boundary of the extended element is within .

EOs can be used for calculating the relative energies of different atomic configurations. Fig. 5 (a) shows the total free energy per atom for a crystal of Na consisting of unit cells with atoms. Each unit cell is body centered cubic with Na atoms. The lattice constant ranges from a.u. to a.u.. The size of each element is equal to that of one unit cell. EOs per atom are constructed from ALBs per atom and are used for calculating the total free energy. The planewave cutoff for Kohn-Sham wavefunctions in ABINIT is Ry. The difference of the total energy per atom is less than meV across all the lattice constants. Similar result can be obtained for Si. The supercell for Si contains unit cells with atoms in total. Each unit cell is diamond cubic with Si atoms. Fig. 5 (b) reports the total free energy per atom for lattice constants from a.u. to a.u.. Each element only covers unit cells. We remark that elements occupying a fraction of the unit cell are allowed, which is important especially when EOs are applied to systems with defects and disorderedness. The planewave cutoff for Kohn-Sham wavefunctions in ABINIT is set to be Ry to achieve the high accuracy as benchmark solution. The localization radius is also a.u.. Starting from ALBs per atom, EOs per atom are computed. The difference of the total free energy per atom is less than meV for all lattice constants.

EOs are also effective for calculating the total energy of systems with defects. For a crystal Na system with atoms and the length of each dimension of the supercell being a.u., the total free energy evaluated using ABINIT is a.u.. Using the same setup as done in the crystal system with EOs per atom, the total free energy evaluated using EO is a.u.. The difference is as small as meV per atom. Since our implementation takes the spin-unpolarized form, we consider a system with two vacancies by removing Na atoms belonging to one unit cell from the supercell. All the parameters are the same as those for the calculation of the crystal system. The total free energy evaluated using ABINIT is a.u., and the total free energy evalauted using EOs per atom is a.u., with the difference being meV per atom. The error for both the crystal and the defect system is less than meV per atom. We also estimate the formation energy of neutral vacancies by

(11) |

with being the free energy for the crystal system with atoms, and being the free energy for the same system but with atoms removed. Atomic relaxation is not taken into account at this stage. Using (11), the formation energy calculated from ABINIT is eV, and that calculated from EO is eV. The difference of the formation energy is eV, and the relative error of the formation energy is .

The calculation of the defect formation energy for Si is as follows. For a crystal Si system with atoms and the length of each dimension of the supercell being a.u., the total free energy evaluated using ABINIT is a.u., and the total free energy evaluated using EOs per atom is a.u.. The difference is as small as meV per atom. A defect system is constructed by removing one Si atom, and all the parameters are the same as those for the crystal calculation. The total free energy evaluated using ABINIT is a.u., and the total free energy evaluated using EOs per atom is a.u., with the difference being meV per atom. The error for both the crystal system and that for the defect system is less than meV per atom. The formation energy calculated from ABINIT is eV, and that calculated from EO is eV. The difference of the formation energy is eV, and the relative error of the formation energy is .

Next we study graphene sheet consisting of C atoms (cyan balls), with C atom replaced by a Si atom (gold ball), as shown in Fig. 6. The length of the supercell is a.u., a.u. and a.u. for directions, respectively. The C and Si atoms are in the plane. The supercell consists of elements, with each element containing atoms, and represented by one black box. The length of each element is therefore a.u., a.u. and a.u. along directions, respectively. The shape of the EOs is shown in Fig. 6 (a) for the first EOs belonging to different elements, and (b) for the second EOs belonging to the same elements, respectively. We find that in the upper element reflects the C-C bond and in the lower element reflects the C-Si bond, respectively. Similarly, reflects the bonds in both the upper and the lower elements. The shape of the EOs agree well with the physical intuition. In particular, the element orbitals are not centered around individual atoms but correspond directly to chemical bonds, which are of lower energy than individual atomic orbitals. Fig. 6 shows that the EOs constructed from a complete basis set such as planewaves provides a more flexible treatment of chemical environment than atom centered orbitals. The total free energy calculated using ABINIT with a planewave cutoff at Ry is a.u.. EOs per atom contracted from ALBs per atom with localization radius being a.u.. The total free energy calculated using EO is a.u.. The difference in the total free energy per atom is meV.

A more complicated example is a graphene sheet with C atoms, and with of the C atoms randomly selected and replaced by Si atoms. The atomic configuration is shown in Fig. 7 (a), with the C atoms represented by cyan balls and Si atoms represented by gold balls, respectively. The atoms are all in the plane, and the dimension of the supercell is a.u., a.u. and a.u. along directions, respectively. The electron density in the plane is shown in Fig. 7 (b). The total free energy calculated from ABINIT is a.u., and the total free energy calculated from EO with EOs per atom for all elements is a.u.. The error of the total free energy per atom is meV per atom.

The fact that a small number of EOs per atom already achieve high accuracy allows us to perform calculations for systems of large size. Here we study 3D bulk Na systems of various sizes, ranging from atoms to atoms. The length of the supercell along each dimension is also proportional to the system size, from a.u. for atoms to a.u. for atoms. The number of processors (computational cores) used is chosen to be proportional to the number of atoms, with processors used for atoms, and processors used for atoms. EOs per atom are constructed from ALBs per atom for all calculations. The total time per SCF iteration is shown in Fig. 8. We find that even though the number of atoms increase by a factor of , the wall clock time only increases by less than times from sec for atoms to sec for atoms. The small increase of the total wall clock time is because the time for solving the generalized eigenvalue problem (9), which is asymptotically the computationally dominating part, only takes less than sec even for system as large as atoms, thanks to the small number of basis functions per atom allowed to be used in the calculation. The time for generating the ALBs using LOBPCG and the time for constructing the EOs from the ALBs are flat for all systems, since these steps are localized in each extended element and the computational cost is independent of the global system size. The overall time for solving the generalized eigenvalue problem (9) has not dominated the computational time for atoms with a Hamiltonian matrix of size . However, the wall clock time for this part already scales quadratically with respect to the number of atoms. Since the number of processors scale linearly with respect to the system size, the overall time for solving the generalized eigenvalue problem scales cubically with respect to the system size, and will eventually dominate the overall running time for systems of larger size. The overhead of the DG calculation involves the assembly of the DG matrix , the construction of the Hamiltonian matrix and the mass matrix using parallel matrix-matrix multiplication, as well as the communication time. As alluded to earlier, the parallel matrix-matrix multiplication treats and as dense matrices in the current implementation. Therefore the asymptotic scaling of this part has the same asymptotic cubic scaling as solving the generalized eigenvalue problem. All the rest of the computational time (classified as “other time” in Fig. 8) mainly includes constructing the electron density using (10) in the global domain, solving the Kohn-Sham potential from the electron density, charge mixing as well as the extra data communication.

We also remark that treating the Hamiltonian matrix as dense matrices greatly increases the memory cost and the communication volume. Fig. 9 (a) shows the amount of memory used per processor. When the number of atoms is , the memory used per processor is GB, which becomes the bottleneck for further increasing the system size, despite that the computational time per SCF is still within affordable range. The communication volume, indicated by the percentage of the communication time within the total computational time is shown in Fig. 9 (b). The communication time occupies more than of the total time for systems with atoms. Both the large memory cost and the large communication volume is largely due to the treatment of and as dense matrices, and shall be improved in the future work.

## Vi Conclusion

In conclusion, we have introduced the element orbitals for discretizing the Kohn-Sham Hamiltonian in the pseudopotential framework, which are contracted automatically from a uniform basis set. Comparing with the existing contracted basis sets, element orbitals incorporate environment information by including directly all atoms in the neighboring elements on the fly. The implementation of element orbitals is straightforward thanks to the rectangular partitioning of the domain. The accuracy of element orbitals are systematically improvable and the same procedure can be applied to systems under various conditions. The element orbitals are constructed by solving KSDFT locally in the real space, and localized on each element via a localization procedure. We remark that the localization procedure used for constructing the element orbitals is not grounded on the near-sightedness property as in the linear scaling methods for insulating systems Goedecker (1999); Kohn (1996). Instead of finding the compact representations for the Kohn-Sham invariant subspaces Gygi (2009), the current work seeks for a set of compact basis functions in the real space, while the coefficients of the basis set for representing the Kohn-Sham orbitals can still be delocalized. As is shown by the numerical examples, the current procedure is applicable to both insulating and metallic systems.

Our numerical examples also indicate that treating and as dense matrices can greatly increase the memory cost, the communication volume and the computational time especially for systems of large size. The future improvement includes treating and as sparse matrices so that the construction of the Hamiltonian matrix and the mass matrix is of linear scaling. By treating and as sparse matrices, we can also incorporate the recently developed pole expansion and selected inversion type fast algorithms Lin et al. (2009a, b, c, 2011a, 2011b, 2012b) to reduce the asymptotic scaling for solving the generalized eigenvalue problem (9) from cubic scaling to at most quadratic scaling for 3D bulk systems. We also remark that the current procedure for constructing the orbitals from adaptive local basis functions is still a costly procedure inside each element. Method for generating element orbitals directly inside the extended element is also under our exploration.

This work is partially supported by NSF CAREER Grant 0846501 (L. Y.), and by the Laboratory Directed Research and Development Program of Lawrence Berkeley National Laboratory under the U.S. Department of Energy contract number DE-AC02-05CH11231 (L. L.). The authors thank Jianfeng Lu for helpful discussions, and National Energy Research Scientific Computing Center (NERSC) for the support to perform the calculations. L. L. also thanks Weinan E for encouragement, and the University of Texas at Austin for the hospitality where the idea of this paper starts.

## References

- Kohn and Sham (1965) W. Kohn and L. Sham, Phys. Rev. 140, A1133 (1965).
- Troullier and Martins (1991) N. Troullier and J. L. Martins, Phys. Rev. B 43, 1993 (1991).
- Vanderbilt (1990) D. Vanderbilt, Phys. Rev. B 41, 7892 (1990).
- Andersen (1975) O. K. Andersen, Phys. Rev. B 12, 3060 (1975).
- Blöchl (1994) P. E. Blöchl, Phys. Rev. B 50, 17953 (1994).
- Ozaki (2003) T. Ozaki, Phys. Rev. B 67, 155108 (2003).
- Blum et al. (2009) V. Blum, R. Gehrke, F. Hanke, P. Havu, V. Havu, X. Ren, K. Reuter, and M. Scheffler, Comput. Phys. Commun. 180, 2175 (2009).
- Junquera et al. (2001) J. Junquera, O. Paz, D. Sanchez-Portal, and E. Artacho, Phys. Rev. B 64, 235111 (2001).
- Chen et al. (2010) M. Chen, G. C. Guo, and L. He, J. Phys.: Condens. Matter 22, 445501 (2010).
- Andersen and Saha-Dasgupta (2000) O. K. Andersen and T. Saha-Dasgupta, Phys. Rev. B 62, R16219 (2000).
- Qian et al. (2008) X. Qian, J. Li, L. Qi, C. Z. Wang, T. L. Chan, Y. X. Yao, K. M. Ho, and S. Yip, Phys. Rev. B 78, 245112 (2008).
- Bowler and Miyazaki (2012) D. R. Bowler and T. Miyazaki, Rep. Prog. Phys. 75, 036503 (2012).
- Frisch et al. (1984) M. Frisch, J. Pople, and J. Binkley, J. Chem. Phys. 80, 3265 (1984).
- Clark et al. (1983) T. Clark, J. Chandrasekhar, G. Spitznagel, and P. Schleyer, J. Comput. Chem. 4, 294 (1983).
- Lin et al. (2012a) L. Lin, J. Lu, L. Ying, and W. E, J. Comput. Phys. 231, 2140 (2012a).
- Arnold (1982) D. N. Arnold, SIAM J. Numer. Anal. 19, 742 (1982).
- Gonze et al. (2009) X. Gonze, B. Amadon, P. Anglade, J. Beuken, F. Bottin, P. Boulanger, F. Bruneval, D. Caliste, R. Caracas, M. Cote, et al., Comput. Phys. Commun. 180, 2582 (2009).
- Rayson and Briddon (2009) M. J. Rayson and P. R. Briddon, Phys. Rev. B 80, 205104 (2009).
- Rayson (2010) M. J. Rayson, Comput. Phys. Commun. 181, 1051 (2010).
- Mermin (1965) N. Mermin, Phys. Rev. 137, A1441 (1965).
- Ceperley and Alder (1980) D. M. Ceperley and B. J. Alder, Phys. Rev. Lett. 45, 566 (1980).
- Perdew and Zunger (1981) J. P. Perdew and A. Zunger, Phys. Rev. B 23, 5048 (1981).
- Kleinman and Bylander (1982) L. Kleinman and D. M. Bylander, Phys. Rev. Lett. 48, 1425 (1982).
- Arnold et al. (2002) D. N. Arnold, F. Brezzi, B. Cockburn, and L. D. Marini, SIAM J. Numer. Anal. 39, 1749 (2002).
- Zhao et al. (2008) Z. Zhao, J. Meza, and L. Wang, J. Phys.: Condens. Matter 20, 294203 (2008).
- García-Cervera et al. (2009) C. J. García-Cervera, J. Lu, Y. Xuan, and W. E, Phys. Rev. B 79, 115110 (2009).
- Lin et al. (2009a) L. Lin, J. Lu, R. Car, and W. E, Phys. Rev. B 79, 115133 (2009a).
- Lin et al. (2009b) L. Lin, J. Lu, L. Ying, and W. E, Chin. Ann. Math. Ser. B 30, 729 (2009b).
- Lin et al. (2009c) L. Lin, J. Lu, L. Ying, R. Car, and W. E, Commun. Math. Sci. 7, 755 (2009c).
- Lin et al. (2011a) L. Lin, C. Yang, J. Lu, L. Ying, and W. E, SIAM J. Sci. Comput. 33, 1329 (2011a).
- Lin et al. (2011b) L. Lin, C. Yang, J. Meza, J. Lu, L. Ying, and W. E, ACM. Trans. Math. Software 37, 40 (2011b).
- Lin et al. (2012b) L. Lin, M. Chen, C. Yang, and L. He, arxiv:1202.2159 (2012b).
- Knyazev (2001) A. Knyazev, SIAM J. Sci. Comp. 23, 517 (2001).
- Teter et al. (1989) M. P. Teter, M. C. Payne, and D. C. Allan, Phys. Rev. B 40, 12255 (1989).
- Blackford et al. (1997) L. S. Blackford, J. Choi, A. Cleary, E. D’Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, et al., ScaLAPACK Users’ Guide (SIAM, Philadelphia, PA, 1997).
- Hartwigsen et al. (1998) C. Hartwigsen, S. Goedecker, and J. Hutter, Phys. Rev. B 58, 3641 (1998).
- Pask and Sterne (2005) J. E. Pask and P. A. Sterne, Phys. Rev. B 71, 113101 (2005).
- Alavi et al. (1994) A. Alavi, J. Kohanoff, M. Parrinello, and D. Frenkel, Phys. Rev. Lett. 73, 2599 (1994).
- Anderson (1965) D. Anderson, J. Assoc. Comput. Mach. 12, 547 (1965).
- Kerker (1981) G. P. Kerker, Phys. Rev. B 23, 3082 (1981).
- Goedecker (1999) S. Goedecker, Rev. Mod. Phys. 71, 1085 (1999).
- Kohn (1996) W. Kohn, Phys. Rev. Lett. 76, 3168 (1996).
- Gygi (2009) F. Gygi, Phys. Rev. Lett. 102, 166406 (2009).