Padé approximants and exact two-locus sampling distributions

# Padé approximants and exact two-locus sampling distributions

[ [    [ [ University of California, Berkeley Computer Science Division
University of California, Berkeley
Berkeley, California 94720
USA
Department of Statistics
and Computer Science Division
University of California, Berkeley
Berkeley, California 94720
USA
\smonth7 \syear2010\smonth4 \syear2010
\smonth7 \syear2010\smonth4 \syear2010
\smonth7 \syear2010\smonth4 \syear2010
###### Abstract

For population genetics models with recombination, obtaining an exact, analytic sampling distribution has remained a challenging open problem for several decades. Recently, a new perspective based on asymptotic series has been introduced to make progress on this problem. Specifically, closed-form expressions have been derived for the first few terms in an asymptotic expansion of the two-locus sampling distribution when the recombination rate \rho is moderate to large. In this paper, a new computational technique is developed for finding the asymptotic expansion to an arbitrary order. Computation in this new approach can be automated easily. Furthermore, it is proved here that only a finite number of terms in the asymptotic expansion is needed to recover (via the method of Padé approximants) the exact two-locus sampling distribution as an analytic function of \rho; this function is exact for all values of \rho\in[0,\infty). It is also shown that the new computational framework presented here is flexible enough to incorporate natural selection.

[
\kwd
\doi

10.1214/11-AAP780 \volume22 \issue2 2012 \firstpage576 \lastpage607 \newproclaimdefinitionDefinition[section] \newproclaimapproximationApproximation[section]

\runtitle

Analytic two-locus sampling distributions

{aug}

A]\fnmsPaul A. \snmJenkins\thanksreft1label=e1]pauljenk@eecs.berkeley.edu and B]\fnmsYun S. \snmSong\corref\thanksreft1,t2label=e2]yss@stat.berkeley.edu

\thankstext

t1Supported in part by NIH Grants R00-GM080099 and R01-GM094402. \thankstextt2Supported in part by an Alfred P. Sloan Research Fellowship and a Packard Fellowship for Science and Engineering.

class=AMS] \kwd[Primary ]92D15 \kwd[; secondary ]65C50 \kwd92D10. Population genetics \kwdrecombination \kwdsampling distribution \kwdasymptotic expansion \kwdPadé approximants.

\pdfkeywords

92D15, 65C50, 92D10, Population genetics, recombination, sampling distribution, asymptotic expansion, Pade approximants

## 1 Introduction

Central to many applications in genetic analysis is the notion of sampling distribution, which describes the probability of observing a sample of DNA sequences randomly drawn from a population. In the one-locus case with special models of mutation such as the infinite-alleles model or the finite-alleles parent-independent mutation model, closed-form sampling distributions have been known for many decades [Wright (1949), Ewens (1972)]. In contrast, for multi-locus models with finite recombination rates, finding a closed-form sampling distribution has so far remained a challenging open problem. To make progress on this long-standing issue, we recently proposed a new approach based on asymptotic expansion and showed that it is possible to obtain useful analytic results when the recombination rate is moderate to large [Jenkins and Song (2009, 2010)]. More precisely, our previous work can be summarized as follows.

Consider a two-locus Wright–Fisher model, with the two loci denoted by A and B. In the standard coalescent or diffusion limit, let \theta_{A} and \theta_{B} denote the population-scaled mutation rates at loci A and B, respectively, and let \rho denote the population-scaled recombination rate between the two loci. Given a sample configuration {\mathbf{n}} (defined later in the text), consider the following asymptotic expansion of the sampling probability q({\mathbf{n}}\mid\theta_{A},\theta_{B},\rho) for large \rho:

 \displaystyle q({\mathbf{n}}\mid\theta_{A},\theta_{B},\rho) \displaystyle= \displaystyle q_{0}({\mathbf{n}}\mid\theta_{A},\theta_{B})+\frac{q_{1}({% \mathbf{n}}\mid\theta_{A},\theta_{B})}{\rho} (1) \displaystyle{}+\frac{q_{2}({\mathbf{n}}\mid\theta_{A},\theta_{B})}{\rho^{2}}+\cdots,

where the coefficients q_{0}({\mathbf{n}}\mid\theta_{A},\theta_{B}),q_{1}({\mathbf{n}}\mid\theta_{A},% \theta_{B}),q_{2}({\mathbf{n}}\mid\theta_{A},\theta_{B}),\ldots, are independent of \rho. The zeroth-order term q_{0} corresponds to the contribution from the completely unlinked (i.e., \rho=\infty) case, given simply by a product of marginal one-locus sampling distributions. Until recently, higher-order terms q_{M}({\mathbf{n}}\mid\theta_{A},\theta_{B}), for M\geq 1, were not known. In Jenkins and Song (2010), assuming the infinite-alleles model of mutation at each locus, we used probabilistic and combinatorial techniques to derive a closed-form formula for the first-order term q_{1}, and showed that the second-order term q_{2} can be decomposed into two parts, one for which we obtained a closed-form formula and the other which satisfies a simple recursion that can be easily evaluated using dynamic programming. We later extended these results to an arbitrary finite-alleles model and showed that the same functional form of q_{1} is shared by all mutation models, a property which we referred to as universality [see Jenkins and Song (2009) for details]. Importantly, we also performed an extensive study of the accuracy of our results and showed that they may be accurate even for moderate values of \rho, including a range that is of biological interest.

Given the above findings, one is naturally led to ask several important follow-up questions. In particular, the following questions are of both theoretical and practical interest.

1. 1.

Is it possible to compute the higher-order coefficients q_{M}({\mathbf{n}}\mid\theta_{A},\theta_{B}) for M>2?

2. 2.

For a given finite \rho>0, does the series in (1) converge as more terms are added?

3. 3.

If not, how should one make use of the asymptotic expansion in practice?

4. 4.

Is it possible to incorporate into the asymptotic expansion framework other important mechanisms of evolution such as natural selection?

In this paper, we develop a new computational technique to answer the above questions. Our previous method requires rewriting complex recursion relations into more structured forms, followed by laborious computation of the expectation of increasingly complicated functions of multivariate hypergeometric random variables. Generalizing that method to go beyond the second order (i.e., M>2) seems unwieldy. In contrast, our new method is based on the diffusion process and it utilizes the diffusion generator to organize computation in a simple, transparent fashion. Moreover, the same basic procedure, which is purely algebraic, applies to all orders and the computation can be completely automated; we have, in fact, made such an implementation.

To recapitulate, we propose here a method of computing the asymptotic expansion (1) to an arbitrary order. That is, for any given positive integer M, our method can be used to compute the coefficients q_{k}({\mathbf{n}}\mid\theta_{A},\theta_{B}) for all k\leq M; Theorem 3.1 summarizes this result. As discussed in Section 6.2, however, one can find examples for which the series (1) diverges for finite, nonzero \rho. To get around this problem, we employ the method of Padé approximants. The key idea behind Padé approximants is to approximate the function of interest by a rational function. Although (1) may diverge, we show that the sequence of Padé approximants converges for all values of \rho>0. In fact, for every sample configuration {\mathbf{n}}, we show that there exists a finite positive integer C({\mathbf{n}}), such that the Padé approximant of the asymptotic expansion up to order \geq C({\mathbf{n}}) is equal to the exact two-locus sampling distribution. Hence, our result implies that only a finite number of terms in the asymptotic expansion need to be computed to recover (via the Padé approximant) the exact sampling distribution as an analytic function of \rho; this function is exact for all values of \rho, including 0. Theorem 4.1 and the surrounding discussion lay out the details. Last, we also show in this paper that our new framework is flexible enough to incorporate a general model of diploid selection. This extension is detailed in Section 5.

The above-mentioned convergence result is theoretically appealing. For practical applications, however, one needs to bear in mind that the value of C({\mathbf{n}}) generally grows with sample size, thus implying that obtaining an exact, analytic sampling distribution may be impracticable for large samples. A possible remedy, which works well in practice, is to compute the asymptotic expansion only up to some reasonable order M<C({\mathbf{n}}), and use the corresponding Padé approximant as an approximate sampling distribution. We show in Section 6 that using M=10 or so produces quite accurate results.

An important advantage of our method over Monte Carlo-based methods is that, for a given mutation model, the bulk of the computation in our approach needs to be carried out only once. Specifically, the coefficients q_{k}({\mathbf{n}}\mid\theta_{A},\theta_{B}) need to be computed only once, and the same coefficients can be used to evaluate the sampling distribution at different values of the recombination rate \rho. We expect this aspect of our work to have important practical implications. For example, in the composite likelihood method for estimating fine-scale recombination rates [Hudson (2001), McVean et al. (2004)], one needs to generate exhaustive lookup tables containing two-locus sampling probabilities for a wide range of discrete \rho values. An alternative approach would be to store the coefficients q_{k}({\mathbf{n}}\mid\theta_{A},\theta_{B}) instead of generating an exhaustive lookup table using importance sampling, which is computationally expensive.

The rest of this paper is organized as follows. In Section 2, we lay out the notational convention adopted throughout this paper and review our previous work on asymptotic expansion of the two-locus sampling distribution up to second order. Our new technique for obtaining an arbitrary-order asymptotic expansion is described in Section 3, where we focus on the selectively neutral case. In Section 4, we present the method of Padé approximants and describe the aforementioned result on convergence to the exact sampling distribution. In Section 5, we describe how natural selection can be incorporated into our new framework. Finally, we summarize in Section 6 our empirical study of the accuracy of various approximate sampling distributions and provide in Section 7 proofs of the main theoretical results presented in this paper.

## 2 Notation and review of previous work

In this section, we introduce some notation and briefly review previous results on asymptotic sampling distributions. Initial results were obtained for the infinite-alleles model of mutation [Jenkins and Song (2010)] and later generalized to an arbitrary finite-alleles model [Jenkins and Song (2009)]. In this paper we focus on the latter case.

The set of nonnegative integers is denoted by \mathbb{N}. Given a positive integer k, [k] denotes the set \{1,\ldots,k\}. For a nonnegative real number z and a positive integer n, (z)_{n}:=z(z+1)\cdots(z+n-1) denotes the nth ascending factorial of z. We use \mathbf{0} to denote either a vector or a matrix of all zeroes; it will be clear from context which is intended. Throughout, we consider the diffusion limit of a haploid exchangeable model of random mating with constant population size 2N. We refer to the haploid individuals in the population as gametes. Initially we shall assume that the population is selectively neutral; we extend to the nonneutral case in Section 5.

### 2.1 One-locus sampling distribution

The sample configuration at a locus is denoted by a vector {\mathbf{n}}=(n_{1},\ldots,n_{K}), where n_{i} denotes the number of gametes with allele i at the locus, and K denotes the number of distinct possible alleles. We use n=\sum_{i=1}^{K}n_{i} to denote the total sample size. Let u denote the probability of mutation at the locus per gamete per generation. Then, in the diffusion limit, N\to\infty and u\to 0 with the population-scaled mutation rate \theta=4Nu held fixed. Mutation events occur according to a Poisson process with rate \theta/2, and allelic changes are described by a Markov chain with transition matrix \mathbf{P}=(P_{ij}); that is, when a mutation occurs to an allele i, it mutates to allele j with probability P_{ij}.

We denote by p({\mathbf{n}}) the probability of obtaining the unordered sample configuration {\mathbf{n}}. When writing sampling probabilities, we suppress the dependence on parameters for ease of notation. By exchangeability, the probability of any ordered configuration corresponding to {\mathbf{n}} is invariant under all permutations of the sampling order. We may, therefore, use q({\mathbf{n}}) without ambiguity to denote the probability of any particular ordered configuration consistent with {\mathbf{n}}. The two probabilities are related by

 p({\mathbf{n}})=\frac{n!}{n_{1}!\cdots n_{K}!}q({\mathbf{n}}).

Throughout this paper, we express our results in terms of ordered samples for convenience.

Consider an infinite population specified by the population-wide allele frequencies {\mathbf{x}}=(x_{i},\ldots,x_{K}), evolving according to a Wright–Fisher diffusion on the simplex

 \Delta_{K}=\Biggl{\{}{\mathbf{x}}=(x_{i})\in[0,1]^{K}\dvtx\sum_{i=1}^{K}x_{i}=% 1\Biggr{\}}. (2)

We assume that a sample is drawn from the population at stationarity. No closed-form expression for q({\mathbf{n}}) is known except in the special case of parent-independent mutation (PIM), in which P_{ij}=P_{j} for all i. In the PIM model, the stationary distribution of {\mathbf{x}} is Dirichlet with parameters (\theta P_{1},\ldots,\theta P_{K}) [Wright (1949)], and so q({\mathbf{n}}) is obtained by drawing an ordered sample from this population:

 q({\mathbf{n}})=\mathbb{E}\Biggl{[}\prod_{i=1}^{K}X_{i}^{n_{i}}\Biggr{]}=% \Gamma(\theta)\int_{\Delta_{K}}\prod_{i=1}^{K}\frac{x_{i}^{n_{i}+\theta P_{i}-% 1}}{\Gamma(\theta P_{i})}\,\mathrm{d}{\mathbf{x}}=\frac{1}{(\theta)_{n}}\prod_% {i=1}^{K}(\theta P_{i})_{n_{i}}.\hskip-30.0pt (3)

This sampling distribution can also be obtained by coalescent arguments [Griffiths and Tavaré (1994)].

### 2.2 Two loci

We now extend the above notation to two loci, which we refer to as A and B. Denote the probability of a recombination event between the two loci per gamete per generation by r. In the diffusion limit, as N\to\infty we let r\to 0 such that the population-scaled recombination parameter \rho=4Nr is held fixed. Suppose there are K possible alleles at locus A and L possible alleles at locus B, with respective population-scaled mutation parameters \theta_{A} and \theta_{B}, and respective mutation transition matrices \mathbf{P}^{A} and \mathbf{P}^{B}. The two-locus sample configuration is denoted by {\mathbf{n}}=({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}), where {\mathbf{a}}=(a_{1},\ldots,a_{K}) with a_{i} being the number of gametes with allele i at locus A and unspecified alleles at locus B; {\mathbf{b}}=(b_{1},\ldots,b_{L}) with b_{j} being the number of gametes with unspecified alleles at locus A and allele j at locus B; {\mathbf{c}}=(c_{ij}) is a K\times L matrix with c_{ij} being the multiplicity of gametes with allele i at locus A and allele j at locus B. We also define

and use {\mathbf{c}}_{A}=(c_{i\cdot})_{i\in[K]} and {\mathbf{c}}_{B}=(c_{\cdot j})_{j\in[L]} to denote the marginal sample configurations of {\mathbf{c}} restricted to locus A and locus B, respectively. Notice the distinction between the vectors {\mathbf{a}} and {\mathbf{b}}, which represent gametes with alleles specified at only one of the two loci, and the vectors {\mathbf{c}}_{A} and {\mathbf{c}}_{B}, which represent the one-locus marginal configurations of gametes with both alleles observed.

When we consider the ancestry of a sample backward in time, a gamete may undergo recombination between the two loci, with each of its two parents transmitting genetic material at only one of the two loci. We allow the nontransmitting locus to remain unspecified as we trace the ancestry further back in time.

Denote by q({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) the sampling probability of an ordered sample with configuration ({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}), again suppressing the dependence on parameters for ease of notation. Sampling is now from a two-dimensional Wright–Fisher diffusion with population allele frequencies {\mathbf{x}}=(x_{ij})_{i\in[K],j\in[L]}, evolving on the state space

 \Delta_{K\times L}=\Biggl{\{}{\mathbf{x}}=(x_{ij})\in[0,1]^{K\times L}\dvtx% \sum_{i=1}^{K}\sum_{j=1}^{L}x_{ij}=1\Biggr{\}}. (4)

As before, q({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) is specified by drawing an ordered sample from the population at stationarity: q({\mathbf{a}},{\mathbf{b}},{\mathbf{c}})=\mathbb{E}[F(\mathbf{X};{\mathbf{n}})], where

 F({\mathbf{x}};{\mathbf{n}})=\Biggl{(}\prod_{i=1}^{K}x_{i\cdot}^{a_{i}}\Biggr{% )}\Biggl{(}\prod_{j=1}^{L}x_{\cdot j}^{b_{j}}\Biggr{)}\Biggl{(}\prod_{i=1}^{K}% \prod_{j=1}^{L}x_{ij}^{c_{ij}}\Biggr{)} (5)

with x_{i\cdot}\!=\!\sum_{j=1}^{L}x_{ij} and x_{\cdot j}\!=\!\sum_{i=1}^{K}x_{ij}. In the two-locus model with 0\!\leq\!\rho\!<\!\infty, the stationary distribution, and hence, the sampling distribution, is not known in closed-form even when the mutation process is parent-independent. However, when \rho=\infty, the two loci become independent and q({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) is simply the product of the two marginal one-locus sampling distributions. More precisely, denoting the one-locus sampling distributions at A and B by q^{A} and q^{B}, respectively, we have

 \lim_{\rho\to\infty}q({\mathbf{a}},{\mathbf{b}},{\mathbf{c}})=q^{A}({\mathbf{a% }}+{\mathbf{c}}_{A})q^{B}({\mathbf{b}}+{\mathbf{c}}_{B})

for all mutation models [Ethier (1979)]. In particular, if mutation is parent-independent, then we do have a closed-form formula for q({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) when \rho=\infty, since from (3) we know that

### 2.3 Asymptotic sampling formula

As mentioned in the Introduction, although a closed-form formula for q({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) is not known for an arbitrary \rho, previously we [Jenkins and Song (2009; 2010)] were able to make progress by posing, for large \rho, an asymptotic expansion of the form

 q({\mathbf{a}},{\mathbf{b}},{\mathbf{c}})=q_{0}({\mathbf{a}},{\mathbf{b}},{% \mathbf{c}})+\frac{q_{1}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}})}{\rho}+\frac{% q_{2}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}})}{\rho^{2}}+\cdots, (7)

where the coefficients q_{k}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}), for all k\geq 0, are independent of \rho. We summarize our previous results in the following theorem, specialized to the case of finite-alleles mutation, which is our interest here.

###### Theorem 2.1 ([Jenkins and Song (2009)])

In the asymptotic expan-sion (7) of the neutral two-locus sampling formula, the zeroth-order term is given by

 q_{0}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}})=q^{A}({\mathbf{a}}+{\mathbf{c}}_% {A})q^{B}({\mathbf{b}}+{\mathbf{c}}_{B}), (8)

and the first-order term is given by

 \displaystyle q_{1}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) \displaystyle= \displaystyle\pmatrix{c\cr 2}q^{A}({\mathbf{a}}+{\mathbf{c}}_{A})q^{B}({% \mathbf{b}}+{\mathbf{c}}_{B}) (9) \displaystyle{}-q^{B}({\mathbf{b}}+{\mathbf{c}}_{B})\sum_{i=1}^{K}\pmatrix{c_{% i\cdot}\cr 2}q^{A}({\mathbf{a}}+{\mathbf{c}}_{A}-{\mathbf{e}}_{i}) \displaystyle{}-q^{A}({\mathbf{a}}+{\mathbf{c}}_{A})\sum_{j=1}^{L}\pmatrix{c_{% \cdot j}\cr 2}q^{B}({\mathbf{b}}+{\mathbf{c}}_{B}-{\mathbf{e}}_{j}) \displaystyle{}+\sum_{i=1}^{K}\sum_{j=1}^{L}\pmatrix{c_{ij}\cr 2}q^{A}({% \mathbf{a}}+{\mathbf{c}}_{A}-{\mathbf{e}}_{i})q^{B}({\mathbf{b}}+{\mathbf{c}}_% {B}-{\mathbf{e}}_{j}),

where {\mathbf{e}}_{i} is a unit vector with a 1 at the ith entry and 0’s elsewhere. Furthermore, the second-order term can be decomposed as

 q_{2}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}})=\sigma({\mathbf{a}},{\mathbf{b}}% ,{\mathbf{c}})+q_{2}({\mathbf{a}}+{\mathbf{c}}_{A},{\mathbf{b}}+{\mathbf{c}}_{% B},\mathbf{0}), (10)

where \sigma({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) is known analytically and q_{2}({\mathbf{a}},{\mathbf{b}},\mathbf{0}) satisfies the recursion relation

with boundary conditions q_{2}({\mathbf{e}}_{i},\mathbf{0},\mathbf{0})=0, q_{2}(\mathbf{0},{\mathbf{e}}_{j},\mathbf{0})=0 and q_{2}({\mathbf{e}}_{i},{\mathbf{e}}_{j},\mathbf{0})=0 for all i\in[K] and j\in[L].

The expression for \sigma({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) can be found in Jenkins and Song (2009) and we do not reproduce it here. Notice that q_{0}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) and q_{1}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) exhibit universality: their dependence on the model of mutation is subsumed entirely into the one-locus sampling distributions.

The proof of Theorem 2.1 used coalescent arguments. By considering the most recent event back in time in the coalescent process for the sample, it is possible to write down a recursion relation for the sampling distribution. In a two-locus, finite-alleles model, the appropriate recursion is a simple modification of the one introduced by Golding (1984) for the infinite-alleles model, also studied in detail by Ethier and Griffiths (1990). By substituting (7) into this recursion, after some lengthy algebraic manipulation one can obtain the expressions given in Theorem 2.1 [see Jenkins and Song (2009; 2010) for details].

## 3 Arbitrary-order asymptotic expansion

The approach described in Section 2.3 does not generalize easily. In what follows, we introduce a new approach based on the diffusion approximation. This new method is more transparent and more easily generalizable than the one used previously, and we illustrate this point by developing a method for computing the asymptotic expansion (7) to an arbitrary order.

### 3.1 Diffusion approximation of the two-locus model

Our approach is based on the diffusion process that is dual to the coalescent process. The generator for the two-locus finite-alleles diffusion process is

 \displaystyle\mathscr{L} \displaystyle= \displaystyle\frac{1}{2}\sum_{i=1}^{K}\sum_{j=1}^{L}\Biggl{[}\sum_{k=1}^{K}% \sum_{l=1}^{L}x_{ij}(\delta_{ik}\delta_{jl}-x_{kl})\,\frac{\partial}{\partial x% _{kl}}+\theta_{A}\sum_{k=1}^{K}x_{kj}(P_{ki}^{A}-\delta_{ik}) (12) \displaystyle                \hphantom{\frac{1}{2}\sum_{i=1}^{K}\sum_{j=1}^{L}% \Biggl{[}}{}+\theta_{B}\sum_{l=1}^{L}x_{il}(P_{lj}^{B}-\delta_{jl})+\rho(x_{i% \cdot}x_{\cdot j}-x_{ij})\Biggr{]}\,\frac{\partial}{\partial x_{ij}},

where \delta_{ik} is the Kronecker delta. For notational convenience, henceforth, where not specified otherwise, the indices i and k are assumed to take value in [K], while the indices j and l are assumed to take value in [L].

In what follows, we change to a new set of variables that capture the decay of dependence between the two loci, an approach originally due to Ohta and Kimura (1969a; 1969b). Specifically, the key quantity of interest is the following definition. {definition} The linkage disequilibrium (LD) between allele i at locus A and allele j at locus B is given by

 d_{ij}=x_{ij}-x_{i\cdot}x_{\cdot j}.

The collection of (K+1)(L+1)-1 new variables is

 (x_{1\cdot},\ldots,x_{K\cdot};x_{\cdot 1},\ldots,x_{\cdot L};d_{11},\ldots,d_{% \mathit{KL}}).

The diffusion is then constrained to the (KL-1)-dimensional simplex \Delta_{K\times L} by imposing the conditions

### 3.2 Rescaling LD

Since we are interested in the large \rho limit, we expect each d_{ij} to be small. We introduce the rescaling \widetilde{d}_{ij}=\sqrt{\rho}d_{ij}. The reason for this choice should become clear from (14) below; in the resulting generator, there should be a nontrivial interaction between recombination and genetic drift, that is, they should both act on the fastest timescale. The leading-order contribution to the generator, denoted below by L^{(2)}, has contributions from both of these biological processes if and only if we use this choice for \widetilde{d}_{ij}. See Song and Song (2007) for another example of this rescaling. By substituting for the new variables in (12) and using (13) for extensive simplification, the generator can be expressed as

 \widetilde{\mathscr{L}}=\frac{1}{2}\biggl{[}\rho L^{(2)}+\sqrt{\rho}L^{(1)}+L^% {(0)}+\frac{1}{\sqrt{\rho}}L^{(-1)}\biggr{]}, (14)

where the operators in (14) are given by

 \displaystyle L^{(2)} \displaystyle= \displaystyle\sum_{i,j}\biggl{\{}x_{i\cdot}x_{\cdot j}\biggl{[}\sum_{k,l}(% \delta_{ik}-x_{k\cdot})(\delta_{jl}-x_{\cdot l})\,\frac{\partial}{\partial% \widetilde{d}_{kl}}\biggr{]}-\widetilde{d}_{ij}\biggr{\}}\,\frac{\partial}{% \partial\widetilde{d}_{ij}}, \displaystyle L^{(1)} \displaystyle= \displaystyle\sum_{i,j,k,l}[2\widetilde{d}_{ij}(\delta_{ik}-x_{k\cdot})(\delta% _{jl}-x_{\cdot l})-\delta_{ik}\delta_{jl}\widetilde{d}_{ij}+2\widetilde{d}_{il% }x_{k\cdot}x_{\cdot j}]\,\frac{\partial^{2}}{\partial\widetilde{d}_{kl}\,% \partial\widetilde{d}_{ij}}, \displaystyle L^{(0)} \displaystyle= \displaystyle-\sum_{i,j,k,l}\widetilde{d}_{ij}\widetilde{d}_{kl}\,\frac{% \partial^{2}}{\partial\widetilde{d}_{kl}\,\partial\widetilde{d}_{ij}}+2\sum_{i% ,k,l}[(\delta_{ik}-x_{i\cdot})\widetilde{d}_{kl}-\widetilde{d}{}_{il}x_{k\cdot% }]\,\frac{\partial^{2}}{\partial\widetilde{d}_{kl}\,\partial x_{i\cdot}} \displaystyle{}+2\sum_{j,k,l}[(\delta_{jl}-x_{\cdot j})\widetilde{d}_{kl}-% \widetilde{d}_{kj}x_{\cdot l}]\,\frac{\partial^{2}}{\partial\widetilde{d}_{kl}% \,\partial x_{\cdot j}} \displaystyle{}+\sum_{i,k}x_{i\cdot}(\delta_{ik}-x_{k\cdot})\,\frac{\partial^{% 2}}{\partial x_{k\cdot}\,\partial x_{i\cdot}}+\sum_{j,l}x_{\cdot j}(\delta_{jl% }-x_{\cdot l})\,\frac{\partial^{2}}{\partial x_{\cdot l}\,\partial x_{\cdot j}} \displaystyle{}+\sum_{i,j}\biggl{[}\theta_{A}\sum_{k}\widetilde{d}_{kj}(P_{ki}% ^{A}-\delta_{ik})+\theta_{B}\sum_{l}\widetilde{d}_{il}(P_{lj}^{B}-\delta_{jl})% -2\widetilde{d}_{ij}\biggr{]}\,\frac{\partial}{\partial\widetilde{d}_{ij}} \displaystyle{}+\frac{\theta_{A}}{2}\sum_{i,k}x_{k\cdot}(P_{ki}^{A}-\delta_{ik% })\,\frac{\partial}{\partial x_{i\cdot}}+\frac{\theta_{B}}{2}\sum_{j,l}x_{% \cdot l}(P_{lj}^{B}-\delta_{jl})\,\frac{\partial}{\partial x_{\cdot j}}, \displaystyle L^{(-1)} \displaystyle= \displaystyle 2\sum_{i,j}\widetilde{d}_{ij}\,\frac{\partial^{2}}{\partial x_{i% \cdot}\,\partial x_{\cdot j}}.

This generator extends that of Ohta and Kimura (1969a) from a (2\times 2)- to a (K\times L)-allele model. Note that ours differs from that of Ohta and Kimura (1969a) by a factor of two; one unit of time corresponds to 2N (rather than N) generations in our convention.

Recall that our interest is in calculating the expectation at stationarity of the function F({\mathbf{x}};{\mathbf{n}}) shown in (5), which is now viewed as a function of

 \widetilde{{\mathbf{x}}}=(x_{1\cdot},\ldots,x_{K\cdot};x_{\cdot 1},\ldots,x_{% \cdot L};\widetilde{d}_{11},\ldots,\widetilde{d}_{\mathit{KL}}).

In the same way that the multiplicity matrix {\mathbf{c}} represents multinomial samples from a population with frequencies (x_{ij})_{i\in[K],j\in[L]}, we introduce an analogous matrix {\mathbf{r}}=(r_{ij})_{i\in[K],j\in[L]} associated with the variables (\widetilde{d}_{ij})_{i\in[K],j\in[L]}. We further define the marginal vectors {\mathbf{r}}_{A}=(r_{i\cdot})_{i\in[K]} and {\mathbf{r}}_{B}=(r_{\cdot j})_{j\in[L]}, where r_{i\cdot}=\sum_{j}r_{ij} and r_{\cdot j}=\sum_{i}r_{ij}, analogous to {\mathbf{c}}_{A} and {\mathbf{c}}_{B}. In this notation, the function F({\mathbf{x}};{\mathbf{n}}) becomes

 \displaystyle F(\widetilde{{\mathbf{x}}};{\mathbf{n}}) \displaystyle= \displaystyle\Biggl{(}\prod_{i=1}^{K}x_{i\cdot}^{a_{i}}\Biggr{)}\Biggl{(}\prod% _{j=1}^{L}x_{\cdot j}^{b_{j}}\Biggr{)}\Biggl{(}\prod_{i=1}^{K}\prod_{j=1}^{L}% \biggl{[}\frac{\widetilde{d}_{ij}}{\sqrt{\rho}}+x_{i\cdot}x_{\cdot j}\biggr{]}% ^{c_{ij}}\Biggr{)} (15) \displaystyle= \displaystyle\sum_{m=0}^{c}\frac{1}{\rho^{{m/2}}}\!\sum_{{\mathbf{r}}\in{% \mathcal{P}}_{m}}\biggl{[}\prod_{i,j}\!\pmatrix{c_{ij}\cr r_{ij}}\biggr{]}G^{(% m)}(\widetilde{{\mathbf{x}}};{\mathbf{a}}+{\mathbf{c}}_{A}-{\mathbf{r}}_{A},{% \mathbf{b}}+{\mathbf{c}}_{B}-{\mathbf{r}}_{B},{\mathbf{r}}),

where

 G^{(m)}(\widetilde{{\mathbf{x}}};{\mathbf{a}},{\mathbf{b}},{\mathbf{r}})=% \Biggl{(}\prod_{i=1}^{K}x_{i\cdot}^{a_{i}}\Biggr{)}\Biggl{(}\prod_{j=1}^{L}x_{% \cdot j}^{b_{j}}\Biggr{)}\Biggl{(}\prod_{i=1}^{K}\prod_{j=1}^{L}\widetilde{d}_% {ij}^{r_{ij}}\Biggr{)}, (16)

and the inner summation in (15) is over all K\times L matrices {\mathbf{r}} of nonnegative integers whose entries sum to m:

 {\mathcal{P}}_{m}=\Biggl{\{}{\mathbf{r}}\in\mathbb{N}^{K\times L}\dvtx\sum_{i=% 1}^{K}\sum_{j=1}^{L}r_{ij}=m\Biggr{\}}.

Note that only those matrices which form “subsamples” of {\mathbf{c}} have nonzero coefficient in (15); that is, 0\leq r_{ij}\leq c_{ij} for all i and j.

### 3.3 The key algorithm

We now pose an asymptotic expansion for the expectation \mathbb{E}[G^{(m)}(\widetilde{\mathbf{X}};{\mathbf{a}},{\mathbf{b}},{\mathbf{r% }})]:

 \displaystyle\mathbb{E}\bigl{[}G^{(m)}(\widetilde{\mathbf{X}};{\mathbf{a}},{% \mathbf{b}},{\mathbf{r}})\bigr{]} \displaystyle= \displaystyle g_{0}^{(m)}({\mathbf{a}},{\mathbf{b}},{\mathbf{r}})+\frac{g_{1}^% {(m)}({\mathbf{a}},{\mathbf{b}},{\mathbf{r}})}{\sqrt{\rho}} (17) \displaystyle{}+\frac{g_{2}^{(m)}({\mathbf{a}},{\mathbf{b}},{\mathbf{r}})}{% \rho}+\cdots,

so that, using (15), the quantity of interest is given by

 \displaystyle q({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) \displaystyle= \displaystyle\mathbb{E}[F(\widetilde{\mathbf{X}};{\mathbf{n}})] (18) \displaystyle= \displaystyle\sum_{m=0}^{c}\sum_{{\mathbf{r}}\in{\mathcal{P}}_{m}}\Biggl{[}% \prod_{i=1}^{K}\prod_{j=1}^{L}\pmatrix{c_{ij}\cr r_{ij}}\Biggr{]} \displaystyle\hphantom{\sum_{m=0}^{c}\sum_{{\mathbf{r}}\in{\mathcal{P}}_{m}}}{% }\times\sum_{u=0}^{\infty}\frac{g^{(m)}_{u}({\mathbf{a}}+{\mathbf{c}}_{A}-{% \mathbf{r}}_{A},{\mathbf{b}}+{\mathbf{c}}_{B}-{\mathbf{r}}_{B},{\mathbf{r}})}{% \rho^{(m+u)/2}}.

We also have the boundary conditions

where \bolds{\pi}^{A}=(\pi^{A}_{i})_{i\in[K]} and \bolds{\pi}^{B}=(\pi^{B}_{j})_{j\in[L]} are the stationary distributions of \mathbf{P}^{A} and \mathbf{P}^{B}, respectively.

Using (18) and (19), we can also assign boundary conditions for each g^{(0)}_{u}({\mathbf{a}},{\mathbf{b}},\mathbf{0}):

We have reduced the problem of computing an asymptotic expansion for q({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) to one of computing g^{(m)}_{u}({\mathbf{a}},{\mathbf{b}},{\mathbf{r}}) for each m and u. Consider arranging these quantities in a c\times\mathbb{N} array, as illustrated in Figure 1.

Refer to entries on the \ellth anti-diagonal, such that m+u=\ell, as residing on the \ellth level. As is clear from (18), the contribution in the expansion for q({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) of order \rho^{-\ell/2} is comprised of entries on level \ell. For convenience, we define g_{u}^{(m)}({\mathbf{a}},{\mathbf{b}},{\mathbf{r}})=0 if u<0, or m<0, or if any entry a_{i},b_{j},r_{ij}<0. Then the following theorem, proved in Section 7.1, enables the level-wise computation of each g_{u}^{(m)}.

###### Theorem 3.1

The term g_{u}^{(m)}({\mathbf{a}},{\mathbf{b}},{\mathbf{r}}) in the right-hand side of (17) is determined as follows. {longlist} For m=0 and u=0, g_{0}^{(0)}({\mathbf{a}},{\mathbf{b}},\mathbf{0})=q^{A}({\mathbf{a}})q^{B}({% \mathbf{b}}). [Recall that q^{A}({\mathbf{a}}) and q^{B}({\mathbf{b}}) are the respective one-locus sampling distributions at locus A and locus B.] For m>0 and u\geq 0, g_{u}^{(m)}({\mathbf{a}},{\mathbf{b}},{\mathbf{r}}) on level \ell is determined by at most four entries on level \ell-2; these are g_{u}^{(m-2)}, g_{u-1}^{(m-1)}, g_{u-2}^{(m)} and g_{u-3}^{(m+1)}. This relationship is given explicitly [equation (7.1)] in Section 7. For m=0 and u\geq 1, g_{u}^{(0)}({\mathbf{a}},{\mathbf{b}},\mathbf{0}) on level \ell is determined by g_{u-1}^{(1)}, also on level \ell, and other similar terms g_{u}^{(0)}({\mathbf{a}}^{\prime},{\mathbf{b}}^{\prime},\mathbf{0}), where a^{\prime}\leq a, b^{\prime}\leq b. This relationship is given explicitly [equation (7.1)] in Section 7.

For odd levels (i.e., \ell=1,3,5,\ldots), the above theorem implies the following vanishing result, which we prove in Section 7.2.

###### Corollary 3.1

If m+u=\ell is odd, then g_{u}^{(m)}({\mathbf{a}},{\mathbf{b}},{\mathbf{r}})=0, for all configurations ({\mathbf{a}},{\mathbf{b}},{\mathbf{r}}).

Incidentally, Corollary 3.1 implies that only integral powers of \rho have nonzero coefficients in (18).

Given the entries on level \ell-2, Theorem 3.1 provides a method for computing each of the entries on level \ell. They can be computed in any order, apart from the slight complication of (iii), which requires knowledge of g_{u-1}^{(1)} as a prerequisite to g_{u}^{(0)}. The expression for g_{u}^{(0)}({\mathbf{a}},{\mathbf{b}},\mathbf{0}) is only known recursively in {\mathbf{a}} and {\mathbf{b}}, and we do not have a closed-form solution for this recursion for u\geq 4 and even. Equation (2.1) provides one example, which turns out to be the recursion for g_{4}^{(0)}({\mathbf{a}},{\mathbf{b}},\mathbf{0}). If the marginal one-locus sampling distributions are known, the complexity of computing {g}^{(m)}_{u}({\mathbf{a}},{\mathbf{b}},{\mathbf{r}}) for m>0 does not depend on the sample configuration ({\mathbf{a}},{\mathbf{b}},{\mathbf{r}}). In contrast, the complexity of computing g^{(0)}_{u}({\mathbf{a}},{\mathbf{b}},\mathbf{0}) depends on {\mathbf{a}} and {\mathbf{b}}, and the running time generally grows with sample size. However, in Section 6 we show that ignoring g^{(0)}_{u}({\mathbf{a}},{\mathbf{b}},\mathbf{0}) generally leads to little loss of accuracy in practice.

To illustrate the method, entries on the first two even levels are summarized in Table 1. These recapitulate part of the results given in Theorem 2.1. The last column of Table 1 gives the “contribution” to (7) from g_{u}^{(m)}({\mathbf{a}},{\mathbf{b}},{\mathbf{r}}) (for fixed u and m and summing over the relevant {\mathbf{r}}). According to (18), this quantity is

 \sum_{{\mathbf{r}}\in{\mathcal{P}}_{m}}\Biggl{[}\prod_{i=1}^{K}\prod_{j=1}^{L}% \pmatrix{c_{ij}\cr r_{ij}}\Biggr{]}g_{u}^{(m)}({\mathbf{a}}+{\mathbf{c}}_{A}-{% \mathbf{r}}_{A},{\mathbf{b}}+{\mathbf{c}}_{B}-{\mathbf{r}}_{B},{\mathbf{r}}). (21)

We have also checked that the total contribution from entries on level \ell=4 is equal to q_{2}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}), as given in Theorem 2.1. We note in passing that Theorem 3.1 makes it transparent why q_{2}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) is not universal in the sense described in Section 2.3: expressions on level \ell=4 depend directly on L^{(0)}, which in turn depends upon the model of mutation. By contrast, the nonzero contribution to q_{1}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}), for example, is determined by L^{(2)}, which does not depend on the model of mutation.

It is important to emphasize that g_{u}^{(m)}({\mathbf{a}},{\mathbf{b}},{\mathbf{r}}) is a function of the vectors {\mathbf{a}}, {\mathbf{b}}, the matrix {\mathbf{r}} and (implicitly) the parameters \theta_{A} and \theta_{B}. The relationships given in Theorem 3.1 are thus functional, and only need to be computed once. In other words, all of these arguments of g_{u}^{(m)} can remain arbitrary. It is not necessary to redo any of the algebraic computations for each particular choice of sample configuration, for example. Moreover, the solutions to each g_{u}^{(m)}({\mathbf{a}},{\mathbf{b}},{\mathbf{r}}) are expressed concisely in terms of the marginal one-locus sampling distributions q^{A} and q^{B}; this fact follows inductively from the solution for g_{0}^{(0)}({\mathbf{a}},{\mathbf{b}},\mathbf{0}). Unlike the method of Jenkins and Song (2009; 2010), the iterative procedure here is essentially the same at every step.

## 4 Partial sums, optimal truncation and Padé approximants

In principle, the procedure described in the previous section provides a method of computing an arbitrary number of terms in the asymptotic expansion (7), for any sample configuration. Suppose the computation has been carried out up to level \ell=2M and consider the partial sum

 q^{(M)}_{\mathrm{PS}}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}})=q_{0}({\mathbf{a% }},{\mathbf{b}},{\mathbf{c}})+\frac{q_{1}({\mathbf{a}},{\mathbf{b}},{\mathbf{c% }})}{\rho}+\cdots+\frac{q_{M}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}})}{\rho^{M% }}. (22)

Since we do not know its radius of convergence, we should be prepared for this sum to diverge eventually as M increases. An important question that we address in this section is: How many terms should we use to maximize the accuracy of the approximation?

### 4.1 Optimal truncation

As mentioned above, simply adding more and more terms to the asymptotic expansion may decrease the accuracy beyond a certain point. Optimal truncation is a rule of thumb for truncating the partial sum at a point that is expected to maximize its accuracy. More precisely, it is defined as follows. {definition}[(Optimal truncation rule)] Given the first M+1 terms q_{0}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}), q_{1}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}),\ldots,q_{M}({\mathbf{a}},{% \mathbf{b}},{\mathbf{c}}), in the asymptotic expansion, let M^{\prime} be the index such that |{q_{M^{\prime}}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}})}/{\rho{}^{M^{\prime}}% }|<|{q_{M^{\prime\prime}}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}})}/{\rho^{M^{% \prime\prime}}}|, for all M^{\prime\prime}\neq M^{\prime}, where M^{\prime},M^{\prime\prime}\leq M. Then, the optimal truncation rule (OTR) suggests truncating the sum at order M^{\prime}:

 q^{(M)}_{\mathrm{OTR}}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}})=q_{0}({\mathbf{% a}},{\mathbf{b}},{\mathbf{c}})+\frac{q_{1}({\mathbf{a}},{\mathbf{b}},{\mathbf{% c}})}{\rho}+\cdots+\frac{q_{M^{\prime}}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}% )}{\rho^{M^{\prime}}}.

The motivation for this rule is that the magnitude of the ith term in the expansion is an estimate of the magnitude of the remainder. More sophisticated versions of the OTR are available [e.g., Dingle (1973), Chapter XXI], but for simplicity we focus on the definition given above.

There are two issues with the OTR. First, it minimizes the error only approximately, and so, despite its name, it is not guaranteed to be optimal. For example, the magnitude of the first few terms in the series can behave very irregularly before a pattern emerges. Second, it may use only the first few terms in the expansion and discard the rest. As we discuss later, for some sample configurations and parameter values of interest, the OTR might truncate very early, even as early as M^{\prime}=2. This is unrelated to the first issue, since the series may indeed begin to diverge very early. Below, we discuss a better approximation scheme with a provable convergence property.

The key idea behind Padé approximants is to approximate the function of interest by a rational function. In contrast to the OTR, Padé approximants make use of all of the computed terms in the expansion, even when the expansion diverges rapidly. More precisely, the [U/V] Padé approximant of a function is defined as follows. {definition}[([U/V] Padé approximant)] Given a function f and two nonnegative integers U and V, the [U/V] Padé approximant of f is a rational function of the form

 [U/V]_{f}(x)=\frac{A_{0}+A_{1}x+\cdots+A_{U}x^{U}}{B_{0}+B_{1}x+\cdots+B_{V}x^% {V}},

such that B_{0}=1 and

 f(x)-[U/V]_{f}(x)=O(x^{U+V+1}).

That is, the first U+V+1 terms in a Maclaurin series of the Padé approximant [U/V]_{f}(x) matches the first U\!+\!V\!+\!1 terms in a Maclaurin series of f.

Our goal is to approximate the sampling distribution q({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) by the Padé approximant

such that the first U+V+1 terms in a Maclaurin series of [U/V]_{q({\mathbf{a}},{\mathbf{b}},{\mathbf{c}})}(\frac{1}{\rho}) agrees with (22), where M=U+V. [In this notation, [U/V]_{q({\mathbf{a}},{\mathbf{b}},{\mathbf{c}})}(\frac{1}{\rho}) is an implicit function of the mutation parameters.] As more terms in (22) are computed (i.e., as M increases), a sequence of Padé approximants can be constructed. This sequence often has much better convergence properties than (22) itself [Baker and Graves-Morris (1996)].

For a given M, there is still some freedom over the choice of U and V. As M increases, we construct the following “staircase” sequence of Padé approximants: [0/0],[0/1],[1/1],[1/2],[2/2],\ldots. This scheme is motivated by the following lemma, proved in Section 7.3.

###### Lemma 4.1

Under a neutral, finite-alleles model, the sampling distribution q({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) is a rational function of 1/\rho, and the degree of the numerator is equal to the degree of the denominator.

This simple yet powerful observation immediately leads to a convergence result for the Padé approximants in the following manner.

###### Theorem 4.1

Consider a neutral, finite-alleles model. For every given two-locus sample configuration ({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}), there exists a finite nonnegative integer U_{0} such that for all U\geq U_{0} and V\geq U_{0}, the Padé approximant q^{[U/V]}_{\mathrm{Pad}\mbox{{{\'{e}}}}}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) is exactly equal to q({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) for all \rho\geq 0.

A proof of this theorem is provided in Section 7.4. Note that the staircase sequence is the “quickest” to reach the true sampling distribution q({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}). Although Theorem 4.1 provides very strong information about the convergence of the Padé approximants, in practice U and V will have to be intractably large for such convergence to take place. The real value of Padé summation derives from the empirical observation that the approximants exhibit high accuracy even before they hit the true sampling distribution. The staircase sequence also has the advantage that it exhibits a continued fraction representation, which enables their construction to be made computationally more efficient [Baker and Graves-Morris (1996), Chapter 4].

## 5 Incorporating selection

We now incorporate a model of diploid selection into the results of Section 3. Suppose that a diploid individual is composed of two haplotypes (i,j), (k,l)\in[K]\times[L], and that, without loss of generality, selective differences exist at locus A. We denote the fitness of this individual by 1+s^{A}_{ik}, and consider the diffusion limit in which \sigma^{A}_{ik}=4Ns^{A}_{ik} is held fixed for each i,k\in[K] as N\to\infty.

In what follows, we use a subscript “s” to denote selectively nonneutral versions of the quantities defined above. Results will be given in terms of the nonneutral one-locus sampling distribution q^{A}_{\mathrm{s}} at locus A and the neutral one-locus sampling distribution q^{B} at locus B.

### 5.1 One-locus sampling distribution under selection

For the infinite-alleles model, one-locus sampling distributions under symmetric selection have been studied by Grote and Speed (2002), Handa (2005) and Huillet (2007). In the case of a parent-independent finite-alleles model, the stationary distribution of the one-locus selection model is known to be a weighted Dirichlet distribution [Wright (1949)],

 \pi^{A}_{\mathrm{s}}({\mathbf{x}})=D\Biggl{(}\prod_{i=1}^{K}x_{i}^{\theta_{A}P% ^{A}_{i}-1}\Biggr{)}\exp\Biggl{(}\frac{1}{2}\sum_{i=1}^{K}\sum_{k=1}^{K}\sigma% ^{A}_{ik}x_{i}x_{k}\Biggr{)},

where {\mathbf{x}}\in\Delta_{K} [see (2)] and D is a normalizing constant. The one-locus sampling distribution at stationarity is then obtained by drawing a multinomial sample from this distribution:

 q^{A}_{\mathrm{s}}({\mathbf{a}})=\mathbb{E}_{{\mathrm{s}}}\Biggl{[}\prod_{i=1}% ^{K}X_{i}^{a_{i}}\Biggr{]}. (24)

Thus, under a diploid selection model with parent-independent mutation, we are able to express the one-locus sampling distribution at least in integral form. There are two integrals that need to be evaluated: one for the expectation and the other for the normalizing constant. In practice, these integrals must be evaluated using numerical methods [Buzbas, Joyce and Abdo (2009)].

### 5.2 Two-locus sampling distribution with one locus under selection

To incorporate selection into our framework, we first introduce some further notation. {definition} Given two-locus population-wide allele frequencies {\mathbf{x}}\in\Delta_{K\times L} [see (4)], the mean fitness of the population at locus A is

 \bar{\sigma}^{A}({\mathbf{x}}_{A})=\sum_{i,k}\sigma^{A}_{ik}x_{i\cdot}x_{k% \cdot}.

Selection has an additive effect on the generator of the process, which is now given by

 \mathscr{L}_{{\mathrm{s}}}=\mathscr{L}+\frac{1}{2}\sum_{i,j}x_{ij}\biggl{[}% \sum_{k}\sigma^{A}_{ik}x_{k\cdot}-\bar{\sigma}^{A}({\mathbf{x}}_{A})\biggr{]}% \,\frac{\partial}{\partial x_{ij}},

where \mathscr{L} is the generator (12) of the neutral diffusion process [see, e.g., Ethier and Nagylaki (1989)]. Rewriting \mathscr{L}_{\mathrm{s}} in terms of the LD variables and then rescaling d_{ij} as before, we obtain

 \widetilde{\mathscr{L}}_{\mathrm{s}}=\widetilde{\mathscr{L}}+\frac{1}{2}\biggl% {[}\rho L_{\mathrm{s}}^{(2)}+\sqrt{\rho}L_{\mathrm{s}}^{(1)}+L_{\mathrm{s}}^{(% 0)}+\frac{1}{\sqrt{\rho}}L_{\mathrm{s}}^{(-1)}\biggr{]},

where \widetilde{\mathscr{L}} is as in (14), and the new contributions are

 \displaystyle L_{\mathrm{s}}^{(2)} \displaystyle= \displaystyle 0, \displaystyle L_{\mathrm{s}}^{(1)} \displaystyle= \displaystyle 0, \displaystyle L_{\mathrm{s}}^{(0)} \displaystyle= \displaystyle\sum_{i,j}\biggl{[}d_{ij}\biggl{(}\sum_{k}\sigma_{ik}^{A}x_{k% \cdot}-\bar{\sigma}^{A}({\mathbf{x}}_{A})\biggr{)}-\sum_{k,k^{\prime}}d_{kj}% \sigma_{kk^{\prime}}^{A}x_{i\cdot}x_{k^{\prime}\cdot}\biggr{]}\,\frac{\partial% }{\partial d_{ij}} \displaystyle{}+\sum_{i}x_{i\cdot}\biggl{(}\sum_{k}\sigma_{ik}^{A}x_{k\cdot}-% \bar{\sigma}^{A}({\mathbf{x}}_{A})\biggr{)}\,\frac{\partial}{\partial x_{i% \cdot}}, \displaystyle L_{\mathrm{s}}^{(-1)} \displaystyle= \displaystyle\sum_{i,j,k}\widetilde{d}_{ij}\sigma_{ik}^{A}x_{k\cdot}\,\frac{% \partial}{\partial x_{\cdot j}}.

In addition, we replace the asymptotic expansion (17) with

 \displaystyle\mathbb{E}_{\mathrm{s}}\bigl{[}G^{(m)}(\widetilde{\mathbf{X}};{% \mathbf{a}},{\mathbf{b}},{\mathbf{r}})\bigr{]} \displaystyle= \displaystyle h_{0}^{(m)}({\mathbf{a}},{\mathbf{b}},{\mathbf{r}})+\frac{h_{1}^% {(m)}({\mathbf{a}},{\mathbf{b}},{\mathbf{r}})}{\sqrt{\rho}} \displaystyle{}+\frac{h_{2}^{(m)}({\mathbf{a}},{\mathbf{b}},{\mathbf{r}})}{% \rho}+\cdots,

the corresponding expansion for the expectation of G^{(m)}(\widetilde{\mathbf{X}};{\mathbf{a}},{\mathbf{b}},{\mathbf{r}}) with respect to the stationary distribution under selection at locus A. Finally, the boundary conditions (3.3) become the following [Fearnhead (2003)]:

where \bolds{\phi}^{A}=(\phi^{A}_{i})_{i\in[K]} is the stationary distribution for drawing a sample of size one from a single selected locus.

With only minor modifications to the arguments of Section 3, each term in the array for h_{u}^{(m)} can be computed in a manner similar to Theorem 3.1. In particular, entries on odd levels are still zero. Furthermore, as proved in Section 7.5, we can update Theorem 2.1 as follows.

###### Theorem 5.1

Suppose locus A is under selection, while locus B is selectively neutral. Then, in the asymptotic expansion (7) of the two-locus sampling distribution, the zeroth- and first-order terms are given by (8) and (9), respectively, with q^{A}_{\mathrm{s}}({\mathbf{a}}) in place of q^{A}({\mathbf{a}}). Furthermore, the second-order term (10) may now be decomposed into two parts:

 q_{2,{\mathrm{s}}}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}})=q_{2,{\mathrm{s}}}(% {\mathbf{a}}+{\mathbf{c}}_{A},{\mathbf{b}}+{\mathbf{c}}_{B},\mathbf{0})+\sigma% _{\mathrm{s}}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}), (26)

where \sigma_{\mathrm{s}}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) is given by a known analytic expression and q_{2,{\mathrm{s}}}({\mathbf{a}},{\mathbf{b}},\mathbf{0}) satisfies a slightly modified version of the recursion relation (2.1) for q_{2}({\mathbf{a}},{\mathbf{b}},\mathbf{0}). (These expressions are omitted for brevity.)

We remark that the above arguments can be modified to allow for locus B also to be under selection, provided the selection is independent, with no epistatic interactions, and provided one can substitute \phi_{j}^{B} for \pi_{j}^{B} in (5.2). Then, one could also simply substitute q^{B}_{\mathrm{s}}({\mathbf{b}}) for q^{B}({\mathbf{b}}) in the expressions for q_{0}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) and q_{1}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}). However, for the boundary conditions (5.2) to be modified in this way we would need to extend the result of Fearnhead (2003) to deal with two nonneutral loci, and we are unaware of such a result in the literature.

## 6 Empirical study of accuracy

In this section, we study empirically the accuracy of the approximate sampling distributions discussed in Section 4.

### 6.1 Computational details

As discussed earlier, a major advantage of our technique is that, given the first M terms in the asymptotic expansion (7), the (M+1)th term can be found and has to be computed only once. There are two complications to this statement: first, as mentioned in the discussion following Theorem 3.1, the Mth-order term q_{M} for M\geq 2 has a contribution [namely, g^{(0)}_{2M}({\mathbf{a}},{\mathbf{b}},\mathbf{0})] that is not known in closed form, and is only given recursively. [Recall that the M=1 case is an exception, with {g}^{(0)}_{2}({\mathbf{a}},{\mathbf{b}},\mathbf{0})=0 for all ({\mathbf{a}},{\mathbf{b}},\mathbf{0}).] In Jenkins and Song (2009) it was observed that the contribution of {g}^{(0)}_{4}({\mathbf{a}},{\mathbf{b}},\mathbf{0}) to q_{2}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) is generally very small, but that its burden in computational time increases with sample size. Extrapolating this observation to higher-order terms, we consider making the following approximation. {approximation} For all M\geq 2, assume

 {g}^{(0)}_{2M}({\mathbf{a}},{\mathbf{b}},\mathbf{0})\approx 0

for all configurations ({\mathbf{a}},{\mathbf{b}},\mathbf{0}).

As we show presently, adopting this approximation has little effect on the accuracy of asymptotic sampling distributions. In what follows, we use the symbol “\accentset{\circ}{\phantom{q}}” to indicate when the above approximation is employed. For example, the partial sum q^{(M)}_{\mathrm{PS}}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) in (22) becomes \accentset{\circ}{q}^{(M)}_{\mathrm{PS}}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) under Approximation 6.1.

Upon making the above approximation, it is then possible to construct a closed-form expression for each subsequent term \accentset{\circ}{q}_{M}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}). However, there is a second issue: Given the effort required to reach the complicated expression for \accentset{\circ}{q}_{2}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) [Jenkins and Song (2009)], performing the computation by hand for M>2 does not seem tractable. Symbolic computation using computer software such as Mathematica is a feasible option, but we defer this for future work. Here, we are interested in comparing the accuracy of asymptotic sampling distributions with the true likelihood. Therefore, we have implemented an automated numerical computation of each subsequent term in the asymptotic expansion, for a given fixed sample configuration and fixed mutation parameters. For the samples investigated below, this did not impose undue computational burden, even when repeating this procedure across all samples of a given size. Exact numerical computation of the true likelihood is possible for only small sample sizes (say, up to thirty), so we restrict our study to those cases.

For simplicity, we assume in our empirical study that all alleles are selectively neutral. Furthermore, we assume a symmetric, PIM model so that \mathbf{P}^{A}=\mathbf{P}^{B}=\left({1/2\atop 1/2}\enskip{1/2\atop 1/2}\right) and take \theta_{A}=\theta_{B}=0.01. This is a reasonable approximation for modeling neutral single nucleotide polymorphism (SNP) data in humans [e.g., McVean, Awadalla and Fearnhead (2002)]. For the PIM model, recall that the marginal one-locus sampling distributions are available in closed-form, as shown in (6).

### 6.2 Rate of convergence: An example

To compare the convergence of the sequence of partial sums (22) with that of the sequence of Padé approximants (23), we re-examine in detail an example studied previously. The sample configuration for this example is {\mathbf{a}}={\mathbf{b}}=\mathbf{0}, {\mathbf{c}}=\left({10\atop 2}\enskip{7\atop 1}\right). In Jenkins and Song (2009), we were able to compute the first three terms in the asymptotic expansion, obtaining the partial sum q^{(2)}_{\mathrm{PS}}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}). Applying the new method described in this paper, we computed q^{(M)}_{\mathrm{PS}}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) for M\leq 11 [including the recursive terms g^{(0)}_{u}({\mathbf{a}},{\mathbf{b}},\mathbf{0}) discussed above], and also the corresponding staircase Padé approximants q^{[U/V]}_{\mathrm{Pad}\mbox{{{\'{e}}}}}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}). Results are illustrated in Figure 2, in which we compare various approximations of the likelihood curve for \rho with the true likelihood curve. Here, the likelihood of a sample is defined simply as its sampling distribution q({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) treated as a function of \rho, with \theta_{A} and \theta_{B} fixed at 0.01.

Figure 2(a) exhibits a number of features which we also observed more generally for many other samples. For fixed \rho in the range illustrated, the sequence (q^{(0)}_{\mathrm{PS}},q^{(1)}_{\mathrm{PS}},q^{(2)}_{\mathrm{PS}},\ldots) of partial sums diverges eventually, and, for many realistic choices of \rho, this divergence can be surprisingly early. Arguably the best overall fit in the figure is q^{(2)}_{\mathrm{PS}}, though for \rho\geq 70 good accuracy is maintained by q^{(M)}_{\mathrm{PS}} for all 1\leq M\leq 10. Divergence is extremely rapid if we add any further terms; witness the curve for q^{(11)}_{\mathrm{PS}}. Of course, in real applications the true likelihood will be unavailable and we might rely on the aforementioned optimal truncation rule to guide the truncation point. Here, it performs reasonably well, correctly picking the most accurate index across most of the range [compare the bottom two rows of indices in Figure 2(a)].

In contrast, Figure 2(b) shows that there is much less risk in using “too many” terms in constructing the Padé approximants. They approximate the true likelihood very accurately and very quickly; to the naked eye, Padé approximants q^{[U/V]}_{\mathrm{Pad}\mbox{{{\'{e}}}}} for most [U/V] are indistinguishable from the true likelihood curve. Indeed, this example suggests that there is very little gain in accuracy beyond q^{[0/1]}_{\mathrm{Pad}\mbox{{{\'{e}}}}}, but that there is no significant loss beyond it either. (It should be pointed out, however, that achievement of such high accuracy so early in the sequence of approximants does not seem to be a common occurrence across samples. Usually, several more terms are required; see next section for further details.)

There is one further important observation to be made regarding Padé approximants. There is nothing to prevent the polynomial in the denominator from having positive real roots, thus creating singularities. This is indeed observed once in Figure 2(b); q^{[2/3]}_{\mathrm{Pad}\mbox{{{\'{e}}}}} exhibits a singularity at \rho=0.9. To examine this behavior further, in Table 2 we tabulate the nonnegative real roots of the numerator and denominator of each approximant in the staircase sequence up to [10/10]. We see some interesting patterns: (i) The total number of nonnegative real roots does not seem to grow with M. (ii) Roots in the denominator are almost invariably accompanied by a nearby root in the numerator, and their proximity is usually extremely close, with agreement to several decimal places. (iii) Pairs of roots appear transiently; the roots of one approximant in the sequence provide almost no information regarding the next. Such pairs of roots are known as defects, and are an inevitable feature of Padé approximants [Baker and Graves-Morris (1996), page 48]. Provided we anticipate them, defects need not act as a serious obstacle. Because of the proximity of the two roots in a pair, they almost behave like removable singularities. In particular, Padé approximants are still well behaved outside a reasonably narrow interval around the defect, and can still approximate the likelihood accurately. In light of these observations, henceforth we use the following simple heuristic for dealing with defects. {definition}[(Defect heuristic)] Suppose we are interested in approximating the likelihood curve at a particular value \rho_{0} and have the resources to compute up to M in the partial sum (22). Then proceed as follows. {longlist}[(3)] (1) Initialize with M^{\prime\prime}=M. (2) Construct the [U/V] Padé approximant in the staircase sequence, where U+V=M^{\prime\prime}. (3) If it exhibits a root in the interval (\rho_{0}-\varepsilon,\rho_{0}+\varepsilon)\cap[0,\infty), either in its numerator or denominator, then decrement M^{\prime\prime} by one and go to step (2), otherwise use this approximant. Choice of threshold \varepsilon involves a trade-off between the disruption caused by the nearby defect and the loss incurred by reverting to the previous Padé approximant. Throughout the following section, we employ this heuristic with \varepsilon=25, which seemed to work well.

### 6.3 Rate of convergence: Empirical study

To investigate to what extent the observations of Section 6.2 hold across all samples, we performed the following empirical study. Following Jenkins and Song (2009), we focused on samples of the form (\mathbf{0},\mathbf{0},{\mathbf{c}}) for which all alleles are observed at both loci, and measured the accuracy of the partial sum (22) by the unsigned relative error

 e_{\mathrm{PS}}^{(M)}(\mathbf{0},\mathbf{0},{\mathbf{c}})=\biggl{|}\frac{q^{(M% )}_{\mathrm{PS}}(\mathbf{0},\mathbf{0},{\mathbf{c}})-q(\mathbf{0},\mathbf{0},{% \mathbf{c}})}{q(\mathbf{0},\mathbf{0},{\mathbf{c}})}\biggr{|}\times 100\%.

An analogous definition can be made for e_{{\mathrm{Pad}\mbox{{{\'{e}}}}}}^{(M)}, the unsigned relative error of the staircase Padé approximants q^{[U/V]}_{\mathrm{Pad}\mbox{{{\'{e}}}}}, where U=\lfloor M/2\rfloor and V=\lceil M/2\rceil. When Approximation 6.1 is additionally used, the respective unsigned errors are denoted \accentset{\circ}{e}_{\mathrm{PS}}^{(M)} and \accentset{\circ}{e}_{\mathrm{Pad}\mbox{{{\'{e}}}}}^{(M)}. These quantities are implicit functions of the parameters and of the sample configuration. In our study, we focused on sufficiently small sample sizes so that the true sampling distribution q({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) could be computed. Specifically, we computed q(\mathbf{0},\mathbf{0},{\mathbf{c}}) for all samples of a given size c using the method described in Jenkins and Song (2009). By weighting samples according to their sampling probability, we may compute the distributions of e_{\mathrm{PS}}^{(M)} and e_{\mathrm{Pad}\mbox{{{\'{e}}}}}^{(M)}, and similarly the distributions of \accentset{\circ}{e}_{\mathrm{PS}}^{(M)} and \accentset{\circ}{e}_{\mathrm{Pad}\mbox{{{\'{e}}}}}^{(M)}. Table 3 summarizes the cumulative distributions of \accentset{\circ}{e}_{\mathrm{PS}}^{(M)} and \accentset{\circ}{e}_{\mathrm{Pad}\mbox{{{\'{e}}}}}^{(M)} for \rho=50, across all samples of size c=20 that are dimorphic at both loci. The corresponding table for e_{\mathrm{PS}}^{(M)} and e_{\mathrm{Pad}\mbox{{{\'{e}}}}}^{(M)} was essentially identical (not shown), with agreement usually to two decimal places. This confirms that utilizing Approximation 6.1 is justified.

Table 3 illustrates that the observations of Section 6.2 hold much more generally, as described below. {longlist}[(3)] (1) The error \accentset{\circ}{e}_{\mathrm{PS}}^{(M)} for partial sums is not a monotonically decreasing function of M; that is, the accuracy of \accentset{\circ}{q}^{(M)}_{\mathrm{PS}} improves as one adds more terms up to a certain point, before quickly becoming very inaccurate. (2) Empirically, the actual optimal truncation point for the parameter settings we considered is at M^{\prime}=2 or M^{\prime}=3, which perform comparably. Moreover, both provide consistently higher accuracy than employing the OTR, which is a serious issue when we wish to use this rule without external information about which truncation point really is the most accurate. (3) Overall, using Padé approximants is much more reliable. Note that the accuracy of Padé approximants continues to improve as we incorporate more terms. For a sample drawn at random, the probability that its Padé approximant is within 1\% of the true sampling distribution is 1.00, compared to 0.59 or 0.57 for truncating the partial sums, respectively, at M^{\prime}=2 or M^{\prime}=3, and only 0.25 for using the OTR. For the remainder of this section, we focus on the accuracy of the staircase Padé approximants. It was shown in Jenkins and Song (2009) that the accuracy of the partial sum q^{(2)}_{\mathrm{PS}} increases with \rho, but (perhaps surprisingly) decreases with increasing sample size. In Tables 4 and 5, we address the same issue for the

Padé approximant. Table 4 confirms that accuracy increases with \rho, as one might expect. Furthermore, it is also the case that substantial accuracy is achievable even for moderate values of \rho (say, \rho=25), provided that sufficiently many terms are utilized in the construction of the Padé approximant. For example, when M=5 the probability that the Padé approximant of a sample drawn at random is within 5\% of the truth is 0.94. Also very encouraging is the pattern shown in Table 5. Provided that sufficiently many terms are used, the accuracy of the Padé approximant is only slightly affected by looking at larger sample sizes. For example, when M=5 the probability that the Padé approximant of a randomly drawn sample of size 30 is within 5\% of the truth is 0.99, compared with 1.00 for a sample of size 20. This loss in accuracy is much less severe than the corresponding loss in accuracy of the partial sums; for q^{(M)}_{\mathrm{PS}}, the highest accuracy is achieved for M=2, in which case which the corresponding loss in accuracy is from 0.87 when c=20, to 0.70 when c=30.

## 7 Proofs

In this section, we provide proofs of the results presented earlier.

### 7.1 Proof of Theorem 3.1

For an infinitesimal generator \mathscr{A} of a diffusion process on the state space \Omega and a twice continuously differentiable function h\dvtx\Omega\to\mathbb{R} with compact support, it is well known that

 \mathbb{E}[\mathscr{A}h(\mathbf{X})]=0,

where expectation is with respect to the stationary distribution of \mathbf{X}. Apply this result to the generator \widetilde{\mathscr{L}} shown in (14) and monomial G^{(m)}(\widetilde{{\mathbf{x}}};{\mathbf{a}},{\mathbf{b}},{\mathbf{r}}) shown in (16). This provides a linear equation relating the expectations \mathbb{E}[G^{(m+1)}(\widetilde{\mathbf{X}};{\mathbf{a}}^{\prime}, {\mathbf{b}}^{\prime},{\mathbf{r}}^{\prime})], \mathbb{E}[G^{(m)}(\widetilde{\mathbf{X}};{\mathbf{a}}^{\prime},{\mathbf{b}}^{% \prime},{\mathbf{r}}^{\prime})], \mathbb{E}[G^{(m-1)}(\widetilde{\mathbf{X}};{\mathbf{a}}^{\prime},{\mathbf{b}}% ^{\prime},{\mathbf{r}}^{\prime})] and\mathbb{E}[G^{(m-2)}(\widetilde{\mathbf{X}};{\mathbf{a}}^{\prime},{\mathbf{b}}% ^{\prime}, {\mathbf{r}}^{\prime})], each appearing with various different arguments ({\mathbf{a}}^{\prime},{\mathbf{b}}^{\prime},{\mathbf{r}}^{\prime}) depending on ({\mathbf{a}},{\mathbf{b}},{\mathbf{r}}); we omit the simple but algebraically lengthy details. Now, for these four choices of m we substitute the proposed expansion (17). If m>0, then compare the coefficients of \rho^{1-{u/2}} in the resulting expression; if m=0, then compare the coefficients of \rho^{-{u/2}}. We then obtain the following:

i(i) If m=0 and u=0, then the resulting expression is

with boundary conditions given by (3.3). This is the sum of two copies of a familiar recursion [Griffiths and Tavaré (1994)] for the sampling distribution of a single locus, one for locus A and one for locus B. In our notation the solution is, therefore,

 g^{(0)}_{0}({\mathbf{a}},{\mathbf{b}},\mathbf{0})=q^{A}({\mathbf{a}})q^{B}({% \mathbf{b}}).

(ii) If m>0, then the resulting expression is

Equation (7.1) relates g_{u}^{(m)}({\mathbf{a}},{\mathbf{b}},{\mathbf{r}}) to the known expressions g_{u}^{(m-2)}, g_{u-1}^{(m-1)}, g_{u-2}^{(m)} and g_{u-3}^{(m+1)}, as claimed.

(iii) If m=0 and u\geq 1, then the resulting expression is

with boundary conditions (3.3). Hence, this provides a recursion relation for g^{(0)}_{u}({\mathbf{a}},{\mathbf{b}},\mathbf{0}) when g^{(1)}_{u-1} is known.

### 7.2 Proof of Corollary 3.1

For the base case \ell=1, note that if m=1 and u=0 then (7.1) simplifies to

 g^{(1)}_{0}({\mathbf{a}},{\mathbf{b}},{\mathbf{r}})=0,

and hence, if m=0 and u=1 then (7.1) simplifies to

with boundary conditions (3.3). This has solution

 g^{(0)}_{1}({\mathbf{a}},{\mathbf{b}},\mathbf{0})=0,

which is unique [since q({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) is unique]. This completes \ell=1. Now suppose inductively that g_{u}^{(m)}({\mathbf{a}},{\mathbf{b}},{\mathbf{r}})=0 for every m, u such that m+u=\ell-2 where \ell is a fixed odd number greater than 1. Then for m, u such that m+u=\ell, (7.1) becomes

 g_{u}^{(m)}({\mathbf{a}},{\mathbf{b}},{\mathbf{r}})=0

as required.

### 7.3 Proof of Lemma 4.1

In what follows, define the length of a sample configuration ({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) to be a+b+2c. Under a neutral, finite-alleles model, the probability of a sample with length \delta satisfies a closed system of equations [e.g., see equation (5) of Jenkins and Song (2009)] which can be expressed in matrix form:

 \mathbf{M}{\mathbf{q}}={\mathbf{v}},

where {\mathbf{q}} is a vector composed of the probabilities of samples of length less than or equal to \delta, {\mathbf{v}} is a constant vector of the same dimension as {\mathbf{q}} and \mathbf{M} is an invertible matrix (since the solution to this equation is unique). The entries of \mathbf{M} and {\mathbf{v}} are rational functions of \rho, and hence, {\mathbf{q}}=\mathbf{M}^{-1}{\mathbf{v}} is a vector each of whose entries is a rational function of \rho.

Let U_{0} denote the degree of the numerator, and V_{0} the degree of the denominator. If U_{0}>V_{0}, then q({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) becomes unbounded as \rho\to\infty, while if V_{0}>U_{0} then q({\mathbf{a}},{\mathbf{b}},{\mathbf{c}})\to 0 as \rho\to\infty. But we know that q({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) is a probability, and hence, bounded. Moreover, it has support over all samples of a fixed size, since we assume that \mathbf{P}^{A} and \mathbf{P}^{B} are irreducible. Thus, to ensure \lim_{\rho\to\infty}q({\mathbf{a}},{\mathbf{b}},{\mathbf{c}})\in(0,1) we must have U_{0}=V_{0}. By a similar argument as \rho\to 0, we must have that the coefficients of \rho^{0} are nonzero both in the numerator and in the denominator. We can, therefore, divide the numerator and denominator by \rho^{U_{0}} to obtain a rational function of 1/\rho whose degree in the numerator and denominator are both U_{0} (=V_{0}).

### 7.4 Proof of Theorem 4.1

This is an application of Theorem 1.4.4 of Baker and Graves-Morris (1996), which we spell out for completeness. By Lemma 4.1, q({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) is a rational function of 1/\rho and is analytic at \rho=\infty with Taylor series (7). Denote the degree of its numerator and denominator by U_{0}. Then, q({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) has U_{0}+U_{0}+1 independent coefficients determined by the first U_{0}+U_{0}+1 terms of its Taylor series expansion. Thus, provided U\geq U_{0} and V\geq U_{0}, by the definition of the [U/V] Padé approximant, it must coincide uniquely with q({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}).

### 7.5 Proof of Theorem 5.1

This is simply an application of Theorem 3.1 applied to the generator for the diffusion under selection, \widetilde{\mathscr{L}}_{\mathrm{s}}, rather than \widetilde{\mathscr{L}}, and so we just summarize the procedure.

The change in generator results in slight modifications to the relationships between the g^{(m)}_{u}({\mathbf{a}},{\mathbf{b}},{\mathbf{r}}) in order to obtain the relationships between the h^{(m)}_{u}({\mathbf{a}},{\mathbf{b}},{\mathbf{r}}) for each m and u: {longlist}[(1)] (1) h^{(0)}_{0}({\mathbf{a}},{\mathbf{b}},\mathbf{0}) satisfies (7.1) (replacing each g^{(0)}_{0} with h^{(0)}_{0}), but with extra terms {}+\sum_{i,k}a_{i}\biggl{[}\sigma_{ik}h_{0}^{(0)}({\mathbf{a}}+{\mathbf{e}}_{k% },{\mathbf{b}},\mathbf{0})-\sum_{k^{\prime}}\sigma^{A}_{kk^{\prime}}h_{0}^{(0)% }({\mathbf{a}}+{\mathbf{e}}_{k}+{\mathbf{e}}_{k^{\prime}},{\mathbf{b}},\mathbf% {0})\biggr{]} on the right-hand side. The solution is h_{0}^{(0)}({\mathbf{a}},{\mathbf{b}},\mathbf{0})=q^{A}_{\mathrm{s}}({\mathbf{% a}})q^{B}({\mathbf{b}}).

• (2)

For m>0, h^{(0)}_{0}({\mathbf{a}},{\mathbf{b}},\mathbf{0}) satisfies (7.1) but with extra terms

on the right-hand side. (3) For m=0 and u\geq 1, h^{(0)}_{0}({\mathbf{a}},{\mathbf{b}},\mathbf{0}) satisfies (7.1) but with extra terms \displaystyle{}+\sum_{i,k}\bigl{[}a_{i}\sigma_{ik}^{A}h^{(0)}_{u}({\mathbf{a}}% +{\mathbf{e}}_{k},{\mathbf{b}},\mathbf{0})-a\sigma_{ik}^{A}h^{(0)}_{u}({% \mathbf{a}}+{\mathbf{e}}_{i}+{\mathbf{e}}_{k},{\mathbf{b}},\mathbf{0})\bigr{]} \displaystyle\qquad{}+\sum_{i,j,k}b_{j}\sigma_{ik}^{A}h^{(1)}_{u-1}({\mathbf{a% }}+{\mathbf{e}}_{k},{\mathbf{b}}-{\mathbf{e}}_{j},{\mathbf{e}}_{ij}) on the right-hand side. Using these equations to evaluate h^{(m)}_{u}({\mathbf{a}},{\mathbf{b}},{\mathbf{r}}) on levels \ell=0,1,\ldots,4 provides expressions for the nonneutral versions of q_{0}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}), q_{1}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) and q_{2}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}). Those for q_{0}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) and q_{1}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}) in terms of the relevant one-locus sampling distributions are unchanged, while the new generator makes some minor modifications to the expression for q_{2}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}). The analytic part of this term, \sigma_{\mathrm{s}}({\mathbf{a}},{\mathbf{b}},{\mathbf{c}}), is easily calculated from h^{(4)}_{0}, h^{(3)}_{1}, h^{(2)}_{2} and h^{(1)}_{3}, while the recursive part, q_{2,{\mathrm{s}}}({\mathbf{a}}+{\mathbf{c}}_{A},{\mathbf{b}}+{\mathbf{c}}_{B}% ,\mathbf{0}), follows from h^{(0)}_{4}.

• ## Acknowledgments

We gratefully acknowledge Anand Bhaskar for implementing the algorithm stated in Theorem 3.1 and for checking some of our formulas. We also thank Matthias Steinrücken and Paul Fearnhead for useful discussion.

## References

• Baker and Graves-Morris (1996) {bbook}[author] \bauthor\bsnmBaker, \bfnmG. A.\binitsG. A. \AND\bauthor\bsnmGraves-Morris, \bfnmP.\binitsP. (\byear1996). \btitlePadé Approximants, \bedition2nd ed. \bseriesEncyclopedia of Mathematics and its Applications \bvolume59. \bpublisherCambridge Univ. Press, \baddressCambridge. \MR1383091 \endbibitem
• Buzbas, Joyce and Abdo (2009) {barticle}[author] \bauthor\bsnmBuzbas, \bfnmE. O.\binitsE. O., \bauthor\bsnmJoyce, \bfnmP.\binitsP. \AND\bauthor\bsnmAbdo, \bfnmZ.\binitsZ. (\byear2009). \btitleEstimation of selection intensity under overdominance by Bayesian methods. \bjournalStat. Appl. Genet. Mol. Biol. \bvolume8 \bpagesArt. 32, 24. \MR2533640 \endbibitem
• Dingle (1973) {bbook}[author] \bauthor\bsnmDingle, \bfnmR. B.\binitsR. B. (\byear1973). \btitleAsymptotic Expansions: Their Derivation and Interpretation. \bpublisherAcademic Press, \baddressNew York. \MR0499926 \endbibitem
• Ethier (1979) {barticle}[author] \bauthor\bsnmEthier, \bfnmS. N.\binitsS. N. (\byear1979). \btitleA limit theorem for two-locus diffusion models in population genetics. \bjournalJ. Appl. Probab. \bvolume16 \bpages402–408. \MR0531773 \endbibitem
• Ethier and Griffiths (1990) {barticle}[author] \bauthor\bsnmEthier, \bfnmS. N.\binitsS. N. \AND\bauthor\bsnmGriffiths, \bfnmR. C.\binitsR. C. (\byear1990). \btitleOn the two-locus sampling distribution. \bjournalJ. Math. Biol. \bvolume29 \bpages131–159. \MR1116000 \endbibitem
• Ethier and Nagylaki (1989) {barticle}[author] \bauthor\bsnmEthier, \bfnmS. N.\binitsS. N. \AND\bauthor\bsnmNagylaki, \bfnmT.\binitsT. (\byear1989). \btitleDiffusion approximations of the two-locus Wright–Fisher model. \bjournalJ. Math. Biol. \bvolume27 \bpages17–28. \MR0984223 \endbibitem
• Ewens (1972) {barticle}[author] \bauthor\bsnmEwens, \bfnmW. J.\binitsW. J. (\byear1972). \btitleThe sampling theory of selectively neutral alleles. \bjournalTheoretical Population Biology \bvolume3 \bpages87–112. \bnoteErratum, ibid. 3 (1972) 240; erratum, ibid. 3 (1972) 376. \MR0325177 \endbibitem
• Fearnhead (2003) {barticle}[author] \bauthor\bsnmFearnhead, \bfnmP.\binitsP. (\byear2003). \btitleHaplotypes: The joint distribution of alleles at linked loci. \bjournalJ. Appl. Probab. \bvolume40 \bpages505–512. \MR1978106 \endbibitem
• Golding (1984) {barticle}[author] \bauthor\bsnmGolding, \bfnmG. B.\binitsG. B. (\byear1984). \btitleThe sampling distribution of linkage disequilibrium. \bjournalGenetics \bvolume108 \bpages257–274. \endbibitem
• Griffiths and Tavaré (1994) {barticle}[author] \bauthor\bsnmGriffiths, \bfnmR. C.\binitsR. C. \AND\bauthor\bsnmTavaré, \bfnmS.\binitsS. (\byear1994). \btitleSimulating probability distributions in the coalescent. \bjournalTheoretical Population Biology \bvolume46 \bpages131–159. \endbibitem
• Grote and Speed (2002) {barticle}[author] \bauthor\bsnmGrote, \bfnmM. N.\binitsM. N. \AND\bauthor\bsnmSpeed, \bfnmT. P.\binitsT. P. (\byear2002). \btitleApproximate Ewens formulae for symmetric overdominance selection. \bjournalAnn. Appl. Probab. \bvolume12 \bpages637–663. \MR1910643 \endbibitem
• Handa (2005) {barticle}[author] \bauthor\bsnmHanda, \bfnmK.\binitsK. (\byear2005). \btitleSampling formulae for symmetric selection. \bjournalElectron. Commun. Probab. \bvolume10 \bpages223–234. \MR2182606 \endbibitem
• Hudson (2001) {barticle}[author] \bauthor\bsnmHudson, \bfnmR. R.\binitsR. R. (\byear2001). \btitleTwo-locus sampling distributions and their application. \bjournalGenetics \bvolume159 \bpages1805–1817. \endbibitem
• Huillet (2007) {barticle}[author] \bauthor\bsnmHuillet, \bfnmT.\binitsT. (\byear2007). \btitleEwens sampling formulae with and without selection. \bjournalJ. Comput. Appl. Math. \bvolume206 \bpages755–773. \MR2333711 \endbibitem
• Jenkins and Song (2009) {barticle}[author] \bauthor\bsnmJenkins, \bfnmP. A.\binitsP. A. \AND\bauthor\bsnmSong, \bfnmY. S.\binitsY. S. (\byear2009). \btitleClosed-form two-locus sampling distributions: Accuracy and universality. \bjournalGenetics \bvolume183 \bpages1087–1103. \endbibitem
• Jenkins and Song (2010) {barticle}[author] \bauthor\bsnmJenkins, \bfnmP. A.\binitsP. A. \AND\bauthor\bsnmSong, \bfnmY. S.\binitsY. S. (\byear2010). \btitleAn asymptotic sampling formula for the coalescent with recombination. \bjournalAnn. Appl. Probab. \bvolume20 \bpages1005–1028. \MR2680556 \endbibitem
• McVean, Awadalla and Fearnhead (2002) {barticle}[author] \bauthor\bsnmMcVean, \bfnmG.\binitsG., \bauthor\bsnmAwadalla, \bfnmP.\binitsP. \AND\bauthor\bsnmFearnhead, \bfnmP.\binitsP. (\byear2002). \btitleA coalescent-based method for detecting and estimating recombination from gene sequences. \bjournalGenetics \bvolume160 \bpages1231–1241. \endbibitem
• McVean et al. (2004) {barticle}[author] \bauthor\bsnmMcVean, \bfnmG. A. T.\binitsG. A. T., \bauthor\bsnmMyers, \bfnmS. R.\binitsS. R., \bauthor\bsnmHunt, \bfnmS.\binitsS., \bauthor\bsnmDeloukas, \bfnmP.\binitsP., \bauthor\bsnmBentley, \bfnmD. R.\binitsD. R. \AND\bauthor\bsnmDonnelly, \bfnmP.\binitsP. (\byear2004). \btitleThe fine-scale structure of recombination rate variation in the human genome. \bjournalScience \bvolume304 \bpages581–584. \endbibitem
• Ohta and Kimura (1969a) {barticle}[author] \bauthor\bsnmOhta, \bfnmT.\binitsT. \AND\bauthor\bsnmKimura, \bfnmM.\binitsM. (\byear1969a). \btitleLinkage disequilibrium at steady state determined by random genetic drift and recurrent mutations. \bjournalGenetics \bvolume63 \bpages229–238. \endbibitem
• Ohta and Kimura (1969b) {barticle}[author] \bauthor\bsnmOhta, \bfnmT.\binitsT. \AND\bauthor\bsnmKimura, \bfnmM.\binitsM. (\byear1969b). \btitleLinkage disequilibrium due to random genetic drift. \bjournalGenetical Research \bvolume13 \bpages47–55. \endbibitem
• Song and Song (2007) {barticle}[author] \bauthor\bsnmSong, \bfnmY. S.\binitsY. S. \AND\bauthor\bsnmSong, \bfnmJ. S.\binitsJ. S. (\byear2007). \btitleAnalytic computation of the expectation of the linkage disequilibrium coefficient r^{2}. \bjournalTheoretical Population Biology \bvolume71 \bpages49–60. \endbibitem
• Wright (1949) {bincollection}[author] \bauthor\bsnmWright, \bfnmS.\binitsS. (\byear1949). \btitleAdaptation and selection. In \bbooktitleGenetics, Paleontology and Evolution (\beditor\bfnmG. L.\binitsG. L. \bsnmJepson, \beditor\bfnmE.\binitsE. \bsnmMayr \AND\beditor\bfnmG. G.\binitsG. G. \bsnmSimpson, eds.) \bpages365–389. \bpublisherPrinceton Univ. Press, \baddressPrinceton, NJ. \endbibitem
You are adding the first comment!
How to quickly get a good reply:
• Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
• Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
• Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters