Three tree priors and five datasets: A study of the effect of tree priors in Indo-European phylogenetics
Abstract
The age of the root of the Indo-European language family has received much attention since the application of Bayesian phylogenetic methods by Gray and Atkinson (2003). With the application of new models, the estimated root age has tended to decrease from an age that supports the Anatolian origin hypothesis to an age that supports the Steppe origin hypothesis (Chang et al., 2015). However, none of the published work in Indo-European phylogenetics has studied the effect of tree priors on phylogenetic analyses of the family. In this paper, I intend to fill this gap by exploring the effect of tree priors on different aspects of the phylogenetic inference of the Indo-European family. I apply three tree priors (Uniform, Fossilized Birth-Death (FBD), and Coalescent) to five publicly available datasets of the Indo-European language family. I evaluate the posterior distribution of the trees from the Bayesian analysis using Bayes Factors, and find support for the Steppe origin hypothesis in the case of two tree priors. I report the median and 95% highest posterior density (HPD) interval of the root ages for all three tree priors. A model comparison suggests that either the Uniform prior or the FBD prior is more suitable than the Coalescent prior for the datasets belonging to the Indo-European language family.
1 Introduction
The IndoEuropean language family is widely spoken and consists of languages belonging to subgroups such as Albanian,
Armenian,
BaltoSlavic, Germanic, Greek, IndoIranian, and ItaloCeltic. The root age of the IndoEuropean family has been a heavily debated topic
since the application of Bayesian
phylogenetic methods to lexical cognate data. The root age of the IndoEuropean language family was estimated using phylogenetic methods
developed in computational biology
(Gray and Atkinson, 2003, Atkinson et al., 2005, Nicholls and Gray, 2008, Ryder and Nicholls, 2011, Bouckaert et al., 2012). These phylogenetic methods employ
lexical cognate data
(from Swadesh word lists [table 5]; Swadesh 1952) and external evidence (from archeology and history)
regarding both the age of the ancient languages (such as Latin) and the age of the internal subgroups (such as Germanic) to infer the
timescale of the IndoEuropean phylogeny. The work of Gray and
colleagues
produced root age estimates that supported the Anatolian origin hypothesis (8000–9500 Years Before Present [B.P];
Renfrew, 1987) of the IndoEuropean language family. In contrast, historical
linguistics—based on cultural and material vocabulary—points to a Steppe origin of the IndoEuropean language
family where the root age falls within the range 5500–6500 Years B.P (Anthony and Ringe, 2015).
In a followup work, Chang et al. (2015) corrected the IELex dataset (Dunn, 2012)—originally compiled by Dyen et al. (1992)—and tested a wide range of models and datasets. Chang et al. (2015) modified the Bayesian phylogenetic inference software BEAST (Drummond et al., 2012) such that the software samples trees that show eight ancient languages—Vedic Sanskrit, Ancient Greek, Latin, Classical Armenian, Old Irish, Old English, Old High German, and Old West Norse—as ancestors of modern descendant languages (table 1). The results of their analysis showed that the estimated median root age of the IndoEuropean language family falls within the age range that supports the Steppe origin of the IndoEuropean language family.
Ancient language  Modern descendants 

Vedic Sanskrit  IndoAryan languages 
Ancient Greek  Modern Greek 
Latin  Romance languages 
Classical Armenian  Modern Armenian dialects: Adapazar, Eastern Armenian 
Old Irish  Irish, Scots Gaelic 
Old English  English 
Old West Norse  Faroese, Icelandic, Norwegian 
Old High German  German, Swiss German, Luxembourgish 
The phylogenetic dating analyses reported by Bouckaert et al. (2012) and Chang et al. (2015) are based on a coalescent tree prior that employs both the ages of the ancient languages and the internal node ages to infer the dates of all the internal nodes (and the root) of a language tree. The coalescent tree prior described in the context of Bayesian phylogenetic inference by Yang (2014, 309–320) is based on the coalescence process studied by Kingman (1982), and is used to model the spread of viruses or alleles in a population of individuals across time.
The coalescent tree prior cannot model the linguistic reality that an ancient language such as Old English is the ancestor of Modern English. Instead, it infers that both Old English and Modern English descended from an unattested common ancestor. This observation is the point of departure for the ancestry-constrained analyses reported by Chang et al. (2015). The authors found that constraining an ancient language to be the ancestor of its modern descendant(s) yields a reduced age for the root of the Indo-European language family, which supports the Steppe origin hypothesis.
While discussing their results, Chang et al. (2015) observed that the coalescent tree prior without ancestry constraints does not sample trees where an ancient language can be the ancestor of modern language(s). Therefore, the coalescent tree prior might not be appropriate for modeling the evolution of the Indo-European family. This observation marks the point of departure for the analyses reported in this paper, where I explore the effect of tree priors in Indo-European phylogenetics. All the previous phylogenetic studies involving the Indo-European family compare the fit of different substitution models, such as Covarion, Stochastic Dollo, and a binary state Generalized Time Reversible model, and their effect on the inferred ages. However, none of these studies examines the effect of tree priors on the dating of the Indo-European language family.
Therefore, in this paper, I attempt to fill this gap by analyzing all the five publicly available datasets (section 3.1) using FBD tree prior, uniform prior, and constant population size coalescent prior. I perform a Bayes Factor analysis similar to Chang et al. (2015) in section 3.5 and find that the trees inferred with FBD prior (Stadler, 2010, Heath et al., 2014, Gavryushkina et al., 2014, Zhang et al., 2016) and uniform tree prior (Ronquist et al., 2012a) support the Steppe origin hypothesis of the IndoEuropean languages. Finally, the root’s median age and 95% highest posterior density ages inferred from the coalescent analysis support an Anatolian origin of the IndoEuropean languages.
Unlike Bouckaert et al. (2012) and Chang et al. (2015), I do not supply the subgroup constraint information to the phylogenetic program beforehand, but allow the phylogenetic program to infer the tree topology along with the divergence times of the internal nodes. I find that the Bayesian phylogenetic program infers known subgroups correctly across tree priors. My experiments with FBD and uniform priors show that ancestry constraints are not necessary to infer support for the Steppe origin of the IndoEuropean family. I also performed a model comparison based on the Akaike Information Criterion through MCMC (AICM; Baele et al., 2012) and found that both uniform and FBD priors fit better than coalescent tree prior.
The rest of the paper is organized as follows. In section 2, I motivate the appropriateness of the FBD prior for the Indo-European family diversification scenario and describe the other tree priors. In section 3, I discuss the datasets, substitution model, tree prior settings, Markov chain Monte Carlo settings, and the calculation of Bayes Factor support for the Steppe origin hypothesis vs. the Anatolian origin hypothesis. In section 4, I present the inferred median ages and 95% highest posterior density (HPD) age intervals, Bayes Factors, relevance of ancestry constraints, and quality of inferred trees. Finally, I conclude the paper in section 5.
2 Tree priors
In this section, I will describe the three different tree priors used in the paper. First, I describe the coalescent tree prior in section 2.1. Next, I will motivate why FBD tree prior is more suitable than the Coalescent tree prior for the IndoEuropean family in section 2.2. Finally, I describe the uniform tree prior in section 2.3.
2.1 Constant size coalescent prior
The constant population size coalescent tree prior is dependent on the parameter $\theta \propto N_e r$, where $N_e$ is the effective population size and $r$ is the base clock rate. The probability of a tree under this model is $\prod_{j=2}^{n} \frac{2}{\theta} \exp\left(-\frac{j(j-1)}{\theta}\, t_j\right)$, where $t_j$ is the time during which there are $j$ lineages ancestral to the sequences in the data. Both $\theta$ and $r$ are sampled in this paper. I note that the constant size population prior was also used by Chang et al. (2015, A6, 220) to perform an ancestry-constrained phylogenetic analysis which supports the Steppe origin hypothesis.
2.2 Birth-death priors
Birth-death tree priors are used to model lineage diversification and to date the split events within a phylogeny. The standard birth-death prior of Yang and Rannala (1997) is conditioned on the age of the most recent common ancestor ($t_{mrca}$) and assumes that the birth ($\lambda$) and death ($\mu$) rates are constant over time. In this model, all the tips in the tree are extant and the tree does not contain any fossils (figure 3). A fossil can be the ancestor of a modern language or can be extinct without leaving any descendants. For instance, Vedic is considered to be the ancestor of all the modern Indo-Aryan languages (table 1), whereas Hittite and Gothic are languages that died out without leaving any descendants.
The birth-death model described by Yang and Rannala (1997) handles incomplete language sampling through the sampling fraction $\rho = n/N$, where $n$ is the number of languages in the sample and $N$ is the total number of extant languages in the family. The birth-death model estimates the species divergence times on a relative scale. The relative times can be converted into a geological time scale by tying one or more internal nodes to known historical or archaeological evidence. Note that the coalescent process is mathematically different from the birth-death process (Stadler, 2009, 62–63).
In the case of the Indo-European language family, the standard birth-death tree prior of Yang and Rannala (1997) only uses the internal node calibrations (for instance, the information that the Germanic subgroup is about 2200 years old; Chang et al., 2015) to infer the dates of the remaining internal nodes. This procedure is known as node dating and has been used for inferring the phylogeny of Bantu languages.
The node dating method does not utilize the available lexical cognate information about attested ancient languages that went extinct (e.g. Gothic) or evolved into modern languages (e.g. Latin). However, the node dating method indirectly uses the age information of extinct languages to apply constraints to the internal node ages of a language family. In another argument against node dating, Ronquist et al. (2012a) noted that if there is more than one fossil in the same language group, then, only the oldest fossil provides the age constraint for the associated internal node. For example, in the case of the Germanic subgroup, there are four fossil languages—Gothic, Old High German, Old English, and Old West Norse—out of which only Gothic’s age information would be used to specify the minimum age of the Germanic subgroup, whereas the rest of the fossil languages cannot provide extra information regarding the age of the Germanic subgroup.
Stadler (2010) proposed an extension to the standard birth-death prior that can handle the placement of ancient languages as tips or as internal nodes (fossils; figure 3). This prior is known as the Fossilized Birth-Death (FBD) prior since it can handle both fossil and extant species in a single model. The FBD family of priors can model the linguistic fact that Old English is the ancestor of Modern English. Along with the birth rate $\lambda$ and the death rate $\mu$, the FBD prior also features a fossil sampling rate parameter $\psi$, which is the rate at which fossils are observed along a branch. The FBD tree prior requires only the ages of fossils to infer the root age of a tree, and is therefore more objective than node dating, which requires internal node age constraints that are not directly observed. The standard birth-death prior conditioned on the age of the most recent common ancestor, $t_{mrca}$, is a special case of the FBD prior when $\psi = 0$ (Stadler, 2010, 401). An example of a fossilized birth-death tree is presented in figure 3.
The left tree in figure 3 shows the FBD tree, including lineages with sampled extant and fossil languages, whereas the right tree shows the standard birth-death tree with extant languages only.
The probability of a tree under the FBD tree prior is conditioned on the age of the most recent common ancestor, $t_{mrca}$, and the nature of extant taxa sampling. In this paper, I assume that the extant taxa are sampled uniformly at random. Unlike the approach of Chang et al., who impose ancestry constraints externally, the FBD tree prior can infer ancestry relations from the data (if such a signal exists), so they do not have to be supplied beforehand. The species sampling probability $\rho$ is determined as the ratio of the number of extant languages in the dataset to the total number of extant Indo-European languages.
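The probability of the tree under the FBD model (Stadler, 2010, equation 5), with birth rate $\lambda$, death rate $\mu$, fossil sampling rate $\psi$, and extant sampling probability $\rho$, conditioned on $t_{mrca}$ ($x_1$), is given below. Here, $n$ is the number of extant sampled tips, $m$ is the number of extinct sampled tips, $k$ is the number of sampled ancestors with sampled descendants, $x_i$ is the age of the $i$-th branching event, and $y_i$ is the age of the $i$-th extinct sampled tip.
$$f[\mathcal{T} \mid t_{mrca} = x_1] = \frac{\lambda^{n+m-2}\, \psi^{k+m}}{\left(1 - \hat{p}_0(x_1)\right)^2}\; p_1(x_1) \prod_{i=1}^{n+m-1} p_1(x_i) \prod_{i=1}^{m} \frac{p_0(y_i)}{p_1(y_i)} \qquad (1)$$
Here, $c_1$, $c_2$, $p_0(t)$, $p_1(t)$, and $\hat{p}_0(t)$ are defined as follows:

$p_0(t)$ is the probability that an individual present at time $t$ before present has no sampled extinct or extant descendants, which is given as
$$p_0(t) = \frac{\lambda + \mu + \psi + c_1 \dfrac{e^{-c_1 t}(1 - c_2) - (1 + c_2)}{e^{-c_1 t}(1 - c_2) + (1 + c_2)}}{2\lambda}$$
$p_1(t)$ is the probability that an individual present at time $t$ before present has only one sampled extant descendant and no sampled extinct descendant, which is given as
$$p_1(t) = \frac{4\rho}{2(1 - c_2^2) + e^{-c_1 t}(1 - c_2)^2 + e^{c_1 t}(1 + c_2)^2}$$
$c_1 = \left|\sqrt{(\lambda - \mu - \psi)^2 + 4\lambda\psi}\right|$, $c_2 = -\dfrac{\lambda - \mu - 2\lambda\rho - \psi}{c_1}$, $\hat{p}_0(t) = 1 - \dfrac{\rho(\lambda - \mu)}{\lambda\rho + (\lambda(1 - \rho) - \mu)\, e^{-(\lambda - \mu)t}}$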
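As a numerical sanity check on the auxiliary quantities of the FBD density, the following minimal sketch evaluates the two probabilities in the closed forms usually quoted from Stadler (2010). The function names and the rate values are hypothetical and chosen only for illustration. At time zero the two functions should reduce to one minus the extant sampling probability and to the extant sampling probability itself, because a lineage alive at the present is sampled with exactly that probability.

```python
import math


def fbd_constants(lam, mu, psi, rho):
    """Return the helper constants c1 and c2 (forms as quoted from Stadler 2010)."""
    c1 = abs(math.sqrt((lam - mu - psi) ** 2 + 4.0 * lam * psi))
    c2 = -(lam - mu - 2.0 * lam * rho - psi) / c1
    return c1, c2


def p0(t, lam, mu, psi, rho):
    """Probability that a lineage at age t leaves no sampled descendants."""
    c1, c2 = fbd_constants(lam, mu, psi, rho)
    e = math.exp(-c1 * t)
    ratio = (e * (1.0 - c2) - (1.0 + c2)) / (e * (1.0 - c2) + (1.0 + c2))
    return (lam + mu + psi + c1 * ratio) / (2.0 * lam)


def p1(t, lam, mu, psi, rho):
    """Probability of exactly one sampled extant descendant and no sampled fossil."""
    c1, c2 = fbd_constants(lam, mu, psi, rho)
    q = (2.0 * (1.0 - c2 ** 2)
         + math.exp(-c1 * t) * (1.0 - c2) ** 2
         + math.exp(c1 * t) * (1.0 + c2) ** 2)
    return 4.0 * rho / q


# Hypothetical rates (per unit time) and sampling probability, for illustration only.
lam, mu, psi, rho = 0.3, 0.1, 0.05, 0.2
print(p0(0.0, lam, mu, psi, rho), 1.0 - rho)  # the two values should agree
print(p1(0.0, lam, mu, psi, rho), rho)        # the two values should agree
```

The boundary check does not validate the full tree density, only the two building blocks that enter it.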
FBD tree priors have been used for estimating divergence times for datasets with extant and fossil species (Heath et al., 2014, Gavryushkina et al., 2014, Zhang et al., 2016). Since the IndoEuropean family has both fossils and extant languages, the FBD tree prior that handles attested fossil ancestors is more suitable than the coalescent tree prior that places fossils as tips. For instance, Tocharian languages went extinct without leaving any modern descendant language, whereas modern Romance languages are the descendants of Latin (an ancient language). Moreover, the data for the IndoEuropean language family comes from divergent languages and not from a single population. These arguments support the choice of FBD prior over a coalescent prior for modeling the evolution of the IndoEuropean language family.
2.3 Uniform tree prior
Similar to the coalescent tree prior, the uniform tree prior (Ronquist et al., 2012a) places fossils as tips of the tree. However, the uniform tree prior does not make any assumptions regarding the lineage diversification process. The uniform tree prior assumes that the internal nodes’ ages are uniformly distributed between tip ages and the root age. The prior probability of a tree under uniform model is conditioned on the root age which is drawn from a prior distribution . Under this model, an interior node age is drawn from a uniform distribution with a tip age as the lower bound and the root age as the upper bound. The probability of a tree under the uniform model is proportional to where is the age of a tip .
3 Methods
In this section, I describe the datasets, prior settings, inference procedure details, and calculation of the Bayes Factor.
3.1 Data
Language  Age Prior  Language  Age Prior 

Hittite  Old High German^{A}  
Old Irish^{A}  Tocharian B  
Classical Armenian^{A}  Tocharian A  
Ancient Greek^{A}  Lycian  
Luvian  Old Prussian  
Vedic Sanskrit^{A}  Umbrian  
Old English^{A}  Avestan  
Old Persian  Gothic  
Latin^{A}  Old Norse^{A}  
Oscan  Old Church Slavonic  
Cornish  Sogdian 
All the five datasets used in this paper—B1,
B2, Broad, Medium, and Narrow—are assembled from IELex by Chang et al. (2015).
The Broad dataset consists of 94 languages and 197
meaning classes. The Broad dataset is corrected for cognate judgments in the IndoIranian
subgroup; and, also has an extra medieval language, Sogdian,
which is not present in B1. Ten meanings that are susceptible to sound symbolism and have poor
coverage in terms of number of languages are also removed from the
Broad dataset (Chang et al., 2015, 213). The
Medium dataset is a subset of the Broad dataset and is assembled in such a way that
the languages and meanings with poor coverage are excluded. The Medium dataset has 82
languages
and 143 meanings. The Narrow dataset is a subset of the Medium dataset and consists of
only those modern languages that have an attested ancestor. This selection leaves the Narrow dataset
with 52 languages.
3.2 Substitution models
Bayesian phylogenetics originated in evolutionary biology and works by inferring the evolutionary relationships (trees) between DNA sequences of species. The same method can also be applied to binary (morphological) traits of species (Yang, 2014). Linguistic data is binary trait data where each column in the trait matrix is a cognate class. Words that belong to the same cognate class are coded as 1; otherwise, they are coded as 0. For example, in the case of German, French, Swedish, and Spanish, the words for all in German [alə] and Swedish [ˈalːa] belong to the same cognate set as English, while French [tu] and Spanish [toðo] belong to a different cognate set. The binary trait matrix for these languages for the meaning all is shown in table 5. If a language is missing in a cognate set, then the entry for that language is coded as ?, and is ignored in the calculation of the likelihood using the pruning algorithm (Felsenstein, 2004, 255). I used a Generalized Time Reversible model (equivalent to an F81 model in the case of binary traits) with ascertainment bias correction (Felsenstein, 1992, Lewis, 2001) for all unobserved 0 columns. The rate variation across sites is modeled using a discrete Gamma model with four rate categories (Yang, 1994), where the shape parameter of the Gamma distribution is drawn from an exponential prior with mean .
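The binary coding described above can be sketched as follows. This is a minimal illustration, not the pipeline used to build the datasets: the function name `binarize`, the class labels "A" and "B", and the placeholder entry `LanguageX` (a language with no attested word for the meaning) are assumptions made for the example.

```python
from typing import Dict, List, Optional


def binarize(cognates: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Turn the cognate-class labels of one meaning into binary columns.

    Each distinct cognate class becomes one column. A language is coded
    "1" in the column of its own class and "0" elsewhere; a language with
    no attested word (None) is coded "?" in every column of the meaning.
    """
    classes = sorted({c for c in cognates.values() if c is not None})
    rows = {}
    for language, label in cognates.items():
        if label is None:
            rows[language] = ["?"] * len(classes)
        else:
            rows[language] = ["1" if label == c else "0" for c in classes]
    return rows


# Meaning "all": English, German, and Swedish share one cognate class,
# French and Spanish share another. LanguageX is a hypothetical gap.
all_meaning = {
    "English": "A",
    "German": "A",
    "Swedish": "A",
    "French": "B",
    "Spanish": "B",
    "LanguageX": None,
}

matrix = binarize(all_meaning)
for language, row in matrix.items():
    print(f"{language:10s} {' '.join(row)}")
```

Concatenating the columns produced for every meaning gives the full trait matrix that is passed to the phylogenetic software.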
3.3 Tree prior settings
In this paper, I assumed that the extant languages are randomly sampled. The FBD tree prior is dependent on the number of extant languages in the sample. I estimated the number of extant Indo-European languages (400) from Glottolog (Nordhoff and Hammarström, 2011), and set the sampling probability parameter $\rho$ accordingly for each dataset. For the FBD prior, the net diversification rate is drawn from an exponential prior with mean , the relative extinction rate (turnover) is drawn from a Beta(1,1) prior, and the fossil sampling probability is also drawn from a Beta(1,1) prior.
I draw the root age from a uniform distribution bounded between and years in the case of FBD and uniform priors. The root age’s upper bound is fixed at years since this age is more than double the upper bound of the age limit of the Anatolian origin hypothesis. In fact, none of the inferred trees’ root ages are close to years. The coalescent prior, as implemented in MrBayes, is not conditioned on . All the fossils’ age priors were drawn from uniform distributions whose age ranges are given in table 2.
In the case of the coalescent prior, population parameter is drawn from a Gamma distribution with shape parameter and rate parameter . The base clock rate is drawn from an exponential prior with mean . In all the analyses, I use a Independent Gamma Rate model (Lepage et al., 2007), where each branch rate is drawn from a Gamma distribution with mean and variance , where —the branch length of a branch —is computed as the product of geological (or calendar) time and . is the independent gamma rate model’s variance parameter that is drawn from an exponential prior with mean . I do not employ topology constraints and allow the software to infer the IndoEuropean phylogeny along with the time scale from the data.
3.4 Markov chain Monte Carlo sampling
I ran all the experiments using MrBayes software.
3.5 Evaluating Steppe vs. Anatolian Hypothesis
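For each dataset, I ran the MrBayes software twice: once without cognate data to generate a prior sample of trees and once with cognate data to generate a posterior sample of trees. Then, I used the Bayes Factor (BF) formulation from Chang et al. (2015) to calculate the relative support for the Anatolian (A) and Steppe (S) hypotheses. Given data $D$, the Bayes factor $K$ is calculated as follows:
$$K = \frac{P(D \mid t_R \in \Omega_S)}{P(D \mid t_R \in \Omega_A)} \qquad (2)$$
where $\Omega_S$ and $\Omega_A$ represent the ranges of Steppe and Anatolian ages, and $t_R$ denotes the root age of a tree, which is the age of the most recent common ancestor ($t_{mrca}$) in the case of the FBD prior. The numerator and denominator in equation 2 are computed as follows:
$$P(D \mid t_R \in \Omega_S) \propto \frac{P(t_R \in \Omega_S \mid D)}{P(t_R \in \Omega_S)}, \qquad P(D \mid t_R \in \Omega_A) \propto \frac{P(t_R \in \Omega_A \mid D)}{P(t_R \in \Omega_A)} \qquad (3)$$
The numerators in equation 3 correspond to the fraction of trees in the posterior sample for which $t_R \in \Omega_S$ and $t_R \in \Omega_A$. The denominators correspond to the fraction of trees in the prior sample for which $t_R \in \Omega_S$ and $t_R \in \Omega_A$. The constant of proportionality is the same in both expressions and cancels in equation 2. Following the interpretation of the Bayes Factor by Kass and Raftery (1995), the support for the Steppe origin hypothesis is very strong if $K > 150$, strong if $20 < K \le 150$, positive if $3 < K \le 20$, not worth more than a bare mention (neutral) if $1 \le K \le 3$, and negative if $K < 1$.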
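The sample-based computation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names are hypothetical, the root ages are toy values rather than MrBayes output, and the two intervals are the Steppe (5500–6500 years B.P.) and Anatolian (8000–9500 years B.P.) ranges quoted in the introduction. The function returns `None` when a required fraction is zero, which corresponds to the situation where the Bayes Factor cannot be calculated.

```python
from typing import Optional, Sequence, Tuple

STEPPE: Tuple[float, float] = (5500.0, 6500.0)      # years B.P.
ANATOLIAN: Tuple[float, float] = (8000.0, 9500.0)   # years B.P.


def fraction_in(ages: Sequence[float], interval: Tuple[float, float]) -> float:
    """Fraction of sampled root ages that fall inside the interval."""
    low, high = interval
    return sum(low <= age <= high for age in ages) / len(ages)


def bayes_factor(prior_ages: Sequence[float],
                 posterior_ages: Sequence[float]) -> Optional[float]:
    """Steppe-vs-Anatolian Bayes Factor from prior and posterior root ages."""
    prior_s = fraction_in(prior_ages, STEPPE)
    prior_a = fraction_in(prior_ages, ANATOLIAN)
    post_s = fraction_in(posterior_ages, STEPPE)
    post_a = fraction_in(posterior_ages, ANATOLIAN)
    if prior_s == 0 or prior_a == 0 or post_a == 0:
        return None  # ratio undefined: no sampled tree in a required range
    return (post_s / prior_s) / (post_a / prior_a)


# Toy root ages (hypothetical), one value per sampled tree.
prior_sample = [6000, 6200, 9000, 8500, 12000, 15000]
posterior_sample = [5800, 6000, 6400, 7000, 9000, 7200]

print(bayes_factor(prior_sample, posterior_sample))   # approximately 3.0
print(bayes_factor([9000, 12000], posterior_sample))  # None: prior never visits the Steppe range
```

In practice the samples would contain thousands of trees, and the fractions are only reliable when both age ranges are visited often enough in the prior and posterior runs.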
4 Results
In this section, I present and discuss the root’s median age and 95% HPD age intervals, fit of tree prior, Bayes Factor support for the Steppe vs. the Anatolian hypotheses, comparison of subgroups’ inferred dates with expert dates, relevance of clade constraints, and ancestry constraints.
4.1 Median and 95% HPD ages
Dataset  95% HPD (FBD)  95% HPD (Coalescent)  95% HPD (Uniform)  Median age (FBD)  Median age (Coalescent)  Median age (Uniform)

B1  6244–8766  8370–11695  5760–8115  7512  9821  6789
B2  6150–8430  7590–10913  5536–7986  7177  9133  6738
Broad  5591–7585  6654–9327  5073–6947  6551  7984  5935
Medium  5942–7921  7070–9818  5395–7392  6845  8345  6339
Narrow  5790–7984  6826–9791  5423–7646  6826  8228  6462
Table 6 shows the HPD intervals and median root ages for all dataset and tree prior combinations. None of the reported HPD age intervals lies completely within the Steppe age interval or the Anatolian age interval. The lower bounds of the HPD intervals in the case of the FBD and uniform priors fall within the Steppe interval, whereas the lower bound of the coalescent prior’s HPD interval falls beyond the Steppe age interval. In the case of the Narrow and Medium datasets, the root age is further reduced to 6826 and 6845 years respectively in the case of the FBD prior. The median ages inferred by the FBD prior belong neither to the Steppe hypothesis interval nor to the Anatolian hypothesis interval for all the datasets. The median ages inferred by the uniform prior for the Broad, Medium, and Narrow datasets lie within the range of the Steppe interval. All the priors infer median ages that lie beyond the Steppe interval in the case of the B1 and B2 datasets. The coalescent prior infers root ages that lie within the Anatolian hypothesis interval in the case of all the datasets except the B1 dataset. Across all the priors, the median root ages decrease when the datasets are corrected for errors. The decreasing trend in the median ages is similar to the trend observed in Chang et al. (2015).
Why does the Broad dataset yield younger ages?
Chang et al. (2015) argue that sparsely attested languages can influence the chronology estimates. The authors argue by observing that the ascertainment bias correction to the likelihood calculation (Felsenstein, 1992) accounts for unobserved cognate sets that are not observed in the data, but, does not account for the missing entries in a dataset. For example, if 50% of the data is missing for a language, then the ascertainment bias correction does not account for missing 50% of the data. If there are unique cognate sets in the observed 50% of the data, then, there is a possibility that the unobserved 50% of the data also has unique cognate sets that do not enter the likelihood calculation.
The likelihood calculation would only consider the observed unique cognate sets, therefore, underestimating the true number of unique cognate sets for a language in a dataset. Due to this reason, a language with higher number of missing entries is treated as more conservative (or lesser number of character changes) than it should be. This is particularly true for languages such as Hittite, Tocharian A & B which have about and missing entries in the case of the broad dataset as compared to and in the case of the medium dataset. Since, both Hittite and Tocharian doculects are very close to the root of the IndoEuropean tree, this underestimation of number of unique cognate sets leads to a shorter branch length which causes the median root age to be younger. Both coalescent and FBD tree priors infer a younger age for broad dataset than medium and narrow datasets.
Why does the B2 dataset yield younger ages?
The B1 dataset features six sparsely attested languages (Lycian, Oscan, Umbrian, Old Persian, Luvian, and Kurdish) for which more than 50% of the meanings are unattested. As explained in the previous paragraph, inclusion of sparsely attested languages causes the Bayesian inference program to underestimate the root age. The opposite happens when a language has more unique cognates than it should have. This is the case for Luvian, where 33% of the attested cognate sets are erroneously coded as unique cognate sets although they are cognate with either Hittite or Lycian. This erroneous coding causes the Bayesian software to treat Luvian, which is one internal node away from the root node, as having evolved more, and to posit longer branches, thereby pushing the root age of the tree away from the Steppe age interval. The B2 dataset excludes the six sparsely attested languages, including the erroneously coded Luvian, which lowers the median root age in the posterior sample. This effect is clearly observed in both the median root age and the 95% HPD age range for the B2 dataset. The median root age is pushed 400 years downwards, towards the Steppe hypothesis, when the FBD tree prior is applied to the B2 dataset. The coalescent prior also infers a younger median age for the B2 dataset than for the B1 dataset, whereas the uniform prior is not influenced by the six sparsely attested languages.
4.2 Which tree prior is the best?
Tree Prior  B1  B2  broad  medium  narrow 

Uniform Prior  94002.748  90299.551  89269.61  50769.888  32162.117 
FBD Prior  94005.297  90297.721  89270.359  50764.79  32163.007 
Coalescent Prior  94117.099  90396.491  89374.335  50917.074  32241.019 
I determine the best model through Akaike Information Criterion through MCMC (AICM; Baele et al., 2012). It has to be noted that Bouckaert et al. (2012) employ both Harmonic Mean and AICM to perform model comparison. In this paper, I only use AICM, since, it is more accurate than harmonic mean which is unstable. On the other hand, methods such as stepping stone sampling (Xie et al., 2010) and thermodynamic integration (Lartillot and Philippe, 2006) used to estimate marginal likelihood are more accurate than AICM but are computationally intensive and require at most times (usually set to 10) the computation as the original MCMC runs (Yang, 2014, 258–259).
The AICM values for each dataset and tree prior are presented in table 7; a lower AICM value indicates a better fit. The results show that the Uniform tree prior fits best for the B1, Broad, and Narrow datasets. The difference between the AICM values of the Uniform and FBD priors is almost negligible in the case of the Broad and Narrow datasets. The coalescent prior shows the highest AICM value and differs by a large margin when compared with the FBD and Uniform priors. Since the uniform tree prior has fewer parameters than the FBD prior, I suggest that any future phylogenetic experiment should test the uniform tree prior as a baseline before testing more parameter-rich priors such as the FBD or Coalescent priors.
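The comparison in table 7 can be reproduced with a short script. The sketch below simply copies the AICM values from the table and picks, for each dataset, the prior with the lowest value; the dictionary layout and variable names are choices made for this illustration.

```python
# AICM values copied from table 7 (lower is better).
datasets = ["B1", "B2", "Broad", "Medium", "Narrow"]
aicm = {
    "Uniform":    [94002.748, 90299.551, 89269.61, 50769.888, 32162.117],
    "FBD":        [94005.297, 90297.721, 89270.359, 50764.79, 32163.007],
    "Coalescent": [94117.099, 90396.491, 89374.335, 50917.074, 32241.019],
}

best = {}
for column, dataset in enumerate(datasets):
    # Prior with the smallest AICM value for this dataset.
    best[dataset] = min(aicm, key=lambda prior: aicm[prior][column])
    # Gap between the coalescent prior and the best-fitting prior.
    gap = aicm["Coalescent"][column] - aicm[best[dataset]][column]
    print(f"{dataset:7s} best={best[dataset]:8s} coalescent gap={gap:.3f}")
```

The Uniform and FBD priors differ by a few AICM units at most, while the coalescent prior trails the best prior by roughly 80 to 150 units on every dataset.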
4.3 Bayes Factor for Steppe vs. Anatolia
Dataset  FBD  Coalescent  Uniform 

B1  0.138 (Negative)  **  67.043 (Strong) 
B2  1.015 (Neutral)  **  1022.968 (Very Strong) 
broad  88.624 (Strong)  *  6728.994 (Very Strong) 
medium  18.536 (Positive)  **  113.968 (Strong) 
narrow  16.55 (Positive)  *  27.549 (Strong) 
I present the results of the Bayes factor (BF) analysis in table 8. In the case of the FBD prior, the BF results support the Steppe origin hypothesis for all the datasets except the B1 dataset. The corrected datasets clearly support the Steppe hypothesis positively in terms of the Bayes Factor in the case of the FBD prior. In the case of the uniform prior, all the datasets support the Steppe origin hypothesis over the Anatolian origin hypothesis. In the case of the coalescent prior, the Bayes Factor could not be calculated since there is no tree in either the prior or the posterior sample that has a root age belonging to the age range of the Steppe hypothesis. Overall, the interpretation of the strength of the Bayes Factor analysis suggests that appropriate tree priors and corrected datasets support the Steppe origin hypothesis of the Indo-European language family.
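The verbal labels in table 8 can be recovered from the numeric values with a small helper. The sketch below assumes the commonly cited Kass and Raftery (1995) cut-offs of 1, 3, 20, and 150; the function name `interpret` and the label strings are choices made for this illustration.

```python
def interpret(bf: float) -> str:
    """Map a Bayes Factor to a verbal label (Kass and Raftery style cut-offs)."""
    if bf > 150:
        return "Very Strong"
    if bf > 20:
        return "Strong"
    if bf > 3:
        return "Positive"
    if bf >= 1:
        return "Neutral"
    return "Negative"


# Bayes Factors copied from table 8 (FBD and Uniform columns).
table8 = {
    "B1": (0.138, 67.043),
    "B2": (1.015, 1022.968),
    "Broad": (88.624, 6728.994),
    "Medium": (18.536, 113.968),
    "Narrow": (16.55, 27.549),
}

for dataset, (fbd, uniform) in table8.items():
    print(f"{dataset:7s} FBD: {interpret(fbd):12s} Uniform: {interpret(uniform)}")
```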
4.4 Internal node ages
In this subsection, for each dataset, I compare the inferred dates for the language subgroups with the historically attested dates given in table 9. The uniform tree prior, on average, overestimates the ages for all the datasets except the Narrow dataset. The predicted ages from the uniform tree prior come closest to the historical ages in the case of the Medium dataset. In contrast, Chang et al. present younger ages for both the narrow (100 years on average) and medium datasets (330 165 years).
Subgroup  Historical age  B1  B2  Broad  Medium  Narrow

Germanic  2250  2876 [2286–3572]  2816 [2256–3458]  2615 [2147–3166]  2449 [2031–2935]  2334 [1943–2807]
Romance  1750  2987 [2400–3629]  2149 [1628–2714]  1980 [1515–2493]  1841 [1401–2345]  1736 [1309–2248]
Scandinavian  1500  1523 [1127–2016]  1469 [1102–1906]  1340 [1024–1697]  1164 [898–1477]  –
Slavic  1500  1860 [1401–2423]  1822 [1378–2309]  1647 [1301–2069]  1575 [1226–1972]  –
East Baltic  1300  1584 [914–2356]  1561 [936–2265]  1465 [891–2086]  1460 [892–2115]  –
British Celtic  1250  1732 [1105–2402]  1687 [1137–2343]  1537 [1024–2093]  1450 [955–2011]  –
Modern Irish/Scots Gaelic  1050  1058 [530–1615]  1052 [589–1620]  967 [523–1442]  834 [451–1260]  829 [442–1290]
Persian-Tajik  750  882 [424–1412]  842 [386–1360]  819 [409–1250]  704 [336–1098]  –
Average difference  –  394  256  127  15.875  -50.33
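The "Average difference" row is the mean of (inferred median age minus historical age) over the subgroups available in a dataset. A minimal sketch for three of the columns, with the values copied from the table above and hypothetical variable names, is:

```python
# Historical subgroup ages and inferred median ages (uniform prior), from table 9.
historical = {
    "Germanic": 2250, "Romance": 1750, "Scandinavian": 1500, "Slavic": 1500,
    "East Baltic": 1300, "British Celtic": 1250,
    "Modern Irish/Scots Gaelic": 1050, "Persian-Tajik": 750,
}
inferred = {
    "B1": {"Germanic": 2876, "Romance": 2987, "Scandinavian": 1523,
           "Slavic": 1860, "East Baltic": 1584, "British Celtic": 1732,
           "Modern Irish/Scots Gaelic": 1058, "Persian-Tajik": 882},
    "Medium": {"Germanic": 2449, "Romance": 1841, "Scandinavian": 1164,
               "Slavic": 1575, "East Baltic": 1460, "British Celtic": 1450,
               "Modern Irish/Scots Gaelic": 834, "Persian-Tajik": 704},
    # Only three subgroups are available in the Narrow dataset.
    "Narrow": {"Germanic": 2334, "Romance": 1736,
               "Modern Irish/Scots Gaelic": 829},
}


def average_difference(dataset: str) -> float:
    """Mean signed difference between inferred and historical ages."""
    ages = inferred[dataset]
    diffs = [ages[group] - historical[group] for group in ages]
    return sum(diffs) / len(diffs)


for name in inferred:
    print(name, round(average_difference(name), 3))
```

A positive value means the subgroup ages are overestimated on average; the negative value for the Narrow dataset reflects an average underestimate.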
4.5 Relevance of clade constraints
Both Bouckaert et al. (2012) and Chang et al. (2015) constrain the
topologies in tree search through clade constraints. For
instance, a Germanic clade constraint would
mean that the Bayesian software would only sample those trees that place all
the Germanic languages under a single node. Both the studies do not follow the same set of topological constraints when inferring the dates
of IndoEuropean language family. Chang et al. (2015) apply a stricter set of constraints—derived from the linguistic knowledge
of IndoEuropean language family—than those of Bouckaert et al. (2012).
In this paper, I do not employ
any clade constraints and allow the software to automatically infer the tree topology from the datasets.
I present the majority rule consensus tree inferred using uniform prior
for the broad dataset in
figure 4.
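The notion of a clade constraint described above amounts to a monophyly test on a tree: a constraint is satisfied when some internal node has exactly the constrained languages as its leaves. The sketch below illustrates this on a toy tree written as nested tuples; the tree, the function names, and the language selection are assumptions made for the example and do not correspond to any inferred tree in this paper.

```python
def leaves(tree):
    """Return the set of leaf names below a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return {tree}
    found = set()
    for child in tree:
        found |= leaves(child)
    return found


def is_monophyletic(tree, clade):
    """True if some node of the tree has exactly the given languages as leaves."""
    clade = set(clade)
    if leaves(tree) == clade:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, clade) for child in tree)


# Toy topology (hypothetical): a Germanic group and a Romance group.
toy_tree = ((("English", "German"), "Swedish"), ("French", "Spanish"))

print(is_monophyletic(toy_tree, ["English", "German", "Swedish"]))  # True
print(is_monophyletic(toy_tree, ["English", "French"]))             # False
```

A constrained analysis would reject every sampled tree for which such a test fails, whereas an unconstrained analysis can apply the same test afterwards to check whether known subgroups were recovered.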
Position of Anatolian and Tocharian languages
There is a general consensus among the IndoEuropean scholars that the Anatolian language group was the first branch to split from the
ProtoIndoEuropean stage, after which, the Tocharian language group was the second to split off from the postAnatolian IndoEuropean
languages (Ringe et al., 2002). In fact,
Chang et al. supply this linguistic knowledge as two constraints to the Bayesian software: Nuclear IndoEuropean group
consisting of all the nonAnatolian languages; and, Inner
IndoEuropean group consisting
of all the Nuclear IndoEuropean languages excluding Tocharian languages. I observe that the majority consensus trees constructed from the
analyses inferred with uniform tree prior always group both
the Anatolian and Tocharian languages as distinct subgroups unified under the same internal node which is directly connected to the root
node. This is also true in the case of the majority consensus
tree inferred when the coalescent tree prior is applied to the B2 dataset. The majority consensus trees constructed from FBD tree prior’s
analyses always show that the Anatolian languages
were the first to split off, followed by the branching of the Tocharian languages from the postAnatolian IndoEuropean complex. This
observation also holds for the the majority consensus trees
inferred with colaescent tree priors applied to B1, broad, medium, and narrow datasets.
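The two placements described above can be told apart by looking at which clades survive in a 50% majority consensus tree, that is, clades found in more than half of the sampled trees. The sketch below counts clade frequencies over a toy sample of rooted trees written as nested tuples. The taxa labels (A for Anatolian, T for Tocharian, X and Y for two other branches), the sample itself, and the function names are hypothetical.

```python
from collections import Counter


def clades_of(tree):
    """Return the leaf sets subtended by the internal nodes of a rooted tree."""
    found = set()

    def visit(node):
        if isinstance(node, str):  # a leaf is a taxon label
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves

    visit(tree)
    return found


def majority_clades(trees, threshold=0.5):
    """Map each clade found in more than `threshold` of the trees to its frequency."""
    counts = Counter(clade for tree in trees for clade in clades_of(tree))
    total = len(trees)
    return {c: n / total for c, n in counts.items() if n / total > threshold}


# Toy sample standing in for post-burn-in trees (not results from the paper).
sample = [
    ("A", ("T", ("X", "Y"))),  # Anatolian splits first, then Tocharian
    ("A", ("T", ("X", "Y"))),
    (("A", "T"), ("X", "Y")),  # Anatolian and Tocharian share a node
]

for clade, support in sorted(majority_clades(sample).items(), key=lambda kv: -kv[1]):
    print(sorted(clade), round(support, 2))
```

In this toy sample the clade of T, X, and Y is kept, which corresponds to the Anatolian-first pattern, while the clade joining A and T falls below the threshold and is dropped. The clade containing all taxa is the root and is always present.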
In conclusion, the majority consensus trees suggest that the well-established Indo-European subgroups can be inferred directly from the data and need not be supplied beforehand. The exact placement of the well-established subgroups with respect to each other within the Inner Indo-European clade is a topic of ongoing research and has yet to be determined to full satisfaction (Anthony and Ringe, 2015).
4.6 Relevance of ancestry constraints
Chang et al. (2015) introduced ancestry constraints into their phylogenetic analysis, which then supported the Steppe origin hypothesis. The FBD prior can be used to check whether these ancestry constraints can be inferred from the data, since it can infer whether an ancient language is an ancestral language or a tip in the tree. However, the majority-rule consensus trees inferred from all the datasets using the FBD tree prior do not show any support for the ancestry relationships enforced as constraints by Chang et al. (2015). I examined the log files of the MCMC runs and found that the MCMC proposal move (deletebranch) in MrBayes that supports placing an ancient language as an internal node was never accepted during the MCMC sampling. I conclude that, at least for trees inferred from lexical datasets, the FBD prior does not infer any of the ancestry relations employed by Chang et al. (2015).
5 Conclusion
In this paper, I addressed the question of the effect of tree priors in Bayesian phylogenetic analysis and found the following.

The model comparison results suggest that both the Uniform and FBD priors show a better fit to the datasets of the Indo-European language family than the Coalescent prior. Therefore, based on the Bayes Factor analysis, I conclude that the Steppe hypothesis is supported by the FBD and Uniform priors for the majority of the datasets.

The FBD tree prior does not infer any ancestry relation from any of the datasets, suggesting that the lexical datasets used in the paper do not carry a signal for ancestry relations.

I also observe that the Bayesian inference program can infer well-established subgroups correctly from the data, so they need not be supplied beforehand.

Finally, the experiments reported in the paper suggest that appropriate tree priors and corrected cognacy judgments are important for estimating the phylogeny and the age of the Indo-European language family.
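As a worked illustration of the model comparison summarized in the first point, the sketch below turns log marginal likelihoods into 2 ln BF values and labels them with the evidence categories of Kass and Raftery (1995). The log marginal likelihoods are hypothetical placeholders, chosen only so that the arithmetic is easy to follow. They are not the values estimated in this paper, and the function names are likewise assumptions.

```python
def two_ln_bf(log_ml_1, log_ml_0):
    """Return 2 ln BF in favour of model 1 over model 0 from log marginal likelihoods."""
    return 2.0 * (log_ml_1 - log_ml_0)


def kass_raftery(value):
    """Verbal evidence category for a non-negative 2 ln BF (Kass and Raftery, 1995)."""
    if value < 0:
        raise ValueError("swap the models so that 2 ln BF is non-negative")
    if value <= 2:
        return "not worth more than a bare mention"
    if value <= 6:
        return "positive"
    if value <= 10:
        return "strong"
    return "very strong"


# Hypothetical log marginal likelihoods; NOT values from this paper.
log_ml = {"uniform": -10250.0, "fbd": -10252.5, "coalescent": -10262.5}

best = max(log_ml, key=log_ml.get)
for name, value in log_ml.items():
    if name != best:
        score = two_ln_bf(log_ml[best], value)
        print(f"{best} vs {name}: 2 ln BF = {score:.1f} ({kass_raftery(score)})")
```

With these placeholder numbers, the comparison against the second-best prior yields 2 ln BF = 5.0 (positive evidence) and the comparison against the worst prior yields 25.0 (very strong evidence).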
Acknowledgments
The paper would not have been possible without the continuous support of Igor Yanovich, Søren Wichmann, Chris Bentz, Gerhard Jäger, Johann-Mattis List, Richard Johansson, Lilja Øvrelid, Sowmya Vajjala, Çağrı Çöltekin, and Aparna Subhakari. I thank Remco Bouckaert, Johannes Wahle, Armin Buch, Johannes Dellert, Marisa Köllner, Roland Mühlenbernd, and Vijayaditya Peddinti for all the comments and discussions that improved the paper. Finally, I thank the anonymous reviewers for their comments, which helped improve the paper; one of the reviewers provided extensive comments on the models and results. All the remaining errors are mine. The author is supported by BIGMED and ERC Advanced Grant 324246 EVOLAEMP, which is gratefully acknowledged.
Appendix A Coalescent Prior
Appendix B FBD Prior
Appendix C Uniform Prior
Footnotes
 The scripts, the data files, and the results of the paper are available at https://github.com/PhyloStar/iephyloexps.
 I discovered a bug in the MrBayes implementation of the coalescent prior that caused the Metropolis-Hastings ratio to be calculated incorrectly. My implementation is available at https://github.com/PhyloStar/mrbayescoal.
 This interpretation is due to Igor Yanovich.
 To be precise, the scholars used a pure birth (Yule) process, a special case of the birth-death process with zero death rate, to estimate the divergence times of the internal node splits in the Bantu language family phylogeny.
 Hruschka et al. (2015) use cognate sets from an etymological dictionary, where the reflexes within a cognate set need not have the same meaning. This approach differs from the phylogenetic approaches used in this and other papers, where the cognates are root-meaning pairs derived from Swadesh lists (Chang et al., 2015, 201).
 One of the reviewers asked why I did not experiment with the CoBL database (http://www.shh.mpg.de/207610/cobldatabase). The database is not publicly available for experiments.
 All the datasets are available at http://muse.jhu.edu/article/576999/file/supp02.zip.
 Available at http://mrbayes.sourceforge.net/.
 A 50% majority consensus tree is a summary tree that consists of only those clades that occur in more than 50% of the post-burn-in sample of trees.
 I also present the inferred phylogenies, posterior support, and HPD intervals of the internal nodes for all the tree priors and datasets in the appendix.
 I note that the clade constraint information is derived from historical linguistics research that is limited to language families such as Indo-European, Dravidian, Uralic, Austronesian, and Sino-Tibetan with a long tradition of classical comparative linguistic research (Campbell and Poser, 2008).
 All the trees presented in this paper are visualized using FigTree (Rambaut, 2016).
References
 Anthony, David W and Don Ringe. 2015. The Indo-European homeland from linguistic and archaeological perspectives. Annu. Rev. Linguist. 1(1): 199–219.
 Atkinson, Quentin, Geoff Nicholls, David Welch, and Russell Gray. 2005. From words to dates: Water into wine, mathemagic or phylogenetic inference? Transactions of the Philological Society 103(2): 193–219.
 Baele, Guy, Philippe Lemey, Trevor Bedford, Andrew Rambaut, Marc A Suchard, and Alexander V Alekseyenko. 2012. Improving the accuracy of demographic and molecular clock model comparison while accommodating phylogenetic uncertainty. Molecular biology and evolution 29(9): 2157–2167.
 Bouckaert, Remco, Philippe Lemey, Michael Dunn, Simon J. Greenhill, Alexander V. Alekseyenko, Alexei J. Drummond, Russell D. Gray, Marc A. Suchard, and Quentin D. Atkinson. 2012. Mapping the origins and expansion of the Indo-European language family. Science 337(6097): 957–960.
 Campbell, Lyle and William J. Poser. 2008. Language classification: History and Method. Cambridge University Press.
 Chang, Will, Chundra Cathcart, David Hall, and Andrew Garrett. 2015. Ancestry-constrained phylogenetic analysis supports the Indo-European steppe hypothesis. Language 91(1): 194–244.
 Drummond, Alexei J, Marc A Suchard, Dong Xie, and Andrew Rambaut. 2012. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Molecular biology and evolution 29(8): 1969–1973.
 Dunn, Michael. 2012. Indo-European lexical cognacy database (IELex). Nijmegen: Max Planck Institute for Psycholinguistics.
 Dyen, Isidore, Joseph B. Kruskal, and Paul Black. 1992. An Indo-European classification: A lexicostatistical experiment. Transactions of the American Philosophical Society 82(5): 1–132.
 Felsenstein, Joseph. 1992. Phylogenies from restriction sites: A maximum-likelihood approach. Evolution 46(1): 159–173.
 Felsenstein, Joseph. 2004. Inferring Phylogenies. Sunderland, Massachusetts: Sinauer Associates.
 Gavryushkina, Alexandra, David Welch, Tanja Stadler, and Alexei J Drummond. 2014. Bayesian inference of sampled ancestor trees for epidemiology and fossil calibration. PLoS Computational Biology 10(12): e1003919.
 Gray, Russell D. and Quentin D. Atkinson. 2003. Language-tree divergence times support the Anatolian theory of Indo-European origin. Nature 426(6965): 435–439.
 Grollemund, Rebecca, Simon Branford, Koen Bostoen, Andrew Meade, Chris Venditti, and Mark Pagel. 2015. Bantu expansion shows that habitat alters the route and pace of human dispersals. Proceedings of the National Academy of Sciences 112(43): 13296–13301.
 Heath, Tracy A, John P Huelsenbeck, and Tanja Stadler. 2014. The fossilized birth–death process for coherent calibration of divergence-time estimates. Proceedings of the National Academy of Sciences 111(29): E2957–E2966.
 Hruschka, Daniel J, Simon Branford, Eric D Smith, Jon Wilkins, Andrew Meade, Mark Pagel, and Tanmoy Bhattacharya. 2015. Detecting regular sound changes in linguistics as events of concerted evolution. Current Biology 25(1): 1–9.
 Kass, Robert E and Adrian E Raftery. 1995. Bayes Factors. Journal of the American Statistical Association 90(430): 773–795.
 Kingman, John Frank Charles. 1982. The coalescent. Stochastic processes and their applications 13(3): 235–248.
 Lartillot, Nicolas and Hervé Philippe. 2006. Computing bayes factors using thermodynamic integration. Systematic Biology 55(2): 195–207. doi:10.1080/10635150500433722. URL http://dx.doi.org/10.1080/10635150500433722.
 Lepage, Thomas, David Bryant, Hervé Philippe, and Nicolas Lartillot. 2007. A general comparison of relaxed molecular clock models. Molecular biology and evolution 24(12): 2669–2680.
 Lewis, Paul O. 2001. A likelihood approach to estimating phylogeny from discrete morphological character data. Systematic biology 50(6): 913–925.
 Nicholls, Geoff K and Russell D Gray. 2008. Dated ancestral trees from binary trait data and their application to the diversification of languages. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 70(3): 545–566.
 Nordhoff, Sebastian and Harald Hammarström. 2011. Glottolog/Langdoc: Defining dialects, languages, and language families as collections of resources. In Proceedings of the First International Workshop on Linked Science, vol. 783.
 Rambaut, Andrew. 2016. Figtree v1.6. URL http://tree.bio.ed.ac.uk/software/figtree/.
 Rambaut, Andrew, Alexei J Drummond, and Marc Suchard. 2013. Tracer. URL http://tree.bio.ed.ac.uk/software/tracer/.
 Renfrew, Colin. 1987. Archaeology and language: The puzzle of Indo-European origins. London: Cape.
 Ringe, Don, Tandy Warnow, and Ann Taylor. 2002. Indo-European and computational cladistics. Transactions of the Philological Society 100(1): 59–129.
 Ronquist, Fredrik, Seraina Klopfstein, Lars Vilhelmsen, Susanne Schulmeister, Debra L Murray, and Alexandr P Rasnitsyn. 2012a. A total-evidence approach to dating with fossils, applied to the early radiation of the Hymenoptera. Systematic Biology 61(6): 973–999.
 Ronquist, Fredrik, Maxim Teslenko, Paul van der Mark, Daniel L Ayres, Aaron Darling, Sebastian Höhna, Bret Larget, Liang Liu, Marc A Suchard, and John P Huelsenbeck. 2012b. MrBayes 3.2: Efficient Bayesian Phylogenetic Inference and Model Choice Across a Large Model Space. Systematic Biology 61(3): 539–542.
 Ryder, Robin J and Geoff K Nicholls. 2011. Missing data in a stochastic Dollo model for binary trait data, and its application to the dating of Proto-Indo-European. Journal of the Royal Statistical Society: Series C (Applied Statistics) 60(1): 71–92.
 Stadler, Tanja. 2009. On incomplete sampling under birth–death models and connections to the sampling-based coalescent. Journal of Theoretical Biology 261(1): 58–66.
 Stadler, Tanja. 2010. Sampling-through-time in birth–death trees. Journal of Theoretical Biology 267(3): 396–404.
 Swadesh, Morris. 1952. Lexicostatistic dating of prehistoric ethnic contacts: with special reference to North American Indians and Eskimos. Proceedings of the American Philosophical Society 96(4): 452–463.
 Xie, Wangang, Paul O Lewis, Yu Fan, Lynn Kuo, and MingHui Chen. 2010. Improving marginal likelihood estimation for Bayesian phylogenetic model selection. Systematic Biology 60(2): 150–160.
 Yang, Ziheng. 1994. Estimating the pattern of nucleotide substitution. Journal of Molecular Evolution 39(1): 105–111.
 Yang, Ziheng. 2014. Molecular Evolution: A Statistical Approach. Oxford: Oxford University Press.
 Yang, Ziheng and Bruce Rannala. 1997. Bayesian phylogenetic inference using DNA sequences: a Markov chain Monte Carlo method. Molecular biology and evolution 14(7): 717–724.
 Zhang, Chi, Tanja Stadler, Seraina Klopfstein, Tracy A. Heath, and Fredrik Ronquist. 2016. Total-Evidence Dating under the Fossilized Birth–Death Process. Systematic Biology 65(2): 228–249. doi:10.1093/sysbio/syv080.