The Cultural Evolution of National Constitutions
Abstract
We explore how ideas from infectious disease and genetics can be used to uncover patterns of cultural inheritance and innovation in a corpus of 591 national constitutions spanning 1789–2008. Legal “ideas” are encoded as “topics” (words statistically linked in documents) derived from topic modeling the corpus of constitutions. Using these topics we derive a diffusion network for borrowing from ancestral constitutions back to the US Constitution of 1789 and reveal that constitutions are complex cultural recombinants. We find systematic variation in patterns of borrowing from ancestral texts and “biological”-like behavior in patterns of inheritance, with the distribution of “offspring” arising through a bounded preferential-attachment process. This process leads to a small number of highly innovative (influential) constitutions, some of which have not yet been identified as such in the current literature. Our findings thus shed new light on the critical nodes of the constitution-making network. The constitutional network structure reflects periods of intense constitution creation, and systematic patterns of variation in constitutional lifespan and temporal influence.
Introduction
Cultural inheritance involves the diffusion of innovations, a process of interest to both biologists (?, ?) and social scientists (?, ?). In biology inheritance is governed by mechanisms of genetic transmission, which have been quantified (?, ?). Cultural inheritance takes a variety of forms which can resemble variants of biological inheritance (?, ?, ?, ?), including cultural selection (?, ?, ?). In cultural domains, complex forms of knowledge are encoded in social norms, legal principles and scientific theories (?, ?, ?) and follow complex forms of transmission that involve the coordinated borrowing and learning of constellations of ideas, producing a diversity of phylogenetic patterns (?, ?).
Now that a large body of the cultural record has been digitized (including books (?, ?), music (?, ?), art (?, ?), etc.), new techniques of machine learning are making the quantitative analysis of high-dimensional cultural artifacts possible. In analogy with the biological sciences, and genetics in particular, this data mining approach to the analysis of culture is sometimes referred to as “culturomics” (?, ?), a term born of the consideration of the frequency distribution of an n-gram in the Google Books corpus over time (?, ?) as a proxy for how memes move in and out of the cultural record. Literature (and text generally) remains a primary focus of such work (see e.g., (?, ?, ?, ?)). A fascinating challenge is to supplement these correlation-based approaches to the understanding of cultural evolution with principled causal mechanisms directed at discovering fundamental, extra-biological evolutionary processes.
We consider the notion of diffusion patterns in the study of cultural inheritance as a means of tracking the diffusion of topics through the documents in a legal text corpus of five hundred and ninety-one national constitutions (the full list is given in the Supplementary Materials Table S1). “Topics” has a technical meaning here, and that is the sense in which the word is used throughout this paper: topics are probability distributions over words (positive weights that sum to one) that are the output of topic modeling. Topic modeling is a computational and statistical methodology for text analysis that has made great inroads throughout the humanities (see e.g., (?, ?)), to the point of reaching an almost “plug-and-play” form (see e.g., (?, ?)) for easy deployment. A set of topics is “learned” (i.e., automatically derived) from the corpus. The various topic distributions highlight (i.e., attach high weight to) different sets of words. In the best cases those words suggest a particular theme and an associated label for the topic. Texts in the corpus are partitioned into chunks, which are then represented as varying weighted mixtures of topics. In this way topics provide a low-dimensional representation of the corpus in terms of higher level ideas and provide a rigorous operational basis for a meme, to be tested against a suitable dynamics of inheritance. Although we focus on its use in the analysis of text, the topic modeling framework is more general and has been used in a number of areas (?, ?).
Given a topic of some significance in a work, embodied in a set of semantically correlated legal concepts, we track its appearance and prevalence in subsequent constitutions within the corpus, as well as its extinction. While dynamical considerations have been incorporated previously into topic models (?, ?, ?) this analysis differs in that we account for the diffusion of topics from document to document, and in this way reveal more clearly the patterns of genealogy and the essentially recombinant nature of textual artifacts. These resemble in the parallel domain of invention the recombinant quality of patents (?, ?). It is our contention that while culture is clearly an active in situ feature of human brains (?, ?), it is also present in material artifacts which afford rich forms of combinatorial manipulation and transmission ex situ.
The corpus of national constitutions is particularly well suited to a framing and analysis as a document corpus composed of units of correlated meaning evolving according to idea diffusion and borrowing. Indeed, scholars have demonstrated that many provisions in constitutions are copied from those of other countries. For example, through n-gram analysis Ginsburg et al. (?, ?) show that constitutional preambles, which are conceptualized as the most nationally localized part of constitutions, also speak in a universal idiom and include a good deal of borrowing. Law and Versteeg (?, ?) have shown that rights provisions have spread around the globe. Elkins et al. (?, ?, ?) show that some rights, such as freedom of expression, have become nearly universal, while others have not. Some even argue that there is a kind of global script at work, whereby nation-states seek to use constitutions to participate in global discourses (?, ?, ?, ?). This evolutionary framing of the creation of national constitutions draws on broader biological analogies for legal development across time and space (?, ?). Our use of diffusion trees as a framework for the study of this problem (see the Methods section in the Supplementary Materials for details) can be seen as a novel quantification of this biological analogy.
It is important that we are clear that this integration of topic modeling and diffusion networks enables only a quantitative articulation and tracking of instances of thematic similarity over time. The links we demonstrate across texts are consistent with a model in which one text influences another. However, our approach does not demonstrate the specific mechanisms by which influences are transmitted, so we focus instead on the sequential patterns in which textual material flows across time and space. As we demonstrate in our Discussion, this enables an analysis enhancing traditional scholarly opinion as regards the usual notion of “influence”, while also at times uncovering temporal connections suggesting further or new investigations.
Results
As mentioned, a topic is a probability distribution over a fixed vocabulary derived from a text corpus. It thus represents a correlated set of words encoding something like a “meme” or stochastic set of associations. (Technically, the preprocessing of the texts may result in some elements of the vocabulary set that are not words per se, but instead word stems, often called “tokens”. We will use the more colloquial term “word” in this paper.) The text corpus is partitioned into documents, sets of roughly contiguous groupings of words. This is a standard topic modeling document length, short enough to reflect local context and long enough for the statistical model to be sensible. In the best case each constitution would be partitioned into contiguous word blocks, but processing may remove the odd abbreviation, title, etc., besides respecting natural boundaries, such as the end of one constitution and the beginning of another. In the case of our corpus of constitutions, each constitution generally comprises a subset of such documents. The model does not take into account word order, just which words occur and with what frequencies. This is the so-called “bag-of-words” model or representation, which is then encoded as a probability distribution over the vocabulary (the frequencies are positive and sum to one).
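The bag-of-words encoding described above can be sketched in a few lines. This is a minimal illustration only: the function name `bag_of_words` and the toy sentence are hypothetical, and real documents would be preprocessed chunks of a constitution.

```python
from collections import Counter

def bag_of_words(tokens):
    """Return the normalized word-frequency distribution of a token list.

    Word order is discarded; only which words occur, and how often, is kept.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

# Hypothetical toy "document" (12 tokens), not taken from the corpus.
doc = "the people have the right to vote and the right to assemble".split()
dist = bag_of_words(doc)
```

The resulting dictionary maps each word to a positive frequency, and the frequencies sum to one, as the representation requires.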
Topic modeling is a methodology for learning topics such that each document (represented as a bag of words) is represented as a weighted sum (mixture) of topics. In its generative form, the topic model encodes the creation of each word of a document by first choosing a topic according to the mixture of topics that the document comprises and then choosing a word according to the distribution of that particular topic. In this respect a constitution can be thought of as a “meme cloud” with the topics encoding the memes. We use the latent Dirichlet allocation (LDA) topic model (see (?, ?) for a discussion of the various parameters that define the model). LDA is effectively the topic modeling industry standard. We tested several choices for the number of topics and settled on a value that we then validated (cf. the Methods section in the Supplementary Materials for details).
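The generative reading of the topic model can be made concrete with a short simulation. The sketch below is an assumption-laden illustration, not the model fitted in this paper: it takes the topic mixture and the topic distributions as given (full LDA also draws them from Dirichlet priors, which is omitted here), and the topic names, words and weights are hypothetical.

```python
import random

def generate_document(topic_mixture, topics, length, seed=0):
    """Sample words one at a time: pick a topic from the document's
    mixture, then pick a word from that topic's distribution."""
    rng = random.Random(seed)
    topic_names = list(topic_mixture)
    mixture_weights = [topic_mixture[t] for t in topic_names]
    words = []
    for _ in range(length):
        topic = rng.choices(topic_names, weights=mixture_weights)[0]
        vocab = list(topics[topic])
        word_weights = [topics[topic][w] for w in vocab]
        words.append(rng.choices(vocab, weights=word_weights)[0])
    return words

# Hypothetical topics and mixture, for illustration only.
topics = {
    "rights": {"right": 0.5, "freedom": 0.3, "citizens": 0.2},
    "legislature": {"session": 0.6, "deputies": 0.4},
}
mixture = {"rights": 0.7, "legislature": 0.3}
sample = generate_document(mixture, topics, length=20)
```

Fitting a topic model runs this process in reverse: given only the observed words, it infers topics and mixtures under which the corpus is most probable.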
The output of the topic model forms the basis for our results. They include (1) the discovery of the topics that make up the corpus of constitutions; (2) the determination of their flow through time (“information cascades”); (3) the reconstruction of cultural diffusion trees; (4) network analysis of diffusion trees; and (5) discovery of a very biological pattern of inheritance with a highly skewed pattern of cultural fertility.
Topics
The topics were “hand-labeled” by a constitution expert. Note that “hand-labeling” of topics is standard. Further elaboration on this can be found in our Discussion. Since each constitution generally comprises a set of corpus documents, we assign an overall constitutional weight for a topic as the average topic weight over the documents that it comprises. In Table 1 we list the topics with the largest average topic weight (over all the constitutions), along with the ten most probable (heavily weighted) words (in decreasing order) for each topic. [Footnote 1: A full list of the topics, in order of average weight, with the weights of the top 20 words can be found at https://www.math.dartmouth.edu/rockmore/topics_weight_order.txt.]
Topic name | Top 10 words in topic
General rights | right rights citizens freedom law public guaranteed citizen everyone religious
Sovereignty | national people sovereignty law rights state flag language international equal
public order | law public cases order one property laws authority liberty civil
separation of powers | congress executive laws power ministers state secretaries order necessary public
organic law | law government president organization national organic public laws social functioning
socialism | people socialist country revolution working popular citizens system society development
legislative sessions | session deputies sessions deputy members elected first vote majority extraordinary
bureaucracy | papers years state department necessary respective individuals departments body power
socialism legislature | people organs state supreme work organ presidium elected decisions committees
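The constitution-level topic weight used to rank the topics in Table 1 is an average over the documents (chunks) that make up a constitution. A minimal sketch, assuming each document is given as a list of topic weights in a fixed topic order; the function name and the numbers are hypothetical:

```python
def constitution_topic_weights(document_weights):
    """Average per-document topic weights into one vector per constitution."""
    n_docs = len(document_weights)
    n_topics = len(document_weights[0])
    return [
        sum(doc[k] for doc in document_weights) / n_docs
        for k in range(n_topics)
    ]

# Two hypothetical chunks of one constitution, three topics each.
chunks = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]]
weights = constitution_topic_weights(chunks)
```

Because every chunk's weights sum to one, the averaged vector is again a probability distribution over topics.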
Influence and clustering
The identification of the topics now gives a natural way to represent a constitution as a mixture of probability distributions. With that, we can compare constitutions quantitatively and arrive at a quantitative notion of influence, completely driven by the data of the words. A first coarse pass at this is to create a constitutional “family tree”, where the (unique) immediate ancestor of any given constitution is simply the constitution closest to it among all earlier constitutions. Given that our constitutions are now represented as probability distributions (over topics), a natural measure of distance is the Kullback-Leibler (KL) divergence. Recall that the KL divergence of probability distributions $P$ and $Q$ is defined as $D_{KL}(P \| Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}$. KL is inherently nonsymmetric. A standard interpretation [Footnote 2: see e.g., https://en.wikipedia.org/wiki/KullbackLeibler_divergence] is the degree to which a distribution $Q$ approximates another distribution $P$. So thinking of an earlier constitution as a potential model for a newly written constitution, the KL divergence of their underlying topic probability distributions is a natural measure of similarity.
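The nearest-ancestor rule behind the family tree can be sketched as follows. This is an illustration under stated assumptions: the newer constitution plays the role of the distribution being approximated and the earlier one the approximating distribution; zero weights are guarded with a small `eps`, a choice made here rather than one taken from the paper; and the constitution names, years and mixtures are hypothetical.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for discrete distributions given as equal-length lists."""
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def nearest_ancestor(name, constitutions):
    """Return the earlier constitution whose topic mixture best
    approximates that of `name`, or None if there is no earlier one."""
    year, mix = constitutions[name]
    earlier = {n: m for n, (y, m) in constitutions.items() if y < year}
    if not earlier:
        return None
    return min(earlier, key=lambda n: kl_divergence(mix, earlier[n]))

# Hypothetical constitutions: name -> (year, topic mixture).
toy = {
    "A": (1800, [0.8, 0.1, 0.1]),
    "B": (1850, [0.2, 0.6, 0.2]),
    "C": (1900, [0.25, 0.55, 0.2]),
}
parent_of_c = nearest_ancestor("C", toy)
```

Applying `nearest_ancestor` to every constitution yields one parent per node and hence a tree rooted at the earliest text.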
The “KL Constitution Tree” is shown in Figure 1. Note that the figure is not scaled horizontally for time. The size and form of the representation make it difficult to reproduce legibly here, so a separate, readily magnifiable pdf document can be found online. [Footnote 3: See https://www.math.dartmouth.edu/rockmore/kltree.pdf.] We also include a detail.
The KL-tree is a coarse and aggregate articulation of the notion that constitutional ideas flow in time. It is also purely correlative and local. We should also like to explore global patterns of influence and the possibility of causal influence. We approach this by considering the “flow” of topics through constitutions and through time. Each instance of a topic appearing in a constitution (above some fixed threshold) is treated as part of a “cascade”. We follow standard conventions (?, ?) and define an information cascade for a topic as the collection of constitutions, with their timestamps, in which that topic makes up a proportion greater than a robust threshold value. When two constitutions (nodes) both express a topic above threshold, we consider this pair as a candidate for information “cascading” from the earlier to the later.
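The construction of cascades from thresholded topic weights can be sketched directly. The code is a minimal illustration: the threshold value, the constitution names and the weights are hypothetical, and the candidate pairs it lists are only inputs to a later network-inference step, not inferred edges.

```python
def build_cascades(constitutions, threshold):
    """Map each topic index to the time-ordered list of (year, name)
    for constitutions expressing that topic above the threshold."""
    cascades = {}
    for name, (year, weights) in constitutions.items():
        for topic, weight in enumerate(weights):
            if weight > threshold:
                cascades.setdefault(topic, []).append((year, name))
    return {topic: sorted(members) for topic, members in cascades.items()}

def candidate_edges(cascade):
    """All (earlier, later) pairs in one cascade; equal years give no pair."""
    return [(a, b) for (ya, a) in cascade for (yb, b) in cascade if ya < yb]

# Hypothetical constitutions: name -> (year, topic weights).
toy = {
    "X": (1800, [0.6, 0.4]),
    "Y": (1850, [0.7, 0.3]),
    "Z": (1900, [0.1, 0.9]),
}
cascades = build_cascades(toy, threshold=0.35)
pairs = candidate_edges(cascades[0])
```

Each topic yields one cascade, so a corpus with many topics yields many overlapping cascades over the same set of constitutions.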
The topic cascades form the underlying data for a mode of inference for how ideas represented by topics are likely to have propagated through the corpus over time. As stated previously, we view the observation of a topic (above some threshold) in two constitutions as a quantitative measure indicating correlation across time. Given the content of the topics and the fact that the constitutions are ordered chronologically and typically clustered spatially (see the Network Analysis subsection below and Figure 2), shared topics may very well have spread from the earlier to the later, and hence are at least consistent with weak causality. In order to learn the most likely propagation structure of the topics (given the data) we estimate an underlying diffusion network for the corpus (?, ?). A diffusion network is a directed graph with nodes corresponding to constitutions and where the edges satisfy the condition that the source constitution predates the destination constitution. This imposes weak causal structure on the correlations. Importantly, we do not observe the diffusion network, but only the cascades that are assumed to diffuse over it and are consistent with it. In brief, we define a probabilistic model describing the consistency of the observed cascades with respect to a fixed diffusion network. The diffusion network is that which (approximately) maximizes this probability (?, ?).
The presentation of the full diffusion tree on our corpus presents some visualization challenges. To give a sense of what it looks like, Figure S1 shows the entire learned diffusion tree on a restricted set of ninety-nine constitutions. Even this is too dense to be inspected visually for information, but the figure at least gives a good sense of the way in which the methodology reifies the phenomenon of idea diffusion. Each of the edges (directed and extending downward) indicates particular topics diffusing forward in time to be taken up by subsequent constitutions. Issues of readability make it impossible to put labels on the various edges. The optimization algorithm that produces the diffusion network only collects a subset of the topics that appear in a constitution. Some diffuse forward, others do not. The “offspring” of a given constitution thus borrow certain “ideas” of the parents, but others are created afresh, presumably depending on legally appropriate contextual factors.
Network analysis
In order to discern patterns in the diffusion tree, the diffusion network is subjected to a clustering analysis. This picks out communities of constitutions by methods of community detection and optimal modularity: groups of constitutions that share topics, and thereby a directed edge, in an amount above that expected by chance. Such a community constitutes a cluster (?, ?). Figure 2 displays the results of a network reconstruction of the full circuit along with two color codings of the network resulting from the application of two forms of clustering analysis to the network. The network is illustrated using spring embedding, whereby densely connected nodes appear packed together. The network has the form of a “constitutional caterpillar” with a temporal spine threaded through the network spanning 1789 to 2014 (Figure 2A). This temporal structure is very clear in the clustering coloring. Using community structure algorithms (?, ?) we observe (Figure 2B) three clear constitutional communities, each of which describes a span of time: epoch 1, from 1789 to 1936; epoch 2, from 1937 to 1967; and epoch 3, from 1968 to 2014. Using a spectral technique for community detection we can further partition (Figure 2C) these network data into higher order communities (?, ?). This analysis maintains the chronological structure and illustrates the way in which clusters that are growing in absolute size (more constitutions in each) have evolved to encompass roughly decreasing ranges of time.
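The idea of a community as a group with more internal edges than expected by chance is what a modularity score measures. The sketch below computes the standard Newman modularity of a given partition; it does not search for the optimal partition, it treats the network as undirected (a simplification of the directed diffusion network), and the graph and labels are hypothetical.

```python
def modularity(edges, community):
    """Newman modularity Q of an undirected graph for a node -> community map.

    Q = sum over communities c of  L_c / m - (d_c / (2m))^2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    internal = {}
    degree = {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (degree[c] / (2 * m)) ** 2 for c in degree
    )

# Hypothetical graph: two triangles joined by a single bridge edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
q_split = modularity(edges, split)
q_single = modularity(edges, {n: 0 for n in split})
```

Splitting the toy graph into its two triangles scores higher than lumping all nodes together, which is the sense in which a partition is better than chance.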
Each constitution in the diffusion tree can be described in terms of its transmission motif, or “t-motif”, a visualization of the in-degree and out-degree for each constitution. A selection of these motifs is shown in Figure 3, with a full set in Supplementary Materials Figure S2. The motifs demonstrate the variation to be found in balancing inbound and outbound influence for each constitution. Early constitutions tend to have few parents (e.g., Canada has only one, the US constitution) whereas subsequent constitutions vary significantly in their ancestry. This variation can be explained through a combination of both time (earlier constitutions present more opportunities for imitation) and how representative, novel and applicable each constitution is as a model for imitation.
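The in-degree (number of parents) and out-degree (number of offspring) behind each motif follow directly from the edge list of the diffusion network. A minimal sketch, assuming edges are stored as (parent, child) pairs; the names are hypothetical placeholders:

```python
from collections import Counter

def degree_motifs(edges):
    """Return {constitution: (in_degree, out_degree)} for (parent, child) edges."""
    in_deg = Counter(child for _, child in edges)
    out_deg = Counter(parent for parent, _ in edges)
    nodes = set(in_deg) | set(out_deg)
    return {n: (in_deg[n], out_deg[n]) for n in nodes}

# Hypothetical diffusion edges: (parent, child).
toy_edges = [("P", "Q"), ("P", "R"), ("Q", "R"), ("P", "S")]
motifs = degree_motifs(toy_edges)
```

Here the earliest node has no parents and several offspring, while the latest nodes have parents and no offspring.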
Models for transmission
We can gain further insight into the patterns of inheritance by studying directly the distributions of in-degree and out-degree across the entire dataset. Figures 4A and 4B show the pdf (probability density function) and cdf (cumulative distribution function) of the in-degree for all constitutions. The data are shown in blue, and the best-fitting distribution under maximum likelihood parameter estimates in orange. The in-degree distribution is well captured by a Gaussian distribution with mean and standard deviation estimated by maximum likelihood. The estimated distribution does tend to slightly underestimate the mean but captures the tails very accurately. A straightforward interpretation is one of independent sampling of possible sources. The out-degree, however, is quite different. Figures 4C and 4D show the best-fitting Poisson distribution and the out-degree distribution. Whereas the mean is effectively recovered, the tails of the distribution are poorly fitted; the Poisson underestimates the number of constitutions with few offspring and overestimates the number of constitutions with many offspring. On the other hand, consider Figures 4E and 4F, where we show the best-fitting negative binomial distribution to the data. This very accurately recovers the entire offspring distribution, with maximum likelihood estimates for the two shape parameters of the distribution, $r$ and $p$. Recall that for a negative binomial, $r$ describes the number of offspring observed before no more offspring are generated, and the probability of producing an offspring is given by the value of $p$. We view this as a pure birth process, as constitutions never die, in the sense that they are always available as inspiration for a newly written constitution. Moreover, negative binomial distributions are well known to be attractors of the Yule process (?, ?), also known as “preferential attachment” (?, ?).
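The comparison between a Poisson and a negative binomial fit to an offspring distribution can be sketched with log-likelihoods. Several things here are assumptions: the negative binomial is parameterized as $P(k) = \frac{\Gamma(k+r)}{\Gamma(r)\,k!}\, p^r (1-p)^k$; its parameters are estimated by the method of moments rather than by maximum likelihood as in the text; and the out-degree counts are made-up illustrative data, not the paper's.

```python
import math

def poisson_loglik(data, lam):
    """Log-likelihood of count data under a Poisson with mean lam."""
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in data)

def negbin_loglik(data, r, p):
    """Log-likelihood under a negative binomial with shape r and probability p."""
    return sum(
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p) + k * math.log(1 - p)
        for k in data
    )

def negbin_moments(data):
    """Method-of-moments estimates (r, p); requires variance > mean."""
    n = len(data)
    mean = sum(data) / n
    var = sum((k - mean) ** 2 for k in data) / n
    if var <= mean:
        raise ValueError("data are not overdispersed")
    return mean ** 2 / (var - mean), mean / var

# Hypothetical, heavily skewed out-degree counts.
out_degrees = [0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 5, 8, 20, 33]
lam = sum(out_degrees) / len(out_degrees)   # Poisson MLE is the sample mean
r, p = negbin_moments(out_degrees)
ll_poisson = poisson_loglik(out_degrees, lam)
ll_negbin = negbin_loglik(out_degrees, r, p)
```

On skewed counts of this kind the negative binomial attains a much higher log-likelihood, because a Poisson forces the variance to equal the mean and so cannot accommodate both the many small values and the few very large ones.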
The excellent fit of out-degree to this distribution has broader implications for connections between offspring number and longevity. In short, we witness a small number of relatively early constitutions of enduring influence. All of this, including the attendant modeling considerations, is considered in greater detail in the Discussion below.
Growth and Lifespans
We are able to track the number of new constitutions written over time. We find statistical evidence for three epochs of authorship reflecting three distinct rates of growth (Figure 5 inset). These three growth phases coincide with the three temporal groupings of the transmission graph determined through spectral clustering. Hence there is an association between the growth rate and the detailed community structure of the graph. We also find significant variation in the lifespan of constitutions. The lifespan is defined as the time from a constitution's first appearance to its last instance of influence. There is a strong association between how early a constitution is written and how long it is observed to live. Unlike biological life spans, nearly all constitutions “die” young (Figure 5).
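The lifespan definition can be computed from the diffusion edges and the adoption years. In this sketch the “last instance of influence” is taken to be the year of a constitution's latest child in the network, and a constitution with no offspring is assigned a lifespan of zero; both readings are assumptions, and the names and years are hypothetical.

```python
def lifespans(years, edges):
    """Lifespan = year of the latest child minus the constitution's own year.

    `years` maps name -> adoption year; `edges` are (parent, child) pairs.
    Constitutions without offspring get a lifespan of 0.
    """
    last_influence = {}
    for parent, child in edges:
        current = last_influence.get(parent, years[parent])
        last_influence[parent] = max(current, years[child])
    return {
        name: last_influence.get(name, year) - year
        for name, year in years.items()
    }

# Hypothetical years and diffusion edges.
toy_years = {"P": 1800, "Q": 1850, "R": 1900, "S": 1990}
toy_edges = [("P", "Q"), ("P", "S"), ("Q", "R")]
spans = lifespans(toy_years, toy_edges)
```

In the toy data the earliest constitution has by far the longest lifespan, mirroring the association between early adoption and longevity described above.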
Discussion
We have searched for regular patterns of transmission in complex cultural artifacts. If there are cultural analogs to genotypes, and perhaps even phenotypes if we were to consider the broader context of constitutional influence, we should be able to observe their signatures in a temporally resolved study of evolving documents. Much like organisms that adapt to local environments, constitutions must be adapted to local cultural and legal conditions to be effective. And as with organisms, a great deal of variability in constitutions has been documented or inferred as derived from ancestral documents.
Our deeper discussion of the results starts with the labeling of the topics. We had an expert in constitutional law inspect the learned topics and provide labels for them corresponding to the dominant theme of the most probable words in each topic. We note that providing labels for the learned topics is a challenging task due to the lack of ground truth. Assigning labels to topics in our setting is essentially projecting the learned topics onto one’s conception of constitutional law and (admittedly) depends heavily on the individual involved contributing both bias and variance to the procedure. We assume that an expert in the field mitigates both of these effects and allows us to study the corpus using the learned topics.
Perhaps given the nature of the topic labeling problem (a general lack of ground truth) there is not much prior work on solving it. An early line of research examined whether commonly used predictive measures of topic models correlated with human interpretation of the topics and found that they did not (?, ?). This previous work also was the first to use human experiments to evaluate the interpretation of learned topics. More recent work has focused on incorporating knowledge bases of topics (e.g., WordNet) directly into topic models in order to encourage the model to learn topics that are interpretable by biasing them to look like topics in the knowledge base (?, ?). This is an interesting and difficult problem and further progress on it would enhance the results of this paper.
The motifs (Figure 3) illustrate clearly how constitutions are “cultural recombinants” borrowing extensively from their ancestors. Constitutions vary in their hybridicity. The motif variations suggest a constitution taxonomy of minor, major, idiosyncratic, and innovative, depending on where in the distribution matrix (divided via the median in both dimensions) the in-degree and out-degree lie. As an example of a minor constitution, consider Switzerland 1848. It had no descendants and only two parents (Liberia 1847 and El Salvador 1843, both of which are probably explained by temporal proximity). A major constitution, on the other hand, might be Thailand’s 1932 Constitution, which established a constitutional monarchy and a European-style administrative system: it had 15 parents and 33 offspring, making it the third most densely networked in the data. Idiosyncratic constitutions include those of Burkina Faso 1991 and Lesotho 1983, with twelve and nine parents respectively, but only a single offspring each. Some 20% of texts in the data have a child/parent ratio of one half or less, indicating at least twice as many parental relationships as offspring. On the other hand, some 8% of constitutions in the sample have a child/parent ratio of two or more, indicating relatively high levels of innovation. Examples include Zambia’s 1991 constitution, with 4 parents and 11 offspring, or Micronesia’s constitution of 1990, with 8 parents and 24 offspring; in the latter case, it may be that the offspring are in fact those of the United States 1789, which was a very close model for Micronesian drafters. In general, parent-child relationships are temporally proximate, and they are often geographically proximate. This reflects the more general finding in the literature that time and space are powerful determinants of constitutional content. This diversity highlights an important difference from biology, where species of organisms show far less variation in the basic mechanics of transmission.
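The four-way taxonomy can be expressed as a median split on in-degree and out-degree. The sketch below uses the parent and offspring counts quoted in this section, but two choices are assumptions: the medians are computed over these six examples only (not over the full corpus), and a value counts as “high” only when it is strictly above the median.

```python
from statistics import median

def classify(motifs):
    """Label each constitution by a median split on (in_degree, out_degree)."""
    in_med = median(i for i, _ in motifs.values())
    out_med = median(o for _, o in motifs.values())
    labels = {}
    for name, (in_deg, out_deg) in motifs.items():
        high_in, high_out = in_deg > in_med, out_deg > out_med
        if high_in and high_out:
            labels[name] = "major"          # many parents, many offspring
        elif high_in:
            labels[name] = "idiosyncratic"  # many parents, few offspring
        elif high_out:
            labels[name] = "innovative"     # few parents, many offspring
        else:
            labels[name] = "minor"          # few parents, few offspring
    return labels

# (parents, offspring) as quoted in the text.
examples = {
    "Switzerland 1848": (2, 0),
    "Thailand 1932": (15, 33),
    "Burkina Faso 1991": (12, 1),
    "Lesotho 1983": (9, 1),
    "Zambia 1991": (4, 11),
    "Micronesia 1990": (8, 24),
}
labels = classify(examples)
```

With corpus-wide medians the boundaries would move, so the labels produced here illustrate the rule rather than reproduce the paper's classification.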
Returning to the highlighted portion of Figure 1 to illustrate the mechanisms at play, consider Egypt’s 1923 Constitution and its relationship with those of its descendants. Examining the top ten topics in each text, Egypt 1923 shares multiple topics with Albania 1925 (the topics act and public office), Iraq’s 1925 documents (civil service and monarchy), and Burundi (public office and labor), and one with Yugoslavia 1931 (mandate). No other constitutional dyad features this combination of topics in the same density. While the influence of Egypt’s 1923 Constitution is well known to scholars of the Arab region, it also seems to share similarities with other documents drafted shortly thereafter in neighboring parts of Europe and Africa. This illustrates how our method can point scholars to look at new links that conventional analysis might not identify.
The most fecund constitution in our network is surprising at first glance: Paraguay’s 1813 Constitution. It makes sense, however, when one realizes that Latin America is the home to a plurality of constitutional texts, because it is a region of old nation states and frequent turnover (?, ?). Paraguay’s was the first constitution adopted in Latin America after the Spanish Constitution of Cadiz of 1812. That document embodied an ill-fated attempt to establish a liberal constitutional monarchy in Spain, featuring equality under the law and popular sovereignty, and is recognized as a model for the constitutions of Norway of 1814, Portugal of 1822 and Mexico of 1824. The top topic in this Constitution, “language of law”, consists of generic legal terms that are, of course, widely used in constitutional texts. So the influence was more formal than substantive.
Conversely, some canonical constitutions do not indicate the same kind of influence in our analysis that conventional analysis would expect. For example, the 1936 Constitution of the Soviet Union is well known as a major step in the ideological development of communism in that it incorporated many rights that were never implemented. Yet at the level of ideas, much of this involved borrowing from extant models, such as the 1931 Republican constitution of Spain. Perhaps unsurprisingly, there was little that was new in the USSR’s constitution and so it has few children. Similarly, the Weimar Constitution of 1919, which was thought to have embodied social democratic ideas (?, ?), in fact was squarely within the topical mainstream of its time. With six parents and nine offspring, it is near the medians, and its oldest direct ancestor is only 14 years prior to it. It shares three of its top ten topics (“geography”, “human rights”, and “education”) with Spain’s Republican Constitution of 1931, which is regarded as an important and influential text. Its last direct descendant is the 1936 Constitution of the Soviet Union, with which it shares the topic “social development.” This supports the claim that our method emphasizes ideological connections across texts, because the Weimar Constitution is generally considered to have been a structural model for France’s 1958 Constitution (?, ?) though ideologically perhaps closer to that of the USSR.
The notion of “cultural recombination” imports one kind of biological analogy to the evolution of constitutions. The distributions of the in-degree and out-degree support different biological analogies. Consider again the striking result of the fit of the out-degree distribution to the negative binomial and the in-degree to the Gaussian. A principled way to understand these distributions is to derive them from suitable stochastic processes. The Gaussian distribution arises naturally from the sum of independent random variables with a well defined mean and variance. Poisson distributions are attractors of the Galton-Watson process whereas negative binomial distributions are attractors of the Yule process (see e.g., (?, ?)). Both Poisson and negative binomial offspring distributions are observed frequently in biological systems. The Galton-Watson process was derived to explain the extinction of family names. The idea is that at each generation a parent can transmit their name to some number of offspring. Each parent samples the number of offspring independently from the same distribution. Our data support a negative binomial distribution so we shall focus on the Yule process. The Yule process is also well known as a preferential attachment process (?, ?), as it can be derived from an “urn process” in which balls of a given color are sampled in linear proportion to the number of balls already in each urn. The negative binomial distribution is derived by solving a simple recurrence equation describing the temporal evolution of a probability distribution of the form
\[
\frac{dp_n(t)}{dt} = -\lambda n \, p_n(t) + \lambda (n-1) \, p_{n-1}(t).
\]
Here $p_n(t)$ is the probability of finding $n$ constitutions at time $t$. The rate of offspring production in some interval $\Delta t$ is parameterized by $\lambda$. Hence at a time $t$ the probability of a number $n$ of constitutions will decline through the addition of more offspring at a rate proportional to $\lambda n$ and increase through the production of offspring by the $n-1$ class at a rate $\lambda (n-1)$. If we establish an initial condition as the number of constitutions at the start of constitutional history as $n_0$, $p_{n_0}(0) = 1$, we find that
\[
p_n(t) = \binom{n-1}{n_0-1} e^{-\lambda n_0 t} \left(1 - e^{-\lambda t}\right)^{n-n_0},
\]
which takes the form of the negative binomial distribution in which we observe exactly $n - n_0$ offspring (failures) in $n$ trials with a success probability $e^{-\lambda t}$. For a formal exposition of preferential attachment dynamics illustrating the relationship of negative binomials to the special case of power laws see (?, ?).
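The standard closed-form solution of the Yule process can be checked numerically against its recurrence. The sketch below assumes the textbook form of the process: a pure birth process with per-capita rate `lam`, started from `n0` individuals, whose distribution at time `t` is negative binomial with success probability `exp(-lam * t)`. The symbol names and parameter values are illustrative choices, not values from the paper.

```python
import math

def yule_pmf(n, n0, lam, t):
    """P(population size = n at time t) for a Yule process started at n0."""
    if n < n0:
        return 0.0
    q = math.exp(-lam * t)  # success probability of the negative binomial
    return math.comb(n - 1, n0 - 1) * q ** n0 * (1 - q) ** (n - n0)

def recurrence_rhs(n, n0, lam, t):
    """Right-hand side of dp_n/dt = -lam*n*p_n + lam*(n-1)*p_{n-1}."""
    return (-lam * n * yule_pmf(n, n0, lam, t)
            + lam * (n - 1) * yule_pmf(n - 1, n0, lam, t))

def numerical_derivative(n, n0, lam, t, h=1e-5):
    """Central finite-difference estimate of dp_n/dt."""
    return (yule_pmf(n, n0, lam, t + h) - yule_pmf(n, n0, lam, t - h)) / (2 * h)

n0, lam, t = 2, 0.5, 1.0
total = sum(yule_pmf(n, n0, lam, t) for n in range(n0, 400))
mean = sum(n * yule_pmf(n, n0, lam, t) for n in range(n0, 400))
```

The probabilities sum to one, the mean grows as `n0 * exp(lam * t)`, and the finite-difference derivative agrees with the recurrence, which is the sense in which the negative binomial solves the birth process.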
We can test the assumptions of the Yule process by looking directly at the imitation dynamics of any given constitution. We simply plot the date on which each descendant of a given constitution was created against the order in which it was created. In Figure 6A we look at the evolution of the first constitutions. By far the majority have fewer than offspring and these offspring span a range of under years. However, a few of these constitutions are exceptional. The most remarkable is the 1813 constitution of Paraguay that has provided material for descendant constitutions in a temporal range extending years. This is followed by the original constitution of the United States of America from 1789, which produces 20 descendant constitutions over a span of years. The Canadian constitution of 1791 produces 11 descendants over 150 years. Figure 6B includes the first constitutions, 6C the first , and 6D all in the data set. A clear relationship between offspring number and longevity emerges, consistent with preferential attachment: a small number of constitutions are of dominant influence, having appeared early in constitutional history and gained a significant foothold, while the vast majority of constitutions are both short lived and produce fewer than 10 offspring.
The analysis of cultural recombination through a principled decomposition of textual artifacts suggests new domains of cultural inheritance. Unlike simple Mendelian systems, or simple learning models with homogeneous rules, we observe diverse patterns of variation in the way in which nations encode important moral and legal principles. Moreover, we can obtain a principled definition of a meme, or unit of cultural transmission, that goes beyond the single “word” and captures highly linked sets of words expressing a functional, legal category, much the way a gene, composed of linked sets of nucleotides, contributes to a function. Nations differ in their debt to the past and their original contributions to the future. This allows us to speak in a rigorous fashion about phylogenetic concepts like analogy and homology when it comes to a cultural artifact. This has been an area of active research, which includes the formal analysis of cultural and symbolic systems (?, ?, ?, ?), experimental approaches to cultural transmission (?, ?, ?), and qualitative frameworks of integration (?, ?). At present the status of key phylogenetic concepts applied to culture is in flux (?, ?); we favor an instrumental approach, defining cultural analogy and homology strictly in phylogenetic terms.
We suggest that the “semantic” interpretation of a given constitution and its practical legal impact is what we mean by the phenotype. We might expect many different genotypes to be neutral in that their interpretations are equivalent, and that constitutions vary in their “penetrance”, that is their influence on cultural practices.
This approach builds on prior research related to concepts such as “citation backbones” (?, ?) in which citations to prior publications form a treelike structure from which novel papers descend, patent backbones in the automobile industry (?, ?), skewed patterns of borrowing in human designed artifacts (?, ?), patterns of word borrowing (?, ?) and the evolution of programming languages (?, ?).
Reconciling statistical patterns of influence with potential biases and patterns in thinking and writing will bring us closer to frameworks that connect methods of mathematical science with objects of psychological and humanistic interest in the service of new models and theories of cultural transmission and influence. The evolution of the law with its rich textual and interpretive traditions provides a nearly ideal model system.
Supplementary Materials for The Cultural Evolution of National Constitutions: Supporting Information
Materials and Methods
As explained, our results and methodology depend on the use of topic models (see e.g., (?, ?)) and diffusion networks. Topic models are statistical models to learn the underlying structure of a corpus of documents. There are many flavors of topic model. We use the Latent Dirichlet Allocation (LDA) (?, ?) probabilistic generative topic model. The underlying topics are represented as latent variables in a hierarchical Bayesian model. A generative model is assumed to be responsible for the observed documents and the word distributions of each topic. The topic proportions of each document can be learned via estimation of the posterior distribution of latent variables conditioned on the observed documents. The topic representations of the constitutions then form the underlying data for the inference of the diffusion network a la (?, ?). Some details of this now follow.
Materials
Our basic materials are 591 constitutions in English obtained from the publicly available and accessible Comparative Constitutions Project website (http://comparativeconstitutionsproject.org/). A complete list of the constitutions we use is in Table S1.
Methods
Topic modeling
The foundation of our text analysis is the use of a form of topic modeling on the corpus of 591 constitutions from which we derive a diffusion network for the inferred topics. A topic is a probability distribution over a fixed vocabulary derived from a text corpus. The corpus is composed of documents, where a document consists of a set of (possibly nonunique) words from the vocabulary.
We obtained PDF versions of the constitutions from (?, ?) and converted them to text files. Table S1 provides a list of the constitutions. The documents in the corpus are contiguous blocks of text obtained by partitioning the constitutional texts. We set a document length of 500 words and also require that documents respect the borders of constitutions (i.e., no document straddles multiple constitutions). If a document is too long, the learned topics will put similar probability on many words and thus will not capture our intuitive notion of a topic. If a document is too short, the resulting topics are overfit to specific documents, as there is insufficient data to learn general topics that can be used across the corpus. In addition, the choice of document length depends on the type of structure we are interested in: short documents are good for learning localized, specific topics, whereas longer documents yield smooth topics that explain large portions of the corpus.
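The partitioning step can be sketched as below. The function names are hypothetical, tokens are taken to be whitespace-separated words, and keeping a final chunk shorter than 500 words is an assumption, since the text does not say how remainders are handled.

```python
def chunk_constitution(text, doc_len=500):
    """Split one constitution into consecutive documents of at most
    doc_len whitespace-separated tokens."""
    tokens = text.split()
    return [tokens[i:i + doc_len] for i in range(0, len(tokens), doc_len)]

def build_corpus(constitutions, doc_len=500):
    """constitutions: dict mapping a label to its raw text.
    Returns a list of (label, tokens) documents. Because each text is
    chunked on its own, no document straddles two constitutions."""
    corpus = []
    for label, text in constitutions.items():
        for chunk in chunk_constitution(text, doc_len):
            corpus.append((label, chunk))
    return corpus
```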
We use a standard methodology for further preprocessing: we stem the documents using the well-known NLTK stemming package (http://www.nltk.org/api/nltk.stem.html) and remove English stopwords as well as words that appear fewer than 20 times across the entire corpus. We also remove words that appear in over 90% of the corpus. The resulting vocabulary consists of 3,546 unique terms. We then computed the number of occurrences of each word in the vocabulary in each document, so that each document is represented by a 3,546-dimensional vector whose $i$th entry contains the number of occurrences of token $i$ in the document. This is a bag-of-words representation: we assume that the order of words within a document does not matter (a property also referred to as exchangeability), and additionally that the order of the documents, both within and between constitutions, does not matter.
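A minimal sketch of the vocabulary filter and count vectors follows. It assumes the tokens have already been stemmed, reads "over 90% of the corpus" as "over 90% of documents", and uses hypothetical function names; none of this is confirmed as the authors' exact pipeline.

```python
from collections import Counter

def build_vocabulary(docs, stopwords, min_count=20, max_doc_frac=0.9):
    """docs: list of token lists. Returns a dict mapping word -> index."""
    total = Counter()     # occurrences across the whole corpus
    doc_freq = Counter()  # number of documents containing the word
    for tokens in docs:
        total.update(tokens)
        doc_freq.update(set(tokens))
    n_docs = len(docs)
    vocab = sorted(
        w for w in total
        if w not in stopwords
        and total[w] >= min_count
        and doc_freq[w] <= max_doc_frac * n_docs
    )
    return {w: i for i, w in enumerate(vocab)}

def bag_of_words(tokens, vocab):
    """Count vector whose i-th entry is the count of vocabulary word i."""
    vec = [0] * len(vocab)
    for w in tokens:
        i = vocab.get(w)
        if i is not None:  # out-of-vocabulary tokens are dropped
            vec[i] += 1
    return vec
```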
We then topic model the document corpus using the Latent Dirichlet Allocation (LDA) (?, ?) probabilistic generative topic model. In the topic model a document is viewed as a mixture of topics, where the underlying topics are represented as latent variables in a hierarchical Bayesian model. A generative model is assumed to be responsible for the observed documents and the word distributions of each topic. The topic proportions of each document are learned through the posterior distribution of the latent variables conditioned on the observed documents.
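To set notation, let our corpus be defined as a set of $D$ documents, $\{w_d\}_{d=1}^{D}$. Let $N$ denote the document length; we break a constitution into multiple documents, respecting constitution boundaries. A topic is a distribution over a fixed vocabulary and can thus be represented by a vector $\beta_k$, $k = 1, \dots, K$, where $V$ is the size of the vocabulary and the entry $\beta_{k,v}$ is the probability of picking word $v$ from this topic. We denote the proportion of document $d$ that is made up of topic $k$ by $\theta_{d,k}$, where $K$ is the number of topics, $\theta_{d,k} \ge 0$ and $\sum_{k=1}^{K} \theta_{d,k} = 1$. Given the $n$th word in document $d$, $w_{d,n}$, let $z_{d,n}$ indicate the topic from which the token is drawn. Let $\mathrm{Dir}(\alpha)$ denote the Dirichlet distribution with parameter $\alpha$ (with a second Dirichlet parameter $\eta$ governing the topics) and $\mathrm{Mult}(\theta)$ denote the multinomial distribution over the distribution $\theta$.
The specific generative process underlying LDA is as follows:

1. Fix $K$, the number of topics.

2. For each topic $k = 1, \dots, K$, draw $\beta_k \sim \mathrm{Dir}(\eta)$.

3. For each document $d = 1, \dots, D$:

   (a) Choose topic proportions $\theta_d \sim \mathrm{Dir}(\alpha)$.

   (b) For each word position $n = 1, \dots, N$:

       (i) Choose a topic indicator $z_{d,n} \sim \mathrm{Mult}(\theta_d)$.

       (ii) Choose a word $w_{d,n} \sim \mathrm{Mult}(\beta_{z_{d,n}})$.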
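The generative process above can be simulated directly, which is a useful way to see what the model assumes about the data. The sketch below uses only the standard library; the parameter names `alpha` and `eta` and their default values are illustrative assumptions, not the settings used in the paper.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    """Draw an index i with probability probs[i]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_corpus(n_docs, doc_len, n_topics, vocab_size,
                    alpha=0.5, eta=0.5, seed=0):
    """Sample topics, per-document topic proportions, and word indices."""
    rng = random.Random(seed)
    # One distribution over the vocabulary per topic.
    topics = [sample_dirichlet([eta] * vocab_size, rng)
              for _ in range(n_topics)]
    docs, proportions = [], []
    for _ in range(n_docs):
        theta = sample_dirichlet([alpha] * n_topics, rng)
        words = []
        for _ in range(doc_len):
            z = sample_categorical(theta, rng)          # topic indicator
            words.append(sample_categorical(topics[z], rng))  # word index
        docs.append(words)
        proportions.append(theta)
    return topics, proportions, docs
```

Inference runs this process in reverse: given only `docs`, it estimates `topics` and `proportions`.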


Note that LDA depends on four parameters, the Dirichlet parameters , the number of topics and the document length . In order to expedite the mixing of the Markov chain and reduce experiment time, we fix , and , and vary the value of . Choosing an appropriate number of topics for a given corpus is a problem of model selection. We carried out 5-fold cross-validation to optimize . Specifically, we split the corpus evenly into 5 folds, which are used to define training and testing sets to evaluate parameter configurations. For each configuration of we hold out one of the folds as a test set, , and use the other four as the training set, . We ran the Gibbs sampler for LDA on the training set for iterations, which produced samples from the posterior distribution that were then used to evaluate the likelihood of the test set, a measure of the generalization ability of the model. Unfortunately, the computation of the held-out likelihood is intractable, so we adopted the Chib-style estimation in (?, ?) to efficiently approximate it. The values of and that obtain the highest overall held-out likelihood over the five folds are chosen for the rest of our analysis. Figure S3 shows the effect of varying , from which we see the optimal value is .
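The cross-validation loop can be outlined as follows. This is only a skeleton: `heldout_score` is a hypothetical user-supplied callable standing in for the Gibbs sampler plus the Chib-style held-out likelihood estimate, neither of which is implemented here.

```python
import random

def k_fold_indices(n_items, n_folds=5, seed=0):
    """Shuffle item indices and deal them into n_folds nearly equal folds."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]

def select_num_topics(docs, candidates, heldout_score, n_folds=5, seed=0):
    """Return (best candidate, dict of total held-out scores).

    heldout_score(train_docs, test_docs, k) must fit a k-topic model on
    train_docs and return an estimate of the held-out log-likelihood of
    test_docs (hypothetical interface)."""
    folds = k_fold_indices(len(docs), n_folds, seed)
    totals = {}
    for k in candidates:
        total = 0.0
        for test_idx in folds:
            held = set(test_idx)
            train = [d for i, d in enumerate(docs) if i not in held]
            test = [docs[i] for i in test_idx]
            total += heldout_score(train, test, k)
        totals[k] = total
    best = max(totals, key=totals.get)
    return best, totals
```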
Inferring diffusion networks
The topics that are learned with LDA represent high-level ideas, and each constitution can be represented by the proportion of topics it exhibits (computed as described above). As demonstrated in experiments, these discovered topics correspond to high-level legal aspects, such as human rights, international agreements, and economic systems. By treating a topic as the unit in a diffusion and tracking the occurrence of each topic at each constitution over time, we can learn an underlying diffusion network by which topics spread through constitutions, thus uncovering the diffusion patterns of legal evolution over time.
We follow the method of Gomez-Rodriguez et al. (?, ?) for inferring diffusion networks. We define a cascade $c$ as a set of pairs $(i, t_i)$, indicating that cascade $c$ was observed at node/constitution $i$ at time $t_i$. Each topic $k$ will have an associated cascade $c_k$, so that $(i, t_i) \in c_k$ means that topic $k$ has spread to constitution $i$ at time $t_i$. To determine whether a topic has spread to a constitution, we set a threshold $\tau$ and say that the topic is observed at the nodes/constitutions whose proportion of this topic ranks within the top fraction $\tau$. When a topic does not spread to a constitution $i$, we set $t_i = \infty$. Note that we do not observe the path by which topics are spreading, but only where topics have spread to at a given time.
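The construction of cascades from topic proportions might look like the sketch below. It assumes the threshold is a fraction of constitutions (`top_frac`, a hypothetical name) and that a constitution's timestamp is its year; both are readings of the text rather than confirmed details.

```python
import math

def build_cascades(proportions, years, top_frac):
    """proportions: dict constitution -> list of topic proportions.
    years: dict constitution -> year of creation.
    Returns one dict per topic mapping constitution -> time, where
    math.inf marks constitutions the topic did not reach."""
    labels = list(proportions)
    n_topics = len(next(iter(proportions.values())))
    n_keep = max(1, round(top_frac * len(labels)))
    cascades = []
    for k in range(n_topics):
        # Rank constitutions by how strongly they exhibit topic k.
        ranked = sorted(labels, key=lambda c: proportions[c][k], reverse=True)
        hit = set(ranked[:n_keep])
        cascades.append({c: (years[c] if c in hit else math.inf)
                         for c in labels})
    return cascades
```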
Having defined cascades, we describe a probabilistic model of how they diffuse through constitutions. Specifically, we denote the probability that a cascade $c$ is transmitted from node $i$ to node $j$ as $P_c(i, j)$, where $t_i < t_j$, indicating that a constitution can only be influenced by its predecessors. In our experiment, we take , where is the diffusion parameter and $t_i$ and $t_j$ are the timestamps of constitutions $i$ and $j$.
A diffusion network is a directed graph $G$ (?, ?) where an edge from node $i$ to node $j$ indicates that topics can diffuse from constitution $i$ to constitution $j$. We note that any directed graph can be represented as the union of its spanning trees (?, ?), i.e., subgraphs that connect all of the nodes and that have no cycles. The inference process produces an optimal network able to accommodate the observed cascades. To get a sense of what the process produces, see Figure S1 for an inferred diffusion network derived from the methodology discussed here on a subset of constitutions.
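First, we define the probability that a cascade $c$ is consistent with a given tree structure $T$ (where the edges $E_T$ of $T$ obey the ordering of time stamps) to be the product of the transmission probabilities $P_c(i, j)$ along the edges of the tree:
$$P(c \mid T) = \prod_{(i,j) \in E_T} P_c(i, j) \tag{1}$$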
Notice that Eq. 1 assumes that all edges in the tree $T$ are independent, but the transmission probabilities $P_c(i, j)$ are conditional probabilities, so that Eq. 1 defines a Markov process. Notice also that Eq. 1 depends only on the edges in the tree, as nodes not observed in a cascade have an infinite time of observation.
Footnote 4: In (?, ?) the probability of a cascade spreading from a given node or dying off at the node is modeled with a Bernoulli random variable, in order to account for the fact that cascades usually do not reach all nodes and thus to control the size of the cascades. However, the probability of spreading and not spreading turns out to be a constant in the optimization used to infer a diffusion network, so we ignore it here. It turns out to be computationally advantageous to control the complexity of the inferred diffusion network using a constraint on the number of edges in the inferred diffusion network.
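To make the tree likelihood concrete, the sketch below evaluates the logarithm of the product of transmission probabilities along a tree. The exponential form of `transmission_prob` and the parameter name `alpha` are assumptions made for illustration, since the functional form used in the paper is not legible in this copy.

```python
import math

def transmission_prob(t_parent, t_child, alpha=1.0):
    """Assumed exponential decay in the time gap (illustrative only).
    Returns 0 unless the parent is strictly earlier than the child."""
    if not (t_parent < t_child) or math.isinf(t_child):
        return 0.0
    return math.exp(-(t_child - t_parent) / alpha)

def cascade_log_prob(tree_edges, times, alpha=1.0):
    """Log-probability of a cascade given a tree: the sum of log
    transmission probabilities over edges whose endpoints were both
    reached by the cascade."""
    log_p = 0.0
    for parent, child in tree_edges:
        if math.isinf(times[parent]) or math.isinf(times[child]):
            continue  # nodes with infinite time do not contribute
        p = transmission_prob(times[parent], times[child], alpha)
        if p == 0.0:
            return -math.inf  # the edge violates the time ordering
        log_p += math.log(p)
    return log_p
```

Working in log space avoids underflow when many small probabilities are multiplied.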
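Using Eq. 1, we define the probability of observing a cascade $c$ given an arbitrary diffusion network $G$ as:
$$P(c \mid G) = \sum_{T \in \mathcal{T}(G)} P(c \mid T)\, P(T \mid G) \tag{2}$$
where $\mathcal{T}(G)$ is the set of all spanning trees of $G$ and we assume $P(T \mid G)$ is uniform over all spanning trees $T \in \mathcal{T}(G)$. Lastly, we define the probability of observing all cascades, $C = \{c_1, \dots, c_K\}$, one for each topic, for a given diffusion network $G$ as:
$$P(C \mid G) = \prod_{c \in C} P(c \mid G) \tag{3}$$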
The goal is then to find the maximum likelihood diffusion network by maximizing Eq. 3, the probability $P(C \mid G)$ of all cascades $C$ under the network $G$, with respect to $G$ over all possible directed graphs with consistent time stamps (directed edges only emanate from earlier constitutions and terminate in later constitutions). Formally, we need to solve the following optimization problem:
$$G^{*} = \operatorname*{arg\,max}_{|G| \le k} P(C \mid G) \tag{4}$$
where the constraint, $|G| \le k$, indicates that the number of edges in $G$ be at most $k$. This constraint provides complexity control, since the graph consisting of all edges from a constitution to later constitutions is a trivial solution and because, as mentioned above, cascades usually only consist of a subset of constitutions. Optimizing Eq. 4 is NP-hard; however, there is an efficient greedy algorithm that obtains a near-optimal solution due to the submodularity of the problem (?, ?). In addition, we use a heuristic that stops the algorithm from adding new edges (thus terminating it) when the objective function in Eq. 4 reaches of an upper bound derived in (?, ?). This allows us to avoid using expensive cross-validation when setting the complexity, $k$, of the model.
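A much simplified greedy edge selection, in the spirit of the procedure referenced above, is sketched here. It rests on several assumptions that should be read as such: the sum over spanning trees is approximated by the single most probable tree per cascade, transmission is taken to be exponential with parameter `alpha`, every node has a baseline parent of probability `eps`, and the upper-bound stopping heuristic is omitted. The function name is hypothetical.

```python
import math

def greedy_network(cascades, max_edges, alpha=10.0, eps=1e-6):
    """cascades: list of dicts node -> time (math.inf if not reached).
    Returns up to max_edges directed edges chosen greedily."""
    # Log-likelihood improvement of each candidate edge over the eps baseline.
    weights = {}  # (i, j) -> {cascade index: improvement}
    for c, times in enumerate(cascades):
        hit = [n for n, t in times.items() if not math.isinf(t)]
        for i in hit:
            for j in hit:
                if times[i] < times[j]:
                    w = -(times[j] - times[i]) / alpha - math.log(eps)
                    if w > 0:
                        weights.setdefault((i, j), {})[c] = w
    best = {}    # (cascade index, child) -> best parent improvement so far
    chosen = []

    def gain(edge):
        # Marginal gain: how much the edge improves each child's best parent.
        return sum(max(0.0, w - best.get((c, edge[1]), 0.0))
                   for c, w in weights[edge].items())

    while len(chosen) < max_edges:
        candidates = [e for e in weights if e not in chosen]
        if not candidates:
            break
        edge = max(candidates, key=gain)
        if gain(edge) <= 0.0:
            break  # no remaining edge improves the objective
        chosen.append(edge)
        for c, w in weights[edge].items():
            key = (c, edge[1])
            best[key] = max(best.get(key, 0.0), w)
    return chosen
```

Because every edge points forward in time, the most probable tree for a cascade simply gives each node its best available parent, which is what makes the marginal gain cheap to compute in this sketch.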
A key parameter is the threshold $\tau$, which sets the fraction of topics viewed as important at a given constitution. In order to set $\tau$, we varied it between to , and inferred the diffusion network for each of the values. We optimize relative to the mean in- and outdegrees of all inferred diffusion networks at each parameter value. This can be found in Figure S4. Note the hump shape of the means in increasing $\tau$, reflecting the gradual accumulation of possible paths in increasing $\tau$ and then a tailing off as the optimization aspect of the diffusion network construction begins to winnow edges. After observing both in- and outdegrees reach peaks at , we investigate the robustness around . For a set of thresholds, most densely sampled around 0.3, we create a vector of indegrees ordered according to the year of the constitution and a vector of outdegrees similarly ordered. Then for the indegrees a (symmetric) matrix is constructed in which each entry is the Pearson correlation between the indegree vectors at a pair of thresholds, and similarly for the outdegrees. Figures S4 and S5 show the heat map of values. The farther away you are from the diagonal, the farther you are in threshold difference (most densely sampled near 0.3). The slow roll-off in color reflects the robustness of the calculation in that region, i.e., the indegree and outdegree orderings are not changing much as the threshold varies between and . All of this motivates a choice of .
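The robustness matrix described above is a plain Pearson correlation matrix over degree vectors, one vector per threshold. A minimal standard-library sketch, with hypothetical function names, is:

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences.
    Raises ZeroDivisionError if either sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def correlation_matrix(degree_vectors):
    """degree_vectors: one degree vector per threshold, each ordered by
    the year of the constitution. Returns a symmetric matrix of
    correlations between every pair of thresholds."""
    m = len(degree_vectors)
    return [[pearson(degree_vectors[a], degree_vectors[b]) for b in range(m)]
            for a in range(m)]
```

Values close to 1 off the diagonal indicate that the degree ordering is stable across the corresponding pair of thresholds.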
The final diffusion tree produces for each constitution a set of direct descendants and ancestors, thereby giving rise to the indegree/outdegree “motifs”. A full list of motifs is presented in Figure S2. The full tree can be found at www.math.dartmouth.edu/rockmore/FullConstDiffNet.pdf. A detail of the full tree is given in Figure S5.
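Given the edge list of an inferred network, the motifs can be tabulated as below. Treating a motif as the pair (indegree, outdegree) is an assumption about the definition, and the function name is hypothetical.

```python
from collections import Counter

def degree_motifs(edges, nodes):
    """edges: iterable of (ancestor, descendant) pairs.
    Returns (motif per node, count of nodes per motif), where a motif
    is the pair (indegree, outdegree)."""
    edges = list(edges)
    indeg = Counter(child for _, child in edges)
    outdeg = Counter(parent for parent, _ in edges)
    motif_of = {n: (indeg[n], outdeg[n]) for n in nodes}
    return motif_of, Counter(motif_of.values())
```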
References
 ARTstor. (n.d.). Retrieved from http://www.artstor.org (Accessed April, 2013)
 Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77–84.
 Blei, D. M., & Lafferty, J. D. (2006). Dynamic topic models. In Proceedings of the 23rd international conference on machine learning (pp. 113–120). New York, NY, USA: ACM. Retrieved from http://doi.acm.org/10.1145/1143844.1143859 doi: 10.1145/1143844.1143859
 Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022.
 Boli-Bennett, J. (1987). Human rights or state expansion? Cross-national definitions of constitutional rights, 1870–1970. In G. Thomas, J. Meyer, F. Ramirez, & J. Boli (Eds.), Institutional structure (pp. 71–91). Sage.
 Boyd, R., & Richerson, P. J. (1996). Why culture is common but cultural evolution is rare. Proceedings of the British Academy, 88, 73930.
 Chang, J., Boyd-Graber, J. L., Gerrish, S., Wang, C., & Blei, D. M. (2009). Reading tea leaves: How humans interpret topic models. In NIPS (Vol. 31, pp. 1–9).
 Christiansen, F. B. (2008). Theories of population variation in genes and genomes. Princeton University Press, Princeton, NJ.
 The Comparative Constitutions Project. (n.d.). Retrieved from http://www.comparativeconstitutionsproject.org (Accessed April, 2013)
 Cormen, T., Leiserson, C., Rivest, R., & Stein, C. (2001). Introduction to algorithms. MIT Press, Cambridge MA.
 Eldredge, N. (2011). Paleontology and cornets: Thoughts on material cultural evolution. Evolution: Education and Outreach, 4(3), 364–373. Retrieved from http://dx.doi.org/10.1007/s120520110356z doi: 10.1007/s120520110356z
 Elkins, Z., Ginsburg, T., & Melton, J. (2009). The endurance of national constitutions. Cambridge University Press.
 Elkins, Z., Ginsburg, T., & Simmons, B. (2013). Getting to rights: Constitutions and international law. Harvard International Law Journal, 51, 201–34.
 Foti, N., Ginsburg, T., & Rockmore, D. (2014). ‘We the Peoples’: The global origins of constitutional preambles. George Washington International Law Review, 46, 101–134.
 Girvan, M., & Newman, M. E. (2002, June). Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12), 7821–7826. Retrieved from http://dx.doi.org/10.1073/pnas.122653799
 Go, J. (2003, March). A globalizing constitutionalism? Views from the postcolony 1945–2000. International Sociology, 18, 71–95.
 Gomez-Rodriguez, M., Leskovec, J., & Krause, A. (2012). Inferring networks of diffusion and influence. TKDD, 5(4), 21.
 The Google books NGram Viewer. (n.d.). Retrieved from http://books.google.com/ngrams (Accessed April, 2013)
 The Google books website. (n.d.). Retrieved from http://books.google.com/ (Accessed April, 2013)
 Gualdi, S., Yeung, C. H., & Zhang, Y.-C. (2011). Tracing the evolution of physics on the backbone of citation networks. CoRR, abs/1108.1325.
 Hartl, D. L., & Clark, A. G. (1997). Principles of population genetics. Sinauer Associates, Inc Publishers, Mass.
 Henrich, J., & McElreath, R. (2003). The evolution of cultural evolution. Evolutionary Anthropology, 12, 132–135.
 Hughes, J. M., Foti, N. J., Krakauer, D. C., & Rockmore, D. N. (2012). Quantitative patterns of stylistic influence in the evolution of literature. Proceedings of the National Academy of Sciences, 109(20), 7682–7686.
 International Music Score Library Project. (n.d.). Retrieved from http://imslp.org/ (Accessed July, 2013)
 Jockers, M. (2013). Macroanalysis: Digital methods and literary history. University of Illinois Press.
 Kaiser, D. I. (2009). Drawing theories apart: The dispersion of Feynman diagrams in postwar physics. University of Chicago Press.
 Karlin, S., & Taylor, H. M. (1975). A first course in stochastic processes. Academic Press.
 Law, D. (2005, Feb.). Generic constitutional law. Minn. L. Rev., 89, 652.
 Law, D., & Versteeg, M. (2011). The evolution and ideology of global constitutionalism. Cal. Law Review, 99, 1163.
 Leskovec, J., McGlohon, M., Faloutsos, C., Glance, N. S., & Hurst, M. (2007). Patterns of cascading behavior in large blog graphs. In SDM (pp. 551–556).
 Lin, Y., Chen, J., & Chen, Y. (2011). Backbone of technology evolution in the modern era automobile industry: An analysis by the patents citation. J Syst Sci Syst Eng, 20, 416–442.
 Mace, R., & Holden, C. J. (2004). A phylogenetic approach to cultural evolution. Trends in Ecology and Evolution, 20, 116–121.
 Mesoudi, A., & Whiten, A. (2008). The multiple roles of cultural transmission experiments in understanding human cultural evolution. Philosophical Transactions of the Royal Society B: Biological Sciences, 363(1509), 3489–3501. Retrieved from http://rstb.royalsocietypublishing.org/content/363/1509/3489 doi: 10.1098/rstb.2008.0129
 Mesoudi, A., Whiten, A., & Laland, K. N. (2006). Towards a unified science of cultural evolution. Behavioral and Brain Sciences, 29, 329–383.
 Michel, J.-B., Shen, Y. K., Aiden, A. P., Veres, A., Gray, M. K., Team, T. G. B., … Aiden, E. L. (2010). Quantitative analysis of culture using millions of digitized books. Science, 331, 176–182.
 Moretti, F. (2005). Graphs, Maps, Trees. Verso Books.
 Nelson-Sathi, S., List, J.-M., Geisler, H., Fangerau, H., Gray, R. D., Martin, W., & Dagan, T. (2011). Networks uncover hidden lexical borrowing in Indo-European language evolution. Proceedings of the Royal Society of London B: Biological Sciences, 278(1713), 1794–1803. Retrieved from http://rspb.royalsocietypublishing.org/content/278/1713/1794 doi: 10.1098/rspb.2010.1917
 Newman, M. (2006). Modularity and community structure in networks. Proceedings of the National Academy of Sciences, USA, 103, 8577–8582.
 Nowak, M., & Krakauer, D. (1999). The evolution of language. Proc Natl Acad Sci USA, 96, 8028–8033.
 Nowak, M., Plotkin, J., & Krakauer, D. (1999). The evolutionary language game. J. Theor. Biol., 200, 147–162.
 Pagel, M. D. (2013). Wired for culture. W.W. Norton & Company.
 Richerson, P. J., & Boyd, R. (2006). Not by genes alone: How culture transformed human evolution. University of Chicago Press.
 Riddell, A. (2014). How to Read 22,198 Journal Articles: Studying the History of German Studies with Topic Models. In M. Erlin & L. Tatlock (Eds.), Distant readings: Topologies of German culture in the long nineteenth century (pp. 91–114). Rochester, NY, USA: Camden House.
 Rogers, D. S., & Ehrlich, P. R. (2007). Natural selection and cultural rates of change. Proceedings of the National Academy of Sciences, 105, 3416–3420.
 Rogers, E. M. (1995). Diffusion of innovations. The Free Press, NYC.
 Ross, N. (2013). Power laws in preferential attachment graphs and Stein’s method for the negative binomial distribution. Advances in Applied Probability, 45, 876–893.
 Cavalli-Sforza, L. L., & Feldman, M. W. (1981). Cultural transmission and evolution: A quantitative approach. Princeton University Press, Princeton, NJ.
 Skach, C. (2006). Borrowing constitutional designs. Princeton: Princeton University Press.
 Stanford Topic Modeling Toolbox. (n.d.). Retrieved from http://nlp.stanford.edu/software/tmt/tmt0.4/ (Accessed July, 2013)
 Valverde, S., & Solé, R. V. (2015). Correction to ‘Punctuated equilibrium in the large-scale evolution of programming languages’. Journal of the Royal Society, Interface, 13 117.
 van der Hofstad, R. (2017). Random graphs and complex networks. Cambridge University Press, Cambridge, UK.
 Venter, F. (2013). Constitutional comparison. Cambridge, MA: Kluwer.
 Wallach, H. M., Murray, I., Salakhutdinov, R., & Mimno, D. M. (2009). Evaluation methods for topic models. In ICML (p. 139).
 Wang, X., & McCallum, A. (2006). Topics over time: A non-Markov continuous-time model of topical trends. In Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 424–433). New York, NY, USA: ACM. Retrieved from http://doi.acm.org/10.1145/1150402.1150450 doi: 10.1145/1150402.1150450
 Watson, A. (1974). Legal transplants. Cambridge University Press.
 Wimsatt, W. C. (1999, April). Genes, memes, and cultural heredity. Biology and Philosophy, 14, 279–310.
 Wood, J., Tan, P., Das, A., Wang, W., & Arnold, C. (2016). Source-LDA: Enhancing probabilistic topic models using prior knowledge source. Retrieved from https://arxiv.org/abs/1606.00577
 Youn, H., Strumsky, D., Bettencourt, L., & Lobo, J. (2015). Invention as a combinatorial process: Evidence from US patents. J. R. Soc. Interface, 12, 106.