
Multiscale Structure in Eco-Evolutionary Dynamics

University of Massachusetts Boston
Department of Physics

I’ve done this because I love the act of writing, which is to say I love the act of discovery, of revelation, and then the attempt to share that revelation in all its fullness and clarity. You can never be sure how many people will want to share in that feeling.

Ta-Nehisi Coates (2014)

This is a slightly expanded version of my PhD thesis, which was accepted by Brandeis University’s Martin A. Fisher School of Physics in May of 2015. In turn, that thesis was based on research I began at the New England Complex Systems Institute.


Chapter \thechapter Abstract

In a complex system, the individual components are neither so tightly coupled nor so correlated that they can all be treated as a single unit, nor so uncorrelated that they can be approximated as independent entities. Instead, patterns of interdependency lead to structure at multiple scales of organization. Evolution excels at producing such complex structures. In turn, the existence of these complex interrelationships within a biological system affects the evolutionary dynamics of that system. I present a mathematical formalism for multiscale structure, grounded in information theory, which makes these intuitions quantitative, and I show how dynamics defined in terms of population genetics or evolutionary game theory can lead to multiscale organization. For complex systems, “more is different,” and I address this from several perspectives. Spatial host–consumer models demonstrate the importance of the structures which can arise due to dynamical pattern formation. Evolutionary game theory reveals the novel effects which can result from multiplayer games, nonlinear payoffs and ecological stochasticity. Replicator dynamics in an environment with mesoscale structure relates to generalized conditionalization rules in probability theory.

The idea of natural selection “acting at multiple levels” has been mathematized in a variety of ways, not all of which are equivalent. We will face down the confusion, using the experience developed over the course of this thesis to clarify the situation.

Chapter \thechapter applies the general abstract framework of multiscale structure to some geometrical examples, to build intuition for it, and then connects it with population genetics and network theory. Chapter \thechapter studies emergent multiscale structure in a spatial evolutionary ecosystem. Next, Chapter \thechapter takes a different approach to the notion of “more is different,” using both simulations and dynamical systems theory to understand evolutionary games in which the interactions do not resolve into pairs.

I have set aside Chapter \thechapter to summarize the parts of probability theory which will be necessary for the following two chapters, because I’ve yet to find a textbook which has the necessary stuff all in one place. Chapter \thechapter is purely analytical: I break a theorem from the literature, show how to fix it and then point out where it will break again. The goal for Chapter \thechapter is to provide analytical arguments for at least a few of the things seen in Chapters \thechapter, \thechapter and \thechapter. Specifically, I aim to use universality to predict critical exponents for phase transitions.

Chapters \thechapter and \thechapter are mostly about explaining other people’s work in a way I can understand. Chapter \thechapter is essentially a concept piece, intended to sketch out the possibility of new interesting problems.

Chapter \thechapter Multiscale Structure

1 Introduction

A century and odd years ago, the philosopher William James asked [2],

What shall we call a thing anyhow? It seems quite arbitrary, for we carve out everything, just as we carve out constellations, to suit our human purposes. For me, this whole ‘audience’ is one thing, which grows now restless, now attentive. I have no use at present for its individual units, so I don’t consider them. So of an ‘army,’ of a ‘nation.’ But in your own eyes, ladies and gentlemen, to call you ‘audience’ is an accidental way of taking you. The permanently real things for you are your individual persons. To an anatomist, again, those persons are but organisms, and the real things are the organs. Not the organs, so much as their constituent cells, say the histologists; not the cells, but their molecules, say in turn the chemists.

The Jamesian view is that none of these scientific disciplines ought to be taken as more “fundamental” than another. Each must prove its own worth by way of its pragmatic utility; none is by necessity merely the reduction of another to a special case.

In the study of complex systems, we face this directly. A complex system exhibits structure at many scales of organization. For example, one can study human beings at any magnification, from the molecular level to the societal, and an entire science flourishes at each level. We have developed a formalism for making this intuition mathematically precise and quantitatively useful, employing the tools of information theory [3, 4, 5, 6, 7, 8]. To explore how this formalism can be used, and to make clear the intricacies of multiscale information theory, we shall in this chapter apply that theory to an illustrative class of geometrical problems. Having done this, we will be in a good position to use it to study collective behaviors in systems developed in mathematical biology.

Thinking clearly about what we mean by “complexity” is important for biology, and few problems bring this home more clearly than the so-called C-value paradox. This is the puzzle that the sizes of species’ genomes do not correlate with any obvious, intuitive or meaningful measure of organismal complicatedness [9]. A species’ C-value is the characteristic amount of DNA which occurs in one set of chromosomes within its nucleus. It can be measured in picograms, for a physical unit, or in base pairs, for a more informational one. (A trillionth of a gram roughly works out to a billion base pairs.) One might think that species with larger C-values would be more “complex” by some fairly apparent standard. However, nature has not turned out that way. The domestic onion, Allium cepa, has approximately 16 billion base pairs on one set of chromosomes. This is roughly five times the total size of the human genome [10]. And the problem goes beyond the check to our pride: life forms which seem by all accounts to be comparably complicated can have widely separated genome sizes. For example, lungfish can have genomes 350 times larger than those of pufferfish [10]. Even close evolutionary proximity is no guarantee that C-values will agree. Zea mays, the maize plant, diverged from the teosinte grass Zea luxurians about 140,000 years ago [11], and in that time, its genome size has increased by half [12, 9].

So, the puzzle: what does all that extra genetic information do? The answer, in brief terms, is basically nothing.

Nor are we humans making much use of our own genomes, percentage-wise. Eukaryotic DNA contains, as a general rule, vast supplies of junk [9, 13, 14, 15, 16, 10, 17, 18, 19]. Some DNA specifies the sequences of proteins, and is designated “coding” DNA. Other stretches of the double helix play a role in regulating which genes are active and when. Still other portions of the genome are transcribed into the RNA components of cellular machinery like ribosomes. But even with all these accounted for, there remain sequences which are, identifiably, detritus.

The C-value “paradox” is not so paradoxical after all, then: the variable amounts of genomic bloat due to nonfunctional DNA make C-value variations a rather unsurprising phenomenon. The presence of a large quantity of nonfunctional DNA can have a biological effect, since it takes up space and increases the resources required for cells to replicate. For example, salamanders carry a truly remarkable amount of genetic information, with different species possessing genomes four to thirty-five times the size of our own [10]. Plainly, it is possible to make a salamander using far less DNA than some species of them have. Among salamander species, larger genome size is correlated with slower regeneration of lost limbs, suggesting that elevated genome size might be somewhat costly [20]. The essential points are, first, that this cost is, if it exists, not on the whole deleterious enough for natural selection to act strongly against it [10], and second, that it is due to the quantity of DNA present, not its specific sequence.

Moreover, the presence of DNA detritus suggests a way of thinking about complexity more quantitatively. The matter is one of effective description. Let us focus, for the moment, on the complexity of a genome itself, rather than of the body plan associated with it. Our intuition leads us to say that we require more information to describe a more complicated genome. However, large stretches of a eukaryotic genome will be junk sequences, which can be switched with other, equally nonfunctional sequences or even deleted entirely with little or no effect. This suggests a strategy: we can describe the functional portion of the genome—the protein-coding genes, the regulatory regions and so forth—faithfully, and then we can loosely characterize the rest. We take careful notes about the functional parts, and then we fill in the rest with broad brush-strokes. A coarse-grained description of the nonfunctional portion is adequate, because any other nucleotide sequence which satisfies the same coarse-grained criteria could be swapped in for the actual junk.

In turn, we can apply the same method to the functional portion. Multiple nucleotide sequences are translated to the same protein, because multiple codons in the genetic code stand for the same amino acid [21, 22]. We can, therefore, exploit this redundancy and use a smaller number of characters to represent what it is most important to know about each protein-coding gene. Then, multiple amino-acid sequences are often biologically equivalent to one another, because the substitution of one amino acid for a similar one does not drastically change the resulting protein [23]. So, we can describe a protein in a coarse-grained way, and so on. Indeed, this plurality of possible sequences compatible with the same coarse-grained description is biologically essential: given the amount of DNA which human cells carry, we would otherwise be ground under heel by the genetic load of deleterious mutations. It also affects the rate of evolution, since a population can more easily explore the space of possible genomes when there is a network of neutral paths through it [24].

The general lesson is that partial descriptions of a system can have increased utility if they can exploit patterns and redundancies. Furthermore, the way the utility of a description increases as we allow more information to be used tells us about the structure of the system we are describing. This approach is different from the way information theory has typically been used in the past, because we are considering scale and information as complementary quantities [3]. We measure the effort which goes into a description in units of information, whereas the effectiveness of that description is the scale of what it captures.

We now turn to mathematizing this idea, following earlier work by the author and others on multiscale structure and information theory. The resulting formalism will be applicable at levels from the intracellular to the societal. This will allow us to discuss descriptions, utilities and related concepts beginning from an axiomatic starting point (so that we will not need molecular biology in order to define utility). With these concepts developed and some illustrative examples analyzed, we will then apply them to evolutionary dynamics.

1.1 Information-Theoretic Axioms for Structured Systems

For convenience, we review the basic axioms of the multiscale information formalism which we developed in earlier work [3]. In this formalism, a system is defined by a set of components, $A$, and an information function, $H$, which assigns a nonnegative real number to each subset $U \subseteq A$. This number is the amount of information needed to describe the components in $U$. To qualify as an information function, $H$ must satisfy two axioms:

  • Monotonicity: A subset $U$ that is contained in a subset $V$ cannot have more information than $V$. That is, $H(U) \leq H(V)$ whenever $U \subseteq V$.

  • Strong subadditivity: Given two subsets, the information contained in their union cannot exceed the sum of the information in each of them separately, minus the information in their intersection:

    $H(U \cup V) \leq H(U) + H(V) - H(U \cap V)$    (1)
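The canonical example of an information function is the joint Shannon entropy of subsets of jointly distributed random variables. The following is a minimal numerical sketch (my own illustration, in Python with only the standard library; the random joint distribution is illustrative rather than an example from the text) checking both axioms for three correlated bits:

```python
# Check that joint Shannon entropy obeys monotonicity and strong
# subadditivity, using a randomly chosen joint distribution over three bits.
import itertools
import random
from math import log2

random.seed(1)
weights = [random.random() for _ in range(8)]
total = sum(weights)
states = list(itertools.product((0, 1), repeat=3))
joint = {s: w / total for s, w in zip(states, weights)}

def H(subset):
    """Joint Shannon entropy (in bits) of the components listed in subset."""
    marginal = {}
    for state, p in joint.items():
        key = tuple(state[i] for i in sorted(subset))
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)

subsets = [frozenset(c) for r in range(4)
           for c in itertools.combinations(range(3), r)]
for U in subsets:
    for V in subsets:
        if U <= V:
            assert H(U) <= H(V) + 1e-12                    # monotonicity
        assert H(U | V) <= H(U) + H(V) - H(U & V) + 1e-12  # strong subadditivity
```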

Given an information function $H$, we can construct functions which express different kinds of possible correlations among a system’s components, such as the mutual information $I(a; b)$, which is the difference between the total information of two components considered separately and the joint information of those two components taken together:

$I(a; b) = H(\{a\}) + H(\{b\}) - H(\{a, b\})$    (2)

By extension, we can also define the tertiary mutual information

$I(a; b; c) = H(\{a\}) + H(\{b\}) + H(\{c\}) - H(\{a, b\}) - H(\{a, c\}) - H(\{b, c\}) + H(\{a, b, c\})$    (3)

This can be extended to higher scales in the same fashion, defining shared information for sets of four or more components.

One way to understand the meaning of our axioms is the following. Take a set of questions, which are all mutually independent in the sense that answering one doesn’t help to answer any other, or any combination of others. Each question pertains to one or more components of a system. Components have nonzero shared information if one or more questions pertain to both of them. In other words, if each question is represented by a point, then each component is a set of points, and set-theoretic intersection defines shared information.

As mentioned earlier, this formalism treats scale and information as complementary quantities. Often, “scale” is thought of in terms of length or time (for example, the James quote above organizes the learned disciplines essentially by the geometrical dimensions of what they study). For an axiomatic development, a more general definition is appropriate, and so for our purposes, “scale” will refer to the number of components within a system which are involved in an interrelationship.

1.2 Indices of Structure

To specify the structure of a system according to our definition, it is necessary to specify the information content $H(U)$ of each subset $U \subseteq A$. Because the number of such subsets grows exponentially with the number of components, complete descriptions of structure are impractical for large systems. Therefore, we require statistics which can convey the general character of a system’s structure without specifying it completely. Using an index of structure, we can summarize how a system is organized and compare that pattern of organization to the patterns manifested by other systems.

One such index of structure is the complexity profile, introduced in [5] to formalize the intuition that a genuinely complex system exhibits structure at multiple scales. The complexity profile is a real-valued function $C(k)$ that specifies the amount of information contained in interdependencies of scale $k$ and higher. $C(k)$ can be computed using a combinatorial formula which takes as input the values of the information function on all subsets [3, 4, 5, 6, 7, 8]. First, we define the quantity $Q(j)$ as the sum of the joint information of all collections of $j$ components:

$Q(j) = \sum_{U \subseteq A \,:\, |U| = j} H(U)$    (4)

The complexity profile can be computed using the formula

$C(k) = \sum_{j = N - k + 1}^{N} (-1)^{\,j - (N - k + 1)} \binom{j - 1}{N - k} Q(j)$    (5)

where $N$ is the number of components in the system. Generally, $C(k)$ captures the amount of information contained in interrelationships of order $k$ and higher. We shall illustrate this in the next section with a few examples.

The complexity profile satisfies a conservation law: the sum of $C(k)$ over all scales is

$\sum_{k=1}^{N} C(k) = \sum_{a \in A} H(\{a\})$    (6)

That is, the sum of the complexity over all scales is given by the total of the individual information assigned to each component, regardless of the components’ interrelationships.

Another useful index of structure is the Marginal Utility of Information, or MUI [3]. While the complexity profile characterizes the amount of information that is present in the system behavior at different scales, the MUI is based on the descriptive utility of limited information, as measured by its ability to describe the behavior of multiple components. Informally speaking, we describe a system by “investing” a certain amount of information, and for any amount of information invested, an optimal description yields the best possible characterization of the system. The MUI expresses how the usefulness of an optimal description increases as we invest more information. We can define the MUI precisely, starting with the basic axioms of information functions, by using notions from linear programming [3].

In general outline, one constructs the MUI as follows. Let $(A, H)$ be a system, defined per our formalism as a set of components and an information function. Then, let $d$ be a descriptor, an entity which conveys information about the system. To express this mathematically, we consider the new, larger system made by conjoining $d$ with the set of components $A$ and defining an information function on the subsets of this expanded set. The information function for the augmented system reduces to that of the original for all those interdependencies which do not involve the descriptor $d$, and it expresses the shared information between $d$ and the original system. The utility of $d$ is the sum of the shared information between $d$ and each component within $A$:

$U(d) = \sum_{a \in A} I(d; a)$    (7)

This counts, in essence, the total scale of the system’s organization that is captured by $d$. We define the optimal utility $U(y)$ as the utility of the best possible descriptor having $H(\{d\}) \leq y$. The MUI, which we write $M(y)$, is then the derivative of $U(y)$.

How do these structure indices capture the organization of a system? We can illustrate the general idea by way of a conceptual example. Consider a crew of movers, who are carrying furniture from a house to a truck. They can be acting largely independently, as when each mover is carrying a single chair, or they can be working in concert, transporting a large item that requires collective effort to move, like a grand piano. In the former case, knowing what any one mover is doing does not say much about what specific act any other mover is engaged with at that time. Information about the crew applies at the scale of an individual mover. By contrast, in the latter case, the behavior of one mover can be inferred from that of another, and information about their actions is applicable at a larger scale. From these general considerations, it follows that for a system of largely independent movers, $C(k)$ is large at small $k$ and drops off rapidly, whereas when the movers are working collectively, $C(k)$ is small for low $k$ and remains nonzero for larger $k$. When the movers act mostly independently, we cannot do much better at describing their behavior than by specifying the behavior of each mover in turn. Therefore, as we invest more information into describing the system, the gain in utility of our description remains essentially constant. For the case of independent movers, then, the MUI curve is low and flat. On the other hand, when the movers are acting in concert, a brief description can have a high utility, so the MUI curve is peaked at the origin and falls off sharply. Heuristically speaking, we can in this example think of the complexity profile and the MUI as reflections of each other. When we develop these indices quantitatively, we find in fact that this is exactly true in a broad class of systems.

Both the complexity profile and the MUI obey a convenient sum rule. If a system separates into two independent subsystems, the complexity profile of the whole is the sum of the profiles of the pieces, and likewise for the MUI. This property of both structure indices follows from the basic information-function axioms [3]. In the next section, we will see examples of systems which illustrate the sum rule for both the MUI and the complexity profile.

2 Examples

2.1 Three-Component Systems

To explore the consequences of our definitions, it is helpful to begin with simple examples. Following the recent review article about the multiscale complexity formalism [3], we study the following four systems, each of which contains three binary variables.

  • Example A: Three independent bits: The system comprises three components, and knowing the state of any one bit provides no inference about the state of any other. As a whole, the system can be in any one of eight possible configurations, with no preference given to any of the eight possibilities.

  • Example B: Three completely interdependent bits: The system as a whole is either in state 000 or state 111, with no preference given to either option. Knowing the value of any one bit allows the inference of both other bits.

  • Example C: Independent blocks of dependent bits: Each component is equally likely to take the value 0 or 1; however, the first two components always take the same value, while the third can take either value independently of the coupled pair.

  • Example D: The parity bit system: Three bits which can exist in the states 110, 101, 011, or 000 with equal probability. Each of the three bits is equal to the parity (0 if even; 1 if odd) of the sum of the other two. Any two of the bits are statistically independent of each other, but the three as a whole are constrained to have an even sum.

Figures 1 and 2 show the complexity profiles and the MUI curves for these example systems.

Figure 1: Complexity profiles for the three-component example systems A, B, C and D, computed using Eq. (5). Examples A and B illustrate the general fact that systems of largely independent components have tall and narrow complexity profiles, whereas the profiles of highly interdependent systems are low and wide. Example C, which we can think of as the combination of two independent subsystems, illustrates the complexity profile’s sum rule. Finally, example D, the parity-bit system, showcases the emergence of negative shared information. Note that the total signed area bounded by each curve is 3 units. (Figure reproduced from [3].)
Figure 2: MUI plots for the three-component example systems A, B, C and D. Note that, as with the complexity profiles in Figure 1, the total area bounded by each curve is 3 units. Furthermore, for examples A, B and C, the complexity profile and the MUI are reflections (generalized inverses) of each other. This is generally true for systems which are the disjoint union of internally interdependent blocks [3], but it is not the case for the parity-bit system, example D. (Figure reproduced from [3].)
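For readers who wish to reproduce the complexity profiles, here is a short sketch (assuming Python; the expected outputs are the profiles plotted in Figure 1). It represents each example system by its joint distribution, uses Shannon entropy as the information function, and evaluates Eqs. (4) and (5) directly:

```python
# Complexity profiles of examples A-D from their joint distributions,
# with Shannon entropy as the information function (Eqs. (4)-(5)).
from itertools import combinations, product
from math import comb, log2

def entropy(joint, subset):
    """Shannon entropy (in bits) of the marginal on the given components."""
    marginal = {}
    for state, p in joint.items():
        key = tuple(state[i] for i in subset)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)

def complexity_profile(joint, N):
    Q = {j: sum(entropy(joint, U) for U in combinations(range(N), j))
         for j in range(1, N + 1)}                                  # Eq. (4)
    return [sum((-1) ** (j - (N - k + 1)) * comb(j - 1, N - k) * Q[j]
                for j in range(N - k + 1, N + 1))
            for k in range(1, N + 1)]                               # Eq. (5)

uniform = lambda states: {s: 1.0 / len(states) for s in states}
examples = {
    "A": uniform(list(product((0, 1), repeat=3))),              # independent
    "B": uniform([(0, 0, 0), (1, 1, 1)]),                       # interdependent
    "C": uniform([(a, a, c) for a in (0, 1) for c in (0, 1)]),  # block + free bit
    "D": uniform([(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]), # parity bits
}
for name, joint in examples.items():
    print(name, [round(c, 6) for c in complexity_profile(joint, 3)])
# Expected: A [3, 0, 0]; B [1, 1, 1]; C [2, 1, 0]; D [2, 2, -1].
# Each profile's signed total is 3, illustrating the conservation law, Eq. (6).
```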

2.2 Minimal Incidence Geometry

To develop additional intuition about our information-theoretic formalism, and to build a bridge between different areas of mathematics, we shall apply the information theory of multicomponent systems to incidence geometry. The premise of incidence geometry is that one has a set of points and a set of lines which connect them, satisfying some conditions which abstract basic notions of geometry. To wit, for any incidence geometry, every line contains at least two distinct points, and for every line, there exist one or more points not lying on that line. We relate geometry to information theory in the following way: Ascribe to each point 1 unit of information, and define each line to be a system component. Then for any incidence geometry, the information ascribed to a component is always greater than or equal to 2, and the information within the whole system is always greater than 2.

The examples we shall consider from incidence geometry will illustrate most of the key features of the multiscale information theory formalism. The noteworthy exception is that incidence geometry does not provide examples of negative multivariate mutual information. This is a subtlety which can arise when one considers dependencies among three or more components [3], as we saw in example D. However, it will not be a major concern for the models from mathematical biology which we will study later in this chapter.

The simplest possible incidence geometry contains 3 points and 3 lines. We depict this construction in Figure 3.

Figure 3: The simplest possible incidence geometry. Each of the three lines contains two distinct points, and for each line, there exists exactly one point which does not lie on that line. When we associate a unit of information to each point, the shared information of any pair of lines is the information ascribed to their point of intersection.

If we denote the three lines by $a$, $b$ and $c$, as in Figure 3, then because each line contains exactly two points, we have

$H(\{a\}) = H(\{b\}) = H(\{c\}) = 2$    (8)

while because any two lines intersect in exactly one point, the joint information of any pair of lines is the three points in their union:

$H(\{a, b\}) = H(\{a, c\}) = H(\{b, c\}) = 3$    (9)

The joint information of all three components taken together is the total number of points in the geometry:

$H(\{a, b, c\}) = 3$    (10)

From these three observations, we can deduce that the tertiary mutual information of the three components vanishes:

$I(a; b; c) = \bigl[H(\{a\}) + H(\{b\}) + H(\{c\})\bigr] - \bigl[H(\{a, b\}) + H(\{a, c\}) + H(\{b, c\})\bigr] + H(\{a, b, c\}) = 6 - 9 + 3 = 0$    (11)

This is the information-theoretic restatement of the geometric fact that the three lines do not all come together at a single point. All together, the complexity profile of the minimal incidence geometry is given by

$C(k) = \begin{cases} 3 & k = 1, 2 \\ 0 & k = 3 \end{cases}$    (12)

The information-theoretic relationships among the three system components $a$, $b$ and $c$ can be expressed in a three-circle Venn diagram, which we depict in Figure 4.

Figure 4: Information diagram for the minimal incidence geometry depicted in Figure 3. Within each of the three circles, all of the regions which contain nonzero information are regions of overlap with another circle. This is the information-theoretic consequence of the geometrical fact that each point belongs to more than one line. Note that the central region, where all three circles overlap, contains no information.

The other structure index introduced above is the MUI. We can deduce the MUI curve of the minimal incidence geometry using the properties of the MUI established in [3]. First, a descriptor must have at least 3 units of information to capture all of the information which was granted to the geometry. Expanding an optimal descriptor, one which wastes nothing, brings no benefit beyond a descriptor length of $y = 3$. Therefore, the marginal utility will equal zero for $y > 3$. In addition, because the integral of the MUI curve is the utility of a full description—that is, the total scale-weighted information of the system—we know the integral of the MUI for this geometry will be 6. Furthermore, we can constrain the height of the MUI curve, using the following property:

  • If there are no interactions or correlations of degree $k$ or higher—formally, if the shared information vanishes for all collections of $k$ distinct components—then $M(y) \leq k - 1$ for all $y$ [3, §VII.B].

Here, this means that $M(y) \leq 2$. Furthermore, we know that the MUI is the derivative of a piecewise linear function, so it is piecewise constant. For the minimal geometry, then, we expect the MUI should be

$M(y) = \begin{cases} 2 & 0 \leq y \leq 3 \\ 0 & y > 3 \end{cases}$    (13)

Note that the MUI curve is the reflection of the complexity profile in Eq. (12).

These properties hold for any incidence geometry: the MUI vanishes for $y$ larger than the number of points used to build the geometry, the integral of the MUI is the total scale-weighted information, and the MUI is bounded above by the maximal number of lines which mutually intersect at a common point. We can relate the MUI and the complexity profile by noting that the areas bounded by both curves are the same, and moreover, the width of the MUI curve is the height of the complexity profile, because both are given by the number of points in the geometry.

2.3 Fano Plane

The Fano plane, pictured in Figure 5, has 7 points, 7 lines, 3 points on every line, and 3 lines through every point. The total scale-weighted information is the number of points per line times the number of lines, or 21. The information content of the whole system is 7, while the mutual information between any line and any other is again 1.

Figure 5: The Fano plane: a symmetrical arrangement of seven points and seven lines. The shared information between any two lines is 1, but the shared information at higher scales depends on which set of lines we choose.

For the Fano plane, there are two possible information-theoretic scenarios involving three distinct lines. If the three lines do not all meet at a common point, then the tertiary mutual information of those three components is zero. The other option is for the three lines to meet at a common point, in which case the tertiary mutual information is 1:

$I(\ell_1; \ell_2; \ell_3) = \begin{cases} 0 & \text{if } \ell_1, \ell_2, \ell_3 \text{ are not concurrent} \\ 1 & \text{if } \ell_1, \ell_2, \ell_3 \text{ meet at a common point} \end{cases}$    (14)

The tertiary mutual information can never be greater than 1, because any two lines come together in one and exactly one point. This uniqueness of intersections also implies that, in the three-circle Venn diagram, the lens-shaped regions where two circles overlap always contain a total of 1 unit of information. If the inner region of triple overlap, corresponding to the tertiary mutual information, contains the value 1, then the outer part of each lens must contain the value 0, and vice versa. In addition, the total information content enclosed by each of the three circles is always 3 units, the number of points per line. Together, these facts constrain the possible three-circle Venn diagrams for subsets of the Fano-plane system, leaving only the two possibilities depicted in Figure 6.

Figure 6: Illustrative examples of the two possible three-circle Venn diagrams for three-component subsets of the Fano-plane system. Note that the information in the central region is zero in one case but nonzero in the other.

We have in the Fano plane an elementary illustration of a commonplace occurrence in complex systems: higher-order structure which cannot be resolved into lower-order interrelationships. In this case, we see there exists a variety among triples of components which cannot be inferred from considering pairs. The Fano-plane system does not fall within the particular special classes of systems studied as examples in [3], since not all subsets of the same size have the same information content.

We can deduce the complexity profile of the Fano-plane system by counting points lying on $k$ or more lines, or by computation using Eq. (5). If we take the latter approach, with 7 components there are $2^7 - 1 = 127$ nonempty subsets for which we must find the information content. However, thanks to the high degree of symmetry, the information content for the different possible subsets of the component set is not that hard to work out. Because there exist 3 lines through any point, eliminating 1 line (component) can only reduce the number of lines through any point down to 2. So, the information content of any six-component set is still 7. To eliminate all lines through a point, we must erase at least three components. From considerations like these, we can deduce the complexity profile by explicit computation with Eq. (5). For the Fano plane, the complexity profile is given by

$C(k) = \begin{cases} 7 & 1 \leq k \leq 3 \\ 0 & 4 \leq k \leq 7 \end{cases}$    (15)

We can also deduce this result quickly by recalling that, generally, $C(k)$ captures the amount of information contained in interrelationships of scale $k$ and higher. In the context of incidence geometry, the complexity profile has the geometrical interpretation as the number of distinct points which lie at the intersection of $k$ or more lines.
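The same computation can be scripted for any incidence geometry: treat each line as a component, and take the information of a set of lines to be the number of distinct points lying on them. The sketch below (assuming Python, and one standard labeling of the Fano plane’s seven lines) checks Eq. (5) against the point-counting interpretation:

```python
# Complexity profiles of incidence geometries: lines are components, and the
# information of a set of lines is the number of distinct points they contain.
from itertools import combinations
from math import comb

def profile(lines):
    N = len(lines)
    def H(U):
        return len(set().union(*U)) if U else 0
    Q = {j: sum(H(U) for U in combinations(lines, j)) for j in range(1, N + 1)}
    return [sum((-1) ** (j - (N - k + 1)) * comb(j - 1, N - k) * Q[j]
                for j in range(N - k + 1, N + 1))
            for k in range(1, N + 1)]

minimal = [{1, 2}, {2, 3}, {1, 3}]
fano = [{1, 2, 3}, {1, 4, 5}, {1, 6, 7}, {2, 4, 6},
        {2, 5, 7}, {3, 4, 7}, {3, 5, 6}]      # one standard labeling
print(profile(minimal))  # [3, 3, 0], matching Eq. (12)
print(profile(fano))     # [7, 7, 7, 0, 0, 0, 0], matching Eq. (15)
```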

Applying the same properties as we used for the minimal incidence geometry, we can deduce that the MUI of the Fano plane vanishes for descriptor lengths $y > 7$, that the integral of $M(y)$ is 21, and that $M(y) \leq 3$ for all $y$.

3 The Dual of a Complex System

Geometry makes much use of duality. The dual of a geometrical arrangement is that arrangement which is found by interchanging the roles of points and lines in the original. For example, an affine plane of order $n$ is a specialization of an incidence geometry in which the following conditions hold [25, 26]:

  • the geometry contains $n^2$ points in all;

  • the geometry contains $n^2 + n$ lines;

  • each line contains $n$ points;

  • each point lies on $n + 1$ lines.

Interchanging points and lines yields a dual affine plane [25, 27], a geometry which meets the following criteria:

  • the geometry contains $n^2 + n$ points;

  • the geometry contains $n^2$ lines;

  • each line contains $n + 1$ points;

  • each point lies on $n$ lines.

If we translate from geometry to information theory, what is the meaning of duality? We began by saying that each point in a geometry corresponded to a unit of information, and each line was to become a component in an information-theoretic system. Applying the operation of duality, we find that each line in the original geometry should be ascribed a unit of information in the dual system, and each point in the original geometry becomes a system component in the dual.

The complexity profile of the original system is defined for values of $k$ from 1 to the number of lines in the original geometry. Therefore, the complexity profile of the dual system is defined for values of $k$ from 1 to the number of points in the original geometry. The property of “lines meeting at a common point” in the original becomes the property of “points lying on a common line” in the dual. Consequently, the complexity profile of the dual system can be found from the original geometry. The dual of $k$ lines intersecting at a point is $k$ points sharing a common line. We can find the complexity profile of the dual system by counting the number of distinct lines in the original system which contain $k$ or more points.

For the minimal incidence geometry depicted in Figure 3 and the Fano plane portrayed in Figure 5, the duality exchange operation does not change the complexity profile. We can say that $C(k)$ for those geometries is self-dual. Considering the affine planes of order $n$ defined above, we know that each point lies on $n + 1$ lines, so $C(k)$ is a rectangle of width $n + 1$ and height $n^2$. In a dual affine plane of order $n$, each of the $n^2 + n$ points lies on $n$ lines, so $C(k)$ is a rectangle of width $n$ and height $n^2 + n$. For affine planes, the duality transformation preserves the area, but not the shape, of the complexity profile.

One naturally wonders whether this property of the area under the curve is more general. We can investigate this question using the conservation law, proved in Allen et al. [3], that the area under the complexity profile is always the total scale-weighted information content of the system. For the special case of an incidence geometry, this means that

$\sum_{k} C(k) = \sum_{\ell} H(\{\ell\})$    (16)

where $k$ ranges from 1 to the total number of lines in the geometry. Let $P$ be the set of all points in the geometry. (For an affine plane, $k$ thus ranges from 1 to $n^2 + n$.) The area bounded by the complexity profile is then

$\sum_{k} C(k) = \sum_{\ell} \sum_{p \in P \,:\, p \in \ell} 1$    (17)

In this sum, the multiplicity of any point $p$ is the number of lines which contain that point. Therefore,

$\sum_{k} C(k) = \sum_{p \in P} \#\{\ell : p \in \ell\}$    (18)

The components of the dual system are the points of the original geometry, and the information content of each component in the dual system is the number of lines which pass through that point in the original geometry. Summing over the components of the dual system to find the total area under the dual complexity profile yields the same sum as in Eq. (18). In consequence, we can say that the duality transformation generally preserves the area under the complexity profile.
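Continuing the sketch above (with `profile` and `fano` as defined there), the duality transformation and the area check take only a few more lines:

```python
# Dualize a geometry (interchange points and lines) and verify that the area
# under the complexity profile is preserved, per the argument above.
def dualize(lines):
    points = sorted(set().union(*lines))
    return [{i for i, line in enumerate(lines) if p in line} for p in points]

print(sum(profile(fano)), sum(profile(dualize(fano))))  # 21 21
```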

What can we deduce about the Marginal Utility of Information for affine planes and their duals? We recall that the MUI is, by construction, a piecewise constant function. The general properties we deduced earlier imply that for an affine plane of order $n$, $M(y)$ vanishes for $y > n^2$, and the integral of $M(y)$ is the number of points times the number of lines per point, or $n^2(n + 1)$. Furthermore, because exactly $n + 1$ lines meet at each point, $M(y) \leq n + 1$ for all $y$.

The most straightforward way to meet these requirements is with the following piecewise constant curve:

$M(y) = \begin{cases} n + 1 & 0 \leq y \leq n^2 \\ 0 & y > n^2 \end{cases}$    (19)

For a dual affine plane of order $n$, the integral of $M(y)$ is the same, $n^2(n + 1)$. The right-hand edge is instead at $y = n^2 + n$, and the upper limit becomes $M(y) \leq n$. So,

$M(y) = \begin{cases} n & 0 \leq y \leq n^2 + n \\ 0 & y > n^2 + n \end{cases}$    (20)

For both affine planes and their duals, $M(y)$ is the reflection of the complexity profile $C(k)$. And again, the duality transformation preserves areas but not shapes.

These duality properties are the information-theoretic consequences of the geometry theorem known as the principle of double counting [26]. To wit: if $P$ is the set of points of an incidence geometry and $L$ is the set of its lines, then

$\sum_{p \in P} \#\{\ell \in L : p \in \ell\} = \sum_{\ell \in L} \#\{p \in P : p \in \ell\}$    (21)

We can prove this by observing that both sides of Eq. (21) are equal to the size of the set of ordered pairs defined by

$\{(p, \ell) \in P \times L : p \in \ell\}$    (22)

4 Computation and Gammoids

The idea of information is closely related to that of computation, and in this section, we will explore another type of system for which we can define an information function, thereby bringing our concepts of multiscale structure into a computational context.

We can idealize a computation as a mathematical process that takes a set of inputs to a set of outputs. For example, if we have a program that takes a number and returns its square, we can represent that code pictorially as an arrow:

(23)

We have drawn the output as a filled circle, and the input as an unfilled one. The picture (23) is a diagrammatic representation for any function that acts on a single input to produce one and only one output: the sine of an angle, the number of meters in a distance specified in furlongs and so on.

Some functions take two inputs and return a single output. Given two numerical variables, for example, we could compute their sum, their product, the logarithm of one to the base given by the other, and so forth. We can represent all of these functions pictorially with the following diagram:

(24)

Alternatively, a single input might be used to create multiple outputs. For example, we might be given the time elapsed in seconds since midnight, and calculate the current hour and the minute within that hour:

(25)

There is a sense in which these outputs are not independent, for they derive from a common source. This is true even if, as is the case for separating clock time into minutes and hours, we cannot infer one output from the other.

What other patterns of information flow might we encounter as we go about our business? Given two positive integers, we can find their greatest common divisor and their least common multiple. If a function performs both of these tasks, then it has two outputs, both of which depend on the two inputs:

(26)

A computation may proceed by way of intermediate steps:

(27)

These intermediate stages may combine multiple inputs, and they can yield multiple outputs. In our diagrams, input vertices (unfilled circles) have no incoming links, and output vertices (filled circles) have no outgoing ones. We introduce intermediate vertices with both types of links, as in the following:

(28)

In this example, two pieces of input data are combined to yield an intermediate result, which is then used to compute two output values. If we saved this intermediate result, we could reconstruct the two outputs without having the original input data. It might be that we cannot deduce the left-hand output from the right, but there is still an interdependency between them, linking them by virtue of their common past.

Consider a computation with two inputs and two outputs, one of which is arrived at by means of an intermediate step:

(29)

Both of the outputs derive ultimately from the same inputs. However, they are in a way independent of each other, because knowing the result of the intermediate step lets us compute one output but not the other. Diagrams with multiple layers of intermediate steps are also possible, and they have a straightforward interpretation in terms of computations that proceed in stages.

These considerations motivate the following idea: The effective amount of information associated with a set of outputs is the number of inputs and/or intermediate values which one must know in order to produce those outputs. We are, after a fashion, giving a mathematical form to the notion that the difficulty of a computational task is how arduous it would be to recover from a crash!

All of our diagrams have taken the form of directed graphs. Each one is a set of vertices, connected by a set of edges, where each edge carries an indication of which vertex is its source and which is its target. In addition, we have distinguished subsets of the vertex set, marking each vertex as input, intermediate or output. Recognizing this, we can develop the idea more generally.

Let $G$ be a directed graph, or “digraph” for short. Designate a subset $I$ of its vertices as inputs, and select a subset $O$ of vertices to be the outputs. We can think of $I$ as the set of “sources” for information flow, and $O$ as the set of its “targets.” (In the digraphs we have drawn, the vertices in $I$ have no incoming edges, and those vertices in $O$ have no outgoing edges. This is sensible, but it turns out not to be essential for proving the key properties of the information function.) The information content $H(U)$ of a subset $U \subseteq O$ is the size of the smallest set of vertices having the property that all paths from $I$ to $U$ must pass through it. The size of this “minimal separating set” tells us how many intermediate variables we would need to save in order to compute the outputs in $U$.

It can be proven that the function $H$ so defined satisfies our axioms for being an information function. Consequently, the function yields sensible expressions for the interdependence of output variables. Having defined $H$, we can construct the indices of multiscale structure $C(k)$ and $M(y)$ as we did before.

For example, take the graph

(30)

Here, we have one output, which we label $y$, and we can read off the value of $H(\{y\})$ directly from the picture.

A graph with two outputs might look like the following:

(31)

In this case, the graph determines the information content of each output separately and of the two taken jointly. The shared information between the outputs, which we label $y_1$ and $y_2$, is

$I(y_1; y_2) = H(\{y_1\}) + H(\{y_2\}) - H(\{y_1, y_2\}) = 1$    (32)

corresponding nicely to the fact that $y_1$ and $y_2$ depend upon one common input.

A more involved two-output graph could include an intermediate step:

(33)

Here, the intermediate vertex reduces the information content of the output it feeds, since saving the intermediate value alone suffices to reconstruct that output. The shared information between the two outputs is now 0, because the intermediate stage “shields” its output from the original input.

What we have constructed here is known in pure mathematics as a gammoid, and our information function is the rank function of that gammoid [28]. A gammoid is an example of a matroid, a structure defined as a set of elements and a rank function that assigns a nonnegative integer to each subset of that set. Matroid rank functions are monotonic and strongly subadditive, so they count as information functions. As we have seen in this section, matroid theory furnishes examples of mathematical entities to which we can apply the ideas of multiscale structure.
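As a computational check on this section’s claims, the information function can be evaluated with a max-flow computation, since by Menger’s theorem the size of a minimum separating set equals the maximum number of disjoint paths. The sketch below assumes Python with the networkx library, and it takes the separating set to be drawn from the inputs and intermediate vertices, per the discussion above; the example graph is in the spirit of diagram (31), with two outputs sharing one common input.

```python
# Gammoid-style information function via max-flow/min-cut. The separating set
# may contain inputs and intermediate vertices but not the outputs themselves.
import networkx as nx

def info(edges, inputs, outputs, U):
    """Smallest number of input/intermediate vertices meeting every path
    from the inputs to the outputs in U."""
    g = nx.DiGraph()
    vertices = {v for e in edges for v in e} | set(inputs) | set(outputs)
    for v in vertices:
        # Split each vertex; cutting a non-output vertex costs one unit.
        cap = float("inf") if v in outputs else 1
        g.add_edge((v, "in"), (v, "out"), capacity=cap)
    for u, v in edges:
        g.add_edge((u, "out"), (v, "in"), capacity=float("inf"))
    for i in inputs:
        g.add_edge("source", (i, "in"), capacity=float("inf"))
    for o in U:
        g.add_edge((o, "out"), "sink", capacity=float("inf"))
    return nx.maximum_flow_value(g, "source", "sink")

# Example graph in the spirit of diagram (31): output y1 is computed from
# inputs x1 and x2, while output y2 is computed from x2 and x3.
edges = [("x1", "y1"), ("x2", "y1"), ("x2", "y2"), ("x3", "y2")]
ins, outs = ["x1", "x2", "x3"], ["y1", "y2"]
H1 = info(edges, ins, outs, ["y1"])           # 2
H2 = info(edges, ins, outs, ["y2"])           # 2
H12 = info(edges, ins, outs, ["y1", "y2"])    # 3
print(H1 + H2 - H12)                          # shared information 1, cf. Eq. (32)
```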

5 Network Dynamics

We can apply the multiscale complexity formalism quantitatively to a model which is an idealized representation of multiple interesting biological scenarios. Doing so requires relating the concepts of probability and information; we provide a more detailed development of the mathematical prerequisites in Chapter \thechapter.

One way to make progress on many biological problems is to make the approximation that each component of a system can, at any given time, be in one of two mutually exclusive states of being. In essence, we idealize a phenomenon by treating it as composed of binary random variables, possibly correlated. We might postulate that each organism in a population can follow one of two survival strategies. For example, a male bower bird can maraud, attacking other birds’ bowers, or it can remain at its own bower, guarding its own mating display from marauders [29, 30]. Or, we might postulate that a gene comes in two variant forms. Each instance of the gene in the idealized population is then a binary random variable. We can make an analogous approximation when modeling social and economic systems. For example, an individual voter can choose between one of two political parties. Or, in a simplified but still instructive model for a stock market, the price of a company’s stock can be going either up or down [31].

A specific implementation of this idea is the Moran model, which was originally formulated in biology but can be applied more broadly [31]. Consider a haploid population of $N$ individuals, and a gene which comes in two alleles. The genetic character of the population can change as individuals are born and die. One simple dynamical model for this process picks an individual at random with each tick of a discrete-time clock. The chosen, or focal, individual mates with one of the other organisms and produces an offspring, which then takes the place of the focal individual. The allele carried by the offspring is that carried by one of its two parents, the choice of which parent being made randomly with equal probability either way.

Reframing the Moran model in network-theory language turns out to be convenient for developing extensions, such as treatments including structured populations, wherein mating is not uniformly random. Furthermore, doing so broadens the range of systems to which the mathematics can be applied: moving away from the specifically biological terminology makes it more explicit that the Moran model can be applied equally well to biological evolution or to social dynamics [31].

The components of our system will be the nodes of a network. Each node is a random variable which can take the values 0 and 1. In addition to these nodes, we augment the system with a number $J_0$ of nodes whose states are all fixed at 0, and a quantity $J_1$ of nodes whose states are fixed to be 1.

At each time step, we pick one of the variable nodes at random. We then choose, stochastically, whether or not to change that node’s value. With probability $\lambda$, we keep the node value the same, and with probability $1 - \lambda$, we assign to it the value of another node, chosen at random from a pool of candidates. This pool contains both the neighborhood of the node in the network topology and the fixed nodes. In this way, the fixed nodes represent the possibility of mutation: even if all the dynamical population has allele 1, there remains the opportunity of picking up a 0, and vice versa. For a complete graph, the steady-state behavior of this dynamical system can actually be found analytically [32]. The probability that exactly $n$ nodes out of the $N$ whose value can vary will be in state 1 is

$P(n) = \frac{1}{Z} \binom{N}{n} \Gamma(n + J_1) \, \Gamma(N - n + J_0)$    (34)

where the normalization constant $Z$ is given by

$Z = \sum_{m=0}^{N} \binom{N}{m} \Gamma(m + J_1) \, \Gamma(N - m + J_0)$    (35)

We illustrate $P(n)$ for networks of $N = 10$ variable nodes and different values of $J_0$ and $J_1$ in Figure 7. The function $P(n)$ is an example of the beta-binomial distribution, which is significant in probability theory for reasons we will return to in Chapter \thechapter.

Because the gamma function can take noninteger arguments, we can compute the probability $P(n)$ even for nonintegral $J_0$ and $J_1$. This is useful if we wish to examine the low-mutation-rate limit.
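A brief numerical sketch of Eqs. (34)–(35), assuming Python with numpy and scipy; working with the logarithm of the gamma function keeps the evaluation stable and handles noninteger $J_0$ and $J_1$:

```python
# Evaluate the steady-state distribution of Eqs. (34)-(35) on a complete graph.
import numpy as np
from scipy.special import gammaln

def steady_state(N, J0, J1):
    n = np.arange(N + 1)
    logw = (gammaln(N + 1) - gammaln(n + 1) - gammaln(N - n + 1)  # log binomial
            + gammaln(n + J1) + gammaln(N - n + J0))
    w = np.exp(logw - logw.max())     # exponentiate safely, then normalize
    return w / w.sum()

P = steady_state(10, 0.5, 0.5)        # noninteger J0, J1 pose no difficulty
```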

Figure 7: Probability $P(n)$ that exactly $n$ nodes out of 10 will be in the 1-state, for different values of $J_0$ and $J_1$. The red (solid), green (dashed), blue (dash-dotted) and black (dotted) curves correspond to four different choices of the pair $(J_0, J_1)$.

If the network topology is that of a complete graph, then the system has exchange symmetry, an invariance under permutations which simplifies the calculation of structure indices [3]. This simplification follows from the fact that if exchange symmetry holds, all subsets having the same number of components can be taken to contain the same quantity of information. Formally, for each set $U$, the information of $U$ is a function of the cardinality $|U|$, which we can write as a subscript: $H(U) = H_{|U|}$.

Recalling that the complexity profile indicates the information in dependencies of scale $k$ and higher, the information specific to scale $k$ is

$D(k) = C(k) - C(k + 1)$    (36)

The sum of $D(k)$ over all scales is $C(1)$, the joint information of the whole system. For any fixed scale $k$, the complexity is (up to a prefactor) the binomial transform of the sequence of subset informations $H_j$:

$D(k) = \binom{N}{k} \sum_{j=0}^{k} (-1)^{j+1} \binom{k}{j} H_{N-k+j}$    (37)

We can calculate $H_k$ from the probability distribution $P(n)$, as given by Eq. (34). Knowing $H_k$, we can compute $D(k)$, from which we can easily find the complexity profile $C(k)$. Figure 8 illustrates the results. We see that $C(k)$ depends upon the numbers of fixed influence nodes, $J_0$ and $J_1$. When $P(n)$ is concentrated around $n = N/2$, the nodes are changing their values almost independently of one another. This is the case, for example, when we make $J_0$ and $J_1$ large and equal. For those parameter values, the external influences are stronger than those of the variable nodes upon each other, while being equally balanced in both directions. This creates a situation in which knowing the status of any one variable node provides very little information about the status of any other. On the other hand, when $C(k)$ is elevated at larger values of $k$, then nonnegligible amounts of information apply at higher scales. This occurs when the external influences are weaker than the internal dynamics, causing the variable nodes to act collectively.
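The chain of computations in this paragraph can be written out compactly. The sketch below (assuming Python with numpy, and reusing `steady_state` from the previous sketch) obtains $H_k$ from $P(n)$ by exchangeability, applies Eq. (37) for $D(k)$, and accumulates the complexity profile:

```python
# From the steady-state P(n) of an exchangeable N-node system, compute the
# subset informations H_k, the scale-specific D(k) of Eq. (37), and C(k).
import numpy as np
from math import comb, log2

def subset_entropies(P):
    N = len(P) - 1
    H = np.zeros(N + 1)          # H[k] = joint entropy of any k nodes
    for k in range(1, N + 1):
        # q[m] = probability of one specific k-node pattern containing m ones
        q = [sum(P[n] * comb(N - k, n - m) / comb(N, n)
                 for n in range(m, N - k + m + 1)) for m in range(k + 1)]
        H[k] = -sum(comb(k, m) * q[m] * log2(q[m])
                    for m in range(k + 1) if q[m] > 0)
    return H

def profile_exchangeable(P):
    H = subset_entropies(P)
    N = len(P) - 1
    D = [comb(N, k) * sum((-1) ** (j + 1) * comb(k, j) * H[N - k + j]
                          for j in range(k + 1))
         for k in range(1, N + 1)]            # Eq. (37)
    return np.cumsum(D[::-1])[::-1]           # C(k) = sum of D(j) for j >= k

C = profile_exchangeable(steady_state(10, 0.5, 0.5))
```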

Figure 8: Complexity profiles for the cases illustrated in Figure 7. Each curve is normalized so that the total area under it is 1. Elevated complexity at larger $k$ indicates collective behavior at larger scales. The red (solid), green (dashed), blue (dash-dotted) and black (dotted) curves correspond to the same choices of $(J_0, J_1)$ as in Figure 7.

6 Frequency-Dependent Moran Process

One fundamental fact of evolutionary biology is that the environment of an organism consists in large part of other organisms. A simple, albeit approximate, way to represent the configuration of an ecosystem is by specifying the frequencies of abundance or population densities for the species which are present. (We will consider more sophisticated approximations, and the hazards of oversimplified representations, in later chapters.) In this context, we can speak of frequency-dependent fitness: the success of an organism type or an evolutionary strategy can be a function of the current population densities.

The simplest kind of frequency dependence is a linear relationship between population density and fitness. As before, we consider two varieties, and we keep the total population size constant, so the frequencies of both types can be given in terms of a single variable, $x$, the fraction of the population carrying type 1. We take the reproductive rates of type-0 and type-1 organisms to be given by

$f_0 = a_{00} (1 - x) + a_{01} x$    (38)
$f_1 = a_{10} (1 - x) + a_{11} x$    (39)

The coefficient $a_{ij}$ is the payoff which a type-$i$ player gains by playing with a type-$j$ player. Different values of the matrix $(a_{ij})$ represent different interactions between evolutionary strategies.

In order to apply Shannon information theory, we need a probability distribution. A convenient and meaningful one for these purposes is the mutation-selection equilibrium distribution, which is the steady state of the frequency-dependent Moran process. We can find this distribution numerically by iterating the appropriate update rule, which we can represent as multiplication by a transition matrix. The next step is to construct this matrix. Having done so, we will be able to compute the equilibrium distribution $P(n)$ and thence obtain the complexity profile, as before. The result will typically depend both upon the payoff matrix and on the mutation rate.

Let the total population size be $N$, and let $n$ denote the number of type-1 individuals. We suppose that reproduction is imperfect, with mutations occurring at rate $\mu$. That is, an offspring inherits its parent’s type with probability $1 - \mu$, while with probability $\mu$, we pick the offspring’s type at random. A nonzero mutation rate implies that the population does not have to get stuck in a uniform configuration: even if all individuals have the same type, an error in reproduction can create an organism of the opposite type in the next generation. This is a necessary requirement for having a steady-state probability distribution which is not concentrated entirely at $n = 0$ or $n = N$.

To find the steady-state probability distribution for $n$, we first need to calculate the probabilities that $n$ will increase or decrease. In the frequency-dependent Moran process [33], the probability that $n$ will decrease by 1 is

$T^-(n) = \frac{n}{N} \left[ (1 - \mu) \, \frac{(N - n) f_0}{(N - n) f_0 + n f_1} + \frac{\mu}{2} \right]$    (40)

And the probability of $n$ increasing by 1 is

$T^+(n) = \frac{N - n}{N} \left[ (1 - \mu) \, \frac{n f_1}{(N - n) f_0 + n f_1} + \frac{\mu}{2} \right]$    (41)

With these equations, we can find the steady-state probability distribution $P(n)$, which will depend on the payoff matrix $(a_{ij})$ and the mutation rate $\mu$. (For the present purposes, a numerical computation will suffice.) Knowing $P(n)$, we can as before find the complexity profile $C(k)$. The resulting curve tells us about the scales of organization which arise within the population as a consequence of the evolutionary game dynamics.
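A sketch of this construction, assuming Python with numpy: the transition probabilities follow Eqs. (38)–(41), and the steady state is found by repeatedly applying the transition matrix. The payoff values in the usage line are placeholders satisfying the Prisoner’s Dilemma ordering, not the specific matrix of Eq. (42):

```python
# Frequency-dependent Moran process: build the transition matrix and iterate
# it to mutation-selection equilibrium. Payoffs are assumed positive.
import numpy as np

def moran_steady_state(N, a, mu, iters=100000):
    T = np.zeros((N + 1, N + 1))     # T[m, n] = probability of moving n -> m
    for n in range(N + 1):
        x = n / N
        f0 = a[0][0] * (1 - x) + a[0][1] * x       # Eq. (38)
        f1 = a[1][0] * (1 - x) + a[1][1] * x       # Eq. (39)
        born1 = (1 - mu) * n * f1 / ((N - n) * f0 + n * f1) + mu / 2
        minus = (n / N) * (1 - born1)              # Eq. (40): n -> n - 1
        plus = ((N - n) / N) * born1               # Eq. (41): n -> n + 1
        if n > 0:
            T[n - 1, n] = minus
        if n < N:
            T[n + 1, n] = plus
        T[n, n] = 1 - (minus if n > 0 else 0) - (plus if n < N else 0)
    p = np.full(N + 1, 1.0 / (N + 1))
    for _ in range(iters):
        p = T @ p
    return p

# Placeholder Prisoner's Dilemma payoffs (a10 > a00 > a11 > a01), not Eq. (42):
P = moran_steady_state(10, [[3, 1], [5, 2]], 0.04)
```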

To connect with the literature [33], we carry out this calculation for the payoff matrix

(42)

which defines an instance of the Prisoner’s Dilemma. One application of this to biology is the case of the bower birds mentioned earlier [29, 30]. Simplifying somewhat, a male bower bird has two strategies available to it: to guard its own bower, or to maraud and attempt to damage others. Designate guarding as strategy 0 and marauding as strategy 1. The matrix element $a_{ij}$ denotes the payoff to a bird following strategy $i$ against an opponent who plays strategy $j$. In this example, a guardian (row 0) who plays against a marauder (column 1) obtains a score of $a_{01} = 4$. The highest payoff is $a_{10}$, the score obtained by a marauder who plays against a guardian. In fact, it is better to maraud than to guard, when facing either kind of foe:

$a_{10} > a_{00} \quad \text{and} \quad a_{11} > a_{01}$    (43)

So far, it looks like the thing to do is to maraud. However, the payoff obtained when both birds follow this strategy is $a_{11}$, which is less than the payoff $a_{00}$ they would have obtained if they had both stayed home.

The particular choice of numbers here is arbitrary, but the relationships between the numbers are representative of typical conditions in the wild. As Gonick [30] summarizes, “Seemingly forced by the game’s logic into a hostile strategy, they end up worse off than if they had only cooperated!” A wide variety of biological scenarios can be considered as examples of this game [34, 35]. A primary concern is to identify the conditions under which cooperation (for example, both bower birds guarding rather than marauding) is evolutionarily favorable.

This type of situation is designated a “Prisoner’s Dilemma” because it is usually introduced with an example of two people apprehended for a crime and interrogated by the police. Each player can choose to say nothing, or to inform on the other player. The payoff matrix is such that it is better to inform than to stay silent, whatever option the other player takes; however, if both players keep quiet, they fare better than if they both inform on each other.

Figure 9 shows the probability distribution $P(n)$ for the Moran process in mutation-selection equilibrium with this payoff matrix, given two different mutation rates. Note that the effect of varying the mutation rate is quite dramatic. As before, we can compute the complexity profile, which we plot in Figure 10.

Figure 9: Equilibrium probability that exactly $n$ agents out of 10 will be in the 1-state (marauding), for the Prisoner’s Dilemma game defined by Eq. (42). The blue (thinner) and red (thicker) curves correspond to two different values of the mutation rate $\mu$.
Figure 10: Complexity profiles for the cases illustrated in Figure 9. Each curve is normalized so that the total area under it is 1. Elevated complexity at larger $k$ indicates collective behavior at larger scales. The blue (thinner) and red (thicker) curves correspond to the same mutation rates as in Figure 9.

7 Multiscale Challenges and Evolution

In the previous section, we considered the scales of organization which can arise as an evolutionary process develops stochastically. We can also apply our mathematical formalism of multiscale structure to other aspects of evolutionary theory. The following is excerpted from an article by Allen, Bar-Yam and myself [3].

The discipline of cybernetics, an ancestor to modern control theory, used Shannon’s information theory to quantify the difficulty of performing tasks, a topic of relevance both to organismal survival in biology and to system regulation in engineering. Cyberneticist W. Ross Ashby considered scenarios in which a regulator device must protect some important entity from the outside environment and its disruptive influences [36]. In Ashby’s examples, each state of the environment must be matched by a state of the regulatory system in order for it to be able to counter the environment’s influence on a protected component. Successful regulation implies that if one knows only the state of the protected component, one cannot deduce the environmental influences; i.e., the job of the regulator is to minimize mutual information between the protected component and the environment. This is an information-theoretic statement of the idea of homeostasis. Ashby’s “Law of Requisite Variety” states that the regulator’s effectiveness is limited by its own information content, or variety in cybernetic terminology. An insufficiently flexible regulator will not be able to cope with the environmental variability. A multiscale extension of Shannon information theory provides a multiscale cybernetics, with which we can study the scenarios in which “that which we wish to protect” and “that which we must guard against” are each systems of many components, as are the tools we employ for regulation and control [4, 5, 6].
Multiscale information theory enables us to overcome a key limitation of the requisite variety concept. In the examples of traditional cybernetics [36], each action of the environment requires a specific, unique reaction on the part of the regulator. This neglects the fact that the impact which an event in the environment has on the system depends upon the scale of the environmental degrees of freedom involved. There is a great difference between large-scale and fine-scale impacts. Systems can deflect fine-scale impacts without needing to specifically respond to them, while they need to respond to large-scale ones or perish. For example, a human being can be indifferent to the impact of a falling raindrop, whereas the impact of a falling rock is much more difficult to neglect, even if specifying the state of the raindrop and the state of the rock require the same amount of information. An extreme case is the impact of a molecule: air molecules are continually colliding with us, yet the only effects we have to cope with actively are the large-scale, collective behaviors like high-speed winds. Ashby’s Law does not make this distinction. Indeed, there is no framework for the discussion due to the absence of a concept of scale in the information theory he used: Each state is equally different from every other state and actions must be made differently for each different environment.
Thus, to account for real-world conditions, a multiscale generalization of Ashby’s Law is needed. According to such a law, the responses of the system must occur at a scale appropriate to the environmental change, with larger-scale environmental changes being met by larger-scale responses. As with the case of raindrops striking a surface, large-scale structures of a system can avoid responding dynamically to small-scale environmental changes which cause only small-scale fluctuations in the system.
Given a need to respond to larger-scale changes of the environment, coarser-scale descriptions of that environment may suffice. A regulator that can marshal a large-scale response can use a coarse-grained description of the environment to counteract large-scale fluctuations in the external conditions. In this way, limited amounts of information can still be useful. To make requisite variety a practical principle, one must recognize that information applies to specific scales.
Ashby aimed to apply the requisite variety concept to biological systems, as well as technological ones. An organism which lacks the flexibility to cope with variations in its environment dies. Thus, a mismatch in variety/complexity is costly in the struggle for survival, and so we expect that natural selection will lead to organisms whose complexity matches that of their environment. However, “the environment” of a living being includes other organisms, both of the same species and of others. Organisms can act and react in concert with their conspecifics, and the effect of any action taken can depend on what other organisms are doing at the same time [37]. In some species, such as social insects [38], distinct scales of the individual, colony and species are key features characterizing collective action. This suggests a multiscale cybernetics approach to the evolution of social behavior: We expect that scales of organization within a population—the scales, for example, of groups or colonies—will evolve to match the scales of the challenges which the environment presents. Furthermore, the concept of multiscale response applies within the individual organism as well. Multiple scales of environmental challenges are met by different scales of system responses. To protect against infection, for example, organisms have physical barriers (e.g., skin), generic physiological responses (e.g., clotting, inflammation) and highly specific adaptive immune responses, involving interactions among many cell types, evolved to identify pathogens at the molecular level. The evolution of immune systems is the evolution of separate large- and small-scale countermeasures to threats, enabled by biological mechanisms for information transmission and preservation [39]. As another example, the muscular system includes both large and small muscles, comprising different numbers of cells, corresponding to different scales of environmental challenge (e.g., pursuing prey and escaping from predators versus chewing food) [40].

In Chapter \thechapter, we will use evolutionary game theory to understand one example of multiscale requisite variety: a scenario in which reproductive success depends on multiple organisms acting in concert.

Chapter \thechapter Host–Consumer Evolution by Simulation

8 Introduction

Spatial extent is a complicating factor in mathematical biology. The possibility that an action at point A cannot immediately affect what happens at point B creates the opportunity for spatial nonuniformity. This nonuniformity must change our understanding of evolutionary dynamics, as the same organism in different places can have different expected evolutionary outcomes. Since organism origins and fates are both determined locally, we must consider heterogeneity explicitly to determine its effects. We use simulations of spatially extended host–pathogen and predator–prey ecosystems to reveal the limitations of standard mathematical treatments of spatial heterogeneity. Our model ecosystem generates heterogeneity dynamically; an adaptive network of hosts on which pathogens are transmitted arises as an emergent phenomenon. The structure and dynamics of this network differ in significant ways from those of related models studied in the adaptive-network field. We use a new technique, organism swapping, to test the efficacy of both simple approximations and more elaborate moment-closure methods, and a new measure to reveal the timescale dependence of invasive-strain behavior. Our results demonstrate the failure not only of the most straightforward (“mean field”) approximation, which smooths over heterogeneity entirely, but also of the standard correction (“pair approximation”) to the mean field treatment. In spatial contexts, invasive pathogen varieties can prosper initially but perish in the medium term, implying that the concepts of reproductive fitness and the Evolutionarily Stable Strategy have to be modified for such systems.

Mathematical modeling of biological systems involves a tradeoff between detail and tractability. Here, we consider evolutionary ecological systems with spatial extent—a complicating factor. Analytical treatments of spatial systems typically treat as equivalent all configurations with the same overall population density, the same allele frequencies, the same pairwise contact probabilities or the like. For ease of analysis, one seeks a simplified analytical model, which coarse-grains “microstates” (the complete specification of each organism) to “macrostates” (characterized by quantities like average densities), allowing one to make useful predictions about the model’s behavior [41, 42]. Corrections to simple coarse-grainings can quickly generate an overbearing quantity of algebra. It is fairly well appreciated that the simplest approximations break down in the spatial context. What is less acknowledged and not yet systematically understood is that the extensions of the simpler approximations also fail. Before exhausting ourselves with ever-more-elaborate refinements, it would be useful to have some understanding of when a particular series of approximations is doomed to inadequacy.

In this chapter, we study the context in which commonly-used coarse-grainings can be expected to fail at capturing the evolutionary dynamics of an ecosystem, and in addition we provide a novel, direct demonstration of that failure. The fundamental issue is spatial heterogeneity, a long-recognized concern for mathematical biology [43, 44]. When does spatial heterogeneity significantly impact the choice of appropriate mathematical treatment, and when does a chosen mathematical formalism not capture the full implications of spatial variability? We show that one can test a treatment of heterogeneity by transplanting organisms within a simulated ecosystem in such a way that, were the treatment valid, the modeled behavior of the ecosystem over time would remain essentially unchanged. We demonstrate situations where the system’s behavior changes dramatically and cannot be captured by a conventional treatment. The complications we explore imply that short-term descriptions of what is happening in an evolutionary ecological model can be insufficient and, in fact, misleading, with regard not just to quantitative details but also to qualitative characteristics of ecological dynamics.

Many modeling approaches in mathematical biology which appear distinct at first glance turn out to be describing the same phenomenon with different equations [45, 46, 47]. What matters for our purposes is not so much which technique is chosen, but whether the underlying assumptions do, in fact, apply.

“Mean-field theory” is a term from statistical physics [48, 49] which has been adopted in ecology [50, 51, 52], referring to an approximation in which each component of a system is modeled as experiencing the same environment as any other. This implies that the probability distribution over all possible states of the system factors into a product of probability distributions for individual components. An example in population genetics is the assumption that a population is panmictic. That is, if a new individual in one generation has an equal chance of receiving an allele from any individual in the previous generation, then we can approximate the ecosystem dynamics using only the proportion of that allele, rather than some more complicated representation of the population’s genetic makeup. Modeling evolution of that population as “change in allele frequencies over time” (per, e.g., [53, 45]) is, implicitly, a mean-field approximation [54]. The mean-field approximation is also in force if one postulates that an individual organism interacts with some subset, chosen at random, of the total population, even if the form and effect of interactions within that subset are complicated (as in, e.g., [55, 56]).
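In symbols: writing $s_i$ for the state of the $i$th component, the mean-field approximation posits
\[
P(s_1, s_2, \ldots, s_N) \;\approx\; \prod_{i=1}^{N} P(s_i),
\]
so that the single-component marginal distributions carry all the information retained about the system.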

It is well known that real species are not necessarily panmictic. However, many treatments which acknowledge this are still mean-field models. The textbook way of incorporating geographical distance into a population-genetic model is to divide the system into local subpopulations, “islands,” connected via migration [57, 58, 59]. Within each subpopulation, distance is treated as negligible, and organisms are well mixed [60, 44]. This approach makes a simplifying assumption that there is a single distance scale below which panmixia prevails [61], and it relies on well-defined boundaries between panmictic subpopulations which persist over time [60]. Furthermore, the connections among subpopulations are frequently taken to have the topology of a complete graph, i.e., an organism in one subpopulation can migrate to any other with equal ease [60, 44, 58, 59]. In this case, each of the subpopulations does experience the same environment, up to corrections of order one over the number of islands. Thus, the mean-field approximation is in force at the island level, and the island model incorporates spatial extent without incorporating a full treatment of spatial heterogeneity. For real ecosystems [62, 63, 61, 64], one or more of these simplifying assumptions can fail. Long-distance migration is often thought to return a spatial ecosystem to a well-mixed form, but if organisms’ migration habits are themselves adaptive, this is not necessarily so [65]. More complicated population structures require more sophisticated mathematical treatments of evolution, a fact which has mathematical consequences, but more importantly has real-world implications for practical issues like the evolution of drug-resistant diseases [66].

Where mean-field approximations fail, “higher order” approximations may be employed. Rather than individual organisms or islands, a pair approximation considers pairs of organisms or pairs of spatial regions in average contexts. However, this approximation can also fail when the local contexts of groups do not reflect the overall system behavior, due to heterogeneity across larger domains. Patches of distinct genetic composition in well-separated parts of a spatial system cannot be treated correctly by such approximations. Quantitative analyses confirm this inadequacy. We introduce a new approach to analyzing such approximations by swapping pairs of organisms in a way that preserves the pair description. For spatial systems, such swapping events violate the spatial separation between patches and change the evolutionary behavior of the system. The swapping method therefore serves as a direct test of the (in)adequacy of the pair approximation. For evolution on random networks of sites that do not embody large spatial distances, the pair approximation can work, and the swapping test does not change measures of evolutionary dynamics. However, such networks do not capture important properties of spatial heterogeneity.

As one of the key properties of spatial extent is the propagation of organisms from one part of the space to another over long distances, we show that important insights can be gained by considering models of percolation. Percolation describes the physical propagation of, e.g., fluids through a random medium. In certain limits the evolutionary behavior of spatial systems can be mapped onto percolation behavior, demonstrating that investigations of such systems which go beyond mean-field or scaling studies are relevant to evolutionary dynamics. This and other advances that go beyond the mean field are necessary to fully describe spatial evolutionary dynamics, just as they are necessary for the description of many physical systems of spatial extent. The complexities of spatially extended evolutionary dynamical systems beyond the prototypical problem of percolation create new demands and opportunities for advancing our insight into the dynamics of heterogeneous systems and their implications for evolution.

9 Model and Methods

We make the issue of spatial heterogeneity concrete by focusing on a specific model of ecological and evolutionary interest. We take a model of hosts and consumers interacting on a 2D spatial lattice. Each lattice site can be empty (0), occupied by a host (h) or occupied by a consumer (c). We use the term consumer as a general label to encompass parasites, pathogens and predators. Where convenient for examples, we will specialize to one or another of these terminologies. Hosts reproduce into adjacent empty sites with probability $g$ per site, taken as a constant for all hosts. Consumers reproduce into adjacent sites occupied by hosts, with probability $\tau$ per host; sometimes $\tau$ is fixed for all consumers, but we also consider cases in which it is a mutable parameter passed from parent to offspring. We will refer to $\tau$ as the transmissibility. Hosts do not die of natural causes, while consumers perish with probability $v$ per unit time (leaving empty sites behind). Because consumers can only reproduce into sites where hosts live, the effective graph topology of reproductively available sites experienced by the consumers is constantly changing due to their very presence. This makes the ecosystem an adaptive network, a system in which the dynamics of a network and the dynamics on that network can occur at comparable timescales and reciprocally affect one another [67, 68, 69, 70, 71]. In this model, dynamics can be highly complex, including spatial cascades of host and consumer reproduction. Even when a quasi-steady-state behavior emerges, as we shall see, it is a consequence of fluctuations over extended space and time intervals.
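For concreteness, the following is a minimal sketch of these update rules in Python. The integer encoding of site states, the asynchronous random-site updating and the von Neumann neighborhood with periodic boundaries are implementation choices made for this sketch, not prescriptions from the text, and the parameter values in the example are merely illustrative.

import numpy as np

# Site states: 0 = empty, 1 = host, 2 = consumer.
rng = np.random.default_rng(1)

def generation(lattice, g, tau, v):
    """One generation of host-consumer dynamics: L*L random site updates."""
    L = lattice.shape[0]
    for _ in range(L * L):
        i, j = rng.integers(L, size=2)
        # von Neumann neighborhood with periodic boundary conditions
        neighbors = [((i + 1) % L, j), ((i - 1) % L, j),
                     (i, (j + 1) % L), (i, (j - 1) % L)]
        if lattice[i, j] == 1:            # host: reproduce into empty sites
            for ni, nj in neighbors:
                if lattice[ni, nj] == 0 and rng.random() < g:
                    lattice[ni, nj] = 1
        elif lattice[i, j] == 2:          # consumer
            for ni, nj in neighbors:      # reproduce into neighboring hosts
                if lattice[ni, nj] == 1 and rng.random() < tau:
                    lattice[ni, nj] = 2
            if rng.random() < v:          # die, leaving an empty site
                lattice[i, j] = 0

# Example: a host-filled lattice with one consumer injected at the center.
L = 100
lattice = np.ones((L, L), dtype=np.int8)
lattice[L // 2, L // 2] = 2
for _ in range(200):
    generation(lattice, g=0.1, tau=0.33, v=0.2)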

Several different types of biological interactions can be treated by this modeling framework. Hosts could represent regions inhabited by autotrophs alone, while consumers represent regions containing a mixture of autotrophs and the heterotrophs which predate upon them [72]. Alternatively, host agents could represent healthy organisms, while consumers represent organisms infected with a parasite or pathogen. Thus, host–consumer models are closely related to Susceptible–Infected–Recovered (SIR) models, which are epidemiological models used to understand the spread of a disease through a population. SIR models describe scenarios in which each individual in a network is either susceptible (S) to a pathogen, infected (I) with it, or recovered (R) from it; susceptible nodes can catch the disease from infected neighbors, becoming infected themselves, while nodes which have become infected can recover from the disease and are then resistant against further infection. Susceptible, infected and recovered individuals roughly correspond to hosts, consumers, and empty cells, respectively. An important difference between host–consumer models and epidemiological models concerns the issue of reinfection. In the host–consumer model, an empty site left behind by a dead consumer can be reoccupied by another consumer, but only if a host reproduces into it first. Other research has considered models where R[ecovered] individuals can also become I[nfected], with a different (typically lower) probability than S[usceptible] ones, thereby incorporating imperfect immunity into the model [73, 74]. The degree of immunity is independent of geography and the environment of the R[ecovered] individual, unlike reoccupation in the host–consumer model. Another application is illustrated by the Amazon molly, Poecilia formosa, which is a parthenogenetic species: P. formosa, all of which are female, reproduce asexually but require the presence of sperm to carry out egg development. (This kind of sperm-dependent parthenogenesis is also known as gynogenesis.) P. formosa are thus dependent on males of other species in the same genus—usually P. mexicana or P. latipinna—for reproduction. Because P. formosa do not incur the cost of sex, they can outcompete the species on which they rely, thereby possibly depleting the resource they require for survival, i.e., male fish [58, 75]. Thus, hosts could be regions containing sexual organisms, with consumers standing for areas containing both sexual and asexual individuals [58].

Figure 11: Snapshots of a simulated host–consumer ecosystem on a lattice, taken at intervals of 100 generations. Consumers are dark gray (red online), hosts are light gray (green online) and empty space is left white. The simulation began with a single consumer at the center of the lattice, which gave rise to an expanding front of consumers. The first image in this sequence shows the state of the ecosystem 100 generations into the simulation. Hosts which survive the consumer wave recolonize the empty sites, leading to pattern formation. Here, the host growth rate $g$, the consumer death rate $v$ and the fixed consumer transmissibility $\tau$ are held at representative values.

This host–consumer model displays waves of colonization, consumption and repopulation. Hosts reproduce into empty sites, and waves of consumers follow, creating new empty regions open for host colonization. Therefore, clusters of hosts arise dynamically [76, 77, 78, 79], a type of pattern formation which can separate regions of the resources available to pathogens into patches without the need for such separation to be inserted manually. Figure 11 illustrates a typical example of this effect. This is a specific example of the general phenomenon of pattern formation in nonequilibrium systems [54]. Consumers are ecosystem engineers [80, 81, 82, 83, 84, 85] which shape their local environment: an excessively voracious lineage of consumers can deplete the available resources in its vicinity, causing that lineage to suffer a Malthusian catastrophe [76, 86, 87, 58, 88, 89, 90, 64]. Because the ecology is spatially extended, this catastrophe is a local niche annihilation, rather than a global collapse [91]. A mutant strain with a high transmissibility can successfully invade in the short term but suffer resource depletion in the medium term, meaning that in a population where consumer transmissibilities evolve, averages taken over many generations yield a moderate value of $\tau$ [50, 92]. This implies that an empirical payoff matrix or reproduction ratio will exhibit nontrivial timescale dependence [93, 94, 50].

This model is distinct from another approach to studying evolutionary dynamics in spatial contexts, that of evolutionary game theory. Game-theoretic models of spatially structured populations have been explored at great length. These investigations have found that breakdowns of mean-field approximations are commonplace. However, evolutionary game theory has its own simplifying assumptions. The vast majority of studies consider only two-player games. Population size is usually taken to be constant, and population structure is typically fixed in place. In game-theoretic models, the benefits and costs of different organism behavioral traits are parameters whose values are chosen by the modeler. By contrast, “benefits” and “costs” in host–consumer models are emergent properties which depend on interactions over many generations. Population size is not fixed, and population structure is dynamical: the environment in which different consumer varieties compete changes stochastically, in ways affected by their presence.

10 Results

10.1 Evolution of Transmissibility

We investigate evolution in the spatial host–consumer ecosystem through simulation and analytic discussion. If the transmissibility $\tau$ is made a heritable trait, passed from a consumer to its offspring with some chance of mutation, what effect will natural selection have on the consumer population? Figure 12(A) shows the average, minimum and maximum values of the transmissibility observed in a population over time. The average tends to a quasi-steady-state value dependent on the host growth rate $g$ and the consumer death rate $v$; if the simulation is started with $\tau$ set below this value, the average will increase, and likewise, the average will decrease if the consumer population is initialized with $\tau$ above the quasi-steady-state value. Even when the average has achieved its quasi-steady-state value, the population displays a wide spread of transmissibilities whose extremes fluctuate over time [72].

Figure 12: (A) Minimum (blue), average (green) and maximum (red) transmissibility for a consumer population over time, at fixed $g$ and $v$. (The mutation rate and mutation step size are those used in reference [72].) The average tends to a quasi-steady-state value dependent on $g$ and $v$; if the simulation is started with $\tau$ set below this value, the average will increase, and likewise, the average will decrease if the consumer population is initialized with $\tau$ above the quasi-steady-state value [72]. The horizontal dotted line indicates the threshold value of $\tau$ which, in a mean-field model, is the smallest value at which a consumer population can sustain its numbers. The dashed line indicates the value to which $\tau$ would trend in a well-mixed ecosystem. (B) Minimum, average and maximum $\tau$ as a function of the consumer death rate $v$, with $g$ held fixed. The dotted line shows the minimum sustainable $\tau$ as predicted by the mean-field approximation. Each point is found by averaging over 15,000 timesteps. Error bars indicate one standard deviation.

In a well-mixed ecosystem, the average $\tau$ of the population will tend to 1, maximizing the reproductive rate of the individual consumer. This occurs because each consumer on average experiences the same environment as any other, and thus has the same number of hosts available to reproduce into. A consumer with a higher $\tau$ has a higher reproduction rate and therefore evolutionary dominance, pushing $\tau$ up to the highest possible value, 1. The observation of a quasi-steady-state value below 1 is an important result. This is the first breakdown of the mean-field approximation, and it indicates the inapplicability of traditional assumptions about fitness optimization, with implications for the origins of reproductive restraint, communication-based altruism and social behaviors in general [93, 94, 72, 95, 91, 50].

One can avoid $\tau$ tending to 1 in a panmictic system by imposing some extra constraint, such as a tradeoff between transmissibility and lethality, in which higher transmissibility comes at the cost of a lethality that prevents transmission. This tradeoff between infectiousness and lethality can be considered as a within-host version of the resource overexploitation that here occurs at the population level. Such within-host tradeoffs are difficult to establish empirically in living populations [96, 97]. Often, one lacks pertinent information, such as the functional relationship between pathogen load and disease transmission probability, or the extent to which empirical proxies for pathogen load predict actual host mortality [98]. An empirical observation of low virulence should not by itself be taken as evidence that a tradeoff exists: it may well be that another condition, such as panmixia, fails to obtain. The behavior of spatial models makes clear that the relevant scale of the limiting factor is not necessarily within the individual host.

Another difference between spatial and nonspatial host–consumer systems is the rate at which consumers must reproduce in order to sustain their population. One can calculate the minimum sustainable value of $\tau$ in the mean-field approximation [99] by balancing the birth and death rates. If the consumer population is small compared to the total ecosystem size, so that nearly every neighbor of a consumer is a host, then the minimum sustainable $\tau$ is the value which satisfies $\tau z = v$, where $z$ is the number of neighbors adjacent to a site. For the parameters used in Figure 12(A), this value would be 0.05, which is substantially smaller, by a factor of 4, than the lowest $\tau$ seen in the evolving spatial population. Consumer populations with $\tau$ at the mean-field threshold are not sustainable in the spatial case. This is easily verified by numerical simulations or by using the mean-field equations for the host–consumer dynamics [100, 95, 101]. Stochastic fluctuations suppress the active phase, i.e., the range of parameter values which permit a living consumer population is reduced [99].
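As a quick sanity check on this threshold, one can integrate a plausible mean-field system and watch the consumer density vanish near $\tau = v/z$. The equations below are one natural choice consistent with the verbal description of the model, not necessarily the exact system of references [100, 95, 101], and the parameter values are illustrative:

import numpy as np

def mean_field(h, c, g, tau, v, z=4, dt=0.01, steps=100_000):
    """Euler integration of a sketch mean-field host-consumer system."""
    for _ in range(steps):
        e = 1.0 - h - c                       # density of empty sites
        dh = g * z * h * e - tau * z * h * c  # colonization minus consumption
        dc = tau * z * h * c - v * c          # consumption minus death
        h, c = h + dt * dh, c + dt * dc
    return h, c

for tau in (0.04, 0.06):  # straddling the mean-field threshold v/z = 0.05
    print(tau, mean_field(h=0.99, c=0.01, g=0.1, tau=tau, v=0.2))

In this deterministic mean field, the consumer density decays to zero for $\tau < v/z$ and settles at a nonzero coexistence value for $\tau > v/z$; the point of the text is that the stochastic spatial system is far harsher on the consumers than this calculation suggests.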

To gain insight into this phenomenon, we study the case of fixed $\tau$ by means of numerical simulations. We fill the lattice with hosts and inject a single consumer with a $\tau$ of our choice; then, we observe how long the descendants of that consumer persist as a function of $\tau$. The consumer population does not persist when $\tau$ is either too low or too high. Figure 13 shows the probability that a consumer strain will survive for a substantial length of time (2000 generations) after injection into a lattice filled with hosts. This probability is hump-shaped, with an asymmetric plateau bounded above and below by cutoffs.

Figure 13: Probability of a consumer strain surviving 2000 generations after injection at a single point in a lattice filled with hosts. (Computed with 1000 runs per point, at fixed host growth rate $g$ and consumer death rate $v$.) The vertical dotted line shows the sustainability threshold found through the mean-field approximation.

To understand the upper cutoff visible in Figure 13, i.e., the value of $\tau$ above which the consumer population again becomes unsustainable, consider the limiting scenario where $g = 0$. If hosts do not reproduce into available empty sites, our system reduces to an epidemic process which has been studied before [102, 74]. Below the transition point in $\tau$, a consumer injected into a lattice of hosts will produce a consumer strain (which we can think of as an infection) which survives for a finite number of generations and then dies out, leaving the lattice filled with hosts (susceptibles) marred by a small patch of empty sites (recovered individuals). Above the transition point, a single consumer gives rise to an expanding wave of consumers which propagates over the lattice, leaving empty sites in its wake, until it consumes all the hosts in the ecosystem. This regime is known as annular growth. No finite ecosystem can sustain annular growth indefinitely. If the host growth rate is made nonzero, then hosts can recolonize sites left empty by the expanding consumer population, opening the possibility of host–consumer coexistence in an ecosystem of dynamically formed and re-formed patches. Figure 11 illustrates an example of this phenomenon. This is a specific example of the general phenomenon that, even far from phase transitions, nonequilibrium systems can display pattern formation [54].

We can, therefore, interpret the upper cutoff on consumer sustainability as a Malthusian catastrophe due ultimately to the limited number of available hosts [95]. In physics jargon, this cutoff is a finite-size effect. This is the key to understanding what happens when multiple types of consumer are present on the same lattice, and in particular the case we study in the next section, where an invasive consumer variety is introduced to an ecosystem where native hosts and consumers have already formed a dynamic patch distribution. The environment experienced by the invasive variety is that formed by the native species, and the “finite size” of the resources available to the invasive variety is not the size of the whole lattice, but that of a local patch [95].

10.2 Timescale Dependence of Invasion Success

Figure 14: (A) Survival probability as a function of time for five scenarios: injecting mutants with the same transmissibility as the native consumers, three examples of injecting mutants with transmissibility higher than the native $\tau$, and an example of injecting mutants with $\tau$ lower than the native population’s. (B) Time intervals during which the survival-probability curves for the native and invasive strains overlap. $\Delta\tau$ indicates the difference between the invasive and native transmissibilities. The closer the mutant trait value is to the resident’s, the greater the duration of time over which the survival-probability curves for the native and mutant strains overlap. Here, overlap is defined by probabilities being coincident at the 95% confidence level; using other overlap criteria gives qualitatively the same results. Inset: magnified view of the small-$\Delta\tau$ region.

A key question about an ecological system is whether a new variety of organism, having a different genetic character and phenotypic trait values, can successfully invade a native population. If a mutant consumer strain with fixed transmissibility $\tau'$ greater than $\tau$ could successfully invade a native population of transmissibility $\tau$, then we would expect the time-averaged transmissibility seen in the evolving system to be larger than $\tau$. To investigate this, we simulate scenarios where the native population has $\tau$ close to the average value seen in the evolutionary case described above. We then inject a mutant consumer strain with significantly larger $\tau'$ and study the results. For a typical example, we see from Figure 12(A) that, for the parameters used there, the average $\tau$ is approximately 0.33. So, we simulate mutants with $\tau' = 0.45$ entering an ecosystem whose native population has $\tau = 0.33$. Initially, the mutants prosper, but they ultimately fail to invade. As shown in Figure 14, the probability of a $\tau' = 0.45$ strain surviving for tens of generations after injection is larger than that of a $\tau = 0.33$ strain. That is, mutants with the higher transmissibility can out-compete the neutral case. However, after enough generations have passed, the survival-probability curves cross. Observed over longer timescales, the mutant strain is less successful than the native variety. This pattern is consistent across the mutant transmissibilities we tested: the average transmissibility seen in the evolutionary case stands up to invasive varieties. This key result manifests the distinctive properties of the spatial structure of the model. The underlying reason for this result is that the mutants encounter the resource limitations imposed by the patchy native population. Over short timescales, the mutant strain enjoys the resources available within the local patch, consuming those resources more rapidly than can be sustained once it encounters the limitations of the local patch size. In this way, the initial generations of the mutant strain “shade” their descendants. Thanks to descendant-shading, short-term prosperity is not a guarantee of medium- or long-term success.

This is to be contrasted with what happens in a well-mixed ecosystem. In the well-mixed scenario, consumer strains with higher $\tau$ successfully invade and displace the native population with high probability. The invasion success is consistent with the dynamics of a continuously evolving ecosystem: if $\tau$ is made an evolvable trait in simulated panmictic systems, the average $\tau$ of the population tends to 1, as the mean-field analysis predicts. There is no difference in a well-mixed scenario between short-term and long-term success. Descendant-shading does not occur in the well-mixed case. This follows from the lack of distinction between local patches and large-scale structure.

One common measure of evolutionary success is the expected relative growth rate of the number of offspring of a mutant individual within a native population, i.e., the relative growth rate of a mutant strain. This rate, known as the invasion fitness, is often used to investigate the stability of an evolutionary ecosystem [103, 104, 51]. If the invasion fitness is found to be positive, the native variety is judged to be vulnerable to invasion by the mutant. Conversely, if the invasion fitness is found to be negative, the native variety is deemed to be stable. For the spatial host–consumer ecosystem, this method gives qualitatively incorrect predictions for evolutionary dynamics.
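In one standard formulation (the notation here is ours, not the chapter’s), the invasion fitness of a rare mutant with trait $\tau'$ in a resident population with trait $\tau$ is the long-run exponential growth rate of the mutant lineage,
\[
s_{\tau}(\tau') \;=\; \lim_{t \to \infty} \frac{1}{t} \,\ln \mathbb{E}\!\left[\, n_{\tau'}(t) \,\right],
\]
with invasion predicted whenever $s_{\tau}(\tau') > 0$. The results above show why committing to a single such number is dangerous here: measured over tens of generations, the mutant growth rate looks positive, while over hundreds of generations it is negative, so the outcome depends on the timescale at which “growth rate” is operationalized.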

Our investigation builds on earlier work which studied the timescale dependence of fitness indicators in spatial host–consumer ecosystems [93, 94]. In this chapter we have augmented the prior work by considering the survival probability, to show the effects of varying $\tau$. We have also more systematically shown the number of generations until dominance of the evolutionarily stable strain. In addition, we reported the case of a mutant strain invading a background population, clarifying the conceptual and quantitative results of those earlier works, which considered instead scenarios complicated by multiple ongoing mutations.

10.3 Pair Approximations

The inadequacy of mean-field treatments of spatial systems motivates the development of more elaborate mathematical methods. In this section, we review one such methodology, based on augmenting mean-field approximations with successively higher-order correlations, and we test its applicability to our host–consumer spatial model. The numerical variables used in this methodology are probabilities which encode the state of the ecosystem and can change over time. One such variable is, for example, $P(a)$, the probability that a lattice site chosen at random contains an organism of type $a$. Another is $P(ab)$, the probability that a randomly-chosen pair of neighboring sites will have one member of type $a$ and the other of type $b$. The change of these quantities over time is usually described by differential equations, for which analysis tools from nonlinear dynamics are available [104, 76, 95, 51, 105, 106, 107].

The importance of the joint probabilities $P(ab)$ is that they reflect correlations which mean-field approximations neglect. To understand their relevance, consider a scenario where an invasive mutant variety forms a spatial cluster near its point of entry. Let $P(m)$ be the probability that a lattice site chosen at random contains a mutant-type organism, and let $P(mm)$ denote the probability that a pair of neighboring sites chosen at random will both be occupied by mutant-type organisms. Then the average density of invasive mutants in the ecosystem, $P(m)$, will be low, while the conditional probability that a neighbor of an invasive individual will also be of the invasive type, $P(m|m) = P(mm)/P(m)$, will be significantly higher. (It is typical in theoretical spatial ecology to denote conditional probabilities with $q_{a|b}$, rather than $P(a|b)$ [108].) A discrepancy between the conditional probability and the overall probability can persist when the ecosystem has settled into a quasi-steady-state behavior, and is then an indicator of spatial pattern formation.

Applying this idea to the spatial host–consumer model, let $P(c)$ be the probability that a lattice site chosen at random contains a consumer, and let $P(c|c)$ denote the conditional probability that a lattice site adjacent to a consumer will also be occupied by a consumer. Figure 15(A) shows $P(c)$ and $P(c|c)$ measured during the course of numerical simulations. In a well-mixed scenario (where we expect the mean-field approximation to be applicable), the average consumer density and the consumer–consumer pairwise correlation are essentially equal over time. In the spatial lattice scenario, $P(c)$ and $P(c|c)$ are noticeably different.
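Both quantities are easy to estimate from simulation snapshots. A sketch in Python, reusing the integer encoding assumed earlier (2 marks a consumer) and periodic boundaries:

import numpy as np

def pair_stats(lattice):
    """Estimate P(c) and P(c|c) on a periodic square lattice."""
    c = (lattice == 2)                 # True wherever a consumer lives
    n_c = c.sum()
    P_c = c.mean()
    if n_c == 0:
        return P_c, 0.0
    # unordered consumer-consumer neighbor pairs along the two lattice axes
    pairs = (c & np.roll(c, 1, axis=0)).sum() + (c & np.roll(c, 1, axis=1)).sum()
    # each unordered pair fills two of the 4 * n_c neighbor slots
    return P_c, 2.0 * pairs / (4.0 * n_c)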

Figure 15: (A) Pairwise conditional probability $P(c|c)$ plotted against the average density of consumers, $P(c)$, for three variations on the host–consumer model: a well-mixed case in which mean-field theory is applicable, a random regular graph (in which each site has exactly four neighbors) and a 2D square lattice. The dotted line, $P(c|c) = P(c)$, indicates the mean-field approximation. The same number of timesteps was computed for each case. The well-mixed case is simulated by dynamically rewiring sites at each time step, precluding the generation of spatial heterogeneity; consequently, the pairwise correlation $P(c|c)$ is, within statistical variation, equal to $P(c)$. A random regular graph (RRG) with random but static connections does develop spatial heterogeneity, so that $P(c|c)$ is not the same as $P(c)$. The discrepancy is even stronger in the lattice case. (B) Success rate of invasive mutant strains as a function of swapping probability. Voracious mutant strains with $\tau' = 0.45$ are introduced into a lattice ecosystem defined by a host growth rate $g$, a consumer death rate $v$ (the same for both consumer varieties), and a native consumer transmissibility $\tau = 0.33$. Average success rates are found by simulating 2000 invasions per value of the swapping probability parameter; error bars indicate 95%-confidence intervals. Increasing the fraction of possible swaps which are actually performed makes the voracious invasive strain more likely to take over the ecosystem.

Treating the pair correlations $P(ab)$ as quantities not wholly determined by the single-site probabilities $P(a)$ is a way of allowing spatial heterogeneity to enter an analytical model. Whether it is a sufficient extension in any particular circumstance is not, a priori, obvious. Typically, the differential equations for the pair probabilities $P(ab)$ depend on triplet probabilities $P(abc)$, which depend upon quadruplet probabilities $P(abcd)$, and so forth. The standard procedure is to truncate this hierarchy at some level, a technique known as moment closure [109, 103, 105, 110, 70]. Moment closures constitute a series of approximations of increasing intricacy [111, 112]. The simplest moment closure is the mean-field approximation; going beyond the mean field to include second-order correlations but neglecting correlations of third and higher order constitutes a pair approximation. These approximations do not incorporate all of the information about spatial structure which may be necessary to account for real-world ecological effects [103].
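For example, one commonly used pair closure estimates each triplet probability from pair and singlet probabilities,
\[
P(abc) \;\approx\; \frac{P(ab)\, P(bc)}{P(b)},
\]
which amounts to assuming that the two ends of a three-site chain are conditionally independent given the state of the middle site. Truncations of this kind are precisely what the organism-swapping test of the next section probes.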

10.4 Organism Swapping

Several factors have been identified which undermine pair approximations [103, 100, 95, 101, 72, 113, 106, 114, 70, 115]. In our model, we can directly test the efficacy of pair approximations in a completely general way. The key idea is to transplant individuals in such a way that the variables used in the moment-closure analytical treatment remain unchanged. At each timestep, we look through the ecosystem for isolated consumers, that is, for individual consumers surrounded only by a specified number of hosts and empty sites. We can exchange these individuals without affecting the pairwise correlations. For example, if we find a native-type consumer adjacent to three hosts and one empty site, we can swap it with an invasive-type consumer also adjacent to three hosts and one empty site. We can also exchange isolated pairs of consumers in the same way. The variables used in the moment-closure treatment remain the same. Were the moment-closure treatment valid, we would expect the dynamics to remain unchanged when we perform such exchanges.
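The following Python sketch implements one version of the swap test, under assumptions consistent with the earlier sketches: native and mutant consumers are encoded as 2 and 3, a consumer is isolated when no neighbor is a consumer, and two isolated consumers are exchangeable when their neighborhoods contain the same number of hosts (and hence also the same number of empty sites), so all pairwise probabilities are preserved.

import numpy as np

def swap_isolated(lattice, rng, p_swap=1.0):
    """Exchange isolated native (2) and mutant (3) consumers whose
    neighborhoods match, leaving every pair probability unchanged."""
    L = lattice.shape[0]
    profiles = {}   # number of adjacent hosts -> {strain: [positions]}
    for i in range(L):
        for j in range(L):
            if lattice[i, j] in (2, 3):
                nbrs = [lattice[(i + 1) % L, j], lattice[(i - 1) % L, j],
                        lattice[i, (j + 1) % L], lattice[i, (j - 1) % L]]
                if 2 in nbrs or 3 in nbrs:
                    continue                        # not isolated
                key = nbrs.count(1)                 # adjacent hosts
                profiles.setdefault(key, {2: [], 3: []})[int(lattice[i, j])].append((i, j))
    for groups in profiles.values():
        for a, b in zip(groups[2], groups[3]):      # matched native/mutant pairs
            if rng.random() < p_swap:
                lattice[a], lattice[b] = lattice[b], lattice[a]

Calling swap_isolated once per generation, with p_swap interpolating between no swapping and full swapping, reproduces the kind of experiment described below.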

When we perform the simulation, however, swapping strongly affects the dynamics. With this type of swapping in effect, mutants with higher $\tau$ can invade a native population with lower $\tau$. In one typical simulation run with a native $\tau$ of 0.33 and an invasive $\tau'$ of 0.45, the invasive strain succeeded in 1,425 of 10,000 injections. Without swapping, the number of successful invasions is zero.

Swapping can be considered as creating a new ecosystem model with the same moment-closure treatment as that of the original. The behavior of invasive strains is different, because transplanting organisms allows invasive varieties to evade localized Malthusian catastrophes. Swapping opens the ecosystem up to invasive strains, since, in essence, it removes individuals from the “scene of the crimes” committed by their ancestors.

This type of swapping is, to our knowledge, a new test of moment-closure validity. Randomized exchanges have been incorporated into computational ecology simulations for different purposes. For example, research on dispersal rates in an island model shuffled individuals in such a way that the population size of each island was held constant [116].

If, instead of performing every permissible swap, we transplant organisms with some probability between 0 and 1, we can interpolate between the limit of no swapping, where invasions always fail, and the case where the pair approximation is most applicable and invasions succeed significantly often. The results are shown in Figure 15(B): the impact of swapping becomes detectable at low swapping probabilities and effectively saturates well before every permissible swap is performed.

Our swapping method allows us to test the significance of complications which can undermine pair approximation techniques or make them impractical to apply, several of which have been identified. First, introducing mutation into a game-theoretic dynamical system can make pair approximation treatments of that system give inaccurate predictions [106, 114].

Second, when the evolving population has a network structure, the presence of short loops in the network often makes pair approximations fail [113]. For example, in a triangular lattice, one can take a walk of three steps and return to one’s starting point, whereas on a hexagonal lattice, the shortest closed circuit is six steps long. A pair approximation can work well for a dynamical system defined on the hexagonal lattice but fail when the same dynamics are played out on a triangular one. This happens because the short loops provide opportunities for contact which the coarse-graining necessary for a pair approximation will miss. (We will study this point in more detail in Chapter \thechapter.) This effect is amplified in adaptive network models, where the underlying network changes dynamically in response to the population living upon it. In such cases, even extending the moment closure to the triplet level brings little improvement [70].

Third, fluctuating population sizes make pair approximations significantly more cumbersome to construct, leading to systems of differential equations which are too intricate to be significantly illuminating. In a game-theoretic model where a lattice is completely filled at all times with cooperators and defectors, there is one independent population density variable and three types of pairs. By contrast, in an ecological model where two consumer varieties are competing within an adaptive network of hosts, a pair approximation requires nine independent variables [100, 95, 101]. Modeling phenomena of biological interest can easily increase the complexity still more. For example, if organism behavior changes in response to social signals [72], the number of possible states per site, and thus the number of dynamical variables in a pair approximation treatment, increases further.

Fourth, the pair-approximation philosophy of averaging over all pairs in the system impedes the incorporation of environmental heterogeneities. These include biologically crucial factors like variable organism mobility, background toxicity or other localized “costs of living,” and resource availability [115].

Finally, dynamical pattern formation creates spatial arrangements which the pair approximation does not describe [103]. This can be thought of as the ecosystem generating its own environmental heterogeneities.

10.5 Effect of Substrate Topology

It is instructive to compare the spatial lattice ecosystem with the host–consumer model defined on a random regular graph (RRG). In an RRG, each node has the same number of neighbors, as they do in a lattice network, but the connections are otherwise random. RRGs have been used as approximations to incorporate the effects of spatial extent into population models, as they make for more tractable mathematical treatments, although they are typically less realistic than spatial lattices [117]. The network structure is set at the beginning of a simulation and does not change over time. The important aspect of this network as compared to the spatial case is that there exist short paths of links that couple all nodes of the network. This is quite different from the spatial case, where strains in one part of the network cannot reach another in only a few generations due to the need to traverse large numbers of spatially local links.

When we simulate our host–consumer ecosystem on an RRG, we find that an invasive consumer strain with higher transmissibility can out-compete and overwhelm a native consumer population with lower $\tau$. In one typical simulation run, using native and invasive $\tau$ values of 0.33 and 0.45 respectively, 2,233 out of 10,000 invasions were successful, whereas on the lattice no invasion succeeded using the same parameters. Thus, the RRG does not capture the essential features of the spatial scenario. In particular, our results show that the RRG case is more like the well-mixed case than the spatial lattice, as far as stability against invasion is concerned.

Our swapping test provides insight into the utility of the pair approximation, which can be effective for the RRG even though it is not for the spatial case. Consider the pairwise correlation value $P(c|c)$, which would be a variable in a pair-approximation treatment. On an RRG, the underlying network topology provides enough locality that $P(c)$ and $P(c|c)$ are unequal, distinct from the well-mixed case, as shown in Figure 15(A). This means that the pair approximation is nontrivial for the RRG, as it incorporates the difference between $P(c)$ and $P(c|c)$, which would not be contained in a mean-field treatment. We can also implement swapping on the RRG, where invasions can succeed without it; as expected, swapping does not affect the success rate on the RRG. With 10,000 simulated invasions for each case, the 95%-confidence interval for the difference in success rates between full swapping and none is consistent with zero. Thus, the pair approximation may be successful in this network topology. However, this does not mean that the RRG or the pair approximation capture the full significance of a spatial system, because the RRG network does not embody an essential property of spatial extent: separation by potentially large distances.

10.6 Percolation

In order to obtain quantitatively or even qualitatively correct predictions for spatial host–consumer evolutionary dynamics, different approaches are needed. Having encountered the limitations of moment closures, we now demonstrate a change of perspective which yields quantitatively useful results. In certain situations, the process of pathogen propagation through a host population distributed in space can be mapped onto a percolation problem. A topic widely investigated in mathematics, percolation theory deals with movement through a matrix of randomly placed obstacles. A prototypical percolation problem is a fluid flowing downhill through a regular lattice of channels, with some of the lattice junction points blocked at random. The key parameter is the fraction of blocked junction points. If this fraction is larger than a certain threshold value, the fluid will be contained in a limited part of the system. However, if the blocking fraction is below the threshold, the fluid can percolate arbitrarily far from its starting point. This is a phase transition, a shift from one regime of behavior to another, in this case between a phase in which fluid flow can continue indefinitely and one in which flow always halts. Similar issues arise when a pathogen propagates by cross-infection through a set of spatially arranged hosts. Sufficiently many hosts in mutual contact are required for the pathogen to propagate successfully. Pathogen strains therefore survive or die out over time depending on whether percolation is or is not possible [99, 102, 118, 119, 52, 120].

One important goal of studying host–pathogen models is knowing the pathogen properties that enable its survival in a population, or equivalently what prevents it from persisting in a population. The growth rate of a pathogen in a population can be an important public health concern. We therefore focus on analyzing the minimum value of the transmissibility that enables a pathogen population to persist, and the growth dynamics of population sizes near that transition.

Of essential importance to the quantitative theoretical and empirical analysis is the recognition that infected-population growth can be described by a power law, $N(t) \sim t^{\theta}$, with an exponent $\theta$ that differs from its mean-field value. Identifying the value of this exponent is important to practical projections of the number of infected individuals. The initial growth curve of infected populations can be correctly extrapolated if the exponent is known, guiding public health responses. Knowing what impediments are needed to prevent further propagation can even better guide public health intervention strategies.
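Extracting the exponent from simulation output is a routine log-log fit; a sketch, assuming arrays of times and of infected-population sizes averaged over runs:

import numpy as np

def growth_exponent(t, N):
    """Fit N(t) ~ t**theta by least squares in log-log coordinates."""
    t, N = np.asarray(t, float), np.asarray(N, float)
    mask = (t > 0) & (N > 0)
    theta, _ = np.polyfit(np.log(t[mask]), np.log(N[mask]), 1)
    return theta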

Figure 16: (A) Population size as a function of time, averaged over simulation runs, for values of $\tau$ near the transition points in the two regimes on the spatial lattice. Dashed and solid lines indicate the population growth for systems at dynamic percolation and directed percolation transitions, respectively, showing that these transitions have the characteristic properties of those universality classes. (B) Critical $\tau$ for the host–consumer ecosystem as a function of the host growth rate. The transition line crosses over from the dynamic percolation universality class at $g = 0$ to directed percolation at larger $g$. Red Xs indicate the transition curve for the host–consumer dynamics on a random regular graph (RRG) of uniform degree 4; the dashed line connecting them is to guide the eye. The RRG transition is neither directed percolation nor dynamic percolation.

We show in Figure 16 the results of numerical simulations which indicate that the consumer extinction transition, where the transmissibility becomes just large enough that the consumer population sustains itself, lies in the directed percolation universality class [121, 74, 122, 123, 124]. A similar result has been reported for related models [99, 102], consistent with those models being in the same universality class. The directed percolation universality class is a large set of models, all of which exhibit a phase transition between two regimes of behavior, and all of which behave in essentially the same way near their respective transition points. The scenario of fluid flow through a random medium considered above is a classic example of a directed percolation-class model, but many others exist as well [121, 74]. The critical exponents describe how properties of the modeled system vary over time or as a function of how far the control parameter is from the critical point. They are the same for all systems in the universality class. Other universality classes exist as well, with different classes having different quantitative values for the critical exponents. Identifying the universality class a system belongs to enables us to study a complicated phenomenon by examining a simpler representative of its class instead. This is convenient, because regions of parameter space near phase transitions are precisely where mean-field and moment-closure approximations are least reliable, even for short-timescale modeling. Near the phase transition, stochastic fluctuations create dynamical patterns with a wide range of sizes. In Figure 16(A), we see that percolation theory gives quantitatively correct predictions for the growth of consumer population sizes in the spatial host–consumer model.

We can understand the $g = 0$ and $g = 1$ extremes by mapping the host–consumer model onto other stochastic models for which exact or approximate results are available. When $g = 0$, the host–consumer model maps onto the SIR epidemic process [74]. In turn, the SIR model on the square lattice can be understood in terms of bond percolation on the square lattice [125], for which the transition point is known exactly [126, 127]. We can therefore predict analytically that the critical $\tau$ on the square lattice is 0.5. Percolation theory also gives a prediction for the critical $\tau$ on an RRG: it should be approximately $1/3$ [128]. These both match the simulation results seen in Figure 16(B).
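The logic of the $g = 0$ mapping, in brief: each host can be consumed at most once, so transmission across each host–host contact succeeds or fails independently, like a bond being open with probability governed by $\tau$, and the consumer strain propagates indefinitely exactly when the open bonds percolate. With the known thresholds (writing $k$ for the degree, our notation),
\[
\tau_c^{\text{square}} \;=\; p_c^{\text{bond}} \;=\; \tfrac{1}{2},
\qquad
\tau_c^{\text{RRG}} \;\approx\; \frac{1}{k - 1} \;=\; \tfrac{1}{3} \quad (k = 4),
\]
in agreement with the simulations of Figure 16(B).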

In contrast, when $g = 1$, empty sites are filled as quickly as possible, so the behavior of the host–consumer model should resemble that of an epidemic model with only Susceptible and Infected sites. In this case, the transition point of the epidemic model on the square lattice is only known numerically [125]. That numerical value agrees with the critical $\tau$ found by simulating the host–consumer model at $g = 1$.

Thus, in the limiting cases of $g = 0$ and $g = 1$, the host–consumer model is roughly equivalent to the SIR and SIS epidemic models, respectively. However, a host–consumer model with intermediate $g$ has dynamical behavior distinct from an epidemic model which allows reinfection of Recovered sites. The key difference is that reoccupying an empty site with a consumer requires prior recolonization by a host, whereas the vulnerability of a R[ecovered] individual to becoming I[nfected] is defined as an intrinsic property of the R[ecovered] type. This changes the role of ecology: both models incorporate space, but the effect of spatial extent is different. This manifests as a change in the shape of the critical-threshold curve, as well as a change in universality class [125, 74].

Furthermore, when we use an RRG topology instead, comparing host–consumer dynamics at $g = 1$ with an SIS epidemic model reveals that their transitions take place at different thresholds. For the host–consumer model, the critical $\tau$ on an RRG is approximately 0.2615, while the SIS threshold is a noticeably different value [129, 130, 131].

The physics analysis of percolation behavior near the transition point maps directly onto the critical public-health problem of the growth of infected populations, and more generally onto the dynamics of evolutionary systems. For these systems, the mean-field treatment fails, and the standard picture of the transmission of infectious diseases in a population need not apply. Applications to real-world systems must accommodate the actual network of connectivity. This network can also be modified by intervention strategies.

10.7 Patch Size and Structure

Since it is the size of a host patch which determines the amount of resources available for consumers, we now investigate patch sizes in detail. One way to test whether the host patches have a characteristic size is to take snapshots of the dynamics in its quasi-steady-state regime (e.g., the fourth panel of Figure 11) and compute their autocorrelation. We can do this by running a snapshot through a filter that produces a binary matrix whose entries are 1 in locations occupied by hosts and 0 otherwise. Applying FFTs and the Wiener–Khinchin theorem then yields a 2D autocorrelation matrix, which by averaging we can collapse to a function of distance. If this autocorrelation function has a characteristic distance scale (for example, if it decays exponentially), then we can use that distance as the correlation length. We present examples confirming this in Figure 17.
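A sketch of this procedure in Python, assuming a periodic square lattice and the integer encoding used earlier (1 marks a host):

import numpy as np

def radial_autocorrelation(lattice):
    """Host-host autocorrelation as a function of distance, via the
    Wiener-Khinchin theorem (FFT of the snapshot, then inverse FFT of
    the power spectrum), collapsed by a radial average."""
    m = (lattice == 1).astype(float)
    m -= m.mean()                            # subtract the mean occupancy
    power = np.abs(np.fft.fft2(m)) ** 2      # power spectrum
    corr = np.fft.ifft2(power).real          # 2D autocorrelation matrix
    corr /= corr[0, 0]                       # normalize so C(0) = 1
    L = lattice.shape[0]
    ii, jj = np.indices((L, L))
    di = np.minimum(ii, L - ii)              # periodic displacement in rows
    dj = np.minimum(jj, L - jj)              # ... and in columns
    r = np.hypot(di, dj).astype(int).ravel() # binned radial distance
    return np.bincount(r, weights=corr.ravel()) / np.bincount(r)

Fitting the output to an exponential decay $C(r) \sim e^{-r/\xi}$ then yields the correlation length $\xi$.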

Figure 17: The decay of host–host autocorrelation as a function of distance, for different choices of $\tau$ (fixing $g$ and $v$). Note that the correlation curve for $\tau$ fixed at the evolved average essentially coincides with the curve found when $\tau$ is a mutable trait.

Next, Figure 18 summarizes the results of this procedure for host–host correlations as we vary $\tau$, holding the other parameters fixed. We see that the correlation length of the host distribution, which we can regard as the characteristic size of host patches, increases with $\tau$. By choosing a different filter, we can apply the same procedure to find the characteristic size of empty regions. The right panel of Figure 18 summarizes the results.

Figure 18: (Left) Correlation length of hosts for different values of $\tau$ (at fixed $g$ and $v$). The solid line is a three-parameter exponential fit found by least squares. (Right) Correlation length of empty space under the same conditions, with its own least-squares exponential fit.

We consider the following scenario: a native population of consumers, with transmissibility $\tau$, is dynamically forming and re-forming host patches. Into this ecosystem, a mutant consumer with transmissibility $\tau'$ is introduced. Call the characteristic size of host patches $\ell_h$ and the characteristic separation between them $\ell_s$. Both of these length scales will depend on $g$, $v$ and $\tau$. The length $\ell_h(\tau')$ is the typical size of host patches in which the mutant variety “expects” to live. If this length is too much larger than $\ell_h(\tau)$, the size of the patches which exist due to the native population, then the mutant variety is likely to suffer a Malthusian catastrophe.

When we find the correlation lengths via simulations, as seen in Figures 18 and 19, we see that both $\ell_h$ and $\ell_s$ increase with $\tau$. Assuming that the deleterious effect of “expecting” larger host patches grows with the discrepancy between “expected” and actual patch sizes, the distribution of $\tau$ in the consumer population will accumulate in the region where the $\ell_h(\tau)$ curve starts to take off.

Figure 19: (Top) Host–host correlation length for different values of $\tau$ (with $g$ and $v$ fixed). The solid line is an exponential curve of the form $Ae^{B\tau} + C$, with the parameters found by least-squares fitting. (Bottom) Void–void correlation length under the same conditions, again with a least-squares exponential fit.

The missing piece is an analytical expression for $\ell_{\text{patch}}$ and $\ell_{\text{sep}}$ in terms of the parameters $\tau$, $g$ and $v$. What is the functional form of

$\ell_{\text{patch}} = f(\tau, g, v), \quad \ell_{\text{sep}} = \tilde{f}(\tau, g, v)?$ (44)

Figure 18 suggests that both correlation lengths depend roughly exponentially on $\tau$, but the offsets and growth rates will depend on $g$ and $v$.
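A least-squares fit of this kind is straightforward to set up; the sketch below uses hypothetical $(\tau, \ell)$ pairs, since the measured values live in the figures, and the three-parameter form $Ae^{B\tau} + C$ matches the offsets-and-growth-rates description above.

    import numpy as np
    from scipy.optimize import curve_fit

    # hypothetical measurements of correlation length versus tau
    tau = np.array([0.2, 0.3, 0.4, 0.5, 0.6])
    ell = np.array([1.1, 1.4, 1.9, 2.8, 4.2])

    def exp_model(t, A, B, C):
        return A * np.exp(B * t) + C

    (A, B, C), _ = curve_fit(exp_model, tau, ell, p0=(1.0, 1.0, 0.0))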

One way in which we might try to estimate the length scale $\ell_{\text{patch}}$, and see, at least qualitatively, how it depends on the system parameters, is to use a moment closure. This method turns out not to work, and we now investigate why. Consider a square lattice of size $L \times L$, and assume for simplicity that it contains some number $N$ of host patches, all of a comparable size which we take to be the characteristic length $\ell$. The population density of hosts is

$\rho_H = \frac{N \ell^2}{L^2}.$ (45)

The quantity $q_{H|H}$ used in a pair approximation denotes the probability that a randomly chosen neighbor site of a randomly chosen host will also contain a host. If we imagine a patch of hosts in an otherwise empty lattice (for convenience, suppose the patch is roughly circular), then sites in the interior of the patch have all their neighboring lattice sites occupied by hosts, while sites on the edge have some empty space adjacent. Only the sites on the perimeter can be responsible for decreasing $q_{H|H}$ below unity. Writing $P$ for perimeter and $A$ for area, we have

$q_{H|H} \approx 1 - c\,\frac{P}{A},$ (46)

where $c$ is a constant we can take to be roughly one-half. If, for simplicity, we assume ordinary Euclidean scaling of perimeters and areas,

$P \propto \sqrt{A} \propto \ell,$ (47)

then we have that

$q_{H|H} \approx 1 - \frac{c'}{\ell}.$ (48)

Here $c'$ absorbs the constants of proportionality. We have two quantities, $\rho_H$ and $q_{H|H}$, which we can compute by numerically integrating a set of coupled differential equations. And we have two unknowns: the patch size $\ell$ and the number of patches $N$. We solve for $\ell$, obtaining

$\ell \approx \frac{c'}{1 - q_{H|H}}.$ (49)

As the conditional probability $q_{H|H}$ approaches unity, the patch size increases.
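For example, if the numerical integration were to yield $q_{H|H} = 0.9$, then taking $c' \approx 1/2$ in Eq. (49) would give $\ell \approx 0.5/0.1 = 5$ lattice units.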

Unfortunately, when we compute $\rho_H$ and $q_{H|H}$ by numerically iterating the dynamical equations (see [100, 95, 101]), we find that the inferred value of $\ell$ decreases with $\tau$. The direction of the change, as predicted by pair approximation, is incorrect! This is a sign that the pair approximation is losing important information about larger-scale structures.

Another possible way to predict how $\ell_{\text{patch}}$ and $\ell_{\text{sep}}$ depend on $\tau$, $g$ and $v$ comes from the theory of spatial stochastic processes. For some models, we can derive an analytical expression for the power spectrum of fluctuations, as a function of both spatial and temporal oscillation frequencies. A peak in this power spectrum indicates a characteristic length or time scale [132, 133, 134]. Because the host–consumer lattice model differs in a few significant ways from those treated by these methods before, we defer a detailed exploration of this idea until a later chapter. The upshot appears to be that the characteristic length scale increases in the proper direction, that is, with increasing $\tau$, although, thanks to the differences hinted at, the concavity may not be correct.
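Although we defer the analytical treatment, locating such a spectral peak numerically is straightforward. A minimal sketch, assuming a square binary snapshot with hosts coded as 1 (the radial averaging mirrors the autocorrelation computation above):

    import numpy as np

    def spectral_length_scale(lattice):
        """Radially average the spatial power spectrum of a binary
        snapshot and return the wavelength of its peak (assumes a
        square lattice with periodic boundaries)."""
        occ = (lattice == 1).astype(float)
        occ -= occ.mean()
        power = np.abs(np.fft.fft2(occ))**2
        n, m = power.shape
        ky = np.minimum(np.arange(n), n - np.arange(n))
        kx = np.minimum(np.arange(m), m - np.arange(m))
        k = np.hypot(ky[:, None], kx[None, :]).astype(int).ravel()
        sums = np.bincount(k, weights=power.ravel())
        counts = np.maximum(np.bincount(k), 1)
        spectrum = sums / counts
        k_peak = spectrum[1:].argmax() + 1   # skip the zero mode
        return n / k_peak                    # wavelength, in lattice units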

Figure 20: Typical frames from host–consumer simulations run with four different values of the transmissibility $\tau$ (the other parameters, $g$ and $v$, are held fixed).

At this point, we might be concerned that computing a correlation length in this fashion could be smearing over too much. If there are many small patches and a few large ones (and Figure 20 makes this look plausible), then does reducing the pattern to a single correlation length lose any important information? To cross-check the idea that host patch size controls the evolved transmissibility, we therefore measure patch size in another way.

The most straightforward way to assign a size to a patch is to count the number of lattice sites it contains. This has the advantage that, while we are computing it, we can also count the number of sites in the patch which border on empty space. So, we can see how the perimeters of host patches relate to their areas. If host patches were Euclidean circles, then the perimeter would scale as the area to the one-half power. On the other hand, a patch which consists of a single site in a discrete lattice is all boundary: its perimeter and its area are equal. Our patches live on a square lattice, so a cluster of hosts must contain at least five individuals for its area to be greater than its perimeter. The more hosts are contained within the cluster, the more likely the cluster is to have an interior. We therefore expect a crossover: below some value of the area, the maximum observed perimeter will be equal to the area; above the crossover point, the maximum observed perimeter will grow more slowly. Because the patch dynamics are stochastic and the patch edges irregular, we hypothesize that this growth will still be faster than the square-root-of-area scaling obtained for Euclidean circles. A sketch of the measurement follows.
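In practice, this measurement reduces to labeling connected clusters and counting boundary sites. A minimal sketch, assuming hosts are coded as 1 in a NumPy array and using non-periodic edges for simplicity:

    import numpy as np
    from scipy import ndimage

    def patch_areas_and_perimeters(lattice):
        """Label 4-connected host clusters; return each cluster's area
        (site count) and perimeter (sites adjacent to empty space)."""
        occ = (lattice == 1)
        labels, n = ndimage.label(occ)
        padded = np.pad(occ, 1)  # pads with False: non-periodic edges
        nbr_empty = (~padded[:-2, 1:-1] | ~padded[2:, 1:-1] |
                     ~padded[1:-1, :-2] | ~padded[1:-1, 2:])
        boundary = occ & nbr_empty
        index = np.arange(1, n + 1)
        areas = ndimage.sum(occ, labels, index)
        perims = ndimage.sum(boundary, labels, index)
        return areas, perims

The 99th-percentile summary statistic used below is then simply `np.percentile(areas, 99)`.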

Figure 21 bears this out. When we plot the perimeters of host clusters against their areas, we see a crossover in the maximum observed perimeter (for fixed $g$, $v$ and $\tau$). Furthermore, the increase of perimeter with area above the crossover point is faster than the square root of the area.

Pascual et al. [135] study the cluster-size distribution for prey in a predator–prey model which is similar to our host–consumer ecosystem. However, they do not find a crossover: instead, cluster perimeter scales smoothly and just barely sublinearly with cluster area, across the whole range of observed areas. This may be due to an extra mixing effect which their model includes and ours does not, an effect which tends to bring more of a cluster to its perimeter.

Figure 21: Perimeters of host patches, plotted against their areas (for fixed $g$, $v$ and $\tau$). The upper (red) straight line indicates linear growth, and the lower (green) line indicates square-root growth. Note the crossover: below it, the maximum observed perimeter toes the line of linear growth, indicating that host patches exist which have no interior.

Having measured the host patch sizes, we can investigate the size distribution. Figure 22 shows the numbers of patches observed at different sizes. The fall-off of frequency with area is faster than an inverse-area relationship, though not uniform. At first blush, an appropriate characterization would be a power-law decay with an exponential cutoff. Because this distribution is fairly broad, we use a percentile to characterize it: we find the value of the area such that 99% of the host patches are that size or smaller. This is indicated by the vertical dashed line in Figure 22.

We see the results of repeating this calculation across a range of $\tau$ values in Figure 23. The minimum of the 99th-percentile curve picks out the value of the transmissibility which evolves when $\tau$ is a mutable trait!

We can understand this, heuristically, using much the same argument we made above. A consumer strain which “expects” that large host patches are available will fare poorly if they are not. When $\tau$ is evolvable, the distribution of the consumer population will tend to concentrate around the minimum of the 99th-percentile curve. In the short term, local subpopulations with higher $\tau$ can blossom, causing occasional upward shifts in the average $\tau$. We expect, therefore, that the average $\tau$ in the evolving population will lie somewhat to the right of the curve’s minimum point.

The hypothesis that these curves can be well approximated by exponentially truncated power laws can be tested by standard curve-fitting techniques [136]. Applying methods suitable to power-law analysis reveals that for all $\tau$ values, the number of patches decays with roughly the 1.5 power of the patch area. The location of the cutoff determined by curve-fitting follows, unsurprisingly, the relationship seen with the 99th-percentile areas (Figure 23). One should take care, however, when applying these statistical methods and interpreting the results, as they presume an independence among data points which may or may not hold here. If it is the case, for example, that large patches are typically accompanied by smaller patches nearby, then the assumption of statistical independence would be invalid.
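As an illustration of the cruder least-squares route (the maximum-likelihood machinery of [136] is preferable for serious work), the following sketch fits a truncated power law to binned counts; the data here are synthetic, generated purely for the example.

    import numpy as np
    from scipy.optimize import curve_fit

    def truncated_power_law(a, C, alpha, a_c):
        return C * a**(-alpha) * np.exp(-a / a_c)

    rng = np.random.default_rng(1)
    areas = np.arange(1.0, 200.0)  # bin centers
    counts = rng.poisson(1e3 * truncated_power_law(areas, 1.0, 1.5, 60.0))

    mask = counts > 0
    (C, alpha, a_c), _ = curve_fit(truncated_power_law, areas[mask],
                                   counts[mask], p0=(1e3, 1.0, 100.0))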

The qualitative shape of the area/abundance curve, that is, the power-law dependence with an exponential cutoff, is also seen in the results of coagulation and fragmentation processes [137, 138]. Furthermore, such distributions are known to be maximum entropy distributions: they arise when one maximizes the Shannon entropy (which we will discuss in Chapter \thechapter), subject to a certain type of constraint [139]. This suggests that the general functional form is rather robust.

It is a good idea to try to understand the shape of the 99th-percentile curves (Figure 23) heuristically, since the minima of those curves indicate the values to which $\tau$ will evolve. Suppose, for concreteness, that the abundance as a function of area is a power-law decay with an exponential cutoff. Why should the cutoff decrease and then increase? Can we think of this in terms of countervailing influences?

First, we consider the low-$\tau$ regime. When $\tau$ is small, the cutoff should be large, because the system is near a critical point, and critical points mean power laws. The closer we move towards criticality, the better the quality of the power law, and the less noticeable any cutoff will be. Therefore, as we increase $\tau$, we move away from criticality, and the deviation from a clean power law becomes more severe. This qualitatively explains the decreasing part of the curve in Figure 23.

What about when $\tau$ is large, say, larger than the value to which it would evolve when mutation is present? Here, the theory of critical points is no longer pertinent, and we find guidance instead in the study of coagulation and fragmentation processes. Numerical simulations indicate that for each value of $g$ and $v$, there is a region along the $\tau$ axis where the total host population size does not strongly depend upon $\tau$ (see Figure 24). That is, when $\tau$ is large, we can adjust it and the host population size will not change too much in response. This suggests that we can construct a simplified model in which the total host population is a parameter that we can vary independently of the consumer transmissibility.

Gueron and Levin develop a model of group-size dynamics wherein a population of fixed total size is divided into groups that can merge and split stochastically [137]. The rates of merging and splitting are taken to be functions of the group sizes. If $x$ and $y$ are the sizes of two groups, then the probability per unit time that those groups will merge is $\psi(x, y)$. Similarly, the probability per unit time that a group of size $x + y$ will fission into fragments of sizes $x$ and $y$ is $\phi(x, y)$. A convenient choice of functional form that allows some analytical solutions is to take

$\psi(x, y) = k\, p(x)\, p(y),$ (50)

where $p(\cdot)$ is an increasing function of its argument and $k$ is a rate parameter. (For example, the probability of merging might increase with the surface area of each group, implying a power-law dependence on the group population.) The splitting rate is taken to be

$\phi(x, y) = q\, p(x + y).$ (51)

Here $q$ is another rate constant. This stochastic process has a stationary solution: the distribution $f(x)$ of patch sizes follows the truncated decay

$f(x) = \frac{q}{k}\,\frac{e^{-bx}}{p(x)}.$ (52)

The cutoff parameter $b$ is fixed by the total population size $\rho$:

$\rho = \int_0^\infty x\, f(x)\, dx = \frac{q}{k} \int_0^\infty \frac{x\, e^{-bx}}{p(x)}\, dx.$ (53)
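Given a choice of $p(x)$, Eq. (53) fixes $b$ implicitly, and solving for it numerically is simple. A sketch, in which $p(x) = \sqrt{x}$, the density $\rho$ and the ratio $q/k$ are all chosen arbitrarily for illustration:

    import numpy as np
    from scipy.integrate import quad
    from scipy.optimize import brentq

    p = np.sqrt              # illustrative choice of merging kernel
    rho, q_over_k = 0.3, 0.05

    def total_population(b):
        # right-hand side of Eq. (53) for a given cutoff parameter b
        val, _ = quad(lambda x: x * np.exp(-b * x) / p(x), 0.0, np.inf)
        return q_over_k * val

    # total_population(b) decreases monotonically with b: bracket the root
    b = brentq(lambda s: total_population(s) - rho, 1e-3, 50.0)
    cutoff = 1.0 / b         # characteristic patch-size cutoff

Raising `q_over_k` (more splitting) pushes the root $b$ up and the cutoff in, matching the argument below.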

In our host–consumer ecosystem, host patches split apart because they are eaten into by consumers. They can merge together on their own, but their fission requires consumption. Very crudely speaking, the splitting rate should increase with the consumer population density. (In a slightly more refined approximation, we could say that the splitting rate should increase with the probability $q_{C|H}$ that a host has a consumer adjacent.) This corresponds, in the Gueron–Levin model, to increasing $q$. If we imagine that $\rho$ and $k$ are fixed, increasing $q$ must be balanced by decreasing the value of the integral. Likewise, lower consumer density corresponds to lower $q$ and thus a larger value for the integral. The only way to change the value of the integral, if the function $p$ is a given, is to alter the cutoff $1/b$. To obtain a larger value, we move the cutoff farther out.

We have, therefore, that if the host density is constant with respect to $\tau$ but the consumer density falls, then the cutoff should be larger. When we measure the densities by numerical simulation, these are the trends we find at larger $\tau$, and so the upswing of the 99th-percentile curve fits neatly into our picture.

Figures 24 and 25 together indicate that while pair approximation itself fails to capture the eco-evolutionary dynamics of the spatial system, it can provide a qualitatively useful guide when combined with a coagulation/fragmentation model and the principle that host patch size controls consumer transmissibility by way of localized Malthusian catastrophes.

The distribution of areas (Figure 22) and the perimeter-area relationship (Figure 21) together provide an approximation to the complexity profile of the host-patch system, as we defined it back in Chapter \thechapter. Crudely speaking, the interesting part of a host patch is its boundary: in order to say what the patch might do next, we need to know what is happening at its edges. Therefore, the effective information content of a host patch should scale, roughly, with its perimeter. If we neglect the influences of one patch upon another, we can approximate the host population as a collection of blocks, and we can invoke the sum rule stated in §1.2. Each block contributes a rectangle to the complexity profile. The width of the rectangle is the number of hosts in the patch, and the height is the information content, which is given by the perimeter. Alternatively, by interchanging the axes, we can construct an MUI curve in the same way. In either case, the structure index so defined will only be an approximation.
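Under these assumptions, the block approximation to the complexity profile is a one-line sum; the sketch below takes the area and perimeter lists measured earlier, treats each patch as an independent block, and so inherits all the caveats just stated.

    import numpy as np

    def block_complexity_profile(areas, perims, scales):
        """Sum-rule approximation: at scale s, total the information
        content (perimeter) of every patch containing at least s hosts."""
        areas = np.asarray(areas)
        perims = np.asarray(perims)
        return np.array([perims[areas >= s].sum() for s in scales])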

Figure 22: Number of host patches observed as a function of patch area (for fixed $\tau$, $g$ and $v$). The vertical dashed line indicates the 99th percentile, that is, the point at which 99% of the patches are that size or smaller. The sloping dashed line indicates a decay with the inverse of area, for comparison purposes.
Figure 23: 99th percentile of host patch area versus consumer transmissibility $\tau$, for two values of the consumer death rate $v$ (computed with $g$ fixed). The dashed vertical lines indicate the average transmissibilities which evolve when the respective values of $v$ are fixed.
Figure 24: Host and consumer densities as a function of $\tau$, with $g$ and $v$ fixed. Note that the agreement between the simulation results and the pair approximation is better for the host density than for the consumers. The pair approximation for this ecosystem was developed by de Aguiar et al. [100, 95, 101] and will be discussed in more detail in Chapter \thechapter.
Figure 25: Estimates of the evolved transmissibility as a function of the consumer death rate $v$ (with $g$ fixed). These values were obtained by combining the results of pair approximation with the insights from the Gueron–Levin coagulation and fragmentation model and the idea that the host-patch-size cutoff controls the evolution of $\tau$. By searching in the pair approximation for the region where the host density is roughly constant in $\tau$ while the consumer density falls, we identified the part of the $\tau$ axis where patch splitting is suppressed and the cutoff elevated. The $\tau$ value so identified, taken from curves computed at two reference values of $v$, provides a reasonable estimate for the evolved transmissibility obtained at other values of $v$.

11 Discussion

But the problem, here, is that it’s a form of adaptation that hasn’t been studied enough in animals and plants, which is that each change in the species changes what we call the environment, so there is a co-evolution of organism and environment. […] The organism by its evolution changes the conditions of its life and changes what surrounds it. Organisms are always creating their own hole in the world, their own niche.
—Richard Lewontin [140]

Understanding the effects of spatial extent is a vital part of evolutionary ecology. Spatial extent changes the quantitative and qualitative characteristics of a model’s evolutionary behavior, compared to well-mixed models. The short-term success rate of novel genetic varieties is not indicative of their long-term chance of success relative to the prevalent type. Standard stability criteria fail to reflect the actual stability achieved over time. We must instead consider extended timescales, which are set by the spatial patterns whose ongoing formation is an intrinsic part of nonequilibrium evolutionary dynamics. Our analysis provides a clear understanding of why there are dramatic differences between spatial models and mean-field models, which simplify away heterogeneity by mixing populations, averaging over variations or mandating a globally connected patch structure. We have further shown that transplanting organisms dramatically changes the dynamics of spatial systems, even when we preserve local correlations of the kind a pair-approximation treatment would consider. Our results demonstrate that any model which strives to capture the effects of heterogeneity, but whose behavior does not change under organism transplanting, cannot fully capture the dynamics of spatial evolution. The following subsections summarize the general conclusions we draw from these results.

11.1 Defining Fitness

In our host–consumer models, each individual either survives or it does not, and any individual has a specific number of offspring and survives for a certain amount of time; that is to say, an “individual fitness” (in the terminology of [141]) is a well-defined concept. To find an expected or average individual fitness, however, we must define a set of individual organisms over which to take the average, and choosing such a set is exactly what we have established to be problematic. Consequently, derived notions of fitness, which depend on comparisons between such averages [141], become elusive, context-dependent quantities. The problem is both temporal and spatial: average relative fitness in one generation is not necessarily a good measure of the long-term success of a strain in one, or a combination of, the broad variety of dynamically generated niches. This problem is not, however, the same as the traditional concept of variation of fitness across a static set of niches, because the niche dynamics of our spatially explicit model ensure that evolutionary outcomes are not reflected in any standard definition of the average.

In the previous chapter, we examined the idea of “frequency-dependent fitness” [45, 44], and one might be inclined to apply that term to voracious invasive strains in this system, as the invasive strain is successful initially, when rare, but fails when it becomes more common. The term “frequency-dependent fitness” is, however, a misnomer in this context, because the organism type is rare and successful when it is newly introduced, but rare and unsuccessful as it declines to extinction. Nor can we attribute the decline to the frequency of hosts: the average population density of hosts remains essentially unchanged, because the boom and the following bust are localized. Frequency, being defined by an average over the whole ecosystem, is only a proper variable for describing the ecosystem in the panmictic case. One might attempt to refine the concept of global frequency by including local frequencies. However, the breakdown of moment-closure techniques implies that defining fitness as a function of organism type together with average local environment [142] will, in many circumstances, not be an adequate solution.

Consequently, we find that trying to assign a meaningful invasion fitness value to an invasive variety of organism is too drastic a simplification. In turn, this implies that we cannot assign a fitness value to a phenotypic or genetic characteristic such as infectiousness or transmissibility. To understand this point, we rephrase the spatial host–consumer model in terms of alleles. In an invasion scenario, an individual consumer can have one of two possible alleles of the “transmissibility gene”, one coding for the native value of $\tau$ and the other for the invasive value. A mean-field treatment would then involve specifying the fraction of the population which carries the native allele versus the fraction which carries the invasive variant. We have seen, however, that the predictions based on such a heavily coarse-grained caricature of the original model deviate from its actual behavior. In short, the evolutionary dynamics cannot be characterized using the allele frequencies at a particular time.

If we can no longer summarize the genetic character of a population by an allele frequency, or by a set of allele frequencies for well-defined local subpopulations, then computing the fitness of a genotype from its generation-to-generation change in frequency is a fruitless task. In a world which exhibits nonequilibrium spatial pattern formation, allele frequencies are the wrong attribute for understanding the dynamics of natural selection. Formally, the conventional assumption that the allele frequencies are a sufficient set of variables to describe evolutionary dynamics is incorrect. The spatial structure itself is a necessary part of the system description at any given time if we are to determine outcomes in subsequent generations, even in an average sense.

The timescale-dependence issues which arise in spatial host–consumer ecosystems exist in a wider context. Multiple examples indicate that initial success and eventual fixation are only two extremes of a continuum which must be understood in its entirety to grasp the stability of a system. In the study of genetic drift, it has been found that neutral mutations can fixate and beneficial mutations fail to fixate due to stochasticity [44]. Likewise, in the study of clonal interference [143, 144], one beneficial mutation can out-compete another and prevent its fixation. Closer to the theme of this chapter, recent work has also emphasized that selection acting at multiple timescales is important for the evolution of multicellularity [145, 146].

Furthermore, classical genetics makes much use of the Price equation for studying the change in a population’s genetic composition over time [46, 47], and it is well known that analytic models built using the Price equation lack “dynamic sufficiency”. That is, the equation requires more information about the current generation than it produces about the next [147, 45, 47, 148, 33], and so predictions for many-generation phenomena must be made carefully, if they can be made at all. Modeling approaches which are fundamentally grounded in the Price equation, such as “neighbor-modulated” fitness calculations [149, 46, 59, 89, 150, 151] and their “multilevel” counterparts [149, 46, 152, 55, 153, 154, 155], are not likely to work well here, as the analyses in question draw conclusions only from the short-timescale regime. In addition, those particular analyses which address host–consumer-like dynamics either rely on moment closures [89] or they assume a fixed, complete connection topology of local populations which are internally well-mixed [59, 152]. These simplified population structures are quite unlike the dynamical patch formation seen in the host–consumer lattice model. (Wild and Taylor [156] demonstrate an equivalence between stability criteria defined via immediate gains, or “reproductive fitness”, and criteria defined using fixation probability; however, their proofs are explicitly formulated for the case of a well-mixed population of constant size, neither assumption being applicable here. Whether fixation probability is equivalent to any other criterion of evolutionary success generally depends on mutation rates, even in panmixia [33].)

We will discuss invasion fitness and moment-closure calculations in more mathematical detail in Chapter \thechapter. The Price equation and the ideas which cluster around it will be our subject in Chapter \thechapter.

In the adaptive dynamics literature, models have been studied in which “the resident strikes back” [157, 158, 159]. That is, an initially rare mutant variety $B$ can invade a resident population of type $A$, but does not supplant $A$ and become the new resident variety, even though a population full of type $B$ is robust against incursions by type $A$. This is often considered a rare occurrence, requiring special conditions to obtain [158, 159], though the theorems proved to that effect apply to nonspatial models, and in adaptive dynamics, it is standard to consider small differences between mutant and resident trait values. The spatial host–consumer ecosystem has the important property that, if mutation is an ongoing process, the spatial extent allows genetic diversity to grow. We initialize the system with all the consumers having the same trait value, but soon enough, different local subpopulations have different trait values. If the effects of single mutations are small, then the different varieties arising have roughly comparable survival probabilities, and so the distribution of extant trait values can spread out. However, the cumulative effect of many mutations which happen to act in the same direction on a trait such as transmissibility creates a variety which may engender its own local Malthusian catastrophe. So, the results of rare, big mutations tell us about the spread of trait values we see in the case of frequent, small mutations.

Chapter \thechapter will treat adaptive dynamics in greater depth, exploring the simplifications which its approximations allow. In essence, adaptive dynamics illustrates the maxim that when one can apply a Taylor expansion, one can simplify. One finds that reductions in complexity depend upon assumptions of smoothness that are remarkably convenient, but are not always applicable.

In our model, transmissibility and consumer death rate are independently adjustable parameters. One can also build a model in which one of these quantities is tied to the other, for example by imposing a tradeoff between the transmissibility and the virulence of a disease. Different functional forms of such a relationship are appropriate for modeling different ecosystems: host/pathogen, prey/predator, sexual/parthenogenetic and so forth. As long as spatial pattern formation occurs and organism type impacts the environment of descendants via ecosystem engineering, the shortcomings of mean-field theory are relevant, as are the limitations of pair approximations [90].

11.2 Pair Approximations and Stability

Pair approximations have been used to test for the existence of an Evolutionarily Stable Strategy (ESS) in a system, that is, a strategy which, when established, cannot be successfully replaced by another [113]. In addition to the limitations of pair approximation for representing patch structure [104], which we saw in the previous section, the question of whether a mutant strain can initially grow is distinct from the question of whether that strain achieves fixation or goes extinct [160, 161, 162, 81, 163, 143, 164, 92, 115]. The former is a question about short-term behavior; the latter concerns effects apparent at longer timescales. This distinction is often lost or obscured in analytical treatments, because one typically tests whether a new type can invade by linearizing the corresponding differential equations at a point where its density is negligible. However, this only reveals the initial growth rate (see the fixed-point eigenvalue analysis in [103, 104, 95, 51]).

Our analysis implies that pair approximations are inadequate for analyzing systems with spatial inhomogeneity. Even including triple and other higher-order corrections does not suffice, as this series approximation is poorly behaved at phase transitions [111]. Such higher-order terms continue to reflect only the local structure of the system, not the existence of well-separated areas that diverge in their genetic composition. Nonequilibrium pattern formation will necessarily also be poorly described, at least until the order of the expansion reaches the characteristic number of elements in a patch, or an area that encompasses any relevant heterogeneity. Given the algebraic intricacy of higher-order corrections to pair approximations [95, 117, 165], it is useful to know in advance whether such elaborations have a chance at success. Approximation techniques based on successively refining mean-field treatments are blind to important phenomena, so we need to build our analytical work on a different conceptual foundation.

11.3 Percolation

The mathematical connection between pathogen–host and percolation problems can provide insight into the difficulties in analytical treatment of the biological problem. Spatial heterogeneity causes traditional analytic treatments of percolation to fail and creates a need for new methodologies. Since the pathogen problem maps onto the percolation problem under some circumstances, the same analytic difficulties must arise in the biological context. While the presence of a nonequilibrium transition point indicates where traditional analysis techniques fail, it also raises the possibility that new tools from the theory of phase transitions [121, 166, 74, 111] will become applicable. For example, in Section 10.6, we saw that percolation theory enables us to make quantitatively accurate predictions of population growth and of the critical parameter values which divide one ecological regime from another. Indeed, specific important problems in public health, such as the growth in the number of individuals infected in a pandemic, can be considered directly within the context of percolation. Simulations of propagation on approximations of real-world networks may help provide accurate predictions, but the general properties of disease propagation can be understood analytically.

Later, in Chapter \thechapter, we will see that percolation models are also helpful in evolutionary game theory, where they yield quantitative results near phase transitions, as they do for host–consumer models. Chapter \thechapter will develop the mathematics necessary to understand percolation phase transitions. Pursuing this topic in depth will take us into the subject of statistical field theory.

11.4 Adaptive Networks

Our results also have significance in the context of adaptive-network research. This field studies systems in which a network’s wiring pattern and the states of its nodes change in interrelated ways. Prior modeling efforts have considered epidemics on adaptive networks, where the spread of the disease through the network changes the connections of the network [167, 110, 168, 169, 170, 171, 172, 173]. In such models, if a susceptible node has an infected neighbor, it can break that connection by rewiring to another susceptible node. A key point in the analysis is that the new neighbor is chosen at random from the eligible population. This choice of rewiring scheme is exactly what makes a pair approximation work for that epidemic model, because it eliminates higher-order correlations in the system [110]. (Chapter \thechapter will develop in detail how this method of rewiring allows us to write differential equations for the model.) In our system, by contrast, hosts can form new connections by reproducing into empty sites, but these contacts can only connect geographically proximate individuals.

The difference we have seen between lattice behavior on the one hand and RRG or swapping-enabled behavior on the other emphasizes the need to study the effect of spatial proximity on link rewiring. While the structure-erasing nature of unconstrained rewiring among susceptible hosts has been acknowledged [170, 171], new rewiring rules which reflect spatial and community structure have yet to be systematically investigated, no doubt because they would require different analytic approaches. “Myopic” rewiring rules, such as restricting the set of eligible new partners to the neighbors of a node’s current partners, have occasionally been considered, but in contexts other than epidemiology, such as evolutionary game theory [174, 175]; this makes exploring such rules in epidemic models all the more worth pursuing. A sketch of the two rewiring schemes follows.
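To make the contrast concrete, here is a schematic sketch of the two rewiring rules. In it, `neighbors` maps each node to the set of its current partners and `susceptible` is the set of currently susceptible nodes; both function names are illustrative rather than drawn from the literature.

    import random

    def rewire_random(neighbors, s, i, susceptible):
        """Unconstrained rewiring: susceptible node s drops its link to
        infected neighbor i and connects to a random susceptible node."""
        neighbors[s].discard(i)
        neighbors[i].discard(s)
        pool = [x for x in susceptible if x != s and x not in neighbors[s]]
        if pool:
            new = random.choice(pool)
            neighbors[s].add(new)
            neighbors[new].add(s)

    def rewire_myopic(neighbors, s, i, susceptible):
        """Myopic rewiring: the new partner must be a neighbor of one of
        s's current partners, preserving spatial/community locality."""
        neighbors[s].discard(i)
        neighbors[i].discard(s)
        two_hop = (set().union(*(neighbors[x] for x in neighbors[s]))
                   if neighbors[s] else set())
        pool = [x for x in (two_hop & susceptible)
                if x != s and x not in neighbors[s]]
        if pool:
            new = random.choice(pool)
            neighbors[s].add(new)
            neighbors[new].add(s)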

11.5 Conclusions

Fisher [176] introduced modern genetic theory, motivated in large part by the need to explain the existence of biodiversity. However, the expressions he derived, which apply to panmictic populations and mean-field treatments, lead to a population genetics that rapidly converges to homogeneous populations. Spatial extent, and its violation of the mean-field approximations, is a key to biodiversity in nature. Its proper theoretical treatment will be a large step forward for evolutionary biology.

Most laboratory experiments, guided by traditional evolutionary thinking, have used well-mixed populations. The results obtained are consistent with theoretical analysis precisely because the experimental conditions match the theory’s assumptions. Such experiments do not provide insight into the role of spatial extent and its implications for real-world biological populations. A growing number of experiments today are going beyond such conditions and, as is to be expected, are obtaining quite different results [87, 178, 118, 164, 92, 179, 119, 64, 180, 84].

Mean-field models are often helpful as a first step towards understanding the behavior of systems, but we cannot trust them to provide the complete story, and we should not let mean-field thinking furnish all the concepts we use to reason about evolutionary dynamics. Our analysis of transplanting organisms parallels real-world concerns: the manifest effects of invasive species introduced by human activity, and the impact of shipping and air transportation on pathogen evolution [81, 91]. These are among the well-established examples of situations in which spatial extent influences evolutionary dynamics [108, 164, 181, 179, 32, 182, 183, 184, 185, 186, 187]. Identifying specific implications of the issues explored in this chapter for particular biological systems [87, 178, 118, 164, 92, 179, 119, 64, 180, 84, 188, 189, 190, 191] requires field and laboratory work, as well as theoretical insight to guide the questions being asked.

This chapter is based in part on a paper by myself, Andreas Gros and Yaneer Bar-Yam, originally published in October 2011 and updated at the beginning of 2014 [192]. The research reported in that paper was a project I initiated, building on earlier work by Bar-Yam, de Aguiar, Rauch, Sayama, Werfel and others. I wrote the first version of the basic simulation code; Andi figured out how to distribute the work across multiple computers and implemented the organism-swapping idea, which I had after a conversation with him about something tangentially related. The paper-writing process was a collaboration, in which I produced most of the words and my coauthors provided the Darwinian editorial pressure.

Chapter \thechapter A Volunteer’s Dilemma

The ability of any organism to survive and produce viable offspring depends on its environment, and that environment consists, in significant part, of other living beings. In Chapter \thechapter, we studied this in an evolutionary system based on ecological competition: success required having food to eat. Now, we turn to evolutionary dynamics defined using game theory. Interactions among organisms will be represented by mathematical games, and reproductive success will depend on the numerical payoffs won when individuals play those games together.

We can think of the evolutionary game which will be our focus for this chapter as a realization of the general idea discussed at the end of Chapter \thechapter. This game, which we designate the Volunteer’s Dilemma, is a scenario in which organisms must act in concert to achieve success in the struggle for life.

12 Well-Mixed Populations with Carrying Capacity

We introduce the Volunteer’s Dilemma in the context of two species living in the same environment. The ecosystem is well-mixed, so we can describe it by population densities. The total number of individuals is restricted, so that the sum of the population densities is bounded by unity:

$n_V + n_S \leq 1.$ (54)

Here, $n_V$ denotes the population density of Volunteers, and $n_S$ is the population density of Slackers. The game is played in the following way: a group of agents is assembled by drawing randomly from the population. Volunteers pay a cost, $c$. If all the agents in the group volunteer, then they each gain a benefit, $b$. A single agent slacking off deprives all the agents of the benefit. Note that we are here taking the “benefit” and “cost” of strategies as given parameters of the model, rather than as emergent consequences derived from some more fundamental dynamics. This is unlike the way in which the idea of competition is realized in the spatial host–consumer system of Chapter \thechapter; we can think of these two different styles of modeling as complementary.

This game is also known as an $n$-player stag hunt [193, 194, 195]. We will stick with the Volunteer’s Dilemma terminology in this chapter in order to avoid ambiguity: the stag hunt game is typically defined as a two-player interaction, and so “$n$ people play a stag-hunt game” could be interpreted as one person engaging in two-player stag hunts with $n - 1$ others, and then receiving the total payoff from these separate games.

Reproduction in this environment depends upon the availability of empty space, so the probability of a reproduction event diminishes as the total population density increases. We take a pessimistic view of volunteerism: the cost of being a Volunteer must be paid whether or not reproduction happens. This is an idealization of a case where, for example, volunteering requires additional metabolic products which demand heightened energy expenditure regardless of the availability of space to reproduce into. Consequently, we treat the cost as augmenting the death rate $d$, while the benefit augments the growth rate, which is also modulated by the total population density. For convenience, we define $k$ to be the number of additional Volunteers which a Volunteer must have in its peer group in order to obtain the benefit. We write the following coupled equations for the time evolution of the ecosystem:

$\frac{dn_V}{dt} = n_V \left[ (1 - n_V - n_S)\left(1 + b\, n_V^{k}\right) - (d + c) \right],$ (55)
$\frac{dn_S}{dt} = n_S \left[ (1 - n_V - n_S) - d \right].$ (56)

We follow Durrett and Levin [196] in having the cost and benefit parameters directly modify the relevant rates. The parameters $b$, $c$, and $d$ are all positive. The increase in Volunteer reproduction due to the benefit depends on how likely it is that a randomly drawn pool of $k$ agents will contain only Volunteers. A generalization is possible to a case where the size of the interacting group is larger than the critical number of Volunteers required to gain the benefit. This would introduce a term of the form