Two-level Fisher-Wright framework with selection and migration: An approach to studying evolution in group structured populations
Abstract
A framework for the mathematical modeling of evolution in group structured populations is introduced. The population is divided into a fixed large number of groups of fixed size. From generation to generation, new groups are formed that descend from previous groups, through a two-level Fisher-Wright process, with selection between groups and within groups and with migration between groups at rate $m$. When $m=1$, the framework reduces to the often used trait-group framework, so that our setting can be seen as an extension of that approach. Our framework is therefore flexible enough to allow the analysis of many previously introduced models in which altruists and non-altruists compete, and it provides new insights into these models. We focus on the situation in which initially there is a single altruistic allele in the population, and no further mutations occur. The main questions are the conditions under which that altruistic allele can spread, and the fashion in which it spreads when it does. Because our results and methods are mathematically rigorous, we see them as shedding light on various controversial issues in this field, including the role of Hamilton’s rule and of the Price equation, the relevance of linearity in fitness functions, and the need to consider only pairwise interactions or weak selection. In the current paper we analyze the early stages of the evolution, during which the number of altruists is small compared to the size of the population. We show that during this stage the evolution is well described by a multitype branching process. The driving matrix for this process can be readily obtained, reducing the problem of determining when the altruistic gene is viable to a comparison between the leading eigenvalue (Perron-Frobenius eigenvalue) of that matrix and the fitness of the non-altruists before the altruistic gene appeared. This leads to a generalization of Hamilton’s condition for the viability of a mutant gene.
That generalized viability condition can be interpreted in an appropriate neighbor-modulated fitness sense, providing a gene’s eye view of the generalized rule. Our generalized Hamilton rule reduces to the traditional one for public goods games, and more generally under the condition of linearity of the fitness of each carrier of the gene A as a function of the number of copies of that gene in the same group. Our analysis also suggests a broadly applicable criterion, which we make explicit, for the viability of a mutant gene in a more general setting. Our generalized Hamilton condition simplifies considerably when selection is weak, and further when groups are large. We analyze a significant number of examples, and observe that the altruistic gene can spread under relatively low levels of relatedness in the groups, corresponding to relatively high levels of migration. This happens, for instance, when the fitness of individuals is affected by repeated activities in their groups, and the altruistic mutant gene promotes cooperation in each round in a fashion that is conditional on the behavior of the group members in previous rounds. This class of models is a natural extension, to the group structured population setting, of tit-for-tat and related conditional strategies in the iterated two-player setting. We propose that this kind of conditional altruistic behavior in groups be investigated as a possible route for the spread of altruistic behavior through natural selection.
1. Dept. of Mathematics, University of California at Los Angeles, CA 90095, USA
2. Dept. of Applied Mathematics, Instituto de Matemática e Estatística, Universidade de São Paulo, 05508-090, São Paulo-SP, Brazil
3. Dep. de Física Geral, Instituto de Física, Universidade de São Paulo, CP 66318, 05315-970, São Paulo-SP, Brazil
Key words and phrases: Natural selection; Fisher-Wright model; population genetics; evolutionary game theory; trait-group framework; altruism; cooperation; kin selection; group selection; Price equation; Hamilton’s rule; relatedness; neighbor-modulated fitness; iterated public goods game; generalized tit-for-tat strategies; threshold models; strong and weak selection; multitype branching processes; Perron-Frobenius eigenvalue and eigenvector; viability or survivability criterion; survival mechanism; Wright’s infinite islands model.
Acknowledgements: R.H.S. is glad to thank Rob Boyd for many hours of stimulating and informative conversations on the subjects in this paper. He also warmly thanks Marek Biskup for finding a derivation of (55) in a special case; that result motivated us to engage in the work that eventually resulted in Section 5 of this paper. R.H.S. is also grateful to Clark Barrett, Maciek Chudek, Daniel Fessler, Sarah Mathew and Karthik Panchanathan for nice conversations and feedback on various aspects of this project and related subjects. This project was partially supported by CNPq, under grant 480476/20098.
1 Core results
We introduce a stylized framework for studying the evolution of a group structured population. Our goal is to shed light on, and clarify, issues in the ongoing debate on the interplay between group selection and kin selection. We will focus here on the application of the framework to the question of the spread of an altruistic gene A, resulting from a mutation, in the absence of further mutations. This is the central issue addressed in the debate, and is well suited for introducing our framework. Comments on other natural applications of the framework will be made at various places in this paper. For background material, and a significant sample of work addressing altruism, cooperation, group selection and kin selection from different perspectives, we refer the reader to the papers and books listed in the reference section (except for [11] and [28]) and references therein.
We conceived our approach in the spirit of basic stylized frameworks in population genetics, like the Fisher-Wright framework with selection. By this we mean that we aimed at keeping the elements in the modeling mathematically precise, and as simple as possible, provided they would still capture the basic biological features that one wants to study. Central to the contribution in the present work is the fact that rigorous mathematical methods can be used to decide the fate of the mutant gene A. This allows us to compare our rigorous conditions for the spread of altruism with basic concepts and issues including Hamilton’s rule, the Price equation, neighbor-modulated fitness computations, the compatibility between a gene’s eye view and a group selection mechanism, whether pairwise interactions, linearity of fitness functions, or weak selection have to be assumed, and the possibility that altruism spreads in group structured populations when the migration rate is significantly higher than the inverse of the group size. We believe that our results help in clarifying these issues, and others that are being debated, and we hope that they will bring some consilience to this field, allowing for a greater level of collaboration among the various groups contributing to the area.
Our framework can be seen as a mathematically precise version of what in [40], p.6737, is called a “typical kin selection model”. One of our main goals was to develop methods that apply to much more general fitness functions than those considered there, and that do not require the assumption that selection is weak.
Our framework is also a natural extension of the classical trait-group framework (for the origins of this framework see [82]) and therefore allows for the analysis of the models that have previously been studied in that framework. What distinguishes our framework from the trait-group one is a migration rate parameter $m$, with the case $m = 1$ reducing to the trait-group framework. When $m < 1$, our framework introduces assortment, in the sense that offspring of members of the same group tend to stay together. The migration parameter determines the strength of this assortment: the smaller it is, the stronger the assortment.
Under natural conditions on the altruistic gene A, to be specified later, we show that there are two critical values of the migration rate $m$, namely $0 < m_f \le m_s < 1$, playing roles as follows. For $m > m_s$ the gene A is eliminated. For $m < m_f$ the gene A has a positive probability of fixating, replacing the wild allele N. In the intermediate regime, when $m_f < m < m_s$, the outcome is model dependent, but typically there is a positive probability for the mutant gene A to spread and reach a polymorphic equilibrium with the wild allele N. In this paper we will focus on the critical point $m_s$ (the subscript $s$ stands for survival of the mutant allele A), and the corresponding mathematically rigorous conditions for the spread of altruism in our framework. Results on $m_f$ and the corresponding conditions for fixation of A, as well as results on the evolution of the frequency of the gene A in the population when it spreads (either in the intermediate polymorphic regime, or in the fixation regime), will be presented elsewhere ([60]).
For the reader’s benefit, and for brevity, we will not present here the full rigorous mathematical proofs, but rather explain why the various results are true at a level that, we hope, will make them generally quite intuitive. We emphasize that the results are mathematically rigorous and hold for any strength of selection. In the special case of weak selection simplifications occur and will also be discussed.
We consider a population in which individuals live in a large number $g$ of groups of size $n$. Individuals are of two genetically determined phenotypic types, the wild type N and the altruistic mutant A. Reproduction is asexual and the type is inherited without mutation by the offspring. Each individual has a relative fitness that depends on its type and the types of the other members of its group (the idea being that altruists, at a cost to themselves, provide a benefit to the members of their group). The relative fitness of an altruist, and that of a non-altruist, both in a group that has a total number $k$ of altruists, will be written, respectively, as
\[ w^A_k = 1 + \delta\, v^A_k, \qquad w^N_k = 1 + \delta\, v^N_k, \]
with the convention that $v^N_0 = 0$, i.e., $w^N_0 = 1$. The quantities $v^A_k$ and $v^N_k$ represent payoffs to altruistic, or non-altruistic, behavior. The parameter $\delta \ge 0$ indicates the strength of selection, with the limit $\delta \to 0$ corresponding to the limit of weak selection, and the case $\delta = 0$ corresponding to the case in which there is no selection, only neutral genetic drift. Examples of payoff functions will be provided in Section 2. See also Fig. 2.
Evolution operates as the next generation is formed through a process that involves competition among groups and competition within groups, followed by migration at rate $m$, as summarized in Fig. 1. Competition among groups is idealized as an (intergroup-level) Fisher-Wright process with selection, described as follows. We associate to each group a relative fitness given by the average relative fitness of its members. This means that a group with $k$ altruists has relative group fitness
\[ \bar{w}_k = \frac{k\, w^A_k + (n-k)\, w^N_k}{n}. \tag{1} \]
Each group in the new generation has a parental group from the previous generation, chosen independently with probability proportional to group relative fitness. Competition among members of a group is described by an (intragroup-level) Fisher-Wright process with selection, described as follows. The members of each group in the new generation each have a parent from among the individuals in their parental group, chosen independently with probability proportional to the fitnesses of the members of that parental group. (A standard probability computation, using conditioning on the parental group, shows that each individual in the old generation then has an expected number of offspring proportional to its relative fitness. Conversely, for this to be true, the fitness associated to the groups in the intergroup-level Fisher-Wright process must be given by (1). The conceptual relevance of this equivalence is emphasized in [36].) Once the new groups have been formed according to this two-level Fisher-Wright process, a fraction $m$ of the individuals migrates from their group to a randomly chosen group, preserving the constancy of the number of members of the groups. More precisely, each individual, independently of anything else, leaves its group with probability $m$; the migrants then return to the groups in a random fashion, filling vacancies, so that each group has again $n$ members. Each possible way of assigning the migrants to the vacancies left in the groups is equally likely, meaning that the migration process is completely random. (Mathematically: each individual is, independently of anything else, declared to be a migrant with probability $m$, and one then applies a random permutation to the set of migrants.)
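The "standard probability computation" mentioned in the parenthesis above can be written out as follows. This is a sketch under assumed notation: $g$ for the number of groups, $n$ for the group size, $w_j$ for the relative fitness of individual $j$, and $\bar{w}_G = \frac{1}{n}\sum_{j \in G} w_j$ for the fitness of group $G$; these symbols are chosen here for illustration.

```latex
% Expected number of offspring of an individual j belonging to group G.
% Step 1: each of the g new groups picks G as parent with probability
%         \bar{w}_G / \sum_H \bar{w}_H.
% Step 2: each of the n members of such a new group picks j as parent
%         with probability w_j / (n \bar{w}_G).
\begin{align*}
E(\text{offspring of } j)
  &= g \cdot \frac{\bar{w}_G}{\sum_{H} \bar{w}_H}
     \cdot n \cdot \frac{w_j}{n\, \bar{w}_G} \\
  &= \frac{g\, w_j}{\sum_{H} \bar{w}_H}
   = \frac{w_j}{\frac{1}{g}\sum_{H} \bar{w}_H}.
\end{align*}
```

The group fitness $\bar{w}_G$ cancels, so the expected number of offspring is proportional to $w_j$ alone. The cancellation only happens because the group-level weight is the average fitness of the members, which is the converse statement made in the text.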
In the case $m = 1$, we can equivalently think that the new groups are formed by random assortment from a metapopulation with $ng$ individuals. Each one of these individuals has a parent chosen independently, with probability proportional to relative fitness, from the individuals in the old generation. This is precisely the traditional trait-group framework.
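The generation update described above (group-level sampling, within-group sampling, then migration by permuting randomly marked migrants) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `next_generation`, the label strings, and the linear public goods payoffs used as an example are all assumptions made here.

```python
import random

def next_generation(groups, w_alt, w_non, m, rng):
    """One generation of the two-level process followed by migration.

    groups: list of groups, each a list of 'A'/'N' labels of equal length n.
    w_alt(k), w_non(k): relative fitness of an altruist / non-altruist in a
    group with k altruists (hypothetical callables supplied by the caller).
    """
    n = len(groups[0])
    member_w, group_w = [], []
    for grp in groups:
        k = grp.count("A")
        ws = [w_alt(k) if x == "A" else w_non(k) for x in grp]
        member_w.append(ws)
        group_w.append(sum(ws) / n)  # group fitness = average member fitness
    # Intergroup level: each new group picks a parental group.
    parents = rng.choices(range(len(groups)), weights=group_w, k=len(groups))
    # Intragroup level: each member picks a parent inside the parental group.
    new_groups = [rng.choices(groups[p], weights=member_w[p], k=n)
                  for p in parents]
    # Migration: mark migrants independently with probability m, then
    # apply a random permutation of the migrants over the vacated slots.
    slots = [(gi, j) for gi in range(len(new_groups)) for j in range(n)
             if rng.random() < m]
    migrants = [new_groups[gi][j] for gi, j in slots]
    rng.shuffle(migrants)
    for (gi, j), ind in zip(slots, migrants):
        new_groups[gi][j] = ind
    return new_groups


# Illustrative payoffs (assumed): a linear public goods game, groups of size 4.
N_SIZE, DELTA, BENEFIT, COST = 4, 0.1, 3.0, 1.0
w_alt = lambda k: 1 + DELTA * (-COST + BENEFIT * (k - 1) / (N_SIZE - 1))
w_non = lambda k: 1 + DELTA * BENEFIT * k / (N_SIZE - 1)

rng = random.Random(0)
population = [["A", "N", "N", "N"]] + [["N"] * N_SIZE for _ in range(49)]
population = next_generation(population, w_alt, w_non, m=0.2, rng=rng)
```

Shuffling the migrants over the vacated slots keeps every group at size n by construction, which is the reason for implementing migration as a permutation rather than as independent relocation.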
A model within our framework is specified by giving the values of $n$ and $m$ and the relative fitnesses $w^A_k$, $k = 1, \dots, n$, and $w^N_k$, $k = 0, \dots, n-1$. The number of groups $g$ will be considered to be very large, corresponding to taking the limit $g \to \infty$ in the computations. The study of how finiteness of $g$ modifies the conclusions is very interesting, but will be deferred to a later investigation.
Our results on the viability of a single gene A to spread do not depend on any conditions on the parameters of the model. (These results are summarized in the paragraph that contains display (9).) But to keep the presentation more focused and interesting, we will assume that conditions (C1) and (C2) below hold, except when stated otherwise.

(C1) $v^A_1 < v^N_0 = 0$, i.e., $w^A_1 < w^N_0 = 1$, so that an isolated type A individual has lower fitness than the wild type N has in groups without altruists.

(C2) $v^A_n > v^N_0 = 0$, i.e., $w^A_n > w^N_0 = 1$, so that type A individuals have greater fitness when in single-type groups than type N individuals have when in single-type groups.
Condition (C1) is sometimes referred to, after [83], as the condition for A to be called “strongly altruistic”. This condition means that an isolated gene A is at a disadvantage with respect to the wild type N in the population at large. In the trait-group framework, $m = 1$, this condition makes it impossible for A to invade. We will see that, as expected, this condition makes it impossible for this gene to invade also when $m$ is close to 1, so that $m_s < 1$.
On the other hand, we will see that condition (C2) is sufficient for A to spread when $m$ is close to 0, so that $m_s > 0$.
We will study the evolution of the population, when started in generation 0 from the situation in which only one individual is of type A. Naturally this refers to the situation in which a mutation from N to A has just occurred, and to the assumption that the mutation rate is so low that no further mutations will occur before the fate of that mutant gene has been decided. Obviously the mutant A may disappear in a few generations, but we want to determine here when it is viable, in the sense that it has a fair chance of spreading. We will denote by $|A(t)|$ the number of altruists in generation $t$. In case $\delta = 0$, meaning that the mutation is neutral, the expected number of altruists remains constant, $E(|A(t)|) = 1$, and a standard martingale argument gives probability $1/(ng)$ to the event that A will not disappear, but rather fixate eventually. As $g \to \infty$ this probability vanishes. When $\delta > 0$, and condition (C1) holds, prospects are even worse for the mutant gene A in generation 1. The expected number of type A individuals then is $w^A_1 < 1$. When $m = 1$, these bad prospects worsen with time, since in the first few generations the possible type A individuals are likely to all be in different groups (since $g$ is large), and so always carry the same fitness $w^A_1$. This leads to $E(|A(t)|) = (w^A_1)^t \to 0$, and to the certain elimination of the gene A. In the opposite extreme, when $m = 0$, a group with $n$ altruists may be created by chance in a few generations. Groups that descend from this one will always have only type A individuals, who therefore have average fitness $w^A_n > 1$, by (C2). In this situation it is reasonable to expect that the altruistic gene can spread with a probability that does not vanish as $g \to \infty$. The rigorous analysis of what happens in this case and in the more important case $0 < m < 1$ can be done using the theory of multitype branching processes, as covered, for instance, in Chapter II of [28]. We turn next to the application of that theory to solving our problem.
In applying multitype branching process theory here, we must emphasize that this theory describes the evolution in our framework well only in its early stage. By early stage, to be abbreviated E.S., we mean the generations before the number of groups that contain altruists is comparable to $g$. We will nevertheless see that this early stage period is of order $\log g$ generations, so that it covers a large number of generations since $g$ is large.
We say that a group is of type $k$ if it has exactly $k$ altruists. First we explain how multitype branching theory can be used when $m = 0$. In the E.S., there are few groups with altruists, compared with the total number of groups. Therefore in the intergroup Fisher-Wright process the competition among groups with altruists is basically irrelevant; groups are mostly competing with the groups without altruists, which form the background on which groups with altruists may or may not spread. To see this, note first that in each generation there are $g$ new groups being formed, and that they choose their parental groups independently with probability proportional to group fitness. Since the vast majority of the groups have no altruists, and therefore group fitness $\bar{w}_0 = 1$, a type $k$ group has a probability close to $\bar{w}_k / g$ of being the parent of each new group. Hence, each group of type $k$, with $k = 1, \dots, n$, can be seen, in first approximation, as creating independently of the other ones a number of offspring groups that is given by a binomial distribution with parameters $g$ and $\bar{w}_k / g$ (well approximated by a Poisson distribution with mean $\bar{w}_k$). During the E.S., there are far fewer than $g$ groups with altruists, and they are each producing a number of offspring groups that is also small compared to $g$. For this reason each one of these groups interferes little with the other groups with altruists in their creation of offspring groups. This independence in the creation of offspring groups is what defines a multitype branching process. The next fact to observe is that each group that has as its parental group a group of type $k$ will be of type $i$, due to the intragroup Fisher-Wright process, with probability
\[ P\left( \mathrm{Bin}\left(n, \frac{k\, w^A_k}{n\, \bar{w}_k}\right) = i \right), \]
where $\mathrm{Bin}(n, p)$ is a binomial random variable with $n$ attempts, each with probability $p$ of success. Assembling the pieces above, we conclude that, through the two-level Fisher-Wright process, a group of type $k$ creates on average
\[ A_{k,i} = \bar{w}_k\, P\left( \mathrm{Bin}\left(n, \frac{k\, w^A_k}{n\, \bar{w}_k}\right) = i \right) \tag{2} \]
groups of type $i$ in the next generation, independently of anything else. When $m = 0$ this is the whole story. The matrix $A$, of size $n \times n$, defined by (2), with $k, i = 1, \dots, n$, characterizes the evolution of this process.
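The no-migration matrix described above can be assembled directly from the fitnesses. The sketch below assumes the reading "entry (k, i) = group fitness of a type-k group times the binomial probability of i altruists among n offspring, with success probability equal to the altruists' share of total group fitness". The names `binom_pmf`, `matrix_A`, the symbols n and delta, and the linear public goods payoffs are assumptions for illustration.

```python
from math import comb

def binom_pmf(trials, p, i):
    """P(Bin(trials, p) = i), with 0 outside the support."""
    if i < 0 or i > trials:
        return 0.0
    return comb(trials, i) * p**i * (1 - p)**(trials - i)

def matrix_A(n, w_alt, w_non):
    """A[k-1][i-1]: expected number of type-i offspring groups of a type-k group.

    w_alt[k], w_non[k]: fitness of an altruist / non-altruist in a group
    with k altruists (lists indexed 0..n; unused entries are ignored).
    """
    rows = []
    for k in range(1, n + 1):
        wbar = (k * w_alt[k] + (n - k) * w_non[k]) / n  # group fitness
        p = min(1.0, k * w_alt[k] / (n * wbar))         # altruists' fitness share
        rows.append([wbar * binom_pmf(n, p, i) for i in range(1, n + 1)])
    return rows

# Illustrative parameters (assumed): linear public goods payoffs, n = 4.
N_SIZE, DELTA, BENEFIT, COST = 4, 0.1, 3.0, 1.0
w_alt = [1 + DELTA * (-COST + BENEFIT * (k - 1) / (N_SIZE - 1))
         for k in range(N_SIZE + 1)]
w_non = [1 + DELTA * BENEFIT * k / (N_SIZE - 1) for k in range(N_SIZE + 1)]
A = matrix_A(N_SIZE, w_alt, w_non)
```

A useful sanity check is that the expected number of altruists produced by a type-k group, the sum over i of i times the entry (k, i), equals k times the fitness of an altruist in such a group. Type 0 is omitted from the matrix because groups that lose all their altruists no longer contribute to the count.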
When $m > 0$, the creation of the new generation of groups is complicated by migration. One could be concerned that a multitype branching process description is no longer feasible. Fortunately this fear is unfounded, thanks to the fact that we are only considering the E.S., during which the number of groups with altruists is much smaller than $g$. Altruists then form a minute fraction of the migrant population, and as a consequence it is unlikely that migrant altruists will settle in groups that contributed altruists to the migrant population, or that any group will receive more than one migrant altruist. A group that has $i$ altruists before migration will keep after migration a random number of altruists given by a binomial distribution with $i$ attempts and probability $1 - m$ of success. This means that the probability that after migration this group is replaced by a group with $j$ altruists is given by
\[ B_{i,j} = P\left( \mathrm{Bin}(i, 1-m) = j \right) = \binom{i}{j} (1-m)^{j}\, m^{\,i-j}. \tag{3} \]
That group that had $i$ altruists before migration will also be contributing an expected number $i\,m$ of migrant altruists, who are each likely to settle in a different group that had no altruists before migration, and has exactly one altruist after migration, i.e., is now of type 1. This means that the expected number of groups of type $j$ created from groups of type 0 that received altruists from our group that had $i$ altruists before migration is given by
\[ C_{i,j} = \begin{cases} i\, m, & \text{if } j = 1, \\ 0, & \text{otherwise.} \end{cases} \tag{4} \]
The matrix $A$ should therefore be replaced, due to migration, with the matrix $M = A(B + C)$ in describing the expected number of groups of type $j$ created in the new generation by each group of type $k$ in the old generation. We will use the notation $N_k(t)$ for the number of groups of type $k$ in generation $t$, and also write $N(t) = (N_1(t), \dots, N_n(t))$. In summary, we have, in matrix notation, that for $t$ in the E.S.,
\[ E\left( N(t+1) \mid N(t) \right) = N(t)\, M. \tag{5} \]
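The way migration modifies the driving matrix can be checked numerically. The sketch below assumes that the retained altruists follow a binomial law with success probability 1 - m, that each of the (on average) i times m emigrating altruists founds a new type-1 group, and that the driving matrix is the no-migration matrix multiplied by the sum of these two contributions. Function names, symbols and the public goods payoffs are illustrative assumptions.

```python
from math import comb

def binom_pmf(trials, p, i):
    """P(Bin(trials, p) = i), with 0 outside the support."""
    if i < 0 or i > trials:
        return 0.0
    return comb(trials, i) * p**i * (1 - p)**(trials - i)

def matrix_A(n, w_alt, w_non):
    """No-migration matrix: group fitness times within-group binomial sampling."""
    rows = []
    for k in range(1, n + 1):
        wbar = (k * w_alt[k] + (n - k) * w_non[k]) / n
        p = min(1.0, k * w_alt[k] / (n * wbar))
        rows.append([wbar * binom_pmf(n, p, i) for i in range(1, n + 1)])
    return rows

def matrix_M(n, m, w_alt, w_non):
    """Driving matrix with migration; index r stands for group type r + 1."""
    A = matrix_A(n, w_alt, w_non)
    # Retention term: a group with i altruists keeps j of them.
    # Seeding term: i * m expected migrants, each founding a type-1 group.
    BC = [[binom_pmf(i, 1 - m, j) + (i * m if j == 1 else 0.0)
           for j in range(1, n + 1)] for i in range(1, n + 1)]
    return [[sum(A[k][i] * BC[i][j] for i in range(n)) for j in range(n)]
            for k in range(n)]

# Illustrative parameters (assumed): linear public goods payoffs, n = 4.
N_SIZE, DELTA, BENEFIT, COST = 4, 0.1, 3.0, 1.0
w_alt = [1 + DELTA * (-COST + BENEFIT * (k - 1) / (N_SIZE - 1))
         for k in range(N_SIZE + 1)]
w_non = [1 + DELTA * BENEFIT * k / (N_SIZE - 1) for k in range(N_SIZE + 1)]
M = matrix_M(N_SIZE, 0.3, w_alt, w_non)
```

Two limiting cases are easy to verify with this construction: with m = 0 the matrix coincides with the no-migration matrix, and with m = 1 only its first column is nonzero, its top-left entry being the fitness of an isolated altruist. Migration moves altruists between groups but does not change their number, so each row still accounts for k times the altruist fitness in expected altruists.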
Obviously the number of altruists in generation $t$ is $|A(t)| = \sum_{k=1}^{n} k\, N_k(t)$. Therefore, the survival of the altruistic gene is equivalent to the survival of the multitype branching process $N(t)$. Next we describe the necessary and sufficient condition for the survival with positive probability of this multitype branching process. In what follows we will suppose that $0 < m < 1$ and that all fitnesses are strictly positive; the boundary cases ($m = 0$, $m = 1$, or vanishing fitnesses) can be treated as limits. Because the matrix $M$ then has only strictly positive entries, it results from the Perron-Frobenius Theorem (see, e.g., Theorem 5.1 in Chapter II of [28]) that it has an eigenvalue $\rho$ that is simple, positive and larger in absolute value than all other eigenvalues. It corresponds to left and right eigenvectors, both of which have all their entries strictly positive. We will denote by $\nu = (\nu_1, \dots, \nu_n)$ this left eigenvector, normalized so as to represent a probability distribution over group types: $\sum_{k=1}^{n} \nu_k = 1$. (Illustrations of $\rho$ and $\nu$ as functions of $m$ and $\delta$ appear in Fig. 3, Fig. 5 and Fig. 8, for various models.) A consequence of (5) and of the Perron-Frobenius Theorem is that for $t$ large, but still in the E.S.,
\[ E\left( N(t) \right) \approx C\, \rho^{t}\, \nu, \tag{6} \]
where $C$ is a constant.
Theorem 7.1 in Chapter II of [28] states that the survival with positive probability of the multitype branching process is equivalent to the condition
\[ \rho > 1. \tag{7} \]
(We will make $m$, $\delta$, etc., explicit in the notation, writing $\rho(m, \delta)$, $\nu(m, \delta)$, etc., only when important.) The critical value $m_s$ is then obtained by solving the equation
\[ \rho(m_s) = 1. \tag{8} \]
(For some numerical examples, see Fig. 3, Fig. 4, Fig. 6 and Fig. 7.) To see that this equation has a solution in the open interval $(0, 1)$, we use continuity of eigenvalues and eigenvectors. In particular $\rho(m)$ is continuous in $m$, and it is enough to observe how it behaves as $m$ approaches 0 or 1. When $m = 0$, $M = A$ has the eigenvalue $w^A_n > 1$ (by (C2)), corresponding to the left eigenvector $(0, \dots, 0, 1)$. This implies that $\rho$ must be larger than 1 when $m$ is close to 0. When $m = 1$, the matrix $M = AC$ has only the first column not identically 0. Therefore any of its left eigenvectors with nonzero eigenvalue must be of the form $(x, 0, \dots, 0)$. When such a vector is multiplied by $M$, the result is $(x\, w^A_1, 0, \dots, 0)$. This shows that $w^A_1$ is the only nonzero eigenvalue of $M$. Therefore $\rho$ must converge to $w^A_1 < 1$ (by (C1)), as $m \to 1$.
The argument above shows that (8) has a solution. Uniqueness of this solution is not guaranteed, unless additional conditions are assumed on the fitnesses. In any case, if there is more than one solution to (8), the natural definition of $m_s$, which we adopt, is as the largest one, representing the least strength of assortment of altruists that suffices to allow altruism to survive.
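The critical migration rate can be located numerically by combining a leading-eigenvalue computation with bisection on the migration rate. The sketch below is an illustration under stated assumptions: the driving matrix is built as the no-migration matrix times (retention plus seeding), the leading eigenvalue is estimated by plain power iteration on left vectors, and the payoffs are a linear public goods game chosen so that an isolated altruist has fitness below 1 and an all-altruist group has fitness above 1. All names are hypothetical. Bisection returns one root; if the equation had several roots, a scan over the migration rate would be needed to pick the largest.

```python
from math import comb

def binom_pmf(trials, p, i):
    """P(Bin(trials, p) = i), with 0 outside the support."""
    if i < 0 or i > trials:
        return 0.0
    return comb(trials, i) * p**i * (1 - p)**(trials - i)

def build_M(n, m, w_alt, w_non):
    """Driving matrix (assumed form): selection step times migration step."""
    A = []
    for k in range(1, n + 1):
        wbar = (k * w_alt[k] + (n - k) * w_non[k]) / n
        p = min(1.0, k * w_alt[k] / (n * wbar))
        A.append([wbar * binom_pmf(n, p, i) for i in range(1, n + 1)])
    BC = [[binom_pmf(i, 1 - m, j) + (i * m if j == 1 else 0.0)
           for j in range(1, n + 1)] for i in range(1, n + 1)]
    return [[sum(A[k][i] * BC[i][j] for i in range(n)) for j in range(n)]
            for k in range(n)]

def perron_left(M, iters=2000):
    """Power iteration x <- x M; returns (leading eigenvalue, left eigenvector)."""
    n = len(M)
    x = [1.0 / n] * n
    rho = 0.0
    for _ in range(iters):
        y = [sum(x[k] * M[k][i] for k in range(n)) for i in range(n)]
        rho = sum(y)               # growth factor, since x sums to 1
        x = [v / rho for v in y]   # renormalize to a probability vector
    return rho, x

def critical_m(n, w_alt, w_non, steps=40):
    """Bisection for a root of rho(m) = 1, assuming rho(0) > 1 > rho(1)."""
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if perron_left(build_M(n, mid, w_alt, w_non))[0] > 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Illustrative parameters (assumed): linear public goods payoffs, n = 4.
N_SIZE, DELTA, BENEFIT, COST = 4, 0.1, 3.0, 1.0
w_alt = [1 + DELTA * (-COST + BENEFIT * (k - 1) / (N_SIZE - 1))
         for k in range(N_SIZE + 1)]
w_non = [1 + DELTA * BENEFIT * k / (N_SIZE - 1) for k in range(N_SIZE + 1)]
m_s = critical_m(N_SIZE, w_alt, w_non)
print(f"estimated critical migration rate: {m_s:.4f}")
```

Power iteration is used only to keep the example free of external dependencies; any standard eigenvalue routine would serve equally well for larger group sizes.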
The theory of multitype branching processes provides us with further detailed information on the patterns of evolution, when the altruistic gene A survives. Theorem 9.2 of Chapter II of [28] shows that in the event that the process survives, it behaves in a rather regular fashion: as $t$ becomes large, the vector $N(t)$ tends to become a multiple of $\nu$, and to grow at rate $\rho$. (More precisely, the distribution of the random vector $N(t)/\rho^{t}$ converges to that of $W \nu$, where $W$ is a random variable.) Intuitively, this is a sort of law of large numbers: if the multitype branching process survives, the relative frequency of groups of each type tends to stabilize, as that given by the vector $\nu$, as $t$ becomes large. But randomness in the values of $N_1(t), \dots, N_n(t)$ persists (and is captured by the one-dimensional $W$ above), due to the randomness that affects the process in the first few generations, before large number phenomena can take place.
We summarize now what we have learned about the evolution of the gene A in our framework. When (7) fails this gene dies out in a few generations. On the other hand, when (7) holds, the picture of its evolution is a dichotomy. Either A is eliminated in a few generations, or else it survives and, as it spreads, its distribution stabilizes dynamically, in the sense that
\[ N(t) \approx W\, \rho^{t}\, \nu \tag{9} \]
when $t \gg 1$ and $\rho^{t} \ll g$. We will refer to this time period as the stationary early stage, abbreviated S.E.S. Note that the upper bound on the magnitude of $t$ is equivalent to the condition that during this period the number of groups with altruists, $\sum_{k} N_k(t)$, is much smaller than $g$. This condition is then what defines the E.S., and we will refer to the first few generations, before the S.E.S., as the very early stage, abbreviated V.E.S. During the V.E.S., $N(t)$ evolves very randomly, displaying little regularity, since the number of copies of the gene A is small. The random variable $W$ reflects how randomness during that period affects the later S.E.S., and the extent to which it is not washed out as the number of copies of A grows and the evolution becomes more regular.
Finally, the later period when the number of groups with altruists is no longer negligible as compared to $g$ (provided that A has survived) will be called the late stage, abbreviated L.S. The evolution of $N(t)$ during the L.S. will no longer be well approximated by the multitype branching process, and is more challenging to study. To focus in the current paper only on the issue of survival of the gene A, we will postpone our analysis of that problem to a later publication ([60]). Here we only observe that in that regime, laws of large numbers allow us to describe the evolution well as a dynamical system (in $n$ dimensions, representing the fractions of groups of each type). That dynamical system turns out to be nonlinear (due to migration) and to sometimes have more than one stable equilibrium. Nevertheless, under the condition that (C2) holds and $\delta$ is small, or some alternative conditions (for instance (C3) in the next section), when $m$ is small, it has a single stable equilibrium, corresponding to fixation of the gene A in the population. This is what characterizes the critical point $m_f$. It is worthwhile to stress in this connection that the linearity of the evolution during the E.S., as given by (5), in spite of migration, makes the issue of deciding when the mutant A can survive much easier than it would be otherwise. This is what allows the reduction of the problem to a standard eigenvalue problem.
Returning to our analysis of the E.S., we can see (9) as a form of self-organization of the gene A. If it survives, it arranges itself according to the distribution $\nu$, which is a left eigenvector of the driving matrix $M$. Left eigenvectors are precisely the arrangements that the process can have which are preserved in time. To better appreciate what is special about $\nu$, and for several future uses, we observe that if $\nu'$ is a left eigenvector of $M$, with eigenvalue $\lambda$, and no negative entries, then
\[ \lambda = \frac{\sum_{k=1}^{n} k\, \nu'_k\, w^A_k}{\sum_{k=1}^{n} k\, \nu'_k}. \tag{10} \]
In other words, $\lambda$ is the average fitness of the individuals who carry the gene A (or simply, the average fitness of the gene A), when the groups with altruists are distributed according to $\nu'$. Identity (10) is an easy consequence of two observations. First, from (5) we know that if $N(t) = C \nu'$ for some constant $C$, then $E(N(t+1)) = \lambda\, C\, \nu'$. This implies that $E(|A(t+1)|) = \lambda\, C \sum_{k} k\, \nu'_k$. Second, in the multitype branching process an individual who carries the gene A and belongs to a group with $k$ altruists produces an expected number $w^A_k$ of offspring, so that we can also write $E(|A(t+1)|) = C \sum_{k} k\, \nu'_k\, w^A_k$. Comparison of these expressions yields (10).
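The identification of the leading eigenvalue with the average fitness of the gene A can be checked numerically: compute the leading left eigenvector of the driving matrix and compare the eigenvalue with the altruist-weighted mean of the altruists' fitnesses. The sketch below assumes the driving matrix is built as in the text (selection step followed by retention and seeding by migrants), uses power iteration, and uses illustrative public goods payoffs; all names and parameter values are assumptions.

```python
from math import comb

def binom_pmf(trials, p, i):
    """P(Bin(trials, p) = i), with 0 outside the support."""
    if i < 0 or i > trials:
        return 0.0
    return comb(trials, i) * p**i * (1 - p)**(trials - i)

def build_M(n, m, w_alt, w_non):
    """Driving matrix (assumed form): selection step times migration step."""
    A = []
    for k in range(1, n + 1):
        wbar = (k * w_alt[k] + (n - k) * w_non[k]) / n
        p = min(1.0, k * w_alt[k] / (n * wbar))
        A.append([wbar * binom_pmf(n, p, i) for i in range(1, n + 1)])
    BC = [[binom_pmf(i, 1 - m, j) + (i * m if j == 1 else 0.0)
           for j in range(1, n + 1)] for i in range(1, n + 1)]
    return [[sum(A[k][i] * BC[i][j] for i in range(n)) for j in range(n)]
            for k in range(n)]

def perron_left(M, iters=5000):
    """Power iteration x <- x M; returns (leading eigenvalue, left eigenvector)."""
    n = len(M)
    x = [1.0 / n] * n
    rho = 0.0
    for _ in range(iters):
        y = [sum(x[k] * M[k][i] for k in range(n)) for i in range(n)]
        rho = sum(y)
        x = [v / rho for v in y]
    return rho, x

def mean_fitness_of_A(nu, w_alt):
    """Average altruist fitness when group types are distributed as nu."""
    n = len(nu)
    num = sum((k + 1) * nu[k] * w_alt[k + 1] for k in range(n))
    den = sum((k + 1) * nu[k] for k in range(n))
    return num / den

# Illustrative parameters (assumed): linear public goods payoffs, n = 4, m = 0.3.
N_SIZE, DELTA, BENEFIT, COST = 4, 0.1, 3.0, 1.0
w_alt = [1 + DELTA * (-COST + BENEFIT * (k - 1) / (N_SIZE - 1))
         for k in range(N_SIZE + 1)]
w_non = [1 + DELTA * BENEFIT * k / (N_SIZE - 1) for k in range(N_SIZE + 1)]
rho, nu = perron_left(build_M(N_SIZE, 0.3, w_alt, w_non))
avg_w = mean_fitness_of_A(nu, w_alt)
```

The two numbers `rho` and `avg_w` agree up to numerical error because each row of the driving matrix, weighted by the number of altruists per offspring group, sums to k times the altruist fitness in a type-k group: migration relocates altruists without changing how many there are.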
We combine (7) with (10) to write the necessary and sufficient condition for viability of the gene A as
\[ \rho = \frac{\sum_{k=1}^{n} k\, \nu_k\, w^A_k}{\sum_{k=1}^{n} k\, \nu_k} > 1. \tag{11} \]
Since $\nu$ is the only left eigenvector of the maximal eigenvalue $\rho$, (9) is telling us that when the altruistic gene survives, it tends to organize itself (or we can also say “nature organizes it, through natural selection”) in the stable way that maximizes its average fitness. This observation also makes the condition of survival (11) look particularly natural. If $\rho \le 1$, there is no stable way for the gene A to be organized so that it will have an average fitness that is larger than that of the wild type N in the population at large, where A is still rare; A will then not be viable. On the other hand, when $\rho > 1$, the gene A can be organized according to $\nu$, which is stable, and provides it with mean fitness larger than 1, as needed for it to spread among the wild type N.
It is enlightening to see what happens when the inequality in (11) fails. Even in this case, chance may produce at a time $t_0$ during the V.E.S. the arrangement $N(t_0) = (0, \dots, 0, 1)$, meaning that there are exactly $n$ altruists, all in the same group. Condition (C2) tells us that at time $t_0$ the average fitness of the gene A is larger than 1. And indeed, $E(|A(t_0 + 1)|) = n\, w^A_n > n = |A(t_0)|$, so that the altruistic gene is spreading at this time. But this arrangement is not stable. In successive generations, the distribution of $N(t)$ is driven to a combination of left eigenvectors of $M$, and the number of copies of A can then grow at most at rate $\rho \le 1$, so that eventually the gene is eliminated.
In contrast, when (11) holds, chance will dictate whether during the V.E.S. an arrangement of copies of A will form that not only provides that gene with mean fitness larger than 1, but is also likely to produce a succession of arrangements in the next generations, all with this property. The important point is that, under (11), such arrangements do exist, and drive the evolution towards $\nu$.
The contrast in the last two paragraphs is one of the main lessons from our analysis. This lesson goes beyond the specific aspects of the stylization that we are adopting here. We suggest making this idea explicit as a guiding criterion, that should be of use when considering any framework, model or experimental situation.
Viability or Survivability Criterion: In a large population, a single mutant gene A, in the absence of further mutations, will be viable, i.e., will have a positive probability of surviving and spreading, if and only if this gene can produce in a few generations an arrangement of its copies in a number that is still small compared to the size of the population, but is likely to produce in the next generations a sequence of arrangements with a growing number of its copies, until it accounts for a nonnegligible fraction of the alleles in the population.
We will refer to such a sequence of arrangements as a survival mechanism for the mutant gene A, so that the criterion stated above postulates the existence of a survival mechanism as a necessary and sufficient condition for the viability of the mutant gene A. Such a mechanism can, for instance, be started by an arrangement of copies of the gene A that satisfies the three conditions below:

(i) In this arrangement, the average fitness of the mutant gene is larger than that of the wild type in the population at large, before the mutant appeared.

(ii) This arrangement is likely to produce in the next generation another arrangement with the same property (i) above.

(iii) Due to the growth in the number of copies of the gene, the probability of success in step (ii) above increases from generation to generation, fast enough to assure that the probability of producing the sequence of arrangements mentioned in the criterion is large.
Indeed, in our framework, once an arrangement of copies of A is produced with distribution in groups close to $\nu$, conditions (i), (ii), (iii) are fulfilled, provided that $\rho > 1$. On the other hand, when $\rho \le 1$, no arrangement exists that satisfies these three conditions.
We end this section with some observations on generalizations of our methods. One can modify the intergroup and the intragroup competition procedures, from the Fisher-Wright ones that we consider here, and in this way extend our framework further. For instance, modifications to the intragroup selection procedure can be fairly general, and would only require a modification of the matrix $A$. Instances of such a modification could include domination patterns in the intragroup reproduction mechanism that result in reproductive skew within the group. In an extreme case, a single member of the group, chosen at random with probabilities proportional to individual fitness of the group members, could mother all the offspring of an offspring group.
Modifications of the intergroup competition mechanism are even simpler to consider. Note that we did not use in our analysis the full power of the assumption that this mechanism is a Fisher-Wright procedure. We only assumed that if altruists are rare, then each group with altruists fathers in the next generation an almost independent random number of groups, with mean proportional to group fitness (defined as the average fitness of group members). Under these broad assumptions our methods and results above, and in the remainder of this paper, are unchanged. We chose to introduce our framework with a Fisher-Wright competition mechanism among groups for concreteness. This choice forces the number of offspring groups of each group to be binomially distributed (well approximated by Poisson, since is large). The observation in the current paragraph is then of special relevance in situations in which the number of groups fathered by each group is better modeled by a distribution that is far from Poisson, as for instance in cases in which its variance is much smaller than its mean (as happens, for example, when is small, the mean is close to 1, and the variance is much smaller than 1, with most groups fathering exactly one group).
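The closeness of the binomial offspring distribution to a Poisson distribution, mentioned above, can be checked numerically. The sketch below is only illustrative: the number of groups, the relative group fitness and the variable names are hypothetical, and it assumes that each offspring group independently picks its parent group with probability equal to the parent's relative fitness divided by the number of groups.

```python
import math


def binomial_pmf(k, trials, p):
    """Probability of k successes in `trials` independent trials."""
    return math.comb(trials, k) * p ** k * (1.0 - p) ** (trials - k)


def poisson_pmf(k, mean):
    """Probability of the value k under a Poisson law with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


num_groups = 1000        # hypothetical number of groups
relative_fitness = 1.2   # hypothetical group fitness relative to the population average
p = relative_fitness / num_groups  # assumed chance that an offspring group picks this parent

# Total variation distance, truncated at k = 60 (the neglected tail mass is negligible here).
tv_distance = 0.5 * sum(
    abs(binomial_pmf(k, num_groups, p) - poisson_pmf(k, relative_fitness))
    for k in range(61)
)

binomial_mean = num_groups * p
binomial_variance = num_groups * p * (1.0 - p)  # nearly equal to the mean, as for Poisson
```

With these values the two distributions are nearly indistinguishable, and the variance is close to the mean. A mechanism in which most groups father exactly one group would instead have variance far below the mean.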
2 Models
In this section we will introduce several models and discuss their relevance. Fig. 2 provides an overview of some of their typical features. Fig. 4, Fig. 6 and Fig. 7 provide values of as a function of the strength of selection for some of them. Notice, from these figures, that is not always monotone in , but that it often increases substantially when is large. This fact highlights the relevance of studying the models not only when selection is weak. Notice also that in Fig. 6 and Fig. 7, the product can be of the order of . This is relevant in view of the widespread claim that altruism cannot survive when is significantly larger than . As far as we know, this perception resulted from the analysis of particular models (e.g., in [43], [2] and [10]) and an excessive emphasis on the public goods game (Example 1, below). One of the main messages from this section and the following ones will, indeed, be that mechanisms that go beyond the public goods game may be central to the understanding of the spread of altruistic genes, and can be analyzed in our framework with no special difficulty.
Conditions (C1) and (C2) are very mild, and basically characterize the effect of the gene A on its carriers as an individually beneficial social effect which comes with a personal cost to them. One does not have to restrict oneself to behavioral effects of the gene A on its carriers’ phenotype. For example, another kind of application could include anatomical and physiological effects that carry a cost, but produce benefits to those with the altered phenotype when in groups with others that share this feature. For instance, gene A could promote changes that facilitate verbal communication with others that have the same changes, but at a cost, say, in adding expensive tissue to the brain. An isolated carrier of gene A would suffer the costs of carrying it, but without the possibility of benefiting from its potential advantages.
In order to assure , in a forthcoming paper ([60]), we will either have to assume that in addition to (C2) holding, is small, or else we will need to add an additional assumption. A sufficient one will be:

(C3) , i.e., , for , so that the average fitness of a group is maximized when the group contains only altruists.
When the gene A affects behavior, what precise conditions on the fitnesses and should be required for this gene to be called “altruistic”? There is no agreement on the answer. The issues are very nicely presented and discussed in [37]. We list next a few of the conditions that can naturally be associated with altruism. These, and a few more, can be found in [37], where detailed references and credit are given.
Some of the conditions require the altruistic behavior to be beneficial. Typical conditions of this kind are:

(C4) , or equivalently, , is increasing in , so that altruists are always better off sharing their group with more altruists.

(C5) , or equivalently, , is increasing in , so that nonaltruists are always better off sharing their group with more altruists.

(C6) , or equivalently, , is increasing in , so that the members of a group are on average better off with more altruists in the group. (Note that (C6) is an extension of (C3).)
And complementary conditions require the altruistic behavior to come at a cost to the actor:

(C7) , i.e., , , so that altruists are always worse off than nonaltruists in the same group.

(C8) , i.e., , , so that an individual that suffered a mutation from N to A, would be worse off.
Condition (C8) extends condition (C1). It is known that when it holds, then in the trait-group framework, m=1, starting from any fraction of genes A in the population, these genes will be eliminated. ([37] attributes this result to [45].) The assortment provided by a low m is nevertheless sufficient to allow a single altruistic gene A to invade, if condition (C2) holds.
We will illustrate the use of our framework with several examples (see Fig. 2):
Example 1. Public goods game:
for positive constants and . One can think that at a cost to itself, each altruist provides a benefit to each one of the other members of its group. Alternatively, one can think that at a cost to itself, each altruist provides a benefit to each member of its group, itself included. Set then , for the net cost to the altruist, and , for the total benefit to the other members of the group. Each of these two descriptions is common in nature and has its theoretical advantages, and both appear often in the literature (see [55] for more on this point). The former description is often referred to as an “other-only” trait, and the latter as a “whole-group” trait. Their mathematical equivalence illustrates something that presents itself a number of times. Two models may be different in relevant biological aspects, but lead to the same functions and , possibly after some change of variables, as above. In this case, we will say that the models are materially different, but formally equivalent.
There is a second way in which the present example splits into two materially different, but formally equivalent descriptions. On one hand, altruists could be performing individual actions, producing identical benefits to all the other (or to all, self included) members of the group. In some applications this may be a good description of what is happening. For instance, altruists could be individuals with a hygiene habit that is beneficial to the group, in preventing disease, but costly to the actor. Alarm calls are another example.
On the other hand, the fitness functions in this example can also accommodate collective actions, in which altruists act together, to produce a common good for the group. Fighting in a war against another group, or participating in collective hunting activities (with the product of the hunt shared among all members of the group) would be examples of this kind.
In all cases, the assumptions that the total benefit produced, , grows linearly with the number of actors, and the net cost to each actor, , is constant, may be unrealistic. We will explore these points in Example 5, below.
Note that . Condition (C1) holds since . We suppose that , so that (C2) holds, since . Note that then also (C3)-(C8) all hold.
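The formal equivalence between the “other-only” and the “whole-group” descriptions of Example 1 can be checked with a short computation. The following is a minimal sketch under assumed notation: `cost` and `benefit` are hypothetical names for the cost paid by an altruist and the benefit received by each recipient, `k` is the number of altruists in the group, and the numerical values are arbitrary.

```python
def other_only_payoffs(k, cost, benefit):
    """(altruist, nonaltruist) payoffs when each altruist helps every OTHER group member."""
    altruist = -cost + (k - 1) * benefit  # helped by the k - 1 other altruists
    nonaltruist = k * benefit             # helped by all k altruists
    return altruist, nonaltruist


def whole_group_payoffs(k, cost, benefit):
    """(altruist, nonaltruist) payoffs when each altruist helps every member, itself included."""
    altruist = -cost + k * benefit
    nonaltruist = k * benefit
    return altruist, nonaltruist


group_size = 5
gross_cost, per_member_benefit = 3, 1        # hypothetical whole-group parameters
net_cost = gross_cost - per_member_benefit   # change of variables: net cost to the altruist
total_benefit_to_others = (group_size - 1) * per_member_benefit

# Compare the two descriptions over all mixed groups (1 <= k <= group_size - 1).
equivalent = all(
    whole_group_payoffs(k, gross_cost, per_member_benefit)
    == other_only_payoffs(k, net_cost, per_member_benefit)
    for k in range(1, group_size)
)
```

Integer parameters are used so that the comparison is exact. The check illustrates that the two materially different stories produce the same payoff functions once the cost is replaced by the net cost.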
The public goods game has rightfully been called “the mother of all cooperative models” (see footnote 1 in [70]). It is natural to study its behavior, and to understand how the gene A can spread in this case. In the next examples we will nevertheless try to convey the message that one should aim at developing methods, as we do here, that can address more general models as well. In the spirit of that metaphor, it is natural to see the next example as “a special daughter of the public goods game”. It derives from the public goods game in the same way that (in a two-player setting) the iterated prisoner’s dilemma and the tit-for-tat strategy derive from a one-shot prisoner’s dilemma game and a simple cooperative strategy.
Example 2. Iterated public goods game. Altruists cooperate conditionally:
for positive constants and , and . Here we suppose that a public goods game is repeated a random number of times , with average . Each time each member of the group can cooperate at a cost to itself, resulting in a benefit to each one of the other members of its group. Defectors incur no costs and produce no benefits. We suppose that altruists cooperate in the first round, and afterwards only cooperate if at least other members of the group cooperated in the previous round. This is a generalization of the well-known tit-for-tat strategy, which corresponds to the case , . We will then refer to the strategy of the altruists in this example as “many-individuals-tit-for-tat (with threshold )”.
Note that when , regardless of the value of , this example is identical to Example 1. Note also that if , and if . Again, we suppose that , and so both (C1) and (C2) hold, since and . It is also easy to see that then (C3), (C5), (C6) and (C7) hold.
Condition (C4) will only hold under additional assumptions. A very natural one, that we will assume, unless stated otherwise, is that the threshold satisfies
(12) 
i.e., when altruists keep playing the game, it is never to their disadvantage to do so.
It is instructive to look into what happens with (C8) in detail: , if ; if ; but in the case , . Therefore, if (12) holds as a strict inequality, then , for large , and (C8) fails. But if (12) fails, or holds as an equality, then (C8) holds, for arbitrary . Note that if (12) holds as an equality (which can only occur if is an integer), then all the conditions (C1)-(C8) are satisfied.
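The payoff structure of Example 2 can be sketched in code as follows. The notation is an assumption made for illustration: `cost` and `benefit` are per-round, per-recipient quantities, `mean_rounds` is the average number of repetitions, and `threshold` is the number of other cooperators an altruist needs to see in order to keep cooperating. Since payoffs are linear in the number of rounds, the sketch works with the mean number of rounds only.

```python
def altruists_keep_cooperating(k, threshold):
    """After the first round, an altruist continues if at least `threshold` others cooperated."""
    return (k - 1) >= threshold


def iterated_payoffs(k, cost, benefit, mean_rounds, threshold):
    """Expected (altruist, nonaltruist) payoffs in a group with k altruists."""
    # Cooperation lasts for all rounds if the threshold is met, and for one round otherwise.
    rounds = mean_rounds if altruists_keep_cooperating(k, threshold) else 1
    altruist = rounds * (-cost + (k - 1) * benefit)
    nonaltruist = rounds * k * benefit
    return altruist, nonaltruist


# Hypothetical parameters: pairs (group size 2) with threshold 1, i.e. tit-for-tat.
pair_of_altruists = iterated_payoffs(2, cost=1, benefit=3, mean_rounds=10, threshold=1)[0]
exploited_altruist, exploiting_nonaltruist = iterated_payoffs(
    1, cost=1, benefit=3, mean_rounds=10, threshold=1
)
```

In this notation, the property that continuing to play is never to the altruists' disadvantage amounts to `threshold * benefit >= cost`, and setting `mean_rounds=1` recovers the single-round public goods payoffs whatever the threshold.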
This model was studied independently in [7] and in [32], in the trait-group framework. Both papers identified stable equilibria with positive fractions of altruists (a phenomenon that can occur only when (C8) fails). But they also observed that altruists could not invade when rare (a phenomenon that always holds under (C1)). In [7] an approach was then introduced to provide assortment and allow the gene A to invade when rare. The authors concluded that such invasion by gene A could only occur under very restrictive conditions, and that therefore this model and the corresponding notion of many-individuals-tit-for-tat were of marginal relevance. One of our contributions in the current paper is to rectify this perception. In our framework, we obtain values on large enough to indicate that this model should be seriously considered as a possible mechanism for the spread of altruistic genes (see Fig. 6 and Fig. 15). Indeed we will see that the estimates in [7] contained an unreasonably pessimistic assumption that is not supported in our framework (see the last paragraph in Section 5).
From a theoretical point of view, this model is still mathematically simple enough to lead to an interesting detailed analysis of the conditions under which the gene A can spread, in case selection is weak and the group size is large, with the threshold proportional to . This analysis (illustrated in Fig. 16, Fig. 17, Fig. 18, Fig. 19 and Fig. 20) will be a good illustration of the simplifications that will be obtained, in Section 5, in that regime.
We see this example as the prototype for an important class of models, that we make explicit in Example 6 below, and that we believe should be seriously considered and studied.
Example 3. Threshold model:
for positive constants , and , and an integer . The idea here is simple: the gene A carries a cost, but allows its carriers to gain benefits if sufficiently many are in the group. Nonaltruists obtain benefits also when altruists do, but we allowed for the possibility that those are smaller or larger than those of the altruists.
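A minimal sketch of a threshold model of this kind is given below. All names are hypothetical: `cost` is the cost of carrying the gene, `altruist_gain` and `nonaltruist_gain` are the benefits obtained once enough altruists are present, and the threshold is assumed to be met when the number `k` of altruists in the group satisfies `k >= threshold`.

```python
def threshold_payoffs(k, cost, altruist_gain, nonaltruist_gain, threshold):
    """(altruist, nonaltruist) payoffs in a group with k altruists."""
    active = k >= threshold  # assumed form of the threshold condition
    altruist = -cost + (altruist_gain if active else 0)
    nonaltruist = nonaltruist_gain if active else 0
    return altruist, nonaltruist


# Hypothetical values: below the threshold altruists only pay the cost,
# at or above it both types gain.
below = threshold_payoffs(2, cost=1, altruist_gain=5, nonaltruist_gain=4, threshold=3)
at_threshold = threshold_payoffs(3, cost=1, altruist_gain=5, nonaltruist_gain=4, threshold=3)
```

Making the two gains separate parameters reflects the possibility, allowed in the example, that the benefits to nonaltruists are smaller or larger than those to altruists.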
This model may be seen as a simplification of Example 2. It shares with it the features that when few altruists are present, they incur costs, but when in numbers larger than a threshold, have positive payoffs that may be much larger than those costs. The fitnesses in this model are sufficiently simpler than those in Example 2, to allow for a more transparent analysis of its behavior. We will see that this model is of great value when we discuss conceptual issues, including Hamilton’s rule. In [70], the case of this model (called there “stag hunt game”) was discussed in connection to the conceptual issue of the role of Hamilton’s rule. This raised a debate in [44] and [71] and further analysis in [20]. We will comment on this at the end of Section 3.
Example 3 is also of great value for comparison purposes, providing meaningful bounds on the behavior of more elaborate and realistic models. For instance, the very elaborate fitness functions studied in [6] are well approximated by those in Example 3. We are currently reanalyzing the work and ideas from [6] in the context of our framework, and taking advantage of this relationship in that project ([61]).
This model can also be seen as a simple instance of another natural class of models that we introduce below, in Example 5.
If , then either (C1) or (C2) is violated, since then . So we suppose that . Under this assumption, (C1) is immediate from , and we suppose that , so that (C2) is also satisfied. Conditions (C4) and (C5) are clearly satisfied then. Condition (C7) will be satisfied in case . We have , if , and , if . So (C3) (and therefore also (C6)) may not be satisfied. (C3) holds, nevertheless, if . (But even under this assumption, (C6) fails, unless .) As for (C8), it fails, regardless of the value of , in the same fashion that it failed (in general) in Example 2, since .
Numerical results for Example 3 appear in Fig. 7, Fig. 8 and Fig. 14. As with Example 2, Example 3 also nicely illustrates the simplifications that will be obtained, in Section 5, when is small and is large (see (56) and (57)).
Example 4. Additive pairwise interactions (general linear fitness functions):
Here we suppose that members of the group interact in a pairwise manner throughout their lives. Each such pair interaction contributes a certain amount to the total payoff of each one of the two individuals. The contribution from each pairwise interaction to each one of the two participants depends only on their types. The payoff to a type interacting with a type will be denoted , = A,N. In addition, each individual has a self contribution to its payoff, which depends only on its type = A,N. These contributions are added to produce the final payoff. The result is displayed above, and then rewritten in terms of , , , where we incorporated the assumption that
This assumption carries no loss of generality, since a constant can be added to all the payoffs and without modifying the behavior of our process. (The behavior of the process is clearly not modified by multiplying all fitnesses, , , by the same constant. If we add a constant to all the payoff functions , , the new fitnesses are equivalent in this sense to the old fitnesses with replaced by .) This condition amounts simply to our convention that . Expressing the fitness functions in terms of , and , makes their relationship with Example 1 and Example 5 below easy to see. Finally the result is also rewritten in an equivalent form that emphasizes the nature of the dependence of the fitnesses on : they are linear functions. Here , and .
An important point to make is that given linear fitness functions and , with , as above, they can always be represented in the other two ways, with an appropriate choice of the constants. For instance, we can take , and , and then take , , , .
When , the current example is the most general possible choice of the payoff functions and . But this is obviously not the case when . For each value of , the most general payoff functions are polynomials of degree .
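The way pairwise additive interactions produce payoffs that are linear in the number of altruists can be illustrated as follows. This is a sketch under assumed notation: `pair_payoff[x, y]` is the payoff to a type `x` from its interaction with a type `y`, `self_payoff[x]` is the self contribution of type `x`, and the numerical values are arbitrary and do not follow any normalization convention.

```python
def pairwise_additive_payoffs(k, group_size, pair_payoff, self_payoff):
    """(altruist, nonaltruist) payoffs in a group of `group_size` members with k altruists."""
    # An altruist interacts with k - 1 other altruists and group_size - k nonaltruists.
    altruist = (self_payoff["A"]
                + (k - 1) * pair_payoff["A", "A"]
                + (group_size - k) * pair_payoff["A", "N"])
    # A nonaltruist interacts with k altruists and group_size - k - 1 other nonaltruists.
    nonaltruist = (self_payoff["N"]
                   + k * pair_payoff["N", "A"]
                   + (group_size - k - 1) * pair_payoff["N", "N"])
    return altruist, nonaltruist


# Hypothetical payoff values.
pair_payoff = {("A", "A"): 4, ("A", "N"): -1, ("N", "A"): 3, ("N", "N"): 0}
self_payoff = {"A": -2, "N": 0}
group_size = 6

values = [pairwise_additive_payoffs(k, group_size, pair_payoff, self_payoff)
          for k in range(group_size + 1)]
# Constant increments in k confirm that both payoffs are linear functions of k.
altruist_steps = {values[k + 1][0] - values[k][0] for k in range(group_size)}
nonaltruist_steps = {values[k + 1][1] - values[k][1] for k in range(group_size)}
```

The formulas are evaluated for every `k` for convenience, although an altruist payoff is only meaningful for `k >= 1` and a nonaltruist payoff for `k <= group_size - 1`.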
The linearity of the functions and will play important roles in relating our results to other concepts, especially Hamilton’s rule. This is one of the reasons this is a major class of models. The mathematical equivalence between these linearities and having pairwise additive interactions should not confuse one into thinking that the linearities imply that the fitnesses must indeed have originated from that special kind of interaction. In Example 1, the members of a group may be interacting in a collective way (hunting together, warfare, etc). All that the mathematical equivalence says is that the fitnesses obtained there are the same as those of a pairwise interaction scenario, which is fictitious in this case. This is a good illustration of two formally equivalent, but materially different models.
There are realistic stories that are conceptually associated to the public goods game, Example 1, but lead to the more general payoff in the current example, with . For instance, the altruistic activity could be hunting more dangerous but also more nutritional prey. If the products of the hunts are always shared by the group, we have the model in Example 1. But if the hunters are able to consume the best part of the hunt, before sharing the rest with the group, we would have .
While materially different from the public goods game, Example 1, the current example is formally equivalent to it when the following equivalent conditions hold:
(13) 
Condition (13) is known as “equal gains from switching”, since the payoffs in the matrix , A, N, change by the same amounts if one switches strategies, regardless of what the other player is doing. Unfortunately the terminology “additivity condition” or “linearity condition” is also used in the literature for (13). This is confusing, since in our context, additivity refers to the fact that the payoffs and are obtained additively over the pairwise interactions that each individual has with the other members of its group. This additivity has no relationship with (13). And in our context, linearity refers to the linearity of and as functions of . As we explained above, in the mathematically standard way in which we are using the terms pairwise additivity and linearity, they are equivalent to each other, and logically independent of (13).
Under condition (13), it is common to use the representation , , , . When , this is a classical prisoner’s dilemma. It corresponds to , .
The matrix represents the lifetime payoff for each pairwise interaction. This lifetime payoff may result from the accumulation of payoffs from iterated games. In this way we can see that the setup in this example is flexible enough to accommodate a gene A that produces a conditional behavior over such iterated games, such as a tit-for-tat strategy. For this, suppose that each pair of individuals interacts repeatedly with payoffs given by the standard prisoner’s dilemma matrix. If type N always defects, and type A uses a tit-for-tat strategy, we have , , , , where is the average number of repetitions of the basic interaction over a lifetime. If , (13) fails, and we have
(14) 
It is common to write and call it a “synergy” term. It represents an additional benefit (possibly negative) to altruists when interacting with other altruists. In the iterated pairwise prisoner’s dilemma game with A playing tit-for-tat, (14), we have .
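The synergy produced by tit-for-tat in the iterated pairwise game can be worked out numerically. The sketch below makes two assumptions that should be read as such: the one-shot prisoner's dilemma is taken in the form in which a cooperator pays `cost` and its partner receives `benefit`, and the synergy is measured as the deviation from equal gains from switching. The parameter values are arbitrary.

```python
def tit_for_tat_matrix(benefit, cost, mean_rounds):
    """Lifetime pairwise payoffs when type A plays tit-for-tat and type N always defects."""
    return {
        ("A", "A"): mean_rounds * (benefit - cost),  # mutual cooperation in every round
        ("A", "N"): -cost,                           # cooperates once, then stops
        ("N", "A"): benefit,                         # gains only from the first round
        ("N", "N"): 0,
    }


def synergy(matrix):
    """Deviation from equal gains from switching (zero exactly when the gains are equal)."""
    return matrix["A", "A"] - matrix["A", "N"] - matrix["N", "A"] + matrix["N", "N"]


# Hypothetical values: benefit 3, cost 1, and either 10 rounds or a single round.
synergy_iterated = synergy(tit_for_tat_matrix(benefit=3, cost=1, mean_rounds=10))
synergy_one_shot = synergy(tit_for_tat_matrix(benefit=3, cost=1, mean_rounds=1))
```

Under these assumptions the synergy equals `(mean_rounds - 1) * (benefit - cost)`, so it vanishes for a single round and grows with the number of repetitions.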
Deciding when each one of the conditions (C1)-(C8) holds in the current example is tedious and not so relevant. We observe only a few facts. Assuming s (C1), (C2) and (C4), that only depend on . If , then also (C5) holds. The other conditions depend on how relates to and . We just make the simple remark that if and is close enough to , then all the conditions (C1)-(C8) hold, since they hold with slack when (Example 1).
In Example 1, we observed that the assumed linearity of and there may not be realistic. The same observation holds about pairwise additivity of interactions. Many interactions in a group are between pairs, but it is not always clear that their effects on fitness should be additive. When an individual interacts repeatedly with another member of the group, like in the story that led to (14), it may not be able to interact as often with the other members of the group. Also, the beneficial effects of the pairwise interactions may saturate, and be subadditive, rather than additive. Additivity/linearity is mathematically a natural first level simplification/approximation. But one should be aware of its limitations. With this in mind, we turn to the next example.
Example 5. Variable costs and benefits:
Remarks in Examples 1 and 4, above, motivate this class of examples. Without further assumptions on the cost and benefit functions, , and , any model can be fit into this form. So what we are proposing here is first a convenient notation for comparing models. Next we discuss some interesting assumptions on the cost and benefit functions.
It is very natural, in various applications, to assume that is nonincreasing and and are nondecreasing. If the gene A prompts its carriers to act in some collective way, it is often the case that the cost to each participant decreases with the number of participants, while the total benefits produced grow faster than linearly with the number of participants. This is called an increasing return to scale. Reasonable assumptions can be that decreases as a power of , , for some constant , while and , with , , , .
Another distinct assumption on the benefit functions is that and first grow slowly with , then steeply (close to a threshold value of ) and then more slowly again, as the gains from scale saturate. For instance this will happen if , , with positive constants , , , .
An interesting class of models covered by the current example is the subject of [29]. Experimental results with microbes often indicate the need to consider nonlinear payoff functions, such as those discussed in the current example; see, for instance, [8] and [63].
In case of a collective action that requires a minimum number of participants, we should have , for . And for larger values of , and should grow, but again typically not linearly. For instance, gene A could promote a behavior that can only be implemented in groups of at least 4 individuals, say. This could be a type of large game hunt that requires 4 hunters. Gene A causes changes to the individual’s phenotype that make this kind of hunt possible, but at the expense of adding expensive muscular and/or brain tissue. Types N just hunt small game individually. We suppose that the hunters share their product with the group, and that large game produces greater benefits per person in the group than small game hunt. How will and grow when ? The answer will depend on ecological conditions, and detailed aspects of the hunting technique. Can several different groups hunt simultaneously? Would the hunt be more efficient with 7 hunters than with 4? If most group members are hunting large game, would the productivity of small game hunt increase so as to make it advantageous for the group to combine both types of hunt? And so on.
Example 6. Iterated game. Altruists cooperate conditionally, based on feedback:
Here can be seen as an average number of repetitions of a basic activity. This class of models builds on the models in Example 5 in a fashion that generalizes the way Example 2 was built on Example 1. We are supposing that a certain activity presents itself to the group periodically. The output each time depends on the behavior of the group members, and gene A modifies this behavior. Carriers of gene A behave first in a way that is beneficial to the group. Afterwards, they will or will not continue acting in this way, depending on feedback that they receive. For instance, if the activity is a type of collective hunt, they will have feedback as they consume the product of the hunt. In Example 2, the feedback was a count of the number of participants, and this is also a possibility, but not the only one. In Example 2, was a step function, jumping from to , at . But it seems also natural to consider smoother functions , that increase first slowly, then steeply, and then saturate. This could result from the fact that the feedback, from each repetition of the activity, is subject to random noise, and only gives clear cues to the altruists outside of a critical window of values of . For values of in the critical window, altruists may repeat the activity a few times, before deciding to stop participating.
Models represented in Example 5, with nonincreasing and nondecreasing , have a natural threshold value of , where their payoff to altruists becomes positive. In Example 1, this threshold value , corresponds to the condition (12) that appears then in Example 2. If the feedback that affects the willingness of altruists to continue participating in the activities promotes a function that (as in Example 2, with assumption (12)) starts growing above that threshold, the resulting behavior becomes more adaptive than it would be without feedback effect.
Mathematically the models in the current example can be incorporated in Example 5, by modifying the definition of , and . But the point of the current example is to provide justification for, and a mechanism behind, a class of models in which as increases, switches from small negative values to much larger positive values after a threshold value of is crossed. These models can be approximated by and/or compared to the simple case given in Example 3. It then becomes natural to analyze that model with ratios which can be as large as 100, or 1000. For instance, suppose the species under consideration to be early humans, with an adult reproductive lifespan of over 20 years. If the activity in consideration is repeated with a frequency of 50 per year, and if with negative feedback altruists stop participating after about 10 repetitions, then we can consider a factor of between with large and with small .
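The arithmetic behind the early-human illustration above can be spelled out in a few lines. The variable names are hypothetical, and the sketch assumes that the relevant factor is the ratio between the lifetime number of repetitions under sustained participation and the number of repetitions before altruists stop.

```python
reproductive_years = 20           # adult reproductive lifespan (the text says "over 20 years")
repetitions_per_year = 50         # frequency of the activity
repetitions_before_stopping = 10  # repetitions under negative feedback

lifetime_repetitions = reproductive_years * repetitions_per_year  # 1000
ratio = lifetime_repetitions / repetitions_before_stopping        # 100.0
```

Since the lifespan is stated as a lower bound, this ratio should be read as a lower bound as well.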
Obviously the perceived feedback cannot be the payoff in the model, which is related to the expected number of offspring in the future. But it is sufficient that the feedback be strongly correlated with this payoff. This is not a special problem about altruism: when behavior is mediated by feedback, natural selection will align reaction to feedback with fitness.
The combination of increasing returns to scale, with the possibility of only pursuing the behavior when it is advantageous to the actors, due to a high level of collective participation, may have been a powerful set of mechanisms that led to the evolution of altruistic and cooperative social traits. This idea has, for instance, been explored in [6], where a model of that nature was analyzed in connection with the trait of punishing, at a cost, those that do not cooperate with the group. In that model, types A produce first a costly signal announcing that they are willing to participate in costly punishment. But they only implement the punishment if a sufficiently large number of group members signaled their willingness to do so too.
Another story that fits with increased returns to scale, modified by feedback induced discontinuation, can be suggested for the emergence of “compassionate feelings” for group members. A gene that promotes those emotions would cause its carriers to help fellow group members in need, at a cost to themselves. While there are few group members carrying this gene, they will not be able to do much for all those that may need help, in a large group. Feedback, in the form of frustration for not being able to help as they want, may lead them to discontinue their helping activities, in a short while. But with enough carriers of that gene in a group, they become able to successfully provide help to all that need it. Under this condition they pursue this activity throughout their lives. At the same time, under these conditions, the altruists themselves have their fitnesses increased substantially, from living in a helping environment. This suggests that this story may be represented by a payoff function of the type that we are introducing in the current example, with that can reasonably be taken to vary by a factor of 100 or more, as a threshold window is crossed.
3 Conceptual discussion
We start next on a long conceptual discussion of our results, and how they may help clarify some controversial issues related to the emergence of altruism through natural selection.
First we consider how our results and the underlying mechanisms revealed by them relate to “group selection”, “multilevel selection” and “kin selection”. This is not the place, nor do we have the expertise, to discuss in detail the various nuances of the semantics involved in these questions. But a few words are in order, and should be of value to the readers.
The use of the expression “group selection” has changed over the years, and is still somewhat controversial. Nevertheless it seems to us that under any reasonable use of that expression, group selection is an important force operating in our setting. In our framework, individuals belong to groups, groups compete among themselves, and in this way the fitness of individuals depends on the constitution of their groups. In our typical examples, the fitness of each altruist is strictly smaller than the fitness of the nonaltruists in its group (Condition (C7)). It is only through the higher fitness of groups with many altruists, that the altruistic gene A is then able to survive and spread. Individuals also compete for reproduction with other members of their own group, so that in our setting group selection is one of the components of a “multilevel selection” process.
Is the mechanism of kin selection present when the mutant A spreads? We understand kin selection as a process in which copies of a gene, originating from a recent common ancestor, interact with each other providing themselves with an average fitness large enough for this gene to survive and spread. This is precisely what happens in our case, and is also what we tried to capture in more detail in the viability criterion and associated survival mechanism presented at the end of Section 1. In other cases, the assortment and organization of the genes could be caused by kin recognition. In our case it is caused by viscosity, that results from the group structure of the population and limited migration. In a model in which an isolated mutant has relative fitness lower than the wild type (condition (C1) in our case), its only hope for survival is in the creation by chance of a few of its copies, that happen to be so arranged that they have higher average fitness than the wild type in the population at large, and for the structure of the population to be such that this gene can then spread by what we called survival mechanism.
In this connection, condition (11) can be seen as providing a “gene’s eye view” of viability in our setting. As noted above, the left hand side in this condition is simply the average fitness of the gene, when it is arranged in the best possible stable way, to assure its spreading. It is common to refer to the average fitness of a gene A as its neighbor modulated fitness. Under certain conditions it is known that the neighbor modulated fitness can be computed also as an “inclusive fitness”, in which one adds the effects of a randomly selected gene A from the population on the other genes A. At this point, we are not sure how far this method can be extended. (See [24] for a general investigation of this question.) We will address later in this paper the issue of the validity of the related Hamilton rule in our setting, but we will defer the answer to the question of when the neighbor modulated fitness of the gene A can also be computed as an inclusive fitness to a later investigation.
It is worth also clarifying that in our view multilevel selection and kin selection are not the same concept, even if both are central to our study. We see multilevel selection as a process in which the demographics and/or the biology (including behavior) associates individuals to groups in such a way that the reproductive success of each individual depends on the composition of its group. Kin selection can happen, as in our framework, in a multilevel selection setting. But it can also happen in populations that are organized in other ways, in which individuals are not sorted into groups. Moreover, in our framework, in the late stage, when types A exist in numbers comparable with , multilevel selection will continue to be a basic driving force acting on the population. But the average fitness of types A will no longer result only from their interaction with other types A that are close kin. Whether one should still refer to kin selection as an important force then is an issue that we postpone to the paper in preparation in which we study the late stage ([60]).
In the way that we use the expressions “multilevel selection” and “kin selection” in the discussion above, they are qualitative concepts, rather than computational or accounting procedures. These concepts are nevertheless sometimes associated with certain computational procedures: Multilevel selection is sometimes associated with the Price equation, and kin selection is often associated with either neighbor modulated fitness or inclusive fitness computations. But especially in an area in which semantic issues are a source of difficulties, one should carefully separate concepts from computational procedures. As we explained above, while the concept of neighbor modulated fitness fits easily into our framework, it is currently not clear to us that the concept of inclusive fitness could fit as well. The Price equation obviously applies in our framework, since it requires minimal conditions, and is a natural, mathematically rigorous tool to consider when studying group-structured populations. We will elaborate below on what it adds to the solution of the specific problems that we are addressing in this paper, and why we did not use it in our analysis in Section 1.
To decide on the viability of the mutant A in our framework, we see no simpler mathematical method (when selection is strong – we will study the important simplifications in case of weak selection later in this paper) than computing or . This does not mean that alternative or complementary methods cannot add insights, intuition and relevant information. Moreover, it is important to understand how different approaches relate to each other, and computational parameters relate to experimentally accessible variables. With this in mind, we turn now to a discussion of how our mathematical approach compares to the use of the Price equation, and to the role of Hamilton’s rule in our setting.
Before proceeding we need to introduce some more notation:
This means that is the distribution of the various types of groups, including groups of type 0, in generation , and is the fraction of altruists in the population then.
The Price equation provides the expected value of , when is given. It can be stated in several mathematically equivalent versions. In our setting the simplest one is
(15) 
where
are, respectively, the average fitness of the individuals in the population and the average fitness of the altruists in the population, when the distribution of the group types is given by . Equation (15) is an immediate consequence of the fact that each individual has an expected number of offspring proportional to its relative fitness. By adding and subtracting terms, it can be rewritten as
(16) 
where is a random variable with . In the E.S., since the fraction of altruists is negligible in the large population, we have in good approximation , reducing (15) to
(17) 
Can (15) or the equivalent (16), or the simplified (17) be used to predict when altruism can survive in our setting? The answer is negative. In generation we have , and the average fitness of A, which is simply the fitness of the only A present, is , under condition (C1). Therefore (17) leads to for any value of , as we already knew. If one wants to iterate (17) over time, in order to learn under what conditions eventually increases and does not vanish, one needs to be able to compute , , … etc. The well-known problem is that (17) does not provide information about given . Actually (17) carries no information at all about and how the groups are formed in generation .
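The limitation just described can be illustrated with a small numerical sketch. It relies only on the fact stated above, that each individual has an expected number of offspring proportional to its relative fitness, so that the expected next fraction of altruists equals the current fraction times the ratio of the average fitness of the altruists to the average fitness of the population. Everything else is an assumption made for illustration: the function names, the group size, and the linear public-goods fitness functions `w_A` and `w_N` are hypothetical and are not the fitness functions of any model in this paper.

```python
from fractions import Fraction

def expected_next_fraction(group_counts, n, w_A, w_N):
    """One-step expected fraction of altruists.

    group_counts[k] is the number of groups with k altruists (k = 0..n),
    n is the group size, and w_A(k), w_N(k) are the fitnesses of an
    altruist and a non-altruist in a group with k altruists.
    Returns (current fraction, expected next fraction).
    """
    total = n * sum(group_counts)
    total_A = sum(k * g for k, g in enumerate(group_counts))
    # Total fitness carried by altruists and by non-altruists.
    fit_A = sum(k * g * w_A(k) for k, g in enumerate(group_counts))
    fit_N = sum((n - k) * g * w_N(k) for k, g in enumerate(group_counts))
    p = Fraction(total_A, total)
    mean_w = (fit_A + fit_N) / total   # average fitness in the population
    mean_w_A = fit_A / total_A         # average fitness of the altruists
    return p, p * mean_w_A / mean_w

# Hypothetical fitness functions (assumption): benefit b shared in the
# group, cost c paid by each altruist.
n, b, c = 4, Fraction(2), Fraction(1, 2)
w_A = lambda k: 1 - c + b * Fraction(k, n)
w_N = lambda k: 1 + b * Fraction(k, n)

# Ten groups, four altruists in total, arranged in two different ways.
clustered = [9, 0, 0, 0, 1]   # all four altruists in one group
dispersed = [6, 4, 0, 0, 0]   # one altruist in each of four groups

p1, next1 = expected_next_fraction(clustered, n, w_A, w_N)
p2, next2 = expected_next_fraction(dispersed, n, w_A, w_N)
print(p1, next1)   # 1/10 5/23
print(p2, next2)   # 1/10 2/23
```

Both arrangements have the same current fraction of altruists, yet the expected next fraction differs, increasing in one case and decreasing in the other. The one-step relation therefore cannot be iterated from the fraction of altruists alone: the full distribution of group types is needed at every step.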
The Price equation in its various forms is a useful tool for many purposes. When the right hand side is split, as in (16), into two terms that correspond respectively to intergroup and intragroup competition, it carries great heuristic power, and beauty. We hope that, nevertheless, our current study may help clarify some of its limitations in the analysis of evolution in group structured populations. It is interesting to look into what (17) tells us in combination with what we already know about the evolution of our population. When the gene A survives, in the S.E.S., (9) implies that , where is time dependent and random, but onedimensional. Therefore , thanks to (10). We obtain now from (17)
This is compatible with the growth rate given by (6) and (9), and is not new information to us, but the consistency is reassuring. When , first decreases, but then eventually increases, reaching growth rate in the S.E.S. As explained above, in this case either the gene A dies out early on, or else natural selection organizes it in groups according to the distribution , that is stationary for the evolution driven by and maximizes the rate of growth of among all such stationary distributions. The power of (5) over (17) is that it provides the evolution of the whole distribution over group types, not just the fraction of altruists in the population.
Next we address the question of whether Hamilton’s rule applies in our setting, to provide the condition under which altruism, and other genetically determined behaviors that are costly to the actor, can spread. The answer here greatly depends on what one means by “Hamilton’s rule”. In its broadest sense, it is natural to use this title for any inequality that is a necessary and sufficient condition for altruism to have a positive probability of spreading, in other words, for any “viability condition”. With this interpretation, (11) is a generalized Hamilton rule that is universal in our framework, when started with a single altruist. In this paper we will use the terminology “generalized Hamilton rule”, or “generalized Hamilton condition”, and “viability condition” with equivalent meanings (some may prefer “survivability condition”). We hope that rather than creating confusion, this usage will shift the discussion from a semantic matter into questions with scientific content: How does the viability condition (11) simplify in special cases? Can it be formulated in terms of concepts of relatedness, costs and benefits? Can it be formulated in terms of experimentally meaningful variables? Can it be written in ways that help compare different models?
We should clarify that here we are considering the question of when the altruistic gene A is viable, starting from a single copy of it. Conditions for selection to favor gene A when the process is started with a number of copies of A comparable to are a different problem, which we will address when studying the evolution in our framework in the late stages ([60]). Here we will only address this further question in the important special case of pairwise additive interactions, Example 4, so as to illustrate how things can change when the gene A becomes common in the population.
Before we can analyze how (11) compares in spirit and content with more traditional forms of Hamilton’s rule, we need to introduce and review several additional concepts.
Suppose that in generation , a group is chosen at random, and its members are ordered in a random fashion. The chosen group is called the focal group, the first individual in the ordering is called the focal individual and the second individual in the ordering is called the cofocal individual. Note that this random experiment is equivalent to choosing a focal individual at random and then choosing a cofocal individual at random from the other individuals in the focal’s group. We will denote by probabilities that refer to the sampling just described, and use for corresponding expected values. We denote by the event that the j-th individual in the random ordering of the members of the focal group is of type A. And we denote by the indicator of the event , i.e., is the random variable that takes value 1 if occurs and value 0 if its complement, , occurs. We denote by the random number of types A in the focal group. And we set , for the random fraction of altruists in the focal group.
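The equivalence of the two sampling procedures just described can be checked by exact enumeration on a toy population. The following is a minimal sketch; the function names and the three small groups are hypothetical and chosen only for illustration. It relies on all groups having the same size, as in our framework.

```python
from fractions import Fraction
from itertools import permutations
from collections import Counter

def by_group_then_ordering(groups):
    """Choose a group uniformly, then a uniformly random ordering of its
    members; return the joint law of (focal type, cofocal type)."""
    dist = Counter()
    for members in groups:
        orderings = list(permutations(members))
        for order in orderings:
            dist[(order[0], order[1])] += Fraction(1, len(groups) * len(orderings))
    return dist

def by_focal_then_cofocal(groups):
    """Choose a focal individual uniformly from the population, then a
    cofocal uniformly among the other members of the focal's group."""
    dist = Counter()
    n = len(groups[0])                 # common group size
    total = n * len(groups)
    for members in groups:
        for i, focal in enumerate(members):
            for j, cofocal in enumerate(members):
                if i != j:
                    dist[(focal, cofocal)] += Fraction(1, total * (n - 1))
    return dist

# Toy population (assumption): three groups of size 3, "A" marks an altruist.
groups = [("A", "A", "N"), ("A", "N", "N"), ("N", "N", "N")]

d1 = by_group_then_ordering(groups)
d2 = by_focal_then_cofocal(groups)

# Probability that the focal individual is of type A.
focal_is_A = sum(prob for (focal, _), prob in d1.items() if focal == "A")
print(d1 == d2)      # True
print(focal_is_A)    # 1/3
```

In this toy population three of the nine individuals are altruists, and the probability that the focal individual is of type A is exactly that fraction.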
Clearly does not depend on and is the fraction of altruists in generation . Linearity of expectations yields the following relationships, which will be of great use:
(18) 
(19) 
(20) 
For in the S.E.S., we have the following fundamental relationship
(21) 
for , where the second equality introduces a new notation. To see this, first note that conditioning on implies that the gene A has survived into the S.E.S., and therefore (9) holds. The sampling, at time , is therefore of a population that has mostly groups with no altruists, but also has a large number (of order ) of groups with altruists, distributed according to . If the conditioning were on , rather than on , the conditional probability would be . We would simply be sampling without bias from the groups with altruists. But conditioning on introduces size bias: For ,