Host-pathogen coevolution and the emergence ofbroadly neutralizing antibodies in chronic infections

Host-pathogen coevolution and the emergence of
broadly neutralizing antibodies in chronic infections

Armita Nourmohammad, Jakub Otwinowski, Joshua B Plotkin Joseph-Henri Laboratories of Physics and Lewis-Sigler Institute for Integrative Genomics,
Princeton University, Princeton, NJ 08544, USA
Department of Biology, University of Pennsylvania, Philadelphia, PA 19104

The vertebrate adaptive immune system provides a flexible and diverse set of molecules to neutralize pathogens. Yet, viruses such as HIV can cause chronic infections by evolving as quickly as the adaptive immune system, forming an evolutionary arms race. Here we introduce a mathematical framework to study the coevolutionary dynamics of antibodies with antigens within a host. We focus on changes in the binding interactions between the antibody and antigen populations, which result from the underlying stochastic evolution of genotype frequencies driven by mutation, selection, and drift. We identify the critical viral and immune parameters that determine the distribution of antibody-antigen binding affinities. We also identify definitive signatures of coevolution that measure the reciprocal response between antibodies and viruses, and we introduce experimentally measurable quantities that quantify the extent of adaptation during continual coevolution of the two opposing populations. Using this analytical framework, we infer rates of viral and immune adaptation based on time-shifted neutralization assays in two HIV-infected patients. Finally, we analyze competition between clonal lineages of antibodies and characterize the fate of a given lineage in terms of the state of the antibody and viral populations. In particular, we derive the conditions that favor the emergence of broadly neutralizing antibodies, which may be useful in designing a vaccine against HIV.


It takes decades for humans to reproduce, but our pathogens can reproduce in less than a day. How can we coexist with pathogens whose potential to evolve is -times faster than our own? In vertebrates, the answer lies in their adaptive immune system, which uses recombination, mutation, and selection to evolve a response on the same time-scale at which pathogens themselves evolve.

One of the central actors in the adaptive immune system are B-cells, which recognize pathogens using highly diverse membrane-bound receptors. Naive B-cells are created by processes which generate extensive genetic diversity in their receptors via recombination, insertions and deletions, and hypermutations Janeway et al. (2005) which can potentially produce variants in a human repertoire Elhanati and et al. (2015). This estimate of potential lymphocyte diversity outnumbers the total population size of B-cells in humans, i.e.,  Trepel (1974); Glanville and et al. (2009). During an infection, B-cells aggregate to form germinal centers, where they hypermutate at a rate of about per base pair per cell division over a region of 1-2 kilo base pairs Odegard and Schatz (2006). The B-cell hypermutation rate is approximately orders of magnitude larger than an average germline mutation rate per cell division in humans Campbell and Eichler (2013). Mutated B-cells compete for survival and proliferation signals from helper T-cells, based on the B-cell receptor’s binding to antigens. This form of natural selection is known as affinity maturation, and it can increase binding affinities up to 10-100 fold  Meyer-Hermann and et al. (2012); Victora and Nussenzweig (2012); Cobey et al. (2015), see Fig. 1A. B-cells with high binding affinity may leave germinal centers to become antibody secreting plasma cells, or dormant memory cells that can be reactivated quickly upon future infections Janeway et al. (2005). Secreted antibodies, which are the soluble form of B-cell receptors, can bind directly to pathogens to mark them for neutralization by other parts of the immune system. Plasma B-cells may recirculate to other germinal centers and undergo further hypermutation Victora and Nussenzweig (2012).

Some viruses, such as seasonal influenza viruses, evolve quickly at the population level, but the adaptive immune system can nonetheless remove them from any given host within a week or two. By contrast, chronic infections can last for decades within an individual, either by pathogen dormancy or by pathogens avoiding neutralization by evolving as rapidly as B-cell populations. HIV mutation rates, for example, can be as high as per generation per genome Duffy et al. (2008). Neutralizing assays and phylogenetic analyses suggest an evolutionary arms race between B-cells and HIV populations during infection in a single patient Richman et al. (2003); Frost and et. al. (2005); Moore and et al. (2009); Liao and et al. (2013); Luo and Perelson (2015a). Viruses such as HIV have evolved to keep the sensitive regions of their structure inaccessible by the immune system e.g., through glycan restriction or immuno-dominant variable loops Kwong and et al. (2002); Lyumkis and et al. (2013). As a result, the majority of selected antibodies bind to the most easily accessible regions of the virus, where viruses can tolerate mutations and thereby escape immune challenge. Nonetheless, a remarkably large proportion of HIV patients () eventually produce antibodies that neutralize a broad panel of virions Simek and et al. (2009); Doria-Rose and et al. (2010) by attacking structurally conserved regions, such as the CD4 binding site of HIV env protein Walker and et al. (2009); Chen and et al. (2009); Zhou and et al. (2010); Walker and et al. (2011); Liao and et al. (2013). These broadly neutralizing antibodies (BnAbs), can even neutralize HIV viruses from other clades, suggesting it may be possible to design an effective HIV vaccine if we can understand the conditions under which BnAbs arise Walker and et al. (2009, 2011); Mouquet and Nussenzweig (2013); Kwong et al. (2013); Klein and et al. (2013); Liao and et al. (2013); Wang and et al. (2015).

Recent studies have focused on mechanistic modeling of germinal centers in response to one or several antigens Meyer-Hermann and et al. (2012); Childs et al. (2015), and elicitation of BnAbs Chaudhury et al. (2014); Wang and et al. (2015). However, these studies did not model the coevolution of the virus and B-cell repertoire, which is important to understand how BnAbs arise in vivo. Modeling of such coevolution is difficult because the mechanistic details of germinal center activity are largely unknown Luo and Perelson (2015a, b), and the multitude of parameters make it difficult to identify generalizable aspects of a model. While evidence of viral escape mutations and B-cell adaptation has been observed experimentally Richman et al. (2003); Frost and et. al. (2005); Moore and et al. (2009); Liao and et al. (2013) and modeled mechanistically Chaudhury et al. (2014); Wang and et al. (2015), it is not clear what are the generic features and relevant parameters in an evolutionary arms race that permit the development, or, especially, the early development of BnAbs. Phenomenological models ignore many details of affinity maturation and heterogeneity in the structure of germinal centers and yet produce useful qualitative predictions Perelson (2002); Luo and Perelson (2015a, b). Past models typically described only a few viral types Wang and et al. (2015); Childs et al. (2015), and did not account for the vast genetic diversity and turnover seen in infecting populations. A recent study by Luo & Perelson Luo and Perelson (2015b) described diverse viral and antibody populations, relying primarily on numerical simulations.

In this paper, we take a phenomenological approach to model the within-host coevolution of diverse populations of B-cells and chronic viruses. We focus on the chronic infection phase, where the immune response is dominated by HIV-specific antibody-mediated mechanisms, which follow the strong response by the cytotoxic T-lymphocytes (i.e., CD8 killers T-cells), around 50 days after infection McMichael and et. al. (2009). During the chronic phase, population sizes of viruses and lymphocytes are relatively constant but their genetic compositions undergo rapid turnover Shankarappa and et al. (1999). We characterize the interacting sites of B-cell receptors and viruses as mutable binary strings, with binding affinity, and therefore selection, defined by matching bits. We keep track of both variable regions in the viral genome and conserved regions, asking specifically when B-cell receptors will evolve to bind to the conserved region, i.e., to develop broad neutralization capacity. The main simplification that makes our analysis tractable is that we focus on the evolution of a shared interaction phenotype, namely the distribution of binding affinities between viral and receptor populations. Specifically, we model the effects of mutations, selection and reproductive stochasticity on the distribution of binding affinities between the two populations. Projecting from the high-dimensional space of genotypes to lower dimension of binding phenotypes allows for a predictive and analytical description of the coevolutionary process Nourmohammad et al. (2013a), whilst retaining the salient information about the quantities of greatest biological and therapeutic interest.

Using this modeling approach we show that the evolution of the binding affinity does not depend on details of any single-locus contribution, but is an emerging property of all constitutive loci. Even though the coevolution of antibodies and viruses is perpetually out of equilibrium, we develop a framework to quantify the amount of adaptation in each of the two populations by defining fitness and transfer flux, which partition changes in mean fitness. We discuss how to measure the fitness and transfer flux from time-shifted experiments, where viruses are competed against past and future antibodies, and we show how such measurements provide a signature of coevolution. We use these analytical results to interpret empirical measurements of time-shifted neutralization assays from two HIV-infected patients Richman et al. (2003), and we infer two qualitatively different regimes of viral-antibody coevolution. We discuss the consequences of competition between clonal B-cell lineages within and between germinal centers. In particular, we derive analytic expressions for the fixation probability of a newly arisen, broadly neutralizing antibody lineage. We find that BnAbs have an elevated chance of fixation in the presence of a diverse viral population, whereas specific neutralizing antibody lineages do not. We discuss the implications of these results for the design of preventive vaccines that elicit BnAbs against HIV.


Interaction between antibodies and viruses

B-cell receptors undergo mutation and selection in germinal centers, whereas viruses are primarily affected by the receptors secreted into the blood, known as antibodies. Our model does not distinguish between antibodies and B-cells, so we will use the terms interchangeably. To represent genetically diverse populations we define genotypes for antibodies and viruses as binary sequences of , where mutations change the sign of individual loci. Mutations in some regions of a viral genome are highly deleterious, e.g.  at sites that allow the virus to bind target cell receptors, including CD4-binding sites for HIV. To capture this property we explicitly model a conserved region of the viral genome that does not tolerate mutations, so that its bits are always set to . We let viruses have variable bits at positions , and conserved bits at positions ; while antibodies have variable bits at positions ; see Fig. 1B.

Figure 1: coevolution of antibodies and viruses. (A) Schematic of affinity maturation in a germinal center. A naive, germline B-cell receptor (black) with marginal binding affinity for the circulating antigen (red pentagon) enters the process of affinity maturation in a germinal center. Hypermutations produce a diverse set of B-cell receptors (colors), the majority of which do not increase the neutralization efficacy of B-cells, except for some beneficial mutations that increase binding affinity (dark blue and green) to the presented antigen. The selected B-cells may enter the blood and secrete antibodies, or enter further rounds of hypermutations to enhance their neutralization ability. Antigens mutate and are selected (yellow pentagon) based on their ability to escape the current immune challenge. (B) We model the interaction between the genotype of a B-cell receptor and its secreted antibody (blue) with a viral genotype (red) in both variable and conserved regions of the viral genome. The black and white circles indicate the state of the interacting loci with values . Loci in the conserved region of the virus are fixed at . The length of the arrows indicate the contribution of each locus to the binding affinity, , which is a measure of the accessibility of an antibody lineage to viral epitopes. The blue arrows indicate the interactions that increase binding affinity (i.e., loci with same signs in antibody and viral genotype), whereas red arrows indicate interaction that decrease the affinity (i.e., loci with opposite signs in antibody and viral genotype.)

Naive B-cells generate diversity by gene rearrangements (VDJ recombination), which differentiates their ability to bind to different epitopes of the virus; and then B-cells diversify further by somatic hypermutation and selection during affinity maturation. We call the set of B-cells that originate from a common germline sequence a clonal lineage. A lineage with access to conserved regions of the virus can effectively neutralize more viral genotypes, since no escape mutation can counteract this kind of neutralization.

The binding affinity between antibody and virus determines the likelihood of a given antigen neutralization by an antibody, and therefore it is the key molecular phenotype that determines selection on both immune and viral populations. We model the binding affinity as a weighted dot product over all loci, which for antibody chosen from the genotype space and virus with has binding affinity


where, denotes the locus of the antibody genotype, and the locus of the viral genotype. Matching bits at interacting positions enhance binding affinity between an antibody and a virus; see Fig. 1B. Similar models have been used to describe B-cell maturation in germinal centers Wang and et al. (2015), and T-cell selection based on the capability to bind external antigens and avoid self proteins Detours and Perelson (1999, 2000). The conserved region of the virus with is located at positions for all viral sequences. Consequently, the total binding affinity is decomposed into the interaction with the variable region of the virus, and with the conserved region of the virus, . We call the lineage-specific binding constants and the accessibilities, because they characterize the intrinsic sensitivity of an antibody lineage to individual sites in viral epitopes. We begin by analyzing the evolution of a single antibody lineage, and suppress the notation for brevity. Coevolution with multiple antibody lineages is discussed in a later section.

Both antibody and viral populations are highly polymorphic, and therefore contain many unique genotypes. While the binding affinity between a virus and an antibody is constant, given by eq. (1), the frequencies of the antibody and viral genotypes, and , and all quantities derived from them, change over time as the two populations coevolve. To characterize the distribution of binding affinities we define the genotype-specific binding affinities in each population, which are marginalized quantities over the opposing population: for the antibody , and for the virus . We will describe the time evolution of the joint distribution of , , and , by considering three of its moments: (i) the mean binding affinity, which is the same for both populations , (ii) the diversity of binding affinity in the antibodies, and (iii) the diversity of binding affinities in the viruses, . Analogous statistics of binding affinities can be defined for the conserved region of the virus, which we denote by for the mean interaction, and for the diversity across antibodies. The diversity of viral interactions in the conserved region must always equal zero, .

Coevolution of an antibody lineage and viruses

We first characterize the affinity maturation process of a single clonal antibody lineage coevolving with a viral population, which includes hypermutation, selection, and stochasticity due to population size in germinal centers, i.e., genetic drift.

Genetic drift and evolutionary time-scales.

Stochasticity in reproductive success, known as genetic drift, is an important factor that depends on population size, and therefore we model genetic drift by keeping populations at finite size for antibodies, and for viruses. Although the population of B-cells can reach large numbers within an individual host, significant bottlenecks occur in germinal centers, where there may be on the order of B-cells Meyer-Hermann and et al. (2012). For HIV, estimates for intra-patient viral divergence suggests an effective population size of about , which is much smaller than the number of infected cells within a patient  Lemey et al. (2006).

Fluctuations by genetic drift define an important time-scale in the evolution of a polymorphic population: the neutral coalescence time is the characteristic time that two randomly chosen neutral alleles in the population coalesce to their most recent common ancestor, and is equal to generations. Neutral coalescence time is estimated by phylogenetic analysis, and is often interpreted as an effective population size, which may be different from the census population size. Coalescence time can be mapped onto real units of time (e.g., days) if sequences are collected with sufficient time resolution. Without loss of generality, we assume that generation times in antibodies and viruses are equal, but we distinguish between the neutral coalescence time of antibodies and viruses by using distinct values for their population sizes, i.e., in antibodies and in viruses.


In the bi-allelic model outlined in Fig. 1B, a mutation changes the sign of an antibody site, i.e., , affecting binding affinity in proportion to the lineage’s intrinsic accessibility at that site, . Therefore, a mutation in an antibody at position changes by . Likewise, a mutation at position of a virus affects binding affinity in proportion to . We assume constant mutation rates in the variable regions of the viruses and antibodies: and per site per generation.

Empirical estimates of per-generation mutation rates for viruses or hypermutation rates of BCR sequences are extremely imprecise, and so we rescale mutation rates by neutral coalescence times. To do this, we consider measurements of standing neutral sequence diversity, estimated from genetic variation in, e.g., four-fold synonymous sites of protein sequences at each position. Neutral sequence diversity for the antibody variable region, which spans a couple of hundred base pairs, is about  Elhanati and et al. (2015). Nucleotide diversity of HIV increases over time within a patient, and ranges between in the env protein of HIV-1 patients, with a length of about a thousand base pairs Zanini and et al. (2015). Interestingly, the total diversity of the variable region in BCRs is comparable to the diversity of its main target, the env protein, in HIV. Both populations have on the order of 1-10 mutations per genotype per generation, which we use as a guideline for parameterizing simulations of our model.


Frequencies of genotypes change according to their relative growth rate, or fitness. The change in the frequency of antibody with fitness is per generation, where we define (malthusian) fitness as proportional to the growth rate, and denotes the mean fitness of the antibody population (see Section A of the Appendix). Likewise, the change in frequency of virus due to selection per generation is, , where denotes the mean fitness in the viral population.

During affinity maturation in a germinal center, a B-cell’s growth rate depends on its ability to bind to the limited amounts of antigen, and to solicit survival signals from helper T-cells Victora and Nussenzweig (2012). At the same time, viruses are neutralized by antibodies that have high binding affinity. The simplest functional form that approximates this process, and for which we can provide analytical insight, is linear with respect to the binding affinity,


for antibody and virus . The selection coefficient quantifies the strength of selection on the binding affinity of antibodies. The value of may decrease in late stages of a long-term HIV infection, as the host’s T-cell count decays Perelson (2002), but we do not model this behavior. The viral selection coefficient represents immune pressure impeding the growth of the virus. The contribution of the conserved region to the fitness of the virus is independent of the viral genotype in eq. (3), and it does not affect the relative growth rates of the viral strains.

The number of sites and the magnitude of their accessibilities affect the overall strength of selection on binding affinity. Therefore, it is useful to absorb the intrinsic effects of the phenotype magnitude into the selection strength, and use rescaled values that are comparable across lineages of antibodies, and across experiments. We therefore rescale quantities related to the binding affinity by the total scale of the phenotypes and , such that and , resulting in rescaled mean binding affinities and , and diversities , and in variable and conserved regions of both populations. Accordingly, we define rescaled selection coefficients , , and , which describe the total strength of selection on binding affinity; see Section B.1 of the Appendix for details.

Many aspects of affinity maturation are not well known, and so it is worth considering other forms of selection. In Section B.5 of the Appendix we describe fitness as a non-linear function of the binding affinity. In particular, we consider fitness that depends on the antibody activation probability, which is a sigmoid function of the strongest binding affinity among a finite number of interactions with antigens. The linear fitness function in eq. (2) is a limiting case of this more general fitness model. While most of our analytical results are based on the assumption of linear fitness function, we also discuss how to quantify adaptation for arbitrary fitness models, and we numerically study the effect of nonlinearity on the rate of antibody adaptation during affinity maturation.


Evolution of the mean binding affinity

We focus initially on understanding the (rescaled) mean binding affinity , between a clonal antibody lineage and the viral population, since this is a proxy for the overall neutralization ability that is commonly monitored during an infection. Combining genetic drift with mutation and selection, and assuming a continuous-time and continuous-frequency process, results in a stochastic dynamical equation for the evolution of rescaled mean binding affinity in the variable region,


and in the conserved region,


where and are standard Gaussian noise terms, and time is measured in units of the antibody coalescence time . Our analysis neglects the correlation between the variable and the conserved regions of the virus, which is due to physical linkage of the segments. In Section B.4 of the Appendix we show that a difference in evolutionary time-scales between these regions reduces the magnitude of this correlation.

Figure 2: Effect of selection on immune-virus binding affinity. The stationary mean binding affinity, rescaled by antibody binding diversity (), on the y-axis, is well approximated by the scaled selection difference between antibody and viral populations, , as predicted by our analysis (eq. (6)). Points show results of Wright-Fisher simulations, and the solid line has slope 1. Note that the mean binding affinity is insensitive to the details of heterogeneous binding accessibilities, , associated with an antibody lineage. Accessibilities are drawn from several different -distributions, shown in legend. Small deviations from the predicted mean binding are caused by higher moments of binding affinities, which can also be understood analytically (Fig. S1). Simulation parameters are detailed in the Appendix.

As eqs. (4, 5) reflect, mutations drive the mean affinity towards the neutral value, zero, whereas selection pushes it towards positive or negative values. The efficacy of selection on binding affinity is proportional to the binding diversity , in each of the populations. If a population harbors a large diversity of binding affinities then it has more potential for adaptation from the favorable tail of the distribution, which contains the most fit individuals in each generation Fisher (1930a); Price (1970). It follows that selection on viruses does not affect the evolution of their conserved region, where the viral diversity of binding is always zero, . In Section B.3 of the Appendix and Fig. S2 we study the evolution of the higher central moments in detail.

The dynamics in eqs. (4,5) simplify in the regime where selection on individual loci is weak (), but the additive effects of selection on the total binding affinity are substantial (). This evolutionary regime is, in particular, relevant for HIV escape from the humoral neutralizing antibody response Zanini and et al. (2015), that follows the initial strong response to cytotoxic T-lymphocytes Kessinger et al. (2013). In this parameter regime, the binding diversities are fast variables compared to the mean affinity, and can be approximated by their stationary ensemble-averaged values (Fig. S3), which depend only weakly on the strength of selection even for substantial selection : and . Higher-order corrections (Section B.3 of the Appendix and Fig. S2) show that strong selection reduces binding diversity. The ensemble-averaged mean binding affinities relax exponentially towards their stationary values,


where is an effective selection coefficient for binding affinity in the variable region, combining the effect of selection from both populations and accounting for their distinct genetic diversities. The stationary mean binding affinity quantifies the balance of mutation and selection acting on both populations. A strong selection difference between two populations results in selective sweeps for genotypes with extreme values of binding affinity in each population, and hence, reduces the binding diversity. We validated our analytical solution for stationary mean binding, with corrections due to selection on binding diversity (Section B.3 of the Appendix), by comparison with full, genotype-based Wright-Fisher simulations across a broad range of selection strengths (Figs. S1 and S2).

The weak dependence of binding diversity on selection allows for an experimental estimation of the stationary rescaled mean binding affinity, using measurements of the binding affinity distribution and neutral sequence diversities. The rescaled binding affinity can be approximated as: and . Fig. 2 demonstrates the utility of this approximation, and it shows that heterogeneous binding accessibilities, , drawn from several different distributions, do not affect stationary mean binding. Only the total magnitude of the accessibilities is relevant, as it determines the effect of selection on the whole phenotype. Although we have formulated a high-dimensional stochastic model of antibody-antigen coevolution in polymorphic populations, we can nonetheless understand the long-term binding affinities, which are commonly measured in patients, in terms of only a few key parameters.

In Section B.5 of the Appendix we numerically study the non-linear fitness landscapes described in the Model section, and their effect on the stationary mean binding and rate of adaptation (Fig. S4). While the results differ quantitatively, we can qualitatively understand how the stationary mean binding affinity depends on the form of non-linearity.

Fitness and transfer flux

The antagonistic coevolution of antibodies and viruses is a non-equilibrium process, with each population constantly adapting to a dynamic environment, namely, the state of the opposing population. As a result, any time-independent quantity, such as the stationary mean binding affinity studied above, is itself not informative for the extent of coevolution that is occurring. For example, a stationary mean binding affinity of zero (equivalently in eq. (6)) can indicate either neutral evolution or rapid coevolution induced by equally strong selection in antibody and viral populations.

Figure 3: Fitness and transfer flux in antibody-viral coevolution. The schematic diagram shows adaptation of antibody (blue diamond) and viral (red diamond) populations on their respective fitness landscapes, which depend on the common binding phenotype shown on the x-axis (i.e., the mean binding affinity). During one step of antibody adaptation (left), mean binding affinity increases (horizontal blue arrow) to enhance the fitness of the antibody population, with a rate equal to the antibody fitness flux (upward blue arrow). In the regime of strong selection, the fitness flux is proportional to the variance of fitness in the population; see eq. (8). Adaptation of antibodies reduces the mean fitness in the viral population, with a rate proportional to the transfer flux from antibodies to viruses (downward red/blue arrow). On the other hand, viral adaptation (right) reduces the binding affinity and affects the fitness of both populations, with rates proportional to the viral fitness flux (upward red arrow) and the transfer flux from viruses to antibodies (downward blue/red arrow); see eq. (9). Cumulative fitness flux (the sum of upward arrows) and cumulate transfer flux (the sum of downward arrows) over an evolutionary period quantify the amount of adaptation and interaction in the two antagonistic populations.

To quantify the amount of adaptation and extent of interaction in two coevolving populations we will partition the change in mean fitness of each population into two components. We measure adaptation by the fitness flux Mustonen and Lässig (2009, 2010); Held et al. (2014), which generically quantifies adaptation of a population in response to a changing environment (in this case the opposing population); see schematic Fig. 3. For our model, the fitness flux of the antibody population quantifies the effect of changing genotype frequencies on mean fitness, and is defined as , where denotes the mean fitness of antibodies, and the derivative measures the change in frequency of the antibody . The forces of mutation, drift, and selection all contribute to fitness flux, however the portion of fitness flux due to selection equals the population variance of fitness, in accordance with Fisher’s theorem Fisher (1930a). The second quantity we study, which we term the transfer flux, measures the amount of interaction between the two populations by quantifying the change in mean fitness due to the response of the opposing population (schematic Fig. 3). The transfer flux from viruses to antibodies is defined as . Analogous measures of adaptation and interaction can be defined for the viral population (see Section C of the Appendix).

The fitness flux and transfer flux represent rates of adaptation and interaction, and they are typically time dependent, except in the stationary state. The total amount of adaptation and interaction during non-stationary evolution, where the fluxes change over time, can be measured by the cumulative fluxes over a period of time: and , where time is measured in units of neutral coalescence time of antibodies . In the stationary state, the ensemble-averaged cumulative fluxes grow linearly with time. For coevolution on the fitness landscapes given by equations (2,3), the ensemble-averaged, stationary cumulative fitness flux and transfer flux in antibodies are


Note that the factor in eq. (9), which is a rescaling of time in units of viral neutral coalescence time , emphasizes the distinction between the evolutionary time scales of antibodies and viruses. The first terms on the right hand side of eqs. (8, 9) represent the fitness changes due to mutation, the second terms are due to selection, and the effects of genetic drift are zero in the ensemble average for our linear fitness landscape. Notably, the flux due to the conserved region of the virus is zero in stationarity, as is the case for evolution in a static fitness landscape (i.e., under equilibrium conditions). In the stationary state, the cumulative fitness and transfer fluxes sum up to zero, .

Fitness flux and transfer flux are generic quantities that are independent of the details of our model, and so they provide a natural way to compare the rate of adaptation in different evolutionary models or in different experiments. In the regime of strong selection , non-linearity of the fitness function results in a more narrow distribution of fitness values in the antibody population, and hence, reduces the rate of adaptation and fitness flux; see Fig. S4. In the following section we show how to use fitness and transfer flux to detect signatures of significant antibody-antigen coevolution.

Signature of coevolution and inferences from time-shifted experiments

Measuring interactions between antibodies and viruses isolated from different times provides a powerful way to identify coevolution. These “time-shifted” neutralization measurements in HIV patients have shown that viruses are more resistant to past antibodies, from which they have been selected to escape, and more susceptible to antibodies from the future, due to selection and affinity maturation of B-cells Richman et al. (2003); Frost and et. al. (2005); Moore and et al. (2009).

Figure 4: Time-shifted binding assays between antigens and antibodies provide a definitive signature of viral-immune coevolution. Viruses perform best against antibodies from the past and perform worst against antibodies from the future due to the adaptation of antibodies. (A) Stationary rescaled binding affinity between viruses from time and antibodies from time , averaged over all t: , and (B) time-shifted mean fitness of viruses , are shown for three regimes of coevolutionary dynamics: strong adaptation of both populations , with (blue), stronger adaptation of viruses with (red), and stronger adaptation of antibodies with (green). Wright-Fisher simulations (solid lines) are compared to the analytical predictions from eqs. (S102, S103) in each regime (dashed lines). The “S”-shape curve in the blue regime is a signature of two antagonistically coevolving populations . For large time-shifts, binding relaxes to its neutral value, zero, as mutations randomize genotypes. In the absence of selection in one population, the time-shifted binding affinity reflects adaptation of one population against stochastic variation in the other due to mutation and genetic drift. The slope of time-shifted fitness at lag is the viral population’s fitness flux (slope towards the past) and the transfer flux from the opposing population (slope towards the future), which are equal to each other in the stationary state. The slope of the dotted lines indicate the predicted fitness flux and transfer flux (eqs. (8, 9). Time-shifted fitness shown here does not include binding to the conserved region since that value is constant for all time-shifts in stationarity (see Fig. S6 for non-stationary state). Simulation parameters are given in the Appendix. (C) Empirical time-shifted fitness measurements of HIV based on a neutralization titer (Richman et al. (2003), averaged over all time points with equal time-shift . Circles show averaged fitness 1 standard error, and crosses show fitness at time-shifts with only a single data point. Solid lines show analytical fits of our model to the data (see Section F of Appendix). In patient TN-1, viruses and antibodies experience a comparable adaptive pressure, with a similar time-shift pattern to the blue “S-curve” in panel (B). In patient TN-3, however, adaptation in viruses is much stronger than in antibodies, resulting in an imbalanced shape of the time-shifted fitness curve, similar to the red curve in panel (B).

We can predict the form of time-shifted binding assays under our model; see Section D of the Appendix for details. The rescaled time-shifted binding affinity between viruses at time and antibodies at time is given by and for the variable and the conserved region, respectively. The corresponding viral mean fitness at time against the antibody population at time is . The slope of the time-shifted viral fitness at the time where the two populations co-occur (i.e., ), approaching from negative , i.e., from the past, measures the amount of adaptation of the viral population in response to the state of the antibody population, and it is precisely equal to the fitness flux of viruses: . The slope approaching from positive time-shifts, i.e., from the future, measures the change in the mean fitness of the viral population due to adaptation of the antibody population, and it is precisely equal to the transfer flux from antibodies to viruses . Similarly, we can define time-shifted fitness with antibodies as the focal population; see Section D of the Appendix. In stationarity, the sum of fitness flux and transfer flux is zero on average, and so the slopes from either side of are equal, as in Fig. 4 and Fig. S5. Note that these relationships between time-shifted fitness and the flux variables hold in general, beyond the specific case of a linear fitness landscape. In a non-stationary state, the fitness flux and transfer flux are not balanced, and so has a discontinuous derivative at (Fig. S6). Therefore, observation of such a discontinuity provides a way to identify stationarity versus transient dynamics, given sufficient replicated experiments.

Whether in stationarity or not, the signature of out-of-equilibrium evolution is a positive fitness flux and negative transfer flux. For time-shifted fitness, this means that for short time shifts, where dynamics are dominated by selection, viruses have a higher fitness against antibodies from the past, and have lower fitness against antibodies from the future. This is true even when one population is evolving neutrally and the other has substantial selection, as shown in Fig. 4A. For long time shifts, the sequences are randomized by mutations and the fitness decays exponentially to the neutral value. When selection and mutation are substantial on both sides the time-shifted fitness curve has a characteristic “S” shape – a signature of coevolution, whose inflected form can be understood in terms of the fitness and transfer fluxes. In Section D of the Appendix we analytically derive the functional form of the time-shifted binding affinity and fitness dependent on the evolutionary parameters. Fig. 4A,B and Fig. S5 show good agreement between Wright-Fisher simulations and our analytical predictions in eqs. (S102, S103) for the stationary time-shifted binding affinity and viral fitness.

We can use our analytical results to interpret empirical measurements of time-shifted viral neutralization by a patient’s circulating antibodies. We analyzed data from Richman et al. Richman et al. (2003) on two HIV-infected patients. We approximated the fitness of the virus against a sampled serum (antibodies) as the logarithm of the neutralization titer ; here titer is the reciprocal of antibody dilution where inhibition reaches 50% (Blanquart and Gandon (2013). A signature of coevolution can sometimes be obscured when the fitnesses of antibodies and viruses also depend on time-dependent intrinsic and environmental factors, such as drug treatments Blanquart and Gandon (2013). Therefore, we used fitness of a neutralization-sensitive virus (NL43) as a control measurement to account for the increasing antibody response during infection, shown in Fig. S7. The relative time-shifted viral fitness in Fig. 4C for the two HIV patients (TN-1 and TN-3), match well with the fits of our analytical equations (see Section F of the Appendix). The inferred parameter values indicate two distinct regimes of coevolutionary dynamics in the two patients. In patient TN-1, viruses and antibodies experience a comparable adaptive pressure, as indicated by the “S-curve” in Fig. 4C (blue line), whereas in patient TN-3, adaptation in viruses is much stronger than in antibodies, resulting in an imbalanced shape of the time-shifted fitness curve in Fig. 4C (red line). We describe the inference procedure and report all inferred parameters in Section F of the Appendix. The resolution of the data Richman et al. (2003) allows only for a qualitative interpretation of coevolutionary regimes. A more quantitative analysis can be achieved through longer monitoring of a patient, detailed information on the inhibition of viral replication at various levels of antibody dilution, and directed neutralization assays against HIV-specific antibody lineages.

Competition between multiple antibody lineages

B-cells in the adaptive immune system are associated with clonal lineages that originate from distinct ancestral naive cells, generated by germline rearrangements (VDJ recombination) and junctional diversification Janeway et al. (2005). Multiple lineages may be stimulated within a germinal center, and also circulate to other germinal centers Victora and Nussenzweig (2012). Lineages compete for activation agents (e.g., helper T-cells) and interaction with a finite number of presented antigens Victora and Nussenzweig (2012). We extend our theoretical framework to study how multiple lineages compete with each other and coevolve with viruses. This generalization allows us to show that lineages with higher overall binding ability, higher fitness flux, and lower (absolute) transfer flux have a better chance of surviving. In particular, we show that an antibody repertoire fighting against a highly diversified viral population, e.g., during late stages of HIV infection, favors elicitation of broadly neutralizing antibodies compared to normal antibodies.

The binding preference of a clonal antibody lineage to the viral sequence is determined by its site-specific accessibilities , defined in Fig. 1B. The distribution of site-specific accessibilities over different antibody lineages characterizes the ability of an antibody repertoire to respond to a specific virus. Without continual introduction of new lineages, one lineage will ultimately dominate and the rest will go extinct within the coalescence time-scale of antibodies, (Fig. 5A). In reality, constant turn-over of lineages results in a highly diverse B-cell response, with multiple lineages acting simultaneously against an infection Hoehn and et al. (2015).

Stochastic effects are significant when the size of a lineage is small, so an important question is to find the probability that a low-frequency antibody lineage reaches an appreciable size and fixes in the population. We denote the frequency of an antibody lineage with size by . The growth rate of a given lineage depends on its relative fitness compared to the rest of the population,


where is the average fitness of the entire antibody population, and is a standard Gaussian noise term. For the linear fitness landscape from eq. (2), the mean fitness of lineage is . The probability of fixation of lineage equals the asymptotic (i.e., long time) value of the ensemble-averaged lineage frequency, .

Similar to evolution of a single lineage, the dynamics of a focal lineage are defined by an infinite hierarchy of moment equations for the fitness distribution. In the regime of substantial selection, and by neglecting terms due to mutation, a suitable truncation of the moment hierarchy allows us to estimate the long-time limit of the lineage frequency, and hence, its fixation probability (see Section E of the Appendix). For an arbitrary fitness function, fixation probability can be expressed in terms of the ensemble-averaged relative mean fitness, fitness flux and transfer flux at the time of introduction of the focal lineage,


where is the fixation probability of the lineage in neutrality, which equals its initial frequency at the time of introduction, . The first order term that determines the excess probability for fixation of a lineage is the difference between its mean fitness and the average fitness of the whole population. Thus, a lineage with higher relative mean fitness at the time of introduction, e.g., due to its better accessibility to either the variable or conserved region, will have a higher chance of fixation. Moreover, lineages with higher rate of adaptation, i.e., fitness flux , and lower (absolute) transfer flux from viruses tend to dominate the population.

For evolution in the linear fitness landscape, we can calculate a more explicit expansion of the fixation probability that includes mutation effects. In this case, the fixation probability of a focal lineage can be expressed in terms of the experimentally observable lineage-specific moments of the binding affinity distribution, instead of the moments of the fitness distribution (see Section E of the Appendix).

Figure 5: Competition between antibody lineages, and fixation of broadly neutralizing antibodies. (A) Simulation of competition between 20 clonal antibody lineages against a viral population. Lineages with higher mean fitness, higher fitness flux, and lower transfer flux tend to dominate the antibody repertoire. Each color represents a distinct antibody lineage, however there is also diversity within each lineage from somatic hypermutations. The reduction in the number of circulating lineages resembles the reduction in the number of active B-cell clones within the life-time of a germinal center Victora and Nussenzweig (2012). Lineages are initialized as 500 random sequences with random accessibilities ’s, unique to each lineage, drawn from an exponential distribution with rate parameter . Total population sizes are . Other simulation parameters are as specified in the Appendix. (B) Analytical estimates of the fixation probability of a new antibody lineage, based on the state of the populations at the time of its introduction, compared to Wright-Fisher simulations (points) with two competing antibody lineages. A novel BnAb (blue) or non-BnAb (red) lineage is introduced at frequency 10% into a non-BnAb resident population (simulation procedures described in the Appendix). BnAb lineages have a higher chance of fixing, compared to non-BnAb antibodies, when the viral population is diverse, whereas both types of Abs have similar chances in the limit of low viral diversity. The solid line is the analytical estimate for given by eq. (S140), which is valid when the rate of adaptation is similar in antibodies and viruses. The dashed line is the analytical estimate for using the approximation in eq. (S141), which is suitable when there is a strong imbalance between the two populations, as is the case for invasion of a BnAb with strong antibody selection or against a viral population with low diversity. In the absence of selection (neutral regime), the fixation probability of an invading lineage is equal to its initial frequency of 10%. Panels show different strengths of antibody selection against a common viral selection strength . Viral diversity is influenced mostly by the viral nucleotide diversity , which ranges from to . Other simulation parameters are specified in the Appendix.

Emergence of broadly neutralizing antibodies

With our multi-lineage model, we can understand the conditions for emergence of broadly neutralizing antibodies (BnAbs) in an antibody repertoire. Similar to any other lineage, the progenitor of a BnAb faces competition with other resident antibody lineages that may be dominating the population. The dominant term in the fixation probability is the relative fitness difference of the focal lineage to the total population at the time of introduction. Lineages may reach different fitnesses because they differ in their scale of interaction with the viruses, in the variable region and in the conserved region; see Section E of the Appendix for details. Lineages which bind primarily to the conserved region, i.e., , are not vulnerable to viral escape mutations that reduce their binding affinity. Such BnAbs may be able to reach higher fitnesses compared to normal antibodies which bind to the variable region with a comparable scale of interaction. The difference in the mean fitness of the two lineages becomes even stronger, when viruses are more diverse (i.e., high ), so that they can strongly compromise the affinity of the lineage that binds to the variable region; see eq. (11).

If the invading lineage has the same fitness as the resident lineage, then the second order terms in eq. (11) proportional to the fitness and transfer flux may be relevant. A BnAb lineage that binds to the conserved region has a reduced transfer flux than a normal antibody lineage, all else being equal. The difference in transfer flux of the two lineages depends on the viral diversity , and becomes more favorable for BnAbs when the viral diversity is high. Overall, a BnAb generating lineage has a higher advantage for fixation compared to normal antibodies, when the repertoire is coevolving against a highly diversified viral population, e.g., during late stages of HIV infection.

In Fig. 5B we compare the fixation probability of a BnAb lineage, that binds only to the conserved region, with a normal antibody lineage that binds only to the variable region. In both cases we assume that the emerging lineage competes against a resident population of normal antibodies. We compare our analytical predictions for fixation probability as a function of the initial state of the antibody and viral populations in eqs. (S140, S141), with Wright-Fisher simulations of coevolving populations (numerical procedures detailed in Section G of the Appendix). Increasing viral diversity increases the fixation of BnAbs, but does not influence fixation of normal lineages. For low viral diversity, fixation of BnAbs is similar to normal Abs, and therefore they might arise and be outcompeted by other antibody lineages.


We have presented an analytical framework to describe coevolutionary dynamics between two antagonistic populations based on molecular interactions between them. We have focused our analysis on antibody-secreting B-cells and chronic infections, such as HIV. We identified effective parameters for selection on B-cells during hypermutation that enhance their binding and neutralization efficacy, and conversely parameters for selection on viruses to escape antibody binding. The resulting “red-queen” dynamics between antibodies and viruses produces a characteristic signature of coevolution in our model, i.e., viruses are resistant to antibodies from the past and are susceptible to antibodies from the future. We used our results to infer modes of immune-viral coevolution based on time-shifted neutralization measurements in two HIV-infected patients. Finally, we have shown that emergence and fixation of a given clonal antibody lineage is determined by competition between circulating antibody lineages, and that broadly neutralizing antibody lineages, in particular, are more likely to dominate in the context of a diverse viral population.

Luo and Perelson Luo and Perelson (2015b) found that competition between lineages caused BnAbs to appear late in their simulations. In addition, they found that multiple viral founder strains dilutes the competition of BnAbs with specific antibodies, leading to higher chance of BnAb appearance. The assumptions of their simulations differ in many ways from those of our model, and yet their overall finding agrees with our analytical results: BnAbs fix more readily when there is a large diversity of viral binding. In contrast to Luo and Perelson’s simulations which made assumptions about the immunogenicity of BnAbs, our analytic results show explicitly how differences in fitness of antibodies and the efficacy of viral escape affect competition between antibody lineages.

Our model is simple enough to clarify some fundamental concepts of antibody-antigen dynamics. However, understanding more refined aspects of B-cell-virus coevolution will require adding details specific to affinity maturation and viral reproduction, such as non-neutralizing binding between antibodies and antigens Wyatt and Sodroski (1998); Luo and Perelson (2015a), epitope masking by antibodies Zhang and et al. (2013) and spatial structure of germinal centers Victora and Nussenzweig (2012). Importantly, viral recombination Lemey et al. (2006); Neher and Leitner (2010); Zanini and et al. (2015) and latent viral reservoirs Chun and et al. (1997) are also known to influence the evolution of HIV within a patient. Similarly, the repertoire of the memory B-cells and T-cells, which effectively keep a record of prior viral interactions, influence the response of the adaptive immune system against viruses with antigenic similarity.

While our analysis has focused on coevolution of chronic viruses with the immune system, our framework is general enough to apply to other systems, such as bacteria-phage coevolution. Likewise, the notions of fitness and transfer flux as measures of adaptation are independent of the underlying model. Bacteria-phage interactions have been studied by evolution experiments Brockhurst and Koskella (2013); Koskella and Brockhurst (2014), and by time-shifted assays of fitness Hall et al. (2011); Betts et al. (2014), but established models of coevolution typically describe only a small number of alleles with large selection effects Agrawal and Lively (2002). In contrast, our model offers a formalism for bacteria-phage coevolution where new genotypes are constantly produced by mutation, consistent with experimental observations Hall et al. (2011). Similarly, our formalism may be applied to study the evolution of seasonal influenza virus in response to the “global” immune challenge, imposed by a collective immune landscape of all recently infected or vaccinated individuals. Time-shifted binding assays of antibodies to influenza surface proteins are already used to gauge the virulence and cross-reactivity of viruses Fonville and et al. (2014). Quantifying the fitness flux and transfer flux, based on these assays, is therefore a principled way to measure rates of immunologically important adaptation in these systems.

One central challenge in HIV vaccine research is to devise a means to stimulate a lineage producing broadly neutralizing antibodies. Common characteristics of BnAbs, such as high levels of somatic mutation or large insertions, often lead to their depletion by mechanisms of immune tolerance control Verkoczy et al. (2011). Therefore, one strategy to elicit these antibodies is to stimulate the progenitors of their clonal lineage, which may be inferred by phylogenetic methods Kepler and et. al. (2014), and to guide their affinity maturation process towards a broadly neutralizing state. Understanding the underlying evolutionary process is necessary to make principled progress towards such strategies, and this study represents a step in that direction. For example, our results suggest that a vaccine based on a genetically diverse set of viral antigens is more likely to stimulate BnAbs.


We acknowledge Francois Blanquart, Simon Frost, Mehran Kardar, Michael Lässig and Alan Perelson for helpful discussions. AN thanks the Kavli Institute for Theoretical Physics (UCSB) for its hospitality, where part of this work was performed. We acknowledge funding from the US National Science Foundation grants PHY-1305525 and PHY11-25915, the James S. McDonnell Foundation 21st century science initiative-postdoctoral program in complexity science / complex systems, the David and Lucile Packard Foundation, the U.S. Department of the Interior (D12AP00025), the U.S. Army Research Office (W911NF-12-1-0552). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.


A. Antibody-viral coevolution in genotype space

We represent the antibody population as a set of genotypes consisting of vectors, , and corresponding genotype frequencies , with elements satisfying . Similarly, we consider a viral population with possible genotypes , and frequencies with elements with . Note that superscripts are indices, not exponentiation, unless next to parentheses, e.g. . The frequencies change over time, although we omit explicit notation for brevity, and hence every quantity that depends on the frequencies is itself time-dependent. In the following, we describe separately contributions from three evolutionary forces (i) mutation, (ii) selection, and (iii) genetic drift, and build a general stochastic framework for coevolution of antibodies and viruses in the space of genotypes. We assume that population sizes are large enough, and changes in frequencies are small enough to accommodate a continuous time and continuous frequency stochastic process Gardiner (2004); Kimura (1964).

(i) Mutations. The change of the genotype frequencies due to mutations follow,

where we define and as the genotype-specific components of the mutational fields in the antibodies and viruses, and is the antibody mutation rate from genotype to , and similarly, is the viral mutation rate from the genotype to . We assume constant mutation rates , , per generation per site for antibodies and viruses, with the exception of for the viral constant region, which implies that mutations in that region are lethal for the virus.

(ii) Selection and interacting fitness functions. The fitness of a genotype determines its growth rate at each point in time. We define fitness of genotypes in one population as a function of the genotypes in the other population. The general form of change in genotype frequencies due to selection follows,

The subscript for the antibody and viral fitness functions, and , refer to the genotypes in the corresponding population. The explicit conditional dependence of the antibody fitness function on the viral frequency vector emphasizes that fitness of an antibody depends on the interacting viral population . Similar notation is used for the fitness function of the viruses. The subtraction of the population’s mean fitness, and , ensures that the genotype frequencies remain normalized in each population. In terms of linearly independent frequencies and , these evolution equations take the forms,

where and are the respective time-dependent selection coefficients of the antibody and the viral strain , which depend on the state of the both populations at that moment in time. The inverse of the response matrices, and , play the role of metric in the genotype space (see below and e.g., Antonelli and Strobeck (1977)). The change in the mean fitness due to selection after an infinitesimal amount of time follows,


where and are the infinitesimal changes in the genotype frequencies, and and, are respectively the change in the selection coefficient of the antibody and the virus due the evolution of opposing population. This measure of fitness transfer is a useful concept for interacting populations. Intuitively, it can be understood as the change of fitness in one population only due to the changes of allele or genotype frequencies in the opposing population.

(iii) Genetic drift and stochasticity. The stochasticity of reproduction and survival, commonly known as genetic drift, is represented as discrete random sampling of offspring genotypes from the parent’s generation with the constraint that the total population size remains constant. The magnitude of this sampling noise is proportional to inverse population size. and are the effective population sizes of the antibody and the viral populations, which represent the size of population bottlenecks e.g., in a germinal center. In the continuous time, continuous frequency limit, genetic drift is represented as noise terms in a diffusion equaiton with magnitude proportional to inverse population size Kimura (1964). The diffusion coefficients are characteristics of the Fisher metric Fisher (1930b); Antonelli and Strobeck (1977),

The generalized Kimura’s diffusion equation Kimura (1983) for the joint distribution of genotype frequencies in both populations reads,

This Fokker-Planck equation acts on the high-dimensional genotype space of antibodies and viruses, which are likely to be under-sampled in any biological setting. In the following, we introduce a projection from genotype space onto a lower dimensional space of molecular traits (phenotypes) to make the problem tractable.

B. Antibody-viral coevolution in phenotype space

B.1 Molecular phenotypes for antibody-viral interaction

We define the binding affinity between an antibody and viral genotype as the molecular interaction phenotype under selection, for which we describe the evolutionary dynamics. Antibody and viral genotypes are represented by binary sequences of . Antibody sequences are of length , while viral sequences consist of a mutable region of length , and a conserved (i.e. sensitive) region of length , where each site is always +1, as was similarly done in Wang and et al. (2015). We model the binding affinity between antibody and virus as a weighted dot product over all sites,


where , and denote the site in antibody and virus , respectively. The set of coupling constants for the mutable and conserved region, represent the accessibility of a clonal antibody lineage to regions of the viral sequence. Matching bits at interacting positions enhance binding affinity between an antibody and a virus. Similar models have been used to describe B-cell maturation in germinal centers Wang and et al. (2015), and T-cell selection based on the capability to bind external antigens and avoid self proteins Detours and Perelson (1999, 2000). In Section Selection., we extend our model to multiple lineages, where each lineage has its own set of accessibilities. Antibody lineages with access to the conserved regions of the virus can potentially fix as broadly neutralizing antibodies. We denote the quantities related to the conserved sites of the virus with a hat: .

We project the evolutionary forces acting on the genotype to the binding phenotype , and quantify the changes of the binding phenotype distribution in each population over time. For a single antibody genotype we characterize its interactions with the viral population by the genotype-specific moments,

Statistics of the binding affinity distribution for antibody :

(i) average in the variable region:
(ii) average in the conserved region:
(iii) central moment in the variable region:

Since the viral population is monomorphic in the conserved region, the average mean binding affinity of an antibody is independent of the state of the viral population, , and the higher central moments are zero, . Similarly, we characterize the interactions of a given viral genotype with the antibody population,

Statistics of the binding affinity distribution for virus :

(i) average in the variable region:
(ii) average in the conserved region:
(iii) central moment in the variable region:
(iii) central moment in the conserved region:

One of the most informative statistics that we characterize is the distribution of population-averaged antibody and viral binding interactions, respectively denoted by and . The mean of these distributions are equal to each other, but the higher moments differ. We denote the population-specific moments of the average interactions by,

Mean binding affinity in,


central moment of the average affinities in,


Note that the population central moments and are distinct from the genotype-specific moments, and . The central moments of the viral population in the conserved region are equal to zero, .

Trait scale and dimensionless quantities. It is useful to measure phenotypes in natural units, which avoids the arbitrariness of the physical units (), and the total number of sites . As previously shown in Nourmohammad et al. (2013b, a), there exist summary statistics of the site-specific effects, (here ), which define a natural scale of the molecular phenotype. We denote the moments of the site-specific effects along the genome by,


We express the phenotype statistics in units of the trait scales, i.e., the squared sum of the site-specific effects, in the variable region, and in the conserved region. The rescaled phenotype statistics follow,


These scaled values are pure numbers (we distinguish them by use of lower case letters from the raw data). The trait scales and provide natural means to standardize the relevant quantities because they are the stationary ensemble variances of the population mean binding affinity in an ensemble of genotypes undergoing neutral evolution in the weak-mutation regime (see Section B.3 for derivation of the stationary statistics),


where indicates averages over an ensemble of independent populations.

Binding probability. The probability that an antibody is bound by an antigen determines its chance of proliferation and survival during the process of affinity maturation, and hence, defines its fitness. We describe two distinct models for antibody activation. The simplest model assumes that the binding probability of a given antibody is a sigmoid function of its mean binding affinity against the viral population,


where is the threshold for the binding affinity and determines the amount of nonlinearity, and is related to the inverse of temperature in thermodynamics. Following the rescaling introduced in eq. (S22), the binding threshold and the nonlinearity in eq. (S24) rescale as and . In the following, we will use eq. (S24) to characterize a biophysically grounded fitness function for antibodies.

For the virus, binding to an antibody reduces the chances of its survival. Similar to eq. (S24), the probability that a given virus is bound by antibodies follows,


where and are similar to eq. (S24).

In Section B.5, we will discuss an alternative model for activation of an antibody which is based on its strongest binding affinity with a subset of viruses.

B.2 Coevolutionary forces on the binding affinity

Similar to genotype evolution, stochastic evolution of a molecular phenotype generates a probability distribution, , which describes an ensemble of independently evolving populations, each having a phenotype distribution with mean affinity and and central moments of the averaged affinity in the antibody population, , , and in the viral population, (see also Nourmohammad et al. (2013b)). The probability distribution can be expressed in therms of the distribution for genotype frequencies,