Immune networks: multitasking capabilities at medium load
Abstract
Associative network models featuring multitasking properties have been introduced recently and studied in the low load regime, where the number of simultaneously retrievable patterns scales with the number of nodes as . In addition to their relevance in artificial intelligence, these models are increasingly important in immunology, where stored patterns represent strategies to fight pathogens and nodes represent lymphocyte clones. They allow us to understand the crucial ability of the immune system to respond simultaneously to multiple distinct antigen invasions. Here we develop further the statistical mechanical analysis of such systems, by studying the medium load regime, with . We derive three main results. First, we reveal the nontrivial architecture of these networks: they exhibit a high degree of modularity and clustering, which is linked to their retrieval abilities. Second, by solving the model we demonstrate the existence of large regions in the phase diagram where the network can retrieve all stored patterns simultaneously. Finally, in the high load regime we find that the system behaves as a spin glass, suggesting that finite-connectivity frameworks are required to achieve effective retrieval.
pacs: 75.10.Nr, 87.18.Vf

1 Introduction
After a pioneering paper [1] followed by a long period of dormancy, recent years have witnessed a surge of interest in statistical mechanical models of the immune system [2, 3, 4, 5, 6, 7, 8, 9, 10]. This description complements the more standard approaches, which tend to be phrased in the language of dynamical systems [11, 12, 13, 14]. To make further progress, however, it has become clear that we need new quantitative tools, able to handle the complexities which surfaced in e.g. [15, 16]. This is the motivation for the present study.
There is an intriguing and fruitful analogy (from a modelling perspective) between neural networks, which have been modelled in statistical mechanics quite extensively, and immune networks. Let us highlight the similarities and differences. In neural networks the nodes represent neurons, which interact with each other directly through Hebbian synaptic couplings. In (adaptive) immune systems, effector branches (B-clones) and coordinator branches (helper and suppressor T-clones) interact via signaling proteins called cytokines. The latter can represent both eliciting and suppressive signals. Neural and immune systems are both able to learn (e.g. how to fight new antigens), memorize (e.g. previously encountered antigens) and ‘think’ (e.g. select the best strategy to cope with pathogens). However, neural networks are designed for serial processing: neurons perform collectively to retrieve a single pattern at a time. This is not acceptable in the immune context. Multiple antigens will normally be present at the same time, which requires the simultaneous recall of multiple patterns (i.e. of multiple defense strategies). Moreover, the architectures of neural and immune networks are very different. A model with fully connected topology, mathematically convenient but without a basis in biological reality, is tolerable for neural networks, where each neuron is known to have a huge number of connections with others. In contrast, in immune networks, where interactions among lymphocytes are much more specific, the underlying topology must be carefully modelled and is expected to play a crucial operational role. From a theoretical physics perspective, a network of interacting B- and T-cells resembles a bipartite spin glass. It was recently shown that such bipartite spin glasses exhibit retrieval features which are deeply related to their structures [15, 16], and this can be summarized as follows:

Simultaneous pattern recall is essential in the context of immunology, since it implies the ability to respond to multiple antigens simultaneously. The analysis of such systems requires a combination of techniques from statistical mechanics and graph theory.
The last point is the focus of the present paper, which is organized as follows. In Section 2 we describe a minimal biological scenario for the immune system, based on the analogy with neural networks. We define our model and its scaling regimes, and prepare the stage for calculations. Section 3 gives a comprehensive analysis of the topological properties of the network in the extremely diluted regime, which is the scaling regime assumed throughout our paper. Section 4 is dedicated to the statistical mechanical analysis of the system in the medium load regime, focusing on simultaneous pattern recall of the network. Section 5 deals with the high load regime. Here the network is found to behave as a spin glass, suggesting that a higher degree of dilution should be implemented – in remarkable agreement with immunological findings [25, 26] – and this will be the focus of future research. The final section gives a summary of our main conclusions.
2 Statistical mechanical modelling of the adaptive immune system
2.1 The underlying biology
All mammals have an innate (broad range) immunity, managed by macrophages, neutrophils, etc., and an adaptive immune response. The latter is highly specific for particular targets, handled by lymphocytes, and the focus of this paper. To be concise, the following introduction to the adaptive immune system has already been filtered by a theoretical physics perspective, and immunological observables are expressed in ‘physical’ language. We refer to the excellent books [25, 26] for comprehensive reviews of the immune system, and to a selection of papers [2, 3, 4, 15, 16, 27] for explanations of the link between ‘physical’ models and biological reality. Our prime interest is in B-cells and in T-cells; in particular, among T-cells, in the subgroups of so-called ‘helpers’ and ‘suppressors’. B-cells produce antibodies and present them on their external surface in such a way that they are able to recognize and bind pathogenic peptides. All B-cells that produce the same antibody belong to the same clone, and the ensemble of all the different clones forms the immune repertoire. This repertoire is of size clones in humans. The size of a clone, i.e. the number of identical B-cells, may vary strongly. A clone at rest may contain some cells, but when it undergoes clonal expansion its size may increase by several orders of magnitude, to up to . Beyond this size the state of the immune system would be pathological, and is referred to as lymphocytosis.
When an antigen enters the body, several antibodies (i.e. several B-cells belonging to different clones) may be able to bind to it, making it chemically inert and biologically inoffensive. In this case, conditional on authorization by T-helpers (mediated via cytokines), the binding clones undergo clonal expansion. This means that their cells start duplicating, and releasing high quantities of soluble antibodies to inhibit the enemy. After the antigen has been deleted, B-cells are instructed by T-suppressors, again via cytokines, to stop producing antibodies and undergo apoptosis. In this way the clones reduce their sizes, and order is restored. Thus, two signals are required for B-cells to start clonal expansion: the first signal is binding to antigen, the second is a ‘consensus’ signal, in the form of an eliciting cytokine secreted by T-helpers. This latter mechanism prevents abnormal reactions, such as autoimmune manifestations^1.
^1 Through a phenomenon called ‘cross-linking’, a B-cell can also have the ability to bind a self-peptide, and may accidentally start duplication and antibody release, which is a dangerous unwanted outcome.
T-helpers and T-suppressors are lymphocytes that work ‘behind the scenes’, regulating the immune response by coordinating the work of the effector branches, which in this paper are the B-cells. To accomplish this, they are able to secrete both stimulatory and suppressive chemical signals, the cytokines [28, 29]. If within a given (small) time interval a B-clone recognizes an antigen and detects an eliciting cytokine from a T-cell, it will become activated and start duplicating and secreting antibodies. This scenario is the so-called ‘two-signal model’ [30, 31, 32, 33]. Conversely, when the antigen is absent and/or the cytokine signalling is suppressive, the B-cells tuned to this antigen start the apoptosis programme, and their immunosurveillance is turned down to a rest state. For simplicity, from now on we will use the term ‘helper’ to indicate any helper or suppressor T-cell. The focus of this study is to understand, from a statistical mechanics perspective, the ability of helpers and suppressors to coordinate and manage simultaneously a huge ensemble of B-clones (possibly all of them).
2.2 A minimal model
We consider an immune repertoire of different clones, labelled by . The size of clone is . In the absence of interactions with helpers, we take the clone sizes to be Gaussian distributed; without loss of generality we may take the mean to be zero and unit width, so . A value now implies that clone has expanded (relative to the typical clonal size), while implies inhibition. The Gaussian clone size distribution is supported both by experiments and by theoretical arguments [4]. Similarly, we imagine having helper clones, labelled by . The state of helper clone is denoted by . For simplicity, helpers are assumed to be in only two possible states: secreting cytokines () or quiescent (). Clone sizes and the helper states are dynamical variables. We will abbreviate , and .
The interaction between the helpers and B-clones is implemented by cytokines. These are taken to be frozen (quenched) discrete variables. The effect of a cytokine secreted by helper and detected by clone can be nonexistent (), excitatory (), or inhibitory (). To achieve a Hamiltonian formulation of the system, and thereby enable equilibrium analysis, we have to impose symmetry of the cytokine interactions. So, in addition to the B-clones being influenced by cytokine signals from helpers, the helpers will similarly feel a signal from the B-clones. This symmetry assumption can be viewed as a necessary first step, to be relaxed in future investigations, similar in spirit to the early formulation of symmetric spin-glass models for neural networks [34, 35]. We are then led to a Hamiltonian for the combined system of the following form (modulo trivial multiplicative factors):
(1) 
In the language of disordered systems, this is a bipartite spin glass. We can integrate out the variables , and map our system to a model with helper-helper interactions only. The partition function , at inverse clone size noise level (which is the level consistent with our assumption ), follows straightforwardly, and reveals the mathematical equivalence with an associative attractor network:
(2)  
in which, modulo an irrelevant additive constant,
(3) 
Thus, the system with Hamiltonian , where helpers and B-clones interact through cytokines, is thermodynamically equivalent to a Hopfield-type associative network represented by , in which helpers mutually interact through an effective Hebbian coupling. See Figure 1. Learning a pattern in this model then means adding a new B-clone with an associated string of new cytokine variables.
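Although the explicit formulas are elided above, the marginalization step is a standard Gaussian integration; a sketch, with normalization conventions that are our own assumptions, reads:

```latex
% Sketch of the marginalization over the clone sizes b_mu
% (the 1/sqrt(2*pi/beta) normalization is a convention assumed here)
Z_N(\beta) \;=\; \sum_{\boldsymbol{\sigma}} \prod_{\mu=1}^{P}
  \int \frac{\mathrm{d}b_\mu}{\sqrt{2\pi/\beta}}\,
  \exp\!\Big(-\tfrac{\beta}{2}\, b_\mu^2 \;+\; \beta\, b_\mu \sum_{i=1}^{N} \xi_i^{\mu} \sigma_i\Big)
\;=\; \sum_{\boldsymbol{\sigma}}
  \exp\!\Big(\tfrac{\beta}{2} \sum_{\mu=1}^{P}
  \Big(\sum_{i=1}^{N} \xi_i^{\mu} \sigma_i\Big)^{2}\Big)
```

Completing the square in each Gaussian integral produces an effective Hamiltonian that is quadratic in the pattern overlaps, i.e. an effective Hebbian helper-helper coupling, consistent with (3).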
If there are no zero values for the , the system characterized by (3) is well known in artificial intelligence research. It is able to retrieve each of the ‘patterns’ , provided these patterns are sufficiently uncorrelated, and both the ratio and the noise level are sufficiently small [4, 20, 36, 42]. Retrieval quality can be quantified by introducing suitable order parameters, viz. , in terms of which the new Hamiltonian (3) can be written as
(4) 
If is sufficiently small, the minimum energy configurations of the system are those where for some (‘pure states’), which implies that and pattern is said to be retrieved perfectly. But what does retrieval mean in our immunological context? If , all the helpers are ‘aligned’ with their coupled cytokines: those that inhibit clone (i.e. secrete ) will be quiescent (), and those that excite clone (i.e. secrete ) will be active () and release the eliciting cytokine. As a result the B-clone receives the strongest possible positive signal (i.e. the random environment becomes a ‘staggered magnetic field’), hence it is forced to expand. Thus the arrangement of helper cells leading to the retrieval of pattern corresponds to clone-specific excitatory signalling upon the B-clone .
However, if all , so that the bipartite network is fully connected, the system can expand only one B-clone at a time. This would be a disaster for the immune system. We need the dilution in the bipartite BH network that is caused by having also (i.e. no signalling between helper and clone ), to enable multiple clonal expansions. The associative network (3) now involves patterns with blank entries, and ‘pure states’ no longer work as low energy configurations. Retrieving a pattern no longer employs all spins , and those corresponding to null entries can be used to recall other patterns. This is energetically favorable since the energy is quadratic in the magnetizations . Conceptually, this is only a reshaping of the network’s recall tasks: no theoretical bound for information content is violated, and global retrieval is still performed through bits. However, the perspective is shifted: the system no longer requires a sharp resolution in information exchange between a helper clone and a B-clone^2. It suffices that a B-clone receives an attack signal, which could be encoded even by a single bit. In a diluted bipartite BH system the associative capabilities of the helper network are distributed, in order to simultaneously manage the whole ensemble of B-cells. The analysis of these immunologically most relevant pattern-diluted versions of associative networks is still at an early stage. So far only the low storage case has been solved [15, 16]. In this paper we analyse the extreme dilution regime for the BH system, i.e. with .
^2 In fact, the high-resolution analysis is performed in the antigenic recognition on the B-cell surface, which is based on a sharp key-and-lock mechanism [2].
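The parallel-retrieval scenario just described is easy to visualize numerically. The following sketch (all sizes and the dilution value are illustrative assumptions, not parameters from the text) stores a handful of diluted ±1/0 patterns in a Hebbian coupling matrix and runs zero-noise sequential dynamics; every pattern then attains a large Mattis overlap on its own support, i.e. all clones receive their expansion signal simultaneously.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P, c = 400, 4, 0.1  # helpers, B-clones, dilution (illustrative assumptions)

# Diluted patterns: entry is +-1 with probability c/2 each, 0 with probability 1-c
xi = rng.choice([-1, 0, 1], size=(P, N), p=[c / 2, 1 - c, c / 2])

# Hebbian helper-helper couplings obtained by marginalizing the B-clones
J = xi.T @ xi
np.fill_diagonal(J, 0)

# Mixture-like initial state, then zero-temperature asynchronous dynamics
sigma = np.where(xi.sum(axis=0) != 0, np.sign(xi.sum(axis=0)), 1).astype(int)
for _ in range(5):                 # a few sweeps suffice at this size
    for i in range(N):
        h = J[i] @ sigma           # local field on helper i
        if h != 0:
            sigma[i] = np.sign(h)

# Mattis overlap of each pattern, normalized on its own support
support = np.abs(xi).sum(axis=1)
m = (xi @ sigma) / support
print(m)  # each pattern attains a large overlap on its own support
```

Because the pattern supports barely overlap at this dilution, each pattern is recalled on its own block of spins, which is the multitasking mechanism described above.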
3 Topological properties of the emergent networks
3.1 Definitions and simple characteristics
We start with the definition of the bipartite graph, which contains two sets of nodes (or vertices): the set representing B-cells (labelled by ) and the set representing T-cells (labelled by ), of cardinality and , respectively. Nodes belonging to different sets can be pairwise connected via links, which are identically and independently drawn with probability , in such a way that a random bipartite network is built. We associate with each link a weight, which can be either or ; these weights are quenched and drawn randomly from a uniform distribution. As a result, the state of each link connecting the th B-clone and the th T-clone can be denoted by a random variable , distributed independently according to
(5) 
We choose , with subject to , and . Upon tuning , displays different topologies, ranging from fully connected (all possible links are present, for ) to fully disconnected (for ). We have shown in the previous section how a process on this bipartite graph can be mapped to a thermodynamically equivalent process on a new graph, built only of the nodes in , occupied by spins that interact pairwise through a coupling matrix with (correlated) entries
(6) 
The structure of the marginalized system is represented by a weighted monopartite graph , with weights (6), whose topology is controlled by . To illustrate this, let us consider the weight distribution , which can be interpreted as the probability distribution for the end-to-end distance of a one-dimensional random walk. This walk has a waiting probability , and probabilities of moving left () or right () equal to , i.e.
(7) 
Therefore, we can write
(8) 
where the prime indicates that the sum runs only over values of with the same parity as . The result (8) can easily be generalized to the case of biased weight distributions [38], which would correspond to non-isotropic random walks. The first two moments of (8) are, as confirmed numerically in Figure 2:
(9) 
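The random-walk picture behind (7)-(9) can be checked by direct sampling: each term of the coupling vanishes with probability 1 - c² and equals ±1 with probability c²/2 each, so the coupling performs a lazy P-step walk with zero mean and variance P c². A Monte Carlo sketch (the values of P and c are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
P, c, samples = 50, 0.1, 200_000  # illustrative parameters

# Each pattern entry: +-1 with probability c/2, 0 with probability 1-c
xi_i = rng.choice([-1, 0, 1], size=(samples, P), p=[c / 2, 1 - c, c / 2])
xi_j = rng.choice([-1, 0, 1], size=(samples, P), p=[c / 2, 1 - c, c / 2])

# Coupling between two nodes: a P-step lazy walk with waiting probability 1-c^2
J = (xi_i * xi_j).sum(axis=1)

print(J.mean())          # ~ 0 (unbiased walk)
print(J.var())           # ~ P c^2 (second moment of the walk)
print(np.mean(J != 0))   # empirical link probability in the monopartite graph
```

The measured moments match the walk prediction, and the fraction of nonzero couplings is the effective link probability used in the degree estimates below.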
We now fix a scaling law for , namely , with . This includes the high-load regime for , as well as the medium-load regime for . The low-storage regime has already been treated elsewhere [15, 16]. We then find
(10) 
The probability of having scales like for , while for it approaches in the limit . One can recover the same result via a simple approximation, which is valid in the case :
(11) 
see also A.1 for a more rigorous derivation of . Given the assumed scaling of , we get , which translates into
(12) 
This quantity can be interpreted as the average link probability in . The average degree over the whole set of nodes^3 can then be written as
^3 The degree or coordination number of node is the number of its nearest neighbors, i.e. the number of links stemming from the node itself. Thus, the average degree measures the density of links present in the graph.
(13) 
Thus, if we adopt a mean-field approach based only on the estimates (12,13), we find that can display the following topologies, expressed in terms of the average degree of (the average number of links per node):
fully disconnected,  fully disconnected,  
finitely connected,  finitely connected,  
extremely diluted, but  ——  
finitely diluted,  ——  
fully connected,  —— 
The missing entries in the table correspond to forbidden values . The various cases are also summarized in the left panel of Fig. 3. This picture is confirmed by numerical simulations. The right panels of Fig. 3 give a finite size scaling analysis for the average degree and its fluctuations , measured in realizations of for several choices of parameters. We also show corresponding data for Erdős-Rényi graphs, where all links are independently drawn with probability , for comparison (here and ). We find that (i) in the fully disconnected regime of the phase diagram, decays to zero exponentially as a function of , (ii) in the extremely diluted regime, scales with according to a power law, (iii) in the finitely diluted regime is proportional to , and (iv) in the fully connected regime saturates to . There is thus full agreement with the predictions of the mean-field approach. The fluctuations are slightly larger than those of a purely randomly drawn network, which suggests that exhibits a certain degree of inhomogeneity. This will be investigated next.
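The mean-field estimate of the average degree can be reproduced in a few lines. Assuming, as the text suggests, that dilution and load scale as powers of N (the specific exponents below are illustrative assumptions, not the paper's values), the measured degree of sampled graphs tracks the estimate N P c²:

```python
import numpy as np

rng = np.random.default_rng(2)

def mean_degree(N, P, c):
    """Average degree of the monopartite graph induced by P diluted patterns."""
    xi = rng.choice([-1, 0, 1], size=(P, N), p=[c / 2, 1 - c, c / 2])
    J = xi.T @ xi
    np.fill_diagonal(J, 0)
    return (J != 0).sum(axis=1).mean()

rows = []
for N in (200, 400, 800):
    P, c = int(N ** 0.5), N ** -0.5       # illustrative power-law scalings
    z, est = mean_degree(N, P, c), N * P * c * c
    rows.append((N, z, est))
    print(N, round(z, 2), round(est, 2))  # measured vs mean-field estimate N P c^2
```

With these exponents the estimate and the measurement agree to within a few percent, which is the kind of check behind the finite-size scaling shown in Fig. 3.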
It is important to stress that, as the system parameters and are tuned, the connectivity of the resulting network can vary extensively. Therefore, in order for the Hamiltonian (4) to scale linearly with the system size , the prefactor used for the Hopfield model on complete graphs is not generally appropriate: one should normalize according to the expected connectivity of the graph.
3.2 Component size distribution
As is increased, both and become more and more diluted, eventually underpercolating. The topology analysis can be carried out more rigorously for the (bipartite) graph , since its link probability is constant and identical for all . We can apply the generating function formalism developed in [39, 40] to show that the size of the giant component () diverges when
(14) 
Hence, upon setting , the percolation threshold for the bipartite graph is defined by
(15) 
which is consistent with the results of Section 3; we refer to A.2 for full details. Below the percolation threshold the generating function formalism also allows us to get the distribution for the size of the small components occurring in . A (connected) component of an undirected graph is an isolated subgraph in which any two vertices are connected to each other by paths; the size of the component is simply the number of nodes belonging to the component itself. We prove in A.2 that just below the percolation threshold, scales exponentially with . One finds that this is true also for the distribution of graph (see Figure 4, left panel).
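A quick way to inspect the component structure of the monopartite graph is a union-find pass over the nonzero couplings. With parameters chosen (as an illustrative assumption) below the percolation threshold, the graph fragments into many small components and isolated nodes:

```python
import numpy as np

rng = np.random.default_rng(3)
N, P, c = 600, 30, 0.005  # illustrative, sparse enough to sit below percolation

xi = rng.choice([-1, 0, 1], size=(P, N), p=[c / 2, 1 - c, c / 2])
J = xi.T @ xi
np.fill_diagonal(J, 0)

# Union-find over the links of the graph (an edge wherever the coupling is nonzero)
parent = list(range(N))
def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

for i, j in zip(*np.nonzero(np.triu(J, 1))):
    parent[find(i)] = find(j)

sizes = np.bincount([find(i) for i in range(N)])
sizes = np.sort(sizes[sizes > 0])[::-1]
print(sizes[:5])            # largest components stay small: no giant component
print(np.mean(sizes == 1))  # fraction of components that are isolated nodes
```

Ranking the component sizes makes the exponential tail discussed in A.2 directly visible in a histogram of `sizes`.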
Interestingly, the small components of that emerge around and below the percolation threshold play a central role in the network’s retrieval performance. To see this, one may consider the extreme case where the bipartite graph consists of trimers only. Here each node is connected to two nodes , that is and . The associated graph is then made up of dimers only, and for all . The energetically favorable helper cell configuration is now the one where , for any . This implies that retrieval of all patterns is accomplished (under proper normalization). In the opposite extreme case, is fully connected, and the helper cell system becomes a Hopfield network where parallel retrieval is not realized. In general, around and below the percolation threshold, the matrix turns out to be partitioned, which implies that also the coupling matrix is partitioned, and each block of corresponds to a separate component of the overall graph . For instance, looking at the bipartite graph , a star-like module with node at its center and the nodes as leaves^4 can occur when the leaves share a unique non-null th entry in their patterns, that is . For the graph this module corresponds to a complete subgraph of nodes. In this case the retrieval of pattern is trivially achieved. In fact, a complete subgraph in can originate from more general arrangements in : each leaf can display several other null entries beyond , but these are not shared, that is . For instance, all stars with centers belonging to and with leaves of length or fall into this extended class. Again, the retrieval of pattern and possibly of further patterns is achieved. However, the mutual signs of the magnetizations are no longer arbitrary, as the terms are subject to constraints.
^4 The opposite case of a star-like module with the center belonging to is unlikely, given that .
We can further generalize the topology of components in compatible with parallel retrieval, by considering cliques (i.e. subsets of nodes such that every two nodes in the subset are connected by an edge), which are joined together by one link: each clique consists of nodes that share the same non-null entry, so that the unique link between two cliques is due to a node displaying at least two non-null entries. This kind of structure exhibits a high degree of modularity; each clique is a module and corresponds to a different pattern. As for retrieval, this arrangement works fine as there is no interference between the signal on each node in . For this arrangement to occur a sufficient condition is that is devoid of squares, so that two nodes do not share more than one neighbor. This implies that, among the nodes connected to , the probability that any number of these display another common neighbor is vanishing:
(16) 
Since , we obtain the condition . Examples of numerically generated graphs , for different choices of parameters, are shown in Fig. 5.
3.3 Clustering properties
We saw that in the operationally most important parameter regime the graph is built of small cliques which are poorly (if at all) connected to each other. This means that nonisolated nodes are highly clustered and the graph will have a high degree of modularity, i.e. dense connections between nodes within the same ‘module’, but sparse connections between nodes in different ‘modules’. The clustering coefficient of a node measures how close its neighbors are to being a clique. It is defined as
(17) 
where is the number of links directly connecting node pairs in , while is the total number of unordered node pairs in . Hence . The average clustering coefficient measures the extent to which nodes in a graph tend to cluster together. It is easy to see that for a bipartite graph, by construction, for any node, while for homogeneous graphs the local clustering coefficients are narrowly distributed around . For instance, for the Erdős-Rényi random graph, where links are identically and independently drawn with probability , the local coefficients are peaked around . As for our graphs , due to their intrinsic inhomogeneity, the global measure would give only limited information. In contrast, the distribution of local clustering coefficients informs us about the existence and extent of cliques or ‘bulk’ nodes, which would be markers of low and high recall interference, respectively. Indeed, as shown in Figure 6, in the highly diluted regime most of the nodes in are either highly clustered, i.e. exhibiting , or isolated, with , whereas the coefficients of the remaining nodes are distributed around intermediate values with average decreasing with , as expected. In particular, when both and are relatively large, approaches a bimodal distribution with peaks at and , whereas when is sufficiently larger than , there exists a fraction of nodes with intermediate clustering which make up a bulk. Therefore, although the density of links is rather small, the average clustering coefficient is very high, and this is due to the fragmentation of the graph into many small cliques.
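The local clustering coefficients are straightforward to measure from the adjacency matrix via triangle counting; with illustrative diluted parameters (an assumption, not the paper's values), a large fraction of the non-isolated nodes sit exactly at the clique value C = 1, the signature discussed above:

```python
import numpy as np

rng = np.random.default_rng(4)
N, P, c = 500, 25, 0.02  # illustrative diluted regime

xi = rng.choice([-1, 0, 1], size=(P, N), p=[c / 2, 1 - c, c / 2])
A = (xi.T @ xi != 0).astype(int)
np.fill_diagonal(A, 0)

deg = A.sum(axis=1)
# (A^3)_ii counts closed 3-walks through i; halving gives triangles through i,
# i.e. the number of links among the neighbours of i
tri = np.diag(A @ A @ A) / 2
with np.errstate(divide="ignore", invalid="ignore"):
    C = np.where(deg > 1, 2 * tri / (deg * (deg - 1)), 0.0)

print(np.mean(C[deg > 1] > 0.99))  # fraction of fully clustered (clique) nodes
```

Nodes belonging to a single pattern-induced clique have all their neighbours mutually linked, hence C = 1 exactly; nodes straddling two cliques fall to intermediate values, reproducing the bimodal shape of Figure 6.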
To measure the extent of modular structures we constructed the topological overlap matrix , whose entry returns the normalized number of neighbors that and share. The related patterns for several choices of parameters are shown in the plots of Fig. 7, and compared to those of Erdős-Rényi graphs . For Erdős-Rényi graphs displays a homogeneous pattern, which is very different from the highly clustered cases emerging from . In particular, for the highly diluted cases considered here, we find that smaller values of induce a smaller number of modules, which are individually increasing in size.
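The topological overlap matrix can be sketched as the count of shared neighbours, normalized here by the smaller of the two degrees (this normalization is our assumption; other conventions exist in the literature):

```python
import numpy as np

rng = np.random.default_rng(5)
N, P, c = 300, 15, 0.03  # illustrative parameters
xi = rng.choice([-1, 0, 1], size=(P, N), p=[c / 2, 1 - c, c / 2])
A = (xi.T @ xi != 0).astype(int)
np.fill_diagonal(A, 0)

shared = A @ A                      # shared[i, j] = number of common neighbours
deg = A.sum(axis=1)
norm = np.minimum.outer(deg, deg)   # assumed normalization: the smaller degree
T = np.divide(shared, norm, out=np.zeros_like(shared, dtype=float), where=norm > 0)
np.fill_diagonal(T, 0)
print(T.max(), T.mean())            # high overlap within modules, sparse elsewhere
```

Plotting `T` as a heat map (after sorting nodes by component) reproduces the block structure of Fig. 7: pairs inside a clique share almost all neighbours, pairs across modules share none.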
4 Medium storage regime in extremely diluted connectivity: retrieval region
We now turn to the statistical mechanics analysis, and consider the immune network model composed of T-clones (, ) and B-clones (, ), such that the number ratio scales as
(18) 
The effective interactions in the reduced network with helper cells only are described by the Hamiltonian
(19) 
where the cytokine components are quenched random variables, independently and identically distributed according to
(20) 
with . The parameter must be chosen such that scales linearly with , and must therefore depend on and . Heuristically, since the number of nonzero entries in a generic pattern is , we expect that the network can retrieve a number of patterns of order . We therefore expect to see changes in only when crossing, in the plane, from the region where pattern sparseness prevails over storage load (i.e. , where the system can recall all patterns) to the opposite situation, where the load is too high and frustration by multiple inputs on the same entry drives the network to saturation (i.e. ). To validate this scenario, which is consistent with our previous topological investigation, we carry out a statistical mechanical analysis, based on computing the free energy
(21) 
4.1 Free energy computation and physical meaning of the parameters
If the number of patterns is sufficiently small compared to , i.e. , we do not need the replica method; we can simply apply the steepest descent technique using the Mattis magnetizations as order parameters:
(22) 
with , and . Rescaling of the order parameters via then gives
(23) 
Hence, provided the limit exists, we may write via steepest descent integration:
(24) 
Differentiation with respect to the gives the self-consistent equations for the extremum:
(25) 
With the additional new parameter , we now have two parameters with which to control separately two types of normalization: the normalization of the Hamiltonian, via , and the normalization of the order parameters, controlled by . To carry out this task properly, we need to understand the physical meaning of the order parameters. This is done in the usual way, by adding suitable external fields to the Hamiltonian:
(26) 
Now, with and the corresponding new free energy ,
(27) 
with the shorthand . The new free energy is then found to be
(28) 
Upon differentiation with respect to we find that (27) takes the form
(29) 
We can then use expression (25) for to obtain the physical meaning of our order parameters:
(30)  
Let us summarize the status of the various remaining control parameters in the theory, in the interest of transparency. Our model has three given external parameters:

: this quantifies the dilution of stored patterns, via ,

and : these determine the number of stored patterns, via .
It also has two ‘internal’ parameters, which must be set in such a way that the statistical mechanical calculation is self-consistent, i.e. such that the various quantities scale in the physically correct way for :

: this must ensure that the energy scales as ,

: this must ensure that the order parameters are of order .
4.2 Setting of internal scaling parameters
To find the appropriate values for the internal scaling parameters and we return to the order parameter equation (25) and carry out the average over . This gives
(31)  
(32) 
Having nonvanishing in the limit clearly demands . If , the will become independent of , which means that any phase transitions occur at zero or infinite noise levels, i.e. we would not have defined the scaling of our Hamiltonian correctly. Similarly, if , the effective local fields acting upon the (viz. the arguments of the hyperbolic tangent), and therefore also the expectation values , would be vanishingly weak. We therefore conclude that a natural ansatz for the free exponents is:
(33) 
This simplifies the order parameter equation to
(34) 
Let us analyze this equation further. Since with , we can for replace in the sum over with the sum over all ; the difference is negligible in the thermodynamic limit. In this way it becomes clear that for each solution of we have . Using the invariance of the free energy under , we can from now on focus on solutions with nonnegative magnetizations. If we denote with the number of with , then the value of is to be solved from
(35) 
It is not a priori obvious how the number of nonzero magnetizations (i.e. the number of simultaneously triggered clones) can or will scale with . We therefore set , in which the condition then places the following conditions on and : , and if or if . We expect that if is too large, equation (35) will only have the trivial solution for , so there will be further conditions on and for the system to operate properly. If , the noise due to other condensed patterns (i.e. the sum over ) becomes too high, and can only be zero:
(36) 
On the other hand, if this noise becomes negligible, and (35) reduces to the CurieWeiss equation, whose solution is just the Mattis magnetization [20, 36, 41]. It follows that the critical case is the one where when . Here we have for the following equation for :
(37) 
with the following discrete noise distribution, which obeys :
(38) 
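In the negligible-noise limit mentioned above, (35) reduces to the Curie-Weiss form m = tanh(βm); its nontrivial solution can be found by damped fixed-point iteration. For β < 1 only m = 0 survives, while for β > 1 a spontaneous magnetization appears (a minimal sketch; the damping factor is an implementation choice):

```python
import numpy as np

def curie_weiss(beta, iters=1000, m0=0.5):
    """Solve m = tanh(beta * m) by damped fixed-point iteration."""
    m = m0
    for _ in range(iters):
        m = 0.5 * m + 0.5 * np.tanh(beta * m)  # damping aids convergence
    return m

print(curie_weiss(0.5))  # below the transition: magnetization vanishes
print(curie_weiss(2.0))  # above the transition: spontaneous magnetization
```

The same iteration scheme extends to (35) once the discrete noise distribution (38) is averaged over, which is how the retrieval region of the phase diagram can be mapped out numerically.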
4.3 Computation of the noise distribution
Given its symmetry, we only need to calculate for :
(39)  