E Agliari, A Annibale, A Barra, ACC Coolen, and D Tantari Dipartimento di Fisica, Università degli Studi di Parma, Viale GP Usberti 7/A, 43124 Parma, Italy INFN, Gruppo Collegato di Parma, Viale Parco Area delle Scienze 7/A, 43100 Parma, Italy Department of Mathematics, King’s College London, The Strand, London WC2R 2LS, UK Institute for Mathematical and Molecular Biomedicine, King’s College London, Hodgkin Building, London SE1 1UL, UK Dipartimento di Fisica, Sapienza Università di Roma, P.le Aldo Moro 2, 00185 Roma, Italy London Institute for Mathematical Sciences, 35a South St, Mayfair, London W1K 2XF, UK Dipartimento di Matematica, Sapienza Università di Roma, P.le Aldo Moro 2, 00185 Roma Italy
###### Abstract

Associative network models featuring multi-tasking properties have been introduced recently and studied in the low load regime, where the number of simultaneously retrievable patterns scales with the number of nodes as . In addition to their relevance in artificial intelligence, these models are increasingly important in immunology, where stored patterns represent strategies to fight pathogens and nodes represent lymphocyte clones. They allow us to understand the crucial ability of the immune system to respond simultaneously to multiple distinct antigen invasions. Here we develop further the statistical mechanical analysis of such systems, by studying the medium load regime, with . We derive three main results. First, we reveal the nontrivial architecture of these networks: they exhibit a high degree of modularity and clustering, which is linked to their retrieval abilities. Second, by solving the model we demonstrate for the existence of large regions in the phase diagram where the network can retrieve all stored patterns simultaneously. Finally, in the high load regime we find that the system behaves as a spin glass, suggesting that finite-connectivity frameworks are required to achieve effective retrieval.

###### pacs:
75.10.Nr, 87.18.Vf

## 1 Introduction

After a pioneering paper [1] followed by a long period of dormancy, recent years have witnessed a surge of interest in statistical mechanical models of the immune system [2, 3, 4, 5, 6, 7, 8, 9, 10]. This description complements the more standard approaches, which tend to be phrased in the language of dynamical systems [11, 12, 13, 14]. To make further progress, however, it has become clear that we need new quantitative tools, able to handle the complexities which surfaced in e.g. [15, 16]. This is the motivation for the present study.

There is an intriguing and fruitful analogy (from a modelling perspective) between neural networks, which have been modelled in statistical mechanics quite extensively, and immune networks. Let us highlight the similarities and differences. In neural networks the nodes represent neurons, which interact with each other directly through Hebbian synaptic couplings. In (adaptive) immune systems, effector branches (B-clones) and coordinator branches (helper and suppressor T-clones), interact via signaling proteins called cytokines. The latter can represent both eliciting and suppressive signals. Neural and immune systems are both able to learn (e.g. how to fight new antigens), memorize (e.g. previously encountered antigens) and ‘think’ (e.g. select the best strategy to cope with pathogens). However, neural networks are designed for serial processing: neurons perform collectively to retrieve a single pattern at a time. This is not acceptable in the immune context. Multiple antigens will normally be present at the same time, which requires the simultaneous recall of multiple patterns (i.e. of multiple defense strategies). Moreover, the architectures of neural and immune networks are very different. A model with fully connected topology, mathematically convenient but without a basis in biological reality, is tolerable for neural networks where each neuron is known to have a huge number of connections with others. In contrast, in immune networks, where interactions among lymphocytes are much more specific, the underlying topology must be carefully modelled and is expected to play a crucial operational role. From a theoretical physics perspective, a network of interacting B- and T-cells resembles a bipartite spin glass. It was recently shown that such bipartite spin glasses exhibit retrieval features which are deeply related to their structures [15, 16], and this can be summarized as follows:

• There exists a structural equivalence between Hopfield neural networks and bipartite spin glasses. In particular, the two systems share the same partition function, and hence the same thermodynamics [17, 18].

• One can either dilute directly a Hopfield network, or its underlying bipartite spin glass. The former does not affect pattern retrieval qualitatively [19, 20, 21, 22, 23, 24], whereas the latter causes a switch from serial to parallel processing [15, 16] (i.e. to simultaneous pattern recall).

• Simultaneous pattern recall is essential in the context of immunology, since it implies the ability to respond to multiple antigens simultaneously. The analysis of such systems requires a combination of techniques from statistical mechanics and graph theory.

The last point is the focus of the present paper, which is organized as follows. In Section 2 we describe a minimal biological scenario for the immune system, based on the analogy with neural networks. We define our model and its scaling regimes, and prepare the stage for calculations. Section 3 gives a comprehensive analysis of the topological properties of the network in the extremely diluted regime, which is the scaling regime assumed throughout our paper. Section 4 is dedicated to the statistical mechanical analysis of the system in the medium load regime, focusing on simultaneous pattern recall of the network. Section 5 deals with the high load regime. Here the network is found to behave as a spin glass, suggesting that a higher degree of dilution should be implemented – in remarkable agreement with immunological findings [25, 26] – and this will be the focus of future research. The final section gives a summary of our main conclusions.

## 2 Statistical mechanical modelling of the adaptive immune system

### 2.1 The underlying biology

All mammals have an innate (broad range) immunity, managed by macrophages, neutrophils, etc., and an adaptive immune response. The latter is highly specific for particular targets, handled by lymphocytes, and the focus of this paper. To be concise, the following introduction to the adaptive immune system has already been filtered by a theoretical physics perspective, and immunological observables are expressed in‘physical’ language. We refer to the excellent books [25, 26] for comprehensive reviews of the immune system, and to a selection of papers [2, 3, 4, 15, 16, 27] for explanations of the link between ‘physical’ models and biological reality. Our prime interest is in B-cells and in T-cells; in particular, among T-cells, in the subgroups of so-called ‘helpers’ and ‘suppressors’. B-cells produce antibodies and present them on their external surface in such a way that they are able to recognize and bind pathogenic peptides. All B-cells that produce the same antibody belong to the same clone, and the ensemble of all the different clones forms the immune repertoire. This repertoire is of size clones in humans. The size of a clone, i.e. the number of identical B-cells, may vary strongly. A clone at rest may contain some cells, but when it undergoes clonal expansion its size may increase by several orders of magnitude, to up to . Beyond this size the state of the immune system would be pathological, and is referred to as lymphocytosis.

When an antigen enters the body, several antibodies (i.e. several B-cells belonging to different clones) may be able to bind to it, making it chemically inert and biologically inoffensive. In this case, conditional on authorization by T-helpers (mediated via cytokines), the binding clones undergo clonal expansion. This means that their cells start duplicating, and releasing high quantities of soluble antibodies to inhibit the enemy. After the antigen has been deleted, B-cells are instructed by T-suppressors, again via cytokines, to stop producing antibodies and undergo apoptosis. In this way the clones reduce their sizes, and order is restored. Thus, two signals are required for B-cells to start clonal expansion: the first signal is binding to antigen, the second is a ‘consensus’ signal, in the form of an eliciting cytokine secreted by T-helpers. This latter mechanism prevents abnormal reactions, such as autoimmune manifestations111Through a phenomenon called ‘cross-linking’, a B-cell can also have the ability to bind a self-peptide, and may accidentally start duplication and antibody release, which is a dangerous unwanted outcome..

T-helpers and T-suppressors are lymphocytes that work ‘behind the scenes’, regulating the immune response by coordinating the work of effector branches, which in this paper are the B-cells. To accomplish this, they are able to secrete both stimulatory and suppressive chemical signals, the cytokines [28, 29]. If within a given (small) time interval a B-clone recognizes an antigen and detects an eliciting cytokine from a T-cell, it will become activated and start duplicating and secreting antibodies. This scenario is the so-called ‘two-signal model’ [30, 31, 32, 33]. Conversely, when the antigen is absent and/or the cytokine signalling is suppressive, the B-cells tuned to this antigen start the apoptosis programme, and their immuno-surveillance is turned down to a rest state. For simplicity, we will from now on with the term ‘helper’ indicate any helper or suppressor T-cell. The focus of this study is to understand, from a statistical mechanics perspective, the ability of helpers and suppressors to coordinate and manage simultaneously a huge ensemble of B-clones (possibly all).

### 2.2 A minimal model

We consider an immune repertoire of different clones, labelled by . The size of clone is . In the absence of interactions with helpers, we take the clone sizes to be Gaussian distributed; without loss of generality we may take the mean to be zero and unit width, so . A value now implies that clone has expanded (relative to the typical clonal size), while implies inhibition. The Gaussian clone size distribution is supported both by experiments and by theoretical arguments [4]. Similarly, we imagine having helper clones, labelled by . The state of helper clone is denoted by . For simplicity, helpers are assumed to be in only two possible states: secreting cytokines () or quiescent (). Clone sizes and the helper states are dynamical variables. We will abbreviate , and .

The interaction between the helpers and B-clones is implemented by cytokines. These are taken to be frozen (quenched) discrete variables. The effect of a cytokine secreted by helper and detected by clone can be nonexistent (), excitatory (), or inhibitory (). To achieve a Hamiltonian formulation of the system, and thereby enable equilibrium analysis, we have to impose symmetry of the cytokine interactions. So, in addition to the B-clones being influenced by cytokine signals from helpers, the helpers will similarly feel a signal from the B-clones. This symmetry assumption can be viewed as a necessary first step, to be relaxed in future investigations, similar in spirit to the early formulation of symmetric spin-glass models for neural networks [34, 35]. We are then led to a Hamiltonian for the combined system of the following form (modulo trivial multiplicative factors):

 ^H(\boldmathb,\boldmathσ|ξ)=−1√NTNT∑i=1NB∑μ=1ξμiσibμ+12√βNB∑μ=1b2μ, (1)

In the language of disordered systems, this is a bipartite spin-glass. We can integrate out the variables , and map our system to a model with helper-helper interactions only. The partition function , at inverse clone size noise level (which is the level consistent with our assumption ) follows straightforwardly, and reveals the mathematical equivalence with an associative attractor network:

 ZNT(β,ξ) = ∑\boldmathσ∫db1…dbNBexp[−√β\leavevmode\nobreak ^H(\boldmathb,% \boldmathσ|ξ)] (2) = ∑\boldmathσexp[−βH(% \boldmathσ|ξ)],

in which, modulo an irrelevant additive constant,

 H(\boldmathσ|ξ) = −12NT∑ij=1σiJijσj,\leavevmode\nobreak \leavevmode\nobreak \leavevmode\nobreak \leavevmode\nobreak \leavevmode\nobreak \leavevmode\nobreak Jij=1NTNB∑μ=1ξμiξμj (3)

Thus, the system with Hamiltonian , where helpers and B-clones interact through cytokines, is thermodynamically equivalent to a Hopfield-type associative network represented by , in which helpers mutually interact through an effective Hebbian coupling. See Figure 1. Learning a pattern in this model then means adding a new B-clone with an associated string of new cytokine variables.

If there are no zero values for the , the system characterized by (3) is well known in artificial intelligence research. It is able to retrieve each of the ‘patterns’ , provided these patterns are sufficiently uncorrelated, and both the ratio and the noise level are sufficiently small [4, 20, 36, 42]. Retrieval quality can be quantified by introducing suitable order parameters, viz. , in terms of which the new Hamiltonian (3) can be written as

 H(\boldmathσ|ξ)=−NT2NT∑μ=1m2μ(\boldmathσ). (4)

If is sufficiently small, the minimum energy configurations of the system are those where for some (‘pure states’), which implies that and pattern is said to be retrieved perfectly. But what does retrieval mean in our immunological context? If , all the helpers are ‘aligned’ with their coupled cytokines: those that inhibit clone (i.e. secrete ) will be quiescent (), and those that excite clone (i.e. secrete ) will be active () and release the eliciting cytokine. As a result the B-clone receives the strongest possible positive signal (i.e. the random environment becomes a ‘staggered magnetic field’), hence it is forced to expand. Thus the arrangement of helper cells leading to the retrieval of pattern corresponds to clone-specific excitatory signalling upon the B-clone .

However, if all so the bipartite network is fully connected, it can expand only one B-clone at a time. This would be a disaster for the immune system. We need the dilution in the bipartite B-H network that is caused by having also (i.e. no signalling between helper and clone ), to enable multiple clonal expansions. The associative network (3) now involves patterns with blank entries, and ‘pure states’ no longer work as low energy configurations. Retrieving a pattern no longer employs all spins , and those corresponding to null entries can be used to recall other patterns. This is energetically favorable since the energy is quadratic in the magnetizations . Conceptually, this is only a reshaping of the network’s recall tasks: no theoretical bound for information content is violated, and global retrieval is still performed through bits. However, the perspective is shifted: the system no longer requires a sharp resolution in information exchange between a helper clone and a B-clone222In fact, the high-resolution analysis is performed in the antigenic recognition on the B-cell surface, which is based on a sharp key-and-lock mechanism [2].. It suffices that a B-clone receives an attack signal, which could be encoded even by a single bit. In a diluted bipartite B-H system the associative capabilities of the helper network are distributed, in order to simultaneously manage the whole ensemble of B-cells. The analysis of these immunologically most relevant pattern-diluted versions of associative networks is still at an early stage. So far only the low storage case has been solved [15, 16]. In this paper we analyse the extreme dilution regime for the B-H system, i.e. with .

## 3 Topological properties of the emergent networks

### 3.1 Definitions and simple characteristics

We start with the definition of the bi-partite graph, which contains two sets of nodes (or vertices): the set representing B-cells (labelled by ) and the set representing T-cells (labelled by ), of cardinality and , respectively. Nodes belonging to different sets can be pairwise connected via links, which are identically and independently drawn with probability , in such a way that a random bipartite network is built. We associate with each link a weight, which can be either or ; these weights are quenched and drawn randomly from a uniform distribution. As a result, the state of each link connecting the -th B-clone and the -th T-clone can be denoted by a random variable , distributed independently according to

 P(ξμi) = p2(δξμi,1+δξμi,−1)+(1−p)δξμi,0 (5)

We choose , with subject to , and . Upon tuning , displays different topologies, ranging from fully connected (all possible links are present, for ) to fully disconnected (for ). We have shown in the previous section how a process on this bipartite graph can be mapped to a thermodynamically equivalent process on a new graph, built only of the nodes in , occupied by spins that interact pairwise through a coupling matrix with (correlated) entries

 Jij=NB∑μ=1ξμiξμj. (6)

The structure of the marginalized system is represented by a weighted mono-partite graph , with weights (6), whose topology is controlled by . To illustrate this, let us consider the weight distribution , which can be interpreted as the probability distribution for the end-to-end distance of a one-dimensional random walk. This walk has a waiting probability , and probabilities of moving left () or right () equal to , i.e.

 pw=1−(c/NγT)2,\leavevmode\nobreak \leavevmode\nobreak \leavevmode\nobreak \leavevmode\nobreak \leavevmode\nobreak \leavevmode\nobreak pl=pr=12(c/NγT)2. (7)

Therefore, we can write

 (8)

where the prime indicates that the sum runs only over values of with the same parity as . The result (8) can easily be generalized to the case of biased weight distributions [38], which would correspond to non-isotropic random walks. The first two moments of (8) are, as confirmed numerically in Figure 2:

 ⟨J⟩=0,\leavevmode\nobreak \leavevmode\nobreak \leavevmode\nobreak \leavevmode\nobreak \leavevmode\nobreak \leavevmode\nobreak ⟨J2⟩=(c/NγT)2NB, (9)

We now fix a scaling law for , namely , with . This includes the high-load regime for , as well as the medium-load regime for . The low-storage regime has already been treated elsewhere [15, 16]. We then find

 ⟨J2⟩=αc2/N2γ−δT, (10)

The probability of having scales like for , while for it approaches in the limit . One can recover the same result via a simple approximation, which is valid in the case :

 P(J=0|NB,NT,γ,c)\leavevmode\nobreak ≈\leavevmode\nobreak pNBw=[1−(c/NγT)2]NB, (11)

see also A.1 for a more rigorous derivation of . Given the assumed scaling of , we get , which translates into

 (12)

This quantity can be interpreted as the average link probability in . The average degree over the whole set of nodes333The degree or coordination number of node is the number of its nearest-neighbors, i.e. the number of links stemming from the node itself. Thus, the average degree measures the density of links present in the graph. can then be written as

 ¯z=NTP(J≠0). (13)

Thus, if we adopt a mean-field approach based only on the estimates (12,13), we find that can display the following topologies, expressed in terms of the average degree of (the average number of links per node):

fully disconnected, fully disconnected,
finitely connected, finitely connected,
extremely diluted, but       ——
finitely diluted,       ——
fully connected,       ——

The missing entries in the table correspond to forbidden values . The various cases are also summarized in the left panel of Fig. 3. This picture is confirmed by numerical simulations. The right panels of Fig. 3 give a finite size scaling analysis for the average degree and its fluctuations , measured in realizations of for several choices of parameters. We also show corresponding data for Erdös-Rényi graphs, where all links are independently drawn with probability , for comparison (here and ). We find that (i) in the fully disconnected regime of the phase diagram, decays to zero exponentially as a function of , (ii) in the extremely diluted regime, scales with according to a power law, (iii) in the finitely diluted regime is proportional to , and (iv) in the fully connected regime saturates to . There is thus full agreement with the predictions of the mean-field approach. The fluctuations are slightly larger that those of a purely randomly drawn network, which suggests that exhibits a certain degree of inhomogeneity. This will be investigated next.

It is important to stress that, as the system parameters and are tuned, the connectivity of the resulting network can vary extensively and therefore, in order for the Hamiltonian (4) to scale linearly with the system size , the prefactor of the Hopfield model embedded in complete graphs, is not generally appropriate. One should normalize according to the expected connectivity of the graph.

### 3.2 Component size distribution

As is increased, both and become more and more diluted, eventually under-percolating. The topology analysis can be carried out more rigorously for the (bi-partite) graph , since its link probability is constant and identical for all . We can apply the generating function formalism developed in [39, 40] to show that the size of the giant component () diverges when

 p2=1/NBNT. (14)

Hence, upon setting , the percolation threshold for the bipartite graph is defined by

 N2γ−1−δT=c2α=O(N0T)\leavevmode\nobreak \leavevmode\nobreak \leavevmode\nobreak ⇒\leavevmode\nobreak \leavevmode\nobreak \leavevmode\nobreak γ=(δ+1)/2, (15)

which is consistent with the results of Section 3; we refer to A.2 for full details. Below the percolation threshold the generating function formalism also allows us to get the distribution for the size of the small components occurring in . A (connected) component of an undirected graph is an isolated subgraph in which any two vertices are connected to each other by paths; the size of the component is simply the number of nodes belonging to the component itself. We prove in A.2 that just below the percolation threshold, scales exponentially with . One finds that this is true also for the distribution of graph (see Figure 4, left panel).

Interestingly, the small components of that emerge around and below the percolation threshold play a central role in the network’s retrieval performance. To see this, one may consider the extreme case where the bi-partite graph consists of trimers only. Here each node is connected to two nodes , that is and . The associated graph is then made up of dimers only, and for all . The energetically favorable helper cell configuration is now the one where , for any . This implies that retrieval of all patterns is accomplished (under proper normalization). In the opposite extreme case, is fully connected, and the helper cell system becomes a Hopfield network where parallel retrieval is not realized. In general, around and below the percolation threshold, the matrix turns out to be partitioned, which implies that also the coupling matrix is partitioned, and each block of corresponds to a separate component of the overall graph . For instance, looking at the bipartite graph , a star-like module with node at its center and the nodes as leaves444The opposite case of a star-like module with the center belonging to is unlikely, given that . can occur when the leaves share a unique non-null -th entry in their patterns, that is . For the graph this module corresponds to a complete sub-graph of nodes. In this case the retrieval of pattern is trivially achieved. In fact, a complete sub-graph in can originate from more general arrangements in : each leaf can display several other null entries beyond , but these are not shared, that is . For instance, all stars with centers belonging to and with leaves of length or fall into this extended class. Again, the retrieval of pattern and possibly of further patterns is achieved. However, the mutual signs of magnetizations are no longer arbitrary, as the terms are subject to constraints.

We can further generalize the topology of components in compatible with parallel retrieval, by considering cliques (i.e. subsets of nodes such that every two nodes in the subset are connected by an edge), which are joined together by one link: each clique consists of nodes that share the same non-null entry, so that the unique link between two cliques is due to a node displaying at least two non-null entries. This kind of structure exhibits a high degree of modularity; each clique is a module and corresponds to a different pattern. As for retrieval, this arrangement works fine as there is no interference between the signal on each node in . For this arrangement to occur a sufficient condition is that is devoid of squares, so that two nodes do not share more than one neighbor. This implies that, among the nodes connected to , the probability that any number of these display another common neighbor is vanishing:

 limNT→∞n∑k>2(nk)pk(1−p)n−k=limNT→∞{1−(1−cNγ)n−cnNγ(1−cNγ)n−1}=0. (16)

Since , we obtain the condition . Examples of numerically generated graphs , for different choices of parameters, are shown in Fig. 5.

### 3.3 Clustering properties

We saw that in the operationally most important parameter regime the graph is built of small cliques which are poorly (if at all) connected to each other. This means that non-isolated nodes are highly clustered and the graph will have a high degree of modularity, i.e. dense connections between nodes within the same ‘module’, but sparse connections between nodes in different ‘modules’. The clustering coefficient of a node measures how close its neighbors are to being a clique. It is defined as

 Ci=2Ei/zi(zi−1), (17)

where is the number of links directly connecting nodes pairs in , while is the total number of non-ordered node pairs in . Hence . The average clustering coefficient measures the extent to which nodes in a graph tend to cluster together. It is easy to see that for a bipartite graph, by construction, for any node, while for homogeneous graphs the local clustering coefficients are narrowly distributed around . For instance, for the Erdös-Rény random graph, where links are identically and independently drawn with probability , the local coefficients are peaked around . As for our graphs , due to their intrinsic inhomogeneity, the global measure would give only limited information. In contrast, the distribution of local clustering coefficients informs us about the existence and extent of cliques or ‘bulk’ nodes, which would be markers of low and high recall interference, respectively. Indeed, as shown in Figure 6, in the highly-diluted regime most of the nodes in are either highly clustered, i.e. exhibiting , or isolated, with , whereas the coefficients of the remaining nodes are distributed around intermediate values with average decreasing with , as expected. In particular, when both and are relatively large, approaches a bimodal distribution with peaks at and , whereas when is sufficiently larger than , there exists a fraction of nodes with intermediate clustering which make up a bulk. Therefore, although the density of links is rather small, the average clustering coefficient is very high, and this is due to the fragmentation of the graph into many small cliques.

To measure the extent of modular structures we constructed the topological overlap matrix , whose entry returns the normalized number of neighbors that and share. The related patterns for several choices of parameters are shown in the plots of Fig. 7, and compared to those of Erdös-Rènyi graphs . For Erdös-Rènyi graphs displays a homogeneous pattern, that is very different from the highly clustered cases emerging from . In particular, for the highly diluted cases considered here, we find that smaller values of induce a smaller number of modules, that are individually increasing in size.

## 4 Medium storage regime in extremely diluted connectivity: retrieval region

We now turn to the statistical mechanics analysis, and consider the immune network model composed of T-clones (, ) and B-clones (, ), such that the number ratio scales as

 limNT→∞NB/NδT=α,\leavevmode\nobreak \leavevmode\nobreak \leavevmode\nobreak \leavevmode\nobreak \leavevmode\nobreak \leavevmode\nobreak δ∈(0,1),\leavevmode\nobreak \leavevmode\nobreak α>0 (18)

The effective interactions in the reduced network with helper cells only are described by the Hamiltonian

 H(\boldmathσ|ξ)=−12NτTNT∑i,j=1NB∑μ=1ξμiξμjσiσj, (19)

where the cytokine components are quenched random variables, independently and identically distributed according to

 P(ξμi=1)=P(ξμi=−1)=c/2NγT,\leavevmode\nobreak \leavevmode\nobreak \leavevmode\nobreak \leavevmode\nobreak \leavevmode\nobreak \leavevmode\nobreak P(ξμi=0)=1−c/NγT (20)

with . The parameter must be chosen such that scales linearly with , and must therefore depend on and . Heuristically, since the number of non-zero entries in a generic pattern is , we expect that the network can retrieve a number of patterns of order . We therefore expect to see changes in only when crossing the region in the plane where pattern sparseness prevails over storage load (i.e. , where the system can recall all patterns), to the opposite situation, where the load is too high and frustration by multiple inputs on the same entry drives the network to saturation (i.e. ). To validate this scenario, which is consistent with our previous topological investigation, we carry out a statistical mechanical analysis, based on computing the free energy

 f(β)=−limNT→∞1βNT⟨logZNT(β,ξ)⟩ξ. (21)

### 4.1 Free energy computation and physical meaning of the parameters

If the number of patterns is sufficiently small compared to , i.e. , we do not need the replica method; we can simply apply the steepest descent technique using the Mattis magnetizations as order parameters:

 f(β)=−1βlog2−limNT→∞1βNTlog∫\rmd\boldmathm e−12\boldmathm2+NT⟨logcosh(√β/NτT\boldmathξ⋅\boldmathm)⟩ξ. (22)

with , and . Rescaling of the order parameters via then gives

 f(β) = −1βlog2−limNT→∞1βNTlog∫\rmd\boldmathm eNT(−βc22Nτ+2θ−1T\boldmathm2+⟨logcosh(βcNθT\boldmathξ⋅\boldmathm)⟩ξ), (23)

Hence, provided the limit exists, we may write via steepest descent integration:

 f(β)=−1βlog2−1βlimNT→∞extr\boldmathm[⟨logcosh(βcNθT\boldmathξ⋅\boldmathm)⟩ξ−βc22Nτ+2θ−1T\boldmathm2]. (24)

Differentiation with respect to the gives the self consistent equations for the extremum:

 mμ=N1−τ−θTc⟨ξμtanh(βcNθT\boldmathξ⋅\boldmathm)⟩ξ. (25)

With the additional new parameter , we now have two parameters with which to control separately two types of normalization: the normalization of the Hamiltonian, via , and the normalization of the order parameters, controlled by . To carry out this task properly, we need to understand the physical meaning of the order parameters. This is done in the usual way, by adding suitable external fields to the Hamiltonian:

 H→H−NB∑μ=1λμNT∑i=1ξμiσi (26)

Now, with and the corresponding new free energy ,

 limNT→∞1NT⟨NT∑i=1ξμiσi⟩σ=−∂f(β,\boldmathλ)∂λμ∣∣\boldmathλ=0, (27)

with the short-hand . The new free energy is then found to be

 f(β,\boldmathλ)=−1βlog2−1βlimNT→∞extr\boldmathm[⟨logcosh(β\boldmathξ⋅[cNθT\boldmathm+\boldmathλ])⟩ξ−βc22Nτ+2θ−1T\boldmathm2]\leavevmode\nobreak (28)

Upon differentiation with respect to we find (27) taking the form

 limNT→∞1NT⟨NT∑i=1ξμiσi⟩σ=limNT→∞⟨ξμtanh(βcNθT\boldmathξ⋅\boldmathm)⟩ξ. (29)

We can then use expression (25) for to obtain the physcal meaning of our order parameters:

 mμ = limNT→∞N1−(τ+θ)Tc⟨ξμtanh(βcNθT\boldmathξ⋅\boldmathm)⟩ξ (30) = limNT→∞⟨1cNτ+θTNT∑i=1ξμiσi⟩σ.

Let us summarize the status of the various remaining control parameters in the theory, in the interest of transparency. Our model has three given external parameters:

• : this quantifies the dilution of stored patterns, via ,

• and : these determine the number of stored patterns, via .

It also has two ‘internal’ parameters, which must be set in such a way for the statistical mechanical calculation to be self-consistent, i.e. such that various quantities scale in the physically correct way for :

• : this must ensure that the energy scales as ,

• : this must ensure that the order parameter are of order .

### 4.2 Setting of internal scaling parameters

To find the appropriate values for the internal scaling parameters and we return to the order parameter equation (25) and carry out the average over . This gives

 mμ = N1−τ−θTc⟨tanh(βcNθT((ξμ)2mμ+ξμNB∑ν≠μξνmν))⟩ξ. (31) = N1−τ−θ−γT⟨tanh(βcNθT(mμ+NB∑ν≠μξνmν))⟩ξ. (32)

Having non-vanishing in the limit clearly demands . If the will become independent of , which means that any phase transitions occur ar zero or infinite noise levels, i.e. we would not have defined the scaling of our Hamiltonian correctly. Similarly, if the effective local fields acting upon the (viz. the arguments of the hyperbolic tangent) and therefore also the expectation values , would be vanishingly weak. We therefore conclude that a natural ansatz for the free exponents is:

 (τ,θ)=(1−γ,0) (33)

This simplifies the order parameter equation to

 mμ = ⟨tanh(βc(mμ+NB∑ν≠μξνmν))⟩ξ. (34)

Let us analyze this equation further. Since with , we can for replace in the sum over with the sum over all ; the difference is negligible in the thermodynamic limit. In this way it becomes clear that for each solution of we have . Using the invariance of the free energy under , we can from now on focus on solutions with non-negative magnetizations. If we denote with the number of with , then the value of is to be solved from

 m = (35)

It is not a priori obvious how the number of nonzero magnetizations (i.e. the number of simultaneously triggered clones) can or will scale with . We therefore set , in which the condition then places the following conditions on and : , and if or if . We expect that if is too large, equation (35) will only have the trivial solution for , so there will be further conditions on and for the system to operate properly. If , the noise due to other condensed patterns (i.e. the sum over ) becomes too high, and can only be zero:

 E[(K∑μ=1ξμ)2]=K∑μ=1E[ξμ2]=ϕcNδ′TNγT→∞. (36)

On the other hand, if this noise becomes negligible, and (35) reduces to the Curie-Weiss equation, whose solution is just the Mattis magnetization [20, 36, 41]. It follows that the critical case is the one where when . Here we have for the following equation for :

 m = ∑k∈Zπ(k|ϕ)tanh(βcm(1+k)) (37)

with the following discrete noise distribution, which obeys :

 π(k|ϕ) = (38)

### 4.3 Computation of the noise distribution π(k)

Given its symmetry, we only need to calculate for :

 π(k|ϕ) = limK→∞∫π−π\rmdψ2πe−iψk⟨eiψξ⟩Kξ=limK→∞∫π−π\rmdψ2πe−iψk(1+cϕK(cosψ−1))K (39) = ∫π−π\rmdψ2πe−iψk+ϕc(cosψ−1) = e−ϕc∫π−π\rmdψ2πe−iψk∑n≥0(ϕc)n2nn!(eiψ+e−iψ)n = e−ϕc∫π−π\rmdψ2πe−iψk∑n≥0(ϕc)n2nn!∑l≤nn!l!(n−l)!e−iψ(k−n+2l) = e−ϕc∑n≥0∑l≤n(ϕc2)n1l!(n−l)!δn,k+2l = e−ϕc∑l≥0(ϕc2)k+2l1l!(k+l)!\leavevmode\nobreak =\leavevmode\nobreak e−ϕc\leavevmode