# Statistical Mechanics of Competitive Resource Allocation using Agent-based Models

## Abstract

Demand outstrips available resources in most situations, which gives rise to competition, interaction and learning. In this article, we review a broad spectrum of multi-agent models of competition (El Farol Bar problem, Minority Game, Kolkata Paise Restaurant problem, Stable marriage problem, Parking space problem and others) and the methods used to understand them analytically. We emphasize the power of concepts and tools from statistical mechanics to understand and explain fully collective phenomena such as phase transitions and long memory, and the mapping between agent heterogeneity and physical disorder. As these methods can be applied to any large-scale model of competitive resource allocation made up of heterogeneous adaptive agents with non-linear interaction, they provide a prospective unifying paradigm for many scientific disciplines.

## I Introduction

Most resources are in limited supply. How to allocate them is therefore of great practical importance. The variety of situations is staggering: resources may be tangible (oil, parking space, chocolates) or intangible (time, energy, bandwidth) and allocation may happen instantaneously or over a long period, and may involve a central authority or none.

The optimal allocation of resources is a core concern of economics. The problem can be formalized as the simultaneous maximization of the utility of each member of the economy, over the set of achievable allocations. The key issue is that individuals typically have conflicting goals, as the profit of one comes at the expense of the others. Therefore, the nature of the problem is conceptually different from optimization problems where a single objective function has to be maximized. One key insight is that markets, under some conditions, can solve the problem efficiently. This is sharply explained by Adam Smith in his famous quote:

It is not from the benevolence of the butcher, the brewer, or the baker, that we expect our dinner, but from their regard to their own interest.

Markets, under some conditions, allow individuals to exchange goods for money and to reach an optimal allocation mas1995microeconomic, where none can improve her well-being without someone else being worse off. This is called a Pareto efficient allocation. In the exchanges, prices adjust in such a way as to reflect the value that goods have for different individuals – codified in their marginal utilities.

The problem with this solution is that the conditions are rather restrictive: i) assumptions on the convexity of preferences or production functions have to be made. ii) Markets must exist for every commodity that individuals may be interested in. Even if markets exist, access to them typically involves transaction costs. iii) Markets do not work for the provision of public goods, i.e., those goods whose consumption does not exclude others from drawing benefit from them. iv) Markets require perfect competition, where no individual has the possibility to manipulate prices. One aspect, raised by Adam Smith himself long ago in The Theory of Moral Sentiments (1759), is that market functioning requires coordination on a set of shared norms and reciprocal trust. This aspect becomes acutely evident in times of crisis, when markets collapse, as we witnessed in the 2007-08 financial crisis.

Apart from all these issues, general equilibrium theory has provided remarkable insights on the properties of economies mas1995microeconomic. Its strength is that it allows one to relate economic behavior to the incentives that motivate the behavior of individuals, as codified in their utility functions. This opens the way to normative approaches by which policy makers may intervene in order to achieve given welfare objectives.

Yet, the predictive power of this approach is rather limited: the mapping from the observable collective behavior to the agents’ utility functions is one-to-many. For this reason, economists have focused on specific instances where all individuals are equal, which typically reduce to problems where the economy is populated by a single representative individual. The representative agent approach also has the virtue of allowing for closed-form solutions, yet several of its predictions are a direct consequence of its assumptions kirman1992whom. Furthermore, this approach is silent on the critical conditions that determine the stability of the system, so it provides no hints on the likelihood that markets may collapse, leading to disruption of economic activity.

General equilibrium theory is completely silent on how the equilibrium is reached and on the conditions that allow it to be reached. This aspect has recently been addressed in the community of computer scientists, who have developed algorithms for resource allocation. There the emphasis is on decentralized heuristics for (approximate) solutions of allocation problems that are efficient in terms of computer time and can work under imperfect information. For instance, operating systems use so-called schedulers to allocate CPU and input-output resources in (almost) real time between tasks. Scalability is a primary concern, particularly in a decentralized setting where agents need to do without global optimization. These intrinsically non-equilibrium dynamical problems are often solved by ad hoc methods perkins1999adhoc, some of them being extensions of the models that we review below shafique2011minorityCPU.

The collective behavior of systems of many interacting degrees of freedom has also been studied in physics. There, it has been found that the collective behavior is remarkably insensitive to microscopic details. For example, a law of “corresponding” states was derived for gases long ago (see e.g., huang1987book), which shows that the macroscopic behavior of real gases is well approximated by a single curve. This “universality” extends also to systems of heterogeneous particles, such as disordered alloys and spin glasses MPV, thereby suggesting that a similar approach may be useful also for understanding economic phenomena. This universality allows one to focus on the simplest possible models that reproduce a particular collective behavior, which may be amenable to analytic approaches leading to a full-fledged understanding of the emergence of the collective behavior. Physicists have therefore applied their tools to analyze problems and model the behaviour of interacting agents in economics and sociology Chakrabarti2006; Sinha2010; Chakrabarti2013; mantegna1999; bouchaud2000; chakraborti2011a; chakraborti2011b. In a colloquium, Yakovenko2009 reviewed statistical models for wealth and income distributions, inspired by the kinetic theory of gases in physics, with minimally interacting economic agents exchanging wealth. Another review Castellano2009 discussed a wide range of topics, from opinion formation to cultural and language dynamics and crowd behavior, in which physicists studied the collective phenomena emerging from the interactions of individuals as elementary units in social structures. Statistical mechanics of optimization problems would be well worth a review in itself (see e.g., mezard2009information; krzakala2012compressedsensing; mezard2002k-sat).

The present review discusses recent attempts to model and describe resource allocation problems in a population of individuals within a statistical mechanics approach. We focus on competitive resource allocation models with a fully decentralised decision process, that is, without explicit communication between the agents. We expect interaction to play a central role and to give rise to collective phenomena, such as interesting fluctuations, long memory and phase transitions. In addition the agents have a strong incentive to think and act as differently as possible in this type of competitive situation and possibly to revise their strategies Arthur, which implies strong heterogeneity and non-equilibrium dynamics. These ingredients are very appealing to physicists, who possess the tools and concepts able to analyse and sometimes solve the dynamics of such systems, and who feel that they may contribute in a significant way to the understanding of such systems. The usual caveat is that socio-economical agents may be orders of magnitude harder to model than, say, electrons since they have no immutable properties and are adaptive. This, we believe, only adds to their attractiveness and, since the methods of statistical mechanics are able to solve complex models of adaptive agents, it only makes physicists’ point stronger. Our aim is to provide an account of the various mathematical methods used for this family of models and to discuss their dynamics from the perspective of physics. Due to brevity of space, we cannot provide many mathematical details and restrict the bibliography to selected topics and representative publications, but refer the reader to books and reviews.

More specifically, we consider a population of $N$ agents that try to exploit $R$ resources. Generically, we assume that $R$ denotes the number of possible choices of the agents, hence that $R\ge 2$. Denote the choice of agent $i$ by $a_i$; his reward, or payoff, is then $u_i(a_i,a_{-i})$, where $a_{-i}$ contains the choices of all the agents other than agent $i$. A Nash equilibrium (NE) corresponds to a set $\{a_i^*\}$ such that $u_i(a_i^*,a_{-i}^*)\ge u_i(a_i,a_{-i}^*)$ for every agent $i$ and every alternative choice $a_i$: it is a maximum of the payoff function and thus no agent has an incentive to deviate from his behavior FudenbergLevine.

Section II is devoted to the simplest case $R=2$. The agents must choose which resource to exploit, or alternatively, to exploit a resource or to abstain from it. We shall start with the El Farol Bar Problem (EFBP) Arthur: $N$ customers compete for $L$ seats at the bar. At every time step, they must choose whether to go to the bar or to stay at home. This section is then mainly devoted to the Minority Game (MG) CZ97, which simplifies the EFBP in many respects by taking $L=N/2$. Section III assumes that the number of resources scales with $N$ and reviews in particular many results about the Kolkata Paise Restaurant problem (KPR), in which restaurants have a capacity to serve one customer each, the agents trying to be alone as often as possible kpr-physica. Section IV extends the discussion to other bipartite problems, where two distinct types of agents must be matched. The Parking space problem adds space and resource heterogeneity to KPR: drivers would like to park as close as possible to their workplace along a linear street and must learn at what distance they are likely to find a vacant space Hanaki2011. It then briefly shows the connection with the celebrated Stable marriage problem, which assumes that men and women have their own ranking of their potential counterparts, and studies what choosing algorithm to apply gale1962college. Finally, it mentions recommendation systems that try to guess the preference lists and suggest items (books, movies, etc.) to customers based on partial information lu2012recommender. The paper concludes with a discussion of the approaches and of the prospects for applying physics tools to a larger domain.

## II Minority Games

### II.1 El Farol Bar Problem

Brian Arthur likes to listen to Irish music on Thursday evenings in the El Farol bar Arthur. So do 99 other potential patrons. Given that there are only 60 seats in this bar, what should he do? How do his 99 competitors make up their minds, week after week? If this game is only played once, the Nash equilibrium consists in attending the concert with probability $L/N$, where $L=60$ is the number of seats and $N=100$ the number of potential patrons. As a side remark, a careful customer may well count the number of seats once he is in the bar, but cannot count the total number of potential patrons, which makes NE really unlikely. Real fans of Irish music do repeatedly wish to take part in the fun and use trial and error instead, an example of bounded rationality.

There are indeed many ways to be imperfect, hence, to bound rationality, while still retaining reinforcement learning abilities. Arthur’s agents are endowed with a small set of personal heuristic attendance predictors that base their analyses on the past attendances. They include linear predictors such as moving averages and constant ones, e.g. 42. Adaptivity consists in using the better predictors with larger likelihood. Arthur assumes that the agents trust their currently best predictors. Adaptivity also includes discarding really bad predictors and replacing them with new ones, as in Darwinian evolution.

A seemingly remarkable result is that even with such limited learning abilities, agents are able to self-organize and to collectively produce an average number of bar goers (attendance) equal to the number of seats. It was later realized that this was a fortunate and generic by-product of the chosen unbiased strategy space, not the result of self-organization; other choices are less forgiving in some circumstances CMO03.

However, Arthur’s point remains fully valid: imperfect agents may reach a socially acceptable outcome by learning imperfectly from random rules in a competitive setting. Ongoing competition forces the agents to be adaptive (taken as a synonym of learning) in order to outsmart each other. There is no ideal predictor, the performance of each one depending on those used by all the agents. Arthur adds that rationality is not applicable in this setting: if everyone is rational, excluding the possibility to take random decisions and assuming that everyone has access to the same analysis tools, everyone takes the same decision, which is the wrong one. Thus, negative feedback mechanisms make heterogeneity of beliefs or analyses a necessary ingredient for efficient resource allocation. In passing, heterogeneity may also emerge in the absence of competition and negative feedback because it yields better outcomes to some of the players, see e.g. matzke2011emergence.

The very reasons for bewilderment among economists and computer scientists who became interested in this model were the same ones that triggered the interest of physicists:

• This model comprises interacting entities. Since they all are heterogeneous, interaction may also be heterogeneous.

• Each run of the game produces a different average attendance, even when the average is taken over an infinite number of time steps, which is reminiscent of disordered systems.

• The fact that the agents learn something may be connected in some way to artificial neural networks.

• It is easy to take large values of the number of customers $N$ and of seats $L$. Intuitively, taking some kind of thermodynamic limit should be possible. This particular idea went against other fields’ intuition at the time, see e.g. Casti.

The icing on the cake was the mean-field nature of this family of models: everybody interacts with everybody else because individual rewards depend on the choice of the whole population and rewards are synchronized.

### II.2 From El Farol to Minority Games

The original version of the EFBP focuses on average attendance, i.e., equilibrium, and has a loosely defined strategy space. Fluctuations are potentially much richer than average attendance. Focusing on them, i.e., on efficiency, amounts to setting $L=N/2$ (seats for half of the $N$ customers) and considering a symmetric strategy space: this is the Minority Game CZ97. We refer the reader to the nice preface written by Brian Arthur in MGbook.

It is useful to think of this model as two separate parts:

• The minority mechanism that is responsible for the interaction between the players and negative feed-back.

• The learning scheme that determines the overall allocation performance.

The minority rule can be formalized as follows: each one of the $N$ agents must select one of two options; the winners are those who choose the least popular option, i.e., who are in the minority. Mathematically, if the action of agent $i$ is $a_i\in\{-1,+1\}$, the global action is $A=\sum_{i=1}^{N}a_i$; being in the minority is achieved if $a_i$ and $A$ have opposite signs; hence, the payoff is $-a_iA$.

The MG is a negative sum game: the sum of all payoffs is $-\sum_i a_iA=-A^2\le 0$, with strict inequality if $N$ is odd. This is why the fluctuations of $A$, $\sigma^2=\langle A^2\rangle$, are a measure of global losses. Another important measure is the asymmetry of the average outcome, i.e., the crumbs left by the agents, measured by $\langle A\rangle^2$, where the brackets stand for temporal average in the stationary state. When $\langle A\rangle\neq 0$, the outcome is statistically predictable.
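The minority rule and the negative-sum property can be checked on a toy round. The snippet below is a minimal sketch that assumes the linear payoff $-a_iA$ with actions coded as $\pm1$; the function name `minority_round` is ours and purely illustrative.

```python
def minority_round(actions):
    """Return (A, payoffs) for one round of the minority rule.

    actions: list of +1/-1 choices, one per agent.
    Assumption: linear payoff -a_i * A for agent i.
    """
    total = sum(actions)                      # global action A
    payoffs = [-a * total for a in actions]   # positive only on the minority side
    return total, payoffs


if __name__ == "__main__":
    A, payoffs = minority_round([+1, +1, +1, -1, -1])
    # The two agents playing -1 are in the minority and are the only winners.
    print(A, payoffs, sum(payoffs))
```

Summing the payoffs always gives $-A^2$, so the total loss vanishes only when $A=0$, which cannot happen for an odd number of agents.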

Intuitively, a stable situation is reached if no agent has an incentive to deviate from his current behavior; this is known as a Nash equilibrium. When all the agents have the same expected gain, the equilibrium is called symmetric, and asymmetric otherwise. The Nash equilibria of the MG are discussed in MC00:

1. A symmetric NE is obtained when all the agents toss a coin to choose their action; it corresponds to $\sigma^2=\langle A^2\rangle=N$, which yields an imperfect allocation efficiency. Another such NE corresponds to $A=0$ (for $N$ even).

2. An asymmetric NE is obtained when for odd. Other such equilibria are reached when agents play and play , while the remaining play randomly. There are very many of them.
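For very small populations, the pure-strategy equilibria can be enumerated by brute force. The sketch below assumes the linear payoff $-a_iA$ and checks only unilateral deviations in pure strategies; mixed equilibria such as the coin-tossing one are outside its scope, and all function names are illustrative.

```python
from itertools import product


def payoff(actions, i):
    """Linear minority payoff of agent i (assumption: -a_i * A)."""
    return -actions[i] * sum(actions)


def is_pure_nash(actions):
    """True if no single agent gains strictly by flipping its own action."""
    for i in range(len(actions)):
        deviated = list(actions)
        deviated[i] = -deviated[i]
        if payoff(deviated, i) > payoff(actions, i):
            return False
    return True


def pure_nash_equilibria(n_agents):
    """All pure-strategy Nash equilibria among the 2**n_agents configurations."""
    return [c for c in product((-1, 1), repeat=n_agents) if is_pure_nash(c)]


if __name__ == "__main__":
    for n in (4, 5):
        eq = pure_nash_equilibria(n)
        print(n, len(eq), sorted({abs(sum(c)) for c in eq}))
```

Under these assumptions the enumeration finds only configurations with $|A|=1$ for five agents and only configurations with $A=0$ for four agents.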

#### Reinforcement learning and allocation efficiency

Simple Markovian learning schemes are well suited to familiarize oneself with the interplay between learning and fluctuations in MGs. Learning from past actions depends on receiving positive or negative payoffs, reinforcing good actions and punishing bad ones. In minority games, as mentioned earlier, the reward to agent $i$ is $-a_iA$. More generally, the rewards may be $-a_iG(A)$, where $G$ is an odd function. The original game had a sign payoff, i.e. $G(x)=\mathrm{sgn}(x)$, but linear payoffs are better suited to mathematics, since they are less discontinuous Oxf1; CMZe00. Since learning implies playing repeatedly, it is wise to store some information about the past in a register usually called the score. After $t$ time steps, the score of agent $i$ that corresponds to playing $+1$ is

$$U_i(t+1)=-\sum_{t'=1}^{t}\frac{A(t')}{N}=U_i(t)-\frac{A(t)}{N}. \tag{1}$$

Reinforcement learning is achieved if the probability that agent $i$ plays $+1$ increases when $U_i$ increases and vice-versa. It is common to take a logit model McFadden,

$$P[a_i(t)=+1]=\frac{1+\tanh[\Gamma U_i(t)]}{2}, \tag{2}$$

where $\Gamma$ tunes the scale of reaction to a change of score; in other words, it is a learning rate. The limit $\Gamma\to\infty$ of very reactive agents corresponds to Arthur’s prescription of playing the best action at each time step. This defines a simple MG model introduced and studied in MC00; MarsiliMinMaj; mosetti2006minority; CAM08. If all the agent scores start with the same initial condition, they all have the same score evolution; hence, the dynamics of the whole system is determined by Eq. (1) without indices $i$, whose fixed point $U=0$ is unstable if $\Gamma$ exceeds a critical value $\Gamma_c$. In this case, learning takes place too rapidly; a finite fraction of agents reacts strongly to random fluctuations and herds on them. This produces a bimodal distribution of $A$, hence fluctuations $\sigma^2=\langle A^2\rangle\propto N^2$. On the other hand, if $\Gamma<\Gamma_c$, fluctuations are of binomial type, $\sigma^2\propto N$. To perform better, the agents must behave in a different way. For instance, more heterogeneity is good: the more non-uniform the initial conditions $U_i(0)$, the smaller $\sigma^2$ and the higher $\Gamma_c$ MarsiliMinMaj.
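The learning dynamics of Eqs. (1) and (2) is short enough to simulate directly. The sketch below is an illustration under stated assumptions rather than the reference implementation of the cited works: it uses the linear payoff, discards no transient, and all names (`simulate_simple_mg`, `initial_scores`) are hypothetical. As a back-of-the-envelope reading of the instability, replacing $A/N$ by its mean $\tanh(\Gamma U)$ for identical scores gives the map $U\to U-\tanh(\Gamma U)$, whose slope at $U=0$ is $1-\Gamma$; this heuristic would place the threshold at $\Gamma=2$, an estimate of ours that the text does not state.

```python
import math
import random


def simulate_simple_mg(n_agents, gamma, n_steps, seed=0, initial_scores=None):
    """Iterate Eqs. (1)-(2) and return the time average of A(t)**2."""
    rng = random.Random(seed)
    if initial_scores is None:
        scores = [0.0] * n_agents          # identical initial conditions
    else:
        scores = list(initial_scores)      # heterogeneous initial conditions
    sum_a2 = 0.0
    for _ in range(n_steps):
        # Eq. (2): probability of playing +1 given the current score
        actions = [
            1 if rng.random() < 0.5 * (1.0 + math.tanh(gamma * u)) else -1
            for u in scores
        ]
        total = sum(actions)               # global action A(t)
        # Eq. (1): every score moves against the realised global action
        scores = [u - total / n_agents for u in scores]
        sum_a2 += total * total
    return sum_a2 / n_steps


if __name__ == "__main__":
    n = 101
    for gamma in (0.5, 10.0):
        print(gamma, simulate_simple_mg(n, gamma, 2000) / n)
```

Comparing the printed ratio $\sigma^2/N$ for a small and a large $\Gamma$ is one way to look for the two regimes described above; passing spread-out `initial_scores` lets one probe the effect of heterogeneity.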

The simple model described above does reach a symmetric equilibrium which is not of the Nash type; one can think of it as a competitive equilibrium. How to maximize efficiency is a recurrent theme in the literature (see Chapter 5 of MGbook). A seemingly small modification to the learning scheme described helps the agents reach an asymmetric NE; the key point is self-impact: when evaluating the performance of the choices $+1$ and $-1$, the agents should account for their own impact on their payoff. More precisely, the payoff is $-a_iA=-1-a_iA_{-i}$, where $A_{-i}=A-a_i$: the chosen action on average yields a smaller payoff than the other one, a generic feature of negative feedback mechanisms. This is why MC00 proposed to modify Eq. (1) into

$$U_i(t+1)=U_i(t)-\frac{A(t)-\eta a_i(t)}{N},$$

where $\eta$ allows agent $i$ to discount his own contribution; as soon as $\eta>0$, the agents reach an optimal asymmetric NE ($|A|=1$).

There is a simpler way to obtain a similar result, however: laziness (or inertia). Reents assume that the agents in the minority do not attempt to change their decision, while those in the majority do so with fixed probability $p$. The process being Markovian, a full analytical treatment is possible. Since there are $(N+|A|)/2$ losers, the number of agents that invert their decisions is proportional to $pN$. Accordingly, the three regimes described above still exist depending on $pN$.

Quite nicely, the agents never need to know the precise value of , only whether they won or not; in addition, the convergence time to is of order when . This performance comes at a cost: the agents need to choose as a function of , i.e., they need to know the number of players, which assumes some kind of initial synchronization or central authority.
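The lazy dynamics can be sketched in a few lines. The code below is a minimal illustration, assuming that only agents on the majority side flip, each independently with the same fixed probability `p`; the function names and the random initial condition are our own choices.

```python
import random


def lazy_step(actions, p, rng):
    """Minority agents keep their choice; each majority agent flips with probability p."""
    total = sum(actions)
    if total == 0:
        return list(actions)              # no majority (only possible for even N)
    majority = 1 if total > 0 else -1
    return [-a if (a == majority and rng.random() < p) else a for a in actions]


def run_lazy_dynamics(n_agents, p, n_steps, seed=0):
    """Return the sequence A(0), ..., A(n_steps - 1) from a random start."""
    rng = random.Random(seed)
    actions = [rng.choice((-1, 1)) for _ in range(n_agents)]
    history = []
    for _ in range(n_steps):
        history.append(sum(actions))
        actions = lazy_step(actions, p, rng)
    return history


if __name__ == "__main__":
    n = 1001
    print(run_lazy_dynamics(n, 1.0 / n, 30))
```

Note that each agent only needs to know whether it lost the last round, not the value of the global action; the choice `p = 1/n` in the usage line is an arbitrary illustration of a probability that depends on the number of players.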

All the above approaches do not optimize the speed of convergence to the most efficient state. Dhar2011 noticed that for a given the probability of switching should be such that the expected value of is 0, which is achieved when

$$p(t)=\frac{|A(t)|-1}{N+|A(t)|}. \tag{3}$$

This dynamics holds the current record for the speed of convergence to , which scales as time steps. As an illustration . The price to pay was of course to give even more information to the agents: computing of Eq. (3) requires the knowledge of and . This kind of dynamics was extended further in Biswas2012: replacing by in Eq. (3) allows a dynamical phase transition to take place at : when , ; when , converges to 1 in a time proportional to , which duly diverges at . A similar picture emerges when each agent has his own .
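Eq. (3) can be read as simple bookkeeping, which the sketch below makes explicit. It assumes $N$ odd and that only majority agents switch: the majority then has $(N+|A|)/2$ members, and $(|A|-1)/2$ of them must switch to leave a majority of exactly one, so that the ratio of the two numbers is Eq. (3). This reading and the helper names are ours, not the cited authors' code.

```python
def switch_probability(total, n_agents):
    """Eq. (3): switching probability for majority agents, given A(t) = total."""
    return (abs(total) - 1) / (n_agents + abs(total))


def expected_switchers(total, n_agents):
    """Expected number of majority agents that switch in one step."""
    majority_size = (n_agents + abs(total)) // 2
    return majority_size * switch_probability(total, n_agents)


if __name__ == "__main__":
    # 101 agents, 61 of them in the majority (A = 21)
    print(switch_probability(21, 101), expected_switchers(21, 101))
```

With 101 agents and $A=21$, the expected number of switchers is $10=(|A|-1)/2$, which would leave 51 agents on one side and 50 on the other.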

Finally, all these types of simple conditional dynamics are very similar to those proposed in the reinforcement learning literature sutton1998reinforcement, although nobody ever made explicit connections. This point is discussed further in Sec. II.3.1.

#### Original Minority Game

The original MG follows the setup of the EFBP: it adds a layer of strategic complexity to this setup, as the agents choose which predictors to use rather than what actions to take. More specifically, a predictor specifies what to do for every state of the world (which was a vector of past attendance in the EFBP). For the sake of simplicity, we assume that the set of the states of the world has $P$ elements. A predictor $a$ is therefore a fixed function that transforms every state $\mu\in\{1,\dots,P\}$ into a choice $a^\mu\in\{-1,+1\}$, which is nothing else than a vector of $P$ binary choices that the literature on the MG prefers to call strategies. There are $2^P$ of them. Since $P$ does not depend on $N$ in any way, this ensures that one can define a proper thermodynamic limit, in contrast with the EFBP. In addition, one already can predict that fluctuations are likely to be large if there are many more agents than available strategies: some strategies will be used by a finite fraction of agents, leading to identical behavior, or herding.

In the original MG, $\mu$ is a number corresponding to the binary encoding of the last $M$ past winning choices, $M$ being the history length; hence, $P=2^M$. Note that simple MGs discussed in Sec. II.2.1 will be referred to as $P=1$ henceforth, since they involve a single state of the world.

Adaptivity consists of being able to change one’s behavior and is highly desirable in a competitive setting. In the original MG, the agents therefore need at least two strategies to be adaptive; for the sake of simplicity, we shall only consider here agents with $S=2$ strategies $a_{i,s}$, where $s$ can take two values; it is advantageous to denote them $s=\pm1$. The case $S>2$ is investigated in e.g. Savit2; MCZe00; CoolenS>2; AdemarS>2.

In addition, the agents use reinforcement learning on the strategies, not on the bare actions. One thus attributes a score to each strategy that evolves according to

$$U_{i,s}(t+1)=U_{i,s}(t)-a_{i,s}^{\mu(t)}\frac{A(t)}{N}, \tag{4}$$

where $A(t)=\sum_{i=1}^{N}a_{i,s_i(t)}^{\mu(t)}$ and $s_i(t)$ denotes the strategy played by agent $i$ at time $t$. Using also a logit model, as in Eq. (2), one writes

$$P[s_i(t)=+1]=\frac{e^{\Gamma U_{i,+}(t)}}{e^{\Gamma U_{i,+}(t)}+e^{\Gamma U_{i,-}(t)}}=\frac{1+\tanh\left[\Gamma\left(U_{i,+}(t)-U_{i,-}(t)\right)/2\right]}{2}. \tag{5}$$

The original MG follows Arthur’s ‘use-the-best’ prescription, which corresponds to , while finite was introduced by Oxf1 as an inverse temperature. Extrapolating the results for in Sec. II.2.1, one expects some herding for larger than some value provided that is “large enough”.

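Equation (5) shows that the choice of a strategy only depends on the difference of scores when each agent has two strategies. It is therefore useful to introduce $Y_i=\Gamma(U_{i,+}-U_{i,-})/2$ and $\xi_i^\mu=(a_{i,+}^\mu-a_{i,-}^\mu)/2$; Eq. (4) then becomes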

$$Y_i(t+1)=Y_i(t)-\Gamma\,\xi_i^{\mu(t)}\frac{A(t)}{N}. \tag{6}$$

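If one denotes $\omega_i^\mu=(a_{i,+}^\mu+a_{i,-}^\mu)/2$, the individual action can be written

$$a_i(t)=\omega_i^{\mu(t)}+\xi_i^{\mu(t)}s_i(t), \tag{7}$$

and thus $A(t)=\sum_{i}\omega_i^{\mu(t)}+\sum_{i}\xi_i^{\mu(t)}s_i(t)$, which strongly suggests introducing $\Omega^\mu=\sum_{i=1}^{N}\omega_i^\mu$; finally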

$$A(t)=\Omega^{\mu(t)}+\sum_{i=1}^{N}\xi_i^{\mu(t)}s_i(t). \tag{8}$$
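The step from Eq. (4) to Eqs. (6) and (8) can be spelled out as follows. This is a sketch that assumes the definitions $Y_i=\Gamma(U_{i,+}-U_{i,-})/2$, $\xi_i^\mu=(a_{i,+}^\mu-a_{i,-}^\mu)/2$ and $\omega_i^\mu=(a_{i,+}^\mu+a_{i,-}^\mu)/2$, which are the ones that make Eqs. (5) to (8) mutually consistent.

```latex
\begin{align*}
U_{i,+}(t+1)-U_{i,-}(t+1)
  &= U_{i,+}(t)-U_{i,-}(t)
     -\left(a_{i,+}^{\mu(t)}-a_{i,-}^{\mu(t)}\right)\frac{A(t)}{N}
  && \text{(Eq. (4) for $s=+$ minus $s=-$)} \\
\Rightarrow\quad Y_i(t+1)
  &= Y_i(t)-\frac{\Gamma}{2}\,2\,\xi_i^{\mu(t)}\frac{A(t)}{N}
   = Y_i(t)-\Gamma\,\xi_i^{\mu(t)}\frac{A(t)}{N}
  && \text{(multiply by $\Gamma/2$)} \\
A(t)=\sum_{i=1}^{N}a_i(t)
  &= \sum_{i=1}^{N}\omega_i^{\mu(t)}+\sum_{i=1}^{N}\xi_i^{\mu(t)}s_i(t)
   = \Omega^{\mu(t)}+\sum_{i=1}^{N}\xi_i^{\mu(t)}s_i(t)
  && \text{(sum Eq. (7) over $i$)}
\end{align*}
```

With these definitions, Eq. (5) reads $P[s_i(t)=+1]=[1+\tanh Y_i(t)]/2$, so that the average choice of agent $i$ at fixed score is $\tanh Y_i$.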

Savit showed that the control parameter of the MG is $\alpha=P/N$. In other words, properly rescaled macroscopic measurables are invariant at fixed $\alpha$, for instance when both $P$ and $N$ are doubled; this opens the way to systematic studies and to taking the thermodynamic limit. When performing numerical simulations, too many papers overlook the need to account first for the transient dynamics that leads to the stationary state, and many more overlook that this model has an intrinsic timescale proportional to $P$. Intuitively, this is because the agents need to explore all answers from both their strategies in order to figure out which one is better than the other one. As a rule of thumb, one is advised to wait for a large multiple of this timescale before taking averages over a comparably long window, although such a rough estimate is probably too small near a critical point (see Sec. II.3.3).
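A bare-bones simulation makes the role of the transient explicit. The sketch below rests on several simplifying assumptions that are ours: two random strategies per agent, the ‘use-the-best’ rule (the limit of very large $\Gamma$) with random tie-breaking, and states of the world drawn uniformly at random instead of being built from the history of winning choices. The function and parameter names are illustrative.

```python
import random


def simulate_minority_game(n_agents, n_states, n_transient, n_measure, seed=0):
    """Return sigma^2 / N measured after discarding n_transient steps."""
    rng = random.Random(seed)
    # strategies[i][s][mu] is the action prescribed to agent i by strategy s in state mu
    strategies = [
        [[rng.choice((-1, 1)) for _ in range(n_states)] for _ in range(2)]
        for _ in range(n_agents)
    ]
    scores = [[0.0, 0.0] for _ in range(n_agents)]
    sum_a2 = 0.0
    for step in range(n_transient + n_measure):
        mu = rng.randrange(n_states)          # assumption: exogenous random state
        total = 0
        for i in range(n_agents):
            u0, u1 = scores[i]
            if u0 > u1:
                s = 0
            elif u1 > u0:
                s = 1
            else:
                s = rng.randrange(2)          # break ties at random
            total += strategies[i][s][mu]
        # Eq. (4): update the scores of both strategies, played or not
        for i in range(n_agents):
            for s in range(2):
                scores[i][s] -= strategies[i][s][mu] * total / n_agents
        if step >= n_transient:
            sum_a2 += total * total
    return sum_a2 / n_measure / n_agents


if __name__ == "__main__":
    n_agents = 51
    for n_states in (4, 16, 64, 256):
        wait = 100 * n_states                 # waiting time scales with the number of states
        ratio = simulate_minority_game(n_agents, n_states, wait, wait)
        print(n_states / n_agents, ratio)
```

Scaling both the waiting time and the measurement window with the number of states follows the advice above; the factor 100 in the usage example is an arbitrary choice, not a recommended value.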

Figure 1 reports the scaled fluctuations $\sigma^2/N$, with $\sigma^2=\langle A^2\rangle$, as a function of $\alpha$ for various system sizes. The generic features of this kind of plot are the following:

1. The collapse is indeed excellent, up to finite size effects.

2. In the limit $\alpha\to\infty$, $\sigma^2/N\to1$, which corresponds to random strategy choices; since the latter are initially attributed randomly, the resulting actions are also random.

3. In the limit $\alpha\to0$, $\sigma^2\propto N^2$, which means that a finite fraction of agents is synchronized, i.e., herds.

4. There is a clear minimum of $\sigma^2/N$ at a value of $\alpha$ whose precise location depends on $N$ (see also Fig. 3).

Savit also note that the average sign of $A$ conditional on a given $\mu$ is zero for $\alpha<\alpha_c$ and systematically different from zero for $\alpha>\alpha_c$, which means that there is some predictability in the asymmetric phase. Denoting the average of $A$ conditional on $\mu$ as $\langle A|\mu\rangle$, one defines a smoother conditional predictability

$$H=\frac{1}{P}\sum_{\mu=1}^{P}\langle A|\mu\rangle^2. \tag{9}$$

Figure 1 reports the behavior of $H$ as a function of $\alpha$: there is a transition at $\alpha_c$ where $H$ vanishes. In addition, $H$ behaves in a smooth way close to the transition. Since $H$ is a measure of asymmetry, this behavior is in fact the tell-tale sign of a second-order phase transition with broken symmetry. Accordingly, the two phases are known as symmetric ($\alpha<\alpha_c$) and asymmetric ($\alpha>\alpha_c$), or (un-)predictable.
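Given a recorded time series of states and global actions, the predictability of Eq. (9) is a plain conditional average. The sketch below assumes that states are labelled by integers and that a state never visited during the measurement contributes zero to the sum; both conventions, and the function name, are our own.

```python
from collections import defaultdict


def predictability(states, global_actions, n_states):
    """Estimate H = (1/P) * sum_mu <A|mu>^2 from paired samples (mu(t), A(t))."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for mu, a in zip(states, global_actions):
        sums[mu] += a
        counts[mu] += 1
    # Assumption: unvisited states contribute 0 to the sum
    return sum((sums[mu] / counts[mu]) ** 2 for mu in counts) / n_states


if __name__ == "__main__":
    # <A|0> = 3 and <A|1> = -2, hence H = (9 + 4) / 2
    print(predictability([0, 0, 1, 1], [2, 4, -1, -3], n_states=2))
```

Since the estimate squares sample means, finite samples bias it upward: a short measurement window can make $H$ look positive even when the conditional averages vanish.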

Provided that the agents are given look-up tables, the presence of a phase transition is very robust with respect to changes in the choice of strategies and various sources of noise in the decision-making process. galla2008transition show that MGs with look-up tables undergo this kind of phase transition as long as a finite fraction of the agents behaves as those of the original MG. The stationary state does not depend on the value of in the asymmetric phase, nor does the location of the phase transition. This is remarkable, as this parameter was introduced as an inverse temperature, but it is not able to cause a phase transition. In fact, it is rather the time scale over which the agents average the fluctuations of their s. In the symmetric phase, however, as shown in Fig. 2, the fluctuations decrease when decreases DosiExp; Oxf1; this is because adding noise to the decision process breaks the herding tendency of the agents by decreasing the sensitivity of the agents to fluctuations of their payoffs, exactly as in the case. Even more, one can show (see Sec. II.3.3) that the dynamics becomes deterministic when . In this limit for all values of MC01.

Initial conditions, i.e., initial score valuations, have an influence only on the stationary state of the symmetric phase, i.e., on the emergence of large fluctuations dhulst2000strategy; Moro1. The insights of the simple ($P=1$) case of Sec. II.2.1 are still valid MarsiliMinMaj: large fluctuations are killed by sufficiently biased initial conditions. This point will be discussed again in Sec. II.3.

### II.3 Mathematical approaches

Statistical mechanics has been applied successfully to two-player games that have a large number of possible choices galla2013complex; berg1998matrix; berg2000statistical. The MG case is exactly the opposite: two choices, but very many players, with proportionally many states of the world.

#### Algebra: why is there a critical point?

Before understanding why $H=0$ for $\alpha<\alpha_c$, it is wise to investigate why $H=0$ is possible at all MC01. From Eq. (9), setting $H=0$ requires all conditional averages to be zero, i.e., $\langle A|\mu\rangle=0$ for all $\mu$, i.e., from Eq. (8),

$$\sum_{i}\xi_i^{\mu}\langle s_i\rangle=-\Omega^{\mu}. \tag{10}$$

It helps to think of $\langle s_i\rangle$ as a continuous variable in $[-1,1]$: achieving $H=0$ requires solving a system of $P$ linear equations in $N$ variables.

This set of equations yields surprisingly many insights on the stationary state of minority game-like models:

1. The fact that $\alpha_c<1$ means that one needs more than $P$ variables to solve this set of equations; this is because the $\langle s_i\rangle$s are bounded.

2. The control parameter is the ratio between the number of equations and the number of variables, $\alpha=P/N$, and not $2^P/N$, i.e., the total number of possible strategies per agent.

3. is always a solution if all . In other words, if all the agents have two opposite strategies, one predicts that for , and that ; the exact solution confirms this intuition MMM. What happens for is similar whatever the distribution of : the degrees of freedom not needed to cancel allow the agents to herd and synchronize; as a consequence, .

4. Since the $\langle s_i\rangle$s are bounded, some agents have $|\langle s_i\rangle|=1$ when the equations are not satisfied: those agents always play the same strategy. They are fittingly called frozen CM99. Once frozen, the contribution of an agent is fixed, hence can be incorporated into the fields $\Omega^\mu$. Accordingly, the number of degrees of freedom decreases; denoting the fraction of frozen agents by $\phi$, the remaining number of degrees of freedom is $(1-\phi)N$. At $\alpha_c$, the number of degrees of freedom must equal $P$, i.e., $1-\phi(\alpha_c)=\alpha_c$ MCZe00; MC01. This intuition is confirmed by the exact solution of the model (see Sec. II.3.6).

5. This set of equations specifies which subspace is spanned by the dynamic variables in the stationary state MC01. GallaClubbing noted that as long as the dynamics is of the form for some function , a similar set of equations is solved by the dynamics; however, if acquires dependence on , or if is multiplied by a discount factor, or if a constant is added to the payoff, no such set of equations holds and no phase transition is found.
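The counting argument of item 4 can be written as a one-line worked equation. This is only a sketch of the heuristic, assuming that $\phi$ denotes the fraction of frozen agents and that the control parameter is $\alpha=P/N$; it does not replace the exact solution.

```latex
\begin{equation*}
\underbrace{(1-\phi)\,N}_{\text{free variables } \langle s_i\rangle}
\;=\;
\underbrace{P}_{\text{equations, one per } \mu}
\quad\Longrightarrow\quad
1-\phi(\alpha_c)=\frac{P}{N}=\alpha_c .
\end{equation*}
```

In words, the transition sits where the unfrozen agents provide exactly as many adjustable variables as there are conditions $\langle A|\mu\rangle=0$ to satisfy.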

#### Continuous time

MC01 derive the continuous-time limit of Eq. (6). The key idea is to average the payoffs to agents in a time window of length proportional to the intrinsic time scale of the system, and thus to define a rescaled continuous time $\tau$. In the thermodynamic limit, the rescaled scores, denoted $y_i(\tau)$, become continuous functions of $\tau$, and one finds

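$$\frac{dy_i}{d\tau}=-\overline{\xi_i^{\mu}\langle A(\tau)|\mu\rangle_y}+\zeta_i \tag{11}$$

$$\phantom{\frac{dy_i}{d\tau}}=-h_i-\sum_{j}J_{i,j}\tanh(y_j)+\zeta_i, \tag{12}$$

where the average $\langle\cdots\rangle_y$ is over the distribution of the $s_i$s at time $\tau$, i.e., depends on the $y_i$s at time $\tau$ through $\langle s_i\rangle_y=\tanh(y_i)$, the overline denotes the average over the states $\mu$, $h_i=\overline{\Omega^{\mu}\xi_i^{\mu}}$ and $J_{i,j}=\overline{\xi_i^{\mu}\xi_j^{\mu}}$ (the signs in Eq. (12) follow from inserting Eq. (8) into Eq. (11) and agree with Eq. (15)), and the noise term is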

$$\langle\zeta_i(\tau)\rangle=0, \tag{13}$$

$$\langle\zeta_i(\tau)\zeta_j(\tau')\rangle=\frac{\Gamma}{N}\,\overline{\xi_i^{\mu}\xi_j^{\mu}\langle A^2|\mu\rangle_y}\,\delta(\tau-\tau'). \tag{14}$$

This shows that the dynamics becomes deterministic when $\Gamma\to0$.

The autocorrelation of the noise term does not vanish in the thermodynamic limit. Even more, it is proportional to the instantaneous fluctuations, which makes sense: this reflects the uncertainty faced by the agents, which is precisely $\langle A^2|\mu\rangle_y$. This is in fact a powerful feedback loop and is responsible for the build-up of fluctuations near the critical point. Deep in the asymmetric phase, this feedback is negligible, thus $\langle A^2|\mu\rangle_y\simeq\sigma^2$ is a good approximation, which consists of equating the instantaneous volatility to the stationary volatility. This is in fact a very good approximation over the whole range of $\alpha$. The noise autocorrelation then takes a form familiar to physicists,

$$\langle\zeta_i(\tau)\zeta_j(\tau')\rangle\simeq2T\,J_{i,j}\,\delta(\tau'-\tau),\qquad\text{with}\quad T=\frac{\Gamma\sigma^2}{2N}. \tag{15}$$

This effective theory is self-consistent: depends on , which depends on , which depends on . The probability distribution function in the stationary state is given in MC01.

The derivation of continuous-time dynamics makes it possible to apply results from the theory of stochastic differential equations. Using Veretennikov’s theorem veretennikov2000polynomial, ortisi2008polynomial derives an upper bound to the speed of convergence to the stationary state which expectedly scales as for .

#### Signal-to-noise ratio, finite size effects and large fluctuations

Figure 3 shows the existence of finite-size effects near . In particular, the larger the system size, the smaller the minimum value of and the smaller the location of its minimum. To understand why this happens, one has to take the point of view of the agents, i.e., of their perception of the world, which is nothing else than Eq. (11). The fluctuations of the score of agents and become correlated via their noise terms if the strength of the latter becomes comparable to that of their payoffs, i.e., when , where is a proportionality factor. Since , this condition becomes, by incorporating into ,

$$
\frac{H}{\sigma^2}=\frac{K}{\sqrt{P}}. \qquad (16)
$$

and are known from the exact solution for infinite systems (see Sec. II.3.6). The above intuition is confirmed by numerical simulations and the exact solution (Fig. 3): one sees that the intersection between and the ratio given by the exact solution predicts the point at which deviates significantly from the exact solution, defined as the locus of its minimum. Since for (see Sec. II.3.6), the size of this region scales as . Similar transitions are found in all MGs in which the noise may acquire a sufficient strength, in particular in market-like grand-canonical games (see Sec. II.5); the procedure to find them is the same: derive continuous time equations, compute the inter-agent noise correlation strength, and match it with the drift term. This transition is ubiquitous: it happens in any model underlaid by a minority mechanism when the agents do not account for their impact.2

#### Reduced set of strategies

A naive argument suggests that herding should occur when a finite fraction of agents adopt the same strategies, hence that . The problem lies in the definition of sameness: the fraction of different predictions between strategies and is the Hamming distance

$$
d(a,b)=\frac{1-\frac{1}{P}\sum_\mu a^\mu b^\mu}{2}. \qquad (17)
$$

For large , two strategies do not differ by much if they differ by only one of their predictions. ZENews defines three levels of sameness: either same (), opposite (), or uncorrelated (). Starting from an arbitrary strategy , there are exactly strategies that are either same, opposite, or uncorrelated with and with each other CZ98; this is called the reduced strategy set (RSS). Forcing the agents to draw their strategies from this set yields very similar CZ98. Now, since , the RSS makes it possible to decouple the correlation term into the contributions of uncorrelated, correlated and anti-correlated agents; the latter two are known as herds and anti-herds, or crowds and anti-crowds. Thus, the final value of the fluctuations can be seen as the result of competition between herding and anti-herding. This yields several types of analytical approximations to the fluctuations that explain the global shape of as a function of . More generally, as it greatly reduces the dimension of the strategy space, this approach simplifies the dynamics of the model and allows one to study it in minute detail; it has been applied to a variety of extensions JohnsonCrowds; JohnsonDeterministic; JohnsonCrowdsTheory; choe2004errordriventransition. In addition, when the agents only remember the last payoffs, the whole dynamics is Markovian of order ; simple analytical formulations give many insights about the origin of large fluctuations JohnsonHorizon; satinover2008cycles.
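The notion of sameness can be made concrete with a short sketch. It assumes that strategies are vectors of ±1 predictions of length P = 2^M and that the RSS contains 2P members; the Sylvester-Hadamard construction used here is only one convenient way to obtain mutually uncorrelated strategies, and the function names are illustrative, not taken from the cited works.

```python
def sylvester_hadamard(m):
    """Return the 2^m x 2^m Sylvester-Hadamard matrix as a list of +/-1 rows."""
    rows = [[1]]
    for _ in range(m):
        top = [r + r for r in rows]
        bottom = [r + [-x for x in r] for r in rows]
        rows = top + bottom
    return rows


def hamming_fraction(a, b):
    """Fraction of histories on which strategies a and b predict differently."""
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def reduced_strategy_set(m):
    """Assumed RSS: the P mutually orthogonal rows plus their P negations."""
    rows = sylvester_hadamard(m)
    return rows + [[-x for x in r] for r in rows]


M = 3                      # memory length (arbitrary choice for the demo)
P = 2 ** M
rss = reduced_strategy_set(M)
# Every pair is either same (0), uncorrelated (1/2) or opposite (1).
distances = {hamming_fraction(a, b) for a in rss for b in rss}
print(len(rss), sorted(distances))   # 16 [0.0, 0.5, 1.0]
```

For ±1 entries, the fraction of differing predictions equals one half of one minus the normalized overlap, which is why orthogonal rows sit at distance 1/2.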

#### The road to statistical mechanics

A great simplification comes from the fact that the global shapes of and are mostly unchanged if one replaces the bit-string dynamics of s with random drawn uniformly with equal probability Cavagna.3
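A minimal simulation sketch of the game with random histories is given below. It assumes the standard setup (two random ±1 look-up strategies per agent, linear minority payoff, agents playing their best-scoring strategy); these definitions are not restated in this section, so the details and all names are assumptions chosen for illustration.

```python
import random


def simulate_minority_game(n_agents, n_patterns, n_steps, seed=0):
    """On-line MG with S=2 strategies per agent and random histories.

    Returns the aggregate actions A(t) recorded over the second half of the run.
    """
    rng = random.Random(seed)
    # strategies[i][s][mu] is the +/-1 action of strategy s of agent i for history mu
    strategies = [
        [[rng.choice((-1, 1)) for _ in range(n_patterns)] for _ in range(2)]
        for _ in range(n_agents)
    ]
    scores = [[0, 0] for _ in range(n_agents)]
    recorded = []
    for t in range(n_steps):
        mu = rng.randrange(n_patterns)  # random history instead of a bit string
        # each agent plays its best strategy so far; ties are broken at random
        choices = []
        for u in scores:
            if u[0] == u[1]:
                choices.append(rng.randrange(2))
            else:
                choices.append(0 if u[0] > u[1] else 1)
        A = sum(strategies[i][choices[i]][mu] for i in range(n_agents))
        # linear minority payoff: strategies that predicted the minority side gain
        for i in range(n_agents):
            for s in range(2):
                scores[i][s] -= strategies[i][s][mu] * A
        if t >= n_steps // 2:
            recorded.append(A)
    return recorded


N, M = 51, 5
P = 2 ** M
A_series = simulate_minority_game(N, P, n_steps=4000)
sigma2 = sum(a * a for a in A_series) / len(A_series)
print(f"alpha = {P / N:.2f}, sigma^2/N = {sigma2 / N:.3f}")
```

The run length and system size are kept small for speed; quantitative comparisons with the figures would require longer runs and averages over many realizations of the strategies.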

In short, two methods are known to produce exact results: the replica trick and generating functionals à la DeDominicis. The replica trick is simpler but less rigorous; in addition, it requires determining what quantity the dynamics minimizes, which is both an advantage, as this quantity reveals great insights about the global dynamics, and a curse, as there may be no discernible minimized quantity, precluding the use of this method. Generating functionals consist of rigorous ab initio calculus and do not require knowledge of the minimized quantity, which is both regrettable and a great advantage (invert the above statements about the replica calculus). The following account aims at giving the spirit of these methods and what to expect from them, i.e., their main results, their level of complexity and their limitations.

For lack of space, we can only give the principles of the methods in question. Detailed calculus is found in demartino2006statistical, who deal with statistical mechanics applied to multi-agent models of socio-economic systems, galla2006anomalous who review anomalous dynamics in multi-agent models of financial markets, and the two books on the MG MGbook; CoolenBook.

#### Replica

The drift term in Eq. (11) contains the key to determine the quantity minimized by the dynamics: one can write ; therefore, the predictability is akin to a potential. When , the dynamics is deterministic and is a Lyapunov function of the system and is minimized; when , still tends to its minimum. A similar line of reasoning applies to non-linear payoffs and yields more intricate expressions MC01.

Let us focus on linear payoffs. Given its mathematical definition, possesses a unique minimum as long as , which determines the properties of the system in the stationary state. Regarding as a cost function, i.e., an energy, suggests using a partition function , which yields the minimum of at zero temperature

$$
\min_{\{m_i\}} H = -\lim_{\beta\to\infty}\frac{1}{\beta}\log Z. \qquad (18)
$$

This only holds for a given realization of the game, i.e., for a given set of agents, which is equivalent to fixed (quenched) disorder in the language of physics. Averaging over all possible strategy attributions is easy in principle: one computes . Averages of logarithms are devilishly hard to compute, but the identity leaves some hope: one is left with computing , which must be interpreted as replicas of the same game running simultaneously, each with its own set of variables. The limit is not to be taken as annihilation, but as analytical continuation.
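The replica identity can be checked numerically on a toy example. The sketch below assumes the standard form of the identity, namely that the average of log Z equals the limit of (⟨Z^n⟩ − 1)/n as n goes to zero; the "partition functions" are simply log-normal random numbers invented for the illustration and have nothing to do with an actual MG.

```python
import math
import random

rng = random.Random(1)
# Toy 'partition functions', one per disorder sample (illustrative only)
samples = [math.exp(rng.gauss(0.0, 1.0)) for _ in range(10000)]

# Quenched average: <log Z>, the quantity one actually wants
quenched = sum(math.log(z) for z in samples) / len(samples)

# Annealed average: log <Z>, easier to compute but different in general
annealed = math.log(sum(samples) / len(samples))

# Replica estimate: (<Z^n> - 1) / n for a small, non-integer n
n = 1e-6
replica = (sum(z ** n for z in samples) / len(samples) - 1.0) / n

print(f"quenched = {quenched:.4f}, replica = {replica:.4f}, annealed = {annealed:.4f}")
```

The replica estimate reproduces the quenched average, whereas the annealed average is strictly larger by Jensen's inequality; in the analytical calculation, the small-n limit is reached by continuation from integer n rather than by direct evaluation.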

Finally, one takes the thermodynamic limit, i.e., , at fixed . In this limit, the fluctuations of global quantities induced by different strategy allocations vanish: the system is called self-averaging. In passing, this implies that numerical simulations require fewer samples as the size of the system increases in order to achieve similar error bars.

As usual, one loves exponentials of linear terms when computing partition functions. is a sum of squared terms that are transformed into linear terms averaged over Gaussian auxiliary variables. This finally yields

$$
H_0=\lim_{N\to\infty}\frac{H}{N}=\frac{1+Q_0}{2(1+\chi)^2}, \qquad (19)
$$

where measures strategy-use ‘polarization’ and is the integrated ‘response’ to a small perturbation. These two quantities are defined as

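$$
\begin{aligned}
Q_0 &= 1-\sqrt{\frac{2}{\pi}}\,\frac{e^{-\zeta^2/2}}{\zeta}-\left(1-\frac{1}{\zeta^2}\right)\mathrm{erf}\!\left(\frac{\zeta}{\sqrt{2}}\right), &\quad (20)\\
\chi &= \frac{\mathrm{erf}\!\left(\zeta/\sqrt{2}\right)}{\alpha-\mathrm{erf}\!\left(\zeta/\sqrt{2}\right)}, &\quad (21)
\end{aligned}
$$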

where is determined for by

$$
\alpha=[1+Q_0(\zeta)]\,\zeta^2. \qquad (22)
$$

Hence, is a function of and determines all the above quantities. Since , this equation is easily solved recursively by writing , with .

is only possible when , i.e., when the response function diverges. This happens at the phase transition, which implies that

$$
\alpha_c=\mathrm{erf}\!\left(\frac{\zeta_c}{\sqrt{2}}\right)=0.3374\ldots \qquad (23)
$$

and that near the critical point in the asymmetric phase.

The full distribution of is given by

$$
P(m)=\frac{\phi}{2}\,\delta(m-1)+\frac{\phi}{2}\,\delta(m+1)+\frac{\zeta}{\sqrt{2\pi}}\,e^{-(\zeta m)^2/2}, \qquad (24)
$$

where is the fraction of frozen agents. Incidentally, this confirms that at the critical point, as guessed in Sec. II.3.1.

The fluctuations do not depend on initial conditions for and are given by

$$
\sigma_0^2=H_0+\frac{1}{2}(1-Q_0). \qquad (25)
$$

The more rigorous generating functionals discussed below reproduce all the above equations and bring many more insights on the dynamics. They also show in what limit Eq. (25) is correct CoolenOnline.
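The replica equations are simple enough to be solved numerically in a few lines. The sketch below assumes that Eq. (20) carries the Gaussian factor exp(−ζ²/2), which is the reading that reproduces the value of the critical point quoted in Eq. (23), and it uses plain bisection instead of the recursion mentioned in the text; all function names are illustrative.

```python
import math


def Q0(zeta):
    """Eq. (20), read with the Gaussian factor exp(-zeta^2 / 2) (assumption)."""
    e = math.erf(zeta / math.sqrt(2.0))
    return (1.0
            - math.sqrt(2.0 / math.pi) * math.exp(-zeta ** 2 / 2.0) / zeta
            - (1.0 - 1.0 / zeta ** 2) * e)


def alpha_of_zeta(zeta):
    """Eq. (22): alpha as a function of zeta."""
    return (1.0 + Q0(zeta)) * zeta ** 2


def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Critical point: chi diverges when alpha = erf(zeta / sqrt(2)), Eqs. (21) and (23)
zeta_c = bisect(lambda z: alpha_of_zeta(z) - math.erf(z / math.sqrt(2.0)), 0.1, 2.0)
alpha_c = math.erf(zeta_c / math.sqrt(2.0))


def stationary_state(alpha):
    """Return (Q0, chi, H0, sigma0^2) in the asymmetric phase, alpha > alpha_c."""
    zeta = bisect(lambda z: alpha_of_zeta(z) - alpha, zeta_c, 50.0)
    q = Q0(zeta)
    e = math.erf(zeta / math.sqrt(2.0))
    chi = e / (alpha - e)                      # Eq. (21)
    h0 = (1.0 + q) / (2.0 * (1.0 + chi) ** 2)  # Eq. (19)
    sigma2 = h0 + 0.5 * (1.0 - q)              # Eq. (25)
    return q, chi, h0, sigma2


print(f"alpha_c = {alpha_c:.4f}")
for alpha in (0.5, 1.0, 4.0):
    q, chi, h0, s2 = stationary_state(alpha)
    print(f"alpha={alpha}: Q0={q:.3f} chi={chi:.3f} H0={h0:.3f} sigma0^2={s2:.3f}")
```

The computed critical point should be close to the value quoted in Eq. (23), which provides a consistency check of Eqs. (20) to (23) taken together.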

The case is more complex. The good news is that the above equations are still valid when . By introducing Gaussian fluctuations around the stationary values of , MC01 give the first-order expansion

$$
\sigma_0^2=\frac{1-Q_0}{2}\left[1+\frac{1-Q_0+\alpha(1-3Q_0)}{4\alpha}\,\Gamma+O(\Gamma^2)\right] \qquad (26)
$$

whose validity can be checked in the inset of Fig. 2. Furthermore, MC00 derive above which becomes of order . As shown in Fig. 2, reaches its large value plateau for a finite value of ; as a consequence, the limit can be interpreted as equivalent to large enough .

Replica calculus has been extended to account for biased initial conditions in the symmetric phase in an indirect way; for example, the limit of infinite bias yields for small MC01.

Finally, replica calculus for games with is found in MCZe00. CM00 take into account the diversity of frequency of the in games with real histories. Replica calculus can also be applied to the extensions discussed in Sec. II.4 that have a discernible cost function.

#### Generating functionals

Generating functionals keep the full complexity of the dynamics of the model in an elegant way DeDominicis. The reasoning is as follows: the state of a MG at time is given by the vector of score differences ; one is thus interested in and in its evolution, written schematically as

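$$
\begin{aligned}
P_{t+1}(Y') &= \int dY\, P_t(Y)\, W_t(Y'|Y) &\quad (27)\\
W_t(Y'|Y) &= \prod_i \delta\!\left[Y'_i-\left(Y_i-\Gamma\,\xi_i^{\mu(t)}A(t)/N\right)\right], &\quad (28)
\end{aligned}
$$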

where one recognizes Eq. (6) in Eq. (28). This suggests a way to describe all the possible paths of the dynamics: the generating functional of the dynamics is

[Equation (29): definition of the generating functional; the expression was lost in extraction.]

from which one can extract meaningful quantities by taking derivatives of with respect to the auxiliary variables, for instance . An important point is that the really hard work resides in taking the average over all possible paths. What is multiplied by can be chosen at will depending on what kind of information one wishes to extract from . In addition, it is very useful to add a perturbation to the dynamical equations, so that : taking derivatives of with respect to yields response functions of the system.

Nothing in principle prevents one from including the dynamics of and thus solving the full original MGs. Given the length of the calculus, it is worth trying to simplify further the dynamics of and by assuming that the agents update every time steps, and that the ’s appear exactly once during this interval. This is called the batch minority game Moro1, while the update of the ’s at each time step is referred to as on-line. Crucially, once again, the global shapes of , and are left intact. Batch games lead to a simpler , which now reads .

The calculus is long: after putting the last Dirac and Heaviside functions in an exponential form, and removing all the non-linear terms of the argument of the exponential with auxiliary variables, one performs the average over the disorder (i.e., strategy assignment) and then takes the thermodynamic limit. One is then usually rewarded by the exact effective agent dynamics

$$
Y(t+1)=Y(t)+\theta-\alpha\sum_{t'\le t}(\mathbb{1}+G)^{-1}_{tt'}\,\mathrm{sgn}\,Y(t')+\sqrt{\alpha}\,\eta(t), \qquad (30)
$$

where is the average response function of the spins encoding strategy choice at time to a perturbation applied at an earlier time and is a Gaussian zero-average noise with correlation given by , where and is the average spin autocorrelation between time and . This equation is not that of a representative agent, but is representative of all the agents: one agent corresponds to a given realization of the noise .

A further difficulty resides in extracting information from Eq. (30). In the asymmetric phase, one exploits the existence of frozen agents, for which and assumes that the stationary state corresponds to time translation invariance for . Thus, introducing the notations and ,

$$
\tilde{Y}=-\frac{\alpha}{1+\chi}\,s+\sqrt{\alpha}\,\hat{\eta}, \qquad (31)
$$

where , and and correspond to the quantities defined in the replica section; note that generating functionals give a precise mathematical definition of . After some lighter computations, one recovers all the equations of the replica calculus; in addition, one can also discuss with greater rigor the validity of simple expressions for . The symmetric phase still resists full analysis demartino2011nonergodic, which prompted the introduction of a further simplified MG, of the spherical kind galla2003dynamics (see Sec. II.4).

The above sketch shows that the dynamics is ever present in the equations; reasonable assumptions about some quantities may be made regarding their time dependence or invariance, etc. This method allows one to control the approximations; it has confirmed the validity of the continuous time equations and of the effective theory introduced in Sec. II.3.2.

The original MG gradually yielded to the power of generating functionals: first batch MGs CoolenBatch, then on-line MGs with random histories CoolenOnline, then on-line MGs with real histories CoolenRealHistories, which is a genuine mathematical tour de force; the case is treated in CoolenS>2 and was later simplified in AdemarS>2.

### II.4 Modifications and extensions

The MG is easily modifiable. Given its simplicity and many assumptions, a large number of extensions have been devised and studied. Two types of motivation stand out: to determine in what respect the global properties, e.g. herding, phase transition, etc., depend on the strategy space and learning dynamics, and to remedy some shortcomings of the original model.

#### Payoffs

The original MG has a binary payoff, which is both a curse for exact mathematical methods and a blessing for ad-hoc combinatorial methods. MC01 show how to derive the quantity minimized by a MG with a generic payoff function ; C04 extends this argument to explain why the location of the critical point is independent of as long as is odd SavitPayoff, which is confirmed in papadopoulos2009theory, who solved the dynamics of the game for any payoff with generating functionals and added that should also be increasing (see also Sec. II.5). The dynamics of the symmetric phase does depend on the choice of payoff. For instance, the game becomes quasi-periodic only when a sign payoff is used CM99; galla2005strategy, and only for small enough liaw2007three.

From a mathematical point of view, the majority game can be considered a MG with another payoff; is now maximized, in a way reminiscent of Hopfield neural networks hopfield1982neural, which makes it possible to use replicas kozlowski2003majGame and generating functionals papadopoulos2009theory. Mixing minority and majority players also lends itself to mathematical analysis deMGM03; papadopoulos2009theory and is discussed further in Sec. II.5.

#### Strategy distributions

The thermodynamic limit only keeps the two first moments of the strategy distribution (a consequence of the central limit theorem). Its average must be rescaled, , in order to avoid divergences; the location of the critical point depends on both variance and average of CCMZ00; CMO03.

The agents may draw their strategies in a correlated manner; for instance, an agent may draw a first strategy at random as before, but he chooses his second one so that , with MMM; garrahan2001correlated; galla2005strategy.

Strategies may be used in a different way: Oxf1 propose to perform an inner product between a given strategy, considered a vector, and a random vector living on the unit sphere; this model is solved in coolen2008inner.

Sec. II.5 deals with strategies that also contain a ‘zero’ choice, i.e., the possibility to refrain from playing.

#### Spherical Minority Games

A special mention goes to spherical MGs galla2003dynamics whose dynamics is exactly and explicitly solvable in all phases while keeping almost the same basic setup of the original MG; when using a generating function for the latter, calculus is hindered by the non-linearity of : the boldest way to remove it is to set . Because of Eq. (7), the agents may now use any linear combination of their two strategies. Since may diverge, one adds the spherical constraint . This family of models also undergoes phase transitions; its phase space has a quite complex structure.

Many extensions to the MG have been made spherical, thus, duly solved galla2003dynamics; galla2005stationary; galla2005strategy; papadopoulos2008market; bladon2009spherical; demartino2011nonergodic.

#### Impact of used strategies and Nash equilibrium

The agents have several strategies to choose from and use only one at a time. A key point to understand why the agents fail to control the fluctuations better in the symmetric phase is the difference of expected payoff between the strategies that an agent does not use, and the one that he plays. The discussion parallels that of the case (see Sec. II.2.1 ): separating the contribution of trader from in Eq. (4) shows once again that self impact results in payoffs that are biased positively towards the strategies not currently in use and explains why all the agents are not frozen in the original MG. Agents may experience difficulties in estimating their exact impact; hence, MCZe00 proposed to modify Eq. (4) to

$$
U_{i,s}(t+1)=U_{i,s}(t)-a_{i,s}^{\mu(t)}(t)\,A(t)/N+\eta\,\delta_{s,s_i(t)}. \qquad (32)
$$

Remarkably, the agents lose the ability to herd as soon as : is discontinuous at in the symmetric phase; a Nash equilibrium is reached for and all the agents are frozen; there are exponentially (in ) many of them demartino2001replicasymbreaking; the one selected by the dynamics depends on the initial conditions. The agents minimize , which coincides with when . MCZe00 noted that the difference between and is similar to an Onsager term in spin glasses MPV. When no longer has a single minimum, the replica calculus is more complex; one needs to use the so-called 1-step replica symmetry breaking assumption (1-RSB) MPV. demartino2001replicasymbreaking applies this method and reports the line at which ceases to have a single minimum, also known as the de Almeida-Thouless (AT) transition line AT1978. AdemarHeimel use generating functionals to solve the dynamics of Eq. (32) and discuss this transition from a dynamical point of view by focusing on long-term memory and time translation invariance. A simpler way to compute the AT line is given in MGbook.

#### Time scales and synchronization

The original MG has two explicit intrinsic time scales, and , which are common to all the agents. There is a third one, the time during which a payoff is kept in , and is infinite by default. Introducing a finite payoff memory is easy if one discounts exponentially past payoffs, which amounts to writing

$$
U_{i,s}(t+1)=U_{i,s}(t)\left(1-\frac{\lambda}{P}\right)-\frac{a_{i,s}^{\mu(t)}(t)\,A(t)}{N}, \qquad (33)
$$

where and the factor was chosen so as to introduce as a separate timescale; the typical payoff memory length scales as for small . This seemingly inconspicuous alteration of the original dynamics changes the dynamics of the asymmetric phase very little. It does, however, solve the problem of non-ergodicity of the symmetric phase since initial score valuations are gradually forgotten CDMP05. Unfortunately, it also has a great influence on analytical results, since an infinitesimal has so far prevented anyone from obtaining any mathematical insight about the stationary state from generating functionals: they still yield the exact effective agent dynamics, but nobody has found a way to extract information about the stationary state because there are no more frozen agents CDMP05; demartino2011nonergodic. The spherical MG with payoff discounting is of course exactly solvable with this method bladon2009spherical; demartino2011nonergodic. Replicas can be applied in some cases: marsili2001learning study an MG with impact and discounting; the quantity minimized by the dynamics is now ; as the ratio between the memory and learning timescales increases, the system undergoes a dynamical phase transition at between a frozen RSB phase and a phase in which it never reaches a Nash equilibrium. Finally, the case is easily solved with . For instance, the critical learning rate is : forgetting the past destabilizes the dynamics as this decreases the effective over which past payoffs are averaged mosetti2006minority.
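The memory length implied by the discount factor of Eq. (33) can be illustrated numerically. In this sketch the payoff term is abstracted into an arbitrary sequence of numbers, and the memory length is taken to be the number of steps after which the weight (1 − λ/P)^k of a past payoff has decayed to 1/e; this operational definition and the function names are assumptions.

```python
import math


def efolding_steps(lam, P):
    """Steps after which the weight (1 - lam/P)^k of a past payoff drops to 1/e."""
    return -1.0 / math.log(1.0 - lam / P)


def discounted_score(payoffs, lam, P):
    """Iterate U <- U (1 - lam/P) + payoff, i.e. Eq. (33) with a generic payoff."""
    u = 0.0
    for x in payoffs:
        u = u * (1.0 - lam / P) + x
    return u


P, lam = 64, 0.1
print(f"e-folding time = {efolding_steps(lam, P):.1f} steps, P/lambda = {P / lam:.1f}")

# With a constant unit payoff the score saturates at the geometric-series limit P/lambda
saturation = discounted_score([1.0] * 20000, lam, P)
print(f"saturation value = {saturation:.3f}")
```

For small λ/P, both the e-folding time and the saturation value of the score are close to P/λ, so a finite λ effectively bounds the score valuations.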

There is converging evidence that human beings act at widely different timescales in financial markets LilloUtility; zhou2012strategies.4 In the context of the MG, they may therefore differ in , or . mosetti2006minority split the population into subgroups that each have a different set of and/or , for : it turns out that it is advantageous to have a smaller and a larger . In other words, it pays to learn as little as possible and to forget it as soon as possible, i.e., to behave as randomly as possible. This makes sense, as random behavior is a Nash equilibrium. Heterogeneity of is studied e.g. in CZ97; CZ98; SavitPayoff; JohnsonEnhancedWinnings; MMM.

Another way to implement heterogeneous time scales is to introduce the possibility of not doing anything for some , i.e., to generalize the probability distribution of to Piai; each agent has an intrinsic frequency drawn from a known distribution; agents that play frequently are less likely to be frozen. Replicas Piai and generating functionals demartino2003dynamics solve this extension.

Finally, the MG assumes perfect synchronization, which is a strong assumption, but a useful one. Note that introducing frequencies as discussed above is a cheap way to build partial synchronicity, especially for small average value of . mosetti2009structure proposed a way to fully desynchronize agent-based models; the maximally asynchronous MG keeps its phase structure provided that the temporal structure of interaction is not too noisy.

#### Learning algorithm

The common rationale of all learning schemes is that using them should a priori improve the realized payoffs. Quite remarkably, the literature on the MG has mainly considered variations of the theme of the logit model, most often fixed look-up tables, and simple ad-hoc Markovian algorithms, ignoring the rest of the vast reinforcement learning (RL) literature, which in passing goes against the golden rule of learning: agents (including researchers) should find the balance between learning and exploration catteeuw2012heterogeneous; see sutton1998reinforcement for a superb review written at the time of the introduction of the MG. In particular, $Q$-learning is currently thought to mimic very well how human beings learn; see montague2006imaging for a review. It consists in exploiting optimally the relationship between one’s actions at time and the payoff at future time , conditionally on the states of the system at times and : the payoffs at time therefore also comprise some future expected payoffs. The definitions of states and actions are to be chosen wisely by the authors: MG-Qlearning use look-up tables; the possible actions and state space are the choice of strategy; this means that agent chooses according to a $Q$-learning rule; the resulting fluctuations are very similar to a Nash equilibrium for look-up tables, though nobody has ever checked it accurately. catteeuw2012heterogeneous assume instead that the state is (real histories) and possible actions are ; they also assume that the resource level is a sinusoid and show that $Q$-learning does very well in this context. catteeuw2009learning considers a setting and shows that $Q$-learning also converges to the Nash equilibrium , as do other very simple schemes from RL literature that are close to the ad-hoc ones discussed in Sec. II.2.1; interestingly, using $Q$-learning is a dominant strategy if the agents may select their RL scheme by Darwinian evolution. No analytical results have so far been reported about these alternate RL schemes, although obtaining some seems within reach.
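For concreteness, the textbook tabular Q-learning update is sketched below. This is the generic rule from the RL literature, not the specific implementation of any of the works cited above; the encoding of states and actions, the learning rate `lr` and the discount factor `gamma` are placeholders chosen for the example.

```python
def q_update(Q, state, action, reward, next_state, lr=0.1, gamma=0.9):
    """One tabular Q-learning step.

    Q maps state -> {action: value}. The target combines the immediate reward
    with the discounted best value attainable from the next state, so that the
    value of an action also includes expected future payoffs.
    """
    best_next = max(Q[next_state].values())
    Q[state][action] += lr * (reward + gamma * best_next - Q[state][action])
    return Q[state][action]


# Hypothetical encoding: two states (e.g. the sign of the last outcome)
# and two actions (-1 or +1), all values initialized to zero.
Q = {s: {-1: 0.0, +1: 0.0} for s in (0, 1)}
first = q_update(Q, state=0, action=+1, reward=1.0, next_state=1)
print(first)  # 0.1: the value moves a fraction lr towards the reward
```

In an MG setting, the reward would be the minority payoff received by the agent; how states and actions are defined is precisely the modelling choice discussed in the text.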

Strategy exploration by the agents, i.e., letting the agents evolve badly performing strategies, has been investigated in MG literature: a look-up table is akin to a DNA piece of code, hence changing it is akin to genetic mutations. CZ98; SavitEv1 let the worst performing agents replace their strategies, either at random, or by cloning those of the best players; Sysi-Aho2003a; Sysi-Aho2003b; Sysi-Aho2003c; Sysi-Aho2004a give to the agents the possibility of hybridization and genetic crossover of their own strategies; CZ97; SavitEv2 allow the agents to choose their memory length. In all these papers, strategy exploration is beneficial to the agents and to the system as a whole, and sometimes spectacularly so, see Sysi-Aho2003a; Sysi-Aho2003b; Sysi-Aho2003c; Sysi-Aho2004a.

In Kinzel, Kinzel2, kinzel2002interacting, the agents use simple neural networks (perceptrons); the authors derive an analytical expression for as a function of the learning rate. They also note that the neural networks have the peculiar task of anti-learning, which tends to produce seemingly random outputs, and discuss a possible application to cryptography.

### II.5 Minority Game and financial markets

The connection between financial markets and MGs is both strikingly intuitive and deceptively hard to formalize clearly. At a high level, it rests on the following observations:

1. Financial markets are competitive and their dynamics is similar to Darwinian evolution ZMEM; FarmerForce; lo2004adaptive.

2. They are negative sum games, if only because of transaction costs.

3. They tend to have bursts of fluctuations (called volatility in this context).

4. They tend to be almost unpredictable because traders (human beings or algorithms) tend to exploit and reduce price predictability.

So far, the MG has all the ingredients needed to model the dynamics of price predictability, except a price dynamics. Since is an excess demand or offer of something, assume for the time being that it has some relationship with price evolution (this point is discussed at length below). Then Fig. 1 provides a very appealing scenario for the emergence of large fluctuations in financial markets: predictable prices correspond to mild fluctuations and are bound to attract more traders, who then reduce ; once the signal-to-noise ratio becomes too small, the agents herd on random fluctuations and produce large fluctuations. Large price fluctuations are therefore due to too small a predictability. In other words, markets are stable as long as they are predictable and become unstable if too many traders (i.e., too much money) are in play. MarsiliInstabMarkets also find the existence of a critical amount of invested capital that makes markets unstable in a very different model. This suggests in turn that real markets should hover over a critical point, which explains periods of quiescence and periods of large fluctuations.

One of the shortcomings of the above picture is that is fixed in the game, which implies some sort of adiabatic approximation. Adaptive agents should be able to decide by themselves when they are willing to play.5 In a financial market context, the agents must not only decide which is the best strategy to play, but also if it is worth using it. In other words, the agent’s decision should rest not only on payoff differences (e.g. ), but also on the value of SZ00; J99; J00: this leads to the Grand Canonical MG (GCMG), in which a reservoir of agents may or may not play at a given time step depending on whether one of their trading strategies is perceived as profitable. This, in fact, mimics precisely for instance how quantitative hedge funds behave. The learning algorithms that they apply are hopefully more sophisticated; for instance, some of them try to account for their impact on the price dynamics when backtesting a strategy.

In the simplest version of the GCMG, the agents have only one trading strategy and the possibility of not playing; this is equivalent to having two strategies, one drawn at random, and the zero strategy CM03. The score difference dynamics is

$$
Y_i(t+1)=Y_i(t)\left(1-\frac{\lambda}{P}\right)-\frac{a_i^{\mu}(t)\,A(t)}{P}-\frac{\epsilon}{P}. \qquad (34)
$$

The last term is a benchmark, i.e., the value attributed to not playing. It is the sum of the interest rate and transaction costs, and possibly of the willingness to play of a given agent. When , does not make sense since an agent that comes in and then goes out of the game experiences a sure net loss. The typical timescale of the GCMGs is proportional to .
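A minimal simulation sketch of this simplest GCMG follows. It assumes that a speculator is active whenever his score is positive, that producers (fixed-behavior agents, discussed in the next paragraph) always play, and that histories are drawn at random; these rules and all names are assumptions made for the illustration, not a transcription of the cited models.

```python
import random


def simulate_gcmg(n_spec, n_prod, P, eps, lam=0.0, n_steps=2000, seed=0):
    """Return the number of active speculators at each time step."""
    rng = random.Random(seed)
    # one random +/-1 look-up strategy per agent
    spec = [[rng.choice((-1, 1)) for _ in range(P)] for _ in range(n_spec)]
    prod = [[rng.choice((-1, 1)) for _ in range(P)] for _ in range(n_prod)]
    Y = [0.0] * n_spec
    n_active = []
    for _ in range(n_steps):
        mu = rng.randrange(P)
        active = [y > 0 for y in Y]        # assumed activity rule
        A = sum(p[mu] for p in prod)
        A += sum(s[mu] for s, on in zip(spec, active) if on)
        # score update in the form of Eq. (34), applied to all speculators
        for i in range(n_spec):
            Y[i] = Y[i] * (1.0 - lam / P) - spec[i][mu] * A / P - eps / P
        n_active.append(sum(active))
    return n_active


typical = simulate_gcmg(n_spec=20, n_prod=10, P=8, eps=0.01, n_steps=500)
print("average number of active speculators:", sum(typical) / len(typical))

# If the benchmark exceeds any possible gain, nobody ever enters the game
prohibitive = simulate_gcmg(n_spec=20, n_prod=10, P=8, eps=31.0, n_steps=200)
print("active speculators with prohibitive benchmark:", max(prohibitive))
```

The second run illustrates the role of the benchmark: when it is larger than the largest possible payoff, all scores decrease monotonically and the speculators stay out of the market.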

Since the GCMG is a negative sum game, all the agents stop playing after a while if the score memory length is large enough. In other words, they need to feed on something. MMM introduce additional agents with fixed behavior, called producers, who use the markets for other purposes than speculation. The producers play a negative sum game, but a less negative one thanks to the speculators, which may play a positive game thanks to the producers. This defines a kind of market ecology best described as a symbiosis MMM; ZMEM; CCMZ00. One assumes that there are speculators and producers.

For and , this model possesses a semi-line of critical points : in other words, it is in a critical state as soon as there are enough speculators in the reservoir. The signal-to-noise transition is still present, which leads to anomalous fluctuations: using the method described in Sec. II.3.3, one finds

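$$
\frac{H}{\sigma^2}+2\epsilon\sqrt{\frac{H}{P}}\,\frac{P}{\sigma^2}+\frac{\epsilon P}{\sigma^2}\simeq\frac{K}{\sqrt{P}}, \qquad (35)
$$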

which is confirmed in Fig. 4; when , one recovers Eq. (16), thus behaves as in Fig. 2. When , the region of anomalous fluctuations shrinks as the system size diverges; see CM03; galla2006anomalous for more details. The version of the GCMG has additional instabilities compared to a standard MG CAM08.

Not only does the distribution of become anomalous, but the strength of fluctuations also acquires a long memory. This is a feature generically found in MGs where agents can modulate their activity, either by re-investing a fraction of their gains, or by deciding to trade or not to trade. This result is even more generic: BouchaudGiardina shows that any model in which the agents decide to trade or not depending on the sign of a random walk automatically acquires long memory in its activity and, by extension, in its volatility. In the case of market-like MGs, whether to trade or not is based on the trading performance of a strategy. The agents that switch between being active and inactive have a strategy score that is very well approximated by a random walk.

The two possible actions and (and possibly ) may mean sell and buy, respectively. In that case is an excess demand, which has an impact on price evolution; J00 use a linear price impact function FarmerImpact; ContBouchaud, . This implies that is a price return.

But this raises the question of why the traders are rewarded for selling when the majority buys, and vice versa. There are two answers to this. First, when an agent makes a transaction, being in the minority yields on average a better transaction price MGbook. Why should an agent transact at every time step, then, unless he is a market maker?6 MarsiliMinMaj argued that the agents do not know which price they will obtain when they trade, thus that they need to form expectations on their next transaction price: the agents who believe that the price follows a mean-reverting process play a minority game, while those who believe that price changes are persistent play a majority game. deMGM03 therefore introduced a model with minority and majority traders and gave its solution with replica and generating functions, later generalized in papadopoulos2009theory.

There remains, however, an inconsistency: predictability is linked to speculation, but the agents cannot really speculate, as their actions are rewarded instantaneously. This is why BouchaudGiardina; dollargame proposed to reward current actions with respect to future outcomes, i.e., : this is a delayed majority game whose peculiarity is that the agents active at time also play at time ; it is known as the $-game. The nature of this game depends on the sign of the autocorrelation of : an anticorrelated causes an effective minority game, and vice versa; left alone, $-game players tend to be equivalent to majority players ferreira2005real; satinover2008cycles. BouchaudGiardina define a more realistic model and show that the price may be periodic (i.e., produce bubbles and crashes), stable, or intermittent (i.e., realistic) depending on the ratio and the contrarian/trend-following nature of the strategies.
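The three reward rules can be compared side by side. The sketch assumes linear payoffs and takes the delayed payoff to be the current action times the next aggregate outcome, which is the usual form of the $-game; the function names and the toy series are illustrative.

```python
def minority_payoff(a_t, A_t):
    """Instantaneous minority reward: gain when acting against the crowd."""
    return -a_t * A_t


def majority_payoff(a_t, A_t):
    """Instantaneous majority reward: gain when acting with the crowd."""
    return a_t * A_t


def dollar_payoff(a_t, A_next):
    """Delayed reward (assumed form): current action times the next outcome."""
    return a_t * A_next


actions = [1, 1, -1, 1]

# Perfectly anticorrelated outcomes: A(t+1) = -A(t)
A_anti = [3, -3, 3, -3, 3]
anti_is_minority = all(
    dollar_payoff(actions[t], A_anti[t + 1]) == minority_payoff(actions[t], A_anti[t])
    for t in range(len(actions))
)

# Perfectly persistent outcomes: A(t+1) = A(t)
A_pers = [5, 5, 5, 5, 5]
pers_is_majority = all(
    dollar_payoff(actions[t], A_pers[t + 1]) == majority_payoff(actions[t], A_pers[t])
    for t in range(len(actions))
)

print(anti_is_minority, pers_is_majority)  # True True
```

In these two extreme cases the delayed payoff coincides exactly with the minority and majority payoffs, respectively; for intermediate autocorrelations the game interpolates between the two.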

And yet, modeling speculation must include at least two time steps: its somewhat counter-intuitive salient feature is that one may make money while waiting, that is, while doing nothing. ferreira2005real stretched single time-step look-up tables to their limit by assuming that an agent whose action was $a_i(t)$ at time $t$ must play $-a_i(t)$ at time $t+1$. If all agents act synchronously, $A(t+1)=-A(t)$ and the \$-game becomes a minority game. When some people act at even times and the others at odd times, the nature of the market is more complex: in a mixed population of minority/majority/\$-game players, the game tends to be a minority game.
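The synchronous case can be checked in one line. This is a sketch under two assumptions: the \$-game payoff of agent $i$ is written $a_i(t)A(t+1)$ with $A(t)=\sum_j a_j(t)$, and every agent is forced to close its position at the next step, $a_i(t+1)=-a_i(t)$. Then

$$
A(t+1)=\sum_j a_j(t+1)=-\sum_j a_j(t)=-A(t)
\quad\Longrightarrow\quad
a_i(t)\,A(t+1)=-a_i(t)\,A(t),
$$

which is the minority-game payoff evaluated at time $t$.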

Modeling speculation requires walking away from single time-step look-up tables. One wishes, however, to keep a discrete number of states, which makes it easy to define price predictability. C05 still assumes that the market states are , either random or real; an agent can only recognize a small number of market states and may only become active when is one of them; he may invest between pairs of patterns if he thinks it worthwhile. Accordingly, global price predictability is now defined between all pairs of market states. Price fluctuations, predictability and gains of speculators as a function of the number of speculators are very similar to those of GCMGs.

We therefore believe that the MG is the correct fundamental model to study the dynamics of predictability, hence market ecology and their influence on price fluctuations. Conversely, any correct model must contain agents that learn, exploit and reduce predictability; it therefore contains some kind of minority mechanism, be it explicit or hidden. For instance, hasanhodzic2011computational independently introduced a model of agents learning the price predictability associated with a given binary pattern and studied how information is removed; it is best described as a minority game. Another attempt to define ab initio a model with producers and speculators in which the speculators remove predictability patzelt2012unstable is equivalent to the MG defined in galla2009minority.

All MG models are able to reproduce some stylized facts of financial markets; notably and allow their agents to modulate their investments according to their success, as for instance GCMGs. In addition, evolving capitals and reinvestment have the same effect and lead to power-law distributed at the critical point for CCMZ00, as well as for galla2009minority. At this critical point, anomalous fluctuations are not finite size effects. Even better, generating functionals solve the model; what happens at the critical point awaits further investigations.

Since the dynamics of market-like MGs is reasonably well understood, one may probe how it reacts to dynamical perturbations. The effect of Tobin-like taxes in a GCMG is akin to increasing the baseline ; not only does it reduce the occurrence of anomalous fluctuations in the stationary state, but the dynamical decrease of anomalous fluctuations in reaction to a sudden increase of is also very fast bianconi2009tobin. On the other hand, papadopoulos2008market introduced a constant or periodic perturbation to in a spherical MG; the effect of such a perturbation is counter-intuitive: may lock in phase with the perturbation, which increases fluctuations. A third work investigated the effect of a deterministic perturbation that lasts for a given amount of time in the non-spherical GCMG; this corresponds to a sometimes long series of transactions of the same kind (e.g., buy) known as meta-orders; see BouchaudFarmerLillo for a review. Using linear response theory and results from the exact solution of the game, barato2011impact computed the temporal shape of the impact on the price to expect from such transactions.

There is yet another way of understanding the relationship between the MG and financial markets C05: on an abstract level, corresponds to perfect coordination, as it is an equilibrium between two opposite actions. These two actions may be to exploit or not gain opportunities, labeled by . If too few traders exploit it, more people should be tempted to take this money-making opportunity; if there are too many doing so, the realized trading gain is negative. In this sense, the MG is connected to trading, since market participants use a trading strategy that exploits a set of gain opportunities that seems profitable only if under-exploited. In this case, a minority mechanism is found because people try to learn an implicit resource level.

JohnsonLargeChanges apply the GCMG to the prediction of real market prices by reverse-engineering their time-series. They propose to find the specific realization of the GCMG that reproduces some price changes over a given period most accurately and to run it a few time-steps in advance. Large cumulative price changes produced by the game are reportedly easily predictable. According to SornettePocket, these pockets of predictability come from the fact that sometimes many agents will take the same decision time-steps in advance, irrespective of what happens between now and then. Some more statistical results about the prediction performance of minority/majority/\$-games are reported in wiesinger2010reverse. A few other papers use modified MGs in the same quest krause2009evaluating; ma2010minority. The same principle was used to predict when to propose discounts on the price of ketchup MGketchup. As emphasized by J00, the whole point of using non-linear adaptive agent-based models is to profit from strong constraints on future dynamics to predict large price changes; this goes way beyond the persistence of statistical biases for some ’s.

Making the connection with more traditional mathematical finance, ortisi2012minority assumed that the price dynamics of a financial asset is given by the continuous-time dynamics of the vanilla MG and GCMG, computed analytical expressions of the price of options, and proposed a method to calibrate the MG price process to real market prices.

Finally, all the previous papers focus on a single asset, but most practitioners wish to understand the origin and dynamics of price change cross-correlations. bianconi2008multiassetsMG gives the traders the opportunity to choose in which game, i.e., asset, they wish to take part, both for the original MG and for the GCMG; more phase transitions are found depending on how much predictability is present in either asset; generating functionals solve the lot; AdemarS>2 extended this calculation to more generic ways of choosing between many assets.

### II.6 Multichoice Minority Games

Extending the MG to more than two choices seems easy: it is enough to say that may take values. Kinzel3, chow2003multiplechoiceMG, quan2004evolutionarymultichoiceMG consider and reward agents that select the least crowded choice; dhulst1999threesidedMG introduce cyclical trading between three alternatives.
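The "least crowded choice" rule can be sketched in a few lines. This is an illustrative implementation, not the exact rule of any of the cited papers: the function name `least_crowded_winners` is hypothetical, options chosen by nobody are ignored when looking for the least crowded one, and all options tied for the smallest occupancy win. Both conventions are assumptions.

```python
import numpy as np

def least_crowded_winners(choices, n_options):
    """Return a boolean mask of the agents who picked a least crowded option.

    choices   : non-empty sequence of integers in [0, n_options), one per agent
    n_options : number of alternatives available to each agent
    """
    choices = np.asarray(choices)
    # Occupancy of every option, including options nobody selected.
    counts = np.bincount(choices, minlength=n_options)
    # Assumption: empty options cannot win, since they have no players.
    smallest = counts[counts > 0].min()
    winning_options = np.flatnonzero(counts == smallest)
    return np.isin(choices, winning_options)

# Six agents, three options: option 1 is chosen by a single agent.
mask = least_crowded_winners([0, 0, 1, 2, 2, 2], n_options=3)
print(mask)
```

With two options this reduces to the usual minority rule, since the less crowded of two non-empty sides is the minority.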

There may also be types of finite resources, e.g., bars: savit2003finiteresources; savit2005generalallocgames assume that agents choose between types of resources, each of them able to accommodate agents. This situation arises in CPU task scheduling. shafique2011minorityCPU take the reverse point of view: the agents may be groups of CPU cores competing for tasks to execute. A more complex structure underlies multi-assets models: the agents first choose in which asset to invest, and then play a minority game with the other agents having made the same asset choice bianconi2008multiassetsMG.

Whereas these studies assume that is fixed while may be arbitrarily large, many real-life situations ask for to scale linearly with : this is the assumption of Secs. III and IV.

### II.7 Minority mechanism: when?

The definition of the MG is quite specific and seems a priori to restrict its relevance rather severely. We wish to suggest a more optimistic point of view. There are universal relationships between fluctuations and learning in MGs. Therefore, should a minority mechanism be detected in a more generic model, one can expect to understand which part of its global properties comes from the minority mechanism. This requires understanding what a minority mechanism really is and where it may hide.

EFBP does contain one, since it is a MG with a generic resource level CMO03; JohnsonAsym. This resource level may depend on time galstyan2003resource; the resulting fluctuations will come both from the transient convergence to a new and from fluctuations around a learned , which are of the MG type. This view indicates that a minority mechanism arises when a population self-organizes collectively around an explicit resource level. Self-consistent resource levels sometimes contain minority mechanisms: C04 considers one population of producers pooling their contributions and one population of buyers grouping their monetary offers for the production on offer. Producers should decrease their output if and buyers should do so when . This suggests a payoff to producer and to buyer , hence, that is the resource level for the producers and is the resource level for the buyers, both time-dependent and self-consistently determined. The stationary state of this model is equivalent to the EFBP and is exactly solvable.

In conclusion, one may expect a minority mechanism, hence, MG-like phenomenology in a situation where a population self-organizes collectively around an explicit or implicit resource level that it may contribute to determine.

Let us now mention a few papers which seem a priori to have little to do with the MG, but that do contain a minority mechanism. This helps to understand their phenomenology, but cannot describe quantitatively their behavior, which may be much more complex and richer.

cherkashin2009reality introduced a game with two choices (in its simplest form) whose mechanism is stochastic: the probability that choice is the right one is (in the notations of this paper), with : this introduces mechanism noise, but the average nature of the game is easy to determine: indeed, the expected payoff of agent is ; introducing , and expanding , one finds that . Clearly, minority games appear when , which is what the authors call self-defeating games. One thus rightly expects that learning reduces predictability in the latter case and increases it in the other case. A closely related extension of learning in MGs is decision noise, which causes the agents to invert their decision with some probability; see e.g. CoolenBatch.

Berg introduced a model in which agents receive partial information about the real market state : each of them is given its own projection function of onto two possible states and , denoted by and randomly drawn at the beginning of the game. The resource level is , the market return when the global state is . Each agent effectively computes two averaged resource levels and and finds out how much to invest conditionally on or . Remarkably, when there are enough agents, the prices converge to . The phenomenology of such models differs from, but resembles, that of the MGs. Accordingly, some parts of the exact solution, both with replicas Berg and generating functionals demartino2005asymmetricinformation, are quite similar to those for in the MG.

## III The Kolkata Paise Restaurant Problem

In Kolkata (formerly Calcutta), there used to be cheap, fixed-rate “Paise” restaurants which were popular among daily workers. During lunch hours, the workers used to walk, to save the transport costs, to one of these restaurants. They would miss lunch if they got to a restaurant where there were too many customers, since walking to the next restaurant would mean failing to resume work on time!

kpr-physica, mathematica, kpr-proc, kpr-njp proposed the Kolkata Paise Restaurant (KPR) problem. It is a repeated game between prospective customers that choose from restaurants each day simultaneously (in parallel). and are both assumed to be large and proportional to each other. Assuming that each restaurant charges the same price for a meal eliminates budget constraints for the agents.

An agent can only visit one restaurant each day, and every restaurant has the capacity to serve food to one customer per lunch (generalization to a larger value is trivial). When many agents arrive at one restaurant, only one of them, randomly chosen, will be served. The main measure of global efficiency is the utilization fraction defined as the average fraction of restaurants visited by at least one customer on a given day; following the notations of the literature on this topic, we denote by its average in the steady state.

Two main points have been addressed: how high the efficiency can be, depending on the learning algorithm and the relative number of customers per restaurant, denoted by , and how fast it is reached.

Additional complications may arise if the restaurants have different ranks. This situation is found for instance if there are hospitals (and beds) in every town but the local patients understandably prefer to go to hospitals of better rank elsewhere, thereby competing with the local patients of those hospitals. Unavailability of treatment in time may be considered as a lack of service for those people and consequently as social wastage of service by the unvisited hospitals. This is very similar to the stable marriage problem briefly reviewed in Sec. IV.2.

The most efficient solution to the KPR problem is dictatorship: each customer is assigned a restaurant and must eat there. If the restaurants have a ranking, the customers must take turns and sample each of them in a periodic way, which is