
# Effects of payoff functions and preference distributions in an adaptive population

## Abstract

Adaptive populations such as those in financial markets and distributed control can be modeled by the Minority Game. We consider how their dynamics depends on the agents’ initial preferences of strategies, when the agents use linear or quadratic payoff functions to evaluate their strategies. We find that the fluctuations of the population making certain decisions (the volatility) depend on the diversity of the distribution of the initial preferences of strategies. When the diversity decreases, more agents tend to adapt their strategies together. In systems with linear payoffs, this results in dynamical transitions from vanishing to non-vanishing volatility. For low signal dimensions, the dynamical transitions for the different signals do not take place at the same critical diversity. Rather, a cascade of dynamical transitions takes place when the diversity is reduced. In contrast, no phase transitions are found in systems with quadratic payoffs. Instead, a basin boundary of attraction separates two groups of samples in the space of the agents’ decisions. Initial states inside this boundary converge to small volatility, while those outside diverge to a large one. Furthermore, when the preference distribution becomes more polarized, the dynamics becomes more erratic. All the above results are supported by good agreement between simulations and theory.

###### pacs:
05.70.Ln, 02.50.Le, 87.23.Ge, 64.60.Ht

## I Introduction

Many natural and artificial systems consist of a population of agents with coupled dynamics. Through their mutual adaptation, they are able to exhibit interesting collective behavior. Although the individuals are competing to maximize their own payoffs, the system is able to self-organize to globally efficient states. Examples can be found in economic markets and communication networks (1); (2); (3); (4).

As a prototype of an adaptive population, the Minority Game (MG) considers the dynamics of buyers and sellers in a model of the financial market in which the minority group is the winning one (2). The agents adapt to each other through adjusting the payoffs of their strategies, wherein the payoffs summarize the collective behavior of the population when the environment changes. Theoretical studies using the replica method (5); (6) and the generating functional (7); (8); (9) successfully describe the statistical properties of these systems. However, since adaptation is a dynamical process, much of the attractor behavior remains to be explored.

An important factor affecting the behavior of an adaptive population is the dependence of the payoffs on the environment experienced by the individual agents. The payoffs help the agents to assess the preferences of their decisions, hence inducing them to take certain actions when they experience a similar dynamical environment in the future. Thus, the payoff function is crucial to the mechanism of adaptation. For example, the ways the agents evaluate their strategies in financial markets have a large influence on the market behavior. There are agents who focus their attention mainly on opportunities of large profits, while others pay equal attention to all profitable opportunities large and small. Markets with individualistic agents may have diversified opinions about what the best strategies are.

As an illustration of how payoffs influence market behaviors, a payoff scheme based on expected price trends results in markets with a mixture of trend-followers and contrarians (10). The \$-Game considers payoffs rewarding correct expectations one step forward, giving rise to a self-sustained speculative phase (11). Bubbles, crashes and intermittent behaviors are also found in a similar extension of the MG (12). A recent extension of the MG considers agents rewarding trend-following strategies when the winning margin is small, and rewarding contrarian ones otherwise (13); (14). As a result, non-Gaussian return distributions, sustained trends and bubbles are found, reminiscent of real markets. The Wealth Game uses a wealth-based payoff scheme, and produces behaviors resembling those of arbitrageurs and trendsetters, and markets with positive sums (15). The Wealth Game payoff scheme enables the agents to have a strong history dependence when applied to Hang Seng index data.

The agents in the original version of MG use a step payoff function (2); (16); (17), meaning that the payoffs received by the winning group are the same, irrespective of the winning margin (the difference between the majority and the minority groups). Later versions of MG use a linear payoff function (5); (6); (7); (8); (9), in which the payoffs increase with the winning margin. Other payoff functions yield the same macroscopic behaviors in the dependence of the population variance on the complexity of strategies (18); (19). Thus, it appears that the behavior of the population is universal as long as the payoff function favors the minority group.

However, when one considers details beyond the population variance, one can find that the agents self-organize in different ways for different payoff functions. For a payoff function that favors a large winning margin, the distribution of the buyer population is double-peaked (2). This shows that the dynamics of the population self-organizes to favor large winning margins of either buyers or sellers, because the agents have adapted themselves to maximize their payoffs.

Another important factor is the initial preference of an agent towards the individual strategies she holds. Considering the example of financial markets, the agents may enter the market with their own preferences of strategies according to their individual objectives, expectations and available capital, even in the case that they hold the same set of strategies. For example, some agents have stronger inclinations towards aggressive strategies, and others more conservative. Hence, it is interesting to consider the effects of a diverse distribution of strategy preferences on the system behavior. Our recent work on MG revealed that when the diversity of strategy preferences increases, the system dynamics generally converges slowly, but the maladaptation of the agents, as generally reflected in the fluctuations of their decisions, can be greatly reduced (20); (21). A scaling relation between the population variance and the diversity was found, but there are no dynamical transitions (20); (21).

Besides the diversity of preferences, the profile of the diversity distributions also influences the dynamics of the system. For example, the Gaussian distribution of preferences studied in (20); (21) is a prototype of a continuous distribution, modeling a population with less polarized opinion. This distribution is different from the bimodal distribution studied in (7); (8); (24); (22); (23), which is more appropriate to model a population with polarized opinion. Both cases share many common statistical features, such as the reduction of fluctuations on increasing diversity. However, since adaptation is a dynamical process, one would expect that the bimodal case may have a much more erratic temporal behavior than the Gaussian case. This constitutes one of the purposes of our study.

In this paper, we consider the attractor behavior of an adaptive population using linear and quadratic payoff functions, with the distribution of initial preferences being either Gaussian or bimodal. While the statistical behaviors in these cases are quite similar, their attractor behaviors are different, revealing the different dynamics by which a population adapts to its specific environment. A number of findings in this paper illustrate this point. For example, when the step payoff function is replaced by the linear payoff function, the scaling relation between population variance and diversity is replaced by a dynamical transition between vanishing and non-vanishing step sizes. For low signal dimensions, the dynamical transitions take place in the form of cascades for the different signal dimensions. Even when the cascades are blurred at higher signal dimensions, there can be a crossover from anisotropic to isotropic motion in the phase space. When the Gaussian preference distribution is replaced by the bimodal distribution, the dynamics becomes more bursty, and the phase space motion becomes more jumpy. Going from linear to quadratic payoffs, we find that the basins of attraction for vanishing and non-vanishing step sizes coexist. All these rich behaviors demonstrate the flexibility of an adaptive population in self-organizing to states in which agents maximize their payoffs, and are hence important in the modeling of economics and distributed control.

The paper is organized as follows. After formulating MG in Section II, we consider the cases of linear and quadratic payoffs in Sections III and IV respectively, followed by a conclusion in Section V. Detailed derivations are presented in Appendices A to C.

## II Minority Game

The Minority Game model consists of a population of $N$ agents competing for limited resources, $N$ being odd (2). Each agent makes a decision 1 or 0 at each time step, and the minority group wins. For economic markets, the decisions 1 and 0 correspond to buying and selling respectively, so that the buyers can win by belonging to the minority group, which pushes the price down, and vice versa. For typical control tasks, such as the distribution of shared resources, the decisions 1 and 0 may represent two alternative resources, so that fewer agents utilizing a resource implies more abundance. The decisions of each agent are responses to the environment of the game, described by the signal $\mu^*(t)$ at time $t$, where $\mu^*(t) = 0, \dots, D-1$. These responses are prescribed by strategies, which are binary functions mapping the $D$ signals to decisions 1 or 0. In this paper, we consider both endogenous and exogenous signals. In the endogenous case, the signals are the history of the winning bits in the most recent $m$ steps. Thus, the strategies have an input dimension of $D = 2^m$. In the exogenous case, the signals are randomly generated from the $D$ possible choices at each time step. The parameter $\alpha \equiv D/N$ is referred to as the complexity.
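
As a concrete illustration of the endogenous case, the $m$ most recent winning bits can be packed into an integer signal $\mu \in \{0, \dots, D-1\}$. A minimal sketch (the function name is ours, not from the paper):

```python
def update_history(mu, winning_bit, m):
    # Shift in the latest winning bit and keep only the last m bits,
    # so that mu always lies in {0, ..., 2**m - 1}.
    return ((mu << 1) | winning_bit) & ((1 << m) - 1)
```

For example, with $m = 2$, the history 01 followed by a winning bit 1 yields the signal 11.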

Before the game starts, each agent randomly picks $s$ strategies. Out of her $s$ strategies, each agent makes decisions according to the most successful one at each step. The success of a strategy is measured by its cumulative payoff, as explained below.

Let $\xi^\mu_a = +1$ or $-1$ when the decision of strategy $a$ responding to signal $\mu$ is 1 or 0, respectively. Let $a^*(i,t)$ be the strategy adopted by agent $i$ at time $t$. Then,

 $A(t) \equiv \frac{1}{N}\sum_i \xi^{\mu^*(t)}_{a^*(i,t)}$

is the excess demand of the game at time $t$. The payoff received by strategy $a$ is then $-\xi^{\mu^*(t)}_a \varphi(\sqrt{N}A(t))$, where $\varphi$ is the payoff function. The step and linear payoffs are $\varphi(x) = \mathrm{sgn}(x)$ and $\varphi(x) = x$, respectively. Let $\Omega_a(t)$ be the cumulative payoff of strategy $a$ at time $t$. We will consider both online and batch updates of the payoffs. The online updating dynamics is described by
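
The payoff mechanics can be sketched as follows; `phi_step` and `phi_linear` mirror the two payoff functions, and `payoff_increment` is a hypothetical helper for the update term of Eq. (1):

```python
import math

# Payoff functions phi acting on the rescaled excess demand x = sqrt(N) * A(t)
def phi_step(x):
    return math.copysign(1.0, x)

def phi_linear(x):
    return x

def payoff_increment(xi, x, phi):
    # Increment of the cumulative payoff for a strategy holding xi = +/-1
    # for the current signal; the minority side (xi opposing the sign of A) gains.
    return -xi * phi(x)
```

For instance, when the majority buys ($x > 0$), a strategy that sold ($\xi = -1$) receives a positive payoff.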

 $\Omega_a(t+1) = \Omega_a(t) - \xi^{\mu^*(t)}_a \varphi\left(\sqrt{N}A(t)\right),$ (1)

whereas the batch updating dynamics is given by

 $\Omega_a(t+1) = \Omega_a(t) - \sum_\mu \xi^\mu_a \varphi\left(\sqrt{N}A^\mu(t)\right),$ (2)

wherein $A^\mu(t)$ is a $D$-dimensional vector given by

 $A^\mu(t) \equiv \frac{1}{N}\sum_i \xi^\mu_{a^*(i,t)}.$

The diversity of initial preferences of strategies is introduced by initializing the cumulative payoffs of strategy $a$ ($a = 1, \dots, s$) of agent $i$ at time $t = 0$ to random biases $\omega_{ia}$ (20); (21). Thus, the preferences influence the score of each strategy for the rest of the game. In this paper, we will consider both the Gaussian and delta distributions of the initial preferences of the strategies. In the Gaussian case, the preference distribution has mean 0 and variance $R$,

 $P(\omega_{ia}) = \frac{e^{-\omega_{ia}^2/2R}}{\sqrt{2\pi R}}.$ (3)

The ratio $\rho \equiv R/N$ is referred to as the diversity. In the delta function case, $P(\omega_{ia}) = \delta(\omega_{ia} - \sqrt{\rho N})$, as considered in (7). Since the system behavior is invariant with respect to random permutations of the strategies, the introduction of the delta function is equivalent to a bimodal preference distribution

 $P(\omega_{ia}) = \frac{1}{2}\delta\left(\omega_{ia} - \sqrt{\rho N}\right) + \frac{1}{2}\delta\left(\omega_{ia} + \sqrt{\rho N}\right),$ (4)

with mean 0 and standard deviation $\sqrt{\rho N}$. As we shall see, the two preference distributions have different effects on the dynamics of the game.

To monitor the mutual adaptive behavior of the population, we measure the variance of the population making decision 1, as defined by

 $\frac{\sigma^2}{N} \equiv \frac{N}{4}\left\langle\left[A^{\mu^*(t)}(t) - \left\langle A^{\mu^*(t)}(t)\right\rangle\right]^2\right\rangle,$ (5)

where the average is taken over time when the system reaches steady state and over the random distribution of strategies and biases.
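
The definitions above can be assembled into a minimal online simulation with linear payoffs and Gaussian biases. This is a sketch only; the parameter values ($N$, $m$, $s$, $R$, $T$) are illustrative and not taken from the paper's figures:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumed, not from the paper)
N = 101          # odd number of agents
m = 2            # memory; D = 2**m signals
s = 2            # strategies per agent
R = 4.0 * N      # Gaussian bias variance, i.e. diversity rho = R/N = 4
D = 2 ** m
T = 500          # number of time steps

# Each strategy maps the D signals to +1 (decision 1) or -1 (decision 0)
xi = rng.choice([-1, 1], size=(N, s, D))

# Gaussian initial preferences omega_ia; cumulative payoffs Omega start there
Omega = rng.normal(0.0, np.sqrt(R), size=(N, s))

mu = rng.integers(D)              # endogenous history, random initial signal
A_series = []

for t in range(T):
    best = np.argmax(Omega, axis=1)       # each agent plays her best strategy
    bids = xi[np.arange(N), best, mu]     # +/-1 decisions for signal mu
    A = bids.sum() / N                    # excess demand A(t)
    A_series.append(A)
    # online update, Eq. (1), with linear payoff phi(x) = x
    Omega -= xi[:, :, mu] * np.sqrt(N) * A
    # winning bit is the minority decision; shift it into the m-bit history
    win = 0 if A > 0 else 1
    mu = ((mu << 1) | win) % D

A_series = np.array(A_series)
# variance of attendance, Eq. (5); second half of the series avoids transients
sigma2_over_N = N / 4 * A_series[T // 2:].var()
```

Averaging over many runs with independent strategies and biases would reproduce the ensemble averages used in Eq. (5).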

## III MG with Linear Payoffs

### III.1 The Gaussian Distribution of Strategies’ Initial Preferences

#### The onset of instability.

We first consider the case of online dynamics with a Gaussian distribution of initial preferences. As Fig. 1 shows, the dependence of the variance on the complexity $\alpha$ for linear payoffs is very similar to that for step payoffs (20); (21). Above a universal critical value $\alpha_c$, the variance drops when $\alpha$ is reduced. The effect of introducing the diversity is also similar to that for step payoffs; namely, the variance remains unaffected when $\alpha > \alpha_c$, but decreases significantly with the diversity when $\alpha < \alpha_c$.

However, there are differences when one goes beyond this general trend. As Fig. 2 shows, the variance curves at different values of $\alpha$ cross at a common point, indicating the existence of a continuous phase transition at a critical diversity $\rho_c$ from a phase of vanishing variance at large $\rho$ to a phase of finite variance at small $\rho$. This behavior is very different from that for step payoffs, where the variance scales with the diversity and there are no dynamical transitions (Fig. 2 inset).

The picture is confirmed by analyzing the dynamics of the game for small $\alpha$. The dynamics can be conveniently described by introducing the $D$-dimensional vector $A^\mu(t)$ defined in Eq. (2). While only one of the signals corresponds to the historical signal of the game, the augmentation to $D$ components is necessary to describe the attractor structure of the game dynamics. Fig. 3(a) illustrates the attractor structure in this phase space for the visualizable case of $m = 1$ with online update and endogenous signals. The dynamics proceeds in the direction that tends to reduce the magnitude of the components of $A^\mu(t)$ (5). However, the components of $A^\mu(t)$ overshoot, resulting in periodic attractors of period $2D$. For $m = 1$, the attractor is described by the sequence 0,1,1,0 of winning bits, and takes the L-shape shown in Fig. 3(a) (21). Note that the displacements in the two directions may not have the same amplitude. This is also true for online update and exogenous signals, as Fig. 3(b) shows, although it does not have periodic attractors like those in Fig. 3(a).

Following steps similar to those in (21), we find that for $\rho$ not too large, and for convergence within a number of time steps much less than $\sqrt{N}$,

 $A^\mu(t+1) = A^\mu(t) - \sqrt{\frac{2}{\pi R}}\,\varphi\left(\sqrt{N}A^\mu(t)\right)\delta_{\mu\mu^*(t)}.$ (6)

For step payoffs, Eq. (6) converges to an attractor confined in a $D$-dimensional hypercube of size $\sqrt{2/\pi R}$, irrespective of the value of $\rho$. On the other hand, for linear payoffs, the update of $A^\mu(t)$ in Eq. (6) becomes linear in $A^\mu(t)$, namely $A^\mu(t+1) = (1 - \sqrt{2/\pi\rho})A^\mu(t)$ when $\mu = \mu^*(t)$. Hence, for $\rho > 1/2\pi$, the step sizes

 $\Delta A^\mu(t) \equiv \left|A^\mu(t+1) - A^\mu(t)\right|$

converge to zero. In particular, for $1/2\pi < \rho < 2/\pi$, the motion in the phase space converges with oscillations, whereas for $\rho > 2/\pi$, the motion converges without oscillations. On the other hand, for $\rho < 1/2\pi$, steps of vanishing sizes become unstable, resulting in a continuous dynamical transition at $\rho_c = 1/2\pi$.
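
The stability regimes follow directly from the linearized map; a numerical sketch, assuming our reconstruction of Eq. (6) with $\varphi(x) = x$:

```python
import math

def step_factor(rho):
    # One-step multiplier of A^mu in Eq. (6) with linear payoff:
    # A -> (1 - sqrt(2/(pi*rho))) * A when the signal is active
    return 1.0 - math.sqrt(2.0 / (math.pi * rho))

def iterate_A(rho, a0=0.1, steps=50):
    # Repeatedly apply the multiplier to an initial component a0
    a = a0
    for _ in range(steps):
        a = step_factor(rho) * a
    return a
```

The factor lies in $(0,1)$ for $\rho > 2/\pi$ (monotone convergence), in $(-1,0)$ for $1/2\pi < \rho < 2/\pi$ (oscillatory convergence), and below $-1$ for $\rho < 1/2\pi$ (instability of vanishing steps).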

The phase transition at illustrates how the agents adapt to the changing environment when the diversity changes. In general, at high diversity, only a small fraction of agents switches their strategies at each time step. This gentle movement results in, at most, small winning margins, as revealed in the example of step payoffs (20); (21). In contrast, the winning margins at low diversity can be larger. Since we are considering payoffs that are linear functions of the winning margin, the agents are adapted to pay more attention to decisions that result in larger profits, leading to stronger responses at low diversity and vanishing responses at high diversity.

#### The time scales of convergence.

The onset of the instability when $\rho$ decreases is accompanied by the separation of two convergence times. The first time scale is the state convergence time, defined as the number of time steps needed to reach the attractor (for $m = 1$, this is the sequence 0,1,1,0).

The second time scale is the population convergence time, which is defined for online update and endogenous dynamics as follows. Note that the excess demand may not arrive at the steady-state time series even after the attractor states become steady. To measure this convergence time scale, the periodic trend of the excess demand in the attractor must be subtracted. This is done by measuring the sample average of the excess demand at each point of the attractor, and monitoring the time dependence of the deviation from this average. As shown in the inset of Fig. 4, this difference converges exponentially to a baseline, and the inverse of the slope of the exponential convergence yields the population convergence time. As shown in Fig. 4, the state convergence time increases smoothly with diversity. On the other hand, the population convergence time is distinct from the state convergence time in the low diversity region. The turning point lies between $\rho_c \approx 0.16$ and 0.25, indicating a relation with the dynamical transition of step sizes at $\rho_c$.

However, for $\rho < \rho_c$, the step sizes for each of the $D$ signals may not be equal. To see this, we monitor the variance for each of the signals and rank them. The $r$th maximum variance is then given by

 $S_r = \mathrm{large}_\mu\left(\frac{N}{4}\left[\left\langle (A^\mu)^2\right\rangle\big|_{\mu=\mu^*(t)} - \left(\left\langle A^\mu\right\rangle\big|_{\mu=\mu^*(t)}\right)^2\right], r\right),$ (7)

where $\mathrm{large}_\mu(x_\mu, r)$ denotes the $r$th largest element among $x_\mu$ for $\mu = 0, \dots, D-1$. As Figs. 5-6 show, the step sizes for the signals do not bifurcate simultaneously at $\rho_c$. Rather, only the first maximum bifurcates from zero when $\rho$ falls below $\rho_c$, while the step sizes for the remaining signals remain small. When the diversity further decreases to around 0.05, the second maximum becomes unstable as well, and a further bifurcation takes place. For $m \geq 2$, there are further bifurcations of the third or higher order maxima, resulting in a cascade of dynamical transitions when the diversity decreases.
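
The ranked variances of Eq. (7) can be measured from a simulated trajectory as follows; a sketch, with `ranked_signal_variances` being our name for the measurement:

```python
import numpy as np

def ranked_signal_variances(A, mu, N, D):
    # S_r of Eq. (7): variance of A^mu sampled at the times its signal is
    # active (mu*(t) = mu), scaled by N/4 and ranked in descending order.
    out = []
    for sig in range(D):
        a = A[mu == sig]                 # values of A^mu when signal sig is active
        out.append(N / 4 * a.var() if a.size else 0.0)
    return np.sort(out)[::-1]
```

Here `A` and `mu` are the time series of the active component of the excess demand and of the signal, as produced by a simulation of the game.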

This cascade of transitions is confirmed by analysis. For $\rho < \rho_c$, we can generalize Eq. (6) to longer convergence times. Assuming, without loss of generality, that $\Delta A^1$ bifurcates while $\Delta A^0$ remains small, we find that the variance of the buyer population, as derived in Appendix A, is

 $\frac{\sigma^2}{N} = \frac{N}{32}\left(\Delta A^1\right)^2, \qquad \Delta A^1 = \mathrm{erf}\left(\frac{\Delta A^1}{\sqrt{8\rho}}\right),$ (8)

where $\Delta A^1$ is the step size responding to signal 1. As the inset in Fig. 5 shows, the analytical and the simulated results agree down to the vicinity of the second bifurcation. However, when the diversity decreases further, this simple analysis implies that the variance will saturate to a constant, but the simulated results are clearly higher. This discrepancy is due to a further bifurcation of the minimum step size. This can be analyzed by considering the effect of a perturbation in the direction of $A^0$. As derived in Appendix B, the accumulated perturbation becomes

 $\delta A^0(t+4) = \left[1 - \frac{1}{\sqrt{2\pi\rho}}\left(1 + e^{-(\Delta A^1)^2/8\rho}\right)\right]^2 \delta A^0(t).$ (9)

At the critical diversity where the coefficient on the right-hand side of Eq. (9) reaches the value 1, the perturbation becomes marginal, and $\delta A^0$ diverges on further reduction of $\rho$. Numerical iterations of the analytical equations for $m = 1$, averaged over samples of different initial conditions, yield the theoretical curves in Fig. 5 and the inset, agreeing very well with the simulated results. Similarly, the agreement between analytical and simulated results is satisfactory for $m = 2$, as shown in Fig. 6(a).
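
The self-consistent step size of Eq. (8) and the perturbation coefficient of Eq. (9) can be evaluated numerically; a sketch, assuming our reconstructions of the two equations:

```python
import math

def delta_A1(rho, iters=200):
    # Fixed-point iteration of Eq. (8): x = erf(x / sqrt(8*rho));
    # a nonzero root exists for rho < 1/(2*pi), approaching 1 as rho -> 0.
    x = 1.0
    for _ in range(iters):
        x = math.erf(x / math.sqrt(8.0 * rho))
    return x

def pert_coeff(rho):
    # Coefficient of Eq. (9) evaluated at the self-consistent Delta A^1;
    # the second bifurcation occurs where it reaches 1.
    dA1 = delta_A1(rho)
    c = (1.0 + math.exp(-dA1 ** 2 / (8.0 * rho))) / math.sqrt(2.0 * math.pi * rho)
    return (1.0 - c) ** 2
```

Iterating these expressions places the second bifurcation between $\rho = 0.04$ and $0.05$, consistent with the value of around 0.05 quoted above.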

For larger $m$, as shown in Fig. 6(b), the bifurcation of the first maximum remains distinct from those of the other directions, and the picture of at least one cascade remains valid. Furthermore, we observe that in the low diversity limit, the maximum step size approaches the same value for the different cases, following the arguments in Appendix A. We also note that although the attractor structure of online update with exogenous signals is different from that with endogenous signals, the dependences of $\sigma^2/N$ and $S_r$ on $\rho$ are identical for both types of signals.

This shows that the cascades are a general feature of the adaptive dynamics of the agents. When the diversity decreases, the agents find it more profitable to induce large winning margins in response to certain, but not all, signals. At sufficiently low diversity, large winning margins become possible responses to all signals, resulting in the cascades of transitions. However, for the large signal dimensions at higher $m$, the interference of the responses to different signals blurs the transitions.

#### Batch update

We close this section by briefly mentioning the results of batch update. We found that the variance of the buyer population with batch update is larger than that with online update (those of endogenous and exogenous signals are the same), and the difference increases as $\alpha$ increases, as shown in Fig. 7. We also found that the dynamical transition to non-vanishing step sizes in the case of online update is replaced by a gradual crossover, and the cascades of transitions in different directions are replaced by a simultaneous crossover.

### III.2 The Bimodal Distribution of Strategies’ Preferences

In this section we consider how the dynamics depends on different distributions of agents’ initial preferences of strategies. In the previous section we have studied the case of Gaussian distributions of preferences, which model populations with less polarized opinions. In this section, we replace the Gaussian distribution with a bimodal distribution described by Eq. (4), which is more appropriate to model a population with polarized opinions. As we shall see, the latter shares many common statistical features with the former, such as the cascades of dynamical transitions (subsection 1) and the change in the isotropy of motion (subsection 2) when the diversity changes. On the other hand, the latter exhibits more erratic temporal behavior, such as the changing fraction of time spent on locations of high and low volatility in the attractor (subsection 3) and the bursty dynamics (subsection 4). The theoretical dynamical equations for both batch and online updates are derived in Appendix C.

For batch updates, Fig. 8(a) shows that the cascade of dynamical transitions is present for small signal dimensions but disappears as the signal dimension increases. For large signal dimensions, the step sizes for the different signals bifurcate simultaneously at a common critical diversity. As shown in Fig. 8(b), there is a large jump of $\sigma^2/N$ at the transition, indicating that it is a discontinuous transition, while for the Gaussian case it is a continuous one. This discontinuous transition was found previously in (7). Results are similar for online update.

#### Isotropy of motion

For batch update, we have devised new global parameters to describe whether the motion in the phase space is isotropic at each update. For $m = 1$, the two parameters are

 $U_1 \equiv \left\langle\left\langle\left[\left(\Delta A^1\right)^2 - \left(\Delta A^0\right)^2\right]^2\right\rangle_t\right\rangle_{\mathrm{sample}}$

and

 $V_1 \equiv \left\langle\left\langle\left[2\,\Delta A^1\,\Delta A^0\right]^2\right\rangle_t\right\rangle_{\mathrm{sample}}.$

They measure the squared displacements along the axial and diagonal directions, respectively, in the phase space of $A^0$ and $A^1$.

We can see from Fig. 9 that as the diversity decreases, the axial parameter $U_1$ grows from nearly zero to nonzero values at around 0.2, while the diagonal parameter $V_1$ remains small. This shows that at each time step, the system responds to only one signal. When $\rho$ decreases to about 0.06, the diagonal parameter becomes nonzero too, showing that the system responds to more than one signal at each time step, and the motion becomes more isotropic. This cascade behavior is also well supported by our theory, displayed in the same figure.
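
For reference, $U_1$ and $V_1$ can be computed from the time series of the two step components as follows (a sketch; the average over samples is omitted):

```python
import numpy as np

def axial_diagonal(dA1, dA0):
    # U_1: time average of [(dA^1)^2 - (dA^0)^2]^2  (axial motion)
    # V_1: time average of [2 * dA^1 * dA^0]^2      (diagonal motion)
    U = np.mean((dA1 ** 2 - dA0 ** 2) ** 2)
    V = np.mean((2.0 * dA1 * dA0) ** 2)
    return U, V
```

Purely axial steps give $V_1 = 0$, while steps of equal size in both directions give $U_1 = 0$.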

For higher signal dimensions, the isotropy of motion in the attractors can be described by ranking the $D$ components of randomly picked step sizes. The ranked components of different samples, as shown in Fig. 10, exhibit several classes of behavior with increasing isotropy. Samples in Fig. 10(a) correspond to small steps in all dimensions. Those in Fig. 10(b) correspond to steps with one large component and $D-1$ small components. Steps in Figs. 10(c) and (d) have, respectively, two and more than two large components.

As shown in Fig. 11, attractors with small step sizes are dominant in the region of large $\rho$. Step sizes increase when $\rho$ decreases, but the non-vanishing components do not grow isotropically. Rather, steps with one or two non-vanishing components become significant first, and the motion becomes more isotropic on further reduction of $\rho$.

This sequential onset of isotropy is consistent with the cascade of dynamical transitions for the Gaussian online case when the diversity decreases. However, when $m$ increases further, interference between different signal dimensions blurs the anisotropic attractors.

#### Attractor structure

The structures of the attractors also change with $\rho$. For convenience, we take the example of $m = 1$, whose phase space can be plotted in two dimensions. From Fig. 12, we can see that the attractor structure with the bimodal distribution is very different from that with the Gaussian distribution. For systems with Gaussian distribution and batch update, the attractor has only two clusters of fixed points, between which the system oscillates (Fig. 12(a)), whereas for systems with bimodal distribution, the attractor visits many points located around an octagon in the small-$\rho$ phase, as shown in Fig. 12(b). The system jumps among these points and occasionally stays at the origin. In the large-$\rho$ phase, however, the system spends more time around the origin, with occasional jumps out and back.

#### Bursty dynamics

Unlike the case of Gaussian distributions, the transition from vanishing to non-vanishing step sizes on decreasing diversity takes place in a bursty manner. For a given diversity, we define the activity as the fraction of time that the system stays away from the origin in the phase space. Fig. 13 shows that the activity is low at high diversities, but increases to a high value when $\rho$ is reduced below a critical value $\rho_c$. This critical value depends on $\alpha$, and can be estimated as the point with an activity of 0.5 in Fig. 13. We anticipate that $\rho_c$ approaches the value of 0.06 in the limit of vanishing $\alpha$, as proposed by (7).

Looking deeper into the dynamics, we can find bursty behavior by introducing the payoff components $k^\mu(t)$, which are related to the cumulative payoffs by

 $\Omega_a(t) = \sum_\mu k^\mu(t)\,\xi^\mu_a.$ (10)

In other words, $k^\mu(t)$ is the total payoff of decision 1 for signal $\mu$ accumulated during the history of the game up to time $t$. For $m = 1$, the payoff components $k^0$ and $k^1$ also have different behaviors at low and high diversities. As shown in Fig. 14(a), at low diversity both $k^0$ and $k^1$ oscillate around 0 with large step sizes, resulting in the phase with non-vanishing variance. In contrast, Fig. 14(b) shows that at high diversities, the payoffs accumulate with small step sizes, building up gradually to high values, and then return to low values in a single huge step. This bursty process resembles many natural phenomena in which energy is stored gradually and released suddenly, such as earthquakes and volcanic eruptions.

## IV MG with Quadratic Payoff Functions

To see how the behavior depends on the payoff function, we change the payoff to the quadratic form $\varphi(x) = x^2\,\mathrm{sgn}(x)$, where $x = \sqrt{N}A(t)$. Figure 15 shows the general trend that a greater diversity gives a smaller variance of attendance, and this effect is especially sensitive in the intermediate diversity region. The variance does not show any scaling behavior as in the step payoff model, nor any phase transition as in the linear payoff model. Instead, a larger population always has a greater drop in the variance and a lower minimum variance.

We are also interested in how the initial position of the game affects its variance. First, 1000 samples were simulated for each of several different diversities. Then the variances of the 1000 samples were arranged in ascending order and plotted in Fig. 16.

We can see that more and more samples have a relatively small variance as the diversity becomes larger. Each curve in Fig. 16 shows a gap in the variance, as indicated by an arrow. To facilitate the explanation, we refer to samples to the left of the gap as "small variance" samples, and to those to the right of the gap as "large variance" samples.

We find that these two groups of samples are strongly correlated with the initial states of the system. Figs. 17(a) to (d) show a general trend that the initial positions of the small variance samples concentrate around the origin, while those of the large variance samples spread around. A basin boundary, indicated by the lines, appears between the light (orange) and dark (pink) dots, and it becomes more recognizable at higher diversity. Also, more samples with small variance and fewer samples with large variance appear as the diversity increases.

To determine the boundary analytically, we look into the dynamics generated by the payoff function. Following steps similar to those in Section III, we find

 $A^\mu(t+1) = A^\mu(t) - \sqrt{\frac{2}{\pi R}}\left(\sqrt{N}A^\mu(t)\right)^2 \mathrm{sgn}\,A^\mu(t).$ (11)

The square boundary can then be determined from the intersection between the map of Eq. (11) and the lines $A^\mu(t+1) = -A^\mu(t)$:

 $-A^\mu(t) = A^\mu(t) - \sqrt{\frac{2}{\pi R}}\left(\sqrt{N}A^\mu(t)\right)^2,$ (12)

or

 $\sqrt{N}A^\mu(t) = \pm\sqrt{2\pi\rho}.$ (13)

If the initial position of the system is located inside this basin boundary, that is, $|\sqrt{N}A^\mu(0)| < \sqrt{2\pi\rho}$, then $A^\mu$ will eventually converge to the origin, which implies that the agents are not motivated to respond to the low payoffs. If the initial position of the system lies outside this basin boundary, $A^\mu$ will eventually diverge, implying that the agents are motivated to respond to high payoffs. Also, from Eq. (13), the boundary increases with the diversity $\rho$. Thus, a larger diversity gives a smaller basin of attraction for samples with large variance.
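
The basin boundary can be checked by iterating the one-dimensional map obtained from Eq. (11) in the rescaled variable $x = \sqrt{N}A^\mu$; a sketch, assuming this rescaling:

```python
import math

def quad_map(x, rho):
    # One step of Eq. (11) in rescaled units:
    # x -> x - sqrt(2/(pi*rho)) * x * |x|
    return x - math.sqrt(2.0 / (math.pi * rho)) * x * abs(x)

def converges(x0, rho, steps=2000):
    # True if the orbit falls into the origin, False if it escapes
    x = x0
    for _ in range(steps):
        x = quad_map(x, rho)
        if abs(x) > 1e6:
            return False
    return abs(x) < 1e-2
```

Orbits starting just inside $|x| = \sqrt{2\pi\rho}$ converge to the origin, while those just outside escape.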

After locating the basin boundary, we can compute the probabilities of finding attractors with small variance and large variance, respectively. Since the probability density function of each component of the initial state $\sqrt{N}A^\mu(0)$ is Gaussian with mean 0 and unit variance, the probability of finding a small variance attractor for $m = 1$ is given by

 $P_{\mathrm{small}} = \left(\mathrm{erf}\sqrt{\pi\rho}\right)^2,$ (14)

and the probability of finding a large variance attractor is

 $P_{\mathrm{large}} = 1 - P_{\mathrm{small}}.$

This result is consistent with the simulations in that when the diversity is large, the probability of finding samples with small variance is higher. A comparison between theory and simulation is given in Fig. 18, which shows good agreement.
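
Eq. (14) can be checked by a quick Monte Carlo estimate over Gaussian initial conditions; a sketch for $m = 1$ ($D = 2$), assuming unit-variance components of $\sqrt{N}A(0)$:

```python
import math
import random

def p_small(rho, D=2):
    # Probability that all D components of sqrt(N)*A(0) ~ N(0,1) start
    # inside the basin |x| < sqrt(2*pi*rho); Eq. (14) corresponds to D = 2.
    return math.erf(math.sqrt(math.pi * rho)) ** D

def mc_p_small(rho, n=100000, seed=0):
    # Monte Carlo estimate of the same probability for D = 2
    rng = random.Random(seed)
    b = math.sqrt(2.0 * math.pi * rho)
    hits = sum(1 for _ in range(n)
               if abs(rng.gauss(0.0, 1.0)) < b and abs(rng.gauss(0.0, 1.0)) < b)
    return hits / n
```

Here we use $P(|x| < b) = \mathrm{erf}(b/\sqrt{2})$ for a unit Gaussian, which reduces to $\mathrm{erf}\sqrt{\pi\rho}$ per component at $b = \sqrt{2\pi\rho}$.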

## V Conclusion

We have studied the behavior of an adaptive population using payoff functions that increase linearly or quadratically with the winning margin. For linear payoffs, we found a continuous dynamical transition when the adaptation rate of the population is tuned by varying the diversity of preferences. This is in contrast with the case of payoff functions independent of the winning margin, in which there is only a scaling relation between the variance and the diversity, and no phase transitions are found. The dynamical transition is due to the payoffs being enhanced by large winning margins at low diversity. Furthermore, for systems with multi-dimensional signals feeding the strategies, we found a cascade of dynamical transitions in the responses to different signals, with the population variance increasing at each step of the cascade. When the cascades are blurred at higher signal dimensions, a classification of the step vectors, as done in Fig. 10, also reveals the possibility of a crossover from anisotropic to isotropic motion in the phase space.

We have also studied the effects of the polarization of the initial preference distribution on the attractor behavior of the system by comparing the Gaussian and bimodal distributions. In the Gaussian case, there is a gradual increase in step sizes when the diversity decreases, whereas in the bimodal case, there is a sharp increase in the fraction of time with non-vanishing step sizes within a narrow range of diversity, as shown in Fig. 13. We also found that for bimodal distributions, the payoff components go through processes of accumulation and bursts. These observations illustrate that the more polarized the preferences are, the more erratic the dynamics is. There are also many similarities among the different cases we have studied. For both Gaussian and bimodal distributions, there are cascades for low signal dimensions, which gradually disappear as the signal dimension increases. For online update, both endogenous and exogenous signals result in similar macroscopic behaviors, for example, the diversity dependence of the variance of the buyer population and of the ranked step sizes.

For quadratic payoffs, a basin boundary separating two groups of samples is found. The group of vanishing step sizes adapts to the low payoff region of the quadratic payoff function and the group of non-vanishing step sizes adapts to the region with more rapidly increasing payoffs of the quadratic payoff function.

In summary, we have found three pathways through which fluctuations increase with decreasing diversity. For linear payoffs with a Gaussian preference distribution, fluctuations increase through growing step sizes via cascades of continuous dynamical transitions. For linear payoffs with bimodal distributions, fluctuations increase through an increase in the fraction of time with non-vanishing step sizes. For quadratic payoffs, fluctuations increase through an enlargement of the basin of the attractors with large fluctuations. This illustrates the rich spectrum of states into which the population can self-organize in different environments.

It is interesting to consider the phenomenology experienced by the agents at each side of the described dynamical transitions and crossovers. In this respect, we make the following observations.

1) Agents using linear payoffs focus their attention on opportunities with large winning margins, and tend to neglect marginal wins. In markets with high diversity, agents with different preferences switch their strategies at different times, preventing the occurrence of high volatility. Since the volatility is not large enough to induce strategy switches of agents focusing on large winning margins, the market self-organizes to a state of low volatility. However, in markets with low diversity, more agents switch their strategies at the same time, and the volatility is high. These profitable opportunities are exploited by the agents more sensitive to large winning margins, and the market self-organizes to a state of high volatility. In markets with intermediate diversity, the agents are selectively sensitive to some, but not all, of the signals in the market, leading to the cascades of phase transitions.

2) For agents using quadratic payoffs, their interest in marginal wins vanishes even faster. Hence the market self-organizes to a state of low volatility when the initial volatility is low, irrespective of the diversity of preferences of strategies. On the other hand, their emphasis on large winning margins rises even faster than with linear payoffs. Hence the market self-organizes to a state of high volatility when the initial volatility is high, irrespective of the diversity. Changes in the diversity cannot eliminate the attractors of either high or low volatility; they merely influence the sizes of their basins of attraction.

3) Both cases of linear and quadratic payoffs are different from the case of step payoffs (20); (21). Agents using step payoffs place an equal emphasis on all winning opportunities irrespective of the winning margin. Consequently, the volatility does not vanish at any value of diversity, and there are no dynamical transitions or multiple basins of attraction. Instead, a scaling relation between the volatility and the diversity is applicable.

4) In markets with a bimodal distribution of initial preferences, the agents have polarized opinions about their responses to the signals. When the diversity is high, it takes a large number of time steps before the opinions of a group of agents are reversed. This gives rise to periods of vanishing volatility, during which the system stays at the origin of the phase space. When the opinions of the agent group are eventually reversed, bursts of activity erupt.

Recent attention was drawn to the role of the payoff function on reproducing realistic market behavior such as non-Gaussian features, the formation of sustained trends and bubbles, and intermittency (13); (14). Our study further confirms that tuning the payoff function and the preference distribution can lead to a rich spectrum of self-organized states of the market. It would be interesting to consider populations of agents with different individual payoff functions and study how they interact.
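To make the mechanisms discussed above concrete, the following is a minimal numerical sketch of a Minority Game with a payoff linear in the winning margin and Gaussian-spread initial preferences. All parameter values, the single-preference initialization, and the endogenous signal update are illustrative simplifications of our own, not the exact model specification of the main text.

```python
import numpy as np

rng = np.random.default_rng(0)
N, S, D = 1001, 2, 1                  # agents, strategies per agent, signal bits
n_states = 2 ** D
# Each strategy maps every signal state to a binary decision +1/-1.
strategies = rng.choice([-1, 1], size=(N, S, n_states))

R = 4.0                               # spread of the Gaussian initial preferences
payoffs = np.zeros((N, S))
payoffs[:, 0] = rng.normal(0.0, R, N)  # initial preference of strategy 0

mu = 0
A_series = []
for t in range(2000):
    best = payoffs.argmax(axis=1)               # each agent plays its best strategy
    bids = strategies[np.arange(N), best, mu]
    A = bids.sum() / np.sqrt(N)                 # rescaled excess of one decision
    # Linear payoff: strategies on the minority side gain in proportion
    # to the winning margin |A|.
    payoffs -= A * strategies[:, :, mu] / np.sqrt(N)
    mu = ((2 * mu) % n_states) | (1 if A > 0 else 0)  # endogenous signal
    A_series.append(A)

sigma2_over_N = float(np.var(A_series))         # volatility of the decisions
```

Varying `R` (and hence the diversity of preferences) in such a sketch is the kind of experiment to which the diversity dependence discussed above refers.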

## Appendix A The equations of motion in MG with linear payoffs

We consider the equations of motion in MG with linear payoffs. Here, we focus on the case of a Gaussian distribution of initial preferences; the generalization to the case of the bimodal distribution is straightforward. Using Eq. (1), and averaging over Eq. (3), we obtain the step size for the historical state $\mu$.

 \langle \Delta A^\mu(t) \rangle = \frac{1}{2^{2D-1}} \sum_{a<b} \cdots \qquad (15)

Since the integral representation of the step function is given by

 \Theta(y) = \int_0^\infty dx \int \frac{dp}{2\pi}\, e^{ip(x-y)}, \qquad (16)

Eq. (15) becomes

 \langle \Delta A^\mu(t) \rangle = \frac{1}{2^{2D-1}} \sum_{a<b} \cdots \qquad (17)
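As a side note (a textbook identity, added here for completeness), the representation in Eq. (16) follows from writing the step function as an integral over a Dirac delta and using the Fourier representation of the delta function:

```latex
\Theta(y) \;=\; \int_0^\infty dx\, \delta(x-y)
         \;=\; \int_0^\infty dx \int \frac{dp}{2\pi}\, e^{ip(x-y)} .
```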

Decomposing the cumulative payoffs into payoff components as defined in Eq. (10), Eq. (17) becomes

 \langle \Delta A^\mu(t) \rangle = \frac{1}{2^{2D-1}} \sum_{a<b} \cdots \qquad (18)

Using the identity, valid for $\xi^\mu_a = \pm 1$ and $\xi^\mu_b = \pm 1$,

 e^{i\theta(\xi^\mu_a - \xi^\mu_b)} = \cos^2\theta + i\sin\theta\cos\theta\,(\xi^\mu_a - \xi^\mu_b) + \sin^2\theta\,\xi^\mu_a \xi^\mu_b,

we arrive at

 \langle \Delta A^\mu(t) \rangle = \int_0^\infty dx \int \frac{dp}{2\pi}\, e^{-R^2 p^2 - ipx} \prod_{\nu\neq\mu} \cos 2pk^\nu \left[ i\sin 2p\!\left(k^\mu - \sqrt{N}A^\mu\right) - i\sin 2pk^\mu \right], \qquad (19)

where the factor $e^{-R^2 p^2}$ results from averaging over the Gaussian distribution of initial preferences.

Similarly, for the non-historical states $\nu \neq \mu$, we have

 \langle \Delta A^\nu(t) \rangle = \int_0^\infty dx \int \frac{dp}{2\pi}\, e^{-R^2 p^2 - ipx} \prod_{\lambda\neq\mu,\nu} \cos 2pk^\lambda \; 2i\sin pk^\nu \cos pk^\nu \left[ \cos 2p\!\left(k^\mu - \sqrt{N}A^\mu\right) - \cos 2pk^\mu \right]. \qquad (20)

For $D=1$, there are only two states. Let $\mu$ and $\bar\mu$ be the historical and non-historical states respectively.

After evaluating the integrals, we obtain

 \langle \Delta A^\mu(t) \rangle = \frac{1}{8}\Big[ \operatorname{erf}\frac{2k^\mu(t+1)+2\bar{k}^\mu(t)}{\sqrt{2}R} + \operatorname{erf}\frac{2k^\mu(t+1)-2\bar{k}^\mu(t)}{\sqrt{2}R} + 2\operatorname{erf}\frac{2k^\mu(t+1)}{\sqrt{2}R} - \operatorname{erf}\frac{2k^\mu(t)+2\bar{k}^\mu(t)}{\sqrt{2}R} - \operatorname{erf}\frac{2k^\mu(t)-2\bar{k}^\mu(t)}{\sqrt{2}R} - 2\operatorname{erf}\frac{2k^\mu(t)}{\sqrt{2}R} \Big], \qquad (21)
 \langle \Delta A^{\bar\mu}(t) \rangle = \frac{1}{8}\Big[ \operatorname{erf}\frac{2k^\mu(t+1)+2\bar{k}^\mu(t)}{\sqrt{2}R} - \operatorname{erf}\frac{2k^\mu(t+1)-2\bar{k}^\mu(t)}{\sqrt{2}R} - \operatorname{erf}\frac{2k^\mu(t)+2\bar{k}^\mu(t)}{\sqrt{2}R} + \operatorname{erf}\frac{2k^\mu(t)-2\bar{k}^\mu(t)}{\sqrt{2}R} \Big], \qquad (22)

where $\bar{k}^\mu \equiv k^{\bar\mu}$.

Now suppose the historical state is $\mu=0$ at $t=0$. Following the dynamics sketched in Fig. 19, we can calculate $k^0(t)$ and $k^1(t)$ as follows:

 k^0(1) = k^0(0) - \sqrt{N}A^0(0), \quad k^1(1) = k^1(0),
 k^0(2) = k^0(1), \quad k^1(2) = k^1(1) - \sqrt{N}A^1(1),
 k^0(3) = k^0(2), \quad k^1(3) = k^1(2) - \sqrt{N}A^1(2),
 k^0(0) = k^0(3) - \sqrt{N}A^0(3), \quad k^1(0) = k^1(3). \qquad (23)

Eqs. (21)-(23) are the equations of motion in the phase space. When the step sizes are small, we obtain Eq. (6) for linear payoffs using a Taylor expansion. On the other hand, there exist solutions with non-vanishing step sizes along one of the two directions, and vanishing step sizes along the remaining direction. Suppose the step sizes vanish along direction 0 and are non-vanishing along direction 1. From Eq. (23), the motion along direction 1 is then a period-2 oscillation with $A^1(1)+A^1(2)=0$. Since $A^1(2) = A^1(1) + \Delta A^1(1)$, we have

 A^1(1) = -\tfrac{1}{2}\Delta A^1(1)

and

 A^1(2) = \tfrac{1}{2}\Delta A^1(1).
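The phase-space dynamics of Eqs. (21)-(23) for $D=1$ can be iterated numerically. The sketch below absorbs the $\sqrt{N}$ factors into the variables, treats the averaged step sizes of Eqs. (21)-(22) as the actual increments, and imposes the period-4 schedule of historical states $(0,1,1,0)$ of Eq. (23); the initial values are arbitrary illustrative choices, not from the text.

```python
from math import erf, sqrt, isfinite

def d_hist(k_new, k_old, kbar, R):
    """Eq. (21): step size along the historical direction."""
    s = sqrt(2.0) * R
    return 0.125 * (erf((2*k_new + 2*kbar)/s) + erf((2*k_new - 2*kbar)/s)
                    + 2*erf(2*k_new/s)
                    - erf((2*k_old + 2*kbar)/s) - erf((2*k_old - 2*kbar)/s)
                    - 2*erf(2*k_old/s))

def d_nonhist(k_new, k_old, kbar, R):
    """Eq. (22): step size along the non-historical direction."""
    s = sqrt(2.0) * R
    return 0.125 * (erf((2*k_new + 2*kbar)/s) - erf((2*k_new - 2*kbar)/s)
                    - erf((2*k_old + 2*kbar)/s) + erf((2*k_old - 2*kbar)/s))

def step(k, a, mu, R):
    """One time step with historical state mu; only k[mu] moves (Eq. (23))."""
    nu = 1 - mu
    k_new = k[mu] - a[mu]
    a[mu] += d_hist(k_new, k[mu], k[nu], R)
    a[nu] += d_nonhist(k_new, k[mu], k[nu], R)
    k[mu] = k_new
    return k, a

k, a = [0.3, -0.2], [0.1, 0.05]       # illustrative initial conditions
for t in range(400):
    k, a = step(k, a, [0, 1, 1, 0][t % 4], R=0.5)
```

Each step size is a sum of eight error functions divided by eight, so its magnitude never exceeds one, which the map respects by construction.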

Let $k_0 \equiv k^0(t)$ (constant in this solution) and $\bar{k}_1$ be the average value of $k^1(t)$, with the diversity $\rho = R^2/N$. By Eq. (21),

 \Delta A^1(1) = \frac{1}{8}\Big[ \operatorname{erf}\frac{\Delta A^1(1)+4(\bar{k}_1+k_0)/\sqrt{N}}{\sqrt{8\rho}} + \operatorname{erf}\frac{\Delta A^1(1)-4(\bar{k}_1+k_0)/\sqrt{N}}{\sqrt{8\rho}} + 2\operatorname{erf}\frac{\Delta A^1(1)+4\bar{k}_1/\sqrt{N}}{\sqrt{8\rho}} + 2\operatorname{erf}\frac{\Delta A^1(1)-4\bar{k}_1/\sqrt{N}}{\sqrt{8\rho}} + \operatorname{erf}\frac{\Delta A^1(1)+4(\bar{k}_1-k_0)/\sqrt{N}}{\sqrt{8\rho}} + \operatorname{erf}\frac{\Delta A^1(1)-4(\bar{k}_1-k_0)/\sqrt{N}}{\sqrt{8\rho}} \Big]. \qquad (24)

In general, the dynamics converges with $\bar{k}_1 \to 0$, yielding the self-consistent equation for $\Delta A^1(1)$ in Eq. (8) (the dependence on $t$ has been made implicit).

To find the variance, we use Eq. (5). Knowing $A^1(1)$ and $A^1(2)$, the variance is calculated as

 \frac{\sigma^2}{N} = \frac{N}{16}\left[\left(-\tfrac{1}{2}\Delta A^1(1)\right)^2 + \left(\tfrac{1}{2}\Delta A^1(1)\right)^2\right] = \frac{N\left(\Delta A^1(1)\right)^2}{32},

resulting in the expression for $\sigma^2/N$ in Eq. (8).

## Appendix B The stability of the step size in the secondary direction

We consider the stability of the step size in the secondary direction. Following the notation in Appendix A, we assume $k^0(0)=0$ and $\bar{k}_1 \approx 0$ at $t=0$, and the secondary direction is $\mu=0$.

Suppose there is a small perturbation $\delta A^0(0)$ at $t=0$; then

 k^0(1) = k^0(0) - \sqrt{N}\,\delta A^0(0) = -\sqrt{N}\,\delta A^0(0),
 k^1(1) = k^1(0) = \bar{k}_1 - \frac{\sqrt{N}}{4}\Delta A^1(1) \approx -\frac{\sqrt{N}}{4}\Delta A^1(1).

We obtain

 \Delta A^0(0) = \frac{1}{8}\Big[ -\operatorname{erf}\frac{2\delta A^0(0)+\tfrac{1}{2}\Delta A^1(1)}{\sqrt{2\rho}} - 2\operatorname{erf}\frac{2\delta A^0(0)}{\sqrt{2\rho}} - \operatorname{erf}\frac{2\delta A^0(0)-\tfrac{1}{2}\Delta A^1(1)}{\sqrt{2\rho}} \Big]. \qquad (25)

Using Taylor expansion to the first order,

 \Delta A^0(0) = -\frac{1}{\sqrt{2\pi\rho}}\left(1+e^{-(\Delta A^1(1))^2/8\rho}\right)\delta A^0(0) \equiv \delta A^0(1), \qquad \Delta A^1(0) = 0. \qquad (26)
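As a numerical cross-check (not part of the original text), the first-order expansion of Eq. (26) should match Eq. (25) for a small perturbation; the parameter values below are arbitrary test choices.

```python
from math import erf, exp, sqrt, pi

def dA0_exact(delta, dA1, rho):
    """Eq. (25): exact step along the secondary direction under perturbation delta."""
    s = sqrt(2.0 * rho)
    return 0.125 * (-erf((2*delta + 0.5*dA1)/s)
                    - 2*erf(2*delta/s)
                    - erf((2*delta - 0.5*dA1)/s))

def dA0_linear(delta, dA1, rho):
    """Eq. (26): first-order Taylor expansion of Eq. (25) in delta."""
    return -(1.0/sqrt(2.0*pi*rho)) * (1.0 + exp(-dA1**2/(8.0*rho))) * delta

delta, dA1, rho = 1e-7, 0.6, 0.2
err = abs(dA0_exact(delta, dA1, rho) - dA0_linear(delta, dA1, rho))
```

The zeroth-order terms of Eq. (25) cancel exactly by the oddness of the error function, so the leading correction to the linear form is of order $\delta^3$.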

Similarly, for $t=1$,

 k^0(2) = -\sqrt{N}\,\delta A^0(0), \quad k^1(2) = \frac{\sqrt{N}}{4}\Delta A^1(1),
 \Delta A^1(1) = \operatorname{erf}\!\left(\frac{\Delta A^1(1)}{\sqrt{8\rho}}\right), \quad \Delta A^0(1) = 0. \qquad (27)

For $t=2$,

 k^0(3) = -\sqrt{N}\,\delta A^0(0), \quad k^1(3) = -\frac{\sqrt{N}}{4}\Delta A^1(1),
 \Delta A^1(2) = -\operatorname{erf}\!\left(\frac{\Delta A^1(1)}{\sqrt{8\rho}}\right), \quad \Delta A^0(2) = 0. \qquad (28)

For $t=3$,

 k^0(4) = -2\sqrt{N}\,\delta A^0(0) - \sqrt{N}\,\delta A^0(1), \quad k^1(4) = -\frac{\sqrt{N}}{4}\Delta A^1(1),
 \Delta A^0(3) = -\frac{1}{\sqrt{2\pi\rho}}\left(1+e^{-(\Delta A^1(1))^2/8\rho}\right)\left(\delta A^0(0)+\delta A^0(1)\right), \quad \Delta A^1(3) = 0. \qquad (29)

Here $\delta A^0(0) + \delta A^0(1)$ is the accumulated perturbation from the previous time steps. That is,

 A^0(4) = \delta A^0(0) + \delta A^0(1) - \frac{1}{\sqrt{2\pi\rho}}\left(1+e^{-(\Delta A^1(1))^2/8\rho}\right)\left(\delta A^0(0)+\delta A^0(1)\right). \qquad (30)
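Eq. (30) can be read as one period multiplying the accumulated perturbation by a single factor (an interpretation added here for completeness, not stated explicitly above):

```latex
\delta A^0(0) + \delta A^0(1) \;\longmapsto\;
\left[\, 1 - \frac{1}{\sqrt{2\pi\rho}}
\left(1 + e^{-(\Delta A^1(1))^2/8\rho}\right) \right]
\left(\delta A^0(0) + \delta A^0(1)\right),
```

so the perturbation in the secondary direction decays whenever the bracketed factor has magnitude less than one.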

## Appendix C The equations of motion in MG with bimodal preference distribution

For a bimodal distribution of preferences of strategies, the calculation is similar to that in Appendix A, except that the average over the Gaussian distribution is replaced by the average over the bimodal distribution.

For $D=1$, the theoretical result is as follows. Denoting the step functions by $\Theta_1$ and $\Theta_2$, we have

 A^\mu = \Theta_1\!\left(2k^\mu(t)+2\bar{k}^\mu(t)\right) + \Theta_2\!\left(2k^\mu(t)+2\bar{k}^\mu(t)\right) - \Theta_1\!\left(-(2k^\mu(t)+2\bar{k}^\mu(t))\right) - \Theta_2\!\left(-(2k^\mu(t)+2\bar{k}^\mu(t))\right),
 B^\mu = \Theta_1\!\left(2k^\mu(t)\right) + \Theta_2\!\left(2k^\mu(t)\right) - \Theta_1\!\left(-2k^\mu(t)\right) - \Theta_2\!\left(-2k^\mu(t)\right),
 C^\mu = \Theta_1\!\left(2k^\mu(t)-2\bar{k}^\mu(t)\right) + \Theta_2\!\left(2k^\mu(t)-2\bar{k}^\mu(t)\right) - \Theta_1\!\left(-(2k^\mu(t)-2\bar{k}^\mu(t))\right) - \Theta_2\!\left(-(2k^\mu(t)-2\bar{k}^\mu(t))\right). \qquad (31)

For the online update,

 k^\mu(t+1) = k^\mu(t) - \sqrt{N}A^\mu(t), \qquad \bar{k}^\mu(t+1) = \bar{k}^\mu(t), \qquad (32)

then

 \Delta A^\mu(t) = \frac{1}{16}\Big[ A^\mu(t+1) + 2B^\mu(t+1) + C^\mu(t+1) - \left(A^\mu(t) + 2B^\mu(t) + C^\mu(t)\right) \Big], \qquad (33)
 \Delta A^{\bar\mu}(t) = \frac{1}{16}\Big[ A^\mu(t+1) - C^\mu(t+1) - \left(A^\mu(t) - C^\mu(t)\right) \Big]. \qquad (34)

Likewise, for the batch update, with the corresponding notation,

 A^\mu = \Theta_1\!\left(2k^\mu(t)+2\bar{k}^\mu(t)\right) + \Theta_2\!\left(2k^\mu(t)+2\bar{k}^\mu(t)\right) - \Theta_1\!\left(-(2k^\mu(t)+2\bar{k}^\mu(t))\right) - \Theta_2\!\left(-(2k^\mu(t)+2\bar{k}^\mu(t))\right),
 B^\mu = \Theta_1\!\left(2k^\mu(t)\right) + \Theta_2\!\left(2k^\mu(t)\right) - \Theta_1\!\left(-2k^\mu(t)\right) - \Theta_2\!\left(-2k^\mu(t)\right),
 C^\mu = \Theta_1\!\left(2k^\mu(t)-2\bar{k}^\mu(t)\right) + \Theta_2\!\left(2k^\mu(t)-2\bar{k}^\mu(t)\right) - \Theta_1\!\left(-(2k^\mu(t)-2\bar{k}^\mu(t))\right) - \Theta_2\!\left(-(2k^\mu(t)-2\bar{k}^\mu(t))\right).