Smart Transformations: The Evolution of Choice Principles
Abstract
Evolutionary game theory classically investigates which behavioral patterns are evolutionarily successful in a single game. More recently, a number of contributions have studied the evolution of preferences instead: which subjective conceptualizations of a game’s payoffs give rise to evolutionarily successful behavior in a single game. Here, we extend this approach further by asking which general patterns of subjective conceptualizations of payoff functions are evolutionarily successful across a class of games. In other words, we look at the evolutionary competition of payoff transformations in “metagames”, obtained by averaging over the payoffs of single games. Focusing for a start on the class of symmetric games, we show that regret minimization can outperform payoff maximization if agents resort to a security strategy in case of radical uncertainty.
Keywords: evolution of preferences, evolutionary game theory, rationality norms, ecological rationality, metagames, evolutionary dynamics
1 Preamble
This is what I aim at, because the point of philosophy is to start with something so simple as not to seem worth stating, and to end with something so paradoxical that no one will believe it. [Russ18]
In the epistemic literature, there are two major and alternative, sometimes competing, trends in formalizing belief: a probabilistic approach, and a nonprobabilistic approach. As the names suggest, the probabilistic approach uses probabilities in order to model agents’ beliefs, whereas the nonprobabilistic approach relies on more qualitative structures (see for example [baltsme08]).
Throughout this abstract we assume that probabilistic and nonprobabilistic beliefs are just two different and compatible forms of belief. In other words, we assume that agents may have probabilistic beliefs in some circumstances and nonprobabilistic beliefs in others. More specifically, by probabilistic belief we mean that the belief of an agent is expressed by a probability distribution, while by nonprobabilistic belief we mean that the belief is not representable by a single specific probability distribution. Appealing to self-evidence and introspection, we hope this sounds like an uncontroversial assumption, “so simple as not to seem worth stating”. Or, at least, we hope the reader will agree that it is much less controversial than assuming that agents have either only probabilistic beliefs or only nonprobabilistic beliefs.
2 Intro
Evolutionary game theory classically investigates which behavioral patterns, when competing against each other, are evolutionarily stable in a single game. More recently, a number of contributions have studied the evolution of preferences instead: which subjective conceptualizations of a game’s payoff function give rise to evolutionarily successful behavior in a single game ([algweib13], [DekElyYlan07], [heifshanspieg07], [OkVega01], [RobSam11], [Sam01]). This literature is grounded in the so-called indirect evolutionary approach. While the classical evolutionary approach studies whether a certain strategy in a given game is evolutionarily stable and robust, the indirect evolutionary approach allows one to investigate whether certain preferences are evolutionarily successful. Yet it has been argued that
The indirect evolutionary approach with unobservable preferences gives us an alternative description of the evolutionary process, one that is perhaps less reminiscent of biological determinism, but leads to no new results. [RobSam11]
In this work we adopt a standpoint close to the one taken in the evolution of preferences literature, but we extend this approach further by asking which general patterns of subjective conceptualizations of payoff functions are evolutionarily successful across a class of games. In other words, we look at the evolutionary competition of general payoff transformations in metagames, obtained by averaging over the payoffs of single games. Focusing for a start on the class of symmetric games, we show that regret minimization can outperform payoff maximization if agents resort to a security strategy in case of radical uncertainty. That is, payoff maximization turns out to be evolutionarily unstable under simple epistemic assumptions.
3 Evolution of preferences
The standard model for studying the evolution of preferences (in particular, we are referring to [DekElyYlan07]) is built on a symmetric two-player normal-form game $G = \langle A, \pi \rangle$ with finite action set $A$ and payoff function $\pi: A \times A \to \mathbb{R}$. This is usually called the fitness game, since evolutionary selection is driven by the payoff function $\pi$. Players in the population represent subjective preferences, or subjective utility functions, that can diverge from the objective fitness given by $\pi$. A subjective preference is a function $u: A \times A \to \mathbb{R}$, and the set of subjective preferences is $U$. Each player chooses the action that maximizes her subjective preference, but receives the objective payoff defined by $\pi$. Hence, player $i$’s action choice is determined by $u_i$, but $i$’s evolutionary fitness is determined by $\pi$.
Different authors enrich this basic picture with various features (e.g., observability [DekElyYlan07], assortative matching [algweib13], etc.) and study the resulting effects in the dynamics. We do not argue against any of these approaches here, but adopt a different one. Firstly, we add a metagame perspective by studying evolution of preferences across a class of games. Secondly, we pay attention to the epistemic situations of the agents and include the possibility that agents play a security strategy in case of radical uncertainty.
4 The model
Instead of one fitness game $G$, we consider a class $\mathcal{G}$ of fitness games. Here, we take $\mathcal{G}$ to be the class of symmetric games. We are interested in the evolutionary competition of player types $\theta = (\tau, \beta)$, each conceived of as a pair of a subjective preference type $\tau$ and an epistemic type $\beta$. Let $\Theta$ denote the set of player types. We discuss each component in turn below. We take a player type to specify action choices in each $G \in \mathcal{G}$, and thus think of a player type as a choice principle. This allows us to study the evolutionary competition between different subjective ways of representing a game’s utilities and different ways of using behavioral beliefs about the coplayer. To keep matters manageable, we restrict our attention to a selected subset of conceptually relevant player types, comparing players of four different and theoretically significant subjective preference types.
Subjective preference types
Formally, a preference type is a function $\tau: \mathcal{G} \to U$ from games to subjective preferences. Let $T$ be the set of preference types. We can think of preference types as transformations of $\pi_G$, for any $G \in \mathcal{G}$: $\tau$ may then be understood as a player’s way of thinking across games, a red thread that relates her subjective preferences across different games.
For perspicuity, we focus here on four conceptually relevant transformations in $T$: (i) an actual payoff type, whose subjective preferences coincide with actual fitness payoffs, (ii) an altruistic type, whose subjective preferences are the sum of her own fitness and that of the coplayer, (iii) a competitive type, whose subjective preferences are her own fitness minus that of the coplayer, and (iv) a regret type, whose subjective preferences are given by each action’s regret ([loosug82], [halpass12]). Denoting the payoff function of game $G$ by $\pi_G$, we define these types, for all actions $a, b \in A$, as:

actual payoff type: $\tau_\pi(G)(a,b) = \pi_G(a,b)$;

altruistic type: $\tau_a(G)(a,b) = \pi_G(a,b) + \pi_G(b,a)$;

competitive type: $\tau_c(G)(a,b) = \pi_G(a,b) - \pi_G(b,a)$;

regret type: $\tau_r(G)(a,b) = \pi_G(a,b) - \pi_G(a^*_b, b)$, where $a^*_b$ stands for the best reply to $b$ under $\pi_G$.^{1}
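As a minimal sketch of these definitions, the four transformations can be written as functions on a finite symmetric game stored as the row player’s payoff matrix `pi[a][b]` (own action `a`, coplayer action `b`); by symmetry, the coplayer’s payoff is `pi[b][a]`. The matrix encoding and the function names are assumptions made for illustration only.

```python
from typing import List

Matrix = List[List[float]]  # pi[a][b]: own payoff for action a against coplayer action b


def actual(pi: Matrix) -> Matrix:
    """Actual payoff type: subjective utility equals fitness."""
    return [list(row) for row in pi]


def altruistic(pi: Matrix) -> Matrix:
    """Own fitness plus the coplayer's fitness: pi(a, b) + pi(b, a)."""
    n = len(pi)
    return [[pi[a][b] + pi[b][a] for b in range(n)] for a in range(n)]


def competitive(pi: Matrix) -> Matrix:
    """Own fitness minus the coplayer's fitness: pi(a, b) - pi(b, a)."""
    n = len(pi)
    return [[pi[a][b] - pi[b][a] for b in range(n)] for a in range(n)]


def regret(pi: Matrix) -> Matrix:
    """Negative regret: pi(a, b) minus the best-reply payoff against b."""
    n = len(pi)
    best = [max(pi[x][b] for x in range(n)) for b in range(n)]
    return [[pi[a][b] - best[b] for b in range(n)] for a in range(n)]


# Prisoner's Dilemma with actions 0 = cooperate, 1 = defect.
pd = [[3, 0], [5, 1]]
print(regret(pd))       # [[-2, -1], [0, 0]]
print(altruistic(pd))   # [[6, 5], [5, 2]]
print(competitive(pd))  # [[0, -5], [5, 0]]
```

Regret entries are never positive and equal zero exactly at best replies, which is the negative-regret formulation used above.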
Epistemic types
In full generality, an epistemic type is a general disposition to form beliefs about the coplayer’s behavior. As with preference types, in this abstract we limit ourselves to a small selection of epistemic types that are particularly interesting from a theoretical point of view. Here, we consider just two epistemic types $\beta \in \{\bar\mu, \Delta\}$:

$\bar\mu$: a uniform probabilistic belief about the opponent’s behavior, or

$\Delta$: the full set $\Delta(A)$ of all possible behavioral beliefs about the coplayer’s actions.
Hence, we are mainly focusing on two extreme epistemic types for the moment: players can either have a probabilistic (flat) belief about the coplayer’s actions, or be radically uncertain, i.e., have no specific probability distribution over the coplayer’s actions. It would also be possible to take into account different degrees of uncertainty, and to link our results more tightly to the literature on ambiguity and uncertainty aversion ([GhirMar02], [GilSch89], [MacMarRus06]), but we only consider the two extreme cases in this abstract.
There are many reasons why agents might be radically uncertain, such as lack of cognitive capabilities or lack of information.^{2}
Choice principles
We take player types to give rise to choice principles, i.e., systematic mappings from each game to a subset of actions. Many possibilities are conceivable here; to be practical, we again need to make a principled selection based on theoretical relevance. For simplicity, we assume that players apply maximin expected utility ([GilSch89]) based on their subjective preference type and their epistemic type.
Under maximin expected utility ([GilSch89]), an agent behaves so as to maximize the minimal expected (subjective) utility over the set of probability distributions she holds. In our particular case, this set is either a singleton (the flat probability distribution $\bar\mu$) or the full simplex $\Delta(A)$, i.e., the set of all possible probability distributions over the coplayer’s choices (in case of radical uncertainty). In the first case, maximizing the minimal subjective expected utility is the same as maximizing subjective expected utility. In the second case, since expected utility is linear in the belief and is therefore minimized by a belief that puts all mass on a single action of the coplayer, it amounts to playing standard maximin over the game with subjectively transformed preferences ([OsbRub94]).
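Spelled out, and writing $M \subseteq \Delta(A)$ for the set of beliefs a player holds and $C(u, M)$ for her set of chosen actions (both symbols are introduced here only for illustration), the choice rule and its two special cases read:

```latex
\[
  C(u, M) \;=\; \arg\max_{a \in A} \, \min_{\mu \in M} \sum_{b \in A} \mu(b)\, u(a, b)
\]
% Flat belief: the inner minimum is over a single distribution.
\[
  C(u, \{\bar\mu\}) \;=\; \arg\max_{a \in A} \frac{1}{|A|} \sum_{b \in A} u(a, b)
\]
% Radical uncertainty: a linear function on the simplex is minimized at a point mass.
\[
  \min_{\mu \in \Delta(A)} \sum_{b \in A} \mu(b)\, u(a, b) \;=\; \min_{b \in A} u(a, b),
  \qquad
  C(u, \Delta(A)) \;=\; \arg\max_{a \in A} \, \min_{b \in A} u(a, b)
\]
```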
Other construals of choice principles are conceivable, e.g., maximax, the maximization of the maximal utility. Our choice of maximin expected utility is motivated by the fact that it gives rise to well-known decision rules. For player type $(\tau_\pi, \bar\mu)$ we obtain standard expected utility maximization; for $(\tau_\pi, \Delta)$, standard maximin; and for $(\tau_r, \Delta)$, (positive) regret minimization, i.e., minimax regret ([halpass12]).
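A minimal sketch of these choice principles, assuming the same row-player matrix encoding as before; `choose` and the stag hunt payoffs are hypothetical illustrations, not taken from the simulations reported below. Under the flat belief, actions are ranked by row sums, which is equivalent to ranking by expected utility under $\bar\mu$; under radical uncertainty, by their worst-case entry.

```python
from typing import List

Matrix = List[List[float]]  # u[a][b]: subjective utility of action a against b


def regret(pi: Matrix) -> Matrix:
    """Negative regret transformation of an actual payoff matrix."""
    n = len(pi)
    best = [max(pi[x][b] for x in range(n)) for b in range(n)]
    return [[pi[a][b] - best[b] for b in range(n)] for a in range(n)]


def choose(u: Matrix, radically_uncertain: bool) -> List[int]:
    """Actions maximizing minimal expected subjective utility.

    Flat belief: rank by row sums (proportional to expected utility under the
    uniform belief). Full simplex: rank by the worst-case entry (maximin).
    """
    scores = [min(row) if radically_uncertain else sum(row) for row in u]
    top = max(scores)
    return [a for a, s in enumerate(scores) if s == top]


# Stag hunt: action 0 = stag, action 1 = hare.
stag_hunt = [[8, 0], [4, 3]]
print(choose(stag_hunt, True))           # [1]  maximin plays hare
print(choose(regret(stag_hunt), True))   # [0]  minimax regret plays stag
print(choose(stag_hunt, False))          # [0]  expected utility, flat belief
print(choose(regret(stag_hunt), False))  # [0]  same choice as the line above
```

The example shows the two security types $(\tau_\pi, \Delta)$ and $(\tau_r, \Delta)$ behaving differently in the same game, while the two flat-belief types choose alike.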
Note that player types $(\tau_\pi, \bar\mu)$ and $(\tau_r, \bar\mu)$ are behaviorally equivalent.
Remark 1. Maximization of expected utility and minimization of expected regret coincide: for any probabilistic belief $\mu$, an action $a$ maximizes expected utility if and only if $a$ minimizes expected (positive) regret.
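The remark follows in one line from the definition of the regret type. For any belief $\mu$ over the coplayer’s actions and any action $a$:

```latex
\[
  \sum_{b \in A} \mu(b)\, \tau_r(G)(a, b)
  \;=\; \sum_{b \in A} \mu(b)\, \pi_G(a, b)
  \;-\; \sum_{b \in A} \mu(b)\, \pi_G(a^*_b, b)
\]
```

The last sum does not depend on $a$, so the same actions maximize expected utility and expected negative regret, i.e., minimize expected (positive) regret. Under $\Delta(A)$, by contrast, the worst case is taken over $b$ and the subtracted term $\pi_G(a^*_b, b)$ varies with $b$, which is why $(\tau_\pi, \Delta)$ and $(\tau_r, \Delta)$ can choose differently.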
Nonetheless, from an evolutionary point of view it is important to distinguish player types who conceptualize a game’s payoffs in terms of regret, $\tau_r$, from those who consider the actual payoffs, $\tau_\pi$, especially when we consider evolutionary dynamics involving mutation (see below).
5 Results
In this section we present some results obtainable in our setup. For reasons of exposition, we first focus on radical uncertainty, and then allow players to have both probabilistic and nonprobabilistic beliefs (i.e., to be of either epistemic type, $\bar\mu$ or $\Delta$).
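Consider a population in which the eight player types introduced above, $(\tau_r, \Delta)$, $(\tau_\pi, \Delta)$, $(\tau_c, \Delta)$, $(\tau_a, \Delta)$, $(\tau_r, \bar\mu)$, $(\tau_\pi, \bar\mu)$, $(\tau_c, \bar\mu)$ and $(\tau_a, \bar\mu)$, are present. We are interested in which player types will be evolutionarily successful when repeatedly playing random symmetric games.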
To address this question, we use numerical simulation to approximate the average payoff accrued by each choice principle. To this end, we randomly generated symmetric games by sampling i.i.d. payoffs from a finite set of natural numbers. For each sampled game, we let all choice principles play against each other and recorded the resulting payoffs. Finally, we averaged these payoffs over games. Table 1 gives the resulting payoff matrix, with the row type’s average payoff against each column type.
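Table 1: Average payoff of the row type against the column type over the sampled symmetric games. Labels abbreviate player types, e.g., “r,Δ” stands for $(\tau_r, \Delta)$ and “π,μ̄” for $(\tau_\pi, \bar\mu)$.

        r,Δ    π,Δ    c,Δ    a,Δ    r,μ̄    π,μ̄    c,μ̄    a,μ̄
r,Δ    6.629  6.653  5.806  7.089  6.636  6.636  5.793  7.463
π,Δ    6.455  6.468  6.067  6.685  6.462  6.462  6.065  6.834
c,Δ    6.280  6.746  5.473  6.959  6.294  6.294  5.474  7.114
a,Δ    5.936  5.735  5.336  6.379  5.929  5.929  5.327  6.538
r,μ̄    6.633  6.658  5.810  7.081  6.634  6.634  5.802  7.454
π,μ̄    6.633  6.658  5.810  7.081  6.634  6.634  5.802  7.454
c,μ̄    6.278  6.750  5.476  6.953  6.293  6.293  5.484  7.112
a,μ̄    6.311  5.885  5.475  6.536  6.299  6.299  5.466  7.123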
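The estimation procedure can be sketched as follows. This is a minimal illustration, not the code behind Table 1: the game size (2 × 2), the payoff range {0, …, 10}, the number of games and the uniform tie-breaking rule are all hypothetical choices, so its numbers will differ from Table 1. Types are listed in the order of Table 1.

```python
import itertools
import random
from typing import List, Tuple

Matrix = List[List[int]]
PREFS = ("r", "pi", "c", "a")
# Player types in the order of Table 1: (preference type, radically uncertain?).
TYPES: List[Tuple[str, bool]] = [(p, True) for p in PREFS] + [(p, False) for p in PREFS]


def transform(kind: str, pi: Matrix) -> Matrix:
    """Subjective utility matrix of a preference type for actual payoffs pi."""
    n = len(pi)
    best = [max(pi[x][b] for x in range(n)) for b in range(n)]
    rule = {
        "pi": lambda a, b: pi[a][b],
        "a": lambda a, b: pi[a][b] + pi[b][a],
        "c": lambda a, b: pi[a][b] - pi[b][a],
        "r": lambda a, b: pi[a][b] - best[b],
    }[kind]
    return [[rule(a, b) for b in range(n)] for a in range(n)]


def choose(u: Matrix, radical: bool) -> List[int]:
    """Maximin expected utility: worst case under Delta, row sum under the flat belief."""
    scores = [min(row) if radical else sum(row) for row in u]
    top = max(scores)
    return [a for a, s in enumerate(scores) if s == top]


def metagame(n_games: int = 10_000, n_actions: int = 2,
             max_payoff: int = 10, seed: int = 0) -> List[List[float]]:
    """Average payoff of each row type against each column type over random games."""
    rng = random.Random(seed)
    k = len(TYPES)
    totals = [[0.0] * k for _ in range(k)]
    for _ in range(n_games):
        pi = [[rng.randint(0, max_payoff) for _ in range(n_actions)]
              for _ in range(n_actions)]
        choices = [choose(transform(p, pi), radical) for p, radical in TYPES]
        for i, j in itertools.product(range(k), repeat=2):
            rows, cols = choices[i], choices[j]
            # Assumption: ties are broken uniformly at random, so average over them.
            totals[i][j] += sum(pi[a][b] for a in rows for b in cols) / (len(rows) * len(cols))
    return [[t / n_games for t in row] for row in totals]


table = metagame()
for (p, radical), row in zip(TYPES, table):
    label = f"({p}, {'Delta' if radical else 'flat'})"
    print(f"{label:14}", " ".join(f"{x:.3f}" for x in row))
```

Whatever parameters are chosen, the rows and columns of (r, flat) and (pi, flat) come out identical, as Remark 1 predicts and as in Table 1.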
Radical uncertainty
To appreciate the following results, it helps to consider a restricted scenario first. Assume that all players have epistemic type $\Delta$, and so play a security (maximin) strategy on their subjective representation of the game. The relevant metagame for this case is the top-left 4 × 4 block of the full matrix in Table 1. Essentially, we are then considering the evolutionary competition between subjective preference types in a world of security players.
Notice, however, that the payoffs calculated for the “metagame” in Table 1 depend on details of our numerical simulation, in particular on the implicit probability with which particular types of games are sampled. Fortunately, the result generalizes to an analytic statement that is independent of such frequency effects, as long as every possible game occurs with positive probability.
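Proposition 1. Fix the epistemic type $\Delta$ for all players, the preference types $T = \{\tau_\pi, \tau_a, \tau_c, \tau_r\}$, and the class of symmetric games with i.i.d. payoffs sampled from a finite set of natural numbers. Then $(\tau_r, \Delta)$ is the only evolutionarily stable type in the population.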
Proof. See Appendix.
This is a conceptually noteworthy result: regret minimization evolutionarily outperforms classic maximin on repeated plays of symmetric games. In other words, when playing a security strategy it is strictly better to construe a game in terms of regrets than in terms of actual payoffs.
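As a quick sanity check on the simulated numbers (not a substitute for the analytic proof), one can test which types are strict symmetric Nash equilibria of the top-left 4 × 4 block of Table 1. A strict symmetric equilibrium of a metagame is evolutionarily stable, although an evolutionarily stable type need not be strict. The sketch assumes the type order $(\tau_r, \Delta)$, $(\tau_\pi, \Delta)$, $(\tau_c, \Delta)$, $(\tau_a, \Delta)$ for that block; the function name is hypothetical.

```python
from typing import List

# Top-left 4x4 block of Table 1 (row type's average payoff against column type),
# types ordered (r, Delta), (pi, Delta), (c, Delta), (a, Delta).
SECURITY = [
    [6.629, 6.653, 5.806, 7.089],
    [6.455, 6.468, 6.067, 6.685],
    [6.280, 6.746, 5.473, 6.959],
    [5.936, 5.735, 5.336, 6.379],
]


def strict_nash_types(m: List[List[float]]) -> List[int]:
    """Indices i with m[i][i] > m[j][i] for every j != i.

    Each returned type is a strict symmetric Nash equilibrium of the metagame,
    hence evolutionarily stable (the converse need not hold).
    """
    n = len(m)
    return [i for i in range(n)
            if all(m[i][i] > m[j][i] for j in range(n) if j != i)]


print(strict_nash_types(SECURITY))  # [0], i.e., only (r, Delta)
```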
Full competition
Consider next the full “metagame” in Table 1. A monomorphic population of regret minimizers $(\tau_r, \Delta)$ is no longer evolutionarily stable; it can be invaded by expected utility maximizers of types $(\tau_\pi, \bar\mu)$ and $(\tau_r, \bar\mu)$. Since the latter two are behaviorally equivalent, neither is an evolutionarily stable strategy; at best they could be neutrally stable [MaynardSmith1982:Evolutionandt]. However, under our simulated metagame payoffs from Table 1, any population consisting entirely of $(\tau_\pi, \bar\mu)$ and $(\tau_r, \bar\mu)$ can be invaded by regret minimizers $(\tau_r, \Delta)$. This suggests that all three types would persist under standard evolutionary dynamics, in various relative proportions.
Simulations of the (discrete-time) replicator dynamics [TaylorJonker1978:EvolutionarySt] indeed show that random initial population configurations are attracted to states with only three player types: $(\tau_r, \Delta)$, $(\tau_\pi, \bar\mu)$ and $(\tau_r, \bar\mu)$. The relative proportions of these depend on the initial population. This variability disappears if we add a small mutation rate to the dynamics. We assume a fixed, small mutation rate $\epsilon$ for the probability that a player’s preference type or her epistemic type changes to another random preference type or epistemic type. The probability that a player type mutates into a completely different player type, with both a different preference type and a different epistemic type, is then $\epsilon^2$. With these assumptions about “local mutations”, numerical simulations of the (discrete-time) replicator-mutator dynamics [Nowak2006:EvolutionaryDy] show that, for very small mutation rates, almost all initial populations converge to a single fixed point in which the majority of players are regret types. For instance, for a small fixed $\epsilon$, almost all initial populations are attracted to a final distribution with proportions:
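The replicator-mutator dynamics with local mutations can be sketched on the payoffs of Table 1 as follows. This is a minimal illustration under stated assumptions: the mutation rate `eps = 0.001`, the uniform initial population and the number of steps are hypothetical (the value of $\epsilon$ behind the reported proportions is not given here), and each component is assumed to mutate independently to a uniformly chosen different value.

```python
from typing import List

# Table 1; types ordered (r,Delta), (pi,Delta), (c,Delta), (a,Delta),
# (r,flat), (pi,flat), (c,flat), (a,flat).
TABLE1 = [
    [6.629, 6.653, 5.806, 7.089, 6.636, 6.636, 5.793, 7.463],
    [6.455, 6.468, 6.067, 6.685, 6.462, 6.462, 6.065, 6.834],
    [6.280, 6.746, 5.473, 6.959, 6.294, 6.294, 5.474, 7.114],
    [5.936, 5.735, 5.336, 6.379, 5.929, 5.929, 5.327, 6.538],
    [6.633, 6.658, 5.810, 7.081, 6.634, 6.634, 5.802, 7.454],
    [6.633, 6.658, 5.810, 7.081, 6.634, 6.634, 5.802, 7.454],
    [6.278, 6.750, 5.476, 6.953, 6.293, 6.293, 5.484, 7.112],
    [6.311, 5.885, 5.475, 6.536, 6.299, 6.299, 5.466, 7.123],
]
PREFS = ["r", "pi", "c", "a"]
EPISTEMIC = ["Delta", "flat"]
TYPES = [(p, e) for e in EPISTEMIC for p in PREFS]  # same order as TABLE1


def mutation_matrix(eps: float) -> List[List[float]]:
    """Q[i][j]: probability that an offspring of type i is of type j.

    Each component independently switches, with probability eps, to a uniformly
    chosen different value, so switching both components has probability eps**2.
    """
    def stay_or_move(old: str, new: str, options: List[str]) -> float:
        return 1 - eps if old == new else eps / (len(options) - 1)

    return [[stay_or_move(p, q, PREFS) * stay_or_move(e, f, EPISTEMIC)
             for q, f in TYPES] for p, e in TYPES]


def step(x: List[float], Q: List[List[float]]) -> List[float]:
    """One step of the discrete-time replicator-mutator dynamics."""
    n = len(x)
    fit = [sum(TABLE1[i][j] * x[j] for j in range(n)) for i in range(n)]
    mean = sum(x[i] * fit[i] for i in range(n))
    return [sum(x[i] * fit[i] * Q[i][j] for i in range(n)) / mean for j in range(n)]


def run(x0: List[float], eps: float, steps: int = 20_000) -> List[float]:
    Q = mutation_matrix(eps)
    x = list(x0)
    for _ in range(steps):
        x = step(x, Q)
    return x


final = run([1 / 8] * 8, eps=0.001)
for t, share in zip(TYPES, final):
    print(t, round(share, 3))
```

Because the sketch runs on the rounded entries of Table 1 and an arbitrary $\epsilon$, its output should be compared with the reported proportions only qualitatively.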
0.279  0.383  0.281 
Again, this shows that there are plausible and simple conditions under which agents who represent the game in terms of regret may be favored by evolutionary selection and that, more specifically, regret minimizers are evolutionarily supported.
Variable, but correlated epistemic types
The foregoing results were based on the implicit assumption that players have a fixed epistemic type. Epistemic types were considered under evolutionary pressure in tandem with preference types. In the remainder we focus on the evolution of preference types, assuming that players are of variable epistemic types.
For a clear-cut analytical corollary of Proposition 1 and Remark 1, let us assume that epistemic types are correlated in random encounters: whenever two player types are matched to play a game, they are always of the same epistemic type, and with positive probability both are of type $\bar\mu$, while with positive probability both are of type $\Delta$.^{3}
Corollary 1. Fix $T = \{\tau_\pi, \tau_a, \tau_c, \tau_r\}$. If agents’ epistemic types vary between encounters, both $\bar\mu$ and $\Delta$ occur with positive probability, and epistemic types are always correlated, so that coplayers are always of the same epistemic type in any particular round of play, then $\tau_r$ is the only evolutionarily stable preference type in the population.
Proof. See Appendix.
Notice, moreover, that since Remark 1 holds for any probabilistic belief $\mu$, Corollary 1 holds for any correlated probabilistic belief, not only for the flat belief $\bar\mu$.
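On the simulated payoffs, the correlated case can be illustrated by mixing the two diagonal blocks of Table 1. In this sketch, `q` is a hypothetical parameter for the probability that a matched pair are both security players (both hold the flat belief otherwise), and the strict-equilibrium test is the same sufficient condition for evolutionary stability as before.

```python
from typing import List

# Table 1; indices 0-3 are (r, pi, c, a) with Delta, 4-7 the same with the flat belief.
TABLE1 = [
    [6.629, 6.653, 5.806, 7.089, 6.636, 6.636, 5.793, 7.463],
    [6.455, 6.468, 6.067, 6.685, 6.462, 6.462, 6.065, 6.834],
    [6.280, 6.746, 5.473, 6.959, 6.294, 6.294, 5.474, 7.114],
    [5.936, 5.735, 5.336, 6.379, 5.929, 5.929, 5.327, 6.538],
    [6.633, 6.658, 5.810, 7.081, 6.634, 6.634, 5.802, 7.454],
    [6.633, 6.658, 5.810, 7.081, 6.634, 6.634, 5.802, 7.454],
    [6.278, 6.750, 5.476, 6.953, 6.293, 6.293, 5.484, 7.112],
    [6.311, 5.885, 5.475, 6.536, 6.299, 6.299, 5.466, 7.123],
]


def correlated_metagame(q: float) -> List[List[float]]:
    """Preference-type metagame (order r, pi, c, a) when both players are
    security players with probability q and both hold the flat belief otherwise."""
    return [[q * TABLE1[i][j] + (1 - q) * TABLE1[i + 4][j + 4] for j in range(4)]
            for i in range(4)]


def strict_nash_types(m: List[List[float]]) -> List[int]:
    """Indices i with m[i][i] > m[j][i] for every j != i."""
    n = len(m)
    return [i for i in range(n)
            if all(m[i][i] > m[j][i] for j in range(n) if j != i)]


for q in (0.0, 0.1, 0.5, 0.9):
    print(q, strict_nash_types(correlated_metagame(q)))
```

At `q = 0.0` no type is strictly stable, because regret and actual payoff types then tie exactly (Remark 1); for the positive values shown, only the regret type (index 0) is. This mirrors the requirement in Corollary 1 that both epistemic types occur with positive probability.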
Variable, uncorrelated epistemic types
Finally, consider the case where the epistemic types of players are variable but uncorrelated. Each player has a fixed preference type, but is of epistemic type $\Delta$ with some probability $p$. Probability $p$ is then the average probability that a player in the population is a security player, and it does not depend on the player’s preference type. For simplicity, consider $p$ fixed for the whole population. In that case we can compute, for each $p$, the average payoffs of preference types in another metagame, derived from the full Table 1. It turns out that there is a very low threshold on $p$ above which regret types dominate the evolutionary arms race. With only a small occurrence probability $p$ of security players, the derived metagame between preference types is:
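       r      π      c      a
r    6.634  6.635  5.802  7.450
π    6.633  6.633  5.804  7.444
c    6.293  6.297  5.484  7.111
a    6.295  6.291  5.465  7.111
(Row type’s average payoff against column type; r, π, c and a abbreviate $\tau_r$, $\tau_\pi$, $\tau_c$ and $\tau_a$.)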
Regret types are the only evolutionarily stable type in this case. In sum, even with a small probability of lacking a concrete (flat) belief about the opponent, a subjective representation of payoffs in terms of regret is favored by evolutionary selection.
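The derived metagame for uncorrelated epistemic types can be recomputed from Table 1 for any $p$, assuming each player is independently a security player with probability $p$. The exact $p$ behind the matrix above is not stated here, and Table 1 is rounded, so a recomputed matrix may differ slightly in the third decimal; the function names are hypothetical.

```python
from typing import List

# Table 1; indices 0-3 are (r, pi, c, a) with Delta, 4-7 the same with the flat belief.
TABLE1 = [
    [6.629, 6.653, 5.806, 7.089, 6.636, 6.636, 5.793, 7.463],
    [6.455, 6.468, 6.067, 6.685, 6.462, 6.462, 6.065, 6.834],
    [6.280, 6.746, 5.473, 6.959, 6.294, 6.294, 5.474, 7.114],
    [5.936, 5.735, 5.336, 6.379, 5.929, 5.929, 5.327, 6.538],
    [6.633, 6.658, 5.810, 7.081, 6.634, 6.634, 5.802, 7.454],
    [6.633, 6.658, 5.810, 7.081, 6.634, 6.634, 5.802, 7.454],
    [6.278, 6.750, 5.476, 6.953, 6.293, 6.293, 5.484, 7.112],
    [6.311, 5.885, 5.475, 6.536, 6.299, 6.299, 5.466, 7.123],
]


def uncorrelated_metagame(p: float) -> List[List[float]]:
    """Preference-type metagame (order r, pi, c, a) when each player is
    independently a security player (Delta) with probability p."""
    m = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            for row_sec in (True, False):
                for col_sec in (True, False):
                    weight = (p if row_sec else 1 - p) * (p if col_sec else 1 - p)
                    row = i if row_sec else i + 4
                    col = j if col_sec else j + 4
                    m[i][j] += weight * TABLE1[row][col]
    return m


def strict_nash_types(m: List[List[float]]) -> List[int]:
    """Indices i with m[i][i] > m[j][i] for every j != i."""
    n = len(m)
    return [i for i in range(n)
            if all(m[i][i] > m[j][i] for j in range(n) if j != i)]


for p in (0.0, 0.01, 0.1):
    print(p, strict_nash_types(uncorrelated_metagame(p)))
```

With $p = 0$ regret and actual payoff types tie and neither is strictly stable; already for the small positive values tried, the regret type (index 0) is the only strict equilibrium.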
6 Conclusion
The assumption that players and decision makers maximize their preferences is central throughout the economic literature, and the maximization of actual payoffs is often justified by appeal to evolutionary arguments and natural selection. In contrast to the standard view, we showed the existence of player types with subjective utilities different from the actual payoffs that can outperform types whose subjective utilities equal the actual payoffs.
While the literature on the evolution of preferences has focused on fixed games, or fixed types of games, we have taken a more general approach here. We suggested that attention to “metagames” is interesting, because what may be a good subjective representation in one type of game (e.g., cooperative preferences in the Prisoner’s Dilemma class) need not be generally beneficial. In fact, it turns out that our altruistic and competitive preference types pale in comparison with regret types.
Taken together, we presented a variety of plausible circumstances in which evolutionary competition between choice principles on a larger class of games can favor subjective preference transformations focusing on regret.
References
Footnotes
^{1} Formally, this is the negative regret. We use this formulation because it is the most convenient in this context.
^{2} Lack of information might stem, for instance, from the fact that players have not played enough rounds to learn from experience and to form a precise probabilistic belief; alternatively, players may have specific probabilistic beliefs if they have already met and know the coplayer, but no single probabilistic belief when they meet a coplayer for the first time. Similarly, a player may have specific probabilistic beliefs when facing a game she has already played before, and imprecise beliefs otherwise.
^{3} This assumption of correlation of epistemic types can be motivated by the idea that some external circumstances of the game (or its context of presentation or occurrence) involuntarily trigger players into a particular epistemic type.