An efficient method for sorting and selecting for social behaviour
Mathematics Department, Uppsala University, Sweden
IT Department, Uppsala University, Sweden
Zoology Department, Stockholm University, Sweden
corresponding author :
alexander.szorkovszky@math.uu.se
Summary

In social contexts, animal behaviour is often studied in terms of group-level characteristics. One clear example of this is the collective motion of animals in decentralised structures, such as bird flocks and fish schools. A major goal of research is to identify how group-level behaviours are shaped by the traits of individuals within them. Few methods exist to make these connections. Individual assessment is often limited, forcing alternatives such as fitting agent-based models to experimental data.

We provide a systematic experimental method for sorting animals according to socially relevant traits, without assaying them or even tagging them individually. Instead, they are repeatedly subjected to behavioural assays in groups, between which the group memberships are rearranged, in order to test the effect of many different combinations of individuals on a group-level property or feature. We analyse this method using a general model for the group feature, and simulate a variety of specific cases to track how individuals are sorted in each case.

We find that in the case where the members of a group contribute equally to the group feature, the sorting procedure increases the between-group behavioural variation well above what is expected for groups randomly sampled from a population. For a wide class of group feature models, the individual phenotypes are efficiently sorted across the groups and thus become available for further analysis on how individual properties affect group behaviour. We also show that the experimental data can be used to estimate the individual-level repeatability of the underlying traits.

Our method allows experimenters to find repeatable variation in social behaviours that cannot be assessed in solitary individuals. Furthermore, experiments in animal behaviour often focus on comparisons between groups randomly sampled from a population. Increasing the behavioural variation between groups increases statistical power for testing whether a group feature is related to other properties of groups or to their phenotypic composition. Sorting according to socially relevant traits is also beneficial in artificial selection experiments, and for testing correlations with other traits. Overall, the method provides a useful tool to study how individual properties influence social behaviour.
Keywords: collective behaviour, personality, artificial selection, group composition
1 Introduction
Using a ruler and measuring scales, it is straightforward to assess the size and weight of an individual. And by observing or testing an individual’s behaviour in isolation a number of times, we can measure some of its behavioural traits, as is now common practice in behavioural ecology [1, 2, 3]. However, social behaviour, by definition, only occurs when two or more individuals interact. The extent to which individual traits can be measured in a social context depends critically on the social structure of the groups they live in [4]. This task is especially difficult in non-hierarchical groups that emerge from local interactions [5]. And while we may be able to identify differences in group-level behaviour, linking these to differences in individual phenotypes remains a significant challenge [6]. This makes it difficult to properly answer important questions in biology, such as how and why social behaviour has evolved.
Some experimental methods exist to quantify social behaviour in individuals. Sociability, for example, can be assessed as a repeatable trait in fish using partitioned aquaria [7]. Similarly, the latency to join a group can be used [8]. Individuals may also be assessed within groups if these groups have a consistent network structure or division of labour: for example, social networks based on proximity or dyadic interactions can be analysed to find the most gregarious or influential individuals [4, 9, 10, 11, 12]. In other cases, repeatable differences in spatial or temporal ordering can be used to quantify individuals in terms of leader-follower or proactive-reactive traits [13, 14, 15, 16]. In such experiments, individuals often need to be marked for identification, which can influence the expression of some behaviours [17]. Another potential issue is that the expression of an individual’s behaviour depends on which conspecifics are available to interact with it. This is particularly important when animal groups are artificially composed by researchers.
An alternative way to investigate social behaviour at the individual level is to see how different combinations of personality types, as tested in individuals, affect the behaviour of groups [18, 19, 7, 20, 21]. These equal-sized groups may be formed randomly, or they may be chosen to have particular phenotypic compositions. Group-level measurements may then be correlated against various descriptive statistics of the individual phenotypes comprising each group. For example, the exploratory tendency of feral guppy (Poecilia reticulata) shoals is more correlated with the lowest individual activity score and the highest sociability score than the means of either, indicating a disproportionate influence of minorities on a group’s behaviour [7].
One general limitation of this approach is that some behaviours are only expressed when particular social cues are available (for example, schooling or flocking responses) and hence cannot be measured in isolation, and may not be strongly correlated with personality. Another shortcoming is in the ad hoc methods of composing groups. Randomly creating groups sampled from a larger population generally does not maximise the between-group variation in behaviour, which leads to reduced statistical power in hypothesis tests. Randomly sampled groups are statistically expected to become more similar to each other as the group size increases. The alternative of hand-picking groups with particular compositions increases power, but generally restricts the experiment to testing a few hypotheses (e.g. whether bold, shy, or 50:50 mixed shoals of fish forage best [18]), out of several alternatives (e.g. shoals with one bold or one shy individual, and so on).
Here, we propose an efficient method to compose a number of animal groups that consistently behave differently to each other, such that repeatable individual differences in social behaviour can be quantified, without the need for individual labelling or assessment. Our method sorts individuals between groups, not according to their own behaviours or interactions, but on the basis of an arbitrary group-level property of interest, which we refer to here as a “group feature”. This feature should be captured by a single continuous variable. Although there is much current interest in the evolutionary implications of group composition [22], this feature does not need to be directly related to fitness. For example, groups may be sorted according to cohesion, as measured by average nearest-neighbour distance. Our method is designed so that the variance in this group feature increases over time, as long as there is repeatable variation in the underlying individual behaviour: for example, different individual tendencies to aggregate. The resulting groups, ranging from high-cohesion to low-cohesion, may then be subjected to further assays in order to determine how cohesion is related to other properties of the groups.
We first detail the sorting method in Section 2.1. In Section 2.2, we provide a model for individual social phenotypes, and a general model for how individual contributions affect the group feature. In Section 2.3 we provide measures to analyse the experimental results. In Section 3.1 we define a class of group feature models for which the phenotypes become sorted by group. In Section 3.2 we simulate the sorting procedure for four group feature models and analyse the data produced. We then show how to estimate the repeatability of the underlying individual traits from this data in Section 3.3. We conclude in Section 4 by discussing how the method can provide insight into general questions in ethology, behavioural ecology and collective behaviour.
2 Materials and methods
2.1 Sorting procedure
We assume there are $N$ animals. This method is to be conducted in a laboratory setting, using animals either sampled from the wild or from a laboratory colony. Here we consider individuals that cannot be classified by other means: for example by sex or age.
The sorting is initiated by randomly dividing the $N$ individuals into $M$ groups of $n$. Without loss of generality, we assume that $n$ and $M$ are both even numbers and $N = nM$ is chosen accordingly. Each round consists of three steps (see Fig. 1). Firstly, each group is subjected to a behavioural assay, from which a score is obtained for the group feature. Such group-level scores are often obtained from the analysis of tracked videos or GPS coordinates [23], averaging over a fixed period of time or until an objective is reached [24, 25]. Secondly, the groups are ranked from lowest to highest scoring. In the following analysis, the $k$th lowest score in round $t$ will be referred to as $s_k(t)$. Finally, adjacent groups in this ranking exchange half of their members, chosen at random. That is, the exchange is between the 1st and 2nd ranked, between the 3rd and 4th ranked, all the way up to the pair of groups ranked $M-1$ and $M$.
The only labelling required is of the locations where the groups are housed between assays. This presents an alternative to other combinatorial optimisation algorithms, such as for group testing problems and multi-armed bandits [26], which keep track of individuals as they are used in various combinations. In comparison, our algorithm is very straightforward to implement, and is also well-suited for animals that are difficult to distinguish or tag individually. The mixing stage makes it unlikely that the same exact combination of individuals appears twice, and allows the space of possible groups to be explored efficiently.
As a practical example, a selection experiment for social cohesion (i.e. small inter-individual distance) in fish would start by randomly placing, for instance, eight individuals each in 20 different tanks. These groups are then assessed by tracking the positions of individuals in each group for ten minutes, then calculating the respective average social cohesion scores, which are then ranked. In the next round, new groups are composed by swapping half of the individuals of each group with half of those in the group immediately adjacent to it (above or below) on the scoreboard. Here a good catching protocol is essential, because otherwise the most daring, proactive or least-stressed individuals will often be caught first [27], thereby biasing group composition in the next round. To further avoid experimenter bias, the person handling the animals should be unaware of the scores of the groups, and the groups should be tested in a random order. This procedure is repeated until a predefined criterion is reached, i.e. a certain number of rounds, or until the ranking of the groups remains stable.
A way of assessing whether the sorting has stabilised, independently of the distribution of group scores, is by comparing the rankings between rounds. The similarity of the rankings at round $t$ to those at round $t+1$ can be measured using the correlation between the two sets of rankings. This gives a measure of group-level repeatability of behaviour. An increase in this correlation indicates that groups are changing rank by a smaller amount, and hence that the mixing between pairs has a smaller effect on the group behaviour over time.
The mathematical analysis of the sorting procedure in this paper assumes a slight variation of the method presented above. In this version, on alternate rounds the mixing pairs are shifted by one group. That is, in these rounds mixing occurs between groups 2 and 3, groups 4 and 5, and so on. While this version is more mathematically tractable, results are not expected to differ from the experimentally simpler version presented above.
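The mixing step described in this section can be sketched in a few lines of code. The following Python fragment is only an illustration under stated assumptions: the function name `sorting_round`, the representation of groups as lists of individual identifiers, and the `shift` flag (which selects the alternate pairing used in the mathematical analysis) are hypothetical choices, not part of the published method.

```python
import random

def sorting_round(groups, scores, shift=False, rng=random):
    """Rank groups by score, then let adjacent-ranked pairs exchange half their members.

    groups: list of equal-sized lists of individual identifiers (even group size)
    scores: one group-feature score per group, in the same order as `groups`
    shift:  if True, pair ranks (2,3), (4,5), ... instead of (1,2), (3,4), ...
    Returns the new groups, listed from lowest to highest previous score.
    """
    order = sorted(range(len(groups)), key=lambda g: scores[g])
    ranked = [list(groups[g]) for g in order]  # copies, lowest score first
    half = len(ranked[0]) // 2
    start = 1 if shift else 0
    for a in range(start, len(ranked) - 1, 2):
        b = a + 1
        # Shuffle so that the exchanged half is chosen at random.
        rng.shuffle(ranked[a])
        rng.shuffle(ranked[b])
        ranked[a][:half], ranked[b][:half] = ranked[b][:half], ranked[a][:half]
    return ranked
```

In the shifted rounds the lowest- and highest-ranked groups have no partner and are left unchanged. Only group locations are tracked by the experimenter; the identifiers here exist solely so that a simulation can follow individuals.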
2.2 Social phenotypes and group composition
To evaluate the sorting algorithm, we assume that each individual has a phenotype, or combination of phenotypes, that determines its average contribution to the group feature when assayed, and that this contribution varies randomly between evaluations. For example, an individual contribution might be “attraction to conspecifics”, with the corresponding group feature being “cohesion”. Mathematically, each individual is labelled $i$ and has a trait value $x_i$. In this paper, we assume that the trait values are normally distributed with a standard deviation of one. An individual’s contribution to its group’s feature when evaluated at time $t$ is then a random variable $c_i(t)$ drawn from a normal distribution with mean $x_i$ and standard deviation $\sigma$. In this paper we will refer to $\sigma$ as within-individual variation. Due to the controlled experimental setup, the physical environment is assumed to be constant across individuals and across time. The social environment is presumed to change during sorting; however, its effect on individual behaviour is not directly accessible. We assume that the remaining source of variation is the same for all individuals. Although the random variation in behaviour may vary between individuals, even at the same ontogenetic stage, this assumption is parsimonious and commonly made in the statistical models that are used to estimate repeatability [2]. From these definitions, the individual repeatability can be written as [1]
(1) $R = \frac{1}{1 + \sigma^2}$
Therefore $\sigma \ll 1$ indicates highly repeatable behaviour at the individual level, while $\sigma \gg 1$ indicates low repeatability.
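As a numerical check of the repeatability definition, the sketch below assumes unit-variance trait values and a within-individual standard deviation called `sigma` here, so that the usual variance-ratio repeatability is 1/(1 + sigma^2). It compares this value with the correlation between two simulated measurements of each individual. The function names, sample size and seed are illustrative assumptions.

```python
import random
import statistics

def repeatability(sigma):
    """Variance-ratio repeatability when between-individual variance is one."""
    return 1.0 / (1.0 + sigma ** 2)

def simulated_repeatability(sigma, n_individuals=5000, seed=0):
    """Correlation between two repeated measurements of every individual."""
    rng = random.Random(seed)
    traits = [rng.gauss(0.0, 1.0) for _ in range(n_individuals)]
    first = [rng.gauss(x, sigma) for x in traits]
    second = [rng.gauss(x, sigma) for x in traits]
    m1, m2 = statistics.fmean(first), statistics.fmean(second)
    cov = sum((a - m1) * (b - m2) for a, b in zip(first, second))
    var1 = sum((a - m1) ** 2 for a in first)
    var2 = sum((b - m2) ** 2 for b in second)
    return cov / (var1 * var2) ** 0.5
```

For `sigma = 1`, the analytical value is 0.5, and the simulated correlation should lie close to it for a sample of this size.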
The second important part of the model is a function that describes the group feature in terms of the contributions of its members. This may describe both the effects of the social environment on the expression of social traits [28], as well as how these expressions are combined in the group feature. To keep track of group membership, each group has a label $j$, and we denote its set of $n$ members at time $t$ as $S_j(t)$. The group feature $G_j(t)$ for group $j$ can therefore be written as a mathematical function, using the contributions $c_i(t)$ for every individual $i$ in the set $S_j(t)$ as the inputs.
The sorting method does not assume any particular model for how individual contributions determine the group feature. To evaluate the method, we use some scenarios equivalent to those outlined in [22] in terms of “group phenotypic composition”. The simplest possibility is that the group feature is the sum or average of its parts. For example, the many wrongs hypothesis in group navigation [29] says that the directional accuracy of a flock is determined by the average navigational ability of its members. In this case, a group feature for group $j$ can be written as
(2) $G_j(t) = \bar{c}_j(t) = \frac{1}{n} \sum_{i \in S_j(t)} c_i(t)$
A similar averaging function also fits the observed group exploration of Argentine ants [20].
We may also consider a case where the median contribution drives the group feature
(3) $G_j(t) = \operatorname{median}_{i \in S_j(t)} c_i(t)$
This model is expected to behave qualitatively similarly to that given by Equation 2, but to be less sensitive to extreme phenotypes, and more sensitive to the within-individual variation of the intermediate members of groups.
Another possibility is that the group feature is largely influenced by an individual with the most extreme contribution, which plays either a leading or inhibiting role in the group. For example, the most cautious member of a group may produce social cues to make the entire group more cautious. This type of behaviour has been observed in fish, where bold individuals conform to shy conspecifics but not vice versa [30], as well as in group activity levels [7]. The limiting case of this model is the group depending entirely on a single “weakest link”, in which case
(4) $G_j(t) = \min_{i \in S_j(t)} c_i(t)$
Equivalently, the group could be limited by the ability of a single leader. Weighting in this direction has been observed in geese, where bolder individuals are more likely to make decisions [31]. In the limiting case the maximum contribution is the one of interest
(5) $G_j(t) = \max_{i \in S_j(t)} c_i(t)$
For the analysis considered here, the above two models are equivalent.
Greater heterogeneity within a group may also increase the group feature, such as in the presence of social niches [32]. For example, foraging species may benefit from both bold and shy individuals to promote exploration and group cohesion, respectively [18, 33]. Variation in aggression levels may also be beneficial by reducing conflict [19]. A simple function approximately capturing this scenario is the standard deviation
(6) $G_j(t) = \sqrt{\frac{1}{n} \sum_{i \in S_j(t)} \left( c_i(t) - \bar{c}_j(t) \right)^2}$
where $\bar{c}_j(t)$ is the mean as defined above in Equation 2. Conversely, in other scenarios, greater homogeneity within a group increases the group feature. This may occur when interactions with similar conspecifics are stronger or more frequent [34]. A well-known benefit of group homogeneity is the “confusion effect” as a defence against predation [35, 36]. If we then let the group feature be a measure of, for example, synchronisation, then the negative of the above formula can be used.
All of these scenarios may be considered as special cases of a general group feature function
(7) $G_j(t) = f\left( \{ c_i(t) \}_{i \in S_j(t)} \right)$
where $f$ is any function of the contributions.
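The group feature models of this section (mean, median, minimum, maximum, standard deviation and its negative) can be written as interchangeable functions of a group's list of contributions. The sketch below is illustrative: the dictionary `GROUP_FEATURES`, the model names and the use of the population (divide-by-n) standard deviation are assumptions, not a specification taken from the text.

```python
import statistics

# Each model maps the list of member contributions to a single group score.
GROUP_FEATURES = {
    "mean": statistics.fmean,
    "median": statistics.median,
    "min": min,                      # "weakest link"
    "max": max,                      # single leader
    "std": statistics.pstdev,        # heterogeneity (population form, an assumption)
    "homogeneity": lambda c: -statistics.pstdev(c),
}

def group_feature(contributions, model="mean"):
    """Evaluate the chosen group feature model on one group's contributions."""
    return GROUP_FEATURES[model](contributions)
```

Keeping the models behind one interface makes it easy to rerun the same sorting simulation under different assumptions about how individuals combine.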
2.3 Simulation and measurements
We evaluate the sorting procedure based on simulations for a fixed total number of individuals, divided into 10, 16 or 20 groups. Initial simulations were run with $\sigma = 1$, such that the within-individual variation was equal to between-individual variation. This corresponds to a repeatability of $R = 0.5$, which is close to the median value reported in the literature [1].
Simulations and subsequent analysis were run in MATLAB. Simulations were run for 50 rounds, and statistics were built up by running each simulation 5000 times. Each round $t$ produced ranked scores of the $M$ groups, from $s_1(t)$ up to $s_M(t)$. These were used to quantify the between-group variation, given simply by the standard deviation of group scores,
(8) $V(t) = \sqrt{\frac{1}{M} \sum_{k=1}^{M} \left( s_k(t) - \bar{s}(t) \right)^2}$
where $\bar{s}(t)$ is the mean group score in round $t$. The gain is defined by the ratio of this quantity to that in the first round
(9) $g(t) = \frac{V(t)}{V(1)}$
This measures how much the between-group variation is increased compared to the initial random selection of groups. The group scores were transformed using the known distributions for random samples, so that the initial distributions of group scores were normal.
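The between-group variation and gain can be computed directly from the group scores recorded in each round. The following sketch assumes the scores are stored as one list per round and uses the population standard deviation; both the data layout and the function names are illustrative assumptions. The choice of normalisation cancels in the gain, since it is a ratio.

```python
import statistics

def between_group_variation(scores):
    """Standard deviation of the group scores in one round."""
    return statistics.pstdev(scores)

def gain(scores_by_round):
    """Between-group variation in every round relative to the first round."""
    baseline = between_group_variation(scores_by_round[0])
    return [between_group_variation(s) / baseline for s in scores_by_round]
```

For example, `gain([[0, 2], [0, 4]])` returns `[1.0, 2.0]`, since the spread of scores doubles between the two rounds.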
A complementary set of useful data, less dependent on the overall distribution of phenotypes, is the sequence of changes in group rank. The group ranked $k$ in round $t$, after swapping half of its members and being assayed again, has a new rank in round $t+1$ given by $\pi_t(k)$. For each round of sorting, a between-rounds correlation can be calculated to quantify the stability of the rankings in that round, and thus track the sorting progress. We use the Spearman rank correlation over the $M$ groups, given by
(10) $\rho(t) = 1 - \frac{6 \sum_{k=1}^{M} \left( k - \pi_t(k) \right)^2}{M (M^2 - 1)}$
which takes a maximum possible value of one if the rankings are unchanged. The expected value of $\rho(t)$ is zero if the rankings change randomly (i.e. $\pi_t(k)$ is entirely independent of $k$).
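For rankings without ties, the Spearman rank correlation between consecutive rounds reduces to the standard closed form based on squared rank differences. The sketch below assumes the new ranks are supplied as a list indexed by the previous rank; the function name and this data layout are illustrative.

```python
def spearman_between_rounds(new_ranks):
    """Spearman correlation between old ranks 1..M and the new ranks.

    new_ranks[k - 1] is the new rank (1..M, no ties) of the group that was
    ranked k in the previous round.
    """
    m = len(new_ranks)
    d2 = sum((k - r) ** 2 for k, r in enumerate(new_ranks, start=1))
    return 1 - 6 * d2 / (m * (m ** 2 - 1))
```

Unchanged rankings give a correlation of one, and a fully reversed ranking gives minus one.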
Using a Markovian approximation that the distribution of the between-rounds correlation $\rho(t)$ is independent of $\rho(t-1)$ for all rounds, a likelihood function can be calculated for the data
(11) $L(\sigma) = \prod_{t} P_t\left( \rho(t) \mid \sigma \right)$
where $P_t(\rho \mid \sigma)$ is the likelihood of the correlation in round $t$ being $\rho$, based on histograms obtained from performing several simulations with within-individual variation parameter $\sigma$.
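A histogram-based likelihood of this kind can be sketched as follows. The code assumes that, for each candidate value of the within-individual variation, simulated correlations are available for every round; the bin count, the add-one smoothing and all function names are assumptions made for illustration, not details taken from the original analysis.

```python
import math

N_BINS = 20  # histogram resolution over the correlation range [-1, 1] (assumption)

def bin_index(rho):
    """Index of the histogram bin containing a correlation value."""
    return min(int((rho + 1.0) / 2.0 * N_BINS), N_BINS - 1)

def round_likelihoods(simulated_rhos):
    """Smoothed histogram (probability per bin) of simulated correlations for one round."""
    counts = [1] * N_BINS  # add-one smoothing, an assumption to avoid log(0)
    for rho in simulated_rhos:
        counts[bin_index(rho)] += 1
    total = sum(counts)
    return [c / total for c in counts]

def log_likelihood(observed, simulated_by_round):
    """Sum of per-round log-likelihoods, treating rounds as independent."""
    return sum(
        math.log(round_likelihoods(sims)[bin_index(rho)])
        for rho, sims in zip(observed, simulated_by_round)
    )

def estimate_sigma(observed, simulations):
    """Candidate sigma whose simulated histograms best explain the observed correlations.

    simulations: dict mapping each candidate sigma to a list (one entry per round)
    of simulated correlation values.
    """
    return max(simulations, key=lambda s: log_likelihood(observed, simulations[s]))
```

Working with log-likelihoods avoids numerical underflow when the product runs over many rounds.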
3 Results
3.1 Sortability for monotonically increasing group feature
Before we look at simulations of our sorting algorithm, we prove that the algorithm will be effective for a wide class of group feature functions (equation 7). Specifically, many biologically realistic group features are such that replacing a member of a group with one of a higher trait value will not decrease the expected group feature. Mathematically, this means the group feature function is weakly monotonic, or non-decreasing in the individual contributions. These functions include the mean (equation 2) and other weighted averages, as well as the minimum (equation 4), median (equation 3) and maximum (equation 5), and other functions based on quantiles or thresholds. This class does not include cases where the group feature depends on homogeneity, as in equation 6.
To make our analysis independent of the distribution of trait values, we use the global ranks of the individual trait values. Let the global ranks of the $n$ individuals in the group ranked $k$ in round $t$ be $r_{k,1}(t)$ through $r_{k,n}(t)$ in ascending order, and let
(12) $Q_k(t) = \sum_{m=1}^{n} r_{k,m}(t)$
be the sum of ranks in group $k$ in round $t$. By our definition above, the group scores $s_k(t)$ are increasing with $k$. As the group scores diverge over time, we expect the rank sums to be similarly diverging for monotonic functions $f$. In the Supporting Information we prove that this is the case for weakly monotonic group feature functions.
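In a simulation, where the hidden trait values are known, the rank sums can be computed as below. This is a minimal sketch: the function name `rank_sums` and the representation of trait values as a dictionary keyed by individual identifier are illustrative assumptions, and ties in trait value are not handled.

```python
def rank_sums(groups, trait):
    """Sum of global trait ranks within each group.

    groups: list of lists of individual identifiers
    trait:  dict mapping each identifier to its (hidden) trait value
    """
    ordered = sorted(trait, key=trait.get)  # identifiers from lowest to highest trait
    global_rank = {ind: r for r, ind in enumerate(ordered, start=1)}
    return [sum(global_rank[i] for i in g) for g in groups]
```

Tracking how these sums spread apart across the ranked groups gives a distribution-free view of how well the individuals have been sorted.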
The theorem implies that over time the sorting algorithm increases the within-group homogeneity and between-group diversity in terms of the individual phenotypes. If the monotonicity condition does not hold, then it is not guaranteed that within-group variation of phenotypes will decrease, even if between-group variation increases.
3.2 Simulation results
The changes in group feature scores over time are shown in Fig. 2(a-d) for typical simulations of four group feature models, based on the mean (equation 2), the median (equation 3), the max (equation 5) and the standard deviation (equation 6) of the phenotypes. The individuals with the overall most extreme and median hidden phenotypes are tracked as they move between groups, indicating how they are sorted over time. Panels (e-h) show the corresponding between-round correlations in group ranks given by equation 10, and panels (i-l) show the average gain in between-group variation given by equation 9. Both of these measures, even for the non-monotonic, ‘standard deviation’ model, increase up to an equilibrium value. The gain in between-group variation, as well as the maximum between-round correlation, are largest for the ‘mean’ model.
Both the global statistics and the sorting of the hidden phenotypes depend on the group feature model. When the group feature is proportional to the mean of contributions (Figure 2(a,e,i)), the initial between-group standard deviation is approximately $\sqrt{(1 + \sigma^2)/n}$ for randomly composed groups of size $n$, since each contribution has variance $1 + \sigma^2$, where $\sigma$ is the within-individual variation. Over time, individuals will tend to become grouped according to the underlying ranks of their phenotypes. For fully sorted groups, the between-group variation should approach the between-individual variance of 1. For all three group sizes the standard deviation of the group feature increases by about 33% after 25 rounds. For the ‘median’ model (Figure 2(b,f,j)), the dynamics are similar, but less efficient in increasing the between-group gain and the between-round correlation.
For the ‘maximum’ group feature model, shown in Figure 2(c,g,k), the highest scoring group always contains the individual with the highest contribution in that round, and over time the individuals with highest trait values become clustered together in this group. Under this scenario, the process is highly sensitive to within-individual variation, which slows down the overall sorting procedure. While the individuals with low trait values are not well correlated with the groups they are associated with, those individuals with high trait values (i.e. those that dominate group behaviour) are sorted together in higher-ranked groups. While not shown in Figure 2, the opposite case of minimum-based or “weakest link” limited groups (equation 4) is identical to this, only with all group features and individual trait values changing sign. In this case, the individuals with the lowest trait values become sorted into the lower-ranked groups.
In the ‘mean’, ‘median’, ‘min’ and ‘max’ models, the individuals become approximately sorted according to their phenotypes, as predicted by the result in Section 3.1. When the group feature increases with heterogeneity in the group, i.e. the group feature is proportional to the standard deviation, this is no longer the case (Figure 2 (d,h,l)). While between-group variation still increases, this is due to both tails of the trait distribution becoming mixed in the same groups.
Keeping the total number of individuals constant and using a different group size has little effect on the between-group variation over 25 rounds (Figure 2 (i-l)). In all cases, the group size affects the gain by less than 5%. For all models and group sizes, the majority of the increase in between-group variation occurs within the first 15 rounds.
3.3 Estimation of individual-level repeatability
The stability of the group rankings between rounds depends on the number of rounds of sorting that have passed, on the model for the group feature, and on the individual repeatability (controlled by the within-individual variation $\sigma$). Figure 2(e-h) shows how the between-rounds correlation depends on time and on the group feature for a constant repeatability. Figure 3 (a) shows how the typical correlation close to equilibrium (25 rounds) increases with repeatability for each model. This is because in all cases, more predictable individuals lead to more predictable groups.
Using the between-rounds correlation as a statistic for each round, a maximum likelihood method can be used to estimate the repeatability. For every model and for every combination of parameters, 1000 simulations were run and histograms were compiled to represent the likelihood of a correlation for each round (i.e. the per-round likelihood terms in Equation 11). Figure 3 (b) shows the result of a single simulated experiment which is fit against these models. In the experiment, the individual repeatability was fixed at a known value and the group function was the mean. Fitting against the mean model, the estimate obtained by maximum likelihood is close to the correct value. The alternative ‘maximum’, ‘standard deviation’ and ‘median’ models produce a similar likelihood, but estimate a much higher repeatability.
The ‘mean’ model, out of all considered, gives the most conservative estimate for the repeatability. Running multiple experiments, the uncertainty of this estimation can be quantified by the standard deviation of the estimates, shown by Figure 3 (c). The uncertainty decreases with the number of rounds as more data is collected, with greater uncertainty for larger groups. In all cases, the uncertainty of the conservative estimate is reduced below within 15 rounds. This uncertainty has only a slight dependency on the underlying repeatability and model, as shown in Supporting Information.
4 Discussion
We have proposed a simple and general method for increasing variation in behaviour at the group level, and for studying the individual basis of socially relevant behaviours. Whatever the dependence between group features and individual contributions, the phenotypic composition of the top and bottom groups naturally approaches the optimum compositions for extreme group features. This avoids the need to exhaustively test many different compositions, which for larger groups is only feasible with agent-based simulations [33, 10]. For a wide class of group feature functions, the individuals become increasingly sorted according to their trait values. This result is robust to the distribution of trait values and, as the sorting is driven by these expected phenotypes, we expect it to also be robust to between-individual variation in plasticity. For the ‘mean’ model, where all individuals contribute equally to the group feature, this sorting results in a substantial increase in the between-group variation.
Our approach is especially useful for animals that aggregate in self-organised groups such as fish shoals, since it does not require individual labelling or tracking behaviour of individuals within groups. It is in such self-organised groups that differences between individuals are the most obscured. Only a small number of sorting rounds (10-20) is required to achieve groups that are close to maximally stable for all models considered here. For even relatively small groups () the total number of assays required is comparable to the number required to separately assay each individual twice, as is commonly done when estimating repeatability [1]. However, attention to welfare issues may be required, as individuals are subject to more handling by this method. We now highlight five areas of application where we believe our method can be particularly useful.
Increased between-group variation enables more statistical power when investigating the link between group behaviour and group phenotypic composition [18, 19, 31, 7, 20, 21]. This may be done after sorting, also without labelling, by assaying the individuals from each group. For instance, the members of the top group may be split into arenas for assaying, followed by the next group, and so on. Regression or correlations can then reveal how certain summary statistics for the assays (for example, the mean or standard deviation) are dependent on the group ranking. In principle, information about the ideal group composition can also be gleaned from the sorting data, using a more detailed model selection than is shown here. However, we expect that it may be difficult to distinguish the ‘maximum’ and ‘standard deviation’ models, for example, which evolve in a similar way during sorting. In addition, some measures, such as the gain in between-group variation, may not be reliable test statistics if habituation to the assay is to be expected.
Our method can be used to confirm and quantify consistent individual-level differences in socially relevant behaviour. This has the potential to add several behaviours, which are only expressed in groups, to the behaviours in which repeatable variation is known to exist [1]. If the repeatability is very low, the group rankings are expected to be random every round. An increase in the between-rounds correlation of the group rankings over time may be used as evidence of repeatable social behaviour. Without knowledge of the correct group feature model, the ‘mean’ model provides a conservative estimate of this repeatability. If the group feature model can be inferred, for example using the method above, a more exact estimate of the repeatability can be made.
Investigating the effects of group composition can also be used to test mechanistic models of collective behaviour. The basic parameters of these models, such as the size and weighting of alignment and attraction zones, are often approximated via detailed analysis of micro-level interactions over time [37, 38, 39, 40]. These models often further predict certain relationships between group phenotypic composition and macroscopic group-level behaviour [41, 42], but these individual differences are hard to identify in large groups. By comparing the ‘rules of interaction’ in groups with different features, it can be seen what rules are covarying with the group feature and with each other, which may provide insight into the perceptual basis of collective motion [43, 23].
Identification of other traits that are correlated with social behaviour has also been difficult due to the earlier identified problems with detecting individual qualities in a collective setting. Our method now allows for high-throughput analysis of individual aspects of social behaviour and sorting of individuals for later analysis of, for example, morphological traits [44, 45] and life-history traits [46]. Correlations with other behavioural traits may also be used to investigate behavioural syndromes [47]. As we have shown, the individual phenotypes can be sorted as long as the group feature is a monotonic function of all member contributions.
Our method also provides a basis for the experimental design of artificial selection experiments on generic social behaviours. This opens up a new experimental route to studying the evolvability and genetic basis of social behaviour [48]. Artificial selection can be performed by breeding individuals from a top and/or bottom quantile of the sorted groups. If the actual dependence of the group feature on individual contributions is monotonic (as in the mean, median, minimum or maximum cases), this should result in directional selection. In the case where the group feature increases with the heterogeneity of the group (as in equation 6), the top and bottom trait values will be mixed together in the groups with highest feature levels. The lowest-scoring groups in this case can be used for artificial stabilising selection.
Our analysis can be extended to the sorting of multiple traits with different influences on group-level behaviour. If the group feature is expected to be a linear combination of functions on independent traits (e.g. if it increases with the mean of phenotypes of trait X and decreases with the variance of phenotypes of trait Y in the group) then the sorting is expected to operate on both traits independently, with relative efficiency depending on the relative repeatability of traits X and Y. However, if the group feature depends on an interaction between phenotypes of X and Y across individuals (e.g. if it is the product of the two means of the phenotypes of X and Y) then we expect results to differ from what is presented here, since multiple distinct combinations of phenotypes may result in the same group feature. This is a fruitful topic for further study.
Various other extensions can be made to the models and analysis presented in this paper. A more detailed model selection can be done using the full set of rank transitions, rather than just the correlation of rankings between rounds. Moreover, since our method is quite general, the analysis we have presented can be extended to include other group feature models. A time dependence may be added to the modelled group behaviour in order to account for habituation or changes in the physical environment. A time dependence could also be used to model social learning [49, 50], although the randomised exchange of group members in each round is expected to mitigate this effect.
To conclude, we have developed a general and efficient experimental method for studying the individual basis of group-level behaviour. Not only does this algorithm produce data useful for inference, but it also lends itself to many otherwise elusive experiments in collective behaviour, making it possible to bridge the empirical gap between individual properties and collective outcomes.
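The contrast between the two cases can be made concrete with a small sketch. The function names, the unit weights `a` and `b`, and the trait values below are illustrative assumptions, not quantities from our simulations.

```python
import statistics

def additive_feature(xs, ys, a=1.0, b=1.0):
    """Linear combination: rises with mean of X, falls with variance of Y."""
    return a * statistics.mean(xs) - b * statistics.pvariance(ys)

def interaction_feature(xs, ys):
    """Interaction: product of the group means of X and Y."""
    return statistics.mean(xs) * statistics.mean(ys)

# Three hypothetical groups of two individuals, given as (X values, Y values).
group_a = ([1.0, 1.0], [3.0, 5.0])  # mean X = 1, mean Y = 4, var Y = 1
group_b = ([2.0, 2.0], [2.0, 2.0])  # mean X = 2, mean Y = 2, var Y = 0
group_c = ([4.0, 4.0], [0.0, 2.0])  # mean X = 4, mean Y = 1, var Y = 1

additive = [additive_feature(x, y) for x, y in (group_a, group_b, group_c)]
product = [interaction_feature(x, y) for x, y in (group_a, group_b, group_c)]

print("additive feature:", additive)
print("interaction feature:", product)
```

In this toy example the additive feature separates the three groups, whereas the product of means gives all three the same score, so ranking groups by the product cannot distinguish their compositions.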
Acknowledgements. AS would like to thank Oliver Johnson and Melinda Babits for helpful suggestions. The authors would also like to thank multiple referees of previous versions of this manuscript for substantial suggestions for improvement. This work was funded by The Knut and Alice Wallenberg Foundations grant 102 .
Data Accessibility. MATLAB code used for simulations and for maximum likelihood inference is available in Supporting Information.
References
 [1] Alison M Bell, Shala J Hankison, and Kate L Laskowski. The repeatability of behaviour: a meta-analysis. Animal Behaviour, 77(4):771–783, 2009.
 [2] Niels J Dingemanse, Anahita JN Kazem, Denis Réale, and Jonathan Wright. Behavioural reaction norms: animal personality meets individual plasticity. Trends in Ecology & Evolution, 25(2):81–89, 2010.
 [3] Matthew E Wolak, Daphne J Fairbairn, and Yale R Paulsen. Guidelines for estimating repeatability. Methods in Ecology and Evolution, 3(1):129–137, 2012.
 [4] Hal Whitehead. Analyzing animal societies: quantitative methods for vertebrate social analysis. University of Chicago Press, 2008.
 [5] David JT Sumpter. Collective animal behavior. Princeton University Press, 2010.
 [6] James E Herbert-Read, S Krause, LJ Morrell, TM Schaerf, J Krause, and AJW Ward. The role of individuality in collective group movement. Proceedings of the Royal Society of London B: Biological Sciences, 280(1752):20122564, 2013.
 [7] Culum Brown and Eleanor Irving. Individual personality traits influence group exploration in a feral guppy population. Behavioral Ecology, 25(1):95–101, 2014.
 [8] Denis Réale, Simon M Reader, Daniel Sol, Peter T McDougall, and Niels J Dingemanse. Integrating animal temperament within ecology and evolution. Biological Reviews, 82(2):291–318, 2007.
 [9] Noa Pinter-Wollman, Elizabeth A Hobson, Jennifer E Smith, Andrew J Edelman, Daizaburo Shizuka, Shermin De Silva, James S Waters, Steven D Prager, Takao Sasaki, George Wittemyer, et al. The dynamics of animal social networks: analytical, conceptual, and theoretical advances. Behavioral Ecology, page art047, 2013.
 [10] LM Aplin, JA Firth, DR Farine, B Voelkl, RA Crates, A Culina, CJ Garroway, CA Hinde, LR Kidd, I Psorakis, et al. Consistent individual differences in the social phenotypes of wild great tits, Parus major. Animal Behaviour, 108:117–127, 2015.
 [11] Orr Spiegel, Stephan T Leu, Andrew Sih, and C Michael Bull. Socially interacting or indifferent neighbours? randomization of movement paths to tease apart social preference and spatial constraints. Methods in Ecology and Evolution, 2016.
 [12] Stefan Krause, Alexander DM Wilson, Indar W Ramnarine, James E Herbert-Read, Romain JG Clément, and Jens Krause. Guppies occupy consistent positions in social networks: mechanisms and consequences. Behavioral Ecology, page arw177, 2016.
 [13] Alicia LJ Burns, James E Herbert-Read, Lesley J Morrell, and Ashley JW Ward. Consistency of leadership in shoals of mosquitofish (Gambusia holbrooki) in novel and in familiar environments. PLoS One, 7(5):e36567, 2012.
 [14] Benjamin Pettit, Andrea Perna, Dora Biro, and David JT Sumpter. Interaction rules underlying group decisions in homing pigeons. Journal of The Royal Society Interface, 10(89):20130529, 2013.
 [15] Benjamin Pettit, Zsuzsa Ákos, Tamás Vicsek, and Dora Biro. Speed determines leadership and leadership determines learning during pigeon flocking. Current Biology, 25(23):3132–3137, 2015.
 [16] Lucy M Aplin, Damien R Farine, Richard P Mann, and Ben C Sheldon. Individual-level personality influences social foraging and collective behaviour in wild birds. Proceedings of the Royal Society B: Biological Sciences, 281(1789), 2014.
 [17] Dennis L Murray and Mark R Fuller. A critical review of the effects of marking on the biology of vertebrates. In Research techniques in animal ecology: controversies and consequences, pages 15–64. Columbia University Press, New York, NY, 2000.
 [18] John RG Dyer, Darren P Croft, Lesley J Morrell, and Jens Krause. Shoal composition determines foraging success in the guppy. Behavioral Ecology, 20(1):165–171, 2009.
 [19] Jonathan N Pruitt and Susan E Riechert. How within-group behavioural variation and task efficiency enhance fitness in a social group. Proceedings of the Royal Society of London B: Biological Sciences, 278(1709):1209–1215, 2010.
 [20] Ashley Hui and Noa Pinter-Wollman. Individual variation in exploratory behaviour improves speed and accuracy of collective nest selection by Argentine ants. Animal Behaviour, 93:261–266, 2014.
 [21] Isaac Planas-Sitjà, Jean-Louis Deneubourg, Céline Gibon, and Grégory Sempo. Group personality during collective decision-making: a multilevel approach. Proceedings of the Royal Society of London B: Biological Sciences, 282(1802), 2015.
 [22] Damien R Farine, Pierre-Olivier Montiglio, and Orr Spiegel. From individuals to groups and back: The evolutionary implications of group phenotypic composition. Trends in Ecology & Evolution, 30(10):609–621, 2015.
 [23] J. E. Herbert-Read. Understanding how animal groups achieve coordinated movement. Journal of Experimental Biology, 219(19):2971–2983, 2016.
 [24] Dora Biro, David JT Sumpter, Jessica Meade, and Tim Guilford. From compromise to leadership in pigeon homing. Current Biology, 16(21):2123–2128, 2006.
 [25] Christos C Ioannou, Manvir Singh, and Iain D Couzin. Potential leaders trade off goal-oriented and socially oriented behavior in mobile animal groups. The American Naturalist, 186(2):284–293, 2015.
 [26] Christos H Papadimitriou and Kenneth Steiglitz. Combinatorial optimization: algorithms and complexity. Courier Corporation, 1998.
 [27] Peter A Biro and John R Post. Rapid depletion of genotypes with fast growth and bold personality traits from harvested fish populations. Proceedings of the National Academy of Sciences, 105(8):2919–2922, 2008.
 [28] Mike M Webster and Ashley JW Ward. Personality and social context. Biological Reviews, 86(4):759–773, 2011.
 [29] Andrew M Simons. Many wrongs: the advantage of group navigation. Trends in Ecology & Evolution, 19(9):453–455, 2004.
 [30] Ashley J Frost, Alexandria Winrow-Giffen, Paul J Ashley, and Lynne U Sneddon. Plasticity in animal personality traits: does prior experience alter the degree of boldness? Proceedings of the Royal Society of London B: Biological Sciences, 274(1608):333–339, 2007.
 [31] Ralf HJM Kurvers, Vena MAP Adamczyk, Sipke E van Wieren, and Herbert HT Prins. The effect of boldness on decision-making in barnacle geese is group-size-dependent. Proceedings of the Royal Society of London B: Biological Sciences, 278(1714):2018–2024, 2011.
 [32] Ralph Bergmüller and Michael Taborsky. Animal personality due to social niche specialisation. Trends in Ecology & Evolution, 25(9):504–511, 2010.
 [33] Pablo Michelena, Raphaël Jeanson, Jean-Louis Deneubourg, and Angela M Sibbald. Personality and collective decision-making in foraging herbivores. Proceedings of the Royal Society of London B: Biological Sciences, 277(1684):1093–1099, 2010.
 [34] DP Croft, R James, AJW Ward, MS Botham, D Mawdsley, and J Krause. Assortative interactions and social networks in fish. Oecologia, 143(2):211–219, 2005.
 [35] Manfred Milinski. A predator’s costs of overcoming the confusion-effect of swarming prey. Animal Behaviour, 32(4):1157–1162, 1984.
 [36] Laurie Landeau and John Terborgh. Oddity and the “confusion effect” in predation. Animal Behaviour, 34(5):1372–1380, 1986.
 [37] David JT Sumpter, Richard P Mann, and Andrea Perna. The modelling cycle for collective animal behaviour. Interface Focus, 2(6):764–773, 2012.
 [38] Andrea Cavagna, Alessio Cimarelli, Irene Giardina, Giorgio Parisi, Raffaele Santagati, Fabio Stefanini, and Massimiliano Viale. Scale-free correlations in starling flocks. Proceedings of the National Academy of Sciences, 107(26):11865–11870, 2010.
 [39] RP Mann, James E Herbert-Read, Q Ma, LA Jordan, David JT Sumpter, and AJW Ward. A model comparison reveals dynamic social information drives the movements of humbug damselfish (Dascyllus aruanus). Journal of the Royal Society Interface, 11(90):20130794, 2014.
 [40] James E Herbert-Read, Andrea Perna, Richard P Mann, Timothy M Schaerf, David JT Sumpter, and Ashley JW Ward. Inferring the rules of interaction of shoaling fish. Proceedings of the National Academy of Sciences, 108(46):18726–18731, 2011.
 [41] Iain D Couzin, Jens Krause, Richard James, Graeme D Ruxton, and Nigel R Franks. Collective memory and spatial sorting in animal groups. Journal of Theoretical Biology, 218(1):1–11, 2002.
 [42] Larissa Conradt and Timothy J Roper. Consensus decision making in animals. Trends in Ecology & Evolution, 20(8):449–456, 2005.
 [43] Ariana Strandburg-Peshkin, Colin R Twomey, Nikolai WF Bode, Albert B Kao, Yael Katz, Christos C Ioannou, Sara B Rosenthal, Colin J Torney, Hai Shan Wu, Simon A Levin, et al. Visual sensory networks and effective information transfer in animal groups. Current Biology, 23(17):R709–R711, 2013.
 [44] Linda Partridge and Kevin Fowler. Responses and correlated responses to artificial selection on thorax length in Drosophila melanogaster. Evolution, pages 213–226, 1993.
 [45] Alexander Kotrschal, Björn Rogell, Andreas Bundsen, Beatrice Svensson, Susanne Zajitschek, Ioana Brännström, Simone Immler, Alexei A Maklakov, and Niclas Kolm. Artificial selection on relative brain size in the guppy reveals costs and benefits of evolving a larger brain. Current Biology, 23(2):168–171, 2013.
 [46] John Hunt, Michael D Jennions, Nicolle Spyrou, and Robert Brooks. Artificial selection on male longevity influences age-dependent reproductive effort in the black field cricket Teleogryllus commodus. The American Naturalist, 168(3):E72–E86, 2006.
 [47] Andrew Sih, Julien Cote, Mara Evans, Sean Fogarty, and Jonathan Pruitt. Ecological implications of behavioural syndromes. Ecology Letters, 15(3):278–289, 2012.
 [48] D.S. Falconer and T.F.C. Mackay. Introduction to Quantitative Genetics. Longman, 1996.
 [49] Sian W Griffiths. Learned recognition of conspecifics by fishes. Fish and Fisheries, 4(3):256–268, 2003.
 [50] Darren P Croft, Jens Krause, and Richard James. Social networks in the guppy (Poecilia reticulata). Proceedings of the Royal Society of London B: Biological Sciences, 271(Suppl 6):S516–S519, 2004.
Supplementary Information
A: Proof of sortability result
We consider $N$ individuals being sorted in $M$ groups of $n$. The hidden individual trait values are constant in time, and are written in increasing order as $x_1 \le x_2 \le \dots \le x_N$.
Let the global ranks in group $i$ in round $t$ be $r^t_{i,1}$ through $r^t_{i,n}$ in ascending order.
The groups in each round are sorted by their group score
(13) $s^t_i = f\left(x_{r^t_{i,1}}, \dots, x_{r^t_{i,n}}\right)$
such that $s^t_i \le s^t_j$ for all $i < j$ and $t$.
In odd rounds, mixing occurs between groups $2k-1$ and $2k$ for all $k$, while in even rounds, mixing occurs between groups $2k$ and $2k+1$ for all $k$. This is assumed to be done by combining into a group of size $2n$ then randomly partitioning back into groups of size $n$.
Let
(14) $R^t_i = \sum_{j=1}^{n} r^t_{i,j}$
be the sum of ranks in group $i$ in round $t$. We expect these to be increasing with $i$, along with $s^t_i$, if the groups are well sorted. Let
(15) $p^t_{i,j} = P\left(R^t_i > R^t_j\right), \quad i < j$
be the probability that groups $i$ and $j$ are in the wrong order in terms of the rank sum.
Proposition 1: When there is mixing of groups and and there is no rearranging between pairs of groups, the sum is conserved.
Proposition 2: Furthermore, if f is monotonic we know and are both monotonically increasing functions of .
Proposition 3: When f is monotonic
(16) 
If and are random partitions of the same set at time t, this implies the upper bound
(17) 
This therefore applies to all groups at i.e.
(18) 
since the starting groups are randomly sampled from the population. This observation also applies to each pair of groups upon mixing.
Theorem 1: If f is nontrivial and nondecreasing on all arguments, then
(19) 
for some and therefore on average the groups become more homogeneous over time.
Proof: Consider four groups. At round , groups 2 and 3 are mixed. We know from proposition 3 that and . If the pair is anomalous at time , that is, (which has probability ) invoking proposition 1 leaves us with
(20) 
and
(21) 
In round , groups 1 and 2 are mixed, and groups 3 and 4 are mixed. In this anomalous case, the sum is expected to decrease, and the sum is expected to increase, since and were unchanged. Using proposition 2, we know that is expected to decrease, and is expected to increase. Therefore
(22) 
In the other case at , there is no expected change to and .
(23) 
By the law of total probability, when is nonzero, it must decrease by a nonzero amount every two rounds. This proof is easily expanded to groups.
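The procedure analysed in this section can be sketched in a few lines of code. The sketch below is an illustration under stated assumptions, not the MATLAB code from the Supporting Information: individuals are represented directly by their global ranks, the group feature is taken to be the noise-free mean of ranks (i.e. perfect repeatability), and the pairing offsets for odd and even rounds follow the order used in the proof above. The names `mix`, `sorting_round` and `run_sorting` are hypothetical.

```python
import random
import statistics

def mix(a, b, rng):
    """Pool two groups and randomly partition them back into two groups."""
    pool = a + b
    rng.shuffle(pool)
    n = len(a)
    return pool[:n], pool[n:]

def sorting_round(groups, feature, round_no, rng):
    """Sort groups by their feature, then mix adjacent pairs.

    Odd rounds pair groups (1,2), (3,4), ...; even rounds pair (2,3), (4,5), ...
    (1-based group labels; this offset convention is an assumption).
    """
    groups = sorted(groups, key=feature)
    start = 0 if round_no % 2 == 1 else 1
    for i in range(start, len(groups) - 1, 2):
        groups[i], groups[i + 1] = mix(groups[i], groups[i + 1], rng)
    return groups

def run_sorting(n_groups, group_size, n_rounds, seed=0):
    """Run the procedure on randomly assigned ranks; return groups sorted by mean."""
    rng = random.Random(seed)
    ranks = list(range(1, n_groups * group_size + 1))
    rng.shuffle(ranks)
    groups = [ranks[i * group_size:(i + 1) * group_size] for i in range(n_groups)]
    for round_no in range(1, n_rounds + 1):
        groups = sorting_round(groups, statistics.mean, round_no, rng)
    return sorted(groups, key=statistics.mean)

if __name__ == "__main__":
    final = run_sorting(n_groups=10, group_size=4, n_rounds=50, seed=1)
    for g in final:
        print(sum(g), sorted(g))  # rank sum and members of each group
```

Because mixing only redistributes the members of a pair, the combined rank sum of the two mixed groups is unchanged by `mix`, which is the conservation property used in the proof. Printing the rank sums after a number of rounds gives a quick check of how far the groups have separated for a given seed.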
B: Repeatability and sorting progress
To investigate the effect of changing repeatability on the speed of sorting, we ran simulations for , and , corresponding to repeatability values of , and respectively. For lower values of , the equilibrium between-rounds correlation is lower, but is reached in a shorter time compared to larger . As more data is gathered with each round, the uncertainty in the conservative estimate of decreases by a similar amount regardless of the underlying chosen model and repeatability. For the ‘maximum’ and ‘standard deviation’ models with low repeatability, the uncertainty in the estimate is decreased. This may be an artifact due to the estimated values being close to zero and hence close to the boundary of possible repeatability values.
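For reference, repeatability is conventionally defined as the proportion of total phenotypic variance that is due to differences between individuals [1, 3]. A minimal sketch of this relation is given below. The unit between-individual variance and the noise levels are hypothetical values chosen for illustration and are not the settings used in our simulations.

```python
def repeatability(between_var, within_var):
    """Proportion of total variance attributable to between-individual differences."""
    return between_var / (between_var + within_var)

# Hypothetical example: unit between-individual variance and three
# illustrative within-individual (measurement noise) standard deviations.
for noise_sd in (0.5, 1.0, 2.0):
    r = repeatability(1.0, noise_sd ** 2)
    print(f"noise sd = {noise_sd}: repeatability = {r:.2f}")
```

Larger within-individual noise lowers the repeatability, and values near zero lie at the boundary of the admissible range, which is where boundary effects in the estimate would be expected to appear.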