Election turnout statistics in many countries: similarities, differences, and a diffusive field model for decision-making


Christian Borghesi, Jean-Claude Raynal and Jean-Philippe Bouchaud
borghesi@msh-paris.fr; raynal@ehess.fr; jean-philippe.bouchaud@cea.fr
Centre d’Analyse et de Mathématique Sociales (CAMS-EHESS), 190-198, avenue de France, 75013 Paris, France
EHESS, Division Histoire, 190-198, avenue de France, 75013 Paris, France
Capital Fund Management, 6-8 Bd Haussmann, 75009 Paris, France
July 14, 2019

We study in detail the turnout rate statistics for 77 elections in 11 different countries. We show that the empirical results established in a previous paper for French elections appear to hold much more generally. We find in particular that the spatial correlation of turnout rates decays logarithmically with distance in all cases. This result is quantitatively reproduced by a decision model that assumes that each voter makes up his or her mind as a result of three influence terms: one totally idiosyncratic component, one city-specific term with short-ranged fluctuations in space, and one long-ranged correlated field which propagates diffusively in space. A detailed analysis reveals several interesting features: for example, different countries have different degrees of local heterogeneity and seem to be characterized by a different propensity for individuals to conform to the cultural norm. We furthermore find clear signs of herding (i.e. strongly correlated decisions at the individual level) in some countries, but not in others.

I Introduction

Empirical studies and models of election statistics have attracted considerable attention in the recent physics literature, see e.g. costa_filho_scaling_vot (); lyra_bresil_el (); gonzalez_bresil_inde_el (); fortunato_universality (); daisy-model (); growth_model_vote (); araripe_role_parties (); araujo_tactical_voting (); universality_candidates (). In diffusive-field (), the present authors studied the statistical regularities of electoral turnout rates, based on spatially resolved data from 13 French elections since 1992. Two striking features emerged from our analysis: first, the distribution of the logarithmic turnout rate (defined precisely below) was found to be remarkably stable over all elections, up to an election-dependent shift. Second, the spatial correlations of the logarithmic turnout rate were found to be well approximated by an affine function of the logarithm of the distance between two cities. Based on these empirical results, we proposed that the behaviour of individual agents is affected by a space-dependent “cultural field”, which encodes a local bias in the decision-making process (to vote or not to vote), common to all inhabitants of a given city. The cultural field itself can be decomposed into an idiosyncratic part, with short-range correlations, and a slow, long-range part that results from the diffusion of opinions and habits from one city to its close-by neighbours. We showed in particular that this local propagation of cultural biases generates, at equilibrium, the logarithmic decay of spatial correlations that is observed empirically diffusive-field ().

The aim of the present note is to provide additional support to these rather strong statements, using a much larger set of elections from different countries in the world. We discuss in more depth the approximate universality of the distribution of turnout rates, and show that some systematic effects in fact exist, related in particular to the size of the cities. We also confirm that the logarithmic decay of the spatial correlations approximately holds for all countries and all elections, with parameters compatible with our diffusive field model. The relative importance of the idiosyncratic, city-dependent contribution and of the slow diffusive part is however found to depend strongly on the country. We also confirm the universality of the logarithmic turnout rate for different elections, for different regions or for different cities, provided the mean and the width of the distribution are allowed to depend on the city size. Overall, our empirical analysis provides further support to the binary logit model of decision making, with a space-dependent mean (the cultural field mentioned above).

II Data and Observables

We have analyzed the turnout rate at the scale of municipalities for 77 elections, from 11 different countries. For some countries, the number of different elections is substantial: 22 from France (Fr, municipalities in mainland France) data-mun-fr (), 13 from Austria (At, municipalities) data-mun-at (), 11 from Poland (Pl, municipalities) data-mun-pl (), 7 from Germany (Ge, municipalities) data-mun-ge (), while for others we have fewer samples: 5 from Canada (Ca, municipalities) data-mun-ca (), 4 from Romania (Ro, municipalities) data-mun-ro (), 4 from Spain (Sp, municipalities in mainland Spain) data-mun-sp (), 4 from Italy (It, municipalities in mainland Italy) data-mun-it (), 3 from Mexico (Mx, municipalities) data-mun-mx (), 3 from Switzerland (CH, municipalities) data-mun-ch () and 1 from the Czech Republic (Cz, municipalities) data-mun-cz (). More details on the nature of these elections and some specific issues are given in the Appendix.

For each municipality and each election, the data files give the total number of registered voters $N$ and the number of actual voters $N_+$, from which one obtains the usual turnout rate $p = N_+/N$. For reasons that will become clear, we will instead consider in the following the logarithmic turnout rate (LTR) $\tau$, defined as:

$$\tau = \ln\left(\frac{p}{1-p}\right). \tag{1}$$

Because we know the geographical location of each city, the knowledge of the LTR for each city enables us to create a map of the LTR field and study its spatial correlations.
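As an illustration, the LTR can be computed from the two columns of a data file as sketched below. This is a minimal sketch that assumes the LTR is the logit of the turnout rate; the function name `log_turnout_rate` and the choice to mark cities with a turnout of exactly 0 or 1 as missing are ours, not taken from the data sources.

```python
import numpy as np

def log_turnout_rate(registered, voters):
    """Return ln(p / (1 - p)) with p = voters / registered, city by city."""
    registered = np.asarray(registered, dtype=float)
    voters = np.asarray(voters, dtype=float)
    p = voters / registered
    # A turnout of exactly 0 or 1 gives an infinite logit: mark it as NaN
    # (an assumption of this sketch, not a rule stated in the text).
    tau = np.full(p.shape, np.nan)
    ok = (p > 0) & (p < 1)
    tau[ok] = np.log(p[ok] / (1.0 - p[ok]))
    return tau

# Three hypothetical cities: 50% turnout, 80% turnout, 100% turnout.
tau = log_turnout_rate([1000, 250, 40], [500, 200, 40])
```

A turnout of 50% maps to an LTR of zero, and the variable is unbounded on both sides, which is what makes it convenient for the Gaussian-like analysis that follows.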

III Statistics of the local turnout rate

Whereas the average turnout rate is quite strongly dependent on the election (both on time and on the type of election – local, presidential, referendum, etc.), the distribution of the shifted LTR, $\tau - \langle\tau\rangle$, was found to be remarkably similar for the 13 French elections studied in diffusive-field (). [Footnote 1: The notation $\langle\dots\rangle$ means a flat average over all cities, i.e. not weighted by the population of the city.] The LTR standard-deviation $\sigma$, skewness and kurtosis were found to be very similar between different elections. The distribution of the shifted and rescaled LTR,

$$\tilde{\tau} = \frac{\tau - \langle\tau\rangle}{\sigma}, \tag{2}$$

was found to be very close from one election to the next in the Kolmogorov-Smirnov (KS) sense.
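The comparison between two elections can be sketched as follows. We assume here that the KS distance is the usual two-sample statistic (the largest gap between the two empirical cumulative distributions) multiplied by $\sqrt{nm/(n+m)}$ for samples of sizes $n$ and $m$; this normalization, and the helper names `rescale` and `ks_distance`, are assumptions of the sketch rather than definitions taken from the text.

```python
import numpy as np

def rescale(tau):
    """Shift by the flat mean and divide by the standard deviation."""
    tau = np.asarray(tau, dtype=float)
    return (tau - tau.mean()) / tau.std()

def ks_distance(x, y):
    """Two-sample KS statistic scaled by sqrt(n*m/(n+m)) (assumed convention)."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    n, m = len(x), len(y)
    # Evaluate both empirical CDFs on the pooled sample points.
    grid = np.concatenate([x, y])
    cdf_x = np.searchsorted(x, grid, side="right") / n
    cdf_y = np.searchsorted(y, grid, side="right") / m
    d = np.max(np.abs(cdf_x - cdf_y))
    return d * np.sqrt(n * m / (n + m))

# Hypothetical usage on two elections of the same country:
# d = ks_distance(rescale(tau_election_1), rescale(tau_election_2))
```

With this scaling the distance does not grow trivially with the number of municipalities, so that values obtained for countries of very different sizes can be compared.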

We have extended this analysis to the 9 new election data in France, and to all new countries mentioned above. For France, the Elections Municipales (election of the city mayor), not considered in diffusive-field (), have a distinctly larger standard deviation than national elections. However, is again found to be similar for all the French elections, except the Régionales of 1998 and 2004. These happen to be coupled with other local elections in half municipalities, which clearly introduces a bias. The distributions for all elections in France are shown in Fig. 1 and compared to a Gaussian variable. The distribution is clearly non Gaussian, with a positive skewness equal to and a kurtosis equal . A more precise analysis consists in computing the KS distances between each pair of elections. We recall here that a KS distance of corresponds to a probability that the two tested distribution coincide, while corresponds to a probability. Removing the Régionales, we find that the KS distance averaged over all pairs of elections is equal to , with a standard deviation of . These numbers are slightly too large to ascertain that the distributions are exactly the same since in that case the average should be equal to . On the other hand, these distances are not large either (as visually clear from Fig. 1), meaning that while systematic differences between elections do exist, they are quite small. We will explain below a possible origin for these differences.

The same analysis can be done for all countries separately; as for France, we find that the distributions for different elections are all similar, except for Germany, for which the KS distances are markedly larger – see Table 1, where we show the mean and the standard-deviation of KS distances between elections of a given country, and of the skewness and kurtosis of the distributions in a given country. Note that the mean KS distances are smallest for Italy and Poland. On the other hand, these distributions are clearly found not to be identical across different countries. Table 2 shows the matrix of KS distances between the “super-distributions” of the countries. [Footnote 2: A “super-distribution” of a country is obtained by aggregating the appropriately shifted LTR distributions over all “compatible” elections. Compatible elections have roughly the same distribution of the shifted LTR, i.e. without normalization by its standard-deviation. They are chosen as follows: for Canada and Poland all elections; for France all pure national elections (not combined with local elections, i.e. all elections apart from 1998-rg, 2004-rg, 2001-mun and 2008-mun); for Mexico 2003-D and 2009-D; for Germany 2005-D and 2009-D; all Chamber of Deputies (D) elections for Austria, Spain, Italy and Switzerland; and for Romania, all elections apart from its European Parliament election (see Appendix for more details).] The KS distances are all large, except for the pairs Fr-Cz, Fr-CH, Sp-CH, Sp-Ro and CH-Cz.

Figure 1: Probability distribution of the rescaled variable over all communes for France. A standardized Gaussian is also shown. We use the same symbols and color codes for the French elections throughout this paper.
Country   KS distance    skewness       kurtosis
Fr        1.49±0.47      1.08±0.15      4.8±1.3
          (1.42±0.45)    (1.10±0.14)    (5.1±0.9)
At        1.44±0.54      0.10±0.38      0.53±0.81
          (0.93±0.19)    (-0.13±0.21)   (0.54±0.43)
Pl        0.80±0.20      0.12±0.26      0.38±0.42
          (0.80±0.20)    (0.12±0.26)    (0.38±0.42)
Ge        3.0±1.1        0.48±0.30      1.6±0.9
          (0.81)         (0.20±0.05)    (1.53±0.04)
Sp        1.78±0.68      0.27±0.25      1.8±1.1
          (1.24)         (0.07±0.21)    (2.5±1.2)
It        0.70±0.09      -0.45±0.11     1.01±0.02
          (0.68)         (-0.45±0.15)   (1.01±0.003)
CH        1.67±0.43      0.51±0.08      1.4±1.4
                         (0.47)         (2.9)
Mx        1.28±0.35      0.32±0.09      1.1±0.8
          (1.19)         (0.35±0.11)    (1.6±0.3)
Ca        1.23±0.39      -0.40±0.39     4.4±0.9
          (1.23±0.39)    (-0.40±0.39)   (4.4±0.9)
Ro        1.06±0.39      0.05±0.43      1.5±0.4
          (0.95±0.36)    (-0.14±0.25)   (1.6±0.4)
Table 1: Mean and standard-deviation of KS distances between all pairs of elections within each country. The mean and standard-deviation of the skewness and kurtosis of the LTR distributions over all municipalities are also given for each country. In parentheses, the same measures restricted to the compatible elections in each country.
Fr At Pl Ge Sp It Cz Mx CH Ca Ro
Fr 0 5.01 5.61 8.00 2.28 6.13 0.93 2.18 0.83 6.72 2.66
At 5.01 0 1.62 1.49 2.43 1.58 3.24 2.31 2.25 4.60 1.57
Pl 5.61 1.62 0 2.32 2.41 3.12 3.16 1.83 2.06 6.62 1.99
Ge 8.00 1.49 2.32 0 3.74 1.73 4.84 2.81 2.83 7.15 2.85
Sp 2.28 2.43 2.41 3.74 0 3.17 1.71 2.19 1.11 3.53 0.95
It 6.13 1.58 3.12 1.73 3.17 0 3.65 3.13 2.58 4.62 2.05
Cz 0.93 3.24 3.16 4.84 1.71 3.65 0 2.12 0.58 2.45 1.94
Mx 2.18 2.31 1.83 2.81 2.19 3.13 2.12 0 1.87 4.06 1.95
CH 0.83 2.25 2.06 2.83 1.11 2.58 0.58 1.87 0 1.44 1.39
Ca 6.72 4.60 6.62 7.15 3.53 4.62 2.45 4.06 1.44 0 2.78
Ro 2.66 1.57 1.99 2.85 0.95 2.05 1.94 1.95 1.39 2.78 0
Table 2: Kolmogorov-Smirnov distance between different “super-distributions”.

In order to better understand these results, one should first realize that the statistics of the LTR $\tau$ does in fact strongly depend on the size of the cities. This was already pointed out in diffusive-field (); these (). For example, the average LTR for all cities of size $N$ (within a certain interval), that we denote as $m_N$, is distinctly $N$ dependent, see Fig. 2. In most cases, the average turnout rate is large in small cities and declines in larger cities, with notable exceptions: for example, the trend is completely reversed in Poland, with more complicated patterns for parliament elections in Italy or Germany. Similarly, the standard-deviation of $\tau$ for cities of size $N$, $\sigma_N$, also depends quite strongly on $N$ (see Figs 4 and 5 below).

Figure 2: Average value, , of the conditional distribution , for all countries and all elections. These quantities are obtained as averages over bins with 100 (200 for France) municipalities of size .
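The binning used for such conditional averages can be sketched as follows: municipalities are ranked by size and grouped into consecutive bins containing a fixed number of municipalities, and the mean and standard deviation of the LTR are computed in each bin. The function name `stats_by_size` and the decision to drop the incomplete last bin are choices of this sketch, not of the original analysis.

```python
import numpy as np

def stats_by_size(N, tau, per_bin=100):
    """Mean size, mean LTR and LTR standard deviation in equal-count size bins."""
    N = np.asarray(N, dtype=float)
    tau = np.asarray(tau, dtype=float)
    order = np.argsort(N)              # rank municipalities by size
    n_bins = len(N) // per_bin         # leftover largest cities are dropped
    rows = []
    for b in range(n_bins):
        idx = order[b * per_bin:(b + 1) * per_bin]
        rows.append((N[idx].mean(), tau[idx].mean(), tau[idx].std()))
    return np.array(rows)              # columns: <N>, mean LTR, std of LTR

# Toy data: small cities vote more (LTR = 1) than large ones (LTR = -1).
sizes = np.arange(1, 401)
ltr = np.where(sizes <= 200, 1.0, -1.0)
table = stats_by_size(sizes, ltr, per_bin=200)
```

Equal-count bins keep the statistical error comparable from one point to the next, at the price of very wide size intervals for the few largest cities.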

However, the distribution of the rescaled variable over all cities of size $N$ for each election can be considered to be universal from a KS point of view, both within the same country for different $N$ but now also, when $N$ is large enough, across different countries. For example, the average KS distance between distributions corresponding to different ranges of $N$ in France is equal to , with standard-deviation . These numbers are respectively , and for Italy, Spain and Germany. [Footnote 3: We have excluded the smallest cities, , that have a distinctly larger KS distance with other cities – see below. Bins, ranked according to the municipality size $N$, each contain around 500 municipalities.] In Table 3, we show for different bins of $N$ values the mean and standard-deviation of the KS distance between countries, illustrating that all distributions are statistically compatible, at least when $N$ is large enough.

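Now, even if the distribution of the rescaled variable at fixed $N$ is universal and equal to $Q$, the distribution $P(\tau)$ of the LTR over all cities will reflect the country-specific (and possibly election-specific) shapes of the conditional mean $m_N$ and standard-deviation $\sigma_N$ of the LTR for cities of size $N$, and the country-specific distribution of city sizes, $\rho(N)$. Indeed, one has:

$$P(\tau) = \sum_N \rho(N)\, \frac{1}{\sigma_N}\, Q\!\left(\frac{\tau - m_N}{\sigma_N}\right), \tag{3}$$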

which has no reason whatsoever to be universal. But since for a given country the dependence on of and tends to change only weakly in time, the approximate universality of for a given country follows from that of . In fact, French national elections can be grouped into two families, such that the dependence of on is the same within each family but markedly different for the two families (see next section and Fig. 3 below). Restricting the KS tests to pairs within each families now leads to an average KS distance of with a standard deviation (identical for the two families), substantially smaller than from Table 1. This goes to show that the election specific shape of is indeed partly responsible for the weak non-universality of .

1.470.77 1.380.65 0.940.48 0.910.46 0.950.48
Table 3: Mean and standard-deviation over all pairs of countries of the KS distance between the aggregated distributions in each country, for different values of .

Zooming in now on details, we give in Table 4 the KS distance between aggregated over all elections of a country and a normalized Gaussian, for different ranges of and different countries. The skewness and kurtosis of the distribution and the KS distance to a Gaussian, aggregated over all , are given in Table 5-a for different countries, and aggregated over countries for fixed in Table 5-b. Two features emerge from these Tables:

  • While for some countries (Cz, Sp, Mx) the deviation of the rescaled distribution from a Gaussian appears small (whether measured by the KS distance or by the skewness and kurtosis), such an assumption is clearly unacceptable for Italy and Germany, for which the KS distance is large for all $N$ (see Table 4) and a substantial negative skewness can be measured. Furthermore, the aggregated distribution (over all $N$) is clearly incompatible with a Gaussian except in the Czech Republic, Spain and Mexico.

  • There is an interesting systematic $N$ dependence of the distance to a Gaussian, which is on average smaller for larger $N$, and maximum for small cities. This suggests that although the KS test is unable to distinguish strongly the distributions for different $N$, there is in fact a systematic evolution for which we provide an argument below. In fact, as clearly seen in Table 3, the average KS distance between the distributions of different countries is also systematically smaller as $N$ increases.

Fr 2.50 2.15 1.18 0.71 0.86
At 2.03 1.82 0.76 0.98 1.58
Pl 0.45 1.45 0.89 1.40 1.20
Ge 1.75 2.78 2.55 2.49 3.08
Sp 0.70 0.83 0.71 0.63 0.69
It 2.69 3.74 3.11 2.32 0.88
Cz 0.63 0.73 0.55 0.37 0.61
Mx 1.50 0.79 0.55 0.97 0.48
CH 1.38 1.49 0.65 0.69 0.44
Ca 3.48 1.09 0.60 0.53 0.59
Ro 1.73 1.48 1.14 0.63 0.92
Table 4: KS distance between and a normalized Gaussian for different ranges of and for different countries.
Tab. 5-a
Country   KS distance   skew    kurt
Fr        2.55          -0.02   0.31
At        2.63          -0.05   0.15
Pl        2.13          0.18    0.58
Ge        4.09          -0.21   0.05
Sp        1.03          -0.16   0.41
It        5.61          -0.67   0.79
Cz        0.83          -0.32   0.30
Mx        1.21          0.12    -0.06
CH        1.85          0.24    0.88
Ca        2.93          -0.75   2.14
Ro        2.36          -0.06   1.25

Tab. 5-b
Range of N   KS distance   skew    kurt
…            2.25          -0.07   0.43
…            3.50          -0.12   0.44
…            2.90          -0.12   0.42
…            1.74          -0.13   0.31
…            1.74          -0.19   0.43
Table 5: KS distance to a standardized Gaussian, and low-moment skewness (skew) and kurtosis (kurt) of the aggregated distributions. Tab. 5-a: data are aggregated over all $N$ for each country. Tab. 5-b: data are aggregated over all countries for fixed $N$.

IV A theoretical canvas

In order to delve deeper into the meaning of the above results, we need a theoretical framework. In diffusive-field (), we proposed to extend the classical theory of choice to account for spatial heterogeneities. A registered voter $i$ makes the decision to vote ($S_i = 1$) or not ($S_i = 0$) on a given election. We can view this binary decision as resulting from a continuous and unbounded variable $\varphi_i$ that we called intention (or propensity to vote). The final decision depends on the comparison between $\varphi_i$ and a threshold value $\Phi_{\rm th}$: $S_i = 1$ when $\varphi_i > \Phi_{\rm th}$, and $S_i = 0$ otherwise. In diffusive-field (), the intention of an agent $i$ at time $t$ who lives in a city $\alpha$, located in the vicinity of $\vec{R}_\alpha$, was decomposed as:

$$\varphi_i(t) = \epsilon_i(t) + \phi(\vec{R}_\alpha, t) + \mu_\alpha(t), \tag{4}$$

where $\epsilon_i(t)$ is the instantaneous and idiosyncratic contribution to the intention that is specific to voter $i$, and $\phi$ and $\mu$ are fields that locally bias the decision of agents living in the same area. The first field $\phi$ is assumed to be smooth (i.e. slowly varying in time and space), as the result of the local influences of the surroundings. This is what we called a “cultural field”, that transports (in space) and keeps the memory (in time) of the collective intentions. The second field $\mu$, on the other hand, is city- and election-specific, and by assumption has small inter-city correlations. It reflects all the elements in the intention that depend on the city: its size, the personality of its mayor, the specific importance of the election that might depend on the socio-economic background of its inhabitants, as well as the fraction of them who recently settled in the city, etc. (See diffusive-field () for a more thorough discussion of Eq. (4).)

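Consider now $N$ agents living in the same city, i.e. under the influence of the same values of the cultural and city-specific fields. Writing $S_i = 1$ if agent $i$ votes and $S_i = 0$ otherwise, the turnout rate $p$ is by definition:

$$p = \frac{1}{N} \sum_{i=1}^{N} S_i. \tag{5}$$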

For $N$ sufficiently large, and if the agents make independent decisions, the Central Limit Theorem tells us that:

$$p \approx P_+ + \sqrt{\frac{P_+ (1 - P_+)}{N}}\; \xi, \tag{6}$$

where $P_+$ is the probability that the conviction of the voter is strong enough for him or her to vote, and $\xi$ is a standardized Gaussian noise. If, on the other hand, agents make correlated decisions (for example, everybody in a family decides to vote or not to vote under the influence of a strong leader), one expects the variance of the noise term to increase by a certain “herding” factor $h$, which measures the average size of strongly correlated groups. Therefore we will write more generally:

$$p \approx P_+ + \sqrt{\frac{h\, P_+ (1 - P_+)}{N}}\; \xi. \tag{7}$$
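The effect of herding on the binomial noise can be checked with a small simulation. The sketch below makes the simplifying assumption that voters are split into groups of exactly the same size, each group voting or abstaining as a whole; the function name `simulate_turnout` and all parameter values are illustrative only.

```python
import numpy as np

def simulate_turnout(N, P_plus, group_size, n_cities, rng):
    """Turnout rates of n_cities cities of N voters deciding in blocks."""
    n_groups = N // group_size
    # Each block of `group_size` voters votes (or abstains) together.
    voting_groups = rng.binomial(n_groups, P_plus, size=n_cities)
    return voting_groups * group_size / (n_groups * group_size)

rng = np.random.default_rng(0)
N, P_plus = 1200, 0.7
binomial_variance = P_plus * (1 - P_plus) / N   # independent-voter prediction

var_independent = simulate_turnout(N, P_plus, 1, 20000, rng).var()
var_herding = simulate_turnout(N, P_plus, 4, 20000, rng).var()
```

In this toy setting the variance of the turnout rate with groups of four is about four times the independent-voter value, which is the sense in which the herding factor measures the size of strongly correlated groups.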


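Following a standard assumption in Choice Theory, we take the idiosyncratic $\epsilon_i$’s to have a logistic distribution with zero mean and standard-deviation $\Sigma$ (up to a numerical factor $\pi/\sqrt{3}$), in which case the expression of $P_+$ becomes:

$$P_+ = \frac{1}{1 + \exp\left(\dfrac{\Phi_{\rm th} - \phi - \mu}{\Sigma}\right)}. \tag{8}$$

This allows one to obtain a very simple expression for the LTR $\tau$:

$$\tau = \frac{\phi + \mu - \Phi_{\rm th}}{\Sigma} + \sqrt{\frac{h}{N P_+ (1 - P_+)}}\; \xi, \tag{9}$$

which holds to first order in the noise term. Therefore, in this model, the statistics of $\tau$ directly reflects that of the cultural and idiosyncratic fields.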
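The step from the logistic assumption to a linear expression for the LTR can be verified numerically. In the sketch below, `Sigma` is the scale parameter of the logistic distribution (its standard deviation is larger by a factor $\pi/\sqrt{3}$); the function names and the numerical values of the fields and of the threshold are arbitrary illustrations.

```python
import numpy as np

def p_plus(phi, mu, phi_th, Sigma):
    """Probability that a logistic idiosyncratic term lifts the intention above threshold."""
    return 1.0 / (1.0 + np.exp(-(phi + mu - phi_th) / Sigma))

def mean_ltr(phi, mu, phi_th, Sigma):
    """Logit of p_plus, i.e. the noise-free part of the LTR."""
    P = p_plus(phi, mu, phi_th, Sigma)
    return np.log(P / (1.0 - P))

phi, mu, phi_th, Sigma = 0.3, -0.1, 0.5, 2.0

# Monte Carlo check: draw idiosyncratic terms and count who votes.
rng = np.random.default_rng(0)
eps = rng.logistic(loc=0.0, scale=Sigma, size=200_000)
voting_fraction = np.mean(eps + phi + mu > phi_th)
```

The logit of the voting probability is exactly linear in the fields, which is the reason why the LTR, rather than the turnout rate itself, is the natural variable in this model.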

Let us work out some consequences of the above decomposition, and how they relate to the above empirical findings. Since the cultural field is by definition not attached to a particular city, it is reasonable to assume that and are uncorrelated. Without loss of generality, one can furthermore set . Therefore:


Two extreme scenarios can explain the dependence of : one is that the dispersion term is strongly dependent while the statistics of is independent, the other is that is essentially constant and reflects an intrinsic dispersion common to all voters in a population, while the average of the city-dependent field depends strongly on the size of the city. Of course, all intermediate scenarios are in principle possible too, but the data is not precise enough to hone in the precise relative contribution of the two effects. Here, we want to argue that the dependence of on is likely to be dominant. Indeed, if the first scenario was correct, one should observe:


The decrease of as a function of would therefore mean that itself is a decreasing function of when the mean LTR is positive. This is a priori reasonable: one expects more heterogeneity (and therefore a larger , and a smaller ) in large cities than in small cities. However, the same model would imply a smaller dependence on for low turnout rates, and even an inverted dependence of on for elections with a very low turnover rate, such that . This is not observed: quite on the contrary, the dependence is compatible with a mere vertical shift for similar elections, see Fig. 3.

On the other hand, a model where is constant, independent of and to a first approximation on the election, leads to:


which appears to be a good representation of reality. The dependence of the average city-specific field (the average propensity to vote) on the city size could be the result of several intuitive mechanisms: for example, voters in small cities are less likely to be absent on election day (usually a Sunday in France); the result of an election is sometimes more important in small cities than in large cities (for example, election of the mayor); the social pressure from the rest of the community is stronger in small cities. All these effects suggest that the average turnout rate is higher in small cities. In order to explain the opposite behaviour (as in Poland), or a non-monotonic dependence, as in Italy or Germany for parliament elections, a systematic dependence of on might be relevant, although one should probably delve into local idiosyncrasies.

Figure 3 suggests that in France three families of elections clearly appear: a) “important” national elections (Presidential, Referendums, Parliament), for which shows a change of concavity around ; b) less important national elections (European, Régionales) for which the average turnout is low, for which the change of concavity is absent; and c) Municipales for which the variation of between small and large cities is the largest (as can be expected a priori). Note that the difference between the mean LTR for small and large cities is markedly different in the three cases: in case a), in case b), and in case c).

Figure 3: Shifted as a function of for French elections. Three families of elections clearly appear. a) Top curves: “important” national elections (Presidential, Referendums, Parliament); b) Bottom curves: less important national elections (European, Régionales); and c) Middle curves: Municipales (see text). Each point comes from the average over around 200 communes of size .

As a first approximation, we thus take to be constant for all cities. The standard-deviation of over all cities of a given size then writes:


We show in Fig. 4 the quantity minus the trivial binomial contribution, i.e. the last term of the RHS of the above equation, as a function of , for French elections. As predicted by the above model, we see that the limit is clearly positive , and to a good approximation independent of the election – including the Municipales: although the dependence of is found to be markedly different (as ), this quantity still extrapolates to the same asymptotic value. If one believes that our interpretation of as a persistent cultural field is correct, there is in fact no reason to expect that should change at all from election to election. The above result is therefore compatible with the fact that is to a first approximation election independent, as already suggested by Fig. 3 above. The same results hold for all other countries, although the statistics is not as good as in the case of France: the asymptotic value of for is only weakly dependent on the election, and in the range for all countries. Furthermore, the -dependence of is found to be roughly compatible with with in all cases.

Figure 4: as a function of for French elections. Each point comes from around 300 communes of size . Dashed line: as extracted from the spatial correlations of (cf. Tab. 6). The 1998 and 2004 Régionales elections are excluded here.

If is constant, the -dependent contribution of must come from the variance of the city-specific contribution . A simple-minded model for the statistics of predicts a variance that should decrease as . Indeed, a large city can be thought of as a patchwork of independent small neighbourhoods, each with a specific value of . The effective value of for the whole city has a variance that is easily found to be reduced by a factor , and therefore . A weaker dependence of on signals the existence of strong inter-neighbourhood correlations (or strong heterogeneities in the size of neighbourhoods), that lead to a reduction of the effective number of independent neighbourhood from to with . These inter-neighbourhood correlations are indeed expected, since some of the socio-economic and cultural factors affecting the decision of voters are clearly associated to the whole city. Interestingly, these correlations should be stronger for local elections, which is indeed confirmed by the fact that is markedly smaller for the Municipales elections in France. We therefore find the interpretation of the anomalous dependence of as due to the city-specific contribution rather compelling.
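The “patchwork” argument can be illustrated with a short simulation in which the city-specific field of a city is the average of independent neighbourhood contributions. The sketch assumes Gaussian, unit-variance, fully independent neighbourhoods of equal size, which is the idealized case leading to the strongest possible decay; the function name and sample sizes are arbitrary.

```python
import numpy as np

def effective_mu_variance(n_patches, n_cities, rng):
    """Variance across cities of the average of n_patches independent neighbourhood fields."""
    mu_patches = rng.standard_normal((n_cities, n_patches))
    return mu_patches.mean(axis=1).var()

rng = np.random.default_rng(1)
patches = np.array([1, 4, 16, 64])     # number of neighbourhoods, proportional to city size
variances = np.array([effective_mu_variance(n, 20000, rng) for n in patches])

# Decay exponent of the variance with the number of neighbourhoods.
exponent = -np.polyfit(np.log(patches), np.log(variances), 1)[0]
```

With independent neighbourhoods the fitted exponent is close to one; correlations between neighbourhoods would make the variance decay more slowly and the exponent smaller.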

Let us now turn to the distribution of the rescaled variable . Within the above model, and again assuming that is constant, one finds that:


The last “binomial” term quickly becomes Gaussian as $N$ increases, and is at least four times smaller than the first two terms when (when ). Since the cultural field is, according to the model proposed in diffusive-field (), the result of averaging random influences over long time scales and large length scales, one expects, from the Central Limit Theorem, that it is close to a Gaussian field as well. However, the statistics of the city-specific field has no reason to be Gaussian for small cities, for which it reflects local and instantaneous idiosyncrasies, and for which no averaging argument can be invoked. The “universality” of the rescaled distribution across countries is therefore probably only apparent, since there is no reason to expect that the distribution of the city-specific field is independent of the country. In fact, countries like Italy, Germany and the Czech Republic do exhibit a stronger skewness than other countries. Still, according to the above discussion, the contributions of different neighbourhoods to the city-specific field must average out as $N$ increases, and one expects its distribution to become more and more Gaussian as $N$ increases.

Figure 5: as a function of for each election. These quantities are obtained as averages over bins with 300 municipalities of size . The dashed line corresponds to as extracted from the spatial correlations of (cf. Fig. 9). For election labels, see Figs. 1, 2.

To sum up: the random variable is the sum of three independent random variables, two of which can be considered as Gaussian, while the third has a distribution that depends on and becomes more Gaussian for large , with a variance that decreases as . This allows one to rationalize the above empirical findings on the distributions : these are more and more Gaussian as increases, and closer to one another for different countries, since the country specific contribution becomes smaller (as ) and itself more Gaussian.

It is instructive to compare the relative contribution to the variance of the turnout rates of the cultural field on the one hand, and of the city-specific field on the other. The latter can be obtained by subtracting from the total variance of the LTR, , the contribution of the cultural field which is obtained as the extrapolation of to (see Figs 4 & 5) and the average contribution of the binomial noise, . The herding factor can be estimated using the method introduced in diffusive-field (), which compares different elections for which the binomial noises are by definition uncorrelated (see Eq. (10) of Ref. diffusive-field ()). The ratio of can be seen as an objective measure of the heterogeneity of behaviour in country, i.e. how strongly local idiosyncracies can depart from the global trend. Table 6 gives the ratio for all studied countries. Using this measure, we find that the most heterogeneous countries are Canada and the Czech Republic,444Although the ratios for Ca, Mx, Cz and Ge might be overestimated because the data did not allow us to estimate the herding ratio in these two cases. and the most homogeneous ones are Austria, Switzerland and Romania. Not surprisingly, however, the largest value of is found for the French Municipales, i.e. local elections, for which idiosyncratic effects are indeed expected to be large. Note also that the herding ratio is anomalously high for Romania (), and quite substantial for Poland (). Finally, it is interesting to notice that the quantity depends only weakly on the country (it varies by a factor between France and Italy). Since the total intention is only defined up to an arbitrary scale, one can always set . Therefore, we find that the idiosyncratic dispersion (or the propensity not to conform to the norm encoded by the cultural field) is strongest in France, Poland and the Czech Republic, and weakest in Italy and Austria.

Country () (Eq. 19)
Fr (mun)
Table 6: Decomposition of the total LTR variance into a cultural field component , and city-specific component , and a binomial component, , corrected by a herding coefficient . This last term is determined using the method proposed in diffusive-field (), which leads to a herding coefficient given in the second column. : when the direct fit gives a value of less than unity, we enforce . : the case of Germany seems to be special, maybe due to a large fraction of postal votes. : the method to determine requires more than one election, and therefore cannot be applied to the Czech Republic. In this case, we also set by default. : Missing data prevents us from determining precisely, so we again set by default. The value of the exponent is only indicative, since in some countries the power-law assumption is not warranted, see Fig. 5. We give two values for : one as the asymptotic extrapolation of for and the second from the rescaling coefficient , see below and Fig. 9. Both these determinations are only precise to within roughly .
Figure 6: Heat map of the normalized logarithmic turnout rate, for the 2004 European Parliament election in France, Germany, Italy, Poland and Spain. Germany reformed the nomenclature of its municipalities, which makes it more difficult to join spatial data to electoral data efficiently. Note the strongly heterogeneous, but long-range correlated nature of the pattern. Note also some strong regional effects, for example in the German regions of Saarland or Baden-Württemberg, where the average turnout rate is high and falls sharply across the region boundaries. In these cases, the implicit assumption of a translation-invariant statistical pattern that we make to compute the spatial correlations is probably not warranted, and it would in fact be better to treat these regions independently.

V Spatial correlations of turnout rates

Another striking empirical finding reported in diffusive-field (); these () is the logarithmic dependence of the spatial correlation of the LTR as a function of distance. The spatial pattern of the local fluctuations of the LTR in European countries is shown in Fig. 6. One clearly sees the presence of long-ranged correlations. More precisely, for the 13 French elections studied there, one finds that the spatial correlation $C(r)$ of $\tau(\vec{R}) - m_N$ (where $\vec{R}$ is the spatial location of the city and $m_N$ is the average of $\tau$ over cities of similar sizes) decreases as:

$$C(r) \approx -\lambda^2 \ln\frac{r}{L}, \tag{15}$$

where $L$ is of the order of the size of the country. We show in Fig. 7 the average $C(r)$ for all French elections (except the two Municipales elections) and in Fig. 8 the normalized correlation functions for all elections, separately for each country for which the geographic position of cities is available to us.
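A logarithmic decay of this kind can be extracted from a measured correlation function by a linear least-squares fit against the logarithm of the distance. The sketch below assumes the functional form $C(r) = -b \ln(r/L)$; the names `fit_log_decay`, `b` and `L`, and the synthetic numbers used to exercise the function, are illustrative and are not fitted values from the data.

```python
import numpy as np

def fit_log_decay(r, C):
    """Fit C(r) = -b * ln(r / L); returns the amplitude b and the cutoff length L."""
    slope, intercept = np.polyfit(np.log(r), C, 1)
    b = -slope                       # C = -b ln r + b ln L
    L = np.exp(intercept / b)
    return b, L

# Synthetic check with made-up parameters (distances in km).
r = np.array([1.0, 3.0, 10.0, 30.0, 100.0, 300.0])
C = -0.05 * np.log(r / 800.0)
b, L = fit_log_decay(r, C)
```

On real data the fit should be restricted to distances well below the size of the country, where the logarithmic regime is expected to hold.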

Figure 7: Average of spatial correlations for all French elections (excluding the two Municipales elections). In dashed lines: , as extracted from the asymptotic () dependence of .
Figure 8: Normalized spatial correlations of for all countries for which the geographic position of cities is available. The correlation is normalized by the variance of , such that . For labels of elections, see Figs. 1, 2.

Using the above decomposition, and noting that by assumption the fluctuations of around the suitable size dependent average have short-ranged correlations, one concludes that the long-range, logarithmic correlations above must come from those of the cultural field . One indeed finds:


since the other two terms only contribute for . As a consistency check of this decomposition, one should find that should quickly decay from to (e.g. for France). This is indeed seen to be well borne out, see Fig. 7. The agreement between two completely different determinations of (one using the extrapolation of to infinite sizes, and the second using ) holds very well for France, Italy and the Czech Republic, and only approximately for other countries (see Tab. 6 and Fig. 5).

Inspired by a well-known model in statistical physics where these logarithmic correlations appear, we postulated in diffusive-field () that the field evolves according to a diffusion equation, driven by a random noise, which is meant to describe the exchange of ideas and opinions between nearby cities and the random nature of the shocks that may affect the cultural substrate. As we argued in diffusive-field (), the fact that people move around and carry with them some components of the local cultural specificity leads to a local propagation of . Through human interactions, the cultural differences between nearby cities tend to narrow according to:


where is a symmetric influence matrix, that we assume to decrease over a distance corresponding to regular displacements of individuals, say km or so. For concreteness, we take: . As is well known, the continuum limit of the right hand side of Eq. (17) reads , where is the Laplacian and is a measure of the speed at which the cultural field diffuses. Random cultural “shocks” add to the above equation a noise term .
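The relaxation dynamics described above can be sketched numerically. This is an illustration under stated assumptions, not the authors' simulation code: the exponential form of the influence weights, the explicit Euler time step `dt` and the Gaussian shocks are choices made here, and the names `influence_matrix` and `diffusion_step` are hypothetical.

```python
import math
import random

def influence_matrix(points, ell):
    """Symmetric weights decaying with inter-city distance.

    The exponential decay over a length `ell` (km) is an assumption of this sketch.
    """
    n = len(points)
    return [[0.0 if a == b else math.exp(-math.dist(points[a], points[b]) / ell)
             for b in range(n)] for a in range(n)]

def diffusion_step(phi, gamma, dt, noise_amp=0.0, rng=random):
    """One explicit Euler step of
        d(phi_a)/dt = sum_b gamma[a][b] * (phi[b] - phi[a]) + noise.
    Each city is pulled towards its neighbours, then receives a random shock.
    """
    n = len(phi)
    new_phi = []
    for a in range(n):
        drift = sum(gamma[a][b] * (phi[b] - phi[a]) for b in range(n))
        shock = noise_amp * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        new_phi.append(phi[a] + dt * drift + shock)
    return new_phi
```

With a symmetric matrix and no noise, the sum of the field over cities is conserved while differences between cities shrink, which is the discrete counterpart of diffusion. The step `dt` must be small compared with the inverse of the largest row sum of the matrix for the explicit scheme to remain stable.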

If cities were located on the nodes of a regular lattice of linear size , it would be easy to compute analytically the stationary correlation function of the field . It is found to be given by a logarithmic function of distance, provided :


However, the spatial distribution of cities in real countries is quite strongly heterogeneous, which leads to significant deviations from a pure logarithmic decay. To compare our model quantitatively with empirical data, we have therefore simulated the model using Eq. (17) with the exact locations of all cities for the different countries under consideration. The results, averaged over many histories of the noise term, are shown in Fig. 9-left for km (changing from km to km hardly changes the curves). Quite remarkably, we see that exhibits a significant concavity, very similar to what is observed for the empirical correlations. To check that the model is indeed compatible with observations, we have plotted in Fig. 9-right the empirical data superimposed with the prediction of the model for the French case (for which the data is best). The empirical correlation is rescaled by a country-dependent value in order to achieve the best agreement with the model curve. This value of allows us to obtain a second determination of , through the relation:


Note however that the numerical model predicts a rather large dispersion around the average result, which comes from a strong dependence on the noise realisation . One should therefore expect the empirical data (which correspond to only a few histories) to depart from the average theoretical curve, in a way perfectly compatible with Fig. 9-right. This also means that there is quite a bit of leeway in the determination of , which is only determined to within . Finally, note that the shape of for Germany is significantly different, with a pronounced change of regime around km. This is clearly related to the strong regional idiosyncrasies that we discussed in Fig. 6.
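The rescaling of an empirical correlation curve onto the model curve, and the dispersion over noise histories, can be sketched as follows. The least-squares criterion is an assumption made here (the text only says that the best rescaling is sought), and the names `best_rescaling` and `pointwise_mean_std` are hypothetical.

```python
import statistics

def best_rescaling(empirical, model):
    """Scalar a minimising sum((empirical - a * model)**2), both sampled at the same distances.

    Dividing the empirical curve by a then brings it as close as possible
    (in the least-squares sense) to the model curve.
    """
    numerator = sum(e * m for e, m in zip(empirical, model))
    denominator = sum(m * m for m in model)
    if denominator == 0.0:
        raise ValueError("model curve is identically zero")
    return numerator / denominator

def pointwise_mean_std(curves):
    """Mean and population standard deviation at each distance over several simulated curves."""
    columns = list(zip(*curves))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return means, stds
```

Comparing the rescaled empirical curve with the mean plus or minus one standard deviation gives a rough sense of whether a deviation is within the spread expected from a small number of histories.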

We conclude that our numerical model reproduces very satisfactorily the observations for all studied countries (with the possible exception of Germany, for the reason noted above). This lends strong support to the existence, conjectured in diffusive-field (), of an underlying diffusive cultural field responsible for both the long-range correlation (in space) and persistence (in time) of voting habits.
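The logarithmic form invoked above for the regular-lattice case follows from a standard continuum argument, sketched here under stated assumptions: the noise is taken to be white in space and time with strength $\sigma_\eta^2$, $a$ denotes the lattice spacing, and $L$ the linear size of the system. These symbols are introduced for illustration and are not taken from the text.

```latex
\partial_t \phi(\mathbf{R},t) = D\,\nabla^2 \phi(\mathbf{R},t) + \eta(\mathbf{R},t),
\qquad
\langle \eta(\mathbf{R},t)\,\eta(\mathbf{R}',t') \rangle
  = \sigma_\eta^2\,\delta^{2}(\mathbf{R}-\mathbf{R}')\,\delta(t-t').

% Each Fourier mode relaxes at rate D k^2, so its stationary variance is
\langle |\phi_{\mathbf{k}}|^2 \rangle = \frac{\sigma_\eta^2}{2 D k^2}.

% Integrating over 1/L < k < 1/a in two dimensions gives, for a \ll r \ll L,
C_\phi(r) = \int \frac{d^2k}{(2\pi)^2}\,
            \frac{\sigma_\eta^2}{2 D k^2}\, e^{i\mathbf{k}\cdot\mathbf{r}}
  \simeq \frac{\sigma_\eta^2}{4\pi D}\,\ln\frac{L}{r}.
```

In this sketch the prefactor depends only on the ratio of the noise strength to the diffusion constant, and the large-distance cutoff is set by the system size, up to an additive constant of order $\sigma_\eta^2/D$.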

Figure 9: Average of spatial correlation, rescaled. Left: Average over numerical simulations of the model (with  km) with the true positions of all cities for each country. Right: Average over real election data for each country. We also show the average and standard deviation (coming from different realizations of the noise history , and plotted as error bars) corresponding to the numerical model for French cities.

VI Conclusion

In this paper, we have shown that the empirical results for the statistics of turnout rates established in diffusive-field () for some French elections appear to hold much more generally. We believe that the most striking result is the logarithmic dependence of the spatial correlations of these turnout rates. This result is quantitatively reproduced by a decision model that assumes that each voter makes up his mind as a result of three influence terms: one totally idiosyncratic component, one city-specific term with short-ranged fluctuations in space, and one long-ranged correlated field which propagates diffusively in space. The sum of these three contributions is what we call the “intention”. A detailed analysis of our data sets has revealed several interesting (and sometimes unexpected) features: a) the city-specific term has a variance that depends on the size of the city as with , suggesting strong inter-city correlations; b) different countries have different degrees of local heterogeneities, defined as the ratio of the variance of the city-dependent term over the variance of the cultural field; c) different countries seem to be characterized by a different propensity for individuals to conform to a cultural norm; d) there are clear signs of herding (i.e. strongly correlated decisions at the individual level) in some countries, but not in others; e) the statistics of the logarithmic turnout rates become more and more Gaussian as increases.

Although we have confirmed the existence of a diffusive cultural field using election data from different countries, we feel that more work should be done to establish the general relevance of this idea to other decision making processes. It would be extremely interesting to find other data sets that would enable one to study the spatial correlations of decision making. An obvious candidate would be consumer habits – for example the consumption pattern of some generic goods, or the success of some movie, etc.

Finally, we believe that our detailed analysis of the statistics of turnout rates (or more generally of election results) reveals both stable patterns and subtle features that could be used to test for possible data manipulation or fraud, or to define interesting “democracy” indexes. In that respect, the existence of strong herding effects in some countries is somewhat disturbing.

VII Materials and Methods

The Appendix gives more information about the set of (public) electoral data studied in this paper. Most of them can be directly downloaded from official websites (see References).

Average values and standard deviations do not take extreme values into account, in order to remove some electoral errors, etc. Electoral values more than 5 sigma away from the average are discarded (see footnote 5).

Footnote 5: For instance, consider 100 municipalities of size (as in Fig. 2), each with an LTR (). First, and are the average value and the standard deviation of over these 100 municipalities. Next, the final average value and the final standard deviation, , over this sample of 100 municipalities are evaluated only for the municipalities, , such that .
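The two-pass procedure described above can be sketched as follows. The use of the population standard deviation and the name `trimmed_mean_std` are assumptions of this sketch; the text does not specify which estimator is used.

```python
import statistics

def trimmed_mean_std(values, n_sigma=5.0):
    """Two-pass average and standard deviation that ignore extreme values.

    Pass 1: average and standard deviation over all values.
    Pass 2: same statistics, restricted to values within n_sigma
            first-pass standard deviations of the first-pass average.
    Returns (average, standard deviation, number of values kept).
    """
    first_mean = statistics.fmean(values)
    first_std = statistics.pstdev(values)  # population estimator (assumption)
    kept = [v for v in values if abs(v - first_mean) <= n_sigma * first_std]
    return statistics.fmean(kept), statistics.pstdev(kept), len(kept)
```

Note that with a small sample a single outlier inflates the first-pass standard deviation so much that it can never lie 5 sigma away; with about 100 municipalities per sample, as in the example of the text, the criterion can be met.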

Appendix: Details on the data sources

Table 7 shows the nature of the 77 national elections from 11 countries, studied at the municipality scale. The countries are: France (Fr) (footnote 6), Austria (At) (footnote 7), Poland (Pl), Germany (Ge) (footnote 8), Spain (Sp), Italy (It), Switzerland (CH) (footnote 9), Czech Republic (Cz), Canada (Ca), Romania (Ro) (footnote 10) and Mexico (Mx). Note that all the studied elections took place at the same time across the whole country (apart from 2 Länder elections in Germany) and are free of compulsory voting. Lastly, in our database for Germany, postal votes (Briefwahlen) are taken into account in some Länder but not in others, which artificially increases turnout heterogeneity between German regions.

Footnote 6: The 1998 and 2004 Régionales elections occurred at the same time as strictly local elections (cantonales, i.e. at a kind of county level) in half of the municipalities.
Footnote 7: Postal votes (Wahlkarten) are not taken into account in this paper.
Footnote 8: Land parliament elections held in or before 2004 (or 2010) in each Land are written here as ‘2004-Ld’ (or ‘2010-Ld’).
Footnote 9: The referendums or votations ( and ) occurred on March 11th and July 17th 2007, respectively.
Footnote 10: The referendum studied here (about a unicameral Parliament and the reduction of the maximum number of deputies) occurred at the same time as the first round of the Presidential election. Some Romanian electors, not registered in the lista electorala permanenta, are able to vote. For this country, we continue to write as the Number of Registered Voters the registered electors who take part in the election.

Moreover, election turnout statistics have been located, identified and geocoded, based on a set of points obtained by calculating the center of gravity of each municipality or the position of the town hall, and then adding the X and Y coordinates for each of these features. In addition to these coordinates, the objects are described with several attributes: logarithmic turnout rate, , normalized logarithmic turnout rate, , etc. This concerns 8 of the 11 countries above (footnote 11): Austria XY-at (), Czech Republic XY-cz (), France XY-fr (), Germany XY-ge (), Italy XY-it (), Poland XY-pl (), Spain XY-sp () and Switzerland XY-ch (). This study is limited to mainland municipalities (and each country considered has more than two thousand municipalities). Lambert 2 étendu is used for France, while the WGS 84 coordinate system is used for the other countries.

Footnote 11: The spatial distribution of Mexican municipalities is so heterogeneous that the spatial study made for the other countries is no longer effective here.
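For countries geocoded in WGS 84, inter-city distances have to be derived from latitude and longitude. One common way to do this is the haversine formula on a sphere; whether this or a projected coordinate system was used for the present data is not stated, so the following is only an illustrative sketch. The function name `haversine_km` and the mean Earth radius of 6371 km are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation (assumption)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))
```

At the scale of a single country the spherical approximation differs from an ellipsoidal calculation by a fraction of a percent, which is small compared with the distance bins used for a correlation function.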

Ctry | el | mun | spa | elections
Fr | 22 | 36000 | Y | 1992-R, 1993-D, 1994-E, 1995-P1, 1995-P2, 1997-D, 1998-rg, 1999-E, 2000-R, 2001-mun, 2002-P1, 2002-P2, 2002-D, 2004-rg, 2004-E, 2005-R, 2007-P1, 2007-P2, 2007-D, 2008-mun, 2009-E, 2010-rg
At | 13 | 2400 | Y | 1994-D, 1995-D, 1996-E, 1998-P, 1999-E, 1999-D, 2002-D, 2004-P, 2004-E, 2006-D, 2008-D, 2009-E, 2010-P
Pl | 11 | 2500 | Y | 2000-P1, 2001-D, 2003-R, 2004-E, 2005-D, 2005-P1, 2005-P2, 2007-D, 2009-E, 2010-P1, 2010-P2
Ge | 7 | 12000 | Y | 2002-D, 2004-Ld, 2005-D, 2009-E, 2009-D, 2010-Ld
Sp | 4 | 8000 | Y | 2004-D, 2004-E, 2008-D, 2009-E
It | 4 | 7200 | Y | 2004-E, 2006-D, 2008-D, 2009-E
CH | 3 | 2700 | Y | 2007-R, 2007-R, 2007-D
Cz | 1 | 6200 | Y | 2003-R
Ca | 5 | 7700 | N | 1997-D, 2000-D, 2004-D, 2006-D, 2008-D
Ro | 4 | 3200 | N | 2009-E, 2009-R, 2009-P1, 2009-P2
Mx | 3 | 2400 | N | 2003-D, 2006-D, 2009-D
Table 7: Nature of elections studied in this paper. For each country (Ctry), the number of elections (el) and the number of municipalities (mun) in the mainland are given. "Y" (or "N") indicates whether or not the municipalities are spatially (spa) localized. For each country, an election is identified by its year and its nature. D: Chamber of Deputies election; E: European Parliament election; P: presidential election (according to the constitution of the country, in only one round); P1 and P2: first and second round of a presidential election; R: referendum; Ld: German Länder elections; rg: French Régionales elections; mun: French municipales. For each country, elections are given in chronological order (but the 2009 Romanian Presidential (P) and Referendum (R) elections occurred on the same day). Even if an election needs two rounds, only the first one is considered (e.g. the French Chamber of Deputies (D), Régionales (rg) and municipales (mun) elections) unless the contrary is indicated (e.g. P1 and P2).


C. B. would like to thank Brigitte Hazart, from the French Ministère de l’Intérieur, bureau des élections et des études politiques; Nicola A. D’Amelio, from the Italian Ministero dell’interno, Direzione centrale dei servizi elettorali; Radka Smídová, from the Czech Statistical Office, Provision of electronic outputs; Claude Maier and Madeleine Schneider, from the Swiss Office fédéral de la statistique, Section Politique, Culture et Médias; Alejandro Vergara Torres, Antonia Chávez, from the Mexican Instituto Federal Electoral; Matthias Klumpe from the German Amt für Statistik Berlin-Brandenburg; anonymous correspondents of the Élections Canada, Centre de renseignements and of the Spanish Ministerio del Interior, Subdirección General de Política Interior y Procesos Electorales, for their explanations and also for the great work they did to gather and make available the electoral data that they sent us.

