Modeling the emergence of a new language:
Naming Game with hybridization*
*The final publication will be available at http://www.springer.com/lncs
Abstract
In recent times, research in language dynamics has focused on the investigation of language evolution, dividing the work into three evolutionary steps according to their level of complexity: lexicon, categories and grammar. The Naming Game is a simple model capable of accounting for the emergence of a lexicon, intended as the set of words through which objects are named. We introduce a stochastic modification of the Naming Game model with the aim of characterizing the emergence of a new language as the result of the interaction of agents. We fix the initial phase by splitting the population into two sets speaking either language A or B. Whenever the interaction of two individuals results in an agent able to speak both A and B, we introduce a finite probability that this state turns into a new idiom C, so as to mimic a sort of hybridization process. We study the system in the space of parameters defining the interaction, and show that the proposed model displays a rich variety of behaviours, despite the simple mean field topology of interactions.
1 Emergence of a lexicon as a language
The modeling activity of language dynamics aims at describing language evolution as the global effect of the local interactions between individuals in a population of agents, who tend to align their verbal behavior locally through a negotiation process by which successful communication is achieved [1, 2]. In this framework, the emergence of a particular communication system is not due to external coordination or to a common psychological background; it simply occurs as a convergence effect of dynamical processes that start from an initial condition with no existing words (agents having to invent them), or with no agreement.
Our work is based on the Naming Game (NG) model and on its assumptions [3]. In Fig. 1 we recall the basic NG pairwise interaction scheme. A fundamental assumption of the NG is that the evolution of the vocabulary associated with each object is independent of that of the other objects. This lets us reduce the evolution of the whole lexicon to the evolution of the set of words associated with a single object, equally perceived in the sensorial sphere by all agents.
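As a concrete reading of the basic pairwise scheme, here is a minimal Python sketch of the minimal NG for a single object. It assumes the standard rules of [4]:

- the speaker utters a word chosen uniformly from its list, inventing one if the list is empty;
- on success, both agents drop all competing words;
- on failure, the hearer adds the word.

The names `ng_interaction` and `run_naming_game` and the integer "words" are hypothetical illustrations, not the authors' code.

```python
import random
from itertools import count


def ng_interaction(vocabs, speaker, hearer, rng, new_word):
    """One pairwise interaction of the minimal Naming Game for a single object.

    vocabs is a list of sets (one word list per agent), modified in place.
    new_word is a zero-argument callable returning a never-used word.
    Returns True on success, False on failure.
    """
    if not vocabs[speaker]:
        # Empty list: the speaker invents a brand new word.
        vocabs[speaker].add(new_word())
    # The speaker utters one of its words, chosen uniformly at random
    # (sorted only to make runs reproducible for a given seed).
    word = rng.choice(sorted(vocabs[speaker]))
    if word in vocabs[hearer]:
        # Success: both agents drop every competing word.
        vocabs[speaker] = {word}
        vocabs[hearer] = {word}
        return True
    # Failure: the hearer adds the uttered word to its list.
    vocabs[hearer].add(word)
    return False


def run_naming_game(n_agents, seed=0, max_steps=10**6):
    """Mean-field NG from empty lists; stops at single-word consensus."""
    rng = random.Random(seed)
    new_word = count().__next__  # hypothetical words: increasing integers
    vocabs = [set() for _ in range(n_agents)]
    for step in range(1, max_steps + 1):
        # Mean field: any two distinct agents may be paired.
        speaker, hearer = rng.sample(range(n_agents), 2)
        ng_interaction(vocabs, speaker, hearer, rng, new_word)
        if len(vocabs[0]) == 1 and all(v == vocabs[0] for v in vocabs):
            return step, vocabs
    return max_steps, vocabs
```

For example, `run_naming_game(20, seed=1)` returns the number of interactions needed before all 20 agents share the same single word.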
The simplicity of the NG in describing the emergence of a lexicon also relies on the fact that competing words in an individual vocabulary are not weighted, so that they can be easily stored or deleted [4]. In fact, it turns out that weights are not necessary for convergence to a single-word consensus state, contrary to what was assumed in the seminal work in this research field [3]. Every agent is a box that could potentially contain an infinite number of words, so the number of states that can define an agent is limited only by the number of words diffused in the population at the beginning of the process (so that anyone can speak with at least one word).
In this work we address not only the competition of a language form with other ones, but also the interactions between them, with the possibility of producing new ones. We investigate conditions for the success of a new idiom as the product of the synthesis and competition of preexisting idioms. To this purpose, we introduce into the basic Naming Game a stochastic interaction scheme accounting for this synthesis. We show in a very simple case that the success of the new spoken form at the expense of the old ones depends both on the stochastic parameters and on the fractions of the different idioms spoken by the population at the beginning of the process. We have simulated this process starting from an initial condition where a fraction $n_A^0$ of the population speaks with A and the remaining fraction $n_B^0 = 1 - n_A^0$ with B. When the two fractions are roughly of the same size, the new form C, created from the synthesis of A and B, establishes itself and supplants the other two forms (we shall therefore refer to our model as the “ABC model” in the following). Instead, when $n_A^0$ (or symmetrically $n_B^0$) is above a threshold depending on the chosen stochastic parameters, the term A (or symmetrically B) establishes itself, namely one of the starting idioms prevails and settles in the population.
2 The ABC Model
The model we propose here is based on a mean field topology involving $N$ agents, i.e. any two agents picked at random can interact. Despite this rather simple topology of interactions, the proposed model shows a rich variety of behaviors. In the basic Naming Game, the initial condition is fixed with an empty word list for each agent [4]. If an agent chosen as speaker still has an empty list, he invents a new word and utters it. In our proposed model all agents are initially assigned a given word, either A or B, so there is no possibility to invent a brand new word unless an agent with both A and B in his list is selected. In that case, we introduce a finite probability $\gamma$ that his list containing A and B turns into a new entry C (Fig. 2). We interpret $\gamma$ in one of two ways. It may measure the need of agents to obtain a sort of hybridization through which they understand each other. Alternatively, it may measure the natural tendency of two different language forms to be synthesized together. In the latter case, different values of $\gamma$ would depend on the language forms considered in a real multilingual scenario.
The stochastic rule may be applied either in the initial interaction phase, by changing the vocabulary of the speaker, or at the final stage, by changing the state of the hearer after the interaction with the speaker. In this paper we show the results obtained by applying the transformation AB → C before the interaction, and also ABC → C when the speaker has an ABC vocabulary. Applying the transformation after the interaction does not change the results from a qualitative point of view. It only shifts the transition lines between the final states in the space of parameters defining the stochastic process.
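The speaker-side hybridization rule can be sketched as a single function. It assumes that any list holding both A and B (that is, AB or ABC) collapses to C with probability $\gamma$. The name `hybridize` and the frozenset representation of vocabularies are hypothetical choices, not taken from the paper.

```python
import random


def hybridize(vocab, gamma, rng):
    """Hybridization step of the ABC model, applied to the speaker's list.

    Assumption: any list containing both A and B (i.e. AB or ABC) is
    replaced, with probability gamma, by the single entry C.
    """
    if "A" in vocab and "B" in vocab and rng.random() < gamma:
        return frozenset({"C"})
    return vocab


rng = random.Random(0)
print(hybridize(frozenset({"A", "B"}), 1.0, rng))  # frozenset({'C'})
print(hybridize(frozenset({"A", "C"}), 1.0, rng))  # unchanged: no B present
```

Applying the same function to the hearer after the interaction, instead of to the speaker before it, would give the alternative variant mentioned above.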
A further stochastic modification of the basic Naming Game interaction, first introduced in [8], has also been adopted here. It accounts for the emergence or persistence of a multilingual final state, where more than one word is associated with a single object. This is done by mimicking a sort of confidence degree among agents. In case of a successful interaction, namely when the hearer shares the word uttered by the speaker, the update of the two involved vocabularies takes place with probability $\beta$; the case $\beta = 1$ obviously reduces to the basic NG. Baronchelli et al. [8] showed in their model (which corresponds to our model at $\gamma = 0$) that a transition occurs at $\beta = 1/3$. For $\beta > 1/3$ the system converges to the usual one-word consensus state (in this case only A or B). For $\beta < 1/3$ the system converges to a mixed state where more than one word remains, so that single- and multi-word vocabularies (namely A, B and AB) coexist at the end of the process. Their mean field master equations describe the evolution of the vocabulary frequencies $n_A$, $n_B$ and $n_{AB}$. A linear stability analysis of these equations shows the following. The steady state $n_A = 1$, $n_B = n_{AB} = 0$ (or symmetrically $n_B = 1$, $n_A = n_{AB} = 0$) is stable for $\beta > 1/3$ and turns unstable if $\beta < 1/3$. There, conversely, the steady state $n_A = n_B = b(\beta)$, $n_{AB} = 1 - 2b(\beta)$ emerges as a stable state, with $b(\beta)$ a simple algebraic expression of the parameter $\beta$. In our work, as shown next, we found that the order-disorder transition (i.e. single-word vs. multi-word final state) at $\beta = 1/3$ persists for all values of $\gamma$.
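The stability argument for $\gamma = 0$ can be worked through explicitly. This assumes the mean-field equations take the form below, which we rederive from the interaction rules: uniform choice of the uttered word, and an update with probability $\beta$ on success. They are meant to match those of [8], but the explicit expressions, including the closed form of $b(\beta)$, are our own reconstruction and not a quotation of [8]. The equations are

$$
\begin{aligned}
\frac{dn_A}{dt} &= -n_A n_B + \beta\, n_{AB}^2 + \frac{3\beta - 1}{2}\, n_A n_{AB}, \\
\frac{dn_B}{dt} &= -n_A n_B + \beta\, n_{AB}^2 + \frac{3\beta - 1}{2}\, n_B n_{AB}, \qquad n_{AB} = 1 - n_A - n_B .
\end{aligned}
$$

Linearizing around the consensus point $n_A = 1$ in the independent variables $(n_B, n_{AB})$ gives a triangular Jacobian:

$$
\frac{d}{dt}\begin{pmatrix} n_B \\ n_{AB} \end{pmatrix} \simeq
\begin{pmatrix} -1 & 0 \\ 2 & -\frac{3\beta - 1}{2} \end{pmatrix}
\begin{pmatrix} n_B \\ n_{AB} \end{pmatrix},
\qquad \lambda_1 = -1, \quad \lambda_2 = -\frac{3\beta - 1}{2},
$$

so consensus is locally stable exactly when $\beta > 1/3$. Setting $n_A = n_B = b$ and $n_{AB} = 1 - 2b$ in the stationarity condition gives

$$
2\beta\, b^2 - (1 + 5\beta)\, b + 2\beta = 0
\quad\Longrightarrow\quad
b(\beta) = \frac{1 + 5\beta - \sqrt{1 + 10\beta + 9\beta^2}}{4\beta},
$$

where the other root exceeds $1/2$ and is unphysical. The difference $d = n_A - n_B$ obeys exactly

$$
\frac{dd}{dt} = \frac{3\beta - 1}{2}\, n_{AB}\, d ,
$$

so asymmetric perturbations of the symmetric mixed state decay only for $\beta < 1/3$.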
The numerical stochastic simulation of the process for selected values of $\beta$ and $\gamma$ shows that the system has a varied final phase space (left panels of Figs. 3 and 4). The transition line at $\beta = 1/3$ persists. For $\beta > 1/3$ the system converges to a one-word final state, with a single word among A, B and C. For $\beta < 1/3$ it converges to a multi-word state, with one or more words spoken by each agent.
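A minimal agent-based sketch of the stochastic ABC dynamics can be built from the rules as described in the text. It assumes:

- mean-field pairing of two distinct random agents;
- hybridization AB → C and ABC → C applied to the speaker before the interaction, with probability $\gamma$;
- an update on success with probability $\beta$;
- on failure, the hearer adds the uttered word.

All names are hypothetical, and this is not the authors' simulation code.

```python
import random
from collections import Counter


def hybridize(vocab, gamma, rng):
    # Assumed rule: a list holding both A and B (AB or ABC) becomes C w.p. gamma.
    if "A" in vocab and "B" in vocab and rng.random() < gamma:
        return frozenset({"C"})
    return vocab


def simulate_abc(n_agents, n_a0, beta, gamma, max_steps=10**6, seed=0):
    """Stochastic mean-field ABC model; returns (steps, Counter of final lists)."""
    rng = random.Random(seed)
    n_a = round(n_a0 * n_agents)
    vocabs = [frozenset({"A"})] * n_a + [frozenset({"B"})] * (n_agents - n_a)

    def consensus():
        # Absorbing state: every agent holds the same single word.
        return len(vocabs[0]) == 1 and all(v == vocabs[0] for v in vocabs)

    steps = 0
    while steps < max_steps and not consensus():
        # Mean field: any two distinct agents may interact.
        s, h = rng.sample(range(n_agents), 2)
        vocabs[s] = hybridize(vocabs[s], gamma, rng)  # before the interaction
        word = rng.choice(sorted(vocabs[s]))
        if word in vocabs[h]:
            # Success: both lists collapse to the word only with probability beta.
            if rng.random() < beta:
                vocabs[s] = vocabs[h] = frozenset({word})
        else:
            # Failure: the hearer adds the uttered word.
            vocabs[h] = vocabs[h] | {word}
        steps += 1
    return steps, Counter("".join(sorted(v)) for v in vocabs)
```

Scanning a grid of $(\beta, \gamma, n_A^0)$ values with several seeds per point is one way to obtain a stochastic phase diagram of the kind discussed here. For $\beta < 1/3$ the loop simply stops at `max_steps`, since consensus is not expected there.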
This result is confirmed by integrating the mean field master equation of the model, which describes the temporal evolution of the fractions $n_A$, $n_B$, $n_C$, $n_{AB}$, $n_{AC}$, $n_{BC}$, $n_{ABC}$ of all vocabulary species present in the system. The results of this integration, performed with a fourth-order Runge-Kutta algorithm, display the same convergence behaviour as the stochastic model (right panels of Figs. 3 and 4), though obviously without stochastic noise.
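The full seven-species master equation is not reproduced here. As a hedged illustration, the sketch below applies a fixed-step fourth-order Runge-Kutta scheme to the $\gamma = 0$ reduction (A, B and AB only). It uses the mean-field equations we assume for the model of [8]: $\dot n_A = -n_A n_B + \beta n_{AB}^2 + \frac{3\beta-1}{2} n_A n_{AB}$, the same expression with $A \leftrightarrow B$ for $\dot n_B$, and $n_{AB} = 1 - n_A - n_B$. The step size, the time horizon and the function names are arbitrary, hypothetical choices.

```python
import math


def rhs(state, beta):
    """Assumed gamma = 0 mean-field equations in the variables (n_A, n_B)."""
    n_a, n_b = state
    n_ab = 1.0 - n_a - n_b
    c = (3.0 * beta - 1.0) / 2.0
    common = -n_a * n_b + beta * n_ab * n_ab
    return (common + c * n_a * n_ab, common + c * n_b * n_ab)


def rk4(state, beta, dt, n_steps):
    """Classic fourth-order Runge-Kutta integration with fixed step dt."""
    for _ in range(n_steps):
        k1 = rhs(state, beta)
        k2 = rhs(tuple(x + 0.5 * dt * k for x, k in zip(state, k1)), beta)
        k3 = rhs(tuple(x + 0.5 * dt * k for x, k in zip(state, k2)), beta)
        k4 = rhs(tuple(x + dt * k for x, k in zip(state, k3)), beta)
        state = tuple(
            x + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for x, a, b, c, d in zip(state, k1, k2, k3, k4)
        )
    return state


def b_symmetric(beta):
    """Physical root b(beta) of the symmetric fixed point n_A = n_B = b."""
    return (1 + 5 * beta - math.sqrt(1 + 10 * beta + 9 * beta**2)) / (4 * beta)


# Same initial condition n_A = 0.6, n_B = 0.4 on both sides of beta = 1/3.
high = rk4((0.6, 0.4), beta=0.5, dt=0.01, n_steps=20000)  # t = 200
low = rk4((0.6, 0.4), beta=0.2, dt=0.01, n_steps=20000)
print(high, low, b_symmetric(0.2))
```

Above $\beta = 1/3$ the trajectory reaches the consensus $n_A \to 1$. Below it, the trajectory settles on the symmetric mixed state $n_A = n_B = b(\beta)$, even though it started from $n_A > n_B$.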
2.1 High confidence
Looking at the region $\beta > 1/3$ of Fig. 3, we note a transition interval between two final states. In one, only A or only B survives (red region); in the other, only C survives (orange region). The border dividing these two domains is fuzzy in the left panel of the figure. This can be ascribed to finite size effects, since the separation line gets sharper as the number of agents $N$ grows. It eventually collapses towards the sharp line obtained by integrating the mean field master equation (right panel of the figure), which we report in the Appendix. The linear stability analysis of the cumbersome mean field master equation reveals that these two phases are both locally stable, and that both turn unstable when $\beta < 1/3$. Convergence to one or the other phase therefore depends on the initial conditions, i.e. on which basin of attraction the system enters during its dynamical evolution. To demonstrate this, we fixed $\beta$ and varied both $\gamma$ and the initial conditions $n_A^0$, $n_B^0$, with $n_A^0 + n_B^0 = 1$. The result reported in Fig. 4 clearly shows a dependence on the initial conditions. In particular, if the initial fraction $n_A^0$ is sufficiently large, convergence to the C phase disappears.
The corresponding threshold value of $n_A^0$ decreases slightly as $\beta$ is decreased towards values slightly above the transition signalled by $\beta = 1/3$.
By numerically solving the mean field master equation, we analyzed the evolution of the system close to the transition between two regions. In the orange region the system converges to the state C. In the red region it converges to either A or B, depending on whether $n_A^0 > n_B^0$ or $n_A^0 < n_B^0$. Fig. 5 shows the result obtained for fixed values of $\beta$ and $n_A^0$. Initially, the fractions of both A (black curve) and B (red curve) decrease in favour of AB (blue curve). Thereafter, AB also starts to decrease, since the mixed AB and ABC states can be turned directly into C, causing an increase of C (green curve). The fractions of the states containing B vanish quite soon, mainly because fewer agents have B in their vocabulary. The states involving A and C survive and reach a metastable situation, clearly visible in the middle panel of Fig. 5. The lifetime of this metastable state diverges as the parameters approach the transition between the two regions, so the overall convergence time diverges as well. This is precisely where the stochastic simulation departs from the solution of the master equation. In the stochastic model, a small random fluctuation in the fraction of either A or C decides which of the two states the system converges to. The deterministic integration, disregarding numerical round-off errors, always ends in the win of the same state, A or C, depending on which side of the transition the parameters lie. The fuzziness visible in all the figures related to the stochastic model is a direct consequence of those random fluctuations.
Another interesting area in the phase space of the model is the boundary around $\beta = 1/3$. There the system switches from convergence to a single state ($\beta > 1/3$) to the coexistence of multiple states ($\beta < 1/3$). Whenever $\beta > 1/3$ the consensus phase is the absorbing state. As $\beta \to 1/3$ from above, the time of convergence towards it diverges following a power law with the same exponent as in the case $\gamma = 0$, for which we recover the results of [8] (Fig. 6). Of course, for $\gamma = 0$ the C state is no longer involved and the competition is only between A and B. Moreover, Fig. 6 shows that the convergence time to the one-word consensus state is highest when $\gamma = 0$ and decreases as $\gamma$ increases. This result is somewhat counterintuitive, since we would expect three states A, B and C to converge more slowly than only two states A and B. Actually, in the first case a supplementary channel leads to the stable state, namely AB → C (neglecting of course the rare ABC → C), and this accelerates the convergence.
We linearized the mean field master equation around the absorbing consensus points for $\beta > 1/3$. The seven fractions are constrained to sum to one, which leaves six independent variables. The linearization delivers six negative eigenvalues, confirming that the points in the orange and red regions of Fig. 3 are locally stable. Moreover, these eigenvalues do not depend on $\gamma$. This shows that the initial conditions $n_A^0$ and $n_B^0$ are crucial in determining which of the two basins of attraction is entered. As a consequence of this independence of $\gamma$, the equation of the line dividing the orange and red regions cannot be calculated easily.
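The linear stability check can be automated with a finite-difference Jacobian. The seven-species equations are not reproduced here, so the sketch applies the technique to the $\gamma = 0$ reduction, using the mean-field equations we assume for [8]. In the independent variables $(n_B, n_{AB})$, the two eigenvalues at the consensus point $n_A = 1$ should come out as $-1$ and $-(3\beta - 1)/2$. The function names are hypothetical, and only the standard library is used.

```python
import cmath


def rhs(x, beta):
    """Assumed gamma = 0 mean-field field in the variables (n_B, n_AB)."""
    n_b, n_ab = x
    n_a = 1.0 - n_b - n_ab
    c = (3.0 * beta - 1.0) / 2.0
    dn_a = -n_a * n_b + beta * n_ab**2 + c * n_a * n_ab
    dn_b = -n_a * n_b + beta * n_ab**2 + c * n_b * n_ab
    # n_AB = 1 - n_A - n_B, so its time derivative is -(dn_A + dn_B).
    return (dn_b, -(dn_a + dn_b))


def jacobian(f, x0, beta, h=1e-6):
    """Central finite-difference Jacobian of f at x0, as a list of rows."""
    n = len(x0)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        plus, minus = list(x0), list(x0)
        plus[j] += h
        minus[j] -= h
        f_plus, f_minus = f(plus, beta), f(minus, beta)
        for i in range(n):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * h)
    return jac


def eigenvalues_2x2(m):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    half_tr = (m[0][0] + m[1][1]) / 2.0
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = cmath.sqrt(half_tr**2 - det)
    return half_tr + disc, half_tr - disc


def consensus_eigenvalues(beta):
    """Sorted real parts of the eigenvalues at the consensus point n_A = 1."""
    jac = jacobian(rhs, [0.0, 0.0], beta)
    return sorted(ev.real for ev in eigenvalues_2x2(jac))


for beta in (0.2, 0.5):
    print(beta, consensus_eigenvalues(beta))
```

For the full ABC model, one would pass the six independent fractions to the same `jacobian` helper. The 2x2 formula would then be replaced by a general eigenvalue routine.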
2.2 Low confidence
For $\beta < 1/3$ we get multi-word final states. The green color in Fig. 3 stands for an asymptotic multi-word state in which A prevails over B. Symmetrically, A is replaced by B when the initial conditions favour B rather than A. The corresponding asymptotic fractions depend on $\beta$ in the same way as $n_A$ and $n_{AB}$ do in [8].
Instead, the blue color of Fig. 3 represents an asymptotic state where all vocabulary types are present (A, B, C, AB, AC, BC and ABC). In this case the vocabulary fractions depend on both $\beta$ and $\gamma$. The white dots in the left panel of Fig. 3 are points where the system had not reached a clear stable state within the chosen simulation time. They tend to disappear as the population size $N$ grows. Indeed, they are absent from the final phase space described by the mean field master equation. Contrary to the case $\beta > 1/3$, the final state does not depend on the particular initial conditions, provided that initially $n_A^0 > n_B^0$. Fixing $\beta$ and varying $\gamma$ and the initial condition $n_A^0$ gives the steady behaviour shown in Fig. 7.
We also linearized the mean field master equation around the fixed points with $\beta < 1/3$ in the green region of Fig. 3. These turn out to be attractive points (six negative eigenvalues) irrespective of the initial condition, provided that $n_A^0 > n_B^0$. The equation of the transition line dividing the blue and green regions can be inferred numerically, again through the linearization of the master equation; in particular, one point of this line can be found analytically. The independence of the initial conditions makes the region $\beta < 1/3$ substantially different from the complementary region $\beta > 1/3$.
3 Conclusions
We modeled the emergence of a new language as the result of the mutual interaction of two populations of agents. The populations initially speak different idioms, A and B, and interact with each other without restriction (mean field). Such tight connections between individuals speaking two different idioms are certainly unrealistic. However, the same reasoning can be extended to the birth of single hybrid words resulting from the interaction of two pronunciation variants for the same object. An example is the English word rowel, perhaps a hybridization of the Latin rota and the English wheel.
Three parameters govern the time evolution of the model and characterize the final asymptotic state:

- the confidence $\beta$, measuring the tendency of a hearer to adopt the shared word used by the speaker;
- the probability $\gamma$ that the two forms A and B are synthesized into the form C;
- the initial condition $n_A^0$ (with $n_B^0 = 1 - n_A^0$).

It turns out that:

for $\beta < 1/3$ the system converges to multi-word states, all containing a fraction of the state C. These states do not depend on the initial conditions, provided that $n_A^0 > n_B^0$;

for $\beta > 1/3$ the system converges to a consensus state where all agents end up with the same state, either A, B or C. All three consensus states are locally stable independently of $\gamma$. The transition line separating convergence to A or B from convergence to C depends on the initial distribution of A and B, with $n_A^0 + n_B^0 = 1$. Moreover, when the population starts with equal fractions of A and B, the invention of C reduces the time of convergence to the consensus state (all agents speaking with C).
Interestingly, the modern view among linguists regards the birth and continuous development of all languages as the product of local interactions between the varied language pools of individuals [9]. These interactions continuously give rise to competition and exchange between different forms. They also give rise to new forms, created in order to reach an arrangement with the other speakers. If language is mainly a social product, the Naming Game seems a particularly fitting tool, as opposed to the old conception of pure languages as the product of an innate psychological background of individuals [10].
It would be interesting to apply our model to real language phenomena in which a sort of hybridization of two or more languages takes place in a contact language ecology. History offers many examples. One is the formation of the modern Romance languages of Europe from the contact of local Celtic populations with Latin, the colonial language of the Romans. A more recent example is the emergence of Creole languages in colonial regions where European colonialists and African slaves came into contact [11].
The starting point for a comparison of our model with this kind of phenomena would be to retrieve demographic data on the different ethnic groups at the moment they came together in the same territory, and to observe whether a new language became established. Our aim would obviously not be to understand how particular speaking forms emerged. Rather, we would want to understand whether the success of the new language forms correlates with the initial language demography. A more refined model would then also take into account the temporal evolution of the population due to reproduction and displacements. It would likewise account for the particular topology of the effective interactions in the population.
Acknowledgements
The authors wish to thank V. Loreto and X. Castelló for useful discussions. The present work is partly supported by the EveryAware European project, grant no. 265432 under FP7-ICT-2009-C.
Appendix: mean field master equation
The mean field master equation of the ABC model, in the case in which the speaker changes her vocabulary with the rule AB → C before the interaction, is the following:
(1)  
References
 [1] C. Castellano, S. Fortunato and V. Loreto, Statistical physics of social dynamics, Rev. Mod. Phys. 81, 591-646 (2009).
 [2] V. Loreto, A. Baronchelli, A. Mukherjee, A. Puglisi and F. Tria, Statistical physics of language dynamics, J. Stat. Mech. P04006 (2011).
 [3] L. Steels, A self-organizing spatial vocabulary, Artificial Life 2, 319-332 (1995).
 [4] A. Baronchelli, M. Felici, V. Loreto, E. Caglioti and L. Steels, Sharp transition towards shared vocabularies in multi-agent systems, J. Stat. Mech. P06014 (2006).
 [5] T. Satterfield, Toward a Sociogenetic Solution: Examining Language Formation Processes Through SWARM Modeling, Social Science Computer Review 19, 281-295 (2001).
 [6] M. Nakamura, T. Hashimoto and S. Tojo, Language Dynamics, Lecture Notes in Computer Science 5457, 614-625 (2009).
 [7] P. Strimling, M. Parkvall and F. Jansson, Modelling the evolution of creoles, in “The Evolution of Language: Proceedings of the 9th International Conference (EVOLANG9)”, World Scientific Publishing Company, pp. 464-465 (2012).
 [8] A. Baronchelli, L. Dall’Asta, A. Barrat and V. Loreto, Nonequilibrium phase transition in negotiation dynamics, Phys. Rev. E 76, 051102 (2007).
 [9] S. S. Mufwene, The ecology of language evolution, Cambridge: Cambridge University Press (2001).
 [10] L. Steels, Modeling the cultural evolution of language, Physics of Life Reviews 8, 339-356 (2011).
 [11] S. S. Mufwene, Population movements and contacts in language evolution, Journal of Language Contact-THEMA 1 (2007).