Complexity Reduction in the Negotiation of New Lexical Conventions
Abstract
When a population collectively invents new words for new concepts, conflicts in the form of synonymy and homonymy can quickly become numerous. Remembering all of them could cost too much memory, while remembering too few may slow down the overall process. Is there an efficient behavior that could help balance the two? The Naming Game is a multi-agent computational model for the emergence of language, focusing on the negotiation of new lexical conventions, in which a common lexicon self-organizes but goes through a phase of high complexity. Previous work has addressed the control of complexity growth in this particular model by allowing agents to actively choose what they talk about. However, those strategies relied on ad hoc heuristics that depend heavily on fine-tuning of parameters. We define here a new principled measure and a new strategy, based on the beliefs of each agent about the global state of the population. The measure does not rely on heavy computation and is cognitively plausible. The new strategy yields an efficient control of complexity growth, along with a faster agreement process. We also show that short-term memory is enough to build relevant beliefs about the global lexicon.
Keywords: language emergence, active learning, multi-agent model, control of complexity growth
1 Motivations
Lexical conventions constitute an important element of social interactions. They can emerge, evolve, or be learnt within a population, without necessarily having a centralized control. In other words, they can be negotiated through local interactions between individuals. In practice, this happens continuously in human societies, whether through the spread of new words and conventions, the acquisition of those conventions by infants or other learners, or even the emergence of new forms of communication. Despite the high complexity of the processes involved, humans deal with these issues quite efficiently.
In individuals, learning of high-complexity tasks can in general be facilitated by an active control of the complexity of learning situations, often driven by intrinsic motivation, for example the maximization of learning progress [Gottlieb2013, baldassarre2013, barto2013, oudeyer07]. This type of mechanism is also argued to be an evolutionary advantage for cognitive abilities [oudeyer_evolution], and can also be found in lexicon acquisition at the individual level [Partridge2015]. But does it have a significant impact on population-wide learning and on the dynamics of convention negotiation?
The Naming Game [steelskaplan1998_1, Wellens2012, Loreto2011, ke2002self] is a suitable framework to test this hypothesis. It is a class of multi-agent models of language emergence and evolution, where pairs of randomly selected individuals try to communicate by referring to some predefined meanings using words. At the beginning, they do not share any convention about word-meaning associations. Through repeated decentralized interactions, a common lexicon self-organizes. However, the process can be slow and pass through a high-complexity phase where agents memorize a lot of conflicting information, in the form of synonyms and homonyms.
It has already been shown that active learning mechanisms can increase convergence speed towards a shared lexicon in different language emergence models [Cornudella2015, Schueller2016]. The main idea behind those mechanisms is to allow agents to actively choose the topic of their communication, based on information collected during their past interactions and driven by control of complexity growth. However, the algorithms used so far are based on ad hoc heuristics and constrained interaction scenarios, and can depend heavily on fine-tuning of parameters that themselves depend on population size and on the number of words and meanings.
In previous work, an approximation of the global state is built by each agent using the information of past interactions, in the form of an average vocabulary of the population [oliphant1997, devylder2007]. Is it possible to design a new principled algorithm for an active topic choice based on such a representation? Could decisions be driven by both the compatibility of an agent’s own lexicon with this average vocabulary, and a reduction of both their complexities? Such an algorithm should rely on a time scale for the memory of past interactions: in the case of decentralized negotiation of a lexicon, conflicting conventions will necessarily appear and have to be forgotten in order to converge to a functional global vocabulary. Remembering them could slow down the self-organization process.
In this work, we define a principled measure of correlation between an agent’s lexicon and a local approximation of the average lexicon of the population. We build a strategy driven by the maximization of this value that is not computationally hard, so as to remain cognitively plausible. We study and discuss the impact of this strategy on convergence time and complexity growth, depending on the time scale used for memory.
2 Methods
2.1 The Naming Game
We define here precisely the Naming Game model that we used (see fig.1 for an overview). We need to specify three elements: the interaction scenario itself, how agents represent their lexicon, and how they update their lexicon at the end of each interaction.
It is a simple modification of the standard Naming Game scenario [Loreto2011, Wellens2012].
Interaction process
We reuse a previously defined interaction process called Speaker’s Choice [Schueller2016]. It allows one of the interacting agents, called the speaker, to actively choose the topic of the interaction.
Each interaction involves two agents, picked randomly from the population. One of them is assigned the role of speaker, and the other the role of hearer. The speaker chooses a topic and picks a word for this topic. If it does not yet have a word associated to the meaning used as topic, it invents a new meaning-word association. It utters this word, which the hearer interprets as a meaning, if it knows this word. If the interpreted meaning is the same as the topic, i.e. the meaning intended by the speaker, the communication is considered successful. Otherwise, it is considered a failure. See fig.2 for a detailed illustration of the interaction process.
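The interaction just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the simulations: lexicons are stored as sets of (meaning, word) pairs, the function name `interact` and its arguments are hypothetical, and invention is assumed to draw a word the speaker does not use yet. Lexicon updates that depend on the outcome are left out.

```python
import random


def interact(speaker_lex, hearer_lex, topic, all_words, rng=random):
    """Play one interaction; lexicons are sets of (meaning, word) pairs.

    Returns (word, success). Outcome-dependent lexicon updates are not
    handled here, apart from the speaker storing a newly invented word.
    """
    known = sorted(w for (m, w) in speaker_lex if m == topic)
    if known:
        word = rng.choice(known)
    else:
        # Assumption: invention draws a word the speaker does not use yet,
        # falling back to any word if all of them are already in use.
        used = {w for (_, w) in speaker_lex}
        free = [w for w in all_words if w not in used] or list(all_words)
        word = rng.choice(free)
        speaker_lex.add((topic, word))
    # The hearer interprets the word as a meaning, if it knows the word.
    candidates = sorted(m for (m, w) in hearer_lex if w == word)
    interpreted = rng.choice(candidates) if candidates else None
    return word, interpreted == topic


speaker = {("m0", "a")}
hearer = {("m0", "a")}
print(interact(speaker, hearer, "m0", ["a", "b"]))  # ('a', True)
```

When the hearer does not know the uttered word, no meaning is interpreted and the interaction counts as a failure.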
Vocabulary Representation
Vocabularies, or lexicons, are sets of associations between meanings and words. In this work, we consider only a finite set of meanings and a finite set of words, of sizes $M$ and $W$ respectively. In this context, vocabularies can be represented as association matrices, where each row corresponds to a meaning, and each column to a word. This representation has been extensively used in related work [oliphant1997, steelskaplan1998_1, ke2002self]. Two parts of the lexicon are distinguished: the coding or production part, which maps a meaning to a set of words weighted by probabilities of usage, and the decoding or interpretation part, which maps a word to a set of meanings that can be interpreted from this word, also weighted by probabilities.
We represent the vocabulary of an agent as a matrix $V$ of size $M \times W$, with values of $1$ for each word-meaning association used by the agent. Each agent starts with an empty vocabulary, a matrix filled with zeros. The coding matrix $C$ and decoding matrix $D$ are derived from $V$ by normalizing respectively over rows and columns:
$$C_{m,w} = \frac{V_{m,w}}{\sum_{w'} V_{m,w'}}, \qquad D_{m,w} = \frac{V_{m,w}}{\sum_{m'} V_{m',w}} \tag{1}$$
Normalization factors are used only if they are nonzero; otherwise the corresponding row or column stays at zero. In practice, when coding a meaning $m$, a word is sampled using the distribution $C_{m,\cdot}$. When decoding a word $w$, a meaning is interpreted, sampled from the distribution $D_{\cdot,w}$. In our case, these distributions are uniform either on the set of words associated to $m$ for coding, or on the set of meanings associated to $w$ for decoding. Those two sets change over time, during the vocabulary update.
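As a small worked example of the row and column normalization described above, the following sketch derives both matrices from a 0/1 association matrix stored as nested lists. The function names are illustrative assumptions; rows or columns that sum to zero are left at zero, as stated in the text.

```python
def coding_matrix(V):
    """Row-normalize V: each meaning's row becomes a distribution over words."""
    C = []
    for row in V:
        total = sum(row)
        # Normalize only non-empty rows; empty rows stay at zero.
        C.append([v / total if total else 0.0 for v in row])
    return C


def decoding_matrix(V):
    """Column-normalize V: each word's column becomes a distribution over meanings."""
    n_words = len(V[0])
    col_sums = [sum(row[w] for row in V) for w in range(n_words)]
    return [
        [row[w] / col_sums[w] if col_sums[w] else 0.0 for w in range(n_words)]
        for row in V
    ]


# Two meanings (rows), three words (columns): the first meaning has two
# synonyms, and the second word is a homonym shared by both meanings.
V = [[1, 1, 0],
     [0, 1, 0]]
C = coding_matrix(V)    # [[0.5, 0.5, 0.0], [0.0, 1.0, 0.0]]
D = decoding_matrix(V)  # [[1.0, 0.5, 0.0], [0.0, 0.5, 0.0]]
```

The nonzero entries in a row of the coding matrix, or in a column of the decoding matrix, are all equal, which is the uniform sampling mentioned in the text.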
Vocabulary Update Policy
At the end of each interaction, each agent takes into account the result of the interaction by modifying its lexicon. Various policies have been described and studied in previous work [Wellens2012]. We use the one called Minimal Naming Game.
In this policy, updates work as follows: when the communication fails, both agents add the used word-meaning association (the meaning used as a topic by the speaker, and the word uttered by the speaker) to their lexicon, and do nothing if they already had it. If the communication is successful, not only do they add this association to their respective lexicons, they also remove any conflicting synonyms and homonyms. See fig.3 for an illustration of the update policy in both cases.
Typically, among existing policies, Minimal NG and another one called Basic Lateral Inhibition are used: they are more realistic as they allow synonymy and homonymy, and they yield faster agreement. Moreover, Minimal NG has been shown to yield dynamics similar to those of Basic Lateral Inhibition, while being simpler and parameter-free, whereas the latter depends on three parameters. This is why we use Minimal NG as the vocabulary update policy.
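The update rule above can be written compactly when a lexicon is stored as a set of (meaning, word) pairs. This is a minimal sketch under that assumed representation, with a hypothetical function name; the same function would be applied to the speaker and to the hearer.

```python
def update_minimal_ng(lexicon, meaning, word, success):
    """Apply the Minimal Naming Game update in place to one agent's lexicon.

    lexicon is a set of (meaning, word) pairs; (meaning, word) is the
    association used in the interaction.
    """
    if success:
        # Drop synonyms (other words for this meaning) and homonyms
        # (other meanings for this word) before keeping the winner.
        conflicts = {(m, w) for (m, w) in lexicon if m == meaning or w == word}
        lexicon.difference_update(conflicts)
    # On failure this only adds the association (a no-op if already present).
    lexicon.add((meaning, word))
    return lexicon


agent = {("m0", "a"), ("m0", "b"), ("m1", "a")}
update_minimal_ng(agent, "m2", "c", success=False)
# agent now also contains ("m2", "c")
update_minimal_ng(agent, "m0", "a", success=True)
# agent == {("m0", "a"), ("m2", "c")}
```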
2.2 Measures
The self-organization process happening while simulating the Naming Game has complex dynamics, and goes through various states before reaching global consensus. We describe those dynamics as a convergence process towards the state where all agents share the exact same lexicon, with exactly one word for each meaning and neither synonymy nor homonymy. This state is stable: lexicons will not change anymore, whatever the modalities of the interaction (which agent is the speaker, which is the hearer, and which meanings and words are used). Convergence and stability for different types of Naming Games have been proved analytically [devylder2007]. In this paper, we do not focus on whether the model converges or not, but on the speed and complexity properties of the dynamics before convergence. Measures for each of those aspects, used to describe the system while in this intermediate state, were defined in previous work [Loreto2011]. We distinguish local measures, accessible to each agent, from global measures, computed on the whole population.
TCS: Theoretical Communicative Success
The Theoretical Communicative Success is a measure of distance to the fully converged state. First, for each meaning, we can consider the probability of having a successful communication when using this meaning as a topic, given a state of the population. The TCS is the average of those probabilities over all possible meanings. In the case of Random Topic Choice, this measure coincides with the general probability of having a successful interaction. By definition, it is a global measure, not accessible to individual agents. To retrieve its value, we can either estimate it using a snapshot of the population and a Monte Carlo method with random topic choice, or compute it exactly. To detail the exact computation, we first need to define the probability of success between the vocabularies $V^A$ and $V^B$ of two given agents $A$ and $B$. As detailed in the previous section, a vocabulary has two components: a coding part, used to find words associated to a meaning, and a decoding part, used to find meanings associated to a word. For vocabulary $V^A$, we then have the two matrices $C^A$ and $D^A$. If $A$ is the speaker and $B$ the hearer, $A$ is coding and $B$ decoding, hence the formula for the probability of success in this case, averaged over all possible meanings:
$$P(A \to B) = \frac{1}{M} \sum_{m} \sum_{w} C^A_{m,w} \, D^B_{m,w} \tag{2}$$
Because we do not necessarily know before an interaction which agent will be the speaker and which will be the hearer, the two situations ($A$ speaker and $B$ hearer, or $B$ speaker and $A$ hearer) are considered equiprobable. The final value is the mean of $P(A \to B)$ and $P(B \to A)$.
To scale up to population level, one can compute an average vocabulary $\bar{V}$ for the whole population, and then the probability of success for an interaction between this lexicon and itself. For a large enough population, this value is a good approximation of the probability of success. $\bar{V}$ is an elementwise average of the lexicon matrices of all agents.
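The exact computation described above can be illustrated as follows. This is a sketch under assumptions, not the paper's implementation: vocabularies are nested lists of 0/1 values, all function names are hypothetical, and the coding and decoding parts of the average vocabulary are obtained by normalizing the averaged matrix over rows and columns.

```python
def normalize_rows(V):
    """Coding part: each non-empty row sums to 1."""
    return [[v / sum(row) if sum(row) else 0.0 for v in row] for row in V]


def normalize_cols(V):
    """Decoding part: each non-empty column sums to 1."""
    sums = [sum(row[w] for row in V) for w in range(len(V[0]))]
    return [[row[w] / sums[w] if sums[w] else 0.0 for w in range(len(row))]
            for row in V]


def success_probability(C_speaker, D_hearer):
    """Probability of success, averaged over all meanings used as topic."""
    n_meanings = len(C_speaker)
    total = sum(c * d
                for c_row, d_row in zip(C_speaker, D_hearer)
                for c, d in zip(c_row, d_row))
    return total / n_meanings


def theoretical_communicative_success(vocabularies):
    """Success probability of the population-average vocabulary with itself."""
    n = len(vocabularies)
    n_m, n_w = len(vocabularies[0]), len(vocabularies[0][0])
    mean_V = [[sum(V[m][w] for V in vocabularies) / n for w in range(n_w)]
              for m in range(n_m)]
    return success_probability(normalize_rows(mean_V), normalize_cols(mean_V))


agreed = [[1, 0], [0, 1]]
swapped = [[0, 1], [1, 0]]
print(theoretical_communicative_success([agreed, agreed]))   # 1.0
print(theoretical_communicative_success([agreed, swapped]))  # 0.5
```

Because the average vocabulary is compared with itself, both speaker and hearer roles give the same value here, so no explicit mean over the two directions is needed. The second example also shows why the approximation is only meant for large populations: with two agents, it includes each agent talking to itself.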
When using random topic choice, this value abruptly goes from 0 to 1 after a certain number of interactions. These dynamics can be seen in fig.4, where random topic choice is shown alongside the active strategies explained in a following section. In practice, we use Monte Carlo estimation for the values at population level over time, and the exact computation for the active topic choice strategy (see following section), as it requires more precision and the population vocabulary is already built.
Local Complexity
The starting state of an agent’s vocabulary is empty (all-zero matrices), and the end state is identical coding and decoding matrices, with exactly one distinct word per meaning. But between those two situations, which states does the vocabulary go through? How much conflicting information (synonymy and homonymy) has to be considered?
For each agent, we can define a local complexity measure by counting the number of distinct associations present in the vocabulary. In our case, this is exactly the sum of all elements of the matrix $V$. At the beginning of a simulation, while the vocabulary is empty, this measure equals 0. At the end, its value is the number of meanings $M$. When using random topic choice, there is a fast growth to a maximum, before a slow decrease to the final value (see fig.4). This measure is nearly proportional to the minimal memory needed to represent the lexicon (as a sparse matrix or a list of word-meaning associations), and therefore should remain low in a cognitively plausible situation.
2.3 Active Topic Choice Strategy
The main contribution of this work is the definition of an active strategy for the choice of the topic in each interaction, by comparison to the usual choice of picking meanings randomly (with a uniform distribution over the space of meanings). The strategy has to be local, i.e. use only information available to the agent, namely its own vocabulary and results of past interactions it was involved in.
To both converge quickly and control complexity, behavior should be driven by maximization, at each interaction, of the Theoretical Communicative Success. However, this value is a global measure, and therefore not accessible at agent level. Agents only sample information about the global state of the population, or the average vocabulary $\bar{V}$, through their interactions as hearer or speaker.
The strategies for active topic choice found in previous work are separated into two levels of decision [Schueller2016]. First, a decision between exploring a new meaning (one that is associated to no word in the vocabulary so far) and choosing (exploiting) a meaning among those already used before. Then, if exploiting, a decision on which known meaning to use, depending on past interaction results.
The strategy introduced in this work keeps those two levels, while basing both decisions on a new measure called Local Approximated Probability of Success (LAPS), using a local representation of $\bar{V}$.
LAPS: Local Approximated Probability of Success
Here, we define an approximation $\tilde{V}$ of $\bar{V}$, using information sampled by agents during their interactions. We construct independently the coding and decoding parts $\tilde{C}$ and $\tilde{D}$. For every meaning $m$ (and every word $w$), we use a sliding window over the recent past interactions, of maximal length $\tau$ (the time scale parameter), and count the number of times it is associated to each word $w$ (or meaning $m$). This value divided by $\tau$ is the local estimate of the probability of another agent coding $m$ using $w$ (or decoding $w$ as $m$). With this, we retrieve the values of both matrices $\tilde{C}$ and $\tilde{D}$.
Let $(w^m_1, \dots, w^m_{n_m})$ be the memory of the past interactions where $m$ was the topic, if there have been $n_m \leq \tau$ such interactions. $w^m_i$ denotes the word used during the $i$-th remembered interaction of the agent using the meaning $m$. We can now build $\tilde{C}$:
$$\tilde{C}_{m,w} = \frac{1}{\tau} \sum_{i=1}^{n_m} \delta_{w, w^m_i} \tag{3}$$
Similarly, by defining $(m^w_1, \dots, m^w_{n_w})$ to be the memory of the past interactions where $w$ was the word used, with $n_w \leq \tau$ such interactions, we can build $\tilde{D}$:
$$\tilde{D}_{m,w} = \frac{1}{\tau} \sum_{i=1}^{n_w} \delta_{m, m^w_i} \tag{4}$$
Until $\tau$ interactions have been done with a given meaning or word, the corresponding row of $\tilde{C}$ or column of $\tilde{D}$ does not sum to $1$. The remaining probability weight is assumed to be associated with failure. If we normalized to $1$, then after a single interaction an agent would already estimate with 100% certainty that, for example, the same word-meaning association would be used again with the same topic. Without the normalization, this happens only after $\tau$ interactions. In other words, this reflects the lack of information due to small sample size. We define a Local Approximated Probability of Success, a local equivalent of the Theoretical Communicative Success, for an agent with vocabulary $V$ (and coding and decoding matrices $C$ and $D$):
$$\mathrm{LAPS}(V) = \frac{1}{2M} \sum_{m} \sum_{w} \left( C_{m,w} \, \tilde{D}_{m,w} + \tilde{C}_{m,w} \, D_{m,w} \right) \tag{5}$$
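The sliding-window estimate and the resulting LAPS value can be sketched as follows. Several points are assumptions made for illustration only: the class and function names, the use of a set of (meaning, word) pairs for the agent's own lexicon, the parameter name `tau` for the time scale, and the reading of LAPS as the mean of the two success probabilities (agent speaking to the estimated population, and the reverse).

```python
from collections import defaultdict, deque


class LocalEstimate:
    """Sliding-window memory of the associations seen in past interactions."""

    def __init__(self, tau):
        self.tau = tau
        # One window of at most tau entries per meaning and per word.
        self.words_for = defaultdict(lambda: deque(maxlen=tau))
        self.meanings_for = defaultdict(lambda: deque(maxlen=tau))

    def observe(self, meaning, word):
        self.words_for[meaning].append(word)
        self.meanings_for[word].append(meaning)

    def coding(self, meaning, word):
        # Count divided by tau (not by the window length), so that rows sum
        # to less than 1 until tau interactions have been observed.
        return self.words_for.get(meaning, ()).count(word) / self.tau

    def decoding(self, meaning, word):
        return self.meanings_for.get(word, ()).count(meaning) / self.tau


def laps(lexicon, estimate, n_meanings):
    """Mean success probability between the agent and its local estimate."""
    total = 0.0
    for (m, w) in lexicon:
        n_synonyms = sum(1 for (m2, _) in lexicon if m2 == m)
        n_homonyms = sum(1 for (_, w2) in lexicon if w2 == w)
        total += estimate.decoding(m, w) / n_synonyms  # agent as speaker
        total += estimate.coding(m, w) / n_homonyms    # agent as hearer
    return total / (2 * n_meanings)


est = LocalEstimate(tau=2)
lexicon = {("m0", "a")}
est.observe("m0", "a")
print(laps(lexicon, est, n_meanings=2))  # 0.25
est.observe("m0", "a")
print(laps(lexicon, est, n_meanings=2))  # 0.5
```

In this toy run with two meanings, one of them known, the value reaches 0.5 once the window of length 2 is filled with consistent observations, and it cannot go higher until a second meaning is explored.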
For some vocabulary update policies called lateral inhibition, similar matrices are computed, but used directly as an agent’s own representation of the lexicon. This usage does not prevent the complexity burst [Wellens2012].
Exploration vs. Exploitation
The first choice of our new strategy is between exploring new meanings or exploiting already known ones. Exploration should happen when agents are confident enough about their agreement with the rest of the population over their known meanings [Schueller2016]. The LAPS measure is itself a measure of confidence, and the simplest way to take this into account is to explore only when it reaches its maximum value $k/M$, where $k$ is the number of known meanings and $M$ the total number of meanings in the world. This value can actually be reached, thanks to the sliding window of parameter $\tau$.
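The exploration rule can be illustrated with a small predicate. It assumes that the maximum reachable value of LAPS is the fraction of known meanings, which follows if LAPS is averaged over all meanings; the function name, the tolerance, and the handling of the two boundary cases are illustrative choices rather than details from the paper.

```python
def should_explore(laps_value, n_known, n_meanings, tol=1e-9):
    """Explore a new meaning only when LAPS has reached its current maximum."""
    if n_known >= n_meanings:
        return False  # every meaning is already known: nothing to explore
    if n_known == 0:
        return True   # nothing to exploit yet
    # Assumed maximum: the fraction of meanings the agent already knows.
    return laps_value >= n_known / n_meanings - tol


print(should_explore(0.5, n_known=2, n_meanings=4))  # True
print(should_explore(0.4, n_known=2, n_meanings=4))  # False
```

The tolerance only guards against floating-point rounding when comparing with the maximum.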
Multi-Armed Bandit
The second decision process concerns the exploitation part, when picking the topic among the known meanings. We designed a behavior driven by the increase of the LAPS measure. In other words, agents seek the meaning that would yield the greatest increase of LAPS. However, computing the expected value of this increase is computationally hard, and therefore not suitable for a model of a cognitive process. We can only consider the process as a black box where, following a decision between a finite set of options, a reward value is obtained. This is exactly the definition of the Multi-Armed Bandit problem, associated with a class of reinforcement learning algorithms that have been extensively studied [bubeck2012]. The name comes from an analogy with a person trying to maximize their gain while facing a set of slot machines (also called one-armed bandits), and being able to use only one at a time. The probability distribution of the reward of each machine is unknown, and the player has to both collect information by playing and exploit the highest-rewarding machine with limited knowledge of its reward distribution, hence keeping a balance between exploration and exploitation. In our problem, we can see known meanings as the possible arms, and the increase of LAPS as the reward. Our case is quite specific, as 1) distributions are non-stationary, 2) they depend on past choices, and 3) the number of arms grows over time (and starts at 0). This specific situation led us to choose an algorithm where the weights associated with each arm decay over time, which lets them stay at the same order of magnitude as the initial weights of new arms [bclement2015mab]. Our algorithm depends on two parameters: an exploration rate, which balances reward-driven exploitation against random exploration between arms, and a time scale for the decay of weights. As a reward, we consider the increase of LAPS yielded by the interaction, or 0 if this increase is negative, in order to avoid negative weights. See algorithm 1.
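Since algorithm 1 is not reproduced in this text, the following is only a hedged sketch of a bandit with decaying weights in the spirit of the description above, not the authors' algorithm. The class name, the parameter names `gamma` (exploration rate) and `decay_time` (time scale of the decay), their default values, the multiplicative decay, and the mixing of weight-proportional and uniform selection are all assumptions.

```python
import random


class DecayingBandit:
    """Bandit over known meanings, with weights that decay over time."""

    def __init__(self, gamma=0.1, decay_time=10.0, initial_weight=1.0, rng=None):
        self.gamma = gamma                      # share of uniform exploration
        self.decay = 1.0 - 1.0 / decay_time     # per-update decay factor
        self.initial_weight = initial_weight
        self.rng = rng or random.Random()
        self.weights = {}

    def add_arm(self, arm):
        """Register a newly explored meaning with the initial weight."""
        self.weights.setdefault(arm, self.initial_weight)

    def probabilities(self):
        n = len(self.weights)
        total = sum(self.weights.values())
        probs = {}
        for arm, w in self.weights.items():
            greedy = w / total if total > 0 else 1.0 / n
            # Mix reward-driven choice with uniform random exploration.
            probs[arm] = (1 - self.gamma) * greedy + self.gamma / n
        return probs

    def choose(self):
        probs = self.probabilities()
        arms = list(probs)
        return self.rng.choices(arms, weights=[probs[a] for a in arms])[0]

    def update(self, arm, delta_laps):
        reward = max(delta_laps, 0.0)  # negative increases count as 0
        for a in self.weights:
            self.weights[a] *= self.decay
        self.weights[arm] += reward


bandit = DecayingBandit(gamma=0.1, decay_time=10.0, rng=random.Random(0))
bandit.add_arm("m0")
bandit.add_arm("m1")
bandit.update("m0", delta_laps=0.5)
probs = bandit.probabilities()  # "m0" is now more likely than "m1"
topic = bandit.choose()
```

With a strictly positive exploration rate, an arm whose weight has dropped to zero can still be selected, which matches the role the text gives to this parameter.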
3 Results
For all simulations, we set $N = M = W = 40$, compute up to 80,000 interactions and take the mean over 8 trials. The situation $M = W$ is the most constrained and complex to solve, as synonymy and homonymy are more probable. We ran simulations for , and set =. For the exploration rate, if the condition is respected, the actual value of does not matter much, as its only function is to avoid rare cases where some weights reach a value of 0 and cannot be selected anymore. We set =. However, we also ran simulations with pure random choices at the bandit level, to be able to study the influence of each level of our algorithm. This case identifies as =.
The evolution of the TCS and complexity over time is shown in fig.4, for several values of the time scale $\tau$. They are compared on the same plots with Random Topic Choice. We can see that convergence is faster for low values of $\tau$, the fastest being for $\tau = 2$, which is 4 times faster than Random Topic Choice. As for complexity, for all configurations except $\tau = 1$, values stay below the final level 40. After reaching a first threshold, they increase linearly with time, the slope being smaller for higher values of $\tau$. For =, the maximum value is only half of the maximum reached by Random Topic Choice. It is understandable that $\tau = 1$ is an outlier: in this case LAPS is an autocorrelation with the current interaction, since by definition older interactions are not taken into account.
In fig.5, we can see the dependency of convergence time on the time scale $\tau$, plotted for configurations =, = and the value for Random Topic Choice as a reference. Both have dynamics consistently faster than Random Topic Choice for low values of $\tau$, however = performs better. Except for $\tau = 1$, convergence time increases linearly with $\tau$ for both, with a minimum at $\tau = 2$, and a smaller slope for =.
4 Discussion
Results show that the new strategy presented in this paper 1) allows fast convergence, 2) efficiently controls complexity growth, 3) has dynamics that are consistent and highly correlated with parameter changes, and 4) benefits from both levels of the algorithm, each of which contributes to the increased performance.
With a time scale $\tau = 2$, each agent on average speaks only 15 times about each meaning before convergence (i.e. fewer times than half the population size), and information has already been both conveyed between all agents and disambiguated. The linearity of the evolution of TCS and complexity suggests that this algorithm may also scale efficiently to other values of $N$, $M$ and $W$. Compared to previous work, this topic choice algorithm is more robust, and optimal parameters are easier to find. It generalizes well to the Minimal Naming Game and can be used with all other Naming Game models.
LAPS is coherent from a cognitive point of view, and corresponds to an actual internal confidence about the quality of communication with the rest of the population. As stated in the results section, the case of a time scale $\tau = 1$ is an outlier, being a simple autocorrelation with the current interaction. The optimal value $\tau = 2$ is then the lowest possible value taking into account past interactions, i.e. it requires the least possible memory, which makes it credible for humans. Further work will be needed to determine for which values of $N$, $M$ and $W$ the value $\tau = 2$ remains optimal.
Acknowledgments
The IdEx program (Univ. de Bordeaux) allowed W. Schueller to visit V. Loreto. We thank Miguel Ibañez Berganza and Benjamin Clément for the fruitful discussions.
Source code
The Python code used for the simulations of this paper is available as open source software: https://github.com/wschuell/notebooks_cogsci2018