Measuring non-trivial compositionality in emergent communication
Compositionality is an important explanatory target in emergent communication and language evolution. The vast majority of computational models of communication account for the emergence of only a very basic form of compositionality: trivial compositionality. A compositional protocol is trivially compositional if the meaning of a complex signal (e.g. blue circle) boils down to the intersection of meanings of its constituents (e.g. the intersection of the set of blue objects and the set of circles). A protocol is non-trivially compositional (NTC) if the meaning of a complex signal (e.g. biggest apple) is a more complex function of the meanings of its constituents. In this paper, we review several metrics of compositionality used in emergent communication and experimentally show that most of them fail to detect NTC, i.e. they treat non-trivial compositionality as a failure of compositionality. The one exception is tree reconstruction error, a metric motivated by formal accounts of compositionality. These results emphasise important limitations of emergent communication research that could hamper progress on modelling the emergence of NTC.
Compositionality is an important explanatory target in emergent communication and language evolution as well as a goal in representation learning and natural language processing (Brighton, 2002; Lake et al., 2016; Chaabouni et al., 2020). The vast majority of computational models of communication account for the emergence of only a very basic form of compositionality: trivial (Steinert-Threlkeld, 2020) or naïve compositionality (Kharitonov and Baroni, 2020). Natural languages are non-trivially compositional (NTC) as they include phenomena like quantifiers, negation, word order and context-dependence. A communication protocol is NTC if for a certain complex signal (e.g. biggest apple) its meaning is not just the intersection of the meanings of its constituents but some more complex function of the constituents. Despite being a necessary milestone towards accounting for the evolution of language, NTC has received little attention from the machine learning and evolutionary linguistics communities. We conjecture that this state of affairs is partly due to an inability to quantitatively measure progress toward NTC in computational settings.
In this study, we review seven metrics of compositionality and experimentally show that most fail to detect NTC — i.e. they treat non-trivial compositionality as a failure of compositionality. We do observe, however, that properly parametrised tree reconstruction error (TRE) (Andreas, 2019) — a metric directly motivated by a formal account of compositionality (Montague, 1970) — detects NTC to a significant degree.
To summarise, the contributions of this paper are (i) providing a common framework for comparing different approaches to measuring compositionality in emergent communication, (ii) experimentally showing that most metrics used in machine learning and evolutionary linguistics fail to detect NTC, and (iii) demonstrating how to parametrise TRE to be able to detect NTC. The anonymised code for all experiments and reusable implementations of all metrics are publicly available from https://anonymous.4open.science/r/79c16150-99df-49de-a67a-e75ce8c7dba9/.
Most research on emergent communication has focused on a Lewis signalling game of the following form: a sender sends a message to a receiver upon observing a situation and the receiver acts based upon the message (Lewis, 1969; Skyrms, 2010). If the incentives of the sender and the receiver are aligned, they will agree on a communication protocol. To simplify the problem of measuring compositionality, let us assume that the situations observed by the sender are governed by an underlying compositional structure (known to us but hidden from the sender) and let us focus on the communication protocol itself, understood as a mapping from those hidden compositional structures (henceforth called derivations) to messages (henceforth called representations) as shown in Figure 1.
A derivation $d \in \mathcal{D}$ can be thought of as a tree representing a situation. Derivations are defined recursively such that if $d_i \in \mathcal{D}$ and $d_j \in \mathcal{D}$ are derivations, then $d_i \circ d_j = \langle d_i, d_j \rangle \in \mathcal{D}$ is a derivation, where $\circ$ is a derivation composition function. A primitive derivation is called a concept. For instance, a blue circle corresponds to a derivation built out of two concepts: blue and circle. In an emergent communication setting, that derivation can be the structure underlying an RGB image observed by the sender.
Let us have a set of representations $\Theta$. A representation $\theta \in \Theta$ can be thought of as a description of a derivation. Representations can be composed together, e.g. $\theta_i * \theta_j = \theta_k$, where $\theta_i, \theta_j, \theta_k \in \Theta$ and $*$ is a representation composition function. A primitive representation is called a symbol. Finally, a communication protocol is a function $f: \mathcal{D} \to \Theta$ mapping derivations to representations.
In this paper, we will consider three perspectives on what kind of object a representation is:
From a communication perspective, $\Theta$ is the set of messages understood as strings over an alphabet $\Sigma$, i.e. $\Theta = \Sigma^*$. Then, $*$ corresponds to string concatenation. For instance, $a * b = ab$.
From a semantic perspective, $\Theta$ is the set of meanings associated with derivations. We will assume meanings to be sets of objects such as circles or boxes. Then, $*$ corresponds to a function over sets (e.g. set intersection). For instance, $[\![\text{blue}]\!] \cap [\![\text{circle}]\!] = [\![\text{blue circle}]\!]$.
From a geometric perspective, $\Theta$ is a vector space. Then, $*$ corresponds to vector addition. For instance, $\theta_i + \theta_j = \theta_k$, where $\theta_i, \theta_j, \theta_k \in \mathbb{R}^n$.
While we ultimately care about the communication perspective, defining the distinction between trivial compositionality and NTC requires the semantic perspective and measuring compositionality in terms of TRE requires approximating the semantic perspective in terms of the geometric perspective.
Intuitively, a communication protocol embodied by $f$ is compositional if the space of representations is homomorphic to the space of derivations. More formally, $f$ is compositional if the following holds for all derivations $d_i, d_j \in \mathcal{D}$:

$$f(d_i \circ d_j) = f(d_i) * f(d_j) \tag{1}$$

In other words, the composition function over representations $*$ mirrors the composition function over derivations $\circ$: for each derivation obtained by applying operator $\circ$, its image under $f$ can be obtained by applying a corresponding operator $*$.
This mathematical model of compositionality, originally constructed by Montague (1970) using universal algebra, is the dominant approach in formal semantics (see Janssen, 2010, for a review). In the context of emergent communication, this model was recently explicitly assumed by Andreas (2019) and Steinert-Threlkeld (2020).
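The compositionality condition (1) can be illustrated, under the communication perspective, with a small check over lookup tables. This is only a sketch: the two toy protocols, the symbol assignments and the function names below are hypothetical and are not taken from the experiments.

```python
def concat(left, right):
    """Representation composition under the communication perspective."""
    return left + right


def satisfies_equation_1(protocol, compose):
    """Check f(<d_i, d_j>) == f(d_i) * f(d_j) for every complex derivation.

    `protocol` is a dict mapping concepts (strings) and complex
    derivations (2-tuples of concepts) to messages.
    """
    complex_derivations = [d for d in protocol if isinstance(d, tuple)]
    return all(
        protocol[(d_i, d_j)] == compose(protocol[d_i], protocol[d_j])
        for d_i, d_j in complex_derivations
    )


# Hypothetical toy lexicon shared by both protocols.
primitives = {"blue": "a", "red": "b", "circle": "c", "square": "d"}

# A protocol whose complex messages are concatenations of concept symbols.
tc_protocol = dict(primitives)
tc_protocol.update({
    ("blue", "circle"): "ac", ("blue", "square"): "ad",
    ("red", "circle"): "bc", ("red", "square"): "bd",
})

# A protocol whose complex messages are unrelated to the concept symbols.
holistic_protocol = dict(primitives)
holistic_protocol.update({
    ("blue", "circle"): "da", ("blue", "square"): "cb",
    ("red", "circle"): "ab", ("red", "square"): "dc",
})

tc_ok = satisfies_equation_1(tc_protocol, concat)
holistic_ok = satisfies_equation_1(holistic_protocol, concat)
```

Here `tc_ok` is true and `holistic_ok` is false, because only the first table can be rebuilt from its parts with the chosen composition function.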
Trivial and non-trivial compositionality
Let us take the semantic perspective and assume $\Theta$ to be a set of sets of objects. Then, a communication protocol $f$ is trivially compositional (TC) if the representation composition function $*$ is set intersection. Alternatively, $f$ is NTC if $*$ is a more complex function over sets of objects (Steinert-Threlkeld, 2020).
Most signalling games studied in machine learning and evolutionary linguistics are confined to TC communication protocols. For instance, a communication protocol defined over objects with shapes and colours would probably be TC, with the meaning of a message blue circle being the intersection of the set of circles with the set of blue objects (Mordatch and Abbeel, 2017; Kottur et al., 2017; Korbak et al., 2019). On the other hand, a great deal of natural language semantics is NTC. For instance, the meaning of the phrase good cook is not the intersection of the set of cooks with the set of good people. Rather, the adjective good is highly contextual and complements the meaning of the noun cook differently than it complements the meaning of, for example, the noun climber.
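The contrast between the two kinds of composition can be made concrete under the semantic perspective. The following sketch assumes a hypothetical toy world of objects described by a kind, a colour and a size; none of these objects or names come from the experiments.

```python
# Hypothetical toy world: each object is a (kind, colour, size) triple.
OBJECTS = frozenset({
    ("circle", "blue", 1), ("circle", "red", 2), ("box", "blue", 3),
    ("apple", "red", 4), ("apple", "green", 6), ("melon", "green", 9),
})


def meaning_of_kind(kind):
    """Meaning of a noun-like concept: the set of objects of that kind."""
    return frozenset(o for o in OBJECTS if o[0] == kind)


def meaning_of_colour(colour):
    """Meaning of a colour concept: the set of objects with that colour."""
    return frozenset(o for o in OBJECTS if o[1] == colour)


def biggest(objects):
    """'biggest' as a function over sets: keep the largest members."""
    top = max(o[2] for o in objects)
    return frozenset(o for o in objects if o[2] == top)


# Trivial composition: plain set intersection.
blue_circle = meaning_of_colour("blue") & meaning_of_kind("circle")

# Non-trivial composition: 'biggest' is applied to the meaning of 'apple'.
biggest_apple = biggest(meaning_of_kind("apple"))

# Treating 'biggest' as a fixed set and intersecting gives the wrong answer,
# because the biggest object overall is not an apple.
naive_biggest_apple = biggest(OBJECTS) & meaning_of_kind("apple")
```

In this toy world the intersection-based reading of biggest apple is empty, whereas the function-based reading picks out the largest apple.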
Tree reconstruction error
TRE is a metric of compositionality proposed by Andreas (2019) and directly motivated by Montague’s account of compositionality embodied in (1). First, assume there is a distance function $\delta$ over representations $\Theta$. Then we can define a compositional approximation $\hat{f}_\eta$ of $f$ with parameters $\eta$ as follows:

$$\hat{f}_\eta(d) = \begin{cases} \eta_d & \text{if } d \text{ is a concept} \\ \hat{f}_\eta(d_i) * \hat{f}_\eta(d_j) & \text{if } d = \langle d_i, d_j \rangle \end{cases} \tag{2}$$

In other words, $\hat{f}_\eta$ assigns each concept an embedding vector and composes these vectors using $*$ for complex derivations. The operator $*$ can be a non-parametric vector operation, e.g. addition, or a parametric transformation, e.g. a linear transformation. The parameters $\eta$ (embedding vectors for concepts and possible parameters of $*$) are optimised so we have

$$\eta^* = \arg\min_\eta \sum_{d \in \mathcal{D}} \delta\big(f(d), \hat{f}_\eta(d)\big)$$

The irreducible distance given the optimal parameters $\eta^*$ is the TRE.
Unlike in a signalling game, while optimising $\eta$ we do have explicit access to the underlying derivation $d$. Therefore what TRE measures is how well a given communication protocol $f$ can be reconstructed while respecting the compositional structure of $\mathcal{D}$. A compositional protocol satisfying (1) will by definition respect this structure, and hence can be reconstructed perfectly.
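To make the idea of a reconstruction error tangible, the sketch below computes a heavily simplified variant of TRE. It assumes additive composition only, uses squared Euclidean distance in place of the distance used in the experiments, and replaces gradient descent with the closed-form least-squares fit of an additive model on a complete colour by shape table. The toy protocols and all function names are hypothetical.

```python
def one_hot_message(message, alphabet):
    """Encode a message as concatenated one-hot vectors, one per position."""
    return [1.0 if ch == sym else 0.0 for ch in message for sym in alphabet]


def additive_tre(protocol, colours, shapes, alphabet):
    """Mean squared reconstruction error of the best additive model.

    For a complete, balanced table the least-squares fit of
    eta_colour + eta_shape is: row mean + column mean - grand mean.
    """
    targets = {d: one_hot_message(m, alphabet) for d, m in protocol.items()}
    dim = len(next(iter(targets.values())))
    n_cells = len(colours) * len(shapes)
    total = 0.0
    for k in range(dim):
        grand = sum(targets[(c, s)][k] for c in colours for s in shapes) / n_cells
        row = {c: sum(targets[(c, s)][k] for s in shapes) / len(shapes)
               for c in colours}
        col = {s: sum(targets[(c, s)][k] for c in colours) / len(colours)
               for s in shapes}
        for c in colours:
            for s in shapes:
                fitted = row[c] + col[s] - grand
                total += (targets[(c, s)][k] - fitted) ** 2
    return total / n_cells


colours, shapes = ["blue", "red"], ["circle", "square"]

# Hypothetical trivially compositional table: blue=a, red=b, circle=c, square=d.
tc_protocol = {("blue", "circle"): "ac", ("blue", "square"): "ad",
               ("red", "circle"): "bc", ("red", "square"): "bd"}

# Hypothetical holistic table: messages are unrelated to the concepts.
holistic_protocol = {("blue", "circle"): "ab", ("blue", "square"): "ba",
                     ("red", "circle"): "bb", ("red", "square"): "aa"}

tre_tc = additive_tre(tc_protocol, colours, shapes, "abcd")
tre_holistic = additive_tre(holistic_protocol, colours, shapes, "ab")
```

The trivially compositional table is reconstructed exactly, while the holistic table leaves a strictly positive residual. A faithful implementation would instead optimise the parameters by gradient descent, as described in appendix B.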
In our experiments, we consider a well-studied signalling game in which the sender observes objects endowed with two discernible features: shape and colour. The corresponding derivations are ordered tuples of two kinds of concepts: shapes and colours, e.g. $\langle \text{blue}, \text{circle} \rangle$. The set of primitive derivations consists of 25 colours and 25 shapes. We take the set of representations to be a set of strings of fixed length over a finite alphabet $\Sigma$.
We consider nine pre-defined communication protocols suitable for solving the signalling game defined above: one TC, six NTC (entangled, diagonal, negation, rotated, order-sensitive, context-sensitive) and two non-compositional baselines (random and holistic). We designed these protocols as minimal models of NTC phenomena found in natural languages and formal languages: negation (e.g. not circle), conversational context (e.g. requiring only shape to be communicated), word order (ab is different from ba) and entanglement in the representation learning sense (Kharitonov and Baroni, 2020). These probing protocols are thus aligned with linguistic intuitions (and with existing literature, whenever possible) about what constitutes (trivial or non-trivial) compositionality. For a detailed description of all communication protocols considered in this experiment, see appendix A.
We then consider seven metrics of compositionality used by the machine learning community and report how they score the protocols. The metrics considered are TRE, conflict count, topographic similarity, BOW disentanglement, generalisation, positional disentanglement and context independence. For TRE, we implemented $*$, the composition function for $n$-dimensional vector representations of derivations, as a linear transformation, i.e. $\theta_i * \theta_j = A\theta_i + B\theta_j$, where $\theta_i, \theta_j \in \mathbb{R}^n$ are vector-encoded symbols and $A, B \in \mathbb{R}^{n \times n}$ are learnable parameters. We describe experiments with other implementations of $*$ in appendix C. For detailed descriptions of the metrics used, see appendix B.
The results of the evaluation are displayed in Figure 2. We can observe that while (almost) all metrics assign high compositionality scores to the TC protocol and low compositionality scores to non-compositional (holistic and random) protocols, most also assign low scores to NTC protocols. Generalisation, somewhat in line with recent results (Chaabouni et al., 2020), is low for some NTC protocols (negation, context-sensitive, entangled, diagonal, rotated), which suggests that generalising to NTC requires higher capacity and/or stronger inductive biases than generalising to TC. Context independence, topographical similarity, positional disentanglement, BOW disentanglement and conflict count can pick up only the simplest forms of NTC such as negation and order-sensitivity. TRE is the only metric that assigns high scores to all NTC protocols.
We conjecture that the reason most metrics fail to capture NTC is that they were designed under the assumption of TC as the canonical form of compositionality. This assumption, however, seems to be guided by a simplified signalling game setup rather than formal accounts of compositionality (Janssen, 2010) or corpus data. That design choice may, in turn, stem from a more general problem of designing meaningful metrics for emergent communication (Lowe et al., 2019) and translating theoretical accounts in linguistics into quantitative measures. To illustrate one such translation problem, if we do not restrict the composition function to be of a particular class, any language may be considered compositional (see appendix C).
Despite these difficulties, NTC is ubiquitous in natural languages. Phenomena such as function words and dependency relations (Rizzi and Cinque, 2016) demonstrate that primitive concepts cannot be treated as completely orthogonal (Murphy, 1988) and natural languages use more than one form of symbol composition (Gärdenfors, 1995). Our aim in this paper was to introduce NTC as an explanatory target for emergent communication and to demonstrate how to measure it in terms of TRE. We hope these contributions will guide future work accounting for the emergence of NTC and closing the gap between emergent and human communication.
The field of emergent communication constitutes basic research and is focused on a theoretical problem: the emergence of language. However, the problem of learning compositional representations and understanding compositional language has broader implications for natural language processing and representation learning. Concrete problems that can be posed as emergent communication include image captioning (Kottur et al., 2017) and unsupervised machine translation (Lee et al., 2017) (both of which can be considered visually grounded communication) as well as explainability (Andreas et al., 2017). Research on learning compositional representations also informs the development of natural language processing technologies such as semantic parsing and machine reasoning (Hudson and Manning, 2018). These systems are vulnerable to bias, permit malicious use and can give rise to unintended adverse effects. On the other hand, the conjectured interpretability and robustness of compositional representations could improve the transparency and fairness of machine learning systems that utilise such representations, as well as advance progress on conversational systems that empower disadvantaged individuals.
Tomasz Korbak, Julian Zubek and Joanna Rączaszek-Leonardi were funded by a National Science Centre (Poland) grant OPUS 2018/29/B/HS1/00884. The authors are grateful to Krzysztof Główka, Łukasz Kuciński and Paweł Kołodziej for their helpful feedback.
Appendix A Communication protocols used in the experiments
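Recall that given an observation based on derivation $d$, the sender sends a message $f(d) = s_1 s_2 \dots s_n$ composed of symbols $s_i \in \Sigma$.
Unless specified otherwise, $n = 2$ and $d = \langle d_1, d_2 \rangle$, where $d_1, d_2$ are primitive observations for shape and colour. Then, each message has the form $s_1 s_2$.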
The communication protocols are defined as follows.
Trivially compositional (TC) protocol
The TC protocol was constructed by assuming a fixed one-to-one mapping between concepts and symbols, e.g. blue with a and circle with b, and generating a message for each shape–colour pair by concatenating associated symbols, e.g. ab for blue circle.
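The construction can be sketched in a few lines. The concept names, the lexicon and the function name below are hypothetical illustrations; the experiments use 25 colours and 25 shapes.

```python
from itertools import product


def trivially_compositional_protocol(colours, shapes, lexicon):
    """Concatenate the symbols of the colour and the shape of each derivation.

    `lexicon` must be a one-to-one mapping from concepts to symbols.
    """
    if len(set(lexicon.values())) != len(lexicon):
        raise ValueError("lexicon must map concepts to distinct symbols")
    return {
        (colour, shape): lexicon[colour] + lexicon[shape]
        for colour, shape in product(colours, shapes)
    }


# Hypothetical toy lexicon matching the example in the text (blue=a, circle=b).
lexicon = {"blue": "a", "circle": "b", "red": "c", "square": "d"}
tc_protocol = trivially_compositional_protocol(
    colours=["blue", "red"], shapes=["circle", "square"], lexicon=lexicon
)
```

With this lexicon, the derivation for a blue circle is mapped to the message `ab`.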
Holistic protocol
In animal communication research, a holistic communication system is one in which messages (e.g. ab) only have meaning as a whole and their parts (a and b) are meaningless without context. Hence, we construct a holistic communication protocol by uniformly sampling a pair of symbols (without replacement) from the alphabet $\Sigma$ for each derivation.
Random protocol
We construct the random protocol by sampling each symbol separately, with replacement, for each derivation.
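Both baselines can be sketched as follows. One reading is assumed here and may differ from the actual implementation: for the holistic protocol, whole two-symbol messages are drawn without replacement, so that every derivation receives a unique message; for the random protocol, each symbol is drawn independently, so messages may repeat. The alphabet, derivations and seed are hypothetical.

```python
import random
from itertools import product


def holistic_protocol(derivations, alphabet, rng):
    """Assign each derivation a distinct, arbitrary two-symbol message."""
    all_messages = ["".join(pair) for pair in product(alphabet, repeat=2)]
    sampled = rng.sample(all_messages, k=len(derivations))
    return dict(zip(derivations, sampled))


def random_protocol(derivations, alphabet, rng):
    """Draw each symbol independently, so messages may collide."""
    return {d: rng.choice(alphabet) + rng.choice(alphabet) for d in derivations}


derivations = list(product(["blue", "red"], ["circle", "square"]))
rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
holistic = holistic_protocol(derivations, "abcd", rng)
randomised = random_protocol(derivations, "abcd", rng)
```

Under this reading, the holistic protocol is still usable for communication (messages identify derivations uniquely), whereas the random protocol need not be.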
Entangled NTC protocol
We use an example provided by Kharitonov and Baroni (2020), a compositional protocol that refers to complex properties of the objects constructed as combinations of basic concepts.
Let us also assume that both concepts and symbols are represented by non-negative integers from finite fields closed under modular addition and subtraction, e.g. . Then we have:
The non-triviality of this protocol stems from the fact that it entangles shapes and colours, so that both symbols of a message depend on both the shape and the colour. Kharitonov and Baroni (2020) consider this protocol to be a counter-example to naïve compositionality, which is essentially what we mean by NTC.
Rotated NTC protocol
The rotated protocol is similar to the entangled protocol. It is obtained by encoding concepts numerically, rotating the coordinate system by 45 degrees and then mapping obtained values to symbols in the order they appear on the newly obtained axes. Under the same assumptions as the entangled protocol above, the concept–symbol relationship is defined as:
Order-sensitive NTC protocol
This protocol is constructed analogously to the TC protocol with one difference: each symbol is used to communicate both a colour and a shape. For instance, d means blue when in the first position in the message and square when in the second position. This protocol is NTC because it constrains the composition function $*$ to be non-commutative.
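A minimal sketch of such a protocol is given below. The concept lists and symbol assignments are hypothetical and chosen only so that d means blue in the first position and square in the second, as in the example above.

```python
# Hypothetical concept inventories; the same symbols are reused for both.
COLOURS = ["red", "green", "yellow", "blue"]
SHAPES = ["circle", "triangle", "star", "square"]
SYMBOLS = "abcd"

COLOUR_TO_SYMBOL = dict(zip(COLOURS, SYMBOLS))
SHAPE_TO_SYMBOL = dict(zip(SHAPES, SYMBOLS))
SYMBOL_TO_COLOUR = {s: c for c, s in COLOUR_TO_SYMBOL.items()}
SYMBOL_TO_SHAPE = {s: c for c, s in SHAPE_TO_SYMBOL.items()}


def encode(derivation):
    """First position carries the colour, second position carries the shape."""
    colour, shape = derivation
    return COLOUR_TO_SYMBOL[colour] + SHAPE_TO_SYMBOL[shape]


def decode(message):
    """The meaning of a symbol depends on where it occurs in the message."""
    return SYMBOL_TO_COLOUR[message[0]], SYMBOL_TO_SHAPE[message[1]]


blue_circle_message = encode(("blue", "circle"))
swapped_meaning = decode(blue_circle_message[::-1])
```

Reversing the message for a blue circle yields a message for a different object, which is why concatenation cannot be treated as commutative here.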
Context-sensitive NTC protocol
This protocol is based on the game described by Barrett et al. (2018). We modify our setup so the derivations are nested tuples pairing a context with a shape–colour derivation. The context can be colour, shape or both. This corresponds to a signalling game in which the sender is also provided with information about which concept (shape and/or colour) must be communicated to the receiver and is disincentivised from communicating too much. This protocol is NTC because message length is a function of context.
Negation NTC protocol
The negation protocol is based on the intuition that negation constitutes a minimal instance of NTC in natural language semantics. To illustrate this, we modify our setup by assuming that there are just two shapes, circle and box, and the agent only has a word x to refer to the box. However, it also uses a symbol ! as a negation and can refer to circles as ‘not boxes’, i.e. !x. The part of the message communicating the shape is then trivially composed with the second part, which communicates colour. The non-triviality of this protocol lies in the fact that the meaning of !x is a non-trivial function of the meanings of ! and x. The semantics of this protocol cannot be formulated in terms of set intersection, because there is no set that could constitute the meaning of the negation token !.
Diagonal NTC protocol
This protocol reflects a language with two-word utterances, where one word represents intensity and the second a certain property or axis of variability (examples from natural language: ‘low brightness’, ‘high contrast’, ‘medium volume’). In this example we refer to a complex property that is a combination of basic concepts.
Let us assume both concepts and symbols are represented by non-negative integers: , , , . An object embodying concepts , is represented with a pair of symbols :
Appendix B Compositionality metrics
Generalisation
Compositionality is widely considered to be the feature of language and thought that explains the generalisation capabilities of humans (Chomsky, 1957; Lake et al., 2016). While recent research in emergent communication shows that the relationship between compositionality and generalisation is nuanced (Chaabouni et al., 2020; Kharitonov and Baroni, 2020) and in some signalling games compositionality is not necessary for generalisation, generalisation to novel situations remains an intuitive hallmark of compositionality.
Here we measure the test set accuracy of a receiver trained to predict the ground-truth derivations based on messages sent by a fixed sender. More concretely, we implement the receiver as a neural network that first embeds each symbol of a message into a 50-dimensional embedding vector, feeds each of these embeddings to a single-layer LSTM (Hochreiter and Schmidhuber, 1997) and then feeds the last hidden state vector of the LSTM into a two-layer feed-forward neural network. The output of the network is a tuple of categorical distributions over all concepts in the derivation. The loss function is a sum of cross-entropy errors for all concepts. The neural network is implemented in PyTorch (Paszke et al., 2017) using EGG (Kharitonov et al., 2019). We train it using Adam (Kingma and Ba, 2014) with learning rate and batch size 1. We use regularisation with coefficient and initialise the embedding vectors by sampling from .
For the purpose of our experiments, we split the set of derivations into a training set (80% of the derivations) and a test set (20% of the derivations). We train the receiver on the training set for 100 epochs or until it achieves training set accuracy 1. We then measure the accuracy of the receiver on the test set. The reported accuracies are averaged across five random seeds.
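Positional disentanglement
Chaabouni et al. (2020) introduced positional disentanglement as an adaptation of similar metrics developed in the representation learning community (Chen et al., 2019). It is also related to context independence and residual entropy introduced by Resnick et al. (2020). Let $s_j$ denote the $j$-th symbol of a message $m$, $c_1^j$ the concept with the highest mutual information with $s_j$, and $c_2^j$ the concept with the second highest mutual information:

$$c_1^j = \arg\max_{c} I(s_j; c), \qquad c_2^j = \arg\max_{c \neq c_1^j} I(s_j; c)$$

where $I(\cdot\,; \cdot)$ is mutual information and $c$ ranges over the concepts in a derivation. Then, positional disentanglement posdis is defined as

$$\text{posdis} = \frac{1}{L} \sum_{j=1}^{L} \frac{I(s_j; c_1^j) - I(s_j; c_2^j)}{H(s_j)}$$

where $L$ is the maximum message length and $H(s_j)$ is entropy over the distribution of symbols at the $j$-th place in messages across all derivations. We ignore positions with zero entropy.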
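A plug-in estimate of positional disentanglement can be sketched with empirical entropies. This is an illustrative reading, not the implementation used in the experiments: it treats each position of a derivation tuple as one concept, assumes all messages have the same length, and averages over the positions with non-zero entropy. The toy protocols are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Empirical Shannon entropy (in bits) of a list of hashable values."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def posdis(derivations, messages):
    """Average normalised gap between the two most informative concepts."""
    gaps = []
    for j in range(len(messages[0])):
        symbols = [m[j] for m in messages]
        h = entropy(symbols)
        if h == 0:  # positions with zero entropy are ignored
            continue
        infos = sorted(
            (mutual_information(symbols, [d[a] for d in derivations])
             for a in range(len(derivations[0]))),
            reverse=True,
        )
        gaps.append((infos[0] - infos[1]) / h)
    return sum(gaps) / len(gaps) if gaps else 0.0


derivations = [("blue", "circle"), ("blue", "square"),
               ("red", "circle"), ("red", "square")]
tc_messages = ["ac", "ad", "bc", "bd"]        # one symbol per concept
holistic_messages = ["ab", "ba", "bb", "aa"]  # arbitrary assignment

posdis_tc = posdis(derivations, tc_messages)
posdis_holistic = posdis(derivations, holistic_messages)
```

On these toy tables the trivially compositional protocol attains the maximum score of 1, and the arbitrary assignment scores strictly lower.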
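Bag-of-words disentanglement
Note that positional disentanglement assumes that compositionality involves fixed order (e.g. the meaning of symbol a at first place is different from the meaning of symbol a at second place in the message). Bag-of-words disentanglement relaxes this assumption by only considering symbol counts: $n_j$ is the number of occurrences of the $j$-th symbol of the alphabet in a message. Then, bag-of-words disentanglement, bosdis, is defined as

$$\text{bosdis} = \frac{1}{|\Sigma|} \sum_{j=1}^{|\Sigma|} \frac{I(n_j; c_1^j) - I(n_j; c_2^j)}{H(n_j)}$$

where $|\Sigma|$ is the number of symbols available in the protocol and $c_1^j$, $c_2^j$ are the concepts with the highest and second highest mutual information with $n_j$.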
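The bag-of-words variant can be sketched in the same plug-in style, replacing symbols at positions with symbol counts. As before, this is an illustrative reading under stated assumptions (each position of a derivation tuple is one concept; symbols whose count has zero entropy are skipped), and the toy protocols are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Empirical Shannon entropy (in bits) of a list of hashable values."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def bosdis(derivations, messages, alphabet):
    """Like posdis, but over per-message counts of each alphabet symbol."""
    gaps = []
    for symbol in alphabet:
        counts = [m.count(symbol) for m in messages]
        h = entropy(counts)
        if h == 0:  # symbols with a constant count carry no information
            continue
        infos = sorted(
            (mutual_information(counts, [d[a] for d in derivations])
             for a in range(len(derivations[0]))),
            reverse=True,
        )
        gaps.append((infos[0] - infos[1]) / h)
    return sum(gaps) / len(gaps) if gaps else 0.0


derivations = [("blue", "circle"), ("blue", "square"),
               ("red", "circle"), ("red", "square")]
tc_messages = ["ac", "ad", "bc", "bd"]               # one symbol per concept
order_sensitive_messages = ["aa", "ab", "ba", "bb"]  # symbols reused by position

bosdis_tc = bosdis(derivations, tc_messages, "abcd")
bosdis_order_sensitive = bosdis(derivations, order_sensitive_messages, "ab")
```

The order-sensitive toy table scores zero here because each symbol count is equally informative about colour and shape, which illustrates why a count-based metric cannot see position-dependent meaning.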
Tree reconstruction error
We define $\hat{f}_\eta$ to be a neural network so we can optimise its parameters $\eta$ via gradient descent over $\delta$. More concretely, to generate a reconstruction of a derivation $d$, we follow (2) and first embed each concept forming $d$ into an $n$-dimensional embedding vector, where $n = L \cdot |\Sigma|$ (with $L$ the maximum message length, fixed in advance). Then, we encode the entire $d$ into an $n$-dimensional embedding vector by recursively applying $*$ in a bottom-up manner. The ground truth message $f(d)$ corresponding to derivation $d$ is encoded as $L$ one-hot vectors. We then define $\delta$ to be a sum of cross-entropy errors between the $i$-th segment of the reconstruction and the $i$-th one-hot-encoded symbol in the ground truth message. The neural network was implemented in PyTorch (Paszke et al., 2017). We train it for 1000 epochs using Adam (Kingma and Ba, 2014) with learning rate . We use regularisation with coefficient and initialise the embedding vectors by sampling from .
Context independence
Context independence (Bogin et al., 2018) measures the alignment between symbols forming a message and concepts forming a derivation. Let us denote the set of concepts by $C$ and the set of symbols by $\Sigma$. By $p(s \mid c)$, we mean the probability that $f$ maps a derivation containing concept $c \in C$ to a message containing symbol $s \in \Sigma$. We define the inverse probability $p(c \mid s)$ similarly. Finally, we define $s^c = \arg\max_{s} p(c \mid s)$; $s^c$ is the symbol most often sent in presence of a concept $c$. Then, the context independence metric is $\mathbb{E}_c\left[p(s^c \mid c) \cdot p(c \mid s^c)\right]$; the expectation is taken with respect to the uniform distribution over concepts.
For instance, when the derivation consists of a shape and a colour, as in our experiments, context independence measures the consistency of associating symbols with shapes irrespective of colour and vice versa. Note that context independence effectively punishes the agents for using synonyms, i.e. associating multiple symbols with a single concept (Lowe et al., 2019).
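The definition can be sketched directly over a lookup table, estimating both conditional probabilities by counting. The toy protocols and the function name are hypothetical, and ties in the argmax are broken by alphabetical order of symbols, which is an arbitrary choice.

```python
def context_independence(protocol):
    """Mean over concepts c of p(s_c | c) * p(c | s_c), s_c = argmax_s p(c | s).

    `protocol` maps derivations (tuples of concepts) to messages (strings).
    """
    concepts = sorted({c for d in protocol for c in d})
    symbols = sorted({s for m in protocol.values() for s in m})

    def p_symbol_given_concept(symbol, concept):
        messages = [m for d, m in protocol.items() if concept in d]
        return sum(symbol in m for m in messages) / len(messages)

    def p_concept_given_symbol(concept, symbol):
        derivations = [d for d, m in protocol.items() if symbol in m]
        return sum(concept in d for d in derivations) / len(derivations)

    total = 0.0
    for concept in concepts:
        best = max(symbols, key=lambda s: p_concept_given_symbol(concept, s))
        total += (p_symbol_given_concept(best, concept)
                  * p_concept_given_symbol(concept, best))
    return total / len(concepts)


# Hypothetical toy tables over two colours and two shapes.
tc_protocol = {("blue", "circle"): "ac", ("blue", "square"): "ad",
               ("red", "circle"): "bc", ("red", "square"): "bd"}
order_sensitive_protocol = {("blue", "circle"): "aa", ("blue", "square"): "ab",
                            ("red", "circle"): "ba", ("red", "square"): "bb"}

ci_tc = context_independence(tc_protocol)
ci_order_sensitive = context_independence(order_sensitive_protocol)
```

The one-symbol-per-concept table attains the maximum score, while reusing symbols across positions lowers it, since a symbol no longer identifies a single concept.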
Topographical similarity
Topographical similarity (Brighton and Kirby, 2006; Lazaridou et al., 2018) is a measure of structural similarity between messages and derivations. Let us define $\delta_{\mathcal{D}}$ to be a distance over derivations and $\delta_{\Theta}$ to be a distance over messages. Topographical similarity is the Spearman correlation of $\delta_{\mathcal{D}}(d_i, d_j)$ and $\delta_{\Theta}(f(d_i), f(d_j))$ measured over pairs of derivations $d_i, d_j$ drawn from a joint uniform distribution. Topographical similarity mirrors the approach known as representation similarity analysis in systems neuroscience (Kriegeskorte, 2008), where it is used to quantify structural similarity between a stimulus and neural activity evoked by the stimulus.
We choose $\delta_{\Theta}$ to be the Levenshtein (1966) distance and treat derivations as ordered pairs of concepts so we can choose $\delta_{\mathcal{D}}$ to be the Hamming distance.
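These choices can be sketched with the standard library alone. The sketch computes Spearman correlation as the Pearson correlation of average ranks and enumerates all unordered pairs of distinct derivations; both are assumptions about details the text leaves open, and the toy protocols are hypothetical.

```python
from itertools import combinations
from math import sqrt


def hamming(d1, d2):
    """Number of positions at which two equal-length derivations differ."""
    return sum(a != b for a, b in zip(d1, d2))


def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Pearson correlation of ranks; returns 0.0 if either side is constant."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def topographic_similarity(protocol):
    """Correlate derivation distances with message distances over all pairs."""
    pairs = list(combinations(protocol, 2))
    derivation_distances = [hamming(d1, d2) for d1, d2 in pairs]
    message_distances = [levenshtein(protocol[d1], protocol[d2])
                         for d1, d2 in pairs]
    return spearman(derivation_distances, message_distances)


tc_protocol = {("blue", "circle"): "ac", ("blue", "square"): "ad",
               ("red", "circle"): "bc", ("red", "square"): "bd"}
holistic_protocol = {("blue", "circle"): "ab", ("blue", "square"): "ba",
                     ("red", "circle"): "bb", ("red", "square"): "aa"}

topsim_tc = topographic_similarity(tc_protocol)
topsim_holistic = topographic_similarity(holistic_protocol)
```

For the one-symbol-per-concept table, message distances coincide with derivation distances, so the correlation is maximal; the arbitrary table scores lower.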
Conflict count
Conflict count was introduced by Kuciński et al. (2020). It assumes that the number of concepts in a derivation is equal to message length and that there is a one-to-one mapping between concepts and symbols. It then counts how often this mapping is violated.
Let us denote each permutation mapping the position of a symbol to the position of concepts as , where . Then, let us denote the principal meaning of a symbol at position as , where
Here denotes the indicator function and the -th concept in a derivation . Then, conflict count is
Because conflict count assumes the number of concepts in a derivation to be equal to message length, it is undefined for two protocols violating this assumption: negation and context-sensitive.
Appendix C Effect of composition function in TRE
In this additional experiment, we analyse the effect of various implementations of $*$ (the composition function for $n$-dimensional vector representations of derivations $\theta_i, \theta_j \in \mathbb{R}^n$) on TRE scores across protocols. We consider three implementations of $*$:
Additive composition, where $*$ is vector addition:

$$\theta_i * \theta_j = \theta_i + \theta_j$$

Linear composition, where $*$ is a linear transformation:

$$\theta_i * \theta_j = A\theta_i + B\theta_j$$

where $A, B \in \mathbb{R}^{n \times n}$ are learnable parameters.
Non-linear composition, where is a two-layer feedforward neural network:
Here , , , and denotes the size of the hidden layer. We choose .
The results of the experiments are presented in Figure 3. While additive and linear composition perform similarly, the model capacity of non-linear composition is probably too high for the task, resulting in severe overfitting (e.g. low TRE even for random and holistic protocols) and a false negative for the context-sensitive protocol. The presented results were stable across hyperparameters of TRE (e.g. learning rate, weight decay coefficient, number of epochs).
References
- Andreas et al. (2017). Translating Neuralese.
- Andreas (2019). Measuring Compositionality in Representation Learning. International Conference on Learning Representations. arXiv:1902.07181.
- Barrett et al. (2018). Hierarchical Models for the Evolution of Compositional Language. Institute for Mathematical Behavioral Sciences Technical Report MBS 18-03.
- Bogin et al. (2018). Emergence of Communication in an Interactive World with Consistent Speakers. arXiv:1809.00549 [cs].
- Brighton and Kirby (2006). Understanding Linguistic Evolution by Visualizing the Emergence of Topographic Mappings. Artificial Life 12 (2), pp. 229–242.
- Brighton (2002). Compositional Syntax From Cultural Transmission. Artificial Life 8 (1), pp. 25–54.
- Chaabouni et al. (2020). Compositionality and Generalization In Emergent Languages. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, pp. 4427–4442.
- Chen et al. (2019). Isolating Sources of Disentanglement in Variational Autoencoders. arXiv:1802.04942 [cs, stat].
- Chomsky (1957). Syntactic structures.
- Gärdenfors (1995). Language and the Evolution of Cognition. ISSN: 1101-8453.
- Hochreiter and Schmidhuber (1997). Long Short-Term Memory. Neural Computation 9 (8), pp. 1735–1780.
- Hudson and Manning (2018). Compositional Attention Networks for Machine Reasoning. arXiv:1803.03067.
- Janssen (2010). Compositionality. In Handbook of Logic and Linguistics.
- Kharitonov and Baroni (2020). Emergent Language Generalization and Acquisition Speed are not tied to Compositionality. arXiv:2004.03420 [cs].
- Kharitonov et al. (2019). EGG: a toolkit for research on Emergence of lanGuage in Games. arXiv:1907.00852 [cs].
- Kingma and Ba (2014). Adam: A Method for Stochastic Optimization. arXiv:1412.6980 [cs].
- Korbak et al. (2019). Developmentally motivated emergence of compositional communication via template transfer. NeurIPS 2019 workshop Emergent Communication: Towards Natural Language.
- Kottur et al. (2017). Natural Language Does Not Emerge ’Naturally’ in Multi-Agent Dialog. arXiv:1706.08502 [cs].
- Kriegeskorte (2008). Representational similarity analysis – connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience.
- Kuciński et al. (2020). Emergence of compositional language in communication through noisy channel.
- Lake et al. (2016). Building Machines That Learn and Think Like People. arXiv:1604.00289 [cs, stat].
- Lazaridou et al. (2018). Emergence of Linguistic Communication from Referential Games with Symbolic and Pixel Input. arXiv:1804.03984 [cs].
- Lee et al. (2017). Emergent Translation in Multi-Agent Communication. arXiv:1710.06922.
- Levenshtein (1966). Binary Codes Capable of Correcting Deletions, Insertions and Reversals. Soviet Physics Doklady 10, pp. 707.
- Lewis (1969). Convention: a philosophical study. Nachdr. edition, Blackwell, Oxford.
- Lowe et al. (2019). On the Pitfalls of Measuring Emergent Communication. arXiv:1903.05168 [cs, stat].
- Montague (1970). Universal grammar. Theoria 36 (3), pp. 373–398.
- Mordatch and Abbeel (2017). Emergence of Grounded Compositional Language in Multi-Agent Populations. In AAAI.
- Murphy (1988). Comprehending Complex Concepts. Cognitive Science 12 (4), pp. 529–562.
- Paszke et al. (2017). Automatic differentiation in PyTorch.
- Resnick et al. (2020). Capacity, Bandwidth, and Compositionality in Emergent Language Learning. arXiv:1910.11424 [cs, stat].
- Rizzi and Cinque (2016). Functional Categories and Syntactic Theory. Annual Review of Linguistics 2 (1), pp. 139–163.
- Skyrms (2010). Signals: evolution, learning, & information. Oxford University Press, Oxford; New York.
- Steinert-Threlkeld (2020). Towards the Emergence of Non-trivial Compositionality. Philosophy of Science.