A Mathematical Model for Linguistic Universals
Inspired by chemical kinetics and neurobiology, we propose a mathematical theory of pattern recurrence in text documents that applies to a wide variety of languages. We present a discourse-level Markov model for Steven Pinker’s “mentalese”, the chains of mental states that transcend spoken and written forms. Such (potentially) universal temporal structures of textual patterns lead us to a language-independent semantic representation, or a translationally invariant word embedding, which forms the common ground for both comprehensibility within a given language and translatability between different languages. Applying our model to documents of moderate length, without relying on external knowledge bases, we reconcile Noam Chomsky’s “poverty of stimulus” paradox with statistical learning of natural languages.
We human beings distinguish ourselves from other animals (?, ?, ?), in that our brain development (?, ?, ?) enables us to convey sophisticated ideas and to share individual experiences via languages (?, ?, ?). Texts written in natural languages constitute a major medium that perpetuates our civilizations (?) as a cumulative body of knowledge. The quantitative mechanism underlying the mental faculties of language long posed a difficult problem for anthropologists, linguists, neurobiologists and psychologists (?, ?, ?, ?, ?) before attracting the attention of computer and data scientists (?, ?, ?, ?, ?, ?) in the recent wave of artificial intelligence. Instead of marveling at the partial success of data-hungry approaches (?, ?, ?, ?) to machine learning, we still crave a cost-effective, interpretable and universal algorithm for understanding natural languages: one that mimics language acquisition and knowledge accumulation during early childhood, based on limited resources, as in Chomsky’s “poverty of stimulus” scenario (?, ?). Without closing this gap in data size, one cannot satisfactorily answer nativists’ criticism (?) of empiricists’ statistical models of natural languages.
Rising to the challenges outlined above, we perform a detailed mathematical analysis of computable “linguistic universals”—statistical patterns common to a wide range of human languages. On the theoretical side, we present a stochastic “mentalese” model that depicts the time course of Markov states behind individual concepts. On the practical side, we demonstrate (through automated word translation and question answering) that a word’s meaning can be numerically characterized by moderate-sized Markov neural networks, even with relatively scant data input.
Our Markov model explains, up to acceptably small error margins, how our innate language faculties (nature) may help us understand the world by connecting the dots of our past experiences (nurture), irrespective of our mother tongue. By bridging nature and nurture, our stochastic algorithm for Markov neural semantics reconciles the views of nativists and empiricists.
Languages differ in their phonemic repertoires (“elementary particles” in Jakobson’s (?) terms), word morphologies (“atoms”) and syntactic structures (“molecules”), corresponding to the three short time scales (phonological processing level, lexical level, and sentence level) in the Friederici hierarchy (?), which are mapped to different brain regions in functional magnetic resonance imaging (fMRI). These three Friederici scales exhibit no universal linguistic patterns and bear no semantic significance. Ferdinand de Saussure’s foundational work (?) rules out semantic dependence on phonological representation (except for a limited set of onomatopoeias), while the inherent meaning of a word is affected by neither its morphological parameters (say, singular vs. plural, present vs. past) nor its syntactic rôles (say, subject vs. object, active vs. passive).
Based on the foregoing arguments, one might speculate that universal semantic content, or Pinker’s “mentalese” (?), may only exist at the discourse level (“bulk materials”, if we extrapolate Jakobson’s (?) metaphor), namely, on the longest time scale in Friederici’s neurobiological hierarchy (?). In this work, we turn such a qualitative speculation into a quantitative model (?). Concretely speaking, we observe the following statistical features of textual patterns (clusters of words that are morphologically related, see Fig. 1 and Fig. LABEL:fig:recurrenceB for examples) shared in common across many languages:
The recurrence behavior of most textual patterns is consistent with time series generated by a certain Markov process, on the longest, as opposed to the shortest (?), neuro-linguistic time scale;
Recurrence kinetics of a given concept remain nearly independent of the language in which it is expressed;
Kinetic data quantify the semantic distance between different textual patterns, thus allowing us to construct semantic fields by statistical computations.
These long-range temporal features of documents written in various languages, in our opinion, point to a universal kinetic mechanism that defines the semantic rôles of individual nodes in a web of words, mathematically and linguistically.