A Mathematical Model for Linguistic Universals


Weinan E, Yajun Zhou

Department of Mathematics & Program in Applied and Computational Mathematics,
Princeton University, Princeton, NJ 08544, USA
Beijing Institute of Big Data Research, Beijing 100871, P. R. China

Corresponding authors. E-mail: weinan@math.princeton.edu (W.E), yajun.zhou.1982@pku.edu.cn (Y.Z.)

Inspired by chemical kinetics and neurobiology, we propose a mathematical theory for pattern recurrence in text documents, applicable to a wide variety of languages. We present a Markov model at the discourse level for Steven Pinker’s “mentalese”, or chains of mental states that transcend the spoken/written forms. Such (potentially) universal temporal structures of textual patterns lead us to a language-independent semantic representation, or a translationally-invariant word embedding, thereby forming the common ground for both comprehensibility within a given language and translatability between different languages. Applying our model to documents of moderate lengths, without relying on external knowledge bases, we reconcile Noam Chomsky’s “poverty of stimulus” paradox with statistical learning of natural languages.

We human beings distinguish ourselves from other animals (?, ?, ?), in that our brain development (?, ?, ?) enables us to convey sophisticated ideas and to share individual experiences, via languages (?, ?, ?). Texts written in natural languages constitute a major medium that perpetuates our civilizations (?), as a cumulative body of knowledge. The quantitative mechanism underlying the mental faculties of language long remained a difficult problem for anthropologists, linguists, neurobiologists and psychologists (?, ?, ?, ?, ?), before attracting the attention of computer and data scientists (?, ?, ?, ?, ?, ?) in the recent wave of artificial intelligence. Instead of marveling at the partial success of data-hungry approaches (?, ?, ?, ?) to machine learning, we still crave a cost-effective, interpretable and universal algorithm for understanding natural languages, one that mimics language acquisition and knowledge accumulation during early childhood, based on limited resources, as in Chomsky’s “poverty of stimulus” scenario (?, ?). Without closing the gap in data sizes, one cannot satisfactorily answer nativists’ criticism (?) against empiricists’ statistical models for natural languages.

Rising to the challenges outlined above, we perform a detailed mathematical analysis of computable “linguistic universals”: statistical patterns common to a wide range of human languages. On the theoretical side, we will present a stochastic “mentalese” model that depicts the timecourse of Markov states behind individual concepts. On the practical side, we will demonstrate (through automated word translation and question answering) that a word’s meaning can be numerically characterized by moderate-sized Markov neural networks, even when the data input is relatively scant.

Our Markov model explains, up to acceptably small error margins, how our innate language faculties (nature) may help us understand the world by connecting the dots of our past experiences (nurture), irrespective of our mother tongue. Bridging nature and nurture, our stochastic algorithm for Markov neural semantics reconciles the views of nativists and empiricists.

Heuristic background

{happier, happily, happiness, happy}, {marriage, married, marry}

[Figure 1: occurrences of the two textual patterns above embedded in placeholder (lorem ipsum) text, with the fragments between successive occurrences underlined.]
Figure 1: Counting effective transitions between textual patterns. A transition from one textual pattern to another is considered effective if the underlined text fragment in between contains no intervening occurrences of these patterns and lasts longer than the longest word in the two patterns. The reduced fragment length (measured in the number of letters, punctuation marks and white spaces) discounts the length of the longest word. We count waiting times in reduced fragment lengths, so as to ignore kinetic features (?) on the short time scales in the Friederici hierarchy, which may vary from language to language.
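To make the counting procedure concrete, the following Python sketch collects such reduced waiting times from raw text. It is a minimal illustration under our own assumptions (the helper effective_transitions, whole-word matching, and the exact exclusion and discount rules shown here are ours), not the authors' exact procedure.

    import re

    def effective_transitions(text, pattern_a, pattern_b):
        """Collect reduced waiting times for effective transitions from pattern_a to pattern_b.

        pattern_a, pattern_b: sets of morphologically related word forms, e.g.
        {"happy", "happier", "happily", "happiness"} and {"marriage", "married", "marry"}.
        """
        def occurrences(words, tag):
            # Whole-word, case-insensitive matches of any form in the pattern.
            regex = re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE)
            return [(m.start(), m.end(), tag) for m in regex.finditer(text)]

        events = sorted(occurrences(pattern_a, "A") + occurrences(pattern_b, "B"))
        longest = max(len(w) for w in pattern_a | pattern_b)  # discount: longest word in either pattern

        waiting_times = []
        for (s1, e1, t1), (s2, e2, t2) in zip(events, events[1:]):
            if t1 == "A" and t2 == "B":
                # Consecutive A -> B events have no intervening pattern occurrences by construction.
                reduced = (s2 - e1) - longest  # fragment length in characters, minus the discount
                if reduced > 0:                # keep only transitions longer than the longest word
                    waiting_times.append(reduced)
        return waiting_times

For the patterns in Fig. 1, a call such as effective_transitions(text, {"happy", "happier", "happily", "happiness"}, {"marriage", "married", "marry"}) returns the sequence of reduced waiting times on which the recurrence statistics discussed below can be computed.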

Languages differ in their phonemic repertoires (“elementary particles” in Jakobson’s (?) terms), word morphologies (“atoms”) and syntactic structures (“molecules”), corresponding to the three short time scales (phonological processing level, lexical level, and sentence level) in the Friederici hierarchy (?), which are mapped to different brain regions in functional magnetic resonance imaging (fMRI). These three Friederici scales exhibit no universal linguistic patterns and bear no semantic significance. Ferdinand de Saussure’s foundational work (?) rules out semantic dependence on phonological representation (except for a limited set of onomatopoeias), while the inherent meaning of a word is affected by neither its morphological parameters (say, singular vs. plural, present vs. past) nor its syntactic rôles (say, subject vs. object, active vs. passive).

Based on the foregoing arguments, one might speculate that universal semantic content, or Pinker’s “mentalese” (?), may only exist at the discourse level (“bulk materials”, if we extrapolate Jakobson’s (?) metaphor), namely, on the longest time scale in Friederici’s neurobiological hierarchy (?). In this work, we turn such a qualitative speculation into a quantitative model (?). Concretely speaking, we observe the following statistical features of textual patterns (clusters of morphologically related words; see Fig. 1 for examples) shared in common by many languages:

  1. The recurrence behavior of most textual patterns is consistent with time series generated by a certain Markov process, on the longest, as opposed to the shortest (?), neuro-linguistic time scale (see the sketch following this list);

  2. Recurrence kinetics of a given concept remains nearly independent of the language in which it is expressed;

  3. Kinetic data quantify the semantic distance between different textual patterns, thus allowing us to construct semantic fields by statistical computations.
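As a concrete, admittedly simplified reading of feature 1, consistency with a memoryless Markov process can be probed by comparing the empirical distribution of waiting times with an exponential law. The Python sketch below is our illustrative check, applied to the waiting times produced by the effective_transitions sketch above; it is not the statistical procedure used in the paper.

    import numpy as np

    def memoryless_check(waiting_times, num_points=50):
        """Compare the empirical survival function of waiting times with the exponential
        law implied by a memoryless (Markov) recurrence process."""
        t = np.asarray(waiting_times, dtype=float)
        rate = 1.0 / t.mean()                                 # maximum-likelihood rate of the exponential fit
        grid = np.linspace(0.0, t.max(), num_points)
        empirical = np.array([(t > x).mean() for x in grid])  # empirical P(T > x)
        fitted = np.exp(-rate * grid)                         # P(T > x) under the exponential law
        return grid, empirical, fitted

If the empirical and fitted survival curves stay close over the observed range of waiting times, the recurrence of the pattern is at least compatible with a memoryless process, in line with feature 1; large, systematic deviations would instead point to non-Markovian kinetics.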

These long-range temporal features of documents written in various languages, in our opinion, point to a universal kinetic mechanism that defines the semantic rôles of individual nodes in a web of words, mathematically and linguistically.
