Semantic Structure and Interpretability of Word Embeddings

Lütfi Kerem Şenel, İhsan Utlu, Veysel Yücesoy, Aykut Koç, Tolga Çukur
ASELSAN Research Center, Ankara, Turkey
Electrical and Electronics Engineering Department, Bilkent University, Ankara, Turkey
UMRAM, Bilkent University, Ankara, Turkey
Neuroscience Program, Sabuncu Brain Research Center, Bilkent University, Ankara, Turkey
Email: {lksenel,iutlu,vyucesoy,aykutkoc},
* T. Çukur and A. Koç mutually supervised this work under a joint industry-university co-advising program.

Dense word embeddings, which encode the semantic meanings of words in low-dimensional vector spaces, have become very popular in natural language processing (NLP) research due to their state-of-the-art performance in many NLP tasks. Word embeddings are substantially successful in capturing semantic relations among words, so a meaningful semantic structure must be present in the respective vector spaces. However, in many cases, this semantic structure is broadly and heterogeneously distributed across the embedding dimensions, which makes interpretation a major challenge. In this study, we propose a statistical method to uncover the latent semantic structure in dense word embeddings. To perform our analysis, we introduce a new dataset (SEMCAT) that contains more than 6500 words semantically grouped under 110 categories. We further propose a method to quantify the interpretability of word embeddings; the proposed method is a practical alternative to the classical word intrusion test, which requires human intervention.

I Introduction

Words are the smallest elements of a language with a practical meaning. Researchers from different areas such as linguistics [1], computer science [2] and statistics [3] have developed models that seek to capture “word meaning” so that these models can be used in a diverse set of NLP tasks such as parsing, word sense disambiguation and machine translation. Most of the effort in this field is based on the distributional hypothesis [4], which claims that “a word is characterized by the company it keeps”. Building on this idea, several vector space models that make use of word distribution statistics, such as the well-known Latent Semantic Analysis (LSA) [5] and Latent Dirichlet Allocation (LDA) [6], have been proposed in distributional semantics. Although these methods have been commonly used in NLP, more recent techniques that generate dense, continuous, low-dimensional vectors, called embeddings, have been receiving increasing interest in NLP research. Approaches that learn embeddings include neural-network-based predictive methods [2, 7] and count-based matrix factorization methods [8]. Word embeddings have brought about significant performance improvements in many intrinsic NLP tasks, such as the analogy task or clustering words based on semantic similarity, as well as in downstream NLP tasks such as part-of-speech (POS) tagging [9], named entity recognition [10], word sense disambiguation [11], sentiment analysis [12] and cross-lingual studies [13].

Although high levels of success have been reported in many NLP tasks using word embeddings, the individual embedding dimensions are commonly considered to be uninterpretable [14]. Contrary to some earlier sparse vector space models such as Hyperspace Analogue to Language (HAL) [15], what is represented in each dimension of a word embedding is often unclear, rendering word embeddings a black-box approach. In contrast, embedding models whose dimensions are more easily interpretable in terms of the captured information can be better suited for NLP tasks that require semantic interpretation, including named entity recognition and retrieval of semantically related words. Model interpretability is also becoming increasingly relevant from a regulatory standpoint, as evidenced by the recent EU regulation that grants people a right to explanation regarding automatic decision-making algorithms [16].

Although word embeddings currently dominate NLP research, studies predominantly aim to maximize task performance, e.g., on benchmark tests such as MEN [17] or Simlex-999 [18]. While improved test performance can be beneficial, an embedding with enhanced performance does not necessarily reveal any insight about the semantic structure it captures. A systematic assessment of the semantic structure intrinsic to word embeddings would enable an improved understanding of this popular approach, allow for comparisons among different embeddings in terms of interpretability, and potentially motivate new research directions.

In this study, we aim to shed light on the semantic concepts implicitly represented by various dimensions of a word embedding. To explore these hidden semantic structures, we leverage category theory [19], which defines a category as a grouping of concepts with similar properties. We use human-designed category labels to ensure that our results and interpretations closely reflect human judgements. Human interpretation can make use of any kind of semantic relation among words to form a semantic group (category). This not only significantly increases the number of possible categories but also makes defining a category more difficult and subjective. Although several lexical databases such as WordNet [1] have a representation for relations among words, they do not provide categories as needed for this study. Since, to the best of our knowledge, there is no gold standard for semantic word categories, we introduce a new category dataset in which more than 6500 different words are grouped into 110 semantic categories. Then, we propose a method based on the distribution statistics of category words within the embedding space in order to uncover the semantic structure of the dense word vectors. We apply quantitative and qualitative tests to substantiate our method. Finally, we claim that the semantic decomposition of the embedding space can be used to quantify the interpretability of word embeddings without requiring any human effort, unlike the word intrusion test [20].

This paper is organized as follows: Following a discussion of related work in Section II, we describe our methods in Section III, where we introduce our dataset and describe the methods we used to investigate the semantic decomposition of the embeddings, to validate our findings and to measure interpretability. In Section IV we present the results of our experiments, and we conclude the paper in Section V.

II Related Work

In the word embedding literature, the problem of interpretability has been approached via several different routes. A group of studies aimed to obtain effective and more interpretable vector spaces by either applying sparse matrix factorization techniques such as non-negative matrix factorization (NMF) to the matrices derived from co-occurrence [22, 23] or by applying transformations onto standard word embeddings [24, 25, 26]. Some other studies attempted to interpret the non-negative sparse embeddings (NNSE) [21].

[24] and [25] proposed to use different sparse coding techniques to learn sparse, higher-dimensional and interpretable vector spaces from conventional dense word embeddings. However, since the projection vectors that are used for the transformation are learned from the word embeddings in an unsupervised manner, they do not have labels describing the corresponding semantic categories. Moreover, these studies do not attempt to shed light on the dense word embedding dimensions; rather, they learn new high-dimensional sparse vectors that perform well on specific tests such as word similarity and polysemy detection. Both papers evaluated the interpretability of the vector spaces they learned using the word intrusion test introduced in [20]. In [26], interpretability is quantified by the degree of clustering around embedding dimensions, and orthogonal transformations are examined to increase interpretability while preserving the performance of the embedding. However, it is also shown in [26] that the total interpretability of an embedding is constant under any orthogonal transformation; it can only be redistributed across the dimensions.

Instead of learning transformations for well-known embeddings, [22] and [23] suggested algorithms based on non-negative matrix factorization (NMF) for directly learning sparse, interpretable word vectors from matrices derived from co-occurrence, where interpretability is evaluated using the word intrusion test.

As an attempt to elucidate the hidden semantic structure within the non-negative sparse embedding space, [21] used categorized words from the HyperLex dataset [27] and quantified the interpretability levels of dimensions based on the average values of the vectors of the words from categories. However, HyperLex is constructed based on a single type of semantic relation (hypernymy), and the average number of words representing a category is very low (about 2; see Table I), preventing a comprehensive analysis.

III Methods

To address the limitations of the approaches discussed in Section II, in this study we introduce a new conceptual category dataset that is rich in terms of the words and categories it includes, and we propose statistical approaches to capture the hidden semantic concepts in word embeddings and to measure the interpretability of the embeddings.

III-A Dataset

Understanding the hidden semantic structure in the dense word embeddings and providing insight about the interpretation of their dimensions are among the main objectives of this study. Since embeddings are formed via unsupervised learning on unannotated large corpora, some conceptual relationships that humans anticipate may be missed and some that humans do not anticipate may be formed in the embedding space as discussed in [28]. Thus, not all clusters obtained from a word embedding space will be interpretable. Therefore using the clusters in the dense embedding space might not be optimal to obtain interpretable embeddings. This observation also indicates the need for human judgement in evaluating the interpretability.

To provide meaningful interpretations for the dimensions, we refer to category theory [19], where concepts with similar semantic properties are grouped under the same category. As mentioned earlier, using clusters from the embedding space as categories may not fully reflect human expectations; hence, a basis in human judgements is essential for investigating and evaluating interpretability. In that sense, semantic categories designed by humans can be considered a gold standard for categorization tasks since they directly reflect human expectations. Therefore, using supervised categories can enable a proper investigation of the word embedding dimensions. In addition, by comparing the human-categorized semantic concepts with the unsupervised word embeddings, one can acquire an understanding of what kinds of concepts can or cannot be captured by the current state-of-the-art embedding algorithms.

In the literature, the concept of category is commonly used to indicate super-subordinate (hypernym-hyponym) relations where words in a category are types or examples of that category. For instance, the “furniture” category includes words like “bed”, “table” and many other furniture names. The HyperLex category dataset [27], which is used in [21] to investigate embedding dimensions, is constructed based on this type of relation, which is also the most frequently encoded relation among sets of synonymous words in the WordNet database [1]. However, there are many other types of semantic relations such as meronymy (part-whole relations), antonymy (opposite meaning words), synonymy (words having the same sense) and cross-Part of Speech (POS) relations (i.e. lexical entailments). Although WordNet provides representations for several of these relation types, it is not clear how to construct categories from this information (i.e. what should be considered as a category, how many categories there should be, how narrow or broad they should be, which words they should contain). In addition, humans can group words by inference based on common properties such as color, shape, material, size or speed, which increases the number of possible groups without bound. For instance, “sun”, “lemon” and “honey” are similar in terms of color; “spaghetti”, “limousine” and “sky-scanner” are considered as tall; “snail”, “tractor” and “tortoise” are slow.

In short, diverse types of semantic relationships or properties can be leveraged by humans for semantic interpretation. Therefore, to investigate the semantic structure of the word embedding space using categorized words, we need categories that represent a broad variety of distinct concepts and distinct types of relations.

                                    SEMCAT   HyperLex
Number of Categories                110      1399
Number of Unique Words              6559     1752
Average Word Count per Category     91       2
Standard Deviation of Word Counts   56       3

TABLE I: Summary Statistics of SEMCAT and HyperLex

Science      Sciences         Art           Car           Cooking    Geography
atom         astronomy        abstract      auto          bake       africa
cell         botany           artist        car           barbeque   border
chemical     economics        brush         convertible   boil       capital
data         genetics         composition   hybrid        dough      cartography
element      linguistics      draw          jeep          grill      continent
evolution    neuroscience     masterpiece   limo          juice      earth
laboratory   psychology       photograph    runabout      marinate   east
microscope   taxonomy         perspective   rv            oil        gps
scientist    thermodynamics   sketch        taxi          roast      river
theory       zoology          style         van           serve      sea

TABLE II: 10 sample words from each of the 6 representative SEMCAT categories.

To the best of our knowledge, there is no word category dataset that is constructed to represent the possible groups of related words humans can form. The closest resource we have found to the required dataset is a set of online categorized word lists that were constructed for educational purposes. There are originally 168 categories on the website. By manual inspection, categories consisting of words that are not semantically related but share a common property, such as their POS tag (verbs, adverbs, adjectives etc.) or being compound words, are filtered out. Several categories, such as “Chinese New Year” and “Good Luck Symbols”, which we consider too specific to be included in our dataset, are also removed. The vocabulary is limited to the most frequent 50,000 words, where frequencies are calculated from English Wikipedia, and words that are not contained in this vocabulary are also removed from the dataset. We call the resulting semantically grouped word dataset “SEMCAT” (SEMantic CATegories). Summary statistics of the SEMCAT and HyperLex datasets are given in Table I. Ten sample words from each of 6 representative SEMCAT categories are given in Table II.

III-B Semantic Decomposition

In this study we use GloVe [8] as the source algorithm for learning dense word vectors. The entire content of English Wikipedia is utilized as the corpus. In the preprocessing step, all non-alphabetic characters (punctuation, digits, etc.) are removed from the corpus and all letters are converted to lowercase. Letters coming after apostrophes are taken as separate words (she’ll becomes she ll). The resulting corpus is fed to the GloVe algorithm. The window size is set to 15, the vector length is chosen to be 300 and the minimum occurrence count is set to 20 for the words in the corpus. Default values are used for the remaining parameters. The word embedding matrix, $\mathcal{E}$, is obtained from GloVe after limiting the vocabulary to the most frequent 50,000 words in the corpus ($\mathcal{E}$ is $50{,}000 \times 300$). The GloVe algorithm is then run a second time on the same corpus, generating a second embedding space, $\mathcal{E}_2$, to examine the effects of different initializations of the word vectors before training.
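The preprocessing step can be sketched as follows. This is a minimal illustration rather than the authors' actual pipeline: the function name `preprocess` is hypothetical, and the sketch assumes that every non-alphabetic character acts as a token separator and that only ASCII letters are kept.

```python
import re


def preprocess(text: str) -> list[str]:
    """Lowercase the text and return runs of ASCII letters as tokens.

    Assumption of this sketch: any non-alphabetic character (digits,
    punctuation, apostrophes) separates tokens, so "she'll" yields the
    two tokens "she" and "ll".
    """
    return re.findall(r"[a-z]+", text.lower())


if __name__ == "__main__":
    print(preprocess("She'll visit 3 cities, won't she?"))
```

The resulting token stream would then be written out as whitespace-separated text before being passed to an embedding trainer.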

To discover which semantic concept or concepts are captured by a particular dimension, [21] first calculate the average of the projections of the category words onto each non-negative sparse embedding dimension, and then set an empirical threshold on this average. However, using the average values of the category words without zero-centering the embedding dimensions may not be informative for the well-known dense word embeddings. For instance, in one dimension of the embedding matrix $\mathcal{E}$ the average value of the words from the “jobs” category is around 0.4, which is very close to the average value across the whole vocabulary for that dimension. Therefore, an average of around 0.4 for the “jobs” words in that dimension is not distinctive, since 0.4 is the expected average of any random set of words.

To quantify the significance of word embedding dimensions for different semantic categories, one should first understand how a semantic concept can be captured by a dimension, then should find a suitable metric that accounts for it. [21] assumed that a dimension represents a semantic category if the average value of the category words for that dimension is above an empirical threshold, and took that average value as the representational power of the dimension for the category. If interpretability is considered as a requirement for a dimension to capture a semantic concept (i.e. a dimension captures a concept if and only if it is interpretable), then it is convenient to use averages after zero centering the embedding dimensions. This is because interpretability requires words with common properties (category words) to have maximum values for a dimension. However, it would be misleading to consider only the average value or separation from the general mean to understand how important a dimension is for representing a category. Let us consider a category with 50 words that are semantically related to each other. In the word embedding space they might not be tightly clustered if their semantic relations are not apparent. However, their relations might be captured in some of the word embedding dimensions. If all category words have similar values in a particular dimension, it can be deduced that the dimension captures their semantic relation.

From a statistical perspective, the question “How strongly is a particular concept encoded in an embedding dimension?” can be interpreted as “How much information can be extracted from a word embedding dimension regarding a particular concept?”. If the words representing a concept (i.e. words in a SEMCAT category) are sampled from the same distribution as all vocabulary words, then the answer would be zero, since the category would be statistically equivalent to a random selection of words from the vocabulary. For a dimension, if $P$ denotes the distribution from which the words of a particular category are sampled and $Q$ denotes the distribution from which all other vocabulary words are sampled, then the distance between $P$ and $Q$ will be proportional to the information that can be extracted from that dimension regarding that particular category. Based on this argument, the Bhattacharyya distance [29] under a normal distribution assumption, given in Equation 1, is a suitable metric to quantify the level of encoding in the word embedding dimensions.

$$W_B(i,j) = \frac{1}{4}\ln\left(\frac{1}{4}\left(\frac{\sigma_{p_{i,j}}^{2}}{\sigma_{q_{i,j}}^{2}} + \frac{\sigma_{q_{i,j}}^{2}}{\sigma_{p_{i,j}}^{2}} + 2\right)\right) + \frac{1}{4}\left(\frac{\left(\mu_{p_{i,j}} - \mu_{q_{i,j}}\right)^{2}}{\sigma_{p_{i,j}}^{2} + \sigma_{q_{i,j}}^{2}}\right) \tag{1}$$

In Equation 1, $W_B$ is the $300 \times 110$ Bhattacharyya distance matrix, which can also be considered as the category weight matrix; $p_{i,j}$ is the vector of the values in the $i$-th embedding dimension of the words in the $j$-th category, and $q_{i,j}$ is the vector of the values in the $i$-th embedding dimension of all the other vocabulary words. $\mu$ and $\sigma$ are the mean and the standard deviation operations, respectively. Values in $W_B$ can range from 0 (if $p_{i,j}$ and $q_{i,j}$ have the same means and variances) to $\infty$. In general, a better separation of the category words from the remaining vocabulary words in a dimension results in larger elements of $W_B$ for the corresponding dimension.
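As a minimal sketch of this computation, the Bhattacharyya distance between two samples under the normal assumption can be written with the Python standard library alone. The function names, the dictionary-based data layout and the use of population (rather than sample) variance are assumptions of this sketch, and each group is assumed to have non-zero variance.

```python
import math
from statistics import fmean, pvariance


def bhattacharyya_normal(p, q):
    """Bhattacharyya distance between two samples, each modeled as a normal."""
    mu_p, mu_q = fmean(p), fmean(q)
    var_p, var_q = pvariance(p), pvariance(q)  # assumption: population variance
    # First term: mismatch between the two variances.
    variance_term = 0.25 * math.log(0.25 * (var_p / var_q + var_q / var_p + 2.0))
    # Second term: squared separation of the two means.
    mean_term = 0.25 * (mu_p - mu_q) ** 2 / (var_p + var_q)
    return variance_term + mean_term


def category_weight_matrix(vectors, categories):
    """Compute weights W[i][j] for dimension i and category names[j].

    vectors: dict mapping word -> list of floats (one value per dimension).
    categories: dict mapping category name -> set of words.
    """
    names = sorted(categories)
    dim = len(next(iter(vectors.values())))
    weights = [[0.0] * len(names) for _ in range(dim)]
    for j, name in enumerate(names):
        members = {w for w in categories[name] if w in vectors}
        for i in range(dim):
            p = [vectors[w][i] for w in members]
            q = [vec[i] for w, vec in vectors.items() if w not in members]
            weights[i][j] = bhattacharyya_normal(p, q)
    return names, weights
```

Two samples with identical means and variances give a distance of zero, while shifting one sample away from the other increases only the second term.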

Based on the SEMCAT categories, the category weight matrices ($W_B$ and $W_{B_2}$) are calculated for the learned embedding matrices $\mathcal{E}$ and $\mathcal{E}_2$ using the Bhattacharyya distance metric (Equation 1).

III-C Validation

If the weights in the category weight matrix $W_B$ truly correspond to the categorical decomposition of the semantic concepts in the dense embedding space, then $W_B$ can also be considered as a transformation matrix that can be used to map word embeddings to a semantic space where each dimension is a semantic category. However, it would be erroneous to directly multiply the word embeddings with the category weights. The following steps should be performed in order to map word embeddings to a semantic space where dimensions are interpretable.

  1. To make the word embeddings compatible with the category weights, the word embedding dimensions are standardized ($\mathcal{E}_S$) such that each dimension has zero mean and unit variance, since the category weights have been calculated based on the deviations from the general mean (second term in Equation 1) and the standard deviations (first term in Equation 1).

  2. Category weights are normalized across dimensions such that each category has a total weight of 1 ($W_{NB}$). This is necessary since some columns of the category weight matrix $W_B$ dominate the others in terms of representation strength (discussed in Section IV in more detail). This inequality across semantic categories can cause an undesired bias towards categories with larger total weights in the new vector space. $\ell_1$ normalization of the category weights across dimensions is performed to prevent this bias.

  3. Word embedding dimensions can encode semantic categories in both positive and negative directions, which contribute equally to the Bhattacharyya distance. However, since encoding directions are important for the mapping of the word embeddings, the normalized weight matrix $W_{NB}$ is replaced with its signed version ($W_{NSB}$), where negative weights correspond to encoding in the negative direction.

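Then, the interpretable semantic vectors ($\mathcal{I}$) are obtained by multiplying the standardized embeddings $\mathcal{E}_S$ with the signed, normalized weights $W_{NSB}$, i.e. $\mathcal{I} = \mathcal{E}_S W_{NSB}$.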
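The three steps and the final multiplication can be sketched on small nested lists as follows. The function names are hypothetical, and the matrix of encoding directions (`signs`, with entries +1 or -1) is taken as an input; deriving it, for example from the sign of the difference between the category mean and the mean of the remaining words, is an assumption that the text above does not spell out.

```python
from statistics import fmean, pstdev


def standardize(embeddings):
    """Give every column (dimension) zero mean and unit variance."""
    columns = list(zip(*embeddings))
    means = [fmean(c) for c in columns]
    stds = [pstdev(c) for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)]
            for row in embeddings]


def normalize_and_sign(weights, signs):
    """l1-normalize each category (column), then apply encoding directions."""
    totals = [sum(col) for col in zip(*weights)]
    return [[signs[i][j] * weights[i][j] / totals[j]
             for j in range(len(totals))]
            for i in range(len(weights))]


def matmul(a, b):
    """Plain matrix product of two nested lists."""
    b_columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_columns]
            for row in a]


def to_semantic_space(embeddings, weights, signs):
    """Map word embeddings (words x dims) to category space (words x categories)."""
    return matmul(standardize(embeddings), normalize_and_sign(weights, signs))


if __name__ == "__main__":
    E = [[1.0, 2.0], [3.0, 0.0], [5.0, 4.0], [7.0, 2.0]]  # 4 words, 2 dimensions
    W = [[0.3, 0.1], [0.1, 0.3]]                           # 2 dimensions, 2 categories
    S = [[1, -1], [1, 1]]                                  # hypothetical directions
    for row in to_semantic_space(E, W, S):
        print(row)
```

Nested lists keep the sketch dependency-free; for a vocabulary of realistic size an array library would be the natural choice.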

One can reasonably suggest simply using the centers of the vectors of the category words as the weights for the corresponding category. A second interpretable embedding space, $\mathcal{I}^*$, is therefore obtained by projecting the word vectors in the embedding matrix $\mathcal{E}$ onto the category centers.

To confirm that the signed weight matrix $W_{NSB}$ is a reasonable semantic decomposition of the dense word embedding dimensions, that $\mathcal{I}$ is indeed an interpretable semantic space, and that our proposed method produces better representations for the categories than their center vectors, the two semantic spaces $\mathcal{I}$ and $\mathcal{I}^*$ are further investigated via qualitative and quantitative approaches.

If the signed weight matrix $W_{NSB}$ represents the semantic distribution of the word embedding dimensions, then the columns of the semantic spaces $\mathcal{I}$ and $\mathcal{I}^*$ should correspond to semantic categories. In other words, each word vector in $\mathcal{I}$ and $\mathcal{I}^*$ should represent the semantic decomposition of the word in terms of the categories in SEMCAT. To confirm this expectation, the word vectors of four sample words (“window”, “bus”, “soldier” and “article”) from the two semantic spaces ($\mathcal{I}$ and $\mathcal{I}^*$) are qualitatively investigated.

To compare the Bhattacharyya-based category weights with the category centers, we also define a quantitative test that aims to measure how well the weights represent the corresponding categories. It is natural to expect that words should have high values in the dimensions that correspond to the categories they belong to, since the weights are calculated directly using the vectors of these words. However, using words that are in the categories to investigate the performance of the weights is similar to using the training error to evaluate performance from a machine learning perspective. Using the validation error (or accuracy) is more appropriate than using the training error to see how well the model generalizes to new, unseen data, which in our case correspond to words that do not belong to any category. For the validation process, for each category we randomly select 60% of the words for training and use the remaining 40% for testing. To compare the two methods, we calculate category weights from the training words based on the Bhattacharyya distance and also by using the centers of the category words as weights. Then, by selecting the largest $k$ weights for each category and projecting the dense word vectors onto these weights, we obtain interpretable semantic spaces (i.e. $\mathcal{I}_k$ and $\mathcal{I}^*_k$ for each $k$). Afterwards, for each category, we calculate the percentage of the unseen test words that are among the top-ranked words (excluding the training words) in their corresponding dimensions in the new spaces, evaluated at three cutoffs defined in terms of $n$, where $n$ is the number of test words, which varies across categories. We calculate the final accuracies as the weighted average of the accuracies across the dimensions in the new spaces, where the weighting is proportional to the number of test words within the categories. We repeat the same procedure for 10 independent random selections of the training words.
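The bookkeeping of this validation test can be sketched as follows, assuming the per-word scores in a category's dimension have already been computed. All function names are hypothetical, and the `multiple` parameter, which widens the cutoff to a multiple of the number of test words, is an illustrative assumption.

```python
import random


def split_category(words, train_fraction=0.6, seed=0):
    """Randomly split one category into training and test word sets."""
    shuffled = sorted(words)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return set(shuffled[:cut]), set(shuffled[cut:])


def top_rank_accuracy(scores, train_words, test_words, multiple=1):
    """Percentage of test words ranked within the top multiple * n words.

    scores: dict mapping word -> value in the category's dimension.
    Training words are excluded from the ranking; n = len(test_words).
    """
    n = len(test_words)
    ranked = sorted((w for w in scores if w not in train_words),
                    key=scores.get, reverse=True)
    top = set(ranked[: multiple * n])
    return 100.0 * len(top & test_words) / n


def weighted_accuracy(per_category):
    """Average accuracies weighted by the number of test words per category.

    per_category: list of (accuracy, number_of_test_words) pairs.
    """
    total = sum(n for _, n in per_category)
    return sum(acc * n for acc, n in per_category) / total
```

Repeating the split with different seeds and averaging the weighted accuracies would correspond to the repeated random selections described above.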

III-D Measuring Interpretability

In addition to investigating the semantic distribution in the embedding space, a dataset consisting of words that are grouped into categories by humans can also be used to quantify the interpretability of word embeddings. In various studies [20, 22, 23], interpretability is evaluated using the word intrusion test. In that test, for each word embedding dimension, a word set is generated that includes the top 5 words in the dimension and a noisy word (intruder) from the bottom half of the dimension that is also in the top ranks of a separate dimension. Then, human editors are asked to determine the intruder word within the generated set. The editors’ performance is used to quantify the interpretability of the embedding. Although evaluating interpretability based on human judgements is a logical approach, word intrusion is an expensive method since it requires human effort for each evaluation. Furthermore, the word intrusion test does not quantify the interpretability levels of the embedding dimensions; instead, it yields a binary decision as to whether a dimension is interpretable or not. However, using continuous values for measuring the interpretability of individual dimensions is more appropriate than making binary evaluations, since interpretability levels may vary gradually across dimensions.
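For illustration, generating one word set for the word intrusion test could look like the sketch below. The function name is hypothetical, and the definition of "top ranks" as the top 10% of the other dimension (`top_fraction`) is an assumption of this sketch, not a value taken from the cited studies.

```python
import random


def build_intrusion_set(vectors, dim, other_dim, top_fraction=0.1, seed=0):
    """Return (shuffled word set, intruder) for one dimension, or None.

    vectors: dict mapping word -> list of floats.
    The intruder lies in the bottom half of `dim` and in the top
    `top_fraction` of `other_dim` (assumed definition of "top ranks").
    """
    words = list(vectors)
    ranked = sorted(words, key=lambda w: vectors[w][dim], reverse=True)
    top_words = ranked[:5]
    bottom_half = set(ranked[len(ranked) // 2:])
    ranked_other = sorted(words, key=lambda w: vectors[w][other_dim], reverse=True)
    cutoff = max(1, int(top_fraction * len(words)))
    candidates = [w for w in ranked_other[:cutoff] if w in bottom_half]
    if not candidates:
        return None  # no word satisfies both conditions
    rng = random.Random(seed)
    intruder = rng.choice(candidates)
    word_set = top_words + [intruder]
    rng.shuffle(word_set)  # hide the intruder's position from the evaluator
    return word_set, intruder
```

A human evaluator would then be shown `word_set` and asked to pick out the intruder, which is the step that makes the test expensive.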

We propose a framework that addresses both of these issues by providing automated, continuous-valued evaluations of interpretability while keeping human judgements as the basis of the evaluations. The basic idea behind our evaluation is that humans interpret dimensions by trying to group the most distinctive words in the dimensions (i.e. top or bottom words), which is also implied by the word intrusion test. Based on this observation, if a dataset represents all the possible groups humans can form, then instead of resorting to human evaluation, one can simply check whether the distinctive words come from any of these groups. As discussed earlier, the number of possible groups humans can form is not bounded; therefore, it is not possible to have a perfect dataset. However, we claim that a dataset with a sufficiently large number of categories can provide a good approximation to human judgements. Based on this argument, we propose a simple method to quantify the interpretability of the embedding dimensions.

We define two interpretability scores for a dimension-category pair as:

$$IS^{+}_{i,j} = \frac{\left|S_j \cap V^{+}_i(\lambda \times n_j)\right|}{n_j} \times 100, \qquad IS^{-}_{i,j} = \frac{\left|S_j \cap V^{-}_i(\lambda \times n_j)\right|}{n_j} \times 100 \tag{2}$$

where $IS^{+}_{i,j}$ is the interpretability score for the positive direction and $IS^{-}_{i,j}$ is the interpretability score for the negative direction for the $i$-th dimension ($i \in \{1, 2, \ldots, D\}$ where $D$ is the dimensionality of the embedding) and the $j$-th category ($j \in \{1, 2, \ldots, K\}$ where $K$ is the number of categories in the dataset). $S_j$ is the set representing the words in the $j$-th category, $n_j$ is the number of words in the $j$-th category, and $V_i(\lambda \times n_j)$ is the set of distinctive words located at the top ($V^{+}_i$) and bottom ($V^{-}_i$) ranks of the $i$-th embedding dimension. $\lambda \times n_j$ is the number of words taken from the edges, where $\lambda$ is the parameter determining how strict the interpretability definition is. The smallest value for $\lambda$ is 1, which corresponds to the strictest interpretability definition; larger values relax the definition by increasing the range in which the category words are sought. $\cap$ is the intersection operator between category words and edge words, and $|\cdot|$ is the cardinality operator (number of elements) for the intersecting set.

We take the maximum of the scores in the positive and negative directions as the overall interpretability score for a category ($IS_{i,j}$). The interpretability score of a dimension is then taken as the maximum of the category interpretability scores across that dimension ($IS_i$). Finally, we calculate the interpretability score of the embedding ($IS$) as the average of the dimension interpretability scores:

$$IS_{i,j} = \max\left(IS^{+}_{i,j}, IS^{-}_{i,j}\right), \qquad IS_i = \max_{j} IS_{i,j}, \qquad IS = \frac{1}{D}\sum_{i=1}^{D} IS_i \tag{3}$$
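The scoring procedure can be sketched as follows. The function names and the dictionary-based data layout are hypothetical; the sketch assumes an integer strictness parameter `lam` of at least 1, categories with at least one word in the vocabulary, and scores expressed as a percentage of the category size.

```python
def direction_scores(values, category, lam=1):
    """Positive and negative interpretability scores for one dimension/category.

    values: dict mapping word -> value along one embedding dimension.
    category: set of words in the category.
    """
    members = {w for w in category if w in values}
    n_j = len(members)
    k = min(len(values), lam * n_j)          # number of words taken from each edge
    ranked = sorted(values, key=values.get, reverse=True)
    top, bottom = set(ranked[:k]), set(ranked[-k:])
    score_pos = 100.0 * len(members & top) / n_j
    score_neg = 100.0 * len(members & bottom) / n_j
    return score_pos, score_neg


def embedding_interpretability(vectors, categories, lam=1):
    """Average over dimensions of the best category score in each dimension."""
    dim = len(next(iter(vectors.values())))
    dimension_scores = []
    for i in range(dim):
        values = {w: vec[i] for w, vec in vectors.items()}
        best = 0.0
        for category in categories.values():
            best = max(best, *direction_scores(values, category, lam))
        dimension_scores.append(best)
    return sum(dimension_scores) / dim


if __name__ == "__main__":
    toy_vectors = {
        "apple": [0.9, 0.1], "pear": [0.8, -0.5], "plum": [0.7, 0.4],
        "car": [-0.2, 0.3], "bus": [-0.5, -0.1], "van": [-0.9, 0.0],
    }
    toy_categories = {"fruit": {"apple", "pear", "plum"},
                      "vehicle": {"car", "bus", "van"}}
    print(embedding_interpretability(toy_vectors, toy_categories))
```

In the toy example the first dimension separates the two categories perfectly while the second mixes them, so the embedding-level score falls between the two dimension scores.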


We test our method on the GloVe embedding space, on the semantic spaces $\mathcal{I}$ and $\mathcal{I}^*$, and on a random space where word vectors are generated by sampling from a normal distribution. Interpretability scores for the random space are taken as our baseline.

IV Results

IV-A Semantic Decomposition

Fig. 1: Semantic category weights ($W_B$) for 110 categories and 300 embedding dimensions obtained using the Bhattacharyya distance. Weights vary between 0 (represented by black) and 0.63 (represented by white). It can be noticed that some dimensions represent a larger number of categories than others do, and also that some categories are represented more strongly by more dimensions than others.
Fig. 2: Total representation strengths of the 110 semantic categories from our dataset. Bhattacharyya distance scores are summed across dimensions and then sorted. The red line represents the total score for a category constructed from 91 randomly selected words, which is the average word count of SEMCAT categories. The “Metals” category has the strongest total representation among SEMCAT categories due to the relatively few and well-clustered words it contains.

The semantic distribution calculated using the method introduced in Section III-B is displayed in Figure 1. The first inference one can make upon examining this figure is that the representations of semantic concepts are widely distributed across many dimensions in the GloVe embedding space, which also implies that the space is not readily interpretable.

In addition to this observation, it can be noticed that the total representation strengths of the categories across dimensions are quite different: some columns in Figure 1 are significantly brighter than others. In fact, the total representation strength of a category greatly depends on its characteristics. If a particular category corresponds to a very specific semantic concept with a relatively small number of category words, such as “metals” or “months”, the category words tend to be well clustered in the embedding space. This tight grouping of category words results in large Bhattacharyya distances in most dimensions, indicating a stronger representation of the category. On the other hand, if words from a semantic category are weakly related, it is more difficult for the word embedding to encode their relations, which results in relatively widespread vectors for these words. Large variances of category word embeddings generally lead to smaller Bhattacharyya distances, indicating that the semantic category does not have a strong representation in these embedding dimensions. Total representation strengths of the semantic categories are shown in Figure 2, where the baseline is given by the red horizontal line representing the total weight of a category with 91 randomly selected words (the average word count of the categories). The “Metals” category has the strongest total representation among SEMCAT categories due to the relatively few and well-clustered words it contains.

To have a closer look at the semantic behavior of dimensions and categories, let us investigate the decompositions of three different dimensions and three specific semantic categories (“Math”, “Animal” and “Tools”). Plots in the left column of Figure 3 display the categorical decompositions of three selected dimensions of the word embedding. It can be seen that while the dimension in the top row represents a particular category (“sciences”) significantly more strongly than other categories, the dimension in the bottom row focuses on 3 different categories and the dimension in the middle row encodes many different categories with comparable strengths. A similar observation can be made from the category perspective, as presented in the right column of Figure 3. While only a few dimensions are dominant for representing the “Math” semantic category, the semantic encoding of the “Tools” category is distributed across most of the embedding dimensions.

Fig. 3: Categorical decompositions of three word embedding dimensions are given in the left column. A dense word embedding dimension may focus on a single category (top row), may represent a few different categories (bottom row) or may represent many different categories with low strength (middle row). Dimensional decompositions of the “Math”, “Animal” and “Tools” categories are shown in the right column. Semantic information about a category may be encoded in a few word embedding dimensions (top row) or it may be distributed across many of the dimensions (bottom row).

For the weights calculated for our second GloVe embedding space, which differs from the first only in the independent random initialization of the word vectors before training, we observe similar decompositions for the categories when the order of the dimensions is ignored (a similar number of peaks and a similar total representation strength).

IV-B Validation

A qualitative investigation of the semantic space is presented in Figure 5, where the semantic decompositions of 4 different words, “window”, “bus”, “soldier” and “article”, are displayed using the 20 dimensions of the semantic space with the largest values for each word. These words are expected to have high values in the dimensions that correspond to the categories to which they belong. However, Figure 5 clearly shows that additional categories such as “jobs”, “people”, “pirate” and “weapons”, which are semantically related to “soldier” but do not contain the word, also have high values. Similar observations can be made for the other words, supporting the claim that the calculated weights generalize to non-category words.

Figure 6 presents the semantic decompositions of the same words obtained from the semantic space that is calculated using the category center vectors. With this approach too, categories related to the words get higher scores than unrelated categories, as shown in Figure 6. However, the resulting category scores for the words are closer to each other than those in Figure 5, implying that this approach is less discriminative for the categories than our method. This is further investigated using a quantitative test.

The resulting accuracies for the category word retrieval test are presented in Figure 4. As the results show, the weights calculated using our method significantly outperform the weights from the category centers. Notably, using only the 25 largest weights from our method gives higher accuracy than using all 300 weights from the category centers. Another interpretation of these results is that the vectors we obtained for each category in the dense embedding space (i.e. the columns of the weight matrix calculated with our method) represent the categories better than their average vectors (category centers).
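
The retrieval comparison can be sketched as follows. This is a minimal illustration under stated assumptions, not the evaluation code behind Figure 4: words are scored by a dot product between their embedding and a category weight vector restricted to its k largest-magnitude entries, and accuracy is taken as the fraction of held-out category words among the top-ranked words. The function names, the scoring rule and the random toy data are all hypothetical.

```python
import random

def keep_top_k(weights, k):
    """Zero all but the k largest-magnitude entries of a category weight vector."""
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(order[:k])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]

def retrieval_accuracy(embeddings, weights, test_ids, n_retrieve):
    """Fraction of held-out category words ranked among the top n_retrieve words."""
    scores = [sum(x * w for x, w in zip(vec, weights)) for vec in embeddings]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    retrieved = set(ranked[:n_retrieve])
    return len(retrieved & set(test_ids)) / len(test_ids)

# Toy data (hypothetical): 500 words in 50 dimensions; words 0..9 form a category.
rng = random.Random(0)
n_words, n_dims = 500, 50
embeddings = [[rng.gauss(0.0, 1.0) for _ in range(n_dims)] for _ in range(n_words)]
test_ids = list(range(10))
for i in test_ids:
    for d in range(5):
        embeddings[i][d] += 4.0   # the category is encoded in the first 5 dimensions

# Hypothetical category weights: strong in 5 dimensions, near zero elsewhere.
weights = [1.0] * 5 + [0.01 * rng.gauss(0.0, 1.0) for _ in range(n_dims - 5)]

pruned = keep_top_k(weights, k=25)
accuracy = retrieval_accuracy(embeddings, pruned, test_ids, n_retrieve=len(test_ids))
print(f"accuracy with the 25 largest weights: {accuracy:.2f}")
```

Restricting the weight vector to its largest entries discards dimensions that carry little information about the category, which is one plausible reason a small number of well-chosen weights can compete with a full category center.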

Fig. 4: Category word retrieval performances at three retrieval cutoffs defined relative to the number of test words, which varies across categories. Category weights obtained using the Bhattacharyya distance represent categories better than the centers of the category words. Using only the 25 dimensions with the largest Bhattacharyya distances gives better performance than using the centers of the category words with all 300 dimensions.
Fig. 5: Semantic decompositions of the words “window”, “bus”, “soldier” and “article” for the 20 highest scoring SEMCAT categories, obtained from vectors in the semantic space calculated with our method. Red bars indicate the categories that contain the word.
Fig. 6: Categorical decompositions of the words “window”, “bus”, “soldier” and “article” for the 20 highest scoring categories, obtained from vectors in the semantic space calculated from the category centers. Red bars indicate the categories that contain the word.

IV-C Measuring Interpretability

Figure 7 displays the interpretability scores of the GloVe embedding, the two semantic spaces and the random embedding for varying values of the score's parameter. As can be seen, the semantic space is significantly more interpretable than GloVe, as justified in Section IV-B. We can also see that the interpretability score of the GloVe embedding is close to that of the random embedding, which represents the minimum interpretability level. As the parameter increases, the interpretability scores also increase. It can be considered a design parameter, adjusted according to one's understanding of interpretability and one's confidence in the coverage of the dataset. For a dataset with a larger number of categories, smaller values are more suitable. For the SEMCAT dataset, we select a value that gives reasonable interpretability scores, similar to the common practice of using top-5 error to evaluate the performance of a model.
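
To make the role of the parameter concrete, the following is a minimal sketch of one way such a score could be computed. It is an assumption made for illustration and not necessarily the exact definition used for Figure 7. For each dimension, the words are ranked by their value, the top `lam * n` words are inspected for a category with `n` words, and the percentage of the category's words found among them is recorded; the dimension's score is the best percentage over the categories, and the embedding's score is the average over the dimensions. The names `interpretability_score` and `lam`, the use of only the positive direction of each dimension, and the toy data are hypothetical.

```python
def interpretability_score(embeddings, categories, lam):
    """Average over dimensions of the best category coverage, in percent.

    embeddings: dict mapping word -> list of floats
    categories: dict mapping category name -> set of words
    lam: multiplier on the category size that sets how many top words are inspected
    """
    words = list(embeddings)
    n_dims = len(next(iter(embeddings.values())))
    per_dimension = []
    for d in range(n_dims):
        # Rank all words by their value in dimension d, largest first.
        ranked = sorted(words, key=lambda w: embeddings[w][d], reverse=True)
        best = 0.0
        for members in categories.values():
            top = set(ranked[: lam * len(members)])
            best = max(best, 100.0 * len(top & members) / len(members))
        per_dimension.append(best)
    return sum(per_dimension) / n_dims

# Toy example (hypothetical data): two dimensions, two small categories.
embeddings = {
    "iron": [0.9, 0.1], "gold": [0.8, 0.7], "zinc": [0.2, 0.3],
    "march": [0.7, 0.9], "june": [0.1, 0.2], "bus": [0.0, 0.8],
}
categories = {"metals": {"iron", "gold", "zinc"}, "months": {"march", "june"}}

for lam in (1, 2):
    print(lam, interpretability_score(embeddings, categories, lam))
```

Under this sketch the score cannot decrease as `lam` grows, because a larger `lam` only enlarges the set of inspected words. This is consistent with the trend described above and shows why the parameter controls how strict the notion of interpretability is.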

As mentioned, these scores are dataset dependent. They may change depending on the number of categories, what the categories are and the coverage of the categories in the dataset. SEMCAT can be considered an inclusive dataset, meaning that, in general, the categories in SEMCAT include words even if their semantic relations with the category are not direct. For instance, in addition to the words that directly relate to the concept of big, the “big” category also includes words such as “adult”, “titanic”, “mammoth”, “powerful”, “epic” and “substantial” that have relatively weak or secondary relations with that concept. This property of SEMCAT makes obtaining high interpretability scores more difficult. If a different dataset is constructed by including only strongly related words, or words whose relations are easier for an embedding algorithm to capture, then it is possible to obtain higher interpretability scores for the same parameter value.

Fig. 7: Interpretability scores for GloVe, the two semantic spaces and random embeddings for varying parameter values. Both semantic spaces are significantly more interpretable than GloVe, as expected. The semantic space calculated with our method outperforms the one calculated from the category centers, which supports that the weights calculated with our proposed method represent categories better than the category centers. Interpretability scores of GloVe are close to the baseline (Random), implying the uninterpretability of the dense word embedding.

In order to compare the interpretability of different embeddings, one can use the category weights of the dimensions calculated using the Bhattacharyya distance. However, since the Bhattacharyya distance does not have an upper bound, it is difficult to evaluate the significance of the difference between two scores. For instance, if one embedding has an interpretability score of 10 and another a score of 30, both calculated using category weights, all one can deduce from these numbers is that the second embedding is more interpretable than the first, since the scale of these numbers can be nonlinear.
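
A short worked case, given here as an illustrative assumption rather than a result of this study, shows why such scores are hard to compare on a linear scale. For two univariate normal distributions $p = \mathcal{N}(\mu_p, \sigma_p^2)$ and $q = \mathcal{N}(\mu_q, \sigma_q^2)$ (symbols introduced only for this illustration), the Bhattacharyya distance has the closed form

```latex
D_B(p, q) = \frac{1}{4} \ln\!\left( \frac{1}{4} \left( \frac{\sigma_p^2}{\sigma_q^2} + \frac{\sigma_q^2}{\sigma_p^2} + 2 \right) \right)
          + \frac{1}{4} \, \frac{(\mu_p - \mu_q)^2}{\sigma_p^2 + \sigma_q^2}
```

With equal variances $\sigma_p = \sigma_q = \sigma$, the first term vanishes and $D_B = (\mu_p - \mu_q)^2 / (8\sigma^2)$, which grows without bound as the means separate. In this special case, a threefold difference in distance (for example 30 versus 10) corresponds to only a $\sqrt{3} \approx 1.73$-fold difference in the separation of the means.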

V Conclusion

In this paper, we propose a statistical method to uncover the latent semantic structure in dense word embeddings. Using the Bhattacharyya distance and SEMCAT, a new dataset that we introduce containing more than 6500 words semantically grouped under 110 categories, we provide a semantic decomposition of the word embedding dimensions and verify our findings using qualitative and quantitative tests. We also introduce a method, based on our dataset, to quantify the interpretability of word embeddings; it can replace the word intrusion test, which requires human effort. Much work remains to be done to improve word embeddings, and we hope that the framework proposed in this study will enable researchers to better understand the embeddings their models generate and to evaluate interpretability without the need for human intervention.

References


  • [1] G. A. Miller, “WordNet: a lexical database for English,” Communications of the ACM, vol. 38, no. 11, pp. 39–41, 1995.
  • [2] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” arXiv preprint arXiv:1301.3781, 2013.
  • [3] A. Bordes, X. Glorot, J. Weston, and Y. Bengio, “Joint learning of words and meaning representations for open-text semantic parsing,” in Artificial Intelligence and Statistics, 2012, pp. 127–135.
  • [4] Z. S. Harris, “Distributional structure,” Word, vol. 10, no. 2-3, pp. 146–162, 1954.
  • [5] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman, “Indexing by latent semantic analysis,” Journal of the American society for information science, vol. 41, no. 6, p. 391, 1990.
  • [6] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” Journal of Machine Learning Research, vol. 3, no. Jan, pp. 993–1022, 2003.
  • [7] P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, “Enriching word vectors with subword information,” arXiv preprint arXiv:1607.04606, 2016.
  • [8] J. Pennington, R. Socher, and C. D. Manning, “GloVe: Global vectors for word representation,” in Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532–1543.
  • [9] C.-C. Lin, W. Ammar, C. Dyer, and L. Levin, “Unsupervised POS induction with word embeddings,” arXiv preprint arXiv:1503.06760, 2015.
  • [10] S. K. Sienčnik, “Adapting word2vec to named entity recognition,” in Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania, no. 109.   Linköping University Electronic Press, 2015, pp. 239–243.
  • [11] I. Iacobacci, M. T. Pilehvar, and R. Navigli, “Embeddings for word sense disambiguation: An evaluation study.” in ACL (1), 2016.
  • [12] L.-C. Yu, J. Wang, K. R. Lai, and X. Zhang, “Refining word embeddings for sentiment analysis,” in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017, pp. 545–550.
  • [13] L. K. Şenel, V. Yücesoy, A. Koç, and T. Çukur, “Measuring cross-lingual semantic similarity across European languages,” in 40th International Conference on Telecommunications and Signal Processing (TSP), 2017.
  • [14] O. Levy and Y. Goldberg, “Dependency-based word embeddings.” in ACL (2), 2014, pp. 302–308.
  • [15] K. Lund and C. Burgess, “Producing high-dimensional semantic spaces from lexical co-occurrence,” Behavior Research Methods, Instruments, & Computers, vol. 28, no. 2, pp. 203–208, 1996.
  • [16] B. Goodman and S. Flaxman, “European Union regulations on algorithmic decision-making and a ‘right to explanation’,” arXiv preprint arXiv:1606.08813, 2016.
  • [17] E. Bruni, N.-K. Tran, and M. Baroni, “Multimodal distributional semantics.” J. Artif. Intell. Res.(JAIR), vol. 49, no. 2014, pp. 1–47, 2014.
  • [18] F. Hill, R. Reichart, and A. Korhonen, “Simlex-999: Evaluating semantic models with (genuine) similarity estimation,” Computational Linguistics, 2016.
  • [19] G. Murphy, The big book of concepts.   MIT press, 2004.
  • [20] J. Chang, S. Gerrish, C. Wang, J. L. Boyd-Graber, and D. M. Blei, “Reading tea leaves: How humans interpret topic models,” in Advances in neural information processing systems, 2009, pp. 288–296.
  • [21] K.-R. Jang and S.-H. Myaeng, “Elucidating conceptual properties from word embeddings,” SENSE 2017, p. 91, 2017.
  • [22] B. Murphy, P. Talukdar, and T. Mitchell, “Learning effective and interpretable semantic models using non-negative sparse embedding,” Proceedings of COLING 2012, pp. 1933–1950, 2012.
  • [23] H. Luo, Z. Liu, H.-B. Luan, and M. Sun, “Online learning of interpretable word embeddings.” in EMNLP, 2015, pp. 1687–1692.
  • [24] S. Arora, Y. Li, Y. Liang, T. Ma, and A. Risteski, “Linear algebraic structure of word senses, with applications to polysemy,” arXiv preprint arXiv:1601.03764, 2016.
  • [25] M. Faruqui, Y. Tsvetkov, D. Yogatama, C. Dyer, and N. Smith, “Sparse overcomplete word vector representations,” arXiv preprint arXiv:1506.02004, 2015.
  • [26] A. Zobnin, “Rotations and interpretability of word embeddings: the case of the Russian language,” arXiv preprint arXiv:1707.04662, 2017.
  • [27] I. Vulić, D. Gerz, D. Kiela, F. Hill, and A. Korhonen, “Hyperlex: A large-scale evaluation of graded lexical entailment,” arXiv preprint arXiv:1608.02117, 2016.
  • [28] A. Gladkova and A. Drozd, “Intrinsic evaluations of word embeddings: What can we do better?” ACL 2016, p. 36, 2016.
  • [29] A. Bhattacharyya, “On a measure of divergence between two statistical populations defined by their probability distribution,” Bull. Calcutta Math. Soc, 1943.