Are Girls Neko or Shōjo? Cross-Lingual Alignment of Non-Isomorphic Embeddings with Iterative Normalization
Abstract
Cross-lingual word embeddings (CLWE) underlie many multilingual natural language processing systems, often through orthogonal transformations of pre-trained monolingual embeddings. However, orthogonal mapping only works on language pairs whose embeddings are naturally isomorphic. For non-isomorphic pairs, our method (Iterative Normalization) transforms monolingual embeddings to make orthogonal alignment easier by simultaneously enforcing that (1) individual word vectors are unit length, and (2) each language’s average vector is zero. Iterative Normalization consistently improves word translation accuracy of three CLWE methods, with the largest improvement observed on English-Japanese (from 2% to 44% test accuracy).
1 Orthogonal Cross-Lingual Mappings
Cross-lingual word embedding (CLWE) models map words from multiple languages to a shared vector space, where words with similar meanings are close, regardless of language. CLWE is widely used in multilingual natural language processing (Klementiev et al., 2012; Guo et al., 2015; Zhang et al., 2016). Recent CLWE methods (Ruder et al., 2017; Glavas et al., 2019) independently train two monolingual embeddings on large monolingual corpora and then align them with a linear transformation. Previous work argues that these transformations should be orthogonal (Xing et al., 2015; Smith et al., 2017; Artetxe et al., 2016): for any two words, the dot product of their representations is unchanged by the transformation. This preserves similarities and substructure of the original monolingual word embedding but enriches the embeddings with multilingual connections between languages.
Thus, many state-of-the-art mapping-based CLWE methods impose an orthogonal constraint (Artetxe et al., 2017; Conneau et al., 2018; Alvarez-Melis and Jaakkola, 2018; Artetxe et al., 2018; Ruder et al., 2018; Alvarez-Melis et al., 2019). The success of orthogonal methods relies on the assumption that embedding spaces are isomorphic; i.e., they have the same inner-product structures across languages, but this does not hold for all languages (Søgaard et al., 2018; Fujinuma et al., 2019). For example, English and Japanese fastText vectors (Bojanowski et al., 2017) have different substructures around “girl” (Figure 1 left). As a result, orthogonal mapping fails on some languages: when Hoshen and Wolf (2018) align fastText embeddings with orthogonal mappings, they report 81% English–Spanish word translation accuracy but only 2% for the more distant English–Japanese.
While recent work challenges the orthogonal assumption (Doval et al., 2018; Joulin et al., 2018; Jawanpuria et al., 2019), we focus on whether simple preprocessing techniques can improve the suitability of orthogonal models. Our iterative method normalizes monolingual embeddings to make their structures more similar (Figure 1), which improves subsequent alignment.
Our method is motivated by two desired properties of monolingual embeddings that support orthogonal alignment: (1) every word vector has the same length, and (2) each language’s mean has the same length.
Standard preprocessing such as dimension-wise mean centering and length normalization (Artetxe et al., 2016) does not meet the two requirements at the same time. Our analysis leads to Iterative Normalization, an alternating projection algorithm that normalizes any word embedding to provably satisfy both conditions. After normalizing the monolingual embeddings, we then apply mapping-based CLWE algorithms on the transformed embeddings.
We empirically validate our theory by combining Iterative Normalization with three mapping-based CLWE methods. Iterative Normalization improves word translation accuracy on a dictionary induction benchmark across thirty-nine language pairs.
2 Learning Orthogonal Mappings
This section reviews learning orthogonal cross-lingual mapping between word embeddings and, along the way, introduces our notation.
We start with pre-trained word embeddings in a source language and a target language. We assume all embeddings are $d$-dimensional, and the two languages have the same vocabulary size $n$ (word translation benchmarks use the same assumptions). Let $X \in \mathbb{R}^{d \times n}$ be the word embedding matrix for the source language, where each column $x_i \in \mathbb{R}^{d}$ is the representation of the $i$-th word from the source language, and let $Z \in \mathbb{R}^{d \times n}$ be the word embedding matrix for the target language. Our goal is to learn a transformation matrix $W \in \mathbb{R}^{d \times d}$ that maps the source language vectors to the target language space. While our experiments focus on the supervised case with a seed dictionary $D$ with translation pairs $(i, j)$, the analysis also applies to unsupervised projection.
One straightforward way to learn $W$ is by minimizing Euclidean distances between translation pairs (Mikolov et al., 2013a). Formally, we solve:
$$\min_{W} \sum_{(i,j) \in D} \lVert W x_i - z_j \rVert_2^2. \tag{1}$$
Xing et al. (2015) further restrict $W$ to orthogonal transformations; i.e., $W^\top W = I$. The orthogonal constraint significantly improves word translation accuracy (Artetxe et al., 2016). However, this method still fails for some language pairs because word embeddings are not isomorphic across languages. To improve orthogonal alignment between non-isomorphic embedding spaces, we aim to transform monolingual embeddings in a way that helps orthogonal transformation.
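Under the orthogonal constraint, Equation (1) is an orthogonal Procrustes problem (Schönemann, 1966) with a closed-form solution based on a singular value decomposition. The following is a minimal NumPy sketch of that solution, not the authors' implementation: the function name `procrustes`, the toy data, and the assumption that translation pairs are already arranged as paired columns are all illustrative.

```python
import numpy as np

def procrustes(X, Z):
    """Orthogonal W minimizing ||W X - Z||_F for paired columns of X and Z."""
    # X, Z: (d, m) arrays; column k of X is assumed to translate to column k of Z.
    U, _, Vt = np.linalg.svd(Z @ X.T)
    return U @ Vt

# Toy check: recover a known orthogonal map from noise-free pairs.
rng = np.random.default_rng(0)
d, m = 5, 50
X = rng.normal(size=(d, m))
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))   # a random orthogonal matrix
Z = Q @ X

W = procrustes(X, Z)
ortho_error = np.abs(W.T @ W - np.eye(d)).max()   # distance from orthogonality
recovery_error = np.abs(W - Q).max()              # distance from the true map
```

On real embeddings the pairs are noisy and the embedding spaces may not be isomorphic, so no orthogonal map reaches zero error; the sketch only illustrates the optimization step.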
3 When Orthogonal Mappings Work
When are two embedding spaces easily aligned? A good orthogonal mapping is more likely if word vectors have two properties: length-invariance and center-invariance.
Length-Invariance.
First, all word vectors should have the same, constant length. Length-invariance resolves inconsistencies between monolingual word embedding and cross-lingual mapping objectives (Xing et al., 2015). During training, popular word embedding algorithms (Mikolov et al., 2013b; Pennington et al., 2014; Bojanowski et al., 2017) maximize dot products between similar words, but evaluate on cosine similarity. To make things worse, the transformation matrix minimizes a third metric, Euclidean distance (Equation 1). This inconsistency is naturally resolved when the lengths of word vectors are fixed. Suppose $a \in \mathbb{R}^{d}$ and $b \in \mathbb{R}^{d}$ have the same length, then
$$\lVert a - b \rVert_2^2 = \lVert a \rVert_2^2 + \lVert b \rVert_2^2 - 2 a^\top b = 2 \lVert a \rVert_2^2 - 2 a^\top b = 2 \lVert a \rVert_2^2 \, (1 - \cos(a, b)).$$
With constant word vector lengths, minimizing Euclidean distance is equivalent to maximizing both dot product and cosine similarity, which makes the objectives consistent.
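The equivalence can be checked numerically. The sketch below uses random unit-length vectors as stand-ins for word embeddings (an assumption for illustration only) and verifies that squared Euclidean distance, dot product, and cosine similarity are tied together and pick the same nearest neighbor.

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.normal(size=(20, 100))                   # toy embeddings, one column per word
V /= np.linalg.norm(V, axis=0, keepdims=True)    # enforce unit length

query, candidates = V[:, 0], V[:, 1:]
sq_dist = ((candidates - query[:, None]) ** 2).sum(axis=0)
dot = query @ candidates
cosine = dot / (np.linalg.norm(query) * np.linalg.norm(candidates, axis=0))

# With unit-length vectors: ||a - b||^2 = 2 - 2 a.b = 2 - 2 cos(a, b).
identity_holds = np.allclose(sq_dist, 2 - 2 * dot) and np.allclose(dot, cosine)
# All three criteria select the same nearest neighbor of the query.
same_neighbor = sq_dist.argmin() == dot.argmax() == cosine.argmax()
```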
Length-invariance also satisfies a prerequisite for bilingual orthogonal alignment: the embeddings of translation pairs should have the same length. If a source word vector $x_i$ can be aligned to its target language translation $z_j = W x_i$ with an orthogonal matrix $W$, then
$$\lVert z_j \rVert_2 = \lVert W x_i \rVert_2 = \lVert x_i \rVert_2, \tag{2}$$
where the second equality follows from the orthogonality of $W$. Equation (2) is trivially satisfied if all vectors have the same length. In summary, length-invariance not only promotes consistency between monolingual word embedding and cross-lingual mapping objectives but also simplifies translation pair alignment.
Center-Invariance.
Our second condition is that the mean vectors of different languages should have the same length, which we prove is a prerequisite for orthogonal alignment. Suppose two embedding matrices $X$ and $Z$ can be aligned with an orthogonal matrix $W$ such that $Z = W X$. Let $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ and $\bar{z} = \frac{1}{n} \sum_{i=1}^{n} z_i$ be the mean vectors. Then $\bar{z} = W \bar{x}$. Since $W$ is orthogonal,
$$\lVert \bar{z} \rVert_2 = \lVert W \bar{x} \rVert_2 = \lVert \bar{x} \rVert_2.$$
In other words, orthogonal mappings can only align embedding spaces with equal-magnitude centers.
A stronger version of center-invariance is zero-mean, where the mean vector of each language is zero. Artetxe et al. (2016) find that centering improves dictionary induction; our analysis provides an explanation.
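Table 1: Word translation accuracy (%) from English to seven selected target languages, for three CLWE methods under three normalization strategies: no normalization (None), mean centering followed by length normalization (C+L), and five rounds of Iterative Normalization (IN).
Method               Normalization    ja    zh    hi    tr    da    de    es
Procrustes           None            1.7  32.5  33.3  44.9  54.0  73.5  81.4
Procrustes           C+L            12.3  41.1  34.0  46.5  54.9  74.6  81.3
Procrustes           IN             44.3  44.2  36.7  48.7  58.4  75.5  81.5
Procrustes + refine  None            1.7  32.5  33.6  46.3  56.8  74.3  81.9
Procrustes + refine  C+L            13.1  42.3  34.9  48.7  59.3  75.2  82.4
Procrustes + refine  IN             44.3  44.2  37.7  51.7  60.9  76.0  82.5
RCSLS                None           14.6  17.1   5.0  18.3  19.2  43.6  50.5
RCSLS                C+L            16.1  45.1  36.2  50.7  58.3  77.5  83.6
RCSLS                IN             56.3  48.6  38.0  52.4  60.5  78.1  83.9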
4 Iterative Normalization
We now develop Iterative Normalization, which transforms monolingual word embeddings to satisfy both length-invariance and center-invariance. Specifically, we normalize word embeddings to simultaneously have unit length and zero mean. Formally, we produce an embedding matrix $X$ such that
$$\lVert x_i \rVert_2 = 1 \quad \text{for all } i, \tag{3}$$
and
$$\sum_{i=1}^{n} x_i = 0. \tag{4}$$
Iterative Normalization transforms the embeddings to make them satisfy both constraints at the same time. Let $x_i^{(0)}$ be the initial embedding for word $i$. We assume that all word embeddings are non-zero (for zero vectors, a small perturbation is an easy fix). For every word $i$, we iteratively transform each word vector $x_i$ by first making the vectors unit length,
$$y_i^{(k)} = x_i^{(k-1)} / \lVert x_i^{(k-1)} \rVert_2, \tag{5}$$
and then making them mean zero,
$$x_i^{(k)} = y_i^{(k)} - \frac{1}{n} \sum_{j=1}^{n} y_j^{(k)}. \tag{6}$$
Equations (5) and (6) project the embedding matrix onto the sets of embeddings that satisfy Equations (3) and (4), respectively. Therefore, our method is a form of alternating projection (Bauschke and Borwein, 1996), an algorithm that finds a point in the intersection of two closed sets by alternately projecting onto each of the two sets. Alternating projection is guaranteed to converge to a point in the intersection of two convex sets at a linear rate (Gubin et al., 1967; Bauschke and Borwein, 1993). Unfortunately, the unit-length constraint is non-convex, ruling out the classic convergence proof. Nonetheless, we use recent results on alternating non-convex projections (Zhu and Li, 2018) to prove Iterative Normalization’s convergence (details in Appendix A).
Theorem 1.
If the embeddings are non-zero after each iteration; i.e., $x_i^{(k)} \neq 0$ for all $i$ and $k$, then the sequence $\{X^{(k)}\}$ produced by Iterative Normalization is convergent.
All embeddings in our experiments satisfy the non-zero assumption; it is violated only when all words have the same embedding. In degenerate cases, the algorithm might converge to a solution that does not meet the two requirements. Empirically, our method always satisfies both constraints.
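The two alternating steps, unit-length scaling followed by mean centering, can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' released code: the function name `iterative_normalization`, the `(d, n)` column layout, and the toy data with a large shared offset are assumptions; the default of five rounds mirrors the setting used in the experiments.

```python
import numpy as np

def iterative_normalization(X, num_rounds=5):
    """Alternate unit-length scaling and mean centering on a (d, n) matrix."""
    X = np.asarray(X, dtype=float).copy()   # one column per word
    for _ in range(num_rounds):
        X /= np.linalg.norm(X, axis=0, keepdims=True)   # make every column unit length
        X -= X.mean(axis=1, keepdims=True)              # make the mean vector zero
    return X

rng = np.random.default_rng(0)
# Toy embeddings with a large shared offset, so the raw mean is far from zero.
X0 = rng.normal(size=(10, 1000)) + 3.0

X1 = iterative_normalization(X0, num_rounds=1)
X5 = iterative_normalization(X0, num_rounds=5)

mean_norm = np.linalg.norm(X5.mean(axis=1))                    # zero-mean violation
length_gap_1 = np.abs(np.linalg.norm(X1, axis=0) - 1.0).max()  # unit-length violation, 1 round
length_gap_5 = np.abs(np.linalg.norm(X5, axis=0) - 1.0).max()  # unit-length violation, 5 rounds
```

Because each round ends with centering, the mean is zero up to floating-point error, while the unit-length constraint is met only approximately; on this toy data the remaining length gap shrinks as the number of rounds grows.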
Previous approach and differences.
Artetxe et al. (2016) also study the unit-length and zero-mean constraints, but our work differs in two aspects. First, they motivate the zero-mean condition based on the heuristic argument that two randomly selected word types should not be semantically similar (or dissimilar) in expectation. While this statement is attractive at first blush, some word types have more synonyms than others, so we argue that word types might not be evenly distributed in the semantic space. We instead show that zero-mean is helpful because it satisfies center-invariance, a necessary condition for orthogonal mappings. Second, Artetxe et al. (2016) attempt to enforce the two constraints by a single round of dimension-wise mean centering and length normalization. Unfortunately, this often fails to meet the constraints at the same time: length normalization can change the mean, and mean centering can change vector length. In contrast, Iterative Normalization simultaneously meets both constraints and is empirically better (Table 1) on dictionary induction.
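A small numerical illustration of why a single round is not enough: after centering and then length-normalizing, every vector has unit length, but the mean is no longer zero. The toy data below (random vectors with varied lengths) is an assumption made only for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy (d, n) embeddings whose columns have widely varying lengths.
X = rng.normal(size=(10, 1000)) * rng.uniform(0.5, 5.0, size=(1, 1000))

centered = X - X.mean(axis=1, keepdims=True)                             # mean is now zero
normalized = centered / np.linalg.norm(centered, axis=0, keepdims=True)  # lengths are now one

lengths = np.linalg.norm(normalized, axis=0)
mean_after = np.linalg.norm(normalized.mean(axis=1))   # no longer zero after rescaling
```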
5 Dictionary Induction Experiments
On a dictionary induction benchmark, we combine Iterative Normalization with three CLWE methods and show improvement in word translation accuracy across languages.
5.1 Dataset and Methods
We train and evaluate CLWE on MUSE dictionaries (Conneau et al., 2018) with the default split. We align English embeddings to thirty-nine target language embeddings, pre-trained on Wikipedia with fastText (Bojanowski et al., 2017). The alignment matrices are trained from dictionaries of 5,000 source words. We report top-1 word translation accuracy for 1,500 source words, using cross-domain similarity local scaling (CSLS; Conneau et al., 2018). We experiment with the following CLWE methods. We only report accuracy for one run, because these CLWE methods are deterministic.
Procrustes Analysis.
Post-hoc Refinement.
Orthogonal mappings can be improved with refinement steps (Artetxe et al., 2017; Conneau et al., 2018). After learning an initial mapping $W_0$ from the seed dictionary $D$, we build a synthetic dictionary $D_1$ by translating each word with $W_0$. We then use the new dictionary $D_1$ to learn a new mapping $W_1$ and repeat the process.
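The refinement loop can be sketched as follows. This is a simplified illustration, not the MUSE implementation: it assumes unit-length columns, builds the synthetic dictionary with plain cosine nearest neighbors over the whole vocabulary, and uses hypothetical names (`procrustes`, `refine`) and toy data.

```python
import numpy as np

def procrustes(X, Z):
    # Orthogonal W minimizing ||W X - Z||_F for paired columns.
    U, _, Vt = np.linalg.svd(Z @ X.T)
    return U @ Vt

def refine(X, Z, seed_src, seed_tgt, steps=5):
    """X, Z: (d, n) arrays with unit-length columns; seed_*: index arrays."""
    W = procrustes(X[:, seed_src], Z[:, seed_tgt])
    for _ in range(steps):
        # Synthetic dictionary: nearest target word (by cosine) for every source word.
        nearest = (Z.T @ (W @ X)).argmax(axis=0)
        W = procrustes(X, Z[:, nearest])
    return W

rng = np.random.default_rng(0)
d, n = 20, 300
X = rng.normal(size=(d, n))
X /= np.linalg.norm(X, axis=0, keepdims=True)
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))      # hidden orthogonal map
Z = Q @ X + 0.01 * rng.normal(size=(d, n))        # noisy "translations"
Z /= np.linalg.norm(Z, axis=0, keepdims=True)

seed = np.arange(50)                              # the first 50 words form the seed dictionary
W = refine(X, Z, seed, seed)
max_error = np.abs(W - Q).max()
```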
Relaxed CSLS Loss (RCSLS).
Joulin et al. (2018) optimize CSLS scores between translation pairs instead of Equation (1). RCSLS has state-of-the-art supervised word translation accuracies on MUSE (Glavas et al., 2019). For ease of optimization, RCSLS does not enforce the orthogonal constraint. Nevertheless, Iterative Normalization also improves its accuracy (Table 1), showing it can help linear non-orthogonal mappings too.
5.2 Training Details
We use the implementation from MUSE for Procrustes analysis and refinement (Conneau et al., 2018). We use five refinement steps. For RCSLS, we use the same hyperparameter selection strategy as Joulin et al. (2018): we choose the learning rate and the number of epochs by validation. As recommended by Joulin et al. (2018), we turn off the spectral constraint. We use ten nearest neighbors when computing CSLS.
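For readers unfamiliar with CSLS, the score of Conneau et al. (2018) for a mapped source vector $x$ and a target vector $z$ is $2\cos(x, z) - r_T(x) - r_S(z)$, where $r_T(x)$ is the mean cosine similarity between $x$ and its $k$ nearest target neighbors and $r_S(z)$ is defined symmetrically. The sketch below is a dense, small-scale illustration with hypothetical names and toy data; note that it stores one word per row, unlike the column convention used elsewhere in this paper.

```python
import numpy as np

def csls_scores(S, T, k=10):
    """CSLS score matrix for mapped source vectors S and target vectors T."""
    # S: (n_src, d), T: (n_tgt, d); rows are L2-normalized so dot product = cosine.
    S = S / np.linalg.norm(S, axis=1, keepdims=True)
    T = T / np.linalg.norm(T, axis=1, keepdims=True)
    cos = S @ T.T
    # Mean cosine of each source word to its k nearest target words, and vice versa.
    r_src = np.sort(cos, axis=1)[:, -k:].mean(axis=1)
    r_tgt = np.sort(cos, axis=0)[-k:, :].mean(axis=0)
    return 2 * cos - r_src[:, None] - r_tgt[None, :]

rng = np.random.default_rng(0)
T = rng.normal(size=(200, 50))
S = T[:50] + 0.01 * rng.normal(size=(50, 50))   # source word i matches target word i

pred = csls_scores(S, T, k=10).argmax(axis=1)   # top-1 retrieval
accuracy = (pred == np.arange(50)).mean()
```

Subtracting the neighborhood terms penalizes "hub" vectors that are close to many words, which is the motivation given for CSLS over plain nearest-neighbor retrieval.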
5.3 Translation Accuracy
For each method, we compare three normalization strategies: (1) no normalization, (2) dimension-wise mean centering followed by length normalization (Artetxe et al., 2016), and (3) five rounds of Iterative Normalization. Table 1 shows word translation accuracies on seven selected target languages. Results on other languages are in Appendix B.
As our theory predicts, Iterative Normalization increases translation accuracy for Procrustes analysis (with and without refinement) across languages. While centering and length normalization also help, the improvement is smaller, confirming that one round of normalization is insufficient. The largest margin is on English-Japanese, where Iterative Normalization increases test accuracy by more than 40 percentage points. Figure 1 shows an example of how Iterative Normalization makes the substructure of an English-Japanese translation pair more similar.
Surprisingly, normalization is even more important for RCSLS, a CLWE method without an orthogonal constraint. RCSLS combined with Iterative Normalization has state-of-the-art accuracy, but RCSLS is much worse than Procrustes analysis on unnormalized embeddings, suggesting that length-invariance and center-invariance are also helpful for learning linear non-orthogonal mappings.
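Table 2: Scores on four English word similarity benchmarks before and after Iterative Normalization: WS-353 (Finkelstein et al., 2002), MC (Miller and Charles, 1991), RG (Rubenstein and Goodenough, 1965), and YP-130 (Yang and Powers, 2006).
Dataset  Before  After
WS-353   73.9    73.7
MC       81.2    83.9
RG       79.7    80.0
YP-130   53.3    57.6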
5.4 Monolingual Word Similarity
Many trivial solutions satisfy both length-invariance and center-invariance; e.g., we can map half of the words to $e$ and the rest to $-e$, where $e$ is any unit-length vector. A meaningful transformation should also preserve useful structure in the original embeddings. We confirm Iterative Normalization does not hurt scores on English word similarity benchmarks (Table 2), showing that Iterative Normalization produces meaningful representations.
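The trivial solution is easy to verify numerically. In this sketch the dimension, vocabulary size, and choice of unit vector are arbitrary assumptions; the point is only that both constraints hold while every distinction between words in the same half is lost.

```python
import numpy as np

d, n = 4, 6
e = np.zeros(d)
e[0] = 1.0                                   # any unit-length vector works
X = np.column_stack([e] * (n // 2) + [-e] * (n // 2))   # half the words at e, half at -e

lengths = np.linalg.norm(X, axis=0)          # every column has unit length
mean = X.mean(axis=1)                        # the mean vector is zero
distinct_columns = np.unique(X, axis=1).shape[1]   # only two distinct embeddings remain
```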
6 Conclusion
We identify two conditions that make cross-lingual orthogonal mapping easier: length-invariance and center-invariance, and provide a simple algorithm that transforms monolingual embeddings to satisfy both conditions. Our method improves word translation accuracy of different mapping-based CLWE algorithms across languages. In the future, we will investigate whether our method helps other downstream tasks.
Acknowledgments
We thank the anonymous reviewers for comments. Boyd-Graber and Zhang are supported by DARPA award HR0011-15-C-0113 under subcontract to Raytheon BBN Technologies. Jegelka and Xu are supported by NSF CAREER award 1553284. Xu is also supported by a Chevron-MIT Energy Fellowship. Kawarabayashi is supported by JST ERATO JPMJER1201 and JSPS Kakenhi JP18H05291. Any opinions, findings, conclusions, or recommendations expressed here are those of the authors and do not necessarily reflect the view of the sponsors.
References
 Alvarez-Melis and Jaakkola (2018) David Alvarez-Melis and Tommi S. Jaakkola. 2018. Gromov-Wasserstein alignment of word embedding spaces. In Proceedings of Empirical Methods in Natural Language Processing.
 Alvarez-Melis et al. (2019) David Alvarez-Melis, Stefanie Jegelka, and Tommi S. Jaakkola. 2019. Towards optimal transport with global invariances. In Proceedings of Artificial Intelligence and Statistics.
 Artetxe et al. (2016) Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2016. Learning principled bilingual mappings of word embeddings while preserving monolingual invariance. In Proceedings of Empirical Methods in Natural Language Processing.
 Artetxe et al. (2017) Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2017. Learning bilingual word embeddings with (almost) no bilingual data. In Proceedings of the Association for Computational Linguistics.
 Artetxe et al. (2018) Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2018. A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings. In Proceedings of the Association for Computational Linguistics.
 Bauschke and Borwein (1993) Heinz H. Bauschke and Jonathan M. Borwein. 1993. On the convergence of von Neumann’s alternating projection algorithm for two sets. Set-Valued Analysis, 1(2):185–212.
 Bauschke and Borwein (1996) Heinz H. Bauschke and Jonathan M. Borwein. 1996. On projection algorithms for solving convex feasibility problems. SIAM review, 38(3):367–426.
 Bojanowski et al. (2017) Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146.
 Browder (1967) Felix E. Browder. 1967. Convergence of approximants to fixed points of nonexpansive nonlinear mappings in Banach spaces. Archive for Rational Mechanics and Analysis, 24(1):82–90.
 Conneau et al. (2018) Alexis Conneau, Guillaume Lample, Marc’Aurelio Ranzato, Ludovic Denoyer, and Hervé Jégou. 2018. Word translation without parallel data. In Proceedings of the International Conference on Learning Representations.
 Doval et al. (2018) Yerai Doval, Jose Camacho-Collados, Luis Espinosa-Anke, and Steven Schockaert. 2018. Improving cross-lingual word embeddings by meeting in the middle. In Proceedings of Empirical Methods in Natural Language Processing.
 Finkelstein et al. (2002) Lev Finkelstein, Evgeniy Gabrilovich, Yossi Matias, Ehud Rivlin, Zach Solan, Gadi Wolfman, and Eytan Ruppin. 2002. Placing search in context: The concept revisited. ACM Transactions on information systems, 20(1):116–131.
 Fujinuma et al. (2019) Yoshinari Fujinuma, Jordan Boyd-Graber, and Michael J. Paul. 2019. A resource-free evaluation metric for cross-lingual word embeddings based on graph modularity. In Proceedings of the Association for Computational Linguistics.
 Glavas et al. (2019) Goran Glavas, Robert Litschko, Sebastian Ruder, and Ivan Vulic. 2019. How to (properly) evaluate cross-lingual word embeddings: On strong baselines, comparative analyses, and some misconceptions. In Proceedings of the Association for Computational Linguistics.
 Gubin et al. (1967) L.G. Gubin, B.T. Polyak, and E.V. Raik. 1967. The method of projections for finding the common point of convex sets. USSR Computational Mathematics and Mathematical Physics, 7(6):1–24.
 Guo et al. (2015) Jiang Guo, Wanxiang Che, David Yarowsky, Haifeng Wang, and Ting Liu. 2015. Cross-lingual dependency parsing based on distributed representations. In Proceedings of the Association for Computational Linguistics.
 Hoshen and Wolf (2018) Yedid Hoshen and Lior Wolf. 2018. Non-adversarial unsupervised word translation. In Proceedings of Empirical Methods in Natural Language Processing.
 Jawanpuria et al. (2019) Pratik Jawanpuria, Arjun Balgovind, Anoop Kunchukuttan, and Bamdev Mishra. 2019. Learning multilingual word embeddings in latent metric space: a geometric approach. Transactions of the Association for Computational Linguistics, 7:107–120.
 Joulin et al. (2018) Armand Joulin, Piotr Bojanowski, Tomas Mikolov, Hervé Jégou, and Edouard Grave. 2018. Loss in translation: Learning bilingual word mapping with a retrieval criterion. In Proceedings of Empirical Methods in Natural Language Processing.
 Klementiev et al. (2012) Alexandre Klementiev, Ivan Titov, and Binod Bhattarai. 2012. Inducing cross-lingual distributed representations of words. In Proceedings of International Conference on Computational Linguistics.
 Mikolov et al. (2013a) Tomas Mikolov, Quoc V. Le, and Ilya Sutskever. 2013a. Exploiting similarities among languages for machine translation. arXiv preprint arXiv:1309.4168.
 Mikolov et al. (2013b) Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. 2013b. Distributed representations of words and phrases and their compositionality. In Proceedings of Advances in Neural Information Processing Systems.
 Miller and Charles (1991) George A. Miller and Walter G. Charles. 1991. Contextual correlates of semantic similarity. Language and Cognitive Processes, 6(1):1–28.
 Pennington et al. (2014) Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of Empirical Methods in Natural Language Processing.
 Rubenstein and Goodenough (1965) Herbert Rubenstein and John B Goodenough. 1965. Contextual correlates of synonymy. Communications of the ACM, 8(10):627–633.
 Ruder et al. (2018) Sebastian Ruder, Ryan Cotterell, Yova Kementchedjhieva, and Anders Søgaard. 2018. A discriminative latent-variable model for bilingual lexicon induction. In Proceedings of Empirical Methods in Natural Language Processing.
 Ruder et al. (2017) Sebastian Ruder, Ivan Vulić, and Anders Søgaard. 2017. A survey of cross-lingual embedding models. arXiv preprint arXiv:1706.04902.
 Schönemann (1966) Peter H. Schönemann. 1966. A generalized solution of the orthogonal procrustes problem. Psychometrika, 31(1):1–10.
 Smith et al. (2017) Samuel L. Smith, David H. P. Turban, Steven Hamblin, and Nils Y. Hammerla. 2017. Offline bilingual word vectors, orthogonal transformations and the inverted softmax. In Proceedings of the International Conference on Learning Representations.
 Søgaard et al. (2018) Anders Søgaard, Sebastian Ruder, and Ivan Vulić. 2018. On the limitations of unsupervised bilingual dictionary induction. In Proceedings of the Association for Computational Linguistics.
 Xing et al. (2015) Chao Xing, Dong Wang, Chao Liu, and Yiye Lin. 2015. Normalized word embedding and orthogonal transform for bilingual word translation. In Conference of the North American Chapter of the Association for Computational Linguistics.
 Yang and Powers (2006) Dongqiang Yang and David M. Powers. 2006. Verb similarity on the taxonomy of wordnet. In International WordNet Conference.
 Zhang et al. (2016) Yuan Zhang, David Gaddy, Regina Barzilay, and Tommi Jaakkola. 2016. Ten pairs to tag – multilingual POS tagging via coarse mapping between embeddings. In Conference of the North American Chapter of the Association for Computational Linguistics.
 Zhu and Li (2018) Zhihui Zhu and Xiao Li. 2018. Convergence analysis of alternating nonconvex projections. arXiv preprint arXiv:1802.03889.
Appendix A Proof for Theorem 1
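Our convergence analysis is based on a recent result on alternating non-convex projections. Theorem 1 in the work of Zhu and Li (2018) states that the convergence of alternating projection holds even if the constraint sets are non-convex, as long as the two constraint sets satisfy the following assumption:
Assumption 1.
Let $\mathcal{X}$ and $\mathcal{Y}$ be any two closed semi-algebraic sets, and let $\{(x_k, y_k)\}$ be the sequence of iterates generated by the alternating projection method (e.g., Iterative Normalization). Assume the sequence $\{(x_k, y_k)\}$ is bounded and the sets $\mathcal{X}$ and $\mathcal{Y}$ obey the following properties:
(i) three-point property of $\mathcal{Y}$: there exists a nonnegative function $\delta_\alpha : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}$ with $\alpha > 0$ such that for any $k \geq 1$, we have
$$\delta_\alpha(y_k, y_{k-1}) \geq \alpha \lVert y_k - y_{k-1} \rVert_2^2$$
and
$$\delta_\alpha(y_{k-1}, y_k) + \lVert x_k - y_k \rVert_2^2 \leq \lVert x_k - y_{k-1} \rVert_2^2;$$
(ii) local contraction property of $\mathcal{X}$: there exist $\epsilon > 0$ and $\beta > 0$ such that when $\lVert y_k - y_{k-1} \rVert_2 \leq \epsilon$, we have
$$\lVert P_{\mathcal{X}}(y_k) - P_{\mathcal{X}}(y_{k-1}) \rVert_2 \leq \beta \lVert y_k - y_{k-1} \rVert_2,$$
where $P_{\mathcal{X}}$ is the projection onto $\mathcal{X}$.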
Zhu and Li (2018) only consider sets of vectors, but our constraints are sets of matrices. For ease of exposition, we treat every embedding matrix $X \in \mathbb{R}^{d \times n}$ as a vector by concatenating the column vectors: $X = [x_1, x_2, \dots, x_n]$. The $\ell_2$-norm of the concatenated vector is equivalent to the Frobenius norm of the original matrix, $\lVert X \rVert_F$.
The two operations in Iterative Normalization, Equations (5) and (6), are projections onto two constraint sets, the unit-length set $\mathcal{Y}$ and the zero-mean set $\mathcal{X}$. To prove convergence of Iterative Normalization, we show that $\mathcal{Y}$ satisfies the three-point property, and $\mathcal{X}$ satisfies the local contraction property.
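Three-point property of $\mathcal{Y}$.
For any $Y' \in \mathcal{Y}$ and $X \in \mathbb{R}^{d \times n}$, let $Y$ be the projection of $X$ onto the constraint set $\mathcal{Y}$ with Equation (5). The columns of $Y$ and $Y'$ have the same length, so we have
$$\lVert X - Y' \rVert^2 - \lVert X - Y \rVert^2 = \sum_{i=1}^{n} \left( \lVert x_i - y'_i \rVert_2^2 - \lVert x_i - y_i \rVert_2^2 \right) = \sum_{i=1}^{n} \left( 2 x_i^\top y_i - 2 x_i^\top y'_i \right). \tag{7}$$
Since $Y$ is the projection of $X$ onto the unit-length set with Equation (5); i.e., $y_i = x_i / \lVert x_i \rVert_2$, we can rewrite Equation (7).
$$\lVert X - Y' \rVert^2 - \lVert X - Y \rVert^2 = \sum_{i=1}^{n} 2 \lVert x_i \rVert_2 \left( y_i^\top y_i - y_i^\top y'_i \right). \tag{8}$$
All columns of $Y$ and $Y'$ are unit-length. Therefore, we can further rewrite Equation (8).
$$\lVert X - Y' \rVert^2 - \lVert X - Y \rVert^2 = \sum_{i=1}^{n} \lVert x_i \rVert_2 \left( \lVert y_i \rVert_2^2 + \lVert y'_i \rVert_2^2 - 2 y_i^\top y'_i \right) = \sum_{i=1}^{n} \lVert x_i \rVert_2 \, \lVert y_i - y'_i \rVert_2^2.$$
Let $l = \min_i \lVert x_i \rVert_2$ be the minimum length of the columns in $X$. We have the following inequality:
$$\lVert X - Y' \rVert^2 - \lVert X - Y \rVert^2 \geq \sum_{i=1}^{n} l \, \lVert y_i - y'_i \rVert_2^2 = l \, \lVert Y - Y' \rVert^2.$$
From our non-zero assumption, the minimum column length $l$ is always positive. Let $l_k$ be the minimum column length of the embedding matrix $X^{(k)}$ after the $k$-th iteration. It follows that $\mathcal{Y}$ satisfies the three-point property with $\alpha = \min_k \{ l_k \}$ and $\delta_\alpha(Y, Y') = \alpha \lVert Y - Y' \rVert^2$.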
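The key inequality of this argument can be sanity-checked numerically: for a matrix $X$ with non-zero columns, its column-normalized projection $Y$, and any other matrix $Y'$ with unit-length columns, the gap $\lVert X - Y' \rVert^2 - \lVert X - Y \rVert^2$ equals $\sum_i \lVert x_i \rVert_2 \lVert y_i - y'_i \rVert_2^2$ and is therefore at least the minimum column length of $X$ times $\lVert Y - Y' \rVert^2$. The random matrices below are assumptions used only for this check.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 30))                          # arbitrary matrix with non-zero columns
Y = X / np.linalg.norm(X, axis=0, keepdims=True)      # projection onto unit-length columns
Yp = rng.normal(size=(5, 30))
Yp /= np.linalg.norm(Yp, axis=0, keepdims=True)       # another point in the unit-length set

col_len = np.linalg.norm(X, axis=0)
gap = np.linalg.norm(X - Yp) ** 2 - np.linalg.norm(X - Y) ** 2   # Frobenius norms
exact = (col_len * np.linalg.norm(Y - Yp, axis=0) ** 2).sum()
lower_bound = col_len.min() * np.linalg.norm(Y - Yp) ** 2
```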
Local contraction property of $\mathcal{X}$.
The zero-mean constraint set $\mathcal{X}$ is convex and closed: if two matrices $X$ and $Y$ both have zero mean, their linear interpolation $\lambda X + (1 - \lambda) Y$ must also have zero mean for any $0 < \lambda < 1$. Projections onto convex sets in a Hilbert space are contractive (Browder, 1967), and therefore $\mathcal{X}$ satisfies the local contraction property with any positive $\epsilon$ and $\beta = 1$.
Appendix B Results on All Languages
Table 3 shows word translation accuracies on all target languages. Relative to unnormalized embeddings, Iterative Normalization improves accuracy on every language.
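Table 3: Word translation accuracy (%) from English to all thirty-nine target languages. Columns 2-4 report Procrustes, columns 5-7 report Procrustes + refine, and columns 8-10 report RCSLS, each with no normalization (None), centering and length normalization (C+L), and Iterative Normalization (IN).
Target  None  C+L  IN  None  C+L  IN  None  C+L  IN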
af  26.3  28.3  29.7  27.7  28.7  30.4  9.3  28.6  29.3 
ar  36.5  37.1  37.9  36.5  37.1  37.9  18.4  40.5  41.5 
bs  22.3  23.5  24.4  23.3  23.9  26.6  5.4  25.5  26.6 
ca  65.9  67.6  68.9  66.5  67.6  68.9  43.0  68.9  69.5 
cs  54.0  54.7  55.3  54.0  54.7  55.7  29.9  57.8  58.2 
da  54.0  54.9  58.4  56.8  59.3  60.9  19.2  58.3  60.5 
de  73.5  74.6  75.5  74.3  75.2  76.0  43.6  77.5  78.1 
el  44.0  44.9  47.5  44.6  45.9  47.9  14.0  47.1  48.5 
es  81.4  81.3  81.5  81.9  82.1  82.5  50.5  83.6  83.9 
et  31.9  34.5  36.1  31.9  35.3  36.4  8.1  37.3  39.4 
fa  33.1  33.7  37.3  33.1  34.1  37.3  5.9  37.5  38.3 
fi  47.6  48.5  50.9  47.6  50.1  51.1  20.9  52.3  53.3 
fr  81.1  81.3  81.7  82.1  82.7  82.4  53.1  83.9  83.9 
he  40.2  43.1  43.7  40.2  43.1  43.7  13.1  49.7  50.1 
hi  33.3  34.0  36.7  33.6  34.9  37.7  5.0  36.2  38.0 
hr  37.0  37.8  40.2  37.6  37.8  40.2  14.5  41.1  42.6 
hu  51.8  54.1  55.5  53.3  54.1  56.1  11.7  57.3  58.2 
id  65.6  65.7  67.9  67.7  68.4  70.3  24.8  68.9  70.0 
it  76.2  76.6  76.6  77.5  78.1  78.1  48.4  78.8  79.1 
ja  1.7  13.1  44.3  1.7  13.1  44.3  14.6  16.1  56.3 
ko  31.5  32.1  33.9  31.5  32.1  33.9  6.4  37.5  37.5 
lt  22.5  22.8  23.2  22.5  22.8  23.3  7.6  23.3  23.5 
lv  23.6  24.9  26.1  23.6  24.9  26.1  10.1  28.3  28.7 
ms  44.0  45.4  48.9  46.5  48.3  51.1  19.9  49.1  50.2 
nl  72.8  73.7  74.1  73.8  75.1  75.8  46.7  75.6  75.8 
pl  58.2  60.2  60.1  58.5  60.2  60.4  39.4  62.4  62.5 
pt  79.5  79.7  79.9  79.9  81.0  81.2  63.1  81.1  81.7 
ro  58.1  60.5  61.8  59.9  60.5  62.5  27.1  61.9  63.3 
ru  51.7  52.1  52.1  51.7  52.1  52.1  26.6  57.1  57.9 
sk  38.0  39.3  40.4  38.0  39.3  41.7  13.3  41.5  42.3 
sl  32.5  34.3  36.7  32.5  34.4  36.7  12.3  36.0  37.9 
sq  23.5  25.1  27.3  23.5  25.1  27.3  4.4  26.5  27.3 
sv  58.7  59.6  60.7  60.9  61.2  62.6  35.6  63.8  63.9 
ta  15.1  15.5  16.8  15.1  15.5  17.7  6.7  16.3  17.1 
th  22.5  23.3  22.9  22.5  23.3  22.9  9.4  23.7  23.9 
tr  44.9  46.5  48.7  46.3  48.7  51.7  18.3  50.7  52.4 
uk  34.8  35.9  36.3  35.5  35.9  36.5  18.8  40.7  40.8 
vi  41.3  42.1  43.7  42.1  42.7  44.2  14.2  43.3  43.9 
zh  32.5  42.3  44.2  32.5  42.3  44.2  17.1  45.1  48.6 
Average  44.7  46.3  48.4  45.3  47.0  49.1  21.8  49.0  50.9 