MorphNet: A sequence-to-sequence model that combines morphological analysis and disambiguation
We introduce MorphNet, a single model that combines morphological analysis and disambiguation. Traditionally, analysis of morphologically complex languages has been performed in two stages: (i) A morphological analyzer based on finite-state transducers produces all possible morphological analyses of a word, (ii) A statistical disambiguation model picks the correct analysis based on the context for each word. MorphNet uses a sequence-to-sequence recurrent neural network to combine analysis and disambiguation. We show that when trained with text labeled with correct morphological analyses, MorphNet obtains state-of-the-art or comparable results for nine different datasets in seven different languages.
Erenay Dayanık, Ekin Akyürek, Deniz Yuret
Koç University Artificial Intelligence Laboratory, İstanbul, Turkey
MorphNet (the code will be released upon publication) is a sequence-to-sequence recurrent neural network model that takes sentences in plain text as input and attempts to produce the correct morphological analysis of each word as output. Traditional methods, e.g. [Oflazer and Tür, 1997], first produce all possible analyses of a word using finite-state transducers and then perform disambiguation using statistical or rule-based methods. In contrast, MorphNet combines analysis and disambiguation in a single model and obtains state-of-the-art or comparable results in producing the correct morphological analysis of each word.
Table 1: possible analyses of the word “masalı” in context (columns: Word, Analysis & Context).
Morphological analysis identifies the structure of words and word-parts (morphemes) such as stems, prefixes and suffixes. For example, Table 1 shows the three possible analyses for the ambiguous Turkish word “masalı”: the accusative and possessive forms of the root “masal” (tale) and the +With form of the root “masa” (table) are expressed with the same surface form [Oflazer, 1994]. Producing the correct analysis is essential for downstream syntactic and semantic processing.
The importance of morphological analysis varies significantly by language family. Analytic languages like Chinese and near-analytic languages like English show a low ratio of morphemes to words and express most grammatical relations using function words or word order. According to [Baayen et al., 1995], less than 10% of word tokens in English text carry an affix, and fewer than 30 word forms have the type of morphological ambiguity observed in the “masalı” example (e.g. leaves = leaf-f+ves vs. leave+s). Thus, for most purposes, simple table lookup tends to be sufficient for English morphological analysis.
In contrast, agglutinative languages like Turkish and Finnish tend to have a high ratio of morphemes to words and express many grammatical relations using affixes. In the Turkish training data used by [Yuret and Türe, 2006], 48% of the word tokens carry an affix and 59% of these are morphologically ambiguous. For downstream syntactic analysis, [Oflazer et al., 1999] observes that words in Turkish can have dependencies to any one of the inflectional groups of a derived word. For example, in “mavi masalı oda” (room with blue table) the adjective “mavi” (blue) modifies the noun root “masa” (table) even though the final part of speech of “masalı” is an adjective. Accurate morphological analysis and disambiguation are important prerequisites for further syntactic and semantic processing in agglutinative languages.
Previous work has separated the tasks of morphological analysis and morphological disambiguation. Morphological analysis is taken to be the task of producing all possible morphological parses of a given word. Morphological analyzers have typically been implemented as Finite State Transducers (FSTs) with language specific, manually generated rules. Morphological disambiguation is the task of selecting the correct parse for a given word in a given context among all possible parses. Various rule based, statistical and neural network models have been implemented for morphological disambiguation. These models are described in Section 2.
In this work, our motivation is to eliminate the need for separate morphological analyzer and disambiguator components and to provide a single, easy-to-use model. We present MorphNet, a sequence-to-sequence [Sutskever et al., 2014] model for morphological analysis and disambiguation. Once trained, the model can be used either as a stand-alone application or with an external analyzer to eliminate errors for unambiguous tokens. The model uses three Long Short-Term Memory (LSTM) [Hochreiter and Schmidhuber, 1997] encoders: a character-based encoder to obtain word embeddings, a word-based bidirectional LSTM encoder to obtain context embeddings, and a unidirectional LSTM encoder to obtain output embeddings of preceding word analyses. The decoder consists of a two-layer LSTM model. The first layer’s hidden state is initialized with the context embedding, and the second layer’s hidden state is initialized with the combination of the word and output embeddings. The decoder learns to predict the correct morphological analysis, including root characters and morphemes. Figure 1 gives the model architecture.
When evaluating our model on available Turkish datasets, we realized that existing datasets suffer from low accuracy and small test sets, which makes model comparison difficult due to noise and statistical significance problems. To address these issues we created a new dataset, TrMor2018 (the data will be released upon publication), which contains 460K tagged tokens and has been verified by trained annotators to be 97%+ accurate. We report our results on this new dataset as well as previously available datasets.
The main contributions of this work are:
A new model that performs morphological analysis and disambiguation together.
Release of a new morphological disambiguation dataset for Turkish.
State-of-the-art or comparable results on nine different datasets in seven different languages.
2 Related Work
In this section, we summarize the previous work on Morphological Analysis, Morphological Disambiguation and Morphological Tagging.
2.1 Morphological Analysis
Morphological analysis is generally performed by finite-state transducers (FSTs), which produce all possible parses for a given word [Koskenniemi, 1983]. The analyzers are language dependent rule based systems that typically consist of a lexicon, two-level phonological rules, and finite state transducers that encode morphotactics [Koskenniemi, 1981, Karttunen and Wittenburg, 1983]. The first rule based analyzer for Turkish was developed in [Oflazer, 1994]; we used an updated version of this analyzer when creating our new Turkish dataset. Other analyzers for Turkish include [Eryiğit and Adalı, 2004] and [Çöltekin, 2010].
2.2 Morphological Disambiguation
Previous work on morphological disambiguation can be grouped into four broad categories in terms of the technique they use. These categories are rule based, statistical, hybrid and neural network based approaches.
The rule-based approaches [Karlsson et al., 1995, Oflazer and Kuruöz, 1994, Oflazer and Tür, 1996, Daybelge and Çiçekli, 2007, Daoud, 2009] exploit hand-crafted rules to select the correct parse among the candidates. The rules, typically language specific, are designed such that the model can capture the relationship between the context of the word that is subject to disambiguation and all its possible parses. Besides selecting the correct analysis, some systems [Oflazer and Tür, 1996] use these rules to eliminate the incorrect parses.
Statistical approaches try to disambiguate the target word according to the statistics calculated on the corpus they use. For example, [Hakkani-Tür et al., 2002] breaks up the morphosyntactic tags into inflectional groups to handle a large set of tags and then the model assigns a probability to each morphosyntactic tag by considering statistics over the individual inflection groups in a trigram model. [Yuret and Türe, 2006] uses decision lists to vote on each of the potential parses of a word. For each tag in the corpus, a different decision list is learnt using the Greedy Prepend Algorithm [Yuret and de la Maza, 2006].
Hybrid approaches combine hand-crafted rules with statistical methods. [Hajič et al., 2007] employs a rule-based component to reduce the number of possible parses and then runs a statistical Part of Speech (POS) tagger to disambiguate words in the Czech language.
Recently, a number of neural network based approaches [Yıldız et al., 2016, Shen et al., 2016, Toleu et al., 2017] have also been applied to morphological disambiguation. [Yıldız et al., 2016] proposed an architecture based on a convolutional neural network (CNN). The CNN is used to create a representation of the target word from the root of the word and its morpheme features. Using this representation and the ground truth annotations of previous words, the model predicts the correct analysis. In contrast to [Yıldız et al., 2016], [Shen et al., 2016] exploit LSTM based neural network architectures to capture the context and the relationship between the target word and its surroundings, customizing the network architecture for the input language. [Toleu et al., 2017] propose a neural network model for morphological disambiguation of Kazakh and Turkish.
2.3 Morphological Tagging
In contrast to the two stage systems described above which use separate modules for morphological analysis and disambiguation, morphological tagging attempts to solve both problems at once and assign the correct tag to a given word without the aid of an analyzer that produces possible parses. This is the approach we take in MorphNet. Prior to MorphNet, [Mueller et al., 2013] attempted to solve the morphological tagging problem by employing a model based on Conditional Random Fields (CRFs) [Lafferty et al., 2001], and [Heigold et al., 2017] proposed two morphological tagging architectures based on neural networks. Their best performing architecture, similar to MorphNet, uses unidirectional and bidirectional LSTMs and takes only raw sentences as input. However, our work differs from theirs in two important ways. First, given a word in a context, they predict a morphological tag as a complete unit, rather than as a combination of component features. Consequently, their model can only produce analyses observed in the training data. In contrast, MorphNet produces the morphological tag of the target word a feature at a time. Second, their system ignores the root of the target word and predicts only the morphological features, whereas MorphNet generates the stem as well as the morphological tags.
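The difference between whole-tag and feature-at-a-time prediction can be made concrete with a toy example (the tags below are illustrative, not taken from the actual training data): a whole-tag model can only emit tags observed during training, whereas a feature-level decoder can compose a tag it never saw.

```python
train_tags = ["Noun+A3sg+Pnon+Nom", "Noun+A3sg+P3sg+Nom", "Noun+A3pl+Pnon+Acc"]

# Whole-tag tagger: the output vocabulary is the set of complete tags seen in training.
tag_vocab = set(train_tags)
target = "Noun+A3sg+Pnon+Acc"       # a valid tag never observed as a unit
print(target in tag_vocab)          # False: a whole-tag model cannot produce it

# Feature-at-a-time decoder: the output vocabulary is the set of individual features.
feat_vocab = {f for tag in train_tags for f in tag.split("+")}
print(all(f in feat_vocab for f in target.split("+")))  # True: composable
```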
3 Model
MorphNet produces the morphological analysis (stem plus morphological features) for each word in a given sentence. It is based on the sequence-to-sequence encoder-decoder network approach proposed by [Sutskever et al., 2014] for machine translation. However, we use three distinct encoders to create embeddings of various input features. First, a word encoder creates an embedding for each word based on its characters. Second, a context encoder creates an embedding for the context of each word based on the word embeddings of its left and right neighbors. Third, an output encoder creates an output embedding using the analyses of previous words. These embeddings are fed to the decoder, which produces the stem and the morphological features of a target word. In the following subsections, we explain each component in detail.
3.1 Input Output
The input to the model consists of an $N$-word sentence $S = [w_1, \ldots, w_N]$, where $w_i$ is the $i$'th word in the sentence. Each word is input as a sequence of characters $w_i = [a_1, \ldots, a_{L_i}]$, $a_k \in \mathcal{A}$, where $\mathcal{A}$ is the set of alphanumeric characters and $L_i$ is the number of characters in word $w_i$.
The output for each word consists of a stem, a part-of-speech tag and a set of morphological features, e.g. “masal+Noun+A3sg+P3sg+Nom” for “masalı”. The stem is produced one character at a time, and the morphological information is produced one feature at a time. A sample output for a word thus looks like $y_i = [a_1, \ldots, a_{R_i}, f_1, \ldots, f_{M_i}]$, where $a_k \in \mathcal{A}$ is an alphanumeric character in the stem, $R_i$ is the length of the stem, $M_i$ is the number of features, and $f_k \in \mathcal{F}$ is a morphological feature from a feature set such as $\mathcal{F} = \{\mathrm{Noun}, \mathrm{A3sg}, \mathrm{P3sg}, \mathrm{Nom}, \ldots\}$.
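As an illustrative sketch (the helper names are ours, not the paper's), the input and output serialization described above can be written as:

```python
def encode_input(word):
    """Input side: a word is simply the sequence of its characters."""
    return list(word)

def encode_output(analysis):
    """Output side: stem characters followed by one token per feature.
    Assumes the 'stem+Feat1+Feat2+...' format used in the paper's examples."""
    stem, *feats = analysis.split("+")
    return list(stem) + ["+" + f for f in feats]

print(encode_input("masalı"))
# ['m', 'a', 's', 'a', 'l', 'ı']
print(encode_output("masal+Noun+A3sg+P3sg+Nom"))
# ['m', 'a', 's', 'a', 'l', '+Noun', '+A3sg', '+P3sg', '+Nom']
```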
3.2 Word Encoder
We map each character $a \in \mathcal{A}$ to an $A$-dimensional character embedding vector $\mathbf{e}_a \in \mathbb{R}^A$. The word encoder takes each word $w_i$ and processes its character embeddings from left to right, producing hidden states $[\mathbf{h}_1, \ldots, \mathbf{h}_{L_i}]$, where $\mathbf{h}_k = \mathrm{LSTM}(\mathbf{e}_{a_k}, \mathbf{h}_{k-1})$. The final hidden state $\mathbf{h}_{L_i}$ is used as the word embedding $\mathbf{e}_{w_i}$ for word $w_i$.
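A minimal NumPy sketch of this character-level word encoder follows; the sizes, toy alphabet, random parameters and single-layer LSTM cell are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
A, H = 8, 16                               # char-embedding and hidden sizes (illustrative)
alphabet = "abcdefghijklmnopqrstuvwxızş"   # toy alphabet covering the examples
E = rng.normal(size=(len(alphabet), A))    # character embedding table

# One set of LSTM parameters acting on [x; h]: input, forget, cell, output gates.
W = rng.normal(scale=0.1, size=(4 * H, A + H))
b = np.zeros(4 * H)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c):
    z = W @ np.concatenate([x, h]) + b
    i, f, g, o = np.split(z, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

def word_embedding(word):
    """Run the character LSTM left to right; the final hidden state is e_w."""
    h, c = np.zeros(H), np.zeros(H)
    for ch in word:
        h, c = lstm_step(E[alphabet.index(ch)], h, c)
    return h

e_w = word_embedding("masalı")
print(e_w.shape)  # (16,)
```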
3.3 Context Encoder
We use a bidirectional LSTM as the context encoder. Its inputs are the word embeddings produced by the word encoder. The context encoder processes them in both directions and constructs a unique context embedding for each target word in the sentence. For a word $w_i$, we define its context embedding $\mathbf{c}_i = [\overrightarrow{\mathbf{h}}_i ; \overleftarrow{\mathbf{h}}_i]$ as the concatenation of the forward and backward hidden states produced after the forward and backward LSTMs process the word embedding $\mathbf{e}_{w_i}$. In Figure 1(c), the creation of the context vector for the target word “elini” is illustrated.
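The bidirectional pass can be sketched as follows; for brevity, a plain tanh RNN stands in for the LSTM cells (an assumption of this sketch), and all sizes and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
D, H = 16, 16                                  # word-embedding and hidden sizes (illustrative)
Wf = rng.normal(scale=0.1, size=(H, D + H))    # forward cell parameters
Wb = rng.normal(scale=0.1, size=(H, D + H))    # backward cell parameters

def rnn(xs, W):
    """A plain tanh RNN stands in for the paper's LSTM."""
    h, hs = np.zeros(H), []
    for x in xs:
        h = np.tanh(W @ np.concatenate([x, h]))
        hs.append(h)
    return hs

def context_embeddings(word_embs):
    fwd = rnn(word_embs, Wf)                   # left-to-right states
    bwd = rnn(word_embs[::-1], Wb)[::-1]       # right-to-left states, re-aligned
    # Context of word i = concatenation of forward and backward states at position i.
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

sent = [rng.normal(size=D) for _ in range(4)]  # word embeddings for a 4-word sentence
ctx = context_embeddings(sent)
print(len(ctx), ctx[0].shape)  # 4 (32,)
```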
3.4 Output Encoder
The output encoder captures information about the morphological features of words processed prior to each target word. For example, in order to assign the correct possessive marker to the word “masalı” (tale) in “babamın masalı” (my father’s tale), it would be useful to know that the previous word “babamın” (my father) has a genitive marker. During training we use the gold morphological features, during testing we use the output of the model.
The output encoder uses only the morphological features, not the stem characters, of the previous words as input. We map each morphological feature $f$ to a $B$-dimensional feature embedding vector $\mathbf{e}_f \in \mathbb{R}^B$. A unidirectional LSTM is run over the feature embeddings of the sentence up to the target word, producing hidden states $\mathbf{h}_k = \mathrm{LSTM}(\mathbf{e}_{f_k}, \mathbf{h}_{k-1})$. The final hidden state preceding the target word is used as the output embedding $\mathbf{e}_{o_i}$ for word $w_i$.
The decoder is implemented as a 2-layer LSTM network that outputs the correct tag for a single target word. By conditioning on the three input embeddings and its own hidden state, the decoder learns to generate $y_i = [y_{i,1}, \ldots, y_{i,T_i}]$, where $y_i$ is the correct tag of the target word $w_i$ in sentence $S$, each $y_{i,t}$ is either a stem character or a morphological feature token, and $T_i$ is the total number of output tokens (stem + features) for word $w_i$. The first layer of the decoder is initialized with the context embedding, and the second layer with the word and output embeddings combined by element-wise summation:
$\mathbf{h}^{(1)}_0 = \mathbf{c}_i, \qquad \mathbf{h}^{(2)}_0 = \mathbf{e}_{w_i} \oplus \mathbf{e}_{o_i}$
where $\oplus$ is element-wise summation.
We parameterize the distribution over possible characters and morphological features at each time step as
$p(y_{i,t}) = \mathrm{softmax}(W \mathbf{h}_t + \mathbf{b})$
where $W \in \mathbb{R}^{|V| \times H}$, $\mathbf{b} \in \mathbb{R}^{|V|}$, and $V$ is the set of characters and morphological features in the output vocabulary.
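The output layer can be sketched as follows, with a toy vocabulary mixing stem characters and feature tokens; the sizes and random parameters are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(2)
H = 16                                                  # decoder hidden size (illustrative)
vocab = list("abcdefgh") + ["+Noun", "+Verb", "+A3sg", "+Acc"]  # toy V: chars + features
W = rng.normal(scale=0.1, size=(len(vocab), H))
b = np.zeros(len(vocab))

h_t = rng.normal(size=H)     # decoder hidden state at step t
p = softmax(W @ h_t + b)     # one distribution over characters and features
print(vocab[int(p.argmax())])
```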
4 Datasets
We evaluate MorphNet on several different languages and datasets. First we describe the multilingual datasets we used from the Universal Dependency treebanks. We then describe two existing datasets for Turkish and introduce our new dataset, TrMor2018.
4.1 Universal Dependency Datasets
We tested MorphNet on a diverse set of languages selected from Universal Dependency (UD) [Nivre et al., 2016] treebanks. Table-2 summarizes the corpus statistics for each language. Specifically, we use the CoNLL-U format (http://universaldependencies.org/format.html) for the input files, take column 2 (FORM) as input and predict columns 4 (UPOSTAG) and 6 (FEATS).
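Reading these columns from a CoNLL-U token line can be sketched as below; the example line and its annotation are hypothetical, and the helper name is ours.

```python
def read_conllu_word(line):
    """Split a tab-separated CoNLL-U token line and return
    FORM (column 2), UPOSTAG (column 4) and FEATS (column 6),
    using the 1-indexed column numbers of the CoNLL-U format."""
    cols = line.rstrip("\n").split("\t")
    return cols[1], cols[3], cols[5]

# A hypothetical token line (10 tab-separated columns).
line = "1\tmasalı\tmasal\tNOUN\t_\tCase=Acc|Number=Sing\t0\troot\t_\t_"
print(read_conllu_word(line))
# ('masalı', 'NOUN', 'Case=Acc|Number=Sing')
```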
4.2 Turkish Datasets
For Turkish we evaluate our model on two existing datasets. The first, TrMor2006, was first used in [Yuret and Türe, 2006]. Its training set was disambiguated semi-automatically and has limited accuracy. Its test set was hand-tagged but is too small (862 tokens) to reliably distinguish between models of similar accuracy. We randomly extracted 100 sentences from the training set to use as a development set while training our model.
The second dataset we used, TrMor2016, was prepared by [Yıldız et al., 2016]. Its training set is the same as TrMor2006, but they manually retagged a subset of the training set containing roughly 20,000 tokens to serve as a larger test set (T20K in the table). Unfortunately, they did not exclude the T20K sentences from the training set in their experiments, and they do not provide accuracy or inter-annotator-agreement figures for the new test set. Table-3 gives the statistics for these datasets.
We also evaluate MorphNet on a new dataset, TrMor2018, that we release with this paper. Our goal was to address the accuracy problems of the TrMor2006 training set and to provide a larger, disjoint test set that can more reliably distinguish between models of similar accuracy. The new dataset consists of 34673 sentences and 460663 words in total. Similar to [Yuret and Türe, 2006], it was annotated semi-automatically in multiple passes. To estimate the noise level of the dataset, we randomly selected a subset of 2090 sentences and 28909 words and manually disambiguated it: two annotators annotated each word independently, and the final morphological tag of each word was assigned through adjudication by a third. Comparing the manually disambiguated subset with the semi-automatic results, we found the noise level of the dataset to be approximately 3%. In our experiments, we split TrMor2018 into train, development and test sets by randomly selecting 80%, 10% and 10% of the sentences, respectively. Table-4 shows the statistics for TrMor2018.
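The sentence-level 80/10/10 split can be sketched as below (the seed and helper name are ours, for reproducibility of the sketch only):

```python
import random

def split_sentences(sentences, seed=0):
    """Shuffle and split a corpus into 80% train / 10% dev / 10% test by sentence."""
    order = sentences[:]                     # leave the input list untouched
    random.Random(seed).shuffle(order)
    n = len(order)
    n_train, n_dev = int(0.8 * n), int(0.1 * n)
    return (order[:n_train],
            order[n_train:n_train + n_dev],
            order[n_train + n_dev:])

train, dev, test = split_sentences([f"s{i}" for i in range(1000)])
print(len(train), len(dev), len(test))  # 800 100 100
```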
Table-5: accuracy (%) of previous systems on the Turkish datasets (A: ambiguous, T: total).

|Model                  |TrMor2006 (A)|TrMor2006 (T)|T20K (A)|T20K (T)|
|[Yuret and Türe, 2006] |89.43        |95.82        |-       |-       |
|[Sak et al., 2007]     |90.76        |96.28        |-       |-       |
|[Yıldız et al., 2016]  |-            |-            |84.12   |92.20   |
|[Shen et al., 2016]    |91.03        |96.41        |-       |-       |
5 Experiments and Results
In this section we describe our training procedure, give experimental results, and provide an ablation analysis of our model.
All LSTM units in our model use a fixed number of hidden units, and the character embedding vectors in the word encoder and the output embedding vectors in the decoder likewise have fixed sizes. We initialize model parameters with Xavier initialization [Glorot and Bengio, 2010].
We train our networks using back-propagation through time with stochastic gradient descent and decay the learning rate according to development accuracy: if development set accuracy does not improve for 6 consecutive epochs, we decay the learning rate by a factor of 0.8 and apply early stopping. We apply dropout with rate 0.3 before and after each LSTM unit in MorphNet, as well as on the embedding layers.
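The decay-and-early-stop schedule can be sketched as below; decaying on every epoch without improvement is our reading of the schedule, and the initial learning rate is a placeholder (the paper's value is not reproduced here).

```python
def train_loop(dev_accuracies, lr0=1.0, decay=0.8, patience=6):
    """Decay the learning rate by `decay` whenever dev accuracy fails to
    improve, and stop after `patience` consecutive epochs without
    improvement. Returns (stop_epoch, final_lr, best_accuracy)."""
    lr, best, bad = lr0, -1.0, 0
    for epoch, acc in enumerate(dev_accuracies):
        # ... one epoch of SGD with rate `lr` would run here ...
        if acc > best:
            best, bad = acc, 0
        else:
            bad += 1
            lr *= decay
            if bad >= patience:
                return epoch, lr, best  # early stop
    return len(dev_accuracies) - 1, lr, best

# Accuracy plateaus after epoch 1, so training stops 6 epochs later.
print(train_loop([0.90, 0.91, 0.91, 0.91, 0.91, 0.91, 0.91, 0.91]))
```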
5.2 Turkish Results
Table-5 shows the results of several systems on different Turkish datasets. The (W/) column shows the performance when unambiguous tokens are tagged perfectly; the (W/O) column shows the accuracy when unambiguous tokens are also tagged by the model. We make this distinction to obtain results comparable to older models: when MorphNet tags unambiguous words, it does not achieve 100% accuracy, whereas all of the models we compare with take the output of a morphological analyzer as input and predict the correct analysis among its parses, so they always get unambiguous tokens right. For the TrMor2006 dataset, we report 96.86% total accuracy, the best result to date, although the difference is not statistically significant. For ambiguous words, MorphNet achieves 92.86% accuracy, showing its ability to simulate both analyzer and disambiguator internally. Perhaps more importantly, the standalone MorphNet (W/O) performs better than the disambiguators even though it does not rely on an external analyzer. On the test set with 20K tokens, we achieve slightly better performance than [Yıldız et al., 2016] on ambiguous words. We hope that the new TrMor2018 dataset will allow for better system comparison due to its high accuracy and large test set.
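The two evaluation modes can be sketched as follows (toy tags; in practice the `ambiguous` flags would come from an external analyzer, and the helper name is ours):

```python
def accuracies(gold, pred, ambiguous):
    """Total accuracy under the two evaluation modes.
    W/ : unambiguous tokens are assumed correct (an analyzer supplies them).
    W/O: the model must tag every token itself."""
    n = len(gold)
    correct = [g == p for g, p in zip(gold, pred)]
    w_o = sum(correct) / n
    w = sum(c or not a for c, a in zip(correct, ambiguous)) / n
    return w, w_o

gold = ["A", "B", "C", "D"]
pred = ["A", "B", "x", "y"]          # the model errs on tokens 3 and 4
amb  = [True, True, True, False]     # token 4 is unambiguous
print(accuracies(gold, pred, amb))   # (0.75, 0.5)
```

In the W/ mode the error on the unambiguous fourth token is forgiven, which is why the first number is higher.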
5.3 Multilingual Results
Table-6: MorphNet vs. [Heigold et al., 2017] and MarMoT [Mueller et al., 2013] (http://cistern.cis.lmu.de/marmot).
To show that MorphNet is language-agnostic and can easily be applied to languages other than Turkish, we evaluated our model on 8 different languages and compared the results with two other methods: another neural network based approach [Heigold et al., 2017] and a strong non-neural baseline [Mueller et al., 2013]. Table-6 shows the test set performance on these languages. For Turkish (Tr) we obtain the best score with 89.54% accuracy. For Bulgarian (Bg) we perform 0.66% better than the non-neural baseline, while [Heigold et al., 2017] achieves the best score with an additional 0.21% gain. We observe a 1.52% improvement over [Heigold et al., 2017] when evaluating MorphNet on Romanian (Ro). French (Fr) is the only language where the non-neural model achieves the best score; this may be a result of the limited morphological complexity of the language, or of our model not being specifically optimized for French. Only for Hungarian (Hu) do we obtain a significantly lower test set accuracy; we believe this is related to the size of the Hungarian training set, the smallest of the eight languages. We cannot compare our results on Catalan (Ca), Italian (It) and Danish (Da), since we could not find any previously reported results.
A: ambiguous accuracy, U: unambiguous accuracy, T: total accuracy
5.4 Ablation Analysis
In this section, we analyze the contributions of the individual components of the full model. In Figure-2, components identical to the full model are shown as gray boxes. In the following ablation studies, we remove individual modules and measure the change in model performance. We use the TrMor2018 dataset in two configurations. In the first, we use only a small portion of the training data, to show the difference between ablation models in a regime far from the noise level of the dataset: we randomly sample 5 different subsets of the training data, each approximately 10% of the original training set, train all ablation models and MorphNet on these five subsets separately, evaluate each using the original development and test sets, and report the average performance. In the second configuration, we use all the available training data. Table-7 shows the test set accuracy of each ablated model as well as the full model.
We start our ablation studies by removing both the context encoder and the output encoder. The resulting model (seq2seq) is a standard encoder-decoder model which is only able to employ word embeddings (i.e. no context information). In that case, we record 91.99% total accuracy.
We then extend this model by reattaching the context encoder (seq2seq+context). We observe 0.93% and 1.03% increases in ambiguous word accuracy in the two configurations, depending on the amount of training data used.
This is the minimal version of MorphNet that is capable of learning more than just the most frequent morphological analysis of each wordform. For example, Table-8 shows the words in the training set that share the stem ”röportaj” (interview), together with their analyses. We tested both models on the never-before-seen word ”röportajı” (in the sentence ”Benden bu röportajı yalanlamamı rica etti.”, ”I was asked to deny the interview”). While seq2seq failed by selecting the most frequent tag of ”röportaj”, seq2seq+context successfully disambiguated the target word as röportaj+Noun+A3sg+Pnon+Acc.
As a next step, we also add the output encoder (seq2seq+context+output), which gives our full MorphNet model. It improves disambiguation performance on ambiguous tokens by 0.21% in the first configuration and 0.61% in the second.
These experiments show that each of the model components has a significant individual contribution to the overall performance.
6 Conclusion
In this paper, we present MorphNet, a language-independent neural sequence-to-sequence model, and TrMor2018, a new Turkish dataset for morphological disambiguation. MorphNet employs two unidirectional LSTMs to obtain word and output embeddings and a bidirectional LSTM to obtain the context embedding of the target word. It outputs the stem of the word one character at a time, followed by the morphological features, one feature at a time. We evaluated MorphNet on eight different languages and obtained state-of-the-art or comparable results for all but one. We also release a new morphology dataset for Turkish, which is semi-automatically generated and manually confirmed to be 97%+ accurate.
- [Baayen et al., 1995] R Harald Baayen, Richard Piepenbrock, and Leon Gulikers. 1995. The celex lexical database (release 2). Distributed by the Linguistic Data Consortium, University of Pennsylvania.
- [Çöltekin, 2010] Çağrı Çöltekin. 2010. A freely available morphological analyzer for turkish. In LREC, volume 2, pages 19–28.
- [Daoud, 2009] Daoud Daoud. 2009. Synchronized morphological and syntactic disambiguation for arabic. Advances in Computational Linguistics, 41:73–86.
- [Daybelge and Çiçekli, 2007] Turhan Daybelge and İlyas Çiçekli. 2007. A rule-based morphological disambiguator for turkish. In Proceedings of Recent Advances in Natural Language Processing (RANLP 2007), Borovets, pages 145–149.
- [Eryiğit and Adalı, 2004] Gülşen Eryiğit and Eşref Adalı. 2004. An affix stripping morphological analyzer for turkish. In Proceedings of the IASTED International Conference on Artificial Intelligence and Applications, Innsbruck, Austria, pages 299–304.
- [Glorot and Bengio, 2010] Xavier Glorot and Yoshua Bengio. 2010. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pages 249–256.
- [Hajič et al., 2007] Jan Hajič, Jan Votrubec, Pavel Krbec, Pavel Květoň, et al. 2007. The best of two worlds: Cooperation of statistical and rule-based taggers for czech. In Proceedings of the Workshop on Balto-Slavonic Natural Language Processing: Information Extraction and Enabling Technologies, pages 67–74. Association for Computational Linguistics.
- [Hakkani-Tür et al., 2002] Dilek Z Hakkani-Tür, Kemal Oflazer, and Gökhan Tür. 2002. Statistical morphological disambiguation for agglutinative languages. Computers and the Humanities, 36(4):381–410.
- [Heigold et al., 2017] Georg Heigold, Günter Neumann, and Josef van Genabith. 2017. An extensive empirical evaluation of character-based morphological tagging for 14 languages. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, volume 1, pages 505–513.
- [Hochreiter and Schmidhuber, 1997] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Comput., 9(8):1735–1780, November.
- [Karlsson et al., 1995] Fred Karlsson, Atro Voutilainen, Juha Heikkila, and Arto Anttila, editors. 1995. Constraint Grammar: A Language-Independent System for Parsing Unrestricted Text. Walter de Gruyter & Co., Hawthorne, NJ, USA.
- [Karttunen and Wittenburg, 1983] Lauri Karttunen and Kent Wittenburg. 1983. A two-level morphological analysis of english. In Texas Linguistic Forum Austin, Tex., number 22, pages 217–228.
- [Koskenniemi, 1981] Kimmo Koskenniemi. 1981. An application of the two-level model to Finnish. Computational morphosyntax: Report on research, 1984:19–41.
- [Koskenniemi, 1983] Kimmo Koskenniemi. 1983. Two-level model for morphological analysis. In IJCAI, volume 83, pages 683–685.
- [Lafferty et al., 2001] John Lafferty, Andrew McCallum, and Fernando C. N. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the Eighteenth International Conference on Machine Learning (ICML), pages 282–289.
- [Mueller et al., 2013] Thomas Mueller, Helmut Schmid, and Hinrich Schütze. 2013. Efficient higher-order CRFs for morphological tagging. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 322–332, Seattle, Washington, USA, October. Association for Computational Linguistics.
- [Nivre et al., 2016] Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajic, Christopher D. Manning, Ryan T. McDonald, Slav Petrov, Sampo Pyysalo, Natalia Silveira, Reut Tsarfaty, and Daniel Zeman. 2016. Universal dependencies v1: A multilingual treebank collection. In LREC.
- [Oflazer and Kuruöz, 1994] Kemal Oflazer and İlker Kuruöz. 1994. Tagging and morphological disambiguation of turkish text. In Proceedings of the fourth conference on Applied natural language processing, pages 144–149. Association for Computational Linguistics.
- [Oflazer and Tür, 1996] Kemal Oflazer and Gökhan Tür. 1996. Combining hand-crafted rules and unsupervised learning in constraint-based morphological disambiguation. CoRR, cmp-lg/9604001.
- [Oflazer and Tür, 1997] Kemal Oflazer and Gökhan Tür. 1997. Morphological disambiguation by voting constraints. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (ACL97, EACL97), Madrid, Spain.
- [Oflazer et al., 1999] Kemal Oflazer, Dilek Z. Hakkani-Tür, and Gökhan Tür. 1999. Design for a turkish treebank. In Proceedings of the Workshop on Linguistically Interpreted Corpora, EACL 99, Bergen, Norway.
- [Oflazer, 1994] Kemal Oflazer. 1994. Two-level description of turkish morphology. Literary and Linguistic Computing, 9(2):137–148.
- [Sak et al., 2007] Haşim Sak, Tunga Güngör, and Murat Saraçlar, 2007. Morphological Disambiguation of Turkish Text with Perceptron Algorithm, pages 107–118. Springer Berlin Heidelberg, Berlin, Heidelberg.
- [Shen et al., 2016] Qinlan Shen, Daniel Clothiaux, Emily Tagtow, Patrick Littell, and Chris Dyer. 2016. The role of context in neural morphological disambiguation. In COLING, pages 181–191. ACL.
- [Sutskever et al., 2014] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Proceedings of the 27th International Conference on Neural Information Processing Systems, NIPS’14, pages 3104–3112, Cambridge, MA, USA. MIT Press.
- [Toleu et al., 2017] Alymzhan Toleu, Gulmira Tolegen, and Aibek Makazhanov. 2017. Character-aware neural morphological disambiguation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), volume 2, pages 666–671.
- [Yıldız et al., 2016] Eray Yıldız, Çağlar Tirkaz, H. Bahadır Şahin, Mustafa Tolga Eren, and Ozan Sönmez. 2016. A morphology-aware network for morphological disambiguation. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI’16, pages 2863–2869. AAAI Press.
- [Yuret and de la Maza, 2006] Deniz Yuret and Michael de la Maza. 2006. The greedy prepend algorithm for decision list induction. In International Symposium on Computer and Information Sciences, pages 37–46. Springer.
- [Yuret and Türe, 2006] Deniz Yuret and Ferhan Türe. 2006. Learning morphological disambiguation rules for turkish. In Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, pages 328–334. Association for Computational Linguistics.