An Interactive Machine Translation Framework for Modernizing Historical Documents
Due to the nature of human language, historical documents are hard for contemporary people to comprehend. This limits their accessibility to scholars specialized in the time period in which the documents were written. Modernization aims at breaking this language barrier by generating a new version of a historical document, written in the modern version of the document’s original language. However, while it is able to increase the document’s comprehension, modernization is still far from producing an error-free version. In this work, we propose a collaborative framework in which a scholar can work together with the machine to generate the new version. We tested our approach in a simulated environment, achieving significant reductions of the human effort needed to produce the modernized version of the document.
1 Introduction
In recent years, awareness of the importance of preserving our cultural heritage has increased. Historical documents are an important part of that heritage. In order to preserve them, there is an increased need for creating digital text versions which can be searched and automatically processed (Piotrowski, 2012). However, their linguistic properties create additional difficulties: due to the lack of a spelling convention, orthography changes depending on the time period and author. Furthermore, human language evolves with the passage of time, increasing the difficulty of the document’s comprehension. Thus, historical documents are mostly accessible to scholars specialized in the time period in which each document was written.
Modernization tackles the language barrier in order to increase the accessibility of historical documents. To achieve this, it generates a new version of a historical document in the modern version of the language in which the document was originally written (Fig. 1 shows an example of modernizing a document). However, while modernization has been successful in increasing the comprehension of historical documents (Tjong Kim Sang et al., 2017; Domingo and Casacuberta, 2018a), it is still far from creating error-free modern versions. Therefore, this task still needs to be carried out by scholars.
Interactive machine translation (IMT) fosters human–computer collaboration to generate error-free translations in a productive way (Foster et al., 1997; Barrachina et al., 2009). In this work, we propose to apply an IMT protocol to the modernization of historical documents. We strive to create an error-free modern version of a historical document while decreasing the human effort needed to achieve this goal.
The rest of this document is structured as follows: Section 2 introduces the related work. Then, in Section 3 we present our protocol. Section 4 describes the experiments conducted in order to assess our proposal. The results of those experiments are presented and discussed in Section 5. Finally, in Section 6, conclusions are drawn.
2 Related Work
While the lack of a spelling convention has been extensively researched for years (Baron and Rayson, 2008; Bollmann and Søgaard, 2016; Domingo and Casacuberta, 2018b), modernization of historical documents is a younger field. Tjong Kim Sang et al. (2017) organized a shared task on translating historical text to contemporary language. The main goal of this shared task was to tackle the spelling problem. However, they also approached document modernization using a set of rules. Domingo et al. (2017a) proposed a modernization approach based on statistical machine translation (SMT). A neural machine translation (NMT) approach was proposed by Domingo and Casacuberta (2018a). Finally, Sen et al. (2019) extracted parallel phrases from an original parallel corpus and used them as additional training data for their NMT approach.
Despite the promising results achieved in recent years, machine translation (MT) is still far from producing high-quality translations (Dale, 2016). Therefore, a human agent has to supervise these translations in a post-editing stage. IMT was introduced with the goal of combining the knowledge of a human translator and the efficiency of an MT system. Although many protocols have been proposed in recent years (Marie and Max, 2015; González-Rubio et al., 2016; Domingo et al., 2017b; Peris et al., 2017), the prefix-based protocol remains one of the most successful approaches (Barrachina et al., 2009; Alabau et al., 2013; Knowles and Koehn, 2016). In this approach, the user corrects the leftmost wrong word of the translation hypothesis, inherently validating a correct prefix. With each new correction, the system generates a suffix that completes the prefix to produce a new translation.
3 Interactive Machine Translation
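Classical IMT approaches rely on the statistical formalization of the MT problem. Given a source sentence $\mathbf{x}$, SMT aims at finding its most likely translation $\hat{\mathbf{y}}$ (Brown et al., 1993):
$$\hat{\mathbf{y}} = \arg\max_{\mathbf{y}} \Pr(\mathbf{y} \mid \mathbf{x}) \tag{1}$$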
For years, the prevailing approach to compute this expression has been phrase-based models (Koehn, 2010). These models rely on a log-linear combination of different models (Och and Ney, 2002), namely phrase-based alignment models, reordering models and language models, among others (Zens et al., 2002; Koehn et al., 2003). More recently, however, the field has shifted toward neural models (see Section 3.2).
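The log-linear combination scores a candidate $\mathbf{y}$ as $\sum_m \lambda_m h_m(\mathbf{x}, \mathbf{y})$ and keeps the highest-scoring one. The minimal Python sketch below illustrates this, assuming each model is exposed as a feature function returning a log-score and that the weights $\lambda_m$ are already given (in practice they are tuned, e.g. with MERT as in Section 4.1). The tables `TM` and `LM`, the candidate list and all probabilities are hypothetical toy values, and taking the maximum over an explicit list merely stands in for a real decoder search.

```python
import math
from typing import Callable, Dict, List

# A feature function h_m(x, y) scores a (source, candidate) pair, e.g. a log-probability.
Feature = Callable[[str, str], float]


def loglinear_score(x: str, y: str, features: Dict[str, Feature],
                    weights: Dict[str, float]) -> float:
    """Weighted sum of feature scores: sum_m lambda_m * h_m(x, y)."""
    return sum(weights[name] * h(x, y) for name, h in features.items())


def best_translation(x: str, candidates: List[str], features: Dict[str, Feature],
                     weights: Dict[str, float]) -> str:
    """Return the candidate with the highest log-linear score."""
    return max(candidates, key=lambda y: loglinear_score(x, y, features, weights))


# Hypothetical toy tables for one archaic phrase and two candidate modernizations.
TM = {("vuestra merced", "usted"): math.log(0.6),
      ("vuestra merced", "vuestra merced"): math.log(0.4)}
LM = {"usted": math.log(0.05), "vuestra merced": math.log(0.001)}

features = {
    "tm": lambda x, y: TM[(x, y)],                 # phrase translation model
    "lm": lambda x, y: LM[y],                      # target language model
    "word_penalty": lambda x, y: -len(y.split()),  # penalizes longer outputs
}
weights = {"tm": 1.0, "lm": 0.5, "word_penalty": 0.2}

print(best_translation("vuestra merced", ["usted", "vuestra merced"], features, weights))
# → usted
```

Changing the weights changes which model dominates the decision, which is why they are tuned on held-out data rather than fixed by hand.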
3.1 Prefix-based Interactive Machine Translation
Prefix-based IMT proposes a user–computer collaboration that starts with the system proposing an initial translation $\mathbf{y}$ of length $I$. Then, the user corrects the leftmost wrong word $y_i$, inherently validating all preceding words. These words form a validated prefix $\tilde{\mathbf{y}}_p$ that includes the corrected word $\tilde{y}_i$. The system reacts to this user feedback, generating a suffix $\hat{\mathbf{y}}_s$ that completes $\tilde{\mathbf{y}}_p$ to obtain a new translation of $\mathbf{x}$: $\hat{\mathbf{y}} = \tilde{\mathbf{y}}_p\,\hat{\mathbf{y}}_s$. This process is repeated until the user accepts the complete system suggestion. Fig. 2 illustrates this protocol.
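Barrachina et al. (2009) formalized the suffix generation as follows:
$$\hat{\mathbf{y}}_s = \arg\max_{\mathbf{y}_s} \Pr(\mathbf{y}_s \mid \mathbf{x}, \tilde{\mathbf{y}}_p) \tag{2}$$
which, since $\Pr(\tilde{\mathbf{y}}_p \mid \mathbf{x})$ does not depend on $\mathbf{y}_s$, can be straightforwardly rewritten as:
$$\hat{\mathbf{y}}_s = \arg\max_{\mathbf{y}_s} \Pr(\tilde{\mathbf{y}}_p, \mathbf{y}_s \mid \mathbf{x}) \tag{3}$$
This equation is very similar to Eq. 1: at each iteration, the process consists of a regular search in the translation space, but constrained by the prefix $\tilde{\mathbf{y}}_p$.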
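The constrained search described above can be made concrete with a small Python sketch. As an assumption of this illustration, the word graph explored by Barrachina et al. (2009) is replaced by an explicit n-best list of (tokens, log-probability) pairs. Because the probability of the prefix alone is constant with respect to the suffix, maximizing over suffixes amounts to keeping only the candidates that start with the validated prefix and taking the best one. The function name `constrained_suffix` and the example sentences and scores are hypothetical.

```python
from typing import List, Tuple

Candidate = Tuple[List[str], float]  # (translation tokens, log Pr(y | x))


def constrained_suffix(prefix: List[str], nbest: List[Candidate]) -> List[str]:
    """Suffix of the best-scoring candidate whose first words equal the prefix."""
    compatible = [(y, score) for y, score in nbest if y[:len(prefix)] == prefix]
    if not compatible:
        # No candidate covers the prefix; a real system needs a fallback here.
        return []
    best, _ = max(compatible, key=lambda pair: pair[1])
    return best[len(prefix):]


# Hypothetical n-best list for one source sentence.
NBEST = [
    ("dijo el hidalgo".split(), -1.0),
    ("dijo don Quijote".split(), -1.5),
    ("respondió el hidalgo".split(), -2.0),
]

print(constrained_suffix([], NBEST))               # ['dijo', 'el', 'hidalgo']
print(constrained_suffix(["dijo", "don"], NBEST))  # ['Quijote']
```

With an empty prefix the function reduces to the unconstrained search of Eq. 1; each user correction only narrows the set of compatible candidates.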
3.2 Neural Machine Translation
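In NMT, Eq. 1 is modeled by a neural network with parameters $\Theta$:
$$\hat{\mathbf{y}} = \arg\max_{\mathbf{y}} \log p(\mathbf{y} \mid \mathbf{x}; \Theta) = \arg\max_{\mathbf{y}} \sum_{i=1}^{|\mathbf{y}|} \log p(y_i \mid \mathbf{y}_1^{i-1}, \mathbf{x}; \Theta) \tag{4}$$
where $\mathbf{y}_1^{i-1}$ denotes the words preceding $y_i$.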
This neural network usually follows an encoder-decoder architecture, featuring recurrent networks (Bahdanau et al., 2015; Sutskever et al., 2014), convolutional networks (Gehring et al., 2017) or attention mechanisms (Vaswani et al., 2017). Model parameters are jointly estimated on large parallel corpora, using stochastic gradient descent (Robbins and Monro, 1951; Rumelhart et al., 1986). At decoding time, the system approximates the most likely translation by means of beam search.
3.3 Prefix-based Interactive Neural Machine Translation
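The prefix-based IMT protocol (see Section 3.1) can be naturally included in NMT systems, since sentences are generated from left to right. In order to take the user’s feedback into account and generate compatible hypotheses, the search space must be constrained. Given a prefix $\tilde{\mathbf{y}}_p$, only a single path accounts for it. The branching of the search process starts once this path has been covered. Introducing the validated prefix $\tilde{\mathbf{y}}_p$, Eq. 4 becomes:
$$\hat{\mathbf{y}}_s = \arg\max_{\mathbf{y}_s} \sum_{i=i'+1}^{|\mathbf{y}|} \log p(y_i \mid \mathbf{y}_1^{i-1}, \mathbf{x}; \Theta) \tag{5}$$
where $\mathbf{y} = \tilde{\mathbf{y}}_p\,\mathbf{y}_s$ and $i' = |\tilde{\mathbf{y}}_p|$; the terms with $i \le i'$ depend only on the fixed prefix and are therefore dropped from the maximization. This implies a search over the space of translations, but constrained by the validated prefix $\tilde{\mathbf{y}}_p$ (Peris et al., 2017).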
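Prefix-constrained decoding can be sketched in Python as follows. The validated prefix is forced (a single path), and ordinary beam search then branches from its end, accumulating log-probabilities only for suffix tokens, since the prefix terms are identical for all hypotheses. The toy bigram table `TABLE` stands in for the neural model $p(y_i \mid \mathbf{y}_1^{i-1}, \mathbf{x}; \Theta)$ and ignores the source sentence; it, the function names and the probabilities are hypothetical, and the sketch applies no length normalization. The default beam size of 6 mirrors the value reported in Section 4.1.

```python
import math
from typing import Callable, Dict, List, Tuple

EOS = "</s>"
# Maps a token history to {next token: log-probability}.
NextTokenFn = Callable[[Tuple[str, ...]], Dict[str, float]]


def prefix_constrained_beam_search(next_logprobs: NextTokenFn, prefix: List[str],
                                   beam_size: int = 6, max_len: int = 20) -> List[str]:
    """Return the best suffix completing `prefix` (end-of-sentence marker removed)."""
    # Only one path accounts for the prefix, so it is forced instead of searched.
    beams: List[Tuple[float, Tuple[str, ...]]] = [(0.0, tuple(prefix))]
    finished: List[Tuple[float, Tuple[str, ...]]] = []
    for _ in range(max_len):
        # Branching starts here: expand every live hypothesis by one token,
        # accumulating log p(y_i | y_1^{i-1}, x) for suffix tokens only.
        expansions = [(score + lp, hyp + (tok,))
                      for score, hyp in beams
                      for tok, lp in next_logprobs(hyp).items()]
        expansions.sort(key=lambda e: e[0], reverse=True)
        beams = []
        for score, hyp in expansions[:beam_size]:
            (finished if hyp[-1] == EOS else beams).append((score, hyp))
        if not beams:
            break
    candidates = finished or beams
    if not candidates:
        return []
    _, best = max(candidates, key=lambda e: e[0])
    suffix = list(best[len(prefix):])
    return suffix[:-1] if suffix and suffix[-1] == EOS else suffix


# Hypothetical bigram table standing in for the NMT decoder.
TABLE = {
    "<s>": {"dijo": 0.6, "respondió": 0.4},
    "dijo": {"el": 0.7, "don": 0.3},
    "respondió": {"don": 0.9, "el": 0.1},
    "el": {"hidalgo": 1.0},
    "don": {"Quijote": 1.0},
    "hidalgo": {EOS: 1.0},
    "Quijote": {EOS: 1.0},
}


def toy_model(history: Tuple[str, ...]) -> Dict[str, float]:
    last = history[-1] if history else "<s>"
    return {tok: math.log(p) for tok, p in TABLE[last].items()}


print(prefix_constrained_beam_search(toy_model, []))             # ['dijo', 'el', 'hidalgo']
print(prefix_constrained_beam_search(toy_model, ["respondió"]))  # ['don', 'Quijote']
```

Once the user validates "respondió", the most probable continuation changes from "el hidalgo" to "don Quijote": the prefix reshapes the rest of the search rather than just replacing one word.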
4 Experimental Framework
In this section, we present our experimental conditions, including translation systems, corpora and evaluation metrics.
4.1 MT Systems
SMT systems were trained with Moses (Koehn et al., 2007), following the standard procedure: we estimated a 5-gram language model, smoothed with the improved Kneser-Ney method, using SRILM (Stolcke, 2002), and optimized the weights of the log-linear model with MERT (Och, 2003).
We built our NMT systems using NMT-Keras (Peris and Casacuberta, 2018). We used long short-term memory units (Gers et al., 2000), with all model dimensions set to . We trained the system using Adam (Kingma and Ba, 2014) with a fixed learning rate of and a batch size of . We applied label smoothing of (Szegedy et al., 2015). At inference time, we used beam search with a beam size of 6. We applied joint byte pair encoding to all corpora (Sennrich et al., 2016), using merge operations.
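Joint byte pair encoding learns a single set of merge operations on the concatenation of the source and target training text, so both sides share one subword vocabulary. The Python sketch below follows the merge-learning procedure of Sennrich et al. (2016) on a toy word-frequency table (the word counts and function names are illustrative, not the corpora used here) and then replays the learned merges to segment a new word. Because the decoder generates subwords, it can also assemble them into words that do not exist, which is related to the problem discussed in Section 5.1.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple


def learn_bpe(word_counts: Dict[str, int], num_merges: int) -> List[Tuple[str, str]]:
    """Learn BPE merges: repeatedly merge the most frequent adjacent symbol pair."""
    # Each word is a space-separated symbol sequence ending in an end-of-word marker.
    vocab = {" ".join(word) + " </w>": count for word, count in word_counts.items()}
    merges: List[Tuple[str, str]] = []
    for _ in range(num_merges):
        pairs: Counter = Counter()
        for word, count in vocab.items():
            symbols = word.split()
            for a, b in zip(symbols, symbols[1:]):
                pairs[a, b] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties: first pair encountered
        merges.append(best)
        # Merge only whole symbols, not substrings of longer symbols.
        pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(best)) + r"(?!\S)")
        vocab = {pattern.sub("".join(best), word): count for word, count in vocab.items()}
    return merges


def apply_bpe(word: str, merges: List[Tuple[str, str]]) -> List[str]:
    """Segment a word by replaying the learned merges in order."""
    symbols = list(word) + ["</w>"]
    for a, b in merges:
        i = 0
        while i < len(symbols) - 1:
            if symbols[i] == a and symbols[i + 1] == b:
                symbols[i:i + 2] = [a + b]
            else:
                i += 1
    return symbols


merges = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=4)
print(merges)                       # [('e', 's'), ('es', 't'), ('est', '</w>'), ('l', 'o')]
print(apply_bpe("lowest", merges))  # ['lo', 'w', 'est</w>']
```

"lowest" never appears in the toy training counts, yet it is segmented into known subwords; the number of merge operations controls how coarse that segmentation is.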
Statistical IMT systems were implemented following the procedure described by Barrachina et al. (2009): a word graph is explored to generate the best suffix for a given prefix. Neural IMT systems were built using the interactive branch of NMT-Keras (https://github.com/lvapeab/nmt-keras/tree/interactive_NMT).
4.2 Corpora
The first corpus used in our experimental session was the Dutch Bible (Tjong Kim Sang et al., 2017). This corpus consists of a collection of different versions of the Dutch Bible: a version from 1637, another from 1657, another from 1888 and another from 2010. Except for the 2010 version, which is missing the last books, all versions contain the same texts. Moreover, since the authors mentioned that the translation of this last version is not very reliable, and considering that Dutch did not evolve significantly between 1637 and 1657, we decided to use only the 1637 version, considered as the original document, and the 1888 version, considering 19th-century Dutch as modern Dutch.
We selected El Quijote (Domingo and Casacuberta, 2018a) as our second corpus. This corpus contains the famous 17th-century Spanish novel by Miguel de Cervantes and its corresponding 21st-century version.
Finally, we used El Conde Lucanor (Domingo and Casacuberta, 2018a) as a third corpus. This data set contains the original 14th-century Spanish novel by Don Juan Manuel and its corresponding 21st-century version. Due to the small size of the corpus, we decided to use it only as a test set. Additionally, since we were unable to find a suitable training corpus, we used the systems built for El Quijote, despite the original documents belonging to different time periods, to modernize El Conde Lucanor.
Table 1 presents the corpora statistics.
4.3 Metrics
In order to measure the gains in human effort reduction, we made use of the following metrics:
- Word Stroke Ratio (WSR) (Tomás and Casacuberta, 2006): measures the number of words edited by the user, normalized by the number of words in the final translation.
- Mouse Action Ratio (MAR) (Barrachina et al., 2009): measures the number of mouse actions made by the user, normalized by the number of characters in the final translation.
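Both ratios reduce to simple divisions once the interaction has been logged. The Python sketch below assumes the counts come from a session like the one simulated in Section 4.4, reports the ratios as percentages, and counts characters with `len()` on the final sentence, spaces included; the function name, the character-counting convention and the example numbers are assumptions of this illustration.

```python
from typing import Tuple


def wsr_mar(final_translation: str, word_strokes: int,
            mouse_actions: int) -> Tuple[float, float]:
    """Return (WSR, MAR) as percentages of the final translation's size."""
    num_words = len(final_translation.split())
    num_chars = len(final_translation)  # assumption: spaces count as characters
    return 100.0 * word_strokes / num_words, 100.0 * mouse_actions / num_chars


# Hypothetical session: two corrections, each costing one word stroke and one mouse action.
wsr, mar = wsr_mar("dijo don Quijote a su escudero", word_strokes=2, mouse_actions=2)
print(f"WSR = {wsr:.2f}, MAR = {mar:.2f}")  # WSR = 33.33, MAR = 6.67
```

Since MAR is normalized by characters rather than words, its values are on a much smaller scale than WSR for the same session.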
Additionally, to evaluate the quality of the modernization and the difficulty of each task, we made use of the following well-known metrics:
- BiLingual Evaluation Understudy (BLEU) (Papineni et al., 2002): computes the geometric average of the modified n-gram precisions, multiplied by a brevity penalty that penalizes translations shorter than the reference.
- Translation Edit Rate (TER) (Snover et al., 2006): computes the number of word edit operations (insertions, substitutions, deletions and shifts of word sequences) needed to turn the system hypothesis into the reference, normalized by the number of words in the reference.
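For reference, both quality metrics can be sketched in Python. The BLEU function below accumulates clipped n-gram matches (up to 4-grams, one reference per sentence, no smoothing) over the whole corpus before applying the brevity penalty; reportable scores should come from a standard implementation such as sacreBLEU (Post, 2018), since tokenization and smoothing choices change the numbers. The TER function is a simplification that counts only insertions, deletions and substitutions: because it omits shifts, its value is never lower than the TER that allows them. Function names are illustrative.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hyps: List[List[str]], refs: List[List[str]], max_n: int = 4) -> float:
    matches, totals = [0] * max_n, [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        hyp_len, ref_len = hyp_len + len(hyp), ref_len + len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            # Modified precision: clip each n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r[g]) for g, c in h.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # the geometric mean is zero without smoothing
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: exp(1 - r/c) when the output is not longer than the reference.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * bp * math.exp(log_prec)


def ter_no_shifts(hyp: List[str], ref: List[str]) -> float:
    """Word-level Levenshtein distance divided by the reference length (as %)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i]
        for j, r in enumerate(ref, start=1):
            cur.append(min(prev[j] + 1,              # delete h
                           cur[j - 1] + 1,           # insert r
                           prev[j - 1] + (h != r)))  # substitute or match
        prev = cur
    return 100.0 * prev[-1] / len(ref)


ref = "the cat sat on the mat".split()
print(corpus_bleu([ref], [ref]))                                      # 100.0
print(round(ter_no_shifts("the cat sat on a mat".split(), ref), 2))  # 16.67
```

Swapping two words costs two edits in `ter_no_shifts` but only one shift in full TER, which is exactly the case the simplification overestimates.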
4.4 User Simulation
Due to the high costs of an evaluation involving human agents, we carried out an automatic evaluation with simulated users whose desired modernizations correspond to the reference sentences.
At each iteration, the user corrects the leftmost wrong word of the system’s hypothesis. With this correction, a new prefix is validated. Each correction costs one mouse action and one word stroke. The system then reacts to this feedback, generating a new suffix that completes the prefix to form a new hypothesis. This process is repeated until the hypothesis and the reference are identical.
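The simulated user can be sketched in Python. The `suggest` callable stands in for the IMT system: it receives the validated prefix and returns a suffix. Here it is a hypothetical n-best-list lookup, not the word-graph or neural search used in the experiments. Following the protocol above, each correction is charged one word stroke and one mouse action; how a hypothesis longer than the reference gets truncated is not specified in the text, and charging it as one extra correction is an assumption of this sketch.

```python
from typing import Callable, List, Tuple

Suggest = Callable[[List[str]], List[str]]  # validated prefix -> suggested suffix


def simulate_user(reference: List[str], suggest: Suggest) -> Tuple[int, int]:
    """Return (word strokes, mouse actions) needed to reach the reference."""
    hypothesis = suggest([])  # initial system suggestion
    word_strokes = mouse_actions = 0
    while hypothesis != reference:
        # Locate the leftmost wrong word.
        i = 0
        while i < min(len(hypothesis), len(reference)) and hypothesis[i] == reference[i]:
            i += 1
        word_strokes += 1
        mouse_actions += 1
        if i == len(reference):
            # Hypothesis is too long: assume the user truncates it (one correction).
            break
        prefix = reference[:i + 1]  # validated prefix, including the corrected word
        hypothesis = prefix + suggest(prefix)
    return word_strokes, mouse_actions


# Hypothetical n-best list acting as the "system".
NBEST = [
    ("dijo el hidalgo a su escudero".split(), -1.0),
    ("dijo don Quijote a Sancho".split(), -1.4),
    ("dijo don Quijote a su escudero".split(), -1.6),
]


def toy_suggest(prefix: List[str]) -> List[str]:
    compatible = [(y, s) for y, s in NBEST if y[:len(prefix)] == prefix]
    if not compatible:
        return []
    best, _ = max(compatible, key=lambda pair: pair[1])
    return best[len(prefix):]


reference = "dijo don Quijote a su escudero".split()
print(simulate_user(reference, toy_suggest))  # (2, 2)
```

Since every iteration extends the validated prefix by at least one word, the loop always terminates. Dividing the returned counts by the number of words and characters of the reference yields WSR and MAR as defined in the metrics above.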
5 Results
Table 2 presents the quality of the modernization. Both SMT and NMT approaches were able to significantly improve the baseline. That is, the modernized documents are easier for a contemporary reader to comprehend than the original documents. An exception to this is El Conde Lucanor: the SMT approach yielded significant improvements in terms of TER but was worse in terms of BLEU, and the NMT approach yielded worse results in terms of both BLEU and TER. Most likely, these results are due to having used the systems trained with El Quijote for modernizing El Conde Lucanor (see Section 4.2).
When comparing the SMT and NMT approaches, we observe that SMT yielded the best results in all cases. This behavior was already observed by Domingo and Casacuberta (2018a) and is most likely due to the small size of the training corpora, a well-known problem in NMT. However, while the goal of modernization is to make historical documents as easy as possible for contemporary people to comprehend, our goal is different. In this work, our goal is to obtain an error-free modern copy of a historical document. To achieve this, we propose an interactive collaboration between a human expert and our modernizing system in order to reduce the effort needed to generate such a copy. Table 3 presents the experimental results.
Both SMT and NMT approaches yielded significant reductions of the human effort needed to modernize the Dutch Bible (up to points in terms of WSR and in terms of MAR) and El Quijote (up to points in terms of WSR and of MAR). For El Conde Lucanor, however, both approaches resulted in an increase in the effort needed to generate an error-free modern version. This behavior was to be expected, since the modernization quality for El Conde Lucanor was very low. Therefore, the system consistently generated wrong suffixes, resulting in the user having to make more corrections.
Regarding the performance of both approaches, SMT achieved the highest effort reduction. This was reasonably expected, since its modernization quality was better. However, in past work on neural IMT (Peris et al., 2017), the neural IMT approach was able to yield further improvements despite having a lower translation quality than its SMT counterpart. Most likely, the reason for this is that, due to the small training corpora, the neural model was not able to reach its best performance. This should be addressed in future work.
5.1 Qualitative Analysis
Fig. 3 shows an example of modernizing a sentence from El Quijote with the interactive SMT approach. While the system’s initial suggestion contains five errors, with the IMT protocol the user only needs to make three corrections. With each correction, the system is able to improve its suggestions, reducing the total effort needed to achieve an error-free modernization. Note that this example was chosen to illustrate the correct functioning of the system. The average sentence from El Quijote is longer, and there are times when the system fails to take the human knowledge into account, resulting in an increased number of corrections. Nonetheless, as seen in Section 5, overall the system is able to significantly decrease the human effort.
Fig. 4 contains an example of modernizing the same sentence as in Fig. 3, using the interactive NMT approach. This is an example in which the system fails to take the user’s corrections into account, resulting in an increase of the human effort. It is especially worth noting the introduction of non-existing words such as durdos and duradas. This problem was probably caused by an incorrect segmentation of a word during the byte pair encoding process, and should be addressed in future work. Nonetheless, as seen in Section 5, overall the system is able to significantly decrease the human effort.
6 Conclusions and Future Work
In this work, we proposed a collaborative user–computer approach to create an error-free modern version of a historical document. We tested this proposal in a simulated environment, achieving significant reductions of the human effort. We built our modernization protocol based on both SMT and NMT approaches to prefix-based IMT. Although both systems yielded significant improvements for two data sets out of three, the SMT approach yielded the best results, both in terms of human effort reduction and in terms of the modernization quality of the initial system.
As future work, we want to further research the behavior of the neural systems. For that, we would like to explore techniques for enriching the training corpus with additional data, and to study the incorrect generation of words caused by subword segmentation. We would also like to develop new protocols based on successful IMT approaches. Finally, we should test our proposal with real users to obtain actual measures of the effort reduction.
The research leading to these results has received funding from the European Union through Programa Operativo del Fondo Europeo de Desarrollo Regional (FEDER) from Comunitat Valenciana (2014–2020) under project Sistemas de fabricación inteligentes para la indústria 4.0 (grant agreement IDIFEDER/2018/025); and from Ministerio de Economía y Competitividad (MINECO) under project MISMIS-FAKEnHATE (grant agreement PGC2018-096212-B-C31). We gratefully acknowledge the support of NVIDIA Corporation with the donation of a GPU used for part of this research.
- Alabau et al. (2013) Alabau, V., Bonk, R., Buck, C., Carl, M., Casacuberta, F., García-Martínez, M., González-Rubio, J., Koehn, P., Leiva, L. A., Mesa-Lao, B., Ortiz-Martínez, D., Saint-Amand, H., Sanchis-Trilles, G., and Tsoukala, C. (2013). CASMACAT: An open source workbench for advanced computer aided translation. The Prague Bulletin of Mathematical Linguistics, 100:101–112.
- Bahdanau et al. (2015) Bahdanau, D., Cho, K., and Bengio, Y. (2015). Neural machine translation by jointly learning to align and translate. arXiv:1409.0473.
- Baron and Rayson (2008) Baron, A. and Rayson, P. (2008). VARD2: A tool for dealing with spelling variation in historical corpora. Postgraduate conference in corpus linguistics.
- Barrachina et al. (2009) Barrachina, S., Bender, O., Casacuberta, F., Civera, J., Cubel, E., Khadivi, S., Lagarda, A., Ney, H., Tomás, J., Vidal, E., and Vilar, J.-M. (2009). Statistical approaches to computer-assisted translation. Computational Linguistics, 35:3–28.
- Bollmann and Søgaard (2016) Bollmann, M. and Søgaard, A. (2016). Improving historical spelling normalization with bi-directional LSTMs and multi-task learning. In Proceedings of the International Conference on Computational Linguistics, pages 131–139.
- Brown et al. (1993) Brown, P. F., Pietra, V. J. D., Pietra, S. A. D., and Mercer, R. L. (1993). The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2):263–311.
- Crowther (2003) Crowther, J. (2003). No Fear Shakespeare: Hamlet. SparkNotes.
- Dale (2016) Dale, R. (2016). How to make money in the translation business. Natural Language Engineering, 22(2):321–325.
- Domingo and Casacuberta (2018a) Domingo, M. and Casacuberta, F. (2018a). A machine translation approach for modernizing historical documents using back translation. In Proceedings of the International Workshop on Spoken Language Translation, pages 39–47.
- Domingo and Casacuberta (2018b) Domingo, M. and Casacuberta, F. (2018b). Spelling normalization of historical documents by using a machine translation approach. In Proceedings of the Annual Conference of the European Association for Machine Translation, pages 129–137.
- Domingo et al. (2017a) Domingo, M., Chinea-Rios, M., and Casacuberta, F. (2017a). Historical documents modernization. The Prague Bulletin of Mathematical Linguistics, 108:295–306.
- Domingo et al. (2017b) Domingo, M., Peris, Á., and Casacuberta, F. (2017b). Segment-based interactive-predictive machine translation. Machine Translation, pages 1–23.
- Foster et al. (1997) Foster, G., Isabelle, P., and Plamondon, P. (1997). Target-text mediated interactive machine translation. Machine Translation, 12:175–194.
- Gehring et al. (2017) Gehring, J., Auli, M., Grangier, D., Yarats, D., and Dauphin, Y. N. (2017). Convolutional sequence to sequence learning. arXiv:1705.03122.
- Gers et al. (2000) Gers, F. A., Schmidhuber, J., and Cummins, F. (2000). Learning to forget: Continual prediction with LSTM. Neural computation, 12(10):2451–2471.
- González-Rubio et al. (2016) González-Rubio, J., Benedí, J.-M., Ortiz-Martínez, D., and Casacuberta, F. (2016). Beyond prefix-based interactive translation prediction. In Proceedings of the Conference on Computational Natural Language Learning, pages 198–207.
- Kingma and Ba (2014) Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- Knowles and Koehn (2016) Knowles, R. and Koehn, P. (2016). Neural interactive translation prediction. In Proceedings of the Association for Machine Translation in the Americas, pages 107–120.
- Koehn (2010) Koehn, P. (2010). Statistical Machine Translation. Cambridge University Press.
- Koehn et al. (2007) Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., Dyer, C., Bojar, O., Constantin, A., and Herbst, E. (2007). Moses: Open source toolkit for statistical machine translation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, pages 177–180.
- Koehn et al. (2003) Koehn, P., Och, F. J., and Marcu, D. (2003). Statistical phrase-based translation. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, pages 48–54.
- Marie and Max (2015) Marie, B. and Max, A. (2015). Touch-based pre-post-editing of machine translation output. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 1040–1045.
- Och (2003) Och, F. J. (2003). Minimum error rate training in statistical machine translation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, pages 160–167.
- Och and Ney (2002) Och, F. J. and Ney, H. (2002). Discriminative training and maximum entropy models for statistical machine translation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, pages 295–302.
- Papineni et al. (2002) Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). BLEU: a method for automatic evaluation of machine translation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, pages 311–318.
- Peris and Casacuberta (2018) Peris, A. and Casacuberta, F. (2018). NMT-Keras: a Very Flexible Toolkit with a Focus on Interactive NMT and Online Learning. The Prague Bulletin of Mathematical Linguistics, 111:113–124.
- Peris et al. (2017) Peris, Á., Domingo, M., and Casacuberta, F. (2017). Interactive neural machine translation. Computer Speech & Language, 45:201–220.
- Piotrowski (2012) Piotrowski, M. (2012). Natural Language Processing for Historical Texts. Number 17 in Synthesis Lectures on Human Language Technologies. Morgan & Claypool.
- Post (2018) Post, M. (2018). A call for clarity in reporting BLEU scores. In Proceedings of the Third Conference on Machine Translation, pages 186–191.
- Riezler and Maxwell (2005) Riezler, S. and Maxwell, J. T. (2005). On some pitfalls in automatic evaluation and significance testing for MT. In Proceedings of the workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization, pages 57–64.
- Robbins and Monro (1951) Robbins, H. and Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics, pages 400–407.
- Rumelhart et al. (1986) Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088):533.
- Sen et al. (2019) Sen, S., Hasanuzzaman, M., Ekbal, A., Bhattacharyya, P., and Way, A. (2019). Take help from elder brother: Old to modern English NMT with phrase pair feedback. In Proceedings of the International Conference on Computational Linguistics and Intelligent Text Processing.
- Sennrich et al. (2016) Sennrich, R., Haddow, B., and Birch, A. (2016). Neural machine translation of rare words with subword units. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, pages 1715–1725.
- Snover et al. (2006) Snover, M., Dorr, B., Schwartz, R., Micciulla, L., and Makhoul, J. (2006). A study of translation edit rate with targeted human annotation. In Proceedings of the Association for Machine Translation in the Americas, pages 223–231.
- Stolcke (2002) Stolcke, A. (2002). SRILM - an extensible language modeling toolkit. In Proceedings of the International Conference on Spoken Language Processing, pages 257–286.
- Sutskever et al. (2014) Sutskever, I., Vinyals, O., and Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Proceedings of the Advances in Neural Information Processing Systems, volume 27, pages 3104–3112.
- Szegedy et al. (2015) Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1–9.
- Tjong Kim Sang et al. (2017) Tjong Kim Sang, E., Bollmann, M., Boschker, R., Casacuberta, F., Dietz, F., Dipper, S., Domingo, M., van der Goot, R., van Koppen, M., Ljubešić, N., Östling, R., Petran, F., Pettersson, E., Scherrer, Y., Schraagen, M., Sevens, L., Tiedemann, J., Vanallemeersch, T., and Zervanou, K. (2017). The CLIN27 shared task: Translating historical text to contemporary language for improving automatic linguistic annotation. Computational Linguistics in the Netherlands Journal, 7:53–64.
- Tomás and Casacuberta (2006) Tomás, J. and Casacuberta, F. (2006). Statistical phrase-based models for interactive computer-assisted translation. In Proceedings of the International Conference on Computational Linguistics/Association for Computational Linguistics, pages 835–841.
- Vaswani et al. (2017) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008.
- Zens et al. (2002) Zens, R., Och, F. J., and Ney, H. (2002). Phrase-based statistical machine translation. In Proceedings of the Annual German Conference on Advances in Artificial Intelligence, volume 2479, pages 18–32.