CITlab ARGUS for historical handwritten documents

Gundram Leifert    Tobias Strauß    Tobias Grüning    Roger Labahn (corresponding author)
CITlab, Institute of Mathematics, University of Rostock, Germany
{gundram.leifert, tobias.strauss, tobias.gruening, roger.labahn}@uni-rostock.de
April 15, 2015
Abstract

We describe CITlab’s recognition system for the HTRtS competition attached to the 13th International Conference on Document Analysis and Recognition, ICDAR 2015. The task comprises the recognition of historical handwritten documents. The core algorithms of our system are based on multi-dimensional recurrent neural networks (MDRNN) and connectionist temporal classification (CTC). The underlying software modules and the basic utility technologies are essentially powered by PLANET’s ARGUS framework for intelligent text recognition and image processing.

Keywords: MDRNN, LSTM, CTC, handwriting recognition, neural network

Description of CITlab’s System for the HTRtS 2015 Task: Handwritten Text Recognition on the tranScriptorium Dataset

1 Introduction

The International Conference on Document Analysis and Recognition, ICDAR 2015 (http://2015.icdar.org), hosts a variety of competitions in that area. Among others, the Handwritten Text Recognition on the tranScriptorium Dataset (HTRtS) competition attracted our attention because we expected CITlab’s handwriting recognition software to be able to deal successfully with the respective task.

HTRtS (http://transcriptorium.eu/~htrcontest) comprises a task of word recognition for segmented historical documents; see [SRTV14] for further details. The data consist of page images taken from the Bentham collection, a well-known tranScriptorium project dataset.

Essentially the same neural networks were used previously in the international handwriting competition OpenHaRT 2013 attached to the ICDAR 2013 conference, see [LLS13]. Moreover, with a system very similar to the one presented here, the CITlab team also took part in ICFHR’s ANWRESH-2014 competition on historical data tables, see [LGSL14] for the corresponding system description.

Affiliated with the Institute of Mathematics at the University of Rostock, CITlab (http://www.citlab.uni-rostock.de) hosts joint projects of the Mathematical Optimization Group and PLANET intelligent systems GmbH (http://www.planet.de), a small/medium enterprise focusing on computational intelligence technology and applications. The work presented here is part of a joint text recognition project 2014 – 2016 and is extensively based upon PLANET’s ARGUS software modules and the respective framework for development, testing and training.

2 Short Description

Remark 1.

This short description is intended for the HTRtS-2015 organizers’ information. Here we also explain the abbreviations used in the web form when submitting CITlab ARGUS’s recognition result files.

Please cite this now as: private communication, extended version to be published after ICDAR 2015.

This draft is preliminary in the sense that it will be further extended to a full paper version. It will be published after the ICDAR 2015 conference when the official final evaluation results are public.

2.1 Overview

Altogether, CITlab submits the recognition / transcription results generated by 14 moderately different systems. While they all mainly rely on our traditional, recurrent neural network based recognition engine ARGUS, the 14 variations arise from combining 2 training schemes, trn-1 / trn-2, with 7 decoding schemes, dec-BP / dec-CE / dec-DM and dec-E[2|3|4|5]. Note that these scheme orderings, suggested by the lexicographic ordering of the respective labels, also reflect both increasing complexity of the schemes, and expected improved quality for the handwritten text recognition task.
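The combination of schemes described above can be sketched as follows. This is only an illustration of how 2 training schemes and 7 decoding schemes yield 14 systems; the scheme labels are taken from the text, while the combined "trn/dec" naming and the function name are assumptions made here.

```python
from itertools import product

# Scheme labels as given in the text.
TRAINING_SCHEMES = ["trn-1", "trn-2"]
DECODING_SCHEMES = ["dec-BP", "dec-CE", "dec-DM"] + [f"dec-E{n}" for n in (2, 3, 4, 5)]


def enumerate_systems():
    """Return one label per (training scheme, decoding scheme) pair.

    The "trn/dec" label format is hypothetical, chosen for illustration only.
    """
    return [f"{trn}/{dec}" for trn, dec in product(TRAINING_SCHEMES, DECODING_SCHEMES)]


systems = enumerate_systems()
print(len(systems))  # 2 training schemes x 7 decoding schemes
```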

2.2 Basic Scheme

For the general approach, we may briefly refer to previous CITlab system descriptions [LLS13, LGSL14, SGLL14] because the overall scheme has essentially not been changed.

2.3 Preprocessing

We worked on line polygon images, see 2.4.1 for further explanation of the data. First, certain standard preprocessing routines were applied, i.e.

  • image normalization: contrast, size;

  • writing normalization: line bends, line slope, script slant.

Then, images were further unified by CITlab’s proprietary writing normalization, thus ensuring a fixed 96px image height with the writing’s main body part appropriately placed into and stretched to cover the essential central part of the line image. These were finally the input images for the subsequent processing with Recurrent Neural Networks (RNN).

2.3.1 Recurrent Neural Network

The resulting line images were fed into the engine’s first core component which we call a Sequence Processing Recurrent Neural Network (SPRNN). Note that we processed entire line images with no further segmentation.

The SPRNN’s output then consists of a certain number of vectors. This number is related to the line length because every vector contains information about a particular image position. More precisely, the entries are understood as estimates of the probabilities of every alphabet character at the position under consideration. Hence, the vector lengths all equal the alphabet size, and putting all vectors together leads to the so-called confidence matrix. This is the intrinsic recognition result which will subsequently be used for the decoding.

Note further that, for HTRtS-2015, we worked with the alphabet containing

  • all digits, lowercase and uppercase letters of the standard Latin alphabet;

  • special characters /&£§+-_.,:;!?’"=[]() and , whereby different types of quotation marks and hyphens were mapped to one of the respective symbols.

Finally, the above alphabet is augmented by an artificial, non-character symbol, which we denote by NaC. In particular, it may be used to detect character boundaries because, generally speaking, our SPRNNs emit high NaC confidences in uncertain situations.
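To make the shape of the confidence matrix concrete, the following minimal sketch checks a toy example. The four-symbol alphabet, the numbers, and the function name are made up for illustration; the requirement that each position vector sums to 1 is an assumption that follows from reading the entries as probability estimates, not a documented property of the ARGUS engine.

```python
# Toy alphabet for illustration only; the alphabet actually used is described in the text.
ALPHABET = ["a", "b", "c", "NaC"]


def is_confidence_matrix(matrix, alphabet, tol=1e-6):
    """Check that every position vector has one entry per alphabet symbol,
    that all entries lie in [0, 1], and (assumption) that each vector sums to 1."""
    for vector in matrix:
        if len(vector) != len(alphabet):
            return False
        if any(p < 0.0 or p > 1.0 for p in vector):
            return False
        if abs(sum(vector) - 1.0) > tol:
            return False
    return True


# One row per image position, one column per alphabet symbol (made-up values).
conf = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.7],
    [0.2, 0.6, 0.1, 0.1],
]
print(is_confidence_matrix(conf, ALPHABET))
```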

2.4 Training Schemes

CITlab only participates in the Restricted Track of HTRtS-2015, i.e. for training and testing our systems, we exclusively used data provided within the contest:

2.4.1 Training Data

trn-1

consists of all 1stBatch line polygons, i.e. images of 10 491 line polygons from 433 pages.

trn-2

incorporates trn-1 and all 2ndBatch page images: additional 313 pages, for which the line polygons were not available. Using proprietary CITlab tools we extracted 3 968 more line polygons, so that altogether, trn-2 finally contained 14 479 training samples.

Note in particular that, from the data provided in HTRtS-2015, we did not use the line images themselves because those contained more distortions from adjacent text lines.

2.4.2 Network Training

In both training schemes, various networks were trained similarly: the number of training epochs varied slightly between 50 and 60, and the decrease of the learning rates was chosen correspondingly. Moreover, the individual training runs differ in certain hyper-parameters (number of neurons, subsampling rate) and in the random choices of the initial values for the weights, which were then optimized by gradient descent procedures.

Out of a larger number of training runs, 10 networks were finally chosen by monitoring the training success on a validation data set which, due to the lack of separate data, was selected from the available training data, see 2.4.1. Note that the same approach was used for ranking the 10 final nets in order to choose the best one and certain committees, see 2.5 for details.

2.5 Decoding Schemes

2.5.1 Dec-BP: Best Path decoding

For decoding the confidence matrix, one starts with the sequence of the most confident character per matrix vector. To obtain a proper character string over the given alphabet, two basic transformations then have to be applied:

  1. Replace repeated occurrences of the same character by just one!

  2. Delete all NaC symbols!

Note first that, due to the order of accomplishing these operations, the special NaC symbol serves for distinguishing between proper character repetition vs. just repeatedly seeing the same character while traversing the line image.

Note also that these operations are commonly applied in all decoding schemes! Thus in the following, we know how to proceed from a character sequence from (or path through) the confidence matrix to a valid string interpretation as a required recognition result.
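The two transformations above can be sketched in a few lines. This is a minimal illustration of best path decoding on a toy confidence matrix, not the ARGUS implementation; the three-symbol alphabet, the matrix values, and the function name are assumptions made for this example.

```python
NAC = "NaC"  # artificial non-character symbol


def best_path_decode(conf_matrix, alphabet, nac=NAC):
    """Decode a confidence matrix (one vector per image position) by best path."""
    # Most confident symbol per position (first index wins on ties).
    path = [alphabet[max(range(len(vec)), key=vec.__getitem__)] for vec in conf_matrix]
    # 1. Replace repeated occurrences of the same symbol by just one.
    collapsed = [s for i, s in enumerate(path) if i == 0 or s != path[i - 1]]
    # 2. Delete all NaC symbols.
    return "".join(s for s in collapsed if s != nac)


alphabet = ["a", "b", NAC]
conf = [
    [0.80, 0.10, 0.10],  # a
    [0.70, 0.20, 0.10],  # a  (same character seen again)
    [0.10, 0.10, 0.80],  # NaC separates a proper repetition
    [0.90, 0.05, 0.05],  # a
    [0.10, 0.80, 0.10],  # b
    [0.20, 0.70, 0.10],  # b  (same character seen again)
]
print(best_path_decode(conf, alphabet))  # prints "aab"
```

Because repetitions are collapsed before NaC is deleted, the path a, a, NaC, a, b, b yields "aab": the NaC between the second and third position keeps the two a's apart, while the adjacent b's merge into one.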

2.5.2 Dec-CE: CITlab Expression decoding

The details of this decoding developed at CITlab will be presented in upcoming publications. Basically it tries to find the most confident string subject to additional restrictions on the internal structure of valid result strings. In HTRtS-2015, the decoded string should be built from expressions which, e.g., look like usual words, have punctuation marks attached to word expressions, have sentences beginning with capital letters … Note in particular that this decoding scheme only considers expression syntax – it does not yet incorporate a dictionary!

2.5.3 Dec-DM: Dictionary Model decoding

At this next stage, we include a rather simple language model into the decoding scheme: We try to find the most confident string transcription which belongs to a dictionary. Moreover, besides the string confidences from the recognition result itself, also word frequencies are taken into consideration. For HTRtS-2015, the dictionary with word frequencies was extracted from the available training data.
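As a rough illustration of the idea, the sketch below extracts a dictionary with relative word frequencies from transcription lines and uses it to choose among candidate strings. The tokenization by letter runs, the log-linear combination of confidence and frequency, the weight parameter, and all function names are assumptions made here; the text does not specify how the ARGUS decoder combines the two quantities.

```python
import math
import re
from collections import Counter


def build_dictionary(transcriptions):
    """Map each word found in the training transcriptions to its relative frequency."""
    counts = Counter()
    for line in transcriptions:
        counts.update(re.findall(r"[A-Za-z]+", line))  # assumed tokenization
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def pick_word(candidates, dictionary, weight=1.0):
    """Return the dictionary word with the best combined score, or None.

    candidates maps a candidate string to its recognition confidence in (0, 1].
    The score log(confidence) + weight * log(frequency) is a hypothetical choice.
    """
    best, best_score = None, -math.inf
    for word, confidence in candidates.items():
        frequency = dictionary.get(word)
        if frequency is None:
            continue  # strings outside the dictionary are not valid results
        score = math.log(confidence) + weight * math.log(frequency)
        if score > best_score:
            best, best_score = word, score
    return best


dictionary = build_dictionary(["the cat and the dog", "The end"])
print(pick_word({"tbe": 0.6, "the": 0.3, "end": 0.1}, dictionary))  # prints "the"
```

In this toy case the most confident candidate "tbe" is rejected because it is not in the dictionary, and "the" wins over "end" on both confidence and frequency.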

2.5.4 Dec-E<n>: Experts Committee decoding

The above Dec-DM scheme is further extended by simultaneously processing the network output of different SPRNNs. These were chosen by descending recognition quality on the validation dataset, see 2.4.2. To reach the committee decision, we followed the algorithm proposed in [Fis97]. In HTRtS-2015, we submitted four systems with this decoding scheme type, namely for n = 2, 3, 4, 5.
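The selection of committee members by descending validation quality can be sketched as follows. The network names and validation scores are made up, and the function name is hypothetical; the sketch covers only the choice of the n best networks and does not implement the voting procedure of [Fis97].

```python
def select_committee(validation_scores, n):
    """Return the names of the n networks with the best validation score.

    validation_scores maps a network name to a quality measure where
    higher is better (assumption made for this illustration).
    """
    ranked = sorted(validation_scores, key=validation_scores.get, reverse=True)
    return ranked[:n]


# Made-up validation scores for five hypothetical networks.
scores = {"net-a": 0.91, "net-b": 0.95, "net-c": 0.89, "net-d": 0.93, "net-e": 0.90}

# One committee per submitted committee size.
committees = {f"dec-E{n}": select_committee(scores, n) for n in (2, 3, 4, 5)}
print(committees["dec-E2"])  # prints ['net-b', 'net-d']
```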

Acknowledgement

First of all, the CITlab team really wishes to express its great gratitude to our long-term technology & development partner PLANET intelligent systems GmbH (Raben Steinfeld, Germany) for the extremely valuable, ongoing support in every aspect of this work. Participating in HTRtS-2015 would not have been possible without that! In particular, we continued using PLANET’s software world which was developed and essentially improved in various common CITlab–PLANET projects over previous years.

From PLANET’s side, our activities were essentially supported by Jesper Kleinjohann, whom we especially thank for ongoing very helpful discussions and his continuous development support.

Being part of our current research & development collaboration project, this work was funded by grant no. KF2622304SS3 (Kooperationsprojekt) in Zentrales Innovationsprogramm Mittelstand (ZIM) by Bundesrepublik Deutschland (BMWi).

Finally, we are indebted to the HTRtS organizers from the PRHLT group at UPV – in particular Joan Andreu Sánchez – for setting up this evaluation and the contest as well as the entire tranScriptorium project for providing all the data.

References

  • [Fis97] J.G. Fiscus. A post-processing system to yield reduced word error rates: Recognizer output voting error reduction (ROVER). In IEEE Workshop on Automatic Speech Recognition and Understanding, 1997.
  • [LGSL14] Gundram Leifert, Tobias Grüning, Tobias Strauß, and Roger Labahn. CITlab ARGUS for historical data tables: Description of CITlab’s system for the ANWRESH-2014 Word Recognition task. Technical Report 2014/1, Universität Rostock, April 2014.
  • [LLS13] Gundram Leifert, Roger Labahn, and Tobias Strauß. CITlab ARGUS for Arabic handwriting: Description of CITlab’s system for the OpenHaRT 2013 Document Image Recognition task. In Proceedings of the NIST 2013 OpenHaRT Workshop [Online], August 2013. Available: http://www.nist.gov/itl/iad/mig/hart2013_wrkshp.cfm.
  • [SGLL14] Tobias Strauß, Tobias Grüning, Gundram Leifert, and Roger Labahn. CITlab ARGUS for historical handwritten documents: Description of CITlab’s system for the HTRtS 2014 Handwritten Text Recognition task. Technical Report 2014/2, Universität Rostock, April 2014.
  • [SRTV14] Joan Andreu Sánchez, Verónica Romero, Alejandro H. Toselli, and Enrique Vidal. ICFHR2014 Competition on Handwritten Text Recognition on tranScriptorium Datasets (HTRtS). In Proceedings of the International Conference on Frontiers in Handwriting Recognition – ICFHR 2014, August 2014.