A New Hybrid-parameter Recurrent Neural Networks for Online Handwritten Chinese Character Recognition

Abstract

The recurrent neural network (RNN) is well suited to processing temporal sequences. In this paper, we present a deep RNN with new features and apply it to online handwritten Chinese character recognition. Compared with existing RNN models, the proposed system involves three innovations. First, a new hidden layer function, the Memory Pool Unit (MPU), is proposed for better learning of temporal information; the MPU has a simple architecture. Second, a new hybrid-parameter RNN architecture is presented to increase the expressive capacity of the RNN: its parameters change during the iteration along the temporal dimension. Third, we stack the outputs of all hidden layers as the output of the network; the stacked hidden layer states combine all hidden layer states and further increase the expressive capacity. Experiments are carried out on the IAHCC-UCAS2016 dataset and the CASIA-OLHWDB1.1 dataset. The experimental results show that the hybrid-parameter RNN obtains better recognition performance with higher efficiency (fewer parameters and faster speed), and that the proposed Memory Pool Unit is a simple hidden layer function that achieves competitive recognition results.

Introduction

Decades ago, the RNN was proposed as a tool for sequence processing. With the widespread use of RNNs, many improvements have been obtained, particularly in avoiding the vanishing gradient problem [\citeauthoryearBengio, Simard, and Frasconi1994]. LSTM [\citeauthoryearHochreiter and Schmidhuber1997] and GRU [\citeauthoryearCho et al.2014] are two famous and popular improvements of the RNN hidden layer function and have been applied in many tasks, such as speech recognition [\citeauthoryearHinton et al.2012], Natural Language Processing (NLP) [\citeauthoryearRuales2011], character recognition [\citeauthoryearMessina and J.Louradour2015], and machine translation [\citeauthoryearCho et al.2014]. Alex Graves [\citeauthoryearGraves2013] proposed a sequence generator based on RNNs equipped with LSTM. He and Tang [\citeauthoryearHe et al.2016] applied the LSTM-RNN to scene text recognition. Zhang et al. [\citeauthoryearXu-Yao Zhang2017] proposed a Chinese character recognizer based on RNNs equipped with LSTM and GRU. GRU was first proposed for machine translation by Cho et al. [\citeauthoryearCho et al.2014]. Compared with LSTM, GRU has a simpler structure and obtains competitive training results.

Despite the tremendous advances and successful applications, big challenges remain, particularly concerning the hidden layer function and the network structure. The hidden layer function of RNNs is an important subject that needs to be explored in depth. Compared with GRU, LSTM has a clearer and more reasonable workflow but a more complex structure; compared with LSTM, GRU has a simpler structure but a somewhat less transparent workflow. Therefore, researchers need to explore whether there are other methods for avoiding the vanishing gradient problem, rather than always applying the existing ones.


Figure 1: An example of in-air handwriting with the Leap Motion sensor.

In-air handwriting is a new mode of human-computer interaction. With appropriate sensors (e.g., the Leap Motion sensor) and a computer, people write in the air and the computer quickly recognizes what has been written. This interaction mode is shown in Fig. 1. Generally speaking, in-air handwritten Chinese character recognition is more challenging than traditional handwritten Chinese character recognition (HCCR) [\citeauthoryearLiu, Jaeger, and Nakagawa2004, \citeauthoryearBai and Huo2005, \citeauthoryearOkamoto and Yamamoto1999]. First, each character consists of only one stroke, without any pen-up/pen-down mark. Second, in-air handwritten strokes tend to be more squiggly than strokes handwritten on a touch screen. Fig. 2 gives some examples that illustrate the difference between IAHCC and HCC on a touch screen.

(a) In-air handwritten Chinese characters
(b) Handwritten Chinese characters on touch screen
Figure 2: Comparison of HCC and IAHCC [\citeauthoryearRen et al.2017].

Since in-air handwriting is such an appealing mode of human-computer interaction, many researchers have explored this field. Qu et al. [\citeauthoryearQu et al.2015] presented a multi-stage classifier for IAHCCR, but their system achieved a relatively low accuracy. Qu et al. [\citeauthoryearQu et al.2016] presented a new feature representation that extends the power of the 8-direction feature and applied it to IAHCC recognition; the 8-direction feature has been shown to be discriminative in many works and achieves good performance [\citeauthoryearBai and Huo2005, \citeauthoryearLiu et al.2013, \citeauthoryearJin et al.2010]. Ren et al. [\citeauthoryearRen et al.2017] proposed an end-to-end recognizer for OIAHCCR based on a new RNN; their work is the first in which an RNN is used for OIAHCCR and obtains a high recognition accuracy.

In the proposed system, three contributions are made to increase the recognition accuracy or the calculation speed of the in-air handwritten Chinese character RNN recognizer.

  • First, a new hidden layer function, the MPU, is proposed. Compared with LSTM and GRU, the proposed MPU has fewer parameters and a more straightforward workflow. A series of experiments carried out on the IAHCC-UCAS2016 dataset shows that the proposed hidden layer function (MPU) obtains a high recognition accuracy.

  • Second, a hybrid-parameter RNN architecture is proposed. Compared with general RNNs and bidirectional RNNs, the proposed hybrid-parameter architecture obtains competitive recognition accuracy with fewer parameters and a faster calculation speed.

  • Third, when all hidden layers of an RNN have the same size, we suggest stacking the outputs of all hidden layers as the output of the network. By synthesizing the outputs of every layer, the stacking method increases the recognition accuracy without increasing the number of parameters.


Figure 3: Hybrid-Parameter RNNs.

Basic RNN System

Hybrid-Parameter RNNs

Calculation speed and parameter quantity are two significant performance indices for any neural network. However, designing an efficient neural network with fewer parameters and a faster calculation speed is a challenging task. Fig. 3 shows the proposed hybrid-parameter RNN system and the processing of an input sample. The left part of Fig. 3 shows that the input character location sequence is divided into two parts. The new sequence of the sample is then generated as:

(1)

As shown in Eq. (1), the new sequence of the sample is represented as a sequence of location dots. The network calculation process is shown in the right part of Fig. 3. To begin with, two sets of RNN parameters of the same scale are initialized: the first set is initialized randomly and the second set is initialized to zeros. Given a character sample, the new system input is a sequence of location dots, as shown in Eq. (1). When the RNN iterates along the temporal dimension, the parameters change as follows: during the first time step, the RNN calculates with the first parameter set; the parameters then change to the second set for the intermediate time steps; and the last time step is calculated with one parameter set only. For each time step of the first part, the hidden layer states are computed by

(2)

where the two functions denote the hidden layer functions of the first and of a later hidden layer, respectively; the remaining symbols denote the corresponding network parameters, and the index runs over all hidden layers. For each time step of the second part, the hidden layer states are computed by

(3)

where the corresponding symbols denote the network parameters of each layer during the second part. For the last time step, the hidden layer states are computed by

(4)

where the corresponding symbols denote the network parameters of each layer during the last time step. During the whole temporal iteration, both parameter sets participate in the RNN calculation over all the location dots of the original sample. After all the time step iterations, the hidden layer states of the top layer have been generated for every time step, and the final output of the RNN is computed by

(5)
(6)
(7)

where the first symbol denotes the output vector, the next two symbols denote the weight matrices from the pooled hidden states to the fully-connected layer (output layer), and the last symbol denotes the bias vector of the fully-connected layer. The sum operation (sum-pooling in Eq. (6) and Eq. (7)) was proposed by Ren et al. [\citeauthoryearRen et al.2017]. A softmax regression is then applied to the fully-connected layer outputs to compute the class probability distribution. Since an in-air handwritten Chinese character generally corresponds to a long sequence of dot locations and the proposed network architecture involves five hidden layers, we add skip connections [\citeauthoryearGraves2013] from the input layer to all hidden layers, and from all hidden layers to the fully-connected layer, to alleviate the vanishing gradient problem [\citeauthoryearBengio, Simard, and Frasconi1994].
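
Since the symbols in Eqs. (1)-(7) are not legible in this copy, the following minimal sketch is our own illustration of the described workflow rather than the authors' released code: two parameter sets of the same size are kept, one set is used at the boundary time steps and the other for the intermediate steps (the exact switching schedule is an assumption), and the hidden states of all time steps are sum-pooled before the output layer. The cell function `rnn_cell` and all names are hypothetical.

```python
import numpy as np

def rnn_cell(x_t, h_prev, params):
    """A plain tanh recurrent cell; it stands in for the GRU/MPU cell used in the paper."""
    W_x, W_h, b = params
    return np.tanh(x_t @ W_x + h_prev @ W_h + b)

def hybrid_parameter_layer(x_seq, params_a, params_b):
    """One recurrent layer whose parameters switch along the temporal dimension.

    x_seq    : (T, d_in) sequence of location dots
    params_a : first parameter set (randomly initialized in the paper)
    params_b : second parameter set (zero-initialized in the paper)
    """
    T = x_seq.shape[0]
    h = np.zeros(params_a[1].shape[0])      # hidden state, sized from W_h
    pooled = np.zeros_like(h)               # sum-pooling over time (Eq. (6)/(7) style)
    for t in range(T):
        # Assumed schedule: boundary steps use the first set, intermediate steps the second.
        params = params_a if (t == 0 or t == T - 1) else params_b
        h = rnn_cell(x_seq[t], h, params)
        pooled += h
    return pooled
```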

Computing Speed and Parameter Quantity

General RNNs are computed with only one set of parameters throughout the iteration along the temporal dimension, whereas the parameters of the proposed hybrid-parameter RNN change during the temporal iteration. Experimental results show that the proposed RNN structure obtains a higher recognition accuracy with a smaller hidden layer size, and a smaller hidden layer size means fewer parameters. For example, a general RNN with five 256-unit hidden layers obtains a recognition accuracy of 92.6%; this accuracy does not increase when the hidden layer size is increased, but it decreases when the hidden layer size is decreased. The proposed hybrid-parameter RNN obtains a higher recognition accuracy of 92.9% with 128-unit hidden layers. The numerical relationship between the parameter quantities of the two kinds of RNN can be represented as:

(8)
(9)

where the first quantity denotes the parameter count of the general RNN, written in terms of the size of each hidden layer, the weight matrices applied to each layer's inputs, and the weight matrices from the previous hidden state to the current one; the corresponding quantities have analogous definitions for the proposed hybrid-parameter RNN. The multiplier in Eq. (8) and Eq. (9) is the number of state vectors in the hidden layer function; since the hidden layer function used here is the GRU, which has three groups of weights (reset gate, update gate, and candidate state), this multiplier is three. The additional factor of two in Eq. (9) reflects the two parameter sets of the same scale. From Eq. (8) and Eq. (9) we can see that the hybrid-parameter RNN has twice as many parameters as the general RNN when the two have the same hidden layer size.
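
As a rough back-of-envelope check (our own illustration, assuming the standard GRU parameterization with three weight groups per layer, skip connections from the input to every hidden layer, and ignoring the output layer), halving the hidden layer size while doubling the number of parameter sets still reduces the total parameter count:

```python
def gru_layer_params(d_in, d_h):
    """Standard GRU layer: 3 weight groups (reset, update, candidate), each with
    input weights, recurrent weights, and a bias (assumed parameterization)."""
    return 3 * (d_in * d_h + d_h * d_h + d_h)

def rnn_params(d_in, d_h, n_layers, n_sets=1):
    # First layer sees the raw input; later layers see the previous layer's output
    # plus a skip connection from the input.
    total = gru_layer_params(d_in, d_h)
    total += (n_layers - 1) * gru_layer_params(d_h + d_in, d_h)
    return n_sets * total

# Two-dimensional dot locations, five hidden layers:
print(rnn_params(2, 256, 5, n_sets=1))  # general RNN, 256 units      -> about 1.78 million
print(rnn_params(2, 128, 5, n_sets=2))  # hybrid-parameter, 128 units -> about 0.90 million
# Adding an output layer over the 3873 classes brings these close to the 2.76 mil and
# 1.38 mil figures reported in Table 1.
```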

Compared with the bidirectional RNN, the hybrid-parameter RNN is calculated with a single pass of time-step iterations, whereas the bidirectional RNN needs two passes, one in each direction. Therefore, the proposed hybrid-parameter RNN has a faster computing speed than the bidirectional RNN while obtaining a competitive result.

Learning Parameters

At the top of the RNN, a softmax layer is used to generate the probability distribution over the 3873 character classes. To train the neural network, the following widely used loss function is minimized, i.e.,

(10)
(11)

where the first symbol is the total number of training patterns, the selector function picks out the true class of each example, and the remaining term denotes the corresponding element of the RNN output for a given pattern. Minimizing this loss function essentially corresponds to maximizing the probability of correctly classifying the training patterns. The optimal parameters of the proposed system are then obtained by repeatedly updating the parameters with rmsprop [\citeauthoryearTieleman and Hinton2012], a form of stochastic gradient descent.
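
Since the symbols of Eq. (10) and Eq. (11) are not legible in this copy, the following sketch is our own illustration of the standard softmax cross-entropy objective over the 3873 classes together with an RMSprop update; all names are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)          # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy_loss(logits, labels):
    """Average negative log-probability of the true classes over a batch."""
    probs = softmax(logits)
    n = logits.shape[0]
    return -np.log(probs[np.arange(n), labels] + 1e-12).mean()

def rmsprop_update(param, grad, cache, lr=1e-3, decay=0.9, eps=1e-8):
    """RMSprop: divide the gradient by a running average of its recent magnitude."""
    cache = decay * cache + (1.0 - decay) * grad ** 2
    param = param - lr * grad / (np.sqrt(cache) + eps)
    return param, cache
```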

Memory Pool Unit


Figure 4: Memory Pool Unit.

The hidden layer function is significant for RNNs. LSTM and GRU are the two popular hidden layer functions used in many tasks. As a hidden layer function with a simpler architecture and fewer parameters, GRU obtains performance similar to LSTM. To make computation and implementation simpler, the memory state is removed in GRU. However, a hidden layer function with a memory state makes the whole hidden state calculation process more reasonable and convincing. We propose a new type of hidden layer function with a memory state that is motivated by LSTM and GRU but is simpler than LSTM. In this section, we describe the proposed hidden layer function, the Memory Pool Unit (MPU). The MPU is based on a simple and more straightforward hidden state calculation process. As shown in Fig. 4, the core of the proposed hidden layer function is a memory pool. There are two gates in the MPU architecture: an input gate and an output gate. The memory pool stores a state vector, which is updated by the inputs under the limitation of the input gate. Under the control of the output gate, the new hidden layer state is generated from the updated memory pool state. The proposed MPU can be described as follows:

(12)
(13)
(14)
(15)

where the two gate vectors denote the input gate and the output gate, and the remaining state vector denotes the memory pool state. Eq. (13) shows that all the inputs of the memory pool are restricted by the same input gate. The inputs include the previous layer's outputs, the network inputs, and the hidden layer state of the previous time step; the previous layer's outputs and the network inputs are represented by a single input vector in Eqs. (12) to (14). As shown in Eq. (13), when the input gate is close to zero, the memory pool state is forced to ignore the incoming inputs and therefore keeps its former state; otherwise, the memory pool state is updated by the inputs together with its former state. The current hidden layer state is calculated from the memory pool state under the control of the output gate. The whole mechanism resembles a pool with two valves and a self-renewal capacity: the input gate lets useful information pour into the pool, the pool state renews itself, and the useful information drains out through the output gate. The self-renewal of the memory pool state, as shown in Eq. (13), is the most important part of the MPU. On the one hand, the input gate restricts not only the hidden layer state of the previous time step but also the current inputs. By restricting the previous hidden state, significant information is carried forward for later computation and irrelevant information is ignored. The restriction on the current input is also important for the memory pool: without it, a great deal of useless information accumulates in the pool when the RNN is deep (more precisely, without the restriction on the current input, training becomes difficult when there are more than two hidden layers). On the other hand, the information from the pool input and from the former memory pool state used to renew the pool state is complementary in a way; this complementarity makes better use of all the information and prevents irrelevant information from accumulating in the pool. Because all the inputs are restricted by the input gate, it is important that the network inputs are used to compensate for the information loss in the vertical direction. The weight matrices in Eqs. (12) to (15) map the input vector and the previous hidden state to the two gates and to the memory pool, and the bias vectors belong to the corresponding gates.
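
Since Eqs. (12)-(15) are not legible in this copy, the following sketch is our own reconstruction of the described workflow and should be read as an assumption rather than the authors' exact formulation: an input gate restricts both the incoming inputs and the previous hidden state, the memory pool state renews itself as a complementary combination of its former value and the gated candidate, and the output gate produces the new hidden state.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mpu_cell(x_t, h_prev, c_prev, p):
    """One step of a Memory Pool Unit (our reconstruction from the textual description).

    x_t    : concatenation of the previous layer's output and the network input
    h_prev : hidden state of the previous time step
    c_prev : memory pool state of the previous time step
    p      : dict of weight matrices / biases (names are illustrative, not the paper's)
    """
    # Input and output gates (Eq. (12) and Eq. (14), assumed form).
    i_t = sigmoid(x_t @ p["W_xi"] + h_prev @ p["W_hi"] + p["b_i"])
    o_t = sigmoid(x_t @ p["W_xo"] + h_prev @ p["W_ho"] + p["b_o"])
    # Candidate information, restricted by the same input gate as the previous state.
    g_t = np.tanh(x_t @ p["W_xc"] + h_prev @ p["W_hc"] + p["b_c"])
    # Self-renewal of the memory pool (Eq. (13), assumed complementary form):
    # when i_t is near zero the pool keeps its former state.
    c_t = (1.0 - i_t) * c_prev + i_t * g_t
    # New hidden state under the control of the output gate (Eq. (15), assumed form).
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t
```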

Memory Pool Unit with Input Compensation

As network input compensation is significant for the proposed MPU, another method for compensating the information loss in the vertical direction is proposed. We change Eq. (15) of the previous subsection into

(16)

where the additional matrix denotes the compensation weight matrix applied to the previous layer's outputs. With the compensation term in Eq. (16), the network input is no longer necessary for any layer except the first.
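
Under the same assumptions as the sketch above, the input-compensation variant changes only how the hidden state is produced; a hypothetical form consistent with the description is:

```python
import numpy as np

def mpu_output_with_compensation(o_t, c_t, x_t, W_comp):
    """Eq. (16), assumed form: the previous layer's output x_t is added back through a
    compensation matrix, so deeper layers no longer need the raw network input."""
    return o_t * np.tanh(c_t) + x_t @ W_comp
```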

Stacked Different Hidden Layer States

In general RNNs, the network outputs are calculated using either only the last hidden layer's states [\citeauthoryearXu-Yao Zhang2017] or the states of all hidden layers [\citeauthoryearGraves2013]. It is better to calculate the outputs using the states of all hidden layers. Generally, the combination of all hidden layer states is a weighted sum, such as:

(17)

where the matrices denote the weight matrices from each hidden layer to the fully-connected (output) layer. These weight matrices introduce a great number of parameters, which harms the network's performance. When all hidden layers have the same size, we instead stack (sum) the hidden layer states as the output of the network; the calculation process is described by Eq. (6) and Eq. (7). The hidden layer neurons are computed from the network parameters, and the parameters change during training; therefore, the attributes and functions of the neurons are relative rather than absolute. Owing to this relativity, the sum calculation without weight matrices can achieve goals similar to those of the weighted sum calculation, as argued below. Given the RNN output under the weighted sum calculation:

(18)

and the network output could be represented as:

(19)

As shown in Eq. (19), the network output is a function of the RNN parameters and the fully-connected-layer weight matrices. The network output of the RNN with stacked hidden layer states is computed as:

(20)

and the network output could be represented as:

(21)

As shown in Eq. (21), the network output is again a function of the RNN parameters and a fully-connected-layer weight matrix. The fully-connected-layer weight matrices are mainly used to integrate the feature vectors learned by each layer, and computing the output as a weighted sum means more parameters, which makes training harder. In a sense, Eq. (19) and Eq. (21) are mathematically equivalent up to a reparameterization. The sum calculation has obvious advantages when training our system: although the sum and the weighted sum could ideally achieve the same training effect, the fewer parameters of the sum calculation lead to a faster training process and higher recognition accuracy. Compared with outputs calculated from only the last hidden layer's states, outputs calculated by the sum can even obtain a higher recognition accuracy without increasing the number of parameters. The proposed sum calculation is therefore an effective way to combine the hidden layer states.
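
The difference between the two output schemes can be illustrated with a short sketch (our own, with hypothetical shapes): the weighted-sum scheme learns one output matrix per hidden layer, whereas the stacking scheme first sums the equally sized hidden states and then applies a single output matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, d_h, n_classes = 5, 128, 3873
h = [rng.standard_normal(d_h) for _ in range(n_layers)]   # pooled state of each layer

# Weighted-sum scheme: one output matrix per layer (n_layers * d_h * n_classes weights).
W_per_layer = [rng.standard_normal((d_h, n_classes)) for _ in range(n_layers)]
y_weighted = sum(h_l @ W_l for h_l, W_l in zip(h, W_per_layer))

# Stacking scheme: sum the equally sized hidden states first, then apply one output
# matrix (only d_h * n_classes weights); same functional form, far fewer parameters.
W_shared = rng.standard_normal((d_h, n_classes))
y_stacked = sum(h) @ W_shared
```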

Experimental Results

All the experiments are carried out on the IAHCC-UCAS2016 dataset and the CASIA-OLHWDB1.1 dataset. IAHCC-UCAS2016 is a dataset of in-air handwritten Chinese characters containing 3873 character classes in total; concretely, it contains 3811 Chinese characters, 52 case-sensitive English letters, and 10 digits. Each character class has 116 samples. For each class, we choose 92 samples as the training set and use the remaining 24 samples as the testing set; 14 samples in the training set are randomly sampled to form the validation set. For further evaluation of the proposed methods, experiments are also carried out on the CASIA-OLHWDB1.1 dataset, a handwritten Chinese character dataset that contains 3755 characters from GB2312 written by 300 writers. The data from 240 writers are used for training, and the data from the remaining 60 writers are used for testing.

Method         | General RNNs (256) | Hybrid-parameter RNNs (128)       | Bidirectional RNNs (128)          | General RNNs (128)
               | Paras    Acc.      | Paras    Speed (sec/sample) Acc.  | Paras    Speed (sec/sample) Acc.  | Acc.
#1. 2 h_layers | 1.58mil  92.2%     | 0.79mil  0.00127            92.7% | 0.79mil  0.00143            92.7% | 90.6%
    3 h_layers | 1.97mil  92.4%     | 0.98mil  0.00170            92.9% | 0.98mil  0.00202            92.8% | 91.4%
    4 h_layers | 2.37mil  92.6%     | 1.18mil  0.00220            92.9% | 1.18mil  0.00270            92.9% | 91.5%
    5 h_layers | 2.76mil  92.6%     | 1.38mil  0.00284            92.9% | 1.38mil  0.00330            92.9% | 91.5%
#2. 2 h_layers | 1.55mil  95.6%     | 0.77mil  0.00113            96.3% | 0.77mil  0.00136            96.1% | N/A
    5 h_layers | 2.74mil  95.7%     | 1.37mil  0.00231            96.5% | 1.37mil  0.00314            96.4% | N/A
256 and 128 denote the hidden layer size of the corresponding RNNs; h_layers denotes hidden layers.
#1. and #2. denote experiments carried out on the IAHCC-UCAS2016 dataset and the CASIA-OLHWDB1.1 dataset respectively.
Table 1: Recognition accuracy of different methods.

Data Preprocessing

The location sequence of a handwritten character is denoted by a sequence of location dot vectors, each consisting of the x and y coordinate values of the corresponding dot. Before being fed into our system, the data are preprocessed in two steps. Concretely, (1) the x and y coordinates of all dot locations are scaled to a range of 0 to 64; (2) the dot locations are further normalized so that their mean equals zero, i.e., the new coordinates are obtained by subtracting from each dot the means of the x and y coordinates over all dot locations.
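
A small sketch of one plausible reading of this two-step preprocessing (our own illustration; function and variable names are hypothetical):

```python
import numpy as np

def preprocess(dots):
    """dots: (T, 2) array of raw (x, y) locations of one character."""
    dots = np.asarray(dots, dtype=np.float64)
    # (1) Scale each coordinate to the range [0, 64].
    mins, maxs = dots.min(axis=0), dots.max(axis=0)
    span = np.maximum(maxs - mins, 1e-6)          # guard against zero extent
    dots = (dots - mins) / span * 64.0
    # (2) Zero-center by subtracting the mean of each coordinate.
    return dots - dots.mean(axis=0)
```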

Network Configuration

In our experiments, the inputs of the RNNs are 2-dimensional or 3-dimensional temporal sequences, corresponding to the coordinates of dot locations on the writing trajectory of an in-air handwritten character, or of a handwritten character with a pen-up/pen-down mark on a touch screen. The number of neurons in each hidden layer is 256, except for some systems in Table 1. The dropout value is set to 0.6 and the mini-batch size is 256. All the computation is performed on TESLA K10 GPUs.

Performance Evaluation of Hybrid-parameter RNNs

The parameters of the proposed hybrid-parameter RNN change during the iteration along the temporal dimension. The proposed hybrid-parameter RNN system obtains a higher recognition accuracy with fewer parameters than the general RNN, and it has a faster calculation speed than bidirectional RNNs. The experiments are carried out on the IAHCC-UCAS2016 dataset.

As shown in Table 1, the proposed hybrid-parameter RNN with five 128-unit hidden layers obtains a higher recognition accuracy than the general RNN with five 256-unit hidden layers, while its parameter quantity is about half that of the general RNN. Compared with the bidirectional RNNs, whose calculation speeds are 0.00143, 0.00202, 0.00270, and 0.00330 second/sample for the four depths, the proposed hybrid-parameter RNN obtains competitive recognition accuracy with faster calculation speeds of 0.00127, 0.00170, 0.00220, and 0.00284 second/sample, respectively. The calculation time is shortened by roughly 15%.

For further evaluation, experiments are also carried out on the CASIA-OLHWDB1.1 dataset. For the two RNN architectures with 2 and 5 hidden layers, the hybrid-parameter RNNs reduce the parameters by 50% compared with the general RNNs (from 1.55mil to 0.77mil, and from 2.74mil to 1.37mil, respectively) while obtaining high recognition results. Compared with the bidirectional RNNs, the hybrid-parameter RNNs speed up the calculation by about 18% (from 0.00136 to 0.00113 second/sample) and 26% (from 0.00314 to 0.00231 second/sample), respectively, while obtaining competitive results.

Performance Evaluation of Memory Pool Unit

Methods    | GRU   | LSTM  | MPU   | MPU&C
2 h_layers | 92.2% | 91.6% | 92.5% | 92.5%
3 h_layers | 92.4% | 92.0% | 92.7% | 92.7%
4 h_layers | 92.6% | 92.2% | 92.7% | 92.7%
5 h_layers | 92.6% | 92.2% | 92.8% | 92.8%
Table 2: Recognition accuracy of different methods.

The proposed MPU is applied to in-air handwritten Chinese character recognition. The main part of the proposed method is a memory pool that learns useful temporal information, and the core of the memory pool is its renewal function. Input compensation is important for network training, and two methods are proposed for it: one includes the network inputs among the inputs of each layer; the other uses the memory pool outputs together with the previous layer's outputs to compute the current hidden layer state.

To evaluate the performance of the MPU, experiments are carried out on the IAHCC-UCAS2016 dataset. The RNN structures used in these experiments are general RNNs equipped with GRU, LSTM, MPU, and MPU with input compensation, respectively. As shown in Table 2, the MPU and the MPU with input compensation obtain competitive recognition accuracy. Compared with LSTM, the proposed method has fewer parameters and a simpler structure; compared with GRU, its hidden state calculation process is more reasonable and convincing.

For further performance evaluation of the MPU, another experiment was carried out on the CASIA-OLHWDB1.1 dataset [\citeauthoryearLiu et al.2011]. The data from 36 writers in the training set are randomly sampled to form the validation set. As shown in Table 3, for the two RNN architectures with 2 and 5 hidden layers, the RNNs equipped with the MPU obtain higher recognition results than those with the existing LSTM structure.

Methods    | MPU&C | GRU   | LSTM
2 h_layers | 95.9% | 95.6% | 95.3%
5 h_layers | 96.1% | 95.7% | 95.4%
Table 3: Recognition accuracy of different methods.

Performance Evaluation of Stacked Different Hidden Layer States

Methods    | with stacking | without stacking | synthesized with weight matrix
2 h_layers | 92.6%         | 92.2%            | 90.2%
3 h_layers | 92.8%         | 92.4%            | 91.2%
4 h_layers | 92.8%         | 92.6%            | 91.8%
5 h_layers | 92.8%         | 92.6%            | 92.0%
Table 4: Recognition accuracy of different methods.

For RNNs whose hidden layers all have the same size, we suggest stacking the outputs of all hidden layers as the output of the network. By synthesizing the outputs of every layer, the stacking method increases the recognition accuracy without increasing the number of parameters.

To evaluate the performance of the proposed stacking method, experiments are carried out on the IAHCC-UCAS2016 dataset. The RNN structure used in the four experiments is a general RNN equipped with GRU. The experimental results, shown in Table 4, indicate that the proposed stacking method brings an increase in recognition accuracy.

Recognition Accuracy Comparison between Our Method and the State-of-the-art Methods

Based on the RNNs constructed in Table 1 and Table 2, we build an ensemble classifier containing all of these networks. Concretely, the output of the ensemble classifier is the sum of the outputs of the child RNNs, i.e., the outputs of the N networks in Table 1 and Table 2 are summed.
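
A minimal sketch of this ensembling rule (our own illustration; the networks and the `predict_proba` interface are placeholders):

```python
import numpy as np

def ensemble_predict(networks, sample):
    """Sum the class-probability outputs of all child RNNs and take the argmax."""
    total = sum(net.predict_proba(sample) for net in networks)   # hypothetical interface
    return int(np.argmax(total))
```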

Since the accuracy in [\citeauthoryearQu et al.2016] is reported only for the 3811 Chinese character classes, the results in Table 5 are also reported on the 3811 Chinese characters for the sake of a fair comparison. From Table 5, we can see that the ensemble classifier achieves a recognition accuracy of 93.7%. The proposed classifiers outperform the state-of-the-art results [\citeauthoryearQu et al.2016, \citeauthoryearRen et al.2017].

Method | Ours (Ensemble) | Method #1 | Method #2
Acc.   | 93.7%           | 93.4%     | 91.8%
Method #1: [\citeauthoryearRen et al.2017]; Method #2: [\citeauthoryearQu et al.2016].
Table 5: Comparison of recognition accuracy between ours and the state-of-the-art methods [\citeauthoryearRen et al.2017, \citeauthoryearQu et al.2016].

Conclusion

This paper presents an end-to-end recognizer for online in-air handwritten Chinese characters based on recurrent neural networks (RNNs), and it obtains competitive performance compared with the state-of-the-art methods [\citeauthoryearRen et al.2017, \citeauthoryearQu et al.2016]. A merit of the proposed method is that it does not need an explicit feature representation when modeling the classifier. To make the classic RNN work better for online IAHCCR, three mechanisms are proposed: the Memory Pool Unit, the hybrid-parameter RNN, and the stacking of different hidden layer states. The experimental results show that all three mechanisms effectively improve the classification performance of the RNN.

References

  1. Bai, Z.-L., and Huo, Q. 2005. A study on the use of 8-directional features for online handwritten chinese character recognition. 2005 International Conference on Document Analysis and Recognition, 262–266.
  2. Bengio, Y.; Simard, P.; and Frasconi, P. 1994. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157–166.
  3. Cho, K.; Merrienboer, B. V.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; and Bengio, Y. 2014. Learning phrase representations using rnn encoder-decoder for statistical machine translation. Computer Science.
  4. Graves, A. 2013. Generating sequences with recurrent neural networks. Computer Science.
  5. He, P.; Huang, W.; Qiao, Y.; Chen, C. L.; and Tang, X. 2016. Reading scene text in deep convolutional sequences. The Association for the Advancement of Artificial Intelligence, 3501–3508.
  6. Hinton, G. E.; Deng, L.; Yu, D.; Dahl, G. E.; Mohamed, A.; Jaitly, N.; Senior, A.; Vanhoucke, V.; Nguyen, P.; and Sainath, T. 2012. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine, 29(6):82–97.
  7. Hochreiter, S., and Schmidhuber, J. 1997. Long short-term memory. Neural Computation, 9(8):1735–1780.
  8. Jin, L.-W.; Gao, Y.; Liu, G.; Li, Y.-Y.; and Ding, K. 2010. Scut-couch2009-a comprehensive online unconstrained chinese handwriting database and benchmark evaluation. International Journal on Document Analysis and Recognition, 14(1):53–64.
  9. Liu, C.-L.; Yin, F.; Wang, D.-H.; and Wang, Q.-F. 2011. Casia online and offline chinese handwriting databases. 2011 International Conference on Document Analysis and Recognition (ICDAR), 37–41.
  10. Liu, C.-L.; Yin, F.; Wang, D.-H.; and Wang, Q.-F. 2013. Online and offline handwritten chinese character recognition: Benchmarking on new databases. Pattern Recognition, 46(1):155–162.
  11. Liu, C.-L.; Jaeger, S.; and Nakagawa, M. 2004. Online recognition of chinese characters: the state-of-the-art. Pattern Analysis and Machine Intelligence 26(2):198–213.
  12. Messina, R., and J.Louradour. 2015. Segmentation-free handwritten chinese text recognition with lstm-rnn. International Conference on Document Analysis and Recognition, 171–175.
  13. Okamoto, M., and Yamamoto, K. 1999. On-line handwriting character recognition using direction-change features that consider imaginary strokes. Pattern Recognition, 32(7):1115–1128.
  14. Qu, X.-W.; Wang, W.-Q.; Lu, K.; and Xu, N. 2015. In-air handwritten chinese character recognition using multi-stage classifier based on adaptive discriminative locality alignment. 2015 IEEE International Conference on Image Processing (ICIP), 4793 – 4797.
  15. Qu, X.-W.; Wang, W.-Q.; Lu, K.; and Ji, Z.-J. 2016. High-order directional features and sparse representation based classification for in-air handwritten chinese character recognition. International Conference on Multimedia and Expo, 1–6.
  16. Ren, H.; Wang, W.; Lu, K.; Zhouand, J.; and Yuan, Q. 2017. An end-to-end recognizer for in-air handwritten chinese characters based on a new recurrent neural networks. International Conference on Multimedia and Expo, 1–6.
  17. Ruales, J. 2011. Recurrent neural networks for sentiment analysis.
  18. Tieleman, T., and Hinton, G. E. 2012. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning.
  19. Zhang, X.-Y.; Yin, F.; Zhang, Y.-M.; Liu, C.-L.; and Bengio, Y. 2017. Drawing and recognizing chinese characters with recurrent neural network. IEEE Transactions on Pattern Analysis and Machine Intelligence, PP(99):1–14.