A New Hybrid-Parameter Recurrent Neural Network for Online Handwritten Chinese Character Recognition
Abstract
The recurrent neural network (RNN) is appropriate for dealing with temporal sequences. In this paper, we present a deep RNN with new features and apply it to online handwritten Chinese character recognition. Compared with existing RNN models, the proposed system involves three innovations. First, a new hidden layer function for RNNs, called the Memory Pool Unit (MPU), is proposed for better learning of temporal information; the proposed MPU has a simple architecture. Second, a new RNN architecture with hybrid parameters is presented in order to increase the expressive capacity of the RNN; the proposed hybrid-parameter RNN changes its parameters during the iteration along the temporal dimension. Third, we make an adaptation in which the outputs of all layers are stacked as the output of the network; stacking the hidden layer states combines all of them and further increases the expressive capacity. Experiments are carried out on the IAHCC-UCAS2016 dataset and the CASIA-OLHWDB1.1 dataset. The experimental results show that the hybrid-parameter RNN obtains better recognition performance with higher efficiency (fewer parameters and faster speed), and the proposed Memory Pool Unit is shown to be a simple hidden layer function that obtains competitive recognition results.
Introduction
Decades ago, the RNN was proposed as a tool for processing sequential data. With the widespread use of RNNs, many improvements have been made, particularly in avoiding the vanishing gradient problem [\citeauthoryearBengio, Simard, and Frasconi1994]. LSTM [\citeauthoryearHochreiter and Schmidhuber1997] and GRU [\citeauthoryearCho et al.2014] are two famous and popular improvements of the RNN hidden layer function and have been applied to many tasks, such as speech recognition [\citeauthoryearHinton et al.2012], natural language processing (NLP) [\citeauthoryearRuales2011], character recognition [\citeauthoryearMessina and J.Louradour2015], and machine translation [\citeauthoryearCho et al.2014]. Alex Graves [\citeauthoryearGraves2013] proposed a sequence generator based on RNNs equipped with LSTM. He and Tang [\citeauthoryearHe et al.2016] applied the LSTM-RNN to scene text recognition. Zhang et al. [\citeauthoryearXuYao Zhang2017] proposed a Chinese character recognizer based on RNNs equipped with LSTM and GRU. GRU was first proposed for machine translation by Cho et al. [\citeauthoryearCho et al.2014]. Compared with LSTM, GRU has a simpler structure and obtains competitive results.
Despite the tremendous advances and successful applications, big challenges remain, particularly concerning the hidden layer function and the network structure. The hidden layer function of RNNs is an important subject that needs to be explored in depth. Compared with GRU, LSTM has a clearer and more reasonable workflow but a more complex structure; compared with LSTM, GRU has a simpler structure but a slightly less clear workflow. Therefore, researchers need to explore whether there are other methods for avoiding the vanishing gradient problem, rather than always applying the existing ones.
In-air handwriting is a new kind of human-computer interaction. With a sensor (e.g., the Leap Motion sensor) and a computer, a user writes in the air and the computer quickly recognizes what is written. This kind of human-computer interaction is shown in Fig. 1. Generally speaking, in-air handwritten Chinese character recognition (IAHCCR) is more challenging than traditional handwritten Chinese character recognition (HCCR) [\citeauthoryearLiu, Jaeger, and Nakagawa2004, \citeauthoryearBai and Huo2005, \citeauthoryearOkamoto and Yamamoto1999]. First, each character has only one stroke without any pen-up/pen-down mark. Second, in-air handwritten strokes tend to be more squiggly than strokes handwritten on a touch screen. Fig. 2 gives some examples that illustrate the difference between in-air handwritten characters and characters handwritten on a touch screen.
Since in-air handwriting is such an appealing way of human-computer interaction, many researchers have explored this field. Qu et al. [\citeauthoryearQu et al.2015] presented a multi-stage classifier for IAHCCR, but their system achieved a relatively low accuracy. Qu et al. [\citeauthoryearQu et al.2016] presented a new feature representation that extends the power of the 8-direction feature and applied it to IAHCC recognition. The 8-direction feature has been shown to be a discriminative feature in many works and achieves good performance [\citeauthoryearBai and Huo2005, \citeauthoryearLiu et al.2013, \citeauthoryearJin et al.2010]. Ren et al. [\citeauthoryearRen et al.2017] proposed an end-to-end recognizer for online IAHCCR based on a new RNN; this was the first time an RNN was used for online IAHCCR, and it obtained a high recognition accuracy.
In this paper, three contributions are made to improve the recognition accuracy and the calculation speed of the RNN-based in-air handwritten Chinese character recognizer.

First, a new hidden layer function, the MPU, is proposed. Compared with LSTM it has fewer parameters, and compared with GRU it has a more straightforward workflow. A series of experiments were carried out on the IAHCC-UCAS2016 dataset, showing that the proposed hidden layer function (MPU) obtains a high recognition accuracy.

Second, a hybrid-parameter RNN architecture is proposed. Compared with general RNNs and bidirectional RNNs respectively, the proposed hybrid-parameter architecture obtains competitive recognition accuracy with fewer parameters and faster calculation speed.

Third, when all the hidden layers of an RNN have the same size, we propose stacking (summing) the outputs of all layers to form the output of the network. By synthesizing the outputs of every layer, the stacking method increases the recognition accuracy without increasing the number of parameters.
Basic RNN System
Hybrid-Parameter RNNs
Calculation speed and parameter quantity are two significant performance indices for all neural networks. However, designing an efficient neural network with fewer parameters and faster calculation speed is a challenging task. Fig. 3 shows the proposed hybrid-parameter RNN system and the processing of an input sample. As shown in the left part of Fig. 3, the input character location sequence is divided into two parts, and the new sequence of the sample is generated as:
(1) 
As shown in Eq. (1), the new sequence of the sample is represented as a sequence of location dots. The network calculation process is shown in the right part of Fig. 3. To begin with, two sets of RNN parameters of the same scale are initialized: the parameters in the first set are initialized randomly and those in the second set are initialized to zeros. Given a character sample, the new system input is the sequence of location dots in Eq. (1). As the RNN iterates along the temporal dimension, the parameters are changed as follows: during the first part of the iteration the RNN calculates with the first parameter set; the parameters then change for the middle time steps; and the RNN calculates the last part of the iteration with only the second parameter set. For each time step of the first part, the hidden layer states are computed by
(2) 
where the first terms denote the hidden layer functions of the first and of each subsequent hidden layer, the remaining terms denote the corresponding network parameters, and the network contains several hidden layers. For each time step of the second part, the hidden layer states are computed by
(3) 
where the parameter symbols denote the network parameters of each layer during the second part of the iteration. For each time step of the last part, the hidden layer states are computed by
(4) 
where the parameter symbols denote the network parameters of each layer during the last part of the iteration. During the whole temporal iteration, both parameter sets participate in the RNN calculation over all the location dots of the original sample. After all the time step iterations, the hidden layer states of the top hidden layer are collected, and the final output of the RNN is computed by
(5) 
(6) 
(7) 
where the first symbol denotes the output vector, the two weight matrices map the two groups of hidden layer states to the fully-connected layer (output layer), and the last symbol denotes the fully-connected layer bias vector. The sum operation (sum-pooling in Eq. (6) and Eq. (7)) was proposed by Ren et al. [\citeauthoryearRen et al.2017]. A softmax regression is then applied to the fully-connected layer outputs to compute the class probability distribution. Since an in-air handwritten Chinese character generally corresponds to a long sequence of dot locations and the proposed network architecture involves five hidden layers, we add skip connections [\citeauthoryearGraves2013] from the input layer to all hidden layers, and from all hidden layers to the fully-connected layer, to alleviate the vanishing gradient problem [\citeauthoryearBengio, Simard, and Frasconi1994].
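To make this forward pass concrete, the following sketch gives one possible reading of the hybrid-parameter network in PyTorch (the paper does not specify a framework). It assumes GRU cells, a single mid-sequence switch between the two parameter sets (the paper's exact schedule, with a middle phase involving both sets, is not fully specified above), input skip connections to every layer, and two output weight matrices applied to the sum-pooled top-layer states; the layer sizes and the omission of skip connections from intermediate layers to the output are illustrative simplifications.

```python
import torch
import torch.nn as nn

class HybridParamRNN(nn.Module):
    """Sketch of the hybrid-parameter recognizer described above (assumed details)."""

    def __init__(self, input_dim=2, hidden=128, layers=5, classes=3873):
        super().__init__()

        def make_stack():
            # One GRU cell per hidden layer; layers above the first also receive
            # the raw network input (input skip connections).
            return nn.ModuleList(
                [nn.GRUCell(input_dim if l == 0 else hidden + input_dim, hidden)
                 for l in range(layers)])

        self.stack_a = make_stack()                  # parameter set A (random initialization)
        self.stack_b = make_stack()                  # parameter set B
        for p in self.stack_b.parameters():          # the paper initializes the second set to zeros
            nn.init.zeros_(p)
        self.fc_a = nn.Linear(hidden, classes)               # weight matrix for states from set A
        self.fc_b = nn.Linear(hidden, classes, bias=False)   # weight matrix for states from set B
        self.layers, self.hidden = layers, hidden

    def forward(self, x):                            # x: (T, batch, input_dim)
        T, batch, _ = x.shape
        h = [x.new_zeros(batch, self.hidden) for _ in range(self.layers)]
        pooled_a = x.new_zeros(batch, self.hidden)   # sum of top-layer states produced with set A
        pooled_b = x.new_zeros(batch, self.hidden)   # ... and with set B
        for t in range(T):
            use_a = t < T // 2                       # assumed single mid-sequence parameter switch
            cells = self.stack_a if use_a else self.stack_b
            for l, cell in enumerate(cells):
                layer_in = x[t] if l == 0 else torch.cat([h[l - 1], x[t]], dim=-1)
                h[l] = cell(layer_in, h[l])
            if use_a:
                pooled_a = pooled_a + h[-1]          # sum-pooling over time, as in Eqs. (5)-(7)
            else:
                pooled_b = pooled_b + h[-1]
        return self.fc_a(pooled_a) + self.fc_b(pooled_b)  # class logits; softmax applied in the loss
```

A bidirectional baseline built from the same cells would process both the sequence and its reversal, roughly doubling the per-sample iteration count, which is the comparison made in the next subsection.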
Computing Speed and Parameter Quantity
General RNNs are computed with only one set of parameters throughout the iteration along the temporal dimension, whereas the parameters of the proposed hybrid-parameter RNN change during this iteration. Experimental results show that the proposed structure obtains a higher recognition accuracy with a smaller hidden layer size, and a smaller hidden layer size means fewer parameters. For example, a general RNN with five 256-size hidden layers obtains a recognition accuracy of 92.6%; the accuracy does not improve when the hidden layer size increases, but it drops when the hidden layer size decreases. The proposed hybrid-parameter RNN obtains a higher recognition accuracy of 92.9% with 128-size hidden layers. The numerical relationship between the parameter quantities of the two kinds of RNN can be represented as:
(8) 
(9) 
where the first symbol denotes the parameter quantity of the general RNN, the next one the size of each hidden layer, and the remaining terms the parameters of the weight matrices applied to each layer's inputs and of the recurrent connections between hidden layer states; the corresponding symbols in Eq. (9) have analogous definitions for the proposed hybrid-parameter RNN. The first multiplicator in Eq. (8) and Eq. (9) represents the number of state vectors in the hidden layer function; since the hidden layer function used here is the GRU, the number of state vectors is three. The additional multiplicator of two in Eq. (9) represents the two sets of parameters with the same scale. From Eq. (8) and Eq. (9) we can see that the hybrid-parameter RNN has twice as many parameters as the general RNN when the two have the same hidden layer size.
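The following back-of-the-envelope calculation illustrates the kind of parameter counting behind Eq. (8) and Eq. (9) for GRU layers. The exact terms counted in the paper (biases, input skip connections, the output layer) are not recoverable here, so the constants are assumptions and the printed numbers are indicative only; they are not meant to reproduce Table 1.

```python
def gru_layer_params(input_size, hidden_size):
    # A GRU keeps three state vectors (reset gate, update gate, candidate), each with
    # an input weight matrix, a recurrent weight matrix and a bias vector.
    return 3 * (input_size * hidden_size + hidden_size * hidden_size + hidden_size)

def stack_params(input_dim, hidden_size, layers, param_sets=1):
    total = 0
    for l in range(layers):
        # assumed input skip connections: layers above the first see [previous state, raw input]
        in_size = input_dim if l == 0 else hidden_size + input_dim
        total += gru_layer_params(in_size, hidden_size)
    return param_sets * total

general = stack_params(input_dim=2, hidden_size=256, layers=5, param_sets=1)
hybrid = stack_params(input_dim=2, hidden_size=128, layers=5, param_sets=2)
print(f"general RNN (256): {general / 1e6:.2f}M  hybrid-parameter RNN (128): {hybrid / 1e6:.2f}M")
```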
Compared with the bidirectional RNN, the hybrid-parameter RNN requires fewer time steps of iteration: the bidirectional RNN must iterate over twice the sequence length, while the hybrid-parameter RNN iterates over fewer steps. Therefore, the proposed hybrid-parameter RNN has a faster computing speed than the bidirectional RNN and obtains competitive results.
Learning Parameters
At the top of the RNN, a softmax layer is used to generate the probability distribution over the 3873 character classes. To train the neural network, the following widely used loss function is minimized, i.e.,
(10) 
(11) 
where the first symbol is the total number of training patterns and the class-selection function picks the true class of each example, so that the selected output element is the network output for the true class of a given pattern. Minimizing the loss function corresponds in essence to maximizing the probability of correctly classifying the patterns. The optimal parameters of the proposed system are then obtained by repeatedly updating the parameters with the rmsprop [\citeauthoryearTieleman and Hinton2012] method, a form of stochastic gradient descent.
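As an illustration, a single training step under this objective might look as follows. PyTorch is an assumption; `model`, the learning rate, and the tensor shapes are hypothetical placeholders rather than the paper's actual setup.

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, sequences, labels):
    """One update: softmax + negative log-likelihood of the true class, then an rmsprop step."""
    optimizer.zero_grad()
    logits = model(sequences)               # (batch, 3873) class scores
    loss = F.cross_entropy(logits, labels)  # cross-entropy = softmax followed by mean NLL
    loss.backward()
    optimizer.step()
    return loss.item()

# optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-3)   # learning rate is an assumption
```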
Memory Pool Unit
The hidden layer function is significant for RNNs. LSTM and GRU are the two popular hidden layer functions used for many tasks. As a hidden layer function with a simpler architecture and fewer parameters, GRU obtains performance similar to LSTM. To make computation and implementation simpler, the memory state is removed in the GRU. However, a hidden layer function with a memory state makes the whole hidden-state calculation process more reasonable and convincing. We propose a new type of hidden layer function with a memory state that is motivated by the LSTM and GRU but is simpler than the LSTM. In this section, we focus on describing the proposed hidden layer function, the Memory Pool Unit (MPU). The MPU is based on a simple and more straightforward hidden-state calculation process. As shown in Fig. 4, the core of the proposed hidden layer function is a memory pool. There are two gates in the architecture of the MPU: one is an input gate and the other is an output gate. The memory pool stores a state vector, which is updated by the inputs under the limitation of the input gate. Under the control of the output gate, the new hidden layer state is generated from the updated memory pool state. The proposed MPU can be described as follows,
(12) 
(13) 
(14) 
(15) 
where the two gate vectors denote the input gate and the output gate, and a further state vector denotes the memory pool state. Eq. (13) shows that all the inputs of the memory pool are restricted by the same input gate. These inputs include the previous layer's outputs, the network inputs, and the hidden layer state of the last time step; the previous layer's outputs and the network inputs are represented by a single input term in Eq. (12) to Eq. (14). As shown in Eq. (13), when the input gate is close to zero, the memory pool state is forced to ignore the inputs and therefore keeps its former state; otherwise, the memory pool state is updated by the inputs and its former state. The current hidden layer state is calculated from the memory pool state under the control of the output gate. The whole processing mechanism is like a pool with two valves and a self-renewal capacity: the input gate allows useful information to pour into the pool, the pool state then renews itself, and the useful information finally drains out through the output gate. The self-renewal of the memory pool state, as shown in Eq. (13), is the most important part of the MPU. On the one hand, the input gate restricts not only the hidden layer state of the last time step but also the current inputs. By restricting the hidden state of the last time step, significant information is kept for later computation and irrelevant information is ignored. The restriction on the current input is also important for the memory pool: without it, a great deal of useless information accumulates in the pool when the RNN is deep (more precisely, without the restriction on the current input, training becomes difficult when there are more than two hidden layers). On the other hand, the information from the pool input and from the former memory pool state used to renew the pool is complementary, which makes better use of all the information and prevents irrelevant information from accumulating in the pool. Because all the inputs are restricted by the input gate, the network inputs play an important role in compensating for the information loss along the vertical direction. The remaining symbols denote the weight matrices from the input vector and from the previous hidden state to the input gate and the output gate, together with the bias vectors of the corresponding gates.
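Since Eqs. (12)-(15) are only described in words above, the following sketch shows one assumed form of the MPU cell that is consistent with that description: a single input gate restricts both the current input and the previous hidden state, the memory pool state is renewed as a complementary mixture of its former value and a gated candidate, and an output gate exposes the pool state as the new hidden state. The concrete nonlinearities and weight layout are assumptions, not the paper's exact equations.

```python
import torch
import torch.nn as nn

class MPUCell(nn.Module):
    """One assumed reading of the MPU (see lead-in): input gate, memory pool, output gate."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.gates = nn.Linear(input_size + hidden_size, 2 * hidden_size)  # input and output gates
        self.cand = nn.Linear(input_size + hidden_size, hidden_size)       # pool-update candidate

    def forward(self, x, state):
        h_prev, c_prev = state
        z = torch.cat([x, h_prev], dim=-1)
        i, o = torch.sigmoid(self.gates(z)).chunk(2, dim=-1)  # cf. Eq. (12): the two gates
        c_tilde = torch.tanh(self.cand(z))                    # candidate built from x_t and h_{t-1}
        c = (1.0 - i) * c_prev + i * c_tilde                   # cf. Eq. (13): complementary renewal
        h = o * torch.tanh(c)                                  # cf. Eqs. (14)-(15): gated pool output
        return h, c

# usage (shapes only): cell = MPUCell(258, 256); h, c = cell(x_t, (h_prev, c_prev))
```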
Memory Pool Unit with Input Compensation
Since network input compensation is significant for the proposed MPU, another method for compensating the information loss along the vertical direction is proposed. We change Eq. (15) of the last subsection into
(16) 
where the new symbol denotes the compensation weight matrix applied to the previous layer's outputs. With the compensation term in Eq. (16), the network input is no longer necessary for any layer except the first.
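Building on the MPU sketch above, the input-compensation variant of Eq. (16) can be read as adding a linear projection of the previous layer's output to the gated pool output; the class name `MPUCellIC` and the additive form are assumptions.

```python
import torch.nn as nn

class MPUCellIC(MPUCell):
    """Input-compensation variant: assumed reading of Eq. (16), where the previous layer's
    output is linearly projected and added to the gated pool output."""

    def __init__(self, input_size, hidden_size):
        super().__init__(input_size, hidden_size)
        self.comp = nn.Linear(input_size, hidden_size, bias=False)  # compensation weight matrix

    def forward(self, x, state):
        h, c = super().forward(x, state)
        return h + self.comp(x), c   # compensation term replaces feeding the raw input to every layer
```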
Stacked Different Hidden Layer States
In general RNNs, the network outputs are calculated using either only the last hidden layer's state [\citeauthoryearXuYao Zhang2017] or the states of all the hidden layers [\citeauthoryearGraves2013]. It is better to calculate the outputs from all the hidden layers' states. Usually these states are combined by a weighted sum, such as:
(17) 
where each weight matrix maps the corresponding hidden layer to the fully-connected layer (output layer). These weight matrices introduce a large number of parameters, which harms the learning of the neural network. When all the hidden layers have the same size, we instead stack (sum) all the hidden layer states to form the output of the network; the calculation process can be described by Eq. (6) and Eq. (7). The hidden layer neurons are computed from the network parameters, and the parameters change during training; therefore, the attributes and functions of the neurons are relative rather than absolute. Owing to this relativity, the sum calculation without weight matrices can achieve goals similar to the weighted sum calculation, as argued below. Given the RNN output with the weighted sum calculation:
(18) 
and the network output could be represented as:
(19) 
As shown in Eq. (19), the network output is a functional expression of the RNN parameters and the fully-connected layer weight matrix parameters. The network output of the RNN with stacked hidden layer states is computed as:
(20) 
and the network output could be represented as:
(21) 
As shown in Eq. (21), this network output is likewise a functional expression of the RNN parameters and the fully-connected layer weight matrix parameters. The fully-connected layer weight matrices are mainly used to integrate the feature vectors learned by each layer, and computing the output by a weighted sum means more parameters, which in turn makes training more difficult. In a sense, Eq. (19) and Eq. (21) are mathematically equivalent up to a reparameterization. The sum calculation has clear advantages when training our system: although the sum and the weighted sum could ideally achieve the same training effect, the fewer parameters of the sum calculation lead to faster training and higher recognition accuracy. Compared with outputs calculated from only the last hidden layer state, outputs calculated by the sum can even obtain a higher recognition accuracy without increasing the number of parameters. The proposed sum calculation is therefore an effective way of processing the hidden layer states.
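The following toy snippet contrasts the two output computations discussed above: the weighted sum of Eq. (17), which introduces one extra weight matrix per layer, and the plain sum of equally sized hidden states, which needs no extra parameters. The tensors and sizes are hypothetical; the snippet only illustrates the parameter difference, not the full recognizer.

```python
import torch

# Hypothetical hidden states of five equally sized layers for a small batch.
layers, batch, hidden, classes = 5, 4, 256, 3873
h = [torch.randn(batch, hidden) for _ in range(layers)]

# Weighted-sum combination (Eq. (17) style): one extra matrix per layer plus the output matrix.
W_l = [torch.randn(hidden, hidden) for _ in range(layers)]
W_out = torch.randn(hidden, classes)
weighted = sum(h[l] @ W_l[l] for l in range(layers)) @ W_out

# Proposed stacked/summed combination: the states are simply added, so only W_out remains.
summed = sum(h) @ W_out
```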
Experimental Results
All the experiments are carried out on the IAHCC-UCAS2016 dataset and the CASIA-OLHWDB1.1 dataset. IAHCC-UCAS2016 is a dataset of in-air handwritten Chinese characters containing 3873 character classes in total. Concretely, it contains 3811 Chinese characters, 52 case-sensitive English letters, and 10 digits. Each character class has 116 samples. For each class, we choose 92 samples as the training set, and the remaining 24 samples are used as the testing set; 14 samples in the training set are randomly sampled to form the validation set. For further evaluation of the proposed methods, experiments are also carried out on the CASIA-OLHWDB1.1 dataset, a handwritten Chinese character dataset that contains 3755 characters from GB2312 written by 300 writers. The data from 240 persons are used for training, and the data from the remaining 60 persons are used for testing.
Table 1: Comparison of the general RNN, the proposed hybrid-parameter RNN, and the bidirectional RNN (Speed in second/sample).

                    General RNN        Hybrid-parameter RNN           Bidirectional RNN
Method              Paras    Acc.      Paras     Speed      Acc.      Paras     Speed      Acc.      Acc.
#1.
                    1.58mil  92.2%     0.79mil   0.00127    92.7%     0.79mil   0.00143    92.7%     90.6%
                    1.97mil  92.4%     0.98mil   0.00170    92.9%     0.98mil   0.00202    92.8%     91.4%
                    2.37mil  92.6%     1.18mil   0.00220    92.9%     1.18mil   0.00270    92.9%     91.5%
                    2.76mil  92.6%     1.38mil   0.00284    92.9%     1.38mil   0.00330    92.9%     91.5%
#2.
2 h_layers          1.55mil  95.6%     0.77mil   0.00113    96.3%     0.77mil   0.00136    96.1%     N/A
5 h_layers          2.74mil  95.7%     1.37mil   0.00231    96.5%     1.37mil   0.00314    96.4%     N/A

256,128 denotes the hidden layer size of the corresponding RNNs.
h_layers denotes hidden layers.
#1. and #2. denote experiments carried out on the IAHCC-UCAS2016 dataset and the CASIA-OLHWDB1.1 dataset respectively.
Data Preprocessing
The location sequence of a handwritten character is a sequence of location dot vectors, where each dot vector consists of the horizontal and vertical coordinate values of the corresponding location. Before being fed into our system, the data are preprocessed in two steps. Concretely, (1) both coordinates of all the locations are scaled to the range 0 to 64; (2) the dot locations are further normalized so that their mean equals zero, i.e., the mean of the corresponding coordinate over all the dot locations is subtracted from each coordinate.
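A minimal sketch of these two preprocessing steps follows, assuming NumPy and per-axis min-max scaling; whether the paper scales the two coordinates jointly or independently is not stated, so the per-axis choice is an assumption.

```python
import numpy as np

def preprocess(points):
    """Scale both coordinates to [0, 64], then subtract the mean (zero-centering)."""
    pts = np.asarray(points, dtype=np.float32)        # shape (T, 2): (x, y) per location dot
    mins, maxs = pts.min(axis=0), pts.max(axis=0)
    span = np.where(maxs > mins, maxs - mins, 1.0)    # avoid division by zero for degenerate strokes
    pts = (pts - mins) / span * 64.0                  # step (1): scale to the range 0..64
    return pts - pts.mean(axis=0)                     # step (2): zero-mean normalization
```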
Network Configuration
In our experiments, the inputs of the RNNs are 2-dimensional or 3-dimensional temporal sequences, corresponding to the coordinates of the dot locations on the writing trajectory of an in-air handwritten character, or of a character handwritten on a touch screen with an additional pen-up/pen-down mark. The number of neurons in each hidden layer is 256 except for some systems in Table 1. The dropout rate is set to 0.6 and the mini-batch size is 256. All the computation is performed on TESLA K10 GPUs.
Performance Evaluation of the Hybrid-Parameter RNNs
The parameters of the proposed hybrid-parameter RNNs change during the iteration along the temporal dimension. The proposed hybrid-parameter RNN system obtains a higher recognition accuracy with fewer parameters than the general RNN, and has a faster calculation speed than the bidirectional RNN. These experiments are carried out on the IAHCC-UCAS2016 dataset.
As shown in Table 1, the proposed hybrid-parameter RNNs with five 128-size hidden layers obtain a higher recognition accuracy than the general RNN with five 256-size hidden layers, while the parameter quantity of the hybrid-parameter RNNs is only half that of the general RNNs. Compared with the bidirectional RNNs, whose calculation speeds are 0.00143 second/sample, 0.00202 second/sample, 0.00270 second/sample, and 0.00330 second/sample, the proposed hybrid-parameter RNN obtains competitive recognition accuracy with faster calculation speeds of 0.00127 second/sample, 0.00170 second/sample, 0.00220 second/sample, and 0.00284 second/sample for the four corresponding hidden layer sizes. The calculation time is shortened by about 15%.
For further evaluation, experiments are also carried out on the CASIA-OLHWDB1.1 dataset. For the two RNN architectures containing 2 and 5 hidden layers, compared with the general RNNs, the hybrid-parameter RNNs reduce the parameters by 50%, from 1.55mil to 0.77mil and from 2.74mil to 1.37mil respectively, while obtaining high recognition results. Compared with the bidirectional RNNs, the hybrid-parameter RNNs increase the calculation speed by 18% (from 0.00136 second/sample to 0.00113 second/sample) and 26% (from 0.00314 second/sample to 0.00231 second/sample) respectively, while obtaining competitive results.
Performance Evaluation of Memory Pool Unit
[Table 2: Recognition results on the IAHCC-UCAS2016 dataset of general RNNs equipped with GRU, LSTM, the proposed MPU, and the MPU with input compensation.]
The proposed MPU is applied to in-air handwritten Chinese character recognition. The main part of the proposed method is a memory pool that learns the useful temporal information, and the core of the memory pool is the renewal function. Input compensation is important for network training, and two methods are proposed for it: one includes the network inputs among the inputs of every layer, and the other uses the memory pool outputs together with the previous layer's outputs to compute the current hidden layer state.
To evaluate the performance of the MPU, experiments are carried out on the IAHCC-UCAS2016 dataset. The RNN structures used in these experiments are general RNNs equipped with GRU, LSTM, MPU, and MPU with input compensation respectively. As shown in Table 2, the MPU and the MPU with input compensation obtain competitive recognition accuracies. Compared with LSTM, the proposed method has fewer parameters and a simpler structure; compared with GRU, its hidden-state calculation process is more reasonable and convincing.
For further performance evaluation of the MPU, another experiment was carried out on the CASIA-OLHWDB1.1 dataset [\citeauthoryearLiu et al.2011]. The data from 36 persons in the training set are randomly sampled to form the validation set. As shown in Table 3, for the two RNN architectures containing 2 and 5 hidden layers, the RNNs equipped with the MPU obtain higher recognition results than those equipped with the existing LSTM structure.
[Table 3: Recognition results on the CASIA-OLHWDB1.1 dataset of RNNs with 2 and 5 hidden layers equipped with LSTM and with the proposed MPU.]
Performance Evaluation of Stacked Different Hidden Layer States
[Table 4: Recognition accuracy on the IAHCC-UCAS2016 dataset of general RNNs equipped with GRU, with and without the proposed stacking (sum) of hidden layer states.]
For RNNs whose hidden layers all have the same size, we propose stacking (summing) the outputs of all the layers to form the output of the network. By synthesizing the outputs of every layer, the stacking method increases the recognition accuracy without increasing the number of parameters.
To evaluate the performance of the proposed stacking method, experiments are carried out on the IAHCC-UCAS2016 dataset. The RNN structure used in the four experiments is the general RNN equipped with GRU. The experimental results, shown in Table 4, indicate that the proposed stacking method brings an increase in recognition accuracy.
Recognition Accuracy Comparison between Ours and the State-of-the-art Methods
Based on the RNNs constructed in Table 1 and Table 2, we build an ensemble classifier containing all the networks. Concretely, the output of the ensemble classifier is the sum of the outputs of the child RNNs, where each summand is the output of one of the N networks in Table 1 and Table 2.
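A minimal sketch of this ensembling rule follows, assuming each trained child network returns class scores for a given input sequence; the function name and tensor shapes are hypothetical.

```python
import torch

def ensemble_predict(models, sequence):
    """Sum the class probability distributions of all child networks and take the arg-max."""
    with torch.no_grad():
        probs = sum(torch.softmax(m(sequence), dim=-1) for m in models)
    return probs.argmax(dim=-1)
```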
Since the accuracy in [\citeauthoryearQu et al.2016] is reported only for the 3811 Chinese character classes, the experimental results in Table 5 are also reported on the 3811 Chinese characters for the sake of fair comparison. From Table 5, we can see that the ensemble classifier achieves a recognition accuracy of 93.7%. The proposed classifiers outperform the state-of-the-art results [\citeauthoryearQu et al.2016, \citeauthoryearRen et al.2017].







Table 5: Recognition accuracy comparison on the 3811 Chinese character classes.

Ours (ensemble)    Method#1.    Method#2.
93.7%              93.4%        91.8%

Method#1. [\citeauthoryearRen et al.2017], Method#2. [\citeauthoryearQu et al.2016]
Conclusion
This paper presents an end-to-end recognizer for online in-air handwritten Chinese characters based on recurrent neural networks (RNNs), which obtains competitive performance compared with the state-of-the-art methods [\citeauthoryearRen et al.2017, \citeauthoryearQu et al.2016]. The merit of the proposed method is that it does not need an explicit feature representation when modeling the classifier. To make the classic RNN work better for online IAHCCR, three mechanisms are proposed, i.e., the Memory Pool Unit, the hybrid-parameter RNN, and the stacking of different hidden layer states. The experimental results show that all three mechanisms effectively improve the classification performance of the RNN.
References
 Bai, Z.L., and Huo, Q. 2005. A study on the use of 8directional features for online handwritten chinese character recognition. 2005 International Conference on Document Analysis and Recognition, 262–266.
 Bengio, Y.; Simard, P.; and Frasconi, P. 1994. Learning longterm dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157–166.
 Cho, K.; Merrienboer, B. V.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; and Bengio, Y. 2014. Learning phrase representations using rnn encoderdecoder for statistical machine translation. Computer Science.
 Graves, A. 2013. Generating sequences with recurrent neural networks. Computer Science.
 He, P.; Huang, W.; Qiao, Y.; Chen, C. L.; and Tang, X. 2016. Reading scene text in deep convolutional sequences. The Association for the Advancement of Artificial Intelligence, 3501–3508.
 Hinton, G. E.; Deng, L.; Yu, D.; Dahl, G. E.; Mohamed, A.; Jaitly, N.; Senior, A.; Vanhoucke, V.; Nguyen, P.; and Sainath, T. 2012. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine, 29(6):82–97.
 Hochreiter, S., and Schmidhuber, J. 1997. Long shortterm memory. Neural Computation, 9(8):1735–1780.
 Jin, L.W.; Gao, Y.; Liu, G.; Li, Y.Y.; and Ding, K. 2010. SCUT-COUCH2009: a comprehensive online unconstrained chinese handwriting database and benchmark evaluation. International Journal on Document Analysis and Recognition, 14(1):53–64.
 Liu, C.L.; Yin, F.; Wang, D.H.; and Wang, Q.F. 2011. Casia online and offline chinese handwriting databases. 2011 International Conference on Document Analysis and Recognition (ICDAR), 37–41.
 Liu, C.L.; Yin, F.; Wang, D.H.; and Wang, Q.F. 2013. Online and offline handwritten chinese character recognition: Benchmarking on new databases. Pattern Recognition, 46(1):155–162.
 Liu, C.L.; Jaeger, S.; and Nakagawa, M. 2004. Online recognition of chinese characters: the stateoftheart. Pattern Analysis and Machine Intelligence 26(2):198–213.
 Messina, R., and J.Louradour. 2015. Segmentationfree handwritten chinese text recognition with lstmrnn. International Conference on Document Analysis and Recognition, 171–175.
 Okamoto, M., and Yamamoto, K. 1999. Online handwriting character recognition using directionchange features that consider imaginary strokes. Pattern Recognition, 32(7):1115–1128.
 Qu, X.W.; Wang, W.Q.; Lu, K.; and Xu, N. 2015. Inair handwritten chinese character recognition using multistage classifier based on adaptive discriminative locality alignment. 2015 IEEE International Conference on Image Processing (ICIP), 4793 – 4797.
 Qu, X.W.; Wang, W.Q.; Lu, K.; and Ji, Z.J. 2016. Highorder directional features and sparse representation based classification for inair handwritten chinese character recognition. International Conference on Multimedia and Expo, 1–6.
 Ren, H.; Wang, W.; Lu, K.; Zhou, J.; and Yuan, Q. 2017. An endtoend recognizer for inair handwritten chinese characters based on a new recurrent neural networks. International Conference on Multimedia and Expo, 1–6.
 Ruales, J. 2011. Recurrent neural networks for sentiment analysis.
 Tieleman, T., and Hinton, G. E. 2012. Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning.
 Zhang, X.-Y.; Yin, F.; Zhang, Y.-M.; Liu, C.-L.; and Bengio, Y. 2017. Drawing and recognizing chinese characters with recurrent neural network. IEEE Transactions on Pattern Analysis and Machine Intelligence, PP(99):1–14.