SigNet: Convolutional Siamese Network for Writer Independent Offline Signature Verification
Offline signature verification is one of the most challenging tasks in biometrics and document forensics. Unlike other verification problems, it needs to model minute but critical details that separate genuine from forged signatures, because a skilled forgery might differ from a real signature only by some specific kinds of deformation. The task is even harder in writer independent scenarios, which are undeniably more economical for realistic deployments. In this paper, we model an offline writer independent signature verification task with a convolutional Siamese network. Siamese networks are twin networks with shared weights, which can be trained to learn a feature space where similar observations are placed in proximity. This is achieved by exposing the network to pairs of similar and dissimilar observations and minimizing the Euclidean distance between similar pairs while simultaneously maximizing it between dissimilar pairs. Experiments conducted on cross-domain datasets emphasize the capability of our network to handle forgery in different languages (scripts) and handwriting styles. Moreover, our Siamese network, named SigNet, outperforms the state of the art on most of the benchmark signature datasets.
The signature is one of the most popular and commonly accepted biometric hallmarks and has been used since ancient times for verifying different entities related to human beings, viz. documents, forms, bank checks, individuals, etc. Therefore, signature verification is a critical task, and many efforts have been made to remove the uncertainty involved in the manual authentication procedure, which makes signature verification an important research line in the field of machine learning and pattern recognition Plamondon2000 (); Impedovo2008 (). Depending on the input format, signature verification can be of two types: (1) online and (2) offline. Capturing an online signature requires an electronic writing pad together with a stylus, which mainly records a sequence of coordinates of the electronic pen tip while signing. Apart from the writing coordinates of the signature, these devices are also capable of capturing the writing speed, pressure, etc., as additional information, which is used in the online verification process. On the other hand, an offline signature is usually captured by a scanner or any other type of imaging device, which produces two dimensional signature images. Signature verification has been a popular research topic for decades, and substantial efforts have been made on both offline and online signature verification.
Online verification systems generally perform better than their offline counterparts Munich2003 () due to the availability of complementary information such as stroke order, writing speed, pressure, etc. However, this improvement in performance comes at the cost of requiring special hardware for recording the pen-tip trajectory, which raises the system cost and reduces the number of real application scenarios. There are many cases where authenticating an offline signature is the only option, such as check transactions and document verification. Because of its broader application area, in this paper we focus on the more challenging task: automatic offline signature verification. Our objective is to propose a convolutional Siamese neural network model to discriminate between genuine signatures and skilled forgeries.
Offline signature verification can be addressed with (1) writer dependent and (2) writer independent approaches Bertolini2010 (). The writer independent scenario is preferable, because a functioning writer dependent system needs to be updated (retrained) with every new writer (signer). For a consumer based system, such as a bank, where new consumers can open accounts every day, this incurs a huge cost. In the writer independent case, by contrast, a generic system is built to model the discrepancy between genuine and forged signatures. Training a signature verification system under a writer independent scenario divides the available signers into train and test sets. For a particular signer, signatures are coupled as similar (genuine, genuine) or dissimilar (genuine, forged) pairs. From all the pairs of a single signer, an equal number of similar and dissimilar pairs is randomly selected to balance the number of instances. This procedure is applied to all the signers in the train and test sets to construct the training and test examples for the classifier.
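The signer-level split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_signers`, the `seed` argument and the toy signer IDs are all hypothetical. The point it shows is that the split is made over signers, so no writer contributes signatures to both sets.

```python
import random

def split_signers(signer_ids, n_train, seed=0):
    """Randomly split signer IDs into disjoint train and test lists.

    Hypothetical helper: splitting by signer (not by image) keeps every
    test writer unseen during training, as the writer independent
    scenario requires.
    """
    signer_ids = list(signer_ids)
    if not 0 < n_train < len(signer_ids):
        raise ValueError("n_train must be between 1 and len(signer_ids) - 1")
    rng = random.Random(seed)
    rng.shuffle(signer_ids)
    return sorted(signer_ids[:n_train]), sorted(signer_ids[n_train:])

# Toy example with 10 signers, 7 of them used for training.
train_signers, test_signers = split_signers(range(10), n_train=7)
```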
In this regard, a signature verifier can be efficiently modelled by a Siamese network, which consists of twin convolutional networks accepting two distinct signature images coming from pairs that are either similar or dissimilar. The constituent convolutional neural networks (CNNs) are then joined by a cost function at the top, which computes a distance metric between the highest level feature representations on each side of the network. The parameters of these twin networks are shared, which in turn guarantees that two extremely similar images cannot be mapped by their respective networks to very different locations in feature space, because each network computes the same function.
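The effect of weight sharing can be illustrated with a toy example. The sketch below replaces the CNN branch with a single linear layer followed by a ReLU, which is an assumption made purely for brevity; `embed`, `siamese_distance` and the array shapes are hypothetical names and sizes. Because both branches call the same function with the same weights, identical inputs land on the same point and nearly identical inputs land close together.

```python
import numpy as np

def embed(image, weights):
    """Toy embedding standing in for one CNN branch: flatten, linear map, ReLU."""
    return np.maximum(weights @ image.ravel(), 0.0)

def siamese_distance(image_a, image_b, weights):
    """Euclidean distance between the two branch outputs.

    Both branches use the SAME weights, so they compute the same function.
    """
    return float(np.linalg.norm(embed(image_a, weights) - embed(image_b, weights)))

rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 16))   # shared by both branches
image = rng.random((4, 4))
nearly_same = image + 1e-6           # tiny perturbation of the same image
other = rng.random((4, 4))           # unrelated image

d_identical = siamese_distance(image, image, weights)
d_close = siamese_distance(image, nearly_same, weights)
d_other = siamese_distance(image, other, weights)
```

With separate weights per branch there would be no such guarantee, since two different functions could send the same image to distant points.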
Different hand crafted features have been proposed for offline signature verification tasks. Many of them take into account the global signature image for feature extraction, such as block codes, wavelet and Fourier series, etc. Kalera2004 (). Some other methods consider the geometrical and topological characteristics of local attributes, such as position, tangent direction, blob structure, connected components and curvature Munich2003 (). Projection and contour based methods Dimauro1997 () are also quite popular for offline signature verification. Apart from the above mentioned methods, approaches based on direction profile Dimauro1997 (); Ferrer2005 (), surroundedness features Kumar2012 (), grid based methods Huang1997a (), methods based on geometrical moments Ramesh1999 (), and texture based features Pal2016 () have also become popular in signature verification. A few structural methods that consider the relations among local features have also been explored for the same task. Examples include graph matching Chen2006 () and the recently proposed compact correlated features Dutta2016 (). On the other hand, Siamese-like networks are very popular for different verification tasks, such as online signature verification Bromley1994 () and face verification Chopra2005 (); Schroff2015 (). Furthermore, they have also been used for one-shot image recognition Koch2015 (), as well as for sketch-based image retrieval Qi2016 (). Nevertheless, to the best of our knowledge, a convolutional Siamese network has never been used to date to model an offline signature verifier, which provides our main motivation.
The main contribution of this paper is the proposal of a convolutional Siamese network, named SigNet, for the offline signature verification problem. In contrast to other methods based on hand crafted features, it has the ability to model generic signature forgery techniques and many other related properties that envelop minute inconsistencies in signatures from the training data. In contrast to other one-shot image verification tasks, the problem with signatures is far more complex because of subtle variations in writing styles independent of scripts, which could also encapsulate some degree of forgery. Here we mine these ultra-fine distortions and create a generic model using SigNet.
The rest of the paper is organized as follows: In Section 2 we describe SigNet and its architecture. Section 3 presents our experimental validation and compares the proposed method with available state-of-the-art algorithms. Finally, in Section 4, we conclude the paper with a defined future direction.
2 SigNet: Siamese Network for Signature Verification
Batch training a neural network typically requires images of the same size, but the signature images we consider have different sizes, ranging from to . We therefore resize all the images to a fixed size using bilinear interpolation. Afterwards, we invert the images so that the background pixels have values. Furthermore, we normalize each image by dividing the pixel values by the standard deviation of the pixel values of the images in a dataset.
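The three preprocessing steps above (resize, invert, normalize) can be sketched in NumPy as follows. This is a hedged illustration under stated assumptions: the target size is left as a parameter because it is not given here, the images are assumed to be 8-bit grayscale with a light background so that inversion (`max_value - pixel`) sends the background to zero, and the helper names `resize_bilinear` and `preprocess` are hypothetical. The bilinear resize uses corner-aligned sampling, which may differ slightly from what a particular imaging library does.

```python
import numpy as np

def resize_bilinear(img, out_h, out_w):
    """Resize a 2-D array with bilinear interpolation (corner-aligned sampling)."""
    in_h, in_w = img.shape
    ys = np.linspace(0, in_h - 1, out_h)       # fractional source rows
    xs = np.linspace(0, in_w - 1, out_w)       # fractional source columns
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]                    # vertical blend weights
    wx = (xs - x0)[None, :]                    # horizontal blend weights
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

def preprocess(images, out_h, out_w, max_value=255.0):
    """Resize, invert, and divide by the dataset-wide pixel standard deviation."""
    resized = [resize_bilinear(np.asarray(im, dtype=float), out_h, out_w)
               for im in images]
    # Assumption: light background, so inversion maps background pixels to 0.
    inverted = np.stack([max_value - im for im in resized])
    std = inverted.std()                       # one value for the whole dataset
    return inverted / std if std > 0 else inverted

# Toy dataset: two images of different sizes with a few dark "ink" pixels.
first = np.full((6, 10), 255.0)
first[2, 3] = 0.0
second = np.full((3, 5), 255.0)
second[1, 1:4] = 0.0
demo = preprocess([first, second], out_h=4, out_w=8)
```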
2.2 CNN and Siamese Network
Deep Convolutional Neural Networks (CNNs) are multilayer neural networks consisting of several convolutional layers with different kernel sizes interleaved with pooling layers, which summarize and downsample the output of the convolutions before feeding it to the next layers. To introduce nonlinearity, rectified linear units are also used. In this work, we used different convolutional kernels with sizes starting with to . Generally, a differentiable loss function is chosen so that gradient descent can be applied and the network weights can be optimized. Given a differentiable loss function, the weights of the different layers are updated using back propagation. Because the optimization cannot be applied to all the training data at once when the training set is large, mini-batch optimization offers a fair alternative for optimizing the network.
A Siamese neural network is a class of network architectures that usually contains two identical subnetworks. The twin CNNs have the same configuration with the same parameters and shared weights. The parameter updating is mirrored across both subnetworks. This framework has been successfully used for dimensionality reduction in weakly supervised metric learning Chopra2005 () and for face verification Schroff2015 (). The subnetworks are joined by a loss function at the top, which computes a similarity metric involving the Euclidean distance between the feature representations on each side of the Siamese network. One loss function that is commonly used in Siamese networks is the contrastive loss Chopra2005 (), defined as follows:
where and are two samples (here signature images), is a binary indicator function denoting whether the two samples belong to the same class or not, and are two constants and is the margin equal to in our case. is the Euclidean distance computed in the embedded feature space, is an embedding function that maps a signature image to a real vector space through the CNN, and , are the learned weights for a particular layer of the underlying network. Unlike conventional approaches that assign binary similarity labels to pairs, a Siamese network aims to bring the output feature vectors closer for input pairs that are labelled as similar, and to push the feature vectors apart if the input pairs are dissimilar. Each branch of the Siamese network can be seen as a function that embeds the input image into a space. Due to the loss function selected (Eqn. 1), this space will have the property that images of the same class (genuine signatures of a given writer) are closer to each other than images of different classes (forgeries or signatures of different writers). Both branches are joined together by a layer that computes the Euclidean distance between the two points in the embedded space. Then, in order to decide whether two images belong to the similar class (genuine, genuine) or the dissimilar class (genuine, forged), one needs to determine a threshold value on the distance.
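As an illustration of the loss and of the threshold decision described above, the sketch below implements the common form of the contrastive loss from Chopra2005 (): a squared-distance term that pulls similar pairs together and a squared hinge term that pushes dissimilar pairs apart until they are at least a margin away. The argument names (`same`, `alpha`, `beta`, `margin`), the label convention (1 for a similar pair) and the default values are assumptions chosen for the example; they should not be read as the constants used in the experiments.

```python
import numpy as np

def contrastive_loss(distance, same, margin=1.0, alpha=1.0, beta=1.0):
    """Contrastive loss for one pair or an array of pairs.

    distance: Euclidean distance between the two embeddings.
    same:     1 for a similar pair, 0 for a dissimilar pair (assumed convention).
    margin, alpha, beta: placeholder values, not taken from the paper.
    """
    distance = np.asarray(distance, dtype=float)
    same = np.asarray(same, dtype=float)
    pull = alpha * same * distance ** 2                       # similar: shrink distance
    push = beta * (1.0 - same) * np.maximum(0.0, margin - distance) ** 2
    return pull + push                                        # dissimilar: enforce margin

def is_similar_pair(distance, threshold):
    """Decision rule: a pair is accepted as (genuine, genuine) below the threshold."""
    return np.asarray(distance) <= threshold
```

A dissimilar pair that is already farther apart than the margin contributes zero loss, so training effort concentrates on forgeries that still look close to the genuine signature in the embedded space.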
We have used a CNN architecture that is inspired by the one Krizhevsky et al. Krizhevsky2012 () proposed for an image recognition problem. For easy reproducibility of our results, we present a full list of the parameters used to design the CNN layers in Table 1. For convolution and pooling layers, we list the size of the filters as , where is the number of filters, is the height and is the width of the corresponding filter. Here, stride signifies the distance between successive applications of the filters for the convolution and pooling operations, and pad indicates the width of the borders added to the input. Note that padding is necessary in order to convolve the filter from the very first pixel of the input image. Throughout the network, we use Rectified Linear Units (ReLU) as the activation function on the output of all the convolutional and fully connected layers. For generalizing the learned features, Local Response Normalization is applied according to Krizhevsky2012 (), with the parameters shown in the corresponding row in Table 1. With the last two pooling layers and the first fully connected layer, we use Dropout with a rate equal to and , respectively.
|Local Response Norm.||-|
|Local Response Norm.||-|
|Pooling + Dropout|
|Pooling + Dropout|
|Fully Connected + Dropout|
The first convolutional layer filters the input signature image with kernels of size with a stride of pixels. The second convolutional layer takes as input the (response-normalized and pooled) output of the first convolutional layer and filters it with kernels of size . The third and fourth convolutional layers are connected to one another without any intervening pooling or normalization layers. The third layer has kernels of size connected to the (normalized, pooled, and dropout) output of the second convolutional layer. The fourth convolutional layer has kernels of size . As a result, the network learns fewer low level features over small receptive fields and more high level, abstract features. The first fully connected layer has neurons, whereas the second fully connected layer has neurons. This indicates that the highest learned feature vector from each side of SigNet has a dimension equal to .
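The role of the stride and pad parameters listed for each layer can be made concrete with the usual output-size formula for a convolution or pooling window along one spatial dimension. The helper below is a generic sketch; the function name and the sizes in the example are hypothetical and are not the values of Table 1.

```python
def conv_output_size(in_size, kernel, stride=1, pad=0):
    """Output length along one dimension for a convolution or pooling window.

    Uses floor((in_size + 2 * pad - kernel) / stride) + 1.
    """
    if in_size + 2 * pad < kernel:
        raise ValueError("kernel is larger than the padded input")
    return (in_size + 2 * pad - kernel) // stride + 1

# Hypothetical sizes, chosen only to illustrate the formula.
same_size = conv_output_size(32, kernel=5, stride=1, pad=2)   # padding preserves the size
downsampled = conv_output_size(32, kernel=3, stride=2, pad=0) # stride 2 roughly halves it
```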
We initialize the weights of the model according to the work of Glorot and Bengio Glorot2010 (), and the biases equal to . We trained the model using RMSprop for epochs, using a momentum rate equal to , and a mini batch size equal to . We started with an initial learning rate (LR) equal to with hyper parameters and . All these values are shown in Table 2. Our entire framework is implemented using the Keras library with TensorFlow as the backend. The training was done using a GeForce GTX and a TITAN X Pascal GPU, and it took approximately to hours to run, depending on the database.
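For readers unfamiliar with RMSprop, the basic update rule can be sketched in a few lines of NumPy. This is the plain, textbook form of the rule and omits the momentum term mentioned above. The decay factor `rho` and the stabilizer `eps` are placeholder values, since the hyper parameters are not legible here; only the initial learning rate of 1e-4 is taken from Table 2. The quadratic toy objective is likewise just for demonstration.

```python
import numpy as np

def rmsprop_step(w, grad, cache, lr=1e-4, rho=0.9, eps=1e-8):
    """One RMSprop update (no momentum).

    cache holds a running average of squared gradients; each weight is
    updated with a step scaled by the inverse root of that average.
    rho and eps are placeholder values, not the paper's settings.
    """
    cache = rho * cache + (1.0 - rho) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

# Toy demonstration: minimize f(w) = w^2 starting from w = 1.
w, cache = np.array([1.0]), np.zeros(1)
for _ in range(300):
    grad = 2.0 * w                      # derivative of w^2
    w, cache = rmsprop_step(w, grad, cache, lr=0.01)
```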
|Initial Learning Rate (LR)||1e-4|
|Learning Rate Schedule|
Figure 2 shows five different filter activations in the last convolutional layer on a pair of (genuine, forged) signatures, which have received comparatively higher discrimination. The first row corresponds to the genuine signature image, whereas the second row corresponds to the forged one; these two signatures are correctly classified as dissimilar by SigNet. Each column, starting from the second one, shows the activations under the convolution of the same filter. The responses in the respective zones show the areas or signature features that are learned by the network for distinguishing these two signatures.
In order to evaluate our signature verification algorithm, we have considered four widely used benchmark databases, viz., (1) CEDAR, (2) GPDS300, (3) GPDS Synthetic Signature Database, and (4) BHSig260 signature corpus. The source code of SigNet will be available once the paper gets accepted for publication.
The CEDAR signature database (available at http://www.cedar.buffalo.edu/NIJ/data/signatures.rar) contains signatures of signers belonging to various cultural and professional backgrounds. Each of these signers signed genuine signatures minutes apart. Each of the forgers tried to emulate the signatures of persons, times each, to produce forged signatures for each of the genuine signers. Hence the dataset comprises genuine signatures as well as forged signatures. The signature images in this dataset are available in gray scale mode.
The GPDS300 signature corpus (available at http://www.gpds.ulpgc.es/download) comprises genuine and forged signatures for persons. This sums up to genuine signatures and forged signatures. The genuine signatures of each of the signers were collected in a single day. The genuine signatures shown to each forger are chosen randomly from the genuine ones to be imitated. All the signatures in this database are available in binary form.
3.1.3 GPDS Synthetic
The BHSig260 signature dataset (available at https://goo.gl/9QfByd) contains the signatures of persons, among them were signed in Bengali and are signed in Hindi Pal2016 (). The authors have followed the same protocol as in GPDS300 to generate these signatures. Here also, for each of the signers, genuine and forged signatures are available. This results in genuine and forged signatures in Bengali, and genuine and forged signatures in Hindi. Even though this dataset is distributed as a whole, we experimented with our method separately on the Bengali and Hindi subsets.
3.2 Performance Evaluation
A threshold is used on the distance measure output by SigNet to decide whether a signature pair belongs to the similar or the dissimilar class. We denote the signature pairs with the same identity as , whereas all pairs of different identities as . Then, we can define the set of all true positives (TP) at as
Similarly the set of all true negatives (TN) at can be defined as
The true positive rate and the true negative rate for a given signature distance are then defined as
where is the number of similar signature pairs. The final accuracy is computed as
which is the maximum accuracy obtained by varying from the minimum distance value to the maximum distance value of with step equal to .
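The evaluation procedure above can be sketched as a sweep over distance thresholds. Two points in the sketch are assumptions, because the corresponding formulas are not legible here: the accuracy at a threshold is taken to be the mean of the true positive rate and the true negative rate, and the default step size is a placeholder. The function name `best_accuracy` and the toy distances are hypothetical.

```python
import numpy as np

def best_accuracy(distances, is_similar, step=0.01):
    """Maximum accuracy over thresholds swept from min to max distance.

    A similar pair counts as a true positive when its distance is <= d,
    a dissimilar pair as a true negative when its distance is > d.
    Accuracy is ASSUMED to be the mean of TPR and TNR.
    """
    distances = np.asarray(distances, dtype=float)
    is_similar = np.asarray(is_similar, dtype=bool)
    n_same = is_similar.sum()
    n_diff = (~is_similar).sum()
    if n_same == 0 or n_diff == 0:
        raise ValueError("need at least one similar and one dissimilar pair")
    best_acc, best_d = 0.0, None
    for d in np.arange(distances.min(), distances.max() + step, step):
        tpr = np.sum(is_similar & (distances <= d)) / n_same
        tnr = np.sum(~is_similar & (distances > d)) / n_diff
        acc = 0.5 * (tpr + tnr)
        if acc > best_acc:                 # keep the first threshold reaching the maximum
            best_acc, best_d = acc, d
    return best_acc, best_d

# Perfectly separable toy pairs: similar pairs are close, dissimilar pairs are far.
acc_sep, d_sep = best_accuracy([0.1, 0.2, 0.3, 0.8, 0.9, 1.0],
                               [True, True, True, False, False, False])
# Overlapping toy pairs: no threshold classifies every pair correctly.
acc_overlap, _ = best_accuracy([0.1, 0.6, 0.4, 0.9], [True, True, False, False])
```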
3.3 Experimental Protocol
Since our method is designed for writer independent signature verification, we divide each of the datasets as follows. We randomly select signers from the (where ) available signers of each of the datasets. We keep all the genuine and forged signatures of these signers for training and those of the remaining signers for testing. Since all the above mentioned datasets contain genuine signatures for each of the authors, there are only (genuine, genuine) signature pairs available for each author. Similarly, since most of the datasets contain (for CEDAR ) forged signatures for each signer, there are only (for CEDAR ) (genuine, forged) signature pairs that can be obtained for each author. For balancing the similar and dissimilar classes, we randomly choose only (genuine, forged) signature pairs from each of the writers. This protocol results in (genuine, genuine) as well as (genuine, forged) signature pairs for training and for testing. Table 3 shows the values of and for the different datasets that are considered in our experiments.
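The pair construction for a single signer can be sketched as follows. All (genuine, genuine) combinations and all (genuine, forged) combinations are enumerated, and the larger class is randomly subsampled to the size of the smaller one. The function name, the label convention (1 for similar, 0 for dissimilar) and the toy signature identifiers are hypothetical; the counts in the example are deliberately small and are not the dataset figures.

```python
import itertools
import random

def make_balanced_pairs(genuine, forged, seed=0):
    """Build balanced (a, b, label) pairs for one signer.

    label 1: (genuine, genuine) pair; label 0: (genuine, forged) pair.
    """
    rng = random.Random(seed)
    similar = list(itertools.combinations(genuine, 2))     # unordered genuine pairs
    dissimilar = list(itertools.product(genuine, forged))  # every genuine with every forgery
    n = min(len(similar), len(dissimilar))
    similar = rng.sample(similar, n)                       # subsample the larger class
    dissimilar = rng.sample(dissimilar, n)
    return [(a, b, 1) for a, b in similar] + [(a, b, 0) for a, b in dissimilar]

# Toy signer with 4 genuine and 3 forged signatures:
# 6 similar pairs exist, 12 dissimilar pairs exist, so 6 of each are kept.
pairs = make_balanced_pairs(["g0", "g1", "g2", "g3"], ["f0", "f1", "f2"])
```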
Although most of the existing datasets contain forged signatures, in real life scenarios there can be cases where getting training samples from forgers is difficult. A system trained with genuine-forged signature pairs will be inadequate to deal with such a setup. One way to deal with this type of situation is to use the genuine signatures of other signers as forged signatures (called unskilled forged signatures). To assess applicability in such scenarios, we have performed an experiment on the GPDS-300 dataset only, where the genuine signatures of other writers are used as unskilled forged signatures. However, during testing, we have used genuine-forged pairs of the same signers, i.e., we tested our system for its ability to distinguish between genuine and forged signatures of the same person.
|CEDAR Signature Database||Word Shape (GSC) (Kalera et al. Kalera2004 ())|
|Zernike moments (Chen and Srihari Chen2005 ())|
|Graph matching (Chen and Srihari Chen2006 ())|
|Surroundedness features (Kumar et al. Kumar2012 ())|
|Dutta et al. Dutta2016 ()|
|GPDS 300 Signature Corpus||Ferrer et al. Ferrer2005 ()|
|Vargas et al. Vargas2007 ()|
|Solar et al. Solar2008 ()|
|Kumar et al. Kumar2012 ()|
|Dutta et al. Dutta2016 ()|
|SigNet (unskilled forged)|
|GPDS Synthetic Signature Corpus||Dutta et al. Dutta2016 ()|
|Bengali||Pal et al. Pal2016 ()|
|Dutta et al. Dutta2016 ()|
|Hindi||Pal et al. Pal2016 ()|
|Dutta et al. Dutta2016 ()|
3.4 Results and Discussions
Table 4 shows the accuracies of our proposed SigNet together with other state-of-the-art methods on the different datasets discussed in Section 3.1. SigNet outperformed the state-of-the-art methods on three datasets, viz. GPDS Synthetic, Bengali, and CEDAR. A possible reason for the lower performance on GPDS300 is the smaller number of signature samples available for learning many different signature styles. On GPDS Synthetic, however, our proposed network outperformed the method proposed by Dutta et al. Dutta2016 (), possibly because there were plenty of training samples for learning the available signature styles. Moreover, it can be observed that our system trained on genuine-unskilled forged pairs is outperformed by the system trained on genuine-forged examples. This is intuitive, as identifying forgeries of a signature requires attention to minute details of one’s signature, which cannot be captured when unskilled forged signatures (i.e., genuine signatures of other signers) are used as training examples.
To get an idea of the generalization of the proposed network and the strength of the models learned on different datasets, we performed a second experiment with cross dataset settings. To do this, we trained a model on one of the above mentioned datasets at a time and tested it on all the other corpora. We repeated this process over all the datasets. The accuracies obtained by SigNet in the cross dataset settings are shown in Figure 3, where the datasets used for training are indicated in rows and the datasets used for testing are shown along columns. For all the datasets, the highest accuracy is obtained with a model trained on the same dataset. This implies that all the datasets have some distinctive features, despite the fact that the CEDAR, GPDS300 and GPDS Synthetic datasets contain signatures with nearly the same style (some handwritten letters with initials, etc.). This observation is easier to justify in the case of the BHSig260 dataset, because it contains signatures in Indic scripts, and these signatures generally look like normal text containing the full names of persons. Therefore, it is probable that the network models some script based features in this case. Furthermore, a system trained on a comparatively bigger and more diverse dataset is usually more robust than the others, which is the reason why better average accuracies are obtained by the models trained on GPDS Synthetic and GPDS300. These experiments indicate that a signature verification system can be created even in cases where training is not possible due to the dearth of sufficient data. In those situations, a robust pretrained model could be used with lightweight fine tuning on the available specific data. We also considered the situation where the forger, not knowing the real identity of the person, introduces his or her own signature or a scribble, which has more variation than a skilled forgery.
To evaluate this, we used the model trained on GPDS300 (trained with skilled forgeries) and tested it on signatures paired with a random forgery (i.e., a genuine signature of another person), giving an expected increase in performance with accuracy rate in GPDS300 dataset (keeping the rest of the experimental setup the same). This also suggests that a model trained to find subtle differences in signatures performs well when the variations in signatures are large.
In this paper, we have presented a framework based on a Siamese network for offline signature verification, which uses writer independent feature learning. Unlike its predecessors, this method does not rely on hand-crafted features; instead, it learns them from data in a writer independent scenario. Experiments conducted on the GPDS Synthetic dataset demonstrate that this is a step towards modelling a generic prototype for real forgeries based on synthetically generated data. Also, our experiments on cross domain datasets emphasize how well our architecture models forgeries across the different handwriting styles of signers and forgers with diverse backgrounds and scripts. Furthermore, SigNet has surpassed the state-of-the-art results on most of the benchmark signature datasets, which is encouraging for further research in this direction. Our future work in this line will focus on the development of a more enriched network model. Furthermore, other frameworks for the verification task will also be explored.
This work has been partially supported by the European Union’s research and innovation program under the Marie Skłodowska-Curie grant agreement No. 665919. The TITAN X Pascal GPU used for this research was donated by the NVIDIA Corporation.
- (1) R. Plamondon, S. Srihari, Online and off-line handwriting recognition: a comprehensive survey, IEEE TPAMI 22 (1) (2000) 63–84.
- (2) D. Impedovo, G. Pirlo, Automatic signature verification: The state of the art, IEEE TSMC 38 (5) (2008) 609–635.
- (3) M. E. Munich, P. Perona, Visual identification by signature tracking, IEEE TPAMI 25 (2) (2003) 200–217.
- (4) D. Bertolini, L. Oliveira, E. Justino, R. Sabourin, Reducing forgeries in writer-independent off-line signature verification through ensemble of classifiers, PR 43 (1) (2010) 387–396.
- (5) M. K. Kalera, S. N. Srihari, A. Xu, Offline signature verification and identification using distance statistics, IJPRAI 18 (7) (2004) 1339–1360.
- (6) G. Dimauro, S. Impedovo, G. Pirlo, A. Salzo, A multi-expert signature verification system for bankcheck processing, IJPRAI 11 (05) (1997) 827–844.
- (7) M. A. Ferrer, J. B. Alonso, C. M. Travieso, Offline geometric parameters for automatic signature verification using fixed-point arithmetic, IEEE TPAMI 27 (6) (2005) 993–997.
- (8) R. Kumar, J. Sharma, B. Chanda, Writer-independent off-line signature verification using surroundedness feature, PRL 33 (3) (2012) 301–308.
- (9) K. Huang, H. Yan, Off-line signature verification based on geometric feature extraction and neural network classification, PR 30 (1) (1997) 9–17.
- (10) V. Ramesh, M. N. Murty, Off-line signature verification using genetically optimized weighted features, PR 32 (2) (1999) 217–233.
- (11) S. Pal, A. Alaei, U. Pal, M. Blumenstein, Performance of an off-line signature verification method based on texture features on a large indic-script signature dataset, in: DAS, 2016, pp. 72–77.
- (12) S. Chen, S. Srihari, A new off-line signature verification method based on graph, in: ICPR, 2006, pp. 869–872.
- (13) A. Dutta, U. Pal, J. Lladós, Compact correlated features for writer independent signature verification, in: ICPR, 2016, pp. 3411–3416.
- (14) J. Bromley, I. Guyon, Y. LeCun, E. Säckinger, R. Shah, Signature verification using a "siamese" time delay neural network, in: NIPS, 1994, pp. 737–744.
- (15) S. Chopra, R. Hadsell, Y. LeCun, Learning a similarity metric discriminatively, with application to face verification, in: CVPR, 2005, pp. 539–546.
- (16) F. Schroff, D. Kalenichenko, J. Philbin, Facenet: A unified embedding for face recognition and clustering, in: CVPR, 2015, pp. 815–823.
- (17) G. Koch, R. Zemel, R. Salakhutdinov, Siamese neural networks for one-shot image recognition, in: ICML, 2015, pp. 1–8.
- (18) Y. Qi, Y. Z. Song, H. Zhang, J. Liu, Sketch-based image retrieval via siamese convolutional neural network, in: ICIP, 2016, pp. 2460–2464.
- (19) A. Krizhevsky, I. Sutskever, G. E. Hinton, Imagenet classification with deep convolutional neural networks, in: NIPS, 2012, pp. 1097–1105.
- (20) X. Glorot, Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, in: AISTATS, 2010, pp. 249–256.
- (21) M. A. Ferrer, M. Diaz-Cabrera, A. Morales, Synthetic off-line signature image generation, in: ICB, 2013, pp. 1–7.
- (22) S. Chen, S. Srihari, Use of exterior contours and shape features in off-line signature verification, in: ICDAR, 2005, pp. 1280–1284.
- (23) F. Vargas, M. Ferrer, C. Travieso, J. Alonso, Off-line handwritten signature gpds-960 corpus, in: ICDAR, Vol. 2, 2007, pp. 764–768.
- (24) J. Ruiz del Solar, C. Devia, P. Loncomilla, F. Concha, Offline signature verification using local interest points and descriptors, in: CIARP, 2008, pp. 22–29.