Fingerprint Spoof Detection: Temporal Analysis of Image Sequence
We utilize the dynamics involved in the imaging of a fingerprint on a touch-based fingerprint reader, such as perspiration, changes in skin color (blanching), and skin distortion, to differentiate real fingers from spoof (fake) fingers. Specifically, we utilize a deep learning-based architecture (CNN-LSTM) trained end-to-end on sequences of minutiae-centered local patches extracted from ten color frames captured on a COTS fingerprint reader. A time-distributed CNN (MobileNet-v1) extracts spatial features from each local patch, while a bi-directional LSTM layer learns the temporal relationship between the patches in the sequence. Experimental results on a database of live frames from subjects ( unique fingers) and spoof frames of spoof materials (with 14 variants) show the superiority of the proposed approach in both known-material and cross-material (generalization) scenarios. For instance, the proposed approach improves the state-of-the-art cross-material performance from a TDR of 81.65% to 86.20% @ FDR = 0.2%.
Fingerprint recognition technology is now widely adopted across the globe for a plethora of applications, including international border crossing
|Study|Approach|Database|Performance|
|---|---|---|---|
|Parthasaradhi et al.|Temporal analysis of perspiration pattern along friction ridges|live frames from subjects, spoof frames from materials, and cadaver frames from fingers|Avg. classification accuracy = |
|Kolberg et al.|Blood flow detection using a sequence of Laser Speckle Contrast Images|live frames from subjects and spoof images of 8 spoof materials (32 variants)|TDR = @ FDR = |
|Plesh et al.|Fusion of static (LBP and CNN) and dynamic (changes in color ratio) features using a sequence of color frames|live and spoof images of 10 materials|TDR = (known-material) @ FDR = |
|Proposed Approach|Temporal analysis of minutiae-based local patch sequences from color frames using a CNN + LSTM model|live frames from subjects and spoof images of materials (14 variants)|TDR = 99.15% (known-material) and TDR = 86.20% (cross-material) @ FDR = 0.2%|
Fingerprint spoof attacks
With the goal of detecting such spoof attacks, various hardware- and software-based spoof detection approaches have been proposed in the literature. Hardware-based approaches typically utilize specialized sensors to detect signs of vitality (blood flow, heartbeat, etc.) and/or sensing technologies for sub-dermal imaging [14, 29, 7, 19]. Software-based approaches, on the other hand, extract salient cues related to anatomical (pores) and texture-based features from the captured fingerprint image(s). Chugh et al. utilized minutiae-based local patches to train deep neural networks that achieve state-of-the-art spoof detection performance. Gonzalez-Soler et al. proposed a fusion of feature encodings of dense-SIFT features for robust spoof detection.
Software-based approaches can be further classified into static and dynamic approaches based on the input. A static approach extracts discriminative spatial features from a single fingerprint image, while a dynamic approach utilizes an image sequence to extract spatial and/or temporal features for spoof detection. For a comprehensive review on the existing static approaches, readers are referred to [17, 9].
In the case of dynamic approaches, published studies utilize temporal analysis to capture physiological features such as perspiration [22, 16], blood flow [31, 14], skin distortion, and color change [31, 23]. Table 1 summarizes the dynamic approaches for fingerprint spoof detection reported in the literature. Limitations of these studies include long capture times (2-5 seconds), expensive hardware, and/or a small number of frames in the sequence. Moreover, some live fingers may not exhibit any of these dynamic phenomena that separate them from spoofs. For instance, some dry fingers may not exhibit signs of perspiration during the finger presentation, or a spoof may produce distortion characteristics similar to those of some live fingers.
We posit that automatic learning, as opposed to hand-engineering, of the dynamic features involved in the presentation of a finger can provide more robust and highly discriminating cues to distinguish live fingers from spoofs. In this study, we propose a CNN-LSTM architecture to learn the spatio-temporal features across the different frames in a sequence. We utilize a sequence of minutiae-centered local patches extracted from ten colored frames captured by a COTS fingerprint reader, the SilkID SLK20R.
The main contributions of this study are:
Utilized sequences of minutiae-based local patches to train a CNN-LSTM architecture with the goal of learning discriminative spatio-temporal features for fingerprint spoof detection. The local patches are extracted from a sequence of ten colored frames captured in quick succession (8 fps) using a COTS fingerprint reader, the SilkID SLK20R.
Experimental results on a dataset of live captures from subjects ( unique fingers) and spoof frames from 7 spoof materials (with 14 variants) show that the proposed approach improves the state-of-the-art cross-material performance from a TDR of 81.65% to 86.20% @ FDR = 0.2%.
2 Proposed Approach
The proposed approach consists of: (a) detecting minutiae in each of the frames and selecting the frame with the highest number of minutiae as the reference frame, (b) preprocessing the sequence of frames to convert them from Bayer-pattern grayscale images to RGB images, (c) extracting local patches () from all ten frames based on the locations of the minutiae detected in the reference frame, and (d) end-to-end training of a CNN-LSTM architecture using the sequences of minutiae-centered patches extracted from the ten frames. While a time-distributed CNN (MobileNet-v1) with shared weights extracts deep features from the local patches, a bidirectional LSTM layer is utilized to learn the temporal relationship between the features extracted from the sequence. An overview of the proposed approach is presented in Figure 4.
2.1 Minutia Detection
When a finger (or spoof) is presented to the SilkID SLK20R fingerprint reader, it captures a sequence of ten color frames at 8 frames per second.
2.2 Preprocessing
A digital sensor, containing a large array of photo-sensitive sites (pixels), is typically used in conjunction with a color filter array that permits only particular colors of light at each pixel. The SilkID fingerprint reader employs one of the most common filter arrays, called the Bayer filter array, consisting of alternating rows of red-green (RG) and green-blue (GB) filters. Bayer demosaicing (debayering) is the process of converting a Bayer-pattern image to an image with complete RGB color information at each pixel. It uses a bilinear interpolation technique to estimate the missing pixels in the three color planes, as shown in Figure 3. The original sequence of grayscale Bayer-pattern frames () is converted to the RGB colorspace using an OpenCV function, cv2.cvtColor(), with the parameter cv2.COLOR_BAYER_BG2RGB. After debayering, the frames have high pixel intensity values in the green channel (see Figure 4) because SilkID readers are calibrated with strong gains on green pixels to generate high-quality FTIR images. We utilize these raw images for our experiments. For visualization purposes, we reduce the green channel intensity values by a factor of and perform histogram equalization on the intensity values in the HSV colorspace.
2.3 Local Patch Extraction
For each of the minutiae detected in the reference frame, , we extract a sequence of ten local patches, , of size , from the ten frames, centered at the minutia location.
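A minimal NumPy sketch of this patch-sequence extraction is shown below, assuming the frames arrive as a (T, H, W, 3) array and minutiae as (x, y) pixel locations. The 96-pixel patch side is a placeholder, since the paper's exact patch size is elided above.

```python
import numpy as np

PATCH = 96  # patch side in pixels -- placeholder; the paper's exact size is elided

def extract_patch_sequences(frames, minutiae):
    """frames: (T, H, W, 3) uint8 sequence; minutiae: iterable of (x, y)
    locations detected in the reference frame. Returns one (T, PATCH, PATCH, 3)
    sequence per minutia, skipping minutiae too close to the border."""
    T, H, W, _ = frames.shape
    half = PATCH // 2
    sequences = []
    for x, y in minutiae:
        if half <= y < H - half and half <= x < W - half:
            # The same spatial window is cut from every frame in the sequence
            sequences.append(frames[:, y - half:y + half, x - half:x + half, :])
    return (np.stack(sequences) if sequences
            else np.empty((0, T, PATCH, PATCH, 3), dtype=frames.dtype))

frames = np.zeros((10, 480, 640, 3), dtype=np.uint8)
seqs = extract_patch_sequences(frames, [(100, 120), (5, 5), (320, 240)])
print(seqs.shape)  # (2, 10, 96, 96, 3) -- the border minutia (5, 5) is dropped
```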
|Spoof Material|Mold Type|# Presentations|# Frames|
|---|---|---|---|
|Ecoflex 00-35, flesh tone pigment|Dental| | |
|Ecoflex 00-50, flesh tone pigment|3D Printed| | |
|Ecoflex 00-50, tan pigment|3D Printed| | |
|Ballistic gelatin, flesh tone dye|3D Printed| | |
|Knox gelatin, clear|3D Printed| | |
|Third Degree Silicone| | | |
|Light flesh tone pigment|Dental| | |
|Beige suede powder|Dental| | |
|Medium flesh tone pigment|Dental| | |
|Crayola Model Magic| | | |
|Pigmented Dragon Skin (flesh tone)|Dental| | |
|Conductive Silicone|3D Printed| | |
|Unknown Spoof (JHU-APL)|3D Printed| | |
|Total Lives ( subjects)| |2,665|26,650|
2.4 Network Architecture
Several deep Convolutional Neural Network (CNN) architectures, such as VGG, Inception-v3, and MobileNet-v1, have been shown to achieve state-of-the-art performance on many vision-based tasks, including fingerprint spoof detection [20, 5]. Unlike traditional approaches, where spatial filters are hand-engineered, CNNs can automatically learn salient features from the given image databases. However, as CNNs are feed-forward networks, they are not well-suited to capture the temporal dynamics in a sequence of images. A Recurrent Neural Network (RNN) architecture with feedback connections, on the other hand, can process a sequence of data to learn temporal features.
With the goal of learning highly discriminative and generalizable spatio-temporal features for fingerprint spoof detection, we utilize a joint CNN-RNN architecture that extracts deep spatial features from each frame and learns the temporal relationship across the sequence. One of the most popular RNN architectures is Long Short-Term Memory (LSTM), which can learn long-range dependencies from input sequences. The proposed network architecture utilizes a time-distributed MobileNet-v1 CNN followed by a bi-directional LSTM layer.
MobileNet-v1 is a low-latency network with only M trainable parameters, compared to other networks that achieve comparable performance in large-scale vision tasks, such as Inception-v3 (M) and VGG (M). In low-resource environments, such as smartphones and embedded devices, MobileNet-v1 is thus well-suited for real-time spoof detection. Most importantly, it has been shown to achieve state-of-the-art performance for fingerprint spoof detection on publicly available datasets. It takes an input image of size and outputs a 1024-dimensional feature vector (bottleneck layer). We resize the local patches from to as required by the MobileNet-v1 input. To process a sequence of images, we use Keras' TimeDistributed wrapper to apply the MobileNet-v1 architecture as a feature extractor with shared parameters across the different frames (time steps) in the sequence.
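The time-distributed CNN plus bi-directional LSTM described above can be sketched in Keras roughly as follows; the LSTM width (256) and the two-way softmax head are illustrative assumptions, not values reported in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_lstm(seq_len=10, patch_size=224, lstm_units=256):
    """Time-distributed MobileNet-v1 feature extractor (shared weights across
    frames) followed by a bidirectional LSTM over the frame features."""
    # MobileNet-v1 backbone; pooling='avg' yields the 1024-d bottleneck vector
    backbone = tf.keras.applications.MobileNet(
        input_shape=(patch_size, patch_size, 3),
        include_top=False, weights=None, pooling="avg")

    inputs = layers.Input(shape=(seq_len, patch_size, patch_size, 3))
    # Apply the same CNN (shared parameters) to every frame in the sequence
    features = layers.TimeDistributed(backbone)(inputs)       # (batch, 10, 1024)
    temporal = layers.Bidirectional(layers.LSTM(lstm_units))(features)
    outputs = layers.Dense(2, activation="softmax")(temporal)  # live vs. spoof
    return models.Model(inputs, outputs)

model = build_cnn_lstm()
print(model.output_shape)  # (None, 2)
```

Because the CNN weights are shared across time steps, the parameter count stays close to that of a single MobileNet-v1 regardless of sequence length.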
|Approach|Architecture|TDR (%) (± s.d.) @ FDR = 0.2%|
|---|---|---|
|Still (Whole Image)|CNN|96.90 ± 0.78|
|Still (Minutiae Patches)|CNN|99.11 ± 0.24|
|Sequence (Whole Frames)|CNN-LSTM|98.93 ± 0.44|
|Sequence (Minutiae Patches)|CNN-LSTM|99.25 ± 0.22|
|Unknown Material|Whole Image (Grayscale)|Fingerprint Spoof Buster|Sequence of Whole Images|Sequence of Minutiae-based Patches|
|---|---|---|---|---|
|Mean ± s.d.|57.31 ± 17.71| | |86.20 ± 4.48|
2.5 Implementation Details
The network architecture is designed in the Keras framework.
3 Experimental Results
In this study, we utilize a large-scale fingerprint database of live frames from subjects, and spoof frames of materials (14 variants), collected on the SilkID SLK20R fingerprint reader. This database was constructed by combining fingerprint images collected from two sources. First, as part of the IARPA ODIN program, a large-scale Government Controlled Test (GCT-3) was conducted at the Johns Hopkins University Applied Physics Laboratory (JHU-APL) facility in Nov. 2019, where a total of subjects with diverse demographics (in terms of age, profession, gender, and race) were recruited to present their real (live) as well as spoof biometric data (fingerprint, face, and iris). The spoof fingerprints were fabricated using 5 different spoof materials (11 variants) and a variety of fabrication techniques, including the use of dental and 3D printed molds. For a balanced live and spoof data distribution, we utilize only the right thumb and right index fingerprint images for the live data. Second, we collected spoof data in a lab setting.
To demonstrate the robustness of our proposed approach, we evaluate it under two different settings, namely Known-Material and Cross-Material scenarios.
In this scenario, the same set of spoof materials is included in the train and test sets. To evaluate this, we utilize five-fold cross-validation, splitting the live and spoof datasets into 80/20 train/test splits with no subject overlap. In each of the five folds, there are live and spoof frames in training and the rest in testing. Table 3 presents the results achieved by the proposed approach on known materials compared to a state-of-the-art approach that utilizes minutiae-based local patches from static grayscale images. The proposed approach improves the spoof detection performance from a TDR of 99.11% to 99.25% @ FDR = 0.2%.
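The TDR @ FDR = 0.2% operating point reported throughout these tables can be computed from classifier scores roughly as follows. The score distributions here are synthetic, and the higher-is-more-spoof-like score convention is an assumption for illustration.

```python
import numpy as np

def tdr_at_fdr(live_scores, spoof_scores, fdr=0.002):
    """Pick the threshold at which at most `fdr` of live presentations are
    falsely flagged as spoof, then report the fraction of spoof presentations
    correctly detected (TDR) at that threshold."""
    # Threshold = (1 - fdr) quantile of the live score distribution
    threshold = np.quantile(np.asarray(live_scores), 1.0 - fdr)
    return float(np.mean(np.asarray(spoof_scores) > threshold))

rng = np.random.default_rng(0)
live = rng.normal(0.1, 0.05, 5000)   # hypothetical live spoofness scores
spoof = rng.normal(0.9, 0.05, 5000)  # hypothetical spoof spoofness scores
print(tdr_at_fdr(live, spoof))       # close to 1.0 for well-separated scores
```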
In this scenario, the spoof materials used in the test set are unknown during training. We simulate this scenario by adopting a leave-one-out protocol, where one material (including all its variants) is removed from training and is used to evaluate the trained model. This is a more challenging and practical setting, as it evaluates the generalizability of a spoof detector against spoofs never seen during training. For instance, in one of the cross-material experiments, we exclude all of the Third Degree spoofs (pigmented, tan, beige powder, and medium) from training and use them for testing. The live data is randomly divided into an 80/20 split, with no subject overlap, for training and testing, respectively. The proposed approach improves the cross-material spoof detection performance from a TDR of 81.65% to 86.20% @ FDR = 0.2%. Table 4 presents the spoof detection performance achieved by the proposed approach, on three cross-material experiments, compared to a state-of-the-art approach.
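The leave-one-material-out protocol above can be sketched as follows; the material names and sample indices are illustrative placeholders, not the actual database contents.

```python
# Hypothetical material -> sample-index mapping (each entry stands for one
# spoof material including all of its variants).
materials = {
    "ecoflex": [0, 1, 2],
    "gelatin": [3, 4],
    "third_degree": [5, 6, 7],
}

def leave_one_material_out(materials):
    """Yield (held_out_material, train_indices, test_indices): each material,
    including all its variants, is removed from training and used for test."""
    for held_out, test_idx in materials.items():
        train_idx = [i for name, idx in materials.items()
                     if name != held_out for i in idx]
        yield held_out, train_idx, test_idx

for name, train, test in leave_one_material_out(materials):
    print(name, len(train), len(test))
```

One model is trained per split, and the mean and standard deviation of TDR across splits summarize cross-material generalization.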
3.3 Processing Times
The proposed network architecture takes around hours to converge when trained with sequences of whole frames, and hours with sequences of minutiae-based local patches, on an Nvidia GTX 1080 Ti GPU. An average of 11 (13) sequences of minutiae-based local patches are extracted from the live (spoof) frames. The average classification time for a single presentation, including preprocessing, minutiae detection, patch extraction, sequence generation, and inference, on an Nvidia GTX 1080 Ti GPU is ms for full-frame sequences and ms for minutiae-based patch sequences.
A robust and generalizable spoof detector is pivotal to the security and privacy of fingerprint recognition systems against unknown spoof attacks. In this study, we utilized a sequence of local patches, centered at detected minutiae, from ten color frames captured at 8 fps as the finger is presented to the sensor. We posit that the dynamics involved in the presentation of a finger, such as skin blanching, distortion, and perspiration, provide discriminating cues to distinguish live fingers from spoofs. We utilize a jointly learned CNN-LSTM model to learn the spatio-temporal dynamics across the different frames in the sequence. The proposed approach improves the spoof detection performance from a TDR of 99.11% to 99.25% @ FDR = 0.2% in known-material scenarios, and from a TDR of 81.65% to 86.20% @ FDR = 0.2% in cross-material scenarios. In the future, we will explore the use of live sequences to learn a one-class classifier for generalized fingerprint spoof detection.
This research is based upon work supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via IARPA R&D Contract No. 2017 - 17020200004. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein.
- Fingerprint spoofs are one of the most common forms of presentation attacks (PA). The ISO/IEC 30107-1:2016(E) standard defines presentation attacks as the "presentation to the biometric data capture subsystem with the goal of interfering with the operation of the biometric system". Other forms of PAs include the use of altered fingers and cadavers.
- It takes an average of 1.25 seconds to capture a sequence of ten frames.
- Reducing gain in green channel and histogram equalization achieved similar or lower performance compared to using raw color images. Therefore, raw images were used for all experiments.
- Minutiae coordinates extracted from the resized ppi frames are doubled to correspond to minutiae coordinates in the original ppi frames.
- Experiments with a uni-directional LSTM layer achieved lower or similar performance compared to the bi-directional layer.
- This database will be made accessible to interested researchers after signing a license agreement.
- (2006) Fake finger detection by skin distortion analysis. IEEE Transactions on Information Forensics and Security 1 (3), pp. 360–373. Cited by: §1.
- (2008) Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly Media, Inc. Cited by: §2.2.
- (2016) Hacking mobile phones using 2D Printed Fingerprints. Note: MSU Tech. report, MSU-CSE-16-2. https://www.youtube.com/watch?v=fZJI_BrMZXU Cited by: §1.
- (2019) End-to-end latent fingerprint search. IEEE Transactions on Information Forensics and Security 15, pp. 880–894. Cited by: §2.1.
- (2017) Fingerprint Spoof Detection using Minutiae-based Local Patches. In IEEE International Joint Conference on Biometrics (IJCB), Cited by: §1, §2.3, §2.4.
- (2018) Fingerprint Spoof Buster: Use of Minutiae-centered Patches. IEEE Transactions on Information Forensics and Security 13 (9), pp. 2190–2202. Cited by: §1, §2.4, Table 3, Table 4, §3.2.1.
- (2019) OCT Fingerprints: Resilience to Presentation Attacks. arXiv preprint arXiv:1908.00102. Cited by: §1.
- (2018) Universal 3D wearable Fingerprint Targets: Advancing Fingerprint Reader Evaluations. IEEE Transactions on Information Forensics and Security 13 (6), pp. 1564–1578. Cited by: §1.
- (2017) Review of the Fingerprint Liveness Detection (LivDet) competition series: 2009 to 2015. Image and Vision Computing 58, pp. 110–128. Cited by: §1, §2.4.
- (2019) Fingerprint Presentation Attack Detection Based on Local Features Encoding for Unknown Attacks. arXiv preprint arXiv:1908.10163. Cited by: §1.
- (1997) Long short-term memory. Neural computation 9 (8), pp. 1735–1780. Cited by: §2.4.
- (2017) Mobilenets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv preprint arXiv:1704.04861. Cited by: §2.4.
- (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §2.5.
- (2019) Multi-algorithm benchmark for fingerprint presentation attack detection with laser speckle contrast imaging. In IEEE International Conference of the Biometrics Special Interest Group (BIOSIG), pp. 1–5. Cited by: Table 1, §1, §1.
- (2008) Image demosaicing: a systematic survey. In Visual Communications and Image Processing, Vol. 6822. Cited by: §2.2.
- (2012) Combining perspiration-and morphology-based static features for fingerprint liveness detection. Pattern Recognition Letters 33 (9). Cited by: §1.
- S. Marcel, M. S. Nixon, J. Fierrez and N. Evans (Eds.) (2019) Handbook of Biometric Anti-Spoofing: Presentation Attack Detection. 2nd edition, Springer. Cited by: §1, §1, §1.
- (2002) Impact of artificial gummy fingers on fingerprint systems. In Proc. SPIE, Vol. 4677, pp. 275–289. Cited by: §1.
- (2019) Optical coherence tomography for fingerprint presentation attack detection. In Handbook of Biometric Anti-Spoofing, Cited by: §1.
- (2016) Fingerprint Liveness Detection Using Convolutional Neural Networks. IEEE Transactions on Information Forensics and Security 11 (6), pp. 1206–1213. Cited by: §2.4.
- (2016) IARPA-BAA-16-04 (Thor). Note: https://www.iarpa.gov/index.php/research-programs/odin/odin-baa Cited by: §3.1.
- (2005) Time-series Detection of Perspiration as a Liveness Test in Fingerprint Devices. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 35 (3), pp. 335–343. Cited by: Table 1, §1.
- (2019) Fingerprint Presentation Attack Detection utilizing Time-Series, Color Fingerprint Captures. In IEEE International Conference on Biometrics (ICB), Cited by: Table 1, §1.
- (2015) Imagenet large scale visual recognition challenge. Proc. International Journal of Computer Vision (IJCV) 115 (3), pp. 211–252. Cited by: §2.4.
- (2017-November 14) Fingerprint Pore Analysis for Liveness Detection. Google Patents. Note: US Patent 9,818,020 Cited by: §1.
- (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. Cited by: §2.4.
- (2016) Rethinking the Inception Architecture for Computer Vision. In Proc. IEEE CVPR, pp. 2818–2826. Cited by: §2.4.
- (2000) Image interpolation and resampling. Handbook of medical imaging, processing and analysis 1 (1), pp. 393–420. Cited by: §2.2.
- (2019) Biometric Presentation Attack Detection: Beyond the Visible Spectrum. IEEE Transactions on Information Forensics and Security. Cited by: §1.
- (2018) A Novel Weber Local Binary Descriptor for Fingerprint Liveness Detection. IEEE Transactions on Systems, Man, and Cybernetics: Systems. Cited by: §1.
- (2007) Fake finger detection by finger color change analysis. In International Conference on Biometrics, pp. 888–896. Cited by: §1.