Specular- and Diffuse-reflection-based Face Liveness Detection for Mobile Devices

Akinori F. Ebihara                   Kazuyuki Sakurai                   Hitoshi Imaoka
NEC Biometrics Research Laboratories, Japan
a-ebihara@ct.jp.nec.com
Abstract

In light of the rising demand for biometric-authentication systems, preventing face spoofing attacks is a critical issue for the safe deployment of face recognition systems. Here, we propose an efficient liveness detection algorithm that requires minimal hardware and only a small database, making it suitable for resource-constrained devices such as mobile phones. Utilizing one monocular visible-light camera, the proposed algorithm takes two facial photos, one taken with a flash, the other without a flash. The proposed descriptor is constructed by leveraging two types of reflection: (i) specular reflections from the iris region, which have a specific intensity distribution depending on liveness, and (ii) diffuse reflections from the entire face region, which represent the 3D structure of a subject’s face. Classifiers trained with the proposed descriptor outperform other flash-based liveness detection algorithms on both an in-house database and the publicly available NUAA and Replay-Attack databases. Moreover, the proposed algorithm achieves accuracy comparable to that of an end-to-end deep neural network classifier while running approximately ten times faster.

1 Introduction

Figure 1: Construction of the proposed descriptor. (a) Two photos, one taken with a flash (top), the other without a flash (bottom), are shot within milliseconds. (b) Face and eye regions are extracted from the two photos and resized, and the Speculum Descriptor (one for each eye) and the Diffusion Descriptor are calculated from these regions. Two example results obtained with real (left) and fake (right) faces are shown. Note that the actual region used to calculate the Speculum Descriptor is a square region inside the iris, centered on the pupil. (c) The two descriptors are vectorized and concatenated to build the combined descriptor, which is then classified by a standard classifier such as a support vector machine (SVM, [1]) or a neural network.

A biometric authentication system has an advantage over a traditional password-based authentication system: it uses intrinsic features such as a face or fingerprint, so the user does not have to remember anything to be authenticated. Among the various biometric authentication systems, face-recognition-based ones take advantage of the huge variety of facial features across individuals, and thus have the potential to offer convenience and high security. Face authentication, however, has a major drawback common to other forms of biometric authentication: a nonzero probability of false rejection and false acceptance. While false rejection is less problematic, because a genuine user can usually make a second attempt to be authorized, false acceptance entails a higher security risk. When a false acceptance occurs, the system may actually be under an attack by a malicious imposter attempting to break into it. Acquiring facial images via social networks is now easier than ever, allowing attackers to execute a variety of attacks using printed photos or recorded video. The demand for technologies for detecting face spoofing (i.e., face liveness detection) is thus rising in an effort to ensure the security of sites deploying face recognition systems. Face recognition systems are being used at, for example, airports and office entrances and as login systems of edge devices. Each site has its own hardware availability; i.e., it may have access to a server that can perform computationally expensive calculations, or it may be equipped with infrared imaging devices. On the other hand, it may only have access to a low-performance CPU. It is thus natural that the suitable face liveness detection algorithm will differ according to the hardware availability. The advent of deep-learning technologies has allowed high-precision image processing that competes with human abilities at the expense of high computational cost. 
On the other hand, there is still a need for an efficient liveness detection algorithm that works with minimal computational resources. In this study, we focus on this case: liveness detection on a mobile phone equipped only with a CPU, without access to external servers.
In line with the goal of developing liveness detection technology with minimal hardware requirements, we decided to use one visible-light camera mounted on the front of the mobile device, and have devised an efficient, novel flash-reflection-based face liveness detection algorithm.

Figure 2: The proposed Speculum and Diffusion Descriptors, calculated and averaged across the real-face class (blue) and the fake-face class (red). The ordinates have arbitrary units (A.U.). (a) Vectorized Speculum Descriptor, calculated from iris regions. (b) Vectorized Diffusion Descriptor, calculated from face regions.

The algorithm leverages specular and diffuse reflections from the iris regions and the facial surface, respectively. An all-white, bright screen is used as a flash simulator, and two facial photos, taken with and without a flash, are used to calculate the Speculum Descriptor and the Diffusion Descriptor. Both descriptors are based on the difference between the two facial photos and are normalized by the luminance intensities such that the descriptor magnitude is bounded in the range $[-1, 1]$, thereby facilitating the training of classifiers and improving their classification accuracy (Fig.2).
Tested on a small in-house database of image pairs for each binary class (real or fake face), the proposed descriptor classified with a support vector machine (SVM) achieved the highest classification performance among the flash-based face liveness detection algorithms compared. Generalizability across different domains is verified by cross-database evaluation on the NUAA and Replay-Attack databases; all classifiers are trained on the in-house database and tested on the two public databases. The results confirm that the proposed algorithm not only outperforms other flash-based algorithms on the public databases but also achieves classification performance comparable to or even better than that of a computationally expensive, end-to-end deep neural network that is ten times slower.
The proposed algorithm enables efficient, user-friendly, and accurate liveness detection. Its contributions are summarized below:

  1. Minimal hardware requirements: a single visible-light camera and a flash light-emitting device.

  2. Minimal computational requirements: implementable on mobile devices.

  3. Minimal database requirements: trainable with only a small number of image pairs for each of the real and fake face classes.

  4. Minimal data label requirements: no auxiliary supervision such as depth or segmentation is needed.

  5. High detection accuracy, comparable to an end-to-end, deep neural network model, but with ten-times faster execution.

2 Related Work

The current liveness detection technologies aimed against spoofing attacks are summarized below. Face spoofing attacks can be subdivided into two major categories: 2D attacks and 3D attacks. The former includes print-attacks and video-replay attacks, while the latter includes 3D spoofing mask attacks. Several publicly available face liveness databases simulate these attacks. To name a few, the NUAA [2] and Print-Attack [3] databases simulate photo attacks. The Replay-Attack [4] and CASIA Face Anti-Spoofing [5] datasets simulate replay attacks in addition to photo attacks. The 3D Mask Attack Database[6] and HKBU-Mask Attack with Real World Variations [7] simulate 3D mask attacks. Example countermeasures to each attack type are summarized below.

2.1 Countermeasures to 2D attacks

Because of the reflectance of printed media and the use of photo compression, printed photos have surface textures or patterns that differ from those of a real human face, and these textures can be used to detect print attacks. Replay attacks are conducted by playing video on displays such as monitors or screens, which also have surface properties different from those of a real face. Here, local binary patterns (LBP) [4, 8, 9], Gaussian filtering [10, 11], and their variants can be used to detect 2D attacks.
Infrared imaging can be used to counter replay attacks, because the display emits light only at visible wavelengths (i.e., a face does not appear in an infrared picture taken of a display, whereas it appears in an image of an actual person [12]). Another replay-attack-specific surface property is the moiré pattern [13].
A prominent feature of these 2D attacks is the flat, 2D structure of the spoofing media. Here, stereo vision [14], depth measurement from defocusing [15], and flash-based 3D measurements [16, 17, 18, 19] are effective countermeasures that detect flatness as a proxy for 2D spoofing attacks. In this paper, we focus on using flash-based liveness detection to counter 2D attacks.
Some algorithms, including ours, construct descriptors from pictures taken with and without a flash. The following four are mono-colored-flash-based algorithms: (i) LBP_FI (LBP on the flash image), in which the LBP of a picture taken with a flash is used as a descriptor [16]; (ii) SD_FIC (the standard deviation of face intensity change), in which the standard deviation of the difference between photos of the same subject taken with and without a flash is used as a descriptor [16]; (iii) Face flashing, in which the descriptor is made from the relative reflectance between two different pixels in one photo taken with a flash, i.e., the reflectance of each facial pixel divided by that of a reference pixel (hereafter abbreviated as RelativeReflectance [17]); (iv) implicit 3D features, where pixel-wise differences between pictures taken with and without a flash are calculated and divided by the pixel intensity of the picture without the flash on a pixel-by-pixel basis [18]. We compare these algorithms with ours in the Results section.

2.2 Countermeasures to 3D mask attacks

The recent 3D reconstruction and printing technologies have given malicious users the ability to produce realistic spoofing masks[20]. One example countermeasure against such a 3D attack is multispectral imaging. Steiner et al.[21] have reported the effectiveness of short-wave infrared (SWIR) imaging for detecting masks. Another approach is remote photoplethysmography (rPPG), which calculates pulse rhythms from periodic changes in face color [22].
In this paper, however, we do not consider 3D attacks because they are less likely due to the high cost of producing 3D masks. Our work focuses on preventing photo attacks and replay attacks.

2.3 End-to-end deep neural networks

The advent of deep learning has allowed researchers to construct an end-to-end classifier without having to design an explicit descriptor. Research on face liveness is no exception; deep-neural-network-based countermeasures have been developed not only for photo attacks but also for replay and 3D mask attacks [23, 24, 25]. The Experiments section compares our algorithm’s performance with that of a deep-neural-network-based, end-to-end classifier.

3 Proposed algorithm

We propose a liveness detection algorithm that uses both specular and diffuse reflection of flash light. The iris regions and the facial surface are used to compute the Speculum Descriptor and the Diffusion Descriptor, respectively. The two descriptors are vectorized and concatenated to build the combined descriptor, which can be classified as a real or fake face by using a standard classifier such as an SVM or a neural network.
The procedure is as follows. During and after the flash illumination, two three-channel RGB photos are taken: one with a flash (the flash photo) and one under background light only (the background photo). Since not all front cameras of smartphones are equipped with a flash LED, all display pixels are simultaneously driven at the highest luminance intensity to simulate light from a flash. The flash duration is approximately 200 ms.
Before the Speculum and Diffusion Descriptors are calculated, the following common preprocessing functions are applied to both the flash photo and the background photo:

  • Face detection: detect the face location in each image.

  • Feature-point extraction: extract the locations of facial feature points from each face.

  • Cropping: crop the region of interest.

  • Gaussian filtering: apply a Gaussian filter to increase position invariance.

  • Resizing: resize the region of interest.

For face detection and facial feature extraction, the LBP-AdaBoost algorithm [26] and the supervised descent method [27] are used in combination. The parameters used for cropping, Gaussian filtration, and resizing differ according to whether the Speculum Descriptor or the Diffusion Descriptor is being calculated.
The positions of the faces detected in the two photos may differ. However, because the flash duration is short and Gaussian filtration is applied, cropping the faces with two different bounding boxes does not cause a major problem with face alignment.
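The cropping, Gaussian filtering, and resizing steps can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: face detection and feature-point extraction are assumed to have already produced a bounding box, and the function names, the 3-sigma kernel truncation, the edge padding, and the nearest-neighbor resizing are all illustrative assumptions.

```python
import numpy as np


def gaussian_kernel_1d(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma (assumption)."""
    radius = max(1, int(round(3.0 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def gaussian_filter(image: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur of a 2-D array, replicating edge pixels."""
    kernel = gaussian_kernel_1d(sigma)
    radius = len(kernel) // 2
    padded = np.pad(image.astype(np.float64), radius, mode="edge")
    # Filter along rows, then along columns; "valid" removes the padding.
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def resize_nearest(image: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbor resize (a simple stand-in for a real resampler)."""
    rows = (np.arange(out_h) * image.shape[0]) // out_h
    cols = (np.arange(out_w) * image.shape[1]) // out_w
    return image[np.ix_(rows, cols)]


def preprocess_roi(gray: np.ndarray, box: tuple, sigma: float, out_size: int) -> np.ndarray:
    """Crop box = (top, left, height, width), blur, and resize to a square."""
    top, left, height, width = box
    roi = gray[top:top + height, left:left + width]
    return resize_nearest(gaussian_filter(roi, sigma), out_size, out_size)
```

The same function would be applied to the flash photo and the background photo, each with its own bounding box, with different `sigma` and `out_size` values for the iris and face regions.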

3.1 Speculum Descriptor: specular-reflection-based descriptor

Figure 3: Specular reflections from iris regions. (a) Region of interest (ROI). A square ROI whose edge is one-third of the horizontal eye length (“len” in the panel) is used to crop the iris region. (b) Real iris pictures taken with or without a flash. (c) Fake iris pictures taken with or without a flash. Top row: a print attack using a picture that was taken without a flash. Bottom row: a print attack using a picture that was taken with a flash.

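Unlike a printed photo or an image shown on a display, the human iris shows specular reflection (due to its curved, glass-bead-like structure) when light is flashed in front of it. Thus, if the flash photo is of a real face, a white spot reflecting the flash appears in the iris, whereas in the background photo of a real face, no white spot appears (Fig.3b). On the other hand, if the two photos are of a fake face, a white spot appears in neither of them; if a photo of a flashed face is used as the spoof medium, the spot appears in both of them (Fig.3c). To utilize this difference as the Speculum Descriptor, the iris regions are extracted from the cropped face according to the facial feature points. The iris regions are defined as two square boxes centered on each eye, having an edge length that is one-third of the horizontal length of the eye (Fig.3a). A Gaussian kernel with a two-pixel standard deviation is applied to each of the regions, and they are then resized to a fixed size. Hereafter, the extracted and resized iris regions from both eyes are denoted as $I^{(f)}_{\mathrm{iris}}$ (flash photo) and $I^{(b)}_{\mathrm{iris}}$ (background photo). Pixel intensities at vertical location $i$, horizontal location $j$, and eye position $k$ are denoted as $I^{(f)}_{\mathrm{iris}}(i,j,k)$ and $I^{(b)}_{\mathrm{iris}}(i,j,k)$. An intermediate descriptor $\tilde{D}_{\mathrm{spec}}$ is calculated by pixel-wise subtraction of $I^{(b)}_{\mathrm{iris}}$ from $I^{(f)}_{\mathrm{iris}}$, which is then normalized by the sum of the luminance magnitudes, as follows:

\[
\tilde{D}_{\mathrm{spec}}(i,j,k) = \frac{I^{(f)}_{\mathrm{iris}}(i,j,k) - I^{(b)}_{\mathrm{iris}}(i,j,k)}{I^{(f)}_{\mathrm{iris}}(i,j,k) + I^{(b)}_{\mathrm{iris}}(i,j,k)} \tag{1}
\]

Because $I^{(f)}_{\mathrm{iris}}(i,j,k)$ and $I^{(b)}_{\mathrm{iris}}(i,j,k)$ are greater than or equal to zero, $\tilde{D}_{\mathrm{spec}}(i,j,k) \in [-1, 1]$.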
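One potential weakness of the Speculum Descriptor is its sensitivity to changes in the position of the reflected-light spot. Depending on the relative position of the subject’s face and the direction of the flash, the position of the white reflection spot inside the iris region changes. Although Gaussian filtering increases positional invariance, the variance of the spot position is much larger than the Gaussian-kernel width. Thus, to further increase positional invariance, the elements of the vectorized intermediate descriptor $\tilde{D}_{\mathrm{spec}}$ that originate from each eye $k$ are sorted in ascending order, and the sorted vectors of the two eyes are concatenated to obtain the Speculum Descriptor $D_{\mathrm{spec}}$ as follows:

\[
D_{\mathrm{spec}} = \Big[\, \operatorname{sort}\big(\operatorname{vec}\, \tilde{D}_{\mathrm{spec}}(\cdot,\cdot,k)\big) \,\Big]_{k \,\in\, \{\text{left},\ \text{right}\}} \tag{2}
\]

The steps for calculating the Speculum Descriptor are summarized in Algorithm 1.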

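Input: flash photo, background photo
Output: Speculum Descriptor
  detect the face location in each photo
  extract the facial feature points from each face
  crop the iris regions
  apply the Gaussian filter to each iris region
  resize each iris region
  for each pixel of each iris region do
    if (flash intensity + background intensity) is nonzero then
      intermediate descriptor ← (flash intensity − background intensity) / (flash intensity + background intensity)
    else
      intermediate descriptor ← 0
    end if
  end for
  vectorize the intermediate descriptor
  sort the elements originating from each eye in ascending order
  return the Speculum Descriptor
Algorithm 1 Calculation of the Speculum Descriptor.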
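As an illustration of the normalization and per-eye sorting described in this subsection, the following NumPy sketch computes a specular-reflection descriptor from iris regions that have already been preprocessed. The function names, the `(2, H, W)` array layout (one slice per eye), and the choice to output 0 where both intensities are zero are assumptions made for this example, not details confirmed by the text.

```python
import numpy as np


def normalized_difference(flash: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Pixel-wise (flash - background) / (flash + background), bounded in [-1, 1]."""
    flash = np.asarray(flash, dtype=np.float64)
    background = np.asarray(background, dtype=np.float64)
    total = flash + background
    out = np.zeros_like(total)
    # Assumption: pixels where both intensities are zero are set to 0.
    np.divide(flash - background, total, out=out, where=total > 0)
    return out


def speculum_descriptor(iris_flash: np.ndarray, iris_background: np.ndarray) -> np.ndarray:
    """Inputs have shape (2, H, W): preprocessed iris regions of the two eyes."""
    intermediate = normalized_difference(iris_flash, iris_background)
    per_eye = intermediate.reshape(intermediate.shape[0], -1)  # vectorize each eye
    # Sorting each eye's elements makes the result independent of spot position.
    return np.sort(per_eye, axis=1).reshape(-1)


# Toy example: the same bright spot at two different positions.
background = np.full((2, 4, 4), 10.0)
flash_a = background.copy()
flash_a[:, 0, 0] = 30.0
flash_b = background.copy()
flash_b[:, 3, 2] = 30.0
same = np.array_equal(speculum_descriptor(flash_a, background),
                      speculum_descriptor(flash_b, background))
print(same)  # True: sorting removes the dependence on spot position
```

Sorting discards where the spot is and keeps only the intensity distribution inside the iris region, which is the property the descriptor relies on.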

3.2 Diffusion Descriptor: diffuse-reflection-based descriptor

Although it has been confirmed that the Speculum Descriptor by itself can detect spoofing attacks, it has several pitfalls. First, if a real subject is wearing glasses, the lens surface reflects the flash. The false-negative rate increases when specular light originating from the glasses contaminates the specular light originating from the iris. Second, if a photo printed on glossy paper is bent and used for an attack, there is a slight chance that the flash will reflect at the iris region of the printed photo, leading to an increased false-negative rate. To compensate for this risk, we propose another liveness descriptor based on facial diffuse reflection, called the Diffusion Descriptor. The Diffusion Descriptor represents the surface structure of a face: real faces have 3D structures, whereas fake faces have flat, 2D surfaces. An intermediate descriptor $\tilde{D}_{\mathrm{diff}}$ is calculated from the pixel intensities in the face region (in a similar manner to equation 1) as follows:

\[
\tilde{D}_{\mathrm{diff}}(i,j) = \frac{I^{(f)}_{\mathrm{face}}(i,j) - I^{(b)}_{\mathrm{face}}(i,j)}{I^{(f)}_{\mathrm{face}}(i,j) + I^{(b)}_{\mathrm{face}}(i,j)} \tag{3}
\]

where $I^{(f)}_{\mathrm{face}}$ and $I^{(b)}_{\mathrm{face}}$ are the face regions in the flash photo and the background photo, respectively, cropped with rectangles circumscribing all facial feature points, filtered with a Gaussian kernel with a five-pixel standard deviation, and resized to a fixed size. Here, $\tilde{D}_{\mathrm{diff}}(i,j) \in [-1, 1]$ because $I^{(f)}_{\mathrm{face}}(i,j)$ and $I^{(b)}_{\mathrm{face}}(i,j)$ are greater than or equal to zero. Unlike the case of the Speculum Descriptor, the intermediate descriptor is vectorized without sorting to preserve the spatial integrity of the face region:

\[
D_{\mathrm{diff}} = \operatorname{vec}\, \tilde{D}_{\mathrm{diff}} \tag{4}
\]

In light of the Lambertian model, we can understand why the Diffusion Descriptor represents the 3D structure of a face. Moreover, the Lambertian model explains an additional advantage of the descriptor: color invariance (also see [18]). This can be seen as follows: assuming that the entire face is a Lambertian surface (i.e., a uniform diffuser), the surface-luminance intensity depends on the radiant intensity per unit area in the direction of observation. Thus, the pixel intensity $I(i,j)$ at vertical position $i$ and horizontal position $j$ can be described as:

\[
I(i,j) = L\, \rho(i,j) \cos\theta(i,j) \tag{5}
\]

where $L$, $\rho(i,j)$, and $\theta(i,j)$ denote the light-source intensity, surface reflectance coefficient, and angle of incidence, respectively. As equation 5 indicates, the luminance intensity depends on the 3D structure of the facial surface, which determines $\theta(i,j)$. Additionally, $I(i,j)$ depends on the surface reflectance $\rho(i,j)$. This means that differences in the color of the surface (e.g., light skin vs. dark skin) affect the observed luminance intensity even under the same light intensity, $L$. The design of equation 3 solves this color-dependency problem by canceling out the surface reflectance $\rho(i,j)$. Under the assumption of a Lambertian surface, the face-region intensities in the flash photo, $I^{(f)}_{\mathrm{face}}(i,j)$, and in the background photo, $I^{(b)}_{\mathrm{face}}(i,j)$, are expressed as:

\[
I^{(f)}_{\mathrm{face}}(i,j) = \rho(i,j)\big(L_f \cos\theta(i,j) + L_b\big), \qquad I^{(b)}_{\mathrm{face}}(i,j) = \rho(i,j)\, L_b \tag{6}
\]

where $L_f$ and $L_b$ are the intensities of the flash light and the background light (ambient light), respectively. Since ambient light coming from all directions is integrated, the background-light term does not depend on the incident angle of the light. Substituting $I^{(f)}_{\mathrm{face}}(i,j)$ and $I^{(b)}_{\mathrm{face}}(i,j)$ into equation 3 (the difference of the two intensities divided by their sum) yields the intermediate descriptor:

\[
\tilde{D}_{\mathrm{diff}}(i,j) = \frac{L_f \cos\theta(i,j)}{L_f \cos\theta(i,j) + 2 L_b} \tag{7}
\]

Equation 7 depends on $\theta(i,j)$ and thus represents the 3D structure of the facial surface. Yet equation 7 is independent of the surface reflectance $\rho(i,j)$, thereby avoiding the skin-color problem. Thus, although Lambertian reflections from the facial surface can be modeled as a function of the surface reflectance and the surface 3D structure, equation 3 cancels $\rho(i,j)$, conferring color invariance as an additional advantage of the Diffusion Descriptor. Algorithm 2 lists the steps for calculating the Diffusion Descriptor.

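Input: flash photo, background photo
Output: Diffusion Descriptor
  detect the face location in each photo
  extract the facial feature points from each face
  crop the face region
  apply the Gaussian filter to the face region
  resize the face region
  for each pixel of the face region do
    if (flash intensity + background intensity) is nonzero then
      intermediate descriptor ← (flash intensity − background intensity) / (flash intensity + background intensity)
    else
      intermediate descriptor ← 0
    end if
  end for
  vectorize the intermediate descriptor
  return the Diffusion Descriptor
Algorithm 2 Calculation of the Diffusion Descriptor.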
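The color-invariance argument can be checked numerically with a small NumPy sketch. It simulates a flash photo and a background photo under the Lambertian assumption (ambient light independent of the incidence angle) and computes the normalized difference without sorting. The function names, the toy geometry, and the intensity values are hypothetical and chosen only for illustration.

```python
import numpy as np


def diffusion_descriptor(face_flash: np.ndarray, face_background: np.ndarray) -> np.ndarray:
    """Normalized difference of preprocessed face regions, vectorized without sorting."""
    flash = np.asarray(face_flash, dtype=np.float64)
    background = np.asarray(face_background, dtype=np.float64)
    total = flash + background
    out = np.zeros_like(total)
    # Assumption: pixels where both intensities are zero are set to 0.
    np.divide(flash - background, total, out=out, where=total > 0)
    return out.reshape(-1)  # spatial order is preserved


def lambertian_pair(reflectance, cos_theta, flash_intensity, ambient_intensity):
    """Simulated (flash, background) intensities of a Lambertian surface."""
    background = reflectance * ambient_intensity
    flash = reflectance * (flash_intensity * cos_theta + ambient_intensity)
    return flash, background


# Same toy geometry (cosine of the incidence angle), two surface reflectances.
cos_theta = np.linspace(0.2, 1.0, 12).reshape(3, 4)
light_skin = diffusion_descriptor(*lambertian_pair(0.8, cos_theta, 5.0, 1.0))
dark_skin = diffusion_descriptor(*lambertian_pair(0.3, cos_theta, 5.0, 1.0))
print(np.allclose(light_skin, dark_skin))  # True: the reflectance cancels out
```

In this model the descriptor varies only with `cos_theta`, that is, with the surface geometry, so a flat surface facing the camera yields a nearly constant descriptor.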

Figure 4: Classifiers. (a) SVM classifier applied to the calculated descriptors: the Speculum, Diffusion, and combined descriptors. (b) FC layers interleaved with nonlinear operations (ReLU), classifying the same descriptors as in (a). (c) ResNet4, an end-to-end classifier taking the flash photo and the background photo as a six-channel tensor.

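3.3 Combined descriptor

The two descriptors, the Speculum Descriptor $D_{\mathrm{spec}}$ and the Diffusion Descriptor $D_{\mathrm{diff}}$, are concatenated into the combined descriptor $D_{\mathrm{comb}}$:

\[
D_{\mathrm{comb}} = \big[\, D_{\mathrm{spec}},\ D_{\mathrm{diff}} \,\big] \tag{8}
\]

The combined descriptor attains higher classification accuracy than either the Speculum Descriptor or the Diffusion Descriptor alone. As both component descriptors are normalized to the range $[-1, 1]$, the combined descriptor also has a bounded magnitude, which helps model training and classification. The experiments described below, however, test not only the combined descriptor; the ablation studies also test the Speculum and Diffusion Descriptors by themselves.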

4 Experiments

4.1 Models

The classification performance of the proposed descriptors is evaluated mainly using an SVM with either a linear or a radial basis function (RBF) kernel (Fig.4a). The Speculum, Diffusion, and combined descriptors are compared with four previously reported descriptors: SD_FIC [16], LBP_FI [16], RelativeReflectance [17], and Implicit3D [18].
For the combined descriptor, a neural network consisting of four fully-connected (FC) layers is also tested as a classifier (Fig.4b). The first three FC layers have 200, 100, and 50 hidden units, respectively, each followed by a rectified linear unit (ReLU). The descriptor is resized before being used as an input to the first layer.
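The shape of such an FC classifier can be sketched in NumPy as a forward pass with the stated hidden sizes (200, 100, 50). The input dimension of 64, the two-unit output layer, the He-style random initialization, and the softmax are assumptions for illustration only; the weights here are untrained.

```python
import numpy as np


def init_fc_layers(input_dim: int, layer_sizes: tuple, seed: int = 0) -> list:
    """Random (weight, bias) pairs; He-style scaling is an assumption."""
    rng = np.random.default_rng(seed)
    params = []
    fan_in = input_dim
    for fan_out in layer_sizes:
        weight = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
        params.append((weight, np.zeros(fan_out)))
        fan_in = fan_out
    return params


def forward(params: list, x: np.ndarray) -> np.ndarray:
    """Apply the FC layers with a ReLU after every layer except the last."""
    hidden = x
    for index, (weight, bias) in enumerate(params):
        hidden = hidden @ weight + bias
        if index < len(params) - 1:
            hidden = np.maximum(hidden, 0.0)
    return hidden


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax with the usual max subtraction for stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


# Hypothetical 64-dimensional descriptors bounded in [-1, 1], batch of 5.
params = init_fc_layers(64, (200, 100, 50, 2))
descriptors = np.random.default_rng(1).uniform(-1.0, 1.0, size=(5, 64))
probabilities = softmax(forward(params, descriptors))
print(probabilities.shape)  # (5, 2): one real/fake probability pair per input
```

Because the descriptor is bounded, no additional input scaling is applied in this sketch.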
ResNet4, based on ResNet version 2 [28], is constructed as an end-to-end deep neural network (Fig.4c). ResNet4 takes as input a six-channel image constructed from two three-channel images, the flash photo and the background photo, concatenated along the channel axis. Both are facial images resized to the same fixed size. The initial convolution is followed by one residual connection skipping the next two convolutions. After global average pooling, the FC layers classify the data into one of the two classes.
All the analyses are conducted with customized scripts running on either MATLAB R2016b with an Intel® Core™ i7-4790 CPU @ 3.60 GHz, or Python with TensorFlow 1.4 on an NVIDIA GeForce GTX 1080 or RTX 2080 graphics card. SVM training is executed using the libsvm package [29].

4.2 Databases

As of early 2019, there is no publicly available facial-image database of images taken with and without a flash. Therefore, we collected 1176 and 1660 photos of real and fake faces, respectively, from 20 subjects. The photographing conditions in the real and fake categories are varied as follows: two lighting conditions (bright office area and dark corridor area) and three facial-accessory conditions (glasses, a surgical mask, or nothing). To make the fake photos, the real faces are printed on paper or shown on displays in order to simulate photo or display attacks (a portion of the entire dataset is composed of display attacks). Leave-one-ID-out cross-validation is conducted using this in-house database, and the average equal error rate (EER) is calculated as an evaluation measure. The device used for the data collection is an iPhone 7 (A1779). The display-attack devices are an iPad Pro (A1584) and an ASUS ZenFone Go (ZB551KL).
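Leave-one-ID-out cross-validation holds out all images of one subject per fold, so that no identity appears in both the training and the test data. A minimal sketch of the split, with a hypothetical function name and toy subject IDs, might look as follows.

```python
from collections import defaultdict


def leave_one_id_out(subject_ids):
    """Yield (held_out_id, train_indices, test_indices), one fold per subject."""
    indices_by_id = defaultdict(list)
    for index, subject in enumerate(subject_ids):
        indices_by_id[subject].append(index)
    for held_out in sorted(indices_by_id):
        test = indices_by_id[held_out]
        train = [i for subject, indices in indices_by_id.items()
                 if subject != held_out for i in indices]
        yield held_out, train, test


# Toy example: five samples from three subjects.
ids = ["a", "b", "a", "c", "b"]
for held_out, train, test in leave_one_id_out(ids):
    print(held_out, train, test)
```

The per-fold error rates would then be averaged to obtain the reported metric.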
To test the generalizability of the proposed method, a cross-database validation is conducted using two public databases, NUAA and Replay-Attack, which consist of 15 IDs / 12614 pictures and 50 IDs / 600 videos, respectively, simulating photo and display attacks. Because the real subjects in these databases are not accessible, only the false acceptance rate (FAR) is evaluated. The models are first trained using all of the data in the in-house database and then tested on the NUAA and Replay-Attack databases.
To test vulnerability to the two attack types, the combined descriptor with the RBF-kernel SVM is tested separately against photo and replay attacks by using both the in-house database and the Replay-Attack database. Leave-one-ID-out cross-validation is performed on the in-house database. To compare accuracies by using the FAR, we fix the false rejection rate (FRR) at 0.1 % and evaluate the FAR. On the Replay-Attack database, the FAR is calculated.
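The evaluation measures used here (EER, and FAR at a fixed FRR) can be computed from classifier scores as in the following sketch. It assumes that a higher score means "more likely real" and that a sample is accepted when its score is at or above the threshold; the function names and the exhaustive threshold search over observed scores are illustrative choices, not the authors' evaluation code.

```python
import numpy as np


def far_frr(real_scores, fake_scores, threshold):
    """FAR: fraction of fakes accepted; FRR: fraction of real faces rejected."""
    real = np.asarray(real_scores, dtype=np.float64)
    fake = np.asarray(fake_scores, dtype=np.float64)
    far = float(np.mean(fake >= threshold))
    frr = float(np.mean(real < threshold))
    return far, frr


def equal_error_rate(real_scores, fake_scores):
    """Error rate at the observed threshold where FAR and FRR are closest."""
    thresholds = np.unique(np.concatenate([real_scores, fake_scores]))
    rates = [far_frr(real_scores, fake_scores, t) for t in thresholds]
    far, frr = min(rates, key=lambda pair: abs(pair[0] - pair[1]))
    return (far + frr) / 2.0


def far_at_frr(real_scores, fake_scores, target_frr):
    """Lowest FAR among observed thresholds whose FRR does not exceed the target."""
    thresholds = np.unique(np.concatenate([real_scores, fake_scores]))
    rates = [far_frr(real_scores, fake_scores, t) for t in thresholds]
    return min(far for far, frr in rates if frr <= target_frr)


# Toy scores: one real face scores below one fake.
real = [0.9, 0.8, 0.7, 0.4]
fake = [0.1, 0.2, 0.3, 0.5]
print(equal_error_rate(real, fake))  # 0.25
print(far_at_frr(real, fake, 0.0))   # 0.25
```

With only a few samples the achievable FRR values are coarse; fixing the FRR at 0.1 % is meaningful only when there are on the order of a thousand or more real samples.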
In each training epoch, 10 % of the available non-test data is randomly selected as validation data, on which hyperparameter optimization is conducted.

4.3 Speed test on actual mobile devices

To compare the execution speeds of our proposed algorithm and ResNet4, a deep-neural-network classifier, a custom-made iOS application for liveness detection is built with Xcode 10.2.1 on a MacBook Pro, written in Swift, C, and C++. For Gaussian filtration, an OpenCV [30] built-in function is used. The app is then installed on an iPhone 7 (A1779), iPhone XR (A2106), and iPad Pro (A1876) for the speed evaluation. Execution speed is measured separately for the preprocessing step and the descriptor calculation/classification step.

5 Results

5.1 Classification performance

Leave-one-ID-out cross-validation is conducted on the in-house database to calculate the average EER. The results are summarized in Table 1, with the best model highlighted in bold. The average Speculum and Diffusion Descriptors calculated using the in-house database are shown in Fig.2a and b, respectively. The results of the cross-database test, conducted with the NUAA and Replay-Attack databases, are summarized in Table 2. Among all the descriptor-classifier combinations, the proposed combined descriptor with the RBF-kernel SVM achieves the highest classification accuracy on average, on both the in-house database and the two public databases. Moreover, its accuracy is comparable to or even better than that of ResNet4, the end-to-end deep-neural-network classifier. The results of the evaluation by attack type are summarized in Table 3. Although the display attack slightly increases the FAR on the in-house database, the cross-database evaluation on the Replay-Attack database results in a low FAR on both photo and display attacks, confirming that the proposed approach is robust against changes in attack type.

5.2 Execution speed on mobile devices

The results of the speed evaluation of the proposed algorithm and the deep-neural-network classifier on the iPhone 7, iPhone XR, and iPad Pro are summarized in Fig.5. On all devices, the proposed algorithm is approximately ten times faster than ResNet4 in terms of the descriptor calculation/classification time.

Descriptor                  Classifier     EER (%)
SD_FIC [16]                 SVM linear     34.19
                            SVM RBF        33.93
LBP_FI [16]                 SVM linear      3.07
                            SVM RBF         4.46
RelativeReflectance [17]    SVM linear     25.75
                            SVM RBF        10.45
Implicit3D [18]             SVM linear      5.15
                            SVM RBF         1.93
Speculum [PROPOSED]         SVM linear      2.33
                            SVM RBF         2.63
Diffusion [PROPOSED]        SVM linear      2.59
                            SVM RBF         1.48
Combined [PROPOSED]         SVM linear      1.47
                            SVM RBF         0.71
                            3 FC layers     0.91
(end-to-end)                ResNet4         0.90
Table 1: Leave-one-ID-out cross-validation results obtained with the in-house database. The best result is highlighted in bold.

                                           NUAA       Replay-Attack
Descriptor                  Classifier     FAR (%)    FAR (%)
SD_FIC [16]                 SVM linear     16.15      39.28
                            SVM RBF        27.71      32.81
LBP_FI [16]                 SVM linear      5.67       0.39
                            SVM RBF         1.96       0.37
RelativeReflectance [17]    SVM linear     22.87      27.42
                            SVM RBF        10.67       5.19
Implicit3D [18]             SVM linear      0.63       6.46
                            SVM RBF         0.92       1.96
Speculum [PROPOSED]         SVM linear      1.07       5.19
                            SVM RBF         1.47       1.57
Diffusion [PROPOSED]        SVM linear      0.92       9.01
                            SVM RBF         1.07       3.33
Combined [PROPOSED]         SVM linear      0.07       3.07
                            SVM RBF         0.00       0.20
                            3 FC layers     0.36       0.10
(end-to-end)                ResNet4         0.23       0.29
Table 2: Cross-database validation results. The best result is highlighted in bold.

In-house database               Replay-Attack
FAR (%) at FRR = 0.1 %          FAR (%)
photo      display              photo      display
0.88       2.66                 0.34       0.00
Table 3: Evaluation on spoofing subcategories by using the combined descriptor with the RBF-kernel SVM classifier.

6 Discussion

Liveness detection based on the combined descriptor achieves the highest classification performance among the flash-based algorithms tested. Moreover, its performance is comparable to that of a more complex end-to-end deep neural network, ResNet4. The RBF-kernel SVM classifier requires approximately 6.6 million floating-point operations (FLOPs), while ResNet4 requires 7 giga-FLOPs (which is more than the original ResNets because of the large input size, large channel size, and lack of an initial pooling layer). Accordingly, the proposed descriptor classified with the RBF-kernel SVM is approximately ten times faster than the deep neural network. Additionally, the model requires only a single camera and a single-colored flash light, without an additional imaging device (e.g., an IR camera) or auxiliary supervision (e.g., depth information). Thus, the proposed model’s high computational efficiency and minimal requirements will enable a wide range of applications.
The proposed Speculum Descriptor is simple, yet it achieves competitive classification accuracy. One drawback, however, is that its accuracy is affected by the presence of eyewear. For example, sunglasses occluding the eye region make liveness detection impossible. Even with transparent glasses, specular reflection on the lenses can potentially interfere with the descriptor. Because of this drawback, it is recommended to use the Speculum Descriptor together with the Diffusion Descriptor. The combined descriptor achieved top-tier classification performance in all the evaluation schemes, indicating its robustness.
Both the Speculum Descriptor and the Diffusion Descriptor are based on the difference between two photos, one taken with a flash, the other without, normalized by the pixel intensities of the two photos. Because of the normalization, the descriptors are bounded in the range $[-1, 1]$, and the classification accuracy is better than that of unnormalized descriptors.
Both our original database and the Replay-Attack database contain display attacks, in which a photo or a video is played on an electronic display such as a smartphone or a tablet. Theoretically speaking, displays violate the assumption underlying the Diffusion Descriptor: equation 3 applies to a Lambertian surface, whereas displays not only diffuse light but also emit light by themselves, interfering with the descriptor. Despite the violation of this assumption, the classification performance of the proposed model on display attacks is comparable to its performance on photo attacks. This might be because the subtraction in the numerator of equation 3 cancels out the display-emitted light. The emitted light still increases the denominator, which decreases the overall descriptor magnitude (i.e., it “flattens” the descriptor), leading to correct classification of the face as a spoof rather than a real face.
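The flattening argument can be made explicit with a short worked equation. As a simplifying assumption that is not stated in the text, suppose a display adds the same emitted intensity $E(i,j) \ge 0$ to both photos, on top of a Lambertian term with reflectance $\rho(i,j)$, incidence angle $\theta(i,j)$, flash intensity $L_f$, and background intensity $L_b$. The symbol $E$ and this additive model are introduced here only for illustration.

```latex
\begin{align*}
I^{(f)}(i,j) &= \rho(i,j)\big(L_f \cos\theta(i,j) + L_b\big) + E(i,j), \\
I^{(b)}(i,j) &= \rho(i,j)\, L_b + E(i,j), \\
\frac{I^{(f)}(i,j) - I^{(b)}(i,j)}{I^{(f)}(i,j) + I^{(b)}(i,j)}
  &= \frac{\rho(i,j)\, L_f \cos\theta(i,j)}
          {\rho(i,j)\big(L_f \cos\theta(i,j) + 2 L_b\big) + 2 E(i,j)} .
\end{align*}
```

Under this assumption, $E(i,j)$ cancels in the numerator and appears only in the denominator, so a larger emitted intensity pushes the normalized difference toward zero.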
When implementing an algorithm for practical use, it is generally difficult to choose a single best algorithm, because each algorithm has different hardware requirements. Deep-neural-network models and high-resolution infrared-camera-based liveness detection are powerful, but they require computationally expensive processing units or specialized imaging devices. On the other hand, our proposed algorithm, with the combined descriptor classified with the RBF-kernel SVM, is efficient yet effective, having minimal hardware and database requirements suitable for mobile devices, web cameras, and edge devices.

Figure 5: Summary of execution speeds. The proposed descriptor classified with the SVM-RBF kernel is compared with ResNet4. The preprocessing step is common to both classifiers. Execution speeds on iPhone7, iPhone XR, and iPad Pro are measured.

7 Conclusion

By using specular and diffuse reflections from a subject’s face, the proposed algorithm based on the combined descriptor achieved the best liveness detection accuracy among the flash-based algorithms tested, at an execution speed approximately ten times faster than that of a deep neural network. The algorithm requires only one visible-light camera and a flash light. A small database of image pairs for each class with binary labels is sufficient to train a classifier using the descriptor, enabling the easy and wide application of the liveness detection algorithm. Experiments conducted with the algorithm running on actual devices confirm that it has a practical level of performance on mobile devices without the need for computationally expensive processing units.

Acknowledgements

We would like to thank Koichi Takahashi, Kazuo Sato, Yoshitoki Ideta, and Taiki Miyagawa for their insightful discussions and support of the project.

References

  • [1] V. Vapnik and A. Lerner. Pattern recognition using generalized portrait method. Automation and Remote Control, 24:774–780, 1963.
  • [2] Xiaoyang Tan, Yi Li, Jun Liu, and Lin Jiang. Face liveness detection from a single image with sparse low rank bilinear discriminative model. In Kostas Daniilidis, Petros Maragos, and Nikos Paragios, editors, Computer Vision – ECCV 2010, pages 504–517, Berlin, Heidelberg, 2010. Springer Berlin Heidelberg.
  • [3] A. Anjos and S. Marcel. Counter-measures to photo attacks in face recognition: A public database and a baseline. In 2011 International Joint Conference on Biometrics (IJCB), pages 1–7, Oct 2011.
  • [4] I. Chingovska, A. Anjos, and S. Marcel. On the effectiveness of local binary patterns in face anti-spoofing. In 2012 BIOSIG - Proceedings of the International Conference of Biometrics Special Interest Group (BIOSIG), pages 1–7, Sep. 2012.
  • [5] Z. Zhang, J. Yan, S. Liu, Z. Lei, D. Yi, and S. Z. Li. A face antispoofing database with diverse attacks. In 2012 5th IAPR International Conference on Biometrics (ICB), pages 26–31, March 2012.
  • [6] N. Erdogmus and S. Marcel. Spoofing in 2d face recognition with 3d masks and anti-spoofing with kinect. In 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS), pages 1–6, Sep. 2013.
  • [7] S. Liu, B. Yang, P. C. Yuen, and G. Zhao. A 3d mask face anti-spoofing database with real world variations. In 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 1551–1557, June 2016.
  • [8] Tiago de Freitas Pereira, André Anjos, José Mario De Martino, and Sébastien Marcel. LBP-TOP based countermeasure against face spoofing attacks. In Jong-Il Park and Junmo Kim, editors, Computer Vision - ACCV 2012 Workshops, pages 121–132, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg.
  • [9] J. Maatta, A. Hadid, and M. Pietikainen. Face spoofing detection from single images using texture and local shape analysis. IET Biometrics, 1(1):3–10, March 2012.
  • [10] K. Kollreider, H. Fronthaler, and J. Bigun. Verifying liveness by multiple experts in face biometrics. In 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pages 1–6, June 2008.
  • [11] B. Peixoto, C. Michelassi, and A. Rocha. Face liveness detection under bad illumination conditions. In 2011 18th IEEE International Conference on Image Processing, pages 3557–3560, Sep. 2011.
  • [12] Lingxue Song and Changsong Liu. Face Liveness Detection Based on Joint Analysis of RGB and Near-Infrared Image of Faces. Electronic Imaging, 2018(10):373–1–373–6, January 2018.
  • [13] D. C. Garcia and R. L. de Queiroz. Face-spoofing 2d-detection based on moiré-pattern analysis. IEEE Transactions on Information Forensics and Security, 10(4):778–786, April 2015.
  • [14] Avinash Kumar Singh, Piyush Joshi, and G.C. Nandi. Face liveness detection through face structure analysis. International Journal of Applied Pattern Recognition, 1(4):338, 2014.
  • [15] S. Kim, S. Yu, K. Kim, Y. Ban, and S. Lee. Face liveness detection using variable focusing. In 2013 International Conference on Biometrics (ICB), pages 1–6, June 2013.
  • [16] P. P. K. Chan, W. Liu, D. Chen, D. S. Yeung, F. Zhang, X. Wang, and C. Hsu. Face liveness detection using a flash against 2d spoofing attack. IEEE Transactions on Information Forensics and Security, 13(2):521–534, Feb 2018.
  • [17] Di Tang, Zhe Zhou, Yinqian Zhang, and Kehuan Zhang. Face flashing: a secure liveness detection protocol based on light reflections. arXiv, abs/1801.01949, 2018.
  • [18] J. Matias Di Martino, Qiang Qiu, Trishul Nagenalli, and Guillermo Sapiro. Liveness detection using implicit 3d features. arXiv, abs/1804.06702, 2018.
  • [19] Yao Liu, Ying Tai, Ji-Lin Li, Shouhong Ding, Chengjie Wang, Feiyue Huang, Dongyang Li, Wenshuai Qi, and Rongrong Ji. Aurora guard: Real-time face anti-spoofing via light reflection. arXiv, abs/1902.10311, 2019.
  • [20] Si-Qi Liu, Pong C. Yuen, Xiaobai Li, and Guoying Zhao. Recent Progress on Face Presentation Attack Detection of 3d Mask Attacks. In Sébastien Marcel, Mark S. Nixon, Julian Fierrez, and Nicholas Evans, editors, Handbook of Biometric Anti-Spoofing, pages 229–246. Springer International Publishing, Cham, 2019.
  • [21] H. Steiner, A. Kolb, and N. Jung. Reliable face anti-spoofing using multispectral swir imaging. In 2016 International Conference on Biometrics (ICB), pages 1–8, June 2016.
  • [22] Siqi Liu, Pong C. Yuen, Shengping Zhang, and Guoying Zhao. 3d mask face anti-spoofing with remote photoplethysmography. In Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling, editors, Computer Vision – ECCV 2016, pages 85–100, Cham, 2016. Springer International Publishing.
  • [23] Jianwei Yang, Zhen Lei, and Stan Z. Li. Learn convolutional neural network for face anti-spoofing. arXiv, abs/1408.5601, 2014.
  • [24] D. Menotti, G. Chiachia, A. Pinto, W. R. Schwartz, H. Pedrini, A. X. Falcão, and A. Rocha. Deep representations for iris, face, and fingerprint spoofing detection. IEEE Transactions on Information Forensics and Security, 10(4):864–879, April 2015.
  • [25] Chaitanya Nagpal and Shiv Ram Dubey. A performance evaluation of convolutional neural networks for face anti spoofing. arXiv, abs/1805.04176, 2018.
  • [26] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, volume 1, pages I–I, Dec 2001.
  • [27] X. Xiong and F. De la Torre. Supervised descent method and its applications to face alignment. In 2013 IEEE Conference on Computer Vision and Pattern Recognition, pages 532–539, June 2013.
  • [28] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part IV, pages 630–645, 2016.
  • [29] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
  • [30] G. Bradski. The OpenCV Library. Dr. Dobb’s Journal of Software Tools, 2000.