
Simultaneous Iris and Periocular Region
Detection Using Coarse Annotations

Diego R. Lucio1, Rayson Laroca1, Luiz A. Zanlorensi1, Gladston Moreira2, David Menotti1
1Laboratory of Vision, Robotics and Imaging, Federal University of Paraná, Curitiba, PR, Brazil
2Laboratory of Intelligent Systems Computation, Federal University of Ouro Preto, Ouro Preto, MG, Brazil
1{drlucio, rblsantos, lazjunior, menotti}@inf.ufpr.br  2gladston@iceb.ufop.br
Abstract

In this work, we propose to detect the iris and periocular regions simultaneously using coarse annotations and two well-known object detectors: YOLOv2 and Faster R-CNN. We believe coarse annotations can be used in recognition systems based on the iris and periocular regions, given the much smaller engineering effort required to manually annotate the training images. We manually made coarse annotations of the iris and periocular regions (122K images from the \gls*vis spectrum and 38K images from the \gls*nir spectrum). The iris annotations in the \gls*nir databases were generated semi-automatically by first applying an iris segmentation CNN and then performing a manual inspection. These annotations were made for 11 well-known public databases (3 \gls*nir and 8 \gls*vis) designed for the iris-based recognition problem, and are publicly available to the research community (all annotations made by us are available at https://web.inf.ufpr.br/vri/databases/iris-periocular-coarse-annotations/). From our experiments on these databases, we highlight two results. First, the Faster R-CNN + \gls*fpn model reported an \gls*iou higher than YOLOv2 (91.86% vs 85.30%). Second, detecting the iris and periocular regions simultaneously is as accurate as detecting them separately, but with a lower computational cost, i.e., two tasks were carried out at the cost of one.

\newacronymstyle{long-short-br}{\GlsUseAcrEntryDispStyle{long-short}}{\GlsUseAcrStyleDefs{long-short}}
\setacronymstyle{long-short-br}

\newacronym{asm}{ASM}{Active Shape Model}
\newacronym{asef}{ASEF}{Average of Synthetic Exact Filters}
\newacronym{leasm}{LE-ASM}{Local Eyebrow Active Shape Model}
\newacronym{cnn}{CNN}{Convolutional Neural Network}
\newacronym{encdec}{ED}{Encoder-Decoder}
\newacronym{fcn}{FCN}{Fully Convolutional Network}
\newacronym{gan}{GAN}{Generative Adversarial Network}
\newacronym{hog}{HOG}{Histogram of Oriented Gradients}
\newacronym{iou}{IoU}{Intersection over Union}
\newacronym{map}{mAP}{mean Average Precision}
\newacronym{iupui}{IUPUI}{IUPUI Multiwavelength}
\newacronym{masdv1}{MASD.v1}{Multi-Angle Sclera Dataset.v1}
\newacronym{nir}{NIR}{near-infrared}
\newacronym{prd}{PRD}{Periocular Region Detection}
\newacronym{relu}{Leaky ReLU}{Leaky Rectified Linear Unit}
\newacronym{vfc}{VFC}{Vector Field Convolution}
\newacronym{roi}{ROI}{Region of Interest}
\newacronym{svm}{SVM}{Support Vector Machines}
\newacronym{vj}{VJ}{Viola-Jones}
\newacronym{vis}{VIS}{visible}
\newacronym{rpn}{RPN}{Region Proposal Network}
\newacronym{fast}{Fast R-CNN}{Fast Region-Based Convolutional Neural Network}
\newacronym{fpn}{FPN}{Feature Pyramid Network}
\newacronym{gs4}{MICHE-GS4}{MICHE Galaxy S4 Subset}
\newacronym{gt2}{MICHE-GT2}{MICHE Galaxy Tab 2 Subset}
\newacronym{ip5}{MICHE-IP5}{MICHE iPhone 5 Subset}
\newacronym{visob_iphone_day}{VISOB-IPHONE-DAY}{VISOB iPhone day light subset}
\newacronym{visob_iphone_dim}{VISOB-IPHONE-DIM}{VISOB iPhone dim light subset}
\newacronym{visob_iphone_office}{VISOB-IPHONE-OFFICE}{VISOB iPhone office light subset}
\newacronym{visob_oppo_day}{VISOB-OPPO-DAY}{VISOB Oppo day light subset}
\newacronym{visob_oppo_dim}{VISOB-OPPO-DIM}{VISOB Oppo dim light subset}
\newacronym{visob_oppo_office}{VISOB-OPPO-OFFICE}{VISOB Oppo office light subset}
\newacronym{visob_samsung_day}{VISOB-SAMSUNG-DAY}{VISOB Galaxy Note day light subset}
\newacronym{visob_samsung_dim}{VISOB-SAMSUNG-DIM}{VISOB Galaxy Note dim light subset}
\newacronym{visob_samsung_office}{VISOB-SAMSUNG-OFFICE}{VISOB Galaxy Note office light subset}
\newacronym{iiitdcli}{IIIT-D CLI}{IIIT-Delhi Contact Lens Iris}
\newacronym{ndcld15}{NDCLD15}{Notre Dame Contact Lens Detection 2015}
\newacronym{mobbio}{MobBIO}{MobBIO Subset}
\newacronym{ndccl}{NDCCL}{Notre Dame Cosmetic Contact Lenses}
\newacronym{frgc}{FRGC}{Face Recognition Grand Challenge}
\newacronym{mbgc}{MBGC}{Multiple Biometrics Grand Challenge}
\newacronym{mrf}{MRF}{Markov Random Field}

I Introduction

\glsresetall

In recent years, the interest in biometrics to automatically identify and/or verify a person’s identity has greatly increased [1, 5]. Biometrics refers to the use of physiological and behavioral characteristics of humans for personal identification [11]. Such characteristics are particularly important since they cannot be changed, forgotten, lost or stolen, providing an unquestionable connection between the individual and the application that makes use of them [31].

Several characteristics of the human body can be used as biometrics such as fingerprints, face, ocular region components and voice, each with advantages and disadvantages. Among the aforementioned modalities, ocular biometric traits have received significant attention in the recent past [12, 33, 27] due to the fact that the ocular region is an important and interrelated human trait consisting of several parts, for example, the cornea, lens, optic nerve, retina, pupil, iris, and periocular region. In this direction, many authors proposed biometric systems based on iris, periocular, retina, and sclera regions, as they are considered potential biometric modalities [19, 34].

The iris appears as one of the main biological characteristics in security systems since it remains unchanged over time and its uniqueness level is high [55]. Furthermore, the identification using the iris region is non-invasive, that is, there is no need for physical contact to obtain and analyze an iris image [20]. However, after decades of research in personal identification, it has been observed that better results can be achieved by combining different biometric modalities [33, 50, 8]. A good example of it is the combination of iris and periocular-based biometrics [54, 30].

In this work, we compare the detection of the iris and periocular regions being performed separately and simultaneously using two well-known object detection networks: YOLOv2 [41] and Faster R-CNN [42]. Such deep models were chosen due to the fact that (i) promising results were recently obtained using them in other detection tasks [17, 22, 23]; and (ii) handcrafted features are easily affected by noise and might not be robust for unconstrained scenarios.

Typically, in biometric systems that use iris and/or periocular region images as input, the first step in which efforts should be applied is the detection of the \gls*roi [13], as a poor detection would probably impair the effectiveness of the subsequent steps of the system [14, 50]. Recently, Zanlorensi et al. [56] showed that impressive iris recognition rates can be achieved when using deep representations having as system input the bounding boxes of the iris region, without the iris segmentation preprocessing. Also using deep representations and having as input a squared region (i.e., a bounding box), Luz et al. [28] achieved state-of-the-art results for periocular recognition. Such results, shorter execution times compared to single detection approaches (in which the iris and the periocular region are detected separately), and the promising results obtained in preliminary experiments support our motivation to detect both regions simultaneously.

The main contributions of this paper can be summarized as follows: (i) two new approaches for the simultaneous detection of the iris and periocular region; (ii) a comparative evaluation between detecting both regions simultaneously or separately in eleven publicly available databases; and (iii) for learning the models used in the experiments, coarse annotations (i.e., bounding boxes) were manually made for both iris and periocular regions of images from well-known \gls*vis spectral databases. As stated by Cordts et al. [10], coarse annotations are intended to support research areas that exploit large volumes of data. We also automatically generated bounding boxes using the iris segmentation approach proposed by Bezerra et al. [4] for well-known \gls*nir spectral databases. We manually checked and corrected (if necessary) all annotations.

We chose the approach proposed in [4] due to the fact that it presented an error rate lower than % in the aforementioned \gls*nir databases. However, despite the good results presented by that segmentation approach, the detection task is much less expensive in terms of both computational cost and data annotation. Regarding the databases employed in our experiments, they were chosen because they are widely used in the biometric recognition literature [56, 48, 2, 16], which we plan to investigate in future works. It should be noted that, in many works in the literature, no more than three databases were used in the experiments [43, 57, 3, 59].

In our experiments, the Faster R-CNN model yielded \gls*iou values higher than YOLOv2 (91.86% vs 85.30%), and the detection of the iris and periocular regions performed simultaneously is as accurate as when performed separately, but with a lower computational cost, i.e., two tasks were carried out at the cost of one. Regarding the use of coarse annotations, we believe they can be used in recognition systems based on the iris and periocular regions, given the much smaller engineering effort required to manually annotate the training images.

The remainder of this paper is organized as follows. We review related works in Section II. In Section III, our methodology is described. Section IV and Section V present, respectively, the experimental setup and the results obtained. Finally, conclusions and future work are discussed in Section VI.

II Related Work

In this section, we discuss works related to iris and periocular region detection and conclude with final remarks.

II-A Iris Detection

Regarding iris detection, the works in the literature commonly show the detected \gls*roi using two different representations. Fig. 1(a) shows the use of a rectangular bounding box as the iris delimitation, while Fig. 1(b) shows an elliptical \gls*roi detection using the outer iris boundary.

(a) Rectangular bounding box
(b) Outer iris contour
Fig. 1: Sample representations for iris \gls*roi extraction.

Many works in the literature show the iris delimitation by using an elliptical contour around its outer edge. Daugman [13] pioneered this scenario by proposing an approach that makes use of an integro-differential operator to detect the iris by identifying the borders present in the image. This operator takes into account the circular shape of the iris to find its correct position by maximizing the partial derivative with respect to the radius. In the experiments, the author employed a private database composed of eye images captured in the \gls*nir wavelength from subjects.

Zhang & Ma [57] adopted a method that employs a momentum-based level set [24, 58] along with Daugman's operator to locate the pupil boundary. Specifically, an initial contour of the iris is obtained with a momentum-based level set using the minimum average gray level. Then, the integro-differential operator is applied to perform the final detection, reducing the execution time and improving the results obtained in [13]. An accuracy rate of was achieved on the CASIA-IrisV2 database [6]. Such improvement occurs because the initially detected contour is generally close to the actual inner boundary of the iris.

Alvarez-Betancourt & Garcia-Silvente [3], on the other hand, presented an iris location method based on the detection of circular boundaries through gradient analysis at points of interest of successive arcs, reaching an accuracy of on the CASIA-IrisV3 database [7] with improvements in processing time. The quantified majority operator QMA-OWA, proposed in [36], was used to obtain a representative value for each successive arc. Then, the iris boundary is given by the arc with the most significant representative value.

In the method proposed by Cui et al. [60], the eyelashes are removed as a first step using the dual-threshold method, which can be an advantage over other iris location approaches. Next, the facula is removed by using mathematical morphology. Finally, the accurate iris position is obtained through the Hough transform and least-squares fitting. Their method achieved % accuracy on the CASIA-IrisV3-Twins database [7].

Zhou et al. [59] presented a method for iris location in which the initial position of the iris is obtained by using the \gls*vfc technique. This initial estimate places the pupil location much closer to the actual boundary than circle fitting does, improving location accuracy and reducing computational cost. The final result is obtained using the algorithm proposed by Daugman. An accuracy rate of % was reported on the CASIA-IrisV2 database.

Su et al. [49] proposed an iris location algorithm based on local properties and iterative searching, achieving accuracy on the CASIA-IrisV1 and CASIA-IrisV3 databases (combined in their experiments). In order to detect the \gls*roi, the pupil area is extracted using iris regional attributes, and its inner edge is fitted by iterating, comparing and sorting the pupil edge points. The outer edge is located by an iterative searching method from the extracted pupil center and radius, in a shorter time than the approaches available in the literature.

Chen & Ross [9] designed a multi-task \gls*cnn-based approach for joint iris and presentation attack detection. The experiments were performed on six publicly available databases; however, iris detection results were not reported, as the main focus of their work is identifying presentation attacks.

Severo et al. [47] represented the iris as a rectangular bounding box. They fine-tuned the Fast-YOLOv2 model, which is much faster but less accurate than YOLOv2, in order to perform the \gls*roi extraction, overcoming problems such as noise, eyelids, eyelashes and reflections. Six public databases were used to evaluate their method, which attained accuracy rates above % in all of them.

Wang et al. [52] recently introduced IrisParseNet, a network for iris detection that reached %, % and % \gls*iou values on the CASIA-Iris-Distance, UBIRIS.v2 and MICHE-I databases, respectively. Their method simultaneously estimates the pupil center, the iris segmentation mask, and the iris inner/outer boundaries.

II-B Periocular Region Detection

Park et al. [35] proposed one of the first biometric approaches based on the periocular region, featuring an eye region detector that uses face images detected by the Viola-Jones detector [51] as input and outputs the periocular region.

Similarly, Juefei-Xu & Savvides [21] also proposed a periocular region detection approach that employs as input a face image detected by the Viola-Jones detector. Nevertheless, the periocular region is identified using \glspl*asm that identify facial landmarks, among them points relative to the eye region.

Mahalingam et al. [29] designed an eye detector that receives a face image and outputs the periocular region through \gls*asef. All experiments were carried out on a private database composed of  million faces from  subjects. Le et al. [25], on the other hand, proposed a \gls*leasm to first detect the eyebrow region directly from a given face image and then to detect the periocular region using \glspl*asm. The results obtained on this particular stage (i.e., periocular region detection) were not reported.

Proença et al. [37] proposed a \gls*mrf method to segment the periocular region components and other elements around them (i.e., the iris, sclera, eyelashes, eyebrows, hair, skin and glasses). Their approach analyzes the image pixels and outputs the segmented region taking into account appearance and geometrical constraints and assuring that the system output is biologically plausible. The periocular region can be predicted by combining the outer limits of the sclera and the lower eyelashes.

II-C Final Remarks

In most works, accuracy was employed as the evaluation metric for iris and periocular region detection. However, the authors used different protocols to calculate the accuracy, or did not describe specifically how the reported accuracy was computed. Therefore, it is plausible to question how robust one method is compared to another.

While the evaluation metrics are poorly described in the iris detection scenario, in the periocular region detection scenario none of the studies found in the literature report results for this particular stage, probably because the detection of the \gls*roi was treated only as a preprocessing step in those works [35, 21, 29, 25].

Taking this information into consideration and also the fact that \glspl*cnn are not widely explored in the iris and periocular region detection domains, we propose to evaluate two well-known \gls*cnn object detectors (i.e., YOLOv2 and Faster R-CNN) in eleven coarsely annotated databases.

More specifically, the main objective of this work is to evaluate the simultaneous detection of the iris and periocular regions. The simultaneous detection approach is proposed taking into account the assumption that \glspl*cnn are able to understand the context present in the images, thus improving the results obtained by conventional single detection approaches.

III Methodology

Currently, one of the most accurate ways to perform image classification, segmentation and object detection is using deep \glspl*cnn. Therefore, in this work, we propose the simultaneous detection of the iris and periocular regions using two object detection models: YOLOv2 [41] and Faster R-CNN [42]. It should be noted that (i) we trained both models from scratch; (ii) such models were chosen because promising results were obtained using them in other detection tasks [17, 22, 23].

Our hypothesis is that the proposed simultaneous detection approach is able to understand the context of the image and thereby improve detection results compared to single detection approaches in which the iris and the periocular region are detected separately. As baselines, we also adopted the YOLOv2 and Faster R-CNN models, but in two independent detection steps, i.e., one for the iris and one for the periocular region.

III-A YOLOv2

Table I presents the YOLOv2 model employed for detecting the iris and the periocular region. The architecture has 19 convolutional and 5 max-pooling layers. The convolutional layers, except for the last one, are divided into two groups: external and internal. The layers belonging to the external group use kernels of size 3×3, whereas the layers belonging to the internal group use kernels of size 1×1. The alternating 1×1 convolutional layers reduce the feature space from the preceding layers [40]. The convolutional blocks are composed of: convolution, batch normalization, and a \gls*relu.

As this model does not have fully connected layers, it can receive images of any size as input. We adopted an input size of pixels due to the good results achieved employing these dimensions in [41]. We also reduced the number of filters in the last convolutional layer to match our number of classes. The number of filters in that layer is given by

filters = A × (5 + C) ,    (1)

where A is the number of anchor boxes used to predict the bounding boxes and C is the number of classes, in our case either 1 or 2 to detect the iris and periocular regions separately or simultaneously, respectively. Thus, the last convolutional layer has A × 6 filters when the regions are detected separately and A × 7 filters when they are detected simultaneously.
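As an illustration, the sketch below computes the head size as filters = A × (5 + C), for both the separate and the simultaneous setting. The anchor count A = 5 is YOLOv2's default and is used here only as an illustrative value, since the exact value adopted in our experiments is not restated in this text:

```python
def yolo_head_filters(num_anchors: int, num_classes: int) -> int:
    """Number of filters in YOLOv2's last convolutional layer.

    Each anchor predicts 4 box offsets + 1 objectness score,
    plus one confidence score per class.
    """
    return num_anchors * (5 + num_classes)

# Illustrative values: A = 5 anchors (YOLOv2's default).
separate = yolo_head_filters(5, 1)      # one class per detector
simultaneous = yolo_head_filters(5, 2)  # iris + periocular in one detector
print(separate, simultaneous)  # 30 35
```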

The main difference between the YOLOv2 model proposed in [41] and the one used in this work is that we removed the route layers, i.e., layers that concatenate a list of previous layers together. In preliminary experiments, we observed that removing such layers did not negatively affect the results obtained in our tasks and also reduced the execution time.

# Layer Group Filters Size Input Output
1 conv External
2 max
3 conv External
4 max
5 conv External
6 conv Internal
7 conv External
8 max
9 conv External
10 conv Internal
11 conv External
12 max
13 conv External
14 conv Internal
15 conv External
16 conv Internal
17 conv External
18 max
19 conv External
20 conv Internal
21 conv External
22 conv Internal
23 conv External
24 conv
25 detection
TABLE I: The YOLOv2 model, modified for the detection of the iris and the periocular region. The number of filters in the last convolutional layer depends on whether the regions are detected separately or simultaneously.

III-B Faster R-CNN + Feature Pyramid Network

We employ the Faster R-CNN model [42] combined with a \gls*fpn [26], as shown in Fig. 3. Faster R-CNN is commonly composed of (i) a feature map extraction network; (ii) a \gls*rpn; and (iii) a detection network. We replaced the standard \gls*cnn feature extraction module with an \gls*fpn, so that multiple feature map layers are generated with better-quality information than in the regular implementation of Faster R-CNN.
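To illustrate how the multiple feature map layers are consumed downstream, the sketch below implements the level-assignment heuristic from the \gls*fpn paper [26], which routes each region proposal to a pyramid level according to its size. The constants (k0 = 4, canonical scale of 224 pixels, levels P2 to P5) follow Lin et al. and are not necessarily the configuration used in our experiments:

```python
import math

def fpn_level(box_w: float, box_h: float, k0: int = 4,
              canonical: float = 224.0, k_min: int = 2, k_max: int = 5) -> int:
    """Map an RoI to an FPN pyramid level P2..P5 (Lin et al.).

    Larger proposals are routed to coarser levels, so each level
    pools features at a roughly canonical object scale.
    """
    k = math.floor(k0 + math.log2(math.sqrt(box_w * box_h) / canonical))
    return max(k_min, min(k_max, k))

print(fpn_level(224, 224))  # 4 : canonical-sized RoI -> P4
print(fpn_level(64, 64))    # 2 : small RoI -> finest level P2
```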

III-C Coarse Annotations

In this work, we use coarse annotations both to train and to evaluate our networks. As can be seen in Fig. 2, we define as a coarse annotation a region around the \gls*roi such that the edges of the bounding box remain outside the limits of the fine annotations proposed by Severo et al. [47]. More specifically, the delimited region is larger than the one typically used in fine annotations, and the iris is not well-centered. Also, in some cases, the eyebrows were left out of the \gls*roi, as the images from some databases used in this work do not contain that region.

It is worth noting that the coarse annotations were made manually by two volunteers and that no strict rules on how the annotations should be made were defined (besides simple instructions and the fact that they were coarse rather than fine annotations). Hence, there are random variations (in size, position, aspect ratio, etc.) among the annotations of different images.
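For illustration only, the relation between a fine and a coarse annotation can be sketched as padding plus random jitter; the `pad_frac` and `jitter_frac` values below are hypothetical choices, as our annotators worked freehand and followed no fixed rule:

```python
import random

def coarsen(box, pad_frac=0.25, jitter_frac=0.10, rng=None):
    """Expand a fine (x, y, w, h) box into a coarse one.

    The box is padded on every side and then randomly shifted,
    mimicking loose, freehand annotations around the ROI.
    """
    rng = rng or random.Random(0)
    x, y, w, h = box
    pad_w, pad_h = pad_frac * w, pad_frac * h
    jx = rng.uniform(-jitter_frac, jitter_frac) * w
    jy = rng.uniform(-jitter_frac, jitter_frac) * h
    return (x - pad_w + jx, y - pad_h + jy,
            w + 2 * pad_w, h + 2 * pad_h)

coarse = coarsen((100, 100, 50, 40))
print(coarse)  # a larger, slightly off-center box
```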

(a) Fine
(b) Coarse
Fig. 2: Examples of fine and coarse annotations of both the iris (red) and the periocular region (yellow).

We believe coarse annotations can be used in recognition systems based on the iris and/or the periocular region, given the much smaller engineering effort required to manually annotate the training images. In other words, we conjecture that deep models for person identification may achieve promising results even when these regions are not perfectly segmented.

Fig. 3: Faster R-CNN + \gls*fpn architecture overview.

IV Experimental Setup

In this section, we present the databases and also the evaluation protocol used in our experiments. The experiments were carried out on eleven databases, which are described in Section IV-A. Note that we trained/tested the networks on each dataset separately. All experiments were performed on a computer with an Intel® Core™ i- GHz CPU, GB of RAM and two NVIDIA Titan Xp GPUs.

IV-A Databases

We employed the following public databases: CASIA-Iris-Interval [7], CASIA-Iris-Lamp [7], CASIA-Iris-Thousand [7], Cross-Eyed-VIS [45], CSIP [44], MICHE-I [15], MobBIO [46], NICE-II [38], PolyU-VIS [32], UBIRIS.v2 [38] and VISOB [39]. An overview of the important features of all databases used in this work can be seen in Table II. These databases were chosen because they are widely used in the biometric recognition literature [56, 48, 2, 16], which we plan to investigate in future works.

Database Year Images Subjects Resolution Wavelength
CASIA-Iris-Interval [7] \gls*nir
CASIA-Iris-Lamp [7] \gls*nir
CASIA-Iris-Thousand [7] \gls*nir
Cross-Eyed-VIS [45] \gls*vis
CSIP * [44] Various \gls*vis
MICHE-I * [15] Various \gls*vis
MobBIO [46] \gls*vis
NICE-II [38] n/a \gls*vis
PolyU-VIS [32] \gls*vis
UBIRIS.v2 [38] \gls*vis
VISOB* [39] Various \gls*vis
\gls*nir
\gls*vis
Total
* Cross-sensor databases
TABLE II: Overview of the important features of the databases used in this work. All of these are subsets of the original databases.

CASIA-Iris-Interval: the iris images of this database were captured with a close-up iris camera developed by the authors themselves. The database consists of images from subjects and classes, with a resolution of pixels, obtained in two sessions.

CASIA-Iris-Lamp: the images were collected using a non-fixed sensor and, thus, the individuals captured the iris image with the sensor in their own hands. While capturing the images, a lamp was turned on and off in order to produce more intraclass variations due to pupil contraction and expansion, creating a nonlinear deformation. A total of images with a resolution of pixels from subjects and classes were collected in a single session.

CASIA-Iris-Thousand: this database contains iris images from subjects with a resolution of pixels, which were collected in a single session using an IKEMB- camera.

Cross-Eyed-VIS: this database subset is composed of \gls*vis images. Eight images of each eye were captured from subjects, totaling images. The images have dimensions of pixels. All images were obtained at a distance of meters, in an uncontrolled indoor environment, with a wide variation of ethnicity, eye colors, and lighting conditions.

CSIP: this database has images acquired with four different mobile devices: Sony Ericsson Xperia Arc S (rear ), iPhone  (front , rear ), THL W (front , rear ), and Huawei U (front , rear ). The database has images from subjects.

MICHE-I: this database contains images from subjects acquired by mobile devices in visible light. In order to simulate a real application, the iris images were obtained by the users themselves, indoors and outdoors, with and without glasses. Images of only one eye of each individual were captured. The mobile devices used and their respective resolutions are the following: iPhone  (), Samsung Galaxy S () and Samsung Galaxy Tablet II ().

MobBIO: this database has face, iris, and voice biometric data belonging to subjects. The data was acquired with the mobile device Asus Transformer Pad (TFT). The iris images were obtained in two different lighting conditions, with varying eye orientations and occlusion levels. For each subject, 16 images (8 of each eye) were captured.

NICE-II: this database, a subset of UBIRIS.v2, contains images with a resolution of pixels and was employed in the NICE.II contest. The number of subjects of this set was not directly specified.

PolyU-VIS: this database has iris images with a resolution of pixels, with images of each eye from subjects obtained in the visible spectrum [32].

UBIRIS.v2: this database contains RGB images captured with a Canon EOS D camera and resolution of pixels, from subjects (i.e., irises) [38].

VISOB: front cameras of three mobile devices were used to obtain the images of this database, namely the iPhone S at p resolution, the Samsung Note at p resolution and the Oppo N also at p resolution. The images were captured in sessions for each of the visits, which occurred between and weeks apart, totaling images from subjects.

IV-B Evaluation Protocol

The evaluation of an automatic detection approach is performed through a pixel-to-pixel comparison between the ground truth and the predicted bounding boxes. Therefore, we use the mean F-score, \gls*iou and \gls*map evaluation metrics. Following Severo et al. [47], to first compute the precision and recall metrics and then the F-score, we consider as correct the bounding boxes detected with an \gls*iou value above 0.5 with the ground truth. This bounding box evaluation, defined in the PASCAL VOC Challenge [18], is interesting since it penalizes both over- and under-estimated objects.
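The \gls*iou criterion above can be sketched as follows, with boxes given as (x1, y1, x2, y2) corner coordinates (the helper name `iou` and the box format are illustrative choices, not the exact implementation used in our evaluation):

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# A detection counts as correct when its IoU with the ground truth
# exceeds the threshold, penalizing over- and under-estimated boxes.
print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # 1.0
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```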

It is worth noting that we use coarse annotations as the ground truth, as the databases do not provide fine annotations of the position of the iris and periocular regions in each image. In this sense, instead of evaluating the predicted bounding boxes against the exact location of the iris/periocular region, we evaluate how close they are to the ground truth.

In order to perform a fair evaluation and comparison of the proposed approaches, we divided each database into three subsets: one for training, one for testing and one for validation. We adopt this protocol (i.e., with a larger test set) to provide more samples for the analysis of statistical significance. Also, in the statistical direction, we perform the Wilcoxon signed-rank test [53] to verify whether there is a statistical difference between the detection approaches.
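A reproducible per-database split along these lines can be sketched as follows; the 40/40/20 proportions and the seed are placeholder values, since the exact proportions used in our experiments are not restated in this text:

```python
import random

def split(paths, train=0.4, test=0.4, val=0.2, seed=42):
    """Shuffle image paths and split them into train/test/validation lists."""
    assert abs(train + test + val - 1.0) < 1e-9
    paths = list(paths)
    random.Random(seed).shuffle(paths)  # fixed seed -> reproducible split
    n_train = int(train * len(paths))
    n_test = int(test * len(paths))
    return (paths[:n_train],
            paths[n_train:n_train + n_test],
            paths[n_train + n_test:])

tr, te, va = split([f"img_{i:04d}.png" for i in range(100)])
print(len(tr), len(te), len(va))  # 40 40 20
```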

Database F-score \gls*iou (%) \gls*map (%)
YOLOv2 Faster R-CNN YOLOv2 Faster R-CNN YOLOv2 Faster R-CNN
Multi Single Multi Single Multi Single Multi Single Multi Single Multi Single
Iris
CASIA-Iris-Interval 94.77 100.00 100.00 100.00 93.98
CASIA-Iris-Lamp 97.31 99.98 99.98 99.73 97.31
CASIA-Iris-Thousand 97.72 99.96 99.97 99.65 97.58
Cross-Eyed-VIS 90.44 100.00 100.00 100.00 90.44
CSIP 91.61 98.68 98.69 100.00 91.55
MICHE-I 86.48 97.39 94.32 100.00 92.48
MobBIO 94.14 100.00 100.00 100.00 93.79
NICE-II 88.41 98.92 99.32 99.32 78.20
PolyU-VIS 93.81 99.74 93.79 100.00 89.31
UBIRIS.v2 85.26 99.35 99.00 100.00 85.29
VISOB 93.09 99.53 99.34 99.90 92.80
Periocular Region
CASIA-Iris-Interval 97.80 98.62 100.00 100.00 97.80
CASIA-Iris-Lamp 98.08 99.95 99.95 99.97 97.70
CASIA-Iris-Thousand 98.19 99.89 99.94 99.97 98.18
Cross-Eyed-VIS 92.74 97.84 99.66 100.00 92.56
CSIP 92.96 99.83 100.00 83.61 92.96
MICHE-I 83.66 93.82 96.33 98.77 93.51
MobBIO 95.50 100.00 100.00 100.00 94.83
NICE-II 86.91 97.23 99.55 99.76 86.66
PolyU-VIS 96.74 99.48 99.56 100.00 96.41
UBIRIS.v2 85.44 83.12 98.35 99.64 85.44
VISOB 96.35 95.64 99.98 99.83 96.35
TABLE III: Detection results. The Single and Multi columns present the results obtained when detecting the iris and periocular regions separately and simultaneously, respectively. The values in bold represent the highest \gls*iou values obtained, while the highlighted results indicate the cases in which there is no statistical difference according to the Wilcoxon statistical tests.

V Results

The experiments were carried out using the protocol presented in Section IV-B. To compare the proposed approaches, we report the F-score values in order to analyze the trade-off between the precision and recall measures; however, we focus on the \gls*iou metric, since we want to assess how close the predicted bounding boxes are to the ground truth.

When analyzing the results regarding iris detection (see top of Table III), in most of the experiments the highest mean \gls*iou value was achieved using Faster R-CNN. In general, the best results were obtained when simultaneously detecting the iris and the periocular region. The exceptions are the CASIA-Iris-Lamp, Cross-Eyed-VIS, MICHE-I and UBIRIS.v2 databases, where detecting both regions separately performed better, probably because there are not many variations in the arrangement of the iris and periocular region in the images of these databases. However, as the difference in the results obtained with both approaches is very small, we applied the Wilcoxon signed-rank test and observed that there is no statistical difference between detecting the iris and the periocular region simultaneously or separately in the CASIA-Iris-Lamp, Cross-Eyed-VIS, CSIP and MobBIO databases. Accordingly, in Table III, we highlighted (light gray) the results obtained on these databases.

A similar behavior occurred in the detection of the periocular region; however, in this case, all the best results were attained employing the Faster R-CNN model. In this scenario, the single-class detection approach presented the best values on the CSIP, UBIRIS.v2 and VISOB databases. As in iris detection, the difference between the \gls*iou values attained by the two approaches is small, and there is no statistical difference in the CASIA-Iris-Thousand, Cross-Eyed-VIS and NICE-II databases; these results are also highlighted in Table III.

We emphasize that most of the best results were obtained using the Faster R-CNN + \gls*fpn approach, which we believe to be justified by the fact that \glspl*fpn perform a better feature map extraction compared to other approaches [26].

It should be noted that the highest \gls*iou values, for both iris and periocular region detection, were obtained in the databases whose images were captured using a \gls*nir sensor. These results were achieved using the Faster R-CNN simultaneous detection approach, and the best detected iris and periocular regions can be seen in Figure 4.

(a) Periocular region detection
(b) Iris detection
Fig. 4: Best iris and periocular region detection performed by the Faster R-CNN simultaneous detection approach. The green bounding boxes represent the coarse annotations, while the red ones represent the detected regions.

Despite the good results, it should be noted that in the databases whose images were captured using more than one sensor and without any preprocessing (i.e., MICHE-I and CSIP), or that are composed of lower-quality images (i.e., UBIRIS.v2 and NICE-II), we obtained lower \gls*iou values when detecting the iris and periocular regions simultaneously. By analyzing these images, we can identify what degraded the results obtained by the approaches on these databases: i) the use of eyeglasses; and ii) the presence of more than one eye in the image.

VI Conclusions

In this work, we compared the detection of the iris and the periocular region being performed separately or simultaneously using two well-known object detectors, observing a better performance of the Faster R-CNN + \gls*fpn approach.

The detection of both regions being performed simultaneously produced better results in most databases, for both the iris and the periocular region. This leads us to believe that using this approach gives the neural network a certain understanding of the context present in the image.

We also coarsely labeled images for iris and periocular region detection. These annotations are publicly available to the research community, assisting the development and evaluation of new detection approaches as well as the fair comparison among published works.

There is still room for improvement in the simultaneous detection of the iris and periocular region. As future work, we intend to (i) design new and better network architectures; (ii) design a general, sensor-independent approach, in which the image sensor is first classified and then the iris and the periocular region are simultaneously detected with a sensor-specific model; (iii) compare the proposed approach with methods applied in other domains; (iv) create a context-aware object-detection architecture; and (v) design a cascade detection approach for iris and periocular region detection.

Acknowledgements

This work was supported by the National Council for Scientific and Technological Development (CNPq) (grant numbers 428333/2016-8 and 313423/2017-2) and the Coordination for the Improvement of Higher Education Personnel (CAPES) (Social Demand Program). The Titan Xp GPUs used for this research were donated by the NVIDIA Corporation.

References

  • [1] A. Abdelwhab and S. Viriri (2018) A survey on soft biometrics for human identification. In Machine Learning and Biometrics, External Links: Document Cited by: §I.
  • [2] N. Aginako, J. M. Martínez-Otzeta, I. Rodriguez, E. Lazkano and B. Sierra (2016-12) Machine learning approach to dissimilarity computation: iris matching. In International Conference on Pattern Recognition (ICPR), Vol. , pp. 170–175. External Links: Document, ISSN Cited by: §I, §IV-A.
  • [3] Y. Alvarez-Betancourt and M. Garcia-Silvente (2010-07) A fast iris location based on aggregating gradient approximation using QMA-OWA operator. In International Conference on Fuzzy Systems, Vol. , pp. 1–8. External Links: Document, ISSN 1098-7584 Cited by: §I, §II-A.
  • [4] C. S. Bezerra, R. Laroca, D. R. Lucio, E. Severo, L. F. Oliveira, A. S. Britto Jr. and D. Menotti (2018-10) Robust iris segmentation based on fully convolutional networks and generative adversarial networks. In Conference on Graphics, Patterns and Images, Vol. , pp. 281–288. External Links: Document, ISSN 2377-5416 Cited by: §I, §I.
  • [5] K. W. Bowyer and M. J. Burge (2016) Handbook of iris recognition. Cited by: §I.
  • [6] CASIA (2004) (Website) CASIA - Chinese Academy of Sciences, Institute of Automation. External Links: Link Cited by: §II-A.
  • [7] CASIA (2010) CASIA version 4 database. Chinese Academy of Sciences, Institute of Automation. External Links: Link Cited by: §II-A, §II-A, §IV-A, TABLE II.
  • [8] K. I. Chang, K. W. Bowyer, P. J. Flynn and X. Chen (2004) Multi-biometrics using facial appearance, shape and temperature. In International Conference on Automatic Face and Gesture Recognition, pp. 43–48. Cited by: §I.
  • [9] C. Chen and A. Ross (2018-03) A multi-task convolutional neural network for joint iris detection and presentation attack detection. In IEEE Winter Applications of Computer Vision Workshops, Vol. , pp. 44–51. External Links: Document, ISSN Cited by: §II-A.
  • [10] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth and B. Schiele (2016-06) The cityscapes dataset for semantic urban scene understanding. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vol. , pp. 3213–3223. External Links: Document, ISSN 1063-6919 Cited by: §I.
  • [11] A. Das, U. Pal, M. A. F. Ballester and M. Blumenstein (2013-12) Sclera recognition using dense-SIFT. In International Conference on Intelligent Systems Design and Applications, Vol. , pp. 74–79. External Links: Document, ISSN 2164-7143 Cited by: §I.
  • [12] A. Das, U. Pal, M. A. F. Ballester and M. Blumenstein (2014-12) Multi-angle based lively sclera biometrics at a distance. In IEEE Symposium on Computational Intelligence in Biometrics and Identity Management (CIBIM), Vol. , pp. 22–29. External Links: Document, ISSN 2325-4300 Cited by: §I.
  • [13] J. G. Daugman (1993-11) High confidence visual recognition of persons by a test of statistical independence. IEEE Transactions on Pattern Analysis and Machine Intelligence 15 (11), pp. 1148–1161. External Links: Document, ISSN 0162-8828 Cited by: §I, §II-A, §II-A.
  • [14] J. Daugman (2004) How iris recognition works. IEEE Transactions on Circuits and Systems for Video Technology 14, pp. 21–30. External Links: Document, ISSN 1051-8215 Cited by: §I.
  • [15] M. De Marsico, M. Nappi, D. Riccio and H. Wechsler (2015) Mobile Iris Challenge Evaluation (MICHE)-I, biometric iris dataset and protocols. Pattern Recognition Letters 57, pp. 17–23. External Links: Document, ISBN 0167-8655, ISSN 01678655 Cited by: §IV-A, TABLE II.
  • [16] A. Deshpande, S. Dubey, H. Shaligram, A. Potnis and S. Chavan (2014-12) Iris recognition system using block based approach with DWT and DCT. In IEEE Anual India Conference, Vol. , pp. 1–5. External Links: Document, ISSN 2325-940X Cited by: §I, §IV-A.
  • [17] Y. Ding, Q. Tao, L. Wang, D. Li and M. Zhang (2018) Image-based localisation using shared-information double stream hourglass networks. 54 (8), pp. 496–498. External Links: ISSN 0013-5194 Cited by: §I, §III.
  • [18] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn and A. Zisserman (2010-06-01) The pascal visual object classes (VOC) challenge. International Journal of Computer Vision 88 (2), pp. 303–338. External Links: ISSN 1573-1405, Document Cited by: §IV-B.
  • [19] R.B. Hill (1978) Apparatus and method for identifying individuals through their retinal vasculature patterns. Note: US Patent 4,109,237 Cited by: §I.
  • [20] A. K. Jain, R. Bolle and S. Pankanti (1998) Biometrics, personal identification in networked society. Kluwer Academic Publishers. External Links: ISBN 0792383451 Cited by: §I.
  • [21] F. Juefei-Xu and M. Savvides (2012-01) Unconstrained periocular biometric acquisition and recognition using COTS PTZ camera for uncooperative and non-cooperative subjects. In IEEE Workshop on the Applications of Computer Vision (WACV), Vol. , pp. 201–208. External Links: Document, ISSN 1550-5790 Cited by: §II-B, §II-C.
  • [22] K. E. Ko and K. B. Sim (2017) Real-time object entity detection system for smart surveillance application. 53 (19), pp. 1304–1306. External Links: Document, ISSN 0013-5194 Cited by: §I, §III.
  • [23] R. Laroca, E. Severo, L. A. Zanlorensi, L. S. Oliveira, G. R. Gonçalves, W. R. Schwartz and D. Menotti (2018-07) A robust real-time automatic license plate recognition based on the YOLO detector. In International Joint Conference on Neural Networks (IJCNN), Vol. , pp. 1–10. External Links: Document, ISSN 2161-4407 Cited by: §I, §III.
  • [24] G. Läthén, T. Andersson, R. Lenz and M. Borga (2009) Momentum based optimization methods for level set segmentation. In Scale Space and Variational Methods in Computer Vision, pp. 124–136. External Links: ISBN 978-3-642-02256-2 Cited by: §II-A.
  • [25] T. H. N. Le, U. Prabhu and M. Savvides (2014) A novel eyebrow segmentation and eyebrow shape-based identification. In IEEE International Joint Conference on Biometrics (IJCB), Vol. , pp. 1–8. External Links: Document, ISSN Cited by: §II-B, §II-C.
  • [26] T. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan and S. Belongie (2017-07) Feature pyramid networks for object detection. In IEEE Conference on Computer Vision and Pattern Recognition, Vol. , pp. 936–944. External Links: Document, ISSN 1063-6919 Cited by: §III-B, §V.
  • [27] D. R. Lucio, R. Laroca, E. Severo, A. S. Britto Jr. and D. Menotti (2018-10) Fully convolutional networks and generative adversarial networks applied to sclera segmentation. In IEEE International Conference on Biometrics Theory, Applications and Systems (BTAS), Vol. , pp. 1–7. External Links: Document, ISSN 2474-9699 Cited by: §I.
  • [28] E. Luz, G. Moreira, L. A. Z. Junior and D. Menotti (2018) Deep periocular representation aiming video surveillance. 114, pp. 2–12. External Links: ISSN 0167-8655 Cited by: §I.
  • [29] G. Mahalingam, K. Ricanek and A. M. Albert (2014) Investigating the periocular-based face recognition across gender transformation. IEEE Transactions on Information Forensics and Security 9 (12), pp. 2180–2192. External Links: Document, ISSN 1556-6013 Cited by: §II-B, §II-C.
  • [30] M. D. Marsico, M. Nappi and H. Proença (2017) Results from MICHE II – Mobile Iris CHallenge Evaluation II. Pattern Recognition Letters 91, pp. 3–10. External Links: ISSN 0167-8655, Document Cited by: §I.
  • [31] D. Menotti, G. Chiachia, A. Pinto, W. R. Schwartz, H. Pedrini, A. X. Falcão and A. Rocha (2015-04) Deep representations for iris, face, and fingerprint spoofing detection. IEEE Transactions on Information Forensics and Security 10 (4), pp. 864–879. External Links: Document, ISSN 1556-6013 Cited by: §I.
  • [32] P. R. Nalla and A. Kumar (2017) Toward more accurate iris recognition using cross-spectral matching. IEEE Transactions on Image Processing 26 (1), pp. 208–221. External Links: Document, ISSN 1057-7149 Cited by: §IV-A, §IV-A, TABLE II.
  • [33] I. Nigam, M. Vatsa and R. Singh (2015) Ocular biometrics: a survey of modalities and fusion approaches. Information Fusion 26, pp. 1–35. External Links: ISSN 1566-2535, Document Cited by: §I, §I.
  • [34] U. Park, A. Ross and A. K. Jain (2009-Sep.) Periocular biometrics in the visible spectrum: a feasibility study. In IEEE International Conference on Biometrics: Theory, Applications, and Systems, Vol. , pp. 1–6. External Links: Document, ISSN Cited by: §I.
  • [35] U. Park, R. R. Jillela, A. Ross and A. K. Jain (2011) Periocular biometrics in the visible spectrum. IEEE Transactions on Information Forensics and Security 6 (1), pp. 96–106. External Links: Document, ISBN 9781424450206, ISSN 15566013 Cited by: §II-B, §II-C.
  • [36] J.I. Peláez and J.M. Doña (2006) A majority model in group decision making using QMA–OWA operators. International Journal of Intelligent Systems 21 (2), pp. 193–208. External Links: Document Cited by: §II-A.
  • [37] H. Proença, J. C. Neves and G. Santos (2014) Segmenting the periocular region using a hierarchical graphical model fed by texture/shape information and geometrical constraints. In IEEE International Joint Conference on Biometrics (IJCB), Vol. , pp. 1–7. External Links: Document, ISSN Cited by: §II-B.
  • [38] H. Proenca, S. Filipe, R. Santos, J. Oliveira and L. A. Alexandre (2010) The UBIRIS.v2: a database of visible wavelength iris images captured on-the-move and at-a-distance. IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (8), pp. 1529–1535. External Links: Document, ISSN 0162-8828 Cited by: §IV-A, §IV-A, TABLE II.
  • [39] A. Rattani, R. Derakhshani, S. K. Saripalle and V. Gottemukkula (2016) ICIP 2016 competition on mobile ocular biometric recognition. In IEEE International Conference on Image Processing, pp. 320–324. External Links: Document, ISBN 978-1-4673-9961-6, ISSN 15224880 Cited by: §IV-A, TABLE II.
  • [40] J. Redmon, S. Divvala, R. Girshick and A. Farhadi (2016) You only look once: unified, real-time object detection. In IEEE Conference on Computer Vision and Pattern Recognition (CPVR), Vol. , pp. 779–788. External Links: Document, ISSN Cited by: §III-A.
  • [41] J. Redmon and A. Farhadi (2017) YOLO9000: better, faster, stronger. In IEEE Conference on Computer Vision and Pattern Recognition (CPVR), Vol. , pp. 6517–6525. External Links: Document, ISSN 1063-6919 Cited by: §I, §III-A, §III-A, §III.
  • [42] S. Ren, K. He, R. Girshick and J. Sun (2017-06) Faster R-CNN: towards real-time object detection with region proposal networks. 39 (6), pp. 1137–1149. External Links: Document, ISSN 0162-8828 Cited by: §I, §III-B, §III.
  • [43] J. L. G. Rodríguez and Y. D. Rubio (2005) A new method for iris pupil contour delimitation and its application in iris texture parameter estimation. In Progress in Pattern Recognition, Image Analysis and Applications, pp. 631–641. External Links: ISBN 978-3-540-32242-9 Cited by: §I.
  • [44] G. Santos, E. Grancho, M. V. Bernardo and P. T. Fiadeiro (2015) Fusing iris and periocular information for cross-sensor recognition. Pattern Recognition Letters 57, pp. 52–59. External Links: Document, ISBN 0167-8655, ISSN 01678655 Cited by: §IV-A, TABLE II.
  • [45] A. Sequeira (2016-Sep.) Cross-eyed - cross-spectral iris/periocular recognition database and competition. In International Conference of the Biometrics Special Interest Group (BIOSIG), Vol. , pp. 1–5. External Links: Document, ISSN Cited by: §IV-A, TABLE II.
  • [46] A. F. Sequeira, J. C. Monteiro, A. Rebelo and H. P. Oliveira (2014) MobBIO: A Multimodal Database Captured with a Portable Handheld Device. In International Conference on Computer Vision Theory and Applications (VISAPP), Vol. 3, pp. 133–139. Cited by: §IV-A, TABLE II.
  • [47] E. Severo, R. Laroca, C. S. Bezerra, L. A. Zanlorensi, D. Weingaertner, G. Moreira and D. Menotti (2018-07) A benchmark for iris location and a deep learning detector evaluation. In International Joint Conference on Neural Networks (IJCNN), Vol. , pp. 1–7. External Links: Document, ISSN 2161-4407 Cited by: §II-A, §III-C, §IV-B.
  • [48] P. H. Silva, E. Luz, L. A. Zanlorensi, D. Menotti and G. Moreira (2018-07) Multimodal feature level fusion based on particle swarm optimization with deep transfer learning. In IEEE Congress on Evolutionary Computation (CEC), Vol. , pp. 1–8. External Links: Document, ISSN Cited by: §I, §IV-A.
  • [49] L. Su, J. Wu, Q. Li and Z. Liu (2017-08) Iris location based on regional property and iterative searching. In IEEE International Conference on Mechatronics and Automation (ICMA), Vol. , pp. 1064–1068. External Links: Document, ISSN 2152-744X Cited by: §II-A.
  • [50] C. Tan and A. Kumar (2012-11) Human identification from at-a-distance images by simultaneously exploiting iris and periocular features. In International Conference on Pattern Recognition, Vol. , pp. 553–556. External Links: Document, ISSN 1051-4651 Cited by: §I, §I.
  • [51] P. Viola and M. J. Jones (2004) Robust Real-Time Face Detection. International Journal of Computer Vision (2), pp. 137–154. External Links: Document, ISSN 1573-1405 Cited by: §II-B.
  • [52] C. Wang, Y. Zhu, Y. Liu, R. He and Z. Sun (2019) Joint iris segmentation and localization using deep multi-task learning framework. External Links: Link Cited by: §II-A.
  • [53] F. Wilcoxon, S. Katti and R. A. Wilcox (1970) Critical values and probability levels for the Wilcoxon rank sum test and the Wilcoxon signed rank test. Selected tables in mathematical statistics 1, pp. 171–259. Cited by: §IV-B.
  • [54] L. Xiao, Z. Sun and T. Tan (2012) Fusion of iris and periocular biometrics for cross-sensor identification. In Biometric Recognition, pp. 202–209. External Links: ISBN 978-3-642-35136-5 Cited by: §I.
  • [55] Yong Zhu, Tieniu Tan and Yunhong Wang (2000-Sep.) Biometric personal identification based on iris patterns. In International Conference on Pattern Recognition (ICPR), Vol. 2, pp. 801–804. External Links: ISSN 1051-4651 Cited by: §I.
  • [56] L. A. Zanlorensi, E. Luz, R. Laroca, A. S. Britto Jr., L. S. Oliveira and D. Menotti (2018-10) The impact of preprocessing on deep representations for iris recognition on unconstrained environments. In Conference on Graphics, Patterns and Images (SIBGRAPI), Vol. , pp. 289–296. External Links: Document, ISSN 2377-5416 Cited by: §I, §I, §IV-A.
  • [57] W. Zhang and Y. D. Ma (2014) A new approach for iris localization based on an improved level set method. In International Computer Conference on Wavelet Actiev Media Technology and Information Processing, Vol. , pp. 309–312. External Links: Document, ISSN Cited by: §I, §II-A.
  • [58] Zhejin Wang, Y. Feng and Qinqin Tao (2010-10) Momentum based level set segmentation for complex phase change thermography sequence. In International Conference on Computer Application and System Modeling, Vol. 12, pp. 257–260. External Links: Document, ISSN 2161-9069 Cited by: §II-A.
  • [59] L. Zhou, Y. Ma, J. Lian and Z. Wang (2013) A new effective algorithm for iris location. In IEEE ROBIO, Vol. , pp. 1790–1795. External Links: Document, ISSN Cited by: §I, §II-A.
  • [60] ZhuYu and Wang Cui (2012-08) A rapid iris location algorithm based on embedded. In International Conference on Computer Science and Information Processing (CSIP), Vol. , pp. 233–236. External Links: Document, ISSN Cited by: §II-A.