Coupled Learning for Facial Deblur


Dayong Tian and Dacheng Tao, Fellow, IEEE. ©2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Abstract

Blur in facial images significantly impedes the efficiency of recognition approaches. However, most existing blind deconvolution methods cannot generate satisfactory results, due to their dependence on strong edges, which are sufficient in natural images but not in facial images. In this paper, we represent a point spread function (PSF) by the linear combination of a set of pre-defined orthogonal PSFs, and similarly, an estimated intrinsic sharp face image (EI) is represented by the linear combination of a set of pre-defined orthogonal face images. In doing so, PSF and EI estimation is simplified to discovering two sets of linear combination coefficients, which are simultaneously found by our proposed coupled learning algorithm. To make our method robust to different kinds of blurry face images, we generate several candidate PSFs and EIs for a test image, and then a non-blind deconvolution method is adopted to generate more EIs using those candidate PSFs. Finally, we deploy a blind image quality assessment metric to automatically select the optimal EI. Thorough experiments on the Facial Recognition Technology (FERET) database (http://www.itl.nist.gov/iad/humanid/feret/feret_master.html), the extended Yale Face Database B (http://vision.ucsd.edu/~leekc/ExtYaleDatabase/ExtYaleB.html), the CMU Pose, Illumination, and Expression (PIE) database (https://www.ri.cmu.edu/research_project_detail.html?project_id=418&menu_id=261) and the Face Recognition Grand Challenge (FRGC) database version 2.0 (http://www.nist.gov/itl/iad/ig/frgc.cfm) demonstrate that the proposed approach effectively restores intrinsic sharp face images and, consequently, improves the performance of face recognition.

Coupled Learning, Facial Deblur, Point Spread Function

I Introduction

Facial blur is common in recorded face images. Examples include motion blur caused by the relative movement between the target and the camera, and out-of-focus blur caused by misalignment between the target and the camera focus. It remains challenging to improve the quality of an observed blurred face image (OB) for subsequent use in various applications, including face recognition and editing. A straightforward way to handle an OB is to use blind deconvolution methods [28][11][37][5][36][20] to obtain an estimated intrinsic face image (EI), and then to exploit the EI for subsequent recognition and analysis. The success of blind deconvolution methods designed for natural images relies on strong edges [18], which are relatively rare in most OBs. Therefore, this approach tends to perform poorly [39] and the obtained EIs do not significantly improve subsequent recognition.
Recently, machine learning has been exploited to deconvolute OBs. Liao et al. [19] decomposed an intrinsic sharp face image into the eigen-face subspace and adopted a Gaussian prior to regularize the EI. However, this approach assumed that the point spread function (PSF) has only one varying parameter, such as a Gaussian kernel with variable variance or a horizontal linear motion function of varying length, and fails to restore images blurred by sophisticated PSFs, e.g., a linear motion function with two varying parameters (direction and length).
Nishiyama et al. [24] proposed the FAcial DEblur INference (FADEIN) scheme, which models the PSF estimation procedure as a classification-like problem. By calculating the correlation matrix that encodes the 2D Fourier transform features of all training images blurred by the $i$-th pre-defined PSF, the subspace corresponding to the leading eigenvectors of this correlation matrix is used to model the $i$-th PSF. The PSF corresponding to the subspace closest to the feature of the OB is then exploited to deconvolute the OB. In this way, FADEIN can only model a finite (and, in reality, small) discrete set of PSFs, and fails to model PSFs that are not defined in the training stage.

Fig. 1: Illustration of the drawbacks of the sparse prior. The image is blurred by a Gaussian kernel of standard deviation 0 to 8. The $\ell_1$-norm of the sparse coefficients decreases monotonically as the standard deviation increases. Hence, the sparse prior may lead to a blurry result.

Zhang et al. [39] proposed the Joint Restoration and Recognition (JRR) scheme, which combines restoration and recognition within the framework of sparse learning. JRR assumes that the intrinsic sharp face image of an OB can be represented as a linear combination of face images in the training set, and the coefficients of the linear combination are assumed to be sparse. Although the sparse prior has been proven effective in a wide range of applications, including face recognition [31][38][35][40] and image restoration [8][9][21][33], it is inappropriate for some deconvolution tasks, since, as shown in Figure 1, it may favor blurry images. In this situation, JRR may fail to deconvolute the OB. Furthermore, six parameters need to be empirically tuned, making JRR difficult to use in practice.
Recently, Pan et al. [25] proposed a face image deblurring method based on the contour of faces. The unhelpful edges of a face, such as those around the eyes and eyebrows, are removed first, because these edges have negative effects on the PSF estimation. Then, the PSF is estimated by finding a template for an OB from the training gallery. Rather than exploiting edge information implicitly, as unsupervised methods do, Pan et al. use it explicitly and selectively by filtering the edges. However, this work still depends on edges.
To avoid the aforementioned problems and to conduct high performance face restoration, we cast the facial deblur procedure as a regression-like problem based on two mild assumptions: (1) any PSF can be represented by a linear combination of a set of orthogonal PSFs; and (2) the intrinsic sharp face image of an OB can be represented linearly by a set of orthogonalized sharp face images.

Fig. 2: The framework of our method.

Under the above two assumptions, we develop a coupled learning algorithm (shown in Figure 2) to simultaneously calculate all possible PSFs and EIs by discovering the coefficients of two associated linear combinations, in which each PSF corresponds to a particular EI. Empirically, the EIs obtained in this way are far from satisfactory, for the following two reasons. First, the dissimilarities between the training sharp face images and the intrinsic sharp face image of the OB result in reconstruction errors. Second, the parameter space of the EI has tens of thousands of dimensions, and thus obtaining a high-quality EI requires a large number of training sharp images. By contrast, the parameter space of the PSF is much smaller and the PSF can be estimated precisely given a small training set. We therefore generate a sequence of PSFs and use a classical non-blind deconvolution method [4] to deblur the OB and generate candidate results. Lastly, a blind image quality assessment (BIQA) method [23] is adopted to automatically select the best EI, which corresponds to a particular PSF.
In contrast to conventional face recognition schemes, which consist of a face representation stage and a face matching stage [6], we propose a new recognition method based on our deblurring procedure. Intuitively, when the sharp face images used for training have the same identity as the OB, the resulting EI is of high quality, because these sharp face images are more similar to the intrinsic sharp face image of the OB. We therefore deblur an OB using each of the sets of sharp face images, where each set only contains sharp face images of one identity. The identity of the set that produces the best deblurring result is assigned to the OB. In this way, the proposed method simultaneously deblurs and recognizes the OB.
This paper is organized as follows. The proposed deconvolution method is described in Section II. In Section III, we show how to reduce the computational costs for symmetric PSFs. In Section IV, we demonstrate the effectiveness of the proposed deconvolution and recognition methods. We conclude the paper in Section V.

II On Combining Coupled Learning and BIQA for Facial Deblur

The proposed scheme for facial deblur, shown in Figure 2, is comprised of four major steps: (1) codebook construction; (2) coefficient computation; (3) candidate PSF construction and candidate results generation; and (4) the BIQA-based best candidate result selection. This section shows the motivations and details of each step.
In general, the relationship between an OB $b$, its intrinsic sharp face image $x$, and the corresponding PSF $h$ is modeled as

$b = h \otimes x + n,$    (1)

where $n$ is the additive noise and $\otimes$ is the convolution operation. We assume $x$ can be represented by a linear combination of a set of bases $\{u_i\}_{i=1}^{m}$, and $h$ can be represented by a linear combination of a set of functions $\{f_j\}_{j=1}^{k}$. Hence, we have

$x = \sum_{i=1}^{m} \alpha_i u_i, \quad h = \sum_{j=1}^{k} \beta_j f_j,$    (2)

where the $\alpha_i$'s and $\beta_j$'s are coefficients of the linear combinations. Let $\mathrm{vec}(\cdot)$ be the vectorization operation. Let the $((i-1)k+j)$-th column of matrix $A$ be $\mathrm{vec}(u_i \otimes f_j)$ and the $((i-1)k+j)$-th element of vector $\gamma$ be $\alpha_i \beta_j$. Equation (1) can be rewritten as

$\mathrm{vec}(b) = A\gamma + \mathrm{vec}(n),$    (3)

where $\gamma_{(i-1)k+j} = \alpha_i \beta_j$. In our approach, $\{u_i\}$ and $\{f_j\}$ are predefined. Hence, the remaining problems are calculating $\gamma$ in Equation (3) and calculating $\alpha$ and $\beta$ from $\gamma$.
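As a sanity check on this formulation, convolution is bilinear, so blurring a combined face with a combined PSF equals $A\gamma$. A minimal sketch with toy random bases (all sizes and names below are illustrative, not from the paper):

```python
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)
m, k, H, W = 3, 2, 16, 16                             # face bases, PSF bases, image size
U = [rng.standard_normal((H, W)) for _ in range(m)]   # toy face bases u_i
F = [rng.standard_normal((5, 5)) for _ in range(k)]   # toy PSF bases f_j

# Column (i*k + j) of A is vec(u_i convolved with f_j)
A = np.column_stack([fftconvolve(u, f, mode="same").ravel()
                     for u in U for f in F])

alpha = rng.standard_normal(m)
beta = rng.standard_normal(k)
gamma = np.outer(alpha, beta).ravel()                 # gamma_{(i-1)k+j} = alpha_i * beta_j

# Blur the combined face with the combined PSF: b = h (*) x
x = sum(a * u for a, u in zip(alpha, U))
h = sum(b_ * f for b_, f in zip(beta, F))
b = fftconvolve(x, h, mode="same")

# Convolution is bilinear, so vec(b) == A @ gamma (up to float error)
assert np.allclose(b.ravel(), A @ gamma)
```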

II-A Construct $A$

Given a set of sharp face images $\{x_1, x_2, \ldots, x_N\}$, we use the first $m$ left singular vectors (i.e., those corresponding to the largest singular values) of matrix $X$ as $\{u_i\}_{i=1}^{m}$, where $X$ is defined as:

$X = [\mathrm{vec}(x_1), \mathrm{vec}(x_2), \ldots, \mathrm{vec}(x_N)].$    (4)

To represent the PSFs, we use a set of orthogonal functions, called function bases, satisfying

$\iint_{\Omega} f_{j}(s,t)\, f_{j'}(s,t)\, ds\, dt = \delta_{jj'}.$    (5)

If the first $k$ function bases are used, then we let the $((i-1)k+j)$-th column of $A$ be $\mathrm{vec}(u_i \otimes f_j)$. Hence, we have

$A = [\mathrm{vec}(u_1 \otimes f_1), \mathrm{vec}(u_1 \otimes f_2), \ldots, \mathrm{vec}(u_m \otimes f_k)].$    (6)

II-B Calculate $\gamma$

An intuitive way to calculate $\gamma$ from Equation (3) is by minimizing

$\min_{\gamma} \|A\gamma - \mathrm{vec}(b)\|_2^2 + \lambda R(\gamma),$    (7)

where $R(\gamma)$ is a regularization on $\gamma$. Here, we do not impose any regularization on $\gamma$, i.e., $R(\gamma) = 0$. Hence, minimizing Equation (7) is equivalent to solving

$\min_{\gamma} \|A\gamma - \mathrm{vec}(b)\|_2^2.$    (8)

Equation (8) has the closed form solution

$\gamma = (A^{\top}A)^{-1} A^{\top}\,\mathrm{vec}(b),$    (9)

where $A^{\top}$ is the transpose of $A$. Alternatively, the conjugate gradient descent method [14] can be used to solve

$A^{\top}A\gamma = A^{\top}\,\mathrm{vec}(b).$    (10)

Solving Equation (10) using the conjugate gradient descent method is much quicker than directly computing the inverse matrix of $A^{\top}A$.
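The normal-equations route of Eq. (10) can be sketched with a plain conjugate gradient solver; the system sizes below are toy values for illustration:

```python
import numpy as np

def cg_solve(M, y, iters=200, tol=1e-12):
    """Plain conjugate gradients for the SPD system M gamma = y."""
    gamma = np.zeros_like(y)
    r = y - M @ gamma
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Mp = M @ p
        a = rs / (p @ Mp)
        gamma += a * p
        r -= a * Mp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return gamma

rng = np.random.default_rng(1)
A = rng.standard_normal((400, 30))      # tall system: pixels x (m*k) unknowns
b = rng.standard_normal(400)

gamma_cg = cg_solve(A.T @ A, A.T @ b)              # Eq. (10), iterative
gamma_closed = np.linalg.solve(A.T @ A, A.T @ b)   # Eq. (9), explicit route
assert np.allclose(gamma_cg, gamma_closed, atol=1e-8)
```

For the small coefficient vectors used here, CG converges in at most as many iterations as there are unknowns and avoids forming the inverse explicitly.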

II-C Calculate $\alpha$ and $\beta$

Once $\gamma$ is found, $\alpha$ and $\beta$ can be calculated by solving the following nonlinear equation system:

$\alpha_i \beta_j = \gamma_{(i-1)k+j}, \quad i = 1, \ldots, m, \; j = 1, \ldots, k.$    (11)

Equation (11) has $m + k$ unknown variables and $mk$ equations, and is usually an over-determined system (i.e., $mk > m + k$). Hence, it can be solved by minimizing

$\min_{\alpha, \beta} \sum_{i=1}^{m} \sum_{j=1}^{k} \left(\alpha_i \beta_j - \gamma_{(i-1)k+j}\right)^2.$    (12)

Unlike linear over-determined equation systems, where the solution is unique when the rank of the coefficient matrix equals the number of columns, Equation (12) may have multiple solutions. For example, if $\alpha$ and $\beta$ are solutions of Equation (12), $c\alpha$ and $\beta/c$ will also be solutions, where $c$ is a non-zero constant. Therefore, prior knowledge, regularizations, or constraints are required.
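Without any constraints, minimizing Eq. (12) amounts to a best rank-one approximation of the matrix obtained by reshaping $\gamma$ into an $m \times k$ array, which exposes exactly the scale ambiguity just described. A toy sketch (not the paper's constrained solver):

```python
import numpy as np

rng = np.random.default_rng(2)
m, k = 5, 4
alpha_true = rng.standard_normal(m)
beta_true = rng.standard_normal(k)
gamma = np.outer(alpha_true, beta_true).ravel()   # gamma_{(i-1)k+j} = alpha_i * beta_j

# Unconstrained Eq. (12) = best rank-1 approximation of the reshaped gamma,
# obtained from the leading singular pair.
Gamma = gamma.reshape(m, k)
U, s, Vt = np.linalg.svd(Gamma)
alpha_hat = U[:, 0] * s[0]
beta_hat = Vt[0]
assert np.allclose(np.outer(alpha_hat, beta_hat), Gamma)

# The factors are recovered only up to a non-zero scale c:
# (c * alpha, beta / c) fits Eq. (12) equally well.
i = int(np.argmax(np.abs(alpha_true)))
c = alpha_true[i] / alpha_hat[i]
assert np.allclose(alpha_hat * c, alpha_true)
assert np.allclose(beta_hat / c, beta_true)
```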

Fig. 3: The difference between the projections of an image and of its blurred counterpart. 90% of the images in the FERET dataset are used to construct $X$, and the first 10 projections of a test image from the remaining 10% and of its counterpart blurred by a Gaussian PSF with standard deviation 2 are shown here. The projections of these two images on the first left singular vector are very close to each other; the difference is approximately 0.1%.

We observe that the projections of $b$ on $\{u_i\}$ (i.e., the $\langle \mathrm{vec}(b), u_i \rangle$'s) are similar to those of $x$ (Figure 3). Hence, the following term can be added into Equation (12):

$\left(\langle \mathrm{vec}(b), u_1 \rangle - \alpha_1\right)^2,$    (13)

where $\langle \cdot, \cdot \rangle$ is the inner product, i.e., the projection of $\mathrm{vec}(b)$ on $u_1$.
It is desirable that the averages of the local windows of an OB should not be altered too much by an estimated PSF. Therefore, its integration should be equal to 1, that is,

$\iint_{\Omega} h(s,t)\, ds\, dt = 1,$    (14)

where $\Omega$ is the domain in which the PSF is defined. The PSF should also be positive, that is,

$h(s,t) \ge 0, \quad (s,t) \in \Omega.$    (15)

In conclusion, the proposed minimization problem is

$\min_{\alpha, \beta} \sum_{i=1}^{m} \sum_{j=1}^{k} \left(\alpha_i \beta_j - \gamma_{(i-1)k+j}\right)^2 + \lambda \left(\langle \mathrm{vec}(b), u_1 \rangle - \alpha_1\right)^2, \quad \text{s.t. Eqs. (14) and (15)}.$    (16)

As SVD is adopted to generate the face bases, we only use the projection on the first left singular vector, which corresponds to the largest singular value, as a regularization in the objective function of our optimization problem. Problem (16) can be solved using the augmented Lagrange multiplier method [15].

II-D Generate candidate results

Since reconstruction errors exist, the $\alpha$'s calculated by the procedure above usually cannot reconstruct satisfactory results, especially when the identity of the OB is not included in the training set. However, as stated in the introduction, a PSF is much easier to estimate than its corresponding EI. Setting $m$ to a certain value, one can obtain a corresponding PSF. We therefore generate a sequence of PSFs by setting different $m$'s and use a classical non-blind deconvolution method [4] to deblur the OB and generate candidate results.
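The rule for picking the number of face bases $m$ from the cumulative singular-value energy (the energy levels are those later used in Section IV-B) can be sketched as follows; the singular-value spectrum is a toy example:

```python
import numpy as np

def choose_m(singular_values, fraction):
    """Smallest m whose leading singular values hold `fraction` of the energy."""
    energy = np.cumsum(singular_values**2)
    energy /= energy[-1]
    return int(np.searchsorted(energy, fraction) + 1)

s = np.array([10.0, 6.0, 3.0, 1.5, 0.8, 0.4])    # toy singular-value spectrum
fractions = [0.60, 0.70, 0.80, 0.90, 0.95]       # cumulative energy levels
ms = [choose_m(s, f) for f in fractions]
# Each m yields one candidate PSF; non-blind deconvolution of the OB with
# that PSF then yields one candidate EI.
```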

Fig. 4: Typical results and their corresponding scores. -2 denotes a failure, 0 is a blurry result, and 2 is a sharp image.

II-E Assessing candidate results

BIQA across different images is a challenging task. However, blindly assessing the qualities of images distorted in different ways from a single image is much easier. Among various BIQA methods, the BRISQUE-L [23] feature is used here to select the best candidate PSF, due to its robustness and computational efficiency.
BRISQUE pre-processes an image by local mean removal and contrast normalization to generate mean subtracted contrast normalized (MSCN) coefficients, which are fitted by an Asymmetric Generalized Gaussian Distribution (AGGD) in [22]. The parameters of the AGGD are estimated to form an 18-dimensional feature. This procedure is performed on two scales, resulting in a 36-dimensional feature. To robustify BRISQUE, BRISQUE-L introduces L-moments, which are closely related to L-estimators and extensively used in robust image filtering theory [3]. BRISQUE-L is also extracted at two scales and is 36-dimensional in total.
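A minimal sketch of the MSCN pre-processing step (the full BRISQUE pipeline additionally fits an AGGD to these coefficients and to products of neighboring coefficients, at two scales; the Gaussian window width and stabilizing constant below are values commonly used in BRISQUE implementations, stated here as assumptions):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn(image, sigma=7/6):
    """Mean subtracted contrast normalized coefficients (BRISQUE pre-processing)."""
    mu = gaussian_filter(image, sigma)              # local mean
    var = gaussian_filter(image**2, sigma) - mu**2
    sd = np.sqrt(np.clip(var, 0, None))             # local contrast
    return (image - mu) / (sd + 1.0)                # constant 1 stabilizes division

rng = np.random.default_rng(3)
img = rng.random((64, 64))
coeffs = mscn(img)
# MSCN coefficients are roughly zero-mean; the AGGD fit parameters of such
# coefficients form the BRISQUE feature.
```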

II-F Simultaneous restoration and recognition

A subset of candidate results is manually evaluated at five levels, as shown in Figure 4. Finally, support vector regression (SVR) is trained to automatically evaluate the quality of the remaining candidate results.
Since a BRISQUE-L feature contains two parts (18 elements in each part), a multiple kernel learning (MKL) method [13] needs to be adopted. Of the various MKL algorithms, the SMO-MKL approach [32] has been shown to be efficient and effective across a wide range of applications, and its source code is available online (http://research.microsoft.com/en-us/um/people/manik/code/SMO-MKL/download.html). We therefore use it here with five Gaussian kernels for each part and five Gaussian kernels for the whole feature, that is, 15 kernels in total.
Let a BRISQUE-L feature be $v$ and let $v_{a:b}$ be the vector comprised of the $a$-th to $b$-th elements of $v$. The standard deviation $\sigma$ of each Gaussian kernel is chosen from a pre-defined set of five values. Hence, the fifteen Gaussian kernels are:

$K_{1,\sigma}(v, v') = \exp\!\left(-\frac{\|v_{1:18} - v'_{1:18}\|^2}{2\sigma^2}\right),\; K_{2,\sigma}(v, v') = \exp\!\left(-\frac{\|v_{19:36} - v'_{19:36}\|^2}{2\sigma^2}\right),\; K_{3,\sigma}(v, v') = \exp\!\left(-\frac{\|v - v'\|^2}{2\sigma^2}\right).$    (17)
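The fifteen base kernels of Eq. (17) can be sketched directly; the bandwidth values below are illustrative assumptions, since the paper's exact set is not reproduced here:

```python
import numpy as np

def gaussian_kernel(v, vp, sigma):
    return np.exp(-np.sum((v - vp)**2) / (2.0 * sigma**2))

def base_kernels(v, vp, sigmas):
    """Fifteen Gaussian base kernels: five per 18-dim half, five on the whole."""
    parts = [(0, 18), (18, 36), (0, 36)]            # first part, second part, full
    return np.array([gaussian_kernel(v[a:b], vp[a:b], s)
                     for a, b in parts for s in sigmas])

rng = np.random.default_rng(4)
v, vp = rng.standard_normal(36), rng.standard_normal(36)
sigmas = [0.5, 1.0, 2.0, 4.0, 8.0]                  # illustrative bandwidths
K = base_kernels(v, vp, sigmas)
```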

The primal problem of multiple kernel learning for SVR is

$\min_{w_t, b_0, d \ge 0, \xi, \xi^*} \; \frac{1}{2}\sum_{t} \frac{\|w_t\|^2}{d_t} + C\sum_{i}\left(\xi_i + \xi_i^*\right) + \frac{\mu}{2}\sum_{t} d_t^2 \quad \text{s.t.} \; y_i - \sum_t w_t^{\top}\phi_t(v_i) - b_0 \le \epsilon + \xi_i, \; \sum_t w_t^{\top}\phi_t(v_i) + b_0 - y_i \le \epsilon + \xi_i^*, \; \xi_i, \xi_i^* \ge 0,$    (18)

where $w_t$ is the support vector and $d_t$ is the kernel weight of the linear combination of base kernels $K_t$. $\xi_i$ and $\xi_i^*$ are slack variables allowing for errors around the regression function. $C$, $\epsilon$, and $\mu$ are positive constants set empirically. Introducing Lagrange multipliers $\eta_i$ on the constraints corresponding to $\xi_i$ and $\eta_i^*$ on the constraints corresponding to $\xi_i^*$, the dual problem of Eq. (18) is

$\max_{\eta, \eta^*} \; \mathbf{1}^{\top} Y (\eta - \eta^*) - \epsilon\, \mathbf{1}^{\top}(\eta + \eta^*) - \frac{1}{8\mu}\sum_{t}\left((\eta - \eta^*)^{\top} K_t (\eta - \eta^*)\right)^2,$    (19)

where $Y$ is a diagonal matrix whose elements are the scores $y_i$ of the training samples. $\eta$ and $\eta^*$ satisfy

$\mathbf{1}^{\top}(\eta - \eta^*) = 0, \quad 0 \le \eta_i, \eta_i^* \le C.$    (20)

Sequential Minimal Optimization (SMO) [27][10][16] is used to solve Eq. (19). The proposed recognition method is based on the SVR outputs. We construct an $A$ for each identity, which is then used to deblur each OB. Lastly, the identity of the $A$ that produces the best deblurring result is assigned to the OB. This simple manipulation produces deblurring and recognition results simultaneously.
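The simultaneous deblur-and-recognize loop can be sketched generically; `deblur` and `score` below are hypothetical stand-ins for the coupled-learning restoration and the SVR quality score, and the toy galleries exist only to make the sketch runnable:

```python
import numpy as np

def recognize(ob, identity_galleries, deblur, score):
    """Deblur `ob` with each identity's gallery; the best-scoring EI names the OB."""
    best_id, best_score, best_ei = None, -np.inf, None
    for identity, gallery in identity_galleries.items():
        ei = deblur(ob, gallery)     # per-identity restoration
        s = score(ei)                # image-quality score of the candidate EI
        if s > best_score:
            best_id, best_score, best_ei = identity, s, ei
    return best_id, best_ei

# Toy stand-ins: "deblurring" just averages the gallery, and the score is the
# negative distance to the OB, so the closest identity wins.
galleries = {"A": [np.full((4, 4), 0.2)], "B": [np.full((4, 4), 0.8)]}
ob = np.full((4, 4), 0.75)
ident, _ = recognize(ob, galleries,
                     deblur=lambda o, g: np.mean(g, axis=0),
                     score=lambda ei: -np.abs(ei - ob).sum())
assert ident == "B"
```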

III Efficient Implementation for Symmetric PSF

From a theoretical point of view, our algorithm can inherently handle any kind of PSF, since any bounded function can be represented by a linear combination of a complete set of orthogonal functions.
We mainly focus on three types of blur: out-of-focus blur (approximated by a Gaussian kernel), linear motion blur, and a combination of the two. Here, we show how to reduce computational costs by considering the symmetry of these three types of blur. The aim is to reduce the number of function bases, i.e., $k$.

Fig. 5: The symmetry of PSFs. The top row illustrates three PSFs: a Gaussian kernel, a linear motion kernel, and the combination of both. The bottom row illustrates five function bases. The function bases $f_{p,q}$ with odd $p$ and $q$ are symmetric about the two axes and can be used to represent the three symmetric PSFs.

To simplify our analysis, we assume that the PSFs of the linear motion blur and the combined blur only have four directions: $0°$, $45°$, $90°$, and $135°$. Considering the symmetry of $0°$-PSFs (Figure 5), only the $f_{p,q}$'s that are symmetric about the horizontal and vertical axes (i.e., those with odd $p$ and $q$) can be used to represent such PSFs. Since the symmetric axes of both $0°$-PSFs and $90°$-PSFs are the horizontal and vertical axes, they can be represented by a common set of function bases. By rotating this set of function bases by $45°$ clockwise or counter-clockwise, the rotated function basis set can reconstruct $45°$-PSFs and $135°$-PSFs.
Therefore, if the directions of the PSFs of OBs can be estimated, the OBs can be divided into two groups: Group I ($0°$- and $90°$-PSFs) and Group II ($45°$- and $135°$-PSFs). By constructing two different $A$'s using the two function basis sets, respectively, an OB can be deconvoluted using the method proposed in Section II. The only remaining problem is how to efficiently estimate the direction of the PSFs.

Fig. 6: Illustration of the proposed PSF direction estimation algorithm. A combination of a linear motion and a Gaussian PSF is located at the center of the graph. Candidate $h'$s with different directions $\theta$ are located around the PSF according to their direction. The numbers are the maximum values of $\rho$ in each direction.

III-A PSF direction estimation

Ignoring additive noise and taking the Fourier transform of both sides of Equation (1), we get

$\mathcal{F}(b) = \mathcal{F}(h)\,\mathcal{F}(x),$    (21)

where $\mathcal{F}$ denotes the Fourier transform. Given another PSF $h'$, it can be deduced that

$\rho(h') = \frac{\left\langle \left|\mathcal{F}(h' \otimes x)\right|, \left|\mathcal{F}(b)\right| \right\rangle}{\left\| \left|\mathcal{F}(h' \otimes x)\right| \right\| \cdot \left\| \left|\mathcal{F}(b)\right| \right\|} \le 1.$    (22)

In inequality (22), the equality holds if and only if $h' = h$. Inspired by the correlation-based shape alignment method [30], we try several $h'$s that are similar to $h$ and find the $h'$ whose direction $\theta$ maximizes $\rho$ (Fig. 6). Here, we use the following anisotropic Gaussian function as $h'$:

$h'(s,t) \propto \exp\!\left(-\frac{s'^2}{\sigma_1^2} - \frac{t'^2}{\sigma_2^2}\right),$    (23)

where $s' = s\cos\theta + t\sin\theta$ and $t' = -s\sin\theta + t\cos\theta$.
The proposed direction estimation method is shown in Algorithm 1. Although we neglect the additive noise in developing our direction estimation algorithm, the algorithm turns out to be effective when noise exists (Subsection IV-A). A reasonable explanation is that noise is random and directionless, and hence has little effect on direction estimation.

1:Input: OB $b$, direction set $\Theta = \{0°, 45°, 90°, 135°\}$, candidate PSFs $\{h'_{\theta,l}\}$ for each direction; $\rho_{\max} \leftarrow 0$
2:for each $\theta \in \Theta$ do
3:     for each candidate $h'_{\theta,l}$ do
4:         if $\rho(h'_{\theta,l}) > \rho_{\max}$ then
5:              $\rho_{\max} \leftarrow \rho(h'_{\theta,l})$
6:              $\theta^{*} \leftarrow \theta$
7:         end if
8:     end for
9:end for
10:Output: estimated direction $\theta^{*}$
Algorithm 1   Direction Estimation Method
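A toy frequency-domain version of the direction test above: blur a zero-mean random image with a 45° motion kernel, then correlate Fourier magnitudes against anisotropic Gaussian candidates at the four directions. The kernel shapes, sizes, and bandwidths are illustrative assumptions, not the paper's settings:

```python
import numpy as np
from scipy.signal import fftconvolve

def motion_kernel(length, theta, size=15):
    """Linear motion PSF of the given length and direction (degrees)."""
    k = np.zeros((size, size))
    c, t = size // 2, np.deg2rad(theta)
    for r in np.linspace(-length / 2, length / 2, 4 * length):
        k[int(round(c + r * np.sin(t))), int(round(c + r * np.cos(t)))] = 1.0
    return k / k.sum()

def aniso_gauss(theta, size=15, s1=4.0, s2=1.0):
    """Anisotropic Gaussian candidate h' elongated along direction theta."""
    t = np.deg2rad(theta)
    y, x = np.mgrid[-(size // 2):size // 2 + 1, -(size // 2):size // 2 + 1]
    xr = x * np.cos(t) + y * np.sin(t)      # coordinate along theta
    yr = -x * np.sin(t) + y * np.cos(t)     # coordinate across theta
    g = np.exp(-(xr**2 / s1**2 + yr**2 / s2**2))
    return g / g.sum()

def rho(b, h):
    """Normalized correlation of Fourier magnitudes (cf. Eq. (22))."""
    B, H = np.abs(np.fft.fft2(b)), np.abs(np.fft.fft2(h, b.shape))
    return float((B * H).sum() / (np.linalg.norm(B) * np.linalg.norm(H)))

rng = np.random.default_rng(5)
x = rng.random((64, 64))
x -= x.mean()                               # remove DC so it does not dominate
b = fftconvolve(x, motion_kernel(9, 45), mode="same")

directions = [0, 45, 90, 135]
scores = [rho(b, aniso_gauss(th)) for th in directions]
```

The candidate whose spectral ridge aligns with the blur's ridge scores highest, so the arg-max over `scores` recovers the motion direction.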

IV Experiments

The aim of the following experiments is three-fold: (1) to verify the accuracy of the proposed direction estimation method; (2) to validate the deconvolution method without recognition; and (3) to test the proposed recognition method.
We conducted experiments on the FERET [26], CMU-PIE [29], extended Yale B [12], and FRGC 2.0 datasets, and compared our method with FADEIN [24], the method of Krishnan et al. [17], JRR [39], and the method of Pan et al. [25].
Dataset description. For FERET, we used the gallery (FA set) containing 1,196 images of 1,196 subjects. For CMU-PIE, there are 67 identities and we collected the images named "27_**.jpg" under the "illum" folder of each identity. The file name "27_**.jpg" implies the photo was taken in frontal view. For the extended Yale B, we directly applied the images in "Cropped Yale" available at the official website. This subset contains 38 identities, with 64 photos per identity taken from the frontal view under different illumination conditions. For FRGC 2.0, we collected 2000 controlled still images and 500 uncontrolled still images of 200 subjects from both the training and validation partitions. Please note that only neutral-expression faces were collected, and a few uncontrolled still images contain no obvious blur.
Evaluation method. There are three widely used indexes to evaluate image quality: Peak Signal-to-Noise Ratio (PSNR), Universal Image Quality Index (UIQI) [34] and Structural Similarity Index (SSIM) [34]. PSNR is based on the mean square error between two images, which is easy to calculate but less consistent with the human visual system [2] than indexes such as UIQI and SSIM. Hence, SSIM, which is an improvement of UIQI, is adopted here to evaluate our deblurring method.
The SSIM has two inputs: the sharp image and its synthetically blurred counterpart. To calculate SSIM, two means ($\mu_x$, $\mu_y$), two standard deviations ($\sigma_x$, $\sigma_y$) and one covariance ($\sigma_{xy}$) are computed on each local window of the two images, as in Eq. (24):

$\mathrm{SSIM}(w_x, w_y) = \frac{\left(2\mu_x\mu_y + C_1\right)\left(2\sigma_{xy} + C_2\right)}{\left(\mu_x^2 + \mu_y^2 + C_1\right)\left(\sigma_x^2 + \sigma_y^2 + C_2\right)},$    (24)

where $w_x$ and $w_y$ represent local windows on the sharp image and the blurred image, respectively, $C_1$ and $C_2$ are two constants, and the statistics are computed over the $N$ pixels of each local window. The overall SSIM index is the mean of $\mathrm{SSIM}(w_x, w_y)$ over all local windows. In our experiments, the mean ($\mu$) and standard deviation ($\sigma$) of SSIM are given for every dataset.
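Eq. (24) for a single pair of windows can be sketched directly; the constants below are the values from the original SSIM paper for 8-bit images ($K_1 = 0.01$, $K_2 = 0.03$, $L = 255$):

```python
import numpy as np

def ssim_window(wx, wy, C1=6.5025, C2=58.5225):
    """SSIM of one pair of local windows, Eq. (24)."""
    mx, my = wx.mean(), wy.mean()
    vx, vy = wx.var(), wy.var()
    cov = ((wx - mx) * (wy - my)).mean()
    return ((2 * mx * my + C1) * (2 * cov + C2)) / \
           ((mx**2 + my**2 + C1) * (vx + vy + C2))

rng = np.random.default_rng(6)
w = rng.random((8, 8)) * 255
assert np.isclose(ssim_window(w, w), 1.0)           # identical windows -> SSIM 1
assert ssim_window(w, 255 - w) < ssim_window(w, w)  # mismatched windows score lower
```

The overall index then averages this quantity over all local windows of the image pair.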
Global settings. Throughout our experiments, 30 dB additive white noise is added to the test images. We use the default settings of Chan et al.'s non-blind deconvolution method [4]. Its code is available online (http://videoprocessing.ucsd.edu/~stanleychan/deconvtv).

Fig. 7: Results of experiments on the subset of FERET dataset. (a) Test images and their PSFs; (b) results of the proposed method; (c) results using the method proposed by Krishnan et al. [17]; (d) results of FADEIN [24] and (e) results using the method proposed by Pan et al. [25]
PSF μ(Combine) μ(Gaussian) μ(Motion) σ(Combine) σ(Gaussian) σ(Motion)
FADEIN [24] 0.8808 0.9290 0.9053 0.0100 0.0110 0.0123
Krishnan et al. [17] 0.7730 0.8866 0.7177 0.0083 0.0184 0.0324
Pan et al. [25] 0.8854 0.9211 0.9123 0.0101 0.0103 0.0056
Ours 0.8939 0.9214 0.9158 0.0100 0.0098 0.0043
TABLE I: SSIM Index on FERET

IV-A PSF direction estimation

We randomly selected 1000 images from the three subsets of FERET, CMU-PIE, and extended Yale B. The widths of the linear motion kernels were chosen as 3, 5, 7, and 9, and the standard deviations of the Gaussian kernels as 1, 3, 5, 7, and 9. Hence, 20,000 images were generated for testing the proposed direction estimation method. For each combination of these two types of kernels, the rate of correctly estimating the PSF direction was recorded and shown to be 97%. Since the direction estimation is imperfect, some OBs will be miscategorized. Hence, OBs in which all candidate results have low scores, say less than 0, should be re-categorized into the other group.

Fig. 8: Results of experiments on the subset of CMU-PIE dataset. (a) Test images and their PSFs; (b) results of the proposed method; (c) results using the method proposed by Krishnan et al. [17]; (d) results of FADEIN [24] and (e) results using the method proposed by Pan et al. [25]
Fig. 9: Results of experiments on the subset of extended Yale B dataset. (a) Test images and their PSFs; (b) results of the proposed method; (c) results using the method proposed by Krishnan et al. [17]; (d) results of FADEIN [24] and (e) results using the method proposed by Pan et al. [25]
PSF μ(Combine) μ(Gaussian) μ(Motion) σ(Combine) σ(Gaussian) σ(Motion)
FADEIN [24] 0.7599 0.7909 0.8200 0.0171 0.0174 0.0224
Krishnan et al. [17] 0.6398 0.7705 0.6579 0.0212 0.0171 0.0463
Pan et al. [25] 0.7947 0.8448 0.8033 0.0144 0.0189 0.0151
Ours 0.8068 0.8780 0.8386 0.0131 0.0160 0.0063
TABLE II: SSIM Index on CMU-PIE
PSF μ(Combine) μ(Gaussian) μ(Motion) σ(Combine) σ(Gaussian) σ(Motion)
FADEIN [24] 0.8458 0.8537 0.8537 0.0349 0.0547 0.0547
Krishnan et al. [17] 0.6906 0.8822 0.6566 0.0663 0.0297 0.0693
Pan et al. [25] 0.8521 0.8992 0.8841 0.0334 0.0399 0.0372
Ours 0.8998 0.9167 0.9020 0.0384 0.0516 0.0221
TABLE III: SSIM Index on Extended Yale B

IV-B Facial deblur

Experiments in this subsection were conducted on FERET. We used 90% of the images to generate the matrix $X$. The remaining 10% of the images were treated as OBs and blurred by a Gaussian kernel of standard deviation 2, a linear motion kernel of length 15 at a fixed direction, or the combination of the two. The first 9 odd-order orthogonal functions were used to construct $A$. We generated nine candidate results for each OB by setting nine $m$'s, chosen such that the cumulative energy content of the first $m$ eigenvectors occupied 60%, 70%, 80%, 90%, 91%, 92%, 93%, 94% and 95% of the total energy, respectively. Candidate results of 10 randomly-selected OBs were used to train the SVR. FADEIN [24] and the method of Krishnan et al. [17] were implemented and examined under the author-suggested settings to guarantee a fair comparison. The final results are shown in Figure 7. The method of Krishnan et al. did not perform well on this dataset, because of its requirement for strong edges. Since FADEIN is a classification-like scheme, its deconvolution results are similar to ours if the PSFs of the OBs are pre-defined in the training set (the first row). When the PSFs are not pre-defined in the training set (the second and third rows), it tends to give the closest pre-defined PSFs. Hence, the results are somewhat degraded.

Fig. 10: Results of experiments on the subset of CMU-PIE dataset. (a) Test images and their PSFs; (b) results of the proposed method; (c) results using the method proposed by Krishnan et al. [17] and (d) results using the method proposed by Pan et al. [25]
Fig. 11: Results of experiments on the subset of extended Yale B dataset. (a) Test images and their PSFs; (b) results of the proposed method; (c) results using the method proposed by Krishnan et al. [17] and (d) results using the method proposed by Pan et al. [25]

IV-C Simultaneous facial deblur and recognition

The proposed method was compared with FADEIN [24] and JRR [39] on the subset of CMU-PIE and the subset of the extended Yale B. In contrast to FERET, the face images in these two subsets were taken under different illumination conditions.
For the experiments on each subset, we used the first 50% of the images of each identity for training and the remaining 50% for testing. All test images were blurred by a Gaussian kernel of standard deviation 3, a linear motion kernel of length 15 at a fixed direction, or the combination of both. All these blurred images were collected together as OBs. The first 9 odd-order orthogonal functions were used to construct $A$. The candidate results were generated by setting $m$'s in a similar way to that given in Subsection IV-B, but note that $m$ was occasionally zero, because only tens of sharp face images were available to construct the matrix $X$ in each procedure of generating candidate results. We started the procedure with the smallest non-zero $m$.
For fair comparison, we used exactly the same setting for our method on each dataset. According to [24], FADEIN adopted the local phase quantization (LPQ) [1] method for recognition. We carefully tuned the parameters of JRR to ensure it reached its best performance.
The deconvolution results on CMU-PIE are shown in Figure 8 and the corresponding recognition rates are listed in Table IV. The deconvolution and recognition results on extended Yale B are shown in Fig. 9 and Table V, respectively. The method of Krishnan et al. [17] failed to provide satisfactory deconvolution results. Due to the complex illumination conditions, FADEIN [24] usually cannot estimate the PSFs precisely and therefore performed poorly on recognition. The deconvolution results of JRR [39] are not shown here, because the authors explained that the deconvolution procedure in JRR was not designed for human visual perception [39]. It is thus unfair to compare the deconvolution results of JRR with those of other methods, but we note that JRR performed poorly in terms of visual appearance. It can be concluded that the proposed method not only gives satisfactory deconvolution results, but also significantly boosts recognition performance.

Since recognition is based on image quality, our proposed method does not require compensation for illumination. However, due to the complex illumination conditions in some images, the proposed method may fail to deconvolute in some cases. Also, since Equation (16) has multiple local minima, erroneous recognition can result, because the correct PSF coefficients $\beta$ can be obtained even when the face coefficients $\alpha$ are wrong. Even though the probability of this occurring is low, as shown by the high recognition rate, such mistakes are inevitable.

Gaussian Motion Both
FADEIN+LPQ [24] 85.6 84.1 82.3
JRR [39] 92.7 92.3 91.7
Ours 95.1 95.0 95.0
TABLE IV: Recognition rates on the subset of CMU-PIE dataset.
Gaussian Motion Both
FADEIN+LPQ [24] 77.3 75.8 70.1
JRR [39] 88.2 86.8 86.3
Ours 93.4 92.6 91.8
TABLE V: Recognition rates on the subset of extended Yale B dataset.
PSF μ(TYPE I) μ(TYPE II) σ(TYPE I) σ(TYPE II)
Krishnan et al. [17] 0.4973 0.5982 0.0321 0.0792
Pan et al. [25] 0.8149 0.7922 0.0930 0.0958
Ours 0.8566 0.8315 0.0749 0.1834
TABLE VI: SSIM Index on CMU-PIE dataset
PSF μ(TYPE I) μ(TYPE II) σ(TYPE I) σ(TYPE II)
Krishnan et al. [17] 0.5379 0.6428 0.0712 0.0924
Pan et al. [25] 0.8514 0.8112 0.0897 0.0926
Ours 0.8756 0.8531 0.1273 0.1857
TABLE VII: SSIM Index on Extended Yale B Dataset

IV-D Experiments on camera-shake blur

In this experiment, the restoration results of our method are compared with those of Krishnan et al., and its recognition results are compared with JRR, on the CMU-PIE and extended Yale B datasets. FADEIN is not applicable to this kind of blur, i.e., irregular and asymmetrical PSFs.
The experiment settings are the same as those in Subsection IV-C, except for the following three points: (1) all test images were blurred by two camera-shake PSFs; (2) the first 36 orthogonal function bases were used; and (3) the face images reconstructed by the $\alpha$'s were also added into the gallery of candidate results.
BRISQUE is widely used to evaluate the quality of natural images, whose MSCN coefficients are assumed to be Gaussian distributed. However, as mentioned in Section I, face images contain fewer strong edges, which means they may not be Gaussian distributed. The BRISQUE-L feature was reported to be relatively unaffected by small departures from the model assumptions. Hence, we use the BRISQUE-L feature for our task.
The restoration results on the CMU-PIE and extended Yale B datasets are shown in Fig. 10 and Fig. 11, respectively. Our method significantly outperforms that of Krishnan et al. In terms of restoration, our method has a minor flaw in restoring shadow areas (the bottom two images in Fig. 11). This phenomenon is caused by collecting reconstructed images as candidate results: the trained SVR tends to give higher scores to images with less illumination variance. For example, the red numbers in Fig. 11 are the scores of the original sharp images, while the blue numbers are the scores of the restoration results (in this case, the reconstructed images). The recognition results on both datasets are given in Table VIII, which demonstrates the superiority of our method in the recognition of blurred facial images.

PIE Yale B
PSF TYPE I TYPE II TYPE I TYPE II
JRR [39] 90.0 85.1 82.8 77.4
Ours 94.9 91.9 90.3 83.5
TABLE VIII: Recognition rates

IV-E Experiments on real blur

In this experiment, the restoration results of our method are compared with those of Krishnan et al. and Pan et al., and its recognition results are compared with JRR, on the FRGC 2.0 dataset. Again, FADEIN is not applicable in this case, because the PSF of real blur cannot be pre-defined in the training procedure.
All the sharp images of each identity and the first 36 orthogonal function bases were used to construct $A$. The candidate results were generated in a similar way to that in Subsection IV-D, as was the training of the SVR.
The restoration results are shown in Fig. 12. In Fig. 12, our algorithm gave three linear combinations of face bases as the final restoration results, which are marked by red rectangles. As there are no ground-truth sharp images for these OBs, the SSIM index is not available here. However, the superiority of our restoration results over those of the compared methods is easy to observe visually. The recognition rates of JRR and our method are 93.5% and 98.7%, respectively.

Fig. 12: The restoration results on FRGC 2.0. From top to bottom, they are OBs, and results of our method, Krishnan et al. [17] and Pan et al [25], respectively. The results marked by red rectangles are generated by the linear combinations of the function bases.

IV-F Computational efficiency

Our algorithm gives the deblurring and recognition results simultaneously. Four major parts take relatively long to compute: 1) the SVD; 2) constructing A; 3) training the SVR; and 4) minimizing Eq. (16). For each dataset, 1), 2) and 3) are computed only once. The computational complexity of the SVD of an m×n matrix (m ≥ n) is O(mn²). In this case, m is the number of image pixels and n = n1 + n2, where n1 and n2 are the numbers of face and function bases, respectively. To construct A, n1·n2 2-D convolutions are needed. By using the Fast Fourier Transform (FFT), the computational complexity of a 2-D convolution is O(hw log(hw)), where h and w are the height and width of an image. 3) and 4) are two constrained optimization problems which are solved iteratively; their convergence rates are highly dependent on the datasets and parameter settings. For all our experiments, which were run in MATLAB 2014a on a PC with an Intel Core i5 3.2GHz CPU and 8GB RAM, 3) takes less than 1 second and 4) takes 20 to 40 seconds per test image.
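The FFT route for the 2-D convolutions in step 2) can be sketched as follows. This is a circular convolution in Python rather than our MATLAB implementation, and the function name is illustrative:

```python
import numpy as np

def conv2_fft(img, kernel):
    """2-D circular convolution via the FFT: O(hw log(hw)) instead of
    the O(hw * kh * kw) cost of direct spatial convolution."""
    h, w = img.shape
    K = np.fft.fft2(kernel, s=(h, w))   # zero-pad kernel to image size
    return np.real(np.fft.ifft2(np.fft.fft2(img) * K))
```

Since A is built once per dataset, the n1·n2 convolutions are a one-off cost amortized over all test images.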

V Conclusions

In this paper, we proposed a coupled learning method combined with blind image quality assessment (BIQA) for image deconvolution. The method is specifically designed for deblurring face images, which have few strong edges, and can theoretically estimate any PSF owing to its reasonable assumptions and the adopted priors. We illustrate how to reduce computational costs for three kinds of symmetric PSFs that are common in real applications. To show how subsequent recognition tasks can be improved, we propose a new method that simultaneously generates deconvolution and recognition results. Experimentally, our deconvolution method is superior to representative methods, and the recognition method achieves high recognition rates on blurred face images. In future work, we will focus on extending our PSF estimation strategy to natural image deblurring and our recognition method to non-frontal face recognition problems [7].

References

  • [1] T. Ahonen, E. Rahtu, V. Ojansivu and J. Heikkila (2008) Recognition of blurred faces using local phase quantization. In Proceedings of International Conference on Pattern Recognition, pp. 1–4. Cited by: §IV-C.
  • [2] Y. A. Y. AI-Najjar and D. C. Soong (2012-08) Comparison of image quality assessment: PSNR, HVS, SSIM, UIQI. International Journal of Scientific & Engineering Research 3 (8). Cited by: §IV.
  • [3] A. C. Bovik, T. S. Huang and D. C. Munson, Jr. (1983-12) A generalization of median filtering using linear combinations of order statistics. Acoustics, Speech and Signal Processing, IEEE Transactions on 31 (6), pp. 1342–1350. Cited by: §II-E.
  • [4] S.H. Chan, R. Khoshabeh, K.B. Gibson, P.E. Gill and T.Q. Nguyen (2011-11) An augmented lagrangian method for total variation video restoration. Image Processing, IEEE Transactions on 20 (11), pp. 3097–3111. Cited by: §I, §II-D.
  • [5] M. Delbracio and G. Sapiro (2015-11) Removing camera shake via weighted fourier burst accumulation. Image Processing, IEEE Transactions on 24 (11), pp. 3293–3307. Cited by: §I.
  • [6] C. Ding, J. Choi, D. Tao and L. S. Davis (2015) Multi-directional multi-level dual-cross patterns for robust face recognition. IEEE Trans. Pattern Anal. Mach. Intell., doi: 10.1109/TPAMI.2015.2462338. Cited by: §I.
  • [7] C. Ding and D. Tao (2015) A comprehensive survey on pose-invariant face recognition. ACM Trans. Intell. Syst. Technol.. Cited by: §V.
  • [8] W. Dong, D. Zhang, G. Shi and X. Wu (2011-07) Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization. Image Processing, IEEE Transactions on 20 (7), pp. 1838–1857. Cited by: §I.
  • [9] W. Dong, L. Zhang, G. Shi and X. Li (2013-04) Nonlocally centralized sparse representation for image restoration. Image Processing, IEEE Transactions on 22 (4), pp. 1620–1630. Cited by: §I.
  • [10] R. Fan, P. Chen and C. Lin (2005) Working set selection using second order information for training support vector machines. Journal of Machine Learning Research 6, pp. 1889–1918. Cited by: §II-F.
  • [11] R. Fergus, B. Singh, A. Hertzmann, S. T. Roweis and W. T. Freeman (2006) Removing camera shake from a single photograph. In SIGGRAPH, pp. 787–794. Cited by: §I.
  • [12] A.S. Georghiades, P.N. Belhumeur and D. Kriegman (2001-06) From few to many: illumination cone models for face recognition under variable lighting and pose. Pattern Analysis and Machine Intelligence, IEEE Transactions on 23 (6), pp. 643–660. Cited by: §IV.
  • [13] M. Gonen and E. Alpaydin (2011) Multiple kernel learning algorithms. Journal of Machine Learning Research 12, pp. 2211–2268. Cited by: §II-F.
  • [14] M. R. Hestenes and E. Stiefel (1952) Methods of conjugate gradients for solving linear systems. Journal of Research of National Bureau of Standards 49 (6), pp. 409–436. Cited by: §II-B.
  • [15] M. R. Hestenes (1969) Multiplier and gradient methods. Journal of Optimization Theory and Applications 4, pp. 303–320. Cited by: §II-C.
  • [16] S. S. Keerthi, S. K. Shevade, C. Bhattacharyya and K. R. K. Murthy (2001-03) Improvements to Platt's SMO algorithm for SVM classifier design. Neural Computation 13 (3), pp. 637–649. External Links: ISSN 0899-7667 Cited by: §II-F.
  • [17] D. Krishnan, T. Tay and R. Fergus (2011) Blind deconvolution using a normalized sparsity measure. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 233–240. Cited by: Fig. 10, Fig. 11, Fig. 12, Fig. 7, Fig. 8, Fig. 9, §IV-B, §IV-C, TABLE I, TABLE II, TABLE III, TABLE VI, TABLE VII, §IV.
  • [18] A. Levin, Y. Weiss, F. Durand and W. T. Freeman (2011) Understanding blind deconvolution algorithms. Pattern Analysis and Machine Intelligence, IEEE Transactions on 33 (12), pp. 2354–2357. Cited by: §I.
  • [19] Y. Liao and X. Lin (2005) Blind image restoration with eigen-face subspace. Image Processing, IEEE Transactions on 14 (11), pp. 1766–1772. Cited by: §I.
  • [20] H. Liu, X. Sun, L. Fang and F. Wu (2015-11) Deblurring saturated night image with function-form kernel. Image Processing, IEEE Transactions on 24 (11), pp. 4637–4650. Cited by: §I.
  • [21] J. Mairal, M. Elad and G. Sapiro (2008-01) Sparse representation for color image restoration. Image Processing, IEEE Transactions on 17 (1), pp. 53–69. Cited by: §I.
  • [22] A. Mittal, A.K. Moorthy and A.C. Bovik (2012-12) No-reference image quality assessment in the spatial domain. Image Processing, IEEE Transactions on 21 (12), pp. 4695–4708. Cited by: §II-E.
  • [23] A. Mittal, A. Moorthy and A. Bovik (2012) Making image quality assessment robust. In IEEE Conference Record of the Forty Sixth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), pp. 1718–1722. Cited by: §I, §II-E.
  • [24] M. Nishiyama, A. Hadid, H. Takeshima, J. Shotton, T. Kozakaya and O. Yamaguchi (2011) Facial deblur inference using subspace analysis for recognition of blurred faces. Pattern Analysis and Machine Intelligence, IEEE Transactions on 33 (4), pp. 838–845. Cited by: §I, Fig. 7, Fig. 8, Fig. 9, §IV-B, §IV-C, TABLE I, TABLE II, TABLE III, TABLE IV, TABLE V, §IV.
  • [25] J. Pan, Z. Hu, Z. Su and M. Yang (2014) Deblurring face images with exemplars. In European Conference on Computer Vision, pp. 47–62. Cited by: §I, Fig. 10, Fig. 11, Fig. 12, Fig. 7, Fig. 8, Fig. 9, TABLE I, TABLE II, TABLE III, TABLE VI, TABLE VII, §IV.
  • [26] P. J. Phillips, H. Moon, S. A. Rizvi and P. J. Rauss (2000-10) The FERET evaluation methodology for face-recognition algorithms. Pattern Analysis and Machine Intelligence, IEEE Transactions on 22 (10), pp. 1090–1104. Cited by: §IV.
  • [27] J. C. Platt (1998-01) Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods - Support Vector Learning, Cited by: §II-F.
  • [28] Q. Shan, J. Jia and A. Agarwala (2008) High-quality motion deblurring from a single image. In SIGGRAPH ASIA, pp. 73:1–73:10. Cited by: §I.
  • [29] T. Sim, S. Baker and M. Bsat (2003-12) The cmu pose, illumination, and expression database. Pattern Analysis and Machine Intelligence, IEEE Transactions on 25 (12), pp. 1615–1618. Cited by: §IV.
  • [30] Y. Tsin and T. Kanade (2004) A correlation-based approach to robust point set registration. In Proceedings of European Conference on Computer Vision, pp. 558–569. Cited by: §III-A.
  • [31] G. Tzimiropoulos, S. Zafeiriou and M. Pantic (2011) Sparse representations of image gradient orientations for visual recognition and tracking. In Proceedings of IEEE Computer Vision and Pattern Recognition, Workshop on CVPR for Human Behaviour Analysis, pp. 26–33. Cited by: §I.
  • [32] S. V. N. Vishwanathan, Z. Sun, N. Theera-Ampornpunt and M. Varma (2010-12) Multiple kernel learning and the SMO algorithm. In Advances in Neural Information Processing Systems, pp. 2361–2369. Cited by: §II-F.
  • [33] R. Wang and D. Tao (2014) Recent progress in image deblurring. arXiv:1409.6838v1. Cited by: §I.
  • [34] Z. Wang and A.C. Bovik (2002-03) A universal image quality index. Signal Processing Letters, IEEE 9 (3), pp. 81–84. Cited by: §IV.
  • [35] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry and Y. Ma (2009-02) Robust face recognition via sparse representation. Pattern Analysis and Machine Intelligence, IEEE Transactions on 31 (2), pp. 210–227. Cited by: §I.
  • [36] L. Xiao, J. Gregson, F. Heide and W. Heidrich (2015-10) Stochastic blind motion deblurring. Image Processing, IEEE Transactions on 24 (10), pp. 3071–3085. Cited by: §I.
  • [37] F. Xue and T. Blu (2015-02) A novel sure-based criterion for parametric psf estimation. Image Processing, IEEE Transactions on 24 (2), pp. 595–607. Cited by: §I.
  • [38] A. Y. Yang, Z. Zhou, A. G. Balasubramanian, S. S. Sastry and Y. Ma (2013-08) Fast minimization algorithms for robust face recognition. Image Processing, IEEE Transactions on 22 (8), pp. 3234–3246. Cited by: §I.
  • [39] H. Zhang, J. Yang, Y. Zhang, N. M. Nasrabadi and T. S. Huang (2011) Close the loop: joint blind image restoration and recognition with sparse representation prior. In Proceedings of International Conference on Computer Vision, pp. 770–777. Cited by: §I, §I, §IV-C, TABLE IV, TABLE V, TABLE VIII, §IV.
  • [40] X. Zhao, X. Chai, Z. Niu, H. C. Keng and S. Shan (2011) Sparsely encoded local descriptor for face recognition. In International Conference on Automatic Face and Gesture Recognition, pp. 149–154. Cited by: §I.

Dayong Tian received the B.S. degree in Electronic Information Science and Technology and M.E. degree in Electronic Information Engineering from Xidian University, Xi’an, China. He is currently pursuing the Ph.D. degree with the Center for Quantum Computation and Intelligent Systems and the Faculty of Engineering and Information Technology, University of Technology at Sydney, Sydney, NSW, Australia. His research interests include computer vision and machine learning, and in particular, on image restoration, image retrieval and face recognition.

Dacheng Tao (F'15) is Professor of Computer Science with the Centre for Quantum Computation and Intelligent Systems, and the Faculty of Engineering and Information Technology in the University of Technology, Sydney. He mainly applies statistics and mathematics to data analytics, and his research interests spread across computer vision, data science, image processing, machine learning, neural networks and video surveillance. His research results have been expounded in one monograph and 100+ publications in prestigious journals and at prominent conferences, such as IEEE T-PAMI, T-NNLS, T-IP, JMLR, IJCV, NIPS, ICML, CVPR, ICCV, ECCV, AISTATS, ICDM and ACM SIGKDD, with several best paper awards, such as the best theory/algorithm paper runner-up award at IEEE ICDM'07, the best student paper award at IEEE ICDM'13, and the 2014 ICDM 10-Year Highest-Impact Paper Award.
