ATFaceGAN: Single Face Image Restoration and Recognition from Atmospheric Turbulence
Image degradation due to atmospheric turbulence is common while capturing images at long ranges. To mitigate the degradation due to turbulence which includes deformation and blur, we propose a generative single frame restoration algorithm which disentangles the blur and deformation due to turbulence and reconstructs a restored image. The disentanglement is achieved by decomposing the distortion due to turbulence into blur and deformation components using deblur generator and deformation correction generator. Two paths of restoration are implemented to regularize the disentanglement and generate two restored images from one degraded image. A fusion function combines the features of the restored images to reconstruct a sharp image with rich details. Adversarial and perceptual losses are added to reconstruct a sharp image and suppress the artifacts respectively. Extensive experiments demonstrate the effectiveness of the proposed restoration algorithm, which achieves satisfactory performance in face restoration and face recognition.
Capturing images at long ranges is always challenging as the degradation due to atmospheric turbulence is inevitable. Under the effects of the turbulent flow of air and changes in temperature, density of air particles, humidity and carbon dioxide level, the captured image is blurry and deformed due to variations in the refractive index [hufnagel1964modulation, roggemann1996imaging]. This significantly degrades image quality and the performance of many computer vision tasks such as object detection [oreifej2013simultaneous], recognition and tracking [chen2014detecting]. To suppress these effects, two classical approaches have been considered, one based on adaptive optics [pearson1976atmospheric, tyson2015principles] and the other based on image processing [furhad2016restoring, hirsch2010efficient, zhu2013removing, micheli2014linear, meinhardt2014implementation, lou2013video, lau2019variational, lau2019restoration, chak2018subsampled]. However, these methods require multiple image frames captured by a static imager. Mathematically, following [zhu2013removing, hirsch2010efficient, lau2019variational], the process of image degradation due to atmospheric turbulence can be represented as
where is the observed distorted images, is the latent clear image, is a space-invariant point spread function (PSF), is the deformation operator, which is assumed to deform randomly and is the sensor noise.
Recently, many learning-based face restoration algorithms such as face deblurring [Chrysos_2017_CVPR_Workshops, shen2018deep, lu2019unsupervised] and face superresolution [chen2018fsrnet, Yu_2018_CVPR, Yu_2018_ECCV] have been proposed. Moreover, the emergence of Generative Adversarial Networks (GAN) has further improved the quality of reconstructed images. However, these methods have not tackled the problem of deformation, which greatly reduces the quality of the acquired images and the performance of many computer vision tasks.
Recently, [chak2018subsampled] proposed a generative method to restore a clean image from multiple frames using a Wasserstein GAN [arjovsky2017wasserstein] and a subsampled frames algorithm proposed by [lau2019variational]. However, the method assumes a multi-frame setting with a static object. This assumption may not be practical in real-life situations.
Motivated by the recent success of data-driven approaches, we propose a generative single face image restoration algorithm, namely Atmospheric Turbulence Face GAN (ATFaceGAN), which reconstructs a clean face image with texture details preserved by simultaneously disentangling blur and deformation. We build two generators, namely, a deblur function and a deformation correction function, to decompose the degradations in turbulence. Also, we propose a two path training approach to further disentangle the degradation and reconstruct two images. A fusion function is used to combine the information in the two restored images and reconstruct a sharp face image. Some sample restored images are shown in Fig. 1.
Our contributions are summarized below:
The proposed method tackles the atmospheric turbulence degradation problem with a single image input.
We propose a generative face restoration algorithm trained in an end-to-end manner, which tackles degradation due to both blur and deformation by building the deblur function and deformation correction function respectively.
We propose a two path training strategy to further disentangle the blur and deformation and improve the quality of the restored image.
We propose a fusion network to combine the latent features of the intermediate results and reconstruct one clean restored image.
Experiments demonstrate that the restored face image is satisfactory in both quantitative and visual assessment. Further, the restored face images yield improved recognition performance.
II Related Work
Turbulence Degraded Image Restoration Classical methods of restoring images degraded by turbulence generally include two approaches. One is "lucky imaging" [aubailly2009automated, vorontsov2001anisoplanatic], which chooses a frame or a number of good frames in a turbulence degraded video and fuses the selected frames. Another one is the registration-fusion approach [hirsch2010efficient, zhu2013removing, xie2016removing, lau2019restoration], which first constructs a good reference image and aligns the distorted frames with the reference image using a non-rigid image registration algorithm. After alignment, the registered images are fused following which a restoration algorithm is applied to deblur the fused image to obtain the final restored image. Despite having satisfactory results, these methods assume multi-frame inputs with static objects. This assumption may be violated easily in reality, for example, pedestrians in long range surveillance video.
Face Restoration Due to recent successes of CNNs and GANs, several CNN-based face restoration algorithms have been proposed. [Chrysos_2017_CVPR_Workshops] proposed a CNN with Residual Blocks to deblur face images. [shen2018deep] proposed a multi-scale CNN that exploits global semantic priors and local structural constraints for face image deblurring as a generator and built a discriminator based on DCGAN [radford2015unsupervised]. [lu2019unsupervised] proposed an unsupervised method for domain-specific single image deblurring that disentangles the content information and blur information using a KL divergence constraint and improves the performance of face recognition. However, since degradation due to turbulence contains motion blur, out-of-focus blur or compression artifacts, these methods could not obtain satisfactory results. The results obtained by the proposed method are shown in Sec. IV-D.
III Proposed Algorithm
The proposed face image restoration algorithm is trainable in an end-to-end manner. Our goal is to reconstruct a sharp face image from the distorted face image and enhance the performance of face recognition systems.
III-A Problem Setting
Following the formulation of the degradation model discussed in [lau2019restoration, zhu2013removing, lau2019variational], we assume the mathematical model in (1). This is the general setting for restoring the latent clean image from a sequence of turbulence-degraded image frames. However, we assume only one frame is available to reconstruct the latent clean image, a more challenging and practical problem than considered earlier. As a result, the subscript is removed. Also, we notice that the "mixing" of deformation and blur in realistic turbulence face images is very fast and we could not be sure whether deformation precedes blur or blur precedes deformation. Therefore, we use a general turbulence function to replace in (1). Hence, our model becomes
Let and be the space of blurry and deformed face images respectively. Our goal is to construct a restoration function to restore the distorted face images, i.e. . However, it is a highly ill-posed problem as we have very little prior information to reconstruct . Hence, a data-driven approach, in particular the Wasserstein GAN with gradient penalty, is applied to restore it. Moreover, blur and deformation are always combined in the turbulence-degraded face images. We hope to build a deblur function and a deformation correction function to remove the undesired blur and deformation, i.e. and . Therefore, we split the turbulence degradation due to blur and deformation in the training stage. In order to restore a general turbulence function which contains both the blurring operator and the deformation operator , we propose a two path training approach, which tries to obtain more information to obtain a better result. Therefore, two restored images are obtained, i.e. and . A fusion network is implemented to improve the restoration results. Denote , and be the features of image . Mathematically,
where is an image fusion function and is the feature pairs . The end-to-end architecture is illustrated in Fig. 2.
III-B Data Augmentation
In order to apply a data-driven method to restore a clean face image from distorted faces, a sufficient amount of synthetic training data is needed. Therefore, the blur operator and the deformation operator are required to synthesize the distorted images. In this paper, we use the turbulence generation algorithm from [lau2019variational, lau2019restoration, chak2018subsampled] due to its efficiency in choosing different parameters to generate turbulence-degraded images of varying severity.
We follow the procedure discussed in [lau2019variational, lau2019restoration, chak2018subsampled] to generate a random motion vector field to deform the face images. points are selected in a face image . For each point , a patch centered at is considered. A random motion vector field is obtained in . Mathematically,
where is the Gaussian kernel with standard deviation , is the strength value, and are randomly selected from a Gaussian distribution. The overall motion vector field is generated after iterations as follows:
Then this motion vector field would be our deformation operator as
where is the warping operator. The blurring operator is simply a Gaussian kernel. For more details, please see [lau2019variational, lau2019restoration, chak2018subsampled].
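The motion-field construction described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation of [lau2019variational]: the function names, the zero-padded smoothing, the choice of a kernel as large as the patch, and all parameter values in the usage line are hypothetical, and the per-point loop stands in for the iterations mentioned in the text.

```python
import math
import random


def gaussian_kernel(size, sigma):
    """Normalized size x size Gaussian kernel as nested lists."""
    centre = (size - 1) / 2.0
    kernel = [[math.exp(-((i - centre) ** 2 + (j - centre) ** 2) / (2.0 * sigma ** 2))
               for j in range(size)] for i in range(size)]
    total = sum(sum(row) for row in kernel)
    return [[value / total for value in row] for row in kernel]


def smooth(patch, kernel):
    """Same-size 2D filtering of a square patch with zero padding."""
    n, radius = len(patch), len(kernel) // 2
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for a, kernel_row in enumerate(kernel):
                for b, weight in enumerate(kernel_row):
                    y, x = i + a - radius, j + b - radius
                    if 0 <= y < n and 0 <= x < n:
                        out[i][j] += weight * patch[y][x]
    return out


def random_motion_field(height, width, num_points, patch_size, sigma, strength, rng):
    """Accumulate smoothed Gaussian noise on patches around random centres.

    Returns (u, v), the two displacement components.
    Assumes patch_size is odd and smaller than both image dimensions.
    """
    u = [[0.0] * width for _ in range(height)]
    v = [[0.0] * width for _ in range(height)]
    kernel = gaussian_kernel(patch_size, sigma)
    half = patch_size // 2
    for _ in range(num_points):
        # Pick a patch centre so that the whole patch stays inside the image.
        cy = rng.randrange(half, height - half)
        cx = rng.randrange(half, width - half)
        for field in (u, v):
            noise = [[rng.gauss(0.0, 1.0) for _ in range(patch_size)]
                     for _ in range(patch_size)]
            smoothed = smooth(noise, kernel)
            for i in range(patch_size):
                for j in range(patch_size):
                    field[cy - half + i][cx - half + j] += strength * smoothed[i][j]
    return u, v


# Illustrative (hypothetical) parameter values only.
u, v = random_motion_field(32, 32, num_points=10, patch_size=7,
                           sigma=2.0, strength=3.0, rng=random.Random(0))
print(len(u), len(u[0]))
```

Warping a face image with the resulting field (for example by bilinear resampling) would then play the role of the deformation operator; that step is omitted here.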
In order to construct the deblur function and the deformation correction function , we need to generate a blurry image , a deformed image and a distorted image from each clean face image . To generate , a Gaussian blurring filter with parameter is applied on to get . To obtain , the random motion vector field with strength is applied on .
III-C Network Architecture
A Wasserstein GAN with gradient penalty is applied to restore the distorted face images. The generator architecture is a CNN, similar to the one used for image deblurring in [kupyn2018deblurgan]. It contains two strided convolution blocks with stride , six residual blocks [he2016deep] (ResBlocks) and two transposed convolution blocks. Each ResBlock contains one convolution layer, an instance normalization layer [ulyanov2016instance], ReLU activation [nair2010rectified] and a Dropout layer with . A global skip connection mentioned in [kupyn2018deblurgan] is also added. The deblur function and deformation correction function both use this architecture. The fusion network takes the concatenation of the features from face images and as inputs. The features are extracted after the activation function of the third ResBlock in and . The architecture of the fusion network is exactly the latter half of the structure of and , which contains three ResBlocks and two transposed convolution blocks. For the ResBlocks of , the number of channels is doubled, since the input is the concatenation of two feature vectors. In order to keep the global skip connection, which has been shown to converge faster, the pixel-wise average of and is added to . During training, three discriminators, namely , and , are designed. , and determine whether and and are real or fake. The discriminators follow the Wasserstein GAN [arjovsky2017wasserstein] with gradient penalty [gulrajani2017improved] (WGAN-GP). Their architectures are the same as PatchGAN [isola2017image, li2016precomputed]. All the convolutional layers except the last are followed by an InstanceNorm layer and LeakyReLU [xu2015empirical].
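The global skip connection of the fusion network can be illustrated with a small sketch. It assumes images and features are flat lists of numbers and treats the fusion network as an opaque callable; the names `fuse_with_global_skip` and `fusion_net` are hypothetical and only mirror the description above (concatenate the two feature sets, then add the pixel-wise average of the two restored images to the network output).

```python
def fuse_with_global_skip(restored_a, restored_b, features_a, features_b, fusion_net):
    """Fuse two restored images given as flat lists of pixel values.

    fusion_net is a stand-in for the fusion network: it maps the
    concatenated features to a residual image of the same size as the inputs.
    """
    # Concatenation of the two feature sets (channel-wise in a real network).
    concatenated = list(features_a) + list(features_b)
    residual = fusion_net(concatenated)
    # Global skip connection: add the pixel-wise average of the two images.
    return [r + 0.5 * (a + b)
            for r, a, b in zip(residual, restored_a, restored_b)]


# Toy usage: a fusion network that predicts a zero residual returns the average.
average = fuse_with_global_skip([1.0, 3.0], [3.0, 5.0], [0.1], [0.2],
                                lambda feats: [0.0, 0.0])
print(average)
```

With this design the fusion network only has to learn a correction on top of the averaged image, which is the usual motivation for a global skip connection.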
III-D Disentanglement of Blur and Deformation
In order to disentangle the turbulence distortion into blur and deformation, the deblur function and the deformation correction function are built. The content loss is defined as
which is the sum of the loss between aligned image and and the loss between deblurred image and .
III-E Two path training
The two path training strategy helps to disentangle the blur and deformation effects. One fixed order of restoration is needed if two path training is not implemented. For example, the distorted image is restored by and followed by according to (2). Then during the training phase, is trained with the turbulence degraded images which are both blurry and deformed. In other words, the training images for are implicitly assumed to be both blurry and deformed but not merely deformed. Therefore, if two path training is used, then could learn from turbulence degraded images and the deblurred images .
Moreover, the search space of the optimization problem is larger because no implicit structure of degradation is assumed. As the turbulence function only consists of blur and deformation but not the order of degradation, this gives more information ( and ) to the network and improves the performance.
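The two restoration paths can be summarized in a short sketch. The callables `deblur` and `correct_deformation` are hypothetical stand-ins for the two generators; the point is only that each generator is applied once to the raw distorted input and once to the output of the other generator, so neither order of restoration is hard-wired.

```python
def two_path_restore(distorted, deblur, correct_deformation):
    """Return the two restored results, one per order of restoration."""
    # Path 1: remove blur first, then correct the deformation.
    deblur_first = correct_deformation(deblur(distorted))
    # Path 2: correct the deformation first, then remove blur.
    align_first = deblur(correct_deformation(distorted))
    return deblur_first, align_first


# Toy usage with numeric stand-ins, just to show that the order matters.
path_1, path_2 = two_path_restore(3, lambda x: x + 1, lambda x: 2 * x)
print(path_1, path_2)
```

In training, both outputs would be compared against the clean image, which is how the two paths provide the additional supervision discussed above.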
III-F Fusion Loss
After both restored images and are obtained, their features are fused together to obtain the final restored image. The fusion loss is defined as the loss of the restored image and the real clean image , i.e.
III-G Adversarial Loss
The Wasserstein-1 distance in WGAN has been shown to have good convergence property and is more stable in training given that the function is 1-Lipschitz. To enforce the 1-Lipschitz constraint, gradient penalty is applied. Then the discriminator and generator losses are defined as
where is the distribution obtained by randomly interpolating between real images and restored images , and . The adversarial loss is
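As an illustration of the quantities involved, the sketch below follows the standard WGAN-GP formulation of [gulrajani2017improved]; it is an assumption that the losses used here take exactly this form. Images are flat lists, critic scores are plain numbers, and the critic gradient at the interpolated sample is passed in directly (in practice it would come from automatic differentiation). All function names and the penalty weight argument are hypothetical.

```python
import math


def interpolate(real, restored, eps):
    """Random interpolation between a real and a restored image, eps in [0, 1]."""
    return [eps * r + (1.0 - eps) * g for r, g in zip(real, restored)]


def gradient_penalty(critic_gradient, weight):
    """Penalize deviation of the critic's gradient norm from 1 (1-Lipschitz)."""
    norm = math.sqrt(sum(g * g for g in critic_gradient))
    return weight * (norm - 1.0) ** 2


def critic_loss(score_real, score_restored, penalty):
    """Standard WGAN-GP critic loss for one real/restored pair."""
    return score_restored - score_real + penalty


def generator_adversarial_loss(score_restored):
    """Standard WGAN generator loss: raise the critic score of the restored image."""
    return -score_restored


x_hat = interpolate([1.0, 1.0], [0.0, 0.0], eps=0.25)
print(x_hat, gradient_penalty([3.0, 4.0], weight=10.0))
```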
III-H Perceptual Loss
Using loss or loss merely as the content loss would lead to blurry artifacts and loss in texture details as these losses favor pixelwise averaging. On the other hand, Perceptual Loss, which is an loss function between the feature maps of real image and generated image, has been demonstrated to be beneficial for image restoration tasks [shen2018deep, kupyn2018deblurgan, lu2019unsupervised]. Therefore, perceptual loss is adopted, which includes
where is the features of the layer of a pretrained CNN. In this paper, the layer of VGG-19 [simonyan2014very] network pretrained on ImageNet [deng2009imagenet] is adopted. The total perceptual loss is
The full loss function is a weighted sum of all the losses,
The weights are empirically set for each loss to balance their importance.
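A weighted sum of this kind can be written as a small helper. The loss names used as dictionary keys follow the four terms introduced above (content, fusion, adversarial, perceptual); the numeric values in the usage line are placeholders for illustration, not the weights used in the experiments.

```python
def total_loss(losses, weights):
    """Weighted sum of named loss terms; every loss needs a weight."""
    missing = set(losses) - set(weights)
    if missing:
        raise KeyError("no weight given for: " + ", ".join(sorted(missing)))
    return sum(weights[name] * value for name, value in losses.items())


# Placeholder values only, to show the call pattern.
losses = {"content": 1.0, "fusion": 2.0, "adversarial": 0.5, "perceptual": 4.0}
weights = {"content": 1.0, "fusion": 1.0, "adversarial": 2.0, "perceptual": 0.25}
print(total_loss(losses, weights))
```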
At test time, only the generators are used. Given a turbulence distorted image , the restored image is generated as follows:
Table I: Ablation study. Columns: Method | Degraded images | One generator | Decompose into | Add two path | Add fusion.
Our algorithm is trained on [bansal2017umdfaces] and evaluated on six face recognition datasets, including LFW [huang2008labeled], CFP [sengupta2016frontal], AgeDB [moschoglou2017agedb], CALFW [zheng2017cross], CPLFW [zheng2018cross] and VGGFace2 [cao2018vggface2].
IV-A Training details
The end-to-end design is implemented in PyTorch [paszke2017automatic]. The training was performed on two GeForce RTX 2080 Ti GPUs. For training, 10000 aligned face images with resolution are randomly picked from [bansal2017umdfaces] and degraded with the turbulence degradation algorithm in Sec. III-B; the batch size is . During training, we use the Adam solver [kingma2014adam] with hyper-parameters to perform five update steps on the discriminators and then one step on the generators. The learning rate is initially set at 0.0001 for the first 30 epochs, then linear decay is applied for the next 20 epochs. For hyper-parameters in deformation operator , we empirically set and . For hyper-parameters in blurring operator , the parameter is set to be . For hyper-parameters in the loss function, we empirically set , and . Note that various parameters in and are randomly picked to synthesize various strengths of blur and deformation. The computation time of restoring a image is seconds per image on average.
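The optimization schedule described above can be sketched as follows. The constant phase (0.0001 for 30 epochs), the 20 decay epochs and the five-to-one update ratio come from the text; the assumption that the decay is linear towards zero, reaching it only after the last epoch, and the function names are ours.

```python
def learning_rate(epoch, base_lr=1e-4, constant_epochs=30, decay_epochs=20):
    """Learning rate for a 0-indexed epoch.

    Constant for the first `constant_epochs`, then linear decay over the
    next `decay_epochs`; the rate would hit zero only after the last epoch.
    """
    if epoch < constant_epochs:
        return base_lr
    remaining = constant_epochs + decay_epochs - epoch
    return base_lr * max(remaining, 0) / (decay_epochs + 1)


def update_schedule(num_generator_steps, critic_steps=5):
    """Yield which networks to update: five critic steps per generator step."""
    for _ in range(num_generator_steps):
        for _ in range(critic_steps):
            yield "discriminators"
        yield "generators"


print([learning_rate(e) for e in (0, 29, 30, 49)])
print(list(update_schedule(1)))
```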
IV-B Testing details
In all six testing datasets, all the pairs of face images are degraded by the algorithm from [lau2019variational]. PSNR and SSIM are used for evaluating the quality of the restored image. We use a pretrained face recognition network [xu2017high], which is trained as reported in [guo2016ms], to test the face verification performance. (Footnote 1: Please refer to the corresponding project page for the face verification policy: https://github.com/ZhaoJ9014/face.evoLVe.PyTorch.)
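For reference, PSNR between a clean image and a restored image can be computed as in the sketch below. It assumes images are flat lists of pixel values with a peak value of 255; the evaluation code used for the reported numbers may differ in details such as colour handling or value range.

```python
import math


def psnr(reference, restored, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


print(psnr([0, 0], [0, 10]))
```

Higher values indicate a restored image closer to the clean one; SSIM complements PSNR by comparing local structure rather than pixel-wise error.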
IV-C Ablation study
In this section, the results of an ablation study performed to analyze the effectiveness of each component or loss in the proposed algorithm are presented. Both quantitative and qualitative results on the face dataset in [bansal2017umdfaces] are evaluated for the following four variants of our method, where each component is gradually added: 1) only one generator and one discriminator; 2) splitting the generator into two, and , and the restored image is ; 3) applying two path training and the restored image is ; and 4) adding the fusion network and fusing them by
We present the PSNR and SSIM for each variant in Table I and visual comparisons in Fig. 3. From Fig. 3, we observe that the resultant images with direct restoration, which only uses one generator, are not satisfactory. This is because turbulence degradation is a very ill-posed problem. There is a large gap between the turbulence-degraded and clean images, and one generator could not provide enough information to the network. By decomposing the network into two generators, the quantitative performance is similar to one generator but the result is less noisy. This is because we have more information for the generators to learn as the intermediate results () provide additional supervision to the final restored image. When we apply the two path training step, as both and are added to supervise the training, the results are good even though the groundtruth is not used in the training. Adding the fusion network further improves the result as more information (features of and ) is given to the network and the information is combined by the fusion function . Table I also supports these observations.
IV-D Qualitative and quantitative Evaluation
Since our proposed algorithm is the first single frame-based image restoration method for turbulence-degraded images, which involve blur and deformation, it is hard to compare with other methods. Therefore, we compare with some state-of-the-art image restoration methods including [kupyn2018deblurgan, lu2019unsupervised], which could be trained with our turbulence-degraded image dataset. These two methods are representative of applying GANs to deblurring in supervised and unsupervised ways respectively. For [kupyn2018deblurgan], we change the batch size from to and the number of training epochs to . For [lu2019unsupervised], we use the default setting.
The quantitative results are shown in Table II and the visual comparisons are illustrated in Fig. 4. In Fig. 4, we show three images: one from LFW, one from CFP and one from AgeDB. The top one is a frontal image with mild blur and mild deformation, the middle one is a frontal image with moderate blur and severe deformation and the bottom one is a non-frontal gray-scale face image with severe blur and mild deformation. For the top image, we can see that blur is suppressed in all three methods. [kupyn2018deblurgan] and [lu2019unsupervised] show sharper visual results than ours. However, the result from [kupyn2018deblurgan] is noisy and that from [lu2019unsupervised] is deformed. The proposed method restores the image effectively. On the other hand, if both blur and deformation exist, [kupyn2018deblurgan] would induce more noise as shown in Fig. 4 (b) and [lu2019unsupervised] could not remove the deformation as shown in Fig. 4 (c). The proposed method suppresses both blur and deformation. Moreover, as our training set only consists of images, which include both colored and gray-scale images, the quantitative results generated by [lu2019unsupervised] are not good compared to [kupyn2018deblurgan] and the proposed method as the number of training samples is not large enough. The proposed method trained with a relatively small training set is effective in the presence of severe blur, deformation and pose. The PSNR and SSIM in Table II both demonstrate that the proposed method performs better than state-of-the-art methods.
For the face verification task, we note that [kupyn2018deblurgan] is slightly better than the proposed method in one out of seven experiments even though both the visual quality and quantitative results of the proposed method are better than those of [kupyn2018deblurgan]. Except on LFW, the proposed method is more accurate than the other two methods. The verification accuracy of [kupyn2018deblurgan] is comparable with the proposed method. This is because [kupyn2018deblurgan] uses only perceptual loss as its content loss. As a result, the restored image from [kupyn2018deblurgan] is perceptually more similar to the clean image than that of the proposed method. Using the distance between two feature outputs from layer of the VGG-19 [simonyan2014very] network as a perceptual metric, namely , we found that the between the image restored by [kupyn2018deblurgan] and the original clean image is in LFW while the between the image restored by the proposed method and the original clean image is .
Atmospheric turbulence degradation severely harms the task of face verification as the verification accuracy is reduced by more than 10 on average. There could be a significant drop (as much as for CFPFP) even though the face verification system is trained with [guo2016ms], which consists of over 5 million images. Also, as the task of restoration from turbulence is very challenging, the restoration results from other state-of-the-art methods are not satisfactory even though they are trained with our dataset. Moreover, the proposed method restores the turbulence degraded images effectively with a relatively small dataset.
IV-E Performance of the disentangled representation
We try to disentangle the blur and deformation from atmospheric turbulence by training the deblur function and the deformation correction function with a commutative constraint. To see the performance of the disentanglement, and are tested. We try to use to deblur the blurry image and to correct the deformed images. Note that during the training, is only fed with the distorted image and the deformation corrected image of the distorted image.
We test and with and respectively, where are from the LFW dataset. The PSNR, SSIM and the accuracy of face verification are presented in Table III. The visual performance is shown in Fig. 5. For the first row, the image is moderately blurred (Fig. 5 (a)) and severely deformed (Fig. 5 (c)). From Figs. 5 (b) and (d), we see that and successfully remove the blur and deformation from the image and preserve the features of the subject. On the other hand, note that the image in the second row is a profile face with moderate blur (Fig. 5 (a)) and mild distortion (Fig. 5 (c)). Still, and restore the degraded images successfully. Moreover, the PSNR, SSIM and face verification results confirm that and restore the images, preserve shape and semantic information and are robust to severity of blur, deformation and pose.
In this paper, we proposed a single frame image restoration method, ATFaceGAN, a generative algorithm that disentangles the turbulence distortion into blur and deformation and restores a sharp image. In order to disentangle the turbulence, a deblur generator and a deformation correction generator are introduced. To further separate the blur and deformation, a two path training step is employed to produce two restored images. Finally, a fusion function combines the two restored images and generates one clean image. Ablation studies demonstrate the effectiveness of each component. We have conducted extensive experiments on face restoration and face verification using the restored face images. Both quantitative and visual results show promising performance.