Confidence Measure Guided Single Image De-raining

Rajeev Yasarla and Vishal M. Patel. Rajeev Yasarla is with the Whiting School of Engineering, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218-2608 (e-mail: ryasarl1@jhu.edu). Vishal M. Patel is with the Whiting School of Engineering, Johns Hopkins University (e-mail: vpatel36@jhu.edu). Manuscript received…
Abstract

Single image de-raining is an extremely challenging problem since rainy images contain rain streaks which often vary in size, direction and density. This varying characteristic of rain streaks affects different parts of the image differently. Previous approaches have attempted to address this problem by leveraging some prior information to remove rain streaks from a single image. One of the major limitations of these approaches is that they do not consider the location information of rain drops in the image. The proposed Image Quality-based single image Deraining using Confidence measure (QuDeC) network addresses this issue by learning the quality or distortion level of each patch in the rainy image, and further processes this information to learn the rain content at different scales. In addition, we introduce a technique which guides the network to learn the network weights based on the confidence measure about the estimates of both quality at each location and residual rain streak information (residual map). Extensive experiments on synthetic and real datasets demonstrate that the proposed method achieves significant improvements over recent state-of-the-art methods.

Deraining, convolutional neural networks, image restoration.

I Introduction

Many practical computer vision-based systems such as surveillance and autonomous driving often require processing and analysis of videos and images captured under adverse weather conditions such as rain, snow, haze etc. These weather-based conditions adversely affect the visual quality of images and as a result often degrade the performance of vision systems. Hence, it is important to develop algorithms that can automatically remove these artifacts before they are fed to a vision-based system for further processing.

Fig. 1: Sample image deraining results. (a),(d),(g) are Rainy images. (b) De-rained using DID-MDN [1] where zoomed in parts show the rain streaks and blurry effects on faces and hands. (e) De-rained using Fu et al. [2] where zoomed in parts show under de-raining of the image. (h) De-rained using JORDER [3] where zoomed in part shows under de-raining near grass and blurry effects on the plant. (c),(f),(i) are de-rained images using QuDeC.

In this paper, we address the problem of removing rain streaks from a single rainy image. Rain streak removal or image de-raining is a difficult problem since a rainy image may contain rain streaks which vary in size, direction and density. A number of different techniques have been developed in the literature to address this problem. These algorithms can be clustered into two main groups - (i) video-based algorithms [4, 5, 6, 7, 8], and (ii) single image-based algorithms [1, 2, 9, 3, 10]. Algorithms corresponding to the first category assume temporal consistency among the image frames, and use this assumption for de-raining. On the other hand, single image de-raining methods attempt to use some prior information to remove rain components from a single image [9, 11, 10, 1]. Priors such as sparsity [12, 13] and low-rank representation [14] have been used in the literature. In particular, the method proposed by Fu et al. [15] uses a priori image domain knowledge by focusing on high frequency details during training to improve the de-raining performance. However, it was shown in [1] that this method tends to remove some important parts of the de-rained image (see Figure 1(e)). Similarly, a recent work by Zhang and Patel [1] uses image-level priors to estimate the rain density information which is then used for de-raining. Although their approach provides state-of-the-art results, they estimate image-level priors which do not consider the location information of rain drops in the image. As a result, their algorithm tends to introduce some artifacts in the final de-rained images. These artifacts can be clearly seen from the de-rained results shown in Figure 1(b).

In our previous work [16], we proposed a different approach to image de-raining in which we made use of the observation that rain streak density and direction do not change drastically across different scales. Rather than relying on the rain density information (i.e. heavy, medium or light) alone, we developed a method in which the rain streak location information is also taken into consideration in a multi-scale fashion to improve the de-raining performance. In this paper, we extend our work in [16] by also incorporating the distortion level information at each location in the lower scales of the image to further improve the de-raining performance. To this end, we use the Natural Image Quality Evaluator (NIQE) no-reference image quality score [17] and formulate a joint task of estimating the level of distortion at each spatial location of the image and computing rain streak information to reconstruct the clean image. This is achieved by adding a decoder network, which computes the level of distortion at each location, to our UMRL approach in [16]. This additional decoder network labels the distortion at each location into three classes and also computes the confidence scores for each patch (which indicate how confident the decoder is about the obtained labels), as shown in Figure 2. As a result, this can be interpreted as soft labelling of the distortion levels in the image. As shown in Figure 2, different patches in the image are labelled with different confidence scores. This makes the proposed QuDeC network more robust by suppressing errors in the estimated distortion levels while estimating the de-rained image. In addition, we propose a novel loss function to train the network. In summary, we add the following novelties to our previous UMRL approach in [16].

  • We formulate a joint task of estimating the level of distortion at each spatial location of the image and computing rain streak information by adding a new decoder to UMRL [16].

  • A new loss function is proposed for estimating the confidence scores for both residual maps and distortion quality-label maps.

Fig. 2: (a) Input rainy image. (b) Location-quality-label-map. (c) Corresponding confidence scores.

Fig. 1(c), (f) and (i) present sample results from our QuDeC network, where one can clearly see that QuDeC is able to remove the noise artifacts and provides better results as compared to [2, 3] and [1]. To summarize, the following are our key contributions in this paper.

  • A novel method called QuDeC is proposed which jointly computes the level of distortion at each location and estimates rain streak information in a multi-scale fashion to obtain the de-rained image.

  • Confidence scores are computed for each task and judiciously combined to obtain improved results.

  • We run extensive experiments to show the performance of QuDeC against several recent state-of-the-art approaches on both synthetic and real rainy images. Furthermore, an ablation study is conducted to demonstrate the effectiveness of different parts of the proposed QuDeC network.

The rest of the paper is organized as follows. In Section II, we review a few related works. Details of the proposed method are given in Section III. Training details are provided in Section IV. Experimental results are presented in Section V, and finally, Section VI concludes the paper with a brief summary.

II Background and Related Work

An observed rainy image $y$ can be modeled as the superposition of a rain component (i.e. residual map) $r$ with a clean image $x$ as follows

$$y = x + r. \quad (1)$$

Given $y$, the goal of image de-raining is to estimate $x$. This can be done either by estimating the residual map $r$ and then subtracting it from the observed image, or by directly estimating $x$ from $y$. Various methods have been proposed in the literature for image de-raining [11, 18, 13, 19, 9], including dictionary learning-based [20], Gaussian mixture model (GMM) based [21], and low-rank representation based [22] methods. In recent years, deep learning-based single image de-raining methods have also been proposed in the literature. Fu et al. [15] proposed a convolutional neural network (CNN) based approach in which they directly learn the mapping relationship between rainy and clean image detail layers from data. Zhang et al. [23] proposed a generative adversarial network (GAN) based method for image de-raining. Furthermore, to minimize the artifacts introduced by GANs and ensure better visual quality, a new refined loss function was also introduced in [23]. Fu et al. [2] presented an end-to-end deep learning framework for removing rain from individual images using a deep detail network which directly reduces the mapping range from input to output. Zhang and Patel [1] proposed a density-aware multi-stream densely connected CNN for joint rain density estimation and de-raining. Their network automatically determines the rain-density information and then efficiently removes the corresponding rain streaks using the estimated rain-density label. Note that the methods proposed in [2] and [1] showed the benefits of using multi-scale networks for image de-raining. Recently, Wang et al. [24] proposed a hierarchical approach based on estimating different frequency details of an image to obtain the de-rained image. The method proposed by Qian et al. [25] generates attentive maps using recurrent neural networks, and then uses the features from different scales to compute the loss for removing rain drops on glass. Note that this method was specifically designed for removing rain drops from glass rather than removing rain streaks from an image. The works in [26, 27, 28] illustrated the importance of attention-based methods in low-level vision tasks. In a recent work [28], Li et al. proposed a convolutional and recurrent neural network-based method for single image de-raining which makes use of contextual information for rain removal. It was observed in [1] that some of the recent deep learning-based methods tend to under de-rain or over de-rain the image if the rain condition present in the rainy image is not properly considered during training.

Fig. 3: Sample location-quality-label-maps corresponding to synthetic and real-world rainy images. The three label values indicate high, medium and low distortion, respectively, in the corresponding patch of the rainy image $y$.

III Proposed Method

Unlike many deep learning-based methods that directly estimate the de-rained image from the noisy observation, we take a different approach where we formulate the rain removal problem as the joint task of computing the distortion level at each location and estimating the residual rain streak component (i.e. residual map). We compute the distortion level for fixed-size patches of the image and classify them into three classes: low, medium and high distortion. Note that the ground truth labels for these patches are computed using the NIQE measure [17]. In this way, the distortion level at each location in the rainy image is represented by the location-quality-label-map $q$. Representing the distortion level at each location, and giving it as prior information while reconstructing the image, helps the network to learn rain streak information which may vary spatially or affect the background differently. From Figure 3 we can clearly observe that different levels of rain streaks affect the background differently. In some cases the same level of rain streaks has different effects on the background. For example, as can be seen from Figure 3, rain streaks produce different distortion in bright and heavily textured regions. On the other hand, the distortion level is relatively high in dark and homogeneous regions. To handle these regions differently, we propose a method that jointly estimates the level of distortion at each location and uses it as prior information in estimating the rain streak information (i.e. residual map). In contrast to other methods [1, 3] that use either rain streak binary masks or the level of rain streaks present in the image, we estimate the distortion caused by rain streaks at each location based on the NIQE measure, which makes our method more general and applicable to synthetic as well as real-world images.

To perform the joint task of computing the location-quality-label-map $q$ and estimating the residual maps $r$, QuDeC is constructed using one encoder and two decoders as shown in Figure 4. A rainy image $y$ is first passed through an encoder network to obtain intermediate features. These features are then further processed by the decoders. In particular, Decoder D2 estimates the location quality labels, which are then given as prior information to Decoder D1 to estimate the residual maps ($\hat{r}$).


Fig. 4: An overview of the QuDeC network.

III-A QuDeC Network

Given a rainy image $y$, we estimate the residual map and its corresponding confidence map at three different scales, $(\hat{r}_{1.0}, c_{1.0})$ (at the original input size), $(\hat{r}_{0.5}, c_{0.5})$ (at 0.5 scale of the input size), and $(\hat{r}_{0.25}, c_{0.25})$ (at 0.25 scale of the input size), in a multi-scale fashion [16], as well as the location-quality-label-map $\hat{q}$ along with the corresponding confidence scores $\hat{c}_q$ for the label map $\hat{q}$.

The QuDeC network aims to judiciously combine the pixel-wise rain residual information from lower scales and the patch-wise distortion levels in the quality label maps while estimating the final de-rained image. To achieve this, we compute confidence maps for the estimated residual maps and location-quality-label-maps. These confidence maps block errors in the computation by giving low confidence values to erroneous regions in the estimates of $\hat{r}$ and $\hat{q}$. Additionally, these confidence scores enable QuDeC to perform soft labeling of the distortion levels in the location-quality-label-map $\hat{q}$, which makes QuDeC more robust to location quality label errors and lets it effectively use this information as a prior while estimating $\hat{r}$. To compute $\hat{r}$ and $\hat{q}$, we start with UMRL [16] and add Decoder D2 to obtain the network architecture of QuDeC as shown in Figure 6.

Fig. 5: (a) Input rainy image. (b) De-rained image using the base network. (c) De-rained using [1]. (d) De-rained using the proposed encoder and decoder D1 of the QuDeC network. (e) The residual map. (f) The confidence map at scale 1.0 ($c_{1.0}$).
Fig. 6: An overview of the proposed QuDeC network. The aim of the QuDeC network is to estimate the clean image given the corresponding rainy image. QuDeC learns the residual maps, location-quality-label-maps, and computes the confidence maps to guide the network. To achieve this, we introduce ReCoN and LCN networks, and feed their outputs to the subsequent layers of Decoder D1.

III-A1 Encoder and Decoder D1

Rain streaks are high frequency components, and existing de-raining methods either tend to remove high frequencies that are not rain streaks or fail to remove rain near high frequency components of the clean image such as edges, as shown in Figure 5. To address this issue, one can use information about the locations in the image where the network might go wrong in estimating the residual value. This can be done by estimating a confidence value corresponding to the estimated residual value and guiding the network to remove the artifacts, especially near the edges. For example, we can clearly observe from Figure 5 that the residual map and its corresponding confidence map are able to capture the regions where there is a high probability of incorrect estimates. In the encoder and decoder D1 networks, we estimate the residual values and their corresponding confidence maps at different scales (1.0, 0.5 and 0.25) of the input size. This information is then fed back to the subsequent layers so that the network can learn the residual value at each location, given the computed residual and confidence values at lower scales.

The encoder and decoder D1 networks are similar to the encoder and decoder networks of UMRL [16], where a convolutional block (ConvBlock, shown in Figure 7(a)) is used as the building block. The encoder network is described as follows,
ConvBlock(3,32)-AvgPool-ConvBlock(32,32)-AvgPool-ConvBlock(32,32)-AvgPool-ConvBlock(32,32)-AvgPool
where AvgPool is the average pooling layer, UpSample is the upsampling convolution layer, and ConvBlock(m,n) indicates a ConvBlock with m input channels and n output channels. The decoder D1 network is described as follows,
ConvBlock(32,32)-UpSample-ConvBlock(64,32)-UpSample-ConvBlock(67,32)-UpSample-ConvBlock(67,16)-ConvBlock(16,16)-Conv2d,
where ReCoN networks are added to Decoder D1 to estimate the residual maps at different scales and their corresponding confidence maps. Given feature maps as input to a ReCoN network, RN (residual network) estimates the residual map and CN (confidence network) computes the corresponding confidence map, as shown in Figure 7(d).

Fig. 7: (a) Convolutional block (ConvBlock). BN - batch normalization, ReLU - Rectified Linear Unit, Conv2d - convolutional layer. (b) Residual Network (RN). (c) Confidence map Network (CN). (d) Residual-Confidence Network (ReCoN).
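To make the building blocks concrete, the following is a minimal PyTorch-style sketch of the ConvBlock and the encoder described above. The channel counts and the BN-ReLU-Conv2d ordering follow the text and Figure 7(a); the 3x3 kernel, the padding, the 2x2 average-pooling factor, and returning the intermediate features for the decoders (suggested by the decoder channel counts) are assumptions.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Pre-activation convolutional block: BN -> ReLU -> Conv2d.

    The 3x3 kernel with padding 1 is an assumption; the paper only states
    that ConvBlock(m, n) maps m input channels to n output channels.
    """
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.block = nn.Sequential(
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2),
        )

    def forward(self, x):
        return self.block(x)


class Encoder(nn.Module):
    """Encoder from the text: four ConvBlocks, each followed by average pooling."""
    def __init__(self):
        super().__init__()
        self.stages = nn.ModuleList([
            nn.Sequential(ConvBlock(3, 32), nn.AvgPool2d(2)),
            nn.Sequential(ConvBlock(32, 32), nn.AvgPool2d(2)),
            nn.Sequential(ConvBlock(32, 32), nn.AvgPool2d(2)),
            nn.Sequential(ConvBlock(32, 32), nn.AvgPool2d(2)),
        ])

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)  # keep intermediate features for the two decoders
        return feats
```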

Feature maps at different scales are given as inputs to the Residual Network (RN) to estimate the residual map at the corresponding scale, as shown in Figure 7(d). RN consists of the following sequence of convolutional layers,
ConvBlock(64,32)-ConvBlock(32,32)-ConvBlock(32,3)
as shown in Figure 7(b). We use the estimated residual map and the feature maps as inputs to the Confidence map Network (CN) to compute the confidence measure at every pixel, which indicates how sure the network is about the residual value at each pixel. CN consists of the following sequence of convolutional layers,

ConvBlock(67,16)-ConvBlock(16,16)-ConvBlock(16,3)

as shown in Figure 7(c). Given the estimated residual map $\hat{r}_s$ and the corresponding feature maps as inputs, the confidence map network estimates the confidence map $c_s$ at scales $s = 0.5$ and $s = 0.25$. The element-wise product of $\hat{r}_s$ and $c_s$ is computed and up-sampled, and this is used as an input to the subsequent layers of the Decoder D1 network in QuDeC, as shown in Figure 6. Given the output residual map $\hat{r}_{1.0}$ and the feature maps of the final layer of the Decoder D1 network in QuDeC as input to CN, we obtain $c_{1.0}$.

A Refinement Network (RFN) is used at the end of Decoder D1 to produce the de-rained image. It takes the coarse de-rained estimate (the input image minus the estimated residual map) as input and generates the final de-rained image $\hat{x}$ as output. The RFN consists of the following blocks
Conv2d-Conv2d-tanh(), where Conv2d represents a 2D convolution layer.

Fig. 8: (a) Input rainy image. (b) De-rained image using DIDMDN [1]. (c) De-rained using UMRL [16]. (d) De-rained using our proposed method QuDeC. (e) Location-quality-label-map computed by QuDeC. (f) Ground truth clean image.
III-A2 Decoder D2

Rain streaks have different distortion effects on different parts of the image. As a result, while reconstructing the clean image, we also estimate prior information such as the distortion caused by rain streaks at every location; not using this prior information may lead to inferior de-raining performance. As can be seen from Figure 8, DIDMDN [1] and UMRL [16] do not perform well as they lack prior distortion level information at each location of the image. We address this by introducing Decoder D2 in our method and formulating the joint task of computing the distortion level at each location and estimating the residual rain streak information. Decoder D2 is similar to Decoder D1, with the following sequence of blocks,
ConvBlock(32,32)-UpSample-ConvBlock(64,32)-UpSample-ConvBlock(67,32)-UpSample-ConvBlock(67,16)-ConvBlock(16,16)-GlobalAveragePool-FullyConnectedLayer.
Decoder D2 takes feature maps obtained from the encoder, as shown in Figure 6, to estimate the distortion levels in the location-quality-label-map. A Label Confidence Network (LCN) is used to compute the confidence scores corresponding to the location-quality-label-map $\hat{q}$. Feature maps from the last layer of D2 and the obtained location-quality-label-map are fed to LCN to compute the confidence scores $\hat{c}_q$, as shown in Figure 9. LCN is a sequence of three ConvBlocks followed by a global average pooling layer and a fully connected layer, as shown in Figure 9.


Fig. 9: Label Confidence Network (LCN).
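Under the same conventions as the earlier sketches, LCN might look as follows. The three-ConvBlock body followed by global average pooling and a fully connected layer follows the description above; the channel widths, the injection of the label map by channel-wise concatenation, and the sigmoid output are assumptions.

```python
import torch
import torch.nn as nn

class LCN(nn.Module):
    """Label Confidence Network: three ConvBlocks, global average pooling and
    a fully connected layer producing a confidence score in [0, 1] for the
    predicted distortion label of a patch."""
    def __init__(self, feat_ch=16, label_ch=3, width=16):
        super().__init__()
        in_ch = feat_ch + label_ch  # D2 features concatenated with the label map (assumed)
        self.body = nn.Sequential(ConvBlock(in_ch, width),
                                  ConvBlock(width, width),
                                  ConvBlock(width, width))
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(width, 1)

    def forward(self, d2_feats, label_map):
        x = torch.cat([d2_feats, label_map], dim=1)
        x = self.pool(self.body(x)).flatten(1)
        return torch.sigmoid(self.fc(x))  # patch-wise confidence score
```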

III-B Loss for QuDeC Network

In image restoration tasks, the maximum-a-posteriori (MAP) approach is often used to optimize the network parameters $\theta$ as follows,

$$\theta^{*} = \arg\max_{\theta} \; P(x \mid y; \theta), \quad (2)$$

where $P$ is the probability function and $f_{\theta}$ represents the QuDeC network, $(\hat{x}, \hat{q}) = f_{\theta}(y)$. Since QuDeC performs the joint task of estimating $\hat{x}$ and $\hat{q}$, the above optimization can be updated as follows,

$$\theta^{*} = \arg\max_{\theta} \; P(x, q \mid y; \theta).$$

To find the optimal network parameters $\theta^{*}$, $P(x, q \mid y; \theta)$ needs to be maximized. For simplicity, to solve this optimization problem, let us assume that $P(x \mid y; \theta)$ and $P(q \mid y; \theta)$ are Gaussian distributions. As our goal is to minimize the joint errors between the network outputs and the ground-truth clean image $x$ and ground-truth labels $q$ of the rainy image $y$, we denote the mean of the distribution $P(x \mid y; \theta)$ as $\hat{x}$ and its variance as $\sigma_{x}^{2}$, and the mean of $P(q \mid y; \theta)$ as $\hat{q}$ and its variance as $\sigma_{q}^{2}$. Thus our objective becomes,

$$\theta^{*} = \arg\max_{\theta} \; \log P(x \mid y; \theta) + \log P(q \mid y; \theta). \quad (3)$$

Substituting $P(x \mid y; \theta) = \mathcal{N}(x; \hat{x}, \sigma_{x}^{2})$ and $P(q \mid y; \theta) = \mathcal{N}(q; \hat{q}, \sigma_{q}^{2})$ into the above equation, we get

$$\theta^{*} = \arg\min_{\theta} \; \frac{1}{2\sigma_{x}^{2}} \|x - \hat{x}\|^{2} + \log \sigma_{x} + \frac{1}{2\sigma_{q}^{2}} \|q - \hat{q}\|^{2} + \log \sigma_{q}, \quad (4)$$

where $\hat{x} = y - \hat{r}$ and $\hat{q}$ is the estimated location-quality-label-map. In (4), the variances $\sigma_{x}^{2}$ and $\sigma_{q}^{2}$ can be inferred in two ways, as explained in [29, 30]: (i) epistemic uncertainty, which is explained as the model uncertainty given enough data to train, and (ii) aleatoric uncertainty, which captures noise inherent in the observations and is data dependent. Epistemic uncertainty can be formulated as variational inference to compute the variance. Aleatoric uncertainty can be formulated as MAP (maximum-a-posteriori) or ML (maximum-likelihood) inference. In our method, we attempt to address the uncertainty caused in the outputs by different properties of rain streaks, such as density, direction, and their effect on the background scene, which are inherent in rainy images. Following the ML inference, in equation (4) we can view the terms $\frac{1}{2\sigma_{x}^{2}}$ and $\frac{1}{2\sigma_{q}^{2}}$ as the corresponding location-based confidence maps or scores. These confidence maps indicate the erroneous regions in the estimates of the residual maps or location-quality-label-maps by giving low values to those regions or pixels; such errors occur at regions or pixels where the variance is high. Note that in our method the confidence maps have no ground truth. We compute these confidence scores using CN (Confidence Network) and LCN (Label Confidence Network), as explained in the earlier sections. Note that the values in the confidence maps at every position lie in the range $[0, 1]$. Additionally, the L2-norm on the label term in (4), used for classifying the distortion levels in the location-quality-map $\hat{q}$, is replaced with the cross-entropy loss. The residual maps are estimated in a multi-scale fashion. Thus, the loss is constructed as follows,

$$\mathcal{L}_{d} = \sum_{s \in \{1.0,\,0.5,\,0.25\}} \Big( \big\| c_{s} \odot (\hat{r}_{s} - r_{s}) \big\|_{2}^{2} - \lambda_{1} \log \bar{c}_{s} \Big) \; + \; \sum_{p \in \mathcal{P}} \Big( \hat{c}_{q}(p)\, \ell_{ce}\big(\hat{q}(p), q(p)\big) - \lambda_{2} \log \hat{c}_{q}(p) \Big), \quad (5)$$

where $r_{s}$ is the ground-truth residual map at scale $s$, $\bar{c}_{s}$ is the mean of the confidence map $c_{s}$, $\ell_{ce}$ is the cross-entropy loss, $\lambda_{1}$ and $\lambda_{2}$ are weighting constants, and $\mathcal{P}$ is the set of patches in the rainy image $y$. Inspired by the importance of the perceptual loss in many image restoration tasks [31, 12], we also use it to further improve the visual quality of the de-rained images. Features from a layer of a pretrained VGG-16 network [32] are used to compute the perceptual loss [33, 34]. Let $F(\cdot)$ denote the features obtained using the VGG-16 model [32]; then the perceptual loss is defined as follows

$$\mathcal{L}_{p} = \frac{1}{C_{F} H_{F} W_{F}} \big\| F(\hat{x}) - F(x) \big\|_{2}^{2}, \quad (6)$$

where $C_{F}$ is the number of channels of $F(\cdot)$, and $H_{F}$ and $W_{F}$ are the height and width of the feature maps. The overall loss used to train the QuDeC network is,

$$\mathcal{L} = \mathcal{L}_{d} + \lambda_{3} \mathcal{L}_{p}. \quad (7)$$
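A hedged PyTorch sketch of this training objective follows; it reflects one possible reading of Eqs. (4)-(7), not the exact implementation. The function and argument names (quedec_loss, vgg_features, etc.) are placeholders, and the reduction choices (means over pixels and patches) are assumptions.

```python
import torch
import torch.nn.functional as F

def quedec_loss(r_hats, cs, r_gts, q_logits, c_q, q_gt,
                x_hat, x_gt, vgg_features,
                lambda1=0.1, lambda2=0.1, lambda3=1.0, eps=1e-6):
    """Sketch of the QuDeC training loss.

    r_hats, cs, r_gts : lists of residual estimates, confidence maps and
                        ground-truth residuals at scales 1.0, 0.5 and 0.25.
    q_logits, c_q     : patch-wise distortion logits and their confidence scores.
    q_gt              : ground-truth distortion labels per patch (long tensor).
    vgg_features      : frozen feature extractor used for the perceptual loss.
    """
    loss = 0.0
    # Multi-scale confidence-weighted residual loss: errors at low-confidence
    # locations are down-weighted, and -log(c) discourages trivially low confidence.
    for r_hat, c, r_gt in zip(r_hats, cs, r_gts):
        loss = loss + ((c * (r_hat - r_gt)) ** 2).mean() \
                    - lambda1 * torch.log(c + eps).mean()

    # Confidence-weighted cross entropy over the location-quality labels.
    ce = F.cross_entropy(q_logits, q_gt, reduction='none')
    loss = loss + (c_q * ce).mean() - lambda2 * torch.log(c_q + eps).mean()

    # Perceptual loss on the de-rained image, Eq. (6) (mse_loss averages over C*H*W).
    loss = loss + lambda3 * F.mse_loss(vgg_features(x_hat), vgg_features(x_gt))
    return loss
```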

IV QuDeC Training

The QuDeC network is trained using the synthetic image datasets created by the authors of [1, 23, 3]. The dataset in [1] consists of 12000 images with different rain levels such as low, medium and high. The dataset in [23] contains 700 training images. The dataset in [3] contains 1800 rainy images for training. We generate the ground truth location-quality-label-maps, which indicate the distortion levels of the background in rainy images, using the NIQE scores. Figure 10 shows the histogram of NIQE scores corresponding to the patches of rainy images from the DIDMDN training dataset [1]. We divide the patches into three levels of distortion using the thresholds $\tau_{1}$ and $\tau_{2}$. These thresholds are chosen such that the patches are divided into three equal groups. The following pseudo code summarizes the procedure used for generating the location-quality-label-maps using the NIQE scores.

  Input: Rainy image $y$, patches $p$ in $y$ of a fixed size.
  Output: Location-quality-label-map $q$.
  for each patch $p$ in $y$ do
     if NIQE($p$) < $\tau_{1}$ then
        $q(p) \leftarrow$ low distortion
     else if $\tau_{1} \leq$ NIQE($p$) < $\tau_{2}$ then
        $q(p) \leftarrow$ medium distortion
     else if NIQE($p$) $\geq \tau_{2}$ then
        $q(p) \leftarrow$ high distortion
     end if
  end for
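The labeling procedure can be written in Python roughly as follows. The niqe callable is a placeholder for any NIQE implementation, and the patch-iteration details (non-overlapping patches, integer labels 0/1/2 for low/medium/high distortion) are assumptions consistent with the pseudo code above.

```python
import numpy as np

def location_quality_label_map(rainy_img, niqe, patch_size, tau1, tau2):
    """Assign a distortion label (0: low, 1: medium, 2: high) to every patch.

    `niqe` is a callable returning the NIQE score of an image patch (any NIQE
    implementation can be plugged in). A higher NIQE score indicates a more
    distorted patch. tau1 < tau2 are the dataset-specific thresholds.
    """
    h, w = rainy_img.shape[:2]
    rows, cols = h // patch_size, w // patch_size
    q = np.zeros((rows, cols), dtype=np.int64)
    for i in range(rows):
        for j in range(cols):
            patch = rainy_img[i * patch_size:(i + 1) * patch_size,
                              j * patch_size:(j + 1) * patch_size]
            score = niqe(patch)
            if score < tau1:
                q[i, j] = 0      # low distortion
            elif score < tau2:
                q[i, j] = 1      # medium distortion
            else:
                q[i, j] = 2      # high distortion
    return q

# tau1 and tau2 can be chosen as the tertiles of the NIQE scores of all training
# patches, so that the three classes are (roughly) equally populated:
# tau1, tau2 = np.percentile(all_scores, [33.3, 66.7])
```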

Note that the NIQE scores [17] are obtained using various features which may not be specifically useful for computing the distortion levels caused by rain streaks. Furthermore, the thresholds $\tau_{1}$ and $\tau_{2}$ may vary from one dataset to another. In order to deal with these issues, we train a Generation-of-Labels Network (GLN) to automatically generate the location-quality-label-maps from the input rainy images. The GLN network is trained using pairs of rainy patches and their labels from the DIDMDN training images. We provide the network architecture and training details of GLN in the appendix. The label maps estimated by GLN, as well as the clean image $x$ along with the rainy image $y$, are used to train the proposed QuDeC network.


Fig. 10: Histogram of NIQE values corresponding to the patches from the DIDMDN training images.

IV-A Test Datasets

The QuDeC network is tested on synthetic and real-world rainy images published by the authors of [1, 23, 3]. The DIDMDN synthetic test dataset consists of two subsets, Test-1 and Test-2, containing 1000 and 1200 images, respectively [1]. The Rain800 dataset shared by the authors of [23] contains 100 synthetic rainy images for testing. The Rain200H dataset contains 200 heavy rain synthetic images provided by the authors of [3]. In addition to the synthetic images, we use 100 real-world rainy images provided by the authors of [1, 23, 3] to show the qualitative performance of QuDeC.

Fig. 11: Rain-streak removal results on sample synthetic images with different rain levels (low, medium and heavy) and directions. Columns (left to right): Rainy Image, DDN [2] (CVPR'17), RESCAN [28] (ECCV'18), DID-MDN [1] (CVPR'18), QuDeC (ours), Ground Truth.
Dataset | Testset | Fu et al. [15] (TIP'17) | DDN [2] (CVPR'17) | JORDER [3] (CVPR'17) | RESCAN [28] (ECCV'18) | DID-MDN [1] (CVPR'18) | UMRL + cycle spinning [16] (CVPR'19) | QuDeC (ours)
DIDMDN [1] | Test-1 | 22.07/0.84 | 27.33/0.90 | 24.32/0.86 | 27.19/0.87 | 27.95/0.91 | 29.77/0.92 | 30.43/0.93
DIDMDN [1] | Test-2 | 19.73/0.83 | 25.63/0.88 | 22.26/0.84 | 25.65/0.88 | 26.08/0.90 | 26.67/0.92 | 26.72/0.92
Rain800 [23] | Rain800 | 18.95/0.78 | 21.33/0.80 | 21.13/0.81 | 24.37/0.84 | 23.57/0.87 | 24.52/0.86 | 24.61/0.86
JORDER [3] | Rain200H | 21.83/0.81 | 23.24/0.83 | 24.02/0.86 | 25.97/0.90 | 23.43/0.86 | 26.38/0.92 | 26.74/0.93

In the original table, color highlighting indicates the best, second best, and third best performance among the de-raining methods on each test set.

TABLE I: Quantitative results evaluated in terms of average PSNR (dB) / SSIM.

IV-B Training Details

The rainy image, clean image and distortion label map triplets are used to train QuDeC using the loss $\mathcal{L}$ in (7). The Adam optimizer with a batch size of 1 is used to train the network. The learning rate is set equal to 0.0002 for the first 20 epochs and 0.0001 for the remaining epochs. During training, $\lambda_{1}$, $\lambda_{2}$ and $\lambda_{3}$ are initially set equal to 0.1, 0.1 and 1.0, respectively, but when the mean of all values in the confidence maps $c_{s}$ and $\hat{c}_{q}$ is greater than 0.8, the corresponding confidence weight is set equal to 0.03. QuDeC is trained for 60 epochs. During inference, given a rainy image $y$, QuDeC estimates the residual map and its corresponding confidence map at three different scales (1.0, 0.5 and 0.25 of the input size) along with the location-quality-label-map $\hat{q}$.
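The confidence-driven weight schedule mentioned above could look like the following sketch; the 0.8 threshold and the reduced value of 0.03 come from the text, while applying the rule through a helper of this form is an assumption.

```python
import torch

def confidence_weight(conf_maps, base_weight=0.1, reduced_weight=0.03, threshold=0.8):
    """Return the weight of the -log(confidence) terms for the current step.

    `conf_maps` is an iterable of confidence tensors (the multi-scale residual
    confidence maps and the label confidence scores). Once their overall mean
    exceeds `threshold`, the weight is lowered so that the network is no longer
    pushed as hard towards reporting high confidence everywhere.
    """
    mean_conf = torch.stack([c.mean() for c in conf_maps]).mean().item()
    return reduced_weight if mean_conf > threshold else base_weight
```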

Fig. 12: De-raining results on sample real-world images.

V Experimental Results

In this section, we evaluate the performance of our method on both synthetic and real images. The Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) [35] measures are used to compare the performance of different methods on synthetic images. We visually inspect the performance of different methods on real images, as we do not have the corresponding ground truth clean images. The performance of the proposed QuDeC method is compared against the following recent state-of-the-art methods:
(a) Fu et al.[15] CNN method (TIP’17),
(b) Joint Rain Detection and Removal (JORDER) [3] (CVPR’17),
(c) Deep detailed Network (DDN)[2] (CVPR’17),
(d) Density-aware Image De-raining method using a Multistream Dense Network (DID-MDN) [1] (CVPR’18),
(e) REcurrent SE Context Aggregation Net (RESCAN) [28] (ECCV’18)
(f) Uncertainty guided Multi-scale Residual Learning (UMRL) network [16] (CVPR’19).

V-A Results on Synthetic Test Images

The proposed QuDeC method, combined with cycle spinning [16], is compared against the state-of-the-art algorithms qualitatively and quantitatively. Table I shows the quantitative performance of our method. As can be seen from this table, our method clearly outperforms the present state-of-the-art image de-raining algorithms. On average, QuDeC outperforms methods like RESCAN [28] and DIDMDN [1] by more than 1 dB. Furthermore, QuDeC outperforms the state-of-the-art method, UMRL+cycle spinning [16], by 0.3 dB on average. Figure 11 shows the qualitative performance of QuDeC against other methods on synthetic rainy images. The DDN method over de-rains some images and slightly under de-rains others, as shown in the second column of Figure 11. The first four rows show the under de-raining of RESCAN [28] and DIDMDN [1] in the third and fourth columns of Figure 11, respectively, where we can observe residual rain streaks in their outputs. The last three rows show the over de-raining of RESCAN [28] and DIDMDN [1] in the third and fourth columns of Figure 11, respectively, where we can observe that the edges of objects are blurred and objects like cables or wires and the texture of birds' feathers have disappeared. Visually, we can see in the fifth column of Figure 11 that our method produces images without any artifacts. From Figure 11 we can see that our method is able to

  • recover the texture on the wooden wall in the first row,

  • produce clear objects with sharp edges in the fourth and fifth images of the fifth column,

  • remove rain streaks while maintaining the underlying textures on the trees and on the feathers in the third and sixth images of the fifth column.

Fig. 13: Results of ablation study on synthetic images.

V-B Results on Real-World Rainy Images

We conducted experiments on the real-world images provided by the authors of [23, 3, 1]. Results are shown in Figure 12. Similar to the results obtained on synthetic images, we observe the same trend of either over de-raining or under de-raining by the other methods. On the other hand, our method is able to remove rain streaks while preserving the details of objects in the output images. For example, the background texture on the tree is sharp when compared to other methods. Also, rain streaks are removed properly without losing the background information of the walls and flowers in the second, third and fourth images of the fifth column in Figure 12. All of these experiments clearly show that our method can handle different levels of rain (low, medium and high) with different shapes and scales.

V-C Ablation Study

We study the contribution of each block to the QuDeC network by conducting extensive experiments on the test datasets. We start with Encoder-Decoder D1, which is similar to the UMRL [16] network and is trained as explained in the UMRL [16] method. We then add Decoder D2, calling the resultant network QuDeC w/o LCN, and supervise the output of D2 with the cross-entropy loss using the ground-truth maps $q$. Finally, we add LCN to Decoder D2 to construct our proposed network QuDeC. Table II shows the contribution of each block to the performance of the QuDeC network. The addition of Decoder D2, i.e., formulating the joint task of computing the distortion level at each location and estimating the residual rain streak component, improves the overall performance by approximately 0.37 dB. Furthermore, introducing LCN to Decoder D2 improves the performance of QuDeC by 0.2 dB. Figure 13 visually shows the improvement in performance after adding each block in constructing the QuDeC network. For example, we can clearly observe from Figure 13 that QuDeC is able to reconstruct clear skies and dark backgrounds in the second and third images of the fourth column. QuDeC is also able to reconstruct sharper objects compared to the outputs of the network with only Encoder-Decoder D1 in the first and last rows of Figure 13.

Dataset | Testset | UMRL [16] (Encoder-Decoder D1) | QuDeC w/o LCN | QuDeC
DIDMDN [1] | Test-1 | 29.42/0.91 | 30.17/0.92 | 30.43/0.93
DIDMDN [1] | Test-2 | 26.47/0.91 | 26.58/0.91 | 26.72/0.92
Rain800 [23] | Rain800 | 24.19/0.85 | 24.42/0.86 | 24.61/0.86
JORDER [3] | Rain200H | 26.17/0.91 | 26.56/0.92 | 26.74/0.93

TABLE II: PSNR (dB) / SSIM results corresponding to the ablation study.

VI Conclusion

We proposed a novel network, QuDeC, to address the single image de-raining problem. In our approach, we formulate the rain removal problem as a joint task of computing the distortion level at each location and estimating the residual rain streak information. We judiciously combine the residual rain streak outputs at lower scales and the distortion level information at each location using the corresponding confidence maps. Extensive experiments showed that QuDeC is robust enough to handle different levels of rain content in both synthetic and real-world rainy images.

Appendix A Generation-of-Labels Network (GLN) Architecture


Fig. 14: Generation-of-Labels Network (GLN).

The Generation-of-Labels Network (GLN) is used to generate the ground-truth location-quality-label-maps $q$. We use residual blocks (ResBlock) as the building module for the GLN network. The GLN network consists of a sequence of eight ResBlocks, as shown in Figure 14. A ResBlock consists of three convolution layers, the last of which has a dilation factor of 2, as shown in Figure 14. Given $y$, the GLN network processes one patch $p$ at a time and outputs the distortion quality label for the corresponding patch. GLN is trained on the rainy image patches and the corresponding labels $q(p)$, generated as explained in Section IV. GLN is trained using the cross-entropy loss with the Adam optimizer, and the learning rate is set equal to 0.0002.
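A possible PyTorch sketch of GLN is given below. The eight-ResBlock depth and the dilation factor of 2 on the last convolution follow the description above, whereas the kernel sizes, the channel width, the residual skip connection, and the global-average-pool classifier head are assumptions, since the text leaves them unspecified.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """GLN building block: three convolutions, the last one dilated by 2,
    wrapped in a residual (skip) connection. Kernel sizes are assumed."""
    def __init__(self, channels=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=2, dilation=2),
        )

    def forward(self, x):
        return x + self.body(x)


class GLN(nn.Module):
    """Eight ResBlocks followed by a classifier that predicts one of the
    three distortion labels for an input rainy patch."""
    def __init__(self, num_classes=3, channels=32):
        super().__init__()
        self.head = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        self.blocks = nn.Sequential(*[ResBlock(channels) for _ in range(8)])
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(channels, num_classes)

    def forward(self, patch):
        feats = self.pool(self.blocks(self.head(patch))).flatten(1)
        return self.fc(feats)  # class logits for the patch
```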

Acknowledgment

This research is based upon work supported by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via IARPA R&D Contract No. 2019-19022600002. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the ODNI, IARPA, or the U.S. Government.

References

  • [1] H. Zhang and V. M. Patel, “Density-aware single image de-raining using a multi-stream dense network,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
  • [2] X. Fu, J. Huang, D. Zeng, X. Ding, Y. Liao, and J. Paisley, “Removing rain from single images via a deep detail network,” In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1715–1723, 2017.
  • [3] W. Yang, R. T. Tan, J. Feng, J. Liu, and S. Yan, “Deep joint rain detection and removal from a single image,” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1357–1366, 2017.
  • [4] X. Zhang, H. Li, Y. Qi, W. K. Leow, and T. K. Ng, “Rain removal in video by combining temporal and chromatic properties,” In: IEEE International Conference on Multimedia and Expo, pp. 461–464, 2006.
  • [5] K. Garg and S. K. Nayar, “Vision and rain,” In: International Journal of Computer Vision, vol. 75, pp. 3–27, 2007.
  • [6] V. Santhaseelan and V. Asari, “Utilizing local phase information to remove rain from video,” In: International Journal of Computer Vision, vol. 112, 2015.
  • [7] J. Liu, W. Yang, S. Yang, and Z. Guo, “Erase or fill? deep joint recurrent rain removal and reconstruction in videos,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
  • [8] M. Li, Q. Xie, Q. Zhao, W. Wei, S. Gu, J. Tao, and D. Meng, “Video rain streak removal by multiscale convolutional sparse coding,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
  • [9] Y. Li, R. T. Tan, X. Guo, J. Lu, and M. S. Brown, “Rain streak removal using layer priors,” In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2736–2744, 2016.
  • [10] L. Zhu, C. W. Fu, D. Lischinski, and P. A. Heng, “Joint bi-layer optimization for single-image rain streak removal,” In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2536–2534, 2017.
  • [11] L. W. Kang, C. W. Lin, and Y. H. Fu, “Automatic single-image-based rain streaks removal via image decomposition,” IEEE Transactions on Image Processing, vol. 21, pp. 1742–1755, 2012.
  • [12] H. Zhang and V. M. Patel, “Convolutional sparse and low-rank coding-based rain streak removal,” in IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1259–1267, 2017.
  • [13] Y. Luo, Y. Xu, and H. Ji, “Removing rain from a single image via discriminative sparse coding,” in IEEE International Conference on Computer Vision (ICCV), pp. 3397–3405, 2015.
  • [14] Y. Chen and C. Hsu, “A generalized low-rank appearance model for spatio-temporally correlated rain streaks,” in 2013 IEEE International Conference on Computer Vision, Dec 2013, pp. 1968–1975.
  • [15] X. Fu, J. Huang, X. Ding, Y. Liao, and J. Paisley, “Clearing the skies a deep network architecture for single-image rain removal,” IEEE Transactions on Image Processing, vol. 26, pp. 2944–2956, 2017.
  • [16] R. Yasarla and V. M. Patel, “Uncertainty guided multi-scale residual learning-using a cycle spinning cnn for single image de-raining,” In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  • [17] A. Mittal, R. Soundararajan, and A. C. Bovik, “Making a “completely blind” image quality analyzer,” IEEE Signal Processing Letters, vol. 20, no. 3, pp. 209–212, 2012.
  • [18] D.-Y. Chen, C.-C. Chen, and L.-W. Kang, “Visual depth guided color image rain streaks removal using sparse coding,” IEEE Transactions on Circuits and Systems for Video Technology, pp. 1430–1455, 2014.
  • [19] D. A. Huang, L. W. Kang, Y. C. F. Wang, and C. W. Lin, “Self-learning based image decomposition with applications to single image denoising,” IEEE Transactions on multimedia, vol. 16, pp. 83–93, 2014.
  • [20] J. Mairal, F. Bach, J. Ponce, and G. Sapiro, “Online dictionary learning for sparse coding,” in International Conference on Machine Learning (ICML), pp. 689–696, 2009.
  • [21] D. A. Reynolds, T. F. Quatieri, and R. B. Dunn, “Speaker verification using adapted gaussian mixture models,” Digital Signal Processing, vol. 10, pp. 19–41, 2000.
  • [22] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma, “Robust recovery of subspace structures by low-rank representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, pp. 171–184, 2013.
  • [23] H. Zhang and V. M. Patel, “Image de-raining using a conditional generative adversarial network,” arXiv preprint arXiv:1701.05957, 2017.
  • [24] Y. Wang, S. Liu, C. Chen, and B. Zeng, “A hierarchical approach for rain or snow removing in a single color image,” IEEE Transactions on Image Processing, vol. 26, pp. 3936–3950, 2017.
  • [25] R. Qian, R. T. Tan, W. Yang, J. Su, and J. Liu, “Attentive generative adversarial network for raindrop removal from a single image,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
  • [26] X. Wang, R. Girshick, A. Gupta, and K. He, “Non-local neural networks,” CVPR, 2018.
  • [27] T. Xu, P. Zhang, Q. Huang, H. Zhang, Z. Gan, X. Huang, and X. He, “Attngan: Fine-grained text to image generation with attentional generative adversarial networks,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
  • [28] X. Li, J. Wu, Z. Lin, H. Liu, and H. Zha, “Recurrent squeeze-and-excitation context aggregation net for single image deraining,” In: European Conference on Computer Vision(ECCV), pp. 262–277, 2018.
  • [29] A. Kendall and Y. Gal, “What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?” in Advances in Neural Information Processing Systems 30 (NIPS), 2017.
  • [30] A. Kendall, Y. Gal, and R. Cipolla, “Multi-task learning using uncertainty to weigh losses for scene geometry and semantics,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
  • [31] J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” In European Conference on Computer Vision(ECCV), pp. 694–711, 2016.
  • [32] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv:1409.1556, 2014.
  • [33] J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in European Conference on Computer Vision (ECCV), 2016.
  • [34] H. Zhang and K. Dana, “Multi-style generative network for real-time transfer,” arXiv preprint arXiv:1703.06953, 2017.
  • [35] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, April 2004.