Confidence Measure Guided Single Image Deraining
Abstract
Single image deraining is an extremely challenging problem since rainy images contain rain streaks which often vary in size, direction and density. This varying characteristic of rain streaks affects different parts of the image differently. Previous approaches have attempted to address this problem by leveraging some prior information to remove rain streaks from a single image. One of the major limitations of these approaches is that they do not consider the location information of rain drops in the image. The proposed image Quality-based single image Deraining network using a Confidence measure (QuDeC) addresses this issue by learning the quality or distortion level of each patch in the rainy image, and further processes this information to learn the rain content at different scales. In addition, we introduce a technique which guides the network to learn the network weights based on the confidence measure about the estimates of both the quality at each location and the residual rain streak information (residual map). Extensive experiments on synthetic and real datasets demonstrate that the proposed method achieves significant improvements over recent state-of-the-art methods.
I Introduction
Many practical computer vision-based systems such as surveillance and autonomous driving often require processing and analysis of videos and images captured under adverse weather conditions such as rain, snow and haze. These weather-based conditions adversely affect the visual quality of images and, as a result, often degrade the performance of vision systems. Hence, it is important to develop algorithms that can automatically remove these artifacts before they are fed to a vision-based system for further processing.
In this paper, we address the problem of removing rain streaks from a single rainy image. Rain streak removal or image deraining is a difficult problem since a rainy image may contain rain streaks which may vary in size, direction and density. A number of different techniques have been developed in the literature to address this problem. These algorithms can be clustered into two main groups: (i) video-based algorithms [4, 5, 6, 7, 8], and (ii) single image-based algorithms [1, 2, 9, 3, 10]. Algorithms corresponding to the first category assume temporal consistency among the image frames, and use this assumption for deraining. On the other hand, single image deraining methods attempt to use some prior information to remove rain components from a single image [9, 11, 10, 1]. Priors such as sparsity [12, 13] and low-rank representation [14] have been used in the literature. In particular, the method proposed by Fu et al. [15] uses a priori image domain knowledge by focusing on high-frequency details during training to improve the deraining performance. However, it was shown in [1] that this method tends to remove some important parts in the derained image (see Figure 1(e)). Similarly, a recent work by Zhang and Patel [1] uses image-level priors to estimate the rain density information, which is then used for deraining. Although their approach provides state-of-the-art results, the image-level priors they estimate do not consider the location information of rain drops in the image. As a result, their algorithm tends to introduce some artifacts in the final derained images. These artifacts can be clearly seen from the derained results shown in Figure 1(b).
In our previous work [16] we proposed a different approach to image deraining where we made use of the observation that rain streak density and direction do not change drastically across scales. Rather than relying on the rain density information (i.e. heavy, medium or light) alone, we developed a method in which the rain streak location information is also taken into consideration in a multi-scale fashion to improve the deraining performance. In this paper, we extend our work in [16] by also incorporating the distortion level information at each location in the lower scales of the image to further improve the deraining performance. To this end, we use the Natural Image Quality Evaluator (NIQE) no-reference image quality score [17] and formulate a joint task of estimating the level of distortion at each spatial location of the image and computing rain streak information to reconstruct the clean image. This is achieved by adding a decoder network, which computes the level of distortion at each location, to our UMRL approach in [16]. This additional decoder network labels the distortion at each location into three classes and also computes the confidence scores for each patch (which indicate how confident the decoder is about those obtained labels), as shown in Figure 2. As a result, this can be interpreted as soft labelling of the distortion levels in the image. As shown in Figure 2, different patches in the image are labelled with different confidence scores. This makes the proposed QuDeC network more robust by blocking the errors in the estimated levels of distortion while estimating the derained image. In addition, we propose a novel loss function to train the network. In summary, we add the following novelties to our previous UMRL approach in [16].

We formulate a joint task of estimating the level of distortion at each spatial location of the image and computing rain streak information by adding a new decoder to UMRL [16].

A new loss function is proposed for estimating the confidence scores for both the residual maps and the distortion quality-label maps.
Fig. 1(c), (f) and (i) present sample results from our QuDeC network, where one can clearly see that QuDeC is able to remove the noise artifacts and provides better results compared to [2, 3] and [1]. To summarize, the following are our key contributions in this paper.

A novel method called QuDeC is proposed which jointly computes the level of distortion at each location and estimates rain streak information in a multi-scale fashion to obtain the derained image.

Confidence scores are computed for each task and judiciously combined to obtain improved results.

We run extensive experiments to show the performance of QuDeC against several recent state-of-the-art approaches on both synthetic and real rainy images. Furthermore, an ablation study is conducted to demonstrate the effectiveness of different parts of the proposed QuDeC network.
The rest of the paper is organized as follows. In Section II, we review a few related works. Details of the proposed method are given in Section III. Training details are provided in Section IV. Experimental results are presented in Section V, and finally, Section VI concludes the paper with a brief summary.
II Background and Related Work
An observed rainy image $y$ can be modeled as the superposition of a rain component $r$ (i.e., the residual map) with a clean image $x$ as follows,

$y = x + r$.   (1)
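The additive observation model in (1) can be sketched with toy arrays standing in for real images; if a network recovers the residual map exactly, subtracting it from the observation returns the clean image.

```python
import numpy as np

# Observation model of Eq. (1): rainy image y = clean image x + residual
# rain-streak map r. Small random arrays stand in for real images.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(4, 4, 3))   # clean image
r = rng.uniform(0.0, 0.2, size=(4, 4, 3))   # rain residual map
y = x + r                                   # observed rainy image

# Deraining by residual estimation: subtracting a perfect residual
# estimate from y recovers the clean image.
x_hat = y - r
assert np.allclose(x_hat, x)
```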
Given the observed rainy image $y$, the goal of image deraining is to estimate the clean image $x$. This can be done either by estimating the residual map $r$ and then subtracting it from the observed image, or by directly estimating $x$ from $y$. Various methods have been proposed in the literature for image deraining [11, 18, 13, 19, 9], including dictionary learning-based [20], Gaussian mixture model (GMM) based [21], and low-rank representation based [22] methods. In recent years, deep learning-based single image deraining methods have also been proposed. Fu et al. [15] proposed a convolutional neural network (CNN) based approach in which they directly learn the mapping relationship between rainy and clean image detail layers from data. Zhang et al. [23] proposed a generative adversarial network (GAN) based method for image deraining. Furthermore, to minimize the artifacts introduced by GANs and ensure better visual quality, a new refined loss function was also introduced in [23]. Fu et al. [2] presented an end-to-end deep learning framework for removing rain from individual images using a deep detail network which directly reduces the mapping range from input to output. Zhang and Patel [1] proposed a density-aware multi-stream densely connected CNN for joint rain density estimation and deraining. Their network automatically determines the rain-density information and then efficiently removes the corresponding rain streaks using the estimated rain-density label. Note that the methods proposed in [2] and [1] showed the benefits of using multi-scale networks for image deraining. Recently, Wang et al. [24] proposed a hierarchical approach based on estimating different frequency details of an image to obtain the derained image. The method proposed by Qian et al. [25] generates attentive maps using recurrent neural networks, and then uses the features from different scales to compute the loss for removing rain drops on glasses.
Note that this method was specifically designed for removing rain drops from glass rather than removing rain streaks from an image. The works in [26, 27, 28] illustrated the importance of attention-based methods in low-level vision tasks. In a recent work [28], Li et al. proposed a convolutional and recurrent neural network-based method for single image deraining which makes use of contextual information for rain removal. It was observed in [1] that some of the recent deep learning-based methods tend to under-derain or over-derain the image if the rain condition present in the rainy image is not properly considered during training.
III Proposed Method
Unlike many deep learning-based methods that directly estimate the derained image from the noisy observation, we take a different approach where we formulate the rain removal problem as the joint task of computing the distortion level at each location and estimating the residual rain streak component (i.e., the residual map). We compute the distortion level for fixed-size patches and classify them into three distortion classes. Note that the ground-truth labels for these patches are computed using the NIQE measure [17]. In this way, the distortion level at each location in the rainy image is represented by the location-quality-label-map $b$. Representing the distortion level at each location, and giving it as prior information while reconstructing the image, helps the network to learn rain streak information which may vary spatially or affect the background differently. From Figure 3 we can clearly observe that different levels of rain streaks affect the background differently. In some cases the same level of rain streaks has different effects on the background. For example, as can be seen from Figure 3, rain streaks produce different distortions in bright and heavily textured regions, while the distortion level is relatively high in dark and homogeneous regions. To handle these regions differently, we propose a method that jointly estimates the level of distortion at each location and uses it as prior information in estimating the rain streak information (i.e., the residual map). In contrast to other methods [1, 3] that use either rain streak binary masks or the level of rain streaks present in the image, we estimate the distortion caused by rain streaks at each location based on the NIQE measure, which makes our method more general and applicable to synthetic as well as real-world images.
To perform the joint task of computing the location-quality-label-map ($b$) and estimating the residual map ($\hat{r}$), QuDeC is constructed using one encoder and two decoders, as shown in Figure 4. A rainy image $y$ is first passed through an encoder network to obtain intermediate features. These features are then further processed by the decoders. In particular, Decoder D2 estimates the location quality labels, which are then given as prior information to Decoder D1 to estimate the residual maps ($\hat{r}$).
III-A QuDeC Network
Given a rainy image $y$, we estimate the residual map $\hat{r}$ and its corresponding confidence maps $\hat{c}$ at three different scales, $\hat{r}_{1.0}$ (at the original input size), $\hat{r}_{0.5}$ (at 0.5 scale of the input size), and $\hat{r}_{0.25}$ (at 0.25 scale of the input size), in a multi-scale fashion [16], as well as the location-quality-label-map $\hat{b}$ along with the corresponding confidence scores ($\hat{c}_{b}$) for the label map $\hat{b}$.
The QuDeC network aims to judiciously combine the pixel-wise rain residual information from the lower scales and the patch-wise distortion levels in the quality label maps while estimating the final derained image. To achieve this we compute confidence maps for the estimated residual maps and location-quality-label-maps. These confidence maps block errors in the computation by giving low confidence values to erroneous regions in the estimates of $\hat{r}$ and $\hat{b}$. Additionally, these confidence scores enable QuDeC to perform soft labeling of the distortion levels in the location-quality-label-map $\hat{b}$, which makes QuDeC more robust to location quality label errors and allows it to effectively use this information as a prior while estimating $\hat{r}$. To compute $\hat{b}$ and $\hat{c}_{b}$ we start with UMRL [16] and add Decoder D2 to obtain the network architecture of QuDeC, as shown in Figure 6.
III-A1 Encoder and Decoder D1
Rain streaks are high-frequency components, and existing deraining methods either tend to remove high frequencies that are not rain streaks or fail to remove the rain near high-frequency components of the clean image such as edges, as shown in Figure 5. To address this issue, one can use information about the locations in the image where the network might go wrong in estimating the residual value. This can be done by estimating a confidence value corresponding to each estimated residual value and guiding the network to remove the artifacts, especially near the edges. For example, we can clearly observe from Figure 5 that the residual map and its corresponding confidence map are able to capture the regions where there is a high probability of incorrect estimates. In the encoder and Decoder D1 networks we estimate the residual values and their corresponding confidence maps at different scales (1.0 ($\hat{r}_{1.0}$), 0.5 ($\hat{r}_{0.5}$) and 0.25 ($\hat{r}_{0.25}$)) of the input size. This information is then fed back to the subsequent layers so that the network can learn the residual value at each location, given the computed residual and confidence values at the lower scales.
The encoder and Decoder D1 networks are similar to the encoder and decoder networks of UMRL [16], where a convolutional block (ConvBlock, as shown in Figure 7(a)) is used as the building block. The encoder network is described as follows,
ConvBlock(3,32) → AvgPool → ConvBlock(32,32) → AvgPool → ConvBlock(32,32) → AvgPool → ConvBlock(32,32) → AvgPool,
where AvgPool is the average pooling layer, UpSample is the upsampling convolution layer, and ConvBlock(m, n) indicates a ConvBlock with m input channels and n output channels. The Decoder D1 network is described as follows,
ConvBlock(32,32) → UpSample → ConvBlock(64,32) → UpSample → ConvBlock(67,32) → UpSample → ConvBlock(67,16) → ConvBlock(16,16) → Conv2d,
where ReCoN networks are added to Decoder D1 to estimate the residual maps $\hat{r}_{s}$ at different scales $s$ and their corresponding confidence maps $\hat{c}_{s}$. Given feature maps as input to the ReCoN network, RN (residual network) estimates the residual map and CN (confidence network) computes the corresponding confidence map, as shown in Figure 7(d).
Feature maps at different scales are given as inputs to the Residual Network (RN) to estimate the residual map at the corresponding scale, as shown in Figure 7(d). RN consists of the following sequence of convolutional layers,
ConvBlock(64,32) → ConvBlock(32,32) → ConvBlock(32,3),
as shown in Figure 7(b). We use the estimated residual map and the feature maps as inputs to the Confidence map Network (CN) to compute the confidence measure at every pixel, which indicates how sure the network is about the residual value at that pixel. CN consists of the following sequence of convolutional layers,
ConvBlock(67,16) → ConvBlock(16,16) → ConvBlock(16,3),
as shown in Figure 7(c). Given the estimated residual maps and the corresponding feature maps as inputs, the confidence map network estimates $\hat{c}_{0.5}$ and $\hat{c}_{0.25}$. The element-wise product of $\hat{r}_{s}$ and $\hat{c}_{s}$ is computed and upsampled. This is used as an input to the subsequent layers of the Decoder D1 network in QuDeC, as shown in Figure 6, for $s \in \{0.5, 0.25\}$. Given the output residual map $\hat{r}_{1.0}$ and the feature maps of the final layer of the Decoder D1 network in QuDeC as input to CN, we get $\hat{c}_{1.0}$.
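A shape-level sketch of this data flow helps explain the 64- and 67-channel ConvBlock inputs listed above. The code below only tracks tensor shapes: `conv_block` is a stand-in that models just the channel change, and the 67-channel input is obtained under the assumption (consistent with the feedback described above) that the upsampled confidence-gated residual (3 channels) is concatenated with the 32-channel encoder skip and the 32-channel upsampled decoder features.

```python
import numpy as np

def conv_block(x, out_ch):
    # Stand-in for ConvBlock(in_ch, out_ch): keeps spatial size, maps channels.
    return np.zeros((out_ch,) + x.shape[1:])

def avg_pool(x):
    # 2x2 average pooling on a (channels, height, width) tensor.
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample(x):
    # Nearest-neighbour 2x upsampling (the real net uses a learned upsample).
    return x.repeat(2, axis=1).repeat(2, axis=2)

y = np.zeros((3, 64, 64))                   # rainy input image

# Encoder: ConvBlock(3,32) -> AvgPool, repeated for four stages.
f, skips = y, []
for _ in range(4):
    f = conv_block(f, 32)
    skips.append(f)                         # pre-pooling skip features
    f = avg_pool(f)

# Decoder D1, first upsampling stage.
d = upsample(conv_block(f, 32))             # ConvBlock(32,32) -> UpSample
d = np.concatenate([skips[3], d], axis=0)   # skip (32) + upsampled (32)
assert d.shape[0] == 64                     # matches ConvBlock(64,32)
d = conv_block(d, 32)

# ReCoN at this scale: RN -> residual map, CN -> confidence map in (0, 1].
r_hat = conv_block(d, 3)
c_hat = np.full(r_hat.shape, 0.9)

# Confidence-gated feedback: upsample(c ⊙ r) is appended to the next
# stage's 64-channel input, yielding the 67-channel ConvBlock input.
d = upsample(d)
d = np.concatenate([skips[2], d, upsample(c_hat * r_hat)], axis=0)
assert d.shape[0] == 67                     # matches ConvBlock(67,32)
```

The channel arithmetic (32 + 32 = 64 and 64 + 3 = 67) is the point of the sketch; the actual layers, of course, learn their weights.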
A Refinement Network (RFN) is used at the end of Decoder D1 to produce the derained image. It takes the output of Decoder D1 as input and generates $\hat{x}$ (i.e., the derained image) as the output. The RFN consists of the following blocks,
Conv2d → Conv2d → tanh(),
where Conv2d represents a 2D convolution layer.
III-A2 Decoder D2
Rain streaks have different distortion effects on different parts of the image.
As a result, while reconstructing the clean image, we also estimate prior information such as the distortion caused by rain streaks at every location. Not using this prior information may lead to inferior deraining performance. As can be seen from Figure 8, DIDMDN [1] and UMRL [16] do not perform well as they lack prior distortion level information at each location of the image. We address this by introducing Decoder D2 in our method and formulating the joint task of computing the distortion level at each location and estimating the residual rain streak information. Decoder D2 is similar to Decoder D1, with the following sequence of blocks,
ConvBlock(32,32) → UpSample → ConvBlock(64,32) → UpSample → ConvBlock(67,32) → UpSample → ConvBlock(67,16) → ConvBlock(16,16) → GlobalAveragePool → FullyConnectedLayer.
Decoder D2 takes the feature maps obtained from the encoder, as shown in Figure 6, to estimate the distortion levels in the location-quality-label-map $\hat{b}$. A Label Confidence Network (LCN) is used to compute the confidence scores $\hat{c}_{b}$ corresponding to the location-quality-label-maps. The feature maps from the last layer of D2 and the obtained location-quality-label-map are fed to LCN to compute the confidence scores, as shown in Figure 9. LCN is a sequence of three ConvBlocks followed by a global average pooling layer and fully connected layers, as shown in Figure 9.
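The soft-labeling effect of the label confidence scores can be illustrated as follows. Per-patch distortion labels are one-hot encoded and weighted by the LCN confidence, so a low-confidence patch contributes a weaker prior; the blending with a uniform distribution below is an illustrative choice for the sketch, not the paper's exact formulation.

```python
import numpy as np

def soft_label_prior(labels, conf, num_classes=3):
    # labels: per-patch distortion classes (0/1/2); conf: LCN scores in (0, 1].
    # A confident patch stays close to its one-hot label; an unconfident
    # patch is pulled toward a uniform (uninformative) prior, "blocking"
    # likely label errors before they reach Decoder D1.
    one_hot = np.eye(num_classes)[labels]           # (H, W, num_classes)
    uniform = np.full(one_hot.shape, 1.0 / num_classes)
    c = conf[..., None]
    return c * one_hot + (1.0 - c) * uniform

labels = np.array([[0, 2], [1, 1]])                 # 2x2 grid of patches
conf = np.array([[1.0, 0.4], [0.9, 0.1]])           # LCN confidence scores
prior = soft_label_prior(labels, conf)              # soft distortion prior
```

For instance, the fully confident patch keeps its hard label, while the patch with confidence 0.1 ends up close to uniform.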
III-B Loss for QuDeC Network
In image restoration tasks, the maximum-a-posteriori (MAP) method is often used to optimize the network parameters $\Theta$ as follows,

$\Theta^{*} = \arg\max_{\Theta} P(\Theta \mid y, x)$,   (2)

where $P$ is the probability function and $\hat{x} = f_{QuDeC}(y; \Theta)$ is the output of the QuDeC network. Since QuDeC performs the joint task of estimating $\hat{r}$ and $\hat{b}$, the above optimization can be updated as

$\Theta^{*} = \arg\max_{\Theta} P(\Theta \mid y, x, b)$.
To find the optimal network parameters $\Theta$, $P(\Theta \mid y, x, b)$ needs to be maximized. For simplicity in solving this optimization problem, let us assume $P(x \mid y)$ and $P(b \mid y)$ are Gaussian distributions. As our goal is to minimize the joint errors between $\hat{x}$ and the ground-truth clean image $x$, and between $\hat{b}$ and the ground-truth labels $b$ of the rainy image $y$, we denote the mean of the distribution $P(x \mid y)$ as $\hat{x}$ and its variance as $\sigma_{x}^{2}$, and the mean of the distribution $P(b \mid y)$ as $\hat{b}$ and its variance as $\sigma_{b}^{2}$. Thus our objective becomes,

$\Theta^{*} = \arg\max_{\Theta} \; \log P(x \mid y; \Theta) + \log P(b \mid y; \Theta)$.   (3)

Substituting the Gaussian densities $\mathcal{N}(x; \hat{x}, \sigma_{x}^{2})$ and $\mathcal{N}(b; \hat{b}, \sigma_{b}^{2})$ into the above equation, we obtain the equivalent minimization of

$\mathcal{L}(\Theta) = \frac{1}{2\sigma_{x}^{2}} \lVert x - \hat{x} \rVert_{2}^{2} + \log \sigma_{x} + \frac{1}{2\sigma_{b}^{2}} \lVert b - \hat{b} \rVert_{2}^{2} + \log \sigma_{b}$,   (4)
where $\hat{x} = y - \hat{r}$ is the derained estimate. In (4), the variances $\sigma_{x}^{2}$ and $\sigma_{b}^{2}$ can be inferred in two ways, as explained in [29, 30]: (i) epistemic uncertainty, which is the model uncertainty given enough data to train, and (ii) aleatoric uncertainty, which captures the noise inherent in the observations and is data dependent. Epistemic uncertainty can be formulated as variational inference to compute the variance. Aleatoric uncertainty can be formulated as MAP (maximum-a-posteriori) or ML (maximum-likelihood) inference. In our method we attempt to address the uncertainty caused in the outputs by different properties of rain streaks, such as density, direction, and effect on the background scene, which are inherent in rainy images. Following the ML inference, we can view the terms $\frac{1}{2\sigma_{x}^{2}}$ and $\frac{1}{2\sigma_{b}^{2}}$ in (4) as the corresponding location-based confidence maps or scores, $\hat{c}$ and $\hat{c}_{b}$. These confidence maps indicate the erroneous regions in the estimates of the residual maps or location-quality-label-maps by giving low values to those regions or pixels; these errors occur at regions or pixels where the variance is high. Note that in our method the confidence maps have no ground-truths. We compute these confidence scores using CN (Confidence Network) and LCN (Label Confidence Network), as explained in the earlier sections. Note that the values in the confidence map at every position lie in the range $(0, 1]$. Additionally, the L2-norm on $b$ in (4), which classifies the distortion levels in the location-quality-label-map, should be replaced with the cross-entropy loss. The residual maps are estimated in a multi-scale fashion. Thus the loss is constructed as follows,
$\mathcal{L}_{d} = \sum_{s \in \{1.0, 0.5, 0.25\}} \left( \lVert \hat{c}_{s} \odot (\hat{x}_{s} - x_{s}) \rVert_{2}^{2} - \lambda_{1} \log \hat{c}_{s} \right) + \sum_{p \in \Omega} \left( \hat{c}_{b}(p)\, CE(\hat{b}(p), b(p)) - \lambda_{2} \log \hat{c}_{b}(p) \right)$,   (5)

where $\hat{x}_{s} = y_{s} - \hat{r}_{s}$, $\lambda_{1}$ and $\lambda_{2}$ are constant weights, $CE$ denotes the cross-entropy loss, and $\Omega$ is the set of patches in the rainy image $y$. Inspired by the importance of the perceptual loss in many image restoration tasks [31, 12], we also use it to further improve the visual quality of the derained images. Features from a layer of the pre-trained VGG-16 network [32] are used to compute the perceptual loss [33, 34]. Let $F(\cdot)$ denote the features obtained using the VGG-16 model [32]; then the perceptual loss is defined as follows,
$\mathcal{L}_{p} = \frac{1}{C W H} \lVert F(\hat{x}) - F(x) \rVert_{2}^{2}$,   (6)

where $C$ is the number of channels, $H$ is the height, and $W$ is the width of the feature maps $F(\cdot)$. The overall loss used to train the QuDeC network is,

$\mathcal{L} = \mathcal{L}_{d} + \lambda_{3} \mathcal{L}_{p}$.   (7)
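The structure of this loss can be sketched in a few lines of numpy: a confidence-gated squared error with a $-\lambda \log \hat{c}$ barrier (which stops the network from driving all confidences to zero), a perceptual term normalised by channels, width and height, and their weighted sum. This follows the reconstruction above and the uncertainty losses of [29, 30] at a single scale; the paper additionally sums over scales and includes the label-map cross-entropy term.

```python
import numpy as np

def confidence_l2(x_hat, x, c_hat, lam=0.1):
    # || c ⊙ (x̂ − x) ||² − λ Σ log c: low confidence suppresses the error
    # at a pixel, while the log barrier penalises uniformly low confidence.
    return np.sum((c_hat * (x_hat - x)) ** 2) - lam * np.sum(np.log(c_hat))

def perceptual(feat_hat, feat):
    # Eq. (6): squared feature difference normalised by C * W * H.
    c, h, w = feat.shape
    return np.sum((feat_hat - feat) ** 2) / (c * w * h)

def total_loss(x_hat, x, c_hat, feat_hat, feat, lam3=1.0):
    # Eq. (7) at a single scale: data term plus weighted perceptual term.
    return confidence_l2(x_hat, x, c_hat) + lam3 * perceptual(feat_hat, feat)

rng = np.random.default_rng(0)
x = rng.uniform(size=(3, 8, 8))                      # clean image
x_hat = x + rng.normal(scale=0.1, size=x.shape)      # noisy estimate
c_full = np.ones_like(x)                             # log(1) = 0: plain L2
feat, feat_hat = np.ones((16, 4, 4)), np.zeros((16, 4, 4))  # toy VGG features

loss = total_loss(x_hat, x, c_full, feat_hat, feat)
```

With full confidence the data term reduces to an ordinary squared error; lowering the confidence at a pixel trades that error against the log barrier.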
IV QuDeC Training
The QuDeC network is trained using the synthetic image datasets created by the authors of [1, 23, 3]. The dataset in [1] consists of 12000 images with different rain levels such as low, medium and high. The dataset in [23] contains 700 training images. The dataset in [3] contains 1800 rainy images for training. We generate the ground-truth location-quality-label-maps, which indicate the distortion levels of the background in the rainy images, using the NIQE scores. Figure 10 shows the histogram of NIQE scores corresponding to the patches of rainy images from the DIDMDN training dataset [1]. We divide the patches into three levels of distortion using two thresholds on the NIQE score, chosen such that the patches are divided into three equal groups. The following pseudo code summarizes the procedure used for generating the location-quality-label-maps using the NIQE scores.
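The pseudo code referenced above did not survive extraction; the following is a reconstruction of the described procedure. The NIQE implementation is passed in as a callable (`niqe_score` is a stand-in for the actual measure of [17]), and the patch size and concrete thresholds are illustrative assumptions; only the tercile-based three-way split is taken from the text.

```python
import numpy as np

def make_label_map(rainy, niqe_score, patch=32):
    # Slide a non-overlapping patch grid over the rainy image, score each
    # patch with NIQE, and threshold the scores into three equal-sized
    # distortion classes (0 = low, 1 = medium, 2 = high).
    h, w = rainy.shape[:2]
    scores = np.array([[niqe_score(rainy[i:i + patch, j:j + patch])
                        for j in range(0, w - patch + 1, patch)]
                       for i in range(0, h - patch + 1, patch)])
    t1, t2 = np.quantile(scores, [1 / 3, 2 / 3])   # tercile thresholds
    return np.digitize(scores, [t1, t2])           # location-quality-label-map

# Toy usage: patch variance stands in for the NIQE score.
rng = np.random.default_rng(0)
img = rng.uniform(size=(96, 96))
b = make_label_map(img, niqe_score=lambda p: p.var())
```

In practice the thresholds would be computed once over the whole training set (as in Figure 10) rather than per image.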
Note that the NIQE scores [17] are obtained using various features which may not be specifically useful for computing the distortion levels caused by rain streaks. Furthermore, the thresholds may vary from one dataset to another. In order to deal with these issues, we train a Generation-of-Labels Network (GLN) to automatically generate the location-quality-label-maps from the input rainy images. The GLN network is trained using pairs $(y, b)$ from the DIDMDN training images. We provide the network architecture and training details of GLN in the appendix. The estimated $\hat{b}$ from GLN, as well as the clean image $x$ along with the rainy image $y$, are used to train the proposed QuDeC network.
IV-A Test Datasets
The QuDeC network is tested on synthetic and real-world rainy images published by the authors of [1, 23, 3]. The DIDMDN synthetic test dataset consists of two subsets, Test1 and Test2, containing 1000 and 1200 images, respectively [1]. The Rain800 dataset shared by the authors of [23] contains 100 synthetic rainy images for testing. The Rain200H dataset contains 200 heavy rain synthetic images provided by the authors of [3]. In addition to the synthetic images, we use 100 real-world rainy images provided by the authors of [1, 23, 3] to show the qualitative performance of QuDeC.
Table I: Quantitative results (PSNR / SSIM) on the synthetic test datasets. The compared methods are listed in the order given in Section V.

Dataset | Test set | Fu et al. [15] | JORDER [3] | DDN [2] | DIDMDN [1] | RESCAN [28] | UMRL [16] | QuDeC
DIDMDN [1] | Test1 | 22.07 / 0.84 | 27.33 / 0.90 | 24.32 / 0.86 | 27.19 / 0.87 | 27.95 / 0.91 | 29.77 / 0.92 | 30.43 / 0.93
DIDMDN [1] | Test2 | 19.73 / 0.83 | 25.63 / 0.88 | 22.26 / 0.84 | 25.65 / 0.88 | 26.08 / 0.90 | 26.67 / 0.92 | 26.72 / 0.92
Rain800 [23] | Rain800 | 18.95 / 0.78 | 21.33 / 0.80 | 21.13 / 0.81 | 24.37 / 0.84 | 23.57 / 0.87 | 24.52 / 0.86 | 24.61 / 0.86
JORDER [3] | Rain200H | 21.83 / 0.81 | 23.24 / 0.83 | 24.02 / 0.86 | 25.97 / 0.90 | 23.43 / 0.86 | 26.38 / 0.92 | 26.74 / 0.93

In the original table, highlighting indicates the best, second best, and third best performance among the deraining methods on the test datasets.
IV-B Training Details
The rainy-clean-distortion label triplets $(y, x, b)$ are used to train QuDeC using the loss $\mathcal{L}$ in (7). The Adam optimizer with a batch size of 1 is used to train the network. The learning rate is set equal to 0.0002 for the first 20 epochs and 0.0001 for the remaining epochs. During training, $\lambda_{1}$, $\lambda_{2}$ and $\lambda_{3}$ are initially set equal to 0.1, 0.1 and 1.0, respectively, but when the mean of all values in the confidence maps $\hat{c}$ and $\hat{c}_{b}$ is greater than 0.8, the corresponding weight is set equal to 0.03. QuDeC is trained for 60 epochs. During inference, given a rainy image $y$, QuDeC estimates the residual map and its corresponding confidence map at three different scales, $\hat{r}_{1.0}$ (at the original input size), $\hat{r}_{0.5}$ (at 0.5 scale of the input size), and $\hat{r}_{0.25}$ (at 0.25 scale of the input size), along with the location-quality-label-map $\hat{b}$.
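The confidence-based weight schedule described above can be sketched as follows; the 0.8 threshold and the 0.1 → 0.03 drop are taken from the text, while the function form itself is an illustrative assumption.

```python
def log_barrier_weight(mean_confidence, warmup=0.1, late=0.03, threshold=0.8):
    # Training schedule from Sec. IV-B: the weight on the -log(confidence)
    # term starts at 0.1 and drops to 0.03 once the mean value of the
    # confidence maps exceeds 0.8, so mature confidence estimates are no
    # longer pushed as strongly toward 1.
    return late if mean_confidence > threshold else warmup

# Early in training the barrier dominates; later it is relaxed.
assert log_barrier_weight(0.5) == 0.1
assert log_barrier_weight(0.9) == 0.03
```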
V Experimental Results
In this section, we evaluate the performance of our method on both synthetic and real images. Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity index (SSIM) [35] measures are used to compare the performance of different methods on synthetic images. We visually inspect the performance of different methods on real images, as we do not have the corresponding ground-truth clean images. The performance of the proposed QuDeC method is compared against the following recent state-of-the-art methods:
(a) the CNN method of Fu et al. [15] (TIP'17),
(b) Joint Rain Detection and Removal (JORDER) [3] (CVPR’17),
(c) Deep Detail Network (DDN) [2] (CVPR'17),
(d) Density-aware Image Deraining method using a Multi-stream Dense Network (DIDMDN) [1] (CVPR'18),
(e) REcurrent SE Context Aggregation Net (RESCAN) [28] (ECCV'18), and
(f) Uncertainty guided Multi-scale Residual Learning (UMRL) network [16] (CVPR'19).
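The PSNR measure used in the comparisons below has a compact closed form; the sketch assumes intensities in [0, 1] (SSIM [35] is more involved and is omitted here).

```python
import numpy as np

def psnr(x_hat, x, peak=1.0):
    # Peak signal-to-noise ratio in dB between a derained estimate x_hat
    # and the ground-truth clean image x, for intensities in [0, peak].
    mse = np.mean((x_hat - x) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

x = np.zeros((8, 8))
x_hat = x + 0.1                      # uniform error of 0.1 -> MSE = 0.01
assert np.isclose(psnr(x_hat, x), 20.0)
```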
V-A Results on Synthetic Test Images
The proposed QuDeC method, combined with cycle spinning [16], is compared against the state-of-the-art algorithms both qualitatively and quantitatively. Table I shows the quantitative performance of our method. As can be seen from this table, our method clearly outperforms the present state-of-the-art image deraining algorithms. On average, QuDeC outperforms methods such as RESCAN [28] and DIDMDN [1] by more than 1 dB. Furthermore, QuDeC outperforms the state-of-the-art method, UMRL+cycle spinning [16], by 0.3 dB on average. Figure 11 shows the qualitative performance of QuDeC against other methods on synthetic rainy images. The DDN method over-derains some images and slightly under-derains others, as shown in the second column of Figure 11. The first four rows show the under-deraining of RESCAN [28] and DIDMDN [1] in the third and fourth columns of Figure 11, respectively, where we can observe residual rain streaks in their outputs. The last three rows show the over-deraining of RESCAN [28] and DIDMDN [1] in the third and fourth columns of Figure 11, respectively, where we can observe that the edges of objects are blurred and objects like cables or wires and the texture of bird feathers have disappeared. Visually we can see in the fifth column of Figure 11 that our method produces images without any artifacts. From Figure 11 we can see that our method is able to

recover the texture on the wooden wall in the first row,

produce clear objects with sharp edges in the fourth and fifth images of the fifth column,

remove rain streaks while maintaining the underlying textures on the trees and on the feathers in the third and sixth images of the fifth column.
V-B Results on Real-World Rainy Images
We conducted experiments on the real-world images provided by the authors of [23, 3, 1]. Results are shown in Figure 12. Similar to the results obtained on synthetic images, we observe the same trend of either over-deraining or under-deraining by the other methods. On the other hand, our method is able to remove rain streaks while preserving the details of objects in the resulting output images. For example, the background texture on the tree is sharp when compared to other methods. Also, rain streaks are removed properly without losing the background information of the walls and flowers in the second, third and fourth images of the fifth column in Figure 12. All of these experiments clearly show that our method can handle different levels of rain (low, medium and high) with different shapes and scales.
V-C Ablation Study
We study the contribution of each block to the QuDeC network by conducting extensive experiments on the test datasets. We start with Encoder-Decoder D1, which is similar to the UMRL [16] network and is trained as explained in the UMRL [16] method. We then add Decoder D2, calling the resultant network QuDeC w/o LCN, where the output of D2 is supervised with the cross-entropy loss using the ground-truth maps $b$. Finally, we add LCN to Decoder D2 to construct our proposed network QuDeC. Table II shows the contribution of each block to the performance of the QuDeC network. The addition of Decoder D2, i.e., formulating the joint task of computing the distortion level at each location and estimating the residual rain streak component, improves the overall performance by approximately 0.37 dB. Furthermore, introducing LCN to Decoder D2 improves the performance of QuDeC by 0.2 dB. Figure 13 visually shows the improvement in performance after adding each block in constructing the QuDeC network. For example, we can clearly observe from Figure 13 that QuDeC is able to reconstruct clear skies and dark backgrounds in the second and third images of the fourth column. QuDeC is also able to reconstruct sharp objects when compared to the outputs of the network with only Encoder-Decoder D1 in the first and last rows of Figure 13.
Table II: Ablation study (PSNR / SSIM) on the test datasets.

Dataset | Test set | Encoder-Decoder D1 | QuDeC w/o LCN | QuDeC
DIDMDN [1] | Test1 | 29.42 / 0.91 | 30.17 / 0.92 | 30.43 / 0.93
DIDMDN [1] | Test2 | 26.47 / 0.91 | 26.58 / 0.91 | 26.72 / 0.92
Rain800 [23] | Rain800 | 24.19 / 0.85 | 24.42 / 0.86 | 24.61 / 0.86
JORDER [3] | Rain200H | 26.17 / 0.91 | 26.56 / 0.92 | 26.74 / 0.93
VI Conclusion
We proposed a novel network, QuDeC, to address the single image deraining problem. In our approach, we formulated the rain removal problem as a joint task of computing the distortion level at each location and estimating the residual rain streak information. We judiciously combine the residual rain streak outputs at the lower scales and the distortion level information at each location using the corresponding confidence maps. Extensive experiments showed that QuDeC is robust enough to handle different levels of rain content in both synthetic and real-world rainy images.
Appendix A Generation-of-Labels Network (GLN) Architecture
The Generation-of-Labels Network (GLN) is used to generate the ground-truth location-quality-label-maps $b$. We use residual blocks (ResBlock) as the building module for the GLN network. The GLN network consists of a sequence of eight ResBlocks, as shown in Figure 14. A ResBlock consists of three convolution layers, the last with a dilation factor of 2, as shown in Figure 14. Given $y$, the GLN network processes one patch at a time and outputs the distortion quality label for the corresponding patch. GLN is trained on the rainy image patches and the corresponding labels $b$, where $b$ is generated as explained in Section IV. GLN is trained using the cross-entropy loss with the Adam optimizer, and the learning rate is set equal to 0.0002.
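The two ingredients of the ResBlock above, a residual (skip) connection and a dilated convolution, can be illustrated in one dimension; a dilation factor of 2 samples inputs two pixels apart, widening the receptive field without extra parameters. Kernel sizes of the actual GLN layers are not reproduced here.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation=2):
    # 'Valid' 1-D convolution whose taps are `dilation` samples apart.
    k = len(kernel)
    span = (k - 1) * dilation + 1        # receptive field width
    return np.array([sum(kernel[t] * x[i + t * dilation] for t in range(k))
                     for i in range(len(x) - span + 1)])

def res_block(x, kernel):
    # Residual connection: the block learns a correction on top of identity.
    y = dilated_conv1d(x, kernel, dilation=2)
    return x[:len(y)] + y

x = np.arange(8, dtype=float)
out = dilated_conv1d(x, kernel=[1.0, 1.0, 1.0], dilation=2)
# Each output sums x[i], x[i+2], x[i+4]: a 5-sample span from a 3-tap kernel.
assert np.allclose(out, [6.0, 9.0, 12.0, 15.0])
```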
Acknowledgment
This research is based upon work supported by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via IARPA R&D Contract No. 201919022600002. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the ODNI, IARPA, or the U.S. Government.
References
 [1] H. Zhang and V. M. Patel, “Density-aware single image de-raining using a multi-stream dense network,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
 [2] X. Fu, J. Huang, D. Zeng, X. Ding, Y. Liao, and J. Paisley, “Removing rain from single images via a deep detail network,” In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1715–1723, 2017.
 [3] W. Yang, R. T. Tan, J. Feng, J. Liu, and S. Yan, “Deep joint rain detection and removal from a single image,” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1357–1366, 2017.
 [4] X. Zhang, H. Li, Y. Qi, W. K. Leow, and T. K. Ng, “Rain removal in video by combining temporal and chromatic properties,” In: IEEE International Conference on Multimedia and Expo, pp. 461–464, 2006.
 [5] K. Garg and S. K. Nayar, “Vision and rain,” In: International Journal of Computer Vision, vol. 75, pp. 3–27, 2007.
 [6] V. Santhaseelan and V. Asari, “Utilizing local phase information to remove rain from video,” In: International Journal of Computer Vision, vol. 112, 2015.
 [7] J. Liu, W. Yang, S. Yang, and Z. Guo, “Erase or fill? deep joint recurrent rain removal and reconstruction in videos,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
 [8] M. Li, Q. Xie, Q. Zhao, W. Wei, S. Gu, J. Tao, and D. Meng, “Video rain streak removal by multiscale convolutional sparse coding,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
 [9] Y. Li, R. T. Tan, X. Guo, J. Lu, and M. S. Brown, “Rain streak removal using layer priors,” In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2736–2744, 2016.
 [10] L. Zhu, C. W. Fu, D. Lischinski, and P. A. Heng, “Joint bilayer optimization for singleimage rain streak removal,” In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2536–2534, 2017.
 [11] L. W. Kang, C. W. Lin, and Y. H. Fu, “Automatic singleimagebased rain streaks removal via image decomposition,” IEEE Transactions on Image Processing, vol. 21, pp. 1742–1755, 2012.
 [12] H. Zhang and V. M. Patel, “Convolutional sparse and lowrank codingbased rain streak removal,” 7 IEEE Winter Conference In Applications of Computer Vision(WACV), pp. 1259–1267, 2017.
 [13] Y. Luo, Y. Xu, and H. Ji, “Removing rain from a single image via discriminative sparse coding,” In:IEEE International Conference on Computer Vision(ICCV), pp. 3397–3405, 2013.
 [14] Y. Chen and C. Hsu, “A generalized lowrank appearance model for spatiotemporally correlated rain streaks,” in 2013 IEEE International Conference on Computer Vision, Dec 2013, pp. 1968–1975.
 [15] X. Fu, J. Huang, X. Ding, Y. Liao, and J. Paisley, “Clearing the skies a deep network architecture for singleimage rain removal,” IEEE Transactions on Image Processing, vol. 26, pp. 2944–2956, 2017.
 [16] R. Yasarla and V. M. Patel, “Uncertainty guided multiscale residual learningusing a cycle spinning cnn for single image deraining,” In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
 [17] A. Mittal, R. Soundararajan, and A. C. Bovik, “Making a “completely blind” image quality analyzer,” IEEE Signal Processing Letters, vol. 20, no. 3, pp. 209–212, 2012.
 [18] D.Y. Chen, C. C. Chen, and L. W. Kang, “Selflearning based image decomposition with applications to single image denoising,” IEEE Transactions on Circuits and Systems for Video Technology, pp. 1430 – 1455, 2014.
 [19] D. A. Huang, L. W. Kang, Y. C. F. Wang, and C. W. Lin, “Selflearning based image decomposition with applications to single image denoising,” IEEE Transactions on multimedia, vol. 16, pp. 83–93, 2014.
 [20] H. S. Bhadauria and M. L. Dewal, “Online dictionary learning for sparse coding,” In: International Conference on Machine Learning(ICML), pp. 689–696, 2009.
 [21] D. A. Reynolds, T. F. Quatieri, and R. B. Dunn, “Speaker verification using adapted gaussian mixture models,” Digital Signal Processing, vol. 10, pp. 19–41, 2000.
 [22] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma, “Robust recovery of subspace structures by lowrank representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, pp. 171–184, 2013.
 [23] H. Zhang and V. M. Patel, “Image deraining using a conditional generative adversarial network,” arXiv preprint arXiv:1701.05957, 2017.
 [24] Y. Wang, S. Liu, C. Chen, and B. Zeng, “A hierarchical approach for rain or snow removing in a single color image,” IEEE Transactions on Image Processing, vol. 26, pp. 3936–3950, 2017.
 [25] R. Qian, R. T. Tan, W. Yang, J. Su, and J. Liu, “Attentive generative adversarial network for raindrop removal from a single image,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
 [26] X. Wang, R. Girshick, A. Gupta, and K. He, “Nonlocal neural networks,” CVPR, 2018.
 [27] T. Xu, P. Zhang, Q. Huang, H. Zhang, Z. Gan, X. Huang, and X. He, “Attngan: Finegrained text to image generation with attentional generative adversarial networks,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
 [28] X. Li, J. Wu, Z. Lin, H. Liu, and H. Zha, “Recurrent squeezeandexcitation context aggregation net for single image deraining,” In: European Conference on Computer Vision(ECCV), pp. 262–277, 2018.
 [29] A. Kendall and Y. Gal, “What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?” in Advances in Neural Information Processing Systems 30 (NIPS), 2017.
 [30] A. Kendall, Y. Gal, and R. Cipolla, “Multitask learning using uncertainty to weigh losses for scene geometry and semantics,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
 [31] J. Johnson, A. Alahi, and L. FeiFei, “Perceptual losses for realtime style transfer and superresolution,” In European Conference on Computer Vision(ECCV), pp. 694–711, 2016.
 [32] K. Simonyan and A. Zisserman, “Very deep convolutional networks for largescale image recognition,” arXiv:1409.1556, 2014.
 [33] J. Johnson, A. Alahi, and L. FeiFei, “Perceptual losses for realtime style transfer and superresolution,” 2016.
 [34] H. Zhang and K. Dana, “Multistyle generative network for realtime transfer,” arXiv preprint arXiv:1703.06953, 2017.
 [35] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, April 2004.