UG Track 2: A Collective Benchmark Effort for Evaluating and Advancing Image Understanding in Poor Visibility Environments
The UG challenge at IEEE CVPR 2019 aims to evoke a comprehensive discussion and exploration of how low-level vision techniques can benefit high-level automatic visual recognition in various scenarios. Its second track focuses on object and face detection in poor-visibility environments caused by bad weather (haze, rain) and low-light conditions. While existing enhancement methods are empirically expected to help the high-level end task, this is observed to not always be the case in practice. To provide a more thorough examination and fair comparison, we introduce three benchmark sets collected in real-world hazy, rainy, and low-light conditions, respectively, with objects/faces annotated. To our best knowledge, this is the first and currently largest effort of its kind. Baseline results obtained by cascading existing enhancement and detection models are reported, indicating the highly challenging nature of our new data as well as the large room for further technical innovation. We expect large participation from the broad research community to address these challenges together.
Background. Many emerging applications, such as unmanned aerial vehicles (UAVs), autonomous/assisted driving, search and rescue robots, environment monitoring, security surveillance, transportation and inspection, hinge on computer vision-based sensing and understanding of outdoor environments. Such systems concern a wide range of target tasks such as detection, recognition, segmentation, tracking, and parsing. However, the performance of visual sensing and understanding algorithms will be largely jeopardized by various challenging conditions in unconstrained and dynamic degraded environments, e.g., moving platforms, bad weather, and poor illumination. These conditions can cause severe visual input degradations such as reduced contrast, detail occlusion, abnormal illumination, faded surfaces, and color shift.
While most current vision systems are designed to perform in "clear" environments, i.e., where subjects are well observable without (significant) attenuation or alteration, a dependable vision system must reckon with the entire spectrum of complex unconstrained outdoor environments. Take autonomous driving as an example: the industry players have been tackling the challenges posed by inclement weather; however, heavy rain, haze or snow will still obscure the vision of on-board cameras and create confusing reflections and glare, leaving state-of-the-art self-driving cars struggling (see a Forbes article). Another illustrative example can be found in city surveillance: even the commercial cameras adopted by governments appear fragile in challenging weather conditions (see a news article). Therefore, it is highly desirable to study to what extent, and in what sense, such challenging visual conditions can be coped with, for the goal of achieving robust visual sensing and understanding in the wild, benefiting security/safety, autonomous driving, robotics, and an even broader range of signal and image processing applications.
1.1 Challenges and Bottlenecks
Despite the blooming research on removing or alleviating the impacts of those challenges, such as dehazing [9, 90, 47, 33], deraining [15, 73, 58, 126, 11, 24, 22, 119] and illumination enhancement [54, 92, 117, 72, 97], current solutions fall significantly short of addressing the above-mentioned pressing real-world challenges. A collective effort to identify and resolve the bottlenecks they commonly face has also been absent.
One primary challenge arises from the Data aspect. Those challenging visual conditions usually give rise to nonlinear and data-dependent degradations that are much more complicated than the well-studied noise or motion blur. The state-of-the-art deep learning methods are typically hungry for training data. The usage of synthetic training data has been prevailing, but may inevitably lead to domain shifts. Fortunately, those degradations often follow some parameterized physical models a priori. That naturally motivates a combination of model-based and data-driven approaches. In addition to training, the lack of real-world test sets (and consequently, the usage of potentially oversimplified synthetic sets) has limited the practical scope of the developed algorithms.
The other main challenge is found on the Goal side. Most restoration or enhancement methods cast the handling of those challenging conditions as a post-processing step of signal restoration or enhancement after sensing, and then feed the restored data to visual understanding. The performance of high-level visual understanding tasks will thus largely depend on the quality of restoration or enhancement. Yet it remains questionable whether restoration-based approaches actually boost visual understanding performance, as the restoration/enhancement step is not optimized towards the target task and may also bring in misleading information and artifacts. For example, a recent line of research [47, 124, 61, 64, 65, 16, 98, 105, 100, 94, 83] discusses the intrinsic interplay between low-level vision and high-level recognition/detection tasks, showing that their goals are not always aligned.
1.2 Overview of UG Track 2
UG Challenge Track 2 aims to evaluate and advance the robustness of object detection algorithms in specific poor-visibility environments, including challenging weather and lighting conditions. We structure Track 2 into three sub-challenges. Each sub-challenge features a different poor-visibility outdoor condition, and diverse training protocols (paired versus unpaired images, annotated versus unannotated, etc.). For each sub-challenge, we collect a new benchmark dataset captured in realistic poor-visibility environments, in which real image artifacts caused by rain, haze, or insufficient light are observed.
Sub-Challenge 2.1: (Semi-)Supervised Object Detection in the Haze. We provide 4,322 real-world hazy images collected from traffic surveillance, all labeled with object bounding boxes and categories (car, bus, bicycle, motorcycle, pedestrian), as the main training and/or validation sets. We also release another set of 4,807 unannotated real-world hazy images collected from the same sources (and containing the same classes of traffic objects, though not annotated), which may be used at the participants' discretion. There will be a held-out test set of 3,000 real-world hazy images, with the same classes of objects annotated.
Sub-Challenge 2.2: (Semi-)Supervised Face Detection in the Low-Light Condition. We provide 6,000 real-world low-light images captured during the nighttime, at teaching buildings, streets, bridges, overpasses, parks, etc., all labeled with bounding boxes for human faces, as the main training and/or validation sets. We also provide 10,400 unlabeled low-light images collected from the same setting. Additionally, we provide a unique set of 1,022 paired low-light/normal-light images captured in controllable real lighting conditions (but not necessarily containing faces), which can be used as part of the training data at the participants' discretion. There will be a held-out test set of 4,000 low-light images, with human face bounding boxes annotated.
Sub-Challenge 2.3: Zero-Shot Object Detection with Raindrop Occlusions. We provide 1,010 pairs of raindrop images and corresponding clean ground-truths (collected through physical simulations), as the training and/or validation sets. Different from Sub-Challenges 2.1 and 2.2, no semantic annotation will be available on training/validation images. A held-out test set of 2,496 real-world raindrop images is collected from high-resolution driving videos, in diverse real locations and scenes during multiple drives. We label bounding boxes for selected traffic object categories: car, person, bus, bicycle, and motorcycle.
The ranking criterion will be mean average precision (mAP) on each held-out test set, with the default Intersection-over-Union (IoU) threshold set to 0.5. If the IoU of a detected region with an annotated region is greater than 0.5, a score of 1 is assigned to the detected region, and 0 otherwise. When mAPs at IoU 0.5 are equal, the mAPs at higher IoU thresholds (0.6, 0.7, 0.8) will be compared sequentially.
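The scoring and tie-breaking rules above can be sketched as follows. This is an illustrative implementation of the IoU criterion and the sequential mAP comparison, not the official evaluation script.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def detection_score(detected, annotated, thresh=0.5):
    """Score 1 if the detected region overlaps the annotation above the IoU threshold."""
    return 1 if iou(detected, annotated) > thresh else 0

def ranking_key(map_by_iou):
    """Tie-break key: compare mAP at IoU 0.5 first, then at 0.6, 0.7, 0.8."""
    return tuple(map_by_iou[t] for t in (0.5, 0.6, 0.7, 0.8))
```

Because Python compares tuples lexicographically, sorting submissions by `ranking_key` reproduces the sequential comparison described above.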
2 Related Work
2.1 Datasets
Most datasets used for image enhancement/processing mainly target evaluating the quantitative (PSNR, SSIM, etc.) or qualitative (visual subjective quality) differences of enhanced images w.r.t. the ground truths. Earlier classical datasets include Set5, Set14, and LIVE; they contain only small numbers of images with limited content. Subsequent datasets, such as BSD500 and Urban100, come with more diverse scene content. The popularity of deep learning methods has increased the demand for training and testing data. Therefore, many newer and larger datasets have been presented for image and video restoration, such as DIV2K and MANGA109 for image super-resolution, PolyU and Darmstadt for denoising, RawInDark and the LOL dataset for low-light enhancement, HazeRD, O-HAZE and I-HAZE for dehazing, Rain100L/H and Rain800 for rain streak removal, and RAINDROP for raindrop removal. However, these datasets provide no integration with subsequent high-level tasks.
A few works [31, 95, 136] make preliminary attempts at event/action understanding, video summarization, or face recognition in unconstrained and potentially degraded environments. Several datasets are collected by aerial vehicles, including the VIRAT Video Dataset for event recognition, UAV123 for UAV tracking, and a multi-purpose dataset . In , an Unconstrained Face Detection Dataset (UFDD) is proposed for face detection in adverse conditions including weather-based degradations, motion blur, focus blur and several others, containing a total of 6,425 images with 10,897 face annotations. However, few works specifically consider the impacts of image enhancement and object detection/recognition jointly. Prior to this UG effort, a number of recent works have taken first steps. A large-scale hazy image dataset and comprehensive study – REalistic Single Image DEhazing (RESIDE) – including paired synthetic data and unpaired real data, is proposed to thoroughly examine visual reconstruction and visual recognition in hazy images. In , an Exclusively Dark (ExDark) dataset is proposed, with a collection of 7,363 images captured in very low-light environments with 12 object classes annotated at both the image level and with local object bounding boxes. In , the authors present a new large-scale benchmark and a comprehensive study and evaluation of existing single-image deraining algorithms, ranging from full-reference metrics, to no-reference metrics, to subjective evaluation and the novel task-driven evaluation. These datasets and studies shed new light on the comparisons and limitations of state-of-the-art algorithms, and suggest promising future directions. In this work, we follow in the footsteps of these predecessors to advance the field by proposing new benchmarks.
2.2 Poor Visibility Enhancement
There are numerous algorithms aiming to enhance the visibility of degraded imagery, such as image and video denoising/inpainting [109, 51, 86, 110, 66], deblurring [116, 91], super-resolution [112, 111, 62, 63] and interpolation. Here we focus on dehazing, low-light enhancement, and deraining, as in the UG Track 2 scope.
Dehazing. Early dehazing methods rely on the exploitation of natural image priors and depth statistics, e.g., locally constant constraints and decorrelation of the transmission , the dark channel prior , the color attenuation prior , and the nonlocal prior . Lately, Convolutional Neural Network (CNN)-based methods have brought new prosperity to dehazing. Several methods [9, 90] rely on various CNNs to learn the transmission fully from data. Beyond estimating the haze-related variables separately, successive works make efforts to estimate them in a unified way. In [45, 79], the authors use a factorial Markov random field that integrates the estimation of transmission and atmospheric light. Some researchers focus on the more challenging night-time dehazing problem [57, 128]. [133, 77] utilize Retinex theory to approximate the spectral properties of object surfaces by the ratio of the reflected light. AOD-Net [47, 48] re-formulates the haze generation model to realize one-step estimation of the inverse recovery and considers the joint interplay of dehazing and object detection. The idea is further applied to video dehazing by extending the model into a lightweight video dehazing framework . In another recent work , a semantic prior is also injected to facilitate video dehazing.
Low-Light Enhancement. Low-light enhancement methods can be categorized into three groups: hand-crafted methods, Retinex theory-based methods, and data-driven methods. Hand-crafted methods explore and apply various image priors to single-image low-light enhancement, e.g., histogram equalization [81, 1]. Some methods [53, 130] regard inverted low-light images as hazy images, and enhance visibility by applying dehazing. Retinex theory-based methods  treat the two signal components, reflectance and illumination, differently in order to simultaneously suppress noise and preserve high-frequency details. Different schemes [39, 40] are used to decompose the signal, and diverse priors [107, 25, 32, 23] are applied to realize better light adjustment and noise suppression. Li et al.  further extend the traditional Retinex model to a robust one with an explicit noise term, and make the first attempt to estimate a noise map out of that model via an alternating direction minimization algorithm. A successive work  develops a fast sequential algorithm. Learning-based low-light image enhancement methods [117, 72, 97] have also been studied. In these works, the low-light images used for training are synthesized by applying random gamma transformations to natural normal-light images. Some recent works aim to build paired training data from real scenes. In , Chen et al. introduce a See-in-the-Dark (SID) dataset of short-exposure low-light raw images with corresponding long-exposure reference raw images. Cai et al.  build a dataset of under-/over-contrast and normal-contrast encoded image pairs, in which the reference normal-contrast images are generated by Multi-Exposure image Fusion (MEF) or High Dynamic Range (HDR) algorithms.
Deraining. Single-image deraining is a highly ill-posed problem. To address it, many models and priors have been used to perform signal separation and texture classification, including sparse coding , the generalized low-rank model , the nonlocal mean filter , discriminative sparse coding , the Gaussian mixture model , the rain direction prior , and the transformed low-rank model . The advent of deep learning has promoted the development of single-image deraining. In [24, 22], deep networks take the image detail layer as their input. Yang et al.  propose a deep joint rain detection and removal method to remove heavy rain streaks and accumulation. In , a novel density-aware multi-stream densely connected CNN is proposed for joint rain density estimation and removal. Video deraining can additionally make use of temporal context and motion information. Early works formulate rain streaks with more flexible and intrinsic characteristics [29, 27, 30, 28, 131, 68, 4, 93, 8, 7, 15, 38]. Learning-based methods [13, 104, 103, 89, 55, 114, 44], with improved modeling capacity, bring new progress, and the emergence of deep learning-based methods pushes the performance of video deraining to a new level. Chen et al.  integrate superpixel segmentation alignment, consistency among these segments, and a CNN-based detail compensation network into a unified framework. In , a recurrent network integrating rain degradation classification, deraining and background reconstruction is presented.
2.3 Visual Recognition under Adverse Conditions
A real-world visual detection/recognition system needs to handle a complex mixture of both low-quality and high-quality images. It is commonly observed that mild degradations, e.g., small noise or scaling by small factors, lead to almost no change in recognition performance. However, once the degradation level passes a certain threshold, there will be a non-negligible or even very significant effect on system performance. In , Torralba et al. showed that there is a significant performance drop in object and scene recognition when the image resolution is reduced to 32×32 pixels. In , the boundary where face recognition performance is largely degraded is 16×16 pixels. Karahan et al.  found that the threshold of the standard deviation of Gaussian noise that causes a rapid decline ranges from 10 to 20. In , more impacts of contrast, brightness, sharpness, and out-of-focus blur on face recognition are analyzed.
In the era of deep learning, some methods [21, 118, 17] attempt to first enhance the input image and then forward the output into a classifier. However, this separate consideration of enhancement may not benefit the successive recognition task, because the first stage may incur artifacts which damage the second-stage recognition. In [137, 35], class-specific features are extracted as a prior incorporated into the restoration model. In , Zhang et al. developed a joint image restoration and recognition method based on a sparse representation prior, which constrains the identity of the test image and guides better reconstruction and recognition.  considers dehazing and object detection jointly. These two-stage joint optimization methods achieve better performance than previous one-stage methods. [108, 61] examine the joint optimization pipeline for low-resolution recognition. [65, 64] discuss the impact of denoising on semantic segmentation and advocate their mutual optimization. Lately,  thoroughly examines the algorithmic impact of enhancement algorithms on both visual quality and automatic object recognition, on a real image set with highly compound degradations. In our work, we take a further step to consider joint enhancement and detection in bad weather environments. Three large-scale datasets are collected to inspire new ideas and the development of novel methods in the related fields.
3 Introduction of UG Track 2 Datasets
3.1 (Semi-)Supervised Object Detection in the Haze
In Sub-challenge 2.1, we use the 4,322 annotated real-world hazy images of the RESIDE RTTS set  as the training and/or validation sets (the split is up to the participants). Five categories of objects (car, bus, bicycle, motorcycle, pedestrian) are labeled with tight bounding boxes. We provide another 4,807 unannotated real-world hazy images collected from the same traffic camera sources, for the possible usage of semi-supervised training too.
The participants can optionally use pre-trained models (e.g., on ImageNet or COCO), or external data. But if any pre-trained model, self-synthesized data, or self-collected data is used, that must be explicitly mentioned in their submissions, and the participants must ensure that all data they use is publicly available at the time of challenge submission, for reproducibility purposes.
There will be a held-out test set of 2,987 real-world hazy images, collected from the same sources, with the same classes of objects annotated. Fig. 1 shows the basic statistics of the RTTS set and the held-out set. The held-out test set has distributions of the number of bounding boxes per image, bounding box size, and relative scale of bounding boxes to input images similar to the RTTS set, but has relatively larger image sizes. Samples from the RTTS set and the held-out set can be found in Fig. 2 and Fig. 3.
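The three statistics compared above (boxes per image, box size, relative scale) can be computed with a small helper like the following. The corner-coordinate box format is an assumption for illustration, not the dataset's actual annotation schema.

```python
def annotation_stats(boxes_per_image, image_sizes):
    """Compute per-image box counts, bounding-box areas, and box areas
    relative to the image, as compared between RTTS and the held-out set.
    boxes_per_image: one list of (x1, y1, x2, y2) boxes per image.
    image_sizes: one (width, height) pair per image."""
    counts, areas, rel_scales = [], [], []
    for boxes, (w, h) in zip(boxes_per_image, image_sizes):
        counts.append(len(boxes))
        for x1, y1, x2, y2 in boxes:
            area = (x2 - x1) * (y2 - y1)
            areas.append(area)
            rel_scales.append(area / (w * h))  # fraction of the image covered
    return counts, areas, rel_scales
```

Histogramming the three returned lists for each split reproduces the kind of distribution comparison shown in Fig. 1.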
3.2 (Semi-)Supervised Face Detection in the Low Light Condition
In Sub-challenge 2.2, we use our self-curated DARK FACE dataset. It is composed of 10,000 images (6,000 for training and validation, and 4,000 for testing) taken in under-exposed conditions, where human faces are manually annotated with bounding boxes, plus 9,000 images taken with the same equipment in similar environments without human annotations. Additionally, we provide a unique set of 789 paired low-light/normal-light images captured in controllable real lighting conditions (but not necessarily containing faces), which can be optionally used as part of the training data.
The training and evaluation set includes 43,849 annotated faces and the held-out test set includes 32,571 annotated faces. Table 3 presents a summary of the dataset and Fig. 4 presents example images.
Collection and annotation. This collection consists of images recorded with digital single-lens reflex cameras, specifically Sony E-mount cameras with different capture parameters, on several busy streets around Beijing, where faces of various scales and poses are captured. The images in this collection are open-source content tagged with a Creative Commons license. The resolution of these images is 1080×720 (down-sampled from 6K×4K). After filtering out those without sufficient information (lacking faces, too dark to see anything, etc.), we select 10,000 images for human annotation. Bounding boxes are labeled for all the recognizable faces in our collection. We make the bounding boxes tight around the forehead, chin, and cheeks, using the LabelImg toolbox (https://github.com/tzutalin/labelImg). If a face is occluded, we only label the exposed skin region; if most of a face is occluded, we ignore it. In this collection, we observed commonly seen degradations in addition to under-exposure, such as intense noise. Each annotated image contains 1-34 human faces; the face number and resolution distributions are displayed in Fig. 6. The face resolutions in these images range from 1×2 to 335×296, and the resolution of most faces in our dataset is below 300 pixels.
3.3 Zero-Shot Object Detection with Raindrop Occlusions
In Sub-challenge 2.3, we release 1,010 pairs of realistic raindrop images and corresponding clean ground-truths, collected through the physical simulation process described in , as the training and/or validation sets. Our held-out test set contains 2,495 real rainy images from high-resolution driving videos. As shown in Figure 7, all images are contaminated by raindrops on the camera lens. They were captured in diverse real traffic locations and scenes during multiple drives. We labeled bounding boxes for selected traffic objects that commonly appear on the roads in all images: car, person, bus, bicycle, and motorcycle. Most images are of 1920×990 resolution, with a few exceptions of 4023×3024 resolution.
The participants can optionally use pre-trained models (e.g., on ImageNet or COCO) or external data. But if any pre-trained model, self-synthesized data, or self-collected data is used, that must be explicitly mentioned in their submissions, and the participants must ensure that all data they use is publicly available at the time of challenge submission, for reproducibility purposes.
4 Baseline Results and Analysis
For all three sub-challenges, we report results obtained by cascading off-the-shelf enhancement methods and popular pre-trained detectors. No joint training was performed; hence the baseline numbers are by no means competitive. We expect to see substantial performance boosts over the baselines from the competition participants.
4.1 Sub-challenge 2.1 Baseline Results
4.1.1 Baseline Composition
We test four state-of-the-art object detectors: (1) Mask R-CNN (https://github.com/matterport/Mask_RCNN); (2) RetinaNet (https://github.com/fizyr/keras-retinanet); (3) YOLO-V3 (https://github.com/ayooshkathuria/pytorch-yolo-v3); and (4) Feature Pyramid Network (FPN) (https://github.com/DetectionTeamUCAS/FPN_Tensorflow).
We also try three state-of-the-art dehazing approaches: (a) AOD-Net (https://github.com/Boyiliee/AOD-Net); (b) Multi-Scale Convolutional Neural Network (MSCNN) (https://github.com/rwenqi/Multi-scale-CNN-Dehazing); (c) Densely Connected Pyramid Dehazing Network (DCPDN) (https://github.com/hezhangsprinter/DCPDN). All dehazing models adopt the officially released versions.
| mAP (%) | hazy | AOD-Net | DCPDN | MSCNN |
| Mask R-CNN (Person) | 67.52 | 66.71 | 67.18 | 69.23 |
RetinaNet, Mask R-CNN and YOLO-V3 are pretrained on the Microsoft COCO dataset.
FPN with a ResNet-101 backbone is pretrained on the PASCAL Visual Object Classes (VOC) dataset.
4.1.2 Results and Analysis
Fig. 8 shows the object detection performance on the original hazy images of the RESIDE RTTS set using Mask R-CNN. The detector is pretrained on Microsoft COCO, a large-scale object detection, segmentation, and captioning dataset. More detailed detection performance on the five object categories can be found in Table 5.
Results show that without preprocessing or dehazing, object detectors pretrained on clean images fail to detect a large number of objects in the hazy images. The overall detection performance reaches a mAP of only 41.83% using Mask R-CNN and 42.54% using YOLO-V3. Among the five object categories, person has the highest detection AP, while bus has the lowest.
We also compare the validation and test set performance in Table 5. One possible reason for the performance gap between the validation and test sets is that the bounding boxes of the latter are smaller than those of the former, as shown in Fig. 1 and visualized in Fig. 9.
Effect of Dehazing
We further evaluate the current state-of-the-art dehazing approaches on the hazy dataset, with pre-trained detectors subsequently applied without tuning or adaptation. Fig. 9 shows two examples in which dehazing algorithms improve not only the visual quality of the images but also the detection accuracy. More detection results are included in Table 5. Detection mAPs on images dehazed with DCPDN and MSCNN are 1% higher on average than on the hazy images.
Finally, the choice of pre-trained detector also seems to matter here: Mask R-CNN outperforms the other detectors on both validation and test sets, both before and after applying dehazing.
4.2 Sub-challenge 2.2 Baseline Results
4.2.1 Baseline Composition
We test five state-of-the-art deep face detectors: (1) Dual Shot Face Detector (DSFD) (https://github.com/TencentYoutuResearch/FaceDetection-DSFD); (2) PyramidBox (https://github.com/EricZgw/PyramidBox); (3) Single Shot Scale-Invariant Face Detector (SFD) (https://github.com/sfzhang15/SFD); (4) Single Stage Headless Face Detector (SSH) (https://github.com/mahyarnajibi/SSH.git); (5) Faster R-CNN (https://github.com/playerkk/face-py-faster-rcnn).
We also include seven state-of-the-art algorithms for light/contrast enhancement: (a) Bio-Inspired Multi-Exposure Fusion (BIMEF) (https://github.com/baidut/BIMEF); (b) Dong ; (c) Low-light IMage Enhancement (LIME) (https://sites.google.com/view/xjguo/lime); (d) MF ; (e) Multi-Scale Retinex (MSR) ; (f) Joint Enhancement and Denoising (JED) (https://github.com/tonghelen/JED-Method); (g) RetinexNet (https://github.com/weichen582/RetinexNet).
4.2.2 Results and Analysis
Fig. 12 (a) depicts the precision-recall curves of the original face detection methods, without enhancement. The baseline methods are trained on WIDER FACE, a large dataset with large scale variations across diversified factors and conditions. The results demonstrate that without proper pre-processing or adaptation, the state-of-the-art methods cannot achieve desirable detection rates on DARK FACE. Result examples are illustrated in Fig. 10. This evidence may imply that previous face datasets, though covering variations in poses, appearances, scales, etc., are still insufficient to capture facial features in highly under-exposed conditions.
Effect of Enhancement
We next use the enhancement algorithms to pre-process the annotated dataset and then apply the above pre-trained face detectors to the processed data. As the visual quality of the enhanced images improves, the detectors, as expected, also perform better. As shown in Fig. 12 (b) and (c), in most instances the precision of the detectors notably increases compared to the data without enhancement. The various existing enhancement methods seem to result in similar improvements here, except for JED, which leads to a performance drop. Despite these encouraging signs, the overall performance of the detectors still falls far short of that on normal-light datasets. The simple cascade of low-light enhancement and face detectors leaves much room for improvement.
Effect of Face Scale and Light Condition
We analyze the performance of the face detectors on subsets of different levels of difficulty. We define the difficulty of the subsets based on two criteria: face scale and facial light condition. Face scale is divided into three levels based on the average size of the bounding boxes in an image: small face (<100 pixels), medium face (100-300 pixels), large face (>300 pixels). Facial illumination is also divided into three levels based on the average pixel value within the bounding boxes: low, medium, and high illumination. We present the results in Figs. 13 and 14. Clearly, the performance degrades for small faces and those with low illumination. DSFD achieves the best performance, with average precision rates greater than 45% but lower than 55%. The results suggest that current face detectors are limited when face scale and light conditions change.
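The difficulty binning above can be sketched as two small classifiers. The face-scale cut-offs follow the text; the illumination cut-offs are illustrative assumptions (thirds of the [0, 255] intensity range), since the text does not specify them.

```python
def face_scale_level(avg_box_size):
    """Bin an image by the average bounding-box size (in pixels),
    using the thresholds stated in the text."""
    if avg_box_size < 100:
        return "small"
    if avg_box_size <= 300:
        return "medium"
    return "large"

def illumination_level(avg_pixel_value, low_cut=85, high_cut=170):
    """Bin a face region by its average pixel value. The cut-offs here
    are assumed, not taken from the paper."""
    if avg_pixel_value < low_cut:
        return "low"
    if avg_pixel_value < high_cut:
        return "medium"
    return "high"
```

Evaluating each detector separately on the nine (scale, illumination) cells reproduces the per-difficulty breakdown of Figs. 13 and 14.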
4.3 Sub-challenge 2.3 Baseline Results
4.3.1 Baseline Composition
We employ five state-of-the-art deep learning-based deraining algorithms: (a) JOint Rain DEtection and Removal (JORDER) (http://www.icst.pku.edu.cn/struct/Projects/joint_rain_removal.html); (b) Deep Detail Network (DDN) (https://github.com/XMU-smartdsp/Removing_Rain); (c) Conditional Generative Adversarial Network (CGAN) (https://github.com/TrinhQuocNguyen/Edited_Original_IDCGAN); (d) Density-aware Image De-raining using a Multi-stream Dense Network (DID-MDN) (https://github.com/hezhangsprinter/DID-MDN); and (e) DeRaindrop (https://github.com/rui1996/DeRaindrop). For fair comparison, we re-trained all deraining algorithms using the same provided training set.
4.3.2 Results and Analysis
Table 6 compares the mAP results of different deraining algorithms using different detection models on the held-out test set. Unfortunately, we find that almost all existing deraining algorithms deteriorate object detection performance compared to directly using the rainy images for YOLO-V3, SSD-512, and RetinaNet (the only exception is the detection results with FRCNN). This could be because the deraining algorithms were not trained towards the end goal of object detection, so they do not necessarily help this goal, and the deraining process itself may lose discriminative, semantically meaningful information, thus hampering detection performance. In addition, Table 6 shows that YOLO-V3 achieves the best detection performance, independent of the deraining algorithm applied. We attribute this to the small objects at relatively long distances from the camera in the test set, since YOLO-V3 is known to improve small-object detection through its multi-scale prediction structure.
(Table 6 columns: Rainy, JORDER, DDN, CGAN, DID-MDN, DeRaindrop.)
-  (2007-05) A dynamic histogram equalization for image contrast enhancement. IEEE Trans. on Consumer Electronics 53 (2), pp. 593–600. Cited by: §2.2.
-  (2018-04) I-HAZE: a dehazing benchmark with real hazy and haze-free indoor images. arXiv preprint arXiv:1804.05091. Cited by: §2.1.
-  (2018-04) O-HAZE: a dehazing benchmark with real hazy and haze-free outdoor images. arXiv preprint arXiv:1804.05101. Cited by: §2.1.
-  (2010) Analysis of rain and snow in frequency space. Int’l Journal of Computer Vision 86 (2-3), pp. 256–274. Cited by: §2.2.
-  (2016-06) Non-local image dehazing. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, pp. 1674–1682. Cited by: §2.2.
-  (2012) Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In Proc. of the British Machine Vision Conf., pp. 135.1–135.10. Cited by: §2.1.
-  (2011) Rain or snow detection in image sequences through use of a histogram of orientation of streaks. Int’l Journal of Computer Vision 93 (3), pp. 348–367. Cited by: §2.2.
-  (2008) Using the shape characteristics of rain to identify and remove rain from video. In Joint IAPR International Workshops on SPR and SSPR, pp. 451–458. Cited by: §2.2.
-  (2016-11) DehazeNet: an end-to-end system for single image haze removal. IEEE Trans. on Image Processing 25 (11), pp. 5187–5198. Cited by: §1.1, §2.2.
-  (2018-04) Learning a deep single image contrast enhancer from multi-exposure images. IEEE Trans. on Image Processing 27 (4), pp. 2049–2062. Cited by: §2.2.
-  (2017-10) Transformed low-rank model for line pattern noise removal. In Proc. IEEE Int’l Conf. Computer Vision. Cited by: §1.1, §2.2.
-  (2018-06) Learning to see in the dark. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, pp. 3291–3300. Cited by: §2.1, §2.2.
-  (2014-03) A rain pixel recovery algorithm for videos with highly dynamic scenes. IEEE Trans. on Image Processing 23 (3), pp. 1097–1104. Cited by: §2.2.
-  (2018-06) Robust video content alignment and compensation for rain removal in a CNN framework. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition. Cited by: §2.2.
-  (2013) A generalized low-rank appearance model for spatio-temporally correlated rain streaks. In Proc. IEEE Int’l Conf. Computer Vision, pp. 1968–1975. Cited by: §1.1, §2.2.
-  (2017) Robust emotion recognition from low quality and low bit rate video: a deep learning approach. In Int’l Conf. on Affective Computing and Intelligent Interaction (ACII), pp. 65–70. Cited by: §1.1.
-  (2015-12) Compression artifacts reduction by a deep convolutional network. In Proc. IEEE Int’l Conf. Computer Vision, pp. 576–584. Cited by: §2.3.
-  (2011) Fast efficient algorithm for enhancement of low lighting video. In Proc. IEEE Int’l Conf. Multimedia and Expo, pp. 1–6. Cited by: §4.2.1.
-  (2012-05) The impact of image quality on the performance of face recognition. In Symposium on Information Theory in the Benelux and Joint WIC/IEEE Symposium on Information Theory and Signal Processing in the Benelux, Netherlands, pp. 141–148. Cited by: §2.3.
-  (2008-08) Single image dehazing. ACM Trans. Graphics 27 (3), pp. 72:1–72:9. Cited by: §2.2.
-  (2006) Removing camera shake from a single photograph. ACM Trans. Graphics, pp. 787–794. Cited by: §2.3.
-  (2017-06) Clearing the skies: a deep network architecture for single-image rain removal. IEEE Trans. on Image Processing 26 (6), pp. 2944–2956. Cited by: §1.1, §2.2.
-  (2016-06) A weighted variational model for simultaneous reflectance and illumination estimation. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, pp. 2782–2790. Cited by: §2.2.
-  (2017-07) Removing rain from single images via a deep detail network. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition. Cited by: §1.1, §2.2, §4.3.1, Table 6.
-  (2016) A fusion-based enhancing method for weakly illuminated images. Signal Processing 129, pp. 82–96. Cited by: §2.2, §4.2.1.
-  (2016) Manga109 dataset and creation of metadata. In Proc. of Int’l Workshop on coMics ANalysis, Processing and Understanding, pp. 1–5. Cited by: §2.1.
-  (2004) Detection and removal of rain from videos. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, Vol. 1, pp. I–528. Cited by: §2.2.
-  (2005) When does a camera see rain? In Proc. IEEE Int’l Conf. Computer Vision, Vol. 2, pp. 1067–1074. Cited by: §2.2.
-  (2006) Photorealistic rendering of rain streaks. ACM Trans. Graphics, Vol. 25, pp. 996–1002. Cited by: §2.2.
-  (2007) Vision and rain. Int’l Journal of Computer Vision 75 (1), pp. 3–27. Cited by: §2.2.
-  (2011-02) SCface — surveillance cameras face database. Multimedia Tools Appl. 51 (3), pp. 863–879. Cited by: §2.1.
-  (2017-02) LIME: low-light image enhancement via illumination map estimation. IEEE Trans. on Image Processing 26 (2), pp. 982–993. Cited by: §2.2, §4.2.1.
-  (2011-12) Single image haze removal using dark channel prior. IEEE Trans. on Pattern Analysis and Machine Intelligence 33 (12), pp. 2341–2353. Cited by: §1.1, §2.2.
-  (2017) Mask R-CNN. In Proc. IEEE Int’l Conf. Computer Vision, pp. 2961–2969. Cited by: §4.1.1, Table 5.
-  (2008-06) Simultaneous super-resolution and feature extraction for recognition of low-resolution faces. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, pp. 1–8. Cited by: §2.3.
-  (2015-06) Single image super-resolution from transformed self-exemplars. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, pp. 5197–5206. Cited by: §2.1.
-  (2017) Face detection with the Faster R-CNN. In IEEE Int’l Conf. on Automatic Face and Gesture Recognition, pp. 650–657. Cited by: §4.2.1.
-  (2017-07) A novel tensor-based video rain streaks removal approach via utilizing discriminatively intrinsic priors. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition. Cited by: §2.2.
-  (1997-03) Properties and performance of a center/surround retinex. IEEE Trans. on Image Processing 6 (3), pp. 451–462. Cited by: §2.2.
-  (1997-07) A multiscale retinex for bridging the gap between color images and the human observation of scenes. IEEE Trans. on Image Processing 6 (7), pp. 965–976. Cited by: §2.2, §4.2.1.
-  (2012-04) Automatic single-image-based rain streaks removal via image decomposition. IEEE Trans. on Image Processing 21 (4), pp. 1742–1755. Cited by: §2.2.
-  (2016-09) How image degradations affect deep CNN-based face recognition? In Int’l Conf. of the Biometrics Special Interest Group, pp. 1–5. Cited by: §2.3.
-  (2013-09) Single-image deraining using an adaptive nonlocal means filter. In Proc. IEEE Int’l Conf. Image Processing, pp. 914–917. Cited by: §2.2.
-  (2015-09) Video deraining and desnowing using temporal correlation and low-rank matrix completion. IEEE Trans. on Image Processing 24 (9), pp. 2658–2670. Cited by: §2.2.
-  (2009-09) Factorizing scene albedo and depth from a single foggy image. In Proc. IEEE Int’l Conf. Computer Vision, pp. 1701–1708. Cited by: §2.2.
-  (1977) The retinex theory of color vision. Scientific American, pp. 108–128. Cited by: §2.2.
-  (2017-10) AOD-Net: all-in-one dehazing network. In Proc. IEEE Int’l Conf. Computer Vision, pp. 4780–4788. Cited by: §1.1, §2.2, §2.3, §4.1.1, Table 5.
-  (2017) An all-in-one network for dehazing and beyond. arXiv preprint arXiv:1707.06543. Cited by: §2.2.
-  (2018-02) End-to-end united video dehazing and detection. In AAAI. Cited by: §2.2.
-  (2019) Benchmarking single-image dehazing and beyond. IEEE Trans. on Image Processing 28 (1), pp. 492–505. Cited by: §2.1, Figure 2, §3.1.
-  (2013) Detection of blotch and scratch in video based on video decomposition. IEEE Trans. on Circuits and Systems for Video Technology 23 (11), pp. 1887–1900. Cited by: §2.2.
-  (2019) DSFD: dual shot face detector. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition. Cited by: §4.2.1.
-  (2015-09) A low-light image enhancement method for both denoising and contrast enlarging. In Proc. IEEE Int’l Conf. Image Processing, pp. 3730–3734. Cited by: §2.2.
-  (2018-06) Structure-revealing low-light image enhancement via robust retinex model. IEEE Trans. on Image Processing 27 (6), pp. 2828–2841. Cited by: §1.1, §2.2.
-  (2018-06) Video rain streak removal by multiscale convolutional sparse coding. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition. Cited by: §2.2.
-  (2019) Single image deraining: a comprehensive benchmark analysis. arXiv preprint arXiv:1903.08558. Cited by: §2.1.
-  (2015-12) Nighttime haze removal with glow and multiple light colors. In Proc. IEEE Int’l Conf. Computer Vision, pp. 226–234. Cited by: §2.2.
-  (2016) Rain streak removal using layer priors. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, pp. 2736–2744. Cited by: §1.1, §2.2.
-  (2017) Feature pyramid networks for object detection. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, pp. 2117–2125. Cited by: §4.1.1, Table 5.
-  (2018) Focal loss for dense object detection. IEEE Trans. on Pattern Analysis and Machine Intelligence. Cited by: §4.1.1, §4.3.1, Table 5, Table 6.
-  (2017) Enhance visual recognition under adverse conditions via deep networks. arXiv preprint arXiv:1712.07732. Cited by: §1.1, §2.3.
-  (2017) Robust video super-resolution with learned temporal dynamics. In Proc. IEEE Int’l Conf. Computer Vision, pp. 2507–2515. Cited by: §2.2.
-  (2018) Learning temporal dynamics for video super-resolution: a deep learning approach. IEEE Trans. on Image Processing 27 (7), pp. 3432–3445. Cited by: §2.2.
-  (2018) Connecting image denoising and high-level vision tasks via deep learning. arXiv preprint arXiv:1809.01826. Cited by: §1.1, §2.3.
-  (2017) When image denoising meets high-level vision tasks: a deep learning approach. arXiv preprint arXiv:1706.04284. Cited by: §1.1, §2.3.
-  (2018) Structure-guided image inpainting using homography transformation. IEEE Trans. on Multimedia 20 (12), pp. 3252–3265. Cited by: §2.2.
-  (2018-06) Erase or fill? Deep joint recurrent rain removal and reconstruction in videos. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition. Cited by: §2.2.
-  (2009) Pixel based temporal analysis using chromatic property for removing rain from videos. In Computer and Information Science. Cited by: §2.2.
-  (2016) SSD: single shot multibox detector. In Proc. IEEE European Conf. Computer Vision, pp. 21–37. Cited by: §4.3.1, Table 6.
-  (2018) Improved techniques for learning to dehaze and beyond: a collective study. arXiv preprint arXiv:1807.00202. Cited by: §1.1.
-  (2019) Getting to know low-light images with the exclusively dark dataset. Computer Vision and Image Understanding 178, pp. 30–42. Cited by: §2.1.
-  (2017) LLNet: a deep autoencoder approach to natural low-light image enhancement. Pattern Recognition 61, pp. 650–662. Cited by: §1.1, §2.2.
-  (2015) Removing rain from a single image via discriminative sparse coding. In Proc. IEEE Int’l Conf. Computer Vision, pp. 3397–3405. Cited by: §1.1, §2.2.
-  (2001-07) A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proc. IEEE Int’l Conf. Computer Vision, Vol. 2, pp. 416–423. Cited by: §2.1.
-  (2016) A benchmark and simulator for UAV tracking. Springer Nature. Cited by: §2.1.
-  (2018-04) Pushing the limits of unconstrained face detection: a challenge dataset and baseline results. arXiv preprint arXiv:1804.10275. Cited by: §2.1.
-  (2014) An effective surround filter for image dehazing. In Proc. of Int’l Conf. on Interdisciplinary Advances in Applied Computing, ICONIAAC ’14, New York, NY, USA, pp. 20:1–20:6. Cited by: §2.2.
-  (2017-10) SSH: single stage headless face detector. In Proc. IEEE Int’l Conf. Computer Vision, pp. 4885–4894. Cited by: §4.2.1.
-  (2012-07) Bayesian defogging. Int’l Journal of Computer Vision 98 (3), pp. 263–278. Cited by: §2.2.
-  (2011-06) A large-scale benchmark dataset for event recognition in surveillance video. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, pp. 3153–3160. Cited by: §2.1.
-  (1990-05) Contrast-limited adaptive histogram equalization: speed and effectiveness. In Proc. of Conference on Visualization in Biomedical Computing, pp. 337–345. Cited by: §2.2.
-  (2017-07) Benchmarking denoising algorithms with real photographs. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, pp. 2750–2759. Cited by: §2.1.
-  (2018) U-finger: multi-scale dilated convolutional network for fingerprint image denoising and inpainting. arXiv preprint arXiv:1807.10993. Cited by: §1.1.
-  (2018) Attentive generative adversarial network for raindrop removal from a single image. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition. Cited by: §2.1, §3.3, §4.3.1, Table 6.
-  (2018) YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767. Cited by: §4.1.1, §4.3.1, Table 5, Table 6.
-  (2013) Context-aware sparse decomposition for image denoising and super-resolution. IEEE Trans. on Image Processing 22 (4), pp. 1456–1469. Cited by: §2.2.
-  (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, pp. 91–99. Cited by: §4.3.1, Table 6.
-  (2019-04) Deep video dehazing with semantic segmentation. IEEE Trans. on Image Processing 28 (4), pp. 1895–1908. Cited by: §2.2.
-  (2017-07) Video desnowing and deraining based on matrix decomposition. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition. Cited by: §2.2.
-  (2016) Single image dehazing via multi-scale convolutional neural networks. In Proc. IEEE European Conf. Computer Vision. Cited by: §1.1, §2.2, §4.1.1, Table 5.
-  (2017) Video deblurring via semantic segmentation and pixel-wise non-linear kernel. In Proc. IEEE Int’l Conf. Computer Vision, pp. 1077–1085. Cited by: §2.2.
-  (2018-05) Joint enhancement and denoising method via sequential decomposition. Cited by: §1.1, §2.2, §4.2.1.
-  (2015-03) Utilizing local phase information to remove rain from video. Int’l Journal of Computer Vision 112 (1), pp. 71–89. Cited by: §2.2.
-  (2005-09) Recognizing facial expressions at low resolution. In Proc. of IEEE Conf. on Advanced Video and Signal Based Surveillance, pp. 330–335. Cited by: §1.1.
-  (2014-06) Scene-independent group profiling in crowd. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, pp. 2227–2234. Cited by: §2.1.
-  (2006-11) A statistical evaluation of recent full reference image quality assessment algorithms. IEEE Trans. on Image Processing 15 (11), pp. 3440–3451. Cited by: §2.1.
-  (2017-11) MSR-Net: low-light image enhancement using deep convolutional network. arXiv e-prints. Cited by: §1.1, §2.2.
-  (2009-10) Face tracking and recognition in low quality video sequences with the use of particle filtering. In Proc. of Annual Int’l Carnahan Conf. on Security Technology, pp. 126–133. Cited by: §1.1.
-  (2018-09) PyramidBox: a context-assisted single shot face detector. In Proc. IEEE European Conf. Computer Vision. Cited by: §4.2.1.
-  (2004-06) Evaluation of face resolution for expression analysis. In Proc. of Int’l Conf. on Computer Vision and Pattern Recognition Workshop, pp. 82–82. Cited by: §1.1.
-  (2017) NTIRE 2017 challenge on single image super-resolution: methods and results. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition Workshops, pp. 114–125. Cited by: §2.1.
-  (2008-11) 80 million tiny images: a large data set for nonparametric object and scene recognition. IEEE Trans. on Pattern Analysis and Machine Intelligence 30 (11), pp. 1958–1970. Cited by: §2.3.
-  (2012-03) Video post processing: low-latency spatiotemporal approach for detection and removal of rain. IET Image Processing 6 (2), pp. 181–196. Cited by: §2.2.
-  (2011) A probabilistic approach for detection and removal of rain from videos. IETE Journal of Research 57 (1), pp. 82–91. Cited by: §2.2.
-  (2018-09) License plate recognition and super-resolution from low-resolution videos by convolutional neural networks. In Proc. of British Machine Vision Conference. Cited by: §1.1.
-  (2019) Bridging the gap between computational photography and visual recognition. arXiv preprint arXiv:1901.09482. Cited by: §2.3.
-  (2013-09) Naturalness preserved enhancement algorithm for non-uniform illumination images. IEEE Trans. on Image Processing 22 (9), pp. 3538–3548. Cited by: §2.2.
-  (2016) Studying very low resolution recognition using deep networks. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, pp. 4792–4800. Cited by: §2.3.
-  (2013) Robust temporal-spatial decomposition and its applications in video processing. IEEE Trans. on Circuits and Systems for Video Technology 23 (3), pp. 387–400. Cited by: §2.2.
-  (2016) D3: deep dual-domain based fast restoration of JPEG-compressed images. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, pp. 2764–2772. Cited by: §2.2.
-  (2015) Self-tuned deep super resolution. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition Workshops, pp. 1–8. Cited by: §2.2.
-  (2015) Learning super-resolution jointly from external and internal examples. IEEE Trans. on Image Processing 24 (11), pp. 4359–4371. Cited by: §2.2.
-  (2018) Deep retinex decomposition for low-light enhancement. In Proc. of British Machine Vision Conference, pp. 155. Cited by: §2.1, §4.2.1.
-  (2017-10) Should we encode rain streaks in video as deterministic or stochastic? In Proc. IEEE Int’l Conf. Computer Vision. Cited by: §2.2.
-  (2018-04) Real-world noisy image denoising: a new benchmark. arXiv preprint arXiv:1804.02603. Cited by: §2.1.
-  (2017) Image deblurring via extreme channels prior. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, pp. 4003–4011. Cited by: §2.2.
-  (2016-12) Enhancement of low light level images with coupled dictionary learning. In Proc. IEEE Int’l Conf. Pattern Recognition, pp. 751–756. Cited by: §1.1, §2.2.
-  (2010-11) Image super-resolution via sparse representation. IEEE Trans. on Image Processing 19 (11), pp. 2861–2873. Cited by: §2.3.
-  (2017) Deep joint rain detection and removal from a single image. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition. Cited by: §1.1, §2.1, §2.2, §4.3.1, Table 6.
-  (2007) Introduction to a large-scale general purpose ground truth database: methodology, annotation tool and benchmarks. In EMMCVPR. Cited by: §2.1.
-  (2017-11) A bio-inspired multi-exposure fusion framework for low-light image enhancement. arXiv e-prints. Cited by: §4.2.1.
-  (2013) Multi-level video frame interpolation: exploiting the interaction among different levels. IEEE Trans. on Circuits and Systems for Video Technology 23 (7), pp. 1235–1248. Cited by: §2.2.
-  (2012) On single image scale-up using sparse-representations. In Proc. of the Int’l Conf. on Curves and Surfaces, Berlin, Heidelberg, pp. 711–730. Cited by: §2.1.
-  (2011) Close the loop: joint blind image restoration and recognition with sparse representation prior. In Proc. IEEE Int’l Conf. Computer Vision, pp. 770–777. Cited by: §1.1, §2.3.
-  (2018) Densely connected pyramid dehazing network. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, pp. 3194–3203. Cited by: §4.1.1, Table 5.
-  (2018) Density-aware single image de-raining using a multi-stream dense network. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition. Cited by: §1.1, §2.2, §4.3.1, Table 6.
-  (2017) Image de-raining using a conditional generative adversarial network. arXiv preprint arXiv:1701.05957. Cited by: §2.1, §4.3.1, Table 6.
-  (2017-07) Fast haze removal for nighttime image using maximum reflectance prior. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, pp. 7016–7024. Cited by: §2.2.
-  (2017-10) S3FD: single shot scale-invariant face detector. In Proc. IEEE Int’l Conf. Computer Vision, pp. 192–201. Cited by: §4.2.1.
-  (2012-11) Enhancement and noise reduction of very low light level images. In Proc. IEEE Int’l Conf. Pattern Recognition, pp. 2034–2037. Cited by: §2.2.
-  (2006) Rain removal in video by combining temporal and chromatic properties. In Proc. IEEE Int’l Conf. Multimedia and Expo, pp. 461–464. Cited by: §2.2.
-  (2017) HazeRD: an outdoor scene dataset and benchmark for single image dehazing. In Proc. IEEE Int’l Conf. Image Processing, pp. 3205–3209. Cited by: §2.1.
-  (2013-12) Single image dehazing motivated by retinex theory. In Proc. of Int’l Symposium on Instrumentation and Measurement, Sensor Network and Automation, pp. 243–247. Cited by: §2.2.
-  (2018) VisDrone-DET2018: the vision meets drone object detection in image challenge results. In Proc. IEEE European Conf. Computer Vision. Cited by: §1.
-  (2015-11) A fast single image haze removal algorithm using color attenuation prior. IEEE Trans. on Image Processing 24 (11), pp. 3522–3533. Cited by: §2.2.
-  (2013-12) Video synopsis by heterogeneous multi-source correlation. In Proc. IEEE Int’l Conf. Computer Vision, pp. 81–88. Cited by: §2.1.
-  (2012-01) Very low resolution face recognition problem. IEEE Trans. on Image Processing 21 (1), pp. 327–340. Cited by: §2.3.