DIFAR: Deep Image Formation and Retouching

Sean Moran, Gregory Slabaugh
Huawei Noah’s Ark Lab
{sean.moran, gregory.slabaugh}@huawei.com
Abstract

We present a novel neural network architecture for the image signal processing (ISP) pipeline. In a camera system, the ISP is a critical component that forms a high quality RGB image from RAW camera sensor data. Typical ISP pipelines sequentially apply a complex set of traditional image processing modules, such as demosaicing, denoising, tone mapping, etc. We introduce a new deep network that replaces all these modules, dubbed Deep Image Formation And Retouching (DIFAR). DIFAR introduces a multi-scale context-aware pixel-level block for local denoising/demosaicing operations and a retouching block for global refinement of image colour, luminance and saturation. DIFAR can also be trained for RGB to RGB image enhancement. DIFAR is parameter-efficient and outperforms recently proposed deep learning approaches in both objective and perceptual metrics, setting new state-of-the-art performance on multiple datasets including Samsung S7 [38] and MIT-Adobe 5k [6].

Figure: Underexposed image enhancement. Given (a) a poorly exposed image, DIFAR (c) produces an image with pleasing contrast and colour, better matching the groundtruth (d) than the state-of-the-art DeepUPE model [42] (b).

1 Introduction

Image quality is of fundamental importance in any imaging system, including DSLR and smartphone cameras. At the imaging sensor, RAW data is normally captured on a color filter array (such as the well-known Bayer pattern) where at each pixel, only a red, green, or blue color is available. This mosaiced RAW data suffers from noise, vignetting, lack of white balance, and many other defects and additionally has a high dynamic range.

The camera’s image signal processing (ISP) pipeline is responsible for forming a high quality RGB image with minimal noise, pleasing colors, sharp detail, and good contrast from the degraded RAW data. In most cases, the ISP is realised as a modular sequence of traditional image signal processing algorithms (Figure 1) each responsible for a single well-defined image operation (e.g. denoising, demosaicing, tone mapping). The modules are numerous and highly engineered, easily exceeding twenty in modern ISPs. As a result, it is difficult and expensive to design, develop, and tune the ISP pipeline due to the complex dependencies between the modules.

A single neural network trained end-to-end that directly transforms RAW data into an RGB image has numerous advantages including simplicity, efficiency, and most importantly, quality. Currently the state-of-the-art in many of the individual modules in an ISP (such as image denoising, automatic white balance, dynamic range compression) is set by deep neural networks. However, concatenating a large set of deep networks is impractical for efficient inference on a device like a smartphone, and end-to-end training with many modules is complicated; instead, a single network transforming from RAW to RGB is preferred. Efficient on-board neural image processing software can correct for limitations of the sensor, producing an RGB image that has rich colour, texture, contrast and detail. This is particularly important for smartphones, which must include smaller and less capable imaging sensors due to form factors.

Figure 1: A simplified traditional image signal processing (ISP) pipeline. The input is a RAW image and the output is an RGB image with correct colour and brightness.
Figure 2: Left: DIFAR image signal processor (ISP) pipeline. The network consists of two blocks trained end-to-end: a pixel-level block for local pixel processing and a global retouching block that leverages neural curve layers for artist-inspired image refinement operations. The input is a RAW image and the output is a high quality RGB image with correct colour and brightness. Right: Illustration of a piece-wise linear neural curve that predicts a scaling factor that adjusts saturation based on hue.

Given the importance of the ISP to image quality, the problem of RAW to RGB transformation using a deep neural network has, surprisingly, received only limited attention in the literature. Recently the DeepISP [38] approach first demonstrated proof-of-concept for a deep network to replace the entire ISP, and work by Chen et al. [7] addressed RAW to RGB transformation in low light (less than 5 lux). However, it is an open question how far image quality can advance using deep RAW to RGB networks.

Drawing inspiration from this related work, we propose DIFAR (Deep Image Formation And Retouching), a novel neural image processing pipeline for image formation and retouching. DIFAR includes a pixel-level block for low level operations like denoising and demosaicing. Similar to [7], our pixel-level block has an encoder/decoder structure, however we replace standard skip connections with novel multi-scale contextual awareness (MSCA) connections, providing enriched features from the encoder to the decoder that boost performance. Similar to [38], DIFAR includes global image transformations to adjust colors and brightness. However, unlike [38], DIFAR introduces neural curve layers, which more expressively adjust image properties in a controlled but human-interpretable way. Unlike  [7] or  [38], DIFAR leverages multiple color spaces to boost performance. Although DIFAR was designed to handle the RAW to RGB mapping, it also excels at RGB to RGB image enhancement. Our contributions in this paper are four-fold:

  • State-of-the-art end-to-end deep neural network for the entire ISP: A new neural network architecture incorporating a pixel-level block for local processing including demosaicing, denoising, brightness, and luminance correction, and a retouching block that globally adjusts the image luminance, saturation and colour. DIFAR produces state-of-the-art results on several public datasets.

  • MSCA connections in the pixel level block: Our pixel-level block has an encoder/decoder structure but replaces standard skip connections with a novel multi-scale context awareness connection, enriching the information available to the decoder.

  • Retouching block: We introduce a neural network curve layer that learns a piece-wise linear scaling curve to globally adjust image properties in a human-interpretable manner, inspired by the Photoshop curves tool (https://www.cambridgeincolour.com/tutorials/photoshop-curves.htm).

  • Multiple colour spaces: We apply sequential differentiable transformations of the image in different colour spaces (Lab, RGB, HSV) guided by a novel multi-colour space loss function.

2 Related Work

Single-stage ISP operations. Image denoising is a fundamental problem in the ISP and remains an open problem. A detailed review of denoising methods can be found in [3]. Traditional methods [15, 33, 41, 5, 13] seek noise removal while preserving edges. More recent work casts denoising as a regression problem addressed with deep learning [46, 30, 40, 27, 47, 8, 1], showing much promise for denoising with convolutional neural networks (CNNs). Image demosaicing seeks to recover a full colour RGB image from an incomplete set of colour samples arranged on a colour filter array. The task of demosaicing is essentially one of interpolation. Often coupled with denoising, the state-of-the-art in demosaicing is set by deep learning using CNNs [16, 25] and conditional GANs [14]. Other fundamental ISP operations include automatic white balance [20, 2, 35, 4] and dynamic range compression [17, 44].

Combined operations. A recent trend in the literature is to combine tasks to solve joint problems. In addition to the joint denoising and demosaicing mentioned in the previous paragraph, authors have explored solutions for joint demosaicing and super-resolution [37]; demosaicing, denoising, and super-resolution [34]; super-resolution and tone mapping [24]; and demosaicing and deblurring [12].

Modeling the full ISP. Further combination of multiple ISP tasks ultimately leads to modeling the entire ISP. Currently the literature is sparse on deep learning methods that specifically aim to replace the full pipeline. The most relevant related work is the DeepISP model of [38], which incorporates a low level network that performs local adjustment of the image including joint denoising and demosaicing, and a high level network that performs global image enhancement. Other recent work includes the Self-Guided Network (SGN) [18] which relies extensively on pixel shuffling [26] operations, and CameraNet [28] that decomposes the problem into two subproblems of restoration and enhancement. Another related work is [7] which also learns a RAW to RGB mapping, however for extremely dark images using a simple UNet-style architecture [36].

Image enhancement. In an ISP, image enhancement typically relates to brightness and colour adjustment. Hu et al. [19] design a photo retouching approach (White-Box) using reinforcement learning and GANs to apply a sequence of filters to an image. Deep Photo Enhancer (DPE) [11] is a GAN-based architecture with a UNet-style generator for RGB image enhancement that produces state-of-the-art results on the popular MIT-Adobe5K benchmark dataset. Arguably, the state-of-the-art in digital photography is achieved with DSLRs, which, due to their larger apertures and lenses, can produce higher quality photographs than what is achievable with a smartphone camera. Related work enhances smartphone images by learning a mapping between smartphone and DSLR photographs [21, 22, 49] using deep learning.

Figure 3: Multi-scale contextual awareness connection that fuses multiple levels of image context (global, mid, local) to deliver more contextually relevant features for the expanding path (Section 3.1).

3 The DIFAR Neural Network

DIFAR is an end-to-end trainable neural network that consists of two blocks as shown in Figure 2. The pixel-level block is an encoder-decoder neural network (Section 3.1) for local processing. The output of the pixel level block is passed to the retouching block (Section 3.2), employing a novel neural curve layer that globally adjusts image properties.

3.1 Pixel-Level Block: Localised Pixel Processing

The DIFAR pixel-level block is based on a UNet [36] backbone. The pixel-level block performs local pixel processing including the tasks of denoising and demosaicing. We adopt an encoder-decoder architecture with symmetric skip connections. The UNet has proven to be a capable network for image enhancement tasks as demonstrated by recent related work [11, 7].

However, DIFAR proposes a new type of skip connection (Figure 3) we call the multi-scale contextual awareness (MSCA) connection, inspired by [31]. The MSCA-connection fuses multiple different contextual image features, combining global and mid-level features to enable cross-talk between image content at different scales. The MSCA-connection uses convolutional layers with dilation rates 2 and 4 to gain a larger receptive field and capture mid-level image context from the input. Global image features are extracted using a series of convolutional layers with stride 2, each followed by a Leaky ReLU activation and a max pooling operation. These layers are then followed by global average pooling and a fully connected layer, which outputs a fixed dimensional feature vector that is tiled across the height and width dimensions of the input. The resulting feature maps are concatenated to the input and fused by a 1×1 convolution to produce a tensor with far fewer channels. This merges the local, mid-level and global contextual information for concatenation to the feature maps of the upsampling path at that particular level. As argued by [17, 7, 31], it is important to consider global and mid-level context when making local pixel adjustments to reduce spatial inconsistencies in the predicted image. MSCA-connections reduce colour inconsistency artifacts (see Figure 4) while also reducing the number of trainable parameters.
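To make the structure of this connection concrete, the following is a minimal PyTorch sketch of an MSCA-style fusion. The channel sizes, the single strided convolution in the global branch and the class and argument names are illustrative assumptions rather than DIFAR's exact layer configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MSCAConnection(nn.Module):
    """Sketch of a multi-scale contextual awareness (MSCA) connection.

    Fuses local features with mid-level context (dilated convolutions)
    and a global descriptor (pooled, then tiled spatially), then merges
    everything with a 1x1 convolution. Channel sizes are illustrative.
    """

    def __init__(self, in_ch: int, out_ch: int, global_dim: int = 64):
        super().__init__()
        # Mid-level context: dilated convolutions with rates 2 and 4.
        self.mid2 = nn.Conv2d(in_ch, in_ch, 3, padding=2, dilation=2)
        self.mid4 = nn.Conv2d(in_ch, in_ch, 3, padding=4, dilation=4)
        # Global context: strided conv -> LeakyReLU -> max pool, then
        # global average pooling and a fully connected layer.
        self.global_conv = nn.Conv2d(in_ch, global_dim, 3, stride=2, padding=1)
        self.global_fc = nn.Linear(global_dim, global_dim)
        # 1x1 fusion of local + mid-level + global features.
        self.fuse = nn.Conv2d(3 * in_ch + global_dim, out_ch, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        mid = torch.cat([F.leaky_relu(self.mid2(x)),
                         F.leaky_relu(self.mid4(x))], dim=1)
        g = F.max_pool2d(F.leaky_relu(self.global_conv(x)), 2)
        g = g.mean(dim=(2, 3))                       # global average pooling
        g = self.global_fc(g)
        g = g[:, :, None, None].expand(b, -1, h, w)  # tile over H x W
        return self.fuse(torch.cat([x, mid, g], dim=1))

# Usage: fused = MSCAConnection(64, 64)(torch.randn(1, 64, 128, 128))
```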

Figure 4: Left: U-Net. Right: U-Net + MSCA connection. Adding an MSCA connection to a vanilla U-Net model can reduce colour bleeding artifacts. Please zoom in for details.

The backbone U-Net differs depending on the type of input data (RAW or RGB). For our RAW to RGB experiments we modify the standard U-Net architecture to facilitate the processing of RAW images represented by a colour filter array (CFA). The CFA tensor is shuffled into a packed, lower-resolution form using a pixel shuffle layer [26]. This packed tensor is fed into the downsampling path of the U-Net backbone network. The output from the expanding path is a set of feature maps at half the width and height of the full RGB image. These feature maps are added to the packed input image (replicated along the channel dimension) through a long skip connection. A pixel shuffle upsampling operation [26] then produces a full-resolution output, which is passed over to the retouching block for further processing. Our RGB to RGB U-Net network is similar to the RAW to RGB network, with the exception that we remove the pixel shuffling operations. The long skip connection remains, but in this case the 3-channel input image is added to the output, rather than a shuffled version.
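The packing and unpacking around the backbone can be sketched with PyTorch's pixel shuffle operations as below; the channel counts are illustrative and the decoder itself is stubbed out with a stand-in tensor.

```python
import torch
import torch.nn.functional as F

def pack_bayer(cfa: torch.Tensor) -> torch.Tensor:
    """Pack a single-channel Bayer CFA (B, 1, H, W) into a 4-channel
    half-resolution tensor (B, 4, H/2, W/2), one channel per CFA site.
    Equivalent to a pixel unshuffle / space-to-depth with factor 2."""
    return F.pixel_unshuffle(cfa, downscale_factor=2)

def unpack_to_full_res(features: torch.Tensor) -> torch.Tensor:
    """Upsample decoder features (B, 4*C, H/2, W/2) back to full
    resolution (B, C, H, W) with a pixel shuffle, as in the long skip
    connection described above (channel counts illustrative)."""
    return F.pixel_shuffle(features, upscale_factor=2)

# Usage sketch:
raw = torch.rand(1, 1, 256, 256)        # mosaiced RAW input
packed = pack_bayer(raw)                 # (1, 4, 128, 128)
decoded = packed.repeat(1, 3, 1, 1)      # stand-in for decoder output (1, 12, 128, 128)
rgbish = unpack_to_full_res(decoded)     # (1, 3, 256, 256)
```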

Figure 5: Global retouching block of DIFAR. The retouching block refines the image from the pixel-level network to adjust luminance, colour and saturation using our neural curve formulation. The image has luminance adjusted first, colour (RGB) second, and saturation third using three piecewise linear curves. The output of this block is the final RGB image, completing the RAW to RGB mapping. The supplementary material investigates different arrangements of this general architecture.

3.2 Retouching Block: Global Image Adjustment

The first three channels of the tensor from the pixel-level block are treated as the image to be globally adjusted, and the remaining channels serve as feature maps that are used as input to each neural curve layer. Our proposed neural curve layer block is shown in Figure 5 and consists of, for each of the three colour spaces, a global feature extraction block followed by a fully connected layer that regresses the knot points of a piecewise-linear curve (Figure 2). The curve adjusts the predicted image by scaling its pixels with the formula presented in Equation 1.

$$\hat{x}^{(i)}_{c,j} = x^{(i)}_{c,j}\,S_c\big(x^{(i)}_{c,j}\big) \qquad (1)$$

where $S_c(\cdot)$ is the piecewise linear scaling curve for channel $c$, defined by the $L$ predicted knot points, $x^{(i)}_{c,j}$ is the $j$-th pixel value in the $c$-th colour channel of the $i$-th image, and $k_l$ is the value of knot point $l$. The neural curve outputs scale factors, so applying the curve is simply a matter of multiplying a pixel's value by the scale factor indicated by the curve (we set $L=16$ in this paper based on a parameter sweep on a validation dataset). Examples of neural curves learnt by one instance of our model are shown in Figure 6. The feature extraction block of the neural curve layer (Figure 5) accepts a feature map and passes it to a group of blocks, each consisting of a convolutional layer followed by max pooling. We place a global average pooling layer and a fully connected layer at the end. The neural curves are learnt in several colour spaces (RGB, Lab, HSV).
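A minimal sketch of how such a knot-parameterised scaling curve can be applied to an image is shown below, assuming $L$ evenly spaced knots over [0, 1] and simple linear interpolation between them; the exact parameterisation of Equation 1 may differ.

```python
import torch

def apply_curve(x: torch.Tensor, knots: torch.Tensor) -> torch.Tensor:
    """Scale pixel values x (in [0, 1]) by a piecewise-linear curve.

    knots holds the L predicted scale factors at evenly spaced input
    levels l / (L - 1); the scale applied to each pixel is obtained by
    linear interpolation between the two surrounding knots.
    """
    L = knots.shape[0]
    pos = x.clamp(0, 1) * (L - 1)                 # position along the curve
    lower = pos.floor().long().clamp(max=L - 2)   # index of the active segment
    frac = pos - lower.to(x.dtype)
    scale = (1 - frac) * knots[lower] + frac * knots[lower + 1]
    return x * scale

# Usage: an all-ones curve is the identity mapping.
img = torch.rand(3, 64, 64)
assert torch.allclose(apply_curve(img, torch.ones(16)), img)
```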

Figure 6: Examples of learnt neural global adjustment curves for different colour spaces (CIELab and HSV).

In Figure 5, we arrange the neural curve layers in a particular sequence, adjusting firstly the luminance and the a, b chrominance channels (using three curves respectively) in CIELab space. Afterwards, we adjust the red, green and blue channels (using three curves respectively) in RGB space. Lastly, hue is scaled based on hue, saturation based on saturation, saturation based on hue, and value based on value (using four curves respectively) in HSV space. This ordering of the colour spaces is found to be optimal based on a sweep on a validation dataset (an ablation study on the ordering of the blocks is presented in the supplementary material). The input to each neural curve layer consists of the concatenation of the image converted to the given colour space and a subset of the features from the pixel-level block. For the luminance neural curve layer, the fully connected layer regresses the parameters of the L, a and b channel scaling curves. The scaling curves scale the pixel values in the L, a, b channels using Equation 1. The adjusted CIELab image is then converted back to RGB. This RGB image is concatenated with the pixel-level block feature maps and fed into the second neural curve layer, which learns three more curves, one for each channel of the RGB image. These curves are applied to the R, G, B channels to adjust the colours. Lastly, the RGB image is converted to HSV space, which separates the hue, saturation and value properties of an image. The HSV image is concatenated with the feature maps, which are used by the final curve layer to predict the four HSV space adjustment curves. These curves are applied to the HSV image and the result is converted back to RGB space via a differentiable HSV to RGB conversion. A long skip connection links the input and output of the retouching block.
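For concreteness, the sketch below traces this Lab → RGB → HSV sequence in code; the conversion and curve-layer callables are hypothetical placeholders (here identities) standing in for the differentiable blocks of Figure 5.

```python
import torch

def retouch(img, feats, curve_layers, convert):
    """Apply the Lab -> RGB -> HSV curve adjustments with a long skip.

    curve_layers: dict of callables (image, feats) -> adjusted image.
    convert: dict of differentiable colour space conversions. Both are
    hypothetical placeholders for the blocks in Figure 5.
    """
    x = convert["rgb_to_lab"](img)
    x = curve_layers["lab"](x, feats)      # L, a, b scaling curves
    x = convert["lab_to_rgb"](x)
    x = curve_layers["rgb"](x, feats)      # R, G, B scaling curves
    x = convert["rgb_to_hsv"](x)
    x = curve_layers["hsv"](x, feats)      # four HSV scaling curves
    x = convert["hsv_to_rgb"](x)
    return x + img                         # long skip connection

# Usage with identity stand-ins for the conversions and curve layers:
identity = lambda x, *args: x
img = torch.rand(1, 3, 64, 64)
feats = torch.rand(1, 13, 64, 64)
out = retouch(img, feats,
              {"lab": identity, "rgb": identity, "hsv": identity},
              {k: identity for k in ("rgb_to_lab", "lab_to_rgb",
                                     "rgb_to_hsv", "hsv_to_rgb")})
```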

Figure 7: Qualitative effect of different combinations of terms in the DIFAR loss function on image quality (panels include the RGB loss without the cosine distance term, all terms, and the groundtruth). Using all terms is most effective, with obvious artefacts and colour distortions removed. See Section 3.3 for more detail.

3.3 Loss Function

The DIFAR loss function consists of three colour space-specific terms which seek to optimise different aspects of the predicted image, including the hue, saturation, luminance and chrominance. The loss is minimised over a set of $N$ image pairs $\{(Y_i, \hat{Y}_i)\}_{i=1}^{N}$, where $Y_i$ is the reference image and $\hat{Y}_i$ is the predicted image. The DIFAR loss is presented in Equation 2:

$$\mathcal{L} = \omega_{hsv}\,\mathcal{L}_{hsv} + \omega_{lab}\,\mathcal{L}_{lab} + \omega_{rgb}\,\mathcal{L}_{rgb} + \omega_{reg}\,\mathcal{L}_{reg} \qquad (2)$$

where $\mathcal{L}_{hsv}$, $\mathcal{L}_{lab}$ and $\mathcal{L}_{rgb}$ are the colour space loss terms, $\mathcal{L}_{reg}$ is a curve regularization loss, and the $\omega$ are weight hyperparameters. These terms are defined in more detail below.

HSV loss

$\mathcal{L}_{hsv}$: given the hue (angle) $h_i$, saturation $s_i$ and value $v_i$ for image $i$, we compute the distance in the conical HSV colour space,

$$\mathcal{L}_{hsv} = \frac{1}{N}\sum_{i=1}^{N}\Big\lVert \big(\hat{s}_i\hat{v}_i\cos\hat{h}_i,\ \hat{s}_i\hat{v}_i\sin\hat{h}_i,\ \hat{v}_i\big) - \big(s_i v_i\cos h_i,\ s_i v_i\sin h_i,\ v_i\big) \Big\rVert \qquad (3)$$

where the hat denotes the predicted image.

This loss is adapted from the colour similarity function of VisualSeek [39]. HSV is advantageous as it separates colour into useful components (hue, saturation, intensity).

CIELab loss

$\mathcal{L}_{lab}$: we follow [38] and compute the distance between the Lab values of the groundtruth and predicted images (Equation 4). In particular, the multi-scale structural similarity (MS-SSIM) [43] between the luminance channels of the groundtruth and predicted images enforces a reproduction of the contrast, luminance and structure of the target image [38].

$$\mathcal{L}_{lab} = \frac{1}{N}\sum_{i=1}^{N}\Big\lVert Lab(\hat{Y}_i) - Lab(Y_i)\Big\rVert + \Big(1 - \mathrm{MS\text{-}SSIM}\big(L(\hat{Y}_i),\,L(Y_i)\big)\Big) \qquad (4)$$

where $Lab(\cdot)$ is a function that returns the CIELab channels corresponding to the RGB channels of the input image and $L(\cdot)$ is a function that returns the L channel of the image in CIELab colour space.

RGB loss

$\mathcal{L}_{rgb}$: this term follows [31] and consists of a distance on the RGB pixels between the predicted and groundtruth images and a cosine distance between RGB pixel vectors, where $\mathbf{y}^{(i)}_{j}$ is the three-element vector representing the RGB components of pixel $j$ in the $i$-th image and $P$ is the number of pixels per image:

$$\mathcal{L}_{rgb} = \frac{1}{N}\sum_{i=1}^{N}\Big\lVert \hat{Y}_i - Y_i\Big\rVert + \frac{1}{NP}\sum_{i=1}^{N}\sum_{j=1}^{P}\left(1 - \frac{\hat{\mathbf{y}}^{(i)}_{j}\cdot\mathbf{y}^{(i)}_{j}}{\big\lVert\hat{\mathbf{y}}^{(i)}_{j}\big\rVert\,\big\lVert\mathbf{y}^{(i)}_{j}\big\rVert}\right) \qquad (5)$$

Curve regularization loss

$\mathcal{L}_{reg}$: to mitigate overfitting we regularize the neural curves with a smoothness prior by penalizing the squared difference between neighbouring piecewise linear segments, each denoted by $\delta_m$.

$$\mathcal{L}_{reg} = \sum_{m}\big(\delta_{m+1} - \delta_{m}\big)^2 \qquad (6)$$

where the sum runs over the segments of every learnt curve.

Loss term weights:

Each loss term has an associated weight hyperparameter. In our experimental evaluation we empirically find that only one of the terms is sensitive to the particular dataset; the remaining terms are largely dataset insensitive, and we keep their weights fixed in all of our experiments across all benchmark datasets. We also find that all terms are necessary to attain the highest quality output images.
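As a rough illustration of how these terms combine, the sketch below implements a simplified version of the multi-colour-space loss. The weighting, the conical HSV distance and the knot-smoothness penalty are written in one plausible form and are not the paper's exact Equations 2-6; the MS-SSIM component of the Lab term is omitted.

```python
import torch
import torch.nn.functional as F

def hsv_to_cone(hsv):
    """Map (h, s, v), with h in radians, to conical HSV coordinates."""
    h, s, v = hsv.unbind(dim=1)
    return torch.stack([s * v * torch.cos(h), s * v * torch.sin(h), v], dim=1)

def difar_style_loss(pred_rgb, gt_rgb, pred_hsv, gt_hsv,
                     pred_lab, gt_lab, knots,
                     w_rgb=1.0, w_cos=1.0, w_hsv=1.0, w_lab=1.0, w_reg=1e-3):
    """Simplified multi-colour-space loss; weights are illustrative."""
    # RGB: L1 distance plus cosine distance between RGB pixel vectors.
    l_rgb = F.l1_loss(pred_rgb, gt_rgb)
    l_cos = (1.0 - F.cosine_similarity(pred_rgb, gt_rgb, dim=1)).mean()
    # HSV: distance in the conical HSV representation.
    l_hsv = F.l1_loss(hsv_to_cone(pred_hsv), hsv_to_cone(gt_hsv))
    # Lab: distance on the Lab channels (MS-SSIM on L omitted here).
    l_lab = F.l1_loss(pred_lab, gt_lab)
    # Curve regularisation: penalise differences between neighbouring knots
    # as one simple smoothness surrogate.
    l_reg = (knots[..., 1:] - knots[..., :-1]).pow(2).sum()
    return (w_rgb * l_rgb + w_cos * l_cos + w_hsv * l_hsv
            + w_lab * l_lab + w_reg * l_reg)

# Usage (toy tensors; in practice these come from the network and the
# differentiable colour space conversions):
p, g = torch.rand(2, 3, 32, 32, requires_grad=True), torch.rand(2, 3, 32, 32)
loss = difar_style_loss(p, g, p, g, p, g, knots=torch.rand(10, 16))
loss.backward()
```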

4 Experimental Results

Datasets: We evaluate DIFAR on three publicly available benchmark datasets. (i) Samsung S7 [38] consists of 110 12M-pixel images, comprising short-to-medium exposure RAW/RGB image pairs and medium-to-medium exposure RAW/RGB pairs. Following [38] we divide the dataset into 90 images for training, 10 for validation and 10 for testing. (ii) MIT-Adobe5k-DPE [6] contains 5,000 images taken on a variety of DSLR cameras and subsequently adjusted by an artist (Artist C). We follow the dataset pre-processing procedure of DeepPhotoEnhancer (DPE) [11]. The RAW input images are processed into RGB by Lightroom, and the groundtruth RGB images are generated by applying the adjustments of Artist C to the input. The dataset is divided into 2,250 training images and 500 test images; we randomly sample 500 validation images from the 2,250 training images. The images are re-sized to have a long edge of 500 pixels. (iii) MIT-Adobe5k-UPE [42] is the MIT-Adobe5k dataset pre-processed as described in [42] (DeepUPE). The groundtruth images are again from Artist C. The dataset is divided into 4,500 training images and 500 test images; we randomly sample 500 of the training images to serve as our validation dataset. The images are not re-sized and range from 6M-pixel to 25M-pixel resolution. Evaluation Metrics: Our image quality metrics are PSNR, SSIM and the perceptual quality aware metric LPIPS [48]. Hyperparameters were tuned on the held-out validation portion of each benchmark dataset.

Architecture | PSNR | SSIM | LPIPS | # Parameters
DIFAR (MSCA, level 1) | 27.04 | 0.794 | 0.320 | 1.3 M
DIFAR | 27.09 | 0.793 | 0.312 | 3.3 M
U-Net (large) [36] | – | – | 0.340 | 5.1 M
DeepISP (small) [38] | – | 0.795 | 0.335 | 630 K
DeepISP (large) [38] | – | – | 0.326 | 3.9 M

Table 1: Medium-to-medium exposure RAW to RGB mapping results on the held-out test images of the Samsung S7 dataset [38].
Architecture | PSNR | SSIM | LPIPS | # Parameters
DIFAR (MSCA, level 1) | – | – | 0.353 | 1.3 M
DIFAR | 26.73 | – | – | 3.3 M
U-Net (large) [36] | – | – | 0.395 | 5.1 M
DeepISP (small) [38] | – | – | 0.372 | 630 K
DeepISP (large) [38] | – | 0.792 | 0.360 | 3.9 M

Table 2: Short-to-medium exposure RAW to RGB mapping results on the held-out test images of the Samsung S7 dataset [38].

4.1 Comparison to state-of-the-art methods

We evaluate DIFAR against competitive baseline models on the RAW to RGB and RGB to RGB image enhancement tasks. Specifically, our main state-of-the-art baselines are: (i) U-Net [11, 7]: a U-Net architecture without the MSCA-connections, broadly following the design of the generator architecture of DPE [11]; (ii) DeepISP [38]: we follow the architecture and experimental procedure of [38], with the thirty-element RGB colour transformation matrix initialised using linear regression; (iii) DPE [11]: we evaluate against the supervised (paired data) version of DPE; (iv) DeepUPE [42]; (v) HDRNet [17]; (vi) White-Box [19].

Quantitative Comparison:

(i) Samsung S7 dataset: We evaluate DIFAR on the RAW to RGB mapping task for both short-to-medium and medium-to-medium exposure image pairs. DIFAR is compared to the U-Net [36] and DeepISP [38] baselines. Results are presented in Tables 1-2. DIFAR produces higher-quality images for both exposure settings compared to U-Net and outperforms DeepISP in terms of the PSNR and LPIPS metrics. This suggests DIFAR is a competitive model for replacing the traditional RAW to RGB ISP pipeline. (ii) MIT-Adobe5k (DPE): Table 3 presents the results on the MIT-Adobe5K (DPE) dataset. DIFAR uses almost 2.5× fewer parameters than DPE, yet maintains the same SSIM score while boosting the PSNR. (iii) MIT-Adobe5k (UPE): Table 4 demonstrates that DIFAR is competitive with DeepUPE on this challenging high-resolution dataset. DIFAR obtains a substantial boost in PSNR and an improvement in the LPIPS metric (lower is better), while remaining competitive in terms of SSIM.

Architecture | PSNR | SSIM | LPIPS | # Parameters
DIFAR (MSCA, level 1) | 23.97 | 0.900 | 0.583 | 1.3 M
DIFAR | – | 0.900 | – | 3.3 M
DPED [23] | – | – | – | –
8RESBLK [50, 29] | – | – | – | –
FCN [10] | – | – | – | –
CRN [9] | – | – | – | –
U-Net [36] | – | – | – | –
DPE [11] | 23.80 | 0.900 | 0.587 | 3.3 M

Table 3: Results for predicting the retouching of photographer C on the 500 testing images from the MIT-Adobe 5K dataset. Dataset pre-processed according to the DPE paper [11].
Architecture | PSNR | SSIM | LPIPS | # Parameters
DIFAR (MSCA, level 1) | 24.20 | – | 0.108 | 1.3 M
DIFAR | – | – | 0.105 | 3.3 M
HDRNet [17] | – | – | – | –
DPE [11] | – | – | – | –
White-Box [19] | – | – | – | –
Distort-and-Recover [32] | 20.97 | 0.841 | – | –
DeepUPE [42] | 23.04 | 0.893 | 0.158 | 1.0 M

Table 4: Average PSNR/SSIM results for predicting the retouching of photographers on the 500 testing images from the MIT-Adobe 5K dataset. Dataset pre-processed according to the DeepUPE paper [42].

Visual Comparison:

We show visual results from DIFAR compared to the outputs of DeepISP and DeepUPE in Figures 8-9. DIFAR produces images with more pleasing colour and luminance than the DeepISP and DeepUPE baselines. Additional visual examples are presented in the supplementary material.

Figure 8: Example images produced by DeepISP (28.19 dB) and DIFAR (29.37 dB) on the Samsung S7 Medium Exposure dataset, alongside the groundtruth.

Figure 9: Example images produced by DeepUPE (16.85 dB) and DIFAR (23.55 dB) on the MIT-Adobe-UPE dataset, alongside the groundtruth.

4.2 Discussion

Ablation Studies

(i) Colour blocks: In Table 5 we evaluate the importance of the three colour blocks in the retouching layer. We compare the DIFAR model with all three colour blocks (RGB, HSV, Lab) in the retouching layer versus variants of the model with just one of the available blocks. Operating in only a single colour space, e.g. RGB, is suboptimal. (ii) Loss function terms: Equation 2 presents the DIFAR loss function. We perform an ablation study (full results in the supplementary material, Table 7) on the various terms in the loss function. The highest image quality is attained with all loss function terms, demonstrating the need to constrain each of the three colour spaces appropriately with a dedicated loss term. Interestingly, we find that coupling the RGB loss with a cosine distance term gives a significant boost in image quality (25.45 → 27.09 dB PSNR, 0.756 → 0.793 SSIM) compared to using just the RGB loss term. In addition, the regularisation term ($\mathcal{L}_{reg}$) is important for the highest image quality due to its role in constraining the flexibility of the neural retouching curves. (iii) Neural curve knot points: We find 16 knot points to be optimal based on a sweep on the Samsung S7 validation dataset. (iv) MSCA skip connection: We find that a single MSCA skip connection at level one of the U-Net (26.56 dB, 0.781 SSIM) suffices to boost the PSNR by 0.78 dB and the SSIM by 0.010 compared to the U-Net baseline (25.78 dB, 0.771 SSIM). For best image quality at a fixed parameter budget, it is better to dedicate parameters to the MSCA skip connection than to other blocks of the U-Net.

Architecture | PSNR | SSIM | LPIPS
DIFAR (with RGB, w/o HSV, w/o CIELab) | 26.74 | 0.790 | 0.340
DIFAR (with HSV, w/o RGB, w/o CIELab) | 25.88 | 0.780 | 0.308
DIFAR (with CIELab, w/o RGB, w/o HSV) | 26.98 | 0.788 | 0.323
DIFAR (with HSV, with RGB, with CIELab) (MSCA, level 1) | 27.04 | 0.794 | 0.320
DIFAR (with HSV, with RGB, with CIELab) | 27.09 | 0.793 | 0.312

Table 5: Ablation study on the retouching layer of DIFAR.

5 Conclusions

This paper introduced DIFAR (Deep Image Formation And Retouching), a novel deep architecture for the ISP and for image enhancement. DIFAR introduces a multi-scale contextual awareness (MSCA) skip connection that markedly improves the image quality obtained from a U-Net backbone network, while also drastically reducing the required number of trainable parameters. DIFAR takes inspiration from artists/photographers and applies a retouching layer to refine images based on learnt global image adjustment curves. The retouching curves are learnt automatically during training to adjust image properties by exploiting image representations in three different colour spaces (CIELab, HSV, RGB). The adjustment applied by these curves is moderated by a novel multi-colour space loss function. In our experimental evaluation DIFAR significantly outperformed the state-of-the-art across a suite of benchmark datasets. Future research will investigate weakly supervised methods that could ease the burden of collecting paired training data; a natural step in this direction would be to instead allow training with unpaired image sets [51].

References

  • [1] S. Anwar and N. Barnes. Real image denoising with feature attention. In The IEEE International Conference on Computer Vision (ICCV), October 2019.
  • [2] J. T. Barron and Y.-T. Tsai. Fast fourier color constancy. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
  • [3] M. Bertalmío. Denoising of Photographic Images and Video: Fundamentals, Open Challenges and New Trends. Springer, 2018.
  • [4] S. Bianco and C. Cusano. Quasi-unsupervised color constancy. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  • [5] A. Buades. A non-local algorithm for image denoising. In CVPR, 2005.
  • [6] V. Bychkovsky, S. Paris, E. Chan, and F. Durand. Learning photographic global tonal adjustment with a database of input / output image pairs. In The Twenty-Fourth IEEE Conference on Computer Vision and Pattern Recognition, 2011.
  • [7] C. Chen, Q. Chen, J. Xu, and V. Koltun. Learning to see in the dark. In CVPR, 2018.
  • [8] C. Chen, Z. Xiong, X. Tian, Z.-J. Zha, and F. Wu. Real-world Image Denoising with Deep Boosting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.
  • [9] Q. Chen and V. Koltun. Photographic image synthesis with cascaded refinement networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 1511–1520, 2017.
  • [10] Q. Chen, J. Xu, and V. Koltun. Fast image processing with fully-convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 2497–2506, 2017.
  • [11] Y.-S. Chen, Y.-C. Wang, M.-H. Kao, and Y.-Y. Chuang. Deep photo enhancer: Unpaired learning for image enhancement from photographs with gans. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
  • [12] Z. Chi, X. Shu, and X. Wu. Joint demosaicking and blind deblurring using deep convolutional neural network. In International Conference on Image Processing, 2019.
  • [13] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Image denoising by sparse 3d transform-domain collaborative filtering. In IEEE Transactions on Image Processing. IEEE, 2007.
  • [14] W. Dong, M. Yuan, X. Li, and G. Shi. Joint demosaicing and denoising with perceptual optimization on a generative adversarial network. arXiv preprint arXiv:1802.04723, 2018.
  • [15] D. L. Donoho. Denoising via soft thresholding. IEEE Transactions on Information Theory, 41(3), 1995.
  • [16] M. Gharbi, G. Chaurasia, S. Paris, and F. Durand. Deep Joint Demosaicking and Denoising. ACM Transactions on Graphics (TOG), 35(8), 2016.
  • [17] M. Gharbi, J. Chen, J. T. Barron, S. W. Hasinoff, and F. Durand. Deep bilateral learning for real-time image enhancement. ACM Transactions on Graphics (TOG), 36(4):118, 2017.
  • [18] S. Gu, Y. Li, L. V. Gool, and R. Timofte. Self-guided network for fast image denoising. In The IEEE International Conference on Computer Vision (ICCV), October 2019.
  • [19] Y. Hu, H. He, C. Xu, B. Wang, and S. Lin. Exposure: A white-box photo post-processing framework. arXiv preprint arXiv:1709.09602, 2018.
  • [20] Y. Hu, B. Wang, and S. Lin. Fc4: Fully convolutional color constancy with confidence-weighted pooling. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
  • [21] A. Ignatov, N. Kobyshev, R. Timofte, K. Vanhoey, and L. V. Gool. Dslr-quality photos on mobile devices with deep convolutional networks. In ICCV, 2017.
  • [22] A. Ignatov, N. Kobyshev, R. Timofte, K. Vanhoey, and L. V. Gool. Wespe: Weakly supervised photo enhancer for digital cameras. arXiv preprint arXiv:1709.01118, 2017.
  • [23] A. Ignatov, N. Kobyshev, R. Timofte, K. Vanhoey, and L. Van Gool. Dslr-quality photos on mobile devices with deep convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 3277–3285, 2017.
  • [24] S. Y. Kim, J. Oh, and M. Kim. Deep sr-itm: Joint learning of super-resolution and inverse tone-mapping for 4k uhd hdr applications. In International Conference on Computer Vision (ICCV), 2019.
  • [25] F. Kokkinos and S. Lefkimmiatis. Iterative Joint Image Demosaicking and Denoising using a Residual Denoising Network. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(8), 2019.
  • [26] C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. P. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi. Photo-realistic single image super-resolution using a generative adversarial network. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pages 105–114, 2017.
  • [27] S. Lefkimmiatis. Universal denoising networks: A novel cnn architecture for image denoising. In CVPR, 2018.
  • [28] Z. Liang, J. Cai, Z. Cao, and L. Zhang. Cameranet: A two-stage framework for effective camera isp learning. In ArXiv, 2019.
  • [29] M.-Y. Liu, T. Breuel, and J. Kautz. Unsupervised image-to-image translation networks. In Advances in neural information processing systems, pages 700–708, 2017.
  • [30] X. Mao, C. Shen, and Y.-B. Yang. Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections. In Advances in neural information processing systems, pages 2802–2810, 2016.
  • [31] D. Marnerides, T. Bashford-Rogers, J. Hatchett, and K. Debattista. Expandnet: A deep convolutional neural network for high dynamic range expansion from low dynamic range content. CoRR, abs/1803.02266, 2018.
  • [32] J. Park, J.-Y. Lee, D. Yoo, and I. So Kweon. Distort-and-recover: Color enhancement using deep reinforcement learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5928–5936, 2018.
  • [33] P. Perona and J. Malik. Scale-space and edge detection using anisotropic diffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(7), 1990.
  • [34] G. Qian, J. Gu, J. S. Ren, C. Dong, F. Zhao, and J. Lin. Trinity of pixel enhancement: a joint solution for demosaicking, denoising and super-resolution. In ArXiv, 2019.
  • [35] Y. Qian, J.-K. Kamarainen, J. Nikkanen, and J. Matas. On finding gray pixels. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  • [36] O. Ronneberger, P.Fischer, and T. Brox. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention (MICCAI), volume 9351 of LNCS, pages 234–241. Springer, 2015. (available on arXiv:1505.04597 [cs.CV]).
  • [37] R. Zhou, R. Achanta, and S. Susstrunk. Deep residual network for joint demosaicing and super-resolution. In Color and Imaging Conference, 26th Color and Imaging Conference Final Program and Proceedings, 2018.
  • [38] E. Schwartz, R. Giryes, and A. M. Bronstein. DeepISP: Towards Learning an End-to-End Image Processing Pipeline. IEEE Transactions on Image Processing, 28(2):912–923, 2019.
  • [39] J. R. Smith and S.-F. Chang. VisualSEEk: a fully automated content-based image query system. In ACM Multimedia, pages 87–98, 1996.
  • [40] Y. Tai, J. Yang, X. Liu, and C. Xu. Memnet: A persistent memory network for image restoration. In Proceedings of the IEEE International Conference on Computer Vision, pages 4539–4547, 2017.
  • [41] C. Tomasi and R. Manduchi. Bilateral filtering for gray and color images. In ICCV, 1998.
  • [42] R. Wang, Q. Zhang, C.-W. Fu, X. Shen, W.-S. Zheng, and J. Jia. Underexposed photo enhancement using deep illumination estimation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
  • [43] Z. Wang, E. Simoncelli, and A. Bovik. Multiscale Structural Similarity for Image Quality Assessment. In The Thrity-Seventh Asilomar Conference on Signals, Systems and Computers, 2003.
  • [44] Q. Yan, D. Gong, Q. Shi, A. van den Hengel, C. Shen, I. Reid, and Y. Zhang. Attention-guided network for ghost-free high dynamic range imaging. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  • [45] Y. Yu, M. Abadi, P. Barham, E. Brevdo, M. Burrows, A. Davis, J. Dean, S. Ghemawat, T. Harley, P. Hawkins, M. Isard, M. Kudlur, R. Monga, D. Murray, and X. Zheng. Dynamic control flow in large-scale machine learning. In Proceedings of the Thirteenth EuroSys Conference, EuroSys ’18, pages 18:1–18:15, New York, NY, USA, 2018. ACM.
  • [46] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang. Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising. IEEE Transactions on Image Processing, 26(7):3142–3155, 2017.
  • [47] K. Zhang, W. Zuo, and L. Zhang. FFDNet: Toward a Fast and Flexible Solution for CNN-Based Image Denoising. IEEE Transactions on Image Processing, 27(9), 2018.
  • [48] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang. The unreasonable effectiveness of deep features as a perceptual metric. In CVPR, 2018.
  • [49] X. Zhang, Q. Chen, R. Ng, and V. Koltun. Zoom to learn, learn to zoom. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  • [50] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision, pages 2223–2232, 2017.
  • [51] J. Y. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. In Computer Vision (ICCV), 2017 IEEE International Conference on, 2017.

6 Supplementary material

This document provides additional material on DIFAR. In particular, we cover:

  • DIFAR Pixel-Level Block Architecture (Section 6.1)

  • MSCA-skip connection ablation study (Section 6.2)

  • Loss function ablation study (Section 6.3)

  • Number of Knot Points for the Neural Curves (Section 6.4)

  • Variations of the retouching block architecture (Section 6.5)

  • Differentiability of the piecewise linear scaling curve formulation (Section 6.6)

  • Differentiability of the colour space conversion operators (Section 6.7)

  • Limitations of DIFAR (Section 6.10)

  • Additional qualitative visual examples (Section 6.11)

Figure 10: Effect of the number of knot points in a neural curve on the image quality (PSNR/SSIM). Maximum quality reached at 16 knot points. Results measured on the Samsung held-out validation dataset.

6.1 DIFAR Pixel-Level Block Architecture

The RAW to RGB and RGB to RGB pixel-level blocks of DIFAR are modelled as a U-Net, broadly following the specifications in [11]. For the RAW to RGB mapping we make modifications designed for ingesting RAW data formatted as a colour filter array (CFA). The RAW to RGB backbone for DIFAR is shown in Figure 12. Additionally, in Figure 11 we show precisely how the U-Net architecture is amended with MSCA-skip connections.

Figure 11: Variants of the U-Net backbone with an MSCA skip connection at levels 1, 2, 3 and 4. Note that the normal U-Net skip connections are removed entirely, with only one being replaced with an MSCA skip connection. This saves on parameters and leads to better performance.

6.2 MSCA-skip connection ablation study

In Table 6 we present the full ablation study on the DIFAR MSCA-skip connection. For a parameter-efficient model that produces images of high quality, we find it is only necessary to place an MSCA-skip connection at level 1 of the U-Net (Figure 11).

Architecture | PSNR | SSIM | # Parameters
U-Net + MSCA, layer 4 | – | – | 2.7 M
U-Net + MSCA, layer 3 | – | – | –
U-Net + MSCA, layer 2 | – | – | –
U-Net + MSCA, layer 1 | 26.56 | 0.781 | –
U-Net + MSCA, all layers | – | 0.793 | –
U-Net | 25.78 | 0.771 | 1.4 M
U-Net (large) | – | – | 5.1 M
U-Net (Lab loss, MS-SSIM on L) | – | – | 1.3 M
U-Net (large) (Lab loss, MS-SSIM on L) | – | – | 5.1 M

Table 6: Ablation study on the MSCA-skip connection of DIFAR. A single MSCA-skip connection at the top-most level of the U-Net backbone significantly boosts the image quality, even outperforming a U-Net with twice the number of parameters. All models were trained with a loss in RGB colour space, unless otherwise stated.
Figure 12: Pixel-level block of DIFAR. The block is modelled as a U-Net. The RAW image data is input on the left and the RGB image is output on the right. The skip connections denote our novel MSCA-skip connections.

6.3 Loss function ablation study

In Table 7 we present the full ablation study on the DIFAR loss function. We find that all loss terms are conducive to best image quality.

Architecture | PSNR | SSIM
DIFAR (partial loss-term combinations) | – | –
DIFAR (all loss terms) | 27.09 | 0.793

Table 7: Ablation study on the Samsung dataset for the various loss terms of DIFAR. $\mathcal{L}_{lab}$ is the loss term containing the distance on the Lab channels and MS-SSIM on the L channel. $\mathcal{L}_{rgb}$ is the distance on the RGB channels plus the cosine distance on the RGB vectors; a variant without the cosine distance term is also tested. $\mathcal{L}_{hsv}$ is the loss in HSV space. $\mathcal{L}_{reg}$ is the curve regularisation term. For best image quality it is important to have a loss to regularise the transformation in each colour space.

6.4 Number of Knot Points for the Neural Curves

In Figure 10 we present an ablation study on how the number of knot points in the neural curve formulation affects the resulting image quality. This study is conducted on the held-out validation dataset of the medium exposure Samsung S7 dataset. We find that 16 knot points is optimal in terms of PSNR and SSIM, with quality falling for both more and fewer knot points per curve.

6.5 Variations of the Retouching Block

In this section we test minor variations of the retouching block architecture. Specifically we examine the following:

  • Multiple feature extraction blocks: In our suggested retouching block architecture there are three feature extraction blocks with no parameter sharing. In Table 8 we explore a multi-task alternative with one shared feature extraction block (Figure 13) to determine whether having separate feature extraction blocks for each colour space is important. Our results suggest that having three independent feature extraction blocks in the retouching layer is important for best image quality.

  • Ordering of colour spaces: In the original retouching block the CIELab colour space adjustments are applied first, followed by the RGB colour space adjustments and then the HSV colour space adjustments. In Table 9 we explore all possible permutations of the Lab, HSV and RGB colour spaces to determine if the ordering is important. Our results suggest that CIELab followed by RGB and then HSV leads to the best image quality.

  • Visual examples: We show in Figure 14 and Figure 16 examples of the output from DIFAR with all colour blocks (CIELab, RGB, HSV) versus variants of the model with just one of the available colour blocks.

Figure 13: Multi-task (shared parameter) version of the retouching block. One feature extractor is used by all three parameter prediction full-connected layers.
Architecture | PSNR (test) | SSIM (test) | PSNR (valid) | SSIM (valid)
DIFAR (multi-task) | – | – | – | –
DIFAR | 27.09 | 0.793 | 26.46 | 0.771

Table 8: Results on the medium-to-medium exposure Samsung dataset for the multi-tasking version of the retouching block (Figure 13) compared to the retouching block with three feature extraction blocks. There is a significant loss in image quality without the three feature extraction blocks.
Architecture | PSNR (test) | SSIM (test) | PSNR (valid) | SSIM (valid)
hsv→rgb→lab | – | – | – | –
rgb→hsv→lab | – | – | – | –
lab→rgb→hsv | 27.09 | 0.793 | 26.56 | 0.771
lab→hsv→rgb | – | – | – | –
rgb→lab→hsv | – | – | – | –
hsv→lab→rgb | – | – | – | –

Table 9: Results on the medium-to-medium exposure Samsung dataset for all permutations of the colour space blocks in the retouching layer of DIFAR. The ordering of the colour spaces is important, with the highest image quality attained with a Lab, then RGB, then HSV adjustment, in that order.

6.6 Piecewise Linear Scaling Curve

The piecewise linear scaling curve is given in Equation 7. This curve defines the mapping from a pixel property (e.g. intensity) to a scale factor that is applied to that pixel value.

$$\hat{x}^{(i)}_{c,j} = x^{(i)}_{c,j}\,S_c\big(x^{(i)}_{c,j}\big) \qquad (7)$$

where $S_c(\cdot)$ is the piecewise linear scaling curve defined by the $L$ predicted knot points, $x^{(i)}_{c,j}$ is the $j$-th pixel value of the $c$-th channel of the $i$-th image, and $k_l$ is the value of knot point $l$. This mapping function is differentiable with respect to $x^{(i)}_{c,j}$ and the curve knot points $k_l$. The derivative with respect to $x^{(i)}_{c,j}$ is defined almost everywhere except at the knot points, and is given in Equation 8.

$$\frac{\partial \hat{x}^{(i)}_{c,j}}{\partial x^{(i)}_{c,j}} = S_c\big(x^{(i)}_{c,j}\big) + x^{(i)}_{c,j}\,\frac{\partial S_c\big(x^{(i)}_{c,j}\big)}{\partial x^{(i)}_{c,j}} \qquad (8)$$

where $\partial S_c/\partial x$ is the (constant) slope of the linear segment containing $x^{(i)}_{c,j}$. The derivatives with respect to the knot points are given in Equation 9.

$$\frac{\partial \hat{x}^{(i)}_{c,j}}{\partial k_l} = x^{(i)}_{c,j}\,\frac{\partial S_c\big(x^{(i)}_{c,j}\big)}{\partial k_l} \qquad (9)$$

which is non-zero only for the knot points bounding the segment containing $x^{(i)}_{c,j}$, on which $S_c$ depends linearly, and zero for all other knot points.

As the derivatives exist the piecewise linear scaling curve in Equation 7 is amenable for optimisation in a neural network.
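As a quick numerical check of this differentiability claim, the sketch below runs torch.autograd.gradcheck on one possible linear-interpolation parameterisation of the curve (the same illustrative form sketched in Section 3.2, not necessarily the paper's exact Equation 7), sampling inputs away from the knot locations.

```python
import torch
from torch.autograd import gradcheck

def apply_curve(x, knots):
    """Illustrative piecewise-linear scaling curve with L evenly spaced knots."""
    L = knots.shape[0]
    pos = x.clamp(0, 1) * (L - 1)
    lower = pos.floor().long().clamp(max=L - 2)   # index of the active segment
    frac = pos - lower.to(x.dtype)
    scale = (1 - frac) * knots[lower] + frac * knots[lower + 1]
    return x * scale

# gradcheck compares analytical and numerical gradients (double precision);
# it succeeds for inputs that do not sit exactly on a knot point.
torch.manual_seed(0)
x = torch.rand(8, dtype=torch.double, requires_grad=True)
knots = torch.rand(16, dtype=torch.double, requires_grad=True)
print(gradcheck(apply_curve, (x, knots)))  # expected: True
```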

Figure 14: Qualitative effect of different formulations of the retouching block. We keep only one colour space (either CIELab, RGB or HSV) in the retouching block and compare the output versus keeping all three colour space blocks (HSV only: 21.99 dB, RGB only: 22.93 dB, LAB only: 24.76 dB, all colour blocks: 25.86 dB, plus groundtruth). See Section 6.5 for more detail.

6.7 Colour Space Transformations

DIFAR relies on differentiable RGB→HSV, HSV→RGB, RGB→CIELab and CIELab→RGB colour space conversions to permit end-to-end learning via stochastic gradient descent and backpropagation. The colour space conversion formulae are all listed on the OpenCV website (https://docs.opencv.org/3.3.0/de/d25/imgproc_color_conversions.html). The colour space conversion functions consist of differentiable operations mixed with piecewise continuous functions that are also differentiable except at certain points. Non-differentiability at certain points in a domain still permits the function to be optimised by the backpropagation algorithm, e.g. consider the ReLU non-linearity at zero.

6.8 RGB→CIELab and CIELab→RGB

The CIELab→RGB and RGB→CIELab operators can be seen as a composition of elementary differentiable operations, except for the hard conditioning (thresholding) operations, which are piecewise continuous functions that are non-differentiable at certain points. The RGB→CIELab operator has been used in previous work to define a loss function in Lab space that computes the distance on the L, a, b channels and the MS-SSIM metric on the L channel [38].

6.9 RGB→HSV and HSV→RGB

RGB→HSV consists of minimum and maximum operators, which are both differentiable almost everywhere. For the maximum $V = \max(R, G, B)$, the derivative with respect to a channel $C \in \{R, G, B\}$ is defined as in Equation 10.

$$\frac{\partial V}{\partial C} = \begin{cases} 1 & \text{if } C = \arg\max(R, G, B) \\ 0 & \text{otherwise} \end{cases} \qquad (10)$$

For the minimum $V_{\min} = \min(R, G, B)$, the derivative is defined as in Equation 11.

$$\frac{\partial V_{\min}}{\partial C} = \begin{cases} 1 & \text{if } C = \arg\min(R, G, B) \\ 0 & \text{otherwise} \end{cases} \qquad (11)$$

In both cases the gradient is set to zero for every element except the maximum or minimum. The RGB→HSV operator also involves conditional (if-then-else) statements, conditioned on the values of R, G and B. These are naturally handled by the dynamic control flow available in modern deep learning frameworks such as TensorFlow [45].
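The sketch below illustrates one way such a branching RGB→HSV conversion can be written so that it remains differentiable almost everywhere; it follows the standard hexcone formulae rather than the exact OpenCV variant, and uses PyTorch's torch.where purely for illustration.

```python
import torch

def rgb_to_hsv(rgb: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Differentiable (almost everywhere) RGB -> HSV for (B, 3, H, W)
    tensors in [0, 1]. Branches are expressed with torch.where; hue is
    returned in [0, 1]."""
    r, g, b = rgb.unbind(dim=1)
    v, _ = rgb.max(dim=1)
    cmin, _ = rgb.min(dim=1)
    delta = v - cmin
    s = delta / (v + eps)
    # Piecewise hue definition, selected with torch.where.
    h_r = ((g - b) / (delta + eps)) % 6.0
    h_g = (b - r) / (delta + eps) + 2.0
    h_b = (r - g) / (delta + eps) + 4.0
    h = torch.where(v == r, h_r, torch.where(v == g, h_g, h_b)) / 6.0
    h = torch.where(delta > 0, h, torch.zeros_like(h))
    return torch.stack([h, s, v], dim=1)

# Usage: hsv = rgb_to_hsv(torch.rand(1, 3, 16, 16))
```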

An HSV→RGB conversion has been little explored in the deep learning literature. The HSV→RGB conversion functions can be specified as piecewise linear curves of the hue, one per RGB channel, as illustrated in Figure 15, with $h^{(i)}_j$, $s^{(i)}_j$ and $v^{(i)}_j$ denoting the hue, saturation and value of pixel $j$ of image $i$. These curves are defined by linear segments and knot points, and they have well-defined gradients between the knot points. It is also possible to formulate a smooth approximation to the HSV→RGB transformation using suitably scaled and offset cosine waves.

Figure 15: Piecewise linear functions used to convert HSV values to RGB colour space. This figure provides a graphical representation of the RGB coordinates given values for HSV; the marked vertical axis values follow from the conversion equations. Figure source: en:user:Goffrie, en:Image:HSV_RGB_Comparison.svg, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=1116423.
These cosine-based equations, being smooth approximations, lead to small errors (i.e. small differences from the true RGB values) in the conversion from HSV to RGB. In future work it would be interesting to explore the effect of this smooth approximation versus the full piecewise linear formulation on the convergence of DIFAR and the quality of the resulting images.
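As a toy illustration of such a cosine-based approximation, the sketch below uses raised cosines with assumed phase offsets of 0, -1/3 and -2/3 of a turn for the R, G and B channels; this is one plausible instantiation for illustration only, not the paper's exact formulation, and it reproduces the exact conversion only approximately.

```python
import math
import torch

def hsv_to_rgb_cosine(hsv: torch.Tensor) -> torch.Tensor:
    """Smooth (approximate) HSV -> RGB using scaled and offset cosine waves.
    Expects hue in [0, 1]; channel layout (B, 3, H, W)."""
    h, s, v = hsv.unbind(dim=1)
    channels = []
    for offset in (0.0, -1.0 / 3.0, -2.0 / 3.0):   # R, G, B phase offsets (assumed)
        w = 0.5 * (1.0 + torch.cos(2.0 * math.pi * (h + offset)))  # in [0, 1]
        channels.append(v * (1.0 - s) + v * s * w)
    return torch.stack(channels, dim=1)

# Usage: rgb = hsv_to_rgb_cosine(torch.rand(2, 3, 8, 8))
```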

6.10 Limitations of DIFAR

The retouching block of DIFAR uses multiple conversions between the CIELab, RGB and HSV colour spaces. Each of these conversions may lead to a loss of image fidelity. We mitigate this issue with a long skip connection between the input and output of the retouching layer, which we found to be important for model training and image quality.

6.11 Additional Visual Examples

We provide additional examples of output images from DIFAR and the baseline models on the Samsung and Adobe datasets in Figures 16-24. The following visual examples are provided:

  • Figure 16: Comparing a retouching network with just RGB, versus a retouching network with all three colour spaces (CIELab, RGB, HSV).

  • Figure 17: Example images produced by DeepISP [38] and DIFAR on the Samsung S7 Medium Exposure dataset.

  • Figure 18: Example images produced by DeepISP [38] and DIFAR on the Samsung S7 Short Exposure dataset.

  • Figure 19: Example images produced by U-Net (large) [36] and DIFAR on the Samsung S7 Medium Exposure dataset.

  • Figure 20: Example images produced by U-Net (large) [36] and DIFAR on the Samsung S7 Short Exposure dataset.

  • Figures 21-22: Example images produced by DeepUPE [42] and DIFAR on the Adobe-UPE dataset.

  • Figures 23-24: Example images produced by DPE [11] and DIFAR on the Adobe-DPE dataset.

Figure 16: Comparing a retouching network with just RGB versus a retouching network with all three colour spaces (CIELab, RGB, HSV). Example images from the Samsung S7 Medium Exposure dataset, with groundtruth. Per-example PSNR, RGB only vs all colour blocks: 22.93 vs 25.86 dB; 27.57 vs 29.63 dB; 26.07 vs 27.01 dB; 28.26 vs 28.13 dB.

Figure 17: Example images produced by DeepISP [38] and DIFAR on the Samsung S7 Medium Exposure dataset, with groundtruth. Per-example PSNR, DeepISP vs DIFAR: 25.44 vs 25.86 dB; 26.36 vs 27.01 dB; 26.92 vs 29.63 dB; 29.41 vs 29.81 dB.

Figure 18: Example images produced by DeepISP [38] and DIFAR on the Samsung S7 Short Exposure dataset, with groundtruth. Per-example PSNR, DeepISP vs DIFAR: 24.54 vs 26.47 dB; 27.59 vs 28.79 dB; 26.12 vs 27.18 dB; 27.36 vs 30.26 dB.

Figure 19: Example images produced by U-Net (large) and DIFAR on the Samsung S7 Medium Exposure dataset, with groundtruth. Per-example PSNR, U-Net vs DIFAR: 25.89 vs 27.01 dB; 23.56 vs 25.86 dB; 28.26 vs 29.81 dB; 26.33 vs 28.14 dB.

Figure 20: Example images produced by U-Net (large) and DIFAR on the Samsung S7 Short Exposure dataset, with groundtruth. Per-example PSNR, U-Net vs DIFAR: 26.42 vs 30.26 dB; 25.17 vs 27.14 dB; 27.31 vs 27.96 dB; 25.66 vs 26.47 dB.

Figure 21: Example images produced by DeepUPE [42] and DIFAR on the Adobe-UPE dataset, with groundtruth. Per-example PSNR, DeepUPE vs DIFAR: 16.90 vs 33.03 dB; 19.01 vs 32.08 dB; 19.41 vs 29.90 dB; 21.24 vs 34.14 dB.

Figure 22: Example images produced by DeepUPE [42] and DIFAR on the Adobe-UPE dataset, with groundtruth. Per-example PSNR, DeepUPE vs DIFAR: 13.95 vs 33.34 dB; 16.52 vs 25.83 dB; 14.65 vs 20.95 dB; 15.52 vs 27.54 dB.

Figure 23: Example images produced by DPE [11], CLHE [3], DPED (iphone7) [23], NPEA [1], FLLF [2] and DIFAR on the Adobe-DPE dataset, alongside the input and ground truth.

Figure 24: Example images produced by DPE [11], CLHE [3], DPED (iphone7) [23], NPEA [1], FLLF [2] and DIFAR on the Adobe-DPE dataset, alongside the input and ground truth.

References

  • [1] Y. Gao, H.-M. Hu, B. Li, and Q. Guo. Naturalness Preserved Non-Uniform Illumination Estimation for Image Enhancement Based on Retinex. IEEE Transactions on Multimedia, 2017.
  • [2] M. Aubry, S. Paris, S. Hasinoff, J. Kautz, and F. Durand. Fast Local Laplacian Filters: Theory and Applications. ACM Trans. Graph., 2014.
  • [3] S. Wang, W. Cho, J. Jang, M. Abidi, and J. Paik. Contrast-dependent saturation adjustment for outdoor image enhancement. Journal of the Optical Society of America A, 2017.