FHDR: HDR Image Reconstruction from a Single LDR Image using Feedback Network
High dynamic range (HDR) image generation from a single-exposure low dynamic range (LDR) image has been made possible by recent advances in deep learning. Various feed-forward convolutional neural networks (CNNs) have been proposed for learning LDR-to-HDR representations. To better utilize the power of CNNs, we exploit the idea of feedback, where the initial low-level features are guided by the high-level features using the hidden state of a recurrent neural network (RNN). Unlike the single forward pass of a conventional feed-forward network, the reconstruction from LDR to HDR in a feedback network is learned over multiple iterations. This enables us to create a coarse-to-fine representation, leading to an improved reconstruction at every iteration. Advantages over standard feed-forward networks include early reconstruction ability and better reconstruction quality with fewer network parameters. We design a dense feedback block and propose an end-to-end feedback network, FHDR, for HDR image generation from a single-exposure LDR image. Qualitative and quantitative evaluations show the superiority of our approach over state-of-the-art methods.
Common digital cameras cannot capture the wide range of light intensity levels in a natural scene. This can lead to a loss of pixel information in under-exposed and over-exposed regions of an image, resulting in a low dynamic range (LDR) image. To recover the lost information and represent the wide range of illuminance in an image, high dynamic range (HDR) images need to be generated. Deep learning for HDR imaging is an active area of research. Advances in deep learning for image processing tasks have paved the way for various approaches to HDR image reconstruction using feed-forward convolutional neural network (CNN) architectures [1], [2], [3], [4], [5]. These methods specifically transform a single-exposure LDR image into an HDR image. HDRCNN [1] proposed a deep autoencoder for HDR image reconstruction which uses a weighted mask to recover only the over-exposed regions of an LDR image. The authors of DRTMO [3] designed a framework with two networks for generating up-exposure and down-exposure LDR images, which are merged to form an HDR image. Their framework is not end-to-end trainable and uses a large number of parameters. Unlike these methods, the proposed FHDR model provides an end-to-end trainable solution and is able to comprehensively learn the LDR-HDR mapping while outperforming the existing methods.
Deeper networks can learn more complex non-linear relationships, such as the LDR-to-HDR mapping. The caveat with deeper networks is that they consume a lot of computational resources and tend to over-fit the training data. To overcome this problem, we exploit the power of feedback mechanisms, inspired by [6], for the task of HDR image reconstruction. A feedback block is an RNN whose output is fed back to guide its input via a hidden state. A feedback network can run for many iterations on a single training example. Because these iterations reuse the same shared network parameters, a feedback network is virtually deeper than the corresponding feed-forward network with the same physical depth. Here, virtual depth = physical depth × number of iterations.
To improve the reconstruction at every iteration, the loss is calculated on the output of each iteration. By doing so, the network is forced to create a coarse-to-fine representation: it reconstructs HDR content right from the first iteration and improves with every subsequent iteration. We propose a global feedback block which consists of smaller local feedback blocks of densely connected layers, inspired by the DenseNet architecture [7]. Dense connections allow the network to reuse features. This helps in learning robust image representations, even with fewer network parameters.
The performance of our framework is evaluated on the standard City Scene dataset [8] and another dataset prepared from the list of HDR image datasets suggested in [2]. Qualitative and quantitative assessments of the network suggest that even with fewer network parameters, the proposed FHDR model outperforms the state-of-the-art methods.
II HDR reconstruction framework
II-A Feedback system
Feedback systems are adopted to influence the input based on the generated output, unlike conventional feed-forward networks, where the information flow is unidirectional and is not directly influenced by the generated output. The authors in [6] exploited the feedback mechanism using an RNN, where the output of the feedback block travels back to guide its input via a hidden state. Their architecture uses ConvLSTM cells as the basic RNN units and is designed for the task of image classification. Even with fewer network parameters than comparable feed-forward networks, such networks are able to learn better representations. Recently, the authors of [9] designed a feedback network specifically for the task of image super-resolution, which achieved state-of-the-art performance.
Inspired by the success of feedback networks, we designed a feedback network for learning the LDR-to-HDR mapping, which is explained in detail in the following sections.
II-B Model architecture
Our architecture consists of three blocks similar to [9], as shown in Fig. 1. The first block is the feature extraction block (FEB), followed by the feedback block (FBB) and an HDR reconstruction block (HRB). Inspired by [10], we use a global residual skip connection that bypasses low-level LDR features at every iteration to guide the HDR reconstruction block in the final layers. For every training example, the network runs for $n$ iterations. Here, each iteration from $t = 1$ to $t = n$ is a forward pass in time in an unfolded RNN. The FEB is responsible for extracting the low-level feature information $F_{in}$ from the input LDR image $I_{LDR}$.
$$F_{in} = f_{FEB}(I_{LDR})$$
Here, $f_{FEB}$ represents the operations of the FEB. To achieve the feedback mechanism, $F_{in}$ is fed to the FBB, combined with the output of the FBB from the previous iteration, using a global hidden state $h_{g}^{t-1}$ as below.
$$F_{FBB}^{t} = f_{FBB}(F_{in}, h_{g}^{t-1})$$
Here, $f_{FBB}$ represents the operations of the FBB and $F_{FBB}^{t}$ represents the output of the feedback block at iteration $t$. At $t = 1$, when there is no feedback, the hidden state is initialised with the values of the extracted features $F_{in}$.
At every iteration, the low-level LDR features from the first convolutional layer in the FEB are added to the output of the FBB using a global residual skip connection as below.
$$F_{res}^{t} = F_{FBB}^{t} + F_{FEB1}$$
Here, $F_{res}^{t}$ represents the global residual feature map learned at iteration $t$. $F_{FEB1}$ stands for the low-level LDR features from the first convolutional layer in the FEB. $F_{res}^{t}$ is passed to the HRB to generate an HDR image at every iteration as below.
$$H_{gen}^{t} = f_{HRB}(F_{res}^{t})$$
Here, $f_{HRB}$ represents the operations of the HRB and $H_{gen}^{t}$ represents the HDR image generated at iteration $t$. For every LDR image, a forward pass can run for $n$ iterations, therefore generating $n$ HDR images. Each generated image is coupled with a loss, resulting in improved reconstruction at each iteration through back-propagation in time.
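The iteration scheme described in this subsection can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `feedback_forward`, its parameters, and the toy scalar lambdas standing in for the FEB, FBB and HRB are all hypothetical and introduced here for illustration.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def feedback_forward(
    ldr: T,
    feb_first: Callable[[T], T],   # first FEB layer (hypothetical split of the FEB)
    feb_rest: Callable[[T], T],    # remaining FEB layers
    fbb: Callable[[T, T], T],      # feedback block: (input features, hidden state)
    add: Callable[[T, T], T],      # element-wise addition of feature maps
    hrb: Callable[[T], T],         # HDR reconstruction block
    n_iters: int,
) -> List[T]:
    """Unroll the feedback network and return one output per iteration."""
    shallow = feb_first(ldr)       # low-level features from the first FEB layer
    f_in = feb_rest(shallow)       # features fed to the feedback block
    hidden = f_in                  # no feedback yet: hidden state starts as f_in
    outputs = []
    for _ in range(n_iters):
        fbb_out = fbb(f_in, hidden)        # fuse input features with hidden state
        hidden = fbb_out                   # carried over to the next iteration
        residual = add(fbb_out, shallow)   # global residual skip connection
        outputs.append(hrb(residual))      # one HDR estimate per iteration
    return outputs


# Toy usage: scalars stand in for feature maps.
outs = feedback_forward(
    ldr=1.0,
    feb_first=lambda x: 2.0 * x,
    feb_rest=lambda x: x + 1.0,
    fbb=lambda f, h: f + 0.5 * h,
    add=lambda a, b: a + b,
    hrb=lambda r: r,
    n_iters=4,
)
print(outs)  # [6.5, 7.25, 7.625, 7.8125]
```

Returning every intermediate output, rather than only the last one, is what allows a loss to be attached to each iteration during training.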
II-C Feedback block
We have designed a novel feedback block for the task of learning LDR-to-HDR representations, as shown in Fig. 2. The basic unit of the feedback block is a dilated dense block (DDB), shown in Fig. 3. It is a modification of the dense block proposed in [7]. Dilated convolutions help in increasing the receptive field of the network [11]. A DDB helps in utilising all the hierarchical features from the input. In addition to two convolutional layers for feature compression, each DDB houses four dilated convolutional layers, each of which uses the information from all the previous layers through dense skip connections. This reuse of features through the dense forward connections allows for fewer network parameters and improves the learning ability. Three such DDBs come together to form the feedback block of the network.
We implement global and local feedback mechanisms described as follows.
The global feedback block (FBB) is considered as an RNN with a global hidden state. High-level features are transferred from the output of the feedback block at the $(t-1)$-th iteration to its input at the $t$-th iteration. The hidden state is concatenated with the extracted FEB features $F_{in}$, and a compression convolution layer is applied for high-level and low-level feature fusion, as shown in Fig. 2. The fused features are passed to the dilated dense blocks, followed by a convolution layer for further processing.
We argue that a feedback connection is always beneficial as it helps to guide the low-level features which are in some way blind to the higher level features. Hence, we have implemented local feedback connections over each DDB which aim to improve the features generated locally. These connections run parallel to the global feedback connections and increase the overall effectiveness of the network. Each DDB can be considered as an RNN similar to the global feedback block, transferring features from its output to its input via a local hidden state.
II-D Loss function
Loss calculated directly on HDR images is misrepresented due to the dominance of the high intensity values of images with a wide dynamic range. Therefore, we tonemap the generated and the ground truth HDR images to compress the wide intensity range before calculating the loss. We use the $\mu$-law for tonemapping, as suggested by [12]. The $\mu$-law is represented as below.
$$T(H) = \frac{\log(1 + \mu H)}{\log(1 + \mu)}$$
Here, $T$ represents the tonemapping operation and $\mu$ defines the amount of compression, which is set to 5000 for the experiments. In addition to the L1 loss suggested by previous feedback networks, we use a perceptual loss [13] for improving the visual quality of the generated image. We calculate the L1 loss and the perceptual loss at every iteration and take an average over all the iterations. The average L1 loss is given below.
$$L_{1} = \frac{1}{n} \sum_{t=1}^{n} \left| T(H_{gen}^{t}) - T(H_{gt}) \right|$$
Here, $H_{gen}^{t}$ is the HDR image generated at iteration $t$, $n$ is the number of iterations, and $H_{gt}$ represents the ground truth image. The average perceptual loss can be represented as below.
$$L_{perc} = \frac{1}{n} \sum_{t=1}^{n} L_{p}\left(T(H_{gen}^{t}), T(H_{gt})\right)$$
Here, $L_{p}$ represents the perceptual loss calculated between the tonemapped ground truth and generated images. The final loss function is a weighted combination of the average L1 loss and the average perceptual loss.
The weighting coefficient is set to 0.1 for all the experiments. We have observed that applying only the L1 distance produces dark artifacts. However, a combination of the L1 loss and the perceptual loss has resulted in improved visual quality of the generated image.
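The loss computation described in this subsection can be illustrated with a small pure-Python sketch. It is a hedged illustration rather than the authors' code: images are represented as flat lists of floats assumed to lie in [0, 1], the perceptual term is passed in as a hypothetical callable `perceptual` (a real one would compare deep network features), and applying the 0.1 weight to the perceptual term is an assumption made here, since the text does not make the placement of the weight explicit.

```python
import math
from typing import Callable, List, Sequence

Image = Sequence[float]  # flattened pixel values, assumed scaled to [0, 1]


def mu_law(h: float, mu: float = 5000.0) -> float:
    """Mu-law range compression of a single non-negative intensity value."""
    return math.log(1.0 + mu * h) / math.log(1.0 + mu)


def tonemap(img: Image, mu: float = 5000.0) -> List[float]:
    return [mu_law(p, mu) for p in img]


def mean_abs_error(a: Image, b: Image) -> float:
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def iteration_averaged_loss(
    outputs: Sequence[Image],                      # one generated image per iteration
    target: Image,                                 # ground truth HDR image
    perceptual: Callable[[Image, Image], float],   # hypothetical perceptual distance
    weight: float = 0.1,
    mu: float = 5000.0,
) -> float:
    """Average the L1 and perceptual terms over all iterations, then combine them."""
    t_target = tonemap(target, mu)
    l1_sum = 0.0
    perc_sum = 0.0
    for out in outputs:
        t_out = tonemap(out, mu)
        l1_sum += mean_abs_error(t_out, t_target)
        perc_sum += perceptual(t_out, t_target)
    n = len(outputs)
    # ASSUMPTION: the weight scales the perceptual term; the source does not say which.
    return l1_sum / n + weight * perc_sum / n


# Toy usage with a perceptual term that always returns zero.
loss = iteration_averaged_loss([[0.0], [1.0]], [1.0], lambda a, b: 0.0)
print(loss)  # 0.5
```

Tonemapping both images before taking the difference keeps bright pixels from dominating the average, which is the motivation given above.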
II-E Implementation details
All the convolutional layers (conv2d) in the network have kernels, followed by a ReLU activation, unless mentioned otherwise. There are 2 conv2d layers in the FEB, 20 ($3 \times 6 + 2$) conv2d layers in the FBB, and 2 conv2d layers in the HRB. The size of the feature maps remains the same throughout the network. The depth of the feature maps also remains the same, i.e. 64, except in the DDBs. A growth rate of 32 has been implemented for the DDBs, meaning that the conv2d layers of each dense block output a 32-channel feature map that gets concatenated with the features from all the preceding layers. This accumulated feature map depth is brought down using a conv2d layer at the end of the dense block.
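The channel bookkeeping implied by the growth rate, and the layer count of the feedback block, can be checked with a short Python sketch. The function names are hypothetical, and the sketch assumes that the densely connected layers of a DDB start from a 64-channel input; the text states the growth rate and the overall depth of 64 but not this detail explicitly.

```python
from typing import List, Tuple


def ddb_channel_plan(
    base: int = 64, growth: int = 32, n_dilated: int = 4
) -> Tuple[List[int], int]:
    """Return the input depth of each dilated layer and the depth after the last concat."""
    # Layer i sees the block input plus the outputs of the i layers before it.
    inputs = [base + i * growth for i in range(n_dilated)]
    concatenated = base + n_dilated * growth
    return inputs, concatenated


def fbb_conv_count(n_ddb: int = 3, convs_per_ddb: int = 6, outer_convs: int = 2) -> int:
    """Count conv2d layers in the feedback block: DDB layers plus the two outer layers."""
    return n_ddb * convs_per_ddb + outer_convs


inputs, concatenated = ddb_channel_plan()
print(inputs, concatenated)  # [64, 96, 128, 160] 192
print(fbb_conv_count())      # 20
```

Under these assumptions the final compression layer of each DDB reduces 192 accumulated channels back to 64, and the count of 20 comes from three DDBs with six layers each (two compression, four dilated) plus the fusion layer at the input and the convolution at the output of the feedback block.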
We trained our network on a GeForce RTX 2070 GPU with a batch size of 16 for the City Scene dataset and 6 for the curated HDR dataset. The Adam optimizer [14] was adopted with momentum parameters $\beta_1$ = 0.5 and $\beta_2$ = 0.999. All the variants of the proposed model were trained for 200 epochs with an initial learning rate of for the first 100 epochs, decayed linearly over the next 100 epochs.
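A constant-then-linear-decay schedule of this kind can be sketched as follows. The function name and the `base_lr` parameter are hypothetical; the actual initial learning rate is not available in the text above, so it is left as an argument. Decaying all the way to zero at the final epoch, and counting epochs from zero, are also assumptions.

```python
def linear_decay_lr(
    epoch: int,
    base_lr: float,
    constant_epochs: int = 100,
    decay_epochs: int = 100,
) -> float:
    """Learning rate for a zero-indexed epoch: constant first, then linear decay."""
    if epoch < constant_epochs:
        return base_lr
    # Epochs still to run, counted from the current one; clamped so it never goes negative.
    remaining = max(constant_epochs + decay_epochs - epoch, 0)
    return base_lr * remaining / decay_epochs


# Toy usage with a placeholder base learning rate of 1.0.
schedule = [linear_decay_lr(e, base_lr=1.0) for e in (0, 99, 100, 150, 200)]
print(schedule)  # [1.0, 1.0, 1.0, 0.5, 0.0]
```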
We have trained and evaluated the performance of our network on the standard City Scene dataset [8] and another HDR dataset curated from various sources shared in [2]. The City Scene dataset is a low-resolution HDR dataset that contains pairs of LDR and ground truth HDR images of size . We trained the network on 39460 image pairs provided in the training set and evaluated it on 1672 randomly selected images from the testing set. The curated HDR dataset consists of 1010 high-resolution HDR images. Training and testing pairs were created by producing LDR counterparts of the HDR images through exposure alteration and the application of different camera curves from [18]. Data augmentation was done using random cropping and resizing. The final training set consists of 11262 images of size 256. The testing set consists of 500 images of size 512, used to evaluate the performance of the network on high-resolution images.
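The idea of deriving an LDR counterpart from an HDR image can be illustrated with a minimal sketch. This is not the pipeline used for the curated dataset: the function name `hdr_to_ldr` is hypothetical, pixel values are assumed to be linear and normalised so that 1.0 is the sensor saturation point, and a simple gamma curve is used as a stand-in for the measured camera response curves mentioned above.

```python
from typing import List, Sequence


def hdr_to_ldr(
    hdr: Sequence[float], exposure: float = 1.0, gamma: float = 2.2
) -> List[float]:
    """Simulate an LDR capture: scale by exposure, clip, then apply a response curve."""
    ldr = []
    for p in hdr:
        exposed = min(max(p * exposure, 0.0), 1.0)  # exposure change and saturation
        ldr.append(exposed ** (1.0 / gamma))        # ASSUMPTION: gamma as camera curve
    return ldr


# Toy usage: the brightest pixel saturates and its original value is lost.
print(hdr_to_ldr([0.0, 0.25, 4.0], exposure=2.0, gamma=2.0))
```

Varying `exposure` and the response curve per image is one simple way to obtain several differently exposed LDR inputs from the same HDR ground truth.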
III-B Evaluation metrics
For the quantitative evaluation of the proposed method, we use the HDR-VDP-2 Q-score metric, specifically designed for evaluating the reconstruction of HDR images based on human perception [19]. We also use the commonly adopted PSNR and SSIM image-quality metrics. The PSNR score is calculated in dB between the $\mu$-law tonemapped ground truth and generated images.
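A PSNR computation on tonemapped images can be sketched as below. It is an illustrative sketch under stated assumptions: the function names are hypothetical, images are flat lists of floats assumed to lie in [0, 1] so that the tonemapped peak value is 1.0, and the compression parameter of 5000 is taken from the loss function section.

```python
import math
from typing import Sequence


def mu_law(h: float, mu: float = 5000.0) -> float:
    """Mu-law range compression of a single non-negative intensity value."""
    return math.log(1.0 + mu * h) / math.log(1.0 + mu)


def psnr_tonemapped(
    generated: Sequence[float],
    ground_truth: Sequence[float],
    mu: float = 5000.0,
    peak: float = 1.0,  # ASSUMPTION: tonemapped values lie in [0, 1]
) -> float:
    """PSNR in dB between the mu-law tonemapped versions of two images."""
    squared = [
        (mu_law(g, mu) - mu_law(t, mu)) ** 2 for g, t in zip(generated, ground_truth)
    ]
    mse = sum(squared) / len(squared)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy usage: a completely wrong single pixel gives 0 dB, a perfect match gives infinity.
print(psnr_tonemapped([0.0], [1.0]))  # 0.0
print(psnr_tonemapped([0.5], [0.5]))  # inf
```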
III-C Feedback mechanism analysis
To study the influence of feedback mechanisms, we compare the results of four variants of the network based on the number of feedback iterations performed: (i) FHDR/W (feed-forward, without feedback), (ii) FHDR with 2 iterations, (iii) FHDR with 3 iterations, and (iv) FHDR with 4 iterations. To visualise the overall performance, we plot the PSNR score against the number of epochs for the four variants, as shown in Fig. 5. The significant impact of the feedback connections can be easily observed. The PSNR score increases as the number of iterations increases. Also, early reconstruction can be seen in the networks with feedback connections. Based on this, we decided to use an iteration count of 4 for the proposed FHDR network.
Qualitative and quantitative evaluations are performed against two non-learning based inverse tonemapping methods, AKY [15] and KOV [16], two deep learning methods, HDRCNN [1] and DRTMO [3], and the feed-forward counterpart of the proposed FHDR method. We use the HDR toolbox [20] for evaluating the non-learning based methods. The deep learning methods were trained and evaluated on the datasets described in the earlier sections.
DRHT [4] is the state-of-the-art deep neural network for the image correction task. It reconstructs HDR content as an intermediate output and transforms it back into an LDR image. Because an implementation of the DRHT network is unavailable, results of their network on the curated dataset are not presented. The performance metrics of their LDR-to-HDR network, trained on the City Scene dataset, have been sourced from their paper, since the experimental setup uses the same training and testing datasets.
IV-A Quantitative evaluation
A quantitative comparison of the above-mentioned HDR reconstruction methods is presented in Table 1. The proposed network outperforms the state-of-the-art methods in evaluations performed on both datasets. Results of DRTMO [3] could not be calculated on the City Scene dataset because the network does not accept low-resolution images. Even though the feed-forward counterpart of FHDR outperforms the other methods, the proposed FHDR network (4 iterations) performs far better.
IV-B Qualitative evaluation
As can be seen in Fig. 4, non-learning based methods are unable to recover highly over-exposed regions. DRTMO brightens the whole image and is able to recover only the under-exposed regions. HDRCNN, on the other hand, is designed to recover only the saturated regions and thus underperforms in recovering information in the under-exposed dark regions of the image. Unlike the others, our FHDR/W pays equal attention to both under-exposed and over-exposed regions of the image. The proposed FHDR network with feedback connections enhances the overall sharpness of the image and performs an even better reconstruction of the input.
Table 1: Quantitative comparison of the HDR reconstruction methods on the City Scene dataset and the curated HDR dataset.
We propose a novel feedback network, FHDR, to reconstruct an HDR image from a single-exposure LDR image. The dense connections in the forward pass enable feature reuse, thus learning robust representations with fewer parameters. Local and global feedback connections enhance the learning ability by using the high-level features to guide the initial low-level features. Iterative learning forces the network to create a coarse-to-fine representation, which results in early reconstructions. Extensive experiments demonstrate that the FHDR network is able to recover the under-exposed and over-exposed regions, outperforming state-of-the-art methods.
This research was supported by the SERB Core Research Grant.
[1] G. Eilertsen, J. Kronander, G. Denes, R. K. Mantiuk, and J. Unger, “HDR image reconstruction from a single exposure using deep CNNs,” ACM Transactions on Graphics (TOG), vol. 36, no. 6, p. 178, 2017.
[2] D. Marnerides, T. Bashford-Rogers, J. Hatchett, and K. Debattista, “ExpandNet: A deep convolutional neural network for high dynamic range expansion from low dynamic range content,” in Computer Graphics Forum, vol. 37, pp. 37–49, Wiley Online Library, 2018.
[3] Y. Endo, Y. Kanamori, and J. Mitani, “Deep reverse tone mapping,” ACM Trans. Graph., vol. 36, no. 6, pp. 177–1, 2017.
[4] X. Yang, K. Xu, Y. Song, Q. Zhang, X. Wei, and R. W. Lau, “Image correction via deep reciprocating HDR transformation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1798–1807, 2018.
[5] S. Lee, G. H. An, and S.-J. Kang, “Deep chain HDRI: Reconstructing a high dynamic range image from a single low dynamic range image,” IEEE Access, vol. 6, pp. 49913–49924, 2018.
[6] A. R. Zamir, T.-L. Wu, L. Sun, W. B. Shen, B. E. Shi, J. Malik, and S. Savarese, “Feedback networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1308–1317, 2017.
[7] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708, 2017.
[8] J. Zhang and J.-F. Lalonde, “Learning high dynamic range from outdoor panoramas,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 4519–4528, 2017.
[9] Z. Li, J. Yang, Z. Liu, X. Yang, G. Jeon, and W. Wu, “Feedback network for image super-resolution,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[10] Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu, “Residual dense network for image super-resolution,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2472–2481, 2018.
[11] F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” arXiv preprint arXiv:1511.07122, 2015.
[12] N. K. Kalantari and R. Ramamoorthi, “Deep high dynamic range imaging of dynamic scenes,” ACM Trans. Graph., vol. 36, no. 4, pp. 144–1, 2017.
[13] J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in European Conference on Computer Vision, pp. 694–711, Springer, 2016.
[14] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[15] A. O. Akyüz, R. Fleming, B. E. Riecke, E. Reinhard, and H. H. Bülthoff, “Do HDR displays support LDR content?: A psychophysical evaluation,” in ACM Transactions on Graphics (TOG), vol. 26, p. 38, ACM, 2007.
[16] R. P. Kovaleski and M. M. Oliveira, “High-quality reverse tone mapping for a wide range of exposures,” in 2014 27th SIBGRAPI Conference on Graphics, Patterns and Images, pp. 49–56, IEEE, 2014.
[17] E. Reinhard, M. Stark, P. Shirley, and J. Ferwerda, “Photographic tone reproduction for digital images,” ACM Trans. Graph., vol. 21, pp. 267–276, July 2002.
[18] M. D. Grossberg and S. K. Nayar, “What is the space of camera response functions?,” in 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings., vol. 2, pp. II–602, June 2003.
[19] M. Narwaria, R. Mantiuk, M. P. Da Silva, and P. Le Callet, “HDR-VDP-2.2: A calibrated method for objective quality prediction of high-dynamic range and standard images,” Journal of Electronic Imaging, vol. 24, no. 1, p. 010501, 2015.
[20] F. Banterle, A. Artusi, K. Debattista, and A. Chalmers, Advanced High Dynamic Range Imaging. AK Peters/CRC Press, 2017.