Efficient Image Retargeting for High Dynamic Range Scenes
Most real world scenes have a very high dynamic range (HDR). The mobile phone cameras and digital cameras available on the market are limited in both dynamic range and spatial resolution. The same holds for limited dynamic range display devices, which also differ in spatial resolution and aspect ratio.
In this paper, we address the problem of displaying a high contrast low dynamic range (LDR) image of an HDR scene on a display device whose spatial resolution differs from that of the capturing digital camera. The solution proposed in this work can be employed with any camera that can shoot multiple differently exposed images of a scene. Further, the proposed solutions provide flexibility in depicting the entire contrast of the HDR scene as an LDR image with a user specified spatial resolution. This task is achieved through an optimized content aware retargeting framework, which preserves salient features, together with an algorithm to combine multi-exposure images. We show that the proposed approach performs better than an alternate approach in generating high contrast LDR images of varying spatial resolution.
Real world scenes have a high dynamic range (HDR). An example of such an HDR scene is one which has both brightly and poorly lit regions, which implies that the range of brightness levels is very high. The human visual system (HVS) can perceive all the brightness levels of the scene through visual adaptation. Even analog cameras can capture a major percentage of the brightness levels. Digital capturing devices such as mobile phone cameras and digital cameras cannot capture the entire dynamic range of a given scene. Digital cameras are also limited in their spatial resolution, as is evident from the various digital imaging sensor architectures. In other words, digital cameras have limited range and spatial resolutions, primarily because of the limitations posed by the imaging sensor design. It is highly complex to capture all the brightness levels of an HDR scene in a finite duration.
Limited dynamic range is caused mainly by the limited well capacity of the sensor elements. The dynamic range of the image can be enhanced by HDR imaging techniques, which rely on the capture of multi-exposure low dynamic range (LDR) images of the scene. These approaches recover the camera response function (CRF) of the imaging system and employ it to create the HDR image of the scene. The generated HDR images are then tone mapped into a high contrast LDR image compatible with a given digital display device. Alternatively, the high contrast LDR image of the scene can be generated directly without knowledge of the CRF, bypassing the HDR imaging pipeline.
The spatial resolution of the image can be either reduced or enhanced by employing super-resolution algorithms. These techniques change the resolution through efficient interpolation without preserving the salient contents of the scene. The recently developed image retargeting approaches enable one to change the spatial resolution of the image while preserving its important contents. Retargeting has been the standard approach when one wants to modify the spatial resolution of a given image.
Consider a set of multi-exposure images of a scene captured using a traditional technique such as Auto Exposure Bracketing (AEB). The problem we would like to address is whether we can achieve flexibility in both the spatial and range resolutions given a set of multi-exposure images corresponding to a static scene. The obvious solution to this problem is to first generate an HDR image using a standard approach and then perform spatial resizing, either by super-resolution or by image retargeting. The question to be answered is whether this approach is the optimal one, or whether a better solution can be derived. This work is primarily focused on exploring better alternate solutions to this challenging problem.
The main objective of this work is to search for an optimal solution to achieve a flexible range (contrast) and spatial resolution, given a set of multi-exposure LDR images of a static scene. We develop an algorithm to achieve such a solution. We show that the proposed approach performs better than the obvious solution and leads to the generation of a high contrast LDR image whose size can be adapted to a given display device. The key contributions of this approach are the algorithms which achieve the following tasks.
Flexible content aware spatial retargeting of an image corresponding to a static HDR scene,
Depiction of high contrast information within the user specified spatial resolution,
Achieving high quality desired LDR images without any visible artifacts, and
Operation without any knowledge of the exposure times, scene information, or the CRF.
The paper is organized as follows. In Section 2 we review the prior work that is key to our later discussion. We present the primary motivation behind the present work in Section 3. We discuss the proposed algorithm for simultaneous contrast and content-aware spatial retargeting in detail in Section 4. Section 5 presents the results corresponding to various aspects of the proposed solution. We conclude the paper in Section 6 by summarizing the key contributions and presenting some pointers on future enhancement of the proposed approach.
In recent times, the creation of images which depict all the brightness levels in a natural scene has been a topic of great interest. Various research groups have been working on this topic and have proposed various solutions to this challenging problem. A bracketed exposure sequence, which spans the entire dynamic range of the real world scene, comprises a set of LDR images shot with a digital camera. The CRF should be recovered in order to linearize the intensities. The HDR image can then be generated by compositing these multi-exposure images in the linearized intensity domain. HDR images can be displayed on specialized HDR displays. However, to visualize the generated HDR image on common LDR displays we need to perform a tone reproduction operation. Many different tone mapping operators have been proposed in recent years, with varying performance levels for different scenes.
On the other hand, exposure fusion approaches relieve us of the need for intermediate HDR image generation and the tone mapping operation. Exposure fusion involves compositing the different Laplacian pyramid levels of the multi-exposure images with appropriate weights in order to reduce saturation and enhance contrast. A similar approach can be used for merging flash/no-flash images to get the best information out of both the images and create a better image. Multi-exposure images of dynamic scenes lead to artifacts, which require appropriate deghosting prior to compositing. Recently, researchers have turned their attention to reconstructing an HDR image of a non-static scene both with and without the knowledge of the CRF. The generation of an HDR image from a set of multi-exposure images when both the camera and the scene change has been addressed in recent works.
Image resizing is a different problem in which one attempts to change the spatial resolution of the given image, popularly known as image super-resolution. Super-resolution can be achieved by using multiple images of the same scene with sub-pixel shifts. Content aware resizing should be done in a way that minimizes the amount of important information lost during the resizing operation. Approaches such as face detectors and visual saliency map detectors can be used to achieve this task. After creating a visual saliency map, the image can be cropped to capture its most salient regions. These methods are based on the conventional technique of either cropping or removal of columns and rows.
These methods are often constrained by the ratio to which a given image can be resized. Resizing the image beyond a critical factor generates a high degree of artifacts. Recently, methods have been proposed by which this critical ratio can be improved. Changing the spatial resolution of the image is also important and widely used in texture synthesis, where the goal is to generate a large textured image from a small textured image. However, the solutions in texture synthesis cannot be extended directly to natural scenes, as these follow complex statistics. A natural image may have multiple different regions of importance, and sometimes user interaction is exploited to specify the regions which are of greater importance.
Image retargeting is a much better automatic approach which has been widely used for content aware resizing. The first popular implementation of image retargeting, seam carving, involves the identification of minimum energy seams which have to be removed or added so that there is minimum loss of information. An efficient energy metric based on a gradient measure serves as the energy function. Optimal seam carving can alternatively use different types of energy functions such as gradient magnitude, entropy, visual saliency, eye-gaze movement, and more. The removal or insertion of seams can be done in such a way as to match the resolution and aspect ratio of the display device. Seam carving can be extended to perform video retargeting. An overview of the different types of image retargeting approaches can be found in a recent tutorial.
There is always a trade-off between the spatial resolution and the range resolution of an imaging sensor. A typical example is assorted pixels, which use multiple sensor elements with different sensitivities to create an HDR image. Here, we sacrifice some spatial resolution to gain more dynamic range. The size of the sensor element cannot be made smaller than a particular size due to noise and limited well capacity. These studies on the imaging sensor emphasize the need for creating a new application with flexible spatial and range resolutions.
The recent work on spatial as well as dynamic range improvement from a set of multi-exposure images requires one to capture multi-exposure images with subpixel shifts. This approach is a combination of the traditional HDR imaging and super-resolution approaches posed in a unified optimization framework. Therefore, this method does not enable one to perform content aware resizing, though it helps in improving the dynamic range and the spatial resolution. Existing methods for the simultaneous improvement of spatial resolution and dynamic range do not take into consideration the content present in the image.
The primary motivation behind this work is to generate a high contrast LDR image corresponding to a given HDR scene with flexible content aware image resizing capability. This application is quite useful in the present scenario, as we have digital display devices which have different spatial resolutions and aspect ratios but can only display LDR content. Examples of such display devices include the Apple iPad, smartphones and tablets by Nokia and Samsung, netbooks, etc. The trivial solution to this problem, as discussed earlier, is to fuse the multi-exposure images and then to retarget the resultant image spatially in order to make it compatible with a given display device. This work is an attempt to probe for alternate efficient solutions to this problem and to show how such solutions can indeed be better than the trivial solution in terms of image contrast and fewer artifacts.
The main objective behind this work is to find an efficient way to merge multiple differently exposed images of a static scene into a high contrast LDR image with flexible spatial resolution. This is achieved by an efficient algorithm which reduces both the loss in contrast and the artifacts in the final LDR image. We shall present the basic algorithm behind the proposed approach in the next section.
In this section we propose multiple approaches for efficient retargeting of an HDR scene. Our algorithm uses a set of LDR images having different exposure times. The input images are registered LDR images of the same static scene. Let $\{I_1, I_2, \ldots, I_N\}$ be the set of input LDR images. We use the magnitude of the gradient as the energy metric. One can also use other energy metrics such as entropy or visual saliency.
From this energy metric we compute a cumulative energy matrix, which enables us to find the minimum energy seams (the seams of least importance) in the individual LDR images. We start the discussion with the trivial approach for retargeting the LDR image corresponding to an HDR scene. In the present work, we assume that the exposure times and the CRF are unknown.
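The computation described above can be sketched as follows. This is a minimal illustration and not the exact implementation used in this work: it assumes grayscale images stored as 2-D NumPy arrays and vertical seams only, and the function names `gradient_energy`, `cumulative_energy`, and `min_vertical_seam` are hypothetical.

```python
import numpy as np

def gradient_energy(img):
    """Gradient-magnitude energy of a 2-D grayscale image."""
    gy, gx = np.gradient(np.asarray(img, dtype=float))
    return np.hypot(gx, gy)

def cumulative_energy(energy):
    """Cumulative energy matrix for vertical seams (dynamic programming, top to bottom)."""
    M = np.asarray(energy, dtype=float).copy()
    for i in range(1, M.shape[0]):
        prev = M[i - 1]
        left = np.concatenate(([np.inf], prev[:-1]))   # upper-left neighbour
        right = np.concatenate((prev[1:], [np.inf]))   # upper-right neighbour
        M[i] += np.minimum(np.minimum(left, prev), right)
    return M

def min_vertical_seam(energy):
    """Return (seam, cost), where seam[i] is the seam's column index in row i."""
    M = cumulative_energy(energy)
    rows, cols = M.shape
    seam = np.empty(rows, dtype=int)
    seam[-1] = int(np.argmin(M[-1]))
    for i in range(rows - 2, -1, -1):                  # backtrack from the bottom row
        c = seam[i + 1]
        lo, hi = max(c - 1, 0), min(c + 2, cols)
        seam[i] = lo + int(np.argmin(M[i, lo:hi]))
    return seam, float(M[-1, seam[-1]])
```

The last row of the cumulative matrix holds the cost of the cheapest seam ending at each column, so the minimum energy seam is recovered by backtracking from the smallest entry of that row.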
One approach for resizing the image corresponding to an HDR scene is to take multiple LDR images of the scene with different exposure times and subject them to exposure fusion. This approach results in an image having much higher contrast than the individual input images. Applying optimal seam carving to this high contrast image then yields the resized high contrast LDR image of the scene.
This method is constrained by the ratio up to which a given image can be resized. Increasing or decreasing the aspect ratio of an image beyond a critical factor can produce severe artifacts (see figure ?(b) and figure ?(a)). Suppose we have multiple images of the same scene with different exposure times; we then obtain a minimum energy seam for each image in the given LDR image set. We shall show how removing or adding seams with minimum energy before applying exposure fusion, and only then fusing the images, yields a better quality high contrast LDR image.
In this approach, for a given energy metric we first find the cumulative energy matrix for each individual LDR image. Then, with the help of this cumulative energy matrix, we find the seam with minimum energy in each of the images. Notice that the seams found by the algorithm need not be the same in each image (see the images in the last row of figure ?).
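As a rough illustration of the fusion step in this direct approach, the sketch below blends the exposures pixel by pixel with a well-exposedness weight. It is a deliberately simplified stand-in and not the Laplacian pyramid based exposure fusion referred to above; the function name `fuse_exposures`, the Gaussian weight centred at mid-gray, and the parameter `sigma` are assumptions made for illustration. Intensities are assumed to lie in [0, 1].

```python
import numpy as np

def fuse_exposures(images, sigma=0.2):
    """Naive per-pixel fusion: pixels close to mid-gray (0.5) get larger weights."""
    stack = np.stack([np.asarray(im, dtype=float) for im in images])   # shape (N, H, W)
    weights = np.exp(-((stack - 0.5) ** 2) / (2.0 * sigma ** 2)) + 1e-12
    weights /= weights.sum(axis=0, keepdims=True)                      # normalise over exposures
    return (weights * stack).sum(axis=0)
```

In the direct approach, the fused image returned here would then be passed to a seam carving routine for resizing.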
For the given set of $N$ LDR images, let $s_k$ and $e_k$ denote the minimum energy seam and its energy value in image $I_k$, respectively, for $k = 1, \ldots, N$. Now the problem reduces to a decision problem: which seam has to be chosen for insertion or deletion? One option is to take the seam with the least energy among these minimum energy seams,
$$k^* = \arg\min_{k} e_k.$$
In this case, seam $s_{k^*}$, the seam with minimum $e_k$, will be deleted from each of the input images.
One can also choose the seam whose energy value is the median of the energies of all the minimum energy seams in the different images. In either case, the accuracy of the results depends solely upon the natural scene statistics. In our experiments, we found that the median serves better than the minimum. This is due to the fact that the median represents the average exposure value of the given set of LDR images.
It may be noted that for $j \neq k^*$, seam $s_{k^*}$ might not be the seam with minimum energy in the image $I_j$. Thus, by deleting or adding the seam $s_{k^*}$ in image $I_j$ we might not add or delete the seam with minimum energy. But because we need to maintain the corresponding coordinates, the same seam needs to be added or deleted in all the LDR images.
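The selection rule discussed above can be sketched as follows, assuming vertical seams and 2-D NumPy energy maps, one per exposure. The helper names are hypothetical, and for an even number of images the upper median is taken, which is a choice made only for this illustration.

```python
import numpy as np

def min_vertical_seam(energy):
    """Minimum energy vertical seam of one energy map: returns (seam, cost)."""
    M = np.asarray(energy, dtype=float).copy()
    rows, cols = M.shape
    for i in range(1, rows):
        prev = M[i - 1]
        left = np.concatenate(([np.inf], prev[:-1]))
        right = np.concatenate((prev[1:], [np.inf]))
        M[i] += np.minimum(np.minimum(left, prev), right)
    seam = np.empty(rows, dtype=int)
    seam[-1] = int(np.argmin(M[-1]))
    for i in range(rows - 2, -1, -1):
        c = seam[i + 1]
        lo, hi = max(c - 1, 0), min(c + 2, cols)
        seam[i] = lo + int(np.argmin(M[i, lo:hi]))
    return seam, float(M[-1, seam[-1]])

def select_seam(energies, rule="median"):
    """Pick one seam among the per-image minimum energy seams ('min' or 'median')."""
    candidates = [min_vertical_seam(e) for e in energies]
    costs = np.array([cost for _, cost in candidates])
    if rule == "min":
        k = int(np.argmin(costs))
    else:
        k = int(np.argsort(costs)[len(costs) // 2])    # upper median when N is even
    return k, candidates[k][0]

def remove_seam(img, seam):
    """Delete one pixel per row; the same seam is applied to every exposure."""
    rows, cols = img.shape
    keep = np.ones((rows, cols), dtype=bool)
    keep[np.arange(rows), seam] = False
    return img[keep].reshape(rows, cols - 1)
```

Applying `remove_seam` with the selected seam to every input image keeps the exposures registered, so they can still be fused after resizing.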
We can further improve this strategy by making sure that each time the seam with minimum total energy is added to or removed from the final image. As noticed earlier, while removing the seam $s_k$ (which is the minimum energy seam for image $I_k$) from another image $I_j$, we might delete a seam with higher energy. To overcome this, we remove or add the seam with minimum total energy.
Let $s_k^{j}$ be the replica in image $I_j$ of the seam $s_k$ with minimum energy in image $I_k$, and let $e(s_k^{j})$ denote its energy in $I_j$. If seam $s_k$ is deleted from each of the $N$ input images, the total energy added or removed is:
$$E_k = \sum_{j=1}^{N} e(s_k^{j}).$$
In this case the total amount of energy removed or added will be $\min_k E_k$ and the desired seam will be $s_{k^*}$ with $k^* = \arg\min_k E_k$. With this approach we get better results. Figure ?(c) shows the results after applying this approach.
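A minimal sketch of this minimum total energy rule is given below, under the same assumptions as before (vertical seams, one 2-D NumPy energy map per exposure). The names `seam_cost` and `select_min_total_seam` are hypothetical.

```python
import numpy as np

def min_vertical_seam(energy):
    """Minimum energy vertical seam of one energy map: returns (seam, cost)."""
    M = np.asarray(energy, dtype=float).copy()
    rows, cols = M.shape
    for i in range(1, rows):
        prev = M[i - 1]
        left = np.concatenate(([np.inf], prev[:-1]))
        right = np.concatenate((prev[1:], [np.inf]))
        M[i] += np.minimum(np.minimum(left, prev), right)
    seam = np.empty(rows, dtype=int)
    seam[-1] = int(np.argmin(M[-1]))
    for i in range(rows - 2, -1, -1):
        c = seam[i + 1]
        lo, hi = max(c - 1, 0), min(c + 2, cols)
        seam[i] = lo + int(np.argmin(M[i, lo:hi]))
    return seam, float(M[-1, seam[-1]])

def seam_cost(energy, seam):
    """Energy of a given seam when it is replicated in another image."""
    return float(energy[np.arange(energy.shape[0]), seam].sum())

def select_min_total_seam(energies):
    """Among the per-image minimum seams, pick the one with the least total energy."""
    candidates = [min_vertical_seam(e)[0] for e in energies]
    totals = [sum(seam_cost(e, s) for e in energies) for s in candidates]
    k = int(np.argmin(totals))
    return k, candidates[k], totals[k]
```

Only the per-image minimum seams are evaluated here, so the search covers as many candidates as there are input images.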
However, with the statistical approach discussed above, the seam having the least total energy need not be one among the candidate low energy seams. The approach therefore does not guarantee the removal or the addition of the desired least energy seam. This is because, while calculating the minimum total energy, we only consider the candidate low energy seams of each image. Other possible seams which could have led to a much better solution to the problem we address are discarded. Therefore this approach is not the optimal one.
4.3 Aggregate Energy Metric Approach
Instead of finding energy matrices for each of the input LDR images separately, we can think of an aggregate energy matrix. We then generate an aggregate cumulative energy matrix from this aggregate energy matrix. This aggregate cumulative energy matrix should be generated in such a way that any seam which it indicates as a minimum energy seam is of least importance. This criterion is necessary because it guarantees that we will not lose important information during retargeting.
For example, if we take the magnitude of the gradient as our energy metric for the individual LDR images, then our aggregate energy metric will be a function of the gradients of the individual images,
$$E_{agg} = f(|\nabla I_1|, |\nabla I_2|, \ldots, |\nabla I_N|),$$
where $I_k$ is the $k$-th of the $N$ input LDR images.
In this work we have defined this function as a linear combination of the gradient magnitudes of the LDR images,
$$E_{agg} = \sum_{k=1}^{N} w_k \, |\nabla I_k|.$$
The parameter $w_k$ corresponds to the weight given to image $I_k$ in the aggregate energy metric. From this aggregate energy metric our algorithm generates an aggregate cumulative energy matrix which defines the energy level of the seams. The weight parameter should be chosen in such a way that regions which are underexposed or overexposed in the LDR images get a lesser weight compared to other regions.
The average energy per pixel in each image could be used as the weighting parameter, $w_k = \bar{e}_k$, where $\bar{e}_k$ is the average energy per pixel in the image $I_k$.
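The aggregate energy metric with the average energy per pixel as the weight can be sketched as follows. The normalisation of the weights to sum to one is an assumption added here for numerical convenience and is not stated in the text; the function names are hypothetical.

```python
import numpy as np

def gradient_energy(img):
    """Gradient-magnitude energy of a 2-D grayscale image."""
    gy, gx = np.gradient(np.asarray(img, dtype=float))
    return np.hypot(gx, gy)

def aggregate_energy(images):
    """Linear combination of per-image energies, weighted by average energy per pixel."""
    energies = [gradient_energy(im) for im in images]
    weights = np.array([e.mean() for e in energies])      # average energy per pixel
    weights = weights / max(weights.sum(), 1e-12)         # normalisation (assumed)
    agg = sum(w * e for w, e in zip(weights, energies))
    return agg, weights
```

A single seam computed on the returned aggregate map is then added to or removed from every exposure, so the decision among per-image candidate seams is no longer needed.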
This weighting parameter plays an important role in deciding which seam needs to be added or deleted. We therefore also calculate this weighting parameter from other image characteristics. The Laplacian of an image computes the second derivative along both the spatial directions (horizontal as well as vertical) and indicates the sharp edges in the image. Therefore it serves better for calculating the weight parameter.
We calculate a weighted Laplacian for each image, multiply it element wise with the energy matrix of that image, and sum the products over all the images. The resulting matrix serves as the aggregate energy metric.
Here both multiplication and division are performed element wise.
With this approach (aggregate energy metric with the weighted Laplacian as the weighting parameter), the final image not only loses (or gains, in the case of enlarging) the minimum energy, but the output resized high contrast LDR image is also of better quality than that of the direct approach.
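One possible reading of the weighted Laplacian scheme is sketched below. The exact form of the weight is not recoverable from the text, so the choice made here, the absolute Laplacian of each image divided element wise by the sum of the absolute Laplacians over all images, is an assumption, as are the function names and the small constant `eps` that guards against division by zero.

```python
import numpy as np

def gradient_energy(img):
    """Gradient-magnitude energy of a 2-D grayscale image."""
    gy, gx = np.gradient(np.asarray(img, dtype=float))
    return np.hypot(gx, gy)

def laplacian(img):
    """Five-point discrete Laplacian with replicated borders."""
    p = np.pad(np.asarray(img, dtype=float), 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
            - 4.0 * p[1:-1, 1:-1])

def laplacian_weighted_aggregate(images, eps=1e-12):
    """Sum over images of (weighted Laplacian * energy), all element wise."""
    energies = [gradient_energy(im) for im in images]
    lap_mags = [np.abs(laplacian(im)) for im in images]
    total = sum(lap_mags) + eps                           # element-wise normaliser (assumed)
    return sum((lap / total) * e for lap, e in zip(lap_mags, energies))
```

Under this reading, flat regions such as saturated or underexposed areas have a near zero Laplacian and therefore contribute little to the aggregate energy.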
In this section we present the results achieved by our algorithm using the various approaches discussed above. Our main concern is not to lose (or add, in the case of enlarging) too much energy while resizing. In other words, we want the resizing to be content aware.
Figure ? and figure ? show the results obtained while reducing the final high contrast LDR image horizontally through the various approaches. The marked region indicates how the shape of the marked object is affected differently by these methods. One can easily notice that both the improved and the aggregate energy metric approaches preserve the indicated region better than the direct approach.
Figure ?(a) shows the change in the average energy per pixel with the removal of minimum energy seams for all the approaches we have discussed. The plot shows that initially all the approaches behave similarly, but as we move to a higher degree of resizing (in this case compression) their behavior diverges. The plot clearly shows that the aggregate energy metric with the weighted Laplacian as the weighting parameter preserves the highest energy. Figure ?(b) shows quantitatively how much energy is preserved by the various approaches.
Figure ? shows the results of enlarging the final high contrast LDR image by inserting seams through the various techniques. It can be seen that the aggregate energy metric approach (figure ?(d)) yields the best result. Figure ?(a) shows how artifacts are introduced when the input images are enlarged by the direct approach beyond a certain limit. However, in the same case (see figure ?(b)) the aggregate energy metric with the Laplacian as the weight parameter yields good results. The respective energy distributions are shown alongside.
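For the enlargement case, a seam insertion step can be sketched as below. The interpolation rule, a new pixel placed to the right of each seam pixel and set to the average of that pixel and its right neighbour, is a common choice assumed here for illustration; the text does not specify how inserted pixels are filled. The function names are hypothetical.

```python
import numpy as np

def insert_seam(img, seam):
    """Widen a 2-D image by one column, adding one interpolated pixel per row."""
    img = np.asarray(img, dtype=float)
    rows, cols = img.shape
    out = np.empty((rows, cols + 1), dtype=float)
    for i in range(rows):
        c = int(seam[i])
        right = img[i, min(c + 1, cols - 1)]
        out[i, :c + 1] = img[i, :c + 1]                # pixels up to and including the seam
        out[i, c + 1] = 0.5 * (img[i, c] + right)      # new pixel (assumed averaging rule)
        out[i, c + 2:] = img[i, c + 1:]                # remaining pixels shifted right
    return out

def insert_seam_in_all(images, seam):
    """Insert the same seam in every exposure so that they stay registered."""
    return [insert_seam(im, seam) for im in images]
```

As with seam removal, the same seam is inserted in all the exposures before fusion.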
We have proposed a novel approach for the content aware resizing of multi-exposure images of a static HDR scene before fusing them into a high contrast LDR image. The proposed approach efficiently combines content aware image retargeting and multi-exposure image fusion to develop a novel application suitable for any digital device. We showed that the proposed algorithm performs better than the direct approach of fusing the multi-exposure images before content aware resizing. We have shown through experiments that the LDR images generated using the proposed statistical and aggregate energy metric approaches are better both visually and in terms of the energy preserved. The optimal selection of seams to insert or delete leads to a highly robust retargeting algorithm. The proposed approach is fully automatic and requires no user intervention. The proposed algorithms open up a wide range of retargeting and fusion techniques which can be customized for a given display device.
As the approach does not involve any iterative solution or the minimization of any complex cost function, it is computationally inexpensive. The developed algorithms can either be built into state of the art mobile cameras and digital cameras or be provided as applications for post capture image processing software. The proposed approaches assume perfectly registered images of a static scene, which is a hard constraint to place on a real world scene. We hope that the proposed approach can be improved and extended to the case of dynamic scenes, which tend to introduce ghosting artifacts. Further, we hope to extend this approach to video retargeting applications involving HDR scenes. We believe that the approach discussed here will lead to more novel ideas in flexible resolution image retargeting research.
- E. Reinhard, W. Heidrich, P. Debevec, S. Pattanaik, G. Ward, and K. Myszkowski, High dynamic range imaging: acquisition, display, and image-based lighting. Morgan Kaufmann, 2010.
- S. Park, M. Park, and M. Kang, “Super-resolution image reconstruction: a technical overview,” Signal Processing Magazine, IEEE, vol. 20, no. 3, pp. 21–36, 2003.
- S. Avidan and A. Shamir, “Seam carving for content-aware image resizing,” in ACM Transactions on graphics (TOG), vol. 26, no. 3. ACM, 2007, p. 10.
- S. Mann and R. W. Picard, “On being undigital with digital cameras: Extending dynamic range by combining differently exposed pictures,” in IS & T Conference, 1995.
- P. E. Debevec and J. Malik, “Recovering high dynamic range radiance maps from photographs,” in Proceedings of the 24th annual conference on Computer graphics and interactive techniques, ser. SIGGRAPH ’97. New York, NY, USA: ACM Press/Addison-Wesley Publishing Co., 1997, pp. 369–378. [Online]. Available: http://dx.doi.org/10.1145/258734.258884
- H. Seetzen, W. Heidrich, W. Stuerzlinger, G. Ward, L. Whitehead, M. Trentacoste, A. Ghosh, and A. Vorozcovs, “High dynamic range display systems,” ACM Transactions on Graphics (TOG), vol. 23, no. 3, pp. 760–768, 2004.
- E. Reinhard, M. Stark, P. Shirley, and J. Ferwerda, “Photographic tone reproduction for digital images,” ACM Transactions on Graphics (TOG), vol. 21, no. 3, pp. 267–276, 2002.
- T. Mertens, J. Kautz, and F. Van Reeth, “Exposure fusion,” in Computer Graphics and Applications, 2007. PG’07. 15th Pacific Conference on. IEEE, 2007, pp. 382–390.
- S. Raman and S. Chaudhuri, “A matte-less, variational approach to automatic scene compositing,” in Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on. IEEE, 2007, pp. 1–6.
- P. Burt and E. Adelson, “The laplacian pyramid as a compact image code,” Communications, IEEE Transactions on, vol. 31, no. 4, pp. 532–540, 1983.
- G. Petschnigg, R. Szeliski, M. Agrawala, M. Cohen, H. Hoppe, and K. Toyama, “Digital photography with flash and no-flash image pairs,” in ACM Transactions on Graphics (TOG), vol. 23, no. 3. ACM, 2004, pp. 664–672.
- E. Eisemann and F. Durand, “Flash photography enhancement via intrinsic relighting,” in ACM Transactions on Graphics (TOG), vol. 23, no. 3. ACM, 2004, pp. 673–678.
- E. A. Khan, A. Akyuz, and E. Reinhard, “Ghost removal in high dynamic range images,” in Image Processing, 2006 IEEE International Conference on. IEEE, 2006, pp. 2005–2008.
- O. Gallo, N. Gelfandz, W. Chen, M. Tico, and K. Pulli, “Artifact-free high dynamic range imaging,” in Computational Photography (ICCP), 2009 IEEE International Conference on. IEEE, 2009, pp. 1–7.
- F. Pece and J. Kautz, “Bitmap movement detection: Hdr for dynamic scenes,” in Visual Media Production (CVMP), 2010 Conference on. IEEE, 2010, pp. 1–8.
- S. Raman and S. Chaudhuri, “Reconstruction of high contrast images for dynamic scenes,” The Visual Computer, pp. 1–16, 2011.
- W. Zhang and W.-K. Cham, “Gradient-directed multiexposure composition,” Image Processing, IEEE Transactions on, vol. 21, no. 4, pp. 2318–2323, 2012.
- H. Zimmer, A. Bruhn, and J. Weickert, “Freehand hdr imaging of moving scenes with simultaneous resolution enhancement,” in Computer Graphics Forum, vol. 30, no. 2. Wiley Online Library, 2011, pp. 405–414.
- J. Hu, O. Gallo, and K. Pulli, “Exposure stacks of live scenes with hand-held cameras,” in ECCV, 2012.
- P. Sen, N. K. Kalantari, M. Yaesoubi, S. Darabi, D. B. Goldman, and E. Shechtman, “Robust patch-based hdr reconstruction of dynamic scenes,” ACM Transactions on Graphics (Proceedings of ACM SIGGRAPH Asia 2012), vol. 31, no. 6, 2012.
- P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple features,” in Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, vol. 1. IEEE, 2001, pp. I–511.
- L. Itti, C. Koch, and E. Niebur, “A model of saliency-based visual attention for rapid scene analysis,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 20, no. 11, pp. 1254–1259, 1998.
- A. Efros and W. Freeman, “Image quilting for texture synthesis and transfer,” in Proceedings of the 28th annual conference on Computer graphics and interactive techniques. ACM, 2001, pp. 341–346.
- A. Agarwala, M. Dontcheva, M. Agrawala, S. Drucker, A. Colburn, B. Curless, D. Salesin, and M. Cohen, “Interactive digital photomontage,” ACM Transactions on Graphics (TOG), vol. 23, no. 3, pp. 294–302, 2004.
- M. Rubinstein, A. Shamir, and S. Avidan, “Improved seam carving for video retargeting,” in ACM Transactions on Graphics (TOG), vol. 27, no. 3. ACM, 2008, p. 16.
- M. Rubinstein, D. Gutierrez, O. Sorkine, and A. Shamir, “A comparative study of image retargeting,” in ACM Transactions on Graphics (TOG), vol. 29, no. 6. ACM, 2010, p. 160.
- F. Banterle, A. Artusi, T. Aydin, P. Didyk, E. Eisemann, D. Gutierrez, R. Mantiuk, and K. Myszkowski, “Multidimensional image retargeting,” in SIGGRAPH Asia 2011 Courses. ACM, 2011, p. 15.
- S. G. Narasimhan and S. K. Nayar, “Enhancing resolution along multiple imaging dimensions using assorted pixels,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 27, no. 4, pp. 518–530, 2005.
- A. El Gamal, “High dynamic range image sensors,” in Tutorial at International Solid-State Circuits Conference, 2002.
- M. Granados, B. Ajdin, M. Wand, C. Theobalt, H.-P. Seidel, and H. P. Lensch, “Optimal hdr reconstruction with linear digital cameras,” in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010, pp. 215–222.
- S. W. Hasinoff, F. Durand, and W. T. Freeman, “Noise-optimal capture for high dynamic range photography,” in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010, pp. 553–560.
- B. K. Gunturk and M. Gevrekci, “High-resolution image reconstruction from multiple differently exposed images,” Signal Processing Letters, IEEE, vol. 13, no. 4, pp. 197–200, 2006.
- J. Choi, M. K. Park, and M. G. Kang, “High dynamic range image reconstruction with spatial resolution enhancement,” The Computer Journal, vol. 52, no. 1, pp. 114–125, 2009.