# Embedded polarizing filters to separate diffuse and specular reflection

###### Abstract

Polarizing filters provide a powerful way to separate diffuse and specular reflection; however, traditional methods rely on several captures and require proper alignment of the filters. Recently, camera manufacturers have proposed to embed polarizing micro-filters in front of the sensor, creating a mosaic of pixels with different polarizations. In this paper, we investigate the advantages of such camera designs. In particular, we consider different design patterns for the filter arrays and propose an algorithm to demosaic an image generated by such cameras. This essentially allows us to separate the diffuse and specular components using a single image. The performance of our algorithm is compared with a color-based method using synthetic and real data. Finally, we demonstrate how we can recover the normals of a scene using the diffuse images estimated by our method.

###### Keywords:

Polarizing micro-filter, diffuse and specular separation, photometric stereo.

## 1 Introduction

The light reflected from a scene can be broadly classified into diffuse and specular terms. While the diffuse term is slowly varying for different pairs of viewing and lighting angles, specularities take the form of strong highlights that can vary quickly as either angle changes. Separating these two components is useful in many different imaging applications. For example, photometric stereo [1] estimates surface normals from images taken under different illuminations; however, like many algorithms, the process assumes that the scene is Lambertian, i.e., the reflection is purely diffuse.

As a second example, consider the problem of estimating spatially-varying bidirectional reflectance distribution functions (SVBRDFs). The knowledge of the complete SVBRDF enables the reproduction of objects from any viewing direction and under variable lighting conditions. Typically, one fits a parametric model to samples obtained by capturing images with different viewing and lighting directions. Although for very simple materials the specular and diffuse components of the model can be fit from only the shape of the SVBRDF, often this process can be done much more accurately if the two components are separated prior to model fitting. Other applications that benefit from this separation include tracking, object recognition and image segmentation.

### 1.1 Related work

There are a number of approaches to perform this separation, using either a single or multiple images [2]. Most single-image approaches are color-based: this process was pioneered by Shafer [3], with the so-called dichromatic reflection model. A key observation is that, for dielectrics, the spectrum of the specularity is similar to the spectrum of the light source, whereas the diffuse spectrum also encompasses information about the color of the surface: this results in a T-shaped space [4] whose limbs represent the diffuse and specular components in the dichromatic reflection model. Approaches based on this model [5, 6] assume that the scene has been segmented into different parts with a uniform diffuse distribution, which makes them impractical for highly textured scenes. More recent work alleviates the need for segmentation by integrating spatio-temporal information [7]. Similar approaches with different color spaces such as HSI [8], rotated RGB [9], or custom spaces [10, 11, 12] have also been proposed. As well as color-based techniques, the so-called neighborhood analysis methods [13, 14, 15] leverage the information from neighboring pixels to infer the reflective component at a given point.

Multi-image approaches use strategies as varied as structured illumination [16, 17, 18], different viewing angles [19, 20], or light-field imaging [21, 22]. Another multiple image approach makes use of polarizing filters. With a single filter in front of the lens, Nayar et al. [23] recast the separation problem as a linear system using multiple captures under different polarizations. Building on their work, a number of methods propose to combine color information with polarizing filters to separate the reflective components [17, 24, 25, 26].

Ma et al. [17] and Debevec et al. [27] suggest the use of filters both in front of the camera and the light sources, which leads to a slightly different model for the recorded intensity.

Recently, cameras have been proposed with an array of polarizing micro-filters placed just in front of the image sensor [28, 29, 30, 31]. Analogously to a Bayer filter, the micro-filter of each pixel is oriented in one of a finite number of possible directions (see Fig. 1).

Potentially, this can enable the separation of the diffuse and specular components from a single image with the accuracy and robustness of multi-image polarization methods.

### 1.2 Proposed approach

In this paper, we investigate the feasibility of using these types of cameras for the separation of specularities by proposing an algorithm to demosaic images generated with polarizing micro-filter arrays. As well as regular patterns, we also study random arrangements with varying numbers of orientations and show that these arrangements offer advantages over regular patterns.

In addition to demosaicing, we demonstrate the usefulness of micro-filter arrays for photometric stereo. Using a light dome with several light sources having different polarizing orientations, we show that the separation of the diffuse and specular components using our design can significantly improve the estimation of the surface normals of objects. This approach is essentially the dual of [32], where a camera with micro-polarizers is combined with a multi-view approach to infer the shape of objects.

The code and data required to reproduce all the results presented in this paper will be made available online with the camera-ready version.

## 2 Problem statement

Polarization is a physical property of light that characterizes the orientation of electromagnetic transverse waves in space. A wave is said to be polarized when its orientation follows a regular pattern in space: for example, when the wave oscillates in a fixed direction it is called linearly polarized. In contrast, unpolarized light is composed of a mixture of electromagnetic waves with different orientations. A linear polarizing filter (or polarizer) is a physical device that only lets through light with a given polarization. Given a light wave polarized in direction $\phi$ with initial intensity $I_0$, the polarizing filter will project it onto its orientation $\theta$. This is summarized by Malus' law, which quantifies the light intensity after it goes through the filter: $I = I_0 \cos^2(\theta - \phi)$. Unpolarized light can be thought of as containing all orientations equally, hence the resulting intensity is simply $I_0/2$, which is the mean value of $I_0 \cos^2(\theta - \phi)$ over all orientations.
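As a quick numerical check of Malus' law and of the one-half attenuation of unpolarized light, the following minimal sketch may help. The function names `malus` and `mean_over_orientations` are illustrative choices and are not taken from the paper.

```python
import math


def malus(i0, theta, phi):
    """Intensity behind a linear polarizer at angle theta (radians)
    for light of intensity i0 linearly polarized at angle phi."""
    return i0 * math.cos(theta - phi) ** 2


def mean_over_orientations(i0, theta, n=3600):
    """Average transmitted intensity over n equally spaced
    polarization angles in [0, pi), a stand-in for unpolarized light."""
    total = sum(malus(i0, theta, k * math.pi / n) for k in range(n))
    return total / n


aligned = malus(1.0, 0.0, 0.0)                 # filter parallel to the light
crossed = malus(1.0, math.pi / 2, 0.0)         # filter orthogonal to the light
unpolarized = mean_over_orientations(1.0, 0.3)  # close to one half
print(aligned, crossed, unpolarized)
```

The averaged value does not depend on the filter orientation, which is why an unpolarized (diffuse) contribution appears as a constant term behind every micro-filter.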

Polarization is of interest here because the diffuse and specular reflections affect polarization differently. Specular reflections are caused by direct reflection of the incoming light at an object’s surface. Therefore, this type of reflection preserves the polarization of the incoming light. Diffusion is slightly more complex. The diffusion created by the surface reflections can be thought of as the combination of several specular reflections in all orientations due to the rough nature of most materials at the microscopic level. The polarization of the light reflected by this microfacet model is not preserved. On the other hand, some of the light is also scattered inside the medium; the polarization of such light is retained. We can leverage this observation to separate diffuse and specular reflection. In fact, we separate polarized and unpolarized light, which we can associate to the specular and diffuse components as long as the subsurface scattering is relatively weak.

In particular, suppose we have a scene illuminated by a single light source polarized with an angle $\phi$; this can be achieved by placing a linear polarizer in front of it. For now, assume that $\phi$ is known; later, we show that this is unnecessary as it can be easily estimated. Furthermore, suppose we have a digital camera with a filter array of linear polarizers with $K$ different orientations in front of its sensor. We describe the image formation process as follows. Let $\mathbf{Y}_k$, $k = 1, \dots, K$, be a collection of $K$ images, each filtered with a distinct orientation $\theta_k$. From Malus' law, we can express each individual image as the sum of a constant diffuse term and a specular term that is modulated according to the polarization of the filters:

$$\mathbf{Y}_k = \mathbf{X}_d + \cos^2(\theta_k - \phi)\,\mathbf{X}_s, \tag{1}$$

where $\mathbf{X}_d$ and $\mathbf{X}_s$ are the diffuse and specular components, respectively.
Equivalently, we can rewrite (1) by flattening $\mathbf{Y}_k$ into a vector $\mathbf{y}_k$ (we use bold uppercase letters for matrix notation and the same letter in lowercase for its flattened version, and we use both notations interchangeably to denote the same object):

$$\mathbf{y} = \mathbf{A} \begin{bmatrix} \mathbf{x}_d \\ \mathbf{x}_s \end{bmatrix}. \tag{2}$$

Here, $\mathbf{y} = [\mathbf{y}_1^\top, \dots, \mathbf{y}_K^\top]^\top$, and $\mathbf{A}$ is the matrix that modulates the specularity with the correct attenuation and combines the diffuse and specular terms.

To model a camera with a micro-array of polarizing filters, we need to select only one polarization orientation for each pixel. To model this, let $\mathbf{z}_k = \mathbf{M}_k \mathbf{y}_k$, where $\mathbf{M}_k$ is a mask that zeroes the measurements that we do not have access to; it has the form of a diagonal matrix with a $1$ where the pixel is selected and a $0$ otherwise. Note that, since we have one polarizing filter per pixel, $\sum_{k=1}^{K} \mathbf{M}_k = \mathbf{I}$. At this stage, we make no assumption about the filter orientations: they can be regularly or irregularly structured over the array. Finally, the measured image is given by

$$\mathbf{z} = \sum_{k=1}^{K} \mathbf{z}_k = \mathbf{M}\mathbf{y}, \tag{3}$$

where $\mathbf{M} = [\mathbf{M}_1, \dots, \mathbf{M}_K]$.

Using this formulation, our aim is to estimate $\mathbf{x}_d$ and $\mathbf{x}_s$ given the measurements $\mathbf{z}$ and the downsampling matrix $\mathbf{M}$. To simplify notation we write the “superimage” with both diffuse and specular components as $\mathbf{x} = [\mathbf{x}_d^\top, \mathbf{x}_s^\top]^\top$.
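The image formation model of this section can be simulated directly. The sketch below is one possible NumPy implementation under stated assumptions: the function name `mosaic`, the array shapes, and the random filter pattern are illustrative choices, not the authors' code. It builds the fully filtered images and keeps, for each pixel, only the value seen through that pixel's micro-filter.

```python
import numpy as np


def mosaic(diffuse, specular, thetas, phi, pattern):
    """Simulate a single capture through a polarizing micro-filter array.

    diffuse, specular: (H, W) arrays with the two reflection components.
    thetas: (K,) filter orientations in radians.
    phi: polarization angle of the light source in radians.
    pattern: (H, W) integer array in [0, K) giving the filter index per pixel.
    """
    weights = np.cos(np.asarray(thetas) - phi) ** 2           # Malus attenuation per filter
    stack = diffuse[None] + weights[:, None, None] * specular[None]  # (K, H, W) full images
    rows, cols = np.indices(pattern.shape)
    return stack[pattern, rows, cols]                         # keep one orientation per pixel


rng = np.random.default_rng(0)
H, W, K = 4, 6, 4
thetas = np.arange(K) * np.pi / K            # 0, 45, 90 and 135 degrees
pattern = rng.integers(0, K, size=(H, W))    # pseudo-random filter layout
diffuse = rng.random((H, W))
specular = rng.random((H, W))
z = mosaic(diffuse, specular, thetas, phi=0.0, pattern=pattern)
```

With the light polarized at angle zero, pixels behind the 90 degree filter record only the diffuse term, while pixels behind the 0 degree filter record the sum of both terms.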

## 3 Algorithms

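We propose a general approach to separate the diffuse and specular components. Finding both components using only a single image is an under-constrained problem, which we propose to regularize using the TV norm:

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \; \|\mathbf{z} - \mathbf{M}\mathbf{A}\mathbf{x}\|_2^2 + \Gamma(\mathbf{x}), \tag{4}$$

where $\Gamma(\mathbf{x}) = \gamma_d \|\mathbf{x}_d\|_{\mathrm{TV}} + \gamma_s \|\mathbf{x}_s\|_{\mathrm{TV}}$ is the regularization term. Here $\gamma_d$, $\gamma_s$ are regularization parameters and $\|\cdot\|_{\mathrm{TV}}$ is the total variation (TV) norm (technically, the TV norm is only a semi-norm). Typically, the TV norm is defined as the $\ell_1$ norm of the first order differential operator, although some authors also consider the $\ell_2$ norm of the same differential operator. A compromise between the two approaches is the Huber norm or Huber loss, defined as $\|\mathbf{u}\|_H = \sum_i h_\delta(u_i)$, where

$$h_\delta(u) = \begin{cases} \tfrac{1}{2} u^2 & \text{if } |u| \le \delta, \\ \delta \left( |u| - \tfrac{1}{2}\delta \right) & \text{otherwise.} \end{cases} \tag{5}$$

We show below how to solve (4) with these different variations of the TV norm. The advantage of the $\ell_1$ approach is that, since it favors solutions that have a sparse first order derivative, it maintains edges in the image. In contrast, the $\ell_2$ approach can blur edges but performs well on other parts of the image. The $\ell_2$ formulation can also lead to faster algorithms.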

### 3.1 Minimizing the $\ell_2$ TV norm

Although general optimization packages can be used to solve (4), the sampling matrix $\mathbf{M}$ is very large and therefore these techniques are limited to relatively small-sized images. In order to overcome this, we observe that (4) can be expressed as a sparse linear system. To see this, note that $\Gamma(\mathbf{x})$, in the case of the $\ell_2$ TV norm, can be written as

$$\Gamma(\mathbf{x}) = \|\mathbf{\Lambda}\mathbf{D}\mathbf{x}\|_2^2, \tag{6}$$

where $\mathbf{D}$ is the 2D discrete differential operator matrix and $\mathbf{\Lambda}$ encapsulates the effect of $\gamma_d$ and $\gamma_s$. Finally, setting the derivative of (4) to zero yields the following sparse linear system:

$$\left( \mathbf{A}^\top \mathbf{M}^\top \mathbf{M} \mathbf{A} + \mathbf{D}^\top \mathbf{\Lambda}^\top \mathbf{\Lambda} \mathbf{D} \right) \mathbf{x} = \mathbf{A}^\top \mathbf{M}^\top \mathbf{z}. \tag{7}$$

Although this system is very large, the matrix is sparse, symmetric and positive definite, and thus can be efficiently solved using approaches such as the conjugate gradient method. This leads to an algorithm that has linear complexity in the number of pixels.
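To make the quadratic case concrete, here is a minimal sketch on a 1D toy signal rather than a 2D image. It is a simplification for illustration only, not the authors' implementation: the names `conjugate_gradient` and `separate_l2`, the forward-difference operator, and the default weights are all assumptions. Each pixel `i` observes `z[i] = d[i] + w[i] * s[i]`, where `w[i]` is the squared-cosine attenuation of the filter at that pixel, and the normal equations are applied matrix-free inside a hand-written conjugate gradient loop.

```python
import numpy as np


def conjugate_gradient(apply_h, b, iters=1000, tol=1e-10):
    """Solve H x = b for a symmetric positive definite operator H."""
    x = np.zeros_like(b)
    r = b - apply_h(x)
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        if np.sqrt(rs) < tol:
            break
        hp = apply_h(p)
        alpha = rs / (p @ hp)
        x += alpha * p
        r -= alpha * hp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x


def separate_l2(z, w, gamma_d=0.1, gamma_s=0.1):
    """Minimize sum_i (z_i - d_i - w_i s_i)^2 + gamma_d |Dd|^2 + gamma_s |Ds|^2."""
    n = z.size

    def diff(u):                     # forward differences, length n - 1
        return u[1:] - u[:-1]

    def diff_t(v):                   # adjoint of the forward difference
        out = np.zeros(n)
        out[:-1] -= v
        out[1:] += v
        return out

    def apply_h(x):                  # left-hand side of the normal equations
        d, s = x[:n], x[n:]
        fit = d + w * s
        top = fit + gamma_d * diff_t(diff(d))
        bottom = w * fit + gamma_s * diff_t(diff(s))
        return np.concatenate([top, bottom])

    rhs = np.concatenate([z, w * z])  # adjoint of the forward model applied to z
    x = conjugate_gradient(apply_h, rhs)
    return x[:n], x[n:]
```

Because the operator is only ever applied to vectors, the memory cost stays proportional to the number of pixels; a sparse-matrix library routine could be substituted for the hand-written loop.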

### 3.2 Minimizing the $\ell_1$ TV norm

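Unfortunately, unlike the $\ell_2$ case, we cannot find a closed form solution with the $\ell_1$ norm. Instead, we propose to use the split Bregman method [33] to solve (4) iteratively. In particular, writing $\mathbf{\Lambda}\mathbf{D}$ for the weighted differential operator, (4) can be written as

$$\min_{\mathbf{x},\,\mathbf{d}} \; \|\mathbf{z} - \mathbf{M}\mathbf{A}\mathbf{x}\|_2^2 + \|\mathbf{d}\|_1 \quad \text{subject to} \quad \mathbf{d} = \mathbf{\Lambda}\mathbf{D}\mathbf{x}, \tag{8}$$

so that each step of the Bregman iteration algorithm consists of solving

$$(\mathbf{x}^{j+1}, \mathbf{d}^{j+1}) = \arg\min_{\mathbf{x},\,\mathbf{d}} \; \|\mathbf{z} - \mathbf{M}\mathbf{A}\mathbf{x}\|_2^2 + \|\mathbf{d}\|_1 + \frac{\mu}{2} \|\mathbf{d} - \mathbf{\Lambda}\mathbf{D}\mathbf{x} - \mathbf{b}^{j}\|_2^2, \tag{9}$$

where $\mathbf{b}^{j+1} = \mathbf{b}^{j} + \mathbf{\Lambda}\mathbf{D}\mathbf{x}^{j+1} - \mathbf{d}^{j+1}$. To solve (9), we can alternate between minimizing over $\mathbf{x}$ and $\mathbf{d}$:

$$\mathbf{x}^{j+1} = \arg\min_{\mathbf{x}} \; \|\mathbf{z} - \mathbf{M}\mathbf{A}\mathbf{x}\|_2^2 + \frac{\mu}{2} \|\mathbf{d}^{j} - \mathbf{\Lambda}\mathbf{D}\mathbf{x} - \mathbf{b}^{j}\|_2^2, \tag{10}$$

$$\mathbf{d}^{j+1} = \arg\min_{\mathbf{d}} \; \|\mathbf{d}\|_1 + \frac{\mu}{2} \|\mathbf{d} - \mathbf{\Lambda}\mathbf{D}\mathbf{x}^{j+1} - \mathbf{b}^{j}\|_2^2. \tag{11}$$

The main advantage of this approach is that (11) is now decoupled over space, and thus has a closed form solution. Additionally, (10) is now an $\ell_2$ minimization, which again has a closed form solution. In practice, only a few iterations of the split Bregman algorithm are needed to converge so the method is acceptably fast and has linear complexity in the number of pixels in the image.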
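The closed-form solution of the decoupled subproblem is the standard soft-thresholding (shrinkage) operator used in split Bregman schemes. A minimal sketch follows; the function name `shrink` and the threshold parameter `t` (the inverse of the penalty weight) are naming assumptions made for illustration.

```python
import numpy as np


def shrink(v, t):
    """Elementwise minimizer of |d| + (d - v)^2 / (2 t) over d.

    Values with magnitude below the threshold t are set to zero;
    all other values are pulled towards zero by t.
    """
    v = np.asarray(v, dtype=float)
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


# Small gradients are zeroed, large ones are kept (reduced by t).
print(shrink([-2.0, -0.5, 0.0, 0.5, 2.0], 1.0))
```

Since the operator acts independently on every entry, its cost is linear in the number of pixels, which is what keeps each iteration cheap.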

### 3.3 Minimizing the Huber TV norm

### 3.4 Unknown light polarization

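In the above description, we assumed that the polarization angle $\phi$ of the incoming light was known. Although it can be directly measured in controlled experiments, this is not necessary since it can be accurately estimated using the following procedure. First, for each image $\mathbf{Y}_k$, we estimate its mean from the available pixels, i.e., the entries of $\mathbf{z}_k = \mathbf{M}_k \mathbf{y}_k$ selected by the mask $\mathbf{M}_k$:

$$m_k = \frac{\mathbf{1}^\top \mathbf{z}_k}{\mathbf{1}^\top \mathbf{M}_k \mathbf{1}}. \tag{14}$$

Using the linearity of the mean, we have

$$m_k \approx \bar{d} + \bar{s} \cos^2(\theta_k - \phi), \tag{15}$$

where $\bar{d}$ and $\bar{s}$ are the mean values of $\mathbf{X}_d$ and $\mathbf{X}_s$. For $K > 3$, this set of non-linear equations is overdetermined, and finding $\phi$, $\bar{d}$ and $\bar{s}$ corresponds to solving the following minimization problem:

$$\min_{\phi,\,\bar{d},\,\bar{s}} \; \sum_{k=1}^{K} \left( m_k - \bar{d} - \bar{s} \cos^2(\theta_k - \phi) \right)^2. \tag{16}$$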
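One way to carry out this fit, sketched here as an assumption rather than as the authors' solver, is to use the identity $\cos^2(\theta - \phi) = \tfrac{1}{2} + \tfrac{1}{2}\cos(2\theta - 2\phi)$, which turns the model into one that is linear in three auxiliary unknowns. An ordinary least-squares solve then recovers the mean diffuse level, the mean specular level, and the light polarization angle. The function name `estimate_light_polarization` is hypothetical.

```python
import numpy as np


def estimate_light_polarization(thetas, means):
    """Fit means[k] ~ d + s * cos^2(thetas[k] - phi) by linear least squares.

    Rewrites the model as a + b cos(2 theta) + c sin(2 theta) with
    a = d + s / 2, b = (s / 2) cos(2 phi), c = (s / 2) sin(2 phi).
    Returns (d, s, phi) with phi in [0, pi).
    """
    thetas = np.asarray(thetas, dtype=float)
    design = np.column_stack(
        [np.ones_like(thetas), np.cos(2 * thetas), np.sin(2 * thetas)]
    )
    (a, b, c), *_ = np.linalg.lstsq(design, np.asarray(means, dtype=float), rcond=None)
    s = 2.0 * np.hypot(b, c)
    phi = 0.5 * np.arctan2(c, b) % np.pi
    d = a - s / 2.0
    return d, s, phi


thetas = np.arange(6) * np.pi / 6
means = 0.4 + 0.25 * np.cos(thetas - 0.7) ** 2   # synthetic per-orientation means
d_bar, s_bar, phi_hat = estimate_light_polarization(thetas, means)
```

At least three distinct orientations are needed for the three unknowns; with more, the least-squares solve averages out noise in the per-orientation means.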

## 4 Practical considerations

Before evaluating and comparing our proposed approach to existing methods, we discuss below a few algorithmic and design choices.

### 4.1 One-step vs. two-step algorithm

Our proposed algorithm separates the components and interpolates the images in a single step. One might wonder how much we gain by performing this estimation in a single step as opposed to a two-stage approach where we first demosaic every individual image and then fit a cosine to the interpolated images to identify the specular and diffuse components.

Figure 2 shows a comparison between our direct reconstruction algorithm and the two-stage approach. Here, the same TV norm regularization is used for both the direct approach and the interpolation step of the two-stage approach.

Even though the two approaches are close in performance when the number of orientations is , the two-stage approach performance clearly worsens when the number of orientations increases. Also, the two-stage approach peaks at dB, whereas the direct reconstruction continues to increase to dB. While the observed performance gain is not huge, it is significant for many applications. The versatility of our algorithm over the two-stage approach is also an advantage when investigating different filter design patterns. In this regard, we observe that the direct reconstruction saturates at around - orientations, suggesting that existing camera designs based on orientations are not optimal. In the rest of this paper, we exclusively use the single-step approach.

### 4.2 Comparison of the different regularizers

As stated earlier, the TV norm is typically defined in terms of the $\ell_1$ norm, which leads to sharper edges, while the $\ell_2$ norm formulation is faster and can perform better on certain regions of the image. Finally, the Huber norm provides a tradeoff between the $\ell_1$ and $\ell_2$ variants. To quantify these differences, we ran a comparison of the results obtained with the three proposed norms. The results are given for one image in Fig. 4. As we can see, there is not much difference in terms of PSNR, particularly between the and Huber norm regularization. We also observe that the regularized image is a bit sharper but also slightly noisier. In terms of performance, the $\ell_2$ and Huber norm algorithms are 5.7x and 3.3x faster than the $\ell_1$ algorithm, respectively. Based on these observations and computational speed, we choose to favor the $\ell_2$ norm for the rest of the experiments.

### 4.3 Filter design patterns and number of orientations

There are a number of ways to design the micro-filter array, including (pseudo) random patterns and regular grids. Examples of images acquired with these designs are depicted in Fig. 3c and 3d. To obtain these images, we use Blender with the raytracing engine Cycles [34] and create ground truth diffuse and specular images from different render passes. We then simulate the effect of the filters by appropriately weighting and mixing the diffuse and specular components. The corresponding estimated diffuse components are shown in Fig. 3e and 3f. Additionally, in Fig. 5, we compare the average performance of these two designs for a selection of synthetic images. Overall, we observe that the filter design does not significantly influence the peak signal-to-noise ratio (PSNR). Nevertheless, as shown in Fig. 3e and 3f, random patterns lead to a slightly better qualitative estimation and, more importantly, the reconstruction artifacts induced by random patterns are more pleasing to the eye.
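For reference, the PSNR used to compare reconstructions against the ground truth can be computed as in the sketch below. The function name `psnr` and the default peak value of 1.0 (images scaled to the unit interval) are assumptions made for illustration.

```python
import numpy as np


def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in decibels between two images."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)   # mean squared error
    if mse == 0:
        return float("inf")                      # identical images
    return 10.0 * np.log10(peak ** 2 / mse)


ground_truth = np.zeros((8, 8))
noisy = ground_truth + 0.1                       # uniform error of 0.1
print(psnr(ground_truth, noisy))                 # approximately 20 dB
```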

### 4.4 Cost of introducing the filter array

Finally, it is also interesting to investigate how much is lost by using a polarizing micro-filter, compared to a traditional camera, when one does not wish to separate the diffuse and specular components. In Fig. 6, we see that the sum of our diffuse and specular estimations is very close to the original image, suggesting that, even if the separation fails, the sum is almost indistinguishable from the output of a traditional camera.

## 5 Experiments

We test our TV norm-based algorithm in three different scenarios: rendered images, real images with simulated polarizing filters, and real images with real polarizers. Simulated data is again generated with Blender Cycles raytracing engine [34]. The second scenario consists of real images captured by Shen and Zheng [11]: the ground truth of the diffuse components is provided along with the original images. We also provide our own captured images, taken under fixed polarized light sources, using a Nikon D810 DSLR camera with a polarizer in front of its lens. Note that even though the polarizing filter is not placed directly in front of the sensor, we neglect the effect that the lens might have on the light polarization. In practice, it may slightly change the polarization phase, but this would be accounted for in the device calibration. To obtain a diffuse ground truth, we capture one image with the polarizing filter oriented orthogonal to the light’s polarization. For all setups, we first generate all complete images for and then apply the downsampling operator to simulate the effect of the polarized filter array.

### 5.1 Single image diffuse and specular separation

Given the results from Fig. 5, we focus on a random filter array design and orientations for our algorithm. For all experiments, we set and . It is possible to obtain slightly improved results with parameter tuning, but we avoided it.

To provide context, we compare our approach with the single image color-based technique proposed by Shen and Zheng in [11]. We should emphasize that their method takes as input a standard image while ours takes an image captured with the proposed filter array. Rather than being a fair comparison between algorithms, this experiment allows us to quantify the performance improvement offered by the proposed setup.

Performing this comparison raises a number of challenges in terms of color management. In particular, our algorithm operates in a linear color space, whereas [11] uses the sRGB space. To deal with this, we run our algorithm in the linear space and convert our estimated images to sRGB for comparison. Additionally, the images provided by [11] are in sRGB format and we convert them to a linear color space to apply our algorithm.
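The two conversions mentioned above follow the standard sRGB transfer function. A minimal sketch is given below, assuming pixel values normalized to the unit interval; the function names are illustrative.

```python
import numpy as np


def srgb_to_linear(c):
    """Decode sRGB-encoded values in [0, 1] to linear intensities."""
    c = np.asarray(c, dtype=float)
    return np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)


def linear_to_srgb(c):
    """Encode linear intensities in [0, 1] with the sRGB transfer function."""
    c = np.asarray(c, dtype=float)
    return np.where(c <= 0.0031308, 12.92 * c, 1.055 * np.power(c, 1 / 2.4) - 0.055)


values = np.linspace(0.0, 1.0, 11)
round_trip = linear_to_srgb(srgb_to_linear(values))   # should reproduce values
```

The separation itself relies on the additive mixing of diffuse and specular light, which only holds for linear intensities; the conversion back to sRGB is needed solely to compare against results produced in that space.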

### 5.2 Photometric stereo

Photometric stereo [1] infers surface normals from multiple images under different lighting by assuming that the surfaces of the objects are Lambertian. Unfortunately, this assumption is rarely satisfied as scenes often contain specular components and occlusions. In this section, we use our algorithm to first obtain a solid approximation of the diffuse part of the scene and then use it to recover the surface normals.

For this experiment, we use a light dome that is composed of lamps spread on a hemisphere surrounding the object of interest. A digital camera is placed at the zenith of the hemisphere. We install linear polarizing filters with unknown orientation in front of each light source as well as in front of the camera. We capture several images, one under each illumination of the fixed lights for four different polarization orientations of the filter positioned in front of the camera; these four orientations are used to create mosaiced images on which we apply our algorithm to estimate the diffuse component. These diffuse images are then fed to the photometric stereo algorithm [1]. In Fig. 9, we compare the normal map generated by our algorithm with the normal map obtained from unprocessed images (i.e. with no separation of the diffuse component) as well as from Shen and Zheng’s algorithm [11]. Note that the ground truth normal maps are computed using the non-mosaiced images. In all scenes, the gain of using our algorithm is clearly noticeable.
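The final step, classical Lambertian photometric stereo, can be sketched as a per-pixel least-squares problem. This is a textbook formulation given under simplifying assumptions (known unit light directions, no shadows or inter-reflections), not the exact pipeline used in the experiments; the function name `photometric_stereo` and the array layout are illustrative.

```python
import numpy as np


def photometric_stereo(images, light_dirs):
    """Recover normals and albedo from diffuse images.

    images: (L, H, W) diffuse intensities, one image per light.
    light_dirs: (L, 3) unit vectors pointing towards each light.
    Returns unit normals of shape (H, W, 3) and albedo of shape (H, W).
    """
    n_lights, height, width = images.shape
    # Solve light_dirs @ b = intensities for every pixel at once; b = albedo * normal.
    b, *_ = np.linalg.lstsq(light_dirs, images.reshape(n_lights, -1), rcond=None)
    albedo = np.linalg.norm(b, axis=0)
    normals = b / np.maximum(albedo, 1e-12)      # avoid division by zero
    return normals.T.reshape(height, width, 3), albedo.reshape(height, width)
```

Specular highlights violate the linear model solved here, which is why feeding the estimated diffuse images instead of the raw captures is expected to improve the recovered normals.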

## 6 Conclusion

We have studied the benefits of using a camera equipped with polarizing micro-filters for the separation of diffuse and specular terms. We presented a simple algorithm to demosaic images produced by such cameras and extract the diffuse term. Regarding the diffuse extraction, we have shown that our relatively simple algorithm can significantly outperform other single-image based techniques. A more accurate knowledge of the diffuse term can then be leveraged in other imaging applications such as photometric stereo; we demonstrate that our technique improves the estimation of the normal maps on various scenes. For future work, we believe that including more elaborate priors based on, for example, some of the existing color-based techniques will improve the estimation.

## References

- [1] Woodham, R.J.: Photometric method for determining surface orientation from multiple images. Optical engineering 19 (1980) 191139
- [2] Artusi, A., Banterle, F., Chetverikov, D.: A survey of specularity removal methods. In: Computer Graphics Forum. Volume 30., Wiley Online Library (2011) 2208–2230
- [3] Shafer, S.A.: Using color to separate reflection components. Color Research & Application 10 (1985) 210–218
- [4] Klinker, G.J., Shafer, S.A., Kanade, T.: The measurement of highlights in color images. International Journal of Computer Vision 2 (1988) 7–32
- [5] Gershon, R.: The Use of Color in Computational Vision. Technical report. Department of Computer Science, University of Toronto (1987)
- [6] Klinker, G.J., Shafer, S.A., Kanade, T.: A physical approach to color image understanding. International Journal of Computer Vision 4 (1990) 7–38
- [7] Mallick, S.P., Zickler, T., Belhumeur, P.N., Kriegman, D.J.: Specularity removal in images and videos: A PDE approach. In: European Conference on Computer Vision, Springer (2006) 550–563
- [8] Yang, J., Liu, L., Li, S.Z.: Separating specular and diffuse reflection components in the HSI color space. In: IEEE International Conference on Computer Vision Workshops. (2013) 891–898
- [9] Mallick, S.P., Zickler, T.E., Kriegman, D.J., Belhumeur, P.N.: Beyond Lambert: Reconstructing specular surfaces using color. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Volume 2., IEEE (2005) 619–626
- [10] Bajcsy, R., Lee, S.W., Leonardis, A.: Detection of diffuse and specular interface reflections and inter-reflections by color image segmentation. International Journal of Computer Vision 17 (1996) 241–272
- [11] Shen, H.L., Zheng, Z.H.: Real-time highlight removal using intensity ratio. Applied optics 52 (2013) 4483–4493
- [12] Tan, R.T., Nishino, K., Ikeuchi, K.: Separating reflection components based on chromaticity and noise analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (2004) 1373–1379
- [13] Mallick, S.P., Zickler, T., Belhumeur, P., Kriegman, D.: Dichromatic separation: specularity removal and editing. In: ACM SIGGRAPH 2006 Sketches, ACM (2006) 166
- [14] Tan, R.T., Ikeuchi, K.: Separating reflection components of textured surfaces using a single image. IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (2005) 178–193
- [15] Yoon, K.J., Choi, Y., Kweon, I.S.: Fast separation of reflection components using a specularity-invariant image representation. In: IEEE International Conference on Image Processing, IEEE (2006) 973–976
- [16] Lamond, B., Peers, P., Ghosh, A., Debevec, P.: Image-based separation of diffuse and specular reflections using environmental structured illumination. In: IEEE Computational Photography. (2009)
- [17] Ma, W.C., Hawkins, T., Peers, P., Chabert, C.F., Weiss, M., Debevec, P.: Rapid acquisition of specular and diffuse normal maps from polarized spherical gradient illumination. In: Eurographics Symposium on Rendering, Eurographics Association (2007) 183–194
- [18] Nayar, S.K., Krishnan, G., Grossberg, M.D., Raskar, R.: Fast separation of direct and global components of a scene using high frequency illumination. In: ACM Transactions on Graphics. Volume 25., ACM (2006) 935–944
- [19] Jaklič, A., Solina, F.: Separating diffuse and specular component of image irradiance by translating a camera. In: International Conference on Computer Analysis of Images and Patterns, Springer (1993) 428–435
- [20] Lin, S., Li, Y., Kang, S.B., Tong, X., Shum, H.Y.: Diffuse-specular separation and depth recovery from image sequences. In: European Conference on Computer Vision, Springer (2002) 210–224
- [21] Meng, L., Lu, L., Bedard, N., Berkner, K.: Single-shot specular surface reconstruction with gonio-plenoptic imaging. In: Proceedings of the IEEE International Conference on Computer Vision. (2015) 3433–3441
- [22] Wang, H., Xu, C., Wang, X., Zhang, Y., Peng, B.: Light field imaging based accurate image specular highlight removal. PLOS ONE 11 (2016) e0156173
- [23] Nayar, S.K., Fang, X.S., Boult, T.: Removal of specularities using color and polarization. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition. (1993) 583–590
- [24] Kim, D.W., Lin, S., Hong, K.S., Shum, H.Y.: Variational specular separation using color and polarization. In: MVA. (2002) 176–179
- [25] Nayar, S.K., Fang, X.S., Boult, T.: Separation of reflection components using color and polarization. International Journal of Computer Vision 21 (1997) 163–186
- [26] Umeyama, S., Godin, G.: Separation of diffuse and specular components of surface reflection by use of polarization and statistical analysis of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (2004) 639–647
- [27] Debevec, P., Hawkins, T., Tchou, C., Duiker, H.P., Sarokin, W., Sagar, M.: Acquiring the reflectance field of a human face. In: SIGGRAPH. (2000) 145–156
- [28] 4D Technologies: Polarization camera for image enhancement. (https://www.4dtechnology.com/products/polarimeters/polarcam)
- [29] FluxData: Polarization imaging camera FD-1665P. (http://www.fluxdata.com/products/fd-1665p)
- [30] Photonic Lattice: Polarization imaging camera PI-110. (https://www.photonic-lattice.com/en/products/polarization_camera/pi-110/)
- [31] Ricoh Imaging Company Ltd: Polarization camera. (https://www.ricoh.com/technology/tech/051_polarization.html)
- [32] Cui, Z., Gu, J., Shi, B., Tan, P., Kautz, J.: Polarimetric multi-view stereo. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (2017) 1558–1567
- [33] Goldstein, T., Osher, S.: The split Bregman method for L1-regularized problems. SIAM J. Img. Sci. 2 (2009) 323–343
- [34] Blender Online Community: Blender - A 3D modelling and rendering package. Blender Foundation, Blender Institute, Amsterdam. (2017)
- [35] Stanford Graphics Lab: The Stanford 3D Scanning Repository. (http://graphics.stanford.edu/data/3Dscanrep/)