Spectral DiffuserCam: lensless snapshot hyperspectral imaging with a spectral filter array


Hyperspectral imaging is useful for applications ranging from medical diagnostics to agricultural crop monitoring; however, traditional scanning hyperspectral imagers are prohibitively slow and expensive for widespread adoption. Snapshot techniques exist but are often confined to bulky benchtop setups or have low spatio-spectral resolution. In this paper, we propose a novel, compact, and inexpensive computational camera for snapshot hyperspectral imaging. Our system consists of a tiled spectral filter array placed directly on the image sensor and a diffuser placed close to the sensor. Each point in the world maps to a unique pseudorandom pattern on the spectral filter array, which encodes multiplexed spatio-spectral information. By solving a sparsity-constrained inverse problem, we recover the hyperspectral volume with sub-superpixel resolution. Our hyperspectral imaging framework is flexible and can be designed with contiguous or non-contiguous spectral filters that can be chosen for a given application. We provide theory for system design, demonstrate a prototype device, and present experimental results with high spatio-spectral resolution.

1 Introduction

Hyperspectral imaging systems aim to capture a 3D spatio-spectral cube containing spectral information for each spatial location. This enables the detection and classification of different material properties through spectral fingerprints, which cannot be seen with an RGB camera alone. Hyperspectral imaging has been shown to be useful for a variety of applications, from agricultural crop monitoring to medical diagnostics, microscopy, and food quality analysis [13, 27, 33, 44, 20, 1, 34, 37, 24, 5]. Despite this potential utility, commercial hyperspectral cameras cost between $25,000 and $100,000 (at the time of publication of this paper). This high price point and large size have limited the widespread use of hyperspectral imagers.

Traditional hyperspectral imagers rely on scanning either the spectral or spatial dimension of the hyperspectral cube with spectral filters or line-scanning [21, 17, 50]. These methods can be slow and generally require precise moving parts, increasing the camera complexity. More recently, snapshot techniques have emerged, enabling capture of the full hyperspectral data cube in a single shot. Some snapshot methods trade off spatial resolution for spectral resolution by using a color filter array or splitting up the camera’s field-of-view (FOV). Computational imaging approaches can circumvent this trade-off by spatio-spectrally encoding the incoming light, then solving a compressive sensing inverse problem to recover the spectral cube [47], assuming some structure in the scene. These systems are typically table-top instruments with bulky relay lenses, prisms, or diffractive elements, suitable for laboratory experiments, but not the real world. Recently, several compact snapshot hyperspectral imagers have been demonstrated that encode spatio-spectral information with a single optic, enabling a practical form factor [40, 16, 25]. Because they use a single optic to control both the spectral and spatial resolution, these systems are generally constrained to measuring contiguous spectral bins within a given spectral band.

Figure 1: Overview of the Spectral DiffuserCam imaging pipeline, which reconstructs a hyperspectral datacube from a single-shot 2D measurement. The system consists of a diffuser and spectral filter array bonded to an image sensor. A one-time calibration procedure measures the point spread function (PSF) and filter function. Images are reconstructed using a non-linear inverse problem solver with a sparsity prior. The result is a 3D hyperspectral cube with 64 channels of spectral information for each of 448×320 spatial points, generated from a 2D sensor measurement that is 448×320 pixels.

Here, we propose a new encoding scheme that takes advantage of recent advances in patterned thin film spectral filters [42], and lensless imaging, to achieve high-resolution snapshot hyperspectral imaging in a small form factor. Our system consists of a tiled spectral filter array placed directly onto the sensor and a randomizing phase mask (i.e. diffuser) placed a small distance away from the sensor, as in the DiffuserCam architecture [2]. The diffuser spatially multiplexes the incoming light, such that each spatial point in the world maps to many pixels on the camera. The spectral filter array then spectrally encodes the incoming light via a structured erasure function. The multiplexing effect of the diffuser allows recovery of scene information from a subset of sensor pixels, so we are able to recover the full spatio-spectral cube without the loss in resolution that would result from using a non-multiplexing optic, such as a lens.

Our encoding scheme enables hyperspectral recovery in a compact and inexpensive form factor. The spectral filter array can be manufactured directly on the sensor, costing under $5 for both the diffuser and the filter array at scale. A key advantage of our system over previous compact snapshot hyperspectral imagers is that it decouples the spectral and spatial responses, enabling a flexible design in which either contiguous or non-contiguous spectral filters with user-selected bandwidths can be chosen. Given some conditions on scene sparsity and the diffuser randomness, the spectral sampling is determined by the spectral filters and the spatial resolution is determined by the autocorrelation of the diffuser response. This should find use in task-specific/classification applications [41, 12, 30, 23], where one may wish to tailor the spectral sampling to the application by measuring multiple non-contiguous spectral bands, or have higher-resolution spectral sampling for certain bands.

We present theory for our system, simulations to motivate the need for a diffuser, and experimental results from a prototype system. The main contributions of our paper are:

  1. A novel framework for snapshot hyperspectral imaging that combines compressive sensing with spectral filter arrays, enabling compact and inexpensive hyperspectral imaging.

  2. Theory and simulations analyzing the system’s spatio-spectral resolution for objects with varying complexity.

  3. A prototype device demonstrating snapshot hyperspectral recovery on real data from natural scenes.

2 Related Work

2.1 Snapshot Hyperspectral Imaging

There have been a variety of snapshot hyperspectral imaging techniques proposed and evaluated over the past decades. Most approaches can be categorized into the following groups: spectral filter array methods, coded aperture methods, speckle-based methods, and dispersion-based methods.

Spectral filter array methods use tiled spectral filter arrays on the sensor to recover the spectral channels of interest [29]. These methods can be viewed as an extension of Bayer filters for RGB imaging, since each ‘super-pixel’ in the tiled array has a grid of spectral filters. As the number of filters increases, the spectral resolution increases and the spatial resolution decreases. For instance, with an 8×8 filter array (64 spectral channels), the spatial resolution is 8× worse in each direction than that of the camera sensor. Demosaicing methods have been proposed to improve upon this in post-processing; however, they rely on intelligently guessing information that is not recorded by the sensor [35]. Recently, photonic crystal slabs have been demonstrated for compact spectroscopy based on random spectral responses (as opposed to traditional passband responses) and extended to hyperspectral imaging through the tiling of the photonic crystal slab pixels [49, 48]. While these methods have high spectral accuracy, they have only been demonstrated in a 10×10 spatial pixel configuration. Our system uses a spectral filter array, but combines it with a randomizing diffuser in a lensless imaging architecture, allowing us to recover close to the full spatial resolution of the sensor, which is not possible with traditional lens-based methods. Our method uses traditional pass-band spectral filters, but could be extended to photonic crystal slabs and other spectral filter designs.

Coded aperture methods use a coded aperture, in combination with a dispersive optical element (e.g. a prism or diffraction grating), in order to modulate the light and encode spatial-spectral information [18, 31, 47, 10]. These systems are able to capture hyperspectral images and videos but tend to be large table-top systems consisting of multiple lenses and optical components. In contrast, our system has a much smaller form factor, requiring only a camera sensor with an attached spectral filter array and a thin diffuser placed close to the sensor.

Speckle-based methods use the wavelength dependence of speckle from a random medium to achieve hyperspectral imaging. This has been demonstrated for compact spectrometers [39, 11] and extended to hyperspectral imaging [40, 16]. These systems can be compact, since they require only a sensor and a scattering medium as their optic; however, their spectral resolution is limited by the speckle correlation through wavelengths. This is challenging to design for a given application, since the spatial and spectral resolutions are highly coupled. In contrast, our system uses spectral filters that can easily be adjusted for a given application and can be selected to have variable bandwidth or non-uniform spectral sampling.

Dispersive methods utilize the dispersion from a prism or diffractive optic to encode spectral information on the sensor. This can be accomplished opportunistically by a prism added to a standard DSLR camera [6]. The resulting system has high spatial resolution, equal to that of the camera sensor, but spectral information is encoded only at the edges of objects in the scene, resulting in a highly ill-conditioned problem and lower spectral accuracy. Other methods use a diffuser (as opposed to a prism) as the dispersive element [19]. This can be more compact than prism-based systems and can have improved spatial resolution when combined with an additional RGB camera [22]. To further improve compactness, [25] uses a single diffractive optic as both the lens and the dispersive element, uniquely encoding spectral information in a spectrally-rotating point spread function (PSF).

Our system uses a lensless architecture and a spectral filter array, together with sparsity assumptions, to reconstruct 3D hyperspectral information across 64 wavelengths. The design is most similar to [25] and achieves a similar compact size; however, our system achieves better spectral accuracy, and the use of the color filter array and diffuser results in more design flexibility, as our spectral and spatial resolutions are decoupled, enabling custom sensors tailored to specific spectral filter bands that do not need to be contiguous.

2.2 Lensless Imaging

Lensless, mask-based imaging systems do not have a main lens, but instead use an amplitude or phase mask in place of imaging optics. These systems have been demonstrated for very compact, small form factor 2D imaging [4, 28, 45, 46]. They are generally amenable to compressive imaging, due to the multiplexing nature of lensless architectures; each point in the scene maps to many pixels on the sensor, allowing a sparse scene to be completely recovered from a subset of sensor pixels [15]. Alternatively, one can reconstruct higher-dimensional data, such as 3D volumes [2] or video [3], from a single 2D measurement. In this work, we use diffuser-based lensless imaging to spatially multiplex light onto a repeated spectral filter array, then reconstruct 3D hyperspectral information. Because of the compressed sensing framework, our spatial resolution is better than the array super-pixel size, despite the missing information due to the array.

Figure 2: Motivation for multiplexing: A high-NA lens captures high-resolution spatial information, but misses the yellow point source, since it comes into focus on a spectral filter pixel designed for blue light. A low-NA lens blurs the image of each point source to be the size of the spectral filter’s super-pixel, capturing accurate spectra at the cost of poor spatial resolution. Our DiffuserCam approach multiplexes the light from each point source across many super-pixels, enabling the computational recovery of both point sources and their spectra without sacrificing spatial resolution. Note that a simplified 3×3 filter array is shown here for clarity.

Figure 3: Image formation model for a scene with two point sources of different colors, each with narrow-band irradiance centered at a different wavelength (one yellow, one red). The final measurement is the sum of the contributions from each individual spectral filter band in the array. Due to the spatial multiplexing of the lensless architecture, all scene points project information to multiple spectral filters, which is why we can recover a high-resolution hyperspectral cube from a single image, after solving an inverse problem.

3 System Design Overview

Our system leverages recent advances in both spectral filter array technology and compressive lensless imaging to decouple the spectral and spatial design. Furthermore, the spectral filter arrays can be deposited directly on the camera sensor. With a diffuser as our multiplexing optic, the system is compact and inexpensive at scale.

To motivate our need for a multiplexing optic instead of an imaging lens, let us consider three candidate architectures: one with a high numerical aperture (NA) lens whose diffraction-limited spot size is matched to the filter pixel size, one with a low-NA lens whose diffraction-limited spot size is matched to the super-pixel size, and finally our design with a diffuser as a multiplexing optic. Figure 2 illustrates these three scenarios with a simplified example of a spectral filter array consisting of 3×3 spectral filters (9 total) repeated horizontally and vertically. Assume that the monochrome camera sensor has square pixels of lateral size $p_c$, the spectral filter array has square filters of size $p_f$, and each block of 3×3 spectral filters creates a super-pixel of size $p_s$, where $p_s > p_f \geq p_c$.

In the high-NA lens case, a point source in the scene will be imaged onto a single filter pixel of the sensor, and thus will only be measured if it is within the passband of that filter; otherwise it will not be recorded, Fig. 2 (left). In the low-NA lens case, each point source will be imaged to an area the size of the filter array super-pixel, and thus recorded by the sensor correctly, but at the price of low spatial resolution (matched to the super-pixel size), Fig. 2 (middle). In contrast, a multiplexing optic can avoid the gaps in the measurement of the high-NA lens and achieve better resolution than the low-NA case.

A diffuser multiplexes the light from each point source such that it hits many filter pixels, covering all of the spectral bands. The spatial resolution of the final image can then be on the order of the camera pixel size, provided that the conditions for compressed sensing are met, Fig. 2 (right). In practice, the spatial resolution of our system will be bounded by the autocorrelation of the point spread function (PSF), as detailed in Sec. 7, and the diffuser PSF must span multiple super-pixels to ensure that each point in the world is captured. Since compressive recovery is used to recover a 3D hyperspectral cube from a 2D measurement, the resolution is also a function of the scene complexity, as described in Sec. 7.

4 Imaging Forward Model

Given our design with a diffuser placed in front of a sensor that has a spectral filter array on top of it, in this section we outline a forward model for the optical system, illustrated in Fig. 3. This model is a critical piece of our iterative inverse algorithm for hyperspectral reconstruction and will also be used to analyze spatial and spectral resolution.

4.1 Spectral filter model

The spectral filter array is placed on top of an imaging sensor, such that the exposure on each pixel is the sum of point-wise multiplications with the discrete filter function,

$$b[x,y] = \sum_{\lambda=0}^{K-1} F_\lambda[x,y] \cdot v[x,y,\lambda], \qquad (1)$$

where $\cdot$ denotes point-wise multiplication, $v[x,y,\lambda]$ is the spectral irradiance incident on the filter array and $F_\lambda[x,y]$ is a 3D function describing the transmittance of light through the spectral filter for $K$ wavelength bands, which we call the filter function. In this model, we absorb the sensor’s spectral response into the definition of $F_\lambda[x,y]$. Our device’s filter function is determined experimentally (see Sec. 6.1) and shown in Fig. 4(b). This can be generalized to any arbitrary spectral filter design and does not assume alignment between the filter pixels and the sensor pixels. Here, we focus on the case of a repeating grid of spectral filters, where each ‘super-pixel’ consists of a set of narrow-band filters. Our device has an 8×8 grid of filters in each super-pixel; Fig. 3 illustrates a simplified 3×3 grid, for clarity.
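As a rough illustration of the spectral filter model, the sketch below builds an idealized filter function for a tiled array and applies it to a spectral irradiance cube. It is a minimal sketch under stated assumptions, not the calibrated filter function of the prototype: the filters are taken to be perfectly binary (each pixel passes exactly one band), perfectly aligned to the pixel grid, and two sensor pixels wide. The names `make_tiled_filter_function`, `filter_array_measurement`, and `px_per_filter` are hypothetical.

```python
import numpy as np

def make_tiled_filter_function(n_super_y, n_super_x, grid=8, px_per_filter=2):
    """Idealized filter function F[y, x, k]: 1 where pixel (y, x) lies under filter k, else 0."""
    n_filters = grid * grid
    # Filter index for every pixel of one super-pixel
    # (each filter covers px_per_filter x px_per_filter sensor pixels).
    idx = np.arange(n_filters).reshape(grid, grid)
    idx = np.kron(idx, np.ones((px_per_filter, px_per_filter), dtype=int))
    # Repeat the super-pixel across the sensor.
    idx = np.tile(idx, (n_super_y, n_super_x))
    return (idx[..., None] == np.arange(n_filters)).astype(float)

def filter_array_measurement(F, v):
    """Sum over wavelength of the point-wise product F * v, giving a 2D image."""
    return np.sum(F * v, axis=-1)

# Small example: 2 x 3 super-pixels, uniform illumination in a single band.
F = make_tiled_filter_function(n_super_y=2, n_super_x=3)
v = np.zeros(F.shape)
v[..., 5] = 1.0
b = filter_array_measurement(F, v)
print(F.shape, int(b.sum()))
```

Only the pixels under the filter for band 5 record any light, which is the structured erasure that the diffuser is later used to compensate for. A measured filter function would replace the binary array with real-valued transmittances.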

4.2 Diffuser model

The diffuser (a smooth pseudorandom phase optic) in our system achieves spatial multiplexing; this results in a compact form factor and enables reconstruction with spatial resolution better than the super-pixel size via compressed sensing. The diffuser is placed a small distance away from the sensor and an aperture is placed on the diffuser to limit higher angles. The sensor plane intensity resulting from the diffuser can be modeled as a convolution of the scene, $\mathbf{x}[x,y,\lambda]$, with the on-axis PSF, $h[x,y]$ [28]:

$$v[x,y,\lambda] = \mathrm{crop}\left( \mathbf{x}[x,y,\lambda] \overset{[x,y]}{*} h[x,y] \right), \qquad (2)$$

where $\overset{[x,y]}{*}$ represents a discrete 2D linear convolution over spatial dimensions. The crop function accounts for the finite sensor size. We assume that the PSF does not vary with wavelength and validate this experimentally in Sec. 6.2. However, this model can be easily extended to include a spectrally-varying PSF, if there is more dispersion across wavelengths.

We assume that objects are placed beyond the hyperfocal distance of the imager, therefore the PSF has negligible depth-variance and a 2D convolutional model is valid [28]. If objects are placed within the hyperfocal distance, a 3D model will be needed to account for the depth-variance of the PSF.
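The diffuser model (a wavelength-independent 2D convolution followed by a crop) can be sketched with zero-padded FFTs. This is an illustrative sketch only: the function name `diffuser_response` is hypothetical, the sensor is assumed to have the same size as the PSF array, and the crop is assumed to be centered.

```python
import numpy as np

def diffuser_response(scene, psf):
    """crop(scene * psf): linear 2D convolution per wavelength, cropped to the PSF (sensor) size."""
    ny, nx = psf.shape
    # Zero-pad to the full linear-convolution size so that nothing wraps around.
    full = (scene.shape[0] + ny - 1, scene.shape[1] + nx - 1)
    S = np.fft.rfft2(scene, s=full, axes=(0, 1))
    H = np.fft.rfft2(psf, s=full, axes=(0, 1))[..., None]  # same PSF for every band
    v_full = np.fft.irfft2(S * H, s=full, axes=(0, 1))
    # Centered crop models the finite sensor.
    y0, x0 = (full[0] - ny) // 2, (full[1] - nx) // 2
    return v_full[y0:y0 + ny, x0:x0 + nx]

rng = np.random.default_rng(0)
psf = rng.random((16, 20))
scene = np.zeros((16, 20, 3))
scene[7, 9, 1] = 1.0  # on-axis point source emitting in band 1 only
v = diffuser_response(scene, psf)
print(np.allclose(v[..., 1], psf), np.allclose(v[..., 0], 0.0))
```

With this centering convention, an on-axis point source reproduces the PSF in its own band and leaves the other bands empty; an off-axis source yields a shifted, partially cropped copy.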

4.3 Combined model

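Combining the spectral filter model with the diffuser model, we have the following discrete forward model:

$$b[x,y] = \sum_{\lambda=0}^{K-1} F_\lambda[x,y] \cdot \mathrm{crop}\left( \mathbf{x}[x,y,\lambda] \overset{[x,y]}{*} h[x,y] \right), \quad \text{i.e.} \quad \mathbf{b} = \mathbf{A}\mathbf{x}. \qquad (3)$$

The linear forward model is represented by the combined operations in matrix $\mathbf{A}$. Figure 3 illustrates the forward model for several point sources, showing the intermediate variable $v[x,y,\lambda]$, which is the scene convolved with the PSF, before point-wise multiplication by the filter function. The final image is the sum over all wavelengths.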
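Iterative solvers need both the forward operator and its adjoint, and neither requires forming the matrix explicitly. The sketch below is one possible matrix-free implementation under simplifying assumptions (scene and sensor share the same grid, the crop is centered, the PSF is wavelength-independent); the class name `SpectralDiffuserCamModel` and its layout are hypothetical. The final lines run the standard dot-product test for adjoint consistency.

```python
import numpy as np

class SpectralDiffuserCamModel:
    """Matrix-free forward model b = sum_k F_k * crop(x_k conv h) and its adjoint."""

    def __init__(self, h, F):
        self.F = F                                  # filter function, shape (ny, nx, K)
        self.ny, self.nx, self.K = F.shape
        self.full = (2 * self.ny - 1, 2 * self.nx - 1)  # linear-convolution size
        self.y0, self.x0 = (self.ny - 1) // 2, (self.nx - 1) // 2
        self.H = np.fft.rfft2(h, s=self.full)[..., None]

    def forward(self, x):
        X = np.fft.rfft2(x, s=self.full, axes=(0, 1))       # zero-pads the scene
        v_full = np.fft.irfft2(X * self.H, s=self.full, axes=(0, 1))
        v = v_full[self.y0:self.y0 + self.ny, self.x0:self.x0 + self.nx]  # crop
        return np.sum(self.F * v, axis=-1)                  # filter, then sum over bands

    def adjoint(self, b):
        u = np.zeros(self.full + (self.K,))
        # Adjoint of (sum over bands, filter, crop): replicate, filter, zero-pad.
        u[self.y0:self.y0 + self.ny, self.x0:self.x0 + self.nx] = self.F * b[..., None]
        U = np.fft.rfft2(u, axes=(0, 1))
        # Adjoint of convolution is correlation (conjugate transfer function).
        x_full = np.fft.irfft2(U * np.conj(self.H), s=self.full, axes=(0, 1))
        return x_full[:self.ny, :self.nx]                   # adjoint of the zero-padding

rng = np.random.default_rng(1)
ny, nx, K = 12, 10, 4
model = SpectralDiffuserCamModel(rng.random((ny, nx)), rng.random((ny, nx, K)))
x, b = rng.random((ny, nx, K)), rng.random((ny, nx))
# Dot-product test: <A x, b> should equal <x, A^T b> up to round-off.
print(np.sum(model.forward(x) * b), np.sum(x * model.adjoint(b)))
```

Passing the dot-product test is a useful sanity check before plugging the operator pair into a gradient-based solver, since a mismatched adjoint can silently prevent convergence.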

5 Hyperspectral Reconstruction

To recover the hyperspectral datacube from the 2D measurement, we must solve an underdetermined inverse problem. Since our system falls within the framework of compressive sensing due to our incoherent, multiplexed measurement, we use $\ell_1$ minimization. We use a weighted 3D total variation (3DTV) prior on the scene, as well as a non-negativity constraint, and a low-rank prior on the spectrum. This can be written as:

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x} \geq 0} \; \frac{1}{2}\left\| \mathbf{b} - \mathbf{A}\mathbf{x} \right\|_2^2 + \tau_1 \left\| \nabla_{xy\lambda} \mathbf{x} \right\|_1 + \tau_2 \left\| \mathbf{x} \right\|_*, \qquad (4)$$

where $\nabla_{xy\lambda}$ is the matrix of forward finite differences in the $x$, $y$, and $\lambda$ directions, and $\|\cdot\|_*$ represents the nuclear norm, which is the sum of singular values. $\tau_1$ and $\tau_2$ are the tuning parameters for the 3DTV and low-rank priors, respectively. We use the fast iterative shrinkage-thresholding algorithm (FISTA) [7] with weighted anisotropic 3DTV to solve this problem according to [26].
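To make the structure of the solver concrete, here is a deliberately simplified FISTA sketch. It is not the reconstruction used in the paper: the weighted 3DTV and nuclear-norm terms are replaced by a plain $\ell_1$ penalty on the voxel values (whose proximal operator, combined with non-negativity, is a one-line shifted threshold), and the forward model is a small explicit random matrix rather than the matrix-free operator. The function name `fista_nonneg_l1` and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

def fista_nonneg_l1(A, b, tau, n_iter=500):
    """Minimize 0.5*||b - A x||_2^2 + tau*||x||_1 subject to x >= 0 with FISTA."""
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the data-term gradient
    x = np.zeros(A.shape[1])
    z = x.copy()                         # extrapolated (momentum) point
    t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ z - b)
        # Proximal step for tau*||.||_1 plus the non-negativity constraint.
        x_new = np.maximum(z - grad / L - tau / L, 0.0)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x

# Toy compressive problem: 50 measurements, 100 unknowns, 3 non-zero entries.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100)) / np.sqrt(50)
x_true = np.zeros(100)
x_true[[5, 40, 77]] = [1.0, 2.0, 3.0]
b = A @ x_true
x_hat = fista_nonneg_l1(A, b, tau=1e-3, n_iter=2000)
print(np.argsort(x_hat)[-3:], np.linalg.norm(x_hat - x_true))
```

Swapping in the 3DTV and low-rank priors changes only the proximal step; the gradient step and momentum update keep the same form.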

Figure 4: Experimental calibration of Spectral DiffuserCam. (a) The caustic PSF (contrast-stretched and cropped), before passing through the spectral filter array, is similar at all wavelengths. (b) The spectral response with the filter array only (no diffuser). (Top left) Full measurement with illumination by a 458nm plane wave. The filter array consists of 8×8 grids of spectral filters repeating in 28×20 super-pixels. (Top right) Spectral responses of each of the 64 color channels. (Bottom) Spectral response of a single super-pixel as illumination wavelength is varied with a monochromator.

6 Implementation Details

We built a prototype system using a CMOS sensor, a hyperspectral filter array provided by Viavi Solutions (Santa Rosa, CA) [42], and an off-the-shelf diffuser (Luminit 0.5°) placed 1cm away from the sensor. The sensor has 659×494 pixels (with a pixel pitch of 9.9 μm), which we crop down to 448×320 to match the spectral filter array size. The spectral filter array consists of a grid of 28×20 super-pixels, each with an 8×8 grid of filter pixels (64 total, spanning the range 386-898nm). Each filter pixel is 20 μm in size, covering slightly more than 4 sensor pixels. The alignment between the sensor pixels and the filter pixels is unknown, requiring a calibration procedure (detailed in Sec. 6.1). The exposure time is adjusted for each image, ranging from 1ms-13ms, which is short enough for video-rate acquisition. The computational reconstruction typically takes 12-24 minutes (for 500-1000 iterations) on an RTX 2080-Ti GPU using MATLAB.
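The geometry quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes a 9.9 μm pixel pitch and 20 μm filter pixels and rounds the super-pixel width to a whole number of sensor pixels; that rounding is an assumption made here for illustration, since the true filter-to-pixel alignment is unknown and is handled by calibration.

```python
pixel_pitch_um = 9.9      # sensor pixel pitch (assumed unit: micrometers)
filter_size_um = 20.0     # filter pixel size (assumed unit: micrometers)
grid = 8                  # 8 x 8 filters per super-pixel
super_pixels = (28, 20)   # super-pixels across the array

px_per_filter = filter_size_um / pixel_pitch_um      # sensor pixels per filter side
px_per_super = round(grid * px_per_filter)           # sensor pixels per super-pixel side
cropped = (super_pixels[0] * px_per_super, super_pixels[1] * px_per_super)

n_measurements = cropped[0] * cropped[1]             # 2D sensor pixels
n_unknowns = n_measurements * grid * grid            # voxels in the hyperspectral cube

print(f"sensor pixels under one filter: {px_per_filter ** 2:.2f}")
print(f"cropped sensor size: {cropped[0]} x {cropped[1]}")
print(f"unknowns per measurement: {n_unknowns // n_measurements}")
```

The last number makes the degree of under-determination explicit: each sensor pixel must constrain 64 unknown voxels, which is why the sparsity priors in the reconstruction are essential.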

6.1 Filter Function Calibration

To calibrate the filter function ($F_\lambda[x,y]$ in Eqn. 3), including the spectral sensitivity of both the sensor and the spectral filter array, we use a Cornerstone 130 1/3m motorized monochromator (Model 74004). The monochromator creates a narrow-band source of 5nm full-width at half-maximum (FWHM) and we measure the filter response (without the diffuser) while sweeping the source by 8nm increments from 386nm to 898nm. The result is shown in Fig. 4(b).

6.2 PSF Calibration

We also need to calibrate the diffuser response by measuring the diffuser PSF pattern without the spectral filter array. Because the diffuser is relatively smooth with large features (relative to the wavelength of light), the PSF remains relatively constant as a function of wavelength, as shown in Fig. 4(a). Hence, we only need to calibrate for a single wavelength by capturing a single point source calibration image [2]. However, this is not trivial because the spectral filter array is bonded to the sensor and cannot be removed easily. In our setup, we instead take advantage of the fact that our filter array is smaller than our sensor, so we can measure the PSF using the edges of the raw sensor, by shifting the point source to scan the different parts of the PSF over the raw sensor area and stitching the sub-images together. In a system where the filter size is matched to the sensor, this trick will not be possible, but an optimization-based approach could be developed to recover the PSF from measurements.

6.3 System Non-idealities

Our reconstruction quality and spectral resolution are limited by two non-idealities in our system. First, our camera development board performs an undefined and uncontrollable non-linear contrast-stretching to all images. This makes the measurement non-linear and impedes our imaging of dim objects (since the camera performs a larger contrast stretching for dimmer images). Further, our spectral calibration may have errors, since each calibration image cannot be normalized by the intensity of light hitting the sensor. This may cause certain wavelength bands to appear brighter or dimmer than they should in our spectral reconstructions. A better camera board without automatic contrast stretching should fix this problem and provide more quantitative spectral profile reconstructions in the future.

Second, we used a simplified spectral calibration in which we measured the response with uniform spectral sampling, instead of at the true wavelengths of the filters. Due to the mismatch between our calibration scheme (measured every 8nm with constant bandwidth) and the actual spectral filters (center wavelengths spaced 5-12nm apart with bandwidths between 6-23nm), our calibration wavelengths sometimes fall between two filters, resulting in an ambiguity. Given this non-ideal calibration, our effective spectral bands are limited to 49 bands, instead of 64. In our results, we show all 64 bands, but note that some will have overlapping spectral responses. In the future, we will calibrate at the design wavelengths of the filter to fix this issue. Further, the deposition of the spectral filters directly on top of the camera pixels (requiring precise placement during the manufacturing stage) would alleviate the need for this calibration entirely.

7 Resolution Analysis

Here, we derive our theoretical resolution and experimentally validate it with our prototype system. First, we discuss spectral resolution, which is set by the filter bandwidths, and then we compute the expected two-point spatial resolution, based on the PSF autocorrelation. Since our resolution is scene-dependent, we expect the resolution to degrade with scene complexity. To characterize this, we present theory for multi-point resolution based on the condition number analysis introduced in [2]. We compare our system against those with a high-NA and low-NA lens instead of a diffuser. Our results demonstrate two-point spatial resolution of 0.19 super-pixels and multi-point spatial resolution of 0.3 super-pixels for 64 spectral channels ranging from 386-898nm.

Figure 5: Spatial resolution analysis. (a) The theoretical resolution of our system, defined as the half-width of the autocorrelation peak at 70% of its maximum value, is 0.19 super-pixels. (b) Experimental two-point reconstructions demonstrate 0.19 super-pixel resolution across all wavelengths (slices of the reconstruction shown here), matching the theoretical resolution.

7.1 Spectral Resolution

Spectral resolution is determined by the spectral channels of the filter array. As such, we expect to be able to resolve the 64 spectral channels present in our spectral filter array. The filters have an average spacing of 8nm across a 386-898nm range with bandwidths between 6-23nm. To validate our spectral resolution, we scan a point source across those wavelengths using a monochromator. Figure 6 shows a sampling of spectral reconstructions overlaid on top of each other, with the shaded blocks indicating the ground-truth monochromator spectra. Our reconstructions all match the ground-truth peaks within 5nm of the true wavelength. The small red peaks around 400nm are artifacts from the monochromator, which emitted a second peak around 400nm for the longer wavelengths.

Figure 6: Spectral resolution analysis. Sample spectra from hyperspectral reconstructions of narrow-band point sources, overlaid on top of each other, with shaded lines indicating the ground-truth. For each case, the recovered spectral peak matches the true wavelength within 5nm.

7.2 Two-point Spatial Resolution

Spatial resolution of our system, in terms of the two-point resolution, will be bounded by that of a lensless imager with the diffuser only (without the spectral filter array). The expected resolution can be defined as the autocorrelation peak half-width at 70% of the maximum value [28], Fig. 5(a). For our system, this is 3 sensor pixels, or 0.19 super-pixels. To experimentally measure the spatial resolution of our system, we image two point sources at three different wavelengths. The reconstructions in Fig. 5 show that we can resolve two point sources that are 0.19 super-pixels apart for each wavelength and orientation, as determined by applying the Rayleigh criterion. This demonstrates that our system achieves sub-super-pixel spatial resolution, consistent with the expected resolution that would be achieved without the spectral filter array.
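The autocorrelation criterion can be evaluated numerically for any measured PSF. The sketch below is a minimal version under stated assumptions: the autocorrelation is computed circularly via the FFT, only a horizontal cut through the peak is examined, and the crossing of the 70% level is found by linear interpolation. The function name `autocorr_half_width` is hypothetical, and the Gaussian test PSF is a stand-in for a real caustic pattern.

```python
import numpy as np

def autocorr_half_width(psf, level=0.7):
    """Half-width (in pixels) at which the PSF autocorrelation peak falls to `level` of its maximum."""
    P = np.fft.fft2(psf)
    # Wiener-Khinchin: the autocorrelation is the inverse FFT of the power spectrum.
    ac = np.fft.fftshift(np.real(np.fft.ifft2(np.abs(P) ** 2)))
    ac /= ac.max()
    cy, cx = np.unravel_index(np.argmax(ac), ac.shape)
    profile = ac[cy, cx:]                      # horizontal cut starting at the peak
    below = np.nonzero(profile < level)[0]
    if below.size == 0:
        return float("nan")                    # peak never drops below the level
    i = below[0]
    # Linear interpolation between the last sample above and the first sample below the level.
    return (i - 1) + (profile[i - 1] - level) / (profile[i - 1] - profile[i])

# Sanity check on a Gaussian PSF, whose autocorrelation is a Gaussian with sqrt(2) larger sigma.
sigma = 5.0
yy, xx = np.mgrid[-32:32, -32:32]
gaussian_psf = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
width_px = autocorr_half_width(gaussian_psf)
expected_px = 2 * sigma * np.sqrt(-np.log(0.7))
print(width_px, expected_px)
```

Converting to super-pixel units is a division by the super-pixel width in sensor pixels; with 16 sensor pixels per super-pixel, a half-width of 3 pixels gives 3/16 ≈ 0.19 super-pixels, matching the figure quoted in the text.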

Figure 7: Condition number analysis for Spectral DiffuserCam, as compared to a low-NA or high-NA lens. (a) Condition numbers for the 2D spatial case (single spectral channel) are calculated by generating different numbers of points on a 2D grid, with a fixed separation distance between neighboring points. (b) Condition numbers for the full spatio-spectral case are calculated on a 3D grid. A condition number below 40 is considered to be good (shown in green). The diffuser has a consistently better performance for small separation distances than either the low-NA or the high-NA lens. The diffuser can resolve objects as low as 0.3 super-pixels apart for more complex scenes, whereas the low-NA lens requires larger separation distances and the high-NA lens suffers errors due to gaps in the measurement.

7.3 Multi-point resolution

Because our image reconstruction algorithm contains nonlinear regularization terms, our reconstruction resolution will be object dependent. Hence, two-point resolution measurements are not sufficient for fully characterizing the system resolution, and should be considered a best case scenario. To better predict real-world performance, we perform a local condition number analysis, as introduced in [2], that estimates resolution as a function of object complexity. The local condition number is a proxy for how well the forward model can be inverted, given known support, and is useful for systems such as ours in which the full matrix is never explicitly calculated [9].

The local condition number theory states that, given a priori knowledge of the support of the scene, we can form a sub-matrix consisting only of the columns of the forward-model matrix $\mathbf{A}$ corresponding to the non-zero voxels. The reconstruction problem will be ill-posed if any of the sub-matrices of $\mathbf{A}$ are ill-conditioned, which can be quantified by the condition number of the sub-matrices. The worst-case condition number occurs when sources are near each other; therefore, we compute the condition number for a group of point sources with a separation varying by an integer number of voxels and repeat this for increasing numbers of point sources.
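A toy version of this analysis for the single-channel 2D case can be sketched as follows. It is an illustration of the idea rather than the analysis behind Fig. 7: the diffuser PSF is replaced by uniform random values, the high-NA lens by a one-pixel spot, shifts are circular, and the filter mask is idealized as binary and pixel-aligned. The names `channel_mask` and `local_condition_number` are hypothetical.

```python
import numpy as np

def channel_mask(n=64, grid=8, px=2, k=0):
    """Binary mask of the sensor pixels that sit under filter k of a tiled grid x grid array."""
    idx = np.kron(np.arange(grid * grid).reshape(grid, grid),
                  np.ones((px, px), dtype=int))
    reps = n // (grid * px)
    return (np.tile(idx, (reps, reps)) == k).astype(float)

def local_condition_number(psf, mask, n_side, sep):
    """Condition number of the sub-matrix for an n_side x n_side grid of points, sep pixels apart."""
    cols = []
    for i in range(n_side):
        for j in range(n_side):
            # Column for one point source: the shifted PSF seen through the filter mask.
            shifted = np.roll(psf, (i * sep, j * sep), axis=(0, 1))
            cols.append((mask * shifted).ravel())
    return np.linalg.cond(np.stack(cols, axis=1))

rng = np.random.default_rng(0)
mask = channel_mask()
diffuser_psf = rng.random((64, 64))      # stand-in for a multiplexing PSF
lens_psf = np.zeros((64, 64))
lens_psf[0, 0] = 1.0                     # idealized high-NA lens: one-pixel spot

for sep in (1, 4, 16):
    print(sep,
          local_condition_number(diffuser_psf, mask, 3, sep),
          local_condition_number(lens_psf, mask, 3, sep))
```

In this toy setting the lens sub-matrix becomes singular whenever a point lands on a pixel outside the chosen filter, and is well-conditioned only when every point happens to fall under it, which mirrors the erratic behavior attributed to the high-NA lens. The multiplexing PSF keeps the sub-matrix well-conditioned regardless of where the points fall.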

Figure 8: Simulated hyperspectral reconstructions comparing our Spectral DiffuserCam result with alternative design options. (a) Resolution target with different sections illuminated by narrow-band 634nm (red), 570nm (green), 474nm (blue), and broadband (white) sources. (b) Reconstruction of the target by Spectral DiffuserCam, (c) a low-NA lens design, and (d) a high-NA lens design, each showing the raw data, false-colored reconstruction and sum projection. The diffuser achieves higher spatial resolution and better accuracy than the low-NA and the high-NA lens.

In Fig. 7, we calculate the local condition number for two cases: the 2D spatial reconstruction case, considering only a single spectral channel, and the 3D case, considering points with varying spatial and spectral positions. For comparison, we also simulate the condition number for a low-NA and high-NA lens, as introduced in Sec. 3. The results show that our diffuser design has a consistently lower condition number than either the low- or high-NA lens, having a condition number below 40 for separation distances of greater than 0.3 super-pixels. The low-NA lens needs a separation distance closer to 1 super-pixel, as expected, and the high-NA lens has an erratic condition number due to the missing information in the measurement.

From this analysis, we can see that, beyond 0.3 super-pixels separation, the condition number for the diffuser does not get arbitrarily worse for increasing scene complexity. Thus, our expected spatial resolution is approximately 0.3 super-pixels.

7.4 Simulated Resolution Target Reconstruction

Next, we validate the results of our condition number analysis through simulated reconstructions of a resolution target with different spatial locations illuminated by different sources (red, green, blue and white light), as shown in Fig. 8. For each simulation, we add Gaussian noise to the measurement and run the reconstruction for 2,000 iterations of FISTA with 3DTV. Our system resolves features that are 0.3 super-pixels apart, whereas the low-NA lens can only resolve features that are roughly 1 super-pixel apart and the high-NA lens results in gaps, validating our predicted performance.

Figure 9: (a) Experimental reconstruction of a broadband resolution target, showing two sum projections of the reconstructed datacube (top and bottom), demonstrating spatial resolution of 0.3 super-pixels. (b) Experimental reconstruction of 10 multi-colored LEDs in a grid with 0.4 super-pixels spacing (four red LEDs on left, four green in middle, two blue at right). We again show two sum projections of the datacube (top and bottom). The LEDs are clearly resolved spatially and spectrally, and spectral line profiles for each color LED closely match the ground truth spectra from a spectrometer.

8 Experimental Results

We start with experimental reconstructions of simple objects with known properties: a broadband USAF resolution target displayed on a computer monitor and a grid of RGB LEDs (Fig. 9). We resolve points that are 0.3 super-pixels apart, which matches our expected multi-point resolution based on the condition number analysis above. For the RGB LED scene, the ground truth spectral profiles of the LEDs are measured using a spectrometer, and our recovered spectral profiles closely match the ground truth, as shown in Fig. 9(b).

Next, we show reconstructions of more complex objects, either displayed on a computer monitor or illuminated with two halogen lamps (Fig. 10). For four points in each scene, we plot the recovered spectral line profiles against the ground truth measured by a Thorlabs CCS200 spectrometer, showing that we can accurately recover the spectra. A reference RGB image is shown for each scene, demonstrating that the reconstructions spatially match the expected scene.

Figure 10: Experimental hyperspectral reconstructions. (a-c) Reconstructions of color images displayed on a computer monitor and (d) a Thorlabs plush toy placed in front of the imager and illuminated by two halogen lamps. The raw measurement, false-color images, sum projections, and spectral line profiles for four spatial points are shown for each scene. The ground truth spectral line profiles, measured using a spectrometer, are plotted in black for reference. Spectral line profiles in (a,b) show the average and standard deviation of the spectral profiles across the area of the box or letter in the object, whereas (c,d) show a line profile from a single spatial point in the scene.

9 Discussion

A key advantage of our design over previous work is the flexibility to choose the spectral filters in order to tailor the system to a specific application. For example, one can non-linearly sample a wide range of wavelengths, which is difficult with many previous snapshot hyperspectral imagers. In the future, we plan to design implementations specific to various task-based applications, which could make hyperspectral imaging more easily adopted, especially since the price is several orders of magnitude lower than that of currently available hyperspectral cameras.

Currently, we experimentally achieve a spatial resolution of 0.3 super-pixels, or 5 sensor pixels. In future designs, we should be able to achieve the full sensor resolution (along with better-quality reconstructions) by optimizing the randomizing optic instead of using an off-the-shelf diffuser. This could be achieved by end-to-end optical design [43, 38].

Our system has two main limitations: light throughput and scene dependence. Because we use narrow-band spectral filters, much of the incoming light is rejected. This provides good spectral accuracy and discrimination, but at the cost of low light throughput. In addition, since the light is spread by the diffuser over many pixels, the signal-to-noise ratio (SNR) is further decreased. Hence, our imager is not currently suitable for low-light conditions. This limitation can be mitigated in the future by using photonic crystal slabs instead of narrow-band filters, which would increase light throughput while maintaining spatio-spectral resolution and accuracy [48]. In addition, end-to-end design of both the spectral filters and the phase mask should improve efficiency, since application-specific designs can use only the set of wavelengths necessary for a particular task, without sampling the in-between wavelengths. Reducing the number of spectral bands improves both light throughput (because more sensor area will be dedicated to each spectral band) and spatial resolution (because the super-pixels will be smaller).
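The trade-off in the last sentence can be made concrete with a little arithmetic. The sketch below assumes a square super-pixel tiled with equal-size filters of fixed physical size, one filter per band, and compares a design against a 64-band reference (the band count of the prototype reported in this paper). The function name and the returned fields are hypothetical, and the numbers ignore filter transmission and diffuser effects.

```python
import math


def superpixel_tradeoff(n_bands, ref_bands=64):
    """Geometry of a square super-pixel with one equal-size filter per band."""
    side = math.isqrt(n_bands)
    if side * side != n_bands:
        raise ValueError("this sketch assumes a square super-pixel")
    ref_side = math.isqrt(ref_bands)
    return {
        # Super-pixel width, in units of one filter.
        "side_filters": side,
        # Fraction of the sensor area dedicated to each band.
        "area_fraction_per_band": 1.0 / n_bands,
        # Super-pixel width relative to the reference design.
        "relative_superpixel_side": side / ref_side,
        # Sensor area per band relative to the reference design.
        "relative_area_per_band": ref_bands / n_bands,
    }


for bands in (64, 16, 4):
    print(bands, superpixel_tradeoff(bands))
```

Under these assumptions, going from 64 to 16 bands halves the super-pixel width and dedicates four times as much sensor area to each remaining band.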

Our second limitation is scene dependence, as our reconstruction algorithm relies on object sparsity (e.g., sparse gradients). Because of the non-linear regularization term, it is difficult to predict performance, and reconstructions may suffer from artifacts if the scene is not sufficiently sparse. Recent advances in machine learning and inverse problems seek to provide better signal representations, enabling the reconstruction of more complicated, denser scenes [32, 8]. In addition, machine learning could be useful for speeding up the reconstruction algorithm [36], as well as for using the imager more directly for a higher-level task, such as classification [14].

10 Conclusion

Our work presents a new hyperspectral imaging modality that combines a color filter array and lensless imaging techniques for an ultra-compact and inexpensive hyperspectral camera. The spectral filter array encodes spectral information onto the sensor and the diffuser multiplexes the incoming light such that each point in the world maps to many spectral filters. The multiplexed nature of the measurement allows us to use compressive sensing to reconstruct high spatio-spectral resolution from a single 2D measurement. We provided an analysis for the expected resolution of our imager and experimentally characterized the two-point and multi-point resolution of the system. Finally, we built a prototype and demonstrated reconstructions of complex spatio-spectral scenes, achieving up to 0.19 super-pixel spatial resolution across 64 spectral bands.

Funding Information

This work was supported by the Gordon and Betty Moore Foundation Data-Driven Discovery Initiative Grant GBMF4562, and STROBE: A National Science Foundation Science & Technology Center under Grant No. DMR 1548924. Kristina Monakhova and Kyrollos Yanny acknowledge funding from the National Science Foundation Graduate Research Fellowship Program (NSF GRFP) (DGE 1752814). The camera and spectral filter array were provided by Viavi Solutions (Santa Rosa, CA).


Acknowledgments

The authors would like to thank Viavi Solutions (Santa Rosa, CA), and particularly Bill Houck, for their technical help and support, as well as Nick Antipa and Grace Kuo for helpful discussions.


Disclosures

The authors declare no conflicts of interest.




References

  1. H. Akbari, L. Halig, D. M. Schuster, B. Fei, A. Osunkoya, V. Master, P. Nieh and G. Chen (2012) Hyperspectral imaging and quantitative analysis for prostate cancer detection. Journal of Biomedical Optics 17 (7), pp. 076005. Cited by: §1.
  2. N. Antipa, G. Kuo, R. Heckel, B. Mildenhall, E. Bostan, R. Ng and L. Waller (2018) DiffuserCam: lensless single-exposure 3D imaging. Optica 5 (1), pp. 1–9. Cited by: §1, §2.2, §6.2, §7.3, §7.
  3. N. Antipa, P. Oare, E. Bostan, R. Ng and L. Waller (2019) Video from stills: lensless imaging with rolling shutter. In 2019 IEEE International Conference on Computational Photography (ICCP), pp. 1–8. Cited by: §2.2.
  4. M. S. Asif, A. Ayremlou, A. Sankaranarayanan, A. Veeraraghavan and R. G. Baraniuk (2016) FlatCam: thin, lensless cameras using coded aperture and computation. IEEE Transactions on Computational Imaging 3 (3), pp. 384–397. Cited by: §2.2.
  5. C. P. Bacon, Y. Mattley and R. DeFrece (2004) Miniature spectroscopic instrumentation: applications to biology and chemistry. Review of Scientific Instruments 75 (1), pp. 1–16. Cited by: §1.
  6. S. Baek, I. Kim, D. Gutierrez and M. H. Kim (2017) Compact single-shot hyperspectral imaging using a prism. ACM Transactions on Graphics (TOG) 36 (6), pp. 1–12. Cited by: §2.1.
  7. A. Beck and M. Teboulle (2009) A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences 2 (1), pp. 183–202. Cited by: §5.
  8. A. Bora, A. Jalal, E. Price and A. G. Dimakis (2017) Compressed sensing using generative models. arXiv preprint arXiv:1703.03208. Cited by: §9.
  9. E. J. Candès and C. Fernandez-Granda (2014) Towards a mathematical theory of super-resolution. Communications on Pure and Applied Mathematics 67 (6), pp. 906–956. Cited by: §7.3.
  10. X. Cao, T. Yue, X. Lin, S. Lin, X. Yuan, Q. Dai, L. Carin and D. J. Brady (2016) Computational snapshot multispectral cameras: toward dynamic capture of the spectral world. IEEE Signal Processing Magazine 33 (5), pp. 95–108. Cited by: §2.1.
  11. M. Chakrabarti, M. L. Jakobsen and S. G. Hanson (2015) Speckle-based spectrometer. Optics Letters 40 (14), pp. 3264–3267. Cited by: §2.1.
  12. K. Chao, C. Yang, Y. Chen, M. Kim and D. Chan (2007) Hyperspectral-multispectral line-scan imaging system for automated poultry carcass inspection applications for food safety. Poultry Science 86 (11), pp. 2450–2460. Cited by: §1.
  13. S. Delalieux, A. Auwerkerken, W. W. Verstraeten, B. Somers, R. Valcke, S. Lhermitte, J. Keulemans and P. Coppin (2009) Hyperspectral reflectance and fluorescence imaging to detect scab induced stress in apple leaves. Remote Sensing 1 (4), pp. 858–874. Cited by: §1.
  14. S. Diamond, V. Sitzmann, S. Boyd, G. Wetzstein and F. Heide (2017) Dirty pixels: optimizing image classification architectures for raw sensor data. arXiv preprint arXiv:1701.06487. Cited by: §9.
  15. R. Fergus, A. Torralba and W. T. Freeman (2006) Random lens imaging. Cited by: §2.2.
  16. R. French, S. Gigan and O. L. Muskens (2017) Speckle-based hyperspectral imaging combining multiple scattering and compressive sensing in nanowire mats. Optics Letters 42 (9), pp. 1820–1823. Cited by: §1, §2.1.
  17. N. Gat (2000) Imaging spectroscopy using tunable filters: a review. In Wavelet Applications VII, Vol. 4056, pp. 50–64. Cited by: §1.
  18. M. E. Gehm, R. John, D. J. Brady, R. M. Willett and T. J. Schulz (2007) Single-shot compressive spectral imaging with a dual-disperser architecture. Optics Express 15 (21), pp. 14013–14027. Cited by: §2.1.
  19. M. A. Golub, A. Averbuch, M. Nathan, V. A. Zheludev, J. Hauser, S. Gurevitch, R. Malinsky and A. Kagan (2016) Compressed sensing snapshot spectral imaging by a regular digital camera with an added optical diffuser. Applied Optics 55 (3), pp. 432–443. Cited by: §2.1.
  20. A. Gowen, C. O’Donnell, P. Cullen, G. Downey and J. Frias (2007) Hyperspectral imaging–an emerging process analytical tool for food quality and safety control. Trends in Food Science & Technology 18 (12), pp. 590–598. Cited by: §1.
  21. R. O. Green, M. L. Eastwood, C. M. Sarture, T. G. Chrien, M. Aronsson, B. J. Chippendale, J. A. Faust, B. E. Pavri, C. J. Chovit and M. Solis (1998) Imaging spectroscopy and the airborne visible/infrared imaging spectrometer (AVIRIS). Remote Sensing of Environment 65 (3), pp. 227–248. Cited by: §1.
  22. J. Hauser, M. A. Golub, A. Averbuch, M. Nathan, V. A. Zheludev and M. Kagan (2020) Dual-camera snapshot spectral imaging with a pupil-domain optical diffuser and compressed sensing algorithms. Applied Optics 59 (4), pp. 1058–1070. Cited by: §2.1.
  23. A. Hennessy, K. Clarke and M. Lewis (2020) Hyperspectral classification of plants: a review of waveband selection generalisability. Remote Sensing 12 (1), pp. 113. Cited by: §1.
  24. W. Huang, J. Li, Q. Wang and L. Chen (2015) Development of a multispectral imaging system for online detection of bruises on apples. Journal of Food Engineering 146, pp. 62–71. Cited by: §1.
  25. D. S. Jeon, S. Baek, S. Yi, Q. Fu, X. Dun, W. Heidrich and M. H. Kim (2019) Compact snapshot hyperspectral imaging with diffracted rotation. ACM Transactions on Graphics (TOG) 38 (4). Cited by: §1, §2.1.
  26. U. S. Kamilov (2016) A parallel proximal algorithm for anisotropic total variation minimization. IEEE Transactions on Image Processing 26 (2), pp. 539–548. Cited by: §5.
  27. R. T. Kester, N. Bedard, L. S. Gao and T. S. Tkaczyk (2011) Real-time snapshot hyperspectral imaging endoscope. Journal of Biomedical Optics 16 (5), pp. 056005. Cited by: §1.
  28. G. Kuo, N. Antipa, R. Ng and L. Waller (2017) DiffuserCam: diffuser-based lensless cameras. In Computational Optical Sensing and Imaging, pp. CTu3B–2. Cited by: §2.2, §4.2, §4.2, §7.2.
  29. P. Lapray, X. Wang, J. Thomas and P. Gouton (2014) Multispectral filter arrays: recent advances and practical implementation. Sensors 14 (11), pp. 21626–21659. Cited by: §2.1.
  30. R. M. Levenson, D. T. Lynch, H. Kobayashi, J. M. Backer and M. V. Backer (2008) Multiplexing with multispectral imaging: from mice to microscopy. ILAR Journal 49 (1), pp. 78–88. Cited by: §1.
  31. X. Lin, Y. Liu, J. Wu and Q. Dai (2014) Spatial-spectral encoded compressive hyperspectral imaging. ACM Transactions on Graphics (TOG) 33 (6), pp. 1–11. Cited by: §2.1.
  32. Z. Liu and J. Scarlett (2020) Information-theoretic lower bounds for compressive sensing with generative models. IEEE Journal on Selected Areas in Information Theory. Cited by: §9.
  33. G. Lu and B. Fei (2014) Medical hyperspectral imaging: a review. Journal of Biomedical Optics 19 (1), pp. 010901. Cited by: §1.
  34. G. Lu, L. V. Halig, D. Wang, X. Qin, Z. G. Chen and B. Fei (2014) Spectral-spatial classification for noninvasive cancer detection using hyperspectral imaging. Journal of Biomedical Optics 19 (10), pp. 106004. Cited by: §1.
  35. S. Mihoubi, O. Losson, B. Mathon and L. Macaire (2017) Multispectral demosaicing using pseudo-panchromatic image. IEEE Transactions on Computational Imaging 3 (4), pp. 982–995. Cited by: §2.1.
  36. K. Monakhova, J. Yurtsever, G. Kuo, N. Antipa, K. Yanny and L. Waller (2019) Learned reconstructions for practical mask-based lensless imaging. Optics Express 27 (20), pp. 28075–28090. Cited by: §9.
  37. A. Orth, M. J. Tomaszewski, R. N. Ghosh and E. Schonbrun (2015) Gigapixel multispectral microscopy. Optica 2 (7), pp. 654–662. Cited by: §1.
  38. Y. Peng, Q. Sun, X. Dun, G. Wetzstein, W. Heidrich and F. Heide (2019) Learned large field-of-view imaging with thin-plate optics. ACM Transactions on Graphics (TOG) 38 (6), pp. 219. Cited by: §9.
  39. B. Redding, S. F. Liew, R. Sarma and H. Cao (2013) Compact spectrometer based on a disordered photonic chip. Nature Photonics 7 (9), pp. 746. Cited by: §2.1.
  40. S. K. Sahoo, D. Tang and C. Dang (2017) Single-shot multispectral imaging with a monochromatic camera. Optica 4 (10), pp. 1209–1213. Cited by: §1, §2.1.
  41. V. Saragadam and A. C. Sankaranarayanan (2020) Programmable spectrometry: per-pixel material classification using learned spectral filters. In 2020 IEEE International Conference on Computational Photography (ICCP), pp. 1–10. Cited by: §1.
  42. S. Saxe, L. Sun, V. Smith, D. Meysing, C. Hsiung, A. Houck, M. Von Gunten, C. Hruska, D. Martin and R. Bradley (2018) Advances in miniaturized spectral sensors. In Next-Generation Spectroscopic Technologies XI, Vol. 10657, pp. 106570B. Cited by: §1, §6.
  43. V. Sitzmann, S. Diamond, Y. Peng, X. Dun, S. Boyd, W. Heidrich, F. Heide and G. Wetzstein (2018) End-to-end optimization of optics and image processing for achromatic extended depth of field and super-resolution imaging. ACM Transactions on Graphics (TOG) 37 (4), pp. 1–13. Cited by: §9.
  44. D. Sun (2010) Hyperspectral imaging for food quality analysis and control. Elsevier. Cited by: §1.
  45. J. Tanida, T. Kumagai, K. Yamada, S. Miyatake, K. Ishida, T. Morimoto, N. Kondou, D. Miyazaki and Y. Ichioka (2001) Thin observation module by bound optics (TOMBO): concept and experimental verification. Applied Optics 40 (11), pp. 1806–1813. Cited by: §2.2.
  46. J. Tanida, R. Shogenji, Y. Kitamura, K. Yamada, M. Miyamoto and S. Miyatake (2003) Color imaging with an integrated compound imaging system. Optics Express 11 (18), pp. 2109–2117. Cited by: §2.2.
  47. A. Wagadarikar, R. John, R. Willett and D. Brady (2008) Single disperser design for coded aperture snapshot spectral imaging. Applied Optics 47 (10), pp. B44–B51. Cited by: §1, §2.1.
  48. Z. Wang, S. Yi, A. Chen, M. Zhou, T. S. Luk, A. James, J. Nogan, W. Ross, G. Joe and A. Shahsafi (2019) Single-shot on-chip spectral sensors based on photonic crystal slabs. Nature Communications 10 (1), pp. 1–6. Cited by: §2.1, §9.
  49. Z. Wang and Z. Yu (2014) Spectral analysis based on compressive sensing in nanophotonic structures. Optics Express 22 (21), pp. 25608–25614. Cited by: §2.1.
  50. C. Zhang, M. Rosenberger, A. Breitbarth and G. Notni (2016) A novel 3D multispectral vision system based on filter wheel cameras. In 2016 IEEE International Conference on Imaging Systems and Techniques (IST), pp. 267–272. Cited by: §1.