Imaging with SPADs and DMDs: Seeing through Diffraction-Photons
Abstract
This paper addresses the problem of imaging in the presence of diffraction-photons. Diffraction-photons arise from the low contrast ratio of DMDs ( ), and severely degrade the quality of images captured by SPAD-based systems.
Herein, a joint illumination-deconvolution scheme is designed to overcome diffraction-photons, enabling the acquisition of intensity and depth images. Additionally, a proof-of-concept experiment is conducted to demonstrate the viability of the designed scheme. It is shown that by co-designing the illumination and deconvolution phases of imaging, one can substantially overcome diffraction-photons.
I Introduction
Today, imaging with single-photon avalanche diodes (SPADs) is rapidly gaining attention. The high sensitivity and temporal resolution of SPADs are finding favor in a variety of applications, ranging from light-in-flight recording [Gariepy] to imaging around corners [NLOS].
Traditionally, photon detection has relied on photomultiplier tubes (PMTs). PMTs have a fast response time and provide a high amplification gain, yet they are fairly large vacuum tubes and thus not well-suited for cameras.
Nowadays, SPADs are increasingly becoming the preferred solid-state device for photon detection. Low dark counts ( Hz) and high quantum efficiencies (up to ) [MPD] are among the attractive features of SPADs.
SPAD cameras are, however, still in their infancy. Currently, commercially available SPAD cameras have low pixel resolutions ( pixels). To increase the spatial resolution of single-pixel SPAD cameras, raster-scanning, whereby pixel values of a scene are sequentially acquired, is typically used.
Raster-scanning is commonly achieved by means of galvanometric mirrors. In this approach, light is steered to positions of a scene mechanically via a pair of mirrors. Such mechanical systems, however, are cumbersome.
In contrast to the single-point illumination of galvanometric systems, digital micromirror devices (DMDs) can project optical patterns. A DMD is a spatial light modulator (SLM) comprising a matrix of micromirrors integrated with a semiconductor chip. When light shines on a DMD, high-resolution ( pixels) [TI] optical patterns are projected; in our experiment, we use these optical patterns to illuminate a scene.
A challenging drawback of DMD-based projection systems, compared with their galvanometric counterparts, is the presence of scattered light. Light diffracted by the edges of DMD structures, such as mirrors, gives rise to scattered light; this light can undergo multiple internal reflections within the DMD and exit as a wide cone of light (Fig. 1). Due to scattered light, scene locations that would ideally be masked by DMD mirrors in the off-state are illuminated; this imperfection hinders the collection of quality images.
The degree to which a DMD system rejects light is quantified by the contrast ratio: the full-on/full-off contrast ratio is defined as $\mathrm{CR} = L_{\mathrm{on}} / L_{\mathrm{off}}$, where $L_{\mathrm{on}}$ is the measured luminance when all DMD mirrors are in the on-state, resulting in a fully illuminated screen, and $L_{\mathrm{off}}$ is the measured luminance when all mirrors are in the off-state. The higher the contrast ratio, the lower the impact of scattered light on image quality. We shall refer to photons originating from a DMD's scattered light as diffraction-photons (Fig. 1).
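As a quick illustration of why the contrast ratio matters, the sketch below computes the off-state leakage fraction implied by the definition above. The contrast-ratio values are assumed for illustration, not measured values from this paper.

```python
# Illustrative arithmetic with assumed numbers (not values from the paper):
# the full-on/full-off contrast ratio CR = L_on / L_off bounds the fraction
# of optical power that leaks past off-state mirrors as scattered light.
def off_state_leakage(contrast_ratio: float) -> float:
    """Fraction of full-on luminance measured when all mirrors are off."""
    return 1.0 / contrast_ratio

leak_low_cr = off_state_leakage(400.0)    # a low-contrast DMD (assumed CR)
leak_high_cr = off_state_leakage(2000.0)  # a higher-contrast DMD (assumed CR)
```

The higher the contrast ratio, the smaller the leaked fraction, and hence the fewer diffraction-photons reach nominally masked scene locations.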
There are further challenges. For example, increasing the illumination power may not always be a feasible option for enhancing image quality, such as when imaging sensitive biological samples [Handbook]. Moreover, it is desirable to operate an imaging system in a gateless manner in order to reduce the number of measurements.
This study aims to overcome diffraction-photons and provide an alternative to raster-scanning. Using a SPAD, a DMD, and a co-designed illumination-deconvolution scheme, we experimentally demonstrate a system capable of acquiring intensity and depth images in an environment tainted by diffraction-photons.
The central idea of this paper can be summarized as follows:

Project overlapping illumination blocks to overcome diffraction-photons, and let deconvolution algorithms untangle pixel values.
The remainder of this paper is organized into seven sections: In Section II, we introduce some preliminaries. In Section III, we review prior work. In Section IV, we describe a method for scene illumination, and we then present our imageformation models in Section V. In Section VI, we describe details of our experimental setup. In Section VII, we present and discuss our experimental results. In Section VIII, we draw conclusions from our findings and outline future research directions.
II Preliminaries
In this section, we introduce some principles that will serve as starting points for the subsequent sections. This section is based in part on [System].
Each pixel, $i$, of the scene has a depth $z_i$ and reflectivity $r_i$, and is illuminated $N$ times by a laser pulse with a waveform $s(t)$. The rate of photons impinging on a SPAD is given by

$\lambda_i(t) = r_i\, s\!\left(t - \tfrac{2 z_i}{c}\right) + b$  (1)

where $c$ is the speed of light and $b$ is the average photon rate due to ambient light.

The quantum efficiency of the SPAD lowers the photon rate by a factor of $\eta$, and dark counts, $d$, of the SPAD are added:

$\tilde{\lambda}_i(t) = \eta\!\left(r_i\, s\!\left(t - \tfrac{2 z_i}{c}\right) + b\right) + d$  (2)

A time-correlated single-photon counting (TCSPC) system quantizes the observation interval, $T$, into time-bins of duration $\Delta$. Accordingly, the average number of photon counts in a time-bin (per illumination) is

$\mu_i[k] = \int_{(k-1)\Delta}^{k\Delta} \tilde{\lambda}_i(t)\, \mathrm{d}t$  (3)

where $k$ is the index of a time-bin, $k \in \{1, \dots, n_t\}$, with $n_t = T/\Delta$.

After $N$ illuminations, the number of photons, $y_i[k]$, detected in a time-bin draws values from a Poisson distribution:

$y_i[k] \sim \mathrm{Poisson}\!\left(N \mu_i[k]\right)$  (4)
The signal-to-background ratio (SBR), a measure of signal fidelity, is defined as follows: $\mathrm{SBR} = S/B$, where $S$ and $B$ are the mean number of signal and noise photons, respectively. Moreover, dead-time is defined as the time a SPAD needs to reset to its initial state after an avalanche event occurs. During this period of time, arriving photons go undetected.
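The detection model of Eqs. (1)-(4) can be sketched numerically. The snippet below simulates Poisson photon counts for one pixel and estimates the SBR; all parameter values (pulse shape, rates, depth) are assumed for illustration and are not the paper's measured values.

```python
import numpy as np

# Minimal simulation of the detection model in Eqs. (1)-(4), with assumed
# parameter values (Gaussian pulse, arbitrary power units).
rng = np.random.default_rng(0)

c = 3e8            # speed of light (m/s)
N = 1000           # number of laser illuminations
delta = 64e-12     # time-bin duration (s, assumed)
n_t = 256          # number of time-bins (assumed)
eta = 0.35         # SPAD quantum efficiency (assumed)
b = 1e5            # ambient-photon rate (photons/s, assumed)
d = 100.0          # dark-count rate (counts/s, assumed)
r = 0.8            # pixel reflectivity (assumed)
z = 1.0            # pixel depth (m, assumed)

t = (np.arange(n_t) + 0.5) * delta           # time-bin centers
tof = 2 * z / c                              # time of flight, 2z/c
sigma = 100e-12                              # pulse width (assumed)
s = 1e7 * np.exp(-0.5 * ((t - tof) / sigma) ** 2)  # waveform s(t - 2z/c)

lam = eta * (r * s + b) + d                  # Eq. (2): detected photon rate
mu = lam * delta                             # Eq. (3): mean counts per bin
y = rng.poisson(N * mu)                      # Eq. (4): counts after N shots

signal = N * eta * r * s * delta             # mean signal photons per bin
noise = N * (eta * b + d) * delta            # mean noise photons per bin
sbr = signal.sum() / (noise * n_t)           # SBR = S / B
```

Note how the counts peak at the time-bin nearest the ToF, which is what the depth-recovery stage later exploits.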
III Prior Work
Based on the method of acquisition employed, the literature can be categorized into the following classes:
Raster-Scanning: In raster-scanning, pixel values are sequentially acquired. Typically, a pair of galvanometric mirrors or a motorized translation stage is used to obtain a scene's spatial content [First_Photon, Stanford, Photonics, Shin].
Our imaging system and the systems in this category require the same number of measurements to reconstruct an intensity and depth image. The main problems associated with mechanical scanning systems such as galvanometric mirrors or motorized translation stages, however, are that they are bulky, require high power levels, and, in the case of galvanometric mirrors, suffer from geometric distortion [ThorlabsD]. A solution to these problems is to employ a DMD, which we use in our experiment.
Time-Gated Compressive Sensing: In [Sweep2, Sweep3, Sweep4], 3D imaging systems based on compressive sensing were demonstrated. Using range gating, time-slices of a scene are obtained, from which depth images are reconstructed.
While such systems can be used to reduce the impact of diffraction-photons, a full time sweep is required to construct a 3D image, as gating is needed to distinguish objects that fall within the same range interval.
What sets our imaging system apart from others in this category is that it is gateless, hence requiring fewer measurements to reconstruct a 3D image. The number of measurements in our imaging system is equal to the number of pixels, as opposed to gated compressive sensing, which requires a number of measurements equal to the number of pixels divided by a compression constant (typically between 2 and 4) and multiplied by the number of time-bins.
Gateless Compressive Sensing: Methods of circumventing the need for time gating have been proposed. In [Andrea], an imaging system able to construct a depth image in a gateless manner was demonstrated. Prior to recovering a scene's spatial information, parametric deconvolution is first used to estimate the set of depths present in the scene. Once the set of depths is determined, a compressive-sensing approach is used to recover the scene's spatial content.
A major obstacle to adopting the aforementioned system is the presence of dead-time interference. To give an example, consider the following: an object in the background of a scene may not be visible, as photons reflected from it may go undetected due to the dead-time initiated by preceding photons reflected from foreground objects. A way to resolve this obstacle is to use a photodiode operating in the linear mode (analogue mode) [Sun], as opposed to the Geiger mode (photon-counting mode); this, however, entails a loss in photon-detection sensitivity.
In our experiment, we employ a SPAD (Geiger mode), enabling single-photon detection sensitivity. We also use block illumination (Section IV), which ameliorates dead-time interference, allowing our imaging system to operate in a gateless manner.
Epipolar Imaging: A relatively new method of image acquisition is epipolar imaging [Epi1, Epi2, Epi3]. In epipolar imaging, an illumination sheet sweeps through a scene, and an array of pixels on an epipolar plane of illumination detects incoming light. This method of imaging can potentially overcome diffraction-photons as it limits indirect light. To realize such a system, however, some hardware, such as controllable mirrors, is needed to select an epipolar plane. Our imaging system is free of such hardware, making it simpler to operate.
IV Illumination
Spatial resolution can be traded off for higher signal power. At low illumination levels, raster-scanning (Fig. 1(a)) can yield inaccurate estimates of intensity and depth images due to the presence of diffraction-photons and a low SBR. To improve signal power, one can scan large patches of the scene; this, however, comes at the expense of a loss in spatial resolution (Fig. 1(b)).
A simple yet effective approach to boosting signal power while maintaining a given spatial resolution is to use an illumination window (for example, of size $w \times w$) that scans the scene in steps of one pixel (Fig. 1(c) provides an illustration). This approach improves signal power, as more photons are collected, and simultaneously retains the native resolution. Additionally, dead-time interference is alleviated, as the optical power is concentrated within an illumination block, where depth values of a scene are notably correlated.
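The sliding-window illumination can be sketched as follows. The grid size, window size, and the cyclic (wrap-around) stepping, which yields one pattern per scene pixel, are assumptions made for illustration.

```python
import numpy as np

# Sketch of the overlapping-block illumination of Section IV: a w x w window
# slides over an n x n pixel grid in one-pixel steps, one pattern per pixel.
def illumination_pattern(n, w, top, left):
    """Binary DMD pattern: a w x w on-block whose corner sits at (top, left)."""
    base = np.zeros((n, n))
    base[:w, :w] = 1.0
    # wrap-around placement (assumed) so the number of patterns equals n * n
    return np.roll(np.roll(base, top, axis=0), left, axis=1)

n, w = 8, 3
patterns = [illumination_pattern(n, w, i, j)
            for i in range(n) for j in range(n)]

# With one pattern per pixel, every scene pixel falls inside w * w overlapping
# blocks, boosting collected signal while retaining the native resolution.
coverage = sum(patterns)
```

Every pixel is illuminated by $w^2$ overlapping blocks, which is precisely the signal boost the scheme relies on; the price is the blur that the deconvolution stage of Section V later removes.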
A by-product, however, of using overlapping scanning blocks (Fig. 1(c)) is a blur artifact, as photons from adjacent pixels are collected. In Section V, we overcome this challenge via a deconvolution technique designed for the problem at hand.
A key advantage of the proposed scheme is that it provides (via optimization) a means to relax the trade-off between spatial resolution and signal power.
Let us now discuss the two prime factors that need to be taken into consideration when selecting an illumination window size:

(a) Dead-time and SPAD saturation: Increasing the window size results in more signal photons arriving at the SPAD. This, however, improves performance only to a limited extent, because the dead-time causes the SPAD to saturate at high photon rates.

(b) Contrast ratio and diffraction-photons: A DMD with a high contrast ratio requires a relatively small window size, because fewer diffraction-photons are emitted from it; the converse is equally true.
V Image Formation
Using the illumination scheme (Fig. 1(c)) presented in Section IV, we now describe approaches to recovering intensity and depth images from photon-arrival events. We first formulate the image-reconstruction process as a discrete inverse problem and then solve it with a convex-optimization algorithm.
V-A Intensity Image
The goal here is to reconstruct an intensity image, $f$, with the aid of the proposed illumination scheme (see Fig. 1(c)). The input–output relationship, in the absence of noise, can be described as follows:

$y = k * f$  (5)

where $*$ denotes spatial convolution, $k$ is the point-spread function that describes the proposed illumination scheme (Fig. 1(c)), $f$ is the latent intensity image, and $y$ is the observed number of photon counts at each pixel.

Let $\mathbf{f} \in \mathbb{R}^{n}$ denote the vectorized representation of image $f$—column-wise stacked. Likewise, let $\mathbf{y} \in \mathbb{R}^{n}$ denote the observation vector, whose $i$th entry is the total number of photon counts at the $i$th pixel.

The two-dimensional convolution in Eq. 5 can be expressed as a matrix–vector multiplication:

$\mathbf{y} = K\mathbf{f}$  (6)

$K = [\kappa_{m,p}] \in \mathbb{R}^{n \times n}$  (7)

where $\kappa_{m,p}$ is the DMD reflection coefficient relating measurement $m$ to scene pixel $p$. When a pixel of the scene is within an illumination block (see Fig. 1(c)), $\kappa_{m,p}$ takes on a value of one—full reflection. Other pixels of the scene are, however, moderately illuminated due to the low contrast ratio of the DMD: when a DMD mirror is in the off-state, a fraction of the optical power still illuminates the corresponding pixels of the scene; we denote this fraction by $\epsilon$. More formally, $\kappa_{m,p} = 1$ if pixel $p$ lies inside the illumination block of measurement $m$, and $\kappa_{m,p} = \epsilon$ otherwise, with block positions wrapping around the image boundaries (indices are taken modulo the image dimensions) for an illumination window of size $w \times w$ (Fig. 1(c) provides an illustration).
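The structure of the matrix in Eq. (6) can be sketched explicitly. The grid size, window size, leakage fraction, and wrap-around block placement below are assumptions for illustration.

```python
import numpy as np

# Sketch of the convolution matrix K in Eq. (6). Each row corresponds to one
# illumination position: entries are 1 for scene pixels inside the w x w
# on-block and a small fraction eps for all remaining pixels, modeling light
# leaked through off-state mirrors (the DMD's finite contrast ratio).
def build_K(n, w, eps):
    base = np.zeros((n, n))
    base[:w, :w] = 1.0
    rows = []
    for i in range(n):
        for j in range(n):
            block = np.roll(np.roll(base, i, axis=0), j, axis=1)
            # column-wise stacking, matching the vectorization in Section V-A
            rows.append(np.where(block.ravel(order="F") == 1.0, 1.0, eps))
    return np.array(rows)

K = build_K(n=8, w=3, eps=0.01)
```

Each row sums to the on-block area plus the total leaked power, which makes concrete why a larger window collects more signal while the leakage floor is set by the contrast ratio.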
The following four steps are performed to recover $\mathbf{f}$ (Eq. 6):

Variance-stabilizing transformation:
In Eq. 6, $\mathbf{y}$ is a Poisson-distributed random vector; to use an $\ell_2$ norm apt for Gaussian noise, we apply an Anscombe transform to convert the signal-dependent Poisson noise of $\mathbf{y}$ into (approximately) additive Gaussian noise with a constant variance [Anscombe]:

$\mathbf{z} = 2\sqrt{\mathbf{y} + \tfrac{3}{8}}$ (elementwise)  (8)
Denoising:
The aim here is to denoise the blurred image prior to deconvolving it:

$\hat{\mathbf{z}} = \underset{\mathbf{v} \,\ge\, 2\sqrt{3/8}}{\arg\min}\; \|\mathbf{v} - \mathbf{z}\|_2^2 + \beta \|D\mathbf{v}\|_1$  (9)

where $\beta > 0$, $D$ is the derivative matrix of Appendix B, and $\hat{\mathbf{z}}$ is a denoised yet blurred image. The constraint is added because the minimum value of Eq. 8 is $2\sqrt{3/8}$, which occurs when a photon count is zero.

Inverse transformation:
Algebraic inversion of Eq. 8 produces a bias; we therefore use a maximum-likelihood (ML) transformation [Foi]:

$\hat{\mathbf{y}} = \mathcal{I}_{\mathrm{ML}}(\hat{\mathbf{z}})$  (10)

where $\hat{\mathbf{y}}$ can be regarded as the denoised version of $\mathbf{y}$ (Eq. 6). In this step, a lookup table is used to map the values of $\hat{\mathbf{z}}$ to $\hat{\mathbf{y}}$.

Deconvolution:
We now wish to solve the system of equations $\hat{\mathbf{y}} = K\mathbf{f}$. Matrix $K$ is ill-conditioned. Accordingly, we consider a regularized least-squares approach to recover our latent image:

$\hat{\mathbf{f}} = \underset{\mathbf{f} \,\ge\, 0}{\arg\min}\; \|K\mathbf{f} - \hat{\mathbf{y}}\|_2^2 + \tau \|D\mathbf{f}\|_1$  (11)
We solve Eqs. 9 and 11 using the alternating direction method of multipliers (ADMM). Details of the ADMM algorithm are provided in the Appendix.
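The four recovery steps can be sketched end-to-end on a tiny synthetic scene. Two simplifications relative to the paper: the denoiser below is a plain Gaussian blur rather than the ADMM-based regularized denoiser, and the inverse Anscombe transform is the simple algebraic one rather than the ML lookup table; all sizes and parameter values are assumed.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# End-to-end sketch of the four intensity-recovery steps (Section V-A) on a
# tiny synthetic scene; an approximation of the paper's pipeline throughout.
rng = np.random.default_rng(1)
n, w, eps, scale = 8, 3, 0.01, 100.0

f_true = rng.uniform(1.0, 5.0, size=(n, n))   # latent intensity image

kernel = np.full((n, n), eps)                 # illumination PSF: w x w block
kernel[:w, :w] = 1.0                          # plus off-state leakage eps
H = np.fft.fft2(kernel)

blurred = np.real(np.fft.ifft2(np.fft.fft2(f_true) * H))
y = rng.poisson(scale * blurred)              # photon counts (Eq. 6)

# 1) Variance-stabilizing (Anscombe) transform
z = 2.0 * np.sqrt(y + 3.0 / 8.0)
# 2) Denoising (Gaussian stand-in for the paper's regularized denoiser)
z_d = gaussian_filter(z, sigma=0.5)
# 3) Inverse transformation (algebraic; the paper uses an ML lookup table)
y_d = np.maximum((z_d / 2.0) ** 2 - 3.0 / 8.0, 0.0) / scale
# 4) Deconvolution: closed-form regularized least squares under circular
#    boundary conditions (a Wiener-style filter)
lam = 1e-3
f_hat = np.real(np.fft.ifft2(np.conj(H) * np.fft.fft2(y_d)
                             / (np.abs(H) ** 2 + lam)))
```

The closed-form step 4 works here because the assumed circular blur diagonalizes in the Fourier basis; the paper's ADMM solver handles the general case with a nonnegativity constraint and a derivative-based regularizer.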
Limitation: A limitation of the procedure described in steps 1–4 is that a too-low or too-high regularization weight in Eq. 9 may diminish the quality of the image recovered in step 4 (Eq. 11); one must therefore select this weight carefully. In our experimental results (Section VII), its value is chosen by inspection: the value that produces the best result is selected.
V-B Depth Image
The aim here is to obtain a depth image, $\mathbf{z}$, from photon-arrival events. In the absence of noise, the input–output relationship can be expressed as follows:

$h(x, y, t) = k(x, y) \overset{xy}{*} s(x, y, t) \overset{t}{*} g(t)$  (12)
Here, two convolution processes take place:

(a) $\overset{xy}{*}$ is a 2D spatial convolution over $(x, y)$, describing the convolution of the scene with the proposed overlapping illumination blocks, $k$ (Fig. 1(c) provides an example).

(b) $\overset{t}{*}$ is a 1D temporal convolution over $t$, describing the convolution of the scene with the waveform of the illumination pulse, $g(t)$.
Each pixel, $i$, of the scene has a depth value, $z_i$, and a corresponding time-of-flight (ToF), which can be represented by the following impulse response:

$s_i(t) = \delta\!\left(t - \tfrac{2 z_i}{c}\right)$  (13)

where $\delta(\cdot)$ is a Dirac function (Fig. 2(a)) and $c$ is the speed of light. Likewise, in the continuous spatial and temporal domain, function $s$ in Eq. 12 is zero except at the ToF. Additionally, let $s(x, y, t)$ denote the number of photons at a given space-time point $(x, y, t)$.
Discretization: The continuous volume containing the scene is discretized into a voxel grid: the DMD discretizes the scene spatially into pixels, and the TCSPC module discretizes the scene temporally into time-bins. In this voxel grid, a discrete depth image is represented by a vector $\mathbf{z}$.
To keep the notation light, the discrete version of $s$ in this voxel domain is denoted by $S$—akin to Eq. 13, but now for a complete image—and is given by

$S = [s_{i,k}], \quad s_{i,k} = \delta\!\left[k - \left\lceil \tfrac{2 z_i}{c\,\Delta} \right\rceil\right], \quad i = 1, \dots, n, \;\; k = 1, \dots, n_t$  (14)

where $k$ is the time-bin, $c$ is the speed of light, $z_i$ is the depth at pixel $i$, $n_t$ is the number of time-bins, $n$ is the total number of pixels, and $\delta[\cdot]$ is a Kronecker delta function defined as $\delta[0] = 1$ and $\delta[k] = 0$ for $k \neq 0$.

Each column of matrix $S$ in Eq. 14 represents a time-slice of the voxel grid at a particular time-bin, $k$, and each row is allotted to a single pixel of the scene. Additionally, matrix $S$ is a binary matrix, and the summation of entries along each row of $S$ has a value of one—as each pixel can only have a single depth.
The convolution operators in Eq. 12 can be represented as a matrix multiplication:

$Y = K S A$  (15)

where

$A = [a_{k,\ell}], \quad a_{k,\ell} = g\big[(\ell - k) \bmod n_t\big]$  (16)

Here, each row of matrix $A$ is a time-histogram of the illumination waveform, $g$, and is a circular shift of its preceding row. When a row of $S$ is multiplied by matrix $A$, the illumination waveform is shifted to the ToF (Fig. 2(a) provides a detailed example).
In Eq. 15, matrix $Y \in \mathbb{R}^{n \times n_t}$ is the observed time-histogram at the detector (Fig. 4 provides an illustration). Each row of $Y$ is a time-histogram, resulting from convolving the scene with both an illumination window and a pulse waveform:

$Y = [\mathbf{h}_1, \mathbf{h}_2, \dots, \mathbf{h}_n]^{\mathsf{T}}$  (17)

where $\mathbf{h}_i$ is the time-histogram observed at pixel $i$.
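The forward model of Eq. (15) can be sketched directly in code. The sizes, pulse shape, and the identity illumination matrix below are assumptions for illustration (in the paper, the spatial matrix encodes the overlapping blocks).

```python
import numpy as np

# Sketch of the depth forward model of Eq. (15): S is a binary matrix with
# one nonzero per row (each pixel has a single depth bin), A holds circularly
# shifted copies of the pulse time-histogram, and Y = K S A collects the
# per-pixel time-histograms.
npix, nbins = 16, 32
rng = np.random.default_rng(2)

depth_bins = rng.integers(0, nbins, size=npix)   # true ToF bin per pixel
S = np.zeros((npix, nbins))
S[np.arange(npix), depth_bins] = 1.0             # Eq. (14): one depth per row

pulse = np.exp(-0.5 * ((np.arange(nbins) - 3) / 1.5) ** 2)  # assumed waveform
A = np.stack([np.roll(pulse, k) for k in range(nbins)])     # circulant shifts

K = np.eye(npix)   # identity here; the paper's K encodes block illumination
Y = K @ S @ A      # Eq. (15): observed time-histograms, one per row
```

Multiplying a one-hot row of `S` by `A` picks out the pulse shifted to that pixel's ToF bin, which is exactly the role Eq. (16) assigns to the circulant matrix.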
Optimization: The inverse problem of recovering $S$ (Eq. 15) can be formulated using the following optimization problem:

$\hat{S} = \underset{S}{\arg\min}\; \|K S A - Y\|_F^2 + \tau\, R(S) \quad \text{s.t.}\;\; S \in \{0, 1\}^{n \times n_t},\;\; S\mathbf{1} = \mathbf{1}$  (18)

where $R$ is a regularizer weighted by $\tau$. Eq. 18 is a combinatorial optimization problem, and it is challenging to solve within a reasonable period of time (the search space is of size $n_t^{\,n}$).
A tractable method for inferring depth from Eq. 18 can be attained using the following three steps:

Spatial deconvolution:
In Eq. 18, let $V = SA$, and let the $k$th columns of $V$ and $Y$ be denoted by $\mathbf{v}_k$ and $\mathbf{y}_k$, respectively (Figs. 3 and 4 provide illustrations). Both $\mathbf{v}_k$ and $\mathbf{y}_k$ are $n$-dimensional vectors, where $k = 1, \dots, n_t$.

The optimization problem for each time-slice (Fig. 2(b)) of the voxel grid is

$\hat{\mathbf{v}}_k = \underset{\mathbf{v} \,\ge\, 0}{\arg\min}\; \|K\mathbf{v} - \mathbf{y}_k\|_2^2 + \tau \|D\mathbf{v}\|_1$  (19)

Solving this optimization problem for each $k$ yields matrix $\hat{V}$. We solve this convex optimization problem using an ADMM algorithm. Details of the ADMM algorithm are provided in the Appendix.

Intermediate filtering:
Prior to temporally deconvolving our image, we apply a median filter to each row of $\hat{V}$ (the matrix recovered in the previous step). Experimentation has shown that this filter significantly improves our results.

Let $\hat{\mathbf{u}}_i$ denote the $i$th row of matrix $\hat{V}$. The median filter of order $2r + 1$ replaces each entry of $\hat{\mathbf{u}}_i$ with the median of its neighborhood:

$\tilde{u}_i[k] = \mathrm{median}\big\{\hat{u}_i[k - r], \dots, \hat{u}_i[k + r]\big\}$  (20)

from which we obtain $\tilde{V}$.

Temporal deconvolution:
Using the filtered matrix $\tilde{V}$ (whose rows we denote by $\tilde{\mathbf{u}}_i$) from the previous step, we can now determine the depth directly in a per-pixel manner. Vector $\tilde{\mathbf{u}}_i$ contains the received signal for pixel $i$, which can be regarded as a delayed and scaled version of the transmitted pulse. The ToF (or signal delay) for each pixel is independently obtained by cross-correlating $\tilde{\mathbf{u}}_i$ with the waveform of the transmitted pulse, $g$, and finding the maximum, as follows:

$\hat{k}_i = \underset{k}{\arg\max}\; \textstyle\sum_{\ell} \tilde{u}_i[\ell]\, g[\ell - k]$  (21)

For simplicity of notation, $\tilde{u}_i[\ell]$ denotes the $\ell$th element of vector $\tilde{\mathbf{u}}_i$. Using Eq. 21, the depth at pixel $i$ is $\hat{z}_i = c\,\hat{k}_i\,\Delta / 2$, from which we obtain the depth image $\hat{\mathbf{z}}$.
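Steps 2-3 above can be sketched for a single pixel. The pulse shape, bin width, signal amplitude, and noise level below are assumed for illustration, not the paper's measured values.

```python
import numpy as np
from scipy.signal import medfilt

# Sketch of steps 2-3 of the depth recovery (Section V-B): median-filter a
# pixel's time-histogram, then estimate the ToF bin by cross-correlating with
# the transmitted pulse and taking the argmax (Eq. 21).
rng = np.random.default_rng(3)
nbins, delta, c = 256, 64e-12, 3e8
pulse = np.exp(-0.5 * ((np.arange(9) - 4) / 1.2) ** 2)  # assumed waveform g

true_bin = 120
h = np.zeros(nbins)
h[true_bin - 4:true_bin + 5] += 40.0 * pulse            # received signal
h = rng.poisson(h + 0.5).astype(float)                  # shot + background

h_filt = medfilt(h, kernel_size=3)                      # intermediate filter
corr = np.correlate(h_filt, pulse, mode="same")         # Eq. (21)
tof_bin = int(np.argmax(corr))
depth = c * tof_bin * delta / 2                         # z = c * t / 2
```

Because the cross-correlation peak is taken per pixel, no range gating is needed, which is what makes the overall system gateless.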
VI Hardware and Experimental Setup
Fig. 5 shows a schematic of the experimental setup. The illumination source is a laser diode (LDH-P-C-650, PicoQuant) with a central wavelength of nm. The laser is controlled by a laser driver (PDL 828 Sepia II, PicoQuant) and emits pulses with a duration of ps (FWHM) at a repetition rate of MHz.
An optical beam expander enlarges the laser beam by a factor of . The DMD (in a DLP 4500 projector, Texas Instruments) spatially modulates the incoming light and projects it towards the scene. The contrast ratio of the DMD is .
Photons reflected by surfaces of the scene are detected by a SPAD (PDM series, Micro Photon Devices) with a ps (FWHM) temporal resolution. The SPAD has a quantum efficiency of , a ns dead-time, an active-area diameter of μm, and a dark-count rate of Hz. Photons detected by the SPAD are time-stamped, relative to a sent laser pulse, with a resolution of ps by a TCSPC module (PicoHarp 300, PicoQuant).
The scene is spatially sampled by the DMD into pixels, forming an image of rows by columns. The observation interval is divided by the TCSPC module into time-bins, each with a duration of ps. A total of laser pulses is transmitted for each measurement. The integration time for each measurement is ms.
The scene consists of a ball ( cm radius) placed in front of a screen. The ball is cm away from the SPAD, and the horizontal distance between the SPAD and the background screen is cm. Table I provides a summary of the experimental parameters.
Parameter  Value 
Average dark-count rate of SPAD  Hz 
Center wavelength of laser  nm 
Contrast ratio of DMD  
Dead-time of SPAD  ns 
Diameter of active area of SPAD  m 
Integration time per measurement  ms 
Laser pulse duration  ps 
Number of transmitted laser pulses per measurement  
Number of time-bins  
Quantum efficiency of SPAD  
Repetition rate of laser pulses  MHz 
Timing resolution of SPAD  ps 
Timing resolution of TCSPC module  ps 
Non-extensible dead-time [DeadTime]. Full width at half maximum. 
VII Results and Discussion
In this section, we present and discuss the results of a proof-of-concept experiment. The goal is to demonstrate the ability of the designed illumination-deconvolution scheme (Sections IV and V) to overcome diffraction-photons.
Fig. 6 displays histograms of our experimental data; the statistics of our measurements are as follows. The average number of noise (ambient + dark) photons was measured and found to be photons/pixel. The average number of diffraction-photons plus noise was measured and found to be photons/pixel. For consistency, the same number of samples was taken for all measurements presented in this section—equal to the number of pixels of the sought image.
The average number of photons (signal + noise + diffraction) per pixel for raster-scanning and for illumination blocks of size , , and was measured and found to be , , , and photons/pixel, respectively.
Fig. 6(b) and Figs. 7(a)–7(e) display data gathered from the experiment for the observation vector (Eq. 6) and matrix (Eq. 17), respectively. Fig. 6(d) and Figs. 7(h)–7(j) present the performance of the proposed scheme.
A rich body of research exists on 3D structure and reflectivity imaging with photon-counting detectors. However, this is the first paper to address the issue of diffraction-photons in such systems. We therefore believe that, in addition to presenting the performance of our scheme, it is beneficial to compare it with commonly used benchmarks ([System] and [First_Photon]).
Figs. 7 and 8 show that the proposed scheme reveals a latent image amid a bath of diffraction-photons ( photons/pixel, Fig. 6). Methods based on traditional raster-scanning [First_Photon, System], however, are adversely affected by diffraction-photons. The reason is that the average number of signal photons per pixel for raster-scanning is far below the number of diffraction-photons; this results in diminished image quality.
The proposed scheme performs better because it takes a co-designed illumination-and-deconvolution approach to the image-capturing problem: it boosts the average number of signal photons collected per pixel (Fig. 6) and maintains the native image resolution, as the relatively large illumination blocks overlap (Fig. 2). For our experiment, an illumination window of size produced the best results; this size is a workable balance between the signal photons collected and the blur introduced.
It is important to point out, however, that the performance improvement of the proposed scheme over [System] and [First_Photon] should be viewed with some caution, because [System] and [First_Photon] were designed for imaging environments free of diffraction-photons; as such, they produce lower image quality than our scheme in this setting.
VIII Conclusion
This study set out to overcome diffraction-photons. We described a method for acquiring intensity and depth images in an environment tainted by diffraction-photons ( photons/pixel).
A proof-of-concept experiment demonstrates the viability of the designed scheme. It is shown that, at a low DMD contrast ratio ( ), diffraction-photons can be overcome via a joint illumination-deconvolution scheme, enabling the acquisition of intensity and depth images.
The scheme works because the number of signal photons collected is boosted by overlapping sizeable illumination blocks. The overlapping blocks mix pixel values, which are subsequently untangled by deconvolution algorithms.
The central conclusion that can be drawn from this study is that the designed scheme offers a means to relax the trade-off between spatial resolution and signal power—this is achieved mainly through convex-optimization techniques.
A promising avenue for future research is to determine the optimal illumination window size mathematically, for a given contrast ratio, dead-time, and noise level. Thus far, the optimal window size has been determined by experimental testing. Additionally, a potential research direction is to analyze the sensitivity of the problems in Eqs. 11 and 19 to noise. Such an analysis would provide insight into how best to deconvolve images captured by SPAD/DMD-based imaging systems.
To conclude, we believe that the findings in this paper will be of special interest to researchers in the field of ToF imaging, as the paper addresses a new practical challenge.
Appendix A Alternating Direction Method of Multipliers
In this section we shall denote the shrinkage operator (elementwise soft thresholding) by

$\mathcal{S}_{\kappa}(x) = \mathrm{sign}(x)\,\max(|x| - \kappa,\, 0)$

Additionally, we denote the projections of a vector onto the constraint sets of Eqs. 9 and 11 by $\Pi_{\mathcal{C}_1}(\cdot)$ and $\Pi_{\mathcal{C}_2}(\cdot)$, respectively.
Appendix B Derivative Matrices
In this section we describe the derivative matrices used throughout the paper. Let $D$ be defined as follows:

$D = \begin{bmatrix} D_1 \\ \alpha D_2 \end{bmatrix}$

where $D_1$ and $D_2$ are the first- and second-order derivative matrices, respectively, and constant $\alpha$ controls the strength of the second derivative. Here, the first and second derivative matrices are defined as

$D_1 = \begin{bmatrix} D_{1,x} \\ D_{1,y} \end{bmatrix}$

and

$D_2 = \begin{bmatrix} D_{2,x} \\ D_{2,y} \end{bmatrix}$

where $D_{1,x}$, $D_{1,y}$, $D_{2,x}$, and $D_{2,y}$ are convolution matrices for the first and second derivatives, respectively; their subscripts specify the direction in which a derivative operation is performed.

The rationale for including a second derivative in our regularizer is that it encourages the recovery of image curvatures, rendering deblurred images more natural-looking.
Acknowledgment
The authors wish to express their gratitude to Congli Wang, Muxingzi Li and Qilin Sun for their technical assistance and fruitful discussions. This research was made possible by KAUST Baseline Funding.