Abstract
Singlephoton avalanche diode (SPAD) based transient imaging suffers from an aberration called pileup. When multiple photons arrive within a single repetition period of the illuminating laser, the SPAD records only the arrival of the first photon; this leads to a bias in the recorded light transient wherein the transient response at later timeinstants are underestimated. An unfortunate consequence of this is the need to operate the illumination at lowpower levels to reduce the probability of multiple photons returning in a single period. Operating the laser at low power results in either low signaltonoise ratio (SNR) in the measured transients or reduced frame rate due to longer exposure durations to achieve a high SNR. In this paper, we propose a signal processingbased approach to compensate pileup in postprocessing, thereby enabling high power operation of the illuminating laser. While increasing illumination does cause a fundamental information loss in the data captured by SPAD, we quantify this information loss using CramerRao bound and show that the errors in our framework are only limited to this information loss. We experimentally validate our hypotheses using real data from a lab prototype.
Signal Processing Based Pileup Compensation for Gated SinglePhoton Avalanche Diodes
Adithya Kumar Pediredla,^{1,*} Aswin C. Sankaranarayanan,^{2} Mauro Buttafava,^{3} Alberto Tosi,^{3} and Ashok Veeraraghavan^{1}
^{1}Electrical and Computer Engineering, Rice University, 6100 Main street, Houston, Texas 77005, USA
^{2} Electrical and Computer Engineering, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh 15213, USA
^{3}Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
^{*}akp4@rice.edu
OCIS codes: (000.5490) Probability theory, stochastic processes, and statistics; (010.3640) Lidar; (040.1345) Avalanche photodiodes (APDs); (320.0320) Ultrafast optics;
References and links
 [1] A. Jarabo, B. Masia, J. Marco, and D. Gutierrez, “Recent advances in transient imaging: A computer graphics and vision perspective,” Visual Informatics 1, 65–79 (2017).
 [2] A. Velten, D. Wu, A. Jarabo, B. Masia, C. Barsi, C. Joshi, E. Lawson, M. Bawendi, D. Gutierrez, and R. Raskar, “Femtophotography: capturing and visualizing the propagation of light,” ACM Transactions on Graphics (ToG) 32, 44 (2013).
 [3] “Ultra high speed ICCD camera,” http://stanfordcomputeroptics.com/download/Brochure4Picos.pdf. Accessed: 20180219.
 [4] G. Gariepy, N. Krstajić, R. Henderson, C. Li, R. R. Thomson, G. S. Buller, B. Heshmat, R. Raskar, J. Leach, and D. Faccio, “Singlephoton sensitive lightinfight imaging,” Nature Communications 6 (2015).
 [5] M. O’Toole, F. Heide, D. B. Lindell, K. Zang, S. Diamond, and G. Wetzstein, “Reconstructing transient images from singlephoton sensors,” in “Proc. IEEE CVPR,” (2017), pp. 2289–2297.
 [6] N. Franch, O. Alonso, J. Canals, A. Vilà, and A. Dieguez, “A low cost fluorescence lifetime measurement system based on spad detectors and fpga processing,” Journal of Instrumentation 12, C02070 (2017).
 [7] D. Shin, A. Kirmani, V. K. Goyal, and J. H. Shapiro, “Photonefficient computational 3d and reflectivity imaging with singlephoton detectors,” IEEE Transactions on Computational Imaging 1, 112–125 (2015).
 [8] G. Satat, M. Tancik, O. Gupta, B. Heshmat, and R. Raskar, “Object classification through scattering media with deep learning on time resolved measurement,” Optics express 25, 17466–17479 (2017).
 [9] A. K. Pediredla, N. Matsuda, O. Cossairt, and A. Veeraraghavan, “Linear systems approach to identifying performance bounds in indirect imaging,” IEEE International Conference on Acoustics, Speech and Signal Processing (2017).
 [10] A. Pifferi, A. Torricelli, L. Spinelli, D. Contini, R. Cubeddu, F. Martelli, G. Zaccanti, A. Tosi, A. Dalla Mora, F. Zappa, and S. Cova, “Timeresolved diffuse reflectance using small sourcedetector separation and fast singlephoton gating,” Physical Review Letters 100, 138101 (2008).
 [11] L. Bollinger and G. E. Thomas, “Measurement of the time dependence of scintillation intensity by a delayedcoincidence method,” Review of Scientific Instruments 32, 1044–1050 (1961).
 [12] W. Becker, Advanced timecorrelated single photon counting techniques, vol. 81 (Springer Science & Business Media, 2005).
 [13] W. Becker, Advanced timecorrelated single photon counting applications (Springer, 2015).
 [14] D. O’Connor, Timecorrelated single photon counting (Academic Press, 2012).
 [15] S. Cova, A. Longoni, and A. Andreoni, “Towards picosecond resolution with singlephoton avalanche diodes,” Review of Scientific Instruments 52, 408–412 (1981).
 [16] D. E. Schwartz, E. Charbon, and K. L. Shepard, “A singlephoton avalanche diode array for fluorescence lifetime imaging microscopy,” IEEE journal of solidstate circuits 43, 2546–2557 (2008).
 [17] F. Panzeri, A. Ingargiola, R. R. Lin, N. Sarkhosh, A. Gulinatti, I. Rech, M. Ghioni, S. Cova, S. Weiss, and X. Michalet, “Singlemolecule fret experiments with a redenhanced custom technology spad,” Single Molecule Spectroscopy and Superresolution Imaging VI 8590, 85900D (2013).
 [18] L.Q. Li and L. M. Davis, “Single photon avalanche diode for single molecule detection,” Review of Scientific Instruments 64, 1524–1529 (1993).
 [19] I. Gyongy, A. Davies, N. A. Dutton, R. R. Duncan, C. Rickman, R. K. Henderson, and P. A. Dalgarno, “Smartaggregation imaging for single molecule localisation with spad cameras,” Scientific Reports 6 (2016).
 [20] V. Krishnaswami, C. J. Van Noorden, E. M. Manders, and R. A. Hoebe, “Towards digital photon counting cameras for singlemolecule optical nanoscopy,” Optical Nanoscopy 3, 1 (2014).
 [21] Z. Cheng, “CMOSbased single photon avalanche diode and timetodigital converter towards pet imaging applications,” Ph.D. thesis (2016).
 [22] A. Torricelli, D. Contini, A. Pifferi, M. Caffini, R. Re, L. Zucchelli, and L. Spinelli, “Time domain functional nirs imaging for human brain mapping,” Neuroimage 85, 28–50 (2014).
 [23] M. Mazurenka, H. Wabnitz, A. Dalla Mora, D. Contini, A. Pifferi, R. Cubeddu, A. Tosi, F. Zappa, and R. Macdonald, “Development of an optical noncontact timeresolved diffuse reflectance scanning imaging system,” Biomedical Optics pp. BTu3A–50 (2012).
 [24] M. Mazurenka, A. Jelzow, H. Wabnitz, D. Contini, L. Spinelli, A. Pifferi, R. Cubeddu, A. Dalla Mora, A. Tosi, F. Zappa et al., ‘‘Noncontact timeresolved diffuse reflectance imaging at null sourcedetector separation,” Optics Express 20, 283–290 (2012).
 [25] L. Di Sieno, H. Wabnitz, A. Pifferi, M. Mazurenka, Y. Hoshi, A. Dalla Mora, D. Contini, G. Boso, W. Becker, F. Martelli et al., “Characterization of a timeresolved noncontact scanning diffuse optical imaging system exploiting fastgated singlephoton avalanche diode detection,” Review of Scientific Instruments 87, 035118 (2016).
 [26] A. Maccarone, A. McCarthy, X. Ren, R. E. Warburton, A. M. Wallace, J. Moffat, Y. Petillot, and G. S. Buller, “Underwater depth imaging using timecorrelated singlephoton counting,” Optics Express 23, 33911–33926 (2015).
 [27] C. Niclass, M. Soga, and E. Charbon, “3d imaging based on single photon detectors,” 2nd Symposium on Range Imaging (2007).
 [28] A. Kirmani, D. Venkatraman, D. Shin, A. Colaço, F. N. Wong, J. H. Shapiro, and V. K. Goyal, “Firstphoton imaging,” Science 343, 58–61 (2014).
 [29] D. Shin, F. Xu, D. Venkatraman, R. Lussana, F. Villa, F. Zappa, V. K. Goyal, F. N. Wong, and J. H. Shapiro, “Photonefficient imaging with a singlephoton camera,” Nature Communications 7 (2016).
 [30] D. Shin, J. H. Shapiro, and V. K. Goyal, “Computational singlephoton depth imaging without transverse regularization,” IEEE International Conference on Image Processing (2016).
 [31] M. Buttafava, J. Zeman, A. Tosi, K. Eliceiri, and A. Velten, “Nonlineofsight imaging using a timegated single photon avalanche diode,” Optics Express 23, 20997–21011 (2015).
 [32] C.Y. Tsai, K. N. Kutulakos, S. G. Narasimhan, and A. C. Sankaranarayanan, “The geometry of firstreturning photons for nonlineofsight imaging,” CVPR (2017).
 [33] G. Gariepy, F. Tonolini, R. Henderson, J. Leach, and D. Faccio, “Detection and tracking of moving objects hidden from view,” Nature Photonics (2015).
 [34] A. K. Pediredla, M. Buttafava, A. Tosi, O. Cossairt, and A. Veeraraghavan, “Reconstructing rooms using photon echoes: A plane based model and reconstruction algorithm for looking around the corner,” IEEE International Conference on Computational Photography (2017).
 [35] F. Heide, M. O’Toole, K. Zhang, D. Lindell, S. Diamond, and G. Wetzstein, “Robust nonlineofsight imaging with single photon detectors,” arXiv preprint arXiv:1711.07134 (2017).
 [36] P. Coates, “The correction for photon pileup’ in the measurement of radiative lifetimes,” Journal of Physics E: Scientific Instruments 1, 878 (1968).
 [37] P. Coates, “Analytical corrections for dead time effects in the measurement of timeinterval distributions,” Review of Scientific Instruments 63, 2084–2088 (1992).
 [38] C. Davis and T. King, “Single photon counting pileup corrections for timevarying light sources,” Review of Scientific Instruments 41, 407–408 (1970).
 [39] T. Luhmann, “Statistics and dead time correction of twoparticle timeofflight coincidence experiments,” Review of Scientific Instruments 68, 2347–2356 (1997).
 [40] M. Renfro, S. Pack, G. King, and N. Laurendeau, “A pulsepileup correction procedure for rapid measurements of hydroxyl concentrations using picosecond timeresolved laserinduced fluorescence,” Applied Physics B: Lasers and Optics 69, 137–146 (1999).
 [41] J. G. Walker, “Iterative correction for pileup’ in singlephoton lifetime measurement,” Optics Communications 201, 271–277 (2002).
 [42] M. Patting, M. Wahl, P. Kapusta, and R. Erdmann, “Deadtime effects in TCSPC data analysis,” International Congress on Optics and Optoelectronics 6583, 658307 (2007).
 [43] T. Rebafka, F. Roueff, and A. Souloumiac, “Information bounds and mcmc parameter estimation for the pileup model,” Journal of Statistical Planning and Inference 141, 1–16 (2011).
 [44] J. Arlt, D. Tyndall, B. R. Rae, D. D.U. Li, J. A. Richardson, and R. K. Henderson, “A study of pileup in integrated timecorrelated single photon counting systems,” Review of Scientific Instruments 84, 103105 (2013).
 [45] S. M. Kay, “Statistical signal processing,” Estimation Theory 1 (1993).
 [46] D. W. Scott, Multivariate density estimation: theory, practice, and visualization (John Wiley & Sons, 2015).
 [47] “Dataset captured and all the codes used in this paper,” https://www.dropbox.com/sh/8zyk5r0qla4c6xc/AADns1HgBi40P1pU1YJwKie3a?dl=0. Accessed: 20180219.
 [48] S. Cova, M. Ghioni, A. Lacaita, C. Samori, and F. Zappa, “Avalanche photodiodes and quenching circuits for singlephoton detection,” Applied optics 35, 1956–1976 (1996).
 [49] Q. Hernandez, D. Gutierrez, and A. Jarabo, “A computational model of a singlephoton avalanche diode sensor for transient imaging,” arXiv preprint arXiv:1703.02635 (2017).
 [50] S. M. Ross, Introduction to probability models (Academic press, 2014).
 [51] M. Buttafava, G. Boso, A. Ruggeri, A. Dalla Mora, and A. Tosi, “Timegated singlephoton detection module with 110 ps transition time and up to 80 MHz repetition rate,” Review of Scientific Instruments 85, 083114 (2014).
Appendix A Introduction
Imaging at ultrahigh speed is pivotal in many scientific disciplines. From demystifying the hovering of a hummingbird to the accurate hunts of dragonflies, increasing the speed of imaging has unlocked many secrets in nature. Today, highspeed imaging is capable of imaging at picosecond time scales [1]; at such resolutions, we can observe lightinflight or light as it propagates in a scene [2, 3, 4, 5]. One approach for achieving this is by using a singlephoton avalanche diode (SPAD) coupled with timecorrelated single photon counting (TCSPC) electronics; this device has found use in fluorescencelifetimeimaging [6], depth sensing [7], imaging through scattering medium [8], and imaging beyond lineofsight [4, 9] (Figure 1).
SPADs work by detecting the firstarriving photon after the scene is illuminated by a laser pulse. The illumination source, typically sends few millions picosecond pulses per second. By binning the time of arrival of the firstarriving photons after each laser pulse, the TCSPC records a histogram of photon counts as a function of arrival time. The measured histogram is proportional to the transient response if the probability of measuring a photon in a time bin is proportional to the transient response. Unfortunately, this is true only when the laser illumination power is sufficiently low so that there is at most one returning photon after each pulse of the laser. To understand why, note that the probability of photon arrivals increases with laser power. Suppose that the laser power is large enough so that multiple photons return in each cycle. The SPAD only records the first returning photon in each cycle and, hence, does not bin the laterarriving photons. This causing a systematic bias between the measured histogram and the true transient response; in particular, this bias is such that there is an undercounting of photons with larger time of arrival. This effect is called pileup, since the photons appear to pileup near the origin due to uneven bias introduced in the transient response (Figure 1(c)). In the limit, when the laser power is extremely high, we will not measure any photons with large time of arrivals. An undesirable consequence of pileup is that SPADbased TCSPC is often performed at very low illumination power which leads to inefficiencies in the form of long exposure times and poor SNR.
To avoid pileup, Pifferi et al. [10] propose a gated use of the SPADs where in the SPAD is selectively turned off for some duration every cycle. Typically the gate is used to block photons from the highphoton rate parts of the transient such as to decrease their probability of photon arrival. The gated operation decreases the pileup effect significantly, but the number of cycles the SPAD does not detect any photon remains high. Gatedmode SPAD operation is also ineffective in the presence of strong background wherein the high photonrate regions of the transient is not temporally localized.
In this paper, we propose a signalprocessing approach that enables the operation of gated SPADs even in the presence of high illumination power; we show that the number of cycles the SPAD detects a photon can be increased to 90% from the typical current efficiency of . The transient recorded by the TCSPC circuit is still biased, but we show that it can be unbiased via the use of appropriate postprocessing techniques (Figure 1(d)). Our approach works in tandem with the SPAD gate and further increases the intensity levels the SPAD can accurately record transients.
Contributions:
We make the following contributions:

Estimationtheoretic framework for modeling pileup: We derive maximumlikelihood (ML), and maximum a posteriori (MAP) estimates for the transient response of the SPAD. We also derive an asymptotic bound (CramerRao bound) on the estimation errors.

Experimental validation: We have built experimental hardware with the SPAD and TCSPC devices and showed that the SNR of the transients improves with our estimation framework. We also show that even in the presence of strong ambient illumination that causes the pileup, we can recover the underlying signal.
Limitations:
The signal processing approach provided in this paper compensates for the pileup effect that happens due to the dead time of the SPAD and TCSPC devices. However, the SPADs also suffer from finite quantum efficiency, dark counts, after pulsing, gateringing, and timewalk artifacts. Our theory does not account for these effects. The quantum efficiency is a linear phenomenon equivalent to decreasing the illumination power by a constant amount and hence, can be neglected. Other nonlinear effects are minor, as the specs of a typical SPAD (refer Section E) show that our approximations are reasonable.
Appendix B Prior work
Timecorrelated single photon counting was first introduced by Bollinger and Thomas [11] in 1961. A lot of progress was made since then, including the works of Becker [12, 13], O’Connor and Desmund [14]. Usage of SPADs for TCSPC for picosecond temporal resolution was first shown by Cova et al. [15] in 1981. Since then, a lot of novel applications at both microscopic scale and macroscopic scale were designed with the SPADbased TCSPCs. Imaging applications that can image objects beyond the lineofsight at macroscopic scale were also becoming famous. Below, we review some of the recent advances in microscopic and macroscopic imaging applications with the SPAD.
b.1 Microscopy and Spectroscopy
Schwartz et al. [16] designed and characterized a fully integrated SPAD imager with a timetodigital converter (TDC) for fluorescence lifetime imaging. Panzeri et al. [17] characterized the use of SPADs in singlemolecule fluorescence resonant energy (FRET) studies on freely diffusing molecules in confocal and alternating laser excitation schemes. Li et al. [18] first proposed to detect single molecules with SPADs. Recently, SPADs are also employed in other singlemolecule studies like localization [19] or optical nanoscopy [20]. Cheng et al. [21] developed a noninvasive positronemission tomography with SPADs. The SPADbased TCSPCs were extensively used in timedomain functional near infrared spectroscopy (NIRS) imaging for human brain mapping [22]. Mazurenka et al. [23, 24] developed a noncontact timeresolved diffuse reflectance imaging using a fast gated single photon counting for detection of absorption changes few centimeters deep of tissue (turbid medium). Sieno et al. [25] characterized these noncontact diffuse optical imaging systems.
b.2 Macroscopy
Many of the macroscopic applications targeted low noise and high sensitivity of SPADs. For example, Maccarone et al. [26] designed a SPADbased low power underwater depth imaging system. Niclass et al. [27] proposed a fast 3D imaging sensor with SPADs based on the time of arrival of the photons. Kirmani et al. [28] proposed a lowphotonflux imaging system that can reconstruct both depth and reflectivity of a scene from the first arriving photons. Shin et al. [29, 30] extended this system that can reproduce similar results even with a nanosecond jitter system but with prior knowledge on the scene. Gariepy et al. [4] used a 3232 SPAD array to capture transient images directly at 67 ps temporal resolution. Buttafava et al. [31] repurposed SPADs to look around the corners, and reconstruct objects beyond lineofsight. Pediredla et al. [9] employed a linear systems approach to derive geometric and photometric bounds of looking around corners with SPADs. Tsai et al. [32] improved on the technique proposed by Buttafava et al. using the arrival times of only firstphotons coming from the hidden object. Gariepy et al. [33] used SPADs to detect and track slow moving objects around the corners. Pediredla et al. [34] designed a system based on SPADs to image hiddenrooms if one wall is visible from a window or door. Heide et al. [35] proposed algorithmic framework robust to partial occlusions in the hidden scene and recovered the hidden object with data captured from SPADbased TCSPC system. All these methods are inherently affected by pileup.
b.3 Pileup compensation
Most of the priorart on pileup compensation was targeted at lifetime applications and for the nongated mode. Early work by Coates [36, 37] proposed a pileup algorithm to measure the radiative lifetimes. Davis and King [38] proposed to use approximations for the Poisson process to develop a pileup compensation technique. Luhmann [39] created a calculation method to compensate for the deadtime in twoparticle coincidence experiments. Renfro et al. [40] proposed an iterative technique called saturateandcompare for measuring fluorescence lifetime. Walker [41] proposed an iterative technique to compensate for the pileup under variable pulse energy and showed the results of simulations. Patting et al. [42] proposed a modelbased approach to compensate for the dead time and the associated pileup. Rebafka et al. [43] used concepts of information theory to decrease the acquisition time for lifetime imaging. Arlt et al. [44] studied the effect of the pileup in integrated solidstate TCSPC sensors with small dead time.
In all these techniques, the analysis is mostly experimental, and no guarantees on the efficacy of the recovered parameters are ever made, except maybe in [43] where the goal is to minimize the acquisition time of timeresolved fluorescence for nongated mode SPAD operation. These techniques do not apply mature signal processing approaches for pileup compensation. In this work, we deploy some fundamental estimationtheoretic methods from signal processing literature on the transients captured by the SPADbased TCSPCs. We believe that the vast literature of estimation theory [45, 46] will further improve the results that we show in the paper and hence, we made both the code and data publicly available [47].
Appendix C Modelling the transients captured by the SPADbased TCSPC
In this section, we will develop a probabilistic model for the histogram measured by the TCSPC. We will first explain the operation of the SPADbased TCSPC following which we will develop a simulation framework and show how pileup appears at high illumination power.
c.1 Operation of the SPADbased TCSPC
At a highlevel, TCSPC works by binning the photons detected by the SPAD, based on their arrival times. The arrival time of a photon is calculated by measuring the time difference between the SPAD detection and the nearest illumination laser pulse. Figure 2 shows a sketch of how TCSPC creates the transients.
When a photon hits the SPAD active area, an avalanche current builds up inside the device and is detected by the readout electronics (which generates the corresponding output signal). This current is then quenched by dedicated circuitry (quenching phase) in few nanoseconds and the detector is kept OFF for few tens of nanoseconds (holdoff time), in order to decrease its afterpulsing probability [48]. The sum of quenching time and holdoff time is commonly known as the SPAD deadtime. We illustrate the effect of deadtime in Figure 3. We notice that the photons corresponding to pulses that are colored red are not detected by the SPAD as they arrive during its deadtime. Note that the entire circuitry is operated with Nuclear Instrumentation Module (NIM) standard with negative true (0.8 volts) and hence the pulses are loosely referred as NIM pulses throughout the paper.
Recently, gated mode operation for SPADs is becoming increasingly common. Gated mode operation helps SPADs to turn ON only when the photons are expected and hence, reject a lot of unwanted photons. In the gated operation mode, the SPAD turns ON only when the gate signal is high. The gate signal is synchronized with the laser pulse signal. Typically, the gate signal width is less than the duration between two consecutive laser pulses (if not, there is no real advantage in using the gate signal as the SPAD will be always ON). In gated mode operation, a photon is not detected if it arrives either when the gate signal is low, or during the SPAD deadtime. It should also be noted that, at the end of the deadtime period, the SPAD turns back ON only at the next useful rising edge of the gate signal. In Figure 4, the laser pulses and gate signals of a typical case are shown, along with the photons reaching the sensor. Among these photons, the ones tagged 1, 3, 4, 5 are not detected by the SPAD due to three different reasons. The first photon is rejected as the gate is OFF at its arrival time. The third and fourth photons are rejected as they arrive during the SPAD deadtime. The fifth photon arrives when the SPAD gate signal is high, but the SPAD is still OFF as the rising edge of the current gate precedes the end of the deadtime period, and this results in the rejection of the fifth photon as well (the SPAD will turn back ON at the next gate signal rising edge).
Notice that, in gated mode, the SPAD can only detect one photon per laser cycle. Even if the SPAD deadtime is shorter than the laser period the detector won’t turn back ON in the same cycle, as the rising edge of the gate signal will only arrive in the next laser pulse. To minimize the photons that are not detected by the SPAD, it is recommended to keep the illumination intensity lower enough such that photon detection rate (number of photons detected per second) remains within 15% of the laser repetition rate. In sharp contrast, the techniques developed in this paper enable the SPAD operation at photon detection rates close to 90%.
c.2 Simulating SPADs
Based on the understanding of the operation of the SPAD, we can simulate their operation if we know the photon arrival rate of each bin. In most literature, photon arrivals are modeled as the Poisson process. In this section, we empirically verify that this model is indeed accurate. In Figure 5(a), we plot the mean and variance of the photons captured by our set up for 600 trials and integration period of 0.1 seconds per trial. The illumination is maintained at a very low value so that the pileup does not affect the transient measurements. We observe that the mean and variance of any temporal bin are equal, just as it would if the arrival were a Poisson process. Further, the density of the photons in all temporal bin fits a Poisson density (Figure 5(b)). Hence, we model the photon counts at any temporal bin as a Poisson distribution, given as
(1) 
where is a random variable, denoting the number of photons arriving at the bin per laser pulse. is the mean photon count of the bin.
Next, we develop a simulation framework for the histogram measurements using the operating principles of the SPAD and the TCSPC coupled with the Poisson model for the photon count. The details are provided in Algorithm 1. The algorithm depends on the system parameters of the SPAD (namely integration time, dead time of the SPAD, laser repetition rate, and sampling rate of TCSPC), the illumination power, and the ideal transient response of the scene for one laser pulse (). Notice that the algorithm is embarrassingly parallel and we have implemented it on a GPU, which is 1000 faster than a CPU implementation. The GPU version of the MATLAB code is available at [47]. Hernandez et al. [49] proposed a GPU accelerated simulation model for the SPAD operated in nongated mode. Our algorithm is similar to that of Hernandez et al.’s algorithm, but for gated mode.
Using Algorithm 1, we can observe the pileup effect as we increase the illumination power. We consider the transient of a scene with two peaks and compute the histogram captured by the SPAD at higher illumination power. In Figure 6(a), we show the normalized transient measured by the SPAD simulation along with the ground truth transient. We can notice that the photons appear to pileup near the origin. However, from Figure 6(b), we can observe that the photons are dropped at all the time bins; Photons from the time bins that are farthest from the origin are dropped more compared to the photons near the origin.
c.3 Probabilistic forward model for TCSPC histogram
Let be the ideal transient response per laser cycle at light level . Note that denotes the bin index of the TCSPC histogram, starting from the bin immediately after the gate is turned ON. Hence, bin index ‘1’ denotes the bin exactly after the gate is turned ON. The measurement by an ideal transient imager for a laser cycle will be , as noted in Section C.2.
In each cycle of the laser, the SPAD can measure the arrival time of at most one photon. Let denote of the event that the SPADbased TCSPC measured the photon arrival in the bin. The event happens when no photons arrive in the first bins, and at least one photon arrives in the th bin, and therefore happens with the probability
We further define as a special event where no photons arrived at any timebin in a laser cycle that SPAD is not under quenching mode. Therefore,
If all the arrival rates (s) are very small, then
Therefore at low photon count rates, the histogram captured by the SPAD is proportional to the transient response. If the low photon count assumption is not valid, then the histogram is not proportional to the transient response. We derive the histogram for all cases (both low illumination and high illumination) next.
Let be an observed histogram for integration time ; note that we have introduced the count of no photons arrivals () in the observed () even though we do not directly observe . We will compute by first computing the number of cycles the SPAD is active () and then subtracting the total number of cycles () we have a photon arrival.
The total number of laser cycles in integration time is where is the repetition time of the laser. Recall from Figure 4 that the number of cycles that the SPAD is active is less than or equal to the number of laser cycles. Each time the SPAD detects a photon in time bin, the SPAD will not detect the photon for laser cycles, where is hold off or dead time and is bin width of histogram. Therefore the total number of laser cycles the SPAD is active is given by
(2) 
If the holdoff time () is very small compared to repetition time of the laser () then , i.e., the SPAD is active for all the laser cycles. Note that the above equation is valid only for gated SPAD operation as nongated operation can measure more than a photon per laser pulse.
We now have all the ingredients to compute the probabilistic forward model for the TCSPC histogram. Our first observation is that the histogram measured by the SPAD can also be modeled as a multinomial distribution. Recall that the multinomial distribution models the event of tossing a faced dice times and observing the vector , where denotes the number of times face showed up in the toss. We can now map each event in to a face of the dice; hence . The number of tosses is equal to the number of the cycles that the SPAD is active. Hence, the probability of observing the histogram is given as
(3) 
Equation (3) is the probabilistic forward model that gives the relationship between the histogram measured by TCSPC and the true photon arrival rates. Equation (3) states that the probability mass function (pmf) of the histogram of the bin conditioned on the histogram observed in all the previous time bins is binomial distributed and the true photon arrival rate of the bin. In the next sections, we will derive inverse models to computationally recover the underlying transients given the histogram measured by TCSPC.
Appendix D Compensating pileup
Given the forward model and the experimental photon count measurements in the form of a histogram, we now seek to estimate the transient response . We derive two estimates for the transient response: first, a maximumlikelihood (ML) estimate and, second, a conjugate priorbased maximumaposteriori (MAP) estimate.
d.1 ML Estimate
Given an observation , the likelihood is simply the conditional probability
Maximizing the loglikelihood with respect to s provide the MLestimate of :
(4) 
In Figure 7, we show the efficacy of our estimate in recovering the underlying transient signal from the piledup transient responses shown in Figure 6.
d.1.1 CramerRao bound
The CramerRao bound provides a lower bound on the error of unbiased estimators. In this section, we will derive the CramerRao lower bound on the estimation of transients, explain the implications of the bound, and show that MLestimation reaches this bound for practical scenarios.
The CramerRao bound for the variance of any unbiased estimator of the transient rate is equal to
(5) 
We provide the derivation in the Appendix. Since is directly proportional to the illumination intensity, increasing illumination increases the error in the ML estimate. Therefore, for very large illumination, it will become increasingly difficult to capture the transient response with SPADs, which is intuitive as well – for very large illumination power, the first arrival photon at cycle will occur at first time bin leading to a massive information loss.
A second observation from (5) is that increasing integration time decreases the estimation error. As integration time increases, the number of cycles will increase, decreasing the estimation error. Therefore, for any finite illumination power for a given system, the estimation error can be made arbitrarily small by increasing integration period.
A third observation from (5) is that bins with higher true transient value tend to have a higher error. In fact, bins that have zero count rate will have zero estimation error. As the arrival process is Poisson, higher the mean, higher will be the variance leading to this condition.
A fourth observation from (5) is that early histogram bins tend to have smaller estimation error than later histogram bins. For two time bins with same , the denominator of (5) will be smaller for the later histogram bin. Hence, the error of the later histogram bin will be higher. This is also intuitive from the pileup effect that causes more photons in the early time bins, hence less variance, compared to the later timebins.
In Figure 8, we show the CramerRao lower bound on the estimation error and the error of the ML estimate for a fixed integration time of 10 s. We can observe that they are almost equal implying that the error in ML estimate converges to CramerRao bound. This is a wellknown phenomenon in signal processing community and happens because ML estimate is asymptotically efficient [50] and reaches CramerRao bound for a large number of cycles. As the repetition rate of the laser is in the order of few million cycles per second, the ML estimate reaches CramerRao bound even for exposure duration as small as 10 s. Hence, for most practical SPAD usages, the MLestimate is the best unbiased estimate for solving pileup.
d.2 MAP Estimate
One way to circumvent the limits imposed by CramerRao bound (especially at smaller ) is to use prior knowledge on to get a better estimate. This can be achieved by using the MAP estimate with various priors on . We can assume that is a sparse vector and use regularizer, which is equivalent to modeling s as being Laplacian distributed. We can also use the Gaussian distribution ( regularizer) on , which is another standard prior. However, both Laplacian and Gaussian priors can result in negative values for with nonzero probability. Also, Laplacian and Gaussian priors do not give closed form solutions for and iterative techniques must be employed for solving them. Hence, we use the conjugate prior for the multinomial distribution, which guarantees a closed form solution for .
The conjugate prior for the binomial distribution is the beta distribution. Since in (3) is a multinomial distribution (dimensional binomial), we employ dimensional beta distribution as its conjugate prior. The beta distribution prior restricts the values of all s to the positive side of the real line. Hence, the optimization problem that we have is
where is beta function and and are shape parameters. Solving the above equation, we get the estimate . Note that when , all values of are equally likely, and we get ML estimate. Also, for a large number of cycles s, the data term will dominate the prior knowledge term and will result in ML estimate.
We know that many s are close to zero as the transient response is typically sparse. Therefore, we choose the prior to be more skewed to increase the probability for by choosing and .
In Figure 8, we show that at low integration times (s), the MAP estimate can result in an error that is smaller than both the ML estimate and the CR bound (lower bound on all unbiased estimates). However, as the integration time increases (s), this improvement became very insignificant even under high illumination conditions. The hardware setup in our experiments cannot measure with integration times less than a millisecond, and hence, the comparison between ML estimate and MAP estimate is limited to theoretical and simulation analysis in this paper.
Appendix E Experimental setup
The experimental setup developed for the validation of the pileup compensation framework is shown in Figure 9. A supercontinuum laser source (SuperK extreme, NKT photonics A/S) produces light pulses with 5 ps width at 78 MHz of repetition rate, in a wavelength range from 400 nm to 2400 nm. The repetition rate is controlled with a pulsepicker to decrease the repetition rate up to 2.1 MHz. A tunable bandpass filter (SuperK varia, NKT photonics A/S) is used to select the operating wavelength interval of 550 nm 20 nm, achieving an average light power of around 0.25 mW. The fibercoupled output of the bandpass filter is collimated (PAFX11, Thorlabs Inc.) and passed through a double polarizer to adjust the illuminating power of the entire scene. The beam is then divided into two paths using a 50/50 beam splitter (CM1BS013, Thorlabs Inc.) and directed toward two white cardboardmade patches, placed at different distances from the beam splitter. A part of the photons scattered from the patches can reach the detection system at different time instants, giving rise to two wellseparated peaks in the TCSPC histogram.
The detection system is composed of a fastgated singlephoton counting module [51], hosting a singlepixel 50 m activearea diameter CMOS SPAD. The module exhibits a photon detection efficiency higher than 30% at 550 nm, with a temporal resolution of 35 ps FWHM (fullwidth at half maximum) and a darkcount noise lower than 100 counts per second (cps). The SPAD can be operated either in gatedmode (with adjustable gate width and less than 200 ps transition times) or in freerunning mode, both with selectable holdoff time (120 ns in these experiments, determining a maximum count rate of 7.77 Mcps). The photonout output pulse from the detection module acts as a START for the timemeasurement system, while the synchronization signal from the laser acts as STOP. As a timemeasurement system, we used a PicoHarp 300 TCSPC module (PicoQuant GmbH), able to acquire photon arrival times with 4 ps resolution, 260 ns of range and 10 Mconv/s of maximum conversion rate. The TCSPC histograms are downloaded to a PC via a USB 2.0 interface for data analysis. When operated in gatedmode, the SPAD module also needs a trigger signal for generating the ONtime window; this signal is derived from the laser synchronization signal and properly delayed (using an adjustable picosecond, Microphoton Devices s.r.l.), being able to timeshift the ONtime window position of the SPAD respect to the laser pulse.
Appendix F Experiments
We performed our experiments in the twopath scene shown in Figure 9. If the illumination intensity is highenough or if the background photons are strong enough then multiple photons will arrive at the SPAD within the same laser cycle.
Figure 10 shows the effect of increasing the illumination power for the twopath scene. As the illumination is increased, the first peak becomes stronger, and the second peak decreases. ML estimate from the proposed framework recovers the relative amplitude of both the peaks accurately. We quantify the intensity of the illumination with metric which represents the probability of two or more photons arriving at the SPAD in a laser cycle. From (3), this value is given by . However, as we do not have access to in real data, we approximate with .
Figure 11 shows the effect of increasing the background for the twopath scene. When the background (ambient illumination) is increased steadily, the first peak starts appearing stronger than the second peak, and a further increase in background intensity resulted in transients where the peaks due to the objects are indistinguishable from the peaks due to noise at the early part of the transient. Note that the gate is turned ON around 144 ns for a duration of 5 ns. We can observe that our framework is able to recover the transient even when 90% of the photons are not detected by the SPADbased TCSPC technique due to the dead time.
Appendix G Conclusions
To summarize, when multiple photons arrive at the SPAD in less than a laser cycle, only first photon is detected, and other photons are dropped by the SPAD. We model the photons accepted by the SPAD as a function the truephotons arriving at the SPAD. By inverting this model, we recover the lost photons computationally.
Our inversion algorithm allows for highillumination operation of the SPAD decreasing the integration time. SPADbased techniques such as transient imaging or looking around the corners currently have poor integration times (order of minutes) and cannot image dynamic phenomenon. The integration times of these imaging methods can potentially be decreased by first employing a high power laser and then compensating for the pileup applying the techniques in this paper.
Datadriven priors, based on the imaging application can further improve the transient estimation similar to what we have shown in Section D.2. However, usage of datadriven priors mostly will not lead to simple algorithms that we have focused in this paper, but may potentially decrease the integration times further leading to realtime operation of the nonlineofsight imaging or tracking around the corners.
Acknowledgments
This work was supported in part by NSF CAREER Grants IIS1652633 and CCF1652569, DARPA REVEAL grants HR001116C0028 and HR00111620021, ONR grant N000141512735, and the BigData PrivateCloud Research Cyberinfrastructure MRIaward funded by NSF under grant CNS1338099.
Appendix
Proof for CRBound
Fisher information matrix is
From (3), due to separability of , . If , we have
The CramerRao bound states that cov. As is a diagonal matrix, we have