Sparse Poisson Intensity Reconstruction Algorithms
Abstract
The observations in many applications consist of counts of discrete events, such as photons hitting a detector, which cannot be effectively modeled using an additive bounded or Gaussian noise model, and instead require a Poisson noise model. As a result, accurate reconstruction of a spatially or temporally distributed phenomenon from Poisson data cannot be accomplished by minimizing a conventional objective function. The problem addressed in this paper is the estimation of the underlying intensity from the count observations in an inverse problem setting, where (a) the number of unknowns may potentially be larger than the number of observations and (b) the intensity admits a sparse approximation in some basis. The optimization formulation considered in this paper uses a negative Poisson log-likelihood objective function with nonnegativity constraints (since Poisson intensities are naturally nonnegative). This paper describes computational methods for solving the constrained sparse Poisson inverse problem. In particular, the proposed approach incorporates key ideas of using separable quadratic approximations to the objective function at each iteration and computationally efficient partition-based multiscale estimation methods.
Zachary T. Harmany, Roummel F. Marcia, and Rebecca M. Willett

This work was supported by NSF CAREER Award No. CCF-0643947, NSF Award No. DMS-0811062, DARPA Grant No. HR0011071003, and AFRL Grant No. FA8650-07-D-1221.
Duke University, Department of Electrical and Computer Engineering, Durham, NC 27708 USA 
Index Terms— Photon-limited imaging, Poisson noise, wavelets, convex optimization, sparse approximation, compressed sensing
1 Introduction
In a variety of applications, ranging from nuclear medicine to night vision and from astronomy to traffic analysis, data are collected by counting a series of discrete events, such as photons hitting a detector or vehicles passing a sensor. The measurements are often inherently noisy due to low count levels, and we wish to reconstruct salient features of the underlying phenomenon from these noisy measurements as accurately as possible. The inhomogeneous Poisson process model [1] has been used effectively in many such contexts. Under the Poisson assumption, we can write our observation model as
    y ∼ Poisson(A f*),   (1)

where f* ∈ R^m (f* ≥ 0) is the signal or image of interest, A ∈ R^{n×m} (A ≥ 0) linearly projects the scene onto an n-dimensional set of observations, and y ∈ Z^n is a length-n vector of observed Poisson counts.
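As an illustration (our own sketch, not from the paper), the observation model (1) is easy to simulate with NumPy; all sizes, the sensing matrix, and the sparse intensity below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

m, n = 64, 32                 # unknowns and observations (hypothetical sizes)
A = rng.random((n, m)) / m    # nonnegative sensing matrix, as the Poisson model requires
f_true = np.zeros(m)
f_true[[5, 20, 40]] = [30.0, 50.0, 20.0]  # sparse, nonnegative intensity

lam = A @ f_true              # per-detector Poisson intensities, A f*
y = rng.poisson(lam)          # observed counts, y ~ Poisson(A f*)
```

Note that because A and f* are both nonnegative, the intensities A f* are valid Poisson means.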
The problem addressed in this paper is the estimation of f* from y in a compressed sensing context, when (a) the number of unknowns, m, may be larger than the number of observations, n, and (b) f* is sparse or compressible in some basis (i.e., f* = Wθ* and θ* is sparse). This challenging problem has clear connections to “compressed sensing” (CS) (e.g., [2]), but arises in a number of other settings as well, such as tomographic reconstruction in nuclear medicine, super-resolution image reconstruction in astronomy, and deblurring in confocal microscopy.
In recent work [3], we explored some of the theoretical challenges associated with CS in a Poisson noise setting, and in particular highlighted two key differences between the conventional CS problem and the Poisson CS problem:
- unlike many sensing matrices in the CS literature, the matrix A must contain all nonnegative elements, and

- the intensity f*, and any estimate of f*, must be nonnegative.
Not only are these differences significant from a theoretical standpoint, but they pose computational challenges as well. In this paper, we consider algorithms for computing an estimate f̂ of f* from observations y according to

    f̂ ∈ arg min_{f ∈ Γ}  −log p(y | Af) + pen(f),

where

- p(y | Af) = ∏_{i=1}^{n} e^{−(Af)_i} (Af)_i^{y_i} / y_i!  is the Poisson likelihood of y given intensity Af,

- (Af)_i is the i-th element of the vector Af,

- Γ is a collection of nonnegative estimators such that f ≥ 0 for all f ∈ Γ, and

- pen(f) is a penalty term which measures the sparsity of f in some basis.
This paper explores computational methods for solving this constrained minimization problem. The nonnegativity of f and A results in challenging optimization problems. In particular, the restriction that f is nonnegative introduces a set of inequality constraints into the minimization setup; as shown in [4], these constraints are simple to satisfy when f is sparse in the canonical basis, but they introduce significant challenges when enforcing sparsity in an arbitrary basis. We will consider several variants of the penalty term, including pen(f) = τ||W^T f||_1 for some arbitrary orthonormal basis W, and a complexity regularization penalty based upon recursive dyadic partitions. We refer to our approach as SPIRAL (Sparse Poisson Intensity Reconstruction ALgorithm).
The paper is organized as follows. We first consider related optimization methods for solving the compressed sensing problem in Sec. 2. Next, in Sec. 3, we consider three methods for solving the constrained penalized Poisson likelihood estimation problem with different penalty terms. The first two approaches use an ℓ1 penalty, while the third uses partition-based penalty functions. Experimental results comparing the proposed approaches with expectation-maximization (EM) algorithms are presented in Sec. 4.
2 Sparsity-regularized optimization
Without any nonnegativity constraints on f, the Poisson minimization problem above can be solved using the SpaRSA algorithm of Figueiredo et al. [5], which solves a sequence of minimization problems using quadratic approximations to the log-likelihood term:

    f^{k+1} ∈ arg min_{f}  (1/2)||f − s^k||_2^2 + (1/α_k) pen(f),   (2)

where

    s^k ≡ f^k − (1/α_k) ∇F(f^k),   (3)

α_k is a positive scalar chosen using Barzilai-Borwein (spectral) methods (see [5] for details), and F(f) ≡ −log p(y | Af) is the negative log Poisson likelihood associated with the observation model (1). If the penalty term in (2) is separable, i.e., it is the sum of functions with each function depending only on one component of f, then (2) can be solved componentwise, making the problem (2) relatively easy to solve.
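To make the gradient step in (3) concrete, here is a minimal sketch (our own illustration, not the authors' implementation) of the negative Poisson log-likelihood F, its gradient, and the computation of s^k; the small floor `eps` on Af, an assumption of this sketch, guards the logarithm at zero intensities:

```python
import numpy as np

def neg_log_likelihood(f, A, y, eps=1e-10):
    """F(f) = sum_i (Af)_i - y_i log (Af)_i, dropping the constant log(y_i!) terms."""
    Af = np.maximum(A @ f, eps)
    return float(np.sum(Af - y * np.log(Af)))

def grad_F(f, A, y, eps=1e-10):
    """Gradient of F:  A^T (1 - y / (Af))."""
    Af = np.maximum(A @ f, eps)
    return A.T @ (1.0 - y / Af)

def gradient_step(f_k, A, y, alpha_k):
    """s^k = f^k - (1/alpha_k) grad F(f^k), as in (3)."""
    return f_k - grad_F(f_k, A, y) / alpha_k
```

The vector s^k returned by `gradient_step` is then the center of the separable quadratic subproblem (2).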
Much of the existing CS optimization literature (e.g., [6, 7, 8]) focuses on penalty functions of the form pen(f) = τ||θ||_1, where f = Wθ, W is some orthonormal basis, such as a wavelet basis, and θ are the expansion coefficients in that basis. Also, nonnegativity of the signal is not necessarily enforced. In this paper, we develop three approaches to solving the optimization problem above. In all three approaches, we require f to be nonnegative. The first approach assumes that the image is sparse in the canonical basis, i.e., W = I, while the second involves a more general orthonormal matrix W. The third uses partition-based denoising methods, described in detail in Sec. 3.3.
3 Nonnegative regularized least-squares subproblem
3.1 Sparsity in the canonical basis
Let pen(f) = τ||f||_1, where τ > 0 is a regularization parameter. The minimization problem (2) with this penalty term, subject to nonnegativity constraints on f, has the following analytic solution:

    f^{k+1} = max(s^k − τ/α_k, 0),

where the max operation is to be understood componentwise. Thus (2) with an ℓ1 penalty function and a solution that is sparse in the canonical basis is straightforward to solve, even subject to nonnegativity constraints. An alternative algorithm for solving this Poisson inverse problem with sparsity in the canonical basis was also explored in the recent literature [4].
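In code, this analytic solution is a single vectorized operation; a minimal sketch (function name hypothetical):

```python
import numpy as np

def spiral_step_canonical(s_k, tau, alpha_k):
    """Solve subproblem (2) with pen(f) = tau * ||f||_1 and f >= 0:
    f^{k+1} = max(s^k - tau/alpha_k, 0), componentwise."""
    return np.maximum(s_k - tau / alpha_k, 0.0)
```

The single `max` simultaneously applies the soft threshold and the nonnegativity constraint, since any component that would be soft-thresholded to a negative value is clamped to zero.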
3.2 Sparsity in an arbitrary orthonormal basis
Now suppose that the signal of interest is sparse in some other basis. Then the penalty term is given by pen(f) = τ||θ||_1, where f = Wθ for some orthonormal basis W, and τ > 0 is some scalar. When the reconstruction must be nonnegative (i.e., Wθ ≥ 0), the minimization problem

    minimize_θ  (1/2)||Wθ − s^k||_2^2 + (τ/α_k)||θ||_1   subject to  Wθ ≥ 0   (4)
no longer necessarily has an analytic solution. We can solve this minimization problem by solving its dual. First, we reformulate (4) so that its objective function is differentiable by defining u, v ≥ 0 such that θ = u − v. The minimization problem (4) becomes

    minimize_{u,v}  (1/2)||W(u − v) − s^k||_2^2 + (τ/α_k) 1^T(u + v)   subject to  u ≥ 0,  v ≥ 0,  W(u − v) ≥ 0,   (5)
which has twice as many parameters and additional nonnegativity constraints on the new parameters, but now has a differentiable objective function. The Lagrange dual problem associated with this problem is given by

    maximize_{λ,γ}  −(1/2)||s^k + λ − (τ/α_k) W γ||_2^2 + (1/2)||s^k||_2^2   subject to  λ ≥ 0,  −1 ≤ γ ≤ 1,   (6)

and at the optimal values λ* and γ*, the primal iterate is given by

    f^{k+1} = s^k + λ* − (τ/α_k) W γ*.
We note that the optimal values of the primal problem (5) and its dual (6) coincide, since (5) satisfies (a weakened) Slater’s condition [9]. In addition, the dual objective in (6) is a lower bound on the optimal primal objective value at any dual feasible point.
The objective function of (6) can be maximized by alternately solving for γ and λ, which is accomplished by taking the corresponding partial derivatives and setting them to zero. Each component is then constrained to satisfy the bounds in (6). At the j-th iteration, the dual variables can, thus, be defined as follows:

    γ^j = mid(−1, (α_k/τ) W^T(s^k + λ^{j−1}), 1),
    λ^j = max((τ/α_k) W γ^j − s^k, 0),

where the operator mid(a, b, c) chooses the middle value of the three arguments componentwise, and the max is likewise componentwise. Note that at the end of each iteration j, the approximate solution to (4) is feasible with respect to the nonnegativity constraint:

    f^j = s^k + λ^j − (τ/α_k) W γ^j = max(s^k − (τ/α_k) W γ^j, 0) ≥ 0.
Unfortunately, there are several important disadvantages associated with the above approach. First, the solution to the subproblem must itself be computed by an iterative algorithm; so while the subproblem may be solvable, it must be solved at each outer iteration, and the resulting overall algorithm may not be fast, particularly for large problems. Furthermore, since we rely on an iterative solver for this subproblem, our computed solution may not be the true optimal point, particularly if we use a loose convergence tolerance to limit computation time. Further study is needed to understand the impact of computing inaccurate solutions to this subproblem on the performance of the overall algorithm.
3.3 Partition-based penalties
In the special case where the signal of interest is known to be smooth or piecewise smooth in the canonical basis (i.e., is compressible in a wavelet basis), we can formulate a penalty function which is a useful alternative to the ℓ1 norm of the wavelet coefficient vector. In particular, we can build on the framework of recursive dyadic partitions (RDPs), which we summarize here and which are described in detail in [10, 11]. Let 𝒫 be the class of all recursive dyadic partitions of [0, 1] where each interval in the partition has length at least 1/N, and let P ∈ 𝒫 be a candidate partition. The intensity on P, denoted f(P), is calculated using a nonnegative least-squares method to fit a model (such as a constant or polynomial) to s^k in (3) on each interval in the RDP. Furthermore, a penalty can be assigned to the resulting estimator which is proportional to |P|, the number of intervals in P. Thus we set

    f^{k+1} = f(P̂),   P̂ ∈ arg min_{P ∈ 𝒫}  (1/2)||f(P) − s^k||_2^2 + (τ/α_k)|P|.
The search over 𝒫 can be computed quickly using a dynamic program. When using constant partition interval models, the nonnegative least-squares fit can be computed non-iteratively on each interval by simply taking the maximum of the average of s^k over that interval and zero. Because of this, enforcing the nonnegativity constraints is trivial and can be accomplished very quickly.
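The dynamic program over dyadic intervals can be sketched as follows (our own illustration with constant fits and a generic per-interval penalty `pen`; it minimizes the squared error plus pen times the number of intervals, a rescaling of the subproblem above):

```python
import numpy as np

def rdp_denoise(s, pen):
    """Denoise s over recursive dyadic partitions: for each dyadic interval,
    compare the cost of keeping it whole against the best costs of its two
    halves, and keep the cheaper option. Constant fits max(mean, 0) make the
    output nonnegative by construction."""
    n = len(s)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"

    def best(lo, hi):
        """Return (cost, fitted values) of the best RDP of s[lo:hi]."""
        seg = s[lo:hi]
        fit = max(seg.mean(), 0.0)                 # nonnegative least-squares constant fit
        keep_cost = float(np.sum((seg - fit) ** 2)) + pen
        if hi - lo == 1:
            return keep_cost, np.full(1, fit)
        mid = (lo + hi) // 2
        lcost, lfit = best(lo, mid)
        rcost, rfit = best(mid, hi)
        if lcost + rcost < keep_cost:              # split if the children are cheaper
            return lcost + rcost, np.concatenate([lfit, rfit])
        return keep_cost, np.full(hi - lo, fit)

    return best(0, n)[1]
```

A piecewise constant input with a single breakpoint at the midpoint, for instance, is recovered exactly for moderate penalties, while negative inputs are clamped to zero by the constant fits.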
It can be shown that partition-based denoising methods such as this are closely related to Haar wavelet denoising with an important hereditary constraint placed on the thresholded coefficients: if a parent coefficient is thresholded, then its children coefficients must also be thresholded [10]. This constraint is akin to wavelet-tree ideas, which exploit the persistence of significant wavelet coefficients across scales and have recently been shown to be highly useful in compressed sensing settings [12].
As we mention in Sec. 4 below, using constant model fits makes it easy to satisfy the nonnegativity constraints and results in a very fast non-iterative algorithm, but has the disadvantage of yielding piecewise constant estimates. However, a cycle-spun, translation-invariant version of this approach can be implemented with high computational efficiency [10] and used to solve this nonnegative regularized least-squares subproblem, resulting in a much smoother estimator.
4 Numerical simulations
We evaluate the effectiveness of the proposed approaches in reconstructing a piecewise smooth function from noisy compressive measurements. In our simulations, the true signal (the black line in Figs. 1(a)–(c)) is of length 1024. We take 512 noisy compressive measurements of the signal using a sensing matrix that contains 32 randomly distributed nonzero elements per row. This setup yields a mean detector photon count of 50, ranging from as few as 22 photons to as many as 94 photons. We allowed each algorithm a fixed time budget of three seconds in which to run, which is sufficient to yield approximate convergence for all the methods considered. Each algorithm was initialized at the same starting point, which was generated by applying a single E-step of the EM algorithm to the observed data. The value of the regularization parameter τ was tuned independently for each algorithm to yield the minimal root-mean-squared (RMS) error at the exhaustion of the computation budget (see Table 1).
Table 1: RMS error of each reconstruction method.

Reconstruction Method   RMS
SPIRAL (TI)             0.1427
SPIRAL (TV)             0.1855
EM-MPLE (TI)            0.2485
EM-MPLE (TV)            0.2511
SPIRAL (ℓ1)             0.2898
An examination of the results in Figs. 1(a)–(c) reveals that even though models within a partition (constant pieces) are less smooth than higher-order wavelets, this drawback is neutralized through a combination of cycle spinning (TI), the hereditary constraint, and a fast, non-iterative solver. We compare our algorithm with an EM algorithm based upon the same maximum penalized likelihood estimation (MPLE) objective function, which has been used previously in imaging contexts in which sparsity in a multiscale partition-based framework was exploited [13]. Although Fig. 1(d) shows that the convergence behavior of the EM-MPLE approaches is more stable, their slow convergence ultimately hinders their performance as compared with the corresponding SPIRAL-based approaches.

In the case of SPIRAL (ℓ1), the estimate appears very oscillatory; a smoother estimate could be achieved by increasing the regularization parameter τ, but this leads to an “oversmoothed” solution and increases the RMS error of the estimate. The large RMS values early in the SPIRAL iterations are due to small values of α_k when defining s^k in (3), which occur when the estimates are flat and consecutive iterates change only slightly. The Barzilai-Borwein method [7], on which the choice of α_k is based, is not monotone and can exhibit spikes in the iterates’ RMS values (the nonmonotone SpaRSA algorithm is also not immune to this behavior, for example). Although it is difficult to characterize the convergence of this approach, it has been shown to be effective in solving ℓ1-minimization problems (see [7] for details).

The SPIRAL partition-based estimates appear to oscillate in steady state. This effect could be mitigated by forcing the objective function to decrease monotonically with iteration, as described in [5] and as in EM algorithms. However, empirically this appears to hinder performance, as convergence is significantly slowed.
We are currently investigating methods that use a nonmonotonic approach to obtain a good approximation to the minimizer, followed by a monotonic method to enforce convergence.
[Fig. 1: (a)–(c) Reconstructions produced by the compared methods, with the true signal in black; (d) convergence behavior (RMS error over the computation budget).]
5 Conclusion
We have developed computational approaches for signal reconstruction from photon-limited measurements, a situation prevalent in many practical settings. Our method optimizes a regularized Poisson likelihood under nonnegativity constraints. We have demonstrated that these methods prove effective in the compressed sensing context, where an ℓ1 penalty is used to encourage sparsity of the resulting solution. Our method improves upon current approaches in terms of reconstruction accuracy and computational efficiency. By employing model-based estimates that utilize structure in the coefficients beyond that of a parsimonious representation, we are able to achieve greater accuracy with less computational burden. Future work includes supplementing our algorithms with a debiasing stage, which maximizes the likelihood over the sparse support discovered in the main algorithm. We will also explore the efficacy of alternative optimization approaches such as sequential quadratic programming and interior-point methods.
References
 [1] D. Snyder, Random Point Processes, Wiley-Interscience, New York, NY, 1975.
 [2] E. J. Candès and T. Tao, “Decoding by linear programming,” IEEE Trans. Inform. Theory, vol. 51, no. 12, pp. 4203–4215, 2005.
 [3] R. Willett and M. Raginsky, “Performance bounds for compressed sensing with Poisson noise,” in IEEE Int. Symp. on Inf. Theory, 2009, Accepted.
 [4] D. Lingenfelter, J. Fessler, and Z. He, “Sparsity regularization for image reconstruction with Poisson data,” in Proc. SPIE Computational Imaging VII, 2009, vol. 7246.
 [5] S. Wright, R. Nowak, and M. Figueiredo, “Sparse reconstruction by separable approximation,” IEEE Trans. Signal Process., 2009 (to appear).
 [6] S.-J. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky, “An interior-point method for large-scale l1-regularized least squares,” IEEE J. Sel. Top. Sign. Process., vol. 1, no. 4, pp. 606–617, 2007.
 [7] M. Figueiredo, R. D. Nowak, and S. J. Wright, “Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems,” IEEE J. Sel. Top. Sign. Process., vol. 1, no. 4, pp. 586–597, 2007.
 [8] W. Yin, S. Osher, D. Goldfarb, and J. Darbon, “Bregman iterative algorithms for l1minimization with applications to compressed sensing,” SIAM J. Imag. Sci., vol. 1, no. 1, pp. 143–168, 2008.
 [9] S. Boyd and L. Vandenberghe, Convex optimization, Cambridge University Press, Cambridge, 2004.
 [10] R. Willett and R. Nowak, “Multiscale Poisson intensity and density estimation,” IEEE Trans. Inform. Theory, vol. 53, no. 9, pp. 3171–3187, 2007.
 [11] R. Nowak, U. Mitra, and R. Willett, “Estimating inhomogeneous fields using wireless sensor networks,” IEEE J. Sel. Areas Commun., vol. 22, no. 6, pp. 999–1006, 2004.
 [12] R. Baraniuk, V. Cevher, M. F. Duarte, and C. Hegde, “Model-based compressive sensing,” Submitted, 2009.
 [13] R. Willett and R. Nowak, “Platelets: a multiscale approach for recovering edges and surfaces in photonlimited medical imaging,” IEEE Transactions on Medical Imaging, vol. 22, no. 3, pp. 332–350, 2003.