Spatiotemporal Compressed Sensing with Coded Apertures and Keyed Exposures
Abstract
Optical systems which measure independent random projections of a scene according to compressed sensing (CS) theory face a myriad of practical challenges related to the size of the physical platform, photon efficiency, the need for high temporal resolution, and fast reconstruction in video settings. This paper describes a coded aperture and keyed exposure approach to compressive measurement in optical systems. The proposed projections satisfy the Restricted Isometry Property for sufficiently sparse scenes, and hence are compatible with theoretical guarantees on the video reconstruction quality. These concepts can be implemented in both space and time via either amplitude modulation or phase shifting, and this paper describes the relative merits of the two approaches in terms of theoretical performance, noise and hardware considerations, and experimental results. Fast numerical algorithms which account for the nonnegativity of the projections and temporal correlations in a video sequence are developed and applied to microscopy and shortwave infrared data.
I Introduction
The theory of compressed sensing (CS) suggests that we can collect high-resolution imagery with relatively few photodetectors or a small focal plane array (FPA) when the scene is sparse or compressible in some dictionary or basis and the measurements are chosen appropriately [1, 2]. There has been significant recent interest in building imaging systems in a variety of contexts to exploit CS (cf. [3, 4, 5, 6, 7, 8, 9, 10, 11, 12]). By designing optical sensors to collect measurements of a scene according to CS theory, we can use sophisticated computational methods to infer critical scene structure and content. One particularly famous example in optical imaging is the Single Pixel Camera [13], which collects individual projections of the scene sequentially. While these measurements are supported by the CS literature, there are several practical challenges associated with the tradeoff between temporal resolution and physical system footprint. In this paper we describe an alternative approach to designing a low frame-rate snapshot CS camera which naturally parallelizes the compressive data acquisition. Our approach is based on two imaging techniques called coded apertures and keyed exposures, which we explain next.
Coded apertures [14, 15] have a long history in low-light astronomical applications. Coded apertures were first developed to increase the amount of light hitting a detector in an optical system without sacrificing resolution (by, say, increasing the diameter of an opening in a pinhole camera). The basic idea is to use a mask, i.e., an opaque rectangular plate with a specified pattern of openings, that allows significantly brighter observations with higher signal-to-noise ratio than those from conventional pinhole cameras [14, 15]. These masks encode the image before detection, inducing a more complicated point spread function than that associated with a pinhole aperture. The original scene is then recovered from the encoded observations in post-processing using an appropriate reconstruction algorithm which exploits the mask pattern. These multiplexing techniques are particularly popular in astronomical [16, 17] and medical [18, 19, 20] applications because of their efficacy at wavelengths where lenses cannot be used, but recent work has also demonstrated their utility for collecting both high-resolution images and object depth information simultaneously [21].
Keyed exposure (also called coded exposure [22], flutter shutter [23], or coded strobing [24]) imaging was initially developed to facilitate motion deblurring in video using a relatively low frame rate. In some cases motion has been inferred from a single keyed exposure snapshot. The basic idea is that the camera sensor continuously collects light while the shutter is rapidly opened and closed; the shutter movement effectively modulates the motion blur point spread function, and with well-chosen shutter movement patterns it becomes possible to deblur moving objects. Similar effects can be achieved using a strobe light instead of moving a shutter.
Despite the utility of the above methods in specific settings, they both face some limitations. The design of conventional coded apertures does not account for the inherent structure and compressibility of natural scenes, nor the potential for nonlinear reconstruction algorithms. Likewise, existing keyed exposure methods focus on direct (uncoded) measurements of the spatial content of the scene and have limited reconstruction capabilities, as we detail below. The Coded Aperture Keyed Exposure (CAKE) sensing paradigm we propose in this paper is designed to allow nonlinear high-resolution video reconstruction from relatively few measurements in more general settings.
I-A Problem Formulation
We consider the problem of reconstructing an $F$-frame video sequence $f^* = \{f^*_1, \ldots, f^*_F\}$, where each frame is an $N_1 \times N_2$ two-dimensional image denoted $f^*_t$. Using standard vector representation, we have $f^*_t \in \mathbb{R}^N$ for $t = 1, \ldots, F$, where $N = N_1 N_2$ is the total number of pixels. As a result, the vector representation of the video sequence is $f^* = [(f^*_1)^T, \ldots, (f^*_F)^T]^T \in \mathbb{R}^{NF}$.
The observations $y$ of $f^*$ are also acquired as a video sequence. We do not assume that the observations are acquired at the same rate at which we will ultimately reconstruct $f^*$. In general, we assume $y$ is a $K$-frame video sequence, with each frame of size $n_1 \times n_2$. Similarly to $f^*$, we have $y_t \in \mathbb{R}^n$ for $t = 1, \ldots, K$, where $n = n_1 n_2$, and therefore $y \in \mathbb{R}^{nK}$.
We observe $f^*$ via a spatiotemporal sensing matrix $A \in \mathbb{R}^{nK \times NF}$ which linearly projects the spatiotemporal scene onto an $nK$-dimensional set of observations:

$$y = A f^* + w, \qquad (1)$$

where $w$ is noise associated with the physics of the sensor.
CS optical imaging systems must be designed to meet several competing objectives:

- The sensing matrix $A$ must satisfy some necessary criterion (such as the RIP, defined below) which provides theoretical guarantees on the accuracy with which we can estimate $f^*$ from $y$.
- The total number of measurements, $nK$, must be lower than the total number of pixels to be reconstructed, $NF$. This is achievable via compressive spatial acquisition ($n < N$), frame rate reduction ($K < F$), or simultaneous spatiotemporal compression.
- The measurements $y$ must be causally related to the temporal scene $f^*$, which restricts the structure of the projections $A$.
- The optical measurements modeled by $A$ must be implementable in a way that results in a smaller, cheaper, more robust, or lower power system.
- The sensing matrix structure must facilitate fast reconstruction algorithms.
This paper demonstrates that compressive Coded Aperture Keyed Exposure systems achieve all these objectives.
I-B Contributions
The primary contribution of this paper is the design and theoretical characterization of compressive Coded Aperture Keyed Exposure (CAKE) sensing. We explore amplitude modulating and phase shifting masks and describe theoretical and implementation aspects of both. We further describe how keyed exposure ideas can be used in conjunction with coded apertures to increase both the spatial and temporal resolution of video from relatively few measurements. We prove hitherto unknown theoretical properties of such systems and demonstrate their efficacy in several simulations. In addition, we discuss several important algorithmic aspects of our approach, including a mean-subtraction preprocessing step which allows us to sidestep challenging theoretical aspects associated with nonnegative sensing matrices (such as amplitude modulating coded apertures). This paper builds substantially upon earlier preliminary studies by the authors [25, 26, 27] and related independent work by Romberg [28].
I-C Organization of the Paper
The paper is organized as follows. Section II-A describes conventional coded aperture imaging techniques and Section II-B describes keyed exposure techniques currently in the literature. We describe the compressive sensing problem and formulate it mathematically in Section II-C. In Section III, we show how CS theory can be used for constructing coded aperture masks that can easily be implemented for improving image reconstruction resolution in snapshot imaging; this includes theoretical results, a discussion of implementation details and tradeoffs in practical optical systems, and experimental results. We consider applications to video compressed sensing using the full CAKE paradigm in Section IV, including theory and experimental results.
II Background
Prior to detailing our main contributions, we first review pertinent background material. This review touches upon the development of coded aperture imaging, coded exposure photography, and salient concepts in compressed sensing theory.
II-A Coded Aperture Imaging
Seminal work in coded aperture imaging includes the development of masks based on Hadamard transform optics [29] and pseudorandom phase masks [30]. Modified Uniformly Redundant Arrays (MURAs) [31] are generally accepted as optimal mask patterns for coded aperture imaging. These mask patterns (which we denote by $h$) are binary, square patterns whose grid size matches the spatial resolution of the photodetector. Each mask pattern is specifically designed to have a complementary pattern $\tilde h$ such that $h \circledast \tilde h$ is a single peak with flat sidelobes (i.e., a Kronecker delta function).
In practice, the resolution of a detector array dictates the properties of the mask pattern and hence the resolution at which $f^*$ can be reconstructed. We model this effect as $f^*$ being downsampled to the resolution of the detector array and then convolved with the mask pattern $h$, which has the same resolution as the FPA and the downsampled $f^*$, i.e.,

$$y = \mathcal{D}(f^*) \circledast h + w, \qquad (2)$$

where $\circledast$ denotes circular convolution, $w$ corresponds to noise associated with the physics of the sensor, and $\mathcal{D}(f^*)$ is the integration downsampling of the scene, which consists of partitioning $f^*$ into $n$ uniformly sized blocks and measuring the total intensity in each block.
Because of the construction of $h$ and $\tilde h$, $f^*$ can be reconstructed using $\hat f = y \circledast \tilde h$. However, the resulting resolution is often lower than what is necessary to capture some of the desired details in the image. Clearly, the estimates from MURA reconstruction are limited by the spatial resolution of the photodetector. Thus, high-resolution reconstructions cannot generally be obtained from low-resolution MURA-coded observations.
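As a toy illustration of the encode/decode pipeline in (2) — a sketch only: we use a contrived three-pinhole mask whose transfer function is provably nonvanishing, and decode by FFT-domain inverse filtering as a stand-in for correlation with a true complementary MURA pattern:

```python
import numpy as np

# Toy sketch of the coded aperture encode/decode pipeline in (2).  A true
# MURA mask h has a complementary pattern h_tilde with h (*) h_tilde = delta;
# here a small multi-pinhole mask (pinhole weights summing to < 2 so the
# transfer function magnitude stays >= 0.3) stands in, and decoding is done
# by FFT-domain inverse filtering instead of MURA correlation.
N = 32
f = np.zeros((N, N))
f[8:12, 20:25] = 1.0                         # simple high-resolution scene

h = np.zeros((N, N))                         # "coded aperture": three pinholes
h[0, 0], h[3, 5], h[10, 2] = 1.0, 0.4, 0.3   # |H(k)| >= 1 - 0.4 - 0.3 = 0.3

H = np.fft.fft2(h)
y = np.real(np.fft.ifft2(np.fft.fft2(f) * H))       # y = f (*) h, circular

H_tilde = np.conj(H) / np.abs(H) ** 2               # decoding transfer function
f_hat = np.real(np.fft.ifft2(np.fft.fft2(y) * H_tilde))

err = np.max(np.abs(f_hat - f))                     # exact up to float precision
```

Because the transfer function never vanishes, the linear decoding here is exact; a genuine MURA pair achieves an analogous result with a binary mask and a correlation-based decoder.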
It can be shown that this mask design and reconstruction yield minimal reconstruction errors at the FPA resolution, subject to the constraint that linear, convolution-based reconstruction methods are used. However, when the scene of interest is sparse or compressible, and nonlinear sparse reconstruction methods may be employed, then CS ideas can be used to design coded apertures which yield higher resolution images. Before describing the details of this, we briefly review two key relevant concepts from CS.
II-B Coded (Keyed) Exposure Imaging
Coded (or keyed) exposures were developed recently in the computational photography community. Initial work in this area was focused on engineering the temporal component of a motion blur point spread function by rapidly opening and closing the shutter during a single exposure or a small number of exposures at a low frame rate [22, 23]. That is,

$$y = \sum_{t \in \mathcal{T}} f^*_t + w,$$

where the keyed exposure (KE) measurement matrix selects the subset of frames during which the shutter is open. We refer to this subset $\mathcal{T} \subseteq \{1, \ldots, F\}$ as the exposure code.
If an object is moving during image acquisition, then a static shutter would induce a typical motion blur, making the moving object difficult to resolve with standard deblurring methods. However, by “fluttering” the shutter during the exposure using carefully designed patterns, the induced motion blur can be made invertible and moving objects can be accurately reconstructed. Instead of a moving shutter, more recent work uses a strobe light to produce a similar effect [24].
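The invertibility argument can be checked numerically. The open/close pattern below is an arbitrary illustrative sequence, not a code from the literature; the point is that a box-shaped exposure (static open shutter) has exact nulls in its frequency response, while an irregular flutter pattern typically keeps all frequencies alive:

```python
import numpy as np

# Why "fluttering" helps: compare the frequency response of a static open
# shutter (box) with an irregular open/close pattern over the same exposure.
# A null in the response means that frequency band of the motion blur is
# unrecoverable by any deblurring method.
L = 16
box = np.array([1.0] * 8 + [0.0] * 8)        # shutter open for 8 of 16 slots
coded = np.array([1, 0, 1, 1, 0, 0, 1, 0,
                  1, 1, 1, 0, 0, 1, 0, 1], dtype=float)  # illustrative flutter

box_min = np.min(np.abs(np.fft.fft(box)))    # exact zeros -> non-invertible
coded_min = np.min(np.abs(np.fft.fft(coded)))  # bounded away from zero here
```

Designed codes in the coded exposure literature are chosen to maximize this minimum response (and related criteria) rather than picked ad hoc as above.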
While this novel approach to video acquisition can produce very accurate deblurred images of moving objects, there is significant overhead associated with the reconstruction process. To see why, note that every object moving with a different velocity or trajectory will produce a different motion blur. This means that (a) any stationary background must be removed during preprocessing and (b) multiple moving objects must be separated and processed individually.
More recently, it was shown that these challenges could be sidestepped when the video is temporally periodic (e.g., consider a video of an electronic toothbrush spinning) [24]. The periodic assumption amounts to a sparse temporal Fourier transform of the video, and this approach, called coded strobing, is a compressive acquisition in the temporal domain. As a result, the authors were able to leverage ideas from compressed sensing to achieve highquality video reconstruction.
The assumption of a periodic video makes it possible to apply much more general reconstruction algorithms that do not require background subtraction or separating different moving components. However, it is a very strong assumption, which places some limits on its applicability to real-world settings. The approach described in this paper has similar performance guarantees but operates on much more general video sequences.
II-C Compressed Sensing
In this section we briefly define the Restricted Isometry Property (RIP) and explain its significance to reconstruction performance. In subsequent sections, we demonstrate our primary theoretical contribution, which is to prove the RIP for compressive coded aperture and keyed exposure systems.
Definition 1 (Restricted Isometry Property (RIP) [32]).
A matrix $A$ satisfies the RIP of order $s$ if there exists a constant $\delta_s \in (0, 1)$ for which

$$(1 - \delta_s)\|f\|_2^2 \le \|A f\|_2^2 \le (1 + \delta_s)\|f\|_2^2 \qquad (3)$$

holds for all $s$-sparse $f$. If this property holds, we say $A$ is RIP-$(s, \delta_s)$.
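The near-isometry in (3) is easy to observe empirically for a dense i.i.d. Gaussian matrix (a Monte Carlo sanity check on random sparse vectors, not a proof of the RIP, which quantifies over all sparse vectors):

```python
import numpy as np

# Empirical illustration of (3): for an i.i.d. Gaussian matrix scaled by
# 1/sqrt(m), the energy of random s-sparse vectors is nearly preserved.
rng = np.random.default_rng(1)
m, N, s = 80, 256, 5
A = rng.standard_normal((m, N)) / np.sqrt(m)

ratios = []
for _ in range(100):
    f = np.zeros(N)
    support = rng.choice(N, size=s, replace=False)
    f[support] = rng.standard_normal(s)
    ratios.append(np.sum((A @ f) ** 2) / np.sum(f ** 2))
ratios = np.array(ratios)    # each ratio should lie in [1 - delta, 1 + delta]
```

The observed ratios concentrate around 1 with spread on the order of $\sqrt{2/m}$, consistent with a small isometry constant for this problem size.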
Matrices which satisfy the RIP are called CS matrices; when combined with sparse recovery algorithms, they are guaranteed to yield accurate estimates of the underlying function $f^*$:
Theorem 1 (Sparse Recovery with RIP Matrices [33, 34]).
Let $A$ be a matrix satisfying RIP-$(2s, \delta_{2s})$ with $\delta_{2s} < \sqrt{2} - 1$, and let $y = A f^* + w$ be a vector of noisy observations of any signal $f^*$, where $w$ is a noise or error term with $\|w\|_2 \le \epsilon$ for some $\epsilon \ge 0$. Let $f^*_s$ be the best $s$-sparse approximation of $f^*$; that is, $f^*_s$ is the approximation obtained by keeping the $s$ largest entries of $f^*$ and setting the others to zero. Then the estimate

$$\hat f = \arg\min_f \|f\|_1 \ \text{ subject to } \ \|y - A f\|_2 \le \epsilon \qquad (4)$$

obeys

$$\|\hat f - f^*\|_2 \le C_1 \epsilon + C_2 \, s^{-1/2} \|f^* - f^*_s\|_1,$$

where $C_1$ and $C_2$ are constants which depend on $\delta_{2s}$ but not on $\epsilon$ or $f^*$.
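A minimal sketch of sparse recovery in the spirit of (4), using the iterative soft-thresholding algorithm (ISTA) on the unconstrained Lagrangian form rather than the constrained program in the theorem, with a generic Gaussian matrix standing in for the coded-aperture operators developed later:

```python
import numpy as np

# ISTA sketch for  min_f 0.5*||y - A f||_2^2 + tau*||f||_1,
# a Lagrangian relative of the constrained program (4).
rng = np.random.default_rng(2)
m, N, s = 80, 256, 5
A = rng.standard_normal((m, N)) / np.sqrt(m)

f_true = np.zeros(N)
support = rng.choice(N, size=s, replace=False)
f_true[support] = rng.choice([-1.0, 1.0], size=s)
y = A @ f_true                            # noiseless measurements

tau = 0.02
L = np.linalg.norm(A, 2) ** 2             # Lipschitz constant of the gradient
f = np.zeros(N)
for _ in range(2000):
    g = f - (A.T @ (A @ f - y)) / L       # gradient step on the data-fit term
    f = np.sign(g) * np.maximum(np.abs(g) - tau / L, 0.0)   # soft threshold

rel_err = np.linalg.norm(f - f_true) / np.linalg.norm(f_true)
```

The solvers used later in the paper are accelerated variants of this basic iteration; its cost is dominated by the two matrix-vector products per step, which motivates the fast-FFT structure exploited in Section III.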
III Compressive Coded Apertures for Snapshot Imaging
This section first considers a snapshot acquisition model. Our goal is to recover a static high-resolution scene from a single image where all pixels are collected simultaneously. In terms of the notation in Sec. I-A, we have $F = K = 1$, so that $A$ is of size $n \times N$. The sensing matrix for compressive coded aperture (CCA) systems can be modeled mathematically as

$$A = D R_h, \qquad (6)$$

where $R_h$ denotes circular convolution with a coding mask $h$, and $D$ is a subsampling operator (detailed below). Here the coding mask, $h$, is at the size and resolution at which $f^*$ will be reconstructed; this is in contradistinction to the MURA system, in which $h$ is at the size and resolution of the FPA. Thus in (6), we model the measurements as the scene being convolved with the coded mask and then downsampled.
Using a similar model, Romberg [28] conducted related work concurrent with and independent of our initial investigations [25]. We will summarize the key features of this model and compare with our approach in Sec. III-B. While these models share common elements, we will see in Sec. III-C that there are important tradeoffs associated with the theory and implementation of each strategy.
Recent work by Bajwa et al. [35, 36] showed that random circulant matrices (and Toeplitz matrices in general) are sufficient to recover sparse $f^*$ from $y$ with high probability. In particular, they showed that a Toeplitz matrix whose first row contains elements drawn independently from a Gaussian distribution is RIP-$(s, \delta_s)$ when $n \ge C s^2 \log N$ for some constant $C$. Here, we extend these results to pseudo-circulant matrices and use them to motivate our mask design. Our model differs from that in [28] in that we consider a different generative model for the coded aperture mask, as well as a different subsampling strategy:

- The elements of the coding mask are generated i.i.d. according to a particular generating distribution (e.g., an appropriately scaled Rademacher, uniform, or Gaussian distribution).
- Our analysis allows for deterministic downsampling, in which we collect one sample per nonoverlapping block.
In our notation, we distinguish mask patterns in this manner using the abbreviations BS, US, and GS, where the first character denotes the distribution used to generate the mask (i.e., Binary, Uniform, or Gaussian), and the second (S) is a reminder that these masks are generated directly in the Spatial domain.
III-A CCAs Generated in the Spatial Domain
The two-dimensional convolution of $h$ with an image as in (6) can be represented as the application of the Fourier transform to $h$ and the image, followed by element-wise multiplication and application of the inverse Fourier transform. In matrix notation, this series of linear operations can be expressed as

$$R_h f = F^{-1} C_h F f, \qquad (7)$$

where $f$ is a vectorized representation of an image, $F$ is the two-dimensional Fourier transform matrix, and $C_h$ is a diagonal matrix whose elements correspond to the transfer function $\hat h$, which is the Fourier transform of $h$. The matrix product $R_h \triangleq F^{-1} C_h F$ is block-circulant and each block is in turn circulant. In matrix notation, $R_h$ consists of $N_1 \times N_1$ blocks,

$$R_h = \begin{bmatrix} H_1 & H_{N_1} & \cdots & H_2 \\ H_2 & H_1 & \cdots & H_3 \\ \vdots & & \ddots & \vdots \\ H_{N_1} & H_{N_1 - 1} & \cdots & H_1 \end{bmatrix}, \qquad (8)$$

where each $H_i$ is circulant; i.e., $H_i$ is of the form

$$H_i = \begin{bmatrix} h_{i,1} & h_{i,N_2} & \cdots & h_{i,2} \\ h_{i,2} & h_{i,1} & \cdots & h_{i,3} \\ \vdots & & \ddots & \vdots \\ h_{i,N_2} & h_{i,N_2 - 1} & \cdots & h_{i,1} \end{bmatrix}$$

(see Fig. 1). This block-circulant with circulant-block (BCCB) structure of $R_h$ is a direct result of the fact that the $N_2$-point one-dimensional Fourier transform diagonalizes any circulant matrix (such as $H_i$), and so $F = F_{N_1} \otimes F_{N_2}$ diagonalizes block-circulant matrices (such as $R_h$). Here $\otimes$ denotes the Kronecker matrix product.
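The equivalence between applying the explicit BCCB matrix of (7)-(8) and FFT-domain pointwise multiplication can be verified directly on a small example:

```python
import numpy as np

# Verify the BCCB structure behind (7)-(8): applying the explicit
# block-circulant matrix R_h equals FFT-domain pointwise multiplication.
rng = np.random.default_rng(3)
N1, N2 = 4, 5
h = rng.standard_normal((N1, N2))
f = rng.standard_normal((N1, N2))

# Explicit R_h: row (p,q), column (a,b) holds h[(p-a) mod N1, (q-b) mod N2],
# with images vectorized in row-major order.
R = np.zeros((N1 * N2, N1 * N2))
for p in range(N1):
    for q in range(N2):
        for a in range(N1):
            for b in range(N2):
                R[p * N2 + q, a * N2 + b] = h[(p - a) % N1, (q - b) % N2]

direct = (R @ f.ravel()).reshape(N1, N2)
via_fft = np.real(np.fft.ifft2(np.fft.fft2(h) * np.fft.fft2(f)))
gap = np.max(np.abs(direct - via_fft))
```

This identity is what makes the matrix-vector products in the reconstruction algorithms of Section III-D fast: $R_h$ is never formed explicitly in practice.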
We now examine the generation of a compressed sensing matrix from the convolution by restricting the number of measurements collected of the vector $R_h f$. Here we simply subsample the vector by applying a pointwise subsampling matrix $D$. The operation of applying $D$ consists of retaining only one measurement per uniformly sized block, i.e., we subsample by $m_1$ in the first coordinate and $m_2$ in the second coordinate so that the result is an $n_1 \times n_2$ image with $n_i = N_i / m_i$. In matrix form, $D$ can be thought of as retaining $n$ of the rows of the $N \times N$ identity matrix. Because of the structure and deterministic nature of this type of downsampling, it is often more straightforward to realize in practical imaging hardware (see Sec. III-C).
The resulting projection matrix is then given by $A = D R_h$. We now show that if the elements of $h$ are drawn from an appropriate probability distribution, then $A$ will be RIP-$(s, \delta_s)$ with high probability.
Theorem 2 (Spatial-Domain CCA Sensing).
Let $h_{\rm BS}$ be a mask with entries generated i.i.d. according to the scaled Rademacher distribution

$$h_{i,j} = \begin{cases} +1/\sqrt{n} & \text{with probability } 1/2, \\ -1/\sqrt{n} & \text{with probability } 1/2, \end{cases} \qquad (9)$$

for $i = 1, \ldots, N_1$ and $j = 1, \ldots, N_2$. Let $A_{\rm BS} = D R_{h_{\rm BS}}$ be an $n \times N$ matrix, where $R_{h_{\rm BS}}$ is the BCCB matrix generated from $h_{\rm BS}$, and $D$ is the pointwise subsampling matrix. Then there exist constants $c_1, c_2 > 0$ depending on $\delta_s$ such that for any

$$s \le c_1 \sqrt{n / \log N}, \qquad (10)$$

$A_{\rm BS}$ is RIP-$(s, \delta_s)$ with probability exceeding

$$1 - \exp(-c_2 \, n / s^2). \qquad (11)$$
Remark 1.
This binary-valued distribution was selected to model coded apertures with two states – open and closed – per mask element. We discuss the issue of implementing these masks in Section III-C1.
Remark 2.
It is straightforward to extend the result to other mask generating distributions, such as uniform and Gaussian, which we denote $h_{\rm US}$ and $h_{\rm GS}$, with associated sensing matrices $A_{\rm US}$ and $A_{\rm GS}$.
For successful recovery, the coded aperture masks are designed to satisfy (3) with high probability when $n = O(s^2 \log N)$. Note that this is somewhat weaker than what can be achieved by a purely unstructured i.i.d. randomly generated sensing matrix, which satisfies (3) with high probability when $n \ge C s \log(N/s)$ for some constant $C$. Intuition may lead one to believe the extra factor of $s$ is due to the fact that the projections sensed using the amplitude modulated mask framework exhibit dependencies. However, in many settings this theoretical disadvantage is offset by advantageous practical implementations.
Scenes with Sparse Gradients
The above theory is applicable to reconstructing $f^*$ when $f^*$ is sparse in the pixel basis; similar results hold when the gradient of $f^*$ is sparse. For simplicity of notation we show this in a 1-d setting; the extension to 2-d is straightforward. Let $\Delta$ denote the first-order difference operator, so that

$$\Delta = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ -1 & 1 & & \vdots \\ & \ddots & \ddots & 0 \\ 0 & & -1 & 1 \end{bmatrix}. \qquad (12)$$

As noted in [35], $R_h f = (R_h \Delta^{-1})(\Delta f)$, where $\Delta f$ is the sparse gradient image. Thus if we sense $y = D R_h f$, we can expect to recover $\Delta f$ using sparse reconstruction methods.
III-B CCAs Generated in the Fourier Domain
The random convolution in [28] follows the same structure as that in Sec. III-A. However, in that work the convolution kernel is generated randomly in the frequency domain. More specifically, the entries of the transfer function correspond to random phase shifts (with some constraints to keep the resulting observation matrix real-valued). We denote the resulting convolution kernel as $h_{\rm UP}$, where "UP" refers to "uniform phase". For simplicity of presentation, we describe the generating distribution for a one-dimensional convolution for even-length masks. In particular, let $\hat h = F h_{\rm UP}$, and generate $\hat h$ such that for $k = 0, \ldots, N-1$,

$$\hat h_k = \begin{cases} \pm 1 \text{ with equal probability}, & k = 0, \\ e^{j\theta_k}, \ \theta_k \sim \mathrm{Unif}[0, 2\pi), & 1 \le k < N/2, \\ \pm 1 \text{ with equal probability}, & k = N/2, \\ \overline{\hat h_{N-k}}, & N/2 < k \le N-1. \end{cases} \qquad (13)$$

Here $\mathrm{Unif}[0, 2\pi)$ denotes the uniform distribution over $[0, 2\pi)$. The real-valued convolution kernel is then given by $h_{\rm UP} = F^{-1} \hat h$. This result can be extended easily to two-dimensional convolutions.
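The one-dimensional construction in (13) can be sketched as follows; the conjugate-symmetry step is what guarantees a real-valued kernel:

```python
import numpy as np

# Sketch of the uniform-phase construction in (13), 1-D, even length N:
# unit-magnitude Fourier coefficients with random phases, random signs at
# the DC and Nyquist bins, conjugate symmetry elsewhere.
rng = np.random.default_rng(4)
N = 16                                    # even mask length
H = np.zeros(N, dtype=complex)
H[0] = rng.choice([-1.0, 1.0])            # DC bin: random sign
H[N // 2] = rng.choice([-1.0, 1.0])       # Nyquist bin: random sign
theta = rng.uniform(0.0, 2 * np.pi, N // 2 - 1)
H[1 : N // 2] = np.exp(1j * theta)        # random phases, unit magnitude
H[N // 2 + 1 :] = np.conj(H[1 : N // 2][::-1])   # conjugate symmetry

h_up = np.fft.ifft(H)
imag_part = np.max(np.abs(h_up.imag))     # ~0: the kernel is real-valued
energy = np.sum(np.abs(h_up) ** 2)        # Parseval: unit-modulus transfer
```

Since the transfer function has unit magnitude at every frequency, convolution with $h_{\rm UP}$ is an isometry before downsampling, in contrast to the spatially generated masks of Sec. III-A.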
To form a compressed sensing matrix from this random convolution, [28] considers two different downsampling strategies: sampling at random locations, and random demodulation. The first method entails selecting a random subset $\Omega$ of $n$ indices. We form a downsampling matrix $D_\Omega$ by retaining only the rows of the identity matrix indexed by $\Omega$; hence the resulting measurement matrix is given by

$$A_{\rm UP} = D_\Omega R_{h_{\rm UP}}, \qquad (14)$$

where $h_{\rm UP}$ is generated according to the two-dimensional analog of (13). The random demodulation method multiplies the result of the random convolution by a random sign sequence $\sigma$, such that $\sigma_i = \pm 1$ with equal probability for all $i$, then performs an integration downsampling of the result. Therefore in this case the measurement matrix is

$$A_{\rm RD} = D_I \Sigma R_{h_{\rm UP}}, \qquad (15)$$

where $\Sigma = \mathrm{diag}(\sigma)$ and $D_I$ denotes integration downsampling.
It can be shown that both of these strategies yield RIPsatisfying matrices with high probability.
Theorem 3 (Fourier-Domain CCA Sensing).
Let $A$ be an $n \times N$ sensing matrix resulting from random convolution with phase shifts followed by a downsampling strategy, and let $\Psi$ denote an arbitrary orthonormal basis. If the downsampling is random subsampling as in (14), then there exists a constant $c_1$ such that with probability exceeding $1 - O(N^{-1})$, for

$$s \le c_1 \, n / \log N,$$

$A\Psi$ satisfies RIP-$(s, \delta_s)$. If the downsampling is random demodulation as in (15), then there exists a constant $c_2$ such that with probability exceeding $1 - O(N^{-1})$, for

$$s \le c_2 \, n / \log^2 N,$$

$A\Psi$ satisfies RIP-$(s, \delta_s)$.
These theoretical results are stronger than those in Theorem 2, especially since they allow sparsity in arbitrary orthonormal bases. However, there are important differences between the observation models which have a significant impact on the feasibility and ease of hardware implementation. We elaborate on this in Section III-C. Nevertheless, the theory developed in [28] lends important theoretical support to the general concept of compressive coded apertures.
III-C Hardware and Practical Implementation Considerations
In this section we describe how we shift from modeling the coded aperture masks in a way that is compatible with compressed sensing theory to a model that describes their actual implementation in an optical system. Our analysis does not account for the bandwidth of the lenses; in particular, we implicitly assume that the bandwidth of the lenses is high enough that bandlimitation effects are negligible at the resolution of interest. In all of the hardware settings described below, precise alignment of the optical components (e.g., the mask and the focal plane array) is critical to the performance of the proposed system. Often a high-resolution FPA is helpful for alignment and calibration.
In this paper we focus on incoherent light settings (consistent with many applications in astronomy, microscopy, and infrared imaging). In this case, the coded aperture must be real-valued and flux-preserving (i.e., the light intensity hitting the detector cannot exceed the light intensity of the source). In this section, we consider the following apertures:

- Binary Spatial Mask: $h_{i,j} \in \{0, 1\}$, drawn with equal probability,
- Uniform Spatial Mask: $h_{i,j} \in [0, 1]$, where each element is drawn independently from a uniform distribution, or
- Uniform Phase Mask: $h = c\,(h_{\rm UP} + \alpha \mathbf{1})$ for some constants $c, \alpha > 0$, where $h_{\rm UP}$ corresponds to a phase-shifting mask in an incoherent light setting.
III-C1 Amplitude Modulation Masks
In a conventional lensless coded aperture imaging setup, the point spread function associated with the aperture is the mask pattern itself. To shift an RIP-satisfying aperture as in Theorem 2 to an implementable aperture, one simply needs to apply an affine transform mapping the mask entries $\{-1/\sqrt{n}, +1/\sqrt{n}\}$ to $\{0, 1\}$. This transform ensures that the resulting mask pattern is nonnegative and flux-preserving.
These amplitude modulating masks may be implemented using a spatial light modulator (SLM) or by placing chrome on quartz. SLMs may be preferable in video settings where the underlying scene contains motion and using a different mask pattern at each time step boosts performance. However, in order for the proposed approach to work, the mask or SLM used must have higher resolution than the FPA. Currently, very high resolution SLMs are still in development. Chrome-on-quartz masks can be made with higher resolution than many SLMs, but cannot be changed on the fly unless we mount a small number of fixed masks on a rotating wheel or translating stage. The uniform amplitude modulation masks in particular could be constructed using a high-resolution halftoning procedure, which is easiest to implement at the necessary resolution with chrome on quartz.
Both Robucci et al. [37] and Majidzadeh et al. [38] have proposed performing the analog random convolution step in complementary metal-oxide-semiconductor (CMOS) electronics. A clear advantage of this architecture is that the additional optics required for spatial light modulation are removed in favor of additional circuitry, immediately reducing imager size.
III-C2 Phase Shift Masks
Phase-shifting masks for coded aperture imaging have been implemented recently using a phase screen [39]. This approach allows one to account for diffraction in the optical design. However, depending on the precise optical architecture, phase shift masks may be much less photon-efficient than amplitude modulation masks. Additionally, the mask generation distribution described in Eq. (13) will result in negative entries for the corresponding PSF $h_{\rm UP}$. To compensate for this, the phase mask must be mean-shifted to make all entries nonnegative, so that we are actually implementing

$$\tilde h_{\rm UP} = c\,(h_{\rm UP} + \alpha \mathbf{1}), \qquad (16)$$

with the constants $c, \alpha > 0$ selected so that the implementable PSF is nonnegative and flux-preserving, and where $\mathbf{1}$ is a matrix of ones of the same dimension as $h_{\rm UP}$.
III-C3 Implementable Masks for Scenes with Sparse Gradients
As described in Section III-A, theoretical results for scenes with sparse gradients hold for the sparse difference operator $\Delta$ defined in (12). The problem with this approach is that $\Delta$ is invertible but not circulant – and hence the sensing matrix $D R_h \Delta^{-1}$ cannot be implemented with a coded aperture system. We address this by noting that if the upper right element of $\Delta$ were $-1$ instead of $0$, the resulting operator would be circulant, and the corresponding sensing matrix, denoted $\tilde A$, could be implemented physically and is a close approximation to the theoretically supported $D R_h \Delta^{-1}$. In short, the theoretically supported solution is to set

$$\hat g = \arg\min_g \frac{1}{2}\|y - D R_h \Delta^{-1} g\|_2^2 + \tau \|g\|_1 \quad \text{and} \quad \hat f = \Delta^{-1}\hat g,$$

while an implementable close approximation is to set

$$\hat f = \arg\min_f \frac{1}{2}\|y - \tilde A f\|_2^2 + \tau \|f\|_{\rm TV},$$

where $\|\cdot\|_{\rm TV}$ is the total variation seminorm, which causes $\hat f$ to have sparse gradients. In our numerical experiments we compare the performance of TV regularization to $\ell_1$ sparsity penalization for the task of reconstructing static scenes.
III-C4 Downsampling Implementation
In developing RIP-satisfying amplitude modulation mask (AMM) coded apertures, Theorem 2 assumes the subsampling operation selects one measurement per block. From an implementation standpoint, this operation effectively discards a large portion (a fraction $1 - 1/(m_1 m_2)$) of the available light, which results in much lower signal-to-noise ratios at the detector. A more pragmatic approach is to use larger detector elements that essentially sum the intensity over each block, making better use of the available light. We call this operation integration downsampling to distinguish it from subsampling. The drawback of this approach is that we lose many of the desirable features of the system in terms of the RIP: integration downsampling causes a large coherence between neighboring columns of the resulting sensing matrix $A$. An intermediate approach randomly sums a fraction of the elements in each $m_1 \times m_2$ block, which increases the signal-to-noise ratio relative to subsampling but yields a smaller expected coherence than full integration downsampling. This approach is motivated by the random demodulation proposed in [28] and described in Sec. III-B, whereby the signal is multiplied by a random sequence of signs $\sigma$, then blockwise averaged. The pseudorandom summation proposed here can be thought of as an optically realizable instantiation of the same idea, where we multiply by a random binary sequence. We explore the effects of these choices in our numerical results section.
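The three options can be sketched for a one-dimensional signal (an illustrative sketch; the block size and the random-summation density are arbitrary choices here):

```python
import numpy as np

# Three downsampling options for a 1-D signal with block size m:
# keep one sample per block (subsampling), sum each whole block
# (integration downsampling), or sum a random half of each block
# (pseudorandom summation).
rng = np.random.default_rng(5)
N, m = 24, 4                          # n = N / m = 6 measurements
x = rng.standard_normal(N)
blocks = x.reshape(-1, m)             # one row per measurement block

subsample = blocks[:, 0]                      # discards (m-1)/m of the light
integrate = blocks.sum(axis=1)                # uses all of the light
keep = rng.random(blocks.shape) < 0.5         # random subset of each block
random_sum = (blocks * keep).sum(axis=1)      # intermediate light efficiency
```

The tradeoff discussed above shows up directly: `subsample` preserves incoherence but wastes light, `integrate` collects all the light but correlates neighboring columns of the sensing matrix, and `random_sum` sits between the two.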
III-C5 Noise and Quantization
While CS is particularly useful when the FPA needs to be kept compact, it should be noted that CS is more sensitive to measurement errors and noise than more direct imaging techniques. The experiments conducted in this paper simulated very high signal-to-noise ratio (SNR) settings and showed that CS methods can help resolve high-resolution features in images. However, in low SNR settings CS reconstructions can exhibit significant artifacts that may even cause more distortion than the low-resolution effects associated with conventional coded aperture techniques such as MURA.
Similar observations are made in [40], which presents a direct comparison of the noise robustness of CS against conventional imaging techniques, both in terms of bounds on how reconstruction error decays with the number of measurements and in a simulation setup; the authors conclude that for most real-world images, CS yields the biggest gains in high-SNR settings. Related theoretical work in [41] shows that in the presence of low-SNR photon noise, theoretical error bounds can be large, and thus the expected performance of CS may be limited unless the number of available photons is sufficiently high. These considerations play an important role in choosing the type of downsampling to implement.
Similar issues arise when considering the bit depth of focal plane arrays, which corresponds to measurement quantization errors. Future efforts in designing optical CS systems must carefully consider the amount of noise anticipated in the measurements to find the optimal tradeoff between focal plane array size and image quality.
III-D Reconstruction
To solve the CS minimization problem (5), we use well-established gradient-based optimization methods. They are particularly suitable in our setting because the block-circulant structure of $A$ described in the previous section allows for very fast matrix-vector products, which are critical to the speed of these methods. However, our observation matrix $A$ is not zero-mean, which negatively impacts the performance of these CS-based reconstruction algorithms. In this section, we describe how we take full advantage of the structure of $A$ and how we mitigate the negative effects of its nonzero mean.
III-D1 Sparsity-promoting Methods
We note that the most popular and effective methods for performing the above reconstruction are iterative algorithms with repeated applications of the operators $A$ and $A^T$ to scene estimates and residuals [42, 43]. In most CS settings, computing these matrix-vector multiplications is a large computational burden. However, because of the circulant structure of $A$, this computation is very efficient using fast Fourier transform algorithms. In particular, we compared the computation time for reconstructing a scene using CCA measurements with the time required to reconstruct the same scene using the same number of measurements computed using a dense random projection matrix; our reconstruction was 250 times faster because of our exploitation of the Toeplitz structure in $A$.
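A sketch of the FFT-based forward and adjoint operators at the core of such iterative solvers (the mask and subsampling factors here are arbitrary illustrative choices), together with an inner-product check that the adjoint is consistent:

```python
import numpy as np

# FFT-based application of A = D R_h and its adjoint A^T, the two
# computations repeated inside every iteration of a gradient-based solver.
rng = np.random.default_rng(6)
N1, N2, m1, m2 = 16, 16, 4, 4
h = rng.standard_normal((N1, N2))
H = np.fft.fft2(h)

def A_forward(f):
    # Convolve with h in the Fourier domain, then subsample one pixel
    # per m1 x m2 block.
    conv = np.real(np.fft.ifft2(H * np.fft.fft2(f)))
    return conv[::m1, ::m2]

def A_adjoint(y):
    # Adjoint of subsampling is zero-filled upsampling; adjoint of
    # convolution (for real h) is correlation, i.e. conj(H) in Fourier.
    up = np.zeros((N1, N2))
    up[::m1, ::m2] = y
    return np.real(np.fft.ifft2(np.conj(H) * np.fft.fft2(up)))

f = rng.standard_normal((N1, N2))
y = rng.standard_normal((N1 // m1, N2 // m2))
lhs = np.sum(A_forward(f) * y)        # <A f, y>
rhs = np.sum(f * A_adjoint(y))        # <f, A^T y>
```

Both operators cost $O(N \log N)$ per application, versus $O(nN)$ for an explicit dense matrix, which is the source of the speedup reported above.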


[Figure: reconstructions for the different mask distributions (binary spatial, uniform spatial, uniform phase) under integration downsampling, random summation, and subsampling.]
III-D2 Mean Subtraction
Generative models for random projection matrices used in CS involve drawing elements independently from a zero-mean probability distribution [1, 44, 32, 35, 45], and likewise a zero-mean distribution was used to analyze the coded aperture masks described in Sec. III. However, as coded aperture masks with zero mean are not physically realizable in optical systems, we generate our physically realizable masks from an appropriately scaled and shifted Bernoulli distribution. This shifting ensures that the coded aperture corresponds to an implementable (i.e., nonnegative and intensity-preserving) mask pattern which satisfies the assumptions needed to model photon propagation through an optical system.
While necessary to accurately model real-world optical systems, this shifting negatively impacts the performance of well-established reconstruction algorithms for the following reason. Several of these algorithms (e.g., [42, 43]) assume that the Hessian of the data fidelity term (proportional to the negative log Gaussian likelihood) is well approximated by a scaled identity. This assumption is crucial to the performance of these algorithms.
If we collect measurements using a realizable mask with elements in the range [0, 1], the resulting matrix does not satisfy this condition due to the mean offset. Therefore, in the reconstruction we use a mean-shifted sensing matrix
where the subtracted offset is the mean value of the original matrix entries. This new matrix is such that the approximation above is valid. However, to use this matrix in the reconstruction, we need to adjust our estimate accordingly: we decompose our estimate into its mean component and a zero-mean deviation. So then
The data can be decomposed in a similar fashion. Thus a straightforward estimate for the mean of our solution follows from the mean of the data, and we can use the zero-mean quantities in our data fidelity term
(17) 
within the iterative algorithm, which now has the desired property.
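A minimal numerical sketch of this decomposition, using an explicit dense {0, 1} matrix as a stand-in for the coded-aperture operator (the dimensions, seed, and variable names are our own illustrative choices, not the paper's notation):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 128, 512
A = rng.integers(0, 2, size=(m, n)).astype(float)  # realizable mask: entries in {0, 1}
a_bar = A.mean()                                   # mean of the mask elements
A0 = A - a_bar                                     # mean-shifted sensing matrix

x = np.zeros(n)
x[rng.choice(n, size=10, replace=False)] = 1.0     # sparse nonnegative scene
y = A @ x

# Since A = A0 + a_bar * (all-ones matrix), the data decompose exactly as
#   y = A0 @ x + a_bar * x.sum() * ones(m),
# so the mean component of the solution can be estimated from the mean of the
# data, and the zero-mean residual is matched to the zero-mean matrix A0.
total_hat = y.mean() / a_bar          # estimate of x.sum(), up to the mean of A0 @ x
y0 = y - a_bar * total_hat            # approximately the zero-mean data A0 @ x
```

The iterative algorithm then works with `A0` and `y0`, which satisfy the scaled-identity Hessian assumption far better than the raw pair.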
We have motivated this mean-subtraction approach from an estimation perspective: first estimate the mean, then the deviations about it. However, it can equally be motivated from a pure optimization perspective by viewing the mean-subtraction step as a preconditioning strategy. The gradient-based reconstruction methods considered here are known to exhibit poor convergence when the Hessian has a large condition number, and the mean subtraction essentially improves the conditioning of the problem by eliminating the single dominant eigenvalue caused by the nonzero mean.
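This conditioning argument can be checked numerically. The sketch below (our own check; the sizes and seed are arbitrary illustrative choices) compares the condition number of the Gram matrix before and after mean subtraction:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 256, 64
A = rng.integers(0, 2, size=(m, n)).astype(float)  # nonnegative {0, 1} sensing matrix
A0 = A - A.mean()                                  # mean-subtracted counterpart

# The nonzero mean creates a single dominant eigenvalue in A^T A (the Hessian
# of the least-squares term); removing it tightens the spectrum considerably.
cond_raw = np.linalg.cond(A.T @ A)
cond_shift = np.linalg.cond(A0.T @ A0)
print(cond_raw > cond_shift)  # True
```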
III-E Experimental Comparison on Static Images
Here we present numerical experiments supporting the proposed architectures for snapshot compressive imaging. We compare the simulated performance of the described imaging architectures in the task of high-fidelity image reconstruction. We utilize a realistic phantom image of a neuron (ground truth image in Fig. 2) whose high-resolution features are designed to test the resolution limits of our architectures. This image is inspired by a real microscopic image found at http://www.lsi.umich.edu/facultyresearch/labs/bingye/research. We examine a traditional imaging system and a total of nine convolution-based compressive imaging systems, constructed by considering each combination of the following design choices:

the mask generation method, including binary spatial (BS), uniform spatial (US), and uniform phase (UP) models,

the subsampling scheme: integration downsampling, pseudorandom summation (randomly summing values within each block), or pointwise subsampling.
To analyze the effect of the number of measurements we collect, we vary the scene-to-sensor downsampling ratio to be 4, 16, or 64 (i.e., downsampling in both directions by a factor of 2, 4, or 8). While each architecture has its own unique considerations in terms of signal-to-noise efficiency (see Section III-C), we normalize the experiments so as to keep a constant signal-to-noise ratio at the detector. This is achieved by fixing the variance of the additive white Gaussian noise for each architecture at a reference downsampling ratio. This variance is then held fixed for each architecture as we vary the downsampling ratio, the rationale being that this allows us to showcase the impact of the various subsampling operations.
We compare several penalization schemes for image reconstruction: sparsity in an orthonormal wavelet (Haar) basis, isotropic and anisotropic total variation [46], and a sparsity penalty in the overcomplete curvelet frame [47]. Curvelets are similar to wavelets in that they capture more spatially localized information than Fourier components; however, they are designed to be better adapted to capturing curvilinear singularities in images (e.g., edges). To reconstruct the image, we use our own implementation of the SpaRSA algorithm [42], which utilizes the mean-subtraction procedure described in Section III-D2. A coarse initialization is found by solving an unpenalized least-squares minimization via a conjugate gradient method using only the compressive data. We terminate the reconstruction when the relative change in the iterates falls below a prespecified tolerance of 10^-3. We choose the regularization weighting to minimize the final reconstruction RMSE. A table of the resulting RMSE values is presented in Table I.
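For concreteness, a stripped-down iterative shrinkage solver with the same relative-change stopping rule reads as follows. This is plain ISTA rather than SpaRSA, and the problem sizes, regularization weight, and tolerance are illustrative assumptions of ours:

```python
import numpy as np

def ista(A, y, lam, step, tol=1e-3, max_iter=2000):
    """Iterative soft-thresholding for min_x 0.5*||y - Ax||^2 + lam*||x||_1,
    stopped when the relative change in the iterates falls below tol."""
    x = np.zeros(A.shape[1])
    for _ in range(max_iter):
        grad = A.T @ (A @ x - y)                                      # fidelity gradient
        z = x - step * grad
        x_new = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
        if np.linalg.norm(x_new - x) <= tol * max(np.linalg.norm(x), 1e-12):
            return x_new
        x = x_new
    return x

rng = np.random.default_rng(3)
m, n = 100, 200
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[:5] = 3.0                                   # sparse ground truth
y = A @ x_true
x_hat = ista(A, y, lam=0.05, step=1.0 / np.linalg.norm(A, 2) ** 2)
```

The relative-change test mirrors the termination criterion used in the experiments; production solvers such as SpaRSA add adaptive step-size selection on top of this skeleton.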
As evidenced by both the RMSE values and the images, significant performance gains can be achieved by the compressive architectures. The UP mask architectures slightly outperform the BS-based approaches in this simulation, but the type of subsampling has a greater effect on the quality of the reconstruction. At the lowest downsampling ratio, subsampling performs best overall, followed by random summation, then integration downsampling. This is readily apparent from the RMSE values and the greater clarity with which we can resolve fine-scale features in the dendrites of the neuron. This is exactly what the theory predicts: larger coherence yields poorer performance. However, pushing the level of undersampling further, we see that simple subsampling is not always the best choice, since its performance degrades quite rapidly: the per-measurement signal-to-noise ratio does not naturally increase as it does with the other subsampling architectures. An optimal architecture must strike a careful balance between low coherence and robustness to noise.
Sensing Architecture  Reconstruction RMSE (%)  

Subsampler  Mask Model  Ratio  CG Init  Haar  Aniso. TV  Iso. TV  Curvelet  
Integration  BS  4  43.552652  31.908468  23.632185  23.712212  34.552177 
downsampling  16  52.508761  47.604093  44.602132  43.339913  43.606292  
64  60.435296  58.592028  57.537579  56.960604  56.747754  
US  4  43.220564  31.861358  23.497272  23.643347  34.325474  
16  52.126070  47.551547  44.400403  43.069598  43.610772  
64  59.642746  58.145647  57.118346  56.684747  56.541953  
UP  4  38.474720  30.797415  23.265628  23.214944  32.475710  
16  50.284641  46.653984  44.638167  43.091206  44.231886  
64  58.815932  58.168464  57.184195  56.785933  56.558017  
Random  BS  4  56.890247  34.101763  18.992251  20.535913  35.202771 
Summation  16  62.274874  51.430633  45.573628  44.644012  45.537800  
64  65.672366  60.160050  58.296549  57.789539  57.630494  
US  4  56.544494  33.659520  17.484428  19.021174  34.322115  
16  61.950795  50.977833  44.857052  43.641823  45.217990  
64  65.350187  59.697311  57.573941  57.159371  57.096311  
UP  4  54.473107  29.831022  16.033758  17.728624  33.651808  
16  60.297045  48.902002  44.939914  43.733698  45.219903  
64  64.317781  60.909768  57.360741  57.106131  56.933892  
Pointwise  BS  4  61.578677  39.742969  17.129559  18.611897  35.793504 
Subsampling  16  67.787137  65.569753  51.912236  50.842530  52.625216  
64  69.244807  69.202225  64.772033  64.693113  64.741188  
US  4  61.578677  39.780250  17.200487  18.703763  35.813267  
16  67.787137  65.570111  51.928327  50.883713  52.629357  
64  69.244807  69.202225  64.791854  64.687147  64.741653  
UP  4  61.005834  36.260055  14.108310  15.883293  34.282494  
16  67.675516  64.606384  48.280444  47.379639  48.924798  
64  69.187676  69.189077  63.591662  63.514672  63.350203  
Conventional Imager  4  38.157537  36.654837  36.067380  32.946067  33.157436  
16  50.137698  50.117667  49.836071  47.932575  45.466410  
64  58.873484  58.873455  58.472819  57.806412  56.764936 
IV Coded Aperture Keyed Exposure Sensing for Dynamic Scenes
Here we detail our compressive video acquisition method, which combines coded apertures and keyed exposures to address all the competing challenges detailed in Sec. I-A. Specifically, in our CAKE imaging method, each observed frame is given by an exposure of high-rate coded observations:
(18) 
where each term describes an AMM CCA sensing matrix. Note that since, in our theory, the downsampling operator is a structured non-random operator, we can rewrite the above as
(19) 
Hence all that is required to implement this sensing paradigm is modulating the coded aperture mask over the frame exposure time. Because of this, one can think of our system as performing a coded exposure acquisition for each coded aperture element.
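The acquisition model (18) can be sketched in code as follows, with illustrative choices (ours, not the paper's) for the mask distribution, the block-integration downsampler, and all dimensions:

```python
import numpy as np

def cake_snapshot(frames, masks, d):
    """One CAKE snapshot: each high-rate frame is circularly convolved with
    its own aperture pattern, accumulated over the exposure, then integrated
    over d-by-d blocks to form a single low-resolution measurement image."""
    T, n, _ = frames.shape
    acc = np.zeros((n, n))
    for t in range(T):
        acc += np.real(np.fft.ifft2(np.fft.fft2(frames[t]) * np.fft.fft2(masks[t])))
    return acc.reshape(n // d, d, n // d, d).sum(axis=(1, 3))

rng = np.random.default_rng(4)
T, n, d = 4, 32, 2
frames = rng.standard_normal((T, n, n))
masks = rng.choice([-1.0, 1.0], size=(T, n, n)) / n   # one pattern per high-rate frame
y = cake_snapshot(frames, masks, d)
print(y.shape)  # (16, 16)
```

Modulating `masks[t]` over the exposure is precisely the "coded exposure per coded aperture element" interpretation given above.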
Since our sensing strategy is independent across the low-resolution frames, and for simplicity of presentation, we consider only the recovery of one block of frames from a single snapshot image in our theoretical analysis.
Theorem 4 (Coded Aperture Keyed Exposure Sensing).
Let be an sensing matrix for the CAKE system, where the coded aperture pattern for each high-rate frame is drawn i.i.d. from a suitable probability distribution. Then there exist constants depending on such that for any
(20) 
is with probability exceeding
(21) 
Note here that the sparsity level denotes the sparsity of the first frames of the video sequence, rather than simply the sparsity of an individual frame as in Thm. 2. The proof of Thm. 4 is presented in Appendix B. Similar to Thm. 2, this proof assumes that the entries of each mask are generated i.i.d. according to the scaled Rademacher distribution (9). Again, it is straightforward to extend the result to other mask-generating distributions.
IV-A Sparse Transformation Sensing
It is most often the case that the original frames are not sparse, but can be sparsely represented by a temporal transform of the frames. For simplicity, consider the first coded measurement. Within this acquisition, the coefficient sequence
may be more sparse for a well-chosen temporal transform. Notationally this is equivalent to applying the Kronecker matrix product of the temporal transform with the spatial identity. A preferred sensing strategy would be to use our RIP-satisfying CAKE acquisition to sense the coefficient sequence directly:
(22) 
Surprisingly, this can be accomplished using an identical architecture, with some slight adjustment to the coded aperture mask patterns used during the keyed acquisition.
To see this, we examine the resulting sensing matrix in terms of :
where we define
(23) 
Therefore (22) is also a CAKE acquisition using the adjusted sensing matrices. Because of the linear dependence on the generating mask patterns, this amounts to using an identical architecture with adjusted mask patterns
(24) 
If we denote , then this can be written more simply as . Hence the CAKE system can adapt to any temporally sparse coding of the video sequence over the block of coded frames. In summary, to incorporate a sparse transform applied to the frames, we simply apply the adjoint transform to the independently generated mask patterns.
A useful transformation that exploits dependencies between frames arises from assuming that the differences between subsequent frames are sparse. In this case, we select , as defined in (12). In particular, we sense using the coded sequence of aperture patterns where
and the generating aperture patterns are drawn from a suitable distribution.
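The mask-adjustment identity can be verified numerically. In the sketch below (our notation), the pointwise product stands in for the full convolve-and-downsample operator, which preserves the identity because the acquisition is linear in the mask; it checks that the tail-summed masks applied to the difference coefficients reproduce the original coded sum:

```python
import numpy as np

rng = np.random.default_rng(5)
T, n = 5, 16
masks = rng.choice([-1.0, 1.0], size=(T, n, n))    # independently generated patterns
frames = rng.standard_normal((T, n, n))

# Difference coefficients: theta_1 = f_1, theta_t = f_t - f_{t-1}; each frame
# is then the running sum of the coefficients up to time t.
theta = np.concatenate([frames[:1], np.diff(frames, axis=0)])

# Adjoint transform applied to the mask sequence: the pattern that multiplies
# theta_s is the tail sum of the generating patterns from time s onward.
adj_masks = np.cumsum(masks[::-1], axis=0)[::-1]

coded_frames = sum(h * f for h, f in zip(masks, frames))
coded_coeffs = sum(g * t for g, t in zip(adj_masks, theta))
print(np.allclose(coded_frames, coded_coeffs))  # True
```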
It is of interest to note that since the CAKE sensing matrix is a concatenation of downsampled Toeplitz matrices, this sensing strategy has clear connections to [48], which considers concatenations of Toeplitz matrices as a sensing matrix for performing multiuser detection in wireless sensor networks. The important conceptual link is that their sensing matrix is used to determine a sparse set of simultaneously active users, whereas in our system, we use it to infer a sequence of sparse frames, or a sequence of sparse difference frames.
IV-B Implementation and Normalization Details
Recall that these RIP-satisfying sensing matrices are generated using a zero-mean probability distribution and cannot be directly implemented in an optical system. As before, we apply an affine transformation to each mask so that the resulting elements lie within the range [0, 1]. If we are generating a sensing matrix that satisfies the RIP with respect to the difference frames, then even a binary generating distribution will cause the resulting mask patterns to have elements that are neither fully open nor fully closed. These intermediate values can be implemented using halftoning, as discussed in Sec. III-C. If such an implementation is difficult, then one strategy would be to round each intermediate value to fully open or fully closed at random with equal probability.
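This realization step can be sketched as follows; the difference-frame mask generation, the affine rescaling, and the equal-probability random rounding here are our illustrative stand-ins rather than the halftoning procedure itself:

```python
import numpy as np

rng = np.random.default_rng(6)
T, n = 4, 8
base = rng.choice([-1.0, 1.0], size=(T, n, n))       # zero-mean generating patterns
g = np.cumsum(base[::-1], axis=0)[::-1]              # difference-frame adjusted masks

# Affine transformation into [0, 1]: the masks are now physically realizable
# but contain intermediate (partially open) values.
h = (g - g.min()) / (g.max() - g.min())

# If halftoning is impractical, round each intermediate value to fully
# closed (0) or fully open (1) at random with equal probability.
mid = (h > 0) & (h < 1)
h_bin = np.where(mid, rng.integers(0, 2, size=h.shape).astype(float), np.round(h))
print(set(h_bin.ravel()) <= {0.0, 1.0})  # True
```

Note that equal-probability rounding is the simplest option; rounding with probability proportional to the intermediate value would preserve the mask's mean transmittance.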
IV-C Reconstruction for Video
Given measurements from the proposed CAKE imaging system, where we have designed the mask patterns for sparsity in a temporal transform, we recover the frames by solving
For the case of sparse difference frames, improvements in empirical performance are obtained by penalizing the total variation of the first frame instead of simply its sparsity:
Here, the synthesis transform is a lower triangular matrix of all ones.
In addition, by using more than one low-rate observation per reconstruction step, we can typically improve performance by coding the difference frames across more than one block of high-rate frames. This is the reconstruction technique we use in the experimental results section. In previous work, we observed that when the amount of processing time allotted per frame is held constant, accuracy generally increases with the number of frames processed simultaneously. However, one cannot solve for arbitrarily many frames simultaneously, as the improvement in accuracy diminishes once the size of the problem allows only a very small number of reconstruction iterations within the allotted time. We refer the reader to [26] for details.
For video sequences of longer duration, we may wish to process the video in an online fashion, solving for only a few blocks of frames simultaneously. In many applications, such as surveillance and monitoring, the video frames are strongly correlated, and the solution to a previous block of frames may be used as an initialization to the algorithm to solve the next block of frames. This estimate will generally be very accurate, and therefore, relatively few iterations are needed to obtain a solution to each optimization problem.
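The control flow of such a warm-started online scheme might look like the following sketch. The least-squares refinement is a toy stand-in for the CAKE block reconstruction, and the sizes, correlation level, and iteration budget are illustrative choices of ours:

```python
import numpy as np

def refine(A, y, x_init, iters):
    """Toy block solver: a few gradient steps on ||y - A x||^2 from x_init."""
    x = x_init.copy()
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    for _ in range(iters):
        x -= step * (A.T @ (A @ x - y))
    return x

rng = np.random.default_rng(7)
m, n, n_blocks = 200, 100, 3
A = rng.standard_normal((m, n)) / np.sqrt(m)

# Strongly correlated blocks, as in surveillance-style video.
truth = [rng.standard_normal(n)]
for _ in range(n_blocks - 1):
    truth.append(truth[-1] + 0.05 * rng.standard_normal(n))

x = np.zeros(n)                                  # cold start for the first block only
errors = []
for x_true in truth:
    x = refine(A, A @ x_true, x, iters=50)       # warm start from the previous solution
    errors.append(np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
print(errors[-1] < errors[0])  # warm-started blocks are more accurate
```

With the same per-block iteration budget, the warm-started blocks converge much closer to the truth because the initialization error is already small.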
IV-D Numerical Experiments
In this section, we demonstrate the effectiveness of the proposed CAKE architecture at recovering a video sequence of shortwave infrared (SWIR) data collected (courtesy of Jon Nichols at NRL) by a SWIR camera based on an InGaAs (indium gallium arsenide) focal plane array. The camera was built around a fixed aperture, and imagery was output at the standard 30 Hz frame rate with a 14-bit dynamic range. An example frame is shown in Fig. 3. In this sequence, the three boats are traveling at different velocities with respect to the slowly changing background of the waves. We consider reconstructing 28 frames of the sequence.
We consider CAKE observations in which we downsample spatially by a factor of 2 in both directions and use a coded exposure block length of 4, so the 28 frames yield 7 low-rate snapshots. We compare acquiring the sequence with independent mask codes and with mask codes designed to exploit the sparse difference frames directly. We reconstruct the entire video sequence using all 7 low-resolution low-rate frames, with a total variation penalty on the first frame and a sparsity penalty on all subsequent difference frames. We optimize the regularization parameters to minimize the reconstruction error. For comparison, we consider traditionally captured data (i.e., obtained by simply averaging over blocks of the spatiotemporal video sequence). To interpolate this data to the original resolution of the video sequence, we consider both nearest-neighbor and spline interpolation.
We show the estimates for the different acquisition and reconstruction methods in Fig. 4. Here we focus only on a region of interest (ROI) around the sailboat. We see that CAKE sensing is able to reconstruct the scene with higher spatial and temporal resolution. This is evidenced in the residuals, which contain much less critical scene structure than those of the conventional system. We see from examination of the difference frame that the nearest-neighbor reconstruction from conventionally sampled data yields no motion over the two frames and suffers from poor spatial resolution. Spline interpolation helps improve the spatial resolution, but it is still insufficient to recover the scene with high temporal resolution, as can be seen in the blur on the leading edge of the sail. Numerically, we quantify the performance over the entire video sequence in terms of the RMSE (%), calculated both over the entire frame and only over the sailboat ROI; the results are tabulated in Table II. In summary, the CAKE acquisitions outperform traditionally sampled video in terms of reconstruction accuracy and in capturing salient motion.
It should be noted that there are limitations to the CAKE system in the presence of strong motion. In this case, the sparsity level of the difference frames may increase drastically as the previous frame ceases to be a good prediction of the next. As such, the RIP bound in Theorem 4 states that the number of measurements required to reconstruct the scene must necessarily increase, requiring either faster temporal-resolution measurements or a higher-resolution FPA to achieve the same accuracy. Because of this balance, a system designer may need to make important engineering tradeoffs to implement CAKE acquisition for a particular application.
Reconstruction RMSE (%)  
Sensing Architecture  Full Frame  Sailboat ROI 
Conventional (Nearest-neighbor)  5.5163 (5.5305)  10.3525 (10.4241) 
Conventional (Spline)  3.7654 (3.8047)  5.9335 (6.0377) 
CAKE (Difference-frame Codes)  3.5840 (3.9183)  5.6971 (7.0421) 
CAKE (Independent Codes)  2.9079 (3.0932)  4.3266 (4.9604) 
V Conclusions
Compressed sensing offers a strong theoretical foundation for sparse signal recovery. However, practical and implementable imaging system designs based on CS theory have lagged far behind. In this paper, we demonstrated how CS principles can be applied to physically realizable optical system designs, namely coded aperture imaging. Numerical experiments show that CS methods can help resolve high-resolution features in images and videos that conventional imaging systems cannot. We have also demonstrated that our CAKE acquisition system can recover video sequences from highly undersampled data in much broader settings than those considered in initial coded exposure studies. However, our derived theoretical limits show that there are important tradeoffs, depending on the spatial sparsity and temporal similarity of the scene, that ultimately govern the accuracy of our reconstructions for a specified number of compressive measurements. Finally, we note that our proposed approach performs well in high signal-to-noise ratio settings but, like all CS reconstructions, can exhibit significant artifacts in photon-limited settings.
Two practical issues associated with coded aperture imaging in general are blur due to misalignment of the mask and diffraction and interference effects. Non-compressive aperture codes have been developed to be robust to these effects [49]. One important avenue for future research is the development of compressive coded aperture masks with similar robustness properties. Non-compressive coded apertures have also been shown useful in inferring the depth of different objects in a scene; similar inference may be possible with the compressive coded apertures described in this paper.
Acknowledgements
The authors would like to thank Prof. Justin Romberg and Dr. Kerkil Choi for several valuable discussions.
A Proof of the RIP for Spatial-Domain CCA Sensing
Here we present the proof that appropriately scaled compressive coded apertures satisfy the RIP.
Proof:
In the interest of notational simplicity, it is easier to work with the two-dimensional images rather than their one-dimensional vectorial representations. For concreteness, we assume the entries of the mask are generated i.i.d. according to the scaled Rademacher distribution
for and . The proof uses the same techniques as that of Theorem 4 in [36], where the RIP is established by shifting the analysis from the submatrices of the sensing matrix to the entries of the Gram matrix by invoking Geršgorin's disc theorem [50]. This theorem states that the eigenvalues of a complex square matrix all lie in the union of discs, each centered at a diagonal entry, with radius given by the sum of the absolute values of the remaining entries in that row.
In essence, we show that, with suitably high probability, the eigenvalues of the Gram matrix are clustered around one.
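Geršgorin's theorem is easy to check numerically; this short sketch (arbitrary test matrix, our variable names) verifies that every eigenvalue lies in the union of the discs:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 6
M = rng.standard_normal((n, n))

centers = np.diag(M)                                 # disc centers: diagonal entries
radii = np.abs(M).sum(axis=1) - np.abs(centers)      # row sums of off-diagonal magnitudes

# Geršgorin: every eigenvalue must lie in at least one disc.
eigs = np.linalg.eigvals(M)
covered = all(any(abs(lam - c) <= r + 1e-9 for c, r in zip(centers, radii))
              for lam in eigs)
print(covered)  # True
```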
From the discussion above, we have that
so the entries of the resulting Gram matrix are
(25)  
From the normalization introduced for the mask, we can show that the Gram matrix has the identity as its expectation. Now we need to bound the deviation about the mean via concentration. We consider first the diagonal entries of the Gram matrix. Each is a sum of bounded i.i.d. entries, and applying Hoeffding's inequality yields
Next we consider the off-diagonal entries in the special case that either or . In this case, each of the terms in the summand in (25) picks out a different set of coefficients from the mask, so there are no dependencies between different terms in the sum; it is again a sum of bounded i.i.d. entries, and thus from Hoeffding's inequality we have
Lastly, we have the case that both and with . This case deserves special care since each of the terms in the summand in (25) picks out the same set of coefficients from the mask; therefore there necessarily exist dependencies within the terms of the sum. However, due to the special manner in which the coefficients are selected, we can partition the sum into two sums, each of which is a sum of independent terms. We can then apply the Hoeffding bound to each of these to yield
Then applying the union bound gives us
Since this latter bound decays more slowly, and in the interest of simplicity, we overbound using this latter expression. What remains is to apply the union bound over all the diagonal and off-diagonal elements. For this we need to count the number of elements in the union. For the diagonal entries, there are clearly elements. And since the Gram matrix is symmetric, we need only count half of the remaining off-diagonal terms. Hence taking ,
where . If , this probability is less than one provided where , noting that establishes the theorem. ∎
B Proof of RIP for CAKE
Proof:
This section details the proof of Theorem 4. Our strategy is the same as for a single frame: we bound the entries of the Gram matrix. First note that in this case the Gram matrix has a block structure; since the CAKE sensing matrix is a concatenation of single-frame matrices, we have
This block structure allows us to utilize the results established in Appendix A. For the diagonal blocks of the Gram matrix, we simply use the same bound derived for one particular block, since this is the situation for sensing a single frame. Therefore
For the off-diagonal entries, we need to consider the off-diagonal entries of the diagonal blocks as well as any entry of the off-diagonal blocks. Recall that in Appendix A, we simply used the worst-case, slower-decaying bound when the off-diagonal entry of the Gram matrix was no longer a sum of fully independent terms. Here we use the same overbound and hence have
Lastly, we need to perform a union bound over all the entries of the matrix. We have diagonal entries, and exploiting symmetry, we only have remaining entries. Hence taking ,
where . If , this probability is less than one provided where , again noting that establishes the theorem. ∎
References
 [1] E. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inform. Theory, vol. 52, no. 2, pp. 489 – 509, 2006.
 [2] D. L. Donoho, “Compressed sensing,” IEEE Trans. Inform. Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
 [3] M. Shankar, R. Willett, N. Pitsianis, T. Schulz, R. Gibbons, R. T. Kolste, J. Carriere, C. Chen, D. Prather, and D. Brady, “Thin infrared imaging systems through multichannel sampling,” Applied Optics, vol. 47, no. 10, pp. B1–B10, 2008.
 [4] A. F. Coskun, I. Sencan, T.W. Su, and A. Ozcan, “Lensless widefield flourescent imaging on a chip using compressive decoding of sparse objects,” Optics Express, vol. 18, no. 10, pp. 10510–10523, 2010.
 [5] L. C. Potter, E. Ertin, J. T. Parker, and M. Cetin, “Sparsity and compressed sensing in radar imaging,” Proceedings of the IEEE, vol. 98, no. 6, pp. 1006–1020, 2010.
 [6] J. Gu, S. K. Nayar, E. Grinspun, P. N. Belhumeur, and R. Ramamoorthi, “Compressive structured light for recovering inhomogeneous participating media,” in Proceedings of the European Conference on Computer Vision, 2008.
 [7] M. Mohtashemi, H. Smith, D. Walburger, F. Sutton, and J. Diggans, “Sparse sensing dna microarraybased biosensor: Is it feasible?,” in 2010 IEEE Sensors Applications Symposium, Feb 2010, pp. 127–130.
 [8] M.A. Sheikh, O. Milenkovic, and R.G. Baraniuk, “Designing compressive sensing dna microarrays,” in 2nd IEEE International Workshop on Computational Advances in MultiSensor Adaptive Processing, 2007, December 2007, pp. 141–144.
 [9] M. Lustig, D. Donoho, and J. M. Pauly, “Sparse MRI: The application of compressed sensing for rapid MR imaging,” Magnetic Resonance in Medicine, vol. 58, no. 6, pp. 1182–1195, December 2007.
 [10] A.C. Gurbuz, J.H. McClellan, and W.R. Scott, “A compressive sensing data acquisition and imaging method for stepped frequency GPRs,” IEEE Transactions on Signal Processing, vol. 57, no. 7, pp. 2640 –2650, July 2009.
 [11] P. Ye, J.L. Paredes, G.R. Arce, Y. Wu, C. Chen, and D.W. Prather, “Compressive confocal microscopy,” in IEEE International Conference on Acoustics, Speech and Signal Processing, 2009, April 2009, pp. 429 –432.
 [12] J. Bobin, J.L. Starck, and R. Ottensamer, “Compressed sensing in astronomy,” IEEE Journal of Selected Topics in Signal Processing, vol. 2, no. 5, pp. 718–726, October 2008.
 [13] M. F. Duarte, M. A. Davenport, D. Takhar, J. N. Laska, T. Sun, K. F. Kelly, and R. G. Baraniuk, “Single pixel imaging via compressive sampling,” IEEE Sig. Proc. Mag., vol. 25, no. 2, pp. 83–91, 2008.
 [14] J. G. Ables, “Fourier transform photography: a new method for Xray astronomy,” Proceedings of the Astronomical Society of Australia, vol. 1, pp. 172, 1968.
 [15] R. H. Dicke, “Scatterhole cameras for Xrays and gammarays,” Astrophysical Journal, vol. 153, pp. L101–L106, 1968.
 [16] E. Caroli, J. B. Stephen, G. Di Cocco, L. Natalucci, and A. Spizzichino, “Coded aperture imaging in X and gammaray astronomy,” Space Science Reviews, vol. 45, pp. 349–403, 1987.
 [17] G. Skinner, “Imaging with codedaperture masks,” Nucl. Instrum. Methods Phys. Res., vol. 221, pp. 33–40, Mar. 1984.
 [18] R. Accorsi, F. Gasparini, and R. C. Lanza, “A coded aperture for highresolution nuclear medicine planarimaging with a conventional anger camera: experimental results,” IEEE Trans. Nucl. Sci., vol. 28, pp. 2411–2417, 2001.
 [19] G. R. Gindi, J. Arendt, H. H. Barrett, M. Y. Chiu, A. Ervin, C. L. Giles, M. A. Kujoory, E. L. Miller, and R. G. Simpson, “Imaging with rotating slit apertures and rotating collimators,” Medical Physics, vol. 9, pp. 324–339, 1982.
 [20] S. R. Meikle, R. R. Fulton, S. Eberl, M. Bahlbom, K.P. Wong, and M. J. Fulham, “An investigation of coded aperture imaging for small animal SPECT,” IEEE Trans. Nucl. Sci., vol. 28, pp. 816–821, 2001.
 [21] A. Levin, R. Fergus, F. Durand, and W. T. Freeman, “Image and depth from a conventional camera with a coded aperture,” in Proc. of Intl. Conf. Comp. Graphics. and Inter. Tech., 2007.
 [22] A. Agrawal and Yi Xu, “Coded exposure deblurring: Optimized codes for psf estimation and invertibility,” in IEEE Computer Vision and Pattern Recognition, 2009.
 [23] R. Raskar, A. Agrawal, and J. Tumblin, “Coded exposure photography: Motion deblurring using fluttered shutter,” in ACM Special Interest Group on Computer Graphics and Interactive Techniques (SIGGRAPH), 2006.
 [24] A. Veeraraghavan, D. Reddy, and R. Raskar, “Coded strobing photography: Compressive sensing of high speed periodic videos,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 4, pp. 671–686, 2011.
 [25] R. F. Marcia and R. M. Willett, “Compressive coded aperture superresolution image reconstruction,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP), 2008.
 [26] R. F. Marcia and R. M. Willett, “Compressive coded aperture video reconstruction,” in Proc. 16th Euro. Signal Process. Conf., 2008, Lausanne, Switzerland, August 2008.
 [27] R. F. Marcia, Z. T. Harmany, and R. M. Willett, “Compressive coded apertures for highresolution imaging,” in Proc. 2010 SPIE Photonics Europe, Brussels, Belgium, April 2010.
 [28] J. Romberg, “Compressive sampling by random convolution,” SIAM Journal on Imaging Sciences, vol. 2, no. 4, pp. 1098–1128, 2009.
 [29] N. J. A. Sloane and M. Harwit, “Masks for Hadamard transform optics, and weighing designs,” Appl. Opt., vol. 15, no. 1, pp. 107–114, 1976.
 [30] A. Ashok and M. A. Neifeld, “Pseudorandom phase masks for superresolution imaging from subpixel shifting,” Appl. Opt., vol. 46, no. 12, pp. 2256–2268, 2007.
 [31] S. R. Gottesman and E. E. Fenimore, “New family of binary arrays for coded aperture imaging,” Appl. Opt., vol. 28, no. 20, pp. 4344–4352, 1989.
 [32] E. J. Candès and T. Tao, “Decoding by linear programming,” IEEE Trans. Inform. Theory, vol. 15, no. 12, pp. 4203–4215, 2005.
 [33] E. Candès, “Compressive sampling,” in Int. Congress of Mathematics, 2006, vol. 3, pp. 1433–1452.
 [34] E. J. Candès, J. K. Romberg, and T. Tao, “Stable signal recovery from incomplete and inaccurate measurements,” Communications on Pure and Applied Mathematics, vol. 59, no. 8, pp. 1207–1223, 2006.
 [35] W. Bajwa, J. Haupt, G. Raz, S. Wright, and R. Nowak, “Toeplitzstructured compressed sensing matrices,” in Proc. of Stat. Sig. Proc. Workshop, 2007.
 [36] J. D. Haupt, W. U. Bajwa, G. M. Raz, and R. D. Nowak, “Toeplitz compressed sensing matrices with applications to sparse channel estimation,” IEEE Transactions on Information Theory, vol. 56, no. 11, pp. 5862–5875, 2010.
 [37] R. Robucci, J. D. Gray, L. K. Chiu, J. Romberg, and P. Hasler, “Compressive sensing on a CMOS separabletransform image sensor,” Proceedings of the IEEE, vol. 98, no. 6, pp. 1089–1101, 2010.
 [38] V. Majidzadeh, L. Jacques, A. Schmid, P. Vandergheynst, and Y. Leblebici, “A (256*256) pixel 76.7mW CMOS imager/compressor based on realtime inpixel compressive sensing,” in IEEE International Symposium on Circuits and Systems, 2010, pp. 2956–2959.
 [39] W. Chi and N. George, “Phasecoded aperture for optical imaging,” Optics Communications, vol. 282, pp. 2110–2117, 2009.
 [40] J. Haupt and R. Nowak, “Compressive sampling vs. conventional imaging,” in Proc. IEEE Intl. Conf. on Imaging Processing, 2006, pp. 1269–1272.
 [41] M. Raginsky, R. M. Willett, Z. T. Harmany, and R. F. Marcia, “Compressed sensing performance bounds under Poisson noise,” IEEE Trans. Signal Process., vol. 58, no. 8, pp. 3990–4002, 2010.
 [42] S. Wright, R. Nowak, and M. Figueiredo, “Sparse reconstruction by separable approximation,” IEEE Trans. Signal Process., vol. 57, pp. 2479–2493, 2009.
 [43] M. A. T. Figueiredo, R. D. Nowak, and S. J. Wright, “Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems,” IEEE J. Sel. Top. Sign. Proces.: Special Issue on Convex Optimization Methods for Signal Processing, vol. 1, no. 4, pp. 586–597, 2007.
 [44] J. Haupt and R. Nowak, “Signal reconstruction from noisy random projections,” IEEE Trans. Inform. Theory, vol. 52, no. 9, pp. 4036–4048, 2006.
 [45] R. G. Baraniuk, M. Davenport, R. A. DeVore, and M. B. Wakin, “A simple proof of the restricted isometry property for random matrices,” Constructive Approximation, vol. 28, no. 3, pp. 253–263, 2008.
 [46] L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D: Nonlinear Phenomena, vol. 60, no. 1–4, pp. 259–268, 1992.
 [47] E. J. Candès and D. L. Donoho, “New tight frames of curvelets and optimal representations of objects with piecewise C² singularities,” Communications on Pure and Applied Mathematics, vol. 57, no. 2, pp. 219–266, 2004.
 [48] L. Applebaum, W. U. Bajwa, M. F. Duarte, and R. Calderbank, “Asynchronous code-division random access using convex optimization,” Accepted to Elsevier Phys. Commun., 2011.
 [49] J. W. Stayman, N. Subotic, and W. Buller, “An analysis of coded aperture acquisition and reconstruction using multiframe code sequences for relaxed optical design constraints,” in Proc. SPIE, vol. 7468, 2009.
 [50] R. Varga, Geršgorin and His Circles, Springer-Verlag, Berlin, Germany, 2004.
Zachary T. Harmany Zachary Harmany received the B.S. (magna cum laude, with honors) in Electrical Engineering and the B.S. (cum laude) in Physics from The Pennsylvania State University in 2006. Currently, he is a Ph.D. student in the Department of Electrical and Computer Engineering at Duke University. In 2010 he was a visiting researcher at The University of California, Merced. His research interests include nonlinear optimization, functional neuroimaging, and signal and image processing with applications in medical imaging, astronomy, and night vision.
Roummel F. Marcia Roummel Marcia received his B.A. in Mathematics f 