Learning an optimal PSF-pair for ultra-dense 3D localization microscopy
Abstract
A long-standing challenge in multiple-particle tracking is the accurate and precise 3D localization of individual particles at close proximity. One established approach for snapshot 3D imaging is point-spread-function (PSF) engineering, in which the PSF is modified to encode the axial information. However, engineered PSFs are challenging to localize at high densities due to lateral PSF overlaps. Here we suggest using multiple PSFs simultaneously to help overcome this challenge, and investigate the problem of engineering multiple PSFs for dense 3D localization. We implement our approach using a bifurcated optical system that modifies two separate PSFs, and design the PSFs using three different approaches, including end-to-end learning. We demonstrate our approach experimentally by volumetric imaging of fluorescently labelled telomeres in cells.
1 Introduction
In a conventional imaging system, the spatial resolution is bounded by Abbe’s diffraction limit. In a high numerical aperture microscope, this corresponds to approximately half the optical wavelength, i.e. 200 nm for visible light. For cell-imaging applications, this obscures subcellular features of interest with dimensions on the nanoscale. Since 2006, Single-Molecule Localization Microscopy (SMLM) super-resolution techniques have revolutionized biological-structure imaging by circumventing the diffraction limit, namely, using many low-density images of different sets of fluorescent emitters to generate a high-resolution reconstruction [1, 2, 3].
While biological structures are intrinsically 3D, attaining axial (z) information at super-resolution is not trivial. This is because the standard Point Spread Function (PSF) of the microscope is approximately symmetric about the focal plane, and has only a thin axial range before the signal becomes very diffuse. Several approaches have been developed to capture 3D data in microscopy. For example, one can acquire multiple 2D datasets at different focal planes [4, 5, 6], or determine the axial positions of emitters from the images themselves. The latter can be enabled by PSF engineering, where the PSF is modified to encode the desired 3D information. This is typically done by inducing an intentional aberration in the imaging path, e.g. with a cylindrical lens [7] or with a phase mask placed at the Fourier plane of the microscope using an extended optical system [8, 9, 10]. Notably, while providing scan-free axial information, this approach limits the maximum emitter densities suitable for imaging, due to the increased lateral size of the PSFs, and requires more complex image-analysis algorithms than 2D localization microscopy.
When imaging samples that are even just several microns thick, engineered PSFs spread the signal photons over a large lateral footprint relative to the in-focus PSF [11]. This poses a difficult localization challenge when the experimental objective of obtaining a super-resolution reconstruction requires that many molecules be localized in a densely labelled structure. Currently available software packages struggle to achieve good performance in this regime [12, 13]; however, recent work has shown that deep neural networks are well suited to the problem [14], enabling high-quality reconstructions from low emitter densities [15, 16], and increased-density processing [17, 18, 19, 20, 21, 22, 23].
Deep Learning (DL) has excelled in a variety of challenging computational-imaging problems in computer vision, computational photography, medical imaging, and microscopy [24, 25]. Within the realm of computational microscopy, DL has been deployed for tasks such as cell segmentation [26], image restoration [27, 28, 29, 30], sample classification [31, 32], artificial labelling [33], phase imaging [34, 35, 36], optical tomography [37], lifetime imaging [38], single-molecule localization [17, 18, 15, 19, 39, 40, 41, 20, 21, 22, 23, 16, 42], aberration correction [43, 44, 45, 46], Cryo-EM [47], and more [48].
An exciting recent application enabled by deep learning is the end-to-end design of “computational cameras.” Powered by differentiable imaging models and backpropagation, end-to-end learning jointly optimizes the sensing system alongside the data-processing algorithm, thus enabling both components to work harmoniously. This approach has quickly expanded within the computational-imaging community for numerous applications in computer vision and computational photography, for example, color sensing and demosaicing [49, 50], illumination design through scattering media [51], extended-depth-of-field imaging [52, 53, 54], monocular depth estimation [53, 52, 55, 56], high-dynamic-range imaging [57, 58], and hyperspectral imaging [59, 60]. In computational microscopy, end-to-end learning has been utilized by our group and others to enhance various computational modalities such as sample classification [31, 32], single-molecule color sensing and 3D localization [41, 21], quantitative phase imaging [61], and multiphoton microscopy [62].
Here, to address the challenge of high-density 3D localization from a snapshot, we suggest the simultaneous use of multiple PSFs, as well as a method to design and implement the optimal phase masks. Specifically, we introduce a bifurcated optical system that modifies two separate PSFs with a pair of phase masks using Liquid-Crystal Spatial Light Modulators (LC-SLMs). First, we demonstrate that there is an advantage in splitting precious signal photons into two channels compared to a single-PSF system, even in moderately dense emitter conditions. For this task we utilize a PSF-pair that splits the 3D information into complementary channels, namely, for lateral and axial localization. To localize the emitters from the obtained pair of images we employ a convolutional neural network (CNN) architecture based on DeepSTORM3D [21]. Next, we revisit the problem of optimizing the information content of a single emitter in a pair of PSF measurements [9]. Lastly, we implement end-to-end learning to jointly design our localization algorithm and the PSF-pair. The resulting PSFs, which we call the Nebulae PSFs, achieve unprecedented performance in localizing volumes of dense emitters in 3D. We quantify and directly compare the performance of each approach in simulation and experimentally, with volumetric imaging of fluorescently labelled telomeres in fixed cells. Finally, we demonstrate continuous, scan-free, live-cell tracking of 60 telomeres simultaneously in a single cell’s nucleus, with 30 nm 3D precision and 100 ms temporal resolution over an axial range of 5 μm.
2 Optical setup
Dual-camera systems have been utilized in the past in microscopy for localizing single emitters in 3D [63, 64, 65, 66]. Most recently, a dual-view scheme was utilized in DAISY [67] to combine astigmatism-based PSF engineering with Supercritical Angle Fluorescence (SAF) [68], providing a semi-isotropic 3D resolution over a 1 μm axial range. However, while these works proposed creative designs to combine the information in both channels, their objective was to enable precise and experimentally robust axial localization of single emitters. In addition, the proposed PSFs were handcrafted based on desired properties and not fully optimized. Here we use a bifurcated optical system with two detection paths for the task of precise 3D localization of multiple emitters in ultra-dense samples.
The optical system used to implement the monocular PSF-pair is presented in Fig. 1. Briefly, our system is composed of an epifluorescence microscope extended with two identical detection paths. The fluorescent light emitted from a particle in the sample is split using a polarizing beam splitter into two 4f optical processing systems, each equipped with an LC-SLM placed in the Fourier plane. The LC-SLM implements a phase modulation that modifies the emission pattern to encode the 3D position onto the 2D captured measurements, which are then decoded jointly via further image processing. For a list of the specific components used in our implementation see supplementary section A.5.2.
We model our system using the scalar diffraction approximation where the emitters are modeled as isotropic point sources [69]. Thus, the PSFs of our system can be efficiently computed by a Fast Fourier Transform (FFT). A full description of our imaging model is provided in supplementary section A.1.
Equipped with the system above, the question is which pair of PSFs is best suited for the task of dense 3D localization. In the next sections we answer this question in stages.
3 Disentangling lateral/axial information
For simplicity, we first consider the problem of designing an additional PSF while keeping the first PSF fixed to the 4 μm Tetrapod [9], which was optimized for the sparse case in a single channel. Given that Tetrapod PSFs encode depth at the cost of a large lateral footprint, we would like the complementary PSF to be compact and to help disentangle the approximate lateral positions in overlapping regions. Then, aided by this additional measurement, the overlapping Tetrapods can be decoded to recover the 3D positions. In other words, we are, broadly, separating the problem into an “axial localization” channel, encoded by the Tetrapod PSF, and a “lateral localization” channel, to be encoded by a different PSF.
For encoding lateral information we propose the use of an Extended-Depth-of-Field (EDOF) PSF, namely, a PSF that maintains its lateral shape over an extended axial range. However, unlike traditional EDOF designs [70, 71], the desired PSF needs to be laterally compact and signal-efficient, because it should work for very dense samples. These requirements motivated us to design a novel EDOF PSF suited for the task.
3.1 EDOF PSF design
To design the desired EDOF PSF, we formulate the problem as a phase-retrieval task. Specifically, given a desired axial range (e.g. 4 μm), we first generate a synthetic z-stack comprised of the approximate in-focus Airy-disk PSF, replicated at 200 nm steps. Afterwards, we use stochastic-gradient-descent iterations with importance sampling [72] to recover the phase mask associated with this PSF. Let $D$ be the diameter of the diffraction-limited spot for the assumed optical setup. Our cost function for this task is then given by

$\mathcal{L}(\phi) = \sum_{z=1}^{N_z} \left\lVert W \odot \big( I_z(\phi) - G \big) \right\rVert_2^2,$   (1)

where $I_z(\phi)$ is the on-axis PSF at depth $z$ for phase mask $\phi$, $N_z$ is the number of axial slices, $G$ is a 2D Gaussian fit to the in-focus spot of diameter $D$, and $W$ is a weighting term added to quickly “squeeze” the signal photons into the diffraction-limited spot, given by

$W(u,v) = 1 + \beta \left( 1 - \frac{G(u,v)}{\max G} \right),$   (2)

with $(u,v)$ denoting centered pixel coordinates in image space, and $\beta$ a constant set empirically.
The resulting phase mask and PSF are presented in Fig. 2. This simple approach leads to a powerful EDOF PSF, with very high signal efficiency and a small lateral footprint (Fig. 2b) compared to previous designs [70, 71] (see supplementary section A.2 for comparisons and implementation details). While we designed and implemented this EDOF PSF to complement the Tetrapod information in emitter-dense regions, its potential applications extend far beyond our localization task.
Notably, recent end-to-end designs of EDOF PSFs have achieved quite compelling results [52, 53, 54]. In particular, the phase mask presented in [54] resembles the result of our approach. However, these data-driven approaches are ultimately dataset-dependent, and take hours of training to design for a new range, whereas our approach is independent of the dataset and converges in less than 2 minutes on a GPU.
3.2 Dual-view vs. single-view
In typical LC-SLM PSF engineering systems, half of the signal photons are discarded, since the LC-SLM can only modulate polarized light. Therefore, in our system the second PSF measurement comes at no additional photon cost, with the only caveat being the need for an additional detection path in the two-view setup. It should be noted that 4f systems that utilize a Diffractive Optical Element (DOE), instead of an LC-SLM, do not suffer from this photon loss; yet, this comes at the cost of versatility. Now that we have designed a novel EDOF PSF for our task, we can test whether splitting the signal between two cameras is in fact beneficial compared to a DOE-based system.
Since neural networks are already established to be remarkably efficient for dense localization [17, 21], we modify our previously published fully convolutional architecture [21] to receive an image with two channels comprised of the two measurements. For training details and network architecture see supplementary section A.4. Our results in simulation (see supplementary Fig. S8) confirm that for the task of dense 3D localization, a split-signal dual-view system is superior to a single measurement with a DOE, even when that measurement is sensed using an optimal end-to-end learned design [21].
3.3 Tetrapod-EDOF experimental validation
Next, we validate our approach in cells. For this task, we imaged fluorescently labelled telomeres in fixed human osteosarcoma (U2OS) cells (for fixation and labeling see supplementary section A.5.2). We first chose fixed cells to enable the acquisition of a ground-truth approximation via axial scanning. The imaged cell line was hypertriploid, meaning that it has an unusually large number of telomeres (70–130), which facilitates testing our method in a dense environment. The experiment consisted of two parts: first, each cell was scanned in the axial direction using a piezo stage (100 nm steps), and the 3D ground-truth positions were approximated via fitting (see supplementary section A.5.1). Afterwards, we recorded 3 snapshot images: one with the Tetrapod PSF utilizing 100% of the signal (accomplished using a longer exposure time), and two more with the signal split 50%/50% between the Tetrapod PSF and the EDOF PSF (Fig. 3). In agreement with simulations, these results demonstrate that at a density of 0.27 emitters/μm², the Tetrapod-EDOF pair is superior in localizing overlapping telomeres, as measured by the Jaccard index [13, 21].
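For concreteness, the Jaccard index can be sketched in a few lines of NumPy. Note that this is an illustrative sketch only: the benchmarks above follow the matching protocol of [13], whereas here a simple greedy nearest-neighbor matching within a 3D tolerance radius (a hypothetical 100 nm default) keeps the code short.

```python
import numpy as np

def jaccard_index(gt, pred, tol=100.0):
    """Jaccard index TP / (TP + FP + FN) between two 3D localization sets.

    gt, pred: (N, 3) arrays of 3D positions (e.g. in nm).
    tol: maximum 3D distance for a prediction to count as a true positive.
    Matching here is greedy nearest-neighbor (illustrative, not the
    optimal-assignment protocol of the SMLM challenge).
    """
    gt = np.asarray(gt, float)
    pred = np.asarray(pred, float)
    unmatched = list(range(len(gt)))   # indices of still-unmatched GT points
    tp = 0
    for p in pred:
        if not unmatched:
            break
        d = np.linalg.norm(gt[unmatched] - p, axis=1)
        k = int(np.argmin(d))
        if d[k] <= tol:                # within tolerance -> true positive
            tp += 1
            unmatched.pop(k)
    fp = len(pred) - tp
    fn = len(gt) - tp
    return tp / (tp + fp + fn)
```

A perfect recovery yields 1.0; missed or spurious localizations lower the score symmetrically.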
While the complementary PSF-pair is effective, this way of decoupling the 3D positional information into dedicated “lateral” and “axial” channels is unlikely to be the optimal solution. For example, beyond a certain density, the axial information in the Tetrapod PSF will be occluded completely by overlapping PSFs. Having a second measurement that is solely dedicated to encoding the lateral information (the EDOF PSF) will then not be beneficial for decoding the axial position. This motivates us to revisit the task of designing a PSF-pair for dense 3D localization. For simplicity, we start with the single-emitter case, viewed from an estimation-theory perspective.
4 Optimal PSF-pair design
4.1 Single-emitter case
Optimal PSFs for two-channel localization of only a single emitter can be derived by minimizing the Cramér-Rao Lower Bound (CRLB) [73, 74, 9]. Considering the system in Fig. 1, we can jointly optimize the sensitivity of a PSF-pair with respect to a change in the 3D position of a single emitter. The CRLB then defines the lower bound on the precision of unbiased estimation of the 3D position from a noisy PSF-pair. Unlike the original Tetrapod optimization [9], here we employed a pixel-wise approach to explore aberrations not spanned by low-order Zernike polynomials. For a full derivation of the CRLB and the optimization objective see supplementary section A.3.
The CRLB-optimized PSF-pair is given in Fig. 4. Notably, the CRLB of the PSF-pair is similar to the CRLB of a 4 μm Tetrapod PSF with twice the signal. Therefore, as can be expected, splitting the information does not improve precision in the single-emitter case, suggesting that a two-channel system is not justified for sparse localization.
The resulting PSF-pair combines the concepts of biplane imaging and PSF engineering in an elegant way to encode the 3D position in two measurements. Simulation results show that this PSF-pair outperforms the Tetrapod-EDOF pair described earlier (see supplementary sections A.6 and A.7); however, previous work demonstrates that end-to-end designs using deep neural networks can lead to superior performance [21], and this is the path we describe next.
4.2 End-to-end learning of a monocular PSF-pair
As shown previously [21], end-to-end designs lead to efficient PSF patterns that are highly suited for dense 3D imaging. Here, we extend the DeepSTORM3D approach to tackle the problem of designing a PSF-pair. This is achieved by designing the encoding stage to incorporate two disjoint and differentiable physical-simulation layers (Fig. 5a). Each layer is parameterized by its own phase mask ($\phi_1$ & $\phi_2$) dictating the respective PSF (see supplementary section A.1 for the imaging model). During training, we randomly simulate 3D positions ($\mathbf{x}$), and feed them to the two physical layers. Each physical layer encodes the 3D positions into its simulated sensor image ($I_1$ & $I_2$). These images are concatenated and fed to the localization CNN (parameterized by $\theta$), which decodes them in order to recover the underlying 3D positions ($\hat{\mathbf{x}}$). The difference between the simulated and the recovered positions is quantified by our loss function ($\mathcal{L}$) and backpropagated to jointly optimize the phase masks ($\phi_1$ & $\phi_2$) and the localization CNN parameters ($\theta$) end-to-end. This process is usually repeated for 30 epochs until convergence. For training details see supplementary section A.4.
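The dual-channel encoding stage can be illustrated with a minimal NumPy forward pass. This is a toy sketch with arbitrary units and sizes, not our implementation: in practice these operations are written in an autodiff framework (e.g. PyTorch), so that the loss gradient flows through both simulated images back to the two phase masks and the CNN weights.

```python
import numpy as np

def psf(phase_mask, z, n_pix=64, aperture_radius=0.45):
    """Toy scalar-diffraction PSF: |FFT(aperture * exp(i(mask + defocus)))|^2."""
    f = np.linspace(-0.5, 0.5, n_pix)
    fx, fy = np.meshgrid(f, f)
    rho2 = fx**2 + fy**2
    pupil = (rho2 <= aperture_radius**2).astype(float)
    defocus = 2 * np.pi * z * rho2                 # quadratic defocus (toy units)
    field = pupil * np.exp(1j * (phase_mask + defocus))
    img = np.abs(np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(field))))**2
    return img / img.sum()                         # unit-energy PSF

def encode(positions, masks, n_pix=64):
    """Two physical layers: each encodes all 3D emitters into one sensor image."""
    channels = []
    for mask in masks:                             # one pass per phase mask
        img = np.zeros((n_pix, n_pix))
        for (x, y, z) in positions:                # incoherent sum of shifted PSFs
            img += np.roll(np.roll(psf(mask, z, n_pix), int(y), 0), int(x), 1)
        channels.append(img)
    return np.stack(channels)                      # (2, H, W) input to the CNN

masks = [np.zeros((64, 64)), np.zeros((64, 64))]   # phi_1, phi_2 (learnable)
imgs = encode([(5, -3, 0.7), (-8, 2, -1.2)], masks)
```

In the learned system, `masks` would be trainable tensors and `imgs` would be corrupted by the noise model of section A.1 before being fed to the decoder.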
The end-to-end learned phase masks and their respective PSFs, hereafter referred to as the Nebulae PSFs, are presented in Fig. 5. Two distinctive features stand out in this pair compared to the previous approaches described earlier. First, both channels encode 3D information in their individual intensity patterns, as well as in the relative position of the intensity centroids throughout the entire axial range, a trait previously recognized as useful for 3D localization [63]. Second, in phase space, the learned phase masks are approximately rotated versions of one another, although our optimization was performed pixel-wise and our loss function did not include any constraints on the mutual information of the two measurements.
To evaluate the performance of the Nebulae PSFs, we first compare them in simulation to the Tetrapod-EDOF pair (section 3), as well as to a single-channel Tetrapod PSF with twice the signal (Fig. 7). The results indicate that the Nebulae PSFs achieve unprecedented performance in localizing dense 3D emitters over a large axial range of 4 μm, assuming our experimental telomere imaging conditions, i.e. 15K signal photons per emitter and 500 background photons per pixel.
4.3 Nebulae PSFs experimental validation
Next, we applied the Nebulae PSFs in fixed cells, and compared their performance to the Tetrapod-EDOF pair experimentally (Fig. 7). Similar to section 3.3, we first found the emitter positions by axial scanning, for comparison to our snapshot images taken at a single focal plane: once with the Tetrapod-EDOF pair, and once with the Nebulae PSFs. The results show that at a density of 0.34 emitters/μm², the Nebulae PSFs are superior in localizing overlapping telomeres, as measured by the Jaccard index. The Nebulae PSFs were also found to have superior performance relative to the CRLB-optimized pair from section 4.1. For a head-to-head comparison in simulations as well as experiments see supplementary sections A.6 and A.7.
5 Live telomere tracking
Throughout this work we have imaged and localized 3D positions of telomeres in fixed cells to facilitate quantitative comparisons of the proposed solutions. However, more pertinent is the application of our method to multiple-particle tracking in live cells, where axial scanning is inapplicable due to the motion of the objects. Here, our simultaneous multi-channel snapshot approach enables capturing the behavior of diffusing telomeres in living cells at an unprecedented combination of density, speed, and axial range [75].
Quantifying telomere dynamics in live cells is of paramount importance for answering fundamental questions under normal and disease conditions [75, 76], as tracking the 3D diffusion of telomeres unveils information on the chromatin environment and on DNA-folding regulation. One challenge in observing chromatin in living cells is the intrinsic biological heterogeneity between diffusing telomeres [77]. Therefore, to fully characterize chromatin dynamics it is desirable to capture all single-telomere trajectories, including in emitter-dense regions.
Figure 8 demonstrates the full applicability of the Nebulae PSFs for volumetric tracking of 61 diffusing telomeres, spanning an axial range of 4.7 μm in the nucleus of a living U2OS cell. The trained localization CNN is able to reliably track all of the labelled telomeres over the course of 500 frames (50 s), even those in close proximity and with a low signal-to-noise ratio. As evident in the resulting tracks (Fig. 8), the telomeres exhibit variable diffusion profiles (Fig. 8e), necessitating individual processing as facilitated by our approach.
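One standard way to quantify such per-telomere heterogeneity is the time-averaged mean-squared displacement (MSD) of each 3D track. The sketch below assumes simple free diffusion (MSD(t) = 6Dt in 3D) for the fit; the actual per-track analysis behind Fig. 8 may differ, so this is illustrative only.

```python
import numpy as np

def msd(track, max_lag=None):
    """Time-averaged mean-squared displacement of one 3D track.

    track: (T, 3) array of positions (e.g. in um) at equally spaced times.
    Returns msd[lag] for lag = 1 .. max_lag.
    """
    track = np.asarray(track, float)
    T = len(track)
    max_lag = max_lag or T // 2
    out = np.empty(max_lag)
    for lag in range(1, max_lag + 1):
        disp = track[lag:] - track[:-lag]          # displacements at this lag
        out[lag - 1] = np.mean(np.sum(disp**2, axis=1))
    return out

def diffusion_coefficient(track, dt, n_fit=4):
    """Fit MSD(t) = 6 D t over the first n_fit lags (free 3D diffusion model)."""
    lags = np.arange(1, n_fit + 1) * dt
    m = msd(track, n_fit)
    return np.polyfit(lags, m, 1)[0] / 6.0         # slope / 6 -> D
```

Applying this per track yields a distribution of apparent diffusion coefficients, exposing the population heterogeneity directly.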
6 Discussion
In computational imaging, the co-design of optics and image-processing algorithms has been introduced in various applications spanning the fields of computational photography and computational microscopy. In the realm of localization microscopy, this is the key concept in PSF engineering [7, 8, 9], and it has been utilized to extend the imaging capabilities of SMLM [12, 78]. Until recently, however, the standard approach was to design the optical system to optimize a specific trait of the PSF that would facilitate its processing afterwards, e.g. an axial-displacement-induced rotation in the Double-Helix PSF [79, 8]. In addition to conceived physical properties, information-content-driven optimization has also been used in PSF design; for example, in [80], the PSF was optimized for depth discrimination. Similarly, for SMLM applications [9], the PSF has been optimized to minimize the variance of an unbiased estimator for localizing the 3D position of a point source. While the latter two identified theoretically optimal solutions for encoding the information, in complex environments the decoding step often limits the problem as well.
Recently, powered by deep learning and differentiable physical models, end-to-end designs of physical elements and data-processing algorithms have been demonstrated by our group and others to facilitate efficient imaging modalities in microscopy [31, 32, 61, 62]. Specifically in SMLM, the efficiency of jointly designing PSFs and deep networks was demonstrated for multicolor 2D imaging [41] and snapshot dense 3D imaging [21].
In this work, we addressed the challenging task of multi-PSF engineering for dense 3D imaging. Specifically, we proposed three different PSF-pairs, each derived with a different set of considerations. For the first pair, we introduced an efficient and laterally compact EDOF PSF to complement the Tetrapod PSF at high emitter densities. Notably, this EDOF PSF has numerous applications in its own right for imaging in thick samples with little need for deconvolution [71]. In the second pair, we extended the CRLB design metric to optimize the sensitivity of a PSF-pair in the single-emitter case. Lastly, we presented the Nebulae PSFs, learned end-to-end to achieve reliable dense 3D localization from snapshot measurements. We validated each of the proposed designs numerically and experimentally. To demonstrate the applicability for dense 3D tracking in live cells, we tracked regions of dense telomeres using the Nebulae PSFs, enabling a statistical analysis of population heterogeneity, and high-resolution 3D modelling of chromatin dynamics in single cells.
In contrast to standard CNN filters, a notable aspect of end-to-end learning with physical layers is our ability to visualize and interpret the designed physical elements. For example, for the Nebulae PSFs, the signal photons are compacted into a single lobe in each channel. This feature is understandably advantageous in the dense fields of emitters with limited SNR used in our simulations and experimental conditions. Moreover, the intensity patterns at each axial position combine elementary depth-encoding aberrations, such as astigmatism, rotation, and relative inter-channel single-lobe movement. What separates these PSFs from predetermined designs is the simultaneous deployment of multiple depth-encoding strategies, making full use of the decoding CNN capacity, and thereby optimizing dense 3D localization from noisy measurements.
Notably, our approach is not limited to particle tracking. By tweaking the physical-simulation layers, this method can be readily adapted to any point-source-sensing paradigm, including DAISY [67], MINFLUX [81, 82], multi-plane microscopy [5, 4, 83, 6], and more. In a concurrent work [84], similar ideas were pursued for multi-plane PSF engineering, demonstrating promising results in simulations. In these, and for SMLM applications, it is likely that modifying the CNN architecture, initializations, training sets, and loss functions may further improve the performance, raising the question of how globally optimal the solution derived in our framework is. At this point, it is unclear how each optimization component affects the learning process, a question that will be addressed in future work. In particular, we anticipate that the emerging suite of tools developed to make deep learning more accessible to the community will assist in answering these critical questions [85, 86, 87].
To the best of our knowledge, this work reports the first end-to-end learning of multiple PSFs with experimental feasibility. Such multi-PSF designs may prove useful outside the realm of computational microscopy. For example, in computational photography, the design of coded-aperture pairs and their optimal combination with stereo imaging has been a long-standing question [88, 89, 90, 91, 92, 93]. Most recently, Gil et al. [93] proposed to exploit identical phase-mask pairs for improved depth estimation and online stereo calibration. We believe this work paves the way for asymmetric strategies in the field of computational photography, with applications in stereo imaging and multi-shot monocular depth estimation. Depending on the specific task at hand, the optimal PSF-pair could vary; however, we believe that the approaches to PSF-pair optimization in this work will provide a useful initialization for the general problem.
Funding
The Israel Science Foundation (grant 852/17), the Technion Ollendorff Minerva Center, the Zuckerman Foundation. H2020 European Research Council Horizon 2020 (802567), Google Faculty Research Award for Machine Perception, The Israel Science Foundation (grant 450/18).
Acknowledgments
We thank Rotem Mulayoff for insights and fruitful discussions with respect to the EDOF design. We also thank Romain F. Laine for his help with conceiving the name Nebulae. We gratefully acknowledge the support of the NVIDIA Corporation with the donation of the Titan Xp and the Titan V GPU used for this research. We thank Google for the cloud units provided to accelerate this research.
Disclosures
The authors declare no conflicts of interest.
Appendix A
A.1 Imaging model
In this section we briefly review the imaging model used throughout this work. Our system is composed of fluorescent emitters with emission wavelength $\lambda$, suspended in water (refractive index $n_1$) above an oil-immersed objective (refractive index $n_2$). The emitters are imaged with an objective lens of numerical aperture NA, focused at a focal plane $z_f$, and their image is magnified onto the sensor with a microscope magnification M. Let $\mathcal{M}(\rho,\varphi)$ denote the phase mask placed in the conjugate back focal plane of an extended emission path with a 4f system (Fig. 1), and let $(\rho,\varphi)$ denote the normalized polar coordinates in the Fourier plane, such that $\rho = 1$ at the maximal spatial frequency $\mathrm{NA}/\lambda$ passed by the objective. Under the scalar approximation [69], the PSF of a point source located at $(x_0, y_0, z_0)$ above the water-oil interface is given by

$\mathrm{PSF}(u,v) = \left| \mathcal{F}\left\{ E(\rho)\, e^{\,i\left[ \mathcal{M}(\rho,\varphi) + \psi(\rho,\varphi) \right]} \right\} \right|^2,$   (S1)

where $(u,v)$ are the coordinates at the image plane, $\mathcal{F}$ is the two-dimensional Fourier transform, and $E(\rho)$ is the effective aperture of the compound system, limited by the critical angle for high-NA objectives

$E(\rho) = \begin{cases} 1, & \rho \le n_1/\mathrm{NA}, \\ 0, & \text{otherwise}, \end{cases}$   (S2)
and $\psi(\rho,\varphi)$ is the accumulated phase due to the emitter 3D position and the focal-plane setting. This phase can be decomposed into lateral and axial components

$\psi(\rho,\varphi) = \psi_{xy}(\rho,\varphi) + \psi_{z}(\rho).$   (S3)

The lateral component is assumed to be a linear phase (i.e. a shift-invariant convolution system), given by

$\psi_{xy}(\rho,\varphi) = \frac{2\pi}{\lambda}\,\mathrm{NA}\,\rho\,\big( x_0 \cos\varphi + y_0 \sin\varphi \big).$   (S4)

As for the axial component, it is split into two terms to account for the refractive-index mismatch [94]: the phase accumulated in water due to the emitter depth $z_0$, and the phase accumulated in oil due to a focus shift $z_f$ from the coverslip

$\psi_{z}(\rho) = \frac{2\pi}{\lambda}\,\big( n_1 z_0 \cos\theta_1(\rho) - n_2 z_f \cos\theta_2(\rho) \big),$   (S5)

where,

$\cos\theta_i(\rho) = \sqrt{ 1 - \left( \frac{\mathrm{NA}\,\rho}{n_i} \right)^2 }, \qquad i \in \{1, 2\}.$   (S6)
Finally, the PSF in eq. S1 is slightly smoothed in image space

$\widetilde{\mathrm{PSF}}(u,v) = \mathrm{PSF}(u,v) * G_{\sigma}(u,v),$   (S7)

where $*$ denotes convolution, and $G_{\sigma}$ is a 2D Gaussian kernel with a standard deviation $\sigma$ that is fit empirically to match experimental data (usually 70 nm). This blur accounts for the finite size of the emitter, its spectrum, and the inherent blur in the optical system, alleviating the need to explicitly model these effects. For a full derivation of the model, including the neglected dipole and near-field effects, the reader is referred to [95, 10].
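The forward model above can be sketched compactly in NumPy. The wavelength, indices, pixel size, and blur width below are illustrative placeholders (not the values of our setup), and the Gaussian smoothing of eq. S7 is applied as a multiplication in Fourier space.

```python
import numpy as np

# Illustrative parameters (placeholders, not the experimental values):
LAM, NA, N1, N2 = 0.67, 1.45, 1.33, 1.518   # wavelength [um]; water / oil indices

def psf_model(phase_mask, x0, y0, z0, zf, n_pix=128, px=0.1):
    """Scalar PSF of eqs. (S1)-(S7): pupil phase -> FFT -> |.|^2 -> blur."""
    f = np.fft.fftfreq(n_pix, d=px)                  # spatial frequencies [1/um]
    fx, fy = np.meshgrid(f, f)
    rho = LAM * np.sqrt(fx**2 + fy**2) / NA          # normalized pupil radius
    E = (rho <= N1 / NA).astype(float)               # aperture up to critical angle
    k = 2 * np.pi / LAM
    cos1 = np.sqrt(np.clip(1 - (NA * rho / N1)**2, 0, None))
    cos2 = np.sqrt(np.clip(1 - (NA * rho / N2)**2, 0, None))
    # psi = lateral linear phase + index-mismatched defocus (eqs. S3-S6)
    psi = 2 * np.pi * (fx * x0 + fy * y0) + k * (N1 * z0 * cos1 - N2 * zf * cos2)
    field = E * np.exp(1j * (phase_mask + psi))
    p = np.abs(np.fft.fftshift(np.fft.fft2(field)))**2
    p /= p.sum()
    # eq. (S7): Gaussian blur, sigma = 70 nm, applied in Fourier space
    g_hat = np.exp(-2 * np.pi**2 * 0.07**2 * (fx**2 + fy**2))
    p = np.real(np.fft.ifft2(np.fft.fft2(p) * g_hat))
    return p / p.sum()
```

A lateral offset shifts the spot via the linear phase, while changing $z_0$ reshapes it through the mismatched defocus terms, as expected from eqs. S4-S5.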
The image of a set of $N$ emitters is given by the incoherent sum of their PSFs

$I(u,v) = \sum_{k=1}^{N} \widetilde{\mathrm{PSF}}(u,v;\, \mathbf{x}_k),$   (S8)

where $\mathbf{x}_k = (x_k, y_k, z_k)$ is the 3D position of the $k$-th emitter.
The commonly used measurement model is given by data-dependent Poisson noise and additive Gaussian read noise

$\mathcal{D}(u,v) = \mathcal{P}\big\{ I(u,v) + b \big\} + \mathcal{N}\big( c, \sigma_r^2 \big),$   (S9)

where $\mathcal{P}$ is the Poisson distribution, $b$ is a per-pixel background noise, $\mathcal{N}$ is the normal distribution, $c$ is a baseline count level, and $\sigma_r^2$ is the read-noise variance.
To make the measurement model differentiable, we approximate the Poisson noise with Gaussian noise of matching mean and variance, justified by the central limit theorem for sufficiently large counts

$\mathcal{P}\big\{ I + b \big\} \approx \mathcal{N}\big( I + b,\; I + b \big).$   (S10)

The resulting data-dependent noise approximation is implemented using the reparameterization trick [96]

$\mathcal{D}(u,v) \approx I(u,v) + b + \sqrt{ I(u,v) + b }\;\epsilon(u,v) + \mathcal{N}\big( c, \sigma_r^2 \big),$   (S11)

where $\epsilon(u,v)$ is a realization of a standard normal distribution

$\epsilon(u,v) \sim \mathcal{N}(0, 1).$   (S12)
Now, the measurement model is differentiable w.r.t. the phase mask and is therefore suited for endtoend learning.
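The differentiable noise model can be sketched as follows (parameter defaults are illustrative, not our experimental calibration):

```python
import numpy as np

def measure(I, b=20.0, c=100.0, sigma_r=1.5, rng=None):
    """Differentiable measurement model of eqs. (S9)-(S12).

    Poisson shot noise on (I + b) is approximated as Gaussian with matching
    mean and variance, written via the reparameterization trick so that the
    sample is differentiable w.r.t. the clean image I (and hence the mask).
    """
    rng = rng or np.random.default_rng(0)
    lam = I + b                                    # shot-noise rate per pixel
    eps = rng.standard_normal(I.shape)             # fixed standard-normal draw
    shot = lam + np.sqrt(lam) * eps                # N(lam, lam) ~ Poisson(lam)
    read = c + sigma_r * rng.standard_normal(I.shape)
    return shot + read
```

Because the randomness enters only through `eps`, the sample is a smooth function of `I`, which is what allows gradients to propagate through the sensor model during end-to-end training.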
A.2 EDOF PSF design
In this section we provide the implementation details for designing the EDOF PSF, and then compare our result with existing popular designs. There are several ways to implement an EDOF PSF, including placing an axicon in the optical path [97], using ring apertures, and reducing the numerical aperture of the system. Due to photon-efficiency considerations, in this work we focus on the implementation of an EDOF PSF using a phase mask. Our general strategy is to formulate the design problem as a phase-retrieval task, as detailed next.
First, we start by simulating the in-focus Airy-disk PSF for the desired optical system. Afterwards, this PSF is thresholded to keep only the main lobe with diameter $D$, and the result is fitted with a 2D Gaussian $G$. This Gaussian is then replicated to generate a synthetic z-stack with 200 nm jumps between slices. $G$ is also used to define a weighting matrix $W$ that “squeezes” signal photons quickly into the diffraction-limited spot. Let $(u,v)$ be centered pixel coordinates in image space; the matrix is given by

$W(u,v) = 1 + \beta \left( 1 - \frac{G(u,v)}{\max G} \right),$   (S13)

where in our implementation the constant $\beta$ was determined empirically to achieve appealing results.
Given $W$, we try to retrieve the phase mask associated with the synthetic z-stack via phase retrieval [72]. This is implemented using Stochastic Gradient Descent (SGD) with importance sampling to minimize the following cost function

$\mathcal{L}(\phi) = \sum_{z=1}^{N_z} \left\lVert W \odot \big( I_z(\phi) - G \big) \right\rVert_2^2,$   (S14)

where $I_z(\phi)$ is the on-axis PSF at depth $z$, and $N_z$ is the number of axial slices. Let $S(\phi) = \{ I_z(\phi) \}_{z=1}^{N_z}$ denote the current PSF stack dictated by the phase mask $\phi$. Our optimization is comprised of the following 3 steps:

1. We compute the correlation of $S(\phi)$ with $G$ at each axial slice

$C_z = \frac{ \left\langle I_z(\phi),\, G \right\rangle }{ \lVert I_z(\phi) \rVert_2 \, \lVert G \rVert_2 },$   (S15)

and choose the three axial slices $z^{*}$ with the lowest correlation.

2. To avoid overfitting the sampled 200 nm “knots” throughout the axial range, we perturb each of the slices $z^{*}$ locally with a random continuous shift, while clipping out-of-range values.

3. We calculate the gradient of the cost in eq. S14 sampled only at $z^{*}$, and take a gradient step.
In the third step, we experimented with a few adaptive SGD optimizers [98, 99, 100, 101], and ultimately chose Adam [98]. The process is repeated for 400 iterations, or until the loss function stagnates.
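The loop above can be illustrated on a toy problem. This sketch departs from the actual implementation in several labeled ways: a tiny 16x16 pupil, arbitrary defocus units, a finite-difference gradient instead of autodiff/Adam, a single worst-correlated slice per iteration instead of three, and a safeguarded step acceptance; the weighting $W$ uses the assumed form of eq. S13.

```python
import numpy as np

n = 16
f = np.linspace(-0.5, 0.5, n)
fx, fy = np.meshgrid(f, f)
E = ((fx**2 + fy**2) <= 0.16).astype(float)       # circular aperture
defocus = 2 * np.pi * (fx**2 + fy**2)             # unit defocus phase (toy units)
z_slices = [0.5, 1.0, 1.5]                        # axial "knots"

def psf(phi, z):
    field = E * np.exp(1j * (phi + z * defocus))
    p = np.abs(np.fft.fft2(field))**2
    return p / p.sum()

target = psf(np.zeros((n, n)), 0.0)               # in-focus spot as EDOF target
W = 1.0 + 5.0 * (1.0 - target / target.max())     # heavier weight off-spot

def slice_loss(phi, z):
    return np.sum((W * (psf(phi, z) - target))**2)

def total_loss(phi):
    return sum(slice_loss(phi, z) for z in z_slices)

rng = np.random.default_rng(1)
phi = 0.05 * rng.standard_normal((n, n))          # small random initialization
lr, fd_eps = 100.0, 1e-4
losses = [total_loss(phi)]
for it in range(20):
    # importance sampling: descend on the currently worst-matching slice
    worst = max(z_slices, key=lambda z: slice_loss(phi, z))
    grad = np.zeros_like(phi)
    for i in range(n):                            # finite-difference gradient
        for j in range(n):
            if E[i, j] == 0.0:
                continue
            phi[i, j] += fd_eps
            lp = slice_loss(phi, worst)
            phi[i, j] -= 2 * fd_eps
            lm = slice_loss(phi, worst)
            phi[i, j] += fd_eps
            grad[i, j] = (lp - lm) / (2 * fd_eps)
    cand = phi - lr * grad
    if total_loss(cand) < losses[-1]:             # accept only improving steps
        phi = cand
        losses.append(total_loss(phi))
    else:
        lr *= 0.5                                 # crude stand-in for Adam
```

The acceptance check makes the recorded loss sequence monotone; in the real design the step is instead taken by Adam on the analytic (autodiff) gradient.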
Notably, the correlation in our implementation serves as side information [102], used to adaptively sample slices and direct the SGD iterations. Compared with a stochastic sampling approach, this has the benefit of accelerating convergence, and empirically led to better solutions.
Figure S1 compares the result to two common EDOF implementations: the cubic phase mask [71], and the randomly sampled Fresnel lenses phase mask [70]. The amplitude of the cubic phase mask was chosen such that the PSF is consistent over the FOV, while retaining as much SNR as possible. Our proposed EDOF has three significant advantages over the classical designs: (1) its lateral extent is much smaller than that of the cubic phase mask PSF, matching our density requirements; (2) the SNR in the main spot is higher than in both other methods; and (3) the proposed phase mask is smooth compared to randomly sampled Fresnel lenses. This facilitates its implementation using LC-SLM devices, as these suffer from inter-pixel crosstalk [103].
A.3 CRLB optimization
In this section we derive the Cramér-Rao Lower Bound (CRLB) [73, 74, 9] of the system in Fig. 1. For simplicity, we start with the assumption that the measurement model includes only Poisson data-dependent noise. At the end, we also provide the expression for the extended case including read noise.
First, let us introduce some notation. Let $\theta = (x, y, z)$ denote the 3D position of a single emitter imaged with the system in Fig. 1, let $u$ denote the concatenated coordinates at the image plane, and let $I_{\phi}(u; \theta)$ denote the model PSF of the emitter in the detection path with phase modulation $\phi$. Assuming Poisson statistics for the source and background signals, the measured PSF is given by
$\hat{I}_{\phi}(u) \sim \mathrm{Poisson}\left( I_{\phi}(u; \theta) + b(u) \right)$  (S16)
where $b(u)$ is a per-pixel background. The log-likelihood function for the measurement in eq. S16 is given by
$\mathcal{L}_{\phi}(\theta) = \sum_{u=1}^{n} \hat{I}_{\phi}(u) \log\left( I_{\phi}(u; \theta) + b(u) \right) - \left( I_{\phi}(u; \theta) + b(u) \right) + C$  (S17)
where $n$ is the number of pixels in the image, and $C$ is a function of the measurements that is independent of the unknown 3D position $\theta$.
Given a log-likelihood function, the Fisher Information matrix is defined as [73]
$\mathcal{I}(\theta)_{ij} = \mathbb{E}\left[ \frac{\partial \mathcal{L}}{\partial \theta_i} \frac{\partial \mathcal{L}}{\partial \theta_j} \right], \quad i, j \in \{1, 2, 3\}$  (S18)
Substituting the log-likelihood from eq. S17 we get
$\mathcal{I}_{\phi}(\theta)_{ij} = \sum_{u=1}^{n} \frac{1}{I_{\phi}(u; \theta) + b(u)} \frac{\partial I_{\phi}(u; \theta)}{\partial \theta_i} \frac{\partial I_{\phi}(u; \theta)}{\partial \theta_j}$  (S19)
Assuming independent photon arrivals in each detection path, the measurements $\hat{I}_{\phi_1}$, $\hat{I}_{\phi_2}$ become independent. Therefore, the joint information of both PSFs is given by the sum of the individual information from each PSF. Formally, let $\mathcal{I}_{\phi_k}(\theta)$ denote the information matrix of the measurement with phase modulation $\phi_k$. The joint Fisher Information matrix for the measurements $\hat{I}_{\phi_1}$, $\hat{I}_{\phi_2}$ is given by
$\mathcal{I}(\theta) = \mathcal{I}_{\phi_1}(\theta) + \mathcal{I}_{\phi_2}(\theta)$  (S20)
Let $\theta_k$ denote the $k$-th coordinate of the 3D position. Given $\mathcal{I}(\theta)$, the CRLB for estimating $\theta_k$ is defined as [73]
$\mathrm{CRLB}_k = \left[ \mathcal{I}(\theta)^{-1} \right]_{kk}$  (S21)
where $\mathcal{I}(\theta)^{-1}$ denotes the inverse of the Fisher Information matrix. Based on eq. S21, to derive the phase masks $\phi_1, \phi_2$ optimizing the CRLB for all three estimated parameters $(x, y, z)$, we minimize the following cost function
$\mathcal{C}(\phi_1, \phi_2) = \sum_{i=1}^{N_z} \sum_{k=1}^{3} \sqrt{ \left[ \left( \mathcal{I}_{\phi_1}(\theta_i) + \mathcal{I}_{\phi_2}(\theta_i) \right)^{-1} \right]_{kk} }$  (S22)
In our implementation, the cost is evaluated at on-axis positions $\theta_i = (0, 0, z_i)$, where $z_i$ is sampled every 250 nm throughout the desired axial range. We also simplify the per-pixel background term to a single scalar of 15 photons per pixel, and scale the PSFs to match realistic signal counts encountered in SMLM imaging, i.e. 2000 photons per emitter. Notably, in contrast to our previous work [9], we optimized the CRLB using a per-pixel approach rather than constraining the solution to a subspace of Zernike polynomials. This was particularly important to efficiently navigate the wide variety of possible solutions.
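To make eqs. S19-S21 concrete, the per-channel Poisson Fisher information and the joint CRLB can be evaluated numerically. The sketch below uses the signal of 2000 photons and background of 15 photons per pixel from the text, while the PSF model, the finite-difference steps, and the helper names are our own illustrative assumptions:

```python
import numpy as np

def poisson_fisher(psf_model, theta, bg=15.0, signal=2000.0, eps=1e-9):
    """Fisher information under Poisson noise (eq. S19 form) for one channel.
    `psf_model(theta)` returns a normalized PSF image; derivatives are taken
    by central finite differences with an illustrative 1 nm step."""
    step = 1e-9  # finite-difference step in metres (assumption)
    mu = signal * psf_model(theta) + bg  # expected counts per pixel
    grads = []
    for i in range(3):
        d = np.zeros(3)
        d[i] = step
        g = signal * (psf_model(theta + d) - psf_model(theta - d)) / (2 * step)
        grads.append(g.ravel())
    fi = np.empty((3, 3))
    for i in range(3):
        for j in range(3):
            fi[i, j] = np.sum(grads[i] * grads[j] / (mu.ravel() + eps))
    return fi

def crlb(fi_list):
    """Joint CRLB (eqs. S20-S21): sum the per-channel information matrices,
    invert, and read off the diagonal."""
    return np.diag(np.linalg.inv(np.sum(fi_list, axis=0)))
```

Summing `poisson_fisher` over the two phase masks and taking the square root of `crlb` gives the precision bound per coordinate.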
Finally, in this work we focused our attention on SMLM experimental conditions, under which read-noise effects were negligible for our purposes. However, the optimization is readily extended to the mixed Poisson-Gaussian case by revisiting eqs. S16, S17 and S19. Specifically, assume the measured PSF is given by
$\hat{I}_{\phi}(u) \sim \mathrm{Poisson}\left( I_{\phi}(u; \theta) + b(u) \right) + \mathcal{N}\left( c, \sigma_r^2 \right)$  (S23)
where $\mathcal{N}$ denotes the normal distribution, $c$ is a baseline, and $\sigma_r^2$ is the variance of the read noise.
We can approximate the Poisson noise by Gaussian noise using eq. S10
$\mathrm{Poisson}\left( I_{\phi}(u; \theta) + b(u) \right) \approx \mathcal{N}\left( I_{\phi}(u; \theta) + b(u),\; I_{\phi}(u; \theta) + b(u) \right)$  (S24)
Assuming both noise sources are independent we get
$\hat{I}_{\phi}(u) \sim \mathcal{N}\left( I_{\phi}(u; \theta) + b(u) + c,\; I_{\phi}(u; \theta) + b(u) + \sigma_r^2 \right)$  (S25)
The resulting log-likelihood function for the measurement in eq. S25 is given by
$\mathcal{L}_{\phi}(\theta) = -\frac{1}{2} \sum_{u=1}^{n} \left[ \log\left( 2\pi v_{\phi}(u) \right) + \frac{\left( \hat{I}_{\phi}(u) - \mu_{\phi}(u) \right)^2}{v_{\phi}(u)} \right], \quad \mu_{\phi}(u) = I_{\phi}(u; \theta) + b(u) + c, \quad v_{\phi}(u) = I_{\phi}(u; \theta) + b(u) + \sigma_r^2$  (S26)
where $n$ is the number of pixels in the image. Substituting the log-likelihood from eq. S26 in the definition from eq. S18 we get
$\mathcal{I}_{\phi}(\theta)_{ij} = \sum_{u=1}^{n} \left[ \frac{1}{v_{\phi}(u)} \frac{\partial \mu_{\phi}(u)}{\partial \theta_i} \frac{\partial \mu_{\phi}(u)}{\partial \theta_j} + \frac{1}{2 v_{\phi}(u)^2} \frac{\partial v_{\phi}(u)}{\partial \theta_i} \frac{\partial v_{\phi}(u)}{\partial \theta_j} \right]$  (S27)
A.4 Learning details
CNN architecture
In this work, we adapt the CNN architecture previously proposed in DeepSTORM3D [21] to process an image with 2 channels (Fig. S2). Our architecture is relatively light, with only 440K trainable parameters, comprised of 3 main modules:

Multi-scale context-aggregation module: we used dilated convolutions [105] to increase the receptive field of each layer while keeping a fixed number of 64 channels. The two concatenated snapshots are processed through 6 convolutional blocks with increasing dilation rates. The maximal dilation rate was set according to the lateral footprint of the PSFs, with one rate for the Tetrapod-EDOF pair and a different rate for the other two PSF pairs (see Fig. S2). We also include skip connections to improve gradient flow [106] (not shown in the figure).

Upsampling module: composed of two consecutive resize-convolutions [107] to increase the lateral resolution by a factor of 4. We used nearest-neighbor interpolation to resize the images. Assuming a CCD pixel size of 110 nm, the lateral pixel size of the upsampled features is 27.5 nm.

Prediction module: after super-resolving emitters in the lateral dimension, we further refine their axial position through 3 additional convolutional blocks with an increased number of channels. For a 4 μm range, we use 80 channels, i.e. a voxel size of 50 nm in $z$. The final prediction is given by a convolution followed by an element-wise HardTanh to limit the output range to $[0, W]$, where W is set empirically to 800 to account for class imbalance (occupied vs. vacant voxels).
All convolutional filters share the same spatial support. Each convolutional block is followed by a Batch Normalization layer and a LeakyReLU nonlinearity. Note that depth is exchanged with channels, as our architecture is composed solely of 2D convolutional layers; these dimensions are permuted in the recovered volume. To compile a list of localizations at test time, we threshold the voxel values and find local maxima in clustered components (details in section A.4.5). Lastly, to efficiently learn the phase masks with reduced computation, we modify the architecture in a similar fashion to that described in [21].
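For intuition, the receptive-field growth of the context-aggregation module can be computed directly. The 3 × 3 kernel and power-of-two dilation rates below are illustrative assumptions, as the exact rates per PSF pair are given in Fig. S2:

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of stride-1 dilated convolutions:
    each layer adds (kernel_size - 1) * dilation pixels."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf
```

For example, six 3 × 3 blocks with dilation rates 1, 2, 4, 8, 16, 32 reach a 127 × 127 pixel receptive field, large enough to cover a laterally extended PSF.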
Notably, in this work we used the same encoder to process both images. In our implementation, the image pair is first warped using a calibrated affine transform prior to CNN processing. However, in case of severe inter-channel misalignment this is expected to be suboptimal, and a "Y-net" structure with separate encoders should be considered. In particular, one of the encoders could potentially be swapped with a spatial transformer network [108] to alleviate the need for calibration.
Training set
To learn a localization CNN with predefined phase masks, we simulate a training set composed of 10K simulated image pairs and their corresponding labels, which are lists of emitter positions. 9K examples were used for training, with 1K examples held out for validation. Alternatively, to jointly learn the phase masks and the localization CNN parameters, the training set is composed solely of simulated emitter positions, as the respective image pairs change throughout training according to the current phase masks.
In our implementation, the training positions are randomly drawn within the 3D cube of possible locations, in order for the method to generalize to arbitrary imaged structures. The Boolean grid used as the label in training is given by projecting the continuous positions on the recovery grid (voxel size of 27.5 × 27.5 × 50 nm).
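The projection onto the Boolean recovery grid can be sketched as follows; the voxel size, grid origin, and handling of out-of-range points are illustrative choices:

```python
import numpy as np

def positions_to_boolean_grid(xyz, grid_shape, voxel_size, origin):
    """Project continuous 3D emitter positions (N x 3 array, in metres)
    onto a Boolean recovery grid."""
    grid = np.zeros(grid_shape, dtype=bool)
    idx = np.floor((xyz - origin) / voxel_size).astype(int)
    # Discard positions outside the recovery volume instead of clipping them.
    keep = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    grid[tuple(idx[keep].T)] = True
    return grid
```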
Given a set of 3D locations, the expected model images are simulated using the measurement model in eq. S9. To accurately model experimental data in our simulations, we image beads on the coverslip prior to the experiment, and retrieve the aberrated pupil functions using VIPR [72]. To make our simulations realistic, we diversify the training conditions to include experimental variability. Namely, we vary the emitter density, the signal-to-noise ratio, the amount of blur, and any additional expected experimental challenges (e.g. motion blur, laser fringes, etc.). For example, in telomere imaging we have observed a highly non-uniform per-pixel background, presumably resulting from nucleus autofluorescence. To model this effect, we approximate the per-pixel background in eq. S9 using a super-Gaussian
$b(r) = A \exp\left( - \left[ (r - \mu)^T \Sigma^{-1} (r - \mu) \right]^{P} \right)$  (S28)
where $r$ is the combined 2D coordinate in image space, $A, P$ are scaling parameters, $\mu$ is the 2D centroid, and $\Sigma$ is the covariance matrix. These parameters are augmented in training to make the model robust to their variations.
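A minimal sketch of such a background, assuming one plausible super-Gaussian parameterization (a Gaussian raised to a power P > 1 gives a flatter top and a sharper falloff); the parameter values below are illustrative, not the ones used in training:

```python
import numpy as np

def super_gaussian_bg(shape, amplitude, power, centroid, cov):
    """Non-uniform per-pixel background modeled as a 2D super-Gaussian."""
    ii, jj = np.meshgrid(np.arange(shape[0]), np.arange(shape[1]),
                         indexing="ij")
    r = np.stack([ii - centroid[0], jj - centroid[1]], axis=-1).astype(float)
    prec = np.linalg.inv(cov)
    # Quadratic form (r - mu)^T Sigma^{-1} (r - mu) per pixel.
    quad = np.einsum("...i,ij,...j->...", r, prec, r)
    return amplitude * np.exp(-quad ** power)
```

Randomizing `amplitude`, `power`, `centroid`, and `cov` per training example emulates the augmentation described above.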
Loss function
Let $V$ denote the GT Boolean volume, and $\hat{V}$ denote the network prediction. Our loss function for training the net is a combination of two terms
$\mathcal{L}(\hat{V}, V) = \mathcal{L}_1(\hat{V}, V) + \mathcal{L}_2(\hat{V}, V)$  (S29)
The first term is a 3D heatmap matching loss, given by
$\mathcal{L}_1(\hat{V}, V) = \left\| g \ast \hat{V} - g \ast (W \cdot V) \right\|_2^2$  (S30)
where $g$ is a 3D Gaussian kernel with a standard deviation of 1 voxel. This term measures the proximity of our prediction to the simulated ground truth by measuring the distance between their respective heatmaps.
The second term is a measure of overlap, given by
$\mathcal{L}_2(\hat{V}, V) = - \frac{\sum_{i} \hat{V}_i V_i}{W \sum_{i} V_i}$  (S31)
This term provides a soft approximation of the true positive rate in the prediction. Note that $\mathcal{L}_2$ does not take false positives into account; hence, if optimized alone, it would favor a fully occupied predicted volume. Here, however, this is not a feasible solution, as it is not favored by $\mathcal{L}_1$. In our implementation, we weight voxels containing emitters by a factor of W=800 in order to balance out the contributions of vacant and occupied voxels. Accordingly, the CNN output is constrained to the range $[0, W]$. This strategy makes optimization easier and prevents gradient clipping.
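A numpy sketch of the two loss terms, assuming a weighted-heatmap form for the matching term and a normalized inner product for the soft true-positive rate; the exact normalizations are our assumptions, and the blur helper is a plain-array stand-in for the framework convolution:

```python
import numpy as np

def gaussian_blur3d(vol, sigma=1.0, radius=3):
    """Separable 3D Gaussian blur with 'same' padding (pure-numpy sketch)."""
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    for axis in range(3):
        vol = np.apply_along_axis(
            lambda m: np.convolve(m, k, mode="same"), axis, vol)
    return vol

def loss(pred, gt, W=800.0, sigma=1.0):
    """Combined loss: heatmap matching plus a (negated) soft TP rate."""
    l1 = np.sum((gaussian_blur3d(pred, sigma) -
                 gaussian_blur3d(W * gt, sigma)) ** 2)
    l2 = -np.sum(pred * gt) / (W * gt.sum() + 1e-12)  # soft TP rate
    return l1 + l2
```

A perfect prediction (W at every occupied voxel, 0 elsewhere) drives the first term to 0 and the second to its minimum of -1.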
Optimization and hyperparameters
We used the Adam optimizer [98]. The batch size was 16 for learning a phase mask, and 4 for learning a recovery net (due to GPU memory). The learning rate was reduced by a factor of 10 when the loss plateaued for more than 5 epochs, and training was stopped if no improvement was observed for more than 7 epochs, or alternatively when a maximum of 50 epochs was reached. The initial weights of each convolutional layer were sampled from a uniform distribution with limits determined by the filter spatial dimensions and the number of input channels to the layer. Training and evaluation were run on a workstation equipped with 32 GB of memory, an Intel(R) Core(TM) 3.20 GHz CPU, and a NVidia GeForce Titan Xp GPU with 12 GB of video memory. Phase mask learning and recovery net training each took on the order of hours. Our code is implemented using the Pytorch framework [109], and will soon be made publicly available at https://github.com/EliasNehme/DeepNebulae.
Postprocessing
The fully convolutional architecture that we adopted in this work outputs a super-resolved 3D volume, where occupied voxels account for emitters. To compile a list of localizations, we first threshold this volume, keeping only voxels with a minimal confidence of 80 (the maximal output is 800). Afterwards, out of the remaining localizations, we discard those which are not local maxima in their 3D vicinity. The radius used for grouping and local-maxima finding was 100 nm. Lastly, the recovered continuous 3D position is given by applying the 3D Center of Gravity (CoG) estimator to the vicinity of the local maxima in the prediction volume. While it is possible to use more sophisticated post-processing steps, we chose this simple and efficient strategy to keep our method as fast as possible. In our implementation, we write these steps as a composition of pooling and convolution operations, making the calculations extremely efficient on GPU.
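The thresholding, local-maxima, and CoG steps can be sketched in numpy as follows; this is a plain-array version rather than the GPU pooling formulation, and the radius (in voxels) and voxel sizes (in nm) are illustrative:

```python
import numpy as np

def localize(volume, threshold=80.0, radius=1, voxel_size=(27.5, 27.5, 50.0)):
    """Threshold the recovered volume, keep 3D local maxima, and refine each
    with a centre-of-gravity (CoG) estimate over its neighbourhood."""
    v = np.where(volume >= threshold, volume.astype(float), 0.0)
    pad = np.pad(v, radius)
    # A voxel survives if it is nonzero and attains its neighbourhood maximum.
    win = np.lib.stride_tricks.sliding_window_view(pad, (2 * radius + 1,) * 3)
    is_max = (v > 0) & (v == win.max(axis=(-3, -2, -1)))
    out = []
    for idx in np.argwhere(is_max):
        patch = pad[idx[0]: idx[0] + 2 * radius + 1,
                    idx[1]: idx[1] + 2 * radius + 1,
                    idx[2]: idx[2] + 2 * radius + 1]
        offs = np.indices(patch.shape).reshape(3, -1).astype(float)
        # CoG offset of the neighbourhood, relative to the maximum voxel.
        cog = (offs * patch.ravel()).sum(axis=1) / patch.sum() - radius
        out.append((idx + cog) * np.array(voxel_size))
    return np.array(out)
```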
Notably, while grouping and local-maxima finding potentially limit the maximal density, overlaps in 2D normally translate to non-overlapping "blobs" in 3D. Hence, this is hardly a limitation under common imaging conditions, as localization algorithms struggle considerably before reaching this limit.
In the telomere tracking experiment, the per-frame localizations were linked using DBSCAN clustering [110] applied directly to the 3D positions. The maximum distance allowed between points was set empirically, and the minimal number of emitters per cluster was minPts=25. This resulted in filtering out 83 localizations of the 24530 recorded throughout the 500 frames, i.e. about 0.3%. All tracks started within the first 6 frames and were relatively clustered in 3D, with no bifurcations observed. For more complicated tracking scenarios, the reader is encouraged to link the CNN localizations using a more robust tracking software such as [111].
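For reference, DBSCAN itself is simple enough to sketch. This is an illustrative O(n²) re-implementation, not the code used in the paper:

```python
import numpy as np

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: density-connected clusters get labels 0, 1, ...;
    points that belong to no cluster are returned as noise (-1)."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(row <= eps) for row in dist]
    core = np.array([len(nb) >= min_pts for nb in neighbors])
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:  # grow the cluster from density-reachable core points
            j = stack.pop()
            for k in neighbors[j]:
                if labels[k] == -1:
                    labels[k] = cluster
                    if core[k]:
                        stack.append(k)
        cluster += 1
    return labels
```

Applied to the 3D positions with minPts = 25, sparse spurious localizations end up with label -1 and are filtered out.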
A.5 Experimental implementation
This section details the full experimental procedure for localizing emitters using snapshot measurements from the dual-view setup. An outline of a typical experiment is presented in Fig. S3. The following subsections detail each part of the experiment for completeness.
Dual channel calibration
The goal of this section is to describe the process of calibrating the proposed dual-camera system, such that simulated PSFs will match measured data and their positions will correspond between the two images. The practice of aligning an optical 4f Fourier processing system, calibrating an LC-SLM, and creating a simulated model for a single channel has been meticulously explained in many previous works (e.g. [112]).
The proposed system consists of two identical optical paths which generate 3D PSF images. The acquired images are encoded simultaneously by the localization network, and thus pose extra challenges in the calibration process, specifically with respect to their spatial alignment. In our work, post-processing corrections are not a viable option due to the density of PSFs, necessitating a good calibration of the 3D alignment. To this end, we created two calibration samples (sparse and dense) consisting of a water-covered glass coverslip (Fisher Scientific) with 40 nm fluorescent beads (FluoSpheres (580/605), ThermoFisher) adhered to the surface with 1% PVA. The dense sample was chosen such that the unmodulated PSFs cover the entire field of view (FOV), while each individual bead can still be fit using ThunderSTORM [113]. The localizations from each channel were used to estimate an affine transformation between the two cameras (Fig. S4). To prevent outliers from biasing the transformation, we implemented a random sample consensus (RANSAC) procedure.
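The RANSAC-robust affine estimation can be sketched as follows; this is a minimal illustrative version, and the sample size, iteration count, and inlier tolerance are assumptions:

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine transform (A, t) mapping src -> dst."""
    X = np.hstack([src, np.ones((len(src), 1))])
    sol, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return sol[:2].T, sol[2]

def ransac_affine(src, dst, n_iter=200, tol=0.05, rng=None):
    """RANSAC wrapper: fit on random minimal sets of 3 matched localizations,
    keep the model with the most inliers, then refit on all inliers."""
    rng = np.random.default_rng() if rng is None else rng
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(src), size=3, replace=False)
        A, t = fit_affine(src[idx], dst[idx])
        err = np.linalg.norm(src @ A.T + t - dst, axis=1)
        inliers = err < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return fit_affine(src[best_inliers], dst[best_inliers]), best_inliers
```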
Next, the sparse sample is chosen such that each slice of the 3D PSFs (for both channels) can be imaged without any overlaps from neighboring emitters. An axial scan is performed to ensure that both channels measure corresponding PSFs at the same focal plane positions, to account for any minor axial misalignment between the two cameras. The point of reference (lateral) was chosen as the center of gravity of the maximum projection in one of the channels.
The reference point of the second channel was calculated using the aforementioned affine transformation. Next, we used VIPR [72] to generate a phase mask for each channel, as it provides a good model and accounts for wobble and near-field effects by implementing the vectorial diffraction model. Importantly, while the affine transformation is calculated using localizations and not images, ultimately the input to the localization network is an image pair. Since a global affine transformation is not a shift-invariant operator, a fully convolutional model will struggle to learn it efficiently. Therefore, at test time, we warp the image of one camera to align with its counterpart, and feed the aligned, concatenated image pair to the network. The warping operation is implemented using cubic-spline interpolation.
To test the importance of image alignment, we trained three different models: (1) with perfectly aligned positions, (2) with randomly misaligned positions (achieved by sampling perturbations of the estimated transformation), and (3) with misaligned positions accompanied by a known transformation between channels (accurate up to 50 nm) that is used to warp the images. Three conclusions can be drawn from the results (Fig. S5): (1) the model is unable to efficiently cope with a random global transform, (2) calibrating the affine transform up to 50 nm errors and warping the images prior to localization improves performance, and (3) perfect alignment of the Tetrapod and the EDOF PSFs does not improve the axial localization precision. The latter is expected, because the axial information is decoded solely from the Tetrapod channel; it is therefore insensitive to the alignment with the EDOF PSF, which does not encode $z$.
Optical components
The imaging system in Fig. 1 consists of a Nikon Eclipse Ti inverted fluorescence microscope with a 100X/1.49 NA Nikon objective (CFI SR HP Apo TIRF 100XC). A polarizing beam splitter was placed after the first achromatic doublet lens (f=15 cm) to split the emission path. Both paths consisted of three additional achromatic doublet lenses to image the back focal plane onto an LC-SLM (Pluto-VIS-020, Holoeye in the first path, and a 1920X1152 liquid crystal on silicon, Meadowlark in the second). After a last image-forming lens, the modulated images were recorded by two sCMOS cameras (Prime 95B, Photometrics). For full synchronization, the first camera triggered the second camera (in a leader-follower configuration), which in turn triggered the 561 nm illumination laser (iChrome MLE, Toptica).
Biological sample preparation
For cell experiments, U2OS cells were prepared as described previously in [21]. In brief, cells were grown in standard conditions (37 °C, 5% CO2) in Dulbecco's Modified Eagle Media (DMEM; without phenol red for the live-cell experiment) with 1 g/L D-glucose (low glucose), supplemented with 10% fetal bovine serum, and 1% penicillin-streptomycin and glutamine. To fluorescently label the telomeres, cells were transfected with a plasmid encoding the fluorescently tagged telomeric repeat binding factor 1 (DsRed-hTRF1) using Lipofectamine 3000 (Thermo Fisher Scientific). After 20-24 hours, cells were either fixed with 4% paraformaldehyde for 20 min, washed three times with PBS, and mounted to a slide (170 μm thick) with mounting medium; or imaged live in a temperature-, humidity-, and gas-mixture-controlled imaging chamber mounted to the microscope (Okolab) on a glass-bottom culture dish (15 mm, 180 μm thick).
Ground truth estimation
In fixed-cell experiments, the experimental ground truth 3D positions were approximated via axial scanning with the unmodulated PSF (Fig. S6). The scan consisted of 100 nm steps over a range of 4-5 μm. The resulting z-stack was then processed in the following manner: first, detection and lateral position estimation were performed with ThunderSTORM [113]. Next, the in-focus position of each emitter was estimated by fitting a second-order polynomial to the mean intensity across focal slices. The mean intensity was calculated as the mean number of counts in the central pixels of each detected PSF. The emitter axial position was obtained by correcting the detected in-focus position with a factor of 0.8, accounting for refractive index mismatch. Since VIPR [72] accounts for the 3D wobble of the modulated PSFs, the final required correction was a global lateral shift between the in-focus PSF and the chosen modulated-PSF center in the phase retrieval step.
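The in-focus estimation step reduces to finding the vertex of a fitted parabola; a minimal sketch, in which the toy intensity profile in the test is illustrative:

```python
import numpy as np

def infocus_position(z_slices, mean_intensity, correction=0.8):
    """Fit a 2nd-order polynomial to the mean intensity across focal slices
    and return the corrected axial position of its vertex (the 0.8 factor
    accounts for the refractive index mismatch, as in the text)."""
    a, b, _ = np.polyfit(z_slices, mean_intensity, 2)
    z_focus = -b / (2.0 * a)  # vertex of the fitted parabola
    return correction * z_focus
```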
A.6 Additional simulation results
In this section we present further numerical simulation results which support the conclusions from the main text and the choice of the PSF pair. The first result, presented in Figs. S7 and S8, shows a numerical comparison between single-channel and dual-channel setups in terms of their detection (measured by the Jaccard index) and their average precision (measured by the lateral/axial RMSE). We compare the Tetrapod-EDOF pair (blue) to the commonly used biplane method (cyan) [5, 4] and to two single-channel approaches with double signal, namely the Tetrapod PSF (red) and the single-channel end-to-end optimized phase mask (orange) adopted from DeepSTORM3D [21]. The numerical results show that the Tetrapod-EDOF pair is the best in detection. In terms of the lateral RMSE at high densities, the biplane approach is better, as the in-focus PSF is more photon-efficient than the EDOF. The axial RMSE result shows that the proposed pair is surpassed only by the end-to-end encoding of a single channel. This is likely because the axial position is mainly encoded in the Tetrapod path, and is thus limited by the axial localization performance of the single-channel Tetrapod at high densities. These results reinforce the decision to explore other solutions which mutually encode all parameters in both channels, and are optimal for detection and localization.
The second result, in Fig. S9, is a comparison between the three PSF pairs proposed in this manuscript. Both the detection and the average precision support our claim that the Nebulae PSFs (green) are better than the CRLB (black) and Tetrapod-EDOF (blue) pairs. A similar conclusion was drawn from the experimental results in fixed cells, which ultimately supports our decision to use the Nebulae PSFs for live-cell tracking. Comparing all the tested metrics, we can see that at every density the Nebulae PSFs are clearly the best choice of the three.
A.7 Additional experimental results
This section presents additional experimental results on fixed-cell data. Figure S10 explores the false negatives presented in Fig. 3. All of the experimentally undetected points had a very low signal. While the EDOF performs well in 2D, it is not as signal-efficient as the in-focus unmodulated PSF. Thus, emitters which are only slightly above the noise limit (without a phase mask) can be detected in the axial scan but are invisible to the EDOF and Tetrapod PSFs. This was improved in the subsequent PSF pairs, which complement each other more efficiently.
To validate our conclusions from simulation regarding the Nebulae PSFs being the optimal pair, we have shown in Fig. S11 that the Nebulae PSFs outperform the Tetrapod-EDOF pair. For completeness, we also show in Fig. S11 the results including the CRLB pair for the same cell. As predicted in simulations, the CRLB pair performs slightly worse than the Nebulae PSFs but better than the Tetrapod-EDOF pair. To verify reproducibility, we present in Fig. S12 similar experimental results for a bigger cell, which exhibits a staggering 142 emitters. The reconstruction results are improved for all PSF pairs, as this cell exhibits fewer overlaps; still, they are consistent with the previous conclusions on PSF-pair performance.
References
 E. Betzig, G. H. Patterson, R. Sougrat, O. W. Lindwasser, S. Olenych, J. S. Bonifacino, M. W. Davidson, J. LippincottSchwartz, and H. F. Hess, “Imaging intracellular fluorescent proteins at nanometer resolution,” Science, vol. 313, no. 5793, pp. 1642–1645, 2006.
 S. T. Hess, T. P. Girirajan, and M. D. Mason, “Ultrahigh resolution imaging by fluorescence photoactivation localization microscopy,” Biophysical journal, vol. 91, no. 11, pp. 4258–4272, 2006.
 M. J. Rust, M. Bates, and X. Zhuang, “Subdiffractionlimit imaging by stochastic optical reconstruction microscopy (storm),” Nature methods, vol. 3, no. 10, pp. 793–796, 2006.
 S. Ram, P. Prabhat, J. Chao, E. S. Ward, and R. J. Ober, “High accuracy 3d quantum dot tracking with multifocal plane microscopy for the study of fast intracellular dynamics in live cells,” Biophysical journal, vol. 95, no. 12, pp. 6025–6043, 2008.
 M. F. Juette, T. J. Gould, M. D. Lessard, M. J. Mlodzianoski, B. S. Nagpure, B. T. Bennett, S. T. Hess, and J. Bewersdorf, “Threedimensional sub–100 nm resolution fluorescence microscopy of thick samples,” Nature methods, vol. 5, no. 6, pp. 527–529, 2008.
 B. Louis, R. Camacho, R. BresolíObach, S. Abakumov, J. Vandaele, T. Kudo, H. Masuhara, I. G. Scheblykin, J. Hofkens, and S. Rocha, “Fasttracking of single emitters in large volumes with nanometer precision,” Optics Express, vol. 28, no. 19, pp. 28656–28671, 2020.
 B. Huang, W. Wang, M. Bates, and X. Zhuang, “Threedimensional superresolution imaging by stochastic optical reconstruction microscopy,” Science, vol. 319, no. 5864, pp. 810–813, 2008.
 S. R. P. Pavani, M. A. Thompson, J. S. Biteen, S. J. Lord, N. Liu, R. J. Twieg, R. Piestun, and W. Moerner, “Threedimensional, singlemolecule fluorescence imaging beyond the diffraction limit by using a doublehelix point spread function,” Proceedings of the National Academy of Sciences, vol. 106, no. 9, pp. 2995–2999, 2009.
 Y. Shechtman, S. J. Sahl, A. S. Backer, and W. Moerner, “Optimal point spread function design for 3d imaging,” Physical review letters, vol. 113, no. 13, p. 133902, 2014.
 A. S. Backer and W. Moerner, “Extending singlemolecule microscopy using optical fourier processing,” The Journal of Physical Chemistry B, vol. 118, no. 28, pp. 8313–8329, 2014.
 Y. Shechtman, L. E. Weiss, A. S. Backer, S. J. Sahl, and W. Moerner, “Precise threedimensional scanfree multipleparticle tracking over large axial ranges with tetrapod point spread functions,” Nano letters, vol. 15, no. 6, pp. 4194–4199, 2015.
 A. Aristov, B. Lelandais, E. Rensen, and C. Zimmer, “Zola3d allows flexible 3d localization microscopy over an adjustable axial range,” Nature communications, vol. 9, no. 1, p. 2409, 2018.
 D. Sage, T.A. Pham, H. Babcock, T. Lukes, T. Pengo, J. Chao, R. Velmurugan, A. Herbert, A. Agrawal, S. Colabrese, et al., “Superresolution fight club: assessment of 2d and 3d singlemolecule localization microscopy software,” Nature methods, vol. 16, no. 5, p. 387, 2019.
 L. Möckl, A. R. Roy, and W. Moerner, “Deep learning in singlemolecule microscopy: fundamentals, caveats, and recent developments,” Biomedical Optics Express, vol. 11, no. 3, pp. 1633–1661, 2020.
 W. Ouyang, A. Aristov, M. Lelek, X. Hao, and C. Zimmer, “Deep learning massively accelerates superresolution localization microscopy,” Nature biotechnology, 2018.
 S. K. Gaire, Y. Zhang, H. Li, R. Yu, H. F. Zhang, and L. Ying, “Accelerating multicolor spectroscopic singlemolecule localization microscopy using deep learning,” Biomedical Optics Express, vol. 11, no. 5, pp. 2705–2721, 2020.
 E. Nehme, L. E. Weiss, T. Michaeli, and Y. Shechtman, “Deepstorm: superresolution singlemolecule microscopy by deep learning,” Optica, vol. 5, no. 4, pp. 458–464, 2018.
 N. Boyd, E. Jonas, H. P. Babcock, and B. Recht, “Deeploco: Fast 3d localization microscopy using neural networks,” BioRxiv, p. 267096, 2018.
 J. M. Newby, A. M. Schaefer, P. T. Lee, M. G. Forest, and S. K. Lai, “Convolutional neural networks automate detection for tracking of submicronscale particles in 2d and 3d,” Proceedings of the National Academy of Sciences, vol. 115, no. 36, pp. 9026–9031, 2018.
 B. Diederich, P. Then, A. Jügler, R. Förster, and R. Heintzmann, “cellstorm—costeffective superresolution on a cellphone using dstorm,” PloS one, vol. 14, no. 1, p. e0209827, 2019.
 E. Nehme, D. Freedman, R. Gordon, B. Ferdman, L. E. Weiss, O. Alalouf, T. Naor, R. Orange, T. Michaeli, and Y. Shechtman, “Deepstorm3d: dense 3d localization microscopy and psf design by deep learning,” Nature Methods, vol. 17, no. 7, pp. 734–740, 2020.
 A. Speiser, S. C. Turaga, and J. H. Macke, “Teaching deep neural networks to localize sources in superresolution microscopy by combining simulationbased learning and unsupervised learning,” arXiv preprint arXiv:1907.00770, 2019.
 R. Barth, K. Bystricky, and H. Shaban, “Coupling chromatin structure and dynamics by live superresolution imaging,” bioRxiv, p. 777482, 2019.
 G. Barbastathis, A. Ozcan, and G. Situ, “On the use of deep learning for computational imaging,” Optica, vol. 6, no. 8, pp. 921–943, 2019.
 G. Ongie, A. Jalal, C. A. Metzler, R. G. Baraniuk, A. G. Dimakis, and R. Willett, "Deep learning techniques for inverse problems in imaging," IEEE Journal on Selected Areas in Information Theory, 2020.
 T. Falk, D. Mai, R. Bensch, Ö. Çiçek, A. Abdulkadir, Y. Marrakchi, A. Böhm, J. Deubner, Z. Jäckel, K. Seiwald, et al., “Unet: deep learning for cell counting, detection, and morphometry,” Nature methods, vol. 16, no. 1, pp. 67–70, 2019.
 Y. Rivenson, Z. Göröcs, H. Günaydin, Y. Zhang, H. Wang, and A. Ozcan, “Deep learning microscopy,” Optica, vol. 4, no. 11, pp. 1437–1443, 2017.
 M. Weigert, U. Schmidt, T. Boothe, A. Müller, A. Dibrov, A. Jain, B. Wilhelm, D. Schmidt, C. Broaddus, S. Culley, et al., “Contentaware image restoration: pushing the limits of fluorescence microscopy,” Nature methods, vol. 15, no. 12, p. 1090, 2018.
 A. Krull, T.O. Buchholz, and F. Jug, “Noise2voidlearning denoising from single noisy images,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2129–2137, 2019.
 S. Lim, H. Park, S.E. Lee, S. Chang, B. Sim, and J. C. Ye, “Cyclegan with a blur kernel for deconvolution microscopy: Optimal transport geometry,” IEEE Transactions on Computational Imaging, vol. 6, pp. 1127–1138, 2020.
 R. Horstmeyer, R. Y. Chen, B. Kappes, and B. Judkewitz, “Convolutional neural networks that teach microscopes how to image,” arXiv preprint arXiv:1709.07223, 2017.
 A. Muthumbi, A. Chaware, K. Kim, K. C. Zhou, P. C. Konda, R. Chen, B. Judkewitz, A. Erdmann, B. Kappes, and R. Horstmeyer, “Learned sensing: jointly optimized microscope hardware for accurate image classification,” Biomedical Optics Express, vol. 10, no. 12, pp. 6351–6369, 2019.
 C. Ounkomol, S. Seshamani, M. M. Maleckar, F. Collman, and G. R. Johnson, “Labelfree prediction of threedimensional fluorescence images from transmittedlight microscopy,” Nature methods, vol. 15, no. 11, pp. 917–920, 2018.
 Y. Rivenson, Y. Zhang, H. Günaydın, D. Teng, and A. Ozcan, “Phase recovery and holographic image reconstruction using deep learning in neural networks,” Light: Science & Applications, vol. 7, no. 2, p. 17141, 2018.
 T. Nguyen, Y. Xue, Y. Li, L. Tian, and G. Nehmetallah, “Deep learning approach for fourier ptychography microscopy,” Optics express, vol. 26, no. 20, pp. 26470–26484, 2018.
 Y. Xue, S. Cheng, Y. Li, and L. Tian, “Reliable deeplearningbased phase imaging with uncertainty quantification,” Optica, vol. 6, no. 5, pp. 618–629, 2019.
 Z. Wu, Y. Sun, A. Matlock, J. Liu, L. Tian, and U. S. Kamilov, “Simba: scalable inversion in optical tomography using deep denoising priors,” IEEE Journal of Selected Topics in Signal Processing, 2020.
 J. T. Smith, R. Yao, N. Sinsuebphon, A. Rudkouskaya, J. Mazurkiewicz, M. Barroso, P. Yan, and X. Intes, “Ultrafast fitfree analysis of complex fluorescence lifetime imaging via deep learning,” bioRxiv, p. 523928, 2019.
 P. Zelger, K. Kaser, B. Rossboth, L. Velas, G. Schütz, and A. Jesacher, “Threedimensional localization microscopy using deep learning,” Optics express, vol. 26, no. 25, pp. 33166–33179, 2018.
 P. Zhang, S. Liu, A. Chaurasia, D. Ma, M. J. Mlodzianoski, E. Culurciello, and F. Huang, “Analyzing complex singlemolecule emission patterns with deep learning,” Nature methods, vol. 15, no. 11, p. 913, 2018.
 E. Hershko, L. E. Weiss, T. Michaeli, and Y. Shechtman, “Multicolor localization microscopy and pointspreadfunction engineering by deep learning,” Optics express, vol. 27, no. 5, pp. 6158–6183, 2019.
 G. DardikmanYoffe and Y. C. Eldar, “Learned sparcom: Unfolded deep superresolution microscopy,” arXiv preprint arXiv:2004.09270, 2020.
 L. Möckl, P. N. Petrov, and W. Moerner, “Accurate phase retrieval of complex 3d point spread functions with deep residual neural networks,” Applied Physics Letters, vol. 115, no. 25, p. 251106, 2019.
 L. Möckl, A. R. Roy, P. N. Petrov, and W. Moerner, “Accurate and rapid background estimation in singlemolecule localization microscopy using the deep neural network bgnet,” Proceedings of the National Academy of Sciences, vol. 117, no. 1, pp. 60–67, 2020.
 D. Saha, U. Schmidt, Q. Zhang, A. Barbotin, Q. Hu, N. Ji, M. J. Booth, M. Weigert, and E. W. Myers, “Practical sensorless aberration estimation for 3d microscopy with deep learning,” Optics Express, vol. 28, no. 20, pp. 29044–29053, 2020.
 A. Shajkofci and M. Liebling, “Spatially-variant CNN-based point spread function estimation for blind deconvolution and depth estimation in optical microscopy,” IEEE Transactions on Image Processing, vol. 29, pp. 5848–5861, 2020.
 H. Gupta, M. T. McCann, L. Donati, and M. Unser, “CryoGAN: A new reconstruction paradigm for single-particle cryo-EM via deep adversarial learning,” bioRxiv, 2020.
 C. Belthangady and L. A. Royer, “Applications, promises, and pitfalls of deep learning for fluorescence image reconstruction,” Nature methods, pp. 1–11, 2019.
 A. Chakrabarti, “Learning sensor multiplexing design through backpropagation,” in Advances in Neural Information Processing Systems, pp. 3081–3089, 2016.
 E. Schwartz, R. Giryes, and A. M. Bronstein, “DeepISP: Toward learning an end-to-end image processing pipeline,” IEEE Transactions on Image Processing, vol. 28, no. 2, pp. 912–923, 2018.
 A. Turpin, I. Vishniakou, and J. D. Seelig, “Light scattering control in transmission and reflection with neural networks,” Optics express, vol. 26, no. 23, pp. 30911–30929, 2018.
 S. Elmalem, R. Giryes, and E. Marom, “Learned phase coded aperture for the benefit of depth of field extension,” Optics express, vol. 26, no. 12, pp. 15316–15331, 2018.
 V. Sitzmann, S. Diamond, Y. Peng, X. Dun, S. Boyd, W. Heidrich, F. Heide, and G. Wetzstein, “End-to-end optimization of optics and image processing for achromatic extended depth of field and super-resolution imaging,” ACM Transactions on Graphics (TOG), vol. 37, no. 4, p. 114, 2018.
 U. Akpinar, E. Sahin, and A. Gotchev, “Learning wavefront coding for extended depth of field imaging,” arXiv preprint arXiv:1912.13423, 2019.
 Y. Wu, V. Boominathan, H. Chen, A. Sankaranarayanan, and A. Veeraraghavan, “PhaseCam3D—learning phase masks for passive single view depth estimation,” in 2019 IEEE International Conference on Computational Photography (ICCP), pp. 1–12, IEEE, 2019.
 J. Chang and G. Wetzstein, “Deep optics for monocular depth estimation and 3d object detection,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 10193–10202, 2019.
 C. A. Metzler, H. Ikoma, Y. Peng, and G. Wetzstein, “Deep optics for single-shot high-dynamic-range imaging,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1375–1385, 2020.
 Q. Sun, E. Tseng, Q. Fu, W. Heidrich, and F. Heide, “Learning rank-1 diffractive optics for single-shot high dynamic range imaging,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1386–1396, 2020.
 X. Dun, H. Ikoma, G. Wetzstein, Z. Wang, X. Cheng, and Y. Peng, “Learned rotationally symmetric diffractive achromat for full-spectrum computational imaging,” Optica, vol. 7, no. 8, pp. 913–922, 2020.
 S.-H. Baek, H. Ikoma, D. S. Jeon, Y. Li, W. Heidrich, G. Wetzstein, and M. H. Kim, “End-to-end hyperspectral-depth imaging with learned diffractive optics,” arXiv preprint arXiv:2009.00463, 2020.
 M. Kellman, E. Bostan, M. Chen, and L. Waller, “Data-driven design for Fourier ptychographic microscopy,” in 2019 IEEE International Conference on Computational Photography (ICCP), pp. 1–8, IEEE, 2019.
 H. Pinkard, H. Baghdassarian, A. Mujal, E. Roberts, K. H. Hu, D. H. Friedman, I. Malenica, T. Shagam, A. Fries, K. Corbin, et al., “Learned adaptive multiphoton illumination microscopy,” bioRxiv, 2020.
 M. D. Lew, S. F. Lee, M. Badieirostami, and W. Moerner, “Corkscrew point spread function for far-field three-dimensional nanoscale localization of point-like objects,” Optics letters, vol. 36, no. 2, pp. 202–204, 2011.
 M. P. Backlund, M. D. Lew, A. S. Backer, S. J. Sahl, G. Grover, A. Agrawal, R. Piestun, and W. Moerner, “Simultaneous, accurate measurement of the 3d position and orientation of single molecules,” Proceedings of the National Academy of Sciences, vol. 109, no. 47, pp. 19087–19092, 2012.
 C. Roider, A. Jesacher, S. Bernet, and M. Ritsch-Marte, “Axial superlocalisation using rotating point spread functions shaped by polarisation-dependent phase modulation,” Optics express, vol. 22, no. 4, pp. 4029–4037, 2014.
 J. Min, S. J. Holden, L. Carlini, M. Unser, S. Manley, and J. C. Ye, “3d high-density localization microscopy using hybrid astigmatic/biplane imaging and sparse image reconstruction,” Biomedical optics express, vol. 5, no. 11, pp. 3935–3948, 2014.
 C. Cabriel, N. Bourg, P. Jouchet, G. Dupuis, C. Leterrier, A. Baron, M.-A. Badet-Denisot, B. Vauzeilles, E. Fort, and S. Leveque-Fort, “Combining 3d single molecule localization strategies for reproducible bioimaging,” Nature communications, vol. 10, no. 1, pp. 1–10, 2019.
 N. Bourg, C. Mayet, G. Dupuis, T. Barroca, P. Bon, S. Lécart, E. Fort, and S. LévêqueFort, “Direct optical nanoscopy with axially localized detection,” Nature Photonics, vol. 9, no. 9, pp. 587–593, 2015.
 J. W. Goodman, Introduction to Fourier optics. Roberts and Company Publishers, 2005.
 E. BenEliezer, E. Marom, N. Konforti, and Z. Zalevsky, “Experimental realization of an imaging system with an extended depth of field,” Applied Optics, vol. 44, no. 14, pp. 2792–2798, 2005.
 E. R. Dowski and W. T. Cathey, “Extended depth of field through wavefront coding,” Applied optics, vol. 34, no. 11, pp. 1859–1866, 1995.
 B. Ferdman, E. Nehme, L. E. Weiss, R. Orange, O. Alalouf, and Y. Shechtman, “VIPR: Vectorial implementation of phase retrieval for fast and accurate microscopic pixel-wise pupil estimation,” Optics Express, vol. 28, no. 7, pp. 10179–10198, 2020.
 S. M. Kay, Fundamentals of statistical signal processing. Prentice Hall PTR, 1993.
 R. J. Ober, S. Ram, and E. S. Ward, “Localization accuracy in single-molecule microscopy,” Biophysical journal, vol. 86, no. 2, pp. 1185–1200, 2004.
 L. E. Weiss, T. Naor, and Y. Shechtman, “Observing DNA in live cells,” Biochemical Society Transactions, vol. 46, no. 3, pp. 729–740, 2018.
 I. Bronshtein, E. Kepten, I. Kanter, S. Berezin, M. Lindner, A. B. Redwood, S. Mai, S. Gonzalo, R. Foisner, Y. Shav-Tal, et al., “Loss of lamin a function increases chromatin dynamics in the nuclear interior,” Nature communications, vol. 6, p. 8044, 2015.
 L. E. Weiss, Y. S. Ezra, S. Goldberg, B. Ferdman, O. Adir, A. Schroeder, O. Alalouf, and Y. Shechtman, “Three-dimensional localization microscopy in live flowing cells,” Nature Nanotechnology, pp. 1–7, 2020.
 Y. Shechtman, L. E. Weiss, A. S. Backer, M. Y. Lee, and W. Moerner, “Multicolour localization microscopy by point-spread-function engineering,” Nature photonics, vol. 10, no. 9, p. 590, 2016.
 Y. Y. Schechner, R. Piestun, and J. Shamir, “Wave propagation with rotating intensity distributions,” Physical Review E, vol. 54, no. 1, p. R50, 1996.
 A. Levin, R. Fergus, F. Durand, and W. T. Freeman, “Image and depth from a conventional camera with a coded aperture,” ACM transactions on graphics (TOG), vol. 26, no. 3, pp. 70–es, 2007.
 F. Balzarotti, Y. Eilers, K. C. Gwosch, A. H. Gynnå, V. Westphal, F. D. Stefani, J. Elf, and S. W. Hell, “Nanometer resolution imaging and tracking of fluorescent molecules with minimal photon fluxes,” Science, vol. 355, no. 6325, pp. 606–612, 2017.
 K. C. Gwosch, J. K. Pape, F. Balzarotti, P. Hoess, J. Ellenberg, J. Ries, and S. W. Hell, “MINFLUX nanoscopy delivers 3d multicolor nanometer resolution in cells,” Nature methods, vol. 17, no. 2, pp. 217–224, 2020.
 M. J. Amin, S. Petry, J. W. Shaevitz, and H. Yang, “Localization precision in chromatic multifocal imaging,” arXiv preprint arXiv:2008.10488, 2020.
 H. Ikoma, Y. Peng, M. Broxton, and G. Wetzstein, “Snapshot multi-PSF 3d single-molecule localization microscopy using deep learning,” in Computational Optical Sensing and Imaging, pp. CW3B–3, Optical Society of America, 2020.
 W. Ouyang, F. Mueller, M. Hjelmare, E. Lundberg, and C. Zimmer, “ImJoy: an open-source computational platform for the deep learning era,” Nature methods, vol. 16, no. 12, pp. 1199–1200, 2019.
 E. Gómez-de-Mariscal, C. García-López-de-Haro, L. Donati, M. Unser, A. Muñoz-Barrutia, and D. Sage, “DeepImageJ: A user-friendly plugin to run deep learning models in ImageJ,” bioRxiv, p. 799270, 2019.
 L. Von Chamier, J. Jukkala, C. Spahn, M. Lerche, S. Hernández-Pérez, P. Mattila, E. Karinou, S. Holden, A. C. Solak, A. Krull, et al., “ZeroCostDL4Mic: an open platform to simplify access and use of deep-learning in microscopy,” bioRxiv, 2020.
 C. Zhou, S. Lin, and S. Nayar, “Coded aperture pairs for depth from defocus,” in 2009 IEEE 12th international conference on computer vision, pp. 325–332, IEEE, 2009.
 C. Zhou and S. Nayar, “What are good apertures for defocus deblurring?,” in 2009 IEEE international conference on computational photography (ICCP), pp. 1–8, IEEE, 2009.
 C. Zhou, S. Lin, and S. K. Nayar, “Coded aperture pairs for depth from defocus and defocus deblurring,” International journal of computer vision, vol. 93, no. 1, pp. 53–72, 2011.
 A. Levin, “Analyzing depth from coded aperture sets,” in European Conference on Computer Vision, pp. 214–227, Springer, 2010.
 Y. Takeda, S. Hiura, and K. Sato, “Fusing depth from defocus and stereo with coded apertures,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 209–216, 2013.
 Y. Gil, S. Elmalem, H. Haim, E. Marom, and R. Giryes, “MonSter: Awakening the mono in stereo,” arXiv preprint arXiv:1910.13708, 2019.
 S. Hell, G. Reiner, C. Cremer, and E. H. Stelzer, “Aberrations in confocal fluorescence microscopy induced by mismatches in refractive index,” Journal of microscopy, vol. 169, no. 3, pp. 391–405, 1993.
 D. Axelrod, “Fluorescence excitation and imaging of single molecules near dielectric-coated and bare surfaces: a theoretical study,” Journal of microscopy, vol. 247, no. 2, pp. 147–160, 2012.
 D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” arXiv preprint arXiv:1312.6114, 2013.
 P. Dufour, M. Piché, Y. De Koninck, and N. McCarthy, “Two-photon excitation fluorescence microscopy with a high depth of field using an axicon,” Applied optics, vol. 45, no. 36, pp. 9246–9252, 2006.
 D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
 L. Xiao and T. Zhang, “A proximal stochastic gradient method with progressive variance reduction,” SIAM Journal on Optimization, vol. 24, no. 4, pp. 2057–2075, 2014.
 T. Dozat, “Incorporating Nesterov momentum into Adam,” 2016.
 L. Liu, H. Jiang, P. He, W. Chen, X. Liu, J. Gao, and J. Han, “On the variance of the adaptive learning rate and beyond,” arXiv preprint arXiv:1908.03265, 2019.
 S. Gopal, “Adaptive sampling for SGD by exploiting side information,” in International Conference on Machine Learning, pp. 364–372, 2016.
 S. Moser, M. RitschMarte, and G. Thalhammer, “Modelbased compensation of pixel crosstalk in liquid crystal spatial light modulators,” Optics express, vol. 27, no. 18, pp. 25046–25063, 2019.
 A. LeNail, “NN-SVG: Publication-ready neural network architecture schematics,” Journal of Open Source Software, vol. 4, no. 33, p. 747, 2019.
 F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” arXiv preprint arXiv:1511.07122, 2015.
 K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778, 2016.
 A. Odena, V. Dumoulin, and C. Olah, “Deconvolution and checkerboard artifacts,” Distill, vol. 1, no. 10, p. e3, 2016.
 M. Jaderberg, K. Simonyan, A. Zisserman, et al., “Spatial transformer networks,” in Advances in neural information processing systems, pp. 2017–2025, 2015.
 A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in pytorch,” 2017.
 M. Ester, H.-P. Kriegel, J. Sander, X. Xu, et al., “A density-based algorithm for discovering clusters in large spatial databases with noise,” in KDD, vol. 96, pp. 226–231, 1996.
 J.-Y. Tinevez, N. Perry, J. Schindelin, G. M. Hoopes, G. D. Reynolds, E. Laplantine, S. Y. Bednarek, S. L. Shorte, and K. W. Eliceiri, “TrackMate: An open and extensible platform for single-particle tracking,” Methods, vol. 115, pp. 80–90, 2017.
 M. Siemons, C. Hulleman, R. Thorsen, C. Smith, and S. Stallinga, “High precision wavefront control in point spread function engineering for single emitter localization,” Optics express, vol. 26, no. 7, pp. 8397–8416, 2018.
 M. Ovesný, P. Křížek, J. Borkovec, Z. Švindrych, and G. M. Hagen, “ThunderSTORM: a comprehensive ImageJ plugin for PALM and STORM data analysis and super-resolution imaging,” Bioinformatics, vol. 30, no. 16, pp. 2389–2390, 2014.