Learning an optimal PSF-pair for ultra-dense 3D localization microscopy



A long-standing challenge in multiple-particle-tracking is the accurate and precise 3D localization of individual particles at close proximity. One established approach for snapshot 3D imaging is point-spread-function (PSF) engineering, in which the PSF is modified to encode the axial information. However, engineered PSFs are challenging to localize at high densities due to lateral PSF overlaps. Here we suggest using multiple PSFs simultaneously to help overcome this challenge, and investigate the problem of engineering multiple PSFs for dense 3D localization. We implement our approach using a bifurcated optical system that modifies two separate PSFs, and design the PSFs using three different approaches including end-to-end learning. We demonstrate our approach experimentally by volumetric imaging of fluorescently labelled telomeres in cells.

1 Introduction

In a conventional imaging system, the spatial resolution is bounded by Abbe’s diffraction limit. In a high-numerical-aperture microscope, this corresponds to approximately half the optical wavelength, i.e. ~200 nm for visible light. For cell-imaging applications, this obscures subcellular features of interest with dimensions on the nanoscale. Since 2006, Single-Molecule Localization Microscopy (SMLM) super-resolution techniques have revolutionized biological-structure imaging by circumventing the diffraction limit, namely, by using many low-density images of different sets of fluorescent emitters to generate a high-resolution reconstruction [1, 2, 3].

While biological structures are intrinsically 3D, attaining axial (z) information at super-resolution is not trivial. This is due to the standard Point Spread Function (PSF) of the microscope being approximately symmetric about the focal plane, and having only a thin axial range before the signal becomes very diffuse. Several approaches have been developed to capture 3D data in microscopy. For example, one can acquire multiple 2D datasets at different focal planes [4, 5, 6], or determine the axial positions of emitters from the images themselves. The latter can be enabled by PSF engineering, where the PSF is modified to encode the desired 3D information. This is typically done by inducing an intentional aberration in the imaging path, e.g. with a cylindrical lens [7] or with a phase mask at the Fourier plane of the microscope using an extended optical system [8, 9, 10]. Notably, while providing scan-free axial information, this approach poses a limitation on the maximum emitter densities suitable for imaging, due to the increased lateral size of the PSFs, and requires more complex image-analysis algorithms than 2D localization microscopy.

When imaging samples that are even just several microns thick, engineered PSFs spread the signal photons over a large lateral footprint relative to the in-focus PSF [11]. This poses a difficult localization challenge when the experimental objective of obtaining a super-resolution reconstruction necessitates that many molecules be localized in a densely labelled structure. Currently available software packages struggle to achieve good performance in this regime [12, 13]; however, recent work has shown that deep neural networks are well suited to the problem [14], enabling high-quality reconstructions from low emitter densities [15, 16], and increased-density processing [17, 18, 19, 20, 21, 22, 23].

Figure 1: The multi-PSF optical system. (a) A standard inverted microscope with laser illumination. (b-c) Two image planes, split by their polarization, employing two LC-SLMs placed in conjugate back focal planes to the objective lens. Each optical path can be modulated with a different phase mask (𝓜₁ and 𝓜₂). (d) A comparison between the standard PSF (left) and a 4 μm Tetrapod PSF (right).

Deep Learning (DL) has excelled in a variety of challenging computational-imaging problems in computer vision, computational photography, medical imaging, and microscopy [24, 25]. Within the realm of computational microscopy, DL has been deployed for tasks such as cell segmentation [26], image restoration [27, 28, 29, 30], sample classification [31, 32], artificial labelling [33], phase imaging [34, 35, 36], optical tomography [37], lifetime imaging [38], single-molecule localization [17, 18, 15, 19, 39, 40, 41, 20, 21, 22, 23, 16, 42], aberration correction [43, 44, 45, 46], CryoEM [47], and more [48].

An exciting recent application enabled by deep learning is the end-to-end design of “computational cameras.” Powered by differentiable imaging models and back-propagation, end-to-end learning jointly optimizes the sensing system alongside the data-processing algorithm, thus enabling both components to work harmoniously. This approach has quickly expanded within the computational-imaging community for numerous applications in computer vision and computational photography, for example, color sensing and demosaicing [49, 50], illumination-design through scattering media [51], extended-depth-of-field imaging [52, 53, 54], monocular depth estimation [53, 52, 55, 56], high-dynamic-range imaging [57, 58], and hyper-spectral imaging [59, 60]. In computational microscopy, end-to-end learning has been utilized by our group and others to enhance various computational modalities such as sample classification [31, 32], single-molecule color sensing and 3D localization [41, 21], quantitative phase imaging [61] and multi-photon microscopy [62].

Here, to address the challenge of high-density 3D localization from a snapshot, we suggest the simultaneous use of multiple PSFs, and present methods to design and implement the optimal phase masks. Specifically, we introduce a bifurcated optical system that modifies two separate PSFs with a pair of phase masks using Liquid-Crystal Spatial Light Modulators (LC-SLMs). First, we demonstrate that splitting precious signal photons into two channels has an advantage over a single-PSF system even in moderately dense emitter conditions. For this task we utilize a PSF-pair that splits the 3D information into complementary channels, namely, for lateral and axial localization. To localize the emitters from the obtained pair of images we employed a convolutional neural network (CNN) architecture based on DeepSTORM3D [21]. Next, we revisit the problem of optimizing the information content of a single emitter in a pair of PSF measurements [9]. Lastly, we implement end-to-end learning to jointly design our localization algorithm and the PSF-pair. The resulting PSFs, which we call the Nebulae PSFs, achieve unprecedented performance in localizing volumes of dense emitters in 3D. We quantify and directly compare the performance of each approach by simulation and experimentally with volumetric imaging of fluorescently labelled telomeres in fixed cells. Finally, we demonstrate continuous, scan-free, live-cell tracking of 60 telomeres in a single cell’s nucleus simultaneously, with 30 nm 3D precision and 100 ms temporal resolution, over an axial range of 5 μm.

2 Optical setup

Dual-camera systems have been utilized in the past in microscopy for localizing single emitters in 3D [63, 64, 65, 66]. Most recently, a dual-view scheme was employed in DAISY [67] to combine astigmatism-based PSF engineering with Super-critical Angle Fluorescence (SAF) [68] to provide a semi-isotropic 3D resolution over a 1 μm axial range. However, while these works proposed creative designs to combine the information in both channels, their objective was to enable precise and experimentally-robust axial localization of single emitters. In addition, the proposed PSFs were hand-crafted based on desired properties and not fully optimized. Here we use a bifurcated optical system with two detection paths for the task of precise 3D localization of multiple emitters in ultra-dense samples.

The optical system used to implement the monocular PSF-pair is presented in Fig. 1. Briefly, our system is composed of an epifluorescence microscope extended with two identical detection paths. The fluorescent light emitted from the particle in the sample is split using a polarizing beam splitter into two 4f optical-processing systems, each equipped with an LC-SLM placed in the Fourier plane. The LC-SLM is used to implement a phase modulation modifying the emission pattern to encode the 3D position onto the 2D captured measurements, which are then decoded jointly via further image processing. For a list of the specific components used in our implementation see supplementary section A.5.2.

We model our system using the scalar diffraction approximation where the emitters are modeled as isotropic point sources [69]. Thus, the PSFs of our system can be efficiently computed by a Fast Fourier Transform (FFT). A full description of our imaging model is provided in supplementary section A.1.
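As a concrete sketch of this computation, the snippet below builds a pupil function from a phase mask and a depth-dependent defocus term, and obtains the PSF via an FFT. All numerical values (wavelength, NA, refractive index, grid size) are placeholder assumptions rather than our exact experimental parameters; the defocus term follows the water-phase component of the model in supplementary section A.1.

```python
import numpy as np

def simulate_psf(phase_mask, z, wavelength=0.58, na=1.45, n_water=1.33):
    """Scalar-diffraction PSF of an isotropic point source at depth z (um).

    `phase_mask` is an (N, N) array of phase delays in radians; all
    default parameter values are illustrative assumptions.
    """
    n = phase_mask.shape[0]
    # Normalized pupil coordinates: rho = 1 at the NA cutoff.
    u = np.linspace(-1, 1, n)
    xx, yy = np.meshgrid(u, u)
    rho = np.sqrt(xx**2 + yy**2)

    # Effective aperture: collection from water limits rho for high NA.
    aperture = (rho <= n_water / na).astype(float)

    # Phase accumulated in water for an emitter at depth z (defocus term).
    kz = (2 * np.pi / wavelength) * np.sqrt(
        np.maximum(n_water**2 - (na * rho)**2, 0.0))

    pupil = aperture * np.exp(1j * (phase_mask + kz * z))
    # PSF = squared magnitude of the Fourier transform of the pupil field.
    field = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(pupil)))
    psf = np.abs(field)**2
    return psf / psf.sum()
```

With a zero mask, this returns the approximate standard PSF; an engineered mask simply replaces the zero array.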

Equipped with the system above, the question is which pair of PSFs is best suited for the task of dense 3D localization. In the next sections we answer this question in stages.

3 Disentangling lateral/axial information

For simplicity, we first consider the problem of designing an additional PSF while keeping the first PSF fixed to the 4 μm Tetrapod [9], which was optimized for the sparse case in a single channel. Given that Tetrapod PSFs encode depth at the cost of a large lateral footprint, we would like the complementary PSF to be compact and to help disentangle the approximate lateral positions in overlapping regions. Then, aided by this additional measurement, the overlapping Tetrapods can be decoded to recover the 3D positions. In other words, we are, broadly, separating the problem into an “axial localization” channel, encoded by the Tetrapod PSF, and a “lateral localization” channel, to be encoded by a different PSF.

For encoding lateral information we propose the use of an Extended-Depth-of-Field (EDOF) PSF, namely, a PSF that maintains its lateral shape over extended axial ranges. However, unlike traditional EDOF designs [70, 71], the desired PSF needs to be laterally-compact and signal-efficient, because it should work for very dense samples. These requirements motivated us to design a novel EDOF suited for the task.

3.1 EDOF PSF design

To design the desired EDOF PSF, we formulate the problem as a phase-retrieval task. Specifically, given a desired axial range (e.g. 4 μm), we first generate a synthetic z-stack comprising the approximate in-focus Airy-disk PSF, replicated at 200 nm steps. Afterwards, we use stochastic-gradient-descent iterations with importance sampling [72] to recover the phase mask associated with this PSF. Let D be the diffraction limit for the assumed optical setup. Then our cost function for this task is given by

𝓛 = (1/N_z) Σ_{i=1}^{N_z} ‖ w ⊙ ( PSF_{z_i} − PSF_target ) ‖₂² ,

where PSF_{z_i} is the on-axis PSF at depth z_i, PSF_target is the synthetic in-focus spot, N_z is the number of axial slices, and w is a weighting term that up-weights pixels outside the diffraction-limited spot of diameter D, added to quickly “squeeze” the signal photons into the diffraction-limited spot (see supplementary section A.2 for its exact form).
The resulting phase mask and PSF are presented in Fig. 2. This simple approach leads to a powerful EDOF, with very high signal efficiency and a small lateral footprint (Fig. 2b) compared to previous designs [70, 71] (see supplementary section A.2 for comparisons and implementation details). While we designed and implemented this EDOF to complement the Tetrapod information in emitter-dense regions, its potential applications extend far beyond our localization task.
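A minimal PyTorch sketch of this phase-retrieval loop is given below. For brevity it uses a plain (unweighted) L2 loss over the z-stack rather than the weighted cost and importance sampling described above, and all optical parameters are illustrative.

```python
import torch

def design_edof_mask(n=64, n_planes=9, z_range=4.0, wavelength=0.58,
                     na=1.45, n_water=1.33, iters=100, lr=0.1):
    """Phase-retrieval sketch for a compact EDOF mask.

    Optimizes a pixel-wise phase mask so that the defocused PSFs across
    the axial range all resemble the in-focus spot (plain L2 loss; the
    weighting term and importance sampling of the paper are omitted).
    """
    u = torch.linspace(-1, 1, n)
    yy, xx = torch.meshgrid(u, u, indexing="ij")
    rho = torch.sqrt(xx**2 + yy**2)
    aperture = (rho <= n_water / na).float()
    kz = (2 * torch.pi / wavelength) * torch.sqrt(
        torch.clamp(n_water**2 - (na * rho)**2, min=0.0))

    def psf(mask, z):
        pupil = aperture * torch.exp(1j * (mask + kz * z))
        field = torch.fft.fftshift(torch.fft.fft2(torch.fft.ifftshift(pupil)))
        p = field.abs()**2
        return p / p.sum()

    target = psf(torch.zeros(n, n), 0.0)          # in-focus Airy-like spot
    zs = torch.linspace(-z_range / 2, z_range / 2, n_planes)
    mask = (0.01 * torch.randn(n, n)).requires_grad_()  # small random init
    opt = torch.optim.Adam([mask], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        # Sum of per-plane L2 discrepancies to the in-focus target.
        loss = sum(((psf(mask, z) - target)**2).sum() for z in zs)
        loss.backward()
        opt.step()
    return mask.detach()
```

Because the forward model is a single FFT per plane, each iteration is cheap, consistent with the fast convergence reported above.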

Notably, recent end-to-end designs of EDOF PSFs have achieved quite compelling results [52, 53, 54]. In particular, the phase mask presented in [54] resembles the result of our approach. However, these data-driven approaches are ultimately dataset-dependent and take hours of training to design for a new range, whereas our approach is independent of the dataset and converges in less than 2 minutes on a GPU.

Figure 2: A small-footprint EDOF mask. (a) The evolution of the EDOF phase mask optimization over 400 iterations. (b) Comparison between the standard PSF (top) and the final EDOF PSF (bottom). (c) The XZ cross-sections of the standard (left) and EDOF (right) PSFs, respectively. The colorscale is normalized to the maximum intensity of the in-focus, unmodulated PSF.

3.2 Dual-view vs Single-view

In typical LC-SLM PSF-engineering systems, half of the signal photons are discarded, since the LC-SLM can only modulate polarized light. Therefore, in our system the second PSF measurement comes at no additional photon cost, the only caveat being the need for an additional detection path in the two-view setup. It should be noted that 4f systems that utilize a Diffractive Optical Element (DOE) instead of an LC-SLM do not suffer from this photon loss; yet, this comes at the cost of versatility. Now that we have designed a novel EDOF PSF for our task, we can test whether splitting the signal between two cameras is in fact beneficial compared to a DOE-based system.

Since neural networks are already established to be incredibly efficient for dense localization [17, 21], we modify our previously published fully convolutional architecture [21] to receive an image with two channels comprised of the two measurements. For training details and network architecture see supplementary section A.4. Our results in simulation (see supplementary Fig. S8) confirm that for the task of dense 3D localization, a split-signal, dual-view system is superior to a single measurement with a DOE, even when that measurement is sensed using an optimal end-to-end-learned design [21].

3.3 Tetrapod-EDOF experimental validation

Next, we validate our approach in cells. For this task, we imaged fluorescently labeled telomeres in fixed human osteosarcoma (U2OS) cells (for fixation and labeling see supplementary section A.5.2). We first chose fixed cells to enable the acquisition of a ground-truth approximation via axial scanning. The imaged cell line was hypertriploid, meaning that it has an unusually large number of telomeres (70-130), which facilitates testing our method in a dense environment. The experiment consisted of two parts: first, each cell was scanned in the axial direction using a piezo stage (100 nm steps), and the 3D ground-truth positions were approximated via fitting (see supplementary section A.5.1). Afterwards, we recorded three snapshot images: one with the Tetrapod PSF utilizing 100% of the signal (accomplished using a longer exposure time), and two more with the signal split 50%/50% between the Tetrapod PSF and the EDOF PSF (Fig. 3). In agreement with simulations, these results demonstrate that at a density of 0.27 emitters/μm², the Tetrapod-EDOF pair is superior in localizing overlapping telomeres, as measured by the Jaccard index [13, 21].

While the complementary PSF-pair is effective, this way of decoupling the 3D positional information into dedicated “lateral” and “axial” channels is unlikely to be the optimal solution. For example, beyond a certain density, the axial information in the Tetrapod PSF will be occluded completely by overlapping PSFs. Having a second measurement that is solely dedicated to encoding the lateral information (the EDOF PSF) will then not be beneficial for decoding z. This motivates us to revisit the task of designing a PSF-pair for dense 3D localization. For simplicity, we start with the single-emitter case, viewed from an estimation-theory perspective.

Figure 3: Snapshot, dense-emitter, 3D localizations in fluorescently labelled cells. (a) A single frame recorded with a single-channel, 4 μm Tetrapod PSF (left) and the split-channel, dual-PSF approach (right). (b-c) Localizations are plotted with the ground truth measured by axially scanning the sample with the unmodulated PSF.

4 Optimal PSF-pair design

4.1 Single-emitter case

Optimal PSFs for two-channel localization of only a single emitter can be derived by minimizing the Cramér-Rao Lower Bound (CRLB) [73, 74, 9]. Considering the system in Fig. 1, we can jointly optimize the sensitivity of a PSF-pair with respect to a change in the 3D position of a single emitter. The CRLB then defines the lower bound on the precision of any unbiased estimator of the 3D position from a noisy PSF-pair. Unlike the original Tetrapod optimization [9], here we employed a pixel-wise approach to explore aberrations not spanned by low-order Zernike polynomials. For a full derivation of the CRLB and the optimization objective see supplementary section A.3.
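The CRLB computation underlying this optimization can be sketched numerically as follows. Here `psf_fn` is a hypothetical user-supplied forward model, the photon numbers are illustrative, and for a PSF-pair the per-channel Fisher information matrices are simply summed before inversion.

```python
import numpy as np

def crlb_3d(psf_fn, pos, n_signal=15000.0, bg=500.0, delta=1e-3):
    """Numerical 3D CRLB for a Poisson imaging model (illustrative sketch).

    `psf_fn(x, y, z)` is a hypothetical forward model returning a
    normalized PSF image; the expected image is mu = N*PSF + b. The
    Fisher information for Poisson noise is
        I_ij = sum_k (dmu_k/dtheta_i)(dmu_k/dtheta_j) / mu_k,
    and for a PSF-pair the per-channel Fisher matrices simply add.
    """
    def mu(theta):
        return n_signal * psf_fn(*theta) + bg

    grads = []
    for i in range(3):  # central differences w.r.t. x, y, z
        tp, tm = np.array(pos, float), np.array(pos, float)
        tp[i] += delta
        tm[i] -= delta
        grads.append((mu(tp) - mu(tm)) / (2 * delta))

    m = mu(np.array(pos, float))
    fisher = np.array([[np.sum(gi * gj / m) for gj in grads] for gi in grads])
    # CRLB: square root of the diagonal of the inverse Fisher matrix.
    return np.sqrt(np.diag(np.linalg.inv(fisher)))
```

Minimizing, e.g., the mean of these bounds over a set of depths with respect to the mask pixels recovers a pixel-wise CRLB design loop.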

The CRLB-optimized PSF-pair is shown in Fig. 4. Notably, the CRLB of the PSF-pair is similar to that of a 4 μm Tetrapod PSF with twice the signal. Therefore, as can be expected, splitting the information does not improve precision in the single-emitter case, suggesting that a two-channel system is not justified for sparse localization.

The resulting PSF-pair combines the concept of bi-plane imaging and PSF engineering in an elegant way to encode the 3D position in two measurements. Simulation results show that this PSF-pair outperforms the Tetrapod-EDOF pair described earlier (see supplementary sections A.6 and A.7); however, previous work demonstrates that end-to-end designs using deep neural networks can lead to superior performance [21], and this is the path we describe next.

Figure 4: CRLB-optimized PSF-pair. (a) Two phase masks were generated by CRLB optimization, namely by optimizing the precision of estimating the 3D position of a single emitter from a pair of images. Interestingly, each channel encodes a complementary part of the axial range. These PSFs have a smaller lateral footprint than the 4 μm Tetrapod of similar z-range. The colorbar is normalized to the in-focus unmodulated PSF of the system. (b) The estimated CRLB lateral (upper) and axial (lower) precision as a function of emitter depth, for each PSF separately (red and orange), after combining both channels (green), and for the single-channel Tetrapod PSF (blue).
Figure 5: End-to-end learning of the dual-channel optical system. (a) Simulated 3D positions of emitters are fed into two physical layers, which differ only in the applied phase mask (𝓜₁ or 𝓜₂), to simulate the acquired image pair with the modulated PSFs, I₁ and I₂. Next, both images are fed through a convolutional neural network to recover the 3D positions in the simulation. Afterwards, these reconstructed positions are compared to the ground truth with our loss function 𝓛, and the gradients are back-propagated through the layers (red lines) to jointly optimize the encoding masks 𝓜₁ and 𝓜₂, and the localization-CNN parameters θ. (b) Nebulae PSFs, which are the result of the end-to-end learning for a 4 μm axial range. The colorbar is normalized to the in-focus unmodulated PSF of the system.

4.2 End-to-end learning of a monocular PSF-pair

As shown previously [21], end-to-end designs lead to efficient PSF patterns that are highly suited for dense 3D imaging. Here, we extend the DeepSTORM3D approach to tackle the problem of designing a PSF-pair. This is achieved by designing the encoding stage to incorporate two disjoint and differentiable physical-simulation layers (Fig. 5a). Each layer is parameterized by its own phase mask (𝓜₁ or 𝓜₂) dictating the respective PSF (see supplementary section A.1 for the imaging model). During training, we randomly simulate 3D positions and feed them to the two physical layers. Each physical layer encodes the 3D positions into its simulated sensor image (I₁ or I₂). These images are concatenated and fed to the localization CNN (parameterized by θ), which decodes them in order to recover the underlying 3D positions. The difference between the simulated and the recovered positions is quantified by our loss function 𝓛 and back-propagated to jointly optimize the phase masks 𝓜₁ and 𝓜₂, and the localization-CNN parameters θ, end-to-end. This process is repeated, usually for ~30 epochs, until convergence. For training details see supplementary section A.4.
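A toy PyTorch sketch of this joint optimization is shown below. The physical layer is heavily simplified (no noise model, no emitter rendering, a random stand-in target), and the decoder is a two-layer CNN rather than our full architecture; the point is only to illustrate how gradients flow into both phase masks and the decoder parameters.

```python
import torch
import torch.nn as nn

class PhysicalLayer(nn.Module):
    """Differentiable image formation with a learnable phase mask.

    Heavily simplified: the layer renders one PSF slice per axial plane;
    emitter placement, pixelation, and noise are omitted.
    """
    def __init__(self, n=32, n_water=1.33, na=1.45, wavelength=0.58):
        super().__init__()
        self.mask = nn.Parameter(torch.zeros(n, n))   # learnable phase mask
        u = torch.linspace(-1, 1, n)
        yy, xx = torch.meshgrid(u, u, indexing="ij")
        rho = torch.sqrt(xx**2 + yy**2)
        self.register_buffer("aperture", (rho <= n_water / na).float())
        self.register_buffer("kz", (2 * torch.pi / wavelength) * torch.sqrt(
            torch.clamp(n_water**2 - (na * rho)**2, min=0.0)))

    def forward(self, z_planes):
        imgs = []
        for z in z_planes:
            pupil = self.aperture * torch.exp(1j * (self.mask + self.kz * z))
            imgs.append(torch.fft.fft2(pupil).abs()**2)
        return torch.stack(imgs)                      # (len(z), n, n)

# Two physical layers (one mask each) and a small decoder CNN,
# optimized jointly; the target volume here is a random stand-in.
phys1, phys2 = PhysicalLayer(), PhysicalLayer()
decoder = nn.Sequential(nn.Conv2d(2, 8, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(8, 1, 3, padding=1))
params = (list(phys1.parameters()) + list(phys2.parameters())
          + list(decoder.parameters()))
opt = torch.optim.Adam(params, lr=1e-2)

z_planes = torch.linspace(-1.0, 1.0, 4)
target = torch.rand(4, 1, 32, 32)                     # stand-in ground truth
for _ in range(3):                                    # a few illustrative steps
    opt.zero_grad()
    x = torch.stack([phys1(z_planes), phys2(z_planes)], dim=1)  # 2 channels
    loss = nn.functional.mse_loss(decoder(x), target)
    loss.backward()
    opt.step()
```

A single optimizer over both mask parameters and CNN weights is what makes the design "end-to-end": the backward pass couples the optics to the decoder.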

The end-to-end learned phase masks and their respective PSFs, hereafter referred to as the Nebulae PSFs, are presented in Fig. 5. Two distinctive features stand out in this pair compared to the previous approaches described earlier. First, both channels encode 3D information in their individual intensity patterns, as well as in the relative position of the intensity centroids throughout the entire axial range, a trait previously recognized as useful for 3D localization [63]. Second, in phase space, the learned masks are approximately rotated versions of one another, even though our optimization was performed pixel-wise and our loss function did not include any constraints on the mutual information of the two measurements.

To evaluate the performance of the Nebulae PSFs, we first compare them in simulation to the Tetrapod-EDOF pair (section 3), as well as to a single-channel Tetrapod PSF with twice the signal (Fig. 6). The results indicate that the Nebulae PSFs achieve unprecedented performance in localizing dense 3D emitters over a large axial range of 4 μm, assuming our experimental telomere-imaging conditions, i.e. ~15K signal photons per emitter and ~500 background photons per pixel.

Figure 6: Performance as a function of density. (a) Performance comparison of a single-channel Tetrapod (red), the Tetrapod-EDOF pair (blue), and the Nebulae PSFs (green). The Nebulae PSFs perform best both in detectability (Jaccard index) and in precision (lateral/axial RMSE). Emitters were simulated with 15K signal photons per emitter and 500 background photons per pixel. Matching of points was computed with a threshold distance of 100 nm using the Hungarian algorithm. Each data point is an average of n = 100 simulated images. The average standard deviation was 5% in Jaccard index and 3 nm in precision. (b) Example of a simulated frame at a density of 0.5 emitters/μm², alongside a 3D comparison of the recovered (red) and ground-truth (blue) positions.
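The detection metric used in this comparison, the Jaccard index after Hungarian matching with a 100 nm threshold, can be sketched as follows (a minimal implementation; units and array layout are assumptions):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def jaccard_index(pred, gt, tol=0.1):
    """Jaccard detection metric with Hungarian matching (sketch).

    `pred` and `gt` are (N, 3) arrays of 3D positions (units assumed um);
    a prediction counts as a true positive if its matched ground-truth
    emitter lies within `tol` (here 0.1 um, i.e. 100 nm).
    """
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    if len(pred) == 0 or len(gt) == 0:
        return 0.0
    # Pairwise Euclidean distances between predictions and ground truth.
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(d)     # minimum-cost matching
    tp = int(np.sum(d[rows, cols] <= tol))    # matches within tolerance
    fp, fn = len(pred) - tp, len(gt) - tp
    return tp / (tp + fp + fn)
```

Per-frame scores computed this way can then be averaged over the simulated frames, as in the figure.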

4.3 Nebulae PSFs experimental validation

Next, we applied the Nebulae PSFs in fixed cells and compared their performance to the Tetrapod-EDOF pair experimentally (Fig. 7). Similar to section 3.3, we first found the emitter positions by axial scanning, for comparison to our snapshot images taken at a single focal plane: once with the Tetrapod-EDOF pair, and once with the Nebulae PSFs. The results show that at a density of 0.34 emitters/μm², the Nebulae PSFs are superior in localizing overlapping telomeres, as measured by the Jaccard index. The Nebulae PSFs were also found to have superior performance relative to the CRLB-optimized pair from section 4.1. For a head-to-head comparison in simulations as well as experiments see supplementary sections A.6 and A.7.

Figure 7: Experimental measurement of fixed U2OS cells with fluorescently labelled telomeres. (a) Example images showing the two proposed mask pairs: the Tetrapod + EDOF (left) and the end-to-end learned pair (right). (b) The single-frame 3D localizations with the ground truth (obtained via axial scanning) for the Tetrapod + EDOF and the learned pair, respectively.

5 Live telomere tracking

Throughout this work we have imaged and localized 3D positions of telomeres in fixed cells to facilitate quantitative comparisons of the proposed solutions. However, more pertinent is the application of our method to multiple-particle-tracking in live cells, where axial scanning is inapplicable due to the motion of the objects. Here, our simultaneous multi-channel snapshot approach enables capturing the behavior of diffusing telomeres in living cells at an unprecedented combination of density, speed, and axial range [75].

Quantifying telomere dynamics in live cells is of paramount importance for answering fundamental questions under normal and disease conditions [75, 76], as tracking the 3D diffusion of telomeres unveils information on the chromatin environment and on DNA folding regulation. One challenge in observing chromatin in living cells is the intrinsic biological heterogeneity between diffusing telomeres [77]. Therefore, to fully characterize chromatin dynamics it is desired to capture all single telomere trajectories, including in emitter-dense regions.

Figure 8 demonstrates the full applicability of the Nebulae PSFs for volumetric tracking of 61 diffusing telomeres, spanning an axial range of 4.7 μm in the nucleus of a living U2OS cell. The trained localization CNN is able to reliably track all of the labelled telomeres over the course of 500 frames (50 s), even those in close proximity and with a low signal-to-noise ratio. As evident in the resulting tracks (Fig. 8), the telomeres exhibit variable diffusion profiles (Fig. 8e), necessitating the individual processing facilitated by our approach.

Figure 8: Dense-particle tracking of labelled telomeres in live cancer cells with the Nebulae PSFs. (a) A single time point showing the two PSF-modulated images. (b)-(c) 3D spatiotemporal trajectories of two telomeres exhibiting drastically different diffusion behaviors, in different regions of the nucleus. (d) 3D-rendered cell with all the accumulated tracks, showing the motion of the telomeres in 3D. Most telomeres were localized in all frames. (e) The ensemble MSD of all the estimated tracks obscures the dynamics of individual particles, such as tracks (b) and (c), which exhibit very different diffusion dynamics.
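The per-track and ensemble MSD curves in panel (e) can be computed as below; this is a generic time-averaged MSD sketch, not our exact analysis pipeline.

```python
import numpy as np

def msd(track):
    """Time-averaged MSD of a single 3D trajectory; track has shape (T, 3)."""
    track = np.asarray(track, float)
    out = np.zeros(len(track) - 1)
    for lag in range(1, len(track)):
        disp = track[lag:] - track[:-lag]          # displacements at this lag
        out[lag - 1] = np.mean(np.sum(disp**2, axis=1))
    return out

def ensemble_msd(tracks):
    """Average of per-track MSDs; note this can mask heterogeneous dynamics."""
    curves = [msd(t) for t in tracks]
    n = min(len(c) for c in curves)                # truncate to shortest track
    return np.mean([c[:n] for c in curves], axis=0)
```

Comparing individual `msd` curves against `ensemble_msd` makes the heterogeneity between tracks (b) and (c) visible, which the ensemble average alone hides.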

6 Discussion

In computational imaging, the co-design of optics and image-processing algorithms has been introduced in various applications spanning the fields of computational photography and computational microscopy. In the realm of localization microscopy, this is the key concept in PSF engineering [7, 8, 9], and has been utilized to extend the imaging capabilities in SMLM [12, 78]. Until recently, however, the standard approach was to design the optical system to optimize a specific trait of the PSF that would facilitate its processing afterwards, e.g. an axial-displacement-induced rotation in the Double-Helix PSF [79, 8]. In addition to conceived physical properties, information-content-driven optimization was also used in PSF design; for example, in [80], where the PSF was optimized for depth discrimination. Similarly, for SMLM applications [9] the PSF has been optimized to minimize the variance of an unbiased estimator for localizing the 3D position of a point source. While the latter two identified theoretically optimal solutions to encode the information, in complex environments the decoding step often limits performance as well.

Recently, powered by deep learning and differentiable physical models, end-to-end designs of physical elements and data-processing algorithms have been demonstrated by our group and others to facilitate efficient imaging modalities in microscopy [31, 32, 61, 62]. Specifically in SMLM, the efficiency of jointly designing PSFs and deep networks was demonstrated for multi-color 2D imaging [41] and snapshot dense 3D imaging [21].

In this work, we addressed the challenging task of multi-PSF engineering for dense 3D imaging. Specifically, we proposed three different PSF-pairs, each derived with a different set of considerations. For the first pair, we introduced an efficient and laterally-compact EDOF PSF to complement the Tetrapod PSF at high emitter densities. Notably, this EDOF PSF has numerous applications in its own right for imaging in thick samples with little need for deconvolution [71]. In the second pair, we extended the CRLB-design metric to optimize the sensitivity of a PSF-pair in the single-emitter case. Lastly, we presented the Nebulae PSFs, learned end-to-end to achieve reliable dense 3D localization from snapshot measurements. We validated each of the proposed designs numerically and experimentally. To demonstrate the applicability for dense 3D tracking in live cells, we tracked regions of dense telomeres using the Nebulae PSFs, enabling a statistical analysis of population heterogeneity, and high-resolution 3D modelling of chromatin dynamics in single cells.

In contrast to standard CNN filters, a notable aspect of end-to-end learning with physical layers is our ability to visualize and interpret the designed physical elements. For example, for the Nebulae PSFs, the signal photons are compacted into a single lobe in each channel. This feature is understandably advantageous in the dense fields of emitters with limited SNR used in our simulations and experimental conditions. Moreover, the intensity patterns at each axial position combine elementary depth-encoding aberrations, such as astigmatism, rotation, and relative inter-channel single-lobe movement. What separates these PSFs from predetermined designs is the simultaneous deployment of multiple depth-encoding strategies, making full use of the decoding CNN's capacity and thereby optimizing dense 3D localization from noisy measurements.

Notably, our approach is not limited to particle tracking. By tweaking the physical-simulation layers, this method can be readily adapted to any point-source-sensing paradigm, including DAISY [67], MINFLUX [81, 82], multi-plane microscopy [5, 4, 83, 6], and more. In a concurrent work [84], similar ideas were pursued for multiplane PSF engineering, demonstrating promising results in simulations. For these, and for SMLM applications generally, modifying the CNN architecture, initialization, training sets, and loss functions may further improve the performance, raising the question of how globally optimal the solution derived in our framework is. At this point, it is unclear how each optimization component affects the learning process, a question that will be addressed in future work. In particular, we anticipate that the emerging suite of tools developed to make deep learning more accessible to the community will assist in answering these critical questions [85, 86, 87].

To the best of our knowledge, this work reports the first end-to-end learning of multiple PSFs with experimental feasibility. Such multi-PSF designs may prove useful outside the realm of computational microscopy. For example, in computational photography, the design of coded aperture pairs and their optimal combination with stereo imaging has been a long-standing question [88, 89, 90, 91, 92, 93]. Most recently, Gil et al. [93] proposed to exploit identical phase-mask pairs for improved depth estimation and online stereo calibration. We believe this work paves the way for asymmetric strategies in the field of computational photography, with applications in stereo imaging and multi-shot monocular depth estimation. Depending on the specific task at hand, the optimal PSF-pair may vary; however, we believe that the approaches to PSF-pair optimization in this work will provide a useful initialization for the general problem.


Funding

The Israel Science Foundation (grant 852/17), the Technion Ollendorff Minerva Center, the Zuckerman Foundation. H2020 European Research Council Horizon 2020 (802567), Google Faculty Research Award for Machine Perception, The Israel Science Foundation (grant 450/18).


Acknowledgments

We thank Rotem Mulayoff for insights and fruitful discussions with respect to the EDOF design. We also thank Romain F. Laine for his help with conceiving the name Nebulae. We gratefully acknowledge the support of the NVIDIA Corporation with the donation of the Titan Xp and the Titan V GPUs used for this research. We thank Google for the cloud units provided to accelerate this research.


Disclosures

The authors declare no conflicts of interest.

Appendix A

A.1 Imaging model

In this section we briefly review the imaging model used throughout this work. Our system is composed of fluorescent emitters with an emission wavelength λ, suspended in water (refractive index n_water) above an oil-immersed objective (refractive index n_oil). The emitters are imaged with an objective lens (numerical aperture NA), focused at a focal plane z_f, and their image is magnified onto the sensor with a microscope magnification M. Let 𝓜(ρ, φ) denote the phase mask placed in the conjugate back focal plane of an extended emission path with a 4f system (Fig. 1), and let (ρ, φ) denote the normalized polar coordinates in the Fourier plane, such that ρ = 1 at the maximal collected spatial frequency NA/λ. Under the scalar approximation [69], the PSF of a point source located at depth z₀ above the water-oil interface is given by

PSF_{z₀}(x, y) = | 𝓕{ E(ρ, φ) · exp(i𝓜(ρ, φ)) · exp(iψ(ρ, φ)) } |² ,    (S1)
where are the coordinates at the image plane, is the two-dimensional Fourier transform, is the effective aperture of the compound system, limited by for high NA objectives


and is the accumulated phase due to the emitter 3D position and the focal plane setting. This phase can be decomposed into lateral and axial components


The lateral component is assumed to be a linear phase (i.e. shift-invariant convolution system), given by


As for the axial component, it is split into two terms to account for refractive-index mismatch [94]: the phase accumulated in water due to the emitter depth , and the phase accumulated in oil due to a focus shift from the coverslip




Finally, the PSF in eq. S1 is slightly smoothed in image space


where denotes convolution, and is a 2D Gaussian kernel with a standard deviation that is fit empirically to match experimental data (usually 70 nm). This blur accounts for the finite size of the emitter, its spectrum, and the inherent blur in the optical system, alleviating the need to explicitly model these effects. For a full derivation of the model that includes neglected dipole and near-field effects, the reader is referred to [95, 10].
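As a concrete illustration of this imaging model, the sketch below computes a PSF as the squared magnitude of the Fourier transform of the pupil (the aperture times the phase factor), followed by the Gaussian blur in image space. All names and grid parameters are illustrative; a full implementation would also include the index-mismatch defocus terms described above.

```python
import numpy as np

def simulate_psf(phase_mask, defocus_phase, aperture, blur_sigma_px=1.0):
    """Minimal scalar-model PSF sketch: squared magnitude of the Fourier
    transform of the pupil, blurred by a small Gaussian in image space.
    All argument names are illustrative, not the paper's code."""
    pupil = aperture * np.exp(1j * (phase_mask + defocus_phase))
    psf = np.abs(np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(pupil)))) ** 2
    psf /= psf.sum()  # normalize to unit photon count
    if blur_sigma_px > 0:
        # separable Gaussian blur (finite emitter size, spectrum, system blur)
        r = int(4 * blur_sigma_px)
        x = np.arange(-r, r + 1)
        g = np.exp(-x**2 / (2 * blur_sigma_px**2))
        g /= g.sum()
        psf = np.apply_along_axis(lambda m: np.convolve(m, g, mode='same'), 0, psf)
        psf = np.apply_along_axis(lambda m: np.convolve(m, g, mode='same'), 1, psf)
    return psf
```

With a circular aperture and a flat phase, this reproduces the familiar in-focus Airy-like spot centered in the field.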

The image of a set of emitters is given by the incoherent sum of their PSFs


where is the 3D position of the emitter.

The commonly used measurement model consists of data-dependent Poisson noise and additive Gaussian read noise


where is the Poisson distribution, is a per-pixel background noise, is the normal distribution, is a baseline count level, and is the read-noise variance.

To make the measurement model differentiable, we approximate the Poisson noise with Gaussian noise using the central limit theorem


The resulting data-dependent noise approximation is implemented using the reparameterization trick [96]


where is a realization of a standard normal distribution


Now, the measurement model is differentiable w.r.t. the phase mask and is therefore suited for end-to-end learning.
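The differentiable measurement model can be sketched as follows: shot noise is replaced by its Gaussian approximation N(mu, mu) via the reparameterization trick, and read noise is added on top. The numeric values are illustrative; in a framework such as PyTorch, the same expression is differentiable with respect to the clean image and hence the phase mask upstream.

```python
import numpy as np

def differentiable_measurement(clean_img, baseline=100.0, read_std=1.5, rng=None):
    """Gaussian approximation of the Poisson shot noise via the
    reparameterization trick, plus additive Gaussian read noise.
    Parameter values are illustrative placeholders."""
    rng = np.random.default_rng(0) if rng is None else rng
    eps = rng.standard_normal(clean_img.shape)            # epsilon ~ N(0, I)
    shot = clean_img + np.sqrt(np.maximum(clean_img, 0.0)) * eps  # ~ N(mu, mu)
    read = read_std * rng.standard_normal(clean_img.shape)
    return shot + baseline + read
```

Because the noise realization is an external standard-normal sample scaled by a deterministic function of the signal, gradients flow through the mean and standard-deviation terms.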

a.2 EDOF PSF design

In this section we provide the implementation details for designing the EDOF PSF, then compare our result with existing popular designs. There are several ways to implement an EDOF PSF, including: placing an axicon in the optical path [97], using ring apertures, and reducing the numerical aperture of the system. Due to photon-efficiency considerations, in this work we focus on the implementation of an EDOF PSF using a phase mask. Our general strategy is to formulate the design problem as a phase-retrieval task as detailed next.

First, we simulate the in-focus Airy disk PSF for the desired optical system. This PSF is then thresholded to keep only the main lobe with diameter D, and the result is fitted with a 2D Gaussian . The Gaussian is replicated to generate a synthetic z-stack with 200 nm jumps between slices. It is also used to define a weighting matrix that “squeezes” signal photons quickly into the diffraction-limited spot, . Let denote centered pixel coordinates in image space; the matrix is then given by


where in our implementation , and were determined empirically to achieve appealing results.

Given , we try to retrieve the corresponding phase mask associated with the synthetic z-stack via phase retrieval [72]. This is implemented using Stochastic Gradient Descent (SGD) with importance sampling to minimize the following cost function


where is the on-axis PSF at depth , and is the number of axial slices (). Let denote the current PSF stack dictated by phase mask , such that . Our optimization consists of the following three steps:

  1. We compute the correlation of with at each axial slice


    and choose the three axial slices with the lowest correlation.

  2. To avoid overfitting the sampled 200 nm “knots” throughout the axial range, we perturb each of locally with a random continuous shift while clipping out-of-range values.

  3. We calculate the gradient of the cost in eq. S14 sampled only at , and take a gradient step.

In the third step, we experimented with a few adaptive SGD optimizers [98, 99, 100, 101], and ultimately chose Adam [98]. The process is repeated for 400 iterations, or until the loss stagnates.

Notably, the correlation in our implementation serves as side-information [102], and is used to adaptively sample slices and direct the SGD iterations. Compared with a stochastic sampling approach, this has the benefit of accelerating convergence, and empirically led to better solutions.
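The correlation-guided slice selection (step 1 above) can be sketched as follows. This is a minimal illustration with hypothetical names: each axial slice of the current PSF stack is scored by its normalized correlation with the target Gaussian spot, and the least-correlated slices are selected for the next gradient step.

```python
import numpy as np

def pick_worst_slices(psf_stack, target, k=3):
    """Score each axial slice by its normalized (mean-subtracted)
    correlation with the target spot; return the k worst slice indices.
    A minimal sketch of the importance-sampling step, not the paper's code."""
    corrs = []
    for psf in psf_stack:
        a = psf - psf.mean()
        b = target - target.mean()
        corrs.append((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    order = np.argsort(corrs)          # ascending: least-correlated first
    return order[:k], np.asarray(corrs)
```

In the full optimization, the selected slices are additionally perturbed by a random continuous axial shift (step 2) before the gradient of the cost is evaluated on them (step 3).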

Figure S1 compares the result to two common EDOF implementations: the cubic phase mask [71], and the randomly sampled Fresnel lenses phase mask [70]. The amplitude of the cubic phase mask was chosen such that the PSF is consistent over the FOV, but retains as much SNR as possible. Our proposed EDOF has three significant advantages over the classical designs: (1) its lateral extent is much smaller than that of the cubic phase mask PSF, matching our density requirements; (2) the SNR in the main spot is higher than in both other methods; and (3) the proposed phase mask is smooth compared to the randomly sampled Fresnel lenses. This facilitates its implementation using LC-SLM devices, as these suffer from inter-pixel cross-talk [103].

Figure S1: Comparisons of EDOF PSFs in simulation. (a) Standard unmodulated PSF. (b) Cubic phase mask PSF. (c) Randomly sampled Fresnel lenses PSF. (d) Ours. The colorscale is normalized to the maximum intensity of the in-focus, unmodulated PSF.

a.3 CRLB optimization

In this section we derive the Cramér-Rao Lower Bound (CRLB) [73, 74, 9] of the system in Fig. 1. For simplicity, we first assume the measurement model contains only data-dependent Poisson noise; the expression for the extended case including read noise is provided at the end.

First, let us establish some notation. Let denote the 3D position of a single emitter imaged with the system in Fig. 1, let denote the concatenated coordinates at the image plane, and let denote the model PSF of the emitter in the detection path with phase modulation . Assuming Poisson statistics for the source and background signals, the measured PSF is given by


where is a per-pixel background. The log-likelihood function for the measurement in eq. S16 is given by


where is the number of pixels in the image, and is a function of the measurements that is independent of the unknown 3D position .

Given a log-likelihood function, the Fisher Information matrix is defined as [73]


Substituting the log-likelihood from eq. S17 we get


Assuming independent photon arrivals in each detection path, the measurements , become independent. Therefore, the joint information of both PSFs is given by the sum of the individual information from each PSF. Formally, let denote the information matrix of the measurement with phase modulation . The joint Fisher Information matrix for measurements , is given by


Let denote the coordinate of the 3D position. Given , the CRLB for estimating is defined as [73]


where denotes the inverse of the Fisher information matrix. Based on eq. S21, to derive the phase masks , which optimize the CRLB for all three estimated parameters , we minimize the following cost function


In our implementation, is evaluated at on-axis positions , where is sampled every 250 nm throughout the desired axial range. We also simplify the per-pixel background term to a single scalar of 15 photons per pixel, and scale the PSFs to match realistic signal counts encountered in SMLM imaging, i.e. 2000 photons per emitter. Notably, in contrast to our previous work [9], we optimized the CRLB using a per-pixel approach rather than constraining the solution to a subspace of Zernike polynomials. This was particularly important for efficiently navigating the wide variety of possible solutions.
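The core of this cost function can be sketched numerically. Under the Poisson-only model, the Fisher information is a sum over pixels of products of PSF derivatives weighted by the inverse mean; for two independent channels the information matrices add, and the CRLB is the diagonal of the joint inverse. The function names below are illustrative.

```python
import numpy as np

def fisher_poisson(mu, dmu):
    """Fisher information for a Poisson measurement with mean image mu
    and derivative images dmu[i] = d mu / d theta_i (theta = position)."""
    n = len(dmu)
    I = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            I[i, j] = np.sum(dmu[i] * dmu[j] / np.maximum(mu, 1e-9))
    return I

def crlb_cost(mu1, dmu1, mu2, dmu2):
    """Joint CRLB cost for two detection paths: information adds for
    independent channels; the bound is the diagonal of the inverse."""
    I = fisher_poisson(mu1, dmu1) + fisher_poisson(mu2, dmu2)
    return np.sqrt(np.diag(np.linalg.inv(I))).sum()
```

As a sanity check, for a 2D Gaussian PSF of width sigma and N photons, the single-coordinate bound recovers the textbook sigma/sqrt(N) scaling.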

Finally, in this work we focused our attention on SMLM experimental conditions. Therefore, for our purpose the read noise effects were negligible. However, the optimization is readily extended to the mixed Poisson-Gaussian case by revisiting eqs. S19, S17 and S16. Specifically, assume the measured PSF is given by


where is the normal distribution, is a baseline, and is the variance of the read noise.

We can approximate the Poisson noise by a Gaussian noise using eq. S10


Assuming both noise sources are independent we get


The resulting log-likelihood function for the measurement in eq. S25 is given by


where is the number of pixels in the image. Substituting the log-likelihood from eq. S26 in the definition from eq. S18 we get


Substituting eq. S27 in eqs. S22, S21 and S20 we get the desired cost function for the general case.

Figure S2: CNN architecture. (a) The concatenated snapshot images , are fed to a CNN composed of 3 modules as described in the text. Feature map dimensions are depicted with [104] to reflect the operation of each module. The spatial supports of all convolutional filters are . The number of channels is fixed to 64 in both the multi-scale context aggregation, and the upsampling modules. Then, the number is increased to 80 for the refinement module. Note that in the context-aggregation module the spatial support of all convolutional filters is , although their receptive field grows exponentially with the dilation rate. Blue square depicts the final receptive field for both choices of . The output 3D high-resolution volume is translated to a list of 3D localizations through simple post-processing. Scale bars are 3 μm.

a.4 Learning details

CNN architecture

In this work, we adapt the CNN architecture previously proposed in DeepSTORM3D [21] to process an image with 2 channels (Fig. S2). Our architecture is relatively light, with only 440K trainable parameters, and is composed of three main modules:

  1. Multi-scale context-aggregation module: we used dilated convolutions [105] to increase the receptive field of each layer while keeping a fixed number of 64 channels. The two concatenated snapshots are processed through 6 convolutional blocks with increasing dilation rates. The maximal dilation rate was set according to the PSFs' lateral footprint: for the Tetrapod-EDOF pair, and for the other two PSF pairs (see Fig. S2). We also include skip connections to improve gradient flow [106] (not shown in the figure).

  2. Upsampling module: composed of two consecutive resize-convolutions [107] to increase the lateral resolution by a factor of 4. We used nearest-neighbor interpolation to resize the images. Assuming a CCD pixel-size of 110 nm, the lateral pixel-size of the upsampled features is 27.5 nm.

  3. Prediction module: after super-resolving emitters in the lateral dimension, we further refine their axial position through 3 additional convolutional blocks with an increased number of channels. For a 4 μm range, we use 80 channels, i.e. a voxel-size of 50 nm in . The final prediction is given by a convolution followed by an element-wise HardTanh to limit the output range to , where W is set empirically to 800 to account for class imbalance (occupied vs. vacant voxels).

The spatial supports of all convolutional filters are . Each convolution block is followed by a Batch Normalization layer and a LeakyReLU non-linearity with slope . Note that depth is exchanged with channels, as our architecture is composed solely of 2D convolutional layers. Afterwards, these dimensions are permuted in the recovered volume. To compile a list of localizations at test time, we threshold the voxel values and find local maxima in clustered components (details in section A.4.5). Lastly, to efficiently learn the phase masks with reduced computation, we modify the architecture in a similar fashion to that described in [21].
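The receptive-field bookkeeping behind choosing the maximal dilation rate can be sketched as below. For a stack of stride-1 dilated convolutions, each layer grows the receptive field by (kernel - 1) times its dilation rate; the specific rates in the example are illustrative, since only the maximal rate per PSF pair is specified above.

```python
def receptive_field(kernel=3, dilations=(1, 2, 4, 8, 16, 32)):
    """Receptive field of stacked dilated convolutions with stride 1.
    The dilation schedule here is an assumed example, not the paper's."""
    rf = 1
    for d in dilations:
        rf += (kernel - 1) * d   # each layer adds (k-1)*dilation pixels
    return rf
```

With 3x3 kernels, doubling the dilation per block makes the receptive field grow exponentially with depth while the parameter count stays fixed.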

Notably, in this work we used the same encoder to process both images. In our implementation, the image pair is first warped using a calibrated affine transform prior to CNN processing. However, in the case of severe inter-channel misalignment this is expected to be sub-optimal, and a “Y-net” structure with separate encoders should be considered. In particular, one of the encoders could potentially be swapped with a spatial transformer network [108] to alleviate the need for calibration.

Training set

To learn a localization CNN solely with predefined phase masks, we simulate a training set composed of 10K simulated image-pairs and their corresponding labels, which are lists of emitter positions. 9K examples were used for training, with 1K examples held out for validation. Alternatively, to jointly learn the phase masks and the localization CNN parameters, the training set consists solely of simulated emitter positions, as the respective image-pairs change throughout training according to the current phase masks.

In our implementation the training positions are randomly drawn within the 3D cube of possible locations in order for the method to generalize to arbitrary imaged structures. The Boolean grid used as the label in training is given by projecting the continuous positions onto the recovery grid (voxel size of ).

Given a set of 3D locations, the expected model images are simulated using the measurement model in eq. S9. To accurately model experimental data in our simulations, we image beads on the coverslip prior to the experiment, and retrieve the aberrated pupil functions using VIPR [72]. To make our simulations realistic, we diversify the training conditions to include experimental variability. Namely, we vary the emitter density, the signal-to-noise ratio, the amount of blur, and any additional expected experimental challenges (e.g. motion blur, laser fringes, etc.). For example, in telomere imaging we have observed a highly non-uniform per-pixel background, presumably resulting from the nucleus auto-fluorescence. To model this effect, we approximate the per-pixel background in eq. S9 using a super-Gaussian


where is the combined 2D coordinates in image space, , are scaling parameters, is the 2D centroid, and is the covariance matrix. These parameters are augmented in training to make the model robust to their variations.
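A super-Gaussian background of this form can be sketched as follows; the parameter names and values are illustrative. The quadratic form measures the Mahalanobis-like distance from the centroid, and raising it to a power greater than 1 flattens the top and sharpens the falloff relative to a standard Gaussian.

```python
import numpy as np

def super_gaussian_bg(shape, center, cov, amplitude, offset, power=2.0):
    """Per-pixel background: offset + amplitude * exp(-(d^T Sigma^-1 d)^P).
    Names are illustrative; in training these parameters are augmented."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    d = np.stack([yy - center[0], xx - center[1]], axis=-1).astype(float)
    inv = np.linalg.inv(cov)
    q = np.einsum('...i,ij,...j->...', d, inv, d)   # quadratic form per pixel
    return offset + amplitude * np.exp(-q ** power)
```

During training, the centroid, covariance, amplitude, and offset would be randomized so the network becomes robust to the non-uniform background.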

Figure S3: Overview of a typical experiment. Fluorescent beads are used to create 3D PSF scans of the two channels, which are then modelled using VIPR. The calibrated PSF models are used to train the localization net. The trained net can then localize experimental data, and output the desired 3D positions from snapshot measurements. For fixed samples, where an experimental ground truth is available, the Jaccard index is calculated by matching the axial scan results with the net output.

Loss function

Let denote the GT Boolean volume, and denote the network prediction. Our loss function for training the net is a combination of two terms


The first term is a 3D heatmap matching loss, given by


where is a 3D Gaussian kernel with a standard deviation of 1 voxel. This term measures the proximity of our prediction to the simulated ground truth by measuring the distance between their respective heatmaps.

The second term is a measure of overlap, given by


This term provides a soft approximation of the true positive rate in the prediction. Note that it does not penalize false positives; if optimized alone, it would yield a predicted volume of all 1s. However, such a solution is not feasible here, as it is heavily penalized by the first term. In our implementation, we weight voxels containing emitters by a factor of W=800 in order to balance out the contributions of vacant and occupied voxels. Hence, the CNN output is constrained to the range . This strategy makes optimization easier and prevents gradient clipping.
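The two-term loss can be sketched as below: an L2 heatmap-matching term on Gaussian-blurred volumes, and a negative soft-overlap term. The exact weighting and normalization here are assumptions for illustration; only the 1-voxel blur and the W=800 class weighting are taken from the text.

```python
import numpy as np

def gaussian_blur3d(vol, sigma=1.0, radius=3):
    """Separable 3D Gaussian blur with a standard deviation of 1 voxel."""
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    g /= g.sum()
    for ax in range(3):
        vol = np.apply_along_axis(lambda m: np.convolve(m, g, mode='same'), ax, vol)
    return vol

def localization_loss(pred, gt_bool, W=800.0):
    """Sketch of the combined loss: heatmap matching plus soft overlap.
    The relative weighting of the two terms is an assumption."""
    heat = np.sum((gaussian_blur3d(pred) - gaussian_blur3d(W * gt_bool.astype(float))) ** 2)
    overlap = -np.sum(pred * gt_bool) / max(gt_bool.sum(), 1)  # soft TP rate
    return heat + overlap
```

A perfect prediction zeroes the heatmap term and maximizes overlap, so it scores strictly lower than an empty volume.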

Optimization and hyper-parameters

We used the Adam optimizer [98] with the following parameters: . The batch size was 16 for learning a phase mask, and 4 for learning a recovery net (due to GPU memory). The learning rate was reduced by a factor of 10 when the loss plateaued for more than 5 epochs, and training was stopped if no improvement was observed for more than 7 epochs, or alternatively when a maximum number of 50 epochs was reached. The initial weights were sampled from a uniform distribution on the interval where , with the filter spatial dimensions, and the number of input channels to the convolutional layer. Training and evaluation were run on a workstation equipped with 32 GB of memory, an Intel(R) Core(TM) , 3.20 GHz CPU, and an NVIDIA GeForce Titan Xp GPU with 12 GB of video memory. Phase mask learning took h, and recovery net training took h. Our code is implemented using the PyTorch framework [109], and will soon be made publicly available at https://github.com/EliasNehme/DeepNebulae.


Post-processing

The fully convolutional architecture that we adopted in this work outputs a super-resolved 3D volume, where occupied voxels account for emitters. To compile a list of localizations, we first threshold this volume, keeping only voxels with a minimal confidence of 80 (the maximal output is 800). Afterwards, out of the remaining localizations we discard those which are not local maxima in their 3D vicinity. The radius used for grouping and local-maxima finding was 100 nm. Lastly, the recovered continuous 3D position is given by applying the 3D Center of Gravity (CoG) estimator to the vicinity of the local maxima in the prediction volume. While it is possible to use more sophisticated post-processing steps, we chose this simple and efficient strategy to keep our method as fast as possible. In our implementation, we write these steps as a composition of pooling and convolution operations, making calculations extremely efficient on GPU.

Notably, while grouping and local-maxima finding potentially limits the maximal density, keep in mind that overlaps in 2D normally translate to non-overlapping “blobs” in 3D. Hence, this is hardly a limitation under common imaging conditions, as localization algorithms struggle considerably before reaching this limit.
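The threshold, local-maxima, and CoG steps can be sketched as below. This naive loop illustrates the logic only; the thresholds and the neighborhood radius in voxels are assumptions (the paper uses a 100 nm vicinity and GPU pooling/convolution operations instead).

```python
import numpy as np

def volume_to_localizations(vol, thresh=80.0, r=2, voxel_nm=(50.0, 27.5, 27.5)):
    """Threshold the recovered volume, keep local maxima in an
    r-voxel neighborhood, and refine each via center of gravity.
    A minimal sketch, not the pooling-based GPU implementation."""
    locs = []
    for z, y, x in np.argwhere(vol > thresh):
        z0, z1 = max(z - r, 0), min(z + r + 1, vol.shape[0])
        y0, y1 = max(y - r, 0), min(y + r + 1, vol.shape[1])
        x0, x1 = max(x - r, 0), min(x + r + 1, vol.shape[2])
        patch = vol[z0:z1, y0:y1, x0:x1]
        if vol[z, y, x] < patch.max():
            continue                    # not a local maximum in its vicinity
        zz, yy, xx = np.mgrid[z0:z1, y0:y1, x0:x1]
        w = patch / patch.sum()         # CoG weights
        locs.append((np.sum(w * zz) * voxel_nm[0],
                     np.sum(w * yy) * voxel_nm[1],
                     np.sum(w * xx) * voxel_nm[2]))
    return locs
```

Each returned tuple is a continuous 3D position in nanometers on the recovery grid.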

In the telomere tracking experiment, the per-frame localizations were linked using DBSCAN clustering [110] applied directly to the 3D positions. The maximum distance allowed between points was , and the minimal number of emitters per cluster was minPts=25. This resulted in filtering out 83 of 24530 localizations throughout the 500 frames, i.e. approximately 0.3%. All tracks started within the first 6 frames and were relatively clustered in 3D, with no bifurcations observed. For more complicated tracking scenarios, the reader is encouraged to link the CNN localizations using more robust tracking software such as [111].

a.5 Experimental implementation

This section details the full experimental procedure to localize emitters using snapshot measurements from the dual-view setup. An outline of a typical experiment is presented in Fig. S3. The following subsections detail each part of the experiment for completeness.

Dual channel calibration

The goal of this section is to describe the process of calibrating the proposed dual-camera system, such that simulated PSFs will match measured data and their positions will correspond between the two images. The practice of aligning an optical 4f Fourier processing system, calibrating an LC-SLM, and creating a simulated model for a single channel has been meticulously explained in many previous works (e.g. [112]).

The proposed system consists of two identical optical paths which generate 3D PSF images. The acquired images are processed simultaneously by the localization network, and thus pose some extra challenges in the calibration process, specifically with respect to their spatial alignment. In our work, post-processing corrections are not a viable option due to the density of PSFs, necessitating a good calibration of the 3D alignment. To this end, we created two calibration samples (sparse and dense) consisting of a water-covered glass coverslip (Fisher Scientific) with 40 nm fluorescent beads (FluoSpheres (580/605), ThermoFisher) adhered to the surface with 1% PVA. The dense sample was chosen such that the unmodulated PSFs will cover the entire field of view (FOV) but each individual bead can still be fit using ThunderSTORM [113]. The localizations from each channel were used to estimate an affine transformation between the two cameras (Fig. S4). To prevent outliers from biasing the transformation, we implemented a Random Sample Consensus (RANSAC) procedure.
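A RANSAC-based affine fit of this kind can be sketched as below. The bead localizations from the two channels play the roles of src and dst; the iteration count and inlier tolerance are illustrative placeholders.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine transform mapping src -> dst (Nx2 arrays).
    Returns a 3x2 matrix M such that [x, y, 1] @ M ~ [x', y']."""
    A = np.hstack([src, np.ones((len(src), 1))])
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return M

def ransac_affine(src, dst, iters=200, tol=1.0, rng=None):
    """Toy RANSAC: fit on random minimal triplets, keep the model with the
    most inliers, then refit on the inlier set. A sketch, not the paper's code."""
    rng = np.random.default_rng(0) if rng is None else rng
    best_inliers = None
    A_full = np.hstack([src, np.ones((len(src), 1))])
    for _ in range(iters):
        idx = rng.choice(len(src), 3, replace=False)
        M = fit_affine(src[idx], dst[idx])
        err = np.linalg.norm(A_full @ M - dst, axis=1)
        inliers = err < tol
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return fit_affine(src[best_inliers], dst[best_inliers])
```

Outlier bead matches receive large residuals under the consensus model and are excluded from the final least-squares refit.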

Next, the sparse sample is chosen such that each slice of the 3D PSFs (for both channels) can be imaged without any overlaps from neighboring emitters. An axial scan is performed to ensure that both channels measure corresponding PSFs at the same focal plane positions, to account for any minor axial misalignment between the two cameras. The point of reference (lateral) was chosen as the center of gravity of the maximum projection in one of the channels.

The reference point of the second channel was calculated using the aforementioned affine transformation. Next, we used VIPR [72] to generate a phase mask for each channel, as it provides a good model and accounts for the issue of wobble and near-field effects by implementing the vectorial diffraction model. Importantly, while the affine transformation is calculated using localizations and not images, ultimately the input to the localization network is an image-pair. However, since a global affine transformation is not a shift-invariant operator, a fully convolutional model will struggle to learn it efficiently. Therefore, at test time, we warp the image of one camera to align with its counterpart, and feed the aligned concatenated image pair to the network. The warping operation is implemented using cubic-spline interpolation.

Figure S4: Channel registration. The estimated affine transformation for the Tetrapod-EDOF experiment (main text Fig. 3).
Figure S5: Effect of image misalignment. Numerical comparison between networks trained with aligned images (blue), misaligned images (green) and approximately aligned images (up to 50 nm) by warping (red).

To test the importance of image alignment, we trained three different models: (1) with perfectly aligned positions, (2) with randomly misaligned positions (achieved by sampling portions of the estimated transformation), and (3) with misaligned positions accompanied by a known transformation between channels (up to 50 nm) that is used to warp the images. Three conclusions can be made based on the results (Fig. S5): (1) it is clear that the model is unable to efficiently cope with a random global transform, (2) calibrating the affine transform up to 50 nm errors and warping the images prior to localization improves performance, and (3) perfect alignment of the Tetrapod and the EDOF PSFs does not improve the axial localization precision. The latter is expected because the axial information is decoded solely based on the Tetrapod channel. Therefore, it is insensitive to the alignment with the EDOF PSF which does not encode .

Optical components

The imaging system in Fig. 1 consists of a Nikon Eclipse-Ti inverted fluorescence microscope with a 100X/1.49 NA Nikon objective (CFI SR HP Apo TIRF 100XC). A polarizing beam splitter was placed after the first achromatic doublet lens (f=15 cm) to split the emission path. Both paths consisted of three additional achromatic doublet lenses to image the back focal plane onto an LC-SLM (Pluto-VIS020, Holoeye in the first path, and 1920X1152 liquid crystal on silicon, Meadowlark in the second). After a last image-forming lens, the modulated images were recorded by two sCMOS cameras (Prime 95B, Photometrics). For full synchronization, the first camera triggered the second camera (in a leader-follower configuration), which in turn triggered the 561 nm illumination laser (iChrome MLE, Toptica).

Biological sample preparation

For cell experiments, U2OS cells were prepared as described previously in [21]. In brief, cells were grown in standard conditions: , 5% in Dulbecco's Modified Eagle Medium (DMEM - without phenol red for the live cells experiment) with 1 D-glucose (low glucose), supplemented with 10% fetal bovine serum, and 1% penicillin–streptomycin and glutamine. To fluorescently label the telomeres, cells were transfected with a plasmid encoding the fluorescently tagged telomeric repeat binding factor 1 (DsRed-hTRF1) using Lipofectamine 3000 (Thermo Fisher Scientific). After 20-24 hours, cells were either fixed with 4% paraformaldehyde for 20 min, washed three times with PBS and mounted to a slide ( , 170 thick) with mounting medium; or imaged live in a temperature, humidity, and gas-mixture controlled imaging chamber mounted to the microscope (Okolab) on a glass bottom culture dish (15mm, 180 thick).

Ground truth estimation

In fixed cell experiments, the experimental ground truth 3D positions were approximated via axial scanning with the unmodulated PSF (Fig. S6). The scan consisted of 100 nm steps over a range of 4-5 . The resulting z-stack was then processed in the following manner: first, detection and lateral position estimation were performed with ThunderSTORM [113]. Next, the in-focus position of emitters was estimated by fitting a second-order polynomial to the mean intensity across focal slices. The mean intensity was calculated as the mean number of counts in the central pixels of each detected PSF. The emitter axial position was obtained by correcting the detected in-focus position with a factor of 0.8, accounting for refractive index mismatch. Since VIPR [72] accounts for the 3D wobble of the modulated PSFs, the final required correction was a global lateral shift between the in-focus PSF and the chosen modulated-PSF center in the phase retrieval step.
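The in-focus estimation step can be sketched as below: fit a second-order polynomial to the mean intensity versus focal position, take the vertex of the parabola, and rescale by the 0.8 index-mismatch factor. Function and variable names are illustrative.

```python
import numpy as np

def in_focus_position(z_um, mean_counts, ri_correction=0.8):
    """Fit a second-order polynomial to the mean intensity across focal
    slices; the vertex gives the in-focus position, rescaled by the
    refractive-index-mismatch factor. A sketch of the described procedure."""
    a, b, _ = np.polyfit(z_um, mean_counts, 2)
    z_focus = -b / (2 * a)          # vertex of the fitted parabola
    return ri_correction * z_focus
```

For a clean quadratic intensity profile, the fit recovers the vertex exactly.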

Figure S6: Experimental ground truth approximation. (a) A focal sweep is performed with an unmodulated imaging path. (b) Max projection of the focal sweep, showing the density of labelled telomeres in the U2OS cell. (c) Axial fit of the mean intensity to determine the in-focus position of an emitter.
Figure S7: PSFs for single and dual channel comparisons. Phase masks which were used in the single-channel vs. dual-channel comparison: (top to bottom) Tetrapod, end-to-end encoding for single-channel, EDOF, and biplane.
Figure S8: Single-channel vs. dual-channel systems. Detection (left) and localization precision (lateral\axial) over a range of simulated density of sources. Emitters were simulated with 15K signal photons per emitter and 500 background photons per pixel. Each data point is an average of n = 100 simulated images. Average standard deviation in Jaccard index was 5% and in precision was 3 nm.
Figure S9: Performance as a function of density for the three proposed PSF pairs. The methods are tested in detection (left) and localization precision (lateral\axial RMSE). Emitters were simulated with 15K signal photons per emitter and 500 background photons per pixel. Each data point is an average of n = 100 simulated images. Average standard deviation in Jaccard index was 5% and in precision was 3 nm.

a.6 Additional simulation results

In this section we present further numerical simulation results which support conclusions from the main text and the choice of the PSF pair. The first result, presented in Figs. S7 and S8, shows a numerical comparison between single-channel and dual-channel setups in terms of their detection (measured by the Jaccard index) and the average precision (measured by the lateral\axial RMSE). We compare the Tetrapod-EDOF (blue) pair to the commonly used biplane (cyan) method [5, 4] and to two single-channel approaches with double signal, namely the Tetrapod PSF (red) and the single-channel end-to-end optimized phase mask (orange) adopted from DeepSTORM3D [21]. The numerical results show that the Tetrapod-EDOF pair is the best in detection. In terms of the lateral RMSE at high densities, the biplane approach is better, as the in-focus PSF is more photon efficient than the EDOF. The axial RMSE result shows that the proposed pair is surpassed only by the end-to-end encoding of a single channel. This is likely because the axial position is mainly encoded in the Tetrapod path, and the pair is thus limited by the axial localization performance of the single-channel Tetrapod at high densities. These results reinforce the decision to explore other solutions which mutually encode all parameters in both channels, and are optimal for detection and localization.

The second result, in Fig. S9, is a comparison between the three PSF pairs proposed in this manuscript. Both the detection and the average precision support our claim that the Nebulae PSFs (green) are better than the CRLB (black) and Tetrapod-EDOF (blue) pairs. A similar conclusion was drawn from the experimental results in fixed cells, which ultimately supports our decision to use the Nebulae PSFs for live cell tracking. Across all tested metrics, we can see that at every density the Nebulae PSFs are clearly the best choice of the three.

a.7 Additional experimental results

This section presents more experimental results in fixed cell data. Figure S10 explores the false negatives presented in Fig. 3. All of the experimentally undetected points had very low signal. While the EDOF performs well in 2D, it is not as signal efficient as the in-focus unmodulated PSF. Thus, emitters which are slightly above the noise limit (without a phase mask) can be detected in the axial scan but are invisible to the EDOF and Tetrapod PSFs. This was improved in the subsequent PSF-pairs, which complement each other more efficiently.

To validate our conclusions from simulation regarding the Nebulae PSFs being the optimal pair, we show in Fig. S11 that the Nebulae PSFs outperform the Tetrapod-EDOF pair; for completeness, the figure also includes the CRLB pair for the same cell. As predicted in simulations, the CRLB pair performs slightly worse than the Nebulae PSFs but better than the Tetrapod-EDOF pair. To verify reproducibility, we present in Fig. S12 similar experimental results for a bigger cell, which exhibits a staggering number of 142 emitters. The reconstruction results are improved for all PSF pairs, as this cell experiences fewer overlaps, yet they are consistent with the previous conclusions on PSF-pair performance.

Figure S10: Experimental false negatives for the Tetrapod-EDOF pair. (a) U2OS cell experimental snapshot with the Tetrapod PSF (Fig. 3). (b) Reconstructed image by rendering the positions recovered by the net with the Tetrapod PSF. Asterisks mark true (green) and false (blue) positives. (c) Paired experimental EDOF snapshot. (d) Zoom-ins on undetected emitters (false negatives).
Figure S11: Experimental measurement of fixed U2OS cells with fluorescently labelled telomeres. Example images showing the three proposed PSF-pair image pairs and the subsequent 3D reconstruction plotted over the approximated ground truth: (a) Tetrapod-EDOF pair with , (b) CRLB pair with , and (c) Nebulae PSFs with .
Figure S12: Experimental measurement of fixed U2OS cells with fluorescently labelled telomeres. Example images showing the three proposed PSF-pair image pairs and the subsequent 3D reconstruction plotted over the approximated ground truth: (a) Tetrapod-EDOF pair with , (b) CRLB pair with , and (c) Nebulae PSFs with .


  1. E. Betzig, G. H. Patterson, R. Sougrat, O. W. Lindwasser, S. Olenych, J. S. Bonifacino, M. W. Davidson, J. Lippincott-Schwartz, and H. F. Hess, “Imaging intracellular fluorescent proteins at nanometer resolution,” Science, vol. 313, no. 5793, pp. 1642–1645, 2006.
  2. S. T. Hess, T. P. Girirajan, and M. D. Mason, “Ultra-high resolution imaging by fluorescence photoactivation localization microscopy,” Biophysical journal, vol. 91, no. 11, pp. 4258–4272, 2006.
  3. M. J. Rust, M. Bates, and X. Zhuang, “Sub-diffraction-limit imaging by stochastic optical reconstruction microscopy (storm),” Nature methods, vol. 3, no. 10, pp. 793–796, 2006.
  4. S. Ram, P. Prabhat, J. Chao, E. S. Ward, and R. J. Ober, “High accuracy 3d quantum dot tracking with multifocal plane microscopy for the study of fast intracellular dynamics in live cells,” Biophysical journal, vol. 95, no. 12, pp. 6025–6043, 2008.
  5. M. F. Juette, T. J. Gould, M. D. Lessard, M. J. Mlodzianoski, B. S. Nagpure, B. T. Bennett, S. T. Hess, and J. Bewersdorf, “Three-dimensional sub–100 nm resolution fluorescence microscopy of thick samples,” Nature methods, vol. 5, no. 6, pp. 527–529, 2008.
  6. B. Louis, R. Camacho, R. Bresolí-Obach, S. Abakumov, J. Vandaele, T. Kudo, H. Masuhara, I. G. Scheblykin, J. Hofkens, and S. Rocha, “Fast-tracking of single emitters in large volumes with nanometer precision,” Optics Express, vol. 28, no. 19, pp. 28656–28671, 2020.
  7. B. Huang, W. Wang, M. Bates, and X. Zhuang, “Three-dimensional super-resolution imaging by stochastic optical reconstruction microscopy,” Science, vol. 319, no. 5864, pp. 810–813, 2008.
  8. S. R. P. Pavani, M. A. Thompson, J. S. Biteen, S. J. Lord, N. Liu, R. J. Twieg, R. Piestun, and W. Moerner, “Three-dimensional, single-molecule fluorescence imaging beyond the diffraction limit by using a double-helix point spread function,” Proceedings of the National Academy of Sciences, vol. 106, no. 9, pp. 2995–2999, 2009.
  9. Y. Shechtman, S. J. Sahl, A. S. Backer, and W. Moerner, “Optimal point spread function design for 3d imaging,” Physical review letters, vol. 113, no. 13, p. 133902, 2014.
  10. A. S. Backer and W. Moerner, “Extending single-molecule microscopy using optical fourier processing,” The Journal of Physical Chemistry B, vol. 118, no. 28, pp. 8313–8329, 2014.
  11. Y. Shechtman, L. E. Weiss, A. S. Backer, S. J. Sahl, and W. Moerner, “Precise three-dimensional scan-free multiple-particle tracking over large axial ranges with tetrapod point spread functions,” Nano letters, vol. 15, no. 6, pp. 4194–4199, 2015.
  12. A. Aristov, B. Lelandais, E. Rensen, and C. Zimmer, “Zola-3d allows flexible 3d localization microscopy over an adjustable axial range,” Nature communications, vol. 9, no. 1, p. 2409, 2018.
  13. D. Sage, T.-A. Pham, H. Babcock, T. Lukes, T. Pengo, J. Chao, R. Velmurugan, A. Herbert, A. Agrawal, S. Colabrese, et al., “Super-resolution fight club: assessment of 2d and 3d single-molecule localization microscopy software,” Nature methods, vol. 16, no. 5, p. 387, 2019.
  14. L. Möckl, A. R. Roy, and W. Moerner, “Deep learning in single-molecule microscopy: fundamentals, caveats, and recent developments,” Biomedical Optics Express, vol. 11, no. 3, pp. 1633–1661, 2020.
  15. W. Ouyang, A. Aristov, M. Lelek, X. Hao, and C. Zimmer, “Deep learning massively accelerates super-resolution localization microscopy,” Nature biotechnology, 2018.
  16. S. K. Gaire, Y. Zhang, H. Li, R. Yu, H. F. Zhang, and L. Ying, “Accelerating multicolor spectroscopic single-molecule localization microscopy using deep learning,” Biomedical Optics Express, vol. 11, no. 5, pp. 2705–2721, 2020.
  17. E. Nehme, L. E. Weiss, T. Michaeli, and Y. Shechtman, “Deep-storm: super-resolution single-molecule microscopy by deep learning,” Optica, vol. 5, no. 4, pp. 458–464, 2018.
  18. N. Boyd, E. Jonas, H. P. Babcock, and B. Recht, “Deeploco: Fast 3d localization microscopy using neural networks,” BioRxiv, p. 267096, 2018.
  19. J. M. Newby, A. M. Schaefer, P. T. Lee, M. G. Forest, and S. K. Lai, “Convolutional neural networks automate detection for tracking of submicron-scale particles in 2d and 3d,” Proceedings of the National Academy of Sciences, vol. 115, no. 36, pp. 9026–9031, 2018.
  20. B. Diederich, P. Then, A. Jügler, R. Förster, and R. Heintzmann, “cellstorm—cost-effective super-resolution on a cellphone using dstorm,” PloS one, vol. 14, no. 1, p. e0209827, 2019.
  21. E. Nehme, D. Freedman, R. Gordon, B. Ferdman, L. E. Weiss, O. Alalouf, T. Naor, R. Orange, T. Michaeli, and Y. Shechtman, “Deepstorm3d: dense 3d localization microscopy and psf design by deep learning,” Nature Methods, vol. 17, no. 7, pp. 734–740, 2020.
  22. A. Speiser, S. C. Turaga, and J. H. Macke, “Teaching deep neural networks to localize sources in super-resolution microscopy by combining simulation-based learning and unsupervised learning,” arXiv preprint arXiv:1907.00770, 2019.
  23. R. Barth, K. Bystricky, and H. Shaban, “Coupling chromatin structure and dynamics by live super-resolution imaging,” bioRxiv, p. 777482, 2019.
  24. G. Barbastathis, A. Ozcan, and G. Situ, “On the use of deep learning for computational imaging,” Optica, vol. 6, no. 8, pp. 921–943, 2019.
  25. G. Ongie, A. Jalal, C. A. Metzler, R. G. Baraniuk, A. G. Dimakis, and R. Willett, “Deep learning techniques for inverse problems in imaging,” IEEE Journal on Selected Areas in Information Theory, 2020.
  26. T. Falk, D. Mai, R. Bensch, Ö. Çiçek, A. Abdulkadir, Y. Marrakchi, A. Böhm, J. Deubner, Z. Jäckel, K. Seiwald, et al., “U-net: deep learning for cell counting, detection, and morphometry,” Nature methods, vol. 16, no. 1, pp. 67–70, 2019.
  27. Y. Rivenson, Z. Göröcs, H. Günaydin, Y. Zhang, H. Wang, and A. Ozcan, “Deep learning microscopy,” Optica, vol. 4, no. 11, pp. 1437–1443, 2017.
  28. M. Weigert, U. Schmidt, T. Boothe, A. Müller, A. Dibrov, A. Jain, B. Wilhelm, D. Schmidt, C. Broaddus, S. Culley, et al., “Content-aware image restoration: pushing the limits of fluorescence microscopy,” Nature methods, vol. 15, no. 12, p. 1090, 2018.
  29. A. Krull, T.-O. Buchholz, and F. Jug, “Noise2void-learning denoising from single noisy images,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2129–2137, 2019.
  30. S. Lim, H. Park, S.-E. Lee, S. Chang, B. Sim, and J. C. Ye, “Cyclegan with a blur kernel for deconvolution microscopy: Optimal transport geometry,” IEEE Transactions on Computational Imaging, vol. 6, pp. 1127–1138, 2020.
  31. R. Horstmeyer, R. Y. Chen, B. Kappes, and B. Judkewitz, “Convolutional neural networks that teach microscopes how to image,” arXiv preprint arXiv:1709.07223, 2017.
  32. A. Muthumbi, A. Chaware, K. Kim, K. C. Zhou, P. C. Konda, R. Chen, B. Judkewitz, A. Erdmann, B. Kappes, and R. Horstmeyer, “Learned sensing: jointly optimized microscope hardware for accurate image classification,” Biomedical Optics Express, vol. 10, no. 12, pp. 6351–6369, 2019.
  33. C. Ounkomol, S. Seshamani, M. M. Maleckar, F. Collman, and G. R. Johnson, “Label-free prediction of three-dimensional fluorescence images from transmitted-light microscopy,” Nature methods, vol. 15, no. 11, pp. 917–920, 2018.
  34. Y. Rivenson, Y. Zhang, H. Günaydın, D. Teng, and A. Ozcan, “Phase recovery and holographic image reconstruction using deep learning in neural networks,” Light: Science & Applications, vol. 7, no. 2, p. 17141, 2018.
  35. T. Nguyen, Y. Xue, Y. Li, L. Tian, and G. Nehmetallah, “Deep learning approach for fourier ptychography microscopy,” Optics express, vol. 26, no. 20, pp. 26470–26484, 2018.
  36. Y. Xue, S. Cheng, Y. Li, and L. Tian, “Reliable deep-learning-based phase imaging with uncertainty quantification,” Optica, vol. 6, no. 5, pp. 618–629, 2019.
  37. Z. Wu, Y. Sun, A. Matlock, J. Liu, L. Tian, and U. S. Kamilov, “Simba: scalable inversion in optical tomography using deep denoising priors,” IEEE Journal of Selected Topics in Signal Processing, 2020.
  38. J. T. Smith, R. Yao, N. Sinsuebphon, A. Rudkouskaya, J. Mazurkiewicz, M. Barroso, P. Yan, and X. Intes, “Ultra-fast fit-free analysis of complex fluorescence lifetime imaging via deep learning,” bioRxiv, p. 523928, 2019.
  39. P. Zelger, K. Kaser, B. Rossboth, L. Velas, G. Schütz, and A. Jesacher, “Three-dimensional localization microscopy using deep learning,” Optics express, vol. 26, no. 25, pp. 33166–33179, 2018.
  40. P. Zhang, S. Liu, A. Chaurasia, D. Ma, M. J. Mlodzianoski, E. Culurciello, and F. Huang, “Analyzing complex single-molecule emission patterns with deep learning,” Nature methods, vol. 15, no. 11, p. 913, 2018.
  41. E. Hershko, L. E. Weiss, T. Michaeli, and Y. Shechtman, “Multicolor localization microscopy and point-spread-function engineering by deep learning,” Optics express, vol. 27, no. 5, pp. 6158–6183, 2019.
  42. G. Dardikman-Yoffe and Y. C. Eldar, “Learned sparcom: Unfolded deep super-resolution microscopy,” arXiv preprint arXiv:2004.09270, 2020.
  43. L. Möckl, P. N. Petrov, and W. Moerner, “Accurate phase retrieval of complex 3d point spread functions with deep residual neural networks,” Applied Physics Letters, vol. 115, no. 25, p. 251106, 2019.
  44. L. Möckl, A. R. Roy, P. N. Petrov, and W. Moerner, “Accurate and rapid background estimation in single-molecule localization microscopy using the deep neural network bgnet,” Proceedings of the National Academy of Sciences, vol. 117, no. 1, pp. 60–67, 2020.
  45. D. Saha, U. Schmidt, Q. Zhang, A. Barbotin, Q. Hu, N. Ji, M. J. Booth, M. Weigert, and E. W. Myers, “Practical sensorless aberration estimation for 3d microscopy with deep learning,” Optics Express, vol. 28, no. 20, pp. 29044–29053, 2020.
  46. A. Shajkofci and M. Liebling, “Spatially-variant cnn-based point spread function estimation for blind deconvolution and depth estimation in optical microscopy,” IEEE Transactions on Image Processing, vol. 29, pp. 5848–5861, 2020.
  47. H. Gupta, M. T. McCann, L. Donati, and M. Unser, “Cryogan: A new reconstruction paradigm for single-particle cryo-em via deep adversarial learning,” BioRxiv, 2020.
  48. C. Belthangady and L. A. Royer, “Applications, promises, and pitfalls of deep learning for fluorescence image reconstruction,” Nature methods, pp. 1–11, 2019.
  49. A. Chakrabarti, “Learning sensor multiplexing design through back-propagation,” in Advances in Neural Information Processing Systems, pp. 3081–3089, 2016.
  50. E. Schwartz, R. Giryes, and A. M. Bronstein, “Deepisp: Toward learning an end-to-end image processing pipeline,” IEEE Transactions on Image Processing, vol. 28, no. 2, pp. 912–923, 2018.
  51. A. Turpin, I. Vishniakou, and J. D. Seelig, “Light scattering control in transmission and reflection with neural networks,” Optics express, vol. 26, no. 23, pp. 30911–30929, 2018.
  52. S. Elmalem, R. Giryes, and E. Marom, “Learned phase coded aperture for the benefit of depth of field extension,” Optics express, vol. 26, no. 12, pp. 15316–15331, 2018.
  53. V. Sitzmann, S. Diamond, Y. Peng, X. Dun, S. Boyd, W. Heidrich, F. Heide, and G. Wetzstein, “End-to-end optimization of optics and image processing for achromatic extended depth of field and super-resolution imaging,” ACM Transactions on Graphics (TOG), vol. 37, no. 4, p. 114, 2018.
  54. U. Akpinar, E. Sahin, and A. Gotchev, “Learning wavefront coding for extended depth of field imaging,” arXiv preprint arXiv:1912.13423, 2019.
  55. Y. Wu, V. Boominathan, H. Chen, A. Sankaranarayanan, and A. Veeraraghavan, “Phasecam3d—learning phase masks for passive single view depth estimation,” in 2019 IEEE International Conference on Computational Photography (ICCP), pp. 1–12, IEEE, 2019.
  56. J. Chang and G. Wetzstein, “Deep optics for monocular depth estimation and 3d object detection,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 10193–10202, 2019.
  57. C. A. Metzler, H. Ikoma, Y. Peng, and G. Wetzstein, “Deep optics for single-shot high-dynamic-range imaging,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1375–1385, 2020.
  58. Q. Sun, E. Tseng, Q. Fu, W. Heidrich, and F. Heide, “Learning rank-1 diffractive optics for single-shot high dynamic range imaging,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1386–1396, 2020.
  59. X. Dun, H. Ikoma, G. Wetzstein, Z. Wang, X. Cheng, and Y. Peng, “Learned rotationally symmetric diffractive achromat for full-spectrum computational imaging,” Optica, vol. 7, no. 8, pp. 913–922, 2020.
  60. S.-H. Baek, H. Ikoma, D. S. Jeon, Y. Li, W. Heidrich, G. Wetzstein, and M. H. Kim, “End-to-end hyperspectral-depth imaging with learned diffractive optics,” arXiv preprint arXiv:2009.00463, 2020.
  61. M. Kellman, E. Bostan, M. Chen, and L. Waller, “Data-driven design for fourier ptychographic microscopy,” in 2019 IEEE International Conference on Computational Photography (ICCP), pp. 1–8, IEEE, 2019.
  62. H. Pinkard, H. Baghdassarian, A. Mujal, E. Roberts, K. H. Hu, D. H. Friedman, I. Malenica, T. Shagam, A. Fries, K. Corbin, et al., “Learned adaptive multiphoton illumination microscopy,” bioRxiv, 2020.
  63. M. D. Lew, S. F. Lee, M. Badieirostami, and W. Moerner, “Corkscrew point spread function for far-field three-dimensional nanoscale localization of pointlike objects,” Optics letters, vol. 36, no. 2, pp. 202–204, 2011.
  64. M. P. Backlund, M. D. Lew, A. S. Backer, S. J. Sahl, G. Grover, A. Agrawal, R. Piestun, and W. Moerner, “Simultaneous, accurate measurement of the 3d position and orientation of single molecules,” Proceedings of the National Academy of Sciences, vol. 109, no. 47, pp. 19087–19092, 2012.
  65. C. Roider, A. Jesacher, S. Bernet, and M. Ritsch-Marte, “Axial super-localisation using rotating point spread functions shaped by polarisation-dependent phase modulation,” Optics express, vol. 22, no. 4, pp. 4029–4037, 2014.
  66. J. Min, S. J. Holden, L. Carlini, M. Unser, S. Manley, and J. C. Ye, “3d high-density localization microscopy using hybrid astigmatic/biplane imaging and sparse image reconstruction,” Biomedical optics express, vol. 5, no. 11, pp. 3935–3948, 2014.
  67. C. Cabriel, N. Bourg, P. Jouchet, G. Dupuis, C. Leterrier, A. Baron, M.-A. Badet-Denisot, B. Vauzeilles, E. Fort, and S. Leveque-Fort, “Combining 3d single molecule localization strategies for reproducible bioimaging,” Nature communications, vol. 10, no. 1, pp. 1–10, 2019.
  68. N. Bourg, C. Mayet, G. Dupuis, T. Barroca, P. Bon, S. Lécart, E. Fort, and S. Lévêque-Fort, “Direct optical nanoscopy with axially localized detection,” Nature Photonics, vol. 9, no. 9, pp. 587–593, 2015.
  69. J. W. Goodman, Introduction to Fourier optics. Roberts and Company Publishers, 2005.
  70. E. Ben-Eliezer, E. Marom, N. Konforti, and Z. Zalevsky, “Experimental realization of an imaging system with an extended depth of field,” Applied Optics, vol. 44, no. 14, pp. 2792–2798, 2005.
  71. E. R. Dowski and W. T. Cathey, “Extended depth of field through wave-front coding,” Applied optics, vol. 34, no. 11, pp. 1859–1866, 1995.
  72. B. Ferdman, E. Nehme, L. E. Weiss, R. Orange, O. Alalouf, and Y. Shechtman, “Vipr: Vectorial implementation of phase retrieval for fast and accurate microscopic pixel-wise pupil estimation,” Optics Express, vol. 28, no. 7, pp. 10179–10198, 2020.
  73. S. M. Kay, Fundamentals of statistical signal processing. Prentice Hall PTR, 1993.
  74. R. J. Ober, S. Ram, and E. S. Ward, “Localization accuracy in single-molecule microscopy,” Biophysical journal, vol. 86, no. 2, pp. 1185–1200, 2004.
  75. L. E. Weiss, T. Naor, and Y. Shechtman, “Observing dna in live cells,” Biochemical Society Transactions, vol. 46, no. 3, pp. 729–740, 2018.
  76. I. Bronshtein, E. Kepten, I. Kanter, S. Berezin, M. Lindner, A. B. Redwood, S. Mai, S. Gonzalo, R. Foisner, Y. Shav-Tal, et al., “Loss of lamin a function increases chromatin dynamics in the nuclear interior,” Nature communications, vol. 6, p. 8044, 2015.
  77. L. E. Weiss, Y. S. Ezra, S. Goldberg, B. Ferdman, O. Adir, A. Schroeder, O. Alalouf, and Y. Shechtman, “Three-dimensional localization microscopy in live flowing cells,” Nature Nanotechnology, pp. 1–7, 2020.
  78. Y. Shechtman, L. E. Weiss, A. S. Backer, M. Y. Lee, and W. Moerner, “Multicolour localization microscopy by point-spread-function engineering,” Nature photonics, vol. 10, no. 9, p. 590, 2016.
  79. Y. Y. Schechner, R. Piestun, and J. Shamir, “Wave propagation with rotating intensity distributions,” Physical Review E, vol. 54, no. 1, p. R50, 1996.
  80. A. Levin, R. Fergus, F. Durand, and W. T. Freeman, “Image and depth from a conventional camera with a coded aperture,” ACM transactions on graphics (TOG), vol. 26, no. 3, pp. 70–es, 2007.
  81. F. Balzarotti, Y. Eilers, K. C. Gwosch, A. H. Gynnå, V. Westphal, F. D. Stefani, J. Elf, and S. W. Hell, “Nanometer resolution imaging and tracking of fluorescent molecules with minimal photon fluxes,” Science, vol. 355, no. 6325, pp. 606–612, 2017.
  82. K. C. Gwosch, J. K. Pape, F. Balzarotti, P. Hoess, J. Ellenberg, J. Ries, and S. W. Hell, “Minflux nanoscopy delivers 3d multicolor nanometer resolution in cells,” Nature methods, vol. 17, no. 2, pp. 217–224, 2020.
  83. M. J. Amin, S. Petry, J. W. Shaevitz, and H. Yang, “Localization precision in chromatic multifocal imaging,” arXiv preprint arXiv:2008.10488, 2020.
  84. H. Ikoma, Y. Peng, M. Broxton, and G. Wetzstein, “Snapshot multi-psf 3d single-molecule localization microscopy using deep learning,” in Computational Optical Sensing and Imaging, pp. CW3B–3, Optical Society of America, 2020.
  85. W. Ouyang, F. Mueller, M. Hjelmare, E. Lundberg, and C. Zimmer, “Imjoy: an open-source computational platform for the deep learning era,” Nature methods, vol. 16, no. 12, pp. 1199–1200, 2019.
  86. E. Gómez-de Mariscal, C. García-López-de Haro, L. Donati, M. Unser, A. Muñoz-Barrutia, and D. Sage, “Deepimagej: A user-friendly plugin to run deep learning models in imagej,” bioRxiv, p. 799270, 2019.
  87. L. Von Chamier, J. Jukkala, C. Spahn, M. Lerche, S. Hernández-Pérez, P. Mattila, E. Karinou, S. Holden, A. C. Solak, A. Krull, et al., “Zerocostdl4mic: an open platform to simplify access and use of deep-learning in microscopy,” BioRxiv, 2020.
  88. C. Zhou, S. Lin, and S. Nayar, “Coded aperture pairs for depth from defocus,” in 2009 IEEE 12th international conference on computer vision, pp. 325–332, IEEE, 2009.
  89. C. Zhou and S. Nayar, “What are good apertures for defocus deblurring?,” in 2009 IEEE international conference on computational photography (ICCP), pp. 1–8, IEEE, 2009.
  90. C. Zhou, S. Lin, and S. K. Nayar, “Coded aperture pairs for depth from defocus and defocus deblurring,” International journal of computer vision, vol. 93, no. 1, pp. 53–72, 2011.
  91. A. Levin, “Analyzing depth from coded aperture sets,” in European Conference on Computer Vision, pp. 214–227, Springer, 2010.
  92. Y. Takeda, S. Hiura, and K. Sato, “Fusing depth from defocus and stereo with coded apertures,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 209–216, 2013.
  93. Y. Gil, S. Elmalem, H. Haim, E. Marom, and R. Giryes, “Monster: Awakening the mono in stereo,” arXiv preprint arXiv:1910.13708, 2019.
  94. S. Hell, G. Reiner, C. Cremer, and E. H. Stelzer, “Aberrations in confocal fluorescence microscopy induced by mismatches in refractive index,” Journal of microscopy, vol. 169, no. 3, pp. 391–405, 1993.
  95. D. Axelrod, “Fluorescence excitation and imaging of single molecules near dielectric-coated and bare surfaces: a theoretical study,” Journal of microscopy, vol. 247, no. 2, pp. 147–160, 2012.
  96. D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” arXiv preprint arXiv:1312.6114, 2013.
  97. P. Dufour, M. Piché, Y. De Koninck, and N. McCarthy, “Two-photon excitation fluorescence microscopy with a high depth of field using an axicon,” Applied optics, vol. 45, no. 36, pp. 9246–9252, 2006.
  98. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
  99. L. Xiao and T. Zhang, “A proximal stochastic gradient method with progressive variance reduction,” SIAM Journal on Optimization, vol. 24, no. 4, pp. 2057–2075, 2014.
  100. T. Dozat, “Incorporating nesterov momentum into adam,” 2016.
  101. L. Liu, H. Jiang, P. He, W. Chen, X. Liu, J. Gao, and J. Han, “On the variance of the adaptive learning rate and beyond,” arXiv preprint arXiv:1908.03265, 2019.
  102. S. Gopal, “Adaptive sampling for sgd by exploiting side information,” in International Conference on Machine Learning, pp. 364–372, 2016.
  103. S. Moser, M. Ritsch-Marte, and G. Thalhammer, “Model-based compensation of pixel crosstalk in liquid crystal spatial light modulators,” Optics express, vol. 27, no. 18, pp. 25046–25063, 2019.
  104. A. LeNail, “Nn-svg: Publication-ready neural network architecture schematics,” Journal of Open Source Software, vol. 4, no. 33, p. 747, 2019.
  105. F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” arXiv preprint arXiv:1511.07122, 2015.
  106. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778, 2016.
  107. A. Odena, V. Dumoulin, and C. Olah, “Deconvolution and checkerboard artifacts,” Distill, vol. 1, no. 10, p. e3, 2016.
  108. M. Jaderberg, K. Simonyan, A. Zisserman, et al., “Spatial transformer networks,” in Advances in neural information processing systems, pp. 2017–2025, 2015.
  109. A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in pytorch,” 2017.
  110. M. Ester, H.-P. Kriegel, J. Sander, X. Xu, et al., “A density-based algorithm for discovering clusters in large spatial databases with noise.,” in Kdd, vol. 96, pp. 226–231, 1996.
  111. J.-Y. Tinevez, N. Perry, J. Schindelin, G. M. Hoopes, G. D. Reynolds, E. Laplantine, S. Y. Bednarek, S. L. Shorte, and K. W. Eliceiri, “Trackmate: An open and extensible platform for single-particle tracking,” Methods, vol. 115, pp. 80–90, 2017.
  112. M. Siemons, C. Hulleman, R. Thorsen, C. Smith, and S. Stallinga, “High precision wavefront control in point spread function engineering for single emitter localization,” Optics express, vol. 26, no. 7, pp. 8397–8416, 2018.
  113. M. Ovesnỳ, P. Křížek, J. Borkovec, Z. Švindrych, and G. M. Hagen, “Thunderstorm: a comprehensive imagej plug-in for palm and storm data analysis and super-resolution imaging,” Bioinformatics, vol. 30, no. 16, pp. 2389–2390, 2014.