Partially Coherent Ptychography by Gradient Decomposition of the Probe
Abstract
Coherent ptychographic imaging experiments often discard over 99.9 % of the flux from a light source to define the coherence of an illumination. Even when coherent flux is sufficient, the stability required during an exposure is another important limiting factor. Partial coherence analysis can considerably reduce these limitations. A partially coherent illumination can often be written as the superposition of a single coherent illumination convolved with a separable translational kernel. In this paper we propose the Gradient Decomposition of the Probe (GDP), a model that exploits translational kernel separability, coupling the variances of the kernel with the transverse coherence. We describe an efficient firstorder splitting algorithm GDPADMM to solve the proposed nonlinear optimization problem. Numerical experiments demonstrate the effectiveness of the proposed method with Gaussian and binary kernel functions in flyscan measurements. Remarkably, GDPADMM produces satisfactory results even when the ratio between kernel width and beam size is more than one, or when the distance between successive acquisitions is twice as large as the beam width.
I Introduction
Ptychography is a popular imaging technique in scientific fields as diverse as condensed matter physics (25), cell biology (11), materials science (24) and electronics (15), among others. In a coherent ptychography experiment (Figure 1), a localized coherent Xray probe (or illumination) scans through an specimen , while the detector collects a sequence of phaseless intensities in the far field. The goal is to obtain a high resolution reconstruction of the specimen from the sequence of intensity measurements. In a discrete setting, is a 2D image with pixels, is a localized 2D probe with pixels ( and are both written as a vector by a lexicographical order), and is a stack of phaseless measurements . Here represents the elementwise absolute value of a vector, denotes the elementwise multiplication, and denotes the normalized 2 dimensional discrete Fourier transform. Each is a binary matrix that crops a region of size from the image .
In practice, as the probe is almost never completely known, one has to solve a blind ptychographic phase retrieval problem or probe retrieval (26), as follows:
(1) 
where bilinear operators and , are denoted as follows:
and
Coherent ptychographic imaging experiments often rely on apertures to define a coherent illumination. Research institutions around the world are investing considerable resources to produce brighter xray sources in order to overcome this limitation. Meanwhile, most of the xray photons generated are currently discarded by secondary apertures. Even when there is enough coherent flux, the stability required during an exposure is often another limiting factor. Both flux and stability limitations can be reduced using partial coherence analysis (21); (7); (16).
First, we briefly review the existing algorithmic work for partial coherence. In (10) the authors applied a gradient descent phase retrieval algorithm from incoherently averaged illuminations to compute the aberration of the Hubble space telescope. In (5), the authors considered a constant quasimonochromatic illumination on the sample (beam much larger than the sample), and derived a convolution based model using the mutual optical intensity as:
(2) 
where is the measured partial coherent intensity, is the coherent intensity, denotes the convolution operator, and is the kernel function (Fourier transform of the complex coherence function). Physically, the convolution kernel represents the combination of the detector response function and the angular spread of the illumination. It was solved by the alternating projection algorithm in (5), when is a Gaussian kernel function with free horizontal and vertical coherence length parameters. Burdet et al. (2) applied the above model to ptychographic imaging and proposed the DouglasRachford algorithm with a known Gaussian kernel function.
A more general description of a partially coherent wave field illuminating a specimen in a ptychography experiment was considered in (27), where the authors employed an orthogonal decomposition of the mutual coherence function (30) to describe partially coherent measurement as follows:
(3) 
with orthogonal probes , where is the measured intensity. The extended ptychographic iterative engine (18) was adopted to solve such model (3). Experimental ’flyscan’ data with translational blur was successfully reconstructed using (3) by (21); (7); (16). However, it is important to note that such blur is a special case of a model that has many more degrees of freedom. Moreover, the physical interpretation of the multiple modes is unclear, and relationship with the coherence function is indirect.
Other existing studies focus mostly on (2), which only addressed blurring coherent intensities at the detector. In practice, the blur is dominated by the source dimensions and by the translation of the probe with respect to the image during an exposure (21); (7). To the best of our knowledge, there is no algorithm in the literature to jointly recover the unknown image, probe, and coherence kernel function that exploits such property.
In this paper, we propose Gradient Decomposition of the Probe (GDP), a new forward model to characterize the partially coherent ptychography problem. The new model is based on coupling the experimental coherence widths with the variances of the kernel functions using the second order Taylor expansion to translate the probe. In the second part of this work we present GDPADMM: a novel fast iterative solver that jointly optimizes the image, the probe and the variance of the kernel function. The main benefits of the algorithm are listed below: \setlistnolistsep
 [noitemsep]

The approximation accuracy for a general partial coherent source is increased, while providing its coherent properties.

Subproblems can be solved using low computational and memory resources, and usually have close form.

It is insensitive to the coherence kernel and scanning step sizes, and achieves high SNR even when the data is contaminated by Poisson noise.
Numerical experiments show that satisfactory results with GDPADMM can be obtained even when the ratio between kernel and beam widths is more than one, or when the distance between successive acquisitions is almost twice as large as the beam width (Full Width Half MaximumFWHM).
Ii Approximate forward model for partial coherence
Partially coherent illumination from standard microscopes can be written as the superposition of a single quasimonochromatic coherent illumination convolved with separable angular and translational kernels (13); (17); (22); (6). Translational convolution of the illumination is equivalent to translational motion during a scan while exposing the detector (7). In other words, an extended incoherent source upstream of the lens can be viewed as the superposition, photonbyphoton, of a rapidly moving source demagnified by the lens onto the image plane. The demagnified source defines the degree of coherence of the probe, or the blurring kernel. Coherence and vibrations kernels can be combined into one, such that partially coherent ptychography imaging with coherence kernel function in a continuous setting (same notations as in the discrete setting) is formulated as:
(4) 
where is the measured partial coherent intensity and is the normalized Fourier transform. When setting to a binary function, the above model is exactly the same as flyscan ptychography (21); (7); (16); Setting to the Dirac delta function reduces it to the coherent model (1). We remark that (4) is quite different from (2), since (4) illustrates the effects of blurring of images with respect to the probe, while (2) can be interpreted as blurring or binning multiple pixels at the detector.
Generally speaking, solving Eq. (4) is a nonlinear illposed problem with an unknown kernel and there is no fast method to even compute the integral on the right hand side with known kernel, probe and images. In (20); (28), the authors considered probes translated with weights :
(5) 
with the translation operators and partially coherent intensity However, those methods in (20); (28) cannot be directly applied to (4), since either the weights (28) or the probe (20) are assumed to be known in advance. Moreover, as coherence degrades, or the number of probes increases, their computation and memory costs would increase dramatically. Hence instead of solving (4) directly as in (20), in the following sections we will reformulate the above model (4), and solve the related nonlinear optimization problem.
ii.1 Gradient Decomposition of the Probe (Gdp)
Following (4) and using the Taylor expansion of , one has:
where we assume that the third order derivatives are uniformly bounded, e.g.
(6) 
with a positive constant It is easy to verify that this condition is satisfied when the illumination is generated by a small lens (with small aperture).
Consider a kernel function characterized by its moment expansion , normalized with , center of mass , and second order moments , , and (). We further assume that:
(7) 
The above relation holds if the kernel function is centrosymmetric with respect to the origin, i.e. . For higher order moments, we also assume that
(8) 
Therefore, one has:
(9) 
More details of the above derivation can be found in Appendix B. In order to further simplify the partial coherence approximation we introduce a new variable (variance adjusted probe):
and nonlinear operator :
(10) 
where . Neglecting high order terms , following (9), we obtain the approximate forward model:
(11) 
Remarks:

If the probe has a support by a lens of finite size, then the same support applies to the variance adjusted probe due to the Fourier transform relationship:

Similarly to the decomposition model (3), GDP has three different modes . The difference between them is that in the GDP model, two modes can be expressed by the first mode explicitly, while the multiple modes in the decomposition model (3) are only assumed to be orthogonal to each other (30); (27). Such orthogonal constraint is essentially nonconvex, which is more difficult to handle and leads to more local minima as well.
Iii Fast iterative algorithm: GdpAdmm
The amplitude based nonlinear optimization model can be established as:
(12) 
where denotes the standard norm in Euclidean space. We remark that by introducing the new variable , it is much easier to solve the subproblems of the nonlinear optimization model (12) with respect to than using the original formula (9).
The GDP based nonlinear optimization model (12) is nonconvex and nondifferentiable. We are interested in designing a fast first order algorithm, whose subproblems can be easily implemented. The Alternating Direction Method of Multipliers (ADMM) (12) has been successfully applied to large scale nonlinear and nondifferential optimization problems arising from machine learning or computer vision, among other areas. The connection between ADMM, the DouglasRachford algorithm, and the popular Hybrid Input Output (9) for classical phase retrieval was discussed in (1); (29). By introducing auxiliary variables , one has:
(13) 
The corresponding augmented Lagrangian reads as:
(14) 
with multipliers , , and the positive parameter where denotes the standard inner product in complex Euclidean space, and denotes the real part of a complex number. Briefly, the GDPADMM algorithm alternates minimization of the above augmented Lagrangian with respect to the variables , updates the multipliers , and repeats. In our previous work, ADMM was applied to coherent ptychographic imaging in (29); (4). Here we propose a new, more robust variant of ADMM. It employs an additional proximal term added to the augmented Lagrangian: to avoid division by zeros when minimizing with respect to . The detailed description of GDPADMM can be found in Algorithm 1 (further details in Appendix A). The numerical results can be found in the following section. For simplicity, the gradient operator in the numerical section is considered in a discrete setting, using the forward and finite difference operator with respect to directions.
Theoretical analysis about the convergence properties of this blind
algorithm (to refine the probe, the image, and the coherence function
variances during iterations) is likely to be very challenging.
However, if the variances are known or fixed, convergence to the
stationary point of the nonlinear optimization model (12)
can be guaranteed by assuming that the iterative sequence is bounded
and the parameter is sufficiently large (4).
Compute as: Update the multipliers as
Iv Numerical experiments
The experimental setup of this section is introduced next. The reference specimen used is a complex valued image “Goldballs” (19), of size pixels (Figure 2). The probe is an Airy disk of size pixels (Figure 2).
We compare the proposed GDPADMM (Appendix A) with the “fullycoherent” model by ADMM (FCADMM (4)). Both algorithms are executed with a maximum iteration number of 300. We measure the quality of recovery images by relative residuals as:
for FCADMM and GDPADMM respectively, where are the iterative solutions, and is the measured intensity. The signaltonoise ratio (SNR) is also measured:
where is the ground truth of the image.
We show the performance of GDPADMM with the following Gaussian kernel function:
(18) 
The discrete truncated Gaussian matrix is generated by fspecial in Matlab with variances and support sizes . We remark that the kernel width is .
Simulated data is generated using raster grid scanning with sliding distance pixels (slightly bigger than the beam width) as a default, and we incorporate the support constraint of the lens as in (19). The parameter is selected manually with default value 0.2, and .
Noiseless data.
Following (4), the partially coherent intensity in a discrete setting is generated as
(19) 
with translation operator , discrete Gaussian weights , and periodical boundary condition for the probe. We conduct the first numerical experiment to explore the performance of the proposed algorithm with respect to different degrees of partial coherence by varying the variances , while keeping the beam width constant (the smaller the variances, the more coherent the data).
FCADMM  
GDPADMM  
FCADMM  
GDPADMM 
The reconstructed images can be seen in Figure 3, and the accuracy of the reconstructed results can be seen in Table 1. Figure 3 shows significant improvements when using GDPADMM compared to FCADMM. When coherence is very low (4th column), the visual quality of FCADMM drops, and smallscale features are completely lost. In comparison, with the same configuration, GDPADMM can still produce images containing significantly sharper large and smallscale features. However the results by GDPADMM seem blurry when the kernel variance becomes too large, as can be seen from the fourth column of Figure (3). Table (1) shows the enhanced accuracy of GDPADMM compared to the coherent model: residuals are at least 50% percent smaller, and images with SNRs about twice as high. We also include the results of the recovered probe with its modes in Figure 3.
(2,2)  (3,3)  (4,4)  (5,5)  
1.36E1  1.88E1  2.11E1  2.20E1  
2.60E2  5.68E2  8.95E2  1.14E1  
14.78  9.53  6.14  4.37  
23.73  18.83  13.02  8.37  
(4,0)  (5,0)  (6,0)  (7,0)  
1.70E1  1.89E1  2.04E1  2.14E1  
5.62E2  7.87E2  9.76E2  1.12E1  
11.47  9.42  7.61  5.64  
19.83  16.39  13.46  10.94 
Noisy data.
We consider Poisson noisy measurements as
where is used to control the noise level, and is the clean partial coherent intensity data defined by (19). Figure 4 shows the recovered images at different noise levels. It can be seen that, as the noise level decreases (increasing ), the quality of the results increases. We remark that when , the image in Figure 4 (a) is very blurry, which implies that it is more challenging to recover high quality images from partially coherent data contaminated by strong noise.
(a) =7.08  (b)10.17  (c) 11.30  (d) 12.35 
Parameter and sliding distances.
The following experiment is conducted to study the influence of parameter by varying . See the convergence curve in Figure 5, where the change histories of the relative errors and SNRs with respect to iteration numbers are reported. One can readily see that with smaller parameter , the algorithm tends to be unstable (see Figure 5 (b)). It is consistent with the condition for convergence guarantee in (4), where the parameter should be sufficiently large to make the augmented Lagrangian monotonically decrease. If the parameter is too big, the iterative solution could be trapped into the unsatisfactory local minima. Therefore, to gain optimal performance, a moderate should be used. We remark that although the parameter is selected manually, it can be applied to diverse cases with different degrees of partial coherence and sliding distances when fixing the kernel function.
(a)  (b) 
In Figure 6 we conduct the tests with different sliding distances , which determine the redundancy levels of the data. On one hand, inferred from the reported results, scanning with a smaller step size or will help to increase the quality of the images. On the other hand, when ( almost twice as large as the beam width), GDPADMM can still produce satisfactory results with clear smallscale features, showing the robustness of GDPADMM with respect to the redundancy of the measured data.
=4  6  8  12 
SNR=15.93  14.54  13.02  8.96 
Proximal term.
In the following experiment we show the effect of the proximal term. We conduct tests comparing GDPADMM baseline with a variant that replaces the proximal term with a penalization term , i.e.
with identity matrix and when solving Eq. (24), Appendix A. See the results in Figure 7. One can readily observe that with the help of the proximal term, the recovered images have cleaner boundaries (Figure 7 (b)) and higher SNR values (Figure 7 (d)). Moreover, GDPADMM is speeded up when using the proximal term, as shown by the convergence of the relative errors in Figure 7 (c).
(a)  (b) 
(c)  (d) 
FCADMM  
GDPADMM  
13  15  17 
Different kernels.
Previous experiments reported in this section are based on the Gaussian kernel function. Here we assess the performance of GDPADMM when using a binary kernel function for flyscan ptychography (7). Figure 8 shows the reconstructed images obtained by varying the kernel size . Similarly to the case of Gaussian kernel function, GDPADMM displays a significant increase in visual quality. In the first row of Figure 8, the results produced by FCADMM are completely blurry. On the other hand, GDPADMM (second row of Figure 8) achieves much sharper overall results. One can also see the obvious decrease of the residual and increase of SNRs by GDPADMM in Table 2.
Similar improvements by GDPADMM can be obtained with other motion blur type kernel functions, however, due to page limitations, we do not provide further results. These results show that GDPADMM can be applied to partial coherence problems with more general kernel function.
11  13  15  17  

2.00E1  2.16E1  2.29E1  2.37E1  
4.47E2  6.46E2  9.04E2  1.16E1  
6.19  5.60  7.15  6.71  
21.15  18.07  14.63  11.58 
Runtime and memory performance
Finally we report the the computational performance of the GDPADMM algorithm on a machine with an Intel i75600U CPU and 16G RAM using Matlab. GDPADMM requires two additional modes and four additional variables and compared with FCADMM . Because of this, the memory cost and runtime of GDPADMM are in theory about three times as large as those of FCADMM per iteration. When computing the image in Figure 2, GDPADMM requires an average of 873MB of RAM and takes 655 seconds to compute, whereas FCADMM requires 344MB ad takes 218 seconds. These results are consistent with the previous theoretical estimate.
We further investigate the change histories of the SNRs for FCADMM and GDPADMM with respect to elapsed the time, and report the results in Figure 9. The SNR histories show that, FCADMM can recover better images in the first 50 seconds, but GDPADMM improves the image further after that. Hence, in order to accelerate the GDPADMM algorithm, we could use the iterative solution of FCADMM as the initialization for GDPADMM. We also emphasize that the runtime and memory requirements GDPADMM are insensitive to the variances and the support sizes of the kernel functions. It is important to note that, if we solve the problem directly following (20), the probe and the weights in (5) in the case of the setting in Figure 3 (a)(b), at most translated probes for should be introduced, which requires much more memory and computation costs.
V Conclusions
In this paper, we propose Gradient Decomposition of the Probe (GDP), an efficient model that exploits translational kernel separability. The GDP model increases the approximation accuracy compared with the coherent model, and it holds for a general partial coherent source. We derive an optimization model coupling the variances of the kernel with the transverse coherence widths, and a fast and memory efficient proximal first order GDPADMM algorithm to solve the nonlinear optimization problem based on GDP. Numerical experiments demonstrate the effectiveness of such approximation in the case of Gaussian kernel function, and binary functions in flyscanning schemes, showing that the proposed methods are suitable for general partially coherent sources.
However the results by GDPADMM seem blurry when inferred from the fourth column of Figure 3, and further improvement may be achieved by considering sparse prior, or using piecewise Taylor expansion series or other integration schemes. This will be subject of future work. Additional research directions to extend this work are: (i) consider partial coherence with framewise different variances when using scan geometries on irregular grids or spirals, (ii) combine translational blur at the sample with detector blur in Eq. (2) for near field ptychography. (ii) incorporate longitudinal coherence when using broadband illumination with chromatic aberrations of the probe, dispersion of the diffraction pattern (8), and spectral KramersKronig dispersion across resonances (14).
Vi Acknowledgments
This work was partially funded by the Center for Applied Mathematics for Energy Research Applications, a joint ASCRBES funded project within the Office of Science, US Department of Energy, under contract number DOEDEAC0376SF00098, and by the Advanced Light Source, which is a DOE Office of Science User Facility under contract no. DEAC0205CH11231.
Appendix A GdpAdmm
Once the variables are set, in the iteration the augmented Lagrangian is modified by adding a proximal term as follows:
where for simplicity we removed superscripts and subscripts, and is a positive semidefinite matrix defined by the approximated solution in the iteration. We remark that with the help of an additional proximal term , the subproblem in Step 2 admits a unique solution. Below we demonstrate how to solve each subproblem.
(20)  
Step 1 involves finding the first order stationary point of the augmented Lagrangian which has the following form:
(21) 
with symmetric sparse matrix defined as:
(22) 
and In this section, the gradient operator is given in a discrete setting for simplicity, where represent the forward finite difference operator with respect to directions.
Step 2 is similar to Step 1, the closed form solution can be expressed as:
(23) 
with diagonal matrix :
(24) 
Empirically, in order to avoid divisions by 0 when solving (23), set the matrix to a diagonal matrix with diagonal elements:
(25) 
Step 3 can be expressed as the following problem:
which can be solved independently with respect to each frame, hence one needs to consider:
where Similarly, it also has a closed form solution, see Eq. (17), Section III.
Step 4 requires solving a linear least square problem whose closed form solution is: