# Partially Coherent Ptychography by Gradient Decomposition of the Probe

## Abstract

Coherent ptychographic imaging experiments often discard over 99.9 % of the flux from a light source to define the coherence of an illumination. Even when coherent flux is sufficient, the stability required during an exposure is another important limiting factor. Partial coherence analysis can considerably reduce these limitations. A partially coherent illumination can often be written as the superposition of a single coherent illumination convolved with a separable translational kernel. In this paper we propose the Gradient Decomposition of the Probe (GDP), a model that exploits translational kernel separability, coupling the variances of the kernel with the transverse coherence. We describe an efficient first-order splitting algorithm GDP-ADMM to solve the proposed nonlinear optimization problem. Numerical experiments demonstrate the effectiveness of the proposed method with Gaussian and binary kernel functions in fly-scan measurements. Remarkably, GDP-ADMM produces satisfactory results even when the ratio between kernel width and beam size is more than one, or when the distance between successive acquisitions is twice as large as the beam width.

## I Introduction

Ptychography is a popular imaging technique in scientific fields as diverse as condensed matter physics (25), cell biology (11), materials science (24) and electronics (15), among others. In a coherent ptychography experiment (Figure 1), a localized coherent X-ray probe (or illumination) scans through an specimen , while the detector collects a sequence of phaseless intensities in the far field. The goal is to obtain a high resolution reconstruction of the specimen from the sequence of intensity measurements. In a discrete setting, is a 2D image with pixels, is a localized 2D probe with pixels ( and are both written as a vector by a lexicographical order), and is a stack of phaseless measurements  . Here represents the element-wise absolute value of a vector, denotes the element-wise multiplication, and denotes the normalized 2 dimensional discrete Fourier transform. Each is a binary matrix that crops a region of size from the image .

In practice, as the probe is almost never completely known, one has to solve a blind ptychographic phase retrieval problem or probe retrieval (26), as follows:

 To find ~{}ω∈C¯m~{}and~{}u∈Cn,  s.t.  |A(ω,u)|2=f, (1)

where bilinear operators and , are denoted as follows:

 A(ω,u):=(AT0(ω,u),AT1(ω,u),⋯,ATJ−1(ω,u))T,Aj(ω,u):=F(ω∘Sju),

and

Coherent ptychographic imaging experiments often rely on apertures to define a coherent illumination. Research institutions around the world are investing considerable resources to produce brighter x-ray sources in order to overcome this limitation. Meanwhile, most of the x-ray photons generated are currently discarded by secondary apertures. Even when there is enough coherent flux, the stability required during an exposure is often another limiting factor. Both flux and stability limitations can be reduced using partial coherence analysis (21); (7); (16).

First, we briefly review the existing algorithmic work for partial coherence. In (10) the authors applied a gradient descent phase retrieval algorithm from incoherently averaged illuminations to compute the aberration of the Hubble space telescope. In (5), the authors considered a constant quasi-monochromatic illumination on the sample (beam much larger than the sample), and derived a convolution based model using the mutual optical intensity as:

 fpc=f∗κ, (2)

where is the measured partial coherent intensity, is the coherent intensity, denotes the convolution operator, and is the kernel function (Fourier transform of the complex coherence function). Physically, the convolution kernel represents the combination of the detector response function and the angular spread of the illumination. It was solved by the alternating projection algorithm in (5), when is a Gaussian kernel function with free horizontal and vertical coherence length parameters. Burdet et al. (2) applied the above model to ptychographic imaging and proposed the Douglas-Rachford algorithm with a known Gaussian kernel function.

A more general description of a partially coherent wave field illuminating a specimen in a ptychography experiment was considered in (27), where the authors employed an orthogonal decomposition of the mutual coherence function (30) to describe partially coherent measurement as follows:

 fpc,j=∑L−1l=0|Aj(ωl,u)|2, ∀0≤j≤J−1, (3)

with orthogonal probes , where is the measured intensity. The extended ptychographic iterative engine (18) was adopted to solve such model (3). Experimental ’fly-scan’ data with translational blur was successfully reconstructed using (3) by (21); (7); (16). However, it is important to note that such blur is a special case of a model that has many more degrees of freedom. Moreover, the physical interpretation of the multiple modes is unclear, and relationship with the coherence function is indirect.

Other existing studies focus mostly on (2), which only addressed blurring coherent intensities at the detector. In practice, the blur is dominated by the source dimensions and by the translation of the probe with respect to the image during an exposure (21); (7). To the best of our knowledge, there is no algorithm in the literature to jointly recover the unknown image, probe, and coherence kernel function that exploits such property.

In this paper, we propose Gradient Decomposition of the Probe (GDP), a new forward model to characterize the partially coherent ptychography problem. The new model is based on coupling the experimental coherence widths with the variances of the kernel functions using the second order Taylor expansion to translate the probe. In the second part of this work we present GDP-ADMM: a novel fast iterative solver that jointly optimizes the image, the probe and the variance of the kernel function. The main benefits of the algorithm are listed below: \setlistnolistsep

[noitemsep]
1. The approximation accuracy for a general partial coherent source is increased, while providing its coherent properties.

2. Subproblems can be solved using low computational and memory resources, and usually have close form.

3. It is insensitive to the coherence kernel and scanning step sizes, and achieves high SNR even when the data is contaminated by Poisson noise.

Numerical experiments show that satisfactory results with GDP-ADMM can be obtained even when the ratio between kernel and beam widths is more than one, or when the distance between successive acquisitions is almost twice as large as the beam width (Full Width Half Maximum-FWHM).

## Ii Approximate forward model for partial coherence

Partially coherent illumination from standard microscopes can be written as the superposition of a single quasi-monochromatic coherent illumination convolved with separable angular and translational kernels (13); (17); (22); (6). Translational convolution of the illumination is equivalent to translational motion during a scan while exposing the detector (7). In other words, an extended incoherent source upstream of the lens can be viewed as the superposition, photon-by-photon, of a rapidly moving source demagnified by the lens onto the image plane. The demagnified source defines the degree of coherence of the probe, or the blurring kernel. Coherence and vibrations kernels can be combined into one, such that partially coherent ptychography imaging with coherence kernel function in a continuous setting (same notations as in the discrete setting) is formulated as:

 ∫|Fx→q(Sju(x)ω(x−y))|2κ(y)dy=fpc,j(q), (4)

where is the measured partial coherent intensity and is the normalized Fourier transform. When setting to a binary function, the above model is exactly the same as fly-scan ptychography (21); (7); (16); Setting to the Dirac delta function reduces it to the coherent model (1). We remark that (4) is quite different from (2), since (4) illustrates the effects of blurring of images with respect to the probe, while (2) can be interpreted as blurring or binning multiple pixels at the detector.

Generally speaking, solving Eq. (4) is a non-linear ill-posed problem with an unknown kernel and there is no fast method to even compute the integral on the right hand side with known kernel, probe and images. In (20); (28), the authors considered probes translated with weights :

 fpc,j=L−1∑l=0wl|Aj(Tlω,u)|2, ∀0≤j≤J−1, (5)

with the translation operators and partially coherent intensity However, those methods in (20); (28) cannot be directly applied to (4), since either the weights (28) or the probe (20) are assumed to be known in advance. Moreover, as coherence degrades, or the number of probes increases, their computation and memory costs would increase dramatically. Hence instead of solving (4) directly as in (20), in the following sections we will reformulate the above model (4), and solve the related nonlinear optimization problem.

### ii.1 Gradient Decomposition of the Probe (Gdp)

Following (4) and using the Taylor expansion of , one has:

 fpc,j(q)=∫∣∣Fx→q(Sju(x)(ω(x)−yT∇ω(x)+12yT∇2ω(x)y+O(|y|3)))∣∣2κ(y)dy,

where we assume that the third order derivatives are uniformly bounded, e.g.

 max{α∈N2: |α|=3}maxx∈C2|∂αω(x)|≤C, (6)

with a positive constant It is easy to verify that this condition is satisfied when the illumination is generated by a small lens (with small aperture).

Consider a kernel function characterized by its moment expansion , normalized with , center of mass , and second order moments , , and (). We further assume that:

 mi1,i2=0,if~{}mod(i1+i2,2)=1. (7)

The above relation holds if the kernel function is centro-symmetric with respect to the origin, i.e. . For higher order moments, we also assume that

 ∫|y|kκ(y)=O(max{σ1,σ2}k),k≥3. (8)

Therefore, one has:

 fpc,j(q)=|Aj(ω+12(σ21∇11ω+σ22∇22ω+2σ12∇12ω),u)|2+σ21|Aj(∇1ω,u)|2+σ22|Aj(∇2ω,u)|2+O(∫|y|3κ(y)dy). (9)

More details of the above derivation can be found in Appendix B. In order to further simplify the partial coherence approximation we introduce a new variable (variance adjusted probe):

 ~ω:=ω+12(σ21∇11ω+σ22∇22ω+2σ12∇12ω),

and nonlinear operator :

 Gj(ω,u,σ):=|Aj(ω,u)|2+σ21|Aj(∇1ω,u)|2+σ22|Aj(∇2ω,u)|2, (10)

where . Neglecting high order terms , following (9), we obtain the approximate forward model:

 GDP:  fpc,j≈Gj(~ω,u,σ). (11)

#### Remarks:

• The above assumptions hold if is a Gaussian kernel function, which well approximates standard light sources such as synchrotrons (17); (22); (6), SASE FELs (23), and others. When the above formula (11) reduces to the coherent model in (1).

• The variance adjusted probe incorporates second order effects (such as the covariance ) into the illumination, while reducing the cost of computing the intensities using the forward model (11) compared to (5).

• If the probe has a support by a lens of finite size, then the same support applies to the variance adjusted probe due to the Fourier transform relationship:

 (F~ω)(q)=F((1+12σ21∇11+12σ22∇22+σ12∇12)ω)(q)=(1−12σ21q21−12σ22q22−σ12q1q2)(Fω)(q),∀ q=(q1,q2)∈R2.
• Similarly to the decomposition model (3), GDP has three different modes . The difference between them is that in the GDP model, two modes can be expressed by the first mode explicitly, while the multiple modes in the decomposition model (3) are only assumed to be orthogonal to each other (30); (27). Such orthogonal constraint is essentially nonconvex, which is more difficult to handle and leads to more local minima as well.

## Iii Fast iterative algorithm: Gdp-Admm

The amplitude based nonlinear optimization model can be established as:

 min~ω,u,σ12J−1∑j=0∥∥√fpc,j−√Gj(~ω,u,σ)∥∥2, (12)

where denotes the standard norm in Euclidean space. We remark that by introducing the new variable , it is much easier to solve the subproblems of the nonlinear optimization model (12) with respect to than using the original formula (9).

The GDP based nonlinear optimization model (12) is nonconvex and non-differentiable. We are interested in designing a fast first order algorithm, whose subproblems can be easily implemented. The Alternating Direction Method of Multipliers (ADMM) (12) has been successfully applied to large scale nonlinear and non-differential optimization problems arising from machine learning or computer vision, among other areas. The connection between ADMM, the Douglas-Rachford algorithm, and the popular Hybrid Input Output (9) for classical phase retrieval was discussed in (1); (29). By introducing auxiliary variables , one has:

 min~ω,u,σ,zl,j12J∑j=0∥∥ ∥∥√fpc,j− ⎷2∑l=0|zl,j|2∥∥ ∥∥2,s.t.z0,j=Aj(~ω,u),z1,j=σ1Aj(∇1~ω,u),z2,j=σ2Aj(∇2~ω,u), 0≤j≤J−1. (13)

The corresponding augmented Lagrangian reads as:

 Lr(~ω,u,{zl}2l=0,σ,{Λl}2l=0):=12∑j∥∥∥√fpc,j−√∑l|zl,j|2∥∥∥2+rR(⟨z0−A(~ω,u),Λ0⟩)+r2∥z0−A(~ω,u)∥2+rR(⟨z1−A(σ1∇1~ω,u),Λ1⟩)+r2∥z1−A(σ1∇1~ω,u)∥2+rR(⟨z2−A(σ2∇2~ω,u),Λ2⟩)+r2∥z2−A(σ2∇2~ω,u)∥2, (14)

with multipliers , , and the positive parameter where denotes the standard inner product in complex Euclidean space, and denotes the real part of a complex number. Briefly, the GDP-ADMM algorithm alternates minimization of the above augmented Lagrangian with respect to the variables , updates the multipliers , and repeats. In our previous work, ADMM was applied to coherent ptychographic imaging in (29); (4). Here we propose a new, more robust variant of ADMM. It employs an additional proximal term added to the augmented Lagrangian: to avoid division by zeros when minimizing with respect to . The detailed description of GDP-ADMM can be found in Algorithm 1 (further details in Appendix A). The numerical results can be found in the following section. For simplicity, the gradient operator in the numerical section is considered in a discrete setting, using the forward and finite difference operator with respect to directions.

Theoretical analysis about the convergence properties of this blind algorithm (to refine the probe, the image, and the coherence function variances during iterations) is likely to be very challenging. However, if the variances are known or fixed, convergence to the stationary point of the nonlinear optimization model (12) can be guaranteed by assuming that the iterative sequence is bounded and the parameter is sufficiently large (4).

Algorithm 1: GDP-ADMM Initialization: Set maximum iteration number parameters and . output:x and for to Compute by solving the linear system: (15) using conjugate gradient method, where , and are given in Eq. (22) (Appendix A). Compute as (16) with diagonal matrices , and vector defined in Eq. (24), and (25) (see Appendix A). Compute as: (17) with
Compute as: Update the multipliers as

## Iv Numerical experiments

The experimental setup of this section is introduced next. The reference specimen used is a complex valued image “Goldballs” (19), of size pixels (Figure 2). The probe is an Airy disk of size pixels (Figure 2).

We compare the proposed GDP-ADMM (Appendix A) with the “fully-coherent” model by ADMM (FC-ADMM (4)). Both algorithms are executed with a maximum iteration number of 300. We measure the quality of recovery images by relative residuals as:

 efc=∥∥√f−|A(ωk,uk)|∥∥1∥∥√f∥∥1,epc=∥∥∥√f−√G(~ωk,uk,σk)∥∥∥1∥∥√f∥∥1,

for FC-ADMM and GDP-ADMM respectively, where are the iterative solutions, and is the measured intensity. The signal-to-noise ratio (SNR) is also measured:

 SNR(uk,utrue)=−20minc∈Clog∥uk−c utrue∥∥uk∥,

where is the ground truth of the image.

We show the performance of GDP-ADMM with the following Gaussian kernel function:

 κ(y):=12πσ1σ2exp(−y212σ21−y222σ22). (18)

The discrete truncated Gaussian matrix is generated by fspecial in Matlab with variances and support sizes . We remark that the kernel width is .

Simulated data is generated using raster grid scanning with sliding distance pixels (slightly bigger than the beam width) as a default, and we incorporate the support constraint of the lens as in (19). The parameter is selected manually with default value 0.2, and .

#### Noiseless data.

Following (4), the partially coherent intensity in a discrete setting is generated as

 fpc,j=∑iκi∣∣F((Tiω)∘Sju)∣∣2, (19)

with translation operator , discrete Gaussian weights , and periodical boundary condition for the probe. We conduct the first numerical experiment to explore the performance of the proposed algorithm with respect to different degrees of partial coherence by varying the variances , while keeping the beam width constant (the smaller the variances, the more coherent the data).

The reconstructed images can be seen in Figure 3, and the accuracy of the reconstructed results can be seen in Table 1. Figure 3 shows significant improvements when using GDP-ADMM compared to FC-ADMM. When coherence is very low (4th column), the visual quality of FC-ADMM drops, and small-scale features are completely lost. In comparison, with the same configuration, GDP-ADMM can still produce images containing significantly sharper large- and small-scale features. However the results by GDP-ADMM seem blurry when the kernel variance becomes too large, as can be seen from the fourth column of Figure (3). Table (1) shows the enhanced accuracy of GDP-ADMM compared to the coherent model: residuals are at least 50% percent smaller, and images with SNRs about twice as high. We also include the results of the recovered probe with its modes in Figure 3.

#### Noisy data.

We consider Poisson noisy measurements as

 fnoisepc,j=1γPoisson(γfpc,j),

where is used to control the noise level, and is the clean partial coherent intensity data defined by (19). Figure 4 shows the recovered images at different noise levels. It can be seen that, as the noise level decreases (increasing ), the quality of the results increases. We remark that when , the image in Figure 4 (a) is very blurry, which implies that it is more challenging to recover high quality images from partially coherent data contaminated by strong noise.

#### Parameter r and sliding distances.

The following experiment is conducted to study the influence of parameter by varying . See the convergence curve in Figure 5, where the change histories of the relative errors and SNRs with respect to iteration numbers are reported. One can readily see that with smaller parameter , the algorithm tends to be unstable (see Figure 5 (b)). It is consistent with the condition for convergence guarantee in (4), where the parameter should be sufficiently large to make the augmented Lagrangian monotonically decrease. If the parameter is too big, the iterative solution could be trapped into the unsatisfactory local minima. Therefore, to gain optimal performance, a moderate should be used. We remark that although the parameter is selected manually, it can be applied to diverse cases with different degrees of partial coherence and sliding distances when fixing the kernel function.

In Figure 6 we conduct the tests with different sliding distances , which determine the redundancy levels of the data. On one hand, inferred from the reported results, scanning with a smaller step size or will help to increase the quality of the images. On the other hand, when ( almost twice as large as the beam width), GDP-ADMM can still produce satisfactory results with clear small-scale features, showing the robustness of GDP-ADMM with respect to the redundancy of the measured data.

#### Proximal term.

In the following experiment we show the effect of the proximal term. We conduct tests comparing GDP-ADMM baseline with a variant that replaces the proximal term with a penalization term , i.e.

 uk+1=(Nk2+τrI)−1bk,

with identity matrix and when solving Eq. (24), Appendix A. See the results in Figure 7. One can readily observe that with the help of the proximal term, the recovered images have cleaner boundaries (Figure 7 (b)) and higher SNR values (Figure 7 (d)). Moreover, GDP-ADMM is speeded up when using the proximal term, as shown by the convergence of the relative errors in Figure 7 (c).

#### Different kernels.

Previous experiments reported in this section are based on the Gaussian kernel function. Here we assess the performance of GDP-ADMM when using a binary kernel function for fly-scan ptychography (7). Figure 8 shows the reconstructed images obtained by varying the kernel size . Similarly to the case of Gaussian kernel function, GDP-ADMM displays a significant increase in visual quality. In the first row of Figure 8, the results produced by FC-ADMM are completely blurry. On the other hand, GDP-ADMM (second row of Figure 8) achieves much sharper overall results. One can also see the obvious decrease of the residual and increase of SNRs by GDP-ADMM in Table 2.

Similar improvements by GDP-ADMM can be obtained with other motion blur type kernel functions, however, due to page limitations, we do not provide further results. These results show that GDP-ADMM can be applied to partial coherence problems with more general kernel function.

#### Runtime and memory performance

We further investigate the change histories of the SNRs for FC-ADMM and GDP-ADMM with respect to elapsed the time, and report the results in Figure 9. The SNR histories show that, FC-ADMM can recover better images in the first 50 seconds, but GDP-ADMM improves the image further after that. Hence, in order to accelerate the GDP-ADMM algorithm, we could use the iterative solution of FC-ADMM as the initialization for GDP-ADMM. We also emphasize that the runtime and memory requirements GDP-ADMM are insensitive to the variances and the support sizes of the kernel functions. It is important to note that, if we solve the problem directly following (20), the probe and the weights in (5) in the case of the setting in Figure 3 (a)-(b), at most translated probes for should be introduced, which requires much more memory and computation costs.

## V Conclusions

In this paper, we propose Gradient Decomposition of the Probe (GDP), an efficient model that exploits translational kernel separability. The GDP model increases the approximation accuracy compared with the coherent model, and it holds for a general partial coherent source. We derive an optimization model coupling the variances of the kernel with the transverse coherence widths, and a fast and memory efficient proximal first order GDP-ADMM algorithm to solve the nonlinear optimization problem based on GDP. Numerical experiments demonstrate the effectiveness of such approximation in the case of Gaussian kernel function, and binary functions in fly-scanning schemes, showing that the proposed methods are suitable for general partially coherent sources.

However the results by GDP-ADMM seem blurry when inferred from the fourth column of Figure 3, and further improvement may be achieved by considering sparse prior, or using piecewise Taylor expansion series or other integration schemes. This will be subject of future work. Additional research directions to extend this work are: (i) consider partial coherence with framewise different variances when using scan geometries on irregular grids or spirals, (ii) combine translational blur at the sample with detector blur in Eq. (2) for near field ptychography. (ii) incorporate longitudinal coherence when using broadband illumination with chromatic aberrations of the probe, dispersion of the diffraction pattern (8), and spectral Kramers-Kronig dispersion across resonances (14).

## Vi Acknowledgments

This work was partially funded by the Center for Applied Mathematics for Energy Research Applications, a joint ASCR-BES funded project within the Office of Science, US Department of Energy, under contract number DOE-DE-AC03-76SF00098, and by the Advanced Light Source, which is a DOE Office of Science User Facility under contract no. DE-AC02-05CH11231.

## Appendix A Gdp-Admm

Once the variables are set, in the iteration the augmented Lagrangian is modified by adding a proximal term as follows:

 ~Lkr(~ω,u,z,σ,Λ)=Lr(~ω,u,z,σ,Λ)+τ2∥u−uk∥2Mk,

where for simplicity we removed superscripts and subscripts, and is a positive semi-definite matrix defined by the approximated solution in the iteration. We remark that with the help of an additional proximal term , the subproblem in Step 2 admits a unique solution. Below we demonstrate how to solve each subproblem.

 Step 1:~{}~ωk+1=argmin~ωLr(~ω,uk,{zkl},σk,{Λkl}); (20) Step 2:~{}uk+1=argminu~Lkr(~ωk+1,u,{zkl},σk,{Λkl}); Step 4:~{}{zk+1l}2l=0=argmin{zl}Lr(~ωk+1,uk+1,{zl},σk,{Λkl}); Extra open brace or missing close brace Step 6:~{}Λk+10=Λk0+(zk+10−A(~ωk+1,uk+1)); ~{}~{}~{}~{}~{}~{}~{}\quadΛk+11=Λk1+(zk+11−A(σk+11∇1~ωk+1,uk+1)); ~{}~{}~{}~{}~{}~{}~{}\quadΛk+12=Λk2+(zk+12−A(σk+12∇2~ωk+1,uk+1));

Step 1 involves finding the first order stationary point of the augmented Lagrangian which has the following form:

 Nk1~ωk+1−ck=0, (21)

with symmetric sparse matrix defined as:

 Nk1:=diag(∑j∣∣Sjuk∣∣2)+2∑l=1(σkl)2∇Tl(diag(∑j|Sjuk|2)∇l),ck:=∑j(Sj(uk)∗∘¯zk0,j+2∑l=1σkl∇Tl(Sj(uk)∗∘¯zkl,j)), (22)

and In this section, the gradient operator is given in a discrete setting for simplicity, where represent the forward finite difference operator with respect to directions.

Step 2 is similar to Step 1, the closed form solution can be expressed as:

 (Nk2+τrMk)uk+1−(bk+τrMkuk)=0, (23)

with diagonal matrix :

 Nk2:=diag(∑jSTj(|~ωk+1|2+2∑l=1σ2l|∇l~ωk+1|2)),bk:=∑jSTj((~ωk+1)∗∘¯zk0,j+2∑l=1σkl∇l(~ωk+1)∗∘¯zkl,j). (24)

Empirically, in order to avoid divisions by 0 when solving (23), set the matrix to a diagonal matrix with diagonal elements:

 Mk(l,l):={0,if~% {}Nk2(l,l)≥110∥diag(Nk2)∥∞;∥diag(Nk2)∥∞−rτNk2(l,l),otherwise. (25)

Step 3 can be expressed as the following problem:

 min{zl,j}0≤j≤J−10≤l≤2∑j12∥∥√fpc,j−√∑l|zl,j|2∥∥2+r2(∥z0+Λ0−A(~ω,u)∥2+∥z1+Λ1−A(σ1∇1~ω,u)∥2+∥z2+Λ2−A(σ2∇2~ω,u)∥2),

which can be solved independently with respect to each frame, hence one needs to consider:

 min{zl,j}2l=012∥∥√fpc,j−√∑l|zl,j|2∥∥2+r2∑l∥zl,j−~zl,j∥2 ∀0≤j≤J−1,

where Similarly, it also has a closed form solution, see Eq. (17), Section III.

Step 4 requires solving a linear least square problem whose closed form solution is:

 σl=R(⟨zl+Λl,A(∇l~ω,u)⟩)∥A(∇l~ω,u)∥