Proximal Heterogeneous Block InputOutput Method and application to Blind Ptychographic Diffraction Imaging
Abstract
We propose a general alternating minimization algorithm for nonconvex optimization problems with separable structure and nonconvex coupling between blocks of variables. To fix our ideas, we apply the methodology to the problem of blind ptychographic imaging. Compared to other schemes in the literature, our approach differs in two ways: (i) it is posed within a clear mathematical framework with practically verifiable assumptions, and (ii) under the given assumptions, it is provably convergent to critical points. A numerical comparison of our proposed algorithm with the current stateoftheart on simulated and experimental data validates our approach and points toward directions for further improvement.
Abstract
Keywords: Alternating minimization, deconvolution, KurdykaŁojasiewicz, nonconvexnonsmooth minimization, ptychography.
1 Introduction
We consider algorithms for nonconvex constrained optimization problems of the following form
(1.1) 
Here (that is, the constraints apply to disjoint blocks of variables) and is a nonlinear penalty function characterizing the coupling between the blocks of variables. It will be convenient to reformulate problem (1.1) using indicator functions. The indicator function of a set is defined as for and for . Define
(1.2) 
An equivalent formulation of (1.1) is the formally unconstrained nonsmooth optimization problem
(1.3) 
Algorithms for solving (1.1) or (1.3) typically seek only to satisfy firstorder necessary conditions for optimality, and the algorithm we propose below is no different. These conditions are given compactly by
(1.4) 
where is a set, the subdifferential, that generalizes the notion of a gradient for nonsmooth, subdifferentially regular functions defined precisely in Definition 3.1 below. For the sake of fixing the ideas, we focus on the particular application of blind ptychography, however our goal and approach are much more general. The partially smooth character of the objective in (1.2) is a common feature in many optimization models which involve sums of functions, some of which are smooth and some of which are not. Forwardbackwardtype algorithms are frequently applied to such models, and our approach is no different. The particular threeblock structure of the problem is easily generalized to blocks. The crucial feature of the model for algorithms, and what we hope to highlight in the present study, is the quantification of continuity of the partial gradients of with respect only to blocks of variables. This is in contrast to more classical approaches which rely on the continuity of with respect to all the variables simultaneously (see [4]). For the ptychography application, such a requirement prohibits a convergence analysis along the lines of [4] since the gradient is not Lipschitz continuous. However, the partial gradients with respect to the blocks of variables are Lipschitz continuous. Following [9], this allows us to prove, in Section 3, convergence of the blocked algorithm given below (Algorithm 2.1) to feasible critical points. Our abstract formulation of the blind ptychography problem can be applied to many different applications including control, machine learning, and deconvolution. We do not attempt to provide a review of the many different approaches to these types of problems, or even a more focused review of numerical methods for ptychography, but rather to provide a common theoretical framework by which a variety of methods can be understood. Our focus on ptychography is due to the success of two algorithms, one by Maiden and Rodenburg [18] and the other due to Thibault and collaborators [24]. These two touchstone methods represent, for us, fundamental computational methods whose structure serves as a central bifurcation in numerical strategies. Moreover, the prevalence of these two methods in practice ensures that our theoretical framework will have the greatest practical impact. (Which is not to say that the methods are the most efficient [19, 23].) We present an algorithmic framework in Section 2 by which these algorithms can be understood and analyzed. We present in Section 3 a theory of convergence of the most general Algorithm 2.1 which is refined with increasingly stringent assumptions until it achieves the form of Algorithm 3.4 that can be immediately applied to ptychography. The specialization of our algorithm to ptychography is presented in Section 4 and summarized in Algorithm 4.1. We compare, in Section 5, Algorithm 4.1 with the stateoftheart on simulated and experimental data.
2 Algorithms and Modeling
The solution we seek is a triple, that satisfies a priori constraints, denoted by , as well as a model characterizing the coupling between the variables. We begin naively with a very intuitive idea for solving (1.1): alternating minimization (AM) with respect to the three separate blocks of variables , and . More precisely, starting with any , we consider the following algorithm:
(2.1a)  
(2.1b)  
(2.1c) 
While the simplicity of the above algorithm is attractive, there are several considerations one must address:

The convergence results for the AM method are limited and applicable only in the convex setting. It is unknown if the AM method converges in the nonconvex setting. Of course, in the general nonconvex setting we can not expect for convergence to global optimum but even convergence to critical points is not known. In [3] the authors prove convergence to critical points for a regularized variant of alternating minimization. We follow this approach in Algorithm 2.1 below, applying proximal regularization in each of the steps to obtain provable convergence results.

Each one of the steps of the algorithm involves solving an optimization problem over just one of the blocks of variables. Forwardbackwardtype methods are common approaches to solving such minimization problems [10], and can be mixed between blocks of variables [9]. Forward operators are typically applied to the illposed or otherwise computationally difficult parts of the objective (usually appearing within the smooth part of the objective) while backward operators are applied to the wellposed parts of the objective (appearing often in the nonsmooth part).
In the particular case of ptychography, the subproblems with respect to the and variables (steps (2.1a) and (2.1b) of the algorithm) are illposed. We handle this by applying a regularized backward operator (the prox operator) to blockwiselinearizations of the illposed forward steps. The objective is wellposed, and particularly simple, with respect to the third block of variables, , so we need only employ a backward operator for this step. Generalizing, our approach addresses issue (ii) above by handling each of the blocks of variables differently. Our presentation of Algorithm 2.1 makes use of the following notation. For any fixed and , the function is continuously differentiable and its partial gradient, , is Lipschitz continuous with moduli . The same assumption holds for the function when and are fixed. In this case, the Lipschitz moduli is denoted by . Define where is an arbitrary positive number. Similarly define where is an arbitrary positive number.
Algorithm 2.1 (Proximal Block ImplicitExplicit Algorithm).
Initialization. Choose , and . General Step () Set and select (2.2) Set and select (2.3) Select (2.4)2.1 Blind Ptychography
In scanning ptychography, an unknown specimen is illuminated by a localized electromagnetic beam and the resulting wave is recorded on a CCD array somewhere downstream along the axis of propagation of the wave (i.e., in the far field or the near field of the object). A ptychographic dataset consists of a series of such observations, each differing from the others by a spatial shift of either the object or the illumination. In the original ptychographic reconstruction procedure [13] it was assumed that the illuminating beam was known. What we call blind ptychography, in analogy with blind deconvolution, reflects the fact that the beam is not completely known, this corresponds to what is commonly understood by ptychography in modern applications [21, 22, 18, 24]. Here the problem is to simultaneously reconstruct the specimen and illuminating beam from a given ptychgraphic dataset. We will treat the case of scanning xray ptychography with far field measurements. This is not exhaustive of all the different settings one might encounter, but the mathematical structure of the problem, our principal interest, is qualitatively the same for all cases. For a review of ptychothographic methods, see [1] and the reference herein. We formulate the ptychography problem on the product space where the first block corresponds to the model space for the probe, the second block corresponds to the model space for the specimen, and the third block corresponds to the model space for the data/observations. The physical model space equipped with the real inner product is isomorphic to the Euclidean space with the inner product for . This is in fact how complex numbers are represented on a computer, and hence the model space with real inner product is just an efficient shorthand for with the standard inner product for such product spaces. We will therefore retain the complex model space with real inner product when describing this problem, noting that all linear operators on this space have analogues on the space . The theory, however, will be set on real finite dimensional vector spaces. Denote with (). The objective function in our general optimization problem (1.1), , is given by
(2.5) 
Here denotes th shift operator which shifts the indexes in in some prescribed fashion and is the elementwise Haadamard product. This function measures in some sense the distance to the set
(2.6) 
Let , , denote the experimental observations, and be a D discrete Fourier transform (of dimension ) rearranged for vectors on . The constraints , , and are separable and given by
(2.7a)  
(2.7b)  
(2.7c) 
The qualitative constraints characterized by and are typically a mixture of support, supportnonnegativity or magnitude constraints corresponding respectively to whether the illumination and specimen are supported on a bounded set, whether these (most likely only the specimen) are “real objects” that somehow absorb or attenuate the probe energy, or whether these are “phase” objects with a prescribed intensity but varying phase. A support constraint for the set , for instance, would be represented by
(2.8) 
where is the index set corresponding to which pixels in the field of view the probe beam illuminates and is some given amplitude. A mixture of support and amplitude constraints for the set would be represented by
(2.9) 
where the index set is the analogous index set for the support of the specimen, and are lower/upper bounds on the intensity of the specimen. The set is nothing more than the phase set appearing in feasibility formulations of the phase retrieval problem [16].
Remark 2.1 (Feasibility versus minmization).
Since the algorithms discussed below involve, at some point, projections onto these sets, it is worthwhile noting here that, while, in most applications, the projections onto the sets , and have a closed form and can be computed very accurately and efficiently, we are unaware of any method, analytic or otherwise, for computing the projection onto the set defined by (2.6). For this reason, we have avoided formulation of the problem as a (nonconvex) feasibility problem
Nevertheless, this essentially twoset feasibility model suggests a wide range of techniques within the family of projection methods, alternating projections, averaged projections and Douglas–Rachford being representative members. In contrast to these, our approach is essentially a forwardbackward method that avoids the difficulty of computing a projection onto the set by instead minimizing a nonnegative coupling function that takes the value (only) on .
3 Algorithm Analysis
3.1 Mathematical Preliminaries
Algorithm 2.1 consists of three steps all of which reduce to the computation of a projection onto a given constraint set (convex and nonconvex). Recall that the projection onto a nonempty and closed subset of a Euclidean space , is the (setvalued) mapping defined by
(3.1) 
In Euclidean spaces the projection is single valued if and only if is convex (in addition to being nonempty and closed). Specializing to the present application, for the constraints specified by (2.7), where , and are, in general, multivalued (consider ). Since we are dealing with nonsmooth and nonconvex functions that can take the value , we require the following generalization of the derivative for nonconvex functions.
Definition 3.1 (Subdifferential [20]).
Let be proper (not everywhere infinite) and lower semicontinuous (lsc).

The regular or Fréchet subdifferential of at , denoted , is the set of vectors which satisfy
(3.2) If then .

The limiting subdifferential of at , denoted , is the set of limits of limiting subdifferentials:
We say that is subdifferentially regular at if , and subdifferentially regular (without reference to the point ) if it is subdifferentially regular at every point in .
The notion of regularity of a set can be understood in terms of the subdifferential regularity of the indicator function of that set. We will call a set Clarke regular if the corresponding indicator function is subdifferentially regular (see [20, Definition 6.4]). In [9], Bolte et al. present a general procedure for determining convergence to critical points of generic algorithms for nonsmooth and nonconvex problems. The procedure consists of verifying three criteria, two of which are quite standard and shared by most descent algorithms, see e.g., [4]. The third criterion depends not on the algorithm but on the objective function: it must satisfy the KurdykaŁojasiewicz (KL) property (see [7, 8] and the references therein).
Definition 3.2 (KurdykaŁojasiewicz property).
Let be proper and lower semicontinuous. For define
(3.3) 
The function is said to have the KurdykaŁojasiewicz (KL) property at if there exist , a neighborhood of and a function , such that, for all
the following inequality holds
(3.4) 
If satisfies property (3.4) at each point of , then is called a KL function.
For a given function, the KL property can be verified indirectly by checking membership to certain classes of functions, in particular the class of semialgebraic functions [7]. For the convenience of the reader, we recall here the definition of semialgebraic functions.
Definition 3.3 (Semialgebraic sets and functions).

A is (real) semialgebraic if there exists a finite number of real polynomial functions such that

A function is semi algebraic if its graph
is a semialgebraic subset of .
The class of semialgebraic sets is stable under the following operations: finite unions, finite intersections, complementation and Cartesian products. For a thorough catalog of semialgebraic functions and sets see [2, 3, 4, 9] and the references therein. While it may not be obvious how directly to verify the KL property it is easy to determine whether a function is semialgebraic. The remarkable fact about semialgebraic functions is that, as long as they are lower semicontinuous, they automatically satisfy the KL property on their domain as stated in the following result.
Theorem 3.1 ([7, Th. 3.3, pg. 1215]).
Let be a proper lower semicontinuous function. If is semialgebraic then it satisfies the KL property at any point in .
We now show that the ptychography problem, and hence the phase retrieval problem, is semialgebraic, i.e., both the objective function and the constraint set are semialgebraic.
Proposition 3.2 (Blind ptychography and phase retrieval are semialgebraic).
Proof sketch.
The physical model is formulated with complexvalued vectors, but with the real inner product is isomorphic to the Euclidean space with the inner product for . The function defined by (2.5) is finite everywhere, continuous (indeed, differentiable), and the level sets of the objective are quadratics with respect to and , and quadratic with respect to under linear transformations. Thus is semialgebraic. The sets and are either subspaces (support constraint only, (2.8)) or the intersection of a subspace with a box or ball (support and nonnegativity or support and amplitude constraints (2.9)), and so both of these are nonempty semialgebraic. The set is equivalent to an amplitude constraint in the image space of the linear mapping with respect to the 1norm on each twodimensional component of the product space . Thus is also nonempty semialgebraic. That defined by (1.2) is then a KLfunction for these , , , and then follows from Theorem 3.1. ∎
3.2 Convergence Analysis
Our convergence analysis is centered on Theorem 3.7, a general result concerning the application of Algorithm 2.1 to problem (1.4). The specialization to the ptychography problem, Proposition 4.1, is then easily achieved by verifying that the assumptions of a refinement, Theorem 3.10, are satisfied. Following [9], we carry out the threestep procedure, outlined in Section 3.1, for proving convergence of the sequence , generated by Algorithm 2.1, to a point satisfying (1.4) provided that the initial point . The analysis rests on the following assumptions, collected here to avoid repetition.
Assumption 1.
Let be iterates of Algorithm 2.1 for . , , and are nonempty and closed. is differentiable on and . Moreover, and (as defined above) are Lipschitz continuous with moduli and , respectively. The gradient of , , is Lipschitz continuous on bounded domains in . Moreover, there exists such that (3.5) The iterates are bounded. The function defined by (1.2) is a KL function (see Definition 3.2).Remark 3.1.
In Section 4, we will show that our ptychography model (described in Section 2.1) satisfies these assumptions. For the general setting we point out the following.

Boundedness of the iterates Assumption 1(iv)) is a strong assumption that can be handled by a more technical treatment than we would like to present here. For our purposes this can be guaranteed by the physically natural assumption that the constraint set is bounded. Since Algorithm 2.1 is a feasible point algorithm, all iterates belong to the bounded feasible set, hence, in this case, the iterates are bounded.

Combining Assumptions 1(ii) and (iii) do not guarantee that the gradient is globally Lipschitz, as is the case in the application described below (see Section 4). The inequalities in (3.5) could be obtained in several scenarios, for example, when is and using the boundedness assumption Assumption 1(iv).
We begin with a technical lemma.
Lemma 3.3 (Sufficient decrease property).
Let be a continuously differentiable function with gradient assumed to be Lipschitz continuous and let be a nonempty and closed subset of . Fix any . Then, for any and for defined by
we have
Proof.
The result follows from [9, Lemma 2] where the nonsmooth function is the indicator function of the nonempty closed set . ∎
Remark 3.2.
Proposition 3.4 (Sufficient decrease).
Proof.
We apply Lemma 3.3 to the first subproblem (see (2.2)) as follows. Take , and to obtain that
where the second inequality follows from the fact that and the last inequality follows from the fact that and . Similarly, applying Lemma 3.3 to the second subproblem (see (2.3)) with , and yields
On the other hand, immediately from the third updating rule (see (2.4)) we get that
Summing up all these inequalities yields
Denote . Thus
(3.6) 
This proves that the sequence is decreasing. Since, in addition, we know that is bounded from below (see Assumption 1(ii)), we thus have a decreasing sequence on a compact interval and it follows that converges to some . Summing up this inequality, for , yields
(3.7) 
where the last inequality holds true since . Taking the limit as yields boundedness of the sum of steplengths and completes the proof. ∎
Before proving the second step, we obtain the following immediate consequence.
Corollary 3.5 (Rate of asymptotic regularity).
Proof.
Proposition 3.6 (Lipschitz paths).
Proof.
Let be a positive integer. Writing the optimality condition of the first updating rule yields
where . Hence
It is clear from the definition of (see (1.2)), that
Combining these two facts proves that . Following the same arguments applied on the second updating rule yields the desired result that . Now, writing the optimality condition of the third updating rule yields
for , hence . We begin with an estimation of the norm of . From Assumption 1(iii) and (iv), there exists such that
Thus, from the definition of we obtain
where the second inequality follows from Assumption 1(iii). A similar argument yields
Thus
This proves the desired result. ∎
We are now ready to prove the main result of this section, namely convergence of Algorithm 2.1 to points satisfying (1.4) for any initial point . It is in deducing the last step of the general case that we use the assumption that satisfies the KL inequality (3.4).
Theorem 3.7 (Convergence to critical points).
3.3 Acceleration of the PFB method
In this section we develop an accelerated version of Algorithm 2.1. To motivate our approach, we return to the naive alternating minimization method (2.1) with which we began. If each of the blocks were themselves separable, then we could recursively apply the blocking strategy discussed in Section 2 within the blocks. We first detail recursive blocking, which improves the step sizes, and then we discuss additional structures that enable efficient implementations via parallelization. For simplicity, we focus our discussion on the first block , the same strategy also can (and will) be applied to the block . Suppose the block can be further subdivided into a product of smaller blocks: with . For fixed and we consider the problem (