Proximal Heterogeneous Block Input-Output Method and application to Blind Ptychographic Diffraction Imaging

Proximal Heterogeneous Block Input-Output Method and application to Blind Ptychographic Diffraction Imaging

Abstract

We propose a general alternating minimization algorithm for nonconvex optimization problems with separable structure and nonconvex coupling between blocks of variables. To fix our ideas, we apply the methodology to the problem of blind ptychographic imaging. Compared to other schemes in the literature, our approach differs in two ways: (i) it is posed within a clear mathematical framework with practically verifiable assumptions, and (ii) under the given assumptions, it is provably convergent to critical points. A numerical comparison of our proposed algorithm with the current state-of-the-art on simulated and experimental data validates our approach and points toward directions for further improvement.

Abstract

Keywords: Alternating minimization, deconvolution, Kurdyka-Łojasiewicz, nonconvex-nonsmooth minimization, ptychography.

1 Introduction

We consider algorithms for nonconvex constrained optimization problems of the following form

(1.1)

Here (that is, the constraints apply to disjoint blocks of variables) and is a nonlinear penalty function characterizing the coupling between the blocks of variables. It will be convenient to reformulate problem (1.1) using indicator functions. The indicator function of a set is defined as for and for . Define

(1.2)

An equivalent formulation of (1.1) is the formally unconstrained nonsmooth optimization problem

(1.3)

Algorithms for solving (1.1) or (1.3) typically seek only to satisfy first-order necessary conditions for optimality, and the algorithm we propose below is no different. These conditions are given compactly by

(1.4)

where is a set, the subdifferential, that generalizes the notion of a gradient for nonsmooth, subdifferentially regular functions defined precisely in Definition 3.1 below. For the sake of fixing the ideas, we focus on the particular application of blind ptychography, however our goal and approach are much more general. The partially smooth character of the objective in (1.2) is a common feature in many optimization models which involve sums of functions, some of which are smooth and some of which are not. Forward-backward-type algorithms are frequently applied to such models, and our approach is no different. The particular three-block structure of the problem is easily generalized to blocks. The crucial feature of the model for algorithms, and what we hope to highlight in the present study, is the quantification of continuity of the partial gradients of with respect only to blocks of variables. This is in contrast to more classical approaches which rely on the continuity of with respect to all the variables simultaneously (see [4]). For the ptychography application, such a requirement prohibits a convergence analysis along the lines of [4] since the gradient is not Lipschitz continuous. However, the partial gradients with respect to the blocks of variables are Lipschitz continuous. Following [9], this allows us to prove, in Section 3, convergence of the blocked algorithm given below (Algorithm 2.1) to feasible critical points. Our abstract formulation of the blind ptychography problem can be applied to many different applications including control, machine learning, and deconvolution. We do not attempt to provide a review of the many different approaches to these types of problems, or even a more focused review of numerical methods for ptychography, but rather to provide a common theoretical framework by which a variety of methods can be understood. Our focus on ptychography is due to the success of two algorithms, one by Maiden and Rodenburg [18] and the other due to Thibault and collaborators [24]. These two touchstone methods represent, for us, fundamental computational methods whose structure serves as a central bifurcation in numerical strategies. Moreover, the prevalence of these two methods in practice ensures that our theoretical framework will have the greatest practical impact. (Which is not to say that the methods are the most efficient [19, 23].) We present an algorithmic framework in Section 2 by which these algorithms can be understood and analyzed. We present in Section 3 a theory of convergence of the most general Algorithm 2.1 which is refined with increasingly stringent assumptions until it achieves the form of Algorithm 3.4 that can be immediately applied to ptychography. The specialization of our algorithm to ptychography is presented in Section 4 and summarized in Algorithm 4.1. We compare, in Section 5, Algorithm 4.1 with the state-of-the-art on simulated and experimental data.

2 Algorithms and Modeling

The solution we seek is a triple, that satisfies a priori constraints, denoted by , as well as a model characterizing the coupling between the variables. We begin naively with a very intuitive idea for solving (1.1): alternating minimization (AM) with respect to the three separate blocks of variables , and . More precisely, starting with any , we consider the following algorithm:

(2.1a)
(2.1b)
(2.1c)

While the simplicity of the above algorithm is attractive, there are several considerations one must address:

  • The convergence results for the AM method are limited and applicable only in the convex setting. It is unknown if the AM method converges in the nonconvex setting. Of course, in the general nonconvex setting we can not expect for convergence to global optimum but even convergence to critical points is not known. In [3] the authors prove convergence to critical points for a regularized variant of alternating minimization. We follow this approach in Algorithm 2.1 below, applying proximal regularization in each of the steps to obtain provable convergence results.

  • Each one of the steps of the algorithm involves solving an optimization problem over just one of the blocks of variables. Forward-backward-type methods are common approaches to solving such minimization problems [10], and can be mixed between blocks of variables [9]. Forward operators are typically applied to the ill-posed or otherwise computationally difficult parts of the objective (usually appearing within the smooth part of the objective) while backward operators are applied to the well-posed parts of the objective (appearing often in the nonsmooth part).

In the particular case of ptychography, the subproblems with respect to the and variables (steps (2.1a) and (2.1b) of the algorithm) are ill-posed. We handle this by applying a regularized backward operator (the prox operator) to blockwise-linearizations of the ill-posed forward steps. The objective is well-posed, and particularly simple, with respect to the third block of variables, , so we need only employ a backward operator for this step. Generalizing, our approach addresses issue (ii) above by handling each of the blocks of variables differently. Our presentation of Algorithm 2.1 makes use of the following notation. For any fixed and , the function is continuously differentiable and its partial gradient, , is Lipschitz continuous with moduli . The same assumption holds for the function when and are fixed. In this case, the Lipschitz moduli is denoted by . Define where is an arbitrary positive number. Similarly define where is an arbitrary positive number.

Algorithm 2.1 (Proximal Block Implicit-Explicit Algorithm).
Initialization. Choose , and . General Step () Set and select (2.2) Set and select (2.3) Select (2.4)
The regularization parameters and , , are discussed in Section 5. For the moment, suffice it to say that these parameters are inversely proportional to the stepsize in Steps (2.2) and (2.3) of the algorithm (see Section 3). Noting that and , , are directly proportional to the respective partial Lipschitz moduli, the larger the partial Lipschitz moduli the smaller the stepsize, and hence the slower the algorithm progresses. This brings to light another advantage of blocking strategies that goes beyond convergence proofs: algorithms that exploit block structures inherent in the objective function achieve better numerical performance by taking heterogeneous step sizes optimized for the separate blocks. There is, however, a price to be paid in the blocking strategies that we explore here: namely, they result in procedures that pass sequentially between operations on the blocks, and as such are not immediately parallelizable. Here too, the ptychography application generously rewards us with added structure, as we show in Section 3.3, permitting parallel computations on highly segmented blocks. The convergence theory developed in Section 3 is independent of the precise form of the coupling function and independent of the precise form of the constraints. For our analysis we require that is differentiable with Lipschitz continuous on bounded domains, and the partial gradient (globally) Lipschitz continuous as a mapping on for each fixed, and partial gradient (globally) Lipschitz continuous as a mapping on for fixed. The constraint sets , and could be very general, we only assume that they are closed and disjoint. This is discussed more precisely below. The analysis presented in Section 3 guarantees only that Algorithm 2.1 converges to a point satisfying (1.4), which, it is worth reiterating, are not necessarily solutions to (1.1).

2.1 Blind Ptychography

In scanning ptychography, an unknown specimen is illuminated by a localized electromagnetic beam and the resulting wave is recorded on a CCD array somewhere downstream along the axis of propagation of the wave (i.e., in the far field or the near field of the object). A ptychographic dataset consists of a series of such observations, each differing from the others by a spatial shift of either the object or the illumination. In the original ptychographic reconstruction procedure [13] it was assumed that the illuminating beam was known. What we call blind ptychography, in analogy with blind deconvolution, reflects the fact that the beam is not completely known, this corresponds to what is commonly understood by ptychography in modern applications [21, 22, 18, 24]. Here the problem is to simultaneously reconstruct the specimen and illuminating beam from a given ptychgraphic dataset. We will treat the case of scanning x-ray ptychography with far field measurements. This is not exhaustive of all the different settings one might encounter, but the mathematical structure of the problem, our principal interest, is qualitatively the same for all cases. For a review of ptychothographic methods, see [1] and the reference herein. We formulate the ptychography problem on the product space where the first block corresponds to the model space for the probe, the second block corresponds to the model space for the specimen, and the third block corresponds to the model space for the data/observations. The physical model space equipped with the real inner product is isomorphic to the Euclidean space with the inner product for . This is in fact how complex numbers are represented on a computer, and hence the model space with real inner product is just an efficient shorthand for with the standard inner product for such product spaces. We will therefore retain the complex model space with real inner product when describing this problem, noting that all linear operators on this space have analogues on the space . The theory, however, will be set on real finite dimensional vector spaces. Denote with (). The objective function in our general optimization problem (1.1), , is given by

(2.5)

Here denotes -th shift operator which shifts the indexes in in some prescribed fashion and is the elementwise Haadamard product. This function measures in some sense the distance to the set

(2.6)

Let , , denote the experimental observations, and be a D discrete Fourier transform (of dimension ) rearranged for vectors on . The constraints , , and are separable and given by

(2.7a)
(2.7b)
(2.7c)

The qualitative constraints characterized by and are typically a mixture of support, support-nonnegativity or magnitude constraints corresponding respectively to whether the illumination and specimen are supported on a bounded set, whether these (most likely only the specimen) are “real objects” that somehow absorb or attenuate the probe energy, or whether these are “phase” objects with a prescribed intensity but varying phase. A support constraint for the set , for instance, would be represented by

(2.8)

where is the index set corresponding to which pixels in the field of view the probe beam illuminates and is some given amplitude. A mixture of support and amplitude constraints for the set would be represented by

(2.9)

where the index set is the analogous index set for the support of the specimen, and are lower/upper bounds on the intensity of the specimen. The set is nothing more than the phase set appearing in feasibility formulations of the phase retrieval problem [16].

Remark 2.1 (Feasibility versus minmization).

Since the algorithms discussed below involve, at some point, projections onto these sets, it is worthwhile noting here that, while, in most applications, the projections onto the sets , and have a closed form and can be computed very accurately and efficiently, we are unaware of any method, analytic or otherwise, for computing the projection onto the set defined by (2.6). For this reason, we have avoided formulation of the problem as a (nonconvex) feasibility problem

Nevertheless, this essentially two-set feasibility model suggests a wide range of techniques within the family of projection methods, alternating projections, averaged projections and Douglas–Rachford being representative members. In contrast to these, our approach is essentially a forward-backward method that avoids the difficulty of computing a projection onto the set by instead minimizing a nonnegative coupling function that takes the value (only) on .

3 Algorithm Analysis

3.1 Mathematical Preliminaries

Algorithm 2.1 consists of three steps all of which reduce to the computation of a projection onto a given constraint set (convex and nonconvex). Recall that the projection onto a nonempty and closed subset of a Euclidean space , is the (set-valued) mapping defined by

(3.1)

In Euclidean spaces the projection is single valued if and only if is convex (in addition to being nonempty and closed). Specializing to the present application, for the constraints specified by (2.7), where , and are, in general, multivalued (consider ). Since we are dealing with nonsmooth and nonconvex functions that can take the value , we require the following generalization of the derivative for nonconvex functions.

Definition 3.1 (Subdifferential [20]).

Let be proper (not everywhere infinite) and lower semicontinuous (lsc).

  • The regular or Fréchet subdifferential of at , denoted , is the set of vectors which satisfy

    (3.2)

    If then .

  • The limiting subdifferential of at , denoted , is the set of limits of limiting subdifferentials:

We say that is subdifferentially regular at if , and subdifferentially regular (without reference to the point ) if it is subdifferentially regular at every point in .

The notion of regularity of a set can be understood in terms of the subdifferential regularity of the indicator function of that set. We will call a set Clarke regular if the corresponding indicator function is subdifferentially regular (see [20, Definition 6.4]). In [9], Bolte et al. present a general procedure for determining convergence to critical points of generic algorithms for nonsmooth and nonconvex problems. The procedure consists of verifying three criteria, two of which are quite standard and shared by most descent algorithms, see e.g., [4]. The third criterion depends not on the algorithm but on the objective function: it must satisfy the Kurdyka-Łojasiewicz (KL) property (see [7, 8] and the references therein).

Definition 3.2 (Kurdyka-Łojasiewicz property).

Let be proper and lower semicontinuous. For define

(3.3)

The function is said to have the Kurdyka-Łojasiewicz (KL) property at if there exist , a neighborhood of and a function , such that, for all

the following inequality holds

(3.4)

If satisfies property (3.4) at each point of , then is called a KL function.

For a given function, the KL property can be verified indirectly by checking membership to certain classes of functions, in particular the class of semi-algebraic functions [7]. For the convenience of the reader, we recall here the definition of semi-algebraic functions.

Definition 3.3 (Semi-algebraic sets and functions).
  • A is (real) semi-algebraic if there exists a finite number of real polynomial functions such that

  • A function is semi- algebraic if its graph

    is a semi-algebraic subset of .

The class of semi-algebraic sets is stable under the following operations: finite unions, finite intersections, complementation and Cartesian products. For a thorough catalog of semi-algebraic functions and sets see [2, 3, 4, 9] and the references therein. While it may not be obvious how directly to verify the KL property it is easy to determine whether a function is semi-algebraic. The remarkable fact about semi-algebraic functions is that, as long as they are lower semi-continuous, they automatically satisfy the KL property on their domain as stated in the following result.

Theorem 3.1 ([7, Th. 3.3, pg. 1215]).

Let be a proper lower semicontinuous function. If is semi-algebraic then it satisfies the KL property at any point in .

We now show that the ptychography problem, and hence the phase retrieval problem, is semi-algebraic, i.e., both the objective function and the constraint set are semi-algebraic.

Proposition 3.2 (Blind ptychography and phase retrieval are semi-algebraic).

The objective function , defined by (2.5), is continuous and semi-algebraic. The constraint sets , and , defined by (2.7), are nonempty, closed and semi-algebraic. Consequently, the corresponding function , defined by (1.2), is a KL function on .

Proof sketch.

The physical model is formulated with complex-valued vectors, but with the real inner product is isomorphic to the Euclidean space with the inner product for . The function defined by (2.5) is finite everywhere, continuous (indeed, differentiable), and the level sets of the objective are quadratics with respect to and , and quadratic with respect to under linear transformations. Thus is semi-algebraic. The sets and are either subspaces (support constraint only, (2.8)) or the intersection of a subspace with a box or ball (support and nonnegativity or support and amplitude constraints (2.9)), and so both of these are nonempty semi-algebraic. The set is equivalent to an amplitude constraint in the image space of the linear mapping with respect to the 1-norm on each two-dimensional component of the product space . Thus is also nonempty semi-algebraic. That defined by (1.2) is then a KL-function for these , , , and then follows from Theorem 3.1. ∎

3.2 Convergence Analysis

Our convergence analysis is centered on Theorem 3.7, a general result concerning the application of Algorithm 2.1 to problem (1.4). The specialization to the ptychography problem, Proposition 4.1, is then easily achieved by verifying that the assumptions of a refinement, Theorem 3.10, are satisfied. Following [9], we carry out the three-step procedure, outlined in Section 3.1, for proving convergence of the sequence , generated by Algorithm 2.1, to a point satisfying (1.4) provided that the initial point . The analysis rests on the following assumptions, collected here to avoid repetition.

Assumption 1.
Let be iterates of Algorithm 2.1 for . , , and are nonempty and closed. is differentiable on and . Moreover, and (as defined above) are Lipschitz continuous with moduli and , respectively. The gradient of , , is Lipschitz continuous on bounded domains in . Moreover, there exists such that (3.5) The iterates are bounded. The function defined by (1.2) is a KL function (see Definition 3.2).

Remark 3.1.

In Section 4, we will show that our ptychography model (described in Section 2.1) satisfies these assumptions. For the general setting we point out the following.

  1. Boundedness of the iterates Assumption 1(iv)) is a strong assumption that can be handled by a more technical treatment than we would like to present here. For our purposes this can be guaranteed by the physically natural assumption that the constraint set is bounded. Since Algorithm 2.1 is a feasible point algorithm, all iterates belong to the bounded feasible set, hence, in this case, the iterates are bounded.

  2. Combining Assumptions 1(ii) and (iii) do not guarantee that the gradient is globally Lipschitz, as is the case in the application described below (see Section 4). The inequalities in (3.5) could be obtained in several scenarios, for example, when is and using the boundedness assumption Assumption 1(iv).

We begin with a technical lemma.

Lemma 3.3 (Sufficient decrease property).

Let be a continuously differentiable function with gradient assumed to be -Lipschitz continuous and let be a nonempty and closed subset of . Fix any . Then, for any and for defined by

we have

Proof.

The result follows from [9, Lemma 2] where the nonsmooth function is the indicator function of the nonempty closed set . ∎

Remark 3.2.

When is also convex, the conclusion of Lemma 3.3 can be improved (see [6, Lemma 2.3]) to the following

This means that (rather than only , as in the nonconvex case) is enough to guarantee decrease of function value after projected-gradient step.

Using Lemma 3.3 we can prove the following basic property of Algorithm 2.1.

Proposition 3.4 (Sufficient decrease).

Let be a sequence generated by Algorithm 2.1 for some initial point . Suppose that conditions (i)-(ii) of Assumption 1 hold. Then the sequence is decreasing and

Hence the sequence converges to some as .

Proof.

We apply Lemma 3.3 to the first subproblem (see (2.2)) as follows. Take , and to obtain that

where the second inequality follows from the fact that and the last inequality follows from the fact that and . Similarly, applying Lemma 3.3 to the second subproblem (see (2.3)) with , and yields

On the other hand, immediately from the third updating rule (see (2.4)) we get that

Summing up all these inequalities yields

Denote . Thus

(3.6)

This proves that the sequence is decreasing. Since, in addition, we know that is bounded from below (see Assumption 1(ii)), we thus have a decreasing sequence on a compact interval and it follows that converges to some . Summing up this inequality, for , yields

(3.7)

where the last inequality holds true since . Taking the limit as yields boundedness of the sum of step-lengths and completes the proof. ∎

Before proving the second step, we obtain the following immediate consequence.

Corollary 3.5 (Rate of asymptotic regularity).

Let be a sequence generated by Algorithm 2.1 for some initial point and define the corresponding sequence of steps by . Suppose that conditions (i)-(ii) of Assumption 1 hold. Then as with the following rate

where and .

Proof.

From (3.7) we obtain that

and thus

The result now easily follows. ∎

Proposition 3.6 (Lipschitz paths).

Let be a sequence generated by Algorithm 2.1 for some initial point . Suppose that conditions (i)-(iv) of Assumption 1 hold. For each positive integer , define the following three quantities: ,

and

Then and there exists such that

where

Proof.

Let be a positive integer. Writing the optimality condition of the first updating rule yields

where . Hence

It is clear from the definition of (see (1.2)), that

Combining these two facts proves that . Following the same arguments applied on the second updating rule yields the desired result that . Now, writing the optimality condition of the third updating rule yields

for , hence . We begin with an estimation of the norm of . From Assumption  1(iii) and (iv), there exists such that

Thus, from the definition of we obtain

where the second inequality follows from Assumption 1(iii). A similar argument yields

Thus

This proves the desired result. ∎

We are now ready to prove the main result of this section, namely convergence of Algorithm 2.1 to points satisfying (1.4) for any initial point . It is in deducing the last step of the general case that we use the assumption that satisfies the KL inequality (3.4).

Theorem 3.7 (Convergence to critical points).

Let be a sequence generated by Algorithm 2.1 for some initial point . Suppose that Assumption 1 holds. Then following assertions hold.

  1. The sequence has finite length, that is,

  2. The sequence converges to a point satisfying (1.4).

Proof.

The result follows from Propositions 3.4 and 3.6 together with [9, Theorem 1]. ∎

3.3 Acceleration of the PFB method

In this section we develop an accelerated version of Algorithm 2.1. To motivate our approach, we return to the naive alternating minimization method (2.1) with which we began. If each of the blocks were themselves separable, then we could recursively apply the blocking strategy discussed in Section 2 within the blocks. We first detail recursive blocking, which improves the step sizes, and then we discuss additional structures that enable efficient implementations via parallelization. For simplicity, we focus our discussion on the first block , the same strategy also can (and will) be applied to the block . Suppose the block can be further subdivided into a product of smaller blocks: with . For fixed and we consider the problem (2.1a),

(3.8)

This problem has the same difficulties with respect to the sub-blocks