Computing Reduced Order Models via Inner-Outer Krylov Recycling in Diffuse Optical Tomography1footnote 11footnote 1This material is based upon work supported by the National Science Foundation under Grants No. NSF-DMS 1025327, NSF DMS 1217156 and 1217161, NSF-DMS 0645347, and NIH R01-CA154774.

Computing Reduced Order Models via Inner-Outer Krylov Recycling in Diffuse Optical Tomography111This material is based upon work supported by the National Science Foundation under Grants No. NSF-DMS 1025327, NSF DMS 1217156 and 1217161, NSF-DMS 0645347, and NIH R01-CA154774.

Meghan O’Connell333Department of Mathematics, Tufts University, Medford, MA 02115. , Misha E. Kilmer333Department of Mathematics, Tufts University, Medford, MA 02115. , Eric de Sturler222Department of Mathematics, Virginia Tech, Blacksburg, VA 24061. , and Serkan Gugercin222Department of Mathematics, Virginia Tech, Blacksburg, VA 24061.

In nonlinear imaging problems whose forward model is described by a partial differential equation (PDE), the main computational bottleneck in solving the inverse problem is the need to solve many large-scale discretized PDEs at each step of the optimization process. In the context of absorption imaging in diffuse optical tomography, one approach to addressing this bottleneck proposed recently (de Sturler, et al, 2015) reformulates the viewing of the forward problem as a differential algebraic system, and then employs model order reduction (MOR). However, the construction of the reduced model requires the solution of several full order problems (i.e. the full discretized PDE for multiple right-hand sides) to generate a candidate global basis. This step is then followed by a rank-revealing factorization of the matrix containing the candidate basis in order to compress the basis to a size suitable for constructing the reduced transfer function. The present paper addresses the costs associated with the global basis approximation in two ways. First, we use the structure of the matrix to rewrite the full order transfer function, and corresponding derivatives, such that the full order systems to be solved are symmetric (positive definite in the zero frequency case). Then we apply MOR to the new formulation of the problem. Second, we give an approach to computing the global basis approximation dynamically as the full order systems are solved. In this phase, only the incrementally new, relevant information is added to the existing global basis, and redundant information is not computed. This new approach is achieved by an inner-outer Krylov recycling approach which has potential use in other applications as well. We show the value of the new approach to approximate global basis computation on two DOT absorption image reconstruction problems.

Key words. Krylov recycling, diffuse optical tomography, model order reduction, nonlinear inverse problem

AMS subject classifications. 65F10, 65F22

1 Introduction

Nonlinear inverse problems are becoming ubiquitous but remain very expensive to solve, including medical image reconstruction, identification of anomalous regions such as unexploded ordinance, land mines, and contaminant plumes in the subsurface. In inverse problems we recover an image of an unknown quantity of interest inside a given medium, such as the light absorption coefficient in human tissue, using a mathematical model, the forward model, that relates the image of the unknown quantity to the measured data. In this paper, we consider the problem of imaging in diffuse optical tomography (DOT), where the forward model is a large-scale, discretized, partial differential equation describing the photon fluence/flux through tissue. Other imaging problems (e.g. electrical impedance/resistance tomography, hydraulic tomography) are similar in that the forward problems are described by PDEs.

The need to solve these large-scale, forward problems many times to recover two or three-dimensional images represents the largest computational impediment to effective, practical use of DOT. To this end, the use of reduced order modeling in the context of DOT imaging was considered in [13]. The key observation in that work is that function evaluations for the underlying optimization problem (which requires solving for the parameters defining the parametric level set image) may be viewed as transfer function evaluations along the imaginary axis, which motivates the use of system-theoretic model order reduction methods. Specifically, interpolatory parametric reduced models as surrogates for the full forward model were used. In the DOT setting, such surrogate models were found to approximate both the cost functional and the associated Jacobian with very little loss of accuracy while at the same time drastically reducing the cost of the overall inversion process.

Nevertheless, difficulties remain. In order to determine the global basis to be used for the interpolatory projection, several large-scale linear systems must still be solved. Because of the structure and size of the systems, the solves are typically done iteratively [9]. In recent work [2, 16, 17], the authors investigate the use of Krylov subspace recycling for these systems to generate the model-order reduction (MOR) basis. Recycling for shape based inversion for DOT was investigated in [21], but the goal was solving the sequence of systems in the inversion process, rather than for use in computing global basis matrices for use with MOR.

In the present work, we present several computational advances for applying MOR to this inverse problem, but note that the results are applicable in a more general interpolatory MOR setting. The paper is organized as follows. In Section LABEL:sec:background, we give the system theoretic notation, introduction to DOT, and relevant information on computing reduced order models. In Section LABEL:sec:symmetry, we show that in a particular geometry and discretization, the transfer function may be reformulated as a transfer function for a slightly smaller symmetric problem. In Section LABEL:sec:recycle, we consider innovations in leveraging Krylov recycling techniques to reduce the total amount of computation for the reduced global basis matrix by eliminating redundant information dynamically. An analysis is given in Section LABEL:sec:analysis. Numerical results are given in Section LABEL:sec:numerical and conclusions and future work are outlined in Section LABEL:sec:conclusions.

2 Background

We begin by describing the forward model for DOT. Then, we express the forward problem in the systems theoretic notation that we will use throughout the paper. The first two sections borrow heavily from the presentation in our previous paper [13]. The third subsection gives background on the generation of the reduced transfer function, in preparation for the presentation of our new techniques for computing the reduced model global projection bases given in Sections LABEL:sec:symmetry and LABEL:sec:recycle.

2.1 The DOT Problem

Our image domain is a rectangular slab, , with the top () and bottom () surfaces denoted by and , respectively. We use a diffusion model for the photon flux/fluence [6] driven by an input light source . The input light source is one out of a set of possible sources that are each physically stationary. This means there are functions, , such that for selected . Additionally the observations are made with a limited number of detectors observations. We use to denote the presumed stationary observations located on the bottom surface. Using these definitions, the model for the diffusion and absorption of light is described via


(see [6, p. R56]). Here, the vector refers to spatial location, is a constant defining the particular diffusive boundary reflection (see [6, p.R50]), and and denote diffusion and absorption coefficients, respectively. Also, denotes the outward unit normal and is the speed of light in the medium.

The inverse problem consists of utilizing observations, , made when the system is illuminated by a variety of source signals, , to more accurately determine and . For purposes of this work, we assume the diffusivity is known (a common assumption in DOT breast tissue imaging) and that only the absorption field, , must be recovered. We also assume that the absorption field, , although unknown, is expressible in terms of a finite set of parameters, . We will assume that parametric level sets (PaLS), developed in [1] and used in the context of DOT imaging in [13], have been used for . The inverse problem is often solved by moving to the frequency domain and using frequency domain data (e.g. data using frequency modulated light). However, we adopt the approach in [13] of reformulating the forward problem in dynamical systems notation before moving to the frequency domain, so that we can see how to employ model reduction in the context of solving the inverse problem.

The discretization of (LABEL:eq:ModelPDE1)-(LABEL:eq:ModelPDE4) can be done by finite element or finite difference techniques. This gives the following differential-algebraic system,


where denotes the discretized photon flux, is the vector of detector outputs, constitutes a set of quadrature rules with the approximate photon flux; the columns of are discretizations of the source “footprints” for ; with and discretizations of the diffusion and absorption terms, respectively ( inherits the absorption field parametrization, ). is singular due to the inclusion of the discretized Robin condition (LABEL:eq:ModelPDE2) as an algebraic constraint. This fact will become important in Section LABEL:sec:symmetry.

Let , , and denote the Fourier transforms of , , and , respectively. Taking the Fourier transform of (LABEL:processModelDynSys) and rearranging, we get


where , and is known as the frequency response of the dynamical system defined in (LABEL:processModelDynSys), though we will often refer to it as the transfer function111In describing linear dynamical systems, usually the transfer function is used where and is not restricted to the imaginary axis. Here, though, the measurements are made only on the imaginary axis and it is enough to take with ..

For any absorption field, , associated with , the vector of (estimated) observations for the input source at frequency , as predicted by the forward model in the frequency domain, will be denoted as . Stacking the predicted observation vectors for all sources and frequencies, we obtain


which is a (complex) vector of dimension . We construct the corresponding empirical data vector, , from acquired data. The optimization problem that must be solved is


We note that mathematically, what was just described is equivalent to the usual approach for DOT imaging using frequency modulated data. What is different is using the dynamical systems interpretation, as was developed in [13], so that we can leverage efficiencies from a systems theoretic perspective on solving the parametric inversion problem.

2.2 A systems theoretic perspective on parametric inversion

From (LABEL:FullOrdTransFnc) and (LABEL:eq:TotalObserv), it follows that a single evaluation of involves computing (for all and )


where is the frequency response defined in (LABEL:FullOrdTransFnc) and where is the column of the identity matrix; i.e., excites the source location. The key observation is that as a result, an objective function evaluation at parameter vector requires solving the block linear systems



It is important to note that each column of the matrix is, in our application, a multiple of the column of the identity matrix, corresponding to the source location.

To solve the nonlinear inverse problem (LABEL:eq:InvProb) for the parameters, the Jacobian is constructed using an adjoint-type (or co-state) approach that exploits the fact that the number of detectors is roughly equal to the number of sources, as discussed in [19] and [26, p. 88]. Using (LABEL:FullOrdTransFnc) and (LABEL:eq:CYi_om_j) and differentiating with respect to the th component of ,


where, for each and any we can compute from


Here, the columns of also correspond to columns of the identity matrix, with a non-zero entry appearing at the th detector location.

The matrices need to be computed only once for all parameters. Except for the cost of computing (which was completed already for the function evaluation), the computational cost of evaluating the Jacobian at a parameter vector, for all , consists mainly of the cost of computing the solution to the block systems (LABEL:eq:adj).

In sum, the critical bottleneck in solving the inverse problem is associated with repeatedly solving (LABEL:eq:fwd) and (LABEL:eq:adj).

2.3 Global Basis Computation

As done in [13] for the DOT problem, we therefore seek a surrogate function that is much cheaper to evaluate, yet provides a high-fidelity approximation to over parameters and frequencies of interest. The frequencies are dictated to us by the experimental set-up. The parameters of interest are those which would be chosen by the optimization routine if it were run using the full order model. Likewise, we require that is much cheaper to evaluate and that over the same range of arguments.

The surrogate parametric model is obtained using projection (see [10, 13]). Suppose and full rank matrices and are given. Assuming the full state evolves near the -dimensional subspace , one has . If we enforce a Petrov-Galerkin condition we obtain a reduced system with the reduced matrices given by


The reduced transfer function is then . The goal is to choose so that the transfer function evaluated at (likewise the Jacobian) will match (or be close to) the reduced transfer function evaluated at those parameters and frequencies of interest.

Referring to (LABEL:eq:fwd), we let


Then, ideally, we define to correspond to the left singular vectors corresponding to the non-zero singular values of the matrix . Similar steps are applied to construct from via (LABEL:eq:adj) for and . Taking the exact left singular vectors for non-zero singular values to define both ensures [8] that the corresponding reduced transfer function will match the transfer function evaluation at every for and . A similar result holds for the respective derivative computations.

The one-sided global basis approach, which is used in the context of DOT in [13], refers to using and in (LABEL:eq:PGProj). Clearly, if are symmetric (Hermitian), the reduced counterparts defined in (LABEL:eq:PGProj) are also symmetric (Hermitian). For the use of model reduction in other optimization and inverse problem applications, we refer the reader to [5, 20, 22, 3, 4, 15, 7, 11, 28] and the references therein.

In the preceding description, we have assumed (as elsewhere in the literature), that several full order problems for the () have been solved prior to selecting the global basis. In the context of solving our inverse problem, however, we do not a priori pick parameter values, but rather use the parameter sequence defined at the start of the optimization to construct from the required full order model (FOM) solves. It was observed in [13] for our DOT application that then the singular values of () fall off by several orders of magnitude. Thus, truncated SVDs are used to obtain the which are subsequently concatenated to form one large projection matrix . This means that in first solving for all the , we must have computed redundant information that we subsequently must squeeze out through via SVDs on the concatenated matrices. The purpose of this paper is to construct an approximate symmetric global basis matrix without first solving each linear system in LABEL:eq:fwd and LABEL:eq:adj so that we can also avoid the additional post processing step of computing truncated SVDs.

First, however, we show in §LABEL:sec:symmetry how we can capitalize on the structure of the system matrix in our case to design a ROM scheme on a slightly different system matrix/variables. Then in Section LABEL:sec:recycle we exploit this property to develop a particularly efficient algorithm and corresponding analysis for generating columns of the projection matrix .

3 Rewriting Transfer Function and Derivatives

In this section, we show that the structure of the linear system allows us to express the transfer function in terms of a symmetric positive definite matrix. This leads to a more efficient method for generating the global basis, and is of benefit in some theoretical arguments.

We follow the same discretization as in [21], in which we use second order centered differences away from the boundary, and first order discretization to implement the Robin boundary condition. The matrix appearing in the definition of the full order transfer function (LABEL:FullOrdTransFnc) in the 2D case, has the following block structure:


where we have ordered the unknowns such that the boundary unknowns (lexicographically ordered) appear first, followed by lexicographical ordering of internal points.

Furthermore, regarding the blocks in (LABEL:eq:block1),

  • is an invertible diagonal matrix,

  • has at most one nonzero per row, and these occur only in the first and last columns,

  • , although it has different entries, has the same sparsity pattern as .

We note that the matrix is not symmetric; however, as is shown in [21], the Schur complement (for the case), which is given by , is symmetric and positive definite. We will now investigate why this fact is particularly useful with regard to specification of the transfer function.

3.1 Transfer Function Revisited

The columns of and are scaled columns from an identity matrix. Since sources and detectors appear on the boundary, this means that partitioning conformably with as in (LABEL:eq:block1) we obtain

Since the sources and detectors are not co-located, it follows that .

Let us assume for ease of exposition that in (LABEL:eq:block1) above. In this case, it is the inverse of that appears in the definition of the transfer function. Let the matrix have the following block structure:

From , we have the following three expressions that will be helpful in rewriting the transfer function and in computing corresponding measurement derivatives:


It is straightforward to show that

Now , where is the discretization of the Laplacian at the internal nodes multiplied by the (constant) diffusion coefficient, and is a non-negative diagonal matrix. Thus is SPD, so we express it in terms of its eigendecomposition, , so that

by the Sherman-Morrison-Woodbury formula.

Since the sources and detectors are not co-located, and because is diagonal, it follows that

Importantly, the structure of the matrices involved means that and maintain the same structure as and : namely, they contain multiples of columns of the identity matrix, so those matrices can be considered as ‘effective’ sources and receivers.

Using a similar argument, it is also straightforward to show that if ,

This means that the transfer function (for the 0 frequency case) is in fact expressed using an SPD matrix to represent the system matrix (complex symmetric if is non-zero).

3.2 Systems Theoretic Interpretation

Even though the new expression in the last section was derived from matrix analysis, there is also a systems theoretic interpretation that will lead to the same expression.

Specifically, using the same reordering of unknowns as in the previous section, we observe that the singular matrix has the structure

so that the system is a differential-algebraic system and, due to the structure of , of index 1. If we partition the state vector conformably with the matrices , and , then becomes


If we solve for in (LABEL:eq:first) and plug the result into (LABEL:eq:second) and rearrange, we obtain

Now the same calculations in the previous subsection lead us to

Posing the two previous equations in the Fourier domain yields the transfer function

This manipulation here corresponds to decomposing the transfer function of a DAE as the sum of the strictly proper part and polynomial part. A satisfactory reduced model should match the polynomial component exactly. In the interpolatory MOR setting, this means that the strictly proper part of the reduced transfer function interpolates that of the full-order model. For details on interpolatory MOR of DAEs, we refer the reader to [18]. For the DOT problem we consider here, this derivation illustrates that the transfer function of the index-1 DAE does not contain any polynomial part and is strictly proper.

3.3 Derivative Computation Revisited

Again using for simplicity, define the matrix

We use the “vec” command to map a matrix in to a vector in by unstacking the columns of the argument from left to right. Note that the vector gives the column of the Jacobian matrix. Using ( LABEL:eq:fwd - LABEL:eq:adj), we can write

This calculation can be carried out with only reference to , as follows.

Since the boundary terms have no absorption, then under the same finite difference discretization scheme and ordering of unknowns as before,

for a diagonal matrix Thus,

Now using (LABEL:eq:Sone) and (LABEL:eq:Stwo), we have


This means that the necessary derivatives, for the 0 frequency case, can be computed from the same SPD matrix as the transfer function.

3.4 A New View on Generating the ROM

We have expressed the original transfer function and corresponding derivatives in terms of an SPD (for ) matrix of size . So, we look for a ROM corresponding to this slightly smaller system matrix and its system variables.

Following the discussion in Section LABEL:sec:background of the one-sided global basis projection approach, we need and we define222The as we generate it will typically not have orthonormal columns, hence the need to specify .


so, the reduced transfer function is .

Since is SPD we do not need to solve the forward and adjoint problems separately to generate . Instead, we can solve


for appropriate choices of parameters , and frequencies .

For the remainder of the paper, we will assume that only data for has been provided. This helps with the remaining exposition of our new approach. Indeed, only results for the zero frequency case are provided in [13]. Moreover, treatment of non-zero is not a trivial extension of the algorithm provided, and in the interest of both paper focus and space, we relegate extensions for non-zero to forthcoming work.

The objective in reduced order modeling is to create a surrogate transfer function, , that provides a high-fidelity approximation to as well as ensuring . The following theorem, which follows from [8], shows how to construct to guarantee this for the symmetric DOT-PaLs in the zero frequency case.

Theorem 3.1

Suppose is continuously differentiable in a neighborhood of . Let both and be invertible. If and are in , then the reduced parametric model satisfies and .

In accordance with the discussion at the end of §LABEL:ssec:globalcomp, the approach to obtaining for the solution to the DOT problem would consist of the following steps. Note that what follows is the approach of [13], but modified for our newly formulated transfer function representation.

  1. Solve the systems (LABEL:eq:solves) for for the first parameter vectors produced by the optimization;

  2. Concatenate the block solutions into a large block matrix

  3. Set to be the matrix of the first left singular vectors of the above matrix. This gives a reduced order model, (LABEL:eq:newreduced), of dimension .

Algorithm 1 Generate Symmetric Global Basis via Truncated SVD

On the one hand, we need to compute each , because we need this to compute function and Jacobian evaluations at steps through of the optimization problem. Clearly, as these are large and sparse systems with SPD matrices, we should employ a Krylov subspace algorithm to solve the individual systems. On the other hand, for suitable , some of the information that we generate and put into the concatenated matrix is redundant (or nearly so), as evidenced by the rapid decay of the singular values of the concatenated matrix. This means in terms of generating a global basis matrix , we have computed information that we don’t really need (adding to the cost) and we thus incur the cost of the postprocessing via rank-revealing information.

Therefore, in the next section, we propose a method that is iterative in nature and which has the two-fold advantage of minimizing the work for computing only what we need, in terms of a) approximating the where it’s needed for the optimization, b) generating an approximate global basis matrix by augmenting an initial estimate with only the most non-redundant information. The second part means we eliminate the need for step 3 above in the process for computing , because we build up to have columns as we go, rather than overbuild and and then compress to terms.

4 Inner-Outer Krylov Recycling

Summarizing the previous section, we see that with , , , the relevant information for generating the basis comes from the systems


for several values of in order to determine the global basis. (Note that the matrix has a new definition from that of Section LABEL:sec:background.) Moreover, since


the changes to the matrix as a function of parameter are restricted to the diagaonal.

Recycling for a sequence of systems of the form (LABEL:eq:blkeqn) was shown to be efficient in [21] in the context of optimization for shape parameters in diffuse optical tomographic imaging. In that work, the authors solve (LABEL:eq:blkeqn) for all parameters selected during the course of the optimization, using different recycle spaces for each right-hand side. However, the focus of this paper is different since we not only want to solve the (shorter) sequence of full order problems efficiently, we need to construct this global basis, and we want to do it in such a way that near redundant information is detected on the fly. In this section, we first cover the basic information about recycling for a single system, then introduce our new inner-outer recycling method which avoids adding nearly redundant information to and having to discard it (through a rank-revealing factorization) afterwards.

4.1 Recycling Basics

For this discussion, consider the linear system , with symmetric and . Let , be given such that and . The approximate solution in that minimizes the 2-norm of the residual is


which yields the residual that is orthogonal to . If this solution is not adequate, we expand the subspace as follows [12]. Let be the normalized initial residual. We use a Lanczos recurrence with and to generate the recurrence relation333We note that the matrix in the recurrence is formally equivalent to , but the right-most projector is suppressed for clarity.


where is tridiagonal, since is symmetric. Next, we compute the approximate solution in that minimizes the 2-norm of the residual, , as follows:


where denotes the first Cartesian basis vector in and . The minimization in (LABEL:eq:SmallMin) corresponds to a small least squares problem, whose solution requires only the QR decomposition of the submatrix , which can be easily updated from the QR decomposition at step . The block nature of the system means that the solution can be obtained by a three-step process: find that solves the projected minimization problem

compute that satisfies (LABEL:eq:SmallMin), set . In reality, is computed via short term recurrences (MINRES), so the are not explicitly stored (see also [27, 23]).

4.2 Global Basis Construction

We start by trying to kill two birds with one stone: if we have an estimate of the global basis matrix , we can investigate its use as a candidate recycling space. If the basis can be improved, we have more full order systems to solve and we should consider solving the remaining systems by recycling. However, may already have too many columns for it to prove computationally feasible to use as a recycle space.

4.2.1 Recycling on the Updated Equations

To keep the notation simple and consistent with the previous subsection, let be fixed and set , , and . Next, assume , find the QR factorization , and then set444In practice, is formed without inverting explicitly. , so that .

According to the previous recycling discussion, the optimal solution in is , and the initial residual is . If this initial residual is small in a relative sense, then there is no need to go further, we need no iteration. If the initial residual is not small enough, then according to the previous section, we should expand the search space by running Lanczos to form a basis for the Krylov subspace generated by the projected matrix and projected right-hand side . With this basis in the columns of , we find a an approximate solution in . However, unless the number of columns in , hence , is relatively small, we cannot afford to do this, because the reorthogonalization is too expensive! Moreover, this presupposes that we actually need to find an approximation to directly. In fact, as far as updating our global basis approximation is concerned, we need only information that is not already reconstructable from .

An important fact comes to light if we decompose using the orthogonal projector :


The vector is the correction to the initial guess . So, to obtain the incremental information to construct the global basis matrix, we should consider an iterative solution to (LABEL:eq:resi). As we have one such system for right-hand sides , and a squence of these as we iterate over the parameters, we can employ a recycling type of approach, if we choose our recycle space carefully.

Specifically, suppose is not already suitably small, so we want to solve (LABEL:eq:resi), or alternatively, we want to find

for suitable subspace 555Note that should not be , as then the solution is zero.. We cannot afford to use all the columns of as a recycle space in generating a to use, because of the cost of the orthogonalization against . Instead, we will use a subset of the columns for right-hand side to be the recycle space, which we’ll immediately need to expand.

Specifically, this amounts to finding , , and such that , where . Now we use where the are the Lanczos vectors for (compare to (LABEL:eq:eq01)).That is, we want to solve

Importantly, this choice for gives . Thus, we observe . Next we use the Lanczos recurrence with and to generate the recurrence relation


Now are found by (compare to (LABEL:eq:eqrecyc)) solving

Hence, , where and is generated by a short term recurrence. So . Since , the relevant incremental information about that cannot already be expressed using the columns of is . Therefore, we extend the global basis with this vector. We repeat this process for any for which the initial residual is not already small enough666The integer for which the solution estimate is good enough will vary depending on the system – that is, – but for ease in notation we have omitted the subscript on .. At most, for a fixed , we have added columns to , but in theory, we may add substantially fewer.

Now we consider the remaining issues: a) what to use as the initial guess to the global basis and b) how to choose the individual columns to specify the when we do need to solve (LABEL:eq:resi).

4.2.2 Identifying the Recycling Spaces

In [21] when working on a different parametric inverse model problem for DOT, the authors observed that a good recycle subspace for one right-hand side did not necessarily make a good recycle space for the next right-hand side. Instead, they employed a different recycle space for each right-hand side in the sequence, with the caveat that the recycle spaces did have a common subspace pertaining to a specific (approximate) invariant subspace. That invariant subspace corresponded to the smallest eigenvalues, because that subspace remained relatively unchanged through the optimization process.

Even though in the present paper we are using a different image parameterization than what was used in [21], we also observe that the invariant subspace due to the smallest several eigenvalues for the remains unchanged. Thus, we adopt the same approach here. We use a different recycle space