A Galerkin least squares approach forphotoacoustic tomography1footnote 11footnote 1Funding: Authors gratefully acknowledge the support of the Tyrolean Science Fund (TWF).

# A Galerkin least squares approach for photoacoustic tomography111Funding: Authors gratefully acknowledge the support of the Tyrolean Science Fund (TWF).

Johannes Schwab
Sergiy Pereverzyev Jr.
Markus Haltmeier222Corresponding author, markus.haltmeier@uibk.ac.at.
Department of Mathematics University of Innsbruck
Technikerstrasse 13, A-6020 Innsbruck, Austria
###### Abstract

The development of fast and accurate image reconstruction algorithms is a central aspect of computed tomography. In this paper we address this issue for photoacoustic computed tomography in circular geometry. We investigate the Galerkin least squares method for that purpose. For approximating the function to be recovered we use subspaces of translation invariant spaces generated by a single function. This includes many systems that have previously been employed in PAT such as generalized Kaiser-Bessel basis functions or the natural pixel basis. By exploiting an isometry property of the forward problem we are able to efficiently set up the Galerkin equation for a wide class of generating functions and devise efficient algorithms for its solution. We establish a convergence analysis and present numerical simulations that demonstrate the efficiency and accuracy of the derived algorithm.

Key words: Photoacoustic imaging, computed tomography, Galerkin least squares method, Kaiser-Bessel functions, Radon transform, least-squares approach.

AMS subject classification: 65R32, 45Q05, 92C55.

\setitemize

label= \setenumeratelabel=()

## 1 Introduction

Photoacoustic tomography (PAT) is an emerging non-invasive tomographic imaging modality that allows high resolution imaging with high contrast. Applications are ranging from breast screening in patients to whole body imaging of small animals [4, 45, 27, 58]. The basic principle of PAT is as follows. If a semitransparent sample is illuminated with a short pulse, then parts of the optical energy are absorbed inside the sample (see Figure 1.1). This causes a rapid thermoelastic expansion, which in turns induces an acoustic pressure wave. The pressure wave is measured outside of the sample and used for reconstructing an image of the interior.

In this paper we work with the standard model of PAT, where the acoustic pressure solves the standard wave equation

 ⎧⎪ ⎪⎨⎪ ⎪⎩∂2tp(x,t)−Δxp(x,t)=0, for (x,t)∈Rd×(0,∞),p(x,0)=f(x), for x∈Rd,∂tp(x,0)=0, for x∈Rd. (1.1)

Here is the spatial dimension, the absorbed energy distribution, the spatial Laplacian, and the derivative with respect to the time variable . The speed of sound is assumed to be constant and has been rescaled to one. We further suppose that vanishes outside an open ball . The goal of PAT is to recover the function from measurements of . Evaluation of is referred to as the direct problem and the problem of reconstructing from (possibly approximate) knowledge of as the inverse problem of PAT. The cases and are of actual relevance in PAT (see [29, 7]).

In the recent years several solution methods for the inverse problem of PAT have been derived. These approaches can be classified in direct methods on the one and iterative (model based) approaches on the other hand. Direct methods are based on explicit solutions for inverting the operator that can be implemented numerically. This includes time reversal (see [8, 24, 12, 43, 54]), Fourier domain algorithms (see [1, 20, 31, 46, 53, 59]), and explicit reconstruction formulas of the back-projection type (see [2, 11, 12, 15, 16, 18, 19, 30, 32, 41, 42, 61]). Model based iterative approaches, on the other hand, are based on a discretization of the forward problem together with numerical solution methods for solving the resulting system of linear equations. Existing iterative approaches use interpolation based discretization (see [9, 47, 48, 52, 62]) or approximation using radially symmetric basis functions (see [56, 57]). Recently, also iterative schemes using a continuous domain formulation of the adjoint have been studied, see [3, 5, 17]. Direct methods are numerically efficient and robust and have similar complexity as numerically evaluating the forward problem. Iterative methods typically are slower since the forward and adjoint problems have to be evaluated repeatedly. However, iterative methods have the advantage of being flexible as one can easily add regularization terms and incorporate measurement characteristics such as finite sampling, finite bandwidth and finite detectors size (see [9, 22, 51, 55, 56, 60]). Additionally, iterative methods tend to be more accurate in the case of noisy data.

### 1.1 Proposed Galerkin least squares approach

In this paper we develop a Galerkin approach for PAT that combines advantages of direct and model based approaches. Our method comes with a clear convergence theory, sharp error estimates and an efficient implementation. The Galerkin least squares method for consists in finding a minimizer of the restricted least squares functional,

 fN\coloneqqargmin{∥Wh−g∥∣h∈XN}, (1.2)

where is a finite dimensional reconstruction space and an appropriate Hilbert space norm. If is a basis of then , where is the unique solution of the Galerkin equation

 ANcN=(⟨WφkN,g⟩)k∈ΛN with AN\coloneqq(⟨WφkN,WφℓN⟩)k,ℓ∈ΛN. (1.3)

We call the matrix the (discrete) imaging matrix.

In general, both the computation of the imaging matrix as well as the solution of the Galerkin equation can be numerically expensive. In this paper we demonstrate that for the inverse problem of PAT, both issues can be efficiently implemented. These observations are based on the following:

• Isometry property. Using the isometry property of [11, 12] one shows that the entries of the system matrix are given by ; see Theorem 2.2.

• Shift invariance. If, additionally, we take the basis functions as translates of a single generating function , then for .

Consequently only inner products have to be computed in our Galerkin approach opposed to inner products required in the general case. Further, the resulting shift invariant structure of the system matrix allows to efficiently solve the Galerkin equation.

Note that shift invariant spaces are frequently employed in computed tomography and include splines spaces, spaces of bandlimited functions, or spaces generated by Kaiser-Bessel functions. In this paper we will especially use Kaiser-Bessel functions which are often considered as the most suitable basis for computed tomography [23, 34, 39, 44]. For the use in PAT they have first been proposed in [56]. We are not aware of existing Galerkin approaches for tomographic image reconstruction exploiting isometry and shift invariance. However, we anticipate that similar methods can be derived for other tomographic problems, where an isometry property is known (such as X-ray based CT [28, 40]). We further note that our approach has close relations to the method of approximate inverse, which has frequently been applied to computed tomography [21, 35, 36, 37, 49, 50]. Instead of approximating the unknown function using a prescribed reconstruction space, the method of approximate inverse recovers prescribed moments of the unknown and is somehow dual to the Galerkin approach.

### 1.2 Outline

The rest of this article is organized as follows. In Section 2 we apply the Galerkin least squares method for the inverse problem of PAT. By using the isometry property we derive a simple characterization of the Galerkin equation in Theorem 2.2. We derive a convergence and stability result for the Galerkin least squares method applied to PAT (see Theorem 2.3). In Section 3 we study shift invariant spaces for computed tomography. As the main results in that section we derive an estimate for the -approximation error using elements from the shift invariant space. In Section 4 we present details for the Galerkin approach using subspaces of shift invariant spaces. In Section 5 we present numerical studies using our Galerkin approach and compare it to related approaches in the literature. The paper concludes with a conclusion and a short outlook in Section 6.

## 2 Galerkin approach for PAT

Throughout the following, suppose , let denote the open ball with radius centered at the origin, and let denote the Hilbert space of all square integrable functions which vanish outside . For two measurable functions we write

 ⟨g1,g2⟩t\coloneqq∫∂BR(0)∫∞0g1(z,t)g2(z,t)tdtds(z), (2.1)

provided that the integral exists. We further denote by the Hilbert space of all functions with .

### 2.1 PAT and the wave equation

For initial data consider the wave equation (1.1). The solution of (1.1) restricted to the boundary of is denoted by . The associated operator is defined by

###### Lemma 2.1 (Isometry and continuous extension of ¯W).

1. For all we have .

2. uniquely extends to a bounded linear operator .

3. For all we have .

###### Proof.

1: See [11, Equation (1.16)] for even and [12, Equation (1.16)] for odd. (Note that the isometry identities in [11, 12] are stated for the wave equation with different initial conditions, and therefore at first glance look different from 1.)
2, 3: Item 1 implies that is bounded with respect to the norms of and and defined on a dense subspace of . Consequently it uniquely extends to a bounded operator . The continuity of the inner product finally shows the isometry property on . ∎

We call the acoustic forward operator. PAT is concerned with the inverse problem of estimating from potentially noisy and approximate knowledge of . In this paper we use the Galerkin least squares method for that purpose.

### 2.2 Application of the Galerkin method

Let and be families of subspaces of and , respectively, with . Further let denote the orthogonal projection on and suppose . The Galerkin method for solving defines the approximate solution as the solution of

 QNWfN=QNg. (2.2)

In this paper we consider the special case where , in which case the solution of (2.2) is referred to as Galerkin least squares method. The name comes from the fact that in this case the Galerkin solution can be uniquely characterized as the minimizer of the least squares functional over ,

 ΦN(fN)\coloneqq12∥WfN−g∥2t→minfN∈XN. (2.3)

Because is a quadratic functional on a finite dimensional space and is injective, (2.3) possesses a unique solution. Together with the isometry property we obtain the following characterizations of the least squares Galerkin method for PAT.

###### Theorem 2.2 (Characterizations of the Galerkin least squares method).

For and the following are equivalent:

1. ;

2. minimizes the least squares functional (2.3);

3. For an arbitrary basis of , we have

 ANcN=dN (2.4)

where

• with ;

• ;

• .

###### Proof.

The equivalence of 1 and 2 is a standard result for the Galerkin squares method (see, for example, [26]). Another standard characterization shows the equivalence of 1 and 3 with the system matrix . Now, the isometry property given in Lemma 2.1 shows and concludes the proof. ∎

In general, evaluating all matrix entries can be difficult. For many basis functions an explicit expression for is not available including the natural pixel basis, spaces defined by linear interpolation, or spline spaces. Hence has to be evaluated numerically which is time consuming and introduces additional errors. Even if is given explicitly, then the inner products have to be computed numerically and stored. For large this can be problematic and time consuming. In contrast, by using the isometry property in our approach we only have to compute the inner products . Further, in computed tomography it is common to take as translates of a single function . In such a situation the inner products satisfy and therefore only a small fraction of all inner products actually have to be computed.

### 2.3 Convergence and stability analysis

As another consequence of the isometry property we derive linear error estimates for the Galerkin approach to PAT. We consider noisy data where the data is known to satisfy

 ∥Wf0−gδ∥≤δ, (2.5)

for some noise level and unknown . For noisy data we define the Galerkin least squares solution by

 fδN=argmin{∥Wh−gδ∥t∣h∈XN}. (2.6)

We then have the following convergence and stability result.

###### Theorem 2.3 (Convergence and stability of the Galerkin method for PAT).

Let , , satisfy (2.5) and let be defined by (2.6). Then, the following error estimate for the Galerkin method holds:

 ∥fδN−f0∥≤min{∥h−f0∥∣h∈XN}+√2Rδ. (2.7)
###### Proof.

We start with the noise free case . The definition of and the isometry property of yield

 f0N =argmin{∥Wh−g0∥t∣h∈XN} =argmin{∥Wh−Wf0∥t∣h∈XN} =argmin{∥h−f0∥∣h∈XN}

This shows and yields (2.7) for . Here and below we use to denote the orthogonal projection on a closed subspace .

Now consider the case of arbitrary , with where satisfies . Because is closed we can write

 gδ=(Wf0+PRan(W)(eδ))+PRan(W)⊥(eδ)\eqqcolonWfδ+PRan(W)⊥(eδ).

Following the case and using that one verifies that . Therefore, by the triangle inequality and the isometry property of we obtain

 ∥fδN−f0∥ ≤∥fδN−f0N∥+∥f0N−f0∥ =∥PXn(fδ−f0)∥+min{∥h−f0∥∣h∈XN} ≤√2R∥Wfδ−Wf0∥t+min{∥h−f0∥∣h∈XN}.

Together with this concludes the proof. ∎

The error estimate in Theorem 2.3 depends on two terms: the first term depends on the approximation properties of the space and the second term on the noise level . As easily verified both terms are optimal and cannot be improved. The second term shows stability of our Galerkin least squares approach. Under the reasonable assumption that the spaces satisfy the denseness property

 ∀f∈L2R(Rd):limN→∞min{∥h−f∥∣h∈XN}=0,

the derived error estimate further implies convergence of the Galerkin approach.

## 3 Shift invariant spaces in computed tomography

In many tomographic and signal processing applications, natural spaces for approximating the underlying function are subspaces of shift invariant spaces. In this paper we consider spaces that are generated by translated and scaled versions of a single function ,

 VT,s,φ\coloneqq¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯span({φkT,s∣k∈Zd})⊆L2(Rd). (3.1)

Here denotes the linear hull, stands for the closure with respect to of a set , and

 φkT,s(x)\coloneqq1Td/2φ(xT−sk) for T,s>0 and k∈Zd . (3.2)

We have chosen the normalization of the generating functions in such a way that for all . In this section we derive conditions such that any function can be approximated by elements in . Further, we present examples of generating functions that are relevant for (photoacoustic) computed tomography.

Any tomographic reconstruction method uses, either explicitly or implicitly, a particular discrete reconstruction space. This is obvious for any iterative procedure as it requires a finite dimensional representation of the forward operator that can be evaluated numerically. However, also direct methods use an underlying discrete image space. For example, standard filtered backprojection algorithms usually reconstruct samples of a bandlimited approximation of the unknown function. In such a situation, the underlying discrete signal space consists of bandlimited functions. In this paper we allow more general shift invariant spaces.

The following properties of the generating function and the spaces have been reported desirable for tomographic applications (see [44, 56]):

1. has “small” spatial support;

2. is rotationally invariant;

3. is a Riesz basis of ;

4. satisfies the so called partition of unity property.

Conditions 1 and 2 are desirable from a computational point of view and often help to derive efficient reconstruction algorithms. The properties 3 and 4 are of more fundamental nature as these conditions imply that any function can be approximated arbitrarily well by elements in as (with kept fixed; the so called stationary case). In [44] it has been pointed out that the properties 1-4 cannot be simultaneously fulfilled. This implies that for taking independent of , the spaces have a limited approximation capability in the sense that for a typical function , the approximation error does not converge to zero as and is kept fixed.

Despite these negative results, radially symmetric basis functions are of great popularity in computed tomography (see for example, [23, 33, 34, 39, 44, 57, 56]). In this paper we therefore propose to also allow the shift parameter to be variable. Under reasonable assumptions we show that the approximation error converges to zero for . This convergence in particularly holds for radially symmetric generating functions having some decay in the Fourier space, including generalized Kaiser-Bessel functions which are the most popular choice in tomographic image reconstruction.

### 3.1 Riesz bases of shift invariant spaces

Recall that the family is called a Riesz basis of if there exist such that

 ∀c∈ℓ2(Zd):A∥c∥2ℓ2≤∥∥∑k∈ZdckφkT,s∥∥2L2≤B∥c∥2ℓ2, (3.3)

where is the squared -norm of . A Riesz basis of can equivalently be defined as a linear independent family of frames and the constants and are the lower and upper frame bounds of , respectively. In the following we write for the -dimensional Fourier transform defined by for and extended to by continuity.

The following two Lemmas are well known in the case that (see [38, Theorem 3.4]). Due to page limitations and because the general case is shown analogously, the proofs of the Lemmas are omitted.

###### Lemma 3.1 (Riesz basis property).

The family is a Riesz basis of with frame bounds and , if and only if

 A(2π)d≤1sd∑k∈Zd|^φ(ξ+2πsk)|2≤B(2π)d for a.e. ξ∈[0,2πs]d. (3.4)
###### Proof.

Follows the lines of [38, Theorem 3.4]. ∎

The following Lemma implies that for any Riesz basis one can construct an orthonormal basis of that is again generated by translated and scaled versions of a single function .

###### Lemma 3.2 (Orthonormalization).

Let be a Riesz basis of .

1. orthonormal for a.e. .

2. is an orthonormal basis of , where is defined by

 ^θ(ξ)=sd/2^φ(ξ)(2π)d/2√∑k∈Zd|^φ(ξ+2πsk)|2. (3.5)
###### Proof.

Follows the lines of [38, Theorem 3.4]. ∎

According to Lemma 3.2, for theoretical purposes one may assume that the considered basis of is already orthogonal. From a practical point of view, however, it may be more convenient to work with the original non-orthogonal basis. The function may have additional properties such as small support or radial symmetry which may not be the case for . Also it may not be the case that is known analytically.

### 3.2 The L2-approximation error

We now investigate the -approximation error in shift invariant spaces,

 ∀f∈L2(Rd):minu∈VT,s,φ∥f−u∥L2=∥f−PT,sf∥L2, (3.6)

as well as its asymptotic properties. Here and in the following denotes the orthogonal projection on . It is given by , where is any orthogonal basis of . For the stationary case , the following Theorem has been obtained in [6].

###### Theorem 3.3 (The L2-approximation error).

Let be a Riesz basis of and define

 Eφ(s,Tξ)\coloneqq1−|^φ(Tξ)|2∑k∈Zd|^φ(Tξ+2kπ/s)|2 for ξ∈Rd and T,s>0. (3.7)

Then, for every with we have

 ∥PT,sf−f∥L2=[∫[−πTs,πTs]d|^f(ξ)|2Eφ(s,Tξ)dξ]12+Rφ(f,Ts), (3.8)

where the remainder can be estimated as

 Rφ(f,Ts)≤∥f∥Wr2(Tsπ)r ⎷∑n∈Zd∖{0}1∥n∥2r for T,s>0. (3.9)
###### Proof.

Let denote the orthonormal basis of the space as constructed in Lemma 3.2. Further, for every define and define functions by its Fourier representation

 ^fn(ξ)={^f(ξ) if ξ∈Qn,0 if ξ∉Qn.

Then we have and .

Now for every , we investigate the approximation error . We have . Further,

 ⟨fn,θkT,s⟩ =⟨^fn,^θkT,s⟩ =Td/2∫Qn^fn(ξ)¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯^θ(Tξ)e−iTsξ\vbox\scalebox{.6}{∙}kdξ =Td/2∫[−πTs,πTs]d^fn(ξ−2πn/(Ts))¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯^θ(T(ξ−2πn/(Ts)))eiTs(ξ−2πn/(Ts))\vbox\scalebox{.6}{∙}kdξ =Td/2∫[−πTs,πTs]d^fn(ξ−2πn/(Ts))¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯^θ(T(ξ−2πn/(Ts)))eiTsξ\vbox\scalebox{.6}{∙}kdξ =Td/2^dn,−k,

where is the -th Fourier-coefficient of the -periodization of the function . Due to Parseval’s identity we have

 ∑k∈Zd| ⟨fn,θkT,s⟩|2 =Td∑k∈Zd|^dn,k|2 =Td(2π)d(sT)d∫Qn|^fn(ξ)|2|^θ(Tξ)|2dξ =∫Qn|^fn(ξ)|2|^φ(Tξ)|2∑k∈Zd|^φ(Tξ+2kπ/s)|2dξ.

Therefore we obtain

 ∥PT,sfn−fn∥2 =∫Qn|^fn(ξ)|2⎛⎝1−|^φ(Tξ)|2∑k∈Zd|^φ(Tξ+2kπ/s)|2⎞⎠dξ =∫Qn|^fn(ξ)|2Eφ(s,Tξ)dξ.

Next notice that for and we have . Therefore we can estimate

 ∥PT,sfn−fn∥≤(Tsπ)r(1∥n∥)r(∫Qn∥ξ∥2r|^fn(ξ)|2Eφ(s,Tξ)dξ)12.

Together with the triangle inequality and the Cauchy-Schwarz inequality for sums we obtain

 ∥ PT,sf−f∥ ≤∑n∈Zd∥PT,sfn−fn∥ ≤(∫Q0|^f(ξ)|2Eφ(s,Tξ)dξ)12+(Tsπ)r∑n≠01∥n∥r(∫Qn∥ξ∥2r|^fn(ξ)|2dξ)12 ≤(∫Q0|^f(ξ)|2Eφ(s,Tξ)dξ)12+(Tsπ)r⎛⎝∑n≠01∥n∥2r⎞⎠12(∫Rd∖Q0∥ξ∥2r|^f(ξ)|2dξ)12 ≤(∫Q0|^f(ξ)|2Eφ(s,Tξ)dξ)12+(Tsπ)r⎛⎝∑n≠01∥n∥2r⎞⎠12∥f∥Wr2.

Here the sum is convergent because . After recalling that , the above estimate yields (3.8). ∎

Note that the remainder in Theorem 3.3 satisfies as . Consequently, for every sequence we have if and

 ∫[−πTNsN,πTNsN]d|^f(ξ)|2(1−|^φ(TNξ)|2∑k∈Zd|^φ(TNξ+2kπ/sN)|2)=Eφ(sN,TNξ)dξ→0.

By Lebesgue’s dominated convergence theorem this holds if almost everywhere converges to as . In the following theorem we consider two possible sequences where this is the case. Note that Item 1 in that theorem is well known (see, for example, [38]), while Item 2 to the best of our knowledge is new.

###### Theorem 3.4 (Asymptotic behavior of Eφ).

Let .

1. Suppose that . Then, for every we have that almost everywhere if and only if

 1^φ(0)∑m∈Zdφ(x−ms)=(2π)d/2sd for almost every x∈Rd. (3.10)

Equation (3.10) is called the partition of unity property.

2. Suppose as for some . Let and be bounded sequences in with as . Then

 limN→∞Eφ(sN,TNξ)=0 for every % ξ∈Rd. (3.11)
###### Proof.

1 We have

 limT→0Eφ(s,Tξ)=0 for a.e. ξ∈Rd ⟺ limT→0∑k≠0|^φ(Tξ+2kπ/s)|2=0 for a.e. ξ∈Rd ⟺ ∀k∈Zd∖{0}:^φ(2kπ/s)=0 ⟺ ∀k∈Zd∖{0}:∫Rdφ(x)ei2πx\vbox\scalebox{.6}{∙}k/sdx=0 ⟺ ∀k∈Zd∖{0}:∫[0,s]d∑m∈Zdφ(x−ms)ei2πx\vbox\scalebox{.6}{∙}k/sdx=0 ⟺ ∑m∈Zdφ(x−ms)=(2π)d/2sd^φ(0) for a.e. x∈Rd\,.

2 As for there exist constants such that for we have . Further, for all and we have