A Appendix

# Beyond black-boxes in Bayesian inverse problems and model validation: applications in solid mechanics of elastography

## Abstract

The present paper is motivated by one of the most fundamental challenges in inverse problems, that of quantifying model discrepancies and errors. While significant strides have been made in calibrating model parameters, the overwhelming majority of pertinent methods is based on the assumption of a perfect model. Motivated by problems in solid mechanics which, as all problems in continuum thermodynamics, are described by conservation laws and phenomenological constitutive closures, we argue that in order to quantify model uncertainty in a physically meaningful manner, one should break open the black-box forward model. In particular we propose formulating an undirected probabilistic model that explicitly accounts for the governing equations and their validity. This recasts the solution of both forward and inverse problems as probabilistic inference tasks where the problem’s state variables should not only be compatible with the data but also with the governing equations as well. Even though the probability densities involved do not contain any black-box terms, they live in much higher-dimensional spaces. In combination with the intractability of the normalization constant of the undirected model employed, this poses significant challenges which we propose to address with a linearly-scaling, double-layer of Stochastic Variational Inference. We demonstrate the capabilities and efficacy of the proposed model in synthetic forward and inverse problems (with and without model error) in elastography.

\LetLtxMacro\oldtextsc\fvset

fontsize= \creflabelformatequation#2#1#3 \glsdisablehyper \newacronymVIvivariational inference \newacronymKLklKullback-Leibler \newacronymELBOelboevidence lower bound \newacronymMCMCmcmcMarkov chain Monte Carlo

Keywords: Uncertainty quantification; Variational inference; Inverse problems; Model error; Stochastic optimization; Bayesian modeling

## 1 Introduction

The extensive use of large-scale computational models poses several challenges in model calibration and validation [8, 25]. Traditionally in numerical simulations, the emphasis has been placed on decreasing the truncation/discretization errors. Nevertheless, the fidelity of the predictions of such simulations depends strongly on assigning proper values to the model parameters as well as utilizing high-fidelity models. This in turn necessitates a data-driven approach where elaborate computational models are fused with data, originating either from experiments/measurements or from models of higher fidelity (e.g. molecular dynamics). This process is naturally fraught with significant uncertainties. One such source is obviously the noise in the data which constitutes probabilistic estimates more rational. This is particularly important when multiple hypotheses are consistent with the data or the level of confidence in the estimates produced needs to be quantified. Another source of uncertainty, which is largely unaccounted for, is model uncertainty [77, 35, 48]. Namely, the parameters which are calibrated are associated with a particular forward model (in our case a system of (discretized) PDEs consisting of conservation laws and constitutive equations or closures) but one cannot be certain about the validity of the model employed. In general, there will be deviations between the physical reality where measurements are made, and the idealized mathematical/computational description.

An application that motivates this work comes from biomechanics and the identification of the mechanical properties of biological materials, in the context of non-invasive medical diagnosis (elastography). While in certain cases mechanical properties can also be measured directly by excising multiple tissue samples, non-invasive procedures offer obvious advantages in terms of ease, cost and reducing the risk of complications to the patient. Rather than x-ray techniques which capture variations in density, the identification of stiffness, or mechanical properties in general, can potentially lead to earlier and more accurate diagnosis [76, 40], provide valuable insights that differentiate between modalities of the same pathology [26], monitor the progress of treatments and ultimately lead to patient-specific treatment strategies.

All elastographic techniques consist of the following three basic steps [34] : 1) excite the tissue using a (quasi-)static, harmonic or transient source, 2) (indirectly) measure tissue deformation (e.g. displacements, velocities) using an imaging technique such as ultrasound [80], magnetic resonance [72] or optical tomography [56], and 3) infer the mechanical properties from this data using a suitable continuum mechanical model of the tissue’s deformation. Indirect or iterative or model-based methods for solving the latter problem (in contrast to direct methods [3]) admit an inverse problem formulation where the discrepancy [42]) between observed and model-predicted deformations is minimized with respect to the material fields of interest [74, 33, 1, 79, 34]. While these approaches utilize directly the raw data, they generally imply an increased computational cost as the forward problem, and potentially parametric derivatives, have to be solved/computed several times. This effort is amplified when stochastic/statistical formulations are employed as those arising from the Bayesian paradigm.

The solution of such model calibration (and validation) problems in the Bayesian framework is hampered by two main difficulties. The first pertains to their computational efficiency and stems from the poor scaling of traditional Bayesian inference tools with respect to the dimensionality of the unknown parameter vector - another instance of the curse-of-dimensionality. In elastography, the model parameters of interest (i.e. material properties) exhibit spatial variability which requires fine-discretization in order to be captured. This variability can also span different scales [60, 36]. Standard Markov Chain Monte Carlo (MCMC, [44]) techniques require an exorbitant number of likelihood evaluations (i.e. solutions of the forward model) in order to reach convergence [87, 86, 68, 83]. As each of these calls implies the solution of very large systems of (non)linear and potentially transient, equations, it is critical to minimize their number particularly in time-sensitive applications. Advanced sampling schemes, involving adaptive MCMC [63, 50, 20] and Sequential Monte Carlo (SMC, [69, 60, 27]), exploit the physical insight and the use of multi-fidelity solvers in order to expedite the inference process. Nevertheless, they fail to address fundamental challenges as the number of forward calls can still be in the order of tens of thousands. Several attempts have also been directed towards using emulators, surrogates or reduced-order models of various kinds [67, 16, 88, 10, 31, 62] but such a task is severely hindered by the input dimensionality. The use of first and second-order derivatives has also been advocated either in a standard MCMC format or by developing advanced sampling strategies. These are generally available by solving appropriate adjoint problems which are well-understood in the context of deterministic formulations [38, 41, 66, 15, 82]. More recent treatments, attempt to exploit the (potentially) lower intrinsic dimensionality of the target posterior by identifying subspaces where either most of the probability mass is contained [39] or where maximal sensitivity is observed [23, 93, 24, 22]. This enables inference tasks to be performed on much lower dimensions which are not hampered by the aforementioned difficulties. Generally all such schemes construct such approximations around the MAP point by employing local information (e.g. gradients) and therefore are not suitable for multi-modal or highly non-Gaussian posteriors.

The second challenge facing Bayesian inverse problems pertains to the model structure itself. This is especially important in the context of biomechanics applications where inferring model parameters associated with an incorrect model can lead to incorrect diagnosis and prognosis1. Despite progress in Bayesian model calibration tasks [48], the issue of model validation still poses many open questions, which, if not adequately addressed, can lead to biased and over-confident parameter estimates and predictions [14]. Most efforts have been directed towards providing quantitative comparisons between competing models. In the latter context, Bayes factors [54] provide a rigorous means of model comparison. Nevertheless, apart from the computational difficulties involved, they do not reveal the key driver behind model uncertainty nor do they quantify its effect on the predictions of the model. Alternatively, the most widely-adopted statistical strategy involves augmenting the relation between data and model predictions with a Gaussian process, which accounts in an additive fashion, for the model error’s contribution [55, 47]. However, this explicit additive term may violate physical constraints (e.g. conservation of mass, energy), can get entangled with the measurement error, is not physically interpretable and cumbersome or impractical to infer when it depends on a large number of input parameters [5, 7, 61, 94, 89].

The present paper extends previous work [61] towards developing a novel modeling framework and a set of scalable algorithms that will address the two main challenges in model calibration and validation in PDE-based models, i.e. a) the significant computational cost in problems with an expensive, black-box forward model, and b) the quantification of structural, model uncertainty and its effect on model calibration and predictive estimates. This paper advocates a new paradigm that goes beyond the standard black-box setting (Figure 2) employed thus far. Such a shift is necessitated by the need to quantify model discrepancies in a physically relevant manner. It is based on bringing all model equations in the forefront and, in a Bayesian fashion, quantifying their relative validity using the language of probabilities. As we show in the next sections, this recasts the solution of, even deterministic, forward problems as problems of probabilistic inference. More importantly, in the context of inverse problems, it leads to an augmented posterior that involves all of the state variables of the forward model. Nevertheless, it untangles their complex relations into local terms (Figure 2) and yields a well-posed inverse problem, even in cases when the forward problem is not (e.g. due to incomplete boundary conditions). We demonstrate the potential of this framework in the context of biomechanics where the solution of the aforementioned issues can significantly impact progress in the non-invasive, diagnostic capabilities and assist in the development of patient-specific treatment strategies.

The paper is organized as follows. In \crefsec:probabilistic_cm, we present the proposed probabilistic framework in the context of linear elastostatics and discuss extensions to nonlinear conservation laws. We introduce an undirected probabilistic model that encapsulates all model equations and discuss the recasting of forward and inverse problems as probabilistic inference tasks. In \crefsec:variational_approx, we present an efficient computational framework for carrying out the aforementioned inference tasks based on Stochastic Variational Inference. The scheme introduced employs a twofold variational approximation which is capable of dealing with the intractable normalization constant in the model-informed prior density. Lastly, the feasibility of the framework is demonstrated by several numerical examples in \crefsec:examples.

## 2 Proposed modeling framework

We begin the elaboration of the proposed model in the context of linear, elliptic PDEs as those arising in small deformation, linear elasticity. Similar models appear for example in heat diffusion or Darcy flow and can be treated similarly even though the physical meaning of the state variables is different. Furthermore, the extension to nonlinear elliptic PDEs as those for example appearing in the context of large deformation, nonlinear elasticity (geometric and constitutive nonlinearities) requires some technical alterations which are not discussed. Further extensions to time-dependent, nonlinear models, which are also of interest (e.g. harmonic and transient elastography), are deferred to a future paper. With regards to the notations adopted we generally adhere to the following rules:

• with boldface, we denote vectors/vector fields.

• with we denote scalar/vector fields.

• with upper-case, we denote random vectors/variables.

• with lower-case, we denote values taken by vectors/variables.

Almost all problems in continuum thermodynamics share a common structure which consists of:

• conservation law(s) that arise from physical principles and are generally well-founded and trusted. In the case of biomechanics this amounts to the conservation of linear momentum:

 ∇⋅~σ(x)=0∀x∈Ω, (1)

where is the Cauchy stress tensor at a point in the problem domain . Conservation laws serve as the skeleton in many other problems i.e. the conservation of mass/energy in the case of diffusion/advection of mass or heat flow through a (porous) medium. Discretized versions of the aforementioned PDEs are employed, which naturally introduce discretization error.

• equation(s) of state/constitutive law(s)/closure(s). Constitutive equations refer in general to relations between conjugate thermodynamic variables, such as the stress and strain tensors in bio/solid mechanics. In linear elasticity such relations can be expressed as:

 ~σ(x)=C(x):~ϵ(x)∀x∈Ω, (2)

where is the elasticity tensor and the inifnitesimal strain tensor, which relates to the displacements as follows:

 ~ϵ(x)=12(∇~u(x)+(∇~u(x))T)=∇S~u(x). (3)

Similar constitutive relations have been established between velocity and pressure in flow through permeable media, or flux and temperature in heat diffusion, involving permeability or conductivity tensors. In the context of elastography, our goal is to estimate and its spatial variability. Constitutive models are largely phenomenological and therefore represent the primary source of model error/uncertainty.

• observables in the form of boundary/interior conditions or initial conditions in time-dependent problems. In a typical, static forward problem of biomechanics these consist of boundary displacements (Dirichlet boundary conditions) on the part of the boundary :

 ~u(x)=u0(x)∀x∈ΓDBC (4)

and, potentially also, traction (Neumann) boundary conditions on the part of the boundary :

 Missing dimension or its units for \hskip (5)

where denotes the unit outward normal vector. These boundary tractions/displacements might be specified deterministically or stochastically and the well-posedness of the forward problem necessitates that and . In the context of the inverse problems considered these are complemented by observations of interior displacements which we denoted by . These are obtained by employing image processing techniques to the undeformed and deformed (e.g. ultrasound) images of the tissue ([90]). In general, the observables (and potentially also ) are contaminated by noise and represent the primary source of observation error.

Existing deterministic or stochastic (Bayesian) strategies for the solution of the associated inverse problem are based on a premise of a perfect model (up to discretization errors) and a well-posed forward problem i.e. the specification of boundary conditions that ensure the existence/uniqueness of solution. In bio/solid mechanics, a forward solver is usually obtained, upon discretization of the governing equations, in terms of displacements and of the constitutive parameters in . Traditional Bayesian formulations [55, 47] postulate a relation for the observed displacements of the form:

 uobs=u(C)+η,η∼N(0,σ2ηI), (6)

where is the observation noise. Solution strategies treat the forward solver that computes the parameter-to-state map as a black-box (Figure 2). Often, gradients are available through adjoint formulations [75]. Nevertheless, the number of forward solutions required can become extremely large and scales poorly with the dimension of the unknowns as discussed in the introduction. In addition to these deficiencies, such a formulation lacks the ability to quantify model errors and while estimates for can always be obtained, these are obviously invalid if they are based on an incorrect model. The use of an additional term of the form or in Equation (6) as in [55] may violate physical constraints, can get entangled with the measurement error, is not physically interpretable and cumbersome or impracticable when is high-dimensional as in cases of practical interest [70]. Furthermore, when uncertainty is present with regards to the boundary conditions, these must also be included in the vector of unknowns and be inferred from the data, which introduces additional difficulties.

In contrast, our strategy attempts to break open the black-box model (Figure 2) and bring to the surface all quantities/fields involved in the mathematical description of the physical process of deformation. Under this framework, the solution of both forward and inverse problems is recast as one of statistical inference [30, 45, 46, 19] and we attempt to find all latent, unobserved quantities that are compatible not only with the observables, but with the physics-based-model equations as well. Their reliability, or absence thereof, is expressed in terms of probabilities, allowing one to quantify all error sources such as discretization and structural, model errors.

The formulation advocated consists of three pivotal components which are discussed in the sequel, namely:

• an augmented prior and posterior distribution dictated by the model equations (section 2.1).

• the representation (discretization) of the unknown state variables (section 2.2).

• the solution as probabilistic inference (section 2.3).

### 2.1 Augmented prior/posterior densities

In contrast to existing Bayesian formulations which prescribe prior densities merely encapsulating beliefs (in our case, the elastic material properties ), we advocate a prior model that incorporates the relations and dependencies between system states as implied by the governing equations, i.e. conservation & constitutive laws. In particular, we view each of those equations as a source of information 2 and we employ a model interrogation scheme [19] in order to extract it. For the conservation of linear momentum (Equation (1)) and without loss of generality, these interrogations can be performed using the Method of Weighted Residuals (MWR, [37]), according to which, for each weighting function such that on , the weighted residual is (using indicial notation):

 r(ie)e(~σ(x))=∫Ωw(ie)j\leavevmode\nobreak ~σji,i\leavevmode\nobreak dV=∫ΓNBC\leavevmode\nobreak w(ie)j~σjini\leavevmode\nobreak dΓ−∫Ωw(ie)j,i~σji\leavevmode\nobreak dV=∫ΓNBC\leavevmode\nobreak w(ie)jt0,j\leavevmode\nobreak dΓ−∫Ωw(ie)j,i~σji\leavevmode\nobreak dV(from Equation (???)). (7)

Rather than setting each of these residuals to zero, we use them to induce a probability measure on candidate stress fields . Such measures should contract to a Dirac measure around the true, but unknown, stress field, as . Our strategy resembles conceptually the emerging field known as Probabilistic Numerics [45]. In particular, if is analogous to the numerical tolerance employed in deterministic schemes to enforce zero residuals, we can define a Gaussian probability density for which in turn implies a probability measure on the space of candidate solutions . The combination of such relations yields a density for of the form 3 (we omit dependence on for simplicity):

 pe(~σ(x))∝ψ1(~σ(x))(see Figure ???)=∏Neie=1N(r(ie)e(~σ(x))\leavevmode\nobreak ;0,ϵ2). (8)

This scores candidate stress fields according to their satisfaction of the governing equation (in the MWR form). Fields that yield residuals get the highest score and, the more weighting functions are used, the narrower becomes the region in the space of trial solutions where the probability mass is concentrated. We note at this stage that the MWR considered is not at all restrictive as (at least) six methods (i.e. collocation, sub-domain, least-squares, (Petrov)-Galerkin, moments) can be considered as special cases by appropriate selection of the .

Enforcing any reliable model equation (e.g. strain-displacement relation, incompressibility) can also be done in a similar manner by artificially defining a probability measure as above which can degenerate to a Dirac measure as the precision parameter decays. This is particularly useful when it comes to the second piece in the puzzle which arises from the constitutive equations and requires the specification of the strain tensor . This can be indirectly expressed by a probabilistic enforcement of the strain-displacements relation in Equation (3) or directly in terms of the displacement field . If one enforces Equation (2) point-wise at , then at each of those points, the residual :

 r(ic)c(~σ(x),~u(x),C(x))=~σ(x(ic))−C(x(ic)):∇S~u(x(ic)), (9)

which quantifies the discrepancy between the actual stresses and the constitutive model predicted stresses, is zero only in the case of a perfect model. As the validity of the model is unknown a priori, we propose a hierarchical prior of the form:

 p(r(ic)c|Λic)=N(0,Λ−1icI). (10)

The simplest such model utilizes a single hyper-parameter per point , which reflects the magnitude of the constitutive model error. Multivariate representations as well as more complex model-error distributions are also possible. Large values of indicate locations where the model error is high and vice versa. We emphasize that nonzero values of do not arise due to variability in the material parameters , but rather due to the inadequacy of the constitutive model to provide sufficient closures to the governing equations. As discussed earlier, Equation (10) employed over all enforcement points implies a (joint) probability density over the fields (see Figure 3):

 pc(~σ(x),~u(x),C(x)|\leavevmode\nobreak Λ)∝ψ2(~σ(x),~u(x),C(x)),Λ={Λic}Ncic=1=∏Ncic=1N(r(ic)c(~σ(x),~u(x),C(x));\leavevmode\nobreak \leavevmode\nobreak 0,Λ−1icI). (11)

The aforementioned densities can be straightforwardly complemented by “traditional” priors employed in canonical Bayesian formulations [4], e.g.:

 pC(C(x)|θC)∝ψ3(C(x))(see Figure ???) (12)

for the unknown material property field with hyperparameters . Similarly, if other prior information is available about the other state variables , these can be incorporated e.g. as:

 pσ(~σ(x)|θσ)∝ψ4(~σ(x)) (13)

and:

 pu(~u(x)|θu)∝ψ5(~u(x)). (14)

The combination of Equations (8), (11) as well as (11), (13), (14), define a joint prior for the state variables of the following form (we omit hyperparameters for simplicity):

 p(~σ(x),~u(x),C(x)|Λ)=1Z(Λ)\leavevmode\nobreak ∏5k=1ψk(~σ(x),~u(x),C(x))=πΛ(~σ(x),~u(x),C(x))Z(Λ)=1Z(Λ)\leavevmode\nobreak pe(~σ(x))\leavevmode\nobreak \leavevmode\nobreak pc(~σ(x),~u(x),C(x))×\leavevmode\nobreak \leavevmode\nobreak pC(C(x)|θC)pσ(~σ(x)|θσ)pu(~u(x)|θu), (15)

where is the normalization constant 4.

We note that the prior model implies an undirected probabilistic graphical model [59] which takes the form of a factor graph between state variables (Figure 3). We emphasize that this does not rely on the availability of a well-posed forward model (e.g. one would notice the absence of boundary conditions since these are data and will be treated in the likelihood), but rather invokes (in a probabilistic fashion) relationships between the system’s states as suggested by the model’s equations. More importantly, it relieves the formulation of the black-box, input-output relation as in Equation (6). Dependencies between all state variables appear explicitly in the factors making up Equation (15). As explained in section 2.3, this will allow significant accelerations in the inference task, despite the augmented state space in comparison to standard deterministic or Bayesian formulations. We should also note that the prior model effectively depends on the number of interrogations and of the governing equations. The larger become, the more tightly should the prior concentrate around the continuous, model-dictated values. In contrast, the smaller are, the flatter the prior becomes as the number of equations enforced is smaller. This provides an added advantage in the inference task (section 2.3), as these numbers define effectively a tempering schedule, from simpler to more complex densities [60].

The incorporation of various error sources in the prior model enables one to properly account for the observation error in the data. Given displacement data (interior or boundary) at locations , then:

 uobs,io=~u(x(io))+ηio,ηio∼N(0,σ2η),i0=1,…,No, (16)

where is the variance of the observation noise. We emphasize here a fundamental difference with the analogous Equation (6) used in traditional Bayesian formulations. Therein, the ’s express the difference between observables and displacements predicted by the model which is assumed to be perfect, whereas here the model-related equations have been sequestered in the prior (Equation (15)). The Gaussian assumption for the is not restrictive as in Equation (6), where it must simultaneously account for model errors. The equation above yields a likelihood:

 Unknown environment '%' (17)

which, as is suggested in Figure 3, gives rise to an augmented posterior density over all state variables as follows:

 p(~σ(x),~u(x),C(x)|uobs,Λ)∝∏6k=1ψk(~σ(x),~u(x),C(x))∝p(uobs|~u(x))% likelihood Equation (???)\leavevmode\nobreak p(~σ(x),~u(x),C(x)|Λ)prior Equation (???) (18)

Despite the superficial complexity of the posterior density above, we note that this consists of large products, of tractable, local terms nevertheless. The absence of any “black-box” terms is the key to facilitating the inference process.

### 2.2 Representation of the unknown state variables

The presentation of the augmented prior/posterior densities in the previous section was done (with some abuse of notation) in infinite dimensions, i.e. without presupposing a particular discretization of the latent state fields. A possibility for carrying out inference tasks is the utilization of Gaussian Processes that are frequently employed for nonparametric representations (e.g. [19]). We believe though that in the current setting such a choice would be impracticable due to the non-Gaussian terms in Equation (18). In order to carry computations, we advocate discretized representations which will unavoidably introduce discretization errors. We note here that such a representation of the latent fields , is not dependent on the discretization of the governing equations, which is controlled by the number of weighting functions (Equation (8)) and the number of (effectively collocation) points , where the constitutive equation is interrogated (Equation (11)). This decoupling is an additional advantage of the proposed strategy that can lead to adaptive, multi-resolution inference as discussed in the conclusions.

In the framework proposed, we seek all problem fields, i.e. the stresses , displacements , and constitutive model parameters and employ a (potentially overcomplete) dictionary of feature functions such that:

 ~u(x)=∑Muj=1Uj\leavevmode\nobreak fu,j(x),~C(x)=∑MCj=1Cj\leavevmode\nobreak fC,j(x),~σ(x)=∑Mσj=1Σj\leavevmode\nobreak fσ,j(x), (19)

where, in general, are vector-valued, second-order-tensor-valued and fourth-order-tensor-valued coefficients. For a fixed set of functions this would imply an augmented posterior on the state vector , where arising from the substitution of Equation (19) into the respective equations, e.g. Equation (15) or Equation (18).

On one hand, it is desirable to maximize the expressivity of such representations in order to minimize the discretization error. On the other, by increasing the dimensionality of the unknown parameters in the representation, not only one impedes computations, but can potentially lead to a multi-modal posterior, especially when the number of unknowns is much larger than the data and .

This poses an interesting model selection problem. One option to address this is to initiate the search with a small number of feature functions and progressively add more. These can be selected from a pool of candidates by employing appropriate criteria [9, 28]. Another alternative is to preselect a large, overcomplete set of feature functions and prune them by enforcing sparsity. In the Bayesian setting advocated, this can be readily achieved by employing sparsity-promoting hierarchical priors (e.g. spike-and-slab [52], automatic relevance determination [65, 100], horseshoe [18]). The underlying assumption is that the fields of interest are highly compressible, i.e. most of the coefficients are zero [32, 17, 64]. Sparsity as a principle of parsimony has successfully been applied in Bayesian formulations in various contexts such the representation of natural images [78] or the solution of PDE-based inverse problems [39] and a host of algorithms have been developed, not only for finding such representations, but also an appropriate dictionary for achieving this goal [64].

In this work, we adopt a simpler strategy that employs feature functions from traditional finite element analysis (FEA). In particular we employ irregular triangulations (in 2D) of the problem domain and feature functions that are constant within each triangle i.e.:

 fC,j(x)={1,if x∈Ωj0,otherwise,fσ,j(x)={1,if x∈Ωj0,otherwise. (20)

As for the feature function , we employ the piece-wise linear shape functions corresponding to linear triangular elements from finite element analysis [51]. A direct implication of this choice is that the coefficients correspond to the displacements at the vertices of each triangle (also called nodal displacements). Furthermore, the corresponding strain tensor computed from such a displacement field though Equation (3) would also be piece-wise constant (as and ). As a result, enforcement of the constitutive law as in Equation (11), can be effectively done on an element-by-element basis (i.e. the number of such contributions is at most equal to the number of elements/triangles ). We do not claim that such a representation is optimal or the sparsest possible and that it unavoidably introduces a discretization error. We note though that given a sufficiently fine triangulation, one can approximate arbitrary close the true stess, displacement and material parameter fields, which are generally not piece-wise constant, nor piece-wise linear. We assume that the discretization in the problems considered are sufficiently fine to ignore the discretization error which would be smaller than the other contributions.

The weighting functions employed in the MWR as in Equation (7) are also given by the piece-wise linear shape functions of linear triangular elements (as for ). Hence, each residual in Equation (7) corresponds to enforcing equilibrium of forces at each vertex/node. The locality of feature and weighting functions endows the formulation with the computational conveniences of FEA, particularly with the computation of residuals which can also be done on an element basis and make use of existing computer codes. In particular, we note that based on the aforementioned discussion:

• we denote with the vector for the discretized representation of the displacement field . It consists of the displacements at each of the vertices/nodes, i.e. . The strains obtained from Equation (3) will be piece-wise constant. We denote by the strains corresponding to each element . These can be readily computed from the nodal displacements of the element, say , by employing the well-known in FEA strain-displacement matrix as .

• we denote with the vector for the discretized representation of the elastic tensor field . It can also be partitioned on an element/triangle basis as .

• we denote with the vector for the discretized representation of the stress field . Given the piece-wise constant nature of this representation, it can also be partitioned on an element/triangle basis as .

• the enforcement of the conservation of linear momentum as in Equation (7) for each weighting function can be expressed as:

 r(ie)e(~σ(x))=fie−bTieΣ, (21)

where and . The vector is constant due to the discretization choices made and can be pre-computed along with the scalar . The assembly of all such residuals for as in Equation (8) gives rises to the following factor potential :

 ψ1(Σ)=e−12ϵ2||f−BTΣ||2. (22)
• the enforcement of the constitutive law as in Equation (9) over each triangle/element can be expressed as:

 r(e)c(~σ(x),~u(x),C(x))=Σe−CeBeUe. (23)

The assembly of all such residuals for as in Equation (11) gives rise to the following factor potential :

 ψ2(Σ,C,U;\leavevmode\nobreak Λ)=nel∏e=1e−Λe2||Σe−CeBeUe||2. (24)
• the priors on the material parameters will be specialized in the applications. We denote by the corresponding factor potential as in Equation (12).

• we employ vague Gaussian priors on and as in Equation (13) and Equation (14) respectively. Their role is not to reflect any prior information (which is unavoidably problem dependent), but rather to ensure the integrability of the corresponding augmented prior in Equation (15). The corresponding factor potentials are:

 ψ4(Σ)=e−θσ2||Σ||2,ψ5(U)=e−θu2||U||2, (25)

where .

Adaptations to one or three dimensions are straightforward where one can employ the machinery of linear or trilinear hexahedral elements respectively.

### 2.3 Rephrasing forward/inverse problems as probabilistic inference tasks

In this section, we recapitulate the framework introduced in section 2.1 in view of the discretizations adopted in section 2.2, and, more importantly, demonstrate how forward and inverse problems can be solved as probabilistic inference tasks, without ever resorting to the black-box PDE solver.

Consider first a typical forward problem where one is given:

• the conservation law of Equation (1), which is enforced as in Equation (7) with (in practice, very small values are prescribed).

• the constitutive law of Equation (2), which is enforced as in Equation (9) and for the purpose of solving the forward problem is assumed to be valid (i.e. ). In addition, the elasticity tensor is prescribed, i.e. in the discretized representation where is known.

• boundary conditions as in Equation (4) or Equation (5). For the sake of simplicity, suppose only Dirichlet boundary conditions are prescribed (i.e. in Equation (4)), which in the discretized represenation of the displacement field implies that part of the vector is known. If is partitioned to interior and boundary displacements, then where is known.

The augmented prior presented in Equation (15) and adapted in section 2.2 can be expressed as:

 p(Σ,C,Ui,Ub|Λ)=1Z(Λ)\leavevmode\nobreak ψ1(Σ)ψ2(Σ,C,Ui,Ub;\leavevmode\nobreak Λ)ψ3(C)ψ4(Σ)ψ5(Ui,Ub)=πΛ(Σ,C,Ui,Ub)Z(Λ), (26)

where:

 πΛ(Σ,C,Ui,Ub)=ψ1(Σ)ψ2(Σ,C,Ui,Ub;\leavevmode\nobreak Λ)ψ3(C)ψ4(Σ)ψ5(Ui,Ub) (27)

and:

 Z(λ)=∫ψ1(σ)ψ2(σ,c,ui,ub;\leavevmode\nobreak λ)ψ3(c)ψ4(σ)ψ5(ui,ub)\leavevmode\nobreak dσdc\leavevmode\nobreak dui\leavevmode\nobreak dub. (28)

Under the specifications of the forward problem (i.e. , and given ), the goal is to write the conditional density of the undetermined latent random variables , i.e.:

 p(Σ,Ui|C=c,Ub=u0,λ)∝p(Σ,C=c,Ui,Ub=u0|λ)p(C=c,Ub=u0|λ)=p(Σ,C=c,Ui,Ub=u0|λ)∫p(σ,C=c,ui,Ub=u0|λ)\leavevmode\nobreak dσ\leavevmode\nobreak dui=πλ(Σ,C=c,Ui,Ub=u0)∫πλ(σ,C=c,ui,Ub=u0)\leavevmode\nobreak dσ\leavevmode\nobreak dui. (29)

Hence, identifying the latent variables (i.e. stresses and interior displacements) is equivalent to finding the conditional augmented prior above. This, for example, can be performed by sampling (e.g. using MCMC or SMC), which, as can be seen from the form of the factor of potentials, involves explicit terms, i.e. no black-boxes and no calls to a PDE solver. Similarly, derivatives (first and second order) with respect to can be readily computed and used to facilitate the sampling/inference process. In the following section, we demonstrate the use of Stochastic Variational Inference tools, which can be used for that purpose and efficiently provide approximations of the sought conditional density.

Other types of boundary conditions can be readily treated in the same fashion. Furthermore, uncertainty propagation tasks can also be dealt efficiently. If, for example, random material properties were assumed and a density was prescribed, then one could readily sample values, say , for from this density and subsequently identify the conditional as in Equation (29) for each of these 5. We finally note the possibility of solving ill-posed forward problems, e.g. problems where boundary conditions are under- or over-specified. This is because the corresponding augmented density is still well-defined and therefore inference can be readily performed.

Consider now a typical inverse problem where one is given:

• the conservation law of Equation (1), which is enforced as in Equation (7) with (in practice, very small values are prescribed).

• the constitutive law of Equation (2), which is enforced as in Equation (9), but the validity of which is unknown (i.e. are unspecified). In addition, the constitutive parameters are unspecified, in which case in Equation (12) plays the role of the prior in canonical Bayesian formulations.

• boundary conditions as in Equation (4) or Equation (5). For the sake of simplicity suppose only Dirichlet boundary conditions are prescribed (i.e. in Equation (4)), which in the discretized represenation of the displacement field implies that .

• noisy interior displacements as in Equation (16), which give rise to a likelihood as in Equation (17) (upon discretization).

In this scenario, the unknown state variables consist of and . In addition one must also identify the constitutive model error hyper-parameters , which are effectively the connecting threads in the probabilistic web of relationships constructed - the same way constitutive laws are necessary to provide closure to the deterministic, PDE-based model. Rather than a hard enforcement though, they control the validity of the constitutive law at each interrogation point and must be learned, simultaneously with the other latent state variables. Clearly, such a problem is under-determined unless stress and strain data were simultaneously available. Critical to learning meaningful values for these hyperparameters is therefore the selection of an appropriate prior model. A vague such model would unnecessarily relax the threads connecting the state variables in the constitutive model, resulting in a posterior that is also artificially vague. To address this ill-posedness, we invoke an additional postulate of parsimony, i.e. that the constitutive law is correct, unless strong evidence in the data/observables suggests differently. This suggests that, ceteris paribus, solutions with more constitutive law residuals in Equation (9) being zero, should be favored. To that end, we advocate the use of Automatic Relevance Determination (ARD, [65, 100]), which is a heavy-tailed, hierarchical prior model, independently for each of the following form:

 p(Λic|α0,β0)=Gamma(Λic;α0,β0)=Γ(α0)βα00Λα0−1ic\leavevmode\nobreak e−β0Λic. (30)

Very small values for (in the numerical investigations was used) promote robust sparsity patterns [12].

The combination of the aforementioned hierarchical prior with the augmented posterior of Equation (18) yields the following density (based on Bayes’ rule):

 p(Σ,C,Ui,Λ|uobs,Ub=u0)=p(uobs|Σ,C,Ui,Λ,Ub=u0)\leavevmode\nobreak p(Σ,C,Ui,Λ|Ub=u0)p(uobs)=p(uobs|Ui)\leavevmode\nobreak p(Σ,C,Ui|Λ,Ub=u0)\leavevmode\nobreak p(Λ)p(uobs), (31)

where is the aforementioned likelihood, is the ARD defined in Equation (30) (we omit dependence on for simplicity) and can be found from the augmented prior of Equation (15) as:

 p(Σ,C,Ui|Λ,Ub=u0)=p(Σ,C,Ui,Ub=u0|Λ)p(Ub=u0|Λ)=p(Σ,C,Ui,Ub=u0|Λ)∫p(σ,c,ui,Ub=u0|Λ)\leavevmode\nobreak dσ\leavevmode\nobreak dc\leavevmode\nobreak dui=πΛ(Σ,C,Ui,Ub=u0)∫πΛ(σ,c,ui,Ub=u0)\leavevmode\nobreak dσ\leavevmode\nobreak dc\leavevmode\nobreak dui. (32)

We discuss suitable inference tools based on Stochastic Variational Inference in the next section.

## 3 Variational Inference

In the previous section, a probabilistic reformulation of the forward and inverse problems for linear elastostatics was proposed. The solution of canonical such problems amounts to inference tasks with respect to appropriate densities such as the ones in Equation (29) and Equation (32), over the unknown state variables and model parameters. While these densities do not require calls to any black-box, PDE solver, they are analytically intractable. We advocate the use of Variational Inference (VI) [6, 11] techniques, which, in contrast to Monte Carlo-based ones, are approximate albeit highly efficient. Such methods have risen into prominence for probabilistic inference tasks in the machine learning community [53, 2, 98, 81]. They yield approximations to the target density, which are obtained by solving an optimization problem over a family of appropriately selected probability densities with the objective of minimizing the Kullback-Leibler divergence [21] (subsections 3.1 and 3.2). The success of such an approach hinges upon the selection of appropriate densities that have the capacity of providing good approximations while enabling efficient optimization with regards to their parameters (subsection 3.3). In the sequel we discuss the application of VI to the problems of interest and in the last subsection 3.4 present the proposed algorithms.

### 3.1 VI for forward problems

We first review the basic aspects of the method in the context of solving a canonical forward problem as described in section 2.3, which reduces to the density in Equation (29). For economy of notation we denote with the latent variables and by the given/observed ones. Hence, in this case and . Let also denote the given/observed value of , which according to Equation (29) is . The goal is to approximate the intractable density:

 p(Y|V=v,λ)=πλ(Y,V=v)∫πλ(Y,V=v)\leavevmode\nobreak dY. (33)

Let a family of approximating densities parametrized by . We seek to determine the best possible approximation to by:

 minϕKL(q(Y;\leavevmode\nobreak ϕ)\leavevmode\nobreak ||\leavevmode\nobreak p(Y|V=v,λ)) (34)

By employing Jensen’s inequality one can establish that:

 Extra open brace or missing close brace (35)

Hence, minimizing the KL-divergence is equivalent to maximizing the Evidence Lower Bound (ELBO, [13]) .

Based on the form of (Equation (27)) one can obtain the following expression for the ELBO:

 Ffor(q(y;\leavevmode\nobreak ϕ))=∫q(y;\leavevmode\nobreak ϕ)logπλ(y,V=v)q(y;\leavevmode\nobreak ϕ)\leavevmode\nobreak dy=Eq(y;ϕ)[logψ1(σ)]+Eq(y;ϕ)[logψ2(σ,ui,C=c,Ub=u0;λ)]\leavevmode\nobreak \leavevmode\nobreak +Eq(y;ϕ)[logψ4(σ)]+Eq(y;ϕ)[logψ5(ui)]+H[q(y;ϕ)]=Lfor(ϕ), (36)

where is the entropy of . The term corresponding to was omitted as it does not depend on . We discuss the form of the employed in subsection 3.3 and the associated algorithmic steps (Algorithm 1) as well as provide additional details for the derivatives of the objective with respect to in A.1.

### 3.2 VI for inverse problems

We consider now the application of Variational Inference for the solution of the inverse problem, as reformulated in section 2.3. This involves the augmented posterior of Equation (32). We denote again with the latent state variables which in this case consist of . The observed/known state variables entail only . Furthermore, the model parameters controlling constitutive model errors are unknown. Due to the undirected nature [99, 71, 91, 92] of the proposed graphical model and the dependence of the normalization constant on (Equation (28)), we adopt a hybrid strategy where the full posterior of is approximated and point-estimates, corresponding to the Maximum-A-Posteriori (MAP) value, for are computed.

In particular, we consider the marginal posterior of and seek to maximize it using a Variational Expectation-Maximization scheme [6] that is based on the following lower-bound:

 logp(λ|Ub=u0,uobs)=log∫p(y,λ|Ub=u0,uobs)\leavevmode\nobreak dy=log∫q(y;ϕ)p(y,λ|Ub=u0,uobs)q(y;ϕ)\leavevmode\nobreak dy≥∫q(y;ϕ)logp(y,λ|Ub=u0,uobs)q(y;ϕ)\leavevmode\nobreak dy(from Jensen's inequality)=Finv(q(y;ϕ),\leavevmode\nobreak λ), (37)

where is a density with respect to the latent state variables. It can be readily shown [11] that the optimal , for which the inequality above becomes an equality, corresponds to the posterior . The latter does not necessarily belong to the family of approximating densities selected. Nevertheless, it suggests the following iterative scheme, where one alternates (until convergence) between the steps:

• E(xpectation)-step: The model parameters are fixed, and is maximized with respect to (so as it approximates as close as possible ).

• M(aximization)-step: The parameters , and therefore , remain fixed and is maximized with respect to the parameters .

As with all Expectation-Maximization schemes [29, 73] several relaxations of the aforementioned steps are possible such as improving, rather than maximizing, during any of the steps, partial updates of the parameters, etc.

Before we present the algorithmic details, we look more closely at the terms involved in . In particular, based on Equations (31) and (32), we obtain:

 Finv(q(y;ϕ),\leavevmode\nobreak λ)=∫q(y;ϕ)logp(y,λ|Ub=u0,uobs)q(y;ϕ)\leavevmode\nobreak dy=∫q(y;ϕ)logp(uobs|Ui)\leavevmode\nobreak p(σ,c,ui|λ,Ub=u0)\leavevmode\nobreak p(λ)p(uobs)\leavevmode\nobreak q(y;ϕ)\leavevmode\nobreak dσ\leavevmode\nobreak dc\leavevmode\nobreak dui=∫q(y;ϕ)logp(uobs