Abstract
In this work we develop a Bayesian setting to infer unknown parameters in initial-boundary value problems related to linear parabolic partial differential equations. We realistically assume that the boundary data are noisy, for a given prescribed initial condition. We show how to derive the joint likelihood function for the forward problem, given some measurements of the solution field subject to Gaussian noise. Given Gaussian priors for the time-dependent Dirichlet boundary values, we analytically marginalize the joint likelihood using the linearity of the equation. Our hierarchical Bayesian approach is fully implemented in an example that involves the heat equation, where the thermal diffusivity is the unknown parameter. We assume that the thermal diffusivity parameter can be modeled a priori through a lognormal random variable or by means of a space-dependent stationary lognormal random field. Synthetic data are used to test the inference. We exploit the behavior of the non-normalized log posterior distribution of the thermal diffusivity. Then, we use the Laplace method to obtain an approximate Gaussian posterior and therefore avoid costly Markov Chain Monte Carlo computations. Expected information gains and predictive posterior densities for observable quantities are numerically estimated using the Laplace approximation for different experimental setups.
Keywords: Linear Parabolic PDEs, Noisy Boundary Parameters, Bayesian Inference, Heat Equation, Thermal Diffusivity.
Fabrizio Ruggeri (CNR-IMATI, Consiglio Nazionale delle Ricerche, Milano, Italy; fabrizio@mi.imati.cnr.it), Zaid Sawlan (corresponding author; CEMSE, King Abdullah University of Science and Technology, Thuwal, 23955-6900, KSA; zaid.sawlan@kaust.edu.sa), Marco Scavino (CEMSE, King Abdullah University of Science and Technology, Thuwal, 23955-6900, KSA; marco.scavino@kaust.edu.sa; and Instituto de Estadística (IESTA), Universidad de la República, Montevideo, Uruguay; mscavino@iesta.edu.uy) and Raul Tempone (CEMSE, King Abdullah University of Science and Technology, Thuwal, 23955-6900, KSA; raul.tempone@kaust.edu.sa)
A hierarchical Bayesian setting for an inverse problem in linear parabolic PDEs with noisy boundary conditions
1 Introduction
Parabolic partial differential equations model various important physical phenomena such as diffusion and heat transport. The solution of such equations propagates forward in time from an initial condition, given boundary conditions and equation coefficients. In applications, some equation coefficients can be unknown quantities that need to be estimated. In addition, exact initial and boundary conditions might not be known. One possible approach to estimate these unknowns is to solve the inverse problem given some information about the solution. Classical inversion methods for parabolic partial differential equations are introduced by Samarskii in Chapter 8 of their book.
In this work, we consider a Bayesian inversion problem to determine the coefficients of linear parabolic partial differential equations, under the assumption that noisy measurements are available in the interior of a domain of interest and for the unknown boundary conditions. A main novelty of our approach to solve the inverse problem relies on the assumption that the boundary parameters are unknown and modeled by means of adequate probability distributions. Subsequently, the contribution of the boundary parameters is marginalized out from the joint law with the unknown equation coefficients we want to infer, allowing the characterization of their posterior distribution. There are many advantages to a Bayesian approach. For example, it provides a solution along with a comprehensive measure of uncertainty given by the posterior distribution. Moreover, the prior available information can be easily incorporated in terms of elicited prior distributions (Ghosh). An important issue in this work is that Bayesian inversion is posed as a hierarchical process. Boundary conditions can therefore be treated as unknown parameters (Kaipio). Since we are only interested in estimating the equation coefficients, we eliminate those extra parameters by marginalization.
Bayesian inversion techniques for the heat equation have been discussed and implemented in some previous works. Kaipio provided a general Bayesian framework for inverse problems in heat transfer, classified according to the dominant mode of heat transfer. The authors address many issues regarding forward problems and their statistical analysis. The prior modeling is extensively discussed, as well as how to deal, in particular, with different sources of uncertainty. Heat flux reconstruction problems have been studied by Wang1; Wang2. When referring to the problem of parameter estimation in inverse heat conduction problems, Wang and Zabaras showed how to infer the thermal conductivity using a hierarchical Bayesian framework, on the basis of temperature readings within a conducting solid, assuming that the heat flux on the boundary and the heat source are known. They also explored the high-dimensional posterior state space by means of Markov Chain Monte Carlo simulation. Lanz estimated the thermal conductivity of a polymer, transforming the heat equation into a stochastic differential equation and considering the Euler-Maruyama approximation to get the likelihood, introducing latent observations in space and then using a relatively cumbersome Markov Chain Monte Carlo method. Fudym1 and Fudym2 addressed the estimation problem of the thermal diffusivity, as in the present work, which is a parameter that describes a thermophysical property of materials. In their works, a large number of temperature measurements is made by an infrared camera, with fine spatial resolution and high frequency. They solved one- and two-dimensional forward problems for transient heat conduction, with spatially varying thermal conductivity and volumetric heat capacity, by finite differences, according to a nodal strategy.
The parameter vector at each node is then estimated either by minimizing an a posteriori objective function when prior Gaussian distributions are assumed for the parameters, or by means of Markov Chain Monte Carlo methods for different prior distributions.
This work is organized as follows. In Section 2, we introduce the statistical setting and we derive the explicit form of the joint law of the unknown equation coefficients and the boundary parameters. In Section 3, we use a finite element scheme in order to write the solution of the forward problem as a linear function of the boundary conditions. We demonstrate in Section 4 that, under certain conditions, an exact marginalization can be carried out, yielding a closed formula for the marginal likelihood of the equation coefficients. In Section 5, we apply our marginalization technique to estimating the thermal diffusivity in the one-dimensional heat equation in two cases, given simulated temperature data. Numerical results are obtained for the non-normalized log posterior distribution of the thermal diffusivity. We model prior knowledge about the thermal diffusivity first as a lognormal random variable and then using a lognormal random field with a squared exponential (SE) covariance function (Rasmussen). In the first case, we use the Laplace method to provide an approximate Gaussian posterior distribution for the thermal diffusivity. This method is then applied to obtain fast estimations of the information gain and the expected information gain under three experimental setups, and the predictive posterior mean of the temperature is also derived using the inferred thermal diffusivity. In the second case, where the thermal diffusivity is allowed to depend on the spatial variable, the Laplace approximation is used to obtain the posterior distribution of the hyperparameters that characterize the prior distribution of the thermal diffusivity.
2 Statistical setting and preliminary results
In this section we introduce the statistical model associated with the forward initial-boundary value problems for linear parabolic partial differential equations. We then derive, under mild assumptions, the exact expression for the joint likelihood function of the unknown parameters in the parabolic equation and the unknown boundary parameters.
Consider the deterministic one-dimensional parabolic initial-boundary value problem:
\begin{cases}\partial_{t}T+L_{\boldsymbol{\theta}}T=0,&x\in(x_{L},x_{R}),\,0<t\leqslant t_{N}<\infty\\ T(x_{L},t)=T_{L}(t),&t\in[0,t_{N}]\\ T(x_{R},t)=T_{R}(t),&t\in[0,t_{N}]\\ T(x,0)=g(x),&x\in(x_{L},x_{R})\,,\end{cases}  (1)
where L_{\boldsymbol{\theta}} is a linear secondorder partial differential operator that takes the form
L_{\boldsymbol{\theta}}T=-\partial_{x}(a(x)\partial_{x}T)+b(x)\partial_{x}T+c(x)T,
\boldsymbol{\theta}(x)=(a(x),b(x),c(x))^{tr}, and the partial differential operator \partial_{t}+L_{\boldsymbol{\theta}} is parabolic, because (Evans:1998, [p.372]) there exists \epsilon such that a(x)\geqslant\epsilon>0 for all x\in(x_{L},x_{R}). We also assume that

P1. a, b and c are bounded functions.

P2. T_{L}, T_{R} and g are square integrable functions.

P3. The initial condition, g, is consistent with the boundary functions, namely g(x_{L})=T_{L}(0) and g(x_{R})=T_{R}(0).
Then, under the assumptions P1-P3, there exists a unique weak solution of (1) (Evans:1998, [pp.375-377]).
Our main objective is to provide a Bayesian solution to an inverse problem for \boldsymbol{\theta}, where we assume that

(i) \boldsymbol{\theta} is unknown, while the initial condition g in the initial-boundary value problem is known;

(ii) \boldsymbol{\theta} is allowed to vary with the spatial variable x.
Remark 2.1
In our Bayesian approach, we will assume later that the coefficient a(x) is a lognormal random variable or a lognormal random field. Therefore, a(x) will not be bounded as assumed in P1. However, it can be proved that there exists a unique solution of the stochastic parabolic initial-boundary value problem in the space L^{2}(\Omega,H^{1}). Such a proof can be found in Charrier:2012 for elliptic boundary value problems, but it can also be extended to parabolic initial-boundary value problems.
Given noisy readings of the function T(x,t) at the I+1 spatial locations, including the boundaries, x_{L}=x_{0},x_{1},x_{2},\ldots,x_{I-1},x_{I}=x_{R}, at each of the N times t_{1},t_{2},\ldots,t_{N}, we want to infer \boldsymbol{\theta} using a Bayesian approach. To determine the posterior distribution of \boldsymbol{\theta}, we first need to obtain the likelihood function of \boldsymbol{\theta}. The remainder of this section derives the joint likelihood function of \boldsymbol{\theta} and the boundary parameters. Let us introduce some convenient notation and assumptions: let \mathbf{Y_{n}}:=(Y_{0,n},\ldots,Y_{I,n})^{tr} denote the vector of observed readings at time t_{n}, and assume a statistical model with an additive Gaussian experimental noise \boldsymbol{\epsilon_{n}}; that is:
\mathop{\mathbf{Y_{n}}}\limits_{(I+1)\times 1}=\left[\begin{array}[]{c}T_{L}(t_{n})\\ T(x_{1},t_{n})\\ \vdots\\ T(x_{I-1},t_{n})\\ T_{R}(t_{n})\end{array}\right]+\boldsymbol{\epsilon_{n}},  (2)
where \boldsymbol{\epsilon_{n}}\overset{\textrm{i.i.d.}}{\sim}{\mathcal{N}}(\mathbf{0}_{I+1},\sigma^{2}\,\mathbf{I}_{I+1}) for some measurement error variance \sigma^{2}>0. The covariance matrix of \boldsymbol{\epsilon_{n}} is assumed equal to \sigma^{2}\,\mathbf{I}_{I+1} for simplicity; a general covariance matrix \Sigma_{\boldsymbol{\epsilon_{n}}} could be used as well, provided that the boundary measurement errors are independent of the interior measurement errors.
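To make the statistical model concrete, the following minimal sketch generates synthetic readings according to (2). The solution field T is replaced by a single decaying Fourier mode of the heat equation, an illustrative stand-in rather than the forward solver of the paper, and the variable names `Y_interior`, `Y_L`, `Y_R` are our choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Readings at I+1 spatial locations (boundaries included) and N times.
I, N, sigma = 10, 50, 0.01
x = np.linspace(0.0, 1.0, I + 1)   # x_0 = x_L, ..., x_I = x_R
t = np.linspace(0.02, 1.0, N)      # t_1, ..., t_N

# Stand-in for the exact solution T(x, t): one decaying Fourier mode of the
# heat equation with diffusivity theta (illustrative choice).
theta = 0.5
T = np.exp(-theta * np.pi**2 * t[:, None]) * np.sin(np.pi * x[None, :])

# Model (2): Y_n = T(., t_n) + eps_n, with eps_n ~ N(0, sigma^2 I_{I+1}), i.i.d. in n.
Y = T + sigma * rng.standard_normal((N, I + 1))

Y_interior = Y[:, 1:-1]          # Y_n^I, interior readings
Y_L, Y_R = Y[:, 0], Y[:, -1]     # Y_n^B, boundary readings
```

Each row of `Y` plays the role of one observation vector \mathbf{Y_{n}}.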
Also denote by \mathbf{Y_{n}^{I}}:=(Y_{1,n},\ldots,Y_{I-1,n})^{tr} the vector of observed data at the interior locations x_{1},x_{2},\ldots,x_{I-1}, and let \mathbf{Y_{n}^{B}}:=(Y_{L,n},Y_{R,n})^{tr} be the vector of observed data at the boundary locations x_{0},x_{I} at time t_{n}.
The density of \mathbf{Y_{n}^{I}} is derived as follows. First, consider the local-in-time problem, defined between consecutive measurement times, i.e.
\begin{cases}\partial_{t}T+L_{\boldsymbol{\theta}}T=0,&x\in(x_{L},x_{R}),\>t_{n-1}<t\leqslant t_{n},\\ T(x_{L},t)=T_{L}(t),&t\in[t_{n-1},t_{n}],\\ T(x_{R},t)=T_{R}(t),&t\in[t_{n-1},t_{n}],\\ T(x,t_{n-1})=\widehat{T}(x,t_{n-1}),&x\in(x_{L},x_{R})\,,\end{cases}  (3)
whose exact solution, denoted by \widehat{T}(\cdot,t_{n}), depends only on \boldsymbol{\theta}, \widehat{T}(\cdot,t_{n-1}) and the boundary values \left\{T_{L}(t),T_{R}(t)\right\}_{t\in(t_{n-1},t_{n})}. Finally, use the form of the statistical model (2) to obtain the density of \mathbf{Y_{n}^{I}}.
Lemma 2.2
Given the model (2), the probability density function of \mathbf{Y_{n}^{I}} is expressed as
\rho(\mathbf{Y_{n}^{I}}\mid\boldsymbol{\theta},\widehat{T}(\cdot,t_{n-1}),\left\{T_{L}(t),T_{R}(t)\right\}_{t\in(t_{n-1},t_{n})})=\frac{1}{(\sqrt{2\pi}\sigma)^{I-1}}\exp\left(-\frac{1}{2\sigma^{2}}\left\|\mathbf{R}_{t_{n}}\right\|^{2}_{\ell^{2}}\right),  (4)
where \mathop{{\mathbf{R}}_{t_{n}}}\limits_{(I-1)\times 1}:=(\widehat{T}(x_{1},t_{n})-Y_{1,n},\,\ldots,\,\widehat{T}(x_{I-1},t_{n})-Y_{I-1,n})^{tr} denotes the data residual vector at time t=t_{n}.
For illustration purposes and without loss of generality, assume now that the Dirichlet boundary condition functions, T_{L}(\cdot) and T_{R}(\cdot), are well approximated by piecewise linear continuous functions in the time partition \left\{t_{n}\right\}_{n=1,\ldots,N}.
In this way, only 2N parameters, say T_{L}(t_{n})=T_{L,n},\,T_{R}(t_{n})=T_{R,n}\,,\>n=1,2,\ldots,N\,, suffice to determine the boundary conditions, which are, in principle, infinite-dimensional parameters. Let LBC_{n} denote the time nodes that determine the local boundary conditions \left\{T_{L,n-1},T_{L,n},T_{R,n-1},T_{R,n}\right\}\,. Other interpolation schemes may be used as well.
Remark 2.3
Given the discretized Dirichlet boundary conditions introduced above, we can say that \widehat{T}(\cdot,t_{n}) depends only on \boldsymbol{\theta}, \widehat{T}(\cdot,t_{n-1}) and the boundary parameters LBC_{n}. Similarly, \widehat{T}(\cdot,t_{n-1}) depends on \boldsymbol{\theta}, \widehat{T}(\cdot,t_{n-2}) and the boundary parameters LBC_{n-1}. From this recursion, we can obtain
\rho(\mathbf{Y_{n}^{I}}\mid\boldsymbol{\theta},g,\left\{LBC_{j}\right\}_{j=1,\ldots,n})=\rho(\mathbf{Y_{n}^{I}}\mid\boldsymbol{\theta},\widehat{T}(\cdot,t_{n-1}),LBC_{n}).  (5)
Since the initial condition, g, is assumed to be known, it will be omitted in the rest of the paper.
Lemma 2.4
Given the model (2) and Lemma 2.2, the joint likelihood function of \boldsymbol{\theta} and the boundary parameters \left\{LBC_{n}\right\}_{n=1,\ldots,N} is given by
\displaystyle\rho(\mathbf{Y_{1}},\ldots,\mathbf{Y_{N}}\mid\boldsymbol{\theta},\left\{LBC_{n}\right\}_{n=1,\ldots,N})=\prod_{n=1}^{N}\frac{1}{(\sqrt{2\pi}\sigma)^{I-1}}\exp\left(-\frac{1}{2\sigma^{2}}\left\|\mathbf{R}_{t_{n}}\right\|^{2}_{\ell^{2}}\right)
\displaystyle\times\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(-\frac{1}{2\sigma^{2}}\left(T_{L,n}-Y_{L,n}\right)^{2}\right)\times\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(-\frac{1}{2\sigma^{2}}\left(T_{R,n}-Y_{R,n}\right)^{2}\right).  (6)

Observe that \mathbf{Y_{n}^{I}},\mathbf{Y_{n}^{B}} are conditionally independent given \boldsymbol{\theta},\widehat{T}(\cdot,t_{n-1}),LBC_{n}\,. Thus, we have
\displaystyle\rho(\mathbf{Y_{n}}\mid\boldsymbol{\theta},\widehat{T}(\cdot,t_{n-1}),LBC_{n})=\rho(\mathbf{Y_{n}^{I}},\mathbf{Y_{n}^{B}}\mid\boldsymbol{\theta},\widehat{T}(\cdot,t_{n-1}),LBC_{n})
\displaystyle=\rho(\mathbf{Y_{n}^{I}}\mid\boldsymbol{\theta},\widehat{T}(\cdot,t_{n-1}),LBC_{n})\times\rho(\mathbf{Y_{n}^{B}}\mid\boldsymbol{\theta},\widehat{T}(\cdot,t_{n-1}),LBC_{n})
\displaystyle=\rho(\mathbf{Y_{n}^{I}}\mid\boldsymbol{\theta},\widehat{T}(\cdot,t_{n-1}),LBC_{n})\times\rho(\mathbf{Y_{n}^{B}}\mid LBC_{n}),
since \mathbf{Y_{n}^{B}}\mid LBC_{n} depends on neither \boldsymbol{\theta} nor \widehat{T}(\cdot,t_{n-1}).
The joint likelihood function can then be written as
\rho(\mathbf{Y_{1}},\ldots,\mathbf{Y_{N}}\mid\boldsymbol{\theta},\left\{LBC_{n}\right\}_{n=1,\ldots,N})
=\rho(\mathbf{Y_{N}}\mid\boldsymbol{\theta},\left\{LBC_{n}\right\}_{n=1,\ldots,N},\mathbf{Y_{N-1}},\ldots,\mathbf{Y_{1}})\times\rho(\mathbf{Y_{1}},\ldots,\mathbf{Y_{N-1}}\mid\boldsymbol{\theta},\left\{LBC_{n}\right\}_{n=1,\ldots,N-1}) (since \mathbf{Y_{N}}\mid\boldsymbol{\theta},\left\{LBC_{n}\right\}_{n=1,\ldots,N} is independent of \mathbf{Y_{1}},\ldots,\mathbf{Y_{N-1}})
=\rho(\mathbf{Y_{N}}\mid\boldsymbol{\theta},\left\{LBC_{n}\right\}_{n=1,\ldots,N})\times\rho(\mathbf{Y_{1}},\ldots,\mathbf{Y_{N-1}}\mid\boldsymbol{\theta},\left\{LBC_{n}\right\}_{n=1,\ldots,N-1}) (using equation (5))
=\rho(\mathbf{Y_{N}}\mid\boldsymbol{\theta},\widehat{T}(\cdot,t_{N-1}),LBC_{N})\times\rho(\mathbf{Y_{1}},\ldots,\mathbf{Y_{N-1}}\mid\boldsymbol{\theta},\left\{LBC_{n}\right\}_{n=1,\ldots,N-1}) (iterating the previous arguments)
=\prod_{n=1}^{N}\rho(\mathbf{Y_{n}}\mid\boldsymbol{\theta},\widehat{T}(\cdot,t_{n-1}),LBC_{n})=\prod_{n=1}^{N}\rho(\mathbf{Y_{n}^{I}}\mid\boldsymbol{\theta},\widehat{T}(\cdot,t_{n-1}),LBC_{n})\times\rho(\mathbf{Y_{n}^{B}}\mid LBC_{n})\,.
Remark 2.5
A generalization of the joint likelihood can be obtained in the presence of serial correlation, that is, when \left\{\boldsymbol{\epsilon_{n}}\right\}_{n=1}^{N} are correlated in time.
Using the notation \mathbf{T}_{L}=(T_{L,1},\ldots,T_{L,N})^{tr}, \mathbf{T}_{R}=(T_{R,1},\ldots,T_{R,N})^{tr}, \mathbf{Y}_{L}=(Y_{L,1},\ldots,Y_{L,N})^{tr} and \mathbf{Y}_{R}=(Y_{R,1},\ldots,Y_{R,N})^{tr}, the joint likelihood function (6) can be written as
\displaystyle\rho(\mathbf{Y_{1}},\ldots,\mathbf{Y_{N}}\mid\boldsymbol{\theta},\mathbf{T}_{L},\mathbf{T}_{R})=(\sqrt{2\pi}\sigma)^{-N(I+1)}\exp\left(-\frac{1}{2\sigma^{2}}\sum_{n=1}^{N}\left\|\mathbf{R}_{t_{n}}\right\|^{2}_{\ell^{2}}\right)
\displaystyle\times\exp\left(-\frac{1}{2\sigma^{2}}\left[\left\|\mathbf{T}_{L}-\mathbf{Y}_{L}\right\|^{2}_{\ell^{2}}+\left\|\mathbf{T}_{R}-\mathbf{Y}_{R}\right\|^{2}_{\ell^{2}}\right]\right),  (7)
which is a suitable expression to derive later the marginal likelihood of \boldsymbol{\theta}.
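Expression (7) translates directly into code. The following sketch evaluates its logarithm given precomputed residual vectors; the function name and the array layout are our choices, not the paper's.

```python
import numpy as np

def log_joint_likelihood(R, T_L, T_R, Y_L, Y_R, sigma):
    """Log of the joint likelihood (7).

    R        : (N, I-1) array whose rows are the residual vectors R_{t_n}.
    T_L, T_R : (N,) boundary parameter vectors.
    Y_L, Y_R : (N,) observed boundary readings.
    """
    N, Im1 = R.shape
    I = Im1 + 1
    # Normalization: N(I+1) Gaussian factors in total.
    const = -N * (I + 1) * np.log(np.sqrt(2.0 * np.pi) * sigma)
    interior = -np.sum(R**2) / (2.0 * sigma**2)
    boundary = -(np.sum((T_L - Y_L)**2) + np.sum((T_R - Y_R)**2)) / (2.0 * sigma**2)
    return const + interior + boundary

# Tiny worked example (arbitrary numbers).
R = np.array([[0.1, -0.2], [0.0, 0.3]])
val = log_joint_likelihood(R, np.array([1.0, 1.1]), np.array([0.5, 0.4]),
                           np.array([0.9, 1.2]), np.array([0.55, 0.35]), 0.1)
```

Here N=2 and I=3, so the likelihood collects N(I+1)=8 Gaussian factors.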
The prior distributions for \boldsymbol{\theta} and the boundary parameters \mathbf{T}_{L},\mathbf{T}_{R} can be specified in different ways. Generally speaking, the prior distribution for \boldsymbol{\theta} should be proposed according to the physical properties described by the unknown parameters and taking into account available prior knowledge about the observed phenomena. As an example, it is known that the thermal conductivity of polymethyl methacrylate (plexiglas) is in the range 0.167-0.25 W/(m K). A uniform prior distribution for the thermal conductivity could be chosen if there is no preference among the values of the interval.
As for the boundary parameters, we may assume independent Gaussian prior distributions T_{L,n}\sim{\mathcal{N}}(\mu_{L,n},\sigma_{p}^{2}), T_{R,n}\sim{\mathcal{N}}(\mu_{R,n},\sigma_{p}^{2}),\>\>n=1,\ldots,N\,, for some prior variance \sigma_{p}^{2}>0, centered at least squares spline fits, say \mathop{\boldsymbol{\mu}_{L}}\limits_{N\times 1}=(\mu_{L,1},\ldots,\mu_{L,N})^{tr} and \mathop{\boldsymbol{\mu}_{R}}\limits_{N\times 1}=(\mu_{R,1},\ldots,\mu_{R,N})^{tr}, of the observed data \mathbf{Y}_{L} and \mathbf{Y}_{R}, respectively.
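One possible way to obtain such prior means is a smoothing-spline fit of the boundary readings. The sketch below uses SciPy's `UnivariateSpline` on hypothetical data; the cooling profile, noise level and smoothing parameter are illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(1)
N = 40
t = np.linspace(0.0, 1.0, N)

# Hypothetical noisy boundary readings Y_L (an exponentially cooling boundary).
Y_L = 80.0 * np.exp(-1.5 * t) + 0.5 * rng.standard_normal(N)

# Least-squares smoothing spline; its values at t_1..t_N give the prior means mu_L.
spline = UnivariateSpline(t, Y_L, k=3, s=N * 0.5**2)
mu_L = spline(t)

# Prior: T_{L,n} ~ N(mu_{L,n}, sigma_p^2), independent across n.
sigma_p = 1.0
```

The smoothing level `s` is matched to the assumed noise variance so that the fit tracks the trend rather than the noise.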
We claim that the data residual vector \mathbf{R}_{t_{n}} can be written as a linear function of \mathbf{T}_{L} and \mathbf{T}_{R}. The next section is devoted to the proof of this basic result. This proof allows for the exact marginalization of the contribution of the nuisance boundary parameters from the joint likelihood function (7).
3 Numerical Approximation
In this section, our goal is to approximate the residual vector \mathbf{R}_{t_{n}} as a linear function of the boundary conditions. After reformulating the main problem (1), as described in the next lemma, we will introduce its weak form; the finite element method will then be applied to provide a numerical approximation of the solution of the weak problem.
Lemma 3.1
The solution of (1) can be written in the form
T(x,t)=T_{L}(t)\frac{x_{R}-x}{x_{R}-x_{L}}+T_{R}(t)\frac{x-x_{L}}{x_{R}-x_{L}}+u(x,t),  (8)
where u solves a new initialboundary value problem with homogeneous Dirichlet boundary conditions:
\begin{cases}\partial_{t}u+L_{\boldsymbol{\theta}}u=f(x,t),&x\in(x_{L},x_{R}),\,0<t\leqslant t_{N}<\infty\\ u(x_{L},t)=0,&t\in[0,t_{N}]\\ u(x_{R},t)=0,&t\in[0,t_{N}]\\ u(x,0)=g^{0}(x),&x\in(x_{L},x_{R})\end{cases}  (9)
and
f(x,t)=-\left(\partial_{t}+L_{\boldsymbol{\theta}}\right)\left[T_{L}(t)\frac{x_{R}-x}{x_{R}-x_{L}}\right]-\left(\partial_{t}+L_{\boldsymbol{\theta}}\right)\left[T_{R}(t)\frac{x-x_{L}}{x_{R}-x_{L}}\right]\,,
g^{0}(x)=g(x)-T_{L}(0)\frac{x_{R}-x}{x_{R}-x_{L}}-T_{R}(0)\frac{x-x_{L}}{x_{R}-x_{L}}\,.
We now introduce the weak formulation of problem (9).
Find u(t)\in V=H_{0}^{1}(x_{L},x_{R}), t\in(0,t_{N}) such that:
\begin{cases}\int_{x_{L}}^{x_{R}}\partial_{t}u(t)\,v\,dx+B\left(u(t),v\right)=\int_{x_{L}}^{x_{R}}f(t)\,v\,dx,&\forall v\in V,\,t\in(0,t_{N}),\\ u(0)=g^{0},&x\in(x_{L},x_{R}),\end{cases}  (10)
where B(u,v)=\int_{x_{L}}^{x_{R}}[a(x)\partial_{x}u\>\partial_{x}v+b(x)\partial_{x}u% \>v+c(x)uv]\,dx and H_{0}^{1}(x_{L},x_{R}) is the closure of the space C^{1}_{c}(x_{L},x_{R}) of continuously differentiable functions with compact support on (x_{L},x_{R}) with respect to the H^{1}norm (claes, [p.149]).
Given a uniform mesh x_{L}=x_{0}<\ldots<x_{k}=x_{L}+k\Delta x<\ldots<x_{I}=x_{R} of the spatial domain (x_{L},x_{R}), we apply the finite element method with piecewise linear basis functions (hat functions) \{\phi_{k}\}_{k=1}^{I-1} to approximate the weak solution u(x,t) of (9) as a linear combination of the basis functions:
u(x,t)\approx u_{\Delta x}(x,t)=\sum_{k=1}^{I-1}u_{k}(t)\phi_{k}(x)\,,\>\>0<t\leqslant t_{N}\,,
and we get
\sum_{k=1}^{I-1}\partial_{t}u_{k}(t)\int_{x_{L}}^{x_{R}}\phi_{k}\phi_{j}\,dx+\sum_{k=1}^{I-1}u_{k}(t)B\left(\phi_{k},\phi_{j}\right)=\int_{x_{L}}^{x_{R}}f(t)\phi_{j}\,dx,\quad j=1,\ldots,I-1,\,t\in(0,t_{N})\,.  (11)
Let \mathbf{u}(t)=(u_{1}(t),\ldots,u_{I-1}(t))^{tr}; then we can write the linear system of ODEs (11) in matrix form as:
\displaystyle M\partial_{t}\mathbf{u}(t)+S_{\boldsymbol{\theta}}\mathbf{u}(t)=\mathbf{f}(t),\,\>\>0<t\leqslant t_{N}\,,  (12)
where M is the mass matrix, S_{\boldsymbol{\theta}} is the stiffness matrix and \mathbf{f}(t) is the load vector.
We now consider a uniform time discretization of (0,t_{N}) such that t_{0}=0<\ldots<t_{n}=n\Delta t<\ldots<t_{N}=N\Delta t and denote \mathbf{u}(t_{n})=\mathbf{u}_{n} and \mathbf{f}(t_{n})=\mathbf{f}_{n}. We apply the backward Euler method to equation (12) to obtain the fully discrete analogue of (10), which takes the form
\begin{cases}\left(M+\Delta tS_{\boldsymbol{\theta}}\right)\mathbf{u}_{n+1}=M\mathbf{u}_{n}+\Delta t\mathbf{f}_{n+1}\,,&n=0,\ldots,N-1,\\ M\mathbf{u}_{0}=\mathbf{g}^{0},\end{cases}  (13)
where \mathbf{g}^{0}=\left(\int_{x_{L}}^{x_{R}}g^{0}\phi_{1}\,dx,\ldots,\int_{x_{L}}^{x_{R}}g^{0}\phi_{I-1}\,dx\right)^{tr}.
Theorem 3.2
The approximation of the weak solution of (9) can be written as a linear function of the initial and boundary conditions:
\mathbf{u}_{n}=A_{n}(\boldsymbol{\theta})\mathbf{u}_{0}+\tilde{A}_{L,n}(\boldsymbol{\theta}){\mathbf{T}}_{L}+\tilde{A}_{R,n}(\boldsymbol{\theta}){\mathbf{T}}_{R},  (14)
where \mathbf{u}_{n}=(u_{1,n},\ldots,u_{I-1,n})^{tr}, \mathbf{T}_{L}=(T_{L,1},\ldots,T_{L,N})^{tr}, \mathbf{T}_{R}=(T_{R,1},\ldots,T_{R,N})^{tr}, and the matrices A_{n}(\boldsymbol{\theta}),\tilde{A}_{L,n}(\boldsymbol{\theta}) and \tilde{A}_{R,n}(\boldsymbol{\theta}) are explicitly constructed in the proof.

See Appendix A.
Theorem 3.3
The approximation of the weak solution of (1) can be written as a linear function of the initial and boundary conditions:
\mathbf{T}_{n}=\mathbf{B}^{n}\mathbf{T}_{0}+A_{L,n}(\boldsymbol{\theta}){\mathbf{T}}_{L}+A_{R,n}(\boldsymbol{\theta}){\mathbf{T}}_{R},  (15)
where \mathbf{T}_{n} is defined similarly to \mathbf{u}_{n} and the matrices \mathbf{B},A_{L,n}(\boldsymbol{\theta}) and A_{R,n}(\boldsymbol{\theta}) are explicitly constructed in the proof.

See Appendix B.
Corollary 3.4
The data residual vector \mathbf{R}_{t_{n}}=(\widehat{T}(x_{1},t_{n})-Y_{1,n},\,\ldots,\,\widehat{T}(x_{I-1},t_{n})-Y_{I-1,n})^{tr} is approximated by
\tilde{\mathbf{R}}_{t_{n}}=\left(\mathbf{B}^{n}\mathbf{T}_{0}-\mathbf{Y_{n}^{I}}\right)+A_{L,n}(\boldsymbol{\theta})\mathbf{T}_{L}+A_{R,n}(\boldsymbol{\theta})\mathbf{T}_{R}.  (16)
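Since the fully discrete solver is affine in its inputs, the matrices appearing in Theorem 3.3 and Corollary 3.4 can in practice be recovered by probing the solver with unit vectors, without writing them down explicitly. The sketch below illustrates this on a toy affine map; the helper name `probe_affine_map` and the toy matrices are our own.

```python
import numpy as np

def probe_affine_map(solver, dim_in):
    """Recover A and c such that solver(v) = A @ v + c for an affine solver.

    The constant part is c = solver(0); the columns of A are obtained by
    probing with the canonical unit vectors.
    """
    c = solver(np.zeros(dim_in))
    A = np.column_stack([solver(e) - c for e in np.eye(dim_in)])
    return A, c

# Illustrative stand-in for a discrete forward map (e.g., T_0 -> interior values).
B = np.array([[0.6, 0.2], [0.1, 0.7]])
b = np.array([0.5, -0.3])
A, c = probe_affine_map(lambda v: B @ v + b, 2)
```

Applied to the time-stepping scheme (13), the same probing recovers the blocks A_{L,n}(\boldsymbol{\theta}), A_{R,n}(\boldsymbol{\theta}) column by column.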
4 The marginal likelihood of \boldsymbol{\theta}
Given the observations \mathbf{Y_{1}},\ldots,\mathbf{Y_{N}}, we showed in Section 2 how to obtain the joint likelihood function of \boldsymbol{\theta} and the boundary parameters \mathbf{T}_{L},\mathbf{T}_{R}.
In the present section, we derive a convenient expression for the marginal likelihood of \boldsymbol{\theta}, under the assumption that the prior distributions for \mathbf{T}_{L} and \mathbf{T}_{R} are independent Gaussian:
\displaystyle\mathbf{T}_{L}\sim\mathcal{N}(\boldsymbol{\mu}_{L},\sigma_{p}^{2}\mathbf{I}_{N}),  (17)
\displaystyle\mathbf{T}_{R}\sim\mathcal{N}(\boldsymbol{\mu}_{R},\sigma_{p}^{2}\mathbf{I}_{N}),
where \boldsymbol{\mu}_{L} and \boldsymbol{\mu}_{R} are least squares spline fits of the observed data, \mathbf{Y}_{L} and \mathbf{Y}_{R}, respectively. Then, using (7), the marginal likelihood of \boldsymbol{\theta} is given by:
\displaystyle\rho(\mathbf{Y_{1}},\ldots,\mathbf{Y_{N}}\mid\boldsymbol{\theta})=(\sqrt{2\pi}\sigma)^{-N(I+1)}\times(\sqrt{2\pi}\sigma_{p})^{-2N}\int_{\mathcal{T}_{R}}\int_{\mathcal{T}_{L}}\exp\left(-\frac{1}{2\sigma^{2}}\sum_{n=1}^{N}\left\|\mathbf{R}_{t_{n}}\right\|^{2}_{\ell^{2}}\right)
\displaystyle\times\exp\left(-\frac{1}{2\sigma^{2}}(\mathbf{T}_{L}-\mathbf{Y}_{L})^{tr}(\mathbf{T}_{L}-\mathbf{Y}_{L})-\frac{1}{2\sigma^{2}}(\mathbf{T}_{R}-\mathbf{Y}_{R})^{tr}(\mathbf{T}_{R}-\mathbf{Y}_{R})\right)
\displaystyle\times\exp\left(-\frac{1}{2\sigma_{p}^{2}}(\mathbf{T}_{L}-\boldsymbol{\mu}_{L})^{tr}(\mathbf{T}_{L}-\boldsymbol{\mu}_{L})-\frac{1}{2\sigma_{p}^{2}}(\mathbf{T}_{R}-\boldsymbol{\mu}_{R})^{tr}(\mathbf{T}_{R}-\boldsymbol{\mu}_{R})\right)d\mathbf{T}_{L}\,d\mathbf{T}_{R}.  (18)
To provide the exact expression of the marginal likelihood of the linear parabolic equation coefficients \boldsymbol{\theta}, which has been implemented in the computational examples presented in Section 5, it is convenient to introduce the following notation:
\mathop{\Delta_{L}}\limits_{N\times N}=\sum_{n=1}^{N}A_{L,n}(\boldsymbol{\theta})^{tr}A_{L,n}(\boldsymbol{\theta})\,,\>\>\mathop{\Delta_{R}}\limits_{N\times N}=\sum_{n=1}^{N}A_{R,n}(\boldsymbol{\theta})^{tr}A_{R,n}(\boldsymbol{\theta})\,,
\mathop{\Delta_{2,L}}\limits_{N\times 1}=\sum_{n=1}^{N}A_{L,n}(\boldsymbol{\theta})^{tr}(\mathbf{Y_{n}^{I}}-\mathbf{B}^{n}\mathbf{T}_{0})\,,\>\>\mathop{\Delta_{2,R}}\limits_{N\times 1}=\sum_{n=1}^{N}A_{R,n}(\boldsymbol{\theta})^{tr}(\mathbf{Y_{n}^{I}}-\mathbf{B}^{n}\mathbf{T}_{0})\,,
\mathop{A_{LR}}\limits_{N\times N}=\sum_{n=1}^{N}A_{L,n}(\boldsymbol{\theta})^{tr}A_{R,n}(\boldsymbol{\theta})\,,\>\>\mathop{D_{\sigma^{2}}}\limits_{N\times N}=\textrm{diag}\left(\frac{1}{\sigma^{2}},\ldots,\frac{1}{\sigma^{2}}\right)\,,\>\>\mathop{D_{\sigma_{p}^{2}}}\limits_{N\times N}=\textrm{diag}\left(\frac{1}{\sigma_{p}^{2}},\ldots,\frac{1}{\sigma_{p}^{2}}\right)\,.
Theorem 4.1
The marginal likelihood of \boldsymbol{\theta} is given by:
\rho(\mathbf{Y_{1}},\ldots,\mathbf{Y_{N}}\mid\boldsymbol{\theta})=(\sqrt{2\pi}\sigma)^{-N(I+1)}(\sqrt{2\pi}\sigma_{p})^{-2N}(2\pi)^{N/2}|\Lambda_{0}|^{1/2}(2\pi)^{N/2}|\Lambda_{1}|^{1/2}
\times\exp\Bigg{\{}-\frac{1}{2\sigma_{p}^{2}}\left[{\boldsymbol{\mu}_{L}}^{tr}{\boldsymbol{\mu}_{L}}+{\boldsymbol{\mu}_{R}}^{tr}{\boldsymbol{\mu}_{R}}\right]-\frac{1}{2\sigma^{2}}\left[\mathbf{Y}_{L}^{tr}\mathbf{Y}_{L}+\mathbf{Y}_{R}^{tr}\mathbf{Y}_{R}+\sum_{n=1}^{N}(\mathbf{Y_{n}^{I}}-\mathbf{B}^{n}\mathbf{T}_{0})^{tr}(\mathbf{Y_{n}^{I}}-\mathbf{B}^{n}\mathbf{T}_{0})\right]
+\frac{1}{2}({\boldsymbol{\mu}_{L}}^{tr}D_{\sigma_{p}^{2}}+\mathbf{Y}_{L}^{tr}D_{\sigma^{2}}+\Delta_{2,L}^{tr}D_{\sigma^{2}})\Lambda_{0}(D_{\sigma_{p}^{2}}{\boldsymbol{\mu}_{L}}+D_{\sigma^{2}}\mathbf{Y}_{L}+D_{\sigma^{2}}\Delta_{2,L})
+\frac{1}{2}\left[\mathbf{t}^{tr}_{R,2}\Lambda_{1}\mathbf{t}_{R,2}-2\mathbf{t}^{tr}_{R,3}\Lambda_{1}\mathbf{t}_{R,2}+\mathbf{t}^{tr}_{R,3}\Lambda_{1}\mathbf{t}_{R,3}\right]\Bigg{\}},  (19)
where \Lambda_{0},\Lambda_{1},\mathbf{t}_{R,2} and \mathbf{t}_{R,3} are independent of \mathbf{T}_{L} and \mathbf{T}_{R}.

First, we observe that (18) can be written as:
(\sqrt{2\pi}\sigma)^{-N(I+1)}\times(\sqrt{2\pi}\sigma_{p})^{-2N}\times\exp\Bigg{\{}-\frac{1}{2\sigma^{2}_{p}}\big{[}{\boldsymbol{\mu}_{L}}^{tr}{\boldsymbol{\mu}_{L}}+{\boldsymbol{\mu}_{R}}^{tr}{\boldsymbol{\mu}_{R}}\big{]}-\frac{1}{2\sigma^{2}}\left[\mathbf{Y}_{L}^{tr}\mathbf{Y}_{L}+\mathbf{Y}_{R}^{tr}\mathbf{Y}_{R}+\sum_{n=1}^{N}(\mathbf{Y_{n}^{I}}-\mathbf{B}^{n}\mathbf{T}_{0})^{tr}(\mathbf{Y_{n}^{I}}-\mathbf{B}^{n}\mathbf{T}_{0})\right]\Bigg{\}}
\times\int_{\mathcal{T}_{R}}\int_{\mathcal{T}_{L}}\exp\Bigg{\{}-\frac{1}{2}\left[\mathbf{T}_{L}^{tr}(D_{\sigma^{2}}+D_{\sigma_{p}^{2}}+\frac{1}{\sigma^{2}}\Delta_{L})\mathbf{T}_{L}-2({\boldsymbol{\mu}_{L}}^{tr}D_{\sigma_{p}^{2}}+\mathbf{Y}_{L}^{tr}D_{\sigma^{2}}+\Delta_{2,L}^{tr}D_{\sigma^{2}})\mathbf{T}_{L}\right.
\left.+\frac{2}{\sigma^{2}}\mathbf{T}_{R}^{tr}A_{LR}^{tr}\mathbf{T}_{L}+\mathbf{T}_{R}^{tr}(D_{\sigma^{2}}+D_{\sigma_{p}^{2}}+\frac{1}{\sigma^{2}}\Delta_{R})\mathbf{T}_{R}-2({\boldsymbol{\mu}_{R}}^{tr}D_{\sigma_{p}^{2}}+\mathbf{Y}_{R}^{tr}D_{\sigma^{2}}+\Delta_{2,R}^{tr}D_{\sigma^{2}})\mathbf{T}_{R}\right]\Bigg{\}}\,d\mathbf{T}_{L}\,d\mathbf{T}_{R}\,.
To marginalize \mathbf{T}_{L} and \mathbf{T}_{R}, we assume that they are independent Gaussian random vectors. We can then use the following standard result: if \mathbf{X}\sim{\mathcal{N}}_{p}(\boldsymbol{\mu},\mathbf{\Sigma}), then E(\exp(\mathbf{t}^{tr}\mathbf{X}))=\exp(\mathbf{t}^{tr}\boldsymbol{\mu}+\frac{1}{2}\mathbf{t}^{tr}\mathbf{\Sigma}\mathbf{t})\,.
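The moment-generating-function identity quoted above is easy to check numerically. The sketch below compares both sides in two dimensions by quadrature; the mean, covariance and tilt vector are arbitrary illustrative values.

```python
import numpy as np
from scipy.integrate import dblquad
from scipy.stats import multivariate_normal

# Check E[exp(t' X)] = exp(t' mu + 0.5 t' Sigma t) for X ~ N_2(mu, Sigma).
mu = np.array([0.3, -0.1])
Sigma = np.array([[0.5, 0.2], [0.2, 0.4]])
t = np.array([0.7, -0.4])

pdf = multivariate_normal(mu, Sigma).pdf

# Left-hand side: integrate exp(t'x) against the Gaussian density.
lhs, _ = dblquad(lambda y, x: np.exp(t @ np.array([x, y])) * pdf([x, y]),
                 -8.0, 8.0, lambda x: -8.0, lambda x: 8.0)

# Right-hand side: the closed form used in the marginalization.
rhs = np.exp(t @ mu + 0.5 * t @ Sigma @ t)
```

The truncation of the integration domain at plus or minus 8 is harmless here, since the tilted Gaussian mass is concentrated near the origin.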
Therefore, by integrating first with respect to \mathbf{T}_{L}, the marginal likelihood of \boldsymbol{\theta} and \mathbf{T}_{R} is proportional to the product of a factor that is independent of \mathbf{T}_{L} and the following term
\int_{\mathcal{T}_{L}}\exp\left\{-\frac{1}{2}\mathbf{T}_{L}^{tr}\left(D_{\sigma^{2}}+D_{\sigma_{p}^{2}}+\frac{1}{\sigma^{2}}\Delta_{L}\right)\mathbf{T}_{L}\right\}\,\exp(\mathbf{t}^{tr}_{L,1}\mathbf{T}_{L})\,d\mathbf{T}_{L}\,, where
\mathop{\mathbf{t}^{tr}_{L,1}}\limits_{1\times N}=({\boldsymbol{\mu}_{L}}^{tr}D_{\sigma_{p}^{2}}+\mathbf{Y}_{L}^{tr}D_{\sigma^{2}}+\Delta_{2,L}^{tr}D_{\sigma^{2}})-\frac{1}{\sigma^{2}}\mathbf{T}_{R}^{tr}A_{LR}^{tr}\,. It is now convenient to define \Lambda_{0}^{-1}:=(D_{\sigma^{2}}+D_{\sigma_{p}^{2}}+\frac{1}{\sigma^{2}}\Delta_{L}). The marginal likelihood of \boldsymbol{\theta} and \mathbf{T}_{R} is proportional to the product of a factor that is independent of \mathbf{T}_{L} and the term (2\pi)^{N/2}|\Lambda_{0}|^{1/2}\exp\left\{\frac{1}{2}\mathbf{t}^{tr}_{L,1}\Lambda_{0}\mathbf{t}_{L,1}\right\}\,.
Therefore, the marginal likelihood of \boldsymbol{\theta} can be explicitly written as
\displaystyle(\sqrt{2\pi}\sigma)^{-N(I+1)}\times(\sqrt{2\pi}\sigma_{p})^{-2N}\times(2\pi)^{N/2}|\Lambda_{0}|^{1/2}
\displaystyle\times\exp\Bigg{\{}-\frac{1}{2\sigma_{p}^{2}}\big{(}{\boldsymbol{\mu}_{L}}^{tr}{\boldsymbol{\mu}_{L}}+{\boldsymbol{\mu}_{R}}^{tr}{\boldsymbol{\mu}_{R}}\big{)}-\frac{1}{2\sigma^{2}}\bigg{[}\mathbf{Y}_{L}^{tr}\mathbf{Y}_{L}+\mathbf{Y}_{R}^{tr}\mathbf{Y}_{R}+\sum_{n=1}^{N}(\mathbf{Y_{n}^{I}}-\mathbf{B}^{n}\mathbf{T}_{0})^{tr}(\mathbf{Y_{n}^{I}}-\mathbf{B}^{n}\mathbf{T}_{0})\bigg{]}\Bigg{\}}
\displaystyle\times\int_{\mathcal{T}_{R}}\exp\Bigg{\{}\frac{1}{2}\mathbf{t}^{tr}_{L,1}\Lambda_{0}\mathbf{t}_{L,1}-\frac{1}{2}\Bigg{[}\mathbf{T}_{R}^{tr}(D_{\sigma^{2}}+D_{\sigma_{p}^{2}}+\frac{1}{\sigma^{2}}\Delta_{R})\mathbf{T}_{R}-2({\boldsymbol{\mu}_{R}}^{tr}D_{\sigma_{p}^{2}}+\mathbf{Y}_{R}^{tr}D_{\sigma^{2}}+\Delta_{2,R}^{tr}D_{\sigma^{2}})\mathbf{T}_{R}\Bigg{]}\Bigg{\}}\,d\mathbf{T}_{R}\,.
The entire last expression is equal to the product of a term that is independent of \mathbf{T}_{R} and the following term:
\int_{\mathcal{T}_{R}}\exp\left\{\frac{1}{2}\mathbf{T}_{R}^{tr}\left[(D_{% \sigma^{2}}+D_{\sigma_{p}^{2}}+\frac{1}{\sigma^{2}}\Delta_{R})\left(\frac{1}{% \sigma^{2}}\right)^{2}A_{LR}^{tr}\Lambda_{0}A_{LR}\right]\mathbf{T}_{R}\right% \}\,\exp\left\{\mathbf{t}^{tr}_{R,1}\mathbf{T}_{R}\right\}d\mathbf{T}_{R}\,, where
\mathbf{t}^{tr}_{R,1}=\underbrace{({\boldsymbol{\mu}_{R}}^{tr}D_{\sigma_{p}^{2% }}+\mathbf{Y}_{R}^{tr}D_{\sigma^{2}}+\Delta_{2,R}^{tr}D_{\sigma^{2}})}_{% \mathbf{t}^{tr}_{R,2}}\underbrace{({\boldsymbol{\mu}_{L}}^{tr}D_{\sigma_{p}^{% 2}}+\mathbf{Y}_{L}^{tr}D_{\sigma^{2}}+\Delta_{2,L}^{tr}D_{\sigma^{2}})\Lambda_% {0}A_{LR}}_{\mathbf{t}^{tr}_{R,3}}\,. If we now define \Lambda_{1}^{1}:=(D_{\sigma^{2}}+D_{\sigma_{p}^{2}}+\frac{1}{\sigma^{2}}% \Delta_{R})\left(\frac{1}{\sigma^{2}}\right)^{2}A_{LR}^{tr}\Lambda_{0}A_{LR} and integrate with respect to \mathbf{T}_{R}, we have
\displaystyle\int_{\mathcal{T}_{R}}\exp\left\{\frac{1}{2}\mathbf{T}_{R}^{tr}% \Lambda_{1}^{1}\mathbf{T}_{R}\right\}\exp(\mathbf{t}^{tr}_{R,1}\mathbf{T}_{R}% )d\mathbf{T}_{R}=(2\pi)^{N/2}\Lambda_{1}^{1/2}\exp\left\{\frac{1}{2}\mathbf{% t}^{tr}_{R,1}\Lambda_{1}\mathbf{t}_{R,1}\right\} \displaystyle=(2\pi)^{N/2}\Lambda_{1}^{1/2}\exp\left\{\frac{1}{2}(\mathbf{t}% ^{tr}_{R,2}+\mathbf{t}^{tr}_{R,3})\Lambda_{1}(\mathbf{t}_{R,2}+\mathbf{t}_{R,3% })\right\}, after evaluating the integral. We finally obtain (19).
5 A Bayesian inference for thermal diffusivity
In this section, we implement our Bayesian approach to infer the thermal diffusivity \theta, an unknown parameter that appears in the heat equation and measures the rate of heat propagation through a material (Wilson). Temperature data are available from cooling experiments; synthetic data are used to carry out the inference.
Consider the heat equation (one–dimensional diffusion equation for T(x,t)):
\begin{cases}\partial_{t}T-\partial_{x}\left(\theta(x)\partial_{x}T\right)=0,&x\in(x_{L},x_{R}),\,0<t\leqslant t_{N}<\infty\\ T(x_{L},t)=T_{L}(t),&t\in[0,t_{N}]\\ T(x_{R},t)=T_{R}(t),&t\in[0,t_{N}]\\ T(x,0)=g(x),&x\in(x_{L},x_{R}).\end{cases}  (20) 
We want to infer the thermal diffusivity, \theta(x), using a Bayesian approach when the temperature is measured at I+1 locations, x_{0}=x_{L},x_{1},x_{2},\ldots,x_{I-1},x_{I}=x_{R}, at each of the N times, t_{1},t_{2},\ldots,t_{N}. Clearly, this problem is a special case of (1) where L_{\boldsymbol{\theta}}T=\partial_{x}\left(\theta(x)\partial_{x}T\right) and \theta(x)>0. We can therefore immediately obtain the non-normalized posterior distribution of \theta using the marginal likelihood (19).
The prior distributions for \theta can be specified in different ways. In this section we will consider two cases, when \theta is a lognormal random variable and when \theta depends on the space variable x and is modeled by means of a lognormal random field. We focus on the second case in Subsection 5.3. We start by discussing the case where the thermal diffusivity prior is independent of x.
If we consider a lognormal prior \log{\theta}\sim\mathcal{N}\left(\nu,\tau\right), where \nu\in\mathbb{R} and \tau>0, then the non-normalized posterior distribution of \theta is given by
\rho_{\nu,\tau}(\theta\mid\mathbf{Y_{1}},\ldots,\mathbf{Y_{N}})\propto\frac{1}{\sqrt{2\pi}\,\theta\tau}\exp\left(-\frac{(\log\theta-\nu)^{2}}{2\tau^{2}}\right)\rho(\mathbf{Y_{1}},\ldots,\mathbf{Y_{N}}\mid\theta).  (21) 
The posterior distribution of \theta can be approximated by Laplace’s method (Ghosh, Chapter 4) to obtain a Gaussian posterior
\rho_{\nu,\tau}(\theta\mid\mathbf{Y_{1}},\ldots,\mathbf{Y_{N}})\approx\frac{1}{\sqrt{2\pi\left|H(\hat{\theta})^{-1}\right|}}\exp\left\{\frac{1}{2}(\theta-\hat{\theta})^{tr}H(\hat{\theta})(\theta-\hat{\theta})\right\} 
where \hat{\theta} is the maximum a posteriori probability (MAP) estimate and H(\hat{\theta}) is the (negative definite) Hessian of the log posterior evaluated at \hat{\theta}.
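In one dimension, Laplace’s method amounts to locating the MAP estimate and matching the curvature of the log posterior there. The following minimal sketch illustrates this; the grid, the finite-difference step, and the exactly Gaussian toy log posterior are illustrative assumptions, not part of the paper’s setup:

```python
import numpy as np

def laplace_approximation(log_post, theta_grid):
    """Gaussian (Laplace) approximation of a 1-D posterior.

    log_post   : callable, non-normalized log posterior.
    theta_grid : 1-D array scanned to locate the MAP estimate.
    Returns (theta_hat, sigma_hat), the mean and std dev of the Gaussian.
    """
    vals = np.array([log_post(t) for t in theta_grid])
    theta_hat = theta_grid[np.argmax(vals)]            # MAP estimate
    h = 1e-4                                           # finite-difference step
    # Second derivative (Hessian) of the log posterior at the MAP.
    d2 = (log_post(theta_hat + h) - 2.0 * log_post(theta_hat)
          + log_post(theta_hat - h)) / h ** 2
    sigma_hat = np.sqrt(-1.0 / d2)                     # variance = (-H)^{-1}
    return theta_hat, sigma_hat

# Sanity check on an exactly Gaussian log posterior, where Laplace is exact.
m_true, s_true = 1.0, 0.05
log_post = lambda t: -0.5 * ((t - m_true) / s_true) ** 2
theta_hat, sigma_hat = laplace_approximation(log_post, np.linspace(0.5, 1.5, 2001))
```

For the Gaussian toy posterior, the method recovers the mean and standard deviation up to grid and floating-point error; in the examples below, `log_post` would instead be the non-normalized log posterior of \theta.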
To assess the behavior of our method, we introduce a synthetic dataset generated with constant \theta. Without loss of generality, let the time interval [0,t_{N}] be [0,1], with x_{L}=0 and x_{R}=1.
Dataset A
In order to generate data, we solve the initial-boundary value problem for the heat equation with Robin boundary conditions:
\displaystyle\partial_{x}T(x_{L},t)=\frac{h}{\kappa}\left(T(x_{L},t)-T_{out}\right)\,,\>\>t\in[0,1],
\displaystyle\partial_{x}T(x_{R},t)=\frac{h}{\kappa}\left(T_{out}-T(x_{R},t)\right)\,,\>\>t\in[0,1], 
and the initial condition T(x,0)=T_{0}\,,\>\>x\in(0,1), where \theta(x)=1\times 10^{-7}\,m^{2}/s, h is the convective heat transfer coefficient, \kappa denotes the thermal conductivity, \frac{h}{\kappa}=1\,(1/m), T_{out}=20\,^{\circ}\mathrm{C} and T_{0}=100\,^{\circ}\mathrm{C} (see Figure 1).
A synthetic dataset (hereafter named dataset A) is generated, with a measurement standard error noise of \sigma_{d}=0.56.
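The data-generating mechanism can be sketched as follows: discretize the heat equation with backward Euler in time, impose the Robin conditions through one-sided differences at the two boundaries, and perturb the computed temperatures with i.i.d. Gaussian noise. The grid sizes, time step, and the nondimensional value \theta=1 below are illustrative assumptions (the physical value 10^{-7}\,m^{2}/s corresponds to suitably rescaled units), not the paper’s exact solver:

```python
import numpy as np

def generate_dataset(theta=1.0, hk=1.0, T_out=20.0, T0=100.0,
                     I=6, n_steps=50, t_end=0.5, sigma_d=0.56, seed=0):
    """Backward-Euler solve of the cooling problem with Robin boundaries,
    plus additive Gaussian measurement noise (a sketch of dataset A)."""
    dx, dt = 1.0 / I, t_end / n_steps
    lam = theta * dt / dx ** 2
    T = np.full(I + 1, T0)                    # initial condition T(x,0) = T0
    A = np.zeros((I + 1, I + 1))
    # One-sided Robin rows: dT/dx = hk (T - T_out) at x_L, the mirror at x_R.
    A[0, 0], A[0, 1] = -1.0 / dx - hk, 1.0 / dx
    A[I, I], A[I, I - 1] = 1.0 / dx + hk, -1.0 / dx
    for i in range(1, I):                     # implicit interior rows
        A[i, i - 1], A[i, i], A[i, i + 1] = -lam, 1.0 + 2.0 * lam, -lam
    rng = np.random.default_rng(seed)
    snapshots = []
    for _ in range(n_steps):
        b = np.empty(I + 1)
        b[0] = -hk * T_out                    # Robin row, left boundary
        b[I] = hk * T_out                     # Robin row, right boundary
        b[1:I] = T[1:I]
        T = np.linalg.solve(A, b)
        snapshots.append(T + rng.normal(0.0, sigma_d, I + 1))   # noisy record
    return np.array(snapshots), T

data, T_final = generate_dataset()
```

The noiseless solution decays monotonically from T_{0}=100 toward T_{out}=20, and the recorded `data` plays the role of the noisy temperature measurements \mathbf{Y_{1}},\ldots,\mathbf{Y_{N}}.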
Before presenting the implementation of our novel technique, we will show how the Bayesian method works, using the joint likelihood (7), under the very restrictive assumption that the temperature values at the boundaries are exactly known.
5.1 Example 1
Suppose that the thermal diffusivity, \theta, is a random variable with a lognormal prior, \log{\theta}\sim\mathcal{N}\left(\nu,\tau\right). In this case, the non-normalized posterior density for \theta is given by
\rho_{\nu,\tau}(\theta\mid\mathbf{Y_{1}},\ldots,\mathbf{Y_{N}})\propto\frac{1}{\sqrt{2\pi}\,\theta\tau}\exp\left(-\frac{(\log\theta-\nu)^{2}}{2\tau^{2}}\right)\exp\left(-\frac{1}{2\sigma^{2}}\sum_{n=1}^{N}\left\|{\mathbf{R}}_{t_{n}}\right\|^{2}_{\ell^{2}}\right), 
where {\mathbf{R}}_{t_{n}} can be evaluated exactly because the boundary data are known.
Given that \theta is a lognormal random variable with \nu=\tau=0.1, the resulting posterior depends on \sigma and on the number of observations, N, used to compute the log-likelihood. Figure 2 shows the behavior of the log-likelihood and the log-posterior for \theta using different values of \sigma and N.
We then use the Laplace approximation to derive the approximate Gaussian posterior density for \theta. The prior and posterior densities for \theta are presented in Figure 3, which shows that we obtain a Gaussian posterior, with mean 1.0025 and standard deviation 0.0044, concentrated around the true value of the parameter despite the very broad prior. We are now in a position to extend our implementation to the case where the temperature values at the boundaries are unknown parameters as well.
5.2 Example 2
In this example, we consider again \theta as a random variable with a lognormal prior, \log{\theta}\sim\mathcal{N}\left(\nu,\tau\right). Unlike in Example 1, we assume noisy boundary measurements and a Gaussian prior distribution for the boundary parameters as in (17). Therefore, the non-normalized posterior density for \theta is given by
\rho_{\nu,\tau}(\theta\mid\mathbf{Y_{1}},\ldots,\mathbf{Y_{N}})\propto\frac{1}{\sqrt{2\pi}\,\theta\tau}\exp\left(-\frac{(\log\theta-\nu)^{2}}{2\tau^{2}}\right)\rho(\mathbf{Y_{1}},\ldots,\mathbf{Y_{N}}\mid\theta), 
where \rho(\mathbf{Y_{1}},\ldots,\mathbf{Y_{N}}\mid\theta) is the marginal likelihood of \theta defined in Theorem 4.1.
Remark 5.1
Since \theta(x) is assumed to be constant, equation (15) can alternatively be obtained by solving the heat equation with finite differences (see Appendix C).
Numerical results are now presented using the synthetic dataset A and assuming that \theta is a lognormal random variable with \nu=\tau=0.1. Figure 4 shows the behavior of the log-likelihood and the log-posterior for \theta using different values of N and \sigma. Clearly, the accuracy of the estimated \theta depends on the size of the dataset, N, and on the reliability of the measurement devices, \sigma.
Figure 5 shows the relationship between the prior distribution of the boundary conditions and their measurements. Although the log-likelihood and the log-posterior behave differently, both attain their maximum at the same point, which is close to the true value of \theta.
Again, the Laplace approximation is used to derive the approximate Gaussian posterior density for \theta. The prior and posterior densities for \theta are presented in Figure 6, in which the Gaussian posterior, with mean 0.9955 and standard deviation 0.0047, is concentrated around the true value of \theta, in contrast to the very broad prior.
Information Divergence and Expected Information Gain
In the Bayesian setting that we adopted to infer the thermal diffusivity, \theta, the utility of the performed experiment, given an experimental setup \xi, can be conveniently measured by the so-called information divergence (or discrimination information, as Kull2 called it), defined here as the Kullback-Leibler divergence (Kull) between the prior density function p(\theta) and the posterior density function of \theta, \rho(\theta\mid\mathbf{Y_{1}},\ldots,\mathbf{Y_{N}},\xi):
D_{KL}(\mathbf{Y_{1}},\ldots,\mathbf{Y_{N}},\xi):=\int_{\Theta}\log\left(\frac{\rho(\theta\mid\mathbf{Y_{1}},\ldots,\mathbf{Y_{N}},\xi)}{p(\theta)}\right)\rho(\theta\mid\mathbf{Y_{1}},\ldots,\mathbf{Y_{N}},\xi)d\theta\,.  (22) 
The quantity in (22) is always nonnegative; it is equal to zero when the prior and the posterior coincide; it provides a quantification of the relative discrimination between the prior and the posterior; and it depends on the observations \mathbf{Y_{1}},\ldots,\mathbf{Y_{N}}. Therefore, given the synthetic dataset A, we may introduce different experimental setups of interest by varying the time interval during which the temperature is measured. By choosing some specific thermocouples, we may evaluate the information divergence for any experimental setup. Moreover, under the same generating process used for dataset A, we may obtain as many synthetic datasets as needed to explore the properties of the proposed simulated experiment. The utility of such computer-based experiments can be adequately summarized by the so-called expected information gain (Quan), which is defined as the marginalization of D_{KL} over all possible simulated data:
I(\xi):=\int_{\cal{Y}}\int_{\Theta}\log\left(\frac{\rho(\theta\mid\mathbf{Y_{1}},\ldots,\mathbf{Y_{N}},\xi)}{p(\theta)}\right)\rho(\theta\mid\mathbf{Y_{1}},\ldots,\mathbf{Y_{N}},\xi)d\theta\,\rho(\{\mathbf{Y_{i}}\}_{i=1}^{N}\mid\xi)d(\{\mathbf{Y_{i}}\}_{i=1}^{N}).  (23) 
This quantity (23) provides a criterion to determine which features of the setup, \xi, are, on average, most informative when inferring \theta. A larger value of I(\xi) when, say, \xi\in A, suggests that, given the proposed statistical model, the inference on the unknown parameter will be more efficient, on average, when the features of the designed experiment take value in the set A.
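Since the Laplace step yields Gaussian posteriors, both quantities can be estimated by simple Monte Carlo: D_{KL} by averaging the log ratio of posterior to prior over posterior samples, and I(\xi) by repeating this over replicated synthetic datasets. A minimal sketch follows; the standard Gaussian prior used in the sanity check is an illustrative assumption (the paper uses a lognormal prior):

```python
import numpy as np

def kl_divergence_mc(post_mean, post_sd, log_prior, n=200_000, seed=0):
    """Monte Carlo estimate of D_KL(posterior || prior) as in (22), for a
    Gaussian (Laplace-approximated) posterior and an arbitrary prior density."""
    rng = np.random.default_rng(seed)
    th = rng.normal(post_mean, post_sd, n)             # posterior samples
    log_post = (-0.5 * ((th - post_mean) / post_sd) ** 2
                - np.log(post_sd * np.sqrt(2.0 * np.pi)))
    return np.mean(log_post - log_prior(th))           # E_post[log(post/prior)]

def expected_information_gain(posteriors, log_prior):
    """EIG as in (23): average the divergence over replicated datasets,
    each summarized by its Laplace posterior (mean, sd)."""
    return np.mean([kl_divergence_mc(m, s, log_prior) for m, s in posteriors])

# Sanity check against the closed-form Gaussian-Gaussian KL divergence.
log_prior = lambda t: -0.5 * t ** 2 - 0.5 * np.log(2.0 * np.pi)  # N(0,1) prior
kl_mc = kl_divergence_mc(1.0, 0.05, log_prior)
kl_exact = np.log(1.0 / 0.05) + (0.05 ** 2 + 1.0) / 2.0 - 0.5
eig = expected_information_gain([(1.0, 0.05), (1.0, 0.05)], log_prior)
```

In the experimental-design loop, each pair in `posteriors` would come from one synthetic dataset generated under the setup \xi.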
Let us label as \textrm{TC1},\ldots,\textrm{TC7} the thermocouples from the left boundary to the right boundary.
The numerical estimates of the information divergence for the synthetic dataset A and of the expected information gain, obtained by using (21) to compute the approximated posterior in Example 1, are shown in Figures 7, 8 and 9, which refer to the following three experimental setups (es’s):

es1: \xi consists of three nonoverlapping time intervals of equal length, which cover the entire observational period [0,1]\,;

es2: \xi consists of the five inner thermocouples;

es3: \xi is the combination of the two previous experimental setups, es1 and es2.
From the values in Figure 7, which depicts the results of an experimental setup in which the temperature measurements are collected over different time intervals, we may conclude, by virtue of the interpretation of the expected information gain, that the second time interval is the most informative one from which to draw inferences on the thermal diffusivity, whereas the last time interval is the least informative.
Figure 8 summarizes how the expected information gain behaves for the five inner thermocouples experimental setup (es2). Given the synthetic dataset A, the sixth thermocouple (TC6) is the one where the information divergence takes the smallest value. However, when we look at the expected information gain, we may appreciate the nearly symmetric informative content of the thermocouples with respect to the central thermocouple (TC4) and how the expected gain about the thermal diffusivity becomes larger near the central thermocouple.
Finally, we look for the best combination of the two previous experimental setups; the corresponding results are displayed in Figure 9. We observe that the highest expected information gain is attained at the middle thermocouple (TC4) using the information collected during the second time interval. Any indication provided by the numerical estimation of the expected information gain is very valuable to an experimentalist, since it suggests the most relevant features to consider when designing an efficient experiment to infer the unknown parameters of the assumed statistical model.
Predictive Posterior Distribution
In this section, we examine the possibility of predicting the observable temperature at future time intervals after estimating the thermal diffusivity, \theta. More specifically, assume we have inferred \theta using temperature measurements up to time t_{n}. Then, we want to predict the temperature at the next time step, t_{n+1}. Given our Bayesian model, it is necessary to assume that the boundary temperatures at time t_{n+1} are known. A typical situation is an experiment in which there is interest in the temperature values at inner points for different boundary values.
The predictive posterior distribution, \rho(\mathbf{Y_{n+1}}\mid\left\{\mathbf{Y_{k}}\right\}_{k=1}^{n},T_{L,n+1},T_{R,n+1}), is given by
\int_{\Theta}\rho(\mathbf{Y_{n+1}}\mid\left\{\mathbf{Y_{k}}\right\}_{k=1}^{n},T_{L,n+1},T_{R,n+1},\theta)\,\rho(\theta\mid\left\{\mathbf{Y_{k}}\right\}_{k=1}^{n})\,d\theta\,,  (24) 
and it is estimated by averaging
\frac{1}{M}\sum_{i=1}^{M}\rho(\mathbf{Y_{n+1}}\mid\left\{\mathbf{Y_{k}}\right\}_{k=1}^{n},T_{L,n+1},T_{R,n+1},\theta_{i})\,, 
where the \theta_{i}’s are sampled from the posterior distribution of \theta.
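The averaging above produces an equally weighted Gaussian mixture over posterior samples of \theta. In the sketch below, `forward_mean` stands in for the PDE solve that maps \theta (together with the assumed boundary values at t_{n+1}) to the predicted observation; its linearized form is a purely illustrative assumption:

```python
import numpy as np

def predictive_density(y_grid, theta_samples, forward_mean, sigma):
    """Monte Carlo estimate of the predictive posterior density (24):
    a Gaussian mixture centred at the model predictions."""
    dens = np.zeros_like(y_grid, dtype=float)
    for th in theta_samples:
        m = forward_mean(th)                  # model-predicted temperature
        dens += np.exp(-0.5 * ((y_grid - m) / sigma) ** 2) \
                / (np.sqrt(2.0 * np.pi) * sigma)
    return dens / len(theta_samples)

# Toy usage: posterior samples of theta and a stand-in forward model.
rng = np.random.default_rng(0)
theta_samples = rng.normal(1.0, 0.005, 500)      # draws from the posterior
forward = lambda th: 55.0 + 10.0 * (th - 1.0)    # hypothetical linearized response
y = np.linspace(50.0, 60.0, 1001)
dens = predictive_density(y, theta_samples, forward, sigma=0.5)
```

The resulting `dens` integrates to one over the observation grid, and its spread combines the measurement noise \sigma with the posterior uncertainty on \theta.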
Figure 10 shows the one-step-ahead predictive posterior densities at three different inner thermocouples based on the observations until time t=0.5, when the observed temperatures at thermocouples \textrm{TC2}, \textrm{TC3} and \textrm{TC4} were 53.98, 55.53 and 57.84\,^{\circ}\mathrm{C}, respectively.
We emphasize that this methodology allows us to obtain the k-step-ahead predictive posterior density for any inner thermocouple, assuming boundary conditions subject to uncertainty that are adequate for the experiment.
5.3 Example 3
In this example, we consider the case where the thermal diffusivity depends on the space variable, x. The finite element method used to solve the heat equation under such an assumption was presented in Section 3.
Prior distribution of \theta(x)
Assume that the prior distribution of \theta(x) is a lognormal random field with a squared exponential (SE) covariance function. Then, the prior distribution of \log\theta(x) can be expressed using the joint multivariate Gaussian distribution:
\left(\log(\theta(x_{1})),\ldots,\log(\theta(x_{s}))\right)\sim\mathcal{N}_{s}% \left(\boldsymbol{\mu},K\right),  (25) 
where \boldsymbol{\mu}=(\mu,\mu,\ldots,\mu)^{tr}, K_{ij}=Cov(\log(\theta(x_{i})),\log(\theta(x_{j})))=\eta^{2}\exp\left(-\frac{\left|x_{i}-x_{j}\right|^{2}}{2\ell}\right)\,,i,j=1,\ldots,s, \eta is the magnitude, and \ell denotes the length scale.
We assume the following priors for the hyperparameters \mu,\eta and \ell (the prior density of \mu and \eta is displayed in Figure 11):
\mu\sim\mathcal{N}\left(0.1,0.1\right),\quad\eta\sim\textrm{half-Cauchy}\left(0.1\right),\quad\ell\sim U\left(0.5,5\right). 
In this example, we choose a Gaussian prior for \mu and an uninformative uniform prior for \ell. The half-Cauchy prior for \eta was chosen because it is a practical prior for scale parameters in hierarchical models (polson; gelman).
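Sampling from this prior field reduces to a Cholesky factorization of the SE covariance. A minimal sketch follows, with the covariance matching the definition above (2\ell in the denominator); the grid of s points, the jitter level, and the chosen hyperparameter values are illustrative assumptions:

```python
import numpy as np

def sample_lognormal_field(x, mu, eta, ell, n_samples, seed=0):
    """Draw realizations of theta(x) with log theta ~ N(mu, K),
    K_ij = eta^2 exp(-|x_i - x_j|^2 / (2 ell))."""
    rng = np.random.default_rng(seed)
    d2 = (x[:, None] - x[None, :]) ** 2
    K = eta ** 2 * np.exp(-d2 / (2.0 * ell))
    # Small jitter keeps the nearly singular SE matrix factorizable.
    L = np.linalg.cholesky(K + 1e-9 * np.eye(len(x)))
    z = rng.standard_normal((n_samples, len(x)))
    return np.exp(mu + z @ L.T)                 # lognormal field values

x = np.linspace(0.0, 1.0, 5)
theta = sample_lognormal_field(x, mu=0.0, eta=0.1, ell=0.5, n_samples=20_000)
```

Each row of `theta` is one realization of the spatially varying diffusivity; the marginal standard deviation of \log\theta(x_{i}) equals the magnitude \eta.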
Joint posterior distribution of the hyperparameters \mu,\eta,\ell
Given \boldsymbol{\theta}:=\left(\theta(x_{1}),\ldots,\theta(x_{s})\right)^{tr}, let us consider the joint posterior density of the hyperparameters (\mu,\eta,\ell) that characterize the distribution of \log\theta(x).
\displaystyle\rho(\mu,\eta,\ell\mid\mathbf{Y_{1}},\ldots,\mathbf{Y_{N}})\propto\rho(\mu,\eta,\ell)\int_{\boldsymbol{\Theta}}\rho(\boldsymbol{\theta}\mid\mu,\eta,\ell)\rho(\mathbf{Y_{1}},\ldots,\mathbf{Y_{N}}\mid\boldsymbol{\theta})d\boldsymbol{\theta}. 
Let us introduce the auxiliary variable \mathbf{z}=(z_{1},\dots,z_{s})\sim\mathcal{N}_{s}\left(\mathbf{0},C=\frac{1}{% \eta^{2}}K\right) and consider the change of variables transformation: \log(\theta_{i})=\mu+\eta z_{i} where \theta_{i}:=\theta(x_{i}), z_{i}:=z(x_{i}), i=1,\ldots,s. Then, the prior density of \boldsymbol{\theta} is given by
\displaystyle\rho(\boldsymbol{\theta}\mid\mu,\eta,\ell)=\frac{(2\pi\eta^{2})^{-\frac{s}{2}}|C|^{-\frac{1}{2}}}{\theta_{1}\,\theta_{2}\cdots\theta_{s}}\exp{\left(-\frac{(\log{\boldsymbol{\theta}}-\boldsymbol{\mu})^{tr}C^{-1}(\log{\boldsymbol{\theta}}-\boldsymbol{\mu})}{2\eta^{2}}\right)}
\displaystyle=\frac{(2\pi)^{-\frac{s}{2}}|C|^{-\frac{1}{2}}}{\eta^{s}e^{s\mu+\eta(z_{1}+\ldots+z_{s})}}\exp{\left(-\frac{1}{2}\mathbf{z}^{tr}C^{-1}\mathbf{z}\right)}
\displaystyle=\frac{\rho(\mathbf{z}\mid\ell)}{\eta^{s}e^{s\mu+\eta(z_{1}+\ldots+z_{s})}}. 
The posterior density of the hyperparameters can be therefore written as
\displaystyle\rho(\mu,\ell,\eta\mid\mathbf{Y_{1}},\ldots,\mathbf{Y_{N}})\propto\rho(\mu,\ell,\eta)\int_{\mathbf{Z}}\frac{\rho(\mathbf{z}\mid\ell)}{\eta^{s}e^{s\mu+\eta(z_{1}+\ldots+z_{s})}}\,J\,\rho(\mathbf{Y_{1}},\ldots,\mathbf{Y_{N}}\mid\mu,\eta,\mathbf{z})d\mathbf{z}, 
where J=\eta^{s}e^{s\mu+\eta(z_{1}+\ldots+z_{s})} is the Jacobian determinant of the transformation, which cancels the denominator.
By considering \ell as a nuisance parameter, we obtain
\rho(\mu,\eta\mid\mathbf{Y_{1}},\ldots,\mathbf{Y_{N}})\propto\rho(\mu,\eta)\int_{\ell}\rho(\ell)\int_{\mathbf{Z}}\rho(\mathbf{z}\mid\ell)\rho(\mathbf{Y_{1}},\ldots,\mathbf{Y_{N}}\mid\mu,\eta,\mathbf{z})d\mathbf{z}\,d\ell  (26) 
after \ell is marginalized.
To evaluate the posterior distribution of the hyperparameters, we need to compute the (s+1)-dimensional integral in formula (26). Alternatively, a Monte Carlo method can be used to approximate these integrals. First, we sample \ell from its prior distribution, \rho(\ell). Then, given \ell, we can sample \mathbf{z} and evaluate the joint likelihood function, \rho(\mathbf{Y_{1}},\ldots,\mathbf{Y_{N}}\mid\mu,\eta,\mathbf{z}), for any pair (\mu,\eta). Therefore, we approximate the non-normalized posterior distribution using a double sum as follows:
\displaystyle\int_{\ell}\rho(\ell)\int_{\mathbf{Z}}\rho(\mathbf{z}\mid\ell)\rho(\mathbf{Y_{1}},\ldots,\mathbf{Y_{N}}\mid\mu,\eta,\mathbf{z})d\mathbf{z}\,d\ell\approx\frac{1}{M_{\ell}}\sum_{i=1}^{M_{\ell}}\int_{\mathbf{Z}}\rho(\mathbf{z}\mid\ell_{i})\rho(\mathbf{Y_{1}},\ldots,\mathbf{Y_{N}}\mid\mu,\eta,\mathbf{z})d\mathbf{z}\approx\frac{1}{M_{\ell}}\frac{1}{M_{z}}\sum_{i=1}^{M_{\ell}}\sum_{j=1}^{M_{z}}\rho(\mathbf{Y_{1}},\ldots,\mathbf{Y_{N}}\mid\mu,\eta,\mathbf{z}_{j}), 
where \mathbf{z}_{j}\sim\rho(\mathbf{z}\mid\ell_{i}).
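The double sum above has the following generic shape: an outer average over draws of \ell from its prior and an inner average over \mathbf{z} given \ell. The toy likelihood in the sanity check is an illustrative assumption (the real \rho(\mathbf{Y_{1}},\ldots,\mathbf{Y_{N}}\mid\mu,\eta,\mathbf{z}) would come from the marginal likelihood of Theorem 4.1):

```python
import numpy as np

def nested_mc_posterior(likelihood, sample_ell, sample_z,
                        M_ell=40, M_z=500, seed=0):
    """Double-sum Monte Carlo estimate of the integral in (26):
    outer average over ell ~ rho(ell), inner average over z ~ rho(z | ell)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(M_ell):
        ell = sample_ell(rng)                 # draw from the prior of ell
        for _ in range(M_z):
            z = sample_z(ell, rng)            # draw from rho(z | ell)
            total += likelihood(z)            # evaluate rho(Y | mu, eta, z)
    return total / (M_ell * M_z)

# Sanity check: with z ~ N(0, I_s) independent of ell and likelihood z_1^2,
# the estimator should approach E[z_1^2] = 1 regardless of ell.
est = nested_mc_posterior(
    likelihood=lambda z: z[0] ** 2,
    sample_ell=lambda rng: rng.uniform(0.5, 5.0),
    sample_z=lambda ell, rng: rng.standard_normal(3),
)
```

Repeating this evaluation over a grid of (\mu,\eta) pairs, with the z-samples reused across the grid, yields the non-normalized posterior surface shown in Figure 12.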
Figure 12 shows that the non-normalized posterior of the hyperparameters \mu and \eta has a unique mode at (0.05,0.025). We then use Laplace’s method to obtain a Gaussian posterior, using the synthetic dataset A, as shown in Figure 13.
A new dataset is now introduced to test our method when \theta depends on x.
Dataset B
To analyze the performance of our inferential technique in the case where the thermal diffusivity parameter depends on the space variable, x, we consider another synthetic dataset (hereafter named dataset B) that is generated similarly to the dataset A, except for the fact that \theta(x) is sampled randomly from the new prior (25) where \mu=0,\eta=0.1 and \ell=5.
Again, we approximate the posterior distribution of the hyperparameters \mu and \eta using Laplace’s approximation given the following priors for the hyperparameters \mu,\eta and \ell:
\mu\sim\mathcal{N}\left(0,0.25\right),\quad\eta\sim\textrm{half-Cauchy}\left(0.5\right),\quad\ell\sim U\left(4,6\right), 
where we assume broad priors for \mu and \eta, with a more informative uniform prior for \ell.
From Figure 15, we find that the maximum a posteriori probability (MAP) estimate is (0.05,0.025), so Laplace’s approximation can be used. By comparing the prior and posterior densities for \mu and \eta in Figures 14 and 16, we can say that the experiment is informative, since the posterior concentrates around (0.05,0.025), which is close to the true value.
6 Conclusion
In this work, we developed a general Bayesian approach for one-dimensional linear parabolic partial differential equations with noisy boundary conditions. First, we derived the joint likelihood of the thermal diffusivity \theta and the boundary parameters. Second, we approximated the solution of the forward problem, showing that such a solution can be written as a linear function of the boundary conditions. After that, we marginalized out the boundary parameters, under the assumptions that they are well approximated by piecewise linear functions and that they are independent Gaussian random vectors. This approach can be generalized to any well-posed linear partial differential equation.
On the implementation side, we computed the log-posterior of the thermal diffusivity in different cases and used the Laplace approximation to obtain a Gaussian posterior. In the first example, we directly used the joint likelihood of the thermal diffusivity \theta and the boundary parameters, assuming that the boundary conditions were known. In the second example, we used the marginalized likelihood of \theta, assuming that \theta is a lognormal random variable and, as in the previous example, we obtained an approximate Gaussian posterior distribution, showing that the unknown value of the thermal diffusivity is recovered almost exactly. Moreover, we explored two important advantages of the Bayesian approach by estimating the expected information gain for different experimental setups and the predictive posterior distribution of the temperature. We noticed that the temperature measurements from the middle thermocouple during the second time interval are, in general, the most informative. Finally, we considered the case where \theta is a lognormal random field with a squared exponential covariance function and obtained the joint posterior distribution of the covariance hyperparameters by applying hierarchical Bayesian techniques.
Acknowledgments
Part of this work was carried out while F. Ruggeri and M. Scavino were Visiting Professors at KAUST. Z. Sawlan, M. Scavino and R. Tempone are members of the KAUST SRI Center for Uncertainty Quantification in Computational Science and Engineering.
Appendix A
Proof of Theorem 3.2:
First, let us introduce the vectors
\displaystyle\mathop{F_{L,1}}\limits_{(I-1)\times 1}=\left[\left(\int_{x_{L}}^{x_{R}}\frac{x_{R}-x}{x_{R}-x_{L}}\phi_{j}dx\right)_{j}\right],
\displaystyle\mathop{F_{L,2}}\limits_{(I-1)\times 1}=\left[\left(\int_{x_{L}}^{x_{R}}\left(\frac{x_{R}-x}{x_{R}-x_{L}}-\Delta t\,L_{\boldsymbol{\theta}}\frac{x_{R}-x}{x_{R}-x_{L}}\right)\phi_{j}dx\right)_{j}\right],
\displaystyle\mathop{F_{R,1}}\limits_{(I-1)\times 1}=\left[\left(\int_{x_{L}}^{x_{R}}\frac{x-x_{L}}{x_{R}-x_{L}}\phi_{j}dx\right)_{j}\right],
\displaystyle\mathop{F_{R,2}}\limits_{(I-1)\times 1}=\left[\left(\int_{x_{L}}^{x_{R}}\left(\frac{x-x_{L}}{x_{R}-x_{L}}-\Delta t\,L_{\boldsymbol{\theta}}\frac{x-x_{L}}{x_{R}-x_{L}}\right)\phi_{j}dx\right)_{j}\right], 
and the matrices
\displaystyle\mathop{\mathbf{B}}\limits_{(I-1)\times(I-1)}=\left(M+\Delta tS_{\boldsymbol{\theta}}\right)^{-1}M,
\displaystyle\mathop{\mathbf{F}_{L,n}}\limits_{(I-1)\times(N+1)}=\left(\begin{array}[]{cccc}\mathop{\mathbf{0}}\limits_{(I-1)\times(n-1)}&\mathop{F_{L,1}}\limits_{(I-1)\times 1}&\mathop{F_{L,2}}\limits_{(I-1)\times 1}&\mathop{\mathbf{0}}\limits_{(I-1)\times(N-n)}\end{array}\right),\>n=2,\ldots,N-1,
\displaystyle\mathop{\mathbf{F}_{R,n}}\limits_{(I-1)\times(N+1)}=\left(\begin{array}[]{cccc}\mathop{\mathbf{0}}\limits_{(I-1)\times(n-1)}&\mathop{F_{R,1}}\limits_{(I-1)\times 1}&\mathop{F_{R,2}}\limits_{(I-1)\times 1}&\mathop{\mathbf{0}}\limits_{(I-1)\times(N-n)}\end{array}\right),\>n=2,\ldots,N-1,
\displaystyle\mathop{\mathbf{F}_{L,1}}\limits_{(I-1)\times(N+1)}=\left(\begin{array}[]{ccc}\mathop{F_{L,1}}\limits_{(I-1)\times 1}&\mathop{F_{L,2}}\limits_{(I-1)\times 1}&\mathop{\mathbf{0}}\limits_{(I-1)\times(N-1)}\end{array}\right),
\displaystyle\mathop{\mathbf{F}_{R,1}}\limits_{(I-1)\times(N+1)}=\left(\begin{array}[]{ccc}\mathop{F_{R,1}}\limits_{(I-1)\times 1}&\mathop{F_{R,2}}\limits_{(I-1)\times 1}&\mathop{\mathbf{0}}\limits_{(I-1)\times(N-1)}\end{array}\right),
\displaystyle\mathop{\mathbf{F}_{L,N}}\limits_{(I-1)\times(N+1)}=\left(\begin{array}[]{ccc}\mathop{\mathbf{0}}\limits_{(I-1)\times(N-1)}&\mathop{F_{L,1}}\limits_{(I-1)\times 1}&\mathop{F_{L,2}}\limits_{(I-1)\times 1}\end{array}\right),
\displaystyle\mathop{\mathbf{F}_{R,N}}\limits_{(I-1)\times(N+1)}=\left(\begin{array}[]{ccc}\mathop{\mathbf{0}}\limits_{(I-1)\times(N-1)}&\mathop{F_{R,1}}\limits_{(I-1)\times 1}&\mathop{F_{R,2}}\limits_{(I-1)\times 1}\end{array}\right). 
Then, the solution of (9) is given by:
\mathbf{u}_{n+1}=\mathbf{B}\mathbf{u}_{n}+\left(M+\Delta tS_{\boldsymbol{\theta}}\right)^{-1}\left(\mathbf{F}_{L,n}\mathbf{T}_{L}+\mathbf{F}_{R,n}\mathbf{T}_{R}\right). 
Applying recursively the previous relation we derive the discrete representation (Duhamel’s formula)
\mathbf{u}_{n}=\mathbf{B}^{n}\mathbf{u}_{0}+\sum_{k=1}^{n}\mathbf{B}^{n-k}\left(M+\Delta tS_{\boldsymbol{\theta}}\right)^{-1}\left(\mathbf{F}_{L,k}\mathbf{T}_{L}+\mathbf{F}_{R,k}\mathbf{T}_{R}\right). 
Now we can build the matrices A_{n}(\boldsymbol{\theta}),\tilde{A}_{L,n}(\boldsymbol{\theta}) and \tilde{A}_{R,n}(\boldsymbol{\theta})\,,n=1,\ldots,N\,, introduced in the expression (14), to recover the solution of the problem (9) as a linear function of the initialboundary conditions, namely:
\displaystyle A_{n}(\boldsymbol{\theta})=\mathbf{B}^{n},
\displaystyle\tilde{A}_{L,n}(\boldsymbol{\theta})=\sum_{k=1}^{n}\mathbf{B}^{n-k}\left(M+\Delta tS_{\boldsymbol{\theta}}\right)^{-1}\mathbf{F}_{L,k},\>\textrm{and}
\displaystyle\tilde{A}_{R,n}(\boldsymbol{\theta})=\sum_{k=1}^{n}\mathbf{B}^{n-k}\left(M+\Delta tS_{\boldsymbol{\theta}}\right)^{-1}\mathbf{F}_{R,k}.\qquad\blacksquare 
Appendix B
Proof of Theorem 3.3:
From (8) and (14), we can write:
\mathbf{T}_{n}=\mathbf{B}^{n}\mathbf{u}_{0}+\tilde{A}_{L,n}(\boldsymbol{\theta}){\mathbf{T}}_{L}+\tilde{A}_{R,n}(\boldsymbol{\theta}){\mathbf{T}}_{R}-T_{L,n}F_{L,1}-T_{R,n}F_{R,1}. 
Now, define A_{L,n}(\boldsymbol{\theta}) and A_{R,n}(\boldsymbol{\theta}) by:
\displaystyle A_{L,n}(\boldsymbol{\theta})=\tilde{A}_{L,n}(\boldsymbol{\theta})+\left(\begin{array}[]{cccc}\mathop{-\mathbf{B}^{n}F_{L,1}}\limits_{(I-1)\times 1}&\mathop{\mathbf{0}}\limits_{(I-1)\times(n-1)}&\mathop{-F_{L,1}}\limits_{(I-1)\times 1}&\mathop{\mathbf{0}}\limits_{(I-1)\times(N-n)}\end{array}\right),
\displaystyle A_{R,n}(\boldsymbol{\theta})=\tilde{A}_{R,n}(\boldsymbol{\theta})+\left(\begin{array}[]{cccc}\mathop{-\mathbf{B}^{n}F_{R,1}}\limits_{(I-1)\times 1}&\mathop{\mathbf{0}}\limits_{(I-1)\times(n-1)}&\mathop{-F_{R,1}}\limits_{(I-1)\times 1}&\mathop{\mathbf{0}}\limits_{(I-1)\times(N-n)}\end{array}\right). 
Therefore, we obtain equation (15).
Appendix C
Proof of Remark 5.1:
Consider the following backward Euler discretization of the local problem (3) in the time interval, (t_{n},t_{n+1})=(n\Delta t,(n+1)\Delta t):
\left\{\begin{array}[]{l}\frac{1}{\Delta t}(T_{i,n+1}-T_{i,n})-\frac{\theta}{\Delta x^{2}}(T_{i+1,n+1}-2T_{i,n+1}+T_{i-1,n+1})=0,\>\>i=2,\ldots,I\\ T_{L,n}=T_{L}(n\Delta t)\,,\\ T_{R,n}=T_{R}(n\Delta t)\,.\end{array}\right.  (27) 
To write the discretization (27) in a matrix form, let us introduce the vectors \mathop{\mathbf{T}_{n}}\limits_{(I-1)\times 1}=(T_{2,n},\ldots,T_{I,n})^{tr}, n=1,\ldots,N, and the matrix
\mathop{\mathbf{A}}\limits_{(I-1)\times(I-1)}=\left(\begin{array}[]{cccccc}-2&1&0&0&\ldots&0\\ 1&-2&1&0&\ldots&0\\ 0&1&-2&1&\ldots&0\\ \vdots&\ddots&\ddots&\ddots&\ddots&\vdots\\ 0&\ldots&0&1&-2&1\\ 0&0&0&\ldots&1&-2\end{array}\right)\,. 
In this way, we may write
\frac{1}{\Delta t}(\mathbf{T}_{n+1}-\mathbf{T}_{n})-\frac{\theta}{\Delta x^{2}}\mathbf{A}\mathbf{T}_{n+1}=\frac{\theta}{\Delta x^{2}}(T_{L,n+1}\mathbf{v}+T_{R,n+1}\mathbf{w})\,,  (28) 
where \mathop{\mathbf{v}}\limits_{(I-1)\times 1}=(1,0,\ldots,0)^{tr} and \mathop{\mathbf{w}}\limits_{(I-1)\times 1}=(0,\ldots,0,1)^{tr}\,.
The expression (28) is equal to
(I_{I-1}-\theta\frac{\Delta t}{\Delta x^{2}}\mathbf{A})\mathbf{T}_{n+1}=\mathbf{T}_{n}+\theta\frac{\Delta t}{\Delta x^{2}}(T_{L,n+1}\mathbf{v}+T_{R,n+1}\mathbf{w}) 
and letting \frac{\Delta t}{\Delta x^{2}}=\lambda and \mathbf{B}=(I_{I-1}-\theta\lambda\mathbf{A})^{-1}, we obtain
\mathbf{T}_{n+1}=\mathbf{B}\mathbf{T}_{n}+\theta\lambda(T_{L,n+1}\mathbf{B}% \mathbf{v}+T_{R,n+1}\mathbf{B}\mathbf{w})\,. 
Applying recursively the previous relation, we derive
\mathbf{T}_{n}=\mathbf{B}^{n}\mathbf{T}_{0}+\theta\lambda\sum_{k=1}^{n}T_{L,k}\mathbf{B}^{n-k+1}\mathbf{v}+\theta\lambda\sum_{k=1}^{n}T_{R,k}\mathbf{B}^{n-k+1}\mathbf{w}\,, 
whose compact matrix form is
\mathbf{T}_{n}=\mathbf{B}^{n}\mathbf{T}_{0}+\mathbf{C}_{n}\tilde{\mathbf{T}}_{% L}+\mathbf{D}_{n}\tilde{\mathbf{T}}_{R}\,, 
where

\mathop{\tilde{\mathbf{T}}_{L}}\limits_{n\times 1}=(T_{L,1},\ldots,T_{L,n})^{% tr}=(T_{L}(\Delta t),\ldots,T_{L}(n\Delta t))^{tr}\,,

\mathop{\tilde{\mathbf{T}}_{R}}\limits_{n\times 1}=(T_{R,1},\ldots,T_{R,n})^{tr}=(T_{R}(\Delta t),\ldots,T_{R}(n\Delta t))^{tr}\,,

the matrix \mathop{\mathbf{C}_{n}}\limits_{(I-1)\times n} has column vectors \mathbf{c}_{k}=\theta\lambda\mathbf{B}^{n-k+1}\mathbf{v}\,,\>k=1,\ldots,n\,,

the matrix \mathop{\mathbf{D}_{n}}\limits_{(I-1)\times n} has column vectors \mathbf{d}_{k}=\theta\lambda\mathbf{B}^{n-k+1}\mathbf{w}\,,\>k=1,\ldots,n\,.
Now, we can build the matrices \mathop{A_{L,n}(\theta)}\limits_{(I-1)\times N} and \mathop{A_{R,n}(\theta)}\limits_{(I-1)\times N}\,,n=1,\ldots,N\,, introduced in the expression (16), to recover the solution of the problem (3) for each time interval, (t_{n-1},t_{n})\,,n=1,\ldots,N\,, as a linear function of the initial-boundary conditions:
\mathop{A_{L,n}(\theta)}\limits_{(I-1)\times N}=\left(\begin{array}[]{cc}\mathop{\mathbf{C}_{n}}\limits_{(I-1)\times n}&\mathop{\mathbf{0}}\limits_{(I-1)\times(N-n)}\end{array}\right)\,, 
\mathop{A_{R,n}(\theta)}\limits_{(I-1)\times N}=\left(\begin{array}[]{cc}\mathop{\mathbf{D}_{n}}\limits_{(I-1)\times n}&\mathop{\mathbf{0}}\limits_{(I-1)\times(N-n)}\end{array}\right)\,. 
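The recursion and its discrete Duhamel representation above can be cross-checked numerically. The sketch below builds \mathbf{A}, forms \mathbf{B}=(I_{I-1}-\theta\lambda\mathbf{A})^{-1}, steps the scheme forward, and compares the result with the closed-form sum; the grid sizes and the boundary signals are illustrative assumptions:

```python
import numpy as np

# Discretization sizes (illustrative choices).
I, N = 8, 6
theta, dt, dx = 1.0, 0.01, 1.0 / I
lam = dt / dx ** 2

# Tridiagonal second-difference matrix A of size (I-1) x (I-1).
A = (np.diag(-2.0 * np.ones(I - 1)) + np.diag(np.ones(I - 2), 1)
     + np.diag(np.ones(I - 2), -1))
B = np.linalg.inv(np.eye(I - 1) - theta * lam * A)
v = np.eye(I - 1)[:, 0]                       # (1, 0, ..., 0)^tr
w = np.eye(I - 1)[:, -1]                      # (0, ..., 0, 1)^tr

T0 = 100.0 * np.ones(I - 1)                   # interior initial condition
TL = 20.0 + np.arange(1, N + 1, dtype=float)  # boundary signal T_{L,k}
TR = 30.0 + np.arange(1, N + 1, dtype=float)  # boundary signal T_{R,k}

# Recursion: T_{n+1} = B T_n + theta*lam*(T_{L,n+1} B v + T_{R,n+1} B w).
T = T0.copy()
for n in range(N):
    T = B @ T + theta * lam * (TL[n] * (B @ v) + TR[n] * (B @ w))

# Closed form: T_N = B^N T0 + theta*lam * sum_k (T_{L,k} B^{N-k+1} v + ...).
Bp = np.linalg.matrix_power
T_closed = Bp(B, N) @ T0
for k in range(1, N + 1):
    T_closed += theta * lam * (TL[k - 1] * (Bp(B, N - k + 1) @ v)
                               + TR[k - 1] * (Bp(B, N - k + 1) @ w))
```

The two vectors agree to machine precision, confirming that the matrices \mathbf{C}_{n} and \mathbf{D}_{n} assemble the boundary contributions exactly as the time-stepping does.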