Randomized residual-based error estimators for parametrized equations^{†}

^{†} Submitted to the editors \redDATE.
Funding: This work was funded by ONR Grant N00014-17-1-2077 (ATP).
Abstract
We propose a randomized a posteriori error estimator for reduced order approximations of parametrized (partial) differential equations. The error estimator has several important properties: the effectivity is close to unity with prescribed lower and upper bounds at specified high probability; the estimator does not require the calculation of stability (coercivity, or inf-sup) constants; the online cost to evaluate the a posteriori error estimator is commensurate with the cost to find the reduced order approximation; the probabilistic bounds extend to many queries with only modest increase in cost. To build this estimator, we first estimate the norm of the error with a Monte Carlo estimator using Gaussian random vectors whose covariance is chosen according to the desired error measure, e.g. user-defined norms or quantities of interest. Then, we introduce a dual problem with random right-hand side, the solution of which allows us to rewrite the error estimator in terms of the residual of the original equation. To obtain a fast-to-evaluate estimator, model order reduction methods can be used to approximate the random dual solutions. Here, we propose a greedy algorithm that is guided by a scalar quantity of interest depending on the error estimator. Numerical experiments on a multi-parametric Helmholtz problem demonstrate that this strategy yields rather low-dimensional reduced dual spaces.
Key words. A posteriori error estimation, parametrized equations, projection-based model order reduction, Monte Carlo estimator, concentration phenomenon, goal-oriented error estimation.
AMS subject classifications. 65N15, 65C05, 65N30, 68Q25, 62G15
1 Introduction
Many models in engineering applications, the life sciences, environmental issues, or finance depend on parameters which account for variation in the material or geometry, but also for uncertainty in the data. Often the respective applications require low marginal (i.e. per parameter) computational costs. This is for instance the case in “many query” settings, where we require the computation of the solution of the corresponding parametrized equation for many different parameter values. Examples of model order reduction techniques that aim at computationally feasible approximations of such parametrized models are tensor-based methods [11, 24] and the reduced basis (RB) method [14, 26, 9, 27, 30]. In order to ensure, say, the functional safety of a structure, certification of such approximations is of high importance. Moreover, bounding the approximation error to get a handle on the uncertainty induced by the approximation is crucial when using it in the context of uncertainty quantification. The subject of this paper is thus the certification of approximations to parametrized equations via an a posteriori error estimator for a large number of parameter queries. Our method is also well-suited to real-time contexts.
One of the most commonly used error estimators for inf-sup stable problems is the product of the dual norm of the residual and the inverse of the inf-sup constant. While the former can usually be computed rapidly, accurate estimation of the inf-sup constant is in general rather costly. For instance, the Successive Constraint Method (SCM) [17, 5, 16] computes a parameter-dependent lower bound of the inf-sup constant by employing the successive solution of appropriate linear optimization problems. This procedure is usually computationally demanding and can lead to pessimistic error bounds [12].
In this paper we introduce a randomized a posteriori error estimator which does not require the estimation of stability constants. The error estimator features several other desirable properties. First, it is both reliable and efficient at given high probability and often has an effectivity close to one. Second, the effectivity can be bounded from below and above at high probability with constants selected by the user, balancing computational costs and the desired sharpness of the estimator. Moreover, the presented framework yields error estimators with respect to user-defined norms, for instance the L^{2}-norm or the H^{1}-norm; the approach also permits error estimation of linear quantities of interest (QoI). Finally, depending on the desired effectivity, the computation of the error estimator is in general only as costly as the computation of the reduced order approximation, or even less expensive, which makes our error estimation strategy attractive from a computational viewpoint.
To derive this error estimator, we consider a Gaussian random vector whose covariance matrix is chosen depending on the respective norm or QoI we wish to estimate. Summing the squares of the inner products of K independent copies of that random vector with the approximation error yields an unbiased Monte Carlo estimator. Using concentration inequalities, we control the effectivity of the resulting random error estimator with high probability. This type of random subspace embedding is typically encountered in compressed sensing [7]. By exploiting the error-residual relationship we recognize that these inner products equal the inner products of the residual with the solutions of K dual problems with random right-hand sides. Approximating the dual problems via projection-based model order reduction yields an a posteriori error estimator of low marginal computational cost. To construct the dual reduced space we introduce a greedy algorithm driven by a scalar QoI that assesses how well the fast-to-evaluate a posteriori error estimator approximates the original Monte Carlo estimator. This goal-oriented strategy outperforms standard dual-residual-based greedy algorithms and the Proper Orthogonal Decomposition (POD).
Our a posteriori error estimator is inspired by the probabilistic error estimator for the approximation error in the solution of a system of ordinary differential equations introduced in [4] by Cao and Petzold. To estimate the norm of the error, they employ the small statistical sample method of Kenney and Laub [21], which estimates the norm of a vector by its inner product with a random vector drawn uniformly at random from the unit sphere. Rewriting that inner product using the error-residual relationship results in an adjoint (or dual) problem with random final time, whose solution is then invoked to estimate the error [4]. This approach is extended to ordinary differential equations reduced via a POD by Homescu et al. in [15] and to differential algebraic equations in [28]. In addition, the effect of perturbations in the initial conditions or parameters on the quality of the approximation of the reduced model is investigated in [15, 28]. In our work we extend these concepts to address general norms of interest within the PDE context, to explicitly address accurate error estimation for any given parameter value within a finite parameter domain, and to address the limit of many queries.
Randomized methods for error estimation are gaining interest in the reduced order modeling community. For instance, in [1] randomized techniques are used to speed up the computation of the dual norm of the residual used as an error indicator. By exploiting the fact that the residual manifold is contained in a low-dimensional subspace, the authors need to appeal to only a few random samples when constructing the random subspace embedding. Instead, our approach targets the true error which, in contrast to the residual, is in general not exactly contained in a low-dimensional subspace for the problems we have at hand. Therefore, in our approach, we use different techniques and we determine the number of random samples we need via the cardinality of the parameter set on which we wish to estimate the error. In [19] a probabilistic a posteriori error bound for linear scalar-valued quantities of interest is proposed, with application in sensitivity analysis. Contrary to the method presented in our work, the right-hand side of the dual problem in [19] is the linear functional associated with the QoI, and randomization is done by assuming that the parameter is a random variable on the parameter set. Another application of randomized techniques, in particular randomized numerical linear algebra [13], to (localized) model order reduction is considered in [3]: a reliable and efficient probabilistic a posteriori error estimator for the difference between a finite-dimensional linear operator and its orthogonal projection onto a reduced space is derived; the main idea is to apply the operator to standard Gaussian random vectors and consider the norm of the result. Also, in [32] an interpolation of the operator inverse is built via a Frobenius-norm projection and computed efficiently using randomized methods. An error estimator is then obtained by measuring the norm of the residual multiplied by the interpolation of the operator inverse, used here as a preconditioner.
We note that the hierarchical error estimator for the RB method presented in [12] also does not require the estimation of any stability constants, such as the inf-sup constant. In [12] the error is estimated by the distance between two reduced approximations of different accuracies; the computational costs depend strongly on the dimension of the (primal) reduced space and are always higher than the costs for the computation of the RB approximation. In contrast, in our approach, the costs associated with the dual problems, and hence the estimator evaluation, are commensurate with the cost associated with the (primal) RB approximation. Finally, the reduced-order-model error surrogates (ROMES) method introduced in [8] and the closely related approaches [22, 29, 23] aim at constructing a statistical model for the approximation error. In [8] the statistical model is learned via stochastic-process data-fit methods from a small number of computed error indicators.
The remainder of this article is organized as follows. In LABEL:sec:NormWithGaussian we derive a randomized a posteriori error estimator that estimates the error for a finite number of parameter values at given high probability. As this error estimator still depends on the high-dimensional solutions of dual problems, LABEL:sec:DualApproximation is devoted to the reduced order approximation of the dual problems and the analysis of the fast-to-evaluate a posteriori error estimator. In LABEL:sec:Numerics we demonstrate several theoretical aspects of the error estimator numerically, and finally we draw some conclusions in LABEL:sec:Conclusion.
2 Randomized error estimator for parameter-dependent equations
2.1 Parameter-dependent equations and error measurement
Consider a real-valued parameter-dependent equation (throughout the paper we consider real-valued equations: the extension of our method to the case of complex-valued problems is straightforward using the isomorphism \mathbb{C}=\mathbb{R}^{2})
A(\mu)u(\mu)=f(\mu),  (2.1) 
where the parameter \mu belongs to a parameter set \mathcal{P}\subset\mathbb{R}^{P}. For every queried parameter \mu\in\mathcal{P}, A(\mu)\in\mathbb{R}^{N\times N} is an invertible matrix and f(\mu)\in\mathbb{R}^{N}. We assume we are given an approximation \widetilde{u}(\mu) of the solution u(\mu). In this paper, the goal is to estimate the error
\|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma}. 
Here, \|\cdot\|_{\Sigma} is either a norm defined by means of a symmetric positive-definite (SPD) matrix \Sigma\in\mathbb{R}^{N\times N} via \|v\|_{\Sigma}^{2}=v^{T}\Sigma v for all v\in\mathbb{R}^{N}, or a seminorm if \Sigma is only symmetric positive semidefinite. We highlight that the framework presented in this paper encompasses the estimation of the error in various different norms or the error in some QoI, as will be discussed in the remainder of this subsection; see LABEL:tab:choiceOfSigma for a brief summary.
By choosing \Sigma=I_{N}, the identity matrix of size N, \|\cdot\|_{\Sigma} becomes the canonical norm \|\cdot\|_{2} of \mathbb{R}^{N}. If problem LABEL:eq:AUB stems from the discretization of a parameter-dependent linear partial differential equation, there is usually a natural norm \|\cdot\|_{X} associated with a Hilbert space of functions X\subset H^{1}(D) for some spatial domain D\subset\mathbb{R}^{d}, d\in\{1,2,3\}. In such a case, there exists a discrete Riesz map R_{X}\in\mathbb{R}^{N\times N}, an SPD matrix such that \sqrt{(\cdot)^{T}R_{X}(\cdot)}=\|\cdot\|_{X}. The choice \Sigma=R_{X} implies \|\cdot\|_{\Sigma}=\|\cdot\|_{X}, which means that the error is measured with respect to the natural norm of the problem. We may also consider, for instance, the error in the L^{2}-norm by choosing \Sigma=R_{L^{2}(D)}, where the discrete Riesz map R_{L^{2}(D)} is chosen such that (\cdot)^{T}R_{L^{2}(D)}(\cdot)=\|\cdot\|_{L^{2}(D)}^{2}.
In some cases one is not interested in the solution u(\mu) itself but rather in some QoI defined as a linear function of u(\mu), say
s(\mu)=Lu(\mu)~{}\in\mathbb{R}^{m}, 
for some L\in\mathbb{R}^{m\times N}. In this situation one would like to estimate the error \|s(\mu)-L\,\widetilde{u}(\mu)\|_{W}, where \|\cdot\|_{W} is a given natural norm on \mathbb{R}^{m} associated with an SPD matrix R_{W} so that \|w\|_{W}^{2}=w^{T}R_{W}w for all w\in\mathbb{R}^{m}. With the choice \Sigma=L^{T}R_{W}L we can write
\|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma}^{2}=(u(\mu)-\widetilde{u}(\mu))^{T}\big{(}L^{T}R_{W}L\big{)}(u(\mu)-\widetilde{u}(\mu))=\|s(\mu)-L\widetilde{u}(\mu)\|_{W}^{2}, 
so that measuring the error with respect to the norm \|\cdot\|_{\Sigma} gives the error associated with the QoI. Notice that if m<N the matrix \Sigma is singular and \|\cdot\|_{\Sigma} is a seminorm. Finally, consider the scalar-valued QoI given by s(\mu)=l^{T}u(\mu), where l\in\mathbb{R}^{N}. This corresponds to the previous situation with m=1 and L=l^{T}. The choice \Sigma=l\,l^{T} yields \|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma}^{2}=|s(\mu)-l^{T}\widetilde{u}(\mu)|^{2}, where |\cdot| denotes the absolute value.
Target error  Choice of \Sigma

\|u(\mu)-\widetilde{u}(\mu)\|_{2}  \Sigma=I_{N}
\|u(\mu)-\widetilde{u}(\mu)\|_{X}  \Sigma=R_{X}
\|u(\mu)-\widetilde{u}(\mu)\|_{L^{2}(D)}  \Sigma=R_{L^{2}(D)}
\|s(\mu)-L\widetilde{u}(\mu)\|_{W}  \Sigma=L^{T}R_{W}L
|s(\mu)-l^{T}\widetilde{u}(\mu)|  \Sigma=l\,l^{T}
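The last row of the table can be checked directly: with \Sigma=l\,l^{T}, the quadratic form v^{T}\Sigma v collapses to the squared QoI error (l^{T}v)^{2}, so \|\cdot\|_{\Sigma} is only a seminorm. A minimal stdlib-only Python sketch; the vectors l and v below are arbitrary illustrative data, not data from the paper:

```python
def quad_form_rank_one(l, v):
    """Evaluate v^T (l l^T) v entry by entry, i.e. ||v||_Sigma^2 for Sigma = l l^T."""
    n = len(v)
    return sum(v[i] * l[i] * l[j] * v[j] for i in range(n) for j in range(n))

l = [1.0, -2.0, 0.5]                     # extraction functional of the scalar QoI
v = [0.3, 0.1, -0.7]                     # stands for the error u(mu) - u_tilde(mu)
lv = sum(a * b for a, b in zip(l, v))    # l^T v, i.e. the QoI error

# ||v||_Sigma^2 equals (l^T v)^2, hence ||.||_Sigma vanishes on the kernel of l^T
assert abs(quad_form_rank_one(l, v) - lv**2) < 1e-12
```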
2.2 Estimating norms using Gaussian maps
Let Z\sim\mathcal{N}(0,\Sigma) be a zero-mean Gaussian random vector in \mathbb{R}^{N} with covariance matrix \Sigma\in\mathbb{R}^{N\times N}. Given a vector v\in\mathbb{R}^{N}, for example v=u(\mu)-\widetilde{u}(\mu) for some (fixed) parameter \mu\in\mathcal{P}, we can write
\|v\|_{\Sigma}^{2}=v^{T}\Sigma v=v^{T}\mathbb{E}(ZZ^{T})v=\mathbb{E}((Z^{T}v)^{2}), 
where \mathbb{E}(\cdot) denotes the mathematical expectation. This means that (Z^{T}v)^{2} is an unbiased estimator of \|v\|_{\Sigma}^{2}. Let Z_{1},\ldots,Z_{K} be K independent copies of Z and define the random matrix \Phi\in\mathbb{R}^{K\times N} whose i-th row is (1/\sqrt{K})Z_{i}^{T}. The matrix \Phi is sometimes called a Gaussian map. Denoting by \|\cdot\|_{2} the canonical norm of \mathbb{R}^{K}, we can write
\|\Phi v\|_{2}^{2}=\frac{1}{K}\sum_{i=1}^{K}(Z_{i}^{T}v)^{2}\qquad\text{for any }v\in\mathbb{R}^{N}.  (2.2) 
In other words, \|\Phi v\|_{2}^{2} is a K-sample Monte Carlo estimator of \mathbb{E}((Z^{T}v)^{2})=\|v\|_{\Sigma}^{2}. By the independence of the Z_{i}'s, we have \operatorname{Var}(\|\Phi v\|_{2}^{2})=\frac{1}{K}\operatorname{Var}((Z^{T}v)^{2}), so that \|\Phi v\|_{2}^{2} is a lower-variance estimator of \|v\|_{\Sigma}^{2} compared to (Z^{T}v)^{2}. However, the variance is not always the most relevant criterion for assessing the performance of an estimator. In the context of this paper, we rather want to quantify the probability that \|\Phi v\|_{2}^{2} deviates from \|v\|_{\Sigma}^{2}. This can be done by noting that, provided \|v\|_{\Sigma}\neq 0, the random variables (Z_{i}^{T}v)/\|v\|_{\Sigma} for i=1,\ldots,K are independent standard normal random variables, so that we have
\|\Phi v\|_{2}^{2}=\frac{\|v\|_{\Sigma}^{2}}{K}\sum_{i=1}^{K}\Big{(}\frac{Z_{i}^{T}v}{\|v\|_{\Sigma}}\Big{)}^{2}\sim\frac{\|v\|_{\Sigma}^{2}}{K}Q, 
where Q\sim\chi^{2}(K) follows a chisquared distribution with K degrees of freedom. Denoting by \mathbb{P}\{A\} the probability of an event A and by \overline{A} the complementary event of A, the previous relation yields
\mathbb{P}\Big{\{}w^{-1}\|v\|_{\Sigma}\leq\|\Phi v\|_{2}\leq w\|v\|_{\Sigma}\Big{\}}=1-\mathbb{P}\big{\{}\overline{Kw^{-2}\leq Q\leq Kw^{2}}\big{\}}, 
for any w\geq 1. Then for any given (fixed) vector v\in\mathbb{R}^{N}, the probability that a realization of \|\Phi v\|_{2} lies between w^{-1}\|v\|_{\Sigma} and w\|v\|_{\Sigma} is independent of v and also independent of the dimension N. The following proposition gives an upper bound for \mathbb{P}\big{\{}\overline{Kw^{-2}\leq Q\leq Kw^{2}}\big{\}} in terms of w and K. The proof, given in LABEL:proof:Chi2Tail, relies on the fact that we have closed-form expressions for the law of Q\sim\chi^{2}(K).
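As an illustration of the estimator LABEL:eq:Phiv, the following stdlib-only Python sketch estimates \|v\|_{2} (the case \Sigma=I_{N}) from K inner products with standard Gaussian vectors. The vector v, the sample size, and the seed are arbitrary illustrative choices:

```python
import math
import random

def gaussian_map_norm(v, K, rng):
    """K-sample Monte Carlo estimate of ||v||_2 via ||Phi v||_2,
    where the rows of Phi are (1/sqrt(K)) Z_i^T with Z_i ~ N(0, I_N)."""
    acc = 0.0
    for _ in range(K):
        z_dot_v = sum(rng.gauss(0.0, 1.0) * vj for vj in v)  # Z_i^T v
        acc += z_dot_v ** 2
    return math.sqrt(acc / K)

rng = random.Random(0)
v = [3.0, 4.0]                         # ||v||_2 = 5
est = gaussian_map_norm(v, K=1000, rng=rng)
# With high probability, est lies within a small multiplicative factor of 5.
```

Note that the cost per sample is a single inner product; the concentration result below explains why a small K already yields a reliable multiplicative bound.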
Proposition 2.1.
Let Q\sim\chi^{2}(K) be a chisquared random variable with K\geq 3 degrees of freedom. For any w>\sqrt{e} we have
\mathbb{P}\big{\{}\overline{Kw^{-2}\leq Q\leq Kw^{2}}\big{\}}\leq\Big{(}\frac{\sqrt{e}}{w}\Big{)}^{K}. 
LABEL:prop:Chi2Tail shows that the probability \mathbb{P}\big{\{}\overline{Kw^{-2}\leq Q\leq Kw^{2}}\big{\}} decays at least exponentially with respect to K, provided w>\sqrt{e} and K\geq 3. Then for any v\in\mathbb{R}^{N}, the relation
w^{-1}\|v\|_{\Sigma}\leq\|\Phi v\|_{2}\leq w\|v\|_{\Sigma},  (2.3) 
holds with a probability greater than 1-(\sqrt{e}/w)^{K}. As expected, a large value of w is beneficial to ensure that the probability of failure (\sqrt{e}/w)^{K} is small. For instance, with w=4 and K=6, relation LABEL:eq:controlError holds with a probability larger than 0.995. However, we observe in LABEL:fig:Chi2 that this theoretical result is rather pessimistic, since it overestimates the true probability by one order of magnitude for small values of w. Based on LABEL:fig:Chi2, we also conjecture that there is an exponential decay even when w\leq\sqrt{e} (see the blue curve with w=1.5), which is not predicted by LABEL:prop:Chi2Tail.

In many situations we want to estimate the norms of several vectors rather than just a single vector. This is for instance the case if one has to estimate the norm of the error v=u(\mu)-\widetilde{u}(\mu) for many different parameter values \mu\in\mathcal{P}. In that case, one would like to quantify the probability that relation LABEL:eq:controlError holds simultaneously for any vector in a set \mathcal{M}\subset\mathbb{R}^{N}. Assuming \mathcal{M} is finite, a union bound argument (for a detailed proof see LABEL:proof:estimate_many_vectors) yields the following result:
Corollary 2.2.
Let a finite collection of vectors \mathcal{M}=\{v_{1},v_{2},\ldots\}\subset\mathbb{R}^{N} and a failure probability 0<\delta<1 be given. Then, for any w>\sqrt{e} and
K\geq\max\left\{\frac{\log(\#\mathcal{M})+\log(\delta^{-1})}{\log(w/\sqrt{e})},\enspace 3\right\}  (2.4) 
we have
\mathbb{P}\Big{\{}w^{-1}\|v\|_{\Sigma}\leq\|\Phi v\|_{2}\leq w\|v\|_{\Sigma}~{},~{}\forall v\in\mathcal{M}\Big{\}}\geq 1-\delta.  (2.5) 
LABEL:tab:concentration_set gives numerical values of K that satisfy LABEL:eq:ConditionK_FiniteSet depending on \delta, w and \#\mathcal{M}. For example, with \delta=10^{-4} and w=10, estimating simultaneously the norms of 10^{9} vectors requires only K=17 samples. Again, we emphasize that this result is independent of the dimension N of the vectors to be estimated.
\delta=10^{-2}  w=2  w=4  w=10        \delta=10^{-4}  w=2  w=4  w=10

\#\mathcal{M}=10^{0}  24  6  3        \#\mathcal{M}=10^{0}  48  11  6
\#\mathcal{M}=10^{3}  60  13  7        \#\mathcal{M}=10^{3}  84  19  9
\#\mathcal{M}=10^{6}  96  21  11       \#\mathcal{M}=10^{6}  120  26  13
\#\mathcal{M}=10^{9}  132  29  15      \#\mathcal{M}=10^{9}  155  34  17
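The table entries follow directly from Condition LABEL:eq:ConditionK_FiniteSet, taking K at least 3 and at least the log ratio. A small stdlib-only Python helper (the function name is ours) that returns the smallest admissible K:

```python
import math

def num_samples(card_M, delta, w):
    """Smallest integer K with K >= 3 and
    K >= (log(#M) + log(1/delta)) / log(w / sqrt(e))."""
    if w <= math.sqrt(math.e):
        raise ValueError("the bound requires w > sqrt(e)")
    ratio = (math.log(card_M) + math.log(1.0 / delta)) / math.log(w / math.sqrt(math.e))
    return max(3, math.ceil(ratio))

# Reproduces the table: e.g. delta = 1e-4, w = 10, #M = 10^9 gives K = 17.
```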
Remark 2.3 (Comparison with the Johnson-Lindenstrauss lemma [6, 20]).
The Johnson-Lindenstrauss (JL) lemma states that for any 0<\varepsilon<1 and any finite set \mathcal{M}\subset\mathbb{R}^{N}, the condition K\geq 8\varepsilon^{-2}\log(\#\mathcal{M}) ensures the existence of a linear map \Phi:\mathbb{R}^{N}\rightarrow\mathbb{R}^{K} such that (1-\varepsilon)\|v-u\|_{2}^{2}\leq\|\Phi v-\Phi u\|_{2}^{2}\leq(1+\varepsilon)\|v-u\|_{2}^{2} holds for all u,v\in\mathcal{M}. Replacing \mathcal{M} by \mathcal{M}\cup\{0\} and letting u=0, one has that K\geq 8\varepsilon^{-2}\log(\#\mathcal{M}+1) is sufficient to ensure the existence of a \Phi\in\mathbb{R}^{K\times N} such that
\sqrt{1-\varepsilon}\|v\|_{2}\leq\|\Phi v\|_{2}\leq\sqrt{1+\varepsilon}\|v\|_{2},\quad\text{for all }v\in\mathcal{M}.  (2.6) 
The above relation differs from LABEL:eq:controlError in the sense that the deviation of \|\Phi v\|_{2} from \|v\|_{2} is controlled in an additive manner via a parameter \varepsilon instead of in a multiplicative way via w. We also highlight the different dependencies of K on \varepsilon and w. In contrast to the requirement in the JL lemma, Condition LABEL:eq:ConditionK_FiniteSet permits a reduction in the number of required copies K of the random vectors by considering an increased w. Note that the computational complexity of the a posteriori error estimator we propose in this paper crucially depends on K, see LABEL:sec:online_complexity. Since the goal in this paper is to estimate the error, we do in general not have to insist on a very accurate estimation of \|v\|_{2}. Instead, in many situations it might be preferable to accept a higher effectivity w of the a posteriori error estimator in favour of a faster computational time. We emphasize that the user has the choice here.
Notice also that with the choice w=1/\sqrt{1-\varepsilon}, Equation LABEL:eq:tmp29647 implies LABEL:eq:controlError. Then, the JL lemma ensures that LABEL:eq:controlError holds true if K\geq 8(1-w^{-2})^{-2}\log(\#\mathcal{M}+1). Even though we have the same logarithmic dependence on \#\mathcal{M}, this is much larger than what we obtained in LABEL:eq:ConditionK_FiniteSet, already for moderate but especially for large values of w. For example, with w=4, \#\mathcal{M}=10^{3} and \delta=10^{-2}, the JL lemma requires K\geq 63, whereas Condition LABEL:eq:ConditionK_FiniteSet requires only K\geq 13. Finally, we highlight that a result similar to LABEL:eq:controlError has been obtained in [21] for random vectors drawn uniformly at random from the sphere. The multiplicative type of estimates in [21] motivated us to derive similar results for Gaussian vectors.
Remark 2.4 (Drawing Gaussian vectors).
In actual practice we can draw efficiently from Z\sim\mathcal{N}(0,\Sigma) using a factorization of the covariance matrix of the form \Sigma=U^{T}U, e.g. a (sparse) Cholesky decomposition. It is then sufficient to draw a standard Gaussian vector \widehat{Z} and to compute the matrix-vector product Z=U^{T}\widehat{Z}. As pointed out in [1, Remark 2.9], one can take advantage of a potential block structure of \Sigma to build a (non-square) factorization U with a negligible computational cost.
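A minimal stdlib-only Python sketch of this sampling step, using a dense Cholesky factorization \Sigma=LL^{T} (so U=L^{T} in the notation above); the 2x2 covariance matrix below is arbitrary illustrative data:

```python
import math
import random

def cholesky(S):
    """Lower-triangular L with S = L L^T, for a small dense SPD matrix S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(S[i][i] - s) if i == j else (S[i][j] - s) / L[j][j]
    return L

def draw_correlated(S, rng):
    """Draw Z ~ N(0, S): Z = L Zhat with Zhat standard Gaussian,
    so that E[Z Z^T] = L L^T = S."""
    L = cholesky(S)
    n = len(S)
    zhat = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(L[i][k] * zhat[k] for k in range(n)) for i in range(n)]

Sigma = [[4.0, 2.0], [2.0, 3.0]]
Z = draw_correlated(Sigma, random.Random(0))
```

For large sparse \Sigma one would of course use a sparse Cholesky routine instead of this dense textbook loop; the point here is only the reduction of correlated sampling to standard Gaussian sampling plus one triangular matrix-vector product.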
2.3 Randomized a posteriori error estimator
We apply the methodology described in the previous subsection to derive a residual-based randomized a posteriori estimator for the error \|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma}. Let \Phi=K^{-1/2}[Z_{1},\ldots,Z_{K}]^{T} be a random matrix in \mathbb{R}^{K\times N}, where Z_{1},\ldots,Z_{K} are independent copies of Z\sim\mathcal{N}(0,\Sigma), and consider the error estimator \Delta(\mu)=\|\Phi\big{(}u(\mu)-\widetilde{u}(\mu)\big{)}\|_{2}, or equivalently
\Delta(\mu)=\left(\frac{1}{K}\sum_{i=1}^{K}\Big{(}Z_{i}^{T}\big{(}u(\mu)-\widetilde{u}(\mu)\big{)}\Big{)}^{2}\right)^{1/2}.  (2.7) 
If the parameter set \mathcal{P} is finite, LABEL:prop:estimate_many_vectors with \mathcal{M}=\{u(\mu)-\widetilde{u}(\mu):\mu\in\mathcal{P}\} permits control of the quality of the estimate \Delta(\mu) uniformly over \mu\in\mathcal{P}. In actual practice, however, the parameter set is often of infinite cardinality. Using more sophisticated techniques than a simple union bound argument should provide results also when \mathcal{P} has infinite cardinality. In this paper, however, we are only interested in the case of a finite set of parameter values, as restated in the following corollary.
Corollary 2.5.
Let 0<\delta<1 and w>\sqrt{e}. Given a finite set of parameter values \mathcal{S}\subset\mathcal{P}, the condition
K\geq\max\left\{\frac{\log(\#\mathcal{S})+\log(\delta^{-1})}{\log(w/\sqrt{e})},\enspace 3\right\},  (2.8) 
is sufficient to ensure
\mathbb{P}\Big{\{}w^{-1}\Delta(\mu)\leq\|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma}\leq w\Delta(\mu)~{},~{}\forall\mu\in\mathcal{S}\Big{\}}\geq 1-\delta. 
It is important to note that Condition LABEL:coro:truth_est_Condition_S depends only on the cardinality of \mathcal{S}. This means that K can be determined knowing only the number of parameter values for which we need to estimate the error. However, computing \Delta(\mu) requires the solution u(\mu) of problem LABEL:eq:AUB, which is infeasible in practice. By introducing the residual
r(\mu)=f(\mu)-A(\mu)\widetilde{u}(\mu),  (2.9) 
associated with Problem LABEL:eq:AUB and, similarly to [4, 15], exploiting the error-residual relationship, we can rewrite the terms Z_{i}^{T}(u(\mu)-\widetilde{u}(\mu)), 1\leq i\leq K, as follows:
Z_{i}^{T}(u(\mu)-\widetilde{u}(\mu))=Z_{i}^{T}A(\mu)^{-1}r(\mu)=(A(\mu)^{-T}Z_{i})^{T}r(\mu).  (2.10) 
The term Z_{i}^{T}(u(\mu)-\widetilde{u}(\mu)) thus equals the inner product of the (primal) residual and the solution Y_{i}(\mu)\in\mathbb{R}^{N} of the random dual problems
A(\mu)^{T}Y_{i}(\mu)=Z_{i},\quad 1\leq i\leq K.  (2.11) 
Because of the random right-hand side in LABEL:eq:randomDualProblem, the solutions Y_{1}(\mu),\ldots,Y_{K}(\mu) are random vectors. Thanks to the above, the error estimator \Delta(\mu) in LABEL:defeq:error_est can be rewritten as
\Delta(\mu)=\left(\frac{1}{K}\sum_{i=1}^{K}\big{(}Y_{i}(\mu)^{T}r(\mu)\big{)}^{2}\right)^{1/2}.  (2.12) 
This shows that \Delta(\mu) can be computed by applying K linear forms to the residual r(\mu). In that sense, \Delta(\mu) can be considered as an a posteriori error estimator. Notice that computing the solutions to LABEL:eq:randomDualProblem is in general as expensive as solving the primal problem LABEL:eq:AUB. In the next section we show how to approximate the dual solutions Y_{1}(\mu),\ldots,Y_{K}(\mu) in order to obtain a fast-to-evaluate a posteriori error estimator.
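The identity behind LABEL:eq:DualTrick can be checked on a toy problem. The following stdlib-only Python sketch uses a fixed 2x2 system (all data here is illustrative, not the paper's Helmholtz example) and verifies that each term Z_{i}^{T}(u-\widetilde{u}) in LABEL:defeq:error_est coincides with Y_{i}(\mu)^{T}r(\mu):

```python
import random

def solve2(A, b):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - b[1] * A[0][1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

A = [[2.0, 1.0], [0.0, 3.0]]                 # stands for A(mu)
At = [[2.0, 0.0], [1.0, 3.0]]                # A(mu)^T
f = [1.0, 2.0]
u = solve2(A, f)                             # exact solution
u_tilde = [u[0] + 0.1, u[1] - 0.05]          # some reduced order approximation
residual = [f[i] - dot(A[i], u_tilde) for i in range(2)]   # r(mu)
error = [u[i] - u_tilde[i] for i in range(2)]

rng = random.Random(0)
for _ in range(4):                           # K = 4 random dual problems
    Z = [rng.gauss(0.0, 1.0) for _ in range(2)]   # Sigma = I_2
    Y = solve2(At, Z)                        # dual solve A(mu)^T Y = Z
    # error-residual identity: Z^T (u - u_tilde) = Y^T r(mu)
    assert abs(dot(Z, error) - dot(Y, residual)) < 1e-12
```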
Remark 2.6 (Scalarvalued QoI).
When estimating the error in scalar-valued QoIs of the form s(\mu)=l^{T}u(\mu), the covariance matrix is \Sigma=l\,l^{T}, see LABEL:sec:motivation. In that case the random vector Z\sim\mathcal{N}(0,\Sigma) follows the same distribution as X\,l, where X\sim\mathcal{N}(0,1) is a standard normal random variable (scalar). The random dual problem LABEL:eq:randomDualProblem then becomes A(\mu)^{T}Y_{i}(\mu)=X_{i}\,l and the solution is Y_{i}(\mu)=X_{i}\,q(\mu), where q(\mu) is the solution of the deterministic dual problem A(\mu)^{T}q(\mu)=l. Dual problems of this form are commonly encountered when estimating linear quantities of interest, see [25] for a general presentation and [9, 27, 31] for applications in reduced order modeling.
3 A fast-to-evaluate randomized a posteriori error estimator
In order to obtain a fast-to-evaluate a posteriori error estimator whose computational complexity is independent of N, we employ projection-based model order reduction (MOR) techniques to compute approximations of the solutions Y_{1}(\mu),\ldots,Y_{K}(\mu) of the dual problems LABEL:eq:randomDualProblem. To that end, let us assume that we are given a realization of the K random vectors Z_{1},\ldots,Z_{K} and that we have a reduced space \widetilde{\mathcal{Y}}\subset\mathbb{R}^{N} at our disposal; different ways to construct \widetilde{\mathcal{Y}} will be discussed in LABEL:sec:greedy and compared numerically in LABEL:sec:Numerics. Then, we define \widetilde{Y}_{i}(\mu) as the Galerkin projection of Y_{i}(\mu) onto \widetilde{\mathcal{Y}}, meaning
\widetilde{Y}_{i}(\mu)\in\widetilde{\mathcal{Y}}\,:\quad\langle A(\mu)^{T}\widetilde{Y}_{i}(\mu),v\rangle=\langle Z_{i},v\rangle\,,\quad\forall v\in\widetilde{\mathcal{Y}}.  (3.1) 
Here, \langle v,w\rangle:=v^{T}w for all v,w\in\mathbb{R}^{N}. We emphasize that we employ the same reduced space \widetilde{\mathcal{Y}} for the approximation of all K dual solutions Y_{1}(\mu),\ldots,Y_{K}(\mu). Needless to say, a segregated strategy, where each dual problem has its own reduced space, can also be considered. The advantage of a segregated strategy is that one can easily parallelize the computations, if needed. However, in this paper we focus exclusively on the monolithic approach LABEL:eq:monolithic.
By replacing Y_{i}(\mu) in LABEL:eq:DualTrick by the fast-to-evaluate approximation \widetilde{Y}_{i}(\mu), we define a fast-to-evaluate a posteriori error estimator as
\widetilde{\Delta}(\mu):=\left(\frac{1}{K}\sum_{i=1}^{K}(\widetilde{Y}_{i}(\mu)^{T}r(\mu))^{2}\right)^{1/2}.  (3.2) 
We highlight that, in contrast to, for instance, the “standard” a posteriori error estimator defined as the product of the reciprocal of a stability constant and the dual norm of the primal residual, \widetilde{\Delta}(\mu) does not contain any constants that require estimation. Moreover, unlike hierarchical error estimators [2, 12], the quality of the approximation used for the error estimator does not depend on the quality of the primal approximation. For a more elaborate comparison we refer to LABEL:sec:online_complexity.
Additionally, we shall show in LABEL:sec:online_complexity that evaluating \mu\mapsto\widetilde{\Delta}(\mu) requires only the solution of one linear system of size n_{\widetilde{\mathcal{Y}}}:=\dim(\widetilde{\mathcal{Y}}), instead of K linear systems of size n_{\widetilde{\mathcal{Y}}} as suggested by LABEL:eq:monolithic. However, before discussing the computational complexity of \widetilde{\Delta}(\mu), we show in LABEL:sec:performance_of_onlineefficient_estimator that under certain conditions \widetilde{\Delta}(\mu) is both a reliable and efficient error estimator with high probability. Based on this analysis, we propose in LABEL:sec:greedy different greedy algorithms for constructing the reduced space \widetilde{\mathcal{Y}}.
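To make the Galerkin step LABEL:eq:monolithic concrete, here is a stdlib-only Python sketch for the special case of a one-dimensional reduced space \widetilde{\mathcal{Y}}=\mathrm{span}\{y_{1}\}, where the reduced system collapses to a single scalar equation for the coefficient c in \widetilde{Y}_{i}=c\,y_{1} (the matrix, right-hand side, and basis vector are illustrative choices of ours):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(M, v):
    return [dot(row, v) for row in M]

def galerkin_1d(At, y1, Z):
    """Galerkin projection onto span{y1}:
    find c with <At (c y1), y1> = <Z, y1>, return c * y1."""
    c = dot(Z, y1) / dot(matvec(At, y1), y1)
    return [c * y for y in y1]

At = [[2.0, 0.0], [1.0, 3.0]]     # stands for A(mu)^T
Z = [1.0, 1.0]                    # realization of a random right-hand side
y1 = [1.0, 1.0]                   # reduced dual basis vector
Y_tilde = galerkin_1d(At, y1, Z)
# the dual residual of the reduced solution is orthogonal to y1:
dual_res = [Z[i] - matvec(At, Y_tilde)[i] for i in range(2)]
assert abs(dot(dual_res, y1)) < 1e-12
```

With a basis of n_{\widetilde{\mathcal{Y}}} vectors, the scalar division above becomes the solution of an n_{\widetilde{\mathcal{Y}}}-dimensional linear system, independent of N once the reduced operator has been assembled.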
3.1 Analysis of the fast-to-evaluate a posteriori error estimator
First, we relate the relative error in the a posteriori error estimator to the error in the dual residual:
Proposition 3.1.
Assume \Sigma is invertible. The fast-to-evaluate error estimator \widetilde{\Delta}(\mu) defined by LABEL:eq:_def_a_post_est_online satisfies
\frac{|\Delta(\mu)-\widetilde{\Delta}(\mu)|}{\|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma}}\leq\max_{1\leq i\leq K}\|A^{T}(\mu)\widetilde{Y}_{i}(\mu)-Z_{i}\|_{\Sigma^{-1}}\qquad\text{ for all }\mu\in\mathcal{P}.  (3.3) 
Here, \|\cdot\|_{\Sigma^{-1}} denotes the norm on \mathbb{R}^{N} such that \|v\|_{\Sigma^{-1}}^{2}=v^{T}\Sigma^{-1}v for all v\in\mathbb{R}^{N}.
The proof is given in LABEL:proof:dualError. Notice that LABEL:prop:dualError requires \Sigma to be invertible, which excludes the cases where one wants to estimate the error in a vector-valued QoI, see LABEL:sec:motivation. LABEL:prop:dualError allows us to control the error via \widetilde{\Delta}(\mu), where the effectivity w is enlarged in an additive manner, as stated in the following corollary.
Corollary 3.2.
Suppose we are given a finite set of parameter values \mathcal{S}\subset\mathcal{P} for which we want to estimate the error \|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma}. Let 0<\delta<1, w>\sqrt{e}, and assume
K\geq\max\left\{\frac{\log(\#\mathcal{S})+\log(\delta^{-1})}{\log(w/\sqrt{e})},\enspace 3\right\}.  (3.4) 
Furthermore, assume that \Sigma is invertible and that we almost surely have \varepsilon\leq w^{-1}, where
\varepsilon=\sup_{\mu\in\mathcal{P}}\left\{\max_{1\leq i\leq K}\|A^{T}(\mu)\widetilde{Y}_{i}(\mu)-Z_{i}\|_{\Sigma^{-1}}\right\}.  (3.5) 
Then, we have
\mathbb{P}\Big{\{}(w+\varepsilon)^{-1}\widetilde{\Delta}(\mu)\leq\|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma}\leq\frac{w}{1-w\,\varepsilon}\,\widetilde{\Delta}(\mu),\quad\forall\mu\in\mathcal{S}\Big{\}}\geq 1-\delta.  (3.6) 
The proof is given in LABEL:proof:dualErrorAdditive. LABEL:cor:dualErrorAdditive gives a sufficient condition to control the quality of the estimator \widetilde{\Delta}(\mu) over a finite set of parameter values \mathcal{S}\subset\mathcal{P} with high probability. It requires \varepsilon\leq w^{-1}, which is equivalent to \|A^{T}(\mu)\widetilde{Y}_{i}(\mu)-Z_{i}\|_{\Sigma^{-1}}\leq w^{-1} for all \mu\in\mathcal{P} and all 1\leq i\leq K. This condition can be difficult to satisfy in practice. To explain this, let us note that \|Z_{i}\|_{\Sigma^{-1}} is, with high probability, of the order of \sqrt{N}. (To show this, note that \|Z_{i}\|_{\Sigma^{-1}}^{2}\sim\chi^{2}(N) so that, by LABEL:prop:Chi2Tail, the relation w^{\prime\,-1}\sqrt{N}\leq\|Z_{i}\|_{\Sigma^{-1}}\leq w^{\prime}\sqrt{N} holds with probability at least 1-(\sqrt{e}/w^{\prime})^{N} for any w^{\prime}\geq\sqrt{e}.) Therefore \varepsilon\leq w^{-1} means that the relative dual residual norm ought to be of the order of
\frac{\|A^{T}(\mu)\widetilde{Y}_{i}(\mu)-Z_{i}\|_{\Sigma^{-1}}}{\|Z_{i}\|_{\Sigma^{-1}}}\simeq\frac{1}{w\sqrt{N}}. 
When N\gg 1, the condition \varepsilon\leq w^{-1} thus requires a very accurate approximation of the dual variables. For instance, with N=10^{6}, the relative dual residual norm \|A^{T}(\mu)\widetilde{Y}_{i}(\mu)-Z_{i}\|_{\Sigma^{-1}}/\|Z_{i}\|_{\Sigma^{-1}} has to be less than 10^{-3} for all \mu\in\mathcal{P} and all 1\leq i\leq K, which can be too demanding in actual practice.
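The \sqrt{N} concentration of \|Z_{i}\|_{\Sigma^{-1}} is easy to observe numerically. A minimal sketch (our illustration, not part of the paper), taking \Sigma=I_{N} so that the \Sigma^{-1}-norm reduces to the Euclidean norm:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10**6
# With Sigma = I_N, Z ~ N(0, Sigma) is a standard Gaussian vector and
# ||Z||_{Sigma^{-1}} is the Euclidean norm; ||Z||^2 ~ chi^2(N) concentrates at N.
Z = rng.standard_normal(N)
print(np.linalg.norm(Z) / np.sqrt(N))   # close to 1, up to O(N^{-1/2}) fluctuations
print(1.0 / (2.0 * np.sqrt(N)))         # target relative dual residual for, e.g., w = 2
```

The second printed value illustrates how small the relative dual residual 1/(w\sqrt{N}) has to be for w=2 and N=10^{6}.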
Next, we give an alternative way of controlling the quality of \widetilde{\Delta}(\mu). In contrast to LABEL:cor:dualErrorAdditive, which provides an additive type of control, the following proposition gives a control of a multiplicative type. The proof is given in LABEL:proof:dualErrorMultiplicative in the appendix.
Proposition 3.3.
Suppose we are given a finite set of parameter values \mathcal{S}\subset\mathcal{P} over which we want to estimate the error \|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma}. Let 0<\delta<1, w>\sqrt{e} and assume
K\geq\max\left\{\frac{\log(\#\mathcal{S})+\log(\delta^{-1})}{\log(w/\sqrt{e})},\enspace 3\right\}.  (3.7) 
Then the fast-to-evaluate estimator \widetilde{\Delta}(\mu) satisfies
\mathbb{P}\Big{\{}(\alpha w)^{-1}\widetilde{\Delta}(\mu)\leq\|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma}\leq(\alpha w)\,\widetilde{\Delta}(\mu),\quad\forall\mu\in\mathcal{S}\Big{\}}\geq 1-\delta,  (3.8) 
where
\alpha:=\,\max_{\mu\in\mathcal{P}}\left(\max\left\{\frac{\Delta(\mu)}{% \widetilde{\Delta}(\mu)}\,,\,\frac{\widetilde{\Delta}(\mu)}{\Delta(\mu)}\right% \}\right)\geq 1.  (3.9) 
LABEL:prop:dualErrorMultiplicative shows that, with high probability, the error estimator \widetilde{\Delta}(\mu) departs from the true error \|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma} at most by a multiplicative factor (\alpha w)^{-1} or (\alpha w). Notice that \alpha is a measure of the distance from \mu\mapsto\widetilde{\Delta}(\mu) to \mu\mapsto\Delta(\mu): if it is close to 1 then \widetilde{\Delta}(\mu) is close to \Delta(\mu) uniformly over the parameter set \mathcal{P}. Unlike LABEL:cor:dualErrorAdditive, LABEL:prop:dualErrorMultiplicative does not require \Sigma to be invertible and, even more importantly, LABEL:eq:dualErrorMultiplicative is valid for any \alpha, independently of w. However, the computation of \alpha can be expensive since it requires the exact error estimator \Delta(\mu) over the whole parameter set \mathcal{P}. Therefore, we propose to use \alpha as a stopping criterion when constructing the dual reduced space to ensure that (\alpha w)^{-1}\widetilde{\Delta}(\mu)\leq\|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma}\leq(\alpha w)\,\widetilde{\Delta}(\mu) holds true for a rich training set \mathcal{P}^{\text{train}}\subset\mathcal{P}, as we will detail in LABEL:sec:greedy.
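To get a feeling for the sample sizes these conditions prescribe, the following sketch evaluates the smallest admissible K under a union-bound reading of the lower bound on K (with at least 3 samples); the function name and this reading are our illustration:

```python
import math

def required_K(card_S: int, delta: float, w: float) -> int:
    """Smallest K with K >= (log(#S) + log(1/delta)) / log(w / sqrt(e)) and
    K >= 3; requires w > sqrt(e) and 0 < delta < 1 (union-bound reading)."""
    assert w > math.sqrt(math.e) and 0.0 < delta < 1.0
    k = (math.log(card_S) + math.log(1.0 / delta)) / math.log(w / math.sqrt(math.e))
    return max(math.ceil(k), 3)

# K grows only logarithmically in the number of estimated errors #S:
print(required_K(10**4, 0.01, w=4.0))
print(required_K(10**8, 0.01, w=4.0))
```

Even 10^{8} queries certified with probability 0.99 require only a few dozen samples, which is the "many queries with only modest increase in cost" property announced in the abstract.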
3.2 Greedy constructions of the dual reduced space \widetilde{\mathcal{Y}}
3.2.1 Vector point of view of the dual problems
A popular technique to build a reduced space is to take the span of snapshots of the solution. In order to handle the K distinct dual problems, the index “i” in LABEL:eq:randomDualProblem is considered as an additional parameter. Thus, we define the augmented parameter set \mathcal{P}_{K}=\{1,\ldots,K\}\times\mathcal{P} and seek an n_{\widetilde{\mathcal{Y}}}-dimensional reduced space of the form
\widetilde{\mathcal{Y}}=\text{span}\{Y_{i_{1}}(\mu_{1}),\ldots,Y_{i_{n_{% \widetilde{\mathcal{Y}}}}}(\mu_{n_{\widetilde{\mathcal{Y}}}})\},  (3.10) 
where the n_{\widetilde{\mathcal{Y}}} elements (i_{1},\mu_{1}),\ldots,(i_{n_{\widetilde{\mathcal{Y}}}},\mu_{n_{\widetilde{\mathcal{Y}}}}) are to be chosen in \mathcal{P}_{K}. The RB methodology (see for instance [14, 26, 9, 27] for an introduction) consists in selecting (i_{1},\mu_{1}),\ldots,(i_{n_{\widetilde{\mathcal{Y}}}},\mu_{n_{\widetilde{\mathcal{Y}}}}) in a greedy fashion [30]. In detail, assuming the first j parameters are given, the (j+1)-th parameter is defined as
(i_{j+1},\mu_{j+1})\in\underset{(i,\mu)\in\mathcal{P}_{K}^{\text{train}}}{\text{argmax}}\|A(\mu)^{T}\widetilde{Y}_{i}(\mu)-Z_{i}\|_{*},  (3.11) 
where \widetilde{Y}_{i}(\mu) is the approximation of Y_{i}(\mu) given by LABEL:eq:monolithic with \widetilde{\mathcal{Y}} defined as in LABEL:eq:RBdualSpaceVector. Here, \mathcal{P}_{K}^{\text{train}}\subset\mathcal{P}_{K} is a sufficiently rich training set with finite cardinality and \|\cdot\|_{*} denotes an arbitrary norm on \mathbb{R}^{N}. According to LABEL:cor:dualErrorAdditive, it is natural to choose \|\cdot\|_{*}=\|\cdot\|_{\Sigma^{-1}}, provided \Sigma is invertible. After having computed the snapshot Y_{i_{j+1}}(\mu_{j+1}), the reduced space \widetilde{\mathcal{Y}} is updated using LABEL:eq:RBdualSpaceVector with j\leftarrow j+1. By selecting the parameter (i_{j+1},\mu_{j+1}) according to LABEL:eq:weakGreedy_training, the idea is to construct a reduced space that minimizes the dual residual norm \|A(\mu)^{T}\widetilde{Y}_{i}(\mu)-Z_{i}\|_{*} uniformly over the training set (i,\mu)\in\mathcal{P}_{K}^{\text{train}}.
It remains to define a criterion to stop the greedy iterations. Given a user-defined tolerance tol\geq 0, the use of the stopping criterion
\max_{(i,\mu)\in\mathcal{P}_{K}^{\text{train}}}\|A(\mu)^{T}\widetilde{Y}_{i}(\mu)-Z_{i}\|_{*}\leq tol,  (3.12) 
ensures that, at the end of the iteration procedure, the residual norm of the dual problem is below tol everywhere on the training set \mathcal{P}_{K}^{\text{train}}. One can relax that criterion by replacing the max in LABEL:eq:stoppingCriterion_MAX_STD by the quantile of order q\in[0,1]:
q\text{-}\mathrm{quantile}\Big{\{}\|A(\mu)^{T}\widetilde{Y}_{i}(\mu)-Z_{i}\|_{*}\,:\,(i,\mu)\in\mathcal{P}_{K}^{\text{train}}\Big{\}}\leq tol.  (3.13) 
Here q\text{-}\mathrm{quantile}\{A\} denotes the q-th quantile of the (ordered) set A. With this stopping criterion, the iterations stop when at least a fraction q of the points in \mathcal{P}_{K}^{\text{train}} have a dual residual norm below tol. Notice that the q\text{-}\mathrm{quantile} and the \max coincide when q=1, so that LABEL:eq:stoppingCriterion_QUANTILE_STD generalizes LABEL:eq:stoppingCriterion_MAX_STD. The resulting greedy algorithm is summarized in LABEL:algo:greedy.
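The greedy loop of this subsection can be sketched on a small synthetic problem. The sketch below is our illustration (toy well-conditioned operator A(\mu), Euclidean norm for \|\cdot\|_{*}, quantile stopping criterion), not the paper's LABEL:algo:greedy verbatim:

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 60, 5
A0 = np.eye(N) + 0.01 * rng.standard_normal((N, N))   # toy parametrized operator
A1 = 0.01 * rng.standard_normal((N, N))               # (assumption, well conditioned)
A = lambda mu: A0 + mu * A1
Z = rng.standard_normal((N, K))                       # K random dual right-hand sides
P_train = [(i, mu) for i in range(K) for mu in np.linspace(0.0, 1.0, 20)]

def res_norm(V, i, mu):
    """Dual residual norm ||A(mu)^T Ytilde_i(mu) - Z_i|| for the Galerkin
    approximation of A(mu)^T Y = Z_i in span(V) (V has orthonormal columns)."""
    if V.shape[1] == 0:
        return np.linalg.norm(Z[:, i])
    At = A(mu).T
    y = V @ np.linalg.solve(V.T @ At @ V, V.T @ Z[:, i])
    return np.linalg.norm(At @ y - Z[:, i])

V, tol, q = np.zeros((N, 0)), 1e-4, 0.99
for _ in range(N):
    res = np.array([res_norm(V, i, mu) for (i, mu) in P_train])
    if np.quantile(res, q) <= tol:                    # quantile stopping criterion
        break
    i_new, mu_new = P_train[int(np.argmax(res))]      # greedy selection over (i, mu)
    y = np.linalg.solve(A(mu_new).T, Z[:, i_new])     # new dual snapshot
    y -= V @ (V.T @ y)                                # orthogonalize and append
    V = np.hstack([V, (y / np.linalg.norm(y))[:, None]])
print("dual space dimension:", V.shape[1])
```

The final dimension is well below N even though the training set mixes all K right-hand sides and 20 parameter values.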
3.2.2 Matrix point of view of the dual problems
In this section we propose another greedy algorithm which relies on a matrix interpretation of the K dual problems LABEL:eq:randomDualProblem. Let us denote by
\mathbf{Y}(\mu)=[Y_{1}(\mu),\ldots,Y_{K}(\mu)]\in\mathbb{R}^{N\times K}, 
the matrix containing the dual solutions. Instead of constructing the reduced space \widetilde{\mathcal{Y}} as the span of vectors Y_{i}(\mu), like in Equation LABEL:eq:RBdualSpaceVector, we now consider reduced spaces of the form
\widetilde{\mathcal{Y}}=\text{span}\{\mathbf{Y}(\mu_{1})\lambda_{1},\ldots,% \mathbf{Y}(\mu_{n_{\widetilde{\mathcal{Y}}}})\lambda_{n_{\widetilde{\mathcal{Y% }}}}\},  (3.14) 
where \lambda_{1},\ldots,\lambda_{n_{\widetilde{\mathcal{Y}}}} are n_{\widetilde{\mathcal{Y}}} vectors in \mathbb{R}^{K} and where \mu_{1},\ldots,\mu_{n_{\widetilde{\mathcal{Y}}}}\in\mathcal{P}. Notice that if the vectors \lambda_{i} are canonical vectors of \mathbb{R}^{K}, then \widetilde{\mathcal{Y}} can be written as in LABEL:eq:RBdualSpaceVector. In that sense, the approximation format LABEL:eq:RBdualSpaceMatrix is richer than LABEL:eq:RBdualSpaceVector and we can expect better performance. We also note that the greedy algorithm we propose here shares some similarities with the POD-greedy algorithm introduced in [10].
We now propose a second greedy algorithm inspired by LABEL:prop:dualErrorMultiplicative. Let \mathcal{P}^{\text{train}}\subset\mathcal{P} again be a finite training set and suppose that at step j of the greedy algorithm we have a reduced space \widetilde{\mathcal{Y}} as in LABEL:eq:RBdualSpaceMatrix at our disposal. The first step is to define the next evaluation point \mu_{j+1} as
\mu_{j+1}\in\underset{\mu\in\mathcal{P}^{\text{train}}}{\text{argmax}}\left(% \max\left\{\frac{\Delta(\mu)}{\widetilde{\Delta}(\mu)}\,,\,\frac{\widetilde{% \Delta}(\mu)}{\Delta(\mu)}\right\}\right),  (3.15) 
where we recall that \Delta(\mu)=(\frac{1}{K}\sum_{i=1}^{K}[Z_{i}^{T}(u(\mu)-\widetilde{u}(\mu))]^{2})^{1/2}. Finding \mu_{j+1} according to LABEL:eq:strongGreedy requires computing the solution u(\mu) over the training set \mu\in\mathcal{P}^{\text{train}}. If this is not affordable, we can replace u(\mu) by a reference solution u_{\mathrm{ref}}(\mu), which can for instance be a hierarchical approximation of u(\mu); see LABEL:sec:Numerics. Then, we introduce the reference error estimator
\Delta_{\mathrm{ref}}(\mu):=\left(\frac{1}{K}\sum_{i=1}^{K}\big{(}Z_{i}^{T}(u_{\mathrm{ref}}(\mu)-\widetilde{u}(\mu))\big{)}^{2}\right)^{1/2} 
and seek \mu_{j+1} as
\mu_{j+1}\in\underset{\mu\in\mathcal{P}^{\text{train}}}{\text{argmax}}\left(\max\left\{\frac{\Delta_{\mathrm{ref}}(\mu)}{\widetilde{\Delta}(\mu)}\,,\,\frac{\widetilde{\Delta}(\mu)}{\Delta_{\mathrm{ref}}(\mu)}\right\}\right).  (3.16) 
Once the parameter \mu_{j+1} is found, either with LABEL:eq:strongGreedy or with LABEL:eq:strongGreedy_bis, we compute the dual solutions Y_{1}(\mu_{j+1}),\ldots,Y_{K}(\mu_{j+1}) and assemble \mathbf{Y}(\mu_{j+1}). Here we need to solve K linear systems with the same operator A(\mu_{j+1})^{T} but with K different right-hand sides, see Equation LABEL:eq:randomDualProblem. This can be done efficiently, say by computing a Cholesky or LU decomposition once and reusing the factorization for the K problems.
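The "factorize once, back-substitute K times" step can be sketched as follows; the dense stand-in matrix is our assumption. `numpy.linalg.solve` factorizes its first argument once when given a matrix of right-hand sides, and `scipy.linalg.lu_factor`/`lu_solve` would make the reuse explicit across separate calls:

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 500, 10
A_mu = np.eye(N) + 0.01 * rng.standard_normal((N, N))  # stand-in for A(mu_{j+1})
Z = rng.standard_normal((N, K))                        # K random right-hand sides

# One factorization of A(mu)^T serves all K dual problems A(mu)^T Y_i = Z_i:
Y = np.linalg.solve(A_mu.T, Z)                         # Y[:, i] approximates Y_i(mu_{j+1})
assert np.allclose(A_mu.T @ Y, Z)
```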
The second step is to determine the vector \lambda_{j+1}. In order to maximize the improvement of the reduced space, we propose to define \lambda_{j+1} as follows:
\lambda_{j+1}\in\underset{\lambda\in\mathbb{R}^{K}}{\text{argmax }}\frac{\|\mathbf{Y}(\mu_{j+1})\lambda-\widetilde{\mathbf{Y}}(\mu_{j+1})\lambda\|_{2}}{\|\lambda\|_{2}},  (3.17) 
where \widetilde{\mathbf{Y}}(\mu_{j+1})=[\widetilde{Y}_{1}(\mu_{j+1}),\ldots,\widetilde{Y}_{K}(\mu_{j+1})]. The rationale behind LABEL:eq:strongGreedyLambda is to align \lambda_{j+1} with the direction in which the matrix \widetilde{\mathbf{Y}}(\mu_{j+1}) differs the most from \mathbf{Y}(\mu_{j+1}). One can easily show that \lambda_{j+1} defined by LABEL:eq:strongGreedyLambda is the eigenvector associated with the largest eigenvalue of the K-by-K matrix
M(\mu_{j+1})=\big{(}\mathbf{Y}(\mu_{j+1})-\widetilde{\mathbf{Y}}(\mu_{j+1})\big{)}^{T}\big{(}\mathbf{Y}(\mu_{j+1})-\widetilde{\mathbf{Y}}(\mu_{j+1})\big{)}.  (3.18) 
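In code, \lambda_{j+1} is the eigenvector of M(\mu_{j+1}) for its largest eigenvalue, equivalently the top right singular vector of \mathbf{Y}-\widetilde{\mathbf{Y}}; a sketch with a random stand-in for the difference matrix (our illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
N, K = 200, 8
D = rng.standard_normal((N, K))       # stand-in for Y(mu_{j+1}) - Ytilde(mu_{j+1})

M = D.T @ D                           # the K-by-K matrix M(mu_{j+1})
eigvals, eigvecs = np.linalg.eigh(M)  # eigh returns eigenvalues in ascending order
lam = eigvecs[:, -1]                  # eigenvector of the largest eigenvalue

# lam maximizes ||D @ l||_2 / ||l||_2, i.e. the objective of the argmax above:
rayleigh = lambda l: np.linalg.norm(D @ l) / np.linalg.norm(l)
assert all(rayleigh(lam) >= rayleigh(rng.standard_normal(K)) for _ in range(100))
```

The maximal value rayleigh(lam) is the largest singular value of D, so one could equally take the first right singular vector from an SVD of D.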
Once \lambda_{j+1} is computed, we set j\leftarrow j+1 and we update the reduced space \widetilde{\mathcal{Y}} using LABEL:eq:RBdualSpaceMatrix. We terminate the algorithm based on the following stopping criterion
q\text{}\mathrm{quantile}\left\{\max\left\{\frac{\Delta(\mu)}{\widetilde{% \Delta}(\mu)}\,,\,\frac{\widetilde{\Delta}(\mu)}{\Delta(\mu)}\right\}\,:\,\mu% \in\mathcal{P}^{\text{train}}\right\}\leq tol. 
The resulting greedy algorithm is summarized in LABEL:algo:qoi_greedy.
Remark 3.4 (Comparison with POD-greedy).
Note that in the POD-greedy algorithm [10] one would consider the orthogonal projection on the reduced space \widetilde{\mathcal{Y}} instead of the actual reduced solutions in LABEL:eq:strongGreedyLambda. However, for problems where the Galerkin projection deviates significantly from the orthogonal projection, we expect that using the reduced solution gives better results than the POD-greedy, as the latter does not take into account the error due to the Galerkin projection, which can be significant for instance close to resonances in a Helmholtz problem. We have performed numerical experiments for the same benchmark problem (parametrized Helmholtz equation) we consider in LABEL:sec:Numerics that confirm this conjecture.
3.3 Computational aspects of the fast-to-evaluate error estimator
At first glance the complexity for evaluating \mu\mapsto\widetilde{\Delta}(\mu) is dominated by the solution of the K reduced problems LABEL:eq:monolithic, meaning K times the solution of a (dense) linear system of equations of size n_{\widetilde{\mathcal{Y}}}. The next proposition, inspired by Lemma 2.7 in [31], shows that one can actually evaluate \mu\mapsto\widetilde{\Delta}(\mu) by solving only one linear system of size n_{\widetilde{\mathcal{Y}}}, which reduces the previous complexity by a factor of K; the proof is provided in LABEL:proof:monolithicAlternative. Note however that the complexity for evaluating \mu\mapsto\widetilde{\Delta}(\mu) is not completely independent of K. Indeed, as we employ the same reduced space for the approximation of the K dual problems, the dimension n_{\widetilde{\mathcal{Y}}} depends on K. The rate of increase of n_{\widetilde{\mathcal{Y}}} for growing K will be investigated in numerical experiments in LABEL:sec:Numerics.
Proposition 3.5.
The error indicator \widetilde{\Delta}(\mu) defined by (LABEL:eq:_def_a_post_est_online) can be written as
\widetilde{\Delta}(\mu)=\left(\frac{1}{K}\sum_{i=1}^{K}\big{(}Z_{i}^{T}\,% \widetilde{e}(\mu)\big{)}^{2}\right)^{1/2},  (3.19) 
where \widetilde{e}(\mu)\in\widetilde{\mathcal{Y}} is the solution to
\widetilde{e}(\mu)\in\widetilde{\mathcal{Y}}\,,\quad\langle A(\mu)\widetilde{e% }(\mu),v\rangle=\langle r(\mu),v\rangle\,,\quad\forall v\in\widetilde{\mathcal% {Y}}.  (3.20) 
Besides giving an alternative way of computing \widetilde{\Delta}(\mu), LABEL:prop:monolithicAlternative also gives new insight into the fast-to-evaluate error estimator. Reformulating Problem LABEL:eq:monolithicAlternative as
\widetilde{e}(\mu)\in\widetilde{\mathcal{Y}}\,,\quad\langle A(\mu)\big{(}% \widetilde{u}(\mu)+\widetilde{e}(\mu)\big{)},v\rangle=\langle f(\mu),v\rangle% \,,\quad\forall v\in\widetilde{\mathcal{Y}}, 
demonstrates that \widetilde{e}(\mu)\in\widetilde{\mathcal{Y}} may be interpreted as a correction of the primal approximation \widetilde{u}(\mu), so that \widetilde{u}(\mu)+\widetilde{e}(\mu) is an enriched approximation of the solution of the original problem LABEL:eq:AUB compared to \widetilde{u}(\mu). Since \widetilde{\mathcal{Y}} is not designed for improving the primal approximation \widetilde{u}(\mu), one cannot reasonably hope that the correction \widetilde{e}(\mu) significantly improves \widetilde{u}(\mu). However, the norm of \widetilde{e}(\mu), estimated by the fast-to-evaluate error estimator LABEL:eq:rewrite_err, gives relevant information about the error \|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma}.
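The equivalence stated in LABEL:prop:monolithicAlternative is easy to verify numerically. In the sketch below (our notation: the columns of V span \widetilde{\mathcal{Y}}, r is the residual, Z holds the random vectors), the K reduced dual solves and the single reduced solve for \widetilde{e}(\mu) give the same value of \widetilde{\Delta}(\mu):

```python
import numpy as np

rng = np.random.default_rng(4)
N, K, n_y = 300, 6, 12
A = np.eye(N) + 0.01 * rng.standard_normal((N, N))  # stand-in for A(mu)
r = rng.standard_normal(N)                          # residual f(mu) - A(mu) u_tilde(mu)
Z = rng.standard_normal((N, K))                     # random dual right-hand sides
V, _ = np.linalg.qr(rng.standard_normal((N, n_y)))  # basis of the dual space Ytilde

# Route 1: K reduced dual solves (Galerkin with A(mu)^T), then Ytilde_i^T r
C = np.linalg.solve(V.T @ A.T @ V, V.T @ Z)         # reduced dual coefficients
delta_K = np.sqrt(np.mean((r @ V @ C) ** 2))

# Route 2 (Proposition 3.5): one reduced solve for the correction e_tilde
d = np.linalg.solve(V.T @ A @ V, V.T @ r)
e_tilde = V @ d
delta_1 = np.sqrt(np.mean((Z.T @ e_tilde) ** 2))

assert np.isclose(delta_K, delta_1)                 # same estimator, a single solve
```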
Remark 3.6.
Assume A(\mu)=\sum_{q=1}^{Q_{A}}\alpha_{q}(\mu)A_{q} and f(\mu)=\sum_{q=1}^{Q_{f}}\zeta_{q}(\mu)f_{q} with A_{q},f_{q} parameter-independent, and consider \widetilde{u}(\mu) as the Galerkin projection onto some primal reduced order space \widetilde{\mathcal{X}} of dimension n_{\widetilde{\mathcal{X}}}. Since all inner products involving high-dimensional quantities can be preassembled, the marginal computational complexity of \widetilde{\Delta}(\mu) is \mathcal{O}(Q_{A}n_{\widetilde{\mathcal{Y}}}^{2}+Q_{f}n_{\widetilde{\mathcal{Y}}}+Q_{A}n_{\widetilde{\mathcal{Y}}}n_{\widetilde{\mathcal{X}}}) for assembling LABEL:eq:monolithicAlternative, \mathcal{O}(n_{\widetilde{\mathcal{Y}}}^{3}) for solving LABEL:eq:monolithicAlternative and \mathcal{O}(Kn_{\widetilde{\mathcal{Y}}}) for calculating LABEL:eq:rewrite_err. For moderate Q_{A} the marginal computational complexity of \widetilde{\Delta}(\mu) is thus dominated by \mathcal{O}(n_{\widetilde{\mathcal{Y}}}^{3}), i.e. the cost of solving LABEL:eq:monolithicAlternative.
Remark 3.7 (Comparison with hierarchical type error estimators [12]).
An alternative strategy for estimating the error is to measure the distance between the approximation \widetilde{u}(\mu)\in\widetilde{\mathcal{X}} and a reference solution u_{\mathrm{ref}}(\mu), which is an improved approximation of u(\mu) compared to \widetilde{u}(\mu). When using projection-based model order reduction, u_{\mathrm{ref}}(\mu) can be defined as a Galerkin projection onto an enriched reduced space of the form \widetilde{\mathcal{X}}+\widetilde{\mathcal{Y}}, as proposed in [12]. Unlike in our approach, the space \widetilde{\mathcal{Y}} then ought to be adapted to capturing the error u(\mu)-\widetilde{u}(\mu). The complexity for evaluating such a hierarchical error estimator is dominated by the solution of a dense system of equations of size \text{dim}(\widetilde{\mathcal{X}}+\widetilde{\mathcal{Y}}). In contrast, our approach requires the solution of a system of equations whose size is independent of the dimension of the primal reduced space \widetilde{\mathcal{X}}, see LABEL:rmk:computationComplexityDelta above.
4 Numerical experiments
We numerically demonstrate various theoretical aspects of the proposed error estimator. Our benchmark is a parameterized Helmholtz equation for which a reduced order solution is obtained by the RB method. Estimating the error in this reduced order model is challenging because, around the resonances, we lose the coercivity of the operator which makes a posteriori error estimation quite difficult with standard methods.
Let us mention here that all the training sets \mathcal{P}^{\text{train}} (or \mathcal{P}^{\text{train}}_{K}) and all the online sets \mathcal{S} are comprised of parameter values selected independently and uniformly at random in \mathcal{P} (or in \mathcal{P}_{K}). Those (random) sets are redrawn for each new simulation, unless mentioned otherwise.
4.1 Benchmark: Multiparametric Helmholtz equation
Consider the parameterized Helmholtz equation
\displaystyle-\partial_{x_{1}x_{1}}\mathfrak{u}-\mu_{1}\partial_{x_{2}x_{2}}\mathfrak{u}-\mu_{2}\,\mathfrak{u}  \displaystyle=\mathfrak{f}  \displaystyle\text{ in }D:=(0,1)\times(0,1),  
\displaystyle\mathfrak{u}  \displaystyle=0  \displaystyle\text{ on }(0,1)\times\{0\},  
(4.1)  
\displaystyle\partial_{x_{2}}\mathfrak{u}  \displaystyle=\cos(\pi x_{1})  \displaystyle\text{ on }(0,1)\times\{1\},  
\displaystyle\partial_{x_{1}}\mathfrak{u}  \displaystyle=0  \displaystyle\text{ on }\{0,1\}\times(0,1). 
The solution \mathfrak{u}=\mathfrak{u}(\mu) is parameterized by \mu=(\mu_{1},\mu_{2})\in\mathcal{P}:=[0.2,1.2]\times[10,50], where \mu_{1} accounts for anisotropy and \mu_{2} is the wavenumber squared. The source term \mathfrak{f} is defined by \mathfrak{f}(x_{1},x_{2})=\mathfrak{f}_{1}(x_{1})\mathfrak{f}_{2}(x_{2}) for any (x_{1},x_{2})\in D, where
\displaystyle\mathfrak{f}_{1}(x_{1}):=\begin{cases}5&\text{ if }\enspace 0\leq x_{1}\leq 0.1,\\ 5&\text{ if }\enspace 0.2\leq x_{1}\leq 0.3,\\ 10&\text{ if }\enspace 0.45\leq x_{1}\leq 0.55,\\ 5&\text{ if }\enspace 0.7\leq x_{1}\leq 0.8,\\ 5&\text{ if }\enspace 0.9\leq x_{1}\leq 1,\\ 0&\text{ else},\end{cases}\quad\text{ and }\mathfrak{f}_{2}(x_{2}):=\begin{cases}1&\text{ if }\enspace 0.5\leq x_{2}\leq 1,\\ 0&\text{ else}.\end{cases} 
A similar test case with a smaller parameter set has been considered in [16]. The resonances can be determined analytically and are depicted by the black lines in LABEL:fig:ParamDomain. Because of the multiparameter setting, we have resonance surfaces, which are more difficult to deal with than a union of isolated resonance frequencies in the single-parameter setting; see [16]. Moreover, we observe that in the region [0.2,0.4]\times[30,50]\subset\mathcal{P} there are quite a few resonance surfaces that are relatively close together, making this an even more challenging situation both for the construction of suitable reduced models and even more so for a posteriori error estimation.
We employ the Finite Element (FE) method to discretize the weak solution of LABEL:eq:model_problem. To that end, we define a FE space X^{h}\subset X:=\{\mathfrak{v}\in H^{1}(D)\,:\,\mathfrak{v}(x_{1},0)=0\} by means of a regular mesh with square elements of edge length h=0.01 and FE basis functions that are piecewise linear in the x_{1} and x_{2} directions, resulting in a FE space of dimension N=\dim(X^{h})=10100. The FE approximation \mathfrak{u}^{h}(\mu) is defined as the Galerkin projection of \mathfrak{u}(\mu) on X^{h}, and we denote by u(\mu)\in\mathbb{R}^{N} the vector containing the coefficients of \mathfrak{u}^{h}(\mu) when expressing it in the FE basis. Moreover, we denote by R_{X}\in\mathbb{R}^{N\times N} the discrete Riesz map associated with the H^{1}-norm, which is such that u(\mu)^{T}R_{X}u(\mu)=\|\mathfrak{u}^{h}(\mu)\|_{H^{1}(D)}^{2} for any \mu\in\mathcal{P}. By default the covariance matrix \Sigma is always chosen to be \Sigma=R_{X}, unless mentioned otherwise.
We may also consider a QoI defined as the trace of the FE solution on the boundary \Gamma=\{0\}\times(0,1)\subset\partial D, meaning \mathfrak{u}^{h}_{\Gamma}(\mu). We denote by s(\mu)\in\mathbb{R}^{100} the vector containing those entries of u(\mu)\in\mathbb{R}^{N} that are associated with the grid points on \Gamma. Then, we can write s(\mu)=Lu(\mu) where L\in\mathbb{R}^{100\times N} is an extraction matrix. To measure the error associated with the QoI, we use the norm \|\cdot\|_{W} defined as the discretization of the L^{2}(\Gamma) norm, which is such that \|s(\mu)\|_{W}=\|\mathfrak{u}_{\Gamma}^{h}(\mu)\|_{L^{2}(\Gamma)} for any \mu\in\mathcal{P}.
The primal RB approximation \widetilde{u}(\mu) is defined as the Galerkin projection of u(\mu) onto the space of snapshots, meaning \widetilde{u}(\mu)\in\widetilde{\mathcal{X}}:=\text{span}\{u(\mu_{1}),u(\mu_{2% }),\ldots\}, where the parameters \mu_{1},\mu_{2},\ldots are selected in a greedy way based on the dual norm of the residual associated with LABEL:eq:model_problem. Each time we run LABEL:algo:qoi_greedy, we use a reference solution u_{\mathrm{ref}}(\mu) defined as an RB approximation of u(\mu) using n_{\widetilde{\mathcal{X}}}+10 basis functions, where n_{\widetilde{\mathcal{X}}}:=\dim(\widetilde{\mathcal{X}}). Note that this reference solution appears only in the offline stage.
4.2 Randomized a posteriori error estimation with exact dual
We demonstrate here the statistical properties of the error estimator \Delta(\mu) defined by LABEL:eq:DualTrick. LABEL:fig:pdfXi2_Vs_DeltaExactDual shows histograms of the effectivity indices \{\Delta(\mu)/\|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma}\,,\,\mu\in\mathcal{S}\} for five different realizations of the vectors Z_{1},\ldots,Z_{K}. Here, the same online set \mathcal{S} with \#\mathcal{S}=10^{4} is used. We observe that for each of the five realizations, the effectivity indices \Delta(\mu)/\|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma} lie in the interval [1/w,w] for any \mu\in\mathcal{S}, as predicted by LABEL:coro:truth_est_S. This theoretical bound looks however pessimistic, as the effectivities for K=5 (resp. K=10) lie in the interval [1/w,w] that corresponds to K=10 (resp. K=20). This might be due to the rather crude union bound argument.
The solid lines on LABEL:fig:pdfXi2_Vs_DeltaExactDual represent the probability density function (pdf) of \sqrt{Q/K} where Q\sim\chi^{2}(K). This is the pdf of \Delta(\mu)/\|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma} for any fixed \mu. Even though the histograms depicted on LABEL:fig:pdfXi2_Vs_DeltaExactDual are not representing that pdf (instead they represent the distribution of the effectivity index among the set \mathcal{S}), we observe good agreement with the black line. In particular we observe a concentration phenomenon of the histograms around 1 when K increases.
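This concentration can be reproduced in a few lines (our illustration): since the Z_{i}^{T}(u-\widetilde{u})/\|u-\widetilde{u}\|_{\Sigma} are i.i.d. standard normal, the effectivity has the law of \sqrt{Q/K} with Q\sim\chi^{2}(K), and its spread shrinks roughly like 1/\sqrt{2K}:

```python
import numpy as np

rng = np.random.default_rng(5)
spreads = []
for K in (5, 20, 80):
    # 10^4 realizations of K i.i.d. standard normals Z_i^T e / ||e||_Sigma:
    G = rng.standard_normal((10**4, K))
    eff = np.sqrt(np.mean(G**2, axis=1))   # effectivity Delta / ||u - u_tilde||_Sigma
    spreads.append(eff.std())
    print(K, round(float(eff.mean()), 3), round(float(eff.std()), 3))
```

The printed means stay close to 1 while the standard deviations decrease as K grows, mirroring the histograms in LABEL:fig:pdfXi2_Vs_DeltaExactDual.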
4.3 Approximation of the dual problems
4.3.1 Construction of the dual space
In LABEL:fig:Approx_Dual_RX we compare the maximum, the minimum, the 95% quantile and the 99% quantile of \{\widetilde{\Delta}(\mu)/\|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma}:\mu\in\mathcal{S}\}, where the dual reduced space is constructed either by LABEL:algo:greedy (with \|\cdot\|_{*}=\|\cdot\|_{R_{X}^{-1}}), by LABEL:algo:qoi_greedy or by a POD. We observe that LABEL:algo:qoi_greedy needs many fewer dual basis functions than LABEL:algo:greedy and the POD. In detail, we see for instance in LABEL:subfig:k5_alt, LABEL:subfig:k5_stand and LABEL:subfig:k5_pod that for K=5 employing LABEL:algo:qoi_greedy requires about n_{\widetilde{\mathcal{Y}}}=20 dual basis functions to have 99% of the samples in the interval [1/3,3], while when using LABEL:algo:greedy or the POD we need about 35 or 30 basis functions, respectively. We emphasize that for K=20 the difference is even larger. Moreover, when considering the QoI (last row of LABEL:fig:Approx_Dual_RX), the difference between LABEL:algo:qoi_greedy and LABEL:algo:greedy and the POD is less pronounced but still considerable. This significant disparity can be explained by the fact that while both the POD and LABEL:algo:greedy try to approximate the K dual solutions Y_{1}(\mu),\ldots,Y_{K}(\mu), LABEL:algo:qoi_greedy is driven by the approximation of the error estimator \Delta(\mu) and thus of a scalar quantity; compare the selection criteria LABEL:eq:weakGreedy_training and LABEL:eq:strongGreedy. This also explains why the discrepancy increases significantly for growing K: while POD and LABEL:algo:greedy have to approximate a more complex object (the K dual solutions), we only obtain an additional summand in \widetilde{\Delta}(\mu) for each additional random right-hand side.
Let us also highlight the significant difference between the maximum value and the 99\% quantile over the parameter set and the somewhat erratic behavior of the maximum, which both seem to be due to the resonance surfaces. As indicated above this motivates considering for instance the 99\% quantile as a stopping criterion in both LABEL:algo:greedy and LABEL:algo:qoi_greedy.
4.3.2 Dimension of the dual space
LABEL:tab:stand_greedy shows statistics of the dimension of the dual reduced space \widetilde{\mathcal{Y}} obtained by LABEL:algo:greedy with different stopping criteria. We observe that, except for tol=50 and moderate K, the dimension of the dual space is in general quite large. Notice however that the use of LABEL:cor:dualErrorAdditive requires tol\approx\varepsilon\leq 1/w\leq 1, which excludes tol=5 and tol=50. LABEL:fig:ConvGreedyDual_standard shows the evolution of the stopping criteria during the first 80 iterations of LABEL:algo:greedy. We observe a significant impact of K on the convergence profiles: with K=20 the curves do not attain the tolerance tol=0.5, which explains the results we observed in LABEL:tab:stand_greedy.
In comparison, LABEL:algo:qoi_greedy yields much smaller dual reduced spaces; compare LABEL:tab:stand_greedy and LABEL:table:algo_2. We see in LABEL:table:algo_2 that, except for tol=1.5, the dimension of the dual RB space \widetilde{\mathcal{Y}} is smaller than the dimension of the primal RB space \widetilde{\mathcal{X}} when using the 95\%, 97.5\%, or 99\% quantile for the stopping criterion. Moreover, for instance for tol=3, we see that for n_{\widetilde{\mathcal{X}}}=20,30 we can use (significantly) fewer dual than primal basis functions. However, we also see that tight tolerances will in general lead to dual reduced spaces that have a larger dimension than the primal RB space. As larger tolerances tol\geq 5 may lead to a significant underestimation of the error (see LABEL:fig:varying_alpha_w), tolerances between 1 and 4 seem preferable. LABEL:fig:ConvGreedyDual shows the evolution of the stopping criteria during the first 80 iterations of LABEL:algo:qoi_greedy. Note that for higher tolerances it may happen for a realization that LABEL:algo:qoi_greedy terminates in a valley between two peaks.
Furthermore, we observe in LABEL:table:algo_2 a very large standard deviation of about 10 if we consider the maximum over the offline training set, while for the 95\%, 97.5\%, and 99\% quantiles we often have a standard deviation of about 2. Additionally, the dimension of the dual reduced spaces for the maximum is much larger than for the considered quantiles, whereas among the considered quantiles we observe only very moderate changes. Again, it seems that this behavior is due to the resonance surfaces. Moreover, as we obtain a very satisfactory effectivity of \widetilde{\Delta}(\mu) when we use for instance the 99\% quantile (see LABEL:subsub:numerics_online_perf), we conclude that using quantiles between 97.5\% and 99\% as a stopping criterion in LABEL:algo:qoi_greedy seems advisable.
Finally, we observe both a very moderate dependency of the dimension of the dual reduced space constructed by LABEL:algo:qoi_greedy on K and a rather mild dependency on n_{\widetilde{\mathcal{X}}}. Therefore, we conjecture that the proposed error estimator might also be applied to rather complex problems.
4.3.3 Performance of \widetilde{\Delta}(\mu) on an online parameter set
In LABEL:fig:histograms_online we plot the histograms of \{\widetilde{\Delta}(\mu)/\|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma}:\mu\in\mathcal{S}\} for 5 realizations of the random vectors Z_{1},\ldots,Z_{K}, where the dual reduced space \widetilde{\mathcal{Y}} is built via LABEL:algo:qoi_greedy. We observe a similar behavior as for the error estimator \Delta(\mu) with the exact dual, see LABEL:fig:pdfXi2_Vs_DeltaExactDual. In particular, for all \mu\in\mathcal{S} the effectivity index \widetilde{\Delta}(\mu)/\|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma} lies between (\alpha w)^{-1} and (\alpha w), see LABEL:prop:dualErrorMultiplicative, where \alpha is estimated by tol=2. Finally, we highlight that LABEL:fig:histograms_online demonstrates that with near certainty we obtain effectivities near unity with a dual space dimension of the same order as (or smaller than) the primal space dimension. Hence the costs for the a posteriori error estimator are about the same as those for constructing the primal approximation.
In order to understand the average performance of the online-efficient error estimator, we plot in LABEL:fig:varying_alpha_w the histograms of the concatenation of 100 realizations of the effectivity indices \{\widetilde{\Delta}(\mu)/\|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma}:\mu\in\mathcal{S}\}. Here, for each new realization, we redraw the K vectors Z_{1},\ldots,Z_{K} and the training set \mathcal{P}^{\text{train}}, then run LABEL:algo:qoi_greedy to construct the dual reduced space \widetilde{\mathcal{Y}}, and finally redraw the online set \mathcal{S}. We observe that for a larger tolerance tol the histograms are shifted to the left, an effect that seems to be slightly stronger for larger K (corresponding to smaller w). This is due to the fact that LABEL:algo:qoi_greedy is stopped earlier and the dimension of \widetilde{\mathcal{Y}} is then not sufficiently large to approximate the error estimator \Delta(\mu) well. Nevertheless, we observe that the effectivity indices \widetilde{\Delta}(\mu)/\|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma} always lie in the interval [(\alpha w)^{-1},(\alpha w)], where \alpha\approx tol, as expected thanks to LABEL:prop:dualErrorMultiplicative. This shows that, even with a rather crude approximation of the dual solutions, it is safe to use the fast-to-evaluate error estimator \widetilde{\Delta}(\mu), as the grey area takes into account the approximation error in the error estimator.
To guarantee that the effectivity indices lie in a user-defined interval of the form [c^{-1},c], it is sufficient to choose \alpha and w such that \alpha w=c, see LABEL:prop:dualErrorMultiplicative. Consequently, there is a degree of freedom in the choice of \alpha and w, that is, in the choice of tol\approx\alpha and K=K(w) via relation LABEL:coro:truth_est_Condition_S. To avoid too large a shift of the histogram to the left, as observed for instance for w=2.1 and tol=3, it seems advisable to choose \alpha at least as small as w. Additionally, the plots corresponding to w=3.2,2.1 and \alpha=2 highlight the importance of choosing \alpha small enough compared to w if one is interested in rather tight estimates. However, decreasing \alpha has, as anticipated, a much stronger effect on the dimension of the dual reduced space (see LABEL:fig:varying_alpha_w). For the considered test case, choosing \alpha/w\in(1/3,1) therefore seems to be a good compromise between computational costs and effectivity of the error estimator. For instance, w=6.5 with \alpha=3 or \alpha=2 already yields very good results in this direction.
5 Conclusions
In this paper we introduced a randomized a posteriori error estimator for low-rank approximations, which is constant-free and is both reliable and efficient at given high probability. The upper and lower bounds of the effectivity are chosen by the user. To derive the error estimator we exploit the concentration phenomenon of Gaussian maps. Exploiting the error-residual relationship and approximating the associated random dual problems via projection-based model order reduction yields a fast-to-evaluate a posteriori error estimator. We highlight that, while proving the concentration inequalities required some effort, regarding the parametrized problem we only relied on its well-posedness and the definition of the adjoint operator. The presented framework may therefore extend quite easily to, say, nonlinear problems.
To construct the dual reduced space we employed a greedy algorithm guided by a quantity of interest that assesses the quality of the fast-to-evaluate error estimator. The numerical experiments for a multi-parametric Helmholtz problem show that we obtain much smaller dual reduced spaces than with a standard greedy algorithm driven by the dual norm of the residual or with the POD. Moreover, the numerical experiments demonstrate that for moderate upper bounds on the effectivity of about 20, the dimension of the dual reduced space only needs to be slightly more than half the dimension of the primal reduced space. If a very tight effectivity bound of about 2 or 3 is desired, the dual reduced spaces have to be about twice as large as the primal approximation spaces. We emphasize, however, that even for larger effectivity bounds the effectivity is still very often close to one, thanks to the concentration of measure. Furthermore, we observed only a very moderate dependence of the dimension of the dual reduced space on the number of random vectors K, which controls the variance of the estimator, and a very mild dependence on the dimension of the (primal) reduced space. This might indicate that the error estimator will also perform well for challenging problems. Finally, we showed that computing the fast-to-evaluate a posteriori error estimator requires solving one dense linear system of equations whose size equals the dimension of the dual reduced space.
Thanks to these properties, the proposed a posteriori error estimator features a very favorable computational complexity: its computational costs are often about the same as those for the low-rank approximation, or even smaller for moderate effectivity bounds. The presented error estimator can thus be computationally more advantageous than error estimators based on the dual norm of the residual together with a (costly to estimate) stability constant, or hierarchical-type error estimators.
A Proofs
A.1 Proof of LABEL:prop:Chi2Tail
First we give a bound for \mathbb{P}\{Q\leq Kw^{-2}\}. This quantity corresponds to the cumulative distribution function of the \chi^{2}(K) distribution evaluated at Kw^{-2}. We have \mathbb{P}\{Q\leq Kw^{-2}\}=\frac{1}{\Gamma(K/2)}\gamma\big(\frac{K}{2},\frac{K}{2w^{2}}\big), where \Gamma(\cdot) is the gamma function \Gamma(a)=\int_{0}^{\infty}t^{a-1}e^{-t}\,\mathrm{d}t and \gamma(\cdot,\cdot) the lower incomplete gamma function defined by \gamma(a,x)=\int_{0}^{x}t^{a-1}e^{-t}\,\mathrm{d}t. Following the lines of [18], we can write \gamma(a,x)\leq\int_{0}^{x}t^{a-1}\,\mathrm{d}t=\frac{1}{a}x^{a} and
\Gamma(a)=\int_{0}^{a}t^{a-1}e^{-t}\,\mathrm{d}t+\int_{a}^{\infty}t^{a-1}e^{-t}\,\mathrm{d}t\geq e^{-a}\int_{0}^{a}t^{a-1}\,\mathrm{d}t+a^{a-1}\int_{a}^{\infty}e^{-t}\,\mathrm{d}t=2a^{a-1}e^{-a}, 
whenever a\geq 1. Then, if K\geq 2 we have
\mathbb{P}\{Q\leq Kw^{-2}\}=\frac{1}{\Gamma(K/2)}\,\gamma\Big(\frac{K}{2},\frac{K}{2w^{2}}\Big)\leq\frac{(K/2)e^{K/2}}{2(K/2)^{K/2}}\cdot\frac{(K/(2w^{2}))^{K/2}}{K/2}=\frac{1}{2}\Big(\frac{\sqrt{e}}{w}\Big)^{K}.  (A.1) 
Now we give a bound for \mathbb{P}\{Q\geq Kw^{2}\}. Using a Markov inequality, for any 0\leq t<1/2 we can write
\mathbb{P}\{Q\geq Kw^{2}\}=\mathbb{P}\{e^{tQ}\geq e^{tKw^{2}}\}\leq\frac{\mathbb{E}(e^{tQ})}{e^{tKw^{2}}}=\frac{(1-2t)^{-K/2}}{e^{tKw^{2}}}, 
where for the last equality we used the expression of the moment-generating function of \chi^{2}(K). The minimum of the above quantity is attained at t=(w^{2}-1)/(2w^{2}), so we can write
\mathbb{P}\{Q\geq Kw^{2}\}\leq(w^{2}e^{1-w^{2}})^{K/2}=\Big(\frac{\sqrt{e}}{w}\Big)^{K}(w^{2}e^{-w^{2}/2})^{K}\leq\Big(\frac{\sqrt{e}}{w}\Big)^{K}\frac{2^{K}}{e^{K}}\leq\frac{1}{2}\Big(\frac{\sqrt{e}}{w}\Big)^{K}, 
for any K\geq 3. Together with LABEL:eq:tmp6392, the previous inequalities allow writing
\mathbb{P}\big\{\overline{Kw^{-2}\leq Q\leq Kw^{2}}\big\}=\mathbb{P}\{Q\leq Kw^{-2}\}+\mathbb{P}\{Q\geq Kw^{2}\}\leq\Big(\frac{\sqrt{e}}{w}\Big)^{K}, 
for any K\geq 3, which concludes the proof.
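Since the combined tail bound is fully explicit, it can be checked numerically. The following Python sketch (a hypothetical illustration, not part of the paper's experiments) estimates the failure probability \mathbb{P}\{\overline{Kw^{-2}\leq Q\leq Kw^{2}}\} by Monte Carlo for the illustrative values K=10 and w=2, and compares it with the bound (\sqrt{e}/w)^{K}, which requires w>\sqrt{e} and K\geq 3:

```python
import numpy as np

rng = np.random.default_rng(0)
K, w = 10, 2.0            # w > sqrt(e) and K >= 3, as required by the bound
M = 200_000               # Monte Carlo sample size (illustrative choice)

# Q ~ chi^2(K): sum of K squared independent standard normal variables
Q = (rng.standard_normal((M, K)) ** 2).sum(axis=1)

# empirical probability of the complement of {K w^{-2} <= Q <= K w^2}
fail = np.mean((Q < K / w**2) | (Q > K * w**2))

bound = (np.sqrt(np.e) / w) ** K   # theoretical bound (sqrt(e)/w)^K
print(fail, bound)                 # the empirical failure rate stays below the bound
```

As the bound is not tight, the empirical failure rate is typically an order of magnitude below (\sqrt{e}/w)^{K} for these values.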
A.2 Proof of LABEL:prop:estimate_many_vectors
A union bound allows writing
\displaystyle\mathbb{P}\Big\{w^{-1}\|v\|_{\Sigma}\leq\|\Phi v\|_{2}\leq w\|v\|_{\Sigma}\ \ \forall v\in\mathcal{M}\Big\}  
\displaystyle\quad\geq 1-\sum_{v\in\mathcal{M}}\mathbb{P}\Big\{\overline{w^{-1}\|v\|_{\Sigma}\leq\|\Phi v\|_{2}\leq w\|v\|_{\Sigma}}\Big\}  
\displaystyle\quad=1-(\#\mathcal{M})\,\mathbb{P}\big\{\overline{Kw^{-2}\leq Q\leq Kw^{2}}\big\}\geq 1-(\#\mathcal{M})\Big(\frac{\sqrt{e}}{w}\Big)^{K}, 
where, for the last inequality, we used Proposition LABEL:prop:Chi2Tail (assuming w>\sqrt{e} and K\geq 3). Given 0<\delta<1, the condition K\geq\frac{\log(\#\mathcal{M})+\log(\delta^{-1})}{\log(w/\sqrt{e})} is equivalent to 1-(\#\mathcal{M})(\frac{\sqrt{e}}{w})^{K}\geq 1-\delta and thus ensures that (LABEL:eq:controlError) holds for all v\in\mathcal{M} with probability larger than 1-\delta.
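As a concrete illustration of this sample-size condition, the hypothetical Python snippet below computes the smallest admissible K for example values of \#\mathcal{M}, \delta, and w (all chosen for illustration only):

```python
import math

n_M = 1000        # #M: number of vectors to certify simultaneously (illustrative)
delta = 1e-6      # target failure probability
w = 4.0           # effectivity bound; must satisfy w > sqrt(e)

# K >= (log(#M) + log(1/delta)) / log(w / sqrt(e))
K_min = math.ceil((math.log(n_M) + math.log(1 / delta))
                  / math.log(w / math.sqrt(math.e)))
print(K_min)      # smallest K guaranteeing probability >= 1 - delta
```

Note the logarithmic dependence on both \#\mathcal{M} and \delta^{-1}: even certifying a thousand vectors with failure probability 10^{-6} requires only a modest number of random vectors.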
A.3 Proof of LABEL:prop:dualError
Let \Psi(\mu)=K^{-1/2}[Y_{1}(\mu),\ldots,Y_{K}(\mu)]^{T} and \widetilde{\Psi}(\mu)=K^{-1/2}[\widetilde{Y}_{1}(\mu),\ldots,\widetilde{Y}_{K}(\mu)]^{T} so that, from Equations LABEL:eq:DualTrick and LABEL:eq:_def_a_post_est_online, we can write \Delta(\mu)=\|\Psi(\mu)r(\mu)\|_{2} and \widetilde{\Delta}(\mu)=\|\widetilde{\Psi}(\mu)r(\mu)\|_{2}. The reverse triangle inequality yields
|\Delta(\mu)-\widetilde{\Delta}(\mu)|=\big|\|\Psi(\mu)r(\mu)\|_{2}-\|\widetilde{\Psi}(\mu)r(\mu)\|_{2}\big|\leq\|\Psi(\mu)r(\mu)-\widetilde{\Psi}(\mu)r(\mu)\|_{2}. 
Dividing by \|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma} we can write
\displaystyle\frac{|\Delta(\mu)-\widetilde{\Delta}(\mu)|}{\|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma}}\leq\frac{\|(\Psi(\mu)-\widetilde{\Psi}(\mu))r(\mu)\|_{2}}{\|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma}}=\frac{\|(\widetilde{\Psi}(\mu)-\Psi(\mu))A(\mu)(u(\mu)-\widetilde{u}(\mu))\|_{2}}{\|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma}}  
\displaystyle\quad\leq\sup_{v\in\mathbb{R}^{N}\backslash\{0\}}\frac{\|(\widetilde{\Psi}(\mu)-\Psi(\mu))A(\mu)v\|_{2}}{\|v\|_{\Sigma}}=\sup_{\|v\|_{\Sigma}=1}\sqrt{\frac{1}{K}\sum_{i=1}^{K}\big((A(\mu)^{T}\widetilde{Y}_{i}(\mu)-Z_{i})^{T}v\big)^{2}}  
\displaystyle\quad\leq\sup_{\|v\|_{\Sigma}=1}\max_{1\leq i\leq K}\big|(A(\mu)^{T}\widetilde{Y}_{i}(\mu)-Z_{i})^{T}v\big|=\max_{1\leq i\leq K}\|A(\mu)^{T}\widetilde{Y}_{i}(\mu)-Z_{i}\|_{\Sigma^{-1}}, 
which yields LABEL:eq:dualError and concludes the proof.
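The inequality just proved can be verified numerically. The sketch below (hypothetical Python, with \Sigma=I so that \|\cdot\|_{\Sigma^{-1}} reduces to the Euclidean norm, and with an arbitrary orthogonal projection playing the role of the approximate duals \widetilde{Y}_{i}) checks that |\Delta(\mu)-\widetilde{\Delta}(\mu)| is bounded by \max_{i}\|A(\mu)^{T}\widetilde{Y}_{i}(\mu)-Z_{i}\|_{\Sigma^{-1}}\,\|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma}:

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, n_dual = 40, 8, 5                       # illustrative sizes
A = np.eye(N) + 0.2 * rng.standard_normal((N, N))   # stand-in for A(mu)
e = rng.standard_normal(N)                    # primal error u(mu) - u_tilde(mu)
r = A @ e                                     # residual r(mu) = A(mu) e

Z = rng.standard_normal((K, N))               # Z_i ~ N(0, I), i.e., Sigma = I
Y = np.linalg.solve(A.T, Z.T).T               # exact duals: A^T Y_i = Z_i

# crude dual approximation: orthogonal projection onto a random n_dual-dim space
V, _ = np.linalg.qr(rng.standard_normal((N, n_dual)))
Y_t = (Y @ V) @ V.T

Delta = np.sqrt(np.mean((Y @ r) ** 2))        # estimator with exact duals
Delta_t = np.sqrt(np.mean((Y_t @ r) ** 2))    # fast-to-evaluate estimator

# right-hand side of the bound (Sigma = I, so ||.||_{Sigma^{-1}} = ||.||_2)
eps = np.max(np.linalg.norm(A.T @ Y_t.T - Z.T, axis=0))
lhs, rhs = abs(Delta - Delta_t), eps * np.linalg.norm(e)
print(lhs <= rhs)   # the inequality holds for any choice of Y_t
```

Since the bound holds deterministically for any approximate duals, the check succeeds regardless of the projection space chosen.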
A.4 Proof of LABEL:cor:dualErrorAdditive
By LABEL:prop:dualError we have |\Delta(\mu)-\widetilde{\Delta}(\mu)|\leq\varepsilon\|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma}, which is equivalent to
\Delta(\mu)-\varepsilon\|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma}\leq\widetilde{\Delta}(\mu)\leq\Delta(\mu)+\varepsilon\|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma}. 
By LABEL:coro:truth_est_S, it holds with probability larger than 1-\delta that w^{-1}\Delta(\mu)\leq\|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma}\leq w\Delta(\mu) for all \mu\in\mathcal{S}. Then with the same probability we have
(w^{-1}-\varepsilon)\|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma}\leq\widetilde{\Delta}(\mu)\leq(w+\varepsilon)\|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma}, 
for all \mu\in\mathcal{S}, which yields LABEL:eq:dualErrorAdditive and concludes the proof.
A.5 Proof of LABEL:prop:dualErrorMultiplicative
By LABEL:coro:truth_est_S, it holds with probability larger than 1-\delta that w^{-1}\Delta(\mu)\leq\|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma}\leq w\Delta(\mu) for all \mu\in\mathcal{S}. Then, using the definition of \alpha in LABEL:eq:alpha, with the same probability we have
\|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma}\leq w\Delta(\mu)\leq w\left(\sup_{\mu^{\prime}\in\mathcal{S}}\frac{\Delta(\mu^{\prime})}{\widetilde{\Delta}(\mu^{\prime})}\right)\widetilde{\Delta}(\mu)\leq(\alpha w)\widetilde{\Delta}(\mu), 
and
\|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma}\geq w^{-1}\Delta(\mu)\geq w^{-1}\left(\inf_{\mu^{\prime}\in\mathcal{S}}\frac{\Delta(\mu^{\prime})}{\widetilde{\Delta}(\mu^{\prime})}\right)\widetilde{\Delta}(\mu)\geq(\alpha w)^{-1}\widetilde{\Delta}(\mu), 
for any \mu\in\mathcal{S}, which yields LABEL:eq:dualErrorMultiplicative and concludes the proof.
A.6 Proof of LABEL:prop:monolithicAlternative
By construction, both \widetilde{Y}_{i}(\mu) and \widetilde{e}(\mu) belong to \widetilde{\mathcal{Y}}. Then for all i=1,\ldots,K we can write
\displaystyle\widetilde{Y}_{i}(\mu)^{T}r(\mu)=\langle r(\mu),\widetilde{Y}_{i}(\mu)\rangle\overset{\text{LABEL:eq:monolithicAlternative}}{=}\langle A(\mu)\widetilde{e}(\mu),\widetilde{Y}_{i}(\mu)\rangle  
\displaystyle\quad=\langle\widetilde{e}(\mu),A(\mu)^{T}\widetilde{Y}_{i}(\mu)\rangle\overset{\text{LABEL:eq:monolithic}}{=}\langle\widetilde{e}(\mu),Z_{i}\rangle=Z_{i}^{T}\widetilde{e}(\mu). 
Then, by definition LABEL:eq:_def_a_post_est_online we can write
\widetilde{\Delta}(\mu)=\left(\frac{1}{K}\sum_{i=1}^{K}\big(\widetilde{Y}_{i}(\mu)^{T}r(\mu)\big)^{2}\right)^{1/2}=\left(\frac{1}{K}\sum_{i=1}^{K}\big(Z_{i}^{T}\widetilde{e}(\mu)\big)^{2}\right)^{1/2}, 
which gives the result.
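To make the estimator concrete, the following hypothetical end-to-end NumPy sketch assembles the randomized estimator with exact dual solutions: Gaussian vectors Z_{i}\sim N(0,\Sigma), duals solving A(\mu)^{T}Y_{i}(\mu)=Z_{i}, and \Delta(\mu)=\big(\frac{1}{K}\sum_{i=1}^{K}(Y_{i}(\mu)^{T}r(\mu))^{2}\big)^{1/2}, whose effectivity \Delta(\mu)/\|u(\mu)-\widetilde{u}(\mu)\|_{\Sigma} concentrates around one as K grows (all matrices and sizes below are illustrative stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 50, 100                                # illustrative problem size and sample count
A = np.eye(N) + 0.1 * rng.standard_normal((N, N))    # stand-in for A(mu)
L = rng.standard_normal((N, N)) / np.sqrt(N)
Sigma = L @ L.T + np.eye(N)                   # SPD matrix defining the error norm

u = rng.standard_normal(N)                    # "truth" solution
u_t = u + 0.01 * rng.standard_normal(N)       # reduced-order approximation
r = A @ (u - u_t)                             # residual r(mu) = A(mu)(u - u_t)

Z = rng.multivariate_normal(np.zeros(N), Sigma, size=K)  # Z_i ~ N(0, Sigma)
Y = np.linalg.solve(A.T, Z.T).T               # exact duals: A^T Y_i = Z_i

Delta = np.sqrt(np.mean((Y @ r) ** 2))        # randomized error estimator
err = np.sqrt((u - u_t) @ Sigma @ (u - u_t))  # true error ||u - u_t||_Sigma
print(Delta / err)                            # effectivity close to one
```

Since Y_{i}^{T}r=Z_{i}^{T}(u-\widetilde{u}) and \mathbb{E}[(Z_{i}^{T}e)^{2}]=e^{T}\Sigma e, the squared estimator is an unbiased Monte Carlo estimate of \|u-\widetilde{u}\|_{\Sigma}^{2}, consistent with the concentration results above.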
References
 [1] O. Balabanov and A. Nouy, Randomized linear algebra for model reduction. Part I: Galerkin methods and error estimation, tech. rep., arXiv:1803.02602, 2018.
 [2] M. Barrault, Y. Maday, N. Nguyen, and A. Patera, An 'empirical interpolation' method: application to efficient reduced-basis discretization of partial differential equations, C. R. Math. Acad. Sci. Paris Series I, 339 (2004), pp. 667–672.
 [3] A. Buhr and K. Smetana, Randomized Local Model Order Reduction, SIAM J. Sci. Comput., 40 (2018), pp. A2120–A2151.
 [4] Y. Cao and L. Petzold, A posteriori error estimation and global error control for ordinary differential equations by the adjoint method, SIAM J. Sci. Comput., 26 (2004), pp. 359–374.
 [5] Y. Chen, J. S. Hesthaven, Y. Maday, and J. Rodríguez, Improved successive constraint method based a posteriori error estimate for reduced basis approximation of 2D Maxwell's problem, M2AN Math. Model. Numer. Anal., 43 (2009), pp. 1099–1116.
 [6] S. Dasgupta and A. Gupta, An elementary proof of the Johnson-Lindenstrauss lemma, Technical Report TR-99-006, International Computer Science Institute, 1999.
 [7] D. L. Donoho, Compressed sensing, IEEE Transactions on Information Theory, 52 (2006), pp. 1289–1306.
 [8] M. Drohmann and K. Carlberg, The ROMES method for statistical modeling of reducedordermodel error, SIAM/ASA J. Uncertain. Quantif., 3 (2015), pp. 116–145.
 [9] B. Haasdonk, Reduced basis methods for parametrized PDEs – a tutorial, in Model Reduction and Approximation, P. Benner, A. Cohen, M. Ohlberger, and K. Willcox, eds., SIAM, Philadelphia, PA, 2017, pp. 65–136.
 [10] B. Haasdonk and M. Ohlberger, Reduced basis method for finite volume approximations of parametrized linear evolution equations, M2AN Math. Model. Numer. Anal., 42 (2008), pp. 277–302.
 [11] W. Hackbusch, Tensor spaces and numerical tensor calculus, Springer, Heidelberg, 2012.
 [12] S. Hain, M. Ohlberger, M. Radic, and K. Urban, A hierarchical a posteriori error estimator for the reduced basis method, tech. rep., arXiv:1802.03298, 2018.
 [13] N. Halko, P. G. Martinsson, and J. A. Tropp, Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions, SIAM Rev., 53 (2011), pp. 217–288.
 [14] J. Hesthaven, G. Rozza, and B. Stamm, Certified reduced basis methods for parametrized partial differential equations, Springer Briefs in Mathematics, Springer, Cham, 2016.
 [15] C. Homescu, L. Petzold, and R. Serban, Error Estimation for ReducedOrder Models of Dynamical Systems, SIAM Rev., 49 (2007), pp. 277–299.
 [16] D. Huynh, D. Knezevic, Y. Chen, J. Hesthaven, and A. Patera, A natural-norm Successive Constraint Method for inf-sup lower bounds, Comput. Methods Appl. Mech. Engrg., 199 (2010), pp. 1963–1975.
 [17] D. B. P. Huynh, G. Rozza, S. Sen, and A. T. Patera, A successive constraint linear optimization method for lower bounds of parametric coercivity and infsup stability constants, C. R. Math. Acad. Sci. Paris, 345 (2007), pp. 473–478.
 [18] G. J. O. Jameson, The incomplete gamma functions, The Mathematical Gazette, 100 (2016), pp. 298–306.
 [19] A. Janon, M. Nodet, and C. Prieur, Goaloriented error estimation for the reduced basis method, with application to sensitivity analysis, J. Sci. Comput., 68 (2016), pp. 21–41.
 [20] W. B. Johnson and J. Lindenstrauss, Extensions of Lipschitz mappings into a Hilbert space, Contemp. Math., 26 (1984), p. 1.
 [21] C. S. Kenney and A. J. Laub, Smallsample statistical condition estimates for general matrix functions, SIAM J. Sci. Comput., 15 (1994), pp. 36–61.
 [22] A. Manzoni, S. Pagani, and T. Lassila, Accurate Solution of Bayesian Inverse Uncertainty Quantification Problems Combining Reduced Basis Methods and Reduction Error Models, SIAM/ASA J. Uncertain. Quantif., 4 (2016), pp. 380–412.
 [23] A. Moosavi, R. Ştefănescu, and A. Sandu, Multivariate predictions of local reducedordermodel errors and dimensions, Internat. J. Numer. Methods Engrg., 113 (2018), pp. 512–533.
 [24] A. Nouy, Lowrank methods for highdimensional approximation and model order reduction, in Model reduction and approximation, P. Benner, A. Cohen, M. Ohlberger, and K. Willcox, eds., SIAM, Philadelphia, PA, 2017, pp. 171–226.
 [25] N. A. Pierce and M. B. Giles, Adjoint recovery of superconvergent functionals from PDE approximations, SIAM Rev., 42 (2000), pp. 247–264.
 [26] A. Quarteroni, A. Manzoni, and F. Negri, Reduced basis methods for partial differential equations, vol. 92, Springer, Cham, 2016.
 [27] G. Rozza, D. Huynh, and A. Patera, Reduced basis approximation and a posteriori error estimation for affinely parametrized elliptic coercive partial differential equations: application to transport and continuum mechanics, Arch. Comput. Meth. Eng., 15 (2008), pp. 229–275.
 [28] R. Serban, C. Homescu, and L. Petzold, The effect of problem perturbations on nonlinear dynamical systems and their reducedorder models, SIAM J. Sci. Comput., 29 (2007), pp. 2621–2643.
 [29] S. Trehan, K. T. Carlberg, and L. J. Durlofsky, Error modeling for surrogates of dynamical systems using machine learning, Internat. J. Numer. Methods Engrg., 112 (2017), pp. 1801–1827.
 [30] K. Veroy, C. Prud’homme, D. V. Rovas, and A. T. Patera, A posteriori error bounds for reducedbasis approximation of parametrized noncoercive and nonlinear elliptic partial differential equations, in Proceedings of the 16th AIAA Computational Fluid Dynamics Conference, vol. 3847, 2003.
 [31] O. Zahm, M. BillaudFriess, and A. Nouy, Projectionbased model order reduction methods for the estimation of vectorvalued variables of interest, SIAM J. Sci. Comput., 39 (2017), pp. A1647–A1674.
 [32] O. Zahm and A. Nouy, Interpolation of inverse operators for preconditioning parameterdependent equations, SIAM J. Sci. Comput., 38 (2016), pp. A1044–A1074.