# Data-Driven Combined State and Parameter Reduction for Extreme-Scale Inverse Problems

## Abstract

In this contribution we present an accelerated optimization-based approach for combined state and parameter reduction of a parametrized linear control system which is then used as a surrogate model in a Bayesian inverse setting. Following the basic ideas presented in [Lieberman, Willcox, Ghattas. Parameter and state model reduction for large-scale statistical inverse settings, SIAM J. Sci. Comput., 32(5):2523-2542, 2010], our approach is based on a generalized data-driven optimization functional in the construction process of the surrogate model and the use of a trust-region-type solution strategy that results in an additional speed-up of the overall method. In principle, the model reduction procedure is based on the offline construction of appropriate low-dimensional state and parameter spaces and an online inversion step based on the resulting surrogate model that is obtained through projection of the underlying control system onto the reduced spaces. The generalization and enhancements presented in this work are shown to decrease overall computational time and increase accuracy of the reduced order model and thus allow an application to extreme-scale problems. Numerical experiments for a generic model and an fMRI connectivity model are presented in order to compare the computational efficiency of our improved method with the original approach.

## 1 Introduction

Many physical, chemical, technical, environmental, or bio-medical applications require the solution of inverse problems for parameter estimation and identification. This is in particular the case for complex dynamical systems where only experimental functional data is accessible via measurements. In neurosciences, a particular application is e.g. the extraction of effective connectivity in neural networks from measured data, such as data from electroencephalography (EEG) or functional magnetic resonance imaging (fMRI).

If a network with many states is considered, the corresponding large-scale inverse problem is often only accessible in reasonable computational time if model reduction is applied to the underlying control system. Moreover, the measured data is subject to statistical errors such that it is reasonable to apply a Bayesian inference approach which tries to identify a distribution on the underlying parameters, rather than computing deterministic parameter values. In the context of connectivity analysis in neurosciences such an inversion approach has been established by Friston and his collaborators in recent years under the name Dynamic-Causal-Modeling (DCM) [1]. We in particular refer to [2] and the references therein for further reading.

In this contribution we will consider model reduction for Bayesian inversion of general linear control systems which in particular includes the above mentioned application scenario in neurosciences, but also other input-output systems that may be obtained, for example, in a partial differential equations setting after discretization.

Control systems usually comprise some internal state ($x$), allow external input ($u$), transform the input and state to observed output ($y$) and provide different configurations through parameters ($\theta$). Generally, a control system therefore consists of a dynamical system with vector field $f$ and an output functional $g$. In the finite dimensional case (e.g. after discretization) we may consider for $x(t) \in \mathbb{R}^N$, $u(t) \in \mathbb{R}^M$ and $y(t) \in \mathbb{R}^O$ a control system given by:

$$\dot{x}(t) = f(x(t), u(t), \theta), \qquad y(t) = g(x(t), u(t), \theta),$$

with external input $u$, output $y$ and parameters $\theta$. Naturally, the system is equipped with an initial condition for the state, i.e. $x(0) = x_0$.

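In this contribution, as a general underlying model, linear control systems of the following form are considered:

$$\dot{x} = A_\theta x + B u, \qquad y = C x,$$

where now, we have neglected the dependence on time for the ease of exposition. Here, for $N$ states, $M$ inputs and $O$ outputs, $A_\theta \in \mathbb{R}^{N \times N}$ denotes the parametrized system matrix, $B \in \mathbb{R}^{N \times M}$ an input matrix, and $C \in \mathbb{R}^{O \times N}$ an output matrix. The matrix $A_\theta$ is fully parametrized, meaning each component is an individual parameter, thus $\theta \in \mathbb{R}^{N^2}$ is mapped to the system matrix by:

$$\mathrm{vec}^{-1}: \mathbb{R}^{N^2} \to \mathbb{R}^{N \times N}, \qquad \theta \mapsto A_\theta.$$

These parameters are unknown up to their prior distribution at the time of reduction, and for models with many states $N$ the number of parameters $N^2$ might make an estimation prohibitively computationally expensive.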
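The full parametrization can be illustrated with a minimal sketch of the inverse vectorization map. The column-major stacking convention, the function names `inv_vec` and `vec`, and the example values are assumptions made here for illustration; they are not taken from the `optmor` source.

```python
def inv_vec(theta, n):
    """Map a parameter vector of length n*n to an n-by-n matrix.

    Assumes column-major stacking: entry (i, j) is theta[i + j*n].
    """
    if len(theta) != n * n:
        raise ValueError("theta must have n*n entries")
    return [[theta[i + j * n] for j in range(n)] for i in range(n)]


def vec(matrix):
    """Stack the columns of a square matrix into one parameter vector."""
    n = len(matrix)
    return [matrix[i][j] for j in range(n) for i in range(n)]


# Hypothetical parameter vector for a system with N = 2 states.
theta = [-1.0, 0.2, 0.0, -2.0]
A = inv_vec(theta, 2)  # rows: [-1.0, 0.0] and [0.2, -2.0]
```

With this convention `vec(inv_vec(theta, n))` returns `theta` again, so every matrix entry corresponds to exactly one parameter.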

For bounded input, the system's stability is characterized by the eigenvalues of the system matrix. Hence, the parameters composing the system matrix need to be estimated in such a manner that the system remains stable, which can be considered an additional prior information that needs to be taken into account during the (Bayesian) inversion.

Reducing the parameter space, e.g. the number of independent connections in a brain connectivity model, before the parameter estimation, can lower the complexity of optimization significantly. Alternatively, reducing the state space, in our example the number of brain regions or nodes, cuts the computational cost of the necessary integrations during the optimization. The resulting reduced order model approximates the original model up to an error introduced by the reduction procedure. In this contribution, a combined reduction of state and parameter space is presented, allowing the swift inversion of large-scale and even extreme-scale parametrized models.

The inversion procedure incorporating model reduction is usually arranged in two steps. In a first step, called the offline phase, the underlying parametrized model is reduced; here in states and parameters. Second, in an online phase, the reduced model's parameters are estimated to fit the observed experimental data.

In Bayesian inverse problems [3] we aim at estimating a probability distribution of the unknown parameters instead of the parameter values directly. If we denote the given data by $y_d$, the inverse approach computes the so-called posterior distribution, $P(\theta \mid y_d)$, i.e. the probability distribution of the parameters under the given data. The basic underlying mathematics to achieve this goal is Bayes’ rule:

$$P(\theta \mid y_d) = \frac{P(y_d \mid \theta)\, P(\theta)}{P(y_d)},$$

with the prior distribution $P(\theta)$, reflecting the preliminary knowledge about the parameter distribution. $P(y_d \mid \theta)$ is the likelihood that we observe $y_d$, given a parameter vector $\theta$. The likelihood is estimated in the online phase and requires forward integrations of the underlying dynamical system. $P(y_d)$ is the model evidence, i.e. the probability of the data.

Up to the normalizing factor $P(y_d)$, the posterior distribution is proportional to $P(y_d \mid \theta)\, P(\theta)$, i.e.

$$P(\theta \mid y_d) \propto P(y_d \mid \theta)\, P(\theta).$$

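In our problem setting, the observed data is presumed to contain some additive noise $\epsilon$, i.e.

$$y_d = y(\theta) + \epsilon.$$

In the scope of this work Gaussian noise $\epsilon \sim \mathcal{N}(0, \Sigma_\epsilon)$ is assumed. Thus, in this fully Gaussian setting, all probability distributions can be specified in terms of mean and covariance. Therefore, given a prior Gaussian distribution $\mathcal{N}(\mu_\theta, \Sigma_\theta)$ and some experimental data $y_d$, a distribution proportional to the posterior distribution can be computed by the Gaussian likelihood and prior:

$$P(\theta \mid y_d) \propto \exp\left(-\tfrac{1}{2} \|y(\theta) - y_d\|^2_{\Sigma_\epsilon^{-1}}\right) \exp\left(-\tfrac{1}{2} \|\theta - \mu_\theta\|^2_{\Sigma_\theta^{-1}}\right).$$

Then, the Maximum-A-Posteriori (MAP) estimator is given by:

$$\theta_{\mathrm{MAP}} = \arg\max_{\theta} P(\theta \mid y_d),$$

which is computed in the online phase after we have constructed a suitable reduced order model.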
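As a short worked step (a sketch under assumptions, not quoted from the source): if the noise covariance is written $\Sigma_\epsilon$ and the Gaussian prior has mean $\mu_\theta$ and covariance $\Sigma_\theta$, with these symbol names chosen here for illustration, then taking the negative logarithm of the Gaussian posterior turns the MAP estimation into a regularized least-squares problem:

$$\theta_{\mathrm{MAP}} = \arg\min_{\theta} \left( \tfrac{1}{2} \|y(\theta) - y_d\|^2_{\Sigma_\epsilon^{-1}} + \tfrac{1}{2} \|\theta - \mu_\theta\|^2_{\Sigma_\theta^{-1}} \right), \qquad \|v\|^2_{W} := v^T W v.$$

The first term is the data misfit; the second penalizes deviations from the prior mean in proportion to the prior precision. This is why the parameter estimation can be carried out with a least-squares optimizer plus regularization.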

Model reduction of parametrized control systems has been investigated in recent years through various approaches. Early interpolation ideas were presented in [5] as well as moment matching techniques e.g. in [6]. Some recent approaches comprise interpolatory schemes with sparse grids [7], superposition of locally reduced models [8] or matrix interpolation [9]. Another approach comes from the reduced basis techniques. Here, in particular the POD-Greedy algorithm has been established [11] in the context of parametrized partial differential equations and transferred to dynamical systems in [13]. We also refer to [14] for an application of reduced basis model reduction in a Bayesian inverse setting.

While all the above mentioned approaches concentrate on state space reduction only, there are also very recent approaches towards simultaneous reduction of state and parameter spaces in the context of large-scale inverse problems. Concerning gramian-based combined parameter and state reduction we in particular refer to [15] and the references therein. In this contribution, however, we are concerned with optimization-based combined state and parameter reduction. Our approach is mainly based on [16] where a procedure to concurrently reduce state and parameter spaces of linear control systems in a Bayesian inversion setting has been introduced. The iterative improvement of a projection as used in [16] is also used in [17] from a gramian-based perspective. This method is related to the state reduction from [18], the parameter reduction from [19] and is generally related to the Hessian-based model reduction ansatz as described in [20]. We also refer to [21] for a corresponding goal-oriented optimization based approach.

Starting from the ideas in [16] we base our approach in this contribution on a generalized data-driven optimization functional and present enhancements of the reduction procedure using ideas from [11].

The article is organized as follows. In Section 2, a short review of the model reduction procedure from [16] is given. In Section 3, we first take into account a data-driven optimization functional, and then present enhancements to the existing approach that result in a reduction of offline computational time. Section 4 summarizes the implementation of the presented algorithm and its extensions. Finally, we evaluate the resulting model reduction approach in numerical experiments. The methods are tested and compared on a generic model and an fMRI connectivity model for synthetic neuronal activity in Section 5.

## 2 Combined State and Parameter Reduction

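In this section, the method from [16] is briefly reviewed and annotated. For large-scale inverse problems, model reduction becomes relevant to ensure reasonable optimization durations. Commonly, Galerkin or Petrov-Galerkin projections are employed for the low-rank projections of state [20] or parameter [19] spaces. The simultaneous reduction of state and parameter space is based on Galerkin projections $V \in \mathbb{R}^{N \times n}$ for the states and $P \in \mathbb{R}^{N^2 \times p}$ for the parameters, with:

$$V^T V = \mathbb{1}_n, \qquad P^T P = \mathbb{1}_p.$$

The reduced model is of lower order $n \ll N$ than the original full-order model. For the reduced states $x_r = V^T x$ the associated low-rank control system is derived from the original model's components:

$$\dot{x}_r = A_{r,\theta_r} x_r + B_r u, \qquad y_r = C_r x_r,$$

with a reduced initial condition $x_{r,0} = V^T x_0$ and the reduced components:

$$\theta_r = P^T \theta, \qquad A_{r,\theta_r} = V^T\, \mathrm{vec}^{-1}(P \theta_r)\, V, \qquad B_r = V^T B, \qquad C_r = C V.$$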
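The Galerkin projection of the system matrices can be sketched in a few lines. The helper names (`matmul`, `transpose`, `reduce_system`) and the toy three-state system are hypothetical and serve only to illustrate the projection $V^T A V$, $V^T B$, $C V$ for a state projection $V$ with orthonormal columns.

```python
def matmul(X, Y):
    """Dense matrix product of two row-major list-of-lists matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    """Transpose of a row-major list-of-lists matrix."""
    return [list(col) for col in zip(*X)]


def reduce_system(A, B, C, V):
    """Galerkin-project (A, B, C) with a state projection V (orthonormal columns)."""
    Vt = transpose(V)
    return matmul(matmul(Vt, A), V), matmul(Vt, B), matmul(C, V)


# Hypothetical 3-state system reduced onto its first two coordinates.
A = [[-1.0, 0.0, 0.0], [0.0, -2.0, 0.0], [0.0, 0.0, -3.0]]
B = [[1.0], [1.0], [1.0]]
C = [[1.0, 1.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
Ar, Br, Cr = reduce_system(A, B, C, V)
```

In this toy case the reduced system simply drops the third state; for a general orthonormal $V$ the reduced matrices mix all full-order states.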

The required state projection $V$ and parameter projection $P$ are determined iteratively. This iterative assembly of the projection matrices is based on a greedy algorithm [23], that optimizes the error between the high-fidelity original and the low-dimensional reduced model. In each iteration a set of reduced parameters is determined by maximizing the error (see [20]) between the original and the reduced model's output, using an objective function that consists of two weighted terms: an output error measure and a regularization term.

The output error measure quantifies the error in the output between the reduced model evaluated at the reduced parameter and the full underlying model with the high dimensional parameter. In the original approach [16] the output error is measured in the $L_2$-norm. The regularization in the second term utilizes the prior covariance matrix. In case the covariance matrix is diagonal, the inverse of the covariance matrix, the precision matrix, can be computed by inverting each diagonal component. Thus, the second summand simply regularizes this maximizer in terms of the provided prior distribution, through a $2$-norm weighted by the precision matrix. This type of regularization makes use of the prior covariance and thus penalizes parameters of low probability with respect to the prior information.
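For a diagonal prior covariance the weighted norm reduces to an element-wise division. The following sketch assumes a penalty of the form $\theta^T S^{-1} \theta$ centered at zero; the function name `prior_penalty` and the numbers are illustrative, and a penalty centered at the prior mean works the same way after subtracting that mean from `theta`.

```python
def prior_penalty(theta, prior_variances):
    """Weighted squared 2-norm theta^T S^{-1} theta for a diagonal covariance S.

    prior_variances holds the diagonal of S; its element-wise reciprocal
    is the diagonal of the precision matrix.
    """
    if len(theta) != len(prior_variances):
        raise ValueError("dimension mismatch")
    return sum(t * t / s for t, s in zip(theta, prior_variances))


# The same parameter vector under a tight and under a loose prior.
penalty_tight = prior_penalty([1.0, 2.0], [0.5, 0.5])
penalty_loose = prior_penalty([1.0, 2.0], [4.0, 4.0])
```

A small prior variance yields a large penalty, so parameters that are improbable under the prior are discouraged more strongly.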

The presented model order reduction method relies on optimization to fit the reduced parameters optimally. As in [16] and [20] a greedy method can be employed. Yet, one is not restricted to the output error in the $L_2$-norm. Alternatively, the least-absolute-deviation method ($L_1$-norm minimization), or the least-maximum-deviation method ($L_\infty$-norm minimization) can be used, i.e. the output error between full and reduced model is measured in the $L_1$- or $L_\infty$-norm instead.

More generally, each $L_p$-norm can be used inside the objective function, which results in a generalized objective functional with the output error measured in $\|\cdot\|_p$ for some $1 \le p \le \infty$. The regularization term remains unchanged, since it is solely based on the prior distribution. However, since the evaluation of the output error in the $L_p$-norm requires high dimensional solves in each step of the optimization loop, it is in general advisable to replace the true output error by some a posteriori error estimator, as e.g. derived and suggested in [11]. Such an a posteriori error estimator can be computed more efficiently since it does not require full-order time integrations.

Each iteration requires computing the reduced model, based on the last iteration's projection matrices, as well as the greedy optimization of the parameters based on the integration of the full and the reduced model.

The resulting reduced parameters constitute the next basis vector, which is orthogonalized^{1} against the existing parameter projection and appended to it.

As the dynamic system is linear and time-invariant, a simulation of such a system is equivalent to solving a system of linear equations [20] that couples all time steps.

Some selection of the solution time series is then incorporated into the state projection by orthogonalization.

In [20] POD-modes are selected for inclusion into the projection. A simpler but numerically very efficient approach would be the mean of the time series, as used in [25] and [26]:

$$\bar{x} = \frac{1}{T} \int_0^T x(t)\, dt,$$

and in the discrete case, for $K$ time steps $t_1, \dots, t_K$:

$$\bar{x} = \frac{1}{K} \sum_{k=1}^{K} x(t_k).$$

In this contribution, we suggest selecting from a truncated POD of the orthogonal projection error in the time series with respect to the reduced state space of the preceding iteration. This approach follows the idea of the POD-Greedy procedure proposed in [11].

Using the updated state and parameter projections, the next iteration is performed.
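The orthogonalization step that extends a projection by one new vector can be sketched with a modified Gram-Schmidt update, one of several suitable algorithms. The function name `orth_append`, the tolerance, and the representation of the basis as a list of column vectors are assumptions made for this illustration.

```python
import math


def orth_append(basis, vector, tol=1e-12):
    """Orthogonalize `vector` against the orthonormal vectors in `basis`
    (modified Gram-Schmidt) and return the basis extended by the result.

    If the vector is numerically contained in the span of the basis,
    the basis is returned unchanged.
    """
    w = list(vector)
    for q in basis:
        coeff = sum(qi * wi for qi, wi in zip(q, w))
        w = [wi - coeff * qi for qi, wi in zip(q, w)]
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm <= tol:
        return basis
    return basis + [[wi / norm for wi in w]]


# Hypothetical snapshots added one after another.
basis = orth_append([], [3.0, 4.0, 0.0])
basis = orth_append(basis, [1.0, 0.0, 0.0])
```

Each greedy iteration would call such an update once for the parameter projection and once for the state projection, so that both keep orthonormal columns.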

The parameter projection can be initialized with a constant vector as described in [16], yet a more natural choice is the prior mean, assuming it is not identical to zero. A prior information that is usually implicitly assumed is the underlying system's stability. Hence, without any other prior information one could at least choose an uninformative prior mean that suggests stability, i.e. one whose associated system matrix is stable.

From this initial choice for the parameters the full order system is sampled and the state projection is initialized, for example, by the mean over time of the states.

In summary, the complete reduction algorithm is given by the pseudo-code listing of the reduction algorithm.

In this algorithm, a snapshot describes the states for given parameters, a selection operator picks a contribution from the states over time, and the `orth` method orthogonalizes a given matrix. The `argmax` method represents an optimization procedure, which in the scope of this work is given by, but is not restricted to, an unconstrained optimization.

A reconstruction of the parameters after the inference, using the above described reduction procedure of states and parameters, is also accomplished by the computed projections $P$ and $V$. Due to the use of Galerkin projections, the inverse projection of $P$ and $V$ is given by their transpose $P^T$, $V^T$, thus parameters and states are mapped between the full and the reduced spaces by:

$$\theta_r = P^T \theta, \quad \theta \approx P \theta_r, \qquad x_r = V^T x, \quad x \approx V x_r.$$

This **combined reduction** of parameter and state space, using the parameter projection $P$ and the state projection $V$, improves the inversion procedure not only by shortening the integration durations due to the state reduction, but also by decreasing the number of optimizable parameters.

## 3 Incorporating Data-Misfit and Trust-Region Methods

In this section we first include experimental data into the model reduction procedure. Second, a trust-region-like approach is presented to shorten the duration of the reduced order model construction. Both are aimed at accelerating the assembly of the model reducing projections, and thus at shortening the overall offline phase.

### 3.1 Incorporating Data-Misfit

Inverse problems require some experimental data to which a model's parameters are fit during the inversion in the online phase. We thus incorporate the data-misfit into the objective functional in the combined reduction procedure during the offline phase. This is motivated by two arguments. While in the original approach, the surrogate model is constructed to be accurate within the whole parameter space, it can now be tailored towards the parameter ranges that are related to the solution of the inverse problem, which might result in lower dimensional surrogate models that lead to more accurate results in the inversion. If only data-misfit is considered, the objective functional would consist solely of the misfit $\|y_r(\theta) - y_d\|$ between the reduced model output $y_r$ and the data $y_d$ (cf. [21]), where $\|\cdot\|$ denotes a suitably chosen norm. Note that the usage of the data-misfit will save computational time during the offline phase, as no full order integrations of the model are needed to evaluate the objective functional.

The resulting optimization problem will most likely be ill-posed. Therefore, we will keep the regularization from Equation 2. Furthermore, for the solution of the inverse problem it is advisable to prepare the reduced model in the offline phase in such a way that also parameter ranges in the neighborhood of the optimal parameters associated with the measured data are taken into account. Our suggested generalized objective functional is therefore a combination of the objective functional from Equation 2 with the data-misfit. Using one weighting parameter per term, we thus obtain an objective functional that combines the output error, the prior regularization and the data-misfit.

The weights can be chosen such that the influence of all three terms is equilibrated. As a result, the reduced model's parameters are determined to encourage matching the reduced model to the provided experimental data instead of a general reduced model with the given priors. To further enforce the fitting of the reduced model to the measured output, we could choose a very small or even vanishing weight for the output error, which results in an objective functional without the main maximization term of Equation 2, i.e. one that retains only the regularization and the data-misfit.

Apart from an expected higher accuracy in matching the experimental data, which is the ultimate goal in a later online phase of the inversion procedure, this massively lowers the computational load, since no sampling of the full order model is required. Yet, due to the usage of specific experimental data in the reduction procedure, the resulting reduced model is only valid for fitting this particular data.

### 3.2 Trust-Region Strategy

The original model reduction algorithm keeps the dimension of the reduced parameter vector fixed and iterates until convergence or a predetermined reduced order. Yet the dimension of the to-be-estimated parameter vector largely determines the offline duration, due to the required integrations during optimization in each iteration. This can be counteracted by a trust-region-like strategy, which is loosely related to [22] and [27], where trust-region methods are applied to POD based model reduction. In this contribution the basic trust-region algorithm [28] is simplified to allow a swift computation by removing the acceptance step. Due to the optimization during each iteration of the reduction process, which itself iterates until some acceptable bound is reached, an extra acceptance step is not required. Since the dimension of the parameter vector now varies over the iterations, the parameter vector of the $I$-th iteration has $I$ components, and an additional mapping from the trust-region parameter vector to the full-order parameter vector is required. This mapping enables the orthogonalization of the current iteration's parameter vector into the parameter projection.

For the next iteration $I+1$, the parameter vector is extended by one component.

The trust-region radius is initially set to dimension one, thus instead of initializing the parameter vector with a constant vector or the prior means, it is set to (scalar) one. Yet, the first column of the parameter projection is still initially set to the prior means. Then, in the first iteration of the reduction the (scalar) parameter is computed which approximates the full system best.

In each subsequent iteration an additional dimension of the parameter space is added to the trust-region radius. For example, the second iteration optimizes two parameters, the third optimizes three, etc., until the given reduced parameter space dimension is reached.

Whenever the full parameter vector is required, it is obtained via the parameter projection, whose inverse equates to its transpose due to the orthogonalization.

Hence, starting with a single scalar parameter that is initialized with one, the reduced parameter vector is assembled by iteratively incrementing the dimension.

With this enhancement, each iteration solves an optimization problem of lesser or (once only) equal dimension than in the original algorithm. Due to the smaller size of the optimization problem the offline time is massively lowered.
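A rough count illustrates the saving. This is a sketch under simplifying assumptions that are not taken from the source: the number of optimized variables serves as a crude proxy for cost, the algorithm without the trust-region strategy optimizes a vector of fixed dimension $d$ in each of $n$ iterations, and $n \le d$.

$$\underbrace{\sum_{I=1}^{n} d}_{\text{fixed dimension}} = n\, d \qquad \text{versus} \qquad \underbrace{\sum_{I=1}^{n} I}_{\text{trust-region}} = \frac{n(n+1)}{2}.$$

For instance, $n = 10$ and $d = 100$ give $1000$ versus $55$ optimized variables in total. The actual gain depends on how the cost of the optimizer and of the required integrations scales with the dimension.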

## 4 Implementation

The trust-region enhancement, together with the data-misfit enhancement from Section 3.1, is modularly included into the algorithm; the corresponding listing showcases the new algorithm.

The algorithm is implemented under the name `optmor` (**opt**imization-based **m**odel **o**rder **r**eduction). The source code is available from: http://j.mp/optmor under an open-source license and is compatible with OCTAVE and MATLAB. For compatibility reasons the estimation algorithm employed during the reduction is an unconstrained optimization, but can easily be replaced with a constrained optimization function ([19], [16]). To remain configurable, the enhancements described here can each be used optionally; either individually or in combination. Additionally, the usage of a source term $F$ and a feed-forward matrix $D$ is implemented as well, allowing models of the form:

$$\dot{x} = A_\theta x + B u + F, \qquad y = C x + D u,$$

to be reduced.

The interface of the `optmor` program comprises the following arguments: a vector containing the parameter prior mean; a function handle to a mapping from a given parameter vector to a system matrix, using the signature `A = @(p)` (absent a custom mapping, the inverse mathematical vectorization map $\mathrm{vec}^{-1}$ is assumed as parameter mapping); the input matrix; the output matrix; the feed-forward matrix; the source term; a three component vector holding the start time, time step and end time; a scalar holding the targeted reduced dimension; a vector representing the initial value; a matrix providing the input or control for each time step; the associated prior covariance matrix, with a unit covariance matrix as fallback; and a six component vector holding the configurable options. Optionally, experimental output time series, required for the data-misfit enhancements, may be passed. The algorithm returns two projection matrices, for parameter and state projections respectively.

For selecting from a snapshot, the POD-Greedy method [11] using the error system is also implemented. As described above, in [20] a system of linear equations is solved to simulate the forward model. Alternatively, a single-step solver, like a Runge-Kutta method, or a multi-step solver, like Adams-Bashforth, can be utilized to solve the system. An advantage of using such solvers is the lower memory requirement as opposed to solving a linear system whose dimension is the product of states and time-steps.
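The memory argument can be made concrete with a minimal sketch of a single-step solver. It uses the classical fourth-order Runge-Kutta scheme for $\dot{x} = A x + B u(t)$ and keeps only the current state in memory. All function names and the scalar test system are assumptions for illustration; this is not the solver used in `optmor`.

```python
def matvec(M, v):
    """Matrix-vector product for a row-major list-of-lists matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rhs(A, B, x, u):
    """Right-hand side A x + B u of the linear control system."""
    return [a + b for a, b in zip(matvec(A, x), matvec(B, u))]


def axpy(x, s, k):
    """Return x + s * k element-wise."""
    return [xi + s * ki for xi, ki in zip(x, k)]


def rk4_linear(A, B, x0, u_of_t, h, steps):
    """Integrate x' = A x + B u(t) with classical RK4, storing one state only."""
    x, t = list(x0), 0.0
    for _ in range(steps):
        k1 = rhs(A, B, x, u_of_t(t))
        k2 = rhs(A, B, axpy(x, h / 2, k1), u_of_t(t + h / 2))
        k3 = rhs(A, B, axpy(x, h / 2, k2), u_of_t(t + h / 2))
        k4 = rhs(A, B, axpy(x, h, k3), u_of_t(t + h))
        x = [xi + h / 6 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
        t += h
    return x


# Hypothetical scalar test: x' = -x, x(0) = 1, integrated up to t = 1.
x_end = rk4_linear([[-1.0]], [[0.0]], [1.0], lambda t: [0.0], 0.01, 100)
```

The working memory here grows with the number of states only, whereas the all-at-once linear system has as many unknowns as the product of states and time steps.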

## 5 Numerical Results

To demonstrate the capabilities of this approach in combined state and parameter reduction, two types of models are tested. First, a generic control system as described in Equation 1 is tested. Second, the combined reduction is applied to a linearized system for the inversion of fMRI data to deduce connectivity between brain regions [29]. Lastly, an extreme-scale problem is tested, and the effectiveness of the reduction method is evaluated for different configurations.

### 5.1 Online Phase

In the online phase, the estimation of the parameter (distribution) is accomplished by a least-squares minimization of the residual between model output and experimental data. The objective function employed in the optimization of the full-order model measures the residual between the full-order output and the data, whereas for the reduced models an adapted objective function is utilized that measures the residual between the reduced order model output and the data.

Here, the parameter estimation is performed with an unconstrained (least-squares) optimization with regularization for the full-order and reduced order models.

### 5.2 Generic Linear Control System

The `optmor` implementation is tested with a generic linear control system. As mentioned above, we will assume the system matrix is fully parametrized, hence the number of parameters equals the square of the number of states. The number of states is varied with $N \in \{9, 16, 25, 36\}$ (cf. Table 1), and thus the number of parameters with $N^2 \in \{81, 256, 625, 1296\}$; while the number of inputs and outputs is held fixed. Systems with these dimensions are generated randomly, but with ensured stability^{2}.

As a baseline, the full-order model is estimated without employing any reduction. Since in this case the full-order model's (high-dimensional) parameters are approximated, there is only an online phase.

Next, the presented data-driven (Section 3.1) and trust-region (Section 3.2) extensions are tested and compared individually and in combination for this parametrized linear control system. In Figure 1 the offline and online durations as well as the relative error in outputs are shown for the enhancements in comparison to the full-order optimization and the original reduction method.

The additional data-misfit term of the data-driven enhancement increases the offline phase duration, but reduces the relative error and online time. The newly introduced trust-region enhancement greatly reduces the offline phase duration as predicted and also, to a lesser degree, the online time compared to the original and data-misfit method. Compared to the original method's [16] relative error, the relative error of the trust-region enhancement behaves slightly better. Combining the data-driven and trust-region approach significantly shortens the offline phase, yielding a slightly higher relative error compared to the data-driven method but below the original algorithm's error. Lastly, excluding the output error term (the main maximization term of Equation 2) from the objective function again reduces the offline time while not affecting online time and relative error. Thus, especially the combination of the data-driven and trust-region enhancements massively accelerates the reduction process.

Assessing the effectiveness of the reduction using the combined data-driven and trust-region approach, in terms of total (offline and online) time compared to the full-order solution and the original algorithm, results in speed-ups of up to two orders of magnitude as listed in Table 1.

State Dimension | Speed-Up (vs. Full-Order) | Speed-Up (vs. Original) |
---|---|---|
9 | | |
16 | | |
25 | | |
36 | | |

### 5.3 fMRI Connectivity Model

A method to infer connectivity for different regions of the brain based on experimental data recorded by fMRI or fNIRS is known as Effective Connectivity [2]. There are two sub-models composing the underlying model of Effective Connectivity, a concept that is closely related to Dynamic-Causal-Modeling [1]. The dynamic sub-model represents the network of the observed brain regions by a controlled linear system.

The forward sub-model converts each state of the dynamic sub-model to the observed measurements. In the case of fMRI observation the forward sub-model is given by the nonlinear hemodynamic model [1] with its own set of hemodynamic parameters. As these parameters are not part of the dynamic system they will be excluded from the reduction and estimation and remain fixed at their prior value.

In the scope of this work a linearized fMRI forward sub-model from [29] is utilized to be applicable in a fully linear setting. Thus, each state of the common dynamic sub-model, given by a linear control system, has an individual SISO control system attached of which the output reflects fMRI measurements.

The dynamic and the linearized forward sub-models need to be rearranged to fit the linear control system framework; the coupling between both is expressed through Kronecker matrices, i.e. matrices with a single nonzero element.

For the following experiments the dynamic sub-model’s control system is embedded into the fMRI connectivity model. Since the inference targets the connectivity parameters, each region is assumed to have the same hemodynamic parameters [1]. The number of regions is varied, which leads to state dimensions of $45$, $80$, $125$ and $180$ (cf. Table 3); as each region is potentially able to receive external input, the number of inputs equals the number of regions. A connectivity matrix is generated randomly, but stable, and input will be given by an initial delta impulse. The prior mean of the parameters takes one value on the diagonal and another off the diagonal, chosen such that initial stability of the system is ensured, while the prior covariance is set to the unit matrix. For the hemodynamic parameters, the assumed prior values are listed in Table 2. In the reduction methods applied in the following, the POD-Greedy state selection will be used, since it seems the most robust for this model.

| Parameter | Mean | Covariance |
|---|---|---|
| | 0.65 | 0.001 |
| | 0.41 | 0.001 |
| | 0.98 | 0.001 |
| | 0.34 | 0.001 |
| | 0.32 | 0.001 |
| | 1.00 | 0 |
| | 1.00 | 0 |
| | 1.00 | 0 |

For all the reductions the number of iterations is fixed to the number of regions, which allows comparison of the different combined reduction variants of the same reduced order. Figure 2 depicts the results using the same setup as for the previous comparison.

In the offline phase the performance is similar to the generic network; again the original reduction method and the data-misfit enhancement visibly consume more time due to the relatively higher state dimension. Here, the data-driven method provides a low relative error, while the original algorithm may produce outliers with high errors. For these larger models the combination of trust-region and data-driven enhancements results in shorter offline and online phases, yet with higher relative errors. Again, the exclusive use of data-driven and trust-region extensions further reduces the offline time without increasing online time or relative error.

The total time for solving the inverse problem, compared to the full-order and original algorithms’ durations, is summarized in Table 3. Again a speed-up of up to two orders of magnitude is obtained.

State Dimension | Speed-Up (vs. Full-Order) | Speed-Up (vs. Original) |
---|---|---|
45 | | |
80 | | |
125 | | |
180 | | |

### 5.4 Extreme-Scale Experiment

The very short offline phase of the trust-region based reduction process allows a combined reduction of extreme-scale problems [30]. As a demonstration of the efficiency under such conditions, we finally look at the generic model with a very large number of states, which implies a parameter space dimension equal to the square of the state dimension. The optimization of the full-order model utilizing an unconstrained optimization, for example a trust-region Newton approach^{3}

In cases with more complicated parametrization, however, gradients (and Hessians) might not be available or complicated to obtain. Then, the trust-region-based method presented in this work allows a swift model reduction. Table 4 shows the offline time, online time and relative error in outputs of the reduced order system after inference to the original system.

Method | Offline Time [s] | Online Time [s] | Relative Output Error |
---|---|---|---|
Full Order | - | - | - |
Original | - | - | - |
Data-Driven | - | - | - |
Trust+Data | 1143.51 | 20.00 | 0.0151 |
Trust+Data-Only | 276.74 | 22.31 | 0.0151 |

The full order optimization as well as the original optimization-based reduction method and the data-driven approach were not able to complete the optimization. This is due to the memory requirements of the unconstrained optimization: since no gradient or Hessian is provided, a finite difference scheme is employed to approximate the derivatives, and the matrices this requires exceeded the test system's memory by far.

As depicted in Table 4, the combined trust-region/data-driven and trust-region/data-only methods, however, were able to compute a result with a relative output error of $0.0151$, as an optimization of the full parameter space never had to be performed. Both methods resulted in a comparable online time, while the trust-region/data-only approach has an additional speed-up in the offline phase, since a full-order system never had to be solved during the greedy optimizations. We expect a comparable offline performance for the trust-region/data-driven approach if an efficient-to-evaluate error estimator were used as an output error measure instead of the true reduction error that was employed in these numerical experiments.
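The size of this additional speed-up can be read off the two completed rows of Table 4. The following sketch only re-uses the timings from that table; the variable names are chosen here for illustration.

```python
# (offline time [s], online time [s]) for the two variants that completed, from Table 4.
timings = {
    "Trust+Data": (1143.51, 20.00),
    "Trust+Data-Only": (276.74, 22.31),
}

# Total time per method is the sum of offline and online phase.
totals = {name: offline + online for name, (offline, online) in timings.items()}

# Ratio of the offline phases alone and of the total times.
offline_ratio = timings["Trust+Data"][0] / timings["Trust+Data-Only"][0]
total_ratio = totals["Trust+Data"] / totals["Trust+Data-Only"]

print(f"offline speed-up: {offline_ratio:.2f}, total speed-up: {total_ratio:.2f}")
```

Dropping the output error term thus shortens the offline phase by a factor of roughly 4.1 and the total time by roughly 3.9, at an identical relative output error.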

## 6Conclusion and Outlook

In this contribution we proposed a new data-driven and trust-region approach for projection-based model reduction in Bayesian inverse problems. Both new ingredients improve the performance of the proposed combined state and parameter model reduction algorithm. While the data-driven extension reduces relative output errors, the trust-region enhancement massively lowers the offline duration of the greedy optimization at the cost of only a slight increase in the relative error. The combination of both yields an offline duration shorter than that of all previous variants, while the relative output error corresponds to that of the original reduction algorithm introduced in [16]. Due to the very short offline durations, our new approach allows an efficient inversion of large-scale and even extreme-scale problems, as demonstrated in the numerical experiments.

In our numerical experiments, the optimization method inside the reduction algorithms was restricted to an unconstrained optimization in order to ensure compatibility across different platforms. A constrained optimization with provided gradient and Hessian, as well as sparse large-scale facilities, could additionally improve the optimization during the reduction and the actual parameter distribution estimation.

Parametrizations more specialized than the basic full parametrization can be introduced and thus enable the reduction of more complex models, as demonstrated by the fMRI example. Further research will encompass the generalization of this approach to certain classes of nonlinear models [31]; for example, bilinear systems could be reduced with this method with only minor adaptation.

## 7Acknowledgements

This work was supported by the Deutsche Forschungsgemeinschaft, DFG EXC 1003 Cells in Motion - Cluster of Excellence, Münster, Germany as well as by the Center for Developing Mathematics in Interaction, DEMAIN, Münster, Germany.

### Footnotes

- The orthogonalization of the state and parameter projections can be accomplished by various algorithms, for example Gram-Schmidt, Householder reflections, Givens rotations or the singular value decomposition.
- For the real part of the eigenvalues of the system matrix holds:
- as MATLAB’s `fminunc`

### References

1. **Dynamic causal modelling.** K.J. Friston, L. Harrison, and W. Penny. *Neuroimage*, 19(4):1273–1302, 2003.
2. **Models of Effective Connectivity in Neural Systems.** K.E. Stephan and K.J. Friston. In V.K. Jirsa and A.R. McIntosh, editors, *Handbook of Brain Connectivity*, Understanding Complex Systems, pages 303–327. Springer Berlin Heidelberg, 2007.
3. **Inverse problems: a Bayesian perspective.** A.W. Stuart. *Acta Numerica*, 19(1):451–559, 2010.
4. *Large-Scale Inverse Problems and Quantification of Uncertainty*. L. Biegler, G. Biros, O. Ghattas, M. Heinkenschloss, D. Keyes, B. Mallick, L. Tenorio, B. Waanders, K. Willcox, and Y. Marzouk. Wiley Series in Computational Statistics. Wiley, 2011.
5. **A method for generating rational interpolant reduced order models of two-parameter linear systems.** S. Weile, E. Michielssen, E. Grimme, and K. Gallivan. *Appl. Math. Lett.*, 12(5):93–102, 1999.
6. **A Robust Algorithm for Parametric Model Order Reduction.** L. Feng and P. Benner. In *PAMM*, volume 7(1), pages 1021501–1021502, 2007.
7. **Parametrische Modellreduktion mit dünnen Gittern.** U. Baur and P. Benner. In *GMA-Fachausschuss 1.30, Modellbildung, Identifizierung und Simulation in der Automatisierungstechnik, Salzburg ISBN 978-3-9502451-3-4*, 2008.
8. **Efficient Order Reduction of Parametric and Nonlinear Models by Superposition of Locally Reduced Models.** B. Lohmann and R. Eid. In *Methoden und Anwendungen der Regelungstechnik. Erlangen-Münchener Workshops 2007 und 2008*. Shaker Verlag, Aachen, 2009.
9. **Stability-preserving parametric model reduction by matrix interpolation.** R. Eid, R. Castañé-Selga, H. Panzer, T. Wolf, and B. Lohmann. *MATH. COMP. MODEL. DYN.*, 17(4):319–335, 2011.
10. **An online method for interpolating linear parametric reduced-order models.** D. Amsallem and C. Farhat. *SIAM J. Sci. Comput.*, 33(5):2169–2198, 2011.
11. **Reduced basis method for finite volume approximations of parametrized linear evolution equations.** B. Haasdonk and M. Ohlberger. *M2AN*, 42(2):277–302, 2008.
12. **Convergence rates of the POD-greedy method.** B. Haasdonk. *M2AN*, 47(3):859–873, 2013.
13. **Efficient reduced models and *a posteriori* error estimation for parametrized dynamical systems by offline/online decomposition.** B. Haasdonk and M. Ohlberger. *MATH. COMP. MODEL. DYN.*, 17(2):145–161, 2011.
14. **Reduced basis approximation and a posteriori error estimation for parametrized parabolic PDEs: application to real-time Bayesian parameter estimation.** N.C. Nguyen, G. Rozza, D.B.P. Huynh, and A.T. Patera. In *Large-scale inverse problems and quantification of uncertainty*, Wiley Ser. Comput. Stat., pages 151–177. Wiley, 2011.
15. **Cross-Gramian Based Combined State and Parameter Reduction for Large-Scale Control Systems.** C. Himpe and M. Ohlberger. arxiv (math.oc) 1302.0634, preprint (submitted), Institute for Computational and Applied Mathematics, 2013.
16. **Parameter and state model reduction for large-scale statistical inverse problems.** C. Lieberman, K. Willcox, and O. Ghattas. *SIAM J. Sci. Comput.*, 32(5):2523–2542, 2010.
17. **An overview of model reduction methods and a new result.** A.C. Antoulas. In *Proceedings of the 48th IEEE Conference on Decision and Control*, pages 5357–5361. IEEE, 2009.
18. **Goal-oriented, model-constrained optimization for reduction of large-scale systems.** T. Bui-Thanh, K. Willcox, O. Ghattas, and B. van Bloemen Waanders. *J. Comput. Phys.*, 224(2):880–896, 2007.
19. **Model reduction for large-scale systems with high-dimensional parametric input space.** T. Bui-Thanh, K. Willcox, and O. Ghattas. *SIAM J. Sci. Comput.*, 30(6):3270–3288, 2008.
20. **Hessian-based model reduction for large-scale systems with initial-condition inputs.** O. Bashir, K. Willcox, O. Ghattas, B. van Bloemen Waanders, and J. Hill. *Int. J. Numer. Meth. Engng.*, 73(6):844–868, 2008.
21. **Goal-Oriented Inference: Approach, Linear Theory, and Application to Advection Diffusion.** C. Lieberman and K. Willcox. *SIAM J. Sci. Comput.*, 34(4):A1880–A1904, 2012.
22. **Trust-region proper orthogonal decomposition for flow control.** E. Arian, M. Fahl, and E.W. Sachs. Technical report, DTIC Document, 2000.
23. **Hessian-Based Model Reduction Approach to Solving Large-Scale Source Inversion Problems.** C. Lieberman and B. Van Bloemen Waanders. In *CSRI Summer Proceedings 2007*, pages 37–48, 2007.
24. **Reduced basis a posteriori error bounds for parametrized linear-quadratic elliptic optimal control problems.** Martin A. Grepl and Mark Kärcher. *Comptes Rendus Mathematique*, 349(15-16):873–877, 2011.
25. **Empirical model reduction of controlled nonlinear systems.** S. Lall, J.E. Marsden, and S. Glavaski. *Proceedings of the IFAC World Congress*, F:473–478, 1999.
26. **A subspace approach to balanced truncation for model reduction of nonlinear control systems.** S. Lall, J.E. Marsden, and S. Glavaski. *INT. J. ROBUST. NONLIN.*, 12(6):519–535, 2002.
27. **A trust region method for parabolic boundary control problems.** C.T. Kelley and E.W. Sachs. *SIAM J. OPTIMIZ.*, 9(4):1064–1081, 1999.
28. *Trust Region Methods*. A.R. Conn, N.I.M. Gould, and P.L. Toint. MPS-SIAM Series on Optimization. SIAM, 2000.
29. **Detecting the Stable, Observable and Controllable States of the Human Brain Dynamics.** E. Kamrani, A. Foroushani, M. Vaziripour, and M. Sawan. *OJMI*, 2(4):128–136, 2012.
30. **Extreme-scale UQ for Bayesian inverse problems governed by PDEs.** T. Bui-Thanh, C. Burstedde, O. Ghattas, J. Martin, G. Stadler, and L.C. Wilcox. *Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis*, page 3, 2012.
31. **Non-linear model reduction for uncertainty quantification in large-scale inverse problems.** D. Galbally, K. Fidkowski, K. Willcox, and O. Ghattas. *Int. J. Numer. Meth. Engng.*, 81(12):1581–1608, 2010.