Computation of the Response Surface in the Tensor Train data format
Abstract
We apply the Tensor Train (TT) approximation to construct the Polynomial Chaos Expansion (PCE) of a random field, and solve the stochastic elliptic diffusion PDE with the stochastic Galerkin discretization. We compare two strategies for the polynomial chaos expansion: sparse and full polynomial (multi-index) sets. In the full set, the polynomial orders are chosen independently in each variable, which provides higher flexibility and accuracy. However, the total number of degrees of freedom grows exponentially with the number of stochastic coordinates. To cope with this curse of dimensionality, the data is kept compressed in the TT decomposition, a recurrent low-rank factorization. PCE computations on sparse index sets have been studied extensively, but the TT representation for the PCE is a novel approach investigated in this paper. We outline how to deduce the PCE from the covariance matrix, assemble the Galerkin operator, and evaluate some postprocessing (mean, variance, Sobol indices), staying within the low-rank framework. Two stages are the most demanding. First, we interpolate the PCE coefficients in the TT format using a small number of samples, which is performed via the block cross approximation method. Second, we solve the discretized equation (a large linear system) via the alternating minimal energy algorithm. In the numerical experiments we demonstrate that the full expansion set encapsulated in the TT format is indeed preferable when high accuracy and high polynomial orders are required.
1 Motivation
In recent years, low-rank tensor techniques have been successfully applied to the solution of high-dimensional stochastic and parametric PDEs doostannonintrusive2013 (); khospde2010 (); khosqtt2010 (); KhSchGalerkinSPDE2011 (); tobkressparam2011 (); toblerhtdmrg2011 (); matthieszanderlowrank2012 (); Nouy10 (); Nouy07 (); litvinenkospde2013 (); wahnertstochgalerkin2014 (); Grasedyck_Kressner (); kunoth2013 (); LitvSampling13 (). With standard techniques it is almost impossible to store all entries of the discretized high-dimensional operator explicitly. Besides storage, one has to solve this high-dimensional problem and obtain the solution in a reasonable time. Very often some additional efficient postprocessing is required, e.g. visualization of large data sets, or computation of the mean, variance, Sobol indices or exceedance probabilities of the solution or of a quantity of interest (QoI). We perform this postprocessing in the low-rank tensor format.
Situations where one is concerned with uncertainty quantification often come in the following guise: we are investigating some physical system which is modeled as follows:
(1) A(u; p) = f(p), p ∈ P,
where u describes the state of the system lying in a Hilbert space V, A is an operator modeling the physics of the system, and f is some external influence (action / excitation / loading). In particular, A could be some parameter-dependent differential operator, for example,
(2) −∇ · (κ(x, ω) ∇u(x, ω)) = f(x, ω), x ∈ D ⊂ R^d,
where d is the spatial dimension, κ(x, ω) is a random field dependent on a random parameter ω in some probability space (Ω, A, P), and D is the spatial domain. In this article we approximate/represent the input coefficient κ in a low-rank tensor format, compute the solution, and perform all postprocessing in the same low-rank data format. The model (1) depends on some parameter p; in the context of uncertainty quantification the actual value of p is undefined UQLitvinenko12 (); Rosic2013 (). Often this uncertainty is modeled by equipping the parameter set P with a probability measure. Evaluation and quantification of the uncertainty often requires functionals of the state u(p), and the functional dependence of u on p becomes important. Similar situations arise in design, where p may be a design parameter still to be chosen, and one may seek a design such that a functional of the state is, e.g., maximized.
The situation just sketched involves a number of objects which are functions of the parameter values. While evaluating A or f for a certain p may be straightforward, one may easily envisage situations where evaluating the state u(p) or a functional of it may be very costly, as it may rely on some very time consuming simulation or computation, like, for example, running a climate model.
As will be shown in the next section, any such parametric object like A, f, or u may be seen as an element of a tensor product space litvinenkospde2013 (); wahnertstochgalerkin2014 (). This can be used to find very sparse approximations to those objects, and hence much cheaper ways to evaluate parametric quantities in uncertainty quantification, like means, covariances, exceedance probabilities, etc. For this purpose, the dependence of A and f on p has to be propagated to the solution or state vector u, see e.g. matthiesgalerkin2005 ().
The main theoretical result of this work is the approximation of the response surface of a stochastic model in the tensor train (TT) data format. In the next section we outline the general Galerkin, PCE and KLE discretization schemes for a random field. The introduction to TT methods and the new block cross interpolation algorithm are presented in Section 3. More details on how to apply the block cross algorithm to the response surface based on a multivariate polynomial (polynomial chaos expansion, PCE) approximation are given in Section 4.1. We start with the TT approximation of the multidimensional input coefficient κ. After that, in Section 4.2, we construct the stochastic Galerkin matrix in the TT format. Section 4.3 is devoted to efficient postprocessing (computation of level sets, Sobol indices (Section 4.4), the mean value and covariance) in the TT format. Numerical results in Section 5 demonstrate the performance of the TT format applied to the approximate solution of the elliptic boundary problem with uncertain coefficients. The notation list is given in Table 1.
General quantities  

Natural numbers  
The right hand side  
The solution  
Computational domain  
Space where parameter is living  
vector space spanned on the basis  
,  Identity operator, identity matrix of size 
Stochastic quantities and expansions  
KL  Karhunen–Loève (Expansion) 
gPCE  generalized Polynomial Chaos Expansion 
Covariance function  
var  Variance 
Gaussian random field  
Non-Gaussian random field,  
Transformation,  
PCE coefficient at the index  
The mean value  
Multivariate Hermite polynomial with multi-index  
univariate polynomial of variable  
Tensor product of the Hermite polynomial algebra matrices  
A single matrix defined for single indices  
Galerkin matrices with coefficient ,  
Indices, ranges and sets  
Multi-index  
, ,  stochastic multi-indices 
,  Infinite and finite-dimensional multi-index sets 
Sparse multi-index set  
Number of the KLE terms after truncation  
Stochastic dimension  
Physical (spatial) dimension  
polynomial order or vector of them,  
Multidimensional and/or tensor product quantities  
TT  Tensor train data format 
Tensor (Kronecker) product  
strong Kronecker product  
dimensional function  
left quasi-maxvol indices,  
right quasi-maxvol indices,
2 Discretisation and computation
For brevity we follow Matthies_encycl (), where more references may be found; see also the recent monograph LeMatre10 (). Usually (1) is some partial differential equation and has to be discretised, approximated, or somehow projected onto a finite-dimensional subspace
(3) V_N = span{φ_1(x), …, φ_N(x)} ⊂ V.
For example, a set of piecewise linear finite elements may be considered.
To propagate the parametric dependence, choose a finite-dimensional subspace of the Hilbert space for the solution of (1). Via the Galerkin projection, collocation, or other such techniques, the parametric model is thereby formulated on the tensor product space, denoted as
(4) 
For example, the linear elliptic equation (2) is cast to the linear system Ku = f. There are multiple possibilities for the choice of the parametric discretization, and hence of the finite-dimensional subspaces. The solution of (4) is often computationally challenging, as the number of unknowns may be very large. One way to handle high-dimensional problems is to apply a low-rank approximation, representing the entities in (4), e.g. K, u, and f, in a low-rank data format. Several numerical techniques Nouy2009 (); Doostan2009 (); Krosche_2010 (); matthieszanderlowrank2012 () have been developed recently to obtain an approximation to the solution of (4) in such a low-rank form. It is important that such techniques operate only with the elements of the data-sparse low-rank representation, and hence solve the high-dimensional problem efficiently KhorSurvey (); Schwab_sparse_tensors11 (); ostlatensor2009 (); ottt2009 (); tobkressparam2011 (); litvinenkospde2013 (); Sudret_sparsePCE ().
Once the solution has been computed, any other functional of it may be found with relative ease. As soon as the parameter set is equipped with a probability measure, the functionals usually take the form of expectations, variances, exceedance probabilities, or other statistical quantities needed in uncertainty quantification.
2.1 Discretization of the input random field
We assume that κ may be seen as a smooth transformation φ of the Gaussian random field γ(x, ω). In this section we explain how to compute the KLE of γ if the covariance of κ is given. For more details see (ZanderDiss, Section 3.4.2) or KeeseDiss ().
A typical example is the log-normal field with φ(γ) = exp(γ). The Gaussian distribution of γ suggests decomposing φ into the Hermite polynomial series,
(5) κ(x, ω) = φ(x, γ(x, ω)) = Σ_{i≥0} φ_i(x) h_i(γ(x, ω)),
where h_i is the i-th Hermite polynomial, and the series is truncated after finitely many terms.
The Gaussian field γ may be written as the Karhunen–Loève expansion (KLE). First, given the covariance function cov_κ(x, y) of κ, we may relate it with the covariance function cov_γ(x, y) of γ as follows,
(6) cov_κ(x, y) = Σ_{i≥1} i! φ_i(x) φ_i(y) (cov_γ(x, y))^i.
Solving this implicit polynomial equation pointwise in (x, y), we derive cov_γ. Now the KLE of γ may be computed,
(7) γ(x, ω) = Σ_{m≥1} g_m(x) θ_m(ω),
and we assume that the eigenfunctions g_m absorb the square roots of the KL eigenvalues, such that the stochastic variables θ_m are normalized (they are uncorrelated and jointly Gaussian).
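As an illustration, a discrete counterpart of the KLE in (7) can be computed from a covariance matrix by an eigendecomposition. The following is a minimal numpy sketch; the helper name `discrete_kle`, the grid, and the exponential covariance are illustrative assumptions, and the quadrature weights of the spatial discretization are ignored.

```python
import numpy as np

def discrete_kle(C, M):
    """Truncated discrete KLE of a zero-mean field with covariance matrix C:
    returns columns g_m = sqrt(lambda_m) * v_m, so that C is approximated by
    G @ G.T and the stochastic variables are uncorrelated with unit variance."""
    lam, V = np.linalg.eigh(C)
    order = np.argsort(lam)[::-1][:M]               # M largest eigenvalues
    return V[:, order] * np.sqrt(np.maximum(lam[order], 0.0))

# Illustration: exponential covariance on a uniform 1D grid
x = np.linspace(0.0, 1.0, 50)
C = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.5)
G = discrete_kle(C, 10)                             # columns are g_1, ..., g_10
```

Note that, as in the text, the eigenfunctions absorb the square roots of the eigenvalues, so the truncated covariance is simply G G^T.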
In the discrete setting, we truncate the PCE and write it for M random variables,
(8) κ(x, ω) = Σ_{α ∈ J} κ_α(x) H_α(θ(ω)), H_α(θ) = h_{α_1}(θ_1) ⋯ h_{α_M}(θ_M).
Here H_α is the multivariate Hermite polynomial, α = (α_1, …, α_M) is the multi-index of the PCE coefficients, h_{α_m} is a univariate Hermite polynomial, and J is the set of all multinomial orders (the multi-index set). The Galerkin coefficients are evaluated as follows,
(9) κ_α(x) = (|α|! / α!) φ_{|α|}(x) Π_{m=1}^{M} (g_m(x))^{α_m}, |α| = α_1 + ⋯ + α_M, α! = α_1! ⋯ α_M!,
where φ_{|α|} is the Galerkin coefficient of the transform function in (5), and (g_m(x))^{α_m} means just the α_m-th power of the KLE function value g_m(x). The expansion (8) still contains an infinite number of terms. In practice, we restrict the polynomial orders to finite limits, which can be done in different ways. In the current paper, we investigate the following two possibilities.
Definition 1
The full multi-index set is defined by restricting each component independently,
J_p = {α = (α_1, …, α_M) : 0 ≤ α_m ≤ p_m, m = 1, …, M},
where p = (p_1, …, p_M) is a shortcut for the tuple of order limits.
The full set provides high flexibility in the resolution of the stochastic variables wahnertstochgalerkin2014 (); litvinenkospde2013 (). However, its cardinality is equal to (p+1)^M if all order limits equal p. This may become a very large number even if p and M are moderate. In this paper, we do not store all values explicitly, but instead approximate them via a tensor product representation. See more about multi-indices in Sec. 3.2.1 in ZanderDiss ().
A traditional way to get rid of the curse of dimensionality is the sparse expansion set.
Definition 2
The sparse multi-index set is defined by restricting the sum of the components,
J_p^{sp} = {α : α_1 + ⋯ + α_M ≤ p}.
The sparse set contains C(M+p, p) values, which is much less than (p+1)^M. However, the downside is that for a fixed total order p, some variables are resolved worse than others, and the approximation accuracy may suffer. Increasing p may be harmful as well, since in the sparse set it contributes exponentially to the complexity.
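The cardinality gap between the two sets is easy to quantify; the following small sketch compares them (the values M = 20, p = 3 are illustrative assumptions):

```python
from math import comb

def full_set_size(M, p):
    # Full tensor set: orders 0..p chosen independently in each of M variables.
    return (p + 1) ** M

def sparse_set_size(M, p):
    # Total-degree set {alpha : alpha_1 + ... + alpha_M <= p}: C(M+p, p) elements.
    return comb(M + p, p)

M, p = 20, 3                        # illustrative values
print(full_set_size(M, p))          # 4^20, about 1.1e12
print(sparse_set_size(M, p))        # 1771
```

Even for these moderate values the full set is intractable when stored entry by entry, which is exactly the motivation for the compressed TT storage used below.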
The tensor product storage of the full coefficient set scales as O(M p r²), where the rank r is the specific measure of the "data structure". Typically, it grows with the accuracy requirements and the complexity of the data, fortunately in a mild way (say, linearly). In cases when the dimension M is relatively large, this approach may become preferable. Other important features are the adaptivity to the particular data, and the easy assembly of the stiffness matrix of the PDE problem.
Some complexity reduction in formula (9) can be achieved with the help of the KLE of the initial field κ. Consider the expansion
(10) κ(x, ω) = κ̄(x) + Σ_{ℓ≥1} sqrt(μ_ℓ) v_ℓ(x) η_ℓ(ω),
where v_ℓ are eigenfunctions of the integral operator with the covariance cov_κ as the kernel. The set {v_ℓ} can be used as a (reduced) basis. It is difficult to work with (10) straightforwardly, since the distribution of η_ℓ is generally unknown. But we know that the set {v_ℓ}_{ℓ=1}^{L}, where L is the number of KLE terms after the truncation, serves as an optimal reduced basis. Therefore, instead of using (9) directly, we project it onto {v_ℓ}:
(11) κ̃_α(ℓ) = ∫_D κ_α(x) v_ℓ(x) dx, ℓ = 1, …, L.
Note that the range L may be much smaller than the number of spatial discretization points N. After that, we restore the approximate coefficients,
(12) κ_α(x) ≈ Σ_{ℓ=1}^{L} κ̃_α(ℓ) v_ℓ(x).
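The multinomial structure of the PCE coefficients described above can be sketched in code. This is a minimal illustration, assuming the form κ_α = (|α|!/α!) φ_{|α|} Π_m g_m^{α_m}; the helper name and the x-independent transform coefficients `phi` are simplifying assumptions.

```python
import numpy as np
from math import factorial

def pce_coefficient(alpha, phi, g):
    """Sketch of the PCE coefficient:
    kappa_alpha = (|alpha|! / alpha!) * phi_{|alpha|} * prod_m g_m^{alpha_m}.
    phi[i]: Hermite coefficients of the transform (constant in x for brevity);
    g[m]: KLE eigenfunctions sampled on a spatial grid."""
    a = [int(v) for v in alpha]
    s = sum(a)                         # total order |alpha|
    if s >= len(phi):
        return np.zeros_like(g[0])     # transform truncated at order len(phi)-1
    coef = factorial(s)
    for am in a:
        coef //= factorial(am)         # multinomial coefficient |alpha|!/alpha!
    prod = np.ones_like(g[0])
    for gm, am in zip(g, a):
        prod = prod * gm ** am
    return coef * phi[s] * prod

# Sanity check setup: for the identity transform phi(gamma) = gamma, the PCE
# must reproduce the KLE itself.
x = np.linspace(0.0, 1.0, 5)
g = [np.sin(np.pi * x), 0.5 * np.sin(2.0 * np.pi * x)]
phi = [0.0, 1.0, 0.0]
```

With the identity transform, only the first-order coefficients survive and equal the KLE functions g_m, which is a convenient consistency test.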
2.2 Construction of the stochastic Galerkin operator
The same PCE ansatz (8) used for the coefficient may be adopted to discretize the solution u, the test functions of the stochastic Galerkin method, and ultimately the whole initial problem (1), see wahnertstochgalerkin2014 (); litvinenkospde2013 (). To be concrete, we stick to the elliptic equation (2) with a deterministic right-hand side f = f(x).
Given the KLE components (10) and the spatial discretization basis (3), we first assemble the spatial Galerkin matrices,
(13) K_0(i, j) = ∫_D κ̄(x) ∇φ_i(x) · ∇φ_j(x) dx, K_ℓ(i, j) = ∫_D v_ℓ(x) ∇φ_i(x) · ∇φ_j(x) dx,
for i, j = 1, …, N and ℓ = 1, …, L. Now we take into account the PCE part H_α(θ). Assuming that u is decomposed in the same way as (8) with the same J_p or J_p^{sp}, we integrate over the stochastic coordinates and obtain the following rule (see also KeeseDiss (); ZanderDiss (); Matthies_encycl ()),
(14) 
where
(15) Δ_α(β, γ) = E[H_α(θ) H_β(θ) H_γ(θ)] = Π_{m=1}^{M} Δ_{α_m}(β_m, γ_m), Δ_i(j, k) = E[h_i h_j h_k],
the triple product of the Hermite polynomials, with κ̃_α(ℓ) according to (11). For convenience, since Δ is a supersymmetric tensor, we will denote by Δ_α the matrix slice at the fixed index α (or the whole multi-index in the multivariate case). Putting together (12), (13) and (14), we obtain the whole discrete stochastic Galerkin operator,
(16) K = Σ_{ℓ=0}^{L} Σ_{α ∈ J} κ̃_α(ℓ) K_ℓ ⊗ Δ_α,
which is a square matrix of size N(p+1)^M in the case of the full set J_p. Fortunately, if κ̃ is computed in a tensor product format, the direct product structure in (15) allows one to exploit the same format for (16), and to assemble the operator easily.
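The univariate Hermite triple products entering (15) admit a classical closed form, which the following sketch implements and cross-checks by Gauss–Hermite quadrature (a minimal sketch for probabilists' Hermite polynomials with E[h_i²] = i!; the helper names are illustrative):

```python
import numpy as np
from math import factorial
from numpy.polynomial import hermite_e as He

def hermite_triple(i, j, k):
    """E[h_i h_j h_k] for probabilists' Hermite polynomials (E[h_i^2] = i!):
    nonzero iff i+j+k is even and i, j, k satisfy the triangle inequality;
    then the value is i! j! k! / ((s-i)! (s-j)! (s-k)!) with s = (i+j+k)/2."""
    s, rem = divmod(i + j + k, 2)
    if rem == 1 or s < max(i, j, k):
        return 0
    return (factorial(i) * factorial(j) * factorial(k)
            // (factorial(s - i) * factorial(s - j) * factorial(s - k)))

# Cross-check by Gauss-Hermite(e) quadrature w.r.t. the standard Gaussian weight
nodes, weights = He.hermegauss(20)
weights = weights / np.sqrt(2.0 * np.pi)      # normalize exp(-x^2/2) to a density

def quad_triple(i, j, k):
    hi = He.hermeval(nodes, [0.0] * i + [1.0])
    hj = He.hermeval(nodes, [0.0] * j + [1.0])
    hk = He.hermeval(nodes, [0.0] * k + [1.0])
    return float(np.sum(weights * hi * hj * hk))
```

The sparsity pattern of Δ (most triples vanish) is what keeps the stochastic part of the Galerkin operator cheap to store.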
Since the only stochastic input is the permeability κ, the right-hand side is extended to the size of K easily,
(17) f = f̄ ⊗ e_0 ⊗ ⋯ ⊗ e_0,
where e_0 = (1, 0, …, 0)^T is the first identity vector of the size of the polynomial basis, which assigns the deterministic f̄ to the zeroth-order Hermite polynomial in the parametric domain.
3 Tensor product formats and lowrank data compression
3.1 Tensor Train decomposition
We saw that the cardinality of the full polynomial set may rise to prohibitively large values (p+1)^M, and the traditional way to get rid of this curse of dimensionality is the sparsification J_p^{sp}. We propose to use the full set, but approximate the data indirectly in a low-rank tensor product format, which may be more flexible than a prescribed sparsification rule.
To present the techniques as briefly as possible, we choose the so-called matrix product states (MPS) formalism fannesmps1992 (), which introduces the following representation of a multivariate tensor:
(18) u = τ(u^{(1)}, …, u^{(d)}),
u(i_1, …, i_d) = Σ_{s_1=1}^{r_1} ⋯ Σ_{s_{d−1}=1}^{r_{d−1}} u^{(1)}_{s_1}(i_1) u^{(2)}_{s_1, s_2}(i_2) ⋯ u^{(d)}_{s_{d−1}}(i_d)
= U^{(1)}(i_1) U^{(2)}(i_2) ⋯ U^{(d)}(i_d)
= Σ_{s_1, …, s_{d−1}} u^{(1)}_{s_1} ⊗ u^{(2)}_{s_1, s_2} ⊗ ⋯ ⊗ u^{(d)}_{s_{d−1}}.
In numerical linear algebra this format became known under the name tensor train (TT) representation since ottt2009 (); oseltt2011 (). Each TT core (or block) u^{(k)} is defined by r_{k−1} n_k r_k numbers, where n_k denotes the number of basis functions (i.e. the polynomial order) in the variable i_k (the mode size), and r_k is the TT rank. The total number of entries scales as O(d n r²), which is tractable as long as the ranks are moderate.
Each line of the definition (18) is convenient for its own purposes. The first one is a formal statement that a tensor is presented in the TT format, which may be used if we change one block, for example. The second line is the fully expanded elementwise definition, showing the polylinearity of the format. The third line is the seminal matrix product form: after fixing i_k, each block has only two indices s_{k−1}, s_k, and may be considered as a matrix. Thus, the TT format means that each element of the tensor is presented as a product of matrices. Finally, the fourth line is an analog of the canonical polyadic decomposition, convenient for comparing the structures of both formats. Surely, other tensors of interest, e.g. the PCE coefficients, may be written in the same form.
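The matrix product form can be sketched directly in a few lines of numpy (a minimal sketch; the core layout (r_{k−1}, n_k, r_k) with r_0 = r_d = 1 is an assumed storage convention):

```python
import numpy as np

def tt_element(cores, idx):
    """Evaluate u(i_1,...,i_d) as the matrix product U^(1)(i_1) ... U^(d)(i_d);
    cores[k] has shape (r_{k-1}, n_k, r_k) with r_0 = r_d = 1."""
    v = np.ones((1, 1))
    for G, i in zip(cores, idx):
        v = v @ G[:, i, :]          # slice at i is an r_{k-1} x r_k matrix
    return v[0, 0]

# Rank-1 example: u(i1, i2, i3) = a[i1] * b[i2] * c[i3]
a, b, c = np.array([1., 2.]), np.array([3., 4.]), np.array([5., 6.])
rank1 = [a.reshape(1, -1, 1), b.reshape(1, -1, 1), c.reshape(1, -1, 1)]
print(tt_element(rank1, (1, 0, 1)))  # 2 * 3 * 6 = 36
```

Evaluating one entry costs O(d r²) operations, independent of the full tensor size.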
3.2 Analytical construction of the TT format
High-dimensional operators (cf. K in (16)) may be naturally presented in the TT format by putting two indices, row and column, in each TT block, K^{(k)}_{s_{k−1}, s_k}(i_k, j_k).
Example 1
Consider the d-dimensional Laplacian discretized over a uniform tensor grid with n degrees of freedom in each direction. It has the Kronecker (canonical) rank-d representation:
(19) Δ_d = A ⊗ I ⊗ ⋯ ⊗ I + I ⊗ A ⊗ ⋯ ⊗ I + ⋯ + I ⊗ ⋯ ⊗ I ⊗ A,
with A the discretized one-dimensional Laplacian and I the identity matrix. However, the same operator in the TT format is explicitly representable with all TT ranks equal to 2 for any dimension khkazlap2012 (),
(20) Δ_d = [A  I] ⋈ [I  0; A  I] ⋈ ⋯ ⋈ [I  0; A  I] ⋈ [I; A],
where the strong Kronecker product operation "⋈" is defined as a regular matrix product for the first (block) level of TT cores, and the inner blocks (e.g. A, I) are multiplied by means of the Kronecker product. In the elementwise matrix product notation, the Laplace operator reads accordingly.
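The rank-2 TT construction can be verified numerically against the Kronecker-sum form (19); the following is a small sketch for d = 3 and n = 4, where the helper `strong_kron` is an illustrative implementation of the "⋈" contraction:

```python
import numpy as np

def strong_kron(blocks):
    """Contract a chain of operator TT cores: blocks[k][p][q] is an n x n
    matrix indexed by the rank indices (p, q); block-level matrix product,
    inner blocks multiplied by the Kronecker product."""
    cur = blocks[0][0]                       # first block row: list of matrices
    for B in blocks[1:]:
        nxt = []
        for q in range(len(B[0])):           # column index of the next core
            acc = None
            for p in range(len(B)):          # contracted rank index
                term = np.kron(cur[p], B[p][q])
                acc = term if acc is None else acc + term
            nxt.append(acc)
        cur = nxt
    return cur[0]

n = 4
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D Laplacian stencil
I = np.eye(n)
Z = np.zeros((n, n))

L3_tt = strong_kron([[[A, I]], [[I, Z], [A, I]], [[I], [A]]])
L3_cp = (np.kron(np.kron(A, I), I) + np.kron(np.kron(I, A), I)
         + np.kron(np.kron(I, I), A))
print(np.allclose(L3_tt, L3_cp))   # the two representations coincide
```

The TT form stores O(d n²) numbers instead of the O(d² n²) of the canonical sum (and the gap widens once the cores are compressed further).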
3.3 Approximate construction of the TT format
The principal advantage of the TT format compared to, e.g., the canonical polyadic (CP) decomposition (which is also popular in statistics, see litvinenkospde2013 (); wahnertstochgalerkin2014 ()) is a stable quasi-optimal rank reduction procedure oseltt2011 (), based on singular value decompositions. The complexity scales as O(d n r³), i.e. it is free from the curse of dimensionality, while full accuracy control takes place.
But before we outline this procedure, we need to introduce some notation. In the TT definition (18), we see that the TT blocks depend on 3 indices, and in this sense are 3-dimensional tensors. However, when we write computational tensor algorithms, it is convenient (and efficient in practice!) to cast tensor contractions to (sequences of) matrix products. One way to deduce a matrix from a 3-dimensional block is to take a slice, fixing one index, as was used in the matrix product form of (18). Another approach is the concept of reshaping: two of the three indices are grouped into a new multi-index, such that all block elements are written in a matrix. In programming practice, this operation may be explicit, like the reshape function in MATLAB, or implicit, as in a Fortran+LAPACK program, where all data is passed simply as a contiguous array, and the sizes are specified as separate inputs.
In mathematics, this issue may be formalized via the following special symbols.
Definition 3
Given a TT block u^{(k)} of size r_{k−1} × n_k × r_k, introduce the following reshapes:

left-folded block: the matrix of size r_{k−1} n_k × r_k obtained by grouping the rank index s_{k−1} and the mode index i_k into the row index, and

right-folded block: the matrix of size r_{k−1} × n_k r_k obtained by grouping i_k and s_k into the column index,
both pointing to the same data stored in u^{(k)}.
The latter remark about multiple pointers to the same data is a well-known concept in programming, and will help us to present algorithms in a more elegant way, by assigning new contents to a folded block without the need to state the change of the underlying block explicitly afterwards.
Definition 4
A TT block is said to be left-, resp. right-orthogonal, if its left-folded matrix has orthonormal columns, resp. its right-folded matrix has orthonormal rows,
respectively.
Note that for the first and the last blocks, the left and right orthogonality mean simply the orthogonality w.r.t. the tensor indices i_1 and i_d, in the same way as in the dyadic decomposition of a matrix. Since the TT representation is not unique,
for any nonsingular matrices of proper sizes inserted between neighboring blocks, the orthogonalities may be ensured without changing the whole tensor. Indeed, neighboring TT blocks constitute a dyadic decomposition, which may be orthogonalized from either side, e.g. by a QR decomposition of the corresponding folded matrix, with the triangular factor pushed to the neighboring block. Repeating this procedure for all blocks yields a TT representation with the corresponding orthogonality. Algorithm 1 explains the left orthogonalization; the right one is analogous.
Despite the moderate O(d n r³) complexity, these algorithms provide the orthogonality of large tensor train chunks, e.g. the TT interfaces.
Definition 5
The left, resp. right, TT interfaces are defined as the partial matrix products
U^{(<k)}(i_1, …, i_{k−1}) = U^{(1)}(i_1) ⋯ U^{(k−1)}(i_{k−1}), U^{(>k)}(i_{k+1}, …, i_d) = U^{(k+1)}(i_{k+1}) ⋯ U^{(d)}(i_d),
considered as matrices of sizes n_1 ⋯ n_{k−1} × r_{k−1} and r_k × n_{k+1} ⋯ n_d, respectively.
Lemma 1
For a TT tensor with the blocks 1, …, k−1 left-orthogonal, the left interface matrix has orthonormal columns.
Proof
The product of the left interface with its conjugate transpose may be computed by direct summation over the indices i_1, …, i_{k−1}. In the first step, the left-folded first block yields the identity, according to Def. 4. Given the identity after the previous step, in the next step we again obtain the product of a left-folded block with itself, which is the identity; that is, the induction may be continued, and finally we end up with the identity matrix.
Now it is clear how to build the TT rounding procedure: in the first step, we run Alg. 1, making the TT format left-orthogonal, so that the left interfaces are orthogonal as well. Then we perform a small-sized SVD of the last block and cast the rank-truncated right singular vectors to this block; this ensures its right orthogonality, while the left orthogonality of the preceding interface still takes place. So we may perform the SVD of the next block, and so on. The whole procedure is summarized in Algorithm 2. In practice, we typically do not use different thresholds for each step, but instead fix the global accuracy level ε and put the step threshold δ = ε/√(d−1) everywhere.
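Algorithms 1 and 2 can be sketched together as follows. This is a minimal numpy sketch, not the authors' implementation; the helper names and the per-step relative threshold δ = ε/√(d−1) are assumptions.

```python
import numpy as np

def tt_orthogonalize_left(cores):
    """Algorithm 1 sketch: QR sweep; blocks 1..d-1 become left-orthogonal,
    the triangular factor is pushed into the next block."""
    cores = [G.copy() for G in cores]
    for k in range(len(cores) - 1):
        r0, n, r1 = cores[k].shape
        Q, R = np.linalg.qr(cores[k].reshape(r0 * n, r1))
        cores[k] = Q.reshape(r0, n, -1)
        cores[k + 1] = np.einsum('ab,bjc->ajc', R, cores[k + 1])
    return cores

def tt_round(cores, eps):
    """Algorithm 2 sketch: left-orthogonalize, then sweep right-to-left with
    truncated SVDs; the tensor is unchanged up to the relative accuracy eps."""
    cores = tt_orthogonalize_left(cores)
    d = len(cores)
    delta = eps / np.sqrt(max(d - 1, 1))
    for k in range(d - 1, 0, -1):
        r0, n, r1 = cores[k].shape
        U, s, Vt = np.linalg.svd(cores[k].reshape(r0, n * r1),
                                 full_matrices=False)
        keep = len(s)
        while keep > 1 and np.linalg.norm(s[keep - 1:]) <= delta * np.linalg.norm(s):
            keep -= 1                       # discard the smallest singular values
        cores[k] = Vt[:keep].reshape(keep, n, r1)
        cores[k - 1] = np.einsum('aib,bc->aic',
                                 cores[k - 1], U[:, :keep] * s[:keep])
    return cores
```

Applied to a representation with inflated ranks, the sweep shrinks every rank to the numerical rank dictated by eps.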
3.4 Sum and product in the TT format
Equipped with the robust rank reduction technique, we may think of tensor train arithmetic, because algebraic operations are inherited naturally from low-rank or CP decompositions. For example, the sum z = x + y for x and y given in the form (18) casts to the TT blocks of z as follows,
(21) z^{(k)}(i_k) = [x^{(k)}(i_k)  0; 0  y^{(k)}(i_k)], k = 2, …, d−1,
and z^{(1)}(i_1) = [x^{(1)}(i_1)  y^{(1)}(i_1)], z^{(d)}(i_d) = [x^{(d)}(i_d); y^{(d)}(i_d)] at the borders. Matrix, pointwise and scalar products follow similarly; see below and oseltt2011 () for more details. The main feature of all such operations in the format is that the TT ranks may grow; fortunately, in many applications the rounding procedure allows to reduce them to a moderate level.
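The summation rule can be sketched as follows (a minimal numpy sketch; concatenating the boundary cores instead of zero-padding them is the usual convention):

```python
import numpy as np

def tt_add(x_cores, y_cores):
    """z = x + y in the TT format: first cores concatenate along columns,
    last cores along rows, middle cores go block-diagonally; ranks add up."""
    d = len(x_cores)
    z = []
    for k, (X, Y) in enumerate(zip(x_cores, y_cores)):
        rx0, n, rx1 = X.shape
        ry0, _, ry1 = Y.shape
        if k == 0:
            Z = np.concatenate([X, Y], axis=2)        # 1 x n x (rx1 + ry1)
        elif k == d - 1:
            Z = np.concatenate([X, Y], axis=0)        # (rx0 + ry0) x n x 1
        else:
            Z = np.zeros((rx0 + ry0, n, rx1 + ry1))
            Z[:rx0, :, :rx1] = X                      # block-diagonal placement
            Z[rx0:, :, rx1:] = Y
        z.append(Z)
    return z
```

The ranks of the sum are the sums of the ranks, which is why a rounding step usually follows.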
The scalar (dot, inner) product of two vectors is equal to a product of all TT blocks of both vectors, followed by a summation over all rank and mode indices. However, a naive implementation of this formula would suffer from a complexity exponential in d. Using the block foldings (Def. 3) and auxiliary interface quantities, the scalar product may be computed at the O(d n r³) cost, as shown in Algorithm 3.
Note that Algorithm 3 remains consistent if the left TT ranks are not equal to one. In this case, we simply return a matrix containing the pairwise scalar products of the corresponding rank slices. This is especially convenient when we work with response surfaces for sPDEs, where the first rank index may carry the spatial grid size N. An example is the computation of the variance in Section 4.3.
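Algorithm 3 can be sketched by carrying the interface matrix through the train (a minimal numpy sketch; the helper name is illustrative):

```python
import numpy as np

def tt_dot(x_cores, y_cores):
    """<x, y> via one sweep: carry the r^x by r^y interface matrix P and
    absorb one TT block of each vector per step; cost O(d n r^3)."""
    P = np.ones((1, 1))
    for X, Y in zip(x_cores, y_cores):
        PX = np.einsum('ab,aic->bic', P, X)   # contract the x-rank index
        P = np.einsum('bic,bid->cd', PX, Y)   # contract the y-rank and mode index
    return P[0, 0]
```

If the first ranks are larger than one, returning the final matrix P instead of P[0, 0] gives exactly the batch of scalar products mentioned above.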
3.5 Block TTCross interpolation algorithm
Calculation of the PCE formula (11) could be a difficult task in tensor formats, since the tensor indices enter the factorial functions, or even enumerate the elements of another vector, in our case. To compute the response surface in the TT format in this way, we need to be armed with the following technique:

given a procedure to compute each element of a tensor, e.g. by (11),

build a TT approximation using a feasible number of evaluated elements, i.e. much fewer than the total number of tensor entries.
Such a tool exists, and relies on the cross interpolation of matrices, generalized to the higher-dimensional case otttcross2010 (); sodmrgi2011proc (); savqott2014 (). The principal ingredient is the efficiency of an incomplete Gaussian elimination in the approximation of a low-rank matrix, also known as the Adaptive Cross Approximation (ACA) bebe2000 (); bebeaca2011 (). Given a matrix, we select a small number of "good" columns and rows to define the whole matrix,
(22) A ≈ A(:, J) A(I, J)^{−1} A(I, :),
where A(I, J) is the submatrix on the intersection of the selected rows I and columns J, both sets of cardinality r. It is known that there exists a quasi-optimal set of interpolating indices.
Lemma 2 (Maximum volume (maxvol) principle gtmaxvol2001 ())
If I and J are such that the determinant of A(I, J) is maximal in modulus among all r × r submatrices of A, then
‖A − A(:, J) A(I, J)^{−1} A(I, :)‖_C ≤ (r + 1) σ_{r+1}(A), where ‖·‖_C is the Chebyshev norm, ‖A‖_C = max_{i,j} |A(i, j)|, and σ_{r+1}(A) is the (r+1)-th singular value.
In practice, however, the computation of the true maxvol submatrix is infeasible, since it is an NP-hard problem. Instead, one performs a heuristic alternating iteration gostzmaxvol2010 (): we start with some (e.g. random) low-rank factor, determine row indices yielding a quasi-maxvol submatrix in it, and compute the corresponding rows of A. Vice versa, in the next step we find quasi-maxvol column indices in this row factor and calculate the corresponding columns of A, collecting them into a newer column factor, which hopefully approximates the true low-rank factor better than the initial guess. This process continues until convergence, which appears to be quite satisfactory in practice.
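The alternating maxvol iteration for matrices can be sketched as follows. This is a minimal numpy sketch under the assumption of an exactly low-rank matrix; `maxvol` and `cross_approx` are illustrative helpers, not the referenced implementations.

```python
import numpy as np

def maxvol(A, tol=1.05, max_iters=100):
    """Quasi-maxvol rows of a tall n x r matrix A: start from LU pivots,
    then swap rows while some coefficient exceeds tol in modulus."""
    n, r = A.shape
    piv, B = np.arange(n), A.copy()
    for k in range(r):                      # partial-pivoting elimination
        p = k + np.argmax(np.abs(B[k:, k]))
        B[[k, p]] = B[[p, k]]
        piv[[k, p]] = piv[[p, k]]
        B[k + 1:, k:] -= np.outer(B[k + 1:, k] / B[k, k], B[k, k:])
    idx = piv[:r].copy()
    for _ in range(max_iters):
        C = A @ np.linalg.inv(A[idx])       # interpolation coefficients
        i, j = np.unravel_index(np.argmax(np.abs(C)), C.shape)
        if abs(C[i, j]) <= tol:             # no coefficient above tol: done
            break
        idx[j] = i                          # swap in the dominating row
    return idx

def cross_approx(get_entry, n, m, r, sweeps=5):
    """Alternate maxvol over columns and rows, sampling only O((n+m) r)
    entries per sweep, then interpolate as in (22)."""
    rng = np.random.default_rng(0)
    J = rng.choice(m, size=r, replace=False)
    for _ in range(sweeps):
        U = np.array([[get_entry(i, j) for j in J] for i in range(n)])
        I = maxvol(U)                       # good rows of the column factor
        V = np.array([[get_entry(i, j) for j in range(m)] for i in I])
        J = maxvol(V.T)                     # good columns of the row factor
    U = np.array([[get_entry(i, j) for j in J] for i in range(n)])
    return U @ np.linalg.inv(U[I]) @ V      # A(:,J) A(I,J)^{-1} A(I,:)
```

For an exactly rank-r matrix the interpolation is exact whenever the intersection submatrix is nonsingular, which the maxvol choice also keeps well-conditioned.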
In higher dimensions we proceed similarly, thanks to the polylinear structure of the TT format, which constitutes a recurrent matrix low-rank factorization. Indeed, if we encapsulate the first and the last indices into two multi-indices, the TT format (18) may be seen as a dyadic factorization with the interfaces (Def. 5) playing the roles of the factors:
Starting from an initial guess, we iterate the alternating cross interpolation, computing the maxvol indices in each TT block subsequently, while an update of the k-th block is defined by elements of the initial tensor. Similarly to (22), we compute the left quasi-maxvol indices from the tall left-folded matrix, and the right quasi-maxvol indices from the wide right-folded matrix. Then the TT analog of (22) writes
(23) 
where the middle term is a sample of the sought tensor. This sampling runs through the whole range of i_k and only those of the remaining indices that belong to the left and right index sets. After that, the new block is used in the maxvol algorithm for the derivation of the next index set as follows. Let us be given a set of left index vectors of length k−1 each. We concatenate it with all possible values of i_k, and obtain a set of vectors of length k each. The resulting range is exactly the row size of the left-folded block, and hence the maxvol algorithm will return positions in this range; the new left set is taken as the set of tuples labeled by these positions. In the same way, the right indices may be built in a recurrent fashion.
In total, only O(I d n r²) entries need to be evaluated, where I denotes the number of alternating iterations (see Alg. 4), typically a small number in practice. In each step, given r left index vectors, n values of i_k and r right index vectors, it is affordable to call, e.g., Formula (11) n r² times.
However, note that each call of (11) throws L values, corresponding to the different ℓ. We may account for this in various ways. Since ℓ has the meaning of the reduced spatial variable, we may feature it as an additional dimension. Nonetheless, when we restrict the indices, we remove some values of ℓ from consideration. Therefore, a vast majority of the information goes in vain: many values are evaluated, but only a few of them are used to improve the approximation. Another way is to run L independent cross algorithms to approximate each component in its own TT format. This is also undesirable: we would have to add the TT formats together, summing their ranks, which must be followed by the rounding procedure of Algorithm 2. The asymptotic complexity of this variant was found too expensive in practice.
A better approach is to store all L components in the same TT representation, employing the idea of the block TT format dkoseigb2014 (). The resulting method has a threefold advantage: all data in each call of (11) is assimilated, the algorithm adjusts the TT ranks automatically according to the given accuracy, and the output is returned as a single optimally compressed TT format, convenient for further processing.