Function-train: a continuous analogue of the tensor-train decomposition

# Function-train: a continuous analogue of the tensor-train decomposition

Alex Gorodetsky111Massachusetts Institute of Technology, Cambridge, MA 02139, USA; {goroda,sertac,ymarz}@mit.edu.    Sertac Karaman111Massachusetts Institute of Technology, Cambridge, MA 02139, USA; {goroda,sertac,ymarz}@mit.edu.    Youssef Marzouk111Massachusetts Institute of Technology, Cambridge, MA 02139, USA; {goroda,sertac,ymarz}@mit.edu.
###### Abstract

We describe a new function approximation framework based on a continuous extension of the tensor-train decomposition. The approximation, termed a function-train (FT), results in a tensor-train structure whose cores are univariate functions. An advantage of the FT over discrete approaches is that it produces an adaptive approximation of tensor fibers that is not tied to any tensorized discretization procedure; indeed, the algorithm can be coupled with any univariate linear or nonlinear approximation procedure. Furthermore, the representation of low-rank functions in FT format enables efficient continuous computation: we can add, multiply, integrate, and differentiate functions in polynomial time with respect to dimension. Our approach is in the spirit of other continuous computation packages such as Chebfun, and yields an algorithm which requires the computation of “continuous” matrix factorizations such as the LU and QR decompositions of vector-valued functions. Our contributions include creating an algorithm for finding the maximum volume submatrix of a matrix-valued function, a maximum-volume based cross approximation algorithm to obtain skeleton decompositions of vector-valued functions, a cross approximation algorithm for converting black-box functions into FT format, and a continuous rounding algorithm that re-approximates an FT by one of lower ranks. We demonstrate the benefits of our approach by integrating high-dimensional and discontinuous functions, and apply it to a variety of approximation tasks.

Key words. tensor decomposition, tensor-train, Chebfun, continuous matrix factorizations

## 1 Introduction

Computing with functions is an emerging alternative to the discretize-then-solve [29] methodology traditionally used in computational science. This new paradigm has been enabled by the development of continuous extensions to the common and widely used linear algebra techniques that underpin virtually all computer simulations. For example, [3, 33, 41, 42, 14] introduce an extension of MATLAB, called Chebfun, for computing with functions of one, two [41], and three [24] variables. Chebfun implements continuous versions of the LU, QR, and singular value decompositions; enables approximation, optimization, and rootfinding; and provides methods for solving differential equations without discretization.

In this work, we develop methods that efficiently extend continuous computation to high dimensions. Our approach represents functions of many variables in a compressed form, where the degree of compression is tied to a notion of the function’s rank. In particular, we draw on recent developments in tensor decompositions, mainly the tensor-train (TT) decomposition [32, 30], to create an analogous continuous decomposition called the function-train (FT). The most important difference between the FT and the TT is that FT cores are matrix-valued functions, rather than third-order arrays. The matrix-valued functions represent sets of univariate functions, and we follow Chebfun’s example by representing each univariate function in some basis, e.g., orthonormal or piecewise polynomials. This continuous representation, along with our proposed compression framework, provides many practical advantages over computing with the multi-way arrays arising from high-dimensional discretization procedures. These advantages include:

• Exploration of the continuous input space of a high-dimensional function, rather than an a priori discretized set of inputs;

• On-the-fly adaptation to local features of a function, rather than iterative a posteriori grid refinement;

• Continuous multilinear algebra techniques that enable fast addition, multiplication, contraction, integration, differentiation, and other operations with functions in FT format.

Broadly, these advantages arise from the integration of two areas: tensor decompositions and continuous computation. Computing with low-rank tensors has been shown to mitigate the curse of dimensionality in many applications [23, 9, 32, 6], whenever low-rank structure is present. Yet many problems are not naturally posed in the discrete setting required by tensors. For example, representing a function with localized features, e.g., a probability density function, may require specialized tensor-product discretizations around areas of large density. Specifying such a discretization before exploring the function, however, is extremely difficult; a method for exploring and adapting to local structure while building the approximation is desired. Computing with functions effectively renders the discretization step transparent to the user, while simultaneously enabling a variety of adaptive and non-tensorized discretization procedures to be applied. One concrete example of this benefit, in the context of cross approximation of tensors, is the use of continuous optimization procedures to identify fibers to be evaluated. This approach naturally causes function evaluations to be clustered around local features; examples of such clustering are shown in Section LABEL:sec:implementation. Another advantage of a continuous decomposition arises when a function must be evaluated outside of its discretized locations. In [21], for example, we represent the value function of an stochastic control problem in discrete tensor-train format. However, to evaluate the control law for states outside of the discretization, specialized interpolation procedures are needed. If the value function were instead represented as a function, the steps required for its evaluation anywhere in the state space would automatically be defined.

This work thus builds a bridge between continuous computation and low-rank decompositions, helping realize the advantages of the former in truly high-dimensional settings. A central contribution of this paper, which enables the constructions described above, is the extension of discrete cross approximation algorithms to the continuous case. In Section LABEL:sec:2dcur, we introduce a CUR/skeleton decomposition for vector-valued functions, and develop a cross approximation algorithm to obtain such skeleton decompositions. To this end, we develop a maxvol algorithm for finding optimal row and column pivots through continuous optimization techniques, described in Sections LABEL:sec:maxvol and LABEL:sec:cross. Next, in Section LABEL:sec:ft, we develop existence results for the FT decomposition based on “unfolding” ranks of a function (essentially continuous analogues of standard tensor-train existence results), and we extend the cross approximation algorithm to functions of many variables in order to represent and approximate such functions in FT format. Finally, in Section LABEL:sec:mla, we develop a calculus of operations for functions in FT format that can be used to perform compressed continuous computation in the Chebfun style. Sections LABEL:sec:implementation and LABEL:sec:res provide numerical demonstrations of these ideas. The resulting algorithms are well suited to computation in high dimensions: their costs scale at most polynomially with the dimension of the input space and the function’s rank. The concept map in Figure LABEL:fig:conceptmap illustrates how various areas of continuous linear algebra relate to the function-train decomposition, and will help serve as a roadmap for this paper.

## 2 Preliminaries

### 2.1 Notation

Let denote the set of positive integers and the set of reals. Let for be a tensor product of one-dimensional intervals with for . Let denote the Lebesgue measure over the interval and the Lebesgue measure over . We will write scalar-valued functions as , vector-valued functions as , and matrix-valued functions as for . Integrals will always be with respect to the Lebesgue measure. For example, should be understood as , and similarly .

We will reference specific outputs of or using square brackets and the indices and . In other words, components of vector-valued and matrix-valued functions will be written as and , respectively, such that and are then scalar-valued functions. We will reserve subscripts for indexing different instances of the same type of function, such that and are distinct vector-valued functions, and are distinct matrix-valued functions, etc.

Let and , such that . We can then consider the separated representations of scalar-valued functions, vector-valued functions, and matrix-valued functions as, respectively:

 fk:X≤k×X>k→R, where fk({x1,…,xk},{xk+1,…,xd})=f(x1,…,xd), \hb@xt@.01(2.1) Fk:X≤k×X>k→Rn, where Fk({x1,…,xk},{xk+1,…,xd})=F(x1,…,xd), Fk:X≤k×X>k→Rn×m, where Fk({x1,…,xk},{xk+1,…,xd})=F(x1,…,xd),

for and . Also, the shorthand will denote and will denote . We will frequently also combine the separation notation and vector-valued function referencing to obtain objects such as

 Fk[i](⋅,x>k) :X≤k→R,

which is the th component of a vector-valued function defined on the tensor product space of the first inputs, with the last dimensions fixed.

Finally, the norm denotes the maximum-modulus value of the vector-valued function . Similarly, denotes the maximum-modulus element of the matrix .

### 2.2 Continuous linear algebra

Approximating a black-box tensor in tensor-train format requires performing a sequence of standard matrix factorizations, e.g., LU, QR, and singular value decompositions. Our continuous framework requires continuous equivalents of these factorizations. In particular, the factorizations required to obtain the FT must be performed for three elements of continuous linear algebra: scalar-valued functions, vector-valued functions, and matrix-valued functions. In this section, we will describe how these three elements relate to one another, and define their products and factorizations.

The LU and QR factorizations and the singular value decomposition (SVD) of a vector-valued function are primary ingredients of any continuous numerical linear algebra package such as Chebfun [33] or ApproxFun [28]. For vector-valued functions of one variable, these decompositions are discussed in [3]. Computing these decompositions often requires continuous analogues of standard discrete algorithms. For example, Householder triangularization may be used to compute the QR decomposition of a vector-valued function [43]. Extensions of these algorithms to functions of two variables are described in [42]. We will use these bivariate extensions as building blocks for our higher dimensional constructions, and thus we briefly review the relevant literature below.

#### 2.2.1 Elements of continuous linear algebra

A vector-valued function can be viewed as an array of scalar-valued functions. Suppose that are a collection of real scalar-valued functions on . Then a vector-valued function consists of functions such that

 F[i](x)=fi(x).

The structure of has multiple interpretations which we find useful in different contexts.

• ” vector (used in Sections LABEL:sec:lu for the LU decomposition and LABEL:sec:qr for the QR decomposition):

• Each row of the vector is indexed by .

• The functions are “stacked” on top of one another, e.g.,

 F(x)=⎡⎢ ⎢⎣f1(x)⋮fn(x)⎤⎥ ⎥⎦.
• ” quasimatrix111Called a quasimatrix because it corresponds to a matrix of infinite rows and columns. See e.g., [3, 42] (used in Sections LABEL:sec:luLABEL:sec:qr, and LABEL:sec:maxvolskinny for the maxvol algorithm):

• Each row of the quasimatrix is indexed by .

• Each column of the quasimatrix is indexed by .

• The functions are placed “next to” one another, e.g.,

 F(x)=[f1(x) … fn(x)].
• ” matrix for some separated form with (used in Section LABEL:sec:cskel to study the maxvol property):

• Rows are indexed by tuples

• Columns are indexed by

A collection of vector-valued functions forms a matrix-valued function. In particular, suppose that we have vector-valued functions . Then a matrix-valued function with columns and rows can be written as

 F=[F1 F2⋯Fm],

where the th column , for . Also, can be viewed as a “matrix” with an infinite number of rows and columns. Each “row” of this matrix corresponds to a value of the index .

We now define inner products of vector-valued and matrix-valued functions, as they will be critical to defining the LU, QR, and SVD factorizations of matrix-valued functions. First, recall that the inner product between two functions and is

 ⟨f,g⟩=∫Xf(x)g(x)dx.

The inner product of two vector-valued functions and is defined as

 ⟨F,G⟩=n∑i=1⟨F[i],G[i]⟩. \hb@xt@.01(2.2)

We use a similar definition for the inner product of two matrix-valued functions and ,

 ⟨F,G⟩=m∑i=1n∑j=1⟨F[i,j],G[i,j]⟩, \hb@xt@.01(2.3)

where this expression can be interpreted as the inner product of two flattened vector-valued functions, and is analogous to the square of the Frobenius norm of a matrix. It is straightforward to verify that these inner product definitions satisfy the necessary symmetry, linearity, and positivity conditions.

In addition, we define products between arrays of functions (vector- and matrix-valued functions) and arrays of scalars (vectors and matrices), as shown in Table LABEL:tab:prod1.

We also define products between functional elements. Let and , where . Then products between vector-valued or matrix-valued functions on these domains yield vector-valued or matrix-valued functions on the space . Notation and a summary of these operations are provided in Table LABEL:tab:prod2.

We now turn to factorizations of vector- and matrix-valued functions.

#### 2.2.2 LU factorization

To extend the cross approximation and maxvol algorithms to the continuous setting, we will require the LU decomposition of a vector-valued function. The key components of this decomposition are a vector-valued function (here written as a quasimatrix, with scalar-valued functions ) that is “psychologically” lower triangular [42] and a set of pivots , . A psychologically lower triangular vector-valued function is defined such that column has zeros at all for . Furthermore, if , then is unit lower triangular. Finally, if for all , then is diagonally maximal. Using these definitions we recall the definition of an LU decomposition of a vector-valued function [3, 42]:

###### Definition 2.1 (LU factorization of a vector-valued function [3])

Let be a vector-valued function. An LU factorization of is a decomposition of the form , where is upper triangular and is unit lower triangular and diagonally maximal.

The LU factorization may be computed using Gaussian elimination with row pivoting according to the algorithm in [42].

We can extend this definition of an LU factorization to matrix-valued functions. In particular, the decomposition will result in a psychologically lower triangular matrix-valued function ; an upper triangular matrix ; and a set of pivots The definition of the pivots is motivated by viewing as collection of columns , where each column is a vector-valued function to be interpreted as an “” vector, as described in Section LABEL:sec:elements. Each pivot is then specified by a (row, value) tuple. A lower triangular matrix-valued function is defined such that has zeros at all ; that is, has no enforced zeros, has a zero in row at the value , has zeros at and , etc. Furthermore, if then is called unit lower triangular, and if for all and for all , then is diagonally maximal. Using these notions we define an LU decomposition of a matrix-valued function.

###### Definition 2.2 (LU factorization of a matrix-valued function)

Let be a matrix-valued function. An LU factorization of is a decomposition of the form where is upper triangular, and is unit lower triangular and diagonally maximal.

We also implement the LU decomposition of a matrix-valued function using Gaussian elimination with row-pivoting.

#### 2.2.3 QR factorization

Another decomposition that will be necessary for function-train rounding and for the cross approximation of multivariate functions is the QR factorization of a quasimatrix.

###### Definition 2.3 (QR factorization of a vector-valued function [3])

Let be a vector-valued function. A QR factorization of is a decomposition of the form , where the vector-valued function consists of orthonormal scalar-valued functions and is an upper triangular matrix.

This QR factorization can be computed in a stable manner using a continuous extension of Householder triangularization [43]. In this paper, we also require the QR decomposition of a matrix-valued function.

###### Definition 2.4 (QR factorization of a matrix-valued function)

Let be a matrix-valued function. A QR factorization of is a decomposition of the form , where the columns of are orthonormal vector-valued functions and is an upper triangular matrix.

Since we have defined the inner product of vector-valued functions in (LABEL:eq:qminner) and therefore are able to take inner products of the columns of , we can also compute this factorization in a stable manner using Householder triangularization. We can consider a notion of rank for both vector-valued and matrix-valued functions to be the number of nonzero elements of the diagonal of .

#### 2.2.4 Singular value decomposition

Many of our theoretical results will employ the functional SVD.

###### Definition 2.5 (Functional SVD)

Let and let be in . A singular value decomposition of is

 g(y,z)=∞∑j=1σjuj(y)vj(z), \hb@xt@.01(2.4)

where the left singular functions are orthonormal in , the right singular functions are orthonormal in , and are the singular values.

In practice, the summation in (LABEL:eq:svd) is truncated to some finite number of terms and we group the first left singular functions into the vector-valued function such that . Similarly, we group the right singular functions into a vector-valued function , with . If we also define , then we can write the functional SVD as . This notion of the SVD of a function has existed for a while [37, 38] and is also called the Schmidt decomposition [37]. In general, convergence can be assumed to be in . As described in [42], when is also Lipschitz continuous, then the series in (LABEL:eq:svd) converges absolutely and uniformly.

The functional SVD is useful for analyzing certain separated representations of multivariate functions . For example, in Section LABEL:sec:exact we show that the ranks of our function-train representation are bounded by the SVD ranks of the separated functions , where we put and in the above definition.

Next, we present a decomposition similar to the functional SVD, but for vector-valued rather than scalar-valued functions. We call the decomposition an extended SVD, because it shares some properties with the functional SVD, such as a separation rank, and because it decomposes the vector-valued function into a sum of products of orthonormal functions. This decomposition appears in the proofs of Theorems LABEL:th:exact and LABEL:th:approx, as well as in the skeleton decomposition of a multivariate function described in Section LABEL:sec:crosstensor.

###### Definition 2.6 (Extended SVD)

Let and let be a vector-valued function such that for . A rank extended SVD of is a factorization

 G[i](y,z)=r∑j=1σjUj[i](y)vj(z), \hb@xt@.01(2.5)

where the left singular functions are vector-valued and orthonormal,222Orthonormality here is defined with respect to the inner product in (LABEL:eq:qminner). the right singular functions are scalar-valued and orthonormal, and are the singular values.

We can combine the left singular functions to form the matrix-valued function where . And as in the functional SVD, we can group the right singular functions to form the vector-valued function such that . If we also gather the singular values in a diagonal matrix , then the extended SVD can be written as

The main interpretation of this decomposition is that contains functions that have the same right singular vectors and different left singular vectors. We will exploit two properties of this decomposition for the proof of Theorem LABEL:th:curexist. The first is the fact that for any fixed , the vector-valued function can be represented as a linear combination of the columns of . Second, for any fixed and column , the function can be represented as a linear combination of the columns of .

Again, in the case of functions of more than two variables, the extended SVD can be applied to the function’s -separated representation. In particular, if and we have a vector-valued function , then we can consider the extended SVD of the separated form , where we put , , and in the definition above.

## 3 Continuous skeleton/CUR decomposition

In Section LABEL:sec:exact we show that the FT representation of a function can be computed by performing a sequence of SVDs of various separated versions of . Such an algorithm would require a prohibitive number of function evaluations, however. In this section, we develop an alternative low-rank decomposition that requires evaluations of the function only at relatively few points, lying along certain “fibers” of the input domain. This decomposition is a continuous version of the skeleton/CUR decomposition of a matrix [18, 26, 7], and is critical to the practical black-box approximation algorithm described in Section LABEL:sec:crosstensor. We develop this continuous CUR decomposition for the specific case of vector-valued functions. The section is split into three parts. First we establish conditions for the existence of a continuous CUR decomposition. Then we motivate a construction of the decomposition based on the maximum volume concept. Finally, we describe a cross approximation algorithm for computing the decomposition.

Before introducing the continuous CUR decomposition, we formally describe its components. In particular, we need to formulate the notion of a fiber of a vector-valued function, which is analogous to a row or column of a matrix. In particular, we consider the -separated form of a vector-valued function , and view it as an “” matrix (see Section LABEL:sec:elements) where the rows are indexed by and the columns are indexed by . According to this interpretation, row fibers are scalar-valued functions and column fibers are vector-valued functions.

###### Definition 3.1 (Row fiber)

A row fiber of a vector-valued function is the scalar-valued function obtained by fixing an (index, value) pair so that

 rα(x>k)=Fk[i](z,x>k).
###### Definition 3.2 (Set of row fibers)

A set of row fibers of a vector-valued function is the vector-valued function obtained by fixing a set of tuples , where , so that

 Rα=[rα1 rα2 ⋯ rαℓ],

where , .

We have corresponding definitions for column fibers and a set of column fibers.

###### Definition 3.3 (Column fiber)

A column fiber of a vector-valued function is the vector-valued function obtained by fixing an element such that

 Cy(x≤k)=Fk(x≤k,y).

We can group a set of column fibers together to obtain a matrix-valued function as follows.

###### Definition 3.4 (Set of column fibers)

A set of column fibers of a vector-valued function is the matrix-valued function obtained by fixing a set , where for , such that

 Cy=[Cy1 Cy2 ⋯ Cyℓ].

The intersection of a set of row fibers and a set of column fibers forms a submatrix.

###### Definition 3.5 (Submatrix of a vector-valued function)

A submatrix of a vector valued function is the matrix obtained by fixing a set of columns and a set of rows , as in Definitions LABEL:def:cfs and LABEL:def:rfs respectively, such that

 ¯Fk[a,b]=F[ia](za,yb) for 1≤a,b≤ℓ, \hb@xt@.01(3.1)

or equivalently,

 ¯Fk[a,b]=Cy[ia,b](za)=Rα[a](yb). \hb@xt@.01(3.2)

Next, we discuss the notion of a skeleton or CUR decomposition of vector-valued functions.

### 3.1 Existence of the skeleton decomposition of vector-valued functions

In this subsection, we define the skeleton decomposition of a vector-valued function and establish conditions for its existence, based on the SVD of the -separated form of the function. The existence of a skeleton decomposition for a scalar-valued function can then be obtained by choosing . Note that the skeleton decomposition of a function has already been used, for example, in [4, 5]. These analyses make smoothness assumptions on the function. Our results, in contrast, formulate the skeleton decomposition using only low-rank properties of the function, without introducing any explicit smoothness assumptions.

###### Definition 3.6 (Skeleton/CUR decomposition of a vector-valued function)

Let be a vector-valued function. The rank skeleton decomposition of separated form is a factorization of the type

 Fk[i](x≤k,x>k)=Cy[i,:](x≤k)GRα(x>k),∀i∈{1,…,n},x≤k∈X≤k, and x>k∈X>k, \hb@xt@.01(3.3)

where is a matrix-valued function representing a set of column-fibers (Definition LABEL:def:cfs), is an matrix, and is a vector-valued function representing a set of row-fibers (Definition LABEL:def:rfs).

###### Theorem 3.7

Let be a vector-valued function and let be its -separated form for any . If has a rank- extended SVD, then a rank skeleton decomposition (LABEL:eq:cur) of exists.

Proof. The proof is constructive and requires one to choose linearly independent333A set of linearly independent scalar-valued functions , is one for which the only solution to the equation is for An analogous definition holds for vector-valued functions. column- and row-fibers of . The intersection of these fibers will form the submatrix such that choosing , where refers to the Moore-Penrose pseudoinverse of matrix , will yield a correct construction. The proof strategy then involves partitioning the input space into four sets and showing pointwise equality on each partition.

Choose a set of row indices (see Definition LABEL:def:rfs) such that the vector-valued function contains linearly independent scalar-valued functions , . Next, choose a set of column indices (see Definition LABEL:def:cfs) such that the matrix-valued function contains linearly independent vector-valued functions , . Note that these choices also define a submatrix . Furthermore, let be the proposed skeleton decomposition

 ~F(x≤k,x>k)=Cy(x≤k)[¯Fk]†Rα(x>k). \hb@xt@.01(3.4)

We now seek to show that and are pointwise equal.

We show by decomposing the space into four partitions: , , , and , where and . We also rely on the following property of the Moore-Penrose pseudoinverse of a matrix :

 AA†A=A. \hb@xt@.01(3.5)

First, we define matrices such that

 C[a,b] =Cy[ia,b](za)=Cyb[ia](za),and \hb@xt@.01(3.6) R[a,b] =Rα[a](yb)=rαa(yb), \hb@xt@.01(3.7)

where for , and . One can verify that

 C=R=¯Fk \hb@xt@.01(3.8)

using (LABEL:eq:crequality).

Now we consider the first partition . Showing equality over this partition is equivalent to showing the equality of and the corresponding submatrix of . We proceed using (LABEL:eq:ftilde), (LABEL:eq:cmat), and (LABEL:eq:rmat) to represent the elements of as

 ¯~Fk[a,b] =~F[ia](za,yb)=Cy[ia,:](za)[¯Fk]†Rα(yb), =C[a,:][¯Fk]†R[:,b],

where for , and . According to (LABEL:eq:CRequiv) and (LABEL:eq:mp), we obtain

 ¯~Fk=C[¯Fk]†R=¯Fk[¯Fk]†¯Fk=¯Fk.

Thus, and are equivalent over this partition.

Next we consider the partition . By definition of linear independence and the SVD rank of a function, there exists a such that

 Fk[i](x≤k,x>k)=ℓ∑j=1qi,x≤k[j]rαj(y)∀x>k∈X>k. \hb@xt@.01(3.9)

In other words, every scalar-valued function can be written as a linear combination of the scalar-valued functions comprising . This property follows directly from the fact that contains linearly independent functions and that has an extended SVD of rank . In particular, for all we have

 Fk[i](x≤k,yb)=Cy[i,b](x≤k)=qi,x≤kR[:,b],∀(i,x≤k)∈{1,…,n}×X≤k \hb@xt@.01(3.10)

We now show that (LABEL:eq:cur) holds for and :

 ~F[i](x≤k,yb) =Cy[i,:](x≤k)[¯Fk]†R[:,b] =qi,x≤kR[¯Fk]†R[:,b] =qi,x≤k¯Fk[¯Fk]†¯Fk[:,b] =qi,x≤k¯Fk[:,b] =qi,x≤kR[:,b] =Fk[i](x≤k,yb)

where the first equality comes from the definition of , the second comes from (LABEL:eq:notxb), the third comes from (LABEL:eq:CRequiv), the fourth equality comes from using (LABEL:eq:mp), the fifth also comes from (LABEL:eq:CRequiv), and the sixth comes from another application of (LABEL:eq:notxb).

We can proceed with a symmetric argument for the partition . Linear independence and finite rank require that for all , there exists a such that

 Fk[i](x≤k,y)=ℓ∑j=1Cy[i,j](x≤k)qy[j]∀(i,x≤k)∈{1,…,n}×X≤k.

We can obtain an analogue to (LABEL:eq:notxb) by noticing that for all we have

 Fk[ia](za,x>k)=Rα[a](x>k)=C[a,:]qx>k,∀x>k∈X>k. \hb@xt@.01(3.11)

Using (LABEL:eq:notyb), a symmetric argument yields the desired equivalence for elements of the partition .

Finally, we show that (LABEL:eq:cur) holds pointwise for all . First, by (LABEL:eq:notx) and (LABEL:eq:notyb) we have

 Fk[i](x≤k,x>k) =ℓ∑j=1qi,x≤k[j]Rα[j](x>k) \hb@xt@.01(3.12) =ℓ∑j=1qi,x≤k[j]Fk[ij](zj,x>k), where (ij,zj)=α[j], =ℓ∑j=1qi,x≤k[j]C[j,:]qx>k =qi,x≤kCqx>k

Now we show that the application of (LABEL:eq:notxb) and (LABEL:eq:notyb) to (LABEL:eq:cur) yields the desired result:

 ~F[i](x≤k,x>k) =Cy[i,:](x≤k)[¯Fk]†Rα(x>k) =qi,x≤kR[¯Fk]†Cqx>k =qi,x≤k¯Fk[¯Fk]†¯Fkqx>k =qi,x≤k¯Fkqx>k =qi,x≤kCqx>k =Fk[i](x≤k,x>k),

where the first equality follows from the definition of , the second follows from an application of (LABEL:eq:notxb) and (LABEL:eq:notyb), the third follows from (LABEL:eq:CRequiv), the fourth follows from the Moore-Penrose pseudoinverse (LABEL:eq:mp), the fifth is another application of (LABEL:eq:CRequiv), and the final equality follows from (LABEL:eq:lastpartition).

This proof shows the general strategy needed to construct a CUR decomposition of a vector-valued function when one can afford to evaluate row and column fibers. In this case, one must seek at linearly independent row and column fibers to obtain an exact reconstruction. A natural question to ask, however, is what happens when one wants to obtain an approximation with for a particular separated representation of ? In particular, how do we choose and to obtain a well-behaved approximation? The answer is to choose them so that submatrix lying at the intersection of the row and column fibers has maximum volume among all possible combinations of row and column fibers.

### 3.2 Optimality of maxvol

In this section we consider the case , where we would like to use fewer terms in a skeleton approximation than the actual rank of the function. This situation arises because many functions of interest are approximately low-rank. Consider the SVD of a function with a possibly infinite number of nonzero singular values, and explicitly separate the first terms from the remainder:

 f(x,y)=ℓ∑i=1σiui(x)vi(y)+g(x,y). \hb@xt@.01(3.13)

If , then will have a relatively accurate rank approximation whenever —for instance if the singular values quickly decay to zero. Now consider the extended SVD of a vector-valued function of rank . Choose an approximation rank and form a CUR decomposition defined by a set of column fibers and a set of row fibers :

 ~Fk[i](x≤k,x>k)=Cy[i,:](x≤k)GRα(x>k). \hb@xt@.01(3.14)

The goal of this section is to bound and to show that this bound holds when the matrix in the skeleton decomposition is chosen to be a maximum volume submatrix of , where the “volume” of a matrix refers to the modulus of its determinant.

###### Definition 3.8 (Maximum volume submatrix)

A submatrix of the -separated representation of a vector-valued function has maximum volume if its determinant has maximum modulus among all possible sub-matrices of . The maximum volume submatrix is denoted as .

A result analogous to the following was proven for matrices by Goreinov [17]; here, we extend it to multivariate vector-valued functions. The proof is given in Appendix LABEL:sec:proofmaxvol.

###### Theorem 3.9

Let be a continuous vector-valued function with -separated representation , and let have an extended SVD with singular functions uniformly bounded by , i.e., and . Furthermore, let be a skeleton approximation of comprising row and column fibers, and let the matrix formed by the intersection of these fibers be a nonsingular maximum volume submatrix of . Then it holds that

 ∥F−~F∥C≤Mρ2ζ(2ε)(ℓ+1)2εσℓ+1 \hb@xt@.01(3.15)

for any , where is a constant independent of and , is the Riemann zeta function, and is the th singular value of .

Theorem LABEL:th:maxvol implies that using a maximum volume submatrix allows us to bound the error of the skeleton decomposition of a function by a constant factor greater than the error of the SVD, the optimal low-rank decomposition. This result is different from the result of [4], in which the error of a skeleton decomposition was related to the error of a polynomial approximation of the function. The present result refocuses the problem directly on the rank of the function instead of the accuracy of polynomial approximation.

### 3.3 Cross approximation and maxvol

Next, we describe an algorithm for finding a good skeleton decomposition of a vector-valued function. Computing the skeleton decomposition of a matrix is the subject of much current research. Some approaches employ random sampling of the rows and columns of the matrix [13, 26, 7]. Another class of methods attempt to explicitly find the rows and columns that maximize the volume of the submatrix. [19, 17] describe how a skeleton decomposition with a maximum-volume submatrix is quasioptimal. We will follow this second route as it has been successfully extended to the tensor-train format [32], and subsequently shown to be quasioptimal [36]. This section focuses on extending maximum-volume cross approximation algorithms to the continuous/functional setting.

#### 3.3.1 Cross approximation of a vector-valued function

Cross approximation of vector-valued functions involves a fairly straightforward row-column alternating algorithm. Pseudocode for this approach is given in Algorithm LABEL:alg:cross. The algorithm works by first fixing a set of column fibers and computing a set of rows to maximize the volume of a submatrix of the associated matrix-valued function . Next, the set of row-fibers are fixed and a new set of indices for the column fibers are identified such that they maximize the volume of a submatrix of . These indices and are found by solving continuous optimization problems. In other words, no discretization is required to choose the fibers, and the choice of optimization algorithm is flexible. In Section LABEL:sec:maxvolskinny we provide more detail on the how these optimization problems are solved. The algorithm continually alternates between rows and columns until the difference between successive approximations falls below a particular tolerance. Note that this algorithm requires the prescription of an upper bound on the rank. In Section LABEL:sec:round, we will describe a rank estimation scheme in the context of a multivariate cross approximation algorithm.

Algorithm LABEL:alg:cross requires several subroutines: qr-mvf refers to the QR decomposition of a matrix-valued function and is computed using Householder triangularization; qr-vvf refers to a QR decomposition of a vector-valued function and is also computed using Householder triangularization; maxvol-mvf refers to Algorithm LABEL:alg:maxvol, discussed in Section LABEL:sec:maxvolskinny.