Stable, Robust and Super Fast Reconstruction of Tensors Using Multi-Way Projections

Abstract

In the framework of multidimensional Compressed Sensing (CS), we introduce an analytical reconstruction formula that allows one to recover an $N$th-order data tensor $\underline{\mathbf{X}} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ from a reduced set of multi-way compressive measurements by exploiting its low multilinear-rank structure. Moreover, we show that an interesting property of multi-way measurements allows us to build the reconstruction based on compressive linear measurements taken only in two selected modes, independently of the tensor order $N$. In addition, it is proved that, in the matrix case and in a particular case with 3rd-order tensors where the same 2D sensing operator is applied to all mode-3 slices, the proposed reconstruction $\hat{\underline{\mathbf{X}}}_\tau$ is stable in the sense that the approximation error is comparable to the one provided by the best low-multilinear-rank approximation, where $\tau$ is a threshold parameter that controls the approximation error. Through the analysis of the upper bound of the approximation error we show that, in the 2D case, an optimal value for the threshold parameter $\tau > 0$ exists, which is confirmed by our simulation results. On the other hand, our experiments on 3D datasets show that very good reconstructions are obtained using $\tau = 0$, which means that this parameter does not need to be tuned. Our extensive simulation results demonstrate the stability and robustness of the method when it is applied to real-world 2D and 3D signals. A comparison with state-of-the-art sparsity-based CS methods specialized for multidimensional signals is also included. A very attractive characteristic of the proposed method is that it provides a direct computation, i.e., it is non-iterative, in contrast to all existing sparsity-based CS algorithms, thus providing super fast computations even for large datasets.

Compressed Sensing (CS), Kronecker-CS, Low-rank tensor approximation, Multi-way analysis, Tucker model.

I Introduction

During the last years there has been an increased interest in Compressed Sensing (CS), whose aim is the reconstruction of signals based on a set of measurements that is much smaller than the original signal size. Thus, instead of acquiring a potentially large signal and then compressing it, CS suggests that signals can be compressively sampled, thus reducing the amount of measurements. More specifically, in standard CS [1, 2], a signal $\mathbf{x} \in \mathbb{R}^{I}$, which is assumed unavailable, is reconstructed from a reduced set of linear projections $\mathbf{y} = \mathbf{\Phi}\mathbf{x} \in \mathbb{R}^{M}$ ($M \ll I$), where the sensing matrix $\mathbf{\Phi} \in \mathbb{R}^{M \times I}$ is typically random or composed of a few selected rows of the Fourier transform matrix [3]. In order to make the CS problem solvable, it is necessary to exploit a priori information about the signal of interest by imposing some constraints. For example, it is widely assumed that the signal is compressible by decomposing it in a known Wavelet basis (dictionary). In other words, it is assumed that every signal admits a sparse representation on a given dictionary, i.e. combining only a few of its elements, called atoms. Under the scope of these sparse models, many efficient CS algorithms were developed to reconstruct signals from compressive measurements, involving iterative refinements of the solution by means of greedy algorithms or by minimizing the $\ell_1$-norm of the solution (see [4] for an up-to-date summary of algorithms). These algorithms have found applications in diverse fields such as medical imaging, surveillance, and machine learning [5].
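To fix notation with a toy example, the following NumPy sketch (ours; the sizes and the canonical sparsity basis are illustrative assumptions, not taken from the cited works) simulates the standard CS acquisition $\mathbf{y} = \mathbf{\Phi}\mathbf{x}$:

```python
# Standard CS acquisition y = Phi @ x for a synthetic K-sparse signal.
# Illustrative sketch only: sizes and basis are arbitrary assumptions.
import numpy as np

rng = np.random.default_rng(0)
I, M, K = 1000, 200, 10                    # ambient size, measurements, sparsity

x = np.zeros(I)
support = rng.choice(I, size=K, replace=False)
x[support] = rng.standard_normal(K)        # K-sparse in the canonical basis

Phi = rng.standard_normal((M, I)) / np.sqrt(M)   # random sensing matrix
y = Phi @ x                                      # M << I compressive samples

# Recovering x from (y, Phi) requires an iterative sparse solver (greedy or
# l1-minimization); avoiding such iterations is the point of this paper.
```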

I-A Multidimensional CS

Most of the development of CS has focused on problems involving 1D signals or 2D image data encoded as vectors. However, many important applications involve higher-dimensional signals or tensors. Some data sources are readily generated as tensors, such as hyperspectral images, videos, 3D light field displays [6], Magnetic Resonance Imaging (MRI) [7], etc.; in other cases, tensors can be synthetically created by a rearrangement of lower-dimensional data structures or by mathematical construction [8]. In some applications, like in materials science [9] or in scientific computation [10], the exponential increase in memory and time requirements when the number of dimensions increases makes it impossible to work with full datasets, and models with few parameters must be used. Such models are referred to as tensor decompositions, which can be obtained by making a few inspections of the full datasets or by taking compressive measurements.

Recently, the Kronecker-CS model [11] has been proposed in order to provide a practical implementation of CS for higher-order tensors by exploiting their multidimensional structure. This model explicitly assumes that multidimensional signals have sparse representations using separable dictionaries, usually known as Kronecker dictionaries. Kronecker bases are well known and widely used in image processing; for example, given a 3D image, its associated dictionary is $\mathbf{D} = \mathbf{D}_3 \otimes \mathbf{D}_2 \otimes \mathbf{D}_1$, where $\mathbf{D}_n$ ($n = 1, 2, 3$) are smaller dictionaries associated with columns (mode-1), rows (mode-2) and tubes (mode-3), respectively. Moreover, when working with multidimensional signals, the Kronecker structure also arises naturally in the physical implementation of the sensing devices, since they can operate on different dimensions or modes of the signal independently, through separated sensing matrices [11, 12]. This Kronecker structure of the sensing operator/dictionary is equivalent to applying the constrained Tucker model [13] and made it possible to implement relatively fast and practical algorithms based on sparsity structures, e.g., on hyperspectral 3D images and video data through a vector $\ell_1$-norm minimization algorithm [11]. More recently, greedy algorithms, especially designed to take advantage of the Kronecker structure and block sparsity of the representations, were proposed in [13] and applied to a variety of multidimensional signal processing problems such as MRI, hyperspectral imaging and multidimensional inpainting [14]. Also, in [15], the authors developed generalized tensor compressed sensing algorithms by exploiting the Kronecker structure in a similar way as done in [13, 16], but using an $\ell_1$-minimization approach.
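As a concrete illustration of this separability, the following sketch (our code, using column-major vectorization; modes are 0-indexed in code) checks that synthesizing a 3D signal mode-by-mode from small dictionaries agrees with applying the single Kronecker dictionary $\mathbf{D}_3 \otimes \mathbf{D}_2 \otimes \mathbf{D}_1$ to the vectorized core:

```python
# Check: X = G x_1 D1 x_2 D2 x_3 D3  <=>  vec(X) = (D3 kron D2 kron D1) vec(G)
import numpy as np

def mode_n_product(X, B, n):
    Xn = np.moveaxis(X, n, 0).reshape(X.shape[n], -1)   # mode-n unfolding
    shape = (B.shape[0],) + X.shape[:n] + X.shape[n+1:]
    return np.moveaxis((B @ Xn).reshape(shape), 0, n)

rng = np.random.default_rng(1)
G = rng.standard_normal((3, 4, 2))                       # core (e.g. sparse)
D1 = rng.standard_normal((8, 3))
D2 = rng.standard_normal((9, 4))
D3 = rng.standard_normal((5, 2))

X = mode_n_product(mode_n_product(mode_n_product(G, D1, 0), D2, 1), D3, 2)
D = np.kron(D3, np.kron(D2, D1))                         # Kronecker dictionary
assert np.allclose(X.flatten(order='F'), D @ G.flatten(order='F'))
```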

I-B Exploiting low-rank approximations instead of sparsity

While sparsity is the “working horse” of standard CS, recently a new line of research has been proposed suggesting that, instead of using sparse models as a priori information about multidimensional signals, the low-rank approximation property could be exploited, i.e. without any a priori knowledge about the possible bases or factor matrices for each mode (dictionaries). This idea was first explored in [17], where matrices were reconstructed from their under-sampled measurements by solely assuming the existence of a low-rank approximation and by solving a convex optimization problem involving the matrix nuclear norm. These ideas have been extended to tensors, by considering different models for the measurements and using tensor low-rank approximations (based on the CANDECOMP/PARAFAC (CP) model) or low-multilinear-rank approximations (based on the Tucker model). For example, in [18], tensor completion of visual data was analyzed by generalizing the minimization of the matrix nuclear norm to the tensor case. In [19], the problem of estimating missing entries in tensors was also considered, by assuming that a low-rank CP model is fitted through a weighted least squares problem formulation. In [20], hyperspectral images (3D tensors) are recovered from random linear projections of all channels by using a reconstruction algorithm that combines low-rank and joint-sparse matrix recovery. In [21], the problem of reconstructing tensors having a low multilinear-rank Tucker model, based on a set of linear measurements, was investigated and used for tensor denoising via an iterative multilinear-rank minimization method. In the context of optical-interferometric imaging, in [22], a method for recovering a supersymmetric rank-1 3D tensor from a set of multilinear measurements was proposed by using a convex linear program. In [23], the Kronecker sensing structure was used for tensor compression, and a method involving a low-rank model fitting, followed by a per-mode decompression, was proposed in order to recover a low-rank CP tensor with sparse factors. We refer to Table I for a brief summary of recent approaches to CS involving tensor datasets, where the differences and similarities among the methods are highlighted.

Articles | Tensor order | Data model | Measurement model | Algorithm type
Duarte et al. [11] | — | Sparsity, Kronecker dictionaries | Multilinear | $\ell_1$-norm minimization
Caiafa et al. [13, 14] | — | Sparsity, block-sparsity, Kronecker dictionaries | Multilinear | Greedy
Li et al. [15] | — | Sparsity, Kronecker dictionaries | Multilinear | $\ell_1$-norm minimization
Candes et al. [17] | $N = 2$ | Low-rank matrix | Linear (missing entries) | Nuclear norm minimization
Liu et al. [18] | — | Low-rank (CP model) | Linear (missing entries) | Tensor nuclear norm minimization
Acar et al. [19] | — | Low-rank (CP model) | Linear (missing entries) | Weighted least squares
Golbabaee et al. [20] | — | Joint-sparsity, low-rank matrices | Linear | Nuclear norm, mixed norm minimization
Rauhut et al. [21] | — | Low multilinear-rank (Tucker model) | Linear | Iterative hard thresholding
Auría et al. [22] | $N = 3$ | Rank-1 tensor | Multilinear | Convex programming
Sidiropoulos et al. [23] | — | Low-rank sparse CP model | Multilinear | $\ell_1$-norm decompression
Current article | Any $N$ | Low multilinear-rank (Tucker model) | Multilinear | Non-iterative (direct) reconstruction
TABLE I: Summary and comparison of available approaches to CS for tensor datasets

In this work, we extend the ideas and results of our recent conference paper [24], providing a direct (i.e., analytical) reconstruction formula that allows us to recover a tensor from a set of multilinear projections that are obtained by multiplying the data tensor by a different sensing matrix in each mode. This model comes into play naturally in many potential applications, for example, in the case of sensing 2D or 3D images by means of a separable operator as developed in [25, 26, 11, 12], i.e., by taking compressive measurements of columns, rows, etc. separately, imposing a Kronecker structure on the sensing operator. The key assumption is that our multidimensional signal is well approximated by a low multilinear-rank Tucker model, which is realistic for many structured datasets, especially in the case of multidimensional images. We formulate our multidimensional CS model in a general setting for $N$th-order tensors and provide a theoretical stability analysis and robustness evidence for $N = 2$ and a very important particular case with $N = 3$.

A particularly distinctive and attractive feature of the proposed reconstruction method is that, unlike all other methods listed in Table I, our method is non-iterative. We believe that the present mathematical model could be fully exploited by the next generation of multidimensional compressive sensors for very large datasets. Through extensive simulations on real-world datasets, we also illustrate the relevance of our results in hyperspectral compressive imaging, for which the technology is already available [11, 25, 12].

I-C Paper organization

This paper is organized as follows: in Section II, the tensor notation, definitions and basic results used throughout the paper are introduced; in Section III, the reconstruction formula is introduced for the ideal case when the tensor of interest admits an exact low multilinear-rank representation; in Section IV, the effect of a more realistic model for signals is analyzed and a modified reconstruction formula is proposed in order to guarantee stable reconstructions; in Section V, several numerical results based on 2D and 3D real-world signals are provided, validating our theoretical results and evaluating the stability and robustness of our proposed reconstruction scheme. The performance is evaluated in terms of computational time, quality of reconstruction and variance of the results over Monte Carlo simulations (robustness). Finally, in Section VI, the main conclusions of the present work are outlined.

II Notation, definitions and preliminary results

II-A Tensor notation and operations

Tensors (multi-way arrays) are denoted by underlined boldface capital letters, e.g. $\underline{\mathbf{X}} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ is an $N$th-order tensor of real numbers. Matrices (2D arrays) are denoted by bold uppercase letters and vectors by boldface lower-case letters, e.g. $\mathbf{A} \in \mathbb{R}^{I \times J}$ and $\mathbf{b} \in \mathbb{R}^{I}$ are a matrix and a vector, respectively. The $(i_1, i_2, \ldots, i_N)$ entry of a tensor is referred to as $x_{i_1 i_2 \cdots i_N}$. The Frobenius norm is defined by $\|\underline{\mathbf{X}}\|_F = (\sum_{i_1, \ldots, i_N} x_{i_1 \cdots i_N}^2)^{1/2}$. The spectral norm of a matrix $\mathbf{A}$ is denoted by $\|\mathbf{A}\|$, corresponding to its largest singular value.

Given a tensor $\underline{\mathbf{X}} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$, its mode-$n$ fibers are the vectors obtained by fixing all indices except $i_n$, which correspond to columns ($n = 1$), rows ($n = 2$), and so on. The mode-$n$ unfolding of a tensor $\underline{\mathbf{X}}$ yields a matrix $\mathbf{X}_{(n)} \in \mathbb{R}^{I_n \times \bar{I}_n}$ ($\bar{I}_n = \prod_{m \neq n} I_m$) whose columns are the corresponding mode-$n$ fibers arranged in a particular order; to be more precise, tensor element $(i_1, i_2, \ldots, i_N)$ maps to matrix element $(i_n, j)$, where $j = 1 + \sum_{k \neq n}(i_k - 1)J_k$ with $J_k = \prod_{m < k,\, m \neq n} I_m$ [27].
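A minimal implementation of this unfolding convention is sketched below (our code, not the authors'; the column ordering follows [27], with lower-numbered modes varying fastest):

```python
# Mode-n unfolding: rows indexed by i_n, columns by the remaining indices.
import numpy as np

def unfold(X, n):
    axes = [n] + [m for m in range(X.ndim) if m != n]
    return np.transpose(X, axes).reshape(X.shape[n], -1, order='F')

X = np.arange(24).reshape(2, 3, 4)
print(unfold(X, 0).shape, unfold(X, 1).shape, unfold(X, 2).shape)
# (2, 12) (3, 8) (4, 6)
```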

Given a multidimensional signal (tensor) $\underline{\mathbf{X}} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ and a matrix $\mathbf{B} \in \mathbb{R}^{J \times I_n}$, the mode-$n$ tensor-by-matrix product $\underline{\mathbf{Y}} = \underline{\mathbf{X}} \times_n \mathbf{B}$ is defined by:

$y_{i_1 \cdots i_{n-1}\, j\, i_{n+1} \cdots i_N} = \sum_{i_n = 1}^{I_n} x_{i_1 i_2 \cdots i_N}\, b_{j i_n}, \qquad (1)$

with $i_m = 1, 2, \ldots, I_m$ ($m \neq n$) and $j = 1, 2, \ldots, J$. It should be noted that this corresponds to the product of matrix $\mathbf{B}$ by each one of the mode-$n$ fibers of $\underline{\mathbf{X}}$, since $\mathbf{Y}_{(n)} = \mathbf{B}\mathbf{X}_{(n)}$.
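Equivalently, in code (our sketch, consistent with the relation $\mathbf{Y}_{(n)} = \mathbf{B}\mathbf{X}_{(n)}$; modes are 0-indexed):

```python
# Mode-n product: unfold, multiply every mode-n fiber by B, refold.
import numpy as np

def mode_n_product(X, B, n):
    Xn = np.moveaxis(X, n, 0).reshape(X.shape[n], -1)   # Y_(n) = B X_(n)
    shape = (B.shape[0],) + X.shape[:n] + X.shape[n+1:]
    return np.moveaxis((B @ Xn).reshape(shape), 0, n)

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5, 6))
B = rng.standard_normal((3, 5))          # acts on mode-2 fibers (length 5)
print(mode_n_product(X, B, 1).shape)     # (4, 3, 6)
```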

II-B Tucker model and multilinear-rank

The Tucker decomposition model [28] provides a generalization of the low-rank approximation of matrices to the case of tensors, i.e. for a given tensor $\underline{\mathbf{X}} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$, we have $\underline{\mathbf{X}} = \underline{\mathbf{X}}_0 + \underline{\mathbf{E}}$, where $\underline{\mathbf{E}}$ is an error tensor and the multilinear-rank-$(R_1, R_2, \ldots, R_N)$ tensor approximation $\underline{\mathbf{X}}_0$ is defined as follows (Tucker model):

$\underline{\mathbf{X}}_0 = \underline{\mathbf{G}} \times_1 \mathbf{A}_1 \times_2 \mathbf{A}_2 \cdots \times_N \mathbf{A}_N, \qquad (2)$

with a core tensor $\underline{\mathbf{G}} \in \mathbb{R}^{R_1 \times \cdots \times R_N}$ and factor matrices $\mathbf{A}_n \in \mathbb{R}^{I_n \times R_n}$ (typically $R_n \ll I_n$). A data tensor is said to have multilinear-rank-$(R_1, R_2, \ldots, R_N)$ if such a decomposition is exact for a set of minimal values $R_n$, i.e. $\underline{\mathbf{E}} = \underline{\mathbf{0}}$. We say that a tensor $\underline{\mathbf{G}}$ is full-rank if all its unfolded matrices are full-rank, i.e. $\text{rank}(\mathbf{G}_{(n)}) = R_n$, $n = 1, \ldots, N$. A particularly interesting case of the Tucker model is when the factor matrices are orthogonal and chosen as the truncated matrices of left singular vectors associated with the unfolding matrices $\mathbf{X}_{(n)}$. In this case, we obtain the so-called truncated Higher Order Singular Value Decomposition (HOSVD) [28]. It is noted that, in the matrix case, the truncated SVD provides the best low-rank approximation having orthogonal factors and a diagonal core matrix.
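For reference, a truncated HOSVD can be sketched in a few lines (our code; helper as in the previous sketches, sizes are arbitrary assumptions):

```python
# Truncated HOSVD: leading left singular vectors of each unfolding + core.
import numpy as np

def mode_n_product(X, B, n):
    Xn = np.moveaxis(X, n, 0).reshape(X.shape[n], -1)
    shape = (B.shape[0],) + X.shape[:n] + X.shape[n+1:]
    return np.moveaxis((B @ Xn).reshape(shape), 0, n)

def hosvd(X, ranks):
    U = []
    for n, R in enumerate(ranks):
        Xn = np.moveaxis(X, n, 0).reshape(X.shape[n], -1)
        U.append(np.linalg.svd(Xn, full_matrices=False)[0][:, :R])
    G = X
    for n, Un in enumerate(U):
        G = mode_n_product(G, Un.T, n)      # core G = X x_1 U1' ... x_N UN'
    return G, U

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4, 2))          # build approx. low-rank data
for n, In in enumerate((10, 12, 8)):
    X = mode_n_product(X, rng.standard_normal((In, X.shape[n])), n)
X += 0.01 * rng.standard_normal(X.shape)    # small model error E

G, U = hosvd(X, (3, 4, 2))
X0 = G
for n, Un in enumerate(U):
    X0 = mode_n_product(X0, Un, n)          # multilinear-rank-(3,4,2) approx.
print(np.linalg.norm(X - X0) / np.linalg.norm(X))   # small residual
```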

II-C Multi-way Projections

While in classical 1D CS the set of compressive measurements is obtained by a linear projection, i.e. by multiplying the vector signal by a sensing matrix, in the tensor case we can exploit the multi-way structure and use devices that provide compressive measurements by multiplying an $N$th-order tensor by sensing matrices in several modes, similarly or identically to the Kronecker-CS setting [11]. According to the definition of the mode-$n$ product in eqn. (1), multiplying a data tensor by a sensing matrix in mode $n$ corresponds to applying a projection to every mode-$n$ fiber. For example, a 2D signal $\mathbf{X} \in \mathbb{R}^{I_1 \times I_2}$ can be compressively sensed by using two sensing matrices, $\mathbf{\Phi}_1 \in \mathbb{R}^{M_1 \times I_1}$ and $\mathbf{\Phi}_2 \in \mathbb{R}^{M_2 \times I_2}$ for mode-1 and mode-2, respectively, i.e. $\underline{\mathbf{W}} = \mathbf{X} \times_1 \mathbf{\Phi}_1 \times_2 \mathbf{\Phi}_2$ or, equivalently, $\mathbf{W} = \mathbf{\Phi}_1\mathbf{X}\mathbf{\Phi}_2^T$, or $\mathbf{w} = (\mathbf{\Phi}_2 \otimes \mathbf{\Phi}_1)\mathbf{x}$, where $\mathbf{w}$ and $\mathbf{x}$ are the vectorized versions of matrices $\mathbf{W}$ and $\mathbf{X}$, respectively. Thus, the objective of Kronecker-CS is to recover the signal $\mathbf{X}$ from the measured data matrix $\mathbf{W}$.
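This equivalence is easy to verify numerically (our sketch, column-major vectorization):

```python
# Separable 2D sensing: W = Phi1 X Phi2' equals (Phi2 kron Phi1) vec(X).
import numpy as np

rng = np.random.default_rng(0)
I1, I2, M1, M2 = 64, 48, 16, 12
X = rng.standard_normal((I1, I2))
Phi1 = rng.standard_normal((M1, I1))
Phi2 = rng.standard_normal((M2, I2))

W = Phi1 @ X @ Phi2.T
w = np.kron(Phi2, Phi1) @ X.flatten(order='F')
assert np.allclose(W.flatten(order='F'), w)
```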

In this paper, we assume that the following set of compressive multi-way measurements $\underline{\mathbf{Y}}_n$ ($n = 1, 2, \ldots, N$) is available:

$\underline{\mathbf{Y}}_n = \underline{\mathbf{X}} \times_1 \mathbf{\Phi}_1 \cdots \times_{n-1} \mathbf{\Phi}_{n-1} \times_{n+1} \mathbf{\Phi}_{n+1} \cdots \times_N \mathbf{\Phi}_N, \qquad (3)$

where $\mathbf{\Phi}_n \in \mathbb{R}^{M_n \times I_n}$ ($M_n \le I_n$) are the corresponding mode-$n$ sensing matrices. Note that eqn. (3) indicates that the original tensor is multiplied by the set of sensing matrices in all modes except in mode $n$ (see Fig. 1 (top)). We also assume available the following core tensor $\underline{\mathbf{W}} \in \mathbb{R}^{M_1 \times M_2 \times \cdots \times M_N}$:

$\underline{\mathbf{W}} = \underline{\mathbf{X}} \times_1 \mathbf{\Phi}_1 \times_2 \mathbf{\Phi}_2 \cdots \times_N \mathbf{\Phi}_N, \qquad (4)$

which, in fact, is redundant since it can be computed from $\underline{\mathbf{Y}}_n$ and $\mathbf{\Phi}_n$, taking into account that $\underline{\mathbf{W}} = \underline{\mathbf{Y}}_n \times_n \mathbf{\Phi}_n$ for any $n$.
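For a 3rd-order tensor, this measurement set can be simulated as follows (our sketch, with random Gaussian sensing matrices as an illustrative assumption):

```python
# Multi-way measurements: Y_n compresses all modes but n; W compresses all.
import numpy as np

def mode_n_product(X, B, n):
    Xn = np.moveaxis(X, n, 0).reshape(X.shape[n], -1)
    shape = (B.shape[0],) + X.shape[:n] + X.shape[n+1:]
    return np.moveaxis((B @ Xn).reshape(shape), 0, n)

def multiway_measurement(X, Phi, skip):
    Y = X
    for n, P in enumerate(Phi):
        if n != skip:
            Y = mode_n_product(Y, P, n)
    return Y

rng = np.random.default_rng(0)
I, M = (30, 40, 20), (10, 12, 8)
X = rng.standard_normal(I)
Phi = [rng.standard_normal((M[n], I[n])) for n in range(3)]

Y = [multiway_measurement(X, Phi, skip=n) for n in range(3)]   # eqn. (3)
W = mode_n_product(Y[2], Phi[2], 2)                            # eqn. (4)
print([Yn.shape for Yn in Y], W.shape)
# [(30, 12, 8), (10, 40, 8), (10, 12, 20)] (10, 12, 8)
```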

Most state-of-the-art tensor reconstruction algorithms based on Kronecker-CS [11, 13, 14, 22, 23] assume that the only available measurement tensor is $\underline{\mathbf{W}}$, i.e. the product of the original dataset by the sensing matrices $\mathbf{\Phi}_n$ ($n = 1, 2, 3$) in all modes simultaneously. On the other hand, our present method requires the set of tensor measurements obtained by multiplying the dataset by all the sensing matrices except one, i.e. tensors $\underline{\mathbf{Y}}_n$ ($n = 1, 2, 3$), not necessarily involving a larger amount of measurements (see experimental comparison in Section V-E). However, it is not difficult to see that already existing hardware implementations of CS imaging systems can be easily adapted in order to provide the kind of measurements required by our method. For example, in the 2D case, our method proposes to collect measurements on columns using a common sensing matrix $\mathbf{\Phi}_1$, and on rows using $\mathbf{\Phi}_2$. By using the same ideas as the single-pixel camera developed in [29] and used in [11], our sensing operator can be obtained by using, for example, a linear array of DMDs (Digital Micromirror Devices) with random orientations to sense columns and another linear array of DMDs to sense rows. It is also interesting to note that in [25] a different hardware implementation was used to provide real Kronecker-CS, i.e., by applying different sensing matrices to columns and rows, which can also provide the measurements required by our method. More recently, in [12], a new hardware implementation has been proposed that allows one to employ Kronecker-CS with separable sensing operators in the space and spectral domains, which could likewise be used to provide the measurements required by our method.

In some 3D applications, the mode-3 sensing matrix is the identity matrix, i.e. $\mathbf{\Phi}_3 = \mathbf{I} \in \mathbb{R}^{I_3 \times I_3}$, because the same sensing operator is applied to each frontal slice of the tensor. This is the case, for instance, in hyperspectral compressive imaging, where each frequency band (a frontal slice) is sensed by applying a different selective filter [11, 30]. This mathematical model is also valid for video sequences, where each frontal slice of the tensor corresponds to a snapshot taken at a given time.

Fig. 1: Multi-way measurements and the reconstruction model for a low multilinear-rank 3D tensor.

II-D The Truncated Moore-Penrose Pseudo-inverse

It is well known that, by using the Moore-Penrose (MP) pseudo-inverse $\mathbf{A}^{\dagger}$ of an ill-conditioned matrix $\mathbf{A} \in \mathbb{R}^{I \times J}$, an unstable behavior is produced, since the norm $\|\mathbf{A}^{\dagger}\|$ could be extremely large when the smallest singular value of $\mathbf{A}$ is close to zero. In order to avoid this problem, the truncated MP pseudo-inverse $\mathbf{A}^{\dagger_\tau}$ can be used as a regularization technique, which is defined as follows [31]:

$\mathbf{A}^{\dagger_\tau} = \mathbf{V}\mathbf{S}^{\dagger_\tau}\mathbf{U}^{T}, \qquad (5)$

where $\mathbf{A} = \mathbf{U}\mathbf{S}\mathbf{V}^{T}$ is the SVD of $\mathbf{A}$, with the entries of the diagonal matrix $\mathbf{S}^{\dagger_\tau}$ defined as follows:

$s^{\dagger_\tau}(i, i) = \begin{cases} 1/s(i, i), & \text{if } s(i, i) > \tau, \\ 0, & \text{otherwise}, \end{cases} \qquad (6)$

where $\tau \ge 0$ is a free threshold parameter (see next sections). It is noted that $\mathbf{A}^{\dagger_\tau} \to \mathbf{A}^{\dagger}$ as $\tau \to 0$, and $\|\mathbf{A}^{\dagger_\tau}\| \le 1/\tau$. Also, the following properties are easily verified:

$\mathbf{A}\mathbf{A}^{\dagger_\tau}\mathbf{A} = \mathbf{A} + \mathbf{E}_\tau, \qquad (7)$

with $\mathbf{E}_\tau = \mathbf{0}$ if $\tau < \sigma_{\min}(\mathbf{A})$, and $\|\mathbf{E}_\tau\| \le \tau$ if $\tau \ge \sigma_{\min}(\mathbf{A})$, where $\sigma_{\min}(\mathbf{A})$ denotes the smallest nonzero singular value of $\mathbf{A}$;

$\mathbf{A}^{\dagger_\tau}\mathbf{A}\mathbf{A}^{\dagger_\tau} = \mathbf{A}^{\dagger_\tau}; \qquad (8)$
$(\mathbf{A}^{T})^{\dagger_\tau} = (\mathbf{A}^{\dagger_\tau})^{T}. \qquad (9)$
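A direct implementation of eqns. (5)-(6) (our sketch) makes the regularizing effect visible:

```python
# Truncated MP pseudo-inverse: singular values <= tau are dropped, not inverted.
import numpy as np

def tpinv(A, tau=0.0):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_inv = np.zeros_like(s)
    keep = s > tau
    s_inv[keep] = 1.0 / s[keep]
    return Vt.T @ (s_inv[:, None] * U.T)      # V diag(s_inv) U'

A = np.diag([1.0, 0.5, 1e-9])                 # ill-conditioned example
print(np.linalg.norm(tpinv(A, tau=0.0), 2))   # ~1e9: exact pinv is unstable
print(np.linalg.norm(tpinv(A, tau=1e-6), 2))  # 2.0: bounded by 1/tau
```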

The following lemma provides a generalization of property (7) to the 3rd-order tensor case.

Lemma II.1.

For a given tensor $\underline{\mathbf{W}} \in \mathbb{R}^{M_1 \times M_2 \times M_3}$ with smallest nonzero singular values in each mode $\sigma_1$, $\sigma_2$ and $\sigma_3$, respectively, the following property holds:

$\underline{\mathbf{W}} \times_1 \mathbf{P}_1 \times_2 \mathbf{P}_2 \times_3 \mathbf{P}_3 = \underline{\mathbf{W}} + \underline{\mathbf{E}}_\tau, \qquad (10)$

with

$\underline{\mathbf{E}}_\tau = \underline{\mathbf{0}}$ if $\tau < \min(\sigma_1, \sigma_2, \sigma_3)$, and $\|\underline{\mathbf{E}}_\tau\| \le 3\tau$ otherwise,

where $\mathbf{P}_n = \mathbf{W}_{(n)}\mathbf{W}_{(n)}^{\dagger_\tau}$ and $\sigma_n = \sigma_{\min}(\mathbf{W}_{(n)})$ ($n = 1, 2, 3$).

Proof.

See the proof in Appendix A. ∎

III Exact low multilinear-rank tensor recovery

The following theorem provides an explicit reconstruction formula, as illustrated in Fig. 1 (bottom), and states the conditions under which the original tensor $\underline{\mathbf{X}}$ can be exactly recovered from the set of multi-way measurements $\underline{\mathbf{Y}}_n$ ($n = 1, 2, \ldots, N$) and $\underline{\mathbf{W}}$, defined in equations (3) and (4), respectively.

Theorem III.1 (Low multilinear-rank case).

If tensor $\underline{\mathbf{X}} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ has multilinear-rank-$(R_1, R_2, \ldots, R_N)$ and the sensing matrices $\mathbf{\Phi}_n \in \mathbb{R}^{M_n \times I_n}$ ($R_n \le M_n \le I_n$) are such that the tensor $\underline{\mathbf{W}} \in \mathbb{R}^{M_1 \times \cdots \times M_N}$ of eqn. (4) also has multilinear-rank-$(R_1, R_2, \ldots, R_N)$, then the following reconstruction formula is exact, i.e. $\hat{\underline{\mathbf{X}}} = \underline{\mathbf{X}}$:

$\hat{\underline{\mathbf{X}}} = \underline{\mathbf{W}} \times_1 \mathbf{Y}_{1(1)}\mathbf{W}_{(1)}^{\dagger} \times_2 \mathbf{Y}_{2(2)}\mathbf{W}_{(2)}^{\dagger} \cdots \times_N \mathbf{Y}_{N(N)}\mathbf{W}_{(N)}^{\dagger}, \qquad (11)$

where “$\dagger$” stands for the MP pseudo-inverse of a matrix and $\mathbf{Y}_{n(n)} \in \mathbb{R}^{I_n \times \bar{M}_n}$ is the mode-$n$ unfolding of $\underline{\mathbf{Y}}_n$, with $\bar{M}_n = \prod_{m \neq n} M_m$.

Proof.

Let us consider the exact HOSVD decomposition $\underline{\mathbf{X}} = \underline{\mathbf{G}} \times_1 \mathbf{U}_1 \times_2 \mathbf{U}_2 \cdots \times_N \mathbf{U}_N$, with core tensor $\underline{\mathbf{G}} \in \mathbb{R}^{R_1 \times \cdots \times R_N}$ and orthogonal factors $\mathbf{U}_n \in \mathbb{R}^{I_n \times R_n}$, which exists because it is assumed that tensor $\underline{\mathbf{X}}$ has multilinear-rank-$(R_1, R_2, \ldots, R_N)$. We consider, for convenience, a change of bases such that the measured core $\underline{\mathbf{W}}$ becomes the core of the new representation. We can do this by defining the new set of factors as follows:

$\bar{\mathbf{A}}_n = \mathbf{U}_n(\mathbf{\Phi}_n\mathbf{U}_n)^{\dagger} \in \mathbb{R}^{I_n \times M_n}, \qquad (12)$

thus, we have $\underline{\mathbf{X}} = \underline{\mathbf{W}} \times_1 \bar{\mathbf{A}}_1 \times_2 \cdots \times_N \bar{\mathbf{A}}_N$, where $\underline{\mathbf{W}} = \underline{\mathbf{G}} \times_1 (\mathbf{\Phi}_1\mathbf{U}_1) \times_2 \cdots \times_N (\mathbf{\Phi}_N\mathbf{U}_N)$ (each $\mathbf{\Phi}_n\mathbf{U}_n \in \mathbb{R}^{M_n \times R_n}$ has full column rank by the rank assumption on $\underline{\mathbf{W}}$). Note also that, with these new bases, the multi-way measurements are now simplified to $\underline{\mathbf{Y}}_n = \underline{\mathbf{W}} \times_n \bar{\mathbf{A}}_n$ or, equivalently, $\mathbf{Y}_{n(n)} = \bar{\mathbf{A}}_n\mathbf{W}_{(n)}$. Taking into account that $\mathbf{W}_{(n)}\mathbf{W}_{(n)}^{\dagger}\mathbf{W}_{(n)} = \mathbf{W}_{(n)}$, the mode-$n$ unfolded version of eqn. (11) is:

$\hat{\mathbf{X}}_{(n)} = \mathbf{Y}_{n(n)}\mathbf{W}_{(n)}^{\dagger}\,\mathbf{W}_{(n)}\bigl(\mathbf{Y}_{N(N)}\mathbf{W}_{(N)}^{\dagger} \otimes \cdots \otimes \mathbf{Y}_{n+1(n+1)}\mathbf{W}_{(n+1)}^{\dagger} \otimes \mathbf{Y}_{n-1(n-1)}\mathbf{W}_{(n-1)}^{\dagger} \otimes \cdots \otimes \mathbf{Y}_{1(1)}\mathbf{W}_{(1)}^{\dagger}\bigr)^T. \qquad (13)$

Now, by substituting $\mathbf{Y}_{n(n)} = \bar{\mathbf{A}}_n\mathbf{W}_{(n)}$ in the previous equation, so that $\mathbf{Y}_{n(n)}\mathbf{W}_{(n)}^{\dagger}\mathbf{W}_{(n)} = \bar{\mathbf{A}}_n\mathbf{W}_{(n)}\mathbf{W}_{(n)}^{\dagger}\mathbf{W}_{(n)} = \bar{\mathbf{A}}_n\mathbf{W}_{(n)}$, and by repeating this process for all modes $n = 1, 2, \ldots, N$, we finally arrive at:

$\hat{\underline{\mathbf{X}}} = \underline{\mathbf{W}} \times_1 \bar{\mathbf{A}}_1 \times_2 \bar{\mathbf{A}}_2 \cdots \times_N \bar{\mathbf{A}}_N, \qquad (14)$

which proves that $\hat{\underline{\mathbf{X}}} = \underline{\mathbf{X}}$, since $\underline{\mathbf{X}} = \underline{\mathbf{W}} \times_1 \bar{\mathbf{A}}_1 \times_2 \cdots \times_N \bar{\mathbf{A}}_N$. ∎

It is noted that, using a simplified notation, the Tucker model of a tensor with sizes $I_n = I$ having multilinear rank $(R, R, \ldots, R)$ requires $O(NIR + R^N)$ parameters, and the suggested reconstruction needs no more than $NIR^{N-1}$ measurements, corresponding to the tensors $\underline{\mathbf{Y}}_n$, $n = 1, 2, \ldots, N$ (taking $M_n = R$). This is larger than the number of parameters of the associated Tucker model; however, if $R \ll I$, we see that the number of measurements is approximately $R^{N-2}$ times the number of parameters, which is in general much smaller than the total number of entries $I^N$.
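The exactness of eqn. (11) is easy to confirm numerically; the following self-contained sketch (our code, with arbitrary sizes) reconstructs a random multilinear-rank-$(3, 4, 2)$ tensor from its multi-way measurements up to machine precision:

```python
# Numerical check of Theorem III.1: the direct formula is exact for
# exactly low multilinear-rank tensors.
import numpy as np

def unfold(X, n):
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def mode_n_product(X, B, n):
    shape = (B.shape[0],) + X.shape[:n] + X.shape[n+1:]
    return np.moveaxis((B @ unfold(X, n)).reshape(shape), 0, n)

def multiway(X, Phi, skip=None):
    for n, P in enumerate(Phi):
        if n != skip:
            X = mode_n_product(X, P, n)
    return X

rng = np.random.default_rng(0)
I, R, M = (30, 40, 20), (3, 4, 2), (6, 8, 5)      # R_n <= M_n <= I_n

X = rng.standard_normal(R)                         # random Tucker tensor
for n in range(3):
    X = mode_n_product(X, rng.standard_normal((I[n], R[n])), n)

Phi = [rng.standard_normal((M[n], I[n])) for n in range(3)]
Y = [multiway(X, Phi, skip=n) for n in range(3)]   # eqn. (3)
W = multiway(X, Phi)                               # eqn. (4)

X_hat = W
for n in range(3):
    Zn = unfold(Y[n], n) @ np.linalg.pinv(unfold(W, n))   # Y_n(n) W_(n)^+
    X_hat = mode_n_product(X_hat, Zn, n)                  # eqn. (11)
print(np.linalg.norm(X - X_hat) / np.linalg.norm(X))      # ~ machine precision
```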

III-A Multi-way Measurements Via Linear Projections Applied to Only Two Selected Modes

It is interesting to note that all multi-way measurements $\underline{\mathbf{Y}}_n$ defined in eqn. (3) ($n = 1, 2, \ldots, N$), as well as the core $\underline{\mathbf{W}}$ of eqn. (4), can be computed from linear measurements taken only in two selected modes $n_1$ and $n_2$ out of $N$. To show this, suppose that we have at our disposal the linear measurements in modes $n_1$ and $n_2$ given by $\underline{\mathbf{Z}}_{n_1} = \underline{\mathbf{X}} \times_{n_1} \mathbf{\Phi}_{n_1}$ and $\underline{\mathbf{Z}}_{n_2} = \underline{\mathbf{X}} \times_{n_2} \mathbf{\Phi}_{n_2}$; then, it is easy to see that every multi-way measurement $\underline{\mathbf{Y}}_n$ with $n \neq n_1$ can be obtained by multiplying $\underline{\mathbf{Z}}_{n_1}$ by the sensing matrices $\mathbf{\Phi}_m$ in all remaining modes $m \notin \{n, n_1\}$, while $\underline{\mathbf{Y}}_{n_1}$ is obtained analogously from $\underline{\mathbf{Z}}_{n_2}$.

For example, in the important particular case of hyperspectral images, sometimes it is easier to take compressive measurements of columns (mode-1) and rows (mode-2) of a 3rd-order tensor $\underline{\mathbf{X}} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$ by using the corresponding matrices $\mathbf{\Phi}_1$ and $\mathbf{\Phi}_2$ as follows:

$\underline{\mathbf{Z}}_1 = \underline{\mathbf{X}} \times_1 \mathbf{\Phi}_1, \qquad \underline{\mathbf{Z}}_2 = \underline{\mathbf{X}} \times_2 \mathbf{\Phi}_2. \qquad (15)$

It is easy to see that the multi-way measurements required by our method can be obtained from these two sets of linear projections, for example, by noting that $\underline{\mathbf{Y}}_1 = \underline{\mathbf{Z}}_2 \times_3 \mathbf{\Phi}_3$, $\underline{\mathbf{Y}}_2 = \underline{\mathbf{Z}}_1 \times_3 \mathbf{\Phi}_3$, $\underline{\mathbf{Y}}_3 = \underline{\mathbf{Z}}_1 \times_2 \mathbf{\Phi}_2$, and that tensor $\underline{\mathbf{W}}$ can be obtained using $\underline{\mathbf{W}} = \underline{\mathbf{Y}}_n \times_n \mathbf{\Phi}_n$ (for any $n$). In this case, matrix $\mathbf{\Phi}_3 \in \mathbb{R}^{M_3 \times I_3}$ is not a real sensing matrix, and its rank must be chosen in order to capture the numerical rank of the dataset in its 3rd mode, i.e. $M_3 \ge R_3$. In Algorithm 1, the steps to reconstruct a tensor from the linear projections defined in equation (15) are summarized.

0:  Require: Linear projections $\underline{\mathbf{Z}}_1$, $\underline{\mathbf{Z}}_2$; sensing matrices $\mathbf{\Phi}_n$ ($n = 1, 2, 3$)
0:  Ensure: Tensor reconstruction $\hat{\underline{\mathbf{X}}}$
1:  $\mathbf{Y}_{3(2)} = \mathbf{\Phi}_2\mathbf{Z}_{1(2)}$; {mode-2 unfolding of $\underline{\mathbf{Z}}_1$}
2:  $\underline{\mathbf{Y}}_3 \leftarrow \mathbf{Y}_{3(2)}$; {tensorization}
3:  $\mathbf{Y}_{2(3)} = \mathbf{\Phi}_3\mathbf{Z}_{1(3)}$; {mode-3 unfolding of $\underline{\mathbf{Z}}_1$}
4:  $\underline{\mathbf{Y}}_2 \leftarrow \mathbf{Y}_{2(3)}$; {tensorization}
5:  $\mathbf{Y}_{1(3)} = \mathbf{\Phi}_3\mathbf{Z}_{2(3)}$; {mode-3 unfolding of $\underline{\mathbf{Z}}_2$}
6:  $\underline{\mathbf{Y}}_1 \leftarrow \mathbf{Y}_{1(3)}$; {tensorization}
7:  Compute the mode-$n$ unfoldings $\mathbf{Y}_{n(n)}$ ($n = 1, 2, 3$)
8:  $\underline{\mathbf{W}} = \underline{\mathbf{Y}}_n \times_n \mathbf{\Phi}_n$; {for any $n = 1, 2$ or $3$}
9:  Compute the pseudo-inverses $\mathbf{W}_{(n)}^{\dagger}$ of the unfolding matrices ($n = 1, 2, 3$)
10:  $\hat{\underline{\mathbf{X}}} = \underline{\mathbf{W}} \times_1 \mathbf{Y}_{1(1)}\mathbf{W}_{(1)}^{\dagger} \times_2 \mathbf{Y}_{2(2)}\mathbf{W}_{(2)}^{\dagger} \times_3 \mathbf{Y}_{3(3)}\mathbf{W}_{(3)}^{\dagger}$
11:  return  $\hat{\underline{\mathbf{X}}}$
Algorithm 1 : Reconstruction of tensor $\underline{\mathbf{X}}$ from linear projections taken in mode-1 and mode-2 (as defined in eqn. (15))
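A compact Python version of Algorithm 1 might read as follows (our sketch; helper names are hypothetical, and the threshold $\tau$ anticipates the stabilized formula of Section IV, with $\tau = 0$ giving the exact pseudo-inverse):

```python
# Algorithm 1 (sketch): reconstruction from mode-1 and mode-2 projections.
import numpy as np

def unfold(X, n):
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def mode_n_product(X, B, n):
    shape = (B.shape[0],) + X.shape[:n] + X.shape[n+1:]
    return np.moveaxis((B @ unfold(X, n)).reshape(shape), 0, n)

def tpinv(A, tau=0.0):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_inv = np.zeros_like(s)
    s_inv[s > tau] = 1.0 / s[s > tau]
    return Vt.T @ (s_inv[:, None] * U.T)

def reconstruct(Z1, Z2, Phi1, Phi2, Phi3, tau=0.0):
    """Z1 = X x_1 Phi1 and Z2 = X x_2 Phi2, as in eqn. (15)."""
    Y1 = mode_n_product(Z2, Phi3, 2)          # Y1 = X x_2 Phi2 x_3 Phi3
    Y2 = mode_n_product(Z1, Phi3, 2)          # Y2 = X x_1 Phi1 x_3 Phi3
    Y3 = mode_n_product(Z1, Phi2, 1)          # Y3 = X x_1 Phi1 x_2 Phi2
    W = mode_n_product(Y3, Phi3, 2)           # W = Y3 x_3 Phi3
    X_hat = W
    for n, Yn in enumerate((Y1, Y2, Y3)):
        Zn = unfold(Yn, n) @ tpinv(unfold(W, n), tau)
        X_hat = mode_n_product(X_hat, Zn, n)  # eqns. (11)/(17)
    return X_hat
```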

IV Stable Reconstructions of Approximately Low Multilinear-Rank Tensors

The reconstruction formula of eqn. (11) is exact for the case where the tensor has multilinear-rank-$(R_1, R_2, \ldots, R_N)$. However, in real-world applications, signals usually do not have low multilinear-rank, but they admit good low multilinear-rank approximations. Thus, a more realistic model for a signal is $\underline{\mathbf{X}} = \underline{\mathbf{X}}_0 + \underline{\mathbf{E}}$, where $\underline{\mathbf{X}}_0$ has exact multilinear-rank-$(R_1, R_2, \ldots, R_N)$ and $\underline{\mathbf{E}}$ is a tensor error with sufficiently small norm $\|\underline{\mathbf{E}}\|$.

In this case, one may ask about the optimal choice of the sensing matrices in order to provide the best low multilinear-rank reconstruction. In the matrix case ($N = 2$), it is not difficult to see that, if the sensing matrices are constructed using the first $R$ singular vectors in each mode, then the obtained reconstruction is optimal (best low-rank approximation). To be more specific, given the SVD of the data matrix $\mathbf{X} = \mathbf{U}\mathbf{S}\mathbf{V}^T$, let $\mathbf{U}_R \in \mathbb{R}^{I_1 \times R}$ and $\mathbf{V}_R \in \mathbb{R}^{I_2 \times R}$ contain the first $R$ left and right singular vectors, respectively, and let $\mathbf{S}_R \in \mathbb{R}^{R \times R}$ be the diagonal matrix containing the first $R$ singular values on its main diagonal in decreasing order. If we define the sensing matrices as $\mathbf{\Phi}_1 = \mathbf{U}_R^T$ and $\mathbf{\Phi}_2 = \mathbf{V}_R^T$, then the reconstruction is given by:

$\hat{\mathbf{X}} = \mathbf{Y}_1\mathbf{W}^{\dagger}\mathbf{Y}_2 = \mathbf{U}_R\mathbf{S}_R\mathbf{V}_R^T, \qquad (16)$

where $\mathbf{U}_R\mathbf{S}_R\mathbf{V}_R^T$ is, by definition, the truncated SVD, which is the best low-rank approximation.
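This optimality is straightforward to verify numerically (our sketch; in 2D, eqn. (11) reduces to $\hat{\mathbf{X}} = \mathbf{Y}_1\mathbf{W}^{\dagger}\mathbf{Y}_2$):

```python
# SVD-based sensing reproduces the best rank-R approximation exactly.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 40))
R = 5
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Phi1, Phi2 = U[:, :R].T, Vt[:R, :]        # rows = leading singular vectors

Y1 = X @ Phi2.T                           # mode-2 measurements
Y2 = Phi1 @ X                             # mode-1 measurements
W = Phi1 @ X @ Phi2.T                     # equals diag(s[:R]) here

X_hat = Y1 @ np.linalg.pinv(W) @ Y2
X_best = U[:, :R] @ np.diag(s[:R]) @ Vt[:R, :]   # truncated SVD
assert np.allclose(X_hat, X_best)
```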

However, in practice we do not know the singular vectors, because the original dataset is not available, so we need to use sensing matrices that are independent of the dataset, for example, by generating them randomly, which will give us sub-optimal reconstructions, i.e. $\|\underline{\mathbf{X}} - \hat{\underline{\mathbf{X}}}\| \ge \|\underline{\mathbf{E}}\|$.

In particular, we say that a reconstruction method is stable if the obtained error is comparable to the input error, i.e. $\|\underline{\mathbf{X}} - \hat{\underline{\mathbf{X}}}\| \le K\|\underline{\mathbf{E}}\|$ for some moderate constant $K > 0$. As we will show in this section, the formula given by eqn. (11) may suffer from an unstable behavior, especially in the matrix case ($N = 2$), i.e. it can generate large output errors even when the input error is small. We will show that we can avoid this unstable behavior by using the truncated pseudo-inverse, as defined in Section II-D, instead of the exact MP pseudo-inverse. In other words, we define the modified reconstruction formula as follows:

$\hat{\underline{\mathbf{X}}}_\tau = \underline{\mathbf{W}} \times_1 \mathbf{Y}_{1(1)}\mathbf{W}_{(1)}^{\dagger_\tau} \times_2 \mathbf{Y}_{2(2)}\mathbf{W}_{(2)}^{\dagger_\tau} \cdots \times_N \mathbf{Y}_{N(N)}\mathbf{W}_{(N)}^{\dagger_\tau}, \qquad (17)$

where $\mathbf{W}_{(n)}^{\dagger_\tau}$ is the truncated pseudo-inverse of matrix $\mathbf{W}_{(n)}$ (with threshold parameter $\tau$). It is noted that, when $\tau = 0$, eqn. (17) is equivalent to eqn. (11).

In the next sections we derive theoretical upper bounds for the reconstruction errors for $N = 2$ (matrix case) and for a particular case of a 3D tensor ($N = 3$), where the same 2D sensor is applied to every frontal slice. We also analyze under which conditions the parameter $\tau$ can be chosen in order to provide the minimum upper bound (optimal value $\tau_{\text{opt}}$).

IV-A Error Upper Bound for the 2D case ($N = 2$)

The following theorem provides an upper bound for the reconstruction error of eqn. (17) and shows that the error bound approaches zero as $\|\mathbf{E}\| \to 0$ if $\tau = 0$ or $\tau = \tau_{\text{opt}}$ (i.e., the bound is proportional to the best approximation error $\|\mathbf{E}\|$).

Theorem IV.1.

Let matrix $\mathbf{X} \in \mathbb{R}^{I_1 \times I_2}$ be approximated by a rank-$R$ matrix $\mathbf{X}_0$, i.e. $\mathbf{X} = \mathbf{X}_0 + \mathbf{E}$, and, given sensing matrices $\mathbf{\Phi}_n \in \mathbb{R}^{M_n \times I_n}$ ($R \le M_n \le I_n$, $\|\mathbf{\Phi}_n\| = 1$, $n = 1, 2$), such that $\mathbf{W}_0 = \mathbf{\Phi}_1\mathbf{X}_0\mathbf{\Phi}_2^T$ is full-rank (rank $R$), then the following error upper bound holds:

$\|\mathbf{X} - \hat{\mathbf{X}}_\tau\| \le C_1\|\mathbf{E}\| + C_2\,\tau + C_3\,\frac{\|\mathbf{E}\|^2}{\max(\tau, \sigma_R)}, \qquad (18)$

where $\sigma_R$ is the $R$-th (smallest nonzero) singular value of the matrix $\mathbf{W}_0$ and the constants $C_1$, $C_2$ and $C_3$ are defined below.

Proof.

Let $\mathbf{X}_0 = \mathbf{U}_R\mathbf{S}_R\mathbf{V}_R^T$ be the truncated SVD of $\mathbf{X}_0$, and let the factors $\bar{\mathbf{A}}_1 = \mathbf{U}_R(\mathbf{\Phi}_1\mathbf{U}_R)^{\dagger}$ and $\bar{\mathbf{A}}_2 = \mathbf{V}_R(\mathbf{\Phi}_2\mathbf{V}_R)^{\dagger}$ be defined as in eqn. (12); then we obtain

$\mathbf{X} = \bar{\mathbf{A}}_1\mathbf{W}_0\bar{\mathbf{A}}_2^T + \mathbf{E}. \qquad (19)$

The mode-1 measurement matrix is $\mathbf{Y}_1 = \mathbf{X}\mathbf{\Phi}_2^T = \bar{\mathbf{A}}_1\mathbf{W}_0 + \mathbf{E}_1$, with $\|\mathbf{E}_1\| = \|\mathbf{E}\mathbf{\Phi}_2^T\| \le \|\mathbf{E}\|$ (where we assumed that $\|\mathbf{\Phi}_2\| = 1$), and, using a similar analysis for mode-2, we obtain that $\mathbf{Y}_2 = \mathbf{\Phi}_1\mathbf{X} = \mathbf{W}_0\bar{\mathbf{A}}_2^T + \mathbf{E}_2$, with $\|\mathbf{E}_2\| \le \|\mathbf{E}\|$.

Using the property (8) of the truncated pseudo-inverse, the reconstructed matrix becomes:

$\hat{\mathbf{X}}_\tau = \mathbf{Y}_1\mathbf{W}^{\dagger_\tau}\mathbf{Y}_2, \qquad (20)$

with $\mathbf{W} = \mathbf{\Phi}_1\mathbf{X}\mathbf{\Phi}_2^T$. Now, by inserting the expressions for $\mathbf{Y}_1$ and $\mathbf{Y}_2$ into eqn. (20) we obtain:

$\hat{\mathbf{X}}_\tau = (\bar{\mathbf{A}}_1\mathbf{W}_0 + \mathbf{E}_1)\,\mathbf{W}^{\dagger_\tau}\,(\mathbf{W}_0\bar{\mathbf{A}}_2^T + \mathbf{E}_2). \qquad (21)$

Assuming that $\mathbf{W}\mathbf{W}^{\dagger_\tau}\mathbf{W} = \mathbf{W} + \mathbf{E}_\tau$, with $\mathbf{E}_\tau = \mathbf{0}$ if $\tau < \sigma_{\min}(\mathbf{W})$ and $\|\mathbf{E}_\tau\| \le \tau$ if $\tau \ge \sigma_{\min}(\mathbf{W})$ (see Section II-D), and using $\mathbf{W} = \mathbf{W}_0 + \mathbf{B}$, with $\mathbf{B} = \mathbf{\Phi}_1\mathbf{E}\mathbf{\Phi}_2^T$, and equation (19), we obtain:

$\mathbf{X} - \hat{\mathbf{X}}_\tau = \mathbf{E} - \bar{\mathbf{A}}_1\bigl(\mathbf{E}_\tau + \mathbf{B} - \mathbf{W}\mathbf{W}^{\dagger_\tau}\mathbf{B} - \mathbf{B}\mathbf{W}^{\dagger_\tau}\mathbf{W} + \mathbf{B}\mathbf{W}^{\dagger_\tau}\mathbf{B}\bigr)\bar{\mathbf{A}}_2^T - \bar{\mathbf{A}}_1\mathbf{W}_0\mathbf{W}^{\dagger_\tau}\mathbf{E}_2 - \mathbf{E}_1\mathbf{W}^{\dagger_\tau}\mathbf{W}_0\bar{\mathbf{A}}_2^T - \mathbf{E}_1\mathbf{W}^{\dagger_\tau}\mathbf{E}_2. \qquad (22)$

If we apply the spectral norm to the last matrix equation, and use that $\|\mathbf{W}\mathbf{W}^{\dagger_\tau}\| \le 1$, $\|\mathbf{B}\| \le \|\mathbf{E}\|$ and $\|\mathbf{W}^{\dagger_\tau}\| \le 1/\max(\tau, \sigma_R)$, we finally obtain

$\|\mathbf{X} - \hat{\mathbf{X}}_\tau\| \le C_1\|\mathbf{E}\| + C_2\,\tau + C_3\,\frac{\|\mathbf{E}\|^2}{\max(\tau, \sigma_R)}, \qquad (23)$

where the constants are identified as follows:

$C_1 = 1 + \|\bar{\mathbf{A}}_1\| + \|\bar{\mathbf{A}}_2\| + 3\|\bar{\mathbf{A}}_1\|\|\bar{\mathbf{A}}_2\|, \qquad (24)$
$C_2 = \|\bar{\mathbf{A}}_1\|\|\bar{\mathbf{A}}_2\|, \qquad (25)$
$C_3 = (1 + \|\bar{\mathbf{A}}_1\|)(1 + \|\bar{\mathbf{A}}_2\|). \qquad (26)$
∎

Fig. 2: Matrix case ($N = 2$): illustration of the two possible shapes of the error upper bound as a function of $\tau$, depending on the magnitude of the singular value $\sigma_R$. Typical values of the parameters were chosen as in the digital image example of Fig. 3.

It is important to note that the obtained theoretical bound is not tight in general, because it is based on matrix inequalities that are usually not tight, such as the triangle inequality $\|\mathbf{A} + \mathbf{B}\| \le \|\mathbf{A}\| + \|\mathbf{B}\|$, among others. However, it is still useful since:

  1. The exact case ($\underline{\mathbf{E}} = \underline{\mathbf{0}}$) is not realistic, since real-world datasets never have exactly low multilinear rank, so a theoretical bound, even if it is not tight, is needed to characterize the reconstruction behavior in real applications;

  2. The bound, as a function of $\|\mathbf{E}\|$, is asymptotically linear, i.e. of order $O(\|\mathbf{E}\|)$. So the more precise the model is (smaller $\|\mathbf{E}\|$), the better the reconstruction, with error of the same order, which is intuitive but can only be proved by a theoretical bound. In particular, it proves that the bound approaches zero as the best low-rank approximation error $\|\mathbf{E}\| \to 0$ (for $\tau = 0$ or $\tau = \tau_{\text{opt}}$), and that perfect reconstruction is obtained when the signal has rank $R$, as was already proved in Theorem III.1;

IV-B Optimal selection of the threshold parameter $\tau$

A quite important question arises here: what is the optimal value of the threshold parameter $\tau$? Since the objective is to obtain a reconstruction error as small as possible, we can try to minimize the upper bound and hope that the theoretical bound attains its minimum at a point close to the minimum of the actual error. After a careful look at eqn. (18) as a function of $\tau$, if we define

$\sigma^* = \|\mathbf{E}\|\sqrt{C_3/C_2}, \qquad (27)$

we can identify two different cases:

  • CASE I (small $\sigma_R$): When $\sigma_R < \sigma^*$, the error bound is a convex function of $\tau$ attaining its global minimum at $\tau_{\text{opt}} = \sigma^*$. This means that we should use $\tau = \sigma^*$. Note that the error bound for the original reconstruction formula (11) corresponds to the case of setting $\tau = 0$, which gives us a larger error bound (see Fig. 2 (top)).

  • CASE II (large $\sigma_R$): On the other hand, when $\sigma_R \ge \sigma^*$, the best choice is to set $\tau = 0$, which corresponds to using the original reconstruction formula of eqn. (11), i.e., with the exact MP pseudo-inverse (see Fig. 2 (bottom)).

However, it is important to note that, in practice, we are not able to compute the optimal parameter $\tau_{\text{opt}}$, because we do not know the matrices $\bar{\mathbf{A}}_1$ and $\bar{\mathbf{A}}_2$, which depend on the SVD of the original (unknown) signal. However, we may use a rough, overestimated approximation of this parameter by assuming that $C_3 \approx C_2$ and using eqn. (27), which provides us with the following rough estimation:

$\sigma^* \approx \|\mathbf{E}\|. \qquad (28)$

But the problem still remains challenging, since usually we do not know exactly the error of the best low-rank approximation, $\|\mathbf{E}\|$.

IV-C Error Upper Bound for a particular 3D case ($N = 3$)

In this section we consider the recovery of a 3rd-order tensor $\underline{\mathbf{X}} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$, for the particular and very important case where the same sensing operator is applied to every frontal slice of the tensor, as considered in hyperspectral CS imaging or video CS [11]. In other words, we assume that the mode-3 sensing matrix is the identity matrix (i.e. $\mathbf{\Phi}_3 = \mathbf{I} \in \mathbb{R}^{I_3 \times I_3}$), thus the following multi-way projections are available: $\underline{\mathbf{Y}}_1 = \underline{\mathbf{X}} \times_2 \mathbf{\Phi}_2$, $\underline{\mathbf{Y}}_2 = \underline{\mathbf{X}} \times_1 \mathbf{\Phi}_1$ and $\underline{\mathbf{Y}}_3 = \underline{\mathbf{X}} \times_1 \mathbf{\Phi}_1 \times_2 \mathbf{\Phi}_2$.