A Nonconvex Low-Rank Tensor Completion Model for Spatiotemporal Traffic Data Imputation

# A Nonconvex Low-Rank Tensor Completion Model for Spatiotemporal Traffic Data Imputation

## Abstract

Sparsity and missing data problems are very common in spatiotemporal traffic data collected from various sensing systems. Making accurate imputation is critical to many applications in intelligent transportation systems. In this paper, we formulate the missing data imputation problem in spatiotemporal traffic data in a low-rank tensor completion (LRTC) framework and define a novel truncated nuclear norm (TNN) on traffic tensors of locationdaytime of day. In particular, we introduce an universal rate parameter to control the degree of truncation on all tensor modes in the proposed LRTC-TNN model, and this allows us to better characterize the hidden patterns in spatiotemporal traffic data. Based on the framework of the Alternating Direction Method of Multipliers (ADMM), we present an efficient algorithm to obtain the optimal solution for each variable. We conduct numerical experiments on four spatiotemporal traffic data sets, and our results show that the proposed LRTC-TNN model outperforms many state-of-the-art imputation models with missing rates/patterns. Moreover, the proposed model also outperforms other baseline models in extreme missing scenarios.

###### keywords:
Spatiotemporal traffic data, Missing data imputation, Low-rank tensor completion, Truncated nuclear norm (TNN) minimization, Nonconvex optimization
1\newdefinition

definitionDefinition

## 1 Introduction

Spatiotemporal traffic data, which registers time-stamped traffic state (e.g., flow and speed) observations from different locations (e.g., a network of sensors), serves as a critical input to a wide range of applications in intelligent transportation systems (ITS), such as travel time estimation, trip planning, traffic forecasting, to name but a few. With the recent development in information and communication technology (ICT), the scale and dimension of spatiotemporal traffic data also become larger. In the meanwhile, however, these data sets also suffer from the missing data problem, which undermines their utility and effectiveness in real-world applications. In practice, the missing data problem may arise from sensor malfunctioning, communication failure, and maintenance. Another critical reason that causes the missing data problem is due to the insufficient sensor coverage in both the spatial and the temporal dimensions. For example, floating cars in an urban transportation network can provide a good sample of traffic information in real-time. However, the data itself is in nature sparse and far from enough to support city-wide and fine-resolution traffic speed and travel time monitoring. Therefore, performing imputation on spatiotemporal traffic data has become a critical step before further applications. Recent research has shown an increasing interest in developing efficient and effective missing data imputation methods for large-scale and high-dimensional traffic data (see e.g., Zhu et al., 2012; Li et al., 2013; Asif et al., 2016; Tan et al., 2016; Chen et al., 2019a, b; Yu et al., 2020).

Incomplete spatiotemporal traffic data are multivariate/multidimensional time series with various missing patterns and missing ratios. Essentially, the missing patterns can be summarized into two types: random missing and non-random missing. Figure 1 presents two intuitive examples of missing patterns (i.e., random missing and non-random missing) in a spatiotemporal setting. The fluctuation of the power grid and the loss of the transmission packets can lead to data random missing scenarios in which individual data points are randomly lost. While the sensor malfunctioning and regular maintenance may cause non-random data missing scenarios in which we have correlation corruption for a certain period of time. It has been demonstrated that non-random missing patterns are more difficult to deal with than random missing due to the correlated corruption. It is very challenging to impute the incomplete traffic data under non-random or mixture missing patterns that data may be lost for several hours or even several days.

The fundamental challenge of missing data imputation is to effectively borrow information from those “observed” entries by characterizing the higher-order correlations and dependencies in the data. To achieve this, recent research has shown an increasing interest in taking advantage of the low-rank property of spatiotemporal data, which allows us to apply techniques such as compressive sensing, principal component analysis (PCA), and matrix/tensor factorization. For example, Zhu et al. (2012) imposed the low-rank assumption on a traffic state matrix from a list of road segments over certain time and transforms missing data imputation to a matrix completion problem; based on probabilistic PCA, Li et al. (2013) developed an improved model to better characterize the nonlinear spatiotemporal dependencies; based on probe vehicle data, Yu et al. (2020) developed a city-wide traffic data estimation framework by performing matrix completion using Schatten -norm. To better capture the correlations and utilize the global structure across different days, Asif et al. (2016), Tan et al. (2016) and Chen et al. (2018) model the temporal dimension as the combination of day and time of day, and then apply tensor factorization techniques to impute missing values. Given that most traffic data sets are inherently low-rank, these low rank-based models have demonstrated superior performance over other methods.

In the meanwhile, substantial research progress has been made in developing new learning methods for low-rank tensor completion (LRTC). Given that the rank operator is neither convex nor continuous, the nuclear norm (NN) is introduced as a convex surrogate for the rank function to be minimized. LRTC with a NN function has attracted considerable attention recently, and many variants have been developed in the past few years (see e.g., Liu et al., 2013). However, despite the remarkable performance of NN in matrix/tensor completion, it is also suggested that the convex NN relaxation may not be sufficient to approximate the rank, as all the singular values are simultaneously minimized (Hu et al., 2013). It has been proved that solving the convex NN regularized problem leads to a near-optimal low-rank solution if we impose certain incoherence assumptions on the singular values of the matrix (Candès and Tao, 2010); however, such conditions are rarely met in practice. As a result, recent research has tried to identify alternative approximation to the rank function, and some studies have shown the advantage of using nonconvex surrogate functions to approximate the rank for matrices (see e.g., Hu et al., 2013; Lu et al., 2014, 2015; Yao et al., 2019; Fan et al., 2019). For example, Lu et al. (2014) and Yao et al. (2019) evaluated several nonconvex singular value functions for approximating the rank function on low-rank matrix learning, and it is shown that nonconvex surrogate regularization can work better than NN. Recent studies have also extended matrix TNN to a tensor setting (e.g., Huang et al., 2014; Han et al., 2017; Xue et al., 2018), showing that nonconvex TNN minimization is more capable than convex NN minimization on tensor completion tasks. However, these studies mainly focus on an image setting (e.g., a tensor of 2562563 ()) in which , and it remains a critical challenge to properly define the degree of truncation on each tensor mode in the case of spatiotemporal traffic data without prior knowledge. To solve this problem, we propose a new TNN model in which we automatically determine the nuclear norm truncation along each mode of a given data tensor by introducing a global rate parameter. Our contribution is twofold:

• We develop an imputation model that benefits from our domain knowledge. Following the nonconvex TNN minimization on matrix learning problems, we propose a TNN minimization based LRTC model with elegantly defined TNN on tensors and apply an efficient optimization algorithm—Alternating Direction Method of Multipliers (ADMM)—to solve this model. The truncation along all tensor modes can be automatically determined by a rate parameter in this model, and this technical operation is more flexible than what given in previous LRTC-TNN models.

• We show the capability and advantage of the proposed model with extensive data sets and imputation experiments. Relying on the four selected spatiotemporal traffic data sets, we demonstrate the superiority (e.g., accuracy, efficiency) of the proposed model with various missing rates/patterns over the state-of-the-art imputation models. In a very unusual case that data have a large number of missing values (e.g., missing rate is 70%), we also show that the proposed model provides accurate imputation results and outperforms the competing models.

The remainder of this paper is organized as follows. In Section 2, we first present the basic formulation of low-rank tensor completion (LRTC), and then we propose a new formulation based on truncated nuclear norm (TNN) minimization by introducing a universal rate parameter to control the degree of truncation. Section 3 depicts the main pipeline for model implementation and application. In Section 4, we conduct numerical experiments on several real-world data sets under two aforementioned missing scenarios, and we evaluate the proposed LRTC-TNN model against several state-of-the-art baseline models. Section 5 summarizes this study and we discuss several directions for future research.

## 2 Methodology

In this section, we first introduce the basic formulation of the LRTC model. Then, we introduce the LRTC with truncated nuclear norm (TNN) minimization in detail. In particular, we define the formulation of TNN on tensors and integrate it into LRTC.

### 2.1 Notations

Throughout this work, we follow the same notations as in Kolda and Bader (2009). We use boldface uppercase letters to denote matrices, e.g., , boldface lowercase letters to denote vectors, e.g., , and lowercase letters to denote scalars, e.g., . Given a matrix , the Frobenius norm is defined as . We denote a th-order tensor and its entries by and , respectively. Let denote the th-mode unfolding of tensor for . Correspondingly, we define a folding operator that converts a matrix to a higher-order tensor in the th-mode as . Thus, we have .

### 2.2 Low-Rank Tensor Completion (LRTC)

LRTC is a family of tensor completion techniques. This type of machine learning model is essentially built on the low-rank assumption on the partially observed input tensor, which is same to the low-rank matrix completion. In this work, we mainly focus on modeling a third-order tensor in the context of spatiotemporal traffic data. For a partially observed third-order tensor , the LRTC model can be formulated as

 minX rank(X) (1) s.t. PΩ(X)=PΩ(Y),

where is the recovered tensor that we hope to find, is the index set of the observed entries (Liu et al., 2013). The notation refers to the algebraic rank, and extension to higher-order tensors is straightforward. The operator is an orthogonal projection supported on :

 [PΩ(X)]mnt={xmnt,if(m,n,t)∈Ω,0,otherwise,

for any tensor . The operator denotes the projection onto the complementary set of . The relationship between these two operators is .

The rank minimization problem in Eq. (1) for tensor completion is NP-hard and computationally intractable (Liu et al., 2013). To deal with this issue, recent research focuses on identifying a possible convex relaxation as an alternative to the rank minimization problem. For example, Liu et al. (2013) proposed to use the multiple NN to replace the rank function in the objective function:

 minX 3∑k=1αk∥X(k)∥∗ (2) s.t. PΩ(X)=PΩ(Y),

where for are weight parameters. In this objective function, for any matrix , the NN is defined as and is the th biggest singular value of . In an early study, Fazel (2002) proved that the standard NN is the convex envelope of the nonconvex rank function, i.e., , when the largest singular value is not larger than 1.

Despite that the NN minimization method has achieved tremendous progress and remarkable success in missing matrix/tensor data imputation tasks in previous studies, recent studies suggest that the result can be significantly improved by using certain nonconvex functions constructed on singular values instead of the default NN (Hu et al., 2013; Gu et al., 2014; Yao et al., 2019). For example, instead of minimizing all singular values of a matrix simultaneously, TNN minimization keeps large singular values unchanged to preserve major components and only consider those small singular values as variables. In this work, our preliminary goal is to define a type of TNN on tensors to replace the corresponding NN in Eq. (2).

### 2.3 LRTC with TNN Minimization (LRTC-TNN)

Before introducing the formulation of TNN on tensors, we first give a commonly used definition for the TNN on matrices.

{definition}

[Truncated Nuclear Norm on Matrices (Zhang et al., 2012; Hu et al., 2013)] Given a matrix and a positive integer , TNN is defined as the sum of minimum singular values, i.e.,

 ∥X∥r,∗=min{m,n}∑i=r+1σi(X), (3)

where is the th singular value of . The singular values are sorted as .

In Definition 2.3, the largest singular values do not contribute to the TNN function. This truncation definition makes it possible to minimize the NN subtracted by the sum of the largest singular values rather than all singular values simultaneously. However, this definition on matrices cannot be directly used for multiway/multidimensional tensors. In accordance with TNN on matrices, we follow Liu et al. (2013) and propose to use tensor unfoldings (i.e., matrices) to define the TNN on tensors.

{definition}

[Truncated Nuclear Norm on Tensors] For any th-order tensor , the multiple TNN is defined as

 ∥X∥θ,∗=d∑k=1αk∥X(k)∥rk,∗, (4)

with the truncation for each tensor mode being

 rk=⌈θ⋅min{nk,∏h≠knh}⌉,∀k∈{1,2,...,d}, (5)

where means the smallest integer that is not less than the given value, is an universal rate parameter which controls the whole truncation on modes of and it should satisfies in this case, with are weight parameters imposed on all the TNN of unfolding matrices , respectively.

Provided Definition 2.3 as the basis of LRTC with TNN minimization, we believe that if the rate parameter can be set appropriately, the truncation for each tensor mode would be assigned automatically. By replacing the NN in Eq. (2) with the defined TNN in Eq. (4), the LRTC model is now given by

 minX 3∑k=1αk∥X(k)∥rk,∗ (6) s.t. PΩ(X)=PΩ(Y).

However, Eq. (6) is still not in its appropriate form. In fact, unfolding a tensor in different mode cannot guarantee the dependencies of variables (i.e., unfolding matrices of the tensor) in the objective function (Liu et al., 2013). To this end, we introduce an auxiliary tensor variable and an additional set of constraints to convert the problem (6) into a tractable problem as follows,

 minM,X1,X2,X3 3∑k=1αk∥Xk(k)∥rk,∗ (7) s.t. {Xk=M,k=1,2,3,PΩ(M)=PΩ(Y),

where is introduced to keep observation information and then broadcast such information to the variables . In the current formulation, tensors are involved with TNNs while another variable establishes the relationship with the partially observed tensor .

### 2.4 Solution Algorithms

To solve this optimization problem, a straightforward and widely used approach is the Alternating Direction Method of Multipliers (ADMM) framework. This framework has been experimentally proved to be efficient for low-rank matrix/tensor completion models (Hu et al., 2013; Liu et al., 2013). Before setting up an ADMM model, we need to write the augmented Lagrangian function of Eq. (7) in advance:

 (8)

where denotes the inner product.

Accordingly, ADMM transforms the original tensor completion problem to the following three subproblems in an iterative manner:

 Xl+1k: =argminXL(M,{Xl+1k,Tlk}3k=1), (9) Ml+1: =argminML(M,{Xl+1k,Tlk}3k=1), Tl+1k: =Tlk+ρk(Xl+1k−Ml+1),

where we follow the order .

In practice, if , the variables can be updated by

 Tl+1:=Tl+ρ(~Xl+1−~Ml+1), (10)

where are fourth-order tensors of the same size . and are stacked by third-order tensors s and s over the fourth mode, respectively, and is stacked over the fourth mode by coping the third-order tensor .

#### Computing Tensors Xk,k=1,2,3

Based on Eqs. (8) and (9), the subproblem for getting the optimal is

 Xl+1k: =argminX αk∥X(k)∥rk,∗+ρk2∥X(k)−Ml(k)∥2F+⟨X(k),Tlk(k)⟩ (11) =argminX αk∥X(k)∥rk,∗+ρk2∥X(k)∥2F−ρk⟨X(k),Ml(k)−1ρkTlk(k)⟩, =argminX αk∥X(k)∥rk,∗+ρk2∥X(k)−(Ml(k)−1ρkTlk(k))∥2F, =argminX G(X(k)).

In general, the optimal can be found by solving when is convex. However, for a nonconvex , a global optimal solution is not guaranteed (Hu et al., 2013) and it remains unclear whether an optimization problem like Eq. (11) can converge to the desired global optima. In this work, we adopt a closed-form optimal/sub-optimal solution to this problem by using the following theorem.

###### Theorem 1.

For any , , and where , an optimal solution to the problem

 minX α∥X∥r,∗+ρ2∥X−Z∥2F, (12)

is given by the generalized singular value thresholding

 ^X=UΣα/ρV⊤, (13)

where is the singular value decomposition of . The shrinkage of singular values is therefore defined as

 Σα/ρ=diag((σ1,⋯,σr,[σr+1−α/ρ]+,⋯,[σmin{m,n}−α/ρ]+)⊤), (14)

where are diagonal entries of , and denotes the truncation at 0 which satisfies .

###### Proof.

For any , define the function as

 fi(x)={αx+ρ2(x−z)2,if % i>r,ρ2(x−z)2,otherwise, (15)

which is differentiable, and

 ∂fi(x)∂x={α+ρ(x−z),if i>r,ρ(x−z),otherwise. (16)

When minimizing the objective function , we should find such that to optimize . To solve , we construct the singular value function referring to Larsson et al. (2014) as as where the function is defined as Eq. (15). Considering the components of the sum in separately, we have

 (17)

and this result is same as Eq. (14).

On the other hand, the TNN minimization problem described in Eq. (12) is indeed a special case of the weighted NN minimization problem, i.e.,

 minX αw⊤σ(X)+ρ2∥X−Z∥2F,

with the weights being and . As proved by Zhang and Lu (2011) (Proposition 2.1) and Chen et al. (2013) (Theorem 2.3), in the situation of (i.e., order constraint), weighted NN minimization problem has an optimal solution and the shrinkage of singular values corresponding to Eq. (14) is given by

 Σα/ρ=diag(([σ1−αw1/ρ]+,⋯,[σmin{m,n}−αwmin{m,n}/ρ]+)⊤), (18)

which is consistent with Theorem 1.

Now, according to Theorem 1, the shrinkage of singular values for the problem (11) is

 σi(X(k))=⎧⎪⎨⎪⎩[σi(Ml(k)−1ρkTlk(k))−αkρk]+,if i>rk,σi(Ml(k)−1ρkTlk(k)),otherwise, (19)

where the singular value decomposition of is and the diagonal entries of are s. We could get an optimal by

 Xl+1k=foldk(U% diag(σ(X(k)))V⊤). (20)

#### Computing Tensor M

In terms of , there is a least square solution based on Eq. (8) and (9), which is given by

 Ml+1: =argminM 3∑k=1(ρk2∥Xl+1k(k)−M(k)∥2F−⟨M(k),Tlk(k)⟩) (21) =argminM 3∑k=1(ρk2⟨Xl+1k−M,Xl+1k−M⟩−⟨M,Tlk⟩) =argminM3∑k=1(ρk2⟨M,M⟩−ρk⟨M,Xl+1k⟩−⟨M,Tlk⟩) =argminM⟨M,M⟩−2∑3k=1ρk⟨M,3∑k=1(ρkXl+1k+Tlk)⟩ =1∑3k=1ρk3∑k=1(ρkXl+1k+Tlk),

where we impose a fixed constraint, i.e., , to guarantee the transformation of observation information at each iteration.

## 3 Model Implementation

As shown in Figure 2, our goal is to impute the incomplete tensor as a complete one using the proposed LRTC-TNN model. The main procedures are: (1) adapting the raw data to a tensor structure with a specific rule, (2) imputing incomplete tensor data with our proposed model, and (3) filling in missing entries.

A practical issue in applying this tensor learning framework for spatiotemporal traffic data imputation is how to establish the rule of structuring spatiotemporal traffic data as a tensor. In this study, we assume that third-order tensor with dimensions corresponding to spatial location, day, and time interval is a typical algebraic data structure, which is experimentally proved to be efficient on a number of spatiotemporal traffic data (Chen et al., 2019b). To find a solution of the problem (7), it is sufficient to carry out the following steps (see the middle panel of Figure 2): (1) Initialize the variables and set some necessary parameters; (2) Update the variables , , and according to Eq. (20), Eq. (21), and Eq. (10), respectively.

## 4 Experiments

In this section, we conduct extensive experiments on four real-world spatiotemporal traffic data sets to evaluate the proposed LRTC-TNN model.

### 4.1 Data Sets

We choose four publicly available data sets collected from real-world transportation systems as our benchmark data sets. Table 1 provides a brief summary of these data sets. For simplicity, as shown in Table 1, we refer to the four data sets as data set ”G”, ”B”, ”H”, and ”S”, respectively. Essentially, the four data sets follow the same tensor structure–“location/sensordaytime of day” and they can be also organized in a matrix structure of “location/sensortime” by stacking the time dimension.

### 4.2 Baseline Models

We compare the proposed LRTC-TNN model with the following baseline models:

• BTMF: Bayesian Temporal Matrix Factorization (Chen and Sun, 2019). This is a fully Bayesian matrix factorization model which integrates vector autoregressive (VAR) model into the latent temporal factors. With the flexible VAR process, BTMF achieves superior accuracy over other matrix factorization models (without temporal modeling) and tensor factorization models in imputation tasks.

• TRMF: Temporal Regularized Matrix Factorization (Yu et al., 2016). This is a temporal matrix factorization model which applies a multiple autoregressive (AR) process to model latent temporal factors. BTMF and TRMF can be considered a generalized version of the Bayesian temporal tensor factorization model by Xiong et al. (2010), which imposes temporal smoothness with a dynamic model.

• BGCP: Bayesian Gaussian CP decomposition (Chen et al., 2019b). This is a fully Bayesian tensor factorization model which uses Markov chain Monte Carlo (MCMC) to learn the latent factor matrices (i.e., low-rank structure).

• BATF: Bayesian Augmented Tensor Factorization (Chen et al., 2019a). This is a fully Bayesian model built on a special tensor factorization formula, in which components include both explicit variables and latent factors (i.e., low-rank factors). Variables in this model are learned via variational Bayes.

• HaLRTC: High-accuracy Low-Rank Tensor Completion (Liu et al., 2013). This is a LRTC model which uses NN minimization to find accurate estimation for unobserved/missing entries in tensor data.

We have the following considerations on choosing the baseline models. In general, we take into account two types of imputation approaches built on the assumption of underlying low-rank structure, which has been proved to be effective and efficient in the previous studies. The first is low-rank temporal matrix factorization models like BTMF and TRMF. These models follow the matrix structure (i.e., “location/senortime”) and impose smoothness on latent temporal factors by introducing auto-regressive/dynamic assumptions. Therefore, these temporal models produce more informative and consistent low-rank structures than those estimated from standard matrix factorization. The second approach follows the tensor structure (i.e., “location/senordaytime of day”) to better utilize the information across different days. The tensor approach includes tensor factorization models (BGCP and BATF) and a standard LRTC model (HaLRTC). These models have been experimentally proved to be state-of-the-art in the literature.

### 4.3 Experiment Setting

For model evaluation, we mask certain entries of the data as missing values and then perform imputation for these “missing” entries. We use the actual values (ground truth) of these entries to compute the metrics MAPE and RMSE:

 MAPE=1nn∑i=1|yi−^yi|yi×100,RMSE= ⎷1nn∑i=1(yi−^yi)2. (22)

Following (Chen et al., 2018), we design two missing patterns—random missing (RM, see Figure 1(a)) and non-random missing (NM, see Figure 1(b)). The NM scenario is more challenging since the data is corrupted in a correlated manner. These two missing scenarios can help us better assess the performance and effectiveness of different models.

There are some parameters to be set in LRTC-TNN, including , , and . In our experiments with third-order tensors, we set and . Here, the learning rate parameter and the truncation rate parameter determine the performance directly. Table 2 shows the the value of these two parameters for each imputation experiment by using cross validation. For learning rate parameter , it determines the convergence of the whole model. Larger usually slows down the convergence process, while a smaller one would let the model meet convergence in only few iterations. The rate parameter for NM data should be set much smaller than RM data, wherein we set the smallest as 5% for these data sets. The maximum number of iteration of LRTC-TNN for these imputation experiments is set to 200. The adapted data sets and Python implementation for these experiments are publicly available at GitHub6.

### 4.4 Results

Table 3 summarizes the imputation performance of matrix factorization models (BTMF and TRMF), tensor factorization models (BGCP and BATF), and LRTC models (HaLRTC and LRTC-TNN) on the four spatiotemporal data sets. We follow the same missing rates as our previous work (Chen and Sun, 2019) and configure all matrix and tensor factorization models with the same number of factors and BTMF and TRMF with the same time lags. The number of factors of factorization models (i.e., BTMF, TRMF, BGCP, and BATF) are set to 80, 30, 50, 50 for RM data sets (G), (B), (H), and (S), respectively. For the NM scenario, we set the number of factors to 10 for all data sets. The time lag of BTMF and TRMF for all four data sets is where is the number of time intervals per day.

Overall, LRTC-TNN clearly outperforms other baseline models in diverse missing scenarios (RM and NM scenarios with varying missing rates). For all these models, we see that the MAPE/RMSE values are higher for NM scenario than RM scenario, suggesting that NM scenario is indeed more difficult to handle than the RM scenario. Another fact is that the MAPE/RMSE values become higher with increasing missing rates. Both missing pattern and missing rate have a direct impact on all these models.

For data sets (G), (H) and (S), LRTC-TNN consistently outperforms other baseline models significantly. For the Birmingham (B) parking data set, LRTC-TNN achieves the best imputation performance for the scenario of NM, while the two temporal factorization models BTMF and TRMF performs the best for the RM scenario. This is due to the strong temporal patterns and consistency in data set (B), and thus the temporal smoothness and dynamics (e.g., AR/VAR) play a more important role in capturing the true signal.

Both the two LRTC models (HaLRTC and LRTC-TNN) achieve promising performance; however, we find that LRTC-TNN consistently outperforms HaLRTC with much lower MAPE/RMSE values. On the other hand, the MAPE/RMSE values of using LRTC-TNN do not become higher dramatically as HaLRTC when missing rate increases, and this is also indicated in Figure 3. These results clearly show the advantage of using TNN minimization over NN minimization for learning low-rank tensor structures.

In general, the imputation problem becomes more and more challenging with increasing missing rate. We next investigate the performance of different models under heavy and extreme missing scenarios. In doing so, we compare LRTC-TNN with three baseline models: BTMF as a matrix factorization model, BGCP as a tensor factorization model, and HaLRTC as a low-rank completion model. Figure 3 shows the performance of the four models in both RM and NM scenarios under high missing rates (i.e., 50%, 60%, and 70%). As can be seen, LRTC-TNN achieves the lowest MAPE/RMSE values in most cases. It should be noted that HaLRTC becomes unstable in heavy missing scenarios. On the contrary, LRTC-TNN is robust and capable of addressing heavy missingness with high accuracy. The result also implies that the nonconvex TNN minimization shows superior performance over the convex NN minimization.

We next choose some examples from the four experiment data sets and visualize the NM time series with the extreme missing rate and the corresponding recovered time series by using LRTC-TNN in Figure 4. We see that LRTC-TNN can achieve very high accuracy for all the four data sets with only 30% input.

## 5 Conclusion and Future Directions

In this paper, we propose a nonconvex TNN minimization based low-rank tensor completion model—LRTC-TNN—to impute missing values in spatiotemporal traffic data. Our experiments show that our proposed model consistently outperforms other state-of-the-art imputation models on the four selected real-world traffic data sets with two representative missing patterns (i.e., random missing [RM] and non-random missing [NM]).

For future research, we propose the following directions:

• Incorporating side information and smoothness priors: Spatiotemporal data demonstrates both global consistency and local consistency (e.g., the autoregressive patterns in BTMF and TRMF). The underlying low-rank tensor is effective to capture the global consistency. To further impose local consistency and better characterize side information, we can introduce smoothness priors to the model by introducing additional cost terms or constraints into the optimization framework, such as spatial Laplacian regularization, temporal Toeplitz regularization, and total variation (Roughan et al., 2011; Yokota et al., 2016).

For instance, we can encode a priori information on the spatial structure using a graph Laplacian matrix where and are the adjacent matrix of spatial structure and the degree matrix, respectively. This allows us to impose the similarity between two connected locations by introducing the penalty where rows of any matrix correspond to spatial locations. Combining with autoregression or smoothness on temporal dimension, we can formulate a spatiotemporal regularization on the variable (see the subproblem in Eq. (21)).

• Minimizing Schatten -norm: In algebra, Schatten -norm is defined as

 ∥X∥Sp=[∑iσpi(X)]1/p, (23)

where and (Fan et al., 2019). Here, we could see that: (1) If , then Schatten -norm is the rank function, i.e., ; (2) If , then Schatten -norm is the nuclear norm. Therefore, we can find a better surrogate for the rank function by using the Schatten -norm than the NN by letting . Accordingly, truncated Schatten -norm that stems from Schatten -norm might be a potentially better minimizer than TNN. However, a critical bottleneck for Schatten -norm minimization is the nonconvex optimization, which is more difficult to solve than TNN minimization discussed in this work.

• Minimizing weighted nuclear norm: Instead of using TNN minimization, we can develop a weighted version of the NN minimization to better account for the differences/variations between singular values (Gu et al., 2014). The objective function of our problem can also be defined by a weighted version of the NN minimization:

 ∥X∥w,∗=∑iwiσi(X), (24)

where are non-negative weights assigned to the singular values . This will lead a more general nonconvex form; however, this will introduce an additional set of weight parameters and it becomes a challenge to tune these parameters for a specific data set.

## Acknowledgement

This research is supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada, the Institute for Data Valorisation (IVADO), and the Canada Foundation for Innovation (CFI).

### References

1. Asif, M. T., Mitrovic, N., Dauwels, J., Jaillet, P., 2016. Matrix and tensor based methods for missing data estimation in large traffic networks. IEEE Transactions on Intelligent Transportation Systems 17 (7), 1816–1825.
2. Candès, E. J., Tao, T., May 2010. The power of convex relaxation: Near-optimal matrix completion. IEEE Trans. Inf. Theor. 56 (5), 2053–2080.
3. Chen, K., Dong, H., Chan, K.-S., 2013. Reduced rank regression via adaptive nuclear norm penalization. Biometrika 100 (4), 901–920.
4. Chen, X., He, Z., Chen, Y., Lu, Y., Wang, J., 2019a. Missing traffic data imputation and pattern discovery with a bayesian augmented tensor factorization model. Transportation Research Part C: Emerging Technologies 104, 66 – 77.
5. Chen, X., He, Z., Sun, L., 2019b. A bayesian tensor decomposition approach for spatiotemporal traffic data imputation. Transportation Research Part C: Emerging Technologies 98, 73 – 84.
6. Chen, X., He, Z., Wang, J., 2018. Spatial-temporal traffic speed patterns discovery and incomplete data recovery via svd-combined tensor decomposition. Transportation Research Part C: Emerging Technologies 86, 59–77.
7. Chen, X., Sun, L., 2019. Bayesian temporal factorization for multidimensional time series prediction.
8. Fan, J., Ding, L., Chen, Y., Udell, M., 2019. Factor group-sparse regularization for efficient low-rank matrix recovery. In: Advances in Neural Information Processing Systems. pp. 5105–5115.
9. Fazel, M., 2002. Matrix rank minimization with applications. Ph.D. thesis, Stanford University.
10. Gu, S., Zhang, L., Zuo, W., Feng, X., June 2014. Weighted nuclear norm minimization with application to image denoising. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition. pp. 2862–2869.
11. Han, Z.-F., Leung, C.-S., Huang, L.-T., So, H. C., 2017. Sparse and truncated nuclear norm based tensor completion. Neural Processing Letters 45 (3), 729–743.
12. Hu, Y., Zhang, D., Ye, J., Li, X., He, X., Sep. 2013. Fast and accurate matrix completion via truncated nuclear norm regularization. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (9), 2117–2130.
13. Huang, L.-T., So, H.-C., Chen, Y., Wang, W.-Q., 2014. Truncated nuclear norm minimization for tensor completion. In: 2014 IEEE 8th Sensor Array and Multichannel Signal Processing Workshop (SAM). IEEE, pp. 417–420.
14. Kolda, T. G., Bader, B. W., 2009. Tensor decompositions and applications. SIAM Review 51 (3), 455–500.
15. Larsson, V., Olsson, C., Bylow, E., Kahl, F., 2014. Rank minimization with structured data patterns. In: European Conference on Computer Vision. Springer, pp. 250–265.
16. Li, L., Li, Y., Li, Z., 2013. Efficient missing data imputing for traffic flow by considering temporal and spatial dependence. Transportation research part C: emerging technologies 34, 108–120.
17. Liu, J., Musialski, P., Wonka, P., Ye, J., 2013. Tensor completion for estimating missing values in visual data. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1), 208–220.
18. Lu, C., Tang, J., Yan, S., Lin, Z., 2014. Generalized nonconvex nonsmooth low-rank minimization. In: IEEE International Conference on Computer Vision and Pattern Recognition (CVPR).
19. Lu, C., Zhu, C., Xu, C., Yan, S., Lin, Z., 2015. Generalized singular value thresholding. In: AAAI Conference on Artificial Intelligence (AAAI).
20. Roughan, M., Zhang, Y., Willinger, W., Qiu, L., 2011. Spatio-temporal compressive sensing and internet traffic matrices (extended version). IEEE/ACM Transactions on Networking 20 (3), 662–676.
21. Tan, H., Wu, Y., Shen, B., Jin, P. J., Ran, B., 2016. Short-term traffic prediction based on dynamic tensor completion. IEEE Transactions on Intelligent Transportation Systems 17 (8), 1524–9050.
22. Xiong, L., Chen, X., Huang, T.-K., Schneider, J., Carbonell, J. G., 2010. Temporal collaborative filtering with bayesian probabilistic tensor factorization. In: Proceedings of the 2010 SIAM International Conference on Data Mining. pp. 211–222.
23. Xue, S., Qiu, W., Liu, F., Jin, X., 2018. Low-rank tensor completion by truncated nuclear norm regularization. In: 2018 24th International Conference on Pattern Recognition (ICPR). IEEE, pp. 2600–2605.
24. Yao, Q., Kwok, J. T., Wang, T., Liu, T., Nov 2019. Large-scale low-rank matrix learning with nonconvex regularizers. IEEE Transactions on Pattern Analysis and Machine Intelligence 41 (11), 2628–2643.
25. Yokota, T., Zhao, Q., Cichocki, A., 2016. Smooth parafac decomposition for tensor completion. IEEE Transactions on Signal Processing 64 (20), 5423–5436.
26. Yu, H.-F., Rao, N., Dhillon, I. S., 2016. Temporal regularized matrix factorization for high-dimensional time series prediction. In: Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, I., Garnett, R. (Eds.), Advances in Neural Information Processing Systems 29. Curran Associates, Inc., pp. 847–855.
27. Yu, J., Stettler, M. E., Angeloudis, P., Hu, S., Chen, X. M., 2020. Urban network-wide traffic speed estimation with massive ride-sourcing gps traces. Transportation Research Part C: Emerging Technologies 112, 136–152.
28. Zhang, D., Hu, Y., Ye, J., Li, X., He, X., June 2012. Matrix completion by truncated nuclear norm regularization. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition. pp. 2192–2199.
29. Zhang, Y., Lu, Z., 2011. Penalty decomposition methods for rank minimization. In: Shawe-Taylor, J., Zemel, R. S., Bartlett, P. L., Pereira, F., Weinberger, K. Q. (Eds.), Advances in Neural Information Processing Systems 24. Curran Associates, Inc., pp. 46–54.
30. Zhu, Y., Li, Z., Zhu, H., Li, M., Zhang, Q., 2012. A compressive sensing approach to urban traffic estimation with probe vehicles. IEEE Transactions on Mobile Computing 12 (11), 2289–2302.
You are adding the first comment!
How to quickly get a good reply:
• Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
• Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
• Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters