A Nonconvex Low-Rank Tensor Completion Model for Spatiotemporal Traffic Data Imputation
Abstract
Sparsity and missing data problems are very common in spatiotemporal traffic data collected from various sensing systems. Making accurate imputation is critical to many applications in intelligent transportation systems. In this paper, we formulate the missing data imputation problem for spatiotemporal traffic data in a low-rank tensor completion (LRTC) framework and define a novel truncated nuclear norm (TNN) on traffic tensors of location × day × time of day. In particular, we introduce a universal rate parameter to control the degree of truncation on all tensor modes in the proposed LRTC-TNN model, and this allows us to better characterize the hidden patterns in spatiotemporal traffic data. Based on the framework of the Alternating Direction Method of Multipliers (ADMM), we present an efficient algorithm to obtain the optimal solution for each variable. We conduct numerical experiments on four spatiotemporal traffic data sets, and our results show that the proposed LRTC-TNN model outperforms many state-of-the-art imputation models under various missing rates/patterns. Moreover, the proposed model also outperforms other baseline models in extreme missing scenarios.
keywords:
Spatiotemporal traffic data, Missing data imputation, Low-rank tensor completion, Truncated nuclear norm (TNN) minimization, Nonconvex optimization
1 Introduction
Spatiotemporal traffic data, which register timestamped traffic state observations (e.g., flow and speed) from different locations (e.g., a network of sensors), serve as a critical input to a wide range of applications in intelligent transportation systems (ITS), such as travel time estimation, trip planning, and traffic forecasting, to name but a few. With recent developments in information and communication technology (ICT), the scale and dimension of spatiotemporal traffic data have also grown rapidly. Meanwhile, however, these data sets suffer from the missing data problem, which undermines their utility and effectiveness in real-world applications. In practice, the missing data problem may arise from sensor malfunctioning, communication failure, and maintenance. Another critical cause of missing data is insufficient sensor coverage in both the spatial and the temporal dimensions. For example, floating cars in an urban transportation network can provide a good sample of traffic information in real time. However, such data are sparse in nature and far from enough to support citywide and fine-resolution traffic speed and travel time monitoring. Therefore, performing imputation on spatiotemporal traffic data has become a critical step before further applications. Recent research has shown an increasing interest in developing efficient and effective missing data imputation methods for large-scale and high-dimensional traffic data (see e.g., Zhu et al., 2012; Li et al., 2013; Asif et al., 2016; Tan et al., 2016; Chen et al., 2019a, b; Yu et al., 2020).
Incomplete spatiotemporal traffic data are multivariate/multidimensional time series with various missing patterns and missing ratios. Essentially, the missing patterns can be summarized into two types: random missing and nonrandom missing. Figure 1 presents two intuitive examples of these missing patterns in a spatiotemporal setting. Power grid fluctuations and the loss of transmission packets can lead to random missing scenarios in which individual data points are lost at random, while sensor malfunctioning and regular maintenance may cause nonrandom missing scenarios in which data are corrupted in a correlated manner for a certain period of time. It has been demonstrated that nonrandom missing patterns are more difficult to deal with than random missing due to this correlated corruption. It is very challenging to impute incomplete traffic data under nonrandom or mixed missing patterns, where data may be lost for several hours or even several days.
The fundamental challenge of missing data imputation is to effectively borrow information from the "observed" entries by characterizing the higher-order correlations and dependencies in the data. To achieve this, recent research has shown an increasing interest in taking advantage of the low-rank property of spatiotemporal data, which allows us to apply techniques such as compressive sensing, principal component analysis (PCA), and matrix/tensor factorization. For example, Zhu et al. (2012) imposed the low-rank assumption on a traffic state matrix of road segments over a certain period and transformed missing data imputation into a matrix completion problem; based on probabilistic PCA, Li et al. (2013) developed an improved model to better characterize the nonlinear spatiotemporal dependencies; based on probe vehicle data, Yu et al. (2020) developed a citywide traffic data estimation framework by performing matrix completion with the Schatten norm. To better capture the correlations and utilize the global structure across different days, Asif et al. (2016), Tan et al. (2016) and Chen et al. (2018) modeled the temporal dimension as the combination of day and time of day, and then applied tensor factorization techniques to impute missing values. Given that most traffic data sets are inherently low-rank, these low-rank-based models have demonstrated superior performance over other methods.
Meanwhile, substantial research progress has been made in developing new learning methods for low-rank tensor completion (LRTC). Given that the rank operator is neither convex nor continuous, the nuclear norm (NN) is introduced as a convex surrogate for the rank function to be minimized. LRTC with an NN objective has attracted considerable attention recently, and many variants have been developed in the past few years (see e.g., Liu et al., 2013). However, despite the remarkable performance of NN in matrix/tensor completion, it has also been suggested that the convex NN relaxation may not be sufficient to approximate the rank, as all the singular values are minimized simultaneously (Hu et al., 2013). It has been proved that solving the convex NN-regularized problem leads to a near-optimal low-rank solution if certain incoherence assumptions are imposed on the singular values of the matrix (Candès and Tao, 2010); however, such conditions are rarely met in practice. As a result, recent research has tried to identify alternative approximations to the rank function, and some studies have shown the advantage of using nonconvex surrogate functions to approximate the rank for matrices (see e.g., Hu et al., 2013; Lu et al., 2014, 2015; Yao et al., 2019; Fan et al., 2019). For example, Lu et al. (2014) and Yao et al. (2019) evaluated several nonconvex singular value functions for approximating the rank in low-rank matrix learning, showing that nonconvex surrogate regularization can work better than NN. Recent studies have also extended the matrix TNN to a tensor setting (e.g., Huang et al., 2014; Han et al., 2017; Xue et al., 2018), showing that nonconvex TNN minimization is more capable than convex NN minimization on tensor completion tasks.
However, these studies mainly focus on image settings (e.g., color images modeled as tensors of size 256 × 256 × 3), and it remains a critical challenge to properly define the degree of truncation on each tensor mode in the case of spatiotemporal traffic data without prior knowledge. To solve this problem, we propose a new TNN model in which the nuclear norm truncation along each mode of a given data tensor is determined automatically by a global rate parameter. Our contribution is twofold:

We develop an imputation model that benefits from our domain knowledge. Following nonconvex TNN minimization for matrix learning problems, we propose a TNN minimization based LRTC model with a carefully defined TNN on tensors and apply an efficient optimization algorithm—Alternating Direction Method of Multipliers (ADMM)—to solve this model. The truncation along all tensor modes is determined automatically by a single rate parameter, and this makes the model more flexible than previous LRTC-TNN models.

We show the capability and advantage of the proposed model with extensive data sets and imputation experiments. Relying on four selected spatiotemporal traffic data sets, we demonstrate the superiority (e.g., accuracy, efficiency) of the proposed model over state-of-the-art imputation models under various missing rates/patterns. Even in extreme cases where data have a large number of missing values (e.g., a 70% missing rate), the proposed model still provides accurate imputation results and outperforms the competing models.
The remainder of this paper is organized as follows. In Section 2, we first present the basic formulation of low-rank tensor completion (LRTC), and then propose a new formulation based on truncated nuclear norm (TNN) minimization by introducing a universal rate parameter to control the degree of truncation. Section 3 describes the main pipeline for model implementation and application. In Section 4, we conduct numerical experiments on several real-world data sets under the two aforementioned missing scenarios and evaluate the proposed LRTC-TNN model against several state-of-the-art baseline models. Section 5 summarizes this study and discusses several directions for future research.
2 Methodology
In this section, we first introduce the basic formulation of the LRTC model. Then, we introduce the LRTC with truncated nuclear norm (TNN) minimization in detail. In particular, we define the formulation of TNN on tensors and integrate it into LRTC.
2.1 Notations
Throughout this work, we follow the same notations as in Kolda and Bader (2009). We use boldface uppercase letters to denote matrices, e.g., $X$, boldface lowercase letters to denote vectors, e.g., $x$, and lowercase letters to denote scalars, e.g., $x$. Given a matrix $X \in \mathbb{R}^{m \times n}$, the Frobenius norm is defined as $\|X\|_F = \sqrt{\sum_{i,j} x_{ij}^2}$. We denote a $d$th-order tensor and its entries by $\mathcal{X} \in \mathbb{R}^{n_1 \times \cdots \times n_d}$ and $x_{i_1 \cdots i_d}$, respectively. Let $X_{(k)} \in \mathbb{R}^{n_k \times \prod_{h \neq k} n_h}$ denote the $k$th-mode unfolding of tensor $\mathcal{X}$ for $k = 1, \ldots, d$. Correspondingly, we define a folding operator $\operatorname{fold}_k(\cdot)$ that converts a matrix back to a higher-order tensor in the $k$th mode. Thus, we have $\operatorname{fold}_k(X_{(k)}) = \mathcal{X}$.
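As a concrete illustration of these operators, the following is a minimal NumPy sketch of the mode-$k$ unfolding and its inverse folding (the exact column ordering of the unfolding varies across conventions; any consistent unfold/fold pair works for the algorithms in this paper):

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-k unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def fold(matrix, mode, shape):
    """Inverse of `unfold`: restore a tensor of the given shape."""
    full_shape = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(matrix.reshape(full_shape), 0, mode)
```

By construction, `fold(unfold(X, k), k, X.shape)` recovers `X` for every mode `k`.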
2.2 LowRank Tensor Completion (LRTC)
LRTC is a family of tensor completion techniques. This type of machine learning model is essentially built on a low-rank assumption on the partially observed input tensor, in the same spirit as low-rank matrix completion. In this work, we mainly focus on modeling a third-order tensor in the context of spatiotemporal traffic data. For a partially observed third-order tensor $\mathcal{Y} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, the LRTC model can be formulated as
(1) $\min_{\mathcal{X}} \ \operatorname{rank}(\mathcal{X}) \quad \text{s.t.} \ \mathcal{P}_{\Omega}(\mathcal{X}) = \mathcal{P}_{\Omega}(\mathcal{Y}),$
where $\mathcal{X}$ is the recovered tensor that we hope to find, and $\Omega$ is the index set of the observed entries (Liu et al., 2013). The notation $\operatorname{rank}(\cdot)$ refers to the algebraic rank, and the extension to higher-order tensors is straightforward. The operator $\mathcal{P}_{\Omega}(\cdot)$ is an orthogonal projection supported on $\Omega$:
$\left[\mathcal{P}_{\Omega}(\mathcal{X})\right]_{i_1 i_2 i_3} = \begin{cases} x_{i_1 i_2 i_3}, & (i_1, i_2, i_3) \in \Omega, \\ 0, & \text{otherwise}, \end{cases}$
for any tensor $\mathcal{X}$. The operator $\mathcal{P}_{\Omega^{\perp}}(\cdot)$ denotes the projection onto the complement of $\Omega$. The relationship between these two operators is $\mathcal{P}_{\Omega}(\mathcal{X}) + \mathcal{P}_{\Omega^{\perp}}(\mathcal{X}) = \mathcal{X}$.
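With a boolean mask encoding $\Omega$, the two projection operators reduce to elementwise selection; a minimal NumPy sketch:

```python
import numpy as np

def p_omega(tensor, mask):
    """Orthogonal projection onto the observed set Omega; `mask` is a
    boolean array with True at the observed entries."""
    return np.where(mask, tensor, 0.0)
```

The complementary projection is `p_omega(tensor, ~mask)`, and the two projections sum back to the original tensor.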
The rank minimization problem in Eq. (1) for tensor completion is NP-hard and computationally intractable (Liu et al., 2013). To deal with this issue, recent research has focused on identifying a convex relaxation as an alternative to the rank minimization problem. For example, Liu et al. (2013) proposed to use the multiple NN to replace the rank function in the objective:
(2) $\min_{\mathcal{X}} \ \sum_{k=1}^{3} \alpha_k \left\|X_{(k)}\right\|_{*} \quad \text{s.t.} \ \mathcal{P}_{\Omega}(\mathcal{X}) = \mathcal{P}_{\Omega}(\mathcal{Y}),$
where $\alpha_k$ for $k = 1, 2, 3$ are weight parameters satisfying $\sum_k \alpha_k = 1$. In this objective function, for any matrix $X \in \mathbb{R}^{m \times n}$, the NN is defined as $\|X\|_{*} = \sum_{i=1}^{\min\{m,n\}} \sigma_i(X)$, where $\sigma_i(X)$ is the $i$th largest singular value of $X$. In an early study, Fazel (2002) proved that the standard NN is the convex envelope of the nonconvex rank function, i.e., on the set of matrices whose largest singular value is not larger than 1.
Although NN minimization has achieved tremendous progress and remarkable success in matrix/tensor imputation tasks, recent studies suggest that results can be significantly improved by using certain nonconvex functions constructed on singular values instead of the default NN (Hu et al., 2013; Gu et al., 2014; Yao et al., 2019). For example, instead of minimizing all singular values of a matrix simultaneously, TNN minimization keeps the large singular values unchanged to preserve the major components and only treats the small singular values as variables. In this work, our preliminary goal is to define a type of TNN on tensors to replace the corresponding NN in Eq. (2).
2.3 LRTC with TNN Minimization (LRTCTNN)
Before introducing the formulation of TNN on tensors, we first give a commonly used definition for the TNN on matrices.
[Truncated Nuclear Norm on Matrices (Zhang et al., 2012; Hu et al., 2013)] Given a matrix $X \in \mathbb{R}^{m \times n}$ and a positive integer $r \le \min\{m, n\}$, the TNN $\|X\|_{r,*}$ is defined as the sum of the $\min\{m, n\} - r$ smallest singular values, i.e.,
(3) $\|X\|_{r,*} = \sum_{i=r+1}^{\min\{m,n\}} \sigma_i(X),$
where $\sigma_i(X)$ is the $i$th largest singular value of $X$. The singular values are sorted as $\sigma_1(X) \ge \sigma_2(X) \ge \cdots \ge \sigma_{\min\{m,n\}}(X) \ge 0$.
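Numerically, the matrix TNN of Eq. (3) follows directly from the singular values; a minimal NumPy sketch:

```python
import numpy as np

def truncated_nuclear_norm(mat, r):
    """Sum of all but the r largest singular values of `mat` (Eq. (3))."""
    s = np.linalg.svd(mat, compute_uv=False)  # returned in descending order
    return s[r:].sum()
```

With `r = 0` this reduces to the standard nuclear norm, and for a rank-$r$ matrix the TNN with truncation $r$ is numerically zero.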
In this definition, the $r$ largest singular values do not contribute to the TNN function. This truncation makes it possible to minimize only the smallest singular values rather than all singular values simultaneously. However, this definition on matrices cannot be directly used for multiway/multidimensional tensors. In accordance with the TNN on matrices, we follow Liu et al. (2013) and propose to use tensor unfoldings (i.e., matrices) to define the TNN on tensors.
[Truncated Nuclear Norm on Tensors] For any $d$th-order tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times \cdots \times n_d}$, the multiple TNN is defined as
(4) $\|\mathcal{X}\|_{\theta,*} = \sum_{k=1}^{d} \alpha_k \left\|X_{(k)}\right\|_{r_k,*},$
with the truncation for each tensor mode being
(5) $r_k = \left\lceil \theta \cdot \min\Big\{n_k, \prod_{h \neq k} n_h\Big\} \right\rceil, \quad k = 1, \ldots, d,$
where $\lceil \cdot \rceil$ denotes the smallest integer not less than the given value, $\theta$ is a universal rate parameter which controls the whole truncation on all modes of $\mathcal{X}$ and should satisfy $0 \le \theta \le 1$, and $\alpha_k$, $k = 1, \ldots, d$, with $\sum_k \alpha_k = 1$, are weight parameters imposed on the TNNs of the unfolding matrices $X_{(k)}$, respectively.
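The per-mode truncations then follow from the tensor dimensions and the single rate parameter $\theta$; a small sketch (taking the minimum with the product of the remaining dimensions is our reading of Eq. (5) — for typical traffic tensors the mode dimension $n_k$ itself is the smaller term):

```python
import math

def truncations(dims, theta):
    """r_k = ceil(theta * min(n_k, prod of the other dims)) for each mode k."""
    rs = []
    for k, n_k in enumerate(dims):
        other = math.prod(d for i, d in enumerate(dims) if i != k)
        rs.append(math.ceil(theta * min(n_k, other)))
    return rs
```

For example, a 214 × 61 × 144 traffic tensor with θ = 30% yields one truncation per unfolding.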
With the tensor TNN defined above as the basis of LRTC with TNN minimization, we expect that, if the rate parameter $\theta$ is set appropriately, the truncation for each tensor mode can be assigned automatically. By replacing the NN in Eq. (2) with the TNN defined in Eq. (4), the LRTC model is now given by
(6) $\min_{\mathcal{X}} \ \sum_{k=1}^{3} \alpha_k \left\|X_{(k)}\right\|_{r_k,*} \quad \text{s.t.} \ \mathcal{P}_{\Omega}(\mathcal{X}) = \mathcal{P}_{\Omega}(\mathcal{Y}).$
However, Eq. (6) is still not in an appropriate form. In fact, the unfolding matrices of the same tensor are interdependent, and unfolding a tensor along different modes cannot guarantee the dependencies among the variables (i.e., the unfolding matrices) in the objective function (Liu et al., 2013). To this end, we introduce auxiliary tensor variables $\mathcal{X}_1, \mathcal{X}_2, \mathcal{X}_3$ together with a variable $\mathcal{Z}$ and an additional set of constraints to convert problem (6) into the following tractable problem,
(7) $\min_{\{\mathcal{X}_k\},\, \mathcal{Z}} \ \sum_{k=1}^{3} \alpha_k \left\|X_{k,(k)}\right\|_{r_k,*} \quad \text{s.t.} \ \begin{cases} \mathcal{X}_k = \mathcal{Z}, \ k = 1, 2, 3, \\ \mathcal{P}_{\Omega}(\mathcal{Z}) = \mathcal{P}_{\Omega}(\mathcal{Y}), \end{cases}$
where $\mathcal{Z}$ is introduced to keep the observation information and broadcast it to the variables $\mathcal{X}_k$. In this formulation, the tensors $\mathcal{X}_k$, $k = 1, 2, 3$, carry the TNNs, while the variable $\mathcal{Z}$ establishes the relationship with the partially observed tensor $\mathcal{Y}$.
2.4 Solution Algorithms
To solve this optimization problem, a straightforward and widely used approach is the Alternating Direction Method of Multipliers (ADMM) framework, which has been shown empirically to be efficient for low-rank matrix/tensor completion models (Hu et al., 2013; Liu et al., 2013). Before setting up the ADMM scheme, we first write the augmented Lagrangian function of Eq. (7):
(8) $\mathcal{L}\big(\{\mathcal{X}_k\}, \mathcal{Z}, \{\mathcal{T}_k\}\big) = \sum_{k=1}^{3} \left( \alpha_k \left\|X_{k,(k)}\right\|_{r_k,*} + \frac{\rho}{2} \left\|\mathcal{X}_k - \mathcal{Z}\right\|_F^2 + \left\langle \mathcal{T}_k, \mathcal{X}_k - \mathcal{Z} \right\rangle \right),$
where $\langle \cdot, \cdot \rangle$ denotes the inner product, $\rho > 0$ is the learning rate, and $\mathcal{T}_k$, $k = 1, 2, 3$, are the dual variables.
Accordingly, ADMM transforms the original tensor completion problem to the following three subproblems in an iterative manner:
(9) $\begin{cases} \mathcal{X}_k^{l+1} := \arg\min_{\mathcal{X}_k} \ \mathcal{L}\big(\mathcal{X}_k, \mathcal{Z}^{l}, \mathcal{T}_k^{l}\big), \ k = 1, 2, 3, \\ \mathcal{Z}^{l+1} := \arg\min_{\mathcal{Z}} \ \mathcal{L}\big(\{\mathcal{X}_k^{l+1}\}, \mathcal{Z}, \{\mathcal{T}_k^{l}\}\big), \\ \mathcal{T}_k^{l+1} := \mathcal{T}_k^{l} + \rho \big(\mathcal{X}_k^{l+1} - \mathcal{Z}^{l+1}\big), \ k = 1, 2, 3, \end{cases}$
where the superscript $l$ denotes the iteration number, and we follow the update order $\{\mathcal{X}_k\} \rightarrow \mathcal{Z} \rightarrow \{\mathcal{T}_k\}$.
In practice, the dual variables can be updated compactly by
(10) $\boldsymbol{\mathcal{T}}^{l+1} = \boldsymbol{\mathcal{T}}^{l} + \rho \big(\boldsymbol{\mathcal{X}}^{l+1} - \breve{\boldsymbol{\mathcal{Z}}}^{l+1}\big),$
where $\boldsymbol{\mathcal{T}}$, $\boldsymbol{\mathcal{X}}$, and $\breve{\boldsymbol{\mathcal{Z}}}$ are fourth-order tensors of the same size $n_1 \times n_2 \times n_3 \times 3$: $\boldsymbol{\mathcal{T}}$ and $\boldsymbol{\mathcal{X}}$ are stacked from the third-order tensors $\mathcal{T}_k$ and $\mathcal{X}_k$ along the fourth mode, respectively, and $\breve{\boldsymbol{\mathcal{Z}}}$ is stacked along the fourth mode by copying the third-order tensor $\mathcal{Z}$.
Computing Tensors $\mathcal{X}_k$
In general, the optimal solution can be found by solving the first-order optimality condition when the objective is convex. However, for a nonconvex objective, a global optimum is not guaranteed (Hu et al., 2013), and it remains unclear whether an optimization problem like Eq. (11) converges to the desired global optimum. In this work, we adopt a closed-form optimal/suboptimal solution to this problem given by the following theorem.
Theorem 1.
For any $\alpha > 0$, $\rho > 0$, and $Z \in \mathbb{R}^{m \times n}$ with a given truncation $r \le \min\{m, n\}$, an optimal solution to the problem
(12) $\min_{X} \ \alpha \|X\|_{r,*} + \frac{\rho}{2} \|X - Z\|_F^2$
is given by the generalized singular value thresholding
(13) $\hat{X} = U \mathcal{D}_{\alpha/\rho}^{r}(\Sigma) V^{\top},$
where $Z = U \Sigma V^{\top}$ is the singular value decomposition of $Z$. The shrinkage of singular values is therefore defined as
(14) $\left[\mathcal{D}_{\alpha/\rho}^{r}(\Sigma)\right]_{ii} = \begin{cases} \sigma_i, & i \le r, \\ \left[\sigma_i - \alpha/\rho\right]_{+}, & i > r, \end{cases}$
where $\sigma_i$, $i = 1, \ldots, \min\{m, n\}$, are the diagonal entries of $\Sigma$, and $[\cdot]_{+}$ denotes truncation at 0, i.e., $[x]_{+} = \max\{x, 0\}$.
Proof.
For any $\sigma_i \ge 0$, define the function $h_i : [0, \infty) \to \mathbb{R}$ as
(15) $h_i(x) = \frac{\rho}{2} (x - \sigma_i)^2 + w_i x, \quad \text{with } w_i = 0 \text{ for } i \le r \text{ and } w_i = \alpha \text{ for } i > r,$
which is differentiable, and
(16) $h_i'(x) = \rho (x - \sigma_i) + w_i.$
When minimizing the objective function in Eq. (12), we should find $x \ge 0$ such that $h_i'(x) = 0$, taking the boundary point $x = 0$ when no such point exists. Referring to Larsson et al. (2014), the objective of Eq. (12) can be written as a function of the singular values of $X$, i.e., as the sum $\sum_i h_i(\sigma_i(X))$, where each $h_i$ is defined as in Eq. (15). Considering the components of the sum separately, we have
(17) $x_i^{*} = \begin{cases} \sigma_i, & i \le r, \\ \left[\sigma_i - \alpha/\rho\right]_{+}, & i > r, \end{cases}$
and this result is the same as Eq. (14).
∎
On the other hand, the TNN minimization problem described in Eq. (12) is indeed a special case of the weighted NN minimization problem, i.e.,
$\min_{X} \ \sum_{i} w_i \sigma_i(X) + \frac{\rho}{2} \|X - Z\|_F^2,$
with the weights being $w_i = 0$ for $i \le r$ and $w_i = \alpha$ for $i > r$. As proved by Zhang and Lu (2011) (Proposition 2.1) and Chen et al. (2013) (Theorem 2.3), in the situation of $0 \le w_1 \le w_2 \le \cdots \le w_{\min\{m,n\}}$ (i.e., an order constraint), the weighted NN minimization problem has an optimal solution, and the shrinkage of singular values corresponding to Eq. (14) is given by
(18) $\left[\mathcal{D}_{w/\rho}(\Sigma)\right]_{ii} = \left[\sigma_i - w_i/\rho\right]_{+}, \quad i = 1, \ldots, \min\{m, n\},$
which is consistent with Theorem 1.
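The shrinkage operator of Theorem 1 is straightforward to implement: compute an SVD, keep the $r$ largest singular values, and soft-threshold the rest. A minimal NumPy sketch of Eqs. (13)-(14):

```python
import numpy as np

def svt_tnn(mat, alpha, rho, r):
    """Generalized singular value thresholding for the TNN subproblem:
    the r largest singular values are kept, the remaining ones are
    reduced by alpha/rho and clipped at zero (Eqs. (13)-(14))."""
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    shrunk = s.copy()
    shrunk[r:] = np.maximum(s[r:] - alpha / rho, 0.0)
    return (u * shrunk) @ vt
```

Here `(u * shrunk) @ vt` scales each column of `u` by the corresponding shrunk singular value, which is equivalent to `u @ diag(shrunk) @ vt`.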
Computing Tensor $\mathcal{Z}$
3 Model Implementation
As shown in Figure 2, our goal is to impute the incomplete tensor into a complete one using the proposed LRTC-TNN model. The main procedures are: (1) adapting the raw data to a tensor structure with a specific rule, (2) imputing the incomplete tensor data with our proposed model, and (3) filling in the missing entries with the recovered values.
A practical issue in applying this tensor learning framework for spatiotemporal traffic data imputation is how to establish the rule for structuring spatiotemporal traffic data as a tensor. In this study, we assume that a third-order tensor with dimensions corresponding to spatial location, day, and time interval is a suitable algebraic data structure, which has been shown empirically to be effective on a number of spatiotemporal traffic data sets (Chen et al., 2019b). To find a solution to problem (7), it is sufficient to carry out the following steps (see the middle panel of Figure 2): (1) initialize the variables and set the necessary parameters; (2) update the variables $\mathcal{X}_k$, $\mathcal{Z}$, and $\mathcal{T}_k$ according to Eq. (20), Eq. (21), and Eq. (10), respectively.
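For illustration, the whole pipeline can be sketched as a single ADMM loop in NumPy. The $\mathcal{X}_k$-update applies the thresholding of Theorem 1 to each unfolding; the $\mathcal{Z}$-update shown here (fix the observed entries and average the $\mathcal{X}_k + \mathcal{T}_k/\rho$ terms elsewhere) is our reconstruction of the step referenced as Eqs. (20)-(21), and all function names and default parameter values are illustrative rather than the authors' reference implementation:

```python
import numpy as np

def unfold(t, k):
    return np.moveaxis(t, k, 0).reshape(t.shape[k], -1)

def fold(m, k, shape):
    full = [shape[k]] + [s for i, s in enumerate(shape) if i != k]
    return np.moveaxis(m.reshape(full), 0, k)

def svt_tnn(mat, alpha, rho, r):
    # Keep the r largest singular values, soft-threshold the rest (Theorem 1).
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    s[r:] = np.maximum(s[r:] - alpha / rho, 0.0)
    return (u * s) @ vt

def lrtc_tnn(y, mask, theta=0.10, rho=1e-4, max_iter=200):
    """ADMM sketch for LRTC-TNN on a partially observed tensor `y`,
    with boolean `mask` marking the observed entries."""
    dims, d = y.shape, y.ndim
    alpha = [1.0 / d] * d                           # equal mode weights
    rs = [int(np.ceil(theta * n)) for n in dims]    # per-mode truncation
    z = np.where(mask, y, 0.0)                      # initialize with observations
    ts = [np.zeros(dims) for _ in range(d)]         # dual variables T_k
    for _ in range(max_iter):
        # X_k-update: thresholded SVD of each unfolding of (Z - T_k / rho)
        xs = [fold(svt_tnn(unfold(z - ts[k] / rho, k), alpha[k], rho, rs[k]), k, dims)
              for k in range(d)]
        # Z-update: keep observed entries, average X_k + T_k / rho elsewhere
        z = np.where(mask, y, sum(xs[k] + ts[k] / rho for k in range(d)) / d)
        # Dual update
        for k in range(d):
            ts[k] = ts[k] + rho * (xs[k] - z)
    return z
```

On a fully observed tensor the loop leaves the data untouched, and on partially observed data the observed entries are reproduced exactly while the missing ones are filled with the low-rank estimate.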
4 Experiments
In this section, we conduct extensive experiments on four real-world spatiotemporal traffic data sets to evaluate the proposed LRTC-TNN model.
4.1 Data Sets
We choose four publicly available data sets collected from real-world transportation systems as our benchmark data sets. Table 1 provides a brief summary of these data sets. For simplicity, as shown in Table 1, we refer to the four data sets as data sets "G", "B", "H", and "S", respectively. Essentially, the four data sets follow the same tensor structure of "location/sensor × day × time of day", and they can also be organized in a matrix structure of "location/sensor × time" by stacking the time dimension.
Data Set  Description  Data Size 

(G) Guangzhou urban traffic speed data 
The data set contains average traffic speed collected from 214 road segments over two months (from August 1 to September 30, 2016) with a 10-minute resolution (144 time intervals per day) in Guangzhou, China. This data set contains 1.29% missing values. 
Tensor: 214 × 61 × 144, matrix: 214 × 8784. 
(B) Birmingham parking data 
This data set registers occupancy (i.e., the number of parked vehicles) of 30 car parks in Birmingham City for every half an hour between 8:00 and 17:00 over more than two months (77 days from October 4, 2016 to December 19, 2016). This data set contains 14.89% missing values. 
Tensor: 30 × 77 × 18, matrix: 30 × 1386. 
(H) Hangzhou metro passenger flow data 
This data set provides incoming passenger flow of 80 metro stations over 25 days (from January 1 to January 25, 2019) with a 10-minute resolution in Hangzhou, China. We discard the interval from 0:00 a.m. to 6:00 a.m. with no services (i.e., we only consider the remaining 108 time intervals of a day). 
Tensor: 80 × 25 × 108, matrix: 80 × 2700. 
(S) Seattle freeway traffic speed data 
This data set contains freeway traffic speed from 323 loop detectors with a 5-minute resolution over the whole year of 2015 in Seattle, USA. We choose the subset in January (4 weeks from January 1 to January 28) as our experiment data. 
Tensor: 323 × 28 × 288, matrix: 323 × 8064. 
4.2 Baseline Models
We compare the proposed LRTC-TNN model with the following baseline models:

BTMF: Bayesian Temporal Matrix Factorization (Chen and Sun, 2019). This is a fully Bayesian matrix factorization model which integrates a vector autoregressive (VAR) model into the latent temporal factors. With the flexible VAR process, BTMF achieves superior accuracy over other matrix factorization models (without temporal modeling) and tensor factorization models in imputation tasks.

TRMF: Temporal Regularized Matrix Factorization (Yu et al., 2016). This is a temporal matrix factorization model which applies a multiple autoregressive (AR) process to model latent temporal factors. BTMF and TRMF can be considered generalized versions of the Bayesian temporal tensor factorization model by Xiong et al. (2010), which imposes temporal smoothness with a dynamic model.

BGCP: Bayesian Gaussian CP decomposition (Chen et al., 2019b). This is a fully Bayesian tensor factorization model which uses Markov chain Monte Carlo (MCMC) to learn the latent factor matrices (i.e., the low-rank structure).

BATF: Bayesian Augmented Tensor Factorization (Chen et al., 2019a). This is a fully Bayesian model built on a special tensor factorization formula, in which components include both explicit variables and latent factors (i.e., low-rank factors). Variables in this model are learned via variational Bayes.

HaLRTC: High-accuracy Low-Rank Tensor Completion (Liu et al., 2013). This is an LRTC model which uses NN minimization to find accurate estimates for unobserved/missing entries in tensor data.
We have the following considerations in choosing the baseline models. In general, we take into account two types of imputation approaches built on the assumption of an underlying low-rank structure, which has been shown to be effective and efficient in previous studies. The first is low-rank temporal matrix factorization models such as BTMF and TRMF. These models follow the matrix structure (i.e., "location/sensor × time") and impose smoothness on latent temporal factors by introducing autoregressive/dynamic assumptions. Therefore, these temporal models produce more informative and consistent low-rank structures than those estimated from standard matrix factorization. The second approach follows the tensor structure (i.e., "location/sensor × day × time of day") to better utilize the information across different days. The tensor approach includes tensor factorization models (BGCP and BATF) and a standard LRTC model (HaLRTC). These models have been shown to be state-of-the-art in the literature.
4.3 Experiment Setting
For model evaluation, we mask certain entries of the data as missing values and then perform imputation for these "missing" entries. We use the actual values (ground truth) of these entries to compute the metrics MAPE and RMSE:
(22) $\text{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \frac{|y_i - \hat{y}_i|}{|y_i|} \times 100, \qquad \text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2},$ where $n$ is the number of masked entries, and $y_i$ and $\hat{y}_i$ denote the ground truth value and its imputation, respectively.
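The two metrics can be computed over the masked entries as follows (a small sketch):

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error (in %) over the evaluated entries."""
    return np.mean(np.abs(y_true - y_pred) / np.abs(y_true)) * 100

def rmse(y_true, y_pred):
    """Root mean squared error over the evaluated entries."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))
```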
Following Chen et al. (2018), we design two missing patterns—random missing (RM, see Figure 1(a)) and nonrandom missing (NM, see Figure 1(b)). The NM scenario is more challenging since the data are corrupted in a correlated manner. These two missing scenarios help us better assess the performance and effectiveness of different models.
There are some parameters to be set in LRTC-TNN, including the weights $\alpha_k$, the learning rate $\rho$, and the truncation rate $\theta$. In our experiments with third-order tensors, we set $\alpha_1 = \alpha_2 = \alpha_3 = 1/3$. The learning rate $\rho$ and the truncation rate $\theta$ determine the performance directly. Table 2 shows the values of these two parameters for each imputation experiment, chosen by cross validation. The learning rate $\rho$ determines the convergence of the whole model: a larger $\rho$ usually slows down the convergence process, while a smaller one lets the model converge within only a few iterations. The rate parameter $\theta$ for NM data should be set much smaller than for RM data; the smallest $\theta$ we use is 5% for these data sets. The maximum number of iterations of LRTC-TNN for these imputation experiments is set to 200. The adapted data sets and Python implementation for these experiments are publicly available at GitHub.
Scenario | 20%, RM-G | 40%, RM-G | 20%, NM-G | 40%, NM-G | 10%, RM-B | 30%, RM-B | 10%, NM-B | 30%, NM-B
ρ | 20 | 20 | 20 | 5 | 5 | 5 | 2 | 2
θ (%) | 30 | 30 | 5 | 5 | 20 | 15 | 10 | 5
Scenario | 20%, RM-H | 40%, RM-H | 20%, NM-H | 40%, NM-H | 20%, RM-S | 40%, RM-S | 20%, NM-S | 40%, NM-S
ρ | 2 | 2 | 2 | 2 | 10 | 10 | 5 | 5
θ (%) | 10 | 10 | 5 | 5 | 30 | 30 | 5 | 5
4.4 Results
Table 3 summarizes the imputation performance of matrix factorization models (BTMF and TRMF), tensor factorization models (BGCP and BATF), and LRTC models (HaLRTC and LRTC-TNN) on the four spatiotemporal data sets. We follow the same missing rates as our previous work (Chen and Sun, 2019) and configure all matrix and tensor factorization models with the same number of factors, and BTMF and TRMF with the same time lags. The numbers of factors of the factorization models (i.e., BTMF, TRMF, BGCP, and BATF) are set to 80, 30, 50, and 50 for the RM versions of data sets (G), (B), (H), and (S), respectively. For the NM scenario, we set the number of factors to 10 for all data sets. BTMF and TRMF use the same time lags for all four data sets, defined in terms of $f$, the number of time intervals per day.
Overall, LRTC-TNN clearly outperforms the other baseline models in diverse missing scenarios (RM and NM with varying missing rates). For all models, the MAPE/RMSE values are higher in the NM scenario than in the RM scenario, suggesting that the NM scenario is indeed more difficult to handle. Another observation is that the MAPE/RMSE values become higher with increasing missing rates. Both the missing pattern and the missing rate have a direct impact on all these models.
For data sets (G), (H) and (S), LRTC-TNN consistently and significantly outperforms the other baseline models. For the Birmingham (B) parking data set, LRTC-TNN achieves the best imputation performance in the NM scenario, while the two temporal factorization models BTMF and TRMF perform best in the RM scenario. This is due to the strong temporal patterns and consistency in data set (B), where temporal smoothness and dynamics (e.g., AR/VAR) play a more important role in capturing the true signal.
Both LRTC models (HaLRTC and LRTC-TNN) achieve promising performance; however, LRTC-TNN consistently outperforms HaLRTC with much lower MAPE/RMSE values. Moreover, the MAPE/RMSE values of LRTC-TNN do not deteriorate as dramatically as those of HaLRTC when the missing rate increases, as also indicated in Figure 3. These results clearly show the advantage of using TNN minimization over NN minimization for learning low-rank tensor structures.
Scenario | BTMF | TRMF | BGCP | BATF | HaLRTC | LRTC-TNN
20%, RM-G | 7.47/3.19 | 7.47/3.14 | 8.28/3.57 | 8.32/3.59 | 8.15/3.33 | 6.63/2.84
40%, RM-G | 7.81/3.35 | 7.76/3.25 | 8.29/3.59 | 8.36/3.61 | 8.87/3.61 | 7.22/3.11
20%, NM-G | 10.16/4.27 | 10.24/4.27 | 10.20/4.27 | 10.17/4.26 | 10.46/4.21 | 9.39/3.97
40%, NM-G | 10.36/4.46 | 10.37/4.37 | 10.25/4.32 | 10.17/4.30 | 10.88/4.38 | 9.57/4.07
10%, RM-B | 1.71/7.44 | 2.77/10.57 | 6.50/19.69 | 6.93/20.65 | 4.85/17.35 | 4.11/12.37
30%, RM-B | 2.61/13.38 | 3.69/21.80 | 6.23/19.98 | 6.68/21.29 | 6.64/26.79 | 5.21/17.51
10%, NM-B | 12.05/28.27 | 12.74/29.46 | 13.64/43.15 | 16.28/40.81 | 9.47/34.72 | 8.77/20.92
30%, NM-B | 15.44/61.69 | 16.35/85.98 | 15.93/57.07 | 15.95/57.07 | 14.83/92.59 | 13.26/47.65
20%, RM-H | 25.18/28.51 | 21.31/37.07 | 19.01/41.16 | 22.74/33.07 | 18.26/28.88 | 18.15/24.65
40%, RM-H | 26.83/32.19 | 22.89/38.15 | 19.59/32.71 | 23.17/31.62 | 19.01/31.81 | 18.80/25.75
20%, NM-H | 26.50/81.73 | 26.07/40.06 | 25.57/35.99 | 34.94/29.32 | 20.29/40.53 | 20.38/31.79
40%, NM-H | 30.24/80.53 | 27.32/39.75 | 24.37/49.64 | 30.63/48.01 | 21.47/53.26 | 21.10/32.34
20%, RM-S | 5.92/3.71 | 5.96/3.71 | 7.45/4.50 | 8.70/3.73 | 5.95/3.48 | 4.60/3.01
40%, RM-S | 6.18/3.79 | 6.16/3.79 | 7.58/4.54 | 8.73/3.75 | 6.77/3.84 | 5.01/3.22
20%, NM-S | 9.12/5.27 | 9.12/5.26 | 9.93/5.65 | 10.15/4.25 | 8.82/4.70 | 6.95/4.19
40%, NM-S | 9.20/5.33 | 9.19/5.30 | 9.94/5.68 | 10.15/4.30 | 10.20/5.28 | 7.63/4.52
Best results are highlighted in bold fonts. 
In general, the imputation problem becomes increasingly challenging as the missing rate grows. We next investigate the performance of different models under heavy and extreme missing scenarios. In doing so, we compare LRTC-TNN with three baseline models: BTMF as a matrix factorization model, BGCP as a tensor factorization model, and HaLRTC as a low-rank completion model. Figure 3 shows the performance of the four models in both RM and NM scenarios under high missing rates (i.e., 50%, 60%, and 70%). As can be seen, LRTC-TNN achieves the lowest MAPE/RMSE values in most cases. It should be noted that HaLRTC becomes unstable in heavy missing scenarios. In contrast, LRTC-TNN is robust and capable of addressing heavy missingness with high accuracy. The result also implies that nonconvex TNN minimization shows superior performance over convex NN minimization.
We next choose some examples from the four experiment data sets and visualize, in Figure 4, the NM time series with the extreme missing rate (70%) together with the corresponding time series recovered by LRTC-TNN. We see that LRTC-TNN achieves very high accuracy on all four data sets with only 30% of the data as input.
5 Conclusion and Future Directions
In this paper, we propose a nonconvex TNN minimization based low-rank tensor completion model—LRTC-TNN—to impute missing values in spatiotemporal traffic data. Our experiments show that the proposed model consistently outperforms other state-of-the-art imputation models on the four selected real-world traffic data sets with two representative missing patterns (i.e., random missing [RM] and nonrandom missing [NM]).
For future research, we propose the following directions:

Incorporating side information and smoothness priors: Spatiotemporal data demonstrate both global consistency and local consistency (e.g., the autoregressive patterns in BTMF and TRMF). The underlying low-rank tensor is effective in capturing the global consistency. To further impose local consistency and better characterize side information, we can introduce smoothness priors to the model through additional cost terms or constraints in the optimization framework, such as spatial Laplacian regularization, temporal Toeplitz regularization, and total variation (Roughan et al., 2011; Yokota et al., 2016).
For instance, we can encode a priori information on the spatial structure using a graph Laplacian matrix $L = D - A$, where $A$ and $D$ are the adjacency matrix of the spatial structure and the degree matrix, respectively. This allows us to impose similarity between connected locations by introducing the penalty $\operatorname{tr}(X^{\top} L X)$, where the rows of the matrix $X$ correspond to spatial locations. Combining this with autoregression or smoothness on the temporal dimension, we can formulate a spatiotemporal regularization on the variable $\mathcal{Z}$ (see the subproblem in Eq. (21)).

Minimizing the Schatten norm: In algebra, the Schatten $p$-norm is defined as
(23) $\|X\|_{S_p} = \left( \sum_{i=1}^{\min\{m,n\}} \sigma_i^p(X) \right)^{1/p},$
where $p > 0$ (Fan et al., 2019). Here, we can see that: (1) as $p \to 0$, the Schatten norm approaches the rank function, i.e., $\lim_{p \to 0} \sum_i \sigma_i^p(X) = \operatorname{rank}(X)$; (2) if $p = 1$, the Schatten norm is exactly the nuclear norm. Therefore, by letting $0 < p < 1$, the Schatten norm provides a better surrogate for the rank function than the NN. Accordingly, a truncated Schatten norm might be a potentially better minimizer than TNN. However, a critical bottleneck for Schatten norm minimization is the nonconvex optimization involved, which is more difficult to solve than the TNN minimization discussed in this work.

Minimizing the weighted nuclear norm: Instead of using TNN minimization, we can develop a weighted version of the NN minimization to better account for the differences/variations between singular values (Gu et al., 2014). The objective function of our problem can also be defined by a weighted version of the NN:

(24) $\|\boldsymbol{X}\|_{\boldsymbol{w},*}=\sum_{i}w_{i}\sigma_{i}(\boldsymbol{X})$, where $w_{i}\geq 0$ are nonnegative weights assigned to the singular values $\sigma_{i}(\boldsymbol{X})$. This leads to a more general nonconvex form; however, it introduces an additional set of weight parameters, and tuning these parameters for a specific data set becomes a challenge.
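A minimal numpy sketch of this idea (the names `weighted_nuclear_norm` and `weighted_svt` are ours; the shrinkage rule follows the non-descending-weights case in Gu et al., 2014): setting the first few weights to zero and the rest to a positive constant yields a TNN-style truncation that leaves the leading singular values untouched.

```python
import numpy as np

def weighted_nuclear_norm(X, w):
    """Weighted nuclear norm: sum_i w_i * sigma_i(X), with w_i >= 0."""
    s = np.linalg.svd(X, compute_uv=False)
    return float(np.dot(w, s))

def weighted_svt(Y, w):
    """Proximal step of the weighted nuclear norm: shrink each singular
    value by its own weight (closed form when w is non-descending)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - w, 0.0)) @ Vt

rng = np.random.default_rng(0)
Y = rng.standard_normal((5, 4))
w = np.array([0., 0., 0.5, 0.5])    # TNN-like: top two singular values kept intact
Z = weighted_svt(Y, w)
```

With uniform weights $w_{i}=c$, this reduces to the standard singular value thresholding used in NN minimization, which is why the weighted form is a strict generalization.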
Acknowledgement
This research is supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada, the Institute for Data Valorisation (IVADO), and the Canada Foundation for Innovation (CFI).
References
 Asif, M. T., Mitrovic, N., Dauwels, J., Jaillet, P., 2016. Matrix and tensor based methods for missing data estimation in large traffic networks. IEEE Transactions on Intelligent Transportation Systems 17 (7), 1816–1825.
 Candès, E. J., Tao, T., May 2010. The power of convex relaxation: Near-optimal matrix completion. IEEE Transactions on Information Theory 56 (5), 2053–2080.
 Chen, K., Dong, H., Chan, K.S., 2013. Reduced rank regression via adaptive nuclear norm penalization. Biometrika 100 (4), 901–920.
 Chen, X., He, Z., Chen, Y., Lu, Y., Wang, J., 2019a. Missing traffic data imputation and pattern discovery with a Bayesian augmented tensor factorization model. Transportation Research Part C: Emerging Technologies 104, 66–77.
 Chen, X., He, Z., Sun, L., 2019b. A Bayesian tensor decomposition approach for spatiotemporal traffic data imputation. Transportation Research Part C: Emerging Technologies 98, 73–84.
 Chen, X., He, Z., Wang, J., 2018. Spatial-temporal traffic speed patterns discovery and incomplete data recovery via SVD-combined tensor decomposition. Transportation Research Part C: Emerging Technologies 86, 59–77.
 Chen, X., Sun, L., 2019. Bayesian temporal factorization for multidimensional time series prediction.
 Fan, J., Ding, L., Chen, Y., Udell, M., 2019. Factor group-sparse regularization for efficient low-rank matrix recovery. In: Advances in Neural Information Processing Systems. pp. 5105–5115.
 Fazel, M., 2002. Matrix rank minimization with applications. Ph.D. thesis, Stanford University.
 Gu, S., Zhang, L., Zuo, W., Feng, X., June 2014. Weighted nuclear norm minimization with application to image denoising. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition. pp. 2862–2869.
 Han, Z.F., Leung, C.S., Huang, L.T., So, H. C., 2017. Sparse and truncated nuclear norm based tensor completion. Neural Processing Letters 45 (3), 729–743.
 Hu, Y., Zhang, D., Ye, J., Li, X., He, X., Sep. 2013. Fast and accurate matrix completion via truncated nuclear norm regularization. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (9), 2117–2130.
 Huang, L.T., So, H.C., Chen, Y., Wang, W.Q., 2014. Truncated nuclear norm minimization for tensor completion. In: 2014 IEEE 8th Sensor Array and Multichannel Signal Processing Workshop (SAM). IEEE, pp. 417–420.
 Kolda, T. G., Bader, B. W., 2009. Tensor decompositions and applications. SIAM Review 51 (3), 455–500.
 Larsson, V., Olsson, C., Bylow, E., Kahl, F., 2014. Rank minimization with structured data patterns. In: European Conference on Computer Vision. Springer, pp. 250–265.
 Li, L., Li, Y., Li, Z., 2013. Efficient missing data imputing for traffic flow by considering temporal and spatial dependence. Transportation Research Part C: Emerging Technologies 34, 108–120.
 Liu, J., Musialski, P., Wonka, P., Ye, J., 2013. Tensor completion for estimating missing values in visual data. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (1), 208–220.
 Lu, C., Tang, J., Yan, S., Lin, Z., 2014. Generalized nonconvex nonsmooth low-rank minimization. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
 Lu, C., Zhu, C., Xu, C., Yan, S., Lin, Z., 2015. Generalized singular value thresholding. In: AAAI Conference on Artificial Intelligence (AAAI).
 Roughan, M., Zhang, Y., Willinger, W., Qiu, L., 2011. Spatiotemporal compressive sensing and internet traffic matrices (extended version). IEEE/ACM Transactions on Networking 20 (3), 662–676.
 Tan, H., Wu, Y., Shen, B., Jin, P. J., Ran, B., 2016. Short-term traffic prediction based on dynamic tensor completion. IEEE Transactions on Intelligent Transportation Systems 17 (8), 1524–9050.
 Xiong, L., Chen, X., Huang, T.-K., Schneider, J., Carbonell, J. G., 2010. Temporal collaborative filtering with Bayesian probabilistic tensor factorization. In: Proceedings of the 2010 SIAM International Conference on Data Mining. pp. 211–222.
 Xue, S., Qiu, W., Liu, F., Jin, X., 2018. Low-rank tensor completion by truncated nuclear norm regularization. In: 2018 24th International Conference on Pattern Recognition (ICPR). IEEE, pp. 2600–2605.
 Yao, Q., Kwok, J. T., Wang, T., Liu, T., Nov 2019. Large-scale low-rank matrix learning with nonconvex regularizers. IEEE Transactions on Pattern Analysis and Machine Intelligence 41 (11), 2628–2643.
 Yokota, T., Zhao, Q., Cichocki, A., 2016. Smooth PARAFAC decomposition for tensor completion. IEEE Transactions on Signal Processing 64 (20), 5423–5436.
 Yu, H.-F., Rao, N., Dhillon, I. S., 2016. Temporal regularized matrix factorization for high-dimensional time series prediction. In: Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, I., Garnett, R. (Eds.), Advances in Neural Information Processing Systems 29. Curran Associates, Inc., pp. 847–855.
 Yu, J., Stettler, M. E., Angeloudis, P., Hu, S., Chen, X. M., 2020. Urban network-wide traffic speed estimation with massive ride-sourcing GPS traces. Transportation Research Part C: Emerging Technologies 112, 136–152.
 Zhang, D., Hu, Y., Ye, J., Li, X., He, X., June 2012. Matrix completion by truncated nuclear norm regularization. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition. pp. 2192–2199.
 Zhang, Y., Lu, Z., 2011. Penalty decomposition methods for rank minimization. In: Shawe-Taylor, J., Zemel, R. S., Bartlett, P. L., Pereira, F., Weinberger, K. Q. (Eds.), Advances in Neural Information Processing Systems 24. Curran Associates, Inc., pp. 46–54.
 Zhu, Y., Li, Z., Zhu, H., Li, M., Zhang, Q., 2012. A compressive sensing approach to urban traffic estimation with probe vehicles. IEEE Transactions on Mobile Computing 12 (11), 2289–2302.