Bayesian Spatio-Temporal Graph Convolutional Network for Traffic Forecasting
Abstract
In traffic forecasting, graph convolutional networks (GCNs), which model traffic flows as spatiotemporal graphs, have achieved remarkable performance. However, existing GCN-based methods heuristically define the graph structure as the physical topology of the road network, ignoring the potential dependence of the graph structure on the traffic data. Moreover, the defined graph structure is deterministic, which lacks any investigation of uncertainty. In this paper, we propose a Bayesian Spatio-Temporal Graph Convolutional Network (BSTGCN) for traffic prediction. The graph structure in our network is learned from the physical topology of the road network and the traffic data in an end-to-end manner, which discovers a more accurate description of the relationships among traffic flows. Moreover, a parametric generative model is proposed to represent the graph structure, which enhances the generalization capability of GCNs. We verify the effectiveness of our method on two real-world datasets, and the experimental results demonstrate that BSTGCN attains superior performance compared with state-of-the-art methods.
^{1} University of Science and Technology of China
fujun@mail.ustc.edu.cn, weichou@mail.ustc.edu.cn, chenzhibo@ustc.edu.cn
1 Introduction
Traffic congestion is a growing drain on the economy with the acceleration of urbanization. For example, the cost of traffic congestion in America reached $124 billion in 2014 and is projected to rise to $186 billion in 2030, according to a report by Forbes [6]. Therefore, improving traffic conditions is essential for increasing city efficiency, improving the economy, and easing people's daily lives. One promising way to mitigate urban traffic congestion is to introduce Intelligent Transportation Systems (ITS), in which traffic prediction plays a vital role. However, accurate traffic prediction is still challenging due to the complex spatiotemporal dependence among traffic flows.
In the past few decades, many schemes have been proposed for traffic prediction, which can be broadly divided into two categories: temporal dependence based methods and spatiotemporal dependence based methods. Temporal dependence based methods leverage the temporal characteristics of traffic flows to predict future traffic conditions. Nevertheless, these methods have limited capability to achieve accurate traffic prediction because they ignore the spatial dependence among traffic flows. Therefore, spatiotemporal dependence based methods, which take spatial information into account, are increasingly emerging. Since the road network is naturally structured as a graph in a non-Euclidean space, with roads as nodes and their natural connections as edges, researchers prefer to use graph convolutional networks (GCNs) [10] rather than convolutional neural networks (CNNs) to model the spatial dependence among traffic flows.
However, existing GCN-based methods have two main disadvantages in terms of graph construction: (1) The graph structure employed in GCNs is heuristically predefined and represents only the physical structure of the road network. Therefore, it is not guaranteed to be an optimal description of the dependence among traffic flows. For example, the relationship between two traffic flows that have similar trends but are located far from each other is also important for traffic prediction, yet such dependence cannot be captured in predefined road-topology-based graphs. (2) Introducing uncertainty into the graph structure, such as randomly dropping nodes or edges, could enhance the generalization capability of GCNs. Nevertheless, the graph structure employed in existing GCN-based methods is deterministic, which lacks any investigation of uncertainty.
To solve the above issues, a Bayesian Spatio-Temporal Graph Convolutional Network (BSTGCN) is proposed in this paper. It views the graph structure as a sample drawn from a parametric generative model, and aims to infer the posterior probability of the graph structure based on two types of information: the physical topology of the road network and the traffic data. In addition, the parameters of the generative model are optimized together with the weights of the GCNs by the back-propagation algorithm in an end-to-end manner. The main contributions of our work are threefold:

This work proposes to learn the graph structure from the physical topology of the road network and the traffic data in an end-to-end manner, which can find a more accurate description of the relationships among traffic flows.

A generative model is proposed to represent the graph structure, which can improve the generalization capability of GCNs.

We validate the effectiveness of our method on two real-world datasets, and the experimental results show that our approach outperforms state-of-the-art methods by a noticeable margin.
The rest of the paper is organized as follows. Section 2 reviews research related to traffic prediction. Sections 3, 4, and 5 introduce the background, the details of our method, and the experimental results, respectively. Section 6 concludes the paper and points out some future directions.
2 Related Work
Traffic prediction has attracted a lot of attention in recent years due to its essential role in traffic management. Current methods generally fall into two categories: temporal dependence based methods and spatiotemporal dependence based methods. Temporal dependence based methods only consider the temporal characteristics of traffic flows. At the early stage, autoregressive models, including the Auto-Regressive Integrated Moving Average (ARIMA) [1], the Kalman filtering model [16], and seasonal ARIMA [23], were widely used in traffic prediction. However, these statistical models rely on a stationarity assumption on the traffic time-series data, which hinders their performance on real-world traffic conditions that vary over time. Some traditional machine learning methods, including linear SVR [24] and random forest regression [12], have also been tailored to traffic prediction, but are limited by handcrafted features and shallow architectures. With the rapid development of deep learning, a variety of neural network architectures have been applied to traffic prediction, such as the feed-forward neural network [18], LSTM, and GRU [4]. Despite their impressive capability in modeling temporal dynamics, these methods still have limited ability to achieve accurate traffic prediction because they do not consider the spatial dependence among traffic flows.
Spatiotemporal dependence based methods take both the temporal nature and the spatial dependence of traffic flows into account. To capture the spatial dependence among traffic flows, early attempts, including SAE [15], ST-ResNet [28], and SRCN [27], employed various CNNs. Nevertheless, considering that CNNs are suited to Euclidean data [3], such as images and regular grids, such methods cannot perform well on road networks with complex topological structures. As a result, a temporal graph convolutional network (T-GCN) [30] was proposed for traffic prediction, in which GCNs and GRUs are combined to capture the spatiotemporal features of traffic flows. Later, A3T-GCN [31] boosted the performance of T-GCN by introducing an attention mechanism. However, existing GCN-based works ignore the uncertainty and the information of traffic flows in the process of graph construction.
3 Background
Graph Convolutional Network
GCNs have been widely applied to a broad range of applications, such as semi-supervised learning [7], action recognition [26], and quality assessment [25]. The graph convolutional operation can be designed in either the spatial or the spectral domain. In this paper, we focus on the latter. Spectral convolution on a graph $\mathcal{G}$ with $N$ nodes is defined as the product of a signal $x \in \mathbb{R}^{N \times C}$ (each node with a $C$-dimensional feature) and a filter $g_\theta$ parameterized by $\theta$, i.e.:

$$g_\theta \star x = U g_\theta U^{\top} x \quad (1)$$

where $U$ is the eigenvector matrix of the normalized Laplacian matrix $L = I_N - D^{-1/2} A D^{-1/2} = U \Lambda U^{\top}$, and $U^{\top} x$ is the graph Fourier transform of $x$. Since the Fourier transform is computationally expensive, a faster propagation rule [10] is proposed, i.e.:

$$H^{(l+1)} = \sigma\left(\hat{A} H^{(l)} W^{(l)}\right) \quad (2)$$

where $H^{(l+1)}$ and $W^{(l)}$ are the output features and trainable parameters of layer $l$, $\sigma(\cdot)$ denotes the activation function, and $H^{(0)}$ equals $x$. The normalized adjacency matrix $\hat{A}$ is equal to $D^{-1/2} A D^{-1/2}$, where $A$ and $D$ are the adjacency matrix and the degree matrix of $\mathcal{G}$. To include self-loop information, Eq. 2 is rewritten as:

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)}\right) \quad (3)$$

where $\tilde{A} = A + I_N$, $\tilde{D}$ is the degree matrix of $\tilde{A}$, and $I_N$ is the identity matrix.
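To make the propagation rule concrete, here is a minimal NumPy sketch of Eq. 3 (symmetric normalization with self-loops). The toy graph, features, and weights are illustrative values, not taken from the paper.

```python
import numpy as np

def gcn_layer(A, H, W, activation=np.tanh):
    """One propagation step of Eq. 3:
    H' = act(D~^{-1/2} (A + I) D~^{-1/2} H W)."""
    A_tilde = A + np.eye(A.shape[0])           # add self-loops
    d = A_tilde.sum(axis=1)                    # degrees of A~
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))     # D~^{-1/2}
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt  # symmetric normalization
    return activation(A_hat @ H @ W)

# Toy path graph with 3 nodes, 2-dim input features, 2 output channels.
rng = np.random.default_rng(0)
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = rng.standard_normal((3, 2))
W = rng.standard_normal((2, 2))
out = gcn_layer(A, H, W)
```

Stacking such layers lets each node aggregate features from progressively larger graph neighborhoods.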
Bayesian Graph Convolutional Network
BGCN [29] was first introduced for the task of semi-supervised node classification. In this task, we have an observed graph $\mathcal{G}_{obs} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ and $\mathcal{E}$ denote the sets of nodes and edges. We obtain the feature vectors $X$ of all nodes, but only know the labels $Y_L$ of a portion of the nodes. We then aim to infer the labels $Z$ of the remaining nodes based on $X$, $Y_L$, and $\mathcal{G}_{obs}$. In the BGCN-based framework, which views the graph structure $\mathcal{G}$ and the weights $W$ of the GCNs as random variables, the goal is to infer the posterior probability of the labels $p(Z \mid Y_L, X, \mathcal{G}_{obs})$, i.e.:

$$p(Z \mid Y_L, X, \mathcal{G}_{obs}) = \int p(Z \mid W, \mathcal{G}, X)\, p(W \mid Y_L, X, \mathcal{G})\, p(\mathcal{G} \mid \lambda)\, p(\lambda \mid \mathcal{G}_{obs})\, dW\, d\mathcal{G}\, d\lambda \quad (4)$$

where $\lambda$ parameterizes a family of random graphs and the term $p(Z \mid W, \mathcal{G}, X)$ is modeled by a categorical distribution. However, this formula ignores any possible dependence of the graph $\mathcal{G}$ on the data, as it targets the inference of $p(\lambda \mid \mathcal{G}_{obs})$. To this end, an enhanced BGCN [17] is proposed, which formulates the posterior predictive distribution as follows:

$$p(Z \mid Y_L, X, \mathcal{G}_{obs}) = \int p(Z \mid W, \mathcal{G}, X)\, p(W \mid Y_L, X, \mathcal{G})\, p(\mathcal{G} \mid Y_L, X, \mathcal{G}_{obs})\, dW\, d\mathcal{G} \quad (5)$$

where the term $p(\mathcal{G} \mid Y_L, X, \mathcal{G}_{obs})$ allows us to introduce the information of the data. Additionally, as the integral in Eq. 5 is intractable, a Monte Carlo approximation [5] is often involved as follows:

$$p(Z \mid Y_L, X, \mathcal{G}_{obs}) \approx \frac{1}{N_G N_W} \sum_{i=1}^{N_G} \sum_{j=1}^{N_W} p(Z \mid W_{i,j}, \mathcal{G}_i, X) \quad (6)$$

where the graphs $\mathcal{G}_i$ are sampled from $p(\mathcal{G} \mid Y_L, X, \mathcal{G}_{obs})$ and the weight samples $W_{i,j}$ are drawn from $p(W \mid Y_L, X, \mathcal{G}_i)$. Considering that sampling graphs from the posterior probability of $\mathcal{G}$ is time-consuming, the maximum a posteriori (MAP) estimate of $\mathcal{G}$ is introduced as follows:

$$\hat{\mathcal{G}} = \operatorname*{arg\,max}_{\mathcal{G}}\; p(\mathcal{G} \mid Y_L, X, \mathcal{G}_{obs}) \quad (7)$$

As a result, Eq. 6 can be simplified as follows:

$$p(Z \mid Y_L, X, \mathcal{G}_{obs}) \approx \frac{1}{N_W} \sum_{j=1}^{N_W} p(Z \mid W_j, \hat{\mathcal{G}}, X) \quad (8)$$

where the weight samples $W_j$ are obtained via dropout.
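The Monte Carlo approximation in Eq. 8 can be sketched as follows: uncertainty enters only through fresh dropout masks on the weights, and predictions are averaged over the stochastic forward passes. The linear "network" below is a stand-in for illustration, not the BGCN architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_forward(x, W, p=0.5):
    """One forward pass with a fresh dropout mask on the weights,
    i.e. one Monte Carlo sample W_j from the approximate posterior."""
    mask = rng.random(W.shape) >= p
    W_j = W * mask / (1.0 - p)   # inverted dropout keeps the expectation
    return x @ W_j

def mc_dropout_predict(x, W, num_samples=500):
    """Eq. 8: average num_samples stochastic passes over the MAP graph."""
    samples = [stochastic_forward(x, W) for _ in range(num_samples)]
    return np.mean(samples, axis=0)

x = np.ones((1, 4))
W = np.ones((4, 2))
pred = mc_dropout_predict(x, W)   # concentrates around x @ W = [[4, 4]]
```

Averaging many stochastic passes approximates the predictive mean, while the spread of the samples gives a crude uncertainty estimate.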
4 Method
Problem Formulation
In the task of traffic prediction, we have access to the topology of the road network $\mathcal{G}_{obs} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ is the set of $N$ roads and $\mathcal{E}$ denotes the set of edges. We also obtain historical traffic data $X = \{X_1, \ldots, X_T\}$ on the roads, where $X_t$ belongs to $\mathbb{R}^{N \times d}$ and $d$ is the dimension of the traffic data. Our goal is then to forecast future signals based on $\mathcal{G}_{obs}$ and $X$ in an iterative manner. In other words, to predict each future timestamp, we take the observed signals and the previously estimated results as input. It is worth noting that the traffic data on roads only includes the traffic speed in this paper, i.e., $d$ is set to 1.
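The iterative forecasting scheme just described can be sketched as a rolling loop in which each one-step prediction is appended to the input window. The `mean_model` below is a hypothetical stand-in for the trained one-step predictor.

```python
import numpy as np

def forecast_iterative(model, history, horizon):
    """Roll a one-step model forward `horizon` times; each prediction
    joins the window used for the next step, as described above."""
    window = list(history)
    preds = []
    for _ in range(horizon):
        x_next = model(np.array(window))
        preds.append(x_next)
        window = window[1:] + [x_next]   # slide the observation window
    return preds

# Hypothetical one-step "model": predicts the mean of the current window.
mean_model = lambda w: w.mean(axis=0)
history = [np.array([1.0]), np.array([2.0]), np.array([3.0])]
preds = forecast_iterative(mean_model, history, horizon=2)
```

Note that prediction errors feed back into the window, which is why mitigating long-horizon error accumulation matters in this setting.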
In general, traffic prediction is regarded as the task of learning a nonlinear mapping function $f$ that maps historical traffic data and the topology of the road network into the traffic conditions at future timestamps. Mathematically, we formulate the objective of traffic prediction as follows:

$$\min_{f}\; \sum_{i=1}^{\Delta} \left\| \hat{X}_{t+i} - X_{t+i} \right\|, \quad \left[\hat{X}_{t+1}, \ldots, \hat{X}_{t+\Delta}\right] = f\left(\mathcal{G}_{obs}; X_{t-n+1}, \ldots, X_t\right) \quad (9)$$

where $X_{t+i}$ is the ground truth at time $t+i$, and $\Delta$ is the desired horizon ahead of the current timestamp $t$. In this paper, we propose a novel Bayesian spatio-temporal graph convolutional network (BSTGCN) to model $f$. As shown in Figure 1, four modules are involved in the BSTGCN: a projection module, a spatio-temporal context module, a decoder module, and a Bayesian inference module. Next, we detail these modules in sequence.
Projection Module
The projection module is designed to project the 1-D traffic speed of each road into a high-dimensional feature space using a linear layer. Concretely, the traffic speed of the roads at time $t$, $X_t \in \mathbb{R}^{N \times 1}$, is transformed into 64-dimensional traffic features as follows:

$$H_t = f_{\text{proj}}\left(X_t\right) \quad (10)$$

where the function $f_{\text{proj}}$ consists of a fully-connected (FC) layer with 64 neurons. The enhanced representation of traffic speed is beneficial for learning the spatiotemporal characteristics of traffic flows.
Spatiotemporal Context Module
As aforementioned, the future traffic speed of a road not only depends on the historical traffic speed of that road but is also constrained by the physical topology of the road network. As a result, the spatiotemporal context module is proposed to capture the spatiotemporal features of each road. In light of the good performance of the Gated Recurrent Unit (GRU) [2] in modeling the temporal characteristics of sequential data [14], we tailor the original GRU to deal with graph-structured traffic data. In particular, we sequentially feed the input sequence into the tailored GRU, and denote its output at timestamp $t$ as:

$$S_t = f_{\text{st}}\left(H_t, S_{t-1}\right) \quad (11)$$

where the function $f_{\text{st}}$ is comprised of a tailored GRU with a hidden size of 64, and the input-state transitions of the tailored GRU are expressed as follows:

$$\begin{aligned} r_t &= \sigma\left(\Theta_r \star_{\mathcal{G}} \left[H_t, S_{t-1}\right] + b_r\right) \\ u_t &= \sigma\left(\Theta_u \star_{\mathcal{G}} \left[H_t, S_{t-1}\right] + b_u\right) \\ c_t &= \tanh\left(\Theta_c \star_{\mathcal{G}} \left[H_t, r_t \odot S_{t-1}\right] + b_c\right) \\ S_t &= u_t \odot S_{t-1} + \left(1 - u_t\right) \odot c_t \end{aligned} \quad (12)$$

where $S_t$, $r_t$, $u_t$, and $c_t$ are the hidden state, the reset gate, the update gate, and the new (candidate) state at time $t$, respectively. $\sigma$ is the sigmoid function, $\star_{\mathcal{G}}$ denotes the graph convolutional operation, and $\odot$ denotes the Hadamard product. $\Theta_r$, $\Theta_u$, $\Theta_c$, $b_r$, $b_u$, and $b_c$ are trainable parameters. As seen in Eq. 12, the tailored GRU replaces the FC layers in the conventional GRU with GCNs, which can learn the spatial characteristics of traffic flows. Moreover, we introduce uncertainty and the information of the traffic data into the graph structure employed in the GCNs, which is detailed in the part on the Bayesian inference module.
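As a sketch of the input-state transitions in Eq. 12, the cell below replaces each FC layer of a plain GRU with a graph convolution (`A_hat @ X @ W`). The gate layout follows the standard GRU; the exact parameterization in the paper may differ, and all shapes and values here are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def graph_conv(A_hat, X, W, b):
    """FC layer of a plain GRU replaced by a graph convolution."""
    return A_hat @ X @ W + b

def gcn_gru_cell(A_hat, H_t, S_prev, params):
    """One step of the tailored GRU: gates mix information along the
    (normalized) road graph A_hat before gating the hidden state."""
    Wr, br, Wu, bu, Wc, bc = params
    X = np.concatenate([H_t, S_prev], axis=1)        # node-wise [input, state]
    r = sigmoid(graph_conv(A_hat, X, Wr, br))        # reset gate
    u = sigmoid(graph_conv(A_hat, X, Wu, bu))        # update gate
    Xc = np.concatenate([H_t, r * S_prev], axis=1)
    c = np.tanh(graph_conv(A_hat, Xc, Wc, bc))       # candidate state
    return u * S_prev + (1.0 - u) * c                # new hidden state

# Toy dimensions: 3 roads, 2-dim input features, hidden size 4.
N, F, Hdim = 3, 2, 4
rng = np.random.default_rng(0)
A_hat = np.eye(N)                                    # trivial normalized graph
params = (rng.standard_normal((F + Hdim, Hdim)), np.zeros(Hdim),
          rng.standard_normal((F + Hdim, Hdim)), np.zeros(Hdim),
          rng.standard_normal((F + Hdim, Hdim)), np.zeros(Hdim))
S = gcn_gru_cell(A_hat, rng.standard_normal((N, F)), np.zeros((N, Hdim)), params)
```

Because every gate applies a graph convolution, the recurrent state of each road is updated using its neighbors' states as well as its own history.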
Decoder Module
The decoder module aims to map the spatiotemporal feature $S_t$ into the traffic speed of the roads at time $t+1$ with a linear layer, which is formulated as follows:

$$\hat{X}_{t+1} = f_{\text{dec}}\left(S_t\right) \quad (13)$$

where the function $f_{\text{dec}}$ is made up of an FC layer with 1 neuron.
Bayesian Inference Module
The Bayesian inference module is designed to discover a better graph structure from the observed topology $\mathcal{G}_{obs}$, the input data $X$, and the corresponding labels $Y$, as well as to introduce uncertainty into the graph structure. In this paper, we consider a Bayesian approach, viewing the graph structure as a sample drawn from a parametric generative model. We then aim to infer the posterior predictive distribution as follows:

$$p\left(\hat{X}_{t+1} \mid \mathcal{G}_{obs}, X, Y\right) = \int p\left(\hat{X}_{t+1} \mid W, \mathcal{G}, X\right) p\left(W \mid \mathcal{G}, X, Y\right) p\left(\mathcal{G} \mid X, Y, \mathcal{G}_{obs}\right) dW\, d\mathcal{G} \quad (14)$$

where $W$ is the set of trainable parameters of our network, and $Y$ is the ground-truth traffic speed at time $t+1$. As seen in Eq. 14, the posterior probability of the graph $\mathcal{G}$ is calculated in a two-step manner, where the information of the topology of the road network and the information of the traffic data are successively introduced into the graph structure $\mathcal{G}$.
The specific calculation of Eq. 14 is presented as follows. Since there is no closed-form solution for the integral in Eq. 14, a Monte Carlo approximation is introduced as follows:

$$p\left(\hat{X}_{t+1} \mid \mathcal{G}_{obs}, X, Y\right) \approx \frac{1}{N_G N_W} \sum_{i=1}^{N_G} \sum_{j=1}^{N_W} p\left(\hat{X}_{t+1} \mid W_{i,j}, \mathcal{G}_i, X\right) \quad (15)$$

where the graphs $\mathcal{G}_i$ are sampled from $p(\mathcal{G} \mid X, Y, \mathcal{G}_{obs})$, the weight samples $W_{i,j}$ are drawn from $p(W \mid \mathcal{G}_i, X, Y)$, and $p(\hat{X}_{t+1} \mid W, \mathcal{G}, X)$ is modeled by a Gaussian likelihood. Like the improved BGCN [17], we replace the integral over $\mathcal{G}$ with a MAP estimate, as follows:

$$\hat{\mathcal{G}} = \operatorname*{arg\,max}_{\mathcal{G}}\; p\left(\mathcal{G} \mid \mathcal{G}_{obs}\right) \quad (16)$$
As described in the work [17], solving Eq. 16 is equivalent to learning a symmetric adjacency matrix $\hat{A}$ of $\hat{\mathcal{G}}$, expressed as follows:

$$\hat{A} = \operatorname*{arg\,min}_{A}\; \left\| A \odot Z \right\|_{1} - \alpha\, \mathbf{1}^{\top} \log\left(A \mathbf{1}\right) + \beta \left\| A \right\|_{F}^{2} \quad (17)$$

where $\alpha$ and $\beta$ control the scale and density of $\hat{A}$. Here, $Z$ is the matrix of pairwise distances of the roads in the embedding space, which is calculated as follows:

$$Z_{ij} = \left\| e_i - e_j \right\|^{2} \quad (18)$$

where $e_i$ and $e_j$ are the embedding vectors of the $i$-th and $j$-th roads. In this paper, we learn the embedding vectors of the roads through the Graph Variational Auto-Encoder algorithm [11]. After obtaining $Z$, we solve Eq. 17 via the prevalent method [8]. As for the inference of $W$, we also adopt a Monte Carlo approximation. Thus, Eq. 14 is rewritten as follows:
$$p\left(\hat{X}_{t+1} \mid \mathcal{G}_{obs}, X, Y\right) \approx \frac{1}{N_W} \sum_{j=1}^{N_W} p\left(\hat{X}_{t+1} \mid W_j, \hat{\mathcal{G}} + \Delta\mathcal{G}, X\right) \quad (19)$$

where the weight samples $W_j$ are drawn via dropout, and $\Delta\mathcal{G}$ is a trainable deterministic variable that allows us to introduce the information of the traffic data into the graph structure. As we can see, $\Delta\mathcal{G}$ aims to learn the global graph structure, as it is shared across all timestamps. The training algorithm of BSTGCN is described in Algorithm 1.
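Eq. 18 is a plain pairwise squared-distance computation over the road embeddings; a vectorized NumPy version is shown below. The embedding values are toy numbers; in the paper the embeddings come from a variational graph auto-encoder [11], and the resulting matrix $Z$ is then fed to the graph-learning solver of [8], which is not reproduced here.

```python
import numpy as np

def pairwise_sq_dist(E):
    """Eq. 18: Z_ij = ||e_i - e_j||^2 for road embeddings e_i (rows of E),
    computed without explicit Python loops."""
    sq = (E ** 2).sum(axis=1)
    Z = sq[:, None] + sq[None, :] - 2.0 * E @ E.T
    return np.maximum(Z, 0.0)   # clip tiny negatives from round-off

# Toy 2-D embeddings for three roads.
E = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [0.0, 1.0]])
Z = pairwise_sq_dist(E)
```

Roads with similar traffic behavior end up close in the embedding space, so small entries of `Z` mark the candidate edges that the learned adjacency matrix favors.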
Relationship with Existing BGCNs
The improved BGCN [17] cannot be applied to traffic prediction for two main reasons: (1) To include the information of the traffic data in the graph structure, it needs to precalculate the MAP result for each timestamp, which is time-consuming. (2) It relies on the assumption that the graph structure is symmetric in the calculation of the MAP result. However, this hypothesis is easily violated in the task of traffic prediction, because the mutual influence of two traffic flows is usually unequal. In our proposed method, we learn the dependence of the graph structure on the traffic data by injecting a trainable parameter, without introducing extra calculation. Moreover, the learned graph structure can be either symmetric or asymmetric.
5 Experiments
Datasets and Evaluation Metrics
We verify our model on two real-world traffic datasets, SZ-taxi and Los-loop. The SZ-taxi dataset records the traffic speed of 156 major roads in Luohu District, Shenzhen, from Jan. 1 to Jan. 31, 2015. The Los-loop dataset collects the traffic speed of 207 highways in Los Angeles County from Mar. 1 to Mar. 7, 2012. The traffic data is aggregated every 15 minutes in the SZ-taxi dataset and every 5 minutes in the Los-loop dataset. The topology of the road network is available in both datasets. In our experiments, we split both datasets into a training set and an evaluation set at a ratio of 4:1, and use the observed 60 minutes of traffic speed to predict the traffic conditions in the next 15, 30, 45, and 60 minutes.
We compare BSTGCN with the following state-of-the-art methods: the historical average model (HA) [13], ARIMA [1], SVR [20], the GCN model [10], the GRU model [2], the T-GCN model [30], and the A3T-GCN model [31], in terms of Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Accuracy (ACC), Coefficient of Determination (R²), and Explained Variance Score (VAR). Higher ACC, R², and VAR values, as well as lower RMSE and MAE values, represent better prediction performance.
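For reference, the five metrics can be computed as below. The paper does not restate the formulas, so these follow the definitions commonly used in T-GCN-style evaluations (e.g., ACC = 1 − ‖Y − Ŷ‖_F / ‖Y‖_F); treat the exact forms as an assumption.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """RMSE, MAE, ACC, R^2, VAR (assumed T-GCN-style definitions)."""
    err = y_true - y_pred
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    acc = 1.0 - np.linalg.norm(err) / np.linalg.norm(y_true)   # Frobenius ratio
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    var = 1.0 - np.var(err) / np.var(y_true)                   # explained variance
    return rmse, mae, acc, r2, var

y = np.array([1.0, 2.0, 3.0, 4.0])
rmse, mae, acc, r2, var = evaluate(y, y)   # perfect prediction: 0, 0, 1, 1, 1
```

Note that VAR ignores a constant bias in the predictions (the error variance is zero for a constant offset), whereas R² penalizes it; reporting both separates bias from dispersion.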
Implementation Details
We implement BSTGCN in the PyTorch framework [19]. All trainable variables in BSTGCN are optimized with the Adam optimizer [9]. We train BSTGCN on an NVIDIA GeForce GTX 1080Ti GPU for 100 epochs. The learning rate is initialized separately for the SZ-taxi and Los-loop datasets. During training, 32 pairs are randomly generated from the training dataset per iteration, and the learning rate is decayed by a factor of 0.2 every 25 epochs. It is worth noting that we use the learned graph structure without dropout in the evaluation phase.
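The step-decay schedule above can be expressed as a one-liner. The base learning rate `1e-3` is a placeholder, since the paper's initial values are not shown here, and "decayed by 0.2" is read as multiplicative decay (the usual `StepLR`-style convention).

```python
def lr_at_epoch(base_lr, epoch, decay=0.2, step=25):
    """Step decay: multiply the base rate by `decay` once per `step` epochs."""
    return base_lr * (decay ** (epoch // step))

# Placeholder base rate; the paper's dataset-specific values are not shown.
lrs = [lr_at_epoch(1e-3, e) for e in (0, 24, 25, 75)]
```

So the rate stays flat within each 25-epoch block and drops by 5x at epochs 25, 50, and 75.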
Parameter Experiment and Ablation Study
First, we investigate the impact of the hidden size and the Monte Carlo dropout probability on the prediction performance in Tables 1 and 2. As we can see, the prediction performance with 64 hidden neurons achieves the optimum on both real-world datasets. The optimal Monte Carlo dropout probabilities are 0.1 and 0.5 for the SZ-taxi and Los-loop datasets, respectively. As a result, we fix these hyperparameters in the following experiments.
Second, we verify the importance of introducing uncertainty and of different ways of learning the graph structure in Table 3, which compares BSTGCN against several of its variants. Two of the variants learn an individual traffic pattern for each timestamp through a Graph Attention Network [22] or a Self-Attention Network [21], but ignore the uncertainty. From the results, we can conclude that introducing uncertainty and the information of the traffic data into the graph structure steadily improves the performance of traffic prediction. In addition, the variant that learns the global traffic pattern across all timestamps exceeds the attention-based variants by a clear margin, which confirms the advantage of our way of learning the graph structure from traffic data.
| Dataset | T | Metric | Hidden size: 8 | 16 | 32 | 64 | 128 |
|---|---|---|---|---|---|---|---|
| SZ-taxi | 60 min | RMSE | 4.0630 | 4.0490 | 4.0510 | 4.0270 | 4.1090 |
| | | MAE | 2.7220 | 2.7000 | 2.6870 | 2.7010 | 2.7270 |
| | | ACC | 0.7170 | 0.7179 | 0.7178 | 0.7195 | 0.7137 |
| | | R² | 0.8446 | 0.8496 | 0.8495 | 0.8513 | 0.8451 |
| | | VAR | 0.8489 | 0.8499 | 0.8499 | 0.8515 | 0.8451 |
| Los-loop | 60 min | RMSE | 10.810 | 7.7330 | 7.5140 | 7.0840 | 7.0980 |
| | | MAE | 6.9610 | 4.7280 | 4.4720 | 4.1350 | 4.2290 |
| | | ACC | 0.8156 | 0.8682 | 0.8720 | 0.8793 | 0.8790 |
| | | R² | 0.3992 | 0.6924 | 0.7096 | 0.7418 | 0.7409 |
| | | VAR | 0.4053 | 0.694 | 0.7096 | 0.7418 | 0.7414 |
| Dataset | T | Metric | Dropout: 0.1 | 0.3 | 0.5 | 0.7 | 0.9 |
|---|---|---|---|---|---|---|---|
| SZ-taxi | 60 min | RMSE | 4.0270 | 4.0350 | 4.0470 | 4.1020 | 4.6620 |
| | | MAE | 2.6860 | 2.6720 | 2.7050 | 2.7290 | 3.1970 |
| | | ACC | 0.7195 | 0.7189 | 0.7181 | 0.7143 | 0.6752 |
| | | R² | 0.8513 | 0.8507 | 0.8498 | 0.8457 | 0.8006 |
| | | VAR | 0.8514 | 0.8507 | 0.85 | 0.8457 | 0.8008 |
| Los-loop | 60 min | RMSE | 6.9940 | 6.8370 | 6.7330 | 6.7760 | 6.9580 |
| | | MAE | 4.0460 | 3.9340 | 3.9180 | 3.9740 | 4.1390 |
| | | ACC | 0.8808 | 0.8835 | 0.8853 | 0.8845 | 0.8814 |
| | | R² | 0.7483 | 0.7596 | 0.7668 | 0.7638 | 0.7509 |
| | | VAR | 0.7485 | 0.7601 | 0.7669 | 0.7638 | 0.7511 |
Methods  BSTGCN  

Topology of the road network  ✓  
MAP result of  ✓  ✓  ✓  
Dependence of the graph structure over traffic data  ✓  ✓  
Uncertainty  ✓  
Self-Attention Network  ✓  
Graph Attention Network  ✓  
RMSE  7.6700  7.4870  7.9860  7.0840  6.7330 
MAE  4.4410  4.3250  4.6660  4.1350  3.9180 
ACC  0.8693  0.8724  0.8639  0.8793  0.8853 
R²  0.6973  0.7117  0.6719  0.7418  0.7668 
VAR  0.6976  0.7118  0.6721  0.7418  0.7669 
T  Metric  SZtaxi  Losloop  

HA  ARIMA  SVR  GCN  GRU  TGCN  A3TGCN  BSTGCN  HA  ARIMA  SVR  GCN  GRU  TGCN  A3TGCN  BSTGCN  
15min  RMSE  4.2951  7.2406  4.1455  5.6596  3.9994  3.9325  3.8989  3.9670  7.0970  10.044  6.0084  7.7922  5.2182  5.1264  5.0904  4.7585 
MAE  2.7815  4.9824  2.6233  4.2367  2.5955  2.7145  2.6840  2.6490  3.7585  7.6832  3.7285  5.3525  3.0602  3.1802  3.1365  2.9150  
ACC  0.7008  0.4463  0.7112  0.6107  0.7249  0.7295  0.7318  0.7237  0.8792  0.8275  0.8977  0.8673  0.9109  0.9127  0.9133  0.9185  
R²  0.8307  0.8423  0.6654  0.8329  0.8539  0.8512  0.8557  0.7382  0.0025  0.8123  0.6843  0.8576  0.8634  0.8653  0.8810  
VAR  0.8307  0.0035  0.8424  0.6655  0.8329  0.8539  0.8512  0.8557  0.7382  0.8146  0.6844  0.8577  0.8634  0.8653  0.8811  
30min  RMSE  4.3481  6.7899  4.1628  5.6918  4.0942  3.9740  3.9228  4.0010  7.9717  9.3450  6.9588  8.3353  6.2802  6.0598  5.9974  5.6380 
MAE  2.8171  4.6765  2.6875  4.2647  2.6906  2.7522  2.7038  2.6530  4.1692  7.6891  3.7248  5.6118  3.6505  3.7466  3.6610  3.3580  
ACC  0.6971  0.3845  0.7100  0.6085  0.7184  0.7267  0.7302  0.7213  0.8642  0.8275  0.8815  0.8581  0.8931  0.8968  0.8979  0.9040  
R²  0.8266  0.8410  0.6616  0.8249  0.8451  0.8493  0.8531  0.6709  0.0031  0.7492  0.6402  0.7957  0.8098  0.8137  0.8354  
VAR  0.8266  0.0081  0.8413  0.6617  0.8250  0.8451  0.8493  0.8532  0.6709  0.7523  0.6404  0.7958  0.8100  0.8137  0.8354  
45min  RMSE  4.3910  6.7852  4.1885  5.7142  4.1534  3.9910  3.9461  4.0110  8.7643  10.051  7.7504  8.8036  7.0343  6.7065  6.6840  6.2130 
MAE  2.8480  4.6734  2.7359  4.2844  2.7743  2.7645  2.7261  2.6780  4.5646  7.6924  4.1288  5.9534  4.0915  4.1158  4.1712  3.6440  
ACC  0.6941  0.3847  0.7082  0.6069  0.7143  0.7255  0.7286  0.7206  0.8507  0.8273  0.8680  0.8500  0.8801  0.8857  0.8861  0.8942  
R²  0.8232  0.8391  0.6589  0.8198  0.8436  0.8474  0.8525  0.6035  0.6899  0.5999  0.7446  0.7679  0.7694  0.8008  
VAR  0.8232  0.0087  0.8397  0.6590  0.8199  0.8436  0.8474  0.8526  0.6036  0.0035  0.6947  0.6001  0.7451  0.7684  0.7705  0.8008  
60min  RMSE  4.4312  6.7708  4.2156  5.7361  4.0747  4.0099  3.9707  4.0270  9.4970  10.054  8.4388  9.2657  7.6621  7.2677  7.0990  6.7330 
MAE  2.8754  4.6655  2.7751  4.3034  2.7712  2.7860  2.7391  2.6860  4.9491  7.6952  4.5036  6.2892  4.5186  4.6021  4.2343  3.9180  
ACC  0.6913  0.3851  0.7063  0.6054  0.7197  0.7242  0.7269  0.7195  0.8382  0.8273  0.8562  0.8421  0.8694  0.8762  0.8790  0.8853  
R²  0.8199  0.8370  0.6564  0.8266  0.8421  0.8454  0.8513  0.5360  0.6336  0.5583  0.6980  0.7283  0.7407  0.7668  
VAR  0.8199  0.0111  0.8379  0.6564  0.8267  0.8421  0.8454  0.8514  0.5361  0.0036  0.5593  0.5593  0.6984  0.7290  0.7415  0.7669 
Comparison with Existing Methods
Table 4 summarizes the comparison between BSTGCN and existing methods in terms of RMSE, MAE, ACC, R², and VAR under various prediction horizons. The major findings are as follows. First, the temporal dependence based methods, including HA, ARIMA, SVR, and GRU, achieve lower prediction performance on the two real-world datasets than the spatiotemporal dependence based methods, i.e., T-GCN, A3T-GCN, and BSTGCN. This verifies the benefit of the spatial characteristics of traffic speed for traffic prediction. Second, GCN, which considers only the spatial characteristics of traffic speed, is inferior to GRU in all evaluation metrics. This indicates that future traffic speed on a road depends more strongly on its historical traffic speed. Third, we can observe that BSTGCN has a clear advantage over T-GCN and A3T-GCN, especially on the Los-loop dataset. For instance, at the 15-minute prediction horizon, BSTGCN outperforms T-GCN and A3T-GCN by 1.76% and 1.57% in terms of R². This confirms the superior capability of BSTGCN in capturing the spatiotemporal characteristics of traffic speed. Finally, we notice that BSTGCN shows an impressive ability to mitigate long-term prediction errors. For instance, for the 60-minute horizon on the Los-loop dataset, BSTGCN outperforms T-GCN and A3T-GCN by 3.85% and 2.61% in terms of R².
Visualized Results of Graph Structure
Figure 2 visualizes the observed adjacency matrix, the MAP estimate of the adjacency matrix, and the final learned adjacency matrix. We can see that the final learned adjacency matrix has the densest connectivity among traffic flows, but is of an asymmetric structure. This confirms that the assumption that the adjacency matrix is symmetric is violated in the task of traffic prediction. In addition, we notice that the relationships among traffic flows in the learned adjacency matrix are not always positive. However, such negative dependence among traffic flows cannot be captured in the original observed adjacency matrix.
Visualized Prediction Results
Figure 3 presents the prediction results of BSTGCN for the 15-, 30-, and 45-minute prediction horizons. First, we can see that the traffic speed in the SZ-taxi dataset is more stationary than that in the Los-loop dataset. This explains the phenomenon that the naive historical average model (HA) performs well on the SZ-taxi dataset. Second, we notice that the prediction results of BSTGCN are closer to the ground-truth traffic speed than those of existing methods, which is most obvious in long-term prediction. This confirms the excellent ability of BSTGCN in traffic forecasting. Third, despite achieving impressive results, BSTGCN still cannot predict the values at sharply changing points.
| Dataset | T | Metric | Activation: Identity | Sigmoid |
|---|---|---|---|---|
| SZ-taxi | 60 min | RMSE | 4.0280 | 4.0270 |
| | | MAE | 2.7020 | 2.6860 |
| | | ACC | 0.7194 | 0.7195 |
| | | R² | 0.8512 | 0.8513 |
| | | VAR | 0.8516 | 0.8514 |
| Los-loop | 60 min | RMSE | 6.5900 | 6.7330 |
| | | MAE | 3.8420 | 3.9180 |
| | | ACC | 0.8877 | 0.8853 |
| | | R² | 0.7766 | 0.7668 |
| | | VAR | 0.7767 | 0.7669 |
Deep Dive into Network Architecture
We also take a deep dive into the network architecture of BSTGCN. Specifically, we switch the activation function of the reset gate in the tailored GRU from the sigmoid function to the identity function. The experimental results are presented in Table 5. Interestingly, such a simple modification brings a noticeable gain on the Los-loop dataset, but has little impact on the SZ-taxi dataset. This phenomenon indicates that different datasets prefer different neural architectures. As a result, we advocate that automatically searching the neural architecture for a specific dataset is a promising way to further improve the performance of BSTGCN.
6 Conclusion and Future Work
In this paper, we propose a Bayesian spatio-temporal graph convolutional network (BSTGCN) for traffic prediction. Specifically, we propose to learn the underlying graph structure from the observed topology of the road network and the traffic data in an end-to-end manner, and to introduce uncertainty into the graph structure through a Bayesian approach. Experimental results on two real-world datasets verify the outstanding capability of BSTGCN in traffic prediction. In the future, we will focus on two main directions. One is to extend BSTGCN to other spatiotemporal time-series forecasting tasks, such as forecasting ride demand. The other is to perform neural architecture and hyperparameter search for BSTGCN.
References
 (1979) Analysis of freeway traffic time-series data by using Box-Jenkins techniques. Cited by: §2, §5.
 (2014) On the properties of neural machine translation: encoder-decoder approaches. arXiv preprint arXiv:1409.1259. Cited by: §4, §5.
 (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in neural information processing systems, pp. 3844–3852. Cited by: §2.
 (2016) Using LSTM and GRU neural network methods for traffic flow prediction. In 2016 31st Youth Academic Annual Conference of Chinese Association of Automation (YAC), pp. 324–328. Cited by: §2.
 (2016) Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In International Conference on Machine Learning, pp. 1050–1059. Cited by: §3.
 (2014) Traffic congestion costs americans $124 billion a year, report says. Forbes, October 14. Cited by: §1.
 (2019) Semi-supervised learning with graph learning-convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 11313–11320. Cited by: §3.
 (2017) Large scale graph learning from smooth signals. arXiv preprint arXiv:1710.05654. Cited by: §4.
 (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §5.
 (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Cited by: §1, §3, §5.
 (2016) Variational graph autoencoders. arXiv preprint arXiv:1611.07308. Cited by: §4.
 (2007) Traffic flow prediction using adaboost algorithm with random forests as a weak learner. In Proceedings of world academy of science, engineering and technology, Vol. 19, pp. 193–198. Cited by: §2.
 (2004) A summary of traffic flow forecasting methods [J]. Journal of Highway and Transportation Research and Development 3, pp. 82–85. Cited by: §5.
 (2019) A GRU-based prediction framework for intelligent resource management at cloud data centres in the age of 5G. IEEE Transactions on Cognitive Communications and Networking. Cited by: §4.
 (2014) Traffic flow prediction with big data: a deep learning approach. IEEE Transactions on Intelligent Transportation Systems 16 (2), pp. 865–873. Cited by: §2.
 (1984) Dynamic prediction of traffic volume through kalman filtering theory. Transportation Research Part B: Methodological 18 (1), pp. 1–11. Cited by: §2.
 (2019) Bayesian graph convolutional neural networks using nonparametric graph learning. arXiv preprint arXiv:1910.12132. Cited by: §3, §4, §4.
 (1999) Forecasting freeway link travel times with a multilayer feed-forward neural network. Computer-Aided Civil and Infrastructure Engineering 14 (5), pp. 357–367. Cited by: §2.
 (2017) Automatic differentiation in pytorch. Cited by: §5.
 (2004) A tutorial on support vector regression. Statistics and computing 14 (3), pp. 199–222. Cited by: §5.
 (2017) Attention is all you need. In Advances in neural information processing systems, pp. 5998–6008. Cited by: §5.
 (2017) Graph attention networks. arXiv preprint arXiv:1710.10903. Cited by: §5.
 (2003) Modeling and forecasting vehicular traffic flow as a seasonal ARIMA process: theoretical basis and empirical results. Journal of Transportation Engineering 129 (6), pp. 664–672. Cited by: §2.
 (2004) Travel-time prediction with support vector regression. IEEE Transactions on Intelligent Transportation Systems 5 (4), pp. 276–281. Cited by: §2.
 (2020) Blind omnidirectional image quality assessment with viewport oriented graph convolutional networks. arXiv preprint arXiv:2002.09140. Cited by: §3.
 (2018) Spatial temporal graph convolutional networks for skeleton-based action recognition. In Thirty-Second AAAI Conference on Artificial Intelligence. Cited by: §3.
 (2017) Spatiotemporal recurrent convolutional networks for traffic prediction in transportation networks. Sensors 17 (7), pp. 1501. Cited by: §2.
 (2017) Deep spatio-temporal residual networks for citywide crowd flows prediction. In Thirty-First AAAI Conference on Artificial Intelligence. Cited by: §2.
 (2019) Bayesian graph convolutional neural networks for semisupervised classification. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 5829–5836. Cited by: §3.
 (2019) T-GCN: a temporal graph convolutional network for traffic prediction. IEEE Transactions on Intelligent Transportation Systems. Cited by: §2, §5.
 (2020) A3T-GCN: attention temporal graph convolutional network for traffic forecasting. arXiv preprint arXiv:2006.11583. Cited by: §2, §5.