DDPGCN: Multi-Graph Convolutional Network for Spatiotemporal Traffic Forecasting
Abstract
Traffic speed forecasting is one of the core problems in Intelligent Transportation Systems. For more accurate prediction, recent studies have started using not only temporal speed patterns but also spatial information on the road network through graph convolutional networks. Even though the road network is highly complex due to its non-Euclidean and directional characteristics, previous approaches mainly focus on modeling spatial dependencies with distance alone. In this paper, we identify two essential spatial dependencies in traffic forecasting in addition to distance, namely direction and positional relationship, and use them to design basic graph elements as the smallest building blocks. Using these building blocks, we propose DDPGCN (Distance, Direction, and Positional relationship Graph Convolutional Network), which incorporates the three spatial relationships into a prediction network for traffic forecasting. We evaluate the proposed model on two large-scale real-world datasets and find a 7.40% average improvement for 1-hour forecasting in highly complex urban networks.
1 Introduction
Traffic forecasting is a crucial task for Intelligent Transportation Systems (ITS) [19]. Improving these forecasting systems is important for a wide range of applications, such as autonomous vehicle operations, route optimization, and transportation system management. In this work, we focus on traffic speed forecasting, which predicts the future traffic speed of each road segment using historical speed data. Accurate traffic speed forecasting can help prevent traffic congestion, shorten travel time, and reduce carbon emissions.
For better prediction, recent deep learning studies have started to utilize not only historical speed data but also the spatial information of road networks. To convert spatial information into a format that deep networks such as image-based CNNs can use, [13, 14] simply unfold the road network and [25] uses a grid-based representation, resulting in map-like images. However, these image-like representations cannot fully capture the complex spatial relationships of traffic networks, such as driving directions and on-path proximity. To better capture the non-grid spatial characteristics of traffic networks, recent works have started to employ graph convolutional networks (GCNs).
While most previous works are limited to using Euclidean distance as the only graph element for GCNs [23, 24, 4], some studies extend GCNs to include non-Euclidean dependencies. [12, 5] modified the distance graph with non-Euclidean relationships, such as flow direction and reachability. For bike demand forecasting, [3, 8, 9] implemented multigraph convolution based on three types of graph elements, including transportation connectivity and functional similarity in addition to distance. To adapt multigraph convolution to traffic forecasting, we first need to understand which non-Euclidean dependencies are important for traffic networks.
Figure 1 shows a simple example. Here we want to figure out how the speed pattern of the target link (colored in red) is related to the other four links in the figure. If we consider distance only, the two directly neighboring links might be the most related to the target. However, because of its different driving direction, one of these neighbors might show quite a different speed pattern. Compared to that neighbor, a link that shares the target's direction could have a more consistent speed pattern with the target. Additionally, a link that is heading to the same area as the target could also share a similar speed pattern. These properties might be more pronounced during commute hours.
In order to utilize these concepts, we define two types of spatial dependencies in addition to distance: direction and positional relationship. We then propose a new type of traffic prediction network called DDPGCN (Distance, Direction, and Positional relationship Graph Convolutional Network). In DDPGCN, the non-Euclidean characteristics of direction and positional relationship in complex road networks are described through multigraphs. In previous studies [3, 8, 9], multigraph convolution was defined for the bike demand prediction task using undirected graphs, whose links carry no direction information. However, the same approach cannot be directly applied to traffic forecasting on directed graphs, whose links do carry direction information. In this work, we define multigraphs based on link vectors and link directions, and we suggest partition filters to generate hybrid graphs that combine information from two different graphs. When evaluated on two large-scale real-world datasets of highly complex urban networks, DDPGCN outperformed the state-of-the-art baselines.
Our main contributions are twofold.

We identify two non-Euclidean spatial relationships, direction and positional relationship, and propose to encode them using multiple graphs. We also suggest partition filters that can combine multiple graphs into a single hybrid graph. To the best of our knowledge, this is the first time multigraphs are defined for use in traffic speed forecasting.

We propose a traffic forecasting network (DDPGCN) that exploits the desired spatial dependencies effectively. This model is especially beneficial for long-term forecasting in highly complex urban networks, which is known as the most challenging problem.
2 Related Works
2.1 Graph Convolution
Graph convolutional networks (GCNs) were first introduced in [2], which bridges spectral graph theory and deep neural networks. [6] proposed ChebNet, which improves GCNs with fast localized convolution filters based on Chebyshev polynomials. ChebNet implicitly avoids computing the graph Fourier basis, which greatly reduces the computational complexity. In addition, filters built from Chebyshev polynomials are guaranteed to be localized in space. [11] introduced 1stChebNet as an extension of ChebNet. It not only showed competitive performance on a variety of tasks but also greatly reduced the computational cost by avoiding the eigenvalue decomposition needed to find the Fourier basis.
2.2 Traffic Forecasting with GCNs
As GCNs heavily depend on the Laplacian matrix of a graph, it is crucial to determine an edge weighting that reflects the network geometry sufficiently well. For traffic forecasting, most studies define connectivity by whether two nodes are directly adjacent, and set the edge weights inversely proportional to the physical distance or the travel time between nodes in the network [23, 22, 24]. Some recent studies have started to incorporate factors other than distance in road networks. [12, 5] modified the distance graph with additional relationships, such as flow direction and reachability. For bike demand forecasting, [3, 8, 9] considered three types of graph elements, including transportation connectivity and functional similarity, and implemented multigraph convolution as the sum of the individual operations. In this work, we define two types of non-Euclidean graph elements. We then suggest a partition filter to modify graph elements and perform multigraph convolution in several ways, in addition to how it was performed in previous studies.
3 Definitions and Problem Formulation
In this section, we define the key concepts for modeling road traffic and formulate the problem. Using the link concept from [15], we newly define the link vector and the link direction as below. In general, a link represents a road segment without an internal merge/diverge section, as shown in Figure 2(a).
Definition 1: The link vector of a link is defined as the difference between the end point and the start point , and can be formulated as
(1) 
where . For link vector , its link direction is defined as
(2) 
where and are the unit vectors in the direction of the xaxis and yaxis, respectively.
Note that ’s value could be between 0 and . An illustration of a link vector and its link direction is shown in Figure 2(b).
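As a minimal sketch of Definition 1, the snippet below computes a link vector as the end point minus the start point, and a direction angle measured counter-clockwise from the x-axis. The function names, the coordinate tuples, and the half-open range [0, 2π) are assumptions made for illustration; they are not notation taken from the paper.

```python
import math

def link_vector(start, end):
    """Link vector: end point minus start point (both are (x, y) tuples)."""
    return (end[0] - start[0], end[1] - start[1])

def link_direction(vec):
    """Angle of the vector from the x-axis, wrapped into [0, 2*pi) (assumed range)."""
    angle = math.atan2(vec[1], vec[0])  # atan2 returns a value in [-pi, pi]
    return angle % (2.0 * math.pi)

# Hypothetical link that starts at (100, 200) and ends at (100, 50), i.e. heads "south".
v = link_vector((100.0, 200.0), (100.0, 50.0))
print(v, link_direction(v))
```

Wrapping the angle into a single non-negative range keeps two links with the same heading at the same value, which is convenient when directions are later compared pairwise.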
Definition 2: A traffic network graph $\mathcal{G} = (\mathcal{V}, \mathcal{E}, W)$ is a weighted directed graph representing a road network, where $\mathcal{V}$ is the set of road links with $|\mathcal{V}| = N$, $\mathcal{E}$ is the set of edges representing the connectedness among the road links, and $W \in \mathbb{R}^{N \times N}$ is a weighted adjacency matrix representing spatial interdependencies.
Usually, the weighted adjacency matrix has $W_{ij} = 0$ when the road links $i$ and $j$ are not connected. However, we will define new adjacency matrices later, where this property does not necessarily hold. Note also that $\mathcal{E}$ is not used in our work because the connectedness is fully described by $W$. Finally, we formulate the problem as below.
Problem: If a graph signal $X \in \mathbb{R}^{N}$ represents the traffic speed observed on $\mathcal{G}$, and $X^{(t)}$ represents the graph signal observed at the $t$-th time interval, the traffic forecasting problem aims to learn a function $h(\cdot)$ that maps $T'$ historical graph signals to $T$ future graph signals. For a given graph $\mathcal{G}$,
$[X^{(t-T'+1)}, \dots, X^{(t)}; \mathcal{G}] \xrightarrow{h(\cdot)} [X^{(t+1)}, \dots, X^{(t+T)}]$ (3)
In general, $X$ can be of size $N \times P$, where $P$ is the number of observed features for each link. Even though our dataset includes only the speed feature, i.e. $P = 1$, all of our results are directly applicable to problems with $P > 1$.
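To make the problem formulation concrete, the following sketch slices a time-ordered sequence of graph signals into (history, future) training pairs. The helper name `make_windows` and the parameter names `n_hist` and `n_pred` are hypothetical, and the toy values are placeholders rather than real speeds.

```python
def make_windows(series, n_hist, n_pred):
    """Split a time-ordered list of graph signals into (history, future) pairs.

    series : list of per-time-step signals (each a list with one speed per link)
    n_hist : number of historical steps fed to the model (hypothetical name)
    n_pred : number of future steps to predict (hypothetical name)
    """
    pairs = []
    last_start = len(series) - n_hist - n_pred
    for t in range(last_start + 1):
        history = series[t : t + n_hist]
        future = series[t + n_hist : t + n_hist + n_pred]
        pairs.append((history, future))
    return pairs

# Toy example: 30 time steps, 3 links, placeholder speed values.
toy = [[float(t), float(t) + 0.5, float(t) + 1.0] for t in range(30)]
samples = make_windows(toy, n_hist=12, n_pred=12)
print(len(samples))  # 30 - 12 - 12 + 1 = 7 samples
```

Each pair corresponds to one application of the forecasting function: the first element is the model input and the second is the target it should reproduce.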
4 Proposed Model
Before explaining our methods, we briefly summarize the general graph convolution based on the approximation of 1stChebNet[11] for a single directed graph . For a directed graph , 1stChebNet generalizes the definition of a graph convolution as
(4) 
where the signal , a scalar for every node,
the learnable parameters ,
and the diagonal degree matrix with .
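For intuition, a first-order graph convolution on a directed graph can be sketched as below. This is only an illustration under stated assumptions: it adds self-loops and uses row normalization (dividing by the row sum of the adjacency matrix), which is one common choice for directed graphs and may differ from the exact normalization in Eq. (4). The function name `graph_conv` and the toy matrix are hypothetical.

```python
def graph_conv(W, x, theta):
    """First-order graph convolution sketch: theta * D^-1 (W + I) x.

    W     : N x N weighted adjacency matrix (list of lists), non-negative weights
    x     : length-N signal, one scalar per node
    theta : single learnable scalar (fixed here for illustration)
    The row normalization D^-1 is an assumed choice, not taken from the paper.
    """
    n = len(W)
    out = []
    for i in range(n):
        row = [W[i][j] + (1.0 if i == j else 0.0) for j in range(n)]  # add self-loop
        degree = sum(row)  # diagonal degree entry: row sum of the adjacency matrix
        out.append(theta * sum(row[j] * x[j] for j in range(n)) / degree)
    return out

W = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.0, 0.0]]
print(graph_conv(W, [30.0, 20.0, 10.0], theta=1.0))  # [25.0, 20.0, 10.0]
```

With this normalization each output is a weighted average of a node's own value and its neighbors' values, so an isolated node simply keeps its own signal.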
4.1 Framework Overview
The system architecture of the proposed model DDPGCN is shown in Figure 3(a). It consists of two spatiotemporal convolutional blocks (ST-convolutional blocks, Figure 3(b)) and a simple 1x1 convolutional layer at the end to reduce the number of channels. Each ST-convolutional block contains a temporal block (Figure 3(c)) and a spatial block (Figure 7, as explained in Section 4.3). We represent a variety of spatial relationships of the road network in the form of three different graph elements. We then apply a simple partition filter to generate modified graph elements that contain the hybrid information of two graph elements. Finally, two types of multigraph convolution are applied to effectively capture the complex spatial relationships.
4.2 Definition of the Spatial Graph Elements
In order to improve our ability to model complex spatial relationships in road networks, as shown in Figure 4, we define two types of edge weight measures in addition to distance, and build weighted adjacency matrices that can be used as the graph elements of the spatial block. For clarification, only Prior 1 is previously known, and the other two are newly introduced in our work. Previous works [3, 8, 9] defined multigraphs on undirected networks for bike demand forecasting. In contrast, we define multigraphs on directed networks for traffic speed forecasting, based on domain knowledge of traffic.
Prior 1 [20]: Everything is related to everything else. But near things are more related than distant things.
Graph 1 (Distance, ): We consider the distance as the shortest interlink distance on the path. When the link vectors are directly connected, i.e. = , we calculate the distance as the average of the link lengths, i.e. . When the link vectors are not directly connected, , the distance is evaluated based on the Dijkstra algorithm[7]. After all the pairwise distances are evaluated, we define using the thresholded Gaussian kernel[16] as below, where and are hyperparameters.
(5) 
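The construction of the distance graph can be sketched in two steps: shortest on-path distances via Dijkstra's algorithm, followed by a thresholded Gaussian kernel. In this sketch the edge length between two directly connected links is the average of their lengths, as described above. The function names, the parameter names `sigma` and `kappa`, and the choice to threshold the kernel value (rather than the distance) are assumptions for illustration.

```python
import heapq
import math

def shortest_distances(neighbors, source):
    """Dijkstra's algorithm; neighbors[u] is a list of (v, length) pairs."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry, a shorter path was already found
        for v, length in neighbors.get(u, []):
            nd = d + length
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def gaussian_kernel_weight(d, sigma, kappa):
    """Thresholded Gaussian kernel: exp(-d^2 / sigma^2), set to 0 below kappa."""
    w = math.exp(-(d ** 2) / (sigma ** 2))
    return w if w >= kappa else 0.0

# Hypothetical links a, b, c with lengths 200 m, 400 m, 600 m, connected a -> b -> c.
# Directly connected links are separated by the average of their lengths.
neighbors = {"a": [("b", (200.0 + 400.0) / 2)], "b": [("c", (400.0 + 600.0) / 2)], "c": []}
dist = shortest_distances(neighbors, "a")
weights = {k: gaussian_kernel_weight(d, sigma=1000.0, kappa=0.1) for k, d in dist.items()}
print(dist, weights)
```

Setting the threshold to zero keeps every reachable pair connected, while a larger threshold sparsifies the graph by dropping weak long-range weights.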
Prior 2: Distant links can be related depending on their directions. Links having the same directions might be more related than links having the opposite directions.
Graph 2 (Direction, ): We consider a simple direction measure with a proper normalization. To the best of our knowledge, we are the first to utilize relative direction information with graph convolutional networks.
(6) 
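One simple way to turn Prior 2 into an edge weight is sketched below. It is an assumed measure for illustration, not necessarily the normalization used in Eq. (6): the smallest angle between two link directions is mapped linearly to a similarity of 1 for identical directions and 0 for opposite directions. The function name `direction_weight` is hypothetical.

```python
import math

def direction_weight(theta_i, theta_j):
    """Similarity in [0, 1]: 1 for identical directions, 0 for opposite ones."""
    diff = abs(theta_i - theta_j) % (2.0 * math.pi)
    diff = min(diff, 2.0 * math.pi - diff)  # smallest angle between the two, in [0, pi]
    return 1.0 - diff / math.pi

print(direction_weight(0.0, 0.0))            # same direction
print(direction_weight(0.0, math.pi))        # opposite directions
print(direction_weight(0.0, math.pi / 2.0))  # perpendicular links
```

Because the measure depends only on the two angles, it can assign a high weight to links that are far apart, which is the behavior Prior 2 asks for.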
Prior 3: Links are related depending on how they can be connected. The positional relationship, such as whether two links are heading to closely located destinations, could be additionally informative to the distance for understanding how links interact.
Graph 3 (Positional relationship, ): While Graph 1 considers only the shortest pathdistance that connects from to , two links can be connected through many different paths. In order to capture how two links are related while considering a variety of connection paths, we extend the link vectors maintaining the start point and the link direction . Then we define as the unweighted adjacency matrices that contain the information on where the two extended link vectors meet as described below for each .
(7) 
Figure 5 shows the four types of positional relationships. represents four possible intersection points of two extended links vectors. exists if the extension of in the backward direction meets the extension of in the backward direction. , , and are similarly defined depending on whether forward or backward extensions of and can meet. Each type implies how two links could interact. For example, two links would head to the same area when exists as described in Figure 5(b). Note that any cannot exist when two links are exactly parallel. This case is taken into consideration by setting all values to zero.
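The geometric test behind the positional relationship can be sketched as a line-line intersection. Each link is extended into an infinite line through its start point along its link vector, and the signs of the two line parameters tell whether the meeting point lies on the forward or backward extension of each link. The function names and the `"forward"`/`"backward"` labels are hypothetical, and how the paper indexes the four types is not assumed here.

```python
def extension_meeting(p_i, d_i, p_j, d_j, eps=1e-12):
    """Find where the infinite extensions of two links meet.

    p_i, p_j : start points (x, y); d_i, d_j : link vectors (x, y).
    Returns (t, s) with p_i + t*d_i == p_j + s*d_j, or None if the links are parallel.
    """
    cross = d_i[0] * d_j[1] - d_i[1] * d_j[0]
    if abs(cross) < eps:
        return None  # parallel links: no meeting point exists
    rx, ry = p_j[0] - p_i[0], p_j[1] - p_i[1]
    t = (rx * d_j[1] - ry * d_j[0]) / cross
    s = (rx * d_i[1] - ry * d_i[0]) / cross
    return t, s

def positional_type(p_i, d_i, p_j, d_j):
    """Label the meeting point by the side of each start point on which it lies."""
    meet = extension_meeting(p_i, d_i, p_j, d_j)
    if meet is None:
        return None
    t, s = meet
    side_i = "forward" if t >= 0 else "backward"
    side_j = "forward" if s >= 0 else "backward"
    return (side_i, side_j)

# Link i heads east from (0, 0); link j heads north from (5, -5): both reach (5, 0) going forward.
print(positional_type((0.0, 0.0), (1.0, 0.0), (5.0, -5.0), (0.0, 1.0)))
```

A result of forward/forward corresponds to two links heading to the same area, and a `None` result corresponds to the exactly parallel case in which all values are set to zero.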
Partition filters We also define a set of partition filters that can be used over to create . is a scalar-input scalar-output function, and describes the element-wise application of filter . We constrain the set of partition filters to satisfy in order to make sure that is spread over the partitioned matrices without any increase or decrease in the element-wise sum. can be designed based on the histogram analysis. To smoothly handle the boundary values, we choose triangular partition filters. In our work, only are considered for applying the partition filters. Figure 6 shows 4-directional triangular partition filters, . Instead of using directly, we will use that contain hybrid information between distance and direction because of its empirical superiority. For , appropriate partition filters can be designed by investigating the density in a similar way.
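The sum-preserving constraint on partition filters can be illustrated with evenly spaced triangular (hat) functions. In this sketch, which rests on several assumptions, matrix entries lie in [0, 1], the hat functions act as membership weights that sum to 1 at every input, and each partitioned matrix receives its share of the original entry. The function names, the even spacing of the centers, and the choice to partition a matrix by its own values are all illustrative, not the paper's exact filter design.

```python
def triangular_weights(value, n_filters):
    """Triangular (hat) membership weights on [0, 1] that always sum to 1.

    Centers are evenly spaced at k / (n_filters - 1); this layout is an assumption.
    """
    step = 1.0 / (n_filters - 1)
    return [max(0.0, 1.0 - abs(value - k * step) / step) for k in range(n_filters)]

def partition_matrix(W, n_filters):
    """Split W element-wise into n_filters matrices whose sum equals W."""
    parts = [[[0.0] * len(row) for row in W] for _ in range(n_filters)]
    for i, row in enumerate(W):
        for j, w in enumerate(row):
            for k, share in enumerate(triangular_weights(w, n_filters)):
                parts[k][i][j] = share * w
    return parts

W = [[0.0, 0.3], [0.75, 1.0]]
parts = partition_matrix(W, n_filters=4)
recombined = [[sum(p[i][j] for p in parts) for j in range(2)] for i in range(2)]
print(recombined)  # matches W up to floating-point rounding
```

Triangular filters overlap at their boundaries, so a value near the edge of one partition contributes smoothly to its neighbor instead of switching abruptly between matrices.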
4.3 Building a Spatial Block
With the newly defined graph elements, we are ready to build spatial blocks for extracting complex spatial relationships. Spatial blocks are based on the three graph elements , and , and their partitioned versions. has no partition filter expansion, but we use instead of to exploit the hybrid information.
In Figure 7, each box represents a single graph convolution as described in Eq. (4), and we denote only the weighted adjacency matrix in the figure for simplicity. The number of convolutional operations and the choice of weight matrices can vary depending on the dataset or the prediction task.
We designed three types of spatial blocks, as shown in Figure 7. Single (Figure 7(a)) refers to the simple graph convolution considering only the Euclidean distance. Parallel (Figure 7(b)) and Stacked (Figure 7(c)) refer to multigraph convolutions that include distance, direction, and positional relationship information. While both apply four convolutional operations, each with a different graph element, they differ in how the operations are connected. The Parallel structure can be regarded as equivalent to the multigraph convolution structure defined in previous works [3, 8, 9].
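The difference between the two multigraph structures can be sketched as follows, reading Parallel as a sum of individual graph convolutions on the same input and Stacked as a sequential composition. This reading of Figure 7, the row-normalized `propagate` step, the scalar `thetas`, and the omission of nonlinearities are all simplifying assumptions.

```python
def propagate(W, x):
    """One row-normalized propagation step with self-loops (assumed normalization)."""
    n = len(W)
    out = []
    for i in range(n):
        row = [W[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        out.append(sum(r * v for r, v in zip(row, x)) / sum(row))
    return out

def parallel_block(graphs, thetas, x):
    """Apply every graph to the same input and sum the results."""
    total = [0.0] * len(x)
    for W, theta in zip(graphs, thetas):
        total = [t + theta * p for t, p in zip(total, propagate(W, x))]
    return total

def stacked_block(graphs, thetas, x):
    """Apply the graphs one after another, feeding each output to the next."""
    for W, theta in zip(graphs, thetas):
        x = [theta * p for p in propagate(W, x)]
    return x

W_a = [[0.0, 1.0], [0.0, 0.0]]  # hypothetical graph element A
W_b = [[0.0, 0.0], [1.0, 0.0]]  # hypothetical graph element B
signal = [10.0, 20.0]
print(parallel_block([W_a, W_b], [1.0, 1.0], signal))  # [25.0, 35.0]
print(stacked_block([W_a, W_b], [1.0, 1.0], signal))   # [15.0, 17.5]
```

In the stacked form, information propagated through one graph element is propagated again through the next, so the relationships interact rather than being combined additively.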
4.4 Building a Temporal Block
We conducted extensive experiments to design the temporal block, comparing graph convolution, self-attention [21], multi-convolution [18], and temporal relational reasoning [26]. However, simple convolution showed the best performance with the shortest training time. Based on these empirical results, we adopt the simple temporal block shown in Figure 3(c).
5 Experiment
5.1 Datasets
We conducted experiments on two real-world large-scale datasets from Seoul, South Korea. Urban1 (Gangnam) and Urban2 (Mapo) correspond to the most crowded regions in Seoul, and they have highly complex connectivity patterns that cannot be explained by distance alone; they have bidirectional links and complicated traffic signals. The traffic data were collected every 5 minutes for one month, from Apr 1st, 2018 to Apr 30th, 2018, using the GPS of over 70,000 taxis.
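Table 1: Statistics of the Urban1 and Urban2 datasets.
| Dataset | Urban1 | Urban2 |
| --- | --- | --- |
| Time span | 4/1/2018 to 4/30/2018 | 4/1/2018 to 4/30/2018 |
| Time interval | 5 min | 5 min |
| Region size (width, height) (m) | (7000, 7000) | (7000, 7000) |
| Number of links | 480 | 455 |
| Speed mean (std) (km/h) | 26.333 (10.638) | 25.917 (9.784) |
| Length mean (min, max) (m) | 592 (171, 2622) | 561 (80, 2629) |
| Links per | 11.274 | 10.280 |
| Average directly connected links | 3.233 | 2.935 |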
Most previous studies have focused on traffic networks that include only freeways and have simply defined links as points without direction information [12, 23, 24, 4, 5]. Our datasets, however, not only cover complex urban networks with a large number of intersections, traffic signals, and interactions with other roads (e.g. pedestrian paths), but also define links as vectors that include direction. We examined some statistical properties of the networks to assess how complex they are. First, while every link pair in our datasets is connected, only 27 of link pairs are connected in the METR-LA dataset used in [12, 24]. Second, as described in Table 1, for both datasets, most links are directly connected to three paths on average (i.e. inflow, outflow, and more), and there are more than 10 links within a 1. This implies that our datasets have a fairly dense link population compared to the highway-only network of the same city, Seoul. In the highway-only network, most links are directly connected to two paths on average (i.e. inflow and outflow only), and there are only around three links within a 1. Because of this dense link population, the distance graph element alone cannot sufficiently reflect the complex spatial relationships of urban networks.
Table 2: Performance comparison on Urban1 and Urban2. Each cell lists the 30/45/60-minute forecasting results.
| Model | Urban1 MAE | Urban1 MAPE (%) | Urban1 RMSE | Urban2 MAE | Urban2 MAPE (%) | Urban2 RMSE |
| --- | --- | --- | --- | --- | --- | --- |
| HA | 3.34/3.34/3.34 | 14.68/14.67/14.68 | 5.42/5.42/5.41 | 3.23/3.22/3.22 | 14.43/14.42/14.41 | 4.86/4.86/4.85 |
| VAR | 5.06/4.99/4.97 | 23.10/22.82/22.73 | 7.04/6.92/6.88 | 4.58/4.52/4.49 | 20.82/20.55/20.43 | 6.31/6.22/6.19 |
| LSVR | 3.82/3.89/3.93 | 15.35/17.99/17.39 | 5.64/5.74/5.84 | 4.38/4.22/3.92 | 17.01/16.82/18.45 | 5.83/5.71/5.47 |
| ARIMA | 3.49/3.79/4.04 | 15.40/16.85/18.09 | 5.28/5.65/5.94 | 3.30/3.56/3.78 | 14.78/15.99/17.03 | 4.77/5.09/5.37 |
| FCLSTM | 3.91/3.92/3.92 | 17.29/17.32/17.31 | 6.38/6.39/6.39 | 3.81/3.81/3.82 | 17.10/17.12/17.12 | 5.57/5.58/5.58 |
| DCRNN | 3.17/3.46/3.73 | 13.52/14.83/15.95 | 4.94/5.30/5.61 | 3.08/3.31/3.50 | 13.55/14.63/15.52 | 4.58/4.86/5.08 |
| STGCN | 3.07/3.42/3.80 | 14.38/16.72/19.37 | 4.57/4.83/5.04 | 2.99/3.33/3.69 | 14.02/15.82/17.78 | 4.37/4.79/5.26 |
| DDPGCN(Single) | 3.06/3.06/3.29 | 13.83/13.82/15.02 | 4.55/4.54/4.95 | 2.93/2.93/3.06 | 13.34/13.33/13.99 | 4.26/4.26/4.43 |
| DDPGCN(Parallel) | 3.06/3.06/3.10 | 13.82/13.08/13.99 | 4.54/4.54/4.64 | 2.95/2.95/2.96 | 13.37/13.36/13.45 | 4.30/4.29/4.31 |
| DDPGCN(Stacked) | 3.00/3.00/2.99 | 13.57/13.56/13.51 | 4.45/4.45/4.47 | 2.90/2.89/2.88 | 13.18/13.17/13.14 | 4.24/4.23/4.22 |
5.2 Experimental Settings
We repeated the experiment five times, and the average performances are provided in Table 2. For all datasets, we applied Z-score normalization. After excluding the weekends, 70% of the data is used for training, 10% for validation, and the remaining 20% for testing, in time order. For , as defined in Eq. 5, and are set depending on the data scale. In our study, we set to be and to be 0. For partition filters, we determined the number of filters, for and for , based on the histogram analysis of each dataset as described in Section 4.2. We set the to be 4 (Urban1, Urban2) and to be 3 (Urban1) or 4 (Urban2). We set both and to 12 samples, where 12 corresponds to a one-hour span. All experiments were implemented using TensorFlow 1.15 on a Linux cluster (CPU: Intel(R) Xeon(R) CPU E62620 v4 @ 2.10GHz, GPU: NVIDIA TITAN V). The training process takes about 15 minutes on a single GPU.
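The preprocessing described above can be sketched as a chronological split followed by Z-score normalization. The helper names are hypothetical, the speed values are placeholders, and computing the mean and standard deviation from the training portion only is an assumption, since the text does not state which portion the statistics come from.

```python
import math

def chronological_split(samples, train_pct=70, val_pct=10):
    """Split time-ordered samples into train/validation/test without shuffling."""
    n = len(samples)
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    return samples[:n_train], samples[n_train:n_train + n_val], samples[n_train + n_val:]

def zscore_params(values):
    """Mean and (population) standard deviation of a flat list of speeds."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var)

def zscore(values, mean, std):
    """Apply Z-score normalization with the given statistics."""
    return [(v - mean) / std for v in values]

speeds = [float(v) for v in range(20, 40)]  # 20 placeholder speed readings in time order
train, val, test = chronological_split(speeds)
mean, std = zscore_params(train)            # statistics from training data only (assumed)
train_n, val_n, test_n = (zscore(part, mean, std) for part in (train, val, test))
print(len(train), len(val), len(test))      # 14 2 4
```

Splitting in time order keeps the test period strictly after the training period, which avoids leaking future observations into training.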
Baselines We compare the proposed model (DDPGCN) with the following methods: (1) HA: Historical Average; (2) VAR: Vector Auto-Regression [10]; (3) LSVR: Linear Support Vector Regression; (4) ARIMA: Auto-Regressive Integrated Moving Average model; (5) FCLSTM: Recurrent Neural Network with fully connected LSTM hidden units [17]; (6) DCRNN: Diffusion Convolutional Recurrent Neural Network [12], which applies bidirectional diffusion convolution on the graph to capture spatial dependency and uses a sequence-to-sequence architecture with gated recurrent units to capture temporal dependency; and (7) STGCN: Spatio-Temporal Graph Convolutional Network [23], which is composed of spatiotemporal convolutional blocks, each including two gated sequential convolution layers with one spatial graph convolution layer in between.
5.3 Performance Comparison
Table 2 shows the performance comparison of DDPGCN and the baselines on the Urban1 and Urban2 datasets for 30-, 45-, and 60-minute prediction. Note that each model predicted all 12 sequential traffic speed values at all links simultaneously. The methods are evaluated with three metrics commonly used in traffic forecasting: (1) Mean Absolute Error (MAE), (2) Mean Absolute Percentage Error (MAPE), and (3) Root Mean Squared Error (RMSE).
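For reference, the three metrics can be computed as in the sketch below, using their standard definitions over flat lists of true and predicted speeds. The function names and the toy values are illustrative, and MAPE assumes that no true value is zero.

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent; assumes no true value is zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

truth = [20.0, 25.0, 40.0]  # placeholder speeds in km/h
pred = [22.0, 24.0, 36.0]
print(mae(truth, pred), mape(truth, pred), rmse(truth, pred))
```

RMSE penalizes large individual errors more heavily than MAE, while MAPE weights errors relative to the true speed, so the three metrics can rank models differently.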
Our proposed model, especially DDPGCN(Stacked), achieves the best performance in all cases except the 30-minute MAPE on the Urban1 dataset. For 1-hour forecasting, our model showed an improvement of 7.40% on average (9.83% maximum). In particular, the stacked spatial block, first introduced in our work, outperformed the parallel spatial block in all cases. Unlike the baseline methods, DDPGCN(Stacked) showed even better performance at longer forecasting horizons in some cases. Interestingly, we observed that graph convolution-based methods are generally accurate only for (relatively) short-term predictions, and HA tends to be more accurate for (relatively) long-term predictions. We believe that this result is mainly due to the strong weekly periodicity of our dataset. While the other methods utilize only the most recent 1 hour of speed information, HA utilizes a different type of information, i.e. weekly speeds. Even though our model exploits only the most recent 1 hour of information, it always outperformed the others, including HA. This suggests that the non-Euclidean spatial relationships, direction and positional relationship, are quite powerful priors for accurate speed forecasting.
When HA, which uses additional inputs, is excluded and the comparison is restricted to the graph convolution-based methods (DCRNN, STGCN, and DDPGCN), DDPGCN(Stacked) showed a 9.60% average (19.84% maximum) improvement over all forecasting horizons and a 16.07% average improvement for 1-hour forecasting. A segment of the prediction results is shown in Figure 8, where DDPGCN(Stacked) is visibly better at capturing abrupt changes, even 1 hour ahead. This result suggests that non-Euclidean information is essential for capturing abrupt changes in complex networks in advance.
5.4 Benefits of Spatial Graph Elements
To investigate the effect of each spatial graph element, we performed ablation tests using DDPGCN(Stacked) on the Urban1 dataset and evaluated the resulting performance degradation. The result is shown in Table 3. Removing Direction decreased the performance the most, and removing Distance the least. This result indicates that the distance element alone is not the most important graph element for reflecting complex spatial relationships. Rather, non-Euclidean relationships, especially directions, should be fed into the network with partition filters.

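Table 3: Ablation results of DDPGCN(Stacked) on Urban1. Each cell lists the 30/45/60-minute forecasting results.
| Removed element | MAE | MAPE (%) | RMSE |
| --- | --- | --- | --- |
| Distance | 3.01/3.01/3.00 | 13.59/13.58/13.51 | 4.47/4.47/4.47 |
| Direction | 3.04/3.04/3.03 | 13.76/13.75/13.70 | 4.53/4.53/4.55 |
| | 3.03/3.03/3.03 | 13.73/13.72/13.69 | 4.52/4.52/4.55 |
| | 3.02/3.02/3.02 | 13.67/13.66/13.63 | 4.50/4.50/4.52 |
| None | 3.00/3.00/2.99 | 13.57/13.56/13.51 | 4.45/4.45/4.47 |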
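Table 4: Performance for different values of K.
| K | DCRNN | STGCN | DDPGCN(Single) | DDPGCN(Parallel) | DDPGCN(Stacked) |
| --- | --- | --- | --- | --- | --- |
| 1 | 5.57 | 5.61 | 4.95 | 4.64 | 4.47 |
| 2 | 5.27 | 5.55 | 4.82 | 4.60 | 4.46 |
| 3 | 5.02 | 5.31 | 4.83 | 4.59 | 4.46 |
| 4 | 4.85 | 5.43 | 4.75 | 4.58 | 4.46 |
| 5 | 4.82 | 5.37 | 4.73 | 4.58 | 4.46 |
| 10 | - | 5.25 | 4.69 | 4.56 | 4.47 |
| 20 | - | 5.30 | 4.68 | 4.57 | 4.48 |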
We also examined whether larger hops could be as important as non-Euclidean relationships. Here, instead of the 1stChebNet, we use the K-polynomial ChebNet [6]. K-hops are applied only with the distance element. As shown in Table 4, even with large K, no model could outperform DDPGCN(Stacked). Moreover, the performance of DDPGCN(Stacked) did not improve for larger K. These results indicate that direction and positional relationship should be considered important attributes for traffic forecasting, and that they are more beneficial than the K-hop neighbors' information.
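The K-hop filtering used in this comparison follows the Chebyshev recurrence of ChebNet, which can be sketched as below. The input `L_scaled` stands for a rescaled graph Laplacian supplied by the caller, the function names are hypothetical, and treating the length of `thetas` as the number of polynomial terms is an assumption about how K is counted.

```python
def matvec(M, x):
    """Multiply a square matrix (list of lists) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def cheb_conv(L_scaled, x, thetas):
    """Compute sum_k thetas[k] * T_k(L_scaled) x with the Chebyshev recurrence.

    T_0 x = x, T_1 x = L x, and T_k x = 2 L T_{k-1} x - T_{k-2} x for k >= 2.
    """
    t_prev, t_curr = x, matvec(L_scaled, x)
    out = [thetas[0] * v for v in t_prev]
    if len(thetas) > 1:
        out = [o + thetas[1] * v for o, v in zip(out, t_curr)]
    for theta in thetas[2:]:
        t_next = [2.0 * a - b for a, b in zip(matvec(L_scaled, t_curr), t_prev)]
        out = [o + theta * v for o, v in zip(out, t_next)]
        t_prev, t_curr = t_curr, t_next
    return out

L_toy = [[0.0, 1.0], [1.0, 0.0]]  # placeholder matrix standing in for a rescaled Laplacian
print(cheb_conv(L_toy, [1.0, 2.0], [1.0, 1.0, 1.0]))  # [4.0, 5.0]
```

Each additional term reaches one hop further along the graph, so a larger K widens the neighborhood on a single graph element without adding any new kind of spatial relationship.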
6 Conclusion and Future Work
In this paper, we proposed a new traffic speed forecasting network that utilizes three spatial dependencies, namely distance, direction, and positional relationship. Our model includes multigraph convolution based on graph elements that are modified by simple partition filters. We evaluated it on traffic forecasting problems and showed a large improvement in long-term forecasting accuracy on highly complex urban networks compared to other state-of-the-art algorithms. In the future, we will further evaluate our model on other datasets and investigate how temporal periodicity influences traffic forecasting.
References
[1] (2016) Layer normalization. arXiv preprint arXiv:1607.06450.
[2] (2013) Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203.
[3] (2018) Bike flow prediction with multigraph convolutional networks. In Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 397–400.
[4] (2019) Gated residual recurrent graph neural networks for traffic prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 485–492.
[5] (2018) Traffic graph convolutional recurrent neural network: a deep learning framework for network-scale traffic learning and forecasting. arXiv preprint arXiv:1802.07007.
[6] (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in neural information processing systems, pp. 3844–3852.
[7] (1959) A note on two problems in connexion with graphs. Numerische mathematik 1 (1), pp. 269–271.
[8] (2019) Spatiotemporal multigraph convolution network for ride-hailing demand forecasting.
[9] (2019) Multimodal graph interaction for multigraph convolution network in urban spatiotemporal forecasting. arXiv preprint arXiv:1905.11395.
[10] (1994) Time series analysis. Vol. 2, Princeton University Press, Princeton, NJ.
[11] (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
[12] (2017) Graph convolutional recurrent neural network: data-driven traffic forecasting. CoRR abs/1707.01926.
[13] (2017) Learning traffic as images: a deep convolutional neural network for large-scale transportation network speed prediction. Sensors 17 (4), pp. 818.
[14] (2017) Deep learning for short-term traffic flow prediction. Transportation Research Part C: Emerging Technologies 79, pp. 1–17.
[15] (2017) Traffic state estimation on highway: a comprehensive survey. Annual Reviews in Control 43, pp. 128–151.
[16] (2012) Signal processing on graphs: extending high-dimensional data analysis to networks and other irregular data domains. CoRR abs/1211.0053.
[17] (2014) Sequence to sequence learning with neural networks. Advances in NIPS.
[18] (2017) Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Thirty-First AAAI Conference on Artificial Intelligence.
[19] (2018) Evaluation of spatiotemporal forecasting methods in various smart city applications. Renewable and Sustainable Energy Reviews 82, pp. 424–435.
[20] (1970) A computer movie simulating urban growth in the Detroit region. Economic geography 46 (sup1), pp. 234–240.
[21] (2017) Attention is all you need. In Advances in neural information processing systems, pp. 5998–6008.
[22] (2018) Dynamic spatiotemporal graph-based CNNs for traffic prediction. CoRR abs/1812.02019.
[23] (2017) Spatiotemporal graph convolutional neural network: a deep learning framework for traffic forecasting. CoRR abs/1709.04875.
[24] (2019) ST-UNet: a spatiotemporal U-network for graph-structured time series modeling. arXiv preprint arXiv:1903.05631.
[25] (2017) Spatiotemporal recurrent convolutional networks for traffic prediction in transportation networks. Sensors 17 (7), pp. 1501.
[26] (2018) Temporal relational reasoning in videos. In The European Conference on Computer Vision (ECCV).