DDP-GCN: Multi-Graph Convolutional Network for Spatiotemporal Traffic Forecasting

Abstract

Traffic speed forecasting is one of the core problems in Intelligent Transportation Systems. For a more accurate prediction, recent studies have started to use not only the temporal speed patterns but also the spatial information of the road network through graph convolutional networks. Even though the road network is highly complex due to its non-Euclidean and directional characteristics, previous approaches mainly focus on modeling the spatial dependencies with distance alone. In this paper, we identify two essential spatial dependencies in traffic forecasting in addition to distance, namely direction and positional relationship, and use them to design basic graph elements as the smallest building blocks. Using these building blocks, we suggest DDP-GCN (Distance, Direction, and Positional relationship Graph Convolutional Network) to incorporate the three spatial relationships into a prediction network for traffic forecasting. We evaluate the proposed model with two large-scale real-world datasets, and find a 7.40% average improvement for 1-hour forecasting in highly complex urban networks.

1 Introduction

Traffic forecasting is a crucial task for Intelligent Transportation Systems (ITS) [19]. Improving these forecasting systems is important for a wide range of applications, such as autonomous vehicle operations, route optimization, and transportation system management. In this work, we focus on traffic speed forecasting, which predicts the future traffic speeds of each road segment using historical speed data. Accurate traffic speed forecasting can help prevent traffic congestion, shorten travel time, and reduce carbon emissions.

For a better prediction, recent deep learning studies have started to utilize not only the historical speed data but also the spatial information of the road networks. To transform the spatial information into a format that can be used by deep networks such as image-based CNNs, [13, 14] simply unfold the road network and [25] uses a grid-based representation resulting in map-like images. However, these image-like representations cannot fully capture the complex spatial relationships of traffic networks, such as driving directions and on-path proximity. To better understand the non-grid spatial characteristics of traffic networks, recent works have started to employ graph convolutional networks (GCNs).

While most of the previous works are limited to using Euclidean distance as the only graph element for GCNs [23, 24, 4], some studies expand GCNs to include non-Euclidean dependencies. [12, 5] modified the distance graph with non-Euclidean relationships, such as flow direction and reachability. For bike demand forecasting, [3, 8, 9] implemented multi-graph convolution based on three types of graph elements, such as transportation connectivity and functional similarity, in addition to distance. To adapt multi-graph convolution to traffic forecasting, we first need to understand which non-Euclidean dependencies are important for traffic networks.

Figure 1: An example of the importance of non-Euclidean and directional characteristics of traffic networks. Target link is colored in red.

Figure 1 shows a simple example. Here we want to figure out how the speed pattern of the target link (colored in red) is related to the four other links in the figure. If we consider distance only, the directly neighboring links might appear to be the most related to the target. However, the neighboring link with the opposite driving direction might show quite a different speed pattern. Compared to that link, a more distant link that shares the target's driving direction could have a more consistent speed pattern with the target. Additionally, a link heading to the same area as the target could also share a similar speed pattern. These properties might be more pronounced during commute hours.

In order to utilize these concepts, we define two types of spatial dependencies in addition to distance: direction and positional relationship. Then, we propose a new type of traffic prediction network called DDP-GCN (Distance, Direction, and Positional relationship Graph Convolutional Network). In DDP-GCN, the non-Euclidean characteristics of direction and positional relationship in complex road networks are described through multi-graphs. In previous studies [3, 8, 9], multi-graph convolution was defined for the bike demand prediction task using undirected graphs, whose links carry no direction information. The same approach cannot be directly applied to traffic forecasting on directed graphs, whose links do carry direction information. In this work, we define multi-graphs based on link vectors and link directions, and suggest partition filters to generate hybrid graphs that incorporate information from two different graphs. When evaluated on two large-scale real-world datasets of highly complex urban networks, DDP-GCN easily outperformed the state-of-the-art baselines.

Our main contributions are two-fold.

  • We identify two non-Euclidean spatial relationships, direction and positional relationship, and propose to encode them using multiple graphs. We also suggest partition filters that can incorporate multi-graphs into a single hybrid graph. To the best of our knowledge, this is the first time multi-graphs are defined for use in traffic speed forecasting.

  • We propose a traffic forecasting network (DDP-GCN) which exploits the desired spatial dependencies effectively. This model is especially beneficial for long-term forecasting in highly complex urban networks, which is known to be the most challenging setting.

2 Related Works

2.1 Graph Convolution

Graph convolutional networks (GCNs) were first introduced in [2], which bridged spectral graph theory and deep neural networks. [6] proposed ChebNet, which improves GCNs with fast localized convolution filters using Chebyshev polynomials. ChebNet implicitly avoids the computation of the graph Fourier basis, which greatly reduces the computational complexity, and its Chebyshev polynomial filters are guaranteed to be localized in space. [11] introduced 1stChebNet as an extension of ChebNet. It not only showed competitive performance on a variety of tasks but also further reduced the computational cost by avoiding the eigenvalue decomposition needed for finding the Fourier basis.

2.2 Traffic Forecasting with GCNs

As GCNs heavily depend on the Laplacian matrix of a graph, it is crucial to determine an edge weighting that reflects the network geometry sufficiently well. For traffic forecasting, most studies define connectivity by whether two nodes are directly adjacent, and the edge weights are set inversely proportional to the physical distance or the travel time between nodes in the network [23, 22, 24]. Some recent studies have started to incorporate factors other than distance in the road networks. [12, 5] modified the distance graph with additional relationships, such as flow direction and reachability. For bike demand forecasting, [3, 8, 9] considered three types of graph elements, such as transportation connectivity and functional similarity, and implemented multi-graph convolution as the sum of the individual operations. In this work, we define two types of non-Euclidean graph elements. Then, we suggest a partition filter to modify the graph elements and perform multi-graph convolution in several ways, in addition to how it was performed in previous studies.

3 Definitions and Problem Formulation

In this section, we define the key concepts for modeling road traffic and formulate the problem. Using the link concept from [15], we newly define the link vector and link direction as below. In general, a link represents a road segment without an internal merge/diverge section, as shown in Figure 2(a).

Figure 2: Defining links. (a) Examples of four links shown in four different colors. (b) A link vector $\vec{v}_i$, which is the vector starting at $p_i^{s}$ and ending at $p_i^{e}$ in 2-D space, and its link direction $\theta_i$.

Definition 1: The link vector $\vec{v}_i$ of a link $i$ is defined as the difference between the end point $p_i^{e}$ and the start point $p_i^{s}$, and can be formulated as

$$\vec{v}_i = p_i^{e} - p_i^{s}, \qquad (1)$$

where $p_i^{s}, p_i^{e} \in \mathbb{R}^2$. For link vector $\vec{v}_i$, its link direction $\theta_i$ is defined as

$$\theta_i = \operatorname{atan2}\left(\vec{v}_i \cdot \hat{e}_y,\ \vec{v}_i \cdot \hat{e}_x\right) \bmod 2\pi, \qquad (2)$$

where $\hat{e}_x$ and $\hat{e}_y$ are the unit vectors in the direction of the x-axis and y-axis, respectively.

Note that the value of $\theta_i$ lies between 0 and $2\pi$. An illustration of a link vector and its link direction is shown in Figure 2(b).
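As a concrete illustration, the short sketch below computes link vectors and link directions from link endpoint coordinates with NumPy. The array names (`starts`, `ends`) and the use of `arctan2` to obtain an angle in $[0, 2\pi)$ are choices made for this sketch, not details taken from the paper.

```python
import numpy as np

def link_vectors_and_directions(starts: np.ndarray, ends: np.ndarray):
    """Compute link vectors (Eq. 1) and link directions (Eq. 2).

    starts, ends: (N, 2) arrays of start/end points in 2-D space.
    Returns (vectors, thetas), where each theta lies in [0, 2*pi).
    """
    vectors = ends - starts                             # Eq. (1): v_i = p_i^e - p_i^s
    thetas = np.arctan2(vectors[:, 1], vectors[:, 0])   # angle w.r.t. the x-axis
    thetas = np.mod(thetas, 2 * np.pi)                  # map (-pi, pi] onto [0, 2*pi)
    return vectors, thetas

# Example with two toy links: one pointing east, one pointing north
starts = np.array([[0.0, 0.0], [1.0, 1.0]])
ends = np.array([[1.0, 0.0], [1.0, 2.0]])
vecs, thetas = link_vectors_and_directions(starts, ends)  # thetas -> [0, pi/2]
```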

Definition 2: A traffic network graph $\mathcal{G} = (\mathcal{V}, \mathcal{E}, W)$ is a weighted directed graph representing a road network, where $\mathcal{V}$ is the set of road links with $|\mathcal{V}| = N$, $\mathcal{E}$ is the set of edges representing the connectedness among the road links, and $W \in \mathbb{R}^{N \times N}$ is a weighted adjacency matrix representing spatial inter-dependencies.

Usually, the weighted adjacency matrix has $W_{ij} = 0$ when the road links $i$ and $j$ are not connected. However, we will define new adjacency matrices later for which this property does not necessarily hold. It is also noted that $\mathcal{E}$ is not used in our work because the connectedness is fully described by $W$. Finally, we formulate the problem as below.

Problem: Let a graph signal $X \in \mathbb{R}^{N}$ represent the traffic speed observed on $\mathcal{G}$, and let $X^{(t)}$ denote the graph signal observed at the $t$-th time interval. The traffic forecasting problem aims to learn a function $h(\cdot)$ that maps $T'$ historical graph signals to $T$ future graph signals. For a given graph $\mathcal{G}$,

$$\left[X^{(t-T'+1)}, \ldots, X^{(t)}\right] \xrightarrow{h(\cdot)} \left[X^{(t+1)}, \ldots, X^{(t+T)}\right]. \qquad (3)$$

In general, $X$ can be of size $N \times P$, where $P$ is the number of observed features for each link. Even though our dataset includes only the speed feature, i.e. $P = 1$, all of our results are directly applicable to problems with $P > 1$.
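To make the input/output mapping of Eq. (3) concrete, here is a minimal sketch that slices a speed history into (historical, future) training pairs. The array layout (time steps by links) and the variable names are illustrative assumptions.

```python
import numpy as np

def make_samples(speeds: np.ndarray, T_in: int = 12, T_out: int = 12):
    """Slice a (num_timesteps, N) speed matrix into (input, target) pairs.

    Each input holds T_in historical graph signals and each target holds the
    T_out future graph signals, matching the mapping in Eq. (3).
    """
    xs, ys = [], []
    for t in range(T_in, speeds.shape[0] - T_out + 1):
        xs.append(speeds[t - T_in:t])       # [X^(t-T'+1), ..., X^(t)]
        ys.append(speeds[t:t + T_out])      # [X^(t+1), ..., X^(t+T)]
    return np.stack(xs), np.stack(ys)

# Example: one day of 5-minute speeds for N = 480 links
speeds = np.random.rand(288, 480)
X, Y = make_samples(speeds)                 # X.shape == Y.shape == (265, 12, 480)
```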

4 Proposed Model

Before explaining our methods, we briefly summarize the general graph convolution based on the approximation of 1stChebNet [11] for a single directed graph $\mathcal{G}$. For a directed graph $\mathcal{G}$ with weighted adjacency matrix $W$, 1stChebNet generalizes the definition of a graph convolution as

$$\Theta *_{\mathcal{G}}\, x = \theta \left(I_N + D^{-\frac{1}{2}} W D^{-\frac{1}{2}}\right) x, \qquad (4)$$

where $x \in \mathbb{R}^{N}$ is the signal (a scalar for every node), $\theta$ is the learnable parameter, and $D$ is the diagonal degree matrix with $D_{ii} = \sum_{j} W_{ij}$.1
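The propagation rule of Eq. (4) can be sketched in NumPy as below. The symmetric normalization follows the standard 1stChebNet formulation [11]; the function name and the guard for zero-degree nodes are choices made for this sketch.

```python
import numpy as np

def first_cheb_conv(x: np.ndarray, W: np.ndarray, theta: float) -> np.ndarray:
    """Single-channel 1stChebNet graph convolution, as in Eq. (4).

    x: (N,) graph signal, W: (N, N) weighted adjacency matrix, theta: scalar parameter.
    """
    N = W.shape[0]
    deg = W.sum(axis=1)                       # D_ii = sum_j W_ij
    d_inv_sqrt = np.zeros_like(deg)
    nz = deg > 0                              # guard isolated (zero-degree) nodes
    d_inv_sqrt[nz] = deg[nz] ** -0.5
    norm = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]   # D^{-1/2} W D^{-1/2}
    return theta * (np.eye(N) + norm) @ x
```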

4.1 Framework Overview

Figure 3: Model description. Each box represents a single operation. The nonlinear activation function is ReLU and LN refers to layer normalization [1]. (a) The framework of DDP-GCN consists of two spatio-temporal convolutional blocks (ST-convolutional blocks) and a simple 1x1 convolutional layer. (b) Each ST-convolutional block contains a spatial block and a temporal block. (c) A temporal block contains a 1-D convolution only.

The system architecture of the proposed model DDP-GCN is shown in Figure 3(a). It consists of two spatio-temporal convolutional blocks (ST-convolutional blocks, Figure 3(b)) and a simple 1x1 convolutional layer at the end to reduce the number of channels. Each ST-convolutional block contains a temporal block (Figure 3(c)) and a spatial block (Figure 7, as explained in Section 4.3). We represent a variety of spatial relationships of the road network in the form of three different graph elements. After that, we apply a simple partition filter to generate modified graph elements that contain the hybrid information of two graph elements. Finally, two types of multi-graph convolution are applied to effectively capture the complex spatial relationships.

4.2 Definition of the Spatial Graph Elements

In order to improve our modeling capability of complex spatial relationships in road networks, as shown in Figure 4, we define two types of edge weight measures in addition to distance, and build proper weighted adjacency matrices that can be used as the graph elements of the spatial block. For clarification, only Prior 1 is previously known; the other two are newly introduced in our work. Previous works [3, 8, 9] defined multi-graphs on undirected networks for bike demand forecasting. In contrast, in our work, we define multi-graphs on directed networks for traffic speed forecasting based on domain knowledge in the traffic area.

Figure 4: Three types of weighted adjacency matrices for the same link pair. Each relationship is reflected in the corresponding matrix value. (a) Matrix 1: the shortest path distance ($W^{(1)}$). (b) Matrix 2: the difference between link directions ($W^{(2)}$). (c) Matrix 3: the positional relationship ($W^{(3)}$); in this example, the extension of one link in the forward direction meets the extension of the other link in the backward direction.

Prior 1 [20]: Everything is related to everything else. But near things are more related than distant things.

Graph 1 (Distance, $W^{(1)}$): We consider the distance as the shortest inter-link distance on the path. When two link vectors are directly connected, i.e. $p_i^{e} = p_j^{s}$, we calculate the distance as the average of the two link lengths, i.e. $d_{ij} = (|\vec{v}_i| + |\vec{v}_j|)/2$. When the link vectors are not directly connected, the distance is evaluated along the shortest path using the Dijkstra algorithm [7]. After all the pair-wise distances $d_{ij}$ are evaluated, we define $W^{(1)}$ using the thresholded Gaussian kernel [16] as below, where $\sigma$ and $\epsilon$ are hyperparameters.

$$W^{(1)}_{ij} = \begin{cases} \exp\left(-\dfrac{d_{ij}^2}{\sigma^2}\right), & \text{if } \exp\left(-\dfrac{d_{ij}^2}{\sigma^2}\right) \ge \epsilon \\ 0, & \text{otherwise} \end{cases} \qquad (5)$$
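Under the definitions above, $W^{(1)}$ can be sketched as follows: the all-pairs shortest-path distances are computed with Dijkstra's algorithm (here via `scipy.sparse.csgraph.dijkstra`) over the direct-connection distances, and the thresholded Gaussian kernel of Eq. (5) is then applied. Representing non-adjacent pairs by `np.inf` is a choice of this sketch.

```python
import numpy as np
from scipy.sparse.csgraph import dijkstra

def distance_graph(direct_dist: np.ndarray, sigma: float, eps: float) -> np.ndarray:
    """Build W^(1) from a matrix of direct-connection distances.

    direct_dist[i, j] = (len_i + len_j) / 2 if link j directly follows link i,
    and np.inf otherwise. Shortest paths over these direct hops give d_ij.
    """
    d = dijkstra(direct_dist, directed=True)    # all-pairs shortest path distances
    w = np.exp(-(d ** 2) / (sigma ** 2))        # Gaussian kernel of Eq. (5)
    w[w < eps] = 0.0                            # threshold small weights to zero
    return w
```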

Prior 2: Distant links can be related depending on their directions. Links having the same directions might be more related than links having the opposite directions.

Graph 2 (Direction, $W^{(2)}$): We consider a simple direction measure with a proper normalization, namely the difference between two link directions scaled into $[0, 1]$. To the best of our knowledge, we are the first to utilize relative direction information with graph convolutional networks.

$$W^{(2)}_{ij} = \frac{\left|\theta_i - \theta_j\right|}{2\pi} \qquad (6)$$
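A sketch of $W^{(2)}$ under this reconstruction of Eq. (6) is given below. Whether the angular difference should additionally be wrapped around the circle is not specified here, so a wrapped variant is indicated as a commented alternative.

```python
import numpy as np

def direction_graph(thetas: np.ndarray) -> np.ndarray:
    """Build W^(2) from link directions (Eq. 6): normalized direction differences."""
    diff = np.abs(thetas[:, None] - thetas[None, :])   # |theta_i - theta_j|
    # Circular alternative: diff = np.minimum(diff, 2 * np.pi - diff)
    return diff / (2 * np.pi)                          # scale into [0, 1]
```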

Prior 3: Links are related depending on how they can be connected. The positional relationship, such as whether two links are heading to closely located destinations, could be informative in addition to distance for understanding how links interact.

Graph 3 (Positional relationship, $W^{(3)}$): While Graph 1 considers only the shortest path-distance that connects one link to another, two links can be connected through many different paths. In order to capture how two links are related while considering a variety of connection paths, we extend the link vectors while maintaining the start points and the link directions. Then we define $W^{(3)}_k$ as the unweighted adjacency matrices that contain the information on where the two extended link vectors meet, as described below for each $k \in \{1, 2, 3, 4\}$.

$$W^{(3)}_{k,ij} = \begin{cases} 1, & \text{if the } k\text{-th type of intersection point of the extended link vectors } \vec{v}_i \text{ and } \vec{v}_j \text{ exists} \\ 0, & \text{otherwise} \end{cases} \qquad (7)$$

Figure 5: The four positional relationships. Each panel represents a different type of intersection point, depending on where the two extended link vectors meet.

Figure 5 shows the four types of positional relationships, corresponding to the four possible intersection points of two extended link vectors. For example, the first type exists if the extension of $\vec{v}_i$ in the backward direction meets the extension of $\vec{v}_j$ in the backward direction; the other three types are similarly defined depending on whether the forward or backward extensions of $\vec{v}_i$ and $\vec{v}_j$ meet. Each type implies how two links could interact. For example, two links would head to the same area when their forward extensions meet, as described in Figure 5(b). Note that no intersection point can exist when two links are exactly parallel. This case is taken into consideration by setting all corresponding values to zero.
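One way to realize this construction is sketched below: the infinite lines through the two link vectors are intersected, and the intersection is classified by whether it lies on the forward or backward extension of each link. The enumeration order of the four types and the handling of intersections that fall on a link itself are assumptions of this sketch.

```python
import numpy as np

def positional_type(p_i, v_i, p_j, v_j, tol: float = 1e-9):
    """Classify the positional relationship of links i and j.

    Solves p_i + t_i * v_i = p_j + t_j * v_j for (t_i, t_j); t > 1 means the
    intersection lies on the forward extension, t < 0 on the backward extension.
    Exactly parallel links have no intersection point and return None.
    """
    A = np.column_stack([v_i, -v_j])
    if abs(np.linalg.det(A)) < tol:            # parallel links: no intersection exists
        return None
    t_i, t_j = np.linalg.solve(A, p_j - p_i)
    side_i = "forward" if t_i > 1 else ("backward" if t_i < 0 else "on-link")
    side_j = "forward" if t_j > 1 else ("backward" if t_j < 0 else "on-link")
    return side_i, side_j                      # e.g. ("forward", "forward"): same area

def positional_graphs(starts: np.ndarray, vectors: np.ndarray) -> np.ndarray:
    """Build the four unweighted matrices W^(3)_k of Eq. (7)."""
    types = [("backward", "backward"), ("forward", "forward"),
             ("forward", "backward"), ("backward", "forward")]
    N = len(vectors)
    W3 = np.zeros((len(types), N, N))
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            rel = positional_type(starts[i], vectors[i], starts[j], vectors[j])
            if rel in types:
                W3[types.index(rel), i, j] = 1.0
    return W3
```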

Partition filters       We also define a set of partition filters $\{f_k\}_{k=1}^{K}$ that can be applied over a weighted adjacency matrix $W$ to create partitioned matrices $\{f_k(W)\}_{k=1}^{K}$. Each $f_k$ is a scalar-input scalar-output function, and $f_k(W)$ describes the element-wise application of filter $f_k$ to $W$. We constrain the set of partition filters so that the element-wise sum of the partitioned matrices recovers the original, i.e. $W$ is spread over the partitioned matrices without any increase or decrease in the element-wise sum. The filters can be designed based on a histogram analysis. To smoothly handle the boundary values, we choose triangular partition filters. In our work, only $W^{(1)}$ and $W^{(2)}$ are considered for applying the partition filters. Figure 6 shows the 4-directional triangular partition filters used for the direction element. Instead of using the partitioned direction matrices directly, we will use hybrid matrices that combine the direction partitions with the distance element, because of their empirical superiority; a sketch of this construction is given after Figure 6. For the distance element, appropriate partition filters can be designed by investigating the distance density in a similar way.

Figure 6: Creating partition filters from the link direction histogram. The x-axis shows angles and the y-axis shows the values of the partition filters. The direction histogram of the links is properly scaled for plotting purposes and smoothed with a Gaussian kernel. In our work, the four chosen directions happened to coincide with the histogram peaks; however, this is not the only possible choice, and the design of the partition filters allows any number and any choice of directions based on the histogram analysis.
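The sketch below shows one possible realization of the triangular partition filters and of the distance-direction hybrid matrices described above: four triangular filters centered 90 degrees apart form a partition of unity over the angle axis, and each filtered direction matrix is combined with the distance matrix by an element-wise product. The specific centers, the filter width, and the element-wise product are assumptions of this sketch rather than details confirmed by the text.

```python
import numpy as np

def triangular_filters(centers: np.ndarray, width: float):
    """Return a list of triangular partition filters over angles in [0, 2*pi)."""
    def make(c):
        def f(x):
            # circular distance between the angle-valued entries of x and center c
            d = np.minimum(np.abs(x - c), 2 * np.pi - np.abs(x - c))
            return np.clip(1.0 - d / width, 0.0, None)
        return f
    return [make(c) for c in centers]

# Four filters centered 90 degrees apart; neighboring triangles overlap so that
# the filters sum to one everywhere (a partition of unity over the angle axis).
centers = np.arange(4) * (np.pi / 2)
filters = triangular_filters(centers, width=np.pi / 2)

def hybrid_matrices(W_dist: np.ndarray, angle_diff: np.ndarray):
    """Distance-direction hybrids: spread the distance matrix over direction partitions.

    angle_diff holds the raw pairwise direction differences in radians, so the
    triangular filters can be evaluated on angles; the element-wise product with
    W_dist yields one hybrid graph element per filter.
    """
    return [f(angle_diff) * W_dist for f in filters]
```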

4.3 Building a Spatial Block

With the newly defined graph elements, we are ready to build spatial blocks for extracting complex spatial relationships. Spatial blocks are based on the three graph elements $W^{(1)}$, $W^{(2)}$, and $W^{(3)}$, and their partitioned versions. $W^{(3)}$ has no partition filter expansion, and for the direction information we use the distance-direction hybrid matrices instead of $W^{(2)}$ itself to exploit the hybrid information.

Figure 7: Three types of spatial blocks. Each small box represents a single graph convolution, and we only denote the weighted adjacency matrix for simplicity. The circled plus refers to the element-wise sum. Details are provided in the text.

In Figure 7, each box represents a single graph convolution as described in Eq. (4), and we only denote the weighted adjacency matrix in the figure for simplicity. The number of convolutional operations and the choice of weight matrices can vary depending on the dataset or the prediction task.

We designed three types of spatial blocks, as shown in Figure 7. Single (Figure 7(a)) refers to the simple graph convolution considering only the Euclidean distance. Parallel (Figure 7(b)) and Stacked (Figure 7(c)) refer to multi-graph convolutions that include the distance, direction, and positional relationship information. While both utilize four convolutional operations with different graph elements individually, they differ in how the graph elements are connected; a sketch contrasting the two structures is given below. The Parallel structure can be regarded as equivalent to the multi-graph convolution structure defined in previous works [3, 8, 9].
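To make the structural difference concrete, here is a minimal sketch of the two multi-graph spatial blocks built on the single-graph convolution of Eq. (4). Treating the parallel structure as an element-wise sum of per-graph convolutions and the stacked structure as a sequential application is our reading of Figure 7.

```python
import numpy as np

def gconv(x: np.ndarray, W: np.ndarray, theta: float) -> np.ndarray:
    """Single-graph convolution of Eq. (4) with symmetric normalization."""
    deg = W.sum(axis=1)
    d = np.zeros_like(deg)
    d[deg > 0] = deg[deg > 0] ** -0.5
    return theta * (np.eye(len(x)) + d[:, None] * W * d[None, :]) @ x

def parallel_spatial_block(x, graphs, thetas):
    """Parallel structure: element-wise sum of per-graph convolutions [3, 8, 9]."""
    return sum(gconv(x, W, th) for W, th in zip(graphs, thetas))

def stacked_spatial_block(x, graphs, thetas):
    """Stacked structure: the output of each graph convolution feeds the next one."""
    h = x
    for W, th in zip(graphs, thetas):
        h = gconv(h, W, th)
    return h
```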

4.4 Building a Temporal Block

We conducted extensive experiments to design the temporal block, including graph convolution, self-attention [21], multi-convolution [18], and temporal relational reasoning [26]. It turned out that a simple convolution showed the best performance with the shortest training time. Based on these empirical results, we adopt the simple temporal block shown in Figure 3(c).
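Since the temporal block is a plain 1-D convolution along the time axis (Figure 3(c)), it can be sketched as below; the 'valid' convolution and the single-channel treatment are illustrative simplifications.

```python
import numpy as np

def temporal_block(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """1-D convolution along the time axis, applied independently to every link.

    x: (T, N) sequence of graph signals, kernel: (Kt,) temporal filter.
    Returns an array of shape (T - Kt + 1, N), i.e. a 'valid' convolution.
    """
    Kt = len(kernel)
    T, N = x.shape
    out = np.empty((T - Kt + 1, N))
    for t in range(T - Kt + 1):
        out[t] = kernel @ x[t:t + Kt]      # weighted sum over the temporal window
    return out

# Example: a 3-tap moving-average filter over a 12-step history of 480 links
history = np.random.rand(12, 480)
smoothed = temporal_block(history, np.ones(3) / 3)   # shape (10, 480)
```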

5 Experiment

5.1 Datasets

We conducted experiments on two real-world large-scale datasets from Seoul, South Korea. Urban1 (Gangnam) and Urban2 (Mapo) correspond to two of the most crowded regions in Seoul, and they have highly complex connectivity patterns that cannot be explained with distance alone; they contain bidirectional links and complicated traffic signals. The traffic data were collected every 5 minutes for one month, from Apr 1st, 2018 to Apr 30th, 2018, using the GPS traces of over 70,000 taxis.

| Dataset | Urban1 | Urban2 |
| --- | --- | --- |
| Time span | 4/1/2018 - 4/30/2018 | 4/1/2018 - 4/30/2018 |
| Time interval | 5 min | 5 min |
| Region size (width, height) (m) | (7000, 7000) | (7000, 7000) |
| Number of links | 480 | 455 |
| Speed mean (std) (km/h) | 26.333 (10.638) | 25.917 (9.784) |
| Length mean (min, max) (m) | 592 (171, 2622) | 561 (80, 2629) |
| Links within 1 km (avg.) | 11.274 | 10.280 |
| Average directly connected links | 3.233 | 2.935 |

Table 1: Details of the datasets.

Most of the previous studies have focused on traffic networks that include only freeways and have simply defined links as points without direction information [12, 23, 24, 4, 5]. Our datasets, however, not only include complex urban networks with a large number of intersections, traffic signals, and interactions with other roads (e.g. pedestrian paths), but also define links as vectors including direction. We compared some statistical properties of the networks to assess how complex they are. First, while every link pair in our datasets is connected, only 27% of link pairs are connected in the METR-LA dataset used in [12, 24]. Second, as described in Table 1, for both datasets most links can be directly connected on average to about three other links (i.e. inflow, outflow, and more), and there are more than 10 links within a 1 km radius. This implies that our datasets have a fairly dense link population when compared to the highway-only network of the same city, Seoul, where most links can be directly connected on average to only two other links (i.e. inflow and outflow only), and there are only around three links within a 1 km radius. Due to this dense link population, the distance graph element alone cannot sufficiently reflect the complex spatial relationships of the urban networks.

| Model | Urban1 MAE | Urban1 MAPE (%) | Urban1 RMSE | Urban2 MAE | Urban2 MAPE (%) | Urban2 RMSE |
| --- | --- | --- | --- | --- | --- | --- |
| HA | 3.34/ 3.34/ 3.34 | 14.68/ 14.67/ 14.68 | 5.42/ 5.42/ 5.41 | 3.23/ 3.22/ 3.22 | 14.43/ 14.42/ 14.41 | 4.86/ 4.86/ 4.85 |
| VAR | 5.06/ 4.99/ 4.97 | 23.10/ 22.82/ 22.73 | 7.04/ 6.92/ 6.88 | 4.58/ 4.52/ 4.49 | 20.82/ 20.55/ 20.43 | 6.31/ 6.22/ 6.19 |
| LSVR | 3.82/ 3.89/ 3.93 | 15.35/ 17.99/ 17.39 | 5.64/ 5.74/ 5.84 | 4.38/ 4.22/ 3.92 | 17.01/ 16.82/ 18.45 | 5.83/ 5.71/ 5.47 |
| ARIMA | 3.49/ 3.79/ 4.04 | 15.40/ 16.85/ 18.09 | 5.28/ 5.65/ 5.94 | 3.30/ 3.56/ 3.78 | 14.78/ 15.99/ 17.03 | 4.77/ 5.09/ 5.37 |
| FC-LSTM | 3.91/ 3.92/ 3.92 | 17.29/ 17.32/ 17.31 | 6.38/ 6.39/ 6.39 | 3.81/ 3.81/ 3.82 | 17.10/ 17.12/ 17.12 | 5.57/ 5.58/ 5.58 |
| DCRNN | 3.17/ 3.46/ 3.73 | 13.52/ 14.83/ 15.95 | 4.94/ 5.30/ 5.61 | 3.08/ 3.31/ 3.50 | 13.55/ 14.63/ 15.52 | 4.58/ 4.86/ 5.08 |
| STGCN | 3.07/ 3.42/ 3.80 | 14.38/ 16.72/ 19.37 | 4.57/ 4.83/ 5.04 | 2.99/ 3.33/ 3.69 | 14.02/ 15.82/ 17.78 | 4.37/ 4.79/ 5.26 |
| DDP-GCN (Single) | 3.06/ 3.06/ 3.29 | 13.83/ 13.82/ 15.02 | 4.55/ 4.54/ 4.95 | 2.93/ 2.93/ 3.06 | 13.34/ 13.33/ 13.99 | 4.26/ 4.26/ 4.43 |
| DDP-GCN (Parallel) | 3.06/ 3.06/ 3.10 | 13.82/ 13.08/ 13.99 | 4.54/ 4.54/ 4.64 | 2.95/ 2.95/ 2.96 | 13.37/ 13.36/ 13.45 | 4.30/ 4.29/ 4.31 |
| DDP-GCN (Stacked) | 3.00/ 3.00/ 2.99 | 13.57/ 13.56/ 13.51 | 4.45/ 4.45/ 4.47 | 2.90/ 2.89/ 2.88 | 13.18/ 13.17/ 13.14 | 4.24/ 4.23/ 4.22 |

Table 2: Performance comparison (30/ 45/ 60 min). The best performing models for each forecasting horizon and metric are shown in bold. All cases are repeated five times.

5.2 Experimental Settings

We repeated each experiment five times, and the average performances are provided in Table 2. For all datasets, we applied Z-score normalization. After excluding the weekends, 70% of the data is used for training, 10% for validation, and the remaining 20% for testing, in time order. For $W^{(1)}$, as defined in Eq. (5), $\sigma$ and $\epsilon$ are set depending on the data scale; in our study, we set $\epsilon$ to 0. For the partition filters, we determined the number of filters for the direction and distance elements based on the histogram analysis of each dataset, as described in Section 4.2: the number of direction filters is 4 (Urban1, Urban2) and the number of distance filters is 3 (Urban1) or 4 (Urban2). We set both $T'$ and $T$ to 12 samples, where 12 corresponds to a one-hour span. All experiments were implemented using TensorFlow 1.15 on a Linux cluster (CPU: Intel(R) Xeon(R) E5-2620 v4 @ 2.10GHz, GPU: NVIDIA TITAN V). The training process takes about 15 minutes on a single GPU.
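The preprocessing described above (a chronological 70/10/20 split and Z-score normalization) can be sketched as follows; computing the normalization statistics on the training portion only is an assumption of this sketch, as the text does not state it.

```python
import numpy as np

def chronological_split(data: np.ndarray, train: float = 0.7, val: float = 0.1):
    """Split a (num_timesteps, N) array into train/val/test sets in time order."""
    n = data.shape[0]
    i, j = int(n * train), int(n * (train + val))
    return data[:i], data[i:j], data[j:]

def zscore(train: np.ndarray, *others: np.ndarray):
    """Z-score normalization using the training mean and standard deviation."""
    mean, std = train.mean(), train.std()
    return [(x - mean) / std for x in (train, *others)], (mean, std)

# Example: roughly 22 weekdays of 5-minute speeds for 480 links
speeds = np.random.rand(22 * 288, 480)
train, val, test = chronological_split(speeds)
(train_n, val_n, test_n), stats = zscore(train, val, test)
```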

Baselines       We compare the proposed model (DDP-GCN) with the following methods: (1) HA: Historical Average; (2) VAR: Vector Auto-Regression [10]; (3) LSVR: Linear Support Vector Regression; (4) ARIMA: Auto-Regressive Integrated Moving Average model; (5) FC-LSTM: Recurrent Neural Network with fully connected LSTM hidden units [17]; (6) DCRNN: Diffusion Convolutional Recurrent Neural Network [12], which applies bidirectional diffusion convolution on the graph to capture spatial dependency and uses a sequence-to-sequence architecture with gated recurrent units to capture temporal dependency; and (7) STGCN: Spatio-Temporal Graph Convolutional Networks [23], which is composed of spatio-temporal convolutional blocks including two gated sequential convolution layers and one spatial graph convolution layer in between.

5.3 Performance Comparison

Table 2 shows the performance comparison of DDP-GCN and the baselines on the Urban1 and Urban2 datasets for 30-, 45-, and 60-min predictions. Note that each model predicted all 12 sequential traffic speed values at all links simultaneously. The methods are evaluated with three metrics commonly used in traffic forecasting: (1) Mean Absolute Error (MAE), (2) Mean Absolute Percentage Error (MAPE), and (3) Root Mean Squared Error (RMSE).
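For completeness, the three metrics can be computed as below; masking of missing values, if any are present, is omitted.

```python
import numpy as np

def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)  # in percent

def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```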

Our proposed model, especially DDP-GCN(Stacked), achieves the best performance in all cases except the 30-min prediction MAPE on the Urban1 dataset. For 1-hour forecasting, our model showed an improvement of 7.40% on average (9.83% maximum). In particular, the stacked spatial block, first introduced in our work, outperformed the parallel spatial block in all cases. Unlike the other baseline methods, DDP-GCN(Stacked) showed even better performance at longer forecasting horizons in some cases. Interestingly, we observed that graph convolution-based methods are generally accurate only for (relatively) short-term predictions and HA tends to be more accurate for (relatively) long-term predictions. We believe that this result is mainly due to the strong weekly periodicity of our dataset: while the other methods utilize only the most recent 1-hour speed information, HA utilizes a different type of information, i.e. weekly speeds. Even though our model also exploits only the most recent 1-hour information, it always outperformed the others, including HA. This suggests that the non-Euclidean spatial relationships, direction and positional relationship, are quite powerful priors for accurate speed forecasting.

Figure 8: Examples of 60-min prediction results for the dataset Urban1. (Left: Apr 25th, Right: Apr 30th) DDP-GCN is more predictive of the abrupt changes.

When HA, which uses additional inputs, is excluded and DDP-GCN is compared only to the other graph convolution-based methods (DCRNN and STGCN), DDP-GCN(Stacked) showed a 9.60% average (19.84% maximum) improvement over all forecasting horizons and a 16.07% average improvement for 1-hour forecasting. A segment of the prediction results is shown in Figure 8, and we can easily see that DDP-GCN(Stacked) is superior at capturing abrupt changes, even 1-hour ahead. This result suggests that non-Euclidean information is essential for capturing abrupt changes in complex networks in advance.

5.4 Benefits of Spatial Graph Elements

To investigate the effect of each spatial graph element, we evaluate the performance degradation in ablation tests using DDP-GCN(Stacked) on the Urban1 dataset. The results are shown in Table 3. Removing the direction element degraded the performance the most, and removing the distance element the least. This result indicates that the distance-only element is not the most important graph element for reflecting complex spatial relationships. On the other hand, the non-Euclidean relationships, especially direction, should be fed into the network with partition filters.

| Removed component | MAE | MAPE (%) | RMSE |
| --- | --- | --- | --- |
| Distance | 3.01/ 3.01/ 3.00 | 13.59/ 13.58/ 13.51 | 4.47/ 4.47/ 4.47 |
| Direction | 3.04/ 3.04/ 3.03 | 13.76/ 13.75/ 13.70 | 4.53/ 4.53/ 4.55 |
| Positional relationship | 3.03/ 3.03/ 3.03 | 13.73/ 13.72/ 13.69 | 4.52/ 4.52/ 4.55 |
| Distance () | 3.02/ 3.02/ 3.02 | 13.67/ 13.66/ 13.63 | 4.50/ 4.50/ 4.52 |
| None | 3.00/ 3.00/ 2.99 | 13.57/ 13.56/ 13.51 | 4.45/ 4.45/ 4.47 |

Table 3: Effect of each element on the spatial dependency modeling for the Urban1 dataset (30/ 45/ 60 min). The worst performing cases are shown in bold. All cases are repeated five times.
| K | DCRNN | STGCN | DDP-GCN (Single) | DDP-GCN (Parallel) | DDP-GCN (Stacked) |
| --- | --- | --- | --- | --- | --- |
| 1 | 5.57 | 5.61 | 4.95 | 4.64 | 4.47 |
| 2 | 5.27 | 5.55 | 4.82 | 4.60 | 4.46 |
| 3 | 5.02 | 5.31 | 4.83 | 4.59 | 4.46 |
| 4 | 4.85 | 5.43 | 4.75 | 4.58 | 4.46 |
| 5 | 4.82 | 5.37 | 4.73 | 4.58 | 4.46 |
| 10 | - | 5.25 | 4.69 | 4.56 | 4.47 |
| 20 | - | 5.30 | 4.68 | 4.57 | 4.48 |

Table 4: Effect of the spatial priors compared with state-of-the-art graph-based algorithms utilizing K hops of the distance matrix, for 1-hour forecasting on the Urban1 dataset (RMSE). All cases are repeated five times, except DCRNN, which is run only once because its training takes very long. For DCRNN, the cases of K = 10 and K = 20 were not conducted because the training is very long and unstable.

We also examined whether larger hop counts could be as important as the non-Euclidean relationships. Here, instead of 1stChebNet, we use the K-polynomial ChebNet [6]. The K hops are applied only to the distance element $W^{(1)}$. As shown in Table 4, even with a large K, no model could outperform DDP-GCN(Stacked). Moreover, the performance of DDP-GCN(Stacked) did not improve for larger K. These results indicate that the direction and the positional relationship should be considered important attributes for traffic forecasting, and that they are more beneficial than the K-hop neighbors' information.
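The K-hop variant replaces the first-order approximation of Eq. (4) with a ChebNet polynomial of order K [6]. A sketch of the Chebyshev recursion on a scaled Laplacian is shown below; rescaling with lambda_max = 2 is a common simplification and not a detail taken from the paper.

```python
import numpy as np

def cheb_conv(x: np.ndarray, W: np.ndarray, thetas: np.ndarray) -> np.ndarray:
    """ChebNet graph convolution of order K = len(thetas) - 1.

    Uses the recursion T_0(x) = x, T_1(x) = L_hat x, T_k(x) = 2 L_hat T_{k-1}(x) - T_{k-2}(x),
    where L_hat is the scaled normalized Laplacian (assuming lambda_max ~ 2).
    """
    N = W.shape[0]
    deg = W.sum(axis=1)
    d = np.zeros_like(deg)
    d[deg > 0] = deg[deg > 0] ** -0.5
    L = np.eye(N) - d[:, None] * W * d[None, :]   # normalized Laplacian
    L_hat = L - np.eye(N)                         # 2L/lambda_max - I with lambda_max = 2
    t_prev, t_curr = x, L_hat @ x
    out = thetas[0] * t_prev
    if len(thetas) > 1:
        out = out + thetas[1] * t_curr
    for k in range(2, len(thetas)):
        t_prev, t_curr = t_curr, 2 * (L_hat @ t_curr) - t_prev
        out = out + thetas[k] * t_curr
    return out
```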

6 Conclusion and Future Work

In this paper, we proposed a new traffic speed forecasting network that utilizes three spatial dependencies: distance, direction, and positional relationship. Our model performs multi-graph convolution based on graph elements properly modified by simple partition filters. We investigated examples of traffic forecasting problems and showed a large improvement in the long-term forecasting accuracy for highly complex urban networks when compared to other state-of-the-art algorithms. In the future, we will further evaluate our model on other datasets and investigate how temporal periodicity influences traffic forecasting.

Footnotes

  1. For a concise explanation, we refer to the graph convolution as defined above; however, it can also be generalized to multi-dimensional tensors [11, 23].

References

  1. J. L. Ba, J. R. Kiros and G. E. Hinton (2016) Layer normalization. arXiv preprint arXiv:1607.06450. Cited by: Figure 3.
  2. J. Bruna, W. Zaremba, A. Szlam and Y. LeCun (2013) Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203. Cited by: §2.1.
  3. D. Chai, L. Wang and Q. Yang (2018) Bike flow prediction with multi-graph convolutional networks. In Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 397–400. Cited by: §1, §1, §2.2, §4.2, §4.3.
  4. C. Chen, K. Li, S. G. Teo, X. Zou, K. Wang, J. Wang and Z. Zeng (2019) Gated residual recurrent graph neural networks for traffic prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 485–492. Cited by: §1, §5.1.
  5. Z. Cui, K. Henrickson, R. Ke and Y. Wang (2018) Traffic graph convolutional recurrent neural network: a deep learning framework for network-scale traffic learning and forecasting. arXiv preprint arXiv:1802.07007. Cited by: §1, §2.2, §5.1.
  6. M. Defferrard, X. Bresson and P. Vandergheynst (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in neural information processing systems, pp. 3844–3852. Cited by: §2.1, §5.4.
  7. E. W. Dijkstra (1959) A note on two problems in connexion with graphs. Numerische mathematik 1 (1), pp. 269–271. Cited by: §4.2.
  8. X. Geng, Y. Li, L. Wang, L. Zhang, Q. Yang, J. Ye and Y. Liu (2019) Spatiotemporal multi-graph convolution network for ride-hailing demand forecasting. Cited by: §1, §1, §2.2, §4.2, §4.3.
  9. X. Geng, X. Wu, L. Zhang, Q. Yang, Y. Liu and J. Ye (2019) Multi-modal graph interaction for multi-graph convolution network in urban spatiotemporal forecasting. arXiv preprint arXiv:1905.11395. Cited by: §1, §1, §2.2, §4.2, §4.3.
  10. J. D. Hamilton (1994) Time series analysis. Vol. 2, Princeton university press Princeton, NJ. Cited by: §5.2.
  11. T. N. Kipf and M. Welling (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Cited by: §2.1, §4, footnote 1.
  12. Y. Li, R. Yu, C. Shahabi and Y. Liu (2017) Graph convolutional recurrent neural network: data-driven traffic forecasting. CoRR abs/1707.01926. External Links: Link, 1707.01926 Cited by: §1, §2.2, §5.1, §5.2.
  13. X. Ma, Z. Dai, Z. He, J. Ma, Y. Wang and Y. Wang (2017) Learning traffic as images: a deep convolutional neural network for large-scale transportation network speed prediction. Sensors 17 (4), pp. 818. Cited by: §1.
  14. N. G. Polson and V. O. Sokolov (2017) Deep learning for short-term traffic flow prediction. Transportation Research Part C: Emerging Technologies 79, pp. 1–17. Cited by: §1.
  15. T. Seo, A. M. Bayen, T. Kusakabe and Y. Asakura (2017) Traffic state estimation on highway: a comprehensive survey. Annual Reviews in Control 43, pp. 128–151. Cited by: §3.
  16. D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega and P. Vandergheynst (2012) Signal processing on graphs: extending high-dimensional data analysis to networks and other irregular data domains. CoRR abs/1211.0053. External Links: Link, 1211.0053 Cited by: §4.2.
  17. I. Sutskever, O. Vinyals and Q. Le (2014) Sequence to sequence learning with neural networks. Advances in NIPS. Cited by: §5.2.
  18. C. Szegedy, S. Ioffe, V. Vanhoucke and A. A. Alemi (2017) Inception-v4, inception-resnet and the impact of residual connections on learning. In Thirty-First AAAI Conference on Artificial Intelligence, Cited by: §4.4.
  19. A. Tascikaraoglu (2018) Evaluation of spatio-temporal forecasting methods in various smart city applications. Renewable and Sustainable Energy Reviews 82, pp. 424–435. Cited by: §1.
  20. W. R. Tobler (1970) A computer movie simulating urban growth in the detroit region. Economic geography 46 (sup1), pp. 234–240. Cited by: §4.2.
  21. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser and I. Polosukhin (2017) Attention is all you need. In Advances in neural information processing systems, pp. 5998–6008. Cited by: §4.4.
  22. M. Wang, B. Lai, Z. Jin, X. Gong, J. Huang and X. Hua (2018) Dynamic spatio-temporal graph-based cnns for traffic prediction. CoRR abs/1812.02019. External Links: Link, 1812.02019 Cited by: §2.2.
  23. B. Yu, H. Yin and Z. Zhu (2017) Spatio-temporal graph convolutional neural network: A deep learning framework for traffic forecasting. CoRR abs/1709.04875. External Links: Link, 1709.04875 Cited by: §1, §2.2, §5.1, §5.2, footnote 1.
  24. B. Yu, H. Yin and Z. Zhu (2019) ST-unet: a spatio-temporal u-network for graph-structured time series modeling. arXiv preprint arXiv:1903.05631. Cited by: §1, §2.2, §5.1.
  25. H. Yu, Z. Wu, S. Wang, Y. Wang and X. Ma (2017) Spatiotemporal recurrent convolutional networks for traffic prediction in transportation networks. Sensors 17 (7), pp. 1501. Cited by: §1.
  26. B. Zhou, A. Andonian, A. Oliva and A. Torralba (2018-09) Temporal relational reasoning in videos. In The European Conference on Computer Vision (ECCV), Cited by: §4.4.