Inferring and Improving Street Maps with Data-Driven Automation
Street maps help to inform a wide range of decisions. Drivers, cyclists, and pedestrians rely on street maps for search and navigation. Rescue workers responding to disasters like hurricanes, tsunamis, and earthquakes rely on street maps to understand where people are, and to locate individual buildings . Transportation researchers rely on street maps to conduct transportation studies, such as analyzing pedestrian accessibility to public transport . Indeed, with the need for accurate street maps growing in importance, companies are spending hundreds of millions of dollars to map roads globally111See e.g. “Uber Will Spend $500 Million on Mapping to Diverge From Google”, Fortune (2016-07-15) by Kirsten Korosec..
However, street maps are incomplete or lag behind new construction in many parts of the world. In rural Indonesia, for example, entire groups of villages are missing from OpenStreetMap, a popular open map dataset . In many of these villages, the closest mapped road is miles away. In Qatar, construction of new infrastructure has boomed in preparation for the FIFA World Cup 2022. Due to the rapid pace of construction, it often takes a year for digital maps to be updated to reflect new roads and buildings.222An example of a subdivision in Doha, Qatar that was missing from maps for years is detailed at https://productforums.google.com/forum/#!topic/maps/dwtCso9owlU. Even in countries like the United States where significant investments have been made in digital maps, construction and road closures often take days or weeks to be incorporated in map datasets.
These problems arise because the processes used to create and maintain street maps today are extremely labor-intensive. Modern street map editing tools allow users to trace and annotate roads and buildings on a map canvas overlayed on relevant data sources, so that users can effectively edit the map while consulting the data. These data sources include satellite imagery, aerial imagery, and GPS trajectories (which consist of sequences of GPS positions captured from moving vehicles). Although the data presented by these tools help users to update a map dataset, the manual tracing and annotation process is cumbersome and a major bottleneck in map maintenance.
Over the past decade, many automatic map inference systems have been proposed to automatically extract information from these data sources at scale. Several approaches develop unsupervised clustering and density thresholding algorithms to construct road networks from GPS trajectory datasets [1, 4, 7, 16, 21]. Others apply machine learning methods to process satellite imagery and extract road networks [19, 22], building polygons [13, 24], and road attribute annotations (e.g., the number of lanes, presence of cycling lanes, or presence of street parking on a road) [6, 18].
However, automatic map inference has failed to gain traction in practice due to two key limitations. First, existing inference methods have high error rates (low precision), which manifest in noisy outputs containing incorrect roads, buildings, and attribute annotations. Second, although prior work has shown how to detect roads and buildings from the data sources, the challenge of leveraging this information to update real street map datasets has not been addressed. We argue that even with lower error rates, the quality of outputs from automatic approaches is below that of manually curated street map datasets, and semi-automation is needed to efficiently but robustly take advantage of automatic map inference to accelerate the map maintenance process.
At MIT and QCRI, we have developed a number of algorithms and approaches to address these challenges [2, 3, 14], which we combined into a new system we call Mapster. Mapster is a human-in-the-loop street map editing system that incorporates three components to robustly accelerate the mapping process over traditional tools and workflows: high-precision automatic map inference, data refinement, and machine-assisted map editing.
First, Mapster applies automatic map inference algorithms to extract initial estimates from the raw data sources. Although these estimates are noisy, we minimize errors by applying two novel approaches, iterative tracing for road network inference and graph network annotation for attribute inference, that replace heuristic, error-prone post-processing steps present in prior work with end-to-end machine learning and other more robust methods.
Second, Mapster refines the noisy estimates from map inference into map update proposals by removing several types of errors and reducing noise. To do so, we apply conditional generative adversarial networks (cGANs) trained to transform the noisy estimates into refined outputs that are more consistent with human-annotated data.
Finally, a machine-assisted map editing framework enables the rapid, semi-automated incorporation of these proposed updates into the street map dataset. This editing tool addresses the problem of leveraging inferred roads, buildings, and attribute annotations to update existing street map datasets.
Figure 1 summarizes the interactions between these components. We include links to two videos demonstrating the execution of Mapster, along with a link Mapster’s source code (which we have released as free software), in the footnote.333Video of iterative tracing in action: https://youtu.be/3_AE2Qn-Rdg. Video of machine-assisted map editing: https://youtu.be/i-6nbuuX6NY. Mapster source code: https://github.com/mitroadmaps.
Below, we first introduce our automatic map inference approaches for inferring roads and road attributes, which obtain substantially higher precision than prior work. We then detail our data refinement strategy, which applies adversarial learning to improve the quality of inferred road networks. Finally, we discuss our machine-assisted map editor, which incorporates novel techniques to maximize user productivity in updating street maps. We conclude with a discussion of future work.
2 Automatically Inferring Road Networks
Given a base road network, which may be empty or may correspond to the roads in the current street map dataset, the goal of road network inference is to leverage GPS trajectory data and satellite imagery to produce a road network map that covers roads not contained in the base map. The road network map is represented as a graph where vertices are annotated with spatial longitude-latitude coordinates, and edges correspond to straight-line road segments.
Broadly, prior approaches infer roads by dividing the space into a 2D grid, classifying whether each grid cell contains a road, and then connecting cells together to form edges. Figure 2(a) summarizes this strategy. For satellite imagery, recent work obtains the per-cell classification by applying deep convolutional neural networks (CNNs) that segment the imagery, transforming the input imagery into a single-channel image indicating the neural network’s confidence that there is a road at each pixel [9, 10, 17]. For GPS trajectories, several approaches perform the classification based on the number of GPS trajectories passing through each cell [5, 8, 11, 20].
However, we find that these methods exhibit low accuracy when faced with practical challenges such as noisy data and complex road topology. Figure 2(b) shows the outputs of prior work around a major highway junction in Chicago. Noise in the per-cell classification estimates are amplified when we connect cells together to draw edges, resulting in garbled road network maps.
Indeed, both GPS trajectory data and satellite imagery exhibit several types of noise that make robust identification of roads challenging. While GPS samples in open areas are typically accurate to four meters, in practice, due to high-rise buildings and reflection, GPS readings may be as far off as tens of meters. Correcting this error is difficult because errors are often spatially correlated — multiple GPS readings at the same location may be offset in the same way as they encounter the same reflection and distortion issues. Similarly, roads in satellite imagery are frequently occluded by trees, buildings, and shadows. Furthermore, distinguishing roads and buildings from non-road trails and surface structures in imagery is often nontrivial.
To substantially improve precision, we adopt an iterative road tracing approach in lieu of the per-cell classification strategy. Our iterative tracing method mimics the gradual tracing process that human map editors use to create road network maps, thereby eliminating the need for the heuristic post-processing steps that prior work applies to draw edges based on cell classification outputs.
Iterative tracing begins with the base map, and on each iteration, it adds a single road segment (one edge) to the map. To decide where to position this segment, it uses the data source to compute two values for each vertex in the portion of the map traced so far: (a) a confidence that an unmapped road intersects the vertex, and (b) the most likely angular direction of the unmapped road. It then selects the vertex with the highest confidence, and adds a segment in the corresponding direction. This procedure is repeated, adding one segment at a time to the map, until the highest confidence for the presence of an unmapped road falls below a threshold. We illustrate the iterative tracing procedure in Figure 2(c).
We develop different approaches to compute the unmapped road confidence and direction from satellite imagery  and from GPS trajectories . With satellite imagery, we compute these through a deep neural network. We develop a CNN model that inputs a window of satellite imagery around a vertex of the road network, along with an additional channel containing a representation of the road network that has been traced so far. We train the CNN to output the likelihood that an unmapped road intersects the vertex, and the most likely angular direction of this unmapped road.
With GPS trajectories, we compute the values at a vertex based on the peak direction of trajectories that pass through the vertex. We identify all trajectories that pass through the vertex, and construct a smoothed polar histogram over the directions that those trajectories follow after moving away from the vertex. We then apply a peak finding algorithm to identify peaks in the histogram that have not already been explored earlier in the tracing process. We select the peak direction, and measure confidence in terms of the volume of trajectories that match the peak direction.
Often, both satellite imagery and GPS trajectory data may be available in a region requiring road network inference. We develop a two-stage approach that leverages both data sources when inferring road networks to reduce errors and improve the map quality. In the first stage, we run iterative tracing using GPS data to infer segments along high-throughput roadways. Because these roads have high traffic volume, they are covered by large numbers of GPS trajectories, and so can be accurately traced with GPS data. At the same time, junctions along high-throughput roads (especially controlled-access highways) are generally more complex, often involving roundabouts and overpasses. These features make tracing based on satellite imagery challenging.
In the second stage, we fill in gaps in the road network with lower-throughput residential and service roads that were missed in the first stage by tracing with satellite imagery. These roads have simple topology and are covered by few GPS trajectories, making imagery a preferred data source.
We evaluate our method by comparing road networks inferred through iterative tracing against those inferred by prior cell-classification approaches. Figure 3 shows qualitative results using satellite imagery in Boston, Chicago, and Salt Lake City, and using GPS trajectory data in Chicago and Los Angeles. In contrast to cell classification, iterative tracing from satellite imagery infers roads robustly despite occlusion by buildings and shadows in dense urban areas. In lower density areas like Salt Lake City, iterative tracing performs comparably to prior work.
Cell classification from GPS trajectories produces noisy outputs at crucial but complex map features like highway interchanges. Despite the intricate connections at these features, iterative tracing accurately maps the interchanges.
3 Inferring Road Attributes
Modern navigation systems use more than just the road topology – they also make use of a number of road metadata attributes, such as the number of lanes, presence of cycling lanes, or the presence of street parking along a road. As a result, inferring these attributes is an important part of the Mapster system.
Prior work in road attribute inference applies an image classification approach that trains a convolutional neural network (CNN) to predict the road attributes given a window of satellite imagery around some location along the road. Then the local prediction at each location is post-processed through a global inference phase, e.g., inference in a Markov Random Field (MRF), to remove scattered errors.
This global inference phase is necessary because the CNN prediction at each location is often erroneous due to the limited receptive field of the CNN – in many cases, local information in the input window of satellite imagery is not sufficient to make a correct prediction. For example, in Figure 4(b.1), the road on the left side is occluded by trees. If the CNN inputs a window only from the left part of the road, it will be unable to correctly determine the number of lanes. The global inference phase algorithm can take all the predictions along the road as well as prior knowledge, such that the road attributes are often homogeneous along the road, into account to correct the errors in CNN classifiers.
However, we find this post-processing fix is often error-prone. For example, see Figure 4(b.2), where the lane count changes from 4 to 5 near an intersection. The image classifier outputs partially incorrect labels. However, the post-processing strategy cannot fix this problem as the global inference phase only takes the predictions from the image classifier as input and it may not be able to tell whether the number of lanes indeed changes or it is an error of the image classifier. This limitation is caused by the information barrier induced by the separation of local classification and global inference; the global inference phase can only use the image classifier’s prediction as input, but not other important information such as whether trees occlude the road or whether the road width changes.
To overcome the information barrier limitation, we develop a hybrid neural network architecture (see Figure 4(a)) that combines a CNN with a graph neural network (GNN). The CNN extracts local features from each segment along a road. The GNN then propagates these features along the road network. The end-to-end training of the combined CNN and GNN model is the key to the success of the method: our algorithm doesn’t rely on the often handcrafted and error-prone post-processing algorithm; instead, we learn the post-processing rules as well as the required CNN features directly from the data. The usage of the GNN in our system eliminates the receptive field limitation of local CNN image classifiers and the combination of CNN and GNN eliminates the information barrier limitation present in prior work, allowing the model to learn complex inductive rules that make it more robust to challenges such as occlusion in satellite imagery and the partial disappearance of important information (Figure 4(b)).
4 Refining Inferred Data
Although iterative tracing improves substantially over prior grid cell classification approaches, it nevertheless creates road networks with noisy features that clearly distinguish them from human-drawn maps. In particular, iterative tracing effectively captures the topology of road networks, but leaves geometrical abnormalities such as the examples of off-center junction vertices and noisy road curvature in Figure 5(a).
To refine the map and rectify these abnormalities, we use a model to learn the realistic road appearance. Our model is based on conditional generative adversarial networks (cGANs) . These networks learn to realistically reproduce complex image-to-image transformations, and have been successfully applied for many tasks including adding color to black-and-white images and transforming photos taken in daylight to plausible nighttime photos of the same scene .
An obvious transformation to learn for map inference is transforming satellite imagery or representations of GPS trajectories into road networks, where the output image contains lines corresponding to road segments. However, we find that the learning problem is too difficult under this strategy, and the cGAN model fails to learn to robustly identify roads — instead, it learns primarily to produce arbitrary lines that resemble lines in the ground truth road network representation.
Thus, instead, our cGAN model inputs not only satellite imagery or GPS trajectory data, but also a representation of the road network produced by iterative tracing. It outputs a refined road network representation where abnormalities in the input network have been corrected. By providing this initial road network, we reduce the complexity of the transformation, and thereby assist the cGAN to learn the transformation, especially early in the training process. Incorporating the initial road network representation derived from iterative tracing into the cGAN was a crucial insight that made training the adversarial model feasible.
Our cGAN architecture consists of two components: a generator that produces refined road networks given an initial road network and a data source, and a discriminator that learns to distinguish refinements made by the generator from road networks in the ground truth dataset. This network is adversarial because we train the generator and discriminator with opposing loss functions: the discriminator minimizes its classification error at distinguishing ground truth and generated (refined) road networks. In contrast, the generator learns by maximizing the discriminator’s classification error. Thus, in effect, we train the generator by having it learn to fool the discriminator into classifying its generated road network as ground truth.
The initial road network provided to the generator enables the cGAN model to learn to produce realistic road networks. At first, the generator simply copies the road network from its input to its output in order to deceive the discriminator. However, once the discriminator learns to better distinguish the iterative tracing road networks from hand-drawn ground truth roads, the generator begins making small adjustments to the roads so that they appear to be hand-drawn. As training continues, these adjustments become more robust and complete.
Figure 5(a) shows the outputs of our cGAN in blue, both on the geometrical abnormalities introduced earlier and on a larger region in Minneapolis. While refinement does not substantially alter the topology of the road network, the cGAN improves the geometry so that inferred roads resemble hand-mapped roads. These geometry improvements help to reduce the work needed to integrate inferred data into the street map.
5 Machine-Assisted Map Editing
To improve street map datasets, the inferred road network derived from iterative tracing and refinement steps must be incorporated into the existing road network map. Fully automated integration of the inferred road network is impractical: because the inferred roads may still include errors after refinement, adding all of the inferred roads to the map dataset would degrade the precision of the dataset.
Instead, we develop a human-in-the-loop map editing framework that enables human map editors to rapidly validate automatically inferred data . On initialization, our machine-assisted map editor builds an overlay containing the inferred road segments. Users interact with the overlay by left and right clicking to mark segments as correct or incorrect. Thus, rather than trace roads through a series of repeated clicks along the road, when a correct inferred segment already covers the road, users of the machine-assisted editor can rapidly add the road to the map with a single click on the inferred segment in the overlay.
Our map editor has two additional features to improve validation speed. First, the interface includes a prune button that, when pressed, executes a shortest-path based pruning algorithm to eliminate minor residential and service roads from the overlay, leaving only major arterial roads. This functionality is especially useful when mapping areas where the existing road network in the street map dataset has low coverage. In these areas, adding every missing road to the map may require substantial effort, but the quality of the map could already be improved significantly if major roads are incorporated. The pruning algorithm is effective at helping users focus on mapping these unmapped, major roads by reducing information overload. We show an example pruning result in Figure 5(b). Purple segments are pruned, leaving the yellow segments that correspond to major inferred roads.
Pruning is most useful in low-coverage areas. For high-coverage areas, we develop a teleport button that pans users to a connected component of inferred roads. In high-coverage areas, only a small number of roads are missing from the map, and identifying an unmapped road requires users to painstakingly scan the imagery one tile at a time. The teleport button eliminates this need, allowing users to jump to a group of missing roads and immediately begin mapping them.
6 Future Work
Although Mapster substantially reduces the workload of map creation and maintenance, the automatically inferred street maps still have more errors than maps created by professional map makers. Filling this gap requires advances in machine learning approaches for automatic map inference. Below, we detail several promising avenues to improve inference performance.
First, better neural network architectures can improve the performance of automatic mapping. So far, the design of the neural network architectures in Mapster are mostly inspired by the best practice in general computer vision tasks such as image segmentation and image classification. However, map inference tasks have unique characteristics such as the strong spatial correlation in the satellite imagery. Thus, improved neural network architectures that are specialized for map making tasks may yield higher accuracy. For example instead of using raw images or GPS traces as input, node and edge embeddings garnered from unsupervised tasks on extremely large map datasets could be used.
Second, end-to-end loss functions are a promising avenue to directly learn desired properties of the output road network. However, these desired properties, such as high precision and high recall over roads, generally can only be computed from the road network graph. As a result, the objectives are non-differentiable, since the graph can only be extracted from the probabilistic output of a machine learning model through non-differentiable functions. Thus, both prior work and our iterative tracing approach learn to build a graph indirectly: prior work minimizes the per-cell classification error, and in iterative tracing on satellite imagery we minimize the difference between the predicted angle of an untraced road and the correct angle on each tracing step. Nevertheless, end-to-end training has been shown to improve performance in other machine learning tasks, and could improve accuracy of automatic map creation and maintenance. Several techniques have been proposed for optimizing non-differentiable metrics. For example, reinforcement learning techniques can search for optimal policies under non-differentiable rewards. Alternatively, it may be possible to train an additional neural network which takes the intermediate map representation (e.g., road segmentation) as input and predicts the value of evaluation metrics. This additional neural network acts as a differentiable approximation of the evaluation metrics and can be leveraged to train a model to infer road networks that score highly on the metric.
In addition to potential improvements in the machine learning techniques, incorporating new data sources would also extend the capability of Mapster. In particular, two promising data sources are drone imagery and video from dashboard cameras. Drone imagery enables on-demand image sensing; for instance, if we find that the road structure in a region is unclear from satellite imagery and GPS data, we can reactively assign drones to collect aerial images over that region. Video from dashboard cameras enables inferring several additional street map features such as street names, business names, road signs, and road markers.
The world has approximately 64 million kilometers of roads and the road network is growing at a rapid pace as major countries like China, India, Indonesia, and Qatar gather economic momentum. Street maps are important. However, creating and maintaining street maps is very expensive and involves labor-intensive processes. As a result, although a lot of effort and money has been spent in maintaining street maps, today’s street maps are still imperfect, and are frequently either incomplete or lag behind new construction. Mapster is a holistic approach for applying automation to reduce the work needed to maintain street maps. By incorporating automatic map inference with data refinement and a machine-assisted map editor, Mapster makes automation practical for map editing, and enables the curation of map datasets that are more complete and up-to-date at less cost.
-  Ahmed, M., Karagiorgou, S., Pfoser, D., and Wenk, C. A comparison and evaluation of map construction algorithms using vehicle tracking data. GeoInformatica 19, 3 (2015), 601–632.
-  Bastani, F., He, S., Abbar, S., Alizadeh, M., Balakrishnan, H., Chawla, S., and Madden, S. Machine-assisted map editing. In Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (2018), ACM, pp. 23–32.
-  Bastani, F., He, S., Abbar, S., Alizadeh, M., Balakrishnan, H., Chawla, S., Madden, S., and DeWitt, D. RoadTracer: Automatic extraction of road networks from aerial images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018), pp. 4720–4728.
-  Biagioni, J., and Eriksson, J. Inferring road maps from global positioning system traces: Survey and comparative evaluation. Journal of the Transportation Research Board, 2291 (2012).
-  Biagioni, J., and Eriksson, J. Map inference in the face of noise and disparity. In Proceedings of the 20th International Conference on Advances in Geographic Information Systems (2012), ACM, pp. 79–88.
-  Cadamuro, G., Muhebwa, A., and Taneja, J. Assigning a grade: Accurate measurement of road quality using satellite measurements. Proceedings of NIPS Workshop on Machine Learning for the Developing World (2018).
-  Cao, L., and Krumm, J. From GPS Traces to a Routable Road Map. In ACM SIGSPATIAL (2009), pp. 3–12.
-  Chen, C., and Cheng, Y. Roads digital map generation with multi-track gps data. Proceedings of the 2008 International Workshop on Geoscience and Remote Sensing (2008).
-  Cheng, G., Wang, Y., Xu, S., Wang, H., Xiang, S., and Pan, C. Automatic road detection and centerline extraction via cascaded end-to-end convolutional neural network. IEEE Transactions on Geoscience and Remote Sensing 55, 6 (2017), 3322–3337.
-  Costea, D., and Leordeanu, M. Aerial image geolocalization from recognition and matching of roads and intersections. Proceedings of the British Machine Vision Conference (2016).
-  Davies, J. J., Beresford, A. R., and Hopper, A. Scalable, distributed, real-time map generation. IEEE Pervasive Computing 5, 4 (2006), 47–54.
-  Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. Generative adversarial nets. In Advances in Neural Information Processing Systems (2014), pp. 2672–2680.
-  Hamaguchi, R., and Hikosaka, S. Building detection from satellite imagery using ensemble of size-specific detectors. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops (June 2018).
-  He, S., Bastani, F., Abbar, S., Alizadeh, M., Balakrishnan, H., Chawla, S., and Madden, S. RoadRunner: Improving the precision of road network inference from gps trajectories. In Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (2018).
-  Isola, P., Zhu, J.-Y., Zhou, T., and Efros, A. A. Image-to-image translation with conditional adversarial nets.
-  Liu, X., Biagioni, J., Eriksson, J., Wang, Y., Forman, G., and Zhu, Y. Mining Large-Scale, Sparse GPS Traces for Map Inference: Comparison of Approaches. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2012).
-  Máttyus, G., Luo, W., and Urtasun, R. DeepRoadMapper: Extracting road topology from aerial images. In Proceedings of the IEEE International Conference on Computer Vision (2017), pp. 3438–3446.
-  Mattyus, G., Wang, S., Fidler, S., and Urtasun, R. Enhancing road maps by parsing aerial images around the world. In Proceedings of the IEEE International Conference on Computer Vision (2015), pp. 1689–1697.
-  Mnih, V., and Hinton, G. E. Learning to detect roads in high-resolution aerial images. In European Conference on Computer Vision (2010), Springer, pp. 210–223.
-  Shi, W., Shen, S., and Lu, Y. Automatic generation of road network map from massive gps, vehicle trajectories. Proceedings of the 12th International IEEE Conference on Intelligent Transportation Systems (2009).
-  Stanojevic, R., Abbar, S., Thirumuruganathan, S., Chawla, S., Filali, F., and Aleimat, A. Robust road map inference through network alignment of trajectories. In Proceedings of the 2018 SIAM International Conference on Data Mining (2018), SIAM.
-  Wegner, J. D., Montoya-Zegarra, J. A., and Schindler, K. Road networks as collections of minimum cost paths. ISPRS Journal of Photogrammetry and Remote Sensing 108 (2015), 128–137.
-  Yin, A. A mapathon to pinpoint areas hardest hit in Puerto Rico. The New York Times.
-  Zhao, K., Kang, J., Jung, J., and Sohn, G. Building extraction from satellite images using mask r-cnn with building boundary regularization. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops (June 2018).
-  Zielstra, D., and Hochmair, H. H. Comparative study of pedestrian accessibility to transit stations using free and proprietary network data. Journal of the Transportation Research Board 2217, 1 (2011), 145–152.