Physics-informed GANs for coastal flood visualization
As climate change increases the intensity of natural disasters, society needs better tools for adaptation. Floods, for example, are the most frequent natural disaster, but during hurricanes the area is largely covered by clouds and emergency managers must rely on nonintuitive flood visualizations for mission planning.
To assist these emergency managers, we have created a deep learning pipeline that generates visual satellite images of current and future coastal flooding.
We advanced a state-of-the-art GAN called pix2pixHD, such that it produces imagery that is physically-consistent with the output of an expert-validated storm surge model (NOAA SLOSH).
By evaluating the imagery relative to physics-based flood maps, we find that our proposed framework outperforms baseline models in both physical-consistency and photorealism.
While this work focused on the visualization of coastal floods, we envision the creation of a global visualization of how climate change will shape our earth.
As our climate changes, natural disasters become more intense IPCC (2018). Floods are the most frequent weather-related disaster Centre for Research on the Epidemiology of Disasters (CRED) and UN Office for Disaster Risk Reduction UNISDR (2015) and already cost the U.S. per year; this damage will only grow over the next several decades NOAA National Centers for Environmental Information (NCEI) (2020); IPCC (2018). Today, emergency managers and local decision-makers rely on visualizations to understand and communicate flood risks (e.g., building damage) NOAA (2020). Shortly after a coastal flood, however, clouds cover the affected area, and before a coastal flood no RGB satellite imagery exists to plan flood response or resilience strategies. Existing visualizations are limited to informative overviews (e.g., color-coded maps NOAA (2020); NOAA National Weather Service National Hurricane Center Storm Surge Prediction Unit (2020); Climate Central (2018)) or intuitive (i.e., photorealistic) street-view imagery Schmidt et al. (2019); Strauss (2015). Expert interviews, however, suggest that color-coded maps (displayed in Fig. 2a) are non-intuitive and can complicate communication among decision-makers Radovan et al. (2020). Street-view imagery (displayed in Fig. 2b), on the other hand, offers intuitive understanding of flood damage, but remains too local for city-wide planning Radovan et al. (2020). To assist with both climate resilience and disaster response planning, we propose the first deep learning pipeline that generates satellite imagery of coastal floods, creating visualizations that are both intuitive and informative.
Recent advances in generative adversarial networks (GANs) generated photorealistic imagery of faces Isola et al. (2017); Wang et al. (2018), animals Zhu et al. (2017); Brock et al. (2018), or even satellite Requena-Mesa et al. (2019); Frühstück et al. (2019), and street-level flood imagery Schmidt et al. (2019). Disaster planners and responders, however, need imagery that is not only photorealistic, but also physically-consistent. In our implementation, we consider both GANs and variational autoencoders (VAEs), where GANs generate more photorealistic imagery (Dosovitskiy and Brox (2016); Zhu et al. (2017), Fig. 3) and VAEs capture system uncertainties more accurately Casale et al. (2018); Kingma and Welling (2014). Because our use case requires photorealism to provide intuition, we extend a state-of-the-art, high-resolution GAN, pix2pixHD Wang et al. (2018), to take in physical constraints and produce imagery that is both photorealistic and physically-consistent. We leave ensemble predictions to account for system uncertainties for future work.
There are multiple approaches to generating physically-consistent imagery with GANs, where we assess physical consistency by asking: does the generated imagery depict the same flood extent as the storm surge model? One approach is conditioning the GAN on the outputs of physics-based models Reichstein et al. (2019); another approach is using a physics-based loss during evaluation Lesort et al. (2019); and yet another is embedding the neural network in a differential equation Rackauckas et al. (2020) (e.g., as parameters, dynamics Chen et al. (2018), residual Karpatne et al. (2017), differential operator Raissi (2018); Long et al. (2019), or solution Raissi et al. (2019)). Our work focuses on the first two methods, leveraging years of scientific domain knowledge by incorporating a physics-based storm surge model in the image generation and evaluation pipeline.
This work makes three contributions: 1) the first physically-consistent visualization of coastal flood model outputs as high-resolution, photorealistic satellite imagery; 2) a novel metric, the Flood Visualization Plausibility Score (FVPS), to evaluate the photorealism and physical-consistency of generated imagery; and 3) an extensible framework to generate physically-consistent visualizations of climate extremes.
The proposed pipeline uses a generative vision model to generate post-flooding images from pre-flooding images and a flood extent map, as shown in Fig. 1.
Data Overview. Obtaining post-flood images that display standing water is challenging due to cloud-cover, time of standing flood, satellite revisit rate, and cost of high-resolution imagery. This work leverages the xBD dataset Gupta et al. (2019), a collection of pre- and post-disaster images from events like Hurricane Harvey or Florence, from which we obtained pre- and post-flooding image pairs with the following characteristics: , RGB, (post-processing details in Section 5.2). We also downloaded flood hazard maps (at ), which are outputs of NOAA’s widely used storm surge model, SLOSH, that models the dynamics of hurricane winds pushing water on land (Section 5.2). We then aligned the flood hazard map with the flood images and reduced it into a binary flood extent mask (flooded vs. non-flooded).
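The reduction of a height-differentiated hazard map to a binary flood extent mask can be sketched as follows. This is a minimal illustration only: the function name, the nested-list grid, and the zero threshold are assumptions made here, not details of the pipeline described above.

```python
def to_binary_flood_mask(hazard_map, threshold=0.0):
    """Return 1 where the surge height exceeds `threshold`, else 0.

    `hazard_map` is a 2D list of surge heights (units are hypothetical);
    the default threshold of 0.0 is an assumption for illustration.
    """
    return [[1 if height > threshold else 0 for height in row]
            for row in hazard_map]


# Toy 2x3 hazard map: zero height means dry land.
hazard = [
    [0.0, 0.4, 1.2],
    [0.0, 0.0, 2.5],
]
mask = to_binary_flood_mask(hazard)
```

Any cell with a positive surge height is marked as flooded, so the height information is deliberately discarded at this step.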
Model architecture. The central model of our pipeline is a generative vision model that learns the physically-conditioned image-to-image transformation from pre-flood image to post-flood image. We leveraged the existing implementation of the GAN pix2pixHD Wang et al. (2018) and extended the input dimensions to to incorporate the flood extent map. Note that the pipeline is modular, such that it can be repurposed for visualizing other climate impacts.
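One way to picture the extended generator input is to append the flood extent mask to the pre-flood image as an extra channel. The sketch below assumes a three-channel RGB tile plus a single mask channel; the function name and the list-based layout are hypothetical, and the actual pix2pixHD extension may be organized differently.

```python
def stack_image_and_mask(rgb_image, flood_mask):
    """Append the mask value to every RGB pixel (height x width x channels)."""
    if len(rgb_image) != len(flood_mask):
        raise ValueError("image and mask heights differ")
    stacked = []
    for pixel_row, mask_row in zip(rgb_image, flood_mask):
        if len(pixel_row) != len(mask_row):
            raise ValueError("image and mask widths differ")
        # Each output pixel is [R, G, B, flooded].
        stacked.append([list(pixel) + [flooded]
                        for pixel, flooded in zip(pixel_row, mask_row)])
    return stacked


# Toy 1x2 tile: the second pixel lies inside the flood extent.
pre_flood = [[(120, 130, 90), (60, 70, 50)]]
flood_mask = [[0, 1]]
generator_input = stack_image_and_mask(pre_flood, flood_mask)
```

Keeping the physical constraint as an input channel leaves the rest of the image-to-image model unchanged, which fits the modular design noted above.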
Evaluation metric: Flood Visualization Plausibility Score (FVPS). Evaluating imagery generated by a GAN is difficult Xu et al. (2018); Borji (2019). Most evaluation metrics measure photorealism or sample diversity Borji (2019), but not physical consistency Ravì et al. (2019) (see, e.g., SSIM Wang et al. (2004), MMD Bounliphone et al. (2016), IS Salimans et al. (2016), MS Tong et al. (2017), FID Heusel et al. (2017); Zhou et al. (2020), or LPIPS Zhang et al. (2018)).
To evaluate physical consistency, as defined in Section 1, we propose using the intersection over union (IoU) between water in the generated imagery and water in the flood extent map. This method relies on flood masks, but because there are no publicly available flood segmentation models for RGB imagery, we trained our own model on hand-labeled flooding images (Section 5.2). This segmentation model produced flood masks of the generated and ground-truth flood images, which allowed us to measure the overlap of water between the two. When the flood masks overlap perfectly, the IoU is 1; when they are completely disjoint, the IoU is 0.
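The IoU between two binary flood masks can be computed as in the following sketch. The function name and the nested-list mask format are illustrative assumptions, as is the convention of returning 1 when both masks are empty.

```python
def flood_iou(mask_a, mask_b):
    """Intersection over union of two equally sized binary masks."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            intersection += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


generated = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]
score = flood_iou(generated, reference)  # 2 shared pixels / 4 flooded in either
```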
To evaluate photorealism, we used the state-of-the-art perceptual similarity metric Learned Perceptual Image Patch Similarity (LPIPS) Zhang et al. (2018). LPIPS computes the feature vectors (of an ImageNet-pretrained deep CNN, AlexNet) of the generated and ground-truth tile and returns the mean-squared error between the feature vectors (best LPIPS is , worst is ).
Because the joint optimization over two metrics poses a challenging hyperparameter optimization problem, we propose to combine the evaluation of physical consistency (IoU) and photorealism (LPIPS) in a new metric, the Flood Visualization Plausibility Score (FVPS). The FVPS is the harmonic mean over the submetrics, IoU and (1 − LPIPS), that are both [0,1]-bounded. Due to the properties of the harmonic mean, the FVPS is 0 if any of the submetrics is 0; the best FVPS is 1.
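A harmonic-mean combination of the two submetrics can be sketched as below. The sketch assumes that photorealism enters as 1 − LPIPS, so that higher is better for both terms, and that both terms lie in [0, 1]; the function name is hypothetical.

```python
def fvps(iou, lpips):
    """Harmonic mean of IoU and (1 - LPIPS), both assumed to lie in [0, 1]."""
    photorealism = 1.0 - lpips  # assumption: flip LPIPS so higher is better
    if iou <= 0.0 or photorealism <= 0.0:
        # The harmonic mean collapses to zero if either submetric is zero.
        return 0.0
    return 2.0 * iou * photorealism / (iou + photorealism)


perfect = fvps(iou=1.0, lpips=0.0)
no_overlap = fvps(iou=0.0, lpips=0.2)
balanced = fvps(iou=0.5, lpips=0.5)
```

Unlike an arithmetic mean, this combination cannot be pushed up by excelling at only one submetric, which is why a photorealistic but physically inconsistent image still receives a low score.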
3 Experimental Results
In terms of both physical-consistency and photorealism, our physics-informed GAN outperforms an unconditioned GAN that does not use physics, as well as a handcrafted baseline model (Fig. 3).
A GAN without physics information generates photorealistic but physically inconsistent imagery. The inaccurately modeled flood extent in Fig. 3e illustrates the physical inconsistency, and a low IoU of in Table 1 over the test set further confirms it (see Section 5.2 for test set details). Despite the photorealism (), the physical inconsistency renders the model untrustworthy for critical decision making, as confirmed by the low FVPS of . The model is the default pix2pixHD Wang et al. (2018), which only uses the pre-flood image and no flood mask as input.
A handcrafted baseline model generates physically-consistent but not photorealistic imagery. Similar to common flood visualization tools Climate Central (2018), the handcrafted model overlays the flood mask input as a hand-picked flood brown (#998d6f) onto the pre-flood image, as shown in Fig. 3g. Because typical storm surge models output flood masks at low resolution ( NOAA National Weather Service National Hurricane Center Storm Surge Prediction Unit (2020)), the handcrafted baseline generates pixelated, non-photorealistic imagery. Combining the high IoU of and the poor LPIPS of , yields a low FVPS of , highlighting the difference from the physics-informed GAN in a single metric.
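The handcrafted overlay can be illustrated with a few lines of Python. Only the color #998d6f comes from the text; the function name and the list-of-tuples image format are assumptions for this sketch.

```python
FLOOD_BROWN = (0x99, 0x8D, 0x6F)  # the hex color #998d6f named in the text


def overlay_flood(rgb_image, flood_mask, color=FLOOD_BROWN):
    """Paint every flooded pixel with a flat color; keep other pixels as-is."""
    return [
        [color if flooded else tuple(pixel)
         for pixel, flooded in zip(pixel_row, mask_row)]
        for pixel_row, mask_row in zip(rgb_image, flood_mask)
    ]


# Toy 1x2 tile: only the second pixel is flooded.
tile = [[(10, 20, 30), (40, 50, 60)]]
tile_mask = [[0, 1]]
flooded_tile = overlay_flood(tile, tile_mask)
```

Because every flooded pixel receives the same flat color, the output follows the mask exactly but carries no texture, which matches the high IoU and poor photorealism described above.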
The proposed physics-informed GAN generates physically-consistent and photorealistic imagery. To create the physics-informed GAN, we trained pix2pixHD Wang et al. (2018) from scratch on our dataset ( on Google Cloud GPUs). This model successfully learned how to convert a pre-flood image and a flood mask into a photorealistic post-flood image, as shown in Fig. 5. The model outperformed all other models in IoU (), LPIPS (), and FVPS () (Table 1). The learned image transformation “in-paints” the flood mask in the correct flood colors and displays an average flood height that does not cover structures (e.g., buildings, trees), as shown in randomly sampled test images in Fig. 4. While our model also outperforms the VAEGAN (BicycleGAN), the latter has the potential to create ensemble forecasts over the unmodeled flood impacts, such as the probability of destroyed buildings.
|GAN w/ phys. (ours)||0.265||0.283||0.502||0.365||0.533||0.408|
|GAN w/o phys.||0.293||0.293||0.226||0.226||0.275||0.275|
|VAEGAN w/ phys.||0.449||-||0.468||-||0.437||-|
4 Discussion and Future Work
Although our pipeline outperformed all baselines in the generation of physically-consistent and photorealistic imagery of coastal floods, there are areas for improvement in future work. For example, our dataset only contained samples and is biased towards vegetation-filled satellite imagery; this data limitation likely contributes to our model rendering human-built structures, such as streets and out-of-distribution skyscrapers in Fig. 4 top-left, as smeared. In addition, the dataset was generated from Maxar imagery and preliminary results suggest that our model does not generalize well to other data sources such as NAIP imagery USDA-FSA-APFO Aerial Photography Field Office (2019). Although we attempted to overcome our data limitations using several state-of-the-art augmentation techniques, this work would benefit from more public sources of high-resolution satellite imagery (experiment details in Section 5.3). Furthermore, the computational intensity of training GANs made it difficult to fine-tune models on new data; improved transfer learning techniques could address this challenge. Lastly, satellite imagery is an internationally trusted source for analyses in deforestation, development, or military Hansen et al. (2013); Anderson et al. (2017), and with the rise of “deep-fake” models, more work is needed in the identification of and education around misinformation and ethical AI Barredo Arrieta et al. (2020). Given our pipeline’s results, however, we hope to deploy with NOAA by integrating flood forecasts with aerial imagery along the entire U.S. East Coast.
Vision for the future. We envision a global visualization tool for climate impacts. Our proposed pipeline can generalize in time, space, and type of event. By changing the input data, future work can visualize impacts of other well-modeled, climate-attributed events, including arctic sea ice melt, wildfires, or droughts. Non-binary climate impacts, such as inundation height or drought strength, can be generated by replacing the binary flood mask with continuous model predictions. Opportunities are abundant for further work in visualizing our changing Earth, and given its potential impact for both climate mitigation and adaptation, we encourage the ML community to take up this challenge.
This research was conducted at the Frontier Development Lab (FDL), US. The authors gratefully acknowledge support from the MIT Portugal Program, National Aeronautics and Space Administration (NASA), and Google Cloud. Further, we greatly appreciate the time, feedback, direction, and help from Prof. Bistra Dilkina, Ritwik Gupta, Mark Veillette, Capt. John Radovan, Peter Morales, Esther Wolff, Leah Lovgren, Guy Schumann, Freddie Kalaitzis, Richard Strange, James Parr, Sara Jennings, Jodie Hughes, Graham Mackintosh, Michael van Pohle, Gail M. Skofronick-Jackson, Tsengdar Lee, Madhulika Guhathakurta, Julien Cornebise, Maria Molina, Massy Mascaro, Scott Penberthy, Derek Loftis, Sagy Cohen, John Karcz, Jack Kaye, Janot Mendler de Suarez, Campbell Watson, and all other FDL researchers.
5.1 Additional Results
Pre- and post-flood imagery
Post-flood images that display standing water are challenging to acquire due to cloud-cover, time of standing flood, satellite revisit rate, and cost of high-resolution imagery. To the best of the authors’ knowledge, xBD Gupta et al. (2019) is the best publicly available data-source for preprocessed high-resolution imagery of pre- and post-flood images. More open-source, high-resolution, pre- and post-disaster images can be found in unprocessed format on DigitalGlobe’s Open Data repository DigitalGlobe (2020).
Data Overview: flood-related RGB image pairs from seven flood events at of resolution of which 30% display a standing flood ().
Flood-related events: hurricanes (Harvey, Florence, Michael, Matthew in the U.S. East or Gulf Coast), spring floods (Midwest U.S., ’19), tsunami (Indonesia), monsoon (Nepal).
Our evaluation test set is composed of 108 images each from hurricanes Harvey and Florence. The test set excludes imagery from hurricanes Michael and Matthew, because the majority of their tiles do not display standing flood.
We did not use digital elevation maps (DEMs), because the information of low-resolution DEMs is contained in the storm surge model and high-resolution DEMs for the full U.S. East Coast are not publicly available.
An important part of pre-processing the xBD data was to correct the geospatial references per image. Correcting the geolocation is necessary to extrapolate our model to visualize future floods across the full U.S. East Coast, based on storm surge model outputs NOAA (2020) and high-resolution imagery USDA-FSA-APFO Aerial Photography Field Office (2019). To align the imagery, we (1) extracted tiles from NAIP that approximately match xBD tiles via Google Earth Engine, (2) detected keypoints in both tiles via AKAZE, (3) identified matching keypoints via the L2-norm in image coordinates, (4) approximated the homography matrix between the two sets of matched keypoints via RANSAC, and (5) applied the homography matrix to transform the xBD tile.
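Step (5) amounts to mapping each pixel coordinate through the estimated 3×3 homography. The sketch below shows only that mapping for a single point in plain Python; the matrix values are made up for illustration, and in practice a library routine (for example OpenCV's `warpPerspective`) would resample the whole tile.

```python
def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography using homogeneous coordinates."""
    x, y = point
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    # Divide by the homogeneous coordinate to return to pixel coordinates.
    return (xh / w, yh / w)


# Hypothetical homography: a pure shift of +5 pixels in x and -3 pixels in y.
H_shift = [[1.0, 0.0, 5.0],
           [0.0, 1.0, -3.0],
           [0.0, 0.0, 1.0]]
corner = apply_homography(H_shift, (10.0, 20.0))
```

A general homography also encodes rotation, scale, and perspective terms, which is why the division by the third coordinate is needed even though it is trivial in this shift-only example.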
For post-flood images, segmentation masks of flooded/non-flooded pixels were manually annotated to train a pix2pix segmentation model Isola et al. (2017) from scratch. The model consisted of a vanilla U-Net for the generator that was trained with L1-loss, IoU, and adversarial loss; its last layers were finetuned solely on L1-loss. A four-fold cross validation was performed leaving images for testing. The segmentation model selected to be used by the FVPS has a mean IoU performance of . Labelled imagery will be made available at the project GitLab.
Storm Surge predictions
Developed by the National Weather Service (NWS), the Sea, Lake and Overland Surges from Hurricanes (SLOSH) model Jelesnianski et al. (1992) estimates storm surge heights from atmospheric pressure, size, forward speed and track data, which are used as a wind model driving the storm surge. The SLOSH model consists of shallow water equations, which consider unique geographic locations, features and geometries. The model is run in deterministic, probabilistic and composite modes by various agencies for different purposes, including NOAA, National Hurricane Center (NHC) and NWS. We use outputs from the composite approach, that is, running the model several thousand times with hypothetical hurricanes under different storm conditions. As a result, we obtain a flood hazard map, as displayed in Fig. 2a, which shows storm-surge, height-differentiated flood extents. Future work will use the state-of-the-art ADvanced CIRCulation (ADCIRC) model Luettich et al. (1992), which has a stronger physical foundation, better accuracy, and higher resolution than SLOSH. ADCIRC storm surge model output data is available for the USA from the Flood Factor online tool developed by First Street Foundation.
Standard data augmentation (rotation, random cropping, hue, and contrast variation) and state-of-the-art augmentation (elastic transformations Simard et al. (2003)) were applied. Further, spectral normalization Miyato et al. (2018) was used to stabilize the training of the discriminator, and a relativistic loss function was implemented to stabilize adversarial training. We also experimented with training pix2pixHD on LPIPS loss. Quantitative evaluation of these experiments, however, showed that they did not have a significant impact on performance and, ultimately, the results in the paper were generated by the PyTorch implementation Wang et al. (2018) extended to -channel inputs.
Pre-training LPIPS on satellite imagery. The standard LPIPS did not clearly distinguish between the handcrafted baseline and the physics-informed GAN, contrasting with the opinion of a human evaluator. This is most likely because LPIPS currently leverages a neural network that was trained on object classification from ImageNet. The neural network might not be capable of extracting meaningful high-level features to compare the similarity of satellite images. In preliminary tests the ImageNet-network would classify all satellite imagery as background image, indicating that the network did not learn features to distinguish satellite imagery. Future work will use LPIPS with a network trained to have satellite imagery specific features, e.g., Tile2Vec or a land-use segmentation Robinson et al. (2019) model.
5.4 Further discussion: Ethical Implications.
Satellite imagery is an internationally trusted data source to conduct analyses in deforestation, development, logistics, or military Hansen et al. (2013); Anderson et al. (2017). Generating artificial satellite imagery can enable various stakeholders with malicious intent to, e.g., depict fake military operations, and result in a loss of trust in satellite imagery. Hence, we have put a strong focus on generating physically-consistent imagery and clearly label our imagery as artificial, following the guidelines for responsible AI Barredo Arrieta et al. (2020). We further encourage analysts to source data from trusted sources (e.g., NASA, ESA, or PT Space) and encourage public education around misinformation and ethical AI.
- Earth observation in service of the 2030 agenda for sustainable development. Geo-spatial Information Science 20 (2), pp. 77–96. Cited by: §4, §5.4.
- Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible ai. Information Fusion 58, pp. 82 – 115. Cited by: §4, §5.4.
- Pros and cons of gan evaluation measures. Computer Vision and Image Understanding 179, pp. 41–65. Cited by: §2.
- A test of relative similarity for model selection in generative models. Cited by: §2.
- Large scale gan training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096. Cited by: §1.
- Gaussian process prior variational autoencoders. In Advances in Neural Information Processing Systems, pp. 10369–10380. Cited by: §1.
- The human cost of weather-related disasters 1995-2015. Cited by: §1.
- Neural ordinary differential equations. In Advances in Neural Information Processing Systems 31, pp. 6571–6583. Cited by: §1.
- Sea level rise, predicted sea level rise impacts on major cities from global warming up to 4c. Cited by: §1, Figure 3, §3.
- Open data for disaster response. Cited by: §5.2.1.
- Generating images with perceptual similarity metrics based on deep networks. In Advances in Neural Information Processing Systems 29, pp. 658–666. Cited by: §1.
- TileGAN: synthesis of large-scale non-homogeneous textures. ACM Trans. Graph. 38 (4). Cited by: §1.
- Creating xBD: A Dataset for Assessing Building Damage from Satellite Imagery. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Cited by: Figure 2, §2, §5.2.1.
- High-resolution global maps of 21st-century forest cover change. Science 342 (6160), pp. 850–853. Cited by: §4, §5.4.
- Gans trained by a two time-scale update rule converge to a local nash equilibrium. In Advances in neural information processing systems, pp. 6626–6637. Cited by: §2.
- Global warming of 1.5c. an ipcc special report on the impacts of global warming of 1.5c above pre-industrial levels and related global greenhouse gas emission pathways, in the context of strengthening the global response to the threat of climate change, sustainable development, and efforts to eradicate poverty. Cited by: §1.
- Image-to-image translation with conditional adversarial networks. In Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on, Cited by: §1, §5.2.2.
- SLOSH: sea, lake, and overland surges from hurricanes. NOAA Technical Report NWS 48, National Oceanic and Atmospheric Administration, U. S. Department of Commerce, pp. 71. Note: (Scanning courtesy of NOAA’s NOS’s Coastal Service’s Center) Cited by: §5.2.3.
- Physics-guided Neural Networks (PGNN): An Application in Lake Temperature Modeling. arXiv e-prints, pp. arXiv:1710.11431. Cited by: §1.
- Auto-encoding variational bayes. Proceedings of the 2nd International Conference on Learning Representations (ICLR). Cited by: §1.
- Deep unsupervised state representation learning with robotic priors: a robustness analysis. In 2019 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. Cited by: §1.
- PDE-Net 2.0: learning pdes from data with a numeric-symbolic hybrid deep network. Journal of Computational Physics 399, pp. 108925. Cited by: §1.
- ADCIRC: an advanced three-dimensional circulation model for shelves coasts and estuaries, report 1: theory and methodology of ADCIRC-2DDI and ADCIRC-3DL. In Dredging Research Program Technical Report, pp. 137. Cited by: §5.2.3.
- Spectral normalization for generative adversarial networks. 2018 International Conference on Learning Representations (ICLR). Cited by: §5.3.
- U.S. Billion-Dollar Weather and Climate Disasters (2020). Cited by: §1.
- National Storm Surge Hazard Maps, Texas to Maine, Category 5. Cited by: Figure 1, Figure 2, §1, §3.
- NOAA sea level rise viewer. Cited by: §1, Figure 3, §5.2.1.
- Universal differential equations for scientific machine learning. ArXiv abs/2001.04385. Cited by: §1.
- Expert interviews with U.S. Air Force meteorologist, technical consultant at World Bank for climate adaptation in the Caribbean, assistant prof. in coastal resources management, and associate prof. in flood modeling, respectively. Cited by: §1.
- Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, pp. 686–707. Cited by: §1.
- Deep hidden physics models: deep learning of nonlinear partial differential equations. Journal of Machine Learning Research 19 (25), pp. 1–24. Cited by: §1.
- Adversarial training with cycle consistency for unsupervised super-resolution in endomicroscopy. Medical image analysis 53, pp. 123–131. Cited by: §2.
- Deep learning and process understanding for data-driven earth system science. Nature 566, pp. 195 – 204. Cited by: §1.
- Predicting landscapes from environmental conditions using generative networks. In German Conference on Pattern Recognition, pp. 203–217. Cited by: §1.
- Large scale high-resolution land cover mapping with multi-resolution data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Cited by: §5.3.
- Improved techniques for training gans. In Advances in neural information processing systems, pp. 2234–2242. Cited by: §2.
- Visualizing the consequences of climate change using cycle-consistent adversarial networks. International Conference on Learning Representations (ICLR) Workshop on Tackling Climate Change with AI. Cited by: §1, §1.
- Best practices for convolutional neural networks applied to visual document analysis. In ICDAR, Vol. 3. Cited by: §5.3.
- Surging seas: sea level rise analysis. Cited by: Figure 2, §1.
- Mode regularized generative adversarial networks. In International Conference on Learning Representations, Cited by: §2.
- National Geospatial Data Asset National Agriculture Imagery Program (NAIP) Imagery. Cited by: Figure 2, §4, §5.2.1.
- High-resolution image synthesis and semantic manipulation with conditional gans. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp. 8798–8807. Cited by: §1, Figure 3, §2, §3, §3, §5.3.
- Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing 13 (4), pp. 600–612. Cited by: §2.
- An empirical study on evaluation metrics of generative adversarial networks. arXiv preprint arXiv:1806.07755. Cited by: §2.
- The unreasonable effectiveness of deep features as a perceptual metric. In CVPR, Cited by: §2, §2.
- Establishing an evaluation metric to quantify climate change image realism. Machine Learning: Science and Technology 1 (2), pp. 025005. Cited by: §2.
- Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Cited by: §1.
- Toward multimodal image-to-image translation. In Advances in Neural Information Processing Systems (NeurIPS) 30, pp. 465–476. Cited by: §1, Figure 3.