# An Attention-Based System for Damage Assessment Using Satellite Imagery

## Abstract

When disaster strikes, accurate situational information and a fast, effective response are critical to save lives. Widely available, high resolution satellite images enable emergency responders to estimate locations, causes, and severity of damage. Quickly and accurately analyzing the extensive amount of satellite imagery available, though, requires an automatic approach. In this paper, we present the Siam-U-Net-Attn model, a multi-class deep learning model with an attention mechanism, to assess damage levels of buildings given a pair of satellite images depicting a scene before and after a disaster. We evaluate the proposed method on xView2, a large-scale building damage assessment dataset, and demonstrate that the proposed approach achieves accurate damage scale classification and building segmentation results simultaneously.

## 1 Introduction

Natural disasters wreak havoc on nations. They kill approximately 90,000 people every year and affect 160 million people around the globe [1]. Furthermore, areas afflicted by weather and climate disasters sustain significant physical, social, and economic devastation. Short-term effects of disasters evolve into long-term ramifications that linger for years to come [5, 1]. Considering economic consequences alone reveals staggering figures. For example, the 2010 Haiti earthquake inflicted approximately \$7.8 billion to \$8.5 billion in damages to infrastructure [3]. In 2019, the United States endured fourteen distinct natural disasters whose overall damages each exceeded \$1 billion [12]. Environmental climate analyses also indicate that the frequency and brutality of natural disasters will increase in the future due to climate change and rising greenhouse gas emissions [2, 5]. Therefore, the impact of disasters is immediate, far-reaching, and continuously growing.

With the increase in severity and regularity of disasters, preparation for disaster recovery and emergency resource planning is needed now more than ever. Emergency responders require rapid and reliable situational details to save disaster victims while ensuring their own safety during rescue efforts. Moreover, accurate damage estimates assist responders in determining evacuation plans and in preventing secondary disasters caused by collapses of damaged buildings. In the long run, damage assessment estimates also empower planning efforts for building and infrastructure repairs.

Very high resolution (VHR) satellite imagery is increasingly available due to an ever-expanding fleet of commercial satellites, such as DigitalGlobe’s WorldView satellites [15]. VHR imagery enables detailed assessment of disaster damage at the building level. With the recent improvement in machine learning methods, especially deep learning approaches, rapid analysis of large amounts of VHR satellite imagery is feasible and this facilitates damage estimation and aids in disaster relief efforts. In this paper, we propose a Siam-U-Net-Attn model to quickly and accurately estimate the damage of a disaster. Our approach analyzes two satellite images of the same scene, acquired before and after a disaster. It then produces a mask showing buildings with labels that indicate different damage scale levels, as depicted in Figure 1.

The main contributions of this work include:

• Development of a multi-class deep learning model with an attention technique that accurately classifies damage levels of buildings in satellite imagery.

• Production of semantic building segmentation masks using the proposed method.

• Demonstration that the proposed model achieves better results for building damage scale classification than other methods while simultaneously achieving accurate building segmentation results.

## 2 Related Work

The proposed method achieves building damage scale classification by analyzing buildings within satellite imagery and determining the level of damage inflicted on them. Due to limited amounts of labeled data, most research addressing damage scale classification instead simplifies this multi-class task to a change detection operation, which assigns a binary label, damage or no-damage, to each building. Research approaches for solving change detection fall into several broad categories [4].

Algebra-based change detection techniques perform mathematical operations on image pixels to obtain a difference image. Such approaches, including image differencing [20] and change vector analysis [29], involve a threshold selection process to determine which components changed. Algebra-based change detection methods are relatively simple to implement, but they do not provide contextual information about the detected changes.

Transform-based change detection approaches transform event images. Image transforms, including the standard principal component analysis approach [31], strive to determine pertinent information for the change detection task. While transforming the images enables analysis of change in a different dimensionality, it also presents challenges in labeling regions of change in the event images themselves.

Classification-based change detection methods usually rely on larger amounts of labeled data. They easily extend to the multi-class damage scale classification task considered in this paper. Xu et al. [34] and Fujita et al. [13] describe several models for this objective, including a single-stream model and a double-stream model (i.e., Siamese network). Their models evaluate a pair of input images of a scene before and after a disaster. Then, they produce a single binary classification label, indicating whether the image contains damage or no-damage. Similarly, Nex et al. [27] propose a binary classification model based on DenseNet [19] with dilated convolution [35] to achieve a larger receptive field. Mou et al. [26] and Lyu et al. [25] introduce recurrent neural networks to jointly learn spectral-spatial-temporal features for change detection. Connors et al. [7] design a semi-supervised method that uses a variational autoencoder [22] to infer change detection labels without ground truth for every training instance. An unsupervised method is proposed by Liu et al. [23] using active learning [32] to construct training samples and using a graph convolutional network [14] for change detection. However, none of these approaches produce pixel-wise classification masks.

There is some research in constructing building classification masks in an unsupervised manner. De Jong et al. [10] utilize the U-Net model [30] to detect changes in satellite images. They first train a U-Net model for the building segmentation task. During change detection inference, they collect two sets of features from the trained U-Net model (i.e., activations of different layers in the U-Net), given two query images. Then, the difference of the two sets of features forms the change detection map. Another approach, a deep convolutional coupling network proposed by Liu et al. [24], uses both optical and radar images for unsupervised change detection. They use an ad-hoc weight initialization for the network that is based on the noise models of the optical and radar images to help the model learn the proper features during training.

Supervised classification methods constitute the final category of solutions to the change detection task. Demir et al. [11] propose a method that only requires the annotation of one image in a time series. They train a supervised classification model using a dataset constructed by an active learning approach [32]. Chu et al. [6] apply deep belief networks (DBNs) [17] to produce a change detection map. Two DBNs are used for extracting features from the image regions that contain changes and do not contain changes, respectively. They compare the feature distances obtained from the two DBNs for each image patch to construct the change detection map. Papadomanolaki et al. [28] combine the U-Net model with an LSTM [18] model in order to use temporal information from multiple frames of satellite imagery. Compared to results that use only two input frames, their model achieves better performance. Daudt et al. [8] propose using an encoder-decoder-based architecture to produce the change detection map. The decoder upsamples features extracted from the encoder to generate a mask indicating damage levels throughout the region under analysis. They also improve on this performance in [9] by combining the semantic segmentation task with the change detection task to achieve multi-task learning. They use two U-Net models in total, one for each task. The semantic segmentation U-Net utilizes one image (taken either before or after the change event) to produce the segmentation mask of objects of interest. The change detection U-Net utilizes two images (i.e., one from before the changes and one from after the changes) as well as the features extracted from the semantic segmentation model to produce the change detection mask. By fusing the features together, they achieve better performance in the change detection task.

Our proposed model extends this concept and combines the previously mentioned U-Net model with the Siamese model. The U-Net model learns the semantic segmentation of buildings, while the Siamese model learns the damage scale classification. The use of the Siamese model allows us to reduce the number of learned parameters and the size of the model during both training and inferencing in comparison to [9]. By combining these models, we achieve multi-task learning of both segmentation and classification. Additionally, we introduce a self-attention module that improves the performance by incorporating long-range information from the entire image.

## 3 Our Proposed Method

We propose a Siam-U-Net-Attn model for damage classification and building segmentation, as shown in Figure 2. It is inspired by [8, 30]. One element of this architecture is a U-Net model that analyzes a single input image and produces a segmentation mask showing building locations in the input image. The U-Net model is a fully convolutional network that was proposed by [30] for image segmentation. Besides its encoder-decoder structure for local information extraction, it also utilizes skip connections to retain global information. A single U-Net model analyzes both input frames, which depict the same scene pre-disaster and post-disaster, respectively. Since the U-Net focuses on the building segmentation objective, it is agnostic to the disaster. In other words, we can use the same model for both pre-disaster and post-disaster images to produce one binary mask per input frame. The two green regions in Figure 2 indicate the shared U-Net model for the two frames.

The features extracted from the encoder regions of the U-Net model also assist in the damage scale classification task. The two-stream features produced by the U-Net encoder and a new, separate decoder constitute the Siamese network, shown as the blue region in Figure 2. In the Siamese network, we compare features from the two input frames to detect the damage levels of buildings. Simple differencing and channel-wise concatenation are two methods to compare the two-stream features. By comparing features from the two frames, the Siamese model evaluates the differences between the features in order to assess the damage levels. Figure 2 shows the architecture of the Siam-U-Net-Attn in difference mode (i.e., Siam-U-Net-Attn-diff). The Siam-U-Net-Attn in concatenation mode (i.e., Siam-U-Net-Attn-conc) can be obtained by replacing the difference operations with channel-wise concatenation operations. In Section 5, we will compare the performance of the proposed model in difference and concatenation modes.
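The two merge modes can be illustrated with a minimal sketch. It assumes feature maps stored as nested lists shaped `[channel][position]`. The function names and the subtraction order (post-disaster minus pre-disaster) are illustrative assumptions, not details given in the text.

```python
def merge_diff(feat_pre, feat_post):
    """Difference mode: element-wise difference, channel count unchanged.

    The subtraction order (post - pre) is an assumption made for this sketch.
    """
    return [[b - a for a, b in zip(ch_pre, ch_post)]
            for ch_pre, ch_post in zip(feat_pre, feat_post)]


def merge_conc(feat_pre, feat_post):
    """Concatenation mode: stack channels, so the channel count doubles."""
    return [list(ch) for ch in feat_pre] + [list(ch) for ch in feat_post]


# Toy features: 2 channels, 2 spatial positions each.
feat_pre = [[1.0, 2.0], [3.0, 4.0]]
feat_post = [[1.0, 5.0], [0.0, 4.0]]

diff = merge_diff(feat_pre, feat_post)  # 2 channels
conc = merge_conc(feat_pre, feat_post)  # 4 channels
```

Differencing keeps the decoder input at the original channel count but discards the absolute feature values. Concatenation keeps both streams intact at the cost of twice as many input channels for the following layer.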

Analyzing a building by itself is not sufficient for accurate damage level classification. It is also necessary for the network to consider the area surrounding buildings in its assessment. For example, natural disasters such as floods may not damage a building’s roof, but water surrounding the building may indicate interior damage. Since convolution is a local operation that can only access local neighborhoods, we use a self-attention module [33, 36] to capture long-range information. Figure 3 illustrates the mechanism of the self-attention module. Assume the input feature map is $x \in \mathbb{R}^{C \times N}$, where $N$ is the flattened size of the feature map along the height and width dimensions (i.e., $N = H \times W$) and $C$ is the number of channels of the input feature. To compute the attention map, we first transform the input features into two feature spaces by:

$$f(x) = W_f x, \quad g(x) = W_g x.$$

The attention map is calculated as

$$a(x) = \mathrm{Softmax}\left(f(x)^T g(x)\right).$$

The Softmax function is computed along the second dimension to normalize each row of the attention map. We then apply the attention map to the input features as:

$$o(x) = h(x)\, a(x)^T,$$

where $h(x) = W_h x$.

$W_f$, $W_g$, and $W_h$ are trainable parameters that are implemented as convolution operations with a kernel size of $1 \times 1$. Following [36], we choose a reduced channel dimension for $f(x)$ and $g(x)$ to reduce memory usage. The final output of the self-attention module is a weighted summation of the original input with the attention feature:

$$y(x) = \gamma\, o(x) + x,$$

where $\gamma$ is also a learnable parameter. Therefore, each value of the self-attention output contains information of every input feature provided by the attention map. As shown in Figure 2, the model invokes a self-attention module after merging the features from the two input frames. It is important to note that the attention map from the self-attention module requires a lot of memory for large-resolution features, so we place the module in a low-resolution layer to reduce the memory usage.
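The four equations above can be traced with a small pure-Python sketch. It assumes the input is stored as a `C x N` nested list and that each projection is a plain matrix applied per position, which is equivalent to a convolution with a 1 × 1 kernel. The helper names and the toy weights are hypothetical; a real model would learn the weights and use a tensor library.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def softmax_rows(m):
    """Normalize each row so it sums to one (softmax along the second dimension)."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def self_attention(x, w_f, w_g, w_h, gamma):
    """x: C x N features. Returns (y, a) with y: C x N and a: N x N."""
    f = matmul(w_f, x)                         # reduced channels x N
    g = matmul(w_g, x)                         # reduced channels x N
    h = matmul(w_h, x)                         # C x N
    a = softmax_rows(matmul(transpose(f), g))  # N x N attention map
    o = matmul(h, transpose(a))                # C x N attention feature
    y = [[gamma * o_v + x_v for o_v, x_v in zip(o_row, x_row)]
         for o_row, x_row in zip(o, x)]
    return y, a


# Toy input: C = 2 channels, N = 3 flattened positions.
x = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
w_f = [[1.0, 0.0]]              # hypothetical projection to 1 channel
w_g = [[0.0, 1.0]]              # hypothetical projection to 1 channel
w_h = [[1.0, 0.0], [0.0, 1.0]]  # identity, for readability only

y, attn = self_attention(x, w_f, w_g, w_h, gamma=0.5)
```

The attention map has `N x N` entries, so its memory cost grows with the square of the number of spatial positions. This is why placing the module at a low-resolution layer keeps it affordable.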

## 4 Dataset

In this paper, we use the xView2 dataset [16] for both training and testing. This dataset is designed for the task of building damage assessment and covers a wide variety of disaster events, such as tsunamis, earthquakes, and volcanic eruptions. It contains 2,799 pairs of pre-event/post-event multi-band images with a resolution of $1024 \times 1024$ pixels. Additionally, it contains segmentation ground truth masks with building polygons and classification labels indicating damage levels. There are four damage levels: no-damage, minor-damage, major-damage, and destroyed. [16] describes the scoring method used to assign damage levels.

To reduce the memory usage during training and testing, we use image patches of size $256 \times 256$ as the inputs to our system. We crop every satellite image into 16 non-overlapping patches, each sized $256 \times 256$. The final dataset contains 44,784 pairs of image patches. We also use data augmentation methods (i.e., horizontal/vertical flipping, random color jittering, and random cropping) during training to reduce overfitting. Random color jittering and cropping are applied independently to pre-event and post-event images to simulate poor image normalization and registration. We implement two different data splitting methods to separate the dataset into training, validation, and testing sets. For the first split (Split I), we crop full-resolution images into patches and then separate the patches into training, validation, and testing sets according to a fixed ratio. For the second split (Split II), we separate the full-resolution images into the different sets before cropping them into patches. The reason for these two dataset splits is to explore how the method performs on scenes it has never seen before. In Split I, the training and testing datasets could both contain image patches from the same full-resolution image. Thus, patches of the same scene could be contained in both the training and testing sets. In Split II, we ensure that the training and testing patches come from different full-resolution images. Therefore, the training and testing datasets contain different scenes, simulating performance in a real-world scenario when the model is presented with images it has never seen.
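The difference between the two splits can be made concrete with a short sketch. It assumes 1024 × 1024 source images cropped on a 4 × 4 grid, and it uses a placeholder split ratio of 0.8/0.1/0.1 because the ratio used in the experiments is not given in this text. The function names and the random seed are hypothetical.

```python
import random

IMAGE_SIZE = 1024            # assumed full-resolution side length
GRID = 4                     # 4 x 4 = 16 non-overlapping patches per image
PATCH = IMAGE_SIZE // GRID   # 256
NUM_IMAGES = 2799
RATIOS = (0.8, 0.1, 0.1)     # placeholder ratio, not the paper's value


def split(items, ratios, seed=0):
    """Shuffle items and cut them into train/validation/test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * ratios[0])
    n_val = int(len(items) * ratios[1])
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


def to_patches(image_ids):
    """Expand image ids into (image_id, patch_index) pairs."""
    return [(i, p) for i in image_ids for p in range(GRID * GRID)]


def scenes(patches):
    return {image_id for image_id, _ in patches}


all_patches = to_patches(range(NUM_IMAGES))

# Split I: crop first, then split the patches.
train1, val1, test1 = split(all_patches, RATIOS)

# Split II: split the full images first, then crop each subset.
train_imgs, val_imgs, test_imgs = split(range(NUM_IMAGES), RATIOS)
train2, test2 = to_patches(train_imgs), to_patches(test_imgs)

shared_split1 = scenes(train1) & scenes(test1)  # scenes seen in both sets
shared_split2 = scenes(train2) & scenes(test2)  # empty by construction
```

Under Split I nearly every test scene also contributes patches to the training set, whereas Split II keeps the two sets of scenes disjoint.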

## 5 Experimental Results

As shown in Figure 2, our model consists of eight convolution blocks for the encoder and decoder components. We design the eight convolution blocks to control the resolution of the middle layer (i.e., the layer with the smallest feature resolution). Each downsampling block consists of convolution, ReLU, batch normalization, and maxpooling layers. Each upsampling block consists of upsampling with bilinear interpolation, convolution, batch normalization, and ReLU layers. The output damage scale classification mask has five channels: the four damage levels plus one background label. We use weighted binary cross-entropy loss and multi-label cross-entropy loss for the building segmentation loss $L_s$ and damage scale classification loss $L_d$, respectively, which are defined as:

$$L_s = -\left(w_{s,1}\, y_s \log p_s + w_{s,0}\,(1 - y_s) \log(1 - p_s)\right),$$

$$L_d = -\sum_{c=1}^{5} w_{d,c}\, y_d(c) \log p_d(c).$$

$y_s$ and $p_s$ are the ground truth label and the detected building segmentation probability, respectively, while $y_d(c)$ and $p_d(c)$ are the ground truth label and the detected classification probability for damage scale $c$. The $w_s$ and $w_d$ terms are weights applied to each class to address the class imbalances present in our dataset. Since most areas in our images do not contain any buildings, we choose a larger weight for the building class, indicated by $w_{s,1}$ in the segmentation loss $L_s$. Additionally, undamaged buildings are more common than damaged buildings in our dataset. Therefore, we also select larger weights for the damaged-building classes compared to the non-damaged buildings in the damage scale classification loss $L_d$. Table 1 shows the empirical weights we use.

The final loss function is the summation of the two building segmentation losses for the two input frames plus the damage scale classification loss. The Adam optimizer [21] is used to train the proposed models. We train our model for 100 epochs with an initial learning rate of 0.001. The learning rate linearly decays to zero in the final epoch.
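As a rough per-pixel illustration of the two losses and their sum, the following sketch evaluates them on scalar probabilities. The weight values and the class ordering are hypothetical placeholders (the empirical weights are listed in Table 1, which is not reproduced here), and the clamping constant is an implementation detail assumed for numerical safety.

```python
import math

EPS = 1e-7  # assumed clamp so that log() stays finite


def _clamp(p):
    return min(max(p, EPS), 1.0 - EPS)


def segmentation_loss(y_s, p_s, w_s1, w_s0):
    """Weighted binary cross-entropy for one pixel."""
    p = _clamp(p_s)
    return -(w_s1 * y_s * math.log(p) + w_s0 * (1 - y_s) * math.log(1.0 - p))


def damage_loss(y_d, p_d, w_d):
    """Weighted cross-entropy over the five output channels for one pixel."""
    return -sum(w * y * math.log(_clamp(p)) for w, y, p in zip(w_d, y_d, p_d))


def total_loss(seg_pre, seg_post, damage):
    """Sum of both frames' segmentation losses and the damage loss."""
    return seg_pre + seg_post + damage


# Hypothetical weights: building pixels and damaged classes weigh more.
W_S1, W_S0 = 2.0, 1.0
W_D = [1.0, 1.0, 3.0, 3.0, 3.0]  # hypothetical class ordering

loss_pre = segmentation_loss(1, 0.5, W_S1, W_S0)    # building pixel
loss_post = segmentation_loss(0, 0.5, W_S1, W_S0)   # background pixel
loss_dmg = damage_loss([0, 0, 1, 0, 0], [0.1, 0.1, 0.6, 0.1, 0.1], W_D)
loss = total_loss(loss_pre, loss_post, loss_dmg)
```

With equal predicted probabilities, the building pixel contributes twice the loss of the background pixel in this toy setup, which is the effect the class weights are meant to produce.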

We compare the two proposed models with the three methods from [8]: fully convolutional early fusion (FC-EF), fully convolutional Siamese-difference (FC-Siam-diff), and fully convolutional Siamese-concatenation (FC-Siam-conc). The FC-EF model is essentially the U-Net model we described previously. Its input is the pre-disaster and post-disaster frames after concatenation along their channels. The FC-Siam-diff and FC-Siam-conc models utilize the Siamese model without the U-Net decoder used in the proposed method. These methods are designed for the change detection task and thus operate in a binary classification fashion. To compare these models with our proposed method, we changed their output layers from binary classification layers to multi-label classification layers. We also used the same training settings we selected for our method, including the optimizer and learning rate, since the authors of the compared methods do not provide the training parameters they used in their papers.

Table 2 shows a quantified comparison of damage scale classification results for Split I and Split II. To evaluate performance, we use the same evaluation metrics as proposed in the xView2 challenge [16]. The evaluation metric for the building segmentation task is defined as:

$$F1_s = \frac{2\,TP_s}{2\,TP_s + FP_s + FN_s},$$

where $TP_s$, $FP_s$, and $FN_s$ are the number of true-positive, false-positive, and false-negative pixels of segmentation results for the entire testing set. Since the compared methods only produce multi-class damage scale classification masks, we binarize them to create segmentation masks for comparison purposes.
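A minimal sketch of this segmentation metric follows, including the binarization step for models that only output damage classes. It assumes masks are nested lists of integer labels with 0 as the background label, which is an assumption of this sketch. For brevity it scores a single mask, whereas the metric described above accumulates the counts over the entire testing set.

```python
BACKGROUND = 0  # assumed background label index


def binarize(damage_mask):
    """Map every non-background damage label to 1 (building)."""
    return [[0 if label == BACKGROUND else 1 for label in row]
            for row in damage_mask]


def segmentation_f1(pred, truth):
    """Pixel-wise F1 from true-positive, false-positive, false-negative counts."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


damage_mask = [[0, 1, 3],
               [0, 0, 2]]   # multi-class prediction
truth = [[0, 1, 1],
         [0, 1, 0]]         # binary building ground truth

pred = binarize(damage_mask)
score = segmentation_f1(pred, truth)  # tp = 2, fp = 1, fn = 1
```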

The evaluation metric for the damage scale classification task is defined as the harmonic mean of the F1 scores for the four damage scales:

$$F1_d = \frac{4}{\sum_{c \in \{1,2,3,4\}} \left(F1_c + \epsilon\right)^{-1}},$$

where $F1_c$ is the F1 score for the class $c$, which is defined as:

$$F1_c = \frac{2\,TP_c}{2\,TP_c + FP_c + FN_c}.$$

$TP_c$, $FP_c$, and $FN_c$ are the number of true-positive, false-positive, and false-negative pixels of the class $c$ for the testing set. Note that this testing set does not include background pixels; it only includes pixels from the foreground as determined by the building localization ground truth.
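The harmonic-mean metric can be sketched as follows. The inputs are assumed to be flat lists of labels restricted to ground-truth building pixels, with classes numbered 1 to 4. The value of the smoothing constant is an assumption of this sketch.

```python
EPS = 1e-6  # assumed value of the smoothing constant epsilon


def class_f1(pred, truth, cls):
    """F1 score of one damage class over foreground pixels."""
    tp = sum(1 for p, t in zip(pred, truth) if p == cls and t == cls)
    fp = sum(1 for p, t in zip(pred, truth) if p == cls and t != cls)
    fn = sum(1 for p, t in zip(pred, truth) if p != cls and t == cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def damage_f1(pred, truth, classes=(1, 2, 3, 4)):
    """Harmonic mean of the per-class F1 scores."""
    scores = [class_f1(pred, truth, c) for c in classes]
    return len(scores) / sum(1.0 / (s + EPS) for s in scores)


truth = [1, 1, 2, 3, 4, 4]
pred = [1, 2, 2, 3, 4, 1]

per_class = [class_f1(pred, truth, c) for c in (1, 2, 3, 4)]
overall = damage_f1(pred, truth)
```

Because the harmonic mean is dominated by its smallest term, a model that scores near zero on a single damage level receives a near-zero overall score even when the other classes are predicted well. This makes the metric sensitive to rare classes.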

The proposed approaches outperform the compared methods for the damage scale classification task by a large margin. With the help of the self-attention module, the proposed methods produce better damage scale classification results using long-range information, as described in Section 3. However, Split II proves to be more difficult than Split I, and we see a drop in performance. Although there is no overlap between training and testing samples, the degradation indicates that the model might memorize damage levels based on image scenes. Thus, it could potentially classify patches by recognizing which scene they depict and matching them to scenes already learned, rather than learning to recognize damage in a way that could be applied to new, never-before-seen imagery. Therefore, although the two dataset splits are legitimate in terms of separating training and testing data, Split II may avoid model overfitting and present a more reliable analysis of model performance. For our analysis for the rest of this section, we only consider the results from Split II.

All methods achieve similar performance in the building segmentation task. This is because the proposed methods also use a U-Net for the segmentation task, and the self-attention module that we implement enhances results on the damage scale classification task only.

The Siamese models in concatenation mode (i.e., the FC-Siam-conc and Siam-U-Net-Attn-conc models) achieve slightly better results than the models in difference mode (i.e., the FC-Siam-diff and Siam-U-Net-Attn-diff models). This is because channel-wise concatenation retains more information than simple differencing.

Figure 4 shows the damage scale classification results from the proposed and compared models for a specific scene. The ground truth classification of the buildings in this example is major-damage since a flooding region, which appears as a brownish-yellowish color in the post-event image patch, completely surrounds the buildings. The results in the second row of Figure 4 indicate that the three compared methods detect and localize most of the buildings but fail to accurately classify their damage levels. The FC-Siam-conc model achieves the best results amongst the compared methods. Compared to the FC-EF and FC-Siam-diff models, it avoids a false alarm detection in the top-left region of the image patch. Moreover, compared to the FC-Siam-diff model, the concatenation operation from the FC-Siam-conc and FC-EF models helps preserve the necessary information to correctly detect and classify the building in the bottom-left of the image patch. However, none of these methods assign the correct damage level labels completely. By comparison, our two proposed methods successfully classify all the buildings, shown in the third row of results of Figure 4. They also segment most of the buildings in the image patch correctly. Compared to the Siam-U-Net-Attn-diff model, the concatenation operation from the Siam-U-Net-Attn-conc model also helps to correctly detect the building in the bottom-left of the image patch.

Figure 5 depicts some challenging cases in the testing set and highlights the capabilities of our proposed methods to correctly classify them. The first challenge case involves two buildings partially occluded by trees in the post-event image patch (i.e., the left-most and bottom-most buildings). Although the FC-Siam-conc model accurately classifies the buildings in the center of the image patch, it misses the left building entirely and most of the bottom building. By comparison, our Siam-U-Net-Attn-conc model detects both of the occluded buildings correctly. However, it also reports a false alarm. The second challenge case we consider contains more occlusion, due to cloud cover. The compared method completely fails to detect the building covered by the clouds. Even though it is difficult for a human to recognize the building location in this case, our proposed method detects it. The third challenge case considers two image patches of a co-registered scene taken with different off-nadir angles (i.e., viewing angle from the sensor to the ground). Both the compared method and the proposed method achieve an accurate classification of most of the buildings. However, the compared method misses two buildings located towards the bottom of the image patch. Although the two buildings are still visible in the post-event image patches, the different off-nadir angles change the appearance of the scene considerably. Nonetheless, the proposed model detects the two buildings, even with this large variation in appearance. Therefore, our proposed methods provide a more robust damage scale classification than the compared methods.

Figure 6 presents the F1 scores of each damage scale level. Overall, the proposed methods, especially the Siam-U-Net-Attn-conc model, perform better than the compared methods for most damage scale levels. Most of the methods achieve the best performance on buildings with no-damage and achieve the worst performance on buildings with minor-damage. Minor-damage buildings present the most difficult challenge because they usually do not exhibit visible damage on the buildings themselves. Damage assessment experts from [16] consider buildings as minor-damage due to flooding regions, volcano flow, or burned trees partially surrounding them. This is very similar to the major-damage classification, except that a major-damage label indicates that such elements completely surround that particular building. Thus, these two similar damage scale levels present a greater challenge for damage scale classification models. As shown in Figure 7, the proposed Siam-U-Net-Attn-conc model fails to recognize that the water region (i.e., the dark green region in the post-event image patch) only partially surrounds the buildings. Instead, it mislabels the buildings as major-damage. All of the compared and proposed methods demonstrate this behavior, performing worse on the minor-damage buildings as compared to other damage scale levels.

The utility of the self-attention module can also be visualized. We portray an attention map in Figure 8 to demonstrate the effectiveness of the self-attention module. For a given query location (i.e., the red point in the post-event image patch), we can obtain the corresponding attention map. Pixel values in the attention map indicate the importance of that pixel to the query point. The brighter a pixel is, the more important it is to the classification efforts at the query point. In the area shown in the example, the brownish-yellowish area in the post-event image patch indicates the flooding region. According to the attention map, the self-attention model highlights this flooding area, which aids the model in classifying the buildings’ damage levels.

Figure 9 shows some examples of final, full-resolution building segmentation masks constructed from the image patches used by the proposed models. Since the models operate on image patches, the model results must be stitched together to create a full-resolution mask corresponding to the original image. For visualization purposes, we crop the original full-resolution images differently than the method outlined previously. This cropping method is only performed on images in the testing dataset, solely for the purpose of producing better and more coherent visual results. The goal of this different procedure is to reduce abrupt edges at the boundaries of adjacent patches. We use a moving-window approach to crop full-resolution images into patches with overlapping regions. The stride for the moving-window is 32 pixels in both the vertical and horizontal directions. Then, the model analyzes these patches and produces corresponding segmentation maps. Next, we use a voting strategy for each pixel contained in the overlapping regions to determine the final segmentation mask. More specifically, we sum the probabilities of each class to calculate five overall probabilities that a specific pixel belongs to each of the damage level classes. Then, we label the pixel under consideration as the class with the maximum probability. The two examples in Figure 9 show that our proposed methods perform well on the building segmentation task in cases with dense and sparse building densities.
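The moving-window voting procedure can be sketched on a toy image. In this sketch `predict` stands in for the trained model and is a hypothetical callable that returns per-pixel class probabilities for the patch at a given offset. The toy sizes (an 8 × 8 image, 4 × 4 patches, stride 2, two classes) are chosen only to keep the example small. The function assumes the window positions cover the whole image.

```python
def window_starts(size, patch, stride):
    """Top-left offsets of the moving window along one axis."""
    return list(range(0, size - patch + 1, stride))


def stitch(image_size, patch, stride, num_classes, predict):
    """Sum class probabilities over overlapping patches, then take the argmax."""
    acc = [[[0.0] * num_classes for _ in range(image_size)]
           for _ in range(image_size)]
    for top in window_starts(image_size, patch, stride):
        for left in window_starts(image_size, patch, stride):
            probs = predict(top, left)  # patch x patch x num_classes
            for i in range(patch):
                for j in range(patch):
                    cell = acc[top + i][left + j]
                    for c in range(num_classes):
                        cell[c] += probs[i][j][c]
    return [[max(range(num_classes), key=cell.__getitem__) for cell in row]
            for row in acc]


def toy_predict(top, left, patch=4, half=4):
    """Hypothetical model: class 1 on the right half of the image, else class 0."""
    return [[[0.1, 0.9] if left + j >= half else [0.9, 0.1]
             for j in range(patch)] for _ in range(patch)]


labels = stitch(image_size=8, patch=4, stride=2, num_classes=2,
                predict=toy_predict)

# With the sizes discussed in the text (1024-pixel images, 256-pixel patches,
# stride 32), each axis has this many window positions:
positions_per_axis = len(window_starts(1024, 256, 32))
```

Summing probabilities rather than counting hard votes lets confident predictions from one window outweigh uncertain ones from another, which helps smooth the seams between adjacent patches.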

## 6 Conclusion

In this paper, we present a Siam-U-Net-Attn model with self-attention for building segmentation and damage scale classification in satellite imagery. The proposed technique compares pairs of images captured before and after disasters to produce segmentation masks that indicate damage scale classifications and building locations. Results show that the proposed model accomplishes both damage classification and building segmentation more accurately than other approaches with the xView2 dataset. We use the self-attention module to enhance damage scale classification by considering information from the entire image.

### References

1. World Health Organization (WHO) Environmental health in emergencies. Note: https://www.who.int/environmental_health_emergencies/natural_events/en Cited by: §1.
2. M. V. Aalst (2006-03) The impacts of climate change on the risk of natural disasters. Disasters. External Links: Link Cited by: §1.
3. K. Amadeo Haiti earthquake facts, its damage, and effects on the economy. Note: https://www.thebalance.com/haiti-earthquake-facts-damage-effects-on-economy-3305660 Cited by: §1.
4. A. Asokan and A. Jude (2019-03) Change detection techniques for remote sensing applications: a survey. Earth Science Informatics 12 (2), pp. 1–18. External Links: Link Cited by: §2.
5. L. Boustan, M. Kahn, P. Rhode and M. Yanguas (2019-06) The effect of natural disasters on economic activity in US counties: a century of data. National Bureau of Economic Research, Inc. NBER Working Papers 23410. External Links: Link Cited by: §1.
6. Y. Chu, G. Cao and H. Hayat (2016-11) Change detection of remote sensing image based on deep neural networks. Proceedings of the International Conference on Artificial Intelligence and Industrial Engineering. Note: Beijing, China External Links: Link Cited by: §2.
7. C. Connors and R. R. Vatsavai (2017-07) Semi-supervised deep generative models for change detection in very high resolution imagery. IEEE International Geoscience and Remote Sensing Symposium, pp. 1063–1066. External Links: Link Cited by: §2.
8. R. C. Daudt, B. L. Saux and A. Boulch (2018-10) Fully convolutional siamese networks for change detection. Proceedings of the IEEE International Conference on Image Processing, pp. 4063–4067. Note: Athens, Greece External Links: Link Cited by: §2, §3, Figure 5, Table 2, §5.
9. R. C. Daudt, B. L. Saux, A. Boulch and Y. Gousseau (2019-10) Multitask learning for large-scale semantic change detection. Computer Vision and Image Understanding 187, pp. 102783. External Links: Link Cited by: §2, §2.
10. K. L. de Jong and A. S. Bosman (2019-07) Unsupervised change detection in satellite images using convolutional neural networks. Proceedings of the International Joint Conference on Neural Networks. Note: Budapest, Hungary External Links: Link Cited by: §2.
11. B. Demir, F. Bovolo and L. Bruzzone (2013-01) Updating land-cover maps by classification of image time series: a novel change-detection-driven transfer learning approach. IEEE Transactions on Geoscience and Remote Sensing 51 (1), pp. 300–312. External Links: Link Cited by: §2.
12. NOAA National Centers for Environmental Information (NCEI) U.S. billion-dollar weather and climate disasters (2020). Note: https://www.ncdc.noaa.gov/billions/ Cited by: §1.
13. A. Fujita, K. Sakurada, T. Imaizumi, R. Ito, S. Hikosaka and R. Nakamura (2017-05) Damage detection from aerial images via convolutional neural networks. Proceedings of the IAPR International Conference on Machine Vision Applications. External Links: Link Cited by: §2.
14. V. Garcia and J. Bruna (2018-02) Few-shot learning with graph neural networks. arXiv:1711.04043. External Links: Link Cited by: §2.
15. DigitalGlobe The DigitalGlobe constellation. Note: https://www.digitalglobe.com/company/about-us Cited by: §1.
16. R. Gupta, B. Goodman, N. Patel, R. Hosfelt, S. Sajeev, E. Heim, J. Doshi, K. Lucas, H. Choset and M. Gaston (2019-06) Creating xbd: a dataset for assessing building damage from satellite imagery. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. External Links: Link Cited by: §4, §5, §5.
17. G. E. Hinton, S. Osindero and Y. Teh (2006-07) A fast learning algorithm for deep belief nets. Neural Computation 18 (7), pp. 1527–1554. External Links: Link Cited by: §2.
18. S. Hochreiter and J. Schmidhuber (1997-11) Long short-term memory. Neural Computation 9 (8), pp. 1735–1780. External Links: Link Cited by: §2.
19. G. Huang, Z. Liu, L. van der Maaten and K. Weinberger (2017-05) Densely connected convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. External Links: Link Cited by: §2.
20. L. Ke, Y. Lin, Z. Zheng, L. Zhang and L. Meng (2018-02) Adaptive change detection with significance test. IEEE Access 6, pp. 27442–27450. External Links: Link Cited by: §2.
21. D. Kingma and J. Ba (2015-05) Adam: a method for stochastic optimization. Proceedings of the International Conference on Learning Representations. External Links: Link Cited by: §5.
22. D. P. Kingma, D. J. Rezende, S. Mohamed and M. Welling (2014-12) Semi-supervised learning with deep generative models. Proceedings of the International Conference on Neural Information Processing Systems, pp. 3581–3589. External Links: Link Cited by: §2.
23. H. Liu, Z. Wang, F. Shang, M. Zhang, M. Gong, F. Ge and L. Jiao (2019-11) A novel deep framework for change detection of multi-source heterogeneous images. International Conference on Data Mining Workshops, pp. 165–171. External Links: Link Cited by: §2.
24. J. Liu, M. Gong, K. Qin and P. Zhang (2018-03) A deep convolutional coupling network for change detection based on heterogeneous optical and radar images. IEEE Transactions on Neural Networks and Learning Systems 29 (3), pp. 545–559. External Links: Link Cited by: §2.
25. H. Lyu, H. Lu and L. Mou (2016) Learning a transferable change rule from a recurrent neural network for land cover change detection. Remote Sensing 8 (6). External Links: Link Cited by: §2.
26. L. Mou, L. Bruzzone and X. X. Zhu (2019-02) Learning spectral-spatial-temporal features via a recurrent convolutional neural network for change detection in multispectral imagery. IEEE Transactions on Geoscience and Remote Sensing 57 (2), pp. 924–935. External Links: Link Cited by: §2.
27. F. Nex, D. Duarte, F. Tonolo and N. Kerle (2019-11) Structural building damage detection with deep learning: assessment of a state-of-the-art cnn in operational conditions. Remote Sensing 11, pp. 2765. External Links: Link Cited by: §2.
28. M. Papadomanolaki, S. Verma, M. Vakalopoulou, S. Gupta and K. Karantzalos (2019-07) Detecting urban changes with recurrent neural networks from multitemporal sentinel-2 data. Proceedings of the IEEE International Geoscience and Remote Sensing Symposium. Note: Yokohama, Japan External Links: Link Cited by: §2.
29. Z. Qi, A. G. Yeh, X. Li and X. Zhang (2015-09) A three-component method for timely detection of land cover changes using polarimetric sar images. ISPRS Journal of Photogrammetry and Remote Sensing 107, pp. 3–21. External Links: Link Cited by: §2.
30. O. Ronneberger, P. Fischer and T. Brox (2015-11) U-net: convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention 9351, pp. 234–241. External Links: Link Cited by: §2, §3.
31. V. Sadeghi, F. F. Ahmadi and H. Ebadi (2016-03) Design and implementation of an expert system for updating thematic maps using satellite imagery (case study: changes of lake urmia). Arabian Journal of Geosciences 9 (257). External Links: Link Cited by: §2.
32. K. Wang, D. Zhang, Y. Li, R. Zhang and L. Lin (2017-12) Cost-effective active learning for deep image classification. IEEE Transactions on Circuits and Systems for Video Technology 27 (12), pp. 2591–2600. External Links: Link Cited by: §2, §2.
33. X. Wang, R. Girshick, A. Gupta and K. He (2018-06) Non-local neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Note: Salt Lake City, UT External Links: Link Cited by: §3.
34. J. Xu, W. Lu, Z. Li, P. Khaitan and V. Zaytseva (2019-10) Building damage detection in satellite imagery using convolutional neural networks. arXiv:1910.06444. External Links: Link Cited by: §2.
35. F. Yu and V. Koltun (2016-05) Multi-scale context aggregation by dilated convolutions. Proceedings of the International Conference on Learning Representations. External Links: Link Cited by: §2.
36. H. Zhang, I. Goodfellow, D. Metaxas and A. Odena (2019-06) Self-attention generative adversarial networks. Proceedings of the International Conference on Machine Learning 97, pp. 7354–7363. External Links: Link Cited by: Figure 3, §3, §3.