Wide-Area Land Cover Mapping with Sentinel-1 Imagery using Deep Learning Semantic Segmentation Models

This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

Abstract

Land cover mapping is essential for monitoring the environment and understanding the effects of human activities on it. Automatic approaches to land cover mapping (i.e., image segmentation) have mostly relied on traditional machine learning, which requires heuristic feature design. On natural images, deep learning has outperformed traditional machine learning approaches on a range of tasks, including image segmentation. On remote sensing images, recent studies have demonstrated successful applications of specific deep learning models or their adaptations to particular small-scale land cover mapping tasks (e.g., classifying wetland complexes). However, it is not readily clear which of the existing state-of-the-art models for natural images are the best candidates for a particular remote sensing task and dataset.

In this study, we answer that question for mapping the fundamental land cover classes using satellite imaging radar data. We took ESA Sentinel-1 C-band SAR images, available at no cost to users, as representative data. The CORINE land cover map produced by the Finnish Environment Institute was used as a reference, and the models were trained to distinguish between the 5 Level-1 CORINE classes. We selected seven state-of-the-art semantic segmentation models covering a diverse set of approaches: U-Net, DeepLabV3+, PSPNet, BiSeNet, SegNet, FC-DenseNet, and FRRN-B. These models were pre-trained on the ImageNet dataset and further fine-tuned in this study. Specifically, we used 14 ESA Sentinel-1 scenes acquired during the summer season in Finland, which are representative of the land cover in the country.

Upon evaluation and benchmarking, all the models demonstrated solid performance. The best model, FC-DenseNet (Fully Convolutional DenseNets), achieved an overall accuracy of 90.7%. Except for the producer's accuracy on two classes (urban and water bodies), FC-DenseNet outperformed all the other models across the accuracy measures and the classes. Overall, our results indicate that semantic segmentation models are suitable for efficient wide-area mapping using satellite SAR imagery. They also provide a baseline accuracy against which newly proposed models should be evaluated, and suggest that DenseNet-based models are the first candidate for this task.

keywords:
synthetic aperture radar, deep learning, semantic segmentation, land cover mapping, image classification, Sentinel-1 data, C-band, CORINE

1 Introduction

Mapping of land cover and its change has a critical role in the characterization of the current state of the environment. Changes in land cover can be due to human activities as well as to climate change on a regional scale. The land cover, on the other hand, affects climate through water and energy exchange with the atmosphere and by changing the carbon balance. Because of this, land cover belongs to the Essential Climate Variables Bojinski et al. (2014). Hence, timely assessment of land cover and its change is one of the most important applications in satellite remote sensing. Thematic maps are needed annually for various purposes in medium resolution (circa 250 m) with less than 15% measurement uncertainty and in high resolution (10-30 m) with less than 5% uncertainty.

CORINE Land Cover (CLC) is a notable example of a consistent Pan-European land cover mapping initiative Büttner et al. (2004); Bossard et al. (2000) coordinated by the European Environment Agency (EEA). CORINE stands for coordination of information on the environment. It is an on-going long-term effort providing the most harmonized land cover classification data in Europe, with updates approximately every 4 years. The CORINE maps are an important source of land cover information suitable for operational purposes for various customer groups in Europe. It has altogether 44 classes, though many of them are not strictly ecological classes but rather land use classes. On the continental scale, CORINE provides a harmonized map with a 25 ha minimum mapping unit (MMU) for areal phenomena and a minimum width of 100 m for linear phenomena Büttner (2014). National land cover maps in the CORINE framework can exhibit smaller mapping units. In Finland, the latest revision of the CORINE land cover map at the time of this study was the 2012 round produced by the Finnish Environment Institute. The map has a finer MMU than the pan-European product and was produced by a combined automated and manual interpretation of high-resolution optical satellite data, followed by data integration with existing basic map layers Törmä et al. (2015).

The state-of-the-art approaches used for land cover mapping mainly rely on satellite optical imagery. The key role is played by Landsat imagery, often augmented by MODIS or SPOT-5 imagery Chen et al. (2015); Almeida et al. (2016); Homer et al. (2015). Other sources of information employed for land cover mapping include Digital Elevation Models (DEM) and very high-resolution imagery Zhao et al. (2016). When it comes to large-scale and multitemporal land cover mapping, a more recent optical imagery source is Copernicus Sentinel-2. With a revisit time of 5 days, it has become another key data source Griffiths et al. (2019).

International programs, such as the European Space Agency's (ESA's) Copernicus Torres et al. (2012) behind the Sentinel satellites, are making significant efforts to provide Earth Observation (EO) data freely for commercial and non-commercial purposes. The Copernicus programme is a multi-billion investment by the EU and ESA aiming to provide essential services based on accurate and timely data from satellites. Its main goals are to improve the ways of managing the environment, to help mitigate the effects of climate change, and to enable the creation of new applications and services, such as for environmental monitoring and urban development.

The provision of free satellite data for mapping in the framework of such programs also enables the application of methods that could not be used earlier because they require vast and representative datasets for training, for example, deep learning. In recent years, deep learning has brought about several breakthroughs in pattern recognition and computer vision LeCun et al. (1998); Krizhevsky et al. (2012); Simonyan and Zisserman (2014). The success of deep learning models can be attributed both to their deep multilayer structure, which creates nonlinear functions and hence allows extraction of hierarchical sets of features from the data, and to their end-to-end training scheme, which allows simultaneous learning of the features from the raw input and prediction of the task at hand. In this way, heuristic feature design is removed. This is advantageous compared to traditional machine learning methods (e.g., support vector machine (SVM) and random forest (RF)), which require a multistage feature engineering procedure. In deep learning, such a procedure is replaced with a simple end-to-end deep learning workflow. One of the key requirements for successful application of deep learning methods is a large amount of data from which the model can automatically learn the representative features for the prediction task Goodfellow et al. (2016). The availability of open satellite imagery, such as from Copernicus, offers just that.

Land cover mapping systems based solely on optical imagery suffer from issues with cloud cover and weather conditions, especially in the tropical areas, and with a lack of illumination in the polar regions. Among the free satellite data offered by the Copernicus programme are synthetic aperture radar (SAR) images from the Sentinel-1 satellites. SAR is an active radar imaging technique that does not require illumination and is not hampered by cloud cover, owing to the penetration of microwave radiation through clouds. The utilisation of SAR imagery would hence allow mapping such challenging regions and increasing the mapping frequency in orchestrated efforts like CORINE. One of the significant issues previously was the absence of timely and consistent high-resolution wide-area SAR coverage. With the advent of the Copernicus Sentinel-1 satellites, operational use of imaging radar data has become feasible for consistent wide-area mapping. The first Copernicus Sentinel-1 satellite was launched in April 2014. At first, Sentinel-1A alone was capable of providing C-band SAR data in up to four imaging modes with a revisit time of 12 days. Once Sentinel-1B was launched in 2016, the revisit time was reduced to 6 days Torres et al. (2012).

We studied wide-area SAR-based land cover mapping by methodologically combining the two discussed recent advances: the improved methods for large-scale image processing using deep learning and the availability of SAR imagery from the Sentinel-1 satellites.

1.1 Land Cover Mapping with SAR Imagery

While using optical satellite data is still the mainstream in land cover and land cover change mapping Cohen and Goward (2004); Goetz et al. (2009); Atzberger (2013); Hame et al. (2013); Törmä et al. (2015), SAR data has been getting more attention as more suitable sensors appear. To date, several studies have investigated the suitability of SAR for land cover mapping, focusing primarily on polarimetric L-band, C-band, and X-band SAR Antropov et al. (2014); Lonnqvist et al. (2010), multitemporal and multi-frequency SAR Waske and Braun (2009a); Bruzzone et al. (2004), as well as on the combined use of SAR and optical data Ullmann et al. (2014); Clerici et al. (2017); Castañeda and Ducrot (2009); Ban et al. (2010); Laurin et al. (2013).

Independently of the imagery used, the majority of land cover mapping methods so far have been based on traditional supervised classification techniques Khatami et al. (2016). Widely used classifiers are support vector machines (SVM), decision trees, random forests (RF), and maximum likelihood classifiers (MLC) Zhao et al. (2016); Almeida et al. (2016); Waske and Braun (2009b); Khatami et al. (2016). However, extracting the large number of features needed for classification, i.e., the feature engineering process, is time intensive and requires a lot of expert work in developing and fine-tuning classification approaches. This limits the application of traditional supervised classification methods on a large scale.

Backscattered microwave radiation is composed of multiple fundamental scattering mechanisms determined by the vegetation water content, surface roughness, soil moisture, horizontal and vertical structure of the scatterers, as well as the imaging geometry during the datatake. Accordingly, a considerable number of classes can be differentiated in SAR images Balzter et al. (2015); Antropov et al. (2014). However, the majority of SAR classification algorithms use fixed SAR observables (e.g., polarimetric features) to infer specific land cover classes, despite the large temporal, seasonal, and environmental variability between different geographical sites. This leads to a lack of generalisation capability and a need for extensive and representative reference and SAR data. The latter means not only accounting for all variation of SAR signatures for a specific class, but also considering seasonal effects, such as changes in soil and vegetation moisture or the frozen state of the land Park (2015), that strongly affect SAR backscatter. On the other hand, when using multitemporal approaches, such seasonal variation can serve as an effective discriminator among different land cover classes.

When exclusively SAR data are used for land cover mapping, the reported accuracies often turn out to be relatively low for operational land cover mapping and change monitoring. Methodologically, the reported solutions utilized supervised approaches, linking SAR observables and class labels to pixels, superpixels, or objects in a parametric or nonparametric manner Antropov et al. (2014); Lonnqvist et al. (2010); Dobson et al. (1996); Balzter et al. (2015); Hame et al. (2013); Sirro et al. (2018); De Alban et al. (2018); Longepe et al. (2011); Esch et al. (2011); Cable et al. (2014); Niu and Ban (2013); Evans et al. (2010); Lumsdon et al. (2005).

However, tackling a relatively large number of classes was considered in only a few studies, often with relatively low reported accuracies. For instance, da Costa Freitas et al. (2008) found that P-band PolSAR imagery was unsatisfactory for mapping more than five classes with the iterated conditional mode (ICM) contextual classifier applied to several polarimetric parameters; they achieved a Kappa value of 76.8% when mapping four classes. The classification performance of L-band ALOS PALSAR and C-band RADARSAT-2 images was compared in the moist tropics Li et al. (2012): L-band provided 72.2% classification accuracy for a coarse land cover classification system and C-band only 54.7%. In a similar study in Lao PDR, ALOS PALSAR data were found to be mostly useful as a back-up option to optical ALOS AVNIR data Hame et al. (2013). Multitemporal Radarsat-1 data with HH polarization and ENVISAT ASAR data with VV polarization (both C-band) were studied for classification of five land cover classes in Korea with moderate accuracy Park and Chi (2008). Waske et al. Waske and Braun (2009b) applied boosted decision trees and random forests to multitemporal C-band SAR data, reaching accuracies of up to 84%. Several studies Lonnqvist et al. (2010); Antropov et al. (2014) investigated specifically the suitability of SAR for the boreal zone, with reported accuracies of up to 83% depending on the classification technique (maximum likelihood, probabilistic neural networks, etc.) when five super-classes (based on CORINE data) were used.

The potential of Sentinel-1 imagery for CORINE-type thematic mapping was assessed in a study that used Sentinel-1A data for mapping class composition in Thuringia Balzter et al. (2015). Long time series of Sentinel-1 SAR data are considered especially suitable for crop type mapping Tomppo et al. (2019); Nguyen et al. (2016); Veloso et al. (2017); Satalino et al. (2013), with an increasing number of studies attempting land cover mapping in general Vicente-Guijalba et al. (2018); Ge et al. (2019).

Moreover, as Sentinel-1 is presently the only source of SAR data routinely available for wide-area mapping at no cost to users, it seems the best candidate data for the development and testing of improved classification approaches. Previous studies indicate a need to develop and test new methodological approaches that can be effectively applied on a large scale and can deal with the variability of SAR observables with respect to ecological land cover classes. We suggest adopting state-of-the-art deep learning approaches for this purpose.

1.2 Deep Learning in Remote Sensing

The advances in the deep learning techniques for computer vision, in particular, Convolutional Neural Networks (CNNs) LeCun et al. (1998); LeCun and Bengio (1995), have led to the application of deep learning in several domains that rely on computer vision. Examples are self-driving cars, image search engines, medical diagnostics, and augmented reality. Deep learning approaches are starting to be adopted in the remote sensing domain, as well.

Zhu et al. Zhu et al. (2017) provide a discussion on the specificities of remote sensing imagery (compared to ordinary RGB images) that result in specific deep learning challenges in this area. For example, remote sensing data are georeferenced, often multi-modal, with particular imaging geometries, there are interpretation difficulties, and the ground-truth or labelled data needed for deep learning is still often lacking. Additionally, most of the state-of-the-art CNNs are developed for three-channel input images (i.e., RGB) and so certain adaptations are needed to apply them on the remote sensing data Mahdianpari et al. (2018).

Nevertheless, several research papers tackling remote sensing imagery with deep learning techniques were published in recent years. Zhang et al. Zhang et al. (2016b) review the field and find applications to image preprocessing Zhang et al. (2014), target recognition Chen et al. (2013, 2014a), classification Liu et al. (2015); Wang et al. (2015); Tuia et al. (2015), and semantic feature extraction and scene understanding Hu et al. (2015); Penatti et al. (2015); Luus et al. (2015); Zhang et al. (2016a). The deep learning approaches are found to outperform standard methods applied up to several years ago, i.e., SVMs and RFs Ishii et al. (2015); Kussul et al. (2017).

When it comes to deep learning for land cover or land use mapping, applications have been limited to optical satellite Mahdianpari et al. (2018); Chen et al. (2014b); Mahdianpari et al. (2018); Wang et al. (2015) or aerial Wu et al. (2018) imagery, and hyperspectral imagery Tuia et al. (2015); Chen et al. (2014b) owing to the similarity of these images to ordinary RGB images studied in computer vision Mahdianpari et al. (2018).

When it comes to SAR images, Zhang et al. Zhang et al. (2016b) found that there is already significant success in applying deep learning techniques for object detection and scene understanding. However, for classification on SAR data, applications are scarce and advances are yet to be achieved Zhang et al. (2016b). Published research includes deep learning for crop type mapping using combined optical and SAR imagery Kussul et al. (2017), as well as the use of SAR images exclusively Duan et al. (2017). However, those methods applied deep learning only to some part of the task at hand and not in an end-to-end fashion. Wang et al. Wang et al. (2015), for instance, used deep neural networks only for merging over-segmented elements, which were produced using traditional segmentation approaches. Similarly, Tuia et al. Tuia et al. (2015) applied deep learning to extract hierarchical features, which they further fed into a multiclass logistic classifier. Duan et al. Duan et al. (2017) first used unsupervised deep learning and then continued with a couple of supervised labelling tasks. Chen et al. Chen et al. (2014b) applied a deep learning technique (stacked autoencoders) to discover the features, but then still used traditional machine learning (SVM, logistic regression) for the image segmentation. Unlike those methods, we applied deep learning in an end-to-end fashion, i.e., from supervised feature extraction to the land class prediction. This makes our approach more flexible, robust, and adaptable to SAR data from new regions, as well as more efficient.

When it comes to end-to-end approaches for SAR classification, there are several studies where the focus was on a small area and on a specific land cover mapping task. For instance, Mohammadimanesh et al. Mohammadimanesh et al. (2019) used fully polarimetric SAR (PolSAR) imagery from RADARSAT-2 to classify wetland complexes, for which they developed a specifically tailored semantic segmentation model. However, the authors tackled a small test area and did not explore how their model generalizes to other types of areas. Similarly, Wang et al. Wang et al. (2018) adapted existing CNN models into a fixed-feature-size CNN that they evaluated on small-scale RADARSAT-2 or AIRSAR (i.e., airborne SAR) data. In both cases, they used more advanced fully polarimetric SAR imagery at a better resolution than Sentinel-1, which means imagery with more input information for the deep learning models. Importantly, it is only Sentinel-1 that offers open operational data with a repeat cycle of up to 6 days. Because of this, the discussed approaches developed and tested specifically for PolSAR imagery at a higher resolution cannot yet be considered applicable for wide-area mapping. Similarly, Ahishali et al. Ahishali et al. (2019) applied end-to-end approaches to SAR data. They worked with single-polarized COSMO-SkyMed imagery. However, all the imagery they considered was X-band SAR, contrary to the C-band imagery we use here, and again only on a small scale. The authors proposed a compact CNN model that they found outperformed some of the off-the-shelf CNN methods, such as Xception and Inception-ResNet-v2. It is important to note that, compared to those, the off-the-shelf models that we consider here are more sophisticated semantic segmentation models, some of which employ Xception or ResNet, but only as a module in their feature extraction parts.

In summary, the capabilities of the deep learning approaches for the classification have been investigated to a lesser extent for SAR imagery than for optical imagery. The attempts to use SAR data for land cover classification were relatively limited in scope, area, or the number of used SAR scenes. Particularly, wide-area land cover mapping was never addressed. The reasons for this include comparatively poor availability of SAR data compared to optical (greatly changed since the advent of Sentinel-1), complex scattering mechanisms leading to ambiguous SAR signatures for different classes (which makes SAR image segmentation more difficult than the optical image segmentation Li et al. (2015)), as well as the speckle noise caused by the coherent nature of the SAR imaging process.

1.3 Study goals

The present study addresses the identified research gap of a lack of wide-area land cover mapping using SAR data. We achieve this by training, fine-tuning, and evaluating a set of suitable state-of-the-art deep learning models from the class of semantic segmentation models, and demonstrating their suitability for land cover mapping. Moreover, our work is the first to examine and demonstrate the suitability of deep learning for land cover mapping from SAR images on a large scale, i.e., across a whole country.

Specifically, we applied the semantic segmentation models to SAR images taken over Finland. We focused on images of Finland because a land cover mask of a suitable resolution is available there for training labels (i.e., CORINE). The training is performed with the seven selected models (SegNet Badrinarayanan et al. (2017), PSPNet Zhao et al. (2017), BiSeNet Yu et al. (2018), DeepLabV3+ Chen et al. (2018b, a), U-Net Ronneberger et al. (2015a); Howard et al. (2017), FRRN-B Pohlen et al. (2017), and FC-DenseNet Jégou et al. (2017)), which have encoder modules pre-trained on the large RGB image corpus ImageNet 2012. Those models are freely available. In other words, we reused semantic segmentation architectures developed for natural images, with weights pre-trained on RGB images, and fine-tuned them on the SAR images. Our results (with over 90% overall accuracy) demonstrate the effectiveness of deep learning methods for land cover mapping with SAR data.

In addition to having the high-resolution CORINE map that can serve as a ground-truth (labels) for training the deep learning models, another reason that we selected Finland is that it is a northern country with frequent cloud cover, which means that using optical imagery for wide-area mapping is often not feasible. Hence, demonstrating the usability of radar imagery for land cover mapping is particularly useful here.

Even though Finland is a relatively small country, there is still considerable heterogeneity present in terms of land cover types and how they appear in the SAR images. Namely, SAR backscattering is sensitive to several factors that likely differ between countries or between distant areas within a country. Examples of such factors are moisture levels, terrain variation and soil roughness, predominant forest biome and tree species proportions, types of shorter vegetation and crops in agricultural areas, and specific types of built environments. We did not confine our study to a particular area of Finland where the SAR signatures might be consistent, but obtained the images across a wide area. Hence, demonstrating the suitability of our methods in this setting hints at their potential generalizability. Namely, it means that, similarly as we did here, the semantic segmentation models can be fine-tuned and adapted to work on data from other regions or countries with different SAR signatures.

On the other hand, we took into account that the same areas will appear somewhat different in SAR images across different seasons. The scattering characteristics of many land cover classes change considerably between the summer and winter months, and sometimes even within weeks during seasonal transitions Antropov et al. (2012, 2014). These changes include snow cover and melting, freeze/thaw of soils, ice on rivers and lakes, the crop growing cycle, and leaf-on and leaf-off conditions in deciduous trees. Because of this, in the present study we focused only on scenes acquired during the summer season. However, we did allow our training dataset to contain several images of the same area taken at different times during the summer season. In this way, not only spatial but also temporal variation of SAR signatures is introduced.

Our contributions can be summarised as follows:

C1:

We thoroughly benchmarked seven selected state-of-the-art semantic segmentation models covering a diverse set of approaches for land cover mapping using Sentinel-1 SAR imagery. We provide insights on the best models in terms of both accuracy and efficiency.

C2:

Our results demonstrate the power of deep learning models along with SAR imagery for accurate wide-area land cover mapping in the cloud-obscured boreal zone and polar regions.

2 Deep Learning Terminology

As with other representation learning models, the power of deep learning models comes from their ability to learn rich features (representations) from the dataset automatically Goodfellow et al. (2016). The automatically learned features are usually better suited for the classifier or other task at hand than hand-engineered features. Moreover, thanks to a large number of layers employed, it has been proven that the deep learning networks can discover hierarchical representations, so that the higher level representations are expressed in terms of the lower level, simpler ones. For example, in the case of images, the low-level representations that can be discovered are edges, and using them, the mid-level ones can be expressed, such as corners and shapes, and this helps to express the high-level representations, such as object elements and their identities Goodfellow et al. (2016).

The deep learning models in computer vision can be grouped according to their main task into three categories. In Table 1, we provide a description of those categories. However, the deep learning terminology for those tasks does not always correspond well to the terminology used in the remote sensing community. Relevant to our task, a number of remote sensing studies use the term classification in the context of land cover mapping, inherently meaning pixel- or region-based classification, which in the deep learning terminology corresponds to semantic segmentation. In Table 1 we list the corresponding terminology that we encountered for each task in both the deep learning and remote sensing communities. This helps to disambiguate the terms when they refer to different tasks, and to recognize when they refer to the same task in the two domains. In the present study, the focus is on land cover mapping. Hence, we tackle semantic segmentation in the deep learning terminology and image classification, i.e., pixel-wise classification, in the remote sensing terminology.

Deep learning | Remote sensing | Task description
Classification Krizhevsky et al. (2012) | Image Annotation, Scene Understanding, Scene Classification | Assigning a whole image to a class based on what is (mainly) represented in it, for example a ship, oil tank, sea, or land.
Object Detection, Localization, Recognition Goodfellow et al. (2016) | Automatic Target Recognition | Detecting (and localizing) the presence of particular objects in an image. These algorithms can detect several objects in the given image, for instance ship detection in SAR images.
Semantic Segmentation Long et al. (2015) | Image Classification, Clustering | Assigning a class to each pixel in an image based on which image object or region it belongs to. These algorithms not only detect and localize objects in the image, but also output their exact areas and boundaries.
Table 1: Terminology for the main tasks in computer vision and its use in the deep learning versus remote sensing communities.

Convolutional Neural Networks (CNNs) LeCun et al. (1998); Krizhevsky et al. (2012) are the deep learning models that have transformed the computer vision field. Initially, CNNs were designed to tackle the image classification task (in the deep learning terminology). Their structure is inspired by the visual perception of mammals Hubel and Wiesel (1962). CNNs are named after one of their most important operations, which is particular to them compared to other neural networks, i.e., convolutions. Mathematically, a convolution is a combination of two other functions. A convolution is applied to the image by sliding a filter (kernel) of a given size, which is usually small compared to the original image size. Filters can be designed for different purposes; for example, a filter can serve as a vertical edge detector. Application of such a convolution operation to an image results in a feature map. Another common operation that is usually applied after a convolution is pooling. Pooling reduces the size of the feature map while providing robustness to the extracted features. Common CNNs end with a fully connected layer, which is used for final predictions, commonly for image classification. By employing a large number of convolutional layers (depth), CNNs are able to extract gradually more complex and abstract features. The first CNN model to demonstrate impressive effectiveness in image classification (of handwritten digits) was LeNet LeCun et al. (1998). Several years later, Krizhevsky et al. Krizhevsky et al. (2012) developed AlexNet, a deep CNN that dramatically pushed the limits of classification accuracy on the famous ImageNet computer vision challenge Russakovsky et al. (2015). Since then, a variety of CNN-based models have been proposed. Some notable examples are the VGG network Simonyan and Zisserman (2014), ResNet He et al. (2016), DenseNet Huang et al. (2017), and Inception V3 Szegedy et al. (2015). The effectiveness of CNNs has also been proven in various real-world applications Ji et al. (2013); Sainath et al. (2013).
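As an illustration of the operations just described, the following is a minimal sketch of a small CNN classifier in Keras; the layer sizes are illustrative only and do not correspond to any model used in this study.

```python
import tensorflow as tf

# Minimal CNN sketch: convolutions learn filters, pooling shrinks the feature
# maps, and a final fully connected (Dense) layer produces the image-level class.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(224, 224, 3)),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1000, activation="softmax"),  # one score per class
])
```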

Once CNNs had proven their effectiveness in classifying images, Long et al. Long et al. (2015) were the first to show how a given CNN model can be augmented to make it suitable for the semantic segmentation task – they proposed the Fully Convolutional Neural Network (FCN) framework. This generic architecture can be used to adapt any CNN network used for classification into a segmentation model. Namely, the authors showed that by replacing the last, fully connected layer with appropriate convolutional layers, so that they upsample and restore the resolution of the input at the output layer, CNNs can be transformed to classify each individual pixel (instead of the whole image). The basic idea is as follows. The encoder is used to learn the feature maps and is usually based on a pre-trained deep CNN for classification, such as ResNet, VGG, or Inception. The decoder part serves to upsample the discriminative features that the encoder has learned from the coarse-level feature map to the fine, pixel level. Long et al. Long et al. (2015) showed that this upsampling (backward) computation can be efficiently performed using backward convolutions (deconvolutions). Moreover, this means that specific CNN models, such as those mentioned above, can all be incorporated in the FCN framework for segmentation, giving rise to FCN-AlexNet Long et al. (2015), FCN-ResNet He et al. (2016), FCN-VGG16 Long et al. (2015), FCN-DenseNet Jégou et al. (2017), etc. We present a diagram of the generic FCN architecture in Figure 1.
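A minimal sketch of this idea in Keras is given below, assuming an ImageNet-pretrained ResNet50 as the encoder (any of the classification backbones mentioned above could be substituted); the 1x1 convolution and the transposed convolution stand in for the decoder described above.

```python
import tensorflow as tf

NUM_CLASSES = 5  # e.g., the five CORINE Level-1 classes

# Encoder: a classification CNN with its fully connected head removed.
encoder = tf.keras.applications.ResNet50(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))

# Decoder: a 1x1 convolution predicts class scores on the coarse feature map,
# and a transposed convolution (deconvolution) upsamples them back to the
# input resolution, yielding a per-pixel classification.
x = tf.keras.layers.Conv2D(NUM_CLASSES, kernel_size=1)(encoder.output)
x = tf.keras.layers.Conv2DTranspose(
    NUM_CLASSES, kernel_size=64, strides=32, padding="same",
    activation="softmax")(x)

fcn = tf.keras.Model(encoder.input, x)
```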

Figure 1: The architecture of Fully Convolutional Neural Networks (FCNs) Long et al. (2015)

3 Materials and methods

Here, we first describe the study site, the SAR data, and the reference data. This is followed by an in-depth description of the semantic segmentation models used in the study. We finish with the description of the experimental setup and the evaluation metrics.

3.1 Study site

Figure 2: Study area in Finland, with reference CORINE land cover data and schematic location of areas used for model training and accuracy assessment

Our study site covers the area of Finland between latitudes 61° and 67.5°. The processed area is shown in Figure 2. The study area includes central and northern areas of Finland, covered primarily by boreal forestland with inclusions of water bodies (primarily lakes), urban settlements and agricultural areas, as well as marshland and open bogs. We omitted Lapland due to potential snow cover during the months of data acquisition. The terrain height variation is moderate.

3.2 SAR data

Presently, Sentinel-1 is a C-band SAR dual-satellite system with two satellites orbiting 180° apart Torres et al. (2012), launched in 2014 and 2016, respectively. The operational acquisition modes are Stripmap (SM), Interferometric Wide-Swath (IW), Extra Wide Swath (EW), and Wave Mode (WV). The IW mode is the default mode over land, providing a 250 km wide swath composed of three sub-swaths, with single-look images at 5 m by 20 m spatial resolution. It uses the so-called TOPS (Terrain Observation with Progressive Scan) SAR mode.

The SAR data acquired by Sentinel-1 satellites in IW mode were used in the study. Altogether, 14 Sentinel-1A images acquired during the summers of 2015 and 2016 were used, more concretely during June, July, and August of those two years. Their geographical coverage is schematically shown in Figure 2.

The original scenes were downloaded as Level-1 Ground Range Detected (GRD) products. They represent focused SAR data that has been detected, multi-looked, and projected to ground range using an Earth ellipsoid. They were orthorectified in ESA SNAP S1TBX software using a local digital terrain model (with 2 m resolution) available from the National Land Survey of Finland. The pixel spacing of the orthorectified scenes was set to 20 metres. Orthorectification included terrain flattening to obtain backscatter in the gamma-nought format Small et al. (2012). The scenes were further re-projected to the ETRS89 / ETRS-TM35FIN projection (EPSG:3067) and resampled to a final pixel size of 20 metres.

3.3 Reference data

In Finland, the Finnish Environment Institute (SYKE) is responsible for the production of the CORINE maps. While for most of the EU territory the CORINE mask is available only at a coarser spatial resolution, national institutions may choose to create more precise maps; SYKE, in particular, has produced a 20 m spatial resolution mask for Finland (Figure 3). Since then, updates have been produced regularly, the latest one at the time of this study, which we used, being CLC2012. The land use classes in the map can be hierarchically grouped into 4 CLC levels, with the top CLC Level-1 containing 5 classes. According to the information provided by SYKE, the accuracy of the data is reported separately for CLC Level-3, Level-2, and Level-1; for the Level-1 classes used in this study, it is somewhat above 90% (cf. Section 4.1). The selected classes and their corresponding color codes used for our segmentation results are shown in Table 2.

Until the most recent CORINE production round, EEA member countries adopted national approaches for the production of CORINE. The EEA Technical Guidelines include manual digitalization of land cover change based on visual interpretation of optical satellite imagery. In Finland, the European CLC was not applicable for the majority of national users due to its large minimum mapping unit (MMU). Thus, a national version was produced with a somewhat modified nomenclature of classes Kallio (2004b, a). The national high-resolution CLC2012 data is in raster format of 20 m, with a corresponding MMU. In the provision of the 2012 update of CLC, obtaining optical imagery over Scandinavia and Britain was particularly challenging because of the frequent clouds, thus calling for the use of radar imagery to meet user requirements on accuracy and coverage Balzter et al. (2015). The CORINE map itself is built from high-resolution satellite images acquired primarily during the summer and, to a smaller extent, during the spring months Büttner et al. (2004).

class R G B color
Water bodies (500) 0 191 255 blue
Peatland, bogs, and marshes (400) 173 216 230 light blue
Forested areas (300) 127 255 0 green
Agricultural areas (200) 222 184 135 brown
Urban fabric (100) 128 0 0 red
Table 2: CORINE CLC Level-1 classes and their color codes used in our classification results
Figure 3: Zoomed in area fragment with our reference data, i.e., CORINE shown on top (left) along with the Google Earth layer (right).

3.4 Semantic Segmentation Models

We selected the following seven state-of-the-art Garcia-Garcia et al. (2017) semantic segmentation models to test for our land cover mapping task: SegNet Badrinarayanan et al. (2017), PSPNet Zhao et al. (2017), BiSeNet Yu et al. (2018), DeepLabV3+ Chen et al. (2018b, a), U-Net Ronneberger et al. (2015a); Howard et al. (2017), FRRN-B Pohlen et al. (2017), and FC-DenseNet Jégou et al. (2017). The models were selected to cover a wide set of approaches to semantic segmentation. In the following, we describe the specific architecture of each of these DL models. We will use the following common abbreviations: conv for the convolution operation, concat for concatenation, max pool for the max pooling operation, BN for batch normalisation, and ReLU for the rectified linear unit activation function.

BiSeNet (Bilateral Segmentation Network)

Figure 4: The architecture of BiSeNet. ARM stands for the Attention Refinement Module and FFM for the Feature Fusion Module introduced in the model’s paper Yu et al. (2018).

The BiSeNet model is designed to decouple the functions of encoding additional spatial information and enlarging the receptive field, which are fundamental to achieving good segmentation performance. As can be seen in Figure 4, there are two main components to this model: the Spatial Path (SP) and the Context Path (CP). The Spatial Path serves to encode rich spatial information. The Context Path serves to provide a sufficient receptive field and uses global average pooling and a pre-trained Xception Chollet (2017) or ResNet He et al. (2016) as the backbone. The goal of the creators was not only to obtain superior performance but to achieve a balance between speed and performance. Hence, BiSeNet is a relatively fast semantic segmentation model.

SegNet (Encoder-Decoder-Skip)

Figure 5: The architecture of SegNet-based Encoder-Decoder with Skip connections Badrinarayanan et al. (2017). Blue tiles represent Convolution + Batch Normalisation + ReLU, green tiles represent Pooling, red – Upsampling, and yellow – a softmax operation.

Similarly to BiSeNet, SegNet is also designed with computational performance in mind, this time particularly during inference. Because of this, the network has a significantly smaller number of trainable parameters compared to most of the other architectures. The encoder in SegNet is based on VGG16: it consists of its first 13 convolutional layers, while the fully connected layers are omitted. Hence, the novelty of this network lies in its decoder part, as follows. The decoder consists of one decoder layer for each encoder layer, and so it also has 13 layers. Each individual decoder layer utilizes max-pooling indices memorized from its corresponding encoder feature map. The authors showed that this enhances boundary delineation between classes. Finally, the decoder output is sent to a multi-class softmax function yielding a classification for each pixel (see Figure 5).

Mobile U-Net

Mobile U-Net is based on the U-Net Ronneberger et al. (2015b) semantic segmentation architecture shown in Figure 6. In designing U-Net, the fully convolutional approach was generally employed, with the following modification. The upsampling part of the architecture has no fully connected layers and is nearly symmetrical to the feature extraction part, due to the use of similar feature maps. This results in a u-shaped architecture (see Figure 6), hence the name of the model. While originally developed for biomedical images, the U-Net architecture has proven successful for image segmentation in other domains as well. Here, we somewhat modify the U-Net architecture according to the MobileNets Howard et al. (2017) framework to improve its efficiency. In particular, the MobileNets framework uses depthwise separable convolutions, a form which factorizes standard convolutions (e.g., 3x3) into a depthwise convolution (applied separately to each input band) and a pointwise (1x1) convolution that combines the outputs of the depthwise convolution.
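A sketch of this building block is shown below (Keras); the kernel size, filter count, and BN/ReLU placement are illustrative rather than the exact configuration used in the study.

```python
import tensorflow as tf

def separable_conv_block(x, filters):
    """Depthwise separable convolution: a per-band depthwise convolution
    followed by a 1x1 pointwise convolution that mixes the bands."""
    x = tf.keras.layers.DepthwiseConv2D(kernel_size=3, padding="same")(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.ReLU()(x)
    x = tf.keras.layers.Conv2D(filters, kernel_size=1, padding="same")(x)  # pointwise
    x = tf.keras.layers.BatchNormalization()(x)
    return tf.keras.layers.ReLU()(x)

# Example: one block applied to a 3-band input of arbitrary spatial size.
inputs = tf.keras.Input(shape=(None, None, 3))
outputs = separable_conv_block(inputs, filters=64)
```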

Figure 6: The architecture of U-Net Ronneberger et al. (2015b)

DeepLab-V3+

DeepLab-V3+ Chen et al. (2018b) is an improved version of DeepLab-V3 Chen et al. (2017), while the latter is an improved version of the original DeepLab Chen et al. (2018a) model. This segmentation model does not follow the FCN framework like the previously discussed models. The main features that distinguish the DeepLab model from FCNs are the atrous convolutions for upsampling and the application of probabilistic machine learning models, concretely fully connected conditional random fields (CRFs), for finer localization accuracy in the final output. Atrous convolutions, in particular, allow enlarging the context from which the next-layer feature maps are learned, while preserving the number of parameters (and, thus, the same efficiency). Using a chain of atrous convolutions allows computing the final output layer of a CNN at an arbitrarily high resolution (removing the need for the upsampling part as used in FCNs). In the follow-up work proposing DeepLab-V3, Chen et al. Chen et al. (2017) change the approach to atrous convolutions to gradually double the atrous rates, and show that with an adapted version, their new algorithm outperforms the previous one, even without including the fully connected CRF layer. Finally, in their newest adaptation of the model, called DeepLab-V3+, Chen et al. Chen et al. (2018b) turn to an approach similar to the FCNs, i.e., they add a decoder module to the architecture (see Figure 7). That is, they employ the features extracted by the DeepLab-V3 module in the encoder part, and add a decoder module consisting of 1x1 and 3x3 convolutions.
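To make the atrous (dilated) convolution concrete, the following snippet (Keras; sizes illustrative) compares a standard 3x3 convolution with a dilated one: both have the same number of parameters, but the dilated kernel samples a wider context.

```python
import tensorflow as tf

standard = tf.keras.layers.Conv2D(256, kernel_size=3, padding="same")
atrous = tf.keras.layers.Conv2D(256, kernel_size=3, padding="same", dilation_rate=4)

x = tf.keras.Input(shape=(64, 64, 256))
# Same output size and same parameter count; only the receptive field differs.
print(standard(x).shape, atrous(x).shape)               # (None, 64, 64, 256) for both
print(standard.count_params() == atrous.count_params())  # True
```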

Figure 7: The architecture of DeepLabV3+ Chen et al. (2018b)
Figure 8: The architecture of FRRN-B. RU_n and FRRU_n stand for residual units and full-resolution residual units with n-channel convolutions, respectively. FRRUs simultaneously operate on the two streams Pohlen et al. (2017).

FRRN-B (Full-Resolution Residual Networks)

As we have seen, most of the semantic segmentation architectures are based on some form of FCN, and so they utilize existing classification networks, such as ResNet or VGG16, as encoders. We also discussed the main reason for such approaches, which is to take advantage of the weights learned by those architectures pretrained for the classification task. Nevertheless, one disadvantage of the FCN approach is that the resulting network outputs of the encoder part (particularly, after the pooling operations) are at a lower resolution, which deteriorates the localization performance of the overall segmentation model. Pohlen et al. Pohlen et al. (2017) proposed to tackle this by having two parallel network streams processing the input image: a pooling stream and a residual stream (Figure 8). As the name says, the pooling stream performs successive pooling and then unpooling operations, and it serves to obtain good recognition of the objects and classes. The residual stream computes residuals at the full image resolution, which enables low-level features, i.e., object pixel-level locations, to be propagated to the network output. The name of the model comes from its building blocks, i.e., full-resolution residual units. Each such unit simultaneously operates on the pooling and the residual stream. In the original paper Pohlen et al. (2017), the authors propose two alternative architectures, FRRN-A and FRRN-B, and they show that FRRN-B achieves superior performance on the Cityscapes benchmark dataset. Hence, we employ the FRRN-B architecture.

PSPNet (Pyramid Scene Parsing Network)

Figure 9: The architecture of PSPNet Zhao et al. (2017)

Zhao et al. Zhao et al. (2017) propose the Pyramid Scene Parsing Network as a solution to the challenge of making local predictions based on the local context only, without considering the global image scene. In remote sensing, an example of this challenge could be a model wrongly predicting water with waves as the dry vegetation class, because the two appear similar and the model does not take into account that these pixels are part of a larger water surface, i.e., it misses the global context. Similarly to the other FCN-based approaches, PSPNet uses a pre-trained classification architecture to extract the feature map, in this case ResNet. The main module of this network is the pyramid pooling module, which is enclosed by a square in Figure 9. As can be seen in the figure, this module fuses features at four scales: from the coarse (red) to the fine (green). Hence, the output of each level in the pyramid pooling module contains a feature map of a different resolution. In the end, the different features are stacked together, yielding the final pyramid pooling global feature used for predictions.
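A minimal sketch of such a pyramid pooling module in Keras is shown below; it assumes a feature map with a fixed spatial size divisible by the bin sizes, and the bin sizes (1, 2, 3, 6) are those reported in the PSPNet paper.

```python
import tensorflow as tf

def pyramid_pooling(features, bin_sizes=(1, 2, 3, 6)):
    """Pool the feature map at several scales, project each with a 1x1
    convolution, upsample back to the input size, and concatenate."""
    h, w, c = features.shape[1], features.shape[2], features.shape[3]
    branches = [features]
    for size in bin_sizes:
        x = tf.keras.layers.AveragePooling2D(pool_size=(h // size, w // size))(features)
        x = tf.keras.layers.Conv2D(c // len(bin_sizes), kernel_size=1)(x)
        x = tf.keras.layers.UpSampling2D(size=(h // size, w // size),
                                         interpolation="bilinear")(x)
        branches.append(x)
    return tf.keras.layers.Concatenate()(branches)

# Example: a 60x60 feature map with 2048 channels, as produced by a ResNet encoder.
feats = tf.keras.Input(shape=(60, 60, 2048))
fused = pyramid_pooling(feats)  # shape (None, 60, 60, 4096)
```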

FC-DenseNet (Fully Convolutional DenseNets)

This semantic segmentation algorithm is built using DenseNet CNN Huang et al. (2017) as a basis for the encoder, followed by applying the FCN approach Jégou et al. (2017). The specificity of the DenseNet architecture is the presence of blocks where each layer is connected to all other layers in a feed-forward manner. Figure 10 shows the architecture of FC-DenseNet where the blocks are represented by the Dense Block units. According to Huang et al. (2017), such architecture scales well to hundreds of layers without any optimization issues, while yielding excellent results in classification tasks. In order to efficiently upsample the DenseNet feature maps, Jegou et al. Jégou et al. (2017) substitute the upsampling convolutions of FCNs by Dense Blocks and Transitions Up. The Transition Up modules consist of transposed convolutions, which are then concatenated with the outputs from the input skip connection (the dashed lines in Figure 10).
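The dense connectivity pattern itself can be sketched as follows (Keras); the number of layers and the growth rate are illustrative placeholders, not the configuration used in the study.

```python
import tensorflow as tf

def dense_block(x, num_layers=4, growth_rate=16):
    """Every layer receives the concatenation of all preceding feature maps
    within the block, and its own output is appended to that stack."""
    for _ in range(num_layers):
        y = tf.keras.layers.BatchNormalization()(x)
        y = tf.keras.layers.ReLU()(y)
        y = tf.keras.layers.Conv2D(growth_rate, kernel_size=3, padding="same")(y)
        x = tf.keras.layers.Concatenate()([x, y])
    return x
```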

Figure 10: The architecture of FC-DenseNet Jégou et al. (2017)

3.5 Training approach

To accomplish better segmentation performance, there is an option to pre-train the semantic segmentation models (in particular, their encoder modules) using a larger set of available images of another type (such as natural images). By using a model pre-trained on natural images and continuing the training on the limited set of SAR images, the knowledge is effectively transferred from the natural-image task to the SAR task Bengio (2012). To accomplish such transfer, we used models whose encoders were pre-trained for the ImageNet classification task and fine-tuned them using our SAR dataset (described next).

3.6 Experimental Setup

In this section, we describe first how we prepared the SAR images for training with the deep learning models which are designed for natural images. Then we provide the details of our implementation.

SAR Data Preprocessing for Deep Learning

Sentinel-1 imagery comes in two polarization channels, each of them being informative about certain types of land cover. Hence, using their combination is expected to yield better land cover mapping results than using either of them independently. Moreover, previous work has shown the benefits of also using a DEM for land cover mapping Zhao et al. (2016). Hence, as the third layer, we used the DEM of Finland from the National Land Survey.

The SAR backscatter for both polarizations was converted to decibels by applying the standard 10·log10 transformation. In addition, for the deep learning models, each band should be normalized so that the distribution of the pixel values resembles a Gaussian distribution centered at zero; this is done to yield faster convergence during training. To normalize the data, the mean of all pixels is subtracted from each pixel value, which is then divided by their standard deviation. In addition, given that the semantic segmentation models expect pixel values in the range (0, 255), we scaled the normalized data, and also the DEM values, to this range. Such preprocessed layers are then used to create the image dataset for training.

We named the created dataset SAR RGB-DEM. The naming comes from the process used to create the images in this dataset. Namely, one of the two channels of a Sentinel-1 image is assigned to R and the other to G channel. For the third, B channel, we use the DEM layer.
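The preprocessing and channel composition just described can be summarised with the following sketch (NumPy); the band-to-channel assignment and the nodata offset are assumptions for illustration, since the text does not fix which polarization maps to R and which to G.

```python
import numpy as np

def make_sar_rgb_dem(pol1, pol2, dem, eps=1e-6):
    """Build a three-channel SAR RGB-DEM image: two polarization bands in dB,
    each standardised and rescaled to (0, 255), plus the rescaled DEM."""
    def to_db(band):
        return 10.0 * np.log10(band + eps)            # backscatter -> decibels

    def rescale(band):
        band = (band - band.mean()) / band.std()      # zero mean, unit variance
        band = (band - band.min()) / (band.max() - band.min())
        return band * 255.0                           # stretch to the (0, 255) range

    # R and G: the two polarization channels; B: the DEM layer.
    return np.dstack([rescale(to_db(pol1)), rescale(to_db(pol2)), rescale(dem)])
```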

Train/Development and Test (Accuracy Assessment) Dataset

The images from the SAR RGB-DEM dataset needed to be split into square partial images (further in the text called imagelets) for training. The first reason for this preprocessing is the square shape: some of the selected models require square-shaped images. Some of the other models are flexible with respect to image shape and size, but we wanted the setups for all the models to be the same so that their results are comparable. The second reason for the preprocessing is computational capacity: with our hardware setup (described below), this was the largest image size that we could work with.

Upon splitting the SAR RGB-DEM images, we discarded those imagelets that were completely outside the land mass area, as well as those for which we did not have a complete CORINE label (e.g., if they fell partly outside the Finnish borders). The remaining imagelets formed the dataset used in the study.
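A sketch of this tiling step is given below (NumPy); the tile size is a placeholder, and treating label value 0 as missing reference data is an assumption for illustration.

```python
import numpy as np

def split_into_imagelets(image, label, tile_size):
    """Tile a SAR RGB-DEM scene and its CORINE label into square imagelets,
    keeping only tiles with a complete reference label."""
    tiles = []
    h, w = image.shape[:2]
    for i in range(0, h - tile_size + 1, tile_size):
        for j in range(0, w - tile_size + 1, tile_size):
            img_tile = image[i:i + tile_size, j:j + tile_size]
            lbl_tile = label[i:i + tile_size, j:j + tile_size]
            if np.all(lbl_tile > 0):   # 0 assumed to mark missing/out-of-border labels
                tiles.append((img_tile, lbl_tile))
    return tiles
```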

Given the geography of Finland, to have representative training data it seems useful to include imagelets from both the northern and southern (including the large cities) parts of the country in the model training. On the other hand, some noticeable differences are found also in the gradient from east to west of the country. To achieve a representative training dataset, we selected all imagelets between the longitudes of 24° and 28° for the accuracy assessment (model testing), and all the other imagelets for the model training (that is, training and development in the computer vision terminology). In this way, we prevented the situation in which two images of the same area but acquired at different times are used one for training and the other for testing. Images that overlapped any border of the introduced strip were discarded. The procedure resulted in separate training-and-development and test (accuracy assessment) sets. Finally, the training-and-development set was further split into a part used for training and a part used for development of the deep learning models.

Architecture Base model Parameters
BiSeNet ResNet101 24.75M
SegNet VGG16 34.97M
Mobile U-Net Not applicable 8.87M
DeepLabV3+ ResNet101 47.96M
FRRN-B ResNet101 24.75M
PSPNet ResNet101 56M
FC-DenseNet ResNet101 9.27M
Table 3: The properties of the examined semantic segmentation architectures

Data Augmentation

Further, we employed data augmentation. The main idea behind data augmentation is to enable improved learning by reusing the original images with slight transformations, such as rotation, flipping, adding Gaussian noise, or slightly changing the brightness. This provides additional information to the model, and the dataset size is effectively increased. Moreover, an additional benefit of data augmentation is that it helps the model learn some invariant data properties for which no examples are present in the original dataset. Given the sensitivity of the SAR backscatter, we did not want to augment the images in terms of color or brightness, or by adding noise. However, we could safely employ rotations and flipping. For rotations, we only used 90° increments, giving three possible rotated versions of an image. For image flipping, we applied horizontal and vertical flipping, or both at the same time, giving another three possible versions of the original image. Notice that our images are square, so the transformations did not change the image dimensions. Finally, we applied online augmentation, as opposed to the offline version. In the online process, each augmented image is seen only once, and so this process yields a network that generalises better.
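The following is a small sketch of such an augmentation step (NumPy), applying a random 90° rotation and random flips identically to an imagelet and its label; the sampling probabilities are illustrative.

```python
import numpy as np

def random_augment(image, label, rng=None):
    """Randomly rotate by a multiple of 90 degrees and randomly flip,
    keeping the imagelet and its CORINE label aligned."""
    rng = rng or np.random.default_rng()
    k = rng.integers(0, 4)                      # 0, 90, 180, or 270 degrees
    image, label = np.rot90(image, k), np.rot90(label, k)
    if rng.random() < 0.5:                      # horizontal flip
        image, label = np.fliplr(image), np.fliplr(label)
    if rng.random() < 0.5:                      # vertical flip
        image, label = np.flipud(image), np.flipud(label)
    return image, label
```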

Implementation

To apply the described semantic segmentation models, we adapted the open-source Semantic Segmentation Suite. We used Python with TensorFlow Abadi et al. (2016) backend.

Hardware and Training Setup

We trained and tested separately each of the deep learning models on a single GPU (NVIDIA GeForce GTX 1080) on a machine with 32GB of RAM.

We used the RMSProp optimisation algorithm with learning-rate decay. Each model was trained for an equal number of epochs, and during the process the checkpoint of the best model was saved. We then used that model for evaluation on the test set, and we report those results.
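A sketch of such a setup in Keras is shown below; the learning rate, decay schedule, and monitored metric are placeholders, not the exact values used in the study.

```python
import tensorflow as tf

# RMSProp with a decaying learning rate (placeholder values).
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-4, decay_steps=1000, decay_rate=0.95)
optimizer = tf.keras.optimizers.RMSprop(learning_rate=lr_schedule)

# Keep only the best checkpoint seen during training, to be used for evaluation.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "best_model.h5", monitor="val_accuracy", save_best_only=True)
```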

3.7 Evaluation Metrics

In their review of the metrics used in land cover classification, Costa et al. Costa et al. (2018) found a lack of consistency, which complicates the intercomparison of different studies. To avoid such issues and to ensure that our results are easily comparable with the literature, we thoroughly evaluated our models. For each model and class, we report the following measures of accuracy: precision, also known as user's accuracy (UA), recall, also known as producer's accuracy (PA), as well as the overall accuracy and the Kappa coefficient. The formulas are as follows.

For each segmentation class (land cover type) $c$, we calculate the precision (user's accuracy):

$$\mathrm{precision}_c = \frac{TP_c}{TP_c + FP_c},$$

and the recall (producer's accuracy):

$$\mathrm{recall}_c = \frac{TP_c}{TP_c + FN_c},$$

where $TP_c$, $FP_c$, and $FN_c$ represent the true positive, false positive, and false negative pixels for the class $c$, respectively.

When it comes to accuracy Csurka et al. (2013), we calculate the per-class accuracy:

$$\mathrm{acc}_i = \frac{n_{ii}}{t_i},$$

and the overall pixel accuracy:

$$\mathrm{acc} = \frac{\sum_{i=1}^{L} n_{ii}}{\sum_{i=1}^{L} t_i},$$

where $n_{ij}$ is the number of pixels having a ground truth label $i$ and being classified/predicted as $j$, $t_i = \sum_{j} n_{ij}$ is the total number of pixels labelled with $i$, and $L$ is the number of classes. All these metrics can take values from 0 to 1.

Finally, we also use the Kappa statistic (Cohen's measure of agreement), indicating how the classification results compare to values assigned by chance Cohen (1960). The Kappa statistic can take values from 0 to 1. Starting from an $L \times L$ confusion matrix with elements $n_{ij}$, the following calculations are done:

$$p_o = \frac{1}{N}\sum_{i=1}^{L} n_{ii}, \qquad (1)$$

$$n_{i+} = \sum_{j=1}^{L} n_{ij}, \qquad n_{+j} = \sum_{i=1}^{L} n_{ij}, \qquad (2)$$

$$p_e = \frac{1}{N^2}\sum_{i=1}^{L} n_{i+}\, n_{+i}, \qquad (3)$$

where $p_o$ is the observed proportional agreement (effectively the overall accuracy), $n_{i+}$ and $n_{+j}$ are the row and column totals for classes $i$ and $j$, $N$ is the total number of pixels, and $p_e$ is the expected proportion of agreement. The final measure of agreement is given by the statistic Cohen (1960)

$$\kappa = \frac{p_o - p_e}{1 - p_e}. \qquad (4)$$

Depending on the value of Kappa, the observed agreement is considered as either poor (0.0 to 0.2), fair (0.2 to 0.4), moderate (0.4 to 0.6), good (0.6 to 0.8) or very good (0.8 to 1.0).
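For reference, a small sketch (NumPy) of how these metrics can be computed from a confusion matrix; it assumes the convention used in Table 5, with rows as reference classes and columns as predicted classes.

```python
import numpy as np

def classification_metrics(confusion):
    """Per-class UA/PA, overall accuracy, and Cohen's kappa from a KxK
    confusion matrix (rows: reference classes, columns: predicted classes)."""
    confusion = np.asarray(confusion, dtype=float)
    n = confusion.sum()
    tp = np.diag(confusion)
    ua = tp / confusion.sum(axis=0)                    # user's accuracy (precision)
    pa = tp / confusion.sum(axis=1)                    # producer's accuracy (recall)
    p_o = tp.sum() / n                                 # overall accuracy
    p_e = (confusion.sum(axis=0) * confusion.sum(axis=1)).sum() / n ** 2
    kappa = (p_o - p_e) / (1.0 - p_e)                  # Cohen's kappa
    return ua, pa, p_o, kappa
```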

4 Results and Discussion

Using the experimental setup described in the previous section, we evaluated the seven selected semantic segmentation models: SegNet Badrinarayanan et al. (2017), PSPNet Zhao et al. (2017), BiSeNet Yu et al. (2018), DeepLabV3+ Chen et al. (2018b, a), U-Net Ronneberger et al. (2015a); Howard et al. (2017), FRRN-B Pohlen et al. (2017), and FC-DenseNet Jégou et al. (2017). The overall classification performance statistics for all studied models are gathered in Table 4. Figure 11 shows maps produced for several imagelets with the best performing model, FC-DenseNet. The obtained results are compared to prior work, and the classification performance for the different land cover classes is discussed further below.

Accuracy (UA, PA %) per class and model:

LC classes | Test scale | BiSeNet | DeepLabV3+ | SegNet | FRRN-B | U-Net | PSPNet | FC-DenseNet
Urban fabric (100) | 10816 | 26, 21 | 15, 14 | 36, 31 | 38, 30 | 45, 25 | 38, 18 | 62, 27
Agricultural areas (200) | 25160 | 49, 51 | 50, 49 | 69, 66 | 68, 68 | 66, 66 | 53, 48 | 72, 71
Forested areas (300) | 285462 | 90, 91 | 88, 96 | 93, 94 | 92, 95 | 92, 95 | 89, 95 | 93, 96
Peatland, bogs and marshes (400) | 20990 | 54, 43 | 56, 13 | 67, 57 | 71, 55 | 70, 52 | 65, 31 | 74, 58
Water bodies (500) | 53564 | 85, 91 | 94, 92 | 96, 96 | 95, 96 | 96, 96 | 94, 94 | 96, 96
Overall Accuracy (%) | | 83.86 | 85.49 | 89.03 | 89.27 | 89.25 | 86.51 | 90.66
Kappa | | 0.641 | 0.649 | 0.754 | 0.758 | 0.754 | 0.680 | 0.785
Average inference time (s) | | 0.0389 | 0.0267 | 0.0761 | 0.1424 | 0.0848 | 0.0495 | 0.1930

Table 4: Summary of the classification performance and efficiency of the studied deep learning models (UA - user’s accuracy, PA - producer’s accuracy; the average inference time is per image in the dataset)
FC-DenseNet103 (rows: CLC2012 reference classes; columns: Sentinel-1 classification)
CLC2012 \ classified | urban | water | forest | field | peatland | total | PA (%)
urban | 7301999 | 413073 | 15892771 | 3212839 | 221476 | 27042158 | 27.0
water | 78331 | 128294872 | 3457634 | 171029 | 1935276 | 133937142 | 95.8
forest | 3663698 | 2703632 | 686788977 | 12795703 | 7730444 | 713682454 | 96.2
field | 766200 | 121609 | 16527970 | 44866048 | 620934 | 62902761 | 71.3
peatland | 56097 | 1866020 | 19164137 | 1091008 | 30309189 | 52486451 | 57.8
total | 11866325 | 133399206 | 741831489 | 62136627 | 40817319 | 990050966 |
UA (%) | 61.5 | 96.2 | 92.6 | 72.2 | 74.3 | | 90.7
Table 5: Confusion matrix for classification with the FC-DenseNet103 model (the bottom-right cell gives the overall accuracy in %)
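
As a usage example, applying the two sketches above to the confusion matrix of Table 5 reproduces the reported figures (e.g., UA of 61.5% and PA of 27.0% for urban, an overall accuracy of 90.7%, and a Kappa of about 0.785); `per_class_metrics` and `cohen_kappa` are the hypothetical helpers defined earlier.

```python
cm = np.array([
    [7301999,   413073,    15892771,  3212839,  221476],    # urban (reference)
    [78331,     128294872, 3457634,   171029,   1935276],   # water
    [3663698,   2703632,   686788977, 12795703, 7730444],   # forest
    [766200,    121609,    16527970,  44866048, 620934],    # field
    [56097,     1866020,   19164137,  1091008,  30309189],  # peatland
])

precision, recall, _, overall_acc = per_class_metrics(cm)
print(np.round(100 * precision, 1))  # UA per class (%): 61.5, 96.2, 92.6, 72.2, 74.3
print(np.round(100 * recall, 1))     # PA per class (%): 27.0, 95.8, 96.2, 71.3, 57.8
print(round(100 * overall_acc, 1))   # overall accuracy: 90.7
print(round(cohen_kappa(cm), 3))     # Kappa: ~0.785
```
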
Figure 11: Illustration of the FC-DenseNet model performance: a selection of classification results, i.e., direct output of the network without any post-processing (bottom row), versus the reference CORINE data (upper row)

4.1 Classification Performance

All the models performed relatively well in terms of classification, achieving an overall accuracy above 83%. Three models performed particularly well: SegNet (89.0%), FRRN-B (89.3%), and the best performing model, FC-DenseNet, which achieved an overall accuracy of 90.7%.

Before further analysis, let us recall that CORINE is not exclusively a land cover map, but rather a combined land cover and land use map, so specific classes can differ from the ecological classes observed by Sentinel-1. Also, the aggregation to Level-1 is sometimes not strictly “ecological”, nor does it always comply with surface-scattering physics. For example, roads, airports, and major industrial areas often exhibit SAR signatures similar to fields; trees and green vegetation near summer cottages can make them appear closer to forest than to urban fabric; forest on rocky terrain can be misclassified as urban due to very bright targets and strong disruptive features; and confusion between peatland and field areas is also common. Finally, the accuracy of the CORINE data itself is only somewhat higher than 90%.

As for the results across the different land cover classes, all the models performed particularly well in recognising water bodies and forested areas, while urban fabric was the most challenging class for all the models. We expect that the inclusion of the DEM as one layer of the training images helped achieve good results on the water bodies class for most of the models (except for BiSeNet, all the models achieved both user’s and producer’s accuracy above 90%). The urban class was particularly challenging for the following main reasons. First, this class changes the most, as new houses, roads, and urban areas are built. While we took the most suitable available CORINE data in terms of timing for our Sentinel-1 images, there are almost certainly differences between the urban class as it was in 2012 and in 2015-2016. Second, the CORINE map itself does not have perfect accuracy, nor are the aggregation rules perfect. In fact, in the majority of studies where SAR-based classification was compared against CLC or similar data, a poor or modest overall agreement was observed for this class Lonnqvist et al. (2010); Lumsdon et al. (2005); Antropov et al. (2012, 2014), while the user’s accuracy was considerably higher than the producer’s Antropov et al. (2011). The latter is precisely because radar senses sharp boundaries and bright targets very well, whereas such bright targets often do not dominate the whole CORINE Level-1 urban class. We argue that any inaccuracies present will be particularly pronounced in our models for the urban class because of the sharp and sudden boundary changes in this class, unlike for others such as forest and water. The top performing model, FC-DenseNet, performed the best across all the classes. It is particularly notable that it achieved a user’s accuracy (precision) of 62% for the urban class, improving significantly on all the other models. Nevertheless, its producer’s accuracy (recall) of 27% for this class is outperformed by the two other top models, SegNet (31%) and FRRN-B (30%).

We mentioned earlier the sensitivity of SAR backscattering to several ground factors, so that the same classes might appear differently in images acquired over different countries or over distant areas within a country. An interesting indication of our study, however, is that deep learning models may be able to deal with this issue. Namely, we used models pre-trained on ImageNet and fine-tuned them with a relatively small number (14) of Sentinel-1 scenes. The models learned to recognise the varying types of backscattering signal across the country of Finland. This indicates that, with similar fine-tuning, the present models could be relatively easily adapted to other areas and countries with different SAR backscattering patterns. Such robustness and adaptability of the deep learning models comes from their automatic learning of feature representations, without the need for a human to pre-define those features.

4.2 Computational Performance

With our hardware configuration, the training times ranged from days to weeks for the different models. This could be significantly reduced by training each model on a multi-GPU system instead of the single GPU used here.

In terms of inference time, we also observed differences in performance. In Table 4, we present the average inference time per imagelet. The results show a trade-off between classification and computational performance: the best models in terms of classification results (i.e., FC-DenseNet and FRRN-B) require several times longer inference than the rest. Depending on the application, this might not be of particular importance.
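
For reference, average inference times of the kind reported in Table 4 can be measured with a simple per-image timing of the forward pass, along the lines of the following sketch (`model` and `test_imagelets` are placeholders):

```python
import time
import numpy as np

def average_inference_time(model, test_imagelets):
    """Average per-image forward-pass time in seconds."""
    times = []
    for img in test_imagelets:
        start = time.perf_counter()
        model.predict(img[np.newaxis, ...])  # add a batch dimension for a single imagelet
        times.append(time.perf_counter() - start)
    return float(np.mean(times))
```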

4.3 Comparison to Similar Work

The obtained results compare favourably to previous similar studies on land cover classification with SAR data Antropov et al. (2012, 2014); Lonnqvist et al. (2010); Lumsdon et al. (2005); Laurin et al. (2013); Balzter et al. (2015). Depending on the level of class aggregation (4-5 major classes or more), the classification accuracies reported with mostly statistical or classical machine learning approaches ranged from as low as 30% to as high as 80-87% when only SAR imagery was used.

Two recent studies that applied neural networks to SAR imagery classification (albeit in combination with satellite optical data) for land cover mapping are Laurin et al. (2013) and Kussul et al. (2017), with reported classification accuracies of up to 97.5% and %, respectively.

The best model in our experiments achieved an overall accuracy of 90.7%. However, our results were obtained using solely the SAR imagery; in contrast, SAR imagery (PALSAR) alone yielded an overall accuracy of 78.1% in Laurin et al. (2013). The types of classes they studied are also different from ours (crops versus vegetation versus land cover types), and our study is performed over a larger area. Importantly, the previous studies applied different types of models (standard neural networks versus patch-based CNNs versus semantic segmentation). In particular, patch-based CNN models predict a single label for a fixed-size window, while we applied more advanced semantic segmentation models, which operate at the level of individual pixels. Keeping in mind the finding of Laurin et al. (2013) that adding optical images on top of SAR improved the results by over 10%, we expect that our models would perform comparably well or outperform these previous works if applied to combined SAR and optical imagery.

In terms of the deep learning setup, the studies most similar to ours are Mahdianpari et al. (2018) and Mohammadimanesh et al. (2019). However, Mahdianpari et al. (2018) used RapidEye optical imagery at 5 m spatial resolution, and their test site was considerably smaller. The study of Mohammadimanesh et al. (2019), similarly to our research, relied exclusively on SAR imagery; however, they used fully polarimetric images acquired by RADARSAT-2 at a considerably better resolution. They developed an FCN-type semantic segmentation model ‘specifically designed for the classification of wetland complexes using PolSAR imagery’. Using this model to classify eight wetland map classes, they achieved an overall accuracy of %. However, because their model is designed specifically for wetland complexes, it is not clear whether it would generalise to other types of areas. Compared to our study, they focused on a considerably smaller area (nearly the size of a single imagelet we used) and on a very specific task (wetland type mapping). Thus, it is not readily clear how general their approach is and how it compares to the approach presented here.

4.4 Outlook and Future Work

There are several lines for potential improvement based on the results of this study, as well as future work directions.

First, using an even larger set of Sentinel-1 images can be recommended, since large amounts of data are crucial for supervised deep learning models. Here, we processed only around 7,000 imagelets altogether, whereas deep learning algorithms typically become most efficient only once they are trained with hundreds of thousands or millions of images.

Second, if SAR images and reference data of a higher resolution were used, we would also expect better classification performance, as smaller details could potentially be captured. Better agreement between the acquisition times of the reference and SAR imagery can likewise be recommended: the reference and training data should come from the same months or year if possible, and the reference maps should represent the reality as accurately as possible. The models in our experiments were certainly limited by CORINE’s own limited accuracy.

Third, in this study we tested the effectiveness of off-the-shelf deep learning models for land cover mapping from SAR data. While the results show their effectiveness, it is also likely that novel types of models developed specifically for radar data (such as Mohammadimanesh et al. (2019)) will yield even better results. Based on our results, we suggest DenseNet-based models as a starting point. In particular, one could develop deep learning models that directly handle single-look complex (SLC) data, which preserve the phase information.

Focusing on a single season is both an advantage and a limitation. On the one hand, we avoided confusion between SAR signatures that vary seasonally for several land cover classes. On the other hand, the multitemporal dynamics themselves could potentially be used as an additional class-discriminating feature. Incorporating the seasonal dynamics of each land cover pixel (as a time series) is left for future work, possibly requiring the incorporation of recurrent neural networks into the approach.

As discussed in Section 3.1.1, it could also be beneficial to use more detailed (specific) land cover classes, as the aggregation of smaller LC classes into Level-1 CORINE classes is not strictly ecological and mixes several distinct SAR signatures within one class, thus causing additional confusion for the classifier. The classified specific classes could later be aggregated into larger classes, potentially yielding improved performance Hame et al. (2013).

Finally, we used only SAR images and a freely available DEM for the presented large-scale land cover mapping. If other types of remote sensing images, in particular optical images, were combined with these, we expect that the results would improve significantly. This is, however, only true for areas where such imagery can be collected given the cloud coverage, so an operational scenario would potentially require at least two models (with and without optical satellite imagery). It is also important to assess the added value of SAR imagery with deep learning models when optical satellite images are available, as well as possible data fusion and decision fusion scenarios, before a decision on the mapping approach is made Hame et al. (2013).

5 Conclusion

Our study demonstrated the potential of applying state-of-the-art semantic segmentation models to SAR image classification with high accuracy. Several models were benchmarked in a countrywide classification experiment using Sentinel-1 IW-mode SAR data, reaching nearly 91% overall classification accuracy with the best performing model (FC-DenseNet). Given that the 14 Sentinel-1 scenes used resulted in about 7,000 training images, this indicates strong potential for fine-tuning pre-trained CNNs, which seems particularly suitable when the number of training images is limited (thousands or tens of thousands instead of millions). In addition to suggesting the best candidate semantic segmentation models for land cover mapping with SAR data (namely, the DenseNet-based models), our study offers baseline results against which newly proposed models should be evaluated. Several possible improvements were identified and will be addressed in future work, including multitemporal approaches, data fusion, very high-resolution SAR imagery, and models developed specifically for SAR data.

Footnotes

  1. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
  2. journal: Journal of LaTeX Templates
  3. https://land.copernicus.eu/pan-european/corine-land-cover
  4. http://image-net.org/challenges/LSVRC/2012
  5. https://github.com/tensorflow/models/tree/master/research/slim#pre-trained-models
  6. Vertical flip operation switches between top-left and bottom-left image origin (reflection along the central horizontal axis), and horizontal flip switches between top-left and top-right image origin (reflection along the central vertical axis)
  7. Effectively, per class accuracy is defined as the recall obtained on each class.

References

  1. Tensorflow: a system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp. 265–283. Cited by: §3.6.4.
  2. Dual and single polarized SAR image classification using compact convolutional neural networks. Remote Sensing 11 (11), pp. 1340. Cited by: §1.2.
  3. High spatial resolution land use and land cover mapping of the brazilian legal amazon in 2008 using landsat-5/tm and modis data. Acta Amazonica 46 (3), pp. 291–302. Cited by: §1.1, §1.
  4. Land cover and soil type mapping from spaceborne PolSAR data at L-band with probabilistic neural network. IEEE Transactions on Geoscience and Remote Sensing 52 (9), pp. 5256–5270. External Links: Document Cited by: §1.1, §1.1, §1.1, §1.1, §1.3, §4.1, §4.3.
  5. Volume scattering modeling in PolSAR decompositions: study of ALOS PALSAR data over boreal forest. IEEE Transactions on Geoscience and Remote Sensing 49 (10), pp. 3838–3848. Cited by: §4.1.
  6. PolSAR mosaic normalization for improved land-cover mapping. IEEE Geoscience and Remote Sensing Letters 9 (6), pp. 1074–1078. Cited by: §1.3, §4.1, §4.3.
  7. Advances in remote sensing of agriculture: context description, existing operational monitoring systems and major information needs. Remote Sensing 5 (2), pp. 949–981. External Links: Document Cited by: §1.1.
  8. Segnet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE transactions on pattern analysis and machine intelligence 39 (12), pp. 2481–2495. Cited by: §1.3, Figure 5, §3.4, §4.
  9. Mapping CORINE land cover from Sentinel-1A SAR and SRTM digital elevation model data using random forests. Remote Sensing 7 (11), pp. 14876–14898. External Links: Document Cited by: §1.1, §1.1, §1.1, §3.3, §4.3.
  10. Fusion of quickbird ms and radarsat sar data for urban land-cover mapping: object-based and knowledge-based approach. International Journal of Remote Sensing 31 (6), pp. 1391–1410. Cited by: §1.1.
  11. Deep learning of representations for unsupervised and transfer learning. In Proceedings of ICML Workshop on Unsupervised and Transfer Learning, pp. 17–36. Cited by: §3.5.
  12. The concept of essential climate variables in support of climate research, applications, and policy. Bulletin of the American Meteorological Society 95 (9), pp. 1431–1443. Cited by: §1.
  13. CORINE land cover technical guide: addendum 2000. Cited by: §1.
  14. An advanced system for the automatic classification of multitemporal sar images. IEEE Transactions on Geoscience and Remote Sensing 42 (6), pp. 1321–1334. External Links: Document Cited by: §1.1.
  15. The CORINE land cover 2000 project. EARSeL eProceedings 3 (3), pp. 331–346. Cited by: §1, §3.3.
  16. CORINE land cover and land cover change products. In Land Use and Land Cover Mapping in Europe, pp. 55–74. Cited by: §1.
  17. Multi-temporal polarimetric radarsat-2 for land cover monitoring in northeastern ontario, canada. Remote Sensing 6 (3), pp. 2372–2392. External Links: Link, ISSN 2072-4292, Document Cited by: §1.1.
  18. Land cover mapping of wetland areas in an agricultural landscape using sar and landsat imagery. Journal of Environmental Management 90 (7), pp. 2270–2277. Cited by: §1.1.
  19. Global land cover mapping at 30 m resolution: a pok-based operational approach. ISPRS Journal of Photogrammetry and Remote Sensing 103, pp. 7–27. Cited by: §1.
  20. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE transactions on pattern analysis and machine intelligence 40 (4), pp. 834–848. Cited by: §1.3, §3.4.4, §3.4, §4.
  21. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587. Cited by: §3.4.4.
  22. Encoder-decoder with atrous separable convolution for semantic image segmentation. arXiv preprint arXiv:1802.02611. Cited by: §1.3, Figure 7, §3.4.4, §3.4, §4.
  23. Aircraft detection by deep belief nets. In Pattern Recognition (ACPR), 2013 2nd IAPR Asian Conference on, pp. 54–58. Cited by: §1.2.
  24. Vehicle detection in satellite images by hybrid deep convolutional neural networks. IEEE Geoscience and remote sensing letters 11 (10), pp. 1797–1801. Cited by: §1.2.
  25. Deep learning-based classification of hyperspectral data. IEEE Journal of Selected topics in applied earth observations and remote sensing 7 (6), pp. 2094–2107. Cited by: §1.2, §1.2.
  26. Xception: deep learning with depthwise separable convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1251–1258. Cited by: §3.4.1.
  27. Fusion of sentinel-1a and sentinel-2a data for land cover mapping: a case study in the lower magdalena region, colombia. Journal of Maps 13 (2), pp. 718–726. External Links: Document Cited by: §1.1.
  28. A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20 (1), pp. 37 – 46. Cited by: §3.7, §3.7.
  29. Landsat’s role in ecological applications of remote sensing. BioScience 54 (6), pp. 535–545. External Links: Document Cited by: §1.1.
  30. Supervised methods of image segmentation accuracy assessment in land cover mapping. Remote sensing of environment 205, pp. 338–351. Cited by: §3.7.
  31. What is a good evaluation measure for semantic segmentation?. In BMVC, Vol. 27, pp. 2013. Cited by: §3.7.
  32. Land use and land cover mapping in the brazilian amazon using polarimetric airborne p-band sar data. IEEE Transactions on Geoscience and Remote Sensing 46 (10), pp. 2956–2970. Cited by: §1.1.
  33. Combined Landsat and L-band SAR data improves land cover classification and change detection in dynamic tropical landscapes. Remote Sensing 10 (2). External Links: ISSN 2072-4292, Document Cited by: §1.1.
  34. Knowledge-based land-cover classification using ERS-1/JERS-1 SAR composites. IEEE Transactions on Geoscience and Remote Sensing 34 (1), pp. 83–99. External Links: Document, ISSN 0196-2892 Cited by: §1.1.
  35. SAR image segmentation based on convolutional-wavelet neural network and markov random field. Pattern Recognition 64, pp. 255–267. Cited by: §1.2.
  36. Characterization of land cover types in terrasar-x images by combined analysis of speckle statistics and intensity information. IEEE Transactions on Geoscience and Remote Sensing 49 (6), pp. 1911–1925. External Links: Document, ISSN 0196-2892 Cited by: §1.1.
  37. Using alos/palsar and radarsat-2 to map land cover and seasonal inundation in the brazilian pantanal. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 3 (4), pp. 560–575. External Links: Document, ISSN 1939-1404 Cited by: §1.1.
  38. A review on deep learning techniques applied to semantic segmentation. arXiv preprint arXiv:1704.06857. Cited by: §3.4.
  39. Deep recurrent neural networks for land-cover classification using Sentinel-1 InSAR time series. In IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, Vol. , pp. 473–476. External Links: Document, ISSN 2153-6996 Cited by: §1.1.
  40. Mapping and monitoring carbon stocks with satellite observations: a comparison of methods. Carbon Balance and Management 4. External Links: Document Cited by: §1.1.
  41. Deep learning. Cited by: §1, Table 1, §2.
  42. Intra-annual reflectance composites from sentinel-2 and landsat for national-scale crop and land cover mapping. Remote sensing of environment 220, pp. 135–151. Cited by: §1.
  43. Improved mapping of tropical forests with optical and SAR imagery, part I: forest cover and accuracy assessment using multi-resolution data. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 6 (1), pp. 74–91. Cited by: §1.1, §1.1, §1.1, §4.4, §4.4.
  44. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: §2, §2, §3.4.1.
  45. Completion of the 2011 national land cover database for the conterminous united states–representing a decade of land cover change information. Photogrammetric Engineering & Remote Sensing 81 (5), pp. 345–354. Cited by: §1.
  46. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861. Cited by: §1.3, §3.4.3, §3.4, §4.
  47. Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery. Remote Sensing 7 (11), pp. 14680–14707. Cited by: §1.2.
  48. Densely connected convolutional networks.. In CVPR, Vol. 1, pp. 3. Cited by: §2, §3.4.7.
  49. Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. The Journal of physiology 160 (1), pp. 106–154. Cited by: §2.
  50. Surface object recognition with cnn and svm in landsat 8 images. In 2015 14th IAPR International Conference on Machine Vision Applications (MVA), Vol. , pp. 341–344. External Links: Document, ISSN Cited by: §1.2.
  51. The one hundred layers tiramisu: fully convolutional densenets for semantic segmentation. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on, pp. 1175–1183. Cited by: §1.3, §2, Figure 10, §3.4.7, §3.4, §4.
  52. 3D convolutional neural networks for human action recognition. IEEE transactions on pattern analysis and machine intelligence 35 (1), pp. 221–231. Cited by: §2.
  53. Finnish corine land cover 2000 classification.. XXth ISPRS Congress, Anchorage, US (), pp. . Cited by: §3.3.
  54. The production of finnish corine land cover 2000 classification.. XXth ISPRS Congress, Istanbul, Turkey (), pp. . Cited by: §3.3.
  55. A meta-analysis of remote sensing research on supervised pixel-based land-cover image classification processes: General guidelines for practitioners and future research. Remote Sensing of Environment 177, pp. 89–100. Cited by: §1.1.
  56. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pp. 1097–1105. Cited by: §1, Table 1, §2.
  57. Deep learning classification of land cover and crop types using remote sensing data. IEEE Geoscience and Remote Sensing Letters 14 (5), pp. 778–782. Cited by: §1.2, §1.2, §4.3.
  58. Optical and sar sensor synergies for forest and land cover mapping in a tropical site in west africa. International Journal of Applied Earth Observation and Geoinformation 21, pp. 7–16. Cited by: §1.1, §4.3, §4.3, §4.3.
  59. Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks 3361 (10), pp. 1995. Cited by: §1.2.
  60. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324. Cited by: §1.2, §1, §2.
  61. A comparative analysis of alos palsar l-band and radarsat-2 c-band data for land-cover classification in a tropical moist region. ISPRS Journal of Photogrammetry and Remote Sensing 70, pp. 26–38. External Links: Document Cited by: §1.1.
  62. Homogeneous region segmentation for SAR images based on two steps segmentation algorithm. In Computers, Communications, and Systems (ICCCS), International Conference on, pp. 196–200. Cited by: §1.2.
  63. Hyperspectral classification via deep networks and superpixel segmentation. International Journal of Remote Sensing 36 (13), pp. 3459–3482. External Links: Document Cited by: §1.2.
  64. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3431–3440. Cited by: Figure 1, Table 1, §2.
  65. Assessment of alos palsar 50 m orthorectified fbd data for regional land cover classification by support vector machines. IEEE Transactions on Geoscience and Remote Sensing 49 (6), pp. 2135–2150. External Links: Document, ISSN 0196-2892 Cited by: §1.1.
  66. Polarimetric SAR data in land cover mapping in boreal zone. IEEE Transactions on Geoscience and Remote Sensing 48 (10), pp. 3652–3662. External Links: Document, ISSN 0196-2892 Cited by: §1.1, §1.1, §1.1, §4.1, §4.3.
  67. Polarimetric classification of land cover for Glen Affric radar project. IEE Proceedings - Radar, Sonar and Navigation 152 (6), pp. 404–412. External Links: Document, ISSN 1350-2395 Cited by: §1.1, §4.1, §4.3.
  68. Multiview deep learning for land-use classification. IEEE Geoscience and Remote Sensing Letters 12 (12), pp. 2448–2452. Cited by: §1.2.
  69. Very deep convolutional neural networks for complex land cover mapping using multispectral remote sensing imagery. Remote Sensing 10 (7), pp. 1119. Cited by: §1.2, §1.2, §4.3.
  70. A new fully convolutional neural network for semantic segmentation of polarimetric SAR imagery in complex land cover ecosystem. ISPRS Journal of Photogrammetry and Remote Sensing 151, pp. 223 – 236. External Links: ISSN 0924-2716 Cited by: §1.2, §4.3, §4.4.
  71. Mapping rice extent and cropping scheme in the mekong delta using sentinel-1a data. Remote Sensing Letters 7 (12), pp. 1209–1218. Cited by: §1.1.
  72. Multi-temporal radarsat-2 polarimetric sar data for urban land-cover classification using an object-based support vector machine and a rule-based approach. International Journal of Remote Sensing 34 (1), pp. 1–26. External Links: Document Cited by: §1.1.
  73. Integration of multitemporal/polarization c‐band sar data sets for land‐cover classification. International Journal of Remote Sensing 29 (16), pp. 4667–4688. External Links: Document Cited by: §1.1.
  74. Variations of microwave scattering properties by seasonal freeze/thaw transition in the permafrost active layer observed by ALOS PALSAR polarimetric data. Remote Sensing 7 (12), pp. 17135–17148. External Links: ISSN 2072-4292, Document Cited by: §1.1.
  75. Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?. In 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Vol. , pp. 44–51. External Links: Document, ISSN 2160-7508 Cited by: §1.2.
  76. Full-resolution residual networks for semantic segmentation in street scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4151–4160. Cited by: §1.3, Figure 8, §3.4.5, §3.4, §4.
  77. U-net: convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention (MICCAI), LNCS, Vol. 9351, pp. 234–241. Note: (available on arXiv:1505.04597 [cs.CV]) External Links: Link Cited by: §1.3, §3.4, §4.
  78. U-net: convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pp. 234–241. Cited by: Figure 6, §3.4.3.
  79. Imagenet large scale visual recognition challenge. International journal of computer vision 115 (3), pp. 211–252. Cited by: §2.
  80. Deep convolutional neural networks for LVCSR. In Acoustics, speech and signal processing (ICASSP), 2013 IEEE international conference on, pp. 8614–8618. Cited by: §2.
  81. C-band SAR data for mapping crops dominated by surface or volume scattering. IEEE Geoscience and Remote Sensing Letters 11 (2), pp. 384–388. Cited by: §1.1.
  82. Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556. Cited by: §1, §2.
  83. Potential of different optical and SAR data in forest and land cover classification to support redd+ mrv. Remote Sensing 10 (6). External Links: ISSN 2072-4292, Document Cited by: §1.1.
  84. Terrain-flattened gamma nought Radarsat-2 backscatter. Canadian Journal of Remote Sensing 37 (5), pp. 493–499. Cited by: §3.2.
  85. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1–9. Cited by: §2.
  86. Assessment of land-cover data for land-surface modelling in regional climate studies. Boreal Environment Research 20 (2), pp. 243–260. Cited by: §1.1, §1.
  87. Cropland classification using Sentinel-1 time series: methodological performance and prediction uncertainty assessment. Remote Sensing 11 (21). External Links: ISSN 2072-4292, Document Cited by: §1.1.
  88. GMES Sentinel-1 mission. Remote Sensing of Environment 120, pp. 9–24. External Links: Document Cited by: §1, §1, §3.2.
  89. Multiclass feature learning for hyperspectral image classification: sparse and hierarchical solutions. ISPRS Journal of Photogrammetry and Remote Sensing 105, pp. 272–285. Cited by: §1.2, §1.2, §1.2.
  90. Land cover characterization and classification of arctic tundra environments by means of polarized synthetic aperture x- and c-band radar (polsar) and landsat 8 multispectral imagery — richards island, canada. Remote Sensing 6 (9), pp. 8565–8593. External Links: Link, ISSN 2072-4292, Document Cited by: §1.1.
  91. Understanding the temporal behavior of crops using Sentinel-1 and Sentinel-2-like data for agricultural applications. Remote Sensing of Environment 199, pp. 415–426. Cited by: §1.1.
  92. Sincohmap: land-cover and vegetation mapping using multi-temporal Sentinel-1 interferometric coherence. In IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, Vol. , pp. 6631–6634. External Links: Document, ISSN 2153-6996 Cited by: §1.1.
  93. Deep hierarchical representation and segmentation of high resolution remote sensing images. In Geoscience and Remote Sensing Symposium (IGARSS), 2015 IEEE International, pp. 4320–4323. Cited by: §1.2, §1.2, §1.2.
  94. Multi-pixel simultaneous classification of polsar image using convolutional neural networks. Sensors 18 (3), pp. 769. Cited by: §1.2.
  95. Classifier ensembles for land cover mapping using multitemporal sar imagery. ISPRS Journal of Photogrammetry and Remote Sensing 64 (5), pp. 450–457. External Links: Document Cited by: §1.1.
  96. Classifier ensembles for land cover mapping using multitemporal sar imagery. ISPRS Journal of Photogrammetry and Remote Sensing 64 (5), pp. 450–457. Cited by: §1.1, §1.1.
  97. Automatic building segmentation of aerial imagery using multi-constraint fully convolutional networks. Remote Sensing 10 (3), pp. 407. Cited by: §1.2.
  98. Bisenet: bilateral segmentation network for real-time semantic segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 325–341. Cited by: §1.3, Figure 4, §3.4, §4.
  99. Scene classification via a gradient boosting random convolutional network framework. IEEE Transactions on Geoscience and Remote Sensing 54 (3), pp. 1793–1802. Cited by: §1.2.
  100. {}-Regularized deconvolution network for the representation and restoration of optical remote sensing images. IEEE Transactions on Geoscience and Remote Sensing 52 (5), pp. 2617–2627. Cited by: §1.2.
  101. Deep learning for remote sensing data: a technical tutorial on the state of the art. IEEE Geoscience and Remote Sensing Magazine 4 (2), pp. 22–40. Cited by: §1.2, §1.2.
  102. Pyramid scene parsing network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2881–2890. Cited by: §1.3, Figure 9, §3.4.6, §3.4, §4.
  103. Detailed dynamic land cover mapping of chile: accuracy improvement by integrating multi-temporal data. Remote Sensing of Environment 183, pp. 170–185. Cited by: §1.1, §1, §3.6.1.
  104. Deep learning in remote sensing: a review. arXiv preprint arXiv:1710.03959. Cited by: §1.2.