Rule based End-to-End Learning Framework for Urban Growth Prediction


Saptarshi Pal (Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, India.) Soumya K Ghosh (Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, India.)

Due to the rapid growth of urban areas in the past decades, it has become increasingly important to model and monitor urban growth in mega cities. Although several researchers have proposed models for simulating urban growth, these models have depended primarily on manually selected spatial and non-spatial explanatory features. A practical difficulty with this approach is the manual selection procedure, which tends to make the model design process laborious and non-generic. Although explanatory features help in understanding a complex process, there is no standard set of features for urban growth prediction on which scholars have reached consensus. Hence, designing and deploying systems for urban growth prediction have remained challenging tasks. In order to reduce the dependency on human-devised features, we propose a novel End-to-End prediction framework that represents remotely sensed satellite data in terms of the rules of a cellular automata model in order to improve the performance of urban growth prediction. Using our End-to-End framework, we have achieved superior performance in the Figure of Merit, Producer’s accuracy, User’s accuracy, and Overall accuracy metrics over existing learning based methods.

Keywords— End-to-End learning, Decision trees, Urban growth prediction, Landsat, Representation Learning.

1 Introduction

Urban regions are fast growing entities in the modern world. Areas which were on the outskirts of a city or town twenty years ago are now inhabited and developed into urban form. Research has shown that uncontrolled sprawl of urban areas has potentially adverse effects on the environment in terms of biodiversity, loss of habitat and change of landscape ([4, 20]). The United Nations has predicted, based on past data, that the proportion of the population living in urban areas will rise further. With the uncontrolled expansion of cities, it has become important to monitor and control changes in urban landuse.

A fundamental difficulty in monitoring urban areas is their huge spread, which makes manual tracking of global landuse/built-up changes intractable. However, several remote sensing satellites periodically capture images of the earth surface, which contain plenty of real time information regarding changes in the urban landscape. Hence, satellite imagery has been considered an effective component for real time monitoring of urban growth.

Urban growth monitoring using satellite derived products has been addressed by several researchers ([20, 28, 18]). Models using cellular automaton, agent based methods, neural networks, logistic regression, fractals have been developed for robust and accurate prediction of urban growth ([19, 6, 21, 1]). Cellular automata is a widely preferred model for urban growth prediction due to its simplicity. Several CA models and corresponding case studies have been done on various major cities for detecting urban change, for instance, Vancouver Britain, Wuhan China etc ([1, 26, 15, 30, 9]). Recently, artificial neural networks have also been quite popular in the land use change prediction ([28, 25]) and studies using the combination of neural networks and cellular automata models have been proposed ([23]). On the other hand, support vector machines (SVM), Decision trees and Random Forest based models have also been proposed ([26, 15, 2, 14, 17]). Besides, logistic regression has been quite suitable for urban growth prediction in order to assess the driving forces behind urban growth ([13]). Overall, among the learning techniques used for prediction, there have been broadly four categories which have dominated the urban growth community, namely logistic regression, artificial neural networks, support vector machines and inductive learning approaches ([21]).

Each of the models developed using the learning methodologies mentioned above has utilized spatial and non-spatial explanatory features for predicting urban growth ([21]). Spatial features such as distance from urban areas, water bodies, swamps etc. are derived manually from satellite imagery using supervised land use classification methods ([28, 29]). Other kinds of spatial factors such as elevation and distance from roads and railways are derived from other open spatial data repositories like Open Street Maps and the US Shuttle Radar Topography Mission. The issues with this approach are discussed as follows.

  • It has been evident from practical experience ([5]) that feature engineering significantly affects the results of machine learning algorithms. At present, most urban growth prediction models use distance based features, for instance, distance from roads, city centers, railways, water bodies etc., and the distance metric is generally the Euclidean distance. Although the distance from certain regions does determine built-up, it may not always be the Euclidean distance. Moreover, there are other distance metrics, for instance, the Minkowski distance, the Mahalanobis distance, etc., which have not been sufficiently used in urban growth prediction modeling. It may be possible to achieve better performance using other distance metrics. The issue, in this case, is the selection of a suitable distance based criterion, which requires manual effort.

  • The spectral properties can also be a crucial feature as different spectral properties indicate different landuse and different landuse may have different growth patterns. Hence another limitation of distance based features is that they are unable to capture the spectral properties of landuse.

  • Due to the disparity in the geography of our earth surface and noise in satellite imagery, it is often excessively labor-intensive to derive all the different kinds of landuse feature maps from satellite imagery.
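To make the choice of distance metric raised in the first point concrete, the following minimal Python sketch computes the Minkowski distance, of which the Manhattan (order 1) and Euclidean (order 2) distances are special cases. The function name `minkowski_distance` and the pixel coordinates are illustrative assumptions and are not taken from any of the cited models.

```python
def minkowski_distance(a, b, order=2.0):
    """Minkowski distance of the given order between two equal-length points."""
    if len(a) != len(b):
        raise ValueError("points must have the same dimension")
    return sum(abs(x - y) ** order for x, y in zip(a, b)) ** (1.0 / order)


# Hypothetical pixel coordinates of a cell and of the nearest road pixel.
cell = (0.0, 0.0)
road = (3.0, 4.0)

euclidean = minkowski_distance(cell, road, order=2)  # straight-line distance
manhattan = minkowski_distance(cell, road, order=1)  # grid-path distance
```

The same cell therefore receives different feature values depending on the order chosen, which is the manual design decision the framework aims to avoid.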

We intend to address these issues by introducing a framework in which one can represent the data in a certain format and then use a classifier directly to generate approximate features. The features are approximate because they are generated in an encoded form. We refer to this form as the knowledge structure, and it varies according to the classifier used for knowledge representation. Since we do not know a priori what kinds of features a classifier can generate from data, it is a challenge to find out which knowledge structure is optimal for a particular task.

The idea of End-to-End learning is to bypass intermediate feature engineering steps in model building and design a model that can directly learn robust representations from input data ([5, 10]). End-to-End learning is a paradigm of machine learning that has recently gained importance due to the advent of Representation learning and high-performance computing infrastructures. It has delivered exceptional performance in domains like image classification, speech processing, image segmentation etc. ([16, 3, 11]). The majority of these techniques demonstrate the existence of complex relationships between various real world objects or phenomena and the conclusions that we draw from them. These complex relations are stored in our memory in terms of intricate knowledge structures which are hard to decode but make human beings quite efficient in motor skills.

Inspired by the above works, we put forward our hypothesis: automatically extracted information from remotely sensed data can be an important feature in modeling urban growth. The primary objective of this paper is to utilize the concept of End-to-End learning in developing a novel framework for prediction of urban growth. The proposed framework utilizes unsupervised representation learning and supervised classification methods to learn the rules of an urban growth cellular automata model from raw satellite data to form various knowledge structures. Our proposed framework gives better prediction performance than existing learning based methodologies in terms of various metrics which evaluate urban growth simulation performance.

It should be noted that we are not suggesting that the use of distance based features be abandoned. We also do not claim that using satellite imagery alone is good enough to build an urban growth model. On the contrary, this paper provides an initial framework which can be upgraded further to plug in other kinds of features, including distance based features, with the constraint that such an addition needs minimal or no manual selection. This is a key feature of End-to-End systems, as the performance of such a system depends more on parameters and less on the human ability to recognize patterns.

The key contributions of this paper are as follows.

  • Eliminating feature engineering module from prediction framework.

  • Generating representations from spectral information present in remote sensing satellite imagery in improving performance of urban growth prediction.

The rest of the paper is organized as follows. In Section 2, we define the cellular automata model for urban growth prediction. Section 3 describes the framework that has been used to discover and store rules. Section 4 presents results and discussions of the experiments conducted in the region of Mumbai, India. Finally, the last section provides the conclusion and future research directions.

2 Urban Growth modeled using Cellular Automata (CA)

Urban growth, being a spatiotemporal dynamic process, can be modeled using the theory of cellular automata. A typical CA model is composed of an infinite array of cells having a finite number of states, which transform at discrete time steps using certain transition rules. The transition rules of a CA model signify the relationship between a cell and its neighborhood. The neighborhood criterion can be the Von Neumann neighborhood (4 neighbors) or the Moore neighborhood (8 neighbors). Transition rules are iteratively applied to the cells to simulate dynamic processes over multiple time steps.

We define a cell state at point and time as , where is a binary label variable representing Built-up and Non Built-up and is a transition indicator. The range of label variable is , where represents Non Built-up and represents Built-up. Let us represent the set of Built-up and Non Built-up pixels originally as and . The transition indicator indicates whether the cell under consideration has undergone a transformation in the current time step. The range of the transition indicator is , where represents transition from Non Built-up to Non Built-up (), Built-up to Built-up (), Non Built-up to Built-up () and Built-up to Non Built-up (). A pictorial representation of the transition classes is displayed in Fig. 1 (b).

We define the update rule for the transition indicator as


, where is the neighborhood criterion and is an update function to be modeled. The transition function for update of the state to given as is,


The theory of cellular automata is convenient in this scenario because it considers the space as a discrete array of cells. Since in our case, the data is of raster type, it is already in form of discrete matrix of pixels. So discretization of space is not required to be done explicitly. If the data was in vector form, then we would have to define the discretization of the space separately.
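As an illustration of how a transition rule is applied to a raster, the following Python sketch performs one synchronous CA step on a small binary grid using a Moore neighborhood. The 0/1 encoding, the helper names and especially `toy_rule` are assumptions made for this sketch; in the framework the rule is learned from data rather than written by hand.

```python
def moore_neighborhood(grid, row, col):
    """Return the states of the up to 8 Moore neighbors of (row, col)."""
    rows, cols = len(grid), len(grid[0])
    states = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the cell itself
            r, c = row + dr, col + dc
            if 0 <= r < rows and 0 <= c < cols:  # cells outside the raster are ignored
                states.append(grid[r][c])
    return states


def ca_step(grid, transition):
    """Apply transition(cell_state, neighbor_states) to every cell synchronously."""
    return [
        [transition(grid[r][c], moore_neighborhood(grid, r, c))
         for c in range(len(grid[0]))]
        for r in range(len(grid))
    ]


def toy_rule(state, neighbors):
    """Hypothetical rule: built-up persists; a cell with >= 3 built-up neighbors becomes built-up."""
    return 1 if state == 1 or sum(neighbors) >= 3 else 0


# 0 = Non Built-up, 1 = Built-up (assumed encoding for this sketch).
grid_t = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
grid_t1 = ca_step(grid_t, toy_rule)
```

Because the raster is already a matrix of pixels, the grid can be indexed directly and no separate discretization step appears in the code.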

2.1 Extension of the CA model

The urban growth CA model can be extended to include other variables which are driving factors in urban growth ([31, 9]). In order to insert the raster information in the CA model, we have modified our CA model in the following way.

  • The state variable is extended to include a raster data variable from the images to form . The raster image can consist of multiple bands in which case, represents a vector of values of all bands at point .

  • The update equation (1) gets transformed to include the raster variable in the following way.


The above modification allows us to reflect upon the relationship between the built-up variables and the raster variables through the function and . The function is an encoding function which is responsible for generating a desired length encoding of the information present in . Thus, we have reduced the problem of modeling urban growth from raster data to modeling the function and .

3 End-to-End Framework for learning Cellular Automata rules

In this section, we discuss the details of our architecture, illustrated in Fig. 1 (a), for generating a transition function for the cellular automata model described in eqn (3). The architecture consists of the following components.

  • Two data repositories consisting of built-up rasters and remotely sensed rasters which will be required during model building.

  • Data representation module is for representing the data obtained from the data repositories in such a way so as to facilitate learning and knowledge representation.

  • The learning algorithm stage is expected to learn patterns from the data which is received from the previous stage.

  • The knowledge representation module stores the knowledge learned by the learning algorithm for further querying.

  • Finally, the prediction stage predicts future built-up conditions.

3.1 Data Representation

Figure 1: (a) End-to-End framework for prediction of urban growth (b) Venn diagram of transition classes relevant to Urban growth prediction

The data representation component consists of procedures to build a data matrix and a label matrix from the raster and built-up maps. The data matrix is formed from the raster and built-up maps in form of a matrix of size , where is the number of data points in the raster, is a neighborhood function and is the cardinality of a set. The neighborhood set plays a significant role in the framework, as it represents the effect of neighboring pixels. Increase in the neighborhood set size increases the model complexity by increasing the dimensions of the data matrix ([27]).

A crucial step at this stage is the design of the encoder function . We can build an encoder function by training an autoencoder ([vincent2010stacked]) with the vectors of form to form a length representation. An autoencoder is a neural network consisting of an encoder, which encodes the data vector of a certain size, and a decoder, which reconstructs the data vector back from the encoding. The network is trained by minimizing the error between the decoded output and the original input. Finally, the encoder from the autoencoder is used to encode the input; the result is known as a representation. Since the method is unsupervised, there is no requirement to label the data for encoder design. The encoder design for our framework is depicted in Fig. 2. If a vector is represented by , then can be expressed as


, where is a weight matrix and is an activation function (eg. sigmoid, relu). The eqn. 4 represents a layer feed forward neural network which encodes the information in a raster position and its neighborhood. Similar to the encoder function, we can express the decoder function (say ) as


, where is a weight matrix and is an activation function. The only constraint in eqn. 5 is that the output vector must have the same dimension as . Then the optimization of the parameters can be done using the following minimization equation.


Subsequent to the optimization, the representations of length can be generated from the vectors using the function as in eqn. 4.
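The encoder, decoder and reconstruction objective described above can be sketched in plain Python as follows. This is a minimal illustration under several assumptions: a single sigmoid layer without bias terms on each side, a squared reconstruction error, plain stochastic gradient descent as the optimizer, and small hypothetical input vectors standing in for neighborhood band values. All function names are illustrative.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def encode(W_enc, x):
    """Encoder: one feed-forward layer with a sigmoid activation."""
    return [sigmoid(z) for z in matvec(W_enc, x)]


def decode(W_dec, e):
    """Decoder: maps the encoding back to the input dimension."""
    return [sigmoid(z) for z in matvec(W_dec, e)]


def reconstruction_error(W_enc, W_dec, data):
    """Mean squared error between inputs and their reconstructions."""
    total = 0.0
    for x in data:
        y = decode(W_dec, encode(W_enc, x))
        total += sum((yi - xi) ** 2 for yi, xi in zip(y, x))
    return total / len(data)


def train_autoencoder(data, code_len, epochs=200, lr=0.5, seed=0):
    """Fit encoder and decoder weights by stochastic gradient descent."""
    rng = random.Random(seed)
    dim = len(data[0])
    W_enc = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(code_len)]
    W_dec = [[rng.uniform(-0.5, 0.5) for _ in range(code_len)] for _ in range(dim)]
    for _ in range(epochs):
        for x in data:
            e = encode(W_enc, x)
            y = decode(W_dec, e)
            # Backpropagate the squared error through both sigmoid layers.
            delta_out = [2 * (yi - xi) * yi * (1 - yi) for yi, xi in zip(y, x)]
            delta_hid = [
                sum(delta_out[i] * W_dec[i][j] for i in range(dim)) * e[j] * (1 - e[j])
                for j in range(code_len)
            ]
            for i in range(dim):
                for j in range(code_len):
                    W_dec[i][j] -= lr * delta_out[i] * e[j]
            for j in range(code_len):
                for i in range(dim):
                    W_enc[j][i] -= lr * delta_hid[j] * x[i]
    return W_enc, W_dec


# Hypothetical normalized raster values of four neighborhoods.
data = [
    [0.9, 0.8, 0.1, 0.2],
    [0.1, 0.2, 0.9, 0.8],
    [0.8, 0.9, 0.2, 0.1],
    [0.2, 0.1, 0.8, 0.9],
]
initial = train_autoencoder(data, code_len=2, epochs=0)  # untrained weights
trained = train_autoencoder(data, code_len=2)
error_before = reconstruction_error(initial[0], initial[1], data)
error_after = reconstruction_error(trained[0], trained[1], data)
representation = encode(trained[0], data[0])  # length-2 encoding of one vector
```

After training, only `encode` is needed to produce representations; the decoder is used solely to define the training objective.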

Figure 2: Learning Representation from Neighborhood using an Autoencoder.

The data matrix can be obtained by concatenating the built-up vectors () and raster representation () columnwise. The raster representation variables are non-categorical with values ranging from , whereas the built-up variables are categorical with value either (Non Built-up) or (Built-up). The rows of the data matrix can be directly supplied to the function given in eqn. (3).

The label matrix is an matrix which consists of labels of the transition classes, namely Non Built-up to Non Built-up, Non Built-up to Built-up, Built-up to Non Built-up and Built-up to Built-up. The transition classes and their class labels are illustrated as a Venn diagram in Fig 1 (b). Classes and are within classes and because the number of transition cells in a dataset is usually smaller than the number of persistent cells. The rows of the data matrix form the input vectors to the function , whereas the rows of the label matrix form the output values of the function . The procedure for building the data and label matrices is given in Algorithm 1.

Contrary to other works ([20, 28, 31]), where two classes {Built-up, Non Built-up} have been used for training classifiers, we have used four classes in our label matrix. This resolves the following two issues.

  • With the two class approach, the framework turns into a built-up classification problem because the raster images consist of information regarding built-up. In our case, we have four classes out of which two are transition and two are persistence classes. This tweak divides the data matrix into four parts and patterns of transition and persistence classes get separated. The classifier still does classification, but the pattern classification problem turns into, “Which patterns lead to transition and which ones lead to persistence?”.

  • Comparing the Built-up maps of two close time steps, we have observed that transitions from Non Built-up to Built-up and from Built-up to Non Built-up occur in a small fraction of cells when compared to cells which are persistent. In the two-class approach of modeling, certain learning algorithms tend to ignore patterns that form a small fraction of the dataset, treating them as noise. For instance, if there are transition pixels and persistent pixels, then a default choice for high predictive accuracy is to give an accuracy of ([8]). The imbalance in the dataset is unavoidable in this case as it is a natural property of the problem at hand, which gets covered up when the transition classes are not considered separately. Imbalanced datasets are common in many practical problems and have been addressed multiple times ([7]). In our case, we have used three separate metrics, namely Figure of Merit (FoM), Producer’s Accuracy (PA) and User’s Accuracy (UA), in addition to classification accuracy to track whether this problem is tackled by the knowledge representation or not ([24]). These three metrics are designed to check whether a landuse change model can predict the transitions properly. Therefore, it is crucial to have high FoM, PA and UA along with OA to guarantee a good simulation technique.
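The effect of class imbalance on overall accuracy can be illustrated with a short Python sketch. The pixel counts are hypothetical, and the metric definitions are an assumption: they follow the hits, misses and false alarms form commonly used for a binary change map, in the spirit of [24]. The function name `change_metrics` is illustrative.

```python
def change_metrics(observed_change, predicted_change):
    """Return (FoM, PA, UA, OA) from two equal-length lists of booleans.

    Assumed definitions for a binary change map:
      hits         = observed change predicted as change
      misses       = observed change predicted as persistence
      false_alarms = observed persistence predicted as change
    """
    pairs = list(zip(observed_change, predicted_change))
    hits = sum(1 for o, p in pairs if o and p)
    misses = sum(1 for o, p in pairs if o and not p)
    false_alarms = sum(1 for o, p in pairs if not o and p)
    correct_persistence = sum(1 for o, p in pairs if not o and not p)

    def ratio(num, den):
        return num / den if den else 0.0

    fom = ratio(hits, hits + misses + false_alarms)
    pa = ratio(hits, hits + misses)
    ua = ratio(hits, hits + false_alarms)
    oa = ratio(hits + correct_persistence, len(pairs))
    return fom, pa, ua, oa


# Hypothetical dataset: 10 transition pixels among 1000.
observed = [True] * 10 + [False] * 990

# A model that always predicts persistence.
always_persistent = [False] * 1000
fom, pa, ua, oa = change_metrics(observed, always_persistent)
```

In this hypothetical case the overall accuracy is high even though no transition is predicted, while FoM, PA and UA are all zero, which is why the three change metrics are tracked alongside OA.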

3.2 Knowledge Representation

Knowledge, as defined by Fischler and Firschein (1987), refers to stored information or models used by a person or machine to interpret, predict, and appropriately respond to the outside world. By the very nature of the definition, knowledge is task specific ([12]). There are various ways in which knowledge can be represented, such as rules (Decision Trees), decision surfaces (SVM), computation nodes (ANN), probability distributions (Probabilistic Graphical models) and nearest neighbors (KNN). Table 1 provides a description of the various kinds of supervised classification methods which can be considered in the End-to-End framework for knowledge representation (only short mathematical descriptions are provided in the table). The reason behind referring to these as knowledge representation techniques is as follows.

A classifier is primarily comprised of a mathematical form and a parameter estimation (training) technique. The mathematical form consists of a set of parameters which represents a family of functions. The parameter estimation technique is an algorithm for finding a suitable set of parameters corresponding to the data, that optimizes a particular objective. When a classifier is trained using data, it builds a knowledge structure based on its corresponding mathematical representation. The developed knowledge represented by a classifier is the set of parameters, which have been obtained after parameter estimation. The performance of the knowledge representation technique on a particular problem depends on how precise the set of parameters is estimated from the data.

To the best of our knowledge, the selection of a classifier requires a profound understanding of the data, which may not always be available for complex multivariate problems. Since any of the classifiers in Table 1 can represent the transition function (given in eqn. (3)), one needs to select a classifier based on certain criteria.

In our case, the selection of the classifier as a knowledge representation technique is based on seven metrics, namely Cross validation, Training time, Prediction time, Figure of Merit (FoM), Producer’s accuracy (PA), User’s accuracy (UA) and Overall accuracy (OA). Each of these metrics is a different window that provides a certain view of the framework: Cross validation measures overfitting; FoM, PA, UA and OA measure the framework’s performance in learning patterns and simulation; Training and Prediction time measure the swiftness of the knowledge representation technique. Based on all the views, we have selected a Decision tree or an Ensemble of Decision trees (Random Forests) as our knowledge representation technique for generation of cellular automata rules. Hence we call the framework a Rule based End-to-End framework.

Classifier | Knowledge representation | Mathematical representation | Training procedure
Logistic regression | Stores knowledge in terms of weights of a hyperplane | | Stochastic gradient descent
Gaussian Naive Bayes | Stores knowledge in terms of probability distribution | | Maximum likelihood estimation
Support vector machines | Stores knowledge in terms of weights and kernel | | Sequential minimal optimization
Multi Layer Perceptrons | Stores knowledge in terms of computation nodes | |
K-Nearest Neighbors | Stores feature vectors and class labels as knowledge | Feature vectors and class labels | Instance based learning or non-generalizing learning
Decision Tree | Stores knowledge in terms of rules | | CART, ID3, C4.5
Ensemble methods (Random forest, AdaBoost) | Stores knowledge in terms of the unit classifier in the ensemble | Depends on the unit classifier | Bagging/Boosting
Table 1: Features of popular classifiers which are also knowledge representation techniques.

3.3 Methodology

As discussed in Section 2, the Urban Growth Cellular Automata model consists of three components, namely cell state , transition function and update rule (eq. (2)). The cell state and update rule need to be predefined and are not flexible, while the transition function is capable of taking any form during the training. Hence the framework is generic and does not depend on manually designed features. The procedures for dataset preparation, training and prediction are given in Algorithms 1, 2 and 3 respectively.

Algorithm 1 consists of the following steps.

  • Gathering raster and built-up values of a point and its neighborhood in a matrix with row vector () and in a matrix with row vector () respectively. The matrix is used to build an autoencoder using a custom function . This is followed by generation of representations () and collection of all points in and representations to form a data matrix. It should be noted that the neighborhood criterion needs to be decided before the Data generation process. In this paper, we have used a standard Moore neighborhood criterion.

  • Preparation of the label matrix, which is composed of transition and persistent classes. To prepare the matrix, we have declared an isUrban() function, which returns true if the point at time is urban and false otherwise. It is required to determine whether a transition has occurred at point and time interval or not.
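The two steps of Algorithm 1 can be sketched in Python as below. This is an illustrative sketch under stated assumptions: a single-band raster, zero padding at the raster border, neighborhood vectors that include the central cell, integer class labels chosen for this example, and an `encoder` callable supplied by the caller (for instance a trained autoencoder's encoder). None of these names come from the original listing.

```python
# Transition class labels (names and values are assumptions for this sketch).
NB_TO_NB, B_TO_B, NB_TO_B, B_TO_NB = 0, 1, 2, 3


def is_urban(built_up_map, row, col):
    """True if the cell is Built-up (1) in the given map."""
    return built_up_map[row][col] == 1


def moore_values(raster, row, col, fill):
    """Values of the cell and its 8 Moore neighbors, padded with `fill` at the border."""
    rows, cols = len(raster), len(raster[0])
    values = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            r, c = row + dr, col + dc
            values.append(raster[r][c] if 0 <= r < rows and 0 <= c < cols else fill)
    return values


def transition_label(built_t, built_t1, row, col):
    """Class of the change between two consecutive built-up maps at one cell."""
    before, after = is_urban(built_t, row, col), is_urban(built_t1, row, col)
    if not before and not after:
        return NB_TO_NB
    if before and after:
        return B_TO_B
    if not before and after:
        return NB_TO_B
    return B_TO_NB


def build_matrices(raster_t, built_t, built_t1, encoder):
    """Return (data_matrix, label_matrix) with one row per pixel."""
    data, labels = [], []
    for r in range(len(built_t)):
        for c in range(len(built_t[0])):
            built_vec = moore_values(built_t, r, c, fill=0)
            raster_vec = moore_values(raster_t, r, c, fill=0.0)
            # Concatenate built-up neighborhood and encoded raster neighborhood.
            data.append(built_vec + list(encoder(raster_vec)))
            labels.append(transition_label(built_t, built_t1, r, c))
    return data, labels


# Hypothetical 2 x 2 example with an identity function standing in for the encoder.
raster_t = [[0.2, 0.4], [0.6, 0.8]]
built_t = [[1, 0], [0, 0]]
built_t1 = [[1, 1], [0, 0]]
data, labels = build_matrices(raster_t, built_t, built_t1, encoder=lambda v: v)
```

Each row of `data` pairs with the entry of `labels` at the same index, so the two can be passed directly to a classifier's training routine.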

Algorithm 2 consists of the following steps.

  • Retrieve the Data matrix and Label matrix from Algorithm 1.

  • Select a set of classifiers as in Table 1 and train these classifiers using their corresponding training algorithm.

  • The resultant is a set of trained classifiers, each of which can be considered as a transition function .

  • Then, depending on the Figure of Merit, Producer’s accuracy, User’s accuracy, Overall accuracy, cross-validation, training time and prediction time, one can choose which one to use. In certain cases, it may be convenient to use cross-validation, training time and prediction time to remove certain classifiers from the list initially.
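The training and selection loop of Algorithm 2 can be sketched as follows. To keep the example self-contained, two toy classifiers written in plain Python stand in for the classifiers of Table 1; they, the `evaluate` helper and the data are assumptions for illustration only. Only accuracy and the two timing measures are recorded here; the other metrics would be added in the same way.

```python
import time
from collections import Counter


class MajorityClass:
    """Toy classifier: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class OneNearestNeighbor:
    """Toy classifier: predicts the label of the closest training row."""

    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        def nearest(row):
            dists = [sum((a - b) ** 2 for a, b in zip(row, t)) for t in self.X_]
            return self.y_[dists.index(min(dists))]
        return [nearest(row) for row in X]


def evaluate(classifiers, X_train, y_train, X_test, y_test):
    """Train each candidate and record accuracy, training time and prediction time."""
    report = {}
    for name, clf in classifiers.items():
        start = time.perf_counter()
        clf.fit(X_train, y_train)
        train_time = time.perf_counter() - start
        start = time.perf_counter()
        predicted = clf.predict(X_test)
        predict_time = time.perf_counter() - start
        accuracy = sum(p == t for p, t in zip(predicted, y_test)) / len(y_test)
        report[name] = {"accuracy": accuracy, "train_time": train_time,
                        "predict_time": predict_time}
    return report


# Hypothetical data: label 0 = persistence, label 2 = transition.
X_train = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [0.9, 1.0], [1.0, 0.9]]
y_train = [0, 0, 0, 2, 2]
X_test = [[0.1, 0.1], [0.95, 0.95]]
y_test = [0, 2]

report = evaluate({"majority": MajorityClass(), "1-nn": OneNearestNeighbor()},
                  X_train, y_train, X_test, y_test)
best = max(report, key=lambda name: report[name]["accuracy"])
```

Any object exposing `fit` and `predict` methods, which is the interface scikit-learn estimators also provide, could be placed in the `classifiers` dictionary.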

Algorithm 3 consists of the following steps.

  • Gathering raster and built-up values of a point and its neighborhood in row vectors and respectively.

  • Using from Algorithm 1 to generate representations. This is followed by concatenation of the vectors and representations to form .

  • Use a trained classifier to predict on .

  • If predicts a transition from Non Built-up to Built-up or a persistence from Built-up to Built-up, then the final class at is Built-up and Non Built-up otherwise.

It may be noted that at least one built-up image is necessary as an initial point to start prediction procedure. This is due to the fact that we have modeled urban growth as a cellular automaton for which the framework is recurrent in nature (see Fig. 3). Figure 3 represents the flowchart of the simulation procedure. It shows that the framework can simulate urban growth up to as many years as possible in the future each time utilizing the last predicted image.
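The recurrent use of the prediction procedure can be sketched in Python as below. The class labels, the `simulate` helper and in particular `toy_predictor`, which stands in for a trained classifier, are assumptions for this illustration; a real predictor would operate on the data row built from the neighborhood and the encoded raster.

```python
NB_TO_NB, B_TO_B, NB_TO_B, B_TO_NB = 0, 1, 2, 3  # assumed class labels


def next_built_up(transition_class):
    """Map a predicted transition class to the built-up state at the next step."""
    return 1 if transition_class in (NB_TO_B, B_TO_B) else 0


def simulate(built_up, predict_transition, steps):
    """Roll the model forward, feeding each predicted map back as the next input."""
    maps = [built_up]
    for _ in range(steps):
        current = maps[-1]
        maps.append([
            [next_built_up(predict_transition(current, r, c))
             for c in range(len(current[0]))]
            for r in range(len(current))
        ])
    return maps


def toy_predictor(built_up, row, col):
    """Hypothetical stand-in for a trained classifier: growth spreads to the right."""
    if built_up[row][col] == 1:
        return B_TO_B
    if col > 0 and built_up[row][col - 1] == 1:
        return NB_TO_B
    return NB_TO_NB


initial = [[1, 0, 0, 0]]
history = simulate(initial, toy_predictor, steps=2)
```

Here `history[0]` is the initial built-up image and each later entry is produced from the map predicted in the previous step, mirroring the recurrence in Fig. 3.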

It has been proved in ([22]) that an arbitrary neighbor, two-dimensional cellular automata can be simulated in terms of a set of ten partial differential equations. The PDEs indicate the relation between the dynamic variable and space-time, which can be a theoretical form of the urban growth cellular automata. Without the cellular automata, the procedure would be merely a built-up/landuse classification rather than prediction.

   function which returns True if point is Urban.
   function which builds a function after training on a dataset.
   Data matrix, Label matrix and Encoder function.
  for all pixels do
     if !isUrban() and !isUrban() then
     else if isUrban() and isUrban() then
     else if isUrban() and !isUrban() then
     end if
  end for
Algorithm 1 Data Representation procedure
   Data Matrix
   Label Matrix
   Set of classifiers as in Table 1
   Trains with corresponding training algorithm
   Set of trained classifiers on and
  for all  do
  end for
Algorithm 2 Parameter estimation procedure
   (from Procedure 2)
   Encoder function from the output of Algorithm 1.
   Future Built-up map.
  for all pixel  do
     if  then
         Non Built-up
     end if
  end for
Algorithm 3 Prediction procedure
Figure 3: Recurrence of the Rule based Framework due to the cellular automata

3.4 Features of the End-to-End Framework

The proposed End-to-End architecture has certain key features which differentiates it from earlier proposed urban growth prediction frameworks ([28, 18, 19, 30]).

  • The key improvement is the removal of a manual feature extraction module, which is inherent in the existing architectures and replacing it with a data representation module. The data representation module generates representations from the raw satellite images in an unsupervised manner in a form as described in Section 3.1 and does not require any separate database consisting of manually selected explanatory features.

  • The knowledge representation layer stores knowledge that directly relates built-up and raster representations to the transition classes. This is a significant distinction from earlier models, where learning models were used to establish relationships between manually selected explanatory features and Built-up. The removal of manual selection processes reduces the bias on models and knowledge structures, thereby creating an opportunity to develop new theories and explain the results of a complex process such as urban growth.

  • Furthermore, since representation learning is unsupervised, it can be done without taking into account the final objective (in this case urban growth). Thus we can say that the representations are generic in nature and can be used for other applications as well.

Finally, these qualities come with a drawback, as implementation becomes easier but knowledge representation turns incomprehensible. Hence theoretically, it becomes a challenging task to extract meaning from these rules. This is due to the fact that the variables in the Data matrix are frequency/band values which can have multiple semantics. Therefore, the rules generated can have multiple interpretations among which the most appropriate interpretation needs to be found out.

4 Experiments and Results

We have conducted our experiments on the region of Mumbai, which is the capital city of Maharashtra, India (latitude: N, longitude: E). The city lies on the west coast of India and has a natural harbor. Mumbai is one of the mega cities of India also often colloquially known as the City of Dreams. According to Census India, the population of Mumbai has been steadily rising from approximately million in to more than million in . The region under consideration is shown in Fig. 4a.

The experiments are conducted in a Virtual Machine (VM) of an OpenStack based cloud infrastructure with 8 VCPUs, 16 GB RAM and the Ubuntu 14.04 operating system.

4.1 Data Collection and Preprocessing

We have collected remotely sensed natural color ( bands) Landsat images from the United States Geological Survey (USGS) website for the years , and for the Mumbai region. The Mumbai region has been extracted from the raster and the final image consisted of data pixels. The images have been segmented to generate the built-up raster images, which are binary images with white pixels representing built-up and black pixels representing non built-up. The segmentation has been carried out using maximum likelihood classification implemented in the semi-automatic classification plugin of QGIS. The semi-automatic classification method is essentially a semi-manual labeling method where initial labels are provided by a human and segmented maps are then generated using the raster values of the Landsat image. Since it is manual and inaccurate, it needs to be verified against source or reference maps, which in our case have been Google Earth and Mumbai maps from a previously published work on Mumbai [28]. The total number of pixels that transformed from non built-up to built-up during the years considered in our study is given in Table 2. The percentage of pixels that changed from non-urban to urban is approximately , which is similar to other studies conducted on the region of Mumbai ([28, 20]). The slight aberrations are due to the classification inaccuracy of the classifiers used for performing the landcover classification.

Time step Pixels transformed Pixels persistent
Table 2: Number of pixels transformed vs number of pixels persistent
Figure 4: Rasters showing (a) Mumbai year 1991 and built-up conditions of the year (b) 1991 (c) 2001 (d) 2011. (White represents built-up while black represents non built-up)

The rasters showing the urban growth conditions of the years , and are shown in Fig 4. In our case, we have considered classes i.e. and . The class have had per cent contribution in the training set, probably as there have been no recent massive de-urbanization in the area. Hence we have considered all such instances to also fall in the class . The data matrix and the label matrix are generated from the data considering a Moore neighborhood of radius . The length of the encoded representation is varied in multiples of . The autoencoder is trained using the Adaptive Gradient Descent Algorithm [duchi2011adaptive] with batch size of . Subsequent to the training, the encoder present in the autoencoder is used to generate representations for creating the data matrix.

4.2 Training and Validation

The data matrix and label matrix generated in the previous step have been used to develop knowledge structures using various kinds of knowledge representation techniques as given in Table 1. The classifiers have been trained in a multi-parameter setting in the following ways.

  • Logistic Regression: For this classifier, we have run the training times, each with a different value of L2 regularization in the range .

  • Gaussian Naive Bayes: No special prior probabilities have been set for this classifier.

  • Support Vector Machine: Different kinds of kernel functions, namely linear, polynomial and radial basis function kernels, have been used to test performance metrics. For polynomial kernels, the degree of the polynomial has been varied from to .

  • Multi Layer Perceptron: Parameters such as the number of hidden layers, hidden layer sizes, batch size, number of iterations, learning rate and momentum have been varied over different ranges to assess performance. The hidden layer configurations which have been tested are , , , , . The number of iterations and batch size have been taken in multiples of . The learning rate schedule has been taken as constant, adaptive and inverse scaling, with the learning rate in the range . Momentum has been varied in the range .

  • Single and Ensemble of decision trees: For this classifier, we have used Gini impurity as the splitting criterion. The maximum height of the tree is set to values in the range . The algorithm used to build the decision tree is an optimized version of CART (Classification and Regression Trees), which can handle both categorical and non-categorical data. For the ensemble method of knowledge representation, we have considered decision trees/estimators. The implementations of the learning methods that we have used are available in the scikit-learn library in Python.

The average of the results over all the different parameter settings is taken as the final performance of a knowledge structure.
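The averaging over parameter settings can be sketched as follows for a single decision tree with the Gini criterion. This is only an illustration on synthetic data: the depth range, the synthetic dataset and the helper name `average_accuracy_over_depths` are assumptions for the example, not the settings used in our experiments.

```python
from statistics import mean

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def average_accuracy_over_depths(X_train, y_train, X_val, y_val, depths):
    """Train one Gini decision tree per maximum depth and average the scores."""
    scores = []
    for depth in depths:
        tree = DecisionTreeClassifier(criterion="gini", max_depth=depth,
                                      random_state=0)
        tree.fit(X_train, y_train)
        scores.append(float(tree.score(X_val, y_val)))
    return scores, mean(scores)


# Synthetic stand-in for the data matrix and label matrix (assumption).
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Hypothetical depth range; the range used in the paper is not reproduced here.
depth_scores, depth_average = average_accuracy_over_depths(
    X_train, y_train, X_val, y_val, depths=range(2, 7))
```

The same loop applies to the other classifiers by swapping the estimator and the parameter being varied.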

According to [24], comparing classification accuracy alone is not enough for validating a land-use change model. They argued that, in order to validate a change model, there needs to be a validation of metrics, namely Figure of Merit (FoM), Producer’s accuracy (PA), User’s Accuracy (UA) and Overall Accuracy (OA). Since urban growth prediction is a land-use change modeling problem, we have used these metrics for model validation. The former metrics assist in comparing the original and predicted maps of urban growth, while Overall Accuracy (OA) can be thought of as classification accuracy. According to [24], the validation measures depend on variables, which are

  • A = Area of error due to observed change predicted as persistence.

  • B = Area correct due to observed change predicted as change.

  • C = Area of error due to observed change predicted in the wrong gaining category.

  • D = Area of error due to observed persistence predicted as change.

  • E = Area correct due to observed persistence predicted as persistence.

Figure of Merit (FoM) gives the amount of overlap between the observed and predicted change. Producer’s accuracy (PA) gives the proportion of pixels that the model accurately predicts as change, given that the reference maps indicate observed change. User’s accuracy (UA) gives the proportion of pixels that the model predicts accurately as change, given that the model predicts change. The equations of the metrics, in the form defined in [24], are given as follows.

$$FoM = \frac{B}{A + B + C + D}$$

$$PA = \frac{B}{A + B + C}$$

$$UA = \frac{B}{B + C + D}$$

$$OA = \frac{B + E}{A + B + C + D + E}$$
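The five areas and the four metrics can be computed directly from three maps: the map at the start of the period, the observed map at the end, and the predicted map at the end. The sketch below assumes the labeling A to E of [24] for the five areas in the order listed above, and the usual definitions FoM = B/(A+B+C+D), PA = B/(A+B+C), UA = B/(B+C+D) and OA = (B+E)/(A+B+C+D+E). The function names and the flat-list map layout are choices made for this example.

```python
def change_areas(start, observed, predicted):
    """Count the areas A..E (in pixels) from three flattened category maps."""
    a = b = c = d = e = 0
    for s, o, p in zip(start, observed, predicted):
        observed_change = o != s
        predicted_change = p != s
        if observed_change and not predicted_change:
            a += 1          # observed change predicted as persistence
        elif observed_change and p == o:
            b += 1          # observed change predicted as change (correct)
        elif observed_change:
            c += 1          # observed change, wrong gaining category
        elif predicted_change:
            d += 1          # observed persistence predicted as change
        else:
            e += 1          # observed persistence predicted as persistence
    return a, b, c, d, e


def validation_metrics(a, b, c, d, e):
    """Return FoM, PA, UA and OA; a ratio with zero denominator is set to 0."""
    def ratio(num, den):
        return num / den if den else 0.0

    return {
        "FoM": ratio(b, a + b + c + d),
        "PA": ratio(b, a + b + c),
        "UA": ratio(b, b + c + d),
        "OA": ratio(b + e, a + b + c + d + e),
    }


# Toy example with 0 = non built-up and 1 = built-up.
areas = change_areas(start=[0, 0, 0, 0, 1],
                     observed=[1, 1, 0, 0, 1],
                     predicted=[1, 0, 1, 0, 1])
metrics = validation_metrics(*areas)
```

With only two categories and one direction of change, C is always zero, which is the situation in our built-up rasters.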

Figure 5: Predicted maps of Mumbai using Decision Tree (a) (b)
Figure 6: Predicted maps of Mumbai using Random Forest (a) (b)

The training and validation of the End-to-End framework have been done using cross validation. Cross validation is performed by dividing the dataset (built from the years ) into parts and randomly selecting one part as the validation set while the others are used for training. This period has been selected because the number of pixels which changed during it is larger than in the other periods (Table 2). The validation results in terms of the mean and variance of the classification accuracy are shown in Table 3. A performance comparison of existing methods with the End-to-End approach is presented in Fig. 9 (a) and (b). The first set of bars presents the results of existing methods applied to our dataset, whereas the last two sets of bars provide the results of the End-to-End approach with a single decision tree and an ensemble of decision trees. The resultant built-up maps representing transition classes predicted by both our approaches are displayed in Fig. 5 and 6.
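The cross validation procedure described above can be sketched in a few lines. The number of parts `k`, the random seed and the callable `evaluate` (which would train a knowledge structure on the training indices and return its accuracy on the validation indices) are placeholders assumed for this example.

```python
import random
from statistics import mean, pvariance


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, validation) index lists for k randomly formed parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


def cross_validate(evaluate, n_samples, k, seed=0):
    """Return the mean and variance of evaluate(train, validation) over folds."""
    scores = [evaluate(train, validation)
              for train, validation in k_fold_indices(n_samples, k, seed)]
    return mean(scores), pvariance(scores)


# Placeholder evaluation that only reports the validation fraction.
splits = list(k_fold_indices(10, 5))
cv_mean, cv_variance = cross_validate(lambda tr, va: len(va) / 10, 10, 5)
```

Every sample appears in exactly one validation part, so each part is used once for validation and otherwise for training.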

A comparison of the training and prediction times taken by the knowledge structures is provided in Fig. 7 (a) and (b).

We have compared our framework with four existing methods ([28], [18], [9] and [14]) in terms of and . Some of the distance based features which have been used in these works are distance to roads, built-up, river, urban planning area, central business district, railway, wetlands, forests, agricultural lands etc. Some of the non-distance based factors are digital elevation maps, slope, population density, land/crop density etc. During experimentation, we have manually generated each of these feature maps to model and compare the results of these works with our End-to-End framework.

4.3 Discussion

We defend our hypothesis regarding End-to-End learning for prediction of urban growth with the results of our experiments. The comparison of our framework with existing frameworks based on the four metrics (FoM, PA, UA and OA) in Fig. 9 (a) and (b) shows that End-to-End learning performs considerably better than the existing learning based methods developed for urban growth prediction on our dataset. We argue, based on the results, that this is due to the superior representation and robustness of the encoded representations combined with an ensemble of decision trees. An approximate summary of the improvements provided by our End-to-End framework on the dataset is as follows.

  • improvement on Figure of Merit .

  • improvement on Producer’s accuracy .

  • improvement on User’s Accuracy .

  • improvement of Overall Accuracy .

The cross validation accuracies and the training and prediction times are given in Table 3. It is evident from the table that both Decision Trees and Random Forests (ensembles of decision trees) provide optimal results in terms of cross validation accuracy, training time, prediction time, and as compared to other prediction models. However, a single Decision Tree is prone to overfitting, which can be mitigated by using a Random Forest. Hence we conclude that a Random Forest (ensemble of decision trees) is an optimal choice of knowledge representation for our proposed End-to-End framework.

Figure 7: Comparison of (a) training and (b) prediction time with respect to the knowledge representation technique.

Figure 8 depicts the change of the performance metrics with respect to the size of the encoding. It shows a sharp rise in the beginning followed by saturation in the performance metrics as the encoding size is increased. This implies that, with an increase in encoding size, information is more precisely encoded by the autoencoder and hence the performance of the simulation improves. However, the tradeoff is that if the encoding size is increased arbitrarily, the time required to train the autoencoder grows faster than the performance metrics improve. Furthermore, increasing the encoding size increases the size of the feature vector in the data matrix, which brings in the curse of dimensionality (when the dimensionality of a feature vector is increased without an increase in the size of the data, the data tends to become sparse). In this case, we believe that the saturation is caused by the increase in dimensions of the feature vector in the data matrix.

Figure 8: Performance metrics with respect to the encoding size.

From the comparison in Fig. 9 (a) and (b), we have observed that the Overall Accuracy (OA) and User’s Accuracy (UA) metrics are much higher than the Figure of Merit (FoM) and Producer’s accuracy (PA) for certain existing methods. This has also been reflected in the results of some other works; for instance, [28] have reported and as FoM, PA and OA using MLP for the region of Mumbai. One of the reasons behind this peculiarity is the imbalance in the datasets that we discussed earlier in the paper (section III A). Since Figure of Merit and Producer’s accuracy measure the performance of the model in terms of transitions, and the fraction of transition pixels is low, a training algorithm might not have learned them correctly. Furthermore, we can also see that existing models have comparable Overall Accuracy (OA) of about . This can be due to the fact that only of the pixels fall in the transition class and the simple default strategy to give maximum accuracy is to give [8]. In the End-to-End approach, we have seen that the metrics and are comparable, which indicates that the imbalance in the datasets has been handled.
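The effect of class imbalance on these metrics can be seen in a small worked example. Consider a model that predicts persistence for every pixel. Using the definitions of [24], with FoM = B/(A+B+C+D) and OA = (B+E)/(A+B+C+D+E), all observed change falls into the error area A and nothing into B. The pixel counts below are hypothetical and are not the counts of our dataset.

```python
def always_persistence_scores(n_pixels, n_changed):
    """OA and FoM of a model that never predicts change."""
    a = n_changed               # observed change predicted as persistence
    b = c = d = 0               # the model never predicts change
    e = n_pixels - n_changed    # observed persistence predicted as persistence
    overall_accuracy = (b + e) / (a + b + c + d + e)
    denominator = a + b + c + d
    figure_of_merit = b / denominator if denominator else 0.0
    return overall_accuracy, figure_of_merit


# Hypothetical raster: 1000 pixels, of which 100 changed to built-up.
baseline_oa, baseline_fom = always_persistence_scores(1000, 100)
```

Here the Overall Accuracy is 0.9 while the Figure of Merit is 0, which is why a high OA alone says little about how well transitions are predicted.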

The user’s accuracy metric is high for certain models because there is only one direction of change, i.e. non built-up to built-up. Therefore, pixels predicted in the transition category can belong to only one category, and a model that predicts a higher number of pixels in the transition category has a higher user’s accuracy.

Classifier | Cross-Validation (Mean +/- variance) | Training Time (in seconds) | Prediction Time (in seconds)
MLP [28] | | |
Logistic Regression [18] | | |
SVM [9] | | |
Random Forests [14] | | |
End-to-End approach (MLP single layer) | | |
End-to-End approach (SGD) | | |
End-to-End approach (Naive Bayes) | | |
End-to-End approach (KNN) | | |
End-to-End approach (AdaBoost) | | |
End-to-End approach (Decision Tree) | | |
End-to-End approach (Random Forests) | 0.900792 (+/- 0.043120) | 53.06 | 2.9
Table 3: Performance of other classifiers in comparison to Decision tree and Random Forests
Model | Remarks
MLP ([28]) | High training time. Poorer FoM, PA and UA than the proposed method. Uses feature engineering.
Logistic Regression ([18]) | Poorer FoM, PA and UA than the proposed method. Uses feature engineering.
SVM ([9]) | High training time. Poorer FoM, PA and UA than the proposed method. Uses feature engineering.
Random Forests ([14]) | Poorer FoM, PA and UA than the proposed method. Uses feature engineering.
End-to-End approach (MLP single layer) | High training time. Poorer FoM, PA and UA than the proposed method.
End-to-End approach (SGD) | Poorer FoM, PA and UA than the proposed method.
End-to-End approach (Naive Bayes) | Poorer FoM, PA and UA than the proposed method.
End-to-End approach (Decision Tree) | Optimal, but has the problem of overfitting.
End-to-End approach (Random Forests) | Optimal.
Table 4: Limitations of the other models relative to the proposed model.
Figure 9: Performance comparison of our approach with existing methods for years (a) (b)

4.4 Future Urban Growth Prediction

Figures 10(a) and 10(b) show the future urban growth prediction for the year 2051 starting from using our End-to-End framework with a Decision Tree and a Random Forest as the knowledge structure, respectively. The raster over which the built-up is displayed is the year natural color image of Mumbai. The white pixels represent the built-up regions. It can be seen that the framework with a Decision Tree does not encroach upon water bodies and swamps, whereas with a Random Forest a few encroachments are present. It may be noted that, despite providing no explicit region information such as water bodies, swamps or forests to the framework during training, the framework has been able to capture them. Hence, we can say that the rule based framework automatically divides the regions in the satellite image in some way to determine in which places growth can happen. This knowledge is currently encoded in the decision trees and can be extracted only if we discover the rules.
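Projecting growth several time steps ahead amounts to applying the transition rule of the cellular automaton repeatedly. The sketch below shows only this generic iteration on a binary built-up grid. The callable `rule` is a placeholder for a learned knowledge structure, and the simplifications that the rule sees only the built-up states of the neighborhood and that built-up cells never revert are assumptions of this example; in our framework the rules operate on encoded spectral information.

```python
def rollout(built_up, rule, steps, radius=1):
    """Apply a binary cellular automaton rule synchronously for several steps.

    built_up: list of rows with 0 (non built-up) and 1 (built-up).
    rule: callable taking the list of neighborhood states and returning
          True if the central non built-up cell should become built-up.
    """
    height, width = len(built_up), len(built_up[0])
    state = [row[:] for row in built_up]
    for _ in range(steps):
        next_state = [row[:] for row in state]
        for i in range(height):
            for j in range(width):
                if state[i][j] == 1:
                    continue  # assumption: built-up cells stay built-up
                neighborhood = [
                    state[ni][nj]
                    for ni in range(max(0, i - radius), min(height, i + radius + 1))
                    for nj in range(max(0, j - radius), min(width, j + radius + 1))
                ]
                if rule(neighborhood):
                    next_state[i][j] = 1
        state = next_state
    return state


# Toy rule: a cell becomes built-up if any cell in its neighborhood is built-up.
initial = [[1, 0, 0, 0, 0]]
after_one_step = rollout(initial, any, steps=1)
after_two_steps = rollout(initial, any, steps=2)
```

Because every step reads from the previous state and writes to a copy, growth spreads by at most one neighborhood radius per step.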

Figure 10: Simulated urban growth of 2051 for the Mumbai region with knowledge structure as (a) Decision Tree and (b) Random Forests

5 Conclusion and Future Work

We have introduced the concept of End-to-End learning in urban growth prediction by developing a framework for learning rules of a cellular automata model directly by representing spectral information from remote sensing data. We have empirically verified our framework on predicting urban growth for the region of Mumbai over a time frame of years. Our End-to-End framework has outperformed existing learning based methodologies with a simpler implementation than the existing frameworks.

Future work can be based on the challenges which we have encountered in this work. Since the spatial resolution of the satellite is fixed, as temporal resolution reduces, the number of cells in which built-up happens reduces. Due to this, the number of points in the transition classes reduces and the number in the persistence classes increases. Therefore, the dataset becomes heavily imbalanced and the ability of the classifier to learn patterns of the transition classes reduces. The problem of imbalanced datasets can be alleviated if both the temporal and spatial resolution of the satellite images are increased. This is possible if data from high-resolution satellites like IKONOS/QuickBird are used. However, the data load of high-resolution satellites is high, hence more infrastructure and more sophisticated algorithms would be necessary.

Despite the superior performance of the End-to-End approach, one of the drawbacks of End-to-End learning is the difficulty of understanding the learned rules. Hence, uncovering automatically generated rules in an End-to-End learning framework can be considered a challenge which needs to be resolved in subsequent studies. Furthermore, the End-to-End framework can be extended to incorporate other data resources which are in vector form, which may further improve the performance of the framework.


  • [1] Maher Milad Aburas, Yuek Ming Ho, Mohammad Firuz Ramli, and Zulfa Hanan Ash’aari. The simulation and prediction of spatio-temporal urban growth trends using cellular automata models: A review. International Journal of Applied Earth Observation and Geoinformation, 52:380–389, 2016.
  • [2] M Ahmadlou, MR Delavar, H Shafizadeh-Moghadam, and A Tayyebid. Modeling urban dynamics using random forest: Implementing roc and toc for model evaluation. ISPRS-International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, pages 285–290, 2016.
  • [3] Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. arXiv preprint arXiv:1511.00561, 2015.
  • [4] R Basawaraja, KB Chari, SR Mise, and SB Chetti. Analysis of the impact of urban sprawl in altering the land-use, land-cover pattern of raichur city, india, using geospatial technologies. Journal of Geography and Regional Planning, 4(8):455, 2011.
  • [5] Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE transactions on pattern analysis and machine intelligence, 35(8):1798–1828, 2013.
  • [6] Basudeb Bhatta. Urban Growth Analysis and Remote Sensing: A Case Study of Kolkata, India 1980–2010. Springer Science & Business Media, 2012.
  • [7] Nitesh V Chawla. Data mining for imbalanced datasets: An overview. In Data mining and knowledge discovery handbook, pages 853–867. Springer, 2005.
  • [8] Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. Smote: synthetic minority over-sampling technique. Journal of artificial intelligence research, 16:321–357, 2002.
  • [9] Yongjiu Feng, Yan Liu, and Michael Batty. Modeling urban growth with gis based cellular automata and least squares svm rules: a case study in qingpu–songjiang area of shanghai, china. Stochastic Environmental Research and Risk Assessment, 30(5):1387–1400, 2016.
  • [10] Alex Graves and Navdeep Jaitly. Towards end-to-end speech recognition with recurrent neural networks. In ICML, volume 14, pages 1764–1772, 2014.
  • [11] Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton. Speech recognition with deep recurrent neural networks. In Acoustics, speech and signal processing (icassp), 2013 ieee international conference on, pages 6645–6649. IEEE, 2013.
  • [12] Simon S Haykin. Neural networks and learning machines, volume 3. Pearson Upper Saddle River, NJ, USA:, 2009.
  • [13] Zhiyong Hu and CP Lo. Modeling urban growth in atlanta using logistic regression. Computers, Environment and Urban Systems, 31(6):667–688, 2007.
  • [14] Courage Kamusoko and Jonah Gamba. Simulating urban growth using a random forest-cellular automata (rf-ca) model. ISPRS International Journal of Geo-Information, 4(2):447–470, 2015.
  • [15] Xinli Ke, Lingyun Qi, and Chen Zeng. A partitioned and asynchronous cellular automata model for urban growth simulation. International Journal of Geographical Information Science, 30(4):637–659, 2016.
  • [16] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
  • [17] Xia Li and Anthony Gar-On Yeh. Data mining of cellular automata’s transition rules. International Journal of Geographical Information Science, 18(8):723–744, 2004.
  • [18] Yu-Pin Lin, Hone-Jay Chu, Chen-Fa Wu, and Peter H Verburg. Predictive ability of logistic regression, auto-logistic regression and neural network models in empirical land-use change modeling–a case study. International Journal of Geographical Information Science, 25(1):65–87, 2011.
  • [19] Yan Liu. Modelling urban development with geographical information systems and cellular automata. CRC Press, 2008.
  • [20] Hossein Shafizadeh Moghadam and Marco Helbich. Spatiotemporal urbanization processes in the megacity of mumbai, india: A markov chains-cellular automata urban growth model. Applied Geography, 40:140–149, 2013.
  • [21] Sulaiman Ibrahim Musa, Mazlan Hashim, and Mohd Nadzri Md Reba. A review of geospatial-based urban growth models and modelling initiatives. Geocarto International, pages 1–21, 2016.
  • [22] Stephen Omohundro. Modelling cellular automata with partial differential equations. Physica D: Nonlinear Phenomena, 10(1-2):128–134, 1984.
  • [23] Hichem Omrani, Amin Tayyebi, and Bryan Pijanowski. Integrating the multi-label land-use concept and cellular automata with the artificial neural network-based land transformation model: an integrated ml-ca-ltm modeling framework. GIScience and Remote Sensing, 2017.
  • [24] Robert Gilmore Pontius, Wideke Boersma, Jean-Christophe Castella, Keith Clarke, Ton de Nijs, Charles Dietzel, Zengqiang Duan, Eric Fotsing, Noah Goldstein, Kasper Kok, et al. Comparing the input, output, and validation maps for several models of land change. The Annals of Regional Science, 42(1):11–37, 2008.
  • [25] Behzad Saeedi Razavi. Predicting the trend of land use changes using artificial neural network and markov chain model (case study: Kermanshah city). Research Journal of Environmental and Earth Sciences, 6(4):215–226, 2014.
  • [26] Andreas Rienow and Roland Goetzke. Supporting sleuth–enhancing a cellular automaton with support vector machines for urban growth modeling. Computers, Environment and Urban Systems, 49:66–81, 2015.
  • [27] Inés Santé, Andrés M García, David Miranda, and Rafael Crecente. Cellular automata models for the simulation of real-world urban processes: A review and analysis. Landscape and Urban Planning, 96(2):108–122, 2010.
  • [28] Hossein Shafizadeh-Moghadam, Julian Hagenauer, Manuchehr Farajzadeh, and Marco Helbich. Performance analysis of radial basis function networks and multi-layer perceptron networks in modeling urban change: a case study. International Journal of Geographical Information Science, 29(4):606–623, 2015.
  • [29] Rajesh Bahadur Thapa and Yuji Murayama. Urban growth modeling of kathmandu metropolitan region, nepal. Computers, Environment and Urban Systems, 35(1):25–34, 2011.
  • [30] Jasper Van Vliet, Roger White, and Suzana Dragicevic. Modeling urban growth using a variable grid cellular automaton. Computers, Environment and Urban Systems, 33(1):35–43, 2009.
  • [31] Qingsheng Yang, Xia Li, and Xun Shi. Cellular automata for simulating land use changes based on support vector machines. Computers & geosciences, 34(6):592–602, 2008.