Rule-based End-to-End Learning Framework for Urban Growth Prediction
Abstract
Due to the rapid growth of urban areas in the past decades, it has become increasingly important to model and monitor urban growth in mega cities. Although several researchers have proposed models for simulating urban growth, these have primarily depended on various manually selected spatial and non-spatial explanatory features for building the models. A practical difficulty with this approach is the manual selection procedure, which tends to make the model design process laborious and non-generic. Although explanatory features provide an understanding of a complex process, there is no standard set of features in urban growth prediction over which scholars have consensus. Hence, designing and deploying systems for urban growth prediction has remained a challenging task. In order to reduce the dependency on human devised features, we propose a novel End-to-End prediction framework that represents remotely sensed satellite data in terms of the rules of a cellular automata model in order to improve the performance of urban growth prediction. Using our End-to-End framework, we have achieved superior performance over existing learning based methods in the Figure of Merit, Producer's accuracy, User's accuracy and Overall accuracy metrics.
Keywords: End-to-End learning, Decision trees, Urban growth prediction, Landsat, Representation learning.
1 Introduction
Urban regions are fast-growing entities in the modern world. Areas which were outskirts of a city or town twenty years ago are now inhabited and have developed into urban form. Research has shown that uncontrolled sprawl of urban areas has potentially adverse effects on the environment in terms of biodiversity, loss of habitat and change of landscape ([4, 20]). The United Nations has predicted, based on past data, that the proportion of the world population living in urban areas will continue to rise substantially in the coming decades (http://www.un.org/en/development/desa/news/population/worldurbanizationprospects.html). With the uncontrolled expansion of cities, it has become important to monitor and control changes in urban landuse.
A fundamental difficulty in monitoring urban areas is their huge spread, which makes manual tracking of global landuse/builtup changes intractable. However, several remote sensing satellites periodically capture images of the earth surface, and these images contain a wealth of near real time information regarding the change of urban landscape. Hence, satellite imagery has been considered an effective component for real time monitoring of urban growth.
Urban growth monitoring using satellite derived products has been addressed by several researchers ([20, 28, 18]). Models using cellular automata, agent based methods, neural networks, logistic regression and fractals have been developed for robust and accurate prediction of urban growth ([19, 6, 21, 1]). The cellular automaton is a widely preferred model for urban growth prediction due to its simplicity. Several CA models and corresponding case studies have been developed for detecting urban change in major cities, for instance Vancouver (Canada) and Wuhan (China) ([1, 26, 15, 30, 9]). Recently, artificial neural networks have also become quite popular in land use change prediction ([28, 25]), and studies using the combination of neural networks and cellular automata models have been proposed ([23]). Support vector machine (SVM), decision tree and random forest based models have also been proposed ([26, 15, 2, 14, 17]). Besides, logistic regression has been well suited to urban growth prediction for assessing the driving forces behind urban growth ([13]). Overall, among the learning techniques used for prediction, four broad categories have dominated the urban growth community, namely logistic regression, artificial neural networks, support vector machines and inductive learning approaches ([21]).
Each of the models developed using the learning methodologies mentioned above has utilized spatial and non-spatial explanatory features for predicting urban growth ([21]). Spatial features such as distance from urban areas, water bodies, swamps etc. are derived from satellite imagery manually using supervised land use classification methods ([28, 29]). Other kinds of spatial factors, such as elevation and distance from roads and railways, are derived from open spatial data repositories like OpenStreetMap (https://www.openstreetmap.org/) and the US Shuttle Radar Topography Mission (https://lta.cr.usgs.gov/SRTM1Arc). Several issues with this approach are discussed as follows.

It has been evident from practical experience ([5]) that feature engineering significantly affects the results of machine learning algorithms. At present, most urban growth prediction models use distance based features, for instance distance from roads, city centers, railways, water bodies etc., and the distance metric is generally the Euclidean distance. Although the distance from certain regions does determine builtup, the relevant notion of distance may not always be Euclidean. Moreover, there are other distance metrics, for instance the Minkowski distance and the Mahalanobis distance, which have not been sufficiently used in urban growth prediction modeling. It may be possible to achieve better performance using other distance metrics. The issue in this case is the selection of a suitable distance based criterion, which requires manual effort.

The spectral properties can also be a crucial feature as different spectral properties indicate different landuse and different landuse may have different growth patterns. Hence another limitation of distance based features is that they are unable to capture the spectral properties of landuse.

Due to the disparity in the geography of the earth surface and noise in satellite imagery, it is often excessively labor-intensive to derive all the different kinds of landuse feature maps from satellite imagery.
We intend to address these issues by introducing a framework in which one can represent the data in a certain format and then use a classifier directly to generate approximate features. The features are approximate because they are generated in an encoded form. We refer to this form as the knowledge structure, and it varies according to the classifier which we use for knowledge representation. Since we do not know a priori what kinds of features a classifier can generate from data, it is a challenge to find out which knowledge structure is optimal for a particular task.
The idea of End-to-End learning is to bypass intermediate feature engineering steps in model building and design a model that can directly learn robust representations from input data ([5, 10]). End-to-End learning is a paradigm of machine learning that has recently gained importance due to the advent of representation learning and high-performance computing infrastructures. It has delivered exceptional performance in domains like image classification, speech processing and image segmentation ([16, 3, 11]). The majority of these techniques demonstrate the existence of complex relationships between various real world objects/phenomena and the conclusions that we draw from them. These complex relations are stored in our memory in terms of intricate knowledge structures which are hard to decode but make human beings quite efficient in motor skills.
Inspired by the above works, we put forward our hypothesis: automatically extracted information from remotely sensed data can be an important feature in modeling urban growth. The primary objective of this paper is to utilize the End-to-End concept in developing a novel framework for prediction of urban growth. The proposed framework utilizes unsupervised representation learning and supervised classification methods to learn the rules of an urban growth cellular automata model from raw satellite data and form various knowledge structures. Our proposed framework gives superior prediction performance compared to existing learning based methodologies in terms of various metrics that evaluate urban growth simulation performance.
It should be noted that we are not suggesting to shun the use of distance based features. We also do not claim that using satellite imagery alone is good enough to build an urban growth model. On the contrary, this paper provides an initial framework which can be upgraded further to plug in other kinds of features, including distance based features, with the constraint that such an addition needs minimal or no manual selection. This is a key feature of End-to-End systems, as the performance of such a system depends more on parameters and less on human ability to recognize patterns.
The key contributions of this paper are as follows.

Eliminating the manual feature engineering module from the prediction framework.

Generating representations from the spectral information present in remote sensing satellite imagery to improve the performance of urban growth prediction.
The rest of the paper is organized as follows. In Section 2, we define the cellular automata model for urban growth prediction. Section 3 describes the framework that has been used to discover and store rules. Section 4 presents the results and discussion of the experiments conducted on the region of Mumbai, India. Finally, the last section provides conclusions and future research directions.
2 Urban Growth modeled using Cellular Automata (CA)
Urban growth, being a spatiotemporal dynamic process, can be modeled using the theory of cellular automata. A typical CA model is composed of an infinite array of cells having a finite number of states, which transform at discrete time steps using certain transition rules. The transition rules of a CA model signify the relationship between a cell and its neighborhood. The neighborhood criterion can be the Von Neumann neighborhood (4 neighbors) or the Moore neighborhood (8 neighbors). Transition rules are iteratively applied to the cells to simulate dynamic processes over multiple time steps.
We define a cell state at point x and time t as S_x^t = (B_x^t, T_x^t), where B_x^t is a binary label variable representing Builtup and Non Builtup and T_x^t is a transition indicator. The range of the label variable is {0, 1}, where 0 represents Non Builtup and 1 represents Builtup. Let us represent the sets of originally Builtup and Non Builtup pixels as B and NB respectively. The transition indicator indicates whether the cell under consideration has undergone a transformation in the current time step. The transition indicator takes one of four values, representing transition from Non Builtup to Non Builtup (NB to NB), Builtup to Builtup (B to B), Non Builtup to Builtup (NB to B) and Builtup to Non Builtup (B to NB). A pictorial representation of the transition classes is displayed in Fig. 1 (b).
We define the update rule for the transition indicator as

T_x^{t+1} = f({S_y^t : y in N(x)})    (1)

where N(x) is the neighborhood criterion and f is an update function to be modeled. The transition function g for the update of the state S_x^t to S_x^{t+1} is given as

S_x^{t+1} = g(S_x^t, T_x^{t+1})    (2)
The theory of cellular automata is convenient in this scenario because it treats space as a discrete array of cells. Since our data is of raster type, it is already in the form of a discrete matrix of pixels, so discretization of space does not need to be done explicitly. Had the data been in vector form, we would have had to define the discretization of space separately.
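The CA update scheme described above can be sketched as follows. This is a minimal illustration, not the learned model: the `majority_rule` function below is a hypothetical hand-written rule standing in for the learned transition function, and the edge handling (clipping the window at the array border) is an assumption.

```python
import numpy as np

def moore_neighborhood(grid, x, y):
    """3x3 window around (x, y), clipped at the array edges (cell included)."""
    return grid[max(x - 1, 0):x + 2, max(y - 1, 0):y + 2]

def step(grid, rule):
    """One discrete CA time step: apply the transition rule to every cell."""
    new = np.empty_like(grid)
    for x in range(grid.shape[0]):
        for y in range(grid.shape[1]):
            new[x, y] = rule(moore_neighborhood(grid, x, y))
    return new

# Hypothetical rule (NOT the learned one): a cell is Builtup in the next
# step when at least 4 cells of its 3x3 window are currently Builtup.
def majority_rule(window):
    return 1 if window.sum() >= 4 else 0
```

In the proposed framework, `rule` is replaced by the trained classifier of eqn. (1), queried on the neighborhood of every cell.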
2.1 Extension of the CA model
The urban growth CA model can be extended to include other variables which are driving factors of urban growth ([31, 9]). In order to incorporate the raster information, we have modified our CA model in the following way.

The state variable is extended to include a raster data variable R_x^t from the images, so that the cell state becomes S_x^t = (B_x^t, T_x^t, R_x^t). The raster image can consist of multiple bands, in which case R_x^t represents a vector of the values of all bands at point x.

The update equation (1) is transformed to include the raster variable in the following way:

T_x^{t+1} = f({(B_y^t, E(R_y^t)) : y in N(x)})    (3)
The above modification allows us to reflect upon the relationship between the builtup variables and the raster variables through the functions f and E. The function E is an encoding function which is responsible for generating a desired length encoding of the information present in R_x^t. Thus, we have reduced the problem of modeling urban growth from raster data to that of modeling the functions f and E.
3 End-to-End Framework for learning Cellular Automata rules
In this section, we discuss the details of our architecture, illustrated in Fig. 1 (a), for generating a transition function for the cellular automata model described in eqn. (3). The architecture comprises the following components.

Two data repositories consisting of builtup rasters and remotely sensed rasters which will be required during model building.

The data representation module represents the data obtained from the data repositories in such a way as to facilitate learning and knowledge representation.

The learning algorithm stage is expected to learn patterns from the data which is received from the previous stage.

The knowledge representation module stores the knowledge learned by the learning algorithm for further querying.

Finally, the prediction stage predicts future builtup conditions.
3.1 Data Representation
The data representation component consists of procedures to build a data matrix and a label matrix from the raster and builtup maps. The data matrix is formed from the raster and builtup maps as a matrix with m rows, where m is the number of data points in the raster, and with a number of columns determined by |N(x)|, where N(x) is the neighborhood function and |.| denotes the cardinality of a set. The neighborhood set plays a significant role in the framework, as it represents the effect of neighboring pixels. Increasing the neighborhood set size increases the model complexity by increasing the dimensions of the data matrix ([27]).
A crucial step at this stage is the design of the encoder function E. We can build an encoder function by training an autoencoder ([Vincent et al., 2010]) on the neighborhood vectors to form a fixed length representation. An autoencoder is a neural network consisting of an encoder, which encodes a data vector of a certain size, and a decoder, which reconstructs the data vector back from the encoding. The network is trained by minimizing the error between the decoded output and the original input. Finally, the encoder part of the autoencoder is used to encode the input; the encoding is known as a representation. Since the method is unsupervised, there is no requirement to label the data for encoder design. The encoder design for our framework is depicted in Fig. 2. If a vector is represented by x, then E can be expressed as
E(x) = σ(W x)    (4)

where W is a weight matrix and σ is an activation function (e.g. sigmoid, ReLU). Eqn. (4) represents a single layer feed forward neural network which encodes the information in a raster position and its neighborhood. Similar to the encoder function, we can express the decoder function D as

D(h) = σ'(W' h)    (5)

where W' is a weight matrix and σ' is an activation function. The only constraint in eqn. (5) is that the output vector must have the same dimension as x. The optimization of the parameters can then be done using the following minimization:

min over W, W' of the sum over x of ||x - D(E(x))||^2    (6)

Subsequent to the optimization, representations of the desired length can be generated from the vectors using the function E as in eqn. (4).
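The encoder/decoder pair of eqns. (4)-(6) can be sketched with scikit-learn by fitting a one-hidden-layer regressor to reconstruct its own input and then reusing only the encoder half. The input width (27, i.e. a 3x3 neighborhood of a 3-band raster) and the encoding length k = 8 are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((500, 27))  # stand-in: 3x3 Moore neighborhood x 3 bands, flattened

k = 8  # assumed encoding length; the paper varies this as a hyperparameter
ae = MLPRegressor(hidden_layer_sizes=(k,), activation="logistic",
                  max_iter=2000, random_state=0)
ae.fit(X, X)  # unsupervised: the network reconstructs its own input

def encode(V):
    """Forward pass through the trained encoder half only (eqn. 4)."""
    return 1.0 / (1.0 + np.exp(-(V @ ae.coefs_[0] + ae.intercepts_[0])))

R = encode(X)  # k-length representations for the data matrix
```

The decoder half (the output layer) is discarded after training, exactly as the text describes: only E survives into the data representation module.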
The data matrix can be obtained by concatenating the builtup vectors and the raster representations columnwise. The raster representation variables are noncategorical real values, whereas the builtup variables are categorical with value either 0 (Non Builtup) or 1 (Builtup). The rows of the data matrix can be directly supplied to the function f given in eqn. (3).
The label matrix is a single column matrix which consists of the labels of the four transition classes, namely Non Builtup to Non Builtup, Non Builtup to Builtup, Builtup to Non Builtup and Builtup to Builtup. The transition classes and their class labels are illustrated as a Venn diagram in Fig. 1 (b). The transition classes are drawn inside the persistence classes because, in a dataset, the number of transition cells is usually smaller than the number of persistent cells. The rows of the data matrix form the input vectors of the function f, whereas the rows of the label matrix form the output values of f. The procedure for building the data and label matrices is given in Algorithm 1.
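The construction of the four-class labels from two consecutive builtup rasters can be sketched as follows. The integer coding (0 through 3) is an assumption for illustration; the paper does not fix a particular coding.

```python
import numpy as np

# Assumed integer coding of the four transition classes:
# 0: NB->NB, 1: B->B, 2: NB->B, 3: B->NB
def transition_labels(b_t, b_t1):
    """Build label matrix entries from builtup rasters at times t and t+1."""
    lab = np.empty(b_t.shape, dtype=int)
    lab[(b_t == 0) & (b_t1 == 0)] = 0  # persistence: Non Builtup
    lab[(b_t == 1) & (b_t1 == 1)] = 1  # persistence: Builtup
    lab[(b_t == 0) & (b_t1 == 1)] = 2  # transition: new builtup
    lab[(b_t == 1) & (b_t1 == 0)] = 3  # transition: deurbanization
    return lab
```

Flattening the returned array column-wise yields the single-column label matrix that pairs with the rows of the data matrix.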
Contrary to other works ([20, 28, 31]), where two classes {Builtup, Non Builtup} have been used for training classifiers, we have used four classes in our label matrix. This resolves the following two issues.

With the two class approach, the framework turns into a builtup classification problem, because the raster images contain information regarding builtup. In our case, we have four classes, of which two are transition and two are persistence classes. This tweak divides the data matrix into four parts, and the patterns of the transition and persistence classes get separated. The classifier still performs classification, but the pattern classification problem turns into: which patterns lead to transition and which ones lead to persistence?

Comparing the builtup maps of two close time steps, we have observed that transitions from Non Builtup to Builtup and Builtup to Non Builtup occur in a small fraction of cells compared to the cells which are persistent. In the two class approach of modeling, certain learning algorithms tend to ignore patterns that constitute a small fraction of the dataset, treating them as noise. For instance, if only a small fraction of the pixels are transition pixels, then a default choice for high predictive accuracy is to predict persistence everywhere ([8]). The imbalance in the dataset is unavoidable in this case, as it is a natural property of the problem at hand, which gets covered up when not considered separately. Imbalanced datasets are common in many practical problems and have been addressed multiple times ([7]). In our case, we have used three separate metrics, namely Figure of Merit (FoM), Producer's Accuracy (PA) and User's Accuracy (UA), other than classification accuracy, to track whether this problem is tackled by the knowledge representation or not ([24]). These three metrics are designed to check whether a landuse change model can predict the transitions properly. Therefore, it is crucial to have high FoM, PA and UA along with OA to guarantee a good simulation technique.
3.2 Knowledge Representation
Knowledge, as defined by Fischler and Firschein (1987), refers to stored information or models used by a person or machine to interpret, predict, and appropriately respond to the outside world. By the very nature of this definition, knowledge is task specific ([12]). There are various ways in which knowledge can be represented, such as rules (Decision Trees), decision surfaces (SVM), computation nodes (ANN), probability distributions (Probabilistic Graphical Models) and nearest neighbors (KNN). Table 1 provides a description of the various kinds of supervised classification methods which can be considered in the End-to-End framework for knowledge representation. The reason behind referring to these as knowledge representation techniques is as follows.
A classifier primarily comprises a mathematical form and a parameter estimation (training) technique. The mathematical form consists of a set of parameters which represents a family of functions. The parameter estimation technique is an algorithm for finding a suitable set of parameters corresponding to the data that optimizes a particular objective. When a classifier is trained using data, it builds a knowledge structure based on its corresponding mathematical representation. The knowledge represented by a classifier is the set of parameters obtained after parameter estimation. The performance of the knowledge representation technique on a particular problem depends on how precisely the set of parameters is estimated from the data.
The selection of a classifier requires a profound understanding of the data, which may not always be available for complex multivariate problems. Since any of the classifiers in Table 1 can represent the transition function given in eqn. (3), one needs to select a classifier based on certain criteria.
In our case, the selection of the classifier as a knowledge representation technique is based on seven metrics, namely cross validation accuracy, training time, prediction time, Figure of Merit (FoM), Producer's accuracy (PA), User's accuracy (UA) and Overall accuracy (OA). Each of these metrics is a different window that provides a certain view of the framework: cross validation measures overfitting; FoM, PA, UA and OA measure the framework's performance in learning patterns and simulation; training and prediction time measure the swiftness of the knowledge representation technique. Based on all these views, we have selected a Decision tree or an ensemble of Decision trees (Random Forests) as our knowledge representation technique for the generation of cellular automata rules. Hence we call the framework a Rule-based End-to-End framework.
Classifier               | Knowledge representation                     | Training procedure
Logistic regression      | weights of a linear decision function        | stochastic gradient descent
Gaussian Naive Bayes     | class conditional means and variances        | maximum likelihood estimation
Support vector machines  | support vectors defining a decision surface  | sequential minimal optimization
Multi Layer Perceptrons  | weights of computation nodes                 | backpropagation
K-Nearest Neighbors      | stored training instances                    | none (lazy learning)
Decision Tree            | if-then rules along root to leaf paths       | CART, ID3, C4.5
Ensemble of classifiers  | set of unit classifiers                      | Bagging/Boosting (depends on the unit classifier)
3.3 Methodology
As discussed in Section 2, the urban growth cellular automata model consists of three components, namely the cell state, the transition function and the update rule (eq. (2)). The cell state and update rule need to be predefined and are not flexible, while the transition function can take any form during training. Hence the framework is generic and does not depend on manually designed features. The procedures for dataset preparation, training and prediction are given in Algorithms 1, 2 and 3 respectively.
Algorithm 1 consists of the following steps.

Gathering the raster values of a point and its neighborhood as rows of one matrix and the corresponding builtup values as rows of another. The raster matrix is used to build an autoencoder using a custom training function. This is followed by generation of the representations and concatenation of the builtup values with the representations to form the data matrix. It should be noted that the neighborhood criterion needs to be decided before the data generation process. In this paper, we have used a standard Moore neighborhood criterion.

Preparation of the label matrix, which is composed of transition and persistence classes. To prepare the matrix, we have declared an isUrban() function, which returns true if a point at a given time is urban and false otherwise. It is used to determine whether a transition has occurred at a point during the time interval under consideration.
Algorithm 2 consists of the following steps.

Retrieve the data matrix and label matrix built by Algorithm 1.

Select a set of classifiers as in Table 1 and train these classifiers using their corresponding training algorithm.

The result is a set of trained classifiers, each of which can be considered a transition function.

Then, depending on the Figure of Merit, Producer's accuracy, User's accuracy, Overall accuracy, cross validation, training time and prediction time, one can choose which classifier to use. In certain cases, it may be convenient to use cross validation, training time and prediction time to eliminate certain classifiers from the list initially.
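The selection loop of Algorithm 2 can be sketched as below: train a set of candidate classifiers and record held-out accuracy together with training and prediction time for each. The synthetic data matrix and the particular candidate set are stand-ins for illustration only.

```python
import time
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the data and label matrices of Algorithm 1.
rng = np.random.default_rng(0)
X = rng.random((300, 10))
y = rng.integers(0, 4, 300)  # four transition classes
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

candidates = {
    "logistic regression": LogisticRegression(max_iter=500),
    "gaussian naive bayes": GaussianNB(),
    "decision tree": DecisionTreeClassifier(random_state=0),
}
report = {}
for name, clf in candidates.items():
    t0 = time.time(); clf.fit(Xtr, ytr); t_train = time.time() - t0
    t0 = time.time(); acc = clf.score(Xte, yte); t_pred = time.time() - t0
    report[name] = (acc, t_train, t_pred)
```

In the full framework, FoM, PA, UA and OA would be computed per candidate as well before the final choice is made.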
Algorithm 3 consists of the following steps.

Gathering the raster and builtup values of a point and its neighborhood in row vectors, as in Algorithm 1.

Using the encoder built in Algorithm 1 to generate representations. This is followed by concatenation of the builtup vectors and the representations to form the input vector.

Use a trained classifier to predict on the input vector.

If the classifier predicts a transition from Non Builtup to Builtup or persistence from Builtup to Builtup, then the final class at the point is Builtup, and Non Builtup otherwise.
It may be noted that at least one builtup image is necessary as an initial point to start the prediction procedure. This is because we have modeled urban growth as a cellular automaton, for which the framework is recurrent in nature (see Fig. 3). Figure 3 represents the flowchart of the simulation procedure. It shows that the framework can simulate urban growth arbitrarily far into the future, each time utilizing the last predicted image.
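The recurrent nature of the simulation can be sketched in a few lines: each predicted builtup map is fed back as the input of the next step. The `dilate` function is a toy stand-in for the learned transition function (growth by one-cell Moore dilation), used here only to make the loop runnable.

```python
import numpy as np

def simulate(builtup0, transition_fn, steps):
    """Recurrent prediction: each predicted map is the input to the next step."""
    maps = [builtup0]
    for _ in range(steps):
        maps.append(transition_fn(maps[-1]))
    return maps

# Toy stand-in for the learned transition function: one-cell dilation,
# i.e. every cell in the Moore neighborhood of builtup becomes builtup.
def dilate(b):
    p = np.pad(b, 1)
    out = np.zeros_like(b)
    for dx in (0, 1, 2):
        for dy in (0, 1, 2):
            out |= p[dx:dx + b.shape[0], dy:dy + b.shape[1]]
    return out
```

Replacing `dilate` with the classifier-backed transition function of Algorithm 3 yields the actual simulation procedure of Fig. 3.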
It has been proved in ([22]) that an arbitrary neighborhood, two dimensional cellular automaton can be simulated by a set of ten partial differential equations. The PDEs indicate the relation between the dynamic variable and space-time, which can serve as a theoretical form of the urban growth cellular automaton. Without the cellular automaton, the procedure would merely be builtup/landuse classification rather than prediction.
3.4 Features of the EndtoEnd Framework
The proposed End-to-End architecture has certain key features which differentiate it from earlier proposed urban growth prediction frameworks ([28, 18, 19, 30]).

The key improvement is the removal of the manual feature extraction module, which is inherent in existing architectures, and its replacement with a data representation module. The data representation module generates representations from the raw satellite images in an unsupervised manner, in the form described in Section 3.1, and does not require any separate database of manually selected explanatory features.

The knowledge representation layer stores knowledge that directly relates builtup and raster representations with the transition classes. This is a significant distinction from earlier models, where learning models were used to establish relationships between manually selected explanatory features and builtup. The removal of manual selection processes reduces the bias on models and knowledge structures, thereby creating an opportunity to develop new theories and explain the results of a complex process such as urban growth.

Furthermore, since representation learning is unsupervised, it can be done without taking into account the final objective (in this case, urban growth). Thus we can say that the representations are generic in nature and can be used for other applications as well.
Finally, these qualities come with a drawback: implementation becomes easier, but the knowledge representation turns incomprehensible. Hence, theoretically, it becomes a challenging task to extract meaning from these rules. This is due to the fact that the variables in the data matrix are frequency/band values which can have multiple semantics. Therefore, the generated rules can have multiple interpretations, among which the most appropriate one needs to be identified.
4 Experiments and Results
We have conducted our experiments on the region of Mumbai, the capital city of the state of Maharashtra, India. The city lies on the west coast of India and has a natural harbor. Mumbai is one of the mega cities of India and is often colloquially known as the City of Dreams. According to the Census of India, the population of Mumbai has been rising steadily over the past decades. The region under consideration is shown in Fig. 4a.
The experiments have been conducted in a Virtual Machine (VM) of an OpenStack based cloud infrastructure (http://www.sit.iitkgp.ernet.in/Meghamala) with 8 VCPUs, 16 GB RAM and the Ubuntu 14.04 operating system.
4.1 Data Collection and Preprocessing
We have collected remotely sensed natural color (3 band) Landsat images of the Mumbai region from the United States Geological Survey (USGS) website (http://glovis.usgs.gov/) for three different years. The Mumbai region has been extracted from each raster. The images have been segmented to generate the builtup raster images, which are binary images with white pixels representing builtup and black pixels representing non builtup. The segmentation has been carried out using maximum likelihood classification implemented in the semi-automatic classification plugin (https://plugins.qgis.org/plugins/SemiAutomaticClassificationPlugin/) of QGIS (http://www.qgis.org/en/site/). The semi-automatic classification method is essentially a semi-manual labeling method, where initial labels are provided by a human and segmented maps are then generated using the raster values of the Landsat image. Since the procedure is partly manual and can be inaccurate, the resulting maps have been verified against reference sources, which in our case have been Google Earth and Mumbai maps from a previously published work on Mumbai [28]. The total numbers of pixels that transformed from non builtup to builtup during the years considered in our study are given in Table 2. The percentage of pixels that changed from nonurban to urban is similar to that reported in other studies conducted on the region of Mumbai ([28, 20]). The slight aberrations are due to the classification inaccuracy of the classifiers used for performing the landcover classification.
Time step  Pixels transformed  Pixels persistent 

The rasters showing the urban growth conditions of the three years are shown in Fig. 4. The Builtup to Non Builtup class has had a negligible contribution to the training set, probably because there has been no recent massive deurbanization in the area. Hence we have considered all such instances to fall in the Builtup to Builtup class. The data matrix and the label matrix are generated from the data considering a Moore neighborhood. The length of the encoded representation is varied over a range of values. The autoencoder is trained using the adaptive gradient descent (AdaGrad) algorithm ([Duchi et al., 2011]) with a fixed batch size. Subsequent to the training, the encoder present in the autoencoder is used to generate representations for creating the data matrix.
4.2 Training and Validation
The data matrix and label matrix generated in the previous step have been used to develop knowledge structures using the various knowledge representation techniques given in Table 1. The classifiers have been trained in a multi-parameter setting in the following ways.

Logistic Regression: For this classifier, we have executed the training multiple times, each with a different value of the L2 regularization strength.

Gaussian Naive Bayes: No special prior probabilities have been set for this classifier.

Support Vector Machine: Different kinds of kernel functions, such as linear, polynomial and radial basis function, have been used to test the performance metrics. For polynomial kernels, the degree of the polynomial has been varied over a range of values.

Multi Layer Perceptron: Parameters such as the number of hidden layers, hidden layer sizes, batch size, number of iterations, learning rate and momentum have been varied over different ranges to assess performance. Several hidden layer configurations have been tested, the learning rate schedule has been taken as constant, adaptive or inverse scaled, and the learning rate and momentum have been varied over suitable ranges.

Single and Ensemble of decision trees: For this classifier, we have used Gini impurity as the measure to split the datasets. The maximum height of the tree is set to values over a range. The algorithm used to build the decision tree is an optimized version of CART (Classification and Regression Trees), which can handle both categorical and noncategorical data (http://scikit-learn.org/stable/modules/tree.html#tree-algorithms). For the ensemble method of knowledge representation, we have considered a fixed number of decision trees/estimators. The implementations of the learning methods that we have used are available in the scikit-learn library (http://scikit-learn.org/) in Python.
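The decision tree and ensemble configuration described above can be sketched with scikit-learn as follows. The depth values (4, 8, 16), the ensemble size of 100 and the synthetic data are illustrative assumptions, not the paper's actual settings.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.random((400, 12))
y = rng.integers(0, 3, 400)  # three classes after merging B->NB into B->B

# Depth-limited CART trees with Gini impurity, plus a bagged ensemble.
trees = {d: DecisionTreeClassifier(criterion="gini", max_depth=d,
                                   random_state=0).fit(X, y)
         for d in (4, 8, 16)}  # illustrative depth values
forest = RandomForestClassifier(n_estimators=100, criterion="gini",
                                random_state=0).fit(X, y)
```

Each fitted tree is itself a set of if-then rules (root-to-leaf paths), which is exactly the knowledge structure the framework stores as CA transition rules.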
The average of the results over all the different parameter settings is taken as the final performance of a knowledge structure.
According to [24], comparing classification accuracy alone is not enough to validate a land-use change model. They argued that validating a change model requires four metrics, namely Figure of Merit (FoM), Producer's Accuracy (PA), User's Accuracy (UA), and Overall Accuracy (OA). Since urban growth is a land-use change model, we have used these metrics for model validation. The first three metrics assist in comparing the original and predicted maps of urban growth, while Overall Accuracy (OA) can be thought of as classification accuracy. According to [24], the validation measures depend on five variables:

A = Area of error due to observed change predicted as persistence.

B = Area correct due to observed change predicted as change.

C = Area of error due to observed change predicted in the wrong gaining category.

D = Area of error due to observed persistence predicted as change.

E = Area correct due to observed persistence predicted as persistence.
Figure of Merit (FoM) provides the amount of overlap between the observed and predicted change. Producer's accuracy (PA) gives the proportion of pixels that the model accurately predicted as change, given that the reference maps indicate observed change. User's accuracy (UA) gives the proportion of pixels that the model accurately predicts as change, given that the model predicts change. The equations of the metrics are given as follows.
FoM = B / (A + B + C + D)  (7)

PA = B / (A + B + C)  (8)

UA = B / (B + C + D)  (9)

OA = (B + E) / (A + B + C + D + E)  (10)
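Denoting the five areas listed above by A through E (the standard notation of [24]), the four metrics can be computed directly; a minimal sketch with illustrative pixel counts:

```python
def change_metrics(A, B, C, D, E):
    """Validation metrics of Pontius et al. computed from the five areas A-E."""
    fom = B / (A + B + C + D)            # Figure of Merit
    pa = B / (A + B + C)                 # Producer's accuracy
    ua = B / (B + C + D)                 # User's accuracy
    oa = (B + E) / (A + B + C + D + E)   # Overall accuracy
    return fom, pa, ua, oa

# Illustrative counts, not the paper's data.
fom, pa, ua, oa = change_metrics(A=10, B=30, C=5, D=5, E=50)
```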
The training and validation of the End-to-End framework have been done using cross validation: the dataset (generated from the data for a particular pair of years) is divided into k parts, and one part is randomly selected as the validation set while the others are used for training. This period was selected because the number of pixels which changed during it is larger than in the other periods (Table 2). The validation results, in terms of the mean and variance of classification accuracy, are shown in Table 3. A performance comparison of existing methods with the End-to-End approach is presented in Fig. 9 (a) and (b). The first set of bars presents the results of existing methods applied to our dataset, whereas the last two sets of bars provide the results of the End-to-End approach with a single decision tree and an ensemble of decision trees. The resulting built-up maps representing the transition classes predicted by both our approaches are displayed in Figs. 5 and 6.
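The k-fold procedure described above can be sketched with scikit-learn; k = 5 and the classifier settings below are illustrative choices, not the paper's configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(250, 10))
y = rng.integers(0, 2, size=250)

# Divide the dataset into k parts; each fold uses one part for validation
# and the remaining parts for training.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(RandomForestClassifier(n_estimators=25, random_state=0),
                         X, y, cv=cv)
mean_acc, var_acc = scores.mean(), scores.var()  # reported as mean +/- variance
```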
A comparison of the training and prediction times taken by the knowledge structures is provided in Fig. 7 (a) and (b).
We have compared our framework with four existing methods ([28], [18], [9] and [14]) in terms of the validation metrics described above. Some of the distance-based features which have been used in these works are distance to roads, built-up, river, urban planning area, central business district, railway, wetlands, forests, agricultural lands, etc. Some of the non-distance-based factors are digital elevation maps, slope, population density, and land/crop density. During experimentation, we have manually generated each of these feature maps in order to model and compare the results of these works with our End-to-End framework.
4.3 Discussion
We defend our hypothesis regarding End-to-End learning for the prediction of urban growth with the results of our experiments. The comparison of our framework with existing frameworks based on the four metrics (FoM, PA, UA, and OA) in Fig. 9 (a) and (b) reveals that End-to-End learning performs significantly better than the existing learning-based methods developed for urban growth prediction. We argue based on the results that this is possible due to the superior representation and robustness of the encoded representations combined with an ensemble of decision trees. An approximate summary of the enhancements provided by our End-to-End framework on the dataset is given as follows.

Improvement in Figure of Merit (FoM).

Improvement in Producer's Accuracy (PA).

Improvement in User's Accuracy (UA).

Improvement in Overall Accuracy (OA).
The cross validation accuracies and the training and prediction times are given in Table 3. It is evident from the table that both Decision Trees and Random Forests (ensembles of decision trees) provide optimal results in terms of cross validation accuracy, training time, and prediction time as compared to the other prediction models. However, a Decision Tree has a problem of overfitting, which can be mitigated by using a Random Forest. Hence we conclude that a Random Forest (ensemble of decision trees) is an optimal choice of knowledge representation for our proposed End-to-End framework.
Figure 8 depicts the change in the performance metrics with respect to the size of the encoding. It shows a sharp rise in the beginning, followed by saturation in the performance metrics as the encoding size is increased. This implies that with an increase in encoding size, information is more precisely encoded by the autoencoder and hence the performance of the simulation improves. However, the tradeoff is that if the encoding size is increased arbitrarily, the time required to train the autoencoder grows faster than the performance metrics improve. Furthermore, increasing the encoding size increases the size of the feature vector in the data matrix, which brings in the curse of dimensionality (when the dimensionality of a feature vector is increased without an increase in the size of the data, the data tends to become sparse; this problem is referred to as the curse of dimensionality). In this case, we believe that the saturation is caused by the increase in the dimensions of the feature vector in the data matrix.
From the comparison in Fig. 9 (a) and (b), we have observed that the Overall Accuracy (OA) and the User's Accuracy (UA) metrics are much higher than the Figure of Merit (FoM) and Producer's Accuracy (PA) for certain existing methods. This has also been reflected in the experimental results of some other works; for instance, [28] have reported a similar gap between FoM, PA, and OA using an MLP for the region of Mumbai. One of the reasons behind this peculiarity is the dataset imbalance problem that we discussed earlier in the paper (section III A). Since the Figure of Merit and Producer's Accuracy measure the performance of the model in terms of transitions, and the fraction of transition pixels is low, a training algorithm might not have learned them correctly. Furthermore, we can also see that the existing models have comparable Overall Accuracy (OA). This can be due to the fact that only a small fraction of the pixels fall in the transition class, so the simple default strategy for maximizing accuracy is to predict the majority (persistence) class [8]. In the End-to-End approach, we have seen that the metrics are comparable to one another, which indicates that the imbalance in the datasets has been handled.
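The effect of the imbalance can be seen with a small numeric sketch: a trivial model that always predicts persistence on a dataset with, say, 3% transition pixels attains a high OA while its FoM and PA are zero (the counts below are illustrative, not the paper's data).

```python
# 1000 pixels, 3% of which actually transition (illustrative imbalance).
total, changed = 1000, 30

# "Always predict persistence": every observed change becomes error A,
# every observed persistence is correct (E); B = C = D = 0.
A, B, C, D, E = changed, 0, 0, 0, total - changed

oa = (B + E) / (A + B + C + D + E)   # high despite learning nothing
fom = B / (A + B + C + D)            # zero: no transition is captured
pa = B / (A + B + C)                 # zero as well
```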
The User's Accuracy metric is high for certain models because there is only one direction of change, i.e., non-built-up to built-up. Therefore, pixels predicted in the transition category can belong to only one category, and a model that predicts a higher number of pixels in the transition category has a higher User's Accuracy.
Classifier | Cross-Validation (mean ± variance) | Training Time (s) | Prediction Time (s)
MLP [28] | | |
Logistic Regression [18] | | |
SVM [9] | | |
Random Forests [14] | | |
End-to-End approach (MLP single layer) | | |
End-to-End approach (SGD) | | |
End-to-End approach (Naive Bayes) | | |
End-to-End approach (KNN) | | |
End-to-End approach (AdaBoost) | | |
End-to-End approach (Decision Tree) | | |
End-to-End approach (Random Forests) | 0.900792 (± 0.043120) | 53.06 | 2.9
Model | Remarks
MLP ([28]) |
Logistic Regression ([18]) |
SVM ([9]) |
Random Forests ([14]) |
End-to-End approach (MLP single layer) | High training time; poorer FoM, PA and UA than the proposed method.
End-to-End approach (SGD) | Poorer FoM, PA and UA than the proposed method.
End-to-End approach (Naive Bayes) | Poorer FoM, PA and UA than the proposed method.
End-to-End approach (Decision Tree) | Optimal, but prone to overfitting.
End-to-End approach (Random Forests) | Optimal.
4.4 Future Urban Growth Prediction
Figures 10(a) and 10(b) show the future urban growth prediction for a target year, starting from the last observed year, using our End-to-End framework with the knowledge structure as a Decision Tree and as a Random Forest respectively. The raster over which the built-up is displayed is a natural color image of Mumbai. The white pixels represent the built-up regions. It can be seen that the framework, in the case of the Decision Tree, does not encroach upon water bodies and swamps, whereas in the case of the Random Forest, a few encroachments are present. It may be noted that despite being provided no explicit region information such as water bodies, swamps, or forests during training, the framework has been able to capture them. Hence, we can say that the rule-based framework automatically partitions the regions in the satellite image in some way to determine in which places growth can happen. This knowledge is currently encoded in the decision trees and can be extracted only if we discover the rules.
5 Conclusion and Future Work
We have introduced the concept of End-to-End learning in urban growth prediction by developing a framework that learns the rules of a cellular automata model directly from representations of the spectral information in remote sensing data. We have empirically verified our framework by predicting urban growth for the region of Mumbai over a multi-year time frame. Our End-to-End framework has outperformed existing learning-based methodologies with a simpler implementation than the existing frameworks.
Future work can be based on challenges which we have encountered in this work. Since the spatial resolution of the satellite is fixed, as the temporal resolution decreases, the number of cells in which built-up occurs decreases. Due to this, the number of points in the transition classes decreases and that in the persistence classes increases. Therefore, the dataset becomes heavily imbalanced, and the ability of the classifier to learn the patterns of the transition classes is reduced. The problem of imbalanced datasets can be alleviated if both the temporal and spatial resolution of the satellite images are increased. This is possible if data from high-resolution satellites like IKONOS/QuickBird are used. However, the data load of high-resolution satellites is high, hence more infrastructure and more sophisticated algorithms would be necessary.
Despite the superior performance of the End-to-End approach, one of the drawbacks of End-to-End learning is the difficulty of understanding the learned rules. Hence, uncovering the automatically generated rules in an End-to-End learning framework can be considered a challenge which needs to be resolved in subsequent studies. Furthermore, the End-to-End framework can be extended to incorporate other data resources which are in vector form, which may further improve its performance.
References
 [1] Maher Milad Aburas, Yuek Ming Ho, Mohammad Firuz Ramli, and Zulfa Hanan Ash’aari. The simulation and prediction of spatiotemporal urban growth trends using cellular automata models: A review. International Journal of Applied Earth Observation and Geoinformation, 52:380–389, 2016.
 [2] M Ahmadlou, MR Delavar, H Shafizadeh-Moghadam, and A Tayyebi. Modeling urban dynamics using random forest: Implementing ROC and TOC for model evaluation. ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, pages 285–290, 2016.
 [3] Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. arXiv preprint arXiv:1511.00561, 2015.
 [4] R Basawaraja, KB Chari, SR Mise, and SB Chetti. Analysis of the impact of urban sprawl in altering the landuse, landcover pattern of raichur city, india, using geospatial technologies. Journal of Geography and Regional Planning, 4(8):455, 2011.
 [5] Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE transactions on pattern analysis and machine intelligence, 35(8):1798–1828, 2013.
 [6] Basudeb Bhatta. Urban Growth Analysis and Remote Sensing: A Case Study of Kolkata, India 1980–2010. Springer Science & Business Media, 2012.
 [7] Nitesh V Chawla. Data mining for imbalanced datasets: An overview. In Data mining and knowledge discovery handbook, pages 853–867. Springer, 2005.
 [8] Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. Smote: synthetic minority oversampling technique. Journal of artificial intelligence research, 16:321–357, 2002.
 [9] Yongjiu Feng, Yan Liu, and Michael Batty. Modeling urban growth with gis based cellular automata and least squares svm rules: a case study in qingpu–songjiang area of shanghai, china. Stochastic Environmental Research and Risk Assessment, 30(5):1387–1400, 2016.
 [10] Alex Graves and Navdeep Jaitly. Towards end-to-end speech recognition with recurrent neural networks. In ICML, volume 14, pages 1764–1772, 2014.
 [11] Alex Graves, Abdelrahman Mohamed, and Geoffrey Hinton. Speech recognition with deep recurrent neural networks. In Acoustics, speech and signal processing (icassp), 2013 ieee international conference on, pages 6645–6649. IEEE, 2013.
 [12] Simon S Haykin. Neural networks and learning machines, volume 3. Pearson Upper Saddle River, NJ, USA:, 2009.
 [13] Zhiyong Hu and CP Lo. Modeling urban growth in atlanta using logistic regression. Computers, Environment and Urban Systems, 31(6):667–688, 2007.
 [14] Courage Kamusoko and Jonah Gamba. Simulating urban growth using a random forest-cellular automata (RF-CA) model. ISPRS International Journal of Geo-Information, 4(2):447–470, 2015.
 [15] Xinli Ke, Lingyun Qi, and Chen Zeng. A partitioned and asynchronous cellular automata model for urban growth simulation. International Journal of Geographical Information Science, 30(4):637–659, 2016.
 [16] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
 [17] Xia Li and Anthony Gar-On Yeh. Data mining of cellular automata’s transition rules. International Journal of Geographical Information Science, 18(8):723–744, 2004.
 [18] YuPin Lin, HoneJay Chu, ChenFa Wu, and Peter H Verburg. Predictive ability of logistic regression, autologistic regression and neural network models in empirical landuse change modeling–a case study. International Journal of Geographical Information Science, 25(1):65–87, 2011.
 [19] Yan Liu. Modelling urban development with geographical information systems and cellular automata. CRC Press, 2008.
 [20] Hossein Shafizadeh-Moghadam and Marco Helbich. Spatiotemporal urbanization processes in the megacity of mumbai, india: A markov chains-cellular automata urban growth model. Applied Geography, 40:140–149, 2013.
 [21] Sulaiman Ibrahim Musa, Mazlan Hashim, and Mohd Nadzri Md Reba. A review of geospatialbased urban growth models and modelling initiatives. Geocarto International, pages 1–21, 2016.
 [22] Stephen Omohundro. Modelling cellular automata with partial differential equations. Physica D: Nonlinear Phenomena, 10(12):128–134, 1984.
 [23] Hichem Omrani, Amin Tayyebi, and Bryan Pijanowski. Integrating the multilabel landuse concept and cellular automata with the artificial neural networkbased land transformation model: an integrated mlcaltm modeling framework. GIScience and Remote Sensing, 2017.
 [24] Robert Gilmore Pontius, Wideke Boersma, JeanChristophe Castella, Keith Clarke, Ton de Nijs, Charles Dietzel, Zengqiang Duan, Eric Fotsing, Noah Goldstein, Kasper Kok, et al. Comparing the input, output, and validation maps for several models of land change. The Annals of Regional Science, 42(1):11–37, 2008.
 [25] Behzad Saeedi Razavi. Predicting the trend of land use changes using artificial neural network and markov chain model (case study: Kermanshah city). Research Journal of Environmental and Earth Sciences, 6(4):215–226, 2014.
 [26] Andreas Rienow and Roland Goetzke. Supporting sleuth–enhancing a cellular automaton with support vector machines for urban growth modeling. Computers, Environment and Urban Systems, 49:66–81, 2015.
 [27] Inés Santé, Andrés M García, David Miranda, and Rafael Crecente. Cellular automata models for the simulation of realworld urban processes: A review and analysis. Landscape and Urban Planning, 96(2):108–122, 2010.
 [28] Hossein Shafizadeh-Moghadam, Julian Hagenauer, Manuchehr Farajzadeh, and Marco Helbich. Performance analysis of radial basis function networks and multi-layer perceptron networks in modeling urban change: a case study. International Journal of Geographical Information Science, 29(4):606–623, 2015.
 [29] Rajesh Bahadur Thapa and Yuji Murayama. Urban growth modeling of kathmandu metropolitan region, nepal. Computers, Environment and Urban Systems, 35(1):25–34, 2011.
 [30] Jasper Van Vliet, Roger White, and Suzana Dragicevic. Modeling urban growth using a variable grid cellular automaton. Computers, Environment and Urban Systems, 33(1):35–43, 2009.
 [31] Qingsheng Yang, Xia Li, and Xun Shi. Cellular automata for simulating land use changes based on support vector machines. Computers & geosciences, 34(6):592–602, 2008.