ChaosNet: A Chaos-based Artificial Neural Network Architecture for Classification
Abstract
Inspired by the chaotic firing of neurons in the brain, we propose ChaosNet, a novel chaos-based artificial neural network architecture for classification tasks. ChaosNet is built using layers of neurons, each of which is a 1D chaotic map known as the Generalized Luröth Series (GLS), which has been shown in earlier works to possess very useful properties for compression, cryptography, and for computing XOR and other logical operations. In this work, we design a novel learning algorithm on ChaosNet that exploits the topological transitivity property of the chaotic GLS neurons. The proposed learning algorithm gives consistently good performance accuracy in a number of classification tasks on well-known publicly available datasets with very limited training samples. Even with very few training samples per class (amounting to less than 0.05% of the total available data), ChaosNet yields competitive performance accuracies. We demonstrate the robustness of ChaosNet to additive parameter noise and also provide an example implementation of a two-layer ChaosNet for enhancing classification accuracy. We envisage the development of several other novel learning algorithms on ChaosNet in the near future.
Keywords: Generalized Luröth Series, chaos, universal approximation theorem, topological transitivity, classification, artificial neural networks
Chaos has been empirically found in the brain at several spatiotemporal scales [1, 2]. In fact, individual neurons in the brain are known to exhibit chaotic bursting activity, and several neuronal models such as the Hindmarsh-Rose neuron model exhibit complex chaotic dynamics [3]. Though Artificial Neural Networks (ANNs) such as Recurrent Neural Networks exhibit chaos, to our knowledge there have been no successful attempts to build an ANN for classification tasks that is entirely comprised of neurons which are individually chaotic. Building on our earlier research, in this work we propose ChaosNet, an ANN built out of neurons, each of which is a 1D chaotic map known as the Generalized Luröth Series (GLS). GLS has been shown to have salient properties such as the ability to encode and decode information losslessly with Shannon optimality, computing logical operations (XOR, AND, etc.), the universal approximation property, and ergodicity (mixing) for cryptography applications. In this work, ChaosNet exploits the topological transitivity property of chaotic GLS neurons for classification tasks with state-of-the-art accuracies in the low training sample regime. This work, inspired by the chaotic nature of neurons in the brain, demonstrates the unreasonable effectiveness of chaos and its properties for machine learning. It also paves the way for designing and implementing other novel learning algorithms on the ChaosNet architecture.
1 Introduction
With the success of Artificial Intelligence, learning through algorithms such as Machine Learning (ML) and Deep Learning (DL) has become an area of intense activity and popularity, with applications reaching almost every field known to humanity. These include medical diagnosis [4], computer vision, cybersecurity [5], natural language processing, and speech processing [6], just to name a few. These algorithms, though inspired by the biological brain, are only remotely related to the biological processes of learning and memory encoding. The learning procedures used in these artificial neural networks (ANNs) to modify weights and biases are based on optimization techniques and the minimization of loss/error functions. Present-day ANNs use an enormous number of hyperparameters which are fixed by an ad hoc procedure for improving prediction as more and more new data is input into the system. The synaptic changes employed are solely data-driven and have little or no rigorous theoretical backing [7, 8]. Furthermore, for accurate prediction/classification, these methods require enormous amounts of training data that capture the distribution of the target classes.
Despite their tremendous success, ANNs are nowhere close to the human mind in accomplishing tasks such as natural language processing. To incorporate the excellent learning abilities of the human brain, as well as to understand the brain better, researchers are now focusing on developing biologically inspired algorithms and architectures. This is being done both in the context of learning [9] and memory encoding [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20].
One of the most interesting properties of the brain is its ability to exhibit chaos [21], the phenomenon of complex, unpredictable and random-like behaviour arising from simple deterministic nonlinear systems. (Deterministic chaos is characterized by the Butterfly Effect: sensitive dependence of behaviour on minute changes in initial conditions.) The dynamics of electroencephalogram (EEG) signals is known to be chaotic [2]. The sensitivity to small shifts in the internal functional parameters of a neuronal system helps it produce the desired response to different influences. This attribute resembles the dynamical properties of chaotic systems [22, 23, 24]. Moreover, it is seen that the brain may not reach a state of equilibrium after a transient, but is constantly alternating between different states. For this reason, it is suggested that, with changes in the functional parameters of its neurons, the brain is able to exhibit different behaviours (periodic orbits, weak chaos and strong chaos) for different purposes. For example, there is evidence to suggest that weak chaos may be good for learning [25] and that periodic activity in the brain is useful for attention-related tasks [26]. Thus, chaotic regimes exhibiting a wide variety of behaviours help the brain adapt quickly to changing conditions.
Chaotic behaviour is exhibited not only by brain networks composed of billions of neurons; the dynamics at the neuronal level (cellular and subcellular) are also chaotic [2]. The impulse trains produced by neurons are responsible for the transmission and storage of information in the brain. These impulses or action potentials are generated when different ions cross the axonal membrane, causing a change in the voltage across it. Hodgkin and Huxley were the first to propose a dynamical systems model of the interaction between the ion channels and the axon membrane that is capable of generating realistic action potentials [27]. Later, simplified versions such as the Hindmarsh-Rose model [28] and the FitzHugh-Nagumo model [29, 30] were proposed. All these models exhibit chaotic behaviour.
Although some artificial neural networks display chaotic dynamics (an example is Recurrent Neural Networks [31]), to the best of our knowledge none of the architectures proposed for classification tasks to date exhibits chaos at the level of individual neurons. However, for a theoretical explanation of memory encoding in the brain, many chaotic neuron models have been suggested. These include the Aihara model [10], which has been utilized for memory encoding in unstable periodic orbits of the network [11]. Freeman, Kozma and their group have developed chaotic models inspired by the mammalian olfactory network to explain the process of memorizing odors [12, 13, 14]. Chaotic neural networks have also been studied by Tsuda et al. for their functional roles as short-term memory generators as well as dynamic links for long-term memory [15, 16]. Kaneko has explored the dynamical properties of globally coupled chaotic maps, suggesting possible biological information processing capabilities of these networks [17, 18]. Our group (two of the authors) has also proposed a biologically-inspired network architecture with chaotic neurons which is capable of memory encoding [19].
In this work, we propose ChaosNet, an ANN built out of the 1D chaotic map Generalized Luröth Series (GLS) as its individual neurons. This network can accomplish classification tasks by learning with limited training samples. ChaosNet is developed as an attempt to use some of the best properties of biological neural networks arising from the rich chaotic behaviour of individual neurons, and is shown to accomplish challenging classification tasks comparably to or better than conventional ANNs while requiring far fewer training samples.
The choice of 1D maps as neurons in ChaosNet keeps the processing simple while exploiting the useful properties of chaos. Our group (two of the authors) has discussed the use of the topological transitivity property of these GLS neurons for classification [32]. Building on this initial work, in the current manuscript we propose a novel and improved version of the topological transitivity scheme for classification. This improved scheme, proposed for the very first time in this paper, utilizes a 'spike-count rate'-like property of the firing of chaotic neurons as a neural code for learning and is inspired by biological neurons. Moreover, the network is capable of a hierarchical architecture which can integrate information as it is passed on to higher levels (deeper layers in the network). The proposed classification scheme is rigorously tested on publicly available datasets: MNIST, KDDCup'99, Exoplanet and Iris.
Current state-of-the-art algorithms in the AI community rely heavily on the availability of enormous amounts of training data. However, there are several practical scenarios where huge amounts of training data may not be available [33]. Learning from limited samples plays a key role especially in a field like cyber security, where new malware attacks occur frequently. For example, detecting a zero-day malware attack demands that algorithms learn from fewer data samples. Our proposed ChaosNet architecture addresses this issue.
The paper is organized as follows. The GLS-neuron and its properties are described in Section II. In Section III, we introduce the single-layer ChaosNet architecture and its application to the topological transitivity-symbolic sequence based classification algorithm. Experiments, results and discussion, including a parameter noise analysis of the single-layer TTSS classification algorithm, are dealt with in Section IV. The multilayer ChaosNet architecture is introduced in Section V. We conclude with future research directions in Section VI.
2 GLS-Neuron and its properties
The neuron we propose is a piecewise linear 1D chaotic map known as the Generalized Luröth Series or GLS [34]. The well-known Tent map, Binary map and their skewed cousins are all examples of GLS. Mathematically, the types of GLS neurons we use in this work are described below.
2.1 GLS-Neuron types: skew-binary and skew-tent

$$T_{Skew\text{-}Binary}(x) = \begin{cases} \dfrac{x}{b}, & 0 \le x < b,\\ \dfrac{x-b}{1-b}, & b \le x < 1, \end{cases} \qquad T_{Skew\text{-}Tent}(x) = \begin{cases} \dfrac{x}{b}, & 0 \le x < b,\\ \dfrac{1-x}{1-b}, & b \le x < 1, \end{cases}$$

where $x \in [0,1)$ and $0 < b < 1$. Refer to Figure 1(b).
The symbols L (or 0) and R (or 1) are associated with the intervals $[0, b)$ and $[b, 1)$ respectively, thereby defining the symbolic sequence of every trajectory starting from an initial value on the map. (The symbolic sequence of a trajectory is obtained by recording, at each iteration, the symbol of the sub-interval that contains the current point of the trajectory.) We enumerate some salient properties of the GLS-neuron:

- Each GLS-neuron has two parameters: an initial activity value $q$ and a skew value $b$. The parameter $b$ defines the generating Markov partition for the map (a splitting of the state space into a complete set of disjoint regions that covers all of the state space and enables a one-to-one correspondence between trajectories and itinerary sequences of the symbols L and R without losing any information [35]) and also acts as the internal discrimination threshold of the neuron, which will be used for feature extraction.

- A GLS-neuron can fire either chaotically or in a periodic fashion (of any finite period length) depending on the initial activity value $q$. The degree of chaos is controlled by the skew parameter $b$. The Lyapunov exponent of the map [36] is given by $\lambda = -b\log(b) - (1-b)\log(1-b)$. If the base of the logarithm is chosen as 2, then $\lambda$ (in bits/iteration) equals the Shannon entropy (in bits/symbol) of the symbolic sequence of the trajectory obtained by iterating the initial neural activity. For $b = 0.5$, $\lambda = 1$ bit/iteration.

- Any finite-length input stream of bits can be losslessly compressed as an initial activity value on the neuron by performing a backward iteration on the map. Further, such a lossless compression scheme has previously been shown to be Shannon optimal [37].

- Error detection properties can be incorporated into the GLS-neuron to counter the effect of noise [38].

- Recently, two of the authors of the present work have proposed a compression-based neural architecture for memory encoding and decoding using GLS [19].

- The GLS-neuron has the property of topological transitivity (defined in a later section), which we will employ to perform classification.

- A single layer of a finite number of GLS-neurons satisfies a version of the Universal Approximation Theorem, which we shall prove in a subsequent section of this paper.
The aforementioned properties make GLS-neurons an ideal choice for building our novel architecture for classification.
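To make these properties concrete, the following is a minimal Python sketch of a GLS neuron realized as a skew tent map, together with its symbolic sequence and Lyapunov exponent. The function names, and the choice of the skew tent map as the illustrative GLS, are our assumptions for illustration only.

```python
import math

def skew_tent(x, b):
    """Skew tent map T: [0,1) -> [0,1) with skew parameter b (one example of a GLS map)."""
    return x / b if x < b else (1.0 - x) / (1.0 - b)

def trajectory(q, b, n, gls_map=skew_tent):
    """Iterate the map n times starting from the initial neural activity q."""
    traj = [q]
    for _ in range(n):
        traj.append(gls_map(traj[-1], b))
    return traj

def symbolic_sequence(traj, b):
    """Record symbol 'L' when the activity lies in [0, b), else 'R'."""
    return ''.join('L' if x < b else 'R' for x in traj)

def lyapunov_bits(b):
    """Lyapunov exponent of the skew map in bits/iteration:
    the binary Shannon entropy of the skew parameter b."""
    return -b * math.log2(b) - (1 - b) * math.log2(1 - b)
```

For $b = 0.5$ the exponent is exactly 1 bit/iteration, matching the fully chaotic (unskewed) case noted above.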
3 ChaosNet: The Proposed Architecture
In this section, we first introduce the novel ChaosNet architecture, followed by a description of the single-layer Topological Transitivity-Symbolic Sequence classification algorithm. We discuss the key principles behind this algorithm, its parameters and hyperparameters, an illustrative example, and a proof of the Universal Approximation Theorem.
3.1 Single-layer ChaosNet Topological Transitivity-Symbolic Sequence (TTSS) based Classification Algorithm
We propose, for the first time, a single-layer chaos-inspired neuronal architecture for solving classification problems (Figure 2). It consists of a single input and a single output layer. The input layer consists of GLS neurons and extracts patterns from each sample of an input data instance. The nodes in the output layer store the representation vectors corresponding to each of the target classes of the classification problem. The entire input data is represented as a matrix X whose rows are the data instances and whose columns are the samples of each data instance. When the input data consists of images, we vectorize each image, and each row of X is then an instance of a vectorized image. The number of neurons in the input layer of ChaosNet is set equal to the number of samples of a data instance.

The GLS neurons in the architecture in Figure 2 each have an initial neural activity $q$. This is also the initial value of the chaotic map. A GLS neuron starts firing chaotically when it encounters a stimulus. A stimulus is a real number normalized to lie between 0 and 1; the samples of a data instance, after normalization, are the stimuli to the corresponding GLS neurons. The chaotic neural activity value of each GLS neuron at time $t$ evolves as
$$A_i(t+1) = T(A_i(t)), \qquad A_i(0) = q, \qquad (1)$$

where $A_i(t)$ denotes the activity of the $i$th GLS neuron at time $t$ and $T$ is the GLS map.
The chaotic firing of each GLS neuron stops when its activity value, starting from the initial neural activity $q$, reaches the epsilon neighbourhood of the corresponding stimulus. The time at which each neuron stops firing can thus be different. The halting of the firing of each GLS neuron is guaranteed by the Topological Transitivity (TT) property. The time taken for the firing of the $i$th GLS neuron to reach the epsilon neighbourhood of the incoming stimulus is termed the firing time. The fraction of this firing time for which the activity of the GLS neuron is greater than the discrimination threshold $b$ is defined as the Topological Transitivity-Symbolic Sequence (TTSS) feature. The formal definition of topological transitivity is as follows:
Definition 1: The topological transitivity property for a map $T: X \to X$ states that for every pair of non-empty open sets $U$ and $V$ in $X$, there exists an element $x \in U$ and a non-negative finite integer $n$ such that $T^n(x) \in V$.
For example, consider a chaotic GLS map $T: [0,1) \to [0,1)$, and let $U$ and $V$ be non-empty open subsets of $[0,1)$. From the definition of topological transitivity, the existence of an integer $n$ and a real number $x \in U$ such that $T^n(x) \in V$ is ensured. We take $x = q$, the initial neural activity of the GLS-neuron, and $V$ as the epsilon neighbourhood of the (normalized) stimulus to the GLS neuron. Thus, such an $n$ will always exist. It is important to highlight that for certain values of $q$ there may be no such $n$, e.g., initial values that lead to periodic orbits. However, we can always find a value of $q$ for which $n$ exists. This is because, for a chaotic map, there are an infinite number of initial values that lead to non-periodic ergodic orbits.
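The role of topological transitivity in halting the firing can be sketched numerically: iterate the neuron's activity from $q$ until it enters the epsilon neighbourhood of the stimulus. This is an illustrative sketch; the skew tent map and the slightly perturbed skew $b = 0.4999$ (which avoids the finite-precision collapse of $b = 0.5$ in binary floating point) are our assumptions, not the paper's settings.

```python
def skew_tent(x, b):
    """Skew tent map on [0, 1): one example of a GLS map."""
    return x / b if x < b else (1.0 - x) / (1.0 - b)

def firing_time(q, stimulus, b, eps, max_iter=100_000):
    """Iterate from the initial activity q until the trajectory enters the
    eps-neighbourhood of the normalized stimulus; return (n, trajectory).
    Topological transitivity guarantees such an n exists for suitable q;
    max_iter guards against the (measure-zero) periodic initial values."""
    x, traj = q, [q]
    for n in range(max_iter):
        if abs(x - stimulus) < eps:
            return n, traj
        x = skew_tent(x, b)
        traj.append(x)
    raise RuntimeError("no visit to the neighbourhood; choose a different q")
```

The returned firing time generally differs from neuron to neuron, since each stimulus defines a different target neighbourhood for the same chaotic orbit.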
The single-layer ChaosNet TTSS algorithm consists mainly of three steps:

1. Feature extraction using TTSS: The TTSS-based feature extraction step is represented in the flowchart provided in Figure 4. For each (normalized) stimulus, an epsilon neighbourhood around the stimulus is fixed. Upon encountering the stimulus, the GLS neuron fires chaotically from its initial activity $q$; its firing trajectory is the sequence of activity values obtained by iterating the map, and the firing stops when the trajectory first enters the epsilon neighbourhood of the stimulus. Let $N_i$ denote the firing time of the $i$th GLS neuron. The fraction of the firing time for which the neuron's chaotic activity value is greater than the internal discrimination threshold $b$ is defined as the Topological Transitivity-Symbolic Sequence (TTSS) feature:

$$p_i = \frac{t_i}{N_i}, \qquad (2)$$

where $t_i$ represents the duration of firing for which the chaotic trajectory is above the discrimination threshold $b$ for the $i$th GLS neuron (see Figure 3). The TTSS feature can be viewed as a spike-count-rate-based neural code [40] for the GLS-neuron, which is active (or firing) for a total of $N_i$ time units, of which the spiking activity (activity greater than the threshold) is limited to $t_i$ time units.
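A single neuron's TTSS feature, the fraction defined in Eq. (2), can be sketched as follows. The skew tent map and the parameter values used in the example call are illustrative assumptions, not the paper's settings.

```python
def skew_tent(x, b):
    """Skew tent map on [0, 1): one example of a GLS map."""
    return x / b if x < b else (1.0 - x) / (1.0 - b)

def ttss_feature(stimulus, q, b, eps, max_iter=100_000):
    """Fraction of the firing time for which the chaotic activity exceeds
    the discrimination threshold b (the TTSS feature of Eq. (2))."""
    x, n, above = q, 0, 0
    while abs(x - stimulus) >= eps:
        if x > b:
            above += 1          # one 'spike': activity above the threshold
        x = skew_tent(x, b)
        n += 1                  # one unit of firing time
        if n > max_iter:
            raise RuntimeError("neuron did not halt; choose a different q")
    return above / n if n else 0.0
```

If the initial activity already lies inside the neighbourhood, the firing time is zero and the feature is defined as 0.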

2. TTSS Training: The TTSS-based training step is represented in the flowchart provided in Figure 5. Assume a classification problem with $C$ classes, and let the normalized training instances of each class form the matrices $X^1, X^2, \ldots, X^C$ (one instance per row). Training involves extracting features from each $X^c$ using the TTSS method to yield a feature matrix $Y^c$. Since TTSS feature extraction is applied to each stimulus, $Y^c$ has the same size as $X^c$. Once the TTSS-based features are found for the data belonging to the $C$ distinct classes, the average across rows is computed next: the resulting row vectors $M^1, M^2, \ldots, M^C$ are termed the mean representation vectors of the $C$ classes. $M^c$ encodes the average internal representation of all the stimuli corresponding to the $c$th class. As more and more input data are received, the mean representation vectors get updated.
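The per-class averaging of the training step can be sketched in NumPy; the function name is ours, for illustration:

```python
import numpy as np

def mean_representation_vectors(features, labels, num_classes):
    """Average the TTSS feature vectors (rows) of each class to obtain
    one mean representation vector per class (one output row per class)."""
    F = np.asarray(features, dtype=float)
    y = np.asarray(labels)
    return np.stack([F[y == c].mean(axis=0) for c in range(num_classes)])
```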

3. TTSS Testing: The computational steps involved in testing are given in the flowchart provided in Figure 6. Each row of the normalized test data matrix is a test data instance. The TTSS-based feature extraction step is applied to each test data instance to yield a feature vector. Feature extraction is followed by the computation of the cosine similarity of this feature vector independently with each of the $C$ mean representation vectors, where the cosine similarity of two vectors is their scalar product divided by the product of their norms. This gives $C$ scalar values; the index corresponding to the maximum cosine similarity is taken as the predicted label of the test instance. If there is more than one index with the maximum cosine similarity, we take the smallest such index. The above procedure is continued until a unique label is assigned to each test data instance.
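The cosine-similarity decision rule can be sketched as below; note that `np.argmax` returns the first maximum, which matches the smallest-index tie-break described above. The function name is ours, for illustration.

```python
import numpy as np

def classify(test_features, mean_vectors):
    """Predict, for each test feature vector (row), the class whose mean
    representation vector has the highest cosine similarity."""
    F = np.asarray(test_features, dtype=float)
    M = np.asarray(mean_vectors, dtype=float)
    # cosine similarity = scalar product / product of norms
    sims = (F @ M.T) / (np.linalg.norm(F, axis=1, keepdims=True)
                        * np.linalg.norm(M, axis=1))
    return np.argmax(sims, axis=1)   # first maximum -> smallest index on ties
```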
3.2 Parameters vs Hyperparameters
Distinguishing model parameters from model hyperparameters plays a crucial role in machine learning tasks. The model parameters and hyperparameters of the proposed method are as follows:
Model parameters: These are internal parameters which are estimated from the data; they are what the model learns while training. In the case of the single-layer ChaosNet TTSS method, the mean representation vectors are the model parameters learned while training. The model parameters for deep learning (DL) are the weights and biases learnt while training the neural network. In Support Vector Machines (SVM), the support vectors are the parameters. In all these cases, the parameters are learned while training.
Model hyperparameters: These are configurations which are external to the model and typically not estimated from the data. The hyperparameters are tuned for a given classification or predictive modelling task in order to estimate the best model parameters. The hyperparameters of a model are often arrived at by heuristics and hence differ from task to task. In the case of the single-layer ChaosNet TTSS method, the hyperparameters are the initial neural activity ($q$), the internal discrimination threshold ($b$), the $\epsilon$ used in defining the neighbourhood interval of the stimulus, and the chaotic map chosen. In DL, the hyperparameters are the number of hidden layers, the number of neurons in the hidden layers, the learning rate and the activation function chosen. In the case of the K-Nearest Neighbours (KNN) classification algorithm, the number of nearest neighbours ($k$) is a hyperparameter. In SVM, the choice of kernel is a hyperparameter. In Decision Trees, the depth of the tree and the least number of samples required to split an internal node are the hyperparameters. In all these cases the hyperparameters need to be fixed by cross-validation.
3.3 Example to illustrate the single-layer ChaosNet TTSS method
We consider a binary classification problem with two classes and their corresponding class labels. The input dataset is a 4 x 4 matrix: the first two rows are data instances belonging to the first class and the remaining two rows are data instances belonging to the second class. Since each row has 4 samples, the input layer of ChaosNet in this case has 4 GLS neurons. We fix the hyperparameters (the initial neural activity $q$, the internal discrimination threshold $b$, and $\epsilon$) and the chaotic map.
The steps of the single-layer ChaosNet TTSS algorithm are:

Step 1: Normalization of data: The input matrix is normalized to lie in $[0,1]$. (For a non-constant matrix $X$, normalization is achieved by computing $\frac{X - \min(X)}{\max(X) - \min(X)}$ entrywise. A constant matrix is normalized to all ones.)

Step 2: Training (TTSS-based feature extraction): From the normalized matrix, we sequentially pass one row at a time to the input layer of ChaosNet and extract the TTSS features. As an example, the first two rows (pertaining to the first class) are passed to the input layer sequentially. Each GLS neuron in the input layer fires chaotically until its activity reaches the epsilon neighbourhood of its corresponding stimulus. The fraction of the firing time for which the activity exceeds the internal discrimination threshold $b$ is determined for each of the 4 GLS neurons to yield the TTSS feature vector of the first row; the TTSS feature vector of the second row is computed similarly. The mean representation vector of the first class is calculated as the average of these two feature vectors. In a similar fashion, the mean representation vector of the second class is computed from the remaining two rows. Thus, at the end of training, the two mean representation vectors are stored in the two nodes of the output layer.
Step 3: Testing: As an example, let a (normalized) test sample need to be classified. As in the training step, it is first fed to the input layer of ChaosNet having 4 GLS neurons and the neural activity is recorded. Subsequently, a TTSS feature vector is extracted from the neural activity. We compare this feature vector individually with the two internal (mean) representation vectors using the cosine similarity metric. The test sample is assigned to the class for which the cosine similarity is maximum.
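The three steps above can be strung together on synthetic data. Everything below (the skew tent map, the hyperparameter values and the toy matrices) is an illustrative assumption, not the paper's actual example values:

```python
import numpy as np

def skew_tent(x, b):
    """Skew tent map on [0, 1): one example of a GLS map."""
    return x / b if x < b else (1.0 - x) / (1.0 - b)

def ttss_feature(stimulus, q, b, eps, max_iter=100_000):
    """Fraction of firing time with activity above the threshold b."""
    x, n, above = q, 0, 0
    while abs(x - stimulus) >= eps:
        above += x > b
        x = skew_tent(x, b)
        n += 1
        if n > max_iter:
            raise RuntimeError("neuron did not halt")
    return above / n if n else 0.0

def extract(X, q, b, eps):
    """Apply TTSS feature extraction to every sample of every instance."""
    return np.array([[ttss_feature(s, q, b, eps) for s in row] for row in X])

q, b, eps = 0.34, 0.4999, 0.05                  # hypothetical hyperparameters

X_train = np.array([[0.10, 0.20, 0.15, 0.10],   # class 0
                    [0.15, 0.10, 0.20, 0.15],   # class 0
                    [0.80, 0.90, 0.85, 0.80],   # class 1
                    [0.85, 0.80, 0.90, 0.85]])  # class 1
y_train = np.array([0, 0, 1, 1])

# Training: one mean representation vector per class (output-layer nodes)
F = extract(X_train, q, b, eps)
M = np.stack([F[y_train == c].mean(axis=0) for c in (0, 1)])

# Testing: cosine similarity against each mean representation vector
x_test = np.array([[0.15, 0.15, 0.20, 0.10]])
Ft = extract(x_test, q, b, eps)
sims = (Ft @ M.T) / (np.linalg.norm(Ft, axis=1, keepdims=True)
                     * np.linalg.norm(M, axis=1))
pred = int(np.argmax(sims, axis=1)[0])
```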
A single-layer ChaosNet with a finite number of GLS neurons has the ability to approximate any arbitrary real-valued, bounded discrete-time function (with finite support), as we shall show in the next subsection.
3.4 Universal Approximation Theorem (UAT)
Cybenko (in 1989) proved one of the earliest versions of the Universal Approximation Theorem (UAT). UAT states that continuous functions on compact subsets of $\mathbb{R}^n$ can be approximated by an ANN with a single hidden layer having a finite number of neurons with sigmoidal activation functions [41]. Thus, simple neural networks with appropriately chosen parameters can approximate a wide variety of continuous functions. We have recently proven a version of this theorem for the GLS-neuron [19], which we reproduce below (with minor modifications).
Universal Approximation Theorem (UAT) for the GLS-Neuron:

Let us consider a real-valued bounded discrete-time function $g(t)$ with finite support of length $N$ (i.e., $t = 0, 1, \ldots, N-1$). UAT guarantees the existence of a finite set of GLS neurons whose GLS-encoded output values can approximate $g$, where the initial values and the skew parameters of the GLS maps are determined by the construction below. In other words, the UAT for GLS neurons guarantees the existence of a function $\hat{g}$, built from these GLS encodings, satisfying the following:

$$|g(t) - \hat{g}(t)| < \epsilon, \qquad t = 0, 1, \ldots, N-1, \qquad (3)$$

where $\epsilon > 0$ is arbitrarily small.
Proof: By construction. For a given $\epsilon$, the range of the function is uniformly quantized in such a way that the quantized and original function differ by an error of at most $\epsilon$. The boundedness of $g$ ensures that this is always possible, because we can find integers $m_0, m_1, \ldots, m_{N-1}$ corresponding to the time indices which satisfy the following inequality after proper global scaling of $g$:

$$\left| g(t) - \frac{m_t}{K} \right| \leq \epsilon, \qquad t = 0, 1, \ldots, N-1, \qquad (4)$$

where the $m_t$ are all integers and $K$ (a finite real number) denotes the proper global scaling constant, which in turn depends on $\epsilon$. Let us consider only the significant number of bitplanes of the integers $m_t$; denote this number by $B$ (so $2^B$ is the least power of 2 that is just greater than the maximum of the $m_t$).
Let the discrete-time quantized integer-valued signal that approximates $g$ be the sequence of integers $m_t$. The bitplanes of each value of this signal are extracted next to yield $B$ bitstreams, from the Most Significant Bit (MSB) plane to the Least Significant Bit (LSB) plane; each stream of bits is a binary list of length $N$. Back-iteration on an appropriate GLS can encode each binary list losslessly into an initial value (the probability of 0 in the bitstream is represented by the parameter which is also the skew parameter of the map). The above procedure is followed for the $B$ bitstreams to yield $B$ GLS-encoded neurons. The perfect lossless compression property of GLS enables recovery of the original quantized bitstreams using GLS decoding [37]. Each step in our construction uses procedures that are deterministic (and which always halt). This means the composition of several nonlinear maps (quantization, scaling, bitplane extraction and GLS encoding are all nonlinear) can be used to construct the desired approximating function. Thus, there exists a function that satisfies inequality (3).
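The back-iteration encoding step can be sketched as successive interval refinement on the skew binary map (in this sense GLS coding behaves like arithmetic coding [37]). The function names below are illustrative assumptions:

```python
def gls_encode(bits, p):
    """Losslessly encode a binary list as an initial value on the skew
    binary map: back iteration == successive refinement of [0, 1),
    where the sub-interval [0, p) carries symbol 0 (p = prob. of 0)."""
    low, high = 0.0, 1.0
    for s in bits:
        width = high - low
        if s == 0:
            high = low + p * width   # zoom into the '0' sub-interval
        else:
            low = low + p * width    # zoom into the '1' sub-interval
    return (low + high) / 2.0        # any point of the final interval works

def gls_decode(x, p, n):
    """Recover n symbols by forward iteration of the skew binary map."""
    out = []
    for _ in range(n):
        if x < p:
            out.append(0)
            x = x / p
        else:
            out.append(1)
            x = (x - p) / (1.0 - p)
    return out
```

Stacking one such encoded initial value per bitplane is what yields the finite set of GLS neurons used in the construction above.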
The approximating function is not unique, since it depends on the choices made in the construction, and finding an explicit analytical expression for it is not easy (but we know it exists). The above argument can be extended to continuous-valued functions (by sampling) as well as to functions in higher dimensions.
4 Experiments, Results and Discussion
Learning from insufficient training samples is very challenging for ANN/ML/DL algorithms, since they typically depend on learning from vast amounts of data. The efficacy of our proposed ChaosNet TTSS classification method is evaluated on the MNIST, KDDCup'99, Iris and Exoplanet data in the low training sample regime. The descriptions of the datasets used in the analysis of the proposed method are given here.
4.1 Datasets
4.1.1 MNIST
MNIST [42] is a commonly used handwritten digit (0 to 9) image dataset in the ML/DL community. The images have dimensions of 28 pixels x 28 pixels and are stored digitally as 8-bit grayscale images. The training set of the MNIST database consists of 60,000 images, whereas the test set consists of 10,000 images. MNIST is a multiclass classification (10 classes) problem, and the goal is to classify the images into their correct respective classes. For our analysis, we have independently trained with a small number of randomly chosen data samples per class. For each such trial of training, the algorithm is tested with the unseen test images.
4.1.2 KDDCup’99
KDDCup'99 [43] is a benchmark dataset used in the evaluation of intrusion detection systems (IDS). The creation of this dataset is based on the data acquired in the IDS DARPA'98 evaluation program [44]. There are roughly 4,900,000 single connection vectors in the KDDCup'99 training data. The number of features in each connection vector is 41. Each data sample is labeled as either normal or as a specific attack type. We considered 10% of the data samples from the entire KDDCup'99 data. In this 10% of the KDDCup'99 data, there are normal samples and 21 different attack categories. Out of these, we took the following classes for our analysis: back, ipsweep, neptune, normal, portsweep, satan, smurf, teardrop, and warezmaster. Training was done independently with a limited number of data samples per class, chosen randomly from the existing training set. For each trial of training, the algorithm is tested with unseen data.
4.1.3 Iris
The Iris dataset [45] (http://archive.ics.uci.edu/ml/datasets/iris) consists of attributes from 3 types of Iris plants: Setosa, Versicolour and Virginica. The number of attributes used in this dataset is 4: sepal length, sepal width, petal length and petal width (all measured in centimeters). This is a 3-class classification problem with 50 data samples per category. For our analysis, we have independently trained with a small number of randomly chosen data samples per class. For each trial of training, the algorithm is tested with the remaining unseen test data.
4.1.4 Exoplanet
The PHL-EC dataset (The Habitable Exoplanets Catalog: http://phl.upr.edu/hec, combined with stellar data from the Hipparcos catalog [46]) has 68 attributes (of which 55 are continuous valued and 13 are categorical) and a large number of confirmed exoplanets (at the time of writing this paper). Important attributes such as surface temperature, atmospheric type, radius, mass, flux, Earth Similarity Index, escape velocity, orbital velocity, etc. are included in the catalog (with both observed and estimated attributes). From an analysis point of view, this presents interesting challenges [47, 48]. The dataset consists of six classes, of which three were used in our analysis as they are sufficiently large in size. The other three classes were dropped (while training) since they had a very low number of samples. The classes considered are mesoplanet, psychroplanet and non-habitable. Based on their thermal properties, these three classes or types of planets are described as follows:

Mesoplanets: Also known as M-planets, these have mean global surface temperatures in the range 0°C to 50°C, a necessary condition for the survival of complex terrestrial life. Planetary bodies with sizes smaller than Mercury and larger than Ceres fall in this category, and these are generally referred to as Earth-like planets.

Psychroplanets: Planets in this category have mean global surface temperatures in the range -50°C to 0°C. This is much colder than optimal for terrestrial life to sustain.

Non-Habitable: Planets which do not belong to either of the above two categories fall in this category. These planets do not have the thermal properties necessary for sustaining life.
The three remaining classes in the data are thermoplanet, hyperthermoplanet and hypopsychroplanet. However, owing to the highly limited number of samples in each of these classes, we ignore them. While running the classification methods, we consider multiple attributes of the parent sun of the exoplanets, including mass, radius, effective temperature, luminosity, and the limits of the habitable zone. The first step consists of preprocessing the data from PHL-EC. An important challenge in the dataset is that a substantial fraction of the data is missing (a majority being of the attribute P. Max Mass). In order to overcome this, we used the simple method of removing instances with missing data after extracting the appropriate attributes for each experiment, as most of the missing data is from the non-habitable class. We considered another dataset where a subset of attributes (restricted attributes) consisting of Planet Minimum Mass, Mass, Radius, SFlux Minimum, SFlux Mean and SFlux Maximum is used as input. This subset of attributes does not consider surface temperature or any attribute related to surface temperature at all, making the decision boundaries more complicated to decipher. Following this, the ML approaches were used on these preprocessed datasets. The online data source for the current work is available at http://phl.upr.edu/projects/habitableexoplanetscatalog/data/database.
4.2 Results and Discussion
We compare the proposed 1-layer ChaosNet TTSS method with the following ML algorithms: Decision Tree [49], K-Nearest Neighbour (KNN) [50], Support Vector Machine (SVM) [51] and a Deep Learning algorithm (DL, 2 layers) [52]. The parameters used in the ML algorithms are provided in the Appendix. We used the Scikit-learn [53] and Keras [54] packages for the implementation of the ML algorithms and the 2-layer neural network respectively.
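For reference, the Scikit-learn baselines can be run along the following lines. This is a minimal sketch using the Iris dataset and hyperparameters similar to those listed in the Appendix, not the exact experimental script; the split size and random seed are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Illustrative split; the experiments in the text use far smaller training sets.
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.5, random_state=1234, stratify=y
)

baselines = {
    "Decision Tree": DecisionTreeClassifier(criterion="gini", random_state=1234),
    "KNN": KNeighborsClassifier(n_neighbors=3, metric="minkowski", p=2),
    "SVM": SVC(kernel="rbf", gamma="auto", C=1.0),
}
for name, clf in baselines.items():
    clf.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, clf.predict(X_te)))
```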
Table 1: ChaosNet hyperparameter settings for each dataset.

Dataset | Initial Neural Activity | Discrimination Threshold | GLS-Neuron type
MNIST |  |  |
KDDCup’99 |  |  |
Iris |  |  |
Exoplanet | [7] |  |
Exoplanet (with no surface temperature) | [9] |  |
Exoplanet (with restricted attributes) | [11] |  |

[7], [9], [11]: Actual value was .
Performance of ChaosNet TTSS Method on MNIST data: Figure 7 shows the comparative performance of the ChaosNet TTSS method against SVM, Decision Tree, KNN and DL (2-layer) on the MNIST dataset. On MNIST, the ChaosNet TTSS method outperforms classical ML techniques such as SVM, Decision Tree and KNN. The ChaosNet TTSS method gives slightly higher performance than DL up to training with 8 samples per class. As the number of training samples increases beyond 8, DL outperforms the ChaosNet TTSS method.
Performance of ChaosNet TTSS Method on KDDCup’99 data: Figure 8 shows the comparative performance of the ChaosNet TTSS method against SVM, Decision Tree, KNN and DL (2-layer) on the KDDCup’99 dataset. In the low training sample regime for KDDCup’99 data, the ChaosNet TTSS method outperforms the classical ML and DL algorithms, except for one particular number of training samples per class, for which Decision Tree outperforms the ChaosNet TTSS method.
Performance of ChaosNet TTSS Method on Iris data: Figure 9 shows the comparative performance of the ChaosNet TTSS method against SVM, Decision Tree, KNN and DL (2-layer) on the Iris dataset. On Iris data, the ChaosNet TTSS method consistently gives the best results in the low training sample regime.
Performance of ChaosNet TTSS Method on Exoplanet data: Figure 10 shows the comparative performance of the ChaosNet TTSS method against SVM, Decision Tree, KNN and DL (2-layer) on the Exoplanet dataset. We consider three classes from the exoplanet data, namely Non-Habitable, Mesoplanet and Psychroplanet, and use the full set of attributes to explore the classification of exoplanet types. We also consider a scenario where surface temperature (Figure 11) is removed from the set of attributes. This makes the classification problem harder, as the decision boundaries between classes become fuzzy in the absence of surface temperature. Additionally, a restricted set of attributes (Figure 12) is considered, where the direct and indirect influence of surface temperature is mitigated by removing all related attributes from the original full set. This makes habitability classification an incredibly complex task. Even though the literature is replete with possibilities of using both supervised and unsupervised learning methods, the soft margin between the psychroplanet and mesoplanet classes makes discrimination very difficult. This has perhaps resulted in very few published works on automated habitability classification. A sequence of recent explorations by Saha et al. (2018), expanding previous work by Bora et al. [55] on using Machine Learning algorithms to construct and test planetary habitability functions with exoplanet data, raises important questions.
In our study, training is performed over independent trials with very few samples: a small number of randomly chosen data samples per class. The algorithm is then tested on unseen data in each of the independent trials, which probes its efficacy in handling variance in new data. A consistent performance of the ChaosNet TTSS method is observed in the low training sample regime. The ChaosNet TTSS method gives the second highest accuracy when compared to SVM, KNN and DL; the highest performance is given by Decision Tree. Despite the sample bias due to the non-habitable class, we were able to achieve remarkable accuracies with the proposed algorithms without having to resort to undersampling or synthetic data augmentation. Additionally, the performance of ChaosNet TTSS is consistent compared to the other methods used in the analysis.
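The trial protocol described above (a few randomly chosen training samples per class, repeated over independent trials, with testing on all remaining data) can be sketched as below. The `low_sample_trials` helper and the 1-NN stand-in classifier are illustrative assumptions, not the ChaosNet implementation.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

def low_sample_trials(X, y, samples_per_class, n_trials=50, seed=0):
    """Mean/std test accuracy over independent trials with tiny training sets."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_trials):
        # Randomly pick a few training samples from each class.
        train_idx = []
        for c in np.unique(y):
            idx = np.flatnonzero(y == c)
            train_idx.extend(rng.choice(idx, size=samples_per_class, replace=False))
        train_idx = np.array(train_idx)
        # All remaining samples form the (unseen) test set.
        test_mask = np.ones(len(y), dtype=bool)
        test_mask[train_idx] = False
        clf = KNeighborsClassifier(n_neighbors=1)  # stand-in for ChaosNet TTSS
        clf.fit(X[train_idx], y[train_idx])
        accs.append(accuracy_score(y[test_mask], clf.predict(X[test_mask])))
    return float(np.mean(accs)), float(np.std(accs))
```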
4.3 Single layer ChaosNet TTSS algorithm in the presence of Noise
One of the key ways to assess the robustness of an ML algorithm is to test its classification or prediction efficiency in the presence of noise. Robustness needs to be tested under the following scenarios: noisy test data, training data attributes affected by noise, inaccurate training data labels due to the influence of noise, and noise-affected model parameters [12]. Among these, noise-affected model parameters is the most challenging scenario, since such noise can significantly impact the performance of the algorithm. We consider a scenario in which the parameters learned by the model during training are passed through a noisy channel. As an example, we compare the performance of the single-layer ChaosNet TTSS algorithm and the 2-layer neural network (DL architecture) on the Iris data. The parameters of both algorithms are modified by Additive White Gaussian Noise (AWGN) with zero mean and increasing variance. The Iris dataset consists of 3 classes with 4 attributes per data instance. We consider the specific case of training with only a few samples per class.

[12] Hyperparameters are rarely subjected to noise and hence we ignore this scenario. It is always possible to protect the hyperparameters by using strong error correction codes.
Parameter settings for the single-layer TTSS algorithm implemented on ChaosNet for Iris data: Corresponding to the 3 classes of the Iris data, the output layer of ChaosNet consists of 3 nodes which store the mean representation vectors. Since each representation vector contains 4 components (corresponding to the 4 input attributes), the total number of learnable parameters is 3 × 4 = 12. These parameters are passed through a channel corrupted by AWGN with zero mean and increasing standard deviation, which results in a corresponding variation of the Signal-to-Noise Ratio (SNR).
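A test instance is then assigned to the class whose stored mean representation vector it matches best. The sketch below assumes cosine similarity as the matching criterion; the helper name is hypothetical.

```python
import numpy as np

def classify_by_mean_vectors(feature, mean_vectors):
    """Return the index of the class whose mean representation vector is
    most similar to the extracted feature vector (cosine similarity assumed)."""
    sims = [
        np.dot(feature, m) / (np.linalg.norm(feature) * np.linalg.norm(m) + 1e-12)
        for m in mean_vectors
    ]
    return int(np.argmax(sims))
```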
Parameter settings for the 2-layer neural network for Iris data: The 2-layer neural network has 4 nodes in the input layer, 4 neurons in the hidden layer and 3 neurons in the output layer. Thus, the total number of learnable parameters (weights and biases) for this architecture is (4 × 4 + 4) + (4 × 3 + 3) = 35. These parameters are passed through a channel corrupted by AWGN with zero mean and increasing standard deviation, which results in a corresponding variation of the Signal-to-Noise Ratio (SNR).
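The AWGN perturbation of the learned parameters, and the resulting SNR, can be sketched as follows. Defining the SNR as the ratio of mean parameter power to mean noise power (in dB) is an assumption of this sketch.

```python
import numpy as np

def perturb_parameters(params, sigma, rng):
    """Add zero-mean AWGN of standard deviation sigma to learned parameters.

    Returns the noisy parameters and the realized SNR in dB, computed here
    as 10*log10(mean parameter power / mean noise power).
    """
    noise = rng.normal(0.0, sigma, size=params.shape)
    snr_db = 10.0 * np.log10(np.mean(params**2) / np.mean(noise**2))
    return params + noise, snr_db
```

Sweeping `sigma` upward and re-evaluating accuracy with the perturbed parameters yields curves like those in Figures 13 to 16.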
Parameter Noise Analysis: The results corresponding to additive Gaussian parameter noise for the ChaosNet TTSS algorithm and the 2-layer neural network are provided in Figure 13 and Figure 15 respectively. Figure 14 and Figure 16 depict the variation of SNR (dB) with the standard deviation of the AWGN for the two algorithms. Firstly, we observe that the performance of the ChaosNet TTSS algorithm degrades gracefully with increasing noise standard deviation, whereas for the 2-layer neural network there is a sudden and drastic fall in performance. A closer look at the variation of accuracy for different SNRs is provided in Table 2: over a range of SNRs, the accuracy of the 1-layer ChaosNet TTSS method remains unchanged, whereas over the same range the accuracy of the 2-layer neural network drops noticeably. This preliminary parameter noise analysis indicates the better robustness of our method when compared with the 2-layer neural network. However, more extensive analysis will be performed for other datasets and other noise scenarios in the near future.
Table 2: Variation of classification accuracy (%) with SNR for the 1-layer ChaosNet TTSS method and the 2-layer neural network.

1-layer ChaosNet TTSS | 2-layer neural network
95.83 | 95.00
67.50 | 65.00
5 Multilayer ChaosNet TTSS Algorithm
So far we have discussed the TTSS algorithm implemented on the single-layer ChaosNet architecture. In this section, we investigate the addition of hidden layers to the ChaosNet architecture. A three-layer ChaosNet with two hidden layers and one output layer is depicted in Figure 17. It consists of an input layer of GLS neurons and two hidden layers of GLS neurons. Let the neural activity of each input layer GLS neuron be represented by its chaotic trajectory with a corresponding firing time, the largest of these being the maximum firing time. Each GLS neuron in the first hidden layer has its own intrinsic dynamics starting from an initial neural activity (represented by a self connection with a coupling constant) and is potentially connected to every GLS neuron in the input layer, with each such connection in the multilayer ChaosNet architecture carrying its own coupling coefficient.
The output of each GLS neuron of the first hidden layer is obtained by applying the 1D chaotic GLS map to the weighted combination of its own activity and the activities of the input layer GLS neurons it is connected to, with the coupling coefficients serving as the weights (Equation 5). The output of each neuron of the second hidden layer is obtained in the same manner from the outputs of the first hidden layer (Equation 6). From the output of the second hidden layer, the TTSS features are extracted from the GLS neurons and subsequently passed to the output layer for the computation and storage of the mean representation vectors corresponding to the classes.
In the multilayer ChaosNet TTSS method, the self-coupling constants and the inter-layer coupling coefficients are additional hyperparameters. The above algorithm can be extended in a straightforward manner to more than two hidden layers.
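As a rough illustration, one hidden-layer update could look like the following. This is a speculative sketch under two assumptions not fixed by the text above: the GLS neuron is taken to be a skew-tent map, and each hidden neuron is assumed to apply the map to a convex combination of its own activity and the input-layer activities. The exact functional forms in Equations (5) and (6) may differ.

```python
import numpy as np

def gls_map(x, b):
    """Skew-tent GLS map on [0, 1) with skew parameter b (one common GLS variant)."""
    return x / b if x < b else (1.0 - x) / (1.0 - b)

def hidden_layer_step(hidden, inputs, alpha, theta, b):
    """One update of a hidden GLS layer coupled to the input layer.

    hidden : current activities of the hidden GLS neurons
    inputs : current activities of the input GLS neurons
    alpha  : self-coupling constants (one per hidden neuron)
    theta  : coupling matrix, theta[i, j] from input neuron i to hidden neuron j
    b      : skew parameter of the GLS map (assumed common to all neurons here)
    """
    drive = alpha * hidden + inputs @ theta  # weighted combination (assumed form)
    # Clip into the map's domain before applying the chaotic GLS map.
    return np.array([gls_map(x, b) for x in np.clip(drive, 0.0, 1.0 - 1e-12)])
```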
5.1 Multilayer ChaosNet TTSS method for Exoplanet dataset
We have implemented the multilayer ChaosNet TTSS method for the Exoplanet dataset with 2 layers (one hidden layer and one output layer). The output layer consists of 3 nodes, as it is a 3-class classification problem (Mesoplanets, Psychroplanets and Non-Habitable). Every neuron in the first hidden layer is connected to only two neurons in the input layer (except the last neuron, which is connected to only one neuron in the input layer), so the hidden layer contains roughly half as many GLS neurons as the input layer. The output of each GLS neuron in the hidden layer is given by Equation (7); the input value to the last neuron of the first hidden layer is passed through unchanged. The coupling coefficients are additional hyperparameters of this classification task; the initial neural activity, internal discrimination threshold and type of GLS neuron chosen for the Exoplanet classification task are the same as in Table 1 (fourth row).
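The sparse pairing described above (each hidden GLS neuron driven by two consecutive input neurons, with the last input passed through unchanged when their number is odd) can be sketched as follows. The equal weighting of each pair by a single coupling coefficient `omega` and the skew-tent GLS map are assumptions of this sketch, since the exact form of Equation (7) is not reproduced here.

```python
import numpy as np

def pairwise_hidden_layer(inputs, omega, b=0.47):
    """Map input GLS activities to hidden activities by pairing consecutive inputs."""
    def gls_map(x):
        # Skew-tent GLS map on [0, 1) with skew parameter b (assumed variant).
        return x / b if x < b else (1.0 - x) / (1.0 - b)

    hidden = []
    for i in range(0, len(inputs) - 1, 2):
        # Each hidden neuron sees an equally weighted pair of inputs (assumed form).
        drive = np.clip(omega * (inputs[i] + inputs[i + 1]), 0.0, 1.0 - 1e-12)
        hidden.append(gls_map(drive))
    if len(inputs) % 2 == 1:
        hidden.append(inputs[-1])  # last input is passed through unchanged
    return np.array(hidden)
```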
From Figure 18, the 2-layer ChaosNet TTSS method slightly improves the accuracy of the Exoplanet classification task over that of the 1-layer ChaosNet TTSS method for four and higher numbers of training samples per class. There is also a reduction in the dimensionality of the representation vectors (at the cost of an increase in the number of hyperparameters). While these preliminary results are encouraging, more extensive testing of the multilayer ChaosNet TTSS method with fully connected layers (and more than one hidden layer) needs to be performed in the future.
6 Conclusions and Future Research Directions
State-of-the-art performance on classification tasks reported by algorithms in the literature is typically obtained with training-testing splits that devote the bulk of the available data to training. The performance of these algorithms dips considerably as the number of training samples is reduced. ChaosNet (with the 1-layer TTSS method) demonstrates consistently good performance accuracy in the low training sample regime. Even with very few training samples per class (accounting for less than 0.05% of the total available data), ChaosNet yields high performance accuracies.
Future work includes determining optimal hyperparameter settings to further improve accuracy, testing on more datasets and classification tasks, extension to predictive modelling, and incorporating robustness to external noise into GLS neurons. The multilayer ChaosNet architecture presents a number of avenues for further research, such as determining the optimal number of layers, the type of coupling (unidirectional or bidirectional) between layers, homogeneous and heterogeneous layers (successive layers can have neurons with different 1D chaotic maps), coupled map lattices, the use of 2D and higher-dimensional chaotic maps and even flows in the architecture, and exploring properties of chaotic synchronization in such networks.
Highly desirable features such as Shannon-optimal lossless compression, computation of logical operations (XOR, AND, etc.), the universal approximation property and topological transitivity, all stemming from the chaotic nature of GLS neurons, make ChaosNet a potentially attractive ANN architecture for diverse applications (from memory encoding for storage and retrieval purposes to classification tasks). We expect the design and implementation of novel learning algorithms on the ChaosNet architecture in the near future that can efficiently exploit these properties of chaotic GLS neurons.
The code for the proposed ChaosNet architecture (TTSS method) is available at https://github.com/HarikrishnanNB/ChaosNet.
Acknowledgment
Harikrishnan N. B. thanks “The University of Trans-Disciplinary Health Sciences and Technology (TDU)” for permitting this research as part of the PhD programme. Aditi Kathpalia is grateful to the Manipal Academy of Higher Education for permitting this research as a part of the PhD programme. The authors gratefully acknowledge the financial support of Tata Trusts. N. N. dedicates this work to the late Prof. Prabhakar G. Vaidya, who initiated him into the fascinating field of Chaos Theory.
References
 [1] Philippe Faure and Henri Korn. Is there chaos in the brain? i. concepts of nonlinear dynamics and methods of investigation. Comptes Rendus de l’Académie des SciencesSeries IIISciences de la Vie, 324(9):773–793, 2001.
 [2] Henri Korn and Philippe Faure. Is there chaos in the brain? ii. experimental evidence and related models. Comptes rendus biologies, 326(9):787–840, 2003.
 [3] Yinshui Fan and Arun V Holden. Bifurcations, burstings, chaos and crises in the Rose-Hindmarsh model for neuronal activity. Chaos, Solitons & Fractals, 3(4):439–449, 1993.
 [4] Yiming Ding, Jae Ho Sohn, Michael G Kawczynski, Hari Trivedi, Roy Harnish, Nathaniel W Jenkins, Dmytro Lituiev, Timothy P Copeland, Mariam S Aboian, Carina Mari Aparici, et al. A deep learning model to predict a diagnosis of Alzheimer disease by using 18F-FDG PET of the brain. Radiology, 290(2):456–464, 2018.
 [5] NB Harikrishnan, R Vinayakumar, and KP Soman. A machine learning approach towards phishing email detection. In Proceedings of the Anti-Phishing Pilot at ACM International Workshop on Security and Privacy Analytics (IWSPA-AP), CEUR-WS.org, volume 2013, pages 455–468, 2018.
 [6] Alex Graves, Abdelrahman Mohamed, and Geoffrey Hinton. Speech recognition with deep recurrent neural networks. In Acoustics, speech and signal processing (icassp), 2013 ieee international conference on, pages 6645–6649. IEEE, 2013.
 [7] Andrew Michael Saxe, Yamini Bansal, Joel Dapello, Madhu Advani, Artemy Kolchinsky, Brendan Daniel Tracey, and David Daniel Cox. On the information bottleneck theory of deep learning. 2018.
 [8] Naftali Tishby and Noga Zaslavsky. Deep learning and the information bottleneck principle. In Information Theory Workshop (ITW), 2015 IEEE, pages 1–5. IEEE, 2015.
 [9] Charles B Delahunt and J Nathan Kutz. Putting a bug in ML: The moth olfactory network learns to read MNIST. arXiv preprint arXiv:1802.05405, 2018.
 [10] Kazuyuki Aihara, T Takabe, and Masashi Toyoda. Chaotic neural networks. Physics Letters A, 144(6-7):333–340, 1990.
 [11] Nigel Crook and Tjeerd Olde Scheper. A novel chaotic neural network architecture. In ESANN, pages 295–300, 2001.
 [12] Walter J Freeman et al. Mass action in the nervous system, volume 2004. Citeseer, 1975.
 [13] HungJen Chang and Walter J Freeman. Parameter optimization in models of the olfactory neural system. Neural Networks, 9(1):1–14, 1996.
 [14] Robert Kozma and Walter J Freeman. A possible mechanism for intermittent oscillations in the kiii model of dynamic memoriesthe case study of olfaction. In IJCNN’99. International Joint Conference on Neural Networks. Proceedings (Cat. No. 99CH36339), volume 1, pages 52–57. IEEE, 1999.
 [15] Ichiro Tsuda. Dynamic link of memory—chaotic memory map in nonequilibrium neural networks. Neural networks, 5(2):313–326, 1992.
 [16] John S Nicolis and Ichiro Tsuda. Chaotic dynamics of information processing: The “magic number seven plus-minus two” revisited. Bulletin of Mathematical Biology, 47(3):343–365, 1985.
 [17] Kunihiko Kaneko. Lyapunov analysis and information flow in coupled map lattices. Physica D: Nonlinear Phenomena, 23(1-3):436–447, 1986.
 [18] Kunihiko Kaneko. Clustering, coding, switching, hierarchical ordering, and control in a network of chaotic elements. Physica D: Nonlinear Phenomena, 41(2):137–172, 1990.
 [19] Aditi Kathpalia and Nithin Nagaraj. A novel compression based neuronal architecture for memory encoding. In Proceedings of the 20th International Conference on Distributed Computing and Networking, pages 365–370. ACM, 2019.
 [20] Zainab Aram, Sajad Jafari, Jun Ma, Julien C Sprott, Sareh Zendehrouh, and VietThanh Pham. Using chaotic artificial neural networks to model memory in the brain. Communications in Nonlinear Science and Numerical Simulation, 44:449–459, 2017.
 [21] Kathleen T Alligood, Tim D Sauer, and James A Yorke. Chaos. Springer, 1996.
 [22] Agnessa Babloyantz and Carlos Lourenço. Brain chaos and computation. International Journal of Neural Systems, 7(04):461–471, 1996.
 [23] Colin Barras. Mind maths: Brainquakes on the edge of chaos. New Scientist, 217(2903):36, 2013.
 [24] Thomas Elbert, Brigitte Rockstroh, Zbigniew J Kowalik, Manfried Hoke, Mark Molnar, James E Skinner, and Niels Birbaumer. Chaotic brain activity. Electroencephalography and Clinical Neurophysiology/Supplement, 44:441–449, 1995.
 [25] JC Sprott. Is chaos good for learning? Nonlinear dynamics, psychology, and life sciences, 17(2):223–232, 2013.
 [26] G Baghdadi, S Jafari, JC Sprott, F Towhidkhah, and MR Hashemi Golpayegani. A chaotic model of sustaining attention problem in attention deficit disorder. Communications in Nonlinear Science and Numerical Simulation, 20(1):174–185, 2015.
 [27] Hodgkin AL and Huxley AF. A quantitative description of membrane current and its application to conduction and excitation in nerve. Journal of Physiology, 117:500–544, 1952.
 [28] James L Hindmarsh and RM Rose. A model of neuronal bursting using three coupled first order differential equations. Proceedings of the Royal society of London. Series B. Biological sciences, 221(1222):87–102, 1984.
 [29] Richard FitzHugh. Impulses and physiological states in theoretical models of nerve membrane. Biophysical journal, 1(6):445–466, 1961.
 [30] Jinichi Nagumo, Suguru Arimoto, and Shuji Yoshizawa. An active pulse transmission line simulating nerve axon. Proceedings of the IRE, 50(10):2061–2070, 1962.
 [31] A Zerroug, L Terrissa, and A Faure. Chaotic dynamical behavior of recurrent neural network. Annu. Rev. Chaos Theory Bifurc. Dyn. Syst, 4:55–66, 2013.
 [32] Harikrishnan N B and Nithin Nagaraj. A novel chaos theory inspired neuronal architecture. arXiv preprint arXiv:1905.12601, 2019.
 [33] Ralf C Staudemeyer and Christian W Omlin. Extracting salient features for network intrusion detection using machine learning methods. South African computer journal, 52(1):82–96, 2014.
 [34] Karma Dajani and Cor Kraaikamp. Ergodic theory of numbers. Number 29. Cambridge University Press, 2002.
 [35] Nicolás Rubido, Celso Grebogi, and Murilo S Baptista. Entropybased generating markov partitions for complex systems. Chaos: An Interdisciplinary Journal of Nonlinear Science, 28(3):033611, 2018.
 [36] Nithin Nagaraj. Novel applications of chaos theory to coding and cryptography. PhD thesis, NIAS, 2008.
 [37] Nithin Nagaraj, Prabhakar G Vaidya, and Kishor G Bhat. Arithmetic coding as a nonlinear dynamical system. Communications in Nonlinear Science and Numerical Simulation, 14(4):1013–1020, 2009.
 [38] Nithin Nagaraj. Using cantor sets for error detection. PeerJ Computer Science, 5:e171, 2019.
 [39] KwokWo Wong, Qiuzhen Lin, and Jianyong Chen. Simultaneous arithmetic coding and encryption using chaotic maps. IEEE Transactions on Circuits and Systems II: Express Briefs, 57(2):146–150, 2010.
 [40] Wulfram Gerstner, Werner M Kistler, Richard Naud, and Liam Paninski. Neuronal dynamics: From single neurons to networks and models of cognition. Cambridge University Press, 2014.
 [41] George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of control, signals and systems, 2(4):303–314, 1989.
 [42] Yann LeCun and Corinna Cortes. MNIST handwritten digit database. 2010.
 [43] KDD Cup. Data (1999). URL http://www. kdd. org/kddcup/view/kddcup1999/Data, 1999.
 [44] Richard P Lippmann, David J Fried, Isaac Graf, Joshua W Haines, Kristopher R Kendall, David McClung, Dan Weber, Seth E Webster, Dan Wyschogrod, Robert K Cunningham, et al. Evaluating intrusion detection systems: The 1998 darpa offline intrusion detection evaluation. In Proceedings DARPA Information Survivability Conference and Exposition. DISCEX’00, volume 2, pages 12–26. IEEE, 2000.
 [45] Catherine L Blake and Christopher J Merz. UCI repository of machine learning databases, 1998.
 [46] A. M’endez. The night sky of exoplanets. Hipparcos catalog, 2011.
 [47] Snehanshu Saha, Nithin Nagaraj, Archana Mathur, and Rahul Yedida. Evolution of novel activation functions in neural network training with applications to classification of exoplanets. arXiv preprint arXiv:1906.01975, 2019.
 [48] Snehanshu Saha, Suryoday Basak, M Safonova, K Bora, Surbhi Agrawal, Poulami Sarkar, and Jayant Murthy. Theoretical validation of potential habitability via analytical and boosted tree methods: An optimistic study on recently discovered exoplanets. Astronomy and computing, 23:141–150, 2018.
 [49] J. Ross Quinlan. Induction of decision trees. Machine learning, 1(1):81–106, 1986.
 [50] Thomas M Cover, Peter Hart, et al. Nearest neighbor pattern classification. IEEE transactions on information theory, 13(1):21–27, 1967.
 [51] Marti A. Hearst, Susan T Dumais, Edgar Osuna, John Platt, and Bernhard Scholkopf. Support vector machines. IEEE Intelligent Systems and their applications, 13(4):18–28, 1998.
 [52] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. nature, 521(7553):436, 2015.
 [53] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikitlearn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
 [54] François Chollet et al. Keras. https://keras.io, 2015.
 [55] Kakoli Bora, Snehanshu Saha, Surbhi Agrawal, Margarita Safonova, Swati Routh, and Anand Narasimhamurthy. CD-HPF: New habitability score via data analytic modeling. Astronomy and Computing, 17:129–143, 2016.
Appendix
(I). Algorithms for the 1-layer TTSS Method implemented on ChaosNet
(II). Hyperparameters used by machine learning algorithms in Scikit-learn and Keras
The following lists the hyperparameters that we used for the machine learning algorithms to generate the results in the main manuscript.
Decision Tree (MNIST, KDDCup’99, Iris, Exoplanet): class_weight=None, criterion='gini', max_depth=None, max_features=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, presort=False, random_state=1234, splitter='best'

SVM (MNIST, KDDCup’99, Iris, Exoplanet): C=1.0, cache_size=200, class_weight=None, coef0=0.0, decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf', max_iter=-1, probability=False, random_state=None, shrinking=True, tol=0.001, verbose=False

KNN (MNIST): algorithm='auto', leaf_size=30, metric='minkowski', metric_params=None, n_jobs=1, n_neighbors=5, p=2, weights='uniform'

KNN (KDDCup’99): algorithm='auto', leaf_size=30, metric='minkowski', metric_params=None, n_jobs=1, n_neighbors=9, p=2, weights='uniform'

KNN (Iris, Exoplanet): algorithm='auto', leaf_size=30, metric='minkowski', metric_params=None, n_jobs=1, n_neighbors=3, p=2, weights='uniform'

2-layer neural network (MNIST): number of neurons in 1st hidden layer = 784, activation='relu'; number of neurons in output layer = 10, activation='softmax'; loss='categorical_crossentropy', optimizer='adam'

2-layer neural network (KDDCup’99): number of neurons in 1st hidden layer = 41, activation='relu'; number of neurons in output layer = 9, activation='softmax'; loss='categorical_crossentropy', optimizer='adam'

2-layer neural network (Iris): number of neurons in 1st hidden layer = 4, activation='relu'; number of neurons in output layer = 3, activation='softmax'; loss='categorical_crossentropy', optimizer='adam'

2-layer neural network (Exoplanet): number of neurons in 1st hidden layer = 45, activation='relu'; number of neurons in output layer = 3, activation='softmax'; loss='categorical_crossentropy', optimizer='adam'

2-layer neural network (Exoplanet with no surface temperature): number of neurons in 1st hidden layer = 42, activation='relu'; number of neurons in output layer = 3, activation='softmax'; loss='categorical_crossentropy', optimizer='adam'

2-layer neural network (Exoplanet with 6 restricted features): number of neurons in 1st hidden layer = 6, activation='relu'; number of neurons in output layer = 3, activation='softmax'; loss='categorical_crossentropy', optimizer='adam'