SBAF: A New Activation Function for Artificial Neural Net based Habitability Classification


Snehanshu Saha (PES University South Campus), Archana Mathur (Indian Statistical Institute), Kakoli Bora and Surbhi Agrawal (PES University South Campus), Suryoday Basak (University of Texas at Arlington)
Abstract

We explore the efficacy of a novel activation function in Artificial Neural Networks (ANN) for classifying exoplanets into different habitability classes. We call it the Saha-Bora Activation Function (SBAF), as the motivation is derived from a long-standing use of advanced calculus in modeling the habitability score of exoplanets. The function is shown to possess desirable analytical properties and does not appear to suffer from local oscillation problems. The manuscript presents the analytical properties of the activation function and the network architecture built around it.

keywords:
Astroinformatics, Machine Learning, Exoplanets, ANN, Activation Function.
journal: Astronomy and Computing

1 Introduction

For hundreds of years, astronomers and philosophers have considered the possibility that Earth is a very rare case of a planet that harbors life. This view was partly due to the fact that the initial missions exploring our neighbors, Mars and Venus, found no traces of life. However, over the past two decades, exoplanet discoveries have poured in by the hundreds, and the rate at which exoplanets are being discovered is increasing. The inference is that planets around stars are the rule rather than the exception, with the actual number of planets exceeding the number of stars in our galaxy by orders of magnitude. In order to find interesting samples in this massive, ongoing growth of data, a sophisticated pipeline may be developed which can quickly and efficiently classify exoplanets into habitability classes.

The process of discovering exoplanets is rather complex (Bains and Schulze-Makuch, 2016), as exoplanets are small compared to other astronomical objects such as stars, galaxies, and quasars, which can be discovered with greater ease. Very careful analysis of stellar signals is required to detect planetary samples. Methods of detecting exoplanets include radial-velocity-based detection and gravitational lensing, among others. Imaging-based methods of discovery are not yet well developed and remain at a rather controversial stage, but could become more effective with improvements. The data collected are imperfect and sometimes difficult to analyze with certainty. Given the rapid technological improvements and the accumulation of large amounts of data, it is pertinent to explore advanced methods of data analysis to rapidly classify planets into appropriate categories based on their physical characteristics.

There exist different approaches to solving the habitability problem. Explicit score computation (Bora et al., 2016), giving rise to metrics, is one way of addressing the issue. However, habitability is too complex a problem to be equated with Earth-similarity alone (Agrawal et al., 2018). Therefore, model-based evaluations (Saha et al., 2018c) need to be synthesized with feature-based classification (Basak et al., 2018).

Existing work on characterizing exoplanets is based on assigning habitability scores to each planet, which allows for a quantitative comparison to Earth. The Earth Similarity Index, Biological Complexity Index, and Planetary Habitability Index are distance-based metrics that gauge the similarity of a planet to Earth; the Cobb-Douglas Habitability Score (CDHS) (Bora et al., 2016) makes use of econometric modeling to find the similarity of a planet to Earth. Recently, a collaborative effort between Google and NASA resulted in the discovery of two exoplanets. In Saha et al. (2017), an advanced tree-based classifier, the Gradient Boosted Decision Tree, was used to classify Proxima b and planets in the TRAPPIST-1 system. The accuracies were nearly perfect, giving us the basis for exploring other machine classifiers for the task.

The remainder of the paper is organized as follows. A novel activation function for training an artificial neural network (ANN) is introduced, and we discuss its theoretical nuances. In the next section, the backpropagation mechanism with the relevant architecture is described, paving the foundation for ANN-based classification of exoplanets. We conclude by discussing the efficacy of the proposed method.

2 Saha-Bora Activation Function (SBAF) for a Neural Network

Neural networks (Lippmann, 1994), commonly known as Artificial Neural Networks (ANN), are systems of interconnected units organized in layers, which process information signals by responding dynamically to inputs. The layers are arranged so that inputs are fed to the input layer, and the output layer produces the result after the signals have been processed by the neurons of one or more hidden layers. Hidden layers consist of computing neurons connected to the input and output layers through a system of weighted connections. The network has the ability to learn from input patterns: with every input fed to the network, the weights are updated so that the error between the desired and observed outputs is minimized. Hidden-layer neurons are equipped with a special function, called an activation function (Elfwing et al., 2018; Saha et al., 2018a), that triggers them to process and propagate outputs across the network.
A training scheme for ANNs called backpropagation (Younger et al., ) computes the error between the observed and desired outputs and feeds this error back through the network in each cycle, or 'epoch'. The weights are updated correspondingly, and learning (training) of the network proceeds until the error is minimized.
The activation function acts as a functional mapping between inputs and outputs. It allows the network to learn and model complex data such as audio, video, and text. The most popular activation functions are the sigmoid, the hyperbolic tangent, and ReLU.
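For reference, these three standard functions can be written compactly; a minimal NumPy sketch, included only as a baseline against which SBAF is defined below:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def tanh(x):
        return np.tanh(x)

    def relu(x):
        return np.maximum(0.0, x)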

The activation function is as follows:

    y = \frac{1}{1 + k\,x^{\alpha}(1-x)^{1-\alpha}}, \qquad 0 < x < 1, \; 0 \le \alpha \le 1, \; k > 0.    (1)

From the definition of the function, we have:

    k\,x^{\alpha}(1-x)^{1-\alpha} = \frac{1-y}{y}.    (2)

Differentiating Equation 1 with respect to x and substituting Equation 2 in 1,

    \frac{dy}{dx} = y(1-y)\left[\frac{1-\alpha}{1-x} - \frac{\alpha}{x}\right] = \frac{y(1-y)}{x(1-x)}\,(x-\alpha).    (3)

Remark: x is the linear combination of the surface temperature (the input to the NN) and the weights (normalized between 0 and 1), and 1-x is the complement of that; together they explain the perfect discrimination between habitability classes, as explained in TSI (Basak et al., 2018). The motivation for SBAF is derived from this fact about TSI. Using k\,x^{\alpha}(1-x)^{1-\alpha} maximizes the width of the two separating hyperplanes in the SVM used in TSI (see the proof below), as this kernel has a global maximum when x = \alpha. This is equivalent to the CDHS formulation when the CD-HPF is written as y = k\,x^{\alpha}(1-x)^{1-\alpha}, where 0 < \alpha < 1 and k is suitably assumed to be 1 (the CRS condition); the representation ensures a global maximum (maximum width of the separating hyperplanes) under such constraints (Bora et al., 2016; Saha et al., 2017). The new activation function to be used for training a neural network for habitability classification thus possesses an optimum. Evidently, from the graphical simulations below, we observe less flattening of the function, and therefore the formulation should be able to tackle local oscillations more easily than the more commonly used sigmoid function. Moreover, since 0 < x < 1, the variable term k\,x^{\alpha}(1-x)^{1-\alpha} in the denominator of SBAF may be approximated by a first-order polynomial, which may help us circumvent expensive floating-point operations without compromising precision.
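As an illustrative sketch (not a reference implementation from the authors), SBAF and its first derivative, Eqs. (1) and (3), can be written in NumPy; the default parameter values below are placeholders, not values prescribed by the text:

    import numpy as np

    def sbaf(x, k=1.0, alpha=0.5):
        # Eq. (1): y = 1 / (1 + k * x**alpha * (1 - x)**(1 - alpha)).
        # Assumes 0 < x < 1 elementwise, 0 <= alpha <= 1, k > 0.
        kernel = k * np.power(x, alpha) * np.power(1.0 - x, 1.0 - alpha)
        return 1.0 / (1.0 + kernel)

    def sbaf_prime(x, k=1.0, alpha=0.5):
        # Eq. (3): dy/dx = y * (1 - y) * (x - alpha) / (x * (1 - x)).
        y = sbaf(x, k, alpha)
        return y * (1.0 - y) * (x - alpha) / (x * (1.0 - x))

Note that Eq. (1) is defined on 0 < x < 1, so inputs and weights must be normalized to keep every activation argument inside the open unit interval, as the Remark above assumes.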

Figure 1: Surface Plot of SBAF
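Only the caption of Figure 1 survives in this copy. As a hypothetical reconstruction, a surface plot of this kind can be generated as below, assuming the figure shows y over a grid of x and \alpha at fixed k (all parameter values are illustrative):

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(0.01, 0.99, 200)
    alpha = np.linspace(0.0, 1.0, 200)
    X, A = np.meshgrid(x, alpha)
    Y = 1.0 / (1.0 + np.power(X, A) * np.power(1.0 - X, 1.0 - A))  # k = 1

    ax = plt.figure().add_subplot(projection="3d")
    ax.plot_surface(X, A, Y, cmap="viridis")
    ax.set_xlabel("x"); ax.set_ylabel("alpha"); ax.set_zlabel("SBAF(x)")
    plt.show()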

2.1 Existence of Optima: Second order Differentiation of SBAF for Neural Network

From Equation 3,

    \frac{dy}{dx} = \frac{y(1-y)}{x(1-x)}\,(x-\alpha).

Therefore,

    \frac{d^2y}{dx^2} = \frac{d}{dx}\big[y(1-y)\big]\,\frac{x-\alpha}{x(1-x)} + y(1-y)\,\frac{d}{dx}\left[\frac{x-\alpha}{x(1-x)}\right].    (4)

Now, substituting Equation 3 in 4 we get,

    \frac{d^2y}{dx^2} = \frac{y(1-y)}{x^2(1-x)^2}\Big[(1-2y)(x-\alpha)^2 + x(1-x) - (x-\alpha)(1-2x)\Big].

When x = \alpha,

    \frac{d^2y}{dx^2}\bigg|_{x=\alpha} = \frac{y(1-y)}{\alpha(1-\alpha)}.

Clearly, the first derivative vanishes when x = \alpha; it is positive when x > \alpha and negative when x < \alpha (implying the ranges of x over which the function is increasing or decreasing; see Eq. (3)). We need to determine the sign of the second derivative at x = \alpha to ascertain the nature of the optimum (corresponding to the maximum width of the separating hyperplane, ensuring optimal discrimination between habitability classes). Assuming 0 < y < 1, and since the point of optimality x = \alpha by construction lies in (0, 1), the second derivative above is positive. Hence y attains its minimum at x = \alpha, which is precisely the condition under which the kernel k\,x^{\alpha}(1-x)^{1-\alpha} attains its maximum.
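The location and nature of this optimum can be checked numerically; a minimal sketch, assuming the illustrative values k = 1 and alpha = 0.5:

    import numpy as np

    k, alpha = 1.0, 0.5                  # illustrative parameter choices
    x = np.linspace(0.001, 0.999, 99999)
    y = 1.0 / (1.0 + k * x**alpha * (1.0 - x)**(1.0 - alpha))

    print(x[np.argmin(y)])               # ~0.5: y is minimized at x = alpha,
                                         # i.e. the kernel is maximized there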

3 Backpropagation with SBAF

The basic structure of the neural network consists of an input layer, a hidden layer, and an output layer. Let us assume the nodes at the input layer are i_1, i_2, at the hidden layer h_1, h_2, and at the output layer o_1, o_2, with weights w_1, \ldots, w_4 connecting the input layer to the hidden layer and w_5, \ldots, w_8 connecting the hidden layer to the output layer.

3.1 Basic Structure

Goal: to optimize the weights so that the network can learn how to map from inputs to outputs.

3.2 The Forward Pass

Calculate the total input for h_1 and h_2:

    net_{h1} = w_1 i_1 + w_2 i_2, \qquad net_{h2} = w_3 i_1 + w_4 i_2.

Use SBAF to calculate the output for h_1, h_2:

    out_{h1} = \frac{1}{1 + k\, net_{h1}^{\alpha}(1 - net_{h1})^{1-\alpha}}, \qquad out_{h2} = \frac{1}{1 + k\, net_{h2}^{\alpha}(1 - net_{h2})^{1-\alpha}}.

Repeat the process for the output-layer neurons.

The outputs are

    net_{o1} = w_5\, out_{h1} + w_6\, out_{h2}, \qquad out_{o1} = \frac{1}{1 + k\, net_{o1}^{\alpha}(1 - net_{o1})^{1-\alpha}},

and similarly net_{o2}, out_{o2}.

Calculating the errors,

    E_{o1} = \tfrac{1}{2}(target_{o1} - out_{o1})^2, \qquad E_{o2} = \tfrac{1}{2}(target_{o2} - out_{o2})^2, \qquad E_{total} = E_{o1} + E_{o2}.
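A sketch of this forward pass for the 2-2-2 network; the weight names w1 through w8 follow the layout assumed in Section 3.1, and the inputs are presumed normalized so that every net value stays inside (0, 1), as SBAF requires:

    import numpy as np

    def sbaf(x, k=1.0, alpha=0.5):
        return 1.0 / (1.0 + k * x**alpha * (1.0 - x)**(1.0 - alpha))

    def forward(i1, i2, w, k=1.0, alpha=0.5):
        # Hidden layer: total input, then SBAF activation.
        net_h1 = w["w1"] * i1 + w["w2"] * i2
        net_h2 = w["w3"] * i1 + w["w4"] * i2
        out_h1, out_h2 = sbaf(net_h1, k, alpha), sbaf(net_h2, k, alpha)
        # Output layer: same pattern applied to the hidden activations.
        net_o1 = w["w5"] * out_h1 + w["w6"] * out_h2
        net_o2 = w["w7"] * out_h1 + w["w8"] * out_h2
        out_o1, out_o2 = sbaf(net_o1, k, alpha), sbaf(net_o2, k, alpha)
        return net_h1, net_h2, out_h1, out_h2, net_o1, net_o2, out_o1, out_o2

The returned quantities are exactly those consumed by the backward pass described next; the squared-error losses follow as E_o1 = 0.5 * (target_o1 - out_o1)**2 and similarly for E_o2.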

3.3 The Backward Pass

Update the weights so that the actual output is closer to the target output, thereby minimizing the error.

3.3.1 Output Layer

Consider w_5: let us find the gradient of E_{total} with respect to w_5; by the chain rule,

    \frac{\partial E_{total}}{\partial w_5} = \frac{\partial E_{total}}{\partial out_{o1}} \cdot \frac{\partial out_{o1}}{\partial net_{o1}} \cdot \frac{\partial net_{o1}}{\partial w_5}.

Calculate each component on the RHS one by one:

    \frac{\partial E_{total}}{\partial out_{o1}} = -(target_{o1} - out_{o1}).    (5)

Using the SBAF derivative, Eq. (3),

    \frac{\partial out_{o1}}{\partial net_{o1}} = \frac{out_{o1}(1 - out_{o1})}{net_{o1}(1 - net_{o1})}\,(net_{o1} - \alpha).    (6)

Finally,

    \frac{\partial net_{o1}}{\partial w_5} = out_{h1}.    (7)

Putting (5), (6), and (7) together and updating the weight,

    w_5^{+} = w_5 - \eta\,\frac{\partial E_{total}}{\partial w_5},    (8)

where \eta is the learning rate.
Likewise, w_6^{+}, w_7^{+}, and w_8^{+} are computed.
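In code, the output-layer update mirrors Eqs. (5) through (8); a sketch assuming the forward-pass values have been unpacked into the names used below, with eta the learning rate:

    def sbaf_deriv(out, net, alpha=0.5):
        # Eq. (6): d(out)/d(net), expressed through out = SBAF(net).
        return out * (1.0 - out) * (net - alpha) / (net * (1.0 - net))

    dE_dout_o1   = -(target_o1 - out_o1)                    # Eq. (5)
    dout_dnet_o1 = sbaf_deriv(out_o1, net_o1)               # Eq. (6)
    dnet_dw5     = out_h1                                   # Eq. (7)
    w["w5"] -= eta * dE_dout_o1 * dout_dnet_o1 * dnet_dw5   # Eq. (8)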

3.3.2 Hidden Layer

Consider w_1.

We need to find

    \frac{\partial E_{total}}{\partial w_1} = \frac{\partial E_{total}}{\partial out_{h1}} \cdot \frac{\partial out_{h1}}{\partial net_{h1}} \cdot \frac{\partial net_{h1}}{\partial w_1}.

Apparently,

    \frac{\partial E_{total}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial out_{h1}} + \frac{\partial E_{o2}}{\partial out_{h1}}.

The chain rule says,

    \frac{\partial E_{o1}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial net_{o1}} \cdot \frac{\partial net_{o1}}{\partial out_{h1}},    (9)
    \frac{\partial E_{o2}}{\partial out_{h1}} = \frac{\partial E_{o2}}{\partial net_{o2}} \cdot \frac{\partial net_{o2}}{\partial out_{h1}}.    (10)

Computing all the components of equation (9): \partial E_{o1}/\partial net_{o1} = \partial E_{o1}/\partial out_{o1} \cdot \partial out_{o1}/\partial net_{o1}, both factors of which were computed in Eqs. (5) and (6), and \partial net_{o1}/\partial out_{h1} = w_5.

Similarly, computing all the components of (10): \partial E_{o2}/\partial net_{o2} = \partial E_{o2}/\partial out_{o2} \cdot \partial out_{o2}/\partial net_{o2}, and \partial net_{o2}/\partial out_{h1} = w_7.

We know from Eq. (3) that \partial out_{h1}/\partial net_{h1} = \frac{out_{h1}(1 - out_{h1})}{net_{h1}(1 - net_{h1})}\,(net_{h1} - \alpha), and \partial net_{h1}/\partial w_1 = i_1.

Adding up everything,

    w_1^{+} = w_1 - \eta\,\frac{\partial E_{total}}{\partial w_1}.

Likewise, the updates for w_2, w_3, and w_4 can be computed, as shown in the sketch below.
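A sketch of this hidden-layer update, reusing the quantities from the output-layer pass above (all names illustrative); note that the pre-update values of w_5 and w_7 must be used when propagating the error back:

    # Error signals flowing back from each output neuron.
    delta_o1 = dE_dout_o1 * dout_dnet_o1
    delta_o2 = -(target_o2 - out_o2) * sbaf_deriv(out_o2, net_o2)
    dE_dout_h1 = delta_o1 * w["w5"] + delta_o2 * w["w7"]    # Eqs. (9)-(10)

    dout_dnet_h1 = sbaf_deriv(out_h1, net_h1)               # from Eq. (3)
    dnet_dw1     = i1
    w["w1"] -= eta * dE_dout_h1 * dout_dnet_h1 * dnet_dw1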

4 Discussion

  • x is the normalized surface temperature input (scaled between 0 and 1) and 1-x is the complement of that; together they explain the perfect discrimination between habitability classes, as explained in TSI above, and this fact motivates SBAF. Using k\,x^{\alpha}(1-x)^{1-\alpha} maximizes the width of the two separating hyperplanes in the SVM used in TSI (see the proof above), as this kernel has a global maximum when x = \alpha. This is equivalent to the CDHS formulation when the CD-HPF is written as y = k\,x^{\alpha}(1-x)^{1-\alpha}, where 0 < \alpha < 1 and k is suitably assumed to be 1 (the CRS condition); the representation ensures a global maximum (maximum width of the separating hyperplanes) under such constraints (Bora et al., 2016; Saha et al., 2017).

  • The new activation function to be used for training a neural network for habitability classification possesses an optimum. Evidently, from the graphical simulations above, we observe less flattening of the function, and therefore the formulation should be able to tackle local oscillations more easily than the more commonly used sigmoid. Moreover, since 0 < x < 1, the variable term k\,x^{\alpha}(1-x)^{1-\alpha} in the denominator of SBAF may be approximated by a first-order polynomial, which may help us circumvent expensive floating-point operations without compromising precision (a sketch of this linearization follows the list).

  • The optimum is unique in the defined interval: from Eq. (3), dy/dx vanishes only at x = \alpha for 0 < x < 1, so there is a single stationary point. This circumvents the local optima problem.
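As a sketch of the first-order approximation mentioned above (the expansion point x = 0.5 and the parameter values are illustrative assumptions, not taken from the source), the kernel can be linearized and compared against its exact value:

    import numpy as np

    k, alpha = 1.0, 0.55                   # illustrative values
    x = np.linspace(0.3, 0.7, 9)

    exact = k * x**alpha * (1.0 - x)**(1.0 - alpha)
    # First-order Taylor expansion of f(x) = x**alpha * (1-x)**(1-alpha)
    # around x = 0.5: f(0.5) = 0.5 and f'(0.5) = 2*(alpha - 0.5),
    # since f'/f = (alpha - x) / (x*(1-x)).
    approx = k * (0.5 + 2.0 * (alpha - 0.5) * (x - 0.5))

    print(np.max(np.abs(exact - approx)))  # small near the expansion point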

Habitability classification is a complex task. Even though the literature is replete with rich and sophisticated methods using both supervised (Zighed et al., 2010) and unsupervised learning, the soft margin between the psychroplanet and mesoplanet classes makes the task of discrimination incredibly difficult. A sequence of recent explorations by Saha et al., expanding previous work by Bora et al. on using machine learning to construct and test planetary habitability functions with exoplanet data, raises important questions. The 2018 paper (Saha et al., 2017) analyzed the elasticity of the Cobb-Douglas Habitability Score (CDHS) and compared its performance with other machine learning algorithms. The authors demonstrated the robustness of their methods in identifying potentially habitable planets (Saha et al., 2018b) from exoplanet datasets. Given our limited knowledge of exoplanets and habitability, these results and methods provide an important step toward automatically identifying objects of interest from the large datasets of future ground and space observatories. The variable term in SBAF, k\,x^{\alpha}(1-x)^{1-\alpha}, is inspired by a history of modeling such terms as production functions and exploiting optimization principles in production economics (Saha et al., 2016; Ginde et al., 2016; Ginde et al., 2015). Complexities and bias in the data may often necessitate devising classification methods that mitigate class imbalance (Mohanchandra et al., 2015), improve upon the original method (Vapnik and Chervonenkis, 1964; Cortes and Vapnik, 1995), or manipulate confidence intervals (Khaidem et al., 2016). These improvisations led the authors to believe that a general framework for training in the forward and backward passes may turn out to be efficient. This is the primary reason for designing a neural network with a novel activation function. We shall use the architecture to discriminate exoplanetary habitability (Schulze-Makuch and Bains, 2018; Schulze-Makuch et al., 2011; Irwin et al., 2014; Shallue and Vanderburg, 2018; Méndez, 2011, 2018).

References
