Introduction to Gravitational Clustering
Abstract
The downfall of many supervised learning algorithms, such as neural networks, is their inherent need for a large amount of training data [1]. Although there is a lot of buzz about big data, the problem of classifying from a small dataset remains. Other methods such as support vector machines, although capable of dealing with few samples, are inherently binary classifiers [4] and need learning strategies such as One vs All for multiclass classification. With a large number of classes this can become problematic. In this paper we present a novel approach to supervised learning through the method of clustering. Unlike traditional methods such as KMeans [7], Gravitational Clustering does not require the number of clusters to be specified in advance and builds the clusters automatically. Individual samples can be arbitrarily weighted, and the method requires only a few samples while staying resilient to overfitting.
1 Introduction
The name of this algorithm is derived from the metaphor that the algorithm was built upon. Each cluster is symbolic of a planet, and each planet has a mass and a radius as well as the class that it represents. Unlike real-life planets, however, our planets are static with respect to one another. The process of training can be thought of as building a universe. The process of predicting is simply placing a mass in the universe and tracing which planet it lands on.
This algorithm exhibits three nice properties:
Ability to learn from a few samples.
Ability to weight the importance of training vectors.
The nature of the algorithm makes it resilient to overfitting.
The ability to weight the importance of training vectors, together with the ability to learn from a few samples, allows us to model a system that supports the notion of prototypes, as described by Eleanor Rosch [5] (p. 41).
2 Definition
Let us start by mathematically defining each of our symbolic structures. The most important structure is our cluster, or planet. We define a planet as containing a dynamic mass, a dynamic radius, a dynamic position and a static class.
Our universe simply consists of a set of planets. The universe also holds a few global constants: the initial radius of a planet that has just been created; the so-called percent step, which represents the amount a test mass moves before the forces on it are recalculated; and the number of steps, or iterations, taken. The distance between planets is calculated with a distance function.
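As a minimal sketch of these definitions, the planet and the universe could be represented as follows. The field names (`mass`, `radius`, `position`, `label`, `initial_radius`, `percent_step`, `iterations`, `distance`), the default values and the choice of Euclidean distance are illustrative assumptions and are not taken from the reference implementation.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Hashable, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance; an assumed default for the distance function."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class Planet:
    mass: float            # dynamic
    radius: float          # dynamic
    position: List[float]  # dynamic: centre of the cluster
    label: Hashable        # static: the class this planet represents


@dataclass
class Universe:
    initial_radius: float = 1.0  # radius of a newly created planet (assumed value)
    percent_step: float = 0.1    # movement per step of a test mass (assumed value)
    iterations: int = 100        # number of simulation steps (assumed value)
    distance: Callable[[Sequence[float], Sequence[float]], float] = euclidean
    planets: List[Planet] = field(default_factory=list)


universe = Universe()
universe.planets.append(
    Planet(mass=1.0, radius=universe.initial_radius, position=[0.0, 0.0], label="a")
)
```

Keeping the constants on the universe rather than on each planet mirrors the description above, where they are global to the whole model.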
3 Training Model
One of the better aspects of the model is its ability to rate your feature vectors. To do so, let us define a hybrid feature vector that carries a mass and a class alongside the feature values.
The mass allows us to rate the value of the feature vector. For example, if you have a probabilistic diagnosis, each feature vector will contain the class of the diagnosis as well as the probability of the diagnosis, represented by the mass. Training is quite simple: each training vector is either merged into an existing planet or added to the universe as a new planet.
When a training vector is merged into a planet, the new position is a weighted sum of the two position vectors with respect to their weights.
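The following is a minimal sketch of one training step, not the reference implementation. It assumes that a sample merges with the nearest planet of its own class whose radius contains it, that the weighted sum is normalized by the total mass, and that masses add on a merge. The rule for updating the radius is not specified here, so the sketch leaves the radius unchanged. The names `train_one` and `initial_radius` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Hashable, List


@dataclass
class Planet:
    mass: float
    radius: float
    position: List[float]
    label: Hashable


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train_one(planets, position, mass, label, initial_radius=1.0):
    """Absorb one weighted sample into the universe (a list of planets)."""
    # Assumption: only planets of the same class that contain the sample qualify.
    candidates = [
        p for p in planets
        if p.label == label and euclidean(p.position, position) <= p.radius
    ]
    if not candidates:
        # No matching planet exists, so the sample becomes a new planet.
        planet = Planet(mass, initial_radius, list(position), label)
        planets.append(planet)
        return planet
    planet = min(candidates, key=lambda p: euclidean(p.position, position))
    total = planet.mass + mass
    # New position: mass-weighted sum of the two position vectors.
    planet.position = [
        (planet.mass * a + mass * b) / total
        for a, b in zip(planet.position, position)
    ]
    planet.mass = total  # assumption: masses accumulate on merge
    return planet


planets = []
train_one(planets, [0.0, 0.0], 1.0, "benign")
train_one(planets, [0.6, 0.0], 3.0, "benign")     # inside radius 1.0: merges
train_one(planets, [5.0, 5.0], 1.0, "malignant")  # no match: new planet
```

In this usage example the heavier second sample pulls the merged planet three quarters of the way towards itself, which is how a high-confidence sample outweighs a low-confidence one.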
3.1 Asymptotic Analysis
Our training simply traverses all of the planets in the universe and computes the distance of each from the training sample. Let N be the number of planets and D the dimensionality of our feature vectors. Assuming that the planet exists, we get $O(ND)$.
Using a KDTree [2] will allow us to train with the average asymptotic of
On the flip side, assuming we have to add the planet:
KDTree [2]
This is the asymptotic cost of adding a single training vector. The final cost of training is this per-vector cost multiplied by the number of samples.
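As a rough worked version of this analysis, assume a plain linear scan over the planets, with $N$ the number of planets, $D$ the dimensionality of the feature vectors and $S$ the number of training samples ($S$ is a symbol introduced here for illustration). Under those assumptions the costs would be:

```latex
\begin{align*}
T_{\text{sample}} &= O(ND)
  && \text{one $D$-dimensional distance per planet} \\
T_{\text{train}} &= O(S \cdot ND)
  && \text{repeated for each of the $S$ samples} \\
T_{\text{train}} &= O(S^{2} D)
  && \text{worst case, since $N \le S$ when every sample becomes a planet}
\end{align*}
```

For comparison, a nearest-neighbour query in a k-d tree [2] costs $O(\log N)$ on average for low-dimensional data, so the scan term would shrink accordingly if such a tree were used.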
3.2 Comparison of Training Times

KMeans  SVM  Decision Trees  

Big O  

Yes  Yes  No  Partial  

Yes  No  No  No  
is synonymous with
is synonymous with
4 Simulation Testing Model
Metaphorically, predicting the class of a new point is equivalent to dropping a piece of mass into the universe and tracing the mass until it collides with a planet. In this metaphor, we assume that the planets are infinitely small and therefore there will be no interference. Our test point is simply defined by its position. Let us first define the normalized directional force vector. Recall from physics that the gravitational force between two planets is
$$F = G \frac{m_1 m_2}{r^2}$$
In our case, we assume that every test point has the same mass, so we can disregard the test mass. We can also remove the constant $G$. Our hybrid force equation per planet is now:
$$F = \frac{m}{r^2}$$
where $m$ is the mass of the planet and $r$ is the distance between the test point and the planet. We define the total normalized force on our test mass by summing the directional forces exerted by all of the planets.
To restate, the percent step is the step taken with respect to the force. The simulation algorithm then repeats the following: compute the total force on the test mass, move the test mass by the percent step along that force, and stop once the test mass lies within the radius of a planet.
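A minimal sketch of such a simulation is given below, under several assumptions that are not confirmed by the text: planets are plain dictionaries, each planet pulls with a vector of mass times displacement divided by squared distance, the test mass moves a fixed length `percent_step` along the normalized total force, and the nearest planet is used as a fallback if no planet is reached within `iterations` steps. The function names are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def total_force(planets, pos):
    """Sum of per-planet pulls m * (p - pos) / r^2, with r the distance to the planet."""
    force = [0.0] * len(pos)
    for p in planets:
        r2 = sum((a - b) ** 2 for a, b in zip(p["position"], pos))
        if r2 == 0.0:
            continue  # test mass sits exactly on the planet centre
        for k, (a, b) in enumerate(zip(p["position"], pos)):
            force[k] += p["mass"] * (a - b) / r2
    return force


def containing_planet(planets, pos):
    """Nearest planet whose radius contains pos, or None."""
    hits = [p for p in planets if euclidean(p["position"], pos) <= p["radius"]]
    return min(hits, key=lambda p: euclidean(p["position"], pos)) if hits else None


def simulate(planets, pos, percent_step=0.1, iterations=200):
    pos = list(pos)
    for _ in range(iterations):
        hit = containing_planet(planets, pos)
        if hit is not None:
            return hit["label"]
        force = total_force(planets, pos)
        norm = math.sqrt(sum(f * f for f in force))
        if norm == 0.0:
            break  # forces cancel exactly; the test mass cannot move
        # Assumption: step a fixed length along the normalized force direction.
        pos = [x + percent_step * f / norm for x, f in zip(pos, force)]
    # Assumption: fall back to the nearest planet if none was reached.
    return min(planets, key=lambda p: euclidean(p["position"], pos))["label"]


planets = [
    {"position": [0.0, 0.0], "mass": 5.0, "radius": 1.0, "label": "a"},
    {"position": [10.0, 0.0], "mass": 1.0, "radius": 1.0, "label": "b"},
]
label_far = simulate(planets, [4.0, 0.0])   # drifts towards the heavier planet
label_near = simulate(planets, [9.5, 0.0])  # starts inside the second planet
```

Keeping the step length smaller than the planet radii, as in this example, prevents the test mass from jumping over a planet between two force evaluations.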
4.1 Asymptotic Analysis of Simulation Testing Model
Let N be the number of planets and D the dimensionality of our feature vector. Calculating the force takes $O(4ND)$.
The 4 comes from the vector arithmetic that needs to be done: one subtraction, one multiplication, one squared distance and one division. The N term comes from the summation. Each step of the simulation then takes $O(4ND + 3D + N)$.
The three additional D terms come from finding the magnitude, multiplying by the force (simultaneously multiplying by the percent step) and the update summation. The additional N comes from finding the planets whose radius contains the current position. We can disregard the final if statement, since it does not directly affect N. Dropping constant factors and lower-order terms, each step costs $O(ND)$.
5 Probabilistic Non-Simulating Model
We propose a different method of computing the class of the test point, without the need for simulation and through purely statistical methods. We first assume that a planet, or cluster, is normally distributed about its center and that the standard deviation is some function of the radius of the planet. Therefore let us define the probability density function.
Now to define our prediction equation:
To account for the fact that different classes have different amounts of planets, we will transform this function into:
We removed the normalization constant because this is a relative measure. The denominator of the fraction is the number of planets per class, which ensures that there is no bias due to classes having different numbers of clusters with varying radii. The mass term is added to ensure that more massive planets have a greater impact on the rating.
Through trial and error we found the best function for was simply .
The asymptotic will simply be
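As a minimal sketch of the probabilistic model described above, the prediction could be computed as follows. The Gaussian is taken over the distance from the planet center, without its normalization constant. The mapping from radius to standard deviation, `sigma_of_radius`, defaults to the identity purely as a placeholder assumption, and the dictionary layout and function name are hypothetical.

```python
import math
from collections import defaultdict


def predict_probabilistic(planets, pos, sigma_of_radius=lambda r: r):
    """Return the class with the highest mass-weighted, count-normalized score."""
    scores = defaultdict(float)
    counts = defaultdict(int)
    for p in planets:
        d2 = sum((a - b) ** 2 for a, b in zip(p["position"], pos))
        sigma = sigma_of_radius(p["radius"])
        # Unnormalized Gaussian density of the distance, scaled by planet mass.
        scores[p["label"]] += p["mass"] * math.exp(-d2 / (2.0 * sigma ** 2))
        counts[p["label"]] += 1
    # Divide by the number of planets per class to remove the count bias.
    return max(scores, key=lambda c: scores[c] / counts[c])


planets = [
    {"position": [0.0, 0.0], "mass": 1.0, "radius": 1.0, "label": "a"},
    {"position": [1.0, 0.0], "mass": 1.0, "radius": 1.0, "label": "a"},
    {"position": [5.0, 5.0], "mass": 1.0, "radius": 1.0, "label": "b"},
]
near_a = predict_probabilistic(planets, [0.5, 0.0])
near_b = predict_probabilistic(planets, [5.0, 4.5])
```

In this sketch each planet is visited once and each visit computes one distance, so the work grows with the number of planets times the dimensionality.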
6 Testing Results
We tested the algorithm on the Wisconsin breast cancer dataset [9] [6]. Below are the results.
Gravitational Clustering 




Simulated Model  89.65%  90.59%  
Probabilistic Model  92.78%  72.41%  
It is interesting to note that the larger the clusters and the smaller the number of clusters, the less accurate the probabilistic model becomes, unless of course the clusters perfectly model the data that they encapsulate.
We continued our testing by comparing against the outputs of some popular out-of-the-box methods. All of the other algorithms were implemented in the scikit-learn library [8]. The datasets we used were the popular Iris dataset [6], the digits dataset [8] and the Olivetti dataset [3].




GC Prob  GC Sim 




Iris  98.41%  96.82%  94.66%  97.33%  96%  
Digits  86.95%  91.04%  98.99%  25.61%  83.85%  
Olivetti  65.5%  77.5%  7.5%  8.5%  99.5%  
To show that our algorithm can handle very few samples, we tested on the same datasets again, but this time we used only one sample per class as the training data. Below are the results.
Accuracy Per Dataset
Dataset   GC Prob  GC Sim
Iris      93.33%   92.00%
Digits    59.96%   58.18%
Olivetti  63.5%    53.75%
7 Conclusion
In this paper we introduced a novel technique for clustering and supervised learning that can learn from a few samples while maintaining a low asymptotic runtime and inherently allowing for arbitrary sample weighting. We compared it to current classification techniques and showed both the strengths and the weaknesses of the algorithm. From the test results we can infer that our algorithm behaves consistently on both low- and high-dimensional data, and stays consistent across a range of multiclass datasets. All of the code written, including the tests and the algorithm itself, can be found at https://github.com/ArmenAg/GravitationalClustering/
Thank you for reading.
Armen Aghajanyan was born in Yerevan, Armenia in 1997. He is currently a junior at Interlake High School in Washington State. He began programming at the age of 12, and just two years ago he became obsessed with the field of Machine Learning. He currently holds a job at American Choice Modeling as an Artificial Intelligence and Machine Learning developer.
References
[1] Benediktsson, J. A., Swain, P. H., and Ersoy, O. K. (1993). Conjugate-gradient neural networks in classification of multisource and very-high-dimensional remote sensing data. International Journal of Remote Sensing, 14(15):2883–2903.
[2] Bentley, J. L. (1975). Multidimensional binary search trees used for associative searching. Commun. ACM, 18(9):509–517.
[3] Bevilacqua, V., Mastronardi, G., Pedone, A., Romanazzi, G., and Daleno, D. (2006). Hidden Markov models for recognition using artificial neural networks. 4113:126–134.
[4] Cortes, C. and Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3):273–297.
[5] Lakoff, G. (1987). Women, Fire, and Dangerous Things. The University of Chicago Press.
[6] Lichman, M. (2013). UCI machine learning repository.
[7] MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. pages 281–297.
[8] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.
[9] Wolberg, W. and Mangasarian, O. (1990). Multisurface method of pattern separation for medical diagnosis applied to breast cytology. pages 9193–9196.