sDRN: Stabilized Developmental Resonance Network
Abstract
Online incremental clustering of sequentially incoming data without prior knowledge suffers from changing cluster numbers and tends to fall into local extrema according to given data order. To overcome these limitations, we propose a stabilized developmental resonance network (sDRN). First, we analyze the instability of the conventional choice function during node activation process and design a scalable activation function to make clustering performance stable over all input data scales. Next, we devise three criteria for the node grouping algorithm: distance, intersection over union (IoU) and size criteria. The proposed node grouping algorithm effectively excludes unnecessary clusters from incrementally created clusters, diminishes the performance dependency on vigilance parameters and makes the clustering process robust. To verify the performance of the proposed sDRN model, comparative studies are conducted on six realworld datasets whose statistical characteristics are distinctive. The comparative studies demonstrate the proposed sDRN outperforms baselines in terms of stability and accuracy.
I Introduction
Clustering, one of unsupervised learning algorithms, aims to group data instances into a number of categories. Clustering algorithms allow the analysis of data characteristics without prior knowledge, which can be applied to memory design [15, 10, 13, 7]. Clustering includes two main types of approaches: 1) batch learning and 2) online learning. The batch learning approaches, whose representative algorithms include means [4] and GMM [14], are straightforward and simple to implement. However, they, in general, require a predefined cluster number from the user and all the training data to be given in advance. These features limit the application of batch learning algorithms in realworld applications where data are observed sequentially and continuously.
On the other hand, online learning approaches can handle the varying number of clusters and incrementally process continuous data. Thus, we focus on developing an effective online incremental clustering algorithm in this paper. Previous online learning approaches such as distance metric learning (DML) [11] and selforganizing incremental neural network (SOINN) [16] memorize all the given input and processing each input instance takes computation. Fusion adaptive resonance theory (ART) [3] and Fuzzy ART [6] networks are efficient in the perspective of computation and memory usage, but they demand inputs to be normalized in the range of [0, 1] and the problem of node proliferation lingers [2]. Developmental resonance network (DRN) [12] has attempted to solve the two limitations, although its remedy for the normalization problem works for a certain range of input and it suffers from an inefficient grouping algorithm which is to solve the node proliferation problem.
To overcome the limitations mentioned above, we propose a stabilized developmental resonance network (sDRN)
Next, we design a node grouping algorithm to alleviate the node proliferation problem. Since DRN and sDRN allow unrestricted input scales, they cannot employ the complement coding scheme to prevent node proliferation and a node grouping algorithm for inhibiting node proliferation is essential. Three criteria, distance, intersection over union (IoU) and size criteria, are devised for the node grouping algorithm to effectively exclude unnecessary clusters from incrementally created clusters. In particular, we define and formulate the concept of IoU criterion for the node grouping algorithm. With the proposed IoU criterion, the node grouping algorithm becomes both scalable and stable in that the performance dependency on the vigilance parameter decreases. The proposed node grouping algorithm of sDRN is computationally more efficient than that of DRN and sDRN displays more effective clustering performance than conventional methods due to the proposed node grouping algorithm.
The remainder of this paper is structured as follows. Section II summarizes DRN as a preliminary. Section III proposes the sDRN model and Section IV presents the experiment results with a thorough analysis. Concluding remarks follow in Section V.
Ii Developmental Resonance Network
In this section, we briefly delineate the computation flow of the DRN model as a preliminary.
Iia Global Weight Update
DRN utilizes a global weight vector (, where is the number of channels and ) to cope with unknown scales of multichannel inputs, which gets updated as follows:
(1) 
where is the th step input of the th channel and is the learning rate of .
IiB Node Activation
The input activates the th node as follows:
(2) 
where is a contribution parameter and is a slope parameter, is the choice function that normalizes the activation value to , and is the distance between and the weight vector .
IiC Template Matching
The template matching process identifies if the node with the largest activation value (say th node) resonates with the activity vector . First, the ratio between the two vectors and for each element ( and is the dimension of the th channel) is calculated using the global diagonal vector of the th channel and the decision diagonal vector of the th node.
Then, the resonance condition is defined as
(3) 
where is a resonance value, = , and is a vigilance parameter.
IiD Template Learning
If the th node has resonated in the template matching process, the weight gets updated by
(4) 
where is the learning rate of the th channel.
IiE Node Selection and Connection
After the node activation process, DRN selects the largest nodes and connects them to improve the efficiency of the following grouping process. connections are created if has updated or connections if has created a new node. Moreover, DRN defines the center point vector for the th node and the concept of synaptic strength that represents the strength of the connection between th and th nodes.
IiF Node Grouping
In the final computation step, DRN iterates through the connected nodes in the order of the synaptic strength. DRN groups a pair of nodes if the two nodes in the pair resonate satisfying the condition in (3) and stops iteration.
Iii sDRN
In this section, we describe the proposed sDRN, summarized as Algorithm 1, with the proposed activation function for scalability and the node grouping algorithm for stability. Moreover, we analyze the computational efficiency of the proposed sDRN.
Iiia Scalability
First, we analyze the normalization problem that remains with (2). For the exponential function to perform as a distance normalization function, it should satisfy the following condition:
(5) 
where is the minimum value a processor supports. (5) reduces to
(6) 
As modern 64bit processors support [5] and represents a commonlyused value set, the distance should be approximately less than 10,000 for the conventional activation function to perform normally. Otherwise, the clustering performance degrades dramatically (Fig. 1)
To overcome the limitation and the normalization problem, we propose a scalable activation function as follows.
(7) 
With the proposed activation function, sDRN can handle all scales of input since is invariably satisfied.
IiiB Node Grouping
We propose a node grouping algorithm to mitigate the performance instability attributed to data input order and the dependency on vigilance parameters. The proposed node grouping process compares the activated cluster with nearby clusters when an input vector arrives and groups a pair if two clusters in the pair satisfy three criteria: distance, IoU and size criteria. The three conditions are examined over all channels and all the channels should satisfy each condition for the examination of the next condition.
For the formulation of the criteria, let and denote a pair of neighboring clusters (Fig. 2). The weight vectors representing each cluster for the th channel are:
(8) 
where is the number of channels and semicolon represents concatenation. We define the distance vector between a pair of clusters as
(9) 
where each element of the vector is defined as
(10) 
where operator chooses the minimum element of a vector. The proposed distance criterion is
(11) 
where operator chooses the maximum element of a vector and is a vigilance parameter for the template matching. Note that in sDRN, is used instead of due to unnecessity of vigilance parameter for each channel.
We propose the IoU criterion since the distance criterion can become loose and combine all the clusters when a low valued vigilance parameter is used. The IoU criterion tests if the hypothetically grouped cluster could encompass the two compared clusters with the least extension. This guarantees the grouped cluster does not occupy uninvestigated feature space substantially. The below represents the hypothetically grouped cluster for the th channel:
(12) 
For each category cluster, we define the volume of the th channel as
(13) 
Next, we define the IoU criterion for the th channel as
(14) 
where determines the final threshold for the grouping process. The range of IoU value is in [0,2] and we set as 0.85.
The size criterion limits the maximum size of a category cluster. Excessively large clusters resulted from node grouping hinder the normal template matching process. Thus, we limit the size of a cluster. The maximum size of the th cluster for the th channel () is limited to , which is congruent to (3).
IiiC Computational Efficiency
The computational complexity of fusion ART on which DRN is based is , where is the number of categories, is the dimension of the input, and is the number of data samples. With its grouping algorithm, the computational complexity of DRN becomes
(15) 
where and are the average numbers of global weight updates and connected category pairs, respectively.
On the other hand, the computational complexity of sDRN is
(16) 
The increase of computation with sDRN is minuscule compared to that of DRN which is .
Iv Experiments
In this section, we illustrate the experiment setting for performance verification and establish the effectiveness of the proposed sDRN model.
Iva Experiment Setting
Datasets
We retrieved six realworld benchmark datasets from the UCI machine learning repository
Metrics
For quantitative analysis, we employed three performance metrics. First, DaviesBouldin index (DBI) [1] estimates the ratio of withincluster scatter to betweencluster separation as follows:
(17) 
where is the cluster number, is the center point of cluster , is the average distance of every element in a cluster to and is the distance between and . The lower value of DBI indicates higher clustering performance.
Next, clustering purity (CP) [9] matches each output cluster to the groundtruth cluster as follows:
(18) 
where is the set of clusters and is the set of groundtruth classes. Since a large number of clusters can bias CP, we complemented CP with normalized mutual information (NMI) [8] which is defined as
(19) 
where is entropy and is mutual information between and . Both CP and NMI lie in the range [0, 1], where a larger value implies higher performance.
Baseline
For comparative studies, we employed three baseline algorithms: means [4], GMM [14] and DRN [12]. means and GMM are two representative batchbased clustering algorithms and the number of clusters should be given in advance. On the other hand, DRN and sDRN are online learning algorithms and the number of clusters increases in an incremental manner.
Implementation Detail
To reduce the effect of randomness, we conducted each experiment 100 times and report the average and the standard deviation of each metric. In addition, each experiment received the data instances in different orders. For means and GMM, we split the datasets into train and test sets with the ratio of 5:5. We set the ratio, which showed the best performance for means and GMM after sweeping the ratio from 1:9 to 9:1. Moreover, we provided means and GMM with the groundtruth cluster numbers.
For DRN and sDRN, we sequentially input data instances and did not provide the groundtruth cluster numbers. We set one vigilance parameter, for both DRN and sDRN. Parameters were obtained using the follow metric:
(20) 
where , and are reciprocals of standard deviations of DBI, CP and NMI, respectively. We swept the vigilance from 0.1 to 0.9 and found the best value (0.7 and 0.5 for DRN and sDRN, respectively) according to (20). We use one vigilance parameter since vigilance parameter cannot be finetuned in the realworld setting. In the realworld setting, no prior knowledge of dataset is given and data instances come sequentially.
IvB Results and Analysis



Algorithm  Balance Scale  Liver Disorder  Blood Transfusion  
DBI  CP  NMI  DBI  CP  NMI  DBI  CP  NMI  
means  1.6059  0.6805  0.1592  1.1229  0.3526  0.1483  0.4588  0.7620  0.0535 
(0.0266)  (0.0198)  (0.0510)  (0.1690)  (0.0176)  (0.0121)  (0.0416)  (0.0146)  (0.0048)  
GMM  1.6562  0.6899  0.1383  1.7940  0.3364  0.1069  0.8218  0.7541  0.0130 
(0.0475)  (0.0265)  (0.0265)  (0.3177)  (0.0025)  (0.0180)  (0.0065)  (0.0004)  (0.0011)  
DRN  1.3065  0.6734  0.1459  1.0624  0.3426  0.0594  0.5851  0.7645  0.0264 
(0.1546)  (0.0425)  (0.0280)  (0.5307)  (0.0037)  (0.0190)  (0.1452)  (0.0029)  (0.0081)  
sDRN  1.0707  0.8137  0.2572  0.6951  0.3642  0.1309  0.4802  0.7679  0.0306 
(0.0622)  (0.0230)  (0.0185)  (0.1266)  (0.0134)  (0.0341)  (0.0609)  (0.0010)  (0.0029)  
Algorithm  Banknote  Car Evaluation  Wholesale Customers  
DBI  CP  NMI  DBI  CP  NMI  DBI  CP  NMI  
means  0.9871  0.7653  0.1976  1.8841  0.6944  0.1742  0.7585  0.5818  0.1447 
(0.0570)  (0.0369)  (0.0560)  (0.0797)  (0.0120)  (0.0368)  (0.1000)  (0.0295)  (0.0141)  
GMM  1.5829  0.7754  0.2861  1.9428  0.6963  0.1492  1.7571  0.5674  0.2075 
(0.3689)  (0.1465)  (0.2059)  (0.2084)  (0.0032)  (0.0374)  (0.3343)  (0.0174)  (0.0182)  
DRN  0.9437  0.6040  0.0525  1.6907  0.7145  0.1404  2.3920  0.5065  0.0698 
(0.2176)  (0.0341)  (0.0459)  (0.2976)  (0.0176)  (0.0397)  (0.8174)  (0.0357)  (0.0556)  
sDRN  0.8734  0.7088  0.1630  1.1625  0.8041  0.2246  0.3720  0.4975  0.0748 
(0.2124)  (0.0633)  (0.0648)  (0.0571)  (0.0179)  (0.0148)  (0.1930)  (0.0170)  (0.0246)  

Table I summarizes the results of comparative studies. sDRN consistently displays superior performance over all six datasets achieving small values for DBI and large values for CP and NMI. We note that sDRN outperforms means and GMM on average although means and GMM were given the groundtruth cluster numbers and half of each dataset was given as a training set. The comparative studies corroborate that sDRN guarantees satisfactory clustering performance in an online incremental manner compared to batchbased clustering algorithms. Moreover, the performance of sDRN surpasses that of DRN over all six datasets, which verifies the effectiveness of the proposed node grouping algorithm.
Particularly, the performance gap between DRN and sDRN is the largest for the wholesale customer dataset. The large input scale of the dataset interrupts DRN’s activation function and its performance deteriorates sharply. The result of the wholesale customer dataset confirms that the proposed activation function truly resolves the normalization problem. Fig. 3 further investigates the effect of input scale on the clustering performance. We tested each algorithm on the liver disorder dataset and varied the scale from to . The effect of input scale on other algorithms including sDRN is insignificant while the performance of DRN gets sensitively affected.
Fig. 4 illustrates the effect of the vigilance parameter on clustering performance for DRN and sDRN. For all six datasets, we varied the vigilance parameter from 0.1 to 0.9 and observed the performance variation in DBI. As the figure exhibits, the clustering performance of sDRN is stable over all vigilance values in all six datasets. However, the clustering performance of DRN strongly depends on the value of the vigilance parameter. For quantitative analysis, we report the averages of standard deviations of DBI scores for DRN and sDRN, which are 0.307 and 0.143, respectively.
V Conclusion
In this paper, we proposed a resonancebased online incremental clustering network, sDRN, which is a stabilized model of DRN. The proposed sDRN model resolves the normalization problem remaining in conventional methods with the proposed activation function. Thus, sDRN can effectively handle all input scales. Moreover, sDRN equipped with the proposed node grouping algorithm becomes robust to variation of vigilance parameter, and the need for finetuning vigilance parameter disappears. In addition, the clustering performance improves with the proposed node grouping algorithm. A thorough examination of sDRN through experiments on six realworld benchmark datasets established the effectiveness of sDRN. We expect sDRN can be applied to various realworld settings where no prior knowledge on sequentially incoming data is given.
InUg Yoon received the M.S. and B.S. degrees in Electrical Engineering from Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea, in 2018 and 2016, respectively. He is currently pursuing the Ph.D. degree at KAIST. His current research interests include anomaly detetion, learning algorithms and computational memory systems. 
UeHwan Kim received the M.S. and B.S. degrees in Electrical Engineering from KAIST, Daejeon, Korea, in 2015 and 2013, respectively. He is currently pursuing the Ph.D. degree at KAIST. His current research interests include visual perception, service robot, cognitive IoT, computational memory systems, and learning algorithms. 
JongHwan Kim (F’09) received the Ph.D. degree in electronics engineering from Seoul National University, Korea, in 1987. Since 1988, he has been with the School of Electrical Engineering, KAIST, Korea, where he is leading the Robot Intelligence Technology Laboratory as KT Endowed Chair Professor. Dr. Kim is the Director for both of KoYoungKAIST AI Joint Research Center and Machine Intelligence and Robotics MultiSponsored Research and Education Platform. His research interests include intelligence technology, machine intelligence learning, and AI robots. He has authored 5 books and 5 edited books, 2 journal special issues and around 400 refereed papers in technical journals and conference proceedings. 
Footnotes
 Source code available at https://github.com/Uehwan/IncrementalLearning
 https://archive.ics.uci.edu/ml/index.php
References
 (2012) Automatic feature selection for bci: an analysis using the daviesbouldin index and extreme learning machines. In The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. Cited by: §IVA2.
 (2000) General fuzzy minmax neural network for clustering and classification. IEEE Transactions on Neural Networks 11 (3), pp. 769–783. Cited by: §I.
 (2013) Adaptive resonance theory: how a brain learns to consciously attend, learn, and recognize a changing world. Neural Networks 37, pp. 1–47. Cited by: §I.
 (2013) Extensions of kmeanstype algorithms: a new clustering framework by integrating intracluster compactness and intercluster separation. IEEE Transactions on Neural Networks and Learning Systems 25 (8), pp. 1433–1446. Cited by: §I, §IVA3.
 (1996) IEEE standard 754 for binary floatingpoint arithmetic. Lecture Notes on the Status of IEEE 754 (947201776), pp. 11. Cited by: §IIIA.
 (2010) The fuzzy art algorithm: a categorization method for supplier evaluation and selection. Expert Systems with Applications 37 (2), pp. 1235–1240. Cited by: §I.
 (Early Access, 2018) A stabilized feedback episodic memory (sfem) and home service provision framework for robot and iot collaboration. IEEE Transactions on Cybernetics. Cited by: §I.
 (2006) Normalized mutual information based registration using kmeans clustering and shading correction. Medical Image Analysis 10 (3), pp. 432–439. Cited by: §IVA2.
 (2019) Clustergan: latent space clustering in generative adversarial networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 4610–4617. Cited by: §IVA2.
 (2017) User preferencebased dualmemory neural model with memory consolidation approach. IEEE Transactions on Neural Networks and Learning Systems 29 (6), pp. 2294–2308. Cited by: §I.
 (2019) Kernelbased distance metric learning for supervised kmeans clustering. IEEE Transactions on Neural Networks and Learning Systems 30 (10), pp. 3084–3095. Cited by: §I.
 (201904) Developmental resonance network. IEEE Transactions on Neural Networks and Learning Systems 30 (4), pp. 1278–1284. External Links: Document, ISSN 2162237X Cited by: §I, §IVA3.
 (2017) Deep art neural model for biologically inspired episodic memory and its application to task performance of robots. IEEE Transactions on Cybernetics 48 (6), pp. 1786–1799. Cited by: §I.
 (2013) Approximating gaussian mixture model or radial basis function network with multilayer perceptron. IEEE Transactions on Neural Networks and Learning Systems 24 (7), pp. 1161–1166. Cited by: §I, §IVA3.
 (2012) Neural modeling of episodic memory: encoding, retrieval, and forgetting. IEEE Transactions on Neural Networks and Learning Systems 23 (10), pp. 1574–1586. Cited by: §I.
 (2019) Online topology learning by a gaussian membershipbased selforganizing incremental neural network.. IEEE Transactions on Neural Networks and Learning Systems. Cited by: §I.