# A Novel Adaptive Kernel for the RBF Neural Networks

###### Abstract

In this paper, we propose a novel adaptive kernel for radial basis function (RBF) neural networks. The proposed kernel adaptively fuses the Euclidean and cosine distance measures to exploit the complementary properties of the two. The proposed framework dynamically adapts the weights of the participating kernels using the gradient descent method, thereby alleviating the need for predetermined weights. The proposed method is shown to outperform the manual fusion of the kernels on three major estimation problems, namely nonlinear system identification, pattern classification and function approximation.

## I Introduction

The RBF neural networks have shown excellent performance in a number of problems of practical interest. In [2] the reservoirs of brine are analyzed for physicochemical properties using the RBF neural networks with the genetic algorithms. The proposed model is called the GA-RBF model and has shown good results compared to the previous approaches. In [3] the RBF kernel is used to predict the pressure gradient with high accuracy. In the context of nuclear physics, RBF has been effectively used to model the stopping power data of materials as in [4]. A comprehensive discussion of various applications can be found in [5].

In recent years, considerable advancement has been made in the field. In [6] two new RBF construction algorithms are proposed with the aim of increasing error convergence rates with fewer computational nodes. The first method extends popular Incremental Extreme Learning Machine algorithms by adding Nelder-Mead simplex optimization. The second algorithm uses the Levenberg-Marquardt algorithm to optimize the positions and heights of the RBFs. The results show better error performance compared to previous research. A new architecture of the optimized RBF neural network classifier is developed with the aid of fuzzy clustering and data preprocessing techniques in [7]. In [8] a bee-inspired algorithm, called cOptBees, is used with heuristics to automatically select the number, locations and dispersions of the basis functions to be used in the RBF networks. The resultant BeeRBF is shown to be competitive and has the advantage of automatically determining the number of centers. To accelerate learning for large-scale data sequences, an incremental learning algorithm is proposed in [9]. The merits of fuzzy and crisp clustering are effectively combined in [10].

In [11] an alternative learning procedure based on orthogonal least squares (OLS) is proposed. In the algorithm, the centers of the RBF are selected one by one in a rational way until an adequate network has been constructed. In [12] a novel RBF network with multi-kernels is proposed to obtain an optimized and flexible regression model. The unknown centers of the multi-kernels are determined by an improved k-means clustering algorithm, and an OLS algorithm is used to determine the remaining parameters. Another learning algorithm proposed in [13] simplifies the neural network training through the use of an adaptive computation algorithm (ACA). The convergence of the ACA is analyzed by the Lyapunov criterion. In [14] a sequential framework, the Meta-Cognitive Radial Basis Function Network (McRBFN), and its Projection Based Learning (PBL) algorithm, together referred to as PBL-McRBFN, are proposed. The PBL-McRBFN is inspired by human meta-cognitive learning principles. The algorithm is evaluated on two practical problems, namely acoustic emission signal classification and mammogram classification for cancer detection. In [15] a non-parametric supervised classifier based on neural networks is proposed and is referred to as the Self Adaptive Growing Neural Network (SAGNN). The SAGNN allows a neural network to adapt its size and structure according to the training data. The performance of the method is evaluated for fault diagnosis and compared with various non-parametric supervised neural networks. A hybrid optimization strategy, named HPSOGA, is proposed in [16] by incorporating the adaptive optimization of particle swarm optimization (PSO) into a genetic algorithm (GA). The strategy is used for automatically determining the parameters of radial basis function neural networks (e.g., the number of neurons and their respective centers and radii).

Essentially, the architecture of RBF networks consists of three layers: (1) an input layer, (2) a nonlinear hidden layer, and (3) a linear output layer (refer to Figure 1). Let $x \in \mathbb{R}^{m}$ be the input vector; then the overall mapping of the RBF network, $s: \mathbb{R}^{m} \rightarrow \mathbb{R}^{1}$, is given as:

$$y = \sum_{i=1}^{N} w_i\, \phi_i(\|x - c_i\|) + b \tag{1}$$

where $N$ is the number of neurons in the hidden layer, $c_i \in \mathbb{R}^{m}$ are the centers of the RBF network, $w_i$ are the synaptic weights connecting the hidden layer to the output neuron, $b$ is the bias term of the output neuron and $\phi_i$ is the basis function of the $i^{th}$ hidden neuron. Without loss of generality and for simplicity, a single output neuron is considered. Conventional RBF networks employ a number of kernels such as multiquadrics, inverse multiquadrics and Gaussian ([17]). The Gaussian kernel, due to its versatility, is considered to be the most commonly used kernel ([18]):

$$\phi_i(\|x - c_i\|) = \exp\left(\frac{-\|x - c_i\|^{2}}{\sigma^{2}}\right) \tag{2}$$

where $\sigma$ is the spread of the Gaussian kernel. In one form or another, the kernels use the concept of a distance measure with respect to the centers of the network. Conventionally, the Euclidean distance has been used as an efficient distance metric. Recently, it has been argued that the cosine distance metric has some complementary properties to offer compared to the Euclidean distance measure ([19]):

$$\phi_i(x \cdot c_i) = \frac{x \cdot c_i}{\|x\|\,\|c_i\| + \gamma} \tag{3}$$

where the term $\gamma$, a very small constant, is added to the denominator to avoid the indeterminate form of (3) in case $\|x\|$ or $\|c_i\|$ is zero. Accordingly, a novel kernel has been proposed to fuse the cosine and Euclidean distances ([19]):

$$\phi_i(x, c_i) = \alpha_1\, \phi_{i1}(x \cdot c_i) + \alpha_2\, \phi_{i2}(\|x - c_i\|) \tag{4}$$

where $\phi_{i1}(x \cdot c_i)$ and $\phi_{i2}(\|x - c_i\|)$ are the cosine and Euclidean kernels respectively, with corresponding fusion weights $\alpha_1$ and $\alpha_2$.

Harnessing the distinctive properties of the cosine and Euclidean kernels, the formulation in (4) has shown good results compared to the conventional Euclidean kernel ([19]). We however argue that the fusion of the two kernels is manual and the weights $\alpha_1$ and $\alpha_2$ are adjusted by trial and error. Without any prior information, a common practice is to assign equal weights to the two kernels, i.e., $\alpha_1 = \alpha_2 = 0.5$. As such, there is no dynamic method of optimizing these weights for a given data set. We therefore propose a novel framework to adaptively optimize the weight assignment using the steepest descent method ([20]).
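The fixed-weight fusion in (4) and the mapping in (1) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the Gaussian scaling `exp(-d^2 / sigma^2)` and the default `gamma` value are assumptions, and the equal weights reflect the manual choice described above.

```python
import math

def gaussian_kernel(x, c, sigma):
    """Euclidean (Gaussian) kernel; the exp(-d^2 / sigma^2) scaling is an assumption."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return math.exp(-sq_dist / sigma ** 2)

def cosine_kernel(x, c, gamma=1e-6):
    """Cosine kernel; gamma (assumed value) keeps the denominator non-zero."""
    dot = sum(xi * ci for xi, ci in zip(x, c))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_c = math.sqrt(sum(ci * ci for ci in c))
    return dot / (norm_x * norm_c + gamma)

def fused_kernel(x, c, alpha1, alpha2, sigma, gamma=1e-6):
    """Weighted sum of the cosine (alpha1) and Euclidean (alpha2) kernels."""
    return alpha1 * cosine_kernel(x, c, gamma) + alpha2 * gaussian_kernel(x, c, sigma)

def rbf_output(x, centers, weights, bias, alpha1=0.5, alpha2=0.5, sigma=1.0):
    """Single-output RBF network: weighted sum of fused kernel responses plus bias."""
    return sum(w * fused_kernel(x, c, alpha1, alpha2, sigma)
               for w, c in zip(weights, centers)) + bias

# Two hidden neurons, manually fused with equal weights.
y = rbf_output([1.0, 0.0], centers=[[1.0, 0.0], [0.0, 1.0]],
               weights=[0.5, -0.5], bias=0.1)
```

With `alpha1=0, alpha2=1` the sketch reduces to a conventional Gaussian RBF network, and with `alpha1=1, alpha2=0` to a purely cosine-based one.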

## II Proposed Method

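We consider $\alpha_1$ and $\alpha_2$ in (4) to be dynamically adaptive variables:

$$\alpha_1(n) = \frac{|\alpha_1(n)|}{|\alpha_1(n)| + |\alpha_2(n)|} \tag{5}$$

$$\alpha_2(n) = \frac{|\alpha_2(n)|}{|\alpha_1(n)| + |\alpha_2(n)|} \tag{6}$$

where the normalization of the mixing weights ensures that $\alpha_1(n) + \alpha_2(n) = 1$. The new kernel is therefore defined as:

$$\phi_i(x, c_i) = \alpha_1(n)\, \phi_{i1}(x \cdot c_i) + \alpha_2(n)\, \phi_{i2}(\|x - c_i\|) \tag{7}$$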
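As a small illustration of the normalization step, the following sketch rescales two raw mixing weights so that they are non-negative and sum to one. The use of absolute values divided by their sum is an assumed form of the normalization, and the function name is hypothetical.

```python
def normalize_mixing_weights(alpha1, alpha2):
    """Return non-negative mixing weights that sum to one (assumed normalization)."""
    total = abs(alpha1) + abs(alpha2)
    if total == 0.0:
        raise ValueError("at least one mixing weight must be non-zero")
    return abs(alpha1) / total, abs(alpha2) / total

# A gradient step may push a raw weight negative; normalization restores
# a valid pair of mixing weights.
a1, a2 = normalize_mixing_weights(0.3, -0.9)
```

Keeping the pair on the unit simplex means the fused kernel stays a convex combination of the two participating kernels.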

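The overall mapping, at the $n^{th}$ learning iteration linked to a specific epoch, can now be written as:

$$y(n) = \sum_{i=1}^{N} w_i(n)\, \phi_i(x, c_i) + b(n) \tag{8}$$

where the synaptic weights $w_i(n)$ and bias $b(n)$ are adapted at each iteration. We define a cost function $\mathcal{E}(n)$ as:

$$\mathcal{E}(n) = \frac{1}{2}\left(d(n) - y(n)\right)^{2} = \frac{1}{2}\, e^{2}(n) \tag{9}$$

where $d(n)$ is the desired output at the $n^{th}$ iteration and $e(n) = d(n) - y(n)$ is the instantaneous error between the desired output and the actual output of the neuron. The update rule for the kernel's weight $\alpha_1$ is given by:

$$\alpha_1(n+1) = \alpha_1(n) - \eta\, \frac{\partial \mathcal{E}(n)}{\partial \alpha_1(n)} \tag{10}$$

where $\eta$ is the learning rate.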

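Using the chain rule of differentiation for the cost function in (9) yields:

$$\frac{\partial \mathcal{E}(n)}{\partial \alpha_1(n)} = \frac{\partial \mathcal{E}(n)}{\partial e(n)}\, \frac{\partial e(n)}{\partial y(n)} \sum_{i=1}^{N} \frac{\partial y(n)}{\partial \phi_i(x, c_i)}\, \frac{\partial \phi_i(x, c_i)}{\partial \alpha_1(n)} \tag{11}$$

which upon the simplification of the partial derivatives in (11) results in:

$$\frac{\partial \mathcal{E}(n)}{\partial \alpha_1(n)} = -e(n) \sum_{i=1}^{N} w_i(n)\, \phi_{i1}(x \cdot c_i) \tag{12}$$

$$\alpha_1(n+1) = \alpha_1(n) + \eta\, e(n) \sum_{i=1}^{N} w_i(n)\, \phi_{i1}(x \cdot c_i) \tag{13}$$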
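As a worked check of the simplification, the individual factors of the chain rule can be written out. Here $e(n)$ is the instantaneous error, $y(n)$ the network output, $w_i(n)$ the synaptic weights and $\phi_{i1}$ the cosine kernel. The sketch assumes the mixing weights are treated as free variables during differentiation, with the normalization applied as a separate step.

$$
\begin{aligned}
\frac{\partial \mathcal{E}(n)}{\partial e(n)} &= e(n), &
\frac{\partial e(n)}{\partial y(n)} &= -1, \\
\frac{\partial y(n)}{\partial \phi_i(x, c_i)} &= w_i(n), &
\frac{\partial \phi_i(x, c_i)}{\partial \alpha_1(n)} &= \phi_{i1}(x \cdot c_i).
\end{aligned}
$$

Multiplying the factors and summing over the $N$ hidden neurons gives $-e(n) \sum_{i=1}^{N} w_i(n)\, \phi_{i1}(x \cdot c_i)$; substituting this gradient into the steepest descent rule flips the sign and yields the additive update for $\alpha_1$.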

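Similarly, the update rule for $\alpha_2$ can be shown to be:

$$\alpha_2(n+1) = \alpha_2(n) + \eta\, e(n) \sum_{i=1}^{N} w_i(n)\, \phi_{i2}(\|x - c_i\|) \tag{14}$$

The update equations of the weight and bias are given as:

$$w_i(n+1) = w_i(n) + \eta\, e(n)\, \phi_i(x, c_i) \tag{15}$$

$$b(n+1) = b(n) + \eta\, e(n) \tag{16}$$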

The proposed approach is dynamic and does not require prior assignment of the weights for the participating kernels.
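The update rules of this section can be put together in a single training step. The sketch below is not the authors' implementation: the function names, the toy target `sin(x)`, the step size, the spread and the Gaussian scaling `exp(-d^2 / sigma^2)` are all assumptions chosen only to make the example runnable.

```python
import math

def gaussian_kernel(x, c, sigma):
    # Euclidean kernel; the exp(-d^2 / sigma^2) scaling is an assumption.
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return math.exp(-sq_dist / sigma ** 2)

def cosine_kernel(x, c, gamma=1e-6):
    dot = sum(xi * ci for xi, ci in zip(x, c))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_c = math.sqrt(sum(ci * ci for ci in c))
    return dot / (norm_x * norm_c + gamma)

def train_step(x, d, centers, w, b, a1, a2, eta, sigma):
    """One stochastic gradient step on the mixing weights, synaptic weights and bias."""
    # Normalize the mixing weights so that they sum to one.
    total = abs(a1) + abs(a2)
    a1, a2 = abs(a1) / total, abs(a2) / total
    phi_cos = [cosine_kernel(x, c) for c in centers]
    phi_euc = [gaussian_kernel(x, c, sigma) for c in centers]
    phi = [a1 * pc + a2 * pe for pc, pe in zip(phi_cos, phi_euc)]
    y = sum(wi * p for wi, p in zip(w, phi)) + b
    e = d - y  # instantaneous error
    # Gradient-descent updates, all computed from the pre-update parameters.
    new_a1 = a1 + eta * e * sum(wi * pc for wi, pc in zip(w, phi_cos))
    new_a2 = a2 + eta * e * sum(wi * pe for wi, pe in zip(w, phi_euc))
    new_w = [wi + eta * e * p for wi, p in zip(w, phi)]
    new_b = b + eta * e
    return new_w, new_b, new_a1, new_a2, e

def run_demo(epochs=100, eta=0.02, sigma=0.5):
    """Fit a toy 1-D target (sin) and return the per-epoch MSE and final mixing weights."""
    centers = [[-2.0 + 0.5 * i] for i in range(9)]
    samples = [[-2.0 + 0.25 * i] for i in range(17)]
    w, b, a1, a2 = [0.0] * len(centers), 0.0, 0.5, 0.5
    mse_per_epoch = []
    for _ in range(epochs):
        sq_err = 0.0
        for x in samples:
            w, b, a1, a2, e = train_step(x, math.sin(x[0]), centers, w, b, a1, a2, eta, sigma)
            sq_err += e * e
        mse_per_epoch.append(sq_err / len(samples))
    return mse_per_epoch, a1, a2

mse_curve, final_a1, final_a2 = run_demo()
```

Starting both mixing weights at 0.5 mirrors the initialization used in the experiments; no prior preference for either kernel is encoded.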

## III Experimental Results

The proposed novel kernel for the RBF is evaluated for three important tasks: (1) nonlinear system identification, (2) pattern recognition, and (3) function approximation. All the experiments were conducted using Matlab on an Intel(R) Core(TM) i7-3770 CPU @ 3.4GHz machine with 4GB memory.

### III-A Nonlinear System Identification

Complex control systems and industrial processes can be effectively modeled using nonlinear systems ([21]). Nonlinear system identification is a method for estimating the mathematical model of a nonlinear system using the inputs and outputs of the system. RBF neural networks have been shown to achieve good performance in this context ([22, 23, 24]). To evaluate the efficacy of the proposed novel kernel, we consider a highly nonlinear system, shown in Figure 2:

(17) |

where and are the input and output of the system respectively, is the disturbance modeled assumed to be , s are the polynomial coefficients describing the zeros of the system and is a constant. For the purpose of this experiment is taken to be a step function. In Figure 2, the system is defined by its impulse response while and are the estimated output, estimated impulse response and the error of estimation respectively. The simulation parameters chosen for the experiments are: =2, =-0.5, =-0.1, =-0.7, and =0.0025.

For the RBF structure, the number of neurons was selected to be 401 and the centers were uniformly spaced between -50 and 50 with a step size of 0.25. The initial weights and bias values were initialised to zero. For the Gaussian kernel the spread was set to 0.1 and for the cosine kernel a small value of the constant $\gamma$ was used. For the proposed approach, the initial values of the fusion weights $\alpha_1$ and $\alpha_2$ are taken to be 0.5. Figure 3 shows the estimated output of the proposed approach compared to the actual output, the Euclidean kernel ($\alpha_1 = 0$, $\alpha_2 = 1$), the cosine kernel ($\alpha_1 = 1$, $\alpha_2 = 0$) and the manual fusion of the two kernels ($\alpha_1 = \alpha_2 = 0.5$). Note that, because its estimate is the most precise, the output of the Euclidean kernel overlaps the actual output and therefore cannot be distinguished from it.

The mean square error (MSE) curves are depicted in Figure 4. The Euclidean kernel produces the best performance, achieving a minimum MSE of -6.1943 dB in 1379 epochs, while the cosine kernel performs poorly with an MSE of 2.7887 dB. Without any prior information, the proposed approach dynamically gives more weight to the Euclidean kernel, attaining a minimum MSE of -6.1547 dB in 1447 epochs, which is comparable to the Euclidean kernel. The final values of the weights were found to be and . The proposed approach is substantially better than the manual fusion of kernels, which achieved a minimum MSE of -5.5176 dB in 1992 epochs. Variation of the mixing parameters with respect to the epochs is depicted in Figure 5. To compare the time complexity of the proposed method with that of the manual fusion of the two kernels, we measured the training time for 2000 epochs. The proposed method takes 550.78 seconds whereas the manual fusion of the two kernels takes 537.74 seconds. The experiment shows that, in the absence of any prior knowledge, the proposed approach adaptively emphasizes the effective Euclidean kernel and achieves a comparable performance.
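Two small details of this setup can be checked numerically: the center grid and the decibel scale of the MSE. The sketch below generates the uniformly spaced centers and converts an MSE to dB; the `10 * log10` convention is an assumption (the text does not state the definition), and the function names are hypothetical.

```python
import math

def uniform_centers(start, stop, step):
    """Uniformly spaced 1-D centers, both end points included."""
    count = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(count)]

def mse_db(errors):
    """Mean square error in decibels, assuming the 10 * log10 convention."""
    mse = sum(e * e for e in errors) / len(errors)
    return 10.0 * math.log10(mse)

centers = uniform_centers(-50.0, 50.0, 0.25)
print(len(centers))  # 401 centers, matching the number of hidden neurons
```

Under this convention a negative value in dB simply means an MSE below 1, and the positive figure reported for the cosine kernel corresponds to an MSE above 1.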

### III-B Pattern Classification

Machine learning methods have been used with great success in bioinformatics ([25]). One of the important applications is the prediction of cancer using gene microarray data. In this experiment we target the prediction of leukemia using the standard Leukemia ALL/AML data ([26]). The data set consists of 38 training samples from bone marrow specimens (27 ALL and 11 AML) and 34 test samples (20 ALL and 14 AML) prepared under different experimental conditions, comprising 24 bone marrow and 10 blood specimens. Each sample contains expression values for 7129 genes. Minimum Redundancy and Maximum Relevance (mRMR) is an established technique for selecting the most significant genes ([25]). The mRMR technique was used to select only the top five genes for our experiments. For the RBF structure, the number of neurons was selected to be 38 and the centers were chosen using the subtractive clustering method of [27] with an influence factor of 0.1. The initial weights and bias values were initialised to zero. For the Gaussian kernel the spread was set to 0.2 and for the cosine kernel a small value of the constant $\gamma$ was used. For the proposed approach, the initial values of the fusion weights $\alpha_1$ and $\alpha_2$ are taken to be 0.5. For the training phase, the MSE curves of the different approaches are shown in Figure 6.

The Euclidean kernel outperforms the cosine kernel, achieving a minimum MSE of -279.9331 dB. The proposed method dynamically gives more weight to the Euclidean kernel, achieving an MSE of -122.4990 dB with and . Note that although the Euclidean kernel achieves the minimum MSE on the training data, this is merely a case of overfitting, where a classifier achieves the best performance on the training set but fails on the test data. Variation of the mixing parameters with respect to epochs is depicted in Figure 7. In Figure 6, after the epoch the MSE of the Euclidean kernel becomes lower than that of the cosine kernel; noteworthy is the corresponding flip in the weights adaptively assigned by the proposed approach in Figure 7. Note that the weights become stable after 400 epochs. The manual fusion of the two kernels ($\alpha_1 = \alpha_2 = 0.5$) results in an MSE of -73.6652 dB, which is inferior to the proposed method. The training and testing accuracies of all the approaches are presented in Table I; note that all approaches achieve 100% accuracy on the training samples. The total training time for the proposed method is found to be 12.98 seconds whereas the manual fusion of the two kernels takes 12.65 seconds.

| Approach | Training Accuracy | Testing Accuracy |
|---|---|---|
| Cosine kernel | 100.00% | 94.12% |
| Euclidean kernel | 100.00% | 58.82% |
| Manual fusion of the two kernels | 100.00% | 94.12% |
| Proposed dynamic fusion | 100.00% | 97.06% |

The true evaluation of any predictive system is on unseen samples, i.e., the "testing phase". Although the Euclidean kernel achieves the minimum MSE during the training phase, the proposed approach achieves the best performance in the testing stage with an accuracy of 97.06%. The Euclidean kernel was trained "too well" on the training samples and therefore suffered from "overfitting", attaining a test accuracy of only 58.82%. The proposed dynamic fusion of the two kernels outperformed the manual fusion ($\alpha_1 = \alpha_2 = 0.5$) by a margin of 2.94 percentage points.
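The reported test accuracies can be translated back into numbers of correctly classified samples, given the 34 test samples. The sketch below does this arithmetic; the dictionary keys and function name are only illustrative labels.

```python
TEST_SAMPLES = 34

reported_accuracy = {
    "Cosine kernel": 94.12,
    "Euclidean kernel": 58.82,
    "Manual fusion": 94.12,
    "Proposed dynamic fusion": 97.06,
}

def correct_count(accuracy_percent, total):
    """Nearest whole number of correct predictions for a reported accuracy."""
    return round(accuracy_percent / 100.0 * total)

for name, acc in reported_accuracy.items():
    k = correct_count(acc, TEST_SAMPLES)
    print(f"{name}: {k}/{TEST_SAMPLES} = {100.0 * k / TEST_SAMPLES:.2f}%")
```

The figures correspond to 32, 20, 32 and 33 correct predictions out of 34, so the 2.94 percentage-point margin over the manual fusion amounts to one additional correctly classified test sample.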

We provide an intuitive understanding of the proposed approach using this pattern classification problem. Data that are not linearly separable in the original space pose a challenging task in classification theory. Cover's theorem states that such data, when mapped nonlinearly into a high-dimensional space using a mapping function (kernel function), are more likely to be linearly separable in the transformed space.
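A classic textbook illustration of this idea is the XOR problem, which is not linearly separable in its original 2-D space but becomes separable after a Gaussian mapping. The sketch below is not taken from the experiments: the two centers, the unit spread and the hand-picked threshold of 0.9 are all assumptions made for illustration.

```python
import math

def gaussian_feature(x, c):
    """Gaussian response with unit spread (assumed for this toy example)."""
    return math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)))

centers = [(1.0, 1.0), (0.0, 0.0)]
xor_data = {(0.0, 0.0): 0, (0.0, 1.0): 1, (1.0, 0.0): 1, (1.0, 1.0): 0}

def classify(x):
    """Linear decision rule applied in the transformed 2-D feature space."""
    phi1, phi2 = (gaussian_feature(x, c) for c in centers)
    return 1 if phi1 + phi2 < 0.9 else 0

predictions = {x: classify(x) for x in xor_data}
```

In the transformed space a single straight line (`phi1 + phi2 = 0.9`) separates the two classes, which no straight line can do in the original input space.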

Selection of an appropriate kernel is an important issue. A good kernel results in better separation of the classes in the transformed space, thereby improving the performance on unseen test samples. Fusing multiple kernels is often a good way to harness the complementary properties of the individual kernels. The weights of the combining kernels play an important role in such cases, and selecting them arbitrarily may result in an inefficient fusion. The proposed adaptive fusion framework automatically selects the weights of the combining kernels so as to improve the separation of the classes. We demonstrate this through clustering of the Leukemia data set consisting of 38 samples (27 Class A and 11 Class B) and 5 attributes. For demonstration purposes we choose two centers $c_1$ and $c_2$, which are the means of classes A and B respectively. The mapping of the samples into the 2D $\phi$-space using various kernels is shown in Figure 8.

It can be seen that the cosine kernel efficiently separates the two classes in the 2D space, while the Euclidean kernel maps all the samples to the origin (overlapping samples seen as one green circle). The manual fusion of the kernels (with equal weights) results in a decreased class separation compared to the cosine kernel. The proposed adaptive fusion of the two kernels automatically assigns more weight to the cosine kernel, thereby resulting in better clustering compared to the manual fusion.
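The mapping into a 2D kernel space with two class-mean centers can be sketched on toy data. The two small classes below are hypothetical (they are not the Leukemia data), and the spread of 0.2, the Gaussian scaling and the function names are assumptions; the example only shows the mechanics of the mapping.

```python
import math

def mean_vector(samples):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(samples) for col in zip(*samples)]

def gaussian_kernel(x, c, sigma):
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return math.exp(-sq_dist / sigma ** 2)  # assumed scaling

def cosine_kernel(x, c, gamma=1e-6):
    dot = sum(xi * ci for xi, ci in zip(x, c))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_c = math.sqrt(sum(ci * ci for ci in c))
    return dot / (norm_x * norm_c + gamma)

def map_to_2d(x, c1, c2, alpha1, alpha2, sigma=0.2):
    """Coordinates of x in the 2-D space spanned by the two fused kernel responses."""
    def fused(c):
        return alpha1 * cosine_kernel(x, c) + alpha2 * gaussian_kernel(x, c, sigma)
    return fused(c1), fused(c2)

# Hypothetical toy classes; centers are the class means.
class_a = [[4.0, 1.0], [6.0, 1.0], [5.0, 2.0]]
class_b = [[1.0, 4.0], [1.0, 6.0], [2.0, 5.0]]
c1, c2 = mean_vector(class_a), mean_vector(class_b)

cosine_map = [map_to_2d(x, c1, c2, 1.0, 0.0) for x in class_a + class_b]
euclid_map = [map_to_2d(x, c1, c2, 0.0, 1.0) for x in class_a + class_b]
```

In this toy setup the Gaussian responses are numerically close to zero because the samples lie many spreads away from the class means, while the cosine responses remain well separated. Whether the same mechanism explains the collapse to the origin in Figure 8 is an assumption, not something the text states.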

### III-C Function Approximation

We consider the problem of approximation of a non-linear function defined by:

(18) |

The function in equation (18) is approximated using various kernels. For all experiments 121 centers were considered and the learning rate was taken to be . The centers were chosen through the subtractive clustering method of [27] with an influence factor of 0.1. The initial weights and bias values were initialised to zero. For the Gaussian kernel the spread was set to 0.2 and for the cosine kernel a small value of the constant $\gamma$ was used. For the proposed approach, the initial values of the fusion weights $\alpha_1$ and $\alpha_2$ are taken to be 0.5. A total of 121 pairs of the two input variables are used for training, each variable ranging from -1 to 1 with a step size of 0.2. Testing has been conducted on 100 data points, each variable ranging from -0.9 to 0.9 with a step size of 0.2. For the test data, Figure 9 shows, in reduced dimension, the estimated output of the proposed approach compared to the actual output, the Euclidean kernel ($\alpha_1 = 0$, $\alpha_2 = 1$), the cosine kernel ($\alpha_1 = 1$, $\alpha_2 = 0$) and the manual fusion of the two kernels ($\alpha_1 = \alpha_2 = 0.5$).

The mean square error (MSE) curves are depicted in Figure 10. The Euclidean kernel produces the best performance, achieving a minimum MSE of -18.6619 dB, while the cosine kernel performs poorly with an MSE of -4.9277 dB. Without any prior information, the proposed approach dynamically gives more weight to the Euclidean kernel and attains a minimum MSE of -18.4076 dB, which is comparable to the Euclidean kernel. The proposed approach is substantially better than the manual fusion of kernels, which achieved a minimum MSE of -15.6181 dB. Variation of the mixing parameters with respect to the epochs is depicted in Figure 11. The final values of the weights were found to be and . The experiment shows that, in the absence of any prior knowledge, the proposed approach adaptively emphasizes the effective Euclidean kernel and achieves comparable performance. To compare the time complexity of the proposed method with that of the manual fusion of the two kernels, we measured the training time for 10000 epochs. The total training time for the proposed method is found to be 586.3 seconds whereas the manual fusion of the two kernels takes 578.2 seconds.
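The sample counts quoted above are consistent with a full Cartesian grid over two inputs: 11 values per axis for training (11 x 11 = 121) and 10 values per axis for testing (10 x 10 = 100). The sketch below builds such grids; the grid interpretation and all names are assumptions.

```python
def grid_axis(start, step, count):
    """Evenly spaced axis values, rounded to suppress floating-point drift."""
    return [round(start + i * step, 10) for i in range(count)]

train_axis = grid_axis(-1.0, 0.2, 11)   # -1.0, -0.8, ..., 1.0
test_axis = grid_axis(-0.9, 0.2, 10)    # -0.9, -0.7, ..., 0.9

train_points = [(a, b) for a in train_axis for b in train_axis]
test_points = [(a, b) for a in test_axis for b in test_axis]

print(len(train_points), len(test_points))  # 121 100
```

Under this reading, the test points fall midway between the training points along each axis, so none of the test inputs is seen during training.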
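The training times reported for the three experiments can be compared as relative overheads of the adaptive fusion over the manual fusion. The sketch below performs that arithmetic on the figures quoted in the text; the labels and function name are illustrative.

```python
# (adaptive fusion seconds, manual fusion seconds) as reported in the text
timings = {
    "nonlinear system identification": (550.78, 537.74),
    "pattern classification": (12.98, 12.65),
    "function approximation": (586.3, 578.2),
}

def relative_overhead(adaptive, manual):
    """Extra training time of the adaptive fusion, in percent of the manual fusion."""
    return 100.0 * (adaptive - manual) / manual

overheads = {task: relative_overhead(*pair) for task, pair in timings.items()}
for task, pct in overheads.items():
    print(f"{task}: {pct:.2f}% extra training time")
```

On these figures the adaptive weight updates add roughly 1.4% to 2.6% to the training time.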

## IV Conclusion

In this research a novel kernel for the RBF neural network is proposed. The proposed framework adaptively fuses the Euclidean and cosine distance measures, thereby harnessing the complementary properties of the two. The algorithm is dynamic and adaptively learns the weights of the participating kernels for a given problem. Its efficacy is demonstrated on three important problems, namely nonlinear system identification, pattern classification and function approximation, and in all three it is shown to outperform the manual fusion of the two kernels.

For the problem of nonlinear system identification, the proposed framework adaptively assigns a higher fusion weight to the Euclidean kernel, achieving a performance comparable to it. Therefore, in the absence of any prior knowledge, the proposed method is shown to emphasize the most effective kernel. For the pattern classification problem, the proposed method dynamically assigns more weight to the Euclidean kernel and achieves the same training accuracy of 100%. For the more challenging testing phase, the proposed fusion attains the best accuracy of 97.06%. Note that the proposed approach outperformed both the Euclidean kernel (58.82%), which had the lowest training MSE, and the cosine kernel (94.12%) by meaningfully utilizing the complementary properties of the two. For the function approximation problem, the Euclidean kernel produces the best performance, achieving a minimum MSE of -18.6619 dB, while the cosine kernel performs poorly with an MSE of -4.9277 dB. Without any prior information, the proposed approach dynamically gives more weight to the Euclidean kernel and achieves a minimum MSE of -18.4076 dB. Across the three experiments, the proposed adaptive fusion performs comparably to or better than the best participating kernel.

## V Acknowledgement

The authors would like to thank the University of Western Australia (UWA), Pakistan Air Force - Karachi Institute of Economics and Technology (PAF-KIET), and Iqra University (IU) for providing the necessary support towards conducting this research, and the anonymous reviewers for their important comments.

## References

- [1] S. Khan, I. Naseem, R. Togneri, and M. Bennamoun, “A novel adaptive kernel for the RBF neural networks,” Circuits, Systems, and Signal Processing, vol. 36, no. 4, pp. 1639–1653, 2017.
- [2] A. Tatar, S. Naseri, N. Sirach, M. Lee, and A. Bahadori, “Prediction of reservoir brine properties using radial basis function (RBF) neural network,” Petroleum, vol. 1, no. 4, pp. 349–357, 2015.
- [3] M. A. Halali, V. Azari, M. Arabloo, A. H. Mohammadi, and A. Bahadori, “Application of a radial basis function neural network to estimate pressure gradient in water–oil pipelines,” Journal of the Taiwan Institute of Chemical Engineers, vol. 58, pp. 189–202, 2016.
- [4] M. M. Li and B. Verma, “Nonlinear curve fitting to stopping power data using RBF neural networks,” Expert Systems with Applications, vol. 45, pp. 161–171, 2016.
- [5] W. Chen, Z.-J. Fu, and C.-S. Chen, Recent Advances in Radial Basis Function Collocation Methods. Springer, 2014.
- [6] P. D. Reiner, “Algorithms for optimal construction and training of radial basis function neural networks,” Ph.D. dissertation, Auburn University, 2015.
- [7] S.-K. Oh, W.-D. Kim, and W. Pedrycz, “Design of radial basis function neural network classifier realized with the aid of data preprocessing techniques: design and analysis,” International Journal of General Systems, pp. 1–21, 2015.
- [8] D. P. F. Cruz, R. D. Maia, L. A. da Silva, and L. N. de Castro, “BeeRBF: A bee-inspired data clustering approach to design RBF neural network classifiers,” Neurocomputing, vol. 172, pp. 427–437, 2016.
- [9] S. H. A. Ali, K. Fukase, and S. Ozawa, “A fast online learning algorithm of radial basis function network with locality sensitive hashing,” Evolving Systems, pp. 1–14, 2016.
- [10] A. D. Niros and G. E. Tsekouras, “A novel training algorithm for RBF neural network using a hybrid fuzzy clustering approach,” Fuzzy Sets and Systems, vol. 193, pp. 62–84, 2012.
- [11] S. Chen, C. F. Cowan, and P. M. Grant, “Orthogonal least squares learning algorithm for radial basis function networks,” IEEE Transactions on Neural Networks, vol. 2, no. 2, pp. 302–309, 1991.
- [12] L. Fu, M. Zhang, and H. Li, “Sparse RBF networks with multi-kernels,” Neural Processing Letters, vol. 32, no. 3, pp. 235–247, 2010.
- [13] H.-G. Han and J.-F. Qiao, “Adaptive computation algorithm for RBF neural network,” IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 2, pp. 342–347, 2012.
- [14] G. S. Babu and S. Suresh, “Meta-cognitive RBF network and its projection based learning algorithm for classification problems,” Applied Soft Computing, vol. 13, no. 1, pp. 654–666, 2013.
- [15] M. Barakat, D. Lefebvre, M. Khalil, F. Druaux, and O. Mustapha, “Parameter selection algorithm with self adaptive growing neural network classifier for diagnosis issues,” International Journal of Machine Learning and Cybernetics, vol. 4, no. 3, pp. 217–233, 2013.
- [16] J. Wu, J. Long, and M. Liu, “Evolving RBF neural networks for rainfall prediction using hybrid particle swarm optimization and genetic algorithm,” Neurocomputing, vol. 148, pp. 136–142, 2015.
- [17] S. O. Haykin, Neural Networks: A Comprehensive Foundation. Prentice Hall PTR, Upper Saddle River, NJ, USA, 1994.
- [18] D. Wettschereck and T. Dietterich, “Improving the performance of radial basis function networks by learning center locations,” in Advances in Neural Information Processing Systems, vol. 4. Morgan Kaufmann, San Mateo, Calif, USA, 1992, pp. 1133–1140.
- [19] W. Aftab, M. Moinuddin, and M. S. Shaikh, “A novel kernel for RBF based neural networks,” Abstract and Applied Analysis, vol. 2014, 2014.
- [20] D. Psaltis, A. Sideris, A. Yamamura et al., “A multilayered neural network controller,” IEEE Control Systems Magazine, vol. 8, no. 2, pp. 17–21, 1988.
- [21] K. S. Narendra and K. Parthasarathy, “Identification and control of dynamical systems using neural networks,” IEEE Transactions on Neural Networks, vol. 1, no. 1, pp. 4–27, 1990.
- [22] V. Elanayar, Y. C. Shin et al., “Radial basis function neural network for approximation and estimation of nonlinear stochastic dynamic systems,” IEEE Transactions on Neural Networks, vol. 5, no. 4, pp. 594–603, 1994.
- [23] R. P. Lippmann, “Pattern classification using neural networks,” IEEE Communications Magazine, vol. 27, no. 11, pp. 47–50, 1989.
- [24] M. J. Er, S. Wu, J. Lu, and H. L. Toh, “Face recognition with radial basis function (RBF) neural networks,” IEEE Transactions on Neural Networks, vol. 13, no. 3, pp. 697–710, 2002.
- [25] H. Peng, F. Long, and C. Ding, “Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1226–1238, 2005.
- [26] T. R. Golub, D. K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. P. Mesirov, H. Coller, M. L. Loh, J. R. Downing, M. A. Caligiuri et al., “Molecular classification of cancer: class discovery and class prediction by gene expression monitoring,” Science, vol. 286, no. 5439, pp. 531–537, 1999.
- [27] N. R. Pal and D. Chakraborty, “Mountain and subtractive clustering method: improvements and generalizations,” International Journal of Intelligent Systems, vol. 15, no. 4, pp. 329–341, 2000.