Dense Morphological Network: An Universal Function Approximator
Artificial neural networks are built on two basic operations: linear combination and a non-linear activation function. Theoretically, this structure can approximate any continuous function with a three-layer architecture. The choice of activation function usually has a strong influence on the performance of the network. In this paper we propose the use of elementary morphological operations (dilation and erosion) as the basic operations in neurons. We show that the proposed network (called DenMo-Net), consisting of a single layer of morphological neurons followed by a linear combination layer, can approximate any smooth function. As DenMo-Net has an in-built non-linearity in its structure, no separate activation function is needed. However, the use of the max (resp. min) function in dilation (resp. erosion) results in optimization issues, as these functions are only piecewise differentiable. To overcome this problem we have softened the min/max functions to make them differentiable everywhere; the resulting network is called Soft DenMo-Net. To the best of our knowledge, this is the first work on networks using dilation-erosion neurons, hence we focus only on fully connected layers. We show visually that Soft DenMo-Net can classify circle data very accurately using only two morphological neurons. We have also evaluated our algorithm quantitatively on the MNIST, Fashion-MNIST, SVHN, CIFAR-10 and HIGGS datasets. The results show that our network performs on par with similarly structured neural networks, and sometimes better.
In artificial neural networks, the basic building block is an artificial neuron or perceptron that simply computes a linear combination of the input (Rosenblatt, 1958). It is usually followed by a non-linear activation function to model the non-linearity of the output. Although the neurons are simple in nature, when connected together they can approximate any continuous function of the input (Hornik, 1991). This has been successfully utilized in solving different real-world problems like image classification (Krizhevsky et al., 2012), semantic segmentation (Long et al., 2015) and image generation (Isola et al., 2017). While these models are quite powerful, training them efficiently can be hard in general (LeCun et al., 2012), and they need the support of special techniques, such as batch normalization (Ioffe and Szegedy, 2015) and dropout (Srivastava et al., 2014), in order to achieve better generalization. Their training time also depends on the choice of activation function (Mishkin et al., 2017).
In this paper we propose new building blocks for constructing networks similar to neural networks. Here, instead of the linear combination operation of the artificial neurons, we use a non-linear operation that eliminates the need for an additional activation function while requiring a small number of neurons to attain the same or better performance. More specifically, we use morphological operations (i.e. dilation and erosion) as the elementary operations of the neurons in the network. Our contribution in this paper is building a network with these operations that has the following properties.
- Networks built with dilation-erosion neurons followed by a linear combination can approximate any continuous function, given enough dilation/erosion neurons.
- As the dilation and erosion operations are non-linear by themselves, the requirement of a separate non-linear activation function is eliminated.
- The use of dilation-erosion operations greatly increases the number of possible decision boundaries. As a result, complex decision boundaries can be learned using a small number of parameters.
- As an alternative to the max/min operations of dilation/erosion, using their soft versions retains the above-mentioned properties while making the operations differentiable.
The rest of the paper is organized as follows. Section 2 describes the prior work on morphological neural networks. In Section 3, we introduce our proposed network and prove its capabilities theoretically. We further demonstrate its capabilities empirically on a few benchmark datasets in Section 4. Next, in Section 5, we discuss possible variants of our network, before concluding the paper in Section 6.
2 Related Work
The morphological neuron was first introduced by Davidson and Hummer (1993) in their effort to learn the structuring element of the dilation operation in images. A similar effort to learn the structuring elements has been made in a more recent work by Masci et al. (2013). The use of morphological neurons in a more general setting was first proposed by Ritter and Sussner (1996). They restricted the network to a single-layer architecture and focused only on the binary classification task. To classify the data, these networks use two axis-parallel hyperplanes as the decision boundary. This single-layer architecture of Ritter and Sussner (1996) has been extended to a two-layer architecture by Sussner (1998). This two-layer architecture is able to learn multiple axis-parallel hyperplanes, and is therefore able to solve arbitrary binary classification tasks. But in general the decision boundaries may not be axis parallel; as a result, this two-layer network may need to learn a large number of hyperplanes to achieve good results. So, one natural extension is to incorporate the option to rotate the hyperplanes. Taking a cue from this idea, Barmpoutis and Ritter (2006) proposed to learn a rotation matrix that rotates the input before trying to classify the data using axis-parallel hyperplanes. In a separate work by Ritter et al. (2014), the use of the $L_1$ and $L_\infty$ norms has been proposed as a replacement for the max/min operation of dilation and erosion in order to smooth the decision boundaries.
Ritter and Urcid (2003) first introduced the dendritic structure of biological neurons to the morphological neurons. This new structure creates hyperbox-based decision boundaries instead of hyperplanes. The authors have proved that any compact region can be estimated with hyperboxes, and therefore any two-class classification problem can be solved. A generalization of this structure to the multiclass case has also been given by Ritter and Urcid (2007). Sussner and Esmi (2011) proposed a new type of structure called morphological perceptrons with competitive neurons, where the output is computed with a winner-take-all strategy. This is modelled using the argmax operator, which allows the network to learn more complex decision boundaries. Later, Sossa and Guevara (2014) proposed a new training strategy to train this model with competitive neurons.
The non-differentiability of the max-min operations has forced the researchers to propose specialized training procedures for their models. So, a separate line of research has attempted to modify these networks so that gradient descent based optimizer can be used for training. Pessoa and Maragos (2000) have combined the classical perceptron with the morphological perceptron. The output of each node is taken as the convex combination of the classical and the morphological perceptron. Although max/min operation is not differentiable, they have proposed methodology to circumvent this problem. They have shown that this network can perform complex classification tasks. Morphological neurons have also been employed for regression task. de A. Araújo (2012) has utilized network architecture similar to morphological perceptrons with competitive learning to forecast stock markets. The argmax operator is replaced with a linear function so that the network is able to regress forecasts. The use of linear activation function enables the use of gradient descent for training which is not possible with the argmax operator. For morphological neurons with dendritic structure Zamora and Sossa (2017) had proposed to replace the argmax operator with a softmax function. This overcomes the problem of gradient computation and therefore gradient descent is employed to train the network. So, this retains the hyperbox based boundaries of the dendritic networks, but facilitates easy training with gradient descent.
3 Dense Morphological Network
In this section we introduce the basic components and structure of our network and establish its approximation power.
3.1 Dilation and Erosion neurons
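Dilation and erosion are the two basic operations of our proposed network. Given an input $\mathbf{x} \in \mathbb{R}^d$ and some structuring element $\mathbf{s} \in \mathbb{R}^{d+1}$, the dilation ($\oplus$) and erosion ($\ominus$) neurons compute the following two functions, respectively:

$$\mathbf{x} \oplus \mathbf{s} = \max_k \{x'_k + s_k\}, \tag{1}$$

$$\mathbf{x} \ominus \mathbf{s} = \min_k \{x'_k - s_k\}, \tag{2}$$

where $\mathbf{x}' = [\mathbf{x}, 0]$ and $x'_k$ denotes the $k$-th component of the vector $\mathbf{x}'$. The 0 is appended to the input to take care of the 'bias'. The only parameter involved in these morphological operations is $\mathbf{s}$ ($\in \mathbb{R}^{d+1}$). Note that the erosion operation can also be written in the following form:

$$\mathbf{x} \ominus \mathbf{s} = -\max_k \{s_k - x'_k\}. \tag{3}$$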
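As an illustration of the two operations, a minimal sketch in plain Python follows. The function names `dilation` and `erosion` and the example values are ours and purely illustrative; the code only assumes what the text states, namely that a 0 is appended to the input and that dilation takes a maximum of sums while erosion takes a minimum of differences.

```python
def dilation(x, s):
    """Dilation neuron: max over k of (x'_k + s_k), with x' = x plus an appended 0."""
    x_aug = list(x) + [0.0]          # the appended 0 plays the role of the bias
    assert len(s) == len(x_aug)
    return max(xk + sk for xk, sk in zip(x_aug, s))


def erosion(x, s):
    """Erosion neuron: min over k of (x'_k - s_k), with x' = x plus an appended 0."""
    x_aug = list(x) + [0.0]
    assert len(s) == len(x_aug)
    return min(xk - sk for xk, sk in zip(x_aug, s))


x = [1.0, -2.0]                      # hypothetical 2-dimensional input
s = [0.5, 3.0, -1.0]                 # hypothetical structuring element
d = dilation(x, s)                   # max(1.5, 1.0, -1.0)
e = erosion(x, s)                    # min(0.5, -5.0, 1.0)

# The alternative form of erosion: the negated maximum of (s_k - x'_k).
e_alt = -max(sk - xk for xk, sk in zip(x + [0.0], s))
```

Both ways of writing the erosion give the same value, which is what allows erosion to be treated as a negated dilation in the analysis.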
3.2 Network Structure
The Dense Morphological Net, or 'DenMo-Net' in short, that we propose here is a simple feed-forward network with some dilation and erosion neurons followed by a linear combination (Figure 1). We call the layer of dilation and erosion neurons the dilation-erosion layer and the following layer the linear combination layer. Let us assume the dilation-erosion layer contains $n$ dilation neurons and $m$ erosion neurons, followed by $c$ neurons in the linear combination layer. Let $\mathbf{x} \in \mathbb{R}^d$ be the input to the network. Let $z_i^+$ and $z_j^-$ be the outputs of the $i$-th dilation neuron and the $j$-th erosion neuron, respectively. Then we can write

$$z_i^+ = \mathbf{x} \oplus \mathbf{s}_i^+, \tag{4}$$

$$z_j^- = \mathbf{x} \ominus \mathbf{s}_j^-, \tag{5}$$

where $\mathbf{s}_i^+$ and $\mathbf{s}_j^-$ are the structuring elements of the $i$-th dilation neuron and the $j$-th erosion neuron, respectively. Note that $i \in \{1, 2, \dots, n\}$ and $j \in \{1, 2, \dots, m\}$. The final output from a node of the linear combination layer is computed in the following way:

$$\tau(\mathbf{x}) = \sum_{i=1}^{n} z_i^+ \omega_i^+ + \sum_{j=1}^{m} z_j^- \omega_j^-, \tag{6}$$

where $\omega_i^+$ and $\omega_j^-$ are the weights of the artificial neuron in the linear combination layer. In the following subsection we show that $\tau(\mathbf{x})$ can approximate any continuous function $f: \mathbb{R}^d \to \mathbb{R}$.
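The forward pass of a single output node can be sketched as follows. This is a minimal illustration under our own naming (`denmo_forward`, `s_dil`, `s_ero`, `w_dil`, `w_ero` are hypothetical), not the authors' implementation: the dilation-erosion layer produces one response per neuron, and the output node takes their weighted sum.

```python
def dilation(x, s):
    # max of (input with appended 0) + structuring element
    return max(a + b for a, b in zip(list(x) + [0.0], s))


def erosion(x, s):
    # min of (input with appended 0) - structuring element
    return min(a - b for a, b in zip(list(x) + [0.0], s))


def denmo_forward(x, s_dil, s_ero, w_dil, w_ero):
    """One output node: weighted sum of the dilation and erosion responses."""
    z_plus = [dilation(x, s) for s in s_dil]     # dilation-erosion layer, dilation part
    z_minus = [erosion(x, s) for s in s_ero]     # dilation-erosion layer, erosion part
    return (sum(z * w for z, w in zip(z_plus, w_dil))
            + sum(z * w for z, w in zip(z_minus, w_ero)))


x = [1.0, -2.0]
s_dil = [[0.5, 3.0, -1.0], [0.0, 0.0, 0.0]]      # two dilation neurons
s_ero = [[0.5, 3.0, -1.0]]                       # one erosion neuron
w_dil = [2.0, -1.0]
w_ero = [0.5]
out = denmo_forward(x, s_dil, s_ero, w_dil, w_ero)
```

Note that no activation function appears anywhere: the only non-linearity comes from the max and min inside the neurons.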
3.3 Function Approximation
Here we show that with the linear combination of dilation and erosion, any function can be approximated, and the approximation error decreases with increase in the number of neurons in the dilation-erosion layer. Before that we need to describe some concepts.
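Definition 1 ($k$-order Hinge Function)
(Wang and Sun, 2005) A $k$-order hinge function $h^{(k)}(\mathbf{x})$ consists of $(k+1)$ hyperplanes continuously joined together. It is defined by the following equation:

$$h^{(k)}(\mathbf{x}) = \pm \max \{\mathbf{w}_1^T \mathbf{x} + b_1, \mathbf{w}_2^T \mathbf{x} + b_2, \dots, \mathbf{w}_{k+1}^T \mathbf{x} + b_{k+1}\}. \tag{7}$$

Definition 2 ($k$-order hinging hyperplanes ($k$-HH))
(Wang and Sun, 2005) A $k$-order hinging hyperplanes ($k$-HH) is defined as the sum of multi-order hinge functions as follows:

$$\sum_i \alpha_i h^{(k_i)}(\mathbf{x}), \tag{8}$$

with $\alpha_i \in \{1, -1\}$, $k_i \le k$.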
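From Wang and Sun (2005) the following can be said about hinging hyperplanes.
Proposition 1
For any given positive integer $d$ and arbitrary continuous piecewise linear function $g: \mathbb{R}^d \to \mathbb{R}$, there exist finitely many, say $K$, positive integers $k_i \le d$ and corresponding $\alpha_i \in \{1, -1\}$ such that

$$g(\mathbf{x}) = \sum_{i=1}^{K} \alpha_i h^{(k_i)}(\mathbf{x}). \tag{9}$$

This says that any continuous piecewise linear function of $d$ variables can be written as a $d$-HH, i.e. the sum of multi-order hinge functions. Now, to show that our network can approximate any continuous function, we show the following.
Lemma 1
$\tau(\mathbf{x})$ is a sum of multi-order hinge functions.
The proof of this lemma is given in the supplementary document. There we show that $\tau(\mathbf{x})$ can be written as the sum of hinge functions in the following form:

$$\tau(\mathbf{x}) = \sum_{i=1}^{l} \alpha_i \phi_i(\mathbf{x}), \tag{10}$$

where $l = n + m$ (the total number of dilation and erosion neurons in the dilation-erosion layer), $\alpha_i \in \{1, -1\}$ and the $\phi_i$'s are $d$-order hinge functions.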
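The idea behind the lemma can be checked numerically. The sketch below is our own illustration, not the proof from the supplementary document: it verifies on random data that a weighted dilation or erosion response equals plus or minus a maximum of affine functions of the input, i.e. a hinge function, with the sign determined by the sign of the weight. The helper names are hypothetical.

```python
import random


def hinge_from_dilation(x, s, w):
    """Rewrite w * max_k(x'_k + s_k) as alpha * max_k(|w| x'_k + |w| s_k)."""
    x_aug = list(x) + [0.0]
    alpha = 1.0 if w >= 0 else -1.0
    return alpha * max(abs(w) * xk + abs(w) * sk for xk, sk in zip(x_aug, s))


def hinge_from_erosion(x, s, w):
    """Rewrite w * min_k(x'_k - s_k) as alpha * max_k(|w| s_k - |w| x'_k)."""
    x_aug = list(x) + [0.0]
    alpha = -1.0 if w >= 0 else 1.0
    return alpha * max(abs(w) * sk - abs(w) * xk for xk, sk in zip(x_aug, s))


random.seed(0)
max_gap = 0.0
for _ in range(1000):
    x = [random.uniform(-3, 3) for _ in range(4)]
    s = [random.uniform(-3, 3) for _ in range(5)]
    w = random.uniform(-2, 2)
    x_aug = x + [0.0]
    direct_dil = w * max(a + b for a, b in zip(x_aug, s))
    direct_ero = w * min(a - b for a, b in zip(x_aug, s))
    # Largest discrepancy between the direct and the hinge-form computation.
    max_gap = max(max_gap,
                  abs(direct_dil - hinge_from_dilation(x, s, w)),
                  abs(direct_ero - hinge_from_erosion(x, s, w)))
```

Each maximum runs over one affine term per component of the augmented input, which is why every neuron contributes a single hinge function to the sum.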
Proposition 2 (Stone-Weierstrass approximation theorem)
Let $C$ be a compact domain ($C \subset \mathbb{R}^d$) and $f: C \to \mathbb{R}$ a continuous function. Then, for any $\epsilon > 0$, there exists a continuous piecewise linear function $g$ such that $|f(\mathbf{x}) - g(\mathbf{x})| < \epsilon$ for all $\mathbf{x} \in C$.
Theorem 1 (Universal approximation)
A single dilation-erosion layer followed by a linear combination layer can approximate any continuous smooth function, provided there are enough nodes in the dilation-erosion layer.
Sketch of Proof From Lemma 1 we know that our DenMo-Net with $l$ dilation and erosion neurons followed by a linear combination layer computes $\tau(\mathbf{x})$, which is a sum of multi-order hinge functions. From Proposition 1 we get that any continuous piecewise linear function can be written as a finite sum of multi-order hinge functions. From Proposition 2 we can say that any continuous function can be well approximated by a piecewise linear function. In general, if we increase the number of neurons in the dilation-erosion layer, the approximation error decreases. Therefore, we can say that a DenMo-Net with enough dilation and erosion neurons can approximate any continuous function.
3.4 Learned Decision Boundary
The DenMo-Net we have defined above learns the following function:

$$\tau(\mathbf{x}) = \sum_{i=1}^{l} \alpha_i \phi_i(\mathbf{x}), \tag{11}$$

where $\alpha_i \in \{1, -1\}$ and each $\phi_i(\mathbf{x})$ is a collection of multiple hyperplanes joined together. Therefore the number of hyperplanes learned by a network with $l$ neurons in the dilation-erosion layer is much more than $l$. Each morphological neuron allows only one of the inputs to pass through, because of the max/min operation applied after addition with the structuring element. So, effectively, each neuron in the dilation-erosion layer chooses one component of the $d$-dimensional input vector. Depending on which components are chosen, the final linear combination layer computes the hyperplane by taking either all the components of the input or only some of them (when a subset of input components is chosen more than once in the dilation-erosion layer). Note that this choice depends on the input and the structuring element together. For a network with $d$-dimensional input data and $l$ neurons in the dilation-erosion layer, the maximum number of hyperplanes that can theoretically be formed in $d$ dimensions is governed by the number of ways in which the $l$ neurons can each choose one component. Out of all the possible planes, only those in which every input component is chosen can span anywhere in the $d$-dimensional space. Therefore, increasing the number of neurons in the dilation-erosion layer exponentially increases the possible number of hyperplanes, i.e., the decision boundaries. This implies that, using only a small number of neurons, complex decision boundaries can be learned.
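To make the "each neuron chooses one component" argument concrete, the sketch below records which component wins the maximum in each dilation neuron for many random inputs and counts the distinct selection patterns. Everything here is our own illustration: the sizes, the random structuring elements and the name `selected_component` are hypothetical, and the bound $(d+1)^l$ is simply the number of ways $l$ neurons can each pick one of the $d+1$ components of the input with the appended 0, not a formula quoted from the paper.

```python
import random


def selected_component(x, s):
    """Index of the component that wins the max in a dilation neuron."""
    x_aug = list(x) + [0.0]
    sums = [a + b for a, b in zip(x_aug, s)]
    return sums.index(max(sums))


random.seed(1)
d, l = 2, 2                       # toy input dimension and number of neurons
structs = [[random.uniform(-1, 1) for _ in range(d + 1)] for _ in range(l)]

patterns = set()
for _ in range(20000):
    x = [random.uniform(-5, 5) for _ in range(d)]
    # One pattern = the tuple of components chosen by the l neurons for this input.
    patterns.add(tuple(selected_component(x, s) for s in structs))

upper_bound = (d + 1) ** l        # each neuron picks one of d + 1 components
print(len(patterns), "patterns observed, at most", upper_bound)
```

Within the region of input space that shares one selection pattern, the network output is a single affine function of the input, so each pattern corresponds to one linear piece of the learned function.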
3.5 Soft DenMo-Net
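Morphological neurons, i.e. dilation and erosion neurons, use the max and min operations, respectively. The max and min operations are piecewise differentiable. However, during backpropagation through a single morphological neuron, only a single value of the structuring element is updated, because of the max or min operation. To get a smooth approximation of max and min we follow Cook (2011) and redefine the dilation and erosion operations as follows:

$$\mathbf{x} \oplus \mathbf{s} = \frac{1}{\beta} \log \Big( \sum_k e^{(x'_k + s_k)\beta} \Big), \tag{12}$$

$$\mathbf{x} \ominus \mathbf{s} = -\frac{1}{\beta} \log \Big( \sum_k e^{(s_k - x'_k)\beta} \Big), \tag{13}$$

where $x'_k$ is the $k$-th component of the input with a 0 appended, $s_k$ is the $k$-th component of the structuring element, and $\beta$ is the "hardness" of the soft maximum. In general, as $\beta \to \infty$, equation 12 and equation 13 converge to equation 1 and equation 2, respectively. However, taking a very high value of $\beta$ may overflow the floating-point variable.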
Please see the supplementary material for the proof. We call the network with soft dilation and soft erosion as Soft DenMo-Net.
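The soft operations and the overflow concern can be illustrated with a short sketch. It assumes the soft maximum of Cook (2011), i.e. the logarithm of a sum of exponentials scaled by a hardness parameter; the function names and the example values are ours. Shifting by the largest value before exponentiating is a standard numerical device that we add here, and is not something the paper describes.

```python
import math


def soft_max(values, beta):
    """(1/beta) * log(sum(exp(beta * v))), with the largest value shifted out
    so that no exponent is positive and exp() cannot overflow."""
    m = max(values)
    return m + math.log(sum(math.exp(beta * (v - m)) for v in values)) / beta


def soft_dilation(x, s, beta):
    x_aug = list(x) + [0.0]
    return soft_max([a + b for a, b in zip(x_aug, s)], beta)


def soft_erosion(x, s, beta):
    x_aug = list(x) + [0.0]
    # Erosion as a negated soft maximum of (s_k - x'_k).
    return -soft_max([b - a for a, b in zip(x_aug, s)], beta)


x = [1.0, -2.0]
s = [0.5, 3.0, -1.0]
for beta in (1.0, 10.0, 1000.0):
    print(beta, soft_dilation(x, s, beta), soft_erosion(x, s, beta))
```

As the hardness grows, the two values approach the hard dilation (1.5) and the hard erosion (-5.0) of the same input. A direct evaluation of `math.exp(1000.0 * 1.5)` would raise an overflow error, which is the problem the shift avoids.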
Here we empirically evaluate the performance of DenMo-Net and Soft DenMo-Net and demonstrate their advantages.
As we have defined our network with 1D structuring elements, we compare our method with similarly structured fully connected neural networks with various activation functions, such as tanh (NN-tanh) and ReLU (NN-ReLU), and with the Maxout network (Goodfellow et al., 2013). We have particularly chosen the maxout network for comparison because it uses the max function as a replacement for the activation function, but with added nodes to compute the maximum. The experiments have been carried out on benchmark datasets: MNIST (LeCun et al., 1998), Fashion-MNIST (Xiao et al., 2017), SVHN (Netzer et al., 2011), CIFAR-10 and Higgs (Baldi et al., 2014). However, the first experiment is carried out on a toy dataset consisting of data lying approximately on two concentric circles, for visualizing the decision boundaries. In our experiments with benchmark image datasets, each image is flattened in row-major order before it is fed into the network. So, the network is unaware of the spatial structure of the image. For all the tasks we have used categorical cross entropy as the loss, and the softmax function is used in the last layer. In the training phase, the network is optimized using the Adam optimizer (learning rate = 0.001, $\beta_1$ = 0.9, $\beta_2$ = 0.999) (Kingma and Ba, 2014). We have used Glorot uniform initialization (Glorot and Bengio, 2010) for initializing all the structuring elements and weights of the neural networks. We have initialized all the biases to zero. We have used the same number of dilation and erosion neurons in the dilation-erosion layer unless otherwise stated.
| Methods | # Parameters | Training accuracy (%) |
| Maxout Network (h=2) | 18 | 90.02 |
4.2 Visualization with a toy dataset
For visualizing the decision boundaries learned by the classifiers, we have generated data on two concentric circles centred at the origin, each circle belonging to a different class. We compare the results when only two neurons are taken in the hidden layer of each network. It is observed that the baseline neural network fails to classify this data with two hidden neurons, as it learns one hyperplane per hidden neuron. The boundaries learned by the network with the ReLU activation function (NN-ReLU) are shown in figure 1(a). The result of the maxout network is better (90.02% training accuracy) in this case, as it introduces extra parameters with the max function to achieve non-linearity. In the maxout layer we have taken the maximum among $h = 2$ features. As we see in figure 1(b), the network learns 4 hyperplanes when trying to classify these data. For the same data and two morphological neurons in the dilation-erosion layer, our DenMo-Net has learned 6 lines to form the decision boundary (figure 1(c)). Although from equation 11 we can say that we can get at most 8 distinct lines, only two of them can be placed anywhere in the 2D space, while the others are parallel to the axes. For this reason, we get two slanted lines and the remaining lines are parallel to the axes.
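A toy dataset of this kind can be generated along the following lines. The radii, the noise level, the sample count and the function name `make_circles` are assumptions made for illustration; the paper does not state the values it used.

```python
import math
import random


def make_circles(n_per_class, r_inner=1.0, r_outer=2.0, noise=0.05, seed=0):
    """Sample points near two concentric circles centred at the origin.

    Class 0 lies near the inner circle and class 1 near the outer circle.
    """
    rng = random.Random(seed)
    points, labels = [], []
    for label, radius in ((0, r_inner), (1, r_outer)):
        for _ in range(n_per_class):
            theta = rng.uniform(0.0, 2.0 * math.pi)   # angle on the circle
            r = radius + rng.gauss(0.0, noise)        # small radial jitter
            points.append((r * math.cos(theta), r * math.sin(theta)))
            labels.append(label)
    return points, labels


points, labels = make_circles(200)
```

No single line separates the two classes, which is why a network that learns one hyperplane per hidden neuron cannot solve the task with two hidden neurons.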
We have also shown the decision boundary learned by Soft DenMo-Net (figure 1(d)). A fixed value of the hardness constant is used for Soft DenMo-Net. We see that the decision boundary learned by Soft DenMo-Net is smooth; hence it obtains a perfect decision boundary on the circle data with only two hidden morphological neurons.
The classification accuracy achieved by the networks along with their number of parameters is reported in Table 1. The accuracy clearly reveals the efficacy of DenMo-Net.
| Dataset | DenMo-Net | Soft DenMo-Net | State-of-the-art |
| MNIST | 98.39 | 98.90 | 99.79 (Wan et al., 2013) |
| Fashion-MNIST | 89.87 | 89.84 | 89.70 (Xiao et al., 2017) |
4.3 Experiment on MNIST Dataset
The MNIST dataset (LeCun et al., 1998) contains grayscale images of handwritten digits (0-9) of size $28 \times 28$. It has 60,000 training images and 10,000 test images. Since our network is defined on one-dimensional input, we have converted each image to a column vector (in row-major order) before using it as input. The network we use follows the structure we have previously defined: input layer, dilation-erosion layer and linear combination layer computing the output. As we have to distinguish between 10 classes of images in this dataset, 10 neurons are taken in the output layer. In Table 2 we have shown the accuracy on test data after training the network for 150 epochs with different numbers of nodes in the dilation-erosion layer. We get an average test accuracy of 98.39% and 98.90% after training 3 times with DenMo-Net and Soft DenMo-Net, respectively, with 200 dilation and 200 erosion neurons (Table 2) up to 400 epochs. However, it may be noted that better pre-processing of the data may result in higher accuracy.
4.4 Experiment on Fashion-MNIST Dataset
The Fashion-MNIST dataset (Xiao et al., 2017) has been proposed with the aim of replacing the popular MNIST dataset. Similar to the MNIST dataset, it contains images of 10 classes, with 60,000 training and 10,000 testing samples. While MNIST is still a popular choice for benchmarking classifiers, the authors claim that MNIST is too easy and does not represent modern computer vision tasks. This dataset aims to provide the accessibility of the MNIST dataset while posing a more challenging classification task. For the experiment, we have converted the images to column vectors, as we have done for the MNIST dataset. We have taken 250 dilation and 250 erosion nodes in the dilation-erosion layer for this experiment. The only pre-processing we have done is normalizing the data to the range [0,1]. We have trained the network separately 3 times, up to 300 epochs. The reported test accuracy (Table 2) is the average of the 3 runs. We see that our method gives results comparable with the state-of-the-art.
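The pre-processing described for the image datasets, flattening in row-major order followed by scaling to [0,1], can be sketched as follows. The function names are ours, and dividing by 255 assumes 8-bit pixel intensities, which the paper does not state explicitly.

```python
def flatten_row_major(image):
    """Flatten a 2D image (a list of rows) into one vector, row after row."""
    return [pixel for row in image for pixel in row]


def normalize_unit(vector, max_value=255.0):
    """Scale pixel intensities to the range [0, 1] (assumes 8-bit input)."""
    return [v / max_value for v in vector]


image = [[0, 128],
         [255, 64]]                      # hypothetical 2 x 2 grayscale image
vec = normalize_unit(flatten_row_major(image))
```

After this step the network sees only a vector of intensities, so the spatial arrangement of the pixels is not available to it.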
4.5 Experiment on SVHN Dataset
The Street View House Numbers (SVHN) dataset (Netzer et al., 2011) is similar to the MNIST dataset in the sense that both of them contain images of numerals. In this dataset the images are collected from house numbers in Google Street View images. Like MNIST, all the images are centered on a single character. But unlike MNIST, the images here are not grayscale; they are color images. The dataset has 73,257 training samples and 26,032 test samples. For the experiment we have flattened each image in row-major order and normalized it to the range [0,1]. In Table 4 we have reported the test accuracy achieved by different networks along with their number of parameters. We see that even with increased number of parameters is performing the poorest. On the other hand, both DenMo-Net and Soft DenMo-Net perform close to the best performing classifier.
4.6 Experiment on CIFAR-10 Dataset
CIFAR-10 (Krizhevsky and Hinton, 2009) is a natural image dataset with 10 classes. It has 50,000 training and 10,000 test images. Each of them is a color image of size $32 \times 32$. The images are converted to column vectors before they are fed to the DenMo-Net. For all the networks we compare with, the experiments have been conducted keeping the number of neurons in the hidden layer the same. Note that, in the maxout network, each hidden neuron has two extra nodes over which the maximum is computed. In Table 3 we have reported the average test accuracy obtained over three runs of 150 epochs. It can be seen from the table that DenMo-Net achieves the best accuracy in all the cases. The maxout network lags behind even with a larger number of parameters. This happens because our network is able to learn more hyperplanes with a number of parameters similar to that of standard artificial neural networks. However, using only a single type of morphological neuron in our network, we get a different result for this dataset (Figure 3). Soft DenMo-Net () achieves accuracy of % using morphological neurons.
4.7 Higgs Dataset
The Higgs dataset (Baldi et al., 2014) is built to benchmark the performance of neural networks in distinguishing the signal process producing the Higgs boson from the background process that does not. This is a synthetically generated dataset with 28 features commonly used by physicists to distinguish between the two. The dataset has 11 million data instances, out of which we have taken a random 80% as the training data and the rest as the test data. The features have been normalized between -1 and 1 before training. In Table 5, we have reported the performance of the networks. We see that DenMo-Net performs better than Soft DenMo-Net. However, the performance of the other networks is very similar.
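A possible form of this preparation is sketched below. The paper states only that the features are normalized between -1 and 1 and that a random 80% of the instances is used for training; per-feature min-max scaling, the fixed seed and the helper names are our assumptions.

```python
import random


def scale_to_pm1(column):
    """Min-max scale one feature column to the range [-1, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:                          # constant feature: map to 0
        return [0.0 for _ in column]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in column]


def split_indices(n, train_fraction=0.8, seed=0):
    """Random train/test split of the indices 0..n-1."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * train_fraction)
    return idx[:cut], idx[cut:]


scaled = scale_to_pm1([0.0, 5.0, 10.0])
train_idx, test_idx = split_indices(10)
```

In a real experiment the minimum and maximum would normally be computed on the training portion only and then reused for the test portion.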
5.1 Ablation Study
Theoretically, DenMo-Net can act as a universal approximator with only dilation (or only erosion) neurons. But in practice, the presence of both dilation and erosion neurons in the dilation-erosion layer improves performance. To empirically justify this claim we have used the CIFAR-10 dataset. We have reported the change in test accuracy over the epochs for DenMo-Net and Soft DenMo-Net in figure 4 and figure 5, respectively. In the experiments with both networks, the total number of neurons in the dilation-erosion layer has been kept the same (1200). In both cases, we see that the use of both types of nodes results in a jump in performance, which is not attained even after several epochs when using only one type of neuron.
5.2 Stacking Multiple Layers
We have defined the network and shown its properties when only three layers are employed. Straightforward stacking of the layers used in our network can give rise to two kinds of network.
- Type-I: Multiple dilation-erosion layers followed by a single linear combination layer at the end.
- Type-II: A unit formed by a dilation-erosion layer followed by a linear combination layer, repeated multiple times.
For the network of Type-I, it can be argued that the network performs some concatenation of opening and closing operations and, finally, their linear combination. As there are dilation-erosion (DE) layers back to back, the problem of gradient propagation is amplified. As a result, it takes much more time to train compared to the single-layer architecture (figure 7).
A similar explanation does not hold for Type-II networks. Type-II gives results similar to those of the single hidden layer DenMo-Net, as shown in figure 6 and figure 7. However, on CIFAR-10 it overfits heavily. We believe that understanding this requires further exploration and an extension towards a 2D morphological network that takes a 2D image as input.
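The two stacking schemes can be contrasted with a small sketch. The helper names (`de_layer`, `linear_layer`, `type_one`, `type_two`) and the parameter layout are hypothetical and serve only to show how the layers are composed; they do not reproduce the authors' code.

```python
def dilation(x, s):
    return max(a + b for a, b in zip(list(x) + [0.0], s))


def erosion(x, s):
    return min(a - b for a, b in zip(list(x) + [0.0], s))


def de_layer(x, s_dil, s_ero):
    """Dilation-erosion layer: concatenated dilation and erosion responses."""
    return [dilation(x, s) for s in s_dil] + [erosion(x, s) for s in s_ero]


def linear_layer(z, weights):
    """Linear combination layer: one weighted sum per output node."""
    return [sum(zi * wi for zi, wi in zip(z, w)) for w in weights]


def type_one(x, de_params, out_weights):
    """Type-I: several dilation-erosion layers, then a single linear layer."""
    for s_dil, s_ero in de_params:
        x = de_layer(x, s_dil, s_ero)
    return linear_layer(x, out_weights)


def type_two(x, blocks):
    """Type-II: (dilation-erosion layer, linear layer) units repeated."""
    for s_dil, s_ero, weights in blocks:
        x = linear_layer(de_layer(x, s_dil, s_ero), weights)
    return x


zero = [[0.0, 0.0, 0.0]]                      # one neuron with an all-zero structuring element
x = [1.0, -2.0]
out_one = type_one(x, [(zero, zero), (zero, zero)], [[1.0, 1.0]])
out_two = type_two(x, [(zero, zero, [[1.0, 0.0], [0.0, 1.0]]),
                       (zero, zero, [[1.0, 1.0]])])
```

In the Type-I composition the max/min operations are applied back to back with no linear mixing in between, which matches the remark above about gradient propagation being harder in that variant.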
In this paper we have proposed a new class of networks that use morphological neurons. These networks consist of three layers only: an input layer and a dilation-erosion layer, followed by a linear combination layer giving the output of the network. We have presented our analysis using this three-layer network only, but its deeper versions should be explored in the future. First, we have shown that, unlike a standard artificial neural network, the proposed three-layer architecture can approximate any smooth function without any activation function, provided there are enough dilation and erosion neurons. Second, the proposed networks are able to learn a large number of hyperplanes with very few neurons in the dilation-erosion layer, and thereby provide superior results compared to other networks with a three-layer architecture. In this work we have only worked with fully connected layers, i.e. a node in a layer is connected to all the nodes in the previous layer. This type of connectivity is not very efficient for image data, where architectures with convolution layers perform better. So, extending this work to the case where a structuring element operates by sliding over the whole image should be the next logical step.
- Baldi et al. (2014). Searching for exotic particles in high-energy physics with deep learning. Nature Communications 5, pp. 4308.
- Barmpoutis and Ritter (2006). Orthonormal Basis Lattice Neural Networks. In 2006 IEEE International Conference on Fuzzy Systems, pp. 331–336.
- Cook (2011). Basic properties of the soft maximum. UT MD Anderson Cancer Center Department of Biostatistics Working Paper Series; Working Paper 70.
- Davidson and Hummer (1993). Morphology neural networks: An introduction with applications. Circuits, Systems and Signal Processing 12 (2), pp. 177–210.
- de A. Araújo (2012). A morphological perceptron with gradient-based learning for Brazilian stock market forecasting. Neural Networks 28, pp. 61–81.
- Glorot and Bengio (2010). Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 249–256.
- Goodfellow et al. (2013). Maxout Networks. In Proceedings of the 30th International Conference on Machine Learning - Volume 28, ICML'13, Atlanta, GA, USA, pp. III–1319–III–1327.
- Hornik (1991). Approximation capabilities of multilayer feedforward networks. Neural Networks 4 (2), pp. 251–257.
- Ioffe and Szegedy (2015). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In International Conference on Machine Learning, pp. 448–456.
- Isola et al. (2017). Image-to-Image Translation with Conditional Adversarial Networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5967–5976.
- Kingma and Ba (2014). Adam: A Method for Stochastic Optimization. arXiv:1412.6980 [cs].
- Krizhevsky and Hinton (2009). Learning multiple layers of features from tiny images. Technical report, University of Toronto.
- Krizhevsky et al. (2012). ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger (Eds.), pp. 1097–1105.
- LeCun et al. (2012). Efficient BackProp. In Neural Networks: Tricks of the Trade: Second Edition, G. Montavon, G. B. Orr, and K. Müller (Eds.), Lecture Notes in Computer Science, pp. 9–48.
- LeCun et al. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324.
- Long et al. (2015). Fully convolutional networks for semantic segmentation. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3431–3440.
- Masci et al. (2013). A learning framework for morphological operators using counter-harmonic mean. In International Symposium on Mathematical Morphology and Its Applications to Signal and Image Processing, pp. 329–340.
- Mishkin et al. (2017). Systematic evaluation of convolution neural network advances on the Imagenet. Computer Vision and Image Understanding 161, pp. 11–19.
- Netzer et al. (2011). Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, Vol. 2011, pp. 5.
- Pessoa and Maragos (2000). Neural networks with hybrid morphological/rank/linear nodes: a unifying framework with applications to handwritten character recognition. Pattern Recognition 33 (6), pp. 945–960.
- Ritter and Sussner (1996). An introduction to morphological neural networks. In Proceedings of 13th International Conference on Pattern Recognition, Vol. 4, pp. 709–717.
- Ritter et al. (2014). Two lattice metrics dendritic computing for pattern recognition. In 2014 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp. 45–52.
- Ritter and Urcid (2003). Lattice algebra approach to single-neuron computation. IEEE Transactions on Neural Networks 14 (2), pp. 282–295.
- Ritter and Urcid (2007). Learning in Lattice Neural Networks that Employ Dendritic Computing. In Computational Intelligence Based on Lattice Theory, V. G. Kaburlasos and G. X. Ritter (Eds.), Studies in Computational Intelligence, pp. 25–44.
- Rosenblatt (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review 65 (6), pp. 386–408.
- Sossa and Guevara (2014). Efficient training for dendrite morphological neural networks. Neurocomputing 131, pp. 132–142.
- Srivastava et al. (2014). Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15 (1), pp. 1929–1958.
- Sussner (1998). Morphological perceptron learning. In Proceedings of the 1998 IEEE International Symposium on Intelligent Control (ISIC) held jointly with IEEE International Symposium on Computational Intelligence in Robotics and Automation (CIRA), pp. 477–482.
- Sussner and Esmi (2011). Morphological perceptrons with competitive learning: Lattice-theoretical framework and constructive learning algorithm. Information Sciences 181 (10), pp. 1929–1950.
- Wan et al. (2013). Regularization of neural networks using dropconnect. In International Conference on Machine Learning, pp. 1058–1066.
- Wang and Sun (2005). Generalization of hinging hyperplanes. IEEE Transactions on Information Theory 51 (12), pp. 4425–4431.
- Xiao et al. (2017). Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747.
- Zamora and Sossa (2017). Dendrite morphological neurons trained by stochastic gradient descent. Neurocomputing 260, pp. 420–431.