Dense Morphological Network: An Universal Function Approximator
Abstract
Artificial neural networks are built on basic operations comprising a linear combination and a nonlinear activation function. Theoretically this structure can approximate any continuous function with a three-layer architecture. The choice of activation function usually greatly influences the performance of the network. In this paper we propose the use of the elementary morphological operations (dilation and erosion) as the basic operations in neurons. We show that the proposed network (called DenMoNet), consisting of a single layer of morphological neurons followed by a linear combination layer, can approximate any smooth function. As DenMoNet has an inbuilt nonlinearity in its structure, no separate activation function is needed. But the use of the max (resp. min) function in dilation (resp. erosion) results in optimization issues, as these functions are only piecewise differentiable. To overcome this problem we have softened the min/max functions to make them differentiable everywhere and, as a result, Soft DenMoNet evolves. To the best of our knowledge, this is the first work on networks using dilation-erosion neurons, hence we focus only on fully-connected layers. We have visually shown that Soft DenMoNet can classify circle data very accurately using only two morphological neurons. We have also evaluated our algorithm quantitatively on the MNIST, Fashion-MNIST, SVHN, CIFAR-10 and Higgs datasets. The results show that our network performs comparably to similarly structured neural networks, and sometimes better.
1 Introduction
In artificial neural networks, the basic building block is an artificial neuron or perceptron that simply computes a linear combination of the input (Rosenblatt, 1958). It is usually followed by a nonlinear activation function to model the nonlinearity of the output. Although the neurons are simple in nature, when connected together they can approximate any continuous function of the input (Hornik, 1991). This has been successfully utilized in solving different real-world problems like image classification (Krizhevsky et al., 2012), semantic segmentation (Long et al., 2015) and image generation (Isola et al., 2017). While these models are quite powerful, their efficient training can be hard in general (LeCun et al., 2012) and they need the support of special techniques, such as batch normalization (Ioffe and Szegedy, 2015) and dropout (Srivastava et al., 2014), in order to achieve better generalization. Their training time also depends on the choice of activation function (Mishkin et al., 2017).
In this paper we propose new building blocks for constructing networks similar to neural networks. Here, instead of the linear combination operation of artificial neurons, we use a nonlinear operation that eliminates the need for an additional activation function while requiring a small number of neurons to attain the same or better performance. More specifically, we use morphological operations (i.e. dilation and erosion) as the elementary operations of the neurons in the network. Our contribution in this paper is building a network with these operations that has the following properties.

Networks built with dilation-erosion neurons followed by a linear combination can approximate any continuous function, given enough dilation/erosion neurons.

As the dilation and erosion operations are nonlinear by themselves, the requirement of a separate nonlinear activation function is eliminated.

The use of the dilation-erosion operations greatly increases the number of possible decision boundaries. As a result, complex decision boundaries can be learned using a small number of parameters.

As an alternative to the max/min operations of dilation/erosion, using their soft versions retains the above-mentioned properties while making the operations differentiable.
The rest of the paper is organized as follows. Section 2 describes prior work on morphological neural networks. In Section 3, we introduce our proposed network and prove its capabilities theoretically. We further demonstrate its capabilities empirically on a few benchmark datasets in Section 4. Next, in Section 5 we discuss possible variants of our network, before concluding the paper in Section 6.
2 Related Work
The morphological neuron was first introduced by Davidson and Hummer (1993) in their effort to learn the structuring element of the dilation operation on images. A similar effort to learn structuring elements has been made in a more recent work by Masci et al. (2013). The use of morphological neurons in a more general setting was first proposed by Ritter and Sussner (1996). They restricted the network to a single-layer architecture and focused only on the binary classification task. To classify the data, these networks use two axis-parallel hyperplanes as the decision boundary. This single-layer architecture of Ritter and Sussner (1996) was extended to a two-layer architecture by Sussner (1998). The two-layer architecture is able to learn multiple axis-parallel hyperplanes, and therefore is able to solve arbitrary binary classification tasks. But, in general, the decision boundaries may not be axis-parallel; as a result, this two-layer network may need to learn a large number of hyperplanes to achieve good results. So, one natural extension is to incorporate the option to rotate the hyperplanes. Taking a cue from this idea, Barmpoutis and Ritter (2006) proposed to learn a rotation matrix that rotates the input before trying to classify the data using axis-parallel hyperplanes. In a separate work by Ritter et al. (2014), the use of the $L_1$ and $L_\infty$ norms has been proposed as a replacement of the max/min operations of dilation and erosion in order to smooth the decision boundaries.
Ritter and Urcid (2003) first introduced the dendritic structure of biological neurons to morphological neurons. This new structure creates hyperbox-based decision boundaries instead of hyperplanes. The authors proved that with hyperboxes any compact region can be approximated, and therefore any two-class classification problem can be solved. A generalization of this structure to the multiclass case has also been done by Ritter and Urcid (2007). Sussner and Esmi (2011) proposed a new type of structure called morphological perceptrons with competitive neurons, where the output is computed in a winner-take-all strategy. This is modelled using the argmax operator, which allows the network to learn more complex decision boundaries. Later, Sossa and Guevara (2014) proposed a new strategy to train this model with competitive neurons.
The non-differentiability of the max/min operations has forced researchers to propose specialized training procedures for their models. So, a separate line of research has attempted to modify these networks so that gradient descent based optimizers can be used for training. Pessoa and Maragos (2000) combined the classical perceptron with the morphological perceptron. The output of each node is taken as a convex combination of the classical and the morphological perceptrons. Although the max/min operation is not differentiable, they proposed a methodology to circumvent this problem and showed that the resulting network can perform complex classification tasks. Morphological neurons have also been employed for regression tasks. de A. Araújo (2012) utilized a network architecture similar to morphological perceptrons with competitive learning to forecast stock markets. The argmax operator is replaced with a linear function so that the network is able to produce forecasts; the use of a linear activation function enables the use of gradient descent for training, which is not possible with the argmax operator. For morphological neurons with dendritic structure, Zamora and Sossa (2017) proposed to replace the argmax operator with a softmax function. This overcomes the problem of gradient computation, so gradient descent can be employed to train the network while the hyperbox-based boundaries of the dendritic networks are retained.
3 Dense Morphological Network
In this section we introduce the basic components and structure of our network and establish its approximation power.
3.1 Dilation and Erosion neurons
Dilation and erosion are the two basic operations of our proposed network. Given an input $\mathbf{x} = (x_1, \dots, x_n) \in \mathbb{R}^n$ and a structuring element $\mathbf{s} = (s_0, s_1, \dots, s_n) \in \mathbb{R}^{n+1}$, the dilation ($\delta$) and erosion ($\varepsilon$) neurons compute the following two functions respectively:

$$\delta_{\mathbf{s}}(\mathbf{x}) = \max_{k \in \{0, 1, \dots, n\}} (x_k + s_k), \qquad (1)$$

$$\varepsilon_{\mathbf{s}}(\mathbf{x}) = \min_{k \in \{0, 1, \dots, n\}} (x_k - s_k), \qquad (2)$$

where $x_k$ denotes the $k$-th component of the vector $\mathbf{x}$, and $x_0 = 0$ is appended to the input to take care of the 'bias'. The only parameter involved in these morphological operations is the structuring element $\mathbf{s}$. Note that the erosion operation can also be written in the following form:

$$\varepsilon_{\mathbf{s}}(\mathbf{x}) = -\max_{k \in \{0, 1, \dots, n\}} (-x_k + s_k). \qquad (3)$$
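A minimal NumPy reading of the dilation and erosion neurons (a sketch; the function and variable names are ours, and we assume the structuring element carries one extra entry that plays the role of the bias):

```python
import numpy as np

def dilation(x, s):
    """Dilation neuron: max over k of (x_k + s_k), with x_0 = 0 for the bias."""
    x0 = np.concatenate(([0.0], x))   # append 0 so that s[0] acts as a bias
    return np.max(x0 + s)

def erosion(x, s):
    """Erosion neuron: min over k of (x_k - s_k), with x_0 = 0 for the bias."""
    x0 = np.concatenate(([0.0], x))
    return np.min(x0 - s)

x = np.array([1.0, -2.0, 3.0])        # input of dimension n = 3
s = np.array([0.5, 0.1, 0.2, -1.0])   # structuring element of length n + 1
print(dilation(x, s))   # max(0.5, 1.1, -1.8, 2.0) -> 2.0
print(erosion(x, s))    # min(-0.5, 0.9, -2.2, 4.0) -> -2.2
```

Note that erosion is indeed the negated dilation of the reflected input, matching the rewritten form above.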
3.2 Network Structure
The Dense Morphological Net, or 'DenMoNet' in short, that we propose here is a simple feed-forward network with some dilation and erosion neurons followed by a linear combination (Figure 1). We call the layer of dilation and erosion neurons the dilation-erosion layer, and the following layer the linear combination layer. Let us assume the dilation-erosion layer contains $n$ dilation neurons and $m$ erosion neurons, followed by $p$ neurons in the linear combination layer. Let $\mathbf{x} \in \mathbb{R}^d$ be the input to the network, and let $z_k^+$ and $z_k^-$ be the outputs of the $k$-th dilation neuron and the $k$-th erosion neuron, respectively. Then we can write

$$z_k^+ = \delta_{\mathbf{a}_k}(\mathbf{x}), \qquad (4)$$

$$z_k^- = \varepsilon_{\mathbf{b}_k}(\mathbf{x}), \qquad (5)$$

where $\mathbf{a}_k$ and $\mathbf{b}_k$ are the structuring elements of the $k$-th dilation neuron and the $k$-th erosion neuron respectively. Note that $\mathbf{a}_k, \mathbf{b}_k \in \mathbb{R}^{d+1}$. The final output from a node of the linear combination layer is computed in the following way:

$$f(\mathbf{x}) = \sum_{k=1}^{n} \omega_k^+ z_k^+ + \sum_{k=1}^{m} \omega_k^- z_k^-, \qquad (6)$$

where $\omega_k^+$ and $\omega_k^-$ are the weights of the artificial neuron in the linear combination layer. In the following subsection we show that $f$ can approximate any continuous function.
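The forward pass of the three-layer network amounts to two vectorized max/min reductions followed by a matrix product (a sketch under the same conventions as above; the matrix shapes and names are our own):

```python
import numpy as np

def denmo_forward(x, A, B, W_plus, W_minus):
    """Outputs of the linear combination layer of DenMoNet.

    A : (n_dil, d+1) structuring elements of the dilation neurons
    B : (n_ero, d+1) structuring elements of the erosion neurons
    W_plus, W_minus : linear-layer weights for the two groups of neurons."""
    x0 = np.concatenate(([0.0], x))      # 0 appended to handle the bias
    z_plus  = np.max(x0 + A, axis=1)     # dilation outputs
    z_minus = np.min(x0 - B, axis=1)     # erosion outputs
    return W_plus @ z_plus + W_minus @ z_minus

rng = np.random.default_rng(0)
x  = rng.normal(size=4)                  # a 4-dimensional input
A  = rng.normal(size=(3, 5))             # 3 dilation neurons
B  = rng.normal(size=(3, 5))             # 3 erosion neurons
Wp = rng.normal(size=(2, 3))             # 2 output nodes
Wm = rng.normal(size=(2, 3))
y = denmo_forward(x, A, B, Wp, Wm)
print(y.shape)                           # (2,)
```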
3.3 Function Approximation
Here we show that with a linear combination of dilations and erosions any continuous function can be approximated, and that the approximation error decreases as the number of neurons in the dilation-erosion layer increases. Before that we need to introduce some concepts.
Definition 1 ($k$-order hinge function)
(Wang and Sun, 2005) A $k$-order hinge function consists of $k+1$ hyperplanes continuously joined together. It is defined by the following equation:

$$h(\mathbf{x}) = \max\{\, \mathbf{x}^T \boldsymbol{\theta}_1 + \theta_{1,0}, \ \dots, \ \mathbf{x}^T \boldsymbol{\theta}_{k+1} + \theta_{k+1,0} \,\}. \qquad (7)$$

Definition 2 ($n$-order hinging hyperplanes (HH))
(Wang and Sun, 2005) An $n$-order hinging hyperplanes (HH) is defined as the sum of multi-order hinge functions as follows:

$$l(\mathbf{x}) = \sum_{i=1}^{M} \sigma_i h_i(\mathbf{x}), \qquad (8)$$

with $\sigma_i \in \{+1, -1\}$ and each $h_i$ a $k_i$-order hinge function, $k_i \le n$.
From Wang and Sun (2005) the following can be said about hinging hyperplanes.
Proposition 1
For any given positive integer $n$ and arbitrary continuous piecewise linear function $l : \mathbb{R}^n \to \mathbb{R}$, there exist finitely many, say $M$, positive integers $k_i \le n$ and corresponding $k_i$-order hinge functions $h_i$ such that

$$l(\mathbf{x}) = \sum_{i=1}^{M} \sigma_i h_i(\mathbf{x}), \quad \sigma_i \in \{+1, -1\}. \qquad (9)$$
This says that any continuous piecewise linear function of $n$ variables can be written as an HH, i.e. the sum of multi-order hinge functions. Now, to show that our network can approximate any continuous function, we show the following.
Lemma 1
$f$ is a sum of multi-order hinge functions.
The proof of this lemma is given in the supplementary document. There we show that $f$ can be written as the sum of hinge functions in the following form:

$$f(\mathbf{x}) = \sum_{i=1}^{M} \sigma_i h_i(\mathbf{x}), \qquad (10)$$

where $M = n + m$ (the number of neurons in the dilation-erosion layer), $\sigma_i \in \{+1, -1\}$, and the $h_i$'s are $d$-order hinge functions.
Proposition 2 (Stone-Weierstrass approximation theorem)
Let $C$ be a compact domain ($C \subset \mathbb{R}^d$) and $f : C \to \mathbb{R}$ a continuous function. Then for any $\epsilon > 0$ there exists a continuous piecewise linear function $g$ such that $|f(\mathbf{x}) - g(\mathbf{x})| < \epsilon$ for all $\mathbf{x} \in C$.
Theorem 1 (Universal approximation)
A single dilation-erosion layer followed by a linear combination layer can approximate any continuous function, provided there are enough nodes in the dilation-erosion layer.
Sketch of Proof. From Lemma 1 we know that our DenMoNet, with its layer of dilation and erosion neurons followed by a linear combination layer, computes $f$, which is a sum of multi-order hinge functions. From Proposition 1 we get that any continuous piecewise linear function can be written as a finite sum of multi-order hinge functions. From Proposition 2 we can say that any continuous function can be approximated arbitrarily well by a continuous piecewise linear function. If we increase the number of neurons in the dilation-erosion layer, the approximation error decreases. Therefore, we can say that a DenMoNet with enough dilation and erosion neurons can approximate any continuous function.
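The key step of Lemma 1 can be read off directly from the neuron definitions (a sketch in our notation; treating the linear-layer weights as absorbed into the hyperplane parameters is our simplification):

```latex
% Every term x_k + s_k of a dilation neuron is an (axis-parallel) hyperplane
% in x, so the neuron's output is d+1 hyperplanes continuously joined
% together, i.e. a d-order hinge function:
\delta_{\mathbf{s}}(\mathbf{x})
  = \max\{\, s_0,\; x_1 + s_1,\; \dots,\; x_d + s_d \,\}.
% By the rewritten form of erosion (equation 3), an erosion neuron is the
% negative of such a hinge function:
\varepsilon_{\mathbf{s}}(\mathbf{x})
  = -\max\{\, s_0,\; -x_1 + s_1,\; \dots,\; -x_d + s_d \,\}.
% The linear combination layer therefore outputs a signed sum of hinge
% functions (the weights scale the hyperplanes without changing the hinge
% form), which is exactly the hinging-hyperplanes form of equation 8.
```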
3.4 Learned Decision Boundary
The DenMoNet we have defined above learns the following function:

$$f(\mathbf{x}) = \sum_{i=1}^{M} \sigma_i h_i(\mathbf{x}), \qquad (11)$$

where each $h_i$ is a collection of multiple hyperplanes joined together. Therefore the number of hyperplanes learned by a network with $l$ neurons in the dilation-erosion layer is much more than $l$. Each morphological neuron allows only one of its inputs to pass through, because of the max (or min) operation taken after combination with the structuring element. So, effectively each neuron in the dilation-erosion layer chooses one component of the $d$-dimensional input vector. Depending on which components are chosen, the final linear combination layer computes a hyperplane using either all the components of the input or only some of them (when a subset of the input components is chosen more than once in the dilation-erosion layer). Note that this choice depends on the input and the structuring element together. For a network with $d$-dimensional input data and $l$ neurons in the dilation-erosion layer, the number of hyperplanes that can theoretically be formed grows exponentially with $l$, although only a few of the possible planes can span anywhere in the $d$-dimensional space. Therefore, increasing the number of neurons in the dilation-erosion layer exponentially increases the possible number of hyperplanes, i.e., the decision boundaries. This implies that, using only a small number of neurons, complex decision boundaries can be learned.
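The component-selection behaviour described above can be checked numerically (a small sketch with hypothetical values; the names are ours):

```python
import numpy as np

def selected_component(x, s):
    """Index of the term that wins the max in a dilation neuron (0 = bias)."""
    x0 = np.concatenate(([0.0], x))
    return int(np.argmax(x0 + s))

s = np.array([0.0, 1.0, -1.0])                        # hypothetical structuring element
print(selected_component(np.array([2.0, 1.0]), s))    # 1: x_1 + 1 = 3 is the largest term
print(selected_component(np.array([-5.0, 4.0]), s))   # 2: x_2 - 1 = 3 is the largest term
```

The same structuring element passes through different components for different inputs, which is why the selection depends on the input and the structuring element together.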
3.5 Soft DenMoNet
Morphological neurons, i.e. dilation and erosion neurons, use the max and min operations respectively, which are only piecewise differentiable. Moreover, while back-propagating through a single morphological neuron, only a single component of the structuring element is updated, due to the max/min operation. To get a smooth approximation of max and min we follow Cook (2011) and redefine the dilation and erosion operations as follows:

$$\delta_{\mathbf{s}}^{\beta}(\mathbf{x}) = \frac{1}{\beta} \ln \sum_{k=0}^{n} e^{\beta (x_k + s_k)}, \qquad (12)$$

$$\varepsilon_{\mathbf{s}}^{\beta}(\mathbf{x}) = -\frac{1}{\beta} \ln \sum_{k=0}^{n} e^{\beta (-x_k + s_k)}, \qquad (13)$$

where $\beta > 0$ is the "hardness" of the soft maximum. As $\beta \to \infty$, equation 12 and equation 13 converge to equation 1 and equation 2 respectively. However, taking a very high value of $\beta$ may overflow a 32-bit floating point variable.
Please see the supplementary material for the proof of this convergence. We call the network with soft dilation and soft erosion neurons 'Soft DenMoNet'.
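The soft operations can be implemented with a numerically stable log-sum-exp, which sidesteps the overflow issue mentioned above (a sketch; the names are ours):

```python
import numpy as np

def soft_max(v, beta):
    """(1/beta) * log(sum(exp(beta * v))), shifted by max(v) for stability."""
    m = np.max(v)
    return m + np.log(np.sum(np.exp(beta * (v - m)))) / beta

def soft_dilation(x, s, beta):
    x0 = np.concatenate(([0.0], x))
    return soft_max(x0 + s, beta)          # equation 12

def soft_erosion(x, s, beta):
    x0 = np.concatenate(([0.0], x))
    return -soft_max(-(x0 - s), beta)      # equation 13: soft min via -softmax(-v)

# As beta grows, the soft operations approach the hard max/min.
x = np.array([1.0, -2.0, 3.0])
s = np.array([0.5, 0.1, 0.2, -1.0])
for beta in (1.0, 10.0, 100.0):
    print(beta, soft_dilation(x, s, beta), soft_erosion(x, s, beta))
```

Subtracting the maximum before exponentiating leaves the result unchanged but keeps every exponent non-positive, so large values of beta no longer overflow.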
4 Results
Here we empirically evaluate the performance of DenMoNet and Soft DenMoNet and demonstrate their advantages.
4.1 Baseline
As we have defined our network with 1D structuring elements, we compare our method with similarly structured fully-connected neural networks with various activation functions, like tanh (NN-tanh) and ReLU (NN-ReLU), and with the Maxout network (Goodfellow et al., 2013). We have particularly chosen the Maxout network for comparison because it uses the max function as a replacement of the activation function, but with added nodes to compute the maximum. The experiments have been carried out on benchmark datasets like MNIST (LeCun et al., 1998), Fashion-MNIST (Xiao et al., 2017), SVHN (Netzer et al., 2011), CIFAR-10 and Higgs (Baldi et al., 2014). At the beginning, however, an experiment is carried out on a toy dataset, consisting of data lying approximately on two concentric circles, for visualizing the decision boundaries. In our experiments with benchmark image datasets, each image is flattened in row-major order before it is fed into the network, so the network is unaware of the spatial structure of the image. For all the tasks we have used categorical cross-entropy as the loss, with the softmax function in the last layer. In the training phase, the network is optimized using the Adam optimizer (learning rate = 0.001, $\beta_1$ = 0.9, $\beta_2$ = 0.999) (Kingma and Ba, 2014). We have used Glorot uniform initialization (Glorot and Bengio, 2010) for initializing all the structuring elements and weights of the neural networks, and we have initialized all the biases to zero. We have used equal numbers of dilation and erosion neurons in the dilation-erosion layer unless otherwise stated.
Table 1: Training accuracy on the toy dataset with two hidden neurons.

Methods                 # Parameters   Training accuracy (%)
NN-ReLU                 12             69.30
Maxout Network (h=2)    18             90.02
DenMoNet                12             94.10
Soft DenMoNet ()        12             97.3
4.2 Visualization with a toy dataset
For visualizing the decision boundaries learned by the classifiers, we have generated data on two concentric circles, belonging to two different classes, with centre at the origin. We compare the results when only two neurons are taken in the hidden layer of all the networks. It is observed that the baseline neural network fails to classify this data with two hidden neurons, as it learns one hyperplane per hidden neuron. The boundary learned by the network with the ReLU activation function (NN-ReLU) is shown in figure 1(a). The result of the Maxout network is better (90.02% training accuracy) in this case, as it introduces extra parameters with the max function to achieve nonlinearity. In the maxout layer we have taken the maximum among features. As we see in figure 1(b), the network learns 4 hyperplanes when trying to classify these data. For the same data and two morphological neurons in the dilation-erosion layer, our DenMoNet has learned 6 lines to form the decision boundary (figure 1(c)). Although from equation 11 we can say that we can get at most 8 distinct lines, only two of them can be placed anywhere in the 2D space while the others are parallel to the axes. For this reason, we get two slanted lines while the remaining lines are parallel to the axes.
We have also shown the decision boundary learned by Soft DenMoNet (figure 1(d)). We have taken a hardness constant for Soft DenMoNet. We see that the decision boundary learned by Soft DenMoNet is smooth; hence it attains a perfect decision boundary on the circle data with only two hidden morphological neurons.
The classification accuracy achieved by the networks along with their number of parameters is reported in Table 1. The accuracy clearly reveals the efficacy of DenMoNet.
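The toy data can be generated as follows (a sketch; the radii and noise level are our assumptions, since the text does not state them):

```python
import numpy as np

def concentric_circles(n_per_class=200, radii=(1.0, 2.0), noise=0.1, seed=0):
    """Two noisy concentric circles centred at the origin, one per class."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for label, r in enumerate(radii):
        theta = rng.uniform(0.0, 2.0 * np.pi, n_per_class)
        rad = r + rng.normal(0.0, noise, n_per_class)   # jitter the radius
        X.append(np.column_stack((rad * np.cos(theta), rad * np.sin(theta))))
        y.append(np.full(n_per_class, label))
    return np.vstack(X), np.concatenate(y)

X, y = concentric_circles()
print(X.shape, y.shape)   # (400, 2) (400,)
```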
Table 2: Test accuracy (%) on MNIST and Fashion-MNIST.

Dataset         DenMoNet   Soft DenMoNet ()   State-of-the-art
MNIST           98.39      98.90              99.79 (Wan et al., 2013)
Fashion-MNIST   89.87      89.84              89.70 (Xiao et al., 2017)
Table 3: Test accuracy (%) on CIFAR-10 with different numbers of neurons (l) in the hidden layer.

Architecture        l=200 (parameters / accuracy)   l=400 (parameters / accuracy)   l=600 (parameters / accuracy)
NN-tanh             616,610 / 49.22                 1,233,210 / 51.02               1,849,810 / 52.18
NN-ReLU             616,610 / 48.89                 1,233,210 / 51.66               1,849,810 / 51.57
Maxout Network      1,231,210 / 50.87               2,462,410 / 51.12               3,693,610 / 52.19
DenMoNet            616,610 / 51.81                 1,233,210 / 54.13               1,849,810 / 54.60
Soft DenMoNet ()    616,610 / 52.52                 1,233,210 / 54.58               1,849,810 / 55.59
Table 4: Test accuracy (%) on SVHN with different numbers of neurons (l) in the hidden layer.

Architecture        l=200 (parameters / accuracy)   l=400 (parameters / accuracy)   l=600 (parameters / accuracy)
NN-tanh             616,610 / 75.24                 1,233,210 / 76.72               1,849,810 / 77.28
NN-ReLU             616,610 / 68.26                 1,233,210 / 75.46               1,849,810 / 77.33
Maxout Network      1,231,210 / 68.12               2,462,410 / 71.85               3,693,610 / 75.59
DenMoNet            616,610 / 71.87                 1,233,210 / 75.95               1,849,810 / 77.80
Soft DenMoNet ()    616,610 / 73.07                 1,233,210 / 75.68               1,849,810 / 78.24
4.3 Experiment on MNIST Dataset
The MNIST dataset (LeCun et al., 1998) contains grayscale images of handwritten digits (0–9) of size 28×28. It has 60,000 training images and 10,000 test images. Since our network is defined on one-dimensional input, we have converted each image to a column vector (in row-major order) before using it as input. The network we use follows the structure we have previously defined: an input layer, a dilation-erosion layer and a linear combination layer computing the output. As in this dataset we have to distinguish between 10 classes of images, 10 neurons are taken in the output layer. In Table 2 we have shown the accuracy on the test data after training the network for 150 epochs with different numbers of nodes in the dilation-erosion layer. We get average test accuracies of 98.39% and 98.90% after training 3 times, up to 400 epochs, with DenMoNet and Soft DenMoNet () respectively, using 200 dilation and 200 erosion neurons (Table 2). However, it may be noted that better preprocessing of the data may result in higher accuracy.
4.4 Experiment on FashionMNIST Dataset
The Fashion-MNIST dataset (Xiao et al., 2017) has been proposed with the aim of replacing the popular MNIST dataset. Similar to MNIST, it also contains images of 10 classes, with 60,000 training and 10,000 testing samples. While MNIST is still a popular choice for benchmarking classifiers, the authors claim that MNIST is too easy and does not represent modern computer vision tasks. This dataset aims to provide the accessibility of the MNIST dataset while posing a more challenging classification task. For the experiment, we have converted the images to column vectors, similar to what we have done for the MNIST dataset. We have taken 250 dilation and 250 erosion nodes in the dilation-erosion layer for this experiment. The only preprocessing we have done is to normalize the data to [0, 1]. We have trained the network separately 3 times, up to 300 epochs each time. The reported test accuracy (Table 2) is the average of the 3 runs. We see that our method gives results comparable with the state of the art.
4.5 Experiment on SVHN Dataset
The Street View House Numbers (SVHN) dataset (Netzer et al., 2011) is similar to the MNIST dataset in the sense that both of them contain images of digits; here the images are collected from house numbers in Google Street View images. Like MNIST, all the images are centred on a single character. But unlike MNIST, the images are not grayscale; they are color images. The dataset has around 73,257 training samples and 26,032 test samples. For the experiment we have flattened the images in row-major order and normalized them to [0, 1]. In Table 4 we have reported the test accuracy achieved by the different networks, along with their numbers of parameters. We see that the Maxout network, even with an increased number of parameters, performs the poorest. On the other hand, both DenMoNet and Soft DenMoNet perform close to the best performing classifier.
4.6 Experiment on CIFAR10 Dataset
CIFAR-10 (Krizhevsky and Hinton, 2009) is a natural image dataset with 10 classes. It has 50,000 training and 10,000 test images, each of which is a color image of size 32×32. The images are converted to column vectors before they are fed to the DenMoNet. For all the networks we compare with, the experiments have been conducted keeping the number of neurons in the hidden layer the same. Note that, in the Maxout network, each hidden neuron has two extra nodes over which the maximum is computed. In Table 3 we have reported the average test accuracy obtained over three runs of 150 epochs. It can be seen from the table that the DenMoNet variants achieve the best accuracy in all the cases. The Maxout network lags behind even with more parameters. This happens because our network is able to learn more hyperplanes with a number of parameters similar to standard artificial neural networks. However, using only a single type of morphological neuron in our network, we get a different result for this dataset (Figure 3). Soft DenMoNet () achieves an accuracy of % using morphological neurons.
4.7 Higgs Dataset
The Higgs dataset (Baldi et al., 2014) was built to benchmark the performance of neural networks in distinguishing the signal process producing Higgs bosons from the background process that does not. This is a synthetically generated dataset with 28 features commonly used by physicists to distinguish between the two. The dataset has 11 million data instances, out of which we have taken a random 80% as the training data and the rest as the test data. The features have been normalized between −1 and 1 before training. In Table 5 we have reported the performance of the networks. We see that DenMoNet performs better than Soft DenMoNet. However, the performance of all the networks is very similar.
Table 5: Test accuracy (%) on the Higgs dataset.

Architecture (l=200)   Accuracy
DenMoNet               73.23
Soft DenMoNet ()       72.84
NN-tanh                71.20
NN-ReLU                74.34
NN-Maxout              74.88
5 Discussions
5.1 Ablation Study
Theoretically, DenMoNet can act as a universal approximator with only dilation (or only erosion) neurons. But in practice, the presence of both dilation and erosion neurons in the dilation-erosion layer improves performance. To empirically justify this claim we have taken the help of the CIFAR-10 dataset. We have reported the change in test accuracy over the epochs for DenMoNet and Soft DenMoNet in figure 4 and figure 5 respectively. In the experiments with both networks, the total number of neurons in the dilation-erosion layer has been kept the same (1200). In both cases, we see that the use of both types of nodes results in a jump in performance, which is not attained even after several epochs when using only one type of neuron.
5.2 Stacking Multiple Layers
We have defined the network and have shown its properties when only three layers are employed in the network. Straightforward stacking of the layers that we may use in our network can give rise to two kinds of network.
 TypeI

Multiple dilationerosion layer followed by a single linear combination layer at the end.
 TypeII

Unit formed by a DilationErosion layer followed by a linear combination layer repeated multiple times.
For networks of Type-I, it can be argued that the network is performing some concatenation of opening and closing operations and, finally, a linear combination of their outputs. As there are dilation-erosion (DE) layers back to back, the problem of gradient propagation is amplified. As a result it takes much more time to train compared to the single-layer architecture (figure 7).
A similar explanation does not hold for Type-II networks. Type-II gives results similar to the single-hidden-layer DenMoNet, as shown in figure 6 and figure 7. However, on CIFAR-10 it highly overfits. We believe its understanding requires further exploration, as does the extension towards a 2D morphological network that takes a 2D image as input.
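The two stacking schemes can be sketched as follows (our own NumPy reading, reusing the dilation-erosion layer from Section 3; the shapes are illustrative, not the configurations used in the experiments):

```python
import numpy as np

def de_layer(x, A, B):
    """Dilation-erosion layer: concatenated dilation and erosion outputs.

    A, B : structuring elements, one row per neuron, of width len(x) + 1."""
    x0 = np.concatenate(([0.0], x))   # 0 appended for the bias term
    return np.concatenate((np.max(x0 + A, axis=1), np.min(x0 - B, axis=1)))

def type1_forward(x, de_params, W):
    """Type-I: several DE layers back to back, one linear layer at the end."""
    h = x
    for A, B in de_params:
        h = de_layer(h, A, B)
    return W @ h

def type2_forward(x, blocks):
    """Type-II: (DE layer + linear combination layer) units repeated."""
    h = x
    for A, B, W in blocks:
        h = W @ de_layer(h, A, B)
    return h

rng = np.random.default_rng(1)
x = rng.normal(size=4)
A1, B1 = rng.normal(size=(3, 5)), rng.normal(size=(3, 5))   # 4-dim input -> 6 outputs
A2, B2 = rng.normal(size=(3, 7)), rng.normal(size=(3, 7))   # 6-dim input -> 6 outputs
W  = rng.normal(size=(2, 6))    # final linear layer for Type-I
W1 = rng.normal(size=(6, 6))    # per-unit linear layers for Type-II
W2 = rng.normal(size=(2, 6))
print(type1_forward(x, [(A1, B1), (A2, B2)], W).shape)       # (2,)
print(type2_forward(x, [(A1, B1, W1), (A2, B2, W2)]).shape)  # (2,)
```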
6 Conclusion
In this paper we have proposed a new class of networks that use morphological neurons. These networks consist of three layers only: the input layer, the dilation-erosion layer, and the linear combination layer giving the output of the network. We have presented our analysis using this three-layer network only, but its deeper versions should be explored in the future. We have shown that, unlike a standard artificial neural network, this proposed three-layer architecture can approximate any smooth function without any activation function, provided there are enough dilation and erosion neurons. Second, the proposed networks are able to learn a large number of hyperplanes with very few neurons in the dilation-erosion layer, and thereby provide superior results compared to other networks with a three-layer architecture. In this work we have only worked with fully-connected layers, i.e. a node in a layer is connected to all the nodes in the previous layer. This type of connectivity is not very efficient for image data, where architectures with convolution layers perform better. So, extending this work to the case where a structuring element operates by sliding over the whole image should be the next logical step.
References
Baldi, P., Sadowski, P. and Whiteson, D. (2014). Searching for exotic particles in high-energy physics with deep learning. Nature Communications 5, 4308.
Barmpoutis, A. and Ritter, G. X. (2006). Orthonormal basis lattice neural networks. In 2006 IEEE International Conference on Fuzzy Systems, pp. 331–336.
Cook, J. D. (2011). Basic properties of the soft maximum. UT MD Anderson Cancer Center Department of Biostatistics Working Paper Series, Working Paper 70.
Davidson, J. L. and Hummer, F. (1993). Morphology neural networks: An introduction with applications. Circuits, Systems and Signal Processing 12(2), pp. 177–210.
de A. Araújo, R. (2012). A morphological perceptron with gradient-based learning for Brazilian stock market forecasting. Neural Networks 28, pp. 61–81.
Glorot, X. and Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 249–256.
Goodfellow, I. J., Warde-Farley, D., Mirza, M., Courville, A. and Bengio, Y. (2013). Maxout networks. In Proceedings of the 30th International Conference on Machine Learning (ICML'13), Atlanta, GA, USA, pp. 1319–1327.
Hornik, K. (1991). Approximation capabilities of multilayer feedforward networks. Neural Networks 4(2), pp. 251–257.
Ioffe, S. and Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning, pp. 448–456.
Isola, P., Zhu, J.-Y., Zhou, T. and Efros, A. A. (2017). Image-to-image translation with conditional adversarial networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5967–5976.
Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv:1412.6980.
Krizhevsky, A. and Hinton, G. (2009). Learning multiple layers of features from tiny images. Technical report, University of Toronto.
Krizhevsky, A., Sutskever, I. and Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, pp. 1097–1105.
LeCun, Y., Bottou, L., Orr, G. B. and Müller, K.-R. (2012). Efficient BackProp. In Neural Networks: Tricks of the Trade, 2nd edition, Lecture Notes in Computer Science, pp. 9–48.
LeCun, Y., Bottou, L., Bengio, Y. and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), pp. 2278–2324.
Long, J., Shelhamer, E. and Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3431–3440.
Masci, J., Angulo, J. and Schmidhuber, J. (2013). A learning framework for morphological operators using counter-harmonic mean. In International Symposium on Mathematical Morphology and Its Applications to Signal and Image Processing, pp. 329–340.
Mishkin, D., Sergievskiy, N. and Matas, J. (2017). Systematic evaluation of convolution neural network advances on the ImageNet. Computer Vision and Image Understanding 161, pp. 11–19.
Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B. and Ng, A. Y. (2011). Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning.
Pessoa, L. F. C. and Maragos, P. (2000). Neural networks with hybrid morphological/rank/linear nodes: a unifying framework with applications to handwritten character recognition. Pattern Recognition 33(6), pp. 945–960.
Ritter, G. X. and Sussner, P. (1996). An introduction to morphological neural networks. In Proceedings of the 13th International Conference on Pattern Recognition, Vol. 4, pp. 709–717.
Ritter, G. X. et al. (2014). Two lattice metrics dendritic computing for pattern recognition. In 2014 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp. 45–52.
Ritter, G. X. and Urcid, G. (2003). Lattice algebra approach to single-neuron computation. IEEE Transactions on Neural Networks 14(2), pp. 282–295.
Ritter, G. X. and Urcid, G. (2007). Learning in lattice neural networks that employ dendritic computing. In Computational Intelligence Based on Lattice Theory, Studies in Computational Intelligence, pp. 25–44.
Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review 65(6), pp. 386–408.
Sossa, H. and Guevara, E. (2014). Efficient training for dendrite morphological neural networks. Neurocomputing 131, pp. 132–142.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. and Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15(1), pp. 1929–1958.
Sussner, P. (1998). Morphological perceptron learning. In Proceedings of the 1998 IEEE International Symposium on Intelligent Control (ISIC), held jointly with CIRA and ISAS, pp. 477–482.
Sussner, P. and Esmi, E. L. (2011). Morphological perceptrons with competitive learning: Lattice-theoretical framework and constructive learning algorithm. Information Sciences 181(10), pp. 1929–1950.
Wan, L., Zeiler, M., Zhang, S., LeCun, Y. and Fergus, R. (2013). Regularization of neural networks using DropConnect. In International Conference on Machine Learning, pp. 1058–1066.
Wang, S. and Sun, X. (2005). Generalization of hinging hyperplanes. IEEE Transactions on Information Theory 51(12), pp. 4425–4431.
Xiao, H., Rasul, K. and Vollgraf, R. (2017). Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747.
Zamora, E. and Sossa, H. (2017). Dendrite morphological neurons trained by stochastic gradient descent. Neurocomputing 260, pp. 420–431.