Interpretable neural networks based on continuous-valued logic and multicriteria decision operators
Combining neural networks with continuous logic and multicriteria decision-making tools can reduce the black-box nature of neural models. In this study, we show that nilpotent logical systems offer an appropriate mathematical framework for a hybridization of continuous nilpotent logic and neural models, helping to improve the interpretability and safety of machine learning. In our concept, perceptrons model soft inequalities, namely membership functions and continuous logical operators. We design the network architecture before training, using continuous logical operators and multicriteria decision tools with given weights working in the hidden layers. Designing the structure appropriately leads to a drastic reduction in the number of parameters to be learned. The theoretical basis offers a straightforward choice of activation functions (the cutting function or its differentiable approximation, the squashing function), and also suggests an explanation for the great success of the rectified linear unit (ReLU). In this study, we focus on the architecture of a hybrid model and introduce the building blocks for future application in deep neural networks. The concept is illustrated with some toy examples taken from an extended version of the TensorFlow Playground.
Keywords: neural network, XAI, continuous logic, nilpotent logic, adversarial problems
AI techniques, especially deep learning models, are revolutionizing the business and technology world. One of the greatest challenges is the increasing need to address the problem of interpretability and to improve model transparency, performance and safety. Although deep neural networks have achieved impressive experimental results, e.g. in image classification, they may surprisingly be unstable when it comes to adversarial perturbations, that is, minimal changes to the input image that cause the network to misclassify it adv1 (); adv2 (); adv3 (); adv (). Combining deep neural networks with structured logical rules and multicriteria decision tools, where logical operators are applied on clusters created in the first layer, contributes to reducing the black-box nature of neural models. Aiming at interpretability, transparency and safety, implementing continuous-valued logical operators offers a promising direction.
Although Boolean units and multilayer perceptrons have a long history, to the best of our knowledge there have been few attempts to combine neural networks with continuous logical systems so far. The basic idea of continuous logic is the replacement of the space of truth values $\{0,1\}$ by a compact interval such as $[0,1]$. This means that the inputs and the outputs of the extended logical gates are real numbers of the unit interval, representing truth values of inequalities. Quantifiers $\forall x$ and $\exists x$ are replaced by $\inf_x$ and $\sup_x$, respectively, and logical connectives are continuous functions. Based on this idea, human thinking and natural language can be modeled in a sophisticated way.
Among other families of many-valued logics, t-norm fuzzy logics are broadly used in applied fuzzy logic and fuzzy set theory as a theoretical basis for approximate reasoning. In fuzzy logic, the membership function of a fuzzy set represents the degree of truth as a generalization of the indicator function in classical sets. Both propositional and first-order (or higher-order) t-norm fuzzy logics, as well as their expansions by modal and other operators, have been studied thoroughly. Important examples of t-norm fuzzy logics are monoidal t-norm logic of all left-continuous t-norms, basic logic of all continuous t-norms, product fuzzy logic of the product t-norm, or the nilpotent minimum logic of the nilpotent minimum t-norm. Some independently motivated logics belong among t-norm fuzzy logics as well, like Łukasiewicz logic (which is the logic of the Łukasiewicz t-norm) and Gödel-Dummett logic (which is the logic of the minimum t-norm).
Recent results bounded (); boundedeq (); boundedimpl (); aggr (); ijcci (); iwobi () show that, in the field of continuous logic, nilpotent logical systems are the most suitable for neural computation, mainly because of their bounded generator functions. Moreover, among other preferable properties, the fulfillment of the law of contradiction and the law of the excluded middle, and the coincidence of the residual and the S-implication Dubois (); Trillasimpl () also make the application of nilpotent operators in logical systems promising. In bounded (); boundedeq (); boundedimpl (); aggr (); ijcci (); iwobi () a rich set of operators was examined thoroughly: negations, conjunctions and disjunctions in bounded (), implications in boundedimpl (), and equivalence operators in boundedeq (). In aggr (), a parametric form of a general operator was given by using a shifting transformation of the generator function. Varying the parameters, nilpotent conjunctive, disjunctive, aggregative (where a high input can compensate for a lower one) and negation operators can all be obtained. Moreover, as was shown in ijcci (), membership functions, which play a substantial role in the overall performance of fuzzy representation, can also be defined by means of a generator function.
In this study, we introduce a nilpotent neural model, where nilpotent logical operators and multicriteria decision tools are implemented in the hidden layers of neural networks (see Figure 3). Only the weights of the first layer (parameters of hyperplanes separating the decision space) are to be learned, while the architecture is designed in advance. In a more sophisticated version, left for future work, the type of the operators in the hidden layers can also be learned by the network, or e.g. by a genetic algorithm. Moreover, in ijcci () the authors showed that the most important logical operators can be expressed as a composition of a parametric unary operator and the arithmetic mean. This means that the neural network only needs to learn the parameters of the first layer and (if not initially given) the parameters of these unary operators in the hidden layers.
In the nilpotent neural model, the activation functions in the first layer are membership functions representing truth values of inequalities, normalizing the inputs. At the same time, the activation functions in the hidden layers model the cutting function (or, to avoid the vanishing gradient problem, its differentiable approximation, the so-called squashing function) in the nilpotent logical operators. The theoretical background offers a straightforward choice of activation function: the squashing function, which is an approximation of the rectifier. The fact that the squashing function, in contrast to the rectifier, is bounded from above makes the continuous logical concept applicable.
The article is organized as follows. After summarizing the most important related work in Section 2, we revisit the relevant preliminaries concerning nilpotent logical systems in Section 3. The nilpotent neural concept is described in Section 4. In Section 5, the model is illustrated with some extended tensorflow playground examples. Finally, the main results are summarized in Section 6.
2 Related Work
Combinations of neural networks and logic rules have been considered in different contexts. Neuro-fuzzy systems neurofuzzy () were examined thoroughly in the literature. These hybrid intelligent systems synergize the human-like reasoning style of fuzzy systems with the learning structure of neural networks through the use of fuzzy sets and a linguistic model consisting of a set of IF-THEN fuzzy rules. These models were the first attempts to combine continuous logical elements and neural computation.
Kulkarni et al. Kulkarni () used a specialized training procedure to obtain an interpretable neural layer of an image network.
In harness (), Hu et al. proposed a general framework capable of enhancing various types of neural networks (e.g., CNNs and RNNs) with declarative first-order logic rules. Specifically, they developed an iterative distillation method that transfers the structured information of logic rules into the weights of neural networks. With a few highly intuitive rules, they obtained substantial improvements and achieved state-of-the-art or comparable results to previous best-performing systems.
In xu (), Xu et al. developed a novel methodology for using symbolic knowledge in deep learning by deriving a semantic loss function that bridges between neural output vectors and logical constraints. This loss function captures how close the neural network is to satisfying the constraints on its output.
In dl2 (), Fischer et al. presented DL2, a system for training and querying neural networks with logical constraints. Using DL2, one can declaratively specify domain knowledge constraints to be enforced during training, as well as pose queries on the model to find inputs that satisfy a set of constraints. DL2 works by translating logical constraints into a loss function with desirable mathematical properties. The loss is then minimized with standard gradient-based methods.
All of these promising approaches point towards the desirable mathematical framework that nilpotent logical systems can offer. Our aim here is to provide a general mathematical framework that benefits from a tight integration of machine learning and continuous logical methods.
3 Nilpotent Logical Systems and Multicriteria Decision Tools
In this section, we show why a specific logical system, the nilpotent logical system, is well-suited to the neural environment. First, we provide some basic preliminaries.
The most important operators in classical logic are the conjunction, the disjunction and the negation operator. These three basic operators together form a so-called connective system. When extending classical logic to continuous logic, compatibility and consistency are crucial. The negation should also be involutive, i.e. $n(n(x)) = x$ for $x \in [0,1]$. Involutive negations are called strong negations.
The triple $(c, d, n)$, where $c$ is a t-norm, $d$ is a t-conorm and $n$ is a strong negation, is called a connective system.
As mentioned in the Introduction, numerous continuous logical systems have been introduced and studied in the literature. In this study, we will show how nilpotent logical systems relate to neural networks.
bounded () A connective system is nilpotent if the conjunction $c$ is a nilpotent t-norm and the disjunction $d$ is a nilpotent t-conorm.
In the nilpotent case, the generator functions of the disjunction and the conjunction (denoted by $s$ and $t$, respectively) are bounded functions, determined up to a multiplicative constant. This means that they can be normalized in the following way:
$$f_d(x) := \frac{s(x)}{s(1)}, \qquad f_c(x) := \frac{t(x)}{t(0)}.$$
Note that the normalized generator functions are now uniquely defined.
Next, we recall the definition of the cutting function, which simplifies the notation. The differentiable approximation of the cutting function, the squashing function introduced and examined in Gera (), will be a ReLU-like bounded activation function in our model. In aggr (), the authors showed that all the nilpotent operators can be described by using one generator function and the cutting function.
Let us define the cutting operation $[\,\cdot\,]$ by
$$[x] := \begin{cases} 0 & \text{if } x < 0, \\ x & \text{if } 0 \le x \le 1, \\ 1 & \text{if } 1 < x. \end{cases}$$
Note that the cutting function has the same values as ReLU (rectified linear unit) for $x \le 1$, but it remains bounded for $x > 1$.
With the help of the cutting operator, we can write the conjunction and the disjunction in the following form, where $f_c$ and $f_d$ are decreasing and increasing normalized generator functions, respectively:
$$c(x,y) = f_c^{-1}\big[f_c(x) + f_c(y)\big], \qquad d(x,y) = f_d^{-1}\big[f_d(x) + f_d(y)\big].$$
For the natural negations $n_c(x) = f_c^{-1}\big(1 - f_c(x)\big)$ and $n_d(x) = f_d^{-1}\big(1 - f_d(x)\big)$ to coincide, as shown in bounded (), $f_c(x) + f_d(x) = 1$ must hold for all $x \in [0,1]$, which means that only one generator function, e.g. $f_d$, is needed to describe the operators. Henceforth, $f_d$ is represented by $f$.
Note that the $\min$ and $\max$ operators (often used as conjunction and disjunction in applications) can also be expressed by $f$ in the following way:
$$\min(x,y) = f^{-1}\big[f(x) - [f(x) - f(y)]\big], \qquad \max(x,y) = f^{-1}\big[f(x) + [f(y) - f(x)]\big].$$
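These formulas can be checked numerically. The sketch below is a minimal illustration, assuming the generator $f$ is supplied together with its inverse; the names `cut` and `make_operators` are hypothetical. The conjunction is written in the equivalent form $f^{-1}[f(x)+f(y)-1]$, obtained by substituting $f_c = 1 - f$. With $f(x) = x$ it reproduces the Łukasiewicz operators, and with any admissible generator it lets one observe the law of contradiction and the excluded middle mentioned in the Introduction.

```python
import math


def cut(x):
    """Cutting function [x]: 0 below 0, identity on [0, 1], 1 above 1."""
    return min(1.0, max(0.0, x))


def make_operators(f, f_inv):
    """Build nilpotent connectives from one increasing normalized generator f.

    With f_c = 1 - f, the conjunction f_c^{-1}[f_c(x) + f_c(y)] becomes
    f^{-1}[f(x) + f(y) - 1].
    """
    def conj(x, y):
        return f_inv(cut(f(x) + f(y) - 1.0))

    def disj(x, y):
        return f_inv(cut(f(x) + f(y)))

    def neg(x):
        # Natural negation n(x) = f^{-1}(1 - f(x)); cut guards against rounding.
        return f_inv(cut(1.0 - f(x)))

    def min_(x, y):
        return f_inv(cut(f(x) - cut(f(x) - f(y))))

    def max_(x, y):
        return f_inv(cut(f(x) + cut(f(y) - f(x))))

    return conj, disj, neg, min_, max_


# Lukasiewicz operators: f(x) = x.
luk = make_operators(lambda x: x, lambda u: u)
# Another admissible generator: f(x) = x**2 on [0, 1].
sq = make_operators(lambda x: x * x, math.sqrt)

conj, disj, neg, min_, max_ = sq
for x in (0.0, 0.2, 0.5, 0.9):
    # contradiction: approximately 0; excluded middle: approximately 1
    print(x, conj(x, neg(x)), disj(x, neg(x)))
```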
The associativity of t-norms and t-conorms permits us to consider their extensions to the multivariable case. In aggr (), the authors examined a general parametric operator of nilpotent systems.
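Multicriteria Decision Tools
Let $f: [0,1] \to [0,1]$ be an increasing bijection, $\nu \in [0,1]$, and $\mathbf{x} = (x_1, \dots, x_n)$, where $x_i \in [0,1]$, and let us define the general operator by
$$o_\nu(\mathbf{x}) := f^{-1}\Big[\sum_{i=1}^{n}\big(f(x_i) - f(\nu)\big) + f(\nu)\Big]. \tag{6}$$
Note that the general operator is conjunctive for $\nu = 1$, disjunctive for $\nu = 0$, and self-dual for $\nu = \nu_*$, the fixed point of the negation (i.e. $f(\nu_*) = 1/2$).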
On the basis of Remark 8, the conjunction, the disjunction and the aggregative operator can be defined in the following way.
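Let $f: [0,1] \to [0,1]$ be an increasing bijection, and let $\mathbf{x} = (x_1, \dots, x_n)$, where $x_i \in [0,1]$. Let us define the conjunction, the disjunction and the aggregative operator by
$$c(\mathbf{x}) := o_1(\mathbf{x}) = f^{-1}\Big[\sum_{i=1}^{n} f(x_i) - (n-1)\Big], \tag{7}$$
$$d(\mathbf{x}) := o_0(\mathbf{x}) = f^{-1}\Big[\sum_{i=1}^{n} f(x_i)\Big], \tag{8}$$
$$a(\mathbf{x}) := o_{\nu_*}(\mathbf{x}) = f^{-1}\Big[\sum_{i=1}^{n} f(x_i) - \frac{n-1}{2}\Big], \tag{9}$$
respectively, where $f(\nu_*) = 1/2$.
A conjunction, a disjunction and an aggregative operator differ only in one parameter of the general operator in (6). The parameter $\nu$ has the semantic meaning of the level of expectation: maximal for the conjunction, neutral for the aggregation and minimal for the disjunction. Next, let us recall the weighted form of the general operator: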
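The role of $\nu$ as a level of expectation can be seen in a short sketch of (6). It is an illustration only, assuming $f(x) = x$ by default (so that $\nu_* = 1/2$); the function name `general_operator` is hypothetical. For the inputs $(0.6, 0.3)$ the conjunction, the aggregation and the disjunction return 0, 0.4 and 0.9, respectively, so the output grows as the expectation $\nu$ decreases.

```python
def cut(x):
    """Cutting function [x] = min(1, max(0, x))."""
    return min(1.0, max(0.0, x))


def general_operator(xs, nu, f=lambda x: x, f_inv=lambda u: u):
    """General nilpotent operator o_nu(x) = f^{-1}[sum_i (f(x_i) - f(nu)) + f(nu)].

    nu = 1 gives the conjunction, nu = 0 the disjunction, and nu = nu_*
    (with f(nu_*) = 1/2) the self-dual aggregative operator.
    """
    f_nu = f(nu)
    return f_inv(cut(sum(f(x) - f_nu for x in xs) + f_nu))


xs = [0.6, 0.3]
for label, nu in [("conjunction", 1.0), ("aggregation", 0.5), ("disjunction", 0.0)]:
    print(label, round(general_operator(xs, nu), 4))
```

In the aggregation, a high input can compensate for a lower one, e.g. $a(0.9, 0.2) = [0.9 + 0.2 - 0.5] = 0.6$.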
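Let $f: [0,1] \to [0,1]$ be an increasing bijection with $\nu \in [0,1]$, $\mathbf{w} = (w_1, \dots, w_n)$, where $w_i \in \mathbb{R}^+$, and $\mathbf{x} = (x_1, \dots, x_n)$, where $x_i \in [0,1]$. The weighted general operator is defined by
$$o_{\nu,\mathbf{w}}(\mathbf{x}) := f^{-1}\Big[\sum_{i=1}^{n} w_i\big(f(x_i) - f(\nu)\big) + f(\nu)\Big].$$
Note that if the weight vector is normalized, i.e. for $\sum_{i=1}^{n} w_i = 1$,
$$o_{\nu,\mathbf{w}}(\mathbf{x}) = f^{-1}\Big(\sum_{i=1}^{n} w_i f(x_i)\Big),$$
which is a weighted quasi-arithmetic mean and no longer depends on $\nu$.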
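A quick numerical check of this remark follows. It is a sketch under stated assumptions: the generator $f(x) = x^2$, the two criteria values $(0.2, 0.6)$ and the weights $(1/4, 3/4)$ are hypothetical choices, and `weighted_general_operator` is an illustrative name. For every $\nu$ the operator returns $\sqrt{0.25 \cdot 0.04 + 0.75 \cdot 0.36} = \sqrt{0.28}$.

```python
import math


def cut(x):
    """Cutting function [x] = min(1, max(0, x))."""
    return min(1.0, max(0.0, x))


def weighted_general_operator(xs, ws, nu, f, f_inv):
    """o_{nu,w}(x) = f^{-1}[sum_i w_i (f(x_i) - f(nu)) + f(nu)]."""
    f_nu = f(nu)
    return f_inv(cut(sum(w * (f(x) - f_nu) for x, w in zip(xs, ws)) + f_nu))


# Hypothetical example: two criteria weighted 1/4 and 3/4, generator f(x) = x**2.
f, f_inv = (lambda x: x * x), math.sqrt
xs, ws = [0.2, 0.6], [0.25, 0.75]
quasi_mean = f_inv(sum(w * f(x) for x, w in zip(xs, ws)))  # sqrt(0.28)
for nu in (0.0, 0.3, 0.5, 1.0):
    print(nu, weighted_general_operator(xs, ws, nu, f, f_inv), quasi_mean)
```

With non-normalized weights, e.g. $w = (1, 1)$, the term $f(\nu)$ no longer cancels and the result depends on $\nu$ again.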
For future application, we introduce a threshold-based operator in the following way.
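Let $\mathbf{w} \in \mathbb{R}^n$, $\mathbf{x} \in [0,1]^n$, $C \in \mathbb{R}$, and let $f: [0,1] \to [0,1]$ be a strictly increasing bijection. Let us define the threshold-based nilpotent operator by
$$o_{\mathbf{w},C}(\mathbf{x}) := f^{-1}\Big[\sum_{i=1}^{n} w_i f(x_i) - C\Big]. \tag{12}$$
Note that for $f(x) = x$, (12) gives the functions modeled by perceptrons in neural networks, with the cutting function as activation:
$$o_{\mathbf{w},C}(\mathbf{x}) = \big[\mathbf{w}^T\mathbf{x} - C\big].$$
Based on Equations (7) to (9), it is easy to see that the conjunction, the disjunction and also the aggregative operator can be expressed in this form (with $w_i = 1$ and $C = n-1$, $C = 0$ and $C = (n-1)/2$, respectively). The most commonly used operators for $n = 2$ and for special values of $w_i$ and $C$, also for $f(x) = x$, are listed in Table 1.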
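The following sketch writes (12) as a perceptron and recovers (7) to (9) from it. It assumes $f(x) = x$ by default, and the helper names are hypothetical. The three operators share the weight vector $(1, \dots, 1)$ and differ only in the threshold $C$. As a further illustration (not taken from Table 1), the weights $(-1, 1)$ with $C = -1$ give the Łukasiewicz implication $[1 - x + y]$.

```python
def cut(x):
    """Cutting function [x] = min(1, max(0, x))."""
    return min(1.0, max(0.0, x))


def threshold_operator(xs, ws, C, f=lambda x: x, f_inv=lambda u: u):
    """o_{w,C}(x) = f^{-1}[sum_i w_i f(x_i) - C].

    For f(x) = x this is a perceptron with weights w, bias -C and the
    cutting function as activation.
    """
    return f_inv(cut(sum(w * f(x) for x, w in zip(xs, ws)) - C))


def conjunction(xs):
    return threshold_operator(xs, [1.0] * len(xs), len(xs) - 1)


def disjunction(xs):
    return threshold_operator(xs, [1.0] * len(xs), 0.0)


def aggregation(xs):
    return threshold_operator(xs, [1.0] * len(xs), (len(xs) - 1) / 2)


xs = [0.6, 0.3]
print(conjunction(xs), aggregation(xs), disjunction(xs))
# Lukasiewicz implication [1 - x + y] for x = 0.7, y = 0.2
print(threshold_operator([0.7, 0.2], [-1.0, 1.0], -1.0))
```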
Now let us focus on the unary (1-variable) case, examined in ijcci (), which also plays an important role in the nilpotent neural model. The unary operators are mainly used to construct modifiers and membership functions by using a generator function. The membership functions can be interpreted as modeling an inequality memeva (). Note that non-symmetrical membership functions can also be constructed by connecting two unary operators with a conjunction iwobi (); ijcci ().
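Let $w, C \in \mathbb{R}$, $x \in [0,1]$, and let $f: [0,1] \to [0,1]$ be a strictly increasing bijection. Then the unary operator is
$$o_{w,C}(x) = f^{-1}\big[w f(x) - C\big].$$
The most important unary operators for special values of $w$ and $C$ are listed in Table 2.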
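To make the link between unary operators and membership functions concrete, here is a minimal sketch, assuming the unary operator is the $n = 1$ case of (12) with $f(x) = x$. The helpers `soft_greater`, `soft_less` and `trapezoid`, and the thresholds and widths used, are hypothetical. A soft inequality "$x > t$" with a transition of width $\lambda$ centred at $t$ corresponds to $w = 1/\lambda$ and $C = t/\lambda - 1/2$; connecting a steep rising edge and a gentle falling edge with the nilpotent conjunction gives a non-symmetrical membership function, as described above.

```python
def cut(x):
    """Cutting function [x] = min(1, max(0, x))."""
    return min(1.0, max(0.0, x))


def unary_operator(x, w, C, f=lambda x: x, f_inv=lambda u: u):
    """Unary case o_{w,C}(x) = f^{-1}[w f(x) - C]."""
    return f_inv(cut(w * f(x) - C))


def soft_greater(x, threshold, width):
    """Truth value of 'x > threshold', linear transition of the given width
    centred at the threshold: w = 1/width, C = threshold/width - 1/2."""
    return unary_operator(x, 1.0 / width, threshold / width - 0.5)


def soft_less(x, threshold, width):
    """Truth value of 'x < threshold' (negative weight w = -1/width)."""
    return unary_operator(x, -1.0 / width, -threshold / width - 0.5)


def trapezoid(x):
    """Non-symmetrical membership function (hypothetical parameters):
    nilpotent conjunction [u + v - 1] of two unary operators."""
    u = soft_greater(x, 0.3, 0.2)  # rises on [0.2, 0.4]
    v = soft_less(x, 0.65, 0.5)    # falls on [0.4, 0.9]
    return cut(u + v - 1.0)


for x in (0.1, 0.3, 0.4, 0.65, 0.95):
    print(x, round(trapezoid(x), 3))
```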
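Our attention can now be turned to the cutting function. The main drawback of the cutting function in the nilpotent operator family is its lack of differentiability, which is necessary for numerous practical applications. Although most fuzzy applications (e.g. embedded fuzzy control) use piecewise linear membership functions owing to their easy handling, there are areas where the parameters are learned by a gradient-based optimization method. In this case, the lack of continuous derivatives makes the application impossible. For example, the membership functions have to be differentiable for each input in order to fine-tune a fuzzy control system by a simple gradient-based technique. This problem can be easily solved by using the so-called squashing function, a continuously differentiable approximation of the cutting function, defined for $a \in \mathbb{R}$, $\lambda > 0$ and $\beta > 0$ by
$$S^{(\beta)}_{a,\lambda}(x) := \frac{1}{\lambda\beta}\ln\frac{1 + e^{\beta(x - (a - \lambda/2))}}{1 + e^{\beta(x - (a + \lambda/2))}}.$$
By increasing the value of $\beta$, the squashing function approaches the generalized cutting function $[x]_{a,\lambda} := \big[\frac{x - a}{\lambda} + \frac{1}{2}\big]$. In other words, $\beta$ determines the accuracy of the approximation, while the parameters $a$ and $\lambda$ determine the center and the width. The error of the approximation can be upper bounded by $\ln 2/(\lambda\beta)$, which means that by increasing the parameter $\beta$, the error decreases by the same order of magnitude. The derivatives of the squashing function are easy to calculate and can be expressed by the sigmoid functions $\sigma^{(\beta)}_{d}(x) = 1/\big(1 + e^{-\beta(x - d)}\big)$ and the squashing function itself:
$$\frac{\partial S^{(\beta)}_{a,\lambda}(x)}{\partial x} = \frac{1}{\lambda}\Big(\sigma^{(\beta)}_{a-\lambda/2}(x) - \sigma^{(\beta)}_{a+\lambda/2}(x)\Big),$$
$$\frac{\partial S^{(\beta)}_{a,\lambda}(x)}{\partial a} = \frac{1}{\lambda}\Big(\sigma^{(\beta)}_{a+\lambda/2}(x) - \sigma^{(\beta)}_{a-\lambda/2}(x)\Big),$$
$$\frac{\partial S^{(\beta)}_{a,\lambda}(x)}{\partial \lambda} = \frac{1}{2\lambda}\Big(\sigma^{(\beta)}_{a-\lambda/2}(x) + \sigma^{(\beta)}_{a+\lambda/2}(x)\Big) - \frac{1}{\lambda}S^{(\beta)}_{a,\lambda}(x).$$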
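The sketch below evaluates the squashing function, compares it with the generalized cutting function, and checks the sigmoid form of $\partial S/\partial x$. It is an illustration only: the function names are hypothetical, and `softplus` and `sigmoid` are written in the usual numerically stable way so that large $\beta$ does not overflow `math.exp`. The printed maximal error stays below the bound $\ln 2/(\lambda\beta)$ and shrinks as $\beta$ grows.

```python
import math


def softplus(z):
    """Numerically stable ln(1 + e^z)."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^{-z})."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cut_general(x, a, lam):
    """Generalized cutting function [x]_{a,lam} = [(x - a)/lam + 1/2]."""
    return min(1.0, max(0.0, (x - a) / lam + 0.5))


def squash(x, a, lam, beta):
    """S^(beta)_{a,lam}(x) = ln[(1 + e^{beta(x-a+lam/2)}) / (1 + e^{beta(x-a-lam/2)})] / (lam*beta)."""
    return (softplus(beta * (x - a + lam / 2)) - softplus(beta * (x - a - lam / 2))) / (lam * beta)


def squash_dx(x, a, lam, beta):
    """dS/dx expressed as a difference of two shifted sigmoids."""
    return (sigmoid(beta * (x - a + lam / 2)) - sigmoid(beta * (x - a - lam / 2))) / lam


grid = [i / 100 for i in range(-50, 151)]
for beta in (5.0, 20.0, 80.0):
    err = max(abs(squash(x, 0.5, 1.0, beta) - cut_general(x, 0.5, 1.0)) for x in grid)
    print(beta, err, math.log(2) / beta)  # observed max error vs. ln 2/(lam*beta)
```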
The squashing function defined above is an approximation of the rectifier (rectified linear unit, ReLU) for $x \le 1$, with the benefit of having an upper bound. Being bounded from above makes the use of continuous logic possible. Also note the significant difference between the properties of the squashing function and the sigmoid: using sigmoids, nilpotent logic can never be modeled. On the other hand, the fact that ReLU can approximate the cutting function may offer an explanation for its effectiveness and success. Since the squashing function is differentiable and its derivatives can be expressed by sigmoids, it is also efficient in applications. An illustration of the nilpotent conjunction and disjunction operators with their soft approximations using the squashing function is shown in Figure 1. Note that not only logical operators, but also multicriteria decision tools, like the preference operator, can be described similarly. This means that our model offers a unified framework, in which logic and multicriteria decision tools cooperate and supplement each other.
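One way to see the ReLU connection is the identity $[x] = \mathrm{ReLU}(x) - \mathrm{ReLU}(x - 1)$. Replacing each ReLU by the softplus $\ln(1 + e^{\beta x})/\beta$ gives exactly $S^{(\beta)}_{1/2,1}(x)$, which in turn yields soft versions of the nilpotent conjunction and disjunction in the spirit of Figure 1. The sketch below is our own illustration of this observation; the function names and the default $\beta = 20$ are hypothetical.

```python
import math


def relu(x):
    return max(0.0, x)


def cut(x):
    """The cutting function written with two ReLUs: [x] = ReLU(x) - ReLU(x - 1)."""
    return relu(x) - relu(x - 1.0)


def smooth_relu(x, beta):
    """Softplus ln(1 + e^{beta x}) / beta, a smooth approximation of ReLU."""
    z = beta * x
    return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / beta


def squash01(x, beta):
    """S^(beta)_{1/2,1}(x): the same difference, built from smooth ReLUs."""
    return smooth_relu(x, beta) - smooth_relu(x - 1.0, beta)


def soft_conj(x, y, beta=20.0):
    """Soft nilpotent conjunction, approximating [x + y - 1]."""
    return squash01(x + y - 1.0, beta)


def soft_disj(x, y, beta=20.0):
    """Soft nilpotent disjunction, approximating [x + y]."""
    return squash01(x + y, beta)


print(soft_conj(0.9, 0.8), cut(0.9 + 0.8 - 1.0))
print(soft_disj(0.3, 0.4), cut(0.3 + 0.4))
```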
4 Nilpotent Logic-based Interpretation of Neural Networks
The results on nilpotent logical systems discussed in Section 3 offer a new approach to designing neural networks using continuous logic, since membership functions (representing the truth value of an inequality) and also nilpotent logical operators can be modeled by perceptrons. Whether for image classification or for multicriteria decision support, structured logical rules can contribute to the performance of a deep neural network. Given that the network has to find a region in the decision space or in an image, after designing the architecture appropriately, the network only has to find the parameters of the boundary. Here, we propose creating basic building blocks by applying the nilpotent logical concept in the perceptron model and also in the neural architecture.
Boolean units and multilayer perceptrons have a long history. Logical gates (such as the AND, NOT and OR gates) are the basis of any modern-day computer. It is well known that any Boolean function can be composed using a multilayer perceptron. As examples, the conjunction and the disjunction are illustrated in Figure 6. Note that for the XOR gate, an additional hidden layer is required. It can be shown that a network of linear classifiers that fires if the input lies in a given area with arbitrarily complex decision boundaries can be constructed with only one hidden layer and a single output. This means that if a neural network learns to separate different regions in the $n$-dimensional space having $n$ input values, each node in the first layer can separate the space into two half-spaces by drawing one hyperplane, while the nodes in the hidden layers can combine them using logical operators.
In Figure 5, some basic types of neural networks with two input values are shown, finding different regions of the plane. Generally speaking, each node in the first layer represents one threshold and can therefore draw one line in the picture. The line may be diagonal if the node receives both of the inputs $x_1$ and $x_2$; it has to be horizontal or vertical if the node receives only one of the inputs. The deeper hidden layers are responsible for the logical operations.
From several perspectives, as mentioned in the Introduction, a continuous logical framework can provide a more sophisticated and effective approach to this problem than a Boolean one can.
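The Boolean gates above can be reproduced with single units that use the cutting function as activation. In this sketch, whose function names are hypothetical, the conjunction and the disjunction are one perceptron each, while XOR needs a hidden layer: it is encoded here as "OR and not AND", which with the nilpotent conjunction $[u + (1 - v) - 1]$ simplifies to $[h_{\mathrm{or}} - h_{\mathrm{and}}]$. This is one of several possible encodings.

```python
def cut(x):
    """Cutting function [x] = min(1, max(0, x))."""
    return min(1.0, max(0.0, x))


def perceptron(inputs, weights, bias):
    """Single unit with the cutting function as activation."""
    return cut(sum(w * x for w, x in zip(weights, inputs)) + bias)


def and_gate(x, y):
    return perceptron([x, y], [1, 1], -1)


def or_gate(x, y):
    return perceptron([x, y], [1, 1], 0)


def not_gate(x):
    return perceptron([x], [-1], 1)


def xor_gate(x, y):
    # Hidden layer: OR and AND; output unit: OR AND NOT(AND) = [h_or - h_and].
    h_or, h_and = or_gate(x, y), and_gate(x, y)
    return perceptron([h_or, h_and], [1, -1], 0)


for x in (0, 1):
    for y in (0, 1):
        print(x, y, and_gate(x, y), or_gate(x, y), xor_gate(x, y))
```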
Among continuous logical systems, the nilpotent logical framework described above is well-suited for the neural architecture when it comes to implementing logical rules. For the sake of simplicity, henceforth we assume that the generator function is $f(x) = x$, and we design the neural network architecture in the following way. In the first layer, the perceptrons model membership functions as truth values of inequalities, such as
$$\mathbf{w}^T\mathbf{x} > b,$$
representing a half-space bounded by a hyperplane in the decision space (see Figure 4). Here, the weights $w_i$ and the bias $b$ are to be learned. The truth value of this inequality can be modeled by
$$\big[\mathbf{w}^T\mathbf{x} - b\big],$$
or, to avoid the vanishing gradient problem, the cutting function can be replaced by its differentiable approximation, the squashing function:
$$S^{(\beta)}_{a,\lambda}\big(\mathbf{w}^T\mathbf{x}\big), \tag{22}$$
which, for $a = b + 1/2$ and $\lambda = 1$, approximates $\big[\mathbf{w}^T\mathbf{x} - b\big]$.
The parameters $a$ and $\lambda$ of the squashing function in (22) now have a context-dependent semantic meaning: they set the position and the width of the transition zone of the soft inequality.
Since the nilpotent logical operators also represent inequalities and therefore have the same structure (compare with Equation (14), and see also Table 1 and Figure 4), in the hidden layers we can apply them to the clusters created in the first layer (see Figure 3). Here, the weights and biases characterize the type of the logical operator. As an illustration, the perceptron models of the conjunction and of the disjunction can be seen in Figure 6. This means that for a given logical operator, the weights and the bias can be frozen. The squashing function plays the role of the activation function in all of the layers. The backpropagation algorithm needs to be adjusted: the error function is calculated based on all of the weights and biases (frozen and learnable), but only the learnable parameters are updated; the gradients still pass through the frozen layers, whose parameters remain unchanged. Moreover, in this nilpotent model, the conjunction, the disjunction and the aggregation differ only in a translation parameter, i.e. the weights are equal for all of them and only the biases are different. This makes it possible for the network to learn the type of a logical operator just by learning the bias.
To illustrate the model, two basic examples are given.
As an example, let us assume that a network needs to find positive examples which lie inside a triangular region. This means that we should design the network to take the conjunction of three half-planes and to find the parameters of the boundary lines. The output values for a triangular domain using nilpotent logic and its continuous approximation are illustrated in Figure 8.
Additionally, since the area inside or outside a circle is described by an inequality containing the squares of the input values, it is also possible to construct a novel type of unit by adding the square of each input to the input layer (see Figure 7). This way, the polygon approximation of the circle can be eliminated. For an illustration, see Figure 9. Note that by modifying the weights, an arbitrary axis-aligned conic section can also be described; a rotated conic section would additionally require the cross term $x_1 x_2$.
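The triangle example can be sketched as a two-layer network. The triangle with vertices $(0,0)$, $(1,0)$, $(0,1)$ and the first-layer weights below are hypothetical stand-ins for parameters that the model would learn; the hidden unit is the frozen nilpotent conjunction of three inputs (weights 1, bias $-(n-1) = -2$). Passing `activation=soft` swaps the cutting function for its squashing approximation, as in Figure 8.

```python
import math


def cut(x):
    """Cutting function [x] = min(1, max(0, x))."""
    return min(1.0, max(0.0, x))


def squash01(x, beta):
    """Differentiable approximation of cut: S^(beta)_{1/2,1}(x)."""
    def softplus(z):
        return max(z, 0.0) + math.log1p(math.exp(-abs(z)))
    return (softplus(beta * x) - softplus(beta * (x - 1.0))) / beta


# First layer (learnable in the model): one unit [w.x - b] per half-plane.
# Hypothetical values for the triangle with vertices (0, 0), (1, 0), (0, 1).
HALF_PLANES = [((10.0, 0.0), 0.0),       # x1 > 0
               ((0.0, 10.0), 0.0),       # x2 > 0
               ((-10.0, -10.0), -10.0)]  # x1 + x2 < 1


def triangle(x1, x2, activation=cut):
    memberships = [activation(w1 * x1 + w2 * x2 - b) for (w1, w2), b in HALF_PLANES]
    # Hidden layer (frozen): nilpotent conjunction, weights 1, bias -(n - 1).
    return activation(sum(memberships) - (len(memberships) - 1))


soft = lambda z: squash01(z, 30.0)
print(triangle(0.3, 0.3), triangle(0.8, 0.8))
print(triangle(0.3, 0.3, soft), triangle(0.8, 0.8, soft))
```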
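Expanding $r^2 - (x_1 - c_1)^2 - (x_2 - c_2)^2 > 0$ gives an inequality that is linear in $(x_1, x_2, x_1^2, x_2^2)$, so a single first-layer unit receiving the squared inputs can describe a disk. The sketch below assumes a steepness factor $k = 10$; the names `quadratic_unit` and `inside_circle_weights` are hypothetical. Negating all weights and the bias describes the outside of the circle, and unequal weights on $x_1^2$ and $x_2^2$ would give an axis-aligned ellipse.

```python
def cut(x):
    """Cutting function [x] = min(1, max(0, x))."""
    return min(1.0, max(0.0, x))


def quadratic_unit(x1, x2, weights, bias):
    """First-layer unit that also receives x1**2 and x2**2 as inputs.

    weights = (w1, w2, w11, w22) act on (x1, x2, x1**2, x2**2).
    """
    w1, w2, w11, w22 = weights
    return cut(w1 * x1 + w2 * x2 + w11 * x1 ** 2 + w22 * x2 ** 2 + bias)


def inside_circle_weights(c1, c2, r, k=10.0):
    """Weights and bias of k * (r^2 - (x1 - c1)^2 - (x2 - c2)^2), expanded."""
    return (2 * k * c1, 2 * k * c2, -k, -k), k * (r ** 2 - c1 ** 2 - c2 ** 2)


w, b = inside_circle_weights(0.5, -0.5, 1.0)
print(quadratic_unit(0.5, -0.5, w, b), quadratic_unit(2.0, 2.0, w, b))
```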
Choosing the right activation function for each layer is crucial and may have a significant impact on metric scores and the training speed of the neural model. In the model introduced in this section, the smooth approximation of the cutting function is a natural choice for the activation function in the first layer as well as in the hidden layers, where the logical operators work. Although a vast number of activation functions (e.g. linear, sigmoid, tanh, or the more recently introduced rectified linear unit (ReLU) relu (), exponential linear unit (ELU) elu (), and sigmoid-weighted linear unit (SiLU) silu ()) have been considered in the literature, most of them were introduced based on some desired properties, without a theoretical background. Their parameters are usually fitted only on the basis of experimental results. The squashing function stands out from the other candidates by having a theoretical background, thanks to the nilpotent logic which lies behind the scenes.
To sum up, on the one hand, this structure leads to a drastic reduction in the number of parameters to be learned, and on the other hand, it supports interpretation, making the debugging process manageable. Given the logical structure, the parameters to be learned are located in the first layer. The choice of the activation functions in the first layer as well as in the hidden, logical layers has a sound theoretical background. By designing the network architecture appropriately, arbitrary regions can be described as intersections and unions of polyhedra in the decision space. Moreover, multicriteria decision tools can also be integrated with given weights and thresholds. Note that the weights and biases in the hidden layers define the type of operator to be used. These parameters can also be learned in a more sophisticated model to be examined in future work.
5 Playground Examples
To illustrate our model with some simple examples, we extended the TensorFlow Playground with the squashing function as activation function and modified the backpropagation algorithm to account for the frozen weights in the hidden layers.
Let us first consider an example on a particular data set based on Example 2. An image of a generated set of data is shown in Figure 11. Orange data points have a value of $-1$ and blue points have a value of $+1$. Here, the target variable is positive when $x_1$ and $x_2$ are both positive or both negative. In a logical network:
If $x_1 > 0$ AND $x_2 > 0$ THEN predict $+1$
If $x_1 < 0$ AND $x_2 < 0$ THEN predict $+1$
An efficient neural network can be built to make predictions for this logical expression even without using the cross feature $x_1 x_2$. For the structure and for the frozen weights and biases, see Table 3.
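A network of this shape can be sketched directly from the two rules. The weights below are hypothetical and are not the values of Table 3: the first layer computes the truth values of $x_1 > 0$, $x_2 > 0$, $x_1 < 0$ and $x_2 < 0$ with an assumed steepness $k = 10$; the frozen hidden layer applies two nilpotent conjunctions; and the frozen output applies a nilpotent disjunction, mapped to the playground's $[-1, 1]$ labels. No cross feature $x_1 x_2$ is used.

```python
def cut(x):
    """Cutting function [x] = min(1, max(0, x))."""
    return min(1.0, max(0.0, x))


def same_sign_network(x1, x2, k=10.0):
    # First layer: truth values of x1 > 0, x2 > 0, x1 < 0, x2 < 0
    # (steepness k is a hypothetical choice).
    pos1, pos2 = cut(k * x1), cut(k * x2)
    neg1, neg2 = cut(-k * x1), cut(-k * x2)
    # Hidden layer (frozen): nilpotent conjunctions [u + v - 1].
    both_pos = cut(pos1 + pos2 - 1.0)
    both_neg = cut(neg1 + neg2 - 1.0)
    # Output (frozen): nilpotent disjunction [u + v], mapped to [-1, 1].
    return 2.0 * cut(both_pos + both_neg) - 1.0


for point in [(0.5, 0.5), (-0.5, -0.5), (0.5, -0.5), (-0.3, 0.8)]:
    print(point, same_sign_network(*point))
```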
According to our model, the smooth approximation of the cutting function called the squashing function is a natural choice for the activation function in the first layer as well as in the hidden layers, where the logical operators are used. If we design this logical structure before training, an interpretation of the network naturally emerges.
Notice how the neurons in the hidden layer reveal the logical structure of the network (Figure 11), assisting the interpretability of the neural model.
Another image of a generated set of data is shown in Figure 12. Orange data points have a value of $-1$ and blue points have a value of $+1$. Here, the network has to learn the parameters of the straight lines separating the different regions. The target variable is positive when both and hold or when both and hold. In a logical network:
If AND THEN predict
If AND THEN predict
The network structure is illustrated in Figure 12.
Here, the expression is modeled by the preference operator (see Tables 1 and 4). Notice how the neurons in the hidden layer reveal the logical structure of the network (see Figure 12), assisting the interpretability of the neural model.
Networks can also be readily designed for finding concave regions. For example, see Figure 13.
In this study, we suggested interpreting neural networks by using continuous nilpotent logic and multicriteria decision tools to reduce the black-box nature of neural models, aiming at the interpretability and improved safety of machine learning. We introduced the main concept and the basic building blocks of the model to lay the foundations for future applications. In our model, membership functions (representing truth values of inequalities) and also nilpotent operators are modeled by perceptrons. The network architecture is designed prior to training. In the first layer, the parameters of the membership functions need to be learned, while in the hidden layers, the nilpotent logical operators work with given weights and biases. Based on previous results, a rich set of logical operators with rigorously examined properties is available. A novel type of neural unit was also introduced by adding the square of each input to the input layer (see Figure 7) to describe the inside or the outside of a circle without polygon approximation.
The theoretical basis offers a straightforward choice of activation functions: the cutting function or its differentiable approximation, the squashing function. Both functions represent truth values of soft inequalities, and the parameters have a semantic meaning. Our model also seems to provide an explanation for the great success of the rectified linear unit (ReLU).
The concept was illustrated with some toy examples taken from an extended version of the TensorFlow Playground. The implementation of this hybrid model in deeper networks (by combining the building blocks introduced here) and its application, e.g. in multicriteria decision making or image classification, is left for future work.
This study was partially supported by grant TUDFO/47138-1/2019-ITM of the Ministry for Innovation and Technology, Hungary.
- journal: Information Sciences
- B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Srndic, P. Laskov, G. Giacinto and F. Roli, Evasion Attacks Against Machine Learning at Test Time, Lecture Notes in Computer Science, 387-402, 2013.
- D. Clevert, T. Unterthiner and S. Hochreiter, Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs), arXiv:1511.07289, 2015.
- O. Csiszár, J. Dombi, Generator-based Modifiers and Membership Functions in Nilpotent Operator Systems, IEEE International Work Conference on Bioinspired Intelligence (iwobi 2019), July 3-5, 2019, Budapest, Hungary, 2019.
- J. Dombi, Membership function as an evaluation, Fuzzy Sets Syst., 35, 1-21, 1990.
- J. Dombi, O. Csiszár, The general nilpotent operator system, Fuzzy Sets Syst., 261, 1-19, 2015.
- J. Dombi, O. Csiszár, Implications in bounded systems, Inform. Sciences, 283, 229-240, 2014.
- J. Dombi, O. Csiszár, Equivalence operators in nilpotent systems, Fuzzy Sets Syst., doi:10.1016/j.fss.2015.08.012, available online, 2015.
- J. Dombi, O. Csiszár, Self-dual operators and a general framework for weighted nilpotent operators, Int J Approx Reason, 81, 115-127, 2017.
- J. Dombi, O. Csiszár, Operator-dependent Modifiers in Nilpotent Logical Systems, In Proceedings of the 10th International Joint Conference on Computational Intelligence (IJCCI 2018), 126-134, 2018.
- D. Dubois, H. Prade, Fuzzy sets in approximate reasoning. Part 1: Inference with possibility distributions, Fuzzy Sets and Syst., 40, 143-202, 1991.
- S. Elfwing, E. Uchibe and K. Doya, Sigmoid-weighted linear units for neural network function approximation in reinforcement learning, Neural Networks 107, 3-11, 2018.
- M. Fischer, M. Balunovic, D. Drachsler-Cohen, T. Gehr, C. Zhang and M. Vechev, DL2: Training and Querying Neural Networks with Logic, Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019.
- M. V. Franca, G. Zaverucha and A. S. d. Garcez, Fast relational learning using bottom clause propositionalization with artificial neural networks, Machine Learning, 94(1):81-104, 2014.
- A. S. d. Garcez, K. Broda and D. M. Gabbay, Neural-symbolic learning systems: foundations and applications, Springer Science & Business Media, 2012.
- J. Dombi, Zs. Gera, Fuzzy rule based classifier construction using squashing functions. J. Intell. Fuzzy Syst. 19, 3-8, 2008.
- J. Dombi, Zs. Gera, The approximation of piecewise linear membership functions and Łukasiewicz operators, Fuzzy Sets Syst., 154, 275- 286, 2005.
- I. J. Goodfellow, J. Shlens and C. Szegedy, Explaining and Harnessing Adversarial Examples, arXiv:1412.6572, 2014.
- Z. Hu, X. Ma, Z. Liu, E. Hovy and E. P. Xing, Harnessing Deep Neural Networks with Logic Rules, ArXiv:1603.06318v5
- T. D. Kulkarni, W. F. Whitney, P. Kohli, and J. Tenenbaum, Deep convolutional inverse graphics network, In Proc. of NIPS, 2530-2538, 2015.
- C. T. Lin, C.S.G. Lee, Neural Fuzzy Systems: A Neuro-Fuzzy Synergism to Intelligent Systems, Upper Saddle River, NJ: Prentice Hall, 1996.
- A. L. Maas, A. Y. Hannun and A. Y. Ng, Rectifier Nonlinearities Improve Neural Network Acoustic Models, 2014.
- C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow and R. Fergus, Intriguing properties of neural networks, arXiv:1312.6199, 2013.
- S. Thys, W. Van Ranst and T. Goedemé, Fooling automated surveillance cameras: adversarial patches to attack person detection, arXiv:1904.08653, 2019.
- G. G. Towell, J. W. Shavlik and M. O. Noordewier, Refinement of approximate domain theories by knowledge-based neural networks, in Proceedings of the eighth National Conference on Artificial Intelligence, Boston, MA, 861-866, 1990.
- E. Trillas, L. Valverde, On some functionally expressable implications for fuzzy set theory, Proc. of the 3rd International Seminar on Fuzzy Set Theory, Linz, Austria, 173-190, 1981.
- J. Xu, Z. Zhang, T. Friedman, Y. Liang and G. Van den Broeck, A semantic loss function for deep learning with symbolic knowledge, Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholm, Volume 80, 5498-5507, 2018.