Handling Nominals and Inverse Roles using Algebraic Reasoning
This paper presents a novel tableau calculus which incorporates algebraic reasoning for deciding ontology consistency. Numerical restrictions imposed by nominals, existential and universal restrictions are encoded into a set of linear inequalities. Column generation and branch-and-price algorithms are used to solve these inequalities. Our preliminary experiments indicate that this calculus performs better than standard tableau methods on ontologies with a large number of nominals.
1 Introduction and Motivation
Description Logic (DL) is a formal knowledge representation language that is used for modeling ontologies. Modern description logic systems provide reasoning services that can automatically infer implicit knowledge from explicitly expressed knowledge. Designing reasoning algorithms with high performance has been one of the main concerns of DL researchers. One of the key features of many description logics is support for nominals. Nominals are special concept names that must be interpreted as singleton sets. They allow Abox individuals to be used within concept descriptions. However, nominals carry implicit global numerical restrictions that increase reasoning complexity. Moreover, the interaction between nominals and inverse roles leads to the loss of the tree model property. Most state-of-the-art reasoners, such as Konclude [26], FaCT++ [27], and HermiT [24], implement traditional tableau algorithms. Konclude also incorporates consequence-based reasoning into its tableau calculus [25]. To handle nominals, these reasoners try to construct completion graphs in a highly non-deterministic way. For example, consider a small ontology that models Canada as consisting of its ten provinces, each listed as a nominal. If one tries to model that Canada consists of 11 provinces, it is easy to see that this is not possible, because the cardinality of the corresponding concept is implicitly restricted to the 10 provinces listed as nominals. However, according to our preliminary experiments, the above-mentioned DL reasoners are unable to decide this inconsistency within a reasonable amount of time. Consequence-based (CB) reasoning algorithms have also been extended to more expressive DLs such as  and . Since their implementations are not available, we could not analyze these reasoners.
Algebraic DL reasoners, in contrast, are considered more efficient in handling numerical restrictions [11, 13, 14, 28]. RacerPro [14] was the first highly optimized reasoner that combined tableau-based reasoning with algebraic reasoning [15]. Other tableau-based algebraic reasoners for , , [12, 11] have also been proposed to handle qualified number restrictions (QNRs) and their interaction with inverse roles or nominals. These reasoners use an atomic decomposition technique to encode number restrictions into a set of linear inequalities, which are then solved by integer linear programming (ILP). They handle huge values in number restrictions very efficiently. However, their ILP algorithms are best-case exponential in the number of inequalities: for $m$ inequalities they require $2^m$ variables in order to find the optimal solution. For ILP with a huge number of variables, it is not feasible to enumerate all variables. To overcome this problem, the column generation technique, which considers only a small subset of variables, has been used [28, 30]. To the best of our knowledge, however, no algebraic calculus can handle DLs supporting nominals and inverse roles simultaneously.
In this paper, we present a novel algebraic tableau calculus for SHOI to handle a large number of nominals and their interaction with inverse roles. The rest of this paper is structured as follows. Section 2 defines important terms and introduces SHOI. Section 3 presents the algebraic tableau calculus for SHOI. Section 4 provides evaluation results for the implemented prototype Cicada. The last section concludes the paper.
In this section, we introduce SHOI and some notation used later. Let where represents concept names and nominals. Let be a set of role names with a set of transitive roles . The set of roles in SHOI is where is called the inverse of . A function returns the inverse of a role such that if and if and . An interpretation consists of a non-empty set of individuals called the domain of interpretation and an interpretation function . Table 1 presents the syntax and semantics of SHOI. We use () as an abbreviation for () for some . In the following denotes set cardinality.
A role inclusion axiom (RIA) of the form is satisfied by if . We denote by the reflexive transitive closure of over . If , we call a subrole of and a superrole of . A general concept inclusion (GCI) is satisfied by if . A role hierarchy is a finite set of RIAs. A Tbox is a finite set of GCIs. A Tbox and its associated role hierarchy are satisfied by (or consistent) if each GCI and RIA is satisfied by . Such an interpretation is then called a model of . A concept description is said to be satisfiable by iff . An Abox is a finite set of assertions of the form (concept assertion) with , and (role assertion) with . Due to nominals, a concept assertion can be transformed into a concept inclusion and a role assertion into . Therefore, concept satisfiability and Abox consistency can be reduced to Tbox consistency by using nominals. We use as an abbreviation for and may write as . Moreover, we do not make the unique name assumption; therefore, two nominals might refer to the same individual.
Nominals carry implicit global numerical restrictions. For example, if (or ), then impose a numerical restriction that there can be at most (or at least, if are declared as pair-wise disjoint) three instances of . These restrictions are global because they affect the set of all individuals of in . These implicit numerical restrictions increase reasoning complexity.
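The counting argument behind such an implicit restriction can be sketched in a few lines. This is only an illustration: the helper names `max_instances` and `bound_allows` and the nominal names `o1`, `o2`, `o3` are hypothetical, and the check covers only the upper bound that arises when a concept is subsumed by an enumeration of nominals.

```python
def max_instances(nominals):
    """Upper bound on the instances of a concept subsumed by these nominals.

    Every nominal denotes exactly one individual. Without the unique name
    assumption two nominals may denote the same individual, so the number of
    distinct nominal names is only an upper bound.
    """
    return len(set(nominals))


def bound_allows(required, nominals):
    """Necessary condition for `required` pairwise distinct instances."""
    return required <= max_instances(nominals)


nominals = ["o1", "o2", "o3"]
print(bound_allows(3, nominals))  # True
print(bound_allows(4, nominals))  # False
```

A request for four pairwise distinct instances is rejected by pure counting, without any attempt to build and merge nodes.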
3 An Algebraic Tableau Calculus for SHOI
In this section, we present an algebraic tableau calculus for SHOI that decides Tbox consistency. Since nominals carry numerical restrictions, algebraic reasoning is used to ensure their semantics. The algorithm takes a Tbox and its role hierarchy as input and tries to create a complete and clash-free completion graph in order to check Tbox consistency. The reasoner is divided into two modules: 1) Tableau Module (TM), and 2) Algebraic Module (AM).
Let be a completion graph for a Tbox where is a set of nodes and a set of edges. Each node is labelled with a set of concepts , and each edge with a set of role names . For each node , if contains a universal restriction on role and there exists an -neighbour of , then contains a tuple of the form where is an -neighbour of . We use to denote the cardinality of a node . For convenience, we assume that all concept descriptions are in negation normal form.
TM starts with some preprocessing and reduces all the concept axioms in a Tbox to a single axiom such that , where transforms a given concept expression to its negation normal form. The algorithm checks consistency of by testing the satisfiability of where is a fresh nominal in , which means that at least and . Moreover, since then every domain element must also satisfy . For creating a complete and clash-free completion graph, TM applies expansion rules (see Figure 1 and Section 3.1). AM handles all numerical restrictions using ILP. It generates inequalities and solves them using the branch-and-price technique (see Section 3.2 for details). We use equality blocking [18, 16] due to the presence of inverse roles.
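The negation normal form transformation used in this preprocessing step can be sketched as follows. The tuple encoding of concepts and the function names `nnf` and `internalise` are assumptions made for illustration; `internalise` uses the standard reading of a GCI with left-hand side C and right-hand side D as the disjunction of the negated C with D, which may differ in detail from the preprocessing implemented in TM.

```python
def nnf(c):
    """Return the negation normal form of a concept expression.

    Assumed encoding (nested tuples):
      ("atom", A), ("nom", o), ("not", C), ("and", C, D), ("or", C, D),
      ("some", R, C), ("all", R, C)
    """
    op = c[0]
    if op in ("atom", "nom"):
        return c
    if op in ("and", "or"):
        return (op, nnf(c[1]), nnf(c[2]))
    if op in ("some", "all"):
        return (op, c[1], nnf(c[2]))
    if op != "not":
        raise ValueError(f"unknown constructor: {op}")
    # Push the negation one level inward.
    d = c[1]
    inner = d[0]
    if inner in ("atom", "nom"):
        return c  # negation already sits in front of an atom or nominal
    if inner == "not":
        return nnf(d[1])  # double negation
    if inner == "and":
        return ("or", nnf(("not", d[1])), nnf(("not", d[2])))
    if inner == "or":
        return ("and", nnf(("not", d[1])), nnf(("not", d[2])))
    if inner == "some":
        return ("all", d[1], nnf(("not", d[2])))
    if inner == "all":
        return ("some", d[1], nnf(("not", d[2])))
    raise ValueError(f"unknown constructor: {inner}")


def internalise(gci):
    """Turn a GCI (C, D) into the concept nnf(not C or D)."""
    c, d = gci
    return nnf(("or", ("not", c), d))
```

For example, `nnf(("not", ("some", "R", ("atom", "B"))))` yields a universal restriction on `R` over the negated atom `B`.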
3.1 Expansion Rules
In order to check the consistency of a Tbox , the proposed algorithm creates a completion graph using the expansion rules shown in Figure 1. A node in contains a clash if for or AM has no feasible solution for . is complete if no expansion rule is applicable to any node in . is consistent if is complete and no node in contains a clash.
The -Rule, -Rule and -Rule are similar to standard tableau expansion rules for . The -Rule preserves the semantics of transitive roles. The -Rule merges two nodes whose labels contain the same nominal. Suppose and , and the nodes and are distinct; then the -Rule merges into . It adds to and moves all edges leading to (from) so that they lead to (from) . For each node , if and , then . Similarly, if and , then . It also merges into .
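The merge operation described above can be sketched on a plain dictionary-based graph. The function `merge_nodes` and the data layout are assumptions made for illustration; the sketch only unions labels and redirects edges, and it leaves out the bookkeeping for node cardinalities and for the tuples that encode universal restrictions.

```python
def merge_nodes(labels, edges, src, dst):
    """Merge node `src` into node `dst` in a labelled directed graph.

    labels: dict mapping node -> set of concept names
    edges:  dict mapping (from_node, to_node) -> set of role names
    """
    # The surviving node inherits the label of the merged node.
    labels[dst] |= labels.pop(src)
    # Redirect every edge that starts or ends at `src`.
    for a, b in list(edges):
        if src not in (a, b):
            continue
        roles = edges.pop((a, b))
        a2 = dst if a == src else a
        b2 = dst if b == src else b
        edges.setdefault((a2, b2), set()).update(roles)


labels = {"x": {"o", "A"}, "y": {"o", "B"}, "z": {"C"}}
edges = {("z", "y"): {"R"}, ("y", "z"): {"R-"}}
merge_nodes(labels, edges, "y", "x")
print(sorted(labels["x"]))  # ['A', 'B', 'o']
print(sorted(edges))        # [('x', 'z'), ('z', 'x')]
```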
|-Rule||if and then set|
|-Rule||if and then set for some|
|-Rule||if and there with , and|
|-Rule||if and there exist with and , , and a node with and|
|-Rule||if for some there are nodes , with , then if is an initial node, then merge into , else merge into|
|-Rule||if , , and is not blocked then|
|-Rule||if and , , then merge into and into , and for all with add to and to|
If and , then the -Rule encodes for AM the already existing -edge by adding a tuple to . AM also plays an important role if nominals occur in universal restrictions. For example, consider the axioms , and , where , and . Suppose we have , and . Since nominals carry numerical restrictions, implies that we can have at most 2 -neighbours of . However, standard tableau reasoners might create two new -neighbours of without considering the existing -neighbour of . They then try to merge these three nodes in a non-deterministic way to satisfy the numerical restriction imposed by the nominals. In our approach, the -Rule encodes information about the existing -neighbour of and AM generates a deterministic solution.
For a node , AM transforms all existential restrictions, universal restrictions and nominals to a corresponding system of inequalities. AM then processes these inequalities and gives back a solution set . The set is either empty or contains solutions derived from feasible inequalities. In case of infeasibility AM signals a clash. A solution is defined by a set of tuples of the form with , , , and . Each tuple represents -neighbours of (where is a set of roles) that are instances of all elements of . Here, is an optional set that contains existing -neighbours of that must be reused and is added to their labels. Consider the axiom , where , , , , and . AM returns the solution . The -Rule is used to generate nodes based on the arithmetic solution that satisfies a set of inequalities. For the above solution, the -Rule creates one node with cardinality 1 such that and . The -Rule creates an edge between nodes and , and adds to and to . The -Rule always adds all implied superroles to edge labels.
3.2 Generating Inequalities
Dantzig and Wolfe [9] proposed a column generation technique for solving linear programming (LP) problems, called Dantzig–Wolfe decomposition, where a large LP is decomposed into a master problem and a subproblem (or pricing problem). For LP problems with a huge number of variables, column generation works with a small subset of variables and builds a Restricted Master Problem (RMP). The Pricing Problem (PP) generates the new variable whose addition to RMP would reduce the cost the most (see [4, 29] for details). However, column generation does not necessarily give an integral solution for an LP relaxation, i.e., at least one variable may have a non-integer value. Therefore, the branch-and-price method [3] has been used, which is a combination of column generation and the branch-and-bound technique. We employ this technique by mapping number restrictions to linear inequality systems using a column generation ILP formulation (see  for details). CPLEX [5] has been used to solve our ILP formulation.
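The reason why an LP relaxation alone is not sufficient can be seen on a toy covering problem that is unrelated to any particular ontology. The sketch below is an assumed example: it minimises the sum of three 0/1 variables subject to one covering constraint per pair of variables, and it shows a fractional point that is feasible and strictly cheaper than every integral solution.

```python
from fractions import Fraction
from itertools import product

# Covering constraints x_i + x_j >= 1 for every pair of the three variables.
PAIRS = [(0, 1), (0, 2), (1, 2)]


def feasible(x):
    """Check all pairwise covering constraints for the assignment x."""
    return all(x[i] + x[j] >= 1 for i, j in PAIRS)


def best_integer_cost():
    """Minimum of x0 + x1 + x2 over all feasible 0/1 assignments."""
    return min(sum(x) for x in product((0, 1), repeat=3) if feasible(x))


half = (Fraction(1, 2),) * 3
print(feasible(half), sum(half))  # True 3/2
print(best_integer_cost())        # 2
```

Branching on a fractional variable, as branch-and-bound and branch-and-price do, is what closes this gap between the relaxed and the integral optimum.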
3.2.1 Encoding Existential Restrictions and Nominals into Inequalities
The atomic decomposition technique is used to encode numerical restrictions on concepts and role fillers into inequalities. These inequalities are then solved for deciding the satisfiability of the numerical restrictions. The existential restrictions are converted into inequalities. The cardinality of a partition element containing a nominal is equal to 1 due to the nominal semantics; for each nominal . Therefore, the decomposition set is defined as , where () contains existential (universal) restrictions and contains all related nominals. Each element represents a role and its qualification concept expression and each element represents a nominal . The elements in are used by AM to ensure the semantics of universal restrictions. The set of related nominals is defined as where is the closure of concept expression . The atomic decomposition considers all possible ways to decompose into sets that are semantically pairwise disjoint.
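The enumeration underlying atomic decomposition can be sketched as follows. The element names `R.A`, `R.B` and `o1` and the function `partition_elements` are hypothetical; the sketch enumerates every non-empty subset of a small decomposition set and optionally drops subsets that contain two elements declared disjoint.

```python
from itertools import combinations


def partition_elements(decomposition_set, disjoint_pairs=()):
    """Yield the non-empty subsets of the decomposition set.

    A subset that contains both members of a declared disjoint pair is
    skipped, because the intersection it stands for is necessarily empty.
    """
    items = sorted(decomposition_set)
    for size in range(1, len(items) + 1):
        for subset in combinations(items, size):
            chosen = set(subset)
            if any(a in chosen and b in chosen for a, b in disjoint_pairs):
                continue
            yield frozenset(subset)


D = {"R.A", "R.B", "o1"}
print(len(list(partition_elements(D))))                     # 7
print(len(list(partition_elements(D, [("R.A", "R.B")]))))   # 5
```

With n elements there are 2^n - 1 candidate partition elements, which is why explicit enumeration stops being practical for larger decomposition sets.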
3.2.2 Branch-and-Price Method
In the following, we use a Tbox and its role hierarchy , a completion graph , a decomposition set and a partitioning that is the power set of containing all subsets of except the empty set. Each partition element represents the intersection of its elements. We decompose our problem into two subproblems: (i) restricted master problem (RMP), and (ii) pricing problem (PP). RMP contains a subset of columns and PP computes a column that can maximally reduce the cost of RMP’s objective. Whenever a column with negative reduced cost is found, it is added to RMP. Number restrictions are represented in RMP as inequalities, with a restricted set of variables. The flowchart in Figure 2 illustrates the whole process.
Restricted Master Problem
RMP is obtained by considering only variables with and and relaxing the integrality constraints on the variables. Initially is empty and RMP contains only artificial variables to obtain an initial feasible inequality system. Each artificial variable corresponds to an element in such that , and , . An arbitrarily large cost is associated with every artificial variable. If any of these artificial variables exists in the final solution, then the problem is infeasible. The objective of RMP is defined as the sum of all costs as shown in (1) of the RMP below.
where a decision variable represents the elements of the partition element . The coefficients are associated with variables and indicates whether an -neighbour that is an instance of exists in . Similarly, indicates whether a nominal exists in . The weight defines the cost of selecting and it depends on the number of elements contains. Since we minimize the objective function, in the objective (1) ensures that only subsets with entailed concepts will be added which are the minimum number of concepts that are needed to satisfy all the axioms. Constraint (2) encodes existential restrictions and (3) numerical restrictions imposed by nominals (i.e., ). Constraint (4) states the integrality condition relaxed from to .
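For very small decomposition sets, the integer problem that RMP relaxes can be solved by exhaustive enumeration, which makes the role of the constraints easy to inspect. The sketch below is not the column generation procedure itself: the function `solve_by_enumeration`, the element names, and the choice of the subset size as cost are assumptions made for illustration.

```python
from itertools import combinations, product


def coverage(element, parts, counts):
    """Number of individuals placed in partition elements containing `element`."""
    return sum(x for p, x in zip(parts, counts) if element in p)


def solve_by_enumeration(existentials, nominals, max_count=1):
    """Exhaustively solve a tiny instance of the integer problem.

    One variable per non-empty subset p of the decomposition set.
    Every existential element needs coverage >= 1, every nominal needs
    coverage == 1, and the cost sum(len(p) * x_p) is minimised.
    Returns (cost, {subset: count}) or None if no assignment is feasible.
    """
    items = sorted(existentials) + sorted(nominals)
    parts = [frozenset(c) for k in range(1, len(items) + 1)
             for c in combinations(items, k)]
    best = None
    for counts in product(range(max_count + 1), repeat=len(parts)):
        if any(coverage(e, parts, counts) < 1 for e in existentials):
            continue
        if any(coverage(o, parts, counts) != 1 for o in nominals):
            continue
        cost = sum(len(p) * x for p, x in zip(parts, counts))
        if best is None or cost < best[0]:
            best = (cost, {p: x for p, x in zip(parts, counts) if x})
    return best


cost, solution = solve_by_enumeration({"R.A", "R.B"}, {"o"})
print(cost)  # 3
```

The loop visits every assignment of counts to all 2^n - 1 subsets, which is exactly the blow-up that column generation avoids by generating only promising columns.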
The objective of PP uses the dual values as coefficients of the variables that are associated with a potential partition element. The binary variables , , () are used to ensure the description logic semantics. A binary variable is used to handle role hierarchy. A variable is set to 1 if there exists an instance of concept and is set to 1 if there exists an -neighbour that is an instance of concept . Likewise, is set to 1 if there exists a nominal . Otherwise these variables are set to 0. The PP is given below.
where the vectors and are dual variables associated with (2) and (3) respectively. For each at-least restriction represented in (2), Constraint (6) is added to PP, which ensures that if then variable for must exist in . Similarly, (7) ensures the semantics of nominals represented in (3). Constraints (8)–(10) ensure the semantics of universal restrictions and role hierarchies respectively.
We can also map the semantics of selected DL axioms, where only atomic concepts occur, into inequalities, as shown in Table 2. For every , AM adds to PP. Therefore, if PP generates a partition containing and , then it must also contain . Similarly, for every , AM adds to PP. This inequality ensures that if a partition contains , then it must also contain or .
|DL Axiom||Inequality in PP||Description|
|If a set contains , then it also contains .|
|If a set contains , then it also contains at least one concept from .|
|Encodes unsatisfiability and disjointness in case|
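Implications of the kind described in Table 2 can be written as linear inequalities over binary indicator variables. The sketch below shows one plausible encoding, which is an assumption and not necessarily the exact inequalities that AM adds to PP: a conjunction of concepts on the left-hand side and a disjunction on the right-hand side, where an empty right-hand side expresses unsatisfiability or disjointness. The loop checks the encoding against the logical reading for all assignments.

```python
from itertools import product


def gci_inequality_holds(b, lhs, rhs):
    """Check sum(b[A] for A in lhs) - (len(lhs) - 1) <= sum(b[B] for B in rhs)."""
    return sum(b[a] for a in lhs) - (len(lhs) - 1) <= sum(b[c] for c in rhs)


def gci_holds_logically(b, lhs, rhs):
    """If the set contains every concept in lhs, it contains some concept in rhs."""
    return (not all(b[a] for a in lhs)) or any(b[c] for c in rhs)


axioms = [
    (["A"], ["B"]),        # a set containing A also contains B
    (["A", "B"], ["C"]),   # a set containing A and B also contains C
    (["A"], ["B", "C"]),   # a set containing A contains B or C
    (["A", "B"], []),      # A and B are disjoint
]
for bits in product((0, 1), repeat=3):
    b = dict(zip("ABC", bits))
    for lhs, rhs in axioms:
        assert gci_inequality_holds(b, lhs, rhs) == gci_holds_logically(b, lhs, rhs)
print("encoding agrees with the logical reading on all assignments")
```

Because both sides are integers, the inequality is violated exactly when all left-hand indicators are 1 and all right-hand indicators are 0.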
3.2.3 Soundness and Completeness of Algebraic Module
All existential restrictions and nominals are converted into linear inequalities and added to RMP. Other axioms, such as universal restrictions, role hierarchy, subsumption and disjointness, are embedded in PP. In case of feasible inequalities, the branch-and-price algorithm returns a solution set that contains valid partition elements. Since the branch-and-price algorithm satisfies all the axioms embedded in RMP and PP, this solution is sound. Moreover, it is also complete because CPLEX is used to solve linear inequalities and it does not overlook any possible solution.
For a set of inequalities, the algebraic module either generates an optimal solution which satisfies all inequalities or detects infeasibility.
3.3 Example Illustrating Rule Application and ILP Formulation
Consider the small Tbox
We start with root node and its label and by unfolding and applying the -Rule we get .
Since , AM generates a corresponding set of inequalities and applies ILP considering known subsumption and disjointness.
To solve these inequalities, RMP starts with artificial variables, is initially empty, and , and (see Fig. 3). The objective of (PP 1a) uses the dual values from (RMP 1a). For each at-least restriction a constraint (e.g., ) is added to (PP 1a), which indicates that if then a variable will also be 1. Constraint () ensures that and cannot exist in the same partition element. Constraint () ensures the semantics of nominals.
The values of are 1 in (PP 1a); therefore, the variable is added to (RMP 1b). Since only one variable (i.e., ) is 1, the cost of is 1, and the value of the objective function is reduced from 30 in (RMP 1a) to 11 in (RMP 1b).
As the value of is 1 in (PP 1b), the variable is added to (RMP 1c), and the cost is further reduced from 11 in (RMP 1b) to 2 in (RMP 1c).
All artificial variables in (RMP 1c) are zero, which indicates that we have reached a feasible solution. The reduced cost of (PP 1c) is no longer negative, which means that (RMP 1c) cannot be improved further. Therefore, AM terminates after the third ILP iteration and returns the optimal solution , .
The -Rule creates two new nodes and with , , and .
The -Rule creates edges and with and (because ). It also creates back edges and with and .
By unfolding in the label of and by applying the -Rule we get .
The -Rule encodes information about the existing -neighbour of by adding a tuple to .
AM uses to start ILP. Due to lack of space we cannot provide the complete RMP and PP solution process here. Since , the universal restriction is ensured by adding the following inequalities to PP: , , and for all we added an equality . Therefore, whenever the values of .
Since contains , AM adds node to the solution. Therefore, AM returns the solution .
The -Rule creates only one new node with and , and updates the label of node with .
The -Rule creates edges and with and .
By unfolding in the label of we get . AM gives solution . The -Rule creates node with and . The -Rule creates edges and with and .
and after unfolding the -Rule adds to . However, already occurs in and . Therefore, the -Rule merges node into node .
Since no more rules are applicable, the tableau algorithm terminates.
4 Performance Evaluation
We developed a prototype system called Cicada (system and test ontologies: https://users.encs.concordia.ca/~haarslev/Cicada) that implements our calculus as a proof of concept. Besides ILP and branch-and-price, Cicada implements only a few standard optimization techniques such as lazy unfolding [2, 17], nominal absorption, and dependency-directed backtracking, as well as a ToDo list architecture to control the application of the expansion rules. Cicada might therefore not perform well on ontologies that require other optimization techniques.
Therefore, we built a set of synthetic test cases to empirically evaluate Cicada. Figure 6 presents some metrics of the benchmark ontologies and the evaluation results. We compared Cicada with major OWL reasoners such as FaCT++ (1.6.5) [27], HermiT (1.3.8) [24], and Konclude (0.6.2) [26].
The first benchmark (see top part of Figure 6) uses two real-world ontologies. The ontology EU-Members (adapted from ) models the 28 members of the European Union (EU), whereas CA-Provinces models the 10 provinces of Canada. We added nominals requiring 29 EU members and 11 Canadian provinces respectively. The results show that only Cicada can identify the inconsistency of EU-Members within the time limit. Moreover, Cicada is more than two orders of magnitude faster than FaCT++ in identifying the inconsistency of CA-Provinces.
The second benchmark (see bottom part of Figure 6) consists of small synthetic test ontologies that use a variable to represent the number of nominals. In order to test the effect of an increased number of nominals, we defined the concepts and as and . Nominals and concepts are declared as pairwise disjoint. The first set consists of consistent ontologies in which we declared and as pairwise disjoint. The second set consists of inconsistent ontologies in which we declared and as pairwise disjoint. Only Cicada can process the ontologies with more than 10 nominals within the time limit.
Figure 6: Metrics of the benchmark ontologies and evaluation results. Top part: Ontology Name, Ontology Metrics, Evaluation Results. Bottom part: n, Ontology Metrics, Evaluation Results.
We presented a tableau-based algebraic calculus for handling the numerical restrictions imposed by nominals, existential and universal restrictions, and their interaction with inverse roles. These numerical restrictions are translated into linear inequalities which are then solved by using algebraic reasoning. The algebraic reasoning is based on a branch-and-price technique that either computes an optimal solution, or detects infeasibility. An empirical evaluation of our calculus showed that it performs better on ontologies having a large number of nominals, whereas other reasoners were unable to classify them within a reasonable amount of time. In future work, we will extend the technique presented here to .
- [1] Baader, F.: The description logic handbook: Theory, implementation and applications. Cambridge University Press (2003)
- [2] Baader, F., Hollunder, B., Nebel, B., Profitlich, H.J., Franconi, E.: An empirical analysis of optimization techniques for terminological representation systems. Applied Intelligence 4(2), 109–132 (1994)
- [3] Barnhart, C., Johnson, E.L., Nemhauser, G.L., Savelsbergh, M.W., Vance, P.H.: Branch-and-price: Column generation for solving huge integer programs. Operations Research 46(3), 316–329 (1998)
- [4] Chvatal, V.: Linear programming. Macmillan (1983)
- [5] CPLEX optimizer, https://www.ibm.com/analytics/cplex-optimizer
- [6] Cucala, D.T., Cuenca Grau, B., Horrocks, I.: Consequence-based reasoning for description logics with disjunction, inverse roles, and nominals. In: Proceedings of the 30th International Workshop on Description Logics (July 2017)
- [7] Cucala, D.T., Cuenca Grau, B., Horrocks, I.: Consequence-based reasoning for description logics with disjunction, inverse roles, number restrictions, and nominals. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18. pp. 1970–1976 (2018)
- [8] Dantzig, G.B., Orden, A., Wolfe, P., et al.: The generalized simplex method for minimizing a linear form under linear inequality restraints. Pacific Journal of Mathematics 5(2), 183–195 (1955)
- [9] Dantzig, G.B., Wolfe, P.: Decomposition principle for linear programs. Operations Research 8(1), 101–111 (1960)
- [10] Desrosiers, J., Soumis, F., Desrochers, M.: Routing with time windows by column generation. Networks 14(4), 545–565 (1984)
- [11] Faddoul, J., Haarslev, V.: Algebraic tableau reasoning for the description logic SHOQ. Journal of Applied Logic 8(4), 334–355 (2010)
- [12] Faddoul, J., Haarslev, V.: Optimizing algebraic tableau reasoning for SHOQ: First experimental results. In: Proceedings of the 23rd International Workshop on Description Logics (DL’10). pp. 161–171 (2010)
- [13] Farsiniamarj, N., Haarslev, V.: Practical reasoning with qualified number restrictions: A hybrid Abox calculus for the description logic SHQ. AI Communications 23(2-3), 205–240 (2010)
- [14] Haarslev, V., Möller, R.: Racer system description. In: Proceedings of the International Joint Conference on Automated Reasoning. pp. 701–705. Springer (2001)
- [15] Haarslev, V., Timmann, M., Möller, R.: Combining tableaux and algebraic methods for reasoning with qualified number restrictions. In: Proceedings of the International Workshop on Description Logics (DL’01) (2001)
- [16] Hladik, J.: A tableau system for the description logic SHIO. In: Proceedings of the Doctoral Programme of IJCAR. vol. 106. Citeseer (2004)
- [17] Horrocks, I.: Using an expressive description logic: FaCT or fiction? In: Proceedings of the 6th International Conference on Principles of Knowledge Representation and Reasoning (KR’98). vol. 98, pp. 636–645 (1998)
- [18] Horrocks, I., Sattler, U.: A description logic with transitive and inverse roles and role hierarchies. Journal of Logic and Computation 9(3), 385–410 (1999)
- [19] Horrocks, I., Sattler, U.: A tableau decision procedure for SHOIQ. Journal of Automated Reasoning 39(3), 249–276 (2007)
- [20] Karmarkar, N.: A new polynomial-time algorithm for linear programming. In: Proceedings of the 16th annual ACM symposium on Theory of computing. pp. 302–311. ACM (1984)
- [21] Khachiyan, L.G.: Polynomial algorithms in linear programming. USSR Computational Mathematics and Mathematical Physics 20(1), 53–72 (1980)
- [22] Ohlbach, H., Köhler, J.: Modal logics, description logics and arithmetic reasoning. Artificial Intelligence 109(1-2), 1–31 (1999)
- [23] Roosta Pour, L., Haarslev, V.: Algebraic reasoning for SHIQ. In: Proceedings of the International Workshop on Description Logics (DL’12). pp. 530–540 (2012)
- [24] Shearer, R., Motik, B., Horrocks, I.: HermiT: A highly-efficient OWL reasoner. In: Proceedings of the OWL: Experiences and Directions (OWLED). vol. 432, p. 91 (2008)
- [25] Steigmiller, A., Glimm, B., Liebig, T.: Coupling tableau algorithms for expressive description logics with completion-based saturation procedures. In: Proceedings of the 7th International Joint Conference on Automated Reasoning (IJCAR’14). pp. 449–463. Springer (2014)
- [26] Steigmiller, A., Liebig, T., Glimm, B.: Konclude: system description. Web Semantics: Science, Services and Agents on the World Wide Web 27, 78–85 (2014)
- [27] Tsarkov, D., Horrocks, I.: FaCT++ description logic reasoner: System description. In: Proceedings of the International Joint Conference on Automated Reasoning. pp. 292–297. Springer (2006)
- [28] Vlasenko, J., Daryalal, M., Haarslev, V., Jaumard, B.: A saturation-based algebraic reasoner for ELQ. In: Proceedings of the 5th Workshop on Practical Aspects of Automated Reasoning (PAAR 2016). pp. 110–124. CEUR (2016)
- [29] Vlasenko, J., Haarslev, V., Jaumard, B.: Pushing the boundaries of reasoning about qualified cardinality restrictions. In: Proceedings of the International Symposium on Frontiers of Combining Systems. pp. 95–112. Springer (2017)
- [30] Zolfaghar Karahroodi, N., Haarslev, V.: A consequence-based algebraic calculus for SHOQ. In: Proceedings of the International Workshop on Description Logics (DL’17) (2017)
Appendix A Integer Linear Programming
Linear Programming (LP) is the study of determining the minimum (or maximum) value of a linear function subject to a finite number of linear constraints. These constraints consist of linear inequalities involving variables . If all of the variables are required to have integer values, then the problem is called Integer Programming (IP) or Integer Linear Programming (ILP).
The simplex method, proposed by G. B. Dantzig [8], is one of the most frequently used methods to solve LP problems. Although LP is known to be solvable in polynomial time [21], the simplex method can behave exponentially for certain problems. Karmarkar’s algorithm [20] is the first efficient polynomial time method for solving a linear program. However, none of these approaches is reasonably efficient in solving problems with a huge number of variables. Therefore, the column generation technique is used for such problems. It decomposes a large LP into a master problem and a subproblem and considers only a small subset of variables.
A.1 Branch-and-Price Method
The column generation technique does not necessarily give an integral solution for an LP relaxation. Therefore, we use the branch-and-price method [3], which is a hybrid of column generation and the branch-and-bound method. We decompose our problem into two subproblems: (i) restricted master problem (RMP), and (ii) pricing problem (PP). Number restrictions are represented in RMP as inequalities, with a restricted set of variables. We employ this technique by mapping number restrictions to linear inequality systems using a column generation ILP formulation. These inequalities are then solved for deciding the satisfiability of the numerical restrictions.
A.1.1 Column Generation ILP Formulation
In the following, we use a Tbox and its role hierarchy , a completion graph , a decomposition set and a partitioning that is the power set of containing all subsets of except the empty set. Each partition element represents the intersection of its elements. The ILP model associated with the feasibility problem of is as follows: