OWL2Vec*: Embedding of OWL Ontologies


Semantic embedding of knowledge graphs has been widely studied and used for prediction and statistical analysis tasks across various domains such as Natural Language Processing and the Semantic Web. However, less attention has been paid to developing robust methods for embedding OWL (Web Ontology Language) ontologies. In this paper, we propose a language-model-based ontology embedding method named OWL2Vec*, which encodes the semantics of an ontology by taking into account its graph structure, lexical information and logic constructors. Our empirical evaluation with three real-world datasets suggests that OWL2Vec* benefits from these three different aspects of an ontology in class membership prediction and class subsumption prediction tasks. Furthermore, OWL2Vec* often significantly outperforms the state-of-the-art methods in our experiments.

Keywords: Ontology, Semantic Embedding, Web Ontology Language, OWL2Vec*, Membership Prediction, Subsumption Prediction

1 Introduction

In recent years, the semantic embedding of knowledge graphs (KGs) has been widely investigated Wang et al. (2017). The objective of such embeddings is to represent KG components such as entities and relations in a vector space in a way that captures the structure of the graph. Various kinds of KG embedding algorithms have been proposed and successfully applied to KG refinement (e.g., link prediction Rossi et al. (2020) and entity alignment Sun et al. (2020)), recommendation systems Ristoski et al. (2019), zero-shot learning Chen et al. (2020b); Wang et al. (2018), interaction prediction in bioinformatics Smaili et al. (2018a); Myklebust et al. (2019), and so on. However, most of these algorithms focus on creating embeddings for multi-relational graphs composed of triples in RDF (Resource Description Framework) form such as ⟨England, isPartOf, UK⟩ and ⟨UK, hasCapital, London⟩. They do not deal with OWL ontologies (or ontological schemas in OWL), which include not only graph structures but also logic constructors such as class disjointness, existential and universal quantification (e.g., a country must have at least one city as its capital), and metadata such as the synonyms, definitions and comments of a class. OWL ontologies have been widely used in many domains such as bioinformatics and the Semantic Web Myklebust et al. (2019); Horrocks (2008). They are capable of expressing complex domain knowledge and managing large-scale domain vocabularies, and can often improve the quality and usability of the KG Paulheim and Gangemi (2015).

Inspired by the success of KG embeddings, more recently there has been a growing interest in embedding simple ontological schemas consisting, e.g., of hierarchical classes, and property domain and range Hao et al. (2019); Moon et al. (2017); Alshargi et al. (2018); Guan et al. (2019); however, these methods rely on having a large number of facts (i.e., an ABox), and do not support more expressive OWL ontologies which contain some widely used logic constructors such as the class disjointness and the existential quantification mentioned above. Embeddings for OWL ontologies have started to receive some attention as well. Kulmanov et al. (2019) and Garg et al. (2019) proposed to model the semantics of the logic constructors by geometric learning, but their models only support some of the logic constructors from the description logics (DLs) $\mathcal{EL}^{++}$ (which is closely related to OWL EL, a fragment of OWL) and $\mathcal{ALC}$, respectively. Moreover, both methods consider only the logical and graph structure of an ontology, and ignore the lexical information that widely exists in its metadata (e.g., rdfs:label and rdfs:comment triples). OPA2Vec Smaili et al. (2018b) considers the ontology’s lexical information by learning a language model which encodes statistical correlations between items in a corpus. However, it treats each axiom as a sentence and fails to explore and utilize the semantic relationships between axioms. OWL2Vec Holter et al. (2019), which is our very preliminary work before OWL2Vec*, captures the semantics of OWL ontologies by exploring the neighborhoods of classes, and learning embeddings using a language model. This was shown to be quite effective, but it does not fully exploit the lexical and (onto)logical semantics available in OWL ontologies.

In this work we have extended OWL2Vec in order to provide a more general and robust OWL ontology embedding framework which we call OWL2Vec*. OWL2Vec* exploits an OWL (or OWL 2) ontology by walking over its graph forms and generates a corpus of three documents that capture different aspects of the semantics of the ontology: (i) the graph structure and the logic constructors, (ii) the lexical information (e.g., entity names, comments and definitions), and (iii) a combination of the lexical information, graph structure and logical constructors. Finally, OWL2Vec* uses a neural language model to create embeddings of both entities and words from the generated corpus. Note that the OWL2Vec* framework is compatible with different neural language models, although the current implementation adopts the skip-gram model which is used in Word2Vec Mikolov et al. (2013b).

We have evaluated OWL2Vec* in two case studies, class membership prediction and class subsumption prediction, using three large-scale real-world ontologies: a healthy lifestyle ontology named HeLis Dragoni et al. (2018), a food ontology named FoodOn Dooley et al. (2018) and the Gene Ontology (GO) Consortium (2008). In the case studies we empirically analyze the impact of (i) different document and embedding settings which correspond to combinations of the semantics of the graph structure, lexical information and logic constructors, (ii) different graph structure exploration settings (e.g., the transformation methods from OWL ontology to graph, and the graph walking strategies), (iii) ontology entailment reasoning, and (iv) language model pre-training. The results suggest that OWL2Vec* can achieve significantly better performance than the baselines including the state-of-the-art ontology embeddings Kulmanov et al. (2019); Garg et al. (2019); Smaili et al. (2018b); Holter et al. (2019) and some classic KG embeddings such as RDF2Vec Ristoski and Paulheim (2016), TransE Bordes et al. (2013) and DistMult Yang et al. (2014). We also calculated the Euclidean distance between entities and visualized the embeddings of some example entities to analyze different embedding methods.

The remainder of the paper is organized as follows. The next section introduces the preliminaries including both background and related work. Section 3 introduces the technical details of OWL2Vec* as well as the case studies. Section 4 presents the experiments and the evaluation results. The last section concludes and discusses future work.

2 Preliminaries

2.1 OWL Ontologies

Our OWL2Vec* embedding targets OWL ontologies Bechhofer et al. (2004), which are based on the $\mathcal{SROIQ}$ description logic (DL) Baader et al. (2017). Consider a signature $\Sigma = (N_C, N_R, N_I)$, where $N_C$, $N_R$ and $N_I$ are pairwise disjoint sets of, respectively, atomic concepts, atomic roles and individuals. Complex concepts and roles can be composed using DL constructors such as conjunction (e.g., $C \sqcap D$), disjunction (e.g., $C \sqcup D$), existential restriction (e.g., $\exists r.C$) and universal restriction (e.g., $\forall r.C$), where $C$ and $D$ are concepts, and $r$ is a role. An OWL ontology comprises a TBox $\mathcal{T}$ and an ABox $\mathcal{A}$. The TBox is a set of axioms such as General Concept Inclusion (GCI) axioms (e.g., $C \sqsubseteq D$), Role Inclusion (RI) axioms (e.g., $r \sqsubseteq s$) and Inverse Role axioms (e.g., $s \equiv r^{-}$), where $C$ and $D$ are concepts, $r$ and $s$ are roles, and $r^{-}$ denotes the inverse of $r$. The ABox is a set of assertions such as concept assertions (e.g., $C(a)$), role assertions (e.g., $r(a, b)$) and individual equality and inequality assertions (e.g., $a \equiv b$ and $a \not\equiv b$), where $C$ is a concept, $r$ is a role, and $a$ and $b$ are individuals.

In OWL, atomic concepts, roles and individuals are referred to as entities; concepts, roles and individuals are referred to as classes, object properties and instances, respectively. A GCI axiom $C \sqsubseteq D$ corresponds to a subsumption relation between the class $C$ and the class $D$, while a concept assertion $C(a)$ corresponds to a membership relation between the instance $a$ and the class $C$. Each entity in an OWL ontology is uniquely represented by a Uniform Resource Identifier (URI). These URIs may be lexically ‘meaningful’ (e.g., vc:AlcoholicBeverages in Figure 1(a)) or consist of internal IDs that do not carry useful lexical information (e.g., obo:FOODON_00002809 in Figure 1(b)); in either case the intended meaning may also be indicated via annotations (see below).

In OWL, complex classes, complex properties, axioms and assertions can be serialised as (sets of) RDF triples. These triples use a combination of bespoke object properties (e.g., vc:hasNutrient) and RDF, RDFS and OWL built-in properties (e.g., rdfs:subClassOf, rdf:type and owl:someValuesFrom). In Figure 1, for example, the relationship between the instances vc:FOOD-4001 and vc:VitaminC_100 is represented by a triple using the property vc:hasNutrient, while the existential restriction involving the class obo:FOODON_00002809 and the object property obo:RO_0001000 is represented by triples using the OWL built-in properties owl:Restriction, owl:onProperty and owl:someValuesFrom. As in RDF, the object of an OWL role assertion triple can also be a literal value; for example, the calorie amount of vc:FOOD-4001 (Blonde Beer) is represented by a triple using the bespoke data property vc:amountCalories and a literal value of type xsd:double.

In addition to axioms and assertions with formal logic-based semantics, an ontology often contains metadata in the form of annotation axioms. These annotations can also be represented in triple form using annotation properties as predicates; e.g., the class obo:FOODON_00002809 is annotated using rdfs:label to specify a name string, using rdfs:comment to specify a description, and using obo:IAO_0000115 (a bespoke annotation property) to specify a natural language “definition”.

Figure 1: Fragments of the ontologies: (a) the HeLis ontology; (b) the FoodOn ontology.

A knowledge graph (KG) is a structured knowledge resource which is often expressed as a set of RDF triples Hogan et al. (2020). Many KGs only contain instances and facts, which are equivalent to an OWL ontology ABox. Some other KGs such as DBpedia Auer et al. (2007) are also enhanced with a schema which is equivalent to the TBox of an OWL ontology. Thus, a KG can often be understood as an ontology.

2.2 Semantic Embedding

Semantic embedding refers to a series of representation learning (or feature learning) techniques that encode the semantics of data such as sequences and graphs into vectors, such that they can be utilized by downstream machine learning prediction and statistical analysis tasks Bengio et al. (2013). Neural language models such as Feed-Forward Neural Networks, Recurrent Neural Networks and Transformers are widely used for semantic embedding, and they have shown good performance in embedding the context (e.g., item co-occurrence) in sequences Mikolov et al. (2013a); Peters et al. (2018); Devlin et al. (2019). Two classic auto-encoding architectures for learning representations of sequential items are continuous skip-gram and continuous Bag-of-Words (CBOW) Mikolov et al. (2013b, a). The former aims at predicting the surroundings of an item, while the latter aims at predicting an item based on its surroundings. Word2Vec is a well known group of neural language models for learning word embeddings from a large corpus, and was initially developed by a team at Google; it can be configured to use either skip-gram or CBOW architectures Mikolov et al. (2013b, a).
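The difference between the two architectures can be illustrated by the training examples each one derives from a sentence. The following is a minimal sketch, not the Word2Vec implementation itself; the function names `skipgram_pairs` and `cbow_examples` and the toy sentence are assumptions made for illustration.

```python
def skipgram_pairs(sentence, window):
    """Skip-gram: each (center, context) pair asks the model to predict
    one surrounding item from the center item."""
    pairs = []
    for i, center in enumerate(sentence):
        lo = max(0, i - window)
        hi = min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sentence[j]))
    return pairs


def cbow_examples(sentence, window):
    """CBOW: each (context, center) example asks the model to predict
    the center item from all of its surrounding items."""
    examples = []
    for i, center in enumerate(sentence):
        lo = max(0, i - window)
        hi = min(len(sentence), i + window + 1)
        context = [sentence[j] for j in range(lo, hi) if j != i]
        examples.append((context, center))
    return examples


# Hypothetical toy sentence.
sentence = ["beer", "has", "nutrient"]
pairs = skipgram_pairs(sentence, window=1)
examples = cbow_examples(sentence, window=1)
```

With a window of 1, the middle item "has" contributes two skip-gram pairs but only one CBOW example whose context holds both neighbours.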

Semantic embedding has also been extended to KGs composed of role assertions Wang et al. (2017). The entities and relations (object properties) are represented in a vector space while retaining their relative relationships (semantics), and the resulting vectors are then applied to downstream tasks including link prediction Rossi et al. (2020), entity alignment Sun et al. (2020), and erroneous fact detection and correction Chen et al. (2020a). One paradigm for learning KG representations is computing the embeddings in an end-to-end manner, iteratively adjusting the vectors using an optimization algorithm to minimize the overall loss across all the triples, where the loss is usually calculated by scoring the truth/falsity of each triple (positive and negative samples). Algorithms based on this technique include translation based models such as TransE Bordes et al. (2013) and TransR Lin et al. (2015) and latent factor models such as DistMult Yang et al. (2014).

Another paradigm is to first explicitly explore the neighborhoods of entities and relations in the graph, and then learn the embeddings using a language model. Two representative algorithms based on this paradigm are node2vec Grover and Leskovec (2016) and Deep Graph Kernels Yanardag and Vishwanathan (2015). The former extracts random graph walks as the corpus for training skip-gram or CBOW models, while the latter uses graph kernels such as Weisfeiler-Lehman (WL) subtree kernels as the corpus. However, both embedding algorithms were originally developed for undirected graphs, and thus may have limited performance when directly applied to KGs. RDF2Vec addresses this issue by extending the idea of the above two algorithms to directed labeled RDF graphs, and has been shown to learn effective embeddings for large-scale KGs such as DBpedia Ristoski and Paulheim (2016); Ristoski et al. (2019). Recent studies have explored the use of new neural language models for learning embeddings; one example is RW-LMLM, which combines a random walk algorithm with a Transformer model Wang et al. (2019).

Our OWL2Vec* technique belongs to the language model paradigm, but we focus on OWL ontologies instead of typical KGs, with the goal of preserving the semantics not only of the graph structure, but also of the lexical information and the logical constructors. Note that the graph of an ontology, which includes a hierarchical categorization structure, differs from the multi-relational graph composed of role assertions of a typical KG; furthermore, the ontology’s lexical information and logical constructors cannot be successfully exploited by the aforementioned KG embedding methods.

2.3 Ontology Embedding

The use of machine learning prediction and statistical analysis with ontologies is receiving wider attention, and some approaches to embedding the semantics of OWL ontologies can already be found in the literature. Unlike typical KGs, OWL ontologies include not only graph structure but also logical constructors, and entities are often augmented with richer lexical information specified using rdfs:label, rdfs:comment and many other bespoke or built-in annotation properties. The objective of OWL ontology embedding in this study is to represent each OWL named entity (class, instance or property) by a vector, such that the inter-entity relationships indicated by the above information are kept in the vector space, and the performance of the downstream tasks, where the input vectors can be understood as learned features, is maximized.

EL Embedding Kulmanov et al. (2019) and Quantum Embedding Garg et al. (2019) are two OWL ontology embedding algorithms of the end-to-end paradigm. They construct specific score functions and loss functions for logical axioms from $\mathcal{EL}^{++}$ and $\mathcal{ALC}$, respectively, by transforming logical relations into geometric relations. This encodes the semantics of the logical constructors, but ignores the additional semantics provided by the lexical information of the ontology. Moreover, although the graph structure is explored by considering class subsumption and class membership axioms, the exploration is incomplete as it uses only rdfs:subClassOf and rdf:type edges, and ignores edges involving other relations.

Onto2Vec Smaili et al. (2018a) and OPA2Vec Smaili et al. (2018b) are two ontology embedding algorithms of the language model paradigm using a neural language model of either the skip-gram architecture or the CBOW architecture. Onto2Vec uses the axioms of an ontology as the corpus for training, while OPA2Vec complements the corpus of Onto2Vec with the lexical information provided by, e.g., rdfs:comment. They have been evaluated with the Gene Ontology for predicting protein-protein interaction (i.e., a domain-specific relationship between classes), which is quite different from the class membership prediction and the class subsumption prediction in this study. Both methods treat each axiom as a sentence, which means that they cannot explore the correlation between axioms. This makes it hard to fully explore the graph structure and the logical relation between axioms, and may also lead to the problem of corpus shortage for small to medium scale ontologies. OWL2Vec* deals with the above issues of OPA2Vec and Onto2Vec by complementing their axiom corpus with a corpus generated by walking over RDF graphs that are transformed from the OWL ontology with its graph structure and logical constructors considered. In addition, to fully utilize the lexical information, OWL2Vec* creates embeddings not only for the ontology entities, as current KG/ontology embedding methods do, but also for the words in the lexical information.

3 Methodology

Figure 2 presents the overall framework of OWL2Vec*, which mainly consists of two core steps: (i) corpus extraction from the ontology, and (ii) language model training with the corpus and entity embedding. The corpus includes a structure document, a lexical document, and a document combining the structure and the lexical information. The former two aim at exploring the ontology’s graph structure, logical constructors and lexical information, while the third aims at preserving the correlation between entities (URIs) and their lexical labels (words). Briefly, given an input ontology $\mathcal{O}$ and the target entities $E$ of $\mathcal{O}$ for embedding, OWL2Vec* outputs a vector for each entity $e$ in $E$, denoted as $\mathbf{e} \in \mathbb{R}^{d}$, where $d$ is the (configurable) embedding dimension. Note that $E$ can be all the entities in $\mathcal{O}$ or just a part needed for a specific application. For class membership prediction we set $E$ to all the named classes and instances; for class subsumption prediction we set $E$ to all the named classes.

Figure 2: The overall framework of OWL2Vec*.

3.1 From OWL Ontology to RDF Graph

Table 1: Projection rules, based on Soylu et al. (2018); Holter et al. (2019), used in the second strategy to generate an RDF graph. Columns: Axiom of Condition 1; Axiom or Triple(s) of Condition 2; Projected Triple(s). The rules range over atomic concepts, roles (object properties) and their inverses, individuals (instances), and the top concept (defined by owl:Thing); they cover, among others, property domain and range axioms and conditions requiring that certain triples have already been projected.

OWL2Vec* incorporates two strategies to turn the original OWL ontology $\mathcal{O}$ into a graph $\mathcal{G}$ in RDF form. The first strategy implements the transformation according to the OWL to RDF Graph Mapping defined by the W3C. For example, the existential restriction of the class obo:FOODON_00002809 in Figure 1(b), namely ObjectSomeValuesFrom(obo:RO_0001000, obo:FOODON_03411347), is transformed into four triples, i.e., ⟨obo:FOODON_00002809, rdfs:subClassOf, _:x⟩, ⟨_:x, owl:someValuesFrom, obo:FOODON_03411347⟩, ⟨_:x, rdf:type, owl:Restriction⟩ and ⟨_:x, owl:onProperty, obo:RO_0001000⟩, where _:x denotes a blank node. The second strategy is based on the projection rules proposed in Soylu et al. (2018); Holter et al. (2019) (see Table 1). Every RDF triple $\langle X, r, Y \rangle$ in the projection (where $r$ is an object property, and $X$ and $Y$ are atomic concepts or instances) is justified by one or more axioms in the ontology. For example, the above-mentioned existential restriction of the class obo:FOODON_00002809 would be represented with ⟨obo:FOODON_00002809, obo:RO_0001000, obo:FOODON_03411347⟩. This strategy avoids the use of blank nodes in the RDF graph; but, unlike the first strategy, it only approximates the logical constructors of the OWL ontology.

Both strategies can incorporate an OWL reasoner to compute the TBox classification and ABox realization before $\mathcal{O}$ is transformed into an RDF graph $\mathcal{G}$. Such reasoning grounds the axioms of logical constructors and leads to explicit representation of some hidden knowledge. In our experiments we use the HermiT reasoner Glimm et al. (2014), and we evaluate the impact of enabling or disabling reasoning.
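To make the contrast between the two transformation strategies concrete, the sketch below encodes a single existential restriction both ways using plain Python tuples. It covers only this one axiom shape; the function names `w3c_mapping` and `projection` are hypothetical and are not taken from the OWL2Vec* code base.

```python
def w3c_mapping(sub_class, prop, filler, blank="_:x"):
    """Encode 'sub_class SubClassOf (prop some filler)' in the style of the
    OWL to RDF Graph Mapping: a blank node stands for the restriction."""
    return [
        (sub_class, "rdfs:subClassOf", blank),
        (blank, "rdf:type", "owl:Restriction"),
        (blank, "owl:onProperty", prop),
        (blank, "owl:someValuesFrom", filler),
    ]


def projection(sub_class, prop, filler):
    """Encode the same axiom with a single projected triple (no blank node).
    The quantifier is dropped, so the logic is only approximated."""
    return [(sub_class, prop, filler)]


mapped = w3c_mapping("obo:FOODON_00002809", "obo:RO_0001000", "obo:FOODON_03411347")
projected = projection("obo:FOODON_00002809", "obo:RO_0001000", "obo:FOODON_03411347")
```

The mapped form keeps the restriction explicit at the cost of a blank node on every path between the two classes, whereas the projected form links them directly.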

3.2 Structure Document

The structure document aims at capturing both the graph structure and the logical constructors of the ontology. Given the RDF graph $\mathcal{G}$, one option is to compute random walks for each target entity in $E$. Each walk, which is a sequence of entity URIs, acts as a sentence of the structure document. An example of a random walk of depth four starting from the class vc:Beer in Figure 1(a) is (vc:Beer, rdf:type, vc:FOOD-4001, vc:hasNutrient, vc:VitaminC_100). Another option is to use the Weisfeiler-Lehman (WL) RDF sub-tree kernel, which encodes the structure of a sub-tree into a unique identity to enable the comparison of sub-tree structures. Briefly, the WL sub-tree kernel solution replaces the final entity of each random walk with the kernel (identity) of the sub-tree rooted in this entity.
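The random walk option can be sketched as follows. This is an illustrative simplification and not the walker used in the experiments: the graph is a plain adjacency dictionary, the toy edges are stored in the direction in which the example walk traverses them, and the parameter `max_hops` (number of edges followed) is an assumed name.

```python
import random


def random_walk(graph, start, max_hops, rng):
    """Follow up to max_hops randomly chosen outgoing edges from start.
    The walk alternates entity, predicate, entity, ... and is returned
    as one sentence (a list of URIs)."""
    walk = [start]
    node = start
    for _ in range(max_hops):
        edges = graph.get(node, [])
        if not edges:
            break  # dead end: stop the walk early
        predicate, node = rng.choice(edges)
        walk.extend([predicate, node])
    return walk


# Hypothetical toy graph: node -> list of (predicate, neighbour).
graph = {
    "vc:Beer": [("rdf:type", "vc:FOOD-4001")],
    "vc:FOOD-4001": [("vc:hasNutrient", "vc:VitaminC_100")],
}
sentence = random_walk(graph, "vc:Beer", max_hops=2, rng=random.Random(0))
```

Passing an explicit `random.Random` instance keeps the generated corpus reproducible across runs.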

To capture the logical constructors, OWL2Vec* extracts all the axioms of the ontology as a complement of the sentences of the structure document, where each axiom is transformed into a sequence following the OWL Manchester Syntax. For example, the axiom of the existential restriction of the class obo:FOODON_00002809 in Figure 1(b) is transformed into the sequence (obo:FOODON_00002809, subClassOf, obo:RO_0001000, some, obo:FOODON_03411347).

3.3 Lexical Document

The lexical document includes word sentences transformed from the entity URI sentences in the structure document and the relevant lexical annotation axioms in the ontology. For the former, given an entity URI sentence, each of its entities is replaced by its English label defined by rdfs:label. Note that the label is parsed and transformed into lowercase tokens, and tokens containing non-letter characters are filtered out, before it replaces the entity URI. It is possible that some entities have no annotations or no English annotations, such as the class vc:MilkAndYogurt and the instance vc:VitaminC_1000 in Figure 1(a). In this case, we use the name part of the URI, assuming that the name follows camel case. As an example of the transformation, the above-mentioned random walk (vc:Beer, rdf:type, vc:FOOD-4001, vc:hasNutrient, vc:VitaminC_100) is transformed into (“beer”, “type”, “blonde”, “beer”, “has”, “nutrient”, “vitamin”, “c”).

For the latter, OWL2Vec* selects the annotation axioms that use bespoke annotation properties such as obo:IAO_0000115 (definition) and those that use built-in annotation properties such as rdfs:comment. Note that annotation axioms using rdfs:label are ignored as these labels are already considered in replacing the entity URIs mentioned above. More specifically, for each annotation axiom, OWL2Vec* replaces the subject entity by its English label or URI name as in transforming the URI sentence, and keeps the lowercase word tokens parsed from the annotation value. For example, the axiom (obo:FOODON_00002809, obo:IAO_0000115, “Edamame is a preparation of immature soybean …”) is turned into (“edamame”, “edamame”, “is”, “a”, “preparation”, “of”, “immature”, “soybean”, …), which can help build a correlation between “soybean” and “edamame”.
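The replacement of URIs by word tokens described in this section can be sketched as below. The regular expression, the helper names `entity_tokens` and `lexical_sentence`, and the label dictionary are assumptions for illustration; in particular, this sketch splits mixed letter-digit tokens and then drops the digit parts, which may differ in detail from the filtering applied in the actual implementation.

```python
import re

# Splits camel case and separators: "hasNutrient" -> has, Nutrient;
# "VitaminC_100" -> Vitamin, C, 100.
_PARTS = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def entity_tokens(uri, labels):
    """Lowercase word tokens of an entity: taken from its English label if
    one is known, otherwise from the name part of its URI."""
    text = labels.get(uri) or re.split(r"[#/:]", uri)[-1]
    return [p.lower() for p in _PARTS.findall(text) if p.isalpha()]


def lexical_sentence(uri_sentence, labels):
    """Turn a URI sentence into a word sentence by concatenating the tokens
    of each entity in order."""
    words = []
    for uri in uri_sentence:
        words.extend(entity_tokens(uri, labels))
    return words


# Hypothetical label table: only vc:FOOD-4001 has an rdfs:label here.
labels = {"vc:FOOD-4001": "Blonde Beer"}
walk = ["vc:Beer", "rdf:type", "vc:FOOD-4001", "vc:hasNutrient", "vc:VitaminC_100"]
words = lexical_sentence(walk, labels)
```

Entities without a label, such as vc:hasNutrient in this toy table, fall back to their URI name and still yield usable tokens.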

3.4 Combined Document

OWL2Vec* further extracts a combined document from the structure document and the entity annotations, so as to preserve the correlation between entities (URIs) and words in the lexical information. To this end, we developed two strategies to deal with each URI sentence in the structure document. One strategy is to randomly select an entity in a URI sentence, keep the URI of this entity, and replace the other entities of this sentence by their lowercase word tokens extracted from their labels or URI names as in the creation of the lexical document. For example, for the URI sentence (vc:FOOD-4001, vc:hasNutrient, vc:VitaminC_100), if the first entity is selected, then the generated combined sentence is (vc:FOOD-4001, “has”, “nutrient”, “vitamin”, “c”), which can help build correlations between vc:FOOD-4001 and words such as “nutrient” and “vitamin”. The other strategy is to traverse all the entities in a URI sentence. For each entity, it generates a combined sentence by keeping the URI of this entity, and replacing the others by their lowercase word tokens as in the random strategy. Thus for one URI sentence, it generates $m$ combined sentences, where $m$ is the number of entities of the URI sentence.

On the one hand the combined document captures the correlation between URIs and words, which may benefit the embedding of URIs with word semantics. On the other hand it may add noise for the correlation among words. The impact of the combined document and its two strategies is analyzed in our evaluation (cf. Section 4.3.1).
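The two strategies for building combined sentences can be sketched as follows. The helper names and the precomputed `tokens_of` table are hypothetical; the sketch assumes that the word tokens of every entity have already been extracted as for the lexical document.

```python
import random


def _combine(uri_sentence, tokens_of, keep):
    """Keep the URI at position keep and replace every other entity by its
    word tokens."""
    out = []
    for i, uri in enumerate(uri_sentence):
        if i == keep:
            out.append(uri)
        else:
            out.extend(tokens_of[uri])
    return out


def combined_random(uri_sentence, tokens_of, rng):
    """Random strategy: one combined sentence with one randomly kept URI."""
    return _combine(uri_sentence, tokens_of, rng.randrange(len(uri_sentence)))


def combined_all(uri_sentence, tokens_of):
    """Traversal strategy: one combined sentence per entity, so a sentence
    with m entities yields m combined sentences."""
    return [_combine(uri_sentence, tokens_of, k) for k in range(len(uri_sentence))]


# Hypothetical token table for the example sentence.
tokens_of = {
    "vc:FOOD-4001": ["blonde", "beer"],
    "vc:hasNutrient": ["has", "nutrient"],
    "vc:VitaminC_100": ["vitamin", "c"],
}
uri_sentence = ["vc:FOOD-4001", "vc:hasNutrient", "vc:VitaminC_100"]
all_sentences = combined_all(uri_sentence, tokens_of)
one_sentence = combined_random(uri_sentence, tokens_of, random.Random(0))
```

The traversal strategy multiplies the corpus size by the sentence length, which is one reason the two strategies are worth comparing empirically.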

3.5 Embeddings

OWL2Vec* first merges the structure document, the lexical document and the combined document into one document, and then uses this document to train a Word2Vec neural language model with the skip-gram architecture. Training stops when the loss becomes stable. The hyperparameter of the minimum count of words is set to 1 such that each word or entity (URI) is encoded as long as it appears in the documents at least once. Optionally, we can pre-train the Word2Vec model on a large and general corpus such as a dump of Wikipedia articles. This brings some prior correlations between words, especially between a word’s synonyms and between a word’s variants, which enables the downstream machine learning tasks to identify their semantic equality or similarity in the word vector space. However, such prior correlations may also be noisy and play a negative role in a domain-specific task (cf. the evaluation in Section 4.3.4). Note that OWL2Vec* is compatible with the CBOW architecture and other neural language models, but their selection and evaluation is out of the scope of this study.

With the trained neural language model, OWL2Vec* calculates the embedding of each target entity $e$ in $E$. Its embedding $\mathbf{e}$ is the concatenation of $V_{uri}(e)$ and $V_{word}(e)$, where $V_{uri}(e)$ is the vector of the URI of $e$, and $V_{word}(e)$ is the average of the vectors of all the lowercase word tokens of $e$. As in the case of constructing lexical sentences from URI sentences, the word tokens of $e$ are extracted from its English label if such a label exists, or from its URI name otherwise. Due to the concatenation, the embedding size of $\mathbf{e}$, i.e., $d$, is twice the original embedding size of the neural language model. $V_{uri}(e)$ and $V_{word}(e)$ can also be used independently. A comparison of their performance can be found in Section 4.3.1.
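The concatenation of the URI vector and the averaged word vector can be written down in a few lines. This is a dependency-free sketch over a plain dictionary of vectors; the function name `entity_embedding` and the zero-vector fallback for items missing from the model are assumptions, not behaviour confirmed by the text.

```python
def entity_embedding(uri, tokens, vectors, dim):
    """Concatenate the URI vector with the average of the word vectors.
    vectors maps URIs and words to lists of length dim; the result has
    length 2 * dim."""
    zero = [0.0] * dim  # assumed fallback for unseen URIs or words
    v_uri = list(vectors.get(uri, zero))
    known = [vectors[t] for t in tokens if t in vectors]
    if known:
        v_word = [sum(column) / len(known) for column in zip(*known)]
    else:
        v_word = zero
    return v_uri + v_word


# Hypothetical 2-dimensional model output.
vectors = {
    "vc:FOOD-4001": [1.0, 2.0],
    "blonde": [0.0, 2.0],
    "beer": [2.0, 4.0],
}
embedding = entity_embedding("vc:FOOD-4001", ["blonde", "beer"], vectors, dim=2)
```

Using only the first or the second half of the result corresponds to using the URI vector or the word vector on its own.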

3.6 Case Studies

We applied OWL2Vec* to ontology completion, which first trains a prediction model from known relations (axioms) and then predicts plausible new relations. It includes two tasks: class membership prediction and class subsumption prediction, where the embedding of an entity can be understood as the features automatically learned from its neighbourhood, relevant axioms and lexical information without any supervision.

Given a head entity $e_1$ and a tail entity $e_2$, where $e_1$ is an instance and $e_2$ is a class, the membership prediction task aims at training a model to predict the plausibility that $e_1$ is a member of $e_2$ (i.e., $e_2(e_1)$). The input is the concatenation of the embeddings of $e_1$ and $e_2$, i.e., $\mathbf{x} = [\mathbf{e}_1, \mathbf{e}_2]$, while the output is a score $y$ in $[0, 1]$, where a higher $y$ indicates a more plausible membership relation. For the prediction model, a basic binary machine learning classifier such as Random Forest can be adopted.

In training, the positive training samples (membership axioms) are taken directly from the ontology, while the negative samples are constructed by corrupting each positive sample. Namely, for each positive sample $(e_1, e_2)$, one negative sample $(e_1, e_2')$ is generated, where $e_2'$ is a random class of the ontology and $e_1$ is not a member of $e_2'$ even after entailment reasoning. In prediction, given a head entity (i.e., the target), a candidate set of classes is selected (e.g., all the classes except for the top class owl:Thing, or a subset after filtering via some heuristic rules), each candidate is assigned a score, and the candidates are then ranked according to their scores, where the top-ranked candidate is the most likely class of the instance. Class subsumption prediction is similar to class membership prediction, except that $e_1$ and $e_2$ are both classes, the goal is to predict whether $e_1$ is subsumed by $e_2$ (i.e., $e_1 \sqsubseteq e_2$), and the head entity itself is excluded from the candidate classes.
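The negative sampling and candidate ranking steps can be sketched independently of the classifier. In the sketch below the names `corrupt` and `rank_candidates`, the `entailed` set of known or inferred (head, class) pairs, and the toy scoring function are all assumptions; any binary classifier that returns a plausibility score could take the place of `score`.

```python
import random


def corrupt(positives, classes, entailed, rng):
    """For each positive (head, tail), draw one random class that the head
    does not belong to, even according to the entailed pairs."""
    negatives = []
    for head, _tail in positives:
        options = [c for c in classes if (head, c) not in entailed]
        if options:
            negatives.append((head, rng.choice(options)))
    return negatives


def rank_candidates(head, candidates, score):
    """Sort candidate classes by descending predicted plausibility."""
    return sorted(candidates, key=lambda c: score(head, c), reverse=True)


# Hypothetical toy data.
positives = [("a", "C1")]
classes = ["C1", "C2", "C3"]
entailed = {("a", "C1"), ("a", "C2")}  # C2 is assumed to be inferred by a reasoner
negatives = corrupt(positives, classes, entailed, random.Random(0))

toy_scores = {("a", "C1"): 0.9, ("a", "C2"): 0.6, ("a", "C3"): 0.1}
ranking = rank_candidates("a", classes, lambda h, c: toy_scores[(h, c)])
```

Filtering against the entailed pairs, rather than only the asserted ones, avoids labelling an inferable membership as a negative sample.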

4 Evaluation

4.1 Experimental Setting

We evaluated OWL2Vec* on class membership prediction with the HeLis ontology Dragoni et al. (2018), and on class subsumption prediction with the FoodOn ontology Dooley et al. (2018) and the Gene Ontology (GO). HeLis captures general knowledge about both food and healthy lifestyles, FoodOn captures more detailed knowledge about food, and GO is a major bioinformatics initiative to unify the representation of gene and gene product attributes. Some statistics of the three ontologies are shown in Table 2. Due to different knowledge representations, HeLis has a large number of membership axioms but a very small number of subsumption axioms, while FoodOn and GO have only subsumption axioms. This is the reason why we evaluated membership prediction on HeLis, and subsumption prediction on FoodOn and GO. Data and code are available at https://github.com/KRR-Oxford/OWL2Vec-Star.

Table 2: Statistics of the HeLis ontology, the FoodOn ontology and the GO ontology. Columns: DL Expressivity; # Instances; # Classes; # Axioms; # Membership; # Subsumption.

The experiments on membership and subsumption prediction use the following setting: all the explicitly declared class membership axioms (or class subsumption axioms) are randomly divided into three sets for training, validation and testing, respectively. For each axiom in the validation/testing set, the head entity (i.e., an instance for membership prediction and a class for subsumption prediction) is the target whose class is to be predicted from all the candidates and compared against the tail entity (as the ground truth class) in evaluation. All the candidates are ranked according to the predicted score, which indicates the likelihood of being the head entity’s class. We calculate the following widely adopted metrics: Hits@$k$ at three cut-off values $k$, and MRR (Mean Reciprocal Rank). The Hits@$k$ metrics measure the recall of the ground truths within the top $k$ ranking positions, while MRR averages the reciprocals of the ranking positions of the ground truths. The higher the metrics, the better the performance.
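Both ranking metrics follow directly from the 1-based rank of each ground truth class among the candidates. The sketch below uses assumed function names and made-up ranks; the cut-off values shown are illustrative only.

```python
def hits_at_k(ranks, k):
    """Fraction of test samples whose ground truth is ranked within the top k.
    ranks holds 1-based ranking positions."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1 / rank over all test samples."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical ranks of the ground truth class for three test axioms.
ranks = [1, 2, 10]
h1 = hits_at_k(ranks, 1)
h5 = hits_at_k(ranks, 5)
mrr = mean_reciprocal_rank(ranks)
```

For these ranks, one of three ground truths is at the top and two of three fall within the top five, while MRR additionally rewards the sample ranked second over the one ranked tenth.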

The performance of OWL2Vec* is reported with the following settings. If not specified, OWL2Vec* uses OWL to RDF Graph Mapping without entailment reasoning. For the Word2Vec model, the dimension is set to if no pre-training is adopted, and otherwise set to be consistent with the pre-trained model (we used a model pre-trained on a 2019 Wikipedia dump with a dimension of ); the window size is set to ; the minimum count of words is set to 1; the iteration number of training is set to , which is based on the observation of the loss. Random Forest is adopted as the basic binary classifier. Other hyperparameters such as the walking strategy (WL sub-tree kernel or random walk) and the walking depth, as well as the hyperparameters of the baselines, are adjusted through the validation set: the setting that leads to the highest MRR on the validation set is adopted.

The evaluation is organized as follows. We first compare OWL2Vec* with the baselines, then analyze the impact of different settings including the type of document, the use of reasoning, the selection of URI and word embeddings, and the adoption of pre-trained embeddings, and finally analyze the embeddings via visualization and comparison of Euclidean distances. The selected baselines include (i) four well-known knowledge graph embedding methods, i.e., RDF2Vec, TransE, TransR and DistMult, (ii) four state-of-the-art ontology embedding methods, i.e., Onto2Vec, OPA2Vec, EL Embedding and Quantum Embedding, (iii) the original OWL2Vec, which is equivalent to OWL2Vec* using the URI embedding, structure document and ontology projection rules, and (iv) the pre-trained Word2Vec model. The embeddings of these baselines are applied to the two tasks in the same way as OWL2Vec*. Note that RDF2Vec, TransE, TransR and DistMult are trained with the RDF graph using OWL to RDF Graph Mapping without entailment reasoning, while the pre-trained Word2Vec calculates the average word vector of an entity according to its label (or its URI name if the label does not exist) as in OWL2Vec*.

4.2 Comparison with Baselines

Table 3 reports the performance of OWL2Vec, with the setting optimized via the validation set. It shows that OWL2Vec outperforms all the baselines. Note that the performance of OWL2Vec under different settings is reported in Section 4.3. Among the ontology embedding and KG embedding baselines, which directly calculate the URI’s vector without considering the word vector, OPA2Vec achieves the best performance on FoodOn and GO for subsumption prediction, while the KG embedding method RDF2Vec performs the best on HeLis for class membership prediction. In contrast, the two logic embedding methods Quantum Embedding and EL Embedding, as well as TransE, perform poorly on all three ontologies. Our preliminary work OWL2Vec achieves promising results on HeLis (close to RDF2Vec) and FoodOn (close to OPA2Vec), but performs poorly on GO. OWL2Vec outperforms both KG embedding methods and ontology embedding methods; for example, it has higher Hits@ than RDF2Vec on HeLis, and higher Hits@ than OPA2Vec on FoodOn.

Meanwhile, OWL2Vec also outperforms the pre-trained Word2Vec, with , and higher MRR on HeLis, FoodOn and GO, respectively. It is interesting to see that the pre-trained Word2Vec using entity labels or URI names achieves good performance, outperforming those ontology and KG embedding baselines such as RDF2Vec and OPA2Vec. It means that the lexical information plays a very important role in embedding real world ontologies, especially for membership prediction and subsumption prediction as the names of instances and classes with a membership or subsumption relationship often use some common words, synonyms or word variants. This is verified by our following analysis on different settings of OWL2Vec (see , and in Table 4). A key difference between OWL2Vec and Word2Vec is that the word embedding of OWL2Vec is trained by an ontology tailored corpus underpinned by its graph structure and logical axioms.

Note that the performance of membership prediction with HeLis is much higher than that of subsumption prediction with FoodOn and GO. This is because the former has far fewer candidate classes (cf. Table 2) and is thus less challenging.

Table 3: Overall results of OWL2Vec and the baselines. Panel (a) covers membership prediction; panel (b) covers subsumption prediction on FoodOn and GO. Both panels report MRR and Hits@ at three cutoffs for each method.

4.3 Analysis of OWL2Vec Settings

Lexical Information

According to Table 4 we can find that the lexical document leads to a significant improvement of performance when it is merged with the structure document (i.e., ). The MRR of outperforms by on HeLis, by on FoodOn and by on GO when the URI embedding () is used, and by , and respectively when both URI embedding and word embedding () are used.

Unlike the lexical document, the combined documents ( and ), which also rely on the lexical information of the ontology, lead to a limited positive impact. For class membership prediction, the best performance of and the best performance of are both very close to the best performance as , while for class subsumption prediction, they are both worse than the best performance of . We find that the combined document has a positive impact when the URI embedding alone is adopted, but often has a negative impact when the word embedding is concatenated or used alone. This is because the combined sentences build the correlation between words and URIs, which benefits the URI embedding, but brings noise to the correlation between words. The traversal combination strategy, which corrupts more word correlations, has a similar impact to the random combination strategy on HeLis, but a more negative impact on FoodOn.

Table 4: The results of OWL2Vec under different document () and embedding () settings. Panel (a) covers membership prediction; panel (b) covers subsumption prediction on FoodOn and GO. Both panels report MRR and Hits@ at three cutoffs for each setting. Subscripts: (resp. ) denotes the structure (resp. lexical) document; (resp. ) denotes the combined document with the random (resp. traversal) strategy; (resp. ) denotes the URI (resp. word) embedding.

Besides the lexical document, the word embedding which also benefits from the utilization of the lexical information of the ontology shows a very strong positive impact. On the one hand, as discussed in Section 4.2, the two methods that use the word embedding, i.e., OWL2Vec and the pre-trained Word2Vec, both dramatically outperform the remaining methods. On the other hand, in Table 4, the best performance on HeLis comes from , while the best performance on FoodOn and GO comes from . The outperformance of and over is quite significant; for example, with the lexical and structure documents, the Hits@ of can be while the Hits@ of is only . The combined document can improve the performance of a little due to the correlation between URIs and words, but the improvement is very limited in comparison with directly using the word embedding.

Regarding the URI embedding, on the one hand it can alone outperform the baselines in Table 3 except for the pre-trained Word2Vec. On the other hand, the impact of the URI embedding when it is concatenated with the word embedding varies from task to task. It has a positive impact on class membership prediction with HeLis; for example, when trained by the structure document and lexical document (), the MRR of is higher than . However, on class subsumption prediction with FoodOn and GO, the URI embedding shows a negative impact.

Graph Structure

Figure 3 shows the performance of the URI embedding of OWL2Vec when it is trained using structure documents extracted under different graph structure exploration settings. We first compare the two solutions that generate the RDF graph: (i) the OWL 2 to RDF Graph Mapping defined by W3C, which leads to redundant blank nodes and longer paths between relevant entities but keeps all the semantics, and (ii) the ontology projection rules, which lead to a more compact graph but only approximate most axioms with logical constructors, at the cost of considerable semantic loss (cf. Section 3.1). On HeLis, the former has a higher MRR in out of cases, and its top MRR value (i.e., ) is also higher than that of the projection rules (i.e., ), while on FoodOn, the former has a higher MRR in out of cases, but its top MRR value is still a bit higher than that of the latter. Therefore, the OWL 2 to RDF Graph Mapping is adopted for OWL2Vec, in contrast to our preliminary work OWL2Vec.

With Figure 3 we can also compare different settings used in extracting URI sentences from the RDF graph. Two observations are made. First, the walking depth is important for both the WL subtree kernel and the random walk. In general, to achieve the best performance, the former needs a smaller walking depth. Considering the OWL 2 to RDF Graph Mapping, the optimal walking depth is three on HeLis and two on FoodOn for the WL subtree kernel, but is four for the random walk. Second, the top MRR of the WL subtree kernel is higher than that of the random walk on both HeLis and FoodOn. This is expected because the WL subtree kernel incorporates the structure information of the subtree of the final entity of a random walk.
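To make the role of the walking depth concrete, the following is a minimal sketch of random-walk extraction of URI sentences from a set of RDF triples. It covers only the plain random walk, not the WL subtree kernel, and it is not the OWL2Vec implementation: the function names, the seeding, and the toy `ex:` triples in the usage note are illustrative assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def build_adjacency(triples: Iterable[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Index outgoing (predicate, object) edges by subject."""
    adj: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((p, o))
    return adj


def random_walks(triples: Iterable[Triple], start: str, depth: int,
                 n_walks: int, seed: int = 0) -> List[List[str]]:
    """Generate n_walks URI sentences from start, each following at most depth edges."""
    rng = random.Random(seed)
    adj = build_adjacency(triples)
    walks = []
    for _ in range(n_walks):
        walk, node = [start], start
        for _ in range(depth):
            edges = adj.get(node)
            if not edges:  # dead end: stop this walk early
                break
            predicate, obj = rng.choice(edges)
            walk.extend([predicate, obj])
            node = obj
        walks.append(walk)
    return walks
```

On the toy triples `("ex:soy_milk", "rdf:type", "ex:SoyProducts")` and `("ex:SoyProducts", "rdfs:subClassOf", "ex:Food")`, a walk of depth one from `ex:soy_milk` stops at `ex:SoyProducts`, whereas a walk of depth two also reaches `ex:Food`, which shows how a larger depth brings more distant entities into the same sentence.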

(a) Membership Prediction (HeLis)
(b) Subsumption Prediction (FoodOn)
Figure 3: Comparison of structure documents by different graph structure exploration settings, where the results of MRR of OWL2Vec ( + ) on HeLis and FoodOn are reported.

Logical Constructors

On the one hand, the performance of the baselines in Table 3 which adopt the logical structure alone, including EL Embedding, Quantum Embedding and Onto2Vec, is relatively poor in comparison with the other methods. On the other hand, the logical structure has a positive impact when it works together with the graph structure. Note that the difference between OWL2Vec with the setting of the structure document and the URI embedding (i.e., + ) and RDF2Vec is that the former additionally uses axiom sentences. From the results in Table 3 and Table 4, we can see that the former achieves (resp. ) higher MRR than the latter on class membership prediction (resp. class subsumption prediction).

We also analyzed the impact of using reasoning (provided by OWL 2 reasoner HermiT) before the ontology is transformed into an RDF graph, as shown in Table 5. We can see that reasoning has a limited impact in the conducted experiments; the MRR results with and without reasoning are quite close w.r.t. all four methods tested.

Setting Onto2Vec OPA2Vec OWL2Vec ( + ) OWL2Vec ( + )
Table 5: Performance with and without reasoning. MRR on membership prediction with HeLis is reported.


It is worth noting that using a pre-trained Word2Vec as an initial language model in OWL2Vec does not help, but dramatically decreases the performance, although the pre-trained Word2Vec itself can achieve a good performance. For example, MRR of OWL2Vec (, ) drops from to on class membership prediction (HeLis) when pre-training is used. In fact, the pre-trained Word2Vec lacks prior correlations involving entity URIs, and its usage also leads to a less compact embedding size. This also indicates that the word correlations in the generated documents, which are underpinned by the graph structure and the logical structure, are tailored to the specific characteristics of the given ontology.

4.4 Interpretation and Visualization

To show that the learned embeddings (i.e., input features of the classifier for membership and subsumption prediction) are discriminative and effective, we analyze the Euclidean distance between the embeddings of the two entities in a membership or subsumption axiom. We calculate the average distance for the positive training axioms and the negative training axioms, for the embeddings learned by OPA2Vec, the pre-trained Word2Vec, and OWL2Vec with two settings, as shown in Fig. 4. Note that a clear difference in Euclidean distance between the entities in the positive axioms and those in the negative axioms is sufficient, but not necessary, for the features to be discriminative. We find that Word2Vec and OWL2Vec with + (i.e., using the structure document, the lexical document and the word embedding) have quite discriminative average distances for all three ontologies: the positive axioms lead to a much shorter average distance than the negative axioms. This is consistent with their good final performance shown above. In particular, for OPA2Vec and OWL2Vec with + (i.e., using the structure document and the URI embedding) on HeLis, the distance is also discriminative. In contrast, however, the positive axioms have a longer average distance than the negative axioms. This is because the instance usually lies at one end of a sequence where it co-occurs with its class (i.e., a walk of the WL subtree kernel of depth for OWL2Vec, or a membership axiom for OPA2Vec), and thus its co-occurrence distance to its class becomes larger than to a random class.

Figure 4: The average Euclidean distance between the class and its instance (resp. subclass) for the positive and negative memberships (resp. subsumptions) used in classifier training. The number above every pair of positive and negative bars is their ratio.
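The distance analysis above can be sketched in a few lines, assuming that the embeddings are available as a dictionary from entity names to vectors and that each axiom is given as a pair of entity names. The direction of the ratio (positive average divided by negative average) is an assumption, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_distance(pairs: List[Pair], emb: Dict[str, Sequence[float]]) -> float:
    """Average distance between the two entities of each axiom."""
    return sum(euclidean(emb[x], emb[y]) for x, y in pairs) / len(pairs)


def distance_ratio(positive: List[Pair], negative: List[Pair],
                   emb: Dict[str, Sequence[float]]) -> float:
    """Ratio of the positive to the negative average distance (assumed direction)."""
    return average_distance(positive, emb) / average_distance(negative, emb)
```

Under this convention, a ratio well below one means that entities in positive axioms are closer to each other than those in negative axioms, while a ratio above one corresponds to the reversed pattern observed for the URI embedding on HeLis.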

We also visualize the embeddings of some example classes/instances via t-SNE Maaten and Hinton (2008) in order to obtain further insights about the quality of the computed embeddings. In Figure 5(a) (for HeLis) we can find two characteristics for the embeddings learned by OWL2Vec with and : (i) the instances of each class are clustered into a compact cluster, and (ii) these instances are very close to their corresponding class. Both characteristics are promising: they verify that the embeddings are discriminative and explain why the embeddings enable a very good performance in membership prediction (e.g., Hits@ is as high as ). The embeddings learned by OPA2Vec and OWL2Vec with and have the first characteristic as well, but the distance of an instance to its class is often longer than its distance to some other class, which is consistent with the average Euclidean distance analyzed above. Such embeddings can still benefit membership prediction under the standard supervised learning setting adopted in our evaluation, where some instances of one class are used for training while the other instances of this class, which are close to the training instances in the embedding space, are used for testing. However, the generalization will be dramatically impacted, especially under a zero-shot learning setting where the instances of a new class, which have never appeared in the training samples, are used for testing.

In Figure 5(b) (for FoodOn) we can observe similar characteristics for the embeddings learned by OWL2Vec with and . Namely, for each class, its subclasses are mostly quite close to each other (i.e., being clustered into one cluster), and their distances to this class are mostly shorter than their distance to any other class. However, the two characteristics are not as significant as in HeLis, especially for the class “Barley Malt Beverage” and its subclasses, indicating that embedding FoodOn, which has more axioms and entities (see Table 2), is more challenging. On the other hand, the two characteristics of OWL2Vec with and are more significant than those of the other three methods (Word2Vec, OPA2Vec and OWL2Vec with and ), which is consistent with its better performance on subsumption prediction. For example, in comparison with Word2Vec, which has the second best performance, OWL2Vec with and closes the distance between “Fish” and its subclasses, and makes the subclasses of “Yogurt Food Product” closer to each other.

(a) Three classes of HeLis – “Yogurt”, “Beer” and “Wine”, and their instances (, and respectively).
(b) Four classes of FoodOn – “Yogurt Food Product” (YFP), “Barley Malt Beverage” (BMB), “Fruit Wine” (FW) and “Fish”, and their subclasses (, , and respectively).
Figure 5: Embedding visualization via t-SNE.

5 Discussion and Outlook

In this paper we have presented OWL2Vec, a robust semantic embedding framework for OWL ontologies. OWL2Vec extracts documents from the ontology that capture its graph structure, axioms of logical constructors, as well as its lexical information, and then learns a neural language model for both entity embedding and word embedding. We applied OWL2Vec to class membership prediction and class subsumption prediction with three real world ontologies, namely HeLis, FoodOn and GO, and we empirically analysed different semantics and techniques such as entailment reasoning and ontology to RDF graph transformation. The evaluation demonstrates that on these tasks OWL2Vec can significantly outperform state-of-the-art methods.

Ontology Text Understanding. Our experiments suggest that lexical information plays a very important role in both class membership prediction and class subsumption prediction. In real world ontologies such as HeLis, FoodOn and GO, entity names often reflect, in natural language, their relationships to surrounding entities; in HeLis, for example, the instance vc:FOOD-700637 (Soy Milk) is an instance of the class vc:SoyProducts. In addition, ontologies often contain a large number of entity annotations ranging from short phrases to long textual descriptions. In FoodOn, for example, out of axioms are annotations. However, patterns within the textual information in the ontologies, which is underpinned by the graph and logical structure, are quite different from normal natural language text (cf. Section 4.3.4). To further improve ontology embedding in the future, we need to develop new language model architectures and training methods that are tailored to the kinds of textual information typically present in state-of-the-art ontologies.

Ontology Completion via Prediction. In this study OWL2Vec has been applied to ontology completion by discovering plausible axioms. We adopted a typical supervised learning setting to model a common scenario in ontology completion, where satisfactory results have been achieved; in class membership prediction, the classes of of the test instances can be recalled. In some real world cases, however, there is often a bias between the axioms for training and the axioms for prediction. For example, consider the case of membership prediction for a new class defined on the fly without any known instances (i.e., the zero-shot learning scenario discussed in Section 4.4). This leads to a shortage of training samples and becomes much more challenging: the above metric drops to for OWL2Vec and less than for other KG embedding and ontology embedding methods in Table 3. In future work we plan to develop more robust ontology embeddings with higher generalization for dealing with such cases, and to consider other more challenging tasks such as ontology alignment and ontology error detection.


This work was supported by the SIRIUS Centre for Scalable Data Access (Research Council of Norway, project 237889), Samsung Research UK, Siemens AG, and the EPSRC projects AnaLOG (EP/P025943/1), OASIS (EP/S032347/1), UK FIRES (EP/S019111/1) and the AIDA project (Alan Turing Institute).


  1. https://www.w3.org/RDF/
  2. https://www.w3.org/OWL/
  3. In this paper an ontology’s graph structure includes the relation between instances (e.g., isPartOf) as in RDF KGs, the subsumption relation between classes (i.e., the class hierarchy defined by rdfs:subClassOf), the membership relation between instances and classes, and the relation between instances and literals.
  4. https://www.w3.org/TR/rdf-schema/
  5. Note vc is the prefix associated to the URI namespace of http://www.fbk.eu/ontologies/virtualcoach#, while obo, xsd, rdf, rdfs and owl are prefixes referring to standard vocabularies.
  6. https://www.w3.org/TR/owl2-mapping-to-rdf/
  7. https://www.w3.org/TR/owl2-manchester-syntax/
  8. Our ontology completion task is different from ontology reasoning. Our goal is not to infer relations that logically follow from the given input, but to try to discover plausible relations that complement the original ontology. (Most) plausible relations may not be inferred, and our evaluation focuses exactly on those plausible relations that cannot be inferred.
  9. HeLis project: https://horus-ai.fbk.eu/helis/
  10. FoodOn project: https://foodon.org/
  11. GO was accessed on August 05, 2020 via http://www.geneontology.org/ontology/
  12. Membership and subsumption in Table 2 denote the declared membership and subsumption axioms with named classes alone, i.e., those involving composed classes and those inferred are not counted.
  13. For EL (resp. Quantum) Embedding, HeLis and FoodOn are first transformed into DL (resp. DL ) by removing logical axioms outside the supported expressivity.


  1. Metrics for Evaluating Quality of Embeddings for Ontological Concepts.
  2. DBpedia: A Nucleus for a Web of Open Data. In The semantic web, pp. 722–735.
  3. Introduction to Description Logic. Cambridge University Press.
  4. OWL web ontology language reference. W3C recommendation 10 (02).
  5. Representation learning: a review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (8), pp. 1798–1828.
  6. Translating Embeddings for Modeling Multi-relational Data. In NeurIPS, pp. 2787–2795.
  7. Correcting Knowledge Base Assertions. In Proceedings of The Web Conference 2020, pp. 1537–1547.
  8. Ontology-guided Semantic Composition for Zero-Shot Learning. In KR.
  9. The gene ontology project in 2008. Nucleic acids research 36 (suppl_1), pp. D440–D444.
  10. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL, pp. 4171–4186.
  11. FoodOn: A harmonized food ontology to increase global food traceability, quality control and data integration. npj Science of Food 2 (1), pp. 1–10.
  12. HeLis: An ontology for supporting healthy lifestyles. In International Semantic Web Conference, pp. 53–69.
  13. Quantum Embedding of Knowledge for Reasoning. In NeurIPS, pp. 5595–5605.
  14. HermiT: An OWL 2 reasoner. Journal of Automated Reasoning 53 (3), pp. 245–269.
  15. Node2vec: scalable feature learning for networks. In KDD, pp. 855–864.
  16. Knowledge graph embedding with concepts. Knowledge-Based Systems 164, pp. 38–44.
  17. Universal representation learning of knowledge bases by jointly embedding instances and ontological concepts. In KDD.
  18. Knowledge graphs. arXiv preprint arXiv:2003.02320.
  19. Embedding OWL ontologies with OWL2Vec. In International Semantic Web Conference (Posters & Demos).
  20. Ontologies and the Semantic Web. Commun. ACM 51 (12), pp. 58–67.
  21. EL Embeddings: Geometric construction of models for the description logic EL++. In IJCAI.
  22. Learning entity and relation embeddings for knowledge graph completion. In AAAI.
  23. Visualizing data using t-SNE. Journal of machine learning research 9 (Nov), pp. 2579–2605.
  24. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
  25. Distributed representations of words and phrases and their compositionality. In NeurIPS, pp. 3111–3119.
  26. Learning entity type embeddings for knowledge graph completion. In CIKM, pp. 2215–2218.
  27. Knowledge Graph Embedding for Ecotoxicological Effect Prediction. In ISWC, pp. 490–506.
  28. Serving DBpedia with DOLCE–more than just adding a cherry on top. In International Semantic Web Conference, pp. 180–196.
  29. Deep contextualized word representations. In Proceedings of NAACL-HLT, pp. 2227–2237.
  30. RDF2Vec: RDF graph embeddings for data mining. In ISWC, pp. 498–514.
  31. RDF2Vec: RDF graph embeddings and their applications. Semantic Web 10 (4), pp. 721–752.
  32. Knowledge Graph Embedding for Link Prediction: A Comparative Analysis. arXiv preprint arXiv:2002.00819.
  33. Onto2vec: joint vector-based representation of biological entities and their ontology-based annotations. Bioinformatics 34 (13), pp. i52–i60.
  34. OPA2Vec: combining formal and informal content of biomedical ontologies to improve similarity-based prediction. Bioinformatics 35 (12).
  35. OptiqueVQS: A visual query system over ontologies for industry. Semantic Web 9 (5), pp. 627–660.
  36. A Benchmarking Study of Embedding-based Entity Alignment for Knowledge Graphs. arXiv preprint.
  37. Capturing semantic and syntactic information for link prediction in knowledge graphs. In International Semantic Web Conference, pp. 664–679.
  38. Knowledge graph embedding: A survey of approaches and applications. IEEE Transactions on Knowledge and Data Engineering 29 (12), pp. 2724–2743.
  39. Zero-shot recognition via semantic embeddings and knowledge graphs. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 6857–6866.
  40. Deep graph kernels. In 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1365–1374.
  41. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. arXiv preprint arXiv:1412.6575.