The RDF Virtual Machine

Marko A. Rodriguez

M.A. Rodriguez is with the Digital Library Research and Prototyping Team, the Center for Nonlinear Studies, and the Applied Mathematics and Plasma Physics Group of the Los Alamos National Laboratory, Los Alamos, New Mexico 87545.
E-mail: marko@markorodriguez.com Website: http://markorodriguez.com
Abstract

The Resource Description Framework (RDF) is a semantic network data model that is used to create machine-understandable descriptions of the world and is the basis of the Semantic Web. This article discusses the application of RDF to the representation of computer software and virtual computing machines. The Semantic Web is posited as not only a web of data, but also as a web of programs and processes.

Index Terms: Resource Description Framework, Virtual Machines, Distributed Computing, Semantic Web, Web Computing.

1 Introduction

At its core, the Semantic Web is a global graph data structure used to describe web resources in a machine-understandable way [1]. Unlike the World Wide Web, in which document resources are interconnected through a single type of relationship (i.e. href, hypertext links), on the Semantic Web, resources are related to one another through a heterogeneous set of relationships. The set of resources and relationship types are identified by Uniform Resource Identifiers (URI) [2]. The Resource Description Framework (RDF) is a standard for graphing (i.e. relating) URIs, literal values, and blank nodes (or anonymous nodes) [3]. If $U$ is the set of all URIs, $L$ is the set of all literal values, and $B$ is the set of all blank nodes, then an RDF triple (or link) is defined as $\tau = (s, p, o)$, where $s \in (U \cup B)$, $p \in U$, and $o \in (U \cup B \cup L)$. The union of all triples constitutes the Semantic Web graph and can be generally defined as

  $G \subseteq ((U \cup B) \times U \times (U \cup B \cup L))$.
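
As a concrete illustration (the resource URIs are invented for this example), a single triple relating two hypothetical resources can be serialized in the Turtle syntax as follows:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# one triple: subject, predicate, object (all URIs here are illustrative)
<http://example.org/marko> foaf:knows <http://example.org/josh> .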

At the level of RDF, the Semantic Web is simply a collection of triples. These triples form a data structure known as a directed edge labeled graph (or multi-relational network). However, in order to create a layer of abstraction to describe how resources should be interrelated and to reason about and infer non-explicit relationships between resources, ontological languages have been developed. The two most prevalent Semantic Web languages are the RDF Schema (RDFS) [4] and the Web Ontology Language (OWL) [5]. For a fine, practical review of these two languages see [6].

The prevalent conception of the Semantic Web is that of a well-structured, massive-scale distributed data repository that can be utilized by applications for various purposes. However, the RDF data model is general enough to support not only the representation of data, but also the representation of process. The purpose of this article is to discuss the general use of rule and process information in the Semantic Web and its explicit realization as RDF encoded software programs and RDF virtual machines (RVM). The remainder of this section will introduce 1.) the Linked Data initiative and its intention of creating a massive-scale distributed data structure and 2.) the RDF programming and virtual machine initiative and its intention of creating a massive-scale distributed process infrastructure. The latter initiative is a nascent movement which has the potential to greatly advance the utility of the Semantic Web and, as previously stated, forms the primary point of discussion for this article.

1.1 Linked Data as a Distributed Data Structure

The Linked Data initiative is concerned with exposing data within the URI address space much like the World Wide Web initiative is concerned with exposing documents and media in the URL address space [7]. Before discussing the Linked Data movement, it is important to understand how the Semantic Web serves not simply as a data repository, but more importantly as a single massive-scale distributed database. Moreover, it is important to discuss how the Semantic Web provides both a technological and cultural differentiation from the traditional notion of a database as posited by the relational database community. These factors position the Semantic Web as a revolutionary means by which data is globally managed and accessed.

Technologically, the Semantic Web is reminiscent of the relational database model, insofar as it is a data storage environment that provides well-structured data to external applications; though this data is not represented as a collection of interlinked tables, but instead as an edge labeled graph (more specifically, an RDF graph). While the table and graph data structures can be mapped into one another without loss of information, the utility of the graph structure has yielded the development of specialized graph databases known as triple stores [8]. (It is important to note that many RDF servers still utilize an underlying relational database to manage data; examples of such architectures include the D2R Server [9].) Moreover, the data exposed by the Semantic Web is within the URI address space and as such is agnostic to the addressing scheme of the underlying machine supporting its representation. In this way, the data on multiple physical machines are able to reference each other and thus, the Semantic Web serves as a single unified graph spanning servers worldwide.

Culturally, the Semantic Web maintains the open, globally accessible nature of the World Wide Web. In contrast, rarely are relational database schemas reused and/or openly distributed and rarely are relational database ports (e.g. ODBC) made available for the public harvesting of information. The common paradigm in the relational database world is that data is accessed and manipulated by software with privileges to the data and only through that software is the information made available to other services, if at all. However, with respect to the Semantic Web, not only does the community encourage the distribution and reuse of ontologies (see, e.g., SchemaWeb at http://www.schemaweb.info/), but it also provides open and accessible interfaces to its data. Such interfaces are known as SPARQL endpoints [10] and HTTP-based linked RDF data [7]. The Semantic Web truly represents a new data management paradigm because of the way in which data is distributed and discovered: in an open, standards-based fashion.

The Semantic Web’s Linked Data community is focused on the systematic union of RDF datasets in order to allow

“[any man or machine] to start with one data source and then move through a potentially endless Web of data sources connected by RDF links. Just as the traditional document Web can be crawled by following hypertext links, the Web of Data can be crawled by following RDF links. Working on the crawled data, search engines can provide sophisticated query capabilities, similar to those provided by conventional relational databases. Because the query results themselves are structured data, not just links to HTML pages, they can be immediately processed, thus enabling a new class of applications based on the Web of Data.” [11]

There is far-reaching potential for the Web of data as it currently exists and as it will continue to grow. However, one of the limiting factors in the Linked Data approach is that while the community is providing a massive-scale distributed data structure, it is not providing a massive-scale distributed process infrastructure to compute on this Web of data [12]. Without a distributed process infrastructure, Semantic Web applications are left with the typical server/client-download philosophy of the World Wide Web. For data-intensive algorithms, this is an inefficient use of resources as it requires the movement of large amounts of data to the algorithm's executing machine(s). It is this design choice that has made the World Wide Web (i.e. the web of HTML documents), at large, only accessible to those that have the processing power and space to download and index it. (A distributed process infrastructure is a feature of the Grid computing paradigm that provides a democratization of compute cycles [13]. With respect to the Semantic Web, systems like GridVine provide a means to efficiently query and update an RDF graph that is overlaid across multiple physical machines [14].) For the keyword search space of the World Wide Web, this problem is perhaps best solved by the few large-scale search engines in existence today. However, the Semantic Web, with its rich data model and nearly endless potential, is poised to require a new Web infrastructure to support processing within and between its various Linked Data repositories. No single institution or organization will have the compute power, nor the manpower, to execute and implement all the potentially useful algorithms that will make the Semantic Web stand out as the de facto medium for representing data. In order to remedy this situation, a move towards a computing paradigm for the Semantic Web is necessary.

1.2 RVM Computing as a Distributed Process Infrastructure

The Semantic Web has the potential to not only act as a data storage environment, but also to capture the more procedural aspects of computing, such as computer instructions and abstract virtual computing machines. In other words, given the flexibility of the RDF data model, it is possible to encode, in RDF, the rules by which RDF data is manipulated and thus, expose such information on the Semantic Web. Moreover, the URI address space is an infinite space that is only constrained by the size and number of physical machines that support its representation. A flexible data model and an infinite address space make the Semantic Web an ideal medium for distributed, global computing. In this more computation-centric environment, instructions expressed in RDF are executed by RDF virtual machines (RVM). An RVM is any entity that processes RDF computing instructions, and in some instances, is represented in RDF as well. Thus, like other RDF data, computing instructions and RVMs are “first-class” citizens on the Semantic Web.

Many common computing models are made salient by the RVM paradigm, such as open (refer to Section 4.1), distributed (refer to Section 4.2), and reflective computing (refer to Section 4.3). RDF programming languages compile down to RDF, and these RDF instructions can be accessed, annotated (i.e. RDF related), and reasoned on like any other RDF data on the Semantic Web. Furthermore, unique situations emerge when RDF code is represented across different physical machines. Because all RDF instructions are in the same URI address space, there is nothing that prevents the software, much like the data, from being physically distributed. With an RDF virtual machine executing RDF instructions, it is possible for the virtual machine and the instructions to be relocated by simply downloading the RDF subgraph that represents that virtual machine to another physical machine. Thus, instead of migrating large amounts of data to a local environment for processing, the RDF virtual machine and the instructions that it is processing can be migrated to the remote environment. In this way, the process is moved to the data, not the data to the process. Finally, because RDF computing instructions and, in some cases, RVMs are represented in RDF, reflection is possible at the object, instruction, and machine levels. Thus, nearly the entire computing stack is exposed to reasoning and self-modifying processes.

The structure of this article is as follows. Section 2 discusses RDF-based programming languages in general and one language specifically: Neno [15]. Related work is also presented at the end of Section 2. Section 3 discusses the relationship between RDF computing instructions and RVMs, and more specifically the Fhat and r-Fhat RVMs. Finally, Section 4 discusses those aspects of computing on the Semantic Web – open, distributed, and reflective computing – that are conveniently exposed by the use of RDF-based programming languages, RVMs, and RDF computing in general.

2 RDF-Based Programming Languages

RDF is used to express facts about the world in a structured, machine-understandable fashion.

“The basic intuition of model-theoretic semantics is that asserting a sentence makes a claim about the world: it is another way of saying that the world is, in fact, so arranged as to be an interpretation which makes the sentence true. In other words, an assertion amounts to stating a constraint on the possible ways the world might be.” [16]

However, RDF is more generally useful and need not be constrained to asserting facts about the “world”. In this article, RDF is used to represent computational data structures such as software (i.e. a sequence of instructions) and machine state (i.e. operand stacks, program counters, etc.). Common data structures such as lists, trees, and graphs in general are conveniently represented in RDF, as are programs of Turing complete languages [17]. This section focuses on one Turing complete RDF-based programming language called Neno [15] (Neno/Fhat is currently available at http://neno.lanl.gov/). Other RDF-based programming languages include the functional, stack-based language Ripple [18] (currently available at http://ripple.fortytwo.net/), the object-oriented FABL [19] (currently available at http://fabl.net/), Adenosine (which unfortunately has no formal publications nor a currently existing homepage, though discussions of it appear on various Semantic Web mailing lists and a project homepage is available through the Internet Archive; this article briefly discusses its formalisms to provide a more complete picture of RDF programming), and Adenine [20] (available at http://www.ifcx.org/wiki/Adenine.html). These languages, along with RDF toolkits, RDF-to-object mappers, and Web-based rule languages, will be discussed in the related work section.

2.1 The Neno RDF Programming Language

Neno is an imperative programming language that takes an object-oriented perspective on the resources of the Semantic Web [15]. The grammar of the human readable/writable Neno language is similar to popular object-oriented programming languages such as Java and C++. However, as opposed to typical object-oriented languages, many of the constructs of the Neno language were designed to take advantage of the RDF data model and the standardized means by which RDF data is queried and modified (e.g. SPARQL [10] and SPARQL/Update [21], respectively). The motivation for many of the language constructs is to overcome the impedance mismatch between the typical object-oriented data model and the RDF data model.

Neno source code is written by a human programmer and is compiled by the Neno/Fhat compiler. The compilation process generates a Fhat API represented in OWL. A Fhat API denotes Neno classes, their respective methods, and each method's instructions. In this sense, a Fhat API is similar to the API of object-oriented languages (e.g. the typical Java jar file). Classes in a Fhat API can be instantiated to active computational objects represented in RDF. These instantiated objects maintain low-level computing instructions (e.g. add, set, branch, etc.) represented in RDF. These instructions denote computational primitives and specify the flow of execution within a method (refer to Section 3.2.1). Figure 1 diagrams the stages of processing required to go from human readable/writeable Neno source code to instantiated computational objects.

Fig. 1: The various transformations from source code to computational object in Neno/Fhat.

This sub-section will discuss the Neno programming language in particular and Section 3.2.1 will discuss the role of the Fhat RVM in the processing of compiled Neno source code.

2.1.1 Neno Language Constructs

The following code example presents a simple Neno class and will be referred to throughout the remainder of this section. (For the sake of brevity, those operations that are typically found in other programming languages are not discussed in this section. For example, for-looping, while-looping, and if/else branching have a similar syntax and behavior as found in other programming languages such as Java and C. For an in-depth discussion of the Neno programming language constructs, please refer to [15].)

prefix lanl: <http://www.lanl.gov>;
prefix foaf: <http://xmlns.com/foaf/0.1/>;

foaf:Agent lanl:Person {
  xsd:string foaf:name[1];
  lanl:Person foaf:knows[0..*];

  makeFriend(lanl:Person p) {
    this.foaf:knows =+ p;
  }

  makeEnemy(lanl:Person p) {
    this.foaf:knows =- p;
  }

  makeAllEnemies() {
    this.foaf:knows =/;
  }

  xsd:boolean isFriend(lanl:Person p) {
    return this.foaf:knows =? p;
  }
}

The lanl:Person class description is written in object-oriented syntax and is saved to a single text file denoted Person.neno. The class is composed of two fields and four methods and states that

  • lanl:Person is an rdfs:subClassOf foaf:Agent (i.e. extends),

  • foaf:name is an xsd:string field that has a cardinality owl:Restriction of 1,

  • foaf:knows is a lanl:Person field that does not have a cardinality owl:Restriction,

  • makeFriend is a void method that takes a single lanl:Person as an argument,

  • makeEnemy is a void method that takes a single lanl:Person as an argument,

  • makeAllEnemies is a void method that takes no arguments, and

  • isFriend is an xsd:boolean method that takes a single lanl:Person as an argument.

Fields in Neno are assumed to be unordered sets because, for example, there may exist many foaf:knows relationships between two lanl:Person resources. As such, special set operators exist for interacting with Neno fields (a short usage sketch follows the list):

  • the set-plus operator (=+): adds (i.e. unions) a new value to a field.

  • the set-minus operator (=-): removes (i.e. set minuses) an existing value from a field.

  • the set-clear operator (=/): removes all existing values from a field.

  • the set-query operator (=?): returns a boolean specifying whether the provided value currently exists in the field.

  • the set operator (=): removes all existing values and adds the provided values to the field.
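
To make these operators concrete, the following is a minimal sketch in Neno syntax, where the resource URIs are illustrative: the four statements add lanl:josh to Marko's foaf:knows set, test for membership, remove him, and finally clear the set entirely.

 lanl:marko.foaf:knows =+ lanl:josh;
 lanl:marko.foaf:knows =? lanl:josh;
 lanl:marko.foaf:knows =- lanl:josh;
 lanl:marko.foaf:knows =/;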

Typical object-oriented “dot” notation is used to reference the fields and methods of an object. For example, with respect to object fields, the statement

 lanl:marko.foaf:knows.foaf:name;

returns the xsd:string foaf:names of all resources that lanl:marko knows. That is, it first resolves all the lanl:Person instances that lanl:marko knows and then resolves all the xsd:string foaf:name values of those lanl:Persons. Neno also supports a non-standard “dot dot” notation that is used for inverse referencing. For example, the statement

 lanl:marko..foaf:knows.foaf:name;

returns the names of all the resources that know lanl:marko. That is, it determines all the lanl:Persons that foaf:know lanl:marko and then returns the xsd:string of their foaf:name. The difference between “dot” notation and “dot dot” notation can be illustrated with two SPARQL queries. The first “dot” statement translates to the query

SELECT ?y
  WHERE {
    <lanl:marko> <foaf:knows> ?x .
    ?x <foaf:name> ?y }

and the second “dot dot” statement translates to

SELECT ?y
  WHERE {
    ?x <foaf:knows> <lanl:marko> .
    ?x <foaf:name> ?y }.

The “dot dot” notation takes advantage of the network structure of the underlying RDF data model and the ability to traverse that graph in any direction using query languages such as SPARQL. This is related to the “ancestor” query mechanisms found in XPath [22] and used in semi-structured data environments such as Lorel [23].

Like typical object-oriented programming languages, “dot” notation can be used to invoke an object's method. For example, consider the makeEnemy method declaration of lanl:Person. The purpose of this method is to remove the foaf:knows relationship between the executing object (i.e. this) and the provided lanl:Person parameter p. Thus,

   lanl:marko.makeEnemy(lanl:dr_wh);
 

executes the following SPARQL/Update command:

DELETE {
  <lanl:marko> <foaf:knows> <lanl:dr_wh> }.

In Neno, the “dot dot” notation can also be applied to methods, and in such cases, it is called inverse method invocation. Inverse method invocation can be used to remove all the foaf:knows relations between those lanl:Persons that lanl:marko foaf:knows and lanl:dr_wh. In other words, all of Marko’s friends can be instructed to make enemies with Dr. Wh. In Neno syntax, this is represented as

 lanl:marko..foaf:knows.makeEnemy(lanl:dr_wh);
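
Following the “dot dot” SPARQL translation shown earlier, this inverse invocation can be expected to expand to a SPARQL/Update command of roughly the following form (a sketch, not taken from the Neno specification):

DELETE {
  ?x <foaf:knows> <lanl:dr_wh> }
 WHERE {
  ?x <foaf:knows> <lanl:marko> }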

While Neno has many similarities to typical object-oriented programming languages such as Java and C++, perhaps the most interesting aspect of Neno's programming constructs is the way in which it takes advantage of the underlying RDF representation of its instantiated objects. For a more in-depth review of the Neno programming language, including discussion of looping, branching, and object construction and destruction, please refer to [15]. Finally, note that Section 3.2.1 will discuss compiled Neno code and its representation in RDF.

2.2 Related Work

The ideas behind Neno come from a long line of Web-based process models. This subsection will provide a review of related work in this area, with particular focus on other RDF programming languages, RDF toolkits, RDF-to-object mappers, and finally, other popular process description mechanisms for the Web.

2.2.1 Other RDF Programming Languages

Other known RDF programming languages include Ripple [18], FABL [19], Adenosine, and Adenine [20]. These languages have a similar philosophy to Neno in that they are motivated by the desire to encode both data and process information within RDF and thus, take unique advantage of the Semantic Web infrastructure. The general theme behind all of these languages is to turn the Web into a distributed computing environment.

Ripple is a declarative programming language aimed at Semantic Web mashups and scripting applications. In Ripple, human-readable programs expressed in a Notation3-like [24] serialization language are translated to and from RDF computing instructions in the form of linked RDF lists. The RDF lists which make up a Ripple script are intended to reside in the Semantic Web itself and thus, are at the same level of abstraction as the data they operate upon.

Ripple is particularly useful for path-based traversals. For example, the Ripple query

krs:josh foaf:knows! foaf:name!

yields the name of all of the individuals that krs:josh knows. For example, consider the RDF graph illustrated in Figure 2. The above Ripple query has the effect of pushing both "marko"^^xsd:string and "gary"^^xsd:string onto the Ripple RVM stack. (The term RVM was introduced in [15] and refers to a virtual machine that processes RDF instructions. However, unlike the languages presented here, the Fhat RVM of [15] was also encoded in RDF. For the purpose of this article, both RDF- and non-RDF-represented virtual machines that process RDF instructions are called RVMs.) Once "marko"^^xsd:string and "gary"^^xsd:string are on the stack, other operations can be performed on that data.

Fig. 2: An example RDF graph.

An interesting aspect of the Ripple language is its ability to “walk” an RDF graph in a recursive fashion. For example, the following Ripple query

krs:josh foaf:knows* foaf:name!

yields Josh’s name, the names of those known by Josh, and so on, recursively. Figure 3 diagrams the explicit RDF representation of this Ripple program, where _:L1, _:L2, _:L3, _:L4, and _:L5 are blank nodes of rdf:type rdf:List.

Fig. 3: An RDF representation of a compiled Ripple program.

FABL is an object-oriented language that has some similarities in syntax to JavaScript, compiles down to RDF computing instructions, and is executed by the FABL RVM. Like Neno, the native objects of FABL are RDF resources, however, the classes of FABL are DAML+OIL classes [25]. An example of FABL code is:

allocate('foaf:knows',Property);

class("foaf:Person");
  restrict foaf:knows
    {allValuesFrom foaf:Person}
endClass();

boolean function isFriend(foaf:Person p,
      foaf:Person q) {
  return contains(p..foaf:knows,q);
}

The “dot dot” notation in the isFriend method iterates over all of the foaf:knows values, as FABL treats properties as sequences of values.

Adenosine is described by its originator as

[…] a language designed both to work on, and be distributed over, the Semantic Web. The language exploits the expressiveness of RDF whilst adopting a clean syntax based on a combination of Notation3 and ECMAScript. (Adenosine was previously available at http://www.netalleynetworks.com/community/jgeldart/research/adenosine/.)

An example of the Adenosine syntax is:

lanl:isFriend a std:Method;
  std:onClass lanl:Person;
  std:function lanl:isFriendFunction.

@function lanl:isFriendFunction(p1, p2) {
  return p2 in p1.foaf:knows;
}

Adenosine RDF code is executed by an RVM called Callaghan.

Finally, the isFriend method is demonstrated in Adenine. Note that Adenine and Adenosine have a similar syntax; in fact, Adenosine was developed after Adenine in order to provide what its designer considered a better syntax, hence the two languages' similar names.

add { lanl:Person
  rdfs:subClassOf foaf:Person ;
}

add { foaf:knows
  rdf:type rdf:Property ;
  rdfs:domain foaf:Person ;
  rdfs:range foaf:Person ;
}

method lanl:isFriend p1 p2
    return (contains p1 foaf:knows p2)

2.2.2 RDF Toolkits and RDF-to-Object Mappers

The previously presented RDF programming languages serve a different purpose than RDF toolkits such as Jena [26] (currently available at http://jena.sourceforge.net/), Sesame [27] (currently available at http://www.openrdf.org/), Redland [28] (currently available at http://librdf.org/), RDFStore (currently available at http://rdfstore.sourceforge.net/), RDFLib (currently available at http://rdflib.net/), and Pyrple (currently available at http://infomesh.net/pyrple/). The purpose of these toolkits is to provide a mechanism by which RDF data can be accessed and manipulated through the constructs of a specific non-RDF-based programming language such as Java, C, PHP, Perl, Python, and/or Ruby. The difference between these toolkits and the RDF-based programming languages presented previously is that RDF programming languages are designed specifically to work with RDF and as such, can provide

  • type-checking at the RDF level,

  • data and process encapsulation,

  • language operators to deal specifically with the RDF data model,

  • no impedance mismatch between RDF and the manipulating language, and, as the key point of differentiation,

  • a representation of the procedural information within RDF.

RDF-to-object data mappers are related to RDF toolkits in that they aid a developer in utilizing RDF data in a programming language environment. Example RDF-to-object data mappers include Schemagen (currently available at http://jena.sourceforge.net/), Elmo (currently available at http://www.openrdf.org/), Frege [29], and ActiveRDF [30]. The purpose of an RDF-to-object data mapper is to alleviate the issues surrounding the impedance mismatch between the RDF data model and typical object-oriented data models. This is accomplished by 1.) automatically generating class definitions in the non-RDF language that can interact with an RDF representation and 2.) automatically populating these objects using RDF data. With RDF-to-object mapping, what is preserved in the Semantic Web is the description of the data contained in an object (i.e. object fields), not an explicit representation of the object's process information (i.e. object methods). By explicitly encoding method data, the Semantic Web contains all the information required to retrieve and execute the behaviors of the object. In this way, RDF programming languages do not require a separate, non-RDF programming environment to function. Moreover, the difficulties associated with RDF-to-object mappings [31, 30] are not present in RDF-based programming languages. Some of the distinctions between the semantics of object-oriented programming languages and the semantics of RDF are made salient when understanding the distinction between frame-based languages and ontological languages such as OWL [32].

2.2.3 Other Web-Based Process Descriptions

Rule-based markup languages, designed for the World Wide Web and the Semantic Web, are a related area of research. Similar to RDF programming languages, the intent of this research is to formalize computing instructions within the data repository itself, whether that data repository be the World Wide Web and/or the Semantic Web. The standardization group behind the RuleML initiative (currently available at http://www.ruleml.org/) has developed XML languages for encoding process information as well as translators for representing this information in RDF [33]. Particular variants of RuleML include a first order logic language (FOL-ML) [34] and an object-oriented language (OO-ML) [35]. To provide an example of FOL-ML, the logic statement

  $\forall x \, (\mathrm{knows}(\mathrm{marko}, x) \Rightarrow \mathrm{knows}(\mathrm{dr\_wh}, x))$

states that for all the people that Marko knows, Dr. Wh also knows those people. This statement can be represented in FOL-ML as

<Forall>
  <Var>x</Var>
  <Implies>
    <Atom>
      <oid>
        <Ind uri="lanl:marko"/>
      </oid>
      <slot>
        <Ind uri="foaf:knows"/>
        <Var>x</Var>
      </slot>
    </Atom>
    <Atom>
      <oid>
        <Ind uri="lanl:dr_wh"/>
      </oid>
      <slot>
        <Ind uri="foaf:knows"/>
        <Var>x</Var>
      </slot>
    </Atom>
  </Implies>
</Forall>

The various RuleML reasoners serve as the virtual machines that execute RuleML documents. For example, jDrew and its object-oriented variant OO jDrew [36] are deductive reasoning engines for RuleML (jDrew is currently available at http://www.jdrew.org/). With respect to rule-based systems designed specifically for the Semantic Web, there currently exists the Semantic Web Rule Language (SWRL) [37], which is closely related to the Rule Interchange Format (RIF) effort [38]. The Pellet reasoner currently supports SWRL rules [39] (Pellet is currently available at http://pellet.owldl.com/). Moreover, the Pellet reasoner, along with other description logic reasoners, can execute the reasoning rules of the OWL language. In this respect, the OWL language is a process description, albeit one that is not Turing complete. Also, there exist the Euler proof mechanism (currently available at http://www.agfa.com/w3c/euler) for reasoning and Cwm (currently available at http://www.w3.org/2000/10/swap/doc/cwm) for general-purpose data processing on the Semantic Web. Finally, another area where process information is specifically encoded in the Semantic Web is the OWL-S service description framework [40].

3 RDF Instructions and the RVM

As the Semantic Web is simply a data structure and does not, in and of itself, have the ability to compute, external machines are required to manipulate its state by adding and removing triples. Even when state transition rules are explicitly encoded in the Semantic Web as machine instructions, there must still exist a computing machine that is able to process those instructions.

3.1 An Introduction to Virtual Machines

A virtual machine is a computing machine represented in software as opposed to hardware (e.g. logic gates) [41]. Many machines fit this description, ranging in complexity from low-level VHDL machines [42] to the high-level interpreters of scripting languages such as Perl, JavaScript, and Python. Perhaps the most popular virtual machine is the Java virtual machine (JVM) of the Java programming environment [43]. There are two primary components to the Java environment: the Java compiler (i.e. javac) and the JVM (i.e. java). The Java compiler translates human readable/writeable Java source code into Java byte-code (e.g. javac Person.java yields Person.class). Java byte-code is executed by the JVM (e.g. java Person). Each byte-code instruction alters the state of the JVM, whereby new variables are declared and changed in order to ultimately carry out a user-defined computation.

3.2 An Introduction to RVMs

An RVM is any virtual machine that interprets RDF computing instructions. Like typical virtual machines, RVMs can vary in the degree of detail that they formally represent. In the Ripple environment, the Ripple RVM's state and process rules are implemented in the Java language [18]. Thus, the Ripple RVM runs on the JVM. (It is possible, given that Ripple is Turing complete, to build a completely RDF-based virtual machine in Ripple and thus encode both Ripple programs and the Ripple RVM in the Semantic Web as RDF computing instructions. This is also possible with the other RDF programming languages as they are all Turing complete.) In the Neno/Fhat environment, the Fhat RVM represents its state in RDF and the process by which that state is altered in Lisp [15]. Thus, in Neno/Fhat, not only are the RDF computing instructions encoded in the Semantic Web, but so is the state of the RVM (i.e. its stacks, frames, program counter, etc.). The purpose of encoding an RVM state in RDF is to allow RVMs to migrate between physical machines (refer to Section 4.2). However, note that there will always be a level of indirection in which computation is moved out of the Semantic Web to the physical hardware which supports it. In the end, it is the physical hardware that changes the state of the Semantic Web. Moreover, it is the laws of physics that drive the evolution of a hardware processor. Thus, in order to compute, every level of process abstraction must be grounded in (or founded on) some physical process.

An RVM has four primary components: the RDF computing instructions, the RVM state, the RVM process, and a triple store or web server interface. (The concepts presented in this subsection deal specifically with triple store interfaces only.) All of these components are diagrammed in Figure 4, where RI represents RDF computing instructions, RS represents the RVM state in RDF, and D represents other, non-procedural triples in the triple store (i.e. other RDF data).

Fig. 4: The components of an RVM-based computer. The RDF instructions and RVM state are boxed in dotted lines to signify that they can be represented in RDF and thus can be placed in a triple store. The double-arrowed line connecting the triple store interface to the triple store is a read/write/delete protocol such as SPARQL/Update.

The graph of RDF computing instructions that the RVM interprets dictates the evolution of the RVM’s state: the instruction being executed, the state of the heap, the state of all the stacks, etc. Because the RVM state can be represented in RDF, it can be distributed in the same manner as any other RDF data (e.g. as a set of statements in an RDF triple store or as an RDF document on the Web). The role of the RVM process, on the other hand, is to manipulate the RVM state and thereby carry out a computation. RDF data is simply a data structure. Whether that structure includes static or procedural information does not endow it with the ability to compute. In order for that structure to evolve, it relies on some external process to read, write, and delete triples from it. Thus, in order to fully compute, an RVM state relies on an RVM process to alter it. It is through the triple store interface that an RVM process is able to query (i.e. read) and alter (i.e. write and delete) an RVM state, computing instructions, and other RDF data. In Figure 4, the double-arrowed line between the triple store interface and the triple store denotes a read/write/delete protocol such as SPARQL/Update [21].
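
At the lowest level, such an interaction might look like the following sketch, in which an RVM process first reads its current program location and then advances it; the rvm: namespace, the rvm:programLocation property, and the machine and instruction URIs are all hypothetical and do not come from a published vocabulary:

# read the RVM's current instruction (hypothetical vocabulary)
SELECT ?inst
  WHERE {
    <rvm:machine1> <rvm:programLocation> ?inst }

# then advance the program counter to the next instruction
DELETE {
  <rvm:machine1> <rvm:programLocation> <rvm:inst1> }
INSERT {
  <rvm:machine1> <rvm:programLocation> <rvm:inst2> }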

With respect to Neno/Fhat and the other related RDF programming environments (i.e. Ripple, FABL, Adenosine, and Adenine), RI and RS are located at different levels of abstraction. These differences are articulated in the following itemization, where “other” denotes the programming environments other than Neno/Fhat.

  • Other: only D and RI are in the triple store. RS is represented in local memory.

  • r-Fhat: D is in the triple store, but RI and RS can move between the triple store and the local memory.

  • Fhat: D, RI, and RS are all contained in the triple store.

Theoretically, it is possible to both read RDF computing instructions and change the RVM state while they are represented in the triple store, as in Fhat. However, due to the read/write overhead incurred by such a model, it is preferable to move the instructions and RVM state to local memory for processing. In the Neno/Fhat environment, this is accomplished through the r-Fhat RVM. (r-Fhat stands for “reduced Fhat” as it reduces the amount of RDF data being processed by representing the RVM state in the programming constructs of the RVM process.) Also, in Ripple, computing instructions are moved to local memory to increase processing speed.

The remainder of this section will discuss the architecture and instruction set of the Fhat RVM.

3.2.1 The Fhat RDF Virtual Machine

The architecture of the Fhat virtual machine is defined in OWL. (The Fhat RVM architecture and instruction set are currently available at http://markorodriguez.com/docs/nenoDoc/.) This architecture provides an abstract description of an instance of a Fhat RVM. Figure 5 diagrams the types of resources and relationships present in the Fhat RVM architecture.

Fig. 5: The classes that compose the Fhat RVM. The dashed lines denote the rdfs:subClassOf property and the bracketed values denote the cardinality owl:Restrictions on particular properties. All non-namespaced URIs are part of the http://neno.lanl.gov namespace.

The Fhat RVM was designed to work with a predetermined set of instructions known as the Fhat instruction set. Figure 6 diagrams some of the more important instructions supported by the Fhat RVM.

Fig. 6: The instruction set of the Fhat RVM. The dashed lines denote the rdfs:subClassOf or rdfs:subPropertyOf properties and the bracketed values denote the cardinality owl:Restrictions on particular properties. All non-namespaced URIs are part of the http://neno.lanl.gov namespace.

The following itemization presents a few of the more common instructions in the Fhat instruction set and their relationship to the Fhat RVM (to preserve the readability of this article, the namespace prefix for the courier font URIs is assumed to be http://neno.lanl.gov); a Turtle sketch of an instruction sequence follows the list:

  • Instruction: A Fhat RVM instance has a programLocation pointer (i.e. PC) to the current Instruction that it is processing. If no such programLocation exists, then the Fhat RVM cannot compute.

  • PushValue: used to push resources onto the OperandStack. For example, to push a floating point value onto the stack for a later operation.

  • Arithmetic: various subclass instructions include Add, Subtract, Multiply, and Divide. These pop two values off the top of the OperandStack, perform the specified operation, and push the computed value back on the OperandStack.

  • Invoke: initializes a Frame for a Method. The Frame contains the names (hasSymbol), values (hasValue), and scopes (fromBlock) of the local Variables of a Method.

  • Setter: used to assign a value to a Variable in the Frame of a Method. SetClear, SetMinus, Set, and SetPlus are subclasses of Setter.

  • Return: pushes the return value on the OperandStack and sets the programLocation to the Instruction popped off the ReturnStack. This instruction is used to return from a method.
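
As a rough Turtle sketch of what an instantiated instruction sequence could look like (the nextInst property name and the shortened UUID-style identifiers are invented for illustration; the actual Fhat vocabulary is given in the documentation cited above):

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix neno: <http://neno.lanl.gov/> .

# push two operands, add them, and then assign the result
neno:inst-a4f1 rdf:type neno:PushValue ; neno:nextInst neno:inst-b7c2 .
neno:inst-b7c2 rdf:type neno:PushValue ; neno:nextInst neno:inst-c9d3 .
neno:inst-c9d3 rdf:type neno:Add ; neno:nextInst neno:inst-d1e4 .
neno:inst-d1e4 rdf:type neno:Set .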

When Neno source code is compiled using the Neno/Fhat compiler, a Fhat OWL API is generated. This API provides an abstract representation of a Neno object (known as a Neno class), its fields, its methods, and its methods' instructions. The API representation incorporates many owl:Restrictions to ensure that when a Neno object is instantiated, there is an unambiguous generation of instance-level RDF instructions. For example, Figure 7 demonstrates how owl:Restrictions are used to define the relationship between instructions within a method body. (The Neno/Fhat compiler uses Universally Unique Identifiers (UUID) when minting instruction URIs [44]. UUIDs are 128-bit identifiers, typically written as 32 hexadecimal digits. For diagram clarity, only a few characters of a UUID are presented.)

Fig. 7: A snippet of a Fhat API instruction sequence. The dashed lines represent the rdfs:subClassOf property. Note that owl:Restrictions besides _:A1, _:A2, and _:A3 are not presented. For example, a PushValue instruction requires a value to push onto the operand stack. What is presented demonstrates how the sequence of instructions is fixed using owl:Restrictions. All non-namespaced URIs are part of the http://neno.lanl.gov namespace.

From a Fhat API, it is possible to instantiate Neno objects and their methods. Figure 8 diagrams an RDF instance of the lanl:Person makeFriend method previously presented in Section 2.1.1.

  makeFriend(lanl:Person p) {
    this.foaf:knows =+ p;
  }
Fig. 8: The instance level RDF representation of the makeFriend method of a lanl:Person class. The bolded terms above the resources denote the rdf:type of the resource. Note that these rdf:types are inferred types as the direct type is a minted UUID with specific owl:Restrictions as demonstrated in Figure 7. All non-namespaced URIs are part of the http://neno.lanl.gov namespace.

In Figure 8, the makeFriend method is associated with a particular lanl:Person, namely lanl:marko. Methods, like RDF properties, are not dependent upon the classes that utilize them in their description. Thus, with Neno it is possible for many objects to share the same method description, or given the requirements of the computation being executed, it is possible for each object instance to have a unique method instance. The latter is desirable when migrating objects between different triple store environments.

Finally, to provide another example of an RDF instruction sequence, Figure 9 diagrams a simple arithmetic instruction sequence.

Fig. 9: An instance of a set of instructions that will set a variable to a computed value. The bolded terms above the resources denote the rdf:type of the resource. All non-namespaced URIs are part of the http://neno.lanl.gov namespace.

4 Models of Computing on the Semantic Web

The RVM architecture described in the previous section opens up a number of common computing models to the Semantic Web. The following three models will be discussed throughout the remainder of this section.

  • Open computing: an extension of Open Data in which algorithms, virtualized computing machines, and underlying hardware computing resources are made publicly available. (In another context, the term Open Computing refers to services which allow people to freely use computers in a lab setting; that is not the definition used here.)

  • Distributed computing: a means by which processes are moved to the data, as opposed to the data to the processes.

  • Reflective computing: as computational process descriptions reside in the URI address space, reflection from the API to the RVM is possible.

4.1 Open Computing

The Open Data movement has “a philosophy and practice requiring that certain data are freely available to everyone, without restrictions from copyright, patents or other mechanisms of control. Its ethos is similar to that of other open movements and communities such as Open Source and Open Access.” (Quoted from the Wikipedia article on Open Data at http://en.wikipedia.org/wiki/Open_Data.)

By Open Computing, we understand that

  • RDF computing instructions should be made freely available and easily accessible for code reuse, and

  • results of popular computations should be made publicly available.

4.1.1 Towards a Web of Programs

A key advantage of the RDF data model is that in using URIs to denote resources, RDF makes resource descriptions distributable in a way which leverages the existing infrastructure of the Web. For example, suppose that a Semantic Web application encounters the following URI:

http://www4.wiwiss.fu-berlin.de/factbook/resource/Nepal .

If the application is capable of dereferencing the URI over HTTP or querying on it through a SPARQL endpoint, it will find RDF statements about Nepal, including demographic and geographical data. The application may then use these statements to solve problems. The practice of serving an RDF representation of a resource against its URI, as well as establishing links between such URIs (for example, owl:sameAs links) is known as Linked Data. Thus, Linked Data is the RDF equivalent of the interlinked hypertext documents which make up the bulk of today’s Web. Furthermore, it is the mechanism by which physically isolated RDF data sets are amalgamated into a truly global Web of data.
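
For instance, a Semantic Web application with access to a SPARQL endpoint hosting this data set could retrieve the available statements with a single query (endpoint availability is assumed here, and the exact output of the DESCRIBE form is implementation-dependent):

DESCRIBE <http://www4.wiwiss.fu-berlin.de/factbook/resource/Nepal>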

The distributed nature of Linked Data provides a strong argument for the representation of programs and program state in RDF. Embedding data structures and algorithms in the web of Linked Data not only makes them universally available, but also eliminates the need for special-purpose software to retrieve and combine programs which reference each other across the underlying physical network. Effectively, generic Linked Data interfaces such as the Semantic Web Client Library (currently available at http://sites.wiwiss.fu-berlin.de/suhl/bizer/ng4j/semwebclient/) serve as language-agnostic program linkers, aggregating procedural as well as purely descriptive RDF data dynamically, as needed in a computation. The development of RDF programming languages was motivated by this idea of a “web of programs”. In RDF programming languages, programs become “first-class” entities in the Linked Data community's distributed graph data structure. Incorporating them into further programs is as simple as referencing their URIs.

Similarly, as an alternative to what might be called “simply” linked programs, encapsulating a computational object in a named graph (as demonstrated later in Section 4.2.1 and Figure 11) makes those objects available to any application with knowledge of and access to the graph. To reference a computational object, an application must be able to infer from the object the URI of the named graph which describes it, as well as the location of a SPARQL endpoint from which it can retrieve the named graph. Emerging solutions such as the Semantic Web Crawling Sitemap Extension (currently available at http://sw.deri.org/2007/07/sitemapextension/) aid in making this information discoverable by Semantic Web applications.

Finally, in publishing an RDF program to the Semantic Web, it is good practice to provide documentation of the API. Furthermore, it is natural to express Semantic Web API descriptions in RDF, for instance as OWL ontologies. Like the JavaDoc framework, OWLDoc (currently available at http://www.co-ode.org/downloads/owldoc/) is a useful aid to developers in learning an OWL API and using it in their applications. The combination of machine-accessible program code and API documentation is particularly appropriate for application scenarios involving the automated discovery and execution of programs.

4.1.2 Memoization and Computational Reuse

In some situations, it is better to query for the result of a previous computation than to re-compute it. This idea is known as memoization [45], and the Semantic Web, with its open data philosophy, provides an ideal medium for such computational reuse. Consider the simple function $f(x) = x + 1$, where $x \in \mathbb{N}$. If $f(2)$ has been previously computed, the result can be represented by the RDF triple $(2, f, 3)$. (For the purpose of this simple example, the caveat that RDF does not allow a literal to be the subject of a triple is ignored.) The results of that computation can be reused by another RVM at a later time. Memoization sacrifices space for time. Of course, this is an impractical example, because re-computing $f(x)$ is faster than querying the Semantic Web for the mapping. However, for other, more computationally complex operations, querying for a result may be orders of magnitude faster than recomputing it. For example, many graph analysis algorithms have a relatively high complexity, such as PageRank (or eigenvector centrality) at $O(k(n+m))$, closeness centrality at $O(nm)$, and betweenness centrality at $O(nm)$ (this complexity is for unweighted betweenness centrality; weighted betweenness centrality has a complexity of $O(nm + n^2 \log n)$; see [46] for more information on betweenness centrality), where $n$ is the number of vertices in the graph, $m$ is the number of edges in the graph, and $k$ is the number of iterations. Such algorithmic complexities become important when considering the use of graph analysis algorithms on the Semantic Web [47, 48]. The Semantic Web provides a unique medium by which computations such as these can be stored and later leveraged by other applications. In this sense, not only is metadata open, but so are computational results.
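
A sketch of such reuse, assuming an invented ex:pageRank property for storing a precomputed score: the expensive computation is written once as RDF,

@prefix ex:  <http://example.org/analysis#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# memoized result of an expensive graph computation (values illustrative)
<http://example.org/marko> ex:pageRank "0.0013"^^xsd:double .

and any later application can retrieve it with a query instead of re-running the algorithm:

SELECT ?rank
  WHERE {
    <http://example.org/marko> <http://example.org/analysis#pageRank> ?rank }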

4.2 Distributed Computing

Virtual machine computing provides a layer of abstraction between program instructions and the underlying hardware CPU. It is the role of the virtual machine to serve as a proxy to translate high-level instructions into the respective instruction set of the underlying CPU. While this indirection slows the computation down by incurring a translation step, it permits the same high-level instructions to execute on various physical hardware architectures. This idea is captured in the popular Java slogan of “write once, run anywhere.” In the case of RVMs, this interoperability hides the underlying hardware infrastructure supporting the Semantic Web.

As RDF-based data sets become larger and more numerous, the Semantic Web will be composed of more large-scale RDF repositories serving Linked Data to third-party applications. However, some applications may draw upon more data than can reasonably be moved from server to client. In such instances, it may be worthwhile to migrate an executing program, in the form of RVM state and RDF instructions, to the provider’s environment for local processing. The notion of process migration has been proposed to remedy issues surrounding massive-scale data processing [49, 50] and forms one of the primary purposes of Grid computing [13].

The remainder of this section illustrates one possible mechanism for the migration of an RVM across different hardware hosts in order to accomplish a distributed computation within the Semantic Web infrastructure. However, before doing so, a discussion of the role of named graphs in distributed Semantic Web computing is required.

4.2.1 The Role of Named Graphs

In a Semantic Web computing environment where instructions, virtual machines, and data commingle within a single RDF data structure, there is an increased need for trust, security, and provenance mechanisms. For example, it may be necessary to

  • group RVM states and RDF instructions for ease of migration and identification,

  • “sandbox” RVMs and RDF instructions to prevent malicious or poorly written code from destroying a triple store’s data integrity, and

  • control the permissions that foreign RVMs and RDF instructions have in a triple store environment.

The most fundamental construct supporting the above three requirements is RDF reification. Reification provides a way to make statements about statements. In RDF, a reified triple may be the subject or object of another triple. While the concept of RDF reification was initially introduced in the RDF specification with the rdf:Statement construct, recent developments in named graphs (or quads) provide a more manageable solution to triple reification [51].

In a named graph, a “triple” is an ordered set of four elements $(s, p, o, g)$, where $g \in U$ is the URI naming the graph. (While such “triples” are actually quads, the term “triple” will be used as this is a popular convention.) The URI, which represents a named graph, can be used as the subject or object of another statement and thus serves as a mechanism for attaching metadata to the graph. This metadata may include usage statistics, human-level descriptions (e.g. rdfs:comment), access control permissions [52], and/or provenance information.
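
A minimal sketch in the TriG serialization (the graph name, its contents, and the attached metadata are all invented for illustration) shows how a graph URI groups statements and is itself described by further triples:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# the named graph and its contents
<http://example.org/graphA> {
  <http://example.org/marko> foaf:knows <http://example.org/josh> .
}

# metadata attached to the graph's URI (asserted in the default graph)
<http://example.org/graphA> rdfs:comment "An example graph with attached metadata." .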

Figure 10 demonstrates how a named graph can encapsulate data, state, and instructions.

Fig. 10: Using named graphs to encapsulate RVM state and RDF computing instructions. The rdf:type of the RVM state resources is provided in bold next to the resource. All non-namespaced URIs are part of an example RVM namespace denoted http://example.com/rvm.

It is possible to attach security metadata to named graph B such that the RVM and RDF computing instructions contained in it have specific permissions with respect to the triples in named graph A. It is also straightforward to extract the RVM state and its current instructions by simply selecting triples from named graph B. For example, the query

SELECT ?x ?y ?z
  WHERE {
     GRAPH <B> {
    ?x ?y ?z }}

yields the entire B graph and thus, both the RDF computing instructions and the RVM state.

The Neno/Fhat programming environment uses named graphs to encapsulate computational objects for ease of migration, code sharing, and internally for garbage collection. For example, in Figure 11, lanl:marko is a URI; however, at a higher-level of abstraction, lanl:marko is a graph-based object with method declarations and explicit method instructions.

Fig. 11: Using named graphs to encapsulate computational objects. All non-namespaced URIs are part of an example RVM namespace denoted http://example.com/rvm.

Section 4.2.2 will now discuss distributed computing on the Semantic Web using named graphs.

4.2.2 RVM Compute Farms

An open hardware provider may use an RVM farm to manage concurrent RVMs computing on the local triple store. For example, a Linked Data repository may provide an RVM farm that is used to execute RVMs and instructions that are working with the data currently in its repository. An RVM farm polls its associated triple store for non-executing RVM states. Once a non-executing state has been found, the RVM farm will spawn an RVM process to execute it. For instance, an RVM farm might use the query

SELECT ?x
  WHERE {
    ?x <rdf:type> <rvm:RVM> .
    ?x <rvm:needsProcess> "true"^^xsd:boolean }

to locate RVM states that need processing. If a URI binds to ?x, an RVM process is created. The newly created RVM process is passed the bound ?x URI as a parameter and thus knows which RVM state to harvest and execute. An RVM process maintains no state information; if an RVM process halts (for whatever reason), the current state of the computation is maintained in the RVM state. That is, the state of the stacks, program counter, etc. are frozen until another RVM process can continue the execution. In this way, the RDF encoding of RVM state makes an RVM well suited to migration between triple store environments and thus, between RVM farms.
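
Once an RVM state has been claimed, the farm would presumably mark it as executing so that no second process harvests the same state; a sketch of such an update, reusing the hypothetical rvm:needsProcess flag from the query above (with <rvm:machine1> standing in for the URI bound to ?x):

DELETE {
  <rvm:machine1> <rvm:needsProcess> "true"^^xsd:boolean }
INSERT {
  <rvm:machine1> <rvm:needsProcess> "false"^^xsd:boolean }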

Figure 12 diagrams a migration pattern between two triple store environments, where one environment is located at 127.0.0.1 and the other is located at 127.0.0.2.

Fig. 12: Migrating RVM and RDF computing instructions between host triple stores.

Starting at step 1 in Figure 12 (top left corner), an RVM state and RDF instructions named graph (denoted R/I) exist in the 127.0.0.1 triple store. At step 2, the local RVM farm locates the RVM state and spawns a new RVM process. The RVM process moves the RVM state and instructions to local memory to perform a computation. (Moving the RVM state and instructions to memory ensures a more efficient use of clock cycles, as articulated previously when discussing the r-Fhat RVM.) Assume that the RVM uses the data in the named graph denoted D1 in its computation. At step 3, the RVM has finished its computation with D1 and inserts its state and instructions back into the 127.0.0.1 triple store. The RVM process at 127.0.0.1 notifies the RVM farm at 127.0.0.2 that it has an RVM that wishes to migrate to its triple store. The RVM farm at 127.0.0.2 then harvests the RVM and its instructions using a SPARQL SELECT query or HTTP GET request. (Another approach would be to have the RVM process at 127.0.0.1 INSERT/PUT the RVM state and instructions into the triple store at 127.0.0.2.) Thus, what is migrated is the RDF instructions as well as the state of the RVM, which includes, amongst other information, a populated heap that it is using in its computation.

At step 4, the R/I named graph is located in the triple store at 127.0.0.2. At step 5, the RVM farm at 127.0.0.2 locates R/I and then spawns an RVM process. The newly created RVM process moves the RVM state and instructions to main memory for local processing on the D2 named graph data set. When the RVM no longer requires D2 for its computation, it can be moved back to the 127.0.0.2 triple store at step 6 and ultimately, migrate to yet another triple store at step 7.

Such a model of distributed computing is dependent on mechanisms of trust and security on the Semantic Web. A simple solution would be to allow a foreign RVM to read, write, or delete from its own named graph and any named graphs that it spawns, but to allow it only to read from other named graphs in the triple store. Furthermore, limiting the number of triples in a named graph can prevent the creation of an excessive amount of data by a foreign RVM. Finally, in Neno/Fhat, a simple “halt” mechanism regulates the number of clock cycles that an RVM state can utilize.
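
A minimal sketch of such a policy follows, assuming a hypothetical AccessPolicy guard consulted before each triple store operation; the class name, method names, and triple cap are illustrative assumptions, not part of a specified security model.

import java.net.URI;
import java.util.Set;

// A sketch of the per-RVM policy described above; all names and
// limits are illustrative.
class AccessPolicy {
  private final Set<URI> ownedGraphs; // graphs the RVM created or spawned
  private static final int MAX_TRIPLES = 10000; // cap on graph growth

  AccessPolicy(Set<URI> ownedGraphs) { this.ownedGraphs = ownedGraphs; }

  // A foreign RVM may read any named graph in the store.
  boolean mayRead(URI graph) { return true; }

  // Writes are restricted to the RVM's own graphs and are refused
  // once a graph reaches the triple cap.
  boolean mayWrite(URI graph, int currentTripleCount) {
    return ownedGraphs.contains(graph) && currentTripleCount < MAX_TRIPLES;
  }

  // Deletes are likewise restricted to the RVM's own graphs.
  boolean mayDelete(URI graph) { return ownedGraphs.contains(graph); }
}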

4.3 Reflective Computing

The concept of reflection in computing refers to the ability of software to examine and modify itself at runtime [53, 54]. For instance, in the Java programming language, it is possible for an object to “look at” the API at runtime and make choices as to the flow of execution. This is made possible through the java.lang.reflect package of the core Java API. This type of reflection exists because the description of an object is at the same level of abstraction as the object itself. In general, reflective computing is possible when particular states of a program are made available within a representation that is processable by the executing program. In many respects, reflective computing parallels the common model of reification in logic, and specifically in RDF using named graphs or the rdf:Statement construct. With reification, it is possible to make a statement about a statement. With reflection, it is possible to compute with the description of a computation, whether that description is of a particular state of computation or of the program itself [55].

Reflective computing has found application in heterogeneous object environments where an object may need to discover new objects and “learn” what functionality they have or reason about their functionality before leveraging them within a computation. OWL-S [40] and other web service description frameworks serve a similar purpose in that they provide detailed machine-readable descriptions of the input requirements, processing stages, and ultimate output of a service. With respect to RDF-based programming languages and RDF-encoded RVMs, both procedural and machine information are encoded at the same level of abstraction, namely in RDF, thus making them readily available for run-time analysis. While programming constructs such as packages, class descriptions, methods, and so forth are utilized for the purpose of procedural encapsulation, it is possible to make use of queries on and manipulations of an RDF graph in order to reason on and alter the full computing stack during run-time. This sub-section will present three types of reflection that are made salient in RDF computing: object and method reflection, instruction reflection, and machine reflection.

4.3.1 Object and Method Reflection

A triple store supports three basic operations: read, write, and delete. RDF programming languages make use of these operations to query and manipulate an RDF graph in order to evolve the graph and thus compute. It is possible for an RDF program to query a triple store in order to retrieve information about the state of an object, whether that object is itself or another object. For example, an object might execute a query to locate all objects of rdf:type lanl:Person as well as any methods associated with those objects. The following SPARQL query returns a list of instantiated lanl:Person objects and their associated method URIs:

SELECT ?x ?y
  WHERE {
    ?x <rdf:type> <lanl:Person> .
    ?x <rvm:hasMethod> ?y }

Given the URIs bound to ?x (the lanl:Person resources) and ?y (their associated methods), the querying object can make decisions as to how to utilize these URIs in its processing. For example, it could decide that if a particular lanl:Person resource has a makeFriend method, then it must be a “friendly” object, and it will both make friends with that object and invoke that object to make friends with it, thus creating a symmetric foaf:knows relationship.

This type of reflection is analogous to class reflection in Java. For example, the previous method reflection query may be executed in Java as

Method[] methods = Person.class.getMethods();

The class Person is queried for its set of methods, which are returned as an array of Method objects. These Method objects can then be manipulated like any other object. For example,

Person marko = new Person();
Person josh = new Person();
// reflectively discover the methods exposed by josh's class
Method[] methods =
    josh.getClass().getMethods();
for(Method m : methods) {
  if(m.getName().equals("makeFriend")) {
    marko.makeFriend(josh); // direct invocation
    m.invoke(josh, marko);  // reflective invocation
  }
}

Both makeFriend and invoke appear in the loop body to demonstrate the two ways in which the same method can be executed.

4.3.2 Instruction Reflection

In an RDF computing environment, not only are methods exposed, but so are the instructions that compose those methods. What is returned by Person.class.getMethods() in the previous example is an array of pointers to the methods that are available from that class. In this way, the program, at run-time, is able to inspect the Person Java API. Once such a method pointer has been acquired, in Java, it is possible to invoke the method:

Person marko = new Person();
methods[0].invoke(marko, null);

In an RDF computing environment, this is equivalent to adding an Invoke instruction resource as the next instruction in the current instruction sequence. This ensures that the next instruction to be processed by the executing RVM will invoke the method. Assuming y is the URI bound to ?y in the previous SPARQL SELECT query, the following SPARQL/Update command will provide the appropriate alteration of the flow of execution:

INSERT DATA {
  <ex:001> <rvm:nextInst> <ex:010> .
  <ex:010> <rdf:type> <rvm:Invoke> .
  <ex:010> <rvm:invokeMethod> <y> }

As demonstrated, it is possible to reason about the current instruction sequence of a program and perhaps, insert new instructions as a result. This type of direct code manipulation supports evolutionary (or genetic) computing: at runtime, new code can be introduced into the system. Again, the runtime creation and manipulation of code is made possible by the fact that the API and the instructions are at the same level of abstraction: URIs, literals, blank nodes, and triples.
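
A sketch of this instruction splice in Java follows, assuming a hypothetical UpdatableStore interface and reusing the example URIs ex:001 and ex:010 from the command above:

// A sketch of splicing an Invoke instruction into a running program;
// the UpdatableStore interface is hypothetical.
interface UpdatableStore { void update(String sparqlUpdate); }

class InstructionSplicer {
  // Append an Invoke of the method named by methodUri as the next
  // instruction after ex:001, mirroring the INSERT DATA example above.
  static void spliceInvoke(UpdatableStore store, String methodUri) {
    store.update(
        "INSERT DATA { "
        + "<ex:001> <rvm:nextInst> <ex:010> . "
        + "<ex:010> <rdf:type> <rvm:Invoke> . "
        + "<ex:010> <rvm:invokeMethod> <" + methodUri + "> }");
  }
}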

4.3.3 Machine Reflection

An RVM may execute instructions that manipulate itself. Thus, a machine can modify itself at runtime. This type of machine reflection is diagrammed in Figure 13.

Fig. 13: RDF reflection at the RVM-level. Time is only specified for the RVM state, not the instructions. The rdf:type of the resource is presented in bold with the resource. All non-namespaced URIs are part of an example RVM namespace denoted http://example.com/rvm.

In Figure 13, at the first time step, the RVM (ex:01) is pointing to an instruction (represented by the program counter PC property) that requires the RVM to push a pointer of itself onto its own operand stack (i.e. the Push instruction’s value is the RVM resource). At the second time step, the RVM URI is on the top of the operand stack (i.e. rdf:first). Thus, at this point, any instruction that utilizes the operand stack will involve the RVM computing on itself (in Figure 13, this next operation is a NoOp, which will not alter the state of the machine). In this way, it is possible for the RVM to manipulate itself at runtime. If the machine’s process is also encoded in RDF (e.g. by coding Fhat’s process in Neno), then the complete RVM is subject to such machine-level reflection. In short, the RVM’s stacks, frames, program counter, or whichever components the RVM model represents can be manipulated through machine-level reflection.
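
For illustration, the result of such a self-push might be asserted as the following triples, sketched here as strings in the Java style of the earlier examples; rvm:operandStack is an assumed property name, while rdf:first/rdf:rest follow the collection-based stack shown in Figure 13:

// The triples a Push-self instruction might yield, sketched as
// N-Triples-style strings; property names beyond Figure 13's PC and
// rdf:first are assumptions about the example RVM vocabulary.
String[] selfPushResult = {
  "<ex:01> <rvm:operandStack> <ex:stack1> .",
  "<ex:stack1> <rdf:first> <ex:01> .", // the RVM itself is on top
  "<ex:stack1> <rdf:rest> <rdf:nil> ."
};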

The previous example is an extreme case of reflection that is not found in typical programming environments. For instance, in Java, it is not possible for a program to obtain a pointer to the JVM. Furthermore, the JVM is represented according to the instruction set of the underlying hardware CPU and thus is not at the same level of abstraction as the program it executes. This reduces the ability to facilitate machine reflection. In languages that compile to machine-specific code, such as C and C++, it is possible to obtain a direct pointer to the program being executed; in this way, a program may be manipulated at runtime.

In an RDF-based programming language, reflection can be applied to the entire computational stack. However, it is still possible to engineer code that respects the common principles of data hiding, encapsulation, and modularity. In many cases, such common development practices are preferable. The purpose of this section was to demonstrate the flexibility of this RDF-based programming style.

In a Semantic Web computing environment where object persistence is an expected feature, the ability for objects to discover, reason, and ultimately interact with other objects will be a key component of RDF-based applications. Moreover, with persistent RVMs, it will be important for RVM processes to discover and reason on RVM states before executing them.

5 Conclusion

The Semantic Web is a distributed environment in which descriptive world-models can be queried and manipulated by external applications. These external applications leverage the world-wide repository of structured multi-relational data for the purpose of computation. This article has addressed the potential role of encoding such applications in the Semantic Web. Given the modeling power of RDF, it is possible to represent not only data, but also software and virtual machines in RDF. RDF-based programming languages were designed to take explicit advantage of the RDF data model. Unlike RDF APIs in other languages such as Java, C, etc., these languages do not require the developer to work with two different data models. There is no disjoint experience for the developer [19]. Furthermore, with the ability to encode virtual machines and their state in RDF, it is possible to migrate software and machines to other data sets around the world. This provides a distributed process infrastructure to complement the Semantic Web’s distributed data structure. When the more procedural aspects of computing are embedded in the Semantic Web, new computing models emerge that push the Semantic Web towards a distributed general-purpose computing environment.

6 Acknowledgements

Joshua Shinavier of Knowledge Reef Systems Inc. and the Rensselaer Polytechnic Institute, and the designer of the Ripple RDF programming language, provided much insight during the development of these ideas. Moreover, Josh contributed helpful text to an original draft version. Joe Geldart of the University of Durham, designer of the Adenosine RDF programming language and Callaghan RVM, provided assistance in the discussion of Adenosine. Likewise, Chris Goad, the designer of the FABL language and RVM, provided assistance with FABL. Ryan Chute of the Digital Library Research and Prototyping Team of the Los Alamos National Laboratory engineered the Neno/Fhat compiler and Fhat RVM process. This work was supported by both the Digital Library Research and Prototyping Team and the Center for Nonlinear Studies of the Los Alamos National Laboratory.

References

  • [1] T. Berners-Lee, J. A. Hendler, and O. Lassila, “The Semantic Web,” Scientific American, pp. 34–43, May 2001.
  • [2] W3C/IETF, “URIs, URLs, and URNs: Clarifications and recommendations 1.0,” September 2001. [Online]. Available: http://www.w3.org/TR/uri-clarification/
  • [3] F. Manola and E. Miller, “RDF primer: W3C recommendation,” February 2004. [Online]. Available: http://www.w3.org/TR/rdf-primer/
  • [4] D. Brickley and R. Guha, “RDF vocabulary description language 1.0: RDF schema,” World Wide Web Consortium, Tech. Rep., 2004. [Online]. Available: http://www.w3.org/TR/rdf-schema/
  • [5] D. L. McGuinness and F. van Harmelen, “OWL web ontology language overview,” February 2004. [Online]. Available: http://www.w3.org/TR/owl-features/
  • [6] L. W. Lacy, OWL: Representing Information Using the Web Ontology Language.   Trafford Publishing, 2005.
  • [7] T. Berners-Lee, “Linked data,” World Wide Web Consortium, Tech. Rep., 2006. [Online]. Available: http://www.w3.org/DesignIssues/LinkedData.html
  • [8] R. Lee, “Scalability report on triple store applications,” Massachusetts Institute of Technology, Tech. Rep., 2004.
  • [9] C. Bizer and R. Cyganiak, “D2R server - publishing relational databases on the semantic web,” in Proceedings of the 5th International Semantic Web Conference, ser. Lecture Notes in Computer Science.   Athens, Georgia: Springer-Verlag, 2006.
  • [10] E. Prud’hommeaux and A. Seaborne, “SPARQL query language for RDF,” World Wide Web Consortium, Tech. Rep., October 2004. [Online]. Available: http://www.w3.org/TR/2004/WD-rdf-sparql-query-20041012/
  • [11] C. Bizer, T. Heath, K. Idehen, and T. Berners-Lee, “Linked data on the web,” in Proceedings of the International World Wide Web Conference, ser. Linked Data Workshop, Beijing, China, April 2008.
  • [12] M. A. Rodriguez, “A distributed process infrastructure for a distributed data structure,” Semantic Web and Information Systems Bulletin, 2008. [Online]. Available: http://arxiv.org/abs/0807.3908
  • [13] I. Foster and C. Kesselman, The Grid.   Morgan Kaufmann, 2004.
  • [14] P. Cudré-Mauroux, S. Agarwal, and K. Aberer, “GridVine: An infrastructure for peer information management,” IEEE Internet Computing, vol. 11, no. 5, pp. 36–44, 2007.
  • [15] M. A. Rodriguez, Emergent Web Intelligence.   Berlin, DE: Springer-Verlag, 2008, ch. General-Purpose Computing on a Semantic Network Substrate. [Online]. Available: http://arxiv.org/abs/0704.3395
  • [16] P. Hayes and B. McBride, “RDF semantics,” World Wide Web Consortium, Tech. Rep., February 2004. [Online]. Available: http://www.w3.org/TR/rdf-mt/
  • [17] A. M. Turing, “On computable numbers, with an application to the entscheidungsproblem,” Proceedings of the London Mathematical Society, vol. 42, no. 2, pp. 230–265, 1937.
  • [18] J. Shinavier, “Functional programs as linked data,” in 3rd Workshop on Scripting for the Semantic Web, Innsbruck, Austria, 2007.
  • [19] C. Goad, “Describing computation within RDF,” in Proceedings of the International Semantic Web Working Symposium, 2004. [Online]. Available: http://fabl.net/papers/swws.html
  • [20] D. Quan, D. F. Huynh, V. Sinha, and D. Karger, “Adenine: A metadata programming language,” Massachusetts Institute of Technology, Tech. Rep., February 2003. [Online]. Available: http://haystack.lcs.mit.edu/documentation/adenine.pdf
  • [21] A. Seaborne and G. Manjunath, “SPARQL/Update: A language for updating RDF graphs,” Hewlett-Packard, Tech. Rep., 2007. [Online]. Available: http://jena.hpl.hp.com/~afs/SPARQL-Update.html
  • [22] J. Clark and S. DeRose, “XML Path Language (XPath) Version 1.0,” World Wide Web Consortium, Tech. Rep., 1999. [Online]. Available: http://www.w3.org/TR/xpath
  • [23] S. Abiteboul, D. Quass, J. McHugh, J. Widom, and J. Wiener, “The Lorel Query Language for Semistructured Data,” International Journal on Digital Libraries, vol. 1, no. 1, pp. 68–88, April 1997.
  • [24] T. Berners-Lee, “Notation 3,” World Wide Web Consortium, Tech. Rep., 1998. [Online]. Available: http://www.w3.org/DesignIssues/Notation3
  • [25] D. L. McGuinness, R. Fikes, J. Hendler, and L. A. Stein, “DAML+OIL: An Ontology Language for the Semantic Web,” IEEE Intelligent Systems, vol. 17, no. 5, pp. 72–80, 2002.
  • [26] B. McBride, “Jena: A Semantic Web toolkit,” IEEE Internet Computing, vol. 6, no. 6, pp. 55–59, 2002. [Online]. Available: http://dx.doi.org/10.1109/MIC.2002.1067737
  • [27] J. Broekstra, A. Kampman, and F. van Harmelen, “Sesame: A generic architecture for storing and querying RDF,” in Proceedings of the International Semantic Web Conference ISWC02, Sardinia, Italy, June 2002. [Online]. Available: http://www.openrdf.org/doc/papers/Sesame-ISWC2002.pdf
  • [28] D. Beckett, “The design and implementation of the Redland RDF application framework,” in Proceedings of the 10th International World Wide Web Conference.   Hong Kong, China: ACM Press, May 2001.
  • [29] J. Geldart, “RDF without revolution: A test of RDF and ontology,” Bachelor’s dissertation, Durham University, April 2005.
  • [30] E. Oren, B. Heitmann, and S. Decker, “ActiveRDF: Embedding semantic web data into object-oriented languages,” Web Semantics: Science, Services and Agents on the World Wide Web, vol. 6, no. 3, pp. 191–202, 2008.
  • [31] A. Kalyanpur, D. Pastor, S. Battle, and J. Padget, “Automatic mapping of owl ontologies into java,” in Proceedings of Software Engineering. - Knowledge Engineering, Banff, Canada, 2004.
  • [32] H. H. Wang, N. Noy, A. Rector, M. Musen, T. Redmond, D. Rubin, S. Tu, T. Tudorache, N. Drummond, M. Horridge, and J. Sedenberg, “Frames and OWL side by side,” in 10th International Protégé Conference, Budapest, Hungary, July 2007.
  • [33] H. Boley, J. Mei, M. Sintek, and G. Wagner, “RDF/RuleML interoperability,” in Proceedings of the Workshop on Rule Languages for Interoperability, April 2005. [Online]. Available: http://www.ruleml.org/w3c-ws-rules/rdf-ruleml-w3c-ws.html
  • [34] H. Boley, M. Dean, B. Grosof, M. Sintek, B. Spencer, S. Tabet, and G. Wagner, “FOL RuleML: The First-Order Logic Web Language,” World Wide Web Consortium, Tech. Rep., 2005. [Online]. Available: http://www.w3.org/Submission/FOL-RuleML/
  • [35] H. Boley, “Object-Oriented RuleML: User-Level Roles, URI-Grounded Clauses and Order-Sorted Terms,” in Proceedings of the Workshop Rules and Rule Markup Languages for the Semantic Web, October 2003.
  • [36] M. Ball, H. Boley, D. Hirtle, J. Mei, and B. Spencer, “The OO jDREW Reference Implementation of RuleML,” in Proceedings of the International Conference on Rules and Rule Markup Languages for the Semantic Web, ser. Lecture Notes in Computer Science, vol. 3791.   Galway, Ireland: Springer, November 2005, pp. 218–223.
  • [37] I. Horrocks, P. F. Patel-Schneider, H. Boley, S. Tabet, B. Grosof, and M. Dean, “SWRL: A Semantic Web rule language combining OWL and RuleML,” World Wide Web Consortium, Tech. Rep., May 2004. [Online]. Available: http://www.w3.org/Submission/SWRL/
  • [38] H. Boley and M. Kifer, “RIF framework for logic dialects,” World Wide Web Consortium, Tech. Rep., July 2008.
  • [39] B. Parsia and E. Sirin, “Pellet: An OWL DL reasoner,” in Proceedings of the International Semantic Web Conference ISWC04, ser. Lecture Notes in Computer Science, vol. 3298.   Hiroshima, Japan: Springer-Verlag, November 2004.
  • [40] D. Martin, M. Burstein, J. Hobbs, O. Lassila, D. McDermott, S. McIlraith, S. Narayanan, M. Paolucci, B. Parsia, T. Payne, E. Sirin, N. Srinivasan, and K. Sycara, “OWL-S: Semantic markup for web services,” World Wide Web Consortium, Tech. Rep., 2004. [Online]. Available: http://www.w3.org/Submission/OWL-S/
  • [41] I. D. Craig, Virtual Machines.   Springer, 2005.
  • [42] D. Coelho and A. Stanculescu, “A state-of-the-art VHDL simulator,” in Thirty-Third IEEE Computer Society International Conference, San Francisco, CA, 1988, pp. 320–323.
  • [43] T. Lindholm and F. Yellin, The Java Virtual Machine Specification.   Addison-Wesley, 1999.
  • [44] P. Leach, “A Universally Unique IDentifier (UUID) URN Namespace,” Network Working Group, Tech. Rep., 2005. [Online]. Available: http://www.rfc-archive.org/getrfc.php?rfc=4122
  • [45] D. Michie, “Memo functions and machine learning,” Nature, no. 218, pp. 19–22, 1968.
  • [46] U. Brandes, “A faster algorithm for betweenness centrality,” Journal of Mathematical Sociology, vol. 25, no. 2, pp. 163–177, 2001.
  • [47] M. A. Rodriguez, “A multi-relational network to support the scholarly communication process,” International Journal of Public Information Systems, vol. 2007, no. 1, pp. 13–29, 2007.
  • [48] ——, “Grammar-based random walkers in semantic networks,” Knowledge-Based Systems, vol. 21, no. 7, pp. 727–739, 2008. [Online]. Available: http://arxiv.org/abs/0803.4355
  • [49] W. C. Hsieh, P. Wang, and W. E. Weihl, “Computation migration: enhancing locality for distributed-memory parallel systems,” in Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming.   New York, NY, USA: ACM, 1993, pp. 239–248.
  • [50] J. Dean and S. Ghemawat, “MapReduce: simplified data processing on large clusters,” in Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation.   Berkeley, CA, USA: USENIX Association, 2004, p. 10.
  • [51] J. J. Carroll, C. Bizer, P. Hayes, and P. Stickler, “Named graphs, provenance and trust,” in The Fourteenth International World Wide Web Conference (WWW05).   Chiba, Japan: ACM Press, 2005, pp. 613–622.
  • [52] P. Reddivari, T. Finin, and A. Joshi, “Policy based Access Control for a RDF Store,” in Proceedings of the 14th International World Wide Web Conference.   Chiba, Japan: ACM Press, May 2005, pp. 78–83. [Online]. Available: http://ebiquity.umbc.edu/get/a/publication/157.pdf
  • [53] P. Maes, “Computational reflection,” Ph.D. dissertation, Vrije Universiteit Brussel, Brussels, Belgium, January 1987.
  • [54] F.-N. Demers and J. Malenfant, “Reflection in logic, functional and object-oriented programming: a short comparative study,” in Workshop on Reflection and Metalevel Architectures and their Applications in AI, 1995, pp. 29–38.
  • [55] J. M. Sobel and D. P. Friedman, “An introduction to reflection-oriented programming,” in Proceedings of Reflection, San Francisco, CA, April 1996.