A Logical Approach to Cloud Federation

This paper is based upon work supported by the US National Science Foundation through the GENI Initiative and under NSF grants OCI-1032873, CNS-0910653, and CNS-1330659.
Federated clouds raise a variety of challenges for managing identity, resource access, naming, connectivity, and object access control. This paper shows how to address these challenges in a comprehensive and uniform way using a data-centric approach. The foundation of our approach is a trust logic in which participants issue authenticated statements about principals, objects, attributes, and relationships in a logic language, with reasoning based on declarative policy rules. We show how to use the logic to implement a trust infrastructure for cloud federation that extends the model of NSF GENI, a federated IaaS testbed. It captures shared identity management, GENI authority services, cross-site interconnection using L2 circuits, and a naming and access control system similar to AWS Identity and Access Management (IAM), but extended to a federated system without central control.
Qiang Cao, Yuanjun Yao, Jeff Chase
The IaaS market today is dominated by a small number of megaproviders, which compete on price and services for market position, and face disincentives to combine their offerings. However, as the technology develops, some speculate that cloud providers will face natural market incentives to interconnect their service offerings (cloud peering), leading to the emergence of an “intercloud” following the historical development of infrastructure networks including the Internet and the power grid. Peering enables providers to shift load to absorb demand spikes. The IBM Reservoir project [44, 45] and others popularized this model.
An overlapping trend is the emergence of multi-cloud applications that span multiple providers. They occur naturally in cloud peering scenarios, but cloud adopters may also use multiple providers to manage cost or risk. Multi-clouds are also attractive for peer-to-peer application platforms and for services that benefit from proximity to the edge of the network (cloudlet, fog, or locavore computing). The multi-cloud model was also popularized as “sky computing”. Various efforts have sought to develop stacks and standards to launch, manage, and/or migrate application networks seamlessly and safely across multiple clouds: these include the Open Cloud Computing Interface (OCCI), various research works, and Cisco’s Intercloud Fabric offerings.
A decade ago the research community launched major initiatives to combine network testbeds to leverage benefits of scale, diversity, geographic dispersion, and heterogeneity. NSF GENI [11, 14] in the US and FIRE in the EU exemplify this trend. Both initiatives have funded deployment of IaaS federations spanning many sites and providers. They also embody a third dimension of federation: they serve a common community of member researchers, requiring some form of federated identity for their users (a community cloud). Other recent efforts take a similar approach to linking accounts across providers.
These three dimensions of cloud federation (peering, multi-cloud, and community) present a common set of overlapping challenges for identity, trust, access, and governance. Federation requires some means to represent and certify trust relationships among users and providers, including their terms of peering. It also places new pressure on the mechanisms to manage multi-tenancy, including naming, ownership, and access control of protected cloud objects (machine instances, virtual storage objects, networks), and accounting and accountability for the use of resources. The US government has identified federated/community/multi-cloud scenarios as a priority area for standards, focusing on “credentials, namespaces, and trust infrastructure”.
This paper takes a comprehensive approach to trust infrastructure for cloud federation. We advocate a data-centric approach that captures the attributes and relationships of identities and objects, with trust and authorization based on queries over the data model. Our approach is fully decentralized: participants exchange certificates with statements in a logic language, and issue local queries against locally cached sets of relevant assertions and declarative policy rules. It provides end-to-end authorization: each participant can verify for itself that its interactions comply with its policy based on statements that it has received from other parties. We use a simplified trust logic based on Datalog, a well-studied logic language with a rigorous semantics, within a novel system for managing certificates.
This paper uses the architecture of the GENI deployment as a model for federation. It addresses key issues of federated identity, trust, governance, and coordination that are common to peering, multi-cloud, and community federation scenarios. We show how to capture the GENI trust model using logic, and extend it with access control for protected objects, using features similar to those in Amazon’s Identity and Access Management (IAM), but built for a multi-cloud scenario. Finally, we show how to authorize linked private networks (virtual private clouds or VPCs) in a multi-cloud, cross-tenant peering of VPC networks, and more complex cross-federation structures.
The contributions of this paper include:
Specify trust and naming for federated IaaS scenarios in a way that captures the naming and trust model of the existing GENI deployment. We show how to use logic to frame the design issues and specify solutions in a way that is concise, precise, and verifiable.
Demonstrate use of trust logic as an implementation technology for federated clouds. The logical specification is directly deployable using the SAFE framework to manage the exchange of logic content as linked certificates, and execute trust queries against assembled sets of logic statements. (See §2.)
Evaluate the performance of logical federation. Microbenchmarks and synthetic workloads show that key trust operations are fast enough to be practical in a deployment: they are at least an order of magnitude faster than the typical cost of the operations they protect, e.g., instantiating or linking cloud resources.
Table 1: CFlo trust API operations.

| Operation | Description |
|---|---|
| root.endorseAggregate(PID) | Issue root endorsement for an aggregate (infrastructure provider). |
| root.endorseAuthority(PID, type) | Issue root endorsement for an authority service to certify users (MA), projects (PA), or tenants (slices: SA). |
| PA.createProject(ownerPID, attributes) returns projectID | Create a project with owner ownerPID, checking its permission. This is an API call of a Project Authority (PA). |
| member(PID, projectID, role, delegatable) | Delegate project membership to PID with a named role. |
| SA.createSlice(ownerPID, projectID, attributes) returns sliceID | Create a slice (tenant) with owner ownerPID in a project, checking its permission. This is an API call of a Slice Authority (SA). |
| delegateSlice(PID, sliceID, perms, delegatable) | Delegate named permissions to operate on a slice. |
| Agg.createSliver(sliceID, attributes) | Check requester’s permission to instantiate virtual infrastructure at this aggregate for use by a slice. |
| Agg.sliceOperation(sliceID, type, attributes) | Check requester’s permission to perform a control action on a slice’s resources at this aggregate. |
| createNameEntry(dirID, <component>, targetID) | Create a name for targetID in the naming context of dirID. The caller must control dirID. |
| resolveName(<pathname>) returns ID | Resolve a multi-component pathname, which may cross domain boundaries. |
| createGroup() returns GID | Create a new empty group and return its scid. |
| groupMember(GID, PID, delegatable) | Grant membership in group GID to principal PID; delegatable is a boolean that determines whether transitive delegation is permitted. |
| checkAccess(subjectID, targetID) | Check whether a principal or group with subjectID has the right to access an object with targetID. |
This paper describes a trust core for cloud federation using logic (“CFlo”) based on the SAFE logical trust framework. SAFE factors trust concerns out of the cloud services and tools, and isolates them in application-supplied logic scripts. We implemented CFlo in about 600 lines of SAFE scripting code, including logic templates for all credential formats, exemplary policy rules, and compliance queries. The scripts implement a trust API to manage credentials and make trust decisions (see Table 1). The scripts run in a SAFE interpreter engine that is local to each participant and under its direct control.
This paper is not about SAFE itself, which is the topic of a companion paper. Rather, it is about using logical trust to address a range of issues in cloud federation. SAFE provides an exemplary trust logic language and system that enables us to evaluate how CFlo would perform in a real deployment.
GENI. The design of SAFE was motivated by our experience in applying logical trust in the development of GENI. Although GENI was conceived as a network testbed, it is best understood as a federation of autonomous IaaS providers (“aggregates”) linked by various trust relationships and agreements. GENI serves a community of registered researchers with various institutional and project affiliations. Each provider has various policies governing client access. These policies consider endorsements and delegations of trust among the participants, including a root trust anchor that certifies the aggregates and various authority services to govern membership and coordination. In this respect GENI is representative of federated cloud systems in general, although there are differences in terminology.
GENI uses the slice abstraction for multi-cloud scenarios, first introduced in PlanetLab. A slice is a logical container for a set of virtual resources (e.g., VMs, network links) that may span multiple providers and are allocated and used for a common purpose. A sliver is a typed virtual resource unit that is provisioned from a single aggregate and is named and managed independently of other slivers. Each sliver is bound to exactly one slice at the time that the sliver is created. Users may link slivers from multiple providers to form end-to-end environments (slices) for networked applications.
ExoGENI. One goal of CFlo is to extend ExoGENI [9, 7] to enable richer forms of cross-tenant interaction, including discretionary access control for slivers and cross-slice network peering. ExoGENI is a federation of xCAT/OpenStack cloud clusters on 20 campuses, linked by the Internet2 AL2S and ESnet network circuit fabrics. It supports elastic multi-cloud slices with private networks (VPCs constructed by stitching VLANs at layer 2) that may be tenant-managed via OpenFlow. It automates end-to-end assembly of the slice VPC dataplane across multiple providers. To do this, it provisions cross-cloud circuits on demand, bridging among circuit fabrics at exchange points (e.g., at Starlight) as needed. As of February 2017 ExoGENI has supported over 56,000 experiments/slices submitted by more than 1400 distinct users.
To integrate CFlo into ExoGENI, we must modify its control servers to invoke CFlo APIs in a local SAFE engine to check each action for compliance with a trust policy before executing it. Beyond enabling new functionality, CFlo can place the existing security and peering mechanisms in ExoGENI on a more uniform and extensible foundation.
Scope. The CFlo trust scripts implement the GENI trust core in logic. For this paper, we added script support for user groups, hierarchical names, and access control for cloud objects (e.g., ACLs for virtual network links), all modeled on AWS Identity and Access Management (AWS-IAM). We also added logic for authorization that takes place during VPC stitching in ExoGENI, and combined it with ACLs to enable network peering among tenant VPCs by mutual consent. This paper does not address how the underlying operations (VPCs, L2 stitching) are implemented and orchestrated; refer to [7, 20]. We also do not address resource discovery, resource brokering, or payment models. ExoGENI is based on our earlier work on these topics. Integrating these mechanisms with logical trust is future work.
2.1 Building with Trust Logic
Logical trust has a long history in the research community [2, 34, 23, 31, 38, 22, 3, 1, 46]. (See §5.) Like many logical trust systems, SAFE is a credentials-based PKI system. Each principal has a keypair to authenticate its requests and sign any credentials that it issues. Participants exchange security assertions and policy rules as semantically rich logic-based certificates, and run a local engine to generate proofs of policy compliance end-to-end. Certificates have a period of validity that is checked along with the signature on import or use: the prover sees only logic content that is fresh and authentic.
SAFE’s trust logic is based on Datalog, a rigorously defined and extensively studied general-purpose logic language that is a subset of Prolog, a popular language for logic programming with a standard syntax. It adds a modal operator says to Datalog, enabling its direct use as a logic of belief and attribution, following Binder, SD3, and SENDLOG.
Datalog content consists of atomic statements (atoms) and rules built up from atoms and the logical operators conjunction and implication. An atom is a predicate symbol applied to a list of parameters, which may be variables or term constants representing principals, objects, or values. Predicate symbols are user-defined: they may represent properties, attributes, roles, relationships, rights, powers, or permissions. Atoms whose parameters are term constants (ground) represent simple assertions equivalent to a row of a database table. Rules embody implication and may contain variables. A rule has a head and a body. The head of a rule is a single atom. The body is a sequence of atoms (goals) separated by commas, which indicate conjunction: all of the atoms in the body must be true for the rule to “fire”. A rule allows the prover to infer that the head is true for some substitution of its variables with constants, if the body is true under that substitution.
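To make the rule-firing semantics concrete, the following is a minimal sketch of naive bottom-up Datalog evaluation. It is an illustration only and not SAFE's prover: atoms are encoded as Python tuples, variables as strings beginning with `?`, and the predicates `owner`, `group`, `canRead`, and `canWrite` are hypothetical names chosen for the example.

```python
def is_var(term):
    """Variables are strings that start with '?' (an encoding chosen for this sketch)."""
    return isinstance(term, str) and term.startswith("?")

def match(atom, fact, binding):
    """Extend `binding` so that `atom` equals the ground `fact`, or return None."""
    if len(atom) != len(fact) or atom[0] != fact[0]:
        return None
    binding = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            # Bind the variable, or check consistency with an earlier binding.
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return binding

def substitute(atom, binding):
    """Replace bound variables in `atom` with their constants."""
    return tuple(binding.get(term, term) for term in atom)

def solve(body, facts, binding):
    """Yield every binding under which all goals in `body` are true (conjunction)."""
    if not body:
        yield binding
        return
    for fact in facts:
        extended = match(body[0], fact, binding)
        if extended is not None:
            yield from solve(body[1:], facts, extended)

def infer(facts, rules):
    """Fire rules repeatedly until no new atoms can be derived."""
    known = set(facts)
    while True:
        new = {substitute(head, b)
               for head, body in rules
               for b in solve(body, known, {})} - known
        if not new:
            return known
        known |= new

# Ground atoms: each is equivalent to a row of a database table.
facts = {("owner", "alice", "fileF"),
         ("group", "alice", "staff"),
         ("owner", "bob", "fileG")}
# Each rule is (head, body); the body goals are conjoined.
rules = [
    (("canRead", "?U", "?F"), [("owner", "?U", "?F")]),
    (("canWrite", "?U", "?F"), [("owner", "?U", "?F"), ("group", "?U", "staff")]),
]
derived = infer(facts, rules)
print(("canWrite", "alice", "fileF") in derived)  # True
print(("canWrite", "bob", "fileG") in derived)    # False: bob is not in staff
```

The second rule fires for alice only, because both goals in its body must hold under the same substitution for `?U`.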
In Datalog-with-says, every atom has a first (prefix) parameter representing a principal who says it (the speaker). If the parameter is omitted, it defaults to the current principal ($Self). In this way, a statement about a principal naturally represents a delegation or endorsement that is restricted by the speaker and predicate; another principal considers the statement only if it has a policy rule with a matching goal, conferring trust in the speaker.
Datalog-with-says is sufficiently powerful to represent common access control features such as hierarchical naming, nested groups, roles and other attribute assertions, ACLs, and capabilities. Delegations may be constrained by a predicate/role and by parameters (e.g., “Alice owns file F”). Conjunctive policy rules permit reasoning from multiple attributes of a principal or object, and policies are mobile: they may be passed in certificates.
SAFE defines conventions for self-certifying term constants (IDs) to name principals and objects. A principalID is a SHA hash of the principal’s public key, following SPKI/SDSI. All statements in a valid certificate must have a speaker ID that matches the issuer who signed the certificate. Each object named in a logic statement has some principal who is its controlling authority. The objectID consists of an identifier (a UUID/GUID) chosen by its authority, concatenated with the authority’s principalID to form a self-certifying identifier (scid). SAFE scripts use a builtin function rootID to obtain a scid’s controlling principalID. Self-certifying IDs ensure that parties have distinct names for their objects, and a malicious principal cannot “hijack” another’s names. In this way logical trust extends conventional identity-based PKI security to incorporate rich statements about principals, objects and their security attributes, and avoids the need for a global naming root.
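The ID conventions above can be sketched as follows. This is an illustration under stated assumptions rather than SAFE's actual encoding: the text says only "a SHA hash", so the choice of SHA-256, the hex encoding, the `:` separator, and the ordering of the two parts are all assumptions, and the function names are hypothetical.

```python
import hashlib
import uuid
from typing import Optional

def principal_id(public_key: bytes) -> str:
    """principalID = hash of the public key (SHA-256 and hex are assumptions)."""
    return hashlib.sha256(public_key).hexdigest()

def make_scid(authority_pid: str, guid: Optional[str] = None) -> str:
    """Build a self-certifying object ID from the authority's principalID and a GUID.

    The 'principalID:GUID' layout is an assumption made for this sketch.
    """
    guid = guid or str(uuid.uuid4())
    return f"{authority_pid}:{guid}"

def root_id(scid: str) -> str:
    """Recover the controlling principalID from a scid (mirrors the rootID builtin)."""
    return scid.split(":", 1)[0]

def speaker_matches_issuer(speaker_pid: str, issuer_public_key: bytes) -> bool:
    """A statement is admissible only if its speaker is the certificate's signer."""
    return speaker_pid == principal_id(issuer_public_key)

alice = principal_id(b"alice-public-key-bytes")
project = make_scid(alice, "project-42")
print(root_id(project) == alice)  # True
```

Because the controlling principal is embedded in the ID, another principal cannot mint an ID under alice's authority without her public key hashing to the same value.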
2.2 SAFE Logic Scripting
SAFE synthesizes elements from previous trust logic systems and extends them with additional system support to enable practical deployment. The novel elements of SAFE include a scripting language to insulate applications from logic concerns, and an interface to a shared key-value store (e.g., a DHT), which stores authenticated logic content as signed certificates in a native SAFE format. Certificates in the store are indexed by self-certifying links (tokens), and can be written only by their issuers. The application trust scripts contain parameterized logic templates to generate certificates easily, and also to link certificates to construct DAGs programmatically as a side effect of delegations.
This use of certificate linking simplifies discovery and retrieval of the content relevant to a trust decision. The certificate links (tokens) also enable pass-by-reference and caching of certificate content at the authorizers. The shared certificate store enables an issuer to update or revoke its certificates by their tokens, addressing common PKI concerns.
Scripting is organized around the abstraction of logic sets — sets of logic statements that represent credentials, delegations, endorsements, and policies. Scripts use templated constructors (defcon) to construct and modify sets and link them to form unions.
A principal may issue (post) its logic sets and share them by reference; posted sets are materialized as certificates spoken by the issuer and signed under its keypair. A posted set is accessible to any client that knows its token, but only its issuer can modify it. Scripts name their locally constructed sets with arbitrary string names (labels); the token is a SHA hash of the issuer ID and the label. Thus tokens are “unguessable”, but anyone who knows the label and the issuer’s public key can synthesize a set’s token. Some CFlo script actions (e.g., name resolution) obtain links in this way.
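A small sketch of the token scheme described above, assuming SHA-256 and a NUL byte separating the issuer ID from the label (the text specifies neither). The `CertStore` class is a hypothetical in-memory stand-in for the shared key-value store; signing and signature checks are omitted.

```python
import hashlib

def set_token(issuer_id: str, label: str) -> str:
    """Token = hash of issuer ID and label (hash and separator are assumptions)."""
    digest = hashlib.sha256()
    digest.update(issuer_id.encode("utf-8"))
    digest.update(b"\x00")  # separator keeps ("ab", "c") distinct from ("a", "bc")
    digest.update(label.encode("utf-8"))
    return digest.hexdigest()

class CertStore:
    """Hypothetical in-memory stand-in for the shared certificate store."""

    def __init__(self):
        self._sets = {}

    def post(self, issuer_id, label, statements):
        # The token is derived from the issuer ID, so one issuer cannot
        # overwrite a set posted by another issuer under the same label.
        token = set_token(issuer_id, label)
        self._sets[token] = (issuer_id, list(statements))
        return token

    def fetch(self, token):
        """Return the statements of a posted set; anyone who knows the token may read."""
        return self._sets[token][1]

store = CertStore()
token = store.post("alice-pid", "endorsements", ["fedUser(bob-pid)"])
# A party that knows the issuer and label can synthesize the same token.
print(token == set_token("alice-pid", "endorsements"))  # True
```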
SAFE guard scripts (defguard) combine linked sets to construct query contexts, and issue queries to check policy compliance for trust decisions. SAFE fetches a certificate when a guard references a logic set by its token. After validation SAFE extracts the semantic content of the certificate into a logic set cached in an in-memory set cache. The scripts deal only with the semantic content: the SAFE runtime encodes and decodes logic material, handles cryptographic operations, and performs fetch, retrieval, and caching automatically and transparently.
We assume that all CFlo participants run scripts with common logic/certificate templates, although they may install different policy rule sets. (This assumption assures interoperability, but it is not required for security.) Each participant’s TCB includes the interpreter and scripts, which are all under its direct control: authorization is naturally end-to-end.
3 Logical Cloud Federation
Cloud peering and multi-cloud models raise the question of how providers are qualified to serve users, and the degree of trust that users have in them. The community model raises the question of how providers authenticate consumers (users), qualify them for service, and hold them accountable for their actions in the cloud. A user’s privilege at a provider is based on membership and roles within organizations, relationships and agreements among organizations and providers, and community policies and provider policies for authorization and resource management. These affiliations, roles, relationships, and policies may be dynamic.
This trust information flows from organizational processes outside the scope of the trust system, but the system must capture it and reason about it. Key aspects of trust in federated systems reduce to choices about whose assertions to believe or whose commands to accept. Trust logic offers a formalism to represent these choices. This section presents examples from CFlo to illustrate this power and flexibility and to expose key issues and techniques for cloud federation. They also illustrate the role of linking to organize sets and certificates in SAFE; some links and labels are omitted or simplified for brevity. The figures on CFlo naming and groups (fig:CFLO-name-groups) and on GENI federation linking (fig:geni-fed-linking) illustrate some linking patterns relevant to this section.
Listing 1 shows how to generate a logic set from a template in a script. This rule defines a set constructor (defcon), which returns a logic set formed by substituting script variables in a template. Each item listed within the brackets is a logic statement with an application-defined predicate asserting an attribute for this user: the value of the ?User variable resolves to the user’s PrincipalID and is substituted in the template using the $ operator.
The set is materialized as a certificate signed by its issuer, the principal executing the script ($Self). If another principal (an authorizer) imports the certificate, its prover sees each statement within the set as spoken by the authenticated issuer. Any principal may issue an endorsement, but an authorizer considers them according to its policy rules to determine whether or not to accept any given statement based on the identity of its speaker.
In this case, policy rules reject such endorsements unless they are issued by a Member Authority service endorsed by a federation trust anchor, as specified by the policy in Listing 2. These statements are policy rules: the terms ?U, ?MA, and ?R are variables. These rules specify conditions to accept that a given principal is a registered user in the federation with the attribute fedUser and/or fedLeader. The first rule concludes that a ?U (whose value is the principalID of a user) is a member only if some principal ?MA says that ?U is a member and ?MA is locally accepted as a Member Authority (MA).
The other rules in Listing 2 have a similar structure. These rules are examples of attribute-based delegation: they accept statements based on the attributes of their speakers. The third rule says that a principal is accepted as an MA only if a fedRoot trust anchor says that it is an MA. A server’s operator may configure its accepted trust anchors by asserting them as fedRoot facts. Given a certificate from a configured root anchor endorsing an MA, and a certificate from the MA endorsing a user, these rules accept the user endorsement.
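The endorsement chain described above (configured root anchor, then Member Authority, then user) can be sketched directly in Python. This is an illustration and not the rule set of Listing 2: statements are modeled as `(speaker, predicate, subject)` triples, and the predicate name `memberAuthority` is a hypothetical stand-in, while `fedRoot`, `fedUser`, and `fedLeader` follow the text.

```python
def accepted_mas(statements, self_id):
    """Principals accepted as MAs: endorsed by a root that we ourselves configured."""
    roots = {subject for speaker, predicate, subject in statements
             if speaker == self_id and predicate == "fedRoot"}
    return {subject for speaker, predicate, subject in statements
            if predicate == "memberAuthority" and speaker in roots}

def has_attribute(statements, self_id, user, attribute):
    """Accept `attribute(user)` only if some accepted MA says it."""
    mas = accepted_mas(statements, self_id)
    return any(speaker in mas and predicate == attribute and subject == user
               for speaker, predicate, subject in statements)

statements = [
    ("server", "fedRoot", "geni-root"),            # local configuration fact
    ("geni-root", "memberAuthority", "ma-1"),      # root endorses an MA
    ("ma-1", "fedUser", "alice"),                  # MA endorses a user
    ("rogue-ma", "fedLeader", "mallory"),          # speaker is not an accepted MA
    ("mallory", "memberAuthority", "rogue-ma"),    # speaker is not a configured root
]
print(has_attribute(statements, "server", "alice", "fedUser"))      # True
print(has_attribute(statements, "server", "mallory", "fedLeader"))  # False
```

Anyone may issue an endorsement, as the last two statements show, but the authorizer accepts a statement only when its speaker carries the required attribute.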
This example shows how to establish authority services in a federation to certify users (and providers) and attest to their attributes. The MA bases its assertions on external information about the users, e.g., from a Web identity (SSO) protocol such as OAuth or Shibboleth/SAML. For example, GENI runs a portal service that harvests attributes about each academic user from a Shibboleth identity provider (IdP) at the user’s institution. Once logged in, the user may supply a profile and accept required conditions. If the user provides its key hash, the portal may issue endorsements to approve its principalID as a federation user (fedUser) or a research team leader (fedLeader) based on attributes supplied by the IdP (e.g., user is a faculty member).
In this example the participants accept authorities endorsed by a common trust anchor, but they might instead configure local policies for accepting authorities. They might select a locally accepted set, or subscribe to multiple root anchors.
3.1 Groups, Names, and Authority
It is often useful for participants to assert their own attributes about one another. For example, AWS-IAM provides a rich API for user-defined groups, a common basis for access control. In CFlo, any principal may declare a group as an object, and issue certificates granting ownership or membership in the group with named privileges or roles. Members may delegate their rights to others transitively using a capability model. CFlo uses a standard set of logic rules (not shown) to govern this delegation by checking endorsement chains similar to the rules in Listing 2. The rule set is linked to the group, and may be customized, e.g., to manage specific roles or privileges.
Listing 3 shows a constructor to create a group, and Listing 4 shows a simple constructor to grant membership in a group. When invoked with concrete IDs as parameters, these constructors return sets with logical assertions declaring the existence of the group, its owner and members, and its governing policy set. These sets may be posted as linked certificates, enabling other parties to query group memberships, e.g., to control access.
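The CFlo delegation rules for groups are not shown in the text, so the following is only a sketch of one plausible reading of the capability model: a grant counts if its speaker is the group owner or holds a delegatable membership obtained through a valid chain. The tuple layout `(speaker, group_id, member, delegatable)` and the function name are assumptions.

```python
def is_member(grants, group_owner, group_id, principal):
    """Check membership by following delegation chains rooted at the group owner.

    `grants` is a list of (speaker, group_id, member, delegatable) tuples.
    """
    can_delegate = {group_owner}   # principals whose grants are honored
    members = set()
    changed = True
    while changed:                 # iterate to a fixed point over the grant set
        changed = False
        for speaker, gid, member, delegatable in grants:
            if gid != group_id or speaker not in can_delegate:
                continue
            if member not in members:
                members.add(member)
                changed = True
            if delegatable and member not in can_delegate:
                can_delegate.add(member)
                changed = True
    return principal in members

grants = [
    ("owner", "g1", "alice", True),    # alice may delegate further
    ("alice", "g1", "bob", False),     # bob is a member but may not delegate
    ("bob", "g1", "carol", False),     # ignored: bob's membership is not delegatable
    ("mallory", "g1", "mallory", True) # ignored: mallory has no valid chain
]
print(is_member(grants, "owner", "g1", "bob"))    # True
print(is_member(grants, "owner", "g1", "carol"))  # False
```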
A principal may also issue a symbolic string name for any ID, specifying any object that it controls to serve as a parent context for the name (i.e., a directory). Listing 5 shows a constructor that generates a set with the name entry as a logic statement. When posted as a certificate, its token is hashed from the parentID and name string. A resolution procedure can synthesize this token and retrieve the set to look up the objectID by its name, given the parentID. The named object may itself act as a parent/directory for another component of a hierarchical pathname.
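A sketch of hierarchical resolution along these lines is shown below. It is not the CFlo resolution script: the store is a plain dictionary, the token is assumed to be a SHA-256 hash of the parent ID and component joined by `/`, and object IDs use an assumed `principalID:GUID` layout so that the controlling principal can be recovered from the ID.

```python
import hashlib

def root_id(scid):
    """Controlling principal of an object ID (assumed 'principalID:GUID' layout)."""
    return scid.split(":", 1)[0]

def name_token(parent_id, component):
    """Token for a name entry, hashed from the parent ID and the name string."""
    return hashlib.sha256(f"{parent_id}/{component}".encode("utf-8")).hexdigest()

def create_name_entry(store, issuer, parent_id, component, target_id):
    """Post a name entry; the caller must control the parent directory."""
    if root_id(parent_id) != issuer:
        raise PermissionError("caller must control the parent directory")
    store[name_token(parent_id, component)] = (issuer, target_id)

def resolve_name(store, root_dir, pathname):
    """Resolve a multi-component pathname relative to a caller-chosen root."""
    current = root_dir
    for component in pathname.strip("/").split("/"):
        entry = store.get(name_token(current, component))
        if entry is None:
            raise KeyError(f"no entry for {component!r}")
        issuer, target = entry
        if issuer != root_id(current):   # entry must be spoken by the directory's owner
            raise PermissionError("name entry not issued by directory owner")
        current = target                 # the target may belong to another principal
    return current

store = {}
create_name_entry(store, "alice", "alice:rootdir", "projects", "bob:dir1")
create_name_entry(store, "bob", "bob:dir1", "webnet", "bob:sliver9")
print(resolve_name(store, "alice:rootdir", "/projects/webnet"))  # bob:sliver9
```

The path crosses from alice's naming domain into bob's at the second component, which is how the name space becomes federated.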
These primitives enable a common namespace of groups and other objects that span principals (naming domains). Because the named object in Listing 5 may be controlled by a different principal, the name space is federated in a structure equivalent to DNSSEC. CFlo includes a script to resolve and certify hierarchical names relative to a root object chosen by the caller. To share a common name space, participants must choose a common root by some convention, as with DNSSEC. A federation may certify the naming authority as with the Member Authority example in Listing 2. A common naming root enables CFlo to create a name space equivalent to the URN conventions used in GENI, which relies on an external service, DNS.
In the same way, a federation may wish to designate authorities to control the creation of groups used for federation-mandated access policies, as opposed to user-defined policies. GENI takes this approach to manage an authoritative space of project groups to organize user activity. Creating a GENI-sanctioned project is an action reserved to a designated authority role—Project Authority (PA)—which serves requests to create projects and is endorsed by the federation root.
A PA restricts the creation of projects to qualified users—for example, users qualified as research team leaders as shown in Listing 1 by the fedLeader attribute. This is important because all user activity in GENI is associated with a project, and the project leader is accountable for that activity.
Listing 6 gives a simple example of a policy guard to enforce this restriction, including a simple policy rule set linked by a standard label. The PA server invokes the defguard action in Listing 6 on a request. The guard creates a set of statements, similarly to the constructor examples, and then issues an approveProject query against this set (the query context). The query is a guard condition: if it is provable, then the request is approved, else it is denied. The policy rule at line 2 concludes approveProject if the project owner (in this case the $Subject who issued the request) is a fedLeader, as governed by the rules in Listing 2.
The guard imports all of the needed rules through links to the policy set (line 8) and to the authorizer’s AnchorSet (line 7), its set of configured facts (e.g., trust anchors) and rules, including the rules in Listing 2. It also imports a standard BearerRef variable, which resolves to a token that must be passed by the requester. For example, if the requester passes a link to the certificate issued in Listing 1 (or any set that links to it), then the guard fetches the MA’s endorsement assertions into the context. These in turn link to the root’s certification of the MA, enabling the query to succeed. This example illustrates the power of certificate linking in assembling a query context for a guard. The figures on CFlo naming and groups (fig:CFLO-name-groups) and on GENI federation linking (fig:geni-fed-linking) illustrate linking patterns for CFlo.
3.2 Resource Access
To determine whether or not to approve a given request for resources, a provider policy may consider the purpose and authority of the request as well as the identity and attributes of the requester. In GENI, every request for resources (i.e., a sliver) is linked to a slice, and every slice is linked to a project. The project and slice are objects that may have arbitrary attributes associated with them (e.g., high priority, top secret) by their controlling authorities.
Since a provider’s policies may use these attributes to govern resource access and accounting, the provider must accept the authorities that certify them, e.g., they must be federation-approved like the MA in Listing 2 and the PA in Listing 6. GENI defines a third authority role (Slice Authority, SA) to approve creation of slices. As with all of the authority types, there may be many SAs in the federation. Providers may choose which authorities to accept, and they may consider attributes of the authorities as well in their policy decisions.
Listing 7 shows an exemplary policy rule used by an SA guard to create a slice. To approve a request, the SA must be convinced that it is associated with a valid project group (line 5) approved by an eligible PA (line 4), and that the subject has permission within the project to bind a slice to it according to the project policies (line 6).
Listing 7 illustrates the use of the rootID builtin to obtain the ID of the controlling principal (the PA) for the project from its object ID. Like all objects, the project ID is named by a self-certifying identifier that incorporates the principal ID. Listing 7 also illustrates how a policy rule can delegate policy control to rules issued by another principal (the PA) and evaluated locally (policy mobility). The PA’s policy rules for the project group are spoken by the PA and linked from the project set (Listing 6, line 8); the guard fetches them when it pulls the closure of the requester’s BearerRef: the BearerRef links to its membership certificate (Listing 4), which links to the policy set at line 4. Trust logic enables an authorizer (the SA) to evaluate policy rules spoken by another party (the PA) to determine if that party “says or believes” that the request is valid according to its own policies (lines 5-6 of Listing 7); attribution is sound across inference. Of course, the authorizer may add restrictions of its own.
Similarly, Listing 8 shows an exemplary policy used by a provider as a condition to authorize a caller to control a slice, e.g., to approve a resource request for the slice. The slice is also associated with its own group whose members have various roles in the slice, and may obtain these privileges through group delegations according to the policy of the controlling principal—the slice’s SA. As with all groups, the members may have been endorsed by different MAs in this federated system, e.g., they may be associated with different institutions.
Once a request for cloud resources is authorized, a provider may limit, delay, or reject the request based on a separate resource allocation policy. This policy may consider arbitrary attributes of the user identity, slice, or project, and/or attributes of their approving authorities. For example, a simple policy might be to treat projects as a unit of accounting, analogous to accounts in AWS.
3.3 Protected Objects and ACLs
Cloud services enable their users to assemble sets of virtual resources, including VMs, images, storage buckets, and network links—slivers. Advanced cloud systems like AWS enable account owners to control access to these resources for users within their accounts. (AWS accounts are identity domains and also are similar to projects in that all slivers are linked to an account.) AWS-IAM allows account owners to organize their objects within a hierarchical name space, manage groups of users, and attach policies to groups of users and objects (e.g., objects with common name prefixes) governing access on the basis of user and group identities.
While AWS is controlled by a single provider, the group and naming mechanisms outlined above are sufficiently powerful to extend these features to a federated system. CFlo enables users to manage their own groups and control access to their objects on the basis of those groups or groups created by others. The objects and groups may originate anywhere within the system. What is needed is to add fine-grained ACLs to objects, including slivers.
GENI bases access to a sliver on a requester’s role in the containing slice. Listing 9 gives an example of a guard rule for control of a sliver under the GENI model. It simply checks that the requester has control privilege in the sliver’s slice under the policy of the controlling SA. The structure is similar to Listing 8, but it illustrates the association of the sliver with its slice (line 3). CFlo asserts a sliverOf statement in a set when a sliver is created (see below), along with a name and other attributes. The sliver set links to a name entry, an ACL set, and the containing slice; the closure of all of these sets is fetched into the context for guard operations involving the sliver.
An ACL is a logic set containing a list of policy rules each stating that a specified identity or group (or a conjunction/intersection of groups) has access to the protected object. Listing 10 shows how CFlo checks access to a sliver according to its ACL, by querying an access condition (sliverPriv at line 5). Note that the sliver may be associated with a different provider: the rule identifies the provider (line 3), validates it as a qualifying peer (e.g., endorsed by a common root anchor), and checks sliverPriv access according to its policy. The ACL is a set of rules to infer sliverPriv. This access is based on any rules installed in the ACL set, or control over the containing slice. Rules are added to the ACL by a guarded operation that requires control over the sliver.
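The access condition can be illustrated outside the logic language. The sketch below is an assumption-laden simplification rather than CFlo's rule set: group membership is flattened into plain sets (no delegation chains or certificates), and `has_sliver_priv` is a hypothetical name. Each ACL entry is a conjunction of groups, the ACL as a whole is a disjunction of entries, and control over the containing slice grants access independently.

```python
from typing import Dict, List, Set

Groups = Dict[str, Set[str]]  # group name -> member principal IDs (flattened)


def has_sliver_priv(
    requester: str,
    acl: List[List[str]],
    groups: Groups,
    slice_controllers: Set[str],
) -> bool:
    """Grant access if any ACL entry matches or the requester controls the slice."""
    for entry in acl:
        # An entry is a conjunction: the requester must belong to every group in it.
        # Empty entries are ignored so they cannot grant access to everyone.
        if entry and all(requester in groups.get(g, set()) for g in entry):
            return True
    return requester in slice_controllers


groups = {"ops": {"alice", "bob"}, "netlab": {"alice"}, "guests": {"carol"}}
acl = [["ops", "netlab"], ["guests"]]  # (ops AND netlab) OR guests
controllers = {"dave"}
```

Here `alice` matches the first entry, `carol` the second, `dave` is admitted through slice control, and `bob` is refused because he is in `ops` but not `netlab`.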
3.4 Stitched Interconnection
While it may seem odd to operate on slivers across provider boundaries in Listing 10, CFlo uses this to authorize stitching operations on cross-aggregate network links (dynamic circuits). ExoGENI defines a sliver type called stitchport to represent a logical network endpoint that is stitchable at an adjacent switch. Abstractly, a stitchport occupies some tag that is unique among the endpoints in a network zone or location controlled by a single provider. An endpoint of a network link, whether a locally attached circuit or a slice dataplane network (VPC), is assigned a VLAN tag that is unique within the containing provider’s network, which may be zoned for scaling. VLANs may be stitched to node slivers (VMs), or to other VLANs (with tag translation). Cross-provider stitching occurs at zone borders. Example code and explanation are omitted for space.
A key element of this scenario is that adjacency implies trust among the adjacent providers, who are cooperating to establish a virtual network spanning providers within a federation. As with examples above, the provider who executes the operation is trusted to respect and enforce the federation policy. In general, providers control their own domains, and any compliance with external policies is inherently voluntary. Participants respect these policies because they agreed to do so as a condition of their cooperation. The federation trust structure and root endorsements ensure that they do not expose themselves to other parties who are not trusted to respect rules within the federation.
4 Evaluation
We evaluate logical federation by running representative workloads on a cluster of SAFE instances loaded with CFlo trust scripts for cloud federation. Each SAFE instance is a Scala process serving a REST API to invoke its trust scripts. For these experiments, we evaluated the cost of logical trust with a multi-threaded load generator process that invokes the CFlo trust scripts directly according to synthetic request mixes designed to demonstrate and stress specific functions and behaviors in a federated cloud. The SAFE engine and scripts handle all certificate generation, validation, and logical policy compliance checking needed to implement these functions. The point is to show that these trust functions for a federated cloud can be implemented compactly using scripted logical trust (about 600 lines), and that the resulting implementation is fast enough to use in practice.
In a real deployment, each individual cloud site manager and each control server (e.g., an authority for slices or projects) is a server that possesses an RSA keypair (it is a principal) and runs a private SAFE engine as a local companion process. Each server uses the REST API to invoke its trust scripts in its local engine through a protected socket. Each server trusts its local engine and scripts to fetch and validate relevant certificates for its clients, to perform all access checks for its policy, and to generate certificates with its keypair and post them as needed, as programmed in its CFlo scripts.
We measure the client-perceived end-to-end latency for canned sequences of operations that implement basic cloud functions as described above. For example, a user U1 creates a project and delegates membership to U2, who requests to create a slice in the project, and then populates the slice with resources (e.g., VMs), perhaps provisioned from multiple sites and linked together in various ways. We measure the combined costs for all trust-related functions needed for these sequences: all round-trip script calls, certificate handling, posting/sharing certificates through the shared certificate store, script interpreter costs, and logic query prover/inference costs. We exclude costs for any actual manipulation of cloud resources (e.g., virtual machine provisioning) that would occur after request authorization is complete: those operations are implemented at a different layer (e.g., ExoGENI/OpenStack), and their costs are independent of the logical trust architecture.
For these experiments we serve the SAFE/CFlo calls of multiple participating principals on the same engine. In this way we measure the throughput that each engine can achieve under a heavy logic service mix that is representative of all the trust-related functions for a federated cloud. In a real deployment these costs are spread across many servers (e.g., one per principal) in parallel: capacity scales with the size of the federation. The system’s only fundamental scaling bottleneck is the underlying certificate store, a scalable key-value store. However, our bundling approach results in higher hit ratios in the logic set cache than the cloud servers would see in practice, reducing costs for fetches and signature checking.
Each SAFE/CFlo engine instance runs on a four-core KVM (Intel Xeon CPU E5520 @ 2.27GHz) with 12 GB of RAM and 1Gb/s Ethernet. One-way network delay between two instances is 0.46 ms. The certificate store runs on five similar VMs running Riak 2.1.4  with a replication degree and , . Each posted logic set is materialized as a certificate with a 2048-bit RSA signature. Tokens and principal IDs are self-certifying 256-bit SHA hashes (44-byte base64-encoded). The logic payloads of certificates range from 467-840 Unicode characters. All keypairs are pre-generated.
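The 44-byte figure follows from the encoding arithmetic: a 256-bit digest is 32 bytes, and base64 emits 4 characters per 3-byte group, so 4 × ⌈32/3⌉ = 44 characters. The sketch below checks this. It assumes SHA-256 over some serialized key and the URL-safe base64 alphabet; the text states only "256-bit SHA hashes", so the exact hash input and alphabet used by SAFE are assumptions here, and `self_certifying_id` is a hypothetical name.

```python
import base64
import hashlib


def self_certifying_id(public_key_bytes: bytes) -> str:
    """Hash a serialized public key and base64-encode the digest."""
    digest = hashlib.sha256(public_key_bytes).digest()       # 32 bytes = 256 bits
    return base64.urlsafe_b64encode(digest).decode("ascii")  # 4 * ceil(32 / 3) = 44 chars


# Placeholder bytes stand in for a real serialized RSA public key.
example_id = self_certifying_id(b"placeholder public key bytes")
```

Anyone holding the key can recompute the hash and compare it with the ID, which is what makes the identifier self-certifying.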
For these experiments, we created a synthetic federation with a root, ten cloud providers, and two authorities of each type (MA, PA, SA). We created 20K federation users, 10K of whom are team leaders, 10K projects (one for each leader), 5000 slices from users who have delegated membership in randomly selected projects, and tens of thousands of slivers. For the stitching experiments we created slices spanning all providers, and linked their dataplanes in rings while creating additional slivers.
Note that the cost for each request depends only on the number of certificates linked into the logic sets that are relevant to it, and the complexity of the CFlo policy applied to them: the certificate linking abstraction enables a server to identify and fetch the relevant certificates as needed. In particular, the cost for a request scales with the number of principals involved in that request—for example, the length of the delegation chain for a slice permission (capability). The load generator selects principals and objects randomly for each request, so the scale of the system—the total number of principals and objects and the number of participating cloud servers— influences only the effectiveness of the logic caches in each server, and not the processing costs for each request. The cost per principal or per request of using trust logic is the same at 100K principals/sites or a million principals/sites.
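The scaling argument can be made concrete with a toy model of certificate linking. This is a sketch under stated assumptions, not SAFE's fetch logic: the store is an in-memory dictionary mapping a token to the tokens it links to, the cache is a set of tokens, and `fetch_closure` is a hypothetical name. The function pulls the closure of one root and counts simulated round trips to the store.

```python
from collections import deque
from typing import Dict, List, Set

Store = Dict[str, List[str]]  # token -> tokens of the logic sets it links to


def fetch_closure(root: str, store: Store, cache: Set[str]) -> int:
    """Visit every logic set reachable from root; return the number of store fetches."""
    fetches = 0
    seen = {root}
    queue = deque([root])
    while queue:
        token = queue.popleft()
        if token not in cache:
            fetches += 1      # simulated round trip to the certificate store
            cache.add(token)
        # Links are read from the fetched (or cached) copy of the set.
        for linked in store.get(token, []):
            if linked not in seen:
                seen.add(linked)
                queue.append(linked)
    return fetches


# Hypothetical chain: BearerRef -> membership certificate -> project policy set.
store: Store = {
    "bearerRef": ["membership"],
    "membership": ["projectPolicy"],
    "projectPolicy": [],
}
store.update({f"unrelated-{i}": [] for i in range(1000)})  # never touched

cache: Set[str] = set()
first = fetch_closure("bearerRef", store, cache)
second = fetch_closure("bearerRef", store, cache)
```

The first request fetches only the three linked sets regardless of the thousand unrelated entries, and the repeat is served entirely from the cache, which mirrors the claim that system scale affects cache effectiveness rather than per-request work.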
Table 2: Standard operations. Columns: # CFlo | # CFlo | Latency | Throughput.
Table 2 lists standard operations with their 95th-percentile latencies and peak throughputs on a single 4-core SAFE instance. Each high-level operation is implemented as a sequence of underlying CFlo API calls, including those shown in Table 1. The load generator is multi-threaded, so the sequences are interleaved. Figure 1 shows the distribution of latencies at both granularities: complete sequences, and individual primitive operations within the sequences. The results reflect latencies in the tens of milliseconds to issue certificates due to signing and posting costs, and much lower costs for the more common verify operations due to caching and other factors, as expected. Fetch latencies due to cache misses are visible in the latency distributions.
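For readers reproducing such measurements, the two reported metrics can be computed as follows. This is a sketch, assuming the reported 95% latency is a 95th-percentile value computed with the nearest-rank method; the paper does not state which percentile definition it uses, and the function names are illustrative.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def throughput(completed_ops: int, elapsed_seconds: float) -> float:
    """Operations completed per second over a measurement window."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return completed_ops / elapsed_seconds


# Synthetic latencies in milliseconds, for illustration only.
latencies_ms = [float(i) for i in range(1, 101)]
p95 = percentile(latencies_ms, 95)
```

Tail percentiles are preferred over means here because occasional cache misses add fetch latency to a minority of requests.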
We also performed experiments with multiple SAFE instances in which each principal is assigned randomly to an instance, which performs all operations requested by that principal. This shows that the code can run in a fully distributed deployment, but the results do not add much insight. They show additional costs to fetch and share certificates through the shared store; this cost is sublinear in the number of certificates involved in each request, and is determined by access latency to the store. The fetches are partially parallelized according to the structure of the linked DAG.
These results show that operation costs for logical trust are practical for real deployments. The logical trust model is flexible and can represent a wide range of trust delegations and access policies concisely. It makes it possible to build and operate complex federations—and other multi-domain applications—with a small amount of “extra” code to capture trust concerns.
For these typical operations and scenarios in the cloud federation example, SAFE identifies and retrieves a tightly bounded superset of relevant certificates for each trust decision automatically, and the cost of compliance checks is linear with proof length. However, more complex policies may show higher costs, particularly for disjunctive policies (complex ACLs, cross-federation with multiple trust anchors). The multiple branches force the prover to search each branch looking for a proof. For example, a user request for access may search a long list of groups in an ACL, looking for one that includes the requester.
To illustrate this concern and focus on the cost of the logical reasoning itself, the figure on inference time (fig:safe-time) shows the logical inference cost for access checks against a list of groups in an ACL, as a function of the length of the ACL list and the depth of delegation of the user’s membership in a single group in the list. Costs grow with the number of disjunctions (ACL length), as well as the cost to traverse the group delegation chain to form the proof of access. This delegation cost is linear in SAFE due to the use of a secondary index.
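A toy cost model shows why both factors matter. It is an illustration only and does not reproduce SAFE's prover: it assumes a hypothetical secondary index keyed by (group, delegatee) that returns the delegator in one lookup, and it counts lookups as a stand-in for inference cost. All names are invented for the sketch.

```python
from typing import Dict, List, Tuple

# Hypothetical secondary index: (group, delegatee) -> delegator who granted membership.
DelegationIndex = Dict[Tuple[str, str], str]


def walk_chain(group: str, owner: str, user: str, index: DelegationIndex) -> Tuple[bool, int]:
    """Follow delegations from user back to the group owner; return (found, lookups)."""
    lookups = 0
    current = user
    seen = set()
    while current != owner:
        if current in seen:          # cycle guard
            return False, lookups
        seen.add(current)
        lookups += 1
        delegator = index.get((group, current))
        if delegator is None:
            return False, lookups
        current = delegator
    return True, lookups


def check_acl(acl: List[str], owners: Dict[str, str], user: str,
              index: DelegationIndex) -> Tuple[bool, int]:
    """Try each group in ACL order (a disjunction); return (granted, total lookups)."""
    total = 0
    for group in acl:
        found, lookups = walk_chain(group, owners[group], user, index)
        total += lookups
        if found:
            return True, total
    return False, total


def build_case(acl_len: int, depth: int, user: str = "alice"):
    """ACL of acl_len groups; user is a member of the last one via `depth` delegations."""
    acl = [f"g{i}" for i in range(acl_len)]
    owners = {g: f"owner-{g}" for g in acl}
    target = acl[-1]
    chain = [owners[target]] + [f"u{k}" for k in range(depth - 1)] + [user]
    index = {(target, delegatee): delegator
             for delegator, delegatee in zip(chain, chain[1:])}
    return acl, owners, index


acl, owners, index = build_case(acl_len=10, depth=5)
granted, cost = check_acl(acl, owners, "alice", index)
```

In this model each non-matching group costs one failed lookup and the matching group costs one lookup per delegation step, so the total is (ACL length − 1) + depth when the match is last in the list.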
Overall, the results suggest that logical trust is cheap in the common case, because the certificate linking structures constructed by the CFlo scripts focus the prover on relevant logic content and prune out extraneous statements. Cost grows with the complexity of the policies, but we pay only for the policy complexity that we use.
5 Related Work
PlanetLab. The PlanetLab network testbed is an early example of a distributed cloud. The terms slice and sliver, and our exemplary model of slice-grained access control and signed capability-based delegations for projects and slices as used in GENI, are derived from PlanetLab. We show how to implement these (and many other trust features that go beyond PlanetLab) in a unified and flexible logical system that can also capture a wide range of alternatives summarized below.
Cloud federation standards. The OGF Open Cloud Computing Interface (OCCI) is a standard API for cloud services. An IEEE working group is developing standards for cloud peering (Intercloud Interoperability and Federation, IEEE P2302), supported by an Intercloud Testbed Initiative. Papers summarizing the effort and its trust architecture include [12, 13]. Briefly, the effort proposes federated identity management that encompasses the providers, with common trust anchors (e.g., cloud exchanges) certifying the providers (similar to the trust structure in this paper), and provider groups (trust zones) that reflect varying levels of trust of the providers. It raises the problem of how to incorporate dynamic trust into certificates issued by the anchors; we show how to solve that problem.
FIRE. The EU-FIRE federation architecture plans a similar certification of identity providers and brokering services from a federation trust anchor, and rules-based authorization by participating providers. BonFIRE uses a similar structure and supports OCCI.
Grid. The evolution of security architecture for grid computing reflects similar concerns and choices. For example, many deployed grids today bridge web single sign-on (SSO) identity services such as Shibboleth to a PKI-based certificate system for hands-free user control; examples include recent versions of MyProxy, the Short-Lived Credential Service portal (SLCS), and several others. The GENI Member Authority (MA) is similar to these; they are also known as identity brokers. Many grid systems employ a service called the Virtual Organization Management Service (VOMS) to manage user membership in Virtual Organizations (VOs), which are groupings of principals spanning multiple identity domains. The VOMS issues credentials as X.509 attribute certificates signed under its own keypair and binding a user’s public key to one or more roles scoped to a named VO. VOs are similar to groups or projects in this paper.
Logical trust. Trust/authorization logic [2, 34, 23, 31, 38, 22, 3, 1, 46] is a unifying formalism that can capture these attribute-based mechanisms and policies declaratively and concisely, minimizing the need for custom software, formats, and protocols to implement each design choice. The contributions of this paper (and of SAFE), namely generalized certificate linking with a common certificate store, programmable scripting, and layered cloud federation, are independent of the trust logic in use. We prefer to use a standard logic (Datalog) to balance expressive power, tractability, and accessibility for practical use. In fact, Datalog-with-says is provably the most expressive tractable logic for trust: other logics are either less powerful and lack essential features such as conjunction (SPKI/SDSI [23, 36]) or objects (RT0), or are merely syntactic variants of Datalog, or else are intractable and are therefore (in our view) not suited to practical use. One contribution of this paper is to show that Datalog is sufficient to represent cloud federation needs without these more complex logics.
Grid-inspired research has yielded several PKI-based trust systems that are logical in that they combine roles and delegations with some form of declarative policy. Examples include the PERMIS [18, 17] system used in European grid initiatives. These systems generally follow the approach pioneered by SPKI/SDSI, but they introduce custom policy languages. Most recently, FLANC has been proposed as a custom logic for software-defined network exchanges (SDX), but it is no more powerful than Datalog with constraints, or else it is intractable.
GENI-ABAC. GENI uses custom certificate formats and custom validation code to implement its trust model, but alternative support for logical trust exists based on the ABAC software from USC-ISI, which is based on the RT family of logics. We contributed substantially to the GENI-ABAC design. However, GENI abandoned logical trust in favor of more ad hoc approaches for reasons of expediency in the face of various practical concerns: difficulty in identifying relevant credentials and passing them, difficulty in integrating with established software in multiple languages, and lack of expressiveness. This paper shows how to address these practical concerns via certificate linking, passing certificates by reference through a shared repository, certificate caching, decoupling of logic concerns from the application into trust scripts, integration of SAFE as a local process that interprets trust scripts and is accessed through a REST API, and use of a Datalog-complete trust logic with a standard syntax and a lightweight service-oriented implementation.
6 Conclusion
SAFE is a trust management system that uses a trust logic to represent policies, endorsements, and delegations. SAFE supports semantically rich certificates and a logic-based authorization engine implemented in a comprehensive framework that materializes logic sets as certificates and stores them as linked DAGs in a common key-value store. CFlo uses SAFE to implement the GENI trust and naming model, and extends it to support richer access control, cross-slice peering, and federation peering.
Trust logic is useful as a specification tool for federated cloud architecture, independent of the implementation. With SAFE, logical trust also enables a practical and concise implementation using declarative policy. This enables deployments to use a wide range of trust structures and policies specified in declarative logic, using the same software base. The policies and trust structure may evolve over time without modifying the software.
References
- [1] M. Abadi. Variations in access control logic. In Proceedings of the 9th international conference on Deontic Logic in Computer Science, DEON ’08, pages 96–109, Berlin, Heidelberg, 2008. Springer-Verlag.
- [2] M. Abadi, M. Burrows, B. Lampson, and G. Plotkin. A calculus for access control in distributed systems. ACM Transactions on Programming Languages and Systems (TOPLAS), 15(4):706–734, Sept. 1993.
- [3] M. Abadi and B. T. Loo. Towards a declarative language and system for secure networking. In Proceedings of the 3rd USENIX International Workshop on Networking Meets Databases, NETDB’07, pages 2:1–2:6, Berkeley, CA, USA, 2007. USENIX Association.
- [4] R. Alfieri, R. Cecchini, V. Ciaschini, L. dell’Agnello, Á. Frohner, A. Gianoli, K. Lõrentey, and F. Spataro. VOMS, an Authorization System for Virtual Organizations. In F. Fernández Rivera, M. Bubak, A. Gómez Tato, and R. Doallo, editors, Grid Computing, volume 2970 of Lecture Notes in Computer Science, pages 33–40. Springer Berlin / Heidelberg, 2004.
- [5] Amazon Web Services, Inc. Amazon identity and access management (IAM). https://aws.amazon.com/iam/.
- [6] L. Badger, D. Bernstein, R. Bohn, F. de Vaulx, M. Hogan, M. Iorga, J. Mao, J. Messina, K. Mills, E. Simmon, A. Sokol, J. Tong, F. Whiteside, and D. Leaf. US Government Cloud Computing Technology Roadmap Volume I. Useful Information for Cloud Adopters. NIST Cloud Computing Program Information Technology Laboratory, Oct 2014.
- [7] I. Baldin, C. Castillo, J. Chase, V. Orlikowski, Y. Xin, C. Heermann, A. Mandal, P. Ruth, and J. Mills. ExoGENI: A Multi-Domain Infrastructure-as-a-Service Testbed. In GENI: Prototype of the Next Internet. Springer-Verlag, 2016.
- [8] I. Baldine, Y. Xin, A. Mandal, C. Heermann, J. Chase, V. Marupadi, A. Yumerefendi, and D. Irwin. Autonomic Cloud Network Orchestration: A GENI Perspective. In 2nd International Workshop on Management of Emerging Networks and Services (IEEE MENS ’10), Co-Located with GLOBECOM’10, Dec. 2010.
- [9] I. Baldine, Y. Xin, A. Mandal, P. Ruth, A. Yumerefendi, and J. Chase. ExoGENI: A multi-domain infrastructure-as-a-service testbed. In TridentCom: International Conference on Testbeds and Research Infrastructures for the Development of Networks and Communities, June 2012.
- [10] J. Basney, M. Humphrey, and V. Welch. The MyProxy online credential repository. Software Practice and Experience, 35(9):801–816, July 2005.
- [11] M. Berman, J. S. Chase, L. Landweber, A. Nakao, M. Ott, D. Raychaudhuri, R. Ricci, and I. Seskar. GENI: A federated testbed for innovative network experiments. Computer Networks, 61(0):5 – 23, 2014. Special issue on Future Internet Testbeds – Part I.
- [12] D. Bernstein and D. Vij. Intercloud security considerations. In Second International Conference on Cloud Computing Technology and Science (IEEE CloudCom), pages 537–544. IEEE, 2010.
- [13] D. Bernstein and D. Vij. Intercloud exchanges and roots: topology and trust blueprint. In Proc. of 11th International Conference on Internet Computing, pages 135–141, 2011.
- [14] M. Brinn, N. Bastin, A. Bavier, M. Berman, J. Chase, and R. Ricci. Trust as the foundation of resource exchange in GENI. In Proceedings of the 10th EAI International Conference on Testbeds and Research Infrastructures for the Development of Networks and Communities (TridentCom). European Alliance for Innovation, June 2015.
- [15] Q. Cao, V. Thummala, J. S. Chase, Y. Yao, and B. Xie. Certificate Linking and Caching for Logical Trust. http://arxiv.org/abs/1701.06562, 2016. Duke University Technical Report.
- [16] S. Ceri, G. Gottlob, and L. Tanca. What You Always Wanted to Know About Datalog (And Never Dared to Ask). IEEE Transactions on Knowledge and Data Engineering, 1(1):146–166, 1989.
- [17] D. Chadwick, G. Zhao, S. Otenko, R. Laborde, L. Su, and T. A. Nguyen. PERMIS: a modular authorization infrastructure. Concurrency and Computation: Practice and Experience, 20(11):1341–1357, August 2008.
- [18] D. W. Chadwick and A. Otenko. The PERMIS X.509 role based privilege management infrastructure. In Proceedings of the Seventh ACM Symposium on Access Control Models and Technologies (SACMAT), pages 135–140, June 2002.
- [19] D. W. Chadwick, K. Siu, C. Lee, Y. Fouillat, and D. Germonville. Adding federated identity management to OpenStack. Journal of Grid Computing, 12(1):3–27, 2014.
- [20] J. Chase and I. Baldin. A Retrospective on ORCA: Open Resource Control Architecture. In GENI: Prototype of the Next Internet. Springer-Verlag, 2016.
- [21] Cisco Systems. Cisco Intercloud Fabric: Hybrid Cloud with Choice, Consistency, Control and Compliance.
- [22] J. DeTreville. Binder, A Logic-Based Security Language. In IEEE Symposium on Security and Privacy, pages 105–113. IEEE, May 2002.
- [23] C. Ellison, B. Frantz, B. Lampson, R. Rivest, B. Thomas, and T. Ylonen. SPKI Certificate Theory. RFC 2693 (Experimental), September 1999.
- [24] T. Faber, S. Schwab, and J. Wroclawski. Authorization and Access Control: ABAC. In GENI: Prototype of the Next Internet. Springer-Verlag, 2016.
- [25] I. Foster, C. Kesselman, and S. Tuecke. The anatomy of the grid: Enabling scalable virtual organizations. Int. J. High Perform. Comput. Appl., 15(3):200–222, 2001.
- [26] Y. Fu, J. Chase, B. Chun, S. Schwab, and A. Vahdat. SHARP: An Architecture for Secure Resource Peering. In Proceedings of the 19th ACM Symposium on Operating System Principles, October 2003.
- [27] A. Gupta, N. Feamster, and L. Vanbever. Authorizing network control at software defined internet exchange points. In Proceedings of the 2016 Symposium on SDN Research, SOSR ’16, 2016.
- [28] J. Howell and D. Kotz. End-to-end authorization. In Proceedings of the 4th Conference on Symposium on Operating System Design & Implementation - Volume 4, OSDI’00, pages 11–11, Berkeley, CA, USA, 2000. USENIX Association.
- [29] T. P. Hughes. Networks of power: electrification in Western society, 1880-1930. JHU Press, 1993.
- [30] IEEE. Intercloud Testbed Initiative. http://www.intercloudtestbed.org/.
- [31] T. Jim. SD3: A trust management system with certified evaluation. In IEEE Symposium on Security and Privacy, pages 106–115. IEEE, May 2001.
- [32] J. Jofre, C. Velayos, G. Landi, M. Giertych, A. C. Hume, G. Francis, and A. V. Oton. Federation of the bonfire multi-cloud infrastructure with networking facilities. Computer Networks, 61:184–196, 2014.
- [33] K. Keahey, M. Tsugawa, A. Matsunaga, and J. A. Fortes. Sky computing. Internet Computing, IEEE, 13(5):43–51, 2009.
- [34] B. Lampson, M. Abadi, M. Burrows, and E. Wobber. Authentication in distributed systems: Theory and practice. ACM Transactions on Computer Systems, 10(4):265–310, Nov. 1992.
- [35] B. Lang, I. Foster, F. Siebenlist, R. Ananthakrishnan, and T. Freeman. A flexible attribute based access control method for grid computing. Journal of Grid Computing, 7:169–180, November 2008. 10.1007/s10723-008-9112-1.
- [36] N. Li and C. Mitchell. Understanding SPKI/SDSI using first-order logic. International Journal of Information Security, 5(1):48–64, January 2006.
- [37] N. Li and J. C. Mitchell. Datalog with Constraints: A Foundation for Trust Management Languages. In Proceedings of the 5th International Symposium on Practical Aspects of Declarative Languages, PADL ’03, pages 58–73, 2003.
- [38] N. Li, J. C. Mitchell, and W. H. Winsborough. Design of a role-based trust-management framework. In Proceedings of the 2002 IEEE Symposium on Security and Privacy, SP ’02, pages 114–, Washington, DC, USA, 2002. IEEE Computer Society.
- [39] P. Mell and T. Grance. The NIST Definition of Cloud Computing. Special Publication 800-145, Recommendations of the National Institute of Standards and Technology, September 2011.
- [40] R. L. Morgan, S. Cantor, S. Carmody, W. Hoehn, and K. Klingenstein. Federated Security: The Shibboleth Approach. EDUCAUSE Quarterly, 27:12–17, 2004.
- [41] Open Grid Forum. Open Cloud Computing Interface (OCCI). http://occi-wg.org/.
- [42] L. Peterson, A. Bavier, M. E. Fiuczynski, and S. Muir. Experiences Building PlanetLab. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation (OSDI), November 2006.
- [43] Riak key value store. http://docs.basho.com/riak/, 2016.
- [44] B. Rochwerger, D. Breitgand, E. Levy, A. Galis, K. Nagin, I. M. Llorente, R. Montero, Y. Wolfsthal, E. Elmroth, J. Caceres, et al. The Reservoir model and architecture for open federated cloud computing. IBM Journal of Research and Development, 53(4):4–1, 2009.
- [45] B. Rochwerger, D. Breitgand, E. Levy, A. Maraschini, P. Massonet, H. Muñoz, G. Toffetti, A. Epstein, D. Hadas, I. Loy, et al. Reservoir-when one cloud is not enough. Computer, 44(3):44–51, 2011.
- [46] F. B. Schneider, K. Walsh, and E. G. Sirer. Nexus authorization logic (NAL): Design rationale and applications. ACM Transactions on Information Systems Security, 14(1):8:1–8:28, June 2011.
- [47] W. Vandenberghe, B. Vermeulen, P. Demeester, A. Willner, S. Papavassiliou, A. Gavras, M. Sioutis, A. Quereilhac, Y. Al-Hazmi, F. Lobillo, et al. Architecture for the heterogeneous federation of future internet experimentation facilities. In Future Network and Mobile Summit (FutureNetworkSummit), 2013, pages 1–11. IEEE, 2013.
- [48] D. Williams, H. Jamjoom, and H. Weatherspoon. The Xen-Blanket: Virtualize Once, Run Everywhere. In Proceedings of the 7th ACM European Conference on Computer Systems, EuroSys ’12, pages 113–126, New York, NY, USA, 2012. ACM.