Streamforce: Outsourcing Access Control Enforcement for Stream Data to the Clouds
As tremendous amounts of data are generated every day from human activity and from devices equipped with sensing capabilities, cloud computing emerges as a scalable and cost-effective platform to store and manage the data. While the benefits of cloud computing are numerous, security concerns arising when data and computation are outsourced to a third party still hinder the complete movement to the cloud. In this paper, we focus on the problem of data privacy on the cloud, particularly on access control over stream data. The nature of stream data and the complexity of sharing data make access control a more challenging issue than in traditional archival databases. We present Streamforce, a system allowing data owners to securely outsource their data to the cloud. The owner specifies fine-grained policies which are enforced by the cloud. The latter performs most of the heavy computations, while learning nothing about the data content. To this end, we employ a number of encryption schemes, including deterministic encryption, proxy-based attribute-based encryption and sliding-window encryption. In Streamforce, access control policies are modeled as secure continuous queries, which entails minimal changes to existing stream processing engines and allows for easy expression of a wide range of policies. In particular, Streamforce comes with a number of secure query operators, including Map, Filter, Join and Aggregate. Finally, we implement Streamforce over an open-source stream processing engine (Esper) and evaluate its performance on a cloud platform. The results demonstrate practical performance for many real-world applications, and although the security overhead is visible, Streamforce is highly scalable.
An enormous amount of data is being generated every day, with sources ranging from traditional enterprise systems to social applications. It is increasingly common to process such data as they arrive in continuous streams. Examples range from high-frequency streams, such as those generated by stock or network monitoring applications, to low-frequency streams originating from weather monitoring, social network or fitness monitoring applications (e.g. nikeplus.nike.com, fitbit.com). The variety and abundance of data, combined with the potential of social interactivity, mash-up services and data science, has turned data sharing into a new norm. A critical problem with sharing data is security, which concerns the question of who gets access to which aspects of the data (fine-grained access control), and under which context (data privacy). This paper studies the former question, which we believe to be more challenging for stream data than for archival data for three reasons. First, traditional archival data systems enforce access control by pre-computing views, which is not possible with stream data because of its infinite size. Second, access control over streams is inherently data-driven (triggered by the arrival of specific data values), as opposed to user-driven with archival data, and it often involves temporal constraints (sliding windows). Third, many of the sharing activities take place in collaborative settings, which entail a large number of users and an even larger number of policies.
At the same time, cloud computing is driving a paradigm shift in the computing landscape. More businesses and individual users are taking full advantage of the elastic, instantly available and virtually unbounded computing resources provided by various vendors at competitive prices. Many enterprise systems are migrating their infrastructure to the cloud, while the convenience of instant access to computing resources also spawns a plethora of small-to-medium size systems being developed and deployed on the cloud. In the context of stream data sharing, cloud computing emerges as an ideal platform for two reasons. First, data can be hosted and managed by a small number of cloud providers with virtually unlimited resources, which is important since data streams are infinite in size. Second, data co-location makes it easy to share data and to perform analytics. However, since data is outsourced to untrusted third parties, enforcing access control on the cloud becomes even more imperative and more challenging.
In this paper, we present Streamforce, a fine-grained access control system for stream data over untrusted clouds. Streamforce is designed with three goals. First, it supports the specification and enforcement of fine-grained access control policies. Second, data is outsourced to the cloud, where access control policies are enforced while the cloud learns nothing about the data content. Third, the system is efficient, in the sense that the cloud handles most of the expensive computations. The last two goals require the cloud to be more active than a mere storage facility. To realize these goals, Streamforce uses a number of encryption schemes: deterministic encryption, proxy attribute-based encryption, and sliding-window encryption. While encryption is necessary to protect data confidentiality against the cloud and against unauthorized access, we believe that directly exposing encryption details to the system entities (data owner, user and cloud) is not the ideal abstraction for access control. Instead, Streamforce models access control policies using secure query operators: secure Map, Filter, Join and Aggregate. These operators are higher-level and more human-friendly than raw encryption keys. Enforcement at the cloud amounts to executing the secure queries. Since existing stream processing engines are very efficient at executing continuous queries made from similar query operators, the cloud can leverage them without major changes.
Streamforce occupies a unique position in the design space of outsourced access control. It considers untrusted (semi-honest) clouds, unlike [3, 6]. Systems such as Plutus and CryptDB assume untrusted clouds, but they support only coarse-grained policies over archival data. Recent systems utilizing attribute-based encryption [9, 18] achieve more fine-grained access control on untrusted clouds, but they do not support stream data. Furthermore, in these systems the cloud is not fully utilized, as it is used mainly for storage and distribution. To the best of our knowledge, Streamforce is the first system that allows secure, efficient outsourcing of fine-grained access control for stream data to untrusted clouds. It is not designed for applications demanding high throughput, but it presents important first steps towards supporting them. Our contributions are summarized as follows:
We present a system and formal security model for outsourcing access control of stream data to untrusted clouds. We discuss different security levels that different query operators can achieve.
We present details and analyze security properties of different encryption schemes used for fine-grained access control, including a new scheme supporting sliding window aggregation.
We show how to use these encryption schemes to construct secure query operators: secure Map, secure Filter, secure Join and secure Aggregate.
We implement a prototype of Streamforce [streamforce] over Esper, a high-performance stream processing engine. We then benchmark it on Amazon EC2. The results indicate practical performance for many applications. Although the cost of security is evident, we show that it can be compensated by the system’s high scalability.
Next, we present the system and security model, followed by the constructions of the encryption schemes. We then describe how to construct secure query operators. The prototype implementation and evaluation are presented in Section 5. Related work follows in Section 6, before we draw conclusions and discuss future work.
2 System and Security Model
2.1 System Model
There are three types of entities: data owners (or owners), data users (or users) and a cloud. Their interactions are illustrated in Fig. 1: the owners encrypt their data and relay them to the cloud, which transforms the ciphertexts and forwards the results to the users for final decryption. We do not consider how the owner determines access control policies, and we assume that the negotiation process (in which the owner grants policies to the user) happens out-of-band. The system goals are threefold:
The owner is able to express flexible, fine-grained access control policies.
The system ensures data confidentiality against the untrusted cloud, and access control against unauthorized users (as elaborated later).
Access control enforcement is done by the cloud. Decryptions at the user are light-weight operations compared to the transformations at the cloud.
2.1.2 Data Model.
A data stream S has the following schema:
$S = (TS, A_1, A_2, \dots, A_n)$
where TS is the timestamp, and all data attributes $A_1, \dots, A_n$ are of integer domains. A data tuple at time ts is written as $(ts, a_1, \dots, a_n)$. Queries over data streams are continuous, i.e. they are triggered when new data arrive. Each query is composed of one or more query operators, which take one or more streams as inputs and output another stream. We adopt the popular Aurora query model and focus on four operators: Map, Filter, Join and Aggregate.
Map: outputs only the specified attributes.
Filter: outputs tuples satisfying a given predicate.
Join: takes as inputs two streams $S_1$ and $S_2$, two integers $n_1$ and $n_2$, and a join attribute. Incoming data are added to queues of size $n_1$ and $n_2$ respectively, from which they are joined together.
Aggregate: outputs the averages over a sliding window. A sliding window is defined over the timestamp attribute, with a window size ws and an advance step step.
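To make the plaintext semantics of the four operators concrete, the following is a minimal Python sketch over unencrypted tuples represented as dictionaries. The function names, the (side, tuple) event encoding for Join and the finite, TS-ordered input for Aggregate are assumptions made for illustration only; Streamforce itself executes secure versions of these operators over encrypted data.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, int]  # one data tuple: attribute name -> integer value (incl. "TS")


def map_op(stream: Iterable[Row], attrs: List[str]) -> Iterator[Row]:
    """Map: keep only the specified attributes of each tuple."""
    for t in stream:
        yield {a: t[a] for a in attrs}


def filter_op(stream: Iterable[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    """Filter: pass through only the tuples that satisfy the predicate."""
    for t in stream:
        if pred(t):
            yield t


def join_op(events: Iterable[Tuple[int, Row]], n1: int, n2: int,
            attr: str) -> Iterator[Tuple[Row, Row]]:
    """Join: events are (side, tuple) pairs with side 1 or 2. Each side keeps
    its last n1 / n2 tuples; a new tuple is joined against the other queue."""
    queues = {1: deque(maxlen=n1), 2: deque(maxlen=n2)}
    for side, t in events:
        for u in queues[3 - side]:
            if u[attr] == t[attr]:
                yield (t, u) if side == 1 else (u, t)
        queues[side].append(t)


def aggregate_op(stream: Iterable[Row], attr: str, ws: int,
                 step: int) -> Iterator[Tuple[int, float]]:
    """Aggregate: average of attr over TS windows [s, s + ws) for
    s = 0, step, 2*step, ... (sketch assumes a finite, TS-ordered input)."""
    rows = list(stream)
    if not rows:
        return
    start = 0
    while start <= rows[-1]["TS"]:
        vals = [t[attr] for t in rows if start <= t["TS"] < start + ws]
        if vals:
            yield (start, sum(vals) / len(vals))
        start += step
```

For example, the friend policy from the running example (share vitals only above a threshold; the value 100 is hypothetical) would be `map_op(filter_op(vitals, lambda t: t["HR"] > 100), ["TS", "HR"])`.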
2.1.3 Access Control via Queries.
In Streamforce, access control is defined via views. As in traditional archival database systems, views are created by querying the database. In our setting, the access control process involves two steps. First, the owner specifies a policy by mapping it into a continuous query. Second, the query is registered to be executed by the cloud, whose outputs are then forwarded to authorized users.
The example depicted in Fig. 2 includes two streams: $S_1$ = (TS, RTime, Name, HR, BP) and $S_2$ = (TS, RTime, Name, Cals, Act, Loc). $S_1$ contains the owner’s vital signs as produced by health monitoring devices, where RTime, HR and BP are the real time, heart rate and blood pressure respectively. $S_2$ contains fitness information, where Cals, Act and Loc are the number of calories burned, the activity and the owner’s location respectively. Data users could be friends from social networks, research institutes or insurance companies. For a friend, the owner may want to share vitals data when they exceed a certain threshold, or average fitness information every hour. A research institute may be given a joined view of both streams in order to monitor the individual’s vitals during exercise.
2.2 Security Model
2.2.1 Adversary Model.
The cloud is not trusted, in the sense that it tries to learn the content of the outsourced data, but it follows the protocol correctly. This passive (or semi-honest) adversary model reflects the cloud’s incentive to gain benefits from user data while being bound by service level agreements. We do not consider a malicious cloud, which may try to break data integrity, launch denial-of-service attacks, or compute using stale data. Security against such attacks is crucial for many applications, but it is out of the scope of this paper. Data users are considered dishonest, in the sense that they may proactively try to access unauthorized data. To this end, they may collude with each other and also with the cloud.
2.2.2 Encryption Model.
To meet both the fine-grained access control and the data confidentiality requirements, we use three different encryption schemes. First, proxy attribute-based encryption is used for the Map and Filter operators. Second, the Join operator is made possible by deterministic encryption. Third, Aggregate is supported by sliding-window encryption. This section provides formal definitions of these schemes and their security properties. Detailed constructions and proofs of security are presented in Section 3.
2.2.3 Deterministic encryption scheme.
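The deterministic encryption scheme is a private-key encryption scheme consisting of three algorithms:
$\mathsf{Gen}(1^\lambda)$ generates the secret key SK using security parameter $\lambda$.
$\mathsf{Enc}(SK, M)$ encrypts message M with SK.
$\mathsf{Dec}(SK, CT)$ decrypts the ciphertext CT.
For any message M, $\mathsf{Dec}(SK, \mathsf{Enc}(SK, M)) = M$. Security of the scheme is defined via a security game consisting of three phases: Setup, Challenge and Guess.
Setup: the challenger runs $\mathsf{Gen}(1^\lambda)$ to obtain SK.
Challenge: the adversary sends to the challenger two message vectors $M_0 = (m_0^1, \dots, m_0^n)$ and $M_1 = (m_1^1, \dots, m_1^n)$, such that the elements of $M_0$ are all distinct and the elements of $M_1$ are all distinct. The challenger chooses $b \in \{0, 1\}$ at random, runs $CT = \mathsf{Enc}(SK, M_b)$ element-wise and returns the ciphertext to the adversary.
Guess: the adversary outputs a guess $b' \in \{0, 1\}$.
The adversary’s advantage is defined as $\mathsf{Adv} = \left|\Pr[b' = b] - \frac{1}{2}\right|$.
The scheme is said to be secure with respect to deterministic chosen plaintext attacks, or Det-CPA secure, if the adversary’s advantage is negligible.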
2.2.4 Proxy Attribute-Based Encryption scheme.
Attribute-Based Encryption (ABE) is a public-key scheme that allows for fine-grained access control: ciphertexts can only be decrypted if the security credentials satisfy a certain predicate. There are two types of ABE: Key-Policy (KP-ABE) and Ciphertext-Policy (CP-ABE). We opt for the former, in which the predicate is embedded in user keys and the ciphertext contains a set of encryption attributes. KP-ABE and CP-ABE can be used interchangeably, but the former is more data-centric (who gets access to the given data), while the latter is more user-centric (which data the given user has access to).
ABE’s encryption and decryption are expensive operations. Proxy attribute-based encryption (or proxy ABE) is designed to aid the decryption process by letting a third party transform the original ABE ciphertexts into a simpler form. It consists of five algorithms:
$\mathsf{Setup}(1^\lambda)$: generates public parameters PK and master key MK.
$\mathsf{KeyGen}(MK, \mathbb{P})$: creates a transformation key TK and a decryption key SK for the predicate $\mathbb{P}$.
$\mathsf{Enc}(PK, M, \mathcal{A})$: encrypts M with the set of encryption attributes $\mathcal{A}$.
$\mathsf{Transform}(TK, CT)$: partially decrypts the ciphertext CT using TK.
$\mathsf{Dec}(SK, CT')$: decrypts the transformed ciphertext CT' using the decryption key.
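For any message M, attribute set $\mathcal{A}$ and policy $\mathbb{P}$ such that $\mathcal{A}$ satisfies $\mathbb{P}$, with $(PK, MK) = \mathsf{Setup}(1^\lambda)$ and $(TK, SK) = \mathsf{KeyGen}(MK, \mathbb{P})$, the following holds: $\mathsf{Dec}(SK, \mathsf{Transform}(TK, \mathsf{Enc}(PK, M, \mathcal{A}))) = M$.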
Security of the proxy ABE scheme is defined via a selective-set security game consisting of five phases: Setup, Query-1, Challenge, Query-2 and Guess:
Setup: the challenger executes $\mathsf{Setup}(1^\lambda)$ to generate public parameters. It gives PK and an attribute set $\mathcal{A}^*$ to the adversary.
Query-1: the adversary performs either a private key query or a decryption query. In the former, it asks the challenger for the keys of an access structure $\mathbb{P}$. The challenger calls $\mathsf{KeyGen}(MK, \mathbb{P})$ to generate (TK, SK). If $\mathcal{A}^*$ does not satisfy $\mathbb{P}$, it sends both keys to the adversary. If $\mathcal{A}^*$ satisfies $\mathbb{P}$, it sends only TK to the adversary. For a decryption query, the adversary asks the challenger to decrypt a ciphertext CT' (which has been transformed using a key TK). The challenger retrieves the corresponding SK, calls $\mathsf{Dec}(SK, CT')$ and sends the result back to the adversary.
Challenge: the adversary sends two messages $M_0$ and $M_1$ of equal length to the challenger. The challenger chooses $b \in \{0, 1\}$ at random, computes $CT = \mathsf{Enc}(PK, M_b, \mathcal{A}^*)$ and returns CT to the adversary.
Query-2: the adversary continues the queries as in Query-1, except that it cannot ask the challenger to decrypt CT.
Guess: the adversary outputs a guess $b'$.
The scheme is said to be secure with respect to replayable chosen ciphertext attacks, or R-CCA secure, in the selective-set model if the adversary’s advantage in the selective-set security game is negligible.
Modify the security game so that the adversary does not issue decryption queries. We say that the scheme is secure in the selective-set model with respect to chosen plaintext attacks (or CPA secure) if the adversary’s advantage is negligible.
2.2.5 Sliding-window encryption scheme (SWE).
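SWE is a private-key encryption scheme which allows a user to decrypt only the aggregate of a window of ciphertexts, and not the individual ciphertexts. For a sequence $M = (m_1, m_2, \dots)$, let $\mathsf{Sum}_{ws}(M)_i$ and $\mathsf{Prod}_{ws}(M)_i$ be the sum and product of the elements in the i-th sliding window over M, with window size ws and advance step ws.
$\mathsf{Gen}(1^\lambda, W)$: generates public parameters and the private keys $K_{ws}$ for every window size $ws \in W$, where W is the set of supported window sizes.
$\mathsf{Enc}(M, W')$: encrypts M using a set of window sizes $W' \subseteq W$, whose result is a ciphertext CT.
$\mathsf{Dec}(K_{ws}, CT)$: decrypts CT for the window size ws using the private key $K_{ws}$. The result is the aggregates of the sliding windows, i.e. $\mathsf{Sum}_{ws}(M)_i$ for all i.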
Security of is defined via a selective-window security game consisting of four phases: Setup, Corrupt, Challenge, Guess.
Setup: the challenger calls the key generation algorithm to set up the public parameters. It chooses a value ws and sends it to the adversary.
Corrupt: the adversary asks the challenger for the private key of a window size , provided that .
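Challenge: the adversary picks two message sequences $M_0$ and $M_1$ such that every window of size ws has the same sum in $M_0$ as in $M_1$, and sends them to the challenger. The adversary also sends a set of window sizes $W'$. The challenger chooses $b \in \{0, 1\}$ at random, invokes the encryption algorithm on $M_b$ and $W'$, and forwards the result to the adversary.
Guess: the adversary outputs a guess $b'$.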
SWE is said to be secure with respect to restricted chosen encrypted window attacks (or Res-CEW secure) in the selective-window model if the adversary’s advantage is negligible. It is secure with respect to chosen window attacks (or CW secure) when the Corrupt phase is removed from the game.
Each of the encryption schemes above has its own definition of security, which makes different assumptions about the adversary’s capabilities. R-CCA is the strongest definition, as it assumes an active adversary that has access to decryption oracles. R-CCA ensures both data integrity and confidentiality. CPA security assumes a passive (eavesdropping) adversary who only tries to break the secrecy of the ciphertext. CPA security ensures confidentiality, while allowing meaningful changes to be made to the ciphertext (which is necessary for transformation to work). Det-CPA is a weaker security level, as it protects data confidentiality only for unique messages.
Security of the sliding-window scheme is related to that of secure multi-party computation, which ensures that no information other than the final output is leaked during the computation of a function. Our model is similar to, but stronger than, the aggregator oblivious model, since the security game allows for more types of adversarial attacks. More specifically, the aggregator oblivious model requires the two message sequences $M_0$ and $M_1$ to have the same aggregate, but our model requires only the windows (sub-sequences) of $M_0$ and $M_1$ to have the same aggregates. Both Res-CEW and CW security allow for meaningful computations (aggregates) over ciphertexts. Res-CEW is secure against a weak form of collusion (between users with access to window sizes which are multiples of each other), whereas CW is not.
2.3.1 Access control via Encryption.
Encryption plays two roles in our system: protecting data confidentiality against the untrusted cloud, and providing access control against unauthorized users. Neither the cloud nor unauthorized users have access to decryption keys, hence they cannot learn the plaintexts. In addition, Res-CEW and CW security ensure that, given access to a window size ws, the user cannot learn information about other window sizes (except what can be derived from its own windows). Res-CEW guarantees access control under weak collusion among dishonest users.
For access control to be enforced by the cloud, some information must be revealed to it. There exists a trade-off between security and functionality of the query operators that make up the policies. For Map and Filter policies, the cloud must be able to check whether certain attributes are included in the ciphertexts, which is allowed by CPA security. For Join, the cloud needs to be able to check whether two ciphertexts are encryptions of the same message, which requires the encryption to be deterministic (Det-CPA secure). For Aggregate, a homomorphic encryption scheme is required, which in our case means the highest achievable security level is Res-CEW.
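The phrase "except what can be derived from its own windows" can be illustrated in plaintext. The sketch below, with hypothetical helper names and non-overlapping windows (advance step equal to the window size, as in Section 4.4), shows that sums over windows whose size is a multiple of ws are computable from the sums over ws. This suggests why collusion between users holding window sizes that are multiples of each other is only a weak form of collusion: the coarser view adds nothing to the finer one.

```python
from typing import List


def window_sums(values: List[int], ws: int) -> List[int]:
    """Sums over non-overlapping windows of size ws (advance step = ws)."""
    return [sum(values[s:s + ws]) for s in range(0, len(values) - ws + 1, ws)]


def coarsen(sums: List[int], factor: int) -> List[int]:
    """Derive the sums for window size factor * ws from the sums for ws."""
    return [sum(sums[s:s + factor])
            for s in range(0, len(sums) - factor + 1, factor)]
```

For instance, with values `[3, 1, 4, 1, 5, 9, 2, 6]`, the size-2 sums `[4, 5, 14, 8]` coarsen to `[9, 22]`, which are exactly the size-4 sums.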
3 Encryption Scheme Constructions
3.1 Deterministic Encryption
Let be a multiplicative group of prime order and generator . Let be a pseudorandom permutation with outputs in . The scheme is constructed as follows.
: where .
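Assuming that the underlying permutation is pseudorandom, the deterministic encryption scheme is Det-CPA secure.
Proof sketch. Given any messages $m_1, \dots, m_n$ which are distinct, their images under the pseudorandom permutation are independent and uniformly distributed. As a consequence, the corresponding ciphertexts are also independent and indistinguishable from random. It follows that CT is independent of the choice of $M_0$ or $M_1$; therefore $\Pr[b' = b]$ is negligibly close to $\frac{1}{2}$, i.e. the adversary’s advantage is negligible.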
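A minimal sketch of the property that deterministic encryption contributes to Join: equal plaintexts under the same key always produce equal ciphertexts, while ciphertexts of distinct plaintexts look unrelated. As an assumption for illustration, HMAC-SHA256 (a pseudorandom function) stands in for the keyed pseudorandom permutation and the group encoding is omitted, so unlike the construction above this sketch has no decryption algorithm.

```python
import hashlib
import hmac
import secrets


def det_keygen() -> bytes:
    """Secret key SK: 32 random bytes."""
    return secrets.token_bytes(32)


def det_encrypt(sk: bytes, m: int) -> bytes:
    """Deterministic: the same (sk, m) always yields the same ciphertext.
    HMAC-SHA256 stands in for the pseudorandom permutation (an assumption);
    being non-invertible, it supports equality tests but not decryption."""
    return hmac.new(sk, m.to_bytes(8, "big", signed=True), hashlib.sha256).digest()


def equal_plaintexts(ct1: bytes, ct2: bytes) -> bool:
    """All the cloud can test: whether two ciphertexts encrypt the same value."""
    return hmac.compare_digest(ct1, ct2)
```

Because repeated messages produce repeated ciphertexts, confidentiality only holds for distinct messages, which is exactly the restriction built into the Det-CPA game.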
3.2 Proxy ABE Construction
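Since our adversary model assumes passive attackers, we present here the CPA-secure construction from the original proxy ABE work (an R-CCA secure construction can be found in the original paper). The scheme makes use of a bilinear map $e: G_0 \times G_0 \to G_1$, where $G_0$ and $G_1$ are multiplicative cyclic groups of prime order p. The map e is efficient to compute, and $e(g^a, h^b) = e(g, h)^{ab}$ for all $g, h \in G_0$ and $a, b \in \mathbb{Z}_p$. Its security relies on the bilinear decisional Diffie-Hellman assumption: let g be a generator of $G_0$; for random $a, b, c, z \in \mathbb{Z}_p$, it is difficult to distinguish $(g^a, g^b, g^c, e(g, g)^{abc})$ from $(g^a, g^b, g^c, e(g, g)^{z})$.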
: generates groups and the bilinear map as the public key. Let be the attribute universe, and . We have:
Let , the master key MK is: .
KeyGen: translates the predicate into an access tree, in which the leaf nodes represent attributes and the internal nodes represent threshold gates. An AND node with n children corresponds to an n-out-of-n gate, and an OR node to a 1-out-of-n gate. For each k-out-of-n node x we define a polynomial $q_x$ of degree $k - 1$. Starting from the root node r, KeyGen defines the polynomial $q_r$ with $q_r(0)$ set to the secret value from the master key. Recursively, for a child node x of a node x', define $q_x$ such that $q_x(0) = q_{x'}(\mathrm{index}(x))$. When x is a leaf node, let KeyGen define:
where $\mathrm{att}(x)$ returns the attribute in the attribute universe represented by the leaf node x. The transformation key TK and decryption key SK are defined as follows:
Enc: assume M is an element of the target group (or M has been mapped from a string to a group element). Let $s \in \mathbb{Z}_p$ be chosen at random; the ciphertext is:
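Transform: given the access tree used to generate TK, when x is a leaf node, compute:
When x is a non-leaf node, let $F_z$ be the result of the recursive call on z, where z is a child node of x. Let $S_x$ be a set of $k_x$ children of x such that $F_z \neq \perp$ for all $z \in S_x$. Let $\Delta_{i, S'_x}(0)$ be the Lagrange coefficient for $i = \mathrm{index}(z)$, where $S'_x = \{\mathrm{index}(z) : z \in S_x\}$. We compute:
Thus, calling Transform for the root node r results in
Dec: the message can be recovered as: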
Theorem 3.2
The proxy ABE construction above is CPA-secure in the selective-set model.
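The threshold-gate polynomials of KeyGen and the Lagrange interpolation of Transform follow the standard secret-sharing pattern over an access tree. The sketch below shows that pattern in the clear over $\mathbb{Z}_p$: a secret is shared down the tree, and it can be recovered at the root only from leaves whose attributes satisfy the tree. In the actual scheme these operations happen in the exponent and are combined with the ciphertext through the bilinear map; the tuple-based tree encoding, the function names and the modulus $p = 2^{127} - 1$ are assumptions for illustration.

```python
import secrets
from typing import Dict, Optional, Set, Tuple

P = 2**127 - 1  # a prime modulus for Z_p (illustrative choice)
Path = Tuple[int, ...]
# Access tree node: ("leaf", attribute) or ("gate", k, [children]) for k-of-n.


def share(node, secret: int, path: Path = (), shares=None) -> Dict[Path, Tuple[str, int]]:
    """Share `secret` down the tree: a k-of-n gate x gets a random polynomial
    q_x of degree k - 1 with q_x(0) = secret, and child i receives q_x(i)."""
    if shares is None:
        shares = {}
    if node[0] == "leaf":
        shares[path] = (node[1], secret)  # leaf share = q_leaf(0)
        return shares
    _, k, children = node
    coeffs = [secret] + [secrets.randbelow(P) for _ in range(k - 1)]
    for i, child in enumerate(children, start=1):  # index(child) = i
        value = sum(c * pow(i, e, P) for e, c in enumerate(coeffs)) % P
        share(child, value, path + (i,), shares)
    return shares


def lagrange_at_zero(i: int, indices) -> int:
    """Delta_{i,S}(0) = prod over j != i of (0 - j) / (i - j), mod P."""
    num, den = 1, 1
    for j in indices:
        if j != i:
            num = num * (-j) % P
            den = den * (i - j) % P
    return num * pow(den, -1, P) % P  # modular inverse needs Python 3.8+


def recover(node, shares, attrs: Set[str], path: Path = ()) -> Optional[int]:
    """Return q_node(0) if `attrs` satisfies the subtree, else None."""
    if node[0] == "leaf":
        attr, value = shares[path]
        return value if attr in attrs else None
    _, k, children = node
    got: Dict[int, int] = {}
    for i, child in enumerate(children, start=1):
        v = recover(child, shares, attrs, path + (i,))
        if v is not None:
            got[i] = v
            if len(got) == k:
                break
    if len(got) < k:
        return None
    return sum(v * lagrange_at_zero(i, got) for i, v in got.items()) % P
```

For the hypothetical policy "TS AND (HR OR BP)", encoded as `("gate", 2, [("leaf", "TS"), ("gate", 1, [("leaf", "HR"), ("leaf", "BP")])])`, the attribute set {TS, BP} recovers the secret while {HR, BP} does not.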
3.3 Sliding-Window Encryption
Let W be the set of all possible window sizes, and G be a multiplicative group of prime order p with generator g. Assuming the message space is a small integer domain, we propose three different constructions for SWE.
3.3.1 Construction 1
Construction 1 masks the plaintext with random values whose sum over each sliding window is the user decryption key.
: for all , .
: for each , let such that and . The ciphertext is where .
: extracts from CT and compute:
Construction 1 is Res-CEW secure.
Proof sketch. Given input , window size ws and key , define two distributions and as:
where and such that for all :
It can be seen that and are indistinguishable (in the information theoretic sense), because is chosen at random and independently of .
Consider the single-window case, i.e. . In the security game, is the
distribution of ciphertext for the input . For input , this distribution is
indistinguishable from where . For input , the ciphertext distribution is indistinguishable
from where .
, and are the same distribution. Therefore, the adversary can only distinguish the two ciphertext distributions with probability .
Consider the case with multiple windows where for all . and are independent, because the random values are chosen independently. They are indistinguishable from and , which are also independent. Consequently, the combined distribution and are indistinguishable. Similar to the single-window case above, using the fact that , the adversary can only distinguish the two ciphertext distributions with probability .
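The masking idea behind Construction 1 can be sketched with toy parameters. As assumptions for illustration: the group is the order-509 subgroup of $\mathbb{Z}_{1019}^*$ generated by 4 (far too small to be secure), there is one key per window size (called K_ws below), windows do not overlap (advance step equal to the window size), and the small discrete logarithm at decryption is found by exhaustive search, which is feasible only because the message space is a small integer domain. This is one possible instantiation, not necessarily the exact construction.

```python
import secrets
from typing import Dict, List

# Toy parameters (INSECURE): P = 2Q + 1 with Q prime; G = 4 generates the
# subgroup of order Q in Z_P^*, so exponents live in Z_Q.
P, Q, G = 1019, 509, 4


def keygen(window_sizes: List[int]) -> Dict[int, int]:
    """One random key K_ws in Z_Q per window size."""
    return {ws: secrets.randbelow(Q) for ws in window_sizes}


def encrypt(messages: List[int], keys: Dict[int, int]) -> Dict[int, List[int]]:
    """For each ws, mask m_j with r_j so that the r_j of every window sum to K_ws."""
    ct = {}
    for ws, k in keys.items():
        assert len(messages) % ws == 0, "sketch assumes whole windows"
        out = []
        for start in range(0, len(messages), ws):
            r = [secrets.randbelow(Q) for _ in range(ws - 1)]
            r.append((k - sum(r)) % Q)  # masks of this window sum to K_ws mod Q
            for m, rj in zip(messages[start:start + ws], r):
                out.append(pow(G, (m + rj) % Q, P))  # g^(m_j + r_j)
        ct[ws] = out
    return ct


def aggregate(cts: List[int], ws: int) -> List[int]:
    """Cloud: multiply the ciphertexts of each window (no key needed)."""
    res = []
    for start in range(0, len(cts), ws):
        prod = 1
        for c in cts[start:start + ws]:
            prod = prod * c % P
        res.append(prod)  # = g^(window sum + K_ws)
    return res


def decrypt(window_products: List[int], k: int) -> List[int]:
    """User: remove g^K_ws, then find the small exponent by exhaustive search."""
    g_minus_k = pow(G, (-k) % Q, P)
    sums = []
    for w in window_products:
        target = w * g_minus_k % P
        sums.append(next(s for s in range(Q) if pow(G, s, P) == target))
    return sums
```

With heart-rate readings `[70, 72, 75, 80, 78, 76]` and a key for window size 3, the user recovers the window sums `[217, 234]`, while each individual ciphertext stays masked by an independent random value. Window sums must stay below 509 in this toy setting.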
3.3.2 Construction 2
Construction 2 uses an auxiliary encryption scheme to encrypt the window aggregates directly.
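A sketch of the Construction 2 idea, in which the owner computes each window sum itself and encrypts it for the holders of that window size. As assumptions for illustration, textbook ElGamal over the same toy order-509 group (insecure parameters) serves as the CPA-secure auxiliary scheme, each sum is encoded as $g^{sum}$, and the user recovers it by exhaustive discrete-log search.

```python
import secrets
from typing import Dict, List, Tuple

P, Q, G = 1019, 509, 4  # toy order-Q subgroup of Z_P^* (INSECURE)


def elgamal_keygen() -> Tuple[int, int]:
    """Textbook ElGamal key pair (pk = g^x, sk = x)."""
    x = secrets.randbelow(Q - 1) + 1
    return pow(G, x, P), x


def elgamal_encrypt(pk: int, elem: int) -> Tuple[int, int]:
    r = secrets.randbelow(Q - 1) + 1
    return pow(G, r, P), elem * pow(pk, r, P) % P


def elgamal_decrypt(sk: int, ct: Tuple[int, int]) -> int:
    c1, c2 = ct
    return c2 * pow(c1, Q - sk, P) % P  # c1^(-sk), since c1 has order Q


def swe2_keygen(window_sizes: List[int]) -> Dict[int, Tuple[int, int]]:
    """One auxiliary key pair (pk_ws, sk_ws) per window size."""
    return {ws: elgamal_keygen() for ws in window_sizes}


def swe2_encrypt(messages: List[int], pks: Dict[int, int]) -> Dict[int, list]:
    """Owner computes every window sum itself and encrypts g^sum under pk_ws."""
    ct = {}
    for ws, pk in pks.items():
        starts = range(0, len(messages) - ws + 1, ws)
        sums = [sum(messages[s:s + ws]) for s in starts]
        ct[ws] = [elgamal_encrypt(pk, pow(G, v % Q, P)) for v in sums]
    return ct


def swe2_decrypt(cts: list, sk: int) -> List[int]:
    """User decrypts g^sum and recovers the small sum by exhaustive search."""
    out = []
    for c in cts:
        target = elgamal_decrypt(sk, c)
        out.append(next(s for s in range(Q) if pow(G, s, P) == target))
    return out
```

Here the cloud does no arithmetic on the data at all; the trade-off is that the owner must buffer each window before encrypting.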
Gen: let E be a CPA-secure asymmetric encryption scheme. For every window size ws, Gen invokes the key generation algorithm of E to produce a key pair $(pk_{ws}, sk_{ws})$.
Dec: extracts the components for window size ws from CT, then computes
Assuming that the auxiliary scheme is CPA-secure, Construction 2 is Res-CEW secure.
Proof sketch. The proof is similar to that of Theorem 3.5. Because the auxiliary scheme is CPA-secure, its ciphertext distribution is independent of the input and indistinguishable from random. Hence, given that $M_0$ and $M_1$ have the same window aggregates for every requested window size, both ciphertext distributions are indistinguishable from the following distribution:
3.3.3 Construction 3
Construction 3 masks the plaintexts with random values whose sums over each sliding window are encrypted using another encryption scheme.
Gen: the same as in Construction 2.
: let where , let . For all , let . Finally, .
Dec: extracts the components for window size ws from CT, then computes
Assuming that the auxiliary scheme is CPA-secure, Construction 3 is Res-CEW secure.
Proof sketch. Given and , consider the ciphertext distribution:
Because is chosen independently from , is indistinguishable from where .
Let , given for all and , becomes:
Since is CPA-secure, it follows that is independent from its input and indistinguishable from random. That is, is indistinguishable from :
Given the challenge and , the ciphertext distribution is and respectively. Since, for all and such that , is the same as . Therefore, the adversary can only distinguish the two distributions with probability .
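A sketch of one way to realize the Construction 3 idea: each plaintext is masked as in Construction 1, but the masks are fresh random values, and the (negated) sum of each window's masks is encrypted with an auxiliary scheme for the holders of that window size. As assumptions for illustration, textbook ElGamal over the toy order-509 group (insecure parameters) is the auxiliary scheme, the mask sum is encoded as $g^{-\sum r}$, and the window product, normally computed by the cloud, is folded into the user's decryption function.

```python
import secrets
from typing import Dict, List, Tuple

P, Q, G = 1019, 509, 4  # toy order-Q subgroup of Z_P^* (INSECURE)


def elgamal_keygen() -> Tuple[int, int]:
    x = secrets.randbelow(Q - 1) + 1
    return pow(G, x, P), x  # (pk, sk)


def elgamal_encrypt(pk: int, elem: int) -> Tuple[int, int]:
    r = secrets.randbelow(Q - 1) + 1
    return pow(G, r, P), elem * pow(pk, r, P) % P


def elgamal_decrypt(sk: int, ct: Tuple[int, int]) -> int:
    c1, c2 = ct
    return c2 * pow(c1, Q - sk, P) % P  # c1^(-sk), since c1 has order Q


def swe3_encrypt(messages: List[int], pks: Dict[int, int]) -> Dict[int, tuple]:
    """Mask every m_j with a fresh r_j; encrypt g^(-sum of r_j) per window."""
    ct = {}
    for ws, pk in pks.items():
        masked, aux = [], []
        for s in range(0, len(messages) - ws + 1, ws):
            r = [secrets.randbelow(Q) for _ in range(ws)]
            masked += [pow(G, (m + rj) % Q, P) for m, rj in zip(messages[s:s + ws], r)]
            aux.append(elgamal_encrypt(pk, pow(G, -sum(r) % Q, P)))
        ct[ws] = (masked, aux)
    return ct


def swe3_decrypt(masked: List[int], aux: list, ws: int, sk: int) -> List[int]:
    """Multiply each window (the cloud's job), cancel the masks, solve small dlog."""
    sums = []
    for w, a in enumerate(aux):
        prod = 1
        for c in masked[w * ws:(w + 1) * ws]:
            prod = prod * c % P  # g^(sum m + sum r)
        target = prod * elgamal_decrypt(sk, a) % P  # times g^(-sum r)
        sums.append(next(s for s in range(Q) if pow(G, s, P) == target))
    return sums
```

Compared with Construction 1, the masks no longer need to sum to a fixed key, so no window-specific constraint ties the masks of different window sizes together.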
4 Secure Query Operators
The encryption schemes discussed in the previous sections provide the underlying security assurance for Streamforce. Using encryption directly, access control can be implemented by distributing decryption keys to the authorized users. Streamforce instead exposes a higher-level abstraction: system entities deal only with secure query operators, which hide the complex and mundane cryptographic details. This section focuses on the implementation of the secure operators using the encryption schemes from the previous sections. There are three design components pertaining to each operator: (1) how to map the corresponding policy to a user decryption key, (2) how to encrypt the data at the owner, and (3) how the transformation at the cloud is done. Many fine-grained policies can be constructed by using one of these operators directly. We also describe the design for combining these operators to support more complex policies.
4.1 Map
This operator returns data tuples containing only the attributes in a given set. We use the proxy ABE scheme to implement this operator. First, Setup is invoked to set up the public parameters and master key MK. The user decryption key is created by KeyGen, where:
The owner encrypts using:
When the ciphertext CT arrives at the cloud, it is transformed using the user’s transformation key TK before being forwarded to the user.
This implementation achieves the same level of security as the proxy ABE scheme, i.e. CPA security in the selective-set model. The storage cost is bits per data tuple, where is the size of .
4.2 Filter
Let FA be the set of filter attributes. A filter predicate is defined by a tuple (A, op, v) in which $A \in FA$. The predicate returns true when A op v returns true. Let bag(v) be the bag-of-bits representation of v in base b. In particular, assuming , we have:
Let be a set of primes, define
as the set of encryption attributes representing the value .
When op is , we consider three cases.
If : where .
If there exists and for some . Let be the representation of in base . Let
for and some values of . Then, we have .
The user decryption key is generated by KeyGen, where
The owner encrypts data using:
When the ciphertext CT arrives at the cloud, the latter transforms it using the transformation key TK and forwards the result to the user.
Similar to Map, this operator uses the proxy ABE scheme directly, and therefore it has CPA security in the selective-set model. The storage cost per ciphertext is bits, which grows with the size of . The bigger the size of , the more policies of the type can be supported, but at the expense of more storage overhead. Notice that values of filtering attributes are exposed to the cloud in the form of encryption attributes, so the data owner should only use non-sensitive attributes, such as TS, for the set FA.
4.3 Join
Let $A_J$ be the join attribute of two streams $S_1$ and $S_2$. We assume that the join operator returns all data attributes (more complex cases are discussed in Section 4.5). We use a combination of the proxy ABE scheme and the deterministic encryption scheme. Initially, the two owners of $S_1$ and $S_2$ invoke the key generation of the deterministic scheme in a way that satisfies two conditions: (1) both end up with the same group and pseudorandom function; (2) their secret keys $SK_1$ and $SK_2$ are related as required by the join protocol.
The user decryption for stream is , where
The owner encrypts using:
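The user who has received both decryption keys computes a value from them (a join token) and sends it to the cloud. When two ciphertexts $CT_1$ and $CT_2$ arrive at the cloud, it uses the token to check whether the deterministic encryptions of the join attribute in the two ciphertexts match. If so, the ciphertexts can be joined. The cloud then performs the proxy ABE transformation and forwards the results to the user.
Because the deterministic scheme is Det-CPA secure, the cloud can learn whether the join attribute has the same value in both streams. But this check is only possible if the user requests it (by sending the join token to the cloud). The other attributes are protected with CPA security by the proxy ABE scheme. The storage overhead per data tuple is one group element plus one proxy ABE ciphertext, because the deterministic scheme produces a single group element and the proxy ABE scheme encrypts the entire data tuple with only one encryption attribute.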
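The following sketch shows the cloud’s role in the secure Join: it keeps bounded queues of incoming ciphertexts and pairs those whose deterministic tags on the join attribute are equal, without seeing any plaintext. As simplifying assumptions, both streams use the same HMAC key for tags (HMAC-SHA256 standing in for the deterministic scheme), the opaque byte strings stand in for proxy ABE ciphertexts, and the key relation between the two owners and the user’s join token are not modeled. The class and method names are hypothetical.

```python
import hashlib
import hmac
import secrets
from collections import deque


def det_tag(key: bytes, value: int) -> bytes:
    """Stand-in deterministic encryption of the join attribute (HMAC-SHA256)."""
    return hmac.new(key, value.to_bytes(8, "big", signed=True), hashlib.sha256).digest()


class CloudJoin:
    """Keeps the last n1 / n2 (tag, payload) pairs of each stream and pairs
    entries whose tags are equal. Payloads are never inspected."""

    def __init__(self, n1: int, n2: int):
        self.queues = {1: deque(maxlen=n1), 2: deque(maxlen=n2)}

    def on_arrival(self, side: int, tag: bytes, payload: bytes) -> list:
        matches = []
        for other_tag, other_payload in self.queues[3 - side]:
            if hmac.compare_digest(tag, other_tag):
                matches.append((payload, other_payload) if side == 1
                               else (other_payload, payload))
        self.queues[side].append((tag, payload))
        return matches
```

The equality test is all the cloud learns about the join attribute, which matches the Det-CPA leakage discussed above.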
4.4 Aggregate (Sliding Window)
In Streamforce, sliding windows are based on the timestamp attribute TS, with advance steps equal to the window sizes. Let $A_{agg}$ be the aggregate attribute over which the sums are computed. In the following, we present three implementations of this operator and discuss their trade-offs at the end.
In the first implementation, the owner first encrypts data using the sliding-window scheme, and the resulting ciphertext is then encrypted with the proxy ABE scheme. The user decryption key is , where SK is the secret key generated by , and
To encrypt , the owner first executes as shown in Fig. 3[a], then computes:
For every window size ws, the cloud maintains a buffer of size ws. Each incoming ciphertext CT is transformed using the proxy ABE transformation, and the result is added to the buffer. Once the buffer is full, the cloud computes the product of its elements, sends the result to the user and clears the buffer.
This implementation uses the second SWE construction, with the proxy ABE scheme as the auxiliary encryption scheme. The owner itself computes the window aggregates and encrypts the result using the proxy ABE scheme. The user decryption key is , where:
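The per-window-size buffering at the cloud can be sketched as follows. As assumptions for illustration, each incoming ciphertext is represented as a dictionary mapping a window size to its group-element component, `transform` is a hypothetical callback standing in for the proxy ABE transformation with the user’s TK, and products are taken modulo a toy prime.

```python
from typing import Callable, Dict, Iterable, List


class WindowAggregator:
    """Cloud-side sketch: one buffer per window size; when a buffer holds ws
    transformed ciphertexts, emit their product mod p and clear the buffer."""

    def __init__(self, window_sizes: Iterable[int], modulus: int,
                 transform: Callable[[int], int]):
        self.p = modulus
        self.transform = transform  # stand-in for the proxy ABE transformation
        self.buffers: Dict[int, List[int]] = {ws: [] for ws in window_sizes}

    def on_ciphertext(self, ct_by_ws: Dict[int, int]) -> Dict[int, int]:
        """Process one arrival; return {ws: product} for every full window."""
        out = {}
        for ws, buf in self.buffers.items():
            buf.append(self.transform(ct_by_ws[ws]))
            if len(buf) == ws:
                prod = 1
                for c in buf:
                    prod = prod * c % self.p
                out[ws] = prod  # forwarded to users holding the ws key
                buf.clear()
        return out
```

With an identity transform and modulus 1019, feeding the components 5, 7 and 11 to window sizes 2 and 3 emits 35 for size 2 after the second arrival and 385 for size 3 after the third.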
To encrypt , the owner first executes