More is Less: Perfectly Secure Oblivious Algorithms in the Multi-Server Setting

T-H. Hubert Chan (The University of Hong Kong), hubert@cs.hku.hk
Jonathan Katz (University of Maryland, College Park), jkatz@cs.umd.edu
Kartik Nayak (University of Maryland, College Park, and VMware Research), nkartik@vmware.com
Antigoni Polychroniadou (Cornell Tech), antigoni@cornell.edu
Elaine Shi (Cornell University), runting@gmail.com

Abstract

The problem of Oblivious RAM (ORAM) has traditionally been studied in a single-server setting, but more recently the multi-server setting has also been considered. Yet it is still unclear whether the multi-server setting has any inherent advantages, e.g., whether the multi-server setting can be used to achieve stronger security goals or provably better efficiency than is possible in the single-server case.

In this work, we construct a perfectly secure 3-server ORAM scheme that outperforms the best known single-server scheme by a logarithmic factor. In the process we also show, for the first time, that there exist specific algorithms for which multiple servers can overcome known lower bounds in the single-server setting.

Keywords: Oblivious RAM, perfect security

1 Introduction

Oblivious RAM (ORAM) protocols [11] allow a client to outsource storage of its data such that the client can continue to read/write its data while hiding both the data itself and the client’s access pattern. ORAM was historically considered in a single-server setting, but has recently been considered in a multi-server setting [20, 24, 1, 16, 15, 18] where the client can store its data on multiple, non-colluding servers. Current constructions of multi-server ORAM are more efficient than known protocols in the single-server setting; in particular, the best known protocols in the latter setting (when server-side computation is not allowed) require bandwidth $O(\log^2 N/\log\log N)$ [14, 17, 3, 6] for storing an array of length $N$, whereas multi-server ORAM schemes achieving logarithmic bandwidth are known [20]. (Although Lu and Ostrovsky [20] describe their multi-server scheme using server-side computation, it is not difficult to see that it can be replaced with client-side computation instead.)

Nevertheless, there remain several unanswered questions about the multi-server setting. First, all work thus far in the multi-server setting achieves either computational or statistical security, but not perfect security where correctness is required to hold with probability 1 and security must hold even against computationally unbounded attackers. Second, although (as noted above) we have examples of multi-server schemes that beat existing single-server constructions, it is unclear whether this reflects a limitation of our current knowledge or whether there are inherent advantages to the multi-server setting.

We address the above questions in this work. (Unless otherwise noted, our results hold for arbitrary block size $B$ as long as it is large enough to store an address, i.e., $B = \Omega(\log N)$.) First, we construct a perfectly secure, multi-server ORAM scheme that improves upon the overhead of the best known construction in the single-server setting. Specifically, we show:

Theorem 1.1

There exists a 3-server ORAM scheme that is perfectly secure for any single semi-honest corruption, and achieves $O(\log^2 N)$ bandwidth per logical memory access on an array of length $N$. Further, our scheme does not rely on server-side computation.

As a point of comparison, the best known single-server, perfectly secure ORAM schemes require $O(\log^3 N)$ bandwidth [8, 5]. While Theorem 1.1 holds for any block size $B = \Omega(\log N)$, we show that for block sizes $B = \Omega(\log^2 N)$ our scheme achieves bandwidth as small as $O(\log N)$.

As part of our construction, we introduce new building blocks that are of independent theoretical interest. Specifically, we show:

Theorem 1.2

There exists a 3-server protocol for stable compaction that is perfectly secure for any single semi-honest corruption, and achieves $O(n)$ bandwidth to compact an array of length $n$ (that is secret shared among the servers). The same result holds for merging two sorted arrays of length $n$.

In the single-server setting, Lin, Shi, and Xie [19] recently proved a lower bound showing that any oblivious algorithm for stable compaction or merging in the balls-and-bins model must incur at least $\Omega(n \log n)$ bandwidth. The balls-and-bins model characterizes a wide class of natural algorithms where each element is treated as an atomic “ball” with a numeric label; the algorithm may perform arbitrary boolean computation on the labels, but is only allowed to move the balls around and not compute on their values. Our scheme works in the balls-and-bins model, and thus shows for the first time that the multi-server setting can enable overcoming known lower bounds in the single-server setting for oblivious algorithms. Furthermore, for stable compaction and merging no previous multi-server scheme was known that is asymptotically faster than existing single-server algorithms, even in the weaker setting of computational security. We note finally that our protocols are asymptotically optimal since clearly any correct algorithm has to read the entire array.

1.1 Technical Roadmap

Oblivious sorting is an essential building block in hierarchical ORAM schemes. At a high level, the key idea is to replace oblivious sorting, which costs $O(n \log n)$ time on an array of length $n$, with cheaper, linear-time operations. Indeed, this was also the idea of Lu and Ostrovsky [20], but they apply it to a computationally secure hierarchical ORAM. Earlier single-server ORAM schemes are built from logarithmically many cuckoo hash tables of doubling size. Every time a memory request has been served, one needs to merge multiple stale cuckoo hash tables into a newly constructed cuckoo hash table; this was previously accomplished by oblivious sorting [17, 14, 3]. Lu and Ostrovsky show how to avoid oblivious sorting, by having one permutation server permute the data in linear time, and by having a separate storage server, that is unaware of the permutation, construct a cuckoo hash table from the permuted array in linear time (with the client’s help). Unfortunately, Lu and Ostrovsky’s technique fails in the perfect security context due to its intimate reliance on pseudorandom functions (PRFs) and cuckoo hashing: the former introduces computational assumptions and the latter leads to statistical failures (albeit with negligible probability).

We are, however, inspired by Lu and Ostrovsky’s permutation-storage-separation paradigm (and a similar approach that was described independently by Stefanov and Shi [24]). The key concept here is to have one permutation server that permutes the data, and have operations and accesses be performed by a separate storage server that is unaware of the permutation applied. One natural question is whether we can apply this technique to directly construct a linear-time multi-server oblivious sorting algorithm; unfortunately we are not aware of any way to achieve this. Chan et al. [4] and Tople et al. [26] show that assuming the data is already randomly permuted (and the permutation hidden), one can simply apply any comparison-based sorting algorithm and it would retain obliviousness. Unfortunately, it is well-known that comparison-based sorting must incur $\Omega(n \log n)$ time, and this observation does not extend to non-comparison-based sorting techniques since in general RAM computations on numeric keys can leak information through access patterns.

New techniques at a glance.

We propose two novel techniques that allow us to achieve the stated results, both of which rely on the permutation-storage-separation paradigm:

  • First, we observe that with multiple servers, we can adapt the single-server perfect ORAM scheme by Chan et al. [5] into a new variant such that reshuffling operations, which were realized with oblivious sorting in Chan et al. [5], can now be expressed entirely with merging and stable compaction operations, without oblivious sorting.

  • Despite the known lower bounds in the single-server setting [19], we show that with multiple servers, we can indeed achieve linear-time oblivious merging and oblivious stable compaction. As many have observed earlier [6, 13, 4, 3] merging and compaction are also important building blocks in the design of many oblivious algorithms — we thus believe that our new building blocks are of independent interest.

1.1.1 Stable Compaction and Merging.

We first explain the intuition behind our stable compaction algorithm. For simplicity, for the time being we will consider only 2 servers and assume perfectly secure encryption for free (this assumption can later be removed with secret sharing and by introducing one additional server). Imagine that we start out with an array of length $n$ that is encrypted and resides on one server. The elements in the array are either real or dummy, and we would like to move all dummy elements to the end of the array while preserving the order of the real elements as they appear in the original array. For security, we would like that any single server’s view in the protocol leaks no information about the array’s contents.

Strawman scheme.

An extremely simple strawman scheme is the following: the client makes a scan of the input array on one server; whenever it encounters a real element, it re-encrypts it and writes it to the other server by appending it to the end of the output array (initially the output array is empty). When the entire input array has been consumed, the client pads the output array with an appropriate number of (encrypted) dummy elements.

At first sight, this algorithm seems to preserve security: each server basically observes a linear scan of either the input or the output array, and the perfectly secure encryption hides array contents. However, upon careful examination, the second server can observe the time steps in which a write happened to the output array, and this leaks which elements are real in the original array. Correspondingly, in our formal modeling later (Section 2), each server can observe not only each message sent and received by itself, but also the time steps in which these events occurred.
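The leak in the strawman can be made concrete with a small simulation. This is only an illustrative sketch: `None` stands in for a dummy element, encryption is omitted, and the function name `strawman_compact` and the `write_steps` list (modeling the timing view of the output server) are hypothetical names introduced here.

```python
def strawman_compact(arr):
    """Strawman: scan arr and append each real element to the output array.

    Returns the output array and the time steps at which the output
    server saw a write during the scan (its timing view).
    """
    out, write_steps = [], []
    for step, elem in enumerate(arr):
        if elem is not None:              # None models a dummy element
            out.append(elem)
            write_steps.append(step)      # visible to the output server
    out.extend([None] * (len(arr) - len(out)))  # pad with dummies
    return out, write_steps


# Two inputs with the same real elements but different dummy positions.
out_a, view_a = strawman_compact(["x", None, "y", None])
out_b, view_b = strawman_compact([None, "x", None, "y"])
```

Both runs produce the same compacted output, yet the write times differ (`view_a` versus `view_b`), so the output server can distinguish the two inputs.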

A second try.

For simplicity we will describe our approach with server computation and server-to-server communication — but it is not hard to modify the scheme such that servers are completely passive. Roughly speaking, the idea is for the first server (called the permutation server) to randomly permute all elements and store a permuted array on the second server (called the storage server), such that the permutation is hidden from the storage server. Moreover, in this permuted array, we would like the elements to be tagged with pointers to form two linked lists: a real linked list and a dummy linked list. In both linked lists, the ordering of elements respects that of the original array. If such a permuted array encoding two linked lists can be constructed, the client can simply traverse the real linked list first from the storage server, and then traverse the dummy linked list — writing down each element it encounters on the first server (we always assume re-encryption upon writes). Since the storage server does not know the random permutation and since every element is accessed exactly once, it observes completely random access patterns; and thus it cannot gain any secret information.

The challenge remains as to how to tag each element with the position of the next element in the permuted array. This can be achieved in the following manner: the permutation server first creates a random permutation in linear time (e.g., by employing Fisher-Yates [10]), such that each element in the input array is now tagged with where it wants to be in the permuted array (henceforth called the position label). Now, the client makes a reverse scan of this input array. During this process, it remembers the position labels of the last real element seen and of the last dummy element seen so far; this takes $O(1)$ client-side storage. Whenever a real element is encountered, the client tags it with the position label of the last real seen. Similarly, whenever a dummy is encountered, the client tags it with the position label of the last dummy seen. Now, the permutation server can permute the array based on the predetermined permutation (which can also be done in linear time). At this moment, it sends the permuted, re-encrypted array to the storage server and the linked list can now be traversed from the storage server to read real elements followed by dummy elements.

It is not difficult to see that assuming that the encryption scheme is perfectly secure and every write involves re-encrypting the data, then the above scheme achieves perfect security against any single semi-honest corrupt server, and completes in linear time. Later we will replace the perfectly secure encryption with secret sharing and this requires the introduction of one additional server.
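The data flow of this second attempt can be sketched in a few lines of Python. The sketch is a single-process simulation under simplifying assumptions: encryption and secret sharing are omitted, `None` models a dummy, and the function name `compact_via_linked_lists` and the returned `accesses` list (the storage server's access pattern) are hypothetical names used only for illustration.

```python
import random


def compact_via_linked_lists(arr, rng=random):
    n = len(arr)
    # Permutation server: choose position labels via a random shuffle
    # (random.shuffle implements Fisher-Yates).
    pos = list(range(n))
    rng.shuffle(pos)

    # Client: reverse scan, tagging each element with the position label
    # of the next element of the same kind (real or dummy).
    nxt = [None] * n
    last_real = last_dummy = None
    for i in range(n - 1, -1, -1):
        if arr[i] is not None:
            nxt[i], last_real = last_real, pos[i]
        else:
            nxt[i], last_dummy = last_dummy, pos[i]

    # Permutation server: move element i to location pos[i].
    permuted = [None] * n
    for i in range(n):
        permuted[pos[i]] = (arr[i], nxt[i])

    # Client: traverse the real list, then the dummy list, on the
    # storage server; each location is touched exactly once.
    out, accesses = [], []
    for head in (last_real, last_dummy):
        p = head
        while p is not None:
            accesses.append(p)
            elem, p = permuted[p]
            out.append(elem)
    return out, accesses


out, accesses = compact_via_linked_lists(["a", None, "b", None, "c"])
```

Here `out` is the stably compacted array, while `accesses` visits every location of the permuted array exactly once in an order determined by the hidden permutation.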

Extending the idea for merging.

We can extend the above idea to allow linear-time oblivious merging of two sorted arrays. The idea is to prepare both arrays such that they are in permuted form on the storage server and in a linked list format; and now the client can traverse the two linked lists on the storage server, merging them in the process. In each step of the merging, only one array is being consumed — since the storage server does not know the permutation, it sees random accesses and cannot tell which array is being consumed.
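A minimal sketch of the merging idea follows. It makes one assumption that the text above does not spell out: both sorted arrays are placed into a single jointly permuted array on the storage server, so that an access to either list looks like an access to a random unvisited location. Encryption is again omitted, and `merge_via_linked_lists` and `accesses` are hypothetical names.

```python
import random


def merge_via_linked_lists(a, b, rng=random):
    n = len(a) + len(b)
    pos = list(range(n))
    rng.shuffle(pos)                      # hidden joint permutation

    # Store both sorted arrays in one permuted array, each as a linked list.
    store = [None] * n
    for base, arr in ((0, a), (len(a), b)):
        for j, key in enumerate(arr):
            nxt = pos[base + j + 1] if j + 1 < len(arr) else None
            store[pos[base + j]] = (key, nxt)
    heads = [pos[0] if a else None, pos[len(a)] if b else None]

    accesses = []                         # storage server's view

    def fetch(p):
        if p is None:
            return None
        accesses.append(p)
        return store[p]

    # Client: cache the current head of each list; consume the smaller one.
    cur = [fetch(h) for h in heads]
    out = []
    while cur[0] is not None or cur[1] is not None:
        if cur[1] is None or (cur[0] is not None and cur[0][0] <= cur[1][0]):
            side = 0
        else:
            side = 1
        key, nxt = cur[side]
        out.append(key)
        cur[side] = fetch(nxt)            # one fresh location per step
    return out, accesses


merged, accesses = merge_via_linked_lists([1, 4, 6], [2, 3, 7])
```

Each location is fetched exactly once, and which list a fetch belongs to is determined only by the hidden permutation.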

1.1.2 3-Server Perfectly Secure ORAM.

We now explain the techniques for constructing a 3-server perfectly secure ORAM. A client, with $O(1)$ blocks of local cache, stores $N$ blocks of data (secret-shared) on the 3 servers, one of which might be semi-honest corrupt. In every iteration, the client receives a memory request of the form $(\mathsf{read}, \mathsf{addr})$ or $(\mathsf{write}, \mathsf{addr}, \mathsf{data})$, and it completes this request by interacting with the servers. We would like to achieve $O(\log^2 N)$ amortized bandwidth blowup per logical memory request.

We start out from a state-of-the-art single-server perfectly-secure scheme by Chan et al. [5] that achieves $O(\log^3 N)$ amortized bandwidth blowup per memory request. Their scheme follows the hierarchical ORAM paradigm [12, 11] and meanwhile relies on a standard recursion technique most commonly adopted by tree-based ORAMs [23]. In their construction, there are logarithmically many hierarchical ORAMs (also called position-based ORAMs), where the ORAM at depth $d$ (called the parent depth) stores position labels for the ORAM at depth $d+1$ (called the child depth); and finally, the ORAM at the maximum depth stores the real data blocks.

As it turns out, the most intricate part of Chan et al. [5]’s scheme is the information passing between an ORAM at depth $d$ and its parent ORAM at depth $d-1$. As Chan et al. describe it, all the logarithmically many ORAMs must perform coordinated reshuffles upon every memory request: during a reshuffle, the ORAM at depth $d$ must pass information back to the parent depth $d-1$. More specifically, the depth-$d$ ORAM is aware of the updated position labels for blocks that have been recently visited, and this must be passed back to the parent depth to be recorded there.

More abstractly and somewhat imprecisely, here is a critical building block in Chan et al. [5]: suppose that the parent and the child each has an array of logical addresses and a position label for each address. It is guaranteed by the ORAM construction that all addresses the child has must appear in the parent’s array. Moreover, if some address appears in both the parent and child, then the child’s version is fresher. Now, we would like to combine the information held by the parent and the child, retaining the freshest copy of position label for every address. Chan et al. then relied on oblivious sorting to achieve this goal: if some address is held by both the parent and child, they will appear adjacent to each other in the sorted array; and thus in a single linear scan one can easily cross out all stale copies.

To save a logarithmic factor, we must solve the above problem using only merging and compaction and not sorting. Notice that if both the parent’s and the child’s arrays are already sorted according to the addresses, then the afore-mentioned information propagation from child to parent can be accomplished through merging rather than sorting (in the full scheme we would also need stable compaction to remove dummy blocks in a timely fashion to avoid blowup of array sizes over time). But how can we make sure that these arrays are sorted in the first place without oblivious sorting? In particular, these arrays actually correspond to levels in a hierarchical ORAM in Chan et al. [5]’s scheme, and all blocks in a level must appear in randomly permuted order to allow safe (one-time) accesses — this seems to contradict our desire for sortedness. Fortunately, here we can rely again on the permutation-storage-separation paradigm — for simplicity again we describe our approach for 2 servers assuming perfectly secure (re-)encryption upon every write. The idea is the following: although the storage server is holding each array (i.e., level) in a randomly permuted order, the permutation server will remember an inverse permutation such that when this permutation is applied to the storage server’s copy, sortedness is restored. Thus whenever shuffling is needed, the permutation server would first apply the inverse permutation to the storage server’s copy to restore sortedness, and then we could rely on merging (and compaction) to propagate information between adjacent depths rather than sorting.
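The merge-then-scan logic for combining parent and child information can be illustrated as follows. This is a plain, non-oblivious sketch of the logic only; in the actual scheme each step would be carried out by the oblivious multi-server merging and stable compaction protocols. The function name `combine_parent_child` and the `(address, label)` tuple representation are assumptions made for illustration.

```python
def combine_parent_child(parent, child):
    """Both inputs are lists of (address, position_label), sorted by address.

    Returns one sorted list holding the freshest label per address,
    where the child's copy is considered fresher.
    """
    # Step 1: merge; on equal addresses the child's entry comes first.
    merged, i, j = [], 0, 0
    while i < len(parent) or j < len(child):
        take_child = j < len(child) and (
            i == len(parent) or child[j][0] <= parent[i][0]
        )
        if take_child:
            merged.append(child[j])
            j += 1
        else:
            merged.append(parent[i])
            i += 1

    # Step 2: one linear scan turns stale copies into dummies (None).
    marked = [
        entry if k == 0 or merged[k - 1][0] != entry[0] else None
        for k, entry in enumerate(merged)
    ]

    # Step 3: stable compaction removes the dummies.
    return [entry for entry in marked if entry is not None]


combined = combine_parent_child(
    [(1, "p1"), (2, "p2"), (5, "p5")],
    [(2, "c2"), (5, "c5")],
)
```

In this example the parent's stale entries for addresses 2 and 5 are replaced by the child's copies, and address 1 keeps the parent's label.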

1.2 Related Work

The notion of Oblivious RAM (ORAM) was introduced by the seminal work of Goldreich and Ostrovsky around three decades ago [12, 11]. Their construction used a hierarchy of buffers of exponentially increasing size, which was later known as the hierarchical ORAM framework. Their construction achieved an amortized bandwidth blowup of $O(\log^3 N)$ and was secure against a computationally bounded adversary. Subsequently, several works have improved the bandwidth blowup from $O(\log^3 N)$ to $O(\log^2 N/\log\log N)$ [17, 14, 3, 6] under the same adversarial model. Ajtai [2] was the first to consider the notion of a statistically secure oblivious RAM that achieves $O(\log^3 N)$ bandwidth blowup. This was followed by the statistically secure ORAM construction by Shi et al. [23], who introduced the tree-based paradigm. ORAM constructions in the tree-based paradigm have improved the bandwidth blowup from $O(\log^3 N)$ to $O(\log^2 N)$ [23, 25, 7, 22, 27]. Though the computational assumptions have been removed, the statistically secure ORAMs still fail with a failure probability that is negligibly small in the number of data blocks stored in the ORAM.

Perfectly-secure ORAMs.

Perfectly-secure ORAM was first studied by Damgård et al. [8]. Perfect security requires that a computationally unbounded server does not learn anything other than the number of requests with probability 1. This implies that the oblivious program’s memory access patterns should be identically distributed regardless of the inputs to the program; and thus with probability 1, no information can be leaked about the secret inputs to the program. Damgård et al. [8] achieve an expected simulation overhead and space blowup relative to the original RAM program. Raskin et al. [21] and Demertzis et al. [9] achieve a worst-case bandwidth blowup of and , respectively. Chan et al. [5] improve upon Damgård et al.’s result [8] by avoiding the blowup in space, and by showing a construction that is conceptually simpler. Our construction builds upon Chan et al. and improves the bandwidth blowup to worst-case while assuming three non-colluding servers.

Multi-server ORAMs.

ORAMs in this category assume multiple non-colluding servers to improve bandwidth blowup [20, 1, 16, 15, 18]. A comparison of the relevant schemes is presented in Table 1. Among these, the work that is closely related to ours is by Lu and Ostrovsky [20] which achieves a bandwidth blowup of $O(\log N)$ assuming two non-colluding servers. In their scheme, each server performs permutations for data that is stored by the other server. While their construction is computationally secure, we achieve perfect security for access patterns as well as the data itself. Moreover, our techniques can be used to perform an oblivious tight stable compaction and an oblivious merge operation in linear time; how to perform these operations in linear time was not known even for the computationally secure setting. On the other hand, our scheme achieves an $O(\log^2 N)$ bandwidth blowup and uses three servers. We remark that if we assume a perfectly secure encryption scheme, our construction can achieve perfectly secure access patterns using two servers. Abraham et al. [1], Gordon et al. [15] and Kushilevitz and Mour [18] construct multi-server ORAMs using PIR. Each of these constructions requires the server to perform computation for the PIR operations. While Abraham et al. [1] achieve statistical security for access patterns, other work [15, 18] is only computationally secure. While the work of Gordon et al. achieves a bandwidth blowup of , they require linear-time server computation. Abraham et al. and Kushilevitz and Mour, on the other hand, are poly-logarithmic and logarithmic respectively, both in computation and bandwidth blowup. In comparison, our construction achieves perfect security and requires a passive server (i.e., a server that does not perform any computation) at a bandwidth blowup of $O(\log^2 N)$.

Construction | Bandwidth Blowup | Server Computation | Security
Lu-Ostrovsky [20] | $O(\log N)$ | - | Computational
Gordon et al. [15] | | | Computational
Kushilevitz et al. [18] | | | Computational
Abraham et al. [1] | | | Statistical
Our work | $O(\log^2 N)$ | - | Perfect
Table 1: Comparison with existing multi-server Oblivious RAM schemes for block size . All of the other schemes (including the statistically-secure schemes [1]) require two servers but assume the existence of an unconditionally secure encryption scheme. With a similar assumption, our work would indeed need only two servers too.

2 Definitions

In this section, we revisit how to define multi-server ORAM schemes for the case of semi-honest corruptions. Our definitions require that the adversary, controlling a subset of semi-honest corrupt servers, learns no secret information during the execution of the ORAM protocol. Specifically our adversary can observe all messages transmitted to and from corrupt servers, the rounds in which they were transmitted, as well as communication patterns between honest parties (including the client and honest servers). Our definition generalizes existing works [1] where they assume free encryption of data contents (even when statistical security is desired).

2.1 Execution Model

Protocol as a system of Interactive RAMs.

We consider a protocol between multiple parties including a client, henceforth denoted by , and servers, denoted by , respectively. The client and all servers are Random Access Machines (RAMs) that interact with each other. Specifically, the client or each server has a CPU capable of computation and a memory that supports reads and writes; the CPU interacts with the memory to perform computation. The atomic unit of operation for memory is called a block. We assume that all RAMs can be probabilistic, i.e., they can read a random tape supplying a stream of random bits.

Communication and timing.

We assume pairwise channels between all parties. There are two notions of time in our execution model, CPU cycles and communication rounds. Without loss of generality, henceforth we assume that it takes the same amount of time to compute each CPU instruction and to transmit each memory block over the network to another party (since we can always take the maximum of the two). Henceforth in this paper we often use the word round to denote the time that has elapsed since the beginning of the protocol.

Although we define RAMs on the servers as being capable of performing any arbitrary computation, all of our protocols require the servers to be passive, i.e., the server RAMs only perform read/write operations from the memory stored by it.

2.2 Perfect Security under a Semi-Honest Adversary

We consider the client to be trusted. The adversary can corrupt a subset of the servers (but it cannot corrupt the client). Although our constructions later are secure against any individual corrupt server, we present definitions for the more general case, i.e., when the adversary can control more than one corrupt server.

We consider a semi-honest adversary, i.e., the corrupt servers still honestly follow the protocol; however, we would like to ensure that no undesired information will leak. To formally define security, we need to first define what the adversary can observe in a protocol’s execution.

View of adversary .

Suppose that the adversary controls a subset of the servers — we abuse notation and use to denote the set of corrupt servers. The view of the adversary, denoted by in a random run of the protocol consists of the following:

  1. Corrupt parties’ views, including 1) corrupt parties’ inputs, 2) all randomness consumed by corrupt parties, and 3) an ordered sequence of all messages received by corrupt parties, including which party the message is received from, as well as the round in which each message is received. We assume that these messages are ordered by the round in which they are received, and then by the party from which it is received.

  2. Honest communication pattern: when honest parties (including the client) exchange messages, the adversary observes their communication pattern: including which pairs of honest nodes exchange messages in which round.

We stress that in our model only one block can be exchanged between every pair in a round — thus the above definition effectively allows to see the total length of messages exchanged between honest parties.

Remark 1

We remark that this definition captures a notion of timing patterns along with access patterns. For instance, suppose two servers store two sorted lists that need to be merged. The client performs a regular merge operation to read from the two lists, reading the heads of the lists in each round. In such a scenario, depending on the rounds in which blocks are read from a server, an adversary that corrupts that server can compute the relative ordering of blocks between the two lists.

Defining security in the ideal-real paradigm.

Consider an ideal functionality : upon receiving the input from the client and inputs from each of the servers respectively, and a random string sampled from some distribution, computes

where is the client’s output, and denote the servers’ outputs respectively.

Definition 1 (Perfect security in the presence of a semi-honest adversary)

We say that “a protocol perfectly securely realizes an ideal functionality in the presence of a semi-honest adversary corrupting servers” iff for every adversary that controls up to corrupt servers, there exists a simulator such that for every input vector , the following real- and ideal-world experiments output identical distributions:

  • Ideal-world experiment. Sample at random and compute . Output the following tuple where we abuse notation and use to denote the fact that is corrupt:

  • Real-world experiment. Execute the (possibly randomized) real-world protocol, and let be the outcome of the client and each of the servers respectively. Let denote the view of the adversary in this run. Now, output the following tuple:

Note that throughout the paper, we will define various building blocks that realize different ideal functionalities. The security of all building blocks can be defined in a unified approach with this paradigm. When we compose these building blocks to construct our full protocol, we can prove perfect security of the full protocol in a composable manner. By modularly proving the security of each building block, we can now think of each building block as interacting with an ideal functionality. This enables us to prove the security of the full protocol in the ideal world assuming the existence of these ideal functionalities.

Active-server vs. passive-server protocols.

In active-server protocols, servers can perform arbitrary computation and send messages to each other. In passive-server protocols, the servers act as passive memory and can only answer memory read and write requests from the client; furthermore, there is only client-server communication and servers do not communicate with each other. Passive-server schemes are more widely applicable, since they place weaker requirements on the servers and can also be run unchanged when the servers are able to compute; in fact, all schemes in this paper are in the passive-server paradigm. We stress that all of our security definitions apply to both passive-server and active-server schemes.

2.3 Definition of $k$-Server Oblivious RAM

Ideal logical memory.

The ideal logical memory is defined in the most natural way. There is a memory array consisting of blocks where each block is bits long, and each block is identified by its unique address which takes value in the range .

Initially all blocks are set to 0. Upon receiving , the value of the block residing at address is returned. Upon receiving , the block at address is overwritten with the data value , and its old value (before being rewritten) is returned.
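The ideal logical memory is simple enough to state directly as code. The sketch below is illustrative: the class name `IdealLogicalMemory`, the method names, and the choice of addresses `0` to `n_blocks - 1` are assumptions of this example rather than notation from the definition.

```python
class IdealLogicalMemory:
    """Reference behavior that an ORAM must reproduce exactly."""

    def __init__(self, n_blocks):
        # Initially all blocks are set to 0.
        self.mem = [0] * n_blocks

    def read(self, addr):
        # Return the value of the block residing at addr.
        return self.mem[addr]

    def write(self, addr, data):
        # Overwrite the block at addr and return its old value.
        old = self.mem[addr]
        self.mem[addr] = data
        return old


mem = IdealLogicalMemory(4)
first = mem.write(2, 99)   # old value of a fresh block
second = mem.write(2, 7)   # old value is the previously written data
third = mem.read(2)
```

Perfect correctness then asks that the client's outputs in the real protocol match the outputs of this object on the same request sequence.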

$k$-server ORAM.

A $k$-server Oblivious RAM (ORAM) is a protocol between a client and $k$ servers which realizes an ideal logical memory. The execution of this protocol proceeds in a sequence of iterations: in each iteration, the client receives a logical memory request of the form $(\mathsf{read}, \mathsf{addr})$ or $(\mathsf{write}, \mathsf{addr}, \mathsf{data})$. It then engages in some (possibly randomized) protocol with the servers, at the end of which it produces some output thus completing the current iteration.

We require perfect correctness and perfect security as defined below. We refer to a sequence of logical memory requests as a request sequence for short.

  • Perfect correctness. For any request sequence, with probability 1, all of the client’s outputs must be correct. In other words, we require that with probability 1, all of the client’s outputs must match what an ideal logical memory would have output for the same request sequence.

  • Perfect security under a semi-honest adversary. We say that a -server ORAM scheme satisfies perfect security w.r.t. a semi-honest adversary corrupting servers, iff for every that controls up to servers, and for every two request sequences and of equal length, the views and are identically distributed, where denotes the view of (as defined earlier in Section 2.2) under the request sequence .

Since we require perfect security (and our scheme is based on information-theoretic secret-sharing), our notion resists adaptive corruptions and is composable.

2.4 Resource Assumptions and Cost Metrics

We assume that the client can store blocks while the servers can store blocks. We will use the metric bandwidth blowup to characterize the performance of our protocols. Bandwidth blowup is the (amortized) number of blocks queried in the ORAM simulation to query a single virtual block. We also note that since the servers do not perform any computation, and the client always performs an computation on its storage, an bandwidth blowup also corresponds to an runtime for our protocol.

3 Core Building Blocks: Definitions and Constructions

Imagine that there are three servers denoted $S_0$, $S_1$, and $S_2$, and a client denoted $C$. We use $S_b$ for $b \in \mathbb{Z}_3$ to refer to a specific server. Arithmetic performed on the subscript $b$ is done modulo 3.

3.1 Useful Definitions

Let denote a list of blocks where each block is either a real block containing a payload string and a logical address; or a dummy block denoted . We define sorted and semi-sorted as follows:

  • Sorted: is said to be sorted iff all real blocks appear before dummy ones; and all the real blocks appear in increasing order of their logical addresses. If multiple blocks have the same logical address, their relative order can be arbitrary.

  • Semi-sorted: is said to be semi-sorted iff all the real blocks appear in increasing order of their logical addresses, and ties may be broken arbitrarily. However, the real blocks are allowed to be interspersed by dummy blocks.
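The two definitions above translate into short predicates. In this sketch a real block is modeled as an `(address, payload)` tuple and a dummy block as `None`; these representations and the function names are assumptions made for illustration.

```python
def is_semi_sorted(blocks):
    """Real blocks appear in non-decreasing address order; dummies may be anywhere."""
    addrs = [b[0] for b in blocks if b is not None]
    return all(x <= y for x, y in zip(addrs, addrs[1:]))


def is_sorted(blocks):
    """Semi-sorted, and additionally every real block precedes every dummy."""
    seen_dummy = False
    for b in blocks:
        if b is None:
            seen_dummy = True
        elif seen_dummy:
            return False          # a real block after a dummy
    return is_semi_sorted(blocks)


semi = [(1, "a"), None, (3, "b"), None, (3, "c")]
full = [(1, "a"), (3, "b"), (3, "c"), None, None]
```

Stable compaction is exactly the operation that turns a semi-sorted list such as `semi` into the sorted list `full`.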

Array Notation.

We assume each location of an array stores a block which is a bit-string of length . Given two arrays and , we use to denote the resulting array after performing bitwise-XOR on the corresponding elements at each index of the two arrays; if the two arrays are of different lengths, we assume the shorter array is appended with a sufficient number of zero elements.
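As a small illustration of this convention, the following sketch XORs two arrays element-wise and pads the shorter one with zero elements. Blocks are modeled as Python integers, and the name `xor_arrays` is hypothetical.

```python
from itertools import zip_longest
from typing import List


def xor_arrays(a: List[int], b: List[int]) -> List[int]:
    """Element-wise XOR; the shorter array is treated as padded with zero blocks."""
    return [x ^ y for x, y in zip_longest(a, b, fillvalue=0)]
```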

Permutation Notation.

When a permutation is applied to an array indexed by  to produce , we mean the element currently at location  will be moved to location . When we compose permutations, means that is applied before . We use to denote the identity permutation.
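The "move" convention above can be made concrete with a short sketch. A permutation is represented as a list `pi` in which `pi[i]` is the destination of the element currently at location `i`; the helper names `apply_perm` and `compose` are assumptions introduced for illustration.

```python
from typing import List, Sequence


def apply_perm(pi: Sequence[int], arr: Sequence) -> List:
    """Move the element currently at location i to location pi[i]."""
    out = [None] * len(arr)
    for i, x in enumerate(arr):
        out[pi[i]] = x
    return out


def compose(first: Sequence[int], second: Sequence[int]) -> List[int]:
    """Return the single permutation equivalent to applying `first`, then `second`."""
    return [second[first[i]] for i in range(len(first))]
```

Applying the identity permutation `list(range(n))` leaves the array unchanged.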

Layout.

A layout is a way to store some data on three servers such that the data can be recovered by combining information on the three servers. Recall that the client has only blocks of space, and our protocol does not require the client to store any persistent data.

Whenever some data is stored on a server, informally speaking, we need to ensure two things: 1) The server does not learn the data itself, and 2) The server does not learn which index i of the data is accessed. To ensure the former, we XOR secret-share the data among three servers such that stores . For a server to not learn which index in is accessed, we ensure that the data is permuted, and the access happens to the permuted data. If the data is accessed on the same server that permutes the data, then the index will still be revealed. Thus, for each share , we ensure that one server permutes it and we access it from another server, i.e., each server plays two roles:

  • Each server acts as a storage server for the -th share, and thus it knows .

  • Each server also acts as the permutation server for the -th share, and thus it also knows as well as .

Throughout the paper, a layout is of the following form

where and are stored by server . As mentioned, not only knows its own share () but also the permutation and share of the next server .

Specifically, denote lists of blocks of equal length: we denote . Further, is a permutation stored by server for the list . Unless there is ambiguity, we use to mean applying to three underlying arrays.

The above layout is supposed to store the array that can be recovered by:

Henceforth, given a layout , we say that the layout is sorted (or semi-sorted) iff is sorted (or semi-sorted).

Special Case. Sometimes the blocks secret-shared among , , may be unpermuted, i.e., for each , is the identity permutation . In this case, the layout is

For brevity, the unpermuted layout is also denoted by the abstract array .

Definition 2 (Secret Write)

An abstract array corresponds to some unpermuted layout . We say that the client secretly writes a value to the array at index , when it does the following:

  • Sample random values and independently, and compute .

  • For each , the client writes on server (and ).

Definition 3 (Reconstruct)

Given some layout , the client reconstructs a value from using a tuple of indices, when it does the following:

  • For each , the client reads from server . (It is important that the client reads from , even though is stored in both and .)

  • The reconstructed value is .
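Definitions 2 and 3 can be illustrated with a minimal in-memory sketch. Everything named here is an assumption for illustration: servers are numbered 0, 1, 2, the `Server` class keeps its own share in `own` and its copy of the next server's share in `nxt`, and blocks are modeled as integers of `BLOCK_BITS` bits.

```python
import secrets
from typing import List, Sequence

BLOCK_BITS = 64  # assumed block size for this sketch


class Server:
    """Toy server: `own` is its share; `nxt` is its copy of the next server's share."""

    def __init__(self, n: int) -> None:
        self.own: List[int] = [0] * n
        self.nxt: List[int] = [0] * n


def secret_write(servers: Sequence[Server], i: int, value: int) -> None:
    """Write `value` at index i of the abstract array as three XOR shares."""
    r0 = secrets.randbits(BLOCK_BITS)
    r1 = secrets.randbits(BLOCK_BITS)
    shares = [r0, r1, value ^ r0 ^ r1]
    for b in range(3):
        servers[b].own[i] = shares[b]            # storage server for share b
        servers[(b - 1) % 3].nxt[i] = shares[b]  # previous server also holds share b


def reconstruct(servers: Sequence[Server], idx: Sequence[int]) -> int:
    """Read share b from its storage server at position idx[b] and XOR the three."""
    value = 0
    for b in range(3):
        value ^= servers[b].own[idx[b]]
    return value
```

For an unpermuted layout the three indices coincide, e.g. `reconstruct(servers, (i, i, i))`. Any single share is uniformly distributed, so one server alone learns nothing about the value.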

Protocol Notation.

All protocols are denoted as out ← Prot(sin, cin). Here, sin and cin are, respectively, the server and client inputs to the protocol Prot. Except in an ORAM Lookup, all the outputs out are sent to the server.

3.2 Permute and Unpermute

3.2.1 Non-oblivious random permutation.

Fisher and Yates [10] show how to generate a uniformly random permutation in time steps. This implies that the client can write a random permutation on a server with bandwidth. The permutation is non-oblivious, i.e., the server does learn the permutation generated.
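For reference, a minimal sketch of the Fisher-Yates shuffle follows. The function name and the choice of `random.SystemRandom` as the randomness source are assumptions made for this illustration; in the protocol the client would write the resulting permutation to the server entry by entry.

```python
import random
from typing import List, Optional


def fisher_yates(n: int, rng: Optional[random.Random] = None) -> List[int]:
    """Return a uniformly random permutation of 0..n-1 using n-1 swaps."""
    rng = rng or random.SystemRandom()
    perm = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randint(0, i)  # uniform over 0..i inclusive
        perm[i], perm[j] = perm[j], perm[i]
    return perm
```

Each step touches a constant number of entries, which is why a linear number of steps suffices.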

3.2.2 Definition of .

is a protocol that realizes an ideal functionality as defined below. Intuitively, this functionality takes some unpermuted input layout (i.e., unpermuted secret-shared inputs) and three additional permutations from the three permutation servers . The functionality produces an output such that the three shares are secret-shared again, and the share received by storage server is permuted using . Secret-sharing the data again before applying the new permutations ensures that a storage server does not learn the permutation applied to its share.

  • :

    • Input: Let be the unpermuted layout provided as input. (Recall that and are stored in server .)

      Moreover, for each , has an additional permutation as input (which could be generated by the client for instance).

      The arrays have the same length , for some . The client obtains as the input.

    • Ideal functionality :

      Sample independently and uniformly random of length .

      Now, define , i.e., .

      For each , define .

      The output layout is , and the client’s output is .

3.2.3 Protocol .

The implementation of proceeds as follows:

  1. Mask shares. For each data block, the client first generates block “masks” that sum up to zero, and then applies mask to on server . Specifically, the client does the following, for each :

    • Generate block “masks” that sum up to zero, i.e., sample independent random blocks and , and compute .

    • Apply mask to stored on server , i.e., for each , the client writes on server .

  2. Permute share of and send result to . The client uses to permute a share on the permutation server and then sends this permuted share to the storage server, i.e., for each , the client computes on server , and sends the result to . Each server stores and ; hence, the new layout is achieved.
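The two steps above can be simulated in a single process as follows. This is a sketch under stated assumptions, not the paper's implementation: shares are lists of integers indexed 0, 1, 2, `perms[b]` is the permutation applied to share `b`, the communication between servers is not modeled, and the names `permute_layout` and `recover` are hypothetical.

```python
import secrets
from typing import List, Sequence


def apply_perm(pi: Sequence[int], arr: Sequence[int]) -> List[int]:
    """Move the element at location i to location pi[i]."""
    out = [0] * len(arr)
    for i, x in enumerate(arr):
        out[pi[i]] = x
    return out


def permute_layout(shares, perms, block_bits: int = 64) -> List[List[int]]:
    """Re-mask three XOR shares index by index, then permute share b with perms[b]."""
    n = len(shares[0])
    masked = [list(s) for s in shares]
    for i in range(n):
        m0 = secrets.randbits(block_bits)
        m1 = secrets.randbits(block_bits)
        masks = (m0, m1, m0 ^ m1)  # the three masks XOR to zero
        for b in range(3):
            masked[b][i] ^= masks[b]
    # In the protocol, the permutation server for share b does this and
    # sends the result to the storage server for share b.
    return [apply_perm(perms[b], masked[b]) for b in range(3)]


def recover(perm_shares, perms) -> List[int]:
    """Recover the underlying array: entry i is the XOR of the shares at perms[b][i]."""
    n = len(perm_shares[0])
    return [
        perm_shares[0][perms[0][i]]
        ^ perm_shares[1][perms[1][i]]
        ^ perm_shares[2][perms[2][i]]
        for i in range(n)
    ]
```

Because the masks at each index XOR to zero, the underlying array is unchanged, while each storage server receives a freshly randomized share whose ordering it cannot link to the original.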

Theorem 3.1

The protocol perfectly securely realizes the ideal functionality in the presence of a semi-honest adversary corrupting a single server with bandwidth.

Proof

By construction, the implementation of the protocol applies the correct permutation on each server’s array and re-distributes the secret shares using fresh independent randomness. Hence, the marginal distribution of the protocol’s outputs is exactly the same as that of the ideal functionality.

Fix some and consider the view of the corrupt server . Since the leakage of is empty, we will in fact show that given the inputs and the outputs to server , the view is totally determined and has no more randomness. Hence, given and conditioning on , the view is trivially independent of the outputs of the client and other servers.

Then, given the inputs and the outputs to , a simulator simply returns the view of uniquely determined by and .

The inputs to are the arrays and , and the permutation . The outputs are the arrays and , and also the permutation . We next consider each part of .

  1. Communication Pattern. The communication pattern between the client and all the servers only depends on .

  2. Data Structure. We next analyze the intermediate data that is observed by . The arrays and are in ’s outputs. Hence, it suffices to consider the intermediate array , which is totally determined by the outputs.

We have shown that the view is actually a deterministic function of the inputs and the outputs of , as required.

Efficiency.

Recall that it takes time to process one block. From the construction, it is straightforward that linear scans are performed on the relevant arrays.

In particular, the two steps – masking shares and permuting the shares – can be done with bandwidth. Moreover, the client can generate a permutation on the server with bandwidth, when each block has bits, using the Fisher-Yates algorithm [10].

3.2.4 Definition of .

is a protocol that realizes an ideal functionality as defined below. Intuitively, this functionality reverses the effect of . It takes some permuted input layout, and returns the corresponding unpermuted layout. However, to prevent each server from learning its original permutation, the contents of each entry need to be secret-shared again.

  • :

    • Input: Let be the layout provided as input. (Recall that and are stored in server .)

      The arrays have the same length , for some . The client obtains as the input.

    • Ideal functionality :

      Sample independently and uniformly random of length .

      Now, define , i.e., .

      The output layout is , and the client’s output is .

3.2.5 Protocol .

The implementation of proceeds as follows:

  1. Compute inverse permutations. For each , the client computes the inverse permutation on server .

  2. Mask shares. For each data block, the client generates block “masks” that sum up to zero and then applies the mask to on server . Specifically, the client performs the following, for each :

    • Generate block “masks” that sum up to zero, i.e., sample independent random blocks and , and compute .

    • Apply mask to stored on server , i.e., for each , the client writes on server .

  3. For each , the server sends to .

    Hence, the new layout is achieved.
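The three steps above can likewise be simulated in one process. This sketch makes several assumptions for illustration: shares are lists of integers indexed 0, 1, 2, entry `i` of the underlying array has its share `b` stored at position `perms[b][i]`, the communication between servers is not modeled, and the names `invert` and `unpermute_layout` are hypothetical.

```python
import secrets
from typing import List, Sequence


def invert(pi: Sequence[int]) -> List[int]:
    """Return the inverse permutation: inv[pi[i]] = i."""
    inv = [0] * len(pi)
    for i, p in enumerate(pi):
        inv[p] = i
    return inv


def unpermute_layout(perm_shares, perms, block_bits: int = 64) -> List[List[int]]:
    """Undo perms[b] on share b after re-masking every logical entry."""
    n = len(perm_shares[0])
    inverses = [invert(p) for p in perms]          # step 1
    masked = [list(t) for t in perm_shares]
    for i in range(n):                             # step 2
        m0 = secrets.randbits(block_bits)
        m1 = secrets.randbits(block_bits)
        masks = (m0, m1, m0 ^ m1)  # the three masks XOR to zero
        for b in range(3):
            masked[b][perms[b][i]] ^= masks[b]
    out = []
    for b in range(3):                             # step 3
        share = [0] * n
        for j in range(n):
            share[inverses[b][j]] = masked[b][j]
        out.append(share)
    return out
```

After this call the three output shares are aligned, so entry `i` of the underlying array is the XOR of position `i` in each share.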

Theorem 3.2

The protocol perfectly securely realizes the ideal functionality in the presence of a semi-honest adversary corrupting a single server with bandwidth blowup.

Proof

The proof is essentially the same as that of Theorem 3.1. It can be checked that for each , the view is a deterministic function of the inputs and the outputs of .

3.3 Stable Compaction

3.3.1 Definition of .

is a protocol that realizes an ideal functionality , as defined below:

  • :

    • Input layout: A semi-sorted, unpermuted layout denoted .

    • Ideal functionality : computes ; it then moves all dummy blocks in to the end of the array, while keeping the relative order of real blocks unchanged.

      Now, randomly samples of appropriate length and computes such that . The output layout is a sorted, unpermuted layout .

3.3.2 Protocol.

The input is a semi-sorted, unpermuted layout, and we would like to turn it into a sorted, unpermuted layout obliviously. The key idea is to permute each share of the list (stored on the 3 servers respectively), such that the storage server for each share does not know the permutation. Now, the client accesses all real elements in a sorted order, and then accesses all dummy elements, writing down the elements in a secret-shared manner as the accesses are made. We can achieve this if each real or dummy element is tagged with a pointer to its next element, and the pointer is in fact a 3-tuple that is also secret-shared on the 3 servers: each element in the 3-tuple indicates where the next element is in one of the 3 permutations.

Therefore, the crux of the algorithm is to tag each (secret-shared) element with a (secret-shared) position tuple, indicating where its next element is. This effectively creates two linked list structures (one for real and one for dummy elements): each element in the linked lists is secret-shared into 3 shares, and each share resides on its storage server at an independent random location.

The detailed protocol is as follows:

  1. First, each server acts as the permutation server for . Thus, the client generates a random permutation on the permutation server using the Fisher-Yates algorithm described in Section 3.2.1. Basically, for each index of the original list the client writes down, on each , that its -th share (out of 3 shares), wants to be in position .

  2. Next, the client makes a reverse scan of for down to . The client can access by talking to . In this reverse scan, the client always locally remembers the position tuple of the last real element encountered (henceforth denoted ) and the position tuple of the last dummy element encountered (henceforth denoted ). Thus, if is the last seen real element, then the client remembers . is updated analogously. Initially, and are set to .

    During this scan, whenever a real element is encountered, the client secretly writes the link , i.e., represents secret shares of the next pointers for the real element and itself represents an abstract linked list of real elements. The links for dummy elements are updated analogously using .

    At the end of this reverse scan, the client remembers the position tuple for the first real element of the linked list, denoted , and the position tuple for the first dummy element, denoted .

  3. Next, we call inputting 1) the original layout — but importantly, now each element is tagged with a position tuple (that is also secret shared); and 2) the three permutations chosen by each (acting as the permutation server for ). Thus, is applied to the combined layout , where has input permutation . Let the output of Permute be denoted by .

  4. Finally, the client traverses first the real linked list (whose start position tuple is ) and then the dummy linked list (whose start position tuple is ). During this traversal, the client secretly writes each element encountered to produce the sorted and unpermuted output layout.

    More precisely, the client secretly writes an abstract array element by element. Start with and .

    The client reconstructs element and the next pointer of the linked list ; the client secretly writes to the abstract array .

    Then, it updates and , and continues to the next element; if the end of the real list is reached, then it sets . This continues until the whole (abstract) is secretly written to the three servers.

  5. The new layout is constructed.
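The linked-list idea behind the steps above can be sketched as a plaintext simulation. Several simplifications are assumptions made here for brevity: blocks and links are kept in the clear rather than secret-shared, three identically filled but differently permuted arrays stand in for the three shares, real blocks are `(address, payload)` pairs, dummies are `None`, and `None` also marks the end of a linked list. The sketch shows only the data flow, not the security properties.

```python
import random
from typing import List, Optional, Tuple

Block = Optional[Tuple[int, str]]
Pos = Optional[Tuple[int, int, int]]


def stable_compact(blocks: List[Block], rng=random) -> List[Block]:
    n = len(blocks)
    # Step 1: one random permutation per share.
    perms = [rng.sample(range(n), n) for _ in range(3)]
    # Step 2: reverse scan, linking each element to the next one of its kind.
    links: List[Pos] = [None] * n
    next_real: Pos = None
    next_dummy: Pos = None
    for i in range(n - 1, -1, -1):
        pos = (perms[0][i], perms[1][i], perms[2][i])
        if blocks[i] is not None:
            links[i], next_real = next_real, pos
        else:
            links[i], next_dummy = next_dummy, pos
    # Step 3: permute the (block, link) pairs; copy b uses perms[b].
    permuted = []
    for p in perms:
        arr = [None] * n
        for i in range(n):
            arr[p[i]] = (blocks[i], links[i])
        permuted.append(arr)
    # Step 4: traverse the real list, then the dummy list, writing the output.
    out: List[Block] = []
    in_real = next_real is not None
    pos = next_real if in_real else next_dummy
    while pos is not None:
        # In the protocol, the three reads below would be XORed together.
        copies = [permuted[b][pos[b]] for b in range(3)]
        block, nxt = copies[0]
        out.append(block)
        if nxt is None and in_real:
            in_real = False
            nxt = next_dummy  # switch to the dummy list
        pos = nxt
    return out
```

For example, `stable_compact([None, (1, "a"), None, (3, "b")])` yields the two real blocks in their original order followed by two dummies. Each storage position is read exactly once, in an order determined by a permutation that the storage server does not know.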

Theorem 3.3

The protocol perfectly securely realizes the ideal functionality in the presence of a semi-honest adversary corrupting a single server with bandwidth.

Proof

By construction, the protocol correctly removes dummy elements and preserves the original order of real elements, where the secret shares are re-distributed using independent randomness. Hence, the marginal distribution on the outputs is the same for both the protocol and the ideal functionality.

We fix the inputs of all servers, and some . The goal is to show that (1) the view follows a distribution that is totally determined by the inputs and the outputs of the corrupt ; (2) conditioning on , is independent of the outputs of the client and all other servers.

The second part is easy, because the inputs are fixed. Hence, conditioning on (which includes and ), has no more randomness and totally determines the outputs of other servers.

To prove the first part, our strategy is to decompose into a list of components, and show that fixing and conditioning on and a prefix of the components, the distribution of the next component can be determined. Hence, this also gives the definition of a simulator.

First, observe that in the last step, the client re-distributes the shares, and gives output  to ; moreover, the shares of are generated with fresh independent randomness. Hence, the distribution of the part of excluding is independent of . We consider the components of in the following order, which is also how a simulator generates a view after seeing .

  1. Communication Pattern. Observe that from the description of the algorithm, the communication pattern between the client and the servers depends only on the length  of the input array.

  2. Random permutation . This is independently generated using fresh randomness.

  3. Link creation. The (abstract) array is created by reverse linear scan. The shares and received by are generated by fresh independent randomness.

  4. subroutine. By the correctness of the , the shares of the outputs received by follow an independent and uniform random distribution. By the perfect security of , the component of due to depends only on the inputs (which include the shares of and ) and the outputs of the subroutine.

  5. List traversal. Since does not know (which is generated using independent randomness by ), from ’s point of view, its array is traversed in an independent and uniform random order.

Therefore, we have described a simulator procedure that samples the view step by step, given and .

Efficiency.

Each of the steps in the protocol can be executed with a bandwidth of . Step 1 can be performed using the Fisher-Yates shuffle algorithm. In steps 2 and 4, the client linearly scans the abstract lists and the links . Accessing each array costs bandwidth. Finally, step 3 invokes , which requires bandwidth (Theorem 3.1).

3.4 Merging

3.4.1 Definition of .

is a protocol that realizes an ideal functionality as defined below:

  • :

    • Input layout: Two semi-sorted, unpermuted layouts denoted and denoting abstract lists and , where all the arrays have the same length .

    • Ideal functionality : First, merges the two lists and , such that the resulting array is sorted with all dummy blocks at the end. Let be this merged result. Now, randomly samples and independently of appropriate length and computes such that . The output layout is a sorted, unpermuted layout .

3.4.2 Protocol.

The protocol receives as input two semi-sorted, unpermuted layouts and produces a merged, sorted, unpermuted layout as the output. The key idea is to permute the concatenation of the two semi-sorted inputs such that the storage servers do not know the permutation. Now, the client accesses real elements in both lists in the sorted order using the storage servers to produce a merged output. Since the concatenation of the two lists is permuted as a whole, the merge operation does not reveal which list an accessed element comes from, thereby allowing us to merge the two lists obliviously. In order to access the two lists in a sorted order, the client creates a linked list of real and dummy elements using the permutation servers, similar to the StableCompact protocol in Section 3.3.

The detailed protocol works as follows:

  1. First, the client concatenates the two abstract lists and to obtain an abstract list of size , i.e., we interpret as the concatenation of and for each . Specifically, corresponds to and corresponds to .

  2. Now, each server acts as the permutation server for . The client generates a random permutation on server using the Fisher-Yates algorithm described in Section 3.2.1. represents the position of the -th share and is stored on server .

  3. The client now performs a reverse scan of for down to . During this reverse scan, the client always locally remembers the position tuples of the last real element and last dummy element encountered for both the lists. Let them be denoted by , , , and . Thus, if is the last seen real element from the first list, the client remembers . The other position tuples are updated analogously. Each of these tuples is initially set to .

    During the reverse scan, the client maintains an abstract linked list in the following manner. When is processed, if it is a real element from the first list, then the client secretly writes the link . represents secret shares of the next pointers for a real element from the first list. The cases for , and are analogous.

    At the end of this reverse scan, the client remembers the position tuple for the first real and first dummy elements of both linked lists. They are denoted by , , , and .

  4. We next call to the combined layout , where each server has input , to produce as output.

  5. The linked lists can now be accessed using the four position tuples , , , and . The client first starts accessing real elements in the two lists using and to merge them. When a real list ends, it starts accessing the corresponding dummy list.

    More precisely, the client secretly writes the merged result to the abstract output array .

    Start with , , .

    For each , the client reconstructs and at most once, i.e., if and have already been reconstructed once with the tuple , then they will not be reconstructed again.

    If should appear before , then the client secretly writes and updates , ; if the end of the real list is reached, then it updates . The case when should appear before is analogous.

    The next element is processed until the client has secretly constructed the whole abstract array .

  6. The new merged layout is produced.
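The merge steps above can be sketched as a plaintext simulation in the same spirit as stable compaction. The simplifications are assumptions made for brevity: blocks and links are kept in the clear rather than secret-shared, three differently permuted copies stand in for the three shares, real blocks are `(address, payload)` pairs, dummies are `None`, and the name `merge_semi_sorted` is hypothetical.

```python
import random
from typing import List, Optional, Tuple

Block = Optional[Tuple[int, str]]


def merge_semi_sorted(t0: List[Block], t1: List[Block], rng=random) -> List[Block]:
    t = t0 + t1                                   # step 1: concatenate
    total = len(t)
    perms = [rng.sample(range(total), total) for _ in range(3)]  # step 2
    # Step 3: reverse scan building four linked lists, keyed by (list id, is_real).
    links = [None] * total
    heads = {}
    for i in range(total - 1, -1, -1):
        key = (0 if i < len(t0) else 1, t[i] is not None)
        links[i] = heads.get(key)
        heads[key] = (perms[0][i], perms[1][i], perms[2][i])
    # Step 4: permute the (block, link) pairs; copy b uses perms[b].
    permuted = []
    for p in perms:
        arr = [None] * total
        for i in range(total):
            arr[p[i]] = (t[i], links[i])
        permuted.append(arr)

    def rank(front):
        """Real blocks by address first, then dummies, then exhausted lists."""
        if front is None:
            return (2, 0)
        return (1, 0) if front[0] is None else (0, front[0][0])

    # Step 5: traverse, reading every position exactly once.
    in_real = [(k, True) in heads for k in (0, 1)]
    pos = [heads.get((k, True)) if in_real[k] else heads.get((k, False)) for k in (0, 1)]
    front = [None, None]
    out: List[Block] = []
    for _ in range(total):
        for k in (0, 1):
            if front[k] is None and pos[k] is not None:
                # In the protocol, the three reads would be XORed together.
                front[k] = [permuted[b][pos[k][b]] for b in range(3)][0]
        k = 0 if rank(front[0]) <= rank(front[1]) else 1
        block, nxt = front[k]
        out.append(block)
        if nxt is None and in_real[k]:
            in_real[k] = False
            nxt = heads.get((k, False))           # switch to this list's dummies
        pos[k], front[k] = nxt, None
    return out
```

The `front` cache mirrors the requirement that each element is reconstructed at most once: an element that loses a comparison stays cached until it is written to the output.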

Theorem 3.4

The protocol perfectly securely realizes the ideal functionality in the presence of a semi-honest adversary corrupting a single server with bandwidth.

Proof

We follow the same strategy as in Theorem 3.3. Again, from the construction, the protocol performs merging correctly and re-distributes secrets using independent randomness. Hence, the marginal distribution of the outputs is the same for both the protocol and the ideal functionality.

We fix the inputs of all servers, and some . Recall that the goal is to show that (1) the view follows a distribution that is totally determined by the inputs and the outputs of ; (2) conditioning on , is independent of the outputs of the client and all other servers.

The second part is easy, because the inputs are fixed. Hence, conditioning on (which includes and ), has no more randomness and totally determines the outputs of other servers.

To prove the first part, our strategy is to decompose into a list of components, and show that fixing and conditioning on and a prefix of the components, the distribution of the next component can be determined. Hence, this also gives the definition of a simulator.

First, observe that in the last step, the client re-distributes the shares of , and gives output  (including and ) to ; moreover, the shares of are generated with fresh independent randomness. Hence, the distribution of the part of excluding is independent of . We consider the components of in the following order, which is also how a simulator generates a view after seeing .

  1. Communication Pattern. Observe that from the description of the algorithm, the communication pattern between the client and the servers depends only on the length  of the input arrays.

  2. Random permutation . This is independently generated using fresh randomness.

  3. Link creation. The (abstract) array is created by reverse linear scan. The shares and received by are generated by fresh independent randomness.

  4. subroutine. By the correctness of the , the shares of the outputs received by follow an independent and uniform random distribution. By the perfect security of , the component of due to depends only on the inputs (which include the shares of and ) and the outputs of the subroutine.

  5. List traversal. Since does not know (which is generated using independent randomness by ), from ’s point of view, while the elements from the two underlying lists are being merged, its array is traversed in an independent and uniform random order.

Therefore, we have described a simulator procedure that samples the view step by step, given and .

Efficiency.

The analysis for the protocol is similar to that for Theorem 3.3 except that the operations are performed on lists of size instead of .

4 Three-Server One-Time Oblivious Memory

We construct an abstract datatype to process non-recurrent memory lookup requests, i.e., between rebuilds of the data structure, each distinct address is requested at most once. Our abstraction is similar to the perfectly secure one-time oblivious memory by Chan et al. [5]. However, while Chan et al. only consider perfect security with respect to the access pattern, our three-server one-time memory additionally encrypts the data itself information-theoretically. Thus, in [5], since the algorithm does not provide guarantees for the data itself, it can modify the data structure while performing operations. In contrast, our one-time oblivious memory is a read-only data structure. In this data structure, we assume every request is tagged with a position label indicating which memory location to look up in each of the servers. In this section, we assume that such a position is magically available during lookup; in subsequent sections we show how this data structure can be maintained and provided during a lookup.

4.1 Definition: Three-server One-Time Oblivious Memory

Our (three-server) one-time oblivious memory supports three operations: 1) Build, 2) Lookup, and 3) Getall. Build is called once upfront to create the data structure: it takes in a set of data blocks (each tagged with its logical address) and permutes shares of the data blocks at each of the servers to create a data structure that facilitates subsequent lookups from the servers. Once the data structure is built, lookup operations can be performed on it. Each lookup request consists of a logical address to look up and a position label for each of the three servers, thereby enabling them to perform the lookup operation. The lookup can be performed for a real logical address, in which case the logical address and the position labels for each of the three servers are provided; or it can be a dummy request, in which case is provided. Finally, a Getall operation is called to obtain a list of all the blocks that were provided during the Build operation. Later, in our ORAM scheme, the elements in the list will be combined with those in other lists to construct a potentially larger one-time oblivious memory.

Our three-server one-time oblivious memory maintains obliviousness as long as 1) for each real block in the one-time memory, a lookup is performed at most once, 2) at most total lookups (all of which could potentially be dummy lookups) are performed, and 3) no two servers collude with each other to learn the shares of the other server.

4.1.1 Formal Definition.

Our three-server one-time oblivious memory scheme is parameterized by , the number of memory lookup requests supported by the data structure. It is comprised of the following randomized, stateful algorithms:

  • :

    • Input: A sorted, unpermuted layout denoted representing an abstract sorted list . represents a key-value pair that is either real, containing a real address and value , or dummy, containing a . The list is sorted by the key . The client’s input is .

    • Functionality: The Build algorithm creates a layout of size that will facilitate subsequent lookup requests; intuitively, extra dummy elements are added, and the ’s maintain a singly-linked list for these dummy elements. Moreover, the tuple of head positions is secret-shared among the three servers.

      It also outputs a sorted list of key-value pairs sorted by where each ; the invariant is that if , then the data for is .

      The output list is stored as a sorted, unpermuted layout . Every real key from appears exactly once in and the remaining entries of are ’s. The client’s output is .

      Later in our scheme, will be propagated back to the corresponding data structure with preceding recursion depth during a coordinated rebuild. Hence, does not need to carry the value ’s.