We demonstrate a simple, statistically secure ORAM with computational overhead ; previous ORAM protocols achieve only computational security (under computational assumptions) or require overhead. An additional benefit of our ORAM is its conceptual simplicity, which makes it easy to implement in both software and (commercially available) hardware.
Our construction is based on recent ORAM constructions due to Shi, Chan, Stefanov, and Li (Asiacrypt 2011) and Stefanov and Shi (ArXiv 2012), but with some crucial modifications in the algorithm that simplify the ORAM and enable our analysis. A central component in our analysis is reducing the analysis of our algorithm to a “supermarket” problem; of independent interest (and of importance to our analysis), we provide an upper bound on the rate of “upset” customers in the “supermarket” problem.
In this paper we consider constructions of Oblivious RAM (ORAM) [9, 10]. Roughly speaking, an ORAM enables executing a RAM program while hiding the access pattern to the memory. ORAMs have several fundamental applications (see e.g. [10, 19] for further discussion). Since the seminal works of Goldreich  and Goldreich and Ostrovsky , constructions of ORAM have been extensively studied (see e.g., [27, 28, 1, 20, 11, 5, 22, 2, 12, 24, 14]). While the original constructions only enjoyed “computational security” (under the assumption that one-way functions exist) and required a computational overhead of , more recent works have overcome both of these barriers, but only individually. State of the art ORAMs satisfy either of the following:
A natural question is whether both of these barriers can be simultaneously overcome; namely, does there exist a statistically secure ORAM with only overhead? In this work we answer this question in the affirmative, demonstrating the existence of such an ORAM.
There exists a statistically-secure ORAM with worst-case computational overhead, constant memory overhead, and CPU cache size , where is the memory size.
An additional benefit of our ORAM is its conceptual simplicity, which makes it easy to implement in both software and (commercially available) hardware. (A software implementation is available from the authors upon request.)
Our ORAM Construction
A conceptual breakthrough in the construction of ORAMs appeared in the recent work of Shi, Chan, Stefanov, and Li . This work demonstrated a statistically secure ORAM with overhead using a new “tree-based” construction framework, which admits significantly simpler (and thus easier to implement) ORAM constructions (see also [7, 4] for instantiations of this framework which additionally enjoy an extremely simple proof of security).
At a high level, each memory cell accessed by the original RAM will be associated with a random leaf in a binary tree; the position is specified by a so-called “position map” . Each node in the tree consists of a “bucket” which stores up to elements. The content of memory cell will be found inside one of the buckets along the path from the root to the leaf ; originally, it is put into the root, and later on, the content gets “pushed-down” through an eviction procedure. For instance, in the ORAM of  (upon which we rely), the eviction procedure consists of “flushing” down memory contents along a random path, while ensuring that each memory cell is still found on its appropriate path from the root to its assigned leaf. (Furthermore, each time the content of a memory cell is accessed, the content is removed from the tree, the memory cell is assigned to a new random leaf, and the content is put back into the root).
In the work of  and its follow-ups [7, 4], for the analysis to go through, the bucket size is required to be . Stefanov and Shi  recently provided a different instantiation of this framework which only uses constant size buckets, but instead relies on a single size “stash” into which potential “overflows” (of the buckets in the tree) are put; Stefanov and Shi conjectured (but did not prove) security of such a construction (when appropriately evicting elements from the “stash” along the path traversed to access some memory cell). (Footnote 2: Although different, the “flush” mechanism in  is inspired by this eviction method.)
In this work, we follow the above-mentioned approaches, but with the following high-level modifications:
We consider a binary tree where the bucket size of all internal buckets is , but all the leaf nodes still have bucket size .
As in , we use a “stash” to store potential “overflows” from the bucket. In our ORAM we refer to this as a “queue” as the main operation we require from it is to insert and “pop” elements (as we explain shortly, we additionally need to be able to find and remove any particular element from the queue; this can be easily achieved using a standard hash table). Additionally, instead of inserting memory cells directly into the tree, we insert them into the queue. When searching for a memory cell, we first check whether the memory cell is found in the queue (in which case it gets removed), and if not, we search for the memory cell in the binary tree along the path from the root to the position dictated by the position map.
Rather than just “flushing” once (as in ), we repeat the following “pop and random flush” procedure twice.
We “pop” an element from the queue into the root.
Next, we flush according to a geometrically distributed random variable with expectation 2. (Footnote 3: Looking forward, our actual flush is a little bit different than the one in  in that we only pull down a single element between any two consecutive nodes along the path, whereas in  all elements that can be pulled down get flushed down.)
We demonstrate that such an ORAM construction is both (statistically) secure, and only has overhead.
The key element in our analysis is reducing the security of our ORAM to a “supermarket” problem. Supermarket problems were introduced by Mitzenmacher  and have since been well studied (see e.g., [16, 26, 18, 21, 17]). We here consider a simple version of a supermarket problem, but ask a new question: what is the rate of “upset” customers in a supermarket problem? There are cashiers in the supermarket, all of which have empty queues in the beginning of the day. At each time step : with probability a new customer arrives and chooses a random cashier (and puts himself in that cashier’s queue); otherwise (i.e., with probability ) a random cashier is chosen that “serves” the first customer in its queue (and the queue size is reduced by one). We say that a customer is upset if he chooses a queue whose size exceeds some bound . What is the rate of upset customers? (Footnote 4: Typically, in supermarket problems the customer chooses random cashiers and picks the one with the smallest queue; we here focus on the simple case when .) (Footnote 5: Although we here consider a discrete-time version of the supermarket problem (since this is the most relevant for our application), as we remark in Remark 1, our results apply also to the more commonly studied continuous-time setting.)
We provide an upper bound on the rate of upset customers relying on Chernoff bounds for Markov chains [8, 13, 15, 3]. More specifically, we develop a variant of traditional Chernoff bounds for Markov chains which apply also with “resets” (where at each step, with some small probability, the distribution is reset to the stationary distribution of the Markov chain), which may be of independent interest, and show how such a Chernoff bound can be used in a rather straightforward way to provide a bound on the number of upset customers.
Intuitively, to reduce the security of our ORAM to the above-mentioned supermarket problem, each cashier corresponds to a bucket on some particular level in the tree, the bound corresponds to the bucket size, customers correspond to elements being placed in the buckets, and upset customers correspond to overflows. Note that for this translation to work it is important that the number of flushes in our ORAM is geometrically distributed; this ensures that we can view the sequence of operations (i.e., “flushes” that decrease bucket sizes, and “pops” that increase bucket sizes) as independently distributed as in the supermarket problem.
In a very recent independent work, Stefanov, van Dijk, Shi, Fletcher, Ren, Yu, and Devadas  prove security of the conjectured Path ORAM of . This yields an ORAM with overhead , whereas our ORAM has overhead . On the other hand, the data structure required to implement our queue is simpler than the one needed to implement the “stash” in the Path ORAM construction. More precisely, we simply need a standard queue and a standard hash table (both of which can be implemented using commodity hardware), whereas the “stash” in [23, 25] requires using a data structure that additionally supports “range queries”, and thus a binary search tree is needed, which may make implementations more costly. We leave a more complete exploration of the benefits of the different approaches for future work.
A Random Access Machine (RAM) with memory size consists of a CPU with a small size cache (e.g., can store a constant or number of words) and an “external” memory of size . To simplify notation, a word is either or a bit string.
The CPU executes a program (given and some input ) that can access the memory by a and operations where is an index to a memory location, and is a word (of size ). The sequence of memory cell accesses by such read and write operations is referred to as the memory access pattern of and is denoted . (The CPU may also execute “standard” operations on the registers, and may generate outputs).
A polynomial-time algorithm is an Oblivious RAM (ORAM) compiler with computational overhead and memory overhead , if given and a deterministic RAM program with memory-size outputs a program with memory-size such that for any input , the running-time of is bounded by where is the running-time of , and there exists a negligible function such that the following properties hold:
Correctness: For any and any string , with probability at least , .
Obliviousness: For any two programs , , any and any two inputs if , then is -close to in statistical distance, where and .
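The closeness notion used in the obliviousness condition can be made concrete with a small computation. The following is a minimal sketch, assuming the standard definition of statistical (total variation) distance between two finite distributions; the function name and the toy distributions over access patterns are hypothetical and only serve as an illustration.

```python
from fractions import Fraction

def statistical_distance(p, q):
    """Statistical (total variation) distance between two finite distributions.

    p and q map outcomes (here: toy access patterns) to probabilities.
    The distance is half the L1 distance between the two probability vectors.
    """
    support = set(p) | set(q)
    return sum(abs(p.get(x, 0) - q.get(x, 0)) for x in support) / 2

# Hypothetical access-pattern distributions of two compiled programs.
uniform = {"00": Fraction(1, 4), "01": Fraction(1, 4),
           "10": Fraction(1, 4), "11": Fraction(1, 4)}
skewed = {"00": Fraction(1, 2), "01": Fraction(1, 4), "10": Fraction(1, 4)}
```

An ORAM compiler in the sense above would need this distance to be negligible for any two executions with access patterns of equal length; the toy distributions here are deliberately far apart.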
Note that the above definition (just as the definition of ) only requires an oblivious compilation of deterministic programs . This is without loss of generality: we can always view a randomized program as a deterministic one that receives random coins as part of its input.
3 Algorithm for the ORAM.
Our ORAM data structure serves as a “big” memory table of size and exposes the following two interfaces.
: the algorithm returns the value of memory cell .
: the algorithm writes value to memory cell .
We start by assuming that the ORAM is executed on a CPU with cache size (in words) for a suitably large constant (the reader may imagine ). Following the framework in , we can then reduce the cache size to by recursively applying the ORAM construction; we provide further details on this transformation at the end of the section.
In what follows, we group each consecutive memory cells in the RAM into a block and will thus have blocks in total. We also index the blocks in the natural way, i.e. the block that contains the first memory cells in the table has index and in general the -th block contains memory cells with addresses from to .
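The grouping of memory cells into blocks amounts to simple integer arithmetic. The sketch below assumes 0-based cell addresses and block indices (the indexing convention of the text may differ by one) and uses `alpha` as a hypothetical name for the number of cells per block.

```python
def block_index(cell_address: int, alpha: int) -> int:
    """Index of the block containing a cell (0-based indexing is an assumption)."""
    return cell_address // alpha

def block_cells(index: int, alpha: int) -> range:
    """Addresses of the alpha consecutive cells stored in the given block."""
    return range(index * alpha, (index + 1) * alpha)

def offset_in_block(cell_address: int, alpha: int) -> int:
    """Position of the cell inside its block."""
    return cell_address % alpha
```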
Our algorithm will always be operating at the block level, i.e. memory cells in the same block will always be read/written together. In addition to the content of its memory cells, each block is associated with two extra pieces of information. First, it stores the index of the block. Second, it stores a “position” that specifies its storage “destination” in the external memory, which we elaborate upon in the forthcoming paragraphs. In other words, a block is of the form , where is the content of its memory cells.
Our ORAM construction relies on the following three main components.
A full binary tree in the external memory that serves as the primary medium to store the data.
A position map in the internal cache that helps us to search for items in the binary tree.
A queue in the internal cache that is the secondary venue to store the data.
We now walk through each of the building blocks in detail.
The full binary tree . The depth of this full binary tree is set to be the smallest so that the number of leaves is at least (i.e., ). (In [22, 4] the number of leaves was set to ; here, we instead follow  and make the tree slightly smaller, which makes the memory overhead smaller.) We index nodes in the tree by a binary string of length at most , where the root is indexed by the empty string , and each node indexed by has left and right children indexed and , respectively. Each node is associated with a bucket. A bucket in an internal node can store up to blocks, and a bucket in a leaf can store up to blocks, where and are parameters to be determined later. The tree shall support the following two atomic operations:
: the tree will return all the blocks in the bucket associated with to the cache.
: the input is a node and an array of blocks (that will fit into the bucket in node ). This operation will replace the bucket in the node by .
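The binary-string indexing of nodes makes path computations straightforward. The following is a minimal sketch, assuming nodes are represented as Python strings over `"0"` and `"1"` with the empty string as the root; the helper names are hypothetical.

```python
def children(node: str) -> tuple:
    """Left and right children of a node: append one bit to its index."""
    return node + "0", node + "1"

def path_to_leaf(leaf: str) -> list:
    """All nodes on the path from the root (empty string) to the leaf, inclusive."""
    return [leaf[:i] for i in range(len(leaf) + 1)]

def on_path(node: str, leaf: str) -> bool:
    """A node lies on the root-to-leaf path exactly when its index is a prefix of the leaf."""
    return leaf.startswith(node)
```

With this representation, a block with assigned leaf `pos` may be stored in node `v` exactly when `on_path(v, pos)` holds.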
The position map . This data structure is an array that maps the indices of the blocks to leaves in the full binary tree. Specifically, it supports the following atomic operations:
: this function returns the position that corresponds to the block with index .
: this function writes the position to .
The queue . This data structure stores a queue of blocks with maximum size , a parameter to be determined later, and supports the following three atomic operations:
: insert a block into the queue.
: the first block in the queue is popped and returned.
: if there is a block with index and position stored in the queue, then Find returns and deletes it from the queue; otherwise, it returns .
Note that in addition to the usual Insert and PopFront operations, we also require the queue to support a Find operation that finds a given block, returns and deletes it from the queue. This operation can be supported using a standard hash table in conjunction with the queue. We mention that all three operations can be implemented in time less than , and discuss the implementation details in Appendix A.
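One way to combine a queue with a hash table is an insertion-ordered dictionary keyed by block index. The sketch below uses Python's `collections.OrderedDict`; it is an illustration under assumptions, not the implementation of Appendix A. In particular, blocks are modeled as `(index, position, content)` tuples, the class and method names are hypothetical, and the abort condition is modeled by raising an exception once the hypothetical bound `max_size` would be exceeded.

```python
from collections import OrderedDict

class BlockQueue:
    """FIFO queue of blocks with expected O(1) Insert, PopFront and Find."""

    def __init__(self, max_size: int):
        self.max_size = max_size
        # Maps block index -> block; OrderedDict remembers insertion order.
        self._blocks = OrderedDict()

    def insert(self, block):
        if len(self._blocks) >= self.max_size:
            raise OverflowError("AbortQueue")  # queue bound reached (assumed behavior)
        # At most one block per index is ever stored, so keying by index is safe.
        self._blocks[block[0]] = block

    def pop_front(self):
        """Remove and return the oldest block, or None if the queue is empty."""
        if not self._blocks:
            return None
        _, block = self._blocks.popitem(last=False)
        return block

    def find(self, index, position):
        """Remove and return the block with this index and position, if present."""
        block = self._blocks.get(index)
        if block is None or block[1] != position:
            return None
        del self._blocks[index]
        return block
```

Keying by the block index relies on the property that at most one block per index exists at any time; a different design would be needed if duplicates were possible.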
Our Construction. We are now ready to describe our ORAM construction, which relies on the above atomic operations. Here, we shall focus on the read operation. The algorithm for the write operation is analogous.
For two nodes and in , we use to denote the (unique) path connecting and . Throughout the life cycle of our algorithm we maintain the following block-path invariance.
Block-path Invariance: For any index , there exists at most a single block with index that is located either in or in the queue. When it is in the tree, it will be in the bucket of one of the nodes on . Additionally, has position .
We proceed to describe our algorithm. At a high-level, consists of two sub-routines and , where we execute once, and then execute twice. Roughly, fetches the block that contains the memory cell from either or , then returns the value of memory cell , and finally inserts the block to the queue . On the other hand, pops one block from , inserts to the root of (provided there is room), and performs a random number of “Flush” actions that gradually move blocks in down to the leaves.
Let be the index of the block that contains the -th memory cell, and be the current position of . If (which means that the block is not initialized yet), let be a uniformly random leaf, create a block , and insert to the queue . Otherwise, Fetch performs the following actions in order.
Fetch from tree and queue : Search for the block with index along in by reading all buckets in once and writing them back. If such a block is found, save it and write back a dummy block; otherwise, search for the block with index and position in the queue by invoking . By the block-path invariance, we must find the block .
Update position map . Let be a uniformly random leaf, and update the position in .
Insert to queue : Insert the block to .
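The Fetch steps above can be sketched on toy data structures. This is a simplified illustration under several assumptions: the tree is a dictionary from node strings to lists of `(index, position, content)` tuples, the queue is a plain insertion-ordered dictionary keyed by block index, removing a block from a bucket stands in for writing back a dummy block, and the position map is assumed to record the fresh leaf also in the uninitialized case. All function names are hypothetical.

```python
import random

def random_leaf(depth, rng):
    """Uniformly random leaf, encoded as a bit string of length depth."""
    return "".join(rng.choice("01") for _ in range(depth))

def fetch(tree, queue, pos_map, index, depth, rng):
    """Return the content of block `index` and re-insert it with a fresh position."""
    old_pos = pos_map.get(index)
    new_pos = random_leaf(depth, rng)
    pos_map[index] = new_pos
    if old_pos is None:
        # Block not initialized yet: create it and place it in the queue.
        queue[index] = (index, new_pos, None)
        return None
    found = None
    # Scan every bucket on the old path, whether or not the block was already found.
    for i in range(depth + 1):
        bucket = tree.get(old_pos[:i], [])
        for block in list(bucket):
            if block[0] == index:
                found = block
                bucket.remove(block)  # models writing back a dummy block
    if found is None:
        # By the block-path invariance the block must then be in the queue.
        found = queue.pop(index)
    queue[index] = (index, new_pos, found[2])
    return found[2]
```

Note that the path that is read depends only on the old position, which is discarded after this access; this is the property the security argument relies on.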
This sub-routine consists of two actions, Put-Back and Flush. It starts by executing Put-Back once, and then performs a random number of Flushes as follows: Let be a biased coin with . It samples , and if the outcome is , then it continues to perform one Flush and sample another independent copy of , until the outcome is . (In other words, the number of Flushes is a geometric random variable with parameter .)
Put-Back: This action moves a block from the queue, if any, to the root of . Specifically, we first invoke a . If returns a block then add it to .
Flush : This procedure selects a random path (namely, the path connecting the root to a random leaf ) on the tree and tries to move the blocks along the path down subject to the condition that each block always remains on the appropriate path from the root to its assigned leaf node (see the block-path invariance condition). Let be the nodes along . We traverse the path while carrying out the following operations for each node we visit: in node , find the block that can be “pulled-down” as far as possible along the path (subject to the block-path invariance condition), and pull it down to . For , if there exists some such that contains more than blocks that are assigned to leaves of the form , then select an arbitrary such block , remove it from the bucket and invoke an Overflow procedure, which re-samples a uniformly random position for the overflowed block and inserts it back to the queue . (See Figures 1 and 2 in the Appendix for the pseudocode.)
Finally, the algorithm aborts and terminates if one of the following two events happens throughout the execution.
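The pull-down step of a single Flush can be sketched as follows. This is a partial illustration only: it moves at most one block across each edge of the chosen path, picking the block whose assigned leaf agrees with the flush path for the longest prefix, and it deliberately omits the Overflow handling and the leaf-capacity check. The tree representation (a dictionary from node strings to lists of `(index, position, content)` tuples) and the function names are assumptions.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of two bit strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def flush(tree, leaf):
    """Pull one block down each edge of the root-to-leaf path (overflows not handled)."""
    for i in range(len(leaf)):
        node, child = leaf[:i], leaf[:i + 1]
        bucket = tree.setdefault(node, [])
        # Only blocks whose assigned leaf lies below `child` may move there.
        movable = [b for b in bucket if b[1].startswith(child)]
        if not movable:
            continue
        # Prefer the block that can travel furthest along the flush path.
        best = max(movable, key=lambda b: common_prefix_len(b[1], leaf))
        bucket.remove(best)
        tree.setdefault(child, []).append(best)
```

Since a block is only moved to a node whose index is a prefix of its assigned leaf, the block-path invariance is preserved by this step.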
: If the size of the queue reaches , then the algorithm aborts and outputs AbortQueue.
: If the size of any leaf bucket reaches (i.e., it becomes full), then the algorithm aborts and outputs AbortLeaf.
This completes the description of our algorithm; the algorithm is defined in exactly the same way, except that instead of inserting into the queue (in the last step of Fetch), we insert a modified where the content of the memory cell (inside ) has been updated to .
It follows by inspection that the block-path invariance is preserved by our construction. Also, note that in the above algorithm, Fetch increases the size of the queue by and Put-Back is executed twice which decreases the queue size by . On the other hand, the Flush action may cause a few Overflow events, and when an Overflow occurs, one block will be removed from and inserted to . Therefore, the size of the queue changes by minus one plus the number of Overflows for each Read operation. The crux of our analysis is to show that the number of Overflows is sufficiently small in any given (short) period of time, except with negligible probability.
We remark that throughout this algorithm’s life cycle, there will be at most non-empty blocks in each internal node except when we invoke , in which case some intermediate states will have blocks in a bucket (which causes an invocation of Overflow).
Reducing the cache’s size. We now briefly describe how the cache can be reduced to . We will set the queue size (specifically, we can set for an arbitrarily small constant ). The key observation here is that the position map shares the same set of interfaces with our ORAM data structure. Thus, we may substitute the position map with a (smaller) ORAM of size . By recursively substituting the position map times, the size of the cache will reduce to .
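The effect of the recursion on the position-map sizes can be illustrated with a short computation. The sketch assumes that each recursive level shrinks the map by the block-size factor, here called `alpha`, and stops once the map fits a hypothetical cache budget `base`; both names and the stopping rule are illustrative assumptions.

```python
def recursion_sizes(n: int, alpha: int, base: int) -> list:
    """Sizes of the successive ORAMs when each position map is replaced by a smaller ORAM.

    Assumes alpha >= 2, so the sizes strictly decrease until they fit in `base`.
    """
    if alpha < 2:
        raise ValueError("alpha must be at least 2")
    sizes = [n]
    while sizes[-1] > base:
        sizes.append(-(-sizes[-1] // alpha))  # ceiling division
    return sizes
```

The number of levels grows logarithmically in `n` (to base `alpha`), and the sizes form a geometric series, which is why the total external memory stays within a constant factor of the top level.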
Efficiency and setting parameters. By inspection, it is not hard to see that the runtime of our Read and Write algorithms is . Also, note that the position map of the base construction has size , and each recursive level has a position map that is a constant factor smaller. Thus, the overall external memory required by our ORAM construction remains . To achieve the efficiency claimed in Theorem 1, we set and .
4 Security of our ORAM
Key observation: Let denote the sum of two independent geometric random variables with mean . Each and operation traverses the tree along randomly chosen paths, independent of the history of operations so far.
The key observation follows from the facts that (1) just as in the schemes of [22, 4], each position in the position map is used exactly once in a traversal (and before this traversal, no information about the position is used in determining what nodes to traverse), and (2) we invoke the Flush action times and the flushing, by definition, traverses a random path, independent of the history.
Armed with the key observation, the security of our construction reduces to showing that our ORAM program does not abort except with negligible probability, which follows from the following two lemmas.
Given any program , let be the compiled program using our ORAM construction. We have
Given any program , let be the compiled program using our ORAM construction. We have
The proof of Lemma 1 is found in the Appendix and follows by a direct application of the (multiplicative) Chernoff bound. The proof of Lemma 2 is significantly more interesting. Towards proving it, in Section 5 we consider a simple variant of a “supermarket” problem (introduced by Mitzenmacher) and show how to reduce Lemma 2 to an (in our eyes) basic and natural question that seems not to have been investigated before.
5 Proof of Lemma 2
We here prove Lemma 2: in Section 5.1 we consider a notion of “upset” customers in a supermarket problem [16, 26, 6]; in Section 5.2 we show how Lemma 2 reduces to obtaining a bound on the rate of upset customers, and in Section 5.3 we provide an upper bound on the rate of upset customers.
5.1 A Supermarket Problem
In a supermarket problem, there are cashiers in the supermarket, all of which have empty queues in the beginning of the day. At each time step ,
With probability , an arrival event happens, where a new customer arrives. The new customer chooses uniformly random cashiers and joins the one with the shortest queue.
Otherwise (i.e. with the remaining probability ), a serving event happens: a random cashier is chosen that “serves” the first customer in his queue and the queue size is reduced by one; if the queue is empty, then nothing happens.
We say that a customer is upset if he chooses a queue whose size exceeds some bound . We are interested in large deviation bounds on the number of upset customers for a given short time interval (say, of or time steps).
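The discrete-time process just described is easy to simulate. The following is a minimal sketch under stated assumptions: the parameter names `m` (cashiers), `p` (arrival probability), `d` (number of choices) and `phi` (upset threshold) are hypothetical labels, and a customer is counted as upset when the chosen queue already holds more than `phi` customers at the moment of joining, which is one possible reading of the definition.

```python
import random

def simulate_supermarket(m, p, d, phi, steps, rng):
    """Run the discrete-time supermarket process and count upset customers.

    Returns (number of upset customers, final queue sizes).
    """
    queues = [0] * m
    upset = 0
    for _ in range(steps):
        if rng.random() < p:
            # Arrival: sample d cashiers uniformly and join the shortest queue.
            choices = [rng.randrange(m) for _ in range(d)]
            c = min(choices, key=lambda i: queues[i])
            if queues[c] > phi:
                upset += 1
            queues[c] += 1
        else:
            # Service: a random cashier serves one customer, if any is waiting.
            c = rng.randrange(m)
            if queues[c] > 0:
                queues[c] -= 1
    return upset, queues
```

For instance, `simulate_supermarket(64, 1/3, 1, 8, 10_000, random.Random(0))` runs the one-choice variant with an illustrative arrival probability of 1/3; the parameter values are examples only.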
Supermarket problems are traditionally considered in the continuous time setting [16, 26, 6]. But there exists a standard connection between the continuous model and its discrete time counterpart: conditioned on the number of events, the continuous time model behaves in the same way as the discrete time counterpart (with parameters appropriately rescaled).
Most of the existing works [16, 26, 6] study only the stationary behavior of the processes, such as the expected waiting time and the maximum load among the queues over time. Here, we are interested in large deviation bounds on a statistic over a short time interval; the configurations of different cashiers over time are highly correlated.
For our purpose, we analyze only the simple special case where the number of choices ; i.e. each new customer is put in a random queue.
We provide a large deviation bound for the number of upset customers in this setting. (Footnote 6: It is not hard to see that with cashiers, probability parameter , and “upset” threshold , the expected number of upset customers is at most for any steps time interval.)
For the (discrete-time) supermarket problem with cashiers, one choice (i.e., ), probability parameter , and upset threshold , for any steps time interval , let be the number of upset customers in this time interval. We have
Note that Proposition 1 would trivially follow from the standard Chernoff bound if is sufficiently large (i.e., ) to guarantee that we individually get concentration on each of the queues (and then relying on the union bound). What makes Proposition 1 interesting is that it applies also in a setting when is .
One can readily translate the above result to an analogous deviation bound on the number of upset customers for (not-too-short) time intervals in the continuous time model. This follows by noting that the number of events that happen in a time interval is highly concentrated (provided that the expected number of events is not too small), and applying the above proposition after conditioning on the number of events that happen in the time interval (since conditioned on the number of events, the discrete-time and continuous-time processes are identical).
5.2 From ORAM to Supermarkets
This section shows how we may apply Proposition 1 to prove Lemma 2. Central to our analysis is a simple reduction from the execution of our ORAM algorithm at level in to a supermarket process with cashiers. More precisely, we show there exists a coupling between two processes so that each bucket corresponds to two cashiers; the load in a bucket is always upper bounded by the total number of customers in the two cashiers it corresponds to.
To begin, we need the following Lemma.
Let be the sequence of Put-Back/Flush operations defined by our algorithm, i.e. each and between any consecutive Put-Backs, the number of Flushes is a geometric r.v. with parameter . Then is a sequence of i.i.d. random variables so that .
To prove Lemma 3, we may view the generation of as generating a sequence of i.i.d. Bernoulli r.v. with parameter . We set be a if and only if . One can verify that the generated in this way is the same as those generated by the algorithm.
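The equivalence claimed in Lemma 3 can be checked empirically. The sketch below generates the operation sequence in two ways: as the algorithm does (one Put-Back followed by a geometric number of Flushes) and as a sequence of i.i.d. coin tosses. The continuation probability 2/3 used in the example is an assumption derived from the stated expectation of 2 Flushes per Put-Back, which corresponds to a Put-Back probability of 1/3; the function names are hypothetical.

```python
import random

def ops_by_algorithm(num_put_backs, continue_prob, rng):
    """Each Put-Back is followed by Flushes while a biased coin keeps saying 'continue'."""
    ops = []
    for _ in range(num_put_backs):
        ops.append("PutBack")
        while rng.random() < continue_prob:
            ops.append("Flush")
    return ops

def ops_by_iid_coins(length, put_back_prob, rng):
    """Each operation is independently a Put-Back with probability put_back_prob."""
    return ["PutBack" if rng.random() < put_back_prob else "Flush"
            for _ in range(length)]

def put_back_fraction(ops):
    """Empirical frequency of Put-Back operations in a sequence."""
    return ops.count("PutBack") / len(ops)
```

Under the stated assumption, both generators should show a Put-Back frequency close to 1/3, and the first should average about two Flushes per Put-Back.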
We are now ready to describe our coupling between the original process and the supermarket process. At a high-level, a block corresponds to a customer, and sub-trees in level of correspond to cashiers. More specifically, we couple the configurations at the -th level of in the ORAM program with a supermarket process as follows.
Initially, all cashiers have customer.
For each , a corresponding arrival event occurs: if a block with position (where ) is moved to , then a new customer arrives at the -th cashier; otherwise (e.g. when the queue is empty), a new customer arrives at a random cashier.
For each along the path to leaf (where ), a serving event occurs at the -th cashier.
For each , nothing happens in the experiment of the supermarket problem. (Intuitively, translates to extra “deletion” events of customers in the supermarket problem, but we ignore it in the coupling since it only decreases the number of blocks in buckets in .)
Correctness of the coupling. We shall verify that the above way of placing and serving customers gives us exactly a supermarket process. First recall that both Put-Back and Flush actions are associated with uniformly random leaves. Thus, at each timestep a uniformly random cashier is chosen. Next, by Lemma 3, the sequence of Put-Back and Flush actions in the execution of our ORAM algorithm is a sequence of i.i.d. variables with . Therefore, when a queue is chosen at a new timestep, an (independent) biased coin is tossed to decide whether an arrival or a service event will occur.
Dominance. Now, we claim that at any timestep, for every , the number of customers at -th cashier is at least the number of blocks stored at or above level in with position . This follows by observing that (i) whenever there is a block with position moved to (from ), a corresponding new customer arrives at the -th cashier, i.e. when the number of blocks increases by one, so does the number of customers, and (ii) for every along the path to : if there is at least one block stored at or above level in with position , then one such block will be flushed down below level (since we flush the blocks that can be pulled down the furthest); that is, when the number of customers decreases by one, so does the number of blocks (if possible). This in particular implies that throughout the coupled experiments, for every the number of blocks in the bucket at node is always upper bounded by the sum of the number of customers at cashier and .
We summarize the above in the following lemma.
For every execution of our ORAM algorithm (i.e., any sequence of Read and Write operations), there is a coupled experiment of the supermarket problem such that throughout the coupled experiments, for every the number of blocks in the bucket at node is always upper bounded by the sum of the number of customers at cashier and .
From Lemma 4 and Proposition 1 to Lemma 2. Note that at any time step , if the queue size is , then by Proposition 1 with and Lemma 4, except with negligible probability, at time step , there have been at most overflows per level in the tree and thus at most in total. Yet during this time “epoch”, elements have been “popped” from the queue, so, except with negligible probability, the queue size cannot exceed .
It follows by a union bound over length time “epochs”, that except with negligible probability, the queue size never exceeds .
5.3 Analysis of the Supermarket Problem
We now prove Proposition 1. We start with interpreting the dynamics in our process as evolutions of a Markov chain.
A Markov Chain Interpretation. In our problem, at each time step , a random cashier is chosen and either an arrival or a serving event happens at that cashier (with probability and , respectively), which increases or decreases the queue size by one. Thus, the behavior of each queue is governed by a simple Markov chain with state space being the size of the queue (which can also be viewed as a drifted random walk on a one dimensional finite-length lattice). More precisely, each state of transits to state and with probability and , respectively, and for state , it transits to state and stays at state with probability and , respectively. In other words, the supermarket process can be rephrased as having copies of Markov chains , each of which starts from state , and at each time step, one random chain is selected and takes a move.
We shall use Chernoff bounds for Markov chains [8, 13, 15, 3] to derive a large deviation bound on the number of upset customers. Roughly speaking, Chernoff bounds for Markov chains assert that for a (sufficiently long) -steps random walk on an ergodic finite state Markov chain , the number of times that the walk visits a subset of states is highly concentrated at its expected value , provided that the chain has spectral expansion bounded away from . However, there are a few technical issues, which we address in turn below. (Footnote 7: For an ergodic reversible Markov chain , the spectral expansion of is simply the second largest eigenvalue (in absolute value) of the transition matrix of . The quantity is often referred to as the spectral gap of .)
Overcounting. The first issue is that counting the number of visits to a state set does not capture the number of upset customers exactly—the number of upset customers corresponds to the number of transits from state to with . Unfortunately, we are not aware of Chernoff bounds for counting the number of transits (or visits to an edge set). Nevertheless, for our purpose, we can set and the number of visits to provides an upper bound on the number of upset customers.
Truncating the chain. The second (standard) issue is that the chain for each queue of a cashier has infinite state space , whereas Chernoff bounds for Markov chains are only proven for finite-state Markov chains. However, since we are only interested in the supermarket process with finite time steps, we can simply truncate the chain at a sufficiently large (say, ) to obtain a chain with finite states ; that is, is identical to , except that for state , it stays at with probability and transits to with probability . Clearly, as we only consider time steps, the truncated chain behaves identical to . It’s also not hard to show that has stationary distribution with , and spectral gap .888One can see this by lower bounding the conductance of and applying Cheeger’s inequality.
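The truncated chain and its stationary distribution can be written down explicitly for small parameters. The sketch below is an illustration under assumptions: `p` denotes the probability of moving up (an arrival), `K` the truncation point, and the closed form for the stationary distribution is the standard one for a birth-death chain, obtained from detailed balance, with weights proportional to powers of p/(1-p). The example value p = 1/3 is illustrative.

```python
def truncated_chain(p: float, K: int) -> list:
    """Transition matrix on states 0..K: up with probability p, down with 1-p.

    At state 0 a 'down' move stays at 0; at state K an 'up' move stays at K.
    """
    P = [[0.0] * (K + 1) for _ in range(K + 1)]
    for s in range(K + 1):
        P[s][min(s + 1, K)] += p
        P[s][max(s - 1, 0)] += 1 - p
    return P

def stationary_distribution(p: float, K: int) -> list:
    """Stationary distribution via detailed balance: pi(i) * p = pi(i+1) * (1 - p)."""
    ratio = p / (1 - p)
    weights = [ratio ** i for i in range(K + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def step(pi: list, P: list) -> list:
    """One step of the chain applied to a distribution (row vector times matrix)."""
    n = len(pi)
    return [sum(pi[s] * P[s][t] for s in range(n)) for t in range(n)]
```

When p is below 1/2 the weights decay geometrically, so the stationary mass above a threshold is exponentially small in the threshold, which is the shape of bound the analysis needs.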
Correlation over a short time frame. The main challenge, however, is to establish large deviation bounds for a short time interval (compared to the number of chains). For example, or even , and in these cases the expected number of steps each of the chains takes can be a small constant or even . Therefore, we cannot hope to obtain meaningful concentration bounds individually for each single chain. Moreover, the chains are not completely independent: only one chain is selected at each time step, which introduces correlation among the chains.
We address this issue by relying on a new variant of Chernoff bounds for Markov chains with “resets,” which allows us to “glue” walks on separate chains together and yields a concentration bound that is as good as a -step random walk on a single chain. We proceed in the following steps.
Recall that we have copies of truncated chains starting from state . At each time step, a random chain is selected and we take one step in this chain. We want to upper bound the total number of visits to during time steps .
We first note that, as we are interested in upper bounds, we can assume that the chains start at the stationary distribution instead of the state (i.e., all queues have initial size drawn from instead of being empty). This follows by noting that starting from can only increase the queue size throughout the whole process for all of queues, compared to starting from empty queues, and thus the number of visits to can only increase when starting from in compared to starting from state (this can be formalized using a standard coupling argument).
Since walks from the stationary distribution remain at the stationary distribution, we can assume w.l.o.g. that the time interval is . Now, as a thought experiment, we can decompose the process as follows. We first determine the number of steps each of the chains takes during time interval ; let denote the number of steps taken in the -th chain. Then we take steps of random walk from the stationary distribution for each copy of the chain , and count the total number of visits to .
Finally, we can view the process as taking a -step random walk on with “resets.” Namely, we start from the stationary distribution , take steps in , “reset” the distribution to stationary distribution (by drawing an independent sample from ) and take more steps, and so on. At the end, we count the number of visits to , denoted by , as an upper bound on the number of upset customers.
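The walk with resets described above can be sketched as a short simulation. This is only an illustration under assumed parameters: the toy chain (grow with probability `p_up`, shrink otherwise, truncated at `cap`), the visit set (all states at or above `threshold`), and every function name below are hypothetical and not part of the analysis.

```python
import random


def sample_stationary(p_up, cap, rng):
    """Draw a state from the geometric stationary distribution of the toy chain."""
    ratio = p_up / (1.0 - p_up)
    weights = [ratio ** k for k in range(cap + 1)]
    return rng.choices(list(range(cap + 1)), weights=weights)[0]


def step(state, p_up, cap, rng):
    """One step of the assumed truncated queue chain."""
    if rng.random() < p_up:
        return min(state + 1, cap)   # grow, but never beyond the cap
    return max(state - 1, 0)         # shrink, but never below empty


def walk_with_resets(segment_lengths, p_up, cap, threshold, rng):
    """Count visits to states >= threshold over a walk that is reset to a
    fresh stationary sample at the start of every segment."""
    visits = 0
    for length in segment_lengths:
        state = sample_stationary(p_up, cap, rng)   # the "reset"
        for _ in range(length):
            if state >= threshold:
                visits += 1
            state = step(state, p_up, cap, rng)
    return visits


if __name__ == "__main__":
    rng = random.Random(1)
    lengths = [rng.randint(0, 3) for _ in range(1000)]  # few steps per chain
    print(sum(lengths), walk_with_resets(lengths, 0.3, 50, 5, rng))
```

Each segment plays the role of one queue's chain, so segments of length zero (chains that are never selected) contribute nothing to the count.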
Intuitively, taking a random walk with resets injects additional randomness into the walk, and thus we should expect concentration results that are at least as good. We formalize this intuition as the following Chernoff bound for Markov chains with “resets” (the proof follows relatively easily from recent Chernoff bounds for Markov chains and is found in Appendix B.2) and use it to finish the proof of Proposition 1.
Theorem 2 (Chernoff Bounds for Markov Chains with Resets).
Let be an ergodic finite Markov chain with state space , stationary distribution , and spectral expansion . Let and . Let and . Let denote a -step random walk on from stationary with resets at steps ; that is, for every , and are random walks from . Let iff for every and . We have
References
- [1] Miklós Ajtai. Oblivious RAMs without cryptographic assumptions. In STOC, pages 181–190, 2010.
- [2] Dan Boneh, David Mazières, and Raluca Ada Popa. Remote oblivious storage: Making oblivious RAM practical. CSAIL Technical Report MIT-CSAIL-TR-2011-018.
- [3] K. M. Chung, H. Lam, Z. Liu, and M. Mitzenmacher. Chernoff-Hoeffding bounds for Markov chains: Generalized and simplified. In Proceedings of the 29th International Symposium on Theoretical Aspects of Computer Science (STACS), 2012.
- [4] Kai-Min Chung and Rafael Pass. A simple ORAM. Cryptology ePrint Archive, Report 2013/243, 2013.
- [5] Ivan Damgård, Sigurd Meldgaard, and Jesper Buus Nielsen. Perfectly secure oblivious RAM without random oracles. In TCC, pages 144–163, 2011.
- [6] Derek L. Eager, Edward D. Lazowska, and John Zahorjan. Adaptive load sharing in homogeneous distributed systems. IEEE Trans. Software Eng., 12(5):662–675, 1986.
- [7] Craig Gentry, Kenny A. Goldman, Shai Halevi, Charanjit S. Jutla, Mariana Raykova, and Daniel Wichs. Optimizing ORAM and using it efficiently for secure computation. In Privacy Enhancing Technologies, pages 1–18, 2013.
- [8] D. Gillman. A Chernoff bound for random walks on expander graphs. SIAM Journal on Computing, 27(4), 1997.
- [9] Oded Goldreich. Towards a theory of software protection and simulation by oblivious RAMs. In STOC, pages 182–194, 1987.
- [10] Oded Goldreich and Rafail Ostrovsky. Software protection and simulation on oblivious RAMs. J. ACM, 43(3):431–473, 1996.
- [11] Michael T. Goodrich and Michael Mitzenmacher. Privacy-preserving access of outsourced data via oblivious RAM simulation. In ICALP (2), pages 576–587, 2011.
- [12] Michael T. Goodrich, Michael Mitzenmacher, Olga Ohrimenko, and Roberto Tamassia. Privacy-preserving group data access via stateless oblivious RAM simulation. In SODA, pages 157–167, 2012.
- [13] N. Kahale. Large deviation bounds for Markov chains. Combinatorics, Probability, and Computing, 6(4), 1997.
- [14] Eyal Kushilevitz, Steve Lu, and Rafail Ostrovsky. On the (in)security of hash-based oblivious RAM and a new balancing scheme. In SODA, pages 143–156, 2012.
- [15] P. Lezaud. Chernoff-type bound for finite Markov chains. Annals of Applied Probability, 8(3):849–867, 1998.
- [16] Michael Mitzenmacher. The power of two choices in randomized load balancing. IEEE Trans. Parallel Distrib. Syst., 12(10):1094–1104, 2001.
- [17] Michael Mitzenmacher, Balaji Prabhakar, and Devavrat Shah. Load balancing with memory. In FOCS, pages 799–808, 2002.
- [18] Michael Mitzenmacher and Berthold Vöcking. The asymptotics of selecting the shortest of two, improved. In Proceedings of the Annual Allerton Conference on Communication, Control, and Computing, volume 37, pages 326–327, 1999.
- [19] Rafail Ostrovsky and Victor Shoup. Private information storage (extended abstract). In STOC, pages 294–303, 1997.
- [20] Benny Pinkas and Tzachy Reinman. Oblivious RAM revisited. In CRYPTO, pages 502–519, 2010.
- [21] Devavrat Shah and Balaji Prabhakar. The use of memory in randomized load balancing. In Proceedings of the 2002 IEEE International Symposium on Information Theory, page 125. IEEE, 2002.
- [22] Elaine Shi, T.-H. Hubert Chan, Emil Stefanov, and Mingfei Li. Oblivious RAM with O((log N)^3) worst-case cost. In ASIACRYPT, pages 197–214, 2011.
- [23] Emil Stefanov and Elaine Shi. Path O-RAM: An extremely simple oblivious RAM protocol. CoRR, abs/1202.5150v1, 2012.
- [24] Emil Stefanov, Elaine Shi, and Dawn Song. Towards practical oblivious RAM. In NDSS, 2012.
- [25] Emil Stefanov, Marten van Dijk, Elaine Shi, Christopher Fletcher, Ling Ren, Xiangyao Yu, and Srinivas Devadas. Path O-RAM: An extremely simple oblivious RAM protocol. CoRR, abs/1202.5150v2, 2013.
- [26] Nikita Dmitrievna Vvedenskaya, Roland L’vovich Dobrushin, and Fridrikh Izrailevich Karpelevich. Queueing system with selection of the shortest of two queues: An asymptotic approach. Problemy Peredachi Informatsii, 32(1):20–34, 1996.
- [27] Peter Williams and Radu Sion. Usable PIR. In NDSS, 2008.
- [28] Peter Williams, Radu Sion, and Bogdan Carbunar. Building castles out of mud: Practical access pattern privacy and correctness on untrusted storage. In ACM Conference on Computer and Communications Security, pages 139–148, 2008.
Appendix A Implementation details.
This section discusses a number of implementation details in our algorithm.
The queue at the cache. We now describe how a hash table and a standard queue (that could be encapsulated in commodity chips) can be used to implement our queue. We assume only that the hash table uses a universal hash function and resolves collisions by chaining with linked lists. To implement the procedure, we simply insert to both the hash table and the queue. The key we use is ’s value at the position map. This ensures that the maximum load of the hash table is whp [MV08]. To implement , we look up the block in the hash table. If it exists, we return the block and delete it from the hash table. Notice that we do not delete it from the queue, which introduces inconsistencies between the hash table and the queue.
We now describe how we implement . Here, we need to be careful with the inconsistencies. We first pop a block from the queue and then check whether the block is still in the hash table. If it is not, the block was already deleted earlier. In this case, does not return anything (because we need a hard bound on the running time). One can see that takes time and the other two operations take time whp.
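The three operations above can be sketched as follows. This is a minimal illustration and not the implementation itself: the class name `CacheQueue`, the method names `insert`, `find`, and `pop_front`, and the use of a block identifier as the key are all assumptions, and Python's built-in `dict` stands in for the chained hash table. A per-insert ticket is an added detail (also an assumption) that lets `pop_front` recognize stale queue entries even if a key is re-inserted later.

```python
from collections import deque


class CacheQueue:
    """FIFO queue with keyed lookup and lazy deletion (illustrative sketch)."""

    def __init__(self):
        self._table = {}        # key -> (ticket, block); stand-in for the hash table
        self._fifo = deque()    # (key, ticket) pairs; may contain stale entries
        self._next_ticket = 0

    def insert(self, key, block):
        """Insert the block into both the table and the queue."""
        ticket = self._next_ticket
        self._next_ticket += 1
        self._table[key] = (ticket, block)
        self._fifo.append((key, ticket))

    def find(self, key):
        """Return and delete the block from the table only.

        The queue entry is deliberately left behind, which creates the
        inconsistency that pop_front has to tolerate.
        """
        entry = self._table.pop(key, None)
        return None if entry is None else entry[1]

    def pop_front(self):
        """Pop one queue entry; return its block, or None if it is stale.

        No retry loop: a single pop keeps a hard bound on the running time.
        """
        if not self._fifo:
            return None
        key, ticket = self._fifo.popleft()
        entry = self._table.get(key)
        if entry is None or entry[0] != ticket:
            return None          # block was already removed by find
        del self._table[key]
        return entry[1]


if __name__ == "__main__":
    q = CacheQueue()
    q.insert("a", "block A")
    q.insert("b", "block B")
    print(q.find("a"), q.pop_front(), q.pop_front())
```

Returning nothing on a stale entry, rather than looping until a live block is found, mirrors the requirement of a hard bound on the running time of the pop operation.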
Appendix B Missing Proofs
B.1 Proof of Lemma 1
We turn to showing that the probability of overflow in any of the leaf nodes is small. Consider any leaf node and some time . For there to be an overflow in at time , there must be out of elements in the position map that map to . Recall that all positions in the position map are uniformly and independently selected; thus, the expected number of elements mapping to is and by a standard multiplicative version of Chernoff bound, the probability that elements are mapped to is upper bounded by when (see Theorem 4.4 in [mitzenmacher2005probability]). By a union bound, we have that the probability of any node ever overflowing is bounded by
To analyze the full-fledged construction, we simply apply the union bound to the failure probabilities of the different ORAM trees (due to the recursive calls). The final upper bound on the overflow probability is thus , which is negligible as long as for a suitably large constant . ∎
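As a sanity check on the Chernoff step used in this proof, the sketch below compares an exact binomial tail with the bound Pr[X ≥ R] ≤ 2^(-R), which the standard multiplicative Chernoff bound gives whenever R is at least six times the mean. The concrete numbers (256 elements, 16 leaves, threshold 96) and the function names are hypothetical choices for illustration; they are not the parameters of the construction.

```python
from fractions import Fraction
from math import comb


def binomial_tail(n, p, r):
    """Exact Pr[X >= r] for X ~ Binomial(n, p), with p given as a Fraction."""
    q = 1 - p
    return sum(comb(n, k) * p ** k * q ** (n - k) for k in range(r, n + 1))


def chernoff_tail_bound(r):
    """The bound 2^(-r), valid when r >= 6 * (mean of X)."""
    return Fraction(1, 2 ** r)


if __name__ == "__main__":
    n, p = 256, Fraction(1, 16)   # assumed: 256 elements, 16 equally likely leaves
    mean = n * p                  # 16
    r = 96                        # r = 6 * mean, so the bound applies
    exact = binomial_tail(n, p, r)
    print(float(exact), float(chernoff_tail_bound(r)))
```

Exact rational arithmetic is used so that very small tail probabilities are not lost to floating-point underflow.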
B.2 Proof of Theorem 2
We here prove Theorem 2. The high-level idea is simple: we simulate the resets by taking a sufficiently long “dummy” walk, during which we “turn off” the counter on the number of visits to the state set . However, formalizing this idea requires a more general version of Chernoff bounds that handles “time-dependent weight functions,” which allows us to turn the counter on and off. Additionally, as we need to add long dummy walks, a multiplicative (as opposed to additive) version of the Chernoff bound is needed to derive meaningful bounds. We here rely on a recent generalized version of Chernoff bounds for Markov chains due to Chung, Lam, Liu, and Mitzenmacher.
Theorem 3 ().
Let be an ergodic finite Markov chain with state space , stationary distribution , and spectral expansion . Let denote a -step random walk on starting from stationary distribution , that is, . For every , let be a weight function at step with expected weight . Let . Define the total weight of the walk by . Then
We now proceed to prove Theorem 2.
Proof of Theorem 2.
We use Theorem 3 to prove the theorem. Let be an indicator function on (i.e., iff ). The key component of Theorem 3 that we leverage here is that the functions can change over time. We shall design a very long walk on so that the marginal distribution of a specific collection of “subwalks” from will be statistically close to . Furthermore, we design in such a way that the “unused” subwalks have little impact on the statistics we are interested in. In this way, we can translate a deviation bound on to a deviation bound on . Specifically, let be the mixing time for (i.e., the number of steps needed for a walk to be -close to the stationary distribution in statistical distance). Here, we let ( is chosen in an arbitrary manner so long as it is sufficiently small). Given , we define and as follows: will start from and take steps of walk. In the meantime, we set for all . Then we “turn off” the function while letting keep walking for more steps, i.e., we let for all . Intuitively, this means we let take a long walk until it becomes close to again. During this time, is turned off so that we do not keep track of any statistics. After that, we “turn on” the function again for the next steps (i.e., for all ), followed by turning off for another steps. We continue this “on-and-off” process until we walk through all ’s.
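The “on-and-off” schedule can be made concrete with a small sketch that lays out, step by step, when the counter is active along the long walk. The names `weight_schedule`, `segment_lengths`, and `dummy_len` are hypothetical, and the choice not to append a dummy stretch after the last segment is an assumption made here for simplicity.

```python
def weight_schedule(segment_lengths, dummy_len):
    """Return a list of booleans, one per step of the long walk.

    True means the indicator weight is switched on (visits are counted);
    False marks a dummy step whose only purpose is to let the walk mix
    back towards the stationary distribution.
    """
    schedule = []
    last = len(segment_lengths) - 1
    for i, length in enumerate(segment_lengths):
        schedule.extend([True] * length)          # counted subwalk
        if i < last:
            schedule.extend([False] * dummy_len)  # uncounted mixing stretch
    return schedule


if __name__ == "__main__":
    # Three counted subwalks of lengths 2, 1, 3 separated by 4 dummy steps each.
    sched = weight_schedule([2, 1, 3], 4)
    print("".join("1" if on else "0" for on in sched))
```

The number of `True` entries equals the length of the original walk with resets, while the total length grows with the mixing time, which is why a multiplicative rather than additive deviation bound is needed.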