
Local Linearizability

Abstract

The semantics of concurrent data structures is usually given by a sequential specification and a consistency condition. Linearizability is the most popular consistency condition due to its simplicity and general applicability. Nevertheless, for applications that do not require all guarantees offered by linearizability, recent research has focused on improving performance and scalability of concurrent data structures by relaxing their semantics.

In this paper, we present local linearizability, a relaxed consistency condition that is applicable to container-type concurrent data structures like pools, queues, and stacks. While linearizability requires that the effect of each operation is observed by all threads at the same time, local linearizability only requires that for each thread T, the effects of its local insertion operations and the effects of those removal operations that remove values inserted by T are observed by all threads at the same time. We investigate theoretical and practical properties of local linearizability and its relationship to many existing consistency conditions. We present a generic implementation method for locally linearizable data structures that uses existing linearizable data structures as building blocks. Our implementations show performance and scalability improvements over the original building blocks and outperform the fastest existing container-type implementations.

Keywords: (concurrent) data structures, relaxed semantics, linearizability

Andreas Haas (Google Inc.), Thomas A. Henzinger (IST Austria, Austria), Andreas Holzer (University of Toronto, Canada), Christoph M. Kirsch (University of Salzburg, Austria), Michael Lippautz (Google Inc.), Hannes Payer (Google Inc.), Ali Sezgin (University of Cambridge, UK), Ana Sokolova (University of Salzburg, Austria), Helmut Veith (Vienna University of Technology, Austria — forever in our hearts)

Subject classification: D.3.1 [Programming Languages]: Formal Definitions and Theory—Semantics; E.1 [Data Structures]: Lists, stacks, and queues; D.1.3 [Software]: Programming Techniques—Concurrent Programming

1 Introduction

Concurrent data structures are pervasive all along the software stack, from operating system code to application software and beyond. Both correctness and performance are imperative for concurrent data structure implementations. Correctness is usually specified by relating concurrent executions, admitted by the implementation, with sequential executions, admitted by the sequential version of the data structure. The latter form the sequential specification of the data structure. This relationship is formally captured by consistency conditions, such as linearizability, sequential consistency, or quiescent consistency [25].

Linearizability [26] is the most accepted consistency condition for concurrent data structures due to its simplicity and general applicability. It guarantees that the effects of all operations by all threads are observed consistently. This global visibility requirement imposes the need for extensive synchronization among threads, which may in turn jeopardize performance and scalability. In order to enhance performance and scalability of implementations, recent research has explored relaxed sequential specifications [23, 40, 2], resulting in well-performing implementations of concurrent data structures [2, 18, 23, 28, 38, 6]. Except for [27], the space of alternative consistency conditions that relax linearizability has been left largely unexplored. In this paper, we explore (part of) this gap by investigating local linearizability, a novel consistency condition that is applicable to a large class of concurrent data structures that we call container-type data structures, or containers for short. Containers include pools, queues, and stacks. A fine-grained spectrum of consistency conditions enables us to describe the semantics of concurrent implementations more precisely; e.g., we show in our appendix that work-stealing queues [35], which could previously only be proven to be linearizable wrt a pool, are actually locally linearizable wrt a double-ended queue.


The thread-induced history of thread T1 is enclosed by a dashed line, while the thread-induced history of thread T2 is enclosed by a solid line.

Figure 1: Local Linearizability

Local linearizability is a (thread-)local consistency condition that guarantees that insertions per thread are observed consistently. While linearizability requires a consistent view over all insertions, we only require that projections of the global history—so-called thread-induced histories—are linearizable. The induced history of a thread T is a projection of a program execution to the insert-operations performed by T, combined with all remove-operations that remove values inserted by T, irrespective of whether they are performed by T or not. Then, the program execution is locally linearizable iff each thread-induced history is linearizable. Consider the example (sequential) history depicted in Figure 1. It is not linearizable wrt a queue since the values are not dequeued in the same order as they were enqueued. However, each thread-induced history is linearizable wrt a queue and, therefore, the overall execution is locally linearizable wrt a queue. In contrast to semantic relaxations based on relaxing sequential semantics, such as [23, 2], local linearizability coincides with sequential correctness for single-threaded histories, i.e., a single-threaded and, therefore, sequential history is locally linearizable wrt a given sequential specification if and only if it is admitted by the sequential specification.
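Since the concrete history of Figure 1 is not reproduced in this rendering, the following minimal history of our own construction illustrates the same effect:

    h = enq(1)_{T1} enq(2)_{T1} enq(3)_{T2} deq(3)_{T2} deq(1)_{T1} deq(2)_{T2}

The history h is not linearizable wrt a queue: enq(1) precedes enq(3), yet deq(3) precedes deq(1). The thread-induced histories h_{T1} = enq(1) enq(2) deq(1) deq(2) and h_{T2} = enq(3) deq(3) are both admitted by the queue specification, so h is locally linearizable wrt a queue.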

Local linearizability is to linearizability what coherence is to sequential consistency. Coherence [22], which is almost universally accepted as the absolute minimum that a shared memory system should satisfy, is the requirement that there exists a unique global order per shared memory location. Thus, while all accesses by all threads to a given memory location have to conform to a unique order, consistent with program order, the relative ordering of accesses to multiple memory locations does not have to be the same across threads. In other words, coherence is sequential consistency per memory location. Similarly, local linearizability is linearizability per local history. In our view, local linearizability offers enough consistency for the correctness of many applications, as it is the local view of the client that often matters. For example, in a locally linearizable queue, each client (thread) has the impression of using a perfect queue—no reordering will ever be observed among the values inserted by a single thread. Such guarantees suffice for many e-commerce and cloud applications. Implementations of locally linearizable data structures have been successfully applied for managing free lists in the design of the fast and scalable memory allocator scalloc [5]. Moreover, except for fairness, locally linearizable queues guarantee all properties required of Dispatch Queues [1], a common concurrency programming mechanism on mobile devices.

In this paper, we study theoretical and practical properties of local linearizability. Local linearizability is compositional: a history over multiple concurrent objects is locally linearizable iff all per-object histories are locally linearizable (see Thm. 3), and locally linearizable container-type data structures, including queues and stacks, admit only “sane” behaviours—no duplicated values, no values returned out of thin air, and no values lost (see Prop. 4). Local linearizability is a weakening of linearizability for a natural class of data structures including pools, queues, and stacks (see Sec. 4). We compare local linearizability to linearizability, sequential consistency, and quiescent consistency, as well as to several shared-memory consistency conditions.

Finally, local linearizability leads to new efficient implementations. We present a generic implementation scheme that, given a linearizable implementation Φ of a sequential specification S, produces an implementation LLD Φ that is locally linearizable wrt S (see Sec. 6). Our implementations show dramatic improvements in performance and scalability. In most cases the locally linearizable implementations scale almost linearly and even outperform state-of-the-art pool implementations. We produced locally linearizable variants of state-of-the-art concurrent queues and stacks, as well as of the relaxed data structures from [23, 28]. The latter are relaxed in two dimensions: they are locally linearizable (the consistency condition is relaxed) and out-of-order-relaxed (the sequential specification is relaxed). The speedup of the locally linearizable implementations over the fastest linearizable queue (LCRQ) and stack (TS Stack) implementations at 80 threads is 2.77 and 2.64, respectively. Verification of local linearizability, i.e., proving correctness, of each of our new locally linearizable implementations is immediate, given that the starting implementations are linearizable.

2 Semantics of Concurrent Objects

The common approach to define the semantics of an implementation of a concurrent data structure is (1) to specify a set of valid sequential behaviors—the sequential specification, and (2) to relate the admissible concurrent executions to the sequential executions specified by the sequential specification—via the consistency condition. That means that an implementation of a concurrent data structure actually corresponds to several sequential data structures, and vice versa, depending on the consistency condition used. A (sequential) data structure D is an object with a set of method calls Σ. We assume that method calls include parameters, i.e., input and output values from a given set V of values. The sequential specification S(D) of D is a prefix-closed subset of Σ*. The elements of S(D) are called D-valid sequences. For ease of presentation, we assume that each value in a data structure can be inserted and removed at most once. This is without loss of generality, as we may see each value as a pair consisting of an element (core value) and a version number. Note that this is a technical assumption that only makes the presentation and the proofs simpler; it is not needed and not done in locally linearizable implementations. While elements may be inserted and removed multiple times, the version numbers provide uniqueness of values. Our assumption ensures that whenever a sequence s is part of a sequential specification S(D), each method call in s appears exactly once. An additional core value, which is not an element, is empty. It is returned by remove method calls that do not find an element to return. We denote by Emp the set of values that are versions of empty, i.e., Emp = {(empty, i) | i a version number}.
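For example, with version numbers made explicit, a sequence that reuses a core element a twice becomes a sequence of unique values:

    ins(a) rem(a) ins(a)   becomes   ins((a,0)) rem((a,0)) ins((a,1))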

Definition (Appears-before Order, Appears-in Relation). Given a sequence s in which each method call appears exactly once, we denote by <_s the total appears-before order over method calls in s. Given a method call m, we write m ∈ s for m appears in s.

Throughout the paper, we use pool, queue, and stack as typical examples of containers. We give their sequential specifications in an axiomatic way [24], i.e., as sets of axioms that exactly define the valid sequences.

(1) Every method call appears at most once in s.
(2) If r(x) ∈ s for x ∉ Emp, then i(x) <_s r(x).
(3) If i(x) <_s r(e) for x ∉ Emp and e ∈ Emp, then r(x) <_s r(e).
(4) If i(x) <_s i(y) and r(y) ∈ s, for x, y ∉ Emp, then r(x) ∈ s and r(x) <_s r(y).
(5) If i(x) <_s i(y) <_s r(x), for x, y ∉ Emp, then r(y) ∈ s and r(y) <_s r(x).
Table 1: The pool axioms (1), (2), (3); the queue order axiom (4); the stack order axiom (5)
Definition (Pool, Queue, & Stack). A pool, queue, and stack with values in a set V have the sets of methods Σ_Pool = {ins(x) | x ∈ V} ∪ {rem(y) | y ∈ V ∪ Emp}, Σ_Queue = {enq(x) | x ∈ V} ∪ {deq(y) | y ∈ V ∪ Emp}, and Σ_Stack = {push(x) | x ∈ V} ∪ {pop(y) | y ∈ V ∪ Emp}, respectively. We denote the sequential specification of a pool by S_Pool, the sequential specification of a queue by S_Queue, and the sequential specification of a stack by S_Stack. A sequence s belongs to S_Pool iff it satisfies axioms (1)–(3) in Table 1—the pool axioms—when instantiating i(x) with ins(x) and r(x) with rem(x). We keep axiom (1) for completeness, although it is subsumed by our assumption that each value is inserted and removed at most once. Specification S_Queue contains all sequences that satisfy the pool axioms and axiom (4)—the queue order axiom—after instantiating i(x) with enq(x) and r(x) with deq(x). Finally, S_Stack contains all sequences that satisfy the pool axioms and axiom (5)—the stack order axiom—after instantiating i(x) with push(x) and r(x) with pop(x).
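As a quick sanity check of the axioms as reconstructed in Table 1, using the instantiations just given:

    enq(1) enq(2) deq(1) deq(2) ∈ S_Queue        (FIFO order respected)
    enq(1) enq(2) deq(2) ∉ S_Queue               (axiom (4): deq(1) must appear and precede deq(2))
    push(1) push(2) pop(2) pop(1) ∈ S_Stack      (LIFO order respected)
    ins(1) rem(empty) ∉ S_Pool                   (axiom (3): value 1 is still in the pool)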

We represent concurrent executions via concurrent histories. An example history is shown in Figure 1. Each thread executes a sequence of method calls from Σ; method calls executed by different threads may overlap (which does not happen in Figure 1). The real-time duration of method calls is irrelevant for the semantics of concurrent objects; all that matters is whether method calls overlap. Given this abstraction, a concurrent history is fully determined by a sequence of invocation and response events of method calls. We distinguish method invocation and response events by augmenting the alphabet. Let I(Σ) and R(Σ) denote the sets of method-invocation events and method-response events, respectively, for the method calls in Σ. Moreover, let 𝕋 be the set of thread identifiers. Let I(Σ)_𝕋 and R(Σ)_𝕋 denote the sets of method-invocation and -response events augmented with identifiers of executing threads. For example, i(m)_T ∈ I(Σ)_𝕋 is the invocation of method call m by thread T. Before we proceed, we mention a standard notion that we will need on several occasions.

Definition (Projection). Let s be a sequence over alphabet Σ and A ⊆ Σ. By s|A we denote the projection of s on the symbols in A, i.e., the sequence obtained from s by removing all symbols that are not in A.

Definition (History). A (concurrent) history h is a sequence in (I(Σ)_𝕋 ∪ R(Σ)_𝕋)* where (1) no invocation or response event appears more than once, i.e., each method call is invoked at most once, responds at most once, and is executed by at most one thread, and (2) if a response event r(m)_T appears in h, then the corresponding invocation event i(m)_T also appears in h and i(m)_T appears before r(m)_T.

Example. A queue history (left) and its formal representation as a sequence (right):

A history is sequential if every response event is immediately preceded by its matching invocation event and vice versa. Hence, we may ignore thread identifiers and identify a sequential history with a sequence of method calls in Σ*; e.g., the sequence of method calls shown in Figure 1 identifies its sequential history.

A history h is well-formed if h|T is sequential for every thread identifier T, where h|T denotes the projection of h on the set of events that are local to thread T. From now on we use the term history for well-formed history. Also, we may omit thread identifiers if they are not essential in a discussion.

A history determines a partial order on its set of method calls, the precedence order:

Definition (Appears-in Relation, Precedence Order). The set of method calls of a history h is M(h) = {m | i(m)_T appears in h for some thread T}. A method call m appears in h, notation m ∈ h, if m ∈ M(h). The precedence order for h is the partial order <_h such that, for m, n ∈ M(h), we have m <_h n iff the response event of m appears in h before the invocation event of n. By <_h^T we denote the subset of the precedence order that relates pairs of method calls of thread T, i.e., the program order of thread T.
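For instance, in the following history (our own illustration), method call m completes before n and o are invoked, while n and o overlap:

    h = i(m)_{T1} r(m)_{T1} i(n)_{T1} i(o)_{T2} r(n)_{T1} r(o)_{T2}

Here m <_h n and m <_h o, whereas n and o are incomparable under <_h.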

We can characterize a sequential history as a history whose precedence order is total. In particular, the precedence order of a sequential history coincides with the appears-before order <_s of the corresponding sequence s of method calls; for the history in Fig. 1, the precedence order totally orders the method calls exactly as they appear.

Definition (Projection to a set of method calls). Let h be a history and M ⊆ M(h). Then we write h|M for the projection of h to the invocation and response events of the method calls in M.

Note that h|M inherits h's precedence order: <_{h|M} = <_h ∩ (M × M).

A history h is complete if the response event of every invocation event in h appears in h. Given a history h, Complete(h) denotes the set of all completions of h, i.e., the set of all complete histories that are obtained from h by appending missing response events and/or removing pending invocation events. Note that Complete(h) = {h} iff h is a complete history.

A concurrent data structure D over a set of methods Σ is a (prefix-closed) set of concurrent histories over Σ. A history may involve several concurrent objects. Let Q be a set of concurrent objects with individual sets of method calls Σ_q and sequential specifications S(q) for each object q ∈ Q. A history h over Q is a history over the (disjoint) union of the method calls of all objects in Q, i.e., it has the set of method calls {q.m | q ∈ Q, m ∈ Σ_q}. The added prefix q ensures that the union is disjoint. The projection of h to an object q, denoted by h|q, is the history with set of method calls Σ_q obtained by restricting h to the method calls prefixed by q and removing the prefix q from every method call.

Definition (Linearizability [26]). A history h is linearizable wrt the sequential specification S if there is a sequential history s ∈ S and a completion h_c ∈ Complete(h) such that (1) s is a permutation of h_c, and (2) s preserves the precedence order of h_c, i.e., if m <_{h_c} n, then m <_s n. We refer to s as a linearization of h. A concurrent data structure D is linearizable wrt S if every history h of D is linearizable wrt S. A history h over a set Q of concurrent objects is linearizable wrt the sequential specifications S(q) for q ∈ Q if there exists a linearization s of h such that s|q ∈ S(q) for each object q ∈ Q.

3 Local Linearizability

Local linearizability is applicable to containers whose set of method calls Σ is a disjoint union of insertion method calls Ins, removal method calls Rem, data-observation method calls DOb, and (global) shape-observation method calls SOb. Insertions (removals) insert (remove) a single value in the data set V, where removals may also return a version of empty; data observations return a single value in V; shape observations return a value (not necessarily in V) that provides information on the shape of the state, for example, the size of a data structure. Examples of data observations are head (queue), top (stack), and peek (pool). Examples of shape observations are empty(), which returns true if the data structure is empty and false otherwise, and size(), which returns the number of elements in the data structure.

Even though we refrain from formal definitions, we want to stress that a valid sequence of a container remains valid after deleting observer method calls:

    s ∈ S(D)  ⟹  s|(Ins ∪ Rem) ∈ S(D)   (1)

There are also containers with multiple insert/remove methods, e.g., a double-ended queue (deque) is a container with insert-left, insert-right, remove-left, and remove-right methods, to which local linearizability is also applicable. However, local linearizability requires that each method call is either an insertion, a removal, or an observation. As a consequence, a set is not a container according to our definition, as ins(x) on a set acts as a global observer first, checking whether (some version of) x is already in the set, and only if not inserts x. Hash tables are not containers for a similar reason.

Note that requiring each method call of a container to operate on a single value excludes data structures like snapshot objects. It is possible to deal with higher arities in a fairly natural way, however, at the cost of a more complicated presentation. We chose to present local linearizability on simple containers only. We present the definition of local linearizability without shape observations here and discuss shape observations in Appendix A.

Definition (In- and out-methods). Let h be a container history. For each thread T we define two subsets of the method calls in h, called the in-methods I_h(T) and the out-methods O_h(T) of thread T, respectively:

    I_h(T) = {m ∈ Ins | m is performed by T}
    O_h(T) = {m ∈ Rem ∪ DOb | m returns a value inserted by some method call in I_h(T)} ∪ {m ∈ Rem | m returns a version of empty}

Hence, the in-methods of thread T are all insertions performed by T. The out-methods are all removals and data observers that return values inserted by T. Removals that return the value empty are also added to the out-methods of every thread T, as any thread (and hence also T) could be the cause of “inserting” empty. This way, removals of empty serve as a means for global synchronization; without them, each thread could perform all its operations locally without ever communicating with the other threads. Note that the out-methods O_h(T) of thread T need not be performed by T, but they return values that were inserted by T.
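To illustrate (with a history of our own), consider the sequential history h = ins(1)_{T1} ins(2)_{T2} rem(1)_{T2} rem(2)_{T1} rem(empty)_{T1}. Then:

    I_h(T1) = {ins(1)}    O_h(T1) = {rem(1), rem(empty)}
    I_h(T2) = {ins(2)}    O_h(T2) = {rem(2), rem(empty)}

Note that rem(1), although performed by T2, belongs to T1's out-methods, and that rem(empty) belongs to the out-methods of both threads.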

Definition (Thread-induced History). Let h be a history. The thread-induced history h_T is the projection of h to the in- and out-methods of thread T, i.e., h_T = h|(I_h(T) ∪ O_h(T)).

Definition (Local Linearizability). A history h is locally linearizable wrt a sequential specification S if (1) each thread-induced history h_T is linearizable wrt S, and (2) the thread-induced histories form a decomposition of h, i.e., m ∈ h implies m ∈ h_T for some thread T. A data structure D is locally linearizable wrt S if every history h of D is locally linearizable wrt S. A history h over a set Q of concurrent objects is locally linearizable wrt the sequential specifications S(q) for q ∈ Q if each thread-induced history h_T is linearizable wrt the specifications S(q) for q ∈ Q and the thread-induced histories form a decomposition of h, i.e., m ∈ h implies m ∈ h_T for some thread T.

Local linearizability is sequentially correct, i.e., a single-threaded (necessarily sequential) history s is locally linearizable wrt a sequential specification S iff s ∈ S. Like linearizability [25], local linearizability is compositional. The complete proof of the following theorem and missing or extended proofs of all following properties can be found in Appendix B.

Theorem (Compositionality). A history h over a set of objects Q with sequential specifications S(q) for q ∈ Q is locally linearizable iff h|q is locally linearizable wrt S(q) for every q ∈ Q.

Proof (Sketch).

The property follows from the compositionality of linearizability and the fact that (h_T)|q = (h|q)_T for every thread T and object q. ∎

The Choices Made.

Splitting a global history into subhistories and requiring consistency for each of them is central to local linearizability. While this is common in shared-memory consistency conditions [22, 31, 32, 3, 16, 4, 20], our study of local linearizability is a first step in exploring subhistory-based consistency conditions for concurrent objects.

We chose thread-induced subhistories since thread-locality reduces contention in concurrent objects and is known to lead to high performance, as confirmed by our experiments. To assign method calls to thread-induced histories, we took a data-centric point of view by (1) associating data values with threads, and (2) gathering all method calls that insert/return a data value into the subhistory of the associated thread (Def. 3). We associate data values with the thread that inserts them. One can think of alternative approaches, for example, associating with a thread the values that it removes. In our view, the advantages of our choice are clear: First, by assigning inserted values to threads, every value in the history is assigned to some thread. In contrast, in the alternative approach it is not clear where to assign values that are inserted but never removed. Second, assigning inserted values to the inserting thread enables eager removals and ensures progress in locally linearizable data structures. In the alternative approach, even the semantics of removing empty would have to be local.

An orthogonal issue is to assign values from shape observations to threads. In Appendix A, we discuss two meaningful approaches and show how local linearizability can be extended towards shape and data observations that appear in insertion operations of sets.

Finally, we have to choose the consistency condition required of each of the subhistories. We chose linearizability as it is the standard strong consistency condition for concurrent objects.

4 Local Linearizability vs. Linearizability

We now investigate the connection between local linearizability and linearizability.

Proposition 1 (Lin 1).

In general, linearizability does not imply local linearizability.

Proof.

We provide an example of a data structure that is linearizable but not locally linearizable. Consider a sequential specification S which behaves like a queue except when the first two insertions are performed without a removal in between—then the first two elements are removed out of order. Formally, s ∈ S iff (1) s ∈ S_Queue and a removal appears between the first two insertions of s (or s contains at most one insertion), or (2) no removal appears between the first two insertions of s and s is obtained from a sequence in S_Queue by exchanging the removals of the first two inserted values. For example, the sequential history

    h = enq(3)_{T2} deq(3)_{T2} enq(1)_{T1} enq(2)_{T1} deq(1)_{T2} deq(2)_{T2}

is linearizable wrt S. However, T1's induced history h_{T1} = enq(1) enq(2) deq(1) deq(2) is not: it contains no removal between its first two insertions, yet removes the two values in insertion order.
The following condition on a data structure specification is sufficient for linearizability to imply local linearizability and is satisfied, e.g., by pool, queue, and stack.

Definition (Closure under Data-Projection). A sequential specification S over Σ is closed under data-projection iff for all s ∈ S and all sets of values U ⊆ V ∪ Emp, we have s|U ∈ S, where s|U denotes the projection of s to the method calls that involve values in U.

For the specification S above we have enq(3) deq(3) enq(1) enq(2) deq(1) deq(2) ∈ S, but its projection to the values {1, 2}, namely enq(1) enq(2) deq(1) deq(2), is not in S, i.e., S is not closed under data-projection.

Proposition 2 (Lin 2).

Linearizability implies local linearizability for sequential specifications that are closed under data-projection.

Proof (Sketch).

The property follows from closure under data-projection and Equation (1). ∎

There exist corner cases where local linearizability coincides with linearizability, e.g., for single-producer/multiple-consumer histories.

We now turn our attention to pool, queue, and stack.

Proposition 3.

The sequential specifications S_Pool, S_Queue, and S_Stack are closed under data-projection.

Proof (Sketch).

Let s ∈ S_Pool, U ⊆ V ∪ Emp, and let s' = s|U. Then it suffices to check that all pool axioms (Table 1) hold for s'. ∎

Theorem (Pool & Queue & Stack, Lin). For pool, queue, and stack, local linearizability is (strictly) weaker than linearizability.

Proof.

Linearizability implies local linearizability for pool, queue, and stack as a consequence of Proposition 2 and Proposition 3. The history in Figure 2 is locally linearizable but not linearizable wrt pool, queue, and stack (after suitable renaming of method calls). ∎

Although local linearizability wrt a pool does not imply linearizability wrt a pool (Theorem 3), it still guarantees several properties that ensure sane behavior as stated next.

Proposition 4 (LocLin Pool).

Let h be a locally linearizable history wrt a pool. Then:

1. No value is duplicated, i.e., every remove method call appears in h at most once.
2. There are no out-of-thin-air values, i.e., if rem(x) ∈ h for x ∉ Emp, then ins(x) ∈ h and ¬(rem(x) <_h ins(x)).
3. No value is lost, i.e., if rem(e) ∈ h for e ∈ Emp and ins(x) <_h rem(e), then rem(x) ∈ h and ¬(rem(e) <_h rem(x)).

Proof.

By direct unfolding of the definitions. ∎

Note that if a history is linearizable wrt a pool, then all three stated properties hold, as a consequence of linearizability and the definition of S_Pool.

5 Local Linearizability vs. Other Relaxed Consistency Conditions

We compare local linearizability with other classical consistency conditions to better understand its guarantees and implications.

Figure 2: LL, not SC (Pool, Queue, Stack)

Figure 3: SC, not LL (Pool, Queue, Stack)

Sequential Consistency (SC).

A history h is sequentially consistent [25, 30] wrt a sequential specification S if there exists a sequential history s ∈ S and a completion h_c ∈ Complete(h) such that (1) s is a permutation of h_c, and (2) s preserves each thread's program order, i.e., if m <_{h_c}^T n for some thread T, then m <_s n. We refer to s as a sequential witness of h. A data structure D is sequentially consistent wrt S if every history h of D is sequentially consistent wrt S.

Sequential consistency is a useful consistency condition for shared memory but it is not really suitable for data structures as it allows for behavior that excludes any coordination between threads [39]: an implementation of a data structure in which every thread uses a dedicated copy of a sequential data structure without any synchronization is sequentially consistent. A sequentially consistent queue might always return empty in one (consumer) thread as the point in time of the operation can be moved, e.g., see Figure 3. In a producer-consumer scenario such a queue might end up with some threads not doing any work.

Theorem (Pool, Queue & Stack, SC). For pool, queue, and stack, local linearizability is incomparable to sequential consistency. ∎

Figures 2 and 3 give example histories that show the statement of Theorem 5. In contrast to local linearizability, sequential consistency is not compositional [25].

(Quantitative) Quiescent Consistency (QC & QQC).

Like linearizability and sequential consistency, quiescent consistency [13, 25] also requires the existence of a sequential history, a quiescent witness, that satisfies the sequential specification. All three consistency conditions impose an order on the method calls of a concurrent history that a witness has to preserve. Quiescent consistency uses the concept of quiescent states to relax the requirement of preserving the precedence order imposed by linearizability. A quiescent state is a point in a history at which there are no pending invocation events (all invoked method calls have already responded). In a quiescent witness, a method call m has to appear before a method call n if and only if there is a quiescent state between m and n. Method calls between two consecutive quiescent states can be ordered arbitrarily. Quantitative quiescent consistency [27] refines quiescent consistency by bounding the number of reorderings of operations between two quiescent states based on the concurrent behavior between these two states.

The next result about quiescent consistency for pool is needed to establish the connection between quiescent consistency and local linearizability.

Proposition 5.

A pool history h satisfying 1.–3. of Prop. 4 is quiescently consistent. ∎

From Prop. 4 and Prop. 5 it follows that local linearizability implies quiescent consistency for pool.

Theorem (Pool, Queue & Stack, QC). For pool, local linearizability is (strictly) stronger than quiescent consistency. For queue and stack, local linearizability is incomparable to quiescent consistency. ∎

Local linearizability also does not imply the stronger condition of quantitative quiescent consistency. Like local linearizability, quiescent consistency and quantitative quiescent consistency are compositional [25, 27]. For details, please see Appendix D.

Consistency Conditions for Distributed Shared Memory.

There is extensive research on consistency conditions for distributed shared memory [3, 4, 8, 16, 20, 22, 30, 31, 32]. In Appendix E, we compare local linearizability against coherence, PRAM consistency, processor consistency, causal consistency, and local consistency. All these conditions split a history into subhistories and require consistency of the subhistories. For our comparison, we first define a sequential specification S_Mem for a single memory location. We assume that each memory location is preinitialized with a value v_0 ∈ V. A read-operation returns the value of the last write-operation that was performed on the memory location, or v_0 if there was no write-operation. We denote write-operations by ins and read-operations by head. Formally, we define S_Mem as the set of all sequences over {ins(x), head(x) | x ∈ V} in which every head(x) is preceded by ins(x) with no other write in between, or x = v_0 and no write precedes head(x). Note that read-operations are data observations and the same value can be read multiple times. For brevity, we only consider histories that involve a single memory location. In the following, we summarize our comparison. For details, please see Appendix E.

Figure 4: Problematic shared-memory history.

While local linearizability is well-suited for concurrent data structures, the mentioned shared-memory consistency conditions are not necessarily so. Conversely, local linearizability appears to be problematic for shared memory. Consider the locally linearizable history in Figure 4. There, the values read by a single thread oscillate between values that were written by different threads. Therefore, local linearizability does not imply any of the shared-memory consistency conditions. In Appendix E, we further show that local linearizability is incomparable to all considered shared-memory conditions.
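The history of Figure 4 is not reproduced in this rendering; a history of the following shape (our own reconstruction of the described effect) is locally linearizable wrt S_Mem, yet the reads of thread T1 oscillate between the writes of the two threads:

    h = ins(1)_{T1} ins(2)_{T2} head(1)_{T1} head(2)_{T1} head(1)_{T1}

Here h_{T1} = ins(1) head(1) head(1) and h_{T2} = ins(2) head(2) are both linearizable wrt S_Mem (head(2), although performed by T1, reads a value written by T2 and thus belongs to T2's induced history), but no single per-location order explains T1 reading 1, then 2, then 1 again.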

6 Locally Linearizable Implementations

In this section, we focus on locally linearizable data structure implementations that are generic in the following sense: choose a linearizable implementation Φ of a data structure wrt a sequential specification S, and we turn it into a (distributed) data structure called LLD Φ that is locally linearizable wrt S. An LLD Φ implementation takes several copies of Φ (that we call backends) and assigns to each thread T a backend Φ_T. Then, when thread T inserts an element into LLD Φ, the element is inserted into Φ_T, and when an arbitrary thread removes an element from LLD Φ, the element is removed from some Φ_T eagerly, i.e., if no element is found in the attempted backend, the search for an element continues through all other backends. If no element is found in one round through the backends, then we return empty.

Proposition 6 (LLD correctness).

Let Φ be a data structure implementation that is linearizable wrt a sequential specification S. Then LLD Φ is locally linearizable wrt S.

Proof.

Let h be a history of LLD Φ. The crucial observation is that each thread-induced history h_T is a backend history of Φ and hence linearizable wrt S. ∎

Any number of copies (backends) is allowed in this generic implementation of LLD Φ. If we take just one copy, we end up with a linearizable implementation. Also, any way of choosing a backend for removals is fine. However, both the number of backends and the backend-selection strategy upon removals affect performance significantly. In our LLD implementations we use one backend per thread, resulting in no contention on insertions, and always attempt a local remove first. If this does not return an element, then we continue searching through all other backends, starting from a randomly chosen one.
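The scheme is easy to instantiate. The following minimal Java sketch is our own illustration, not the paper's Scal/C++ code; the class name LLDQueue is ours, and java.util.concurrent.ConcurrentLinkedQueue (a linearizable Michael-Scott-style queue) plays the role of the backend Φ_T:

    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.CopyOnWriteArrayList;
    import java.util.concurrent.ThreadLocalRandom;

    // Illustrative sketch of the LLD construction under the assumptions above.
    // Inserts go to the caller's own backend; removes try the local backend
    // first and then make one round through the other backends.
    public final class LLDQueue<E> {
        private final CopyOnWriteArrayList<ConcurrentLinkedQueue<E>> backends =
            new CopyOnWriteArrayList<>();
        private final ThreadLocal<ConcurrentLinkedQueue<E>> local =
            ThreadLocal.withInitial(() -> {
                ConcurrentLinkedQueue<E> b = new ConcurrentLinkedQueue<>();
                backends.add(b); // register this thread's backend on first use
                return b;
            });

        public void enqueue(E x) {
            local.get().offer(x); // contention-free: only the owner inserts here
        }

        public E dequeue() {
            E x = local.get().poll(); // eager: always attempt a local remove first
            if (x != null) return x;
            int n = backends.size();
            if (n == 0) return null;
            int start = ThreadLocalRandom.current().nextInt(n);
            for (int i = 0; i < n; i++) { // one round over all backends
                x = backends.get((start + i) % n).poll();
                if (x != null) return x;
            }
            return null; // "empty" -- note: not a linearizable emptiness check
        }
    }

Each thread-induced history of this structure is a history of a single backend, which is exactly the observation behind Proposition 6; instantiating the scheme with one backend degenerates to the linearizable original.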

LLD Φ is closely related to Distributed Queues (DQs) [18]. A DQ is a (linearizable) pool that is organized as a single segment of ℓ backends. DQs come in different flavours depending on how insert and remove methods are distributed across the segment when accessing backends. No DQ variant in [18] follows the LLD approach described above. Moreover, while DQ algorithms are implemented for a fixed number of backends, LLD implementations manage a segment of variable size, one backend per (active) thread. Note that the strategy of selecting backends in the LLD implementations is similar to other work on work stealing [35]. However, in contrast to that work, our data structures neither duplicate nor lose elements. LLD stack implementations have been successfully applied for managing free lists in the fast and scalable memory allocator scalloc [5]. The guarantees provided by local linearizability are not needed for the correctness of scalloc, i.e., the free lists could also use a weak pool (a pool without a linearizable emptiness check). However, the LLD stack implementations provide good caching behavior when threads operate on their local stacks, whereas a weak pool would potentially impact performance negatively.

We have implemented LLD variants of strict and relaxed queue and stack implementations. None of our implementations involves observation methods, but the LLD algorithm can easily be extended to support observation methods; for details, please see the appendix on LLD and observers. Finally, let us note that we have also experimented with other locally linearizable implementations that lacked the genericity of the LLD implementations and whose performance evaluation did not show promising results (see the appendix on implementations). As shown in Sec. 4, a locally linearizable pool is not a linearizable pool, i.e., it lacks a linearizable emptiness check. Indeed, LLD implementations do not provide a linearizable emptiness check, despite eager removes. We provide LL+D Φ, a variant of LLD Φ, that provides a linearizable emptiness check under mild conditions on the starting implementation Φ (see the appendix for details).

Experimental Evaluation.

All experiments ran on a uniform memory architecture (UMA) machine with four 10-core 2GHz Intel Xeon E7-4850 processors supporting two hardware threads (hyperthreads) per core, 128GB of main memory, and Linux kernel version 3.8.0. We also ran the experiments without hyperthreading, resulting in no noticeable difference. The CPU governor was disabled. All measurements were obtained from the artifact-evaluated Scal benchmarking framework [12, 19, 11], which also contains the code of all involved data structures. Scal uses preallocated memory (without freeing it) to avoid memory-management artifacts. For all measurements we report the arithmetic mean and the confidence interval (sample size 10, corrected sample standard deviation).

In our experiments, we consider the linearizable queues Michael-Scott queue (MS) [34] and LCRQ [36] (improved version [37]), the linearizable stacks Treiber stack (Treiber) [42] and TS stack [14], the k-out-of-order relaxed k-FIFO queue [28] and k-Stack [23], and linearizable well-performing pools based on distributed queues using random balancing [18] (1-RA DQ for queue, and 1-RA DS for stack). For each of these implementations (except the pools) we provide LLD variants (LLD LCRQ, LLD TS stack, LLD k-FIFO, and LLD k-Stack) and, when possible, LL+D variants (LL+D MS queue and LL+D Treiber stack). Making the pools locally linearizable is not promising as they are already distributed. Whenever LL+D Φ is achievable for a data structure implementation Φ, we present only results for LL+D Φ as, in our workloads, LLD Φ and LL+D Φ perform with no visible difference.

We evaluate the data structures on a Scal producer-consumer benchmark where each producer and consumer is configured to execute a fixed number of operations. To control contention, we add a short busy wait between operations. This is important as too-high contention results in measuring hardware or operating-system (e.g., scheduling) artifacts. The number of threads ranges between 2 and 80 (the number of hardware threads), half of which are producers and half consumers. To relate performance and scalability we report the number of data structure operations per second. Data structures that require parameters to be set are configured to allow maximum parallelism for the producer-consumer workload with 80 threads. This results in k = 80 for all k-FIFO and k-Stack variants (40 producers and 40 consumers in parallel on a single segment) and 80 backends for 1-RA DQ and 1-RA DS (40 producers and 40 consumers in parallel on different backends). The TS Stack algorithm also needs to be configured with a delay parameter. We use the optimal delay for the TS Stack and zero delay for the LLD TS Stack, as delays degrade the performance of the LLD implementation.

[Plot: throughput (operations per second) for 2–80 threads; legend: MS, LCRQ, k-FIFO, LL+D MS, LLD LCRQ, LLD k-FIFO, 1-RA DQ]

(a) “queue-like” data structures

[Plot: throughput (operations per second) for 2–80 threads; legend: Treiber, TS Stack, k-Stack, LL+D Treiber, LLD TS Stack, LLD k-Stack, 1-RA DS]

(b) “stack-like” data structures

Figure 5: Performance and scalability of producer-consumer microbenchmarks with an increasing number of threads on a 40-core (2 hyperthreads per core) machine

Figure 5 shows the results of the producer-consumer benchmarks. Similar to experiments performed elsewhere [14, 23, 28, 36], the well-known algorithms MS and Treiber do not scale for 10 or more threads. The state-of-the-art linearizable queue and stack algorithms LCRQ and TS-interval Stack either perform competitively with their k-out-of-order relaxed counterparts k-FIFO and k-Stack or even outperform and outscale them. For any implementation Φ, LLD Φ and LL+D Φ (when available) perform and scale significantly better than Φ does, even slightly better than the state-of-the-art pools that we compare to. The LL+D variants of the MS queue and the Treiber stack show the best improvement. The speedup of the locally linearizable implementations over the fastest linearizable queue (LCRQ) and stack (TS Stack) implementations at 80 threads is 2.77 and 2.64, respectively. The performance degradation for LCRQ between 30 and 70 threads aligns with the performance of fetch-and-inc—the CPU instruction that atomically retrieves and modifies the contents of a memory location—on the benchmarking machine, which differs from the machine used in the original LCRQ evaluation [36]. LCRQ uses fetch-and-inc as its key atomic instruction.

7 Conclusion & Future Work

Local linearizability splits a history into a set of thread-induced histories and requires consistency of each of them. This yields an intuitive consistency condition for concurrent objects that enables new data structure implementations with superior performance and scalability. Local linearizability has desirable properties like compositionality and well-behavedness for container-type data structures. As future work, it would be interesting to investigate the guarantees that local linearizability provides to client programs, along the lines of [15].

Acknowledgments

This work has been supported by the National Research Network RiSE on Rigorous Systems Engineering (Austrian Science Fund (FWF): S11402-N23, S11403-N23, S11404-N23, S11411-N23), a Google PhD Fellowship, an Erwin Schrödinger Fellowship (Austrian Science Fund (FWF): J3696-N26), EPSRC grants EP/H005633/1 and EP/K008528/1, the Vienna Science and Technology Fund (WWTF) through grant PROSEED, the European Research Council (ERC) under grant 267989 (QUAREM), and the Austrian Science Fund (FWF) under grant Z211-N23 (Wittgenstein Award).

Appendix A Local Linearizability with Shape Observers

There are two possible ways to deal with shape observers: treat them locally, in the thread-induced history of the performing thread, or treat them globally. While a local treatment is immediate and natural to a local consistency condition, a global treatment requires care. We present both solutions next.

Definition (Local Linearizability LSO). A history h is locally linearizable with local shape observers (LSO) wrt a sequential specification S if it is locally linearizable according to the definition in Sec. 3, with the difference that the in-methods of thread T also contain all shape observers performed by T, i.e., I_h(T) = {m ∈ Ins ∪ SOb | m is performed by T}.

Global observations require more notation and auxiliary notions. Let s_j for 1 ≤ j ≤ n be a collection of sequences over alphabet Σ with pairwise disjoint sets of symbols A_j. A sequence s is an interleaving of the s_j if s|A_j = s_j for all 1 ≤ j ≤ n and every symbol of s occurs in some A_j. We write s_1 ∥ ⋯ ∥ s_n for the set of all interleavings of s_1, …, s_n.
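For instance, for s_1 = a b and s_2 = c with disjoint symbols:

    s_1 ∥ s_2 = { a b c, a c b, c a b }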

Given a history h and a method call m, we write h↾m for the (incomplete) history that is the prefix of h up to, and without, r(m), the response event of m. Hence, h↾m contains all invocation and response events of h that appear before r(m).

Definition. Let S(D) denote the sequential specification of a container D. A shape observer m in a history h has a witness if there exists a sequence s such that s·m ∈ S(D) and s ∈ s_1 ∥ ⋯ ∥ s_n for sequences s_j, each of which is a linearization of a thread-induced history (h↾m)_{T_j}.

Informally, the above definition states that a global shape observer must be justified by a (global) witness. Such a global witness is a sequence that (1) when extended by the shape observer belongs to the sequential specification, and (2) is an interleaving of linearizations of the thread-induced histories up to the shape observer.

Definition (Local Linearizability GSO). A history h is locally linearizable with global shape observers (GSO) wrt a sequential specification S if it is locally linearizable and each shape observer in h has a witness.

We illustrate the difference between the local and the global approach to shape observers with the following example.

Example. Consider a queue history with a global observer size(n), where n is just a placeholder for a concrete natural number. For one value of n, the history is locally linearizable LSO but not locally linearizable GSO. For another value of n, the history is locally linearizable GSO but not locally linearizable LSO.

Global observers and non-disjoint operations can be expected to have a negative impact on performance. If one cares about global consistency, local linearizability is not the consistency condition to use. The restriction to containers and disjoint operations specifies, in an informal way, the minimal requirements for local consistency to be acceptable.

Neither sets nor maps are containers according to our definition. However, it is possible to extend our treatment to sets and maps similar to our treatment of global observers. Locally linearizable sets and maps will be weaker than their linearizable counterparts, but, due to the tight coupling between mutator and observer effects, the gain in performance is unlikely to be as substantial as the one observed in other data structures. The technicalities needed to extend local linearizability to sets and maps would complicate the theoretical development without considerable benefits and we, therefore, excluded such data structures.

Appendix B Additional Results and Proofs

Theorem 3 (Compositionality). A history h over a set of objects Q with sequential specifications S(q) for q ∈ Q is locally linearizable if and only if h|q is locally linearizable with respect to S(q) for every q ∈ Q.

Proof.

The property follows from the compositionality of linearizability and the fact that (h_T)|q = (h|q)_T for every thread T and object q. Assume that h over Q is locally linearizable. This means that all thread-induced histories h_T over Q are linearizable. Hence, since linearizability is compositional, for each object q the history (h_T)|q is linearizable with respect to S(q). Now, from (h_T)|q = (h|q)_T we get that for every object q the history (h|q)_T is linearizable for every thread T, i.e., h|q is locally linearizable.

Similarly, assume that for every object q the history h|q is locally linearizable. Then, for every q, (h|q)_T = (h_T)|q is linearizable for every thread T. From the compositionality of linearizability, h_T is linearizable for every thread T. This proves that h is locally linearizable. ∎

Proposition 2 (Lin vs. LocLin 2). Linearizability implies local linearizability for sequential specifications that are closed under data-projection.

Proof.

Assume we are given a history h which is linearizable with respect to a sequential specification S that is closed under data-projection. Further assume that, without loss of generality, h is complete. Then there exists a sequential history s ∈ S such that (1) s is a permutation of h, and (2) if m <_h n, then also m <_s n. Given a thread T, consider the thread-induced history h_T and let s_T = s|(I_h(T) ∪ O_h(T)). Then, s_T is a permutation of h_T since s_T and h_T consist of the same method calls. Furthermore, s_T ∈ S since S is closed under data-projection and since Equation (1) holds for containers. Finally, we have for each m and n in h_T that, if m <_{h_T} n, then also m <_h n and therefore m <_s n, which implies m <_{s_T} n. Thereby, we have shown that h_T is linearizable with respect to S, for an arbitrary thread T. Hence h is locally linearizable with respect to S. ∎

Proposition 3 (Data-Projection Closedness). The sequential specifications of pool, queue, and stack are closed under data-projection.

Proof.

Let s ∈ S_Pool, U ⊆ V ∪ Emp, and let s' = s|U.

Then, it suffices to check that all pool axioms (Table 1) hold for s'. Clearly, all method calls in s' appear at most once, as they do so in s. If rem(x) ∈ s' for x ∉ Emp, then rem(x) ∈ s and, since s ∈ S_Pool, ins(x) <_s rem(x). But then, as x ∈ U, also ins(x) ∈ s' and hence ins(x) <_{s'} rem(x). Finally, if ins(x) <_{s'} rem(e) for e ∈ Emp, then ins(x) <_s rem(e), implying rem(x) <_s rem(e). But then rem(x) ∈ s' as well and rem(x) <_{s'} rem(e). This shows that S_Pool is closed under data-projection.

Assume now that s ∈ S_Queue and s' is as before (with enq and deq in place of ins and rem, respectively). Then, as S_Pool is closed under data-projection, s' satisfies the pool axioms. Moreover, the queue order axiom (Table 1) also holds: assume enq(x) <_{s'} enq(y) and deq(y) ∈ s'. Then enq(x) <_s enq(y) and deq(y) ∈ s. Since s ∈ S_Queue we get deq(x) ∈ s and deq(x) <_s deq(y). But this means, as x ∈ U, deq(x) ∈ s' and deq(x) <_{s'} deq(y). Hence, S_Queue is closed under data-projection.

Finally, if s ∈ S_Stack and s' is as before (with push and pop in place of ins and rem, respectively), we need to check that the stack order axiom (Table 1) holds. Assume push(x) <_{s'} push(y) <_{s'} pop(x). This implies push(x) <_s push(y) <_s pop(x), and since s ∈ S_Stack we get pop(y) ∈ s and pop(y) <_s pop(x). But then pop(y) ∈ s' and pop(y) <_{s'} pop(x). So, S_Stack is closed under data-projection. ∎

Proposition 4 (LocLin Pool). Let h be a locally linearizable history wrt a pool. Then:

  1. No value is duplicated, i.e., every remove method call appears in h at most once.

  2. There are no out-of-thin-air values, i.e., if rem(x) ∈ h for x ∉ Emp, then ins(x) ∈ h and ¬(rem(x) <_h ins(x)).

  3. No value is lost, i.e., if rem(e) ∈ h for e ∈ Emp and ins(x) <_h rem(e), then rem(x) ∈ h and ¬(rem(e) <_h rem(x)).

Proof.

Note that if a history is linearizable wrt a pool, then all three stated properties hold, as a consequence of linearizability and the definition of S_Pool. Now assume that h is locally linearizable wrt a pool.

If rem(x) appears twice in h, then it also appears twice in some thread-induced history h_T, contradicting that h_T is linearizable with respect to a pool. This shows that no value is duplicated.

If rem(x) ∈ h for x ∉ Emp, then rem(x) ∈ h_T for some thread T and, since h_T is linearizable with respect to a pool, ins(x) ∈ h_T and ¬(rem(x) <_{h_T} ins(x)). This yields ins(x) ∈ h and ¬(rem(x) <_h ins(x)). Hence, there are no thin-air values.

Finally, assume rem(e) ∈ h for e ∈ Emp and ins(x) <_h rem(e) for some x. Let T be the thread that performed ins(x). Then ins(x) and rem(e) belong to h_T and ins(x) <_{h_T} rem(e). Since h_T is linearizable with respect to a pool, rem(x) ∈ h_T and ¬(rem(e) <_{h_T} rem(x)). This yields rem(x) ∈ h and ¬(rem(e) <_h rem(x)). Hence, no value is lost. ∎

Theorem (Queue Local Linearizability). A queue concurrent history h is locally linearizable with respect to the queue sequential specification S_Queue if and only if

  1. h is locally linearizable with respect to the pool sequential specification S_Pool, and

  2. for all values x, y and every thread T: if enq(x) <_h^T enq(y) and deq(y) ∈ h, then deq(x) ∈ h and ¬(deq(y) <_h deq(x)).

Proof.

Assume h is locally linearizable with respect to S_Queue. Since S_Queue ⊆ S_Pool (with suitably renamed method calls), h is locally linearizable with respect to S_Pool. Moreover, since all h_T are linearizable with respect to S_Queue, by the characterization of queue linearizability, for all T we have: if enq(x) <_{h_T} enq(y) and deq(y) ∈ h_T, then deq(x) ∈ h_T and ¬(deq(y) <_{h_T} deq(x)).

Assume x, y are such that enq(x) <_h^T enq(y) and deq(y) ∈ h. Then enq(x) <_{h_T} enq(y) and deq(y) ∈ h_T, and so deq(x) ∈ h_T and ¬(deq(y) <_{h_T} deq(x)). This implies deq(x) ∈ h and ¬(deq(y) <_h deq(x)).

For the opposite direction, assume that conditions 1. and 2. hold for a history h. We need to show that (1) the thread-induced histories h_T form a decomposition of h, which is clear for a queue, and (2) each h_T is linearizable with respect to S_Queue.

By 1., each h_T is linearizable with respect to a pool. Assume enq(x) <_{h_T} enq(y) and deq(y) ∈ h_T. Then enq(x) <_h^T enq(y) and hence, by 2., deq(x) ∈ h and ¬(deq(y) <_h deq(x)). Again, as deq(x) ∈ h_T, we get ¬(deq(y) <_{h_T} deq(x)). According to the characterization of queue linearizability, this is enough to conclude that each h_T is linearizable with respect to S_Queue. ∎

Theorem 5 (Pool, Queue, & Stack, SC). For pool, queue, and stack, local linearizability is incomparable to sequential consistency.

Proof.

The following histories, when instantiating i(x) with ins(x), enq(x), and push(x), respectively, and instantiating r(x) with rem(x), deq(x), and pop(x), respectively, are sequentially consistent but not locally linearizable wrt pool, queue, and stack (the concrete histories of the original figures are not preserved in this rendering; the histories below serve the same purpose):

  • Pool (a): thread T1 performs i(1); after i(1) has responded, thread T2 performs r(empty). A sequential witness is r(empty) i(1).

  • Queue (b): thread T1 performs i(1), i(2), and finally r(1); thread T2 performs r(2), invoked after i(2) has responded and responding before r(1) is invoked. A sequential witness is i(1) i(2) r(1) r(2).

  • Stack (c): thread T1 performs i(1), i(2), and finally r(2); thread T2 performs r(1), invoked after i(2) has responded and responding before r(2) is invoked. A sequential witness is i(1) r(1) i(2) r(2).

History (a) is already not locally linearizable wrt pool, queue, and stack, respectively: the induced history of T1 is i(1) r(empty), which violates the emptiness axiom. Histories (b) and (c) provide interesting examples: in (b), the induced history of T1 is i(1) i(2) r(2) r(1), which violates the queue order axiom; in (c), it is i(1) i(2) r(1) r(2), which violates the stack order axiom. The history in Figure 2 is locally linearizable but not sequentially consistent wrt a pool (renaming enq/deq to ins/rem in history 1. below yields such a history). The following histories are locally linearizable but not sequentially consistent wrt a queue and a stack, respectively:

  1. Queue: thread T1 performs enq(1), deq(empty), deq(2); thread T2 performs enq(2), deq(1), such that enq(2) and deq(1) both overlap deq(empty).

The two thread-induced histories h_{T1} and h_{T2} are both linearizable with respect to a queue, with linearizations enq(1) deq(1) deq(empty) and deq(empty) enq(2) deq(2), respectively. However, the overall history has no sequential witness and is therefore not sequentially consistent: in any witness, enq(1) appears before deq(empty) by the program order of T1, so deq(1) must appear before deq(empty) to maintain the queue behavior; by the program order of T2, enq(2) appears before deq(1) and hence before deq(empty), so deq(2) would also have to appear before deq(empty), which contradicts the program order of T1, placing deq(2) after deq(empty).

  2. Stack: thread T1 performs push(1), pop(empty), pop(2); thread T2 performs push(2), pop(1), with the same overlap structure as in the queue history.

The two thread-induced histories h_{T1} and h_{T2} are both linearizable with respect to a stack, with linearizations push(1) pop(1) pop(empty) and pop(empty) push(2) pop(2), respectively. The same emptiness argument as for the queue shows that the overall history has no sequential witness and hence is not sequentially consistent. ∎

Proposition 5 (Pool, QC). Let h be a pool history in which no value is duplicated, no thin-air values are returned, and no value is lost, i.e., h satisfies 1.–3. of Proposition 4. Then h is quiescently consistent.

Proof.

Assume h is a pool history that satisfies 1.–3. of Proposition 4. Let h_1, …, h_n be histories that form a sequential decomposition of h. That is, h = h_1 h_2 ⋯ h_n and the only quiescent states in any h_j are at its beginning and at its end. Note that this decomposition has nothing to do with a thread-local decomposition. Let M_j be the set of method calls of h_j, for 1 ≤ j ≤ n. Note that the sanity conditions 1.–3. ensure that none of the following two situations can happen:

  • rem(x) ∈ M_j and ins(x) ∈ M_k for j < k, i.e., a value is removed in a segment strictly before the segment in which it is inserted,

  • rem(e) ∈ M_j for e ∈ Emp, ins(x) ∈ M_k, and rem(x) ∈ M_l for k < j < l, i.e., empty is removed in a segment strictly between the segments in which some value is inserted and removed.

Let