Local Linearizability
Abstract
The semantics of concurrent data structures is usually given by a sequential specification and a consistency condition. Linearizability is the most popular consistency condition due to its simplicity and general applicability. Nevertheless, for applications that do not require all guarantees offered by linearizability, recent research has focused on improving performance and scalability of concurrent data structures by relaxing their semantics.
In this paper, we present local linearizability, a relaxed consistency condition that is applicable to container-type concurrent data structures like pools, queues, and stacks. While linearizability requires that the effect of each operation is observed by all threads at the same time, local linearizability only requires that for each thread T, the effects of its local insertion operations and the effects of those removal operations that remove values inserted by T are observed by all threads at the same time. We investigate theoretical and practical properties of local linearizability and its relationship to many existing consistency conditions. We present a generic implementation method for locally linearizable data structures that uses existing linearizable data structures as building blocks. Our implementations show performance and scalability improvements over the original building blocks and outperform the fastest existing container-type implementations.
Andreas Haas¹, Thomas A. Henzinger², Andreas Holzer³, Christoph M. Kirsch⁴, Michael Lippautz¹, Hannes Payer¹, Ali Sezgin⁵, Ana Sokolova⁴, Helmut Veith⁶﹐⁷
¹Google Inc. ²IST Austria, Austria ³University of Toronto, Canada ⁴University of Salzburg, Austria ⁵University of Cambridge, UK ⁶Vienna University of Technology, Austria ⁷Forever in our hearts
D.3.1 [Programming Languages]: Formal Definitions and Theory—Semantics; E.1 [Data Structures]: Lists, stacks, and queues; D.1.3 [Software]: Programming Techniques—Concurrent Programming
1 Introduction
Concurrent data structures are pervasive all along the software stack, from operating system code to application software and beyond. Both correctness and performance are imperative for concurrent data structure implementations. Correctness is usually specified by relating concurrent executions, admitted by the implementation, with sequential executions, admitted by the sequential version of the data structure. The latter form the sequential specification of the data structure. This relationship is formally captured by consistency conditions, such as linearizability, sequential consistency, or quiescent consistency [25].
Linearizability [26] is the most accepted consistency condition for concurrent data structures due to its simplicity and general applicability. It guarantees that the effects of all operations by all threads are observed consistently. This global visibility requirement imposes the need for extensive synchronization among threads, which may in turn jeopardize performance and scalability. In order to enhance performance and scalability of implementations, recent research has explored relaxed sequential specifications [23, 40, 2], resulting in well-performing implementations of concurrent data structures [2, 18, 23, 28, 38, 6]. Except for [27], the space of alternative consistency conditions that relax linearizability has been left unexplored to a large extent. In this paper, we explore (part of) this gap by investigating local linearizability, a novel consistency condition that is applicable to a large class of concurrent data structures that we call container-type data structures, or containers for short. Containers include pools, queues, and stacks. A fine-grained spectrum of consistency conditions enables us to describe the semantics of concurrent implementations more precisely, e.g., we show in our appendix that work-stealing queues [35], which could only be proven to be linearizable wrt a pool, are actually locally linearizable wrt a double-ended queue.
Figure 1. The thread-induced history of thread T1 is enclosed by a dashed line, while the thread-induced history of thread T2 is enclosed by a solid line.
Local linearizability is a (thread-)local consistency condition that guarantees that insertions per thread are observed consistently. While linearizability requires a consistent view over all insertions, we only require that projections of the global history—so-called thread-induced histories—are linearizable. The induced history of a thread T is a projection of a program execution to the insert operations performed by T, combined with all remove operations that remove values inserted by T, irrespective of whether they are performed by T or not. Then, the program execution is locally linearizable iff each thread-induced history is linearizable. Consider the example (sequential) history depicted in Figure 1. It is not linearizable wrt a queue since the values are not dequeued in the same order as they were enqueued. However, each thread-induced history is linearizable wrt a queue and, therefore, the overall execution is locally linearizable wrt a queue. In contrast to semantic relaxations based on relaxing sequential semantics such as [23, 2], local linearizability coincides with sequential correctness for single-threaded histories, i.e., a single-threaded and, therefore, sequential history is locally linearizable wrt a given sequential specification if and only if it is admitted by the sequential specification.
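To make the thread-induced view concrete, here is a small executable sketch in the spirit of Figure 1; the concrete values and thread names are our own illustration, not taken from the figure. The history violates FIFO order globally, yet every thread-induced history is a valid queue history.

```python
from collections import deque

def induced(history, thread):
    """Project a sequential history onto a thread's in- and out-methods:
    its own enqueues plus the dequeues of values it enqueued."""
    mine = {v for (op, v, t) in history if op == "enq" and t == thread}
    return [(op, v) for (op, v, t) in history
            if (op == "enq" and t == thread) or (op == "deq" and v in mine)]

def is_fifo(seq_history):
    """Check a sequential history against the queue specification."""
    q = deque()
    for op, v in seq_history:
        if op == "enq":
            q.append(v)
        elif not q or q.popleft() != v:
            return False
    return True

# T1 enqueues 1 and 2, T2 enqueues 3; dequeues return 3, 1, 2.
# The thread performing a dequeue is recorded but irrelevant here.
h = [("enq", 1, "T1"), ("enq", 2, "T1"), ("enq", 3, "T2"),
     ("deq", 3, "T2"), ("deq", 1, "T2"), ("deq", 2, "T2")]

assert not is_fifo([(op, v) for (op, v, _) in h])         # not linearizable wrt queue
assert all(is_fifo(induced(h, t)) for t in ("T1", "T2"))  # locally linearizable
```

Value 3 overtakes values 1 and 2 in the global history, but neither reordering is visible within any single thread-induced history.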
Local linearizability is to linearizability what coherence is to sequential consistency. Coherence [22], which is almost universally accepted as the absolute minimum that a shared memory system should satisfy, is the requirement that there exists a unique global order per shared memory location. Thus, while all accesses by all threads to a given memory location have to conform to a unique order, consistent with program order, the relative ordering of accesses to multiple memory locations does not have to be the same. In other words, coherence is sequential consistency per memory location. Similarly, local linearizability is linearizability per thread-induced history. In our view, local linearizability offers enough consistency for the correctness of many applications, as it is the local view of the client that often matters. For example, in a locally linearizable queue each client (thread) has the impression of using a perfect queue—no reordering will ever be observed among the values inserted by a single thread. Such guarantees suffice for many e-commerce and cloud applications. Implementations of locally linearizable data structures have been successfully applied for managing free lists in the design of the fast and scalable memory allocator scalloc [5]. Moreover, except for fairness, locally linearizable queues guarantee all properties required from Dispatch Queues [1], a common concurrency programming mechanism on mobile devices.
In this paper, we study theoretical and practical properties of local linearizability. Local linearizability is compositional—a history over multiple concurrent objects is locally linearizable iff all per-object histories are locally linearizable (see Thm. 3)—and locally linearizable container-type data structures, including queues and stacks, admit only “sane” behaviors—no duplicated values, no values returned from thin air, and no values lost (see Prop. 4). Local linearizability is a weakening of linearizability for a natural class of data structures including pools, queues, and stacks (see Sec. 4). We compare local linearizability to linearizability, sequential consistency, and quiescent consistency, and to many shared-memory consistency conditions.
Finally, local linearizability leads to new efficient implementations. We present a generic implementation scheme that, given a linearizable implementation of a sequential specification S, produces an implementation that is locally linearizable wrt S (see Sec. 6). Our implementations show dramatic improvements in performance and scalability. In most cases the locally linearizable implementations scale almost linearly and even outperform state-of-the-art pool implementations. We produced locally linearizable variants of state-of-the-art concurrent queues and stacks, as well as of the relaxed data structures from [23, 28]. The latter are relaxed in two dimensions: they are locally linearizable (the consistency condition is relaxed) and out-of-order-relaxed (the sequential specification is relaxed). The speedup of the locally linearizable implementation over the fastest linearizable queue (LCRQ) and stack (TS stack) implementation at 80 threads is 2.77 and 2.64, respectively. Verification of local linearizability, i.e., proving correctness, for each of our new locally linearizable implementations is immediate, given that the starting implementations are linearizable.
2 Semantics of Concurrent Objects
The common approach to define the semantics of an implementation of a concurrent data structure is (1) to specify a set of valid sequential behaviors—the sequential specification, and (2) to relate the admissible concurrent executions to the sequential executions specified by the sequential specification—via the consistency condition. That means that an implementation of a concurrent data structure actually corresponds to several sequential data structures, and vice versa, depending on the consistency condition used. A (sequential) data structure is an object D with a set of method calls Σ. We assume that method calls include parameters, i.e., input and output values from a given set V of values. The sequential specification S(D) of D is a prefix-closed subset of Σ*. The elements of S(D) are called valid sequences. For ease of presentation, we assume that each value in a data structure can be inserted and removed at most once. This is without loss of generality, as we may see the set of values as consisting of pairs of elements (core values) and version numbers, i.e., V = E × ℕ for the set E of elements. Note that this is a technical assumption that only makes the presentation and the proofs simpler; it is neither needed nor done in locally linearizable implementations. While elements may be inserted and removed multiple times, the version numbers provide uniqueness of values. Our assumption ensures that whenever a sequence s is part of a sequential specification S(D), each method call in s appears exactly once. An additional core value, which is not an element, is empty. It is returned by remove method calls that do not find an element to return. We denote by Emp the set of values that are versions of empty, i.e., Emp = {empty} × ℕ.
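The versioning trick can be made concrete in a few lines (the names are ours, for illustration only): pairing each core value with a fresh version number makes every inserted value unique even when the same element is inserted repeatedly.

```python
from itertools import count

# A global counter hands out fresh version numbers.
_versions = count()

def versioned(core):
    """Turn a core value into a unique (core, version) pair."""
    return (core, next(_versions))

a, b = versioned("x"), versioned("x")
assert a != b          # distinct values ...
assert a[0] == b[0]    # ... sharing the same core value
```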
Definition (Appears-before Order, Appears-in Relation). Given a sequence s in which each method call appears exactly once, we denote by <_s the total appears-before order over the method calls in s. Given a method call m, we write m ∈ s if m appears in s.
Throughout the paper, we will use pool, queue, and stack as typical examples of containers. We give their sequential specifications in an axiomatic way [24], i.e., as sets of axioms that exactly define the valid sequences.
Table 1. Axioms for pool, queue, and stack, where x, y range over V, e ranges over Emp, and i/r stand for generic insertion/removal methods:

(1) every method call appears at most once in s
(2) r(x) ∈ s ⇒ i(x) <_s r(x)
(3) i(x) <_s r(e) ⇒ r(x) <_s r(e)
(4) i(x) <_s i(y) ∧ r(y) ∈ s ⇒ r(x) ∈ s ∧ r(x) <_s r(y)
(5) i(x) <_s i(y) <_s r(x) ⇒ r(y) ∈ s ∧ r(y) <_s r(x)
Definition (Pool, Queue & Stack). A pool, queue, and stack with values in a set V have the sets of method calls {ins(x), rem(x) | x ∈ V} ∪ {rem(e) | e ∈ Emp}, {enq(x), deq(x) | x ∈ V} ∪ {deq(e) | e ∈ Emp}, and {push(x), pop(x) | x ∈ V} ∪ {pop(e) | e ∈ Emp}, respectively. We denote the sequential specification of a pool by S_Pool, the sequential specification of a queue by S_Queue, and the sequential specification of a stack by S_Stack. A sequence belongs to S_Pool iff it satisfies axioms (1)–(3) in Table 1—the pool axioms—when instantiating i(x) with ins(x) and r(x) with rem(x). We keep axiom (1) for completeness, although it is subsumed by our assumption that each value is inserted and removed at most once. Specification S_Queue contains all sequences that satisfy the pool axioms and axiom (4)—the queue order axiom—after instantiating i(x) with enq(x) and r(x) with deq(x). Finally, S_Stack contains all sequences that satisfy the pool axioms and axiom (5)—the stack order axiom—after instantiating i(x) with push(x) and r(x) with pop(x).
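The axioms can be operationalized as membership checks on finite sequences. The following sketch (function names are ours) checks the pool axioms and the queue-order axiom on sequences of ins/rem events; the stack-order axiom can be checked analogously.

```python
EMPTY = "empty"

def satisfies_pool(s):
    """Pool axioms on a sequence of ("ins"/"rem", value) events:
    no value is inserted or removed twice, a value is removed only after
    its insertion, and empty is returned only if every value inserted so
    far has already been removed."""
    inserted, removed = set(), set()
    for op, v in s:
        if op == "ins":
            if v in inserted:            # ruled out by the uniqueness assumption
                return False
            inserted.add(v)
        elif v == EMPTY:
            if inserted - removed:       # some inserted value is still present
                return False
        else:
            if v not in inserted or v in removed:
                return False
            removed.add(v)
    return True

def satisfies_queue(s):
    """Pool axioms plus the queue-order axiom: if x is inserted before y
    and y is removed, then x was removed before y."""
    if not satisfies_pool(s):
        return False
    ins_pos = {v: i for i, (op, v) in enumerate(s) if op == "ins"}
    removed = set()
    for op, v in s:
        if op == "rem" and v != EMPTY:
            earlier = {x for x in ins_pos if ins_pos[x] < ins_pos[v]}
            if earlier - removed:        # an older element is still present
                return False
            removed.add(v)
    return True

assert satisfies_queue([("ins", 1), ("ins", 2), ("rem", 1), ("rem", 2)])
assert not satisfies_queue([("ins", 1), ("ins", 2), ("rem", 2)])
assert satisfies_pool([("rem", EMPTY), ("ins", 1), ("rem", 1), ("rem", EMPTY)])
```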
We represent concurrent executions via concurrent histories. An example history is shown in Figure 1. Each thread executes a sequence of method calls from Σ; method calls executed by different threads may overlap (which does not happen in Figure 1). The real-time duration of method calls is irrelevant for the semantics of concurrent objects; all that matters is whether method calls overlap. Given this abstraction, a concurrent history is fully determined by a sequence of invocation and response events of method calls. We distinguish method-invocation and method-response events by augmenting the alphabet. Let I and R denote the sets of method-invocation events and method-response events, respectively, for the method calls in Σ. Moreover, let Θ be the set of thread identifiers, and let I × Θ and R × Θ denote the sets of method-invocation and -response events augmented with the identifiers of the executing threads. For example, (inv enq(x), T) is the invocation of method call enq(x) by thread T. Before we proceed, we mention a standard notion that we will need on several occasions.
Definition (Projection). Let s be a sequence over alphabet Σ and A ⊆ Σ. By s|A we denote the projection of s on the symbols in A, i.e., the sequence obtained from s by removing all symbols that are not in A.
Definition (History). A (concurrent) history h is a sequence over (I ∪ R) × Θ where (1) no invocation or response event appears more than once, and (2) if a response event appears in h, then the corresponding invocation event also appears in h and precedes it.
A queue history (left) and its formal representation as a sequence (right):
A history is sequential if every response event is immediately preceded by its matching invocation event and vice versa. Hence, we may ignore thread identifiers and identify a sequential history with a sequence in Σ*; e.g., the sequence of method calls in Figure 1 identifies the sequential history depicted there.
A history h is well-formed if h|T is sequential for every thread identifier T, where h|T denotes the projection of h on the set of events that are local to thread T. From now on we use the term history for well-formed history. Also, we may omit thread identifiers if they are not essential in a discussion.
A history determines a partial order on its set of method calls, the precedence order. Definition (Appears-in Relation, Precedence Order). The set of method calls of a history h is M(h). A method call m appears in h, notation m ∈ h, if its invocation event appears in h. The precedence order for h is the partial order <_h such that, for m, m' ∈ M(h), we have m <_h m' iff the response event of m appears before the invocation event of m' in h. By <_h^T we denote the restriction of <_h to the method calls of thread T, i.e., the program order of thread T.
We can characterize a sequential history as a history whose precedence order is total. In particular, the precedence order of a sequential history coincides with its appears-before order; the precedence order of the history in Fig. 1 is the total order in which its method calls appear.
Definition (Projection to a Set of Method Calls). Let h be a history and M ⊆ M(h). We write h|M for the projection of h on the invocation and response events of the method calls in M.
Note that h|M inherits h's precedence order.
A history h is complete if the response event of every invocation event in h appears in h. Given a history h, Complete(h) denotes the set of all completions of h, i.e., the set of all complete histories that are obtained from h by appending missing response events and/or removing pending invocation events. Note that Complete(h) = {h} iff h is a complete history.
A concurrent data structure D over a set of method calls Σ is a (prefix-closed) set of concurrent histories over Σ. A history may involve several concurrent objects. Let O be a set of concurrent objects, each object o ∈ O with its own set of method calls Σ_o and sequential specification S(o). A history over O is a history over the (disjoint) union of the method calls of all objects in O, i.e., its set of method calls is the union over all o ∈ O of the sets {o.m | m a method call of o}. The added prefix o ensures that the union is disjoint. The projection of a history h to an object o, denoted by h|o, is obtained by projecting h to the method calls prefixed by o and then removing the prefix o from every method call.
Definition (Linearizability [26]). A history h is linearizable wrt the sequential specification S if there is a sequential history s ∈ S and a completion h_c ∈ Complete(h) such that (1) s is a permutation of h_c, and (2) s preserves the precedence order of h_c, i.e., if m <_{h_c} m', then m <_s m'. We refer to s as a linearization of h. A concurrent data structure D is linearizable wrt S if every history of D is linearizable wrt S. A history h over a set of concurrent objects O is linearizable wrt the sequential specifications S(o) for o ∈ O if there exists a linearization s of h such that s|o ∈ S(o) for each object o ∈ O.
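For small complete histories, the definition can be checked directly by brute force: try every permutation of the method calls that respects the precedence order and test it against the specification. The following sketch (names are ours) represents a method call as a label with invocation and response timestamps.

```python
from itertools import permutations
from collections import deque

def linearizable(history, spec):
    """Brute-force linearizability check for small, complete histories.
    Each method call is (label, inv, res); spec is a predicate on
    sequences of labels. A call m precedes m' iff m's response comes
    before m''s invocation; a witness must respect that order."""
    for perm in permutations(history):
        respects = all(not (perm[j][2] < perm[i][1])
                       for i in range(len(perm))
                       for j in range(i + 1, len(perm)))
        if respects and spec([m[0] for m in perm]):
            return True
    return False

def queue_spec(labels):
    """Sequential FIFO queue specification."""
    q = deque()
    for op, v in labels:
        if op == "enq":
            q.append(v)
        elif not q or q.popleft() != v:
            return False
    return True

# enq(1) and enq(2) overlap, so a witness may order them either way:
h1 = [(("enq", 1), 0, 2), (("enq", 2), 1, 3), (("deq", 2), 4, 5)]
assert linearizable(h1, queue_spec)
# here enq(1) strictly precedes enq(2), so deq(2) cannot come first:
h2 = [(("enq", 1), 0, 1), (("enq", 2), 2, 3), (("deq", 2), 4, 5)]
assert not linearizable(h2, queue_spec)
```

The permutation search is exponential and only meant to illustrate the definition; it is not how linearizability is verified in practice.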
3 Local Linearizability
Local linearizability is applicable to containers whose set of method calls is a disjoint union of insertion method calls Ins, removal method calls Rem, data-observation method calls DOb, and (global) shape-observation method calls SOb. Insertions insert a single value from the data set V; removals remove a single value from V or return empty; data observations return a single value in V; shape observations return a value (not necessarily in V) that provides information on the shape of the state, for example, the size of a data structure. Examples of data observations are head (queue), top (stack), and peek (pool). Examples of shape observations are empty, which returns true if the data structure is empty and false otherwise, and size, which returns the number of elements in the data structure.
Even though we refrain from formal definitions, we want to stress that a valid sequence of a container remains valid after deleting observer method calls: if s ∈ S(D), then s|(Ins ∪ Rem) ∈ S(D).
There are also containers with multiple insert/remove methods, e.g., a double-ended queue (deque) is a container with insert-left, insert-right, remove-left, and remove-right methods, to which local linearizability is also applicable. However, local linearizability requires that each method call is either an insertion, a removal, or an observation. As a consequence, a set is not a container according to our definition, as insert(x) on a set acts as a global observer first, checking whether (some version of) x is already in the set, and inserts x only if it is not. Hash tables are not containers for a similar reason.
Note that requiring each method call of a container to have arity one excludes data structures like snapshot objects. It is possible to deal with higher arities in a fairly natural way, at the cost of a more complicated presentation; we chose to present local linearizability on simple containers only. We present the definition of local linearizability without shape observations here and discuss shape observations in Appendix A.
Definition (In- and Out-methods). Let h be a container history. For each thread T we define two subsets of the method calls in h, called the in-methods and out-methods of thread T, respectively:

In_T(h) = {ins(x) ∈ h | ins(x) is performed by thread T}

Out_T(h) = {rem(x), dob(x) ∈ h | ins(x) ∈ In_T(h)} ∪ {rem(e) ∈ h | e ∈ Emp}.
Hence, the in-methods of thread T are all insertions performed by T. The out-methods are all removals and data observations that return values inserted by T. Removals that return the value empty are also added to the out-methods of every thread T, as any thread (and hence also T) could be the cause of “inserting” empty. This way, removals of empty serve as a means of global synchronization; without them, each thread could perform all its operations locally without ever communicating with the other threads. Note that the out-methods of thread T need not be performed by T; they merely return values that were inserted by T.
Definition (Thread-induced History). Let h be a history. The thread-induced history h_T is the projection of h to the in- and out-methods of thread T, i.e., h_T = h|(In_T(h) ∪ Out_T(h)).
Definition (Local Linearizability). A history h is locally linearizable wrt a sequential specification S if (1) each thread-induced history h_T is linearizable wrt S, and (2) the thread-induced histories form a decomposition of h, i.e., every method call in h belongs to h_T for some thread T. A data structure is locally linearizable wrt S if every history of the data structure is locally linearizable wrt S. A history h over a set of concurrent objects O is locally linearizable wrt the sequential specifications S(o) for o ∈ O if each thread-induced history h_T is linearizable over O and the thread-induced histories form a decomposition of h.
Local linearizability is sequentially correct, i.e., a single-threaded (necessarily sequential) history h is locally linearizable wrt a sequential specification S iff h ∈ S. Like linearizability [25], local linearizability is compositional. The complete proof of the following theorem, as well as missing or extended proofs of all following properties, can be found in Appendix B.
Theorem (Compositionality). A history h over a set of objects O with sequential specifications S(o) for o ∈ O is locally linearizable iff h|o is locally linearizable wrt S(o) for every o ∈ O.
Proof (Sketch).
The property follows from the compositionality of linearizability and the fact that (h|o)_T = (h_T)|o for every thread T and object o. ∎
The Choices Made.
Splitting a global history into subhistories and requiring consistency for each of them is central to local linearizability. While this is common in shared-memory consistency conditions [22, 31, 32, 3, 16, 4, 20], our study of local linearizability is a first step in exploring subhistory-based consistency conditions for concurrent objects.
We chose thread-induced subhistories since thread-locality reduces contention in concurrent objects and is known to lead to high performance, as confirmed by our experiments. To assign method calls to thread-induced histories, we took a data-centric point of view by (1) associating data values with threads, and (2) gathering all method calls that insert or return a data value into the subhistory of the associated thread (Def. 3). We associate data values with the thread that inserts them. One can think of alternative approaches, for example, associating with a thread the values that it removed. In our view, the advantages of our choice are clear: First, by assigning inserted values to threads, every value in the history is assigned to some thread. In contrast, in the alternative approach it is not clear where to assign values that are inserted but never removed. Second, assigning inserted values to the inserting thread enables eager removals and ensures progress in locally linearizable data structures. In the alternative approach, it seems that the semantics of removing empty would have to be local.
An orthogonal issue is to assign values from shape observations to threads. In Appendix A, we discuss two meaningful approaches and show how local linearizability can be extended towards shape and data observations that appear in insertion operations of sets.
Finally, we had to choose a consistency condition to require of each subhistory. We chose linearizability, as it is the most widely accepted strong consistency condition for concurrent objects.
4 Local Linearizability vs. Linearizability
We now investigate the connection between local linearizability and linearizability.
Proposition 1 (Lin 1).
In general, linearizability does not imply local linearizability.
Proof.
We provide an example of a data structure that is linearizable but not locally linearizable. Consider a sequential specification S₀ that behaves like a queue except when the first two insertions are performed without a removal in between—then the first two elements are removed out of order. For instance, let thread T1 enqueue the values a and c and let thread T2 enqueue b, yielding the sequential history enq(a) enq(b) enq(c) deq(b) deq(a) deq(c). The first two insertions, enq(a) and enq(b), happen without a removal in between and their values are removed out of order, so the history belongs to S₀ and is linearizable wrt S₀. However, T1's induced history enq(a) enq(c) deq(a) deq(c) removes T1's first two insertions in order, although they, too, were performed without a removal in between; hence it is not in S₀ and not linearizable wrt S₀. ∎
The following condition on a data structure specification is sufficient for linearizability to imply local linearizability and is satisfied, e.g., by pool, queue, and stack.
Definition (Closure under Data-Projection). A sequential specification S over Σ is closed under data-projection iff for every s ∈ S and every set of values U ⊆ V ∪ Emp, the projection of s to the method calls with values in U is again in S.

The specification S₀ from the proof of Prop. 1 (a queue that removes its first two insertions out of order whenever they are performed without a removal in between) is not closed under data-projection: enq(a) enq(b) enq(c) deq(b) deq(a) deq(c) ∈ S₀, but its projection to the values {a, c}, namely enq(a) enq(c) deq(a) deq(c), is not in S₀.
Proposition 2 (Lin 2).
Linearizability implies local linearizability for sequential specifications that are closed under dataprojection.
There exist corner cases where local linearizability coincides with linearizability, e.g., for certain degenerate sequential specifications, or for single-producer/multiple-consumer histories.
We now turn our attention to pool, queue, and stack.
Proposition 3.
The sequential specifications S_Pool, S_Queue, and S_Stack are closed under data-projection.
Theorem (Pool, Queue & Stack, Lin). For pool, queue, and stack, local linearizability is (strictly) weaker than linearizability. The proof can be found in Appendix B.
Although local linearizability wrt a pool does not imply linearizability wrt a pool (Theorem 3), it still guarantees several properties that ensure sane behavior as stated next.
Proposition 4 (LocLin Pool).
Let h be a history that is locally linearizable wrt a pool. Then:
1. No value is duplicated, i.e., every remove method call appears in h at most once.
2. No out-of-thin-air values, i.e., for x ∈ V, if rem(x) ∈ h then ins(x) ∈ h and rem(x) does not precede ins(x) in h.
3. No value is lost, i.e., for x ∈ V and e ∈ Emp, if ins(x) <_h rem(e) then rem(x) ∈ h and rem(e) does not precede rem(x) in h.
Proof.
By direct unfolding of the definitions. ∎
Note that if a history is linearizable wrt a pool, then all three stated properties hold, as a consequence of linearizability and the definition of S_Pool.
5 Local Linearizability vs. Other Relaxed Consistency Conditions
We compare local linearizability with other classical consistency conditions to better understand its guarantees and implications.
Sequential Consistency (SC).
A history h is sequentially consistent [25, 30] wrt a sequential specification S if there exists a sequential history s ∈ S and a completion h_c ∈ Complete(h) such that (1) s is a permutation of h_c, and (2) s preserves each thread's program order, i.e., if m <_{h_c}^T m' for some thread T, then m <_s m'. We refer to s as a sequential witness of h. A data structure is sequentially consistent wrt S if every history of the data structure is sequentially consistent wrt S.
Sequential consistency is a useful consistency condition for shared memory but it is not really suitable for data structures as it allows for behavior that excludes any coordination between threads [39]: an implementation of a data structure in which every thread uses a dedicated copy of a sequential data structure without any synchronization is sequentially consistent. A sequentially consistent queue might always return empty in one (consumer) thread as the point in time of the operation can be moved, e.g., see Figure 3. In a producerconsumer scenario such a queue might end up with some threads not doing any work.
Theorem (Pool, Queue & Stack, SC). For pool, queue, and stack, local linearizability is incomparable to sequential consistency. ∎
(Quantitative) Quiescent Consistency (QC & QQC).
Like linearizability and sequential consistency, quiescent consistency [13, 25] also requires the existence of a sequential history, a quiescent witness, that satisfies the sequential specification. All three consistency conditions impose an order on the method calls of a concurrent history that a witness has to preserve. Quiescent consistency uses the concept of quiescent states to relax the precedence-order requirement imposed by linearizability. A quiescent state is a point in a history at which there are no pending invocation events (all invoked method calls have already responded). In a quiescent witness, a method call m has to appear before a method call m' if and only if there is a quiescent state between m and m'. Method calls between two consecutive quiescent states can be ordered arbitrarily. Quantitative quiescent consistency [27] refines quiescent consistency by bounding the number of reorderings of operations between two quiescent states based on the concurrent behavior between these two states.
The next result about quiescent consistency for pool is needed to establish the connection between quiescent consistency and local linearizability.
Proposition 5.
A pool history satisfying properties 1.–3. of Prop. 4 is quiescently consistent. ∎
Theorem (Pool, Queue & Stack, QC). For pool, local linearizability is (strictly) stronger than quiescent consistency. For queue and stack, local linearizability is incomparable to quiescent consistency. ∎
Consistency Conditions for Distributed Shared Memory.
There is extensive research on consistency conditions for distributed shared memory [3, 4, 8, 16, 20, 22, 30, 31, 32]. In Appendix E, we compare local linearizability against coherence, PRAM consistency, processor consistency, causal consistency, and local consistency. All these conditions split a history into subhistories and require consistency of the subhistories. For our comparison, we first define a sequential specification for a single memory location. We assume that each memory location is pre-initialized with a default value. A read-operation returns the value of the last write-operation that was performed on the memory location, or the default value if there was no write-operation. We denote write-operations by ins and read-operations by head. Formally, a sequence is in the specification of a memory location iff every read head(x) is preceded by a write ins(x) with no other write in between, or x is the default value and no write precedes the read. Note that read-operations are data observations and that the same value can be read multiple times. For brevity, we only consider histories that involve a single memory location. In the following, we summarize our comparison; for details, please see Appendix E.
While local linearizability is well-suited for concurrent data structures, the mentioned shared-memory consistency conditions are not necessarily so. Conversely, local linearizability appears to be problematic for shared memory. Consider the locally linearizable history in Figure 4: the values read oscillate between values written by different threads. Therefore, local linearizability does not imply any of the shared-memory consistency conditions. In Appendix E, we further show that local linearizability is incomparable to all considered shared-memory conditions.
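As a concrete rendering of the single-location specification used in this comparison, the following sketch (the initial value and the names are ours) checks that every read returns the most recent write; note that, unlike a container removal, a read does not consume the value.

```python
INIT = 0  # assumed pre-initialization value

def valid_memory(s):
    """Single memory location: each read (head) returns the value of the
    most recent preceding write (ins), or the initial value if there is
    none. The same value may be read arbitrarily often."""
    cur = INIT
    for op, v in s:
        if op == "ins":
            cur = v
        elif v != cur:
            return False
    return True

# the same value can be read multiple times:
assert valid_memory([("head", 0), ("ins", 7), ("head", 7), ("head", 7)])
# a stale read is invalid:
assert not valid_memory([("ins", 7), ("ins", 8), ("head", 7)])
```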
6 Locally Linearizable Implementations
In this section, we focus on locally linearizable data structure implementations that are generic in the following sense: choose a linearizable implementation Φ of a sequential specification S, and we turn it into a (distributed) data structure LLD Φ that is locally linearizable wrt S. An LLD Φ implementation takes several copies of Φ (that we call backends) and assigns to each thread T a backend Φ_T. When thread T inserts an element into LLD Φ, the element is inserted into Φ_T; when an arbitrary thread removes an element from LLD Φ, the element is removed from some backend eagerly, i.e., if no element is found in the attempted backend, the search for an element continues through all other backends. If no element is found in one round through the backends, then we return empty.
Proposition 6 (LLD correctness).
Let Φ be a data structure implementation that is linearizable wrt a sequential specification S. Then LLD Φ is locally linearizable wrt S.
Proof.
Let h be a history of LLD Φ. The crucial observation is that each thread-induced history h_T is a history of the backend Φ_T and hence linearizable wrt S. ∎
Any number of copies (backends) is allowed in this generic implementation of LLD Φ. If we take just one copy, we end up with a linearizable implementation. Also, any way of choosing a backend for removals is fine. However, both the number of backends and the backend-selection strategy upon removals affect performance significantly. In our LLD implementations we use one backend per thread, resulting in no contention on insertions, and always attempt a local remove first. If this does not return an element, we continue searching through all other backends, starting from a randomly chosen backend.
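The scheme just described can be sketched as follows. This is a minimal illustration with a lock-protected deque standing in for an arbitrary linearizable backend; the class and method names are ours, not taken from the paper's implementations in Scal.

```python
import random
import threading
from collections import deque

EMPTY = None

class LLDQueue:
    """Sketch of the generic LLD construction for a queue: one backend
    per thread; inserts go to the calling thread's backend, removes try
    the local backend first and then scan the remaining backends from a
    random starting position."""

    def __init__(self):
        self._table_lock = threading.Lock()  # protects the backend table
        self._backends = {}                  # thread id -> (lock, deque)

    def _local(self):
        tid = threading.get_ident()
        with self._table_lock:
            if tid not in self._backends:
                self._backends[tid] = (threading.Lock(), deque())
            return self._backends[tid]

    def enqueue(self, x):
        lock, dq = self._local()
        with lock:
            dq.append(x)  # no contention among inserting threads

    def dequeue(self):
        tid = threading.get_ident()
        self._local()  # ensure the caller has a backend
        with self._table_lock:
            items = list(self._backends.items())
        local = [b for t, b in items if t == tid]
        others = [b for t, b in items if t != tid]
        if others:
            start = random.randrange(len(others))
            others = others[start:] + others[:start]
        for lock, dq in local + others:
            with lock:
                if dq:
                    return dq.popleft()
        return EMPTY  # one full unsuccessful round
```

A removal scans the caller's backend first and then one full round over the remaining backends; returning empty only after an unsuccessful full round is what makes removals eager.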
LLD is closely related to Distributed Queues (DQs) [18]. A DQ is a (linearizable) pool organized as a single segment of ℓ backends. DQs come in different flavours depending on how insert and remove methods are distributed across the segment when accessing backends. No DQ variant in [18] follows the LLD approach described above. Moreover, while the DQ algorithms are implemented for a fixed number of backends, LLD implementations manage a segment of variable size, one backend per (active) thread. Note that the strategy for selecting backends in the LLD implementations is similar to strategies used in work stealing [35]; in contrast to that work, however, our data structures neither duplicate nor lose elements. LLD (stack) implementations have been successfully applied for managing free lists in the fast and scalable memory allocator scalloc [5]. The guarantees provided by local linearizability are not needed for the correctness of scalloc, i.e., the free lists could also use a weak pool (a pool without a linearizable emptiness check). However, the LLD stack implementations provide good caching behavior when threads operate on their local stacks, whereas a weak pool would potentially impact performance negatively.
We have implemented LLD variants of strict and relaxed queue and stack implementations. None of our implementations involves observation methods, but the LLD algorithm can easily be extended to support them (for details, see the appendix). Finally, let us note that we have also experimented with other locally linearizable implementations that lacked the genericity of the LLD implementations and whose performance evaluation did not show promising results (see the appendix). As shown in Sec. 4, a locally linearizable pool is not a linearizable pool, i.e., it lacks a linearizable emptiness check. Indeed, LLD implementations do not provide a linearizable emptiness check, despite eager removes. We provide LLD+, a variant of LLD, that provides a linearizable emptiness check under mild conditions on the starting implementation (see the appendix for details).
Experimental Evaluation.
All experiments ran on a uniform memory architecture (UMA) machine with four 10-core 2 GHz Intel Xeon E7-4850 processors supporting two hardware threads (hyperthreads) per core, 128 GB of main memory, and Linux kernel version 3.8.0. We also ran the experiments without hyperthreading, resulting in no noticeable difference. The CPU governor has been disabled. All measurements were obtained from the artifact-evaluated Scal benchmarking framework [12, 19, 11], where the code of all involved data structures can also be found. Scal uses preallocated memory (without freeing it) to avoid memory management artifacts. For all measurements we report the arithmetic mean and the confidence interval (sample size = 10, corrected sample standard deviation).
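The reported statistics can be computed as in the following sketch. The paper states only the sample size and that the corrected sample standard deviation is used; the normal-approximation interval (z = 1.96 for roughly 95%) is our assumption, not something the paper specifies:

```python
from math import sqrt
from statistics import mean, stdev  # stdev uses the corrected (n-1) denominator

def summarize(samples, z=1.96):
    """Arithmetic mean and a normal-approximation confidence
    half-width for a set of benchmark runs (z = 1.96 ~ 95%)."""
    m = mean(samples)
    s = stdev(samples)              # corrected sample standard deviation
    return m, z * s / sqrt(len(samples))
```

For ten identical runs the half-width is zero; spread across runs widens the interval proportionally to the sample standard deviation.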
In our experiments, we consider the linearizable queues Michael-Scott queue (MS) [34] and LCRQ [36] (improved version [37]), the linearizable stacks Treiber stack (Treiber) [42] and TS stack [14], the out-of-order relaxed k-FIFO queue [28] and k-Stack [23], and linearizable well-performing pools based on distributed queues using random balancing [18] (1-RA DQ for queue, and 1-RA DS for stack). For each of these implementations (except the pools) we provide LLD variants (LLD LCRQ, LLD TS stack, LLD k-FIFO, and LLD k-Stack) and, when possible, LL+D variants (LL+D MS queue and LL+D Treiber stack). Making the pools locally linearizable is not promising as they are already distributed. Whenever LL+D is achievable for a data structure implementation we present only results for LL+D as, in our workloads, LLD and LL+D implementations perform with no visible difference.
We evaluate the data structures on a Scal producer-consumer benchmark where each producer and consumer is configured to execute a fixed number of operations. To control contention, we add a busy wait between operations. This is important as too high contention results in measuring hardware or operating system (e.g., scheduling) artifacts. The number of threads ranges up to 80 (the number of hardware threads), half of which are producers and half consumers. To relate performance and scalability we report the number of data structure operations per second. Data structures that require parameters to be set are configured to allow maximum parallelism for the producer-consumer workload with 80 threads. This results in k = 80 for all k-FIFO and k-Stack variants (40 producers and 40 consumers in parallel on a single segment) and 80 backends for 1-RA DQ and 1-RA DS (40 producers and 40 consumers in parallel on different backends). The TS stack algorithm also needs to be configured with a delay parameter. We use the optimal delay for the TS stack and zero delay for the LLD TS stack, as delays degrade the performance of the LLD implementation.
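The shape of such a producer-consumer benchmark can be sketched as follows. This is an illustrative Python skeleton (Scal itself is a C++ framework; the function name and parameters here are ours), showing the split into producer and consumer threads, the busy wait between operations, and throughput reporting:

```python
import threading
import time
import queue

def run_benchmark(n_threads=4, ops_per_thread=1000, delay_s=0.0):
    """Producer-consumer workload skeleton: half of the threads
    insert, the other half remove, with an optional busy wait
    between operations to control contention."""
    q = queue.Queue()

    def producer():
        for i in range(ops_per_thread):
            q.put(i)
            t0 = time.perf_counter()
            while time.perf_counter() - t0 < delay_s:
                pass  # busy wait (a sleep would deschedule the thread)

    def consumer():
        for _ in range(ops_per_thread):
            q.get()  # blocks until a value is available

    threads = [threading.Thread(target=producer) for _ in range(n_threads // 2)]
    threads += [threading.Thread(target=consumer) for _ in range(n_threads // 2)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return (n_threads * ops_per_thread) / elapsed  # operations per second
```

Increasing `delay_s` lowers contention on the shared structure, which is exactly the knob the benchmark uses to avoid measuring scheduling artifacts.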
Figure 5 shows the results of the producer-consumer benchmarks. Similar to experiments performed elsewhere [14, 23, 28, 36], the well-known algorithms MS and Treiber do not scale for 10 or more threads. The state-of-the-art linearizable queue and stack algorithms LCRQ and TS-interval stack either perform competitively with their out-of-order relaxed counterparts k-FIFO and k-Stack or even outperform and outscale them. For every implementation X, LLD X and LL+D X (when available) perform and scale significantly better than X does, even slightly better than the state-of-the-art pools that we compare to. The LL+D variants of the MS queue and the Treiber stack show the best improvement. The speedup of the fastest locally linearizable implementation over the fastest linearizable queue (LCRQ) and stack (TS stack) implementation at 80 threads is 2.77 and 2.64, respectively. The performance degradation for LCRQ between 30 and 70 threads aligns with the performance of fetch-and-inc (the CPU instruction that atomically retrieves and modifies the contents of a memory location) on the benchmarking machine, which differs from the original benchmarking machine [36]. LCRQ uses fetch-and-inc as its key atomic instruction.
7 Conclusion & Future Work
Local linearizability splits a history into a set of thread-induced histories and requires linearizability of each of them. This yields an intuitive consistency condition for concurrent objects that enables new data structure implementations with superior performance and scalability. Local linearizability has desirable properties like compositionality and well-behavedness for container-type data structures. As future work, it is interesting to investigate the guarantees that local linearizability provides to client programs, along the lines of [15].
Acknowledgments
This work has been supported by the National Research Network RiSE on Rigorous Systems Engineering (Austrian Science Fund (FWF): S11402-N23, S11403-N23, S11404-N23, S11411-N23), a Google PhD Fellowship, an Erwin Schrödinger Fellowship (Austrian Science Fund (FWF): J3696-N26), EPSRC grants EP/H005633/1 and EP/K008528/1, the Vienna Science and Technology Fund (WWTF) through grant PROSEED, the European Research Council (ERC) under grant 267989 (QUAREM), and the Austrian Science Fund (FWF) under grant Z211-N23 (Wittgenstein Award).
Appendix A Local Linearizability with Shape Observers
There are two possible ways to deal with shape observers: treat them locally, in the thread-induced history of the performing thread, or treat them globally. While a local treatment is immediate and natural for a local consistency condition, a global treatment requires care. We present both solutions next.
[Local Linearizability LSO] A history $h$ is locally linearizable with local shape observers (LSO) wrt a sequential specification $S$ if it is locally linearizable according to Definition 3, with the difference that the in-methods of a thread $T$ (Definition 3) also contain all shape observers performed by $T$.
Global observations require more notation and auxiliary notions. Let $s_i$ for $1 \le i \le n$ be a collection of sequences over alphabets $\Sigma_i$ with pairwise disjoint sets of symbols. A sequence $s$ over $\Sigma_1 \cup \dots \cup \Sigma_n$ is an interleaving of $s_i$ for $1 \le i \le n$ if $s|_{\Sigma_i} = s_i$ for all $i$. We write $s_1 \parallel \dots \parallel s_n$ for the set of all interleavings of $s_1, \dots, s_n$.
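The interleaving operator can be made concrete with a short recursive enumeration. This is an illustrative sketch (the function name is ours); sequences are given over pairwise disjoint symbol sets, and projecting any result onto one alphabet returns the original sequence:

```python
def interleavings(*seqs):
    """Enumerate all interleavings of sequences over pairwise
    disjoint alphabets. Each result, projected onto the alphabet
    of one input sequence, yields back that sequence."""
    seqs = [list(s) for s in seqs if s]   # drop exhausted sequences
    if not seqs:
        return [[]]
    out = []
    for i, s in enumerate(seqs):
        # pick the head of sequence i, interleave the remainders
        rest = seqs[:i] + [s[1:]] + seqs[i + 1:]
        for tail in interleavings(*rest):
            out.append([s[0]] + tail)
    return out
```

For sequences of lengths 2 and 1 there are 3 interleavings, matching the binomial count C(3, 1).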
Given a history $h$ and a method call $m$, we write $h_{<m}$ for the (incomplete) history that is the prefix of $h$ up to, and without, the response event of $m$. Hence, $h_{<m}$ contains all invocation and response events of $h$ that appear before the response of $m$.
Let $S$ denote the sequential specification of a container. A shape observer $o$ in a history $h$ has a witness if there exists a sequence $s \in s_1 \parallel \dots \parallel s_n$ such that $s \cdot o \in S$, where each $s_i$ is a linearization of a thread-induced history of $h_{<o}$.
Informally, the above definition states that a global shape observer must be justified by a (global) witness. Such a global witness is a sequence that (1) when extended by the observer belongs to the sequential specification, and (2) is an interleaving of linearizations of the thread-induced histories up to the observer.
[Local Linearizability GSO] A history $h$ is locally linearizable with global shape observers (GSO) wrt a sequential specification $S$ if it is locally linearizable and each shape observer in $h$ has a witness.
We illustrate the difference between the local and the global approach to shape observers with the following example. {example} Consider the following queue history with a global observer
where $k$ is just a placeholder for a concrete natural number. For one value of $k$, the history is locally linearizable LSO but not locally linearizable GSO; for another value of $k$, it is locally linearizable GSO but not locally linearizable LSO.
Global observers and non-disjoint operations are expected to have a negative impact on performance. If one cares about global consistency, local linearizability is not the consistency condition to use. The restriction to containers and disjoint operations specifies, in an informal way, the minimal requirements for local consistency to be acceptable.
Neither sets nor maps are containers according to our definition. However, it is possible to extend our treatment to sets and maps similarly to our treatment of global observers. Locally linearizable sets and maps would be weaker than their linearizable counterparts but, due to the tight coupling between mutator and observer effects, the gain in performance is unlikely to be as substantial as for the data structures considered here. The technicalities needed to extend local linearizability to sets and maps would complicate the theoretical development without considerable benefit, and we therefore excluded such data structures.
Appendix B Additional Results and Proofs
Theorem 3 (Compositionality). A history $h$ over a set of objects $Q$ with sequential specifications $S(q)$ for $q \in Q$ is locally linearizable if and only if $h|_q$ is locally linearizable with respect to $S(q)$ for every $q \in Q$.
Proof.
The property follows from the compositionality of linearizability and the fact that $(h|_q)_T = (h_T)|_q$ for every thread $T$ and object $q$. Assume that $h$ over $Q$ is locally linearizable. This means that all thread-induced histories $h_T$ over $Q$ are linearizable. Hence, since linearizability is compositional, for each object $q$ the history $(h_T)|_q$ is linearizable with respect to $S(q)$. Now from $(h_T)|_q = (h|_q)_T$ we have that for every object $q$ the history $(h|_q)_T$ is linearizable for every thread $T$.
Similarly, assume that for every object $q$ the history $h|_q$ is locally linearizable. Then, for every $q$, $(h|_q)_T$ is linearizable for every thread $T$. From the compositionality of linearizability, $h_T$ is linearizable for every thread $T$. This proves that $h$ is locally linearizable. ∎
Proposition 2 (Lin vs. LocLin 2). Linearizability implies local linearizability for sequential specifications that are closed under dataprojection.
Proof.
Assume we are given a history $h$ which is linearizable with respect to a sequential specification $S$ that is closed under data-projection. Further assume that, without loss of generality, $h$ is complete. Then there exists a sequential history $s \in S$ such that (1) $s$ is a permutation of $h$, and (2) if $m <_h m'$, then also $m <_s m'$. Given a thread $T$, consider the thread-induced history $h_T$ and let $s_T$ be the projection of $s$ to the values of $h_T$. Then, $s_T$ is a permutation of $h_T$ since $s_T$ and $h_T$ consist of the same events. Furthermore, $s_T \in S$ since $S$ is closed under data-projection and since Equation (1) holds for containers. Finally, for each $m$ and $m'$ in $h_T$, if $m <_{h_T} m'$ then also $m <_{s_T} m'$: $m <_{h_T} m'$ implies $m <_h m'$ and therefore $m <_s m'$, which implies $m <_{s_T} m'$. Thereby, we have shown that $h_T$ is linearizable with respect to $S$, for an arbitrary thread $T$. Hence $h$ is locally linearizable with respect to $S$. ∎
Proposition 3 (DataProjection Closedness). The sequential specifications of pool, queue, and stack are closed under dataprojection.
Proof.
Let $s \in S_{\mathrm{Pool}}$, let $D$ be a set of values, and let $s' = s|_D$.
Then, it suffices to check that all axioms for pool (Definition 1 and Table 1) hold for $s'$. Clearly, all methods in $s'$ appear at most once, as they do so in $s$. If $\mathrm{rem}(x) \in s'$, then $x \in D$ and, since $\mathrm{rem}(x) \in s$, $\mathrm{ins}(x) <_s \mathrm{rem}(x)$. But then also $\mathrm{ins}(x) \in s'$ and hence $\mathrm{ins}(x) <_{s'} \mathrm{rem}(x)$. Finally, the axiom for empty removals also transfers from $s$ to $s'$: projecting onto $D$ deletes matching insert/remove pairs only, so a removal that correctly reports empty in $s$ still does so in $s'$. This shows that $S_{\mathrm{Pool}}$ is closed under data-projection.
Assume now that $s \in S_{\mathrm{Queue}}$ and $s' = s|_D$ is as before. Then, as $S_{\mathrm{Pool}}$ is closed under data-projection, $s'$ satisfies the pool axioms. Moreover, the queue-order axiom (Definition 1 and Table 1) also holds: assume $\mathrm{ins}(x) <_{s'} \mathrm{ins}(y)$ and $\mathrm{rem}(y) \in s'$. Then $\mathrm{ins}(x) <_s \mathrm{ins}(y)$ and $\mathrm{rem}(y) \in s$. Since $s \in S_{\mathrm{Queue}}$ we get $\mathrm{rem}(x) \in s$ and $\mathrm{rem}(x) <_s \mathrm{rem}(y)$. But this means $\mathrm{rem}(x) \in s'$ and $\mathrm{rem}(x) <_{s'} \mathrm{rem}(y)$. Hence, $S_{\mathrm{Queue}}$ is closed under data-projection. The argument for the stack is analogous. ∎
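The closedness argument for queues can be checked mechanically on examples. The following sketch (helper names are ours) gives a membership test for the sequential queue specification and a data-projection operator; projecting a legal queue run onto any value set yields another legal queue run:

```python
from collections import deque

def is_queue_run(ops):
    """Membership test for the sequential queue specification.
    ops is a list of ('ins', x) / ('rem', x) pairs; every removed
    value must be the current front of the queue."""
    q = deque()
    for op, x in ops:
        if op == 'ins':
            q.append(x)
        elif not q or q.popleft() != x:
            return False               # removal out of FIFO order (or from empty)
    return True

def project(ops, values):
    """Data-projection: keep only the operations on the given values."""
    return [(op, x) for op, x in ops if x in values]
```

On the run ins(1) ins(2) rem(1) ins(3) rem(2) rem(3), every projection onto a subset of {1, 2, 3} is again a legal queue run, while swapping two removals breaks membership.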
Proposition 4 (LocLin Pool). Let $h$ be a locally linearizable history wrt a pool. Then:

No value is duplicated, i.e., every remove method appears in $h$ at most once.

There are no out-of-thin-air values, i.e., if $\mathrm{rem}(x) \in h$, then $\mathrm{ins}(x) \in h$ and $\mathrm{rem}(x) \not<_h \mathrm{ins}(x)$.

No value is lost, i.e., if $\mathrm{ins}(x) <_h \mathrm{rem}(\mathrm{empty})$, then $\mathrm{rem}(x) \in h$ and $\mathrm{rem}(\mathrm{empty}) \not<_h \mathrm{rem}(x)$.
Proof.
Note that if a history is linearizable wrt a pool, then all three stated properties hold, as a consequence of linearizability and the definition of $S_{\mathrm{Pool}}$. Now assume that $h$ is locally linearizable wrt a pool.
If $\mathrm{rem}(x)$ appears twice in $h$, then it also appears twice in some thread-induced history $h_T$, contradicting that $h_T$ is linearizable with respect to a pool. This shows that no value is duplicated.
If $\mathrm{rem}(x) \in h$, then $\mathrm{rem}(x) \in h_T$ for some thread $T$ and, since $h_T$ is linearizable with respect to a pool, $\mathrm{ins}(x) \in h_T$ and $\mathrm{rem}(x) \not<_{h_T} \mathrm{ins}(x)$. This yields $\mathrm{ins}(x) \in h$ and $\mathrm{rem}(x) \not<_h \mathrm{ins}(x)$. Hence, there are no out-of-thin-air values.
Finally, assume a value $x$ were lost in $h$. Then it would also be lost in the thread-induced history $h_T$ that contains $\mathrm{ins}(x)$ and the offending empty removal, contradicting that $h_T$ is linearizable with respect to a pool. Similarly, the other condition holds. Hence, no value is lost. ∎
Theorem LABEL:thm:queue (Queue Local Linearizability). A concurrent queue history $h$ is locally linearizable with respect to the queue sequential specification $S_{\mathrm{Queue}}$ if and only if

$h$ is locally linearizable with respect to the pool sequential specification $S_{\mathrm{Pool}}$, and

values inserted by the same thread are removed in insertion order, i.e., if $\mathrm{ins}(x) <_h \mathrm{ins}(y)$ for insertions performed by the same thread and $\mathrm{rem}(y) \in h$, then $\mathrm{rem}(x) \in h$ and $\neg(\mathrm{rem}(y) <_h \mathrm{rem}(x))$.
Proof.
Assume $h$ is locally linearizable with respect to $S_{\mathrm{Queue}}$. Since $S_{\mathrm{Queue}} \subseteq S_{\mathrm{Pool}}$ (with suitably renamed method calls), $h$ is locally linearizable with respect to $S_{\mathrm{Pool}}$. Moreover, since all $h_T$ are linearizable with respect to $S_{\mathrm{Queue}}$, by Theorem LABEL:thm:queuelin we obtain condition 2 as follows.
Assume $\mathrm{ins}(x)$ and $\mathrm{ins}(y)$ are insertions by the same thread $T$ such that $\mathrm{ins}(x) <_h \mathrm{ins}(y)$ and $\mathrm{rem}(y) \in h$. Then $\mathrm{ins}(x) <_{h_T} \mathrm{ins}(y)$ and $\mathrm{rem}(y) \in h_T$, and so $\mathrm{rem}(x) \in h_T$ and $\neg(\mathrm{rem}(y) <_{h_T} \mathrm{rem}(x))$. This implies $\mathrm{rem}(x) \in h$ and $\neg(\mathrm{rem}(y) <_h \mathrm{rem}(x))$.
For the opposite direction, assume that conditions 1. and 2. hold for a history $h$. We need to show that (1) the thread-induced histories $h_T$ form a decomposition of $h$, which is clear for a queue, and (2) each $h_T$ is linearizable with respect to $S_{\mathrm{Queue}}$.
By 1., each $h_T$ is linearizable with respect to a pool. Assume $\mathrm{ins}(x) <_{h_T} \mathrm{ins}(y)$ and $\mathrm{rem}(y) \in h_T$. Then $\mathrm{ins}(x) <_h \mathrm{ins}(y)$ and hence, by 2., $\mathrm{rem}(x) \in h$ and $\neg(\mathrm{rem}(y) <_h \mathrm{rem}(x))$. Again, as $\mathrm{rem}(x)$ and $\mathrm{rem}(y)$ are in $h_T$, we get $\neg(\mathrm{rem}(y) <_{h_T} \mathrm{rem}(x))$. According to Theorem LABEL:thm:queuelin this suffices to conclude that each $h_T$ is linearizable with respect to $S_{\mathrm{Queue}}$. ∎
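Condition 2 can be checked directly on example traces. The following sketch is our own simplification: events are given in a total temporal order, so precedence is just position in the list, and the function flags any pair of same-thread insertions whose removals occur out of order:

```python
def respects_thread_order(history):
    """history: list of (thread, op, value) triples in temporal order,
    with op in {'ins', 'rem'} and each value inserted/removed at most
    once. Checks that values inserted by the same thread are removed
    in insertion order (condition 2, for a totally ordered history)."""
    ins_pos, rem_pos, ins_thread = {}, {}, {}
    for i, (t, op, x) in enumerate(history):
        if op == 'ins':
            ins_pos[x], ins_thread[x] = i, t
        else:
            rem_pos[x] = i
    for x in ins_pos:
        for y in ins_pos:
            if (ins_thread[x] == ins_thread[y] and ins_pos[x] < ins_pos[y]
                    and x in rem_pos and y in rem_pos
                    and rem_pos[y] < rem_pos[x]):
                return False  # y removed before the earlier-inserted x
    return True
```

A trace where thread 2 removes the values in the same order thread 1 inserted them passes; swapping the removals violates the per-thread queue order.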
Theorem 5 (Pool, Queue, & Stack, SC). For pool, queue, and stack, local linearizability is incomparable to sequential consistency.
Proof.
The following histories, when instantiated with $\mathrm{ins}$, $\mathrm{enq}$, and $\mathrm{push}$, respectively, and with $\mathrm{rem}$, $\mathrm{deq}$, and $\mathrm{pop}$, respectively, are sequentially consistent but not locally linearizable wrt pool, queue, and stack:

Pool:

Queue:

Stack:
History (a) is already not locally linearizable wrt pool, queue, and stack, respectively; histories (b) and (c) provide further interesting examples. The history in Figure 3 is locally linearizable but not sequentially consistent wrt a pool. The following histories are locally linearizable but not sequentially consistent wrt a queue and a stack, respectively:

Queue:
The two thread-induced histories are both linearizable with respect to a queue. However, the overall history has no sequential witness and is therefore not sequentially consistent: to maintain the queue behavior, the order of the two enqueue operations cannot be changed, but this implies that a different value than the one actually removed would have to be removed first.

Stack:
The two thread-induced histories are both linearizable with respect to a stack. The surrounding operations prevent the reordering of the two push operations. Therefore, the overall history has no sequential witness and hence it is not sequentially consistent. ∎
Proposition 5 (Pool, QC). Let $h$ be a pool history in which no data is duplicated, no out-of-thin-air values are returned, and no data is lost, i.e., $h$ satisfies 1.–3. of Proposition 4. Then $h$ is quiescently consistent.
Proof.
Assume $h$ is a pool history that satisfies 1.–3. of Proposition 4. Let $h_1, \dots, h_k$ be histories that form a sequential decomposition of $h$, that is, $h = h_1 \cdot h_2 \cdots h_k$ and the only quiescent states in any $h_i$ are at its beginning and its end. Note that this decomposition has nothing to do with a thread-local decomposition. Let $M_i$ be the set of methods of $h_i$, for $1 \le i \le k$. Note that the sanity conditions 1.–3. ensure that neither of the following two situations can happen:

,

,
Let