A Complexity-Based Hierarchy for Multiprocessor Synchronization


Faith Ellen
University of Toronto
faith@cs.toronto.edu
   Rati Gelashvili
MIT
gelash@mit.edu
   Nir Shavit
MIT
shanir@csail.mit.edu
   Leqi Zhu
University of Toronto
lezhu@cs.toronto.edu
Abstract

For many years, Herlihy’s elegant computability-based Consensus Hierarchy has been our best explanation of the relative power of various types of multiprocessor synchronization objects when used in deterministic algorithms. However, key to this hierarchy is treating synchronization instructions as distinct objects, an approach that is far from the real world, where multiprocessor programs apply synchronization instructions to collections of arbitrary memory locations. We were surprised to realize that, when considering instructions applied to memory locations, the computability-based hierarchy collapses. This leaves open the question of how to better capture the power of various synchronization instructions.

In this paper, we provide an approach to answering this question. We present a hierarchy of synchronization instructions, classified by the space complexity necessary to solve consensus in an obstruction-free manner using these instructions. Our hierarchy provides a classification of combinations of known instructions that seems to fit with our intuition of how useful some are in practice, while questioning the effectiveness of others. In particular, we prove an essentially tight characterization of the power of buffered read and write instructions. Interestingly, we show a similar result for multi-location atomic assignments.

1 Introduction

Herlihy’s Consensus Hierarchy [Her91] assigns a consensus number to each object, namely, the number of processes for which there is a wait-free binary consensus algorithm using only instances of this object and read-write registers. It is simple, elegant and, for many years, has been our best explanation of synchronization power.

Robustness says that, using combinations of objects with consensus numbers at most n, it is not possible to solve wait-free consensus for more than n processes [Jay93]. The implication is that modern machines need to provide objects with infinite consensus number. Otherwise, they will not be universal, that is, they cannot be used to implement all objects or solve all tasks in a wait-free (or non-blocking) manner for any number of processes [Her91, Tau06Book, Ray12Book, HS12Book]. Although there are ingenious non-deterministic constructions that prove that Herlihy’s Consensus Hierarchy is not robust [Sch97, hl00], it is known to be robust for deterministic one-shot objects [HR00] and deterministic read-modify-write and readable objects [Rup00]. It is unknown whether it is robust for general deterministic objects.

In adopting this explanation of computational power, we failed to notice an important fact: multiprocessors do not compute using synchronization objects. Rather, they apply synchronization instructions to locations in memory. With this point of view, Herlihy’s Consensus Hierarchy no longer captures the phenomena we are trying to explain.

For example, consider two simple instructions:

  • fetch-and-add(2), which returns the number stored in a memory location and increases its value by 2, and

  • test-and-set(), which returns the number stored in a memory location and sets it to 1 if it contained 0.

(This definition of test-and-set() is slightly stronger than the standard definition, which always sets the location to which it is applied to 1. Both definitions behave identically when the values in the location are in {0, 1}.) Objects that support only one of these instructions have consensus number 2. Moreover, these deterministic read-modify-write objects cannot be combined to solve wait-free consensus for 3 or more processes. However, with an object that supports both instructions, it is possible to solve wait-free binary consensus for any number of processes. The protocol uses a single memory location initialized to 0. Processes with input 0 perform fetch-and-add(2), while processes with input 1 perform test-and-set(). If the value returned is odd, the process decides 1. If the value 0 was returned from test-and-set(), the process also decides 1. Otherwise, the process decides 0.
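The following Python sketch illustrates this protocol. It is only an illustration: the Location class, its method names, and the lock that stands in for hardware atomicity are not part of the original description.

```python
import threading

class Location:
    """A single shared location supporting the two instructions described above."""
    def __init__(self):
        self.value = 0                    # the protocol initializes the location to 0
        self.lock = threading.Lock()      # stands in for hardware atomicity

    def fetch_and_add2(self):
        with self.lock:
            old = self.value
            self.value += 2
            return old

    def test_and_set(self):
        # Stronger variant: returns the stored number; sets it to 1 only if it was 0.
        with self.lock:
            old = self.value
            if old == 0:
                self.value = 1
            return old

def decide(loc, my_input):
    """Wait-free binary consensus for any number of processes."""
    if my_input == 0:
        v = loc.fetch_and_add2()
        return 1 if v % 2 == 1 else 0
    else:
        v = loc.test_and_set()
        return 1 if (v % 2 == 1 or v == 0) else 0
```

Whichever instruction is applied first fixes the parity of the location forever after (a first test-and-set() makes it odd, a first fetch-and-add(2) keeps it even and nonzero), which is why every process decides the same value.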

Another example considers three instructions:

  • read(), which returns the number stored in a memory location,

  • decrement(), which decrements the number stored in a memory location and returns nothing, and

  • multiply(x), which multiplies the number stored in a memory location by x and returns nothing.

A similar situation arises: Objects that support only two of these instructions have consensus number 1 and cannot be combined to solve wait-free consensus for 2 or more processes. However, using an object that supports all three instructions, it is possible to solve wait-free binary consensus for any number of processes. The protocol uses a single memory location initialized to 1. Processes with input 0 perform decrement(), while processes with input 1 perform multiply(n), where n is the number of processes. The second operation by each process is read(). If the value returned is positive, then the process decides 1; otherwise, it decides 0.
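A minimal Python sketch of this protocol follows; as before, the class and its lock are only stand-ins for a hardware location that executes each instruction atomically.

```python
import threading

class ArithLocation:
    """A shared location supporting read(), decrement(), and multiply(x)."""
    def __init__(self):
        self.value = 1                    # the protocol initializes the location to 1
        self.lock = threading.Lock()      # stands in for hardware atomicity

    def read(self):
        with self.lock:
            return self.value

    def decrement(self):
        with self.lock:
            self.value -= 1               # returns nothing

    def multiply(self, x):
        with self.lock:
            self.value *= x               # returns nothing

def decide_arith(loc, my_input, n):
    """Wait-free binary consensus among n processes."""
    if my_input == 0:
        loc.decrement()
    else:
        loc.multiply(n)
    return 1 if loc.read() > 0 else 0
```

If the first operation is a multiply(n), the value stays positive (it starts at n and loses at most n − 1 through decrements); if the first operation is a decrement(), the value never becomes positive again, so the sign read by every process is determined by the first operation.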

For randomized computation, Herlihy’s Consensus Hierarchy also collapses: randomized wait-free binary consensus among any number of processes can be solved using only read-write registers, which have consensus number 1. Ellen, Herlihy, and Shavit [FHS98] proved that Ω(√n) historyless objects, which support only trivial operations, such as read(), and historyless operations, such as write(x), test-and-set(), and swap(x), are necessary to solve this problem. They noted that, in contrast, only one fetch-and-increment or fetch-and-add object suffices for solving this problem. Yet, these objects and historyless objects are similarly classified in Herlihy’s Consensus Hierarchy (i.e. they all have consensus number 1 or 2). They suggested that the number of instances of an object needed to solve randomized wait-free consensus among n processes might be another way to classify the power of the object.

Motivated by these observations, we consider a classification of instruction sets based on the number of memory locations needed to solve obstruction-free n-valued consensus among n processes. Obstruction freedom is a simple and natural progress measure. Some state-of-the-art synchronization operations, for example hardware transactions [intel], do not guarantee more than obstruction freedom. Obstruction freedom is also closely related to randomized computation. In fact, any (deterministic) obstruction-free algorithm can be transformed into a randomized wait-free algorithm that uses the same number of memory locations (against an oblivious adversary) [GHHW13]. Obstruction-free algorithms can also be transformed into wait-free algorithms in the unknown-bound semi-synchronous model [FLMS05]. Recently, it has been shown that any lower bound on the number of registers used by obstruction-free algorithms also applies to randomized wait-free algorithms [EGZ18].

1.1 Our Results

Let n-consensus denote the problem of solving obstruction-free n-valued consensus among n processes. For any set of instructions, we consider the minimum number of memory locations supporting those instructions that are needed to solve n-consensus, as a function of n. For various instruction sets, we provide upper and lower bounds on this function. The results are summarized in Table 1.

We begin, in Section 3, by considering the instructions

  • multiply(x), which multiplies the number stored in a memory location by x and returns nothing,

  • add(x), which adds x to the number stored in a memory location and returns nothing, and

  • set-bit(x), which sets bit x of a memory location to 1 and returns nothing.

We show that one memory location supporting read() and one of these instructions can be used to solve n-consensus. The idea is to show that these instruction sets can implement counters in a single location. We can then use a racing counters algorithm [AH90].

Next, we consider max-registers [AAC09]. These are memory locations supporting

  • read-max(), which reads the number stored in a memory location, and

  • write-max(x), which stores the number x in a memory location, provided it contains a value less than x, and returns nothing.

In Section 4, we prove that two max-registers are necessary and sufficient for solving n-consensus.

In Section 5, we prove that a single memory location supporting read(), write(x), and increment() cannot be used to solve n-consensus, for n ≥ 2. We also present an algorithm for solving n-consensus using O(log n) such memory locations.

In Section 6, we introduce a family of buffered read and buffered write instructions, ℓ-read() and ℓ-write(x), for ℓ ≥ 1, and show how to solve n-consensus using ⌈n/ℓ⌉ memory locations supporting these instructions. Extending Zhu’s lower bound [Zhu16], we also prove that ⌈(n−1)/ℓ⌉ such memory locations are necessary, which is tight except when n − 1 is divisible by ℓ.

Our main technical contribution is in Section 7, where we show a lower bound on the number of locations needed to solve n-consensus that holds even in the presence of atomic multiple assignment. Multiple assignment can be implemented by simple transactions, so our result implies that such transactions cannot significantly reduce space complexity. The proof further extends the techniques of [Zhu16] via a nice combinatorial argument, which is of independent interest.

There are algorithms that solve n-consensus using O(n) registers [AH90, BRS15, Zhu15]. This is asymptotically tight by the recent result of [EGZ18], which shows a lower bound of Ω(n) registers for binary consensus among n processes and, hence, for n-consensus. In Section 8, we present a modification of a known anonymous algorithm for n-consensus [Zhu15], which solves n-consensus using O(√n) memory locations supporting only read() and swap(x). A lower bound of Ω(√n) locations appears in [FHS98]. This lower bound also applies to locations that support only read(), write(x), and swap(x) instructions.

Finally, in Section 9, we show that, for certain combinations of the instructions considered above, an unbounded number of memory locations is necessary and sufficient to solve n-consensus, for n ≥ 2. Furthermore, we show how the number of memory locations can be reduced to one when additional instructions are available.

Table 1: Space Hierarchy. For each instruction set considered in this paper, the table summarizes the lower and upper bounds on the number of memory locations supporting that set which are needed to solve n-consensus.

2 Model

We consider an asynchronous system of n processes with distinct ids that supports a set of deterministic synchronization instructions on a set of identical memory locations. The processes take steps at arbitrary, possibly changing, speeds and may crash at any time. Each step is an atomic invocation of some instruction on some memory location by some process. Scheduling is controlled by an adversary. This is a standard asynchronous shared memory model [AW04], with the restriction that every memory location supports the same set of instructions. We call this restriction the uniformity requirement.

When allocated a step by the scheduler, a process performs one instruction on one shared memory location and, based on the result, may then perform an arbitrary amount of local computation. A configuration consists of the state of every process and the contents of every memory location.

Processes can use instructions on the memory locations to simulate (or implement) various objects. An object provides a set of operations which processes can call to access and/or change the value of the object. Although a memory location together with the supported instructions can be viewed as an object, we do not do so, to emphasize the uniformity requirement.

We consider the problem of solving obstruction-free k-valued consensus in such a system. Initially, each of the n processes has an input from {0, …, k − 1} and is supposed to output a value (called a decision), such that all decisions are the same (agreement) and equal to the input of one of the processes (validity). Once a process has decided (i.e. output its decision), the scheduler does not allocate it any further steps. Obstruction-freedom means that, from each reachable configuration, each process will eventually decide a value in a solo execution, i.e. if the adversarial scheduler gives it sufficiently many consecutive steps. When k = n, we call this problem n-consensus and, when k = 2, we call this problem binary consensus. Note that lower bounds for binary consensus also apply to n-consensus.

In every reachable configuration of a consensus algorithm, each process has either decided or has one specific instruction it will perform on a particular memory location when next allocated a step by the scheduler. In this latter case, we say that the process is poised to perform that instruction on that memory location in the configuration.

3 Arithmetic Instructions

Consider a system that supports only read() and either multiply(x), add(x), or set-bit(x). We show how to solve n-consensus using a single memory location in such a system. The idea is to show that we can simulate certain collections of objects that can be used to solve n-consensus.

A k-component unbounded counter object has k components, each with a nonnegative integral value. It supports an increment() operation on each component, which increments the count stored in the component by 1, and a scan() operation, which returns the counts of all components. In the next lemma, we present a racing counters algorithm that bears some similarity to a consensus algorithm by Aspnes and Herlihy [AH90].

Lemma 3.1.

It is possible to solve obstruction-free k-valued consensus among n processes using a k-component unbounded counter.

Proof.

We associate a separate component Cv with each possible input value v ∈ {0, …, k − 1}. All components are initially 0. Each process alternates between promoting a value (incrementing the component associated with that value) and performing a scan() of all components. A process first promotes its input value. After performing a scan(), if it observes that the count stored in the component, Cv, associated with some value v is at least n larger than the counts stored in all other components, it returns the value v. Otherwise, it promotes the value associated with a component containing the largest count (breaking ties arbitrarily).

If some process returns the value v, then each other process will increment some component at most once before next performing a scan(). In each of those scans, the count stored in Cv will still be larger than the counts stored in all other components. From then on, these processes will promote value v and keep incrementing Cv. Eventually, the count in component Cv will be at least n larger than the counts in all other components, and these processes will return v, ensuring agreement.

Obstruction-freedom follows because a process running on its own will continue to increment the same component, whose count will eventually be at least n larger than the counts in all other components. ∎
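A minimal sketch of this racing-counters protocol follows, written against an assumed shared k-component unbounded counter. The counter class below is a plain local object purely for illustration; in the algorithm it would be a shared object implemented as described later in this section.

```python
class UnboundedCounter:
    """k-component unbounded counter (local stand-in for a shared object)."""
    def __init__(self, k):
        self.counts = [0] * k

    def increment(self, v):
        self.counts[v] += 1

    def scan(self):
        return list(self.counts)

def racing_counters_consensus(counter, my_input, n):
    """Obstruction-free k-valued consensus, following the proof of Lemma 3.1."""
    value = my_input
    while True:
        counter.increment(value)                       # promote the current value
        counts = counter.scan()
        leader = max(range(len(counts)), key=lambda v: counts[v])
        others = [c for v, c in enumerate(counts) if v != leader]
        if not others or counts[leader] >= max(others) + n:
            return leader                              # leader is at least n ahead of all others
        value = leader                                 # otherwise adopt the current leader
```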

In this protocol, the counts stored in the components may grow arbitrarily large. The next lemma shows that it is possible to avoid this problem, provided each component also supports a decrement() operation. More formally, a k-component bounded counter object has k components, where each component stores a count in {0, 1, …, 3n − 1}. It supports both increment() and decrement() operations on each component, along with a scan() operation, which returns the count stored in every component. If a process ever attempts to increment a component that has count 3n − 1 or decrement a component that has count 0, the object breaks (and every subsequent operation invocation returns ⊥).

Lemma 3.2.

It is possible to solve obstruction-free k-valued consensus among n processes using a k-component bounded counter.

Proof.

We modify the construction in Lemma 3.1 slightly by changing what a process does when it wants to increment Cv to promote the value v. Among the other components (i.e. excluding Cv), let Cw be one that stores the largest count. If Cw < n, the process increments Cv, as before. If Cw ≥ n, then, instead of incrementing Cv, it decrements Cw.

A component with value 0 is never decremented. This is because, after the last time some process observed that it stored a count greater than or equal to n, each process will decrement the component at most once before performing a scan(). Similarly, a component never becomes larger than 3n − 1: after the last time some process observed it to have count less than 2n, each process can increment it at most once before performing a scan(). If a process observes Cv ≥ 2n, then either the counts of all other components are at most Cv − n, in which case the process returns v without incrementing Cv, or the process decrements some other component instead of incrementing Cv. ∎

In the following theorem, we show how to simulate unbounded and bounded counter objects.

Theorem 3.3.

It is possible to solve n-consensus using a single memory location that supports only read() and either multiply(x), add(x), or set-bit(x).

Proof.

We first give an obstruction-free implementation of an n-component unbounded counter object using a single location that supports read() and multiply(x). By Lemma 3.1, this is sufficient for solving n-consensus. The location is initialized with value 1. For each v ∈ {0, …, n − 1}, let pv be the (v+1)-st prime number. A process increments component Cv by performing multiply(pv). A read() instruction returns the value x currently stored in the memory location. This provides a scan() of all components: the count of component Cv is the exponent of pv in the prime decomposition of x.

A similar construction does not work using only read() and add(x) instructions. For example, suppose one component is incremented by calling add(1) and another component is incremented by calling add(2). Then, the value 2 can be obtained by incrementing the first component twice or by incrementing the second component once.

However, we can use a single memory location that supports read() and add(x) to implement an n-component bounded counter. By Lemma 3.2, this is sufficient for solving n-consensus. We view the value stored in the location as a number written in base b, for a sufficiently large base (b = 3n suffices, since no count ever exceeds 3n − 1), and interpret the digit in position v of this number as the count of component Cv. The location is initialized with the value 0. To increment Cv, a process performs add(b^v); to decrement Cv, it performs add(−b^v); and read() provides a scan() of all components.

Finally, in systems supporting read() and set-bit(x), we can implement an n-component unbounded counter by viewing the memory location as being partitioned into blocks, each consisting of one bit for every process–component pair. Initially all bits are 0. Each process p locally stores the number of times it has incremented each component. To increment component Cv, process p sets the bit assigned to the pair (p, v) in block c to 1, where c is the number of times it has previously incremented component Cv. It is possible to determine the current count stored in each component via a single read(): the count stored in component Cv is simply the sum, over all processes, of the number of times each process has incremented Cv, i.e. the number of 1 bits assigned to Cv. ∎
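As an illustration of the first of these simulations, the following sketch packs an n-component (more generally, k-component) unbounded counter into one location supporting only read() and multiply(x). The class names, the helper for generating primes, and the lock standing in for atomicity are all illustrative.

```python
import threading

class ReadMultiplyLocation:
    """A single location supporting only read() and multiply(x)."""
    def __init__(self):
        self.value = 1                    # must start at 1 for the prime encoding
        self.lock = threading.Lock()      # stands in for hardware atomicity

    def read(self):
        with self.lock:
            return self.value

    def multiply(self, x):
        with self.lock:
            self.value *= x

def first_primes(k):
    """Return the first k prime numbers by trial division."""
    primes, cand = [], 2
    while len(primes) < k:
        if all(cand % p for p in primes):
            primes.append(cand)
        cand += 1
    return primes

class PrimeEncodedCounter:
    """k-component unbounded counter stored in one read()/multiply(x) location."""
    def __init__(self, loc, k):
        self.loc = loc
        self.primes = first_primes(k)

    def increment(self, v):
        self.loc.multiply(self.primes[v])   # component v = exponent of the v-th prime

    def scan(self):
        x, counts = self.loc.read(), []
        for p in self.primes:
            c = 0
            while x % p == 0:
                x //= p
                c += 1
            counts.append(c)
        return counts
```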

4 Max-Registers

A max-register object [AAC09] supports two operations, write-max(x) and read-max(). The write-max(x) operation sets the value of the max-register to x if x is larger than the current value and returns nothing. The read-max() operation returns the current value of the max-register (which is the largest amongst all values previously written to it). We show that two max-registers are necessary and sufficient for solving n-consensus.
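The following Python sketch illustrates these semantics; the class, its method names, and the lock (standing in for an atomic memory location) are illustrative only.

```python
import threading

class MaxRegister:
    """Max-register: write_max(x) keeps the largest value ever written; read_max() returns it."""
    def __init__(self, initial=0):
        self.value = initial
        self.lock = threading.Lock()   # stands in for hardware atomicity

    def write_max(self, x):
        with self.lock:
            if x > self.value:
                self.value = x         # smaller writes have no effect

    def read_max(self):
        with self.lock:
            return self.value
```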

Theorem 4.1.

It is not possible to solve obstruction-free binary consensus for n ≥ 2 processes using a single max-register.

Proof.

Consider a solo terminating execution α0 of process p with input 0 and a solo terminating execution α1 of process q with input 1. We show how to interleave these two executions so that the resulting execution is indistinguishable to both processes from their respective solo executions. Hence, both 0 and 1 will be decided, contradicting agreement.

To build the interleaved execution, run both processes until they are first poised to perform write-max(). Suppose p is poised to perform write-max(x) and q is poised to perform write-max(y). If x ≤ y, let p take steps until it is next poised to perform write-max() or until the end of α0, if it performs no more write-max() operations. Otherwise, let q take steps until it is next poised to perform write-max() or until the end of α1. Repeat this until one of the processes reaches the end of its execution and then let the other process finish. ∎

Theorem 4.2.

It is possible to solve n-consensus, for any number of processes n, using only two max-registers.

Proof.

We describe a protocol for -consensus using two max-registers, and . Consider the lexicographic ordering on the set . Let be a fixed prime that is larger than . Note that, for , if and only if . Thus, by identifying with , we may assume that and are max-registers defined on with respect to the lexicographic ordering .

Since no operations decrease the value in a max-register, it is possible to implement an obstruction-free scan operation on and using the double collect algorithm in [AADGMS93]: A process repeatedly collects the values in both locations (performing on each location to obtain its value) until it observes two consecutive collects with the same values.

Initially, both and have value . Each process alternately performs on one component and takes a scan of both components. It begins by performing to , where is its input value. If has value and has value in the scan, then it decides and terminates. If both and have value in the scan, then it performs to . Otherwise, it performs to with the value of in the scan.

To obtain a contradiction suppose that there is an execution in which some process decides value and another process decides value . Immediately before its decision, performed a scan where had value and had value , for some . Similarly, immediately before its decision, performed a scan where had value and had value , for some . Without loss of generality, we may assume that ’s scan occurs after ’s scan. In particular, had value before it had value . So, from the specification of a max-register, . Since , it follows that .

We show inductively, for , that some process performed a scan in which both and had value . By assumption, performed a scan where had value . So, some process performed on . From the algorithm, this process performed a scan where and both had value . Now suppose that and some process performed a scan in which both and had value . So, some process performed on . From the algorithm, this process performed a scan where and both had value .

Consider the smallest value of such that . Note that , so . Hence, some process performed a scan in which both and had value . Since , this scan occurred after the scan by , in which had value . But had value in this scan and had value in ’s scan, so . Since , it follows that . Hence and . This contradicts the choice of . ∎

5 Increment

Consider a system that supports only read(), write(x), and increment(). We prove that it is not possible to solve binary consensus using a single such memory location. We also present an algorithm that solves n-consensus using O(log n) such memory locations.

Theorem 5.1.

It is not possible to solve obstruction-free binary consensus for n ≥ 2 processes using a single memory location that supports only read(), write(x), and increment().

Proof.

Suppose there is a binary consensus algorithm for two processes, p and q, using only one memory location. Consider solo terminating executions α0 and α1 by q with input 0 and input 1, respectively. Let α0′ and α1′ be the longest prefixes of α0 and α1, respectively, that do not contain a write(). Without loss of generality, suppose that at least as many increment() instructions are performed in α1′ as in α0′. Let C0 be the configuration that results from executing α0′ starting from the initial configuration in which q has input 0 and the other process, p, has input 1.

Consider the shortest prefix β of α1′ in which q performs the same number of increment() instructions as it performs in α0′. Let C1 be the configuration that results from executing β starting from the initial configuration in which both p and q have input 1. Then p must decide 1 in its solo terminating execution starting from configuration C1. However, C0 and C1 are indistinguishable to process p: the location holds the same value in both, since α0′ and β contain the same number of increment() instructions and no write() instructions, and p has taken no steps. So p must also decide 1 in its solo terminating execution starting from configuration C0. If q has decided in configuration C0, then it has decided 0, since p takes no steps in α0′. Then both 0 and 1 are decided in the execution consisting of α0′ followed by the solo terminating execution of p, starting from the initial configuration in which q has input 0 and p has input 1. This violates agreement. Thus, q cannot have decided in configuration C0.

Therefore, q is poised to perform a write() in configuration C0. Let γ be the remainder of α0, so that α0 = α0′γ. Let δ be the solo terminating execution of p starting from C0, in which it decides 1. Since there is only one memory location and γ begins with a write(), the configurations resulting from q performing this write() starting from C0 and starting from the configuration reached by executing δ from C0 are indistinguishable to q. Thus, q also decides 0 when it executes γ starting from the configuration at the end of α0′δ. But in this execution, both 0 and 1 are decided, violating agreement. ∎

The following well-known construction converts any algorithm for solving binary consensus to an algorithm for solving n-valued consensus [HS12Book].

Lemma 5.2.

Consider a system that supports a set of instructions that includes read() and write(x). If it is possible to solve obstruction-free binary consensus among n processes using only c memory locations, then it is possible to solve n-consensus using only (c + 2)⌈log2 n⌉ − 2 locations.

Proof.

The processes agree bit-by-bit in ⌈log2 n⌉ asynchronous rounds, each using c + 2 locations. A process starts in the first round with its input value as its value for round 1. In round r, if the r'th bit of its value is 0, a process writes its value in a designated 0-location for the round. Otherwise, it writes its value in a designated 1-location. Then, it performs the obstruction-free binary consensus algorithm using c locations to agree on the r'th bit, br, of the output. If this bit differs from the r'th bit of its value, the process reads a recorded value from the designated br-location for round r and adopts it as its value for the next round. Note that some process must have already recorded a value to this location since, by validity, br is the r'th bit of the value of some process that participated in the binary consensus, and that process recorded its value before participating. This ensures that the values used for round r + 1 are all input values and they all agree in their first r bits. By the end, all processes have agreed on ⌈log2 n⌉ bits, i.e. on one of the at most n different input values.

We can save two locations because the last round does not require designated 0- and 1-locations. ∎
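A sketch of this bit-by-bit construction follows. The rounds_state structure, the per-round record registers with read()/write() methods, and the binary_consensus routine are assumed primitives supplied by the surrounding system; they are not part of the original statement.

```python
import math

def n_valued_consensus(my_value, n, rounds_state, binary_consensus):
    """Bit-by-bit agreement (Lemma 5.2).

    rounds_state[r] holds the shared objects for round r: two designated record
    registers (index 0 and 1) and the locations used by the assumed obstruction-free
    binary_consensus(state, bit) routine."""
    num_bits = math.ceil(math.log2(n)) if n > 1 else 0
    value = my_value
    for r in range(num_bits):
        bit = (value >> r) & 1
        state = rounds_state[r]
        state["record"][bit].write(value)           # record value in the designated bit-location
        agreed = binary_consensus(state["binary"], bit)
        if agreed != bit:
            value = state["record"][agreed].read()  # adopt a value whose r-th bit was agreed
    return value
```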

We can implement a 2-component unbounded counter, defined in Section 3, using two locations that support read() and increment(): each location holds the count of one component. The values in the two locations never decrease. Therefore, as in the proof of Theorem 4.2, a scan() operation that returns the values of both components can be performed using the double collect algorithm [AADGMS93]. By Lemma 3.1, n processes can solve obstruction-free binary consensus using a 2-component unbounded counter. The next result then follows from Lemma 5.2.
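For concreteness, here is a minimal sketch of the double collect idea used here (and in the proof of Theorem 4.2). The locations are assumed to expose a read() method and to hold values that never decrease; the function name is illustrative.

```python
def double_collect_scan(locations):
    """Obstruction-free scan of locations whose values never decrease [AADGMS93]:
    repeatedly collect all values until two consecutive collects agree."""
    previous = [loc.read() for loc in locations]
    while True:
        current = [loc.read() for loc in locations]
        if current == previous:
            return current          # nothing changed between the two collects
        previous = current
```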

Theorem 5.3.

It is possible to solve n-consensus using only O(log n) memory locations that support only read(), write(x), and increment().

6 Buffers

In this section, we consider the instructions ℓ-read() and ℓ-write(x), for ℓ ≥ 1, which generalize read and write, respectively. Specifically, an ℓ-read() instruction returns the sequence of inputs to the ℓ most recent ℓ-write(x) instructions applied to the memory location, in order from least recent to most recent. If the number of ℓ-write(x) instructions previously applied to the memory location is j < ℓ, then the first ℓ − j elements of this sequence are ⊥. Subsequent to the conference version of this paper [EGSZ16], Mostéfaoui, Perrin, and Raynal [MPR18] defined a k-sliding window register, which is an object that supports only these two instructions.

We consider a system that supports the instruction set {ℓ-read(), ℓ-write(x)}, for some ℓ ≥ 1. We call each memory location in such a system an ℓ-buffer and say that each memory location has capacity ℓ. Note that a 1-buffer is simply a register. For ℓ > 1, an ℓ-buffer essentially maintains a buffer of the ℓ most recent writes to that location and allows them to be read.
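For concreteness, here is a minimal Python sketch of the ℓ-buffer semantics just described. The class and its lock are illustrative stand-ins for a hardware memory location, and None stands in for the initial value ⊥.

```python
import threading
from collections import deque

class EllBuffer:
    """An ell-buffer: ell_write(x) records x; ell_read() returns the inputs of the
    ell most recent writes, oldest first, padded at the front with None if fewer
    than ell writes have occurred."""
    def __init__(self, ell):
        self.ell = ell
        self.recent = deque([None] * ell, maxlen=ell)
        self.lock = threading.Lock()   # stands in for hardware atomicity

    def ell_write(self, x):
        with self.lock:
            self.recent.append(x)      # the oldest entry falls out once the buffer is full

    def ell_read(self):
        with self.lock:
            return tuple(self.recent)
```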

In Section 6.1, we show that a single ℓ-buffer can be used to simulate a powerful history object that can be updated by at most ℓ processes. This will allow us to simulate an obstruction-free variant of Aspnes and Herlihy’s algorithm for n-consensus [AH90] and, hence, solve n-consensus, using only ⌈n/ℓ⌉ ℓ-buffers. In Section 6.2, we prove that ⌈(n−1)/ℓ⌉ ℓ-buffers are necessary, which matches the upper bound whenever n − 1 is not a multiple of ℓ.

6.1 Simulations Using Buffers

A history object, H, supports two operations, append(v) and get-history(), where get-history() returns the sequence of all values appended to H by prior append(v) operations, in order. We first show that, using a single ℓ-buffer, B, we can simulate a history object, H, that supports arbitrarily many readers and at most ℓ different appenders.

Lemma 6.1.

A single ℓ-buffer can simulate a history object on which at most ℓ different processes can perform append(v) and any number of processes can perform get-history().

Proof.

Without loss of generality, assume that no value is appended to H more than once. This can be achieved by having a process include its process identifier and a sequence number along with the value that it wants to append.

In our implementation, B is initially empty and each value written to B is of the form (h, v), where h is a history of appended values and v is a single appended value.

To implement append(v) on H, a process obtains a history, h, by performing get-history() on H and then performs ℓ-write((h, v)) on B. The append(v) operation is linearized at this ℓ-write() step.

To implement get-history() on H, a process simply performs an ℓ-read() of B to obtain a vector (b1, …, bℓ), where bℓ is the most recently written value. The get-history() operation is linearized at this ℓ-read(). We describe how the return value of the operation is computed.

We prove that each get-history() operation, G, on H returns the sequence of inputs to all append() operations on H that were linearized before it, in order from least recent to most recent. Let R be the ℓ-read() step performed by G and let (b1, …, bℓ) be the vector returned by R.

Note that bℓ = ⊥ if and only if no ℓ-write() steps were performed on B before R, i.e. if and only if no append() operations are linearized before G. In this case, the empty sequence is returned by the get-history() operation, as required.

Now suppose that m ≥ 1 ℓ-write() steps were performed on B before R, i.e. m append() operations were linearized before G. Inductively assume that each get-history() operation which has fewer than m append() operations linearized before it returns the sequence of inputs to those operations.

If bi ≠ ⊥, then bi = (hi, vi) was the input to an ℓ-write() step, Wi, on B performed before R. Consider the append(vi) operation that performed step Wi. It appended the value vi to H and the get-history() operation, Gi, that it performed first returned the history of appended values hi. Let Ri be the ℓ-read() step performed by Gi. Since Ri occurred before Wi, which occurred before R, fewer than m ℓ-write() steps occurred before Ri. Hence, fewer than m append() operations are linearized before Gi. By the induction hypothesis, hi is the sequence of inputs to the append() operations linearized before Gi.

If m < ℓ, then exactly the last m entries of (b1, …, bℓ) are different from ⊥, and they are the inputs to the m ℓ-write() steps performed on B before R, in order. In this case, G returns the sequence of the second components of these entries, in order. Since each append() operation is linearized at its ℓ-write() step and these are the inputs to the m append() operations linearized before G, in order from least recent to most recent, G returns the sequence of inputs to the append() operations linearized before it.

So, suppose that m ≥ ℓ, so that every entry bi = (hi, vi) is different from ⊥. Let hk be the longest history amongst h1, …, hℓ. If hk contains v1, then G returns h · (v1, …, vℓ), where h is the prefix of hk up to, but not including, v1. By definition, (h1, v1), …, (hℓ, vℓ) are the inputs to the last ℓ ℓ-write() steps prior to R, so v1, …, vℓ are the last ℓ values appended to H prior to G. As shown above, hk is the sequence of inputs to the append() operations linearized before Gk; since it contains v1, it also contains all values appended to H prior to v1, and h is exactly the sequence of those values. It follows that h · (v1, …, vℓ) is the sequence of inputs to the append() operations linearized before G.

Figure 1: When hk does not contain v1, the ℓ operations append(v1), …, append(vℓ) are all concurrent.

Now suppose that hk does not contain v1. Then none of h1, …, hℓ contain v1. Hence the append(v1) operation is not linearized before any of G1, …, Gℓ, so the steps R1, …, Rℓ were all performed prior to W1. Since each Ri occurred before W1, and W1 occurred at or before each Wi, the operations append(v1), …, append(vℓ) are all concurrent with one another. This is illustrated in Figure 1. Therefore append(v1), …, append(vℓ) are performed by ℓ different processes. Only ℓ different processes can perform append() operations on H, so every append() operation linearized before W1 was performed by one of these processes and completed before that process began its append(vi); hence it is linearized before Gi and its value appears in hi. Since h1, …, hℓ are all prefixes of the linearization order and hk is the longest of them, hk contains all values appended to H prior to W1. In this case, G returns hk · (v1, …, vℓ), which is therefore the sequence of inputs to the append() operations linearized before G. ∎

This lemma allows us to simulate any object that supports at most ℓ updating processes using only a single ℓ-buffer. This is because the state of an object is determined by the history of the non-trivial operations performed on it. In particular, we can simulate an array of ℓ single-writer registers using a single ℓ-buffer.

Lemma 6.2.

A single ℓ-buffer can simulate ℓ single-writer registers.

Proof.

Suppose that register Ri is owned by process pi, for 1 ≤ i ≤ ℓ. By Lemma 6.1, it is possible to simulate a history object H that can be updated by ℓ processes and read by any number of processes. To write value x to Ri, process pi appends (i, x) to H. To read Ri, a process reads H and finds the value of the most recent write to Ri. This is the second component of the last pair in the history whose first component is i. ∎
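A minimal sketch of this simulation, assuming a history object with append() and get_history() methods as in Lemma 6.1 (all names are illustrative):

```python
class SingleWriterRegisters:
    """ell single-writer registers on top of a history object H (Lemma 6.2).
    The history object is assumed to provide append(value) and get_history()."""
    def __init__(self, history):
        self.history = history

    def write(self, i, x):
        # Only the owner, process i, may call this for register i.
        self.history.append((i, x))

    def read(self, i):
        # The current value of register i is the last value its owner appended for it.
        for owner, value in reversed(self.history.get_history()):
            if owner == i:
                return value
        return None   # register i has not been written yet
```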

Thus, we can use ⌈n/ℓ⌉ ℓ-buffers to simulate n single-writer registers. An n-component unbounded counter shared by n processes can be implemented in an obstruction-free way from n single-writer registers: each process records the number of times it has incremented each component in its single-writer register. An obstruction-free scan() can be performed using the double collect algorithm [AADGMS93] and summing, for each component, the counts recorded in all of the registers. Hence, by Lemma 3.1, we get the following result.

Theorem 6.3.

It is possible to solve n-consensus using only ⌈n/ℓ⌉ ℓ-buffers.

6.2 A Lower Bound

In this section, we prove a lower bound of ⌈(n−1)/ℓ⌉ on the number of memory locations (supporting ℓ-read() and ℓ-write(x)) necessary for solving obstruction-free binary consensus among n ≥ 2 processes.

In any configuration, a location is covered by a process p if p is poised to perform ℓ-write() on it. A location is k-covered by a set of processes P in a configuration if there are exactly k processes in P that cover it. A configuration is at most k-covered by P if every process in P covers some location and no location is k′-covered by P, for any k′ > k.

Let C be a configuration and let P be a set of processes, each of which is poised to perform ℓ-write() in C. A block write by P from C is an execution, starting from C, in which each process in P takes exactly one step. If a block write is performed that includes ℓ different ℓ-write() instructions to the same location, and then some process performs ℓ-read() on that location, the process gets the same result regardless of the value of that location in C.

We say that a set of processes P can decide the value v from a configuration C if there exists a P-only execution from C in which v is decided. If P can decide both 0 and 1 from C, then P is bivalent from C.

To obtain the lower bound, we extend the proof of the lower bound on the number of registers required for solving n-process consensus [Zhu16]. We also borrow intuition about reserving executions from the lower bound for anonymous consensus [Gel15]. The following auxiliary lemmas are largely unchanged from [Zhu16]. The main difference is that we only perform block writes on ℓ-buffers that are ℓ-covered.

Lemma 6.4.

There is an initial configuration from which the set of all processes in the system is bivalent.

Proof.

Consider an initial configuration, C, with two processes p0 and p1, such that pi starts with input i, for i ∈ {0, 1}. Observe that {pi} can decide i from C since, initially, C is indistinguishable to pi from the configuration where every process starts with input i. Thus, {p0, p1} is bivalent from C and, therefore, so is the set of all processes. ∎

Lemma 6.5.

Let be a configuration and be a set processes that is bivalent from . Suppose is at most -covered by a set of processes , where . Let be a set of locations that are -covered by in . Let be a block write from by the set of processes in that cover . Then there exists a -only execution from such that is bivalent from and, in configuration , some process in covers a location not in .

Proof.

Suppose some process can decide some value from configuration and is a -only execution from in which is decided. Let be the longest prefix of such that can decide from . Let be the next step by in after .

If is an or is an to a location in , then and are indistinguishable to . Since can decide from , but can only decide from , must be an to a location not in . Thus, in configuration in , covers a location not in and is indistinguishable from to process . Therefore, by definition of , can only decide from and can decide from . This implies that is bivalent from . ∎

The next result says that if a set of processes is bivalent in some configuration, then it is possible to reach a configuration from which 0 and 1 can be decided in solo executions. It does not depend on what instructions are supported by the memory.

Lemma 6.6.

Suppose Q is a set of at least two processes that is bivalent from configuration C. Then it is possible to reach, via a Q-only execution from C, a configuration, D, such that, for each v ∈ {0, 1}, there is a process in Q that decides v in its solo terminating execution from D.

Proof.

Let be the set of all configurations from which is bivalent and which are reachable from by a -only execution. Let be the smallest integer such that there exist a configuration and a set of processes that is bivalent from . Pick any such and let be a set of processes that is bivalent from . Since each process has only one terminating solo execution from and it decides only one value in this execution, it follows that .

Consider a process and let be the set of remaining processes in . Since , there exists such that can only decide from . Let . Then decides from . If decides from , then and satisfy the claim for . So, suppose that decides from .

Since is bivalent from , there is a -only execution from that decides . Let be the longest prefix of such that both and can only decide from . Note that , because is decided in . Let be the next step in after . Then either or can decide from .

First, suppose that is a step by a process in . Since can only decide from , can only decide from . Therefore, decides from . Since decides from , and satisfy the claim for .

Finally, suppose that is a step by . Since decides from , decides from . Therefore, can decide from . However, . By definition of , is not bivalent from . Therefore can only decide from . Since decides from , and satisfy the claim for . ∎

Similar to the induction used by Zhu [Zhu16], from a configuration that is at most -covered by a set of processes , we show how to reach another configuration that is at most -covered by and in which another process covers a location that is not -covered by .

Lemma 6.7.

Let be a configuration and let be a set of processes. If is bivalent from , then there is a -only execution starting from and a set of two processes such that is bivalent from and is at most -covered by the remaining processes .

Proof.

By induction on . The base case is when . Let and let be the empty execution. Since , the claim holds.

Now let and suppose the claim holds for . By Lemma 6.6, there exist a -only execution starting from and a set of two processes that is bivalent from . Pick any process . Then is bivalent from because is bivalent from .

We construct a sequence of configurations reachable from such that, for all , the following properties hold:

  • there exists a set of two processes such that is bivalent from ,

  • is at most -covered by the remaining processes , and

  • if is the set of locations that are -covered by in , then is reachable from by a -only execution which contains a block write to by processes in .

By the induction hypothesis applied to and , there is a -only execution starting from and a set of two processes such that is bivalent from and is at most -covered by .

Now suppose that is a configuration reachable from and and are sets of processes that satisfy all three conditions.

By Lemma 6.5 applied to configuration , there is a -only execution such that is bivalent from , where is a block write to by processes in . Applying the induction hypothesis to and , we get a -only execution leading to a configuration , in which there is a set, , of two processes such that is bivalent from . Additionally, is at most -covered by the set of remaining processes . Note that the execution