DGCC: A New Dependency Graph based Concurrency Control Protocol for Multicore Database Systems

Abstract

Multicore CPUs and large memories are increasingly becoming the norm in modern computer systems. However, current database management systems (DBMSs) are generally ineffective in exploiting the parallelism of such systems. In particular, contention can lead to a dramatic fall in performance. In this paper, we propose a new concurrency control protocol called DGCC (Dependency Graph based Concurrency Control) that separates concurrency control from execution. DGCC builds dependency graphs for batched transactions before executing them. Using these graphs, contentions within the same batch of transactions are resolved before execution. As a result, the execution of the transactions does not need to deal with contention, while remaining fully equivalent to a serialized execution. This better exploits multicore hardware and achieves a higher level of parallelism. To facilitate DGCC, we have also proposed a system architecture that removes centralized control components, yielding better scalability, and that supports a more efficient recovery mechanism. Our extensive experimental study shows that DGCC achieves up to four times higher throughput than state-of-the-art concurrency control protocols on high contention workloads.

1 Introduction

Advancements in multicore processors over the last decade have enabled programs to significantly improve performance by exploiting parallelism. Further, the availability of larger and cheaper main memory makes it possible for a significant amount of data to reside in main memory. It is now feasible for a single multicore system with a large memory to handle applications that were previously supported by multiple machines. However, current database management systems (DBMSs) are not designed to fully exploit these new hardware features. In this paper, we examine the design of multicore in-memory OLTP systems with the goal of improving the throughput of transaction processing by better exploiting modern multicore hardware. In summary, we divide the transactions arriving at the DBMS into batches. Every transaction within each batch is chopped into transaction pieces, which are reorganized into an efficient, contention-free concurrent execution plan. We present a new concurrency control protocol based on the dependencies between transactions that ensures the correctness of the execution.

Figure 1: An Example with Two Transactions

We call our new concurrency control protocol Dependency Graph based Concurrency Control (DGCC). DGCC differs from traditional lock based or timestamp based protocols in that it separates the logic for concurrency control from the execution of the transactions. In traditional OLTP systems, each transaction is handled by a worker thread from its beginning to its end. The worker thread is responsible for both contention resolution and execution. Since each thread consumes system resources, there is a limit on the number of threads, and hence on the number of concurrent transactions that can be present at any one time. Furthermore, overall performance is affected by contention as well as by the inability to fully exploit parallelism. To alleviate these problems and improve scalability, DGCC first chops a batch of transactions into transaction pieces, and then builds a dependency graph that captures the dependency relationships among the transaction operations. DGCC then executes these dependency graphs in a manner that guarantees the execution of the operations is serializable. Furthermore, the execution encounters no contention at runtime.

We illustrate the basic idea of DGCC and compare it with the two traditional concurrency control protocols in Figure 1. For a lock based protocol, as shown in Figure 1(a), a deadlock occurs when transaction t_1 is holding A's lock and requesting B's lock, while transaction t_2 is holding B's lock and requesting A's lock. To break the deadlock, either t_1 or t_2 must be aborted. In a timestamp based protocol, shown in Figure 1(b), t_1's operations overlap with t_2's operations. At the validation phase of t_1, it is found that a record read by t_1 has been modified by t_2, which started after t_1 but committed earlier. This causes t_1 to be aborted. In addition, in both lock based and timestamp based protocols, the operations of one transaction must run sequentially within a single thread. As such, the two transactions in Figures 1(a) and 1(b) can be concurrently executed by at most two threads. In DGCC, during the dependency graph construction phase, transactions are broken down into transaction pieces, which allows the system to parallelize the execution at the level of operations. More specifically, DGCC enables concurrent execution of transaction operations as long as they do not conflict. As shown in Figure 1(c), four threads are initiated for the execution of t_1 and t_2, as they can simultaneously operate on four different records. If there are operations with a dependency (e.g., a read and a write on the same record from the two transactions), DGCC will execute them in order. Finally, both transactions successfully commit. In this manner, DGCC reduces the abort rate while at the same time enabling higher concurrency and guaranteeing serializability.

DGCC consists of a graph construction phase and an execution phase, using a different work partitioning strategy for each phase. In particular, one worker thread is responsible for the construction of each dependency graph. In the graph construction phase, worker threads work in parallel to build different dependency graphs at the same time. If more than one transaction attempts to access the same data, the dependency graphs constructed by DGCC guarantee that, during the execution phase, those accesses are executed in a serialized manner. In general, however, this approach exposes parallelism whenever the opportunity presents itself.

DGCC is based on batch processing in a multicore in-memory system. As with any batch processing, latency is a valid concern. However, we shall argue that our approach is feasible in practice. First, in real applications, requests at the client side are usually sent to the server in batches so as to reduce the network overhead. More importantly, in-memory systems still need to write transaction logs to disk for reliability, and group commit protocols [7] are commonly used to reduce the disk I/O cost. In other words, current systems already both receive and commit transactions in a batch manner. Second, in the context of in-memory multicore systems, data access is extremely fast compared to that in traditional disk-based systems, thereby reducing latency. Third, the latency due to batch processing can be minimized by tuning the batch size. In summary, if the execution strategy is well designed, latency can be controlled to within acceptable bounds. The experiments conducted in our performance study confirm that fast batch processing is achievable.

We have implemented an in-memory OLTP system with DGCC concurrency control protocol that supports high concurrency, efficient recovery and good scalability. The system architecture is designed for the modern multicore environment. Our experiments show that it achieves significantly higher throughput, and scales well compared to other concurrency control protocols.

In summary, this paper makes the following contributions:

  • We propose DGCC, a new concurrency control protocol that separates contention resolution from execution using dependency graph and achieves higher parallelism.

  • A new in-memory multicore OLTP system supporting DGCC is prototyped. Besides DGCC, it supports an efficient recovery mechanism and a customized memory allocation scheme that avoids system memory allocation (malloc) calls at runtime.

  • An extensive performance study of DGCC against three state-of-the-art concurrency control protocols was conducted. The performance study using two benchmarks shows that DGCC achieves up to four times higher throughput than the other three concurrency control protocols.

The remainder of the paper is organized as follows. In Section 2, we introduce classical concurrency control protocols. We present DGCC in Section 3, and the architecture of our prototype system in Section 4. A comprehensive evaluation is presented in Section 5, and we review some related work in Section 6. Finally, the paper is concluded in Section 7.

2 Existing Concurrency Control Protocols

A transaction in a DBMS consists of a sequence of read and write operations. The DBMS must guarantee that (a) only serializable and recoverable schedules are allowed, (b) no operations of committed transactions are lost, and (c) the effects of partial transactions are not retained. In short, the DBMS is responsible for ensuring the ACID (Atomicity, Consistency, Isolation and Durability) [12] properties.

In the multicore era, concurrency control protocols should enable multi-user programs to be interleaved and executed concurrently, with the net effect being identical to executing them in some serial order. Essentially, concurrency control protocols ensure the atomicity and isolation properties. Many research efforts have been devoted to this area. We shall follow the canonical categorization in [31] and review them in two categories, namely lock based and timestamp based protocols.

2.1 Lock Based Protocols

The essential idea of lock based protocols is to use locks to control access to data. A transaction must acquire a lock on an object before it can operate on that object, which prevents unsafe interleavings of transactions. Under such protocols, a transaction accessing data locked by other transactions may be blocked until the requested locks are released. There are at least two types of locks: write locks and read locks. A write lock is an exclusive lock, while a read lock can be shared. The blocking rules are usually presented in a lock compatibility table [24].
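The compatibility check at the heart of such protocols can be sketched as follows. This is a minimal illustration with only shared read locks and exclusive write locks; the names and structure are ours, not from any particular lock manager:

```python
# COMPATIBLE[requested][held] is True when a lock of mode `requested`
# can be granted while a lock of mode `held` is outstanding.
COMPATIBLE = {
    "read":  {"read": True,  "write": False},
    "write": {"read": False, "write": False},
}

def can_grant(requested_mode, held_modes):
    """Grant a lock only if it is compatible with every lock
    currently held on the object by other transactions."""
    return all(COMPATIBLE[requested_mode][h] for h in held_modes)

print(can_grant("read", ["read"]))    # True: readers share
print(can_grant("write", ["read"]))   # False: writers are exclusive
```

A request that cannot be granted blocks the requesting transaction, which is exactly the source of the deadlocks discussed below.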

Systems with lock based protocols may use a global lock manager to grant and release locks. To improve scalability, decentralized lock managers have been proposed that co-locate the lock table with the raw data.

Two-phase locking (2PL) [5, 10] is a widely used locking protocol that splits a transaction's lock requests into two phases. In the growing phase, a transaction acquires locks without releasing any. During the shrinking phase, it can only release locks without acquiring any new ones. In a multi-programmed environment, lock based protocols have to deal with deadlocks, and transactions may be aborted when a deadlock cannot be prevented. Overall system performance is affected by transaction blocking, deadlock detection and resolution.

2.2 Timestamp Based Protocols

Timestamp based protocols [2, 4] assign each transaction a global timestamp before processing. Ordering the timestamps determines the execution order of the transactions. When multiple transactions attempt to access the same data, the transaction with the smaller timestamp should be executed first. As shown in Figure 1(b), if conflicts arise during execution, a transaction will be aborted and restarted.

Optimistic Concurrency Control (OCC) [17] and Multi-Version Concurrency Control (MVCC) [3] are two widely used timestamp based protocols. OCC assumes low data contention where conflicts are rare. Transactions can complete without blocking. However, before a transaction commits, a validation is performed to check if there is any conflict. If conflicts exist, the transaction will be aborted and restarted. MVCC maintains multiple versions of each data object and is more efficient for read operations. The read operations can access the data of an appropriate version without being blocked by other write operations. A periodic garbage collection is required to free inactive data.
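The validation step of OCC can be sketched as follows. This is a simplified backward validation, assuming each transaction tracks its read and write sets; the class and function names are ours:

```python
class Txn:
    def __init__(self, tid):
        self.tid = tid
        self.readset, self.writeset = set(), set()

def validate(txn, committed_since_start):
    """Backward validation: abort txn if any transaction that committed
    after txn started wrote a record that txn has read (a stale read)."""
    for other in committed_since_start:
        if txn.readset & other.writeset:
            return False   # conflict detected: txn must abort and restart
    return True

t1, t2 = Txn(1), Txn(2)
t1.readset = {"A"}
t2.writeset = {"A"}          # t2 committed after t1 started
print(validate(t1, [t2]))    # False: t1 is aborted
```

Under low contention the loop rarely finds an overlap and transactions commit without blocking; under high contention this check is exactly where the high abort rate comes from.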

Timestamp based protocols perform poorly on workloads with high contention due to their high abort rate. Aborts not only consume computing resources, but also require additional work to undo the aborted transactions. Moreover, these protocols usually require a centralized manager to assign unique timestamps to transactions, which limits system scalability.

3 Dependency Graph based Concurrency Control

In this section, we present the Dependency Graph based Concurrency Control (DGCC) protocol.

Typically, arriving transactions cannot be processed by the system immediately. They will first wait in a transaction queue. Unlike the worker thread in the lock and timestamp based concurrency control protocols which processes the transactions one by one, DGCC grabs a batch of transactions from the transaction queue to process. The batch size depends on the number of transactions in the transaction queue and the pre-defined maximal batch size. There are two separate phases: Dependency Graph Construction and Dependency Graph Execution. Multi-threading is used in both phases for maximal parallelism. More importantly, no locks are required in the whole process. Neither are there any aborts due to conflicts.

T             a set of transactions
G             dependency graph
s             a schedule of transaction execution
CG(s)         conflict graph of s
t_i           a transaction with timestamp i
P_i           the set of pieces of transaction t_i
p_i^j         the j-th piece of transaction t_i
readset(p)    the read record set of p
writeset(p)   the write record set of p
accessset(p)  readset(p) ∪ writeset(p)
r             one record stored in the database
lw(r)         latest write transaction piece on r
DS(r)         the dominating set of r
→_ts          timestamp ordering dependency
→_lg          logic dependency
Table 1: Notations

Table 1 summarizes the notations used in this section.

3.1 Chopping Transactions in DGCC

Conventional concurrency control protocols process a single transaction sequentially, with no concurrent processing within a transaction. DGCC chops a transaction into a set of smaller transaction pieces according to its type and internal logic. Transactions in OLTP applications are often repetitive, and stored procedures are widely used in current systems. A transaction piece consists of a set of stored procedures that operate on some records in the database. Each piece is represented as a vertex in our dependency graph; it is the unit in both dependency graph construction and dependency graph execution. Transaction pieces may be partially ordered. We define the partial order between two transaction pieces as a logic dependency in the following subsection.

3.2 Dependency Graph Construction

Figure 2: Dependency Graph Construction

During dependency graph construction, one batch of transactions is divided into several disjoint sets of transactions. A worker thread constructs a dependency graph G from a set of transactions T. Each transaction t_i ∈ T is associated with a timestamp i. Transactions in a given set are processed in timestamp order. Each transaction t_i is further divided into a set of transaction pieces P_i = {p_i^1, ..., p_i^k}. We define two types of dependency relations on the pieces: the logic dependency relation →_lg and the timestamp ordering dependency relation →_ts. We first define the logic dependency relation →_lg:

Definition 1 (Logic Dependency)

Transaction piece p_i^n logically depends on p_i^m, denoted as p_i^m →_lg p_i^n, if and only if both pieces belong to the same transaction t_i and p_i^n must be executed after p_i^m.

From the above definition, we can see that →_lg represents the logical execution order of the pieces within one transaction. Apart from the logic dependency relation, we also need to resolve the execution order of pieces from different transactions, which is defined by the timestamp ordering dependency relation →_ts. For a transaction piece p, writeset(p) and readset(p) are used to represent the sets of records written and read, respectively. The access set is accessset(p) = readset(p) ∪ writeset(p).

Definition 2 (Timestamp Ordering Dependency)

A timestamp ordering dependency p_i^m →_ts p_j^n exists if and only if i < j and (writeset(p_i^m) ∩ accessset(p_j^n) ≠ ∅ or accessset(p_i^m) ∩ writeset(p_j^n) ≠ ∅).
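Definition 2 can be transcribed almost directly into code. The sketch below assumes each piece carries its transaction's timestamp together with its read and write sets; the field names are ours:

```python
class Piece:
    def __init__(self, ts, reads=(), writes=()):
        self.ts = ts                       # timestamp of the transaction
        self.readset = set(reads)
        self.writeset = set(writes)

    @property
    def accessset(self):
        return self.readset | self.writeset

def ts_depends(p, q):
    """p ->ts q iff p's transaction is older (smaller timestamp) and the
    two pieces touch a common record with at least one write involved."""
    return p.ts < q.ts and (
        bool(p.writeset & q.accessset) or bool(p.accessset & q.writeset)
    )

p = Piece(ts=1, writes={"A"})
q = Piece(ts=2, reads={"A"})
print(ts_depends(p, q))  # True: q must be ordered after p
print(ts_depends(q, p))  # False
```

Note that two reads of the same record never produce a dependency: both intersections in the predicate involve a write set.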

Definition 3 (Dependency Graph)

Given a set of transactions T = {t_1, ..., t_n} and the associated sets of transaction pieces P_1, ..., P_n, the dependency graph G = (V, E) consists of

  • V = P_1 ∪ P_2 ∪ ... ∪ P_n, and

  • E = {(p, q) | p, q ∈ V, and p →_lg q or p →_ts q}.

It is not efficient to analyze a new piece against every piece already in G when we add it into G. Furthermore, explicitly recording all timestamp ordering dependency edges between all the transaction pieces would result in a large number of edges. So during dependency graph construction, we maintain the dominating set DS(r) for each record r that is accessed in T. We first define the latest write transaction piece lw(r) on r as:

Definition 4 (Latest Write Transaction Piece)

Among all the pieces already inserted into G, the latest write transaction piece lw(r) is the piece p with the largest transaction timestamp such that r ∈ writeset(p).

Then the dominating set is defined as follows:

Definition 5 (Dominating Set)

DS(r) = {lw(r)} ∪ {p | p reads r after lw(r)}.

The dominating set contains only lw(r) when there are no subsequent pieces accessing r; otherwise it also contains all the pieces that read r after lw(r). Hence, by maintaining the dominating set of each record r, we only need to analyse a new piece p against the transaction pieces in DS(r) to add edges when we insert p into G.

Now, we can summarize the dependency graph construction algorithm for a set of transactions as Algorithm 1. We use the example in Figure 2 to illustrate the dependency graph construction process. There are three transactions t_1, t_2 and t_3. Our example begins after t_1 and t_2 have already been inserted into the dependency graph. The red directed edges represent logic dependencies and the green directed edges represent timestamp ordering dependencies.

When t_3 is inserted into the dependency graph, it is divided into three pieces, p_3^1, p_3^2 and p_3^3. For p_3^1, we check the dominating set of record D and add green directed edges from the pieces in DS(D) to p_3^1. For p_3^2, we check the dominating set of the record it accesses and add a green directed edge from the latest write piece of that record to p_3^2. For p_3^3, the dominating set of the record it accesses is empty, and hence we just insert p_3^3 into G with no edges connected to it. Apart from adding edges into G, we update the dominating set of each record according to the accessset of each piece.

for t_i in T do
      split t_i into pieces P_i = {p_i^1, ..., p_i^k};
      for p in P_i do
            insert p into G;
            for r in accessset(p) do
                  if DS(r) = ∅ then
                        insert p into DS(r);
                  else if r ∈ writeset(p) then
                        for q in DS(r) do
                              add an edge from q to p representing →_ts;
                        end for
                        clear DS(r) and insert p into DS(r);
                  else
                        add an edge from lw(r) to p representing →_ts;
                        insert p into DS(r);
                  end if
            end for
      end for
      add edges between the pieces of t_i based on the →_lg dependency;
end for
Algorithm 1: Construct the dependency graph for one transaction set
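A runnable sketch of Algorithm 1 follows. It assumes transactions arrive in timestamp order, represents the graph only by its edge set, and treats each transaction's pieces as totally ordered (a simplification over a general →_lg partial order); the class and variable names are ours, not the paper's implementation:

```python
from collections import defaultdict

class Piece:
    def __init__(self, name, reads=(), writes=()):
        self.name = name
        self.readset, self.writeset = set(reads), set(writes)

def build_dependency_graph(transactions):
    """transactions: list (in timestamp order) of lists of Pieces.
    Returns the dependency edges as a set of (from_name, to_name) pairs."""
    edges = set()
    latest_write = {}               # record r -> lw(r)
    dominating = defaultdict(list)  # record r -> DS(r)
    for pieces in transactions:
        for p in pieces:
            for r in p.readset | p.writeset:
                ds = dominating[r]
                if not ds:
                    ds.append(p)                     # first access to r
                    if r in p.writeset:
                        latest_write[r] = p
                elif r in p.writeset:
                    for q in ds:                     # a write conflicts with
                        edges.add((q.name, p.name))  # every piece in DS(r)
                    dominating[r] = [p]
                    latest_write[r] = p
                else:
                    if r in latest_write:            # a read conflicts only
                        edges.add((latest_write[r].name, p.name))  # with lw(r)
                    ds.append(p)
        for a, b in zip(pieces, pieces[1:]):   # ->lg edges: pieces of one
            edges.add((a.name, b.name))        # transaction run in order
    return edges

t1 = [Piece("p1", writes={"A"})]
t2 = [Piece("p2", reads={"A"}), Piece("p3", writes={"B"})]
print(sorted(build_dependency_graph([t1, t2])))
# [('p1', 'p2'), ('p2', 'p3')]
```

In the example, p2's read of A depends on p1's earlier write (→_ts), and p3 follows p2 within t2 (→_lg), matching the two edges returned.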
Figure 3: Dependency Graph Execution

3.3 Dependency Graph Execution

DGCC executes dependency graphs sequentially in a greedy manner. For a dependency graph G, we iteratively select the vertices with zero in-degree to execute, and then remove these vertices as well as their out-going edges from the graph. This process repeats until there are no vertices left in G. We outline the dependency graph execution in Algorithm 2. As Figure 3 shows, in the first round we choose the vertices with zero in-degree to execute and remove their out-going edges; we then iteratively select the newly exposed zero in-degree vertices to execute.

while G has vertices left do
      select the vertices with zero in-degree as set S;
      for v in S do
            add v's corresponding piece into the thread pool;
      end for
      wait until the thread pool has no more pieces to execute;
      remove the vertices in S and their out-going edges from G;
end while
Algorithm 2: Execute one dependency graph
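Algorithm 2 can be sketched as a round-based extraction of zero in-degree vertices. For simplicity, the sketch below returns the rounds instead of dispatching them to a thread pool; within one round the vertices have no dependencies among them and could run concurrently:

```python
def execute_rounds(vertices, edges):
    """vertices: iterable of vertex names; edges: set of (u, v) pairs.
    Returns a list of rounds, each a sorted list of vertices that can
    safely execute in parallel."""
    indeg = {v: 0 for v in vertices}
    out = {v: [] for v in vertices}
    for u, v in edges:
        indeg[v] += 1
        out[u].append(v)
    rounds = []
    ready = sorted(v for v, d in indeg.items() if d == 0)
    while ready:
        rounds.append(ready)
        nxt = []
        for u in ready:              # "remove" executed vertices and
            for v in out[u]:         # their out-going edges
                indeg[v] -= 1
                if indeg[v] == 0:
                    nxt.append(v)
        ready = sorted(nxt)
    return rounds

print(execute_rounds(["p1", "p2", "p3"], {("p1", "p2"), ("p2", "p3")}))
# [['p1'], ['p2'], ['p3']]
```

Because the construction phase guarantees the graph is acyclic, this loop always terminates with every vertex executed.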

3.4 Correctness

We shall now prove that DGCC guarantees strict serializability.

Conflict Serializability

In the previous section, the dependency graph G works as a schedule s of T. We can prove that the schedule s is conflict-serializable based on the Conflict Serializability Theorem [30]. In other words, we need to show that its conflict graph CG(s) is acyclic.

Definition 6 (Conflict Graph)

Let s be a schedule over a set of transactions T. The conflict graph CG(s) = (V, E) of s has one vertex for each transaction in T, and a directed edge from t_i to t_j (i ≠ j) whenever some operation of t_i conflicts with, and precedes, some operation of t_j in s.

As we only have the two dependency relations →_lg and →_ts, every conflict edge in the conflict graph must arise from either a →_lg or a →_ts dependency.

Firstly, let us consider →_ts in CG(s). Based on its definition, if there is a →_ts edge from a piece of t_i to a piece of t_j, then the timestamp of the second piece's transaction must be greater than that of the first, i.e., i < j. Now if CG(s) were cyclic, we could always find a cycle t_a → t_b → ... → t_z → t_a whose edges imply a < b < ... < z and z < a. Obviously, this is a contradiction. In other words, if we only consider the →_ts dependency, CG(s) must be acyclic.

Next, we consider the →_lg dependency. Based on its definition, →_lg will not lead to an edge in CG(s), because →_lg only exists between two pieces within the same transaction. So CG(s) is still acyclic.

Thus, having considered the only two possible forms of dependencies, we conclude that CG(s) must be acyclic and s is a conflict-serializable schedule.

Strictness

During dependency graph construction, we have resolved all the conflicts between transactions. Therefore, in executing a dependency graph, there cannot be any transaction abort caused by conflicts. Transactions can only be aborted due to updates violating the database's schema constraints. For these, we add condition-variable-check transaction pieces. As an optimization, if there is more than one condition-variable-check transaction piece in a transaction, we combine them together. →_lg dependency relations are inserted between the condition-variable-check piece and the other pieces of the transaction. If the condition-variable-check piece aborts, no other piece in the same transaction that has a →_lg dependency on it will execute. As a consequence, no cascading aborts are possible during the execution of a dependency graph.

3.5 Differences With Transaction Chopping

Transaction chopping [27] is a method that divides transactions into pieces to execute, with the aim of achieving better parallelism. It guarantees the serializability of transaction execution by performing static analysis on the relations between transaction pieces, known as SC-graph analysis. However, a simple static chopping of transactions usually leads to multiple SC-cycles that have to be merged; hence the resulting transaction pieces are still relatively large. DGCC analyzes the relationships between transaction pieces at runtime, yielding smaller transaction pieces. This finer granularity, in general, yields more parallelism than transaction chopping. Furthermore, during the execution of the transaction pieces, transaction chopping still requires traditional concurrency control to resolve conflicts, which can lead to aborts and restarts of transaction pieces. In DGCC's dependency graph execution, no transaction piece will abort due to conflicts.

Two transactions are shown in Figure 4, where transaction t_1 reads two records that transaction t_2 writes. Figure 4(a) shows how transaction chopping works with an SC-graph: the SC-cycles in the SC-graph must be merged, so that in the end there is only one piece for t_1 and one for t_2. On the contrary, as illustrated in Figure 4(b), DGCC can chop both t_1 and t_2 into two pieces each, which means fine-grained chopping is acceptable in DGCC.

(a) Transaction Chopping by SC-graph
(b) Transaction Chopping in DGCC
Figure 4: Transaction Chopping

4 System Architecture

This section presents the architecture of the transaction processing system we have designed to support DGCC. The system architecture consists of four major components (shown in Figure 5), namely the Execution Engine, Recovery Manager, Storage Manager and Statistics Manager.

Figure 5: System Architecture

4.1 Execution Engine

Initiator

The execution engine is mainly responsible for managing transaction requests. It maintains a set of request queues, and each queue is handled by a dependency graph constructor. In some applications, transaction requests may have different priorities. The initiator adjusts the priority of each queue according to the requirements; e.g., requests of higher priority are inserted into a queue with higher priority. At execution time, requests in a queue with a higher priority are processed first. By default, a transaction's priority is set according to its timestamp, i.e., a transaction with a smaller timestamp has a higher priority.

Dependency Graph Constructor

The constructor takes a batch of transactions from a queue and resolves their contentions by building a dependency graph. The batch size is the smaller of the number of transactions in the transaction queue and a pre-defined maximum batch size. When the system is saturated, the batch size equals the maximum batch size. However, we cannot assume that the system is always saturated. After finishing one round of batch processing, the constructor checks the transaction queue; if the number of waiting transactions is less than the pre-defined maximum batch size, all the available transactions are processed in this batch. The batch size in our system can thus be adjusted dynamically to suit workloads with different request rates. This strategy ensures that the system will not wait indefinitely for a sufficient number of transactions to arrive before processing them.
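The batch sizing rule above amounts to a simple computation; MAX_BATCH_SIZE below is an assumed tuning knob, not a value from the paper:

```python
MAX_BATCH_SIZE = 1000   # assumed tuning knob

def next_batch(queue):
    """Take everything currently waiting, capped by the maximum batch
    size; never block waiting for more transactions to arrive."""
    size = min(len(queue), MAX_BATCH_SIZE)
    return queue[:size], queue[size:]

batch, rest = next_batch(list(range(5)))  # lightly loaded system
print(len(batch), len(rest))              # 5 0
```

Under saturation the first element always has MAX_BATCH_SIZE transactions; under light load the constructor simply takes whatever is queued.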

For each transaction in the batch, the constructor first generates vertices according to the transaction's type and its parameters. To avoid any contention, the dependency graph constructor uses a single thread to build each dependency graph. To better exploit the parallelism of the CPU, several graphs can be constructed in parallel by different threads. Each graph construction is completely independent, thereby eliminating any need for synchronization between the different threads. It is possible that conflicts still exist between different dependency graphs. We resolve this kind of conflict by processing the constructed dependency graphs one at a time in the Graph Executor.

Graph Executor

After graph construction, the graph executor executes the graphs according to their priorities. From the dependency graph, the executor iteratively extracts an executable vertex set consisting of the vertices with no incoming edges. The update of these vertices does not depend on any other vertices; it follows that no two vertices in the executable vertex set have any contention. It is therefore safe to allow multiple worker threads to execute the vertices in the executable vertex set, and they can do so without requiring any coordination. When all the vertices of one graph have been processed, its transactions commit and responses are sent to their clients.

In our prototype, we use a fixed number of threads that compete to work on either graph construction or execution. During dependency graph execution, if the executable vertex set at an iteration is relatively small, the overhead of context switching and competition among the worker threads outweighs the small amount of work, making multithreading unprofitable. As an optimization, if the size of the executable vertex set is small, we assign all the work to one worker thread instead of letting all the worker threads compete.

4.2 Recovery Manager

By maintaining all data in main memory, in-memory systems significantly reduce disk I/Os and, consequently, achieve better throughput with lower latencies. However, for reliability, most in-memory systems flush transaction logs to disk and perform checkpointing periodically.

Figure 6: Effect of Write Operations (skew factor 0.8): (a) 2PL, (b) OCC, (c) MVCC, (d) DGCC

Transaction Logs

Before a transaction commits, the system will first generate its log records and flush them to the log files on disk. The recovery component has logger threads which are responsible for flushing the logs to disks. Traditionally, there are two kinds of logging strategies, ARIES logging [22] and Command logging [21].

In our system, transactions of one graph commit at the same time. Instead of generating log records for a single transaction, our system constructs log records for all the transactions in a batch simultaneously. Writing all these log records at the same time fully utilizes the disk I/O bandwidth, thereby improving the system’s overall performance.

Each vertex in the dependency graph has one log record consisting of the vertex’s function ID, parameters, and dependency information. This information is sufficient for the reconstruction of the dependency graph during recovery. Our logging scheme combines the advantages of both ARIES and command logging. No real data needs to be recorded in the log files, hence reducing the size of the logs. During recovery, we only need to replay the log records to reconstruct the dependency graphs and then execute the reconstructed graph.
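The content of such a log record can be sketched as follows; the field names are illustrative, and the key point is that only the function ID, parameters and dependency information are logged, never the updated data itself:

```python
import json

def make_log_record(vertex_id, function_id, params, depends_on):
    """Log a vertex as (function ID, parameters, incoming dependencies);
    no updated data is written, which keeps the log small."""
    return json.dumps({
        "vertex": vertex_id,
        "func": function_id,    # which stored-procedure piece to re-run
        "params": params,       # its arguments
        "deps": depends_on,     # incoming edges, to rebuild the graph
    })

rec = make_log_record(7, "new_order_piece_2", {"wid": 1}, [3, 5])
print(json.loads(rec)["deps"])  # [3, 5]
```

On recovery, replaying such records is enough to rebuild the dependency graph and deterministically re-execute it, combining the compactness of command logging with the per-operation granularity of ARIES.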

Checkpointing

In order to recover our database within a bounded time, our system performs checkpointing periodically. Our recovery component maintains several checkpointing threads. The entire memory is divided into sections, and each checkpointing thread is responsible for one such section.

Even as the checkpointing threads are working, transactions continue to execute, but those commits are not reflected in the checkpoint. This means our checkpoint is not a consistent snapshot of the database, and it must be combined with the transaction logs during recovery.

To recover from a failure, our system first reloads the latest checkpoint and replays the transaction log records from that time point. It then reprocesses the committed transactions.

4.3 Storage Manager

The system’s storage manager is designed to maintain the whole data in the database. It interacts with the execution engine to retrieve/insert/update/delete the data. Both the B-tree index and hash index are supported.

DGCC guarantees serializability and zero conflicts for read and write operations. However, insert and delete operations also require the index to be correctly maintained. Algorithms [26, 20] that have been proposed to exploit more concurrency in indexing are orthogonal to our concurrency control protocol; any of them can be used together with DGCC to enhance the system's overall performance.

Our system manages all of its allocated memory space on its own to avoid frequent invocations of system calls (such as malloc). To eliminate bottlenecks in the storage manager, the system divides up the pre-allocated memory space and assigns a worker thread to each section to insert/delete its data. OLTP applications usually issue many insert/delete operations, so memory usage efficiency must be taken into consideration. A garbage collection thread in the storage manager is invoked periodically to collect inactive objects and compact the memory space.

4.4 Statistics Manager

As shown in Figure 5, our system has a statistics manager that collects runtime statistics (such as real-time throughput, latency, etc.). It also interacts with the other components to adjust the system configuration dynamically. For example, since our system processes transactions in batches, the size of the dependency graph affects both throughput and latency: a larger batch size supports higher throughput, while a smaller batch size provides faster response times. The maximum batch size can be adjusted accordingly based on the statistics and the requirements. Furthermore, using the statistics collected from the storage manager, the system decides when to invoke the garbage collection thread.

5 Experiments

In this section, we evaluate the effectiveness of DGCC by comparing it with the following concurrency control protocols, as implemented in a multicore DBMS [31].

  • 2PL - Two-Phase Locking with deadlock detection

  • OCC - Optimistic Concurrency Control

  • MVCC - Multi-Version Concurrency Control

  • DGCC - Dependency Graph based Concurrency Control

In our evaluation, general optimizations for 2PL, OCC and MVCC are enabled to ensure a fair comparison. Each is equipped with a customized memory allocation component to avoid frequent malloc calls, and all of them use decentralized record-level lock tables instead of a centralized lock table.

All experimental evaluations are conducted on a server with an Intel Xeon 2.2 GHz CPU with 24 cores (48 hyper-threads) and 64GB RAM, organized as 4 NUMA nodes. To eliminate NUMA effects, we run most experiments within a single NUMA node with 6 cores. Each core has a private 32KB L1 cache and 256KB L2 cache and supports two hyper-threads; cores in the same NUMA node share a 12MB L3 cache.

We use two popular OLTP benchmarks, namely YCSB [6] and TPC-C [1]. YCSB is used to evaluate the performance of these concurrency control protocols under different contention rates caused by data access skewness. TPC-C is used to simulate a complete order-entry environment whose transaction scenario is much more complex than that of YCSB. The contention rate in TPC-C is controlled by the number of warehouses.

The main purpose of concurrency control protocols is to resolve contention in a multi-programmed environment. Three factors typically dominate the intensity of contention. The first is the ratio of write operations in the workload. The second is data access skewness; in particular, frequently accessed data encounters contention more easily. The third is the number of concurrent worker threads: the more transactions run in parallel, the higher the probability of contention.
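The interplay of these factors can be made concrete with a rough back-of-the-envelope model. This is our own illustrative simplification (the function `conflict_prob` and its closed form are not from the paper): two concurrent operations conflict when they touch the same record and at least one is a write.

```python
def zipf_probs(n, theta):
    """YCSB-style access probabilities: p_i proportional to 1/i^theta."""
    w = [1.0 / (i ** theta) for i in range(1, n + 1)]
    z = sum(w)
    return [x / z for x in w]

def conflict_prob(n, theta, write_ratio):
    """Chance that two concurrent operations touch the same record
    (sum of p_i^2) and at least one of them is a write."""
    same_record = sum(p * p for p in zipf_probs(n, theta))
    not_both_reads = 1.0 - (1.0 - write_ratio) ** 2
    return same_record * not_both_reads

# Skew (larger theta) and a higher write ratio both raise the estimate.
print(conflict_prob(10_000, 0.8, 0.5) > conflict_prob(10_000, 0.0, 0.5))  # True
```

More worker threads multiply the number of concurrent operation pairs, so all three factors compound.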

In the following experiments, we evaluate the performance of DGCC with respect to all these factors using the two benchmarks. The parameters used in the experiments are listed in Table 2.

Parameter                    Range
YCSB Zipfian parameter θ     0.0, 0.5, 0.6, 0.7, 0.8
YCSB read/write ratio        4, 1, 0.25
worker thread number         1, 2, 3, 4, 5, 6, 7, 8
maximal batch size           100, 300, 500, 800, 1000, 5000, 10000, 20000
Table 2: Parameter Ranges for Evaluations
(a) YCSB Low Contention, θ=0.5 (b) YCSB Medium Contention, θ=0.7 (c) YCSB High Contention, θ=0.8
Figure 7: Throughput for the YCSB workload, read/write ratio = 1
(a) TPC-C (b) New-Order (c) Payment
Figure 8: Throughput for the TPC-C workload

5.1 Read vs Write Intensive Workloads

Since read-only transaction pieces do not generate any contention, we use the YCSB benchmark, which has both read and write pieces.

Figure 6 shows the performance of the different concurrency control protocols on three workloads with different read/write ratios. All protocols perform better on workloads with more read pieces. As the write ratio increases, the performance of 2PL, OCC and MVCC drops dramatically: more write pieces translate to higher probabilities of contention, so these protocols spend much more time resolving contention. DGCC is significantly more resilient to this increase, since reads and writes are treated almost identically in the dependency graph construction phase. The performance reduction in DGCC is due to the fact that write pieces usually take more time than read pieces.

5.2 Scalability

In Figure 7, we test the performance of the four concurrency control protocols under different contention rates. The contention rate is controlled by the parameter θ of YCSB’s Zipfian distribution. The read/write ratio in these experiments is fixed to 1.

In summary, DGCC shows the best performance under all contention rates. The benefits come mainly from the separation of contention resolution and execution: by resolving contention in advance, no worker thread is blocked during execution, and the acyclicity of the dependency graph avoids aborts caused by contention.
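The idea of resolving contention before execution can be sketched as follows. This is a minimal Python illustration under our own simplifying assumptions (a transaction is just an id plus read/write key sets, and a layer-by-layer schedule stands in for DGCC's execution phase); it is not the paper's actual implementation.

```python
from collections import defaultdict

def build_graph(batch):
    """batch: list of (txn_id, reads, writes) in arrival order.
    Add an edge i -> j (i serialized before j) when the pair conflicts,
    i.e. they touch the same key and at least one access is a write."""
    edges = defaultdict(set)
    indeg = {t[0]: 0 for t in batch}
    for i in range(len(batch)):
        ti, ri, wi = batch[i]
        for j in range(i + 1, len(batch)):
            tj, rj, wj = batch[j]
            if (wi & wj) | (wi & rj) | (ri & wj):  # W-W, W-R, or R-W
                edges[ti].add(tj)
                indeg[tj] += 1
    return edges, indeg

def execute_in_layers(batch):
    """Kahn-style layered execution: each iteration runs every transaction
    whose predecessors have finished; transactions within one layer are
    contention-free and can run on different worker threads."""
    edges, indeg = build_graph(batch)
    layer = [t for t in indeg if indeg[t] == 0]
    order = []
    while layer:
        order.append(sorted(layer))
        nxt = []
        for t in layer:
            for s in edges[t]:
                indeg[s] -= 1
                if indeg[s] == 0:
                    nxt.append(s)
        layer = nxt
    return order

batch = [(1, {"a"}, {"x"}),   # T1 writes x
         (2, {"x"}, {"y"}),   # T2 reads x -> must follow T1
         (3, {"b"}, {"z"})]   # T3 shares nothing
print(execute_in_layers(batch))  # layers: [[1, 3], [2]]
```

Because edges always point from earlier to later arrivals, the graph is acyclic by construction, and running the layers in order is equivalent to a serial execution in arrival order.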

It is notable that in Figure 7(a), 2PL achieves performance comparable to DGCC. In this experiment, θ=0.5 in the Zipfian distribution results in the lowest contention rate. Under this scenario, 2PL incurs little overhead because lock acquisition rarely blocks. Further, deadlocks rarely occur because the probability of more than one transaction competing for the same data is low.

The reasons for the drop in DGCC’s performance when the thread count increases (7 and 8 in our experiment) are two-fold. First, our experiments run on 6 cores; when more than 6 threads run concurrently, the overhead of context switching becomes significant. Second, an increase in thread count inevitably results in more contention, and hence higher contention-resolution overhead for all four protocols.

OCC and MVCC are timestamp-based protocols. When the data access distribution is not very skewed, they scale well with the number of worker threads. However, compared with 2PL and DGCC, timestamp-based protocols must spend time assigning a timestamp to each transaction. To guarantee the correct serial order, such systems usually rely on a centralized component to perform the assignment, which easily becomes a performance bottleneck. Moreover, at commit time, OCC and MVCC must validate that the execution is serializable according to the assigned timestamps; whether or not a transaction aborts, transactions accessing the same data have to be validated one by one.
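The centralized assignment described above can be modeled as a single shared counter. The sketch below is a toy illustration of why it becomes a serialization point (the class `TimestampOracle` is our own naming, not any system's API):

```python
import itertools
import threading

class TimestampOracle:
    """Toy model of centralized timestamp allocation: every transaction
    must pass through this single counter, so the lock around it is a
    serialization point that worsens as worker threads increase."""
    def __init__(self):
        self._lock = threading.Lock()
        self._next = itertools.count(1)

    def allocate(self):
        with self._lock:            # all worker threads contend here
            return next(self._next)

oracle = TimestampOracle()
ts = [oracle.allocate() for _ in range(3)]   # strictly increasing: 1, 2, 3
```

Batching allocations (as Silo does, discussed in Section 6) amortizes this cost but does not remove the shared counter entirely.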

When contention is high, the main cost of OCC and MVCC comes from processing aborts. Unlike aborts in lock-based protocols, aborts at the commit phase not only waste the usual processing time but also require extra effort to eliminate the effects of the aborted transactions from the database.

Figure 8 shows the evaluation of the four protocols using the TPC-C benchmark. The contention rate in TPC-C is usually controlled by the number of warehouses; in this experiment, we set the number of warehouses to 1 so as to create sufficient contention. There are five types of transactions in TPC-C: New-Order, Payment, Delivery, Order-Status and Stock-Level. New-Order and Payment are the most frequent, accounting for almost 90% of the whole benchmark. Therefore, in addition to the entire benchmark, we also compare performance using each of these two transactions separately.

Figure 8(b) shows the results when only New-Order is considered. Each New-Order transaction, on average, comprises ten different items, whose information has to be read and whose related stock information needs to be updated. Which item is accessed is entirely random, which leads to a relatively low level of contention. The results in Figure 8(b) are as expected: although DGCC still achieves the best performance, 2PL comes in a close second.

Figure 8(c) shows the situation when only Payment transactions are involved. Each Payment transaction records a payment from a customer and needs to update the warehouse. These transaction pieces have to be executed serially, severely restricting the inherent parallelism. Furthermore, the longer serial execution logic requires more iterations in DGCC’s execution phase, which translates to higher overhead in areas such as work dispatch and worker thread scheduling, and thus limits DGCC’s scalability.

Figure 8(a) shows the results on the complete TPC-C workload. The other transactions amortize the effect of the Payment transactions, so DGCC has a more balanced workload at each iteration, making it more scalable. However, the high contention caused by Payment transactions remains the bottleneck for the other protocols.

(a) YCSB Low Contention, θ=0.5 (b) YCSB Medium Contention, θ=0.7 (c) YCSB High Contention, θ=0.8
Figure 9: Average Latency for the YCSB workload, read/write ratio = 1
(a) TPC-C (b) New-Order (c) Payment
Figure 10: Average Latency for the TPC-C workload

5.3 Data Access Distribution

In reality, OLTP applications tend to access certain data more frequently. For example, in an online shopping scenario, popular items are accessed more often than others. The distribution of data accesses has a significant impact on the level of contention. YCSB assumes that data accesses follow a Zipfian distribution, whose parameter θ controls the skewness. For a given number of worker threads, a larger θ translates to higher contention.

Figure 11: Effects of Access Distribution on YCSB

Figure 11 shows the impact of θ on the performance of the four protocols. When θ is small, data accesses are more likely to be uniformly distributed, which is the ideal case for all the protocols.

As θ increases, the data access distribution becomes more skewed, resulting in higher contention and lower performance. Yet, compared to 2PL, OCC and MVCC, DGCC is significantly less sensitive to the increased contention. Higher contention may increase the depth of the dependency graph, so more iterations are required at the execution phase, and the concurrently executable work in each iteration tends to decrease. However, the overhead incurred by DGCC remains lower than that of the other protocols, making DGCC more robust to data access skewness.

5.4 Latency

In this section, we evaluate the latency of the four protocols using the two OLTP workloads. The system maintains a transaction queue to buffer arriving transactions. The size of this queue affects the average latency of the system, and it also bounds the number of transactions in a dependency graph. In the experiments, the default size of the transaction queue for each worker thread is set to a fixed value.

Figures 9 and 10 show the latency of the four protocols under different workloads. The average latency of OCC and MVCC increases with contention, since they require more time for validation at the commit phase. The latency of these timestamp-based protocols is also affected by the centralized timestamp assignment. When there is more contention, 2PL spends much time waiting for locks, leading to increased latency.

In both Figures 9 and 10, the latency of DGCC is comparable with that of the others. Although DGCC uses a batch-processing front end, a transaction’s waiting time in the transaction queue is much shorter than under the other protocols, so DGCC’s latency is actually smaller. When transaction logging is taken into consideration, DGCC commits a group of transactions at the same time, and its log is much smaller than a traditional ARIES log. As a result, it issues fewer syscalls to flush logs to disk and makes better use of the I/O bandwidth. Overall, this results in lower latency than the others, confirming the efficiency of DGCC.
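The group-commit effect described above can be sketched as follows. This is our own minimal illustration of the general technique (the function `group_commit` and its log format are assumptions, not the paper's logging code):

```python
import os
import tempfile

def group_commit(log_records, fd):
    """Hypothetical group commit: buffer one batch's log records and flush
    them with a single write + fsync pair, instead of one flush per
    transaction as in per-transaction ARIES-style logging."""
    payload = b"".join(rec.encode() + b"\n" for rec in log_records)
    os.write(fd, payload)  # one write covers the whole batch
    os.fsync(fd)           # one disk flush covers the whole batch
    return len(payload)    # bytes persisted by the single syscall pair

fd, path = tempfile.mkstemp()
written = group_commit(["T1 commit", "T2 commit", "T3 commit"], fd)
os.close(fd)
with open(path, "rb") as f:
    logged = f.read()
os.remove(path)
```

With N transactions per batch, the number of fsync calls drops by a factor of N, which is where the I/O-bandwidth advantage comes from.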

5.5 Effects of Batch Size

DGCC first constructs a dependency graph for a batch of transactions. The batch size is bounded by the number of transactions in the transaction queue and by the pre-defined maximal batch size. In practice, the batch size changes dynamically; in particular, when more transactions are waiting in the transaction queue, a larger batch size is used.

Figure 12(a) shows the effect of the batch size on the TPC-C workload. When the number of worker threads is fixed, the throughput of the system increases with the batch size until the computation resources are fully utilized; beyond that point, a larger dependency graph only leads to higher latency.

With more worker threads, a larger batch size is needed to fully exploit their computation potential.

(a) Throughput (b) Latency
Figure 12: Effects of Dependency Graph Size on Throughput and Latency

6 Related Work

Systems with lock-based protocols typically require a lock manager, which maintains lock tables to grant and release locks. The data structures in the lock manager are usually large and complicated, incurring both storage and processing overheads.

Lightweight Intent Lock (LIL) [16] was proposed to maintain a set of lightweight counters in a global lock table instead of lock queues for intent locks, which simplifies the data structure of intent locks. However, a transaction that cannot obtain all of its locks has to block until it receives a release message from another transaction.

To reduce the cost of a global lock manager, [11, 18] propose keeping lock states with each data record. However, this requires each record to maintain a lock queue, increasing the burden of record management. By compressing all the lock states of one record into a pair of integers, [25] simplifies the data structure to some extent; however, it achieves this by dividing the database into disjoint partitions, which sacrifices performance and scalability for workloads with high contention.

Several in-memory database prototypes that emphasize scalability on multicore systems have been proposed recently. [31] implemented an in-memory database prototype and evaluated the scalability of seven concurrency control methods. While the reasons differ, the overall result is that none of the methods scales beyond 1024 cores. For lock-based methods, lock thrashing and deadlock avoidance are the main bottlenecks; for timestamp-based methods, the main issues are the high abort ratio and the need for centralized timestamp allocation.

[14, 23, 15] assume that the data in an in-memory database is partitioned, so as to remove the need for concurrency control. [14] proposes H-Store, a partitioned database architecture for in-memory databases in which a single thread in each partition processes transactions, so no concurrency control is needed within a partition. DORA [23] is similar to H-Store in that it uses a data partitioning strategy and sends queries to different partitions’ workers for processing; unlike H-Store, however, it supports concurrent execution of queries within a partition to a certain extent. Neither system scales well for skewed workloads or multi-partition transactions.

Hekaton [8], the main-memory engine of SQL Server, employs lock-free data structures and an OCC-based MVCC protocol to avoid applying writes until commit time. However, centralized timestamp allocation remains the bottleneck, and read operations become more expensive, since each read needs to update other transactions’ dependency sets.

[28] presents Silo, an in-memory OLTP database prototype optimized for multicore systems. Silo supports a variant of OCC that employs batched timestamp allocation to alleviate the performance loss. However, workloads with high contention still hurt its performance and scalability.

Transactional memory [13, 9] has been shown to provide scalability with less programming complexity and has hence attracted much attention. [19, 29] exploit hardware transactional memory by chopping transactions into small pieces so that they fit into hardware transactions; they also adopt timestamp-based protocols to ensure serializability.

7 Conclusion

In this paper, we proposed DGCC, a new dependency graph based concurrency control protocol. DGCC separates concurrency control from execution by building dependency graphs for batches of transactions, resolving contention before execution. We showed that DGCC better exploits modern multicore hardware through higher parallelism. DGCC also removes the need for centralized control components, thereby providing better scalability. We built a prototype DGCC-based OLTP system that seamlessly integrates an efficient recovery mechanism. Our extensive experimental study on YCSB and TPC-C shows that DGCC achieves up to four times the throughput of classical concurrency control protocols for workloads with high contention.

References

  1. TPC-C. http://www.tpc.org/tpcc/.
  2. P. A. Bernstein and N. Goodman. Concurrency control in distributed database systems. ACM Computing Surveys (CSUR), 13(2):185–221, 1981.
  3. P. A. Bernstein and N. Goodman. Multiversion concurrency control—theory and algorithms. ACM Transactions on Database Systems (TODS), 8(4):465–483, 1983.
  4. P. A. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency control and recovery in database systems, volume 370. Addison-wesley New York, 1987.
  5. P. A. Bernstein, D. W. Shipman, and W. S. Wong. Formal aspects of serializability in database concurrency control. Software Engineering, IEEE Transactions on, (3):203–216, 1979.
  6. B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking cloud serving systems with ycsb. In Proceedings of the 1st ACM symposium on Cloud computing, pages 143–154. ACM, 2010.
  7. D. J. DeWitt, R. H. Katz, F. Olken, L. D. Shapiro, M. R. Stonebraker, and D. A. Wood. Implementation techniques for main memory database systems, volume 14. ACM, 1984.
  8. C. Diaconu, C. Freedman, E. Ismert, P.-A. Larson, P. Mittal, R. Stonecipher, N. Verma, and M. Zwilling. Hekaton: Sql server’s memory-optimized oltp engine. In Proceedings of the 2013 international conference on Management of data, pages 1243–1254. ACM, 2013.
  9. D. Dice, O. Shalev, and N. Shavit. Transactional locking ii. In Distributed Computing, pages 194–208. Springer, 2006.
  10. K. P. Eswaran, J. N. Gray, R. A. Lorie, and I. L. Traiger. The notions of consistency and predicate locks in a database system. Communications of the ACM, 19(11):624–633, 1976.
  11. V. Gottemukkala and T. J. Lehman. Locking and latching in a memory-resident database system. In VLDB, pages 533–544, 1992.
  12. T. Haerder and A. Reuter. Principles of transaction-oriented database recovery. ACM Computing Surveys (CSUR), 15(4):287–317, 1983.
  13. M. Herlihy and J. E. B. Moss. Transactional memory: Architectural support for lock-free data structures, volume 21. ACM, 1993.
  14. R. Kallman, H. Kimura, J. Natkins, A. Pavlo, A. Rasin, S. Zdonik, E. P. C. Jones, S. Madden, M. Stonebraker, Y. Zhang, J. Hugg, and D. J. Abadi. H-Store: a high-performance, distributed main memory transaction processing system. Proc. VLDB Endow., 1(2):1496–1499, 2008.
  15. A. Kemper and T. Neumann. Hyper: A hybrid oltp&olap main memory database system based on virtual memory snapshots. In Data Engineering (ICDE), 2011 IEEE 27th International Conference on, pages 195–206. IEEE, 2011.
  16. H. Kimura, G. Graefe, and H. A. Kuno. Efficient locking techniques for databases on modern hardware. In ADMS@VLDB, pages 1–12, 2012.
  17. H.-T. Kung and J. T. Robinson. On optimistic methods for concurrency control. ACM Transactions on Database Systems (TODS), 6(2):213–226, 1981.
  18. T. J. Lehman and V. Gottemukkala. The design and performance evaluation of a lock manager for a memory-resident database system., 1996.
  19. V. Leis, A. Kemper, and T. Neumann. Exploiting hardware transactional memory in main-memory databases. In Data Engineering (ICDE), 2014 IEEE 30th International Conference on, pages 580–591. IEEE, 2014.
  20. J. J. Levandoski, D. B. Lomet, and S. Sengupta. The bw-tree: A b-tree for new hardware platforms. In Data Engineering (ICDE), 2013 IEEE 29th International Conference on, pages 302–313. IEEE, 2013.
  21. N. Malviya, A. Weisberg, S. Madden, and M. Stonebraker. Rethinking main memory oltp recovery. In Data Engineering (ICDE), 2014 IEEE 30th International Conference on, pages 604–615. IEEE, 2014.
  22. C. Mohan, D. Haderle, B. Lindsay, H. Pirahesh, and P. Schwarz. Aries: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. ACM Transactions on Database Systems (TODS), 17(1):94–162, 1992.
  23. I. Pandis, R. Johnson, N. Hardavellas, and A. Ailamaki. Data-oriented transaction execution. Proceedings of the VLDB Endowment, 3(1-2):928–939, 2010.
  24. R. Ramakrishnan, J. Gehrke, and J. Gehrke. Database management systems, volume 3. McGraw-Hill New York, 2003.
  25. K. Ren, A. Thomson, and D. J. Abadi. Lightweight locking for main memory database systems. Proceedings of the VLDB Endowment, 6(2):145–156, 2012.
  26. J. Sewall, J. Chhugani, C. Kim, N. Satish, and P. Dubey. Palm: Parallel architecture-friendly latch-free modifications to b+ trees on many-core processors. Proc. VLDB Endowment, 4(11):795–806, 2011.
  27. D. Shasha, F. Llirbat, E. Simon, and P. Valduriez. Transaction chopping: Algorithms and performance studies. ACM Transactions on Database Systems (TODS), 20(3):325–363, 1995.
  28. S. Tu, W. Zheng, E. Kohler, B. Liskov, and S. Madden. Speedy transactions in multicore in-memory databases. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, pages 18–32. ACM, 2013.
  29. Z. Wang, H. Qian, J. Li, and H. Chen. Using restricted transactional memory to build a scalable in-memory database. In Proceedings of the Ninth European Conference on Computer Systems, page 26. ACM, 2014.
  30. G. Weikum and G. Vossen. Transactional information systems: theory, algorithms, and the practice of concurrency control and recovery. Elsevier, 2001.
  31. X. Yu, G. Bezerra, A. Pavlo, S. Devadas, and M. Stonebraker. Staring into the abyss: An evaluation of concurrency control with one thousand cores. Proceedings of the VLDB Endowment, 8(3), 2014.